We saw the R code for a one-way ANOVA on the Spock trial data in lecture.
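The lecture code is not reproduced here. As a minimal sketch, assuming the Spock trial data is the `case0502` data frame from the Sleuth3 package (with the `Percent` and `Judge` columns used throughout this lab), the one-liner looks something like this:

```r
library(Sleuth3)  # assumed source of the Spock trial data (case0502)

# Fit the separate-means model and run the one-way ANOVA in a single line
spock_anova <- anova(lm(Percent ~ Judge, data = case0502))
spock_anova
```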
We are actually doing two things in one go: fitting a separate-means model, then running an ANOVA on that separate-means model.
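Split into its two steps, a sketch under the same `case0502` assumption might read as follows (the object name `full_model` is our own choice):

```r
library(Sleuth3)

# Step 1: separate-means (full) model, one mean per judge
full_model <- lm(Percent ~ Judge, data = case0502)

# Step 2: one-way ANOVA table for that model
anova(full_model)
```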
When you give the `anova` function a single model, it runs what is called a sequential analysis of variance. For our purposes we just need to know that, with a single categorical explanatory variable, it compares our full model to a model with a single mean, i.e. a one-way ANOVA. If you give `anova` two models, it compares them with an extra-sum-of-squares F-test. So, for example, we could fit an equal-means model explicitly and compare it to the full model.
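A sketch of that explicit comparison, again assuming `case0502` from Sleuth3; `equal_means` and `full_model` are illustrative names. By convention the smaller (reduced) model is listed first in the call to `anova`.

```r
library(Sleuth3)

full_model  <- lm(Percent ~ Judge, data = case0502)  # separate means
equal_means <- lm(Percent ~ 1, data = case0502)      # single common mean

# Extra-sum-of-squares F-test: reduced model first, full model second
comparison <- anova(equal_means, full_model)
comparison
```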
You’ll see that we get exactly the same F-statistic and p-value.
The `lm` function deserves a little more explanation. `lm` is short for linear model and is R’s general-purpose function for fitting regression models. Its first argument, a formula, specifies the model. The column on the left-hand side of the `~` is the response whose mean we want to model; the right-hand side lists the terms we want to model it with. So `Percent ~ Judge` says we want to model the mean Percent as a function of Judge. Since Judge contains categories, this means one mean parameter for each judge (if Judge were numeric, we would end up with a simple linear regression). `Percent ~ 1` is interpreted as modelling the mean Percent with a single mean.
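To see these formulas in action, here is a sketch assuming `case0502` from Sleuth3. One detail worth knowing: by default R parametrizes `Percent ~ Judge` as the first judge’s mean plus differences from it, while adding `- 1` to the formula drops the intercept so that each coefficient is simply one judge’s mean. Both describe the same separate-means model.

```r
library(Sleuth3)

# Default parametrization: intercept = mean of the first judge (level),
# remaining coefficients = differences from that judge
coef(lm(Percent ~ Judge, data = case0502))

# Dropping the intercept gives one coefficient per judge, equal to its mean
cell_means <- lm(Percent ~ Judge - 1, data = case0502)
coef(cell_means)

# Intercept-only model: a single mean for all observations
equal_means <- lm(Percent ~ 1, data = case0502)
coef(equal_means)
```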
To check the ANOVA assumptions, it’s generally easiest to examine the residuals from the full model, although sometimes you will pick up gross violations just from a plot of the raw response. We’ve already fit the full model, so we can go straight to generating the residual plots covered in lecture.
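The exact plots from lecture aren’t reproduced here. As an assumption, this sketch draws two common residual diagnostics for the full model with base R graphics, residuals against fitted values and a normal Q-Q plot of the residuals, again assuming `case0502` from Sleuth3.

```r
library(Sleuth3)

full_model <- lm(Percent ~ Judge, data = case0502)
res  <- residuals(full_model)
fits <- fitted(full_model)  # for a separate-means model, these are group averages

op <- par(mfrow = c(1, 2))  # two plots side by side
plot(fits, res, xlab = "Fitted value (judge average)", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
par(op)  # restore the previous layout
```

`plot(full_model, which = 1:2)` produces R’s built-in versions of these same two plots.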
What would evidence of a violation look like in each plot? Your TA should go through these plots with you.
Now for some practice. The built-in `InsectSprays` data set contains counts of insects in an experiment comparing six insecticide sprays. We are interested in whether there is evidence that any of the sprays has a different mean insect count. Take a look at the raw data.
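`InsectSprays` ships with R in the datasets package, so nothing needs to be installed. One way to look at the raw data is sketched below; the choice of a boxplot with jittered points, and of also printing each spray’s standard deviation, is ours rather than from lecture.

```r
str(InsectSprays)  # 72 rows: count (numeric) and spray (factor, A to F)

# Boxplots of the raw counts by spray, with the individual points overlaid
boxplot(count ~ spray, data = InsectSprays, xlab = "Spray", ylab = "Insect count")
stripchart(count ~ spray, data = InsectSprays, vertical = TRUE,
           method = "jitter", pch = 16, add = TRUE)

# Sample standard deviation of the counts for each spray
spray_sds <- tapply(InsectSprays$count, InsectSprays$spray, sd)
spray_sds
```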
Do you see any evidence of violations of the ANOVA assumptions?
Take a look at the residual plots. Can you see the violations more easily?
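A sketch of the same two residual plots for the separate-means model of the raw counts; `insect_model` is an illustrative name.

```r
insect_model <- lm(count ~ spray, data = InsectSprays)

op <- par(mfrow = c(1, 2))
plot(fitted(insect_model), residuals(insect_model),
     xlab = "Fitted value (spray average)", ylab = "Residual",
     main = "Raw counts: residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(residuals(insect_model), main = "Raw counts: normal Q-Q")
qqline(residuals(insect_model))
par(op)
```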
Do the residual plots look better if we instead model the mean of the square root of the insect count?
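Only the response changes: wrapping it in `sqrt()` inside the formula models the mean of the square-root counts. A sketch (`sqrt_model` is an illustrative name):

```r
sqrt_model <- lm(sqrt(count) ~ spray, data = InsectSprays)

op <- par(mfrow = c(1, 2))
plot(fitted(sqrt_model), residuals(sqrt_model),
     xlab = "Fitted value (spray average of sqrt(count))", ylab = "Residual",
     main = "Square-root counts: residuals vs fitted")
abline(h = 0, lty = 2)
qqnorm(residuals(sqrt_model), main = "Square-root counts: normal Q-Q")
qqline(residuals(sqrt_model))
par(op)
```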
What would you conclude from this ANOVA?
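The ANOVA itself is one more line. A sketch, refitting the square-root model so the block runs on its own:

```r
sqrt_model <- lm(sqrt(count) ~ spray, data = InsectSprays)
sqrt_anova <- anova(sqrt_model)
sqrt_anova
```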
In lecture on Friday we talked about inference on linear combinations of means in multiple-group settings. Let’s do an example using R and the Spock data. Imagine we want to compare Spock’s judge to Judge C. Since this is a two-group comparison, the natural quantity to estimate is the difference of their means, i.e. \(\gamma = \mu_{\text{Spock}} - \mu_{\text{C}} \). Let’s work through finding a 95% confidence interval for this difference, and testing whether it is zero, in R. First, we need to calculate the sample average, standard deviation and size for each judge, and the pooled standard deviation.
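A sketch of these summaries, assuming `case0502` from Sleuth3; the names `averages`, `sds`, `ns` and `s_pooled` are our own choices. The pooled standard deviation is \( s_p = \sqrt{\sum_i (n_i - 1) s_i^2 / \sum_i (n_i - 1)} \).

```r
library(Sleuth3)

# Per-judge summaries; tapply returns them in the order of levels(case0502$Judge)
averages <- tapply(case0502$Percent, case0502$Judge, mean)
sds      <- tapply(case0502$Percent, case0502$Judge, sd)
ns       <- tapply(case0502$Percent, case0502$Judge, length)

# Pooled SD: square root of the (n_i - 1)-weighted average of the group variances
s_pooled <- sqrt(sum((ns - 1) * sds^2) / sum(ns - 1))
s_pooled
```

As a check, `s_pooled` should match the residual standard error reported by `summary(lm(Percent ~ Judge, data = case0502))`.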
Take a look at `averages`, paying attention to the order in which the judges appear.
When we write our linear combination, we need to make sure the coefficients are in the same order as the averages are in R. Here Judge C is the 3rd entry and Spock’s judge is the 7th entry, so the coefficients of our linear combination are \( C_1 = 0, C_2 = 0, C_3 = -1, C_4 = 0, C_5 = 0, C_6 = 0, C_7 = 1 \). In R we create a new vector to hold the \(C\)s.
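A sketch, assuming the averages are computed per judge with `tapply` from `case0502` (Sleuth3); the vector name `C` simply mirrors the notation. Naming the entries of `C` after the judges makes any misalignment easy to spot.

```r
library(Sleuth3)

averages <- tapply(case0502$Percent, case0502$Judge, mean)
names(averages)  # the order the coefficients must follow

# Coefficients for Spock's judge minus Judge C
C <- c(0, 0, -1, 0, 0, 0, 1)
names(C) <- names(averages)
C
```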
Finding the estimate of the linear combination and its standard error is now easy.
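For a linear combination \( g = \sum_i C_i \bar{Y}_i \) of the group averages, the standard error is \( SE(g) = s_p \sqrt{\sum_i C_i^2 / n_i} \). A self-contained sketch of both, under the same `case0502` assumption (variable names are illustrative):

```r
library(Sleuth3)

averages <- tapply(case0502$Percent, case0502$Judge, mean)
sds      <- tapply(case0502$Percent, case0502$Judge, sd)
ns       <- tapply(case0502$Percent, case0502$Judge, length)
s_pooled <- sqrt(sum((ns - 1) * sds^2) / sum(ns - 1))

C <- c(0, 0, -1, 0, 0, 0, 1)  # Spock's judge minus Judge C

g    <- sum(C * averages)               # estimate of gamma
se_g <- s_pooled * sqrt(sum(C^2 / ns))  # its standard error
c(estimate = g, se = se_g)
```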
We can then use them to get a 95% confidence interval, or the p-value for the t-test that the linear combination is zero.
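Both use a t-distribution with the pooled degrees of freedom, \( n - I \): the total sample size minus the number of groups. A sketch, recomputing the pieces so the block runs on its own (same `case0502` assumption):

```r
library(Sleuth3)

averages  <- tapply(case0502$Percent, case0502$Judge, mean)
sds       <- tapply(case0502$Percent, case0502$Judge, sd)
ns        <- tapply(case0502$Percent, case0502$Judge, length)
df_pooled <- sum(ns) - length(ns)          # pooled degrees of freedom, n - I
s_pooled  <- sqrt(sum((ns - 1) * sds^2) / df_pooled)

C    <- c(0, 0, -1, 0, 0, 0, 1)            # Spock's judge minus Judge C
g    <- sum(C * averages)
se_g <- s_pooled * sqrt(sum(C^2 / ns))

# 95% confidence interval: estimate +/- t multiplier * SE
t_mult <- qt(0.975, df_pooled)
ci <- g + c(-1, 1) * t_mult * se_g

# Two-sided p-value for H0: gamma = 0
t_stat  <- g / se_g
p_value <- 2 * pt(-abs(t_stat), df_pooled)
list(ci = ci, t_stat = t_stat, p_value = p_value)
```

Because this particular combination is a simple difference of two means, you can cross-check it with `lm`: fit `Percent ~ relevel(Judge, ref = "C")`, and the coefficient for Spock’s judge in `summary()` has the same estimate, standard error and p-value.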
Repeat these steps to compare the mean for Spock’s judge to the average of the other six judges’ means, i.e. \( C_1 = -1/6, C_2 = -1/6, C_3 = -1/6, C_4 = -1/6, C_5 = -1/6, C_6 = -1/6, C_7 = 1 \).
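Since the steps are identical for any set of coefficients, one option is to wrap them in a function. `lincomb_inference` below is a hypothetical helper written for this sketch (not something from lecture or from an R package), again assuming `case0502` from Sleuth3.

```r
library(Sleuth3)

# Hypothetical helper: estimate, SE, confidence interval and two-sided p-value
# for a linear combination sum(C_i * mu_i) of group means in a one-way layout
lincomb_inference <- function(C, y, group, level = 0.95) {
  averages <- tapply(y, group, mean)
  sds      <- tapply(y, group, sd)
  ns       <- tapply(y, group, length)
  stopifnot(length(C) == length(averages))
  df_pooled <- sum(ns) - length(ns)
  s_pooled  <- sqrt(sum((ns - 1) * sds^2) / df_pooled)
  g    <- sum(C * averages)
  se_g <- s_pooled * sqrt(sum(C^2 / ns))
  t_mult <- qt(1 - (1 - level) / 2, df_pooled)
  list(estimate = g, se = se_g,
       ci = g + c(-1, 1) * t_mult * se_g,
       p_value = 2 * pt(-abs(g / se_g), df_pooled))
}

# Spock's judge versus the average of the other six judges' means
C_others <- c(rep(-1/6, 6), 1)
spock_vs_others <- lincomb_inference(C_others, case0502$Percent, case0502$Judge)
spock_vs_others
```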
Do the practice problems posted under lecture for Nov 9th.