Stat 411/511
# Lab 7

##### Nov 16/17

- Running anova in R
- Checking anova assumptions
- Linear combinations of means in R

We saw the R code for a one way ANOVA on the Spock trial in lecture:

We are actually doing two things in one go, fitting a separate means model:

then running an ANOVA on the separate means model:
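A minimal sketch of the one-go call and its two-step equivalent. It assumes the Spock trial data are available as `case0502` in the `Sleuth3` package, with columns `Percent` and `Judge`; the data set name and the object name `fit_full` are assumptions, not something stated in this handout.

```r
# Assumption: the Spock trial data are `case0502` from the Sleuth3 package,
# with Percent as the response and Judge as the grouping variable.
library(Sleuth3)

# Everything in one go: fit the separate means model and run the ANOVA
anova(lm(Percent ~ Judge, data = case0502))

# The same thing in two steps
fit_full <- lm(Percent ~ Judge, data = case0502)  # separate means model
anova(fit_full)                                   # one-way ANOVA table
```

Saving the fitted model as its own object is handy because the same fit can be reused later for residual plots.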

When you give the `anova` function a single model, it runs what we call a sequential analysis of variance, but for our purposes we just need to know that it compares our full model to a model with a single mean, i.e. a one-way ANOVA. If you give `anova` two models, it will compare them with an Extra Sums of Squares F-test. So, for example, we could fit an equal means model explicitly and compare it to the full model.
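One way this explicit comparison might look, assuming the Spock data are `case0502` from the `Sleuth3` package (an assumption about where the data live); `fit_equal` and `fit_full` are illustrative names:

```r
# Assumption: the Spock trial data are `case0502` from the Sleuth3 package.
library(Sleuth3)

fit_full  <- lm(Percent ~ Judge, data = case0502)  # separate means (full) model
fit_equal <- lm(Percent ~ 1, data = case0502)      # equal means (reduced) model

# Extra Sums of Squares F-test: reduced model first, then the full model
anova(fit_equal, fit_full)

# One-way ANOVA from the full model alone, for comparison
anova(fit_full)
```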

And you’ll see we get exactly the same results.

The `lm` function deserves a little more explanation. `lm` is short for linear model and is R’s general purpose function for fitting regression models. The first argument specifies the model. For `lm`, the column on the left hand side of the `~` is the response whose mean we want to model, and on the right are the terms we want to model it with. So `Percent ~ Judge` says we want to model the **mean** Percent as a function of the Judge. Since `Judge` contains categories, this means one mean parameter for each Judge (if Judge were numeric we would end up with a simple linear regression). `Percent ~ 1` is interpreted as modelling the mean Percent by a single mean.
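To see what the two formulas estimate, here is a small sketch, assuming the Spock data are `case0502` from the `Sleuth3` package (an assumption; the object names are illustrative too). It checks that `Percent ~ 1` estimates the overall average and that `Percent ~ Judge` reproduces the group averages.

```r
# Assumption: the Spock trial data are `case0502` from the Sleuth3 package.
library(Sleuth3)

# Single mean model: the one coefficient is the overall average of Percent
fit_equal <- lm(Percent ~ 1, data = case0502)
coef(fit_equal)
mean(case0502$Percent)

# Separate means model: one parameter per judge
fit_full <- lm(Percent ~ Judge, data = case0502)
length(coef(fit_full))

# With R's default coding the coefficients are the first judge's mean plus
# differences from it, but the fitted values are the group averages.
unique(round(fitted(fit_full), 2))
with(case0502, round(tapply(Percent, Judge, mean), 2))
```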

To check the anova assumptions it’s generally easiest to examine the residuals from the full model, although sometimes you will pick up gross violations just from a plot of the raw response. We’ve already fit the full model, so we can generate the plots we covered in lecture with:
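The exact plots from lecture are not listed here. As a sketch, assuming they were a residuals versus fitted values plot and a normal Q-Q plot of the residuals, and assuming the Spock data are `case0502` from the `Sleuth3` package, base R graphics could produce them like this:

```r
# Assumptions: Spock data are `case0502` from Sleuth3; the lecture plots are
# residuals vs fitted values and a normal Q-Q plot of the residuals.
library(Sleuth3)

fit_full <- lm(Percent ~ Judge, data = case0502)

# Residuals against fitted values (the fitted values are the group averages)
plot(fitted(fit_full), residuals(fit_full),
     xlab = "Fitted value (group average)", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(residuals(fit_full))
qqline(residuals(fit_full))
```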

What would evidence of a violation look like in each plot? **Your TA should go through these plots with you**.

Now to get some practice. `InsectSprays` contains counts of insects in an experiment comparing different insecticide sprays. We are interested in whether there is evidence that any of the sprays have a different mean count of insects. Take a look at the raw data:
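One possible way to look at the raw data, using base R graphics (the choice of plot is an assumption; any plot that shows the counts by spray would do). `InsectSprays` ships with R, with columns `count` and `spray`.

```r
# InsectSprays is built into R: `count` is the response, `spray` the group
head(InsectSprays)
str(InsectSprays)

# Side-by-side boxplots of insect count for each spray
boxplot(count ~ spray, data = InsectSprays,
        xlab = "Spray", ylab = "Insect count")

# Overlay the individual observations, jittered so ties are visible
stripchart(count ~ spray, data = InsectSprays,
           vertical = TRUE, method = "jitter", add = TRUE, pch = 1)
```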

**Do you see any evidence of violations of the ANOVA assumptions?**

Take a look at the residual plots. Can you see the violations more easily?
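A sketch of the residual plots for the separate means model on the raw counts; `fit_count` is an illustrative name, and the plot types are an assumption about which plots are intended.

```r
# Separate means model for the raw insect counts
fit_count <- lm(count ~ spray, data = InsectSprays)

# Residuals against fitted values
plot(fitted(fit_count), residuals(fit_count),
     xlab = "Fitted value (group average)", ylab = "Residual")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(residuals(fit_count))
qqline(residuals(fit_count))
```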

**Do the residual plots look better if we instead model the mean of the square root of insect count?**
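The transformation can be written directly in the model formula. A minimal sketch, with `fit_sqrt` as an illustrative name:

```r
# Model the mean of sqrt(count) with a separate mean for each spray
fit_sqrt <- lm(sqrt(count) ~ spray, data = InsectSprays)

# Residual plot on the square root scale
plot(fitted(fit_sqrt), residuals(fit_sqrt),
     xlab = "Fitted value (group average of sqrt count)", ylab = "Residual")
abline(h = 0, lty = 2)

# One-way ANOVA on the square root scale
anova(fit_sqrt)
```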

**What would you conclude from this ANOVA?**

In lecture on Friday we talked about inference on linear combinations of means in multiple group settings. Let’s do an example using R and the Spock data. Imagine we want to compare Spock’s judge to Judge C. Since this is a two-group comparison, the natural quantity to look at is the difference of their means, i.e. \(\gamma = \mu_{\text{Spock}} - \mu_{\text{C}}\). Let’s work through finding a 95% CI for this difference and testing whether it is zero in R. First, we need to calculate the sample averages, standard deviations and sizes, and the pooled standard deviation:
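A sketch of these summaries using base R's `tapply`, assuming the Spock data are `case0502` from the `Sleuth3` package. Only `averages` is a name used in this handout; `sds`, `ns`, `df_resid` and `sp` are illustrative.

```r
# Assumption: the Spock trial data are `case0502` from the Sleuth3 package.
library(Sleuth3)

# Per-judge sample averages, standard deviations and sample sizes
averages <- with(case0502, tapply(Percent, Judge, mean))
sds      <- with(case0502, tapply(Percent, Judge, sd))
ns       <- with(case0502, tapply(Percent, Judge, length))

# Pooled standard deviation: weight each group variance by its degrees of
# freedom, n_i - 1, and divide by the total within-group degrees of freedom
df_resid <- sum(ns - 1)
sp <- sqrt(sum((ns - 1) * sds^2) / df_resid)
sp
```

As a cross-check, `sp` should match the residual standard error reported by `summary()` for the separate means model.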

Take a look at `averages`.

When we write our linear combination we need to make sure it’s in the same order as our averages are in R. Here Judge C is the 3rd entry and Spock’s Judge is the 7th entry. Our constants in our linear combination would be: \( C_1 = 0, C_2 = 0, C_3 = -1, C_4 = 0, C_5 = 0, C_6 = 0, C_7 = 1 \). In R we create a new vector to represent the C’s:

Now finding the estimate of the linear combination and its standard error is easy:

And we can use them to get a 95% confidence interval, or the p-value for the t-test of whether the linear combination is zero.
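Putting the steps together, here is a self-contained sketch, assuming the Spock data are `case0502` from the `Sleuth3` package. The names `C`, `g`, `se_g` and `t_stat` are illustrative. The standard error uses the pooled standard deviation, \( s_p \sqrt{\sum_i C_i^2 / n_i} \), with the within-group degrees of freedom.

```r
# Assumption: the Spock trial data are `case0502` from the Sleuth3 package.
library(Sleuth3)

averages <- with(case0502, tapply(Percent, Judge, mean))
sds      <- with(case0502, tapply(Percent, Judge, sd))
ns       <- with(case0502, tapply(Percent, Judge, length))
df_resid <- sum(ns - 1)
sp       <- sqrt(sum((ns - 1) * sds^2) / df_resid)

# Constants, in the same order as names(averages):
# Judge C is the 3rd entry, Spock's judge the 7th
C <- c(0, 0, -1, 0, 0, 0, 1)

g    <- sum(C * averages)         # estimate of the linear combination
se_g <- sp * sqrt(sum(C^2 / ns))  # its standard error

# 95% confidence interval
g + c(-1, 1) * qt(0.975, df_resid) * se_g

# Two-sided p-value for the t-test that the linear combination is zero
t_stat <- g / se_g
2 * pt(-abs(t_stat), df_resid)
```

It is worth printing `names(averages)` before trusting the result, since the constants only make sense if they line up with the order of the averages.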

Repeat the steps to compare Spock’s mean to the average of the means of the other judges, i.e. \( C_1 = -1/6, C_2 = -1/6, C_3 = -1/6, C_4 = -1/6, C_5 = -1/6, C_6 = -1/6, C_7 = 1 \).

Do the practice problems posted under lecture for Nov 9th.