Stat 411/511

Lab 3

Oct 19/20

Goals

  • Review the t.test function and its uses
  • Do a randomization test using the coin package

t.test

We’ve used the function t.test to do paired and two-sample t-tests (and to get the corresponding 95% confidence intervals); later we’ll also use it to do Welch’s t-test. The way you tell R which test to do depends on how you pass t.test your data and which arguments you specify.

Let’s consider the schizophrenia data set again:

library(Sleuth3)
case0202

Paired t-test

If you give t.test a single vector of data, it will by default do a one-sample t-test. If this single vector happens to contain differences between two variables, you are doing a paired t-test. If you give t.test two vectors of data, it will by default do a Welch’s t-test, but you can change this behaviour by specifying additional arguments: add paired = TRUE to get a paired t-test instead, or var.equal = TRUE to get the two-sample t-test based on the equal variance assumption.
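
For example, passing t.test the two columns of case0202 with no extra arguments runs Welch’s t-test by default (shown here only to illustrate the default behaviour, not because it’s the right analysis for these paired data):

t.test(case0202$Unaffected, case0202$Affected)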

So, one way to get a paired t-test for the schizophrenia data is to manually calculate the differences, then pass them to t.test:

diffs <- with(case0202, Unaffected - Affected)
t.test(diffs)

An alternative is to pass the two responses of interest and add the argument paired = TRUE:

t.test(case0202$Unaffected, case0202$Affected, paired = TRUE)

You should check that the output from the two commands above is the same (ignoring the labels).
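
If you want to compare them programmatically rather than by eye, one option (not part of the original lab, just a suggestion) is to save both results and look at the components of the returned objects:

paired1 <- t.test(diffs)
paired2 <- t.test(case0202$Unaffected, case0202$Affected, paired = TRUE)
# t.test returns a list; these components hold the test statistic, p-value and CI
c(paired1$statistic, paired2$statistic)
c(paired1$p.value, paired2$p.value)
paired1$conf.int
paired2$conf.int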

Two sample t-test

The two sample t-test is not appropriate for these data, but R won’t stop us from doing one (you should stop yourself from doing a two sample t-test when you should do a paired t-test). We will do one here just to illustrate the difference in code. I’ve already mentioned that one way to get a two sample t-test is to pass t.test two vectors of data and add the var.equal = TRUE argument.

t.test(case0202$Unaffected, case0202$Affected, var.equal = TRUE)

An alternative is to use the formula approach: t.test(response ~ group, data = dataframe). We need to pass a column that contains the responses (the brain volumes) and a column that contains the designation of which group each observation belongs to (Affected or Unaffected). Unfortunately, our data isn’t arranged in a way that allows that. Can you see the difference between these two ways of organizing the same data?

# give each twin pair an id so we can keep track of it after reshaping
case0202$twin <- 1:nrow(case0202)
library(reshape)
# melt from "wide" (one column per group) to "long" (one row per observation)
case0202_2 <- melt(case0202, id.var = "twin")
# give the melted columns more descriptive names
case0202_2 <- rename(case0202_2, c("value" = "Volume", "variable" = "Status"))
case0202
##    Unaffected Affected twin
## 1        1.94     1.27    1
## 2        1.44     1.63    2
## 3        1.56     1.47    3
## 4        1.58     1.39    4
## 5        2.06     1.93    5
## 6        1.66     1.26    6
## 7        1.75     1.71    7
## 8        1.77     1.67    8
## 9        1.78     1.28    9
## 10       1.92     1.85   10
## 11       1.25     1.02   11
## 12       1.93     1.34   12
## 13       2.04     2.02   13
## 14       1.62     1.59   14
## 15       2.08     1.97   15
case0202_2
##    twin     Status Volume
## 1     1 Unaffected   1.94
## 2     2 Unaffected   1.44
## 3     3 Unaffected   1.56
## 4     4 Unaffected   1.58
## 5     5 Unaffected   2.06
## 6     6 Unaffected   1.66
## 7     7 Unaffected   1.75
## 8     8 Unaffected   1.77
## 9     9 Unaffected   1.78
## 10   10 Unaffected   1.92
## 11   11 Unaffected   1.25
## 12   12 Unaffected   1.93
## 13   13 Unaffected   2.04
## 14   14 Unaffected   1.62
## 15   15 Unaffected   2.08
## 16    1   Affected   1.27
## 17    2   Affected   1.63
## 18    3   Affected   1.47
## 19    4   Affected   1.39
## 20    5   Affected   1.93
## 21    6   Affected   1.26
## 22    7   Affected   1.71
## 23    8   Affected   1.67
## 24    9   Affected   1.28
## 25   10   Affected   1.85
## 26   11   Affected   1.02
## 27   12   Affected   1.34
## 28   13   Affected   2.02
## 29   14   Affected   1.59
## 30   15   Affected   1.97

The second is arranged in a way that lets us use the formula approach to do a two sample t-test:

t.test(Volume ~ Status, data = case0202_2, var.equal = TRUE)

Actually, once the data are in this form, we also have a third way to do the paired t-test:

t.test(Volume ~ Status, data = case0202_2, paired = TRUE)

Personally, I like the formula interface better: there’s less typing of the data frame name, and it extends more easily to the more complicated modelling functions in R that do ANOVA and regression. The downside is that sometimes your data doesn’t come in the right “shape” to use it. I’ve summarised the options in this document.

Run the following code to get some datasets to practice with:

load(url("http://stat511.cwick.co.nz/labs/lab-3-data.rda"))

For the following questions, examine each dataset to determine which arrangement it is in (a quick inspection sketch follows the list), then use the appropriate t.test command to complete the test:

  • Compare salary between the two sexes using a two sample t-test with the dataset case0102.
  • Compare the hours of tv watched between husbands and wives using a paired t-test with the dataset tv.
  • Compare the mpg of automatic and manual versions of the same car using a paired t-test with the dataset mpg.
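
One way to check a dataset’s arrangement (just a suggestion; the column names will differ from dataset to dataset) is to list what the .rda file loaded and look at the structure and first few rows of each object:

ls()             # which objects did the .rda file add to the workspace?
str(tv)          # what columns does tv have, and how are they arranged?
head(case0102)
head(mpg)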

Randomization Test

Randomization tests can be programmed quite easily (see Winter 2012’s Lab #2 if you are interested), but there are also packages that will do them for you.
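
For example, here is a rough sketch of how you might program one yourself in base R (just an illustration, not the Winter 2012 code), using the creativity data in case0101 from Sleuth3, which has columns Score and Treatment:

library(Sleuth3)
# observed difference in mean creativity score between the two treatment groups
obs_diff <- with(case0101,
                 mean(Score[Treatment == "Extrinsic"]) - mean(Score[Treatment == "Intrinsic"]))
# shuffle the group labels many times and recompute the difference each time
rand_diffs <- replicate(1000, with(case0101, {
  shuffled <- sample(Treatment)
  mean(Score[shuffled == "Extrinsic"]) - mean(Score[shuffled == "Intrinsic"])
}))
# two-sided p-value: proportion of random groupings at least as extreme as observed
mean(abs(rand_diffs) >= abs(obs_diff))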

One example is the oneway_test function in the coin package. First, install and load the coin package.

install.packages("coin")
library(coin)

The syntax requires a formula specifying the outcome variable and grouping variable, a data argument, and a specification of how the null distribution should be calculated: asymptotically, approximately or exactly.
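
For instance, the default asymptotic null distribution can be requested like this (a quick sketch using case0101, the creativity data from Sleuth3 that the next example also uses):

oneway_test(Score ~ Treatment, data = case0101,
            distribution = "asymptotic")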

The 500,000 random groupings used for the creativity study in the textbook and class can be reproduced with

# approximate by taking 500,000 random groupings
oneway_test(Score ~ Treatment, data = case0101,
            distribution = approximate(B = 500000))
## 
## 	Approximative 2-Sample Permutation Test
## 
## data:  Score by Treatment (Extrinsic, Intrinsic)
## Z = -2.7115, p-value = 0.005188
## alternative hypothesis: true mu is not equal to 0

The p-value will be slightly different every time because only 500,000 of the roughly 16 trillion possible random groupings are sampled.
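
If you want the same approximate p-value each time you run it, you can set the random seed first (this assumes the random groupings are drawn with R’s usual random number generator, which I believe is what coin does):

set.seed(2012)  # the seed value 2012 is arbitrary
oneway_test(Score ~ Treatment, data = case0101,
            distribution = approximate(B = 500000))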

An exact p-value (equivalent to considering all 16 trillion possible random groupings) can be obtained by changing the distribution argument:

# this can take a while!
oneway_test(Score ~ Treatment, data = case0101,
            distribution = "exact")
## 
## 	Exact 2-Sample Permutation Test
## 
## data:  Score by Treatment (Extrinsic, Intrinsic)
## Z = -2.7115, p-value = 0.005149
## alternative hypothesis: true mu is not equal to 0

Got Extra Time?

Read the help for t.test (i.e. ?t.test). Can you figure out how to make it report a 90% confidence interval? Can you figure out how to test a null hypothesis other than “the mean difference equals zero” (paired) or “the difference in means equals zero” (two-sample)?
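
If you get stuck, here is one possible answer sketch (spoilers!): the conf.level and mu arguments control the confidence level and the hypothesized value, respectively.

t.test(diffs, conf.level = 0.90)   # report a 90% confidence interval
t.test(diffs, mu = 0.1)            # test H0: mean difference = 0.1 (0.1 is just an example value)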