The `t.test` function and its uses

We've used the function `t.test` to do paired and two-sample t-tests (and get the corresponding 95% confidence intervals); later we'll also use it to do Welch's t-test. How you tell R which test to do depends on how you pass your data to `t.test` and which arguments you specify.
Let’s consider the schizophrenia data set again:
If you give `t.test` a single vector of data, it will by default do a one-sample t-test. If this single vector happens to be the differences between two paired variables, you are doing a paired t-test. If you give `t.test` two vectors of data, it will by default do a Welch's t-test, but you can change this behaviour by specifying additional arguments. Add `paired = TRUE` to get a paired t-test instead, or `var.equal = TRUE` to get the two-sample t-test based on the equal-variance assumption.
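A minimal sketch of how the inputs and arguments select the test. It uses R's built-in `sleep` data (10 subjects, extra hours of sleep under two drugs) as a stand-in for the schizophrenia data; the names `x`, `y` and `res_*` are illustrative, not part of the lab. The `method` component of each result records which test was actually run.

```r
# Built-in sleep data: rows for group "1" and group "2" list subjects 1-10 in the same order
x <- sleep$extra[sleep$group == "1"]
y <- sleep$extra[sleep$group == "2"]

res_one    <- t.test(y - x)                   # one vector: one-sample t-test
res_welch  <- t.test(y, x)                    # two vectors: Welch's t-test by default
res_pooled <- t.test(y, x, var.equal = TRUE)  # two-sample t-test assuming equal variances
res_paired <- t.test(y, x, paired = TRUE)     # paired t-test

# Show which test each call performed
sapply(list(res_one, res_welch, res_pooled, res_paired), function(res) trimws(res$method))
```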
So, one way to get a paired t-test for the schizophrenia data is to manually calculate the differences, then pass them to `t.test`:
An alternative is to pass the two responses of interest and add the argument `paired = TRUE`.
You should check that the output from the two commands above is the same (ignoring the labels).
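One way to make that check programmatic, sketched with the built-in `sleep` data rather than the schizophrenia data (an assumption chosen so the code runs without extra packages): strip the names from the numeric parts of each result and compare them with `all.equal()`, so that differing labels such as "mean difference" are ignored.

```r
# Paired data: the two groups in sleep list the same 10 subjects in the same order
drug1 <- sleep$extra[sleep$group == "1"]
drug2 <- sleep$extra[sleep$group == "2"]

via_diffs  <- t.test(drug2 - drug1)                # one-sample test on the differences
via_paired <- t.test(drug2, drug1, paired = TRUE)  # paired test on the two responses

# Compare only the numbers, not the labels
same <- c(
  statistic = isTRUE(all.equal(unname(via_diffs$statistic), unname(via_paired$statistic))),
  df        = isTRUE(all.equal(unname(via_diffs$parameter), unname(via_paired$parameter))),
  p.value   = isTRUE(all.equal(via_diffs$p.value, via_paired$p.value)),
  conf.int  = isTRUE(all.equal(as.numeric(via_diffs$conf.int), as.numeric(via_paired$conf.int)))
)
same
```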
The two-sample t-test is not appropriate for these data, but R won't stop us from doing one (you should stop yourself from doing a two-sample t-test when you should do a paired t-test). We will do one here just to illustrate the difference in code. I've already alluded to one way to get a two-sample t-test: pass `t.test` two vectors of data and add the `var.equal = TRUE` argument.
An alternative is to use the formula approach: `t.test(response ~ group, data = dataframe)`.
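For instance, in the built-in `mtcars` data (an illustrative choice, not from the lab) `mpg` is a response and `am` (0 = automatic, 1 = manual) is a grouping column, so the data are already arranged the way the formula needs. This sketch checks that the formula call and the two-vector call give the same equal-variance test.

```r
# Formula interface: response ~ group, with both columns looked up in the data frame
by_formula <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Vector interface: split the response by group by hand
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]
by_vectors <- t.test(auto, manual, var.equal = TRUE)

c(formula = unname(by_formula$statistic), vectors = unname(by_vectors$statistic))
```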
We need to pass a column that contains the responses (the brain volumes) and a column that designates which group each observation belongs to (Affected or Unaffected). Unfortunately, our data aren't arranged in a way that allows that. Can you see the difference between these two ways of organizing the same data?
The second is arranged so that we can use the formula approach to do a two-sample t-test:
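In base R, `stack()` turns the wide arrangement (one column per group) into the long arrangement (a `values` column plus an `ind` column naming the group), and `unstack()` goes the other way. A minimal sketch, again using the built-in `sleep` data as a stand-in for the schizophrenia data; the object names `wide` and `long` are illustrative.

```r
# Wide arrangement: one column per group, built from the (long) sleep data
wide <- unstack(sleep, extra ~ group)

# Long arrangement: 'values' holds every response, 'ind' records which column it came from
long <- stack(wide)
str(long)

# The long form can be used with the formula interface
t.test(values ~ ind, data = long, var.equal = TRUE)
```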
Actually, once the data are in this form we also have a third alternative to the paired t-test:
Personally, I like the formula interface better: there's less typing of the data frame name, and it extends more easily to the more complicated modelling functions in R that do ANOVA and regression. The downside is that sometimes your data don't come in the right "shape" to use it. I've summarised the options in this document.
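As a small illustration of how the formula carries over to modelling functions (a sketch using the built-in `mtcars` data, not an example from the lab): the same `response ~ group` formula can be handed to `lm()`. Because `am` is a 0/1 indicator, its coefficient is the difference in group means, and its t statistic matches the equal-variance two-sample t statistic up to sign, since both use the same pooled variance.

```r
# Equal-variance two-sample t-test via the formula interface
tt <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Same formula in a linear model; the 'am' coefficient is (manual mean - automatic mean)
fit  <- lm(mpg ~ am, data = mtcars)
lm_t <- summary(fit)$coefficients["am", "t value"]

# t.test subtracts in the other order (group 0 minus group 1), so the signs are opposite
c(t_test = unname(tt$statistic), lm = lm_t)
```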
Run the following code to get some datasets to practice with:
For the following questions, examine the dataset to determine which arrangement it is in, then use the appropriate `t.test` command to complete the test:

- `case0102`
- `tv`
- `mpg`

The `coin` package

Randomization tests can be quite easily programmed (see Winter 2012's Lab #2 if you are interested), but there are also packages that will do them for you.
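For reference, here is one minimal way to program a randomization test in base R. It is a sketch, not the Lab #2 code, and it uses the built-in `mtcars` data (`mpg` by transmission type `am`) purely for illustration. It randomly regroups the observations by shuffling the group labels, recomputes the difference in means each time, and reports the proportion of regroupings at least as extreme as the observed difference.

```r
set.seed(412)  # arbitrary seed so the result is reproducible

y     <- mtcars$mpg
group <- mtcars$am  # 0 = automatic, 1 = manual

# Test statistic: difference in group means (manual minus automatic)
diff_means <- function(values, labels) {
  mean(values[labels == 1]) - mean(values[labels == 0])
}

observed <- diff_means(y, group)

# Random regroupings: permute the labels and recompute the statistic
n_resamples <- 10000
permuted <- replicate(n_resamples, diff_means(y, sample(group)))

# Two-sided p-value: proportion of regroupings at least as extreme as observed
p_value <- mean(abs(permuted) >= abs(observed))
c(observed = observed, p_value = p_value)
```

Increasing `n_resamples` reduces the run-to-run variability of the estimated p-value, at the cost of more computation.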
One example is the `oneway_test` function in the package `coin`. First, install and load the `coin` package.
The syntax requires a formula specifying the outcome variable and grouping variable, a `data` argument, and a specification of how the null distribution should be calculated: asymptotically, approximately or exactly (`asymptotic()`, `approximate()` or `exact()` in the `distribution` argument).
The 500,000 random groupings used for the creativity study in the textbook and class can be reproduced with
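A sketch of such a call. It assumes the creativity data are `case0101` from the `Sleuth3` package (columns `Score` and `Treatment`) and a recent version of `coin`, where the number of resamples is set with `nresample` (older versions used `B`).

```r
library(coin)     # install.packages("coin") if needed
library(Sleuth3)  # assumed source of the creativity data, case0101

set.seed(2012)    # arbitrary seed; without it the p-value changes from run to run
creativity_approx <- oneway_test(Score ~ Treatment, data = case0101,
                                 distribution = approximate(nresample = 500000))
pvalue(creativity_approx)
```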
The p-value will be slightly different every time because only 500,000 of the roughly 16 trillion possible random groupings are sampled.
An exact p-value (equivalent to examining all 16 trillion random groupings) can be obtained by changing the distribution argument:
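A sketch of the exact version, under the same assumptions as before (`case0101` from `Sleuth3`, with columns `Score` and `Treatment`). Only the `distribution` argument changes, and no seed is needed because nothing is sampled at random.

```r
library(coin)
library(Sleuth3)  # assumed source of case0101

creativity_exact <- oneway_test(Score ~ Treatment, data = case0101,
                                distribution = exact())
pvalue(creativity_exact)  # the same value on every run
```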
Read the help on `t.test` (i.e. `?t.test`). Can you figure out how to make it report a 90% confidence interval? Can you figure out how to test a null hypothesis other than that the mean difference equals zero (paired) or that the difference in means equals zero (two-sample)?