Stat 411/511
# Lab 3

##### Oct 19/20

- Review the `t.test` function and its uses
- Do a randomization test using the `coin` package

We’ve used the function `t.test` to do paired and two-sample t-tests (and get the corresponding 95% confidence intervals); later we’ll also use it to do Welch’s t-test. How you tell R which one to do depends on how you pass `t.test` your data and what arguments you specify.

Let’s consider the schizophrenia data set again:

If you give `t.test` **a single** vector of data, it will by default do a one-sample t-test. If this single vector happens to be the differences between two variables, you are doing a paired t-test. If you give `t.test` **two vectors** of data, it will by default do a Welch’s t-test, but you can change this behaviour by specifying additional arguments. Add `paired = TRUE` to get a paired t-test instead, or `var.equal = TRUE` to get the two-sample t-test based on the equal variance assumption.

So, one way to get a paired t-test for the schizophrenia data is to calculate the differences manually and then pass them to `t.test`:
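The code chunk for this step is not shown here, so this is only a minimal sketch of the pattern. It uses R's built-in `sleep` data as a stand-in for the schizophrenia data. The names `sleep_wide`, `drug1`, `drug2` and `diffs` are made up for this illustration. The stand-in is arranged with one row per subject and one column per condition, the way the twins data has one row per pair.

```r
# Stand-in data (an assumption for illustration): the built-in `sleep` data,
# rearranged so each row is one subject and each column is one condition.
sleep_wide <- data.frame(
  drug1 = sleep$extra[sleep$group == "1"],
  drug2 = sleep$extra[sleep$group == "2"]
)

# Step 1: calculate the within-pair differences by hand.
diffs <- sleep_wide$drug1 - sleep_wide$drug2

# Step 2: a single vector gives a one-sample t-test; because the vector
# holds differences, this is the paired t-test.
paired_by_hand <- t.test(diffs)
paired_by_hand
```

With your own data you would swap in the data frame and the two column names.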

An alternative is to pass the two responses of interest and add the argument `paired = TRUE`.
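A minimal sketch of this two-vector call, using R's built-in `sleep` data as a stand-in for the schizophrenia data. The names `sleep_wide`, `drug1` and `drug2` are invented for the illustration. The stand-in has one row per subject and one column per condition.

```r
# Stand-in data (an assumption for illustration): one row per subject,
# one column per condition.
sleep_wide <- data.frame(
  drug1 = sleep$extra[sleep$group == "1"],
  drug2 = sleep$extra[sleep$group == "2"]
)

# Pass both columns and ask for pairing; R forms the differences itself,
# matching the vectors element by element.
paired_direct <- t.test(sleep_wide$drug1, sleep_wide$drug2, paired = TRUE)
paired_direct
```

The two vectors must be the same length and in the same pair order, because R pairs them by position.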

**You should check that the output from the two commands above is the same (ignoring the labels).**

The two-sample t-test is not appropriate for these data, but R won’t stop us from doing one (**you** should stop **you** from doing a two-sample t-test when you should do a paired t-test). We will do one here just to illustrate the difference in code. I’ve already mentioned that one way to get a two-sample t-test is to pass `t.test` two vectors of data and add the `var.equal = TRUE` argument.
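Here is a sketch of the two-vector call with and without `var.equal = TRUE`, using the built-in `sleep` data as a stand-in. The names `sleep_wide`, `drug1` and `drug2` are invented for the illustration. Like the schizophrenia data, `sleep` is really paired, so this only illustrates the code and is not a sensible analysis.

```r
# Stand-in data (an assumption for illustration): one column per group.
sleep_wide <- data.frame(
  drug1 = sleep$extra[sleep$group == "1"],
  drug2 = sleep$extra[sleep$group == "2"]
)

# Two vectors and nothing else: Welch's t-test (the default).
welch <- t.test(sleep_wide$drug1, sleep_wide$drug2)

# Two vectors plus var.equal = TRUE: the equal-variance two-sample t-test.
pooled <- t.test(sleep_wide$drug1, sleep_wide$drug2, var.equal = TRUE)

# The degrees of freedom are a quick way to see which test you got.
welch$parameter   # Welch's approximation, usually not a whole number
pooled$parameter  # n1 + n2 - 2 = 10 + 10 - 2 = 18
```

The title line of the printed output also tells you which test R ran.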

An alternative is to use the formula approach: `t.test(response ~ group, data = dataframe)`

We need to pass a column that contains the responses (the brain volumes) and a column that contains the designation of which group the observation belongs to (Affected or Unaffected). Unfortunately, our data isn’t in an arrangement that allows that. **Can you see the difference in these two ways of organizing the same data?**
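To make the two arrangements concrete, here is a sketch that builds both from R's built-in `sleep` data, used as a stand-in for the schizophrenia data. The names `sleep_wide`, `sleep_long`, `drug1`, `drug2` and `drug` are made up for the illustration.

```r
# Arrangement 1 ("wide"): one row per subject, one column per condition.
sleep_wide <- data.frame(
  drug1 = sleep$extra[sleep$group == "1"],
  drug2 = sleep$extra[sleep$group == "2"]
)
head(sleep_wide, 3)

# Arrangement 2 ("long"): one row per observation, with one column of
# responses and one column saying which group each response belongs to.
sleep_long <- stack(sleep_wide)          # stack() names the columns `values` and `ind`
names(sleep_long) <- c("extra", "drug")  # rename to something more readable
head(sleep_long, 3)

dim(sleep_wide)  # 10 rows, 2 columns
dim(sleep_long)  # 20 rows, 2 columns
```

`stack()` is only one way to go from wide to long. It drops any other columns, so it suits small two-column cases like this one.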

The second is arranged so that we **can** use the formula approach to do a two-sample t-test:
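A minimal sketch of the formula call, using R's built-in `sleep` data as a stand-in. It already comes in this arrangement, with a response column `extra` and a grouping column `group`. The object name `pooled_formula` is made up for the illustration.

```r
# `sleep` has one row per observation: `extra` is the response and
# `group` says which group each observation belongs to.
head(sleep, 3)

# response ~ group, with the data frame named once in `data`.
pooled_formula <- t.test(extra ~ group, data = sleep, var.equal = TRUE)
pooled_formula
```

Leaving out `var.equal = TRUE` gives Welch's t-test, just as it does when you pass two vectors.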

Actually, once the data are in this form we also have a third alternative to the **paired** t-test:
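The call that belongs here is not shown, so treat what follows as a sketch and not the original command. Older versions of R accepted `paired = TRUE` alongside a formula, but recent versions (4.4.0 and later, as far as I can tell) stop with an error if you try. One route that works in either case is to pull the two groups out of the long data frame and pass them as vectors. The example uses R's built-in `sleep` data as a stand-in, and the names `group1`, `group2` and `paired_from_long` are made up for the illustration.

```r
# `sleep` is in the long arrangement. Within each group the rows are in the
# same subject order (see the `ID` column), which is what makes pairing by
# position valid here. Check this for your own data before relying on it.
group1 <- sleep$extra[sleep$group == "1"]
group2 <- sleep$extra[sleep$group == "2"]

paired_from_long <- t.test(group1, group2, paired = TRUE)
paired_from_long
```

If the rows are not in matching order, sort by the pair identifier within each group first.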

Personally, I like the formula interface better: there’s less typing of the data.frame name, and it extends more easily to the more complicated modelling functions in R that do ANOVA and regression. The downside is that sometimes your data don’t come in the right “shape” to use it. I’ve summarised the options in this document.

Run the following code to get some datasets to practice with:

For the following questions, examine the dataset to determine which arrangement it is in, then use the appropriate `t.test` command to complete the test:

- Compare salary between the two sexes using a two-sample t-test with the dataset `case0102`.
- Compare the hours of TV watched between husbands and wives using a paired t-test with the dataset `tv`.
- Compare the mpg of automatic and manual versions of the same car using a paired t-test with the dataset `mpg`.

Randomization tests can be quite easily programmed (see Winter 2012’s Lab #2 if you are interested), but there are also packages that will do them for you.
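To give a feel for what "programmed by hand" means, here is a minimal sketch in base R. It is not the code from the earlier lab. It uses two of the three groups in the built-in `PlantGrowth` data as a stand-in for a two-group study, and the names `plants`, `diff_in_means`, `observed`, `n_shuffles` and `shuffled` are all made up for the illustration.

```r
set.seed(411)  # only so the sketch gives the same answer each run

# Stand-in data (an assumption): keep two groups, 10 observations each.
plants <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))

# Test statistic: difference in group means.
diff_in_means <- function(response, group) {
  mean(response[group == "trt2"]) - mean(response[group == "ctrl"])
}
observed <- diff_in_means(plants$weight, plants$group)

# Shuffle the group labels many times, recomputing the statistic each time.
n_shuffles <- 10000
shuffled <- replicate(
  n_shuffles,
  diff_in_means(plants$weight, sample(plants$group))
)

# Two-sided p-value: the proportion of shuffles at least as extreme
# as the difference we actually observed.
p_value <- mean(abs(shuffled) >= abs(observed))
p_value
```

Shuffling the labels with `sample()` is the same as regrouping the observations at random while keeping the group sizes fixed.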

One example is the `oneway_test` function in the package `coin`. First, install and load the `coin` package.

The syntax requires a formula specifying the outcome variable and grouping variable, a `data` argument, and a specification of how the null distribution should be calculated: asymptotically, approximately or exactly.
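A sketch of the three pieces in one call, assuming `coin` is installed. It uses two groups from the built-in `PlantGrowth` data as a stand-in for a two-group study, and the names `plants` and `asym` are made up for the illustration.

```r
# install.packages("coin")  # run once if the package is not installed
library(coin)

# Stand-in data (an assumption): the grouping variable must be a factor,
# here with exactly two levels after dropping the unused one.
plants <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))

# outcome ~ group, the data frame, and how to get the null distribution.
# "asymptotic" uses a large-sample approximation instead of shuffling.
asym <- oneway_test(weight ~ group, data = plants,
                    distribution = "asymptotic")
asym
pvalue(asym)
```

`pvalue()` is the `coin` helper for pulling the p-value out of the fitted test object.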

The 500,000 random groupings used for the creativity study in the textbook and class can be reproduced with
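The original call is not shown here, so this is a sketch of its general shape and not the exact command. It assumes a recent version of `coin`, where the number of random groupings is set with `approximate(nresample = ...)`; I believe older versions called this argument `B`. Two groups from the built-in `PlantGrowth` data stand in for the creativity data, and the names `plants` and `mc` are made up. For the real thing, swap in the creativity data frame with its outcome and group columns.

```r
library(coin)

set.seed(511)  # optional: fixes the random groupings so the result repeats

# Stand-in data (an assumption): two groups of 10 observations.
plants <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))

# Approximate the null distribution from 500,000 random groupings.
# `nresample` is the argument name in recent coin versions (assumption).
mc <- oneway_test(weight ~ group, data = plants,
                  distribution = approximate(nresample = 500000))
pvalue(mc)
```

Without `set.seed()`, a different set of random groupings is drawn on each run.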

The p-value will be slightly different every time because only 500,000 of the possible 16 trillion random groupings are sampled.

An exact p-value (equivalent to finding all 16 trillion random groupings) can be obtained by changing the `distribution` argument.
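A sketch of the exact version, assuming `coin` is installed. It again uses two groups from the built-in `PlantGrowth` data as a stand-in, with the made-up names `plants` and `exact_res`. The stand-in has far fewer possible groupings than the creativity study, but the call has the same shape.

```r
library(coin)

# Stand-in data (an assumption): two groups of 10 observations.
plants <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt2")))

# "exact" works out the p-value over every possible regrouping,
# so there is no run-to-run variation.
exact_res <- oneway_test(weight ~ group, data = plants,
                         distribution = "exact")
pvalue(exact_res)

# Number of ways to split 20 observations into two groups of 10.
choose(20, 10)  # 184756
```

Because nothing is sampled, running this twice gives the same p-value.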

Read the help on `t.test` (i.e. `?t.test`). Can you figure out how to make it report a 90% confidence interval? Can you figure out how to test a null hypothesis other than that the mean difference equals zero (paired) or that the difference in means equals zero (two-sample)?