#' ---
#' title: "Homework 3"
#' author: "Charlotte"
#' date: "23 October 2015"
#' output: word_document
#' ---
#'
library(ggplot2)
#' ## Q1 Two sample t-test
cdc <- read.csv(url('http://stat511.cwick.co.nz/homeworks/cdc.csv'))
cdc$wt_diff <- with(cdc, weight - wtdesire)
t.test(wt_diff ~ exerany, data = cdc, var.equal = TRUE)
#'
#'*(1pt for code)*
#'
#' ### Statistical Summary:
#'
#' There is **strong** evidence to suggest that the **mean desired weight loss** for **those who exercised in the last month** is not equal to the **mean desired weight loss** for **those who did not exercise in the last month**
#' (two sample t-test statistic = 3.245, df = 998, **p = 0.001**).
#'
#' It is estimated that those who exercised in the last month
#' have a mean desired weight loss **5.5 lbs lower** than those who did not
#' exercise in the last month.
#'
#' With 95% confidence the mean desired weight loss for those who
#' exercised in the last month is between **2.2 and 8.8 lbs lower** than the
#' mean desired weight loss of those who did not
#' exercise in the last month.
#'
#'*(3pts summary)*
#'
#' ## Q2 A Randomization Test
#' ### Parts 1 and 2)
#'
#' *(1pt for each part: groupings, test-stats, null dist , p-value, summary, study design)*
#'
#+ echo = FALSE
hw3 <- data.frame(Score = c(0, 3, 0, 3, 9),
Treatment = c("Placebo", "Placebo",
"New drug", "New drug", "New drug"))
obs_test_stat <- mean(subset(hw3, Treatment == "New drug")$Score) -
mean(subset(hw3, Treatment == "Placebo")$Score)
placebo_groups <- combn(1:nrow(hw3), 2, simplify = FALSE)
test_stats <- sapply(placebo_groups, function(group){
stat <-mean(hw3[-group,"Score"]) - mean(hw3[group, "Score"])
})
group_labels <- t(sapply(placebo_groups, function(group){
c(paste(hw3[group,"Score"], collapse = ", "),
paste(hw3[-group,"Score"], collapse = ", "))
}))
tab <- data.frame( group_labels, test_stats)
names(tab) <- c("Placebo scores", "New drug Scores", "Test statistic")
tab
#'
#' ### Part 3)
#' Since, the patients were randomly assigned to groups, all 10 random combinations
#' of patients to two groups (size 2 and 3) are equally likely to occur. So,
#' the test-statistics from these 10 groupings represent the randomization distribution.
#' By assuming the null hypothesis is true, we are able to calculate these test
#' statistics because if the effect of the new drug does not differ from the effect of
#' the placebo, then the 5 patients' observed BP changes would occur no matter which
#' treatment they were assigned to (e.g., no matter what group subject A was in we would
#' observe a reduction in BP of 0).
#' These test statistics are therefore the randomization distribution assuming
#' the null is true, by definition the null distribution.
#' The null distribution is plotted below (not required).
#+ fig.width = 6, fig.height = 2, echo = FALSE
ggplot(data=tab,aes(x=test_stats)) +
geom_dotplot(binwidth = .5) +
ggtitle("Null Distribution for Drug Randomization Test") +
scale_x_continuous(breaks = c(-5, -2.5, 0, 2.5, 5))
p <- mean(abs(test_stats) >= obs_test_stat)
#' ### Part 4)
#' The observed test statistic is `r obs_test_stat`. In total there were
#' `r sum(abs(test_stats) >= obs_test_stat)` random groupings that
#' gave a test-statistic as extreme or more extreme than the observed
#' test-statistic (i.e. further or as far from zero).
#' **p-value = `r sum(abs(test_stats) >= obs_test_stat)`/10** = `r p`.
#'
#' ### Part 5)
#' There is **no evidence** to suggest that the effect of the new drug
#' differed from the effect of the placebo **for patients in this
#' study** (randomization test, p = 0.9).
#'
#' ### Part 6)
#' This experiment was poorly designed because there is no way to reject
#' the null hypothesis given this study's null distribution. The smallest
#' possible p-value we could get in this study is 1/10 = 0.10. A p-value of 0.10
#' is not small enough to reject the null hypothesis; therefore, given
#' this data, there is no way to reject the null.