**Stat 301 - HW 7**

**Due midnight, Friday, March 10**

*Remember to put your name(s) inside each file and, if
submitting together, join a HW group before you submit.
Remember to show your work/calculations/computer details (even if not
specifically asked) and to integrate this into the body of the solution.*

**1)** To investigate an
association between violent video games and aggressive behavior, British
researchers Hollingdale and Greitemeyer (2014)
randomly assigned 49 students from a university in the United Kingdom to play *Call of Duty: Modern Warfare* (a violent
video game) and 52 students to play *LittleBigPlanet
2* (a nonviolent/neutral video game). After 30 minutes of playing the video
games, the subjects were asked to complete a
marketing survey investigating a new hot chili sauce recipe. They were told
they were to prepare some chili sauce for a taste tester and that the taste
tester “couldn't stand hot chili sauce but was taking part due to good
payment.” They were then presented with what appeared to be a very hot chili
sauce and asked to spoon what they thought would be an appropriate amount into
a bowl for a new recipe. The amount of chili sauce was weighed in grams after
the participant left the experiment. The amount of chili sauce was used as a
measure of aggression: the more chili sauce, the greater the subject’s
aggression.

(a) Explain how and why random
assignment was used in this study.

(b) Load the VideoAgression data into the **Comparing Groups (Quantitative)** applet
(better yet, use the version from the pull-down menu,
but note that the direction of subtraction changes). Screen capture the
numerical and graphical summaries of the data comparing the two groups. Summarize what you learn about the shapes,
centers, and spreads of each group (= sample).

(c) *In words,* state appropriate
null and alternative hypotheses to test whether there is an association between
type of video games and level of aggression.

(d) Carry out a randomization test for these data. (Use 10,000 shuffles, might take a second 😊. Note: R won’t do the exact distribution for me because the sample size is too large!) Include a screen capture of the resulting null distribution with the p-value shaded. Summarize the conclusions you would draw in terms of significance, causation, and generalizability.
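If you want to see what the applet is doing behind the scenes in (d), here is a minimal sketch of a two-sided randomization test for a difference in group means. The function name `randomization_test` and the small data lists are made up for illustration (they are not the study data), and the applet's own implementation may differ in its details.

```python
import random

def randomization_test(group_a, group_b, shuffles=10_000, seed=0):
    """Approximate two-sided p-value for the difference in group means."""
    rng = random.Random(seed)          # fixed seed so results are repeatable
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)

    extreme = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)            # re-randomize responses to the two groups
        new_a, new_b = pooled[:n_a], pooled[n_a:]
        diff = sum(new_a) / len(new_a) - sum(new_b) / len(new_b)
        # small tolerance guards against floating-point ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / shuffles

# Hypothetical values for illustration only (NOT the chili sauce data).
violent = [12.0, 15.5, 9.0, 20.0, 14.5, 18.0]
neutral = [8.0, 10.5, 7.0, 11.0, 9.5, 6.0]
observed_diff, p_value = randomization_test(violent, neutral, shuffles=2000)
print(observed_diff, p_value)
```

The p-value is the proportion of shuffles whose difference in means is at least as extreme as the observed difference, so it will vary a little with the seed and the number of shuffles.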

(e) Do you think two-sample *t*-procedures are likely to be valid with
these data?

(f) Use the pull-down menu to select the
*t*-statistic. Report the observed value of the *t*-statistic for the
actual study (this is unpooled if you want to verify
its value) and use it to determine the simulation-based **and** the *t*-distribution-based p-values. Include a screen
capture. How do the p-values compare?

(g) Calculate (you
can use the applet) a 95% confidence interval for the difference in the
treatment means. Carefully interpret your interval (*Hint*: What is the
parameter?)

(h) Calculate the “independent samples” unpooled standard error for the difference in sample means.
(Show your work.)
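To check your hand calculation in (h), here is a small sketch of the usual unpooled standard error, the square root of s1²/n1 + s2²/n2. The function name `unpooled_se` and the example numbers are hypothetical, and you still need to show your own work with the actual summary statistics.

```python
import math
import statistics

def unpooled_se(group_a, group_b):
    """Unpooled standard error for the difference in two sample means."""
    var_a = statistics.variance(group_a)   # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    return math.sqrt(var_a / len(group_a) + var_b / len(group_b))

# Hypothetical values for illustration only.
se_unpooled = unpooled_se([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(se_unpooled)
```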

(i) Calculate the “independent samples”
pooled standard error for the difference in sample means. (Show your work.)
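For (i), the sketch below uses the usual pooled formula: first combine the two sample variances, weighting each by its degrees of freedom, then multiply the pooled SD by the square root of 1/n1 + 1/n2. The function name `pooled_se` and the example numbers are hypothetical.

```python
import math
import statistics

def pooled_se(group_a, group_b):
    """Pooled standard error for the difference in two sample means."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)   # sample variance (divides by n - 1)
    var_b = statistics.variance(group_b)
    # pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Hypothetical values for illustration only.
se_pooled = pooled_se([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(se_pooled)
```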

(j) Give one reason why using the pooled
standard error might be reasonable here (e.g., how was the study
conducted?) and one piece of evidence that indicates it is not
reasonable. (*Hint*: An informal check of the equal variance assumption is
that the larger sample standard deviation is not more than twice the smaller sample
standard deviation.)
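The informal check in the hint can be written as a one-line comparison. This is only a sketch: the function name `sds_similar` and the example numbers are hypothetical, and the cutoff of 2 is the rule of thumb from the hint, not a formal test.

```python
import statistics

def sds_similar(group_a, group_b, max_ratio=2.0):
    """Rule of thumb: larger sample SD is at most max_ratio times the smaller."""
    sd_a = statistics.stdev(group_a)       # sample SD (divides by n - 1)
    sd_b = statistics.stdev(group_b)
    ratio = max(sd_a, sd_b) / min(sd_a, sd_b)
    return ratio, ratio <= max_ratio

# Hypothetical values for illustration only.
sd_ratio, looks_equal = sds_similar([1, 2, 3, 4, 5], [2, 4, 6, 8])
print(sd_ratio, looks_equal)
```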

(k) In a randomization test, where we
shuffle all existing observations across the two groups, another formula makes
use of the finite population correction factor, the assumption of the null
distribution, and the fact that the groups aren’t really independent. In
particular, if all of the high scores are randomly assigned to one group, then
all of the low scores must go to the other group, creating a larger difference
between the groups.

But in our case, the
correlation between the sample means (because of our assumptions in the
simulation) is simply -1.

Note that I am calling these “sigmas” because I am going to treat these *N* = 101 observations
as the population. You can use Excel's STDEV.P to calculate the population standard
deviation, which divides by the population size rather than the sample size minus
one. Calculate this standard deviation.
(Excel or some other online tool wouldn’t be a bad idea here either, but show
your work!)
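As a cross-check for (k), the sketch below uses the standard finite-population result for shuffling *N* values into groups of sizes n1 and n2: the SD of the difference in group means is σ · sqrt(N/(N − 1) · (1/n1 + 1/n2)), where σ is the population SD (what STDEV.P computes). This may be written differently from the version used in class, so treat it as an assumption. The function name `shuffle_sd_of_difference` and the tiny data set are hypothetical; the brute-force part lists every possible assignment to confirm the formula on that tiny example.

```python
import itertools
import math
import statistics

def shuffle_sd_of_difference(all_values, n_a):
    """SD of (mean A - mean B) over all ways to split all_values into two groups."""
    N = len(all_values)
    n_b = N - n_a
    sigma = statistics.pstdev(all_values)  # population SD (divides by N)
    return sigma * math.sqrt(N / (N - 1) * (1 / n_a + 1 / n_b))

# Tiny hypothetical "population" so every possible assignment can be listed.
values = [1, 2, 3, 4, 5]
n_a = 2
formula_sd = shuffle_sd_of_difference(values, n_a)

diffs = []
for idx in itertools.combinations(range(len(values)), n_a):
    group_a = [values[i] for i in idx]
    group_b = [values[i] for i in range(len(values)) if i not in idx]
    diffs.append(statistics.mean(group_a) - statistics.mean(group_b))
brute_sd = statistics.pstdev(diffs)        # SD over all equally likely assignments

print(formula_sd, brute_sd)
```

The two printed values agree for this example because the low scores must go to whichever group the high scores do not, which is the dependence between the groups described in (k).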

(l) Which of the 3 standard deviations
best matches your simulation results in (d)? Which SD is largest?

Note: Instead of using this very complicated standard deviation formula, we
will trust the *t*-distribution to make the right adjustments (it uses a bigger
denominator because there are more of the bigger differences than we might predict)!

Does 10 appear to be a plausible value
for the increase in average aggression with more violent games? In the applet, change
the statistic back to the difference in group means and specify the
hypothesized difference in “population means” as **10** (or -10 depending on
direction of subtraction). Specify **1** as the number of shuffles and select
the **Plot** radio button.

(m) Press **Shuffle Responses** and watch the animation. Explain in your own
words what this animation is doing and why.
Explain how this matches our new null hypothesis.

Set the number of Shuffles to 1000 and
regenerate the randomization distribution of the *difference in sample means.*
Include a screen capture.

(n) How do the values of the mean and
standard deviation compare to those in (d)? Which change(s), and why or why not?

(o) Generate a two-sided p-value
(include a screen capture). What conclusion do you draw in context?

(p) Is your conclusion in (o) consistent with your confidence interval in
(g)? Explain.

**Demo** Using the fancy new formula with our
sleep deprivation/reaction time data where we did see a bit more of a
difference between the initial formula and the simulation results.

Population standard deviation for all 21
subjects: 15.06

Does it better match the simulation results
than the unpooled standard error formula?