**Stat 301 – Review 2 Problems**

**1) (~5 pts)** Weights of 30 (fun-size) *Mounds* candy bars and 20 (fun-size) *PayDay* candy bars, in grams, are shown
in the dotplots below.

Explain your answers to the following:

(a) Which distribution would you consider skewed to the right?

(b) Which distribution do you expect has a larger mean?

(c) Which distribution do you expect has a larger standard deviation?

(d) Which distribution would you suspect will have its mean larger than its median?

**2) (~8 pts)** The highway miles per
gallon rating of the 1999 Volkswagen Passat was 31 mpg (*Consumer Reports*,
1999). The fuel efficiency that a driver obtains on an individual tank of
gasoline naturally varies from tankful to tankful. Suppose the mpg calculations
per tank of gas have a (long-run) mean of μ = 31 mpg and a standard deviation of σ = 3 mpg.

(a) Would it be
surprising to obtain 30.4 mpg on one tank of gas? Explain.

(b) Would it be surprising for a sample of 30
tanks of gas to produce a sample mean of 30.4 mpg or less? Explain, referring
to the CLT and to a sketch that you draw of the sampling distribution.

(c) Assess the
validity of your calculations in (a) and (b).
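The arithmetic behind (a) and (b) can be sketched in Python as follows. This is only one possible setup, not a model answer: it assumes the CLT normal approximation is adequate for n = 30, and the helper `normal_cdf` is a name introduced here for illustration. Whether 0.2 standard deviations counts as "surprising" for a single tank, and whether any normal model applies to individual tanks, is still for you to argue in (c).

```python
import math

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 31.0, 3.0   # long-run mean and SD given in the problem
n = 30                  # number of tanks in part (b)
x = 30.4                # value of interest in both parts

# (a) How many SDs is one tank of 30.4 mpg from the mean?
z_single = (x - mu) / sigma

# (b) Standard deviation of the sample mean, then its z-score.
se_mean = sigma / math.sqrt(n)
z_mean = (x - mu) / se_mean

# Assumption: the sampling distribution of the mean is approximately normal.
p_mean_or_less = normal_cdf(z_mean)

print(round(z_single, 2), round(se_mean, 3), round(z_mean, 2), round(p_mean_or_less, 3))
```

Comparing `z_single` with `z_mean` shows how the same 0.6 mpg gap looks much larger relative to the variability of a sample mean than relative to the variability of a single tank.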

**3) (~15 pts)** The file AgeGuesses.txt
contains students’ guesses of my age on the first day of class a few years ago.

(a) Estimate and interpret a 95%
confidence interval for the population mean.

(b) Estimate and interpret a 95%
confidence interval for the next student’s guess of my age.

(c) Which interval do you feel is more
meaningful in this context? Explain your reasoning.

(d) What information would you need to
know to decide whether, in the long run, students are “biased” in how they
guess my age? If you did a test of
significance, would this be a one-sided or a two-sided test?

(e) Evaluate the validity of your
calculations in (a) and (b).

(g) Column 2 of the AgeGuesses data file
indicates whether the data were collected in Section 1 or Section 2. I changed something about my appearance
between the two sections, flipping a coin in advance to decide which appearance
I would use in each section. Suppose I find a statistically significant
difference in the average guess of my age between the two classes. Would you be
willing to attribute the change in the ages to the change I made in my
appearance? Explain why or why not.

**4) (~7 pts)** In a recent study (Klein, Thomas, and Sutter, 2007),
researchers found that current smokers were more likely to have used candy
cigarettes as children than current nonsmokers were.

(a) Identify and classify the
explanatory and response variables.

(b) When first hearing of this study,
someone responded by saying, “Isn’t the smoking status of the parents a
confounding variable here?” Explain what “confounding variable” means in this
context, and describe how parents’ smoking status could be confounding (i.e.,
describe what would need to be true).

**5) (~ 20 pts)** Newspaper headlines proclaimed
that chocolate lovers live longer, following the publication of a study titled
“Life is Sweet: Candy Consumption and Longevity” in the *British Medical
Journal* (Lee and Paffenbarger, 1998). In 1988, researchers sent a health
questionnaire to men who entered Harvard University as undergraduates between
1916 and 1950. The study included 7841 men, free of cardiovascular disease and
cancer. From the questionnaire they determined whether the respondents consumed
candy “almost never” (3312 men) or “sometimes or often” (4529 men), and then
they tracked the participants to determine whether or not they had died by
1993.

(a) Identify
the observational units.

(b) Identify
the response variable.

(c) Identify
the explanatory variable.

(d) Was this an
experiment or an observational study? If an experiment, was it a randomized,
comparative experiment? If observational, was it a cohort, cross-classified, or
case-control study?

(e) Researchers
found that of respondents who admitted to consuming candy regularly, 267 had
died by the end of 1993, compared to 247 of the non-consumers of candy. Set up
the calculation for Fisher’s Exact Test for deciding whether candy consumers
are significantly less likely to have died than non-consumers by completing the
following:

p-value = P(*X* ____ ) where *X* follows a ________ distribution with parameters

*N* = ____   *M* = ____   *n* = ____

(f) Suppose you
wanted to carry out a simulation to determine how surprising it is for two
random samples from the same population to give a difference in sample proportions
at least this large. Describe the
simulation process (if describing an applet, name the applet and the input
information you would use).

(g) The study
reported: “Between 1988 and 1993, 514 men died: 7.5%
of non-consumers, but only 5.9% of consumers (age adjusted relative risk 0.83;
95% confidence interval 0.70 to 0.98).” Interpret
this statement as if to someone who has never taken a statistics class.

Extra credit:
What do you think is meant by “age adjusted relative risk”?

(h) Based on
this interval, I would consider the comparison statistically significant. Why?

(i) This does
not appear to be a large difference (7.5% vs. 5.9%). Are you surprised that
this result is statistically significant? Explain.

~~(j) The study also reports: We then examined different levels of candy
intake. Compared with non-consumers, the relative risks of mortality among men
who consumed candy 1-3 times a month (1704 men), 1-2 times a week (1589 men),
and 3 or more times a week (1236 men) were 0.64 (0.48 to 0.86), 0.73 (0.55 to
0.96), and 0.84 (0.64 to 1.11),~~

~~Does this result provide evidence of a “dose-response”? Explain.~~

(k) And then: “Finally, using life table analysis
truncated at age 95, we estimated that (after adjustment for age and cigarette
smoking) candy consumers enjoyed, on average, 0.92 (0.04 to 1.80) added years
of life, up to age 95, compared with non-consumers.” Based on these results, are you willing
to conclude that eating candy leads to a longer life?

(l) To what
population are you willing to generalize these results? Explain.
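For part (f), a simulation along the following lines is one option if you prefer code to an applet. It is a rough sketch under stated assumptions, not the required method: it treats both groups as independent random samples from one population whose death probability is the pooled proportion 514/7841, it counts only differences in the observed direction (non-consumers minus consumers), and the seed, the number of repetitions, and the function name `simulated_diff` are arbitrary choices made here.

```python
import random

random.seed(301)  # arbitrary seed, only so the sketch is repeatable

# Counts reported in the problem.
n_consumers, n_non = 4529, 3312
deaths_consumers, deaths_non = 267, 247

# Observed difference in proportions who died (non-consumers minus consumers).
observed_diff = deaths_non / n_non - deaths_consumers / n_consumers

# Assumption: under "same population", both groups share the pooled proportion.
p_common = (deaths_consumers + deaths_non) / (n_consumers + n_non)

def simulated_diff(p, n1, n2):
    """Draw two independent samples with death probability p; return p1_hat - p2_hat."""
    d1 = sum(random.random() < p for _ in range(n1))
    d2 = sum(random.random() < p for _ in range(n2))
    return d1 / n1 - d2 / n2

reps = 500  # kept small so the sketch runs quickly; more repetitions give a steadier estimate
diffs = [simulated_diff(p_common, n_non, n_consumers) for _ in range(reps)]

# Proportion of simulated differences at least as large as the one observed.
p_value_estimate = sum(d >= observed_diff for d in diffs) / reps

print(round(observed_diff, 4), p_value_estimate)
```

With only 500 repetitions the estimated p-value is coarse, so treat it as a rough indication of how unusual the observed difference is rather than a precise figure.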

**6) (~ 20 pts)** A
study examined whether AZT helps to reduce transmission of AIDS from mother to baby
(Connor et al., 1994). Of the 180 babies whose mothers had been randomly
assigned to receive AZT, 13 babies were HIV-infected, compared to 40 of the 183
babies in the placebo group.

(a) Create a segmented bar graph to display
these results. Comment on what the graph reveals.

(b) Check the validity conditions for whether a
two-sample *z*-test can be applied to these data.

(c) If you were to carry out a simulation to
obtain a p-value, would you simulate random sampling or random assignment? Explain.

(d) Conduct an appropriate test of significance
to determine whether the data provide convincing evidence that AZT is more
effective than a placebo for reducing mother-to-infant transmission of AIDS.
Report the hypotheses, test statistic, and p-value. Also indicate the test
decision using .01 as the level of significance.

(e) Estimate the relative risk of transmission
with the placebo compared to AZT with a 95% confidence interval. Also be sure
to interpret this interval in context.

(f) Summarize the conclusion that you could draw
from this study (significance, estimation, causation, and generalizability).
Also explain the reasoning behind each component.
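The calculations in (d) and (e) can be checked with a short Python sketch like the one below. It assumes the validity conditions from (b) hold, uses a pooled two-sample *z*-test, and uses the usual large-sample interval for the log of the relative risk with 1.96 as the 95% multiplier. These are common textbook choices rather than the only acceptable ones, and the variable names are introduced here for illustration.

```python
import math

# Counts reported in the problem.
azt_infected, azt_n = 13, 180
placebo_infected, placebo_n = 40, 183

p_azt = azt_infected / azt_n
p_placebo = placebo_infected / placebo_n

# Pooled two-sample z-test, H0: no difference vs. Ha: placebo rate is higher.
p_pooled = (azt_infected + placebo_infected) / (azt_n + placebo_n)
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / azt_n + 1 / placebo_n))
z = (p_placebo - p_azt) / se_pooled

# One-sided p-value from the standard normal upper tail.
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))

# Relative risk (placebo compared to AZT) with a 95% interval on the log scale.
rel_risk = p_placebo / p_azt
se_log_rr = math.sqrt(1 / placebo_infected - 1 / placebo_n
                      + 1 / azt_infected - 1 / azt_n)
ci_low = math.exp(math.log(rel_risk) - 1.96 * se_log_rr)
ci_high = math.exp(math.log(rel_risk) + 1.96 * se_log_rr)

print(round(z, 2), p_value, round(rel_risk, 2), round(ci_low, 2), round(ci_high, 2))
```

The code only produces the numbers. The interpretation in context, and the reasoning about causation and generalizability in (f), still has to come from the study design.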

**7) (~10 pts)** Consider the question of whether exposure to
second-hand smoke is harmful to the health of children.

(a) Describe a prospective
cohort observational study that could address this question.

(b) Describe a retrospective
case-control observational study that could address this question.

(c) Describe a cross-classified
observational study that could address this question.

(d) Describe how you could (in principle)
design an experiment to address this question.

(e) Would it be ethical to
conduct an experiment to address this question? Explain.

**8) Investigation 3.10**

**See also Examples 2.1, 3.1, 3.2**