Stat 301 – Review 2 Problems

 

1) Weights of 30 (fun-size) Mounds candy bars and 20 (fun-size) PayDay candy bars, in grams, are shown in the dotplots below.

(a) Which distribution would you consider skewed to the right?

(b) Which distribution do you expect has a larger mean?

(c) Which distribution do you expect has a larger standard deviation?

(d) Which distribution would you suspect will have its mean larger than its median?

 

2) The highway miles per gallon rating of the 1999 Volkswagen Passat was 31 mpg (Consumer Reports, 1999). The fuel efficiency that a driver obtains on an individual tank of gasoline naturally varies from tankful to tankful. Suppose the mpg calculations per tank of gas have a mean of μ = 31 mpg and a standard deviation of σ = 3 mpg.

(a) Would it be surprising to obtain 30.4 mpg on one tank of gas? Explain.

(b) Would it be surprising for a sample of 30 tanks of gas to produce a sample mean of 30.4 mpg or less? Explain, referring to the CLT and to a sketch that you draw of the sampling distribution.

(c) Assess the validity of your calculations in (a) and (b).
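
As a rough check on the arithmetic in Problem 2, the standardized scores can be sketched in Python. This is only a sketch: the variable names are illustrative, and the probability in the last step relies on the normal approximation from the CLT, whose validity is what part (c) asks you to assess. No probability is computed for a single tank because that would require an extra assumption about the shape of the tank-to-tank distribution.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 31.0, 3.0   # population mean and SD given in the problem
x = 30.4                # observed value (one tank, or a sample mean)
n = 30                  # number of tanks in part (b)

# (a) One tank: standardize with the population SD.
z_single = (x - mu) / sigma

# (b) Sample mean of n tanks: the SD of the sample mean is sigma / sqrt(n).
sd_mean = sigma / sqrt(n)
z_mean = (x - mu) / sd_mean

# Normal approximation (CLT) to P(sample mean <= 30.4).
p_mean = NormalDist().cdf(z_mean)

print(f"z for one tank:        {z_single:.2f}")
print(f"SD of the sample mean: {sd_mean:.3f}")
print(f"z for the sample mean: {z_mean:.2f}")
print(f"P(sample mean <= {x}) is about {p_mean:.3f}")
```

Comparing the two z-scores shows how the same 0.6 mpg shortfall is judged against a much smaller standard deviation once 30 tanks are averaged.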

 

3) The file AgeGuesses.txt contains students’ guesses of my age on the first day of class a few years ago.

(a) Determine and interpret a 95% confidence interval for the population mean.

(b) Determine and interpret a 95% confidence interval for the next student’s guess of my age.

(c) Which interval do you feel is more meaningful in this context?

(d) What information would you need to know to decide whether students are “biased” in how they guess my age in this activity? If you did a test of significance, would this be a one-sided or a two-sided test?

(e) Evaluate the validity of your calculations in (a) and (b).

(f) Interpret the following JMP output.

What is being estimated? What do you think is meant by “actual confidence” and why is it important?

(g) Column 2 indicates whether the data were collected in Section 1 or Section 2. I changed something about my appearance between the two sections. Suppose I flipped a coin in advance to decide which appearance I would use in each section, and I then find a statistically significant difference in the average guess of my age between the two sections. Would you be willing to attribute the change in the guesses to the change I made in my appearance? Explain why or why not.

 

4) In a recent study (Klein, Thomas, and Sutter, 2007), researchers found that current smokers were more likely to have used candy cigarettes as children than current nonsmokers were.

(a) Identify and classify the explanatory and response variables.

(b) When first hearing of this study, someone responded by saying, “Isn’t the smoking status of the parents a confounding variable here?” Explain what “confounding variable” means in this context, and describe how parents’ smoking status could be confounding (i.e., describe what would need to be true).


5) Newspaper headlines proclaimed that chocolate lovers live longer, following the publication of a study titled “Life is Sweet: Candy Consumption and Longevity” in the British Medical Journal (Lee and Paffenbarger, 1998). In 1988, researchers sent a health questionnaire to men who entered Harvard University as undergraduates between 1916 and 1950. The study included 7841 men, free of cardiovascular disease and cancer. From the questionnaire they determined whether the respondents consumed candy “almost never” (3312 men) or “sometimes or often” (4529 men), and then they tracked the participants to determine whether or not they had died by 1993.

(a) Identify the observational units.

(b) Identify the response variable.

(c) Identify the explanatory variable.

(d) Was this an experiment or an observational study? If an experiment, was it a randomized, comparative experiment? If observational, was it a case-control study?

(e) Researchers found that of respondents who admitted to consuming candy regularly, 267 had died by the end of 1993, compared to 247 of the non-consumers of candy. Set up the calculation for Fisher’s Exact Test for deciding whether candy consumers are significantly less likely to have died than non-consumers by completing the following:

 

          p-value = P(X ________ ) where X follows a ________________ distribution with parameters

          N = ________     M = ________     n = ________

 

(f) The study reported: “Between 1988 and 1993, 514 men died: 7.5% of non-consumers, but only 5.9% of consumers (age adjusted relative risk 0.83; 95% confidence interval 0.70 to 0.98).” Interpret this statement as if to someone who has never taken a statistics class.  In particular, what do you think is meant by “age adjusted relative risk”?

(g) Based on this interval, I would consider the comparison statistically significant.  Why?

(h) This does not appear to be a large difference (7.5% vs. 5.9%). Are you surprised that this result is statistically significant? Explain.

(i) The study also reports: “We then examined different levels of candy intake. Compared with non-consumers, the relative risks of mortality among men who consumed candy 1-3 times a month (1704 men), 1-2 times a week (1589 men), and 3 or more times a week (1236 men) were 0.64 (0.48 to 0.86), 0.73 (0.55 to 0.96), and 0.84 (0.64 to 1.11), …”

Does this result provide evidence of a “dose-response”? Explain.

(j) And then: “Finally, using life table analysis truncated at age 95, we estimated that (after adjustment for age and cigarette smoking) candy consumers enjoyed, on average, 0.92 (0.04 to 1.80) added years of life, up to age 95, compared with non-consumers.” Based on these results, are you willing to conclude that eating candy leads to a longer life?

(k) What population are you willing to generalize these results to? Explain.
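
For checking work on parts (e) through (h), the counts in Problem 5 can be run through a short Python sketch. It assumes one common labeling of the hypergeometric parameters (N = total men, M = size of the candy-consumer group, n = total deaths); your course notes may assign the letters differently, so treat the names as illustrative. The relative risk computed here is the unadjusted ratio of the two conditional proportions, not the age-adjusted value reported in the paper.

```python
from math import comb

# Counts given in the problem statement.
consumers, non_consumers = 4529, 3312
deaths_consumers, deaths_non = 267, 247

N = consumers + non_consumers        # total men in the study
n = deaths_consumers + deaths_non    # total deaths by 1993
M = consumers                        # assumed labeling: group of interest

# Fisher's Exact Test, one-sided: if the n deaths were spread at random over
# all N men, how often would deaths_consumers or fewer land among consumers?
numerator = sum(
    comb(M, k) * comb(N - M, n - k) for k in range(deaths_consumers + 1)
)
p_value = numerator / comb(N, n)

# Conditional proportions and the unadjusted relative risk.
risk_consumers = deaths_consumers / consumers
risk_non = deaths_non / non_consumers
rr_unadjusted = risk_consumers / risk_non

print(f"N = {N}, M = {M}, n = {n}")
print(f"one-sided p-value: {p_value:.4f}")
print(f"proportion died, consumers:     {risk_consumers:.3f}")
print(f"proportion died, non-consumers: {risk_non:.3f}")
print(f"unadjusted relative risk:       {rr_unadjusted:.2f}")
```

The unadjusted relative risk comes out near 0.79, somewhat below the age-adjusted 0.83 quoted in part (f), which is a useful starting point for thinking about what the adjustment is doing.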

 

6) A study investigated whether AZT helps to reduce transmission of HIV from mother to baby (Connor et al., 1994). Of the 180 babies whose mothers had been randomly assigned to receive AZT, 13 were HIV-infected, compared to 40 of the 183 babies in the placebo group.

(a) Create a segmented bar graph to display these results. Comment on what the graph reveals.

(b) Check the validity conditions for whether a two-sample z-test can be applied to these data.

(c) If you were to carry out a simulation to obtain a p-value, would you simulate random sampling or random assignment?  Explain.

(d) Conduct an appropriate test of significance to determine whether the data provide convincing evidence that AZT is more effective than a placebo for reducing mother-to-infant transmission of HIV. Report the hypotheses, test statistic, and p-value. Also indicate the test decision using 0.01 as the level of significance.

(e) Estimate the difference in the risk of transmission with the placebo compared to AZT with a 99% confidence interval. Also be sure to interpret this interval in context.

(f) Summarize the conclusion that you could draw from this study (significance, estimation, causation, and generalizability). Also explain the reasoning behind each component.
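
To check calculations in parts (d) and (e) of Problem 6, here is a minimal Python sketch of a two-sample z-test and a 99% confidence interval for the difference in proportions. It assumes the usual textbook formulas (pooled standard error for the test, unpooled standard error for the interval) and that the validity conditions from part (b) are met; the variable names are illustrative, and another procedure (such as Fisher's Exact Test) may be equally appropriate.

```python
from math import sqrt
from statistics import NormalDist

# Counts given in the problem statement.
n_azt, infected_azt = 180, 13
n_placebo, infected_placebo = 183, 40

p_azt = infected_azt / n_azt
p_placebo = infected_placebo / n_placebo
diff = p_placebo - p_azt             # placebo risk minus AZT risk

# Test of H0: no difference vs. Ha: placebo risk is higher (one-sided).
p_pooled = (infected_azt + infected_placebo) / (n_azt + n_placebo)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n_azt + 1 / n_placebo))
z = diff / se_pooled
p_value = 1 - NormalDist().cdf(z)

# 99% confidence interval for the difference, using the unpooled SE.
se_unpooled = sqrt(
    p_azt * (1 - p_azt) / n_azt + p_placebo * (1 - p_placebo) / n_placebo
)
z_star = NormalDist().inv_cdf(0.995)  # critical value for 99% confidence
ci_low = diff - z_star * se_unpooled
ci_high = diff + z_star * se_unpooled

print(f"sample proportions: AZT {p_azt:.3f}, placebo {p_placebo:.3f}")
print(f"z = {z:.2f}, one-sided p-value = {p_value:.6f}")
print(f"99% CI for (placebo - AZT): ({ci_low:.3f}, {ci_high:.3f})")
```

The code only produces the numbers; the interpretation in context, and the reasoning about causation and generalizability in part (f), still has to come from the study design.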