Stat 301 – Final Exam Review Questions


These review questions give you more practice with the more recent material and with deciding which method to use.  The Case Study might be a better example of the type of question I will ask on the final: you have to do a bit more figuring out of how to proceed, and I can ask questions ranging from data collection, exploratory data analysis, and inferential analysis to summarizing conclusions and suggesting next steps.


1) Recall from the Exam 2 review problems the weights of 30 (fun-size) Mounds candy bars and 20 (fun-size) PayDay candy bars, in grams. Suppose we consider these to be independent random samples from these two brands.

(d) State null and alternative hypotheses for comparing the mean weights of these two populations, both in symbols and in words.

(e) Do you think a theory-based analysis would be appropriate for these data?  Explain how you are deciding.


Below are the results of 1000 random shuffles of these 50 weights into two groups.

(f) Is this distribution approximately normal? Would you have expected this? Explain.

(g) Would you expect this distribution to follow a t distribution?  Explain. 

(h) Use the above output to roughly approximate the p-value.  Explain how.

(i) Explain a “difficulty” with using this simulation approach to analyze these data.

(j) Assuming it’s valid, how would you interpret this confidence interval?
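
For parts (f) through (i), it may help to see how a shuffle distribution like the one above could be produced. The following is a minimal sketch in Python (not one of our course tools), assuming the statistic is the difference in sample means. The weights are made-up placeholders, not the actual candy bar data, and the function names are hypothetical.

```python
import random
from statistics import mean

def shuffle_differences(group1, group2, reps=1000, seed=301):
    """Return `reps` differences in means (group 1 minus group 2) after random shuffles."""
    rng = random.Random(seed)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)  # re-deal all weights into two groups at random
        diffs.append(mean(pooled[:n1]) - mean(pooled[n1:]))
    return diffs

def two_sided_p_value(diffs, observed):
    """Proportion of shuffled differences at least as extreme as the observed one."""
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed))
    return extreme / len(diffs)

# Placeholder weights in grams (ASSUMED values, not the real samples).
placeholder_rng = random.Random(1)
mounds = [round(placeholder_rng.gauss(19.0, 0.8), 1) for _ in range(30)]
payday = [round(placeholder_rng.gauss(20.0, 0.8), 1) for _ in range(20)]

observed = mean(mounds) - mean(payday)
diffs = shuffle_differences(mounds, payday)
print("observed difference:", round(observed, 2))
print("approximate two-sided p-value:", two_sided_p_value(diffs, observed))
```

Counting the shuffled differences at least as far from zero as the observed difference is the same idea you would apply by eye to the dotplot or histogram in part (h).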



2) A study examined whether a nicotine lozenge can help a smoker to quit. The research article reports on many background variables, such as age, weight, gender, number of cigarettes smoked, and whether the person made a previous attempt to quit smoking (Shiffman et al., 2002). Suppose the researchers want to compare the distributions of the background variables between the two treatment groups (nicotine lozenge or placebo lozenge).

(a) For each of the five variables listed, indicate whether it calls for a comparison of means or a comparison of proportions.

(b) Would the researchers hope to reject the null hypotheses or fail to reject the null hypotheses in these tests? Explain.

(c) Of the 459 nicotine users, 46.0% successfully abstained (didn’t start smoking again) for 6 weeks, compared to 29.7% of the 458 control group (without nicotine). Calculate and interpret a 95% confidence interval.

(d) Are you willing to draw a cause-and-effect conclusion from this study? If not, suggest a possible confounding variable and explain how it is confounding in this study.

(e) Are you willing to generalize these results to all smokers interested in quitting? If not, suggest a possible source of sampling bias and the likely direction of the bias.
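
For part (c), one common theory-based choice is the two-sample z-interval for a difference in proportions: (difference in sample proportions) ± z* × (standard error). The following is a minimal sketch in Python, assuming that interval is the intended method and using only the percentages and sample sizes reported above. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(p1, n1, p2, n2, confidence=0.95):
    """Two-sample z-interval for p1 - p2 using the unpooled standard error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Nicotine lozenge group minus placebo group, from the reported (rounded) percentages.
low, high = two_proportion_interval(0.460, 459, 0.297, 458)
print(f"95% interval for the difference in proportions: {low:.3f} to {high:.3f}")
```

The computation is only the first half of the question; you still need to interpret the interval in context, including what it means that the interval does or does not contain zero.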


3) Researchers examined the long-term survival of doctors graduating from one medical school over one century (Redelmeier and Kwong, 2004), comparing those who were presidents of their class to those who appeared alphabetically before or alphabetically after the president in the graduating class photograph.  Statistics on long-term mortality were obtained from licensing authorities, medical obituaries, professional associations, alumni records, and national physician directories (follow-up 94%).  They reported on 507 presidents and 1014 classmates.

(a) Is it reasonable to treat the presidents and non-presidents as independent random samples?

Assuming the answer to (a) is yes:

(b) The researchers examined several baseline variables, including gender and whether or not the individual wore glasses.  They found 93% of the presidents were male, compared to 85% of their classmates.  They also found 9% of presidents wore glasses, compared to 12% of their classmates.  Is either of these differences statistically significant?

(c) The overall life expectancy for the presidents was 49.0 years, compared to 51.4 years for their classmates.  The two-sided p-value was reported to be .036.  Assuming the sample standard deviations were similar in the two samples, use trial and error (in JMP, R, or the Theory-Based Inference applet) or algebra to approximate the value of this standard deviation.  What conclusion would you draw from this p-value?
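
For parts (b) and (c), here is a minimal sketch in Python of the algebra route, under stated assumptions. Part (b) is treated as a pooled two-proportion z-test on the reported (rounded) percentages. Part (c) uses a normal approximation in place of the t distribution, which should matter very little with samples of 507 and 1014, and solves the test statistic for a common standard deviation s. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def two_proportion_z_test(p1, n1, p2, n2):
    """Pooled two-sample z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - STD_NORMAL.cdf(abs(z)))

def common_sd_from_p_value(mean1, n1, mean2, n2, p_two_sided):
    """Solve |mean1 - mean2| / (s * sqrt(1/n1 + 1/n2)) = z for s (normal approximation)."""
    z = STD_NORMAL.inv_cdf(1 - p_two_sided / 2)
    return abs(mean1 - mean2) / (z * sqrt(1 / n1 + 1 / n2))

# Part (b): presidents (n = 507) versus classmates (n = 1014).
z_male, p_male = two_proportion_z_test(0.93, 507, 0.85, 1014)
z_glasses, p_glasses = two_proportion_z_test(0.09, 507, 0.12, 1014)
print(f"male:    z = {z_male:.2f}, p-value = {p_male:.4f}")
print(f"glasses: z = {z_glasses:.2f}, p-value = {p_glasses:.4f}")

# Part (c): back out the common standard deviation from the reported p-value.
sd_estimate = common_sd_from_p_value(49.0, 507, 51.4, 1014, 0.036)
print(f"approximate common SD: {sd_estimate:.1f} years")
```

With trial and error you would instead plug candidate values of s into the two-sample t procedure until the two-sided p-value comes out near .036; both routes should land in about the same place.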

4) A poll conducted March 6–8, 2004, by The Wall Street Journal/NBC News asked 1,018 respondents their opinions about gay marriage. When asked to state whether they would favor or oppose “a constitutional amendment making it illegal for gay couples to marry,” 43% responded in favor and 52% opposed (5% were unsure).  When asked whether they would favor or oppose “a constitutional amendment that defined marriage as a union between a man and a woman and made same-sex marriages unconstitutional,” 54% favored the amendment, 42% opposed (1% said it depends, and 3% were not sure).  Would it be valid to do a two-sample z-test with these data?  Explain.

5) In a study reported in the July 6, 2007 issue of the journal Science, researchers studied 396 American college students and kept track of each student’s sex and also how many words they spoke in a day. They found that females spoke an average of 16,215 words per day and males an average of 15,669 words per day. 

Consider the following variables:

For each research question below, which theory-based method would you consider:

· One-proportion z-test or interval
· One-mean t-test or interval
· Two-proportion z-test or interval
· Two-mean t-test or interval

Briefly justify your answer.

(a) Do women tend to use more words than men?

(b) How often does the proportion of adjectives a person uses in a day exceed 0.25? In other words, estimate the probability that more than 25% of the words someone uses in a day are adjectives.

(c) Are women more likely than men to use more than 15,000 words per day?

(d) Do people tend to talk more (use more words) on the weekends or on the weekdays?


6) The Roller Coaster Database maintains a web site with data on roller coasters around the world.  Some of the data recorded include whether the coaster is made of wood or steel and the maximum speed achieved by the coaster, in miles per hour.  The boxplots display the distributions of speed by type of coaster for 145 coasters in the United States as of Nov. 2003.

(a) Do these boxplots allow you to determine whether there are more wooden or steel roller coasters?

(b) Do these boxplots allow you to say which type has a higher percentage of coasters that go faster than 60 mph?  Explain and, if so, answer the question.

(c) Do these boxplots allow you to say which type has a higher percentage of coasters that go faster than 50 mph?  Explain and, if so, answer the question.

(d) Do these boxplots allow you to say which type has a higher percentage of coasters that go faster than 48 mph?  Explain and, if so, answer the question.

(e) The steel coasters have a “high outlier.” Explain how I know this from the above display and interpret this outlier in context. What would be your next step in analyzing these data?

(f) Conjecture as to how the mean, median, interquartile range, and standard deviation will change (if at all) if that coaster identified in part (e) (Top Thrill Dragster in Cedar Point Amusement Park, Sandusky, Ohio) is removed from the data set.  Explain your reasoning.
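
After you have made your conjecture for part (f), you could check the general pattern with a small experiment. The following is a minimal sketch in Python using made-up placeholder speeds with one high value; these are not the Roller Coaster Database values, and the helper name is hypothetical.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return the mean, median, interquartile range, and standard deviation."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "median": median(values),
        "IQR": q3 - q1,
        "SD": stdev(values),
    }

# Placeholder speeds in mph (ASSUMED values); the last one plays the role of the outlier.
speeds = [35, 40, 45, 50, 50, 55, 60, 65, 70, 120]

with_outlier = summarize(speeds)
without_outlier = summarize(speeds[:-1])

for name in with_outlier:
    print(f"{name:>6}: {with_outlier[name]:6.2f} with, {without_outlier[name]:6.2f} without")
```

Compare which summaries move a lot and which barely move, then explain why in terms of how each one is calculated.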