**Stat 301 – Final Exam Review Questions**

*These review questions give you more practice with the more recent
material, as well as deciding which method to use. The Case Study might be a better example of
the type of question I will ask on the final (you have to do a bit more
figuring out of how to proceed, and I can ask questions ranging from data
collection, exploratory data analysis, inferential analysis, summarizing
conclusions, suggesting next steps).*

**1)** Recall from the Exam 2 review problems the
weights of 30 (fun-size) *Mounds* candy bars and 20
(fun-size) *PayDay* candy bars, in
grams. Suppose we consider these to be independent random samples from these
two brands.

(d) State null and alternative hypotheses for
comparing the mean weights of these two populations, both in symbols and in
words.

(e) Do you think a theory-based analysis would
be appropriate for these data? Explain how you are deciding.

Below are the results of 1000 random shuffles of
these 50 weights into two groups.

(f) Is this distribution approximately normal?
Would you have expected this? Explain.

(g) Would you expect this distribution to follow
a *t* distribution? Explain.

(h) Use the above output to roughly approximate the p-value. Explain how.
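If you want to see how a p-value comes out of random shuffles like the ones described above, here is a minimal sketch in Python. The two lists of weights are made-up placeholder values (the actual candy bar weights are not reproduced here), and the function name `shuffle_p_value` is my own. Treat it as one possible way to carry out the shuffling, not as the exact procedure behind the output shown.

```python
import random
from statistics import mean

def shuffle_p_value(group_a, group_b, num_shuffles=1000, seed=0):
    """Approximate a two-sided p-value by re-randomizing group labels."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # mix all weights together, then re-split
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            as_extreme += 1
    # p-value = proportion of shuffles at least as extreme as observed
    return observed, as_extreme / num_shuffles

# Placeholder weights in grams (hypothetical, NOT the real data).
brand_a = [18.2, 19.1, 18.7, 19.4, 18.9, 19.0, 18.5, 19.3]
brand_b = [19.8, 20.4, 19.9, 20.6, 20.1, 19.7]

observed_diff, p_value = shuffle_p_value(brand_a, brand_b)
print(f"observed difference in means: {observed_diff:.2f} g")
print(f"approximate two-sided p-value: {p_value:.3f}")
```

The p-value is simply the fraction of shuffled differences that land at least as far from zero as the observed difference, which is the same counting you would do by eye on a dotplot or histogram of the shuffled statistics.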

(i) Explain a “difficulty” with using this
simulation approach to analyze these data.

(j) Assuming it’s valid, how would you interpret
this confidence interval

**2)** A study examined whether a nicotine lozenge can
help a smoker to quit. The research article reports on many background
variables, such as age, weight, gender, number of cigarettes smoked, and
whether the person made a previous attempt to quit smoking (Shiffman et al.,
2002). Suppose the researchers want to compare the distributions of the
background variables between the two treatment groups (nicotine lozenge or
placebo lozenge).

(a) For each of the five variables listed,
indicate whether it calls for a comparison of means or a comparison of
proportions.

(b) Would the researchers hope to reject the
null hypotheses or fail to reject the null hypotheses in these tests? Explain.

(c) Of the 459 nicotine users, 46.0%
successfully abstained (didn’t start smoking again) for 6 weeks, compared to
29.7% of the 458 in the control group (without nicotine). Calculate and interpret a
95% confidence interval for the difference between the two groups.
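As a check on your hand calculation, here is a minimal Python sketch of a two-sample *z*-interval for a difference in proportions. It assumes the usual unpooled standard error and a critical value of 1.96; the function name `two_proportion_ci` is my own. It works directly from the reported percentages, so small rounding differences from a count-based calculation are possible.

```python
import math

def two_proportion_ci(p1, n1, p2, n2, z_star=1.96):
    """Interval for p1 - p2 using the unpooled standard error."""
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff - z_star * se, diff + z_star * se

# Values reported in the question: lozenge group, then placebo group.
lower, upper = two_proportion_ci(0.460, 459, 0.297, 458)
print(f"95% CI for the difference in proportions: ({lower:.3f}, {upper:.3f})")
```

The code only produces the endpoints. You still need to interpret the interval in context, including what it means that the interval does or does not contain zero.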

(d) Are you
willing to draw a cause-and-effect conclusion from this study? If not, suggest
a possible confounding variable and explain how it is confounding in this
study.

(e) Are you
willing to generalize these results to all smokers interested in quitting? If
not, suggest a possible source of sampling bias and the likely direction of the
bias.

**3)** Researchers examined the long-term survival of
doctors graduating from one medical school over one century (Redelmeier and Kwong, 2004), comparing those who were
presidents of their class to those who appeared alphabetically before or
alphabetically after the president in the graduating class photograph. Statistics on long-term mortality were
obtained from licensing authorities, medical obituaries, professional
associations, alumni records, and national physician directories (follow-up
94%). They reported on 507 presidents
and 1014 classmates.

(a) Is it reasonable to treat the presidents and
non-presidents as independent random samples?

Assuming the answer to
(a) is yes:

(b) The researchers
examined several baseline variables, including gender and whether or not the
individual wore glasses. They found 93%
of the presidents were male, compared to 85% of their classmates. They also found 9% of presidents wore
glasses, compared to 12% of their classmates.
Is either of these differences statistically significant?
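One way to check each comparison is a two-proportion *z*-test. The Python sketch below assumes a pooled standard error under the null hypothesis and a two-sided alternative, and it works from the rounded percentages because the exact counts are not given. The function name `two_proportion_z_test` is my own.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test of H0: the two population proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Presidents (n = 507) versus classmates (n = 1014).
z_male, p_male = two_proportion_z_test(0.93, 507, 0.85, 1014)
z_glasses, p_glasses = two_proportion_z_test(0.09, 507, 0.12, 1014)
print(f"male:    z = {z_male:.2f}, p-value = {p_male:.4f}")
print(f"glasses: z = {z_glasses:.2f}, p-value = {p_glasses:.4f}")
```

Because the percentages are rounded, treat the resulting p-values as approximate, especially for a result that falls close to your significance level.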

(c) The overall life expectancy for the presidents was 49.0
years compared to 51.4 years for their classmates. The two-sided p-value was reported to be
.036. Assuming the sample standard deviations
were similar in the two samples, use trial and error (in JMP, R, or the Theory-Based
Inference applet) or algebra to approximate the value of this standard
deviation. What conclusion would you
draw from this p-value?
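The algebra route can be sketched in Python as follows. This version assumes the sample sizes are large enough that the *t* distribution is well approximated by the standard normal, and that both groups share one common standard deviation; the function name `approx_common_sd` is my own. It works backward from the p-value to the test statistic, then to the standard error, then to the standard deviation.

```python
import math
from statistics import NormalDist

def approx_common_sd(mean_diff, n1, n2, two_sided_p):
    """Back-solve the common SD implied by a two-sided p-value."""
    # Standardized statistic whose two-sided tail area equals the p-value.
    z = NormalDist().inv_cdf(1 - two_sided_p / 2)
    # |mean_diff| / SE = z, so SE = |mean_diff| / z.
    se = abs(mean_diff) / z
    # SE = s * sqrt(1/n1 + 1/n2), so solve for s.
    return se / math.sqrt(1 / n1 + 1 / n2)

sd = approx_common_sd(49.0 - 51.4, 507, 1014, 0.036)
print(f"approximate common standard deviation: {sd:.1f} years")
```

You can confirm the result by plugging the value back into the Theory-Based Inference applet (or JMP or R) and checking that the two-sided p-value comes out near .036.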

**4)** A poll conducted March
6–8, 2004, by *The Wall Street Journal/NBC
News* asked 1,018 respondents their opinions about gay marriage. When asked
to state whether they would favor or oppose “a constitutional amendment making
it illegal for gay couples to marry,” 43% responded in favor and 52% opposed
(5% were unsure). When asked whether
they would favor or oppose “a constitutional amendment that defined marriage as
a union between a man and a woman and made same-sex marriages
unconstitutional,” 54% favored the amendment, 42% opposed (1% said it depends,
and 3% were not sure). Would it be valid
to do a two-sample *z*-test with these
data? Explain.

**5)** In a study reported in the July 6, 2007 issue
of the journal *Science*, researchers studied 396 American college
students and kept track of each student’s sex and also how many words they
spoke in a day. They found that females spoke an average of 16,215 words per
day and males an average of 15,669 words per day.

Consider
the following variables:

- Sex
- Average number of words spoken per day
- Number of adjectives used per day
- Proportion of words spoken in a day by each student that were
adjectives
- Whether more than 15,000 words were spoken

For each research question below, which theory-based method would
you consider:

- One-proportion *z*-test or interval
- One-mean *t*-test or interval
- Two-proportion *z*-test or interval
- Two-mean *t*-test or interval

Briefly
justify your answer.

(a) Do
women tend to use more words than men?

(b) How often does the proportion of adjectives a person uses in a
day exceed 0.25? In other words, estimate the probability that more than 25% of the
words someone uses in a day are adjectives.

(c) Are
women more likely than men to use more than 15,000 words per day?

(d) Do
people tend to talk more (use more words) on the weekends or on the weekdays?

**6)** The Roller Coaster Database
maintains a web site (www.rcdb.com) with
data on roller coasters around the world. Some of the data recorded
include whether the coaster is made of wood or steel and the maximum speed
achieved by the coaster, in miles per hour. The boxplots display the
distributions of speed by type of coaster for 145 coasters in the United
States as of Nov. 2003.

(a) Do
these boxplots allow you to determine whether there are more wooden or steel
roller coasters?

(b) Do
these boxplots allow you to say which type has a higher percentage of coasters
that go faster than 60 mph? Explain and, if so, answer the question.

(c) Do
these boxplots allow you to say which type has a higher percentage of coasters
that go faster than 50 mph? Explain and, if so, answer the question.

(d) Do
these boxplots allow you to say which type has a higher percentage of coasters
that go faster than 48 mph? Explain and, if so, answer the
question.

(e) The
steel coasters have a “high outlier.” Explain how I know this from the above
display and interpret this outlier in context. What would be your next step in
analyzing these data?

(f)
Conjecture as to how the mean, median, interquartile range, and standard
deviation will change (if at all) if the coaster identified in part (e) (Top
Thrill Dragster in Cedar Point Amusement Park, Sandusky, Ohio) is removed from
the data set. Explain your reasoning.