Sample Final Exam

1) The following data are the point totals for the UOP Men's Basketball team in their first 8 victories this season:

80  72  68  55  80  78  90  85

(a) (5 pts) Make a stemplot of these winning point totals and describe the shape of the distribution.
(b) (3 pts) Would the Five Number Summary or the mean and standard deviation be a better summary for this distribution? Explain your choice.

2) Two investigators wanted to study the heights of 18- to 24-year-old men in Stockton. One investigator, Happy Harry, took a random sample of 100 men. The other investigator, Tired Tony, took a random sample of 1000 men.
(a) (2 pts) If each investigator finds the average height of the men in his sample, which investigator, Harry or Tony, should find a larger average, or will they be about the same? Explain.
(b) (3 pts) Which sample, Harry's or Tony's, should have less bias, or will they be about the same? Explain.
(c) (3 pts) Which estimate of the population mean, Harry's or Tony's, should have higher precision, or will they be about the same? Explain.

3) In 1988, men averaged about 500 on the math SAT, the standard deviation was about 100, and their scores followed a Normal distribution. One of the men who took the math SAT in 1988 will be picked at random, and you have to guess his test score. You will be given 50 dollars if you guess it right to within 50 points.
(a) (2 pts) What one number should you guess?
(b) (5 pts) With this guess, what is your probability of winning the 50 dollars?
Extra Credit: What are your expected winnings?

4) The distribution for a population of test scores is displayed below on the left. Each of the other five graphs, labeled A to E, represents a possible sampling distribution of sample means for 500 random samples drawn from the population. (Justify your choices.)
(a) (2 pts) Which graph represents a sampling distribution of sample means for samples of size 1? A B C D E
(b) (2 pts) Which graph represents a sampling distribution of sample means for samples of size 9? A B C D E

[Figure: Population Distribution (left) and graphs A to E]

5) A social research scientist wants to test whether the percentage of Republicans who favor the death penalty is greater than the percentage of Democrats who are in favor of the death penalty.  Suppose the sample data showed that the percentage of Republicans who are in favor of the death penalty is 42% and the percentage of Democrats who are in favor of the death penalty is 40%.
(a) (2 pts) Write down the null and alternative hypotheses for this test.
(b) (3 pts) The p-value for this test is .0021. The 95% confidence interval for p1-p2 is (.00637,.03363). Which of the following conclusions do you think is more appropriate to draw?

  1. There is evidence of a large difference in the two proportions.
  2. There is strong evidence of a difference in the two proportions.
Explain.

(c) (2 pts) Which conclusion does a p-value better support? Explain.
(d) (2 pts) Which conclusion does a confidence interval better support? Explain.

6) In a clinical trial, data collection usually starts at "baseline", when the subjects are recruited into the trial but before they are randomized to treatment and control groups. Data collection continues until the end of follow-up. Two clinical trials on prevention of heart attacks report baseline data on weight, shown below.
 
                     Number of persons   Average weight   Standard deviation
Trial 1  Treatment   1,012               185 lb           25 lb
         Control     997                 143 lb           26 lb
Trial 2  Treatment   995                 166 lb           27 lb
         Control     1,017               163 lb           25 lb

(a) (4 pts) In one of these trials, the randomization did not achieve the desired result. Which trial and why do you say so? How will this affect our results and conclusions for this study? (Hint: make sure you focus on the most serious difficulty)
(b) (4 pts) Below are ten people and their weights. Randomly assign them to a treatment group and a control group (start with line 139 of Table B). Clearly show your work.
 
Bob 148 Tom 174 Joe 148 Fred 133 Sam 157
Curt 177 Al 162 Harry 188 Gami 160 Dan 188

7) Can pleasant aromas help a student learn better? Two researchers believed that the presence of a floral scent could improve a person's learning ability in certain situations. They had ten people work through a pencil and paper maze 2 times, first wearing an unscented mask and then wearing a scented mask. Tests measured the length of time it took subjects to complete each of the two trials. They reported that, on average, subjects wearing the floral-scented mask completed the maze more quickly than those wearing the unscented mask.
(a) (3 pts) Is this an observational study, survey, or experiment? Explain.
(b) (2 pts) Identify the response and explanatory variables.
(c) (4 pts) Explain how confounding makes the results of this study worthless.
(d) (4 pts) Sketch an outline of a more appropriate design for the study.

8) The NCAA collected data on graduation rates of athletes in Division I in the mid-1980s. Among 2,332 men, 1,343 had not graduated from college, and among 959 women, 441 had not graduated.
(a) (3 pts) Set up a two-way table to examine the relationship between gender and graduation.
(b) (3 pts) Calculate a couple of conditional percentages to describe the relationship between gender and graduation.
(c) (3 pts) Identify a test procedure that would be appropriate for analyzing this relationship. State the null and alternative hypotheses.
(d) (3 pts) What type of distribution does the test statistic you describe in (c) follow? For what values of this test statistic will you reject the null hypothesis at the 5% level?
(e) (2 pts) If the above result is significant, would this mean that if some people have a sex change they will increase their chance of graduating? Explain briefly.

9) A panel of trained testers judged the flavor quality of different vanilla frozen desserts (frozen yogurts, ice milks, other frozen desserts) measured on a scale from 0 to 100. The data are from a Consumer Reports article "Low-fat frozen desserts: Better for you than ice cream?" (August, 1992). Below is a graphical summary of the data.

Here is most of the ANOVA output from the computer:

ANALYSIS OF VARIANCE ON rating
 
SOURCE    DF      SS      MS      F      p
TYPE            6364    3182
ERROR     24    3031     126
TOTAL           9395

(a) (2 pts) Explain briefly why ANOVA was the appropriate analysis for these data.
(b) (2 pts) State the null and alternative hypotheses.
(c) (4 pts) Finish the ANOVA table giving the F-statistic, degrees of freedom, and approximating the p-value. Show your work. What is your conclusion about the flavor quality of the different desserts?
(d) (2 pts) Based on the graph, do you feel the technical assumptions needed for the validity of this test procedure are valid?

10) A random sample of 7 households was obtained, and information on their income and food expenditures for the past month was collected. The data (in hundreds of dollars) are given below.
 
Income ($100's)        22  32  16  37  12  27  17
Food Expend ($100's)    7   8   5  10   4   6   6

Here's the Minitab output:

The regression equation is

expend = 1.87 + 0.202 income
 
Predictor       Coef     Stdev   t-ratio       p
Constant      1.8690    0.9068      2.06   0.094
income       0.20195   0.03661      5.52   0.003

s = 0.8181

R-sq = 85.9%

R-sq(adj) = 83.1%

Here's a scatterplot of these data with the regression line superimposed.

(a) (2 pts) Describe the direction and strength of the association.
(b) (2 pts) On the graph, identify the point which you think has the largest residual. Explain.
(c) (2 pts) On the graph, identify the point which you think has the most influence on the position of the regression line, and how the line would change if it was removed. Explain.
(d) (3 pts) Provide an interpretation of the number .202 in the regression equation in the context of these data. Exactly what does this value tell us?
(e) (4 pts) Is there evidence of a statistically significant relationship between income and food expenditure? Make sure you clearly explain the basis for your answer.
(f) (2 pts) Explain why you would not recommend using this relationship to predict the food expenditure for a household with an income of $5,200.

11) National data show that, on average, college freshmen spend 7.5 hours a week going to parties. President DeRosa doesn't believe that these figures apply at UOP. He takes a simple random sample of 50 freshmen and interviews them. He finds that the 95% confidence interval for the mean number of hours per week spent going to parties is (5.72, 7.42).
(a) (4 pts) Explain to the President what is meant by the phrase "95% confidence".

Now he wants to test the hypothesis that the mean for UOP is different from the national mean at a 5% significance level.
(b) (2 pts) Specify the null and alternative hypotheses for this test.
(c) (2 pts) Indicate a test procedure he could use to conduct this test.
(d) (3 pts) Eager to gain favor with the president, you tell him that you can save him lots of time because, based on the data already presented, you know what he will conclude and he doesn't have to perform any additional calculations. Does he reject or fail to reject the null hypothesis at the 5% level? Explain.

Extra Credit

Suppose you take 50 measurements on the speed of cars on Interstate 5, and that these measurements follow roughly a Normal distribution. Do you expect the standard deviation of these 50 measurements to be about 1 mph, 5 mph, 10 mph, or 20 mph? Explain.