April, 2015



This chapter again extends the lessons students have learned in earlier chapters to additional methods involving more than one variable. The material focuses on the new methods (e.g., chi-square tests, ANOVA, and regression) and follows the same progression as earlier topics: starting with exploring appropriate numerical and graphical summaries, proceeding to use simulations to construct empirical sampling/randomization (reference) distributions, and then considering probability models for drawing inferences. The material is fairly flexible if you want to pick and choose topics. Fuller treatments of these methods would require a second course.

Section 1: Two Categorical Variables

In this section we start with a k x 2 case and then expand to larger tables. We again begin with a simulation and then turn to the chi-square distribution and more traditional output.

Investigation 4.1: Dr. Spock's Trial

Timing/materials: Technology is used to create empirical sampling distributions (based on the binomial sampling model) in Investigation 4.1. There is a Minitab macro (SpockSim.mac) and an R script (Spock.R) that can be used. Or you can use the Analyzing Two-way Tables (javascript) applet. This investigation will take approximately 50-60 minutes.

In Investigation 4.1 you may want to spend a bit of time on the background of the jury selection process before students analyze the data. We also encourage you again to stop and discuss the numerical and graphical summaries that apply to these sample data (question (a)) before proceeding to inferential statements, as well as reconsidering the constant theme – could differences this large have plausibly occurred by chance alone? Once you get to inference, the key change is the need for a more complicated test statistic (part (h)), and you may want students to spend more time on that, even developing their own statistics (e.g., the sum of all pairwise differences). After question (h), you will also want to explain the logic behind the chi-square statistic – every cell contributes a nonnegative amount to the sum, and each contribution is scaled by the size of the cell's expected count. If students understand the formula, question (j) should be straightforward for them. After thinking about the behavior of the statistic based on its formula, we then use simulation as a way of judging what values of the statistic should be considered large enough to be unusual, and also to see what probability distribution might approximate the sampling distribution of the test statistic. Because we are simulating the drawing of independent random samples from several populations, we use the binomial distribution, as opposed to treating both margins as fixed as we did in Chapter 2. This is what the R and Minitab macros do. Alternatively, the two-way table applet shuffles the response variable values randomly among the explanatory variable values. (Note: to enter the data into the applet, simply enter the two-way table of counts.) You can also use the applet to explore other possible statistics such as the MAD and max – min.

[Both the R and Minitab simulations randomly generate male/female counts for each judge's panel by sampling from a binomial distribution with ni equal to judge i's sample size and p = .261 as the probability of a juror being female for each judge. In Minitab, the simulated totals for each gender are stored in C2-C8. Then the expected counts are computed based on the simulated total numbers of males and females, and the "observed" counts are compared to the expected counts. Column C10 contains the 14 terms of the chi-square sum, and the resulting chi-square statistic for each sample is stored in C11. Make sure students using Minitab remember to save both the .mac file and an empty worksheet file in the same folder, and that students using R define the variables they want to store results in (e.g., mychisq=0).]
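For instructors who want a language-agnostic view of what the macros are doing, here is a minimal Python sketch of the same simulation. The panel sizes below are hypothetical placeholders (the actual Spock sample sizes are in the data file), but the logic – independent binomial sampling with p = .261 for each judge, then the 14-term chi-square sum based on the pooled proportion – mirrors the macros.

```python
import random

# Hypothetical panel sizes for the seven judges; the actual data differ.
panel_sizes = [300, 150, 200, 250, 100, 350, 180]
p_female = 0.261

def chisq_from_counts(females, sizes):
    """Chi-square statistic for a k x 2 (female, male) table,
    with expected counts based on the pooled proportion of females."""
    males = [n - f for f, n in zip(females, sizes)]
    total = sum(sizes)
    pooled = sum(females) / total
    stat = 0.0
    for f, m, n in zip(females, males, sizes):
        exp_f = n * pooled
        exp_m = n * (1 - pooled)
        stat += (f - exp_f) ** 2 / exp_f + (m - exp_m) ** 2 / exp_m
    return stat

def simulate(reps=1000, seed=1):
    random.seed(seed)
    stats = []
    for _ in range(reps):
        # draw each judge's female count from Binomial(n_i, p_female)
        females = [sum(random.random() < p_female for _ in range(n))
                   for n in panel_sizes]
        stats.append(chisq_from_counts(females, panel_sizes))
    return stats

stats = simulate()
# Under the null model the statistic should roughly follow
# a chi-square distribution with k - 1 = 6 degrees of freedom,
# so the simulated mean should land near 6.
print(sum(stats) / len(stats))
```

Plotting these simulated statistics gives the empirical reference distribution against which the observed statistic is judged.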

[Note: Output windows can be enlarged to improve the layout. When you check the Show X2 output box, the theoretical curve is also overlaid on the empirical randomization distribution. Using R or Minitab allows you to change the df to try different theoretical models. In the applet, you can also toggle which column is the explanatory variable and which is the response, as well as which outcome is defined as success. When you enter larger tables, the MAD statistic is calculated using only one row as success, though you can select different rows.]

The simulation results are also used to help students see that the normal distribution does not provide a reasonable model of the empirical sampling distribution of this test statistic. We do not derive the chi-square distribution, but we do use probability plots to show that the Gamma distribution, of which the chi-square distribution is a special case, is appropriate (questions (p) and (q)). Again, we want them to realize that the probability model applies no matter the true value of the common population proportion p. We also encourage them to follow up a significant chi-square statistic by seeing which cells of the table contribute the most to the chi-square sum, as a way of further pinpointing the source(s) of discrepancy (questions (s) and (t)). A practice problem on Type I errors is used to further motivate the use of this "multi-sided" procedure for checking the equality of all population proportions simultaneously.

Investigation 4.2: Night Lights and Near-Sightedness (cont.)

Timing/materials: You may want to help students realize that the randomization modeled here is different from that in Investigation 4.1, though we end up with the same theoretical model. You can also focus more on using Minitab, R, or the applet to generate the chi-square analysis. With this approach, Investigations 4.2 and 4.3 can be completed rather quickly, in 50-60 minutes.

Investigation 4.2 provides an application of the chi-square procedure, but in the case of a cross-classified study. You might want to start by asking students what the segmented bar graph would have looked like if there were no association between the two variables. The "no association" model can also be simulated through a randomization test, building on earlier simulations with quantitative response data. [See Technology Exploration: A macro is available for performing this simulation in Minitab (twoway.mac); just remember to save the macro file and the data file in the same folder. The Minitab macro randomizes the assignment of subjects to the lighting groups (assuming no relationship between eye condition and lighting). Then the macro creates the new two-way table (storing those results in columns C10-C12) and calculates the chi-square test statistic for each simulated table (storing the results in C20). You might want to ask students how they would modify this macro for a 4 × 4 table. A data file is available for importing the raw data into R. You can also use the Two-way Table applet.]
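The shuffling logic of the macro and applet can be sketched in a few lines of Python. The table below uses made-up counts rather than the night-light data, but the procedure is the same: rebuild the raw (group, response) pairs, shuffle the responses across groups, recompute the chi-square statistic, and see how often the reshuffled statistic meets or exceeds the observed one.

```python
import random

def chisq(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    total = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / total
            stat += (obs - exp) ** 2 / exp
    return stat

# Made-up counts: rows = lighting condition, columns = eye condition.
observed = [[40, 30, 10], [35, 40, 25], [20, 30, 30]]

# Rebuild the raw data as parallel lists of row and column labels.
rows = [i for i, r in enumerate(observed) for obs in r for _ in range(obs)]
cols = [j for r in observed for j, obs in enumerate(r) for _ in range(obs)]

random.seed(2)
obs_stat = chisq(observed)
count_ge = 0
reps = 1000
for _ in range(reps):
    random.shuffle(cols)  # break any association between row and column
    table = [[0] * len(observed[0]) for _ in observed]
    for i, j in zip(rows, cols):
        table[i][j] += 1
    if chisq(table) >= obs_stat:
        count_ge += 1
p_value = count_ge / reps
print(obs_stat, p_value)
```

Note that shuffling preserves both margins of the table, which is exactly the fixed-margins randomization described above, in contrast to the binomial sampling used for the Spock data.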

You may also wish to give students additional practice in applying the procedure, beyond the practice problems, especially in distinguishing the different situations (e.g., reminding them of the different data collection scenarios, the segmented bar graphs, and the forms of the hypotheses: comparing more than two population proportions, comparing population distributions on a categorical variable, or association between two categorical variables).

Investigation 4.3: Newspaper Credibility Decline

Investigation 4.3 focuses on the correspondence between the chi-square procedure for a 2 × 2 table and the two-sample z-test with a two-sided alternative.
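This correspondence can be verified numerically: for any 2 × 2 table, the chi-square statistic equals the square of the pooled two-sample z-statistic. A short sketch with made-up counts (not the newspaper credibility data):

```python
from math import sqrt

# Hypothetical successes and sample sizes for two groups.
x1, n1 = 60, 200
x2, n2 = 45, 200

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)

# Two-sample z-statistic using the pooled proportion.
z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

# Chi-square statistic on the corresponding 2 x 2 table.
table = [[x1, n1 - x1], [x2, n2 - x2]]
row_tot = [sum(r) for r in table]
col_tot = [sum(c) for c in zip(*table)]
total = sum(row_tot)
chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / total) ** 2
           / (row_tot[i] * col_tot[j] / total)
           for i in range(2) for j in range(2))

print(round(z ** 2, 6), round(chi2, 6))  # X^2 = z^2
```

The two-sided z p-value therefore matches the chi-square p-value exactly, which is the point of the investigation.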

Section 2: Comparing Several Population Means

Timing/materials: Minitab is used for descriptive statistics, and randomization distributions are created using R or Minitab (DisabilityEmployment.txt, with the Minitab macro RandomDisability.mac) or the new Comparing Groups (Quantitative) javascript applet. See the solutions for possible R code in Investigation 4.4. Technology is used again at the end of the investigation to carry out the ANOVA. Technology can be used briefly in Investigation 4.5 to calculate a p-value from the F distribution. The ANOVA Simulation applet is also used heavily. This section should take about 65 minutes.

The focus of this section is on comparing two or more population means (or treatment means). You may want to cast this as the association between one categorical and one quantitative variable to parallel the previous section (though some suggest applying this description only to cross-classified studies). Again, we do not spend a large amount of time developing the details, seeing these analyses as straightforward implementations of previous tools with slight changes in the details of the calculation of the test statistic. We hope that students are well prepared at this point to understand the reasoning behind the big idea of comparing within-group to between-group variation, but you might want to spend some extra time on this principle. You will also want to emphasize all the steps of a statistical analysis (examination of study design, numerical and graphical summaries, and statistical inference, including defining the parameters of interest, stating the hypotheses, commenting on the technical conditions, calculating the test statistic and p-value, making a decision about the null hypothesis, and finally stating an overall conclusion that touches on each of these issues).

Investigation 4.4 steps students through the calculations and comparison of within group and between group variability and uses a technology simulation to examine the empirical sampling distribution of the test statistic (question q). If you ask students to develop these simulations themselves they should get there but it may take a while. Question (o) is a key one for assessing whether students understand the basic principle. More details are supplied in the terminology detour and general technology instructions for carrying out an ANOVA analysis.
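The within-group vs. between-group comparison students work through can be sketched as follows. The data here are made up for illustration (not the disability employment data); the point is that the F statistic is the ratio of the between-group mean square to the within-group mean square.

```python
# Made-up responses for three groups.
groups = [
    [24, 27, 21, 25, 23],
    [30, 28, 33, 29, 31],
    [26, 22, 25, 27, 24],
]

k = len(groups)                       # number of groups
n = sum(len(g) for g in groups)       # total sample size
grand_mean = sum(sum(g) for g in groups) / n

# Between-group variation: how far group means fall from the grand mean.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-group variation: how far observations fall from their group mean.
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

ms_between = ss_between / (k - 1)     # df = k - 1
ms_within = ss_within / (n - k)       # df = n - k
F = ms_between / ms_within
print(round(F, 3))
```

When the group means are far apart relative to the spread within groups, F is large, which is the basic principle question (o) assesses.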

An applet exploration at the end of the Investigation steps them through using the javascript applet.

In Investigation 4.5, students initially practice calculating the F-statistic by hand. Another applet (ANOVA Simulation) is used to explore the effects of sample size, size of the difference in population means, and the common population variance on the ANOVA table and p-value. We have tried to use values that allow sufficient sensitivity in the applet to see some useful relationships. It is interesting for students to see the variability in the F-statistic and p-value from sample to sample both when the null hypothesis is true and when it is false. An interesting extension would be to collect the p-values from different random samples and examine a graph of their distribution, having students conjecture on its shape first.

Practice Problem 4.5A is a particularly interesting follow-up question, re-analyzing the Spock trial data using ANOVA instead of chi-square and considering how the two analyses differ in the information provided. Practice Problem 4.5B demonstrates the correspondence of ANOVA to a two-sided two-sample t-test when only two groups are being compared, and is worth highlighting. An interesting in-class experiment to consider in the section on ANOVA is the melting time of different types of chips (e.g., milk chocolate vs. peanut butter vs. semi-sweet), especially considering each person as a blocking factor (if you are interested in briefly discussing "two-way" ANOVA). You might also consider at least demonstrating multiple comparison procedures to your students. (The confidence interval checkbox in the Comparing Groups applet displays 95% confidence intervals, but using the pooled standard deviation.)


Section 3: Relationships Between Quantitative Variables

Timing/materials: Technology is used for basic univariate and bivariate graphs and numerical summaries in Investigation 4.6 (CatJumping.txt). Technology is used to calculate correlation coefficients in Investigation 4.7 (golfers.txt). These two investigations may take about 45 minutes. The applet exploration revolves around the Guess the Correlation applet and will take 10-15 minutes. Investigation 4.8 uses a new version of the Analyzing Two Quantitative Variables (javascript) applet and at the end shows them how to determine a regression equation using technology (HeightFoot.txt) and can take upwards of 60 minutes. An applet exploration also uses this applet to explore the resistance of least squares regression lines and influential observations. Investigation 4.9 also involves technology (movies03.txt) and may take 30 minutes.

This section presents tools for numerical and graphical summaries in the setting of two quantitative variables. Here we are generally less concerned about the type of study used. The next section will focus on inference for regression.

Investigation 4.6 focuses on using technology to create scatterplots and then introducing appropriate terminology for describing them.

Investigation 4.7 uses data from the same source (PGA golfers) to explore varying strengths of linear relationships and then introduces the correlation coefficient as a measure of that strength. One thing to be sure that students understand is that low scores are better than high scores in golf; similarly a smaller value for average number of putts per hole is better than a larger value, but some other variables (like driving distance) have the property that higher numbers are generally considered better. Discussion in this investigation includes how the points line up in different quadrants as a way of visualizing the strength of the linear relationship. Question (i) is a particularly good one to give students a few minutes to work through on their own in collaborative groups. Students should also be able to describe properties of the formula for r (when positive, negative, maximum and minimum values, etc.); in fact, our hope in (k)-(n) is that students can quickly tell you these properties rather than you telling them. Students apply this reasoning to order several scatterplots in terms of strength and then use technology to verify their ordering.
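The properties students are asked to articulate in (k)-(n) all follow from the formula for r as the average product of z-scores, which can be sketched directly (made-up data here, not the golfer data):

```python
from math import sqrt

# Illustrative bivariate data.
x = [2.0, 3.5, 4.0, 5.5, 7.0]
y = [1.0, 2.5, 2.0, 4.0, 5.5]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sx = sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
sy = sqrt(sum((yi - ybar) ** 2 for yi in y) / (n - 1))

# r = average product of z-scores; a point in the upper-right or
# lower-left "quadrant" (relative to the means) contributes positively.
r = sum((xi - xbar) / sx * (yi - ybar) / sy for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 4))
```

The quadrant discussion in the investigation falls out of this formula: each term is positive when the point lies above both means or below both means, and negative otherwise, and r is always between -1 and 1.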

If you want students to have more practice in estimating the size of the correlation coefficient from a scatterplot, the Guess the Correlation Applet Exploration generates random scatterplots, allows students to specify a guess for r and then shows them the actual value. The applet keeps track of their guesses over time (to see if they improve) as well as the guesses vs. actual and errors vs. actual to see which values of r were easier to identify (e.g., closer to -1 and 1). Questions (g)-(i) also get students to think a bit about the meaning of r. Students often believe they are poor guessers and that the correlation between their guesses and the actual values of r will be small. They are often surprised at how large this correlation is, but should realize that this will happen as long as they can distinguish positive and negative correlations and that they may find a high correlation if they guess wrongly in a consistent manner.

Practice Problem 4.7A is a very quick test of students' understanding; question (b) in particular confuses many students. You will also want to continually remind students that r measures the amount of the linear association (e.g., you could jump ahead to the Walmart data and explore the correlation of the number of SuperCenters vs. time).

Investigation 4.8 steps students through a development of least squares regression. Starting after (g), they use a javascript applet with a moveable line feature to explore "fitting the best line" and realize that finding THE best line is nontrivial and even ambiguous, as there are many reasonable ways to measure "fit." We emphasize the idea of a residual, the vertical distance between a point and the line, as the foundation for measuring fit, since prediction is a chief use of regression. In question (o) we briefly ask students to consider the sum of absolute residuals as a criterion, and then we justify using SSE as a measure of the prediction errors. In questions (k)-(m) many students enjoy the competitive aspect of trying to come up with better and better lines according to the two criteria. Students can then use calculus to derive the least squares estimators directly in (t) and (u). Questions (u) and (w) develop the interpretation of the slope coefficient, and question (y) focuses on the intercept. Question (z) warns them about making extrapolations from the data. The applet is then used in questions (aa) and (bb) to motivate the interpretation of r2. Once the by-hand derivation of the least squares estimates is discussed, instructions are given for obtaining them in Minitab/R. The applet exploration allows students to investigate resistance properties of least squares lines, the idea of influential observations, and how to identify potentially influential observations. (This applet can also be used to obtain basic regression output.) The Excel Exploration also allows them to explore properties of the sum of absolute errors and the corresponding "best fit" line.
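The closed-form estimates students derive via calculus in (t) and (u) can be sketched as follows. The height/foot-length style numbers below are illustrative, not the HeightFoot data.

```python
# Illustrative data (foot length in cm, height in inches).
x = [24.0, 25.5, 26.0, 27.5, 28.0, 29.5]
y = [64.0, 66.0, 67.5, 69.0, 70.5, 73.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # least squares slope
b0 = ybar - b1 * xbar          # least squares intercept

# SSE is what the least squares line minimizes; r^2 is the
# proportion of variation in y explained by the line.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - ybar) ** 2 for yi in y)
r_squared = 1 - sse / sst
print(round(b1, 3), round(b0, 3), round(r_squared, 3))
```

Students can check that any other line they propose in the applet produces a larger SSE than this one, which is the payoff of the derivation.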

Investigation 4.9 provides practice in determining and interpreting regression coefficients with the additional aspect, which students often find interesting, of comparing the relationship across different types of movies, although the data is getting a bit dated.

Section 4: Inference for Regression

Timing/materials: Investigation 4.10 revolves around the Sampling Regression Lines applet (this is really the same applet, but with the "Create Population" box checked by default) and takes 35-45 minutes. This simulation approach is then compared in Investigation 4.11 to reshuffling with the Analyzing Two Quantitative Variables (javascript) applet. (Timing will depend on whether you are primarily demonstrating the results or letting students explore.) Investigation 4.12 introduces the basic regression model assumptions, which are then applied in Investigation 4.13, returning to the CatJumping data. This investigation can also be used to explore confidence vs. prediction intervals. The need for and use of transformations are then explored in Investigation 4.14 (housing.txt).

Investigation 4.10 follows the strategy that we have used throughout the course: taking repeated random samples from a finite population in order to examine the sampling distribution of the relevant sample statistic. We ask students to use an applet to select random samples from a hypothetical population matching the characteristics of the 5K run setting that follows the basic regression model, but where the population has been chosen so that the correlation (and therefore the slope) between time and age is zero. The goal of the applet is for students to visualize sampling variability with regression slopes (and lines) as well as the empirical sampling distribution of the sample slopes. This process should feel very familiar to students at this point, although you should be aware that it feels different to some students because they are watching sample regression lines change rather than seeing simpler statistics such as sample proportions or sample means change. Students also explore the effects of sample size, variability in the explanatory variable, and variability about the regression line on this sampling distribution. This motivates the formula for the standard error of the sample slope. It is interesting to help students realize that when choosing the x values, as in an experiment, more variability in the explanatory variable is preferred, a sometimes counter-intuitive result for them. Students should also note the symmetry of the sampling distribution of sample slope coefficients and believe that a t-distribution will provide a reasonable model for the standardized slopes using an estimate for the standard deviation about the regression line. Students calculate the corresponding t-statistic for the 5K data by hand which can be confirmed with technology in (z). Investigation 4.11 uses a different approach for the simulation, a randomization test approach which scrambles the response variable values. 
Students may find it interesting to compare these two approaches. The numerical implications are not substantial, but students may also be able to discuss how the two standard errors measure slightly different sources of randomness.
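The population-sampling approach of Investigation 4.10 can be sketched as follows. The population model here (true slope of zero, normal errors, uniformly spread ages) is illustrative rather than the actual 5K population built into the applet, but it shows the key behavior: sample slopes vary from sample to sample, centering at the true slope, with spread governed by the error variability and the spread of the x values.

```python
import random
from math import sqrt

random.seed(3)

def sample_slope(n=20, sigma=5.0):
    """Draw one sample from a population with true slope 0 and
    return the least squares sample slope."""
    xs = [random.uniform(10, 60) for _ in range(n)]          # ages
    ys = [25 + 0 * x + random.gauss(0, sigma) for x in xs]   # true slope = 0
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx

slopes = [sample_slope() for _ in range(1000)]
mean_slope = sum(slopes) / len(slopes)
sd_slope = sqrt(sum((b - mean_slope) ** 2 for b in slopes)
                / (len(slopes) - 1))
# The empirical SD of the slopes approximates sigma / sqrt(sum (x - xbar)^2),
# which is the standard error formula the simulation motivates.
print(round(mean_slope, 3), round(sd_slope, 3))
```

Increasing sigma inflates the spread of the slopes, while spreading out the x values shrinks it, matching the counter-intuitive design lesson that more variability in the explanatory variable is preferred.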

Investigation 4.12 begins by having students consider the "ideal" setting for such inferences – normal populations with equal variance that differ only in their means, which follow a linear pattern with the explanatory variable. We especially advocate the LINE mnemonic. Residual plots are introduced as a method for checking the appropriateness of this basic regression model. Investigation 4.13 then applies this model to the cat jumping data, including confidence intervals for the population slope and prediction vs. confidence intervals for individual values (and the distinction between them, for which you can draw the connection to the univariate prediction intervals from Chapter 3). Minitab provides nice visuals for these latter intervals. The bow-tie shape they saw in the applet is also a nice visual here for justifying the "curvature" seen especially in prediction intervals.
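The distinction between the two intervals comes down to their standard errors, which can be sketched directly (made-up data, not the cat jumping data; the t multiplier is omitted since both intervals share it):

```python
from math import sqrt

# Illustrative bivariate data.
x = [3, 5, 7, 9, 11, 13]
y = [10.1, 11.8, 14.2, 15.9, 18.3, 19.7]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
# Residual standard error (estimate of sigma about the line), df = n - 2.
s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))

def se_mean(x0):
    """SE for estimating the MEAN response at x0 (confidence interval)."""
    return s * sqrt(1 / n + (x0 - xbar) ** 2 / sxx)

def se_pred(x0):
    """SE for predicting an INDIVIDUAL response at x0 (prediction interval);
    the extra 1 under the root is the individual's own variability."""
    return s * sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)

# The prediction SE is always larger, and both grow as x0 moves
# away from xbar -- the source of the bow-tie curvature.
print(se_mean(8), se_pred(8), se_mean(14), se_pred(14))
```

The `(x0 - xbar)^2 / sxx` term is what produces the curvature students see in the plots, and the extra 1 in the prediction SE is why prediction intervals are always wider.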

Investigation 4.14 finds problems with the residual analysis and explores a transformation for addressing the conditions. Students should realize that additional steps can be taken when the conditions are not met, and we try not to get too bogged down at this point in interpreting the transformation. The Technology Exploration introduces students to the "regression effect." There is a nice history to this feature of regression, and it also provides additional cautions to students about drawing overly strong conclusions from their observations (e.g., "regression to the mean"). We often supplement this discussion with excerpts from the January 21, 2001 Sports Illustrated article on the cover jinx: "It was a hoot to work on the piece. On the one hand, we listened as sober statisticians went over the basics of 'regression to the mean,' which would explain why a hitter who gets hot enough to make the cover goes into a slump shortly thereafter."


This chapter includes four worked-out examples. Each of the first three deals with one of the three main methods covered in this chapter: chi-square tests, ANOVA, and regression. The fourth example analyzes data from a diet comparison study, where we ask several questions and expect students to first identify which method applies to a given question. Again we encourage students to answer the questions and analyze the data themselves before reading the model solutions.


At the end of this chapter, students will most need guidance on when to use each of the different methods. The table may be useful but students will also need practice identifying the proper procedure merely from a description of the study design and variables. We also like to remind students to be very conscious of the technical conditions underlying each procedure and that they must be checked and commented on in any analysis.