*INVESTIGATING STATISTICAL CONCEPTS, APPLICATIONS, AND METHODS,
Second Edition*

**NOTES FOR INSTRUCTORS**

**April, 2015**


**CHAPTER 4**

This chapter again extends the lessons students have learned in earlier chapters to additional methods involving more than one variable. The material focuses on the new methods (e.g., chi-square tests, ANOVA, and regression) and follows the same progression as earlier topics: starting with exploring appropriate numerical and graphical summaries, proceeding to use simulations to construct empirical sampling/randomization (reference) distributions, and then considering probability models for drawing inferences. The material is fairly flexible if you want to pick and choose topics. Fuller treatments of these methods would require a second course.

**Section 1: Two Categorical Variables**

In this section we start with a *k* × 2 case and then expand to larger tables. We again start with a simulation and then apply the chi-square distribution and more traditional output.

**Investigation 4.1: Dr. Spock's Trial**

*Timing/materials*: Technology is used to create empirical sampling
distributions (based on the binomial sampling model) in Investigation 4.1.
There is a Minitab macro (SpockSim.mac) and an R script (Spock.R) that can be
used. Or you can use the Analyzing Two-way
Tables (javascript) applet. This investigation will take approximately
50-60 minutes.

In Investigation 4.1 you may want to spend a bit of time on the background of the jury selection process before students analyze the data. We also encourage you again to stop and discuss the numerical and graphical summaries that apply to these sample data (question (a)) before proceeding to inferential statements, as well as to reconsider the constant theme – could differences this large have plausibly occurred by chance alone? Once you get to inference, the key change is the need for a more complicated test statistic (part (h)), and you may want to ask students to spend more time on that, even developing their own statistics (e.g., the sum of all pairwise differences). After question (h), you will also want to explain the logic behind the chi-square statistic – every cell gives a nonnegative contribution to the sum, and each contribution is scaled by the size of that cell's expected count. If students understand the formula, question (j) should be straightforward for them. After thinking about the behavior based on the formula, we then use simulation as a way of judging what values of the statistic should be considered large enough to be unusual and also to see what probability distribution might approximate the sampling distribution of the test statistic. Because we are simulating the drawing of independent random samples from several populations, we use the binomial distribution, as opposed to treating both margins as fixed as we did in Chapter 2. This is what the R and Minitab macros do. Alternatively, the two-way table applet shuffles the response variable values randomly among the explanatory variable values. (But note the typo in the instructions for entering the data into the applet: just enter the two-way table of counts.) You can also use the applet to explore other possible statistics such as the MAD and max – min.

[Both the R and Minitab simulations randomly generate male/female counts for each judge's panel by sampling from a binomial distribution with n~i~ equal to judge i's sample size and p = .261 as the assumed probability of a juror being female for each judge. In Minitab, the simulated totals for each gender are stored in C2-C8. Then the expected counts are computed based on the simulated total numbers of males and females, and the "observed" counts are compared to the expected counts. Column C10 contains the 14 terms of the Chi-Square sum, and the resulting Chi-Square statistic for each sample is stored in C11. Make sure students using Minitab remember to save both the .mac file and an empty worksheet file in the same folder, and that students using R define the variables they want to store results in (e.g., mychisq=0).]
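The macros' logic can also be sketched outside Minitab. The following illustrative Python version (a stand-in for SpockSim.mac/Spock.R, with invented panel sizes rather than the actual Spock counts) draws binomial counts for each judge's panel and accumulates the 14-term chi-square sum for each simulated sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented panel sizes for the seven judges (the actual Spock counts are
# in the text's data file); p = .261 is the assumed common probability
# that a selected juror is female, as in the macros.
n = np.array([100, 120, 90, 110, 80, 95, 105])
p = 0.261
reps = 1000

chisq = np.empty(reps)
for i in range(reps):
    females = rng.binomial(n, p)        # one binomial draw per judge
    males = n - females
    obs = np.vstack([females, males])   # 2 x 7 table of simulated counts
    # expected counts computed from the simulated margins
    exp = (obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True)
           / obs.sum())
    chisq[i] = ((obs - exp) ** 2 / exp).sum()   # 14-term chi-square sum
```

Under this null model the simulated statistics should be well approximated by a chi-square distribution with (2 − 1)(7 − 1) = 6 degrees of freedom, which is what the theoretical overlay is checking.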

[Note: Output windows can be enlarged to improve the layout. When you check the Show X2 output box, the theoretical curve is also overlaid on the empirical randomization distribution. Using R or Minitab will allow you to change the df to try different theoretical models. In the applet, you can also toggle which column is the explanatory variable and which is the response, as well as which outcome is defined as success. When you enter larger tables, the MAD statistic is only calculated using one row as success, though you can select different rows.]

The simulation results are also used to help students see that the normal distribution does not provide a reasonable model of the empirical sampling distribution of this test statistic. We do not derive the chi-square distribution but do use probability plots to show that the Gamma distribution, of which the chi-square distribution is a special case, is appropriate (questions (p) and (q)). Again, we want them to realize that the probability model applies no matter the true value of the common population proportion p. We also encourage them to follow up a significant chi-square statistic by seeing which cells of the table contribute the most to the chi-square sum as a way of further defining the source(s) of discrepancy (questions (s) and (t)). A practice problem on Type I Errors is used to further motivate the use of this "multi-sided" procedure for checking the equality of all population proportions simultaneously.
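The "special case" relationship in questions (p) and (q) is easy to verify numerically. This small sketch (using scipy, which is not part of the text's materials) compares the chi-square density with 6 degrees of freedom, matching a 7 × 2 table, to the Gamma density with shape 3 and scale 2:

```python
import numpy as np
from scipy import stats

# A chi-square distribution with k df is a Gamma distribution with
# shape k/2 and scale 2; here k = 6, so the Gamma shape is 3.
df = 6
x = np.linspace(0.1, 20, 50)
chi_pdf = stats.chi2.pdf(x, df)
gamma_pdf = stats.gamma.pdf(x, a=df / 2, scale=2)

max_diff = np.abs(chi_pdf - gamma_pdf).max()   # the densities coincide
```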

**Investigation 4.2: Night Lights and Near-Sightedness (cont.)**

*Timing/materials*: You may want to help students realize that the randomization modeled here is different from that in Investigation 4.1, though we end up with the same theoretical model. You can also focus more on using Minitab, R, or the applet to generate the chi-square analysis. With this approach, Investigations 4.2 and 4.3 can be completed rather quickly in 50-60 minutes.

Investigation 4.2 provides an application of the chi-square procedure but in the case of a cross-classified study. You might want to start by asking students what the segmented bar graph would have looked like if there were no association between the two variables. The "no association" model can also be simulated through a randomization test building on earlier simulations with quantitative response data. [See Technology Exploration: A macro is available for performing this simulation in Minitab (twoway.mac); just remember to save the macro file and the data file in the same folder. The Minitab macro randomizes the assignment of subjects to the lighting groups (assuming no relationship between eye condition and lighting). Then the macro creates the new two-way table (storing those results in columns C10-C12) and calculates the Chi-Square test statistic for each simulated table (storing the results in C20). You might want to ask students how they would modify this macro for a 4 × 4 table. A data file is available for importing the raw data into R. You can also use the Two-way Table applet.]
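The twoway.mac logic can be paraphrased in a short illustrative Python sketch (the counts below are invented, not the actual night-light data): shuffle the response labels among the subjects, rebuild the two-way table, and recompute the chi-square statistic for each shuffle:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented 3 x 3 table of counts (lighting group x eye condition);
# the real night-light data come from the text's data file.
table = np.array([[40, 39, 12],
                  [55, 78, 34],
                  [18, 41, 78]])

# Expand the table to one (lighting, condition) pair per subject
light = np.repeat(np.arange(3), table.sum(axis=1))
cond = np.concatenate([np.repeat(np.arange(3), row) for row in table])

def chisq_stat(a, b):
    """Chi-square statistic for the 3 x 3 table built from labels a, b."""
    obs = np.zeros((3, 3))
    np.add.at(obs, (a, b), 1)
    exp = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
    return ((obs - exp) ** 2 / exp).sum()

observed = chisq_stat(light, cond)

# Re-randomize: shuffle the condition labels among subjects (the
# "no association" model) and recompute the statistic each time.
shuffled = np.array([chisq_stat(light, rng.permutation(cond))
                     for _ in range(1000)])
p_value = (shuffled >= observed).mean()
```

The empirical p-value is then the proportion of shuffles producing a statistic at least as large as the observed one.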

You may also wish to give students additional practice in applying the procedure, beyond the practice problems, especially in distinguishing these different situations (e.g., reminding them of the different data collection scenarios, the segmented bar graphs, and the forms of the hypotheses – comparing more than 2 population proportions, comparing population distributions on a categorical variable, or association between categorical variables).

**Investigation 4.3: Newspaper Credibility Decline**

Investigation 4.3 focuses on the correspondence between the chi-square procedure for a 2 × 2 table and the two-sample z-test with a two-sided alternative.
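That correspondence is easy to demonstrate numerically. In this sketch (with made-up counts, using scipy rather than the text's technology), the squared pooled z-statistic matches the chi-square statistic exactly and the two-sided p-values agree:

```python
import numpy as np
from scipy import stats

# Made-up 2 x 2 counts: successes and sample sizes for two groups.
success = np.array([40, 25])
n = np.array([100, 100])

# Two-sample z-statistic using the pooled estimate of p
phat = success / n
pool = success.sum() / n.sum()
z = (phat[0] - phat[1]) / np.sqrt(pool * (1 - pool) * (1 / n[0] + 1 / n[1]))
p_z = 2 * (1 - stats.norm.cdf(abs(z)))       # two-sided p-value

# Chi-square test on the same table (no continuity correction)
table = np.array([success, n - success]).T   # rows = groups
chi2_stat, p_chi2, dof, _ = stats.chi2_contingency(table, correction=False)
# z**2 equals chi2_stat, and p_z equals p_chi2
```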

**Section 2: Comparing Several Population Means**

*Timing/materials*: Technology is used for descriptive statistics and randomization distributions, using R or Minitab (DisabilityEmployment.txt) with a Minitab macro (RandomDisability.mac) or the new Comparing Groups (Quantitative) javascript applet. See the solutions for possible R code in Investigation 4.4. Technology is used again at the end of the investigation to carry out the ANOVA. Technology can be used briefly in Investigation 4.5 to calculate a p-value from the *F* distribution. The ANOVA simulation applet is also used heavily. This section should take about 65 minutes.

The focus of this section is on comparing two or more population means (or treatment means). You may want to cast this as the association between one categorical and one quantitative variable to parallel the previous section (though some suggest only applying this description with cross-classified studies). Again, we do not spend a large amount of time developing the details, seeing these analyses as straightforward implementations of previous tools with slight changes in the details of the calculation of a test statistic. We hope that students are well-prepared at this point to understand the reasoning behind the big idea of comparing within-group to between-group variation, but you might want to spend some extra time on this principle. You will also want to focus on emphasizing all the steps of a statistical analysis (examination of study design, numerical and graphical summaries, and statistical inference including defining the parameters of interest, stating the hypotheses, commenting on the technical conditions, calculation of test statistic and p-value, making a decision about the null hypothesis, and then finally stating an overall conclusion that touches on each of the issues).

**Investigation 4.4** steps students through the calculations and comparison of within-group and between-group variability and uses a technology simulation to examine the empirical sampling distribution of the test statistic (question (q)). If you ask students to develop these simulations themselves, they should get there, but it may take a while. Question (o) is a key one for assessing whether students understand the basic principle. More details are supplied in the terminology detour and general technology instructions for carrying out an ANOVA analysis.
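The within- versus between-group comparison at the heart of the investigation can be mirrored in a few lines. This illustrative Python sketch (simulated data, not the disability-employment file) computes the F statistic from the two mean squares and checks it against a packaged one-way ANOVA:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Three simulated groups (invented means/SD, 15 observations each)
groups = [rng.normal(10, 2, 15), rng.normal(11, 2, 15), rng.normal(13, 2, 15)]

k = len(groups)
N = sum(len(g) for g in groups)
grand = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = (between-group mean square) / (within-group mean square)
F = (ss_between / (k - 1)) / (ss_within / (N - k))

F_scipy, p = stats.f_oneway(*groups)   # packaged one-way ANOVA agrees
```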

An applet exploration at the end of the Investigation steps them through using the javascript applet.

In **Investigation 4.5**, students initially practice
calculating the F-statistic by hand. Another applet (ANOVA Simulation)
is used to explore the effects of sample size, size of the difference in
population means, and the common population variance on the ANOVA table and
p-value. We have tried to use values that allow sufficient sensitivity in the
applet to see some useful relationships. It is interesting for students to see
the variability in the F-statistic and p-value from sample to sample both when
the null hypothesis is true and when it is false. An interesting extension
would be to collect the p-values from different random samples and examine a
graph of their distribution, having students conjecture on its shape first.
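For the suggested extension, a quick simulation (illustrative Python, not the applet itself) collects ANOVA p-values over repeated samples with all population means equal; students can then conjecture about, and check, the roughly uniform shape of their distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# 500 simulated studies, each with three samples of size 12 drawn from
# the SAME normal population (so the null hypothesis is true).
pvals = np.array([
    stats.f_oneway(rng.normal(10, 3, 12),
                   rng.normal(10, 3, 12),
                   rng.normal(10, 3, 12)).pvalue
    for _ in range(500)
])
# A histogram of pvals should look roughly uniform on [0, 1],
# with about 5% of the p-values falling below .05.
```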

Practice Problem 4.5A is a particularly interesting follow-up question, re-analyzing the Spock trial data using ANOVA instead of Chi-square, and considering how the two analyses differ in the information provided. Practice Problem 4.5B demonstrates the correspondence of ANOVA to a two-sided two-sample t-test when only two groups are being compared, and is worth highlighting. An interesting in-class experiment to consider in the section on ANOVA is the melting time of different types of chips (e.g., milk chocolate vs. peanut butter vs. semi-sweet), especially considering each person as a blocking factor (if you are interested in briefly discussing "two-way" ANOVA). You might also consider at least demonstrating multiple comparison procedures to your students. (The confidence interval checkbox in the Comparing Groups applet applies 95% confidence intervals but using the pooled standard deviation.)

**Section 3: Relationships Between Quantitative Variables**

*Timing/materials*: Technology is used for basic univariate and bivariate
graphs and numerical summaries in Investigation 4.6 (CatJumping.txt).
Technology is used to calculate correlation coefficients in Investigation 4.7 (golfers.txt). These two investigations may take about 45
minutes. The applet exploration revolves around the Guess the Correlation
applet and will take 10-15 minutes. Investigation 4.8 uses a new version of the
Analyzing Two
Quantitative Variables (javascript) applet and at the end shows them
how to determine a regression equation using technology (HeightFoot.txt)
and can take upwards of 60 minutes. An applet exploration also uses this applet
to explore the resistance of least squares regression lines and influential
observations. Investigation 4.9 also involves technology (movies03.txt)
and may take 30 minutes.

This section presents tools for numerical and graphical summaries in the setting of two quantitative variables. Here we are generally less concerned about the type of study used. The next section will focus on inference for regression.

**Investigation 4.6** focuses on using technology to create
scatterplots and then introducing appropriate terminology for describing them.

**Investigation 4.7** uses data from the same source (PGA
golfers) to explore varying strengths of linear relationships and then
introduces the correlation coefficient as a measure of that strength. One thing
to be sure that students understand is that low scores are better than high
scores in golf; similarly a smaller value for average number of putts per hole
is better than a larger value, but some other variables (like driving distance)
have the property that higher numbers are generally considered better.
Discussion in this investigation includes how the points line up in different
quadrants as a way of visualizing the strength of the linear relationship.
Question (i) is a particularly good one to give students a few minutes to work
through on their own in collaborative groups. Students should also be able to
describe properties of the formula for r (when positive, negative, maximum and
minimum values, etc.); in fact, our hope in (k)-(n) is that students can
quickly tell you these properties rather than you telling them. Students apply
this reasoning to order several scatterplots in terms of strength and then use
technology to verify their ordering.
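The properties in (k)-(n) follow from the definition of r as an average product of z-scores. This small sketch (invented numbers loosely in the spirit of the golf variables) computes r from that formula and confirms it against a built-in:

```python
import numpy as np

# Invented values loosely in the spirit of the golf variables:
# x = driving distance (yards), y = scoring average (lower is better).
x = np.array([280.0, 295.0, 260.0, 305.0, 270.0])
y = np.array([70.2, 69.5, 71.8, 68.9, 71.0])

# r as the average product of z-scores (n - 1 in the denominator);
# points in the upper-left/lower-right quadrants give negative products.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
r = (zx * zy).sum() / (len(x) - 1)

r_np = np.corrcoef(x, y)[0, 1]   # built-in agrees
```

With these values r is strongly negative, consistent with longer drives accompanying lower (better) scores.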

If you want students to have more practice in estimating the size of the correlation coefficient from a scatterplot, the Guess the Correlation Applet Exploration generates random scatterplots, allows students to specify a guess for r and then shows them the actual value. The applet keeps track of their guesses over time (to see if they improve) as well as the guesses vs. actual and errors vs. actual to see which values of r were easier to identify (e.g., closer to -1 and 1). Questions (g)-(i) also get students to think a bit about the meaning of r. Students often believe they are poor guessers and that the correlation between their guesses and the actual values of r will be small. They are often surprised at how large this correlation is, but should realize that this will happen as long as they can distinguish positive and negative correlations and that they may find a high correlation if they guess wrongly in a consistent manner.

Practice Problem 4.7A is a very quick test of students' understanding; question (b) in particular confuses many students. You will also want to continually remind students that r measures the amount of the linear association (e.g., you could jump ahead to the Walmart data and explore the correlation of the number of SuperCenters vs. time).

**Investigation 4.8** steps students through a development of
least squares regression. Starting after (g), they use a javascript applet with
a moveable line feature to explore "fitting the best line" and
realize that finding THE best line is nontrivial and even ambiguous, as there
are many reasonable ways to measure "fit." We emphasize the idea of a
residual, the vertical distance between a point and the line, as the foundation
for measuring fit, as prediction is a chief use of regression. In question (o)
we briefly ask students to consider the sum of absolute residuals as a
criterion, and then we justify using SSE as a measure of the prediction errors.
In questions (k)-(m) many students enjoy the competitive aspect of trying to
come up with better and better lines according to the two criteria. Students
can then use calculus to derive the least squares estimators directly in (t)
and (u). Questions (u) and (w) develop the interpretation of the slope
coefficient and question (y) focuses on the intercept. Question (z) warns them
about making extrapolations from the data. The applet is then used in questions
(aa) and (bb) to motivate the interpretation of r². Once the by-hand derivation of the least squares estimates is discussed, instructions are given for
obtaining them in Minitab/R. The applet exploration allows students to investigate
resistance properties of the least squares lines and the idea of influential
observations and how to identify potentially influential observations. (This
applet can also be used to obtain basic regression output.) The Excel
Exploration also allows them to explore properties of the sum of absolute
errors and the corresponding "best fit" line.
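The calculus derivation in (t) and (u) lands on b1 = Sxy/Sxx and b0 = ȳ − b1·x̄. This sketch (invented height/foot-length numbers, not HeightFoot.txt) applies those formulas, checks them against a packaged fit, and confirms that the resulting SSE beats a nearby competing line:

```python
import numpy as np

# Invented foot-length (cm) and height (inch) values, standing in for
# the HeightFoot data.
foot = np.array([24.0, 27.0, 25.5, 29.0, 26.0, 28.5])
height = np.array([64.0, 70.0, 66.0, 73.0, 68.0, 72.0])

# Least squares estimates from the calculus derivation:
# b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
sxx = ((foot - foot.mean()) ** 2).sum()
sxy = ((foot - foot.mean()) * (height - height.mean())).sum()
b1 = sxy / sxx
b0 = height.mean() - b1 * foot.mean()

b1_np, b0_np = np.polyfit(foot, height, 1)   # packaged fit agrees

# SSE for the least squares line (no other line can do better)
sse = ((height - (b0 + b1 * foot)) ** 2).sum()
```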

**Investigation 4.9** provides practice in determining and
interpreting regression coefficients with the additional aspect, which students
often find interesting, of comparing the relationship across different types of
movies, although the data is getting a bit dated.

**Section 4: Inference for Regression**

*Timing/materials*: Investigation 4.10 revolves around the Sampling
Regression Lines applet (this is really the same applet but with the
"Create Population" box checked by default) and takes 35-45 minutes.
This simulation approach is then compared in Investigation 4.11 to reshuffling
with the Analyzing
Two Quantitative Variables (javascript) applet. (Timing will depend on
whether you are primarily demonstrating the results or letting students
explore). Investigation 4.12 introduces the basic regression model assumptions
which are then applied in Investigation 4.13 returning to the CatJumping data.
This investigation can also be used to explore confidence vs. prediction
intervals. The need for and use of transformations are now explored in
Investigation 4.14 (housing.txt).

**Investigation 4.10** follows the strategy that we have used
throughout the course: taking repeated random samples from a finite population
in order to examine the sampling distribution of the relevant sample statistic.
We ask students to use an applet to select random samples from a hypothetical
population matching the characteristics of the 5K run setting that follows the
basic regression model, but where the population has been chosen so that the
correlation (and therefore the slope) between time and age is zero. The goal of
the applet is for students to visualize sampling variability with regression
slopes (and lines) as well as the empirical sampling distribution of the sample
slopes. This process should feel very familiar to students at this point,
although you should be aware that it feels different to some students because
they are watching sample regression lines change rather than seeing simpler
statistics such as sample proportions or sample means change. Students also
explore the effects of sample size, variability in the explanatory variable,
and variability about the regression line on this sampling distribution. This
motivates the formula for the standard error of the sample slope. It is
interesting to help students realize that when choosing the x values, as in an
experiment, more variability in the explanatory variable is preferred, a
sometimes counter-intuitive result for them. Students should also note the
symmetry of the sampling distribution of sample slope coefficients and believe
that a t-distribution will provide a reasonable model for the standardized
slopes using an estimate for the standard deviation about the regression line.
Students calculate the corresponding t-statistic for the 5K data by hand which
can be confirmed with technology in (z). **Investigation 4.11**
uses a different approach for the simulation, a randomization test approach
which scrambles the response variable values. Students may find it interesting
to compare these approaches. The implications are not substantial, but students may also be able to talk about how the standard errors measure slightly different types of randomness.
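The simulation students run in the applet can be paraphrased as follows (illustrative Python; the x values and σ are invented rather than taken from the 5K data): generate responses from the basic regression model with true slope zero, refit each sample, and compare the spread of the sample slopes to the theoretical standard error σ/√Σ(x − x̄)²:

```python
import numpy as np

rng = np.random.default_rng(5)

# Invented x values (ages) and error SD; the true regression slope is 0.
x = np.array([18.0, 22.0, 25.0, 30.0, 34.0, 41.0, 47.0, 55.0])
sigma = 4.0
reps = 2000

slopes = np.empty(reps)
for i in range(reps):
    y = 25 + 0 * x + rng.normal(0, sigma, len(x))  # basic regression model
    slopes[i] = np.polyfit(x, y, 1)[0]             # refit each sample

# Theoretical standard error of the sample slope
se_theory = sigma / np.sqrt(((x - x.mean()) ** 2).sum())
# slopes.std() should be close to se_theory (about 0.118 here)
```

Spreading out the x values increases Σ(x − x̄)² and so shrinks the standard error, the sometimes counter-intuitive design result noted above.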

**Investigation 4.12** begins by having students consider the "ideal" setting for such inferences – normal populations with equal variances whose means follow a linear pattern with the explanatory variable. We especially advocate the LINE mnemonic. Residual plots
are introduced as a method for checking the appropriateness of this basic
regression model. **Investigation 4.13** then applies this model
to the cat jumping data, including confidence intervals for the population
slope and prediction vs. confidence intervals (and the distinction between
them, for which you can draw the connection to univariate prediction intervals
from Chapter 3) for individual values. Minitab provides nice visuals for
these latter intervals. The bow-tie shape they saw in the applet is also a nice
visual here for justifying the "curvature" seen especially in
prediction intervals.
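The distinction between the two intervals comes down to one extra "+1" under the square root. This sketch (invented numbers standing in for the cat-jumping data) computes both half-widths at the same x* and shows that the prediction interval is always wider:

```python
import numpy as np
from scipy import stats

# Invented predictor (body mass, kg) and response values standing in
# for the cat-jumping measurements.
x = np.array([3.5, 4.2, 4.8, 5.1, 5.6, 6.0, 6.4, 7.0])
y = np.array([160.0, 155.0, 148.0, 150.0, 140.0, 138.0, 130.0, 125.0])

n = len(x)
b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)
s = np.sqrt((resid ** 2).sum() / (n - 2))      # SD about the regression line
sxx = ((x - x.mean()) ** 2).sum()

xstar = 5.0                                    # value at which to predict
tcrit = stats.t.ppf(0.975, n - 2)

# CI for the mean response vs PI for an individual response at xstar:
# the prediction interval carries an extra "+1" under the square root.
ci_half = tcrit * s * np.sqrt(1 / n + (xstar - x.mean()) ** 2 / sxx)
pi_half = tcrit * s * np.sqrt(1 + 1 / n + (xstar - x.mean()) ** 2 / sxx)
```

The (x* − x̄)² term in both half-widths also produces the "bow-tie" widening away from x̄ that students see in the Minitab plots.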

**Investigation 4.14** finds problems with the residual analysis and explores a transformation for addressing the conditions. Students should realize that additional steps can be taken when the
conditions are not met and we try not to get too bogged down at this time in
interpreting the transformation. The **Technology Exploration**
introduces students to the "regression effect." There is a nice
history to this feature of regression and it also provides additional cautions
to students about drawing too strong of conclusions from their observations
(e.g., "regression to the mean"). We often supplement this discussion
with excerpts from the January 21, 2001 *Sports Illustrated* article on
the cover jinx. "It was a hoot to work on the piece. On the one hand, we
listened as sober statisticians went over the basics of 'regression to the
mean,' which would explain why a hitter who gets hot enough to make the cover
goes into a slump shortly thereafter."

**Examples**

This chapter includes four worked-out examples. Each of the first three deals with one of the three main methods covered in this chapter: chi-square tests, ANOVA, and regression. The fourth example analyzes data from a diet comparison study, where we ask several questions and expect students to first identify which method applies to a given question. Again we encourage students to answer the questions and analyze the data themselves before reading the model solutions.

**Summary**

At the end of this chapter, students will most need guidance on when to use each of the different methods. The table may be useful but students will also need practice identifying the proper procedure merely from a description of the study design and variables. We also like to remind students to be very conscious of the technical conditions underlying each procedure and that they must be checked and commented on in any analysis.