Chapter 4

This chapter continues the theme of Chapter 3, the behavior of random samples from a population and how knowledge of that behavior allows us to make inferences about the population. Most of the chapter, beginning with Section 4.3, is devoted to models that apply when large samples are selected, namely the normal and t distributions. We begin with some background on probability models in general and the normal distribution in particular. Then we focus on the Central Limit Theorem for both categorical (binary) and quantitative data, leading students to discover the need for the t distribution when drawing inferences about the population mean. The last section, on bootstrapping, provides alternative inferential methods when the Central Limit Theorem does not apply (e.g., with small samples or with statistics other than sample proportions and means).

Section 4.1: Models of Quantitative Data

Timing/Materials: Investigations 4.1.1 and 4.1.2 make heavy use of Minitab (including features new to version 14). You may want to assign some of the reading (e.g., p. 282-3) as out-of-class work. Probability plots (Investigation 4.1.2) may not be on your syllabus, but we ask students to use these plots often, so we do not recommend skipping them. This section can probably be covered in 60-75 minutes.

In this section we try to convey the notion of a model, in particular, probability models for quantitative variables. Investigation 4.1.1 introduces the idea that very disparate variables can follow a common (normal) model (with different parameter values). We do not spend a long time on nonnormal models (e.g., exponential, gamma), but we feel students should get a flavor of nonsymmetric models as well and realize that the normal model does not apply to all variables. The subsequent practice problems lead students to overlay different model curves on data histograms. (Minitab 14 automatically scales the curve, so we do not have students convert the histogram to the density scale first.)

In Investigation 4.1.2, probability plots are introduced as a way to help assess the fit of a probability model to data. There is some debate about the utility of probability plots, but we feel they provide a better guide than simple histograms for judging the fit of a model, especially for small data sets. Still, it can take students a while to become comfortable reading these graphs. We focus on interpreting these plots by looking for a linear pattern and do not ask students to learn the mechanics behind the construction of the graphs. We use questions (h)-(j) to help them gain some experience in judging the behavior of these graphs when the data are known to come from a normal distribution; many students are surprised at how much variation arises in samples, and therefore in probability plots, even when the population really follows a normal distribution. Some nice features in Minitab 14 make it easy to quickly change the model that is being fit to the data (both in overlaying the curve on the histogram and in the probability plot). If you are very short on time, Investigation 4.1.2 could be skipped, but we will make use of probability plots in later chapters.

Section 4.2: Applying the (Normal) Probability Model

Timing/Materials: Minitab is used extensively in Investigations 4.2.1 and 4.2.2. Investigation 4.2.3 centers on a Java applet, which has the advantage of supplying a visual image of the normal model. You may wish to begin with Minitab until students are comfortable drawing their own sketches and thinking carefully about the scaling and labeling of the horizontal axis. This section probably requires at least 90 minutes of class time (less if your students have seen normal probability calculations previously).

In Investigation 4.2.1, the transition is made to using the theoretical models to make probability statements. The last box on p. 290 will be an important one to emphasize. We immediately turn the normal probability calculation over to Minitab and do not use a normal probability table at all. (In fact, there is no normal probability table in the book, as we do not feel that learning to use a table is necessary when students can use a software package, a Java applet, or even a graphing calculator to perform the calculations quite efficiently. This also has implications for the testing environment.) We emphasize to students that it is important to continue to accompany these calculations with well-labeled sketches of probability curves and to distinguish between the theoretical probability and the observed proportion of observations in sample data. By the end of Investigation 4.2.1, we would like students to be comfortable applying a model to a situation where they don’t have actual observations. Such calculations are made in Practice Problems 4.2.1 and 4.2.2, including practice with elementary integration techniques and simple geometric methods for finding areas under probability “curves.” You could supplement this material with more on non-normal probability models, and you could make more use of calculus if you would like. In particular, many of the exercises at the end of the chapter explore more mathematical ideas, often involving calculus.
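
To connect the geometric, calculus, and software approaches to "area under the curve," a small Python sketch: a midpoint-rule approximation to an integral, checked against (a) a triangle-area calculation for the density f(x) = 2x on [0, 1] and (b) the exact normal CDF from `statistics.NormalDist`. The densities and the values μ = 100, σ = 15 are illustrative only, not taken from the practice problems.

```python
from statistics import NormalDist


def area_under(f, a, b, steps=100_000):
    """Approximate the integral of f from a to b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (k + 0.5) * width) for k in range(steps)) * width


def triangle_density(x):
    """f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


# Geometric answer for P(X <= 0.5): a triangle with base 0.5 and height f(0.5) = 1.
geometric = 0.5 * 0.5 * 1.0
numeric = area_under(triangle_density, 0, 0.5)

# Hypothetical normal model N(100, 15): P(X <= 120).
model = NormalDist(mu=100, sigma=15)
# Start 10 SDs below the mean; the area further left is negligible.
numeric_normal = area_under(model.pdf, 100 - 10 * 15, 120)
exact_normal = model.cdf(120)

print(geometric, round(numeric, 6))
print(round(numeric_normal, 6), round(exact_normal, 6))
```

The point for students is that the software value is the same area they could, in principle, obtain by integration or geometry.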

We continue to apply the normal probability model to real sample data in Investigation 4.2.2, and you will want to make sure students are becoming comfortable with the notation and with Minitab. On p. 294, we discuss the complement rule for these continuous distributions, and you will want to contrast this with the earlier adjustments for discrete distributions (once students “get” the discrete adjustment, they tend to overapply it). Beginning with question (i), this investigation also tries to motivate the correspondence between probabilities calculated in terms of X from a N(μ, σ) distribution and in terms of Z from the standard normal distribution. This conversion may not seem meaningful to students at first (both because the value of converting measurements to a common scale is not yet apparent and because we are not having them look up z-scores in a table), so you will want to remind them of the utility of reporting the z-value: it presents values on a common scale of standard deviations away from the mean, which enables us to “compare apples and oranges.” In using Minitab, most students will prefer the menus, but it may be worth highlighting some of the Session command shortcuts as well. We have attempted to step students through the necessary normal probability calculations (including inverse probability calculations), but afterwards you will want to highlight the different types of problems and how students can recognize what is being asked for in a particular problem.
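
The three kinds of calculation discussed above (a probability on the X scale versus the Z scale, the continuous complement rule, and an inverse probability) can be sketched in Python with `statistics.NormalDist`, as a stand-in for the Minitab menus. The model N(100, 15) and the value x = 120 are hypothetical.

```python
from statistics import NormalDist

# Hypothetical model: X ~ N(mu = 100, sigma = 15).
mu, sigma = 100, 15
X = NormalDist(mu, sigma)
Z = NormalDist()  # standard normal, N(0, 1)

x = 120
z = (x - mu) / sigma  # number of standard deviations above the mean

# The same probability, computed on the X scale and on the Z scale.
p_below_x = X.cdf(x)
p_below_z = Z.cdf(z)

# Complement rule for a continuous model: P(X > x) = 1 - P(X <= x),
# with no "shift by one" adjustment as there would be for discrete counts.
p_above = 1 - X.cdf(x)

# Inverse calculation: the value with 90% of the distribution below it.
x90 = X.inv_cdf(0.90)
z90 = Z.inv_cdf(0.90)

print(round(z, 3), round(p_below_x, 4), round(p_below_z, 4), round(p_above, 4))
print(round(x90, 2), round(mu + z90 * sigma, 2))
```

Both lines of output show the same answer twice, once in measurement units and once via the z-value, which is the correspondence questions (i) onward are after.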

Exploration 4.2 provides more practice, this time using the Java applet, and could be completed by students outside of class. (If you assign it outside of class, be sure to clarify how much output you want them to turn in.) You will want to make sure students are comfortable with the axis scales (note that the applet reports both the x values and the z values) and with interpreting the probability that is reported. This exploration also introduces “area between” calculations and provides the justification for the empirical rule that students first saw in Chapter 2.
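
The "area between" calculation behind the empirical rule takes only a few lines in Python (`statistics.NormalDist`); the second model, N(100, 15), is hypothetical and is included to show that the answer does not depend on μ and σ.

```python
from statistics import NormalDist

Z = NormalDist()
# Area between -k and +k standard deviations for the standard normal model.
empirical = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

# The same areas for any normal model, e.g. a hypothetical N(100, 15).
any_normal = NormalDist(100, 15)
same_areas = {k: any_normal.cdf(100 + 15 * k) - any_normal.cdf(100 - 15 * k) for k in (1, 2, 3)}

for k in (1, 2, 3):
    print(k, round(empirical[k], 4), round(same_areas[k], 4))
```

The rounded values 0.6827, 0.9545, and 0.9973 are the familiar 68-95-99.7 percentages.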

Section 4.3: Distributions of Sample Counts and Proportions

Timing/Materials: This section covers many ideas related to the sampling distribution of a sample proportion that are both important and difficult for students. It introduces students to the normal approximation to the binomial distribution and to z-tests and z-intervals for a proportion. For Investigation 4.3.1, you will want to bring in Reese’s Pieces candies. You may be able to find individual bags (“fun size”), or you may have to pour from a larger bag for each student. This takes some class time but is always a student favorite. We often pour candies into Dixie cups before class starts to minimize distribution time. We aim for at least 25 candies in each cup and then ask students to select the first 25 “at random” (without regard to color). You can try to give these instructions before students have read too much about the problem context. Also in Investigation 4.3.1, students quickly turn to a Java applet to take many more samples. Investigations 4.3.2 and 4.3.3 might be good ones to step students through slowly, without the “distractions” of technology. Investigation 4.3.4 assumes students will use technology to calculate probabilities, and you will want results from the earlier analysis of this study on hand. Similarly, technology is assumed for probability calculations in Investigation 4.3.5, including the 1 Proportion menu in Minitab. Exploration 4.3 involves the confidence interval simulation applet. Students can work through this together in pairs outside of class, but you will want to insist on time in class for debriefing their observations (and/or collecting written observations). You will want to carry out the “which prefer to hear first” survey in Investigation 4.3.6 to obtain results for your students, possibly ahead of time. This investigation also requires quick use or demonstration of the confidence interval simulation applet. This section could take 3 hours of class time.

In Investigation 4.3.1, we first return to some very basic questions about sampling variability. Hopefully these questions will feel like review for the students, but we think it is important to think carefully about these issues and to remind them of the terminology and of the idea of sampling variability. In (d), we often ask students to create a dotplot on the board, but you could instead type their results into Minitab and project the graph to the class. Weaker students can become overwhelmed by the reliance on mathematical notation at this point, and you will want to keep being explicit about what the symbols represent. In the investigation they are asked to think about the shape, center, and spread of the sampling distribution of sample proportions and to use the applet to confirm the empirical rule. You should frequently remind students that the “observational units” here are the samples. They also think about how the sample size and the probability of success, p, affect the behavior of the sampling distribution. At this point students should not be surprised that a larger sample produces less variability in the resulting sample proportions.
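
For instructors without the applet at hand, a rough Python equivalent of "take many more samples" is sketched below. The success probability p = 0.45 is a hypothetical value, not necessarily the one used in the book. For each sample size it reports the center and spread of the simulated sample proportions and the fraction lying within two standard deviations of their mean (the empirical-rule check).

```python
import random
import statistics


def simulate_phats(n, p, reps, seed=0):
    """Draw `reps` random samples of size n and return their sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]


p = 0.45  # hypothetical probability of success (e.g., an orange candy)
results = {}
for n in (25, 100):
    phats = simulate_phats(n, p, reps=5000)
    center = statistics.mean(phats)
    spread = statistics.stdev(phats)
    # Fraction of sample proportions within 2 SDs of their mean.
    within2 = sum(abs(ph - center) <= 2 * spread for ph in phats) / len(phats)
    results[n] = (center, spread, within2)
    print(n, round(center, 3), round(spread, 4), round(within2, 3))
```

Each simulated sample is one observational unit here, and the larger sample size visibly produces less spread.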

Investigation 4.3.2 steps students through the derivations of the mean and standard deviation of the sampling distribution of a sample proportion, including an introduction to, and practice with, the rules of expectation and variance. Mostly you will want to highlight how these expressions depend on n and p, and that the normal shape depends both on how large the sample size is and on how extreme (close to 0 or 1) the success probability is. This (p. 310) is the first time students are introduced to the phrase “technical conditions,” which will accompany all subsequent inferential procedures discussed in the course. You will probably have to discuss why the normal approximation is useful, since students have already used the binomial and hypergeometric “exact” distributions to make inferences. You might say that the normal distribution is another layer of approximation, just as the binomial approximates the hypergeometric when sampling from a large population. You might also highlight the importance of the normal model before the advent of modern computing. You will want to make sure everyone is comfortable with the calculations on p. 310, where all of the pieces are put together. Practice Problems 4.3.1 and 4.3.2 provide more practice with these calculations, and Practice Problem 4.3.3 is an optional exercise introducing students to continuity corrections.
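
The sketch below (Python, standard library) compares an exact binomial tail probability with its normal approximation, with and without a continuity correction, and confirms that working on the count scale (mean np, SD √(np(1−p))) or on the p̂ scale (mean p, SD √(p(1−p)/n)) gives the same approximation. The values n = 100, p = 0.5, k = 60 are hypothetical, and the np ≥ 10, n(1−p) ≥ 10 check is one commonly used form of the technical conditions, assumed here; the book's exact wording may differ.

```python
import math
from statistics import NormalDist


def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def normal_upper_tail(k, n, p, continuity=False):
    """Normal approximation to P(X >= k), using mean n*p and SD sqrt(n*p*(1-p))."""
    model = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    cutoff = k - 0.5 if continuity else k  # continuity correction moves the cutoff half a unit
    return 1 - model.cdf(cutoff)


n, p, k = 100, 0.5, 60  # hypothetical values
print("conditions met:", n * p >= 10 and n * (1 - p) >= 10)

exact = binom_upper_tail(k, n, p)
plain = normal_upper_tail(k, n, p)
corrected = normal_upper_tail(k, n, p, continuity=True)

# The same (uncorrected) approximation on the p-hat scale: P(p-hat >= k/n).
phat_model = NormalDist(p, math.sqrt(p * (1 - p) / n))
plain_phat = 1 - phat_model.cdf(k / n)

print(round(exact, 4), round(plain, 4), round(corrected, 4), round(plain_phat, 4))
```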

Investigation 4.3.3 refers to the context of a statistical investigation, and students must consider hypothesis statements and p-values, as they have before, but now using the normal model to perform the (approximate) calculations. You will want to emphasize that the reasoning process is the same. Some students will want to debate the “logic” of this context (for example, the assumption that the proportion of women among athletes should be the same as the proportion of women among students; the idea that the data constitute a sample from a process is also not straightforward here). You will want to be clear about what this p-value does and does not imply, and that many other issues are involved in such a legal case (e.g., surveys of student interest and demonstrable efforts to increase the participation of women are also used in determining Title IX compliance). The idea of a test statistic is formally introduced on p. 313 (one advantage of using the normal distribution), and the discussion on p. 314 reminds students of the different methods they have encountered so far for finding p-values with a single categorical variable. Encourage students to highlight the summary of the structure of a test of significance on p. 314-5, as they will want to return to it often from this point in the course forward. You might also show them how this structure applies to the earlier randomization tests from Chapters 1 and 2.
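
A compact Python sketch of the one-proportion z-test described above, computing the test statistic z₀ = (p̂ − p₀)/√(p₀(1−p₀)/n) and its p-value from the normal model. The counts (60 successes in 100 trials, p₀ = 0.5) are hypothetical and are not the Title IX data.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0, alternative="greater"):
    """Large-sample z-test for a population proportion.

    Returns the test statistic z0 and its p-value under H0: p = p0.
    """
    phat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # SD of p-hat when H0 is true
    z0 = (phat - p0) / se0
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z0)
    elif alternative == "less":
        p_value = Z.cdf(z0)
    else:  # two-sided
        p_value = 2 * (1 - Z.cdf(abs(z0)))
    return z0, p_value


# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5 vs Ha: p > 0.5.
z0, p_value = one_proportion_z_test(60, 100, 0.5)
print(round(z0, 3), round(p_value, 4))
```

The structure (state hypotheses, compute a test statistic, find a p-value, interpret it) is the same one students used with binomial and randomization p-values; only the model used for the last calculation has changed.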

Investigation 4.3.4 returns to an earlier study and re-analyzes the data with the normal approximation. You will want to have the earlier binomial calculation (from Investigation 3.3.5) handy for reference. After question (h), this investigation continues on to calculate Type I and Type II error probabilities using the normal distribution. Some students will find this treatment of power easier to follow than the earlier use of the binomial distribution, but you will want to make sure they are comfortable with the standard structure of tests of significance before continuing to these more subtle issues. We also suggest drawing many pictures of normal curves and rejection regions to help students visualize these ideas, as with the sketches on p. 319.
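
The power calculation can be sketched in two steps in Python: find the rejection cutoff for p̂ under the null model, then find the probability of exceeding that cutoff under an alternative value of p. This sketch assumes a one-sided (greater-than) alternative, α = 0.05, and hypothetical values p₀ = 0.5 and n = 100.

```python
import math
from statistics import NormalDist

Z = NormalDist()


def rejection_cutoff(p0, n, alpha=0.05):
    """Smallest p-hat that leads to rejecting H0: p = p0 in favor of Ha: p > p0."""
    return p0 + Z.inv_cdf(1 - alpha) * math.sqrt(p0 * (1 - p0) / n)


def power(p0, pa, n, alpha=0.05):
    """Approximate P(reject H0) when the true proportion is pa, via the normal model."""
    cutoff = rejection_cutoff(p0, n, alpha)
    sd_alt = math.sqrt(pa * (1 - pa) / n)  # SD of p-hat under the alternative
    return 1 - NormalDist(pa, sd_alt).cdf(cutoff)


p0, n = 0.5, 100  # hypothetical null value and sample size
print("cutoff:", round(rejection_cutoff(p0, n), 4))
print("type I error:", round(power(p0, p0, n), 4))  # equals alpha when pa = p0
for pa in (0.55, 0.60, 0.65):
    pw = power(p0, pa, n)
    print(pa, "power:", round(pw, 4), "type II error:", round(1 - pw, 4))
```

Sketching both normal curves (one centered at p₀, one at the alternative) with the shared cutoff is the picture students should have in mind when reading this output.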

Similarly, Investigation 4.3.5 shows how much more straightforward it is to calculate a confidence interval using the normal model (though remind students that it still represents an interval of plausible values of the parameter). Students are introduced to the terms standard error and margin of error. This would be a good place to bring in some recent news reports (or have students find and bring in their own) to show how these terms are increasingly used in the popular media. A subtle point you may want to emphasize is that “margin of error” and “confidence level” measure different types of “error.” You might also emphasize the general structure “estimate ± margin of error,” or “estimate ± critical value × standard error,” as described in the box on p. 324, because these forms arise again (e.g., with confidence intervals for a population mean).
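
The "estimate ± critical value × standard error" structure translates directly into code. This Python sketch computes the (Wald) z-interval for a proportion and, by solving the margin-of-error formula for n, the sample size needed for a desired margin of error (the calculation introduced in Practice Problem 4.3.9). The data (50 successes in 100 trials) and the planning value p = 0.5 are hypothetical.

```python
import math
from statistics import NormalDist


def z_interval(successes, n, confidence=0.95):
    """Wald z-interval for a proportion: estimate ± critical value × standard error."""
    phat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value z*
    se = math.sqrt(phat * (1 - phat) / n)  # standard error of p-hat
    margin = z_star * se  # margin of error
    return phat - margin, phat + margin


def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin` (p_guess = 0.5 is conservative)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


lo, hi = z_interval(50, 100)  # hypothetical: 50 successes in 100 trials
print(round(lo, 3), round(hi, 3))  # about 0.402 to 0.598
print(sample_size(0.03))  # 1068 for a 3-point margin at 95% confidence
```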

Exploration 4.3 should help students further understand the proper interpretation of confidence. This exploration can be completed outside of class, but you will probably want to tell students whether you consider their ability to correctly interpret confidence a priority. (We often tell them in advance that it will be an exam question and warn them that a definition will be hard to “memorize” because of the length of a correct interpretation and our insistence on context, so they need to understand the process.) We hope the applet provides a visual image they can use for future reference, for example by showing that the parameter value does not change; what varies is the sample result and therefore the interval. Although we do want students to understand the duality between level of significance and confidence level, we encourage you to have them keep those as separate terms. One place you can trim time is how much you focus on sample size determination calculations, which are introduced in Practice Problem 4.3.9.
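
A bare-bones Python version of what the confidence interval simulation applet does: fix the parameter, draw many random samples, build a 95% Wald interval from each, and count how many intervals capture the parameter. The values n = 100 and p = 0.5 are hypothetical.

```python
import math
import random
from statistics import NormalDist

Z_STAR = NormalDist().inv_cdf(0.975)  # 95% critical value


def wald_covers(successes, n, p):
    """Does the 95% Wald interval from this sample capture the true p?"""
    phat = successes / n
    margin = Z_STAR * math.sqrt(phat * (1 - phat) / n)
    return phat - margin <= p <= phat + margin


def simulated_coverage(n, p, reps=10_000, seed=0):
    """Fraction of simulated intervals that contain p.

    The parameter p stays fixed; only the random sample (and hence the interval) changes.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        hits += wald_covers(successes, n, p)
    return hits / reps


coverage = simulated_coverage(100, 0.5)  # hypothetical n and p
print(coverage)
```

The printed fraction describes the long-run behavior of the method, which is exactly what "95% confidence" refers to; no single interval has a 95% chance of containing p once it has been computed.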

Investigation 4.3.6 provides students with a scenario where the normal approximation criteria are (we expect) not met, so an alternative method should be considered. We present the formula for the “Wilson Estimator” and then use the applet to have students explore the improved coverage properties of the “adjusted Wald intervals.” You may want to discuss some of the intuitive logic of why this would be a better method (but again focus on how confidence is a statement about the method, not about individual intervals). In particular, in the applet they should see how samples that previously produced intervals of length zero (because the sample proportion was 0 or 1) now produce meaningful intervals. Some statisticians argue that this “adjusted Wald” method should always be used instead of the original Wald method, but since Minitab does not yet have this option built in, and because the results are virtually identical for large sample sizes, we tend to have students consider it separately. We also like to emphasize how recently (since the year 2000 or so) this method has come into the mainstream, to help highlight the dynamic and evolving nature of the discipline of statistics. We also like to point out to students that they now have the knowledge and skills to investigate how well one statistical method performs compared to another.
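
Students' ability to compare methods can be made concrete with an exact coverage calculation: sum the binomial probabilities of every possible count x whose interval captures p. This Python sketch assumes the adjusted Wald interval is centered at the plus-four estimate (x + 2)/(n + 4), which we take to match the book's Wilson estimator at 95% confidence; check the formula in the text before relying on it. The small sample n = 10 and extreme p = 0.05 are hypothetical.

```python
import math
from statistics import NormalDist

Z_STAR = NormalDist().inv_cdf(0.975)


def wald(x, n):
    """Original Wald interval centered at x/n."""
    phat = x / n
    m = Z_STAR * math.sqrt(phat * (1 - phat) / n)
    return phat - m, phat + m


def adjusted_wald(x, n):
    """Interval centered at the plus-four estimate (x + 2)/(n + 4) (assumed form)."""
    ptilde = (x + 2) / (n + 4)
    m = Z_STAR * math.sqrt(ptilde * (1 - ptilde) / (n + 4))
    return ptilde - m, ptilde + m


def exact_coverage(method, n, p):
    """Sum the binomial probabilities of every x whose interval captures p."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = method(x, n)
        if lo <= p <= hi:
            total += math.comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


n, p = 10, 0.05  # hypothetical small sample with an extreme success probability
print("Wald:", round(exact_coverage(wald, n, p), 3))
print("adjusted Wald:", round(exact_coverage(adjusted_wald, n, p), 3))
print("Wald interval when x = 0:", wald(0, n))  # zero length
```

In this case the Wald method's coverage falls far below 95%, largely because the most likely sample (x = 0) produces a zero-length interval, while the adjusted interval stays useful.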

All of these procedures (and technology instructions) are summarized on p. 334-5, another set of pages you will want to remind them to keep handy. Let students know if you will be requiring them to carry out these calculations in other ways.

Section 4.4: Distributions of Sample Means

Timing/Materials: Investigations 4.4.1 and 4.4.2 make heavy use of Minitab (version 14), with students creating more Minitab macros. Exploration 4.4 uses applets to visually reinforce some of the material in these first two investigations while also extending it. Use of Minitab is also assumed in Investigation 4.4.4, where you might consider having students collect their own shopping data for two local stores. A convenient method is to randomly assign each student a product (with size and brand details) and then ask them to obtain the price of that product at both stores. This appears to be less burdensome for students than asking them to find several products, but you will still want to allow several days to collect the data. The data can then be pooled across students to construct the full data set. To obtain a sampling frame, you can try to convince a local store to supply an inventory list, or you can use a shopping receipt from your family or from a student (or a sample of students). This section will probably take at least 3 hours of class time.

This section parallels the earlier discussions in Section 4.3 but considers quantitative data and therefore focuses on distributions of sample means rather than proportions. It introduces students not only to the Central Limit Theorem for a sample mean but also to t-distributions, t-tests, and t-intervals, so it includes many important ideas. Students work through several technology explorations and you will want to help emphasize the “big picture” ideas. We believe that the lessons learned should be more lasting by having students make the discoveries themselves rather than being told (e.g., this distribution will be normal). In this section, students will be able to apply many of the simulation and probability tools and habits of mind learned earlier in the course. You will of course need to keep reminding students to carefully distinguish between the population, the sample, and the sampling distribution. You may also want to emphasize in Investigations 4.4.1 and 4.4.2 that these are somewhat artificial situations in that students are asked to treat the data at hand as populations and to take random samples from them; this is done for pedagogical purposes, but in real studies one only has access to the sample at hand.

Investigation 4.4.1 gives students two different populations, one close to normal and the other sharply skewed, and asks them to take random samples and study the distributions of the resulting sample means. Students who have become comfortable with Minitab macros will work steadily through the investigation, but those who have struggled with macros will move slowly and may need some help. When running the macro on p. 338, it is helpful to execute the macro once and create the dotplots of C2 and C3. If these windows are left open (and the automatic graph updating feature is turned on), then when you run the macro more times, Minitab (version 14) should add observations to the windows and automatically update the displays. (This might be better as a demonstration.) Once students get a feel for how the samples are changing and how the sampling distribution is being built up, closing these windows on the fly will allow the macro to run much more quickly. Make sure that students realize the differences in results between the normal-looking and the skewed populations, which they are to summarize in (k). Once students have made the observations through p. 341, they are ready for the summary, the Central Limit Theorem. We try to emphasize that there’s nothing magical about the “n>30” criterion; rather, we stress that the more non-normal the population, the larger the sample size needed for the normal approximation to be accurate. You will again need to decide whether you want to present them with the formula σ/√n and have them verify that it matches the simulation results, and/or have them go through the derivation themselves. It is important to again give students plenty of practice in applying the CLT to solve problems (e.g., p. 342-3).
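
As a Python alternative to the Minitab macro, the sketch below draws repeated samples from a sharply skewed population (an exponential distribution with mean 1 and SD 1, a hypothetical stand-in for the book's skewed population), then compares the SD of the sample means with σ/√n and tracks how the skewness of the sample means shrinks as n grows.

```python
import math
import random
import statistics


def skewness(values):
    """Sample skewness: average cubed z-score (near 0 for a symmetric distribution)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def draw_exponential(rng):
    """One value from an exponential population with mean 1 and SD 1."""
    return rng.expovariate(1.0)


def sample_means(draw, n, reps, rng):
    """Means of `reps` random samples of size n, each value produced by draw(rng)."""
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]


rng = random.Random(0)
sigma = 1.0  # population SD of the exponential population above
results = {}
for n in (2, 10, 30):
    means = sample_means(draw_exponential, n, 5000, rng)
    results[n] = (statistics.stdev(means), sigma / math.sqrt(n), skewness(means))
    sd_sim, sd_theory, skew = results[n]
    print(n, round(sd_sim, 3), round(sd_theory, 3), round(skew, 2))
```

The simulated SDs track σ/√n at every sample size, while the shape only gradually approaches normal, which is why a fixed rule like n > 30 is at best a rough guide.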

Investigation 4.4.2 then continues to have students explore coverage properties of confidence interval procedures, and it motivates the need for t intervals to replace z intervals when the population standard deviation is unknown. We think this discovery is especially effective for students to make on their own; many students are surprised in (c) to see that the normal procedure does not produce close to 95% coverage here. Many students also find the normal probability plots in (e) very helpful, because it is not easy to distinguish a t- from a z-distribution based on histograms or dotplots alone. After students make these observations, we always focus on t intervals (instead of z intervals) with sample means. Again, if you are short on time, you may want to streamline some of this discussion, but we also encourage you to use it as a vehicle to review earlier topics (e.g., confidence, critical values, technical conditions). In particular, you can remind them of the general structure that confidence intervals share: estimate ± margin of error, or estimate ± critical value × standard error.
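
The discovery in (c) can be reproduced in a few lines of Python: with small samples from a normal population and σ treated as unknown, the interval x̄ ± 1.96·s/√n captures μ noticeably less than 95% of the time, while replacing 1.96 by the t critical value restores the coverage. The population (μ = 50, σ = 10) and n = 5 are hypothetical; the t critical value 2.776 for 4 degrees of freedom is taken from a t table.

```python
import math
import random
import statistics

Z_STAR = 1.96   # normal critical value for 95%
T_STAR = 2.776  # t critical value for 95% with n - 1 = 4 df (from a t table)


def coverage(n, critical, reps=10_000, seed=0, mu=50.0, sigma=10.0):
    """Fraction of intervals xbar ± critical * s/sqrt(n) that capture mu.

    Samples come from a normal population with hypothetical mu and sigma;
    sigma is treated as unknown, so each interval uses the sample SD s.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        hits += abs(xbar - mu) <= critical * se
    return hits / reps


# The same seed gives both procedures the same 10,000 samples.
z_cov = coverage(5, Z_STAR)
t_cov = coverage(5, T_STAR)
print("z with s:", z_cov, " t:", t_cov)
```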

Exploration 4.4 is useful for providing students with visual images of the intervals while exploring coverage properties and widths (as in the previous investigation). This exploration also leads students to examine the robustness of t-intervals by considering different population shapes. The second applet asks them to explore how the t-interval procedure behaves for a uniform, a normal, and an exponential population. We want them to review the behavior of the sample and the sampling distribution (and be able to predict how each will behave) and hopefully by the end be able to explain why the sample size does not need to be as large with the (symmetric!) uniform distribution versus the exponential distribution to achieve the desired coverage.

Investigation 4.4.3 is intended as an opportunity for students to apply their knowledge and to make the natural leap to the one-sample t test statistic. This is another good study for discussing data collection issues. Also, in this case, the score of an individual game might be of more interest than the population mean, so we introduce the idea of a prediction interval and its formula for quantitative data. Be ready for students to struggle with the distinction between a confidence interval and a prediction interval. We do not show them a way to obtain this calculation from Minitab (because we don’t know one!). You should also remind students that the prediction interval method is much more sensitive to the normality condition. We summarize the t procedures and technology tools on p. 359-360. You may want to give students the option of using either Minitab or the applet to perform such calculations. The applet has the advantage of automatically providing a sketch of the sampling distribution model, which we feel you should continue to require as part of the details they include in their analyses. The applet also provides the 95% confidence interval more directly. In Minitab, you must make sure the alternative is set to “not equal” to obtain a two-sided confidence interval (we do not discuss one-sided intervals in the book), but Minitab also allows you to change the confidence level.
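
Since Minitab does not supply the prediction interval directly, here is a Python sketch that computes the one-sample t statistic, the t confidence interval for μ, and a prediction interval for one new observation side by side. It assumes the standard prediction interval form x̄ ± t*·s·√(1 + 1/n), which we take to match the book's formula. The ten game scores, the null value 200, and the table value t* = 2.262 (9 df) are hypothetical or taken from a t table.

```python
import math
import statistics

T_STAR_9 = 2.262  # t critical value for 95% with 9 df (from a t table)

scores = [212, 187, 235, 199, 248, 176, 221, 205, 193, 230]  # hypothetical game scores
n = len(scores)
xbar = statistics.fmean(scores)
s = statistics.stdev(scores)

# One-sample t statistic for a hypothetical H0: mu = 200.
mu0 = 200
t0 = (xbar - mu0) / (s / math.sqrt(n))

# Confidence interval for the population mean: xbar ± t* s / sqrt(n).
ci_half = T_STAR_9 * s / math.sqrt(n)
# Prediction interval for one new observation: xbar ± t* s sqrt(1 + 1/n).
pi_half = T_STAR_9 * s * math.sqrt(1 + 1 / n)

print(f"t0 = {t0:.3f}")
print(f"95% CI for mu: {xbar - ci_half:.1f} to {xbar + ci_half:.1f}")
print(f"95% PI for one game: {xbar - pi_half:.1f} to {xbar + pi_half:.1f}")
```

The prediction interval is wider by a factor of √(n + 1), because it must account for the variability of an individual score and not just the uncertainty in estimating μ; that is also why it leans so heavily on the normality condition.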

Investigation 4.4.4 introduces paired t procedures as an application of the above methods to the differences. This is a rich investigation that first asks students to conduct some data exploration and to consider outliers. There is an obvious outlier, and when students look at the Data Window they find that the products were not actually identical. They can then remove such items (any for which the size/brand combination does not match exactly) from the list before the analysis continues. You might want to emphasize that this type of exploration, cleaning, and data management is a large component of statistical analyses. While summarizing this investigation, you should emphasize the advantage of using a paired design in the first place.
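
The paired analysis is just a one-sample t analysis on the differences, after cleaning. This Python sketch uses hypothetical price records with a flag marking whether the two stores' items truly match; the mismatched item (with a large price difference) plays the role of the outlier students find in the Data Window.

```python
import math
import statistics

# Hypothetical records: (product, price at store A, price at store B, identical item?)
records = [
    ("cereal 12 oz", 3.49, 3.29, True),
    ("peanut butter 18 oz", 2.79, 2.59, True),
    ("soup 10.5 oz", 1.19, 1.29, True),
    ("coffee 11.5 oz", 4.99, 4.69, True),
    ("pasta 16 oz", 1.09, 0.99, True),
    ("detergent", 9.99, 4.49, False),  # sizes differ: not a valid pair
    ("yogurt 6 oz", 0.89, 0.69, True),
]

# Data cleaning: keep only pairs where the size/brand combination matches exactly.
pairs = [(a, b) for _, a, b, same in records if same]
diffs = [a - b for a, b in pairs]  # one difference per product

n = len(diffs)
dbar = statistics.fmean(diffs)
sd = statistics.stdev(diffs)
t0 = dbar / (sd / math.sqrt(n))  # paired t statistic for H0: mu_d = 0, df = n - 1

print(n, round(dbar, 3), round(sd, 3), round(t0, 2))
```

Because each product serves as its own control, product-to-product price variation drops out of the differences, which is the advantage of the paired design worth stressing in the summary.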

Section 4.5: Bootstrapping

Timing/Materials: Heavy use of Minitab is required in this section. Some of these ideas are very difficult for students, so you may want to lead them through this section more than most. If you do not have enough time in your course, this section can be skipped; later topics (except for Section 5.5, which could also be skipped) do not depend on students having seen these ideas.

Many advocate bootstrapping as a more modern, flexible procedure for statistical inference when the model-based methods students have seen in this chapter do not apply. Advocates also see bootstrapping as helping students understand the intuition of repeated sampling. Furthermore, instead of assuming a normally distributed sampling distribution, bootstrapping relies only on the “model” that the sample obtained reflects the population (in fact, it treats the population as the sample repeated infinitely many times). In our brief experience teaching bootstrapping (as an earlier topic in the course), we found it was difficult for students to get past the “sampling with replacement” philosophy and the theoretical details in a short amount of time. We subsequently moved the bootstrapping material to the end of Chapter 4 so that students would already be comfortable with the “traditional” procedures and the idea of a sampling distribution. This helps them see how the bootstrapping approach differs while, hopefully, having enough background to understand the overall goals.

In Investigation 4.5.1, we begin by having students apply the theoretical results to the Gettysburg Address sampling to see that the normal and t distributions are not good models for smaller sample sizes. We provide more pictures and results than usual in this section, but you can have students recreate the simulations themselves. Since the “sampling with replacement” approach feels mysterious to many students, we have them take a few samples to see that some words occur more than once and that we are just sampling from an “infinite” population that has the same characteristics as the observed sample. Then we have them verify that the bootstrap distribution has approximately the same shape and spread as the empirical sampling distribution of means. One way to approach bootstrapping is as a way to estimate the standard error of a statistic (like the median or the trimmed mean) that does not have nice theoretical results (based on rules of variance). You can either stop here or continue on p. 369 to apply a “pivot method” to construct a bootstrap confidence interval. The notation becomes complicated and the results are not intuitive, but they do help remind students of bigger issues such as the meaning of confidence and the effect of the confidence level on the width of the interval. The bootstrap procedure is applied in Investigation 4.5.2. In Investigation 4.5.3 the behavior of the trimmed mean is explored, in a context where the mean is not a reasonable parameter to study due to the skewness and the truncated nature of the data. This “strange statistic” demonstrates a key advantage of bootstrapping (as well as the beauty of the CLT when it does apply). We found that the 25% trimmed mean performs reasonably well. Carrying out this calculation in Minitab is a little strange, but students should understand the commands in (d).
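
For instructors who want a non-Minitab reference implementation, here is a Python sketch of the bootstrap ideas above: resampling with replacement, a bootstrap standard error for a trimmed mean, and a pivot-style interval. Two assumptions are flagged loudly: the trimmed mean here drops 25% of the sorted values from each end, and the interval uses the basic bootstrap form (2·estimate − upper quantile, 2·estimate − lower quantile), which we assume corresponds to the book's "pivot method"; compare with p. 369 before using it with students. The sample values are hypothetical.

```python
import random
import statistics


def trimmed_mean(values, proportion=0.25):
    """Mean after dropping `proportion` of the sorted values from EACH end (assumed convention)."""
    xs = sorted(values)
    k = int(len(xs) * proportion)
    return statistics.fmean(xs[k:len(xs) - k])


def bootstrap(sample, stat, reps=5000, seed=0):
    """Statistic recomputed on resamples drawn WITH replacement, same size as the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(reps)]


def pivot_interval(sample, stat, boot, confidence=0.95):
    """Basic ("pivot") bootstrap interval: (2*estimate - upper q, 2*estimate - lower q)."""
    est = stat(sample)
    ordered = sorted(boot)
    k = int(len(ordered) * (1 - confidence) / 2)  # index of the lower quantile
    lower_q, upper_q = ordered[k], ordered[-k - 1]
    return 2 * est - upper_q, 2 * est - lower_q


# Hypothetical right-skewed sample.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 12, 15, 21]
boot = bootstrap(sample, trimmed_mean)
print("estimate:", round(trimmed_mean(sample), 3))
print("bootstrap SE:", round(statistics.stdev(boot), 3))
print("pivot interval:", pivot_interval(sample, trimmed_mean, boot))
```

Swapping `trimmed_mean` for `statistics.median` (or any other function of a list) requires no new theory, which is the flexibility that makes bootstrapping attractive for "strange statistics."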

Examples

Note that this is the first chapter to include two worked-out examples. One deals with z-procedures for a proportion and the other with t-procedures for a mean. Earlier drafts did not include these examples, and students strongly requested that some examples be included, but we have since found that our students tend not to notice or study from the examples. You might encourage students to read them carefully, and especially to answer the questions themselves first before reading the model solutions provided.

Summary

The Chapter Summary includes a table of the different one-sample procedures learned for binary and quantitative data. We like to use different notation (z* vs. z₀) to help students distinguish between critical values and test statistics, which are a common source of confusion.

Exercises

Issues of probability distributions (as in Sections 4.1 and 4.2) are addressed in Exercises #1-20. Issues of sampling distributions and inferences for a proportion are addressed in Exercises #21-46. Issues of sampling distributions and inferences for a mean are addressed in Exercises #42 and #47-65.