**Chapter 4**

This chapter continues the theme of Chapter 3, the behavior of
random samples from a population and how knowledge of that behavior allows us
to make inferences about the population.
Most of the chapter, beginning with Section 4.3, is devoted to models
that apply when large samples are selected, namely the normal and *t* distributions. We begin with some background on probability
models in general and the normal distribution in particular. Then we focus on the Central Limit Theorem
for both categorical (binary) and quantitative data, leading students to
discover the need for the *t*
distribution when drawing inferences about the population mean. The last
section, on bootstrapping, provides alternative inferential methods when the
Central Limit Theorem does not apply (e.g., small sample, sample statistics
other than sample proportions or means).

**Section 4.1: Models of Quantitative Data**

*Timing/Materials*: Minitab (including features new
to version 14) is used heavily in Investigations 4.1.1 and 4.1.2. You may want to assign some of the reading
(e.g., p. 282-3) to outside of class.
Probability plots (Investigation 4.1.2) may not be on your syllabus, but
we ask students to use these plots often and so we do not recommend skipping
them. This section can probably be
covered in 60-75 minutes.

In this section we try to convey the notion of a model, in particular, probability models for quantitative variables. Investigation 4.1.1 introduces the idea that very disparate variables can follow a common (normal) model (with different parameter values). We do not spend a long time on non-normal models (e.g., exponential, gamma) but feel students should get a flavor for non-symmetric models as well and realize that the normal model does not apply to all variables. The subsequent practice problems lead students to overlay different model curves on data histograms. (Minitab 14 automatically scales the curve and thus we do not have them convert the histogram to the density scale first.)

In Investigation 4.1.2, probability plots are introduced as a way to help assess the fit of a probability model to data. There is some debate on the utility of probability plots, but we feel they provide a better guide than simple histograms for judging the fit of a model, especially for small data sets. Still, it can take students a while to become comfortable reading these graphs. We attempt to focus on interpreting these plots by looking for a linear pattern and do not ask students to learn the mechanics behind the construction of the graphs. We use questions (h)-(j) to help them gain some experience in judging the behavior of these graphs when the data are known to come from a normal distribution; many students are surprised at how much variation arises in samples, and therefore in probability plots, even when the population really follows a normal distribution. Some nice features in Minitab 14 make it easy to quickly change the model that is being fit to the data (both in overlaying the curve on the histogram and in the probability plot). If you are very short on time, Investigation 4.1.2 could be skipped but we will make use of probability plots in later chapters.
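The calibration exercise in questions (h)-(j), repeatedly sampling from a known normal population and examining the resulting probability plots, can also be reproduced outside Minitab. The sketch below uses Python as a stand-in for the book's software; the parameter values are illustrative assumptions, not from the text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulate a small sample from a known normal population,
# as students do to calibrate their expectations (illustrative parameters).
sample = rng.normal(loc=98.6, scale=0.7, size=25)

# probplot returns the (theoretical quantile, ordered data) pairs
# and the slope/intercept/r of the least-squares line through them.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")

# A roughly linear plot (r near 1) is consistent with normality.
print(f"correlation of probability plot: {r:.3f}")
```

Re-running with different seeds gives students a feel for how much the plot's linearity varies even when the population really is normal.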

**Section 4.2: Applying the (Normal) Probability Model**

*Timing/Materials*: Minitab is used extensively in Investigations
4.2.1 and 4.2.2. Investigation 4.2.3
centers on a java applet, which has the advantage
of supplying the visual image of the normal model. You may wish to begin with Minitab until
students are comfortable drawing their own sketches
and thinking carefully about the scaling and the labeling of the horizontal
axis. This section probably requires at
least 90 minutes of class time (less if your students have seen normal probability
calculations previously).

In Investigation 4.2.1, the transition is made to using the theoretical models to make probability statements. The last box on p. 290 will be an important one to emphasize. We immediately turn the normal probability calculation over to Minitab and do not use a normal probability table at all. (In fact, there is no normal probability table anywhere in the book, as we do not feel that learning to use a table is necessary when students can use a software package, a java applet, or even a graphing calculator to perform the calculations quite efficiently. This also has implications for the testing environment.) We emphasize to students that it is important to continue to accompany these calculations with well-labeled sketches of probability curves and to distinguish between the theoretical probability and the observed proportion of observations in sample data. By the end of Investigation 4.2.1, we would like students to be comfortable applying a model to a situation where they don’t have actual observations. Such calculations are made in Practice Problems 4.2.1 and 4.2.2, including practice with elementary integration techniques and simple geometric methods for finding areas under probability “curves.” You could supplement this material with more on non-normal probability models, and you could make more use of calculus if you would like. In particular, many of the exercises at the end of the chapter explore more mathematical ideas, often involving calculus.
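For instructors who prefer a scriptable alternative to Minitab's menus, the basic normal probability calculations can be sketched as follows; Python and the model parameters here are illustrative assumptions, not the book's examples.

```python
from scipy import stats

# Hypothetical model: measurements ~ N(98.6, 0.7)
# (illustrative parameter values, not from the text).
mu, sigma = 98.6, 0.7
X = stats.norm(mu, sigma)

# P(X < 98) -- the kind of value Minitab's cumulative probability gives
p_below = X.cdf(98.0)

# P(X > 100) via the complement rule for continuous distributions
p_above = 1 - X.cdf(100.0)

print(f"P(X < 98)  = {p_below:.4f}")
print(f"P(X > 100) = {p_above:.4f}")
```

A well-labeled sketch of the curve should still accompany any such calculation.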

We continue to apply the normal probability model to real
sample data in Investigation 4.2.2 and you will want to make sure students are
becoming comfortable with the notation and Minitab. On p. 294, we discuss the complement rule for
these continuous distributions and you will want to highlight this compared to
the earlier adjustments for discrete distributions (once students “get” the
discrete adjustment, they tend to over-apply it). Beginning with question (i),
this investigation also tries to motivate the correspondence between the
probabilities calculated in terms of *X*
from a N(μ, σ) distribution and in terms of *Z* from the Standard Normal
distribution. This conversion may not
seem meaningful to students at first (both for the ability to convert the
measurements to the same scales and since we are not having them look the *z*-score up on a table) so you will want to remind them of the
utility of reporting the *z*-value
(presenting values on a common scale of standard deviations away from the mean,
which enables us to “compare apples and oranges”). In using Minitab, most students will prefer
using the menus but it may be worth highlighting some of the Session command
short cuts as well. We have attempted to
step students through the necessary normal probability calculations (including
inverse probability calculations) but afterwards you will want to highlight the
different types of problems and how they can recognize what is being asked for
in a particular problem.
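The correspondence between probabilities for *X* from a N(μ, σ) distribution and for *Z* from the Standard Normal, along with an inverse probability calculation, can be made concrete in a short script (a minimal sketch with assumed parameter values; the book itself uses Minitab):

```python
from scipy import stats

mu, sigma = 98.6, 0.7    # hypothetical N(mu, sigma) model

x = 99.5
z = (x - mu) / sigma     # standardize: standard deviations above the mean

# The two probability statements agree
p_x = stats.norm(mu, sigma).cdf(x)
p_z = stats.norm(0, 1).cdf(z)

# Inverse calculation: which x value cuts off the bottom 95%?
x_95 = stats.norm(mu, sigma).ppf(0.95)

print(f"z = {z:.3f}, P(X <= x) = {p_x:.4f} = P(Z <= z) = {p_z:.4f}")
print(f"95th percentile: {x_95:.2f}")
```

The *z*-value puts measurements on a common scale of standard deviations from the mean, which is exactly the "comparing apples and oranges" point above.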

Exploration 4.2 provides more practice, this time using the java
applet, and could be completed by students outside of class. (If you use the
latter option, be sure to clarify how much output you want them to turn
in). You will want to make sure students
are comfortable with the axis scales (note the applet reports both the *x* values and the *z* values) and in interpreting the probability that is
reported. This investigation also introduces
“area between” calculations and provides the justification of the empirical
rule that students first saw in Chapter 2.

**Section 4.3: Distributions of Sample Counts and Proportions**

*Timing/Materials*: This section covers many important, and
difficult for students, ideas related to the sampling distribution of a sample
proportion. It introduces students to
the normal approximation to the binomial distribution and to *z*-tests and *z*-intervals for a proportion.
For Investigation 4.3.1, you will want to bring in Reese’s Pieces
candies. You may be able to find the
individual bags (“fun size”) or you may have to pour from a larger bag to each
individual student. This takes some time
in class but is always a student favorite.
We often pour *at least* 25 candies into each cup, and then ask students to select
the first 25 “at random” (without regard to color). You can try to give these instructions before
they have read too much about the problem context. Also in Investigation 4.3.1, they quickly
turn to a java applet to take many more samples. Investigations 4.3.2 and 4.3.3 might actually
be good ones to slowly step students through without the “distractions” of
technology. Investigation 4.3.4 assumes
students will use technology to calculate probabilities and you will want
results from the earlier analysis of this study on hand. Similarly, technology is assumed for
probability calculations in Investigation 4.3.5 including the 1 Proportion menu in Minitab.
Exploration 4.3 involves the confidence interval simulation applet. Students can work through this together in
pairs outside of class but you will want to insist on time in class for
debriefing of their observations (and/or collection of written
observations). You will want to carry
out the “which prefer to hear first” survey in Investigation 4.3.6 to obtain
the results for your students, possibly ahead of time. This investigation also requires quick use/demonstration
of the confidence interval simulation applet.
This section could take 3 hours of class time.

In Investigation 4.3.1, we first return to some very basic questions about sampling variability. Hopefully these questions will feel like review for the students but we think it is important to think carefully about these issues and to remind them of the terminology and of the idea of sampling variability. In (d), we often ask students to create a dotplot on the board, but you could alternatively type their results into Minitab and then project the graph to the class. Weaker students can become overwhelmed by the reliance on mathematical notation at this point and you will want to keep being explicit about what the symbols represent. In the investigation they are asked to think about the shape, center, and spread of the sampling distribution of sample proportions as well as using the applet to confirm the empirical rule. You should frequently remind students that the “observational units” are the samples here. They also think about how the sample size and the probability of success, p, affect the behavior of the sampling distribution. At this point students should not be surprised that a larger sample produces less variability in the resulting sample proportions.
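The applet's simulation of many samples can also be mimicked in a few lines of code. This sketch assumes samples of n = 25 candies and an orange proportion of p = 0.45 (illustrative values); Python stands in for the applet:

```python
import numpy as np

rng = np.random.default_rng(2)

p, n, reps = 0.45, 25, 10000   # assumed proportion of orange candies

# Each repetition is one sample of 25 candies; record the sample proportion.
# Here the "observational units" are the samples themselves.
phats = rng.binomial(n, p, size=reps) / n

print(f"mean of sample proportions: {phats.mean():.3f}   (p = {p})")
print(f"SD of sample proportions:   {phats.std():.3f}")
print(f"theory says SD = sqrt(p(1-p)/n) = {np.sqrt(p*(1-p)/n):.3f}")
```

Increasing n shows directly that larger samples produce less variability in the sample proportions.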

Investigation 4.3.2 steps them through the derivations of
the mean and standard deviation of the sampling distribution of a sample
proportion including introduction to and practice for rules of expectation and
variance. Mostly you will want to
highlight how these expressions depend on *n*
and p, and that the normal shape depends on both
how large the sample size is and how extreme (close to 0 or 1) the success
probability is. This (p. 310) is the
first time students are introduced to the phrase “technical conditions” that
will accompany all subsequent inferential procedures discussed in the
course. You will probably have to give
some discussion on why the normal approximation is useful since they already
have used the binomial and hypergeometric “exact”
distributions to make inferences. You
might want to say that the normal distribution is another layer of
approximation, just as the binomial approximates the hypergeometric
in sampling from a large population. You
might also highlight the importance of the normal model before the advent of
modern computing. You will want to make sure everyone is comfortable with the
calculations on p. 310, where all of the pieces are put together. Practice Problems 4.3.1 and 4.3.2 provide
more practice doing these calculations, and Practice Problem 4.3.3 is an optional
exercise introducing students to continuity corrections.
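The calculations on p. 310 combine the pieces: the mean and standard deviation of the sampling distribution feed a normal approximation that can be checked against the exact binomial. A sketch with illustrative values of n and p (an assumption, not the book's example):

```python
import numpy as np
from scipy import stats

n, p = 100, 0.25                 # illustrative values
mu = p                           # E(p-hat) = p
sd = np.sqrt(p * (1 - p) / n)    # SD(p-hat) = sqrt(p(1-p)/n)

# P(p-hat <= 0.20): exact binomial vs. normal approximation
exact = stats.binom.cdf(20, n, p)
approx = stats.norm(mu, sd).cdf(0.20)

print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

Trying more extreme values of p (close to 0 or 1) or smaller n shows students why the technical conditions matter.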

Investigation 4.3.3 refers to the context of a statistical
investigation and students must consider hypothesis statements and *p*-values, as they have before, but now
using the normal model to perform the (approximate) calculations. You will want to emphasize that the reasoning
process is the same. Some students will
want to debate the “logic” of this context (for example, assuming that the
proportion of women among athletes should be the same as the proportion of
women among students, and the idea that the data constitute a sample from a
process is not straightforward here) and you will want to be clear about what
this *p*-value does and does not imply
and that there are many other issues involved in such a legal case (e.g.,
surveys of student interest and demonstrable efforts to increase the
participation of women are also used in determining Title IX compliance). The idea of a test statistic is formally
introduced on p. 313 (one advantage to using the normal distribution) and the
discussion on p. 314 tries to remind students of the different methods for
finding *p*-values with a single
categorical variable that they have encountered so far. Students should be encouraged to highlight
the summary of the structure of a test of significance on p. 314-5 as one they
will want to return to often from this point in the course forward. You might also want to show them how this
structure applies to the earlier randomization tests from Chapters 1 and 2 as
well.
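The test statistic and p-value for a one-proportion z-test can be sketched as follows; the counts and hypotheses are hypothetical, not the data from the investigation:

```python
import numpy as np
from scipy import stats

# Hypothetical counts: 80 "successes" in a sample of 200
# with H0: p = 0.5 versus Ha: p < 0.5 (illustrative only)
count, n, p0 = 80, 200, 0.5

phat = count / n
se0 = np.sqrt(p0 * (1 - p0) / n)   # standard deviation under the null
z = (phat - p0) / se0              # test statistic
p_value = stats.norm.cdf(z)        # one-sided (less than) p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The reasoning process is unchanged from the exact methods; only the probability model used to find the p-value differs.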

Investigation 4.3.4 returns to an earlier study and re-analyzes the data with the normal approximation. You will want to have the reference for the earlier binomial calculation (from Investigation 3.3.5) handy. After question (h), this investigation continues on to calculate Type I and Type II Error probabilities through the normal distribution. Some students will find this treatment of power easier to follow than the earlier use of the binomial distribution, but you will want to make sure they are comfortable with the standard structure of tests of significance before continuing to these more subtle issues. We also suggest that you draw many pictures of normal curves and rejection regions to help students visualize these ideas, as with the sketches on p. 319.
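The Type I/Type II error calculations follow the same normal-curve logic as the sketches on p. 319: find the rejection region under the null model, then find the probability of missing it under an alternative. A minimal sketch with assumed values:

```python
import numpy as np
from scipy import stats

# Hypothetical one-sided test: H0: p = 0.5 vs Ha: p > 0.5,
# with n = 100 and alpha = 0.05 (illustrative values)
n, p0, alpha = 100, 0.5, 0.05
sd0 = np.sqrt(p0 * (1 - p0) / n)

# Rejection region: reject H0 when p-hat exceeds this cutoff
cutoff = stats.norm(p0, sd0).ppf(1 - alpha)

# Type II error and power if the truth is actually p = 0.6
pa = 0.6
sda = np.sqrt(pa * (1 - pa) / n)
beta = stats.norm(pa, sda).cdf(cutoff)   # P(fail to reject | p = pa)
power = 1 - beta

print(f"cutoff = {cutoff:.3f}, Type II error = {beta:.3f}, power = {power:.3f}")
```

Drawing the two normal curves with the cutoff marked, as on p. 319, makes this calculation much easier for students to follow.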

Similarly, Investigation 4.3.5 shows how much more
straight-forward it is to calculate a confidence interval using the normal
model (though remind them that it still represents an interval of plausible
values of the parameter). Students are
introduced to the terms *standard error*
and *margin of error*. This would be a good place to bring in some
recent news reports (or have students find and bring in) to show them how these
terms are used more and more in popular media.
A subtle point you may want to emphasize with students is how “margin of
error” and “confidence level” measure different types of “error.” You might
want to emphasize the general structure of “estimate ± margin of error”
or “estimate ± critical value × standard error” as described in the box
on p. 324, for these forms arise again (e.g., with confidence intervals for a
population mean).
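The "estimate ± critical value × standard error" structure in the box on p. 324 translates directly into code. This sketch uses hypothetical sample results:

```python
import numpy as np
from scipy import stats

count, n, conf = 54, 100, 0.95   # hypothetical sample results

phat = count / n
se = np.sqrt(phat * (1 - phat) / n)           # standard error
zstar = stats.norm.ppf(1 - (1 - conf) / 2)    # critical value (1.96 for 95%)
moe = zstar * se                              # margin of error

lower, upper = phat - moe, phat + moe         # estimate +/- margin of error
print(f"{conf:.0%} CI: ({lower:.3f}, {upper:.3f})")
```

Because the same structure recurs with confidence intervals for a population mean, it is worth having students identify each piece (estimate, critical value, standard error) by name.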

Exploration 4.3 should help students to further understand
the proper interpretation of *confidence*. This exploration can be completed
outside of class, but you will probably want to make clear to students whether
you consider their ability to correctly interpret *confidence* a priority. (We often tell them in advance it will be an
exam question and warn them that it will be hard to “memorize” a definition due
to the length of a correct interpretation and the insistence on context, so
they should understand the process.) We hope the applet provides a visual image
they will be able to use for future reference, for example by showing that the
parameter value does not change but what does vary is
the sample result and therefore the interval. Though we do want students to understand the duality between level
of significance and confidence level, we encourage you to have them keep those
as separate terms. One place you can
trim time is how much you focus on sample size determination calculations,
which are introduced in Practice Problem 4.3.9.

Investigation 4.3.6 provides students with a scenario where
the normal approximation criteria (we expect) are not met and therefore an
alternative method should be considered.
We present the formula for the “Wilson Estimator” and then use the
applet to have them explore the improved coverage properties of the “adjusted Wald intervals.” You
may want to discuss with them some of the intuitive logic of why this would be
a better method (but again focus on how the idea of *confidence* is a statement about the *method*, not individual intervals).
In particular, in the applet, they should see how intervals that
previously had length zero (because the sample proportion was 0 or 1), now
produce meaningful intervals. Some statisticians argue that this “adjusted Wald” method should always be used instead of the original Wald method, but since Minitab does not yet have this
option built in, and because the results are virtually identical for large
sample sizes, we tend to have students consider it separately. We also like to emphasize to students how
recently (since the year 2000 or so) this method has come into the mainstream
to help highlight the dynamic and evolving nature of the discipline of
statistics. We also like to point
out to students that they have the knowledge and skills at this point to
investigate how well one statistical method performs compared to another.
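The adjusted Wald ("plus four") adjustment is easy to demonstrate even though Minitab lacks the option. The sketch below uses a deliberately extreme hypothetical sample to show how the ordinary Wald interval collapses to width zero while the adjusted interval remains meaningful:

```python
import numpy as np

# Extreme hypothetical sample: 0 successes in 20 trials
count, n = 0, 20

# Ordinary Wald interval collapses to width zero here
phat = count / n
wald_moe = 1.96 * np.sqrt(phat * (1 - phat) / n)

# Adjusted Wald: add 2 successes and 2 failures before computing
pt = (count + 2) / (n + 4)                   # Wilson estimator
adj_moe = 1.96 * np.sqrt(pt * (1 - pt) / (n + 4))

print(f"Wald:          {phat:.3f} +/- {wald_moe:.3f}")
print(f"adjusted Wald: {pt:.3f} +/- {adj_moe:.3f}")
```

Students can then use the applet, as described above, to compare the coverage properties of the two methods.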

All of these procedures (and technology instructions) are summarized on p. 334-5, another set of pages you will want to remind them to keep handy. Let students know if you will be requiring them to carry out these calculations in other ways.

**Section 4.4: Distributions of Sample Means**

*Timing/Materials*: Investigations 4.4.1 and 4.4.2 make heavy use
of Minitab (version 14) with students creating more Minitab macros. Exploration 4.4 uses applets to visually
reinforce some of the material in these first two investigations while also
extending them. Use of Minitab is also
assumed in Investigation 4.4.4, where you might consider having students collect
their own shopping data for two local stores.
A convenient method is to randomly assign each student a product (with
size and brand details) and then ask them to obtain the price for their product
at both stores. This appears to be less
inconvenient for students than asking them to find several products, but you
will still want to allow them several days to collect the data. These data can then be pooled across the
students to construct the full data set.
The sampling frame can be obtained if you can convince one local store
to supply an inventory list or you can use a shopping receipt from your family
or from a student (or a sample of students).
This section will probably take at least 3 hours of class time.

This section parallels the earlier discussions in Section
4.3 but considers quantitative data and therefore focuses on distributions of
sample means rather than proportions. It introduces students not only to the
Central Limit Theorem for a sample mean but also to *t*-distributions, *t*-tests,
and *t*-intervals, so it includes many
important ideas. Students work through
several technology explorations and you will want to help emphasize the “big
picture” ideas. We believe that the
lessons learned should be more lasting by having students make the discoveries
themselves rather than being told (e.g., this distribution will be
normal). In this section, students will
be able to apply many of the simulation and probability tools and habits of
mind learned earlier in the course. You
will of course need to keep reminding students to carefully distinguish between
the population, the sample, and the sampling distribution. You may also want to emphasize in
Investigations 4.4.1 and 4.4.2 that these are somewhat artificial situations in
that students are asked to treat the data at hand as populations and to take
random samples from them; this is done for pedagogical purposes, but in real
studies one only has access to the sample at hand.

Investigation 4.4.1 gives students two different
populations, one close to normal and the other sharply skewed, and asks them to
take random samples and study the distributions of the resulting sample
means. Students who have become
comfortable with Minitab macros will work steadily through the investigation,
but those who have struggled with Minitab macros will move slowly and may need
some help. When running the macro on p.
338, it is helpful to execute the macro once and create the dotplots
of C2 and C3. If these windows are left
open (and you have the automatic graph updating feature turned on), then when
you run the macro more times, Minitab (version 14) should add observations to
the windows and automatically update the displays. (This might be better as a demonstration.) Once students get a feel for how the samples are
changing and how the sampling distribution is being built up, closing these
windows on the fly will allow the macro to run much more quickly. Make sure that students realize the
differences in results between the normal-looking and the skewed populations,
which they are to summarize in (k). Once
students have made the observations through p. 341, they are ready for the
summary, the Central Limit Theorem. We
try to emphasize that there’s nothing magical about the “*n* ≥ 30” criterion; rather we stress that the more
non-normal the population, the larger the sample size needed for the normal
approximation to be accurate. You will
again need to decide if you want to present them with the formula σ/√*n*, and have them verify that it matches the simulation
results, and/or go through the derivation themselves. It is important to again give students plenty
of practice in applying the CLT to solve problems (e.g., p. 342-3).
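The simulation in Investigation 4.4.1 can be paralleled in code for a quick demonstration: draw repeated samples from a skewed "population" and check the behavior of the sample means against the CLT's predictions. Python and the exponential population here are illustrative stand-ins for the Minitab macro:

```python
import numpy as np

rng = np.random.default_rng(3)

# Sharply skewed "population" (exponential, an illustrative stand-in)
population = rng.exponential(scale=10, size=100000)
mu, sigma = population.mean(), population.std()

# Repeatedly draw samples of size n and record each sample mean
n, reps = 30, 5000
xbars = np.array([rng.choice(population, size=n).mean() for _ in range(reps)])

print(f"mean of sample means: {xbars.mean():.2f}  (population mean {mu:.2f})")
print(f"SD of sample means:   {xbars.std():.2f}  (sigma/sqrt(n) = {sigma/np.sqrt(n):.2f})")
```

Repeating with a smaller n shows that the more non-normal the population, the larger the sample size needed for the approximation to be accurate.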

Investigation 4.4.2 then continues to have them explore
coverage properties of confidence interval procedures and to motivate the need
for *t* intervals to replace *z* intervals when the population standard
deviation is unknown. We think that this
is a discovery that is especially effective for students to make on their own;
many students are surprised in (c) to see that the normal procedure does not
produce close to 95% coverage here. Many
students also find that the normal probability plots in (e) are very helpful,
because it’s not easy to distinguish a *t*-
from a *z*-distribution based on
histograms/dotplots alone. After students make these
observations, we always focus on *t*
intervals (instead of *z* intervals)
with sample means. Again, if you are
short on time, you may want to streamline some of this discussion, but we also
encourage you to use it as a vehicle to review earlier topics (e.g., *confidence*, *critical values*, *technical conditions*). In particular, you can remind them of the
commonality of the general structure of the confidence interval: estimate ±
margin of error, or estimate ± critical value × standard error.
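The coverage discovery in (c), that z intervals built with the sample standard deviation fall short of their nominal confidence level for small n, can be verified with a quick simulation (a sketch with assumed values, not the book's macro):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

mu, sigma, n, reps = 50, 10, 10, 5000
zstar = stats.norm.ppf(0.975)
tstar = stats.t.ppf(0.975, df=n - 1)

z_cover = t_cover = 0
for _ in range(reps):
    x = rng.normal(mu, sigma, size=n)
    xbar, s = x.mean(), x.std(ddof=1)
    se = s / np.sqrt(n)
    z_cover += (xbar - zstar * se <= mu <= xbar + zstar * se)
    t_cover += (xbar - tstar * se <= mu <= xbar + tstar * se)

print(f"z-interval coverage: {z_cover / reps:.3f}")   # falls short of 0.95
print(f"t-interval coverage: {t_cover / reps:.3f}")   # close to 0.95
```

The shortfall comes entirely from replacing σ with s, which is exactly the motivation for the *t* critical value.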

Exploration 4.4 is useful for providing students with visual
images of the intervals while exploring coverage properties and widths (as in
the previous investigation). This exploration
also leads students to examine the robustness of *t*-intervals by considering different population shapes. The second applet asks them to explore how
the *t-*interval procedure behaves for
a uniform, a normal, and an exponential population. We want them to review the
behavior of the sample and the sampling distribution (and be able to predict
how each will behave) and hopefully by the end be able to explain why the
sample size does not need to be as large with the (symmetric!) uniform
distribution versus the exponential distribution to achieve the desired
coverage.

Investigation 4.4.3 is intended as an opportunity for
students to apply their knowledge and to make the natural leap to the
one-sample *t* test-statistic. This is another good study to discuss some of
the data collection issues. Also, in
this case, the score of an individual game might be of more interest than the population mean and so we introduce the idea of a *prediction* interval and the formula for
quantitative data. Be ready for students
to struggle with the distinction between a confidence interval and a prediction
interval. We do not show them a way to
obtain this calculation from Minitab (because we don’t know one!). You should also remind students that the
prediction interval method is much more sensitive to the normality condition.
We do summarize the *t* procedures and
technology tools on p. 359-360. You may
want to give students the option of using either Minitab or the applet to
perform such calculations. The applet
has the advantage of automatically providing a sketch of the sampling
distribution model which we feel you should continue to require as part of the
details they include in their analyses.
The applet also provides the 95% confidence interval more directly. In Minitab, you must make sure the
alternative is set to “not equal” to obtain a two-sided confidence interval (we
do not discuss one-sided intervals in the book) but Minitab also allows you to
change the confidence level.
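The confidence interval versus prediction interval distinction comes down to a single extra term under the square root. A sketch with simulated data (the book's game-score data are not reproduced here; the values are illustrative):

```python
import numpy as np
from scipy import stats

# Hypothetical sample of n = 20 game scores (simulated, illustrative)
rng = np.random.default_rng(5)
x = rng.normal(100, 15, size=20)

n, xbar, s = len(x), x.mean(), x.std(ddof=1)
tstar = stats.t.ppf(0.975, df=n - 1)

# Confidence interval for the population mean: xbar +/- t* s/sqrt(n)
ci = (xbar - tstar * s / np.sqrt(n), xbar + tstar * s / np.sqrt(n))

# Prediction interval for a single new observation:
# xbar +/- t* s sqrt(1 + 1/n) -- much wider
pi = (xbar - tstar * s * np.sqrt(1 + 1/n), xbar + tstar * s * np.sqrt(1 + 1/n))

print(f"95% CI for mean:         ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"95% prediction interval: ({pi[0]:.1f}, {pi[1]:.1f})")
```

Seeing the two widths side by side helps students with the distinction, and the extra 1 under the root is a natural place to discuss why the prediction interval is so much more sensitive to the normality condition.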

Investigation 4.4.4 introduces paired *t* procedures as an application of the above methods on the
differences. This is a rich
investigation that first asks students to conduct some data exploration and to
consider outliers. There is an obvious
outlier and when students look at the Data Window they find that the products
were not actually identical. They can
then remove such items (any where the size/brand combination does not match
exactly) from the list before the analysis continues. You might want to emphasize that this type of
exploration, cleaning, and data management is a large component of statistical
analyses. While summarizing this
investigation, you should emphasize the advantage of using a paired design in
the first place.
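The paired analysis is literally a one-sample *t* procedure applied to the differences, which a short script can make concrete; the prices below are hypothetical, not the collected class data:

```python
import numpy as np
from scipy import stats

# Hypothetical prices for the same 8 products at two stores (illustrative)
store_a = np.array([3.49, 2.19, 5.99, 1.89, 4.29, 2.99, 6.49, 3.79])
store_b = np.array([3.59, 2.09, 6.19, 1.99, 4.49, 2.89, 6.99, 3.99])

# Paired analysis: apply the one-sample t procedures to the differences
d = store_a - store_b
n = d.size
t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(n))
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print(f"t = {t_stat:.2f}, two-sided p-value = {p_value:.4f}")

# Equivalent shortcut: scipy's built-in paired t-test
t2, p2 = stats.ttest_rel(store_a, store_b)
```

Having students verify that the by-hand calculation on the differences matches the built-in paired test reinforces that nothing new is happening here.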

**Section 4.5: Bootstrapping**

*Timing/Materials*: Heavy
usage of Minitab is required in this section.
Some of these ideas are very difficult for students, so you may want to
lead them through this section more than most.
If you do not have enough time in your course, this section can be
skipped, and later topics (except for Section 5.5, which could also be skipped)
do not depend on students having seen these ideas.

Many statisticians advocate bootstrapping as a more modern, flexible procedure for statistical inference when the model-based methods students have seen in this chapter do not apply. Advocates also see bootstrapping as helping students understand the intuition of repeated sampling. Furthermore, instead of assuming a normally distributed sampling distribution, bootstrapping relies only on the “model” that the sample obtained reflects the population (and in fact assumes that the population is the sample repeated infinitely many times). In our brief experience in teaching bootstrapping (as an earlier topic in the course), we found it was difficult for students to get past the “sampling with replacement” philosophy and the theoretical details in a short amount of time. We subsequently moved the bootstrapping material to the end of Chapter 4 so that students would already be comfortable with the “traditional” procedures and the idea of a sampling distribution. This will help them see how the bootstrapping approach differs while hopefully having enough background to understand the overall goals.

In Investigation 4.5.1, we begin by having students apply
the theoretical results to the Gettysburg Address sampling to see that the
normal/*t* distributions are not good
models for smaller sample sizes. We
provide more pictures/results than usual in this section but you can have
students recreate the simulations themselves. Since the “sampling with replacement” approach
feels mysterious to many students, we have them take a few samples to see that
some words occur more than once and that we are just using an “infinite”
population to sample from that has the same characteristics as the observed
sample. Then we have them verify that
the bootstrap distribution has the same shape and spread as the empirical
sampling distribution of means. One way
to approach bootstrapping is that it provides a way to estimate the standard
error of a statistic (like the median or the trimmed mean) that does not have nice theoretical results (based on rules of
variance). You can either stop here or
you can continue on p. 369 to apply a “pivot method” to construct a bootstrap
confidence interval. The notation
becomes complicated and the results are not intuitive, but they do help remind
students of bigger issues such as the meaning of confidence and the effect of
confidence level on the width of the interval.
The bootstrap procedure is applied in Investigation 4.5.2. In Investigation 4.5.3 the behavior of the
trimmed mean is explored, in a context where the mean is not a reasonable
parameter to study due to the skewness and the
truncated nature of the data. This
“strange statistic” demonstrates a key advantage of bootstrapping (as well as
the beauty of the CLT when it does apply).
We found the 25% trimmed mean performs reasonably well. Carrying this calculation out in Minitab is a
little strange but students should understand the commands in (d).
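The bootstrap procedure itself, resampling with replacement from the observed sample, computing the statistic each time, and using the spread of the results as a standard error estimate, is compact in code. The sample values below are illustrative, not the actual Gettysburg Address data:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical observed sample (e.g., word lengths; illustrative values)
sample = np.array([4, 2, 5, 7, 4, 3, 6, 2, 8, 4, 5, 3, 9, 4, 6])

# Resample with replacement, same size as the observed sample,
# and compute the statistic of interest (here, the median) each time
reps = 10000
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(reps)
])

# Bootstrap estimate of the standard error of the sample median
print(f"bootstrap SE of median: {boot_medians.std():.3f}")

# Simple percentile interval (the pivot method on p. 369 adjusts these endpoints)
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"95% percentile interval: ({lo}, {hi})")
```

Seeing repeated values appear in individual resamples helps demystify the "sampling with replacement" idea described above.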

**Examples**

Note that this is the first chapter to include two
worked-out examples. One deals with *z*-procedures for a proportion and the
other with *t*-procedures for a
mean. Earlier drafts did not include
these examples, and students strongly requested that some examples be included,
but we have since found that our students tend not to notice or study from the
examples. You might encourage students
to read them carefully, and especially to answer the questions themselves first
before reading the model solutions provided.

**Summary**

The Chapter Summary includes a table on the different
one-sample procedures learned for binary and quantitative data. Alongside these we like to use distinct
notation (*z*\* vs. *z*₀) to help students distinguish between critical values and
test statistics, a common source of confusion.

Issues of probability distributions (as in Sections 4.1 and 4.2) are addressed in Exercises #1-20. Issues of sampling distributions and inferences for a proportion are addressed in Exercises #21-46. Issues of sampling distributions and inferences for a mean are addressed in Exercise #42 and #47-65.