INVESTIGATING STATISTICAL CONCEPTS, APPLICATIONS, AND METHODS

NOTES FOR INSTRUCTORS

 

Chapter 1         Chapter 2         Chapter 3         Chapter 4         Chapter 5         Chapter 6

 

General Notes:

 

We envision the materials as being very flexible in how they are used.  You may choose to lead students through many of the investigations together as a class, but we also encourage you to give students time to work through some questions on their own (or, better, in pairs) and then debrief with them afterwards.  If you do have students work through investigations largely on their own, it’s very important to conduct a wrap-up discussion at the end of class, and/or at the beginning of the next class, in which you make clear what the “morals” of the investigations were.  In other words, summarize for students what they were supposed to have learned through the investigations and what they are responsible for knowing, making sure they are also reading the additional exposition in the text.  These wrap-up discussions are also ideal times for inviting students’ questions, because they will have wrestled with the ideas enough to know what the issues are and where their understanding is shaky.  You may wish to collect students’ answers to just a few of the questions in an investigation to read over and give feedback on before the next class session.

The practice problems are intended to provide students with basic review and practice of the most recent ideas between class periods.  This will help structure their time outside of class and provide a way for you and the students to informally assess their understanding and provide feedback.  You may choose to collect and grade these as homework problems or use them to motivate initial discussion in the next class period.  You can also consider including a “participation” component in your syllabus to capture effort if your evaluation will be more informal.  Solutions have been posted online.  They are password protected, giving the instructor the option of giving students direct access or not.  These problems also work well in a course management system such as Blackboard or WebCT for more automatic submission and feedback.

 

You may also wish to supplement some of the material in the book, e.g., bringing in recent news articles for discussion or assigning data collection projects.  We think students will find these investigations interesting and motivating, but there will also be time to share other examples.  You may also wish to have students refer to the additional examples we have posted here. If you do bring in your own material, we do caution you to try to remain consistent with the text in terminology, notation, and sequencing of ideas.  Some of this sequencing may be different for you as the instructor and may take a while to get used to.  Keep in mind that material you are used to seeing introduced at a different point in the course will be coming eventually.

 

We have written the materials assuming students will have easy access to a computer, and we make increasing use of technology as the course progresses. We have taught the course with daily access to a computer lab, but believe it will also work with less frequent visits to a computer lab and/or more instructor demonstrations (using a computer projection system).  If the students do not have frequent access to computers during class, you may wish to assign more of the computer work to take place outside of class.  We do provide detailed instructions for all of the computer work (Minitab, Excel, java applets), but you may still want to encourage students to work together.  We have also assumed use of Minitab 14, but on the Minitab Hints page we try to outline where you will have to make adjustments to use Minitab 13.  Even with heavy use of computers, it is also nice to have some days where you focus less on the computers to give students a chance to ask questions on the larger concepts in the material and even work a few calculations “by hand.”  Students will use a calculator on a daily basis as well.

 

You might consider giving an exam after Chapter 2 and then another after either Chapter 4 or Chapter 5, depending on how deeply you are able to go into the chapters.

 

Section 1-1: Summarizing Categorical Data

 

Timing:  Taking roll, explaining the syllabus, telling students a bit about what the course will be like, and then going through most of the section together took about 60 minutes. Students are asked to make a graph in Excel, but that can easily be moved outside of class (perhaps after an instructor demonstration). 

 

Some additional information about the study in Investigation 1-1:

- Appeared in the August 2002 issue of the New England Journal of Medicine

- You can also find some links on the web to the ensuing court case.

- In May 2000, eight persons who had worked at the same microwave-popcorn production plant were reported to have bronchiolitis obliterans.  No recognized cause was identified in the plant.  Therefore, in November 2000, investigators medically evaluated current employees and assessed their occupational exposures. 

- They used a combination of questionnaires and spirometric testing.  They also compared this information to data from the National Health and Nutrition Examination Survey (NHANES)

- The results here focus on the results of the spirometric testing: 31 people had abnormal results, 10 with low FVC (forced vital capacity) values, 11 with airway obstruction, and 10 with both airway obstruction and low FVC.

- Diacetyl is the predominant ketone in artificial butter flavoring and was used as a marker of organic-chemical exposure

- They tested air samples and dust samples from various areas in the plant. These areas included:

  - Plain-popcorn packaging line, bag-printing areas, warehouse, offices, outside

  - Quality control or maintenance

  - Microwave-popcorn packaging lines

  - Mixing room

The first group is considered “non-production” and hence lower exposure, but the researchers also looked at how long employees had worked in different areas in order to classify them as “high exposure” or “low exposure.”

 

In (e), get students to tell you about their descriptions of the graph, soliciting descriptions from several people.  Make sure the descriptions are in context and include the comparison, but then you will probably be able to tell them that all the responses were good; that is one distinction between statistics and other subjects, that there can be multiple correct answers.  When students offer suggestions about reasons for the difference in the groups, make sure they discuss a factor that differs between the two groups.  So saying “other health issues” isn’t enough, but saying that those who worked in certain areas of the plant may have different socioeconomic status than those who work in the production areas, or may be more likely to live in the country where the air quality differs, is.  Really get them to suggest the need for comparison, either to people outside the plant or to people in different areas of the plant.  Also build up the idea of not just comparing the counts but converting to proportions first.

 

In (k), we encourage you to go through the odds ratio calculations especially.  You might consider asking a student in the class to define odds, but you need to build up the odds ratio slowly and always encourage them to interpret the calculation correctly.
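
If you want a numerical companion for this discussion, the following sketch (Python, with made-up counts rather than the study’s data) builds up the odds ratio the way we recommend building it up with students: odds for each group first, then the ratio.

```python
# Odds and odds ratio for a hypothetical 2x2 table (counts invented
# for illustration; not the popcorn-plant data).
#              affected   not affected
# exposed          20           60
# unexposed         5           95
a, b = 20, 60    # exposed row: affected, not affected
c, d = 5, 95     # unexposed row: affected, not affected

odds_exposed = a / b               # odds of being affected, exposed group
odds_unexposed = c / d             # odds of being affected, unexposed group
odds_ratio = odds_exposed / odds_unexposed

print(f"odds (exposed)   = {odds_exposed:.3f}")
print(f"odds (unexposed) = {odds_unexposed:.3f}")
print(f"odds ratio       = {odds_ratio:.2f}")
# Interpretation: the odds of being affected are about 6.3 times higher
# in the exposed group than in the unexposed group.
```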

 

Page 7: It’s important that students get a chance to practice with the vocabulary soon, as it is not as easily mastered as they may initially think.  You especially need to watch that they state variables as variables.  Too often they will want to say “lung cancer” instead of describing the entire variable.  Or they will slip into summaries like “the number with lung cancer.”  Or they will combine the variables and the observational units: “those with lung cancer.”  We strongly recommend trying to get them to think of the variable as the question asked of each observational unit. 

 

The practice problems are intended to get students to work more with variables and to make sure they can construct two-way tables and segmented bar graphs.  Much of the terminology will be unfamiliar to them, or they will have other “day-to-day” definitions, so it is important to “immerse” them in the vocabulary and allow them to practice it often.  We suggest beginning the next class by discussing these problems, especially 1-1f and 1-2.  Problem 1-3e raises an important issue that may or may not come up in your initial class discussion: not everyone who was given the survey filled it out.  You need to worry about the people who may have been home sick, or even the people who became so sick they no longer work at this factory.  We highly encourage you to either collect students’ work on the practice problems (reading through and commenting on their responses) and/or to briefly discuss them at the beginning of the next class period.  We envision these as a more informal, thought-provoking, self-check form of assessment.

 

We have included “section summaries” at the end of the sections in Chapter 1.  This is a good place to ask students if they have questions on the material so far.  You might also consider adapting these into bullet points to recap the previous class period at the start of the next class period.  You may want to remind students occasionally that they should read all of the study conclusion, discussion, and summary sections carefully; some students get in the habit of working through the investigations by answering questions but do not “slow down” enough to read through and understand the accompanying exposition.

 

Section 1-2: Analyzing Categorical Data

 

Timing: Students were able to complete this section in approximately 90 minutes.  You may wish to have students complete some of the Excel work outside of class (including for Investigation 1-2 prior to coming to class).  We did Investigation 1-2 mostly together but then students worked on Investigation 1-3 in pairs.  There is an Excel Exploration but no other technology is used.

 

Investigation 1-2 gives students immediate practice in applying the terms.  We strongly encourage you to allow the students to work through these questions, at least through (j), on their own first.  Question (c) asks students to use Excel, but again they could do that outside of class or you could demonstrate it for them.  Students will struggle, and you need to visit them often to make sure they are getting the answers correct (e.g., how they state the variables, whether they see “amount of smoking” as categorical, and the calculation of the odds ratios).  The odds-ratio questions are asked to encourage them to treat having lung cancer as a success and to put the non-smokers’ odds in the denominator.  This ensures the odds ratio is greater than one and treats the non-smokers as the reference group to compare to.  (Though students may have some trouble getting used to treating having lung cancer as a success!)  The main criticism we expect to hear in (j) is age, but even after the odds ratios were “adjusted” for age, there could be other differences between the groups, e.g., socioeconomic status, diet, or exercise, that are related to both smoking status and occurrence of lung cancer.  Don’t expect perfect answers on (j) and (k), but give them a chance to think about the issues before you discuss them together.  Question (l) contains a typo; the question should be about “lung cancer patients,” not “smokers” (you can joke that the study certainly doesn’t tell us about the proportion of smokers, but it also doesn’t tell us about lung cancer patients in general).  This is a subtle point but important for students to think about.  You should give students a chance to think about the issue for a while before you discuss it with them.

 

In this text, we tend to distinguish between the types of study (case-control, cohort, cross-classification) and the types of design (retrospective, prospective).  These are not clear-cut distinctions, and you may not wish to spend too long on them (the definition of a cohort study given is especially simplistic).  Mostly you will want students to distinguish between observational studies and experiments, while always considering the implications of the study design for the final interpretation of the results.  You can ask them to read the discussion on p. 15 outside of class.  One note on how we use the “definition” boxes: the top of the box is generally a generic definition, and the bottom of the box shows how the term applies to the particular study being discussed.

 

Questions (m) and (n), about when we can draw cause-and-effect conclusions and when we can generalize results to a larger population, are important ones that will arise throughout the course, so it’s worth drawing students’ attention to them.  In particular, we want students to get into the habit of seeing these as two distinct questions, answered based on different criteria, to be asked whenever they summarize the conclusions of a study. Students probably won’t have great intuition here, so give them time to struggle with these questions first, without expecting perfect answers, then discuss them together.  These points will come up repeatedly throughout the course.  In the following discussion, we touch on the debate over whether “lung cancer” should be considered a variable.  We tend to fall in the camp that it is a variable, but students need to be clear that the fact that the distribution of the variable was controlled by the researchers affects the types of conclusions we can draw.

 

Questions (o)-(t) are good ones to discuss as a class (note the typo in the ordering of r-s).  You should remind students that you have shifted gears here a bit and are now looking at properties of these calculations.  You might consider dividing up the calculations in (o)-(s), assigning different groups of students to work on each question and share their results with the class.  These questions are a good example of where even good students might answer the questions well but miss the point behind the calculations, so you might want to hold a follow-up discussion or at least draw students’ attention to the discussion section in the book.  Students should be able to tell you that it doesn’t make sense to calculate the proportions conditional on the smoking level, since the lung cancer “variable” was controlled by the researchers.  This is a subtle point that you may not want to belabor, but it begins to explain to students why we use the odds ratio in many situations instead of the easier-to-interpret relative risk.  Similarly, the “invariance” of the odds ratio to how we define success and failure is another advantage.
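
A quick way to make these last two points concrete (outside of Minitab or Excel): in the hypothetical sketch below, the odds ratio comes out the same whether you condition on the rows or the columns of the 2x2 table, while the relative risk does not, which is exactly why the odds ratio remains trustworthy in a case-control design.

```python
# Hypothetical 2x2 table of counts:
#           success  failure
# group 1      30       70
# group 2      10       90
a, b = 30, 70
c, d = 10, 90

or_rows = (a / b) / (c / d)               # odds ratio conditioning on rows
or_cols = (a / c) / (b / d)               # odds ratio conditioning on columns
rr_rows = (a / (a + b)) / (c / (c + d))   # relative risk by rows
rr_cols = (a / (a + c)) / (b / (b + d))   # "relative risk" by columns

print(or_rows, or_cols)   # identical: both equal ad/bc
print(rr_rows, rr_cols)   # generally different
```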

 

Investigation 1-3 provides students with more practice and gets them to again think in terms of association.  They should be able to tell you some advantages of the prospective design over the retrospective design (e.g., not relying on memory, seeing what develops instead of starting with people who are already sick).  However, this design still does not account for other possible confounding variables, or for the fact that the researchers initially selected only healthy, white men.  You may want to pre-create the Excel worksheet for them and then have them open it and start from there.

 

The Excel Exploration can be done inside or outside of class.  We had students finish it in pairs outside of class and then turn in a report of their results.  They should see that the odds ratio and the relative risk are similar when the baseline risk is small and that they can be very different from 1 even for the same difference in proportions.  This is also the first time they really see that the OR and RR are equal to one when the difference in proportions is zero.  We encourage you to have them view the updating bar graph throughout these calculations to also see the changes visually.  This exploration is essentially playing with formulas, but it allows them to come to a deeper understanding of the formulas, how to express them, and hopefully how they are related.  Some issues you might want to ask them about afterwards (in class or in a written report) include:

- when will RR and OR be close together (you can even lead a discussion of the mathematical relationship OR = RR(1-p2)/(1-p1) )

- when are RR and OR more useful values to look at than the difference in proportions (primarily when the success proportions are very small or very large)

- when will RR and OR be equal to 1 and what does that imply about the relationship between the variables/the difference in proportions

These comparisons should fall out if they follow the structure of the examples and note what changed with each table.  The last questions in (l) also try to help them distinguish between saying the probability of success is the same across groups and saying the probability of success is the same as the probability of failure.  They should realize that the latter condition is not required for independence.  Part (m) is where we try to direct their summary of the issues; you may wish to add even more structure there.  A 5-point scale appeared to work well in grading this paragraph. You also need to decide how much of the Excel output you want them to turn in.
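
If you want to verify the OR = RR(1-p2)/(1-p1) relationship numerically, or generate examples for the follow-up discussion, here is a minimal sketch with assumed success proportions p1 and p2:

```python
# Verify OR = RR * (1 - p2) / (1 - p1) for several assumed proportions,
# and note that OR is close to RR when both proportions are small.
def rr_and_or(p1, p2):
    rr = p1 / p2
    orr = (p1 / (1 - p1)) / (p2 / (1 - p2))
    return rr, orr

for p1, p2 in [(0.02, 0.01), (0.4, 0.2), (0.8, 0.6)]:
    rr, orr = rr_and_or(p1, p2)
    identity = rr * (1 - p2) / (1 - p1)
    print(f"p1={p1}, p2={p2}: RR={rr:.3f}, OR={orr:.3f}, "
          f"RR(1-p2)/(1-p1)={identity:.3f}")
# The first pair (small baseline risk) gives RR and OR nearly equal;
# the identity column always matches the OR column.
```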

 

As you summarize these first three investigations, you might even want to warn them that they won't see RR and OR much for a while, but the other big lessons they should take from this early material are the importance of study design and of always using graphical and numerical summaries as they explore the data. Students should also be getting the idea that statistics matters for investigating important issues.

Additionally, you can highlight the three studies they have seen so far (Popcorn, Wynder and Graham, Hammond and Horn) and compare and contrast them.  For example:

Popcorn and lung disease: defined subjects as high/low exposure, classified airway obstruction; meaningful to examine proportion with airway obstruction; similar number in each explanatory variable group; may not be representative.

Wynder and Graham (case-control, retrospective): found subjects with and without lung cancer, classified smoking status; not meaningful to examine proportion with lung cancer; similar number in each response variable group; controlled interviewer behavior.

Hammond and Horn (cohort, prospective): found level of smoking, followed subjects, recorded whether they died of lung cancer; meaningful to examine proportion who died of lung cancer and proportion of smokers; not much control (22,000 ACS volunteers).


Section 1-3

 

Timing:  This section will probably take approximately 45-50 minutes. You may choose to do more leading in this section in the interest of time. No technology is used.

 

The initial steps of Investigation 1-4 should be fairly routine for students by this point.  You might consider asking them to complete up to a certain question before they come to class.  It is also fun to ask them whether or not they wear glasses and whether they remember the type of lighting they had as a child.  The key point is of course the distinction between association and causation, and through class discussion students should be able to suggest some reasonable confounding variables.  Where to be picky: make sure that their confounding variable has a clear connection to the response (eye condition) and that the variable differs between the explanatory variable groups (type of lighting).  You might consider having them practice drawing an experimental design schematic (formally introduced later in the course), along with matching the different confounding variable outcomes with the different explanatory variable outcomes.  For example:

 

The practice problems at the end of this investigation are a little more subtle and it will be important to discuss them in class and ensure that students understand the two things they need to discuss to identify a variable as potentially confounding (its connection to both the explanatory and the response variable).

 

In Investigation 1-5, we have chosen to treat Simpson’s Paradox as another confounding variable situation.  This investigation goes into the mathematical formula as another way to illustrate the source of the paradox (the imbalance in where the women applied and an imbalance in the acceptance rates of the two programs).  You might also consider showing them a visual illustration such as:

where the sizes of the circles are intended to convey the sample sizes in each category and thus their “weight” in the overall calculation. 

 

There is a typo in questions (i) and (k): they should ask whether the overall acceptance rate for women is equal to the average of the acceptance rates in each program, not about Simpson’s Paradox.  This change greatly simplifies the questions, and students should not have much trouble.  However, the questions are then purely about weighted averages and not Simpson’s Paradox.  If you really want to challenge the students, you can ask them to work with the weighted averages to derive mathematically (or at least verify) conditions under which Simpson’s Paradox will not occur.  Click here for an example discussion, which could also be turned into a homework problem.

 

The practice problems will help them see the paradox arising in different settings.    Even when they see and pretty much understand what’s going on, students often struggle to provide a complete explanation of the cause of the apparent paradox.  A very good follow-up question is to ask them to construct hypothetical data for which Simpson’s Paradox occurs as in the following homework problem:  Suppose two softball players each have 200 at-bats over 2 months in the season.  Construct a two-way table so one player has a higher average (success proportion) for each month individually, but the other player has a higher average (success proportion) over these 200 at-bats. 
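
For reference, here is one hypothetical table (of many) that solves the softball problem, with a quick numerical check; all counts are invented.

```python
# Player A hits better in each month, but Player B has the higher
# overall average across the same 200 at-bats (Simpson's Paradox).
#                 Month 1          Month 2
# Player A:     30/60  (.500)    35/140 (.250)
# Player B:     70/160 (.438)     8/40  (.200)
a1h, a1ab, a2h, a2ab = 30, 60, 35, 140
b1h, b1ab, b2h, b2ab = 70, 160, 8, 40

print("Month 1:", a1h / a1ab, "vs", b1h / b1ab)                # A higher
print("Month 2:", a2h / a2ab, "vs", b2h / b2ab)                # A higher
print("Overall:", (a1h + a2h) / 200, "vs", (b1h + b2h) / 200)  # B higher
# The reversal is a weighted-average effect: A's at-bats are concentrated
# in A's weak month (140 of 200), while B's are concentrated in B's
# strong month (160 of 200).
```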

 

It will be important to convey to students exactly what “skills” and “concepts” you want them to take away from this investigation.  If you want to focus on the “weighted average” idea (which has some nice recurrences later in the course), students will probably need a bit more practice.  In summarizing these investigations with students, we hope they have motivated the need for more careful types of study designs that would avoid the confounding issues.  Students often have an intuitive understanding of “random assignment,” but this will be developed more formally in the next section.

 

Section 1-4: Designing Experiments

 

Timing/Materials: This section will probably take approximately 45-50 minutes and consists of several small investigations. Try to focus on the big issues tying the investigations together.  Some of the simulations could be assigned outside of class.  You will need index cards for the tactile simulation and access to an internet browser.  You should get the students in the habit of going to the main data files and java applets page here and selecting from that page.

 

In Investigation 1-6, students see yet another example of the limitations of an observational study and are usually very good at identifying potential confounding variables.  It’s fun to ask students if they know whether their institution has a foreign language requirement and what might be the reasons for that requirement.  The question of whether foreign language study directly increases verbal ability (as posited by many deans) leads into the idea of an experiment, and most students appear to have heard these terms, including placebo, before.  We would recommend going through this investigation with the students rather quickly.  A schematic on p. 37 is missing:

 

You may also wish to discuss with them the schematic for the original observational study and the potential “verbal ability” confounding variable:

In Investigation 1-7, we strive to help students see the need for randomization.  We have students begin with a hands-on simulation of the randomization process.  We feel this engages the students and gives them a concrete model of the process.  We encourage you to have the students come to the board to collectively create the dotplot of their results in (d).  Students could conduct the randomization outside of class and bring in their results but we feel this concept is important enough that you may prefer to do so in class.  Students then transition to an applet to perform the randomization process many, many more times.  Hopefully the prior hands-on simulation and the graphics of the applet will help them connect to the computer simulation (and reinforce that they are mimicking the randomization process used by the researchers).  They should be able to work through the applet questions fairly quickly and then you will want to emphasize that randomization “evens out” other lurking or extraneous variables between the groups.  It is important to emphasize to students that while we often throw around the word “random” in everyday usage, achieving true “statistical randomness” takes some effort and should not be short-circuited.
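
If you would like a computational analogue of the applet’s message to demonstrate or assign, the sketch below (using an invented lurking variable, not the applet’s data) re-randomizes 24 subjects into two groups of 12 many times and shows that the group means of the lurking variable tend to even out.

```python
# Random assignment tends to balance a lurking variable between groups.
import random

random.seed(1)
heights = [random.gauss(67, 4) for _ in range(24)]  # hypothetical subjects

diffs = []
for _ in range(1000):
    random.shuffle(heights)                 # one random assignment
    group_a, group_b = heights[:12], heights[12:]
    diffs.append(sum(group_a) / 12 - sum(group_b) / 12)

print(f"average difference in group means: {sum(diffs)/len(diffs):.3f}")
print(f"largest absolute difference seen:  {max(abs(d) for d in diffs):.3f}")
# The differences cluster around zero: randomization does not guarantee
# perfect balance in any one assignment, but it carries no systematic bias.
```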

 

Investigation 1-8 is listed as optional or may be presented briefly.  It continues to use the applet to have students think about the concept of “blocking.”  We have chosen to discuss a rather informal use of “blocking” in that students are grouping subjects into homogeneous units. [A more formal introduction to blocking would only allow the block size to equal the number of treatments.] Students can again play with the tactile simulation, or you can treat question (b) as a “thought experiment” since they should see the effect pretty quickly.  The applet conveys the idea that if you actively balance out factors such as gender between the two groups, that will ensure further balance between the groups on some other variables as well (those related to gender, like height).  We chose not to emphasize this concept strongly but did want students to think about the advantages (and disadvantages) of carrying out the experimental design on a more homogeneous group of subjects.

 

At this point in the course, you might also consider assigning a data collection project where students work with categorical variables and consider both observational and experimental studies.  An example assignment is posted here.

 

Section 1-5: Assessing Statistical Significance

 

Timing/Materials:  With some review of the idea of confounding variables at the beginning, this section takes approximately 60 minutes. One timing consideration is in having students do a second tactile simulation.  This simulation is very similar to that in Section 1-4 but here focuses on the response instead of the explanatory variable.  Still, you may want to make sure these simulations occur on different days.  We do see value in having them do both as students too easily forget what the randomization in the process is all about.  This simulation also ties closely to an applet and helps transition students to the concept of a p-value.  You will need pre-sorted playing cards or index cards and access to an internet browser.  We pre-sort the playing cards into packets of 24, with 11 red ones (hearts/diamonds) representing successes and 13 black ones (clubs/spades) representing failures, but you could also use index cards and have students mark the successes and failures themselves. 

 

The goal in this section is to see the entire statistical process, from consideration of study design, to numerical and graphical summaries, to statistical inference.  Students learn about the idea of a p-value by simulating a randomization test.  While in the previous section we focused on how randomization evens out the effects of other extraneous variables, here the focus is on how large the difference in the response variable might be just due to chance alone.  You will want to greatly emphasize the question of how to determine whether a difference is “statistically significant.”  Try to draw students to think about “what would happen by chance,” even before the simulation, as a way to answer this question (around question (f)).  At some point (beginning of class, around question (f), end of class) you may even want to detour to another example to illustrate the logical thinking of statistical significance.  One demonstration that we have found effective is to have a student volunteer roll a pair of dice that look normal but are actually rigged to roll only sums of seven and eleven (e.g., unclesgames.com, other sources).  Students realize after a few rolls that the outcomes are too surprising to have happened by chance alone for normal dice and thus provide compelling evidence that there is something fishy about the dice.  It is important for students to think of the randomization distribution as a “what if” exploration to help them analyze the experimental results actually obtained.
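
You can even make the logic of the dice demonstration explicit with a short calculation; the choice of five consecutive rolls below is an assumption, so adjust it to match your demonstration.

```python
# How often would FAIR dice give a sum of 7 or 11 five times in a row?
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely rolls
p_7_or_11 = sum(1 for a, b in outcomes if a + b in (7, 11)) / 36

print(p_7_or_11)        # 8/36, about 0.222 for a single roll
print(p_7_or_11 ** 5)   # about 0.0005 for five straight rolls
# Five straight 7s or 11s would happen by chance alone only about 5 times
# in 10,000 tries with fair dice -- the same reasoning as a p-value.
```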

 

For part (i) of Investigation 1-9, we have students create a dotplot on the board, with each student placing five dots (one for each repetition of the randomization process) on the plot.  You can also have students turn in their results for you to compile between classes or have them list their results to you orally (or as you walk around the room) while you (or a student assistant) type them into Minitab or place them on the board yourself.

 

The data described in Investigation 1-9 have been slightly altered.  In the actual study, 11 observers were assigned to Group A and 12 to Group B; however, we preferred that the column totals not be the same as the row totals.  Students should find the initial questions very straightforward, and again you could ask them to complete some of this background work prior to the class meeting. Some notes on the applet:

- holding the mouse over the deck of cards reveals the composition of the deck (students should note the 13 red and the 11 black cards).

- the alignment of the tick marks and the horizontal scale tends to be better in Netscape than in Internet Explorer.  

- with 1000 repetitions, when you “show tallies” the values tend to crash a bit, but students should be able to parse them out. 

- the “Difference in Proportion” button is to help students see the transition between this distribution and the distribution of the difference in sample proportions (p̂A − p̂B) that they will work with later, but it may not be worth addressing at this point in the course.

- we encourage you to continually refer to the visual image of the empirical randomization distribution given by the applet when interpreting p-values.

 

There are some important points for students to be comfortable with in the discussion on p. 54, in particular, how the p-value presents a measure of the strength of evidence along a continuous scale.  You will also want to emphasize that the p-value measures how often research results at least this extreme would occur by chance if there were no true difference between the experimental groups.  You might also want to remind students that the terminology introduced in this investigation will be used throughout the rest of the course.

 

Brief Introduction to Minitab

 

You may wish to initially demo Minitab to your students while going over some of these basic features and then require students to open Minitab outside of class and mimic the steps you have shown them.  From this point on in the course, Minitab will be used rather heavily. 

 

Section 1-6: Probability and Counting Methods

 

Timing: This section, consisting of two investigations, should take approximately 45 minutes. You may also choose to supplement with some other probability applications and/or discussion of lotteries.  Minitab is used in both investigations.

 

At this point, quantitatively inclined students are often chomping at the bit for a more analytic approach to determining p-values that circumvents the need for the simulations.  Investigation 1-10 introduces them to the idea of probability as the long-run relative frequency of an outcome.  Minitab is used to carry out a simulation (cast as a randomization to address statistical significance, to parallel the earlier investigations) and then graph the behavior of the empirical probability over the number of repetitions.  This is the first time students create and use a Minitab macro. We initially have them copy and paste session commands to reinforce the repetition (Note: while copying and pasting the commands in the Session window is often much quicker, many students tend to prefer using the menus).  After doing this for a while, students are often ready to create a program that repeats these commands for them a large number of times. Some students will pick up these programming ideas very quickly, others will need a fair bit of help.  You may want to pause a fair bit to make sure they understand what is going on in Minitab.  If a majority of your students do not have programming background, you may want to conduct a demonstration of the procedure first.  The two big issues are usually helping students save the file in a form that is easily retrieved later and getting them into the habit of using (and initializing!) a counter.  We suggest using a text editor rather than a word processing program for creating these macro files so Minitab has less trouble with them.  In saving these files, you will want to put double quotes around the file name.  This prevents the text editor from adding “.txt” to the file name.  The macro will still run perfectly fine with this extension but it is a little harder for Minitab to find the file (it only automatically lists those files that have the .mtb extension – you will need to type *.txt in the File name box first to be able to see and select the file).  On some computer systems, you also have to be rather careful in which directories you save the file.  You might want students to get into the habit of saving their files onto floppies or onto the computer desktop for easier retrieval.  These steps may require some trial and error to smooth out the kinks.
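
For instructors who want to demonstrate the same idea outside of Minitab, the following sketch mimics what the macro does; the event and its probability of 0.25 are illustrative choices, not the specific randomization in Investigation 1-10.

```python
# Watch the empirical relative frequency settle toward the true probability.
import random

random.seed(2)
successes = 0
for rep in range(1, 1001):
    if random.random() < 0.25:    # simulate an event with probability 0.25
        successes += 1
    if rep in (10, 100, 500, 1000):
        print(f"after {rep:4d} repetitions: "
              f"relative frequency = {successes/rep:.3f}")
# Early relative frequencies bounce around; later ones hover near 0.25,
# which is what the plot built on p. 61 is meant to show.
```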

 

There is a large typo on p. 60 – the Discussion and confidence interval should not be there but rather the page should start with question (i).

 

The graph created on p. 61 is worth seeing as it shows the relative frequency as a function of the number of repetitions.  Having students type in these commands is rather mechanical but we think it is important for them to create the graph with their own data instead of merely being shown a static picture.  Still, it will be important to have longer discussions on what the pictures represent (either as a class or a writing assignment).  The key will be in not letting the computer work overwhelm the larger concepts here.

 

This is also the first point in the course where we require students to make predictions.  It’s important for students to know that this is informal and that you are only interested in their first thoughts; they should not worry about correctness at this point.  We employ this strategy a lot to get students to take a stand.  If their prediction is incorrect, then they are more likely to take the time to correct their misconception.

 

At this point you may choose to introduce some other interpretations of probability, e.g., subjective probability, to introduce students to the diversity of uses of the term.  Also, while the calculations in this course often make use of the equal probability of outcomes from the randomization, you might want to caution them not to always assume outcomes are equally likely.  The following transcript from a Nightline broadcast a few years ago may help bring home the point:

 

TED KOPPEL: Dr. Andrews, I'm sure you have heard such cautionary advice before so on what basis is the assumption being made that this is the one that's going to have the kind of impact on southern California in particular that's being predicted?

 

RICHARD ANDREWS: Well, in the business that I'm in and that local government and state government is in, which is to protect lives and property, we have to take these forecasts very seriously. We have a lot of forecasts about natural hazards in California and we have a lot of natural events here that remind us that we need to take these forecasts seriously. I listen to earth scientists talk about earthquake probabilities a lot and in my mind every probability is 50-50, either it will happen or it won't happen. And so we're trying to take the past historical record, our own recent experience of the last, two of the last three years and make the necessary preparedness measures that can help protect us as much as we can from these events.

 

In Investigation 1-11, students use this basic probability knowledge to derive the formula for the hypergeometric probability.  We hope students will become comfortable using Minitab to calculate these probabilities; try not to spend too long on the combinations calculations. You can also use Minitab or other software to show them lots of graphs of the hypergeometric distribution for different parameter values.  Make sure on p. 67 they see the subcommand at the top of column 2.   Students will vary in their comfort levels with the binomial coefficient and calculations.  You may wish to help students maintain focus on the “end result” and make more use of technology to calculate the probabilities (including showing your students how to do these combination calculations on their calculator).  We often also advise them that we are more interested in their ability to set up the calculation (e.g., on an exam) and to interpret the result (and make decisions based on the value).  You should consider helping students see the pattern in the hypergeometric combinations, with the top numbers in the numerator adding to the top number in the denominator and the bottom numbers adding as well.  We also try to help students continue to distinguish between “empirical” and “theoretical” probabilities, and we strive to keep the “statistical motivation” of these calculations in mind: how often do experimental results as extreme as this happen “by chance”?
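
The sketch below shows both the by-hand combination calculation and an equivalent library call, with the adding-up pattern visible in the code; the parameters (a population of 24 containing 11 successes, 12 drawn into one group) are chosen to echo the earlier card simulation.

```python
from math import comb
from scipy.stats import hypergeom

# By hand: note the pattern -- the top numbers in the numerator (11, 13)
# add to the top number in the denominator (24), and the bottom numbers
# (3, 9) add to the 12 in the denominator.
p3_by_hand = comb(11, 3) * comb(13, 9) / comb(24, 12)

# Same value from scipy: hypergeom(population size, successes, draws).
dist = hypergeom(24, 11, 12)
print(p3_by_hand, dist.pmf(3))   # identical values
print(dist.cdf(3))               # P(3 or fewer), a p-value building block
```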

 

The Random Babies applet in Practice 1-20 is fun and often memorable for students.  While the applet repeats the concepts of the Minitab simulation it may be more visually appealing for students.  You may want to apologize about the possibly offensive context ahead of time.

 

Section 1-7: Fisher’s Exact Test

 

Timing:  This section, along with Chapter 1 review, should take about 50 minutes. You may also wish to provide more practice carrying out Fisher’s Exact Test and discussing the overall statistical process.  In the Investigation, students are asked to create a segmented bar graph and calculate p-values (meaning technology is helpful but not essential).

 

This section brings the statistical analysis full circle by formally using the hypergeometric probabilities to calculate p-values for two-way tables.  It does not introduce new ideas but rather tries to pull together the ideas of the previous two sections.  It first continues the analysis of the “Friendly Observers” study, for which students have only approximated the p-value so far, and asks them to calculate the exact p-value now that they are knowledgeable about combinations and hypergeometric probabilities.  In Investigation 1-13, students are given some practice by hand but then are again asked to turn the calculations over to Minitab.  It may be worth showing them several ways to do these calculations in Minitab so each student can find the method most natural for them.  It is also worth showing them that the same calculation can be set up several different ways and arrive at the same p-value (as long as they are consistent), e.g., top of p. 73.  (We actually intended for all of the calculation details to be there (see the errata page), so you can either spell it out for students or have them think through it.)
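
As an additional cross-check on the Minitab work, scipy’s implementation of Fisher’s Exact Test takes the 2x2 table directly; the counts below are hypothetical.

```python
from scipy.stats import fisher_exact

table = [[3, 9],    # rows are the two groups,
         [8, 4]]    # columns are success/failure counts
odds_ratio, p_value = fisher_exact(table, alternative='less')
print(odds_ratio, p_value)
# The one-sided p-value sums the hypergeometric probabilities of tables
# at least this extreme; use alternative='two-sided' for a two-sided test.
```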

 

Beginning with question (f), the investigation transitions into focusing on the effect of sample size on the p-value.  This gives students additional practice while making a very important statistical point that will recur throughout the course.

 

Investigation 1-14 gives them additional practice with Fisher’s Exact Test but also brings up the debate of what the p-value really means with observational data.

 

Summary

 

Students need to be strongly encouraged to read the Chapter Summary and the Technology Summary.  Especially in this preliminary edition with no index, students will need to carefully organize their notes.  You should remind them that this course will be rather “cyclic” in that these ideas will return and be built upon in later chapters.  With these students, we have had good luck asking them to submit potential exam questions as part of the review process.  You might also consider showing students a graphic of the overall statistical process and how the ideas they have learned so far fit in, e.g.:

 

Chapter 2

 

This chapter parallels the previous chapter (consideration of data collection issues, numerical and graphical summaries, and statistical inference from empirical p-values) but for quantitative variables instead of categorical variables.  The themes of considering the study design and exploring the data are reiterated to remind students of their importance.  Analyses for quantitative data are a bit more complicated, because one number no longer summarizes a distribution; we focus on shape, center, and spread in describing these distributions.  This also leads to heavier use of Minitab for analyzing data (e.g., constructing graphs and calculating numerical summaries) as well as for simulations.  If your class does not meet regularly in a computer lab, you might want to consider having students work through the initial study questions of several investigations, saving the Minitab analysis parts for when you can visit a computer lab.  Or if you do not have much lab access, you could use computer projection to demonstrate the Minitab analyses.  Keep in mind that there are a few menu differences if you are using Minitab 13 instead of Minitab 14 (see the powerpoint slides for Day 8 of Stat 212).  One thing you will want to discuss with your students is the best way to save and import graphics for your computer setting.  Some things we’ve used can be found here. 

 

Section 2-1: Summarizing Quantitative Data

 

Timing/Materials:

Students will be using Minitab in Investigations 2-3 (oldfaithful.mtw), 2-5 (temps.mtw), and 2-6 (fan03.mtw).  Instructions for replicating the output shown in Investigation 2-2 (cloudseeding.mtw) are included as a Minitab Detour on p. 92.  Excel is used in Investigation 2-7 (housing.xls).  Investigations 2-1 and 2-2 together should take about 50-60 minutes.  Investigations 2-3, 2-4, and 2-5 together should take another 60 minutes or so.  Investigation 2-6 could take 40-50 minutes, and Investigation 2-7 could take 50-60 minutes.  You might consider assigning Investigation 2-6 as a “lab” that students work on in pairs and complete the “write-up” outside of class.  Investigation 2-7 explores the mathematical properties of least squares estimation in this univariate case and can be skipped or moved outside of class. 

 

Investigation 2-1 is meant to provoke informal discussion of anticipating variable behavior.  You may choose to wait until students have been introduced to histograms (in which case it could also serve to practice terminology such as skewness).  You can also consider using your own survey questions and having students examine their own data.  It’s even entertaining to look at your own class’s results after doing this activity and see whether the behavior at your school is the same as for the data provided.  One goal is to help students get used to having the variable along the horizontal axis with the vertical axis representing the frequencies of observational units.  Furthermore, we want to build student intuition for how different variables might be expected to behave.  You will probably want to have students add identifying numbers to the graphs for easier reference:

Students usually quickly identify graphs 1 and 6 as either the soda choice or the gender variable, the only categorical variables.  Reasonable arguments can be made for either choice.  In fact, we try to resist telling students there is one “right answer” (another habit of mind we want them to get into in this statistics class that some students may not be expecting; similarly, writing coherent explanations will be an expected skill in this class).  We tell them we are more interested in their justification than their final choice, but that we look at how well they support their answers and the consistency of their arguments.  A clue could be given to remind students of the name of the course these 35 students were taking.  This often leads students to select graph 1 as the gender variable, assuming the second bar represents a smaller proportion of women in a class for engineers and scientists.  Students usually pick graphs 2 and 3 (the two graphs skewed to the right) as number of siblings and haircut cost.  We do hope they will realize that graph 3, with its gap in the second position and its longer right tail (encourage students to try to put numerical values along the horizontal scale), is not reasonable for number of siblings.  However, the higher peak at $0 (free haircuts by friends) and the gap between probably $5 and $10 does seem reasonable.  (In fact, students often fail to think about the graph possibly starting at 0.)  We also expect students to choose between height and guesses of age for graphs 4 and 5.  Again, reasonable arguments could be made for either: a more symmetric shape for height, as expected for a biological characteristic?  Or a skewed shape for height (especially if they felt the class had a smaller proportion of women)?  Again, we evaluate their ability to justify the variable behavior, not just their final choice.  This investigation also works well as a paired quiz, but the habits of mind that it advocates were part of the motivation for moving it to first in the section.

 

In Investigation 2-2, students are introduced to some of the common graphical and numerical summaries used with quantitative data, while still in the context of comparing the results of two experimental groups. We present these fairly quickly, and we emphasize the subtler ideas of comparing distributions, because we don't really want to pretend that these mathematically inclined students have never seen a histogram or a median before!  This investigation concludes by having students transform the data. While not involving calculus, transforming data is an idea that mathematically inclined students find easier to handle than their more mathematically challenged peers.  This piece can be skipped, but there are later investigations that assume they have seen this idea. You might also consider asking students to work on these Minitab steps outside of class.  Practice 2-1 may seem straightforward, but some students struggle with it, and it does assess whether students understand how to interpret boxplots.  It does not require use of Minitab, but Practice 2-2 does.  In Practice 2-2, you might consider narrowing the focus by asking the students to pick just one comparison and asking whether the wood vs. steel comparison is in the direction expected.

 

Investigation 2-3 formally introduces measures of spread and histograms.  The data concern observations of times between eruptions at Old Faithful.  We have in mind that spread is a more interesting characteristic than center for this distribution, because spread relates to how consistent/predictable the time of the next eruption is.  You may have students go to the website to look at current data and/or pictures of geyser eruptions. One thing to insist on in their discussions of the data is that they treat “IQR” as a number measuring the spread, not as a range of values as many students are prone to do.  Investigation 2-4 asks students to think about how measures of spread relate to histograms.  This is a “low-tech” activity that can really catch some students in common misconceptions and you will definitely want to give students time to think through (a)-(d) on their own first.  The goal is to entice these students to make some common errors involving bumpiness and variety (as explained in the Discussion) so they can confront their misconceptions head on.  It will be important to provide students with immediate feedback on this investigation. We encourage taking the time to have students calculate the interquartile ranges by hand as doing so for tallied data appears to be nontrivial for them.  The actual numerical values for Practice 2-3 are below:

 

         Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Raleigh   39   42   50   59   67   74   78   77   71   60   51   43
SF        49   52   53   56   58   62   63   64   65   61   55   49

Practice 2-4 tries to convince students to consider a few different histogram bin widths to make sure they aren’t missing some detail.  A related java applet can be found here.  Investigation 2-5 aims to motivate the idea of standardized scores for “comparing apples and oranges.”  While students may realize you are working with a linear transformation of the data, we hope they will see the larger message of trying to compare observations on different scales through standardization.  Practice 2-7 should help drive this point home.  You also want to reiterate to students that the “surprisingness” of a data value depends on both its distance from the mean and the variability in the data.  The empirical rule is used to motivate an interpretation of standard deviation (half-width of the middle 68% of a symmetric distribution) that parallels their understanding of IQR.   (See also the results of Practice 2-6.)
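
To make the apples-and-oranges comparison concrete with data already at hand, the sketch below standardizes a value from each city in the Practice 2-3 temperature table above, treating the twelve monthly values for each city as the data set.

```python
# Z-scores for the Practice 2-3 temperatures: how unusual is a 78-degree
# month in Raleigh versus a 65-degree month in San Francisco?
import statistics

raleigh = [39, 42, 50, 59, 67, 74, 78, 77, 71, 60, 51, 43]
sf      = [49, 52, 53, 56, 58, 62, 63, 64, 65, 61, 55, 49]

for name, data, value in [("Raleigh", raleigh, 78), ("SF", sf, 65)]:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    z = (value - mean) / sd
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}, z({value})={z:.2f}")
# Because SF's temperatures vary so much less, its much smaller absolute
# deviation from the mean turns out to be about as "surprising"
# (a similar z-score) as Raleigh's large one.
```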

 

Investigation 2-6 gives students considerably more practice in using Minitab to analyze their data.  Students will probably need some help with questions (n)-(p) especially if they are not baseball fans.  These questions can be addressed in class discussion where those that are baseball fans can be the “experts” for the day.  Still, we also want students to get into the mental habit of playing detective as they explore data.  We find Practice 2-9 helps transition the data set to one that applies more directly to individual students.  We encourage you to collect students’ written explanations (perhaps in pairs) to provide feedback on their report writing skills (incorporating graphics and interpreting results in context).  If treated more as a lab assignment, you might consider a 20 point scale:

Defining MCI: 2 pts
Creating dotplots: 2 pts
Creating boxplots: 2 pts
Producing descriptive statistics: 2 pts
Discussion (shape, center, spread, outliers): 8 pts
Removing one team and commenting on influence: 3 pts
Overall communication (including embedding the graphics into the report): 1 pt

Such a rubric can be shown to the students as part of the assignment statement as well.

 

Investigation 2-7 leads students to explore mathematical properties of measures of center, and it also introduces the principle of least squares in a univariate setting.  As we mentioned above, this investigation can easily be skipped.  Questions (a) and (b) motivate the need for some criterion by which to compare point estimates, and questions (c)-(h) reveal that the mean serves as the balance point of a distribution.  Beginning in (k), students use Excel to compare various other criteria, principally the sum of absolute deviations and the sum of squared deviations.  Students who are not familiar with Excel may need some help, particularly with the “fill down” feature.  Questions (o) and (p) are meant to show students that the location of the minimum is not affected by the extremes but is affected by the middle.  Students will be challenged to make a conjecture in (q), but some students will realize that the median does the job.  Questions (t)-(w) redo the analysis for the sum of squared deviations, and in (x) students are asked to use calculus to prove that the mean minimizes the SSD.  This calculus derivation goes slowly for most students working on their own, so you will want to decide whether to save time by leading them through it.  Practice 2-10 extends the analysis to an odd number of observations, and Practice 2-11 looks at a new criterion, the maximum of absolute deviations, for which the minimum occurs at the midrange.
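
A compact way to preview (or check) the punchline of this investigation is a small grid search over candidate values of the measure of center; the data set below is invented.

```python
# Compare the sum of absolute deviations (SAD) and the sum of squared
# deviations (SSD) across candidate center values m.
data = [1, 3, 4, 7, 15]

def sad(m):
    return sum(abs(x - m) for x in data)

def ssd(m):
    return sum((x - m) ** 2 for x in data)

candidates = [i / 10 for i in range(0, 161)]   # grid from 0.0 to 16.0
best_sad = min(candidates, key=sad)
best_ssd = min(candidates, key=ssd)

print(f"SAD minimized at {best_sad} (median = {sorted(data)[2]})")
print(f"SSD minimized at {best_ssd} (mean = {sum(data) / len(data)})")
```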

 

Section 2-2: Statistical Significance

 

Timing/Materials:  Students are asked to conduct a simulation using index cards in Investigation 2-8, followed by creating and executing a Minitab macro.  Encourage students to review the steps in creating a macro prior to coming to class.  This macro is used again in Investigation 2-9 and then modified to carry out an analysis in Investigation 2-10.  This section might take 75-90 minutes.

 

This section again returns to the question of statistical significance, as in Chapter 1, but now for a quantitative response variable.  As before, students will use shuffled cards and then a Minitab macro to simulate a randomization distribution.  However, this time there will not be a parallel probability model, because we need to consider not only the number of samples but also the value of the mean for each sample, which is computationally intensive.  We encourage you to especially play up the context in Investigation 2-8, where students can learn a powerful message about the effects of sleep deprivation. (It has been shown that sleep deprivation impairs visual skills and thinking the next day; this study examines the effects 3 days later.)  The tactile simulation will probably feel repetitive, so you may try to streamline it, but we have found students still need some convincing on the process behind a randomization test.  It is also interesting to have students examine medians as well as means.  In question (h) we again have students add their results to a dotplot on the board.  Students are asked to type in the macro commands; you may also choose to provide the macro file to them.  Either way, it will be important for them to understand what Minitab is doing.  Forgetting to initialize the counter (let k1=1 at the MTB> prompt) before running the macro is the most common error that occurs; students also need to be reminded that spelling and punctuation are crucial to the functionality of the macro.  In this case, they also need to realize that the worksheet containing the data needs to be open before they run the macro.  Indicator variables, as in (l), will also be used extensively throughout the book.  We encourage students to get into the habit of running the macro once to make sure it is executing properly before they try to execute it 1000 times.  We do show them the results of generating all possible randomizations in (p) to convince them of the intractability of this approach and to motivate the later study of the t distribution.  You might consider providing more feedback (e.g., collecting samples of their answers) for questions (d) and (e) to monitor students’ progress with these concepts.  A slightly improved version of the figure on p. 128 is below:

 

Investigations 2-9 and 2-10 provide further practice while focusing on two statistical issues: the effect of within group variability on p-values (having already studied sample size effects in Chapter 1) and how to interpret p-values with observational data.  Question (b) of Investigation 2-9 is a good example of how important we think it is to force students to make predictions, regardless of whether or not their intuition guides them well at this point; students struggle with this one, but we hope that they better remember and understand the point (that more variability within groups leads to less convincing evidence that the groups differ) through having made a conjecture in the first place.  The subsequent practice problems are a bit more involved than most, so you may want to incorporate them more into class and/or homework.
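
If you want students to see the randomization test for means outside of Minitab, or want a quick way to check their macro’s behavior, here is a Python analogue with hypothetical response values (not the sleep-deprivation data).

```python
# Randomization test for a difference in group means: shuffle the pooled
# responses, re-split into groups, and see how often the re-randomized
# difference is at least as large as the observed one.
import random

random.seed(4)
group_a = [25.2, 14.5, 13.0, 12.6, 34.5, 45.6, 11.6, 18.6, 12.1, 30.5]
group_b = [ 2.4, 21.8, 10.0,  7.2,  9.6,  9.7,  4.5, 11.6, 18.6,  7.6]
observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

pooled = group_a + group_b
count = 0
reps = 1000
for _ in range(reps):
    random.shuffle(pooled)
    diff = sum(pooled[:10]) / 10 - sum(pooled[10:]) / 10
    if diff >= observed:
        count += 1

print(f"observed difference in means: {observed:.2f}")
print(f"approximate p-value: {count / reps:.3f}")
```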

 

Chapter 3

 

In this chapter, we transition from comparing two groups to focusing on how to sample the observational units from a population.  While issues of generalizing from the sample to the population were touched on earlier, in this chapter students formally learn about random sampling.  We focus (but not exclusively) on categorical variables and introduce the binomial distribution in this chapter, along with more ideas and notation of significance testing.  There is a new spate of terminology that you will want to ensure students have sufficient practice applying.  In particular, we try hard to help students clearly distinguish between the processes of random sampling and randomization, as well as the goals and implications of each.

 

Section 3-1: Sampling from Populations I

 

Timing/Materials:  Investigation 3-1 should take about one hour.  A version of Table I (a random number table) has been placed here.  Students can also use the random number generator in their calculators if you are not concerned that they learn to use a random number table (though the latter is convenient for testing situations).  Investigation 3-2 can be done quickly at the end of a class period, in about 15 minutes.  An applet is used in Investigation 3-3 and there are some Minitab questions in Investigation 3-4; these two investigations together probably take 60-75 minutes.  You will probably want students to be able to use Minitab in Investigation 3-5, which can take 30-40 minutes.  Investigation 3-6 is a Minitab exercise focusing on properties of different sampling methods (e.g., stratified sampling) that could be moved outside of class (about 30 minutes) or skipped.  Investigations 3-2 and 3-6 can be regarded as optional if you are short on time.

 

The goal of Investigation 3-1 is to convince students of the need for (statistical) random sampling rather than convenience sampling or human judgment.  Some students quibble that part (a) only asks for “representative words,” which they interpret as representing language from the time or words conveying the meaning of the speech.  This allows you to discuss that by representative we mean having the same characteristics as the population, regardless of which characteristics you decide to focus on.  Here we focus on the lengths of words, expecting most students to oversample the longer words.  Through constructing the dotplot of their initial samples (again, we usually have students come to the front of the class to add their own observation) we also hope to provide students with a visual image of bias, with almost all of the student averages falling above the population average.  We hope that having students construct graphs of their sample data prior to the sampling distribution will help them distinguish between these distributions, and you will want to point this out frequently.  The sampling distribution of the sample proportions may not produce as dramatic an illustration of bias, due to the small sample size and granularity, but will still help get students thinking about sampling distributions and sampling variability.  This investigation also helps students practice with the new terminology and notation related to parameters and statistics.  We often encourage students to remember that population parameters are denoted by Greek letters, as in real applications they are unknown to the researchers (“It’s Greek to me!”).  The goals of questions (t)-(w) are to show students that random sampling does not eliminate sampling variability but does move the center of the sampling distribution to the parameter of interest.  By question (w) they should believe that a method using a small random sample is better than one using larger nonrandom samples.  Practice 3-1 uses a well-known historical context to drive this point home (you can also discuss Gallup’s ability to predict both the Digest results and the actual election results much more accurately with a smaller sample).
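If you want a quick demonstration beyond the dotplot on the board, the following Python sketch (the text itself uses a random number table and, later, an applet) shows that random sampling centers the sampling distribution at the population mean, while larger samples reduce the sampling variability.  It assumes a plain-text copy of the speech saved as gettysburg.txt, a hypothetical file name:

    import numpy as np

    words = open("gettysburg.txt").read().split()        # hypothetical file
    lengths = np.array([len(w.strip(".,;-")) for w in words])
    mu = lengths.mean()                                  # the population parameter

    rng = np.random.default_rng(0)
    for n in (5, 10, 25):                                # try several sample sizes
        xbars = np.array([rng.choice(lengths, size=n, replace=False).mean()
                          for _ in range(1000)])
        print(n, round(mu, 2), round(xbars.mean(), 2), round(xbars.std(), 2))
    # the mean of the sample means stays near mu (no bias); the SD shrinks with n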

 

Investigation 3-2 is meant as a quick introduction to alternative sampling methods.  You may even choose to talk students through these ideas, but it is important for them to consider methods that are still statistically random but may be more convenient to use in practice.

 

Investigation 3-3 continues the exploration of samples of words from the Gettysburg Address, using a java applet to take a much larger number of samples and to more easily change the sample size in exploring the sampling distribution of the sample mean.  Students should also get a visual image for a third distribution (beyond the sample and the empirical sampling distribution): the population.  The questions step students through the sampling process in the applet very slowly to ensure that they understand what the output of the applet represents (e.g., what each green square represents).  The focus here is on the fundamental phenomenon of sampling variability, and on the effect of sample size on sampling variability; we are not leading students to the Central Limit Theorem yet.  The last few questions attempt to help students continue to think in terms of statistical significance by asking whether certain sample results would be surprising.  A common student difficulty is distinguishing between the sample size and the number of samples, so you will want to discuss that frequently.  You may consider a simple PowerPoint illustration to help them focus on the overall process behind a sampling distribution (copy here), e.g.:


[PowerPoint illustration of the overall sampling-distribution process; see the copy linked above.]


This exploration is continued in Investigation 3-4, but for sample proportions.  Again the focus is on sampling variability and sample size effects.  The parallel is drawn to the hypergeometric distribution, and students again use Minitab to construct the theoretical sampling distribution and expected value, which they can compare to the applet simulation results.  Students continue this comparison by calculating the exact probability of certain values and comparing it to the proportion of samples simulated.  The subsequent practice problems address another common student misconception: that the size of the population always affects the behavior of the sampling distribution.  In parts (a) and (c), consider using a sample of size 20 (and 2 nouns) instead of 100 words.  We want students to see that with large populations, the characteristics of the sampling distribution do not depend on the population size (you will need to encourage them to ignore small deviations in the empirical values due to the finite number of samples).
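As a cross-check on the applet and Minitab output, here is a short Python sketch of the exact hypergeometric sampling distribution next to a simulation.  The population counts are illustrative, not necessarily the text's:

    import numpy as np
    from scipy.stats import hypergeom

    M, K, n = 268, 99, 5        # population size, # successes, sample size (illustrative)
    exact = hypergeom(M, K, n)  # scipy's order: population size, successes, draws
    ks = np.arange(n + 1)
    print(exact.pmf(ks).round(4))       # exact P(X = k) for k = 0, ..., n
    print(exact.mean() / n, K / M)      # E(p-hat) equals the population proportion

    rng = np.random.default_rng(0)
    sims = rng.hypergeometric(K, M - K, n, size=10000) / n
    print(sims.mean())                  # simulated p-hats center at K/M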

 

In Investigation 3-5 students put the earlier observations together, using the hypergeometric distribution to make a decision about an unknown population parameter based on the sample results, and again consider the issue of statistical significance.  In class discussion you will need to emphasize that this is the real application, making a decision based on one individual sample, but that their new knowledge of the pattern of many samples is what enables them to make such decisions (with some, but not complete, certainty).  (Remind them that they usually do not have access to the population and typically have only one sample, but that the detour of the previous investigations was necessary to begin to understand the pattern of the sampling distribution.)  In this investigation they are also introduced to other study design issues such as nonsampling errors.  The graphs on p. 164 provide a visual comparison of the theoretical sampling distribution for different parameter values.  You may want to circle the portion of the distribution to the right of the arrow, to help students see that the observed sample result is very unusual for the parameter value on the left but not for the one on the right.  The discussion tries to encourage students to talk in terms of plausible values of the parameter and to avoid sloppy language like “the probability that the parameter equals 2/3.”  You may want to remind them of what they learned about the definition of probability in Chapter 1 and that the parameter value is not what is changing in this process.  The goal is to get students thinking in terms of “plausible values of the parameter based on the sample results.”
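To mirror the side-by-side graphs on p. 164 numerically, you can compute how surprising the observed count is under two candidate population compositions; a Python sketch with illustrative numbers:

    from scipy.stats import hypergeom

    M, n, x = 5000, 25, 20              # population size, sample size, observed count (illustrative)
    for K in (2500, 4000):              # two hypothesized parameter values
        pval = hypergeom(M, K, n).sf(x - 1)   # sf(x-1) = P(X >= x)
        print(K / M, round(pval, 4))
    # the observed result is very unusual under one value but not the other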

 

As discussed above, Investigation 3-6 is a Minitab exercise to explore the sampling distribution resulting from different sampling methods.  This could work well as a paired lab to be completed outside of class to further develop student familiarity and comfort with Minitab.

 

At this point in the course you could consider a data collection project where students are asked to take a sample from a larger population for a binary variable where they have a conjecture as to the population probability of success.  Click here for an example assignment.

 

Section 3-2: Sampling from a Process

 

Timing/Materials: In this section we transition from sampling from a finite population to sampling from a process, which motivates replacing the hypergeometric distribution with the binomial probability distribution as the mathematical model in this setting.  Beginning in Investigation 3-8 we rely heavily on a java applet for simulating binomial observations, and we use both the applet and Minitab to compute binomial probabilities.  This section should take 50-60 minutes, especially if you give students some time to practice Minitab on their own.

 

Investigation 3-7 presents a variation on a study that will be analyzed in more detail later.  You may wish to replace the photographs at petowner.html (for Beth Chance, the correct choice is in the middle) with your own.  The goal here is to introduce a Bernoulli process and Bernoulli random variable; introduction of the binomial distribution waits until the next investigation.  Investigation 3-8 presents a similar simulation but in a more artificial context, which is then expanded to develop the binomial model.  You can play up this example, telling students to begin answering the questions immediately - they will be uncomfortable with the fact that you have not shown them any questions!  You can even warn them that question 4 is particularly tough.  The point is for the students to guess blindly (and independently) on all 5 questions.  You can then show them an answer key (we randomly generate a different answer key each time) and have them (or their neighbor) determine the number of correct answers.  You can also tease the students who have 0 correct answers that they must not have studied.  You may want to work together with the students through most of this investigation.  Student misconceptions to look out for are confusing the equally likely outcomes of the sample space with the outcomes of the random variable, not seeing the distinction between independence and a constant probability of success, and incorrect application of the complement rule.  The probability rules are covered very briefly; if you desire more probability coverage in this course, you may wish to expand on these rules.  We again encourage use of technology to help students calculate probabilities quickly and to give students a visual image of the binomial distribution.  Questions (y) and (z) help students focus on the interpretation of the probability and how to use it to make decisions (is this a surprising result?), so this is a good time to slow down and make sure the students are comfortable.  These questions also ask students to consider and develop intuition for the effects of sample size.  They will again need to be reminded, as in the hint in question (y), of the correct way to apply the complement rule; this comes up very often and causes confusion for many students.  You may wish to change the heading on p. 176 to “Inverse Cumulative Probabilities.”  Question (aa) is also a difficult one for students but worth spending time on.  The subsequent practice problems provide more practice in identifying the applicability of the binomial distribution and the behavior of the distribution for different parameter values.  Be specific with students if you use “parameter” to refer to both the numerical characteristic of the population and that of the probability model.
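Since the complement rule is the sticking point here, it may help to put the correct and incorrect versions side by side.  A sketch in Python with an illustrative n and p (not necessarily the quiz's):

    from scipy.stats import binom

    n, p = 5, 0.2             # illustrative: 5 questions, 1-in-5 blind guessing
    X = binom(n, p)
    print(X.pmf(0))           # P(X = 0): no correct answers
    print(X.sf(2))            # P(X >= 3), computed directly
    print(1 - X.cdf(2))       # same thing: 1 - P(X <= 2)
    print(1 - X.cdf(3))       # WRONG for P(X >= 3): this is P(X >= 4)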

 

Section 3-3: Exact Binomial Inference

 

Timing/Materials:  This section will probably take about 3 hours, depending on how much exploring you want students to do vs. leading them through.  Many students will struggle more than usual with all of the significance testing concepts and terminology introduced here, and they also need time to become comfortable with binomial calculations.  Ideally students will have access to technology to carry out some of the binomial probability calculations (e.g., instructor demo, applet, or Minitab).  Access to the applet is specifically assumed in Investigations 3-12 and 3-13 (for the accompanying visual images).  The section concludes by introducing students to the exact binomial p-value and confidence interval calculation in Minitab.   

 

Investigation 3-9 has students apply the binomial model to calculate p-values.  We again have students work with questions reviewing the data collection process, some of which you may ask them to consider prior to coming to class.  (There are many especially good measurement issues to discuss/debate with your students in these next few investigations.)  Since the full sample result has a minuscule p-value, we begin in (f) by having students consider a subset of the data with less extreme results, before examining the full dataset in (l).  Once students feel comfortable with the steps of this inferential process (you may want to summarize the steps for them: start with a conjecture, look at the sample data, consider the appropriate sampling distribution, calculate and interpret the p-value), you can then add the terminology of the null and alternative hypotheses.  You will want to get students into the habit of writing these hypothesis statements both in symbols and “in words.”  You can draw the parallel that in Chapters 1 and 2, the null hypothesis was “no treatment effect.”  In the terminology detour on p. 181, we start by considering the null as a set of values but then transition to always considering simple null hypothesis statements that specify equality to a single value of the parameter.  Students repeat this inferential process, and practice setting up null and alternative values, in Investigations 3-10 (again considering issues of sample size) and 3-11 (a slightly more complicated, “nested” use of the binomial distribution).  The graphs on p. 184 are a good precursor to considering when the binomial distribution can be approximated by a normal distribution.  Keep in mind that all of the alternative hypotheses up to this point have been one-sided.
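A compact Python version of the inferential steps (conjecture, data, sampling distribution, p-value) for a one-sided exact binomial test, with illustrative counts; scipy 1.7+ supplies binomtest:

    from scipy.stats import binom, binomtest

    k, n, p0 = 14, 20, 0.5    # observed successes, trials, null value (illustrative)
    print(binom(n, p0).sf(k - 1))                              # P(X >= k) under H0
    print(binomtest(k, n, p0, alternative="greater").pvalue)   # identical result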

 

The transition to two-sided p-values is made in Investigations 3-12 and 3-13.  You will want to help students understand when to consider one-sided vs. two-sided alternatives.  This is a trickier issue than when it is presented in terms of a z-test, because here you do not have the option of telling students to simply consider the test statistic value and its negative.  In Investigation 3-12, the sampling distribution under the null hypothesis is perfectly symmetric (p = .5), but it is not in Investigation 3-13.  In the former case, we consider the second tail to be the set of observations the same distance from the hypothesized value as the observed result.  But in the latter case, there are different schools of thought for calculating the two-sided p-value (as discussed on p. 194); the applet uses a different algorithm from Minitab.  You may not want to spend long on this debate with your students, and we suggest that you focus on the larger idea of why two-sided tests are important and appropriate, but students should be aware of why the two technologies may lead to different results.  Note also that the numerical values will differ slightly depending on the decimal expansion of 2/3 that is used.
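To see concretely why different technologies can disagree, here is a Python sketch of two common conventions for an exact two-sided p-value when the null distribution is not symmetric (we are not asserting which convention each tool uses; the counts are illustrative):

    import numpy as np
    from scipy.stats import binom

    k, n, p0 = 14, 20, 2/3
    X = binom(n, p0)
    ks = np.arange(n + 1)
    pk = X.pmf(ks)

    # convention 1: sum P(X = j) over all j no more likely than the observed k
    p_likely = pk[pk <= X.pmf(k) * (1 + 1e-9)].sum()

    # convention 2: sum P(X = j) over all j at least as far from n*p0 as k is
    p_dist = pk[np.abs(ks - n * p0) >= abs(k - n * p0) - 1e-9].sum()

    print(p_likely, p_dist)   # with an asymmetric null, the two answers differ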

 

Investigation 3-14 then pushes this line of reasoning a step further by having students determine which hypothesized values of the parameter would not be rejected by a two-sided test.  They thereby construct, through trial and error, an interval of plausible values for the parameter.  We believe doing this “by hand” will help students understand how to interpret a confidence interval.  A PowerPoint illustration of this process (but in terms of the population mean and the empirical rule) can be found here (view in slideshow mode).  The applet can be used in a similar manner, but you have to watch the direction of the alternative in demonstrating this for students.  The message to reiterate is that the confidence interval consists of those values of the parameter for which the observed sample result would not be considered surprising (at the level of significance applied).
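The trial-and-error search is easy to automate once students have done a few values by hand; a Python sketch with illustrative counts and a 5% significance level:

    import numpy as np
    from scipy.stats import binomtest

    k, n = 14, 20
    plausible = [p0 for p0 in np.arange(0.01, 1.00, 0.005)
                 if binomtest(k, n, p0).pvalue > 0.05]   # two-sided by default
    print(min(plausible), max(plausible))   # endpoints of the plausible-values interval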

 

Investigation 3-15 begins the discussion of types of errors.  You will want to make sure students are rather comfortable with the overall inferential process before using this investigation.  Many students will struggle with the concepts, but the investigation does include many small steps to help them through the process (meaning we really encourage you to allow the students to struggle with these ideas for a bit before adding your explanations and summary comments; you will want to make sure they understand the basic definitions before letting them loose).  This investigation can also work well as an out-of-class, paired assignment.  The concepts of Type I and Type II Errors will recur in Chapter 4.
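When you summarize, it can help to pin the definitions to a single concrete calculation; a Python sketch of exact Type I and Type II error probabilities for a binomial test (all numbers illustrative):

    from scipy.stats import binom

    n, p0, pa = 20, 0.5, 0.75     # trials, null value, one alternative value
    c = 15                        # decision rule: reject H0 when X >= c
    type1 = binom(n, p0).sf(c - 1)      # P(reject | H0 true) = Type I error
    power = binom(n, pa).sf(c - 1)      # P(reject | p = pa) = power
    print(type1, 1 - power)             # Type I and Type II error probabilities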

 

Section 3-4: Sampling from a Population II

 

Timing/Materials:  This section will take approximately one hour.  Use of Minitab is assumed for parts of Investigations 3-16, 3-17, and 3-18.

 

In this section, students learn that the binomial distribution that they have just applied to random processes can also be applied to random sampling from a finite population if the population is large.  Students consider the binomial approximation to the hypergeometric distribution and then use this model to approximate p-values.  The goal of Investigation 3-16 is to help students see how the probabilities are quite similar for large populations.  Investigations 3-17 and 3-18 then provide practice in carrying out this approximation in real contexts.  Practice 3-31 introduces the sign test as another inferential application of the binomial distribution.
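The convergence in Investigation 3-16 is easy to tabulate; a Python sketch comparing the two models as the population grows (numbers illustrative):

    from scipy.stats import hypergeom, binom

    n, x = 25, 15                        # sample size, observed count
    for M in (100, 1000, 100000):        # increasing population sizes
        K = M // 2                       # keep the population proportion at 1/2
        print(M,
              round(hypergeom(M, K, n).sf(x - 1), 4),   # exact P(X >= x)
              round(binom(n, K / M).sf(x - 1), 4))      # binomial approximation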

 

Summary

 

You will want to remind students that most of Ch. 3, calculating the p-values in particular, concerned binary variables, whether for a finite population or for a process/infinite population.  Remind them what the appropriate numerical and graphical summaries are in this setting and how they differ from the analysis of quantitative data.  If you want students to be able to properly verify the Bernoulli conditions, you will want to review those as well.  Be ready for students to struggle with the new terminology and notation and with proper interpretations of p-values and confidence intervals.  Reassure students that these ideas will be reinforced by the material in Ch. 4 and that the underlying concepts are essentially the same as those they learned for comparing two groups in Chapters 1 and 2.

 

Another interesting out-of-class assignment here would be sending students to find a research report (e.g., http://www.firstamendmentcenter.org/PDF/SOFA.2003.pdf) and asking them to identify certain components of the study (e.g., population, sampling frame, sampling method, methods used to control for nonsampling bias and methods used to control for sampling bias), to verify the calculations presented, and to comment on the conclusions drawn (including how these are translated to a headline).

 

Chapter 4

 

This chapter continues the theme of Chapter 3, the behavior of random samples from a population and how knowledge of that behavior allows us to make decisions.  Most of the chapter is devoted to models that apply when large samples are selected, namely the normal distribution.  We begin with some background on the normal distribution as a model and then focus on the Central Limit Theorem for both categorical (binary) and quantitative data.  The last section, bootstrapping, provides alternative inferential methods when the Central Limit Theorem does not apply (e.g., small samples, other sample statistics).

 

Section 4-1: Models of quantitative data

 

Timing/Materials:   Minitab (including features new to version 14) is used heavily in Investigations 4-1 and 4-2.  You may want to assign some of the reading (e.g., p. 226-7) outside of class.  Probability plots (Investigation 4-2) may not be on your syllabus, but we ask students to use these plots often and so we do not recommend skipping them.  This section can probably be covered in 60-75 minutes.

 

In this section we try to convey the notion of a model, in particular, probability models for quantitative variables.  Investigation 4-1 introduces the idea that very disparate variables can follow a common model (with different parameter values).  We do not spend a long time on nonnormal models (e.g., exponential, gamma) but feel students should get a flavor for nonsymmetric models as well and realize that the normal model does not apply to all variables.  The subsequent practice problems lead students to overlay different model curves on data histograms.  (Minitab 14 automatically scales the curve, and thus we do not have them convert the histogram to the density scale first.)  In Investigation 4-2, probability plots are introduced as a way to help assess the fit of a model.  There is some debate on the utility of probability plots, but we feel they provide a better guide than simple histograms for judging the fit of a model, especially for small data sets.  Still, it can take students a while to become comfortable reading these graphs.  We attempt to focus on interpreting these plots by looking for a linear pattern and do not ask students to learn the mechanics behind the construction of the graphs.  We use questions (h)-(j) to help them gain some experience in judging the behavior of these graphs when the data are known to come from a normal distribution; many students are surprised at how much variation arises in samples, and therefore in probability plots, even when the population really follows a normal distribution.  Some nice features in Minitab 14 make it easy to quickly change the model that is being fit to the data (both in overlaying the curve on the histogram and in the probability plot).  If you are very short on time, Investigation 4-2 could be skipped, but we will make use of probability plots in later chapters.
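For questions (h)-(j), you can generate as many “known normal” samples as you like.  A Python sketch (the text uses Minitab 14's probability plots; scipy's probplot plays the same role):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import probplot

    rng = np.random.default_rng(0)
    fig, axes = plt.subplots(1, 4, figsize=(12, 3))
    for ax in axes:     # four samples from the same normal population
        probplot(rng.normal(100, 15, size=20), dist="norm", plot=ax)
    plt.tight_layout()
    plt.show()          # roughly linear, but with real sample-to-sample variation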

 

Section 4-2: Applying the (Normal) Probability Model

 

Timing/Materials:  Minitab is used extensively in Investigations 4-3 and 4-4.  Investigation 4-5 centers on a java applet, which has the advantage of supplying the visual image of the normal model.  You may wish to begin with Minitab until students are comfortable drawing their own sketches and thinking carefully about the scaling and the labeling of the horizontal axis.  This section probably requires at least 90 minutes of class time.

 

In Investigation 4-3, the transition is made to using the theoretical models to make probability statements.  The last box on p. 234 will be an important one to emphasize.  We immediately turn the normal probability calculation over to Minitab and do not use a normal probability table at all.  (This also has implications for the testing environment.)  It will be important to continue to accompany these calculations with well-labeled sketches of probability curves and to help students distinguish between the theoretical probability and the observed proportion of observations in sample data.  By the end of Investigation 4-3, we would like students to be comfortable applying a model to a situation where they do not have actual observations.  Such calculations are made in Practice Problems 4-5, 4-6, and 4-7, including practice with elementary integration techniques and simple geometric methods for finding areas under probability “curves.”

 

We continue to apply the normal probability model to real sample data in Investigation 4-4, and you will want to make sure students are becoming comfortable with the notation and with Minitab.  On p. 238, we discuss the complement rule for these continuous distributions, and you will want to highlight this compared to the earlier adjustments for discrete distributions (once students “get” the discrete adjustment, they tend to overapply it).  This investigation also tries to motivate the correspondence between probabilities calculated in terms of X from a N(μ, σ) distribution and in terms of Z from the standard normal distribution.  This conversion may not seem meaningful to students at first (both because technology can work directly on the original scale and because we are not having them look the z-score up in a table), but you will want to remind them of the utility of reporting the z-value.  In using Minitab, most students will prefer using the menus, but it may be worth highlighting some of the Session command shortcuts as well.  We have attempted to step students through the necessary normal probability calculations (including inverse probability calculations), but afterwards you will want to highlight the different types of problems and how students can recognize what is being asked for in a particular problem.
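The same calculations students do through Minitab's menus, sketched in Python with an illustrative N(μ = 100, σ = 15):

    from scipy.stats import norm

    mu, sigma = 100, 15
    X = norm(mu, sigma)
    print(X.cdf(130))           # P(X <= 130)
    print(X.sf(130))            # P(X > 130); continuous, so no discrete adjustment
    print(X.ppf(0.90))          # inverse probability: the 90th percentile
    z = (130 - mu) / sigma      # the corresponding z-score
    print(z, norm.cdf(z))       # same probability on the standard normal scale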

 

Investigation 4-5 provides more practice, this time using the java applet.  We apologize for the strange spacing on these next pages.  You will want to make sure students are comfortable with the axis scales (note that the applet reports both the x values and the z values) and with interpreting the probability that is reported.  This investigation also introduces “area between” calculations and provides the justification of the empirical rule.

 

Section 4-3: Distributions of Sample Counts and Proportions

 

Timing/Materials:  This section covers many important ideas, difficult for students, related to the sampling distribution of a sample proportion.  It introduces students to the normal approximation to the binomial distribution and to z-tests and z-intervals for a proportion.  For Investigation 4-6, you will want to bring in Reese’s Pieces.  You may be able to find the individual bags (“fun size”) or you may have to pour from a larger bag for each individual student.  This takes some time in class but is always a student favorite.  We often pour candies into Dixie cups prior to the start of class to help minimize the distribution time.  We aim for at least 25 candies in each cup, and then ask students to select the first 25 “at random” (without regard to color).  You can try to give these instructions before they have read too much about the problem context.  Also in Investigation 4-6, students quickly turn to a java applet to take many more samples.  Investigation 4-7 might actually be a good one to step students through slowly without the “distractions” of technology.  Investigation 4-8 assumes students will use technology to calculate probabilities, and you will want results from the earlier analysis of this study on hand.  Similarly, technology is assumed for probability calculations in Investigation 4-9, including the 1 Proportion menu in Minitab.  Investigation 4-10 involves the confidence interval simulation applet.  Students can work through this together in pairs outside of class, but you will want to insist on time in class for debriefing of their observations (and/or collection of written observations).  You will want to carry out the “which prefer to hear first” survey in Investigation 4-11 to obtain the results for your students, possibly ahead of time.  This investigation also requires quick use/demonstration of the confidence interval simulation applet.  This section could take 3 hours of class time.

 

In Investigation 4-6, we first return to some very basic questions about sampling variability.  Hopefully these questions will feel like review for the students, but we think it is important to think carefully about these issues and to remind them of the terminology and of the idea of sampling variability.  Weaker students can become overwhelmed by the reliance on mathematical notation at this point, and you will want to keep being explicit about what the symbols represent.  In the investigation students are asked to think about the shape, center, and spread of the sampling distribution of sample proportions, as well as to use the applet to confirm the empirical rule (reminding them that the “observational units” are the samples here).  They also think about how the sample size and the probability of success, p, affect the behavior of the sampling distribution.  At this point you could tell them the “formulas” for the mean and the standard deviation of the sampling distribution, or you can have them work through (or lead them through) the probability detour on p. 252-3.  If you have the time, this probability detour provides a nice introduction to, and practice with, rules of expectation and variance.  Mostly you will want to highlight how these expressions depend on n and p, and that the normal shape depends on how large the sample size is and how extreme (close to 0 or 1) the success probability is.  This is the first time students are introduced to the phrase “technical conditions” that will accompany all subsequent inferential procedures discussed in the course.  You will probably have to give some discussion of why the normal approximation is useful, since they have already used the binomial and hypergeometric “exact” distributions to make inferences.  You will want to make sure everyone is comfortable with the calculations on p. 254, where all of the pieces are put together.  Practice Problems 4-11 and 4-12 provide more practice doing these calculations, and Practice Problem 4-13 is an optional exercise introducing students to continuity corrections.
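A sketch of the kind of calculation assembled on p. 254, with an illustrative n and success probability, comparing the normal approximation to the exact binomial answer:

    import numpy as np
    from scipy.stats import norm, binom

    n, p = 100, 0.45
    sd = np.sqrt(p * (1 - p) / n)       # SD of the sampling distribution of p-hat
    print(p, sd)                        # its mean is p itself
    print(norm(p, sd).sf(0.55))         # normal approximation to P(p-hat >= .55)
    print(binom(n, p).sf(54))           # exact answer: P(X >= 55)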

 

Investigation 4-7 returns to the context of a statistical investigation in which students must consider hypothesis statements and p-values, as they have before, but now using the normal model to perform the calculations.  You will want to emphasize that the reasoning process is the same.  Some students will want to debate the “logic” of this calculation (for example, assuming that the proportion of women among athletes should be the same as the proportion of women among students), and you will want to be clear about what this p-value does and does not imply and that there are many other issues involved in such a legal case (e.g., surveys of student interest and demonstrable efforts to increase the participation of women are also used in determining Title IX compliance).  The idea of a test statistic is formally introduced on p. 258 (one advantage to using the normal distribution), and p. 259 tries to remind students of the different methods for finding p-values with a single categorical variable that they have encountered so far.  Students should be encouraged to highlight p. 260 as a page they will want to return to often from this point in the course forward.  You might also want to show them how this structure applies to the earlier randomization tests from Chapters 1 and 2 as well.

 

Investigation 4-8 returns to an earlier study and re-analyzes the data with the normal approximation.  You will want to have the reference for the earlier binomial calculation handy.  This investigation continues on to consider Type I and Type II Error probabilities through the normal distribution.  Some students will find this treatment of power easier than the earlier use of the binomial distribution, but you will want to make sure they are comfortable with the standard structure of tests of significance before continuing to these more subtle issues.  You will want to draw many pictures of normal curves and rejection regions.  Please see the errata page for corrected versions of the graphics on p. 265 (.583 should be .584 and .746 should be .75).

 

Similarly, Investigation 4-9 shows how much more straightforward it is to calculate a confidence interval using the normal model (though remind students that it still represents an interval of plausible values of the parameter).  Students are introduced to the terms standard error and margin of error.  This would be a good place to bring in some recent news reports (or have students find and bring them in) to show how these terms are used more and more in the popular media.  A subtle point you may want to emphasize with students is how “margin of error” and “confidence level” measure different types of “error.”  The Applet Exploration beginning on p. 272 should help them focus on a proper interpretation of confidence.  This exploration can be completed outside of class, but you will probably want to emphasize to students whether you consider their ability to make a correct interpretation of confidence a priority.  (We often tell them in advance that it will be an exam question and warn them that it will be hard to “memorize” a definition, due to the length of a correct interpretation and the insistence on context, so they should understand the process.)  We hope the applet provides a visual image they will be able to use for future reference, for example by showing that the parameter value does not change but that what does vary is the sample result and therefore the interval.  Though we do want students to understand the duality between level of significance and confidence level, we encourage you to have them keep those as separate terms.  One place you can trim time is how much you focus on sample size determination calculations.  All of these procedures are summarized on p. 270, another page you will want to remind them to keep handy.  We have included the necessary Minitab commands and common critical values (top of p. 271) for ease of reference.  Let students know if you will be requiring them to carry out these calculations in other ways.
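A sketch of the interval calculation itself, separating the standard error, critical value, and margin of error (illustrative counts):

    import numpy as np
    from scipy.stats import norm

    x, n, conf = 55, 100, 0.95
    phat = x / n
    zstar = norm.ppf(1 - (1 - conf) / 2)     # critical value (1.96 for 95%)
    se = np.sqrt(phat * (1 - phat) / n)      # standard error of p-hat
    moe = zstar * se                         # margin of error
    print((phat - moe, phat + moe))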

 

Investigation 4-11 provides students with a scenario where the normal approximation criteria are not met and therefore an alternative method should be considered.  We present the formula for the “Wilson Estimator” and then use the applet to have them explore the improved coverage properties of the resulting “adjusted Wald” intervals.  You may want to discuss with them some of the intuitive logic of why this is a better method (but again focus on how the idea of confidence is a statement about the method, not individual intervals).  In particular, in the applet, they should see how intervals that previously had length zero (because the sample proportion was 0 or 1) now produce meaningful intervals.  Some statisticians argue that this “adjusted Wald” method should always be used instead of the original Wald method, but since Minitab does not yet have this option built in, we tend to have students consider it separately.  They may wish to add this method (and the Minitab “trick” for calculating such an interval) to the summary on p. 270.  We also like to emphasize to students how recently this method has come into the mainstream, to help highlight the dynamic and evolving nature of the discipline of statistics.
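The adjustment is tiny to implement, which makes a nice contrast with how much it improves coverage.  A Python sketch of a 95% adjusted Wald interval, using an extreme illustrative sample where the ordinary Wald interval has zero width:

    import numpy as np
    from scipy.stats import norm

    x, n = 0, 20                     # illustrative: no successes observed
    ptilde = (x + 2) / (n + 4)       # Wilson estimator: add 2 successes, 2 failures
    se = np.sqrt(ptilde * (1 - ptilde) / (n + 4))
    moe = norm.ppf(0.975) * se
    print(max(0.0, ptilde - moe), min(1.0, ptilde + moe))   # a usable interval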

 

Section 4-4: Distributions of Sample Means

 

Timing/Materials:  Investigation 4-12 makes heavy use of Minitab (version 14), with students creating more Minitab macros.  There are applet explorations on p. 294 that reinforce some of the material in Investigation 4-12 while also extending the investigation.  Minitab is also assumed in Investigations 4-13 and 4-14.  You might consider having students collect their own shopping data for two local stores.  A convenient method is to randomly assign each student a product (with size and brand details) and then ask them to obtain the price for their product at both stores.  This appears to be less inconvenient for students than asking them to find several products, but you will still want to allow them several days.  These data can then be pooled across the students to construct the full data set.  The sampling frame can be obtained if you can convince one local store to supply an inventory list, or you can use a shopping receipt from your family or from a student (or a sample of students).  This section will probably take at least 2 hours of class time.

 

This section parallels the earlier discussions in Section 4-3 but focuses on distributions of sample means rather than proportions. It introduces students not only to the Central Limit Theorem for a sample mean but also to t-distributions, t-tests, and t-intervals, so it includes many important ideas.  Students work through several technology explorations and you will want to help emphasize the “big picture” ideas.  We believe that the lessons learned should be more lasting by having students make the observations themselves rather than being told (e.g., this distribution will be normal).  Students will be able to apply many of the simulation and probability tools and habits of mind learned earlier in the course.  You will of course need to keep reminding students to carefully distinguish between the population, the sample, and the sampling distribution. 

 

Investigation 4-12 gives students two different populations, one close to normal and the other sharply skewed, and asks them to take random samples and study the distributions of the resulting sample means.  Students who have become comfortable with Minitab macros will work steadily through the investigation, but those who have struggled with Minitab macros will move slowly and may need some help.  When running the macro on p. 281, it is helpful to execute the macro once and create the dotplots of C2 and C3.  If these windows are left open, then when you run the macro more times, Minitab (version 14) should add observations to the windows and automatically update the displays.  (It may not work the first time students try this, so this might be better as a demonstration.)  Once students get a feel for how the samples are changing and how the sampling distribution is being built up, closing these windows on the fly will allow the macro to run much more quickly.  Make sure that students notice the differences in results between the normal-looking and the skewed populations.  Once students have made the observations through p. 284, they are ready for the summary, the Central Limit Theorem.  We try to emphasize that there is nothing magical about the “n > 30” criterion; rather, we stress that the more non-normal the population, the larger the sample size needed for the normal approximation to be accurate.  You will again need to decide if you want to present them with the formula σ/√n and have them verify that it matches the simulation results, and/or go through the derivation in the probability detour.  It is important to again give students plenty of practice in applying the CLT to solve problems (e.g., p. 286).
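If Minitab macros are slowing a class down, the same simulation can be demonstrated in a few lines of Python (the skewed population below is generated for illustration and is not the text's data):

    import numpy as np

    rng = np.random.default_rng(0)
    population = rng.exponential(scale=10, size=50000)   # sharply skewed population
    mu, sigma = population.mean(), population.std()

    for n in (2, 10, 30, 100):
        xbars = np.array([rng.choice(population, n).mean() for _ in range(2000)])
        print(n, round(xbars.mean(), 2), round(xbars.std(), 3),
              round(sigma / np.sqrt(n), 3))   # simulated SD vs. the CLT's sigma/sqrt(n)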

 

The Minitab Exploration on p. 287 then has students explore coverage properties, motivating t intervals as a replacement for z intervals.  After students make these observations, we always focus on t intervals (instead of z intervals) with sample means.  Again, if you are short on time, you may want to streamline some of this discussion, but we also encourage you to use it as a vehicle to review earlier topics (e.g., confidence, critical values, technical conditions).  In particular, you can remind them of the commonality of the general structure of the confidence interval: estimate ± margin of error.

 

The applet explorations on p. 294 are useful for providing students with visual images of the intervals while exploring coverage properties and widths (as in the previous investigation), and they also expand the exploration to different population shapes.  The second exploration asks students to explore a uniform, a normal, and an exponential population.  We want them to review the behavior of the sample and the sampling distribution (and be able to predict how each will behave) and, hopefully, by the end be able to explain why the sample size does not need to be as large with the (symmetric!) uniform distribution as with the exponential distribution to achieve the desired coverage.

 

Investigation 4-13 is intended as an opportunity for students to apply their knowledge and to make the natural leap to the one-sample t test statistic.  This is another good study for discussing some of the data collection issues.  Also, in this case, the score of an individual game might be of more interest than the population mean, and so we introduce the formula for a one-sample prediction interval.  Be ready for students to struggle with the distinction between a confidence interval and a prediction interval.  We do not show them a way to obtain this calculation from Minitab (because we don’t know one!).  You should also remind students that the prediction interval method is much more sensitive to the normality condition.  We summarize the technology tools on p. 300 and the t procedures on p. 302.  You may want to give students the option of using either Minitab or the applet to perform such calculations.  The applet has the advantage of automatically providing a sketch of the sampling distribution model, which we feel you should continue to require as part of the details students include in their analyses.  The applet also provides the 95% confidence interval.  In Minitab, you must make sure the alternative is set to “not equal” to obtain a two-sided confidence interval (we do not discuss one-sided intervals here), but Minitab also allows you to change the confidence level.
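Since Minitab does not provide the prediction interval, here is a sketch of both formulas in Python, with made-up data standing in for the game scores:

    import numpy as np
    from scipy.stats import t

    scores = np.array([102, 95, 110, 98, 104, 91, 107, 99])   # made-up data
    n, xbar, s = len(scores), scores.mean(), scores.std(ddof=1)
    tstar = t.ppf(0.975, df=n - 1)                             # 95% critical value

    half_ci = tstar * s / np.sqrt(n)           # t interval for the population mean
    half_pi = tstar * s * np.sqrt(1 + 1 / n)   # prediction interval for one new value
    print((xbar - half_ci, xbar + half_ci))
    print((xbar - half_pi, xbar + half_pi))    # much wider, as students should expect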

 

Investigation 4-14 introduces paired t procedures as an application of the above methods to the differences.  This is a rich investigation that first asks students to conduct some data exploration and to consider outliers.  There is an obvious outlier, and when students look at the Data Window they find that the products were not actually identical.  They can then remove such items (any for which the size/brand combination does not match exactly) from the list before the analysis continues.  You might want to emphasize, to statistics majors especially, that this type of exploration, cleaning, and data management is a large component of statistical analyses.  While summarizing this investigation, you should emphasize the advantage of using a paired design in the first place.

 

Section 4-5: Bootstrapping

 

Timing/Materials:  Minitab is used heavily in this section.  Some of these ideas are very difficult for students, so you may want to lead them through this section more than most.  If you do not have enough time in your course, this section can be skipped; later topics do not depend substantially on students having seen these ideas.

 

Many advocate bootstrapping as a more modern, flexible procedure for statistical inference when the model-based methods students have seen until now do not apply.  They also see bootstrapping as helping students understand the intuition of repeated sampling.  Furthermore, instead of assuming a normally distributed sampling distribution, bootstrapping relies only on the “model” that the sample obtained looks like the population.  In our brief experience in teaching bootstrapping (as an earlier topic in the course), we found it was difficult for students to get past the “sampling with replacement” philosophy and the theoretical details in a short amount of time.  We subsequently moved the bootstrapping material to the end of Chapter 4 so that students would already be comfortable with the “traditional” procedures and the idea of a sampling distribution.  This should help them see how the bootstrapping approach differs while having enough background to understand the overall goals.  In Investigation 4-15, we begin by having students apply the theoretical results to the Gettysburg Address sampling to see that the normal/t distributions are not good models for smaller sample sizes.  We provide more pictures/results in this section, but you can have students recreate the simulations themselves.  Since the “sampling with replacement” approach feels mysterious to many students, we have them take a few samples to see that some words occur more than once and that we are just creating an “infinite” population to sample from that has the same characteristics as the sample.  Then we have them verify that the bootstrap distribution has the same shape and spread as the empirical sampling distribution of means.  One way to approach bootstrapping is that it provides a way to estimate the standard error of a statistic (like the median or the trimmed mean) that does not have nice theoretical results (based on rules of variance).  You can either stop there or continue on to p. 311 to apply a “pivot method” to construct a bootstrap confidence interval.  The notation becomes complicated and the results are not intuitive, but they do help remind students of bigger issues such as the meaning of confidence and the effect of the confidence level on the width of the interval.  The bootstrap procedure is applied in Investigation 4-16.  In Investigation 4-17 the behavior of the trimmed mean is explored, in a context where the ordinary mean is not a reasonable parameter to study due to the skewness and the truncated nature of the data.  This “strange statistic” demonstrates a key advantage of bootstrapping (as well as the beauty of the CLT when it does apply).  We found the 25% trimmed mean performs reasonably well.  Carrying this calculation out in Minitab is a little strange, but students should understand the commands in (d).
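A sketch of the resampling itself, for a statistic with no tidy standard-error formula (a 25% trimmed mean).  The data are made up, and the simple percentile interval shown is a stand-in for, not a reproduction of, the pivot method on p. 311:

    import numpy as np
    from scipy.stats import trim_mean

    rng = np.random.default_rng(0)
    sample = np.array([2, 3, 3, 4, 4, 5, 6, 7, 9, 14, 21, 30])   # made-up, skewed

    boot = np.array([
        trim_mean(rng.choice(sample, size=sample.size, replace=True), 0.25)
        for _ in range(5000)          # resample WITH replacement each time
    ])                                # trim_mean(. , 0.25) cuts 25% from each end
    print(boot.std())                         # bootstrap estimate of the standard error
    print(np.percentile(boot, [2.5, 97.5]))   # simple percentile interval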

 

Summary

 

The chapter summary includes a table of the different one-sample procedures learned for binary and quantitative data.  With these, we like to use distinct notation (z* vs. z0) to help students distinguish between critical values and test statistics, a common source of confusion.