Let us begin by emphasizing that there is no "right" way to teach with this book. We hope that Workshop Statistics will prove useful to students and instructors in a wide variety of settings. It can be used as a stand-alone text or as a supplement, with computers or graphing calculators, as in-class work or take-home assignments. Naturally, we think that the book will work best in a classroom environment that promotes the features we extol in the preface: active learning, conceptual understanding, genuine data, and use of technology.
The purpose of these notes is to supply information from our experiences teaching with Workshop Statistics that might prove useful to other instructors. We provide details about how the book has been utilized at Dickinson College, where it was initially developed. We also offer general suggestions for teaching with the book and topic-by-topic comments about the goals of each activity and common student reactions. For the benefit of those experienced in teaching with the first edition, we also point out major changes in the new edition for each topic. This Guide applies to the "generic" (red cover) version of the text, but instructions and comments specific to the Minitab, Fathom, and Graphing Calculator versions will be added soon. You can read through this document sequentially or skip directly to one of the sections listed below:
With 75-minute class meetings on a Tuesday/Thursday schedule, I use one topic for each class period. In a typical class period, I spend five minutes or so introducing the day's topic, collect data from the students (if the day's topic calls for it), and then let them work through the activities. A student assistant and I then spend most of the class time walking about the room, peering over shoulders, checking students' progress, and asking them questions about what they are doing. I try to visit each student several times during the period. Whether students have questions or not, I often ask personalized questions of them as a means of checking their understanding. If I notice a common problem, I might go to the front, ask everyone to stop working momentarily, and discuss the problem. I also put some answers on the board occasionally, allowing students to check their work. (Another approach to providing this feedback is to direct students to the web page of answers to in-class activities.) Finally, I try to spend the last five minutes discussing with the whole class what they were supposed to have learned that day.
To facilitate students' recording information in their books, I recommend that students tear the perforated pages out of their books and place them in a three-ring binder (which I also do myself) from the start of the course. I do try to insist that students write directly in the spaces left in their books and not on scratch paper, for I think it helps students to have the questions and their responses together in one place.
I give three exams during the semester. These are somewhat cumulative but focus much more on recent material; of course, intelligent application of the material in later parts of the course depends largely on understanding earlier material. The exams try to stress understanding and interpretation as well as calculation. I allow students to use their books (and, of course, their responses to its questions) on the exams. Since the classroom has only half as many computers as students, I do not ask students to use computers on the exams. Instead I present them with computer output and ask questions about the interpretation of results. I sometimes devote the class period before an exam to a review session, a rare opportunity for me to lecture about the main ideas and to present analyses of sample activities.
I assign 3-4 homework activities per topic, collecting and grading them once per week. I try to select these so that the major ideas of the topic are covered; I also try to include a mix of problems that can be done by hand vs. with technology. I sometimes ask students to hand in their in-class activities as well, although this practice quickly creates a difficult grading burden. I encourage students to work together on the homework activities, but I require that they write up their comments individually.
I spend as little time as possible showing students how to use the technology, allowing students to concentrate on more substantive concerns. I spend more time on this earlier in the course as they are getting comfortable with the software and much less time as the course progresses. I sometimes use overhead projection equipment to demonstrate something, but more often I just write instructions on the board. In general I prefer students to explore the technology for themselves rather than to watch and mimic what I do.
Every topic begins with a "Preliminaries" section that asks students questions to get them thinking about the issues of the day. These questions often ask students to make guesses about the values of variables. While such questions are not central to learning statistics, I emphasize them for several reasons. They can provide the class with a sense of fun (you might even give a prize to the closest guesser of the day), and they hopefully motivate students' interest in the day's material. More importantly, they can show students that statistics is relevant to everyday life and get them used to thinking of data as more than just detached numbers.
Many topics call for data to be collected at the beginning of class. To protect students' privacy (at least somewhat), I pass around strips of scratch paper, ask students to record the information on them, and have the student assistant collect the strips and either write the data on the board or enter it directly into the computer.
The primary goals of this topic are to get students thinking about data, help them to appreciate different types of variables, and expose them to simple visual displays (bar graphs and dotplots) of a distribution. Substantial changes from the first edition in this topic include asking for different data to be collected in the preliminaries, paying more attention to the definition of variable (1-2a,b), postponing the use of technology until Topic 2, and emphasizing interpretations more (1-4e).
The Preliminaries aim to get students thinking about data not as naked numbers but as numerical information with a context. You might use the first question as an opportunity to explain your goals for the course and the second to emphasize the importance of students' accepting responsibility for being active learners. You also might use question 5 as a model for many questions in the book that ask for students' guesses or intuitions by mentioning that students' actual responses are less important than their taking the question seriously and putting forth sincere efforts.
Except in rare cases, we collect data anonymously in an effort to avoid potential embarrassments. We usually have students record their data on scratch paper; then we collect it and either write it on the board or have a student assistant enter it into a computer file. Another approach is to pass around a sheet that contains a table in which students report their data, but this loses some of the anonymity. Other approaches to collecting data on students are to ask them to send their information to you before class via e-mail or to set up a form on the Web through which they can enter their data.
Be sure to bring to class rulers with centimeter markings in order to measure heights and lengths of signatures. Questions may arise (for example, does landing in an airport in a state count as having visited it?) that could lead you to discuss some of the thorny issues involved with defining variables carefully and collecting data well.
In case your class is small enough that you would prefer to combine your data with those from other classes, sample data collected on students may be found here.
Activity 1-1 introduces the simple but fundamental principle of variability. You might tell students that while this activity hardly taxes their minds, it does introduce the key idea that permeates the course and the study of statistics.
The definitions following Activity 1-1 are essential for students to understand. We have concluded that in the past we have given short shrift to making sure that students understand the terms "variable" and "observational unit." We have come to believe that these are crucial ideas for students to be comfortable with throughout the course, so more homework activities in this topic are devoted to these terms, and more activities throughout the entire book begin by asking students to identify the observational units and variables in a study. Users of the first edition should note that we use the term "quantitative variable" as opposed to "measurement variable" in this edition.
Activity 1-2 tries to get students comfortable with the definition of variable and distinctions between types of variables. While this is a fairly simple task, it sometimes becomes problematic later when students need to decide whether an inference situation concerns means or proportions. Questions (a) and (b) aim to help students to see the difference between a variable and a summary statistic based on a variable (whether or not a student has red hair vs. number of students with red hair). Question (e) tries to indicate that how one measures the variable determines its type. Emphasizing precisely what variables and observational units are is important here.
Activity 1-3 introduces the bar graph as a simple visual display of the distribution of a categorical variable. It also begins to address the necessity of writing one's conclusions when analyzing data.
Activity 1-4 asks students to tally the results of a data collection. While this is very straightforward, reading tables of tallies (or frequencies) is a crucial skill in later topics. Many students have considerable difficulty with reading tables of tallies correctly. Question (e) also emphasizes the ability to connect verbal descriptions with underlying data.
Activity 1-5 introduces the dotplot as a simple visual display of the distribution of a quantitative variable. Question (b) aims to get students to identify personally with their data analysis, and question (d) reinforces the importance of writing about one's data analysis. We suggest not giving many hints about what kinds of features to look for in a distribution; let them struggle to think of things on their own. We emphasize to students that some responses might be more insightful than others but that there are no clear-cut right/wrong answers here.
Activity 1-6 is the first to present displays based on a larger data set not collected on students themselves. It also provides a first introduction to the important idea of comparing distributions of data. Question (b) previews the concept of median. We advise students not to worry about counting the dots to find the exact median; we are satisfied with reasonable approximations here, but that vagueness frustrates some students.
The homework activities provide practice with variables and observational units (1-7 through 1-13, 1-18) and with some simple graphical analyses (1-13 through 1-17). We have found 1-11 to be surprisingly challenging for students but also helpful in getting them to understand these definitions.
The Preliminaries again consist of a mixture of questions that ask for data to be collected about students and questions that call for predictions from students. Questions 1-3 are relevant to Activity 2-1, but the data collected are analyzed only in homework activity 2-5. Questions 4 and 5 relate to Activity 2-3, questions 6 and 7 to Activity 2-2, and question 8 to homework activity 2-7. Sample data collected on students about these "Preliminaries" questions will appear HERE.
Specific uses of technology that students learn in this topic are entering data, performing arithmetical operations on data, sorting data, creating dotplots, and separating data into groups.
Activity 2-1 introduces students to the use of technology. You may need to spend some time explaining and/or demonstrating how students are to enter data into their software or calculator. The "generic" version simply uses the phrase "use technology to" throughout the book, as in parts (b), (c), and (f). Students are also supposed to begin to appreciate the role of rates as they find that the person with the most points does not have the highest points/letters ratio.
In Activity 2-2 students use technology to create a new variable (% of women) from existing ones, to produce a dotplot, and to sort the results. As this is the first activity requiring students to access data, you will probably need to show them where to access the relevant files. The sorting creates problems for many students, because they often fail to sort the specialty names along with the variables and therefore cannot easily identify which specialties have the highest and lowest percentages of women. In (d) we think it's important to emphasize to students that "percentages" are proportions multiplied by 100. It is very important throughout the book that "proportion" refer to a number between 0 and 1 (inclusive), so we expect "percentage" to be between 0% and 100%. Students struggle with questions (h) and (i) as well.
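The sorting pitfall described above is easy to reproduce in any technology. As a minimal Python sketch (the specialty names and counts below are invented for illustration and are not the book's data file), pairing each name with its percentage before sorting keeps the labels attached to the values:

```python
# Hypothetical counts, invented for illustration only.
specialties = ["Specialty A", "Specialty B", "Specialty C"]
women = [30, 120, 45]
total = [200, 300, 90]

# A proportion lies between 0 and 1; a percentage is the proportion times 100.
proportions = [w / t for w, t in zip(women, total)]
percentages = [100 * w / t for w, t in zip(women, total)]

# Sort the names together with the percentages so each label stays attached.
paired = sorted(zip(percentages, specialties))
lowest = paired[0]     # (percentage, name) with the smallest percentage
highest = paired[-1]   # (percentage, name) with the largest percentage

# The common mistake: sorting only the numbers discards the names.
sorted_alone = sorted(percentages)

print("Lowest:", lowest)
print("Highest:", highest)
```

Sorting the numbers alone gives the same ordering of values but no way to say which specialty each value belongs to, which is exactly where students get stuck.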
Activity 2-3 aims to give students more experience using technology to manipulate and analyze variables. It also continues to address the "how do we measure this idea" theme of this topic. Questions (f)-(h) continue to emphasize the relevance of using rates rather than raw numbers for comparison purposes.
Activity 2-4 provides another example of the need to think about whether a variable is measuring what it intends to. Most students quickly realize that because many states emphasize the ACT more than the SAT, those with a small percentage of students taking the SAT tend to have high average scores because the few students who take the exam tend to be among the best.
The homework activities ask students to use technology to analyze data collected on themselves (2-5, 2-12) and from available sources (2-7 through 2-11). With the exception of 2-10, all of these activities also concern the usefulness of rates or percentages.
New uses for technology in this topic are to create histograms and to vary their bin widths.
Again there are many questions in the Preliminaries, aiming to get students thinking about some of the issues and data covered in the topic. We reiterate to students that we don't care how well they guess things like a typical weight for an Olympic rower, but we do care that they engage themselves with such questions and respond to them conscientiously. Sample data collected on students about these "Preliminaries" questions may be found here.
Activity 3-1 leads students to develop a checklist of six features to consider when describing a distribution of data. You might want to lead students through this activity with a class discussion to make sure that nobody misses the point. Emphasizing the terminology of right- vs. left-skewness is probably worthwhile, as many students find the terms counter-intuitive.
Activity 3-2 asks students to match up variables with the corresponding dotplots of their distributions. The scales have been removed from the dotplots, of course, so that students have to judge based on shape and features other than center and spread. You might advise students to see if they can construct a meaningful scale that seems to work for their selections. This activity particularly lends itself to working in groups to increase the chances that someone in the group is familiar with baseball and Monopoly. You might want to emphasize that the explanation for students' choices is more important than the choices themselves. Some other issues that have arisen include: the Monopoly sample does not include railroads, the cities for the snowfall amounts were selected from all around the U.S., and "margin of victory" means the difference in number of runs scored between the winner and loser.
An alternative to this activity is to produce dotplots or histograms based on class-generated data and have students match up the graphs with the variables. This could avoid the problem of some students not knowing about Monopoly or baseball, and it might help students to identify more personally with the data. For example, if height were included among the variables, students could look around the room to spot a tall outlier.
Activity 3-3 emphasizes the importance of context when analyzing data. Students should recognize that the dotplot has an outlier and clusters. The important point here is that they should search for causes of the outlier and clusters. Those who know about rowing will know that the outlier is the coxswain, who calls out instructions but does not help with the rowing. The lower cluster of weights all belong to rowers in "LW" (lightweight) events.
Activity 3-4 introduces the stemplot. Some students may not recognize that the easiest way to construct it is to go through the data in the order presented rather than looking for all of the single digits and then all of the teens and then all of the twenties and so on. Be ready for students to ask whether they should list repeated values (such as 13, which appears four times) multiple times; of course they should.
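For instructors who want to check a hand-drawn stemplot, the following Python sketch mirrors the procedure recommended above: it passes through the data once in the order presented, drops each leaf onto its stem, and lists repeated values every time they occur. The data values are made up (they are not the activity's data), and the function assumes non-negative whole numbers with the ones digit as the leaf.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers, ones digit as leaf."""
    rows = defaultdict(list)
    for v in values:                  # one pass, in the order presented
        rows[v // 10].append(v % 10)  # repeated values are listed each time
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep stems with no leaves
        leaves = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))
        lines.append(f"{stem} | {leaves}")
    return lines

# Hypothetical data in which 13 appears four times.
example = [13, 7, 21, 13, 35, 13, 9, 13, 28]
for line in stemplot(example):
    print(line)
```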
Activity 3-5 asks students to interpret a histogram. Some will struggle to understand how the midpoints presented in the graph correspond to endpoints of the subintervals. Question (b) should say "61 or more," and the answer to question (c) is meant to be "no" since 90 is not a subinterval endpoint. Question (e) asks students to investigate the effect of changing the number of bins (subintervals) in the histogram and to notice how substantially different the resulting graphs appear.
The homework activities ask students to create and to analyze visual displays of data. Emphasizing that students relate their verbal descriptions to the context is crucial throughout these activities, and helping students to acquire the habit of labeling the axes of their graphs is also important. Several activities (3-6, 3-7, 3-9, 3-10, 3-15, and 3-20) provide students with the displays, while many (3-8, 3-11 through 3-14, 3-16 through 3-19) ask the students to create the displays first. Activities 3-8 and 3-10 call for the use of technology, while 3-11, 3-13, and 3-17 are to be done by hand. In other activities, we recommend letting students choose for themselves whether to create graphs using technology or by hand. Activity 3-11 introduces a variation on the standard stemplot known as a split stemplot.
New uses for technology in this topic are to calculate means and medians.
The Preliminaries are briefer than in previous topics, but they again try to generate student interest and thinking about the issues and data for the topic. Sample data collected on students about these "Preliminaries" questions may be found here; the instructor in question was 60 years old at the time.
Activity 4-1 covers the basic calculations of the mean and median. We don't expect students to find the mean or median in (b); in fact, we'd prefer a more creative response. You will probably have to help many students in question (i), where they are to make the jump to the general case for identifying the location of the median. Question (l) is challenging for many students; its point is that current justices have not yet served full terms, so their years of service underestimate the mean and median years of service among prior justices who had served full terms.
Activity 4-2 asks students to make guesses about the mean and median of a set of data based on dotplots. The first set is meant to be fairly easy, as the distributions are quite symmetric, just to establish that mean and median really do measure the center of a distribution. These questions also provide your first opportunity to show students how to use technology to calculate summary statistics. In Minitab you might use the "describe" command to display descriptive statistics, in which case you might want to warn students not to concern themselves (for now) with all of the other statistics that are displayed.
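The general rule for locating the median can be illustrated with a short sketch. This one assumes the common convention that the median of n sorted values sits at position (n + 1)/2, with the two middle values averaged when n is even; the function names and the data are ours for illustration and do not come from the activity.

```python
def median_position(n):
    """Position of the median among n sorted values, counting from 1."""
    return (n + 1) / 2

def median(values):
    ordered = sorted(values)          # sorting first is essential
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that single value.
        return ordered[int(median_position(n)) - 1]
    # Even n: the position falls halfway between the two middle values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median_position(9))       # position 5.0 of 9 sorted values
print(median([3, 1, 2]))        # odd number of values
print(median([4, 1, 3, 2]))     # even number of values
```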
Questions (e)-(i) lead students to see how the mean and median relate to each other in skewed distributions. Many students struggle with question (j), which tries to make the point that the mean is not a sensible measure with categorical variables. We believe that re-focusing students' attention on this issue of variable types is important here.
Activity 4-3 leads students to investigate the property of resistance of the mean and median. Question (b) tries to make the obvious but important point that knowing only the mean or median does not reveal anything about the spread or shape of a distribution. Questions (c) and (e) are good examples of questions where we want students to make thoughtful predictions, but we don't care much about how accurate their predictions are. We do care, of course, that they rethink their predictions in (d) and (f) if they turn out to be inaccurate. This activity works particularly well in Fathom, where students can just drag an observation around and watch the immediate effect on the mean and median.
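Outside of Fathom, the same resistance idea can be shown with a few lines of code. In this Python sketch the values are invented, and changing one observation stands in for dragging a point:

```python
from statistics import mean, median

# Made-up values, not the activity's data.
original = [2, 4, 5, 7, 9]
with_outlier = [2, 4, 5, 7, 90]   # the largest value is pulled far to the right

mean_before, median_before = mean(original), median(original)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

The mean follows the extreme value while the median stays put, which is the sense in which the median is resistant and the mean is not.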
Activity 4-4 aims to help students see an important limitation of measures of center. The moral here is that one often wants to consider the entire distribution and not just a measure of center. While the median readability level of pamphlets equals the median reading level of patients, many patients are left without a single pamphlet at or below their reading level. Question (f) tries to bring this point home in case students have missed it in question (e). Some students, not realizing the importance of the "under 3" and "above 12" designations, will need guidance with (a). You might also be prepared for many students ignoring the tallies in (b) and just treating the distributions as if they were uniform with one observation at each level; this produces median grade levels of 7.5 for patients and 11 for pamphlets. You should also notice that the dotplots are presented in a different order than the data tables (reading level first in data table but second in dotplot), which may cause some confusion.
The homework activities ask students to calculate, interpret, and analyze measures of center of distributions of data. Activity 4-5 tries to expose the common error of not sorting values before identifying the median. Activities 4-6 through 4-8 involve student-collected data; whether students should analyze them by hand or with technology is left to the instructor's discretion. Be aware that Activity 4-9 is confusing to many students who are not sure how to calculate the mean from a frequency table, so you may want to provide some guidance if you ask students to do this by hand. Activities 4-10 through 4-16 all deal with various properties of averages. Activity 4-10 illustrates that knowing a total amount allows one to calculate a mean but not a median or mode. Activity 4-11 reveals that the mean of percentages does not necessarily produce the overall percentage. We particularly like assigning homework problems such as Activities 4-12 and 4-13 that require students to construct examples to illustrate their knowledge of the properties. Activity 4-14 introduces the idea of a weighted average between two groups, and Activity 4-16 leads students to discover the effect of a linear transformation on measures of center. Activity 4-15 serves as a reminder that a single measure of center may not reveal much of interest about a dataset.
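Two of the points above, calculating a mean from a frequency table and the gap between a mean of percentages and an overall percentage, can be sketched as follows. All numbers are invented for illustration and are unrelated to the data in Activities 4-9 and 4-11.

```python
# Hypothetical frequency table: value -> tally.
tallies = {0: 3, 1: 5, 2: 2}

n_obs = sum(tallies.values())
# Correct: weight each value by its tally.
mean_from_table = sum(value * count for value, count in tallies.items()) / n_obs
# Common error: averaging the distinct values and ignoring the tallies.
mean_ignoring_tallies = sum(tallies) / len(tallies)

# Hypothetical groups: (number with the trait, group size).
groups = [(9, 10), (10, 90)]
group_pcts = [100 * hits / size for hits, size in groups]
mean_of_pcts = sum(group_pcts) / len(group_pcts)
overall_pct = 100 * sum(h for h, _ in groups) / sum(s for _, s in groups)

print(mean_from_table, mean_ignoring_tallies)
print(mean_of_pcts, overall_pct)
```

Because the two groups differ greatly in size, the unweighted mean of their percentages lands far from the overall percentage; weighting each group's percentage by its size recovers the overall figure.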
In addition to new and updated datasets, substantial changes from the first edition in this topic include the introduction of mean absolute deviation as a measure of spread and as motivation for the standard deviation. Activity 5-7 is also new, attempting to address student misconceptions about spread as related to "bumpiness" and number of distinct values.
New uses for technology in this topic are to calculate standard deviation and inter-quartile range.
Sample data collected on students about Preliminaries questions will appear HERE.
Activity 5-1 introduces the five-number summary, range, IQR, and boxplot. You may want to stress that the IQR is the difference between the quartiles; some students just stop with the quartiles themselves. Some students are confused by what question (h) intends for verification; you may want to ask students to mark the location of the quartiles on the dotplot as a visual confirmation.
You may want to tell students that different textbooks and different technologies calculate quartiles in slightly different ways. Therefore, they should not be alarmed if their hand calculations of quartiles do not match exactly the values produced by their technology.
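A quick demonstration of this point, as a sketch in Python: the standard library's `statistics.quantiles` offers two interpolation methods, and a simple "median of each half" hand rule (one common convention, which may or may not match your text) gives a third answer on the same made-up data.

```python
from statistics import median, quantiles

def quartiles_by_halves(values):
    """Hand rule: quartiles are the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]   # leaves out the middle value when n is odd
    return median(lower), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # made-up values

hand = quartiles_by_halves(data)
exclusive = quantiles(data, n=4, method="exclusive")
inclusive = quantiles(data, n=4, method="inclusive")

print("hand rule (Q1, Q3):", hand)
print("exclusive method:  ", exclusive)
print("inclusive method:  ", inclusive)
```

All three agree on the median but give slightly different lower and upper quartiles, so small mismatches between hand and software results are expected.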
Activity 5-2 shows students how to calculate mean absolute deviation and standard deviation as measures of spread. Even though many of the steps in the calculations are already provided in the table, many students find these calculations to be challenging and time-consuming. In particular, taking the sum of twelve entries in a column causes problems for some. This activity also provides a good opportunity to talk about rounding in calculations. We stress to students that they should retain as many digits of accuracy as possible in intermediate calculations and round only the final answer. The notation following question (h) intimidates some students, so we try to show them how the notation corresponds with the process they just completed.
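The column-by-column hand calculation can be mirrored in a short sketch. The version below assumes that the mean absolute deviation divides by n and that the standard deviation divides by n - 1 (the usual sample formula); check these against the definitions in your edition. The data are invented, and rounding is left to the final printed answer only.

```python
from math import sqrt

def mean_absolute_deviation(values):
    m = sum(values) / len(values)
    deviations = [abs(x - m) for x in values]        # the |x - mean| column
    return sum(deviations) / len(values)             # assumed divisor: n

def standard_deviation(values):
    m = sum(values) / len(values)
    squared = [(x - m) ** 2 for x in values]         # the (x - mean)^2 column
    return sqrt(sum(squared) / (len(values) - 1))    # assumed divisor: n - 1

data = [2, 4, 4, 6, 9]   # made-up values with mean 5

mad = mean_absolute_deviation(data)
sd = standard_deviation(data)

# Keep full precision in intermediate steps; round only when reporting.
print(round(mad, 2), round(sd, 2))
```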
We do not recommend spending much time on students' calculating standard deviations. We prefer to let technology handle those calculations so that students can focus on interpretations as illustrated in the following activities.
Activity 5-3 asks students to complete two "matching" exercises. The first is to convince them that IQR and standard deviation do in fact measure the variability of a distribution: more variable distributions produce larger values of these measures. The second is to help students see how a five-number summary relates to a boxplot and particularly to its skewness. Many students have difficulty with the vertical orientation of these boxplots, so you should probably be prepared to explain to them that it's just rotated from a horizontal boxplot and that larger values appear at the top rather than on the right.
Activity 5-4 aims to lead students to conclude that the IQR is resistant to outliers while the standard deviation and range are not. You might want to advise students that they only need to enter the values of years of service into their technology; they need not enter the Justices' names. As with Activity 4-3, this activity works particularly well in Fathom, where the numerical measures change dynamically as students drag a point in a graph.
Activity 5-5 leads students to the empirical rule, which is one answer to the "what does the value of standard deviation tell us" question. Some students may need an explanation of the "plus/minus" notation in question (b). In question (d) you might want to advise students that it's easier to count the scores that do not fall between those two values and then subtract. We try to emphasize that this empirical rule holds only roughly and even at that only for mound-shaped distributions.
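The counting shortcut suggested for question (d) translates directly into code. This Python sketch, using invented scores rather than the activity's data, counts the values that fall outside the interval and subtracts; `proportion_within` is a name introduced here for illustration.

```python
from statistics import mean, stdev

def proportion_within(values, k):
    """Proportion of values within k standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    low, high = m - k * s, m + k * s
    # Count the values outside the interval, then subtract from the total.
    outside = sum(1 for x in values if x < low or x > high)
    return (len(values) - outside) / len(values)

scores = [52, 60, 63, 65, 67, 68, 70, 70, 72, 73, 75, 77, 80, 88]  # made up

for k in (1, 2, 3):
    print(k, "SD:", round(proportion_within(scores, k), 3))
```

For mound-shaped data the results should land only roughly near 68%, 95%, and almost all of the values, which is a useful reminder that the rule is an approximation.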
Activity 5-6 introduces z-scores, an idea to which students return later (in Topic 15) when studying normal distributions. It provides another answer to the question of how to interpret the value of standard deviation. In (h) and (i) some students struggle with realizing that the higher z-score between two negatives is the one closer to zero.
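A brief sketch of the z-score calculation may help with the comparison of negative values. The means, standard deviations, and scores here are hypothetical and are not taken from the activity.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

# Two hypothetical students who both scored below their exam's mean.
z_first = z_score(65, mean=70, sd=10)    # half an SD below the mean
z_second = z_score(58, mean=70, sd=8)    # one and a half SDs below the mean

# Between two negative z-scores, the higher one is the one closer to zero.
better = max(z_first, z_second)
print(z_first, z_second, better)
```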
Activity 5-7 is designed to confront student misperceptions about measures of spread. Many students think that F has more spread than G because its distribution is "bumpier," and many students believe that J has more spread than I since it contains more distinct values. You might want to draw their attention to these misconceptions, after they have completed the activity, in order to help students confront them head-on.
At this early point in the course some students may be tempted to enter the data into their technology by hand, but we strongly encourage you to let them use the pre-entered files that we provide.
The homework activities ask students to calculate and interpret measures of spread. Activities 5-8, 5-10, 5-16, 5-22, and 5-24 definitely require use of technology. The expectation in (a) of Activity 5-8 is that students base their answers on their knowledge of geography. Activity 5-10 tries to combat the misperception that larger values produce more variability than smaller values. Activities 5-14 and 5-21 expose limitations of boxplots as visual displays. Activity 5-16 concerns the empirical rule, while Activities 5-18 and 5-19 involve z-scores. Activity 5-22 is similar to Activity 4-12 in asking students to create examples that illustrate their understanding of these measures' properties. Activity 5-24 is intended as a reminder that summaries do not tell the whole story, for the means and standard deviations are identical for all three machines. Activity 5-26 is a follow-up to Activity 4-16 in leading students to consider the effect of linear transformations on measures of spread.
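The effect of a linear transformation on center and spread can be checked numerically. The sketch below uses a Celsius-to-Fahrenheit conversion on made-up temperatures as a stand-in example; it is not the data from Activities 4-16 or 5-26. For y = a + b*x, the mean becomes a + b times the old mean, while the standard deviation is multiplied by |b| and is unaffected by a.

```python
from statistics import mean, stdev

a, b = 32, 1.8                       # Fahrenheit = 32 + 1.8 * Celsius
celsius = [10, 20, 30]               # made-up temperatures
fahrenheit = [a + b * c for c in celsius]

mean_c, sd_c = mean(celsius), stdev(celsius)
mean_f, sd_f = mean(fahrenheit), stdev(fahrenheit)

print("mean:", mean_c, "->", mean_f, "predicted:", a + b * mean_c)
print("SD:  ", sd_c, "->", sd_f, "predicted:", abs(b) * sd_c)
```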
Nevertheless, this is a shorter topic than the previous one. This is also a topic that could work well with in-class activities assigned as homework, if you are pressed to preserve class time. Another approach is to work some of these activities into earlier topics. For example, the side-by-side stemplot of Activity 6-1 could be done with Topic 3, and the modified boxplots of Activity 6-2 could be done with Topic 5.
In addition to new and updated datasets, substantial changes from the first edition in this topic include a new "matching" activity.
New uses for technology in this topic are to produce (modified) boxplots.
Many students find the "top 100 American films" data in the Preliminaries section appealing to work with. Collecting the data does take some time, though, so this may be a good example where collecting data more efficiently outside of class time may be desirable. Sample data collected on students about the "Preliminaries" questions may be found here.
Activity 6-1 tries to accomplish many goals, introducing the fundamental idea of a statistical tendency as well as the graphical technique of the side-by-side stemplot. In our experience students enjoy this activity because they aren't afraid to expose their geographical ignorance. We encourage them to shout out a state about which they are unsure so that we can reach a class consensus. We're careful not to tell them (until after they've completed the activity) that the "answers" concerning states' east/west status appear in Activity 6-12. We do tell students that they should end up with 26 eastern and 24 western states, although we caution them that having this 26/24 breakdown does not guarantee that they have labeled each state correctly.
Questions (e)-(g) guide students to the important concept of a statistical tendency. Since this is a crucial concept that comes up throughout the course, we try to emphasize students' understanding of this idea.
Activity 6-2 introduces both comparative boxplots and modified boxplots. You might want to point out that the data here are already sorted, so finding quartiles by hand is not as difficult as it would otherwise be. Question (a) provides a good example of an activity for which we write the numerical answers on the board and insist that students check their calculations before proceeding. (You could also point students to the answers available on the Web.) Some students need help to understand the description of the outlier test preceding question (d). Question (g) might provide a convenient time to remind students always to write in complete sentences and relate their comments to the context at hand.
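For students who need help with the outlier test, a small sketch can make the steps concrete. It assumes the test is the usual 1.5 × IQR rule (values more than 1.5 IQRs below the lower quartile or above the upper quartile are flagged); confirm this against the description preceding question (d). The data and quartiles below are invented.

```python
def fences(q1, q3, multiplier=1.5):
    """Lower and upper cutoffs for the assumed 1.5 * IQR outlier test."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

def modified_boxplot_parts(sorted_values, q1, q3):
    """Return the outliers and the whisker ends (most extreme non-outliers)."""
    low, high = fences(q1, q3)
    outliers = [x for x in sorted_values if x < low or x > high]
    inside = [x for x in sorted_values if low <= x <= high]
    return outliers, (inside[0], inside[-1])

data = [1, 12, 14, 15, 18, 21, 50]   # made-up, already sorted
outliers, whiskers = modified_boxplot_parts(data, q1=12, q3=21)
print("fences:", fences(12, 21))
print("outliers:", outliers, "whiskers:", whiskers)
```

In a modified boxplot the whiskers stop at the most extreme values that are not outliers, and each outlier is marked individually.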
Activity 6-3 is another "matching" exercise. This is another especially good one for asking students to work in groups so that they can combine their knowledge of various variables and regions of the country. We emphasize to students that a reasonable explanation is more important than reaching the "right" answer here.
Activity 6-4 follows in the spirit of leading students to more open-ended problems toward the end of topics. You probably want to explain how to use technology to produce comparative boxplots on the same scale in (b). Although the data entry in (c) can be tedious, we think it's appropriate for students to do occasional data entry by hand when it enables them to analyze data specific to themselves.
The homework activities ask students to apply these methods and concepts to analyzing more datasets. Activities 6-5, 6-6, 6-11, 6-12, 6-13, 6-15, and 6-17 are to be done by hand. Activities 6-7, 6-10, and 6-18 require technology. Technology may be used but is not essential for Activities 6-8, 6-9, 6-20, and 6-21. Activities 6-14, 6-16, and 6-19 ask students to interpret graphical information that is provided.
Since its techniques of analysis are so simple, this topic is a particularly good one for letting students work independently at their own pace. We tend to interrupt them less in this topic than in some others, but we do try to make sure that everyone grasps and can explain the finer points such as Simpson's paradox. This topic typically takes students less time to complete than many of the others.
Substantial changes from the first edition in this topic include a stronger emphasis on the concept of independence and a mention of the term relative risk, as well as new and updated datasets.
New uses for technology in this topic are to organize raw data on categorical variables into cross-tabulation tables.
While collecting data from students for the Preliminaries section, it may be worth highlighting that all of the variables being asked about are categorical and that this distinguishes this topic from earlier ones. Sample data collected on students about the "Preliminaries" questions may be found here.
Activity 7-1 is intended primarily to make sure that students understand the relationship between the two-way table and the raw data. We used to take for granted that they understood this, but we found that students had trouble constructing the table from the raw data, so now we think this is an important exercise. The important terms response and explanatory variable are also introduced here and should be highlighted.
Activity 7-2 gives students an extended example in which to learn how to analyze two-way tables. A common error occurs in (h), where many students miss the cumulative nature of the percentages and so just put marks at .379, .499, and .122 rather than "stacking" these. Also in (h) some students put "very much" on the bottom and "not much" on top, which is not the order given in the other age groups and so defeats ease of comparison. You also might want to check especially carefully that students recognize the differences among questions (j)-(l).
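The "stacking" in (h) amounts to taking running totals of the proportions. A small sketch, using only the three proportions quoted above (which category goes on the bottom is left to the order used in the book's other age groups):

```python
from itertools import accumulate

# Proportions quoted in the discussion of question (h)
proportions = [0.379, 0.499, 0.122]

# Segment boundaries are running totals, not the raw proportions themselves
boundaries = [round(total, 3) for total in accumulate(proportions)]
print(boundaries)
```

The marks therefore belong at .379, .878, and 1.000 rather than at .379, .499, and .122.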
Activity 7-3 tests whether students can read proportions (approximately) from a segmented bar graph. Some students will need a helpful nudge to answer question (c).
Activity 7-4 leads students to discover Simpson's paradox. The hypothetical example is contrived so that most students recognize that hospital A is the better hospital despite its lower survival rate because it treats most of those in poor condition, who are naturally less likely to survive than those in fair condition. Some students take for granted the observation that those in poor condition are less likely to survive than those in fair condition, but we emphasize this aspect as well as hospital A's treating most of the "poor condition" patients. When we first used this activity we did not ask question (f), but then we realized that this is the crucial question to see if students understand the phenomenon. Some students are unsure about which hospital they would prefer, but most recognize the superiority of hospital A.
With good students we sometimes ask a follow-up question: If you call home and learn that dear Aunt Milly is in the hospital with this condition, which hospital do you hope she's in? We claim that the answer is hospital B, because that would mean that she's likely in fair condition and therefore more likely to survive. We only ask this of strong students, though, because this is a subtle and much less important point than Simpson's paradox in general.
We also mention to students that while these data are made up, they are realistic: many small, rural hospitals have higher survival rates than large, urban hospitals because they handle less difficult cases.
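If you would like a second numerical illustration of the paradox, the following sketch may help. The counts are hypothetical values that we made up for this illustration; they are not the data from Activity 7-4. Hospital A has the higher survival rate within each condition but the lower rate overall, because it treats most of the patients in poor condition.

```python
# Hypothetical counts (NOT the book's data): (survived, treated) by condition
hospitals = {
    "A": {"fair": (290, 300), "poor": (510, 700)},
    "B": {"fair": (570, 600), "poor": (70, 100)},
}

def rate(survived, treated):
    """Survival rate for one group of patients."""
    return survived / treated

def overall_rate(by_condition):
    """Survival rate for a hospital, pooling all of its patients."""
    survived = sum(s for s, _ in by_condition.values())
    treated = sum(t for _, t in by_condition.values())
    return survived / treated

for name, by_condition in hospitals.items():
    print(name, "overall:", round(overall_rate(by_condition), 3))
    for condition, (s, t) in by_condition.items():
        print("   ", condition, round(rate(s, t), 3))
```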
A graphic that we have found helpful for explaining the Simpson's paradox phenomenon will appear HERE.
Activity 7-5 addresses the concept of independence more directly. Question (a) asks students to create a two-way table from raw data, as in Activity 7-1, but using technology this time. If this is not important to you, you may want to save time by providing the table for students. You may also want to update the table to use the current composition of the Senate. In the summer of 2001, following Senator Jeffords' switch from Republican to Independent, there are 50 Democrats and 49 Republicans in the Senate. Ten of the Democrats are women, and three of the Republicans are women.
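As a quick check on the updated counts quoted above, this sketch compares the proportion of women within each party. It is only an illustration of the comparison, not the book's wording of the activity; the variable names are ours.

```python
# Counts quoted above for summer 2001: (women, total) by party
senate = {"Democrat": (10, 50), "Republican": (3, 49)}

prop_women = {party: women / total for party, (women, total) in senate.items()}
for party, p in prop_women.items():
    print(f"{party}: {p:.3f} women")

# If gender and party were independent, these conditional proportions
# would be (nearly) equal and their ratio would be close to 1.
ratio = prop_women["Democrat"] / prop_women["Republican"]
print(f"ratio of proportions: {ratio:.2f}")
```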
The homework activities ask students to apply these methods and concepts to analyzing more datasets. None requires the use of technology, although it may be helpful in Activity 7-16. Activities 7-6, 7-16, 7-17, and 7-18 deal with data collected in class. Many of these activities start by asking students to identify explanatory and response variables. Relative risk comes up in Activities 7-12, 7-13, 7-14, and 7-19. The concept of independence is emphasized in Activity 7-25, and Simpson's paradox is relevant in Activities 7-20, 7-21, and 7-22.
We particularly like Activities 7-21 and 7-22 in that they assess students' understanding of Simpson's paradox by asking them to create an example that illustrates it. Students who understand the phenomenon well have little difficulty, but those with a partial understanding find these exercises to be very challenging. Activity 7-20 refers to a very famous historical example; it returns in an in-class activity of Topic 24.
Activity 7-23 addresses a common misinterpretation of two-way tables. It's tempting to conclude here that the test is worthwhile based on the 63 of 100 cases in which a person predicted to stay actually does stay. That questions (a) and (b) produce the same answer, though, is supposed to convince students that the test prediction provides no valuable information. Many students struggle with understanding what the segmented bar graphs requested in (g) and (h) should look like.
Substantial changes from the first edition are primarily in updated and new datasets.
New uses for technology in this topic are to produce scatterplots and labeled scatterplots.
Be sure to bring rulers to class for question 5 of the Preliminaries section. You might want to mark off heights on the board and have students read off their heights from there. This topic provides a nice reminder that you may prefer to have students report this data to you prior to class so that you can have it compiled beforehand and not spend valuable class time on the data collection. Sample data collected on students about the "Preliminaries" questions may be found here.
Activity 8-1 starts by asking students about observational units and variables. While these ideas are fairly simple, they are crucial to reinforce in students' minds. The activity goes on to introduce the important idea of association and the scatterplot as a visual display of the association between two quantitative variables. You might want to point out that when students are asked for a scatterplot of A vs. B, the intention is for A to be on the vertical axis and B on the horizontal. You might also make students aware that the data tables in the book sometimes put the response variable first (as in this activity) and sometimes put the explanatory variable first (as in Activity 10-1). Question (f) aims to remind students about the idea of a statistical tendency, first encountered with side-by-side stemplots in Activity 6-1, and to point out that association is a tendency.
Activity 8-2 gives students practice with judging the direction and strength of an association from a scatterplot. When we encounter students having trouble with this activity, we often ask them first to distinguish the positive from the negative associations and then to concentrate on the strengths. You might want to tell students that they will revisit this example in Activity 9-1 when they encounter the correlation coefficient as a numerical measure of association. We tell students not to worry if their answers are off by one cell (say, switching moderate negative with least strong negative). Question (b) tests whether students can think more generally about direction and strength of associations. Many of the examples listed appear later.
Activity 8-3 tries to show that with paired data, one can learn much by inserting a "y=x" line on the scatterplot. Some students are confused by the fact that differing scales on the axes prevent the line from appearing at a 45 degree angle. Users of the first edition will note that we call it the "y=x line" and not a "45 degree line" in this edition.
Activity 8-4 gives students the opportunity to analyze genuine data from an important, still fairly recent, historical event. You may want to show students how to use technology to create scatterplots at this point. It may be helpful to remind them here that the convention for plotting A vs. B is to put A on the vertical axis and B on the horizontal. The moral in (c) and (d) is that one loses a great deal of information by discarding flights with no O-ring failures, for all of those flights occurred at relatively high temperatures.
Activity 8-5 shows that one can produce a labeled scatterplot to incorporate information from a categorical variable into a scatterplot.
The homework activities ask students to apply these methods and concepts to analyzing more datasets. Those that require use of technology are Activities 8-6, 8-9, 8-11, and 8-13 through 8-17. Technology would also be helpful in Activities 8-12 and 8-19. Students interpret scatterplots provided in Activities 8-7, 8-8, 8-10, and 8-18. Activity 8-7 is intended as a follow-up to Activity 8-2.
Substantial changes from the first edition include new and revised datasets and a new activity that introduces students to the effect of outliers on correlation through real as opposed to hypothetical data. Some activities have also been reordered: the calculation of correlation now precedes the activity concerning association and causation.
New uses for technology in this topic are to calculate correlation coefficients. Students also use technology for a guessing exercise involving pseudo-random bivariate normal data.
Activity 9-1 leads students to discover the basic properties of correlation. It starts with the same data and scatterplots that students analyzed in Activity 8-2, as it tries to convince students that their impressions about direction and strength of association follow what the correlation coefficient reveals. Notice that students do not work with the calculation of correlation here; they use technology to do the calculations so that they can concentrate on correlation's properties. You might want to go over questions (c)-(f) with the class as a whole to make sure that everyone has the right ideas here; we recommend doing this only after students have thought about the questions and written their own reactions to them. You might also want to make sure that everyone understands the morals of the last two questions: that correlation measures only linear and not curvilinear relationships and that clusters of data may have a strong correlation even when the data within each cluster have a weak association. Users of the first edition will notice that questions on the effect of outliers on correlation have been moved to a homework activity (9-7).
Activity 9-2 does introduce students to the effect of outliers on a correlation coefficient, in the context of variables from the Monopoly board game. Some students get anxious/lazy in (c) and want to go straight to the correlation without producing a scatterplot, but we encourage them to start with the scatterplot and then make a guess for the correlation value.
This is another activity that is particularly well-suited to the dynamic graphical capabilities of Fathom.
Notice that the formula for calculating a correlation coefficient appears at the end of this activity. The formula presented is in terms of z-scores and is not the typical formula used for computational purposes.
Activity 9-3 steps students through (finally!) the calculation of the correlation coefficient. We view this as much less important than understanding its properties, but we want students to see the formula nonetheless. Notice that the book does much of the work for the students, asking them only to fill in a few missing z-scores and cross-products. In question (c) we want students to recognize that the strong negative association causes most positive z-scores for weight to be paired with negative ones for MPG.
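The z-score calculation can be sketched in a few lines for instructors who want to verify the filled-in table. This assumes the formula is the sum of the z-score cross-products divided by n - 1, with standard deviations that also use the n - 1 divisor; check this against the formula printed in the book. The weights and MPG values below are hypothetical, not the book's data.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample SD (n - 1 divisor)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """r = (sum of z-score cross-products) / (n - 1)  (assumed form)."""
    zx, zy = z_scores(xs), z_scores(ys)
    cross_products = [a * b for a, b in zip(zx, zy)]
    return sum(cross_products) / (len(xs) - 1)

# Hypothetical weights (lb) and fuel efficiencies (MPG)
weight = [2000, 2500, 3000, 3500, 4000]
mpg = [38, 33, 27, 24, 19]

r = correlation(weight, mpg)
print(round(r, 3))
```

Because heavier cars here have lower MPG, positive z-scores for weight are paired with negative z-scores for MPG, so most cross-products are negative and r comes out close to -1.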
Activity 9-4 is one of my favorites and one of the most successful. It guides students to the realization that a strong association between two variables does not imply a cause-and-effect relationship between them. Since the context is so ridiculous, almost no student has any difficulty in seeing that a causal explanation is not appropriate here. We argue that this is one of the most important concepts for any statistics student to understand, so we try to make sure that every student can explain the moral of this activity in their own words.
You may also want to point out that since the association here is very non-linear, the correlation coefficient is not a revealing measure of association for these data. Nevertheless, the point that association does not imply causation is still valid. Students return to these data in Activity 11-3 when they study transformations to achieve linearity.
Activity 9-5 is designed to give students practice judging the value of a correlation coefficient based on a scatterplot. Students seem to have a lot of fun with it as well. Some get into contests with their partners, and others prefer to work together with their partners. We like to go around the room and guess along with the students.
For this activity students need to use slightly more sophisticated technology. We have written a Minitab macro, a TI-83 program, and a Fathom document to generate the "pseudo-random" data. The idea is to generate data from a bivariate normal distribution where the variables have equal means and standard deviations and where the correlation coefficient rho is chosen from a uniform distribution on the interval (-1,+1). A Java applet that implements this activity is available here.
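For those without access to the macro, program, or applet, the data-generation idea can be sketched as follows. This is our own illustration under stated assumptions, not a transcription of the Minitab macro: both variables are standard normal (so the means and standard deviations are equal), and the function names and default sample size are hypothetical.

```python
import math
import random

def simulate_scatter(n=50, rng=None):
    """Generate n (x, y) pairs whose population correlation rho is random."""
    if rng is None:
        rng = random.Random()
    rho = rng.uniform(-1, 1)  # rho chosen uniformly between -1 and +1
    points = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)  # independent standard normals
        x = z1
        y = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # gives Corr(x, y) = rho
        points.append((x, y))
    return rho, points

def sample_correlation(points):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)

rho, points = simulate_scatter(n=50, rng=random.Random(2001))
print("guess first, then compare with r =", round(sample_correlation(points), 2))
```

Students would guess from a scatterplot of the points before the sample correlation is revealed; note that the sample value will differ somewhat from rho in a sample of this size.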
In question (b), students invariably underestimate the value of the correlation coefficient between their own guesses and the actual correlation values. They are pleased to find that the actual value of this correlation exceeds their expectations, often considerably. Questions (f)-(h) lead students to question the validity of judging their guess accuracy from this correlation, though.
The homework activities for this topic ask students to explore further properties of correlation and to apply it to new datasets and situations. Activities 9-6 and 9-7 involve properties of correlation: 9-6 tries to help students realize that the slope of the line does not affect the value of the correlation, and 9-7 provides further experience with the effects of outliers. Both of these require technology. Activity 9-8 tests whether students can anticipate the direction of an association based on data values without benefit of technology. Activities 9-9 through 9-14, 9-17 and 9-18 all require technology as students apply correlation analysis to various datasets. The distinction between association and causation is the focus of Activities 9-15 and 9-16. You might want to caution students before they work on any of these activities that it is good practice to always look at a scatterplot before computing a correlation.
Substantial changes from the first edition include approaching regression by first considering the mean value of the response variable as its predictor. This helps to lead into the interpretation of r² as well as motivating regression in the first place. This topic also includes a new activity that cautions students against interpreting r² without first looking at a scatterplot of the data.
New uses for technology in this topic are to calculate least squares lines, along with resulting fitted values and residuals. Drawing the least squares line directly on the scatterplot is another use of technology in this topic.
Activity 10-1 tries to get students thinking about the basic idea of using a line to summarize the relationship between two variables and the potential usefulness of that idea for making predictions. Questions (a)-(c) aim to convince students that in the absence of further information, the mean airfare is the best prediction to make, but one can use knowledge of distance to make a better prediction for airfare. You might want to indicate that there is no "right" answer in (d) for which line best summarizes the data. Depending on their algebra background and knowledge, many students might need some help with finding the slope and intercept of their line in (g) and (h). In question (i) we vehemently insist that students get used to using variable names rather than generic x and y symbols when writing the equation of a least squares line.
You might want to point out the "y-hat" notation that we use with least squares lines. The caret symbol ("hat") is just to clarify that the line produces a predicted value for the y-variable and not an actual data value. You might also make clear that we use the terms "least squares line" and "regression line" interchangeably throughout the book. Also notice that the formulas given for the least squares estimates of the slope and intercept coefficients are in terms of the means, standard deviations, and correlation between the two variables. We do not provide the computational formulas, as we expect students to use technology to calculate these estimates. We do hope that the formulas given provide some insight into least squares lines.
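To make the formulas concrete, here is a minimal sketch that assumes the standard forms: slope b = r(s_y / s_x) and intercept a = ybar - b(xbar). The distances and airfares are hypothetical values chosen for easy arithmetic, not the data from Activity 10-1, and the function names are ours.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * stdev(xs) * stdev(ys))

def least_squares_line(xs, ys):
    """Return (intercept, slope) from means, SDs, and correlation."""
    r = correlation(xs, ys)
    slope = r * stdev(ys) / stdev(xs)
    intercept = mean(ys) - slope * mean(xs)
    return intercept, slope

# Hypothetical distances (miles) and airfares (dollars)
distance = [200, 400, 600, 800, 1000]
airfare = [110, 130, 170, 180, 210]

a, b = least_squares_line(distance, airfare)
print(f"predicted airfare = {a:.2f} + {b:.4f} * distance")
```

Writing the result with the variable names "airfare" and "distance", as in the print statement, mirrors what we ask of students in question (i).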
Students cover a lot of ground in Activity 10-2, where they apply the least squares criterion to the issue of selecting a regression line to fit the data. Some students may need considerable one-on-one help to work with the formulas in (b). You'll want to show students how to use technology to find the regression line in (c). This activity is another case where round-off errors can arise and confuse some students. In (d) and (e) we expect students to use the equation of the least squares line to calculate the predictions, not just to estimate predictions visually. Question (i) aims to warn students of the danger of extrapolation. Round-off errors committed by students can affect questions (j) and (k), which try to illustrate the interpretation of the slope coefficient.
Activity 10-3 introduces the big ideas of fit and residual and leads students to the interpretation of r² as the proportion of variability in the y-variable explained by the least squares line with the x-variable. We intend students to answer (a) and (b) based on the regression equation but to address (c) based solely on the values given in the table; in this way students should come to better understand the relationship between fitted values and residuals.
Questions (g)-(k) lead students to look at r² as the proportion of variability in airfares explained by knowing distance. This is a very difficult idea for students to understand. The approach here is to compare the sum of squared residuals from the regression line to the sum of squared deviations from the mean airfare; in other words, compare the squared prediction errors from the line to the squared prediction errors from the mean. You probably want to make sure that students know how to use technology to calculate the sum of squared errors in (j).
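The comparison of squared prediction errors can be sketched as below. The data are hypothetical (not the book's airfares), and the slope is computed with an equivalent computational formula rather than the one printed in the book.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares (intercept, slope) via sums of squares and products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(xs, ys):
    """1 - (squared errors from the line) / (squared errors from the mean)."""
    a, b = fit_line(xs, ys)
    my = mean(ys)
    sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sse_mean = sum((y - my) ** 2 for y in ys)
    return 1 - sse_line / sse_mean

# Hypothetical distances (miles) and airfares (dollars)
distance = [200, 400, 600, 800, 1000]
airfare = [110, 130, 170, 180, 210]

print(round(r_squared(distance, airfare), 4))
```

For these made-up values the squared errors drop from 6400 (predicting with the mean) to 150 (predicting with the line), so knowing distance explains most of the variability in airfare.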
Activity 10-4 tries to provide a cautionary example illustrating that it is always important to examine visual displays of data. In this instance both regression lines have very similar values of r², but the regression model is clearly appropriate for one case and not the other.
The homework activities for this topic ask students to apply regression ideas to a variety of datasets. Activities that require use of technology are 10-6 through 10-9, 10-11, and 10-13 through 10-16. Students practice using the formulas for slope and intercept coefficients with Activities 10-5 and 10-10; they are asked to investigate some consequences of these formulas in Activity 10-18. Activity 10-17 is one of my favorites; we have used parts of it on exams as well. We also particularly like part (d) of Activity 10-16, which revisits the issue of association vs. causation.
Substantial changes to this topic from the first edition, in addition to new and updated datasets, include a new activity introducing students to residual plots.
New uses of technology in this topic are to produce residual plots and perform transformations.
Activity 11-1 starts off in (a)-(c) with a review of basic regression ideas that students learned in Topic 10. Question (d) breaks new ground by asking students to look at a scatterplot of residuals vs. longevities. This scatterplot reveals a "megaphone" pattern, indicating that the line better predicts gestation periods for animals with shorter longevities than for animals with greater longevities. Students should realize in (e) that the elephant is an outlier in both variables but does not have the biggest residual. Questions (f)-(h) guide students to find that the giraffe has the largest residual because its gestation period is much longer than expected, but removing the giraffe from the analysis has little effect. Questions (i)-(k) reveal that removing the elephant does have a substantial effect on the analysis, thus introducing students to the idea of an influential observation.
Activity 11-2 is new in this edition, another example of a "matching" activity. You might want to help students realize that the residual plots are just rotations of the original scatterplots, with the rotation serving to make the regression line into a horizontal "mean residual = 0" line. Encouraging students to draw this horizontal line on the residual plots may be helpful to them. Part (b) then asks them to do another matching, this time between the residual plots and verbal descriptions of them. The key point for students to grasp is that patterns in the residual plot indicate nonlinearity in the original relationship.
Good students occasionally ask why residual plots are necessary, since they can see the nonlinearity in the original scatterplot. My typical answer is first that the nonlinear relationship may be subtle enough that it's easier to detect in the residual plot than in the original scatterplot. Perhaps more importantly, residual plots are especially useful with multiple regression involving more than one predictor variable.
Students explore the idea of data transformations, one of the more challenging mathematical ideas to appear in the book, in Activity 11-3. Transformations are a natural answer to the question "what do we do when we spot a nonlinear relationship" that arises in Activity 11-2. Some students will need help understanding how to calculate logarithms in (b). In question (d) you might want to remind students to use "log ( people per tv )" when they write out the regression equation. Question (f) indicates how to interpret the slope coefficient with this log transformation. Students should discover a much better fit with the transformed data than with the original data.
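A rough sketch of why the transformation helps is given below. The values are hypothetical (not the book's data), and we assume base-10 logarithms; with base 10, the slope on the log scale is the predicted change in the response for a tenfold increase in people per TV.

```python
import math
from statistics import mean

def correlation(xs, ys):
    """Sample correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for six countries
people_per_tv = [1, 3, 10, 30, 100, 300]
life_expectancy = [78, 75, 70, 66, 61, 55]

log_people_per_tv = [math.log10(v) for v in people_per_tv]  # base 10 assumed

r_raw = correlation(people_per_tv, life_expectancy)
r_log = correlation(log_people_per_tv, life_expectancy)
print("original scale:", round(r_raw, 3))
print("log scale:     ", round(r_log, 3))
```

The association is much closer to linear on the log scale for these made-up values, which is the pattern students should discover with the real data.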
Most of the homework activities for this topic ask students to use technology to analyze data and further investigate these regression issues of outliers, influence, residual plots, and transformations. The only homework activities not requiring technology are 11-13 and 11-15. Activity 11-13 reminds students of the danger of extrapolation, and Activity 11-15 provides a residual plot for students to interpret. Students examine transformations in Activities 11-4 and 11-5. Outliers and influence are the subjects of Activities 11-6, 11-7, 11-11, and 11-12. Residual plots are a focus of Activities 11-9, 11-10, and 11-15. Activity 11-14 helps students to see that solving a regression equation for the explanatory variable does not necessarily produce the regression equation with the variables' roles switched.
Most of the material on randomness in this and following topics is presented in terms of simulations rather than probability rules. We try to help students to develop an intuitive sense for properties of randomness, particularly for the effect of sample size. We also try to relate the study of randomness immediately to concepts of statistical inference, namely confidence and significance, that come later in the course.
The primary goals of this topic are to convince students of the need for random sampling and for students to begin exploring properties of randomness. Substantial changes in this topic from the first edition, in addition to updating the U.S. Senate list, are an activity (12-3) highlighting the concept of bias and one (12-6) illustrating that population size has little effect on sampling variability.
Questions 5 and 6 of the Preliminaries collect some data that students are to analyze in a homework activity, so you might want to decide whether you want to assign those activities before you bother to collect the data. Depending on how e-mail savvy/addicted your students are, you may want to alter the questions, perhaps to cover the last day instead of the last week or the last 4 hours instead of the last 24 hours, in an effort to produce more variability in the responses. Sample data collected on students about the "Preliminaries" questions will appear HERE.
New uses of technology in this topic include generating random samples from a population.
Activity 12-1 simply introduces students to the terms population and sample. You may want to stress that they have studied data from both so far but that most of the rest of the book will focus on analyzing samples in order to infer knowledge of populations. We suggest that you quickly discuss this activity with the class, and make them aware in no uncertain terms that appreciating the distinction between population and sample is absolutely critical to understanding statistical inference, which is the subject of much of the remainder of the course.
Activity 12-2 provides some examples of biased sampling designs; you might want to cover these questions as a class discussion. For the Literary Digest example, we try to get students to identify at least two major sources of bias: that owners of cars and phones during the Depression tended to be more wealthy and Republican and that those who take the time to write in are typically less happy with the status quo incumbent than those who decline to write in.
Activity 12-3 is a new one that aims to help students understand the concept of bias by collecting some data from themselves that illustrate bias. The idea is that when students choose a sample of five Senators whose names are familiar, the samples will most likely contain more experienced Senators and therefore be biased in the direction of overestimating the mean years of service in the Senate. We ask students to answer (b) without looking back at the list, but many students complain that they do not know the names of five Senators. After they have made an honest effort, we tell them to go ahead and peek at the list but that they should only write down names they have heard of. To combine the sample means in (h), we either have students come to the board and mark their mean value on a dotplot or else have them call out their mean value while we enter them into technology. If your results are typical you should find that most students' means exceed the population mean value, often by a lot. We try to emphasize to students that getting a few samples that over- or under-estimate a population value is not evidence of bias; rather, a systematic tendency to do so is evidence of bias. You might impress upon students the importance of question (k): that increasing the sample size does not help to reduce bias.
Having encountered a biased method of sampling Senators, students are asked in Activity 12-4 to avoid bias by selecting a simple random sample, using a table of random digits, from the 100 members of the U.S. Senate. We don't think you can emphasize strongly enough that this is one of those rare situations in which one actually knows details of the population. We usually describe for the entire class how to read the table of random digits and then let them take their samples individually, encouraging them all to enter the table at different points. You might want to remind students that information for the population is given earlier in the activity. Since it's impossible to find 45% Democrats in a sample of 10, all students should answer "no" to the first part of question (d). In question (e), though, we hope that they recognize that this "no" response does not mean that the sampling method is biased in the sense of systematically favoring one group over another. This should be revealed by the dotplot of class results in (f), where you should find about the same number of sample means on the low as on the high side of the population value. You might want to show students this dotplot and the one from Activity 12-3 on the same scale so that the bias, and lack thereof, is clearer.
We emphasize to students that the definitions of parameter and statistic following question (g) are crucial to understanding the material in the rest of the course. While students usually find question (i) to be easy and routine, we caution them that the distinction trips up many students who do not pay proper attention to it. We think it's also worth repeating at this point that this situation is a rare one in which you know the values of population parameters.
Activity 12-5 continues with random sampling of Senators but lets technology take over the sampling and increases the sample size to 20. You'll want to explain to students how to use technology to do this repeated sampling. One option is to use a Minitab macro that we have written; another option is to use the applet available here. You might need to remind some students to record the sample proportion (not number) of Democrats. In (d) and (e) students should find that the sample means are less variable with the larger sample size, so in (f) they should say that the result of a single sample is more likely to be close to the population value with a larger sample. You may want to emphasize in your wrap-up that while sample size does affect precision, it does not affect bias.
Activity 12-6 has a very specific aim: to help students to realize the counter-intuitive principle that while the sample size directly affects the amount of sampling variability, the population size does not. Thus, a simple random sample of size 1500 provides just as much information about the population of the entire U.S. as it does about the population of any one state. Questions (a) and (b) try to help students to see this by asking them to take samples of size 5, first from a population of 100 and then from a population of 10,000. This is an extremely difficult point for students to accept, but we feel that it is important to emphasize because one so often encounters national surveys of 1000-1500 people.
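A small simulation along these lines can reinforce the point. It is a sketch under our own assumptions, not the book's procedure: the population is hypothetical, with exactly half of its members coded 1, and we track the sample proportion over 2000 repeated samples.

```python
import random
from statistics import stdev

def sd_of_sample_proportions(pop_size, sample_size, reps=2000, seed=0):
    """SD of the sample proportion over repeated simple random samples."""
    rng = random.Random(seed)
    # Hypothetical population: half coded 1 ("success"), half coded 0
    population = [1] * (pop_size // 2) + [0] * (pop_size - pop_size // 2)
    proportions = [
        sum(rng.sample(population, sample_size)) / sample_size  # without replacement
        for _ in range(reps)
    ]
    return stdev(proportions)

sd_small_pop = sd_of_sample_proportions(pop_size=100, sample_size=5)
sd_large_pop = sd_of_sample_proportions(pop_size=10_000, sample_size=5)
sd_larger_n = sd_of_sample_proportions(pop_size=10_000, sample_size=20)

print("n = 5,  N = 100:   ", round(sd_small_pop, 3))
print("n = 5,  N = 10,000:", round(sd_large_pop, 3))
print("n = 20, N = 10,000:", round(sd_larger_n, 3))
```

The first two standard deviations should be nearly equal even though the populations differ in size by a factor of 100, while the third should be noticeably smaller because the sample size is larger.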
The Gallup organization has a wonderful web site where you can find recent survey results and also a very nice description of many ideas related to sampling.
The homework activities for this topic ask students to investigate these sampling issues further. Activity 12-7 emphasizes the definitions of parameter and statistic, while Activity 12-8 asks about whether the sample of students in class is representative of various populations. Activities 12-10 through 12-14 present various studies and ask students to comment on the sampling methods. Activities 12-15 and 12-16 involve using a table of random digits. Activity 12-18 is the only one requiring technology, as it asks students to analyze sample proportions of Democrats under repeated sampling, much as they analyzed sample mean years of service in class. Activity 12-20 raises the issue of people's truthfulness, and Activity 12-21 introduces other common types of nonsampling bias. Activities 12-22 through 12-24 concern another type of bias, while Activities 12-25 through 12-27 present other more sophisticated methods of probability sampling.
The primary goals of this topic are to help students become aware of the need for well-designed experiments and to familiarize students with some of the key concepts and techniques associated with experimental design. We recommend that you consider leading a class discussion through much of this material in order to keep students roughly at the same point and so that you can interject often to clear up common questions.
Many changes from the first edition appear in this topic. First, this topic has been moved to much earlier in the course. It consists of completely new activities, designed to focus students' attention on specific concepts that many find difficult to grasp. The first activity tries to introduce students to the various types of studies and to the virtues of controlled experiments. The second activity concentrates on the difficult concept of confounding, the third on the concept of randomization, and the fourth on blindness. The fifth activity is the only holdover from the first edition, and the sixth activity asks students to study the concept of blocking.
The Preliminaries provide a convenient point at which to remind students that they should be serious and thoughtful in addressing these questions but that they are not expected to be able to answer them as knowledgeably as if they had already studied the topic. For question 5, we have in mind a very small experiment to be done using students as subjects. You should prepare strips of paper containing letter sequences with hyphens separating groups of letters. Half of the strips should say JFK-CIA-FBI-USA-SAT-GPA-GRE-IBM-NBA-CPR, and the other half should say JFKC-IAF-BIU-SASA-TGP-AGR-EIB-MN-BAC-PR. Randomly give out one version to half the students and the other type to the rest. Tell students not to look at the letters until you ask them to, and then give them twenty seconds to memorize as many as they can. Then have them write down as many letters, in order, as they can remember. Finally, have them look at the strip and determine how many letters they remembered correctly before making an error of any kind. Then collect data on the results, perhaps by asking them to write their score on their strip and turn it in. Notice, of course, that the two letter sequences are identical. The conjecture is that students who receive the sequence in convenient three-letter chunks will tend to memorize more letters correctly than those given the other version.
Sample data collected on students about the "Preliminaries" questions will appear HERE.
No new uses of technology appear in this topic.
Activity 13-1 presents four different ways of collecting data to address a question, defines terms related to the data collection strategies, and asks students to identify which studies go with which terms. Anecdotes are included to try to make clear to students that "data beat anecdotes," as David Moore puts it. You probably cannot stress this point enough. Question (c) tries to establish that surveys are a step up from anecdotes but only present people's opinions and perceptions. We encourage students to get in the habit of drawing diagrams as in (f) to describe experimental designs. Questions (g) and (h) are key for recognizing the advantages of controlled experiments over observational studies. We strongly recommend that you emphasize this point through class discussion. Question (i) establishes the principle of comparison as one of the principal ways of achieving control in an experiment.
Activity 13-2 is another example of an activity where we abandon our principle of using real data and instead use hypothetical (but realistic) data designed to make a very specific point. The goal here is to study the concept of confounding, a very difficult one for students to grasp. We suggest that you work through this activity with the class as a whole so that you can explain what's going on every step of the way, but we still recommend that you have students answer the questions on their own first before you discuss them.
In (c) students should report that students in the "foreign language" group did tend to do better on the SAT, but in (d) they should offer non-causal explanations for that. For instance, one could surmise that students with stronger verbal ability in the first place tend to be the ones who choose to study a foreign language, so they would do better on the SAT even if the foreign language study had no effect. Questions (e)-(g) ask students to investigate this claim by providing (hypothetical) data on these students' scores on a verbal aptitude test prior to their foreign language study. It does turn out that those with high verbal aptitude to begin with are precisely those who study a language, and so prior verbal aptitude and foreign language study are confounded. You may or may not want to mention the subtle distinction between a lurking and a confounding variable: a lurking variable is simply one that is not recorded and so may or may not be confounding, whereas a confounding variable is one whose effects cannot be separated from those of the explanatory variable. Questions (h) and (i) try to convince students that gender is not confounded with foreign language study because five students of each gender studied a language and five did not. (Qian is intended to be a female name.)
Activity 13-3 suggests a solution to the problem of confounding whenever one has the luxury of assigning subjects to treatments: randomization. This activity tries to convince students that randomly assigning subjects to one group or the other really does tend to balance out potential effects of lurking variables. In (c) you should let students know whether you prefer one method of performing the randomization. You might want to have students compare their results in (d) with each other so that they can see the balancing out; you might even have students produce a dotplot of the difference in the group means to see that those differences are roughly centered at zero. Question (e) is an important one, as it tries to convince students that randomization not only controls variables that you can measure but also ones that are too difficult to measure.
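For instructors who want to show the "balancing out" on a larger scale than the class can manage by hand, the following sketch repeats a random assignment many times and summarizes the differences in group means. The twenty aptitude scores are generated hypothetically for illustration (they are not the data from the activity), and the function name is our own.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical lurking variable: a prior aptitude score for each of 20 subjects.
scores = [rng.gauss(500, 100) for _ in range(20)]

def random_assignment_difference(values, rng):
    """Randomly split the subjects into two equal groups and
    return the difference between the two group means."""
    shuffled = values[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

# Repeat the random assignment many times, as the class does a few times by hand.
diffs = [random_assignment_difference(scores, rng) for _ in range(5000)]
mean_diff = statistics.mean(diffs)
sd_diff = statistics.pstdev(diffs)

print(f"Average difference in group means: {mean_diff:.2f}")
print(f"SD of the differences:             {sd_diff:.2f}")
```

Individual assignments still produce nonzero differences, but the differences center near zero, so neither group is systematically favored on the lurking variable.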
Note in Activity 13-4 that we first ask students to identify variables and produce a graphic of the design. We believe strongly that these are good habits to encourage in students. This activity raises the issue of blindness in the context of a very controversial study. We encourage you to lead a discussion of students' reactions to the controversial nature of this procedure.
Activity 13-5 asks students to pull together the three principles of comparison, randomization, and blindness by writing about how all three were implemented in an experiment that they have already analyzed. If you have been leading a class discussion of the earlier activities, we suggest stopping that at this point to let students write about this experiment.
Activity 13-6 addresses another very challenging concept for students to grasp: blocking. The point here is that randomly assigning cars to treatment groups is a reasonable idea, but since MPG ratings differ substantially between small and luxury cars it makes more sense to stipulate that luxury cars and small cars will each have a 50/50 split between the two groups. The sensibility of this plan is seen in that there is much less variation in the group mean differences when one first blocks on car type as opposed to assigning cars to groups completely at random.
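The reduction in variation can also be illustrated with a quick simulation. The sketch below is a rough illustration only: the MPG ratings are invented (ten small cars near 35 MPG and ten luxury cars near 20 MPG), and the function names are ours rather than anything from the activity.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical MPG ratings for 10 small cars and 10 luxury cars.
small = [rng.gauss(35, 2) for _ in range(10)]
luxury = [rng.gauss(20, 2) for _ in range(10)]

def mean_diff(group_a, group_b):
    return statistics.mean(group_a) - statistics.mean(group_b)

def complete_randomization():
    """Assign all 20 cars to two groups of 10 completely at random."""
    cars = small + luxury
    rng.shuffle(cars)
    return mean_diff(cars[:10], cars[10:])

def blocked_randomization():
    """Randomize within each car type so each group gets 5 small and 5 luxury cars."""
    s, lux = small[:], luxury[:]
    rng.shuffle(s)
    rng.shuffle(lux)
    return mean_diff(s[:5] + lux[:5], s[5:] + lux[5:])

sd_complete = statistics.pstdev([complete_randomization() for _ in range(5000)])
sd_blocked = statistics.pstdev([blocked_randomization() for _ in range(5000)])

print(f"SD of group mean differences, complete randomization: {sd_complete:.2f}")
print(f"SD of group mean differences, blocking on car type:   {sd_blocked:.2f}")
```

With ratings this different between the two car types, the blocked design should show far less variation in the group mean differences than complete randomization.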
The many homework activities for this topic lead students to consider these ideas in greater depth and to apply what they have learned. These activities present descriptions of studies to students and typically ask them to identify the variables involved, describe the design of the study, and comment on potential limitations to the conclusions that can be drawn. Activities 13-7, 13-9, 13-12, 13-13, 13-15, and 13-16 describe observational studies, while Activities 13-10, 13-11, 13-14, 13-17, 13-21, and 13-23 present experiments. Activities 13-10, 13-13, 13-17, and 13-21 ask students to analyze some data, with Activity 13-17 pertaining to the memory experiment from the Preliminaries section. Activities 13-26 and 13-27 ask students to go out and conduct small-scale experiments. None of these activities requires use of technology, though it could be helpful in Activity 13-17.
New uses of technology in this topic are to perform probability simulations, specifically to simulate observations from a binomial distribution.
We often start by apologizing to students for the context of Activity 14-1 and saying that we hope no one finds it objectionable. Students seem to get a kick out of this version of the famous "matching problem" of probability more than they would if it were presented as, say, a group of men throwing their hats into the middle of a room and selecting one at random.
We try to pass out four index cards to each student before class begins. Most students catch on quickly to how to do the simulation in (a) and (b). The table in (c) has a lot of information, so you might want to collect data on just one of the two variables; we recommend the "all wrong?" variable since its probability is not as straightforward to assess. You can either record the data for (c) on the board as students call out their results, or have them put their results on the board themselves, or you might enter it into your technology directly as they call out their results.
For collecting the data in (f), we ask students to go to the board and put tally marks in the appropriate cell. We often make the column for "3 matches" extremely narrow and ask students to explain why we can get away with that. In (k) and (l) we begin to anticipate the issue of "unlikeliness" that drives p-values and tests of significance. We hope that students will say that getting four matches is quite unlikely but getting zero, one, or two matches is not at all unlikely.
Activity 14-2 extends students' analysis of the "random babies" situation to a more theoretical analysis. Since the sample space involves only 24 outcomes, it is not too cumbersome to have students analyze the entire sample space. We do provide a listing of that sample space, though, in order not to slow students down. We let students work through (a)-(d) at their own pace but then interrupt them so that we can compare answers for (d) and make sure that everyone is in agreement there. You may want to emphasize that the analysis here depends upon assuming that the 24 possible outcomes are equally likely, which is what it means to say that the babies are returned to their mothers "at random." You may want to caution students that in many situations it is not appropriate to assume equal likeliness and that, in fact, this is a common misconception that leads to serious errors. A particularly egregious example concerns an earthquake "expert" who stated on Nightline that since an earthquake either happens or doesn't, its probability is always 50%.
With luck you'll be able to highlight for students that the empirical probabilities from 14-1 closely approximate the theoretical probabilities in (d). Question (e) is meant to show students that empirical estimates generally get closer to the theoretical probabilities as the number of repetitions in the simulation increases. Questions (f) and (g) get at the idea of expected value. As students work through (f) we try to convince them that the average here is calculated just like always: summing the values and dividing by the number of values. Since there are so many repeated values, though, it's easier to take each value times its frequency, add those up for the numerator, and divide by the number of repetitions. You might show students the algebra to see that this is equivalent to taking each value times its relative frequency (its frequency divided by the number of repetitions) and adding those products; this provides the analogy to working with the theoretical probabilities in (g).
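If you want to check the class results against the theory yourself, the sample space of 24 orderings is small enough to enumerate directly. The sketch below does so and then runs a simulation for comparison; the variable names, the seed, and the 10,000 repetitions are our own choices, and the code assumes (as the activity does) that all 24 orderings are equally likely.

```python
import itertools
import random
from collections import Counter
from fractions import Fraction

babies = range(4)
perms = list(itertools.permutations(babies))  # the 24 equally likely outcomes

def matches(order):
    """Number of mothers (positions) who receive their own baby."""
    return sum(1 for mother, baby in enumerate(order) if mother == baby)

# Theoretical probabilities from the full sample space.
counts = Counter(matches(p) for p in perms)
exact = {k: Fraction(counts[k], len(perms)) for k in range(5)}
# Expected value: each value times its probability, summed.
expected = sum(k * prob for k, prob in exact.items())

# Empirical probabilities from a simulation.
rng = random.Random(0)
reps = 10_000
sim = Counter()
for _ in range(reps):
    order = list(babies)
    rng.shuffle(order)
    sim[matches(order)] += 1
empirical = {k: sim[k] / reps for k in range(5)}
# Average: each value times its frequency, summed, divided by repetitions.
average = sum(k * sim[k] for k in sim) / reps

for k in range(5):
    print(f"{k} matches: exact {exact[k]}, simulated {empirical[k]:.3f}")
print(f"Expected matches: {expected}; simulated average: {average:.3f}")
```

The enumeration gives probabilities of 3/8, 1/3, 1/4, 0, and 1/24 for zero through four matches, with an expected value of exactly one match, and the simulated values should land close to these.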
One of the challenges of teaching probability ideas largely from a simulation standpoint is that students may not recognize when simulation is useful as an analysis tool to approximate probabilities and when it is intended merely as a pedagogical tool to help them to visualize what would happen in the long run. In these activities it serves both purposes.
Activity 14-3 tries to show students that probability refers to long-run outcomes, so in the short run it is very difficult to distinguish among similar probabilities. Many students are reluctant to make guesses in (a) and (c), but we try to insist that they do so just to drive home the point that it's hard and that mistakes occur in the short run. Once the coins have been flipped 50 times, it's pretty easy to tell which is which.
Activity 14-4 leads students into more efficient techniques of simulation while trying to develop more of their intuition for outcomes of chance events. Using the random digit table to simulate boy/girl births comes easily to many students, but others will need some guidance. Even though students typically work in pairs, we insist that partners use different lines of the random digit table here. Combining the results in (c) can be tricky if students have worked at varying paces and so arrive here at very different times. We again have students put tally marks of their outcomes on the board to do this compiling. We like to draw students' attention to (e), where they should realize that a 3/1 split is more likely than a 2/2 split: even though a 2/2 split is more likely than either of the two 3/1 splits, the two 3/1 splits combined are more likely than a 2/2 split.
Question (f) is another good example of our not minding if their initial guess is wrong, provided that they go back and correct it in (g) if they were wrong. We do care, though, that they make a reasonable attempt to think through (f) before doing (g). Some students tend to get lazy and not bother with the initial guess/expectation, but we want them to be thinking about the issue before they conduct the simulation.
Also in (g) you want to be sure that students know how to do this simulation using their technology. Even though it's not asked for, it's good practice to have students create a histogram of their simulation results as well as tallying the results. They should find that a four-child family is more likely to have a 50/50 split than a ten-child family. To help them understand this, you might ask if they literally expect to get exactly 5000 heads if they flip a fair coin 10,000 times.
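For your own reference, the exact probabilities behind these simulation results are easy to compute. The sketch below assumes that boys and girls are equally likely and that births are independent; the function name is ours.

```python
from math import comb

def prob_boys(n_children, n_boys):
    """Exact probability of n_boys boys among n_children births,
    assuming each birth is independently a boy with probability 1/2."""
    return comb(n_children, n_boys) / 2**n_children

# Four-child families: a 2/2 split versus the two 3/1 splits combined.
p_two_two = prob_boys(4, 2)
p_three_one = prob_boys(4, 3) + prob_boys(4, 1)

# An exact 50/50 split becomes less likely as the number of births grows.
p_even_ten = prob_boys(10, 5)
p_even_10000 = prob_boys(10_000, 5_000)

print(f"4 children, 2/2 split:        {p_two_two:.4f}")
print(f"4 children, 3/1 split:        {p_three_one:.4f}")
print(f"10 children, 5/5 split:       {p_even_ten:.4f}")
print(f"10,000 flips, exactly 5,000:  {p_even_10000:.4f}")
```

The 2/2 split has probability .375, the two 3/1 splits together have probability .5, and the 5/5 split among ten children has probability of about .246, which matches what students should see in their tallies.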
Activity 14-5 continues students' exploration of sample size and confronts another misconception. Students should see that the hospital with the smaller sample size is much more likely to have days in which 60% or more of its births are boys, while the larger hospital is much more likely to have days with between 41% and 59% boy births. You might emphasize that this reveals that larger sample sizes tend to produce more typical results.
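An exact calculation can back up what students find by simulation. In the sketch below the daily birth counts of 15 and 45 are hypothetical values chosen for illustration (they are not necessarily the hospital sizes used in the activity), and we again assume each birth is a boy with probability 1/2.

```python
from math import comb

def prob_between(n, lo_pct, hi_pct):
    """P(lo_pct% <= proportion of boys <= hi_pct%) among n births,
    assuming each birth is independently a boy with probability 1/2.
    Integer arithmetic avoids rounding trouble at the boundaries."""
    favorable = sum(comb(n, k) for k in range(n + 1)
                    if lo_pct * n <= 100 * k <= hi_pct * n)
    return favorable / 2**n

small_n, large_n = 15, 45  # hypothetical births per day at each hospital

small_extreme = prob_between(small_n, 60, 100)
large_extreme = prob_between(large_n, 60, 100)
small_middle = prob_between(small_n, 41, 59)
large_middle = prob_between(large_n, 41, 59)

print(f"P(60% or more boys):   small {small_extreme:.3f}, large {large_extreme:.3f}")
print(f"P(41% to 59% boys):    small {small_middle:.3f}, large {large_middle:.3f}")
```

Under these assumptions the smaller hospital has the higher chance of a day with 60% or more boys, and the larger hospital has the higher chance of a day between 41% and 59%.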
The homework activities for this topic aim to increase students' understanding of fundamental ideas of probability and to further prepare them to study statistical inference. Activity 14-6 follows upon the "random babies" activity, and Activity 14-7 asks students to consider when the equal likeliness assumption is reasonable. We have found that students have surprising difficulty with 14-7. Activity 14-10 is a favorite of mine, because more than providing a probability problem to be solved with a counting argument, it introduces the logic of significance testing by asking how likely an observed result (both officers women) would have been under a hypothesis of random selection. Students are asked to perform simulations in Activities 14-11 through 14-15. All of these simulation exercises involve principles of randomness as well, ranging from sample size effects to runs. Activities 14-14 and 14-15 also manage to include real data. Technology is needed for Activities 14-12 and 14-13, a table of random digits for Activities 14-11 and 14-14, and students may use either of these or a physical device in Activity 14-15.
This topic has undergone many substantial changes from the first edition. Normal distributions are motivated through both real and simulated data that follow a normal curve. The empirical rule and z-scores are emphasized much more strongly to convince students that normal calculations produce reasonable approximations. Real data on birthweights are used to provide students with practice using the standard normal table, and the idea of normal distributions as models for data is highlighted. Another "matching" activity has been added to help students see how closely sample data from populations correspond to the probability models. The algebraic notation associated with normal probability calculations has been greatly reduced from the first edition. Extensive practice using the standard normal table without a context has been eliminated.
Another change is that this study of normal distributions occurs earlier than it did in the first edition. The new "Probability" topic provided a natural segue to normal distributions. In the first edition normal distributions were not introduced until after students studied sampling distributions via simulation.
This topic lends itself to working through with the class as a whole, at least through the basics of reading the table and standardizing. We try to keep the class pretty much together as we work through this topic, as opposed to other topics where students work much more at their own pace. As always, though, we strive to have students work through a question themselves before we present the solution to them. We sometimes ask students to present their solutions for their peers in this topic.
You might draw students' attention particularly to questions 3 and 4 of the Preliminaries, as they address statistical issues relevant to working with normal distributions.
Depending on how much you want students to read a table of normal probabilities, a new use for technology in this topic could be performing probability calculations involving normal distributions.
Activity 15-1 tries to motivate the normal model by reminding students that they have encountered mound-shaped distributions both with real data and with simulated sampling distributions. It also highlights the importance of z-scores and introduces use of the standard normal probability table.
In (a) we expect students to comment on the mound shapes and symmetry of the distributions. Question (c) indicates how to read the mean and standard deviation of a normal curve from its sketch. Questions (d)-(k) try to establish that comparing z-scores puts all normal distributions on a common scale, while questions (l) and (m) try to convince students that the probabilities given in the standard normal table really do approximate empirical probabilities from real data that can be modeled with a normal curve.
You should make sure that all students can read the normal table correctly in (l). You might tell them that they will need to use this skill often throughout the remainder of the course.
Activity 15-2 gives students practice with the necessary skills for reading the standard normal table. We insist that they shade in areas under the curve as well as reading the table here, and we also want them to make guesses for the probability based on the shaded area. We try to convince them that this habit helps them to keep calculations straight in their mind when they become trickier, and it also provides a way to check the reasonableness of their final answers. Question (d) involves nothing more than looking up the tabled value. In (e) students should subtract the tabled value for 2.34 from 1, and in (f) they should note that they also could have taken advantage of symmetry by looking up -2.34 directly. Question (g) is trickier, but many will figure out on their own that they need to look up tabled values for the two z-scores and subtract; some will get in the bad habit of subtracting the z-scores themselves rather than their associated tabled values. Question (h) is meant to convince students that the normal model does produce reasonable approximations. One exercise that should have been included here is one where the z-score is off the table and where the probability is therefore less than .0002 or greater than .9998; we recommend that you give students such a problem or at least tell them what kind of answer you would expect in such a circumstance.
Questions (i) and (j) ask students to read the table "in reverse," to find values that generate certain probabilities above or below. Again we insist that they first draw a sketch to get a sense of the situation and what they're looking for. Many students struggle with this, particularly with the algebra of solving for the value once the z-score has been read from the table.
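If you use technology to check table work, the calculations described above can be sketched as follows with Python's standard library. Only the z-score of 2.34 comes from the activity as described here; the other z-scores, the 90th percentile, and the mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal model: mean 0, standard deviation 1

# Reading the table "forward": areas under the curve.
below = Z.cdf(2.34)                  # area to the left of z = 2.34
above = 1 - Z.cdf(2.34)              # area to the right: 1 minus the tabled value
by_symmetry = Z.cdf(-2.34)           # same area, found by symmetry
between = Z.cdf(1.5) - Z.cdf(-0.5)   # subtract tabled values, not the z-scores
off_table = Z.cdf(4.2)               # a z-score beyond the printed table

# Reading the table "in reverse": from a probability back to a value.
z_90 = Z.inv_cdf(0.90)               # z-score with 90% of the area below it
mu, sigma = 100, 15                  # hypothetical mean and standard deviation
value_90 = mu + z_90 * sigma         # solve z = (x - mu) / sigma for x

print(f"P(Z < 2.34)  = {below:.4f}")
print(f"P(Z > 2.34)  = {above:.4f} (by symmetry: {by_symmetry:.4f})")
print(f"P(-0.5 < Z < 1.5) = {between:.4f}")
print(f"P(Z < 4.2)   = {off_table:.5f}")
print(f"90th percentile: z = {z_90:.3f}, value = {value_90:.1f}")
```

The last two lines of the calculation mirror the algebra that students find difficult: once the z-score is read from the table, the value is the mean plus that z-score times the standard deviation.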
Activity 15-3 tries to show students the relationship between the probability model for a population and the pattern in sample data resulting from that population. With the larger sample size of 100, students have little difficulty in matching up the sample datasets in (a) with the respective populations from which they are drawn. With a smaller sample size of 10 in part (b), this matching is much more difficult. Of course, that's the point: with small samples it's hard to tell the shape of the population distribution from the sample data. This point is an important one in doing statistical inference with small sample sizes, because the t-procedures with small samples depend on having a normally distributed population, but it is very hard to assess the normality of a population based on a small sample.
The homework activities for this topic offer students more opportunities to practice normal calculations and also to discern situations where a normal model is likely to produce reasonable approximations. Activity 15-4 provides practice with determining the mean and standard deviation from a sketch of a normal curve. Activities 15-6 and 15-7 ask students to practice normal probability calculations, with 15-7 being a more open-ended question. The last few questions of Activity 15-9 are challenging for many, as they ask about changing μ and σ in a production process. Activities 15-11 and 15-12 provide real data with which to compare predictions from a normal model. Activity 15-13 tries to reveal that a normal model cannot be appropriate for these coin ages: because the standard deviation is almost as large as the mean and negative values are not possible, the distribution must be skewed. Activity 15-14 leads students to discover the empirical rule, and Activity 15-15 introduces the idea of critical values. None of these homework activities requires the use of technology.
We encourage you to make students aware of how critically important the concepts presented in this topic are and how fundamental these ideas are to much of what comes later in the course. We also recommend that you let them know that many students find this material to be fairly difficult. You might try to allay some of their fears by pointing out that they will have multiple opportunities to deepen their understanding of these concepts and that they should focus on the "big picture" with this first pass.
This topic focuses on sampling distributions of sample proportions, and the following topic turns to sampling distributions of sample means. Again we ask students to use simulation as the tool for investigating long-term behavior of sample statistics. Because we question whether students "see" what we want them to in computer simulations, we favor starting with physical simulations, using candies in this case.
The primary goals of this topic are to familiarize students with the concepts of sampling variability and sampling distribution, to lead them to some important findings regarding the sampling distribution of a sample proportion, and to provide students with a first exposure to the concepts of confidence and significance.
You need to bring Reese's Pieces candies to class for this topic. These candies come in three colors: orange, brown, and yellow. Two or three one-pound bags suffice for a class of 24 or so. You could also use M&M's, but they come in more colors and therefore colors occur in smaller proportions, so the sampling distribution will not come as close to a normal distribution as it does for orange Reese's Pieces with the same sample size. One danger of using candies for this simulation is that while students certainly enjoy eating candy in statistics class, you may need to make an effort to direct their attention to the statistical principles at work here.
As you lead students through the Preliminaries questions, you might point out to students that all of the variables being asked about are categorical. This shift from dealing with quantitative variables indicates that we will work with proportions rather than means as the sample statistic of interest in this topic.
Substantial changes from the first edition include an attempt to introduce the concept of significance as well as that of confidence in this topic. This change is a result of studying sampling distributions of sample means in the following topic, where again both concepts of confidence and significance will be investigated.
There are no new uses of technology in this topic, although it is used more extensively than before for performing more simulations based on the binomial distribution.
Activity 16-1 reminds students of the crucial distinctions between population and sample, parameter and statistic. We suggest that you lead the class through this activity as a group because much of the next activity needs to be done together.
Activity 16-2 leads students through candy sampling, which should be done together since question (g) calls for the pooling of class data. We scoop out more than 25 candies for each student and then ask them to count out a sample of size 25 and then count how many they have of each color. We invite them to dispose of the excess candies in an appropriate manner. You might want to distribute the candies as they enter class in order to save some time.
Questions (b)-(e) are perhaps obvious but nonetheless critical to understanding what statistical inference is all about, so we suggest that you make those answers exceedingly clear to students. For question (g) we go around the room and ask students to report their sample proportion of oranges while we create the dotplot on the board; you could also type the values directly into technology or have students put their values on the board's dotplot themselves. We recommend leading a discussion of these questions with the entire class. In question (m) you might admit that the question is quite vague with phrases such as "most" and "reasonably close" and "some" and "way off," but students are to focus on the big idea and not the details here. You might also tell them that the next activity returns to quantify and formalize these questions.
Activity 16-3 moves to using technology to simulate the same process many more times and much more efficiently. We have written a Minitab macro called "reeses.mtb" to do this, but it amounts to little more than sampling from a binomial distribution. We have also developed a Java applet available here for simulating the sampling of Reese's Pieces that students seem to find appealing.
We don't think you can emphasize enough to students that we have to specify a certain value for the population proportion in order to make the computer run the simulation. Based on years of experience with thousands of Reese's Pieces, we believe that the population proportion of orange candies is slightly less than 50%.
In question (b) students should see a distribution that is roughly normal, that is centered around the actual population proportion of .45. Question (f) is key to the notion of confidence; many students struggle to see that the answer to (f) is the same as the middle percentage in the table of (e). We stress to them that understanding (f) is the key to understanding one of the two critical concepts of the rest of the course.
Questions (i)-(m) investigate the effect of sample size. In question (j) be prepared to warn students that the scale on the display has probably changed, making the difference hard to see, but that the distribution is indeed less spread out than before. Students should discover in (k)-(m) that the larger sample size produces more samples with proportions close to the population value.
Questions (n)-(p) use the empirical rule to introduce the idea of 95% confidence more explicitly. Questions (q) and (r) ask students to verify that the familiar expression for the standard deviation of a sample proportion is reasonable, based on its closeness to their simulated findings. Some students tend to become enamored of this standard deviation formula as if it specifies the entire sampling distribution; we try to remind them that the shape and center of the sampling distribution are just as noteworthy.
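The comparison that students make in (q) and (r) can be sketched outside of Minitab or the applet as follows. The population proportion of .45 and the sample size of 25 come from the activities as described above; the number of repetitions, the seed, and the names are our own choices, and this code is an illustration rather than a reproduction of the "reeses.mtb" macro.

```python
import math
import random
import statistics

rng = random.Random(0)
p, n, reps = 0.45, 25, 5000  # population proportion, sample size, number of samples

def sample_proportion():
    """Proportion of orange candies in one simulated sample of n candies."""
    return sum(rng.random() < p for _ in range(n)) / n

props = [sample_proportion() for _ in range(reps)]

sim_mean = statistics.mean(props)
sim_sd = statistics.pstdev(props)
formula_sd = math.sqrt(p * (1 - p) / n)  # familiar standard deviation expression

# Empirical rule check: share of sample proportions within 2 SDs of p.
within_2sd = sum(abs(x - p) <= 2 * formula_sd for x in props) / reps

print(f"Mean of sample proportions: {sim_mean:.3f} (population value {p})")
print(f"SD of sample proportions:   {sim_sd:.3f} (formula gives {formula_sd:.3f})")
print(f"Share within 2 SDs of p:    {within_2sd:.3f}")
```

The simulated standard deviation should be close to the formula's value of roughly .0995, and about 95% of the simulated sample proportions should fall within two standard deviations of .45.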
Activity 16-4 continues students' study of sampling distributions in a context where statistical significance is the relevant concept. Many students struggle mightily with the reasoning of tests of significance, and this activity is meant as a gentle first pass to acquaint them with that reasoning process.
You might want to ask students to take an on-line ESP test and gather data from the class. One such test appears here, although it offers five choices of shape rather than four. You could discuss why it's probably not reasonable to conclude that the student who achieves the highest score has any particular ESP ability.
The homework activities for this topic provide experience with sampling distributions for a sample proportion. They also extend some of what students learn through in-class activities. Those that require technology for simulations are 16-9, 16-11, 16-12, and 16-15.
Activity 16-5 provides students with lots of practice distinguishing parameters and statistics; many of the examples listed appear later in the book. Activities 16-6 through 16-8 ask students to investigate various aspects of the Central Limit Theorem for a sample proportion: 16-6 and 16-8 reveal that the standard deviation of p-hat is largest when theta is equal to .5, and 16-7 leads students to see that the standard deviation of p-hat decreases in proportion to the reciprocal of the square root of the sample size. Activity 16-9 reinforces ideas related to sample size and sampling variability, using simulation as well as the CLT. Activities 16-10 through 16-12 and 16-15 touch on the concept of significance, while 16-13 and 16-14 concern confidence.
Also, as with the previous topic, students investigate the two key concepts of statistical inference, confidence and significance, while studying sampling distributions in this topic. This topic's primary goals are therefore to advance students' understanding of the concept of sampling distribution and the role that it plays in statistical inference, while leading them to a finding about the sampling distribution of a sample mean.
This topic is new to the second edition. We added it because we want give students an opportunity to deepen their knowledge of sampling distributions by studying them for a sample mean as well as a sample proportion. Our hope is that this also leads to a more grounded understanding of inference procedures for means later in the course.
As you lead students through the few Preliminaries, you might want to draw attention to the fact that the variables asked about are quantitative and not categorical.
A new use of technology in this topic is for conducting simulations of quantitative variables from various population distributions.
Activity 17-1 does not exactly follow in our tradition of conducting concrete simulations before computer ones, because we do not provide the actual population of 1000 pennies from which to sample. We do encourage using a table of random digits to conduct the first few simulations, in the hope of making the sampling process somewhat more tangible for students. We recommend that you try to keep students working at roughly the same pace through this activity. Questions (a)-(c) should not be taken lightly: take this opportunity to emphasize that this variable is quantitative, unlike the variables studied in the previous topic, that the summary values provided are parameters and not statistics, and that the population distribution is clearly skewed to the right. We suggest that you summarize the moral of question (i) for students: that while their sample means vary, they roughly center around the population mean, and the variability in these sample means is much less than the variability in the original population.
The transition from (i) to (j) can be confusing, so we urge you to clarify for students that the mean of their five sample means of size 5 can be considered as a sample mean of size 25. The scale provided in (j) is not very convenient because very few sample means will exceed 20; you should choose a more appropriate scale (say from 0 to 20) when you and your students construct this dotplot. Be sure to emphasize the result in (l) that the distribution of sample means looks quite normal even though the population distribution is quite skewed.
Activity 17-2 asks students to turn to technology to simulate this sampling process many more times. We have written a Minitab macro and also a Java applet available here for this purpose. We encourage you to let students work on their own in this activity, but it is more important than ever that you try to reach every group and make sure that they are understanding what their simulations reveal. Questions (i)-(l) try to convince students that the same pattern holds for even more non-normal populations.
Question (m) asks students to plug into the formula for the standard deviation of the sampling distribution of x-bar. While most students find this to be routine, many are confused about which standard deviation from their simulations to compare it to. Part of the difficulty here, of course, is that the phrase "standard deviation" is used for so many purposes- the population standard deviation, the sample standard deviation, the standard deviation of the simulated sample means, and the (theoretical) standard deviation of the sample mean. We resist the temptation to introduce new symbols (such as sigma_x-bar) for each of these quantities and prefer to try to help students learn to articulate in words which they are talking about.
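To help keep these standard deviations straight, it can be useful to see three of them computed side by side. The following Python sketch is an illustration under stated assumptions and not the Minitab macro or applet mentioned above: the population is a hypothetical right-skewed set of 1000 values that we generate ourselves (not the book's pennies), samples are drawn with replacement, and the sample size and number of repetitions are our own choices.

```python
import math
import random
import statistics

random.seed(17)  # fixed seed so the illustration is repeatable

# Hypothetical right-skewed population of 1000 values (not the book's penny ages).
population = [random.expovariate(1 / 12) for _ in range(1000)]
pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)  # population standard deviation (sigma)

n = 25       # size of each sample
reps = 5000  # number of simulated samples

# Draw each sample with replacement and record its sample mean.
sample_means = [statistics.mean(random.choices(population, k=n)) for _ in range(reps)]

sd_of_simulated_means = statistics.stdev(sample_means)  # what the simulation shows
theoretical_sd = pop_sd / math.sqrt(n)                  # sigma / sqrt(n) from the CLT

print(f"population mean {pop_mean:.2f}, mean of sample means {statistics.mean(sample_means):.2f}")
print(f"population SD {pop_sd:.2f}")
print(f"SD of simulated sample means {sd_of_simulated_means:.2f}")
print(f"sigma / sqrt(n) {theoretical_sd:.2f}")
```

The comparison that question (m) has in mind is between the last two quantities: the standard deviation of the simulated sample means and the theoretical value sigma/sqrt(n). Both are far smaller than the population standard deviation.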
Activity 17-3 tries to pull together what students learn about the sampling distribution of a sample mean and tie it in with concepts of significance and confidence as encountered in Topic 16. Questions (a)-(c) do not refer to the Central Limit Theorem, but we believe that they are important for helping students to see what's going on here. In (d) we want students to comment on the shape and center as well as the spread of that sampling distribution; too often students think that the sigma/sqrt(n) formula is all there is to the CLT. While students tend to get lazy, we think it's important to try to get them to draw a reasonable and well-labeled sketch in (e) since that can help them to visualize what's going on with the rest of the questions.
The homework activities for this topic provide more experience applying what students have learned about sampling distributions and also extending their knowledge. Technology is required for conducting sampling simulations in Activity 17-6; the other homework activities do not require technology. Activities 17-7 through 17-9 ask students to explore the effects of sample size. Activity 17-8 concerns the concept of confidence, while Activities 17-10 and 17-11 involve the issue of significance.
Many students want to be told how to solve these problems without having to understand what they are asking. While it is possible to teach students to solve CLT problems by rote, we resist this and insist that students learn to think through the problems and reason their way to the solution.
You might want to start by reminding students that while we use the term "Central Limit Theorem" in the singular, there is actually a CLT for sample proportions and one for sample means. You might even tell them in advance that Activities 18-1 and 18-2 deal with proportions, 18-3 and 18-4 with means. Activity 18-5 serves as a reminder that the CLT does not apply in all situations and that students should check its technical conditions before applying it. Activity 18-2 pertains to significance and 18-4 to confidence.
Substantial changes in this topic from the first edition include the two new activities (18-3 and 18-4) about the CLT for a sample mean. More attention is also given to the technical conditions under which the CLT approximations are reasonable, and a new activity (18-5) cautions students against applying the CLT when those technical conditions are not satisfied.
None of the activities for this topic require use of technology. If you prefer to have students use technology rather than tables to calculate normal distribution probabilities, or if you give them that option, then you will want to have technology available for that purpose.
Depending on how comfortable your students are feeling with this material, you may want to work through Activity 18-1 with them as a group. You might then jump to Activity 18-3 and lead them through that one as a group also, helping them to see that this one concerns a sample mean whereas 18-1 dealt with a sample proportion. We definitely suggest letting students struggle on their own (in groups and with your personalized help, of course) with the other activities, though. Since students do find this topic challenging, we have purposefully kept the number of questions few to allow for more question/answer time on individual bases.
Activity 18-1 gives students practice with calculations involving the CLT for a sample proportion. We encourage students to complete the guess in (d) as a check on their future work. This activity also has them study the effect of sample size yet again. You might emphasize that changing the sample size changes only the spread of the sampling distribution, not the center or shape. Questions (j) and (k) try again to convince students of the counter-intuitive fact that the population size does not enter into these calculations.
We urge you to allow students to work through Activity 18-2 on their own. It asks them to perform a calculation similar to those in Activity 18-1 and also to interpret what the resulting probability indicates about the statistical significance of a sample result.
Activity 18-3 gives students practice with calculations involving the CLT for a sample mean. You might want to emphasize to students that candy bar weights are quantitative, so the sample mean is a reasonable statistic to consider. While question (a) does not specifically ask for a sketch, it is useful to draw.
Again students investigate the effect of sample size as they practice these calculations. Be prepared in (e) for some students to answer "decrease" because they have observed that in previous questions the probability asked for decreased when the sample size increased. The relevant question is, of course, "probability of what?", so students find in (e) that the larger sample size produces a larger probability even though in the previous two activities the larger sample size always produced a smaller probability. This frustrates some students, but we urge them to start with the fact that larger sample sizes produce less variability and then reason from there. This also provides a good opportunity to remind them of the value of drawing a sketch of the sampling distribution.
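The "probability of what?" point can be illustrated with a short calculation. The following Python sketch uses hypothetical values for the population mean and standard deviation (they are not the candy bar values from the activity), and it assumes that the normal model for the sample mean is reasonable at both sample sizes.

```python
import math
from statistics import NormalDist

# Hypothetical population values, chosen only for illustration.
MU = 2.20      # population mean weight
SIGMA = 0.04   # population standard deviation

def sampling_distribution(n):
    """Normal model for x-bar: centered at MU with spread SIGMA / sqrt(n)."""
    return NormalDist(MU, SIGMA / math.sqrt(n))

def prob_within(n, margin):
    """P(x-bar falls within `margin` of MU)."""
    dist = sampling_distribution(n)
    return dist.cdf(MU + margin) - dist.cdf(MU - margin)

def prob_above(n, cutoff):
    """P(x-bar exceeds `cutoff`)."""
    return 1 - sampling_distribution(n).cdf(cutoff)

for n in (5, 40):
    print(n, round(prob_within(n, 0.01), 3), round(prob_above(n, 2.21), 3))
```

With the larger sample size, the probability of landing close to the population mean goes up while the probability of landing in the tail goes down. Both follow from the same fact: larger samples produce less variability in the sample mean.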
The moral of question (g) deserves to be summarized for students: that the calculation for a sample size of 40 would still be valid even if the population distribution of candy bar weights were skewed.
We also suggest that you permit students to work through Activity 18-4 on their own. It asks them to perform more CLT calculations, this time with an eye toward the concept of confidence. In fact, the probabilities asked for throughout this activity all turn out to be the same.
Activity 18-5 aims to caution students against applying the CLT when its technical conditions are not satisfied. The sample size times the success probability (n*theta) is much less than 10 in this example, so the CLT does not provide a reasonable approximation. This is borne out by the large discrepancy in probabilities between questions (a) and (b). You should probably remind students that when dealing with sample proportions, they also need to check that n*(1-theta) is at least 10.
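The kind of discrepancy that arises when the conditions fail can be sketched in a few lines of Python. The values of n, theta, and the count below are hypothetical and are not those of Activity 18-5; the function names are ours, and the normal calculation is done without a continuity correction.

```python
import math
from statistics import NormalDist

def clt_conditions_met(n, theta, minimum=10):
    """Check both technical conditions: n*theta >= 10 and n*(1 - theta) >= 10."""
    return n * theta >= minimum and n * (1 - theta) >= minimum

def exact_prob_at_least(k, n, theta):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(math.comb(n, x) * theta**x * (1 - theta)**(n - x) for x in range(k, n + 1))

def clt_prob_at_least(k, n, theta):
    """Normal (CLT) approximation to P(p-hat >= k/n)."""
    sd = math.sqrt(theta * (1 - theta) / n)
    return 1 - NormalDist(theta, sd).cdf(k / n)

# Hypothetical values, not those of Activity 18-5: n * theta is only 3.
n, theta, k = 30, 0.1, 6
print(clt_conditions_met(n, theta))           # False
print(round(exact_prob_at_least(k, n, theta), 3))
print(round(clt_prob_at_least(k, n, theta), 3))
```

In this hypothetical case the exact probability is roughly twice the CLT approximation, which is the sort of disagreement that students should learn to anticipate whenever either condition fails.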
The homework activities give students still more practice applying the CLT. Unfortunately, only Activity 18-11 concerns a sample mean; the rest involve proportions. Activity 18-6 gives students a chance to compare their CLT calculations with earlier simulation results to see that they agree fairly closely. Activity 18-7 asks students to explore sample size effects and is a good example of one probability decreasing and another increasing as the sample size is increased. Activities 18-8 and 18-9 are similar to 18-6 and 18-7 in that they lead students to compare CLT calculations to earlier simulation results and then explore sample size effects. Activity 18-10 addresses significance and 18-11 confidence. Activity 18-14 is a good and challenging one for many, as they are asked about the effect of changing the parameter value rather than the sample size. Activity 18-15 provides reminders to check whether technical conditions are satisfied before applying the CLT.
In addition to the inclusion of new datasets, substantial changes from the first edition with this topic are that activities investigating the effect of confidence level and sample size are integrated into the analysis of actual survey results. Also, the terms "standard error" and "margin-of-error" are introduced earlier in this edition, and an activity cautioning students not to apply the confidence interval procedure when its technical conditions are not met is also presented here. In many ways this topic tries to accomplish what was covered with two topics in the first edition, but our hope is that students have been better prepared for studying confidence and so can move fairly quickly and confidently (sorry for the pun!) through this material.
Another change is that the technical conditions for this procedure have been restated in terms of both the sample size and the sample proportion, providing a more reasonable condition for the normal approximation to the binomial distribution that underlies this procedure. Moreover, the term "technical conditions" is used rather than "assumptions" in the first edition to try to emphasize to students that these assumptions are not made blindly but rather are conditions to be checked.
New uses for technology in this topic include calculating confidence intervals for a population proportion. Technology is also used for an activity that simulates repeated sampling and construction of confidence intervals.
It's a good idea to bring pennies to class for the spinning experiment in the Preliminaries; we've heard that newer pennies spin more consistently than older ones. We check to make sure that students understand that we are asking for an interval of values in questions 6 and 7; some students surprise us with their inability to recognize that this means specifying a lower and an upper endpoint. We have come to realize that making sure that students understand "interval" is prerequisite to helping them to understand "confidence."
Activity 19-1 steps students through the development of confidence intervals, first by reminding them of the essential ideas of observational unit, type of variable, sample statistic and population parameter. It then has them do a "two standard error" confidence interval based on their knowledge of the CLT and empirical rule. It concludes with the terminology and formula for the more general confidence interval. You may want to work through this activity with the class as a whole so that you can arrive at the results together and answer common questions along the way.
Activity 19-2 then leads students to find critical values from the standard normal table. We point out that for commonly used confidence levels, the book provides the critical values. Thus, we tell students that they only have to find the critical values from scratch if they are working with an uncommon confidence level.
Activity 19-3 asks students to construct a confidence interval for the first time. Some will need help with the mechanics of doing this in (a). In (e) we have in mind looking at the "plus/minus" term from (a); this point probably warrants mentioning to the whole class.
Activity 19-4 introduces students to the term "margin of error" and to using technology to calculate a confidence interval; it also asks students to explore the effect of confidence level and of sample size. If you have done earlier activities in this topic with the class as a group, we recommend letting students work on their own through this one. The first several questions are routine. Questions (f) and (g) may seem odd, but they are meant to dispel the common student misconception that by reporting the half-width they are describing the entire interval. Question (h) is another example of a question where students' impressions may be misguided, but the important thing is that they correct their thinking if necessary when they answer (i) and (j). Students should find, of course, that the interval widens as the confidence level increases. We try to convince skeptical students that this result makes sense, for to be more confident one must allow more room for error.
Question (k) is the first to ask students to use their technology to calculate a confidence interval. If you are using Minitab, you may need to point out that students must first calculate what 22% of 493 is. Questions (l) and (m) have students investigate the effect of sample size on confidence intervals, where they should not be surprised to find that the interval gets narrower as the sample size increases.
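For instructors whose technology is a general-purpose language rather than a statistics package, the calculation can be sketched as follows in Python. The sketch assumes the usual interval formula for a proportion, p-hat plus or minus z* times sqrt(p-hat(1 - p-hat)/n); the confidence levels shown are our own choices, and the count is obtained by rounding 22% of 493 to the nearest whole number.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (lower, upper, margin) for p-hat +/- z* times the standard error."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

n = 493
successes = round(0.22 * n)  # 22% of 493 is 108.46, so about 108 people

for level in (0.90, 0.95, 0.99):
    lower, upper, margin = proportion_interval(successes, n, level)
    print(f"{level:.0%}: ({lower:.3f}, {upper:.3f}), margin of error {margin:.3f}")
```

The output shows the interval widening as the confidence level increases. Calling the function with four times the count and four times the sample size (the same sample proportion) cuts the margin of error in half, which is the sample size effect explored in (l) and (m).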
Activity 19-5 uses computer simulations to illuminate the interpretation of confidence intervals. Once again it's worth emphasizing that for the purposes of simulation, one has to assume a certain value for the population proportion that would never actually be known in practice. We have written a Minitab macro called "confsim.mtb" to perform the simulation; it relies on generating random data from a binomial distribution. We also have a Java applet available here for this activity. Students' answers to (b) should naturally be in the vicinity of 95%. The point of (c) is to help students see that the samples for which the interval fails to contain the parameter are precisely those whose sample statistic falls far from the parameter value. The answer to (d) being "no", students are to explain in (e) that the procedure generates an interval containing the population parameter 95% of the time in the long run. We encourage students to read carefully the comments on interpreting confidence intervals that follow (e). Since these are subtle ideas, you probably want to spend some time summarizing the points here.
Question (f) can be tricky because many students initially believe that increasing the sample size should produce a larger percentage of successful intervals. Of course since the confidence level remains 95%, they should still find that about 95% of the intervals succeed. Increasing the sample size does make the intervals more narrow, but the same percentage succeed in capturing the parameter value.
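The following Python sketch illustrates the same idea; it is not the "confsim.mtb" macro or the applet, only a rough stand-in. The population proportion of 0.4, the two sample sizes, and the 1000 repetitions are all assumptions that we chose for illustration.

```python
import math
import random

random.seed(19)  # fixed seed so the illustration is repeatable

Z_STAR = 1.96   # critical value for 95% confidence
THETA = 0.4     # assumed population proportion (known only because we simulate)

def simulate_intervals(n, reps=1000):
    """Return (fraction of intervals containing THETA, average interval width)."""
    hits = 0
    total_width = 0.0
    for _ in range(reps):
        # Simulate one sample of size n and count the successes.
        successes = sum(random.random() < THETA for _ in range(n))
        p_hat = successes / n
        margin = Z_STAR * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= THETA <= p_hat + margin:
            hits += 1
        total_width += 2 * margin
    return hits / reps, total_width / reps

coverage_100, width_100 = simulate_intervals(100)
coverage_400, width_400 = simulate_intervals(400)
print(f"n = 100: {coverage_100:.1%} of intervals succeed, average width {width_100:.3f}")
print(f"n = 400: {coverage_400:.1%} of intervals succeed, average width {width_400:.3f}")
```

Both percentages should land near 95%, while the intervals for the larger sample size are about half as wide.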
Activity 19-6 serves to remind students that data collection is crucial and that a confidence interval will be misleading if the sample was collected in a biased manner. You should let students know whether you permit them to use technology for the calculation in (a). The interval here gives extremely misleading information because the sampling method was so biased against Roosevelt.
The homework activities for this topic provide more experience with calculating, understanding, and applying confidence intervals. None of these activities tells students to use technology, so you should let them know if they are permitted or expected to do so. Activities 19-7 through 19-9 present real data and ask students to comment on sampling issues as well as to calculate confidence intervals. Many students are confused by question (i) of Activity 19-9, which is supposed to convince them that while a sample of July and August baseball games is probably not biased with regard to margin of victory, it would almost certainly be biased if the variable being studied were game-time temperature. Activity 19-10 asks students to work backwards and determine the sample proportion from knowing the interval. Activity 19-11 presents a rare case in which the population parameter is known. Activity 19-14 investigates the effect of sample size, while Activity 19-15 examines effect of confidence level. Activity 19-17 serves as another reminder to consider whether the sampling method is biased. We have often used parts of Activity 19-21 as exam questions.
This topic contains many substantial changes from the first edition. One is that it appears much earlier, directly following confidence intervals for a proportion. Our hope is that this placement helps students to recognize the similarities in the procedure for both situations. Another change is that the development of the procedure briefly mentions the case where the population standard deviation sigma is known. New datasets also appear with the revised and expanded activities.
The data collection in the Preliminaries raises some interesting issues. You may want to draw attention to the fact that the variables presented are quantitative and not categorical. Again we favor collecting the data anonymously for those students who consider their sleeping habits private. Converting from bedtimes and waketimes to sleeping times is a challenging exercise for some. You may want to record the times in minutes and let technology convert those times to hours later. Sample data collected on students about the "Preliminaries" questions may be found here.
A new use of technology in this topic is to construct confidence intervals based on the t-distribution for estimating a population mean.
Activity 20-1 reviews once again the distinction between quantitative and categorical variables and also between parameters and statistics. It is advisable to lead students through this activity as a group so that you can be together to learn about the t-distribution in the next activity. In question (g) students are supposed to realize that it makes sense to use the sample standard deviation s as an estimate of the population standard deviation sigma.
We recommend describing the t-distribution to the class and illustrating the use of the t-table with the class as a group in Activity 20-2. You should indicate especially how the t-table differs from its standard normal counterpart: it provides critical values rather than probabilities, the probabilities listed across the top are for areas to the right rather than to the left of the indicated value, and each member of the family of t-distributions has its own row of the table. In question (a) we tell students to just draw the same sketch they've always drawn for a standard normal curve and just be aware that the t-distribution is a bit more spread out.
In question (e) students should note that the t critical value is greater than the corresponding z critical value, appropriately reflecting the greater uncertainty introduced by approximating sigma with s. Be sure that all students can do the mechanics of the calculation in (f). Questions (g) and (h) speak to the importance of checking technical conditions before applying the t-procedure. Students tend to ignore the "or" in the condition about a large sample size or a normal population, so (g) is meant to suggest that while the distribution of these ages may well be skewed, the procedure is still valid because the sample size is moderately large. However, the point of (h) is partly that the procedure would not have been valid with a much smaller sample size. You might remind students of their simulation studies about the distribution of the sample mean, which revealed approximate normality even with skewed populations.
As students write their responses to (i), it's probably good to remind them that what they learned about proper, and improper, interpretations of confidence intervals for a proportion applies equally to CI's for a mean. The applet available here illustrates proper interpretation for a mean as well as for a proportion, so you might want to demonstrate that for students even though it is not asked about in the activity.
Activity 20-3 provides students with practice calculating these intervals and also asks them to explore effects of sample size and confidence level. They should again find that a larger sample size and a smaller confidence level produce a narrower interval (all else being equal). You'll want to be ready to tell students that with a sample size of 1199 and so 1198 degrees of freedom, they can use either the 500 or the infinite degrees of freedom line in the table without making much difference.
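To see how little the choice of table line matters, consider the following Python sketch. Only the sample size of 1199 comes from the activity; the sample mean and standard deviation are hypothetical, the 95% confidence level is our assumption, and the two critical values are the rounded entries that a typical t-table prints for 500 and for infinite degrees of freedom.

```python
import math

def t_interval(x_bar, s, n, t_star):
    """Confidence interval for a population mean: x-bar +/- t* times s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical summary statistics; only the sample size comes from the activity.
x_bar, s, n = 7.0, 1.5, 1199

# 95% critical values as printed (rounded) in a typical t-table.
T_STAR_500_DF = 1.965       # 500 degrees of freedom line
T_STAR_INFINITE_DF = 1.960  # infinite degrees of freedom line (same as z*)

print(t_interval(x_bar, s, n, T_STAR_500_DF))
print(t_interval(x_bar, s, n, T_STAR_INFINITE_DF))
```

The two intervals agree to three decimal places here, which supports telling students that either line is acceptable with a sample this large.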
Activity 20-4 aims to help students develop their intuition for the effects of sample size and sample standard deviation on confidence intervals for a population mean. Many students struggle with these ideas at first, so you might have to prod them to realize that sample 2 (with its smaller sample size and larger variability) would do the worst job of estimating the population mean and sample 3 (with its larger sample size and smaller variability) would do the best job.
Activity 20-5 asks students to use technology to calculate a confidence interval based on sample data collected about themselves. If you recorded the sleeping times in minutes earlier, you will want to have students convert them to hours (by dividing by 60) in (a). Notice that question (a) asks for a descriptive analysis of the data; you might want to draw students' attention to this as a reminder to start their analyses with graphical displays, numerical summaries, and verbal descriptions. In part (c) you may need to let students know how to get their technology to calculate the interval.
Questions (d) and (e) raise an important issue and address a common misperception. Some students tend to forget what the interval is estimating- in this case, the mean sleeping time of a student at the college the previous night. This interval estimates a population mean and not an individual's sleeping time, so there's no reason to suspect that 90% of the students should have a sleeping time that falls within the interval. We regard this as a much more serious misunderstanding of confidence intervals than some of the subtle shadings of interpretation, as this error reflects a lack of knowledge of what the interval estimates. Be prepared to correct students who claim that the small sample size or the non-normal distribution is what's causing the percentage in (d) to be far from 90%.
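A quick simulation can make this point vivid. The following Python sketch uses hypothetical sleeping times that we generate ourselves (not real class data) for 40 students, and it takes the 90% critical value for 39 degrees of freedom from a t-table; the sample size and the population values used to generate the data are our assumptions.

```python
import math
import random
import statistics

random.seed(20)  # fixed seed so the illustration is repeatable

# Hypothetical sleeping times (hours) for 40 students, not real class data.
sleep_hours = [random.gauss(7.0, 1.5) for _ in range(40)]

n = len(sleep_hours)
x_bar = statistics.mean(sleep_hours)
s = statistics.stdev(sleep_hours)

T_STAR = 1.685  # 90% critical value for 39 degrees of freedom, from a t-table
margin = T_STAR * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

# Fraction of individual students whose own sleeping time lies in the interval.
inside = sum(lower <= hours <= upper for hours in sleep_hours) / n

print(f"90% interval for the mean: ({lower:.2f}, {upper:.2f})")
print(f"fraction of individual sleeping times inside it: {inside:.0%}")
```

Even though these hypothetical data come from a normal population, only a small fraction of the individual sleeping times fall inside the interval, because the interval estimates the population mean and not where individuals fall.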
The homework activities for this topic provide students with opportunities to develop their ability to apply this confidence interval procedure and also to deepen their understanding of it. Students are asked specifically to use technology in Activities 20-9, 20-13, and 20-18. Calculations are to be done by hand in Activity 20-10. Activities 20-6, 20-14, and 20-19 do not involve calculations. You should make clear to students whether you allow them to use technology for the others, which typically involve a mix of both calculations and interpretations.
Activity 20-7 asks students to practice reading the t-table and to notice some patterns in their findings. Activities 20-9, 20-12, and 20-18 present populations as well as samples so that students can see if their interval succeeds. Activity 20-17 may seem out of place, but it serves to remind students about intervals for a proportion.
You might warn students of the challenges of this topic in your overview. We suggest that you also make clear how important the ideas presented in this topic are and how they provide the foundation for all tests of significance to be studied later. You might also tell them that tests of significance and confidence intervals are the two primary techniques of statistical inference, and you might point out that this topic returns to considering categorical variables and proportions.
Substantial changes from the first edition abound in this topic. The pieces of a significance test are developed more gradually and in the context of a realistic application. All of the activities in this topic are new.
As students answer the Preliminaries questions, you might tell them that they will be analyzing these outcomes more formally in the first activity. If you would like to read students an account of the "which tire" story, check the Chance News web site, in particular edition 5.04. Questions 5 and 6 are among our favorite in-class data collection questions, and students seem to enjoy them also. Student responses to question 6 for one group were: 10 left front, 23 right front, 5 left rear, and 9 right rear.
A new use of technology in this topic is for conducting a significance test about a population proportion.
Activity 21-1 introduces the reasoning and structure of tests of significance through a step-by-step development. Terminology and notation are introduced along the way. This activity definitely lends itself to working through with the group as a whole since it entails so many new terms and symbols. Questions (a)-(f) are fairly straightforward, but they ask students to remember the CLT from Topic 18 and how to do calculations with it. Question (g) is an important one, making the point that the parameter needs to be clearly described in words and that the symbol alone is not sufficient.
In presenting the "generic" null hypothesis, you might emphasize that a specific value, depending on what's being tested in a given situation, is substituted for theta_0. Similarly, when introducing the alternative hypothesis, be clear that it is to take one of the three forms listed there and that the choice depends on what's being tested. You might stress to students that the hypotheses are stated in questions (g)-(k) before the sample results are even considered.
Questions (l)-(n) are fairly obvious but important to the process. Question (o) was already answered as part of (c), and the sketch requested in (p) was already drawn in (c). You might highlight for students that questions (q) and (r) are doing nothing more than stepping them through a CLT calculation. In the form for the test statistic you may want to point out that the (null) hypothesized value of theta appears in the denominator, as opposed to the sample proportion that appears in the confidence interval formula. This is also true of the technical conditions, a difference between intervals and tests. When you describe the "generic" calculation of the p-value, make very clear that the three forms (a), (b), and (c) are meant to be matched up with the corresponding forms of the alternative hypothesis.
We try to convince students that if they really understand what a p-value means, then they understand the essentials of the testing process. You might want to caution students that the p-value is quite different than p-hat. We also try to impress upon students that the adjectives (little, some, moderate, strong) in the table following question (s) are just guidelines. We do emphasize the "statistically significant" phrase more than some others, and it does reappear often in the book. Thus, you might want to note for students that "rejecting the null hypothesis" at a certain level is virtually synonymous with the "sample result being statistically significant" at that level.
Activity 21-2 asks students to work through an example that involves two significance tests. We think it's advisable to let students work on their own through this one, but recognize that many will struggle and so be especially ready to point them in the right direction. Even though (f) asks students to use technology, you may want to ask them to do this one by hand for the practice and then turn to technology for (i).
Student-collected data on the "which tire" question are analyzed in Activity 21-3. Question (b) is difficult for many students, in part because the variable here ("which tire would you say?") is categorical but not binary. It becomes binary ("right front or not") by the statement of the conjecture, but some students have trouble with this. The instructions in (f) do not specify whether technology should or can be used, so that is up to you.
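As an illustration of the calculation, here is a Python sketch applied to the responses reported above for one group (10 left front, 23 right front, 5 left rear, 9 right rear). It assumes that the conjecture is that more than one-quarter of students would say "right front," so the alternative is one-sided; your own class data and the exact wording in the activity should take precedence. The function name and its argument names are ours.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, theta_0, alternative):
    """Return (z, p_value) for a test of H0: theta = theta_0.

    `alternative` is "less", "greater", or "two-sided".
    """
    p_hat = successes / n
    # The standard deviation uses the hypothesized value theta_0, not p-hat.
    z = (p_hat - theta_0) / math.sqrt(theta_0 * (1 - theta_0) / n)
    upper_tail = 1 - NormalDist().cdf(z)
    if alternative == "greater":
        return z, upper_tail
    if alternative == "less":
        return z, 1 - upper_tail
    if alternative == "two-sided":
        return z, 2 * min(upper_tail, 1 - upper_tail)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")

# Responses reported for one group: left front, right front, left rear, right rear.
counts = {"left front": 10, "right front": 23, "left rear": 5, "right rear": 9}
n = sum(counts.values())            # 47 students
successes = counts["right front"]   # binary version: right front or not

z, p_value = one_proportion_z_test(successes, n, 0.25, "greater")
print(f"z = {z:.2f}, p-value = {p_value:.5f}")
```

The step that turns the four-category variable into "right front or not" is the line that picks out the right front count, which mirrors the reasoning that some students find difficult in (b).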
Activity 21-4 leads students to investigate the effect of sample size on a test of significance. They should discover that a sample proportion of 30% may or may not be statistically significantly greater than one-quarter, depending on the sample size. We do think it's helpful for students to use technology in (b) and (c) to free them from the computational burden so that they can focus on the statistical issue of sample size. We use this activity as the backdrop in our wrap-up of this topic to assert that statistical significance addresses only whether a sample result is unlikely to have occurred by chance. A difference may or may not be important, which is a separate issue from statistical significance. This idea is explored in more depth in Topic 23.
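The sample size effect can be previewed with a short calculation. In the following Python sketch the sample proportion of .30 and the hypothesized value of one-quarter come from the activity as described above, but the sample sizes are hypothetical choices of ours and may differ from those in the book.

```python
import math
from statistics import NormalDist

THETA_0 = 0.25  # hypothesized proportion (one-quarter)
P_HAT = 0.30    # observed sample proportion, held fixed

def p_value_greater(n):
    """One-sided p-value for H0: theta = 0.25 vs Ha: theta > 0.25."""
    z = (P_HAT - THETA_0) / math.sqrt(THETA_0 * (1 - THETA_0) / n)
    return 1 - NormalDist().cdf(z)

# Hypothetical sample sizes; the activity's own values may differ.
for n in (50, 100, 500, 1000):
    print(n, round(p_value_greater(n), 4))
```

The same sample proportion yields a p-value above .10 with 100 observations and below .01 with 500, so whether the result is statistically significant depends on the sample size.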
The homework activities offer students opportunities to apply and strengthen their knowledge of significance tests. One thing to be aware of is that many activities (Activity 21-6 is illustrative) do not provide students with a significance level alpha; we much prefer that they draw conclusions such as "there is moderate evidence against the null hypothesis and to conclude that ..." As always, we insist that students' conclusions be related to the context of the data and problem: conclusions such as "reject H0" are unacceptable even when a specific significance level has been asked about. Activities 21-5, 21-6, 21-8, 21-9, 21-11, and 21-12 involve two-sided alternatives, while Activities 21-10, 21-13, 21-14, 21-15, and 21-16 call for one-sided tests. No activities specifically call for students to do calculations by hand or with technology, so you should make your expectations clear.
The changes in this topic from the first edition are again considerable. The first two activities, leading students to the t-test for a population mean, are new and present a more gradual and natural progression in a genuine context. The last two in-class activities are retained from the first edition.
The Preliminaries section is very brief, but you may want to point out to students that we are again considering quantitative variables.
A new use of technology in this topic is for conducting a t-test about a population mean.
Activity 22-1 leads students to the t-test. You may want to work through this activity with the class as a group so that you can present the details of the procedure while everyone is on the same page. Questions (a) and (b) lead students to state the hypotheses, (c) and (d) remind them to begin with a visual and descriptive analysis, and (e) and (f) prod students to recall the purpose and need for conducting a test in the first place. In describing the test procedure presented at the end of this activity, you might highlight that the form of the p-value again depends on the alternative hypothesis. You might also remind students that they encountered the t-distribution and t-table earlier and warn them that reading the table for p-values is a bit different than reading it for critical values. It can't hurt to point out yet again how similar this procedure is to its analog for a population proportion. You might accentuate this by showing that the test statistic still has the form: sample statistic minus hypothesized value divided by standard error of sample statistic.
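One way to put the two test statistics side by side on the board is sketched below. The symbols for the sample mean, sample standard deviation, sample size, hypothesized mean, and sample proportion are the conventional ones and are our assumption here; theta_0 denotes the hypothesized proportion.

```latex
\[
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
\qquad\text{and}\qquad
z = \frac{\hat{p} - \theta_0}{\sqrt{\theta_0(1-\theta_0)/n}}
\]
```

In both cases the numerator is the sample statistic minus the hypothesized value, and the denominator is the standard error of the sample statistic.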
Activity 22-2 gives students practice carrying out the test that was motivated in the first activity. In question (d) be prepared to help students realize that from the table they can only report that the p-value lies between two values (.001 and .005 in this case). We tell students that this is enough information about the p-value to make an informed decision, and we also point out that technology can report the p-value more exactly but that its output should be consistent with their finding from the table. Although the activity does not ask for it, you might want to have students conduct the test using technology as well, so that they can confirm their test statistic calculation and see that the p-value is indeed consistent with their reading of the table.
Activity 22-3 asks students again to investigate effects of sample size and sample variability on the test results. First, though, in question (a), students apply the test procedure by hand to sleeping time data collected earlier. If you have worked through the first two activities with the class as a group, we strongly suggest turning students loose in this activity. You should be prepared in (a) to help students to use symmetry to find p-values from the t-table where they are looking for the probability of being less than a negative test statistic.
Questions (c) and (d) ask for students to use their intuition. By this point most students should be able to reason through these questions correctly, but again the important thing is that they go on to correct their errors if they have mistaken impressions here. Question (e) is the first to ask students to use technology to conduct a t-test, so be prepared to explain how to do that.
Activity 22-4 introduces students to the matched pairs design, asking them to analyze the differences in marriage ages for the couples. By forcing students to calculate the differences by hand, question (a) tries to make students appreciate the paired nature of the data. The remaining questions are fairly straightforward, although (e) does remind students that whether or not the interval contains zero is an important consideration when comparing two groups. Those using newer versions of Minitab will want to be careful in (e) because the alternative must be set to be two-sided in order to produce a conventional confidence interval. You might tell students that in Topic 25 they will be asked to re-analyze these data as if they had been from independent samples, in an effort to help them appreciate the information gained from the matched pairs design.
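For instructors who want to check students' hand calculations, the paired analysis can be sketched as follows. The ages are hypothetical (they are not the marriage data from the book), and the variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical ages at marriage for five couples (not the book's data).
husband = [25, 30, 41, 27, 33]
wife = [22, 31, 36, 25, 30]

# The paired analysis works with one difference per couple.
diffs = [h - w for h, w in zip(husband, wife)]

n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)                        # sample standard deviation of the differences
t_stat = (d_bar - 0) / (s_d / sqrt(n))    # test of H0: mean difference = 0

print(diffs, round(d_bar, 2), round(s_d, 2), round(t_stat, 2))
```

Once the differences are computed, everything else is the one-sample t procedure from earlier in the topic applied to a single column of numbers.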
The homework activities for this topic afford students the opportunity to hone their understanding of t-tests involving a population mean. Activities 22-5 and 22-6 are similar to Activity 20-7 in giving students practice reading a t-table. Activity 22-7 follows up on Activity 22-3 by asking students to conduct two-sided tests and comment on how the p-values compare to one-sided ones. Activities 22-8, 22-9, and 22-17 present only summary statistics and not raw data, but the rest do have raw data available for students to analyze. Matched pairs data are presented in Activities 22-10 through 22-12, and Activity 22-13 asks students for a test about a proportion rather than a mean. Activities 22-15 and 22-16 involve a rare case where the population parameter is known. Only Activity 22-16 specifically requires use of technology, so you should let students know if they are allowed or expected to use technology on other activities.
We encourage you to share these goals of the topic with students before they begin working. Whereas we have encouraged you to do some activities with the class as a group lately, here we strongly advise turning students loose to work on their own (we mean collaboratively, of course) on this entire topic. There are no new procedures to be learned here, so students should be ready to conduct the analyses requested in the activities. We do pay extra attention to making sure that students see and understand the "morals" of these activities. We do this partly by checking individual work often and partly by conducting a thorough class discussion once most students have finished the topic. Throughout this topic we advocate letting technology do the calculations so that students can focus on the grander principles.
If you are pressed for time, you could even assign this topic as homework for students and then summarize the morals in a class discussion taking only a fraction of a class period. We do not recommend this, however, partly because we worry that it could cause students to view these issues as less important than others to which considerable class time is devoted. My view is actually the opposite: that the issues raised in this topic are in many ways more important than the details of the procedures presented earlier.
Substantial changes from the first edition include new and updated datasets and activities. More importantly, the activity (23-4) that leads students to consider the concept of power is new. While some of the other activities have been retained from the first edition, they have been gathered together here from the different topics in which they originally appeared.
You might tell students that the first two and second two of the Preliminaries questions concern issues that they will explore in this activity. The data on cellular phones and credit cards are mentioned only in homework activities, so if you do not intend to assign those activities you can save time by having students skip the data collection. Sample data collected on students about the "Preliminaries" questions may be found here.
No new uses of technology appear here, although students will use technology extensively for constructing confidence intervals and performing tests of significance.
Activity 23-1 aims to lead students to see the close connection between tests and intervals. Students should find that if the 95% confidence interval contains a certain value, then a two-sided test involving that hypothesized value will fail to reject the null hypothesis at the .05 level. Similarly, values not contained in the confidence interval are rejected as plausible by the test.
This duality does not quite hold exactly for inference about a proportion, because the interval uses the sample proportion whereas the test uses the hypothesized proportion in its calculations. This difference is typically minor, though, and it does not affect the calculations asked for here. We recommend not mentioning this point to students, except possibly for better students who you believe will benefit from the knowledge.
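For your own reference, here is a small sketch of a case where the test and the interval disagree. The counts (16 successes in 100 trials, hypothesized proportion .10) are hypothetical and were chosen deliberately to sit near the boundary; they do not come from any activity in the book.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical sample: 16 successes in 100 trials, testing theta = 0.10.
x, n, theta0 = 16, 100, 0.10
p_hat = x / n
std_normal = NormalDist()

# Two-sided test: the standard error uses the hypothesized proportion.
z = (p_hat - theta0) / sqrt(theta0 * (1 - theta0) / n)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# 95% confidence interval: the standard error uses the sample proportion.
z_star = std_normal.inv_cdf(0.975)
margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Here the test rejects at the .05 level while the 95% interval still contains .10. Such disagreements arise only in borderline cases like this one.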
Activity 23-2 tries to help students see that fixed significance levels should not be treated as sacred. Questions (a) and (b) have very similar sample results but have p-values on opposite sides of .05. Questions (b) and (c) have very different sample results but have p-values on the same side of .05. Many students obtain the correct results but do not trust themselves enough to answer (d) and (e) correctly. The moral is supposed to be that p-values are much more informative than simple statements of "significance" or not.
Activity 23-3 guides students to discover the important distinction between practical and statistical significance. The p-value of the test establishes that the sample proportion is indeed statistically significantly greater than .25, but the confidence interval reveals that the difference is extremely modest in practical terms. We try to convince students that the test and interval are in complete agreement here: as in Activity 23-1 the test rejects the value which is not in the confidence interval. More importantly, though, the confidence interval reveals more information than does the test of significance. Students should be encouraged to accompany tests with confidence intervals.
Activity 23-4 introduces students to the concept of power, defined as the probability of rejecting a null hypothesis that is actually false. The approach here is to use simulation to develop students' intuition about power, not to lead students through detailed calculations of power. Students should find in (c) that there is a lot of overlap in the distribution of hits in 30 at-bats between a .250 hitter and a .333 hitter, revealing that it is not easy to conclude that a .333 hitter is better than a .250 hitter. In (g) there should be considerably less overlap with a sample size of 100, indicating a much more powerful test with this larger sample size. Questions (h)-(j) try to lead students to see that the significance level and the alternative value of the parameter, in addition to sample size, also affect the power of a test.
We also tell students that this issue of power is why we do not allow them to use the phrase "accept the null hypothesis." Because some tests have very low power, failing to reject H0 does not mean that it should be accepted as true, for it may well be false but the test may have little chance of establishing that it is false.
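If you want numbers to compare against the class's simulation results in Activity 23-4, the sketch below computes power exactly from the binomial distribution. Note that this is a swapped-in technique: the activity itself uses simulation. The .05 significance level and the rule of rejecting for a large number of hits are our assumptions.

```python
from math import comb

def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def power(n, p_null=0.250, p_alt=1/3, alpha=0.05):
    """Critical value and power of the one-sided test that rejects for many hits."""
    # Smallest number of hits whose upper-tail probability under the
    # null hypothesis is at most alpha.
    k_crit = next(k for k in range(n + 2) if upper_tail(k, n, p_null) <= alpha)
    return k_crit, upper_tail(k_crit, n, p_alt)

for at_bats in (30, 100):
    k_crit, pw = power(at_bats)
    print(f"{at_bats} at-bats: reject if hits >= {k_crit}, power = {pw:.2f}")
```

Calling power(30, alpha=0.10) or power(30, p_alt=0.400) shows how the significance level and the alternative value also affect power.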
Activity 23-5 shows students that before collecting sample data, one can determine how large a sample size is needed to obtain a desired level of accuracy and confidence. Students make three common mistakes in (c): they try to use the entire confidence interval formula and not just the half-width piece, they make algebraic errors in manipulating the expression, and they permit round-off errors in intermediate calculations to affect the final result considerably. In (b) and (d) we try to stress the intuitive nature of these questions: to be more accurate you need a larger sample and to be more confident you need a larger sample. We advocate resisting the temptation to do the algebra for the students and giving them the general formula for determining the sample size; we like to think that students can figure that out from understanding the expression for confidence intervals in the first place. We insist that students round up to produce an integer for their answers here, since one cannot interview a fraction of a person. Question (f) tries to illustrate that the size of the population (as opposed to the size of the sample) plays no role. In question (h) we look for students to respond that one must interview the entire population to achieve perfect accuracy with 100% confidence.
Although it does not arise in this activity, one issue that many students struggle with is that they must supply an initial estimate (guess) for the sample proportion in order to calculate the necessary sample size. You might mention that using .5 as the estimate is the most conservative approach in that it produces the largest possible sample size.
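For checking students' answers quickly, the sample size calculation can be sketched as below. It solves the half-width expression for n and rounds up; the function name, the .03 margin, and the default guess of .5 are illustrative assumptions rather than the values used in the activity.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin, rounded up to an integer."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))

print(sample_size(0.03))                   # conservative guess of .5
print(sample_size(0.03, p_guess=0.2))      # a guess away from .5 needs a smaller n
print(sample_size(0.03, confidence=0.99))  # more confidence needs a larger n
print(sample_size(0.015))                  # more accuracy needs a larger n
```

Because the critical value is carried at full precision, this also gives a way to see how much intermediate rounding changed a student's answer.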
Activity 23-6 uses very contrived hypothetical data to make a specific point. The data are rigged so that despite varying distributions of withdrawal amounts, each machine has exactly the same sample size, sample mean, and sample standard deviation. They necessarily produce identical confidence intervals for the population mean. The moral is that the mean summarizes only one aspect of a distribution and that students should not forget to perform exploratory analyses of data.
Activity 23-7 provides still one more reminder to students to think about the data collection process before applying an inference procedure. Most students do a good job of explaining in (c) that the confidence interval produces a very poor estimate because the sampling procedure was horribly biased. Some students mistakenly argue that this is just one of those 5% of all intervals that fail to include the value of the population parameter. It is very important to correct this impression and to help them realize that even if the sample is gathered in a random manner, 5% of the intervals will fail.
Question (d) is much more challenging for students than question (c). The point here is that if the population one cares about is the 1999 U.S. Senate, then one knows precisely that 9/100 is the exact proportion of women in the population. In other words, there is no inference to be made since one has studied and described the entire population. This activity is meant to caution students that statistical inference is not valid for all sets of data and depends on having taken a sample (preferably random, of course) from a population.
An exceptionally large number of homework activities are included in this topic. This is appropriate and necessary because students examine so many different and important ideas in this topic. With so many options (and such limited time!), you will want to choose even more wisely than usual to give your students additional experience with these ideas without overburdening them. You may also want to be sure to assign some activities involving proportions and some concerning means. Technology is required in Activities 23-25 and 23-26, but we encourage students to feel free to use it in all other homework activities for this topic as well.
The duality between tests and intervals is relevant in Activities 23-8, 23-16, and 23-18. The folly of adhering strictly to fixed significance levels is studied in Activity 23-15. The distinction between statistical and practical significance comes up in Activities 23-9, 23-10, 23-19, and 23-28. Students investigate the issue of power in Activities 23-13, 23-14, 23-17, 23-20, 23-25, and 23-26. Sample size determination is a goal in Activities 23-23 and 23-24. The importance of examining the entire distribution is emphasized in Activity 23-17, and the question of whether inference is applicable in the first place arises in Activities 23-11 and 23-12. In addition, Activity 23-21 leads students to examine the relationship between one-sided and two-sided tests.
This topic concerns comparing two population proportions between two groups. This is equivalent, of course, to talking about association between two binary categorical variables. Some students struggle with the "comparison" idea because they still do not fully understand what a variable is. For instance, they might regard comparing the proportion of people satisfied with their appearance and the proportion not satisfied as two variables rather than one, so they would fail to see that another variable (such as gender) is needed to formulate a genuine comparison. You might want to try to help students appreciate this point as they answer questions 4 (one variable) and 5 (comparison) and also questions 6 (one variable) and 7 (comparison) of the Preliminaries.
In this topic we again try to present the idea intuitively through simulation before moving on to a formal analysis. Students often have a great deal of trouble seeing the connection between the results of the simulation and the conclusion of the significance test. They fail to see the complementary nature of the two analyses and instead regard them as different and unrelated things. We suggest that you try to drive home the point about this connection by stressing that the purpose of the simulation is to make clear what the p-value of the test actually means.
You need to bring cards with you to class in order to conduct a hands-on simulation of randomization. You could use index cards marked as "success" or "failure," but playing cards are easier to shuffle. It's probably best to have packets arranged ahead of time with 11 of one color (representing "success") and 12 of another color in each packet.
Substantial changes from the first edition for this topic include a real context for the introductory activity and also for the activity that investigates the effect of sample size. A larger organizational change is that this edition combines tests and intervals for comparing proportions into one topic.
A new use for technology in this topic is to conduct inference for comparing two proportions.
Activity 24-1 asks students to conduct a simulation to reinforce their understanding of p-value, and therefore of the entire reasoning process underlying significance tests, in the setting of comparing proportions between two groups. Questions (a)-(c) again set the stage for needing an inference procedure to judge whether the observed results are unusual to occur just by chance. One very nice thing about the context of this psychology study is that it used randomization to assign subjects to group A or group B, so the simulation that students conduct simply repeats that randomization under the assumption that the observer has no effect and so the 11 "winners" and 12 "losers" are just randomly assigned to one group or the other. One unfortunate thing about this context is that the marginal totals are the same for both margins, so some students get confused about the 11 winners vs. 12 losers vs. the 12 in group A and 11 in group B.
To compile the class results in (f) we draw the scale on the board and ask students to come forward and place five dots on the plot corresponding to their five simulated sample results. Question (i) is a key one, as it asks how many of the simulated results produced sample results as extreme as in the psychology researchers' actual data. You will probably need to proceed slowly and carefully through questions (j) and (k), helping students to see that since their shuffling and dealing essentially assumed no effect of the observer, we will have evidence in support of the researchers' conjecture if this random process rarely leads to results as extreme as they actually found. The actual p-value here is about .03 (as determined from the hypergeometric probability distribution for Fisher's exact test), so you should see some but not many simulated samples with 3 or fewer successes in group A.
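The p-value of about .03 can be reproduced with a short sketch such as the following, which computes the hypergeometric probability directly and also mimics the card shuffling. The counts come from the activity as described above; the number of repetitions, the random seed, and the variable names are our own choices.

```python
import random
from math import comb

# From the activity: 23 subjects, 11 "winners" and 12 "losers";
# 12 subjects in group A, with 3 winners observed in group A.
winners, losers, group_a_size, observed = 11, 12, 12, 3

# Exact probability that `observed` or fewer winners land in group A.
total_ways = comb(winners + losers, group_a_size)
exact_p = sum(
    comb(winners, k) * comb(losers, group_a_size - k)
    for k in range(observed + 1)
) / total_ways

# The card simulation: shuffle 23 cards, deal 12 to group A, count the winners.
rng = random.Random(1)
deck = [1] * winners + [0] * losers
reps = 10_000
hits = 0
for _ in range(reps):
    rng.shuffle(deck)
    if sum(deck[:group_a_size]) <= observed:
        hits += 1
simulated_p = hits / reps

print(f"exact p-value = {exact_p:.4f}, simulated = {simulated_p:.4f}")
```

With only five shuffles per student, a class's combined dotplot will be much noisier than this, which is worth keeping in mind when compiling results on the board.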
We recommend leading students through the information about the significance test presented following this activity. We especially try to highlight the connection between the simulation just conducted and the test procedure, arguing that the test procedure eliminates the need for using simulation to address the issue of how likely it is that such extreme sample results would occur by chance. One item worth drawing particular attention to is the "combined" sample proportion in the denominator of the test statistic. You might tell students that they do need to combine the two groups to calculate this and that merely taking the average of the two groups' proportions only works if their sample sizes are equal. Also note that we have opted for the theta_1 = theta_2 form of the null hypothesis, but you might prefer to use the equivalent form theta_1 - theta_2 = 0.
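To see how far the combined proportion can drift from the simple average, you might experiment with a sketch like this one. The counts are hypothetical, with deliberately unequal sample sizes, and the function name is ours.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Combined proportion, z statistic, and two-sided p-value."""
    p1, p2 = x1 / n1, x2 / n2
    combined = (x1 + x2) / (n1 + n2)      # pool the successes, not the proportions
    se = sqrt(combined * (1 - combined) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return combined, z, 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical counts with very unequal sample sizes.
x1, n1, x2, n2 = 30, 40, 100, 200
combined, z, p_value = two_proportion_z(x1, n1, x2, n2)
average = (x1 / n1 + x2 / n2) / 2

print(f"combined = {combined:.4f}, simple average = {average:.4f}")
print(f"z = {z:.2f}, two-sided p-value = {p_value:.4f}")
```

Rerunning with equal sample sizes, say two_proportion_z(30, 40, 20, 40), gives a combined proportion that matches the simple average.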
Activity 24-2 asks students to perform a formal test of significance on experimental data that they have encountered before. We recommend that you let students work on their own through this and the following activities. This activity provides another good opportunity to emphasize the distinction between parameter and statistic, to review the notational differences between the two, and to encourage beginning with descriptive analyses before proceeding to inferential ones. In question (c) some students will be tempted to average the two sample proportions rather than to compute the combined sample proportion; while this distinction is negligible here it can be important when the sample sizes differ markedly. Many students are baffled by question (f), which merely asks them to report the p-value of the test and recognize what it means.
This activity also provides a natural point to introduce confidence intervals for estimating the difference between two population proportions. Since the conclusion in this study is that the sample data provide overwhelming evidence that the population proportions differ, the logical next goal is to estimate the magnitude of that difference. While the calculation in (h) is fairly routine, it takes some students quite a while to punch that through their calculators. Also be aware that some students will need help understanding the importance of the fact that the interval includes only negative values. The key, we think, is to stress what the interval estimates: the difference in population proportions.
You might want to ask students to consider how the interval would have differed if they had estimated theta_plac - theta_AZT to see if they realize that the interval would have covered positive values of the same magnitude. (This point arises in homework Activity 24-13.) You might also want to point out that the interval includes zero precisely when the test (with significance level corresponding to confidence level) fails to reject that the proportions are equal. With better students you might admit that the previous sentence is not completely true because the test uses the combined sample proportion and the interval does not.
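The effect of reversing the order of subtraction can be illustrated with a sketch like the following. The counts are hypothetical (they are not the AZT study's data), and the interval uses the unpooled standard error, which is our assumption about the formula intended.

```python
from math import sqrt
from statistics import NormalDist

def diff_interval(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for theta_1 - theta_2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - margin, p1 - p2 + margin

# Hypothetical (successes, sample size) for two groups.
group_1 = (12, 150)
group_2 = (36, 150)

lo, hi = diff_interval(*group_1, *group_2)
lo_rev, hi_rev = diff_interval(*group_2, *group_1)
print(f"group 1 - group 2: ({lo:.3f}, {hi:.3f})")
print(f"group 2 - group 1: ({lo_rev:.3f}, {hi_rev:.3f})")
```

The second interval is the mirror image of the first: the same magnitudes with the signs reversed.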
Activity 24-3 once again has students investigate the effects of sample sizes on tests of significance. Students are to discover that the distinction between 71% and 81% satisfaction rates in a sample may or may not be statistically significant, depending on the sample size. This difference of 10 percentage points becomes more and more significant as the sample sizes increase. In (b) and (c) we have in mind that sample sizes of 10 or so in each group would not be very conclusive, while sample sizes of 1000 or so in each group would. While students could perform these tests by hand, we strongly recommend the use of technology to free them to concentrate on the principle involving sample sizes at work here. Minitab users should be sure to click on "use pooled estimate" if you want results to match the formulas presented in the book.
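A quick sketch of the pattern students should find in Activity 24-3 appears below. It treats 71% and 81% as exact sample proportions with equal group sizes, which is a simplification (a group of 10 cannot produce exactly 71%), and the group sizes shown are our illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(p1, p2, n):
    """Two-sided p-value for equal group sizes n, using the pooled proportion."""
    pooled = (p1 + p2) / 2                # equal sizes, so averaging is valid here
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * 2 / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (10, 100, 1000):
    print(f"n = {n:4d} per group   p-value = {two_sided_p(0.71, 0.81, n):.4f}")
```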
Activity 24-4 is another of our favorites in that it forces students to reconsider a lesson learned much earlier in the course. If you did not assign homework Activity 7-20 earlier, we recommend having students work through that before they proceed to this activity. The significance test in (a) reveals a highly significant difference in sample proportions of acceptance, but the explanation is not discrimination but rather Simpson's paradox. For whatever reason, men tended to apply to the easier programs to get into, and women tended to apply to the tougher programs. This activity again makes the point that the design of a study (in this case an observational study) is a very important consideration when interpreting data.
The homework activities for this topic provide many opportunities for students to develop and apply their knowledge of these procedures. Activity 24-5 presents the results of another simulation study to assess students' understanding of how simulation analyses relate to tests. Activity 24-6 follows up on Activity 24-3 by asking for confidence intervals to accompany the tests performed earlier, and Activity 24-7 tries to make even more clear the phenomenon that large sample sizes can render even small differences as statistically significant. Activities 24-8 and 24-9 ask students to analyze data from a large study of alcohol use on campus; the first activity steps them through a detailed analysis and the second is left quite open-ended. Activity 24-10 includes a reminder to consider the design of the study. Some students complain about Activity 24-11 because it asks for a one-sided test but the sample data actually fall in the opposite direction; we like to think that this reinforces the principle that hypotheses are to be specified before seeing the data, and it gives them experience with a p-value greater than .5. Activity 24-19 confuses some students, as they have to be very careful in defining the variable so that appropriate outcomes are compared. Activities 24-22 and 24-23 call for continued examinations of sample size effects.
Substantial changes in this edition for this topic include new datasets and a real application to motivate and illustrate the test procedure.
A new use for technology in this topic is to perform two-sample (unpooled) t-tests and their accompanying confidence intervals.
Activity 25-1 leads students to consider the need for the two-sample t-test, presents it, and asks students to conduct it and interpret the results. Since summary statistics are presented and students are told that Grisham's 64-word sentence is the only outlier, they should be able to quickly construct modified boxplots by hand in (a). In (b) they should comment that Grisham's sentence lengths tend to be longer and more variable than Moore's. Question (c) aims to remind students about the need for a significance test to assess how likely such a difference in sample means would be if in fact the population mean sentence lengths were equal.
Since students have seen many inference procedures at this point, you should not need to say much about the details given following (c). Questions (d)-(f) lead students through the test. You may need to remind some students how to find the two-sided p-value required in (f). Question (g) aims to help students recognize the proper interpretation of a p-value; you might suggest that confused students refer back to the definition of p-value in Activity 21-1.
Question (j) is important for keeping students in the habit of checking technical conditions. Some students will need a reminder that the sample sizes need not be large if the populations have normal distributions. We also try to stress that the common "30" cut-off is just a rule-of-thumb. In this case one sample size exceeds 30 and the other is very close, and while the sample data are skewed slightly to the right, there is little reason to doubt the validity of the t-test here.
Even though the activity does not ask for it, you may want to have students conduct the test using technology as a way to check their work. The activity also does not ask for a confidence interval to estimate the difference in mean sentence lengths, but if you have time you might ask students for that as well.
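As a check on students' hand calculations in Activity 25-1, the unpooled two-sample t statistic can be computed from summary statistics as sketched below. The summary values are hypothetical, not the sentence-length data, and the Welch degrees of freedom shown are what software commonly reports; a table-based procedure may use a different, more conservative convention.

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch's approximate degrees of freedom."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Hypothetical summary statistics (not the data from the activity).
t, df = two_sample_t(mean1=20.0, sd1=12.0, n1=30, mean2=15.0, sd2=8.0, n2=32)
print(f"t = {t:.2f}, Welch df = {df:.1f}")
```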
Activity 25-2 tries to develop students' intuitions about the roles played by sample sizes, sample means, and sample standard deviations in comparing two means. It also introduces them to using technology with these procedures, so you'll want to make students aware of the details for doing that. In summarizing questions (d)-(g), you might want to point out the duality between the test and interval results here. In questions (h)-(j) we intend for students to spot that Barb's sample means differ more than Alex's, that Carl's times are less variable than Alex's, and that Donna has larger samples than Alex. Students should explain in (j) that these factors account for the significant differences found for the three commuters other than Alex.
Activity 25-3 asks students to conduct t-tests related to a genuine study and to review the role of randomization in controlled experiments. One unfortunate aspect of the context here is that we do not have the raw data and so can not ask for an exploratory analysis. Questions (a)-(e) lead students to conclude that randomization achieved its goal of producing two groups that were very similar with regard to television viewing before the treatment was imposed. Questions (f)-(h) reveal fairly strong evidence that the two groups differ with regard to television viewing after the treatment was imposed. Again you may want to ask students to produce a confidence interval for the magnitude of the difference.
The homework activities for this topic ask students to deepen their understanding of these ideas and techniques and also to apply them to a variety of datasets. Activities 25-4, 25-6, and 25-7 require no calculations, asking students to reason through various circumstances. Activities 25-8, 25-9, 25-10, and 25-25 provide only summary statistics and so are amenable to analyzing without use of technology. Raw data are available for the other activities, enabling students to perform exploratory as well as inferential analyses. Many of these activities build on analyses from earlier in the book.
Activity 25-5 continues the analysis of Activity 25-1 after removing the outlier. Activity 25-22 invites students to perform an "independent samples" analysis on matched pairs data to see the advantage of the pairing. Activity 25-23 continues students' analysis of the hypothetical data for which summary statistics are identical among three groups despite very different distributions. Activity 25-24 has a similar moral about examining distributions. Activity 25-25 pertains to the distinction between statistical and practical significance.
The primary goal of this topic is to introduce students to chi-square tests of independence in two-way tables. We also hope that the topic serves to reinforce students' developing understanding of tests and p-values. New uses of technology in this topic are to conduct chi-square tests and to use simulation to approximate the sampling distribution of the chi-square statistic.
The Preliminaries for this topic ask about some of the contexts for data that appear in the topic, rather than asking about statistical issues. The fourth and fifth questions can be surprisingly difficult for students, though, as they continue to struggle with understanding what variables are and remembering what association means. Accordingly, these questions can be good to discuss with the class as a whole to help to refocus their attention on the issues of this topic.
Questions (a) and (b) of Activity 26-1 remind students about how to read and interpret information presented in two-way tables. Questions (c)-(e) motivate the formula for calculating expected counts ("expected" under the null hypothesis of no association between the two variables) by asking students to apply the overall proportion of agreement to each of the three political groups. Question (f) then asks students to confirm that the usual formula produces the same expected counts as (c)-(e). You might want to be prepared to help students understand the notation in the test statistic formula before (h). Question (i) simply tries to force students to think about how the test statistic works, that it produces large values when there are strong deviations from the null hypothesis of no association.
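The expected-count reasoning and the test statistic can be sketched as follows. The table is hypothetical (it is not the survey data from the activity); the sketch shows that applying the overall row proportion to each column total agrees with the usual row total times column total divided by grand total formula.

```python
# Hypothetical two-way table of counts (rows: opinion, columns: political group).
observed = [
    [30, 40, 20],
    [20, 60, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Usual formula: expected count = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Same thing by applying the overall first-row proportion to each group.
overall = row_totals[0] / grand_total
by_proportion = [overall * c for c in col_totals]

# Each cell contributes (observed - expected)^2 / expected to the statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(f"chi-square = {chi_square:.3f} on {df} degrees of freedom")
```

Printing the contributions cell by cell also shows which cells drive the statistic, the idea that Activity 26-3 develops.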
Activity 26-2 provides an opportunity for students to conduct a chi-square test on a 3x3 table. Perhaps surprisingly, the data reveal little or no evidence to believe that the three political groups differ with respect to their opinions about spending on the space program. If you worked through Activity 26-1 with the class as a whole, we recommend letting them work through this activity on their own.
Activity 26-3 aims for two achievements: having students use technology to conduct a chi-square test and showing them that one can identify the type of association by concentrating on the cells that contribute the most to the calculation of the test statistic.
Leading students to recognize the duality between the two-proportion z-test and the chi-square test for a 2x2 table is the goal of Activity 26-4. You might want to emphasize that this duality holds as long as the z-test is two-sided and that no one-sided analog exists for the chi-square test. This is another activity illustrating the importance of technology for freeing students to focus on the conceptual, statistical issues.
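The duality can be verified numerically with a sketch like this one, which runs both procedures on the same hypothetical 2x2 table. The counts and variable names are ours; the z-test uses the pooled proportion.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical 2x2 table: rows are groups, columns are (success, failure).
table = [[30, 20], [18, 32]]

# Two-proportion z-test with the pooled proportion.
(x1, f1), (x2, f2) = table
n1, n2 = x1 + f1, x2 + f2
pooled = (x1 + x2) / (n1 + n2)
z = (x1 / n1 - x2 / n2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z_p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Chi-square statistic for the same table.
rows = [n1, n2]
cols = [x1 + x2, f1 + f2]
total = n1 + n2
chi_square = sum(
    (table[i][j] - rows[i] * cols[j] / total) ** 2 / (rows[i] * cols[j] / total)
    for i in range(2)
    for j in range(2)
)
# With 1 degree of freedom, P(chi-square > c) = 2 * P(Z > sqrt(c)).
chi_p_value = 2 * (1 - NormalDist().cdf(sqrt(chi_square)))

print(f"z^2 = {z**2:.4f}, chi-square = {chi_square:.4f}")
print(f"p-values: {z_p_value:.4f} and {chi_p_value:.4f}")
```

The squared z statistic equals the chi-square statistic, and the two-sided p-values agree.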
Activity 26-5 tries to help students to understand the nature of tests and p-values further by using simulation to examine the distribution of the chi-square test statistic when there really is no association between the two variables. You might want to regard this activity as optional. You should help students to realize that the "no association" hypothesis is achieved here by keeping the marginal totals fixed and randomly "shuffling" the data within those constraints. Technology is crucial for doing this simulation, for which we have written a Minitab macro. Fathom does a particularly nice job with this type of simulation.
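For instructors without access to Minitab or Fathom, the shuffling idea can be sketched in Python roughly as follows. This is not the Minitab macro mentioned above; it is an illustrative stand-in that uses hypothetical counts, expands the table into one record per individual, shuffles one variable so that both sets of marginal totals stay fixed, and recomputes the statistic each time.

```python
import random

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            total += (obs - exp) ** 2 / exp
    return total

def pairs_to_table(rows, cols, n_rows, n_cols):
    """Tally individual (row, column) labels back into a table of counts."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, j in zip(rows, cols):
        table[i][j] += 1
    return table

def simulate_null(table, reps, seed):
    """Chi-square statistics from tables built by shuffling column labels."""
    rng = random.Random(seed)
    rows, cols = [], []
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            rows.extend([i] * count)
            cols.extend([j] * count)
    stats = []
    for _ in range(reps):
        rng.shuffle(cols)  # breaks any association; margins are unchanged
        shuffled = pairs_to_table(rows, cols, len(table), len(table[0]))
        stats.append(chi_square_statistic(shuffled))
    return stats

# Hypothetical counts (NOT the data from Activity 26-5).
observed = [[30, 20, 10], [10, 20, 40]]
observed_stat = chi_square_statistic(observed)
simulated = simulate_null(observed, reps=1000, seed=1)

# Approximate p-value: share of shuffles at least as extreme as the data.
p_value = sum(s >= observed_stat for s in simulated) / len(simulated)
print(round(observed_stat, 2), p_value)
```

A dotplot or histogram of `simulated` gives students a picture of how the statistic behaves when there really is no association.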
The homework activities provide experience with applying and interpreting chi-square tests to real data. Many of these ask students to revisit data that they have analyzed before. Those involving new datasets, such as Activities 26-7 and 26-14, ask for both descriptive and inferential analyses. With the exception of Activity 26-13, which asks students to explore sample size effects, these activities do not direct students to use technology. You should make clear whether you permit, encourage, or forbid the use of technology on each of them.
We find Activity 26-6 interesting in that it parallels Activity 26-2 but leads to a very different conclusion. Activity 26-9 concerns the duality with z-tests. We especially like Activity 26-10, which reminds students that inference is not appropriate when the data consist of the entire population, and Activity 26-11, which aims to remind them that causal conclusions cannot be drawn from observational studies.
This topic does not attempt to provide a thorough introduction to regression analysis. It presents a test procedure for whether a correlation differs from zero, a confidence interval method for estimating the slope of a regression model, and a test for assessing whether a regression slope coefficient differs from zero. In other words, the focus is on assessing evidence of an association between variables.
Since it will have been quite a while since students first studied regression, you may want to start by reminding them of what regression is about. The brief Preliminaries should help to achieve this. You might contrast the setting of this topic with that of the previous one by saying that while chi-square tests assess evidence of an association between two categorical variables, correlation and regression tests assess evidence of an association between two quantitative variables. Some instructors prefer not to cover regression descriptively early in the course and instead to present descriptive and inferential analyses together at this point.
New uses of technology in this topic are for simulating sampling distributions of regression coefficients and for producing regression output needed to conduct inferences regarding the slope coefficient.
You need to bring index cards, 16 per student, to class for Activity 27-1, which returns to the familiar theme of using simulation to approximate a randomization test. Questions (a)-(c) direct students through a descriptive analysis that reveals a moderately strong positive association between a team's payroll and its winning percentage. Question (d) then asks students to carry out a simulated randomization test by writing the teams' winning percentages on index cards, shuffling them, and randomly assigning them to the teams and their payrolls. To combine the class simulation results in (e) you might ask students to put their correlation value on a dotplot on the board, or you might type them into your technology as students call them out.
You should find in this activity that it is very rare to achieve a correlation coefficient as large as the one in the actual data. This provides yet another opportunity for you to emphasize that a test and its p-value indicate how often such an extreme sample result would occur by chance alone. In this case students should see, although you might want to help them by stressing this point, that "chance" takes the form of their literally shuffling the winning percentages and assigning them at random to the teams and their payrolls.
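If you prefer to supplement the index cards with technology, the shuffling can be mimicked with a short Python sketch like the one below. The payroll and winning-percentage values are hypothetical placeholders, not the data from Activity 27-1, and the function names are our own.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def shuffled_correlations(xs, ys, reps, seed):
    """Correlations after randomly reassigning the y-values to the x-values."""
    rng = random.Random(seed)
    cards = list(ys)  # the "index cards"
    results = []
    for _ in range(reps):
        rng.shuffle(cards)
        results.append(correlation(xs, cards))
    return results

# Hypothetical payrolls (millions of dollars) and winning percentages.
payroll = [30, 35, 42, 48, 55, 60, 68, 75, 82, 90]
win_pct = [0.42, 0.45, 0.40, 0.50, 0.48, 0.52, 0.55, 0.51, 0.58, 0.60]

observed_r = correlation(payroll, win_pct)
simulated = shuffled_correlations(payroll, win_pct, reps=1000, seed=7)

# Share of shuffles producing a correlation at least as large as observed.
p_value = sum(r >= observed_r for r in simulated) / len(simulated)
print(round(observed_r, 3), p_value)
```

Each pass through the loop plays the role of one student shuffling the cards once, so the list `simulated` corresponds to the class dotplot.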
Question (i) then asks students to apply a formal t-test, as presented in the box above, to these data. Many students are again confused as to how the simulation and the test procedure relate to each other. You might explain that the simulation is meant here primarily as a teaching tool to show students what question the test addresses. In practice researchers typically go straight to the t-procedure, although it is important to verify that the technical conditions are satisfied. You might add that the simulation type of analysis is also widely used in its own right and that it has the advantage of requiring fewer technical conditions.
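For reference, the standard t-statistic for testing whether a population correlation is zero can be computed as in the sketch below. We assume here that the box presents this usual form, t = r sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the values of r and n are hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """Standard t-statistic for H0: population correlation = 0 (df = n - 2)."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Hypothetical sample correlation and sample size.
r, n = 0.5, 18
t = correlation_t_statistic(r, n)
df = n - 2
print(round(t, 3), df)
```

The p-value then comes from a t-table or from technology, using the degrees of freedom shown.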
Activity 27-2 leads students to explore the sampling distribution of regression slope coefficients and to understand how that relates to the t-procedures for a population slope. This is a fairly long and involved activity, so you might want to lead students through parts of it as a group. One thing to emphasize at the outset is that the data presented are a sample from the population of all students at the college. Questions (a)-(d) produce a descriptive analysis of the data, and question (e) is meant to prod students into realizing that this sample regression line would almost certainly have been different if another sample had been chosen from the population. At this point you might again raise for students the key issue of inference: how unusual would it be to get sample data this extreme if there were no association between the variables in the population?
To address this issue, we ask students to consider data from a hypothetical population in which no association exists between GPA and hours of study. We have written a Minitab macro that takes samples from this population and constructs regression lines for the samples. Students should find in (h) that the dotplot of sample regression slopes has the same general shape as many other sampling distributions: mound-shaped and centered around the population value. Question (i) leads students to see that the slope coefficient in the actual sample is larger than that for all (or at least very close to all) of the simulated samples from a population with no association, so they should conclude that the actual sample does provide strong evidence of an association.
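The same kind of simulation can be sketched outside Minitab. The Python example below is not the macro described above; it is an illustrative stand-in that builds a hypothetical population in which study hours and GPA are generated independently (and, for simplicity, not clipped to a realistic GPA range), draws repeated samples, and records each sample's regression slope. The population size, sample size, and the "observed" slope are all assumed values.

```python
import random

def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def sample_slopes(population, sample_size, reps, rng):
    """Slopes from repeated random samples drawn without replacement."""
    slopes = []
    for _ in range(reps):
        sample = rng.sample(population, sample_size)
        xs = [hours for hours, _ in sample]
        ys = [gpa for _, gpa in sample]
        slopes.append(slope(xs, ys))
    return slopes

rng = random.Random(2024)

# Hypothetical population: (weekly study hours, GPA), generated independently
# so that there is no association between the two variables.
population = [(rng.uniform(0, 30), rng.gauss(3.0, 0.5)) for _ in range(1000)]

slopes = sample_slopes(population, sample_size=20, reps=500, rng=rng)
mean_slope = sum(slopes) / len(slopes)

# Hypothetical slope from an "actual" sample, for comparison.
observed_slope = 0.05
share_as_large = sum(s >= observed_slope for s in slopes) / len(slopes)
print(round(mean_slope, 4), share_as_large)
```

A dotplot of `slopes` should look mound-shaped and centered near zero, which is the population slope here up to random noise in the generated population.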
We strongly recommend that you interrupt the class and discuss the t-procedures in the box preceding (k). One aspect to emphasize again is the common structure of both the interval and test procedure as compared to those procedures for other parameters. One slight difference to draw students' attention to is that the degrees of freedom are n-2 in this situation. Questions (k)-(q) ask students to apply the t-procedures to the sample data. Again you might need to emphasize that the simulation analysis was primarily an educational tool and that the t-procedures provide the formal analysis and can be applied without conducting a simulation first. You might also remind students of the distinction between statistical and practical significance, because while the sample regression slope differs significantly from zero, it is nonetheless not large at all. In other words, these sample data provide strong evidence of an association, but that association is likely not a strong one.
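To make the structure of these t-procedures concrete, a minimal Python sketch follows. It assumes the standard formulas (standard error of the slope equal to s divided by the square root of the sum of squared x-deviations, with s based on n-2 degrees of freedom), uses a tiny hypothetical data set rather than the GPA data, and takes the critical value t* as an input looked up from a t-table.

```python
import math

def slope_inference(xs, ys):
    """Slope, its standard error, the t-statistic for H0: slope = 0, and df."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    se_b = s / math.sqrt(sxx)
    return {"slope": b, "se": se_b, "t": b / se_b, "df": n - 2}

def slope_interval(b, se_b, t_star):
    """Confidence interval: estimate plus or minus t* times standard error."""
    return (b - t_star * se_b, b + t_star * se_b)

# Tiny hypothetical data set (NOT the data from Activity 27-2).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

result = slope_inference(x, y)
# t* = 3.182 is the 95% critical value for 3 df, taken from a t-table.
interval = slope_interval(result["slope"], result["se"], t_star=3.182)
print(result)
print(interval)
```

The form "estimate plus or minus critical value times standard error" and "estimate divided by standard error" mirrors the procedures students have seen for means and proportions.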
Activity 27-3 aims to help students recognize the importance of the technical conditions and the usefulness of residual plots for checking them.
The homework activities provide students with opportunities to apply these inferential techniques for correlation and regression. All require use of technology, and all ask students to revisit data that they have analyzed in simpler ways previously. Activities 27-5 and 27-7 concern t-procedures for a correlation coefficient. Activity 27-9 asks students to conduct a simulation analysis of correlation as in Activity 27-1. Activity 27-8 leads students to revisit the issue of influential observations, and it also asks students to conduct a test about a population slope value other than zero. Activity 27-13 focuses on the construction and interpretation of residual plots. Many of these homework activities, 27-10 and 27-15 in particular, leave their questions