Math 37 - Lecture 8
Describing Data with Numbers, cont.
Variability:
To calculate quartiles:
- Arrange observations in increasing order and locate the median
- The first quartile (Q1) is the median of the observations located below the median of the full data set.
- The third quartile (Q3) is the median of the observations located above the median of the full data set.
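The three steps above can be sketched in Python. This is a minimal illustration of the median-of-halves rule, assuming at least two observations; the function name `quartiles` and the sample values are made up for the example and are not part of the lecture.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)       # arrange observations in increasing order
    half = len(xs) // 2     # number of observations on each side of the median
    lower = xs[:half]       # observations below the median's position
    upper = xs[-half:]      # observations above the median's position
    return median(lower), median(xs), median(upper)

# Made-up illustration values (not from the lecture).
print(quartiles([9, 2, 7, 4, 10, 4, 5]))    # odd n: the middle value is left out of both halves
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # even n: the data split cleanly in two
```

Note that statistical software often uses slightly different quartile rules, so its results may not match this hand method exactly.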
Side by Side Stemplots of McGwire and Sosa career HR/season totals
McGwire |   | Sosa
    993 | 0 | 1348
        | 1 | 05
      2 | 2 | 5
   9932 | 3 | 366
     92 | 4 | 0
     82 | 5 |
        | 6 | 6
      0 | 7 |
Median and quartiles for
McGwire:
Sosa:
Interquartile Range (IQR) = Q3-Q1
Range of the middle 50% of the data
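One way to check hand calculations is a short Python sketch like the one below. It assumes the season totals are read off the stemplot above exactly as printed, and the helper name `summarize` is hypothetical.

```python
from statistics import median

# Leaves read off the side-by-side stemplot (home runs per season).
mcgwire = [3, 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
sosa = [1, 3, 4, 8, 10, 15, 25, 33, 36, 36, 40, 66]

def summarize(data):
    """Median, quartiles (median-of-halves rule), and IQR = Q3 - Q1."""
    xs = sorted(data)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[-half:])
    return {"Q1": q1, "median": median(xs), "Q3": q3, "IQR": q3 - q1}

for name, totals in [("McGwire", mcgwire), ("Sosa", sosa)]:
    print(name, summarize(totals))
```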
Boxplots
Length of box = interquartile range
Line inside the box = median
Whiskers extend to the endpoints = min and max
Outliers
Points that don't follow the same pattern as the rest of the data
Examples - an extremely low weight for a rower, an extremely high number of hysterectomies for a doctor, points with unusual values
Would you consider either of last year's home run totals an outlier?
Possible Criterion - One way to check for outliers:
1.5 x IQR criterion - consider any point more than 1.5 x IQR (1.5 box lengths) below Q1 or above Q3 to be an outlier
Home Runs:
IQR -
Low point for outlier check (Q1 - 1.5 x IQR) -
High point for outlier check (Q3 + 1.5 x IQR) -
Do you consider any points outliers?
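The outlier check for the home run data can be sketched as follows. This assumes the totals read off the stemplot and the median-of-halves quartiles. The name `outlier_cutoffs` and the parameter `k` are hypothetical, chosen only for this illustration.

```python
from statistics import median

def outlier_cutoffs(data, k=1.5):
    """Return (low, high) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    xs = sorted(data)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[-half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Season totals read off the side-by-side stemplot.
mcgwire = [3, 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
sosa = [1, 3, 4, 8, 10, 15, 25, 33, 36, 36, 40, 66]

for name, totals in [("McGwire", mcgwire), ("Sosa", sosa)]:
    low, high = outlier_cutoffs(totals)
    flagged = [x for x in totals if x < low or x > high]
    print(f"{name}: low={low}, high={high}, flagged={flagged}")
```

A negative low cutoff simply means that no season total can be flagged as a low outlier, since home run counts cannot go below zero.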
Example: Hysterectomies by Swiss Doctors
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
Are 85 or 86 outliers by the 1.5IQR criterion?
Modified Boxplot - whiskers extend only to the smallest and largest points that are not outliers by the 1.5 x IQR criterion; the outliers are plotted separately as individual points
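As a rough sketch, the numbers needed for a modified boxplot of the hysterectomy data could be computed like this. It assumes the median-of-halves quartile rule, and the variable names are illustrative only.

```python
from statistics import median

hysterectomies = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

xs = sorted(hysterectomies)
half = len(xs) // 2
q1, q3 = median(xs[:half]), median(xs[-half:])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 x IQR cutoffs

outliers = [x for x in xs if x < low or x > high]   # plotted as separate points
inside = [x for x in xs if low <= x <= high]        # everything else
whiskers = (min(inside), max(inside))               # whisker endpoints

print("Q1, Q3, IQR:", q1, q3, iqr)
print("cutoffs:", low, high)
print("outliers:", outliers)
print("whiskers:", whiskers)
```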
Which is best?
A: 50 50 50 63 70 70 70 71 71 72 72 79 91 91 92
B: 50 54 59 63 65 68 69 71 73 74 76 79 83 88 92
C: 50 61 62 63 63 64 66 71 77 77 77 79 80 80 92
a) Calculate the five-number summary (min, Q1, median, Q3, max) for each distribution
b) Create boxplots (on the same scale) of these three distributions. Do you see any differences among the three distributions?
c) Construct split stemplots for each class. Do you see any differences among the three distributions?
d) If you had not seen the actual data and had only been shown the boxplots, would you have been able to detect the differences in the three distributions?
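For checking work on the five-number summaries and split stemplots, a sketch along these lines may help. It assumes non-negative whole-number scores and the median-of-halves quartile rule. The function names `five_number_summary` and `split_stemplot` are hypothetical.

```python
from statistics import median

classes = {
    "A": [50, 50, 50, 63, 70, 70, 70, 71, 71, 72, 72, 79, 91, 91, 92],
    "B": [50, 54, 59, 63, 65, 68, 69, 71, 73, 74, 76, 79, 83, 88, 92],
    "C": [50, 61, 62, 63, 63, 64, 66, 71, 77, 77, 77, 79, 80, 80, 92],
}

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    xs = sorted(data)
    half = len(xs) // 2
    return (xs[0], median(xs[:half]), median(xs), median(xs[-half:]), xs[-1])

def split_stemplot(data):
    """Return stemplot lines; each tens-digit stem is split into leaves 0-4 and 5-9."""
    xs = sorted(data)
    lines = []
    for stem in range(xs[0] // 10, xs[-1] // 10 + 1):
        for lo, hi in ((0, 4), (5, 9)):
            leaves = "".join(
                str(x % 10) for x in xs if x // 10 == stem and lo <= x % 10 <= hi
            )
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for label, scores in classes.items():
    print(label, five_number_summary(scores))
    print("\n".join(split_stemplot(scores)))
```

Comparing the printed summaries with the printed stemplots is one way to approach the last question.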