Math 37 - Lecture 8

Describing Data with Numbers, cont.

Variability:

To calculate quartiles:

- Arrange observations in increasing order and locate the median

- The first quartile (Q1) is the median of the observations located below the median of the full data set.

- The third quartile (Q3) is the median of the observations located above the median of the full data set.
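The three steps above can be sketched in Python. This is only an illustration of the rule as stated here, not the only quartile convention in use: the function name `quartiles` and the sample values are made up for the example, and when the number of observations is odd the median itself is left out of both halves (which is what "located below" and "located above" the median imply).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the notes."""
    xs = sorted(data)              # step 1: arrange in increasing order
    n = len(xs)
    lower = xs[: n // 2]           # observations below the median's position
    upper = xs[n // 2 + n % 2 :]   # observations above the median's position
    return median(lower), median(xs), median(upper)

# Made-up values, chosen only to show the odd and even cases.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```

Statistical software often uses slightly different quartile rules, so results from a calculator or spreadsheet may not match this hand method exactly.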

Side by Side Stemplots of McGwire and Sosa career HR/season totals

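McGwire | Stem | Sosa
    993 |  0   | 1348
        |  1   | 05
      2 |  2   | 5
   9932 |  3   | 366
     92 |  4   | 0
     82 |  5   |
        |  6   | 6
      0 |  7   |

(Leaves on the McGwire side are read outward from the stem, so 993 | 0 means 3, 9, 9.)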

Median and quartiles for each player:

McGwire:

Sosa:

Interquartile Range (IQR) = Q3-Q1

Range of the middle 50% of the data

Boxplots

Length of box = interquartile range

Line inside the box = median

Whiskers extend to the endpoints = min and max

Outliers

Points that don't follow the same pattern as the rest of the data

Examples - an extremely low weight for a rower, an extremely high number of hysterectomies for a doctor, points with unusual values

Would you consider either of last year's home run totals to be outliers?

Possible Criterion - One way to check for outliers:

1.5 x IQR criterion - consider any point more than 1.5 x IQR (1.5 box lengths) below Q1 or above Q3 to be an outlier

Home Runs:

IQR -

Low point for outlier check (Q1 - 1.5 x IQR) -

High point for outlier check (Q3 + 1.5 x IQR) -

Do you consider any points outliers?
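To check the hand calculations for the home run data, the sketch below computes the median, quartiles, IQR and the two cut-off points for each player. The season totals are read off the side-by-side stemplot, assuming the left-hand leaves belong to McGwire and the right-hand leaves to Sosa (the order given in the stemplot title). The function name `summarize` is made up for this example.

```python
from statistics import median

# Season totals read off the stemplot (assumed: left side McGwire, right side Sosa).
mcgwire = [3, 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]
sosa = [1, 3, 4, 8, 10, 15, 25, 33, 36, 36, 40, 66]

def summarize(data):
    """Median, quartiles, IQR and 1.5 x IQR cut-offs, using the lecture's quartile rule."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])             # median of the lower half
    q3 = median(xs[n // 2 + n % 2 :])     # median of the upper half
    iqr = q3 - q1
    return {
        "median": median(xs),
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "low": q1 - 1.5 * iqr,    # low point for the outlier check
        "high": q3 + 1.5 * iqr,   # high point for the outlier check
    }

for name, totals in [("McGwire", mcgwire), ("Sosa", sosa)]:
    s = summarize(totals)
    flagged = [x for x in totals if x < s["low"] or x > s["high"]]
    print(name, s, "flagged as outliers:", flagged)
```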

Example: Hysterectomies by Swiss Doctors

20 25 25 27 28 31 33 34 36 37 44 50 59 85 86

Are 85 or 86 outliers by the 1.5 x IQR criterion?

 

Modified Boxplot - whiskers extend only to the smallest and largest observations that are not outliers by the 1.5 x IQR criterion; the outliers are plotted separately as individual points
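As a rough sketch of what goes into a modified boxplot, the code below works out the box, the whisker ends and the separately plotted points for the Swiss doctors data. The function name `modified_boxplot_parts` is made up for this example, and the quartiles follow the median-of-halves rule from earlier in these notes.

```python
from statistics import median

# Hysterectomies by Swiss doctors, as listed in the notes.
hysterectomies = [20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86]

def modified_boxplot_parts(data):
    """Return the box, whisker ends and outliers under the 1.5 x IQR criterion."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])
    q3 = median(xs[n // 2 + n % 2 :])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low <= x <= high]      # non-outliers
    outliers = [x for x in xs if x < low or x > high]
    return {
        "box": (q1, median(xs), q3),
        "cutoffs": (low, high),
        "whiskers": (inside[0], inside[-1]),  # min and max of the non-outliers
        "outliers": outliers,                 # plotted as separate points
    }

print(modified_boxplot_parts(hysterectomies))
```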

 

 

 

 

Which is best?

A: 50 50 50 63 70 70 70 71 71 72 72 79 91 91 92

B: 50 54 59 63 65 68 69 71 73 74 76 79 83 88 92

C: 50 61 62 63 63 64 66 71 77 77 77 79 80 80 92

a) Calculate the five-number summary (min, Q1, median, Q3, max) for each distribution

b) Create boxplots (on the same scale) of these three distributions. Do you see any differences among the three distributions?

c) Construct split stemplots for each class. Do you see any differences among the three distributions?

d) If you had not seen the actual data and had only been shown the boxplots, would you have been able to detect the differences in the three distributions?
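For checking work on parts a) and c), here is a minimal Python sketch. The function names `five_number_summary` and `split_stemplot` are made up for this example. The quartiles use the median-of-halves rule from these notes, and the stemplot code assumes non-negative whole numbers, with each stem split into leaves 0-4 and leaves 5-9.

```python
from statistics import median

# The three data sets listed under "Which is best?"
classes = {
    "A": [50, 50, 50, 63, 70, 70, 70, 71, 71, 72, 72, 79, 91, 91, 92],
    "B": [50, 54, 59, 63, 65, 68, 69, 71, 73, 74, 76, 79, 83, 88, 92],
    "C": [50, 61, 62, 63, 63, 64, 66, 71, 77, 77, 77, 79, 80, 80, 92],
}

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])             # median of the lower half
    q3 = median(xs[n // 2 + n % 2 :])     # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

def split_stemplot(data):
    """Return stemplot rows, two rows per stem (leaves 0-4, then leaves 5-9)."""
    rows = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        rows.setdefault((stem, leaf >= 5), []).append(str(leaf))
    first, last = min(rows), max(rows)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for upper_half in (False, True):
            if first <= (stem, upper_half) <= last:
                leaves = "".join(rows.get((stem, upper_half), []))
                lines.append(f"{stem} | {leaves}".rstrip())
    return lines

for name, scores in classes.items():
    print(name, five_number_summary(scores))
    print("\n".join(split_stemplot(scores)))
```

Comparing the printed summaries with the printed stemplots is one way to approach part d).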