Workshop Statistics: Discovery with Data and Fathom

Topic 3: Displaying and Describing Distributions

Activity 3-1: Features of Distributions

(a) center of distribution
(b) variability or spread of distribution
(c) shape of distribution
(d) 2 distinct clusters of scores
(e) outliers (one low, one high) that aren't with the rest of the data
(f) granularity (data at fixed intervals), here data occur in multiples of 5

Activity 3-2: Matching Variables to Dotplots

(a) - (3) because the values contain no repeats and are fairly evenly spread out
(b) - (6) because some cities have zero snowfall, and there is a lot of variation among the others
(c) - (5) because the values are at regular increments, and there are mostly ones and twos with a gradual drop-off beyond that
(d) - (4) because the increments are fairly regular and there are many prices with two properties at the same price
(e) - (2) because there are many different values (large variability) and more repeated values than dotplot 7 (weights reported to the integer)
(f) - (1) because the values are slightly skewed to the right, with a concentration at the lower end. The skewness to the right makes sense: we expect a few mothers to be a fair bit older than average, but not as many to be much younger than average.
(g) - (7) because there is a wide range of values with slight skewness to the left. It makes sense that there might be a few cars that are very light, but not as many that are extremely heavy. There is less granularity than in dotplot 2 (a larger number of distinct values).
(h) - (8) because the scores are skewed to the left: most of the students were at the high end, perhaps with scores in the 80s and 90s, while a few students were lower in the distribution and did not score well on this exam.

Activity 3-3: Rowers' Weights

(a) Four rowers weigh 195 pounds. This value has the tallest stack of dots on the dotplot.
(b) The shape of the distribution is skewed to the left, with the center around 195 pounds.  The spread is from 120 to 230 pounds, with 2 clusters and an outlier near 120 pounds.
(c) There is one cluster around 150-160 pounds. If we look at the events of those rowers, there is an LW designation each time. These rowers participate in "lightweight" events, which require them to weigh below a certain amount (165 pounds) on race day. The rowers in the upper cluster are not in lightweight events, and there is no upper bound on how much they can weigh.
(d) The apparent outlier is Segaloff, the coxswain, who calls out instructions but does not row and is therefore light so as to add little extra weight to the boat.

Activity 3-4: British Rulers' Reigns

(a) 63 years, Victoria
(b) 0 years, Edward V.  This ruler must have ruled for less than 6 months, which was rounded down to 0 years.
(c)
0| 9026536791
1| 3907332305
2| 10224255
3| 555983
4| 4
5| 609
6| 3
(d)
0| 0123566799
1| 0023333579
2| 01222455
3| 355589
4| 4
5| 069
6| 3
(e) The distribution of lengths of reign of British rulers ranges from 0 to 63 years and is skewed to the right. A large cluster of rulers reigned for under 40 years. There is a small cluster of reigns in the 50s, and there are no major outliers.
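The step from the first stemplot above (leaves in the order the data were read) to the second (leaves sorted within each stem) can be sketched in a few lines of Python. This is only an illustration, not part of the Fathom workflow: the leaf strings are copied from the first stemplot, and the helper names to_values and sorted_stemplot are made up for this sketch.

```python
from collections import defaultdict

# Leaves exactly as they appear in the unsorted stemplot above:
# stem = tens digit of the reign length, each leaf = ones digit.
UNSORTED = {
    0: "9026536791",
    1: "3907332305",
    2: "10224255",
    3: "555983",
    4: "4",
    5: "609",
    6: "3",
}


def to_values(stemplot):
    """Expand a stem -> leaf-string mapping into reign lengths in years."""
    return [10 * stem + int(leaf)
            for stem, leaves in stemplot.items()
            for leaf in leaves]


def sorted_stemplot(values):
    """Group values by tens digit and sort the leaves within each stem."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Walk every stem from the smallest to the largest so gaps stay visible.
    return {stem: "".join(str(leaf) for leaf in rows[stem])
            for stem in range(min(rows), max(rows) + 1)}


reigns = to_values(UNSORTED)
ordered = sorted_stemplot(reigns)

for stem, leaves in ordered.items():
    print(f"{stem}| {leaves}")

print("shortest:", min(reigns), "longest:", max(reigns))
print("reigns under 40 years:", sum(r < 40 for r in reigns), "of", len(reigns))
```

The printed rows should match the sorted stemplot, and the last two lines give the minimum, the maximum, and the size of the large lower cluster described in (e).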

Activity 3-5: Geyser Eruptions

(b) Answers will vary from student to student
(c) This group contains 24 eruptions with intereruption times between 47.5 and 52.5 minutes.
(d) 39 of the intereruption times lasted more than 82 minutes.
(e) No, since 90 is not a starting point of a histogram's interval.
(f) There seem to be two clusters, one in the 50s and one in the high 70s and low 80s (minutes). It turns out that the time between eruptions depends on whether the previous eruption was long or short.
(g) The different subinterval widths change the histogram's appearance dramatically. With 5 subintervals the two clusters are not apparent, and with 20 subintervals the distribution looks very jagged. The most informative picture is probably the histogram with 10 subintervals.
(h) The following histogram shows us that exactly 9 eruptions had intereruption times of at least 90 minutes:
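Parts (e) and (h) both depend on where the subinterval boundaries fall: a count of values at or above some cutoff can be read off a histogram only when the cutoff is one of the boundaries. The Python sketch below illustrates the idea under stated assumptions. The times are made up for illustration and are NOT the geyser data; the boundaries at 47.5, 52.5, ... follow the 47.5 to 52.5 subinterval mentioned in part (c), while the alternative boundaries at multiples of 5 are an assumption about how the histogram might be redrawn. The function names are hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count values in each interval [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts


def count_at_least(counts, start, width, threshold):
    """Read 'how many values are >= threshold' off the bin counts.

    Returns None when threshold is not a bin boundary, because the bin that
    straddles it mixes values below and above the threshold. Assumes the
    bins cover every value at the upper end.
    """
    offset = (threshold - start) / width
    if offset != int(offset) or not 0 <= offset <= len(counts):
        return None
    return sum(counts[int(offset):])


# Made-up intereruption times in minutes, for illustration only.
made_up_times = [48, 51, 55, 58, 62, 71, 76, 79, 81, 84, 86, 89, 90, 91, 94]

# Boundaries at 47.5, 52.5, ..., 97.5 (as implied by part (c)).
half_counts = bin_counts(made_up_times, start=47.5, width=5, n_bins=10)
# Assumed alternative: boundaries at 45, 50, ..., 95, so 90 is a boundary.
whole_counts = bin_counts(made_up_times, start=45, width=5, n_bins=10)

print(count_at_least(half_counts, 47.5, 5, 82.5))  # 82.5 is a boundary
print(count_at_least(half_counts, 47.5, 5, 90))    # 90 is not: None
print(count_at_least(whole_counts, 45, 5, 90))     # 90 is a boundary here
```

With the first set of boundaries the question about 82.5 minutes can be answered but the question about 90 minutes cannot; shifting the boundaries to multiples of 5 makes the 90-minute count readable.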