Math 37 - Lecture 5

Describing Data

13.0 4.9 13.3 14.5 11.0 10.0 14.3 12.6 18.6 10.0 12.6

11.4 12.5 12.6 15.2 13.7 12.6 11.4 13.9 11.3 14.2 12.4

12.4 12.3 13.9 13.1 13.9 11.4 11.9 13.7 10.9 13.4 12.5

14.5 13.4 13.5 13.6 15.9 15.7 12.0 14.4 12.5 10.2 8.8

12.0 11.1 11.6 15.3 13.3 11.1

We want to summarize the information in our data set.

To present info:

- Identify source of data

- Identify variables and the types of variables

Categorical = observations are put into categories

Quantitative = observations are numerical values

- Identify units

- Label everything

I. Tables - Summarize the information, well-labeled.

II. Graphs - See the overall pattern and deviations from it. Avoid clutter and misleading information.

Categorical Data

1. Bar graph - comparison to other groups (p. 185)

2. Pie Chart - comparison to the whole

Quantitative Data

1. Dotplots (see Lab 2)

2. Stem and Leaf Plots

a) Sort the data

b) Use the first part of the data as the stem

c) Write the stems vertically

d) Use the last part as the leaf (sometimes truncate)

e) Arrange leaves in increasing order

Example: Running times (in minutes) of Hitchcock movies

119, 105, 120, 120, 116, 108, 120, 130, 136, 103, 116,

108, 113, 132, 81, 108, 111, 101, 103, 126, 117, 128

In one of these movies, the action took place in one room and it was filmed without editing. Can you identify it?
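Steps a) to e) can be sketched in Python for the Hitchcock data. This is only an illustration, not part of the course materials: the function name stem_and_leaf is made up here, and it assumes whole-number data with the last digit used as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: leaves in increasing order}; the leaf is the last digit.

    Hypothetical helper for illustration; assumes non-negative whole numbers.
    """
    plot = defaultdict(list)
    for v in sorted(values):            # a) sort the data
        stem, leaf = divmod(v, 10)      # b), d) first part = stem, last digit = leaf
        plot[stem].append(leaf)         # e) leaves arrive in increasing order
    return dict(plot)

times = [119, 105, 120, 120, 116, 108, 120, 130, 136, 103, 116,
         108, 113, 132, 81, 108, 111, 101, 103, 126, 117, 128]

plot = stem_and_leaf(times)

# c) write the stems vertically, keeping empty stems so gaps stay visible
for stem in range(min(plot), max(plot) + 1):
    leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Printing the empty stem 9 is a deliberate choice: it keeps the gap between 81 and the rest of the data visible, which is the kind of deviation the plot is meant to show.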

Advantages

- Quick visual picture of data (center, deviations)

- See actual numerical values

Disadvantages

- Best for smaller data sets (fewer than about 100 observations)

- Can give poor picture

- Can split stems or truncate to change level of detail

3. Histograms

a) Divide data into classes

Each data point belongs to exactly one class

Convention: Include the left-hand endpoint and not the right, so the class from a to b is [a, b)

b) Count the number of observations in each class (the frequencies)

c) Area of each block is proportional to frequency

Easiest: Classes of equal width, so that height = frequency

Example: Weights of the 1996 Men's Olympic Rowing Team for 7 rowing events (single, lightweight double, and quadruple sculls, pair, four, lightweight four, eight with coxswain)

154 224 214 195 160 155 195 205 195 195 200 210 210 205 200 215 220 205 210 160 160 158 208 121 207 207

Can you identify distinct groups of rowers?
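Steps a) and b) can be sketched in Python for the rowing data. The class choice here (width 10, starting at 120) is an assumption made for this illustration, and class_frequencies is a made-up helper name; other class widths are equally valid and will change the picture.

```python
def class_frequencies(values, start, width, n_classes):
    """Count observations in the classes [start + k*width, start + (k+1)*width).

    Hypothetical helper for illustration; assumes whole-number data and
    equal-width classes.
    """
    counts = [0] * n_classes
    for v in values:
        k = (v - start) // width        # left endpoint included, right endpoint not
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1                  # each data point lands in exactly one class
    return counts

weights = [154, 224, 214, 195, 160, 155, 195, 205, 195, 195, 200, 210, 210,
           205, 200, 215, 220, 205, 210, 160, 160, 158, 208, 121, 207, 207]

start, width, n_classes = 120, 10, 11   # assumed classes: [120,130), ..., [220,230)
counts = class_frequencies(weights, start, width, n_classes)

# Equal-width classes, so the bar height is just the frequency
for k, count in enumerate(counts):
    left = start + k * width
    print(f"[{left}, {left + width}) {'*' * count} ({count})")
```

Note how the three observations equal to 160 are counted in [160, 170) and not in [150, 160), following the left-endpoint convention.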

Advantages

- No set width of class intervals

- More flexibility than for stem and leaf plots

- May have natural classes

Disadvantages

- Don't see actual data values

- Take a little more time by hand

Look for

- Central location

- Variability

- Skewness vs. symmetry

- Patterns and deviations