Math 37 - Lecture 24

More than one variable (Ch. 2)

Previously we usually had just one variable of interest (e.g., mean time difference, proportion who favor Coke). What if we have two (bivariate) or more variables? We can describe each variable individually, but, more importantly, we can examine the relationship between them.

Examples: Are they related?

- SAT scores and freshman GPA
- Weight and height
- Sex and degree
- Smoking and cancer

Review - Variable types, explanatory vs. response variables

GRAPHICAL SUMMARIES

- Frequency tables and segmented bar graphs are used to describe the relationship between two categorical variables.

- Side-by-side boxplots can describe patterns between one categorical and one quantitative variable (e.g., the setting of a two-sample t-test).

- Scatterplots are used to display the relationship between two quantitative variables.

Put one variable on the horizontal axis and the other on the vertical axis. Both variables are measured on each individual, and each individual appears as one point in the plot. Different symbols (tags) can be used to show the effect of a categorical variable. If there is an explanatory variable, always put it on the horizontal axis.

Example: Manatees are large, gentle sea creatures that live along the Florida coast. Many manatees are killed or injured by powerboats.

Explanatory Variable =

Response Variable =

Scatterplot: How would you describe the relationship?

Interpreting Scatterplots:

1) Look for the overall pattern:

direction, form, strength

2) Look for deviations from the overall pattern

Outliers - any individual observation that falls outside the overall pattern of the graph.

Direction:

Positive Association: Two variables are positively associated when high values of the first variable occur with high values of the second variable, and low values occur with low values.

e.g. Students with higher SAT scores tend to have higher freshman GPAs.

Negative Association: Two variables are negatively associated when high values of one variable occur with low values of the other, and vice versa.

e.g. People who smoke tend to have shorter life spans.

Overall Pattern: To describe a scatterplot, state the direction (positive or negative), the form (is it linear?), and the strength (how much scatter is there around the pattern?), and identify any outliers.

Problems with Scatterplots

- Changes in scale can drastically affect the picture presented.

 

NUMERICAL SUMMARY

If the relationship between two quantitative variables is linear, the correlation coefficient r measures the strength and direction of the relationship.

Notation: Let $x_1, \dots, x_n$ be the $n$ observations of the first variable.

Let $y_1, \dots, y_n$ be the $n$ observations of the second variable.

$\bar{x}$, $s_x$ are the mean and standard deviation for variable 1.

$\bar{y}$, $s_y$ are the mean and standard deviation for variable 2.

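Formula:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$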
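As a sketch, the formula can be computed directly: standardize each observation with the sample mean and the sample standard deviation (the $n - 1$ version, which matches the $1/(n-1)$ factor), multiply the paired z-scores, and divide their sum by $n - 1$. The function name `correlation` is an illustrative choice, not something defined in these notes.

```python
from statistics import mean, stdev  # stdev divides by n - 1, matching s_x and s_y


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired, with at least 2 observations")
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Standardize each observation, multiply the paired z-scores, and add them up
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)


print(correlation([1, 2, 3, 4], [2, 1, 4, 3]))  # approximately 0.6
print(correlation([1, 2, 3], [3, 2, 1]))        # approximately -1.0
```

Writing r as an average of z-score products shows why the sign works: a point that is above average on both variables (or below on both) contributes a positive product, while a point above on one and below on the other contributes a negative product.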

Example: Is there a relationship between average weekly household spending (in British pounds) on tobacco products and on alcoholic beverages across the 11 regions of Great Britain?

Region          Alcohol (£)   Tobacco (£)
North           6.47          4.03
Yorkshire       6.13          3.76
Northeast       6.19          3.77
East Midlands   4.89          3.34
West Midlands   5.63          3.47
East Anglia     4.52          2.92
Southeast       5.89          3.20
Southwest       4.79          2.71
Wales           5.27          3.53
Scotland        6.08          4.51
N. Ireland      4.02          4.56
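A minimal sketch, assuming the table was transcribed correctly: it computes r for all 11 regions and again with Northern Ireland removed, to show how much one region that breaks the overall pattern can change the correlation. The correlation function is the same z-score formula given above.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


# Average weekly household spending in British pounds, from the table above
regions = ["North", "Yorkshire", "Northeast", "East Midlands", "West Midlands",
           "East Anglia", "Southeast", "Southwest", "Wales", "Scotland", "N. Ireland"]
alcohol = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08, 4.02]
tobacco = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]

r_all = correlation(alcohol, tobacco)

# Drop Northern Ireland (lowest alcohol, highest tobacco spending) and recompute
keep = [i for i, name in enumerate(regions) if name != "N. Ireland"]
r_without_ni = correlation([alcohol[i] for i in keep], [tobacco[i] for i in keep])

print(f"r, all 11 regions:     {r_all:.2f}")         # 0.22
print(f"r, without N. Ireland: {r_without_ni:.2f}")  # 0.78
```

The correlation is not resistant: a single region that falls outside the overall pattern pulls r from about 0.78 down to about 0.22.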

 

 

Properties:
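Several standard properties of r can be checked numerically. The sketch below illustrates well-known properties (not necessarily the exact list intended for this lecture), reusing the British alcohol and tobacco figures; converting pounds to pence is a hypothetical change of units chosen only for the demonstration.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


alcohol = [6.47, 6.13, 6.19, 4.89, 5.63, 4.52, 5.89, 4.79, 5.27, 6.08, 4.02]
tobacco = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]
r = correlation(alcohol, tobacco)

# 1. Symmetric: it does not matter which variable is called x and which y
assert abs(correlation(tobacco, alcohol) - r) < 1e-12

# 2. Unchanged by a change of units (pounds -> pence multiplies every value by 100)
pence = [100 * a for a in alcohol]
assert abs(correlation(pence, tobacco) - r) < 1e-12

# 3. Reversing the direction of one variable flips the sign of r
assert abs(correlation([-a for a in alcohol], tobacco) + r) < 1e-12

# 4. r lies between -1 and 1, and equals 1 for points on an increasing straight line
assert -1 <= r <= 1
assert abs(correlation(alcohol, [3 * a + 2 for a in alcohol]) - 1) < 1e-12

# 5. r measures only LINEAR association: y = x**2 is a perfect curve, yet r = 0
xs = [-3, -2, -1, 0, 1, 2, 3]
assert abs(correlation(xs, [v ** 2 for v in xs])) < 1e-12

print("all properties checked")
```

Property 5 is the reason the notes say r measures strength and direction only when the relationship is linear: a strong curved relationship can still give r near 0.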

Example: Predict the correlation coefficient between Math 37 students' Midterm 1 and Midterm 2 scores. Interpret this value.

Example: Was there a relationship between the draft lottery numbers assigned in 1970 and the men's birth dates?

Example: For 22 countries, the life expectancy and the number of people per television set were recorded.

 

Country      Life exp   Per TV  |  Country      Life exp   Per TV
Angola       44         200     |  Mexico       72         6.6
Australia    76.5       2       |  Morocco      64.5       21
Cambodia     49.5       177     |  Pakistan     56.5       73
Canada       76.5       1.7     |  Russia       69         3.2
China        70         8       |  S. Africa    64         11
Egypt        60.5       15      |  S. Lanka     71.5       28
France       78         2.6     |  Uganda       51         191
Haiti        53.5       234     |  UK           76         3
Iraq         67         18      |  US           75.5       1.3
Japan        79         1.8     |  Vietnam      65         29
Madagascar   52.5       92      |  Yemen        50         38

(Life exp = life expectancy in years; Per TV = number of people per television set.)

 

(a) Does there appear to be an association between the two variables?

MTB > corr c1 c2

Correlation of Life and TV =-0.804 -.922
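As a check, the sketch below enters the 22 pairs from the table and computes r with the z-score formula; it reproduces the -0.804 that Minitab reports. Storing each country as a (name, life expectancy, people per TV) tuple is simply one convenient layout.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


# (country, life expectancy in years, people per TV set), from the table above
data = [
    ("Angola", 44, 200), ("Australia", 76.5, 2), ("Cambodia", 49.5, 177),
    ("Canada", 76.5, 1.7), ("China", 70, 8), ("Egypt", 60.5, 15),
    ("France", 78, 2.6), ("Haiti", 53.5, 234), ("Iraq", 67, 18),
    ("Japan", 79, 1.8), ("Madagascar", 52.5, 92), ("Mexico", 72, 6.6),
    ("Morocco", 64.5, 21), ("Pakistan", 56.5, 73), ("Russia", 69, 3.2),
    ("S. Africa", 64, 11), ("S. Lanka", 71.5, 28), ("Uganda", 51, 191),
    ("UK", 76, 3), ("US", 75.5, 1.3), ("Vietnam", 65, 29), ("Yemen", 50, 38),
]
life = [row[1] for row in data]
per_tv = [row[2] for row in data]

r = correlation(life, per_tv)
print(f"r = {r:.3f}")  # r = -0.804
```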

(b) What would you tell someone who concludes that sending TVs to countries with lower life expectancies will cause their inhabitants to live longer?

Association vs. Causation

Correlation measures association.

CORRELATION DOES NOT IMPLY CAUSATION!

- A third variable may be influencing both of the variables you measured

- There may be a lurking (confounding) variable

- To determine causation, you must
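A small simulation with entirely made-up variables shows how a lurking variable can produce a strong correlation between two variables that have no causal effect on each other. The names `z`, `x`, `y`, the seed, and the noise levels are arbitrary assumptions for illustration; in the television example, something like a country's wealth could plausibly play the role of `z`.

```python
import random
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


random.seed(37)  # fixed seed so the run is reproducible
n = 2000

# Hypothetical lurking variable z; x and y each depend on z plus independent noise.
# In this simulation x has NO effect on y, and y has no effect on x.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

print(f"corr(x, y) = {correlation(x, y):.2f}")  # strong and positive, roughly 0.8

# Removing z's contribution leaves only the independent noise terms
x_noise = [xi - zi for xi, zi in zip(x, z)]
y_noise = [yi - zi for yi, zi in zip(y, z)]
print(f"corr after removing z = {correlation(x_noise, y_noise):.2f}")  # near 0
```

The strong correlation between x and y comes entirely from their shared dependence on z; once z is accounted for, essentially no association remains.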

Example:

x     10     8      13     9      11     14     6      4      12     7      5
y     7.46   6.77   12.74  7.11   7.81   8.84   6.08   5.39   8.15   6.42   5.73

- Calculate the correlation coefficient (find the mean and SD of x and of y first)

- Construct the scatterplot and describe the association
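A sketch for checking the hand calculation: it computes the means, standard deviations, and r for these 11 pairs, then recomputes r without the point (13, 12.74), which sits well above the others. Treating that point as the one to remove is an assumption based on the data, offered as a check on the scatterplot description.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Correlation r = (1/(n-1)) * sum of products of paired z-scores."""
    n = len(x)
    x_bar, y_bar, s_x, s_y = mean(x), mean(y), stdev(x), stdev(y)
    return sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
               for xi, yi in zip(x, y)) / (n - 1)


x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

print(f"x: mean = {mean(x):.2f}, SD = {stdev(x):.3f}")  # x: mean = 9.00, SD = 3.317
print(f"y: mean = {mean(y):.2f}, SD = {stdev(y):.3f}")  # y: mean = 7.50, SD = 2.030

r = correlation(x, y)
print(f"r = {r:.3f}")  # r = 0.816

# Drop the single point far above the others and recompute
pairs = [(a, b) for a, b in zip(x, y) if (a, b) != (13, 12.74)]
r_without = correlation([a for a, _ in pairs], [b for _, b in pairs])
print(f"r without (13, 12.74) = {r_without:.4f}")  # very close to 1
```

The remaining ten points lie almost exactly on a straight line, so a single outlier is what lowers r to about 0.816; this is why the scatterplot should always be examined alongside the numerical summary.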