Math 37 - Lecture 25
Regression
Example Anscombe data: correlation coefficient
If graph shows a linear relationship, can we model it?
Goal: Model overall linear pattern found in the scatterplot.
A regression line is a straight line that describes how the response variable varies with the explanatory variable. It also allows us to predict the response from a value of the explanatory variable.
Example Manatees: Can we summarize the linear relationship between the number of boat registrations and number killed?
Explanatory variable = number of boat registrations
Response variable = number of manatees killed
Scatterplot showed a strong, positive, linear association

Equations for a Line: y=a+bx
y=height on vertical axis; a=intercept (height when x=0); b=slope (rate of change in y as x changes); x=position on horizontal axis
Example: ŷ = a + bx, i.e. predicted (killed) = -41.4 + 0.125(registrations)
To add the regression line to the plot, pick two values of x (far apart), find the predicted values, ŷ, and draw a line through the two points.
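The two-point recipe above can be sketched as follows. The intercept and slope come from the manatee example; the two x values (450 and 700) are assumptions chosen only because they are far apart, not values taken from the data.

```python
def predict_killed(registrations):
    """Predicted number killed from the fitted line in the notes."""
    a = -41.4   # intercept from the manatee example
    b = 0.125   # slope from the manatee example
    return a + b * registrations

# Two x values far apart (hypothetical choices, for illustration only).
x_low, x_high = 450, 700

# The regression line is the straight line through these two points.
point_low = (x_low, predict_killed(x_low))
point_high = (x_high, predict_killed(x_high))
print(point_low, point_high)
```

Any two x values give the same line; choosing them far apart just makes the line easier to draw accurately.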
Residual = observed y - predicted y = y - ŷ
The Least Squares technique minimizes the sum of the squared residuals
Fitting a line:
One Technique - Least Squares Regression
Def: Minimizes sum of the squared vertical distances from the line
Calculation: b = r sy/sx, a = ȳ - b x̄
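A minimal sketch of the calculation, assuming a small made-up data set (the xs and ys below are hypothetical, not the manatee or NBA data). It computes b and a from r, the means, and the standard deviations, then checks that nudging the line makes the sum of squared residuals larger.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: average product of standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / sx) * ((y - y_bar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

def least_squares_line(xs, ys):
    """Return (a, b) using b = r*sy/sx and a = y_bar - b*x_bar."""
    b = correlation(xs, ys) * stdev(ys) / stdev(xs)
    a = mean(ys) - b * mean(xs)
    return a, b

def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances from the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)

# Any other line should do worse; try a few nudged intercepts and slopes.
others = [sum_squared_residuals(xs, ys, a + da, b + db)
          for da in (-0.5, 0.5) for db in (-0.1, 0.1)]
print(a, b, best, min(others))
```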
Example Manatees: r = , x̄ = , sx = , ȳ = , sy =
If x=585, predict killed. Observed
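The prediction at x = 585 can be worked with the fitted manatee line, ŷ = -41.4 + 0.125x. The observed count is left blank in the notes, so in this sketch the residual is written as a helper function (a hypothetical name) that takes the observed value as an argument rather than assuming one.

```python
a, b = -41.4, 0.125   # intercept and slope from the manatee example

x = 585
predicted = a + b * x   # -41.4 + 73.125, about 31.7 manatees killed

def residual(observed, predicted):
    """Residual = observed y - predicted y."""
    return observed - predicted

print(round(predicted, 3))
```

A positive residual means the point lies above the line (more killed than predicted); a negative one means it lies below.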
Example Predicting NBA attendance
Is there a highly associated variable?

Scatterplot:
Correlation Coefficient =
x̄ = 18.57, sx = 3.87, ȳ = 15710, sy = 3570
Regression Line:
Outliers?
Are any teams unusual, e.g. have a large residual?
Can you give an explanation?
What if you refit the model without these observations?
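One way to approach these questions, sketched on made-up numbers (not the NBA data): compute every residual, flag the largest in absolute value, then refit without that observation and compare the two lines.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data; the fourth observation sits well above the pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 14.0, 9.8, 12.1]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Index of the observation with the largest residual in absolute value.
worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))

# Refit without that observation.
xs_kept = [x for i, x in enumerate(xs) if i != worst]
ys_kept = [y for i, y in enumerate(ys) if i != worst]
a2, b2 = fit_line(xs_kept, ys_kept)
residuals2 = [y - (a2 + b2 * x) for x, y in zip(xs_kept, ys_kept)]

print(worst, (a, b), (a2, b2))
```

Flagging a point this way only identifies it as unusual; whether it may be dropped still depends on having an explanation for it.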
Unusual Observations
Need an explanation before a point can be removed from the data
- Often have small residuals
- Often occur for observations with extreme x values.
- Should the point be included?
Are they influential? How to decide?
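One possible way to decide, sketched on made-up numbers: refit the line without the suspect point and see how much the slope changes. This is an illustration, not necessarily the method intended in lecture. The last observation below has an extreme x value, yet its residual is small because it pulls the line toward itself.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical data; the last point has an extreme x value.
xs = [1, 2, 3, 4, 5, 20]
ys = [2, 3, 2, 3, 2, 12]

a_with, b_with = fit_line(xs, ys)
residuals = [y - (a_with + b_with * x) for x, y in zip(xs, ys)]

# Refit without the extreme-x point and compare slopes.
a_without, b_without = fit_line(xs[:-1], ys[:-1])

print("slope with point:   ", b_with)
print("slope without point:", b_without)
print("residual of extreme point:", residuals[-1])
```

A large change in slope when one point is removed suggests that the point is influential, even though its residual did not stand out.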
R² value = % of variation in y explained by the regression on x
i.e. Does x do a good job of telling us why y varies?
Example: For Manatees, R² = 88.6%, i.e. 88.6% of the variation in the number killed is explained by the regression on boat registrations
- Boat registrations do a lot to explain the increase in manatee deaths
- Prediction of manatees killed from number of boat registrations is reasonably accurate
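A sketch of what "% of variation explained" means, using made-up data (not the manatee data): R² is 1 minus the ratio of leftover variation (sum of squared residuals) to total variation in y, and for a least squares line it equals the square of the correlation coefficient. The last lines use that fact to back out the r implied by the notes' R² = 88.6%, taking the positive root because the association is positive.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)          # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = sxy / sxx
a = y_bar - b * x_bar
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation

r_squared = 1 - sse / syy
r = sxy / sqrt(sxx * syy)
print(r_squared, r ** 2)   # the two agree

# Correlation implied by the manatee R-squared of 88.6% (positive association).
r_manatees = sqrt(0.886)
print(round(r_manatees, 2))
```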
Cautions
- Correlation and regression describe only linear dependence
- Extrapolation: Be very cautious when predicting for values of the explanatory variable outside the range for which we have data
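Both cautions can be illustrated briefly. The first part uses made-up data in which y depends perfectly on x, but not linearly, so r comes out as zero. The second part plugs x = 0 into the manatee line from the notes; it is assumed here, as the negative intercept suggests, that x = 0 lies far outside the range of the data.

```python
from math import sqrt
from statistics import mean

def correlation(xs, ys):
    """Correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Caution 1: a perfect curved relationship (hypothetical data, y = x squared).
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curved = correlation(xs, ys)
print(r_curved)   # no linear association despite perfect dependence

# Caution 2: extrapolating the manatee line to zero registrations.
predicted_at_zero = -41.4 + 0.125 * 0
print(predicted_at_zero)   # a negative number of manatees killed is impossible
```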