Workshop Statistics: Discovery with Data and Fathom

Topic 10: Least Squares Regression I

Activity 10-1: Airfares

(a) Answers will vary from student to student. One might choose the mean airfare, $166.92.
(b) Answers will vary from student to student; examples include which airline you fly, the distance to the destination city, and how far in advance you book.
(c) Based on the scatterplot, knowing the distance would be useful because there appears to be a fairly strong association between distance and airfare.
(e) Answers will vary from student to student, but $130 is a good estimate.
(f) Answers will vary from student to student, but $260 is a good estimate.
(g) Answers will vary from student to student, but using our answers from (e) and (f), slope = (260 - 130)/(1500 - 300) = 130/1200 = 13/120 ≈ .10833.
(h) Answers will vary from student to student, but using our answers from (e) and (g), intercept = 130 - .10833(300) = 97.5.
(i) Answers will vary from student to student, but using our answers from (g) and (h), airfare = 97.5 + .108 * distance.
(j) Answers will vary from student to student, but here is a possible line:

        The equation for this line is airfare = 81 + .122 * distance

(k) Answers will vary from student to student.
(l) Answers will vary from student to student, but the sum of squared residuals for the above line is 14370.
(m) Answers will vary from student to student.
(n)

        airfare = 83 + .117 * distance is the equation for the Least-Squares line.  The sum of squared residuals for this line is 14310.
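The arithmetic in (g) through (i) and the comparison of sums of squared residuals in (l) and (n) can be checked with a short Python sketch. Two assumptions are involved: the eyeballed points from (e) and (f) are taken to be (300 miles, $130) and (1500 miles, $260), the distances used in the slope calculation; and the twelve (distance, airfare) pairs are recalled from the textbook's airfare data rather than listed in this key, so check them against your copy (they do reproduce the mean airfare of $166.92 and the correlation of .795 used later). The function names are hypothetical.

```python
# Assumed data (distance in miles, airfare in dollars); check against your copy.
DISTANCE = [576, 370, 612, 1216, 409, 1502, 946, 998, 189, 787, 210, 737]
AIRFARE = [178, 138, 94, 278, 158, 258, 198, 188, 98, 179, 138, 98]


def line_through(p1, p2):
    """Return (intercept, slope) of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)  # rise over run
    return y1 - slope * x1, slope  # solve y1 = a + b * x1 for a


def sum_sq_resid(a, b, xs, ys):
    """Sum of squared residuals (observed - fitted) for the line y = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Return (intercept, slope) of the line with the smallest sum of squared residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


a_eye, b_eye = line_through((300, 130), (1500, 260))  # eyeballed line from (g)-(i)
a_ls, b_ls = least_squares(DISTANCE, AIRFARE)         # least squares line from (n)

candidates = [
    ("eyeballed", a_eye, b_eye),
    ("hand-drawn", 81, 0.122),
    ("least squares", a_ls, b_ls),
]
for label, a, b in candidates:
    ssr = sum_sq_resid(a, b, DISTANCE, AIRFARE)
    print(f"{label:>13}: airfare = {a:.2f} + {b:.4f} * distance, SSR = {ssr:.1f}")
```

No line, eyeballed or hand-drawn, can have a smaller sum than the least squares line; that is what "least squares" means. Sums computed from rounded coefficients such as 81 + .122 may differ a little from what Fathom reports for a line it stores with more digits.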
 
 

Activity 10-2: Airfares (cont.)

(a)

                 mean     std. dev.
airfare (y)      166.9    59.5
distance (x)     713      403

correlation between distance and airfare: r = .795

(b) b =.795(59.5/403) = .117;  a = 166.9-.117(713) = 83.479
(c) airfare = 83.479 + .117 * distance; this agrees with Fathom's equation, except that Fathom rounds the intercept to 83.
(d) 83.479+.117(300) = $118.58
(e) $258.98
(f)

(g) Answers will vary from student to student, but a good estimate would be $190.
(h) $188.78
(i) $415.99; this is probably not a reliable estimate, because a distance of 2,842 miles lies well beyond the range of distances in our data set, so the prediction is an extrapolation.
(j)
distance (miles)    predicted airfare
900                 $188.78
901                 $188.90
902                 $189.01
903                 $189.13
(k) Each additional mile adds about $0.11 to $0.12 to the predicted airfare (exactly $0.117 before rounding to cents), which is the slope of our least squares regression line, .117.
(l) $11.70
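Parts (b) through (l) follow from two formulas, b = r(s_y/s_x) and a = ȳ - b·x̄, and from evaluating the resulting line. The Python sketch below uses the summary statistics from (a). Rounding the slope to three decimals before computing the intercept is what produces 83.479; keeping the unrounded slope (about .1174) gives an intercept near 83.2, and both round to Fathom's 83. The largest distance used to flag extrapolation (1,502 miles) is an assumption about the data set, and the 100-mile comparison assumes that is what (l) asks about, since $11.70 = 100 × .117.

```python
def ls_from_summary(x_mean, x_sd, y_mean, y_sd, r, slope_digits=3):
    """Least squares line from summary statistics: b = r * s_y / s_x, a = y_bar - b * x_bar."""
    b = round(r * y_sd / x_sd, slope_digits)  # round the slope first, as in the hand calculation
    return y_mean - b * x_mean, b


a, b = ls_from_summary(x_mean=713, x_sd=403, y_mean=166.9, y_sd=59.5, r=0.795)
MAX_DISTANCE = 1502  # assumed largest distance in the data set; check your copy


def predict_airfare(distance):
    """Predicted airfare (dollars) for a flight of the given distance (miles)."""
    return a + b * distance


print(f"airfare = {a:.3f} + {b} * distance")
for miles in (300, 1500, 900, 2842):
    flag = "  (extrapolation)" if miles > MAX_DISTANCE else ""
    print(f"{miles:>5} miles -> ${predict_airfare(miles):.2f}{flag}")

# Consecutive predictions differ by exactly the slope; the cents shown vary only through rounding
for miles in range(901, 904):
    step = predict_airfare(miles) - predict_airfare(miles - 1)
    print(f"{miles} miles -> ${predict_airfare(miles):.2f} (+{step:.3f})")

print(f"100 extra miles -> +${predict_airfare(1000) - predict_airfare(900):.2f}")
```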
 

Activity 10-3: Airfares (cont.)

(a) $150.87
(b) 178-150.87 = $27.13
(c) Atlanta: fitted value $150.87, residual $27.13; Boston: residual $11.30; Chicago: fitted value $155.10
(d) St. Louis has the largest residual in absolute value: distance 737 miles, airfare $98, residual -$71.77 (the line overestimates its airfare by $71.77).

(e) greater
(f) below
(g) $111.08
(h) 4: Atlanta, Detroit, Pittsburgh, St. Louis
(i) Most cities (all but the four listed in (h)) have a residual smaller in absolute value than their deviation from the mean. This suggests that predictions from the regression line are generally better than predicting the mean airfare, because the least squares line takes the explanatory variable (distance) into account.
(j) sum of squared residuals: 14,308.09; sum of squared deviations from the overall mean: 38,882.92 (both in squared dollars)
(k) Answers will vary from student to student, but here is a good example:

(l) 14308.09/38882.92 = .368, 1 - .368 = .632
(m) r = .795, so r² = (.795)² ≈ .632, the same value as 1 - .368 in (l). This is the proportion of variability in the response variable (airfare) that is explained by the regression model.
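The residual calculations in (a) through (d) and the variability comparison in (h) through (m) can be reproduced with the Python sketch below. The city data are an assumption, recalled from the textbook rather than listed in this key; they reproduce the key's values for Boston, Chicago, and St. Louis and its sum of squared deviations (38,882.92). Because the sketch keeps full-precision coefficients (about 83.27 and .1174) instead of 83.479 and .117, Atlanta's fitted value comes out near $150.88 rather than $150.87, and the residual sum of squares lands within about one unit of 14,308.09; the ratio .368 and r² = .632 are unchanged.

```python
# Assumed data: city -> (distance in miles, airfare in dollars); check against your copy.
FLIGHTS = {
    "Atlanta": (576, 178), "Boston": (370, 138), "Chicago": (612, 94),
    "Dallas/Fort Worth": (1216, 278), "Detroit": (409, 158), "Denver": (1502, 258),
    "Miami": (946, 198), "New Orleans": (998, 188), "New York": (189, 98),
    "Orlando": (787, 179), "Pittsburgh": (210, 138), "St. Louis": (737, 98),
}

n = len(FLIGHTS)
x_bar = sum(d for d, _ in FLIGHTS.values()) / n
y_bar = sum(f for _, f in FLIGHTS.values()) / n
sxx = sum((d - x_bar) ** 2 for d, _ in FLIGHTS.values())
sxy = sum((d - x_bar) * (f - y_bar) for d, f in FLIGHTS.values())
slope = sxy / sxx
intercept = y_bar - slope * x_bar

fitted = {c: intercept + slope * d for c, (d, _) in FLIGHTS.items()}
residual = {c: f - fitted[c] for c, (_, f) in FLIGHTS.items()}  # observed - fitted
deviation = {c: f - y_bar for c, (_, f) in FLIGHTS.items()}     # observed - mean

worst = max(residual, key=lambda c: abs(residual[c]))
print(f"largest residual: {worst}, {residual[worst]:+.2f}")

# Cities where the line misses by more than the mean does, as in (h)
print("residual larger than deviation:",
      [c for c in FLIGHTS if abs(residual[c]) > abs(deviation[c])])

ss_resid = sum(e ** 2 for e in residual.values())
ss_total = sum(e ** 2 for e in deviation.values())
r = sxy / (sxx * ss_total) ** 0.5
print(f"SS resid = {ss_resid:.2f}, SS total = {ss_total:.2f}")
print(f"1 - SS resid / SS total = {1 - ss_resid / ss_total:.3f}, r^2 = {r ** 2:.3f}")
```

For a least squares line the two printed proportions are always equal, which is why r² can be read as the fraction of variability explained.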
 

Activity 10-4: College Tuitions (cont.)

(a)

public: tuition = -13,138 + 9.59 * founded


private: tuition = 84,719 - 37.1 * founded

(b) public: r² = .257; private: r² = .255. These two values of r² are very similar.
(c) The line on the private college scatterplot appears to do a better job of summarizing the relationship between tuition and founding year.  The points follow the linear relationship much more closely.
(d) For a college founded in 1900: public, $5,083; private, $14,229. Judging from the scatterplots, the private school prediction seems more reasonable because, around 1900, the points fall closer to the line on the private college scatterplot.
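The predictions in (d) come from plugging a founding year of 1900 into the two equations from (a). A minimal Python check follows; the dictionary layout and function name are just for illustration.

```python
# Regression equations from (a): tuition = intercept + slope * founded
LINES = {"public": (-13138, 9.59), "private": (84719, -37.1)}


def predict_tuition(kind, founded):
    """Predicted tuition (dollars) for a college of the given kind and founding year."""
    intercept, slope = LINES[kind]
    return intercept + slope * founded


for kind in LINES:
    print(f"{kind}: ${predict_tuition(kind, 1900):,.0f}")
```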