Stat 414 – Computer Problem 1

Due Friday, midnight, Oct. 20


This is mostly older material, but you will need to figure out much more of the R code on your own. You may work together on this problem and submit one file with both names on it.

Suppose we have data on X = the number of supervised workers and Y = the number of supervisors for 27 industrial establishments.  Data are in supvis.txt.

(a) Fit a weighted regression using 1/num.workers^2 (that is, 1/X^2) as the weights.
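As a rough sketch of the mechanics only: lm() takes a weights argument, and the weights are evaluated inside the data frame. The data frame sim below is simulated and purely hypothetical (it is not supvis.txt), and the name wls is an assumption made for illustration.

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

# Weighted least squares: each weight is proportional to 1 / Var(Y | X),
# here assuming the error SD grows in proportion to X.
wls <- lm(Y ~ X, data = sim, weights = 1 / X^2)

# Coefficient table (estimates, standard errors, t values, p-values)
summary(wls)$coefficients
```

Observations with large X get small weights, so the fit is pulled toward the low-variance points.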

(b) Examine the vcov(model1) output and confirm that the standard errors of the coefficients are the square roots of the diagonal values of the vcov matrix.
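One possible way to run this check, sketched on simulated data: pull the diagonal of vcov(), take square roots, and compare with the Std. Error column from summary(). The objects sim and fit are hypothetical stand-ins, not the course data or the model you fit above.

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

fit <- lm(Y ~ X, data = sim)

# Diagonal of the variance-covariance matrix = variances of the estimates
se_from_vcov <- sqrt(diag(vcov(fit)))

# Standard errors as reported in the coefficient table
se_from_summary <- summary(fit)$coefficients[, "Std. Error"]

# Side-by-side comparison and a numerical equality check
cbind(se_from_vcov, se_from_summary)
all.equal(se_from_vcov, se_from_summary)
```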

(c) Suppose I plan to hire 100 more workers. How many more supervisors should I hire?

(d) Produce an appropriate residual plot to decide whether this model adequately addresses the heterogeneity in the data.
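For a weighted fit, the raw residuals still fan out even when the weights are right, so a plot of weighted or standardized residuals is usually more informative. A minimal sketch on simulated data follows (sim and wls are hypothetical names, and base graphics are used only to keep the sketch free of extra packages):

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

wls <- lm(Y ~ X, data = sim, weights = 1 / X^2)

# Weighted residuals are sqrt(w) * raw residuals
wres <- weighted.residuals(wls)

# Standardized residuals also account for the weights (and leverage)
sres <- rstandard(wls)

# If the weights capture the heterogeneity, the vertical spread should
# look roughly constant across the fitted values.
plot(fitted(wls), sres,
     xlab = "Fitted values", ylab = "Standardized residuals")
abline(h = 0, lty = 2)
```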

The following code is what we used before to create prediction bands.

library(ggplot2)

#order the x-variable
supvis2 <- supvis[with(supvis, order(X)),]

#fit the unweighted model and get prediction intervals at the ordered x-values
model1 <- lm(Y ~ X, data = supvis)
pred1 <- predict(model1, newdata = data.frame(X = supvis2$X), interval = "prediction")

#combine the ordered data with the fit, lwr, and upr columns
alldata <- cbind(supvis2, pred1)

ggplot(alldata, aes(x = X, y = Y)) + #define x and y axis variables
  geom_point() + #add scatterplot points
  geom_line(aes(y = lwr), col = "coral2", linetype = "dashed") + #lwr pred interval
  geom_line(aes(y = upr), col = "coral2", linetype = "dashed") #upr pred interval



(e) Create graphs of prediction bands for model 1 (above), model 2 (using 1/X^2 as weights), and model 3 (using a log-log transformation).


·       For model 2, make sure you specify the weights in the predict command as well.

·       For model 3, make sure the graph is on the same scale (the original X and Y) as the other models.

Remember to include all relevant code!
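The two hints above can be sketched as follows, again on simulated data. Everything named here (sim, grid, wls, loglog, pred_wls, pred_log) is hypothetical, and base graphics are used in place of ggplot only to keep the sketch free of extra packages. predict() for lm objects accepts a weights argument for the prediction variance, and exp() undoes the log on the response.

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

# Ordered x-values at which to draw the bands
grid <- data.frame(X = sort(sim$X))

# Weighted model: pass matching weights to predict() so the interval
# width reflects the assumed variance at each new X
wls <- lm(Y ~ X, data = sim, weights = 1 / X^2)
pred_wls <- predict(wls, newdata = grid, interval = "prediction",
                    weights = 1 / grid$X^2)

# Log-log model: intervals come back on the log(Y) scale,
# so exponentiate to return to the original Y scale
loglog <- lm(log(Y) ~ log(X), data = sim)
pred_log <- exp(predict(loglog, newdata = grid, interval = "prediction"))

# Both sets of bands drawn on the original X and Y axes
plot(Y ~ X, data = sim)
matlines(grid$X, pred_wls[, c("lwr", "upr")], lty = 2, col = "coral2")
matlines(grid$X, pred_log[, c("lwr", "upr")], lty = 3, col = "steelblue")
```

Without the weights argument, predict() on a weighted fit warns that it is assuming constant prediction variance.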


(f) Based on the three plots, which model(s) would you recommend, and why?

(g) Apply the Huber-White (heteroscedasticity-consistent) adjustment to the standard errors for model 1:



library(sandwich) #provides vcovHC

library(lmtest) #provides coeftest

vcovHC(model1, type = "HC1")

coeftest(model1, vcov = vcovHC(model1, type = "HC1"))

How do the standard errors of the estimated coefficients change? Does the statistical significance of the variable change?
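To see what the adjustment is doing, here is a hand computation of the sandwich estimator in base R, as a sketch on simulated data. The names sim, fit, hc0, and hc1 are hypothetical; the HC1 matrix built this way should agree with what vcovHC reports for type = "HC1" on the same model.

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

fit  <- lm(Y ~ X, data = sim)
Xmat <- model.matrix(fit)   # n x k design matrix
e    <- resid(fit)          # ordinary residuals
n    <- nrow(Xmat)
k    <- ncol(Xmat)

bread <- solve(crossprod(Xmat))   # (X'X)^{-1}
meat  <- crossprod(Xmat * e)      # X' diag(e^2) X

hc0 <- bread %*% meat %*% bread   # White's original estimator
hc1 <- hc0 * n / (n - k)          # degrees-of-freedom correction

# Classical vs. robust standard errors, side by side
cbind(classical = sqrt(diag(vcov(fit))), HC1 = sqrt(diag(hc1)))
```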

(h) Repeat (g) for model 2. Summarize what the comparison tells you.


Other HC variations

HC0: White’s original suggestion

HC1: corrects for the degrees of freedom

HC2: uses leverage-adjusted residuals

HC3: gives less weight to influential observations

HC4: controls how much the leverage values are weighted (2004)

Include your output and discuss any differences in the results.

Also note what vcovHC(model1, type="const") does.
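As a point of comparison, the constant-variance estimate can be built by hand in base R and checked against vcov(). This is only a sketch on simulated data, with hypothetical names sim, fit, and v_const; compare it yourself with the type = "const" output.

```r
# Hypothetical stand-in for the course data: simulated, NOT supvis.txt.
set.seed(414)
sim <- data.frame(X = round(runif(27, 200, 1600)))
sim$Y <- 15 + 0.12 * sim$X + rnorm(27, sd = 0.03 * sim$X)

fit  <- lm(Y ~ X, data = sim)
Xmat <- model.matrix(fit)
e    <- resid(fit)

# Usual estimate of the error variance: RSS / (n - k)
sigma2 <- sum(e^2) / (nrow(Xmat) - ncol(Xmat))

# Constant-variance covariance matrix: sigma^2 * (X'X)^{-1}
v_const <- sigma2 * solve(crossprod(Xmat))

# Should match the classical matrix that summary() is based on
all.equal(v_const, vcov(fit))
```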