Stat 414 – Computer Problem 1
Due Friday, midnight, Oct. 20
This is mostly older material, but you will need to figure out much more of the R code on your own.
You may work with a partner on this problem and submit one file with both names on it.
Suppose we have data on X = the number of supervised workers and Y = the number of supervisors for 27 industrial establishments. Data are in supvis.txt.
(a) Fit a weighted regression using 1/num.workers^2 (that is, 1/X^2) as the weights.
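As a rough sketch of how the weights argument of lm() is used for a fit like this: the data frame toy below is simulated purely for illustration (it is not the supvis.txt data), and the names toy and wfit are hypothetical.

```r
# Simulated stand-in data (NOT supvis.txt): the spread of Y grows with X
set.seed(414)
toy <- data.frame(X = seq(200, 1600, length.out = 27))
toy$Y <- 0.12 * toy$X * exp(rnorm(27, sd = 0.2))

# Weighted least squares: each case gets weight 1/X^2, which corresponds
# to assuming the error standard deviation is proportional to X
wfit <- lm(Y ~ X, data = toy, weights = 1 / X^2)
summary(wfit)
```

Because weights is evaluated inside data, the expression 1 / X^2 can refer to the column X directly.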
(b) Examine the vcov(model1) output and confirm that the standard errors of the coefficients are the square roots of the diagonal values of the vcov matrix.
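One possible way to make this comparison is sketched below, again on simulated stand-in data rather than supvis.txt. The object names (toy, wfit, se_from_vcov, se_from_summary) are hypothetical.

```r
# Simulated stand-in data (NOT supvis.txt)
set.seed(414)
toy <- data.frame(X = seq(200, 1600, length.out = 27))
toy$Y <- 0.12 * toy$X * exp(rnorm(27, sd = 0.2))
wfit <- lm(Y ~ X, data = toy, weights = 1 / X^2)

# Square roots of the diagonal of the estimated covariance matrix
se_from_vcov <- sqrt(diag(vcov(wfit)))

# Standard errors as reported in the coefficient table
se_from_summary <- coef(summary(wfit))[, "Std. Error"]

cbind(se_from_vcov, se_from_summary)
all.equal(se_from_vcov, se_from_summary)
```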
(c) Suppose I plan to hire 100 more workers. How many more supervisors should I hire?
(d) Produce an appropriate residual plot to decide whether this model adequately addresses the heterogeneity in the data.
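For a weighted fit, the raw residuals still carry the unequal spread, so a plot is usually based on residuals rescaled by the square root of the weights. The sketch below shows one way to get them, using simulated stand-in data (not supvis.txt), hypothetical object names, and base graphics rather than ggplot2 to keep it short.

```r
# Simulated stand-in data (NOT supvis.txt)
set.seed(414)
toy <- data.frame(X = seq(200, 1600, length.out = 27))
toy$Y <- 0.12 * toy$X * exp(rnorm(27, sd = 0.2))
wfit <- lm(Y ~ X, data = toy, weights = 1 / X^2)

raw_res <- resid(wfit)               # observed minus fitted
w_res   <- weighted.residuals(wfit)  # sqrt(weight) * raw residual

# If the weights are appropriate, the weighted residuals should show
# roughly constant spread across the fitted values
plot(fitted(wfit), w_res,
     xlab = "Fitted values", ylab = "Weighted residuals")
abline(h = 0, lty = 2)
```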
The following code is what we used before to create prediction bands.
library(ggplot2)
# assumes the data frame supvis (columns X and Y) is already loaded
# order the data by the x-variable
supvis2 <- supvis[with(supvis, order(X)), ]
model1 <- lm(Y ~ X, data = supvis)
pred1 <- predict(model1, newdata = data.frame(X = supvis2$X), interval = "prediction")
alldata <- cbind(supvis2, pred1)
ggplot(alldata, aes(x = X, y = Y)) + # define x and y axis variables
  geom_point() + # add scatterplot points
  geom_line(aes(y = lwr), col = "coral2", linetype = "dashed") + # lwr pred interval
  geom_line(aes(y = upr), col = "coral2", linetype = "dashed") + # upr pred interval
  theme_bw()
(e) Create graphs of prediction bands for model 1 (above), model 2 (using 1/X^2 as weights), and model 3 (using a log-log transformation).
Hints:
· For model 2, make sure you specify the weights in the predict command as well.
· For model 3, make sure the graph is on the same scale (X and Y) as the other models.
Remember to include all relevant code!
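The two hints can be sketched as follows. This is only an illustration on simulated stand-in data (not supvis.txt), and the names toy, new, wfit, lfit, pred_w, and pred_l are hypothetical. predict() for lm objects accepts a weights argument for prediction intervals, and exponentiating the log-scale limits is one way to return a log-log fit to the original Y scale.

```r
# Simulated stand-in data (NOT supvis.txt)
set.seed(414)
toy <- data.frame(X = seq(200, 1600, length.out = 27))
toy$Y <- 0.12 * toy$X * exp(rnorm(27, sd = 0.2))

new <- data.frame(X = sort(toy$X))  # x values, in order, for drawing bands

# Weighted fit: pass matching weights to predict() so the prediction
# variance is allowed to change with X
wfit <- lm(Y ~ X, data = toy, weights = 1 / X^2)
pred_w <- predict(wfit, newdata = new, interval = "prediction",
                  weights = 1 / new$X^2)

# Log-log fit: the limits come back on the log(Y) scale, so exp()
# converts fit, lwr, and upr back to the original Y scale
lfit <- lm(log(Y) ~ log(X), data = toy)
pred_l <- exp(predict(lfit, newdata = new, interval = "prediction"))

head(cbind(new, pred_w))
head(cbind(new, pred_l))
```

Both results have columns fit, lwr, and upr, so they can be bound to the ordered data and plotted against X and Y in the same way as the unweighted bands.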
(f) Based on the 3 plots, which model(s) would you recommend, and why?
(g) Apply the Huber-White adjustment (Heteroscedasticity-Consistent) to the standard errors for model 1:
library(lmtest)   # provides coeftest()
library(sandwich) # provides vcovHC()
vcovHC(model1, type = "HC1")
coeftest(model1, vcov = vcovHC(model1, type = "HC1"))
How do the standard errors of the estimated coefficients change? Does the statistical significance of the variable change?
(h) Repeat (g) for model 2. Summarize what the comparison tells you.
Other HC variations:
HC0: White’s original suggestion
HC1: corrects for the degrees of freedom
HC2: uses leverage-adjusted residuals
HC3: gives less weight to influential observations
HC4: controls how much the leverage values are weighted (2004)
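To make the first four variants concrete, the sketch below computes them by hand in base R for an unweighted fit, using the standard sandwich form (X'X)^{-1} X' diag(omega) X (X'X)^{-1}. The data are simulated stand-ins (not supvis.txt) and all object names are hypothetical. The results are expected to agree with vcovHC() from the sandwich package for the corresponding type, but that comparison is left out so the block runs without extra packages.

```r
# Simulated stand-in data (NOT supvis.txt)
set.seed(414)
toy <- data.frame(X = seq(200, 1600, length.out = 27))
toy$Y <- 0.12 * toy$X * exp(rnorm(27, sd = 0.2))
fit <- lm(Y ~ X, data = toy)

Xmat  <- model.matrix(fit)
e     <- resid(fit)
h     <- hatvalues(fit)            # leverage values
n     <- nrow(Xmat)
k     <- ncol(Xmat)
bread <- solve(crossprod(Xmat))    # (X'X)^{-1}

# Helper: sandwich estimate for a given vector of rescaled residuals r,
# where crossprod(Xmat * r) equals X' diag(r^2) X
sandwich_by_hand <- function(r) bread %*% crossprod(Xmat * r) %*% bread

hc0 <- sandwich_by_hand(e)                # squared residuals as-is
hc1 <- hc0 * n / (n - k)                  # degrees-of-freedom correction
hc2 <- sandwich_by_hand(e / sqrt(1 - h))  # e^2 / (1 - h)
hc3 <- sandwich_by_hand(e / (1 - h))      # e^2 / (1 - h)^2

cbind(usual = sqrt(diag(vcov(fit))),
      HC0 = sqrt(diag(hc0)), HC1 = sqrt(diag(hc1)),
      HC2 = sqrt(diag(hc2)), HC3 = sqrt(diag(hc3)))
```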
Include your output and discuss any differences in the results.
Also note what vcovHC(model1, type = "const") does.