Stat 414 – HW 6
Due midnight, Wednesday Nov 13
1) Consider a basic random slopes model:
,
with
(a) Derive the expression for .
(Hint: Var(aX + bZ) = a^2 Var(X) + b^2 Var(Z) + 2ab Cov(X, Z))
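The identity in the hint also holds exactly for sample variances and covariances, so it can be checked numerically. The following is a minimal sketch in base R; the vectors x and z and the constants a and b are made up for illustration and are not part of the assignment data.

```r
# Hypothetical data: any two numeric vectors of equal length will do.
set.seed(414)
x <- rnorm(50)
z <- 0.5 * x + rnorm(50)   # constructed so that x and z are correlated
a <- 2
b <- -3

# Left-hand side: variance of the linear combination.
lhs <- var(a * x + b * z)

# Right-hand side: the expansion given in the hint.
rhs <- a^2 * var(x) + b^2 * var(z) + 2 * a * b * cov(x, z)

# The two sides agree up to floating-point rounding.
all.equal(lhs, rhs)
```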
(b) Derive the expression for , the covariance between two observations
in the same Level 2 group.
Below is output for the model with random
slopes for .
(c) Use this output to find the estimated value for . Show the details!
(d) Calculate the estimated variance of
an observation () with NAP = 0.045. Show the
details!
(e) Calculate the estimated covariance of 2 observations in the same beach ( and ) with NAP = 0.045 and NAP = -1.036. Show the details!
(f) Calculate the estimated correlation
for the 2 observations in (e). Show the details!
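One way to organize the arithmetic in (d) through (f) is sketched below in base R. It assumes the usual random slopes setup: a random intercept and a random slope with a 2 x 2 covariance matrix, plus an independent Level 1 residual. The names tau0sq, tau1sq, tau01, and sigma2 are labels chosen here, and the numbers are hypothetical placeholders, not the values from the output; substitute the estimates you read off yourself.

```r
# Hypothetical variance components (NOT taken from the assignment output).
tau0sq <- 4.0    # random-intercept variance
tau1sq <- 1.0    # random-slope variance
tau01  <- -0.5   # intercept-slope covariance
sigma2 <- 2.0    # residual (Level 1) variance

# Covariance matrix of the random effects (intercept, slope).
Tmat <- matrix(c(tau0sq, tau01,
                 tau01,  tau1sq), nrow = 2, byrow = TRUE)

# Covariance of two different observations in the same group with
# predictor values x1 and x2. They do not share a residual, so sigma2
# does not appear here.
cov_same_group <- function(x1, x2) {
  z1 <- c(1, x1)
  z2 <- c(1, x2)
  as.numeric(t(z1) %*% Tmat %*% z2)
}

# Variance of a single observation: the same quadratic form plus sigma2.
var_obs <- function(x) cov_same_group(x, x) + sigma2

x1 <- 0.045
x2 <- -1.036
corr <- cov_same_group(x1, x2) / sqrt(var_obs(x1) * var_obs(x2))

c(var1 = var_obs(x1), var2 = var_obs(x2),
  cov = cov_same_group(x1, x2), corr = corr)
```

Writing the calculation as a quadratic form in (1, x) keeps the variance and covariance formulas consistent with each other, which makes it easier to check a hand calculation.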
2) Recall that the 2006 Programme for International Student Assessment (PISA), organized by the OECD, gathered data on 15- to 16-year-old children from schools across Australia.
Variables in the dataset included
· z_read, a standardized
reading score
· cen_pos, parental occupational
status, a 64-category ordinal score (the lowest value represents low
occupational status), which is centered and we will treat as quantitative
· cen_escs: the centered PISA index
of economic, social, and cultural status, a composite index created from items
on the student questionnaire, including parental occupation and education as
well as possessions within the home (e.g., books, computers, areas for children
to study).
· female, an indicator variable
with 1 = female and 0 = male
ReadingScores = read.table("https://www.rossmanchance.com/stat414/data/ReadingScores.txt", header = TRUE)
head(ReadingScores)
When we fit the OLS lines, a few schools
stood out as behaving differently from the rest.
(a) According to the above graph,
guesstimate where the variation in the standardized reading scores is the
smallest. Briefly explain.
Here is the multilevel
model output
(b) According to this model, for
what values of cen_escs is the variation in standardized reading scores
minimized? (Check your calculation
against the graph of the fitted model.)
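In a random slopes model the total variance is a quadratic function of the predictor, so its minimum can be located either with calculus or numerically. The sketch below is in base R, assuming a single random slope; the variance components are hypothetical placeholders with names chosen here, not the estimates from the model output.

```r
# Hypothetical variance components (NOT the values from the model output).
tau0sq <- 0.30    # random-intercept variance
tau1sq <- 0.04    # random-slope variance
tau01  <- -0.02   # intercept-slope covariance
sigma2 <- 0.60    # residual variance

# Total variance of an observation as a function of the predictor x.
total_var <- function(x) tau0sq + 2 * tau01 * x + tau1sq * x^2 + sigma2

# Setting the derivative 2 * tau01 + 2 * tau1sq * x to zero gives the vertex.
x_min <- -tau01 / tau1sq

# Numerical check of the same minimum over a plausible range of x.
x_num <- optimize(total_var, interval = c(-5, 5))$minimum

c(analytic = x_min, numeric = x_num)
```

Plotting total_var over the observed range of the predictor, for example with curve(total_var, -3, 3), gives a quick visual check against the fitted-model graph.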
(c) Create models 3 and 3b with female as the only predictor variable to decide whether or not to use random slopes on female. Include your R code and the most relevant output. Which model do you recommend? Justify your answer.
(d) According to model 3b, which group
(males or females) is predicted to have more variability in their standardized
reading scores? Explain. Is this consistent with the data? (Include a graph?)
(e) Fit a model with cen_pos, cen_escs,
and female, with random slopes on cen_escs and female.
model4 = lmer(z_read ~ cen_pos + cen_escs + female + (1 + cen_escs + female | schoolid),
              data = ReadingScores, REML = FALSE)
summary(model4)
How many parameter estimates are there
in this model? Interpret the variance/covariance parameter estimates for this
model in context.
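As a bookkeeping aid for counting parameter estimates, the sketch below tallies the pieces of a Gaussian mixed model with one grouping factor and an unstructured random-effects covariance matrix. The function count_params is a hypothetical helper written for this illustration, and the example call uses a smaller model than model4.

```r
# Hypothetical helper: count parameter estimates in a Gaussian mixed model
# with one grouping factor and an unstructured random-effects covariance.
#   p_fixed  = number of fixed-effect coefficients (including the intercept)
#   q_random = number of random effects per group (including the intercept)
count_params <- function(p_fixed, q_random) {
  # A q x q symmetric covariance matrix has q variances and
  # q * (q - 1) / 2 covariances, i.e. q * (q + 1) / 2 unique entries.
  n_cov <- q_random * (q_random + 1) / 2
  c(fixed = p_fixed, random_cov = n_cov, residual = 1,
    total = p_fixed + n_cov + 1)
}

# Example: an intercept and one slope, both fixed and random.
count_params(p_fixed = 2, q_random = 2)
```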
3) Three-level model: For the achieve.txt data set, there were 10,903 third-grade students nested within 568 classrooms nested within 160 schools.
achieve = read.table("https://www.rossmanchance.com/stat414F20/data/Achieve.txt", header = TRUE)
(a) Fit the “unconditional means” (null)
model, putting the ‘higher level’ first to see how much variation is at each
level. How many parameters are in this
model?
#library(lme4)
summary(model0 <- lmer(geread ~ (1 | school/class), data = achieve))
confint(model0)
Alternatively:
#library(nlme)
#summary(lme(geread ~ 1, random = ~1 | school/class, data = achieve))
#intervals
(b) Write out the model equation and
interpret the parameters.
(c) Do the reading scores appear to differ significantly across the classes? Across the schools? Justify your answers.
(d) What is the “total variance” in reading scores estimated by this model?
(e) How much of the variance is explained at each level?
(f) How correlated are two students in the same class (in the same school)? How correlated are two students in the same school (but different classes)? (Hints: To answer these questions, think about what they share; the covariance will involve the variance terms they have in common. Alternative reference.)
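The hint can be turned into a short calculation once the three variance components are in hand. The sketch below is in base R and uses hypothetical placeholder values with names chosen here, not the estimates from model0; it assumes the null model with random intercepts for school and for class within school.

```r
# Hypothetical variance components (NOT the estimates from model0).
s2_school <- 0.30   # between-school variance
s2_class  <- 0.20   # between-class (within-school) variance
s2_resid  <- 1.50   # student-level residual variance

total <- s2_school + s2_class + s2_resid

# Share of the total variance at each level.
prop <- c(school = s2_school, class = s2_class, student = s2_resid) / total

# Two students in the same class share both the school and class effects.
corr_same_class <- (s2_school + s2_class) / total

# Two students in the same school but different classes share only the
# school effect.
corr_same_school_diff_class <- s2_school / total

round(prop, 3)
c(same_class = corr_same_class,
  same_school_diff_class = corr_same_school_diff_class)
```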
(g) Add student vocabulary score (gevocab), number of students per class (clenroll), and number of students in the school (cenroll) to the model. (Include your R code and relevant output.)
(h) Identify these as student, class, or school
level variables.
(i) Is this a better fitting model? Are any of
these variables significant? How do we interpret the signs of the fixed effects
coefficients? (Do they behave as you would have expected?)
(j) How did the variance components change? (Calculate the percentage reduction for each.) Are they still statistically significant? What does that tell you? How do we interpret them?
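The percentage reduction compares each variance component in the new model with the same component in the null model. A minimal base R sketch follows; pct_reduction is a hypothetical helper written for this illustration, and both sets of variance components are made-up placeholders rather than results from the achieve data.

```r
# Hypothetical helper: percentage reduction in a variance component
# when moving from the null model to a model with predictors.
pct_reduction <- function(null_var, new_var) {
  100 * (null_var - new_var) / null_var
}

# Hypothetical variance components (NOT results from the achieve data).
null_vc <- c(school = 0.30, class = 0.20, residual = 1.50)
full_vc <- c(school = 0.24, class = 0.15, residual = 1.20)

round(pct_reduction(null_vc, full_vc), 1)
```

A negative value would mean the component increased after adding predictors, which can happen in multilevel models and is worth commenting on if it occurs.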
(k) How would you interpret the following models? (Hints: What’s random? What is/is not correlated?)
lmer(geread ~ gevocab + gender + (1 | school) + (gender | class), data = achieve)
lmer(geread ~ gevocab + gender + (-1 + gender | school) + (1 | class), data = achieve)