Stat 414 – HW 6

Due midnight, Wednesday Nov 13

 

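1) Consider a basic random slopes model:

Y_ij = β0 + β1 x_ij + u_0j + u_1j x_ij + ε_ij,

with ε_ij ~ N(0, σ²) independent of (u_0j, u_1j), where Var(u_0j) = τ0², Var(u_1j) = τ1², and Cov(u_0j, u_1j) = τ01.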

(a) Derive the expression for the variance of an observation.  (Hint: Var(aX + bZ) = a^2 Var(X) + b^2 Var(Z) + 2ab Cov(X, Z))

(b) Derive the expression for the covariance between two observations in the same Level 2 group.
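As a quick sanity check on the hint in part (a), and not as a substitute for the derivation, the identity can be verified numerically in R. The constants a and b and the simulated vectors x and z below are arbitrary values chosen for illustration only. The identity also holds for sample variances and covariances, so the two sides should agree up to rounding error.

```r
# Numerical check of Var(aX + bZ) = a^2 Var(X) + b^2 Var(Z) + 2ab Cov(X, Z).
# All values below are arbitrary illustrations, not taken from the assignment.
set.seed(414)
a <- 2
b <- -3
x <- rnorm(1000)
z <- 0.5 * x + rnorm(1000)   # z is deliberately correlated with x

lhs <- var(a * x + b * z)
rhs <- a^2 * var(x) + b^2 * var(z) + 2 * a * b * cov(x, z)

# Compare the two sides up to floating-point tolerance
all.equal(lhs, rhs)
```

Dropping the covariance term from rhs makes the comparison fail whenever x and z are correlated, which is why that term matters when a random intercept and a random slope are correlated.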

 

Below is output for the model with random slopes for NAP.

[Image: model output (not reproduced here)]

(c) Use this output to find the estimated value for .  Show the details!

(d) Calculate the estimated variance of an observation with NAP = 0.045. Show the details!

(e) Calculate the estimated covariance of two observations on the same beach, one with NAP = 0.045 and the other with NAP = -1.036. Show the details!

(f) Calculate the estimated correlation for the 2 observations in (e). Show the details!
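The arithmetic linking covariances and correlations in these parts can be sketched in R. Every number below is a hypothetical placeholder, not a value read from the output, and the variable names are made up for illustration. The second half assumes the output reports a correlation between the random effects alongside their standard deviations, as lmer summaries typically do.

```r
# Convert a covariance and two variances into a correlation.
# The three numbers are hypothetical placeholders, not values from the output.
var_obs1 <- 12.0   # hypothetical estimated variance of the first observation
var_obs2 <- 20.0   # hypothetical estimated variance of the second observation
cov_obs  <- 6.0    # hypothetical estimated covariance between them

corr_obs <- cov_obs / sqrt(var_obs1 * var_obs2)
corr_obs

# Going the other way: a reported correlation between two random effects
# becomes a covariance once multiplied by the two standard deviations.
sd_int   <- 3.5    # hypothetical SD of the random intercepts
sd_slope <- 1.7    # hypothetical SD of the random slopes
corr_re  <- -0.99  # hypothetical correlation between them
cov_re   <- corr_re * sd_int * sd_slope
cov_re
```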

 

2) Recall that the 2006 Programme for International Student Assessment (PISA), organized by the OECD, gathered data on 15- to 16-year-old children from schools across Australia. Variables in the dataset included:

·       z_read, a standardized reading score

·       cen_pos, parental occupational status, a 64-category ordinal score (the lowest value represents low occupational status), which is centered and we will treat as quantitative

·       cen_escs: the centered PISA index of economic, social, and cultural status, a composite index created from items on the student questionnaire, including parental occupation and education as well as possessions within the home (e.g., books, computers, areas for children to study).

·       female, an indicator variable with 1 = female and 0 = male

ReadingScores = read.table("https://www.rossmanchance.com/stat414/data/ReadingScores.txt", header=TRUE)

head(ReadingScores)

 

When we fit the OLS lines, a few schools stood out as behaving differently from the rest. 

[Figure: fitted OLS lines for each school (not reproduced here)]

(a) According to the above graph, guesstimate where the variation in the standardized reading scores is the smallest. Briefly explain.

 

Here is the multilevel model output:

[Image: multilevel model output, with a graph of the fitted model (not reproduced here)]

(b) According to this model, for what value of cen_escs is the variation in standardized reading scores minimized?  (Check your calculation against the graph of the fitted model.)

 

(c) Create models 3 and 3b with female as the only predictor variable to decide whether or not to use random slopes on female. Include your R code and the most relevant output. Which model do you recommend? Justify your answer.

(d) According to model 3b, which group (males or females) is predicted to have more variability in their standardized reading scores? Explain. Is this consistent with the data? (Include a graph?)
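For part (c), one common way to compare a random-intercepts model with a random-slopes model is a likelihood ratio test. The sketch below assumes the lme4 package is installed and uses its built-in sleepstudy data as a stand-in, so the object names m_int, m_slope, and cmp are illustrative rather than the models requested above. For the homework, the variables would be swapped for z_read, female, and schoolid in ReadingScores.

```r
# Compare random-intercept and random-slope models with a likelihood ratio test.
# Assumption: lme4 is installed. sleepstudy (shipped with lme4) is a stand-in
# data set, not the ReadingScores data used in the assignment.
library(lme4)

# Random intercepts only
m_int <- lmer(Reaction ~ Days + (1 | Subject),
              data = sleepstudy, REML = FALSE)

# Random intercepts and random slopes for Days
m_slope <- lmer(Reaction ~ Days + (1 + Days | Subject),
                data = sleepstudy, REML = FALSE)

# Both models share the same fixed effects, so the test isolates the extra
# slope variance and intercept-slope covariance.
cmp <- anova(m_int, m_slope)
cmp
```

Because a variance cannot be negative, the null value sits on the boundary of the parameter space, so the reported p-value from this test tends to be conservative.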

 

(e) Fit a model with cen_pos, cen_escs, and female, with random slopes on cen_escs and female. 

model4 = lmer(z_read ~ cen_pos + cen_escs + female + (1 + cen_escs + female | schoolid), data = ReadingScores, REML = FALSE)

summary(model4)

How many parameter estimates are there in this model? Interpret the variance/covariance parameter estimates for this model in context.

 

3) Three-level model: For the achieve.txt data set, there were 10,903 third-grade students nested within 568 classrooms nested within 160 schools.

achieve = read.table("https://www.rossmanchance.com/stat414F20/data/Achieve.txt" , header=TRUE)

 

(a) Fit the “unconditional means” (null) model, putting the ‘higher level’ first to see how much variation is at each level.  How many parameters are in this model?

#library(lme4)

summary(model0 <- lmer(geread~ (1|school/class), data = achieve))

confint(model0)

 

Alternatively:

#library(nlme)

#summary(model0b <- lme(geread ~ 1, random = ~1 | school/class, data = achieve))

#intervals(model0b)

 

(b) Write out the model equation and interpret the parameters.

(c) Do the reading scores appear to differ significantly across the classes? Across the schools? Justify your answers.

(d) What is the “total variance” in reading scores estimated by this model?

(e) How much of the variance is explained at each level?

(f) How correlated are two students in the same class (in the same school)?  How correlated are two students in the same school (but different classes)?  (Hints: To answer these questions, think about what the two students share; the covariance will involve the variance terms they have in common. Alternative reference.)
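For parts (d) and (e), the bookkeeping can be sketched in R as follows. The three variance components are made-up values for illustration, not estimates from the achieve data, and the names vc, total_var, and prop_var are hypothetical. With a fitted lmer object, as.data.frame(VarCorr(model0))$vcov would supply the actual estimates.

```r
# Hypothetical variance components for a three-level null model.
# These numbers are made up; they are not estimates from the achieve data.
vc <- c(class_in_school = 0.30, school = 0.30, residual = 4.40)

# Total variance is the sum of the components at all three levels
total_var <- sum(vc)
total_var

# Share of the total variance attributed to each level
prop_var <- vc / total_var
round(prop_var, 3)
```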

 

(g) Add student vocabulary score (gevocab), number of students per class (clenroll), and number of students in the school (cenroll). (Include your R code and relevant output.)

(h) Identify these as student, class, or school level variables.

(i) Is this a better fitting model? Are any of these variables significant? How do we interpret the signs of the fixed effects coefficients? (Do they behave as you would have expected?)

(j) How did the variance components change? (Calculate the percentage reduction for each.) Are they still statistically significant? What does that tell you? How do we interpret them?

(k) How would you interpret the following models? (Hints: What’s random? What is/is not correlated?)

lmer(geread~gevocab+gender + (1|school) + (gender|class), data = achieve)

lmer(geread~gevocab+gender + (-1 + gender|school) + (1|class), data = achieve)
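For part (j), the percentage reduction in a variance component is the drop from the null model to the new model, expressed relative to the null model value. A minimal sketch, where pct_reduction is a hypothetical helper and the inputs are made-up values rather than estimates from either model:

```r
# Percentage reduction in a variance component after adding predictors.
# pct_reduction is a hypothetical helper; the inputs are made-up values.
pct_reduction <- function(before, after) {
  100 * (before - after) / before
}

# Example: a component that falls from 0.30 to 0.12
reduction <- pct_reduction(before = 0.30, after = 0.12)
reduction
```

A negative result would mean the component increased after the predictors were added, which can happen at higher levels when a lower-level predictor enters the model.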