Stat 414 – Review 2 problems


1) Consider this paragraph: “The multilevel models we have considered up to this point control for clustering, and allow us to quantify the extent of dependency and to investigate whether the effects of level 1 variables vary across these clusters.”

(a) Consider three components of this paragraph: (i) “control for clustering,” (ii) “quantify the extent of dependency,” and (iii) “investigate whether the effects of level 1 variables vary across these clusters.” Explain in detail what each of these components means in the multilevel model.

(b) The multilevel model referenced in the paragraph does not account for “contextual effects.” What is meant by that?
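Contextual effects are usually captured by adding the cluster mean of a level-1 predictor as a level-2 predictor. The base-R sketch below builds those columns; the helper name add_cluster_means, the toy data, and the column names are hypothetical and only for illustration.

```r
# Hypothetical helper: adds the cluster mean of a level-1 predictor and the
# within-cluster (group-mean-centered) deviation as new columns.
add_cluster_means <- function(df, x, cluster) {
  mean_col <- paste0(x, "_mean")
  df[[mean_col]] <- ave(df[[x]], df[[cluster]], FUN = mean)
  df[[paste0(x, "_within")]] <- df[[x]] - df[[mean_col]]
  df
}

# Toy data, made up for illustration only
toy <- data.frame(cluster = c("a", "a", "b", "b"), x = c(1, 3, 10, 14))
toy <- add_cluster_means(toy, "x", "cluster")
toy$x_mean    # 2 2 12 12
toy$x_within  # -1 1 -2 2

# In an lme4 model such as
#   lmer(y ~ x + x_mean + (1 | cluster), data = ...)
# the coefficient on x is the within-cluster slope and the coefficient on
# x_mean is the contextual effect (between-cluster slope minus within slope).
```

A model with only x and a random intercept forces the within- and between-cluster slopes to be equal, which is why it does not account for contextual effects.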


2) Give a short rule, in your own words, describing when an interpretation of an estimated coefficient should “hold constant” another covariate and when it should “set to 0” that covariate.
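A worked pair of equations may help frame the rule; here y, x, z, and the β's are generic symbols chosen for illustration, not variables from a specific dataset.

```latex
% Without an interaction, the slope of x is the same at every value of z
\[
E(y \mid x, z) = \beta_0 + \beta_1 x + \beta_2 z
\quad\Longrightarrow\quad
\frac{\partial E(y \mid x, z)}{\partial x} = \beta_1
\]
% With an interaction, the slope of x depends on z
\[
E(y \mid x, z) = \beta_0 + \beta_1 x + \beta_2 z + \beta_3 x z
\quad\Longrightarrow\quad
\frac{\partial E(y \mid x, z)}{\partial x} = \beta_1 + \beta_3 z
\]
```

In the first model β1 is the effect of x holding z constant at any value; in the second, β1 equals the slope of x only when z = 0.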


3) Consider this excerpt: “application of multilevel models for clustered data has attractive features: (a) the correction of underestimation of standard errors, (b) the examination of the cross-level interaction, (c) the elimination of concerns about aggregation bias, and (d) the estimation of the variability of coefficients at the cluster level.”

Explain each of these components to a non-statistician.
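For component (a), one standard way to quantify the problem is Kish's design effect, which assumes equal cluster sizes m and an intraclass correlation rho. The sketch below is illustrative; the function names and the example values are hypothetical.

```r
# Kish design effect for equal cluster size m and intraclass correlation rho:
# the factor by which clustering inflates the variance of a mean, relative to
# an independent sample of the same total size.
design_effect <- function(m, rho) 1 + (m - 1) * rho

# Standard errors that ignore clustering are too small by roughly a factor of
# sqrt(design effect).
se_inflation <- function(m, rho) sqrt(design_effect(m, rho))

# Hypothetical values: clusters of 11 with ICC 0.10
design_effect(m = 11, rho = 0.10)  # 2, i.e., about half the information of n independent cases
```

The design effect grows with both cluster size and ICC, so even a small ICC matters when clusters are large.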


4) (a) Complete these sentences:

Only ____________ can reduce level-1 variance in the outcome.

Only ____________ can reduce level-2 variance in the outcome.

Only ____________ can reduce variance of random slopes.

(b) Explain the distinction between these two random-effects specifications in R:

+ (1 | id) + (age | id)

+ (1 + age | id)
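A sketch using the sleepstudy data that ships with lme4 (with Days standing in for age and Subject for id) shows what each specification estimates. Note that in lme4 a term such as (age | id) implicitly includes an intercept, so it means (1 + age | id); a slope-only term is written (0 + age | id), and (1 + age || id) is shorthand for the uncorrelated version.

```r
library(lme4)

# Correlated random intercept and slope: 2 variances plus 1 correlation
fit_corr <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Independent random intercept and slope: 2 variances, correlation fixed at 0
fit_indep <- lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                  data = sleepstudy)

VarCorr(fit_corr)   # one Subject term with a Corr entry
VarCorr(fit_indep)  # two separate Subject terms, no correlation reported

# Likelihood ratio test of the correlation (anova() refits both with ML)
anova(fit_indep, fit_corr)
```

The number of covariance parameters lme4 estimates is the length of getME(fit, "theta"): 3 for the correlated fit and 2 for the independent one.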


6) The following SAS output shows modeling results from a randomized controlled trial at 29 clinical centers. The response variable is diastolic blood pressure.



(a) What is the patient-level variance? (Clarify any assumptions you are making about the output and any clues you have.)

(b) What is the center-level variance?

(c) What is an estimate of the ICC? Calculate and interpret.

(d) What is the expected diastolic blood pressure for a randomly selected patient receiving treatment C at a center with average aggregate blood pressure scores?

(e) What is the expected diastolic blood pressure for a randomly selected patient receiving treatment A at a center with aggregate blood pressure scores at the median?

(f) What is the expected diastolic blood pressure for a randomly selected patient receiving treatment C at a center with aggregate blood pressure scores at the 16th percentile?

(g) What is the expected diastolic blood pressure for a randomly selected patient receiving treatment B at a center with aggregate blood pressure scores at the 97.5th percentile?
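A minimal sketch of the two calculations behind (c) through (g), assuming a random-intercept model in which tau2 is the center-level intercept variance and sigma2 is the patient-level residual variance (in SAS PROC MIXED these usually appear in the Covariance Parameter Estimates table, with the residual labeled Residual). The function names are hypothetical; plug in the estimates from the output.

```r
# ICC for a two-level random-intercept model: the share of total variance
# that lies between centers.
icc <- function(tau2, sigma2) tau2 / (tau2 + sigma2)

# Expected response at a center whose random intercept u_j sits at the p-th
# percentile of its assumed N(0, tau2) distribution. `fixed_part` is the
# fixed-effects prediction (intercept plus the relevant treatment effect).
expected_at_percentile <- function(fixed_part, tau2, p) {
  fixed_part + qnorm(p) * sqrt(tau2)
}

# Normal quantiles for the median, 16th, and 97.5th percentiles
qnorm(c(0.50, 0.16, 0.975))  # about 0, -0.99, and 1.96
```

Because the random intercepts are assumed normal with mean 0, an "average" center and a "median" center both correspond to u_j = 0, while the 16th and 97.5th percentiles sit roughly 1 and 1.96 center-level standard deviations below and above the mean.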


This is my attempt to have you come up with the models yourselves, but it is probably more difficult than what I can ask on Exam 2, where I’m still telling you which models to look at. My focus is on interpreting the provided output and understanding how making changes impacts the model, the model equations, and the graphs.

7) Chapp et al. (2018) explored 2014 congressional candidates’ ambiguity on political issues in their paper, “Going Vague: Ambiguity and Avoidance in Online Political Messaging.” They hand-coded a random sample of 2012 congressional candidates’ websites, assigning each an ambiguity score. The 2014 candidates’ websites were then scored automatically using Wordscores, a program designed for political textual analysis. In their paper, they fit a multilevel model of candidates’ ambiguity with predictors at both the candidate and district levels.

Variables of interest include:

·        ambiguity = assigned ambiguity score. Higher scores indicate greater clarity (less ambiguity).

·        democrat = 1 if a Democrat, 0 otherwise (Republican)

·        incumbent = 1 if an incumbent, 0 otherwise

·        ideology = a measure of the candidate’s left-right orientation. Higher (positive) scores indicate more conservative candidates and lower (negative) scores indicate more liberal candidates.

·        mismatch = the distance between the candidate’s ideology and the district’s ideology (candidate ideology scores were regressed against district ideology scores; mismatch values represent the absolute value of the residual associated with each candidate).

·        distID = the congressional district’s unique ID

·        distLean = the district’s political leaning. Higher scores imply more conservative districts.

·        attHeterogeneity = a measure of the variability of ideologies within the district. Higher scores imply more attitudinal heterogeneity among voters.

·        demHeterogeneity = a measure of the demographic variability within the district. Higher scores imply more demographic heterogeneity among voters.

Consider the following research questions. Explain what model(s) you could explore to test these hypotheses (including what the different levels represent).


(a) Is ideological distance [from district residents] associated with greater ambiguity, but to varying degrees across the districts?

(b) Does the impact of ideological distance depend on the attitudinal heterogeneity among voters in the district?

(c) Controlling for ideological distance, does ideological extremity [of the candidate] correspond to less ambiguity?

(d) Does more variance in attitudes [among district residents] correspond to a higher degree of ambiguity in rhetoric?

(e) Does candidate rhetoric become clearer as the candidate’s issue positions move to the ideological extremes?

(f) Does the variability in ambiguity scores differ for Republican and Democratic candidates?
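One possible set of lme4-style formulas for (a) through (d), with candidates at level 1 and districts (distID) at level 2. These are hedged sketches, not the models Chapp et al. fit; measuring “extremity” as abs(ideology) in (c) is an assumption, and because no data frame is available the formulas are only built, not fitted.

```r
# Level 1 = candidates, level 2 = districts (distID).
# Higher `ambiguity` means MORE clarity, so "greater ambiguity" corresponds
# to a negative coefficient.

# (a) mismatch effect, with a random slope that varies across districts
f_a <- ambiguity ~ mismatch + (1 + mismatch | distID)

# (b) cross-level interaction: does the mismatch slope depend on attHeterogeneity?
f_b <- ambiguity ~ mismatch * attHeterogeneity + (1 + mismatch | distID)

# (c) extremity measured as |ideology| (an assumption), controlling for mismatch
f_c <- ambiguity ~ mismatch + abs(ideology) + (1 | distID)

# (d) district-level predictor of the district intercepts
f_d <- ambiguity ~ attHeterogeneity + (1 | distID)

# Fitting would look like lme4::lmer(f_a, data = chapp), where `chapp` is a
# hypothetical data frame name.
all.vars(f_b)  # variables each formula needs
```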


8) Let’s consider some models for predicting the happiness of musicians prior to performances, as measured by the positive affect scale (pa) from the PANAS instrument. MPQ absorption measures a musician’s level of openness to absorbing sensory and imaginative experiences.



(a) Calculate and interpret the intraclass correlation coefficient.



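library(lme4)

# Audience indicators; any other audience type is the reference group
musicians$instructor <- ifelse(musicians$audience == "Instructor", 1, 0)
musicians$students <- ifelse(musicians$audience == "Students", 1, 0)

# Correlated random intercept and random slopes for both indicators, by subject
model1 <- lmer(pa ~ 1 + instructor + students + (1 + instructor + students | subjnum),
               data = musicians)
summary(model1)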

(b) Interpret the 34.73, -4.19, and 4.51 values in context.

(c) Calculate a (pseudo) R² for Level 1 for this model.
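A minimal sketch of the Level 1 pseudo R², computed as the proportional reduction in level-1 (residual) variance relative to a baseline model. It assumes the baseline is the unconditional means model pa ~ 1 + (1 | subjnum); that baseline, the name model0, and the function name are illustrative assumptions, not taken from the output.

```r
# Level 1 pseudo R^2: proportional reduction in residual (within-subject) variance
pseudo_r2_level1 <- function(sigma2_base, sigma2_model) {
  (sigma2_base - sigma2_model) / sigma2_base
}

# With lme4 fits, the residual variance is sigma(fit)^2, e.g.
#   model0 <- lmer(pa ~ 1 + (1 | subjnum), data = musicians)  # hypothetical baseline
#   pseudo_r2_level1(sigma(model0)^2, sigma(model1)^2)
# as.data.frame(VarCorr(model1)) lists every variance component in one table.
```

Unlike R² in ordinary regression, this quantity can be negative, because adding predictors can shift variance between levels.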






(d) Write the corresponding model out as Level 1 and Level 2 equations (in terms of parameters, not estimates). How many parameters are estimated in this model?

(e) Provide interpretations for all estimated parameters!

(f) Which variance components decreased the most between Model 1 and Model 2? Provide a brief interpretation.

(g) Suppose I want to add an indicator variable for male to Model 2 as a predictor for all intercept and slope terms. How many parameters will this add to the model?

(h) Suppose I add the male indicator to Model 2 as suggested.  How will this change the interpretations you gave in (e)?
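A small helper for counting parameters in a two-level model with a single grouping factor and an unstructured (fully correlated) random-effects covariance matrix, as in model1 above. The function name is hypothetical.

```r
# Parameters in a two-level linear mixed model with one grouping factor:
#   fixed effects
#   + q(q + 1)/2 variances and covariances for q correlated random effects
#   + 1 level-1 residual variance
count_lmm_params <- function(n_fixed, q_random) {
  n_fixed + q_random * (q_random + 1) / 2 + 1
}

# model1: 3 fixed effects (intercept, instructor, students), 3 random effects
count_lmm_params(n_fixed = 3, q_random = 3)  # 10

# Cross-check on an lme4 fit (theta holds the covariance parameters):
#   length(fixef(model1)) + length(getME(model1, "theta")) + 1
```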