Stat 414 – Review 2 problems solutions


1) Consider this paragraph: The multilevel models we have considered up to this point control for clustering, and allow us to quantify the extent of dependency and to investigate whether the effects of level 1 variables vary across these clusters. 

(a) I have underlined 3 components, explain in detail what each of these components means in the multilevel model.

Control for clustering: We have observations that fall into natural groups, and we don’t want to treat the observations within a group as independent. By including the “clustering variable” in the model, the other slope coefficients will be “adjusted” or “controlled” for that clustering variable (whether we treat it as fixed or random).


Quantify the extent of the dependency: The ICC measures how correlated the observations in the same group are.


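Whether the effects of level 1 variables vary across the clusters: Random slopes. We let the coefficient of a Level 1 variable differ from cluster to cluster and estimate the variance of those slopes.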


(b) The multilevel model referenced in the paragraph does not account for “contextual effects.” What is meant by that?


The ability to include Level 2 variables, variables explaining differences among the clusters.  In particular, we can aggregate level one variables to be at Level 2 (e.g., group means). Being able to include these in the model, along with controlling for the individual groups, is another huge advantage of multilevel models.


2) Give a short rule in your own words describing when an interpretation of an estimated coefficient should “hold constant” another covariate or “set to 0” that covariate

We should hold variable 2 constant when we are interpreting the slope of variable 1. If the interaction of these two variables is also included, then the main effect of variable 1 is interpreted with variable 2 set to zero.

We should set all explanatory variables (including the Level 2 random effects) to zero when interpreting an intercept.  In a random intercepts (only) model, all the slopes are the same, so you can say “in a particular school.” In a random slopes model, you also need to set the random slope effect to zero ($u_{1j} = 0$), so you would say “for the average school” (or describe how you are talking about the overall effect, before any Level 2 group deviations in the slope).


3) Consider this excerpt: “application of multilevel models for clustered data has attractive features: (a) the correction of underestimation of standard errors, (b) the examination of the cross-level interaction, (c) the elimination of concerns about aggregation bias, and (d) the estimation of the variability of coefficients at the cluster level.”

Explain each of these components to a non-statistician.

(a) Treating clustered observations as independent makes us think we have a larger sample size (more information) than we really do, which underestimates standard errors.

(b) Including interaction terms between Level 1 and Level 2 variables

(c) Still get to analyze the data at Level 1 vs. aggregating the variables to Level 2 which could have a different relationship than the Level 1 relationship. It’s problematic to assume the Level 2 relationship applies to individuals.

(d) We are able to estimate the intercept-to-intercept and slope-to-slope variation at Level 2


4) (a) Complete these sentences:

Only level 1 variation in predictors (aka covariates aka explanatory variables) can reduce level-1 variance in the outcome

Only level 2 variation in predictors* can reduce level-2 variance in the outcome

Only cross-level interactions can reduce variance of random slopes

*The subtle reminder here is that “level 1 variables” can have variation at both level 1 and level 2. One way to capture that is to include both $x_{ij}$ and the group mean $\bar{x}_j$ in the model.
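A minimal sketch of that footnote, using a made-up toy data set (the values and the names school, x, x_mean, and x_within are illustrative assumptions, not course data). It splits a Level 1 variable into its group mean (the Level 2 variation) and the deviation from that mean (the Level 1 variation):

```r
# Hypothetical toy data: 2 groups with 3 observations each (not course data)
toy <- data.frame(
  school = c("A", "A", "A", "B", "B", "B"),
  x      = c(1, 2, 3, 5, 7, 9)
)

# Level 2 part: the group mean of x, repeated for every row in that group
toy$x_mean <- ave(toy$x, toy$school)

# Level 1 part: each observation's deviation from its own group mean
toy$x_within <- toy$x - toy$x_mean

print(toy)
```

Either x and x_mean, or x_within and x_mean, could then be entered together as predictors of a (hypothetical) response.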


(b) Explain the distinction between these two R commands

+ (1 | id) + (age | id)

+ (1 + age | id)

Both allow for random intercepts by id and random slopes for age by id, but the first is meant to drop the correlation (covariance) between these intercepts and slopes.  However, because (age | id) implicitly includes an intercept, the first will have two sets of random intercepts (by id), so a better command (if you are just trying to zero out the slope/intercept covariance) might be

          + (1 | id) + (-1 + age | id)

Note, there is a section in your text titled “Do not force  to be 0!”
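In matrix terms, the two specifications differ only in the off-diagonal element of the random-effects covariance matrix. The symbols below ($u_{0j}$, $u_{1j}$ for the random intercept and slope, $\tau_0^2$, $\tau_1^2$, $\tau_{01}$ for their variances and covariance) are an assumed notation and may not match your text:

```latex
% (1 + age | id): intercepts and slopes are allowed to covary
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix}
\right)

% (1 | id) + (-1 + age | id): the covariance is fixed at zero
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
\sim N\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_0^2 & 0 \\ 0 & \tau_1^2 \end{pmatrix}
\right)
```

The second version estimates one fewer parameter.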


6) The following SAS output is from modeling results for a randomized controlled trial at 29 clinical centers. The response variable is diastolic blood pressure.



(a)  What is the patient level variance? (Clarify any assumptions you are making about the output/any clues you have.)       

Because the first table is titled “covariance,” I’m assuming those are estimated variances rather than estimated standard deviations.  The patient-to-patient variance is estimated to be $\hat{\sigma}^2$ = 73.7 mmHg².


(b)  What is the center level variance?

The center-to-center variance is estimated to be $\hat{\tau}_0^2$ = 10.7 mmHg².


(c)   What is an estimate of the ICC? Calculate and interpret.

10.7 / (10.7 + 73.6) ≈ 0.13.  This represents the correlation between two patients at the same clinic, and it says that 13% of the variation in diastolic blood pressure is at the clinic level.
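The same calculation as a short R sketch, plugging in the variance estimates used in the line above (the variable names are just illustrative):

```r
# Variance estimates as used in the calculation above
var_center  <- 10.7   # center-to-center (Level 2) variance
var_patient <- 73.6   # patient-to-patient (Level 1) variance

# ICC = share of the total variance that sits at the center level
icc <- var_center / (var_center + var_patient)
round(icc, 2)
```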


(d)  What is the expected diastolic blood pressure for a randomly selected patient receiving treatment C at a center with average aggregate blood pressure scores?

90.87 mmHg, the intercept (no treatment effect because C is the reference treatment, and no clinic effect because the center is at the average)


(e)  What is the expected diastolic blood pressure for a randomly selected patient receiving treatment A at a center with aggregated blood pressure scores at the median?

Because we are assuming the clinic effects are normally distributed (so symmetric, with the median equal to the mean), the clinic effect $u_j$ for a center at the median is again assumed to be zero.  So 90.87 + 3.11 (to include the effect of treatment A) = 93.98 mmHg.


(f)    What is the expected diastolic blood pressure for a randomly selected patient receiving treatment C at a center with aggregate blood pressure scores at the 16th percentile?

Now we want to assume the clinic effect is 1 SD below zero, where the random clinic effects are assumed to be normally distributed with mean zero and standard deviation = sqrt(10.67) ≈ 3.27. So a clinic at the 16th percentile is predicted to fall 3.27 below the average across all the clinics.

So 90.87 (intercept) + 0 (treatment C) – 3.27 (random effect for 16th percentile) = 87.60 mmHg.


(g)  What is the expected diastolic blood pressure for a randomly selected patient receiving treatment B at a center with aggregate blood pressure scores at the 97.5th percentile?

Now we want to assume the clinic effect is 2 SD above zero = 2(3.27) = 6.54

So 90.87 + 1.41 + 2(3.27) = 98.82 mmHg.
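Parts (d) through (g) can be sketched in R using the estimates quoted above. The helper center_effect() is a hypothetical name, and it uses exact normal quantiles from qnorm(), so the results for (f) and (g) land close to, but not exactly on, the answers based on the 1 SD / 2 SD rule of thumb:

```r
# Estimates quoted in the solutions above
intercept <- 90.87        # treatment C at an average center
trt_A     <- 3.11         # treatment A compared to treatment C
trt_B     <- 1.41         # treatment B compared to treatment C
sd_center <- sqrt(10.67)  # SD of the center random effects

# Center effect at percentile p of a Normal(0, sd_center) distribution
center_effect <- function(p) qnorm(p, mean = 0, sd = sd_center)

pred_d <- intercept + center_effect(0.50)           # (d) treatment C, average center
pred_e <- intercept + trt_A + center_effect(0.50)   # (e) treatment A, median center
pred_f <- intercept + center_effect(0.16)           # (f) treatment C, 16th percentile
pred_g <- intercept + trt_B + center_effect(0.975)  # (g) treatment B, 97.5th percentile

round(c(d = pred_d, e = pred_e, f = pred_f, g = pred_g), 2)
```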


7) Chapp et al. (2018) explored 2014 congressional candidates’ ambiguity on political issues in their paper, Going Vague: Ambiguity and Avoidance in Online Political Messaging. They hand coded a random sample of 2012 congressional candidates’ websites, assigning an ambiguity score. These 2014 websites were then automatically scored using Wordscores, a program designed for political textual analysis. In their paper, they fit a multilevel model of candidates’ ambiguities with predictors at both the candidate and district levels.

Variables of interest include:

·        ambiguity = assigned ambiguity score. Higher scores indicate greater clarity (less ambiguity). This is the response variable.

·        democrat = 1 if a Democrat, 0 otherwise (Republican)

·        incumbent = 1 if an incumbent, 0 otherwise

·        ideology = a measure of the candidate’s left-right orientation. Higher (positive) scores indicate more conservative candidates and lower (negative) scores indicate more liberal candidates.

·        mismatch = the distance between the candidate’s ideology and the district’s ideology (candidate ideology scores were regressed against district ideology scores; mismatch values represent the absolute value of the residual associated with each candidate)

·        distID = the congressional district’s unique ID (our Level 2 grouping variable)

·        distLean = the district’s political leaning. Higher scores imply more conservative districts.

·        attHeterogeneity = a measure of the variability of ideologies within the district. Higher scores imply more attitudinal heterogeneity among voters.

·        demHeterogeneity = a measure of the demographic variability within the district. Higher scores imply more demographic heterogeneity among voters.

Consider the following research questions. Explain what model(s) you could explore to test these hypotheses (including what the different levels represent).


(a) Is ideological distance [from district residents] associated with greater ambiguity, but to varying degrees across the districts?


Want a model with mismatch (quantitative) as a Level 1 variable with random slopes across districts.  See if the coefficient of mismatch is negative (higher scores mean less ambiguity, so greater ambiguity corresponds to lower scores) and if the random slopes variance is significant (“varying degrees”).

(This is like a group-centered variable, we care less about the ideology of the candidate but more in how the candidate’s ideology compares to their district’s.)

          ambiguity ~ 1 + mismatch + (1 + mismatch | distID)


(b) Does the impact of ideological distance depend on the attitudinal heterogeneity among voters in the district?


This is asking for an interaction between mismatch at Level 1 and attitudinal heterogeneity at Level 2.  We include such an interaction by including the level 2 variable in both the equation for intercepts (as a fixed effect) and in the equation for slopes (creating a cross-level interaction between the two variables).

          ambiguity ~ 1 + mismatch + attHeterogeneity + mismatch*attHeterogeneity + (1 + mismatch | distID)

          adds two parameters (You don’t have to add attHeterogeneity on its own, but why assume its coefficient is zero 😊)

Not clear from the statement whether we should keep the random slopes but this would allow us to see how much of the variation in random slopes across districts is explained by attHeterogeneity.
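Written as Level 1 and Level 2 equations, the model for (b) might look like the following. The $\beta$ and $u$ notation, with candidate i in district j, is an assumed convention:

```latex
\begin{align*}
% Level 1 (candidate i in district j)
\text{ambiguity}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{mismatch}_{ij} + \epsilon_{ij} \\
% Level 2 (district j)
\beta_{0j} &= \beta_{00} + \beta_{01}\,\text{attHeterogeneity}_{j} + u_{0j} \\
\beta_{1j} &= \beta_{10} + \beta_{11}\,\text{attHeterogeneity}_{j} + u_{1j} \\
% Composite model
\text{ambiguity}_{ij} &= \beta_{00} + \beta_{01}\,\text{attHeterogeneity}_{j} + \beta_{10}\,\text{mismatch}_{ij} \\
&\quad + \beta_{11}\,\text{attHeterogeneity}_{j} \times \text{mismatch}_{ij} + u_{0j} + u_{1j}\,\text{mismatch}_{ij} + \epsilon_{ij}
\end{align*}
```

Here $\beta_{01}$ and $\beta_{11}$ are the two added parameters, and $\beta_{11}$ is the cross-level interaction that answers the research question.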


(c) Controlling for ideological distance, does ideological extremity [of the candidate] correspond to less ambiguity?


Need mismatch (Level 1) in the model as the control, and focus on the (adjusted) coefficient of ideology (also a Level 1 variable).

          ambiguity ~ 1 + mismatch + ideology + (1 | distID)

          Not clear from the statement whether we should keep the random slopes


(d) Does more variance in attitudes [among district residents] correspond to a higher degree of ambiguity in rhetoric?


Need attHeterogeneity (Level 2) in the model to explain variation in intercepts across districts.

          ambiguity ~ 1 + attHeterogeneity + (1 | distID); is the coefficient negative (lower scores mean more ambiguity)?


(e) Does candidate rhetoric become clearer as the candidate’s issue positions move to the ideological extremes?


This one could actually be an interaction between ideology and party: “Our expectation is that candidate rhetoric will become clearer as the candidate’s issue positions move to the ideological extremes. Accordingly, we interact the dichotomous party variable with candidate ideology, since we expect Democrats to be more clear with lower ideology scores (more liberal) but Republicans to be more clear with higher ideology scores (more conservative).”


(f) Does the variability in ambiguity scores differ for Republican and Democratic candidates?


Need to have the democrat variable in the model with random slopes (the random slopes are what allow the variability in the response to differ between the categories)


8) Let’s consider some models for predicting the happiness of musicians prior to performances, as measured by the positive affect scale (pa) from the PANAS instrument. MPQ absorption = levels of openness to absorbing sensory and imaginative experiences



(a) Calculate and interpret the intraclass correlation coefficient

(23.72)/(23.72 + 41.70) = 0.363

36% of the variation in pa scores (pre-performance happiness) is at the musician level (i.e., is attributable to differences among musicians)


It is also the estimated correlation between two performances for the same musician.



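library(lme4)  # provides lmer()

# Indicator variables for audience type; public and juried recitals are the reference group
instructor <- ifelse(musicians$audience == "Instructor", 1, 0)

students <- ifelse(musicians$audience == "Students", 1, 0)

# Random intercepts plus random slopes for both indicators, by musician (subjnum)
summary(model1 <- lmer(pa ~ 1 + instructor + students + (1 + instructor + students | subjnum), data = musicians))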

(b) Interpret the 34.73, -4.19, and 4.51 values in context.

34.73: the estimated pre-performance happiness for musicians playing in public or juried recitals (the other two performance types) for the average performer (or estimated population mean)


-4.19: The estimated decrease in pre-performance happiness for a particular performer, on average, with an instructor for the audience compared to public or juried recitals


4.51: The estimated performer-to-performer population standard deviation in pre-performance happiness for public or juried performances.


(c) Calculate a (pseudo) R2 for Level 1 for this model.

Comparing the Level 1 variance between Model 0 and Model 1:

1 – 36.39/41.70 = .127

12.7% of the performance-to-performance variation in pre-performance happiness within a performer can be explained by type of audience (instructor, student, or other)
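Both the ICC in (a) and the pseudo R² in (c) come straight from the variance estimates, so here is a small R sketch that reproduces them (the variable names are illustrative):

```r
# Variance estimates quoted in (a) and (c)
tau2_model0   <- 23.72  # musician-level (Level 2) variance, Model 0
sigma2_model0 <- 41.70  # performance-level (Level 1) variance, Model 0
sigma2_model1 <- 36.39  # performance-level (Level 1) variance, Model 1

# (a) ICC: share of the total variance that is between musicians
icc <- tau2_model0 / (tau2_model0 + sigma2_model0)

# (c) Level 1 pseudo R^2: proportional drop in Level 1 variance from Model 0 to Model 1
pseudo_r2_level1 <- 1 - sigma2_model1 / sigma2_model0

round(c(icc = icc, pseudo_r2_level1 = pseudo_r2_level1), 3)
```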





(d) Write the corresponding model out as Level 1 and Level 2 equations. (In terms of parameters, not estimates). How many parameters are estimated in this model?


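One way to write the model (i indexes performances, j indexes musicians):

Level 1:

$$pa_{ij} = \beta_{0j} + \beta_{1j}\,instructor_{ij} + \beta_{2j}\,students_{ij} + \epsilon_{ij}$$

Level 2:

$$\beta_{0j} = \beta_{00} + \beta_{01}\,mpqabc_j + u_{0j}$$

$$\beta_{1j} = \beta_{10} + \beta_{11}\,mpqabc_j + u_{1j}$$

$$\beta_{2j} = \beta_{20} + \beta_{21}\,mpqabc_j + u_{2j}$$

$\beta_{01}$ gives us the fixed effect for mpqabc

$\beta_{11}$ gives us the instructor*mpqabc interaction

$\beta_{21}$ gives us the students*mpqabc interaction

We are also assuming we have $\epsilon_{ij} \sim N(0, \sigma^2)$ and $(u_{0j}, u_{1j}, u_{2j})$ multivariate normal with mean zero, variances $\tau_0^2$, $\tau_1^2$, $\tau_2^2$, and covariances $\tau_{01}$, $\tau_{02}$, $\tau_{12}$.  The exact notation you use for the variance components isn’t that important; what matters is differentiating them and remembering what they measure.

There are 6 “beta” coefficients, 4 variance terms, and 3 covariances, for 13 parameters in total.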
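The counting rule can be sketched as a small R helper. count_params() is a hypothetical function written for this illustration, and it assumes an unstructured covariance matrix for the random effects plus one Level 1 residual variance:

```r
# Count parameters: fixed effects + random-effect variances + covariances + residual variance
count_params <- function(n_fixed, n_random) {
  n_var <- n_random                       # one variance per random effect
  n_cov <- n_random * (n_random - 1) / 2  # one covariance per pair of random effects
  n_fixed + n_var + n_cov + 1             # the extra 1 is the Level 1 residual variance
}

# Model 2: 6 fixed effects; random intercept, instructor slope, and students slope
count_params(n_fixed = 6, n_random = 3)
```

The same helper gives 10 for Model 1, which has 3 fixed effects and the same 3 random effects.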


(e) Provide interpretations for all estimated parameters!

·       34.82 is the estimated pre-performance happiness for public or juried recitals for the average musician with average MPQ absorption (or estimated population mean for the population of musicians with average MPQ absorption)

·       -4.65 is the predicted decrease in pre-performance happiness for student audiences for a particular musician with average MPQ absorption (have to zero out the interaction) compared to public/juried audiences

·       -.02279 is the predicted decrease in pre-performance happiness associated with a one-unit increase in the MPQ absorption scale for public or juried recitals (have to zero out the interaction)

·       -4.25 is the predicted decrease in pre-performance happiness for an instructor  audience for a particular musician with average MPQ absorption compared to public/juried audience

·       .369 is the predicted lowering of the negative effect of instructor audience on pre-performance happiness as performer MPQ increases. (Note: slope of mpqabc = -.023 + .292 student + .369 instructor.  So if performing in front of an instructor, a one-unit increase in MPQ is associated with a -.023 + .369 = 0.35 increase in pre-performance happiness rather than a 0.023 decrease if performing in front of public/jury.)

·       .2922 is the lowering of the negative effect of student audience on pre-performance happiness as performer MPQ absorption increases (becomes less negative for student audience – in fact, if a student audience, a one-unit increase in MPQ absorption is associated with a 0.27 increase in pre-performance happiness)

·       20.32 is the musician to musician variability in pre-performance happiness for performers with average MPQ in a public or juried recital.

·       8.063 is the variability in the effect (slope) of an instructor audience (vs. public or juried) across musicians

·       10.725 is the variability in the effect (slope) of a student audience across musicians

·       36.587 is the unexplained variability across performances within musicians after adjusting for audience type and MPQ absorption

·       0.09 is a weak correlation between intercepts and the instructor slopes (estimated correlation between pre-performance happiness levels for juried recitals or public performances and changes in happiness for instructor audiences, after controlling for absorption levels)

[Figure: graph with different colored lines]

No real fanning

Cov = .09(4.508)(2.839) ≈ 1.2

Min (the value of the instructor indicator where the variance is smallest) = -1.2/8.063 ≈ -0.15, which is below 0 (so fanning out if anything)


Knowing a musician’s intercept doesn’t tell us much about the impact on them of an instructor audience (might be larger or smaller than average)


·       -0.73 says musicians with higher intercepts (pre-performance happiness for public or juried recitals and average MPQ absorption) tend to have even larger decreases in happiness for student audiences than musicians with smaller intercepts.

[Figure: line chart]

Cov = -.73(4.508)(3.275) ≈ -10.78

Min (the value of the students indicator where the variance is smallest) = 10.78/10.725 ≈ 1 (so fanning in)



·       0.60 says the instructor and student slopes are positively correlated: musicians with a larger drop in happiness for instructor audiences also tend to have a larger drop for student audiences.
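The arithmetic behind several of these interpretations can be checked with a short R sketch that uses only the estimates quoted in the bullets (the variable names are illustrative). The "Min" values use the fact that, with the other indicator at 0, the musician-to-musician variance at indicator value x is var_int + 2*cov*x + var_slope*x^2, which is smallest at x = -cov/var_slope:

```r
# Fixed-effect estimates quoted in the bullets above
b_mpq            <- -0.02279  # mpqabc slope for public or juried recitals
b_instructor_mpq <-  0.369    # instructor x mpqabc interaction
b_students_mpq   <-  0.2922   # students x mpqabc interaction

# Slope of mpqabc for each audience type
slope_mpq <- c(public_juried = b_mpq,
               instructor    = b_mpq + b_instructor_mpq,
               students      = b_mpq + b_students_mpq)
round(slope_mpq, 2)

# Random-effect variances and correlations with the intercept
var_int            <- 20.32
var_instructor     <- 8.063
var_students       <- 10.725
cor_int_instructor <- 0.09
cor_int_students   <- -0.73

# Covariance = correlation x SD x SD
cov_int_instructor <- cor_int_instructor * sqrt(var_int) * sqrt(var_instructor)
cov_int_students   <- cor_int_students   * sqrt(var_int) * sqrt(var_students)

# Indicator value at which the between-musician variance is smallest
min_instructor <- -cov_int_instructor / var_instructor
min_students   <- -cov_int_students   / var_students

round(c(cov_int_instructor = cov_int_instructor, cov_int_students = cov_int_students,
        min_instructor = min_instructor, min_students = min_students), 2)
```

A minimum below 0 means the variance grows as the indicator moves from 0 to 1 (fanning out). A minimum near 1 means the variance shrinks over that range (fanning in).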


(f) Which variance components decreased the most between Model1 and Model2? Provide a brief interpretation.

The variances of the random slopes for instructor and student audiences.  So knowing MPQ absorption explains some of the variation among musicians in the instructor-audience and student-audience effects.


(g) Suppose I want to add an indicator variable for male to Model 2 as a predictor for all intercept and slope terms. How many parameters will this add to the model?


This is a Level 2 variable.  So you will have 3 new Level 2 coefficients (one in each of the three Level 2 equations).
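As a sketch, the three Level 2 equations with male added could be written as follows. The $\beta$ and $u$ notation, with musician index j, is an assumed convention:

```latex
\begin{align*}
\beta_{0j} &= \beta_{00} + \beta_{01}\,\text{mpqabc}_{j} + \beta_{02}\,\text{male}_{j} + u_{0j} \\
\beta_{1j} &= \beta_{10} + \beta_{11}\,\text{mpqabc}_{j} + \beta_{12}\,\text{male}_{j} + u_{1j} \\
\beta_{2j} &= \beta_{20} + \beta_{21}\,\text{mpqabc}_{j} + \beta_{22}\,\text{male}_{j} + u_{2j}
\end{align*}
```

The new parameters are $\beta_{02}$, $\beta_{12}$, and $\beta_{22}$; the variance and covariance terms stay the same in number.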


(h) Suppose I add the male indicator to Model 2 as suggested.  How will this change the interpretations you gave in (e)?

The intercept applies to females, and the slope coefficient interpretations are after adjusting for both MPQ absorption and gender.  When zeroing out interaction terms, we will be talking about females with average MPQ absorption.