Stat 414 – Review 2 problems solutions
1) Consider this paragraph: The multilevel models we have considered up to this point control for clustering, and allow us to quantify the extent of dependency and to investigate whether the effects of level 1 variables vary across these clusters.
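(a) Three components are underlined (control for clustering; quantify the extent of dependency; whether the effects of level 1 variables vary across clusters). Explain in detail what each of these components means in the multilevel model.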
Control for clustering: We have observations that fall into natural groups, and we don’t want to treat the observations within a group as independent. By including the “clustering variable” in the model (whether we treat it as fixed or random), the other slope coefficients are “adjusted” or “controlled” for that clustering variable.
Quantify the extent of the dependency: The ICC measures how correlated the observations in the same group are.
Whether the effects of level 1 variables vary across the clusters: Random slopes, which let the coefficient of a level 1 variable differ from cluster to cluster.
(b) The multilevel model referenced in the paragraph does not account for “contextual effects.” What is meant by that?
Contextual effects refer to the ability to include Level 2 variables, i.e., variables explaining differences among the clusters. In particular, we can aggregate Level 1 variables up to Level 2 (e.g., group means). Being able to include these in the model, along with controlling for the individual groups, is another huge advantage of multilevel models.
2) Give a short rule in your own words
describing when an interpretation of an estimated coefficient should “hold
constant” another covariate or “set to 0” that covariate
We should hold variable 2 constant when we are interpreting the slope of variable 1, unless the interaction of these two variables is also included; in that case, the main effect of variable 1 is interpreted with variable 2 set to zero.
We should set all explanatory variables (including the Level 2 random effects) to zero when interpreting an intercept. In a random intercepts (only) model, all the slopes are the same, so you can say “in a particular school.” In a random slopes model, you need to set the random slope effect to zero, so you would say “for the average school” (or describe how you are talking about the overall effect, before any Level 2 group deviations in the slope).
3) Consider this excerpt: “application of multilevel models for clustered data has attractive features: (a) the correction of underestimation of standard errors, (b) the examination of the cross-level interaction, (c) the elimination of concerns about aggregation bias, and (d) the estimation of the variability of coefficients at the cluster level.”
Explain each of these components to a nonstatistician.
(a) Assuming independence leads us to think we have a larger sample size (more information) than we really do, which underestimates standard errors. The multilevel model corrects for this.
(b) Including interaction terms between Level 1 and Level 2 variables lets us examine whether the effect of an individual-level variable depends on a characteristic of the group.
(c) We still get to analyze the data at Level 1 rather than aggregating the variables to Level 2, where the relationship could differ from the Level 1 relationship. It’s problematic to assume the Level 2 relationship applies to individuals.
(d) We are able to estimate the intercept-to-intercept and slope-to-slope variation at Level 2.
4) (a) Complete these sentences:
Only level 1 variation in predictors (aka covariates aka explanatory variables) can reduce level-1 variance in the outcome.
Only level 2 variation in predictors* can reduce level-2 variance in the outcome.
Only cross-level interactions can reduce the variance of random slopes.
*The subtle reminder here is that “level 1 variables” can have variation at both level 1 and level 2. One way to capture that is to include both x and its group mean in the model.
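A minimal sketch in base R of how a level 1 predictor can be split into its level 2 part (the group mean) and its level 1 part (the deviation from that mean). The data frame, column names, and values are made up for illustration and do not come from the course data.

```r
# Hypothetical toy data: x is a level 1 predictor measured on units nested in groups.
dat <- data.frame(
  group = c("a", "a", "a", "b", "b", "b"),
  x     = c(1, 2, 3, 7, 8, 12)
)

# Level 2 part of x: the group mean, repeated for every unit in that group.
dat$x_mean <- ave(dat$x, dat$group)

# Level 1 part of x: each unit's deviation from its own group mean.
dat$x_within <- dat$x - dat$x_mean

print(dat)
```

The group mean varies only between groups and the deviation varies only within groups, which is why the two pieces can explain variance at different levels.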
(b) Explain the distinction between these two R commands
+ (1 + age | id)
Both allow for random intercepts by id and random slopes for age by id, but the first does not allow for a correlation (covariance) between these intercepts and slopes. However, the first will have two types of random intercepts (by id), so a better command (if you are just trying to zero out the slope/intercept covariance) might be
+ (1 | id) + (-1 + age | id)
Note, there is a section in your text titled “Do not force τ01 to be 0!”
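The difference between correlated and uncorrelated random effects can be seen by counting the variance components each specification estimates. This is only a sketch: it assumes the lme4 package is installed and uses its built-in sleepstudy data, with Days and Subject standing in for age and id; the object names are hypothetical.

```r
library(lme4)

# Correlated random intercepts and slopes: the intercept/slope covariance is estimated.
m_corr <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# Uncorrelated version: an intercept-only term plus a slope-only term
# (-1 drops the implied intercept), so the covariance is fixed at zero.
m_uncorr <- lmer(Reaction ~ Days + (1 | Subject) + (-1 + Days | Subject),
                 data = sleepstudy)

# One row per variance or covariance parameter, plus the residual variance.
vc_corr   <- as.data.frame(VarCorr(m_corr))
vc_uncorr <- as.data.frame(VarCorr(m_uncorr))
print(vc_corr)
print(vc_uncorr)
```

The correlated fit reports one more parameter (the covariance) than the uncorrelated fit.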
6) The following SAS output is from modeling results for
a randomized controlled trial at 29 clinical centers. The response variable is
diastolic blood pressure.
(a) What is the patient level variance? (Clarify any
assumptions you are making about the output/any clues you have.)
Because the first table is titled “covariance,” I’m assuming those are estimated variances rather than estimated standard deviations. The patient-to-patient variance is estimated to be 73.7 mmHg^{2}.
(b) What is the center level variance?
The center-to-center variance is estimated to be 10.7 mmHg^{2}.
(c) What is an estimate of the ICC? Calculate and
interpret.
10.7 / (10.7 + 73.7) ≈ 0.13. This represents the correlation between two patients at the same clinic, and it says that 13% of the variation in diastolic blood pressure is at the clinic level.
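The ICC arithmetic can be checked with a few lines of R. This sketch uses the rounded variance estimates quoted in parts (a) and (b); the variable names are illustrative.

```r
# Variance estimates quoted in parts (a) and (b), in mmHg^2.
var_patient <- 73.7  # patient-to-patient (level 1)
var_center  <- 10.7  # center-to-center (level 2)

# ICC = between-center variance divided by total variance.
icc <- var_center / (var_center + var_patient)
print(round(icc, 2))
```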
(d) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment C at a center with average
aggregate blood pressure scores?
90.87 mmHg, the intercept (treatment C is the reference group, so there is no treatment effect to add, and there is no clinic effect because we are at the average center).
(e) What is the expected diastolic blood pressure for a randomly
selected patient receiving treatment A at a center with aggregated blood
pressure scores at the median?
Because we are assuming the clinic effects are normally distributed (symmetric, so the median equals the mean), the clinic effect for a center at the median is again assumed to be zero. So 90.87 + 3.11 (the effect of treatment A) = 93.98 mmHg.
(f) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment C at a center with aggregate
blood pressure scores at the 16th percentile?
Now we want to assume the clinic effect is 1 SD below zero, where the random clinic effects are assumed to be normally distributed with mean zero and standard deviation sqrt(10.67) ≈ 3.27. So a clinic at the 16^{th} percentile is predicted to fall 3.27 mmHg below the average across all the clinics.
So 90.87 (intercept) + 0 (treatment C) - 3.27 (random effect for 16^{th} percentile) = 87.60 mmHg.
(g) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment B at a center with aggregate
blood pressure scores at the 97.5th percentile?
Now we want to assume the clinic effect is 2 SD above zero = 2(3.27) = 6.54.
So 90.87 + 1.41 + 2(3.27) = 98.82 mmHg.
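The predictions in (d) through (g) can be reproduced as follows. This is a sketch using the estimates quoted in the answers above; the variable names are illustrative, and the last two lines swap the rounded empirical-rule values (1 SD, 2 SD) for exact normal quantiles as a comparison.

```r
intercept <- 90.87        # treatment C (reference) at an average center
trt_a     <- 3.11         # treatment A vs. treatment C
trt_b     <- 1.41         # treatment B vs. treatment C
sd_center <- sqrt(10.67)  # SD of the random center effects

# (e) treatment A, center at the median: center effect is zero.
pred_e <- intercept + trt_a

# (f), (g) empirical-rule shortcuts: 16th percentile ~ -1 SD, 97.5th percentile ~ +2 SD.
pred_f <- intercept - 1 * sd_center
pred_g <- intercept + trt_b + 2 * sd_center

# Same two predictions using exact standard normal quantiles.
pred_f_exact <- intercept + qnorm(0.16) * sd_center
pred_g_exact <- intercept + trt_b + qnorm(0.975) * sd_center

print(round(c(pred_e, pred_f, pred_g, pred_f_exact, pred_g_exact), 2))
```

The exact quantiles (about -0.99 and 1.96) move the answers only slightly, so the 1 SD and 2 SD shortcuts are adequate here.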
7) Chapp et al. (2018) explored 2014
congressional candidates’ ambiguity on political issues in their paper, Going
Vague: Ambiguity and Avoidance in Online Political Messaging.
They hand coded a random sample of 2012 congressional candidates’ websites,
assigning an ambiguity score. These 2014 websites were then automatically
scored using Wordscores, a program designed for political textual analysis. In
their paper, they fit a multilevel model of candidates’ ambiguities with
predictors at both the candidate and district levels.
Variables of interest include:
· ambiguity = assigned ambiguity score. Higher scores indicate greater clarity (less ambiguity); the response variable
· democrat = 1 if a Democrat, 0 otherwise (Republican)
· incumbent = 1 if an incumbent, 0 otherwise
· ideology = a measure of the candidate’s left-right orientation. Higher (positive) scores indicate more conservative candidates and lower (negative) scores indicate more liberal candidates.
· mismatch = the distance between the candidate’s ideology and the district’s ideology (candidate ideology scores were regressed against district ideology scores; mismatch values represent the absolute value of the residual associated with each candidate)
· distID = the congressional district’s unique ID; our Level 2 grouping variable
· distLean = the district’s political leaning. Higher scores imply more conservative districts.
· attHeterogeneity = a measure of the variability of ideologies within the district. Higher scores imply more attitudinal heterogeneity among voters.
· demHeterogeneity = a measure of the demographic variability within the district. Higher scores imply more demographic heterogeneity among voters.
Consider the following research
questions. Explain what model(s) you could explore to test these hypotheses
(including what the different levels represent).
(a) Is ideological distance [from district residents] associated
with greater ambiguity, but to varying degrees across the districts?
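Want a model with mismatch (quantitative) as a Level 1 (candidate) variable with random slopes across districts (Level 2). See if the coefficient of mismatch is negative (higher scores indicate greater clarity, so greater ambiguity shows up as lower scores) and if the random slopes variance is significant (“varying degrees”).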
(This is like a group-centered variable: we care less about the ideology of the candidate and more about how the candidate’s ideology compares to their district’s.)
ambiguity = 1 + mismatch + (1 + mismatch | district)
(b) Does the impact of ideological distance depend on the
attitudinal heterogeneity among voters in the district?
This is asking for an interaction between mismatch at Level 1 and attitudinal heterogeneity at Level 2. We include such an interaction by including the Level 2 variable in both the equation for intercepts (as a fixed effect) and in the equation for slopes (creating a cross-level interaction between the two variables).
ambiguity = 1 + mismatch + attHeterogeneity + mismatch*attHeterogeneity + (1 + mismatch | district)
This adds two parameters. (You don’t have to add attHeterogeneity on its own, but why assume its coefficient is zero 😊)
It is not clear from the statement whether we should keep the random slopes, but keeping them would allow us to see how much of the variation in random slopes across districts is explained by attHeterogeneity.
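Written as Level 1 and Level 2 equations, the model in (b) might look like the sketch below. The notation is an assumption for illustration: i indexes candidates, j indexes districts, and the beta and u symbols are not taken from the original handout.

```latex
\begin{align*}
\text{Level 1:}\quad & \text{ambiguity}_{ij} = \beta_{0j} + \beta_{1j}\,\text{mismatch}_{ij} + \epsilon_{ij} \\
\text{Level 2:}\quad & \beta_{0j} = \beta_{00} + \beta_{01}\,\text{attHeterogeneity}_{j} + u_{0j} \\
                     & \beta_{1j} = \beta_{10} + \beta_{11}\,\text{attHeterogeneity}_{j} + u_{1j}
\end{align*}
```

Substituting the Level 2 equations into Level 1 produces the term \beta_{11} mismatch × attHeterogeneity, which is the cross-level interaction; \beta_{01} and \beta_{11} are the two added parameters.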
(c) Controlling for ideological distance, does ideological
extremity [of the candidate] correspond to less ambiguity?
Need mismatch (Level 1) in the model and focus on the (adjusted) coefficient of ideology (also a Level 1 variable).
ambiguity = 1 + mismatch + ideology + (1 | district)
Not clear from the statement whether we should keep the random slopes.
(d) Does more variance in attitudes [among district
residents] correspond to a higher degree of ambiguity in rhetoric?
Need attHeterogeneity (Level 2) in the model to explain variation in intercepts across districts.
ambiguity = 1 + attHeterogeneity + (1 | district)
Is the coefficient negative (lower scores, i.e., more ambiguity, in more heterogeneous districts)?
(e) Does candidate rhetoric become clearer
as the candidate’s issue positions move to the ideological extremes?
This one could actually
be an interaction between ideology and party: “Our expectation is that candidate rhetoric will become
clearer as the candidate’s issue positions move to the ideological extremes.
Accordingly, we interact the dichotomous party variable with candidate
ideology, since we expect Democrats to be more clear
with lower ideology scores (more liberal) but Republicans to be more clear with
higher ideology scores (more conservative).”
(f) Does the variability in ambiguity scores
differ for Republican and Democratic candidates?
Need to have the democrat variable in the model with random slopes (the random slopes are what allow the variability in the response to differ between the categories).
8) Let’s consider some models for predicting the happiness of musicians prior to performances, as measured by the positive affect scale (pa) from the PANAS instrument. MPQ absorption = levels of openness to absorbing sensory and imaginative experiences
Model0
(a) Calculate and interpret the
intraclass correlation coefficient
23.72/(23.72 + 41.70) = 0.363
36% of the variation in pa scores (pre-performance happiness) is at the musician level (i.e., is attributable to differences among musicians)
And/Or
0.363 is the estimated correlation between two performances by the same musician.
Model1
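library(lme4)  # provides lmer()
# Indicator variables for audience type; public and juried recitals form the reference group
instructor <- ifelse(musicians$audience == "Instructor", 1, 0)
students <- ifelse(musicians$audience == "Students", 1, 0)
summary(model1 <- lmer(pa ~ 1 + instructor + students + (1 + instructor + students | subjnum), data = musicians))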
(b) Interpret the 34.73, -4.19, and 4.51 values in context.
34.73: the estimated pre-performance happiness for the average performer (or estimated population mean) when playing in public or juried recitals (the other two performance types)
-4.19: the estimated decrease (4.19 points) in pre-performance happiness for a particular performer, on average, with an instructor for the audience compared to public or juried recitals
4.51: the estimated performer-to-performer population standard deviation in pre-performance happiness for public or juried performances
(c) Calculate a (pseudo) R^{2}
for Level 1 for this model.
Comparing the Level 1 variance between Model 0 and Model 1:
1 - 36.39/41.70 = 0.127
12.7% of the performance-to-performance variation in pre-performance happiness within a performer can be explained by type of audience (instructor, student, or other).
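The Model0 ICC and the Level 1 pseudo R^{2} can both be computed directly from the variance estimates quoted above. A short sketch, with illustrative variable names:

```r
# Variance estimates quoted in the solutions above.
var_musician_m0 <- 23.72  # Model0: between-musician (level 2) variance
var_resid_m0    <- 41.70  # Model0: within-musician (level 1) variance
var_resid_m1    <- 36.39  # Model1: within-musician (level 1) variance

# ICC from the null model.
icc_m0 <- var_musician_m0 / (var_musician_m0 + var_resid_m0)

# Proportional reduction in level 1 variance from Model0 to Model1.
pseudo_r2_l1 <- 1 - var_resid_m1 / var_resid_m0

print(round(c(icc = icc_m0, pseudo_r2_level1 = pseudo_r2_l1), 3))
```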
Model2
(d) Write the corresponding model out as
Level 1 and Level 2 equations. (In terms of parameters, not estimates). How
many parameters are estimated in this model?
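Level 1: pa_{ij} = β_{0j} + β_{1j} instructor_{ij} + β_{2j} students_{ij} + ε_{ij}
Level 2: β_{0j} = β_{00} + β_{01} mpqabc_{j} + u_{0j}
β_{1j} = β_{10} + β_{11} mpqabc_{j} + u_{1j}
β_{2j} = β_{20} + β_{21} mpqabc_{j} + u_{2j}
β_{01} gives us the fixed effect for mpqabc
β_{11} gives us the instructor*mpqabc interaction
β_{21} gives us the students*mpqabc interaction
We are also assuming we have ε_{ij} ~ N(0, σ^{2}) and (u_{0j}, u_{1j}, u_{2j}) multivariate normal with mean zero, variances τ_{0}^{2}, τ_{1}^{2}, τ_{2}^{2}, and three covariances. Whether you use β or γ notation isn’t that important, but differentiating the terms and remembering what they measure is.
There are 6 “beta” coefficients, 4 variance terms, and 3 covariances = 13 parameters.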
(e) Provide interpretations for all
estimated parameters!
· 34.82 is the estimated pre-performance happiness for public or juried recitals for the average musician with average MPQ absorption (or estimated population mean for the population of musicians with average MPQ absorption)
· -4.65 is the predicted decrease (4.65 points) in pre-performance happiness for student audiences compared to public/juried audiences for a particular musician with average MPQ absorption (have to zero out the interaction)
· -0.02279 is the predicted decrease in pre-performance happiness associated with a one-unit increase in the MPQ absorption scale for public or juried recitals (have to zero out the interaction)
· -4.25 is the predicted decrease (4.25 points) in pre-performance happiness for an instructor audience compared to a public/juried audience for a particular musician with average MPQ absorption
· 0.369 is the predicted lowering of the negative effect of instructor audience on pre-performance happiness for each one-unit increase in performer MPQ absorption. (Note: slope of mpqabc = -0.023 + 0.292 students + 0.369 instructor. So if performing in front of an instructor, a one-unit increase in MPQ absorption is associated with a -0.023 + 0.369 = 0.35 increase in pre-performance happiness rather than a 0.023 decrease if performing in front of public/jury.)
· 0.2922 is the lowering of the negative effect of student audience on pre-performance happiness for each one-unit increase in performer MPQ absorption (the effect becomes less negative for a student audience; in fact, for a student audience, a one-unit increase in MPQ absorption is associated with a -0.023 + 0.292 = 0.27 increase in pre-performance happiness)
· 20.32 is the musician-to-musician variability in pre-performance happiness for performers with average MPQ absorption in a public or juried recital
· 8.063 is the variability in the effect (slope) of an instructor audience (vs. public or juried) across musicians
· 10.725 is the variability in the effect (slope) of a student audience across musicians
· 36.587 is the unexplained variability across performances within musicians after adjusting for audience type and MPQ absorption
· 0.09 is a weak correlation between intercepts and the instructor slopes (estimated correlation between pre-performance happiness levels for juried recitals or public performances and changes in happiness for instructor audiences, after controlling for absorption levels)
No real fanning: Cov = 0.09(4.508)(2.839) = 1.2, and Min = 1.2/8.063 ≈ 0.15, which is close to 0 (so fanning out if anything). Knowing a musician’s intercept doesn’t tell us much about the impact on them of an instructor audience (might be larger or smaller than average).
· -0.73 says musicians with higher intercepts (pre-performance happiness for public or juried recitals and average MPQ absorption) tend to have even larger decreases in happiness for student audiences than musicians with smaller intercepts.
Cov = -0.73(4.508)(3.275) = -10.78, and Min = 10.78/10.725 ≈ 1 (so fanning in).
· -0.60 says musicians with a larger effect of instructor tend to have a smaller effect for student audiences.
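The “Cov” and “Min” arithmetic and the audience-specific MPQ slopes in the list above can be reproduced as sketched below. Two assumptions are made loudly here: the intercept/student-slope correlation is taken as negative (-0.73), which is what the “larger decreases” interpretation and the fanning-in conclusion require, and “Min” is read as the value of the students indicator at which the between-musician variance is smallest, -Cov/Var(slope). Variable names are illustrative.

```r
# Random-effect standard deviations quoted above (square roots of 20.32 and 10.725).
sd_int  <- 4.508   # intercepts
sd_stud <- 3.275   # student-audience slopes

# ASSUMPTION: correlation is negative, matching the interpretation in the text.
cor_int_stud <- -0.73
cov_int_stud <- cor_int_stud * sd_int * sd_stud

# Between-musician variance as a function of the students indicator x:
# Var(u0 + u2 * x) = sd_int^2 + 2 * cov * x + sd_stud^2 * x^2, minimized at -cov / sd_stud^2.
x_min <- -cov_int_stud / sd_stud^2
var_at_0 <- sd_int^2
var_at_1 <- sd_int^2 + 2 * cov_int_stud + sd_stud^2

# Slope of mpqabc for each audience type, from the fixed effects.
slope_public     <- -0.02279
slope_instructor <- slope_public + 0.369
slope_students   <- slope_public + 0.2922

print(round(c(cov = cov_int_stud, x_min = x_min, var0 = var_at_0, var1 = var_at_1), 2))
print(round(c(slope_public, slope_instructor, slope_students), 3))
```

Because the variance is smaller at students = 1 than at students = 0, the musician-specific lines fan in when moving from public/juried to student audiences.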
(f) Which variance components decreased
the most between Model1 and Model2? Provide a brief interpretation.
The variances of the instructor and student slopes. So knowing MPQ absorption explains some of the variation among musicians in the instructor-audience and student-audience effects.
(g) Suppose I want to add an indicator
variable for male to Model 2 as a predictor for all intercept and slope
terms. How many parameters will this add to the model?
This is a Level 2 variable, so you will have 3 new Level 2 coefficients (one in each of the three Level 2 equations).
(h) Suppose I add the male indicator to
Model 2 as suggested. How will this
change the interpretations you gave in (e)?
The intercept now applies to females, and the slope coefficient interpretations are after adjusting for both MPQ absorption and gender. When zeroing out interaction terms, we will be talking about females with average MPQ absorption.