**Stat 414 – Review 2 problems**

**1)** Consider this
paragraph: The multilevel models we have considered up to this point __control
for clustering__, and allow us to __quantify the extent of dependency__
and to investigate __whether the effects of level 1 variables vary across
these clusters__.

(a) I have underlined three components. Explain in detail what each of these components means in the multilevel model.

(b) The multilevel model referenced in the paragraph does not account for “contextual effects.” What is meant by that?
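For reference, one common way to write the kind of model the paragraph describes is a two-level random-slopes model. The notation below (the symbols $\beta$, $u$, $\tau$, $\sigma$ and the generic level-1 predictor $x_{ij}$) is assumed for illustration and may differ from the notation used in class.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \beta_{00} + u_{0j},
  \qquad \beta_{1j} = \beta_{10} + u_{1j} \\
& \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix} \right)
\end{aligned}
```

Note that this sketch contains no cluster-level summary of $x$ (such as the cluster mean $\bar{x}_j$) in the Level 2 equations, which is a useful thing to keep in mind for part (b).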

**2)** Give a short rule, in your own words,
describing when the interpretation of an estimated coefficient should “hold
constant” another covariate and when it should instead “set to 0” that covariate.

**3)** Consider this excerpt: “application of multilevel models for clustered data has
attractive features: (a) the correction of underestimation of standard errors,
(b) the examination of the cross-level interaction, (c) the elimination of
concerns about aggregation bias, and (d) the estimation of the variability of
coefficients at the cluster level.”

Explain each of these components to a non-statistician.

**4)** (a) Complete these sentences:

Only ________ can reduce level-1 variance in the outcome.

Only ________ can reduce level-2 variance in the outcome.

Only ________ can reduce variance of random slopes.

(b) Explain the distinction between these two R commands

+ (1 + age | id)

**6)** The following SAS output is from modeling results for
a randomized controlled trial at 29 clinical centers. The response variable is
diastolic blood pressure.

(a) What is the patient level variance? (Clarify any
assumptions you are making about the output/any clues you have.)

(b) What is the center level variance?

(c) What is an estimate of the ICC? Calculate and
interpret.

(d) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment C at a center with average
aggregate blood pressure scores?

(e) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment A at a center with aggregate
blood pressure scores at the median?

(f) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment C at a center with aggregate
blood pressure scores at the 16th percentile?

(g) What is the expected diastolic blood pressure for a
randomly selected patient receiving treatment B at a center with aggregate
blood pressure scores at the 97.5th percentile?
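Parts (c) through (g) rest on two small calculations, which can be sketched in base R. This is only a sketch under stated assumptions: the two variance values are made-up placeholders (not read from the SAS output), the helper name `icc` is hypothetical, and the center-level random intercepts are assumed to be normally distributed.

```r
# Made-up variance components for illustration only (NOT from the SAS output)
sigma2_patient <- 60   # level-1 (patient, within-center) variance
tau2_center    <- 15   # level-2 (between-center) variance

# Hypothetical helper: share of total variance that lies between centers
icc <- function(tau2, sigma2) tau2 / (tau2 + sigma2)
icc_value <- icc(tau2_center, sigma2_patient)

# Standard-normal quantiles for the 16th, 50th, and 97.5th percentiles:
# roughly -1, 0, and 1.96 standard deviations from the mean
z <- qnorm(c(0.16, 0.50, 0.975))

# Corresponding center-level shifts in outcome units; under this reading,
# each shift is added to the fixed-effect prediction for the treatment
offsets <- z * sqrt(tau2_center)

print(icc_value)
print(round(offsets, 2))
```

Under this reading, a center at the median (or the average) has a shift of zero, so the expected value is just the fixed-effect prediction for that treatment, while the 16th and 97.5th percentiles sit about one center-level standard deviation below and about two above it.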

*This is my attempt to have you come up with the models,
but this is probably more difficult than what I can ask on exam 2 where I’m
still telling you what models to look at.
My focus is on interpreting the provided output and understanding how
making changes impacts the model/model equations/graphs.*

**7)** Chapp et al. (2018) explored 2014
congressional candidates’ ambiguity on political issues in their paper, *Going
Vague: Ambiguity and Avoidance in Online Political Messaging*.
They hand-coded a random sample of 2012 congressional candidates’ websites,
assigning an ambiguity score. The 2014 websites were then automatically
scored using Wordscores, a program designed for political textual analysis. In
their paper, they fit a multilevel model of candidates’ ambiguity scores with
predictors at both the candidate and district levels.

Variables of interest include:

· ambiguity = assigned ambiguity score. Higher scores indicate greater clarity (less ambiguity).

· democrat = 1 if a Democrat, 0 otherwise (Republican).

· incumbent = 1 if an incumbent, 0 otherwise.

· ideology = a measure of the candidate’s left-right orientation. Higher (positive) scores indicate more conservative candidates and lower (negative) scores indicate more liberal candidates.

· mismatch = the distance between the candidate’s ideology and the district’s ideology (candidate ideology scores were regressed against district ideology scores; mismatch values represent the absolute value of the residual associated with each candidate).

· distID = the congressional district’s unique ID.

· distLean = the district’s political leaning. Higher scores imply more conservative districts.

· attHeterogeneity = a measure of the variability of ideologies within the district. Higher scores imply more attitudinal heterogeneity among voters.

· demHeterogeneity = a measure of the demographic variability within the district. Higher scores imply more demographic heterogeneity among voters.

Consider the following research
questions. Explain what model(s) you could explore to test these hypotheses
(including what the different levels represent).

(a) Is ideological distance [from district residents]
associated with greater ambiguity, but to varying degrees across the districts?

(b) Does the impact of ideological distance depend on the
attitudinal heterogeneity among voters in the district?

(c) Controlling for ideological distance, does ideological
extremity [of the candidate] correspond to less ambiguity?

(d) Does more variance in attitudes [among district
residents] correspond to a higher degree of ambiguity in rhetoric?

(e) Does candidate rhetoric become clearer
as the candidate’s issue positions move to the ideological extremes?

(f) Does the variability in ambiguity scores
differ for Republican and Democratic candidates?
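As one possible starting point for a few of these questions, the model structures can be written as R formulas in `lmer` syntax. The sketch below treats candidates as Level 1 and districts (`distID`) as Level 2; the object names `f_a`, `f_b`, and `f_d` are hypothetical, and these are candidate specifications to compare, not the models fit in the paper.

```r
# (a) mismatch effect on ambiguity, with intercepts and mismatch slopes
#     allowed to vary across districts
f_a <- ambiguity ~ mismatch + (1 + mismatch | distID)

# (b) cross-level interaction: does the mismatch slope depend on the
#     district's attitudinal heterogeneity?
f_b <- ambiguity ~ mismatch * attHeterogeneity + (1 + mismatch | distID)

# (d) district-level predictor of the district intercepts
f_d <- ambiguity ~ attHeterogeneity + (1 | distID)

# Formulas are ordinary R objects, so the variables involved can be inspected
print(all.vars(f_b))
```

Each formula could then be passed to `lme4::lmer()` along with the candidate-level data frame. Because higher `ambiguity` scores indicate greater clarity, a negative coefficient corresponds to more ambiguous rhetoric.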

**8)** Let’s consider
some models for predicting the happiness of
musicians prior to performances, as measured by the *positive* affect
scale (pa) from the PANAS instrument. MPQ absorption measures a musician’s level of
openness to absorbing sensory and imaginative experiences.

__Model0__

(a) Calculate and interpret the
intraclass correlation coefficient.

__Model1__

instructor = ifelse(musicians$audience == "Instructor", 1, 0)

students = ifelse(musicians$audience == "Students", 1, 0)

summary(model1 <- lmer(pa ~ 1 + instructor + students + (1 + instructor + students | subjnum), data = musicians))
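The `lmer` call above can be read as the following pair of level equations, with performance $j$ nested in musician $i$ (`subjnum`). The Greek-letter names are an assumed notation for illustration and may differ from the notation used in class.

```latex
\begin{aligned}
\text{Level 1:}\quad & \text{pa}_{ij} = a_i + b_i\,\text{instructor}_{ij}
  + c_i\,\text{students}_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & a_i = \alpha_0 + u_i, \qquad
  b_i = \beta_0 + v_i, \qquad
  c_i = \gamma_0 + w_i \\
& (u_i, v_i, w_i)^{\top} \sim N(\mathbf{0}, \Sigma),
  \qquad \Sigma \text{ an unstructured } 3 \times 3 \text{ covariance matrix}
\end{aligned}
```

Written this way, the model has 3 fixed effects, 3 random-effect variances, 3 covariances, and 1 residual variance, for 10 estimated parameters.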

(b) Interpret the 34.73, -4.19, and 4.51
values in context.

(c) Calculate a (pseudo) R²
for Level 1 for this model.
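A common form of the Level 1 pseudo R² compares the estimated residual (within-musician) variance before and after adding Level 1 predictors. The formula below assumes Model0 is the baseline model with random intercepts only and that $\hat{\sigma}^2$ denotes the estimated residual variance from each fit.

```latex
\text{pseudo } R^2_{L1}
  = \frac{\hat{\sigma}^2_{\text{Model0}} - \hat{\sigma}^2_{\text{Model1}}}
         {\hat{\sigma}^2_{\text{Model0}}}
```

Unlike the usual R², this quantity is not guaranteed to be nonnegative, because the variance components are re-estimated in each model.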

__Model2__

(d) Write the corresponding model out as
Level 1 and Level 2 equations. (In terms of parameters, not estimates). How
many parameters are estimated in this model?

(e) Provide interpretations for all
estimated parameters!

(f) Which variance components decreased
the most between Model1 and Model2? Provide a brief interpretation.

(g) Suppose I want to add an indicator
variable for *male* to Model2 as a predictor for all intercept and slope
terms. How many parameters will this add to the model?

(h) Suppose I add the male indicator to
Model2 as suggested. How will this
change the interpretations you gave in (e)?