Stat 414 – Review II

Due Tuesday, 9am: In PolyLearn,

(1) post at least one question you have on the material described below.

(2) post at least one example question that you think I could ask on the material described below

Format of Exam: The exam will cover Lectures 9-15, Homeworks 5-6, and Ch. 4-5, 8, 10, and 15. The exam will mostly consist of interpretation questions, but I might ask you to do something on the computer (using R). The exam will be worth approximately 50-75 points. Be ready to explain your reasoning, including in “layman’s” language. You can use hard copies of notes, handouts, and the textbook. Keep in mind that some questions could overlap with material on Exam 1.

What you need to know about Hierarchical Linear Models

· Recognizing multilevel data

o Multilevel models include “grouping variable(s)” (e.g., school) that cluster together multiple outcomes that you expect to be correlated. The “effects” of these grouping variable(s) are modeled as unobserved draws from a random distribution rather than as unrelated, unknown fixed values.

· Interpreting graphs to explore relationships

· Defining Level 1 and Level 2 and Level 3

o How to specify the levels in R

· Random intercepts models

o Interpret the variance components in context

o Intraclass correlation coefficients

o Variance partitioning coefficients
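As a reference for the items above, a two-level random intercepts model with no predictors, and its ICC, can be written as follows. The symbols ($\beta_0$, $u_j$, $\epsilon_{ij}$, $\tau_0^2$, $\sigma^2$) are notation assumed here for illustration and may differ from the notation used in lecture; $i$ indexes Level 1 units within Level 2 group $j$, and $u_j$ and $\epsilon_{ij}$ are taken to be independent.

```latex
\begin{align*}
Y_{ij} &= \beta_0 + u_j + \epsilon_{ij},
  \qquad u_j \sim N(0, \tau_0^2), \quad \epsilon_{ij} \sim N(0, \sigma^2) \\
\text{ICC} &= \frac{\tau_0^2}{\tau_0^2 + \sigma^2}
\end{align*}
```

In context, $\tau_0^2$ is the variation in group means (e.g., between schools) and $\sigma^2$ is the variation among individuals within the same group.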

· Random slopes models

o Interpretation as interaction across higher level units

§ e.g., Level 1 variables can have random slopes at Level 2

o Generally include the correlation between slopes and intercepts (and between different random slopes) in the model, but these correlations can be forced to be zero

o Interpretation/visualization of variance component for slopes, covariance

o About 95% of the group-specific slopes should fall within 2 SDs of the overall slope (assuming the random slopes are normally distributed)

o Multiple variables can have random slopes (do you want them to be correlated?)
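The "within 2 SDs" interpretation and the intercept-slope correlation can be checked with simple arithmetic. The sketch below uses made-up estimates (they are not from any course data set) and plain Python rather than R; the same calculation applies to the variance components reported in model output.

```python
import math

# Hypothetical estimates, chosen only for illustration
fixed_slope = 2.5          # overall (average) slope
intercept_variance = 4.0   # variance of the random intercepts
slope_variance = 0.36      # variance of the random slopes
covariance = -0.6          # covariance between intercepts and slopes

# Range that should contain about 95% of the group-specific slopes
slope_sd = math.sqrt(slope_variance)
lower = fixed_slope - 2 * slope_sd
upper = fixed_slope + 2 * slope_sd

# Convert the covariance to a correlation
correlation = covariance / (math.sqrt(intercept_variance) * slope_sd)

print(f"About 95% of group slopes fall between {lower:.2f} and {upper:.2f}")
print(f"Intercept-slope correlation: {correlation:.2f}")
```

A negative correlation here would mean that groups with higher intercepts tend to have flatter (smaller) slopes.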

· Stating the model equation, interpreting the model components

o Thinking of the Level 2 equations as “intercepts as outcomes” and “slopes as outcomes” models

o Working with both “level equations” and “composite equation”

o Do be careful with indices

o Do specify error terms, and their distributions, and covariance terms
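One way to practice moving between the level equations and the composite equation is to write out a model with one Level 1 predictor $x_{ij}$, one Level 2 predictor $w_j$, and random intercepts and slopes. The notation below ($\gamma$ for fixed effects, $u$ for Level 2 errors, $\tau$ for their variances and covariance) is an assumed convention and may differ from the one used in class.

```latex
\begin{align*}
\text{Level 1:} \quad & Y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij},
  \qquad \epsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:} \quad & \beta_{0j} = \gamma_{00} + \gamma_{01} w_j + u_{0j} \\
                      & \beta_{1j} = \gamma_{10} + \gamma_{11} w_j + u_{1j} \\
& \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix} \right) \\
\text{Composite:} \quad & Y_{ij} = \gamma_{00} + \gamma_{01} w_j + \gamma_{10} x_{ij}
  + \gamma_{11} w_j x_{ij} + u_{0j} + u_{1j} x_{ij} + \epsilon_{ij}
\end{align*}
```

Substituting the Level 2 equations into Level 1 gives the composite form; the $\gamma_{11} w_j x_{ij}$ term is the cross-level interaction.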

· Adding variables that are measured at different levels

o Inclusion of cross-level interactions

o Expected change in Level 1 and Level 2 variances (percentage change)
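The percentage change in a variance component after adding a variable is a simple proportional reduction. The sketch below uses hypothetical variance estimates and a helper name (`pct_variance_explained`) invented for illustration.

```python
def pct_variance_explained(var_before, var_after):
    """Percentage reduction in a variance component between two models."""
    return 100 * (var_before - var_after) / var_before

# Hypothetical estimates: null model vs. model with one added Level 1 predictor
null_level1, null_level2 = 40.0, 10.0
new_level1, new_level2 = 30.0, 9.0

level1_change = pct_variance_explained(null_level1, new_level1)
level2_change = pct_variance_explained(null_level2, new_level2)

print(f"Level 1 variance explained: {level1_change:.1f}%")
print(f"Level 2 variance explained: {level2_change:.1f}%")
```

Note that this quantity can come out negative (for example, the Level 2 variance estimate can increase after adding a Level 1 variable), so it is not a true R².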

· Using estimated effects to make predictions (e.g., “what is the expected response for a student with IQ = 100?” vs. “what is the expected response for a student with IQ = 100 at school 2?”)

o Explain the principle of shrinkage/pooling as it pertains to multilevel models
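To make shrinkage concrete, the sketch below computes the shrunken estimate of a group mean in a random intercepts model with no predictors, where the estimate is a weighted average of the group's own mean and the overall mean. All numbers and the function name are hypothetical, and the variance components are treated as known.

```python
def shrunken_group_mean(group_mean, n_j, grand_mean, tau2, sigma2):
    """Weighted average of the group mean and the grand mean.

    The weight on the group's own mean grows with the group size n_j and
    with the between-group variance tau2 relative to sigma2.
    """
    weight = tau2 / (tau2 + sigma2 / n_j)
    return weight * group_mean + (1 - weight) * grand_mean

# Hypothetical values: two schools with the same observed mean but different sizes
grand_mean, tau2, sigma2 = 50.0, 10.0, 40.0
small_school = shrunken_group_mean(60.0, n_j=4, grand_mean=grand_mean, tau2=tau2, sigma2=sigma2)
large_school = shrunken_group_mean(60.0, n_j=40, grand_mean=grand_mean, tau2=tau2, sigma2=sigma2)

print(f"Small school (n = 4):  {small_school:.2f}")
print(f"Large school (n = 40): {large_school:.2f}")
```

The small school's estimate is pulled further toward the overall mean because its own mean is based on less information.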

· Comparing models

o Using likelihood ratio test to assess whether a full model is significantly better than a reduced (nested) model

§ Be able to write out hypotheses

o Using AIC, BIC to compare models

o Can also calculate percentage change in variation explained at different levels (pseudo-R²)

§ Usually with random intercepts only
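The model comparison calculations can be sketched by hand. The example below assumes a reduced model with random intercepts only and a full model that adds random slopes, so the full model has two extra parameters (the slope variance and the intercept-slope covariance) and the hypotheses are H0: both extra parameters are zero vs. Ha: at least one is not. The log-likelihoods, parameter counts, and sample size are made up, and n in BIC is taken to be the number of observations.

```python
import math

def lrt_df2(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and its chi-square p-value with 2 df.

    For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    """
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, math.exp(-stat / 2)

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Hypothetical fits
ll_reduced, k_reduced = -1250.0, 4   # intercept, slope, intercept variance, residual variance
ll_full, k_full = -1245.0, 6         # adds slope variance and covariance
n_obs = 500

stat, p_value = lrt_df2(ll_reduced, ll_full)
aic_reduced, aic_full = aic(ll_reduced, k_reduced), aic(ll_full, k_full)
bic_reduced, bic_full = bic(ll_reduced, k_reduced, n_obs), bic(ll_full, k_full, n_obs)

print(f"LRT statistic = {stat:.1f}, p-value = {p_value:.4f}")
print(f"AIC: reduced = {aic_reduced:.1f}, full = {aic_full:.1f}")
print(f"BIC: reduced = {bic_reduced:.1f}, full = {bic_full:.1f}")
```

With these made-up numbers, AIC favors the full model while BIC favors the reduced one, a reminder that the two criteria penalize extra parameters differently. Because a variance cannot be negative, the chi-square p-value for a test of variance components tends to be conservative.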

· Do be able to interpret model equations and model output

o Parameters vs. statistics

o When you need to “fix” other variables and when you need to set them to zero or to their mean

o Interpretation of covariance (correlation) between intercepts and slopes in context

o “What does it mean to have random intercepts in this model?” “What does it mean for this variable to have random slopes?” (in context, in model, in pictures)

· Centering

o Why it is a good idea

o Interpretation of coefficients

o Group mean centering vs. Grand mean centering

o Within group vs. Between group regressions

§ Including exploration/explanation for why they could differ
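The difference between the two kinds of centering is easy to see on a tiny data set. The sketch below uses six made-up (school, IQ) pairs and plain Python; the variable names are illustrative only.

```python
from statistics import mean

# Hypothetical data: (school, IQ)
rows = [("A", 90), ("A", 100), ("A", 110),
        ("B", 110), ("B", 120), ("B", 130)]

grand_mean = mean(iq for _, iq in rows)
schools = sorted({school for school, _ in rows})
group_means = {s: mean(iq for school, iq in rows if school == s) for s in schools}

centered = [
    {
        "school": school,
        "iq": iq,
        "grand_centered": iq - grand_mean,          # deviation from the overall mean
        "group_centered": iq - group_means[school], # deviation from own school's mean
        "group_mean": group_means[school],          # Level 2 variable for the between-group effect
    }
    for school, iq in rows
]

for row in centered:
    print(row)
```

The group-centered values carry only within-school information, while the school means carry the between-school information, which is why the within-group and between-group slopes can be estimated separately and can differ.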

· Examining residuals to check model assumptions

o Marginal vs. Conditional residuals

o Random effects, Random effects residuals

o Exploration and interpretation of influential observations (what it means to be influential and why we care), possible remedies

· Apply to longitudinal data

o Repeated measurements at Level 1

o Consider different forms of association at Level 1 (e.g., quadratic, piecewise)

o Wide vs. Long format

o Use graphs to suggest components that should be included in model

o Percentage of Level 1 variation explained by changes over time

o Time independent vs. time dependent variables
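The wide vs. long distinction can be illustrated by reshaping a small data set by hand. The sketch below uses made-up measurements and column names; in long format each row is one measurement occasion (Level 1) and the time-independent variable is repeated on every row for the same individual.

```python
# Hypothetical wide-format data: one row per individual
wide = [
    {"id": 1, "group": "treatment", "y_t0": 10, "y_t1": 12, "y_t2": 15},
    {"id": 2, "group": "control", "y_t0": 11, "y_t1": 11, "y_t2": 12},
]

# Map each repeated-measure column to its time value
time_columns = {"y_t0": 0, "y_t1": 1, "y_t2": 2}

# Long format: one row per individual per time point
long_rows = [
    {"id": person["id"], "group": person["group"], "time": time, "y": person[column]}
    for person in wide
    for column, time in time_columns.items()
]

for row in long_rows:
    print(row)
```

Here `group` is time independent (constant within an individual), while `time` and `y` change from row to row.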

VPC = variance partition coefficient = percentage of variation in response explained by each level

o VPC = (level variance)/(total variance)

o Focusing on the random intercepts (no random slopes)

o In a two-level model, this equals the ICC, which also represents the correlation between 2 individuals in the same Level 2 group (e.g., 2 students in the same school)

o The higher the correlation within the level 2 units (i.e., the larger the ICC), the smaller the share of the total variability that lies within the level 2 units and, consequently, the larger the share that lies between the level 2 units.

o In a three-level model (e.g., students, classrooms, schools), there are a few different types of correlations you could look at

o ICC = (level 3 variance)/(total variance) = correlation of responses within the same level 3 unit but in different level 2 units (e.g., students in the same school but different classes)

o ICC = (level 2 variance + level 3 variance)/(total variance) = correlation of responses within the same level 2 unit (e.g., students in the same class)

o ICC = (level 3 variance)/(level 2 variance + level 3 variance) = correlation of level 2 units within the same level 3 unit (e.g., classes within the same school)

o (level 1 variance)/(total variance) = how much variation is due to level 1 differences
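The three-level calculations above can be checked with a few lines of arithmetic. The variance components below are hypothetical (students within classes within schools) and chosen so that the total is 100.

```python
# Hypothetical variance components from a three-level random intercepts model
level1_var = 60.0   # students (within classes)
level2_var = 15.0   # classes (within schools)
level3_var = 25.0   # schools

total_var = level1_var + level2_var + level3_var

# Correlation between two students in the same school but different classes
icc_same_school = level3_var / total_var

# Correlation between two students in the same class (and so the same school)
icc_same_class = (level2_var + level3_var) / total_var

# Correlation between two classes in the same school
icc_classes_in_school = level3_var / (level2_var + level3_var)

# Share of variation due to level 1 (student-to-student) differences
vpc_level1 = level1_var / total_var

print(f"Same school, different class: {icc_same_school:.3f}")
print(f"Same class:                   {icc_same_class:.3f}")
print(f"Classes within same school:   {icc_classes_in_school:.3f}")
print(f"Level 1 VPC:                  {vpc_level1:.3f}")
```

Students in the same class share both the class effect and the school effect, so their correlation is never smaller than that of students who share only a school.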

Remember to

· Go beyond “variation in intercepts and slopes” and be able to explain in context what the intercepts and slopes represent

· Distinguish between aggregating and disaggregating data

· Distinguish between correlation of observations, correlation of errors, and correlation of slopes

· Check out the “practice questions” in PolyLearn

· Be able to set up a model based on the research questions

· Translate between equations and graphs and output

· Be able to interpret interactions in context