Stat 414 Final exam review

Stat 414 – Final Review Handout

Optional Review Session: Sunday, 12-2pm (classroom and zoom?)

Final exam: Monday, 7:10-10:00am

Format of Exam: The exam will be cumulative, with some emphasis on applying what we learned earlier to new situations (e.g., logistic models, longitudinal data, crossed effects). The exam will mostly be interpretation questions. Be ready to explain your reasoning, including in “layperson’s” language. You can use 2 pages of (two-sided) notes. I will give you output rather than asking you to run any models in R.

Advice: Review Quizzes/commentary, HW solutions/grader comments (including on submission page), Lecture notes, textbook. Post questions in the Final Exam discussion board in Canvas.

Overview: We have learned about multilevel models because they are required for proper analysis of multilevel data: they account for the clustering/nesting in the design of the study. At its core, a multilevel model is a regression model that includes the “Level 2 grouping variable” so that we can explore adjusted associations. The Level 2 grouping variable could be added without learning any new methods by including it as a categorical variable with fixed effects. Alternatively, we have seen how to treat its effects as random: we admit that we don’t care much about the specific observed levels of the grouping variable and instead model the population the groups are sampled from, which adds only one new parameter to the model (the intercept variance). So we have mostly focused on the consequences of that assumption, such as shrinkage of the estimated “effects” of the Level 2 groups, the intraclass correlation as a measure of within-group correlation, and random slopes as the parallel to an interaction between Level 1 variables and the Level 2 grouping variable. We were also exposed to different estimation methods such as maximum likelihood, leading to likelihood ratio tests. This basic structure is easily extended to more than 2 levels and to generalized models such as logistic regression.

Topics since Midterm

From Day 10 (and Sec 5.2), you should be able to

· Add Level 2 variables to the model (model equations, R commands, graphs, interpretations, significance)

o Composite equation vs. Level equations

o Indices

o Derived vs. Observed

o Calculation of variation explained (intercepts, slopes)
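
The "variation explained" calculation can be sketched as a proportional reduction in a variance component between two nested models. The numbers below are hypothetical values chosen for illustration, not output from any course dataset, and the function name is an assumption.

```python
def proportion_explained(var_before: float, var_after: float) -> float:
    """Proportional reduction in a variance component after adding a predictor."""
    return (var_before - var_after) / var_before


# Hypothetical variance estimates before/after adding a Level 2 variable
tau0_sq_before, tau0_sq_after = 8.0, 6.0      # Level 2 (intercept) variance
sigma_sq_before, sigma_sq_after = 40.0, 38.0  # Level 1 (residual) variance

level2_explained = proportion_explained(tau0_sq_before, tau0_sq_after)
level1_explained = proportion_explained(sigma_sq_before, sigma_sq_after)

print(f"Level 2 intercept variance explained: {level2_explained:.1%}")
print(f"Level 1 variance explained: {level1_explained:.1%}")
```

The same function applies to slope variances: compare the slope variance before and after adding a variable to the slope equation. A negative result means the variance estimate went up rather than down.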

From Day 11, you should be able to

· Use graphs, context, and model equation to interpret an interaction in context

o Explain the nature of the interaction

§ Change in slopes = change in effect of x₁ on y depending on value of x₂

§ NOT the same as x₁ and x₂ being related to each other

o Interpret signs of coefficients

· Write out separate equations

o Be able to talk about why we don’t just fit separate equations

· Be careful when interpreting “main effects” if the model has an interaction

o Can describe slope of x₁ on y when x₂ is at zero (or mean if centered)

· Why it’s useful to center variables involved in an interaction

· Interpret random slopes models (aka “random coefficients models”)

o Interpretation as interaction across higher level units

§ e.g., Level 1 variables can have random slopes at Level 2

· Explain the distinction between a random slopes model and fitting a separate equation for each Level 2 group

o Complete pooling vs. Partial pooling vs. No pooling

· Interpret the standard deviation/variance of the slopes (τ₁, τ₁²)

· Write out the Level 1 and Level 2 equations for the random slopes model

o Including specifying error terms, and their distributions, and covariance terms

o Do be careful with indices

o Thinking of Level 2 equations as “intercepts as outcomes” and “slopes as outcomes” models

o Interpretation of the model components

o Generally include correlation between slopes and intercepts (and slopes and slopes) in the model but can force to be zero

· Compare the random slopes to the fixed slopes models and decide significance of random slopes

· Compare relative sizes of variance components in context

· Interpretation/visualization of variance component for slopes

o About 95% of the group slopes should fall within 2 SD of the overall slope (assuming normally distributed slope effects)

· Interpretation of covariance/correlation between random intercepts and random slopes

· Distinguish between “random slopes” (β₁ⱼ, each group’s own slope) and “slope effects” (u₁ⱼ, each group’s deviation from the overall slope)
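
As a reference for the Level 1, Level 2, and composite equations in the Day 11 list above, here is one way to write a random slopes model with a single Level 1 variable. The notation (β for coefficients, u for Level 2 effects, τ for their standard deviations and covariance, σ² for the Level 1 variance) is an assumption and may differ from the symbols used in lecture; i indexes Level 1 units within Level 2 group j.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \varepsilon_{ij},
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \beta_0 + u_{0j} \\
  & \beta_{1j} = \beta_1 + u_{1j} \\
  & \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim
    N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix} \right) \\
\text{Composite:}\quad & y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \varepsilon_{ij} \\
\text{Typical slopes:}\quad & \beta_1 \pm 2\tau_1 \ \text{covers about 95\% of the group slopes}
\end{aligned}
```

Forcing the intercept-slope correlation to zero corresponds to setting the off-diagonal term of the covariance matrix to 0, which removes one parameter from the model.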

From Day 12 (and Sec. 5.2), you should be able to:

· Add a Level 2 variable to explain variation in random slopes

o Inclusion of cross-level interactions

§ Interpretation

o Level equations (adding Level 2 variable to equation for intercept vs. slope vs. both)

o Measuring change in Level 1 and Level 2 variances as percentage change

§ A Level 2 variable could explain pretty much all of the Level 2 variation, i.e., it can be sufficient for adjusting for the clustering in the study design

· Explain how random slopes models induce heteroscedasticity in the responses

o Variance as quadratic function of the Level 1 variable x

o Minimized at x = −(intercept-slope covariance)/(slope variance)

§ Does this occur within the range of x values in the dataset?

· Explain how random slopes models imply different correlations among pairs of observations in the same group depending on the x values involved

o Translating between covariance and correlation

· Interpret covariance/correlation between random intercepts and random slopes

o Distinguish between cov(y’s) and cov(u’s)

o Sign of covariance and implications for fanning in/fanning out of lines

§ Interpretation in context

o Translating between covariance and correlation

§ Recognizing which is reported in the output

· Interpret the variance-covariance matrix output (marginal vs. conditional)

o Compare model predictions to observed results

· Distinguish between Level 1 variance, Level 2 variance, and total variance

· Explain limitation of ICC in random slopes model
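
The heteroscedasticity and correlation ideas in the Day 12 list above can be checked numerically. This is a minimal sketch for a model with one random slope; the variance components are hypothetical values, and the names (tau0_sq, tau1_sq, tau01, sigma_sq) are assumed notation for the intercept variance, slope variance, intercept-slope covariance, and Level 1 variance.

```python
import math

# Hypothetical variance components (not from any course output)
tau0_sq = 4.0    # variance of random intercepts
tau1_sq = 0.25   # variance of random slopes
tau01 = -0.5     # covariance between random intercepts and slopes
sigma_sq = 9.0   # Level 1 (residual) variance


def total_variance(x: float) -> float:
    """Var(y) at a given x: a quadratic function of x."""
    return tau0_sq + 2 * tau01 * x + tau1_sq * x**2 + sigma_sq


def covariance(x1: float, x2: float) -> float:
    """Cov(y1, y2) for two different observations in the same group."""
    return tau0_sq + tau01 * (x1 + x2) + tau1_sq * x1 * x2


def correlation(x1: float, x2: float) -> float:
    """Corr(y1, y2): depends on the x values, so there is no single ICC."""
    return covariance(x1, x2) / math.sqrt(total_variance(x1) * total_variance(x2))


# x value where the total variance is smallest
x_min = -tau01 / tau1_sq

# Translate the covariance of the random effects into their correlation
rho01 = tau01 / math.sqrt(tau0_sq * tau1_sq)

print("variance minimized at x =", x_min)
print("corr(intercepts, slopes) =", rho01)
for x in (0.0, 2.0, 4.0):
    print(f"x = {x}: Var(y) = {total_variance(x)}, corr within group = {correlation(x, x):.3f}")
```

Note the two different covariances in play: tau01 is the covariance of the u's, while covariance(x1, x2) is the covariance of the y's. A negative tau01 means groups with higher intercepts tend to have smaller slopes, so the lines fan in over the range of x below x_min.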

From Day 13, you should be able to:

· Consider random slopes for multiple variables

o Do you want them to be correlated?

o Can increase complexity of model pretty quickly

o Interpretation of random effects correlations (and identifying pairs from output)

· Determine the number of parameters being estimated in a model

o An SD and its variance count as just one parameter; remember to include the covariances

· Decide when to use a logistic regression model

· Interpret an odds ratio in context

· Explain why we don’t often use linear regression models to model probabilities

· Interpret a basic logistic regression model

o Back transform intercept to predicted probability

o Use exp(slope) as an odds multiplier

o Substitute into equation and back transform to predicted probability

· Continue to consider multiple regression models and adjusted associations
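
The back-transformations for a basic logistic regression model can be sketched as follows. The coefficients are hypothetical, chosen only to make the arithmetic easy to follow, and inv_logit is a helper name introduced here.

```python
import math


def inv_logit(eta: float) -> float:
    """Back-transform a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical fitted model: log-odds = b0 + b1 * x
b0, b1 = -1.0, 0.5

# Intercept: predicted probability when x = 0
p_at_zero = inv_logit(b0)

# exp(slope): multiplier on the odds for each one-unit increase in x
odds_ratio = math.exp(b1)

# Substitute x = 2 into the equation, then back-transform
p_at_two = inv_logit(b0 + b1 * 2)

print(f"P(success | x = 0) = {p_at_zero:.3f}")
print(f"odds multiplier per unit of x = {odds_ratio:.3f}")
print(f"P(success | x = 2) = {p_at_two:.3f}")
```

The odds multiplier is constant across x, but the change in predicted probability is not, which is why the slope is interpreted on the odds scale rather than the probability scale.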

From Day 14, you should be able to

· Identify the need for a multilevel logistic regression model (response variable is binary rather than quantitative, with clustered data)

o Use a chi-square test to decide whether the level 2 grouping variable is associated with the response variable (aka significant level 2 variation in response)

o Fit a logistic regression model with random intercepts

§ Same as adding a categorical variable with lots of categories that we aren’t really all that interested in individually but want to adjust for their impact on the response

o Compute an intraclass correlation coefficient with a multilevel logistic regression null model

o Interpret the sign of a slope coefficient in a multilevel logistic multiple regression model

§ Predicted probability as increasing or decreasing

§ Adjusting for other variables

§ Slope as “subject specific” rather than “population average” effect (average subject rather than averaged over all the subjects)

o Interpret/visualize random slopes in a multilevel logistic regression model

§ Keep in mind that changing the intercept moves the curve left or right, while changing the slope changes the rate of increase/decrease (how quickly the S-shaped pattern rises or falls)

o Write out the Level 1 and Level 2 and composite equations for multilevel logistic multiple regression model

o Describe the variance component(s)

o Compare models

o Summarize models in context
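
For the intraclass correlation in a multilevel logistic null model, one common approach is the latent-variable formulation, which treats the Level 1 variance as fixed at π²/3 (the variance of the standard logistic distribution). The sketch below assumes that formulation, which may not be the only definition used in the course, and uses hypothetical estimates.

```python
import math


def inv_logit(eta: float) -> float:
    """Back-transform a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def latent_icc(tau0_sq: float) -> float:
    """ICC under the latent-variable approach: Level 1 variance fixed at pi^2 / 3."""
    return tau0_sq / (tau0_sq + math.pi**2 / 3)


# Hypothetical null-model estimates (log-odds scale)
beta0 = -0.5    # overall intercept
tau0_sq = 1.0   # variance of the random intercepts

icc = latent_icc(tau0_sq)

# Range of group-level probabilities implied by intercepts within 2 SD
tau0 = math.sqrt(tau0_sq)
low_p = inv_logit(beta0 - 2 * tau0)
mid_p = inv_logit(beta0)
high_p = inv_logit(beta0 + 2 * tau0)

print(f"ICC = {icc:.3f}")
print(f"typical group probabilities: {low_p:.3f} to {high_p:.3f} (average group: {mid_p:.3f})")
```

Here mid_p is the probability for an "average" group (random effect of zero), which is the subject-specific reading of the intercept; it is generally not the same as the probability averaged over all groups.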

From Days 15/16, you should be able to

· Apply multilevel models to longitudinal data

o Repeated measurements at Level 1

· Identify time independent (“invariant”) vs. time dependent explanatory variables

· Identify Wide vs. Long format

· Compute percentage of Level 1 variation explained by changes over time as well as changes explained by other variables after accounting for time dependence

· Consider different error structures (AR(1)) at Level 1

o vs. random slopes

o Application of variance/covariance equations

o Comparison to observed correlation matrix

· Consider different forms of association at Level 1 (e.g., quadratic, piecewise)

· Use graphs to suggest components that should be included in model
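
To compare error structures for longitudinal data, it can help to write out the correlation matrices they imply. This sketch assumes equally spaced time points and uses hypothetical parameter values; the function names are introduced here for illustration. A random intercepts model implies the same correlation for every pair of time points, while AR(1) implies correlations that decay as the time points get farther apart.

```python
def ar1_correlation_matrix(n_times: int, rho: float) -> list[list[float]]:
    """AR(1): correlation between times s and t is rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]


def constant_correlation_matrix(n_times: int, icc: float) -> list[list[float]]:
    """Random intercepts only: every pair of time points has correlation = ICC."""
    return [[1.0 if s == t else icc for t in range(n_times)] for s in range(n_times)]


# Hypothetical values
ar1 = ar1_correlation_matrix(4, rho=0.6)
const = constant_correlation_matrix(4, icc=0.4)

print("AR(1):")
for row in ar1:
    print([round(r, 3) for r in row])

print("Random intercepts:")
for row in const:
    print([round(r, 3) for r in row])
```

Either matrix can then be compared with the observed correlation matrix of the repeated measurements (in wide format) to judge which structure looks more plausible.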

From Day 17, you should be able to

· Fit a three-level model (model equations, R syntax, interpretations)

· Variance partitioning coefficients vs. ICC

· Interpret interactions

· Discuss different model formulations (e.g., random slopes at one level but not another)
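
In a three-level random intercepts model, variance partitioning coefficients and intraclass correlations use the same variance components but answer different questions. A minimal sketch with hypothetical variance estimates (the variable names are assumptions):

```python
# Hypothetical variance components for a three-level random intercepts model
var_level1 = 6.0  # within Level 2 units (residual)
var_level2 = 3.0  # between Level 2 units within the same Level 3 unit
var_level3 = 1.0  # between Level 3 units

total = var_level1 + var_level2 + var_level3

# Variance partitioning coefficients: share of total variance at each level
vpc_level1 = var_level1 / total
vpc_level2 = var_level2 / total
vpc_level3 = var_level3 / total

# Intraclass correlations: correlation between two Level 1 units that share...
icc_same_level2 = (var_level2 + var_level3) / total  # same Level 2 (and so same Level 3) unit
icc_same_level3_only = var_level3 / total            # same Level 3 unit, different Level 2 units

print(f"VPCs: L1 = {vpc_level1:.2f}, L2 = {vpc_level2:.2f}, L3 = {vpc_level3:.2f}")
print(f"ICC, same Level 2 unit: {icc_same_level2:.2f}")
print(f"ICC, same Level 3 unit only: {icc_same_level3_only:.2f}")
```

Note that the Level 2 VPC and the ICC for two units in the same Level 2 unit differ: units in the same Level 2 unit also share a Level 3 unit, so both variance components contribute to their correlation.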

From Chapter 13, you should be able to

o Identify non-hierarchical models (imperfect hierarchies)

§ Lower-level groups feed into different upper level groups

o Interpret a “crossed-effects” model

§ Multiple sources of “random effects” on the same level

§ Interpret large/small random effects in context

§ Interpret parameter estimates in context

· Still need to consider whether a variable is included in an interaction

· Interpret interaction as changes in slope/effect of other variable

§ Variance components, intraclass correlation coefficient combinations

§ Prediction

§ Random slopes
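
For a crossed-effects model with random intercepts for two grouping factors, the "intraclass correlation combinations" depend on which groups two observations share. The sketch below uses hypothetical variance components and generic factor names A and B (for example, two classifications that are not nested within each other).

```python
# Hypothetical variance components for a crossed random intercepts model
var_a = 2.0      # variance of random effects for factor A
var_b = 1.0      # variance of random effects for factor B
var_resid = 7.0  # residual variance

total = var_a + var_b + var_resid


def crossed_correlation(same_a: bool, same_b: bool) -> float:
    """Correlation between two different observations, given which groups they share."""
    shared = (var_a if same_a else 0.0) + (var_b if same_b else 0.0)
    return shared / total


for same_a in (True, False):
    for same_b in (True, False):
        print(f"same A: {same_a}, same B: {same_b} -> corr = {crossed_correlation(same_a, same_b):.2f}")
```

A factor with a large variance component relative to the others contributes more to the similarity of observations that share a group on that factor.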

Some reminders

· Distinguish between variables that have only level 1 variation, level 1 and level 2 variation, only level 2 variation

o A Level 1 variable can explain variation at Level 2 if the distributions (means) of the Level 1 variable differ across the level 2 categories. Can also increase the Level 2 variation if the associations are in “opposite directions.”

· Discuss random slopes as the interaction between a level 1 variable and the level 2 units/grouping variable

o As a proxy for meaningful Level 2 variables or could be replaced if have access to meaningful Level 2 variables

o Assumes Level 2 random effects are not associated with Level 1 variables (no confounding/have accounted for the relevant variables/model is appropriately specified)

· Can test specific Level 2 variables (e.g., aggregated Level 1 variables) to decide whether the Level 1 association between y and x and the Level 2 association between the group means of y and x significantly differ

o e.g., does living in a more religious country have the same effect as being a more religious person?

o e.g., does being the type of family that lives in poverty (vs. not) have the same effect as a change in poverty for an individual child?

· Treating the Level 2 grouping variable as fixed instead of random is a way to adjust for all possible observed and unobserved characteristics that differ from Level 2 unit to unit, whereas the random effects model adjusts for “units like these.”

· Interpretation of different model components in context

o Go beyond “variation in intercepts and slopes” but be able to explain in context what the intercepts and slopes represent

o Including for categorical variables

o Including variance explained

o Including interactions

o Including main effects when the model has interactions

§ When you need to “fix” other variables and when you need to set them to zero or the mean