Stat 414 – Review I

 

Due Monday, 9pm: In PolyLearn,

(1) post at least one question you have on the material described below.

(2) post at least one example question that you think I could ask on the material described below

 

Format of Exam: The exam will cover Lectures 1–8 and Homeworks 1–3. The exam will mostly be interpretation questions, but I might ask you to do something on the computer (using R). The exam will be worth approximately 50 points. Be ready to explain your reasoning, including in “layman’s” language. You can use hard copies of notes, handouts, and the textbook. You can also access R scripts from the lecture notes page.

 

(I’m going by handout number more than by day discussed; subbullets may be expansions.)

 

From Day 1 handout, you should be able to

·       Define meaningful variables for a given context (e.g., tip amount vs. tip percentage)

·       Explain the meaning of “least squares” as a possible estimation method

·       Identify and evaluate validity conditions (LINE) for inference with least squares regression (in context)

·       Identify and recommend possible remedies to violations of LINE assumptions

o   Making conjectures based on what you know about the study context

o   Using residual plots to check for patterns/relationships with other variables

o   Residuals vs. x versus Added Variable Plots

o   Interpreting influential observations and leverage values
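As a reminder of what these checks look like in R, here is a minimal sketch. It assumes R's built-in mtcars data (not a course data set), and the choice of mpg, wt, and hp is only for illustration.

```r
# Illustration only: mtcars ships with R and is not one of our course data sets
fit <- lm(mpg ~ wt, data = mtcars)

# Least squares picks the intercept and slope that minimize this quantity
sse <- sum(resid(fit)^2)

# Residuals vs. fitted values: curvature suggests a Linearity problem,
# fanning suggests an Equal variance problem
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal probability plot of the residuals (Normality)
qqnorm(resid(fit))
qqline(resid(fit))

# Residuals vs. a variable not in the model: a pattern suggests adding it
plot(mtcars$hp, resid(fit), xlab = "hp (not in model)", ylab = "Residuals")

# Leverage values and Cook's distances for spotting influential observations
lev <- hatvalues(fit)
cook <- cooks.distance(fit)
head(sort(lev, decreasing = TRUE), 3)
```

Independence is usually judged from the study design rather than from a plot.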

 

From Day 2 handout, you should be able to

·       Interpret intercept and slope coefficients in context

o   Multiple regression: After adjusting for other variables in the model (e.g., comparing individuals in the same sub-population)

§  Adjusted and Unadjusted relationships can look very different

o   Interpreting coefficients of categorical variables

§  Effect vs. Indicator parameterization

§  Value of missing coefficient

o   Possible justifications for choosing whether to model a variable as quantitative or categorical

·       Use graphs, context, and model equation to interpret an interaction in context

o   Be able to explain the nature of the interaction

o   Be able to interpret signs of coefficients

o   Be able to write out separate equations

o   Be able to talk about why we don’t just fit separate equations

o   Be careful when interpreting “main effects” if the model has an interaction

o   Why it’s important to center variables involved in an interaction

·       Carry out tests of significance involving individual or groups of slope coefficients

o   Stating appropriate hypotheses for removing the variable

·       Utilize residual plots to evaluate validity conditions

o   Identify which plot to use and what you learn from the plot

o   Also review study design

·       Explain the principle of weighted least squares and variance covariates to address heterogeneity in residuals

o   Use standardized residuals in residual plots

·       Check for multicollinearity (e.g., variance inflation factors)

o   Explain what centering is and how centering helps to alleviate certain types of multicollinearity
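The centering and multicollinearity ideas above can be sketched in R as follows. This again assumes the built-in mtcars data purely for illustration; the variable names wt_c and hp_c are made up for this example.

```r
# Illustration only: interaction model with raw and with centered predictors
fit_raw <- lm(mpg ~ wt * hp, data = mtcars)

mtcars_c <- transform(mtcars,
                      wt_c = wt - mean(wt),
                      hp_c = hp - mean(hp))
fit_c <- lm(mpg ~ wt_c * hp_c, data = mtcars_c)

# Correlation between a predictor and the product term, before and after
# centering (compare the two values)
cor_raw <- cor(mtcars$wt, mtcars$wt * mtcars$hp)
cor_centered <- cor(mtcars_c$wt_c, mtcars_c$wt_c * mtcars_c$hp_c)

# Centering is only a reparameterization: the interaction coefficient and the
# fitted values are unchanged, but the "main effect" coefficients now describe
# the slope of one variable at the mean of the other
coef(fit_raw)
coef(fit_c)

# Variance inflation factor for wt in the additive model mpg ~ wt + hp,
# computed by hand as 1 / (1 - R^2) from regressing wt on the other predictor
vif_wt <- 1 / (1 - summary(lm(wt ~ hp, data = mtcars))$r.squared)
```

Writing out the separate equations for a few values of hp from coef(fit_c) is good practice for describing the nature of the interaction.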

 

From Day 3 handout, you should be able to

·       How σ² is estimated in least squares regression (sum of squared residuals divided by the residual degrees of freedom)

·       Interpretation of root mean square error/residual standard error (unexplained variation about regression line)

·       Interpretation of R²

·       Explain the principle of Maximum Likelihood as another possible estimation method

o   One variable case

o   Two variable case

o   How it relates to least squares estimation

·       Testing significance of individual coefficients, groups of coefficients, whole model

o   Partial F test aka Drop in SSE for nested models (df?)

o   Likelihood ratio test (df?)

·       Information criteria/Measures of Fit

o   R² adjusted

o   AIC

o   BIC

o   Log Likelihood 

·       Graphing the model vs. graphing the data

·       When/How the regression model is the same as

o   Two-sample t-test

o   ANOVA

o   Matched-pairs t-test
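A minimal sketch of the nested model tests and of the regression/t-test equivalence, assuming the built-in mtcars data (not a course data set) and an arbitrary choice of predictors:

```r
# Nested models: the reduced model drops hp and qsec from the full model
reduced <- lm(mpg ~ wt, data = mtcars)
full <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Partial F test (drop in SSE); numerator df = number of coefficients removed = 2
anova(reduced, full)

# Likelihood ratio test; df = difference in number of parameters = 2
lrt <- as.numeric(2 * (logLik(full) - logLik(reduced)))
p_lrt <- pchisq(lrt, df = 2, lower.tail = FALSE)

# Information criteria (smaller is better)
AIC(reduced, full)
BIC(reduced, full)

# A pooled two-sample t-test is the same as regression on an indicator variable
t_out <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
reg <- summary(lm(mpg ~ factor(am), data = mtcars))
t_out$statistic                  # sign depends on the order of subtraction
reg$coefficients[2, "t value"]   # same magnitude
```

The t statistics match only in absolute value because t.test subtracts the groups in the opposite order from the indicator coding.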

 

From Day 4 handout, you should be able to

·       Explain the distinction between ML and REML

·       Identify which method is being used by the software

·       Explain why some comparisons can’t/shouldn’t be made

o   Determine whether models are “nested”
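To see the ML/REML distinction in the simplest setting, here is a sketch with an ordinary regression model, assuming the built-in mtcars data for illustration only. In this case the two methods differ only in the divisor used for the residual variance.

```r
fit <- lm(mpg ~ wt, data = mtcars)
sse <- sum(resid(fit)^2)
n <- nobs(fit)
p <- length(coef(fit))

# ML divides by n (biased downward); REML divides by n - p
sigma2_ml <- sse / n
sigma2_reml <- sse / (n - p)

# The residual standard error reported by summary() matches the REML version
summary(fit)$sigma^2

# logLik() reports the ML log-likelihood by default; REML = TRUE gives the
# restricted log-likelihood instead
ll_ml <- as.numeric(logLik(fit))
ll_reml <- as.numeric(logLik(fit, REML = TRUE))

# ML log-likelihood by hand, for comparison with ll_ml
ll_ml_hand <- -n / 2 * (log(2 * pi) + log(sigma2_ml) + 1)
```

For multilevel models, lme (nlme) and lmer (lme4) both default to REML; use method = "ML" or REML = FALSE to switch. REML likelihoods should not be compared across models with different fixed effects, which is one reason some comparisons can't be made.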

 

From the Day 5 handout, you should be able to

·       Explain the various components of a traditional ANOVA table and how they relate

·       Explain why blocking variables must be included in the analysis and consequences of failing to do so

·       Explain the distinction between adjusted and sequential sums of squares

o   Write out the corresponding hypotheses

·       Explain the principle of variance partitioning

·       Interpret regression coefficients in terms of effect and/or indicator coding

·       Explain the distinction between treating a factor as random or fixed

o   When you might choose to treat a variable as random effects rather than fixed effects

o   Observed “levels” are representative of a larger population (random sample?)

o   Want to make inferences to the larger population

o   Allows us to estimate the variance component

o   Intraclass correlation coefficient

o   Standard errors reflect the randomness at the individual level and from the random effect

·       Discuss advantages such as

o   Generalizability

o   Degrees of freedom

·       Testing the significance of the variance component

o   Stating null and alternative hypotheses

o   Use fixed effects ANOVA

o   Likelihood ratio test (MLE or REML approach)

o   Cut p-value in half?
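The hypotheses for the variance component test, and the usual reason for cutting the p-value in half, can be written out as below. This is a sketch under assumed notation: τ² is the variance between levels of the random factor, and ℓ₀ and ℓ₁ are the maximized log-likelihoods without and with the random effect (both fit by the same method).

```latex
\begin{align*}
% Hypotheses about the variance component
&H_0: \tau^2 = 0 \quad \text{vs.} \quad H_a: \tau^2 > 0 \\
% Likelihood ratio statistic comparing the two nested models
&G^2 = 2\,(\ell_1 - \ell_0) \\
% tau^2 = 0 sits on the boundary of the parameter space, so the chi-square(1)
% p-value is too large; the usual approximation cuts it in half
&p\text{-value} \approx \tfrac{1}{2}\, P\!\left(\chi^2_1 \ge G^2\right)
\end{align*}
```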

 

From Day 6 handout, you should be able to

·       Recognizing multilevel data

o   Multilevel models include grouping variables that cluster multiple outcomes together, and the effects of these grouping variables are modeled as the unobserved outcomes from some random distribution instead of as unrelated and unknown fixed values.

·       Defining level 1 and level 2

·       Write out the statistical models (β’s, u_j, σ, τ)

o   Interpret model components

o   Define indices

o   Interpretation of intercept

o   Standard deviations vs. variances vs. covariances

o   Composite equation vs. Two-level equation

·       Identify number of units at each level

·       Interpret R output (lme, lmer)

·       Determine and interpret the Intraclass correlation coefficient (ICC)

o   For the null model, equivalent to the variance partition coefficient (VPC)

o   With variables in the model, can call it the adjusted or conditional ICC

·       Describe consequence of induced correlation

·       Determine the effective sample size and explain its intuition
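For reference, here is one way to write the null (random intercepts only) model along with the ICC and effective sample size. The notation is assumed for this sketch: i indexes Level 1 units within Level 2 unit j, σ² is the Level 1 variance, τ² is the Level 2 variance, and the effective sample size formula assumes J groups of equal size n.

```latex
\begin{align*}
% Level 1: observation i within group j
Y_{ij} &= \beta_{0j} + \varepsilon_{ij}, & \varepsilon_{ij} &\sim N(0, \sigma^2) \\
% Level 2: group intercepts vary about the overall intercept
\beta_{0j} &= \beta_{00} + u_j, & u_j &\sim N(0, \tau^2) \\
% Composite equation: substitute Level 2 into Level 1
Y_{ij} &= \beta_{00} + u_j + \varepsilon_{ij} \\
% Intraclass correlation: proportion of total variation that is between groups
\text{ICC} &= \frac{\tau^2}{\tau^2 + \sigma^2} \\
% Effective sample size: Jn correlated observations carry this much information
N_{\text{eff}} &= \frac{Jn}{1 + (n - 1)\,\text{ICC}}
\end{align*}
```

When ICC = 0 the effective sample size is Jn (all observations count), and when ICC = 1 it is J (one observation per group).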

 

From Day 7 handout, you should be able to

·       Interpret the variance components in context (e.g., vs. graphs)

·       Explain the principle of shrinkage/pooling as it pertains to multilevel models

o   Impact of pooling on variance estimates

o   Factors that impact the size of the weights/consequences

·       Using the estimated variance component to make predictions, standard errors
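The shrinkage idea can be sketched numerically as below. All of the numbers are hypothetical (made up for this example), and the weight formula shown is the standard one for a random intercepts model with no predictors.

```r
# Hypothetical variance components (not from any course data set)
tau2 <- 4      # between-group (Level 2) variance
sigma2 <- 12   # within-group (Level 1) variance

# Three hypothetical groups with the same observed mean but different sizes
n_j <- c(2, 10, 50)
ybar_j <- c(60, 60, 60)
overall_mean <- 50

# Weight on the group's own mean: larger n_j or larger tau2 means less shrinkage
w <- tau2 / (tau2 + sigma2 / n_j)

# Shrunken (partially pooled) estimates of the group means
shrunk <- w * ybar_j + (1 - w) * overall_mean

round(data.frame(n_j, weight = w, shrunk), 3)
```

The smallest group is pulled furthest toward the overall mean, because its own mean is the least reliable.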

 

(Predicted) From Day 8 you should be able to

·       Interpret the random intercepts model with one fixed predictor (parallel lines)

o   Idea of intercepts following a normal distribution about the “average line”

o   Residual intraclass correlation coefficient and how it might compare to the ICC from the null model

§  Expected change in Level 1 and Level 2 variances

·       Using estimated effects to make predictions (e.g., “what is the expected response for IQ=100 and school 2”)
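A prediction like the one in the last bullet can be sketched as follows. Every number here is hypothetical (made up for illustration, not output from our data), and the names beta0, beta1, and u_hat are just labels for this example.

```r
# Hypothetical estimates from a random intercepts model with IQ as the predictor
beta0 <- 15      # intercept of the "average line"
beta1 <- 0.25    # common slope for IQ (parallel lines)

# Hypothetical estimated random effects (school-level intercept deviations)
u_hat <- c(school1 = 1.5, school2 = -2.0, school3 = 0.5)

# Expected response for IQ = 100 at an "average" school (u_j = 0)
pred_average <- beta0 + beta1 * 100

# Expected response for IQ = 100 at school 2: shift the line by that school's effect
pred_school2 <- unname(beta0 + beta1 * 100 + u_hat["school2"])

c(average = pred_average, school2 = pred_school2)
```

The two predictions differ by exactly the estimated effect for school 2, since the lines are parallel.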