This model assumes that all of the observations are independent, including students in the same class. Is there much class-to-class variability? In other words, how correlated are the observations in each class?

```{r, echo = c(6, 8, 10)}
load(url("https://www.rossmanchance.com/iscam3/ISCAM.RData"))
iscamsummary(adultlit$sessions, adultlit$classid)
table(adultlit$classid)
boxplot(sessions ~ classid, data = adultlit)
summary(aov(sessions ~ factor(classid), data = adultlit))
# fix me!!!!!!
icc = ; icc
```

(d) Find the "effective sample size" and use that value to "correct" the standard error of the coefficient. Does it impact the statistical significance of that coefficient in this case?

```{r}
# Effective sample size: N / (1 + (n - 1) * ICC)   (fixing the typo in Day 13)
ess = ; ess
# 'corrected' standard error
# sqrt(n/ess) * SE(beta-hat)
```

To account for this dependence, we can instead carry out a "random effects ANOVA": $y_{ij} = \beta_{00} + u_j + \epsilon_{ij}$ for the $i^{th}$ observation in the $j^{th}$ group. Let's rewrite this as a multilevel model:

Level 1: $y_{ij} = \beta_{0j} + \epsilon_{ij}$ with $\epsilon_{ij} \sim N(0, \sigma^2)$

Level 2: $\beta_{0j} = \beta_{00} + u_j$ with $u_j \sim N(0, \tau^2)$ and $cov(\epsilon_{ij}, u_j) = 0$

Notice that substituting $\beta_{0j}$ back into the Level 1 equation returns us to the original "composite" equation. This is also sometimes referred to as a "random intercepts" model, a "null model," or a "variance components model." Use this model to recalculate the ICC.

```{r}
#library(nlme)
model0 <- lme(fixed = sessions ~ 1, random = ~ 1 | classid, data = adultlit)
summary(model0)
```

(e) What are the estimated coefficients? The ICC?
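As a sketch of the calculations above (on *simulated* clustered data, not `adultlit`, so as not to give away the worksheet answers; the class sizes, `tau`, and `sigma` below are made up), the ANOVA-based ICC, the effective sample size, and the design-effect-corrected standard error fit together like this:

```{r}
# Simulated data (NOT adultlit): 10 classes of 20 students, class-to-class SD
# tau = 2, within-class SD sigma = 4, so the true ICC is 4 / (4 + 16) = 0.2.
set.seed(1)
J <- 10; n <- 20; N <- J * n
classid <- rep(1:J, each = n)
u <- rnorm(J, 0, 2)                               # random class effects
y <- 30 + u[classid] + rnorm(N, 0, 4)

# ICC from the one-way ANOVA table (balanced case):
# ICC-hat = (MSB - MSW) / (MSB + (n - 1) * MSW)
ms  <- summary(aov(y ~ factor(classid)))[[1]][["Mean Sq"]]
icc <- (ms[1] - ms[2]) / (ms[1] + (n - 1) * ms[2])
icc

# Effective sample size and design-effect-corrected SE of the mean
ess <- N / (1 + (n - 1) * icc)
ess
se_naive <- sd(y) / sqrt(N)                       # pretends all N are independent
se_naive * sqrt(N / ess)                          # inflated ("corrected") SE
```

The same arithmetic applies to `adultlit` once `icc` is filled in, with `n` taken as a typical class size.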

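Before reading off the estimated matrices below, it may help to note what the random-intercepts model implies for the marginal variance-covariance structure (this follows directly from the Level 1 and Level 2 equations above): for two different students $i \neq i'$ in the same class $j$,

$$Var(y_{ij}) = \tau^2 + \sigma^2, \qquad cov(y_{ij}, y_{i'j}) = \tau^2, \qquad cor(y_{ij}, y_{i'j}) = \frac{\tau^2}{\tau^2 + \sigma^2} = ICC,$$

while observations in different classes are uncorrelated. So each class's marginal variance-covariance matrix is "compound symmetric," and converting it to a correlation matrix (`cov2cor`) should show the estimated ICC in every off-diagonal cell.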
We can view the estimated variance-covariance matrix for individual classes.

```{r}
#library(nlme)
getVarCov(model0, classid = "1", type = "marginal")[[1]]
cov2cor(getVarCov(model0, classid = "1", type = "marginal")[[1]])
# also note
getVarCov(model0, classid = "1", type = "conditional")[[1]]
# also note
library(nlraa)
vcmatrix1 = nlraa::var_cov(model0, type = "all"); vcmatrix1[1:10, 1:10]
dim(vcmatrix1)
```

Is the variability between the classes statistically significant?

```{r}
# unfortunately this doesn't work (lme requires a random term):
# MLmodel <- lme(fixed = sessions ~ 1, data = adultlit)
# but you can go back to gls (REML)
#library(nlme)
GLSmodel0 <- gls(sessions ~ 1, data = adultlit)
anova(model0, GLSmodel0)
# what does this tell you?
intervals(model0)
```

However, even if the variability is not statistically significant, accounting for class is definitely the more "valid" model. So now let's consider the treatment effect, after adjusting for class.

```{r}
MLmodel1 <- lme(sessions ~ group, random = ~ 1 | classid, data = adultlit)
summary(MLmodel1)
```

(f) How does adjusting for this lack of independence impact the standard error of the coefficients from model 1?

**Notes**

- If the observations within a group are correlated, then we are violating the "independence" condition and possibly artificially reducing our standard errors (it makes it look like we have many more observations than we really do). Multilevel models "find the right standard error." We also model the dependence rather than just "correcting" for it.

```{r}
#install.packages("ggeffects")
library(ggeffects)
library(tidyverse)
library(lme4)

MLmodel1b = lmer(sessions ~ group + (1 | classid), data = adultlit)
ggpredict(MLmodel1b, terms = c("group", "classid [sample = 9]"), type = "re") %>%
  plot()

# Use ggpredict to create the plot data with type = "re"
plot_data <- ggpredict(MLmodel1b, terms = c("group", "classid [sample = 9]"), type = "re")

# Plot the data with only lines (no SEs or CIs).
# In the ggpredict output, the second term (classid) is stored in the `group` column.
ggplot(plot_data, aes(x = x, y = predicted, group = group, color = group)) +
  geom_line() +   # each classid gets its own line
  labs(y = "Sessions",
       x = "Incentive = 0, Control = 1",
       color = "class id") +
  theme_bw()
```
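The note above can be seen directly in a small sketch (again on *simulated* data rather than `adultlit`, with a made-up treatment effect of 2 assigned at the class level, which is where clustering hurts the most):

```{r}
# Simulated data (NOT adultlit): treatment assigned at the CLASS level, ICC ~ 0.2.
library(nlme)
set.seed(2)
J <- 10; n <- 20
classid <- rep(1:J, each = n)
group   <- rep(0:1, length.out = J)[classid]      # class-level treatment indicator
y <- 30 + 2 * group + rnorm(J, 0, 2)[classid] + rnorm(J * n, 0, 4)
d <- data.frame(y, group, classid)

# Naive lm: pretends all J * n students are independent
se_lm  <- summary(lm(y ~ group, data = d))$coefficients["group", "Std. Error"]

# Multilevel model: accounts for the within-class correlation
fit    <- lme(y ~ group, random = ~ 1 | classid, data = d)
se_lme <- summary(fit)$tTable["group", "Std.Error"]

c(naive = se_lm, multilevel = se_lme)   # the multilevel SE should be much larger
```

The naive standard error is too small because the effective number of independent units here is closer to the number of classes than the number of students.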