---
title: "Stat 414 - Day 21"
subtitle: "Random Slopes cont."
output:
word_document:
reference_docx: BCStyleGuide.docx
html_notebook: default
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_knit$set(global.par = TRUE)
#knitr::opts_chunk$set(fig.width=4, fig.height=2.67)
```
```{r, echo=FALSE, message=FALSE}
par(mar=c(4.1, 4.1, 1.1, 1.1))
options(digits = 7, scipen=3)
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(tidyverse)
library(nlme)
library(lme4)
library(performance)
#install.packages("texreg")
#library(texreg)
```
------------------------------------------------------------------------
**Last Time**
- We fit a “random coefficient model” $y_{ij} = \beta_{0j} + \beta_{1j}x_{1ij} +\epsilon_{ij}$ where $\beta_{0j} = \beta_{00} + u_{0j}$ with $u_{0j}\sim N(0, \tau_0^2)$ and $\beta_{1j} = \beta_{10} + u_{1j}$ with $u_{1j} \sim N(0, \tau_1^2)$
- This adds a variance component for the slopes ($\tau_1^2$) as well as a covariance between
the random slopes and random intercepts ($\tau_{01}$)
------------------------------------------------------------------------
**Example 1:** Continuing with our beach data. Richness varies by Beach, so including Beach in the model (as fixed or random effects) should give us more accurate standard errors. But we see some patterns (related to the beaches) in the residuals and this tells us that we might be able to improve the fit between the model and the data. In this case, we tried a random slopes model, and the variation in the slopes ($\tau_1^2$) was statistically significant.
```{r}
rikzdata <- read.table("http://www.rossmanchance.com/stat414/data/RIKZ.txt", header=TRUE)
model0 = lmer(Richness ~ 1 + (1 | rikzdata$Beach), data = rikzdata)
model1 = lmer(Richness ~ NAP + (1 | Beach), data = rikzdata)
model2 = lmer(Richness ~ NAP + (1 + NAP | Beach), data = rikzdata, REML = FALSE)
anova(model1, model2)
```
(a) Between what two values do we expect 95% of the slopes to fall?
But we notice this gives us a second parameter as well. What does it mean for the slopes and intercepts to be correlated?
```{r}
#this plots the "effects"
plot(ranef(model2)$Beach[,1]~ ranef(model2)$Beach[,2], xlab = "intercept effects", ylab="slope effects")
abline(lm(ranef(model2)$Beach[,1]~ ranef(model2)$Beach[,2]))
#this plots the resulting slopes and intercepts
slopes= fixef(model2)[[2]]+ranef(model2)$Beach[,2]
intercepts = fixef(model2)[[1]]+ranef(model2)$Beach[,1]
plot(slopes ~ intercepts, xlab = "intercepts", ylab="slopes")
abline(lm(slopes~intercepts))
```
(b) Interpret the strong, negative correlation ($corr(u_{0j}, u_{1j}) = \hat \tau_{01}$ = -0.99) between the slopes and intercepts in context: beaches with larger intercepts (meaning what?) tend to have what kinds of slopes (meaning what?).
Exposure is an index composed of the following elements: wave action, length of the surf zone, slope, grain size, and the depth of the anaerobic layer. Only one beach has Exposure = 8, so we are going to combine that with Exposure 10 make this a binary variable (the rest are exposure 11).
```{r}
rikzdata$ExposureCat = (rikzdata$Exposure > 10)
```
(c) Now consider adding Exposure, a Level 2 variable, to the model. What do you expect to change in the model?
Explore how exposure might be related to the intercepts and/or the slopes.
```{r}
BeachExposureCat = tapply(rikzdata$ExposureCat, rikzdata$Beach, mean)
plot(slopes ~ BeachExposureCat); abline(lm(slopes~BeachExposureCat))
plot(intercepts ~ BeachExposureCat); abline(lm(intercepts~BeachExposureCat))
```
(d) Is Exposure "positively" or "negatively" related to the intercepts? How about the slopes? What are the implications to the model?
Fit the model including Exposure
```{r}
model3 = lmer(Richness ~ NAP + ExposureCat + (1 + NAP | Beach), data = rikzdata, REML=FALSE)
summary(model3, corr=F)
BeachAsFactor = factor(rikzdata$Beach)
#Our predicted models
preds = predict(model3, newdata = rikzdata)
ggplot(rikzdata, aes(x = NAP , y = preds , group = Beach, color = BeachAsFactor )) +
geom_smooth(method = "lm", alpha = .5, se = FALSE) +
geom_abline(intercept = 8.1923-3.3238, slope = -2.85) +
geom_abline(intercept = 8.1923, slope = -2.85) +
geom_point(data = rikzdata, aes(y = Richness, color=BeachAsFactor), alpha = .5) +
theme_bw()
```
(e) Interpret the slope coefficient of Exposure in this model.
(f) Is the Exposure variable significant? Do you see a substantial improvement in the fit of the model? How do the variance components change/what has been the main impact?
Visualize the reduction in the Level 2 variability:
```{r}
boxplot(ranef(model2)$Beach[,1], ranef(model3)$Beach[,1], at = c(1,2),
xlab = "random intercepts",
names = c("model2", "model3"),
horizontal = TRUE
)
iscamsummary(ranef(model2)$Beach[,1])
iscamsummary(ranef(model3)$Beach[,1])
boxplot(ranef(model2)$Beach[,2], ranef(model3)$Beach[,2], at = c(1,2),
xlab = "random slopes",
names = c("model2", "model3"),
horizontal = TRUE
)
iscamsummary(ranef(model2)$Beach[,2])
iscamsummary(ranef(model3)$Beach[,2])
anova(model2, model3)
```
(g) Why didn't including the Exposure variable explain much variation in the slopes?
(h) To expand our model to allow for Exposure to explain variation in slopes, write the Level 1
and Level 2 equations, including Exposure in both Level 2 equations.
(i) Now make the composite equation, what happens? (*Hint*: What is the expression for the
intercepts and what is the expression for the slopes?)
So what do we tell R? How many parameters will this add to the model?
Fit the model (and the fancy texreg output for comparing models)
```{r warnings=FALSE}
#model3 = lmer(Richness ~ NAP+ ExposureCat + (1 + NAP | Beach), data = rikzdata, REML=FALSE) #no interaction
model4 = lmer(Richness ~ NAP*ExposureCat + (1 + NAP | Beach), data = rikzdata, REML=FALSE) #with interaction
summary(model4)
texreg::screenreg(list(model2, model3, model4), digits = 3, single.row = TRUE, stars = 0,
custom.model.names = c("no exposure", "no interaction", "interaction"), custom.note = "")
anova(model2, model3, model4)
```
(j) How many parameters did we add to the model? What is the estimate for that parameter? Is it statistically significant? How are you deciding?
(k) Does Model 4 appear to be a better fitting model? How are you deciding? (**Hint*: What measures of "model fit" do we have that we haven't done much with recently?)
**Notes:**
- With a level 2 variable, we are thinking of the level 1 intercepts and slopes as "outcomes" and then running a regression model to explain that variation
- "In cases where the explanation of the random effects works extremely well, one may end up with models with no random effects at level two... random intercepts, slope have zero variance.. Omitted.. The resulting model may be analyzed just as well with OLS regression analysis... within group dependence has been fully explained by the available explanatory variables/interactions (no more dependence in the residuals)."