---
title: "Stat 414 - Day 18"
subtitle: "Level 1 Predictors (Section 4.4)"
output:
word_document:
reference_docx: BCStyleGuide.docx
html_notebook: default
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_knit$set(global.par = TRUE)
#knitr::opts_chunk$set(fig.width=4, fig.height=2.67)
```
```{r, echo=FALSE, message=FALSE}
par(mar=c(4.1, 4.1, 1.1, 1.1))
options(digits = 4)
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
library(tidyverse)
library(nlme)
library(lme4)
```
------------------------------------------------------------------------
### Last Time: Adding a Level 1 variable
- Creates a model with parallel lines for each level 2 unit (intercepts following a normal distribution with some shrinkage)
- intercept is the (predicted or expected) (mean) response for the "average" level 2 unit
- slope is the (estimated or expected) change in response holding level 2 unit fixed
- Level 1 variables ($x_{ij}$) are expected to explain Level 1 variation (lower $\sigma$). Can also explain Level 2 variation if the distribution of the variable changes across the Level 2 units
- Assess statistical significance with either t-tests/partial F-tests or LRT test
- Interpretation of adjusted intraclass correlation coefficient (is grouping still relevant after conditioning on covariate)
------------------------------------------------------------------------
**Example 1: Netherlands language scores cont.**
```{r}
neth = read_delim("https://www.rossmanchance.com/stat414F20/data/NetherlandsLanguage.txt", "\t")
library(lme4)
nullmodel = lmer(langPOST ~ 1 + (1|schoolnr), data = neth, REML = FALSE); summary(nullmodel)
model1 = lmer(langPOST ~ 1 + IQ_verb + (1|schoolnr), data = neth, REML=F); summary(model1)
```
(a) What percentage of the Level 1 variance was explained by verbal IQ? ("Pseudo-$R^2$")
(b) What percentage of the Level 2 variance was explained by verbal IQ?
```{r}
schoolmeans <- neth %>%
group_by(schoolnr) %>%
mutate(mean = mean(langPOST)) %>%
ungroup()
plot(neth$IQ_verb~schoolmeans$mean)
```
(c) What percentage of the total variance was explained by verbal IQ?
```{r}
performance::model_performance(model1)
performance::icc(model1)
performance:::r2(model1)
var(model.matrix(model1) %*% fixef(model1)) #26.18, variance explained by fixed effects
```
Note: We can think of ICC as the proportion of total variance explained by the grouping variable and $R^2$ as the proportion explained by the fixed effects. The difference between adjusted/unadjusted ICC is whether you take into account the "variance explained by the fixed effects" in the denominator. The difference between conditional and marginal \$R\^2% is whether you account for the random effects in the numerator (conditional - marginal = unadjusted ICC).
(d) Is the amount of variability explained statistically significant?
```{r}
summary(model1)
anova(nullmodel, model1)
```
**Example 2: Radon** (note: this data file may not match the one from HW)
Radon comes from underground and can enter more easily when a house is built into the ground (i.e., has a basement). In this dataset (for 919 homes across 85 counties in MN), *floor* indicates whether the measurement was taken in the basement or the first floor, and *basement* indicates whether the house had a basement.
```{r}
ctydata <- read_csv("https://www.rossmanchance.com/stat414/data/cty.dat") #county-level data, including uranium
srrs2 <- read_csv("https://www.rossmanchance.com/stat414/data/srrs2.dat") #house-level data including stfips and cntyfips
#names need to match
ctydata$cntyfips <- as.numeric(ctydata$ctfips)
#have a few duplicated rows in county data
ctydata$newfips <- as.numeric(ctydata$stfips)*1000 + ctydata$cntyfips
length(unique(ctydata$newfips))
merged <- merge(srrs2, ctydata[!duplicated(ctydata$newfips),], by=c("cntyfips", "stfips"))
#focus on just Minnesota
mergedmn <- merged[which(merged$state == "MN"),]
nrow(mergedmn)
mergedmn$uranium = log(ifelse(mergedmn$Uppm==0, .1, mergedmn$Uppm))
mergedmn$logradon = log(ifelse(mergedmn$activity == 0, .1, mergedmn$activity))
#library(lme4)
model0 <- lmer(logradon ~ 1 + (1 | newfips), data = mergedmn)
summary(model0)
```
(a) What is the estimated mean (log) radon ?
(b) What is the basemeas variable about?
```{r}
library(dplyr)
mergedmn <- mergedmn %>%
mutate(basemeas = case_when(
(basement == "Y" & floor == 0) | is.na(basement) ~ "baselower",
basement == "Y" & floor == 1 ~ "baseupper",
TRUE ~ "nobase"
)
)
head(mergedmn)
table(mergedmn$basemeas)
boxplot(mergedmn$logradon~mergedmn$basemeas)
```
(c) Do you predict higher or lower radon levels if the house has a basement and the measurement was taken in the basement? What if the house doesn't have a basement?
(d) Check your predictions
```{r}
model1 <- lmer(logradon ~1 + basemeas + (1 | newfips), data = mergedmn)
summary(model1)
```
(e) What do the estimated random effects $\hat{u}_j$represent in this model?
(f) How does adding this variable to the model change the variance components?
(g) Let's explore Roseau county
```{r}
avg_radon <- ave(mergedmn$logradon, mergedmn$county)
prop_basement <- 1-ave((mergedmn$basemeas == "nobase"), mergedmn$county)
hist(prop_basement)
boxplot(mergedmn$logradon~mergedmn$basemeas)
modelf = lm(logradon ~ basemeas, data = mergedmn)
cbind(mergedmn$county, mergedmn$newfips, mergedmn$logradon, avg_radon, fitted.values(model0), mergedmn$basemeas, fitted.values(modelf), prop_basement)[136:149,]
mean(fitted.values(model0)[136:149])
mean(fitted.values(modelf)[136:149])
tail(ranef(model0)$newfips, 20)
tail(ranef(model1)$newfips, 20)
```
*Notes:*
- Marginal $R^2$ measures the variance explained by the fixed effects as a proportion of the sum of all the variance components ($\hat\sigma^2 + \hat\tau^2 + var(X\beta)$ ("The fixed effects explain ...")
- conditional $R^$ measures the variance explained by both the fixed and random effects in the model. ("The fixed and random effects explain ...")
- The ICC is the difference between these! (the contribution of the random effects...)
- There is also some lack of agreement ("the literature does not seem to have converged onthis topic") in how to calculate 𝑅2 values for these models as the formulas provided herecan actually turn out to be negative!\*