---
title: "Stat 414 - Day 15"
subtitle: "Shrinkage Estimates (4.8)"
output:
word_document: default
html_notebook: default
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_knit$set(global.par = TRUE)
```
```{r, echo=FALSE, message=FALSE}
par(mar=c(4.1, 4.1, 1.1, 1.1))
options(digits = 7, scipen=3)
library(tidyverse)
library(nlme)
```
------------------------------------------------------------------------
### Last Time:
- Fixed vs. random effects: if the categories themselves aren't of particular interest, but we want to treat the "grouping units" as a random sample from a larger population, we can treat them as random effects ("effect" = deviation from the overall mean)
- $Y_{ij} = \beta_0 + u_j + \epsilon_{ij}$, where we assume $\epsilon_{ij} \sim N(0, \sigma^2)$, $u_j \sim N(0, \tau^2)$, and $Cov(\epsilon_{ij}, u_j) = 0$; this implies $E(Y_{ij}) = \beta_0$
- Benefits include fewer parameters to estimate and generalizability to a larger population of units.
- The random effect also induces a non-zero correlation between two observations from the same Level 2 unit (this allows us to model dependence within the groups)
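The last bullet can be made concrete: under this model, $Cov(Y_{ij}, Y_{i'j}) = \tau^2$ for two observations in the same group, so their correlation is $\tau^2/(\tau^2+\sigma^2)$, the intraclass correlation. A quick sketch with made-up variance components (not estimates from any data set):

```{r}
tau2 <- 0.0004    # hypothetical between-group variance
sigma2 <- 0.0016  # hypothetical within-group variance
tau2 / (tau2 + sigma2)  # induced within-group correlation (ICC) = 0.2
```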
------------------------------------------------------------------------
Even though we say we are not all that interested in the individual $u_j$, and that they aren't really parameters but "unobservable latent effects," we still get estimates for them (they are used in estimating $\tau$), and it can still be interesting to explore those estimates (e.g., do they appear to be normally distributed?). But how are they estimated differently?
**Example:** Suppose the data in bball.txt are the batting averages for 6 players over several seasons.
```{r}
bball <- read.table("http://www.rossmanchance.com/stat414/data/bball.txt", header = T)
tail(bball)
hist(bball$Batting)
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
iscamsummary(bball$Batting)
```
(a) What is the overall mean batting average for these 43 observations?
(b) Find the sample sizes, means, and standard deviations for each player:
```{r}
boxplot(bball$Batting ~ bball$Player)
iscamsummary(bball$Batting, bball$Player)
```
(c) Do you expect a high or low ICC value? Explain.
(d) Opinion: Do you really think Rodriguez and Suarez are that much better than everyone else? What else could be going on? Which averages do you find the most/least "trustworthy"? Why?
**Treat *player* as a fixed effect (with effect coding) and fit a linear model.**
```{r}
playerf <- factor(bball$Player)     # rather than treating as character
contrasts(playerf) <- contr.sum(6)  # make sure R uses effect coding; 6 = number of categories
#Each pair run both and compare the output
summary(model1 <- lm(bball$Batting ~ playerf))
summary(model1gls <- gls(Batting ~ playerf, data = bball))
library(multilevel)
ICC1(aov(Batting ~ playerf, data = bball))
```
(e) What is the estimated overall mean batting average across all players? Is this the same as in (a)? Why not? Where does it come from? (*Hint*: Why larger?) What is the ICC?
(f) What does this model estimate for the "effect" of Jones? (R lists them alphabetically)
(g) What does this model estimate for Jones' population mean batting average? (Hint: overall mean + Jones' effect. Does this number look familiar?)
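With effect coding, the intercept is the unweighted mean of the player means and each listed coefficient is a player's deviation from it, so a player's estimated mean is intercept + effect. A hedged sketch (assuming Jones is first alphabetically, so his effect gets R's default `contr.sum` label `playerf1`):

```{r}
coefs <- coef(model1)
coefs["(Intercept)"] + coefs["playerf1"]  # Jones' estimated population mean
-sum(coefs[-1])  # effect for the last (alphabetical) player, not shown in the output
```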
**Now treat player as a random effect.**
```{r}
library(nlme)
model2 = lme(fixed = Batting ~ 1, random = ~ 1 | Player, data = bball)
summary(model2)
```
(h) What is the estimate of the overall mean?
(i) What is the intraclass correlation coefficient?
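One way to answer (i) is to pull the variance components out of the fitted model; `nlme` reports them on the standard-deviation scale, so they need squaring. A sketch:

```{r}
vc <- VarCorr(model2)                      # SDs for intercept (tau) and residual (sigma)
tau <- as.numeric(vc["(Intercept)", "StdDev"])
sigma <- as.numeric(vc["Residual", "StdDev"])
tau^2 / (tau^2 + sigma^2)                  # estimated ICC
```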
The output no longer gives us the estimated effects for the players, but R does store them for us. How do these estimates compare to the player effects when estimated as fixed effects?
```{r}
ranef(model2)
```
(j) How do these player estimated effects compare to model 1? (You may need to find Suarez.)
We can also convert the effects to the predicted values for each player:
```{r}
#Fitted values, prediction for each player
fits=predict(model2); fits
#How do these compare to the player means?
tapply(bball$Batting, bball$Player, mean)
tapply(fits, bball$Player, mean)
```
(l) Whose estimates (Jones or Suarez) changed more? Why does that make sense for these data?
**Definitions:** One way to estimate a player's effect is to ignore all the other players; call this *no pooling*. Another is to ignore the player-to-player differences and use the overall mean; call this *complete pooling*. Treating player as a random effect creates *partial pooling*. We can think of each random effect estimate (predicted group mean) as a weighted average of the group mean and the overall mean: $w(\text{group mean}) + (1- w)(\text{overall mean})$, where the weight for group $j$, $w_j = \tau^2 / (\tau^2 + \sigma^2/n_j)$, depends on the relative sizes of the variance components and on the group size. The weights reflect the "reliability" of the group.
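The weighted-average formula can be checked with a quick computation; the numbers below are hypothetical stand-ins (substitute the variance component estimates from `model2` and a real group size):

```{r}
tau2 <- 0.0004; sigma2 <- 0.0016; n_j <- 8   # hypothetical values
w <- tau2 / (tau2 + sigma2 / n_j)            # reliability weight, here 2/3
groupmean <- 0.310; overall <- 0.270         # hypothetical means
w * groupmean + (1 - w) * overall            # partially pooled estimate
```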
(m) Calculate the weights for Jones and Suarez. Why are the weights pretty large? Which is larger? Why?
(n) Verify the estimated group means for Jones and Suarez using these weights. Which changes (from the group mean) more? Why?
(s) Summarize what you learn from the graph.
```{r}
# Raw observations (grey), shrinkage estimates (red), player means (blue)
bball$fits <- fits
bball$Batmean <- tapply(bball$Batting, bball$Player, mean)[bball$Player]
ggplot(bball, aes(x = Player)) +
  geom_point(aes(y = Batting), col = "grey") +
  geom_point(aes(y = fits), col = "red") +
  geom_point(aes(y = Batmean), col = "blue") +
  theme_bw()
```
**Notes**
- The degree of shrinkage depends on the size of the variance component and on the number of observations per level.
- If the estimated variance component is large, there is little shrinkage; you can consider fixed effects a special case of random effects where the variance component is very large.
- If the variance component is small, there is more shrinkage; if it is zero, the level effects are shrunk to exactly zero. (It is even possible to obtain negative variance component estimates, in which case the shrinkage is reversed.)
- If there are very few observations per level, there is more shrinkage; with many observations per level, the estimates shrink less. You can consider fixed effects the special case with infinitely many observations.
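The tradeoff in these notes can be seen by computing the weight $w_j = \tau^2/(\tau^2 + \sigma^2/n_j)$ over a range of group sizes (the variance components below are hypothetical):

```{r}
tau2 <- 0.0004; sigma2 <- 0.0016      # hypothetical variance components
n <- c(1, 2, 5, 10, 50, 1000)
round(tau2 / (tau2 + sigma2 / n), 3)  # weight on the group mean approaches 1
```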