---
title: "Stat 414 - Day 12"
subtitle: "Likelihood Ratio Tests, Restricted Maximum Likelihood"
output:
  word_document:
    reference_docx: BCStyleGuide.docx
  html_notebook: default
  html_document:
    df_print: paged
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_knit$set(global.par = TRUE)
#knitr::opts_chunk$set(fig.width=4, fig.height=2.67)
```
```{r, echo=FALSE, message=FALSE}
par(mar=c(4.1, 4.1, 1.1, 1.1))
library(tidyverse)
options(digits = 7, scipen=3)
```
------------------------------------------------------------------------
**Last Time: Maximum Likelihood Estimation**

- An alternative to least squares for fitting the parameter estimates in our regression model
- $\sigma^2$ is also considered a parameter in this model and is estimated simultaneously (iteratively) with the $\beta$ coefficients
- The ML estimate of $\sigma^2$ is biased because it does not account for the other parameters being estimated
- Comparisons of model fit can include penalties for the number of parameters in the model
  - e.g., want small AIC = $-2 \times$ log-likelihood $+ 2p$; BIC = $-2 \times$ log-likelihood $+ p \times \ln(n)$
- Like a drop-in-SSError test, we can compare "nested" models by looking at the change in the log-likelihood
  - $2(L_1 - L_0)$, where $L$ is the maximized log-likelihood, follows (approximately) a chi-square distribution with df = the difference in the number of parameters between the two models
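As a quick illustration of these quantities (a sketch using R's built-in `mtcars` data, not the squid data below), AIC, BIC, and the likelihood ratio statistic can all be computed from `logLik`:

```{r}
# Sketch: two nested models on built-in mtcars data
small <- lm(mpg ~ wt, data = mtcars)
big   <- lm(mpg ~ wt + hp, data = mtcars)
AIC(small); AIC(big)   # -2*logLik + 2p (p counts sigma^2 as a parameter too)
BIC(small); BIC(big)   # -2*logLik + p*log(n)
G2 <- 2 * (logLik(big) - logLik(small))  # likelihood ratio statistic
pchisq(as.numeric(G2), df = 1, lower.tail = FALSE)  # df = difference in # parameters
```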
------------------------------------------------------------------------
**Example:** Recall our data on 768 male squid.
```{r}
Squid<-read.table("http://www.rossmanchance.com/stat414/data/Squid.txt",header=T)
nrow(Squid)
Squid$fMONTH = factor(Squid$MONTH)
model1 <- lm(Testisweight ~ DML * fMONTH, data = Squid)
#new trick, just show the first few rows of the "t-table"
summary(model1)$coeff[1:5,]
sum(residuals(model1)^2)
sigma(model1) #residual standard error
par(mfrow=c(2,2))
plot(model1)
```
(a) How many parameters are being estimated in this model?
(b) Show how to calculate the residual standard error, $\hat{\sigma}_{LS}$.
(c) Show how to calculate $\hat{\sigma}_{ML}$.
(d) Fit the model using maximum likelihood and verify your answer to (c).
```{r}
library(nlme)
model1ML <- gls(Testisweight ~ DML * fMONTH, data = Squid, method = "ML")
summary(model1ML)$tTable[1:5,]
sigma(model1ML)
logLik(model1ML)
-768/2*log(2*pi) - 768/2*log(2.5081^2)-1/(2*2.5081^2)*sum(residuals(model1ML)^2)
```
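As a sketch of how the two estimates relate (this previews parts (b) and (c)): both use the same residual sum of squares, $SSE$, but with different divisors, $\hat{\sigma}^2_{LS} = SSE/(n-p)$ vs. $\hat{\sigma}^2_{ML} = SSE/n$:

```{r}
# Sketch: same SSE, different divisors
SSE <- sum(residuals(model1)^2)
n <- nrow(Squid)           # 768
p <- length(coef(model1))  # number of beta coefficients
sqrt(SSE / (n - p))        # should match sigma(model1), the least squares estimate
sqrt(SSE / n)              # should match sigma(model1ML), the ML estimate
```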
How do the regression coefficients change? What is the achieved log-likelihood? How many parameters are being estimated in this model?
Because the ML estimate of $\sigma^2$ is biased, an alternative proposed in the 1930s is *restricted maximum likelihood* (REML). Some properties:

- Maximizes a different likelihood function (based on a special matrix transformation of the residuals, with some rows constrained) that removes the "fixed" effects from the model, so it does not depend on the slope coefficients
- Often uses an "iterative" estimation process (e.g., estimate the slope coefficients, then $\sigma$, then the slope coefficients again, etc.)
- With simple regression models, the estimated slope coefficients are the same, but the variance estimates differ
- $E(\hat{\sigma}_{REML}^2) = \sigma^2$ (so truly unbiased for the variance, not the standard deviation)
- Overall, REML is preferred for estimating variances
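A quick sketch of the last two points, using R's built-in `mtcars` data (a toy example, not part of the squid analysis):

```{r}
# Sketch: ML divides SSE by n, REML by n - p (matching lm's residual SE)
library(nlme)
toyML   <- gls(mpg ~ wt, data = mtcars, method = "ML")
toyREML <- gls(mpg ~ wt, data = mtcars, method = "REML")
sigma(toyML)                       # smaller: SSE divided by n
sigma(toyREML)                     # SSE divided by n - p
sigma(lm(mpg ~ wt, data = mtcars)) # agrees with the REML value
```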
(e) Fit the model with restricted maximum likelihood estimation. Does the residual standard error change? Do the regression coefficients change? Does the “achieved likelihood” change?
```{r}
#sigma
#logLik
```
The main thing to keep in mind: if you want to compare models, use the same estimation procedure for both (e.g., both REML or both ML). If the models differ only in their "fixed effects" (regression slopes and intercept) but have the same random components (e.g., variance assumptions), ML is generally recommended; use REML when the comparison involves the variance estimation.
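For example, to compare two fits that differ only in their fixed effects, a common pattern (a sketch, not part of the original handout) is to refit both with `method = "ML"` and then refit the preferred model with REML via `update()`:

```{r}
# Sketch: compare fixed effects with ML, report final model with REML
fitA <- gls(Testisweight ~ DML * fMONTH, data = Squid, method = "ML")
fitB <- gls(Testisweight ~ DML + fMONTH, data = Squid, method = "ML")
anova(fitA, fitB)                          # LRT for the interaction terms
fitFinal <- update(fitA, method = "REML")  # refit chosen model with REML
```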
The flexibility gained by using maximum likelihood allows us to explore very different variance-covariance structures for our errors. Previously for this dataset, we used weighted least squares to allow the variance in $y$ to increase with $DML$.
Model 2: $Var(\epsilon_{i}) = \sigma^2 \times DML_i$ for observation $i$.
```{r}
#Before
model2WLS = lm(Testisweight ~ DML*fMONTH, data = Squid, weights = 1/DML)
summary(model2WLS)$coeff[1:5,]
sigma(model2WLS)
#Using REML instead
model2REML = gls(Testisweight ~ DML * fMONTH, data=Squid, weights = varFixed(~DML), method="REML")
summary(model2REML)$tTable[1:5,]
sigma(model2REML)
logLik(model2REML)
plot(model2REML)
```
(f) Does this model "fit" better? Are the model assumptions sufficiently met?
(g) How do the likelihoods of model 1 and model 2 compare? What does this tell you? Can you carry out a likelihood ratio test?
```{r}
# model1REML is the REML fit of model 1 from part (e), e.g., gls(Testisweight ~ DML * fMONTH, data = Squid, method = "REML")
anova(model1REML, model2REML)
```
We also note pretty different variances in $y$ across the different months.
```{r}
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
iscamsummary(Squid$Testisweight, Squid$fMONTH)
boxplot(Squid$Testisweight~Squid$fMONTH) #months with most variability = 9, 10; least 7 and 8
```
Model 3: weighted regression allowing the variances to differ by month $Var(\epsilon_{ij}) = \sigma_j^2$ for observation $i$ and month $j$.
```{r}
model3REML = gls(Testisweight ~ DML * fMONTH, data=Squid, weights = varIdent(form= ~ 1 | fMONTH), method="REML")
summary(model3REML)$tTable[1:5,]
logLik(model3REML)
```
(h) How does the likelihood of model 3 compare? What does this tell you? DF? Can you carry out a likelihood ratio test? How are the model assumptions?
```{r}
anova(model1REML, model3REML)
plot(model3REML)
```
Now we can go really crazy: we can let the variance increase with DML in perhaps a different way (a different power) for each month: $Var(\epsilon_{ij}) = \sigma^2 \, DML_{ij}^{2\delta_j}$
```{r}
vfbymonth <- varPower(form =~ DML | fMONTH ) #allows different powers of DML, per month
model4REML = gls(Testisweight ~ DML * fMONTH, data=Squid, weights = vfbymonth) #default is REML
summary(model4REML)$tTable[1:5,]
plot(model4REML)
anova(model1REML, model2REML, model3REML, model4REML)
##See also
library(stargazer)
stargazer(model1REML, model2REML, model3REML, model4REML, type = "text")
```
(i) How many parameters are being estimated in this model? What do the parameter estimates at the bottom represent? (Hint: Look back at the $Var(\epsilon)$ expression.) Is it worth it? (Has the residual plot improved? Are the additional parameter estimates statistically significant?)
**Notes**
- The last likelihood ratio test isn't quite appropriate (why?), but it's not a horrible idea
- The varPower structure used in model 4 should not be used with a quantitative predictor that can take the value of zero.
- Finding the right variance structure for a study like this would be largely trial-and-error, using tools like AIC to compare the models. Better yet, use subject-matter knowledge/past information to help inform your choice of model.
- Notice that if we fail to reject a model when it is compared to the "basic" model, we can assume homogeneity of the residual variances; this is another (some say better) "test for heterogeneity" (e.g., vs. Bartlett's test or Levene's test). In particular, likelihood ratio tests hold up better when there are violations of normality.
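For instance, the model 1 vs. model 3 comparison above can be set side by side with a classical test (a sketch; `bartlett.test` is base R and is known to be sensitive to non-normality):

```{r}
# Classical test for equal variances across months (assumes normality)
bartlett.test(Testisweight ~ fMONTH, data = Squid)
# LRT alternative: anova(model1REML, model3REML) from above
```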