---
title: "Stat 414 - Day 11"
subtitle: "Maximum Likelihood Estimation "
output:
  word_document:
    reference_docx: BCStyleGuide.docx
  html_notebook: default
  html_document:
    df_print: paged
editor: visual
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(eval = FALSE)
knitr::opts_knit$set(global.par = TRUE)
#knitr::opts_chunk$set(fig.width=4, fig.height=2.67)
```
```{r, echo=FALSE, message=FALSE}
par(mar=c(4.1, 4.1, 1.1, 1.1))
library(tidyverse)
options(digits = 7, scipen=5)
```
------------------------------------------------------------------------
**Last Time**
- vcov gives us the variance-covariance matrix of the estimated coefficients
- the square roots of the diagonal values give us the standard errors of the estimated coefficients
- the size of these SEs depends on $\sigma$, $n$, and the $SD(X)$'s.
- if our estimate of $\sigma$ is off, so are these standard errors (and the $t$ statistics, p-values, confidence intervals, prediction intervals, etc.)
- If we don't have good candidates for weights (or weighting isn't appropriate), an alternative approach is to stick with OLS but use HC "sandwich" standard errors
- These use the squared residuals to tell us about the variance (and covariance) structure of the residuals
------------------------------------------------------------------------
**Example:** Airfares from San Luis Obispo to a "random" sample of 12 major U.S. cities as found March 31, 2017 on Travelocity.com for travel on May 8-May 12, 2017 are found in airfare.txt.
```{r}
airfare = read.table("http://www.rossmanchance.com/stat414/data/airfare.txt", sep = "\t", header = TRUE)
load(url("http://www.rossmanchance.com/iscam3/ISCAM.RData"))
iscamsummary(airfare$price)
```
*(a) Fit the "null model" with no predictors*
```{r}
model0 = lm(price ~ 1, data = airfare)
summary(model0)
sum(residuals(model0)^2)
sigma(model0)
```
*(b) What is the estimated intercept? How is it found? How is $\sigma$ estimated?*
So far, we have estimated our parameters using "least squares estimation" and focused on minimizing the sum of squared errors as judged by the residuals. There are alternatives to least squares estimation. One such method is *maximum likelihood estimation*. The *likelihood* is the probability density function (pdf) of a random variable, but viewed as a function of the parameter(s). We want to choose parameter values that maximize the likelihood of observing the data we have. We are trying to match what we observe with what we expect to see.
First consider just one variable (e.g., airfare prices). Suppose our data follow a normal distribution, so $L(\mu, \sigma; y) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(y-\mu)^2}{2\sigma^2}}$.
```{r}
# picture of the likelihood: normal density with mu = 10, sd = 3
# the vertical line marks a hypothetical observed value of 11
x <- seq(5, 15, length=1000)
y <- dnorm(x, mean=10, sd=3)
plot(x, y, type="l", lwd=1, xlab="mu", ylab="L(mu)")
abline(v = 11)
```
Suppose we didn't know that $\mu$ = 10, but we observed x = 11. We could think about trying different values of $\mu$ (moving this curve left and right) and stopping when the peak of the curve is highest at our observed value. In this toy example, we would report $\hat{\mu}_{MLE}$ = 11.
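This "try different values of $\mu$" idea can be checked numerically: hold the observed value fixed at 11 and evaluate the normal density over a grid of candidate $\mu$ values (a sketch matching the toy example above, with the sd held at 3):

```r
# Likelihood of a single observation y = 11 as a function of mu (sd fixed at 3)
y_obs <- 11
mu_grid <- seq(5, 15, length = 1000)
lik <- dnorm(y_obs, mean = mu_grid, sd = 3)

# The grid value that maximizes the likelihood is the observed value itself
mu_grid[which.max(lik)]
plot(mu_grid, lik, type = "l", xlab = "mu", ylab = "L(mu)")
```

By the symmetry of the normal density, sliding $\mu$ across the grid traces out the same bell shape, peaking when $\mu$ equals the observed value.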
With $n$ independent observations, the joint likelihood will be the product $L(\mu, \sigma; y_1, \ldots, y_n) = \frac{1}{(\sigma \sqrt{2\pi})^n} e^{-\frac{\Sigma(y_i-\mu)^2}{2\sigma^2}}$.
Our goal is to *maximize* this function, for which we would use differentiation, reporting the values of $\hat{\mu}$ and $\hat{\sigma}$. For most likelihoods, it is much simpler to work with the *log likelihood*, and because the log is a monotone transformation, the MLEs are the same.
*(c) Write out the expression for the log likelihood. How would we find the values of $\mu$ and $\sigma$ that maximize this function?*
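One way to see where the maximum lands, without doing the calculus, is to maximize the log likelihood numerically. A sketch on made-up data (the five "prices" below are invented for illustration): the numeric optimum matches the sample mean and the SD computed with an $n$ (not $n-1$) denominator.

```r
y <- c(150, 210, 180, 260, 140)   # made-up prices, for illustration only

# Negative log likelihood of an iid normal sample (optim minimizes)
negloglik <- function(par, y) {
  -sum(dnorm(y, mean = par[1], sd = par[2], log = TRUE))
}

# Bound sd away from zero so the density stays defined during the search
fit <- optim(c(mean(y), sd(y)), negloglik, y = y,
             method = "L-BFGS-B", lower = c(-Inf, 1e-3))

fit$par                                      # numeric MLEs of mu and sigma
c(mean(y), sqrt(mean((y - mean(y))^2)))      # closed-form MLEs for comparison
```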
### Fitting a linear model with maximum likelihood
```{r}
#install.packages("nlme")
library(nlme)
model0ML = gls(price ~ 1, method = "ML", data = airfare )
sum(residuals(model0ML)^2)
sigma(model0ML)
```
*(c) Is the equation the same? Is the residual standard error the same? How is it calculated now?*
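The two residual standard errors differ only in their denominators. A sketch on made-up data (a stand-in for the 12 airfares, since the point is the formula, not the values):

```r
library(nlme)   # ships with R

set.seed(414)
d <- data.frame(y = rnorm(12, mean = 300, sd = 80))  # made-up stand-in data

fit_ols <- lm(y ~ 1, data = d)
fit_ml  <- gls(y ~ 1, method = "ML", data = d)

SSE <- sum(residuals(fit_ols)^2)
n <- nrow(d)

sigma(fit_ols)   # sqrt(SSE / (n - 1)): least squares divides by the residual df
sigma(fit_ml)    # sqrt(SSE / n): ML divides by n, so it is slightly smaller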
## How do we assess the fit of the model?
Rather than SSError, we will focus on the value of the "achieved" likelihood. *(d) Substitute our estimates back in to see how large we were able to make the likelihood function.*
```{r}
#can type in your formula here
```
*(e) Confirm this value in R with logLik(model0). What are the degrees of freedom?*
```{r}
logLik(model0)
logLik(model0ML)
```
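For a normal model, the achieved log likelihood has a closed form in the SSE: plugging $\hat{\mu}$ and $\hat{\sigma}^2 = SSE/n$ back in gives $\log L = -\frac{n}{2}\left(\log(2\pi) + \log(SSE/n) + 1\right)$. A sketch on made-up data, reproducing what `logLik()` reports:

```r
set.seed(1)
d <- data.frame(y = rnorm(12, mean = 300, sd = 80))  # made-up stand-in data

fit <- lm(y ~ 1, data = d)
n <- nrow(d)
SSE <- sum(residuals(fit)^2)

# Achieved log likelihood with the ML estimates substituted back in
byhand <- -(n / 2) * (log(2 * pi) + log(SSE / n) + 1)
c(byhand, as.numeric(logLik(fit)))
```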
Now consider the model that also includes the distances.
```{r}
#OLS
model1 = lm(price ~ distance, data = airfare)
summary(model1)
#ML
model1ML = gls(price ~ distance, method="ML", data = airfare)
summary(model1ML)
```
*(f) Use the above output to find the value of the likelihood function and then confirm with R. Does this model do "better" than the previous model?*
```{r}
logLik(model1); logLik(model0)
logLik(model1ML); logLik(model0ML)
```
*(g) Verify the AIC value in R using AIC(model0) and AIC(model1). Which model has the better value?*
```{r}
AIC(model0)
AIC(model1)
```
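AIC can also be verified by hand: it is $-2\log L$ plus twice the number of estimated parameters (the model `df` reported by `logLik`, which counts $\sigma$ along with the coefficients). A sketch on made-up data:

```r
set.seed(2)
d <- data.frame(x = 1:12)                     # made-up stand-in data
d$y <- 100 + 0.1 * d$x + rnorm(12, sd = 5)

fit <- lm(y ~ x, data = d)
ll <- logLik(fit)
attr(ll, "df")                                # intercept + slope + sigma = 3

# AIC = -2 log L + 2 (number of parameters)
c(AIC(fit), -2 * as.numeric(ll) + 2 * attr(ll, "df"))
```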
*(h) Inference: What was our previous method (in general) for comparing these two models/ deciding whether the distance variable should be included in the model?*
*(i) Carry out the likelihood ratio test.*
```{r}
anova(model0ML, model1ML)
```
**Definition:** The **likelihood ratio test** compares nested models using the statistic $-2(\log L_0 - \log L_1)$, which asymptotically follows a chi-square distribution with df equal to the difference in the number of parameters between the two models.
*To find the p-value for the likelihood ratio test for adding distance to the model:*
```{r}
1 - pchisq(2*(75.144 - 69.026), 1)
```