chipdata = read.delim("http://www.rossmanchance.com/stat414/data/ChipData.txt", "\t", header=T)
head(chipdata, 5)
## person time chip
## 1 MS 84 choc
## 2 DL 118 choc
## 3 EB 112 choc
## 4 JD 87 choc
## 5 RV 100 choc
(a) Explore the data: What do you learn from the boxplots? Do you think these differences will be statistically significant? #Boxplots are a good place to start to compare distributions
boxplot(time ~ chip, data = chipdata)
Conjectures on statistical significance will vary - I see a definite shift, at least for chocolate compared to peanut butter, but there is also still a bit of overlap in the boxplots reflecting the within group variation.
#Here is the ggplot version with the means added as well
library(ggplot2)
ggplot(chipdata, aes(x=chip, y = time)) +
geom_boxplot() +
theme_bw() +
stat_summary(fun.y=mean, geom="point")
(b) State null and alternative hypotheses. What statistical method could you use to analyze these data?
\(H_0: \mu_{choc} = \mu_{pb} = \mu_{bs}\) \(H_a:\) at least one \(\mu\) differs is equivalent to \(H_0: \beta_{choc} = \beta_{pb} = 0\) \(H_a:\) at least one \(\beta\) differs from zero
The tradition “ANOVA F-test” (comparing several population means) and the “partial F-test” (dropping two terms from the regresson model) are equialvent here.
Recall that the traditional ANOVA test finds MSE by pooling the group variances together (because we are assuming all the populations have the same variance, so we make a weighted (by sample size) average of those variances)…
(c) Produce a regression model for testing the hypotheses in (b). Report the test statistic, degrees of freedom, and p-value for your (one) test. What do you conclude?
summary(lm(time~chip, data = chipdata))
##
## Call:
## lm(formula = time ~ chip, data = chipdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.667 -10.667 0.889 17.167 30.889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.111 6.657 11.434 3.38e-11 ***
## chipchoc 16.556 9.414 1.759 0.0914 .
## chippb -2.111 9.414 -0.224 0.8245
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 19.97 on 24 degrees of freedom
## Multiple R-squared: 0.1642, Adjusted R-squared: 0.0946
## F-statistic: 2.358 on 2 and 24 DF, p-value: 0.1161
anova(lm(time~chip, data=chipdata))
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## chip 2 1881.0 940.48 2.3584 0.1161
## Residuals 24 9570.9 398.79
F = 940.48/398.79 = 2.358, df = 2 and 24, p-value = 0.1161
We do not have convincing evidence that the mean melting time differs among semi-sweet chocolate, peanut butter, and butterscotch chips.
(d) Also report the R2 and s (residual standard error) values. Interpret these values in context. Are they consistent with your p-value? Also report “MSTotal” = Var(time).
Residual standard error (s or \(\hat{\sigma}\)) = 19.97 seconds (this is how accurately we can predict someone’s melttime from the type of chip, it’s a typical residual size) Squaring this we find 19.96^2 = 398.8, which matches MSE
We can think of MSTotal (the total variation in the response variable) as SStotal / df total = (1881 + 9570.9)/(2+24) = 440.46
Which also matches the variance of the response variable
var(chipdata$time)
## [1] 440.4558
\(R^2\) is 1881/(1881 + 9570.9), so about 16% of the variability in melting times is explained by the type of chip.
(e) Examine the residual plots, do you consider the model assumptions met?
par(mfrow=c(2,2))
plot(lm(time~chip, data=chipdata))
par(mfrow=c(1,1))
These look pretty good, but only tell us about normality and equal variance (linearity isn’t really a concern with our one categorical variable). The major issue here is violation of the independence assumption with the repeat observations on each indivdual.
(f) Does chip melting time appear to vary across the individuals in the study?
boxplot(time ~ person, data = chipdata)
summary(aov(time~person, data = chipdata))
## Df Sum Sq Mean Sq F value Pr(>F)
## person 8 8705 1088.1 7.129 0.00028 ***
## Residuals 18 2747 152.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model1=lm(time~chip, data=chipdata)
boxplot(model1$residuals~person, data=chipdata)
(g) Look at the data: If we were to “adjust” for the individual doing the melting, is there evidence of a chip effect?
with(chipdata, coplot(time~chip| person))
This didn’t work for several of you
ggplot(chipdata, aes(x=chip, y = time)) +
geom_boxplot() +
theme_bw() +
stat_summary(fun.y=mean, geom="point") +
facet_wrap(~person)
Keep in mind there is only one observation in each of these “boxplots.” We do notice that for most individuals the chocolate is larger than the other two, which seem pretty similar to each other.
(h) Fit a regression model including chip type and person. What do you learn from each of the “effects” tests? What can you tell me about the \(R^2\) for each variable? What is “MSTotal”?
model2=lm(time~chip + person, data=chipdata)
anova(model2)
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## chip 2 1881.0 940.48 17.369 9.780e-05 ***
## person 8 8704.5 1088.06 20.094 5.785e-07 ***
## Residuals 16 866.4 54.15
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The F-test for chip (F = 17.369, df = 2, 16, p-value < .001) indicates that, after adjusting for the person doing the melting, there is strong evidence of a difference, on average, among the chips.
The F-test for person (F = 20.094, df = 8, 16, p-value < .001) tells us that, after adjusting for type of chip, there is strong evidence of a difference, on average, between the subjects in the study.
Note: The SS for chip has not changed! This is because we have perfect “balance” with everyone testing all 3 chips. All we have done is taken the previous SSError and split out some of that variation to the individuals (8704.5 + 866.4 = 9570.9). This gives us a much smaller SSError to be used in the F-statistic, so the SS for chip now looks huge compared to the residual unexplained variation.
\(R^2\) for chip is still 1881/(1881 + 8704.5 + 866.4) = 16%
\(R^2\) for person is (8704)/(1881 + 8704.5 + 866.4) = 76%!
75% of the variation in melting times is person-to-person! And only 8% is left unexplained.
MSTtotal is still the overall variability in melting times, we haven’t changed that, we just split it up into variation due to person, variation due to chip type, and unexplained variation.
(i) Interpret the coefficient of “chipchoc” in context.
summary(model2)
##
## Call:
## lm(formula = time ~ chip + person, data = chipdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.7407 -4.0741 -0.5185 3.9259 10.5926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.852 4.697 11.253 5.19e-09 ***
## chipchoc 16.556 3.469 4.773 0.000208 ***
## chippb -2.111 3.469 -0.609 0.551339
## personCC 31.000 6.008 5.160 9.50e-05 ***
## personDL 44.667 6.008 7.434 1.42e-06 ***
## personEB 45.333 6.008 7.545 1.17e-06 ***
## personJD 20.667 6.008 3.440 0.003366 **
## personJR 11.333 6.008 1.886 0.077540 .
## personJS -6.333 6.008 -1.054 0.307491
## personMS 21.333 6.008 3.551 0.002663 **
## personRV 41.333 6.008 6.879 3.70e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.359 on 16 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.8771
## F-statistic: 19.55 on 10 and 16 DF, p-value: 4.022e-07
On average, the chocolate chips took 16.6 sec long to melt compared to the butterscotch chips, after adjusting for the person.
(j) Interpret the coefficient of “personCC” in context. On average, person CC took 31 seconds longer to melt compared to AG.
# indicator coding (but applied only to the person variable)
contrasts(chipdata$person) = contr.sum(9)
summary(lm(time~chip + person, data=chipdata))
##
## Call:
## lm(formula = time ~ chip + person, data = chipdata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.7407 -4.0741 -0.5185 3.9259 10.5926
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.111 2.453 31.030 1.01e-15 ***
## chipchoc 16.556 3.469 4.773 0.000208 ***
## chippb -2.111 3.469 -0.609 0.551339
## person1 -23.259 4.005 -5.807 2.67e-05 ***
## person2 7.741 4.005 1.933 0.071195 .
## person3 21.407 4.005 5.345 6.58e-05 ***
## person4 22.074 4.005 5.511 4.74e-05 ***
## person5 -2.593 4.005 -0.647 0.526641
## person6 -11.926 4.005 -2.977 0.008889 **
## person7 -29.593 4.005 -7.388 1.53e-06 ***
## person8 -1.926 4.005 -0.481 0.637152
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.359 on 16 degrees of freedom
## Multiple R-squared: 0.9243, Adjusted R-squared: 0.8771
## F-statistic: 19.55 on 10 and 16 DF, p-value: 4.022e-07
On average, person1 took -23.3 seconds “below average.”
(k) Include the interaction between chip type and person in the model. What do you learn?
model3=lm(time~chip + person + chip:person, data=chipdata)
anova(model3)
## Warning in anova.lm(model3): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
##
## Response: time
## Df Sum Sq Mean Sq F value Pr(>F)
## chip 2 1881.0 940.48
## person 8 8704.5 1088.06
## chip:person 16 866.4 54.15
## Residuals 0 0.0
We can’t test for interaction effects in a randomized block design with no replication, we “run out” of degrees of freedom.
(l) Treat person as random effect \(i\) refers to individual observations and \(j\) refers to the person, so \(y_{ij}\) is the \(i^{th}\) observation for the \(j^{th}\) person. So we will want to know the total number of observations, the observations per person (so here \(n_1\) = \(n_2\) = \(n_3\) = 3), and the number of people.
The expected value for person \(j = \beta_0 + u_j\) for butterscotch
The overall expected value is \(\beta_0\) for butterscotch
\(Var(Y_{ij} | chip\ type) = \sigma^2 + \tau^2\)
There are 5 parameters (3 fixed effects and 2 variances) to estimate.
library(nlme)
model4 = lme(fixed = time ~ chip, random = ~1 | person, data = chipdata)
summary(model4)
## Linear mixed-effects model fit by REML
## Data: chipdata
## AIC BIC logLik
## 204.5056 210.3958 -97.25278
##
## Random effects:
## Formula: ~1 | person
## (Intercept) Residual
## StdDev: 18.56445 7.358543
##
## Fixed effects: time ~ chip
## Value Std.Error DF t-value p-value
## (Intercept) 76.11111 6.656551 16 11.434016 0.0000
## chipchoc 16.55556 3.468850 16 4.772635 0.0002
## chippb -2.11111 3.468850 16 -0.608591 0.5513
## Correlation:
## (Intr) chpchc
## chipchoc -0.261
## chippb -0.261 0.500
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -1.33739451 -0.64702428 0.07431267 0.51262612 1.34718843
##
## Number of Observations: 27
## Number of Groups: 9
#default is REML
The main point is now we partition the random variability (after adjusting for fixed effect) as person to person variability and unexplained variation.
Total variance = 18.56^2 + 7.358^2 =398.6 (Recall this was the SSE when all we had in the model was chip type)
(m) Did this procedure use ML or REML estimation? At the top of the output, tells us REML (that’s the default)
(n) Report the AIC and logLik values.
AIC(model4 )
## [1] 204.5056
logLik(model4)
## 'log Lik.' -97.25278 (df=5)
(o) Fixed effects: How do the intercept, and coefficients for chip type compare to (h)? Interpret \(\hat{\beta}_0\) in context.
head(model2$coefficients)
## (Intercept) chipchoc chippb personCC personDL personEB
## 52.851852 16.555556 -2.111111 31.000000 44.666667 45.333333
(model4$coefficients$fixed)
## (Intercept) chipchoc chippb
## 76.111111 16.555556 -2.111111
The fixed effects look the same!The intercept is the predicted melting time for butterscotch chips for the average person .
The intercepts differ because when subject is treated as random, the output switches over to effect coding.
#showing model 2 again but with effect coding on person
head(summary(lm(time~chip + person, data=chipdata))$coefficients)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 76.111111 2.452848 31.029695 1.008176e-15
## chipchoc 16.555556 3.468850 4.772635 2.075995e-04
## chippb -2.111111 3.468850 -0.608591 5.513387e-01
## person1 -23.259259 4.005483 -5.806855 2.674961e-05
## person2 7.740741 4.005483 1.932536 7.119488e-02
## person3 21.407407 4.005483 5.344526 6.577848e-05
(p) Sum the two variances together and compare this to MSTotal.
Total variance (note: I maybe shouldn’t call this total variance, it’s not Var(Y) but Var(residuals after taking fixed effects into account) = 18.56^2 (person-to-person variance) + 7.358^2 (within person variance, after adjusting for chip type) =398.6 (Recall this was the SSE when all we had in the model was chip type).
So keep in mind the distinction between the variability in the response and the unexplained variability “about the line” (in the residuals).
(q) How much (what percentage) of this total variation is due to the person-to-person variability?
18.56^2/398.6 = 0.86
86% of the {b>within chip variation (vs. total variation in response) is explained by who is doing the melting.
(r) Why are we using a one-sided test here? Variances can only be positive
(s) Fit the model without random effects using REML and find the log likelihood value.
model5 = gls(time ~ chip, data= chipdata)
anova(model4, model5)
## Model df AIC BIC logLik Test L.Ratio p-value
## model4 1 5 204.5055 210.3958 -97.25278
## model5 2 4 226.4230 231.1352 -109.21149 1 vs 2 23.91743 <.0001
We have strong evidence (chi-square = 23.9, df = 1, p-value < .0001) that the person-to-person variability is above zero.
There are a few issues with the above, mostly the “boundary conditions” involved with estimating the likelihood for variance parameters that must be positive. A quick modification is to cut the p-value in half.