chipdata = read.delim("http://www.rossmanchance.com/stat414/data/ChipData.txt",  "\t", header=T)
head(chipdata, 5)
##   person time chip
## 1     MS   84 choc
## 2     DL  118 choc
## 3     EB  112 choc
## 4     JD   87 choc
## 5     RV  100 choc

(a) Explore the data: What do you learn from the boxplots? Do you think these differences will be statistically significant? #Boxplots are a good place to start to compare distributions

boxplot(time ~ chip, data = chipdata)

Conjectures on statistical significance will vary - I see a definite shift, at least for chocolate compared to peanut butter, but there is also still a bit of overlap in the boxplots reflecting the within group variation.

#Here is the ggplot version with the means added as well
library(ggplot2)
ggplot(chipdata, aes(x=chip, y = time)) + 
  geom_boxplot() + 
  theme_bw() +
  stat_summary(fun.y=mean, geom="point")

(b) State null and alternative hypotheses. What statistical method could you use to analyze these data?

\(H_0: \mu_{choc} = \mu_{pb} = \mu_{bs}\) \(H_a:\) at least one \(\mu\) differs is equivalent to \(H_0: \beta_{choc} = \beta_{pb} = 0\) \(H_a:\) at least one \(\beta\) differs from zero

The tradition “ANOVA F-test” (comparing several population means) and the “partial F-test” (dropping two terms from the regresson model) are equialvent here.

Recall that the traditional ANOVA test finds MSE by pooling the group variances together (because we are assuming all the populations have the same variance, so we make a weighted (by sample size) average of those variances)…

(c) Produce a regression model for testing the hypotheses in (b). Report the test statistic, degrees of freedom, and p-value for your (one) test. What do you conclude?

summary(lm(time~chip, data = chipdata))
## 
## Call:
## lm(formula = time ~ chip, data = chipdata)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.667 -10.667   0.889  17.167  30.889 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   76.111      6.657  11.434 3.38e-11 ***
## chipchoc      16.556      9.414   1.759   0.0914 .  
## chippb        -2.111      9.414  -0.224   0.8245    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 19.97 on 24 degrees of freedom
## Multiple R-squared:  0.1642, Adjusted R-squared:  0.0946 
## F-statistic: 2.358 on 2 and 24 DF,  p-value: 0.1161
anova(lm(time~chip, data=chipdata))
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value Pr(>F)
## chip       2 1881.0  940.48  2.3584 0.1161
## Residuals 24 9570.9  398.79

F = 940.48/398.79 = 2.358, df = 2 and 24, p-value = 0.1161

We do not have convincing evidence that the mean melting time differs among semi-sweet chocolate, peanut butter, and butterscotch chips.

(d) Also report the R2 and s (residual standard error) values. Interpret these values in context. Are they consistent with your p-value? Also report “MSTotal” = Var(time).

Residual standard error (s or \(\hat{\sigma}\)) = 19.97 seconds (this is how accurately we can predict someone’s melttime from the type of chip, it’s a typical residual size) Squaring this we find 19.96^2 = 398.8, which matches MSE

We can think of MSTotal (the total variation in the response variable) as SStotal / df total = (1881 + 9570.9)/(2+24) = 440.46

Which also matches the variance of the response variable

var(chipdata$time)
## [1] 440.4558

\(R^2\) is 1881/(1881 + 9570.9), so about 16% of the variability in melting times is explained by the type of chip.

(e) Examine the residual plots, do you consider the model assumptions met?

par(mfrow=c(2,2))
plot(lm(time~chip, data=chipdata))

par(mfrow=c(1,1))

These look pretty good, but only tell us about normality and equal variance (linearity isn’t really a concern with our one categorical variable). The major issue here is violation of the independence assumption with the repeat observations on each indivdual.

(f) Does chip melting time appear to vary across the individuals in the study?

boxplot(time ~ person, data = chipdata)

summary(aov(time~person, data = chipdata))
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## person       8   8705  1088.1   7.129 0.00028 ***
## Residuals   18   2747   152.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model1=lm(time~chip, data=chipdata)
boxplot(model1$residuals~person, data=chipdata)

(g) Look at the data: If we were to “adjust” for the individual doing the melting, is there evidence of a chip effect?

with(chipdata, coplot(time~chip| person))

This didn’t work for several of you

ggplot(chipdata, aes(x=chip, y = time)) + 
  geom_boxplot() + 
  theme_bw() +
  stat_summary(fun.y=mean, geom="point") +
    facet_wrap(~person)

Keep in mind there is only one observation in each of these “boxplots.” We do notice that for most individuals the chocolate is larger than the other two, which seem pretty similar to each other.

(h) Fit a regression model including chip type and person. What do you learn from each of the “effects” tests? What can you tell me about the \(R^2\) for each variable? What is “MSTotal”?

model2=lm(time~chip + person, data=chipdata)
anova(model2)
## Analysis of Variance Table
## 
## Response: time
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## chip       2 1881.0  940.48  17.369 9.780e-05 ***
## person     8 8704.5 1088.06  20.094 5.785e-07 ***
## Residuals 16  866.4   54.15                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-test for chip (F = 17.369, df = 2, 16, p-value < .001) indicates that, after adjusting for the person doing the melting, there is strong evidence of a difference, on average, among the chips.

The F-test for person (F = 20.094, df = 8, 16, p-value < .001) tells us that, after adjusting for type of chip, there is strong evidence of a difference, on average, between the subjects in the study.

Note: The SS for chip has not changed! This is because we have perfect “balance” with everyone testing all 3 chips. All we have done is taken the previous SSError and split out some of that variation to the individuals (8704.5 + 866.4 = 9570.9). This gives us a much smaller SSError to be used in the F-statistic, so the SS for chip now looks huge compared to the residual unexplained variation.

\(R^2\) for chip is still 1881/(1881 + 8704.5 + 866.4) = 16%

\(R^2\) for person is (8704)/(1881 + 8704.5 + 866.4) = 76%!

75% of the variation in melting times is person-to-person! And only 8% is left unexplained.

MSTtotal is still the overall variability in melting times, we haven’t changed that, we just split it up into variation due to person, variation due to chip type, and unexplained variation.

(i) Interpret the coefficient of “chipchoc” in context.

summary(model2)
## 
## Call:
## lm(formula = time ~ chip + person, data = chipdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7407  -4.0741  -0.5185   3.9259  10.5926 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   52.852      4.697  11.253 5.19e-09 ***
## chipchoc      16.556      3.469   4.773 0.000208 ***
## chippb        -2.111      3.469  -0.609 0.551339    
## personCC      31.000      6.008   5.160 9.50e-05 ***
## personDL      44.667      6.008   7.434 1.42e-06 ***
## personEB      45.333      6.008   7.545 1.17e-06 ***
## personJD      20.667      6.008   3.440 0.003366 ** 
## personJR      11.333      6.008   1.886 0.077540 .  
## personJS      -6.333      6.008  -1.054 0.307491    
## personMS      21.333      6.008   3.551 0.002663 ** 
## personRV      41.333      6.008   6.879 3.70e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.359 on 16 degrees of freedom
## Multiple R-squared:  0.9243, Adjusted R-squared:  0.8771 
## F-statistic: 19.55 on 10 and 16 DF,  p-value: 4.022e-07

On average, the chocolate chips took 16.6 sec long to melt compared to the butterscotch chips, after adjusting for the person.

(j) Interpret the coefficient of “personCC” in context. On average, person CC took 31 seconds longer to melt compared to AG.

# indicator coding (but applied only to the person variable)
contrasts(chipdata$person) = contr.sum(9)
summary(lm(time~chip + person, data=chipdata))
## 
## Call:
## lm(formula = time ~ chip + person, data = chipdata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.7407  -4.0741  -0.5185   3.9259  10.5926 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   76.111      2.453  31.030 1.01e-15 ***
## chipchoc      16.556      3.469   4.773 0.000208 ***
## chippb        -2.111      3.469  -0.609 0.551339    
## person1      -23.259      4.005  -5.807 2.67e-05 ***
## person2        7.741      4.005   1.933 0.071195 .  
## person3       21.407      4.005   5.345 6.58e-05 ***
## person4       22.074      4.005   5.511 4.74e-05 ***
## person5       -2.593      4.005  -0.647 0.526641    
## person6      -11.926      4.005  -2.977 0.008889 ** 
## person7      -29.593      4.005  -7.388 1.53e-06 ***
## person8       -1.926      4.005  -0.481 0.637152    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.359 on 16 degrees of freedom
## Multiple R-squared:  0.9243, Adjusted R-squared:  0.8771 
## F-statistic: 19.55 on 10 and 16 DF,  p-value: 4.022e-07

On average, person1 took -23.3 seconds “below average.”

(k) Include the interaction between chip type and person in the model. What do you learn?

model3=lm(time~chip + person + chip:person, data=chipdata)
anova(model3)
## Warning in anova.lm(model3): ANOVA F-tests on an essentially perfect fit
## are unreliable
## Analysis of Variance Table
## 
## Response: time
##             Df Sum Sq Mean Sq F value Pr(>F)
## chip         2 1881.0  940.48               
## person       8 8704.5 1088.06               
## chip:person 16  866.4   54.15               
## Residuals    0    0.0

We can’t test for interaction effects in a randomized block design with no replication, we “run out” of degrees of freedom.

(l) Treat person as random effect \(i\) refers to individual observations and \(j\) refers to the person, so \(y_{ij}\) is the \(i^{th}\) observation for the \(j^{th}\) person. So we will want to know the total number of observations, the observations per person (so here \(n_1\) = \(n_2\) = \(n_3\) = 3), and the number of people.

The expected value for person \(j = \beta_0 + u_j\) for butterscotch

The overall expected value is \(\beta_0\) for butterscotch

\(Var(Y_{ij} | chip\ type) = \sigma^2 + \tau^2\)

There are 5 parameters (3 fixed effects and 2 variances) to estimate.

library(nlme)
model4 = lme(fixed = time ~ chip, random = ~1 | person, data = chipdata)
summary(model4)
## Linear mixed-effects model fit by REML
##  Data: chipdata 
##        AIC      BIC    logLik
##   204.5056 210.3958 -97.25278
## 
## Random effects:
##  Formula: ~1 | person
##         (Intercept) Residual
## StdDev:    18.56445 7.358543
## 
## Fixed effects: time ~ chip 
##                Value Std.Error DF   t-value p-value
## (Intercept) 76.11111  6.656551 16 11.434016  0.0000
## chipchoc    16.55556  3.468850 16  4.772635  0.0002
## chippb      -2.11111  3.468850 16 -0.608591  0.5513
##  Correlation: 
##          (Intr) chpchc
## chipchoc -0.261       
## chippb   -0.261  0.500
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -1.33739451 -0.64702428  0.07431267  0.51262612  1.34718843 
## 
## Number of Observations: 27
## Number of Groups: 9
#default is REML

The main point is now we partition the random variability (after adjusting for fixed effect) as person to person variability and unexplained variation.

Total variance = 18.56^2 + 7.358^2 =398.6 (Recall this was the SSE when all we had in the model was chip type)

(m) Did this procedure use ML or REML estimation? At the top of the output, tells us REML (that’s the default)

(n) Report the AIC and logLik values.

AIC(model4 )
## [1] 204.5056
logLik(model4)
## 'log Lik.' -97.25278 (df=5)

(o) Fixed effects: How do the intercept, and coefficients for chip type compare to (h)? Interpret \(\hat{\beta}_0\) in context.

head(model2$coefficients)
## (Intercept)    chipchoc      chippb    personCC    personDL    personEB 
##   52.851852   16.555556   -2.111111   31.000000   44.666667   45.333333
(model4$coefficients$fixed)
## (Intercept)    chipchoc      chippb 
##   76.111111   16.555556   -2.111111

The fixed effects look the same!The intercept is the predicted melting time for butterscotch chips for the average person .

The intercepts differ because when subject is treated as random, the output switches over to effect coding.

#showing model 2 again but with effect coding on person
head(summary(lm(time~chip + person, data=chipdata))$coefficients)
##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  76.111111   2.452848 31.029695 1.008176e-15
## chipchoc     16.555556   3.468850  4.772635 2.075995e-04
## chippb       -2.111111   3.468850 -0.608591 5.513387e-01
## person1     -23.259259   4.005483 -5.806855 2.674961e-05
## person2       7.740741   4.005483  1.932536 7.119488e-02
## person3      21.407407   4.005483  5.344526 6.577848e-05

(p) Sum the two variances together and compare this to MSTotal.

Total variance (note: I maybe shouldn’t call this total variance, it’s not Var(Y) but Var(residuals after taking fixed effects into account) = 18.56^2 (person-to-person variance) + 7.358^2 (within person variance, after adjusting for chip type) =398.6 (Recall this was the SSE when all we had in the model was chip type).

So keep in mind the distinction between the variability in the response and the unexplained variability “about the line” (in the residuals).

(q) How much (what percentage) of this total variation is due to the person-to-person variability?

18.56^2/398.6 = 0.86

86% of the {b>within chip variation (vs. total variation in response) is explained by who is doing the melting.

(r) Why are we using a one-sided test here? Variances can only be positive

(s) Fit the model without random effects using REML and find the log likelihood value.

model5 = gls(time ~ chip, data= chipdata)
anova(model4, model5)
##        Model df      AIC      BIC     logLik   Test  L.Ratio p-value
## model4     1  5 204.5055 210.3958  -97.25278                        
## model5     2  4 226.4230 231.1352 -109.21149 1 vs 2 23.91743  <.0001

We have strong evidence (chi-square = 23.9, df = 1, p-value < .0001) that the person-to-person variability is above zero.

There are a few issues with the above, mostly the “boundary conditions” involved with estimating the likelihood for variance parameters that must be positive. A quick modification is to cut the p-value in half.