Stat 414 – HW 7
Due midnight, Nov. 6
Please remember to
include all your output and R Code. This assignment gives you initial R code
but then expects you to modify it.
Please start early in the week and ask questions, especially on the R
parts!
1) Curran, Stice,
and Chassin (Journal
of Consulting and Clinical Psychology, 1997) collected data on 82
adolescents at three time points starting at age 14 to assess factors that
affect teen drinking behavior. Key variables in the data set are:
·
id = numerical identifier for
subject
·
age = 14, 15, or 16
·
coa = 1 if the teen is a child of an alcoholic parent; 0
otherwise
·
male = 1 if male; 0 if female
·
peer = a measure of peer alcohol
use, taken when each subject was 14. This is the square root of the sum of two
6-point items about the proportion of friends who drink occasionally or regularly.
·
alcuse = the primary response. Four items—(a) drank beer or wine,
(b) drank hard liquor, (c) 5 or more drinks in a row, and (d) got drunk—were
each scored on an 8-point scale, from 0=“not at all” to 7=“every day.” Then alcuse is the
square root of the sum of these four items.
Primary research questions included:
·
do trajectories of alcohol use differ by parental
alcoholism?
·
do trajectories of alcohol use differ by peer alcohol
use?
(a) Identify Level One and Level Two
predictors.
(b) Produce a “spaghetti plot” showing
alcohol use over time for all 82 subjects and summarize what you learn.
alcohol =
read.delim("https://www.rossmanchance.com/stat414/data/alcohol.txt",
"\t", header=TRUE)
ggplot(data = alcohol, aes(y
= alcuse, x = age, group=id)) +
geom_line() +
geom_smooth(aes(group=1),color="green",size=1,
se=F, method="loess") +
theme_bw()
(c) Separate the above graph by whether or
not the teen had alcoholic parents.
What do you learn?
ggplot(data = alcohol, aes(y
= alcuse, x = age, group=id)) +
geom_line() +
facet_wrap(.~coa) +
geom_smooth(aes(group=1),color="green",size=1,
se=F, method="loess") +
theme_bw()
(d) Separate the graph in (b) by male or female and
summarize what you learn.
(e) Separate the graph in (b) by whether
or not peer usage was above 1. Summarize what you learn. Hint: alcohol$peerI = as.numeric(alcohol$peer > 1)
(f) Run the null model and report the intraclass correlation
coefficient. (model0)
(g) Run an unconditional growth model with age as the time
variable at Level One. Interpret the
intercept.
(h) Create age14 = age-14. Explain the advantage of doing
so.
(i) Rerun the unconditional growth model using age14, how do
things change? (model1) Calculate and interpret an R2 value for
the Level 1 variance comparing this model to the null model.
(j) Plot the intercepts (or effects) from model1 vs. (the original)
peer alcohol use variable. Does there
appear to be an association? Interpret this association in context.
peermeans = aggregate(alcohol[, 6],
by=list(alcohol$id), mean)
intercepts
= ranef(model1)$id[,1]
plot(intercepts
~ peermeans[,2])
abline(lm(intercepts ~ peermeans[,2]))
(k) Repeat the previous question with the slopes (or
effects) from model1 vs. (the original) peer alcohol use variable.
(l) Repeat the two previous questions with the alcoholic
parent variable.
(m) Add the effects of having an alcoholic parent and peer alcohol
use in both Level Two equations to model1. Report and interpret all estimated
fixed effects, using proper notation. (Your interpretations should correspond
to what you saw in the graphs.) How much
level 2 variation is explained by these variables?
2) The U.S. National
Longitudinal Study of Youth (NLSY) original
study began in 1979 with a nationally-representative
sample of nearly 13,000 young people aged 14 to 21. CurranData.txt contains
data from a sub-study of children of the female NLSY respondents which began in
1986 when the children were aged between 6 and 8 years. Child assessments
(e.g., reading scores) were then measured biennially in 1988, 1990 and 1992.
The sub-sample of 221 children were assessed on all four occasions.
(a) Read in CurranData.txt and look at the first few rows. Which are
time-varying variables and which are time-invariant? Is the dataset in long or
wide format?
CurranData <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData.txt",
"\t", header=TRUE)
head(CurranData)
(a2) Summarize what you learn by the “spaghetti plot”
library(ggplot2)
ggplot(data = CurranDataLong,
aes(y = RAscore, x = Time))
+
geom_line(aes(group=id), color="light grey") +
geom_smooth(aes(group=1), color="black", size=1) +
theme_bw() +
labs(x="Years
since 1986", y="Reading Score")
(b) Examine the correlations of the reading scores across
the four time points. (Explain what the R code is doing.) How would you summarize their behavior?
Explain why the pattern in the covariation/ correlation makes sense in context.
cor(select(CurranData,
contains("reading")))
(c) If we were to fit an OLS model to these data, what
assumption would this model make about the correlations of the four
observations on each person? Does that assumption seem reasonable here?
(d) Use lme (not lmer) to fit a random intercepts longitudinal model (=
include Time). Tell me about the
variable Time. Report the intra-class correlation coefficient. What
assumptions does this model make about the correlations of the four
observations on each person? Does that assumption seem reasonable here?
CurranDataLong <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData_Long.txt",
"\t", header=TRUE)
library(nlme)
summary(model1
<- lme(RAscore ~ Time,
random = ~1 | id, data = CurranDataLong))
cov2cor(getVarCov(model1, type="marginal")[[1]])
(e) Fit a random slopes model and examine the covariance
matrix. Verify the values for var(y11),
var(y21), cor(0,1), cor(0,2).
(Show me the numbers!) How are the variances changing over time? How do the correlations
change as the observations get further apart?
model2
= lme(RAscore ~ Time,
random = ~Time|id, data = CurranDataLong)
summary(model2)
getVarCov(model2, type="marginal")
cov2cor(getVarCov(model2, type="marginal")[[1]])
fits = fitted.values(model2)
ggplot(data =
CurranDataLong2, aes(y = fits, x = Time)) +
geom_line(aes(group=id)) +
theme_bw()
(f) Add a quadratic term for time and allow (to also have) random
slopes. Compare this to the model without time2,
is the larger model (how much larger?) significantly better? How does the
predicted correlation matrix compare?
model3 = lme(RAscore ~ Time + I(Time*Time), random = ~Time +
I(Time*Time)|id, data = CurranDataLong)
summary(model3)
cov2cor(getVarCov(model3, type="marginal")[[1]])
(j) Return to the basic
two-level random intercepts model with time
and time2 as Level 1
variables, but without random slopes. Fit the AR(1) error structure. Is this
model significantly better than the “independent errors” random intercepts model?
model4 = lme(fixed
= RAscore ~ Time + I(Time*Time), random = ~1| id,
correlation = corAR1(), data = CurranDataLong)
summary(model4); intervals(model4)
cov2cor(getVarCov(model4,
type="marginal")[[1]])
(k) What is the
estimated parameter of the AR(1) model (“autocorrelation”). How do you interpret it? Does there appear to
be correlation between adjacent observations even after conditioning on student?
(l) What happens if you
allow for the autocorrelation structure in the model with random slopes?
(m) How would we decide
whether the pattern of growth (i.e., reading progress) differs between males
and females? Fit an appropriate model and summarize the results. (Pick your
favorite of the above models and add term(s) to reflect this research
question.)
3) Read the article Holditch-Davis D, Edwards
LJ, Helms RW. 1998. Modeling development of sleep-wake behaviors: I. Using the
mixed general linear model. Physiol Behav, 63(3):311-8.
(a) Describe the
contents of Figure 1 (in 2-4 sentences).
(b) Describe the
contents of Figure 2 (in 2-4 sentences).
(c) How are Figure 1
and 2 related? Please describe the relationship between these figures (in 2-4
sentences).
(d) In the Discussion
section of the article the authors discuss some of the benefits of using a
multilevel model. In your own words, describe one benefit discussed by the
authors and elaborate on this point with information provided in this course.