Stat 414 – HW 7

Due midnight, Nov. 6

 

Please remember to include all your output and R Code. This assignment gives you initial R code but then expects you to modify it.  Please start early in the week and ask questions, especially on the R parts!

1) Curran, Stice, and Chassin (Journal of Consulting and Clinical Psychology, 1997) collected data on 82 adolescents at three time points starting at age 14 to assess factors that affect teen drinking behavior. Key variables in the data set are:

·         id = numerical identifier for subject

·         age = 14, 15, or 16

·         coa = 1 if the teen is a child of an alcoholic parent; 0 otherwise

·         male = 1 if male; 0 if female

·         peer = a measure of peer alcohol use, taken when each subject was 14. This is the square root of the sum of two 6-point items about the proportion of friends who drink occasionally or regularly.

·         alcuse = the primary response. Four items—(a) drank beer or wine, (b) drank hard liquor, (c) 5 or more drinks in a row, and (d) got drunk—were each scored on an 8-point scale, from 0=“not at all” to 7=“every day.” Then alcuse is the square root of the sum of these four items.

Primary research questions included:

·        do trajectories of alcohol use differ by parental alcoholism?

·        do trajectories of alcohol use differ by peer alcohol use?

(a) Identify Level One and Level Two predictors.

(b) Produce a “spaghetti plot” showing alcohol use over time for all 82 subjects and summarize what you learn.

alcohol = read.delim("https://www.rossmanchance.com/stat414/data/alcohol.txt", "\t", header=TRUE)

ggplot(data = alcohol, aes(y = alcuse, x = age, group=id)) +

     geom_line() +

     geom_smooth(aes(group=1),color="green",size=1, se=F, method="loess") +

     theme_bw()

(c) Separate the above graph by whether or not the teen had alcoholic parents.  What do you learn?

ggplot(data = alcohol, aes(y = alcuse, x = age, group=id)) +

  geom_line() +

  facet_wrap(.~coa) +

  geom_smooth(aes(group=1),color="green",size=1, se=F, method="loess") +

  theme_bw()

(d) Separate the graph in (b) by male or female and summarize what you learn.

(e) Separate the graph in (b) by whether or not peer usage was above 1. Summarize what you learn.  Hint: alcohol$peerI = as.numeric(alcohol$peer > 1)

(f) Run the null model and report the intraclass correlation coefficient. (model0)

(g) Run an unconditional growth model with age as the time variable at Level One.  Interpret the intercept.

(h) Create age14 = age-14. Explain the advantage of doing so.

(i) Rerun the unconditional growth model using age14, how do things change? (model1) Calculate and interpret an R2 value for the Level 1 variance comparing this model to the null model.

(j) Plot the intercepts (or effects) from model1 vs. (the original) peer alcohol use variable.  Does there appear to be an association? Interpret this association in context.

            peermeans = aggregate(alcohol[, 6], by=list(alcohol$id), mean)

intercepts = ranef(model1)$id[,1]

plot(intercepts ~ peermeans[,2])

abline(lm(intercepts ~ peermeans[,2]))

(k) Repeat the previous question with the slopes (or effects) from model1 vs. (the original) peer alcohol use variable.

(l) Repeat the two previous questions with the alcoholic parent variable.

(m) Add the effects of having an alcoholic parent and peer alcohol use in both Level Two equations to model1. Report and interpret all estimated fixed effects, using proper notation. (Your interpretations should correspond to what you saw in the graphs.)  How much level 2 variation is explained by these variables?

 

2) The U.S. National Longitudinal Study of Youth (NLSY) original study began in 1979 with a nationally-representative sample of nearly 13,000 young people aged 14 to 21. CurranData.txt contains data from a sub-study of children of the female NLSY respondents which began in 1986 when the children were aged between 6 and 8 years. Child assessments (e.g., reading scores) were then measured biennially in 1988, 1990 and 1992. The sub-sample of 221 children were assessed on all four occasions.

(a)  Read in CurranData.txt and look at the first few rows.  Which are time-varying variables and which are time-invariant? Is the dataset in long or wide format?

CurranData <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData.txt", "\t", header=TRUE)

head(CurranData)

(a2) Summarize what you learn by the “spaghetti plot”

library(ggplot2)

ggplot(data = CurranDataLong, aes(y = RAscore, x = Time)) +

  geom_line(aes(group=id), color="light grey") +

  geom_smooth(aes(group=1), color="black", size=1) +

  theme_bw() +

labs(x="Years since 1986", y="Reading Score")

(b) Examine the correlations of the reading scores across the four time points. (Explain what the R code is doing.)  How would you summarize their behavior? Explain why the pattern in the covariation/ correlation makes sense in context.

cor(select(CurranData, contains("reading")))

(c) If we were to fit an OLS model to these data, what assumption would this model make about the correlations of the four observations on each person? Does that assumption seem reasonable here?

(d) Use lme (not lmer) to fit a random intercepts longitudinal model (= include Time).  Tell me about the variable Time. Report the intra-class correlation coefficient. What assumptions does this model make about the correlations of the four observations on each person? Does that assumption seem reasonable here?

CurranDataLong <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData_Long.txt", "\t", header=TRUE)

library(nlme)

summary(model1 <- lme(RAscore ~ Time, random = ~1 | id, data = CurranDataLong))

cov2cor(getVarCov(model1, type="marginal")[[1]])

(e) Fit a random slopes model and examine the covariance matrix.  Verify the values for var(y11), var(y21), cor(0,1), cor(0,2). (Show me the numbers!) How are the variances changing over time? How do the correlations change as the observations get further apart?

model2 = lme(RAscore ~ Time, random = ~Time|id, data = CurranDataLong)

summary(model2)

getVarCov(model2, type="marginal")

cov2cor(getVarCov(model2, type="marginal")[[1]])

fits = fitted.values(model2)

ggplot(data = CurranDataLong2, aes(y = fits, x = Time)) +

  geom_line(aes(group=id)) +

  theme_bw()

(f) Add a quadratic term for time and allow (to also have) random slopes. Compare this to the model without time2, is the larger model (how much larger?) significantly better? How does the predicted correlation matrix compare?

model3 = lme(RAscore ~ Time + I(Time*Time), random = ~Time + I(Time*Time)|id, data = CurranDataLong)

summary(model3)

cov2cor(getVarCov(model3, type="marginal")[[1]])

(j) Return to the basic two-level random intercepts model with time and time2 as Level 1 variables, but without random slopes. Fit the AR(1) error structure. Is this model significantly better than the “independent errors” random intercepts model? 

model4 = lme(fixed = RAscore ~ Time + I(Time*Time), random = ~1| id, correlation = corAR1(), data = CurranDataLong)

summary(model4); intervals(model4)

cov2cor(getVarCov(model4, type="marginal")[[1]])

(k) What is the estimated parameter of the AR(1) model (“autocorrelation”).  How do you interpret it? Does there appear to be correlation between adjacent observations even after conditioning on student?

(l) What happens if you allow for the autocorrelation structure in the model with random slopes?

(m) How would we decide whether the pattern of growth (i.e., reading progress) differs between males and females? Fit an appropriate model and summarize the results. (Pick your favorite of the above models and add term(s) to reflect this research question.) 

 

3) Read the article Holditch-Davis D, Edwards LJ, Helms RW. 1998. Modeling development of sleep-wake behaviors: I. Using the mixed general linear model. Physiol Behav, 63(3):311-8.

(a) Describe the contents of Figure 1 (in 2-4 sentences).

(b) Describe the contents of Figure 2 (in 2-4 sentences).

(c) How are Figure 1 and 2 related? Please describe the relationship between these figures (in 2-4 sentences).

(d) In the Discussion section of the article the authors discuss some of the benefits of using a multilevel model. In your own words, describe one benefit discussed by the authors and elaborate on this point with information provided in this course.