Stat 414 – HW 7

Due midnight, Friday, Nov. 22

 

You don’t need to include R code I give you but should include R code you create/modify and relevant output. Please submit separate files for each question.

 

1) Toenail dermatophyte onychomycosis (TDO) is a common toenail infection where the nail plate becomes separated from the nail bed. Traditionally, antifungal compounds have been used to treat TDO. New compounds have been developed in an attempt to reduce the treatment duration. The data in toenail.txt are from a randomized, double-blind, multi-center study comparing two oral treatments (Itraconazole and Terbinafine) over a 3-month period, with measurements taken at baseline, every month during treatment, and every 3 months afterwards, with a maximum of 7 measurements per subject.  At each visit, the target nail is coded as 0 (not severe, none or mild separation) or 1 (moderate or severe = complete separation). The research questions are whether the probability of severe infection decreases over time, and whether that evolution differs between the two treatments.

 

#library(readr)
toenail = read.table("https://www.rossmanchance.com/stat414/data/toenail.txt", header=TRUE)

toenail$ttmt = factor(toenail$ttmt)

(a) Explain what is measured by the time variable in this dataset.

(b) Explore the data:

with(toenail, round(tapply(y, list(ttmt, time), mean), 2))

props = ave(toenail$y, toenail$time, toenail$ttmt)  #group mean of y for each (time, ttmt) combination, repeated for each row

library(ggplot2)
ggplot(data = toenail, aes(y = props, x = time, group = ttmt, col = ttmt)) +
  geom_line() +
  theme_bw()

Include the output and briefly summarize what you learn. (Be clear about what the y-axis represents in the graph.)

(c) First fit a model that treats the observations on each individual as independent:

model1 = glm(y ~ time*ttmt, family = binomial, data = toenail)

summary(model1)

Is the interaction close to significant? How would you interpret the interaction in this context?

(d) Descriptively, how does the behavior of the fitted model compare to that of the raw data?

glm_fits = fitted.values(model1)
ggplot(data = toenail, aes(y = glm_fits, x = time, color = ttmt)) +
  geom_line(aes(group = ttmt)) +
  labs(title = "glm", x = "time", y = "estimated prop") +
  theme_bw() + ylim(0, .5)
 

plot(effects::allEffects(model1))

Notice the non-uniform scaling of the y-axis for the graph on the right! Don’t look at the graph on the right and conclude there is a linear association/model! In fact, we should be saying “curve” rather than “line” much more often…
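(If you would rather see the effects plot on the probability scale with an evenly spaced y-axis, one option, assuming a recent version of the effects package, is to request the response scale:)

plot(effects::allEffects(model1), axes = list(y = list(type = "response")))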

 

(e) Now fit the multilevel model

model2 = glmer(y ~ time*ttmt + (1 | SubjID), family = binomial, data = toenail, nAGQ = 50)

#adaptive Gauss-Hermite quadrature with nAGQ = 50 quadrature points

summary(model2, corr=F)

#Look at all the graphs but do not include

plot(effects::allEffects(model2))

plot(ggeffects::ggpredict(model2, terms=c("time", "SubjID [sample=9]"), type="re"), show_ci = F)

 

fits = predict(model2,re.form=NA,newdata=toenail) #fixed effects only

mlm_fits=exp(fits)/(1+exp(fits)) #convert log odds to predicted probabilities
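#equivalently: mlm_fits = plogis(fits), R's built-in inverse-logit function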

ggplot(data = toenail, aes(y = mlm_fits, x = time, color = ttmt)) +
  geom_line(aes(group = ttmt)) +
  labs(title = "glmer (fixed effects only)", x = "time", y = "estimated prop") +
  theme_bw() + ylim(0, .5)

 

Does this model suggest a significant difference in the rate of decline between the two treatments? Explain how you are deciding.

 

Compare the models

#Include this graph

combined_data <- data.frame( time = toenail$time, ttmt = toenail$ttmt, fits = c(glm_fits, mlm_fits), model = rep(c("GLM", "MLM"), each = nrow(toenail)) )

ggplot(data = combined_data, aes(x = time, y = fits, color = ttmt, linetype = model)) +  

geom_line(aes(group = interaction(ttmt, model))) +

labs(title = "Comparison of GLM and GLMER Models", x = "Time", y = "Estimated Probability") +

theme_bw() + ylim(0, 0.5)

 

You may have noticed that these models look pretty different. That’s because they estimate different things! Previously, we interpreted the slope coefficient of a fixed effect in a multilevel model as the “effect” of changing x-variables within a specific group or “on average,” the marginal effect in the population. With logistic regression models, the “within individual” rate of decline is not the same as the “overall average” rate of decline (because of the nonlinear back-transformation). The slope of the marginal relationship applies to a randomly selected person (population average). The slope of the conditional relationship applies to a particular individual. 

 

Use this equation to convert the conditional (“subject specific”) slope to the marginal (think population average) coefficient: β_marginal ≈ β_conditional / √(1 + 0.346·σu²), where σu² is the random intercept variance and 0.346 = (16√3/(15π))².
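(A minimal sketch of this conversion in R, using model2 from part (e); VarCorr() and fixef() are lme4 functions:)

sigma2u = as.numeric(VarCorr(model2)$SubjID)  #estimated random intercept variance
beta.time = fixef(model2)["time"]             #conditional (subject-specific) slope
beta.time / sqrt(1 + (16*sqrt(3)/(15*pi))^2 * sigma2u)  #approximate marginal slope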

(f) For the above model, what is the marginal slope for time?  Which is larger here, the subject-specific slope or the population-average (group level) slope? Does that make sense in context? Explain.

 

Alternatively, we can use generalized estimating equations (GEEs) to come up with parameter estimates and robust standard errors (think ‘sandwich estimators’).  First, fit a model with a “bad correlation structure.” (Output included)

install.packages("gee")

model3 = gee::gee(y ~ time*ttmt, family = binomial, data = toenail, id = SubjID,

                  corstr = "exchangeable", scale.fix = TRUE, scale.value = 1)

summary(model3)

[Screenshot of summary(model3) output included here.]

(g) Why do I consider the “working correlation matrix” assumptions to be unreasonable in this context?

 

Now allow for a more appropriate correlation matrix (output included):

model4 = gee::gee(y ~ time*ttmt, family = binomial, data = toenail, id = SubjID,

                  corstr = "unstructured")

summary(model4)

ggplot(data = toenail, aes(y = fitted.values(model4), x = time, color = ttmt)) +
  geom_line(aes(group = ttmt)) +
  labs(title = "gee", x = "time", y = "estimated prop") +
  theme_bw() + ylim(0, .5)
[Screenshots of summary(model4) output and the plot of fitted GEE proportions vs. time included here.]

(h) What do you notice about the pattern of correlations in the “working correlation matrix”? Does that make sense in this context?

(i) For models 3 and 4, how do the “robust standard errors” (which ‘correct’ for the dependence in the data) compare to the “naïve standard errors”? Is this what we expect to see? Explain. For which model are they more different?

(j) Which are closer to the observed proportions of severe infections, the marginal (population-averaged) values or the conditional (subject-specific) values?

head(fitted.values(model2), 10)
head(fitted.values(model4), 10)

Decent reference on GEEs vs. MLMs

(k) Suggest a good candidate for a “level 3” in this study.

 

 

2) In this exercise, you will analyze data from a study involving 67 schools. In this study, students were given a math test in grades 3, 4, and 5 that was similar to the SAT math test (e.g., scale 0-800).  The data in schooldata involve 234 repeated measures collected from 122 students across 12 teachers.

schooldata = read.table("https://www.rossmanchance.com/stat414/data/schooldata.txt", header=TRUE)
head(schooldata)
with(schooldata, table(studid, tchrid))[1:10, 1:10]

(a) Explain what is measured by the time variable in this dataset.

(b) Are students nested within teacher?  Explain how you can tell from the output of the table.  Also, sketch a diagram showing the “crossed effects” for a few cases.

(c) Based on the following graph, would you treat year as a quantitative variable or categorical (factor) in this study?  Explain your reasoning.

means <- tapply(schooldata$math, schooldata$year, mean)
with(schooldata, boxplot(math ~ year))
points(1:3, means, col = "red", pch = 19)

(d) Write out the level (model) equations corresponding to an unconditional growth (include year but not random slopes) model (not the prediction equation) with separate student random effects and teacher random effects.  Include the relevant assumptions on the random components.

(e) From the following output

model1 <- lmer(math ~ year + (1 | studid) + (1 | tchrid), data = schooldata)
summary(model1, corr= F)
confint(model1)

Which has more ‘impact’ on test scores in a given year, the student or their teacher?  (How are you deciding?)

(f) Are the student effects statistically significant? (How are you deciding?)

(g) Is the time effect statistically significant? (How are you deciding?)

 

The first two rows of the estimated variance-covariance matrix for the responses will have the following entries:

 

                                Student 13099   Student 13100   Student 13100
                                Teacher 14433   Teacher 14433   Teacher 14484
Student 13099, Teacher 14433         a.              b.              c.
Student 13100, Teacher 14433         d.              e.              f.

 

(h) Use the random intercepts model above to specify these 6 values. (Make simplifying assumptions here.)
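(The estimated variance components you need can be pulled from the fitted model, e.g.:)

as.data.frame(VarCorr(model1))  #variances for studid, tchrid, and the Level 1 residual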

(i) Examine the normality of each of the random effects, e.g.,

qqnorm(ranef(model1)$studid[[1]])

qqnorm(ranef(model1)$tchrid[[1]])

 

What do you conclude/suggest next?

 

(j) So far, we have assumed “additive effects” for the students and teachers.  Suppose we wanted to consider an interaction between our crossed random effects.  We are unlikely to have repeat combinations for our longitudinal data, but what about our hypothetical hockey data…

library(readr)  #for read_csv
hockey = read_csv("https://www.rossmanchance.com/stat414/data/shot-attempts.csv")

hockey$dist = with(hockey, sqrt((x_coord - 89)^2 + y_coord^2))  #distance from the net at (89, 0)

hockey$angle = with(hockey, atan2(y_coord, 89 - x_coord))

hockey$goal = ifelse(hockey$shot_result == "Goal", 1, 0)

 

model3.mlm = glmer(goal ~ 1 + angle + dist + (1 | shooter_id) + (1 | goalie_id), family=binomial, data = hockey)

model4.mlm = glmer(goal ~ 1 + angle + dist + (1 | shooter_id) + (1 | goalie_id) + (1 | shooter_id:goalie_id), family=binomial, data = hockey)  #be patient!

summary(model4.mlm, corr = F)

anova(model3.mlm, model4.mlm)

 

How many terms does this add to the model?  Is this model a significant improvement over the additive model?   (Include relevant output and be clear how you are deciding.)  Does the random interaction variance appear substantial? (Include relevant output and be clear how you are deciding.)  Provide a brief interpretation of the implications of this interaction in context.

 

3) The U.S. National Longitudinal Study of Youth (NLSY) original study began in 1979 with a nationally-representative sample of nearly 13,000 young people aged 14 to 21. CurranData.txt contains data from a sub-study of children of the female NLSY respondents which began in 1986 when the children were aged between 6 and 8 years. Child assessments (e.g., reading scores) were then measured biennially in 1988, 1990 and 1992. The sub-sample of 221 children were assessed on all four occasions.

(a)   Read in two datasets.  The first is in “wide format” and the second is in “long format.”  What is the difference in these formats?

CurranDataWide <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData.txt", sep = "\t", header = TRUE)

head(CurranDataWide)

CurranDataLong <- read.delim("https://www.rossmanchance.com/stat414/data/CurranData_Long.txt", sep = "\t", header = TRUE)

head(CurranDataLong)
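(Aside: if you only had the wide file, you could create the long version yourself. A sketch with tidyr; the reading-score column selection is an assumption, so check names(CurranDataWide) first:)

library(tidyr)
CurranLong2 <- pivot_longer(CurranDataWide,
                            cols = contains("reading"),  #assumed repeated-measures columns
                            names_to = "occasion", values_to = "RAscore")
head(CurranLong2)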

(b)   Look at the first few rows in wide format.  Which are time-varying variables and which are time-invariant?

(c)   Summarize what you learn from the “spaghetti plot” (pick one to include)

library(tidyverse)

ggplot(data = CurranDataLong, aes(y = RAscore, x = Time)) +

  geom_line(aes(group=id), color="light grey") +

  geom_smooth(aes(group=1), color="black", size=1) +

  theme_bw() +

labs(x="Years since 1986", y="Reading Score")

OR

ggplot(CurranDataLong, aes(x = Time, y = RAscore, group = id)) +

  geom_smooth(method = "loess", alpha = .5, se = FALSE) +

  geom_smooth(aes(group=1), color="black") +

  theme_bw()

 

(d) Examine the correlations of the reading scores across the four time points. (Explain what the R code is doing.)  How would you summarize their behavior? Explain why the pattern in the covariation/ correlation makes sense in this context.

cor(select(CurranDataWide, contains("reading")))     

 

(e) Use lme (not lmer) to fit a random intercepts longitudinal model (meaning, include Time). 

1. Tell me about the variable Time.

2. Report the intra-class correlation coefficient.

3. What assumptions does this model make about the correlations of the four observations on each person? Does that assumption seem reasonable here?

library(nlme)

summary(model1 <- lme(RAscore ~ Time, random = ~1 | id, data = CurranDataLong))

cov2cor(getVarCov(model1, type="marginal")[[1]])
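(One way to compute the ICC by hand; VarCorr() on an lme object stores the estimates as text, hence the as.numeric():)

vc = VarCorr(model1)
sig2.id = as.numeric(vc["(Intercept)", "Variance"])  #between-person variance
sig2.res = as.numeric(vc["Residual", "Variance"])    #within-person (Level 1) variance
sig2.id / (sig2.id + sig2.res)                       #intraclass correlation coefficient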

 

(f) Fit a random slopes model and examine the covariance matrix. 

model2 = lme(RAscore ~ Time, random = ~Time|id, data = CurranDataLong)

summary(model2)

getVarCov(model2, type="marginal")

cov2cor(getVarCov(model2, type="marginal")[[1]])

fits = fitted.values(model2)

ggplot(data = CurranDataLong, aes(y = fits, x = Time)) +

  geom_line(aes(group=id)) +

  theme_bw()

 

1. Verify the following values from the marginal covariance matrix (show me the numbers!):

·       var(y11) = variance of “first observation” for person 1,

·       var(y21) = variance of “second observation” for person 1 (which can differ because we are using random slopes),

·       cor(1,2) = correlation of the first two measurements on each person,

·       cor(1,3) = correlation of first and 3rd measurement on each person.

 

2.     How are the variances changing over time? How do the correlations change as the observations get further apart?

 

(g) Return to the basic two-level random intercepts model with Time and Time² as Level 1 variables, but without random slopes. Fit the AR(1) error structure.

model4 = lme(fixed = RAscore ~ Time + I(Time*Time), random = ~1| id, correlation = corAR1(), data = CurranDataLong)

summary(model4); intervals(model4)

cov2cor(getVarCov(model4, type="marginal")[[1]])

model5 = lme(fixed = RAscore ~ Time + I(Time*Time), random = ~1| id, data = CurranDataLong)

anova(model5, model4)

Is this model significantly better than the “independent errors” random intercepts model? 

 

(h) What is the estimated parameter of the AR(1) model (the “autocorrelation”)? How do you interpret it? Does there appear to be correlation between adjacent observations even after conditioning on student?

 

(i) How would we decide whether the pattern of growth (i.e., reading progress) differs between males and females? Explain the process, you don’t have to carry out.

 

4) Several research studies have explored the hypothesis that teachers’ expectations influence pupils’ intellectual development. In this exercise you will combine results across 19 different experiments.  In these studies, the treatment was whether or not teachers were encouraged to have “high” expectations. 

study  Author                 year  weeks  setting  tester   n1   n2  Cohen’s d  SE of diff
  1    Rosenthal et al.       1974    2    group    aware    77  339     0.03      0.125
  2    Conn et al.            1968   21    group    aware    60  198     0.12      0.147
  3    Jose & Cody            1971   19    group    aware    72   72    -0.14      0.167
  4    Pellegrini & Hicks     1972    0    group    aware    11   22     1.18      0.373
  5    Pellegrini & Hicks     1972    0    group    blind    11   22     0.26      0.369
  6    Evans & Rosenthal      1969    3    group    aware   129  348    -0.06      0.103
  7    Fielder et al.         1971   17    group    blind   110  636    -0.02      0.103
  8    Claiborn               1969   24    group    aware    26   99    -0.32      0.220
  9    Kester                 1969    0    group    aware    75   74     0.27      0.164
 10    Maxwell                1970    1    indiv    blind    32   32     0.80      0.251
 11    Carter                 1970    0    group    blind    22   22     0.54      0.302
 12    Flowers                1966    0    group    blind    43   38     0.18      0.223
 13    Keshock                1970    1    indiv    blind    24   24    -0.02      0.289
 14    Henrikson              1970    2    indiv    blind    19   32     0.23      0.290
 15    Fine                   1972   17    group    aware    80   79    -0.18      0.159
 16    Grieger                1970    5    group    blind    72   72    -0.06      0.167
 17    Rosenthal & Jacobson   1968    1    group    aware    65  255     0.30      0.139
 18    Fleming & Anttonen     1971    2    group    blind   233  224     0.07      0.094
 19    Ginsburg               1970    7    group    aware    65   67    -0.07      0.174

 

To compare different studies, you need to use a “standardized difference” or “effect size.” Cohen’s d is like the t-statistic but doesn’t use the sample sizes: d = (ȳ₁ − ȳ₂)/s_pooled. The above table reports the Cohen’s d values and their associated standard errors. Alas, it is rare for studies to provide access to the raw data, so we must rely on the reported effect sizes (and different packages in R) and convert each study to the same effect size measure, as was done here. This also allows us to use the reported SEs of the effects to measure the Level 1 residual variance (i.e., to estimate each study’s within-study sampling variance).
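(For instance, with hypothetical summary statistics, not taken from any study in the table:)

n1 = 60; n2 = 60; ybar1 = 105; ybar2 = 100; s.pooled = 15
d = (ybar1 - ybar2) / s.pooled                      #Cohen's d
sqrt((n1 + n2)/(n1*n2) + d^2/(2*(n1 + n2)))         #usual large-sample SE of d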

(a) Do the effect sizes appear similar across the studies?

So now we want to estimate the true effect of teacher expectations.

(b) Which 1 or 2 studies would you consider the most reliable?

(c) Use the metagen() function in the meta package to generate output for a meta-analysis of these studies.

bryk = read.table("https://www.rossmanchance.com/stat414/data/bryk.txt", header=TRUE)

bryk$se = sqrt(bryk$var)

bryk$weeks.b <- ifelse(bryk$weeks > 1, 1, 0) #we will use this variable later

install.packages("meta"); library(meta)

model1 <- metagen(TE = diff,

                 seTE = se,

                 studlab = study,

                 data = bryk,

                 sm = "SMD", #we are working with standardized mean differences

                 method.tau = "REML",

                 method.random.ci = "HK"

)

summary(model1)  #output below

1. The output first reports the weight (as a percentage) given to each study (for both fixed effects and random effects).  Which study is given the most weight? Is this what you expected? Explain.

2. Report the estimated “pooled effect size” treating the studies as random effects. Interpret the associated confidence interval in context.  Would this meta-analysis suggest that the true treatment effect differs from zero?

3. In this context, τ² represents the study-to-study variability in the true effect sizes.  In other words, this provides a measure of how homogeneous or heterogeneous the studies are.  Does the study-to-study variability in effect sizes appear to be statistically significant?  Justify your answer.

 

Relatedly, Cochran’s Q statistic is a weighted sum of squares comparing the individual study effects to the estimated common effect:

Q = Σᵢ wᵢ (dᵢ − d̂)²

where wᵢ = 1/SEᵢ² is the inverse of the study’s variance and d̂ is the pooled estimate from the fixed effects model.
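(A sketch of this calculation in R, assuming the bryk data frame loaded above, with columns diff and se:)

w = 1 / bryk$se^2                       #inverse-variance weights
d.fixed = sum(w * bryk$diff) / sum(w)   #fixed-effect (common) pooled estimate
Q = sum(w * (bryk$diff - d.fixed)^2)    #Cochran's Q
pchisq(Q, df = nrow(bryk) - 1, lower.tail = FALSE)  #p-value with df = k - 1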

(d) Show a calculation of Cochran’s Q “by hand.”

(e) If Q is large, this is evidence of “excess variation.” Under the assumption of a common effect, Q follows a chi-square distribution with df = k – 1, where k is the number of studies. What do you conclude from the R output?

(f) A common visualization in a meta-analysis is a forest plot:

meta::forest(model1) #output below

Provide a brief summary of how the “fixed effects” results and the “random effects” results differ.

 

So maybe we shouldn’t pool these studies together: perhaps small/negative effects are masking large positive effects. We can use many of our usual tools to check for outliers and influential studies. Can we explain some of the variability across studies?

 

The “expectancy treatment” involved deception: in some cases researchers presented teachers with a list of pupils who allegedly displayed potential for tremendous intellectual growth, and in others a list of students whose IQ scores had been inflated. (In reality, the student names on these lists were determined randomly.)  One question is whether the teachers really believed these claims (and the treatment was truly applied). This could be related to how well the teachers knew their students when they were given the information, which is encapsulated in the “weeks of prior contact” variable.

(g) “Update” the meta-analysis to separate the studies based on the “number of prior contact weeks”:

Recall weeks.b equals 1 if the number of weeks is larger than 1, 0 otherwise.

update(model1, subgroup= weeks.b, tau.common=FALSE)

1. In the Results for subgroups section, do the estimated effect sizes appear to differ substantially across the subgroups? 

2. What do you learn from the test of subgroup differences for the fixed effects (common effect) model? (Hint: What are the df and why?)

3. What do you learn from the test of subgroup differences for the random effects model?

 

(h) Summarize what you learn from the following graph (is it consistent with the discussion above?):

[Graph for part (h) included here.]


 

 

 

[Screenshots of the summary(model1) and subgroup output, and the forest plot, included here.]