Stat 414 – Review 2 problems
1) Consider this
paragraph: The multilevel models we have considered up to this point control
for clustering, and allow us to quantify the extent of dependency
and to investigate whether the effects of level 1 variables vary across
these clusters.
(a) I have underlined 3 components, explain in detail what
each of these components means in the multilevel model.
(b) The multilevel model in the paragraph does not account
for “contextual effects.” What is meant by that?
(c) Give
a short rule in your own words describing when an interpretation of an
estimated coefficient should “hold constant” another covariate or “set to 0”
that covariate
2) The article you read for HW 5 had the following: “application of multilevel models for
clustered data has attractive features: (a) the correction of underestimation
of standard errors, (b) the examination of the cross-level interaction, (c) the
elimination of concerns about aggregation bias, and (d) the estimation of the
variability of coefficients at the cluster level.
Explain each of these components to a non-statistician.
3) Large scale
alteration (e.g., destruction) of native prairie communities has been
associated with numerous problems (e.g., soil erosion, lack of biodiversity of
plants, increase in atmospheric CO2). This has led to an increase in
prairie reconstruction projects, but there has been a lot of variability in the
success of these projects, even those using the same seed combinations and
dispersal techniques in different years.
A 3x2x2 factorial design was conducted to investigate the impact of soil
type (remnant, cultivated, restored), sterilization (yes or no), and species (leadplant and cornflower) on the height on
germinating plants. Each of the 12
treatments was replicated in 6 pots, for a total of 72 pots. Six seeds were
planted in each pot. (OK, a few pots had
more than six plants, probably because two of the microscopically small seeds
stuck together when planted.) Measurements on each plant in each pot were taken
at 13, 18, 23, and 28 days after planting. Plants that did not germinate are
removed from the analysis (so we will restrict our study conclusions to plants
that germinate!). Not all plants survived to the end of the 28th
day.
(a) Identify the three-levels in this study.
Also identify any Level 2 or Level 3 variables in the study.
Step 0: First in Excel
I sorted by column H (hgt13) which put most of the non-germinating plants at
the end. I did find one row (plant 165) where the gemin value was
incorrect. Remember to “get to know your data”! So then
I subsetted (e.g., used Excel’s Filter) by removing all
the germin=N plants. I then loaded these data into R (283 rows), telling R
that that na.strings were NA.
(b) Examine spaghetti plots of the plant
heights across the measurements for each of the species (coneflower and
leadplant). Is it reasonable to assume
linear growth between Day 13 and Day 28?
Does the initial height and/or rate of growth seem to differ between the
species? Is there more variability in
one species than the other?
(c) Examine spaghetti plots of the plant
heights over time separately for the three types of soil, separately for each
species. What do you learn?
(d) Examine spaghetti plots of the plant
heights over time separately for the two levels of sterilization, separately
for each species. What do you learn?
Focusing on just the leadplants
(e) I next calculated time13 = time – time
13. Give two reasons this could be a
good idea.
(f) Then I fit an “unconditional means” or
“random intercepts” model with no predictors.
How many parameters are estimated? Provide
an interpretation of each, including the variance components. Anything
interesting about the relative size of the variance components?
(g) Next I included the new time variable in
the model assuming linear growth.
Explain what (time13|pot/plant) means to the
model. Write out the theoretical level equations (in terms of ’s and being careful with indices). How many variance/covariance
parameters are there/why?
How much of the within-plant variability is
explained by the linear changes over time?
Interpret the fixed effects. Are either of
the fixed effects statistically significant?
(h) Next I added the sterilization and soil
type variables, including interactions with the time variable.
Why did I include interactions with the time
variable? Is this model a significant improvement from the model in (g)?
(i) But this model in running into some
boundary conditions. One option is to simplify the model, e.g., removing some
variance/covariance components. Write
out the model equations, for a new model so that the intercepts have random
components at Levels 2 and 3 but the slopes are only allowed to vary at level
2. What is
the practical interpretation of this modelling choice? How many parameters does this remove from the model?
[If you check, this model should be more
stable, and not significantly worse.]
(j) Next we could consider adding an
interaction between sterilization and soil type to the model, along with the
three-way interaction between sterilization, soil type, and time.
How many parameters does this add? Interpret the nature of the
three-way interactions. Explain what type of visual would help you assess the
evidence of such an interaction.
(k) How would you change the previous model so that neither
sterilization or soil type (or their interaction) are allowed to influence Day
13 measurements? Why might this be a
reasonable consideration?
(l) Return to the fitted model in (i).
Interpret it! (A brief summary of the important features, especially as the
agree/disagree with your exploratory data analysis. What would the “effects
plots” look like? What seems to maximize growth?!)