Stat 414 – Review 1 Problems


The following are previous exam problems and application problems.  The exam this quarter will also involve some more “conceptual” problems as you have been seeing on the quizzes. I also expect interpretation of output I provide. You won’t be using R live on the exam but I could ask you questions about R commands.  You should assume all of the questions below have “Explain” after them. 


1) Knee injuries, like tears in the ACL (a ligament in the knee), can lead to trabecular bone loss and post-traumatic osteoarthritis, but can bone health improve over time? The output below relates to a study on mice where one knee of each of 36 mice had the ACL ruptured and then measurements were taken of the bone area mass in the knee for both the healthy knees and the injured knees over 56 days after the injury.

[Figure: two box and whisker charts]


(a) I have fit a rather complicated single-level nonlinear model to these data (using days and group as explanatory variables). Assess the validity of my model. Be very clear how you are evaluating each assumption:


[Figures: line chart, scatter chart, histogram, box and whisker chart]

Form of the model: Because the residuals vs. fits graph does not show any leftover pattern, the form of the model I used appears to be adequate.

Independence: We have repeated observations on the same mouse so independence is violated.

Normality: The normal probability plot looks reasonably linear, so the normality of the errors condition is met.

Equal variance:  The residuals vs. fits graph shows increasing variability in the residuals with increasing fitted values, indicating a violation of the equal error variance condition at each x (though not a severe one).
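If you want to practice reading these two plots, here is a minimal R sketch on simulated data. The data frame `sim` and its columns are made up for illustration (this is not the mouse data or my model); the errors are simulated with a standard deviation that grows with x so the residuals should fan out.

```r
# Simulated data (hypothetical): the error SD grows with x, so the spread
# of the residuals should increase with the fitted values.
set.seed(414)
sim <- data.frame(x = runif(200, 1, 10))
sim$y <- 2 + 3 * sim$x + rnorm(200, sd = 0.5 * sim$x)

fit <- lm(y ~ x, data = sim)

# Residuals vs. fits: check the form of the model (no leftover pattern)
# and equal variance (constant vertical spread).
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal probability plot: check normality of the errors.
qqnorm(resid(fit))
qqline(resid(fit))

# Numeric version of the fanning: residual SD for the lower vs. upper
# half of the fitted values.
upper_half <- fitted(fit) > median(fitted(fit))
sd_by_half <- tapply(resid(fit), upper_half, sd)
sd_by_half
```

The residual SD for the upper half of the fitted values should come out noticeably larger, which is the same thing the fanning in the residuals vs. fits plot is telling you.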


(b) Which of the following would you consider doing next to improve the validity of the model? Briefly justify your choice(s).

·       Transformation to improve linearity: No, model form was fine.

·       Quadratic model to improve linearity: No, model form was fine.

·       Transformation of response to improve normality: No, normality was fine.

·       Transformation of explanatory to improve normality: No, normality was fine.

·       Include days as a variance covariate: Yes, the variability in the residuals appears to increase with the number of days.

·       Include group as a variance covariate: No, the variability in the two treatment groups appears reasonably equal.

·       Multilevel model using mouse as a grouping variable (Level 2 units): Yes, this will allow us to model the repeat observations over time as well as on each mouse (two knees).
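As a rough sketch of the last option, here is a random-intercept model with mouse as the grouping variable, fit with `lme` from the nlme package. Everything about the data is an assumption: the measurement days, the effect sizes, and the column names are invented, and the mean structure is linear rather than the nonlinear model used above.

```r
library(nlme)  # recommended package that ships with standard R installs

# Simulated repeated measures (hypothetical): 36 mice, two knees each,
# measured on four made-up days.
set.seed(414)
n_mice <- 36
d <- expand.grid(days = c(7, 14, 28, 56),
                 knee = c("healthy", "injured"),
                 mouse = 1:n_mice)
d$mouse <- factor(d$mouse)

# Each mouse gets its own shift (SD 2), shared by all of its observations.
mouse_effect <- rnorm(n_mice, sd = 2)
d$y <- 20 + 0.05 * d$days - 3 * (d$knee == "injured") +
  mouse_effect[as.integer(d$mouse)] + rnorm(nrow(d), sd = 1)

# Mouse as the Level 2 unit: random intercept for each mouse.
ml_fit <- lme(y ~ days + knee, random = ~ 1 | mouse, data = d)
VarCorr(ml_fit)  # mouse-to-mouse SD and residual SD
```

The `weights` argument of `lme` accepts a variance function (for example `varExp(form = ~ days)`), which is one way to include days as a variance covariate in the same model.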



2) Here is another model for the FEV data

[Image: model output]

(a) Interpret the interaction between age and height in this context.

1) The positive “effect” of age on FEV is larger on average for taller individuals than for shorter individuals. In other words, taller individuals increase in FEV at a faster rate than shorter individuals.

2) The positive “effect” of height on FEV is larger for older individuals than for younger individuals.

You can pick either interpretation.  You could also give some numbers, like “when height = 0, the slope of age is 0.172, but when height = 10, the slope of age is 0.172 + 1.34 = 1.512.” Ideally you would choose values that are more meaningful for your data, but the point is that because the second variable is quantitative, rather than telling me the slope for each group, you just pick a few of its values.
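The "slope of age at a given height" idea can be written as a small R function. The value 0.172 comes from the example above; the interaction coefficient of 0.134 is my assumption, backed out from the "1.34 when height = 10" in that example, and the function name is made up.

```r
# Coefficients taken from the worked example above.
# b_age_height = 0.134 is an assumption inferred from 1.34 / 10.
b_age <- 0.172
b_age_height <- 0.134

# In a model with an age:height interaction, the slope of age
# depends on the value of height.
slope_of_age <- function(height) b_age + b_age_height * height

slope_of_age(c(0, 10))
```

Swapping in a few heights that actually occur in the data would give more meaningful slopes than height = 0.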


(b) How do you decide whether the interaction between age and height is statistically significant?

Because this corresponds to a single term in the model, we can use the p-value shown, “< 2e-16”, to decide that the interaction is highly significant in this model.


(c) How do you decide whether the association between age and height is statistically significant?

You would actually have to do a separate analysis that looks at the correlation coefficient (or slope) p-value from regressing one of these variables on the other.  That would tell you whether there was a significant linear association between the two variables.
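Here is a minimal sketch of that separate analysis on simulated data (the `age` and `height` vectors are invented, not the FEV data). It also shows that the correlation test and the slope test from regressing one variable on the other are the same test.

```r
# Simulated ages and heights (hypothetical values).
set.seed(2)
age <- round(runif(100, 3, 19))
height <- 55 + 0.3 * age + rnorm(100, sd = 5)

# Test of the correlation coefficient between the two variables.
ct <- cor.test(age, height)
ct$estimate
ct$p.value

# Same question asked through the slope of a simple regression.
slope_row <- summary(lm(height ~ age))$coefficients["age", ]
slope_row["t value"]   # matches ct$statistic
slope_row["Pr(>|t|)"]  # matches ct$p.value
```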


(d) Smoker doesn't appear to be very significant in the above model. Can I just remove it from the model?

While the p-value for smoker is large (0.5892), smoker is also involved in two interaction terms which are significant.  So removing smoker isn’t a question about a single coefficient; the large p-value only indicates that the intercepts are not significantly different after adjusting for the other terms in the model.


(e) State the null and alternative hypotheses for removing Smoker from the model. Is the p-value for this test in the above output?


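H0: the coefficients of all 3 terms involving smoker (smoker and its two interactions) are zero vs. Ha: at least one of these coefficients is not zero.  The p-value for this test is not in the above output.  We would have to carry out a partial F-test comparing the above model to the reduced model that does not include these 3 terms.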


(f) What do you learn from the output below?

[Image: output]

This is comparing the model with the smoker terms (3 of them, as discussed in e) to the model that only has age, height, and the interaction between them. The p-value is small (statistically significant), so at least one of the terms involving the smoking variable is significant to the model and we should not remove the smoker variable from the model.

Part of the reminder here is that a question like “can I remove smoker” is not as simple as just taking out the one smoker term.
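A sketch of how a partial F-test like this can be run in R, on simulated data. The data frame `fev`, the effect sizes, and the choice of smoker:age and smoker:height as the two interactions are all assumptions for illustration; the actual model above may differ.

```r
# Simulated data (hypothetical), loosely shaped like the FEV example.
set.seed(3)
n <- 300
fev <- data.frame(age = runif(n, 3, 19),
                  height = runif(n, 46, 74),
                  smoker = factor(sample(c("no", "yes"), n, replace = TRUE)))
fev$FEV <- with(fev, -5 + 0.05 * age + 0.1 * height + 0.002 * age * height +
                  ifelse(smoker == "yes", 0.1 - 0.05 * age, 0) +
                  rnorm(n, sd = 0.4))

# Full model: all 3 terms involving smoker.
full <- lm(FEV ~ age * height + smoker + smoker:age + smoker:height, data = fev)

# Reduced model: smoker removed everywhere.
reduced <- lm(FEV ~ age * height, data = fev)

# Partial F-test of H0: all 3 smoker coefficients are zero.
f_test <- anova(reduced, full)
f_test
```

The Df column of the second row should be 3, one for each smoker term being tested at once.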


3) Recall our Squid data

[Image: screenshot]

Squid$fMONTH = factor(Squid$MONTH) 
plot(Testisweight ~ fMONTH, data=Squid)


[Figure: boxplots of Testisweight by fMONTH]

(a) Why did we create fMONTH?

So that R would treat month as a categorical variable (11 terms) rather than a quantitative variable (one term, assuming a linear association).
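You can see the "11 terms vs. one term" difference by counting coefficients. This is a toy sketch: the data frame `toy` and its response are simulated, only the factor() step mirrors the Squid code.

```r
# Toy data (hypothetical): 12 months, 5 observations each.
set.seed(4)
toy <- data.frame(MONTH = rep(1:12, each = 5))
toy$y <- rnorm(nrow(toy))
toy$fMONTH <- factor(toy$MONTH)

# Month as a quantitative variable: one slope term.
n_terms_numeric <- length(coef(lm(y ~ MONTH, data = toy))) - 1

# Month as a categorical variable: 11 indicator terms.
n_terms_factor <- length(coef(lm(y ~ fMONTH, data = toy))) - 1

c(numeric = n_terms_numeric, factor = n_terms_factor)
```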


(b) Is there seasonality in the data? Does the variability in the response appear to vary by month? Identify 3 months where you think our predictions of Testisweight will be most accurate. Least?

We do see evidence in the boxplots that the median Testisweight varies noticeably across the months, suggesting seasonality.

We also see evidence in the boxplots that the box lengths (interquartile ranges) differ noticeably across the months, suggesting unequal variances in the Testis weights among the different months.


The graph below shows the predicted values for each month (along with standard errors).

[Figure: predicted values for each month with standard errors]

(c) If this model was fit with indicator coding and fMONTH = 1 as the reference group, is the coefficient of fMONTH2 positive or negative?

Because the predicted value for fMONTH1 is larger than the predicted value for fMONTH2, the coefficient of fMONTH2 will be negative.


(d) If this model was fit with effect coding, is the coefficient of fMONTH2 positive or negative?

Because the predicted value for fMONTH2 appears to be above the overall average, the coefficient of fMONTH2 is positive.


(e) If fMONTH1 is the missing category, will its coefficient be positive or negative?

Because the predicted value for fMONTH1 appears to be above the overall average, the coefficient of fMONTH1 is positive.
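Here is a small sketch comparing the two codings in R. The three groups and their means (11, 8, 2) are invented so that group 2 is below group 1 but above the overall average, like months 1 and 2 above. Note one difference from part (e): R's `contr.sum` treats the last level, not the first, as the missing category.

```r
# Toy data (hypothetical): group means are 11, 8, and 2; overall average is 7.
toy <- data.frame(g = factor(rep(c("1", "2", "3"), each = 4)),
                  y = c(10, 12, 11, 11,  7, 9, 8, 8,  1, 3, 2, 2))

# Indicator coding (R's default), group 1 as the reference:
# each coefficient is that group's mean minus the group 1 mean.
ind <- lm(y ~ g, data = toy)
coef(ind)

# Effect coding: each coefficient is that group's mean minus the
# overall average of the group means.
eff <- lm(y ~ g, data = toy, contrasts = list(g = "contr.sum"))
coef(eff)

# The missing category's coefficient is the negative of the sum of the others.
missing_coef <- -sum(coef(eff)[-1])
missing_coef
```

Group 2 gets a negative coefficient under indicator coding (8 is less than 11) but a positive coefficient under effect coding (8 is more than 7), and the missing group's coefficient is negative because its mean is below the overall average.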


These are the fitted lines for the model that includes the interaction between fMONTH and DML

[Figure: fitted lines for each month, in different colors]

(f) How many terms does including this interaction add to the model?

Multiplying the quantitative DML term with the 11 indicator terms for the 12 months will give us 11 interaction terms to add to the model.
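One way to check a count like this is to compare the number of columns in the design matrices with and without the interaction. The data frame `toy` is simulated (the DML values are made up); only the variable names follow the Squid example.

```r
# Toy data (hypothetical): 12 months, 4 observations each.
set.seed(5)
toy <- data.frame(fMONTH = factor(rep(1:12, each = 4)),
                  DML = runif(48, 100, 400))

# Design matrix without the interaction: intercept + 11 indicators + DML.
X_main <- model.matrix(~ fMONTH + DML, data = toy)

# Design matrix with the interaction: adds one DML product per indicator.
X_int <- model.matrix(~ fMONTH * DML, data = toy)

n_added <- ncol(X_int) - ncol(X_main)
n_added
```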


(g) Will the coefficient of fMONTH10*DML be positive or negative?

The purple line appears to have a larger slope than the average slope with DML for the 12 lines, and all slopes are positive, so I predict a positive coefficient on the interaction term.


But for addressing the unequal variance: We don't want to assume a "linear relationship" between the variability in the residuals and month number, so we will estimate the variance for each month. We can do that by finding the sample variance for each month.

(h) Which months do we want to 'downweight' in estimating the model?

Going back to the boxplots, we would like smaller weights on months 9 and 10 because they have the largest sample variances.
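A minimal sketch of this kind of weighting, using `lm` with weights equal to 1 over the sample variance for each month. The data are simulated with only four "months" and made-up SDs; this is one simple way to do it and not necessarily how the weighted model in this problem was fit.

```r
# Toy data (hypothetical): months 3 and 4 are much noisier than 1 and 2.
set.seed(6)
toy <- data.frame(fMONTH = factor(rep(1:4, each = 30)))
month_sd <- c(1, 1, 4, 5)
toy$y <- 10 + rnorm(120, sd = month_sd[as.integer(toy$fMONTH)])

# Sample variance for each month.
var_by_month <- tapply(toy$y, toy$fMONTH, var)

# Weight = 1 / variance, so high-variance months are downweighted.
toy$w <- as.numeric(1 / var_by_month[as.character(toy$fMONTH)])

wfit <- lm(y ~ fMONTH, data = toy, weights = w)

# One weight per month: the noisy months get the smallest weights.
w_by_month <- tapply(toy$w, toy$fMONTH, mean)
w_by_month
```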


(i) Conjecture what changes you would expect to see in the previous two graphs in this weighted regression model.

Now we are going to let the variances vary by month, so the graph of the fitted model would have much larger SEs for months 9 and 10.

Right now, we are essentially fitting a separate line for each month, so the weighted regression is really only expected to impact the standard errors. The interaction plot above should look largely the same.

[Figure: graph with different colored lines]


(j) How do you expect the residual standard error to change?

We expect months 7 and 8 to have pretty small σ values, and then the other months will have multipliers of σ based on their larger month SDs.  The small σ corresponds to a smaller residual standard error.


4) Trinh and Ameri (2018) collected data on 1,561 Airbnb listings in Chicago from August 2016, and then they merged in information from the neighborhood (out of 43 neighborhoods in Chicago) where the listing was located. Some of the variables included

·        price = price for one night (in dollars)

·        overall_satisfaction = rating on a 0-5 scale

·        room_type = Entire home/apt, Private room, or Shared room

·        neighborhood = neighborhood where unit is located (1 of 43)


(a) Identify the Level 1 units and the Level 2 units

Level 1 = Airbnb listing

Level 2 = neighborhood


Consider the following output (indicator parameterization was used for room type)

Fixed effects:
                     Estimate Std. Error t value
(Intercept)            25.353     26.454   0.958
overall_satisfaction   24.919      5.508   4.524
room_typePrivateroom  -82.739      3.831 -21.598
room_typeSharedroom  -105.875     10.960  -9.660


> anova(model1)
Analysis of Variance Table
                     Df  Sum Sq Mean Sq  F value
overall_satisfaction  1   41558   41558   8.0542
room_type             2 2593431 1296715 251.3102


(b) Is the type of room statistically significant?  State the null and alternative hypothesis in terms of regression parameters, and clearly justify your answer.

We want to test H0: β_private = β_shared = 0 (no room type effect) vs. Ha: at least one of these β ≠ 0, after adjusting for overall satisfaction (and neighborhood).  To test these two coefficients simultaneously, we need a partial F-test. The corresponding F value in the output is 251.31.  This is considered quite large (e.g., larger than 4) and would lead us to reject the null hypothesis.  We conclude that after adjusting for overall satisfaction, there is a significant room type effect (averaging across the neighborhoods).
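If you want to see where that F value comes from, here is a short arithmetic sketch in R using only the numbers in the anova table. The denominator degrees of freedom (1561 listings minus 4 fixed-effect coefficients) is my assumption for a rough p-value; the output itself does not report one.

```r
# Values copied from the anova table above.
ms_room <- 1296715      # Mean Sq for room_type
f_room <- 251.3102      # F value for room_type
ms_satisfaction <- 41558

# F = Mean Sq / residual mean square, so the implied residual mean square is:
ms_resid <- ms_room / f_room
ms_resid

# Cross-check: this should reproduce the F value for overall_satisfaction.
f_satisfaction <- ms_satisfaction / ms_resid
f_satisfaction

# Rough p-value for room_type. df2 = 1561 - 4 is an assumption,
# not something shown in the output.
p_room <- pf(f_room, df1 = 2, df2 = 1561 - 4, lower.tail = FALSE)
p_room
```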


(c) Suppose the model with the interaction was as shown below. Describe the nature of the interaction in this context.


[Image: model output]

Keep in mind: this model used indicator coding for room type.  One approach is to look at the three slopes of satisfaction: 33.36 for entire home/apt, 33.36 – 17.54 = 15.82 for private rooms, and 33.36 – 39.99 = –6.63 for shared rooms.  So the “effect” of satisfaction on price is largest for the entire home/apt rentals and lowest, and in fact slightly negative, for the shared rooms.  Price increases with satisfaction rating but at a much lower rate for private rooms compared to entire homes, and it decreases slightly with satisfaction for shared rooms.


5) Consider the following two models for predicting language scores for 9 different schools.  IQ_verb is the student’s performance on a test of verbal IQ.

[Images: two screenshots of model output]

Which model demonstrates more school-to-school variability in language scores?

On average, the slope coefficients are larger in magnitude for the model including IQ_verb.  It’s counterintuitive, but we will see that after adjusting for IQ_verb, there is actually more school-to-school variation.  The main cause is that the within-school and between-school relationships are not consistent: schools with lower language scores tended to have higher IQ_verb scores, so after adjusting for IQ_verb, the “additional contribution” to match the school means is larger.