Stat 301 – Final Review
Optional Review Sessions: Sunday 5pm (Zoom), Tuesday 4pm (Zoom)
Final Exam:
12pm section Monday 10:10-1pm, 1pm section Wednesday 1:10-4pm
Format: You are
allowed 3 pages of notes. If I want you to use a datafile, I will post it as a
.txt file for easy loading. The computer
questions won’t be a huge component, but I do want some demonstration that you
can carry out an analysis.
The exam will be more of a “performance assessment.” Question types could include
-
Design a study to answer a
particular research question
-
Explore data and how they agree/don’t agree with a
particular research conjecture, suggest steps for “cleaning” the data or next
steps in the analysis
-
Use R and/or an applet to analyze data to answer a particular research question
-
Summarize conclusions to a study based on study
description, provided output, justifying why you can or cannot do certain
things (e.g., significance, estimation, generalize, causation)
-
Develop methodology for a new type of comparison or
statistic
You will be expected to apply the knowledge you have gained this
quarter. Because this may involve more
writing, you will be asked to type in your responses, but can also write them
out if you prefer. You will have access
to R and applets, but if you aren’t sure how to carry out a step in either software,
I will likely be able to help you (e.g., I want to do x, but it’s not working).
It will be important for you to document any steps you use the computer for
(e.g., tell me what R command or what applet you used, what inputs you used,
etc.). You will have the option to
screen capture and upload output.
The final exam
will be cumulative, including a comparison of methods across chapters. You
should focus on the entire statistical process: How do we design a study to
achieve particular research goals? How do we describe the data we have? How do
we test claims about population parameters or processes and/or estimate
parameters? How do we state our final
conclusions, especially after considering how the study was conducted? You should also think about the reasoning
behind the statistical methods, such as standardizing, significance, and
confidence in general.
Advice:
Understand and be able
to apply the procedures first, then worry about more subtle issues and review
the how and why behind the development of the procedure. Also, for all the procedures, know what must
be done by hand and what can be done on the computer. Review the Case Study
(trying the questions from scratch) as well as the “final exam multiple choice
practice” in Canvas.
The focus of Chapter 4 was a quantitative response variable.
From Section 4.1 you should know that
·
You can create an “exact randomization”
distribution if you list all possible random assignments to the two groups,
identify an appropriate statistic (formula) and calculate the statistic for
each possible random assignment, count how many of the random assignments give
a statistic at least as extreme as the one you observed, and
evaluate the p-value.
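As a concrete illustration of those steps, here is a minimal sketch in Python (the course itself uses R and applets, so treat this only as a way to see the logic). The six response values, the group sizes, and the function name are made up for illustration, and the statistic is assumed to be the difference in group means.

```python
from itertools import combinations
from statistics import mean

def exact_randomization_pvalue(group_a, group_b):
    """Two-sided exact randomization p-value for the difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    n_assignments = 0
    n_extreme = 0
    # List every possible way to assign n_a of the responses to group A.
    for idx in combinations(range(len(pooled)), n_a):
        in_a = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in in_a]
        stat = mean(a) - mean(b)
        n_assignments += 1
        # Count assignments at least as extreme as the observed statistic
        # (small tolerance guards against floating-point ties).
        if abs(stat) >= abs(observed) - 1e-12:
            n_extreme += 1
    return observed, n_extreme, n_assignments, n_extreme / n_assignments

# Hypothetical data, for illustration only.
treatment = [12, 15, 14]
control = [8, 9, 10]
observed, n_extreme, n_assignments, p_value = exact_randomization_pvalue(treatment, control)
print(observed, n_extreme, n_assignments, p_value)
```

With three units per group there are only 20 possible random assignments, which is why listing them all is feasible here; with realistic sample sizes the count grows too quickly and we simulate instead.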
From Section 4.2 you should know how to
·
Create numerical and graphical summaries comparing two groups with
a quantitative variable
o Compare distributions in terms of shape, center, and spread (citing
appropriate numerical evidence when available), unusual observations
·
Including what is meant by “variability” in a distribution and
different ways it can be measured
·
Possibly explain outliers (know how they can be identified)
o Make sure your comments are
always in context (including measurement units)
·
Predict the behavior of the sampling distribution of the
differences in two sample means from random sampling
(Sampling
from two populations applet;
see also population models version?) Including whether the sampling
distribution of the differences in sample means should be normal or
approximately normal
o
Reasoning behind the standard error formula (adding variances); it should make sense to you that the SE of the
difference is larger than either individual standard error
o
Reasoning behind the t-distribution
for the standardized statistic
·
Carry out a two-sample t-test
for the difference in population means using
technology (R, TBI applet)
o State null and alternative
hypotheses
o Assess validity of the
procedure
o Raw data vs. Summary
statistics
o Interpret results, including interpretations of test statistic and p-value (in
terms of random sampling)
·
Factors that influence p-value and confidence interval midpoint
and width
·
Determine and interpret a two-sample t-confidence interval
o Make sure confidence level,
mean, variable (context), and “direction” are clear in interpretation
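To see the "adding variances" idea and the standardized statistic with numbers, here is a small sketch in Python (not the course software). The summary statistics are hypothetical, the function name is my own, the unpooled standard error is assumed, and the interval uses 2 as a rough 95% multiplier rather than an exact t* value.

```python
from math import sqrt

def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """SE of the difference, t-statistic, and a rough 95% CI from summary statistics."""
    # Standard error of each sample mean on its own.
    se1 = sd1 / sqrt(n1)
    se2 = sd2 / sqrt(n2)
    # Variances add, so the SE of the difference is sqrt(se1^2 + se2^2).
    se_diff = sqrt(se1 ** 2 + se2 ** 2)
    diff = mean1 - mean2  # direction of subtraction: sample 1 minus sample 2
    # General form: (statistic - hypothesized) / (standard error of statistic).
    t_stat = (diff - hypothesized_diff) / se_diff
    # Rough 95% interval using 2 as the approximate multiplier.
    ci = (diff - 2 * se_diff, diff + 2 * se_diff)
    return se1, se2, se_diff, t_stat, ci

# Hypothetical summary statistics, for illustration only.
se1, se2, se_diff, t_stat, ci = two_sample_summary(10.0, 3.0, 36, 8.0, 4.0, 64)
print(se1, se2, se_diff, t_stat, ci)
```

Here each sample mean has SE 0.5, while the difference has SE of about 0.71: larger than either one, but smaller than their sum.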
From
Section 4.3 you should be able to
·
Set up and explain the reasoning process behind simulating a
randomization test (Comparing Groups – Quantitative applet)
o How the simulation models the null
hypothesis being true/random assignment in the study
o How to simulate the null
distribution (random assignment vs. random sampling)
o How to use the generated
null distribution to calculate an empirical p-value (one or two-sided)
o Provide a detailed
interpretation of what the p-value is measuring in the context of the research
study (and in terms of random assignment)
o Make a decision about the
null hypothesis based on the p-value
o Be able to do this for
difference in sample means, difference in sample medians, or other sample statistics
·
Summarize (and justify) study conclusions in terms of
significance, estimation (=confidence interval), causation, and
generalizability
·
Check the technical conditions to assess the validity of the t procedures
·
Carry out two-sample t-tests
for comparing two long-run treatment means using technology (R, TBI applet)
o Consider alternatives if
technical conditions are not met (e.g., simulation, transformations)
o Be able to test hypothesized
differences other than zero
·
Calculate (using R, TBI applet) and interpret two-sample t-confidence intervals for the
difference in long-run treatment means
o Explore a data
transformation for improving the validity of a t-procedure
o Interpret a confidence
interval after a log transformation in terms of the original units
(multiplicative change in median)
·
Explain the effects of the difference in sample means, sample
sizes, and within group variability on the p-value, confidence interval, and power
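The shuffling logic behind the randomization test can be sketched as follows. This is an illustration in Python, not the applet's actual implementation; the data values, the function name, the number of repetitions, and the seed are all assumptions made for the example.

```python
import random
from statistics import mean

def randomization_test(group_a, group_b, reps=10000, seed=301):
    """Empirical two-sided p-value for the difference in means via re-randomization."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)

    count = 0
    for _ in range(reps):
        # Under the null hypothesis the treatment has no effect, so each response
        # would be the same no matter which group the unit was assigned to.
        rng.shuffle(pooled)
        stat = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Two-sided: at least as far from zero as the observed difference.
        # For a one-sided test, compare stat >= observed (or <=) instead.
        if abs(stat) >= abs(observed) - 1e-12:
            count += 1
    return observed, count / reps

# Hypothetical responses, for illustration only.
group_a = [25, 30, 28, 35, 32]
group_b = [20, 22, 27, 24, 19]
observed, p_value = randomization_test(group_a, group_b)
print(observed, p_value)
```

Swapping `mean` for `median` (also in the `statistics` module) gives the same test for a difference in medians, which is one reason the simulation approach is flexible.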
From
Section 4.4 you should be able to:
·
Distinguish
between matched pairs designs and two-independent “samples” (or completely
randomized) designs
o Consider benefits and disadvantages
(including feasibility) of the two types of designs
o Be able to describe how to set up a
matched-pairs design, including randomizing the order of the treatments when
each unit is observed repeatedly, or randomizing the treatments within each pair of similar experimental units
·
Determine
whether a data collection plan necessitates a matched pairs
analysis
·
Create
numerical and graphical summaries comparing matched pairs data by calculating
and examining the differences
·
Carry
out and interpret a matched pairs test simulation
o Logic behind it (exchangeability of the
two responses within each pair)
o Interpreting the results
·
Use
technology (R, TBI applet on the differences, Matched Pairs applet) to carry
out and interpret a matched pairs t-test
·
Use
technology (R, TBI applet, Matched Pairs applet) to calculate and interpret a
matched pairs t-interval
·
Use
technology to carry out and interpret a sign test with paired data
o Exact binomial and/or normal
approximation
·
Set
up a two-way table with paired categorical data (the two treatments are the two
variables = row/column variables)
·
Use
technology (R, One Proportion applet) to carry out and interpret McNemar’s test
o Exact binomial and/or normal
approximation
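The sign test and the exact version of McNemar’s test both reduce to a binomial calculation with success probability 0.5. A minimal sketch in Python follows; the function names and all data values are hypothetical, and the two-sided p-value is taken as twice the smaller tail, which is one common convention.

```python
from math import comb

def binomial_two_sided_half(k, n):
    """Two-sided exact p-value for k successes in n trials when pi = 0.5."""
    upper = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n   # P(X >= k)
    lower = sum(comb(n, j) for j in range(0, k + 1)) / 2 ** n   # P(X <= k)
    return min(1.0, 2 * min(upper, lower))

def sign_test(differences):
    """Sign test on paired differences; zero differences are dropped."""
    nonzero = [d for d in differences if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    return k, n, binomial_two_sided_half(k, n)

def mcnemar_exact(b, c):
    """Exact McNemar test using only the two discordant cell counts b and c."""
    return binomial_two_sided_half(b, b + c)

# Hypothetical paired differences (treatment 1 minus treatment 2).
k, n, p_sign = sign_test([2, 5, -1, 3, 4, 0, 6, 1, 2, 3])
# Hypothetical discordant counts from a paired two-way table.
p_mcnemar = mcnemar_exact(12, 3)
print(k, n, p_sign, p_mcnemar)
```

Notice that the pairs with identical responses (a zero difference, or the concordant cells of the table) carry no information about which treatment is better, so they drop out of the calculation.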
Other Notes
·
Be able
to distinguish the standard deviation of the sample, the pooled standard
deviation for two samples (combining the groups together to estimate one SD),
the standard deviation of the distribution of the difference in sample
means (the pooled and unpooled versions), and the
version with the “correlation” between the two sets of measurements
·
Keep
in mind that a confidence interval doesn’t really give you “additional
evidence” (against the null hypothesis) but it’s a different way of presenting
the same evidence
o
Keep
in mind that we never can use the procedures in this course to establish
evidence for the null hypothesis (“absence
of evidence is not evidence of absence”)
·
We
can easily test non-zero values for the hypothesized difference in means
o
How
to do this with a t-test
o
How
to represent this in a simulation (e.g., need to remove the treatment effect,
then groups should be equivalent so shuffle, and then add the treatment effect
back)
·
Be
able to distinguish/explain bootstrapping vs. sampling from a finite
(hypothetical) population
o
Key
goal is estimating the sample to sample variation in
the statistic
o
Can
compare to theoretical results, but the latter are only available for certain statistics
·
Can
use bootstrapping to compare two samples
o
Bootstrapping
from each sample separately vs. pooling together the samples first
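To make the bootstrapping idea concrete, here is a sketch in Python that resamples each sample separately (with replacement) to estimate the sample-to-sample variation in a difference of two statistics. The data, function name, number of resamples, and seed are assumptions for illustration only.

```python
import random
from math import sqrt
from statistics import mean, median, stdev, variance

def bootstrap_se_diff(sample1, sample2, stat=mean, reps=5000, seed=301):
    """Bootstrap estimate of the SE of stat(sample1) - stat(sample2)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Resample each sample separately, with replacement, keeping its size.
        resample1 = rng.choices(sample1, k=len(sample1))
        resample2 = rng.choices(sample2, k=len(sample2))
        diffs.append(stat(resample1) - stat(resample2))
    # The SD of the bootstrap statistics estimates the standard error.
    return stdev(diffs)

# Hypothetical samples, for illustration only.
sample1 = [25, 30, 28, 35, 32, 27, 31, 29]
sample2 = [20, 22, 27, 24, 19, 23, 26, 21]

boot_se_means = bootstrap_se_diff(sample1, sample2, stat=mean)
# Theory-based (unpooled) SE for the difference in means, for comparison.
theory_se_means = sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))
# No simple formula exists for medians, but the bootstrap still works.
boot_se_medians = bootstrap_se_diff(sample1, sample2, stat=median)
print(boot_se_means, theory_se_means, boot_se_medians)
```

For the difference in means, the bootstrap value should land close to (typically a bit below) the formula-based SE. Pooling the samples before resampling would instead model a null hypothesis of identical populations, which suits testing rather than estimation.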
Keep in mind
·
When
comparing distributions, remember to cite your evidence if you think there is a
difference in the groups. In particular, tell me what
you see in the summary statistics (e.g., a higher mean) that leads to your
conclusion (e.g., sleep deprived subjects tend to have lower improvements)
o
Remember
that the confidence level refers to
the reliability of the method – how often, in the long run, random samples will
produce an interval that succeeds in capturing the population parameter
·
Remember
to think about/decipher the direction of subtraction used by the technology
·
Why
is variability in the data an important consideration and how can we reduce it?
What are advantages and disadvantages to doing so?
·
We
can use a two-sample t-test even when
the sample sizes are small if we have reason to believe the population
distributions are themselves normally distributed. You can try to judge this,
especially if you don’t have past experience with the
variable, based on graphs of the sample data.
If the sample data looks plausibly normally distributed (normal
probability plots are a useful tool for helping this judgment), you can cite
this as evidence that the population distribution is normally distributed. If
you aren’t sure, then use an alternative analysis (e.g., simulation-based)
instead.
·
Keep
in mind the two-sample t-test only
compares the two means (vs. other aspects of the distributions or other
measures of center)
·
Try
to avoid the word “accurate” without explaining exactly what you mean by it.
·
Try
to avoid use of the word “group” but clarify if you mean the sample or the
population or the treatment in general
·
Avoid
use of the word “it”
The Cumulative Component (also see old
Review handouts)
Things to
remember include:
·
Identifying
observational units and defining variables, samples vs. populations vs. null
(sampling/randomization) distributions, parameters vs. statistics, explanatory
vs. response variable, bias vs. precision, random assignment vs. random
sampling (including goals)
·
Experiments
vs. Observational Studies
o How to design a randomized experiment, How to properly select a sample
o Scope of conclusions depending on how
study was conducted (Can you draw a cause and effect
conclusion? Can you generalize to a larger population?)
o Sampling errors, nonsampling
errors, and random sampling errors (and which of these are measured by the
“margin of error”?)
·
Describing
and comparing distributions of data
o Categorical: segmented bar graphs,
conditional percentages, difference in proportions vs. relative risk vs. odds
ratio (and how to interpret)
o Quantitative: shape, center, and spread,
stemplots, boxplots, histograms, dotplots,
resistance of median and IQR
o When describing distributions, if you
have access to numerical summaries, use them to support your claims
·
How
to interpret probability as a long-run relative frequency
·
How
to carry out a test of significance
o About a population proportion and/or
population mean and/or treatment effect and/or difference in population
proportions and/or difference in population means
·
Make
sure you can state Ho and Ha in symbols and in words
o One-sided vs. two-sided alternatives
o Which technical conditions apply and how
to check them and what they tell you
·
e.g.,
proportions: nπ > 10 and n(1 − π) > 10; means: n > 30 or normal population
o Interpretation of test statistic (if
appropriate)
·
General
form: (statistic-hypothesized)/(standard error of
statistic)
o Ideas and distinctions of sampling
distribution and randomization distribution
o How to calculate and/or approximate
p-value
o How to make a decision
based on the p-value and level of significance
o How to interpret the p-value
·
Source
of randomness, choice of statistic, observed result, direction, null hypothesis
o Factors that affect the size of the
p-value
o Defining (and stating the consequences
of) Type I and Type II Errors in context (including direction of Ha)
o How to determine the probabilities of a
Type I Error and of a Type II Error and Power
·
Type
II/Power is for a particular instance of alternative hypothesis
o Explain the
idea of Power and what things might affect the power
of a study/analysis (e.g., sample size(s), type of study design)
·
How
to calculate and interpret a confidence interval
o General form: observed statistic ±
(critical value)×(standard error of statistic)
·
Note,
sometimes we think of “statistic” as a single number and sometimes as a formula…
o Interpretation: level, parameter,
context (with differences/ratios, include “direction”)
·
Clarify
larger population/process
o Interpret confidence “level” (separate
from interpreting interval)
o How to solve for the sample size
necessary to obtain a specific margin of error for a stated confidence level
·
Duality
between intervals and tests: Any parameter value not contained in a C% CI will
be rejected by a two-sided test at the
(100 − C)/100 significance level
·
Describe
the difference between statistical significance and practical significance
(e.g., is it a meaningful difference in context)
·
Calculating
p-values for Fisher’s Exact Test and/or binomial process (when ok to do)
·
How
to decide which procedure you should use (quantitative or categorical data, one
or two populations, Fisher’s Exact Test vs. binomial vs. normal vs. t)
·
For
validity of “theory-based” procedures, I tend to worry less about the
randomness condition and more about the sample size condition. The randomness condition is more important to
scope of conclusions.
·
Be
able to get t* and z* values using
technology; use 2 as the approximate multiplier with
95% confidence
·
Make
sure you know where ()s go in prediction interval
standard error
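Several of the cumulative items above (general form of the test statistic, general form of the interval, z* from technology, solving for sample size) can be seen together in one short sketch for a single proportion. This is Python rather than R, the counts are hypothetical, the function names are my own, and the interval is the simple "statistic ± z* × SE" form.

```python
from math import sqrt, ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def one_proportion_z(successes, n, pi0, conf_level=0.95):
    """One-proportion z-test (two-sided) and z-interval."""
    p_hat = successes / n
    # Test: standard error computed assuming the null hypothesis is true.
    se_null = sqrt(pi0 * (1 - pi0) / n)
    z = (p_hat - pi0) / se_null            # (statistic - hypothesized) / SE
    p_value = 2 * (1 - Z.cdf(abs(z)))
    # Interval: statistic +/- (critical value) x (standard error of statistic).
    z_star = Z.inv_cdf(1 - (1 - conf_level) / 2)
    se_hat = sqrt(p_hat * (1 - p_hat) / n)
    ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)
    return z, p_value, z_star, ci

def sample_size_for_margin(margin, conf_level=0.95, p_guess=0.5):
    """Smallest n giving at most the stated margin of error for a proportion."""
    z_star = Z.inv_cdf(1 - (1 - conf_level) / 2)
    return ceil(z_star ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# Hypothetical study: 60 successes in 100 trials, testing pi = 0.5.
z, p_value, z_star, ci = one_proportion_z(60, 100, 0.5)
n_needed = sample_size_for_margin(0.03)
print(z, p_value, z_star, ci, n_needed)
```

In this example the two-sided p-value is just under 0.05 and the 95% interval just excludes 0.5, which is the duality between intervals and tests in action (approximately, since the test and the interval use slightly different standard errors).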
See summary tables (including on
technology) and end of chapter examples
Some big
picture stuff
What is Statistical Inference?
The population parameter and the sample statistic summarize the same
variable. The population parameter summarizes the variable for the population,
which is what we want to know, e.g., μ or μ1 − μ2. However, we can’t observe the
whole population, so we don’t know what the parameter value really is. What we
can do is measure the variable on a sample or on randomized groups and compute a
sample statistic, e.g., x̄ or x̄1 − x̄2. The question is what can we infer about
the parameter based on this statistic? Because these statistics follow a regular
pattern (a null or sampling distribution) due to the randomness in the study
design, we can estimate or calculate probabilities of different values of the
statistic occurring. Different statistics follow different
distributions, but once we know which distribution we should use, we can make
conclusions about the value of the parameter, e.g., it’s in some interval or we
have evidence that it is not a particular value.
Null Distributions
If we specify a
value for the population parameter, we can take (or simulate) lots of samples
from this population or lots of random assignments and calculate a statistic
for each sample/random assignment. This
allows us to examine the behavior of the statistic so we can discuss the shape,
center, and variability of this “null” distribution. For example, what types of values do we
expect the statistic to have, how far away might the statistic stray from the
hypothesized value of the parameter?
Simulation vs. “Large sample”
(Theory-Based) procedures
In almost every
case we have seen two different ways to approximate the p-value: simulation and
a mathematical model. We are considering the simulation approaches to always be
valid. The mathematical models are only appropriate if the “validity
conditions” of the theory-based approach are met. The advantage of the mathematical models is
we can easily get a confidence interval as well. So you may want to
consider the mathematical model way first but then if the validity conditions
aren’t met, use the simulation approach.
Confidence Intervals Estimate
population parameter
The goal of a
confidence interval is to get a range of plausible values that we think the
population parameter could be equal to.
To do this, we use the sample statistic and a measure of the sampling
(or shuffle to shuffle) variability of the sample statistic. This lets us form an interval around the
sample statistic that should contain the population parameter. Note, we are trying to contain the population
parameter in the interval, not the data and not the sample statistic. In fact,
the sample statistic better be the midpoint (center) of the interval.
Tests of Significance Test
claim about population parameter
The goal of a
test of significance is to make a decision about the
population parameter. Here are the steps
we use:
1) Define the
parameter(s) of interest. (Should also
be able to define the OUs and variable)
2) Specify the
hypotheses (e.g., H0: π = 1/3, μ = 50, μ1 − μ2 = 0, or no relationship between variable 1 and variable 2 in the population)
Always in terms of the population (parameters)
because that’s what is unknown and what we are trying to make statements about (take off the hats!)
The null hypothesis is the “dull
hypothesis” or the “ho-hum hypothesis”
The alternative hypothesis specifies
something interesting (“a-ha!”)
One
or two-sided (decide based on wording of research question)
3) Check the
validity conditions, sketch the null distribution of the (test) statistic
assuming H0 is true, and identify the appropriate test procedure by
name
If
the validity conditions are not met, use a randomization-based (simulation)
method instead
4) Compare the
data observed in the sample to what’s “expected” from H0. Find the
p-value=probability of observing a value of the statistic as extreme or more
extreme when H0 is true.
Know how to get the computer to give
you the appropriate one or two-sided p-value
5) Draw a
conclusion in context
Decide
to reject or fail to reject Ho
Reject if p-value ≤ α (the level of significance), synonymous with saying the result is
“statistically significant”
Make
conclusion about research question of interest (back to English)
If we
repeatedly took different samples or random shuffles and calculated the value
of the statistic for each sample, the p-value indicates how often we would
expect to see the statistic value that we actually did
observe, or one more extreme, when Ho is true. If the statistic value is very unlikely (so
small p-value) we stop believing H0 (recall the loaded dice
example). We can compare to the significance
level as a benchmark to decide whether the p-value is “too small.”
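The "repeatedly took different samples" interpretation can be checked directly by simulation. The sketch below, in Python, assumes a hypothetical study with 15 successes in 30 trials and a null value of 1/3, and compares the simulated one-sided p-value with the exact binomial probability; the function names, repetitions, and seed are illustrative choices.

```python
import random
from math import comb

def simulated_pvalue(observed_successes, n, pi0, reps=10000, seed=301):
    """Proportion of simulated samples with at least the observed number of successes."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        # One simulated sample of n observations from a process where H0 is true.
        successes = sum(1 for _trial in range(n) if rng.random() < pi0)
        if successes >= observed_successes:
            count += 1
    return count / reps

def exact_pvalue(observed_successes, n, pi0):
    """Exact binomial P(X >= observed) when H0 is true."""
    return sum(comb(n, k) * pi0 ** k * (1 - pi0) ** (n - k)
               for k in range(observed_successes, n + 1))

# Hypothetical result: 15 successes in 30 trials, H0: pi = 1/3, Ha: pi > 1/3.
p_sim = simulated_pvalue(15, 30, 1 / 3)
p_exact = exact_pvalue(15, 30, 1 / 3)
print(p_sim, p_exact)
```

The two values should agree closely, and either one answers the same question: how often would a result this extreme occur by chance alone if the null hypothesis were true?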
T vs Z
With proportions, our observations consist of “yeses” and
“nos” for each observational unit in the
population. A picture of this population
is simply a bar graph. In particular, we don’t worry about its shape or
variability. We will always consider approximating the null distribution of the
sample proportions with the normal distribution. Thus, we never worry about
using the t distribution with
proportions. With means, the t distribution is used to take into account the extra variation we will see in the
null distribution if we also substitute the sample standard deviation, s, into the equation. The key is that both a z-statistic and a
t-statistic have “standardized” our observed statistic onto a comparable scale.
Population vs Variable vs Parameter
A population is a group of objects (the observational units, e.g., buildings,
volleyball players). A variable is what we measure about each object; think of
it as a question we ask of each one (e.g., height).
You need to be able to decide how many populations you have and how many
variables, e.g., are you measuring two different things/answering two different
questions about the objects (e.g., height and age), or are you measuring the same
thing on two different groups (e.g., heights of men and
heights of women)? Parameters are numbers that describe the
population (e.g., the average height of all buildings, the average age of all
volleyball players); we just may not know their exact numerical values.
Independence/Matched Pairs
We can also assess the “independence”
between samples to justify a two-sample procedure. This is not the same as the independence
between variables. Instead we are making sure the
responses of one group are not influencing or related to the responses in the
other group. If they are, then a better
analysis is to take that dependence into account (e.g., a “paired t” procedure).
Independence/Association
First, remember that we talk about
independence/association between two variables. We don’t talk about the outcomes of the
variables or levels of the variables, but the entire variable. Two variables are associated if they are
related to each other, that is, if knowledge of one gives us information about
the other.
Other notes:
·
With
bar graphs, always use percentages as the vertical scale (instead of just
number of)
·
Remember,
compare two or more populations OR
examine the association between two variables.
·
A p-value is not the probability of a null
hypothesis or a conclusion being true.
·
Remember
your p-value allows you to make a conclusion about whether there is evidence
against H0 or not. We can’t say
“there is strong evidence of no association” because we assumed no association
in the calculations/simulation. So all we can say is “there is not strong evidence of an
association.”
·
Make the link between your p-value and your decision
explicit. Don’t forget to then make a
conclusion in context.
Question Translations
If the question asks you to:
describe/compare the distribution(s) of a categorical variable | Look at (conditional) proportions
describe/compare the distribution(s) of a quantitative variable | Shape, center, spread, unusual observations. Use mean, median, SD, IQR if available
comment on “statistical significance” or “strength of evidence” | Consider the p-value
estimate “how large the difference is” or “plausible values for the parameter” | Consider the confidence interval
comment on generalizability | Consider the data collection methods and specify a reasonable population
comment on causation | Consider whether you have a randomized experiment and statistical significance
describe a confounding variable | Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable
describe a parameter | Specify the number (e.g., mean or proportion or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population)
interpret a p-value | Begin the sentence “the probability that…” or “the proportion of …” and put your answer in context of the problem (e.g., what source of randomness are we modeling, what statistic are you talking about, what value was observed, what the null hypothesis specified in context, what do you mean by “or more extreme”)
interpret a confidence interval | Begin the sentence “I am 95% confident that <<parameter>> is in (XX, XX)”. Clarify parameter, population, context. If an interval about a difference, clarify which population parameter has a higher value: I’m 95% confident that <<>> is XX to XX (times) (larger, smaller) than <<>>
interpret the confidence level | Talk about the reliability of the method: if you repeated the process for different samples, what percentage of the resulting intervals would succeed in capturing the parameter
draw a conclusion from a p-value (evaluate a p-value) | Comment on whether the p-value should be considered small, reject or fail to reject the null hypothesis, and restate the conclusion you are going with in the study context
calculate a confidence interval by hand | Use 2SD short-cut
state hypotheses | Probably want both null and alternative. Could ask for you to do this in words and/or in symbols. Make sure you are clearly talking about the population parameter and in context
identify the procedure | Name the test you would use (e.g., one proportion). Also be prepared to describe a simulation process you could use to estimate a p-value (e.g., flip a coin X times, shuffle X blue and X green poker chips X times)
comment on validity conditions | Consider the sample size condition for the relevant procedure as on the Overview of Statistical Procedures handout
If the question asks you to calculate a p-value | Simulation | Exact | Theory-based
One proportion | One proportion applet | Random sampling: Binomial distribution (or hypergeometric if sampling from a finite population) | One-sample z-test (need at least 10 successes and at least 10 failures); One proportion or TBI applets
One mean | Random sampling: not really, need a population to sample from where the null hypothesis is true (Sampling from Finite Populations applet), could use bootstrapping | Not unless could list out each possible random sample from the population and calculate the statistic for each | One-sample t-test (need n > 30 or symmetric population); TBI applet or JMP or R
Two proportions | Random sampling: independent random samples from binomial process. Random assignment: Analyzing two-way tables applet | Random sampling: no (but can fix both margins and approximate with Fisher’s). Random assignment: Fisher’s Exact Test (hypergeometric distribution) | Two-sample z-test (need at least 5 successes and at least 5 failures in each group); Analyzing two-way tables applet or TBI applet or JMP or R
Two means | Random sampling: not really, need populations to sample from that have the same population mean, could use bootstrapping or Two Populations applet. Random assignment: Comparing Groups (Quant) applet | Random sampling: no. Random assignment: not really, probably too many different random assignments to list out/find the statistic for each | Two-sample t-test (need both n’s > 20 or both populations normal); TBI applet or JMP or R (can also use Comparing Groups (Quant) applet)
Matched pairs | Quantitative: Matched pairs applet. Categorical: sampling from binomial with n = number of differing responses | Not really; exact binomial (“sign test”) | One-sample t-test on differences (need at least 30 differences or normality of differences); z-test for binomial (need at least 10 successes and failures)
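For the "Exact" column with two proportions and random assignment, Fisher’s Exact Test is a hypergeometric calculation, which can be sketched in a few lines. This is an illustrative Python version with a hypothetical 2×2 table and a hypothetical function name; it computes the one-sided p-value in the direction of more successes in the first group.

```python
from math import comb

def fisher_exact_upper(a, b, c, d):
    """One-sided Fisher p-value P(X >= a) for the table
           success  failure
    grp 1     a        b
    grp 2     c        d
    with all row and column totals held fixed."""
    row1, row2 = a + b, c + d
    col1 = a + c                      # total number of successes
    total_ways = comb(row1 + row2, col1)
    largest = min(row1, col1)         # most successes group 1 could have
    # Hypergeometric: ways to place k successes in group 1 and the rest in group 2.
    favorable = sum(comb(row1, k) * comb(row2, col1 - k)
                    for k in range(a, largest + 1))
    return favorable / total_ways

# Hypothetical experiment: 8 of 10 successes with treatment, 3 of 10 with control.
p_value = fisher_exact_upper(8, 2, 3, 7)
print(p_value)
```

The same count-the-assignments logic underlies the simulation column: shuffling cards in the two-way tables applet approximates this probability instead of computing it exactly.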
You should also
be able to answer questions like – how would this
(e.g., p-value, margin of error, conclusion) change if you did this (e.g.,
change sample size, change hypothesized value, confidence level), as well as
more conceptually-based questions (e.g., what does it all mean, explain the
reasoning, what is this number measuring).