Stat 301 – Final Review
Optional Review Sessions: Sunday 4pm (Zoom), Tuesday 4pm (Zoom)
Final Exam: 12pm section Monday 10:10-1pm, 1pm section Wednesday 1:10-4pm
Format: You are
allowed 3 pages of notes. You will have access to R, JMP, applets. If I want
you to use a datafile, I will post it in all 3 formats for easy loading. The computer questions won’t be a huge
component, but I do want some demonstration that you can carry out an analysis.
The exam will be a mix of multiple choice, short answer, and longer answer
questions. It’s possible I will ask you to perform an analysis, describe your
process, describe your output. I will try to designate problems where the
computer will be helpful. You should
feel especially encouraged to ask computer questions during the exam (I want to
do x, but it’s not working) and I can often help.
The final exam
will be cumulative, including a comparison of methods across chapters. You
should focus on the entire statistical process: How do we design a study to
achieve particular research goals? How
do we describe the data we have? How do we test claims about population
parameters or processes and/or estimate parameters? How do we state our final conclusions,
especially after considering how the study was conducted? You should also think about the reasoning
behind the statistical methods, such as standardizing, significance, and
confidence in general.
Advice:
Understand and be able
to apply the procedures first, then worry about more subtle issues and review
the how and why behind the development of the procedure. Also, for all the procedures, know what must
be done by hand and what can be done on the computer. Review the Case Study
(trying the questions from scratch) as well as the “final exam multiple choice
practice” in Canvas.
The focus of Chapter 4 was a quantitative response variable.
From Section 4.1 you should know that
· You
can create an “exact randomization” distribution if you list all possible
random assignments to the two groups, identify an appropriate statistic
(formula) and calculate the statistic for each possible random assignment,
count how many of the random assignments give a statistic at least as extreme
as the one you observed, and evaluate the p-value.
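To make the listing process concrete, here is a minimal sketch in Python (not one of the course tools; R would work just as well). The function name and the data values are made up for illustration, and the statistic is assumed to be the difference in group means with a one-sided "greater than" alternative.

```python
from itertools import combinations
from statistics import mean

def exact_randomization_pvalue(group_a, group_b):
    """One-sided exact p-value for the difference in means (A minus B)."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    total = 0
    as_extreme = 0
    # Each choice of n_a positions is one possible random assignment to group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in range(len(pooled)) if i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = mean(a) - mean(b)
        total += 1
        if stat >= observed - 1e-12:  # small tolerance guards against round-off
            as_extreme += 1
    return observed, as_extreme, total, as_extreme / total

# Hypothetical responses, for illustration only (not course data).
treatment = [12, 15, 14]
control = [9, 11, 10]
observed, count, total, p_value = exact_randomization_pvalue(treatment, control)
print(observed, count, total, p_value)
```

With 6 subjects split 3 and 3 there are only 20 possible random assignments, which is why the exact approach is feasible here and not with larger groups.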
From Section 4.2 you should know how to
·
Create numerical and graphical summaries comparing two groups with
a quantitative variable
o Compare distributions in terms of shape, center, and spread (citing
appropriate numerical evidence when available), unusual observations
·
Including what is meant by “variability” in a distribution and
different ways it can be measured
·
Possibly explain outliers (know how they can be identified)
o Make sure your comments are
always in context (including measurement units)
·
Predict the behavior of the sampling distribution of the differences
in two sample means from random sampling
(applet;
see also population models version?) Including whether the sampling
distribution of the differences in sample means should be normal or
approximately normal
o Reasoning behind the standard error formula (adding variances); it should make sense to you that the SE of the
difference is larger than either individual standard error
o Reasoning behind the t-distribution for the standardized statistic
· Carry out a two-sample t-test for the difference in population means using
technology (JMP, R, TBI applet)
o State null and alternative
hypotheses
o Assess validity of the
procedure
o Raw data vs. Summary
statistics
o Interpret results, including interpretations of test statistic and p-value (in
terms of random sampling)
· Factors that influence p-value and confidence interval midpoint and width
· Determine and interpret a two-sample t-confidence interval
o Make sure confidence level,
mean, variable (context), and “direction” are clear in interpretation
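As a check on the "by hand" pieces of Section 4.2, the following sketch works from summary statistics only. It is written in Python for illustration (the course tools are JMP, R, and the applets), the summary numbers are invented, and the interval uses 2 as a rough 95% multiplier rather than an exact t* value.

```python
import math

def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2, hypothesized=0.0):
    """Unpooled SE, standardized statistic, and rough 95% CI for mean1 - mean2."""
    se1 = sd1 / math.sqrt(n1)
    se2 = sd2 / math.sqrt(n2)
    # Variances add, so the SE of the difference is sqrt(SE1^2 + SE2^2).
    se_diff = math.sqrt(se1**2 + se2**2)
    diff = mean1 - mean2
    t_stat = (diff - hypothesized) / se_diff
    # Rough 95% interval using 2 as the multiplier (an approximation to t*).
    interval = (diff - 2 * se_diff, diff + 2 * se_diff)
    return {"se1": se1, "se2": se2, "se_diff": se_diff,
            "t": t_stat, "interval": interval}

# Hypothetical summary statistics, for illustration only.
result = two_sample_summary(mean1=19.8, sd1=12.0, n1=36,
                            mean2=13.8, sd2=9.0, n2=36)
print(result)
```

Notice the direction of subtraction (group 1 minus group 2) is fixed inside the function; technology makes the same kind of choice for you, so check which one it used.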
From
Section 4.3 you should be able to
·
Set up and explain the reasoning process behind simulating a
randomization test (applet)
o How the simulation models the null hypothesis being true/random assignment in the study
o How to simulate the null distribution (random assignment vs. random sampling)
o How to use the generated null distribution to calculate an empirical p-value (one- or two-sided)
o Provide a detailed interpretation of what the p-value is measuring in the context of the research
study (and in terms of random assignment)
o Make a decision about the null hypothesis based on the p-value
o Be able to do this for
difference in sample means, difference in sample medians, or other sample
statistics
·
Summarize (and justify) study conclusions in terms of
significance, estimation (=confidence interval), causation, and
generalizability
·
Check the technical conditions to assess the validity of the t procedures
· Carry out two-sample t-tests for comparing two long-run treatment means using technology (JMP, R, TBI
applet)
o Consider alternatives if
technical conditions are not met (e.g., simulation, transformations)
o Be able to test hypothesized
differences other than zero
· Calculate (using JMP, R, TBI applet) and interpret two-sample t-confidence intervals for the
difference in long-run treatment means
o Explore a data transformation for improving the validity of a t-procedure
o Interpret a confidence interval
after a log transformation in terms of the original units (multiplicative
change in median)
· Explain the effects of the difference in sample means, sample
sizes, and within-group variability on the p-value, confidence interval, and
power
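The reasoning behind the Section 4.3 randomization test can be sketched in a few lines. This is only an illustration in Python (the applet does the same thing visually); the data, the function names, the seed, and the number of repetitions are all assumptions chosen for the example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def randomization_test(group_a, group_b, reps=5000, seed=301):
    """Two-sided empirical p-value for the difference in means via shuffling."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    null_stats = []
    for _ in range(reps):
        # Under H0 the responses would be the same regardless of treatment,
        # so we keep the responses fixed and re-randomize the group labels.
        rng.shuffle(pooled)
        null_stats.append(mean(pooled[:n_a]) - mean(pooled[n_a:]))
    extreme = sum(abs(s) >= abs(observed) - 1e-12 for s in null_stats)
    return observed, null_stats, extreme / reps

# Hypothetical responses, for illustration only.
treatment = [12, 15, 14, 18, 16]
control = [9, 11, 10, 13, 12]
observed, null_stats, p_value = randomization_test(treatment, control)
print(observed, p_value)
```

Swapping `mean` for a median function inside the loop gives the same test for a difference in medians, which is one reason to be comfortable with the simulation logic and not only the t-test.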
From
Section 4.4 you should be able to:
· Distinguish between matched pairs designs and two independent “samples” (or completely
randomized) designs
o Consider benefits and disadvantages (including feasibility) of the two types of designs
o Be able to describe how to set up a matched-pairs design, including randomizing the order of the
treatments (with repeated observations on the same unit) or randomizing which treatment each unit
receives (with two similar experimental units in a pair)
·
Determine
whether a data collection plan necessitates a matched pairs analysis
·
Create
numerical and graphical summaries comparing matched pairs data by calculating
and examining the differences
·
Carry
out and interpret a matched pairs test simulation
o Logic behind it (exchangeability of the
two responses within each pair)
o Interpreting the results
· Use technology (JMP, R, TBI applet on the differences, Matched Pairs applet) to
carry out and interpret a matched pairs t-test
· Use technology (JMP, R, TBI applet, Matched Pairs applet) to calculate and
interpret a matched pairs t-interval
·
Use
technology to carry out and interpret a sign test with paired data
o Exact binomial and/or normal
approximation
· Set up a two-way table with paired categorical data (the two treatments are the two
variables = row/column variables)
·
Use
technology to carry out and interpret McNemar’s test
o Exact binomial and/or normal approximation
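Both the sign test and McNemar’s test come down to the same exact binomial calculation on the pairs that differ. The sketch below is a Python illustration under that assumption; the table counts and the function name are hypothetical, and the two-sided p-value is found by doubling the smaller tail (valid here because the null distribution with probability 0.5 is symmetric).

```python
from math import comb

def exact_binomial_two_sided(k, n):
    """Two-sided exact p-value for k 'successes' in n fair-coin trials."""
    # Under H0 each differing pair is equally likely to favor either treatment,
    # so the null distribution is Binomial(n, 0.5), which is symmetric.
    tail = min(k, n - k)
    p_lower = sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, 2 * p_lower)

# Hypothetical paired categorical data as a two-way table:
#                  B: success   B: failure
#   A: success         20            12
#   A: failure          3            15
a_only, b_only = 12, 3          # only the discordant pairs are informative
n_discordant = a_only + b_only
p_value = exact_binomial_two_sided(a_only, n_discordant)
print(n_discordant, p_value)
```

For a sign test with quantitative paired data, `k` would be the number of positive differences and `n` the number of nonzero differences.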
Other Notes
·
Be
able to distinguish the standard deviation of the sample, the pooled standard
deviation for two samples (combining the groups together to estimate one SD),
and the standard deviation of the distribution of the difference in sample
means (the pooled and unpooled versions), the version with the “correlation”
between the random variables
·
Keep
in mind that a confidence interval doesn’t really give you “additional
evidence” (against the null hypothesis) but it’s a different way of presenting
the same evidence
o
Keep
in mind that we never can use the procedures in this course to establish
evidence for the null hypothesis
·
We
can easily test nonzero values for the hypothesized difference in means
o How to do this with a t-test
o
How
to represent this in a simulation (e.g., need to remove the treatment effect,
then groups should be equivalent so shuffle, and then add the treatment effect
back)
·
Be
able to distinguish/explain bootstrapping vs. sampling from a finite
(hypothetical) population
o
Key
goal is estimating the sample to sample variation in the statistic
o
Can
compare to theoretical results, but latter is only available for certain
statistics
·
Can
use bootstrapping to compare two samples
o
Bootstrapping
from each sample separately vs. pooling together the samples first
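To see what bootstrapping is estimating, here is a small Python sketch (illustration only; the sample values, seed, and number of resamples are assumptions). It compares the bootstrap estimate of the sample-to-sample variation in the mean with the theoretical s/√n, and then does the same for the median, where no simple formula is available.

```python
import math
import random
from statistics import mean, median, stdev

def bootstrap_se(sample, statistic, reps=2000, seed=301):
    """Estimate the SE of a statistic by resampling the sample with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    boot_stats = []
    for _ in range(reps):
        resample = [rng.choice(sample) for _ in range(n)]  # with replacement
        boot_stats.append(statistic(resample))
    return stdev(boot_stats), boot_stats

# Hypothetical sample of 20 observations, for illustration only.
sample = [3, 5, 6, 7, 7, 8, 9, 10, 10, 11, 12, 12, 13, 14, 15, 17, 18, 21, 24, 30]

se_mean_boot, _ = bootstrap_se(sample, mean)
se_mean_theory = stdev(sample) / math.sqrt(len(sample))
se_median_boot, _ = bootstrap_se(sample, median)
print(se_mean_boot, se_mean_theory, se_median_boot)
```

The two values for the mean should be close (the bootstrap version tends to run slightly smaller), which is the kind of comparison to theoretical results mentioned above.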
Keep in mind
·
When
comparing distributions, remember to cite your evidence if you think there is a
difference in the groups. In particular, tell me what you see in the summary
statistics (e.g., a higher mean) that leads to your conclusion (e.g., sleep
deprived subjects tend to have lower improvements)
o
Remember
that the confidence level refers to
the reliability of the method – how often, in the long run, random samples will
produce an interval that succeeds in capturing the population parameter
·
Remember
to think about/decipher the direction of subtraction used by the technology
·
Why
is variability in the data an important consideration and how can we reduce it?
What are advantages and disadvantages to doing so?
· We can use a two-sample t-test even when
the sample sizes are small if we have reason to believe the population
distributions are themselves normally distributed. If you don’t have past experience with the variable, you can try to judge this based on graphs
of the sample data. If the sample data
look plausibly normally distributed (normal probability plots are a useful
tool for helping this judgment), you can cite this as evidence that the
population distribution is normally distributed. If you aren’t sure, then use an
alternative analysis (e.g., simulation-based) instead.
· Keep in mind the two-sample t-test only
compares the two means (vs. other aspects of the distributions or other
measures of center)
·
Try
to avoid the word “accurate” without explaining exactly what you mean by it.
·
Try
to avoid use of the word “group” but clarify if you mean the sample or the
population or the treatment in general
·
Avoid
use of the word “it”
The Cumulative Component (also see old
Review handouts)
Things to
remember include:
·
Identifying
observational units and defining variables, samples vs. populations vs. null
(sampling/randomization) distributions, parameters vs. statistics, explanatory
vs. response variable, bias vs. precision, random assignment vs. random
sampling (including goals)
·
Experiments
vs. Observational Studies
o How to design a randomized experiment,
How to properly select a sample
o Scope of conclusions depending on how
study was conducted (Can you draw a cause and effect conclusion? Can you
generalize to a larger population?)
o Sampling errors, nonsampling errors, and
random sample errors (and which of these are measured by the “margin of
error”?)
·
Describing
and comparing distributions of data
o Categorical: segmented bar graphs,
conditional percentages, difference in proportions vs. relative risk vs. odds
ratio (and how to interpret)
o Quantitative: shape, center, and spread,
stemplots, boxplots, histograms, dotplots, resistance of median and IQR
o When describing distributions, if you
have access to numerical summaries, use them to support your claims
· How to interpret probability as a long-run relative frequency
·
How
to carry out a test of significance
o About a population proportion and/or
population mean and/or treatment effect and/or difference in population
proportions and/or difference in population means
·
Make
sure you can state Ho and Ha in symbols and in words
o One-sided vs. two-sided alternatives
o Which technical conditions apply and how
to check them and what they tell you
· e.g., proportions: nπ > 10 and n(1 − π) > 10; means: n > 30 or normal population
o Interpretation of test statistic (if appropriate)
· General form: (statistic − hypothesized value)/(standard error of statistic)
o Ideas and distinctions of sampling
distribution and randomization distribution
o How to calculate and/or approximate the p-value
o How to make a decision based on the p-value and level of significance
o How to interpret the p-value
· Source of randomness, choice of statistic, observed result, direction, null hypothesis
o Factors that affect the size of the p-value
o Defining (and stating the consequences
of) Type I and Type II Errors in context (including direction of Ha)
o How to determine the probabilities of a
Type I Error and of a Type II Error and Power
·
Type
II/Power is for a particular instance of alternative hypothesis
o Factors that affect the probability of
Type I and Type II Errors, Power
·
How
to calculate and interpret a confidence interval
o General form: estimate ± (critical
value)×(standard error)
o Interpretation: level, parameter,
context (with differences/ratios, include “direction”)
·
Clarify
larger population/process
o Interpret confidence “level” (separate
from interpreting interval)
o How to solve for the sample size necessary
to obtain a specific margin of error for a stated confidence level
·
Duality
between intervals and tests: Any parameter value not contained in a C% CI will
be rejected by a twosided test at
(100C)/100 significance level
·
Describe
the difference between statistical significance and practical significance
(e.g., is it a meaningful difference in context)
· Calculating p-values for Fisher’s Exact Test and/or binomial process (when ok to do)
·
How
to decide which procedure you should use (quantitative or categorical data, one
or two populations, Fisher’s Exact Test vs. binomial vs. normal vs. t)
·
For
validity of “theorybased” procedures, I tend to worry less about the
randomness condition and more about the sample size condition. The randomness condition is more important to
scope of conclusions.
· Be able to get t* and z* values using technology, or use 2 as the approximate
multiplier with 95% confidence
·
Make
sure you know where ()s go in prediction interval standard error
See summary tables (including on
technology) and end of chapter examples
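Two of the items above (getting z* values and solving for a sample size) can be done with a few lines of code. The sketch below uses Python’s standard library for illustration; in R the z* value would come from `qnorm`. The function names are made up, and the sample-size formula assumes a one-proportion z-interval with a planning guess for the proportion (0.5 is the conservative choice).

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* leaving (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_for_proportion(margin, confidence=0.95, guess=0.5):
    """Smallest n with z* * sqrt(guess*(1-guess)/n) <= margin."""
    z = z_star(confidence)
    return math.ceil(z**2 * guess * (1 - guess) / margin**2)

z95 = z_star(0.95)
n_needed = sample_size_for_proportion(margin=0.03, confidence=0.95)
print(round(z95, 3), n_needed)
```

The z* value for 95% confidence is about 1.96, which is why 2 works as an approximate multiplier.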
Some big
picture stuff
What is Statistical Inference?
The population parameter and the sample statistic summarize the same
variable. The population parameter summarizes the variable for the population,
which is what we want to know, e.g., μ or
μ_{1} − μ_{2}. However, we can’t observe the whole
population so we don’t know what the parameter value really is. Instead, we can measure the variable on a
sample or randomized groups and compute a sample statistic, e.g., x̄ or x̄_{1} − x̄_{2}. The question is what can we infer about the parameter based on this
statistic? Because these statistics follow a null distribution/regular pattern
due to the randomness in the study design, we can estimate or calculate
probabilities of different values of the statistic occurring. Different statistics follow different
distributions, but once we know which distribution we should use, we can make
conclusions about the value of the parameter, e.g., it’s in some interval or we
have evidence that it is not a particular value.
Null Distributions
If we specify a
value for the population parameter, we can take (or simulate) lots of samples
from this population or lots of random assignments and calculate a statistic
for each sample/random assignment. This
allows us to examine the behavior of the statistic so we can discuss the shape,
center, and variability of this “null” distribution. For example, what types of values do we
expect the statistic to have, how far away might the statistic stray from the
hypothesized value of the parameter?
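As an illustration of examining a null distribution, the sketch below simulates many random samples from a process with a specified parameter value and summarizes the resulting sample proportions. It is written in Python for illustration; the hypothesized value 1/3, the sample size 60, the seed, and the function name are assumptions chosen for the example.

```python
import random
from statistics import mean, stdev

def simulate_null_proportions(pi_0, n, reps=5000, seed=301):
    """Sample proportions from `reps` random samples of size n when pi = pi_0."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi_0 for _ in range(n)) / n for _ in range(reps)]

pi_0, n = 1 / 3, 60
null_props = simulate_null_proportions(pi_0, n)

center = mean(null_props)                       # should be near pi_0
spread = stdev(null_props)                      # simulated SD of the statistic
theory_sd = (pi_0 * (1 - pi_0) / n) ** 0.5      # theoretical SD of a sample proportion
print(round(center, 4), round(spread, 4), round(theory_sd, 4))
```

The center tells you what values to expect and the spread tells you how far the statistic typically strays from the hypothesized value by chance alone.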
Simulation vs. “Large sample” (Theory-Based)
procedures
In almost every
case we have seen two different ways to approximate the p-value: simulation and
a mathematical model. We are considering the simulation approaches to always be
valid. The mathematical models are only appropriate if the “validity
conditions” of the theory-based approach are met. The advantage of the mathematical models is that
we can easily get a confidence interval as well. So you may want to consider the mathematical
model first, but then if the validity conditions aren’t met, use the
simulation approach.
Confidence Intervals – Estimate
population parameter
The goal of a
confidence interval is to get a range of plausible values that we think the
population parameter could be equal to.
To do this, we use the sample statistic and a measure of the sampling
(or shuffle to shuffle) variability of the sample statistic. This lets us form an interval around the
sample statistic that should contain the population parameter. Note, we are trying to contain the population
parameter in the interval, not the data and not the sample statistic. In fact,
the sample statistic better be the midpoint (center) of the interval.
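The idea that the interval “should contain” the parameter is a statement about the method over many samples. The following Python sketch (illustration only) checks this by simulation; the population mean and SD, the sample size, the seed, and the use of 2 as the multiplier are all assumptions made for the example.

```python
import random
from statistics import mean, stdev

def coverage_rate(mu=50.0, sigma=10.0, n=40, reps=2000, seed=301):
    """Proportion of (xbar +/- 2 SE) intervals that capture the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        se = stdev(sample) / n**0.5
        # The interval is centered at the statistic; we ask whether it
        # captures the parameter, not the data and not the statistic.
        if xbar - 2 * se <= mu <= xbar + 2 * se:
            hits += 1
    return hits / reps

rate = coverage_rate()
print(rate)
```

The capture rate should land near 95%: any single interval either contains the parameter or it doesn’t, but the method succeeds for roughly that percentage of random samples.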
Tests of Significance – Test
claim about population parameter
The goal of a
test of significance is to make a decision about the population parameter. Here are the steps we use:
1) Define the
parameter(s) of interest. (Should also
be able to define the OUs and variable)
2) Specify the
hypotheses (e.g., H_{0}: π = 1/3,
μ = 50, μ_{1} − μ_{2} = 0, or no relationship between variable 1 and variable 2 in population)
Always in terms of the population
(parameters) because that’s what is unknown and what we are trying to make
statements about (take
off the hats!)
The null hypothesis is the “dull
hypothesis” or the “ho-hum hypothesis”
The alternative hypothesis specifies
something interesting (“aha!”)
One- or two-sided (decide
based on wording of research question)
3) Check the
validity conditions, sketch the null distribution of the (test) statistic
assuming H_{0} is true, and identify the appropriate test procedure by
name
If
the validity conditions are not met, use a randomization-based (simulation)
method instead
4) Compare the
data observed in the sample to what’s “expected” from H_{0}. Find the
p-value = probability of observing a value of the statistic as extreme or more
extreme when H_{0} is true.
Know how to get the computer to give
you the appropriate one- or two-sided p-value
5) Draw a
conclusion in context
Decide
to reject or fail to reject H_{0}
Reject if p-value ≤ α, synonymous with saying result is
“statistically significant”
Make
conclusion about research question of interest (back to English)
If we
repeatedly took different samples or random shuffles and calculated the value
of the statistic for each sample, the p-value indicates how often we would
expect to see the statistic value that we actually did observe, or one more
extreme, when H_{0} is true. If
the statistic value is very unlikely (so small p-value) we stop believing H_{0}
(recall the loaded dice example). We can compare to the significance level as a benchmark to decide whether the p-value is
“too small.”
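The steps above can be traced through one small worked case. This Python sketch is an illustration only: the counts (28 successes in 60 trials), the hypothesized value 1/3, the one-sided “greater than” alternative, and the function name are all assumptions for the example, not data from the course.

```python
from math import comb, sqrt
from statistics import NormalDist

def one_proportion_test(successes, n, pi_0):
    """One-sided (Ha: pi > pi_0) z-statistic, normal p-value, exact binomial p-value."""
    p_hat = successes / n
    # The standard error uses the hypothesized value because H0 is assumed true.
    se_null = sqrt(pi_0 * (1 - pi_0) / n)
    z = (p_hat - pi_0) / se_null
    p_normal = 1 - NormalDist().cdf(z)
    # Exact: probability of `successes` or more when pi = pi_0.
    p_exact = sum(comb(n, k) * pi_0**k * (1 - pi_0)**(n - k)
                  for k in range(successes, n + 1))
    return z, p_normal, p_exact

# Hypothetical data: 28 successes in 60 trials, H0: pi = 1/3 vs. Ha: pi > 1/3.
# Validity check for the normal model: 60(1/3) = 20 and 60(2/3) = 40, both at least 10.
z, p_normal, p_exact = one_proportion_test(28, 60, 1 / 3)
alpha = 0.05
decision = "reject H0" if p_exact <= alpha else "fail to reject H0"
print(round(z, 3), round(p_normal, 4), round(p_exact, 4), decision)
```

The final step is still yours: restate the decision as a conclusion about the research question in context.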
T vs Z
With proportions, our observations consist of “yeses” and
“nos” for each observational unit in the population. A picture of this population is simply a bar
graph. In particular, we don’t worry
about its shape or variability. We will always consider approximating the null
distribution of the sample proportions with the normal distribution. Thus, we
never worry about using the t
distribution with proportions. With
means, the t distribution is used to
take into account the extra variation we will see in the null distribution when
we substitute the sample standard deviation, s, for the unknown population standard deviation. The
key is that both a z-statistic and a t-statistic have “standardized” our observed
statistic onto a comparable scale.
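The “extra variation” from substituting s can be seen in a short simulation. This Python sketch is an illustration under stated assumptions: samples come from a standard normal population, the cutoff 1.96 is the usual normal 95% cutoff, and the sample sizes, seed, and function name are made up for the example.

```python
import math
import random

def tail_rate(n, cutoff=1.96, reps=4000, seed=301):
    """How often |xbar / (s / sqrt(n))| exceeds the normal cutoff when mu = 0."""
    rng = random.Random(seed)
    beyond = 0
    for _ in range(reps):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(sample) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        # Standardize using the sample SD in place of the population SD.
        t_stat = xbar / (s / math.sqrt(n))
        if abs(t_stat) > cutoff:
            beyond += 1
    return beyond / reps

small_n_rate = tail_rate(5)
large_n_rate = tail_rate(50)
print(small_n_rate, large_n_rate)
```

If the standardized statistic were truly normal, both rates would be about 5%. With n = 5 the rate is noticeably higher, which is the heavier-tailed behavior the t distribution accounts for; with n = 50 the two models nearly agree.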
Population vs Variable vs Parameter
A population is a group of objects. A
variable is what we measure about the objects, like a question we ask of each one (e.g., height). The observational units are the objects we
measure (e.g., buildings, volleyball players).
You need to be able to decide how many populations you have and how many
variables, e.g., are you measuring two different things/answering two different
questions about the objects (e.g., height and age), or are you measuring the same
thing on two different groups (e.g., heights of men and heights of women)? Parameters are numbers that describe the population (e.g., the average
height of all buildings, the average age of all volleyball players); we just may not know
their exact numerical value.
Independence/Matched Pairs
We can also assess the “independence”
between samples to justify a two-sample procedure. This is not the same as the independence
between variables. Instead we are making sure the responses of one group are
not influencing or related to the responses in the other group. If they are, then a better analysis is to
take that dependence into account (e.g., a “paired t” procedure).
Independence/Association
First, remember that we talk about
independence/association between two variables. We don’t talk about the outcomes of the
variables or levels of the variables, but the entire variable. Two variables are associated if they are
related to each other, that is, if knowledge of one gives us information about
the other.
Other notes:
·
With
bar graphs, always use percentages as the vertical scale (instead of just
number of)
·
Remember,
compare two or more populations OR examine the association between two variables.
· A p-value is not the probability of a null
hypothesis or a conclusion being true.
· Remember your p-value allows you to make a conclusion about whether there is evidence
against H_{0} or not. We can’t say
“there is strong evidence of no association” because we assumed no association
in the calculations/simulation. So all
we can say is “there is not strong evidence of an association.”
· Make the link between your p-value and your decision
explicit. Don’t forget to then make a
conclusion in context.
Question Translations
If the question asks you to
· describe/compare the distribution(s) of a categorical variable
o Look at (conditional) proportions
· describe/compare the distribution(s) of a quantitative variable
o Shape, center, spread, unusual observations
o Use mean, median, SD, IQR if available
· comment on “statistical significance” or “strength of evidence”
o Consider the p-value
· estimate “how large the difference is” or “plausible values for the parameter”
o Consider the confidence interval
· comment on generalizability
o Consider the data collection methods and specify a reasonable population
· comment on causation
o Consider whether you have a randomized experiment and statistical significance
· describe a confounding variable
o Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable
· describe a parameter
o Specify the number (e.g., mean or proportion or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population)
· interpret a p-value
o Begin the sentence “the probability that…” or “the proportion of …” and put your answer in context of the problem (e.g., what source of randomness are we modelling, what statistic are you talking about, what value was observed, what the null hypothesis specified in context, what do you mean by “or more extreme”)
· interpret a confidence interval
o Begin the sentence “I am 95% confident that <<parameter>> is in (XX, XX)”. Clarify parameter, population, context
o If an interval about a difference, clarify which population parameter has a higher value: I’m 95% confident that <<>> is XX to XX (times) (larger, smaller) than <<>>
· interpret the confidence level
o Talk about the reliability of the method: if you repeated the process for different samples, what percentage of the resulting intervals would succeed in capturing the parameter
· draw a conclusion from a p-value (evaluate a p-value)
o Comment on whether the p-value should be considered small, reject or fail to reject the null hypothesis, and restate the conclusion you are going with in the study context
· calculate a confidence interval by hand
o Use 2SD shortcut
· state hypotheses
o Probably want both null and alternative. Could ask for you to do this in words and/or in symbols. Make sure you are clearly talking about the population parameter and in context
· identify the procedure
o Name the test you would use (e.g., one proportion). Also be prepared to describe a simulation process you could use to estimate a p-value (e.g., flip a coin X times, shuffle X blue and X green poker chips X times)
· comment on validity conditions
o Consider the sample size condition for the relevant procedure as on the Overview of Statistical Procedures handout
If the question asks you to calculate a p-value
· One proportion
o Simulation: One proportion applet
o Exact: with random sampling, Binomial distribution (or hypergeometric if sampling from a finite population)
o Theory-based: one-sample z-test (need at least 10 successes and at least 10 failures) – One proportion or TBI applets
· One mean
o Simulation: with random sampling, not really; need a population to sample from where the null hypothesis is true (Sampling from Finite Populations applet), could use bootstrapping
o Exact: not unless you could list out each possible random sample from the population and calculate the statistic for each
o Theory-based: one-sample t-test (need n > 30 or symmetric population) – TBI applet or JMP or R
· Two proportions
o Simulation: with random sampling, independent random samples from binomial process; with random assignment, Analyzing two-way tables applet
o Exact: with random sampling, no (but can fix both margins and approximate with Fisher’s); with random assignment, Fisher’s Exact Test (hypergeometric distribution)
o Theory-based: two-sample z-test (need at least 5 successes and at least 5 failures in each group) – Analyzing two-way tables applet or TBI applet or JMP or R
· Two means
o Simulation: with random sampling, not really; need populations to sample from that have the same population mean, could use bootstrapping or Two Populations applet; with random assignment, Comparing Groups (Quant) applet
o Exact: with random sampling, no; with random assignment, not really, probably too many different random assignments to list out/find the statistic for each
o Theory-based: two-sample t-test (need both n’s > 20 or both populations normal) – TBI applet or JMP or R (can also use Comparing Groups (Quant) applet)
· Matched pairs
o Simulation: quantitative, Matched pairs applet; categorical, sampling from binomial with n = number of differing responses
o Exact: quantitative, not really; categorical, exact binomial (“sign test”)
o Theory-based: quantitative, one-sample t-test on differences (need at least 30 differences or normality of differences); categorical, z-test for binomial (need at least 10 successes and failures)
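For the “Exact” entry under two proportions with random assignment, the hypergeometric calculation behind Fisher’s Exact Test can be sketched directly. This is a Python illustration only; the 2×2 counts and the function name are hypothetical, and the p-value shown is one-sided (group 1 having at least as many successes as observed).

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """P(top-left count >= a) with all margins fixed (hypergeometric).

    Table layout:           success   failure
                 group 1       a         b
                 group 2       c         d
    """
    row1 = a + b          # size of group 1
    col1 = a + c          # total number of successes
    n = a + b + c + d
    denom = comb(n, row1)             # ways to choose who lands in group 1
    upper = min(row1, col1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1)) / denom

# Hypothetical results: 9 of 10 successes in group 1, 4 of 10 in group 2.
p_value = fisher_exact_one_sided(9, 1, 4, 6)
print(round(p_value, 4))
```

Each term counts the random assignments that would put exactly k of the 13 successes into group 1, which is the same “list all random assignments” logic as the exact randomization distribution, just counted with combinations.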
Also expect
questions like – how would this (e.g., p-value, margin of error, conclusion)
change if you did this (e.g., change sample size, change hypothesized value,
confidence level), as well as more conceptually-based questions (e.g., what
does it all mean, explain the reasoning, what is this number measuring).