Stat 321 – Final Review
Finals week office hours: Tuesday 12-2 and by appointment
Review Session Tuesday?
Additional Review Problems:
In general, you should be able to
· Distinguish (in words and with symbols) between a sample statistic and a parameter (of a population or of a probability distribution)
o Understand what is random and what is constant (though perhaps unknown)
· Understand the definition of (simple) random sample (SRS)
· Understand what is meant by “sampling distribution of a statistic”
· Know how to derive the exact sampling distribution of a statistic for small sample spaces (e.g., X discrete, n = 2)
· Understand how we approximate the sampling distribution of a statistic/estimator using simulation
· Be able to interpret a “repeated sampling” simulation done in Minitab
· Interpret probability plots (section 4.10)
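The small-sample-space derivation above (X discrete, n = 2) can be checked by brute-force enumeration. A minimal Python sketch, using an invented pmf (the values and probabilities are not from the course):

```python
# Hypothetical example: exact sampling distribution of the sample mean
# for an SRS of n = 2 from a small discrete distribution.
from itertools import product
from fractions import Fraction

# pmf of X (values and probabilities are made up for illustration)
pmf = {0: Fraction(1, 2), 1: Fraction(3, 10), 2: Fraction(1, 5)}

dist_xbar = {}
for x1, x2 in product(pmf, repeat=2):          # all ordered pairs of draws
    xbar = Fraction(x1 + x2, 2)                # sample mean for n = 2
    dist_xbar[xbar] = dist_xbar.get(xbar, 0) + pmf[x1] * pmf[x2]

for value in sorted(dist_xbar):
    print(f"P(Xbar = {value}) = {dist_xbar[value]}")

# Sanity checks: probabilities sum to 1, and E(Xbar) = E(X)
ex = sum(x * p for x, p in pmf.items())
exbar = sum(v * p for v, p in dist_xbar.items())
assert sum(dist_xbar.values()) == 1 and ex == exbar
```

Exact fractions (rather than floats) make it easy to confirm the table of probabilities by hand.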
From Section 2.5 you should know how to:
· Derive the expected value of a linear combination of random variables (especially X̄, p̂)
o E(X̄) = μ (always true)
o E(p̂ = X/n) = p (always true when X is binomial)
· Derive the variance of a linear combination of independent random variables (especially X̄, p̂)
o SD(X̄) = σ/√n (if independent rv's)
o SD(p̂) = √(p(1 − p)/n) (if random sample from a large population, independent observations)
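These two facts can be verified by repeated sampling. A quick simulation sketch (the population parameters 50, 10 and the sample size 25 are invented, not from the course):

```python
# Check E(Xbar) = mu and SD(Xbar) = sigma/sqrt(n) by simulation
# for a normal population (illustrative values).
import math
import random
import statistics

random.seed(1)
mu, sigma, n = 50.0, 10.0, 25
reps = 20000

xbars = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(reps)]

print("mean of xbars:", statistics.mean(xbars))   # close to mu = 50
print("SD of xbars:  ", statistics.stdev(xbars))  # close to sigma/sqrt(n)
print("theory:       ", sigma / math.sqrt(n))     # = 2.0
```

This mirrors the "repeated sampling" Minitab simulations from class: many samples, one X̄ per sample, then summarize the collection of X̄'s.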
From Section 4.11 you should be able to:
· Use the Central Limit Theorem to predict the shape of the distribution of X̄
o Normal if population distribution is normal
o Approximately normal if sample size is "large" (e.g., n > 30)
· Use the Central Limit Theorem to predict the shape of the distribution of p̂
o Approximately normal if np > 10 and n(1 − p) > 10
· Use simulations to verify the predictions made by the CLT
· Calculate probabilities for sample statistics (e.g., the probability of observing X̄ > 3.12 for a given value of μ) and how these probabilities are affected by changes in sample size and parameter values.
o Be able to sketch the theoretical sampling distribution (scale and label the horizontal axis).
o Be able to apply the empirical rule
o Be able to apply a continuity correction
o Be able to interpret these probability statements in context. Be able to make decisions about “claimed” values of the parameter based on this probability.
Note: Can pay less attention to the distribution of the sum or count (convert to X̄ or p̂ instead)
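A worked sketch of a CLT probability with a continuity correction, checked against the exact binomial answer (the numbers n = 100, p = 0.5, and the cutoff are invented for illustration):

```python
# P(X >= 60) when X ~ Binomial(100, 0.5), via the CLT normal
# approximation with a continuity correction, vs. the exact tail.
import math

n, p = 100, 0.5
mu, sd = n * p, math.sqrt(n * p * (1 - p))   # mean and SD of the count X

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# continuity correction: P(X >= 60) is approximated by P(X > 59.5)
approx = 1 - normal_cdf((59.5 - mu) / sd)

# exact binomial tail probability
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
            for k in range(60, n + 1))

print(f"CLT approx: {approx:.4f}   exact: {exact:.4f}")
```

Dividing by sd = √(np(1 − p)) converts the count scale to the z scale; the same calculation works on the p̂ scale after dividing everything by n.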
From Section 4.9 and 4.12 (point estimation) you should be able to:
· Understand the difference between an estimator and an estimate
· Evaluate the bias of an estimator (formulas, simulation)
o Explain what "unbiased" means. Why is this a desirable property of an estimator? Always?
o Understand the definition of bias in terms of overall systematic tendency
· Evaluate the variability of an estimator (formulas, simulation)
· Evaluate the mean square error of an estimator (bias² + variance)
· For example, be able to compare performance of estimators using Minitab simulation results
o Different simulation plans:
§ Estimating a probability by repeating a random process many times
§ Random sampling (e.g., estimating bias in a statistic)
· Sampling from a probability distribution
· Sampling from a finite population
· Bootstrapping (sampling from the sample with replacement, useful for estimating SD(statistic))
· You should also be able to find the expected value of an estimator and use that to suggest the form of an estimator that would be unbiased
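A simulation sketch of comparing estimators via bias and MSE, in the spirit of the Minitab comparisons above (the population values and sample size are invented): the usual s² (divide by n − 1, unbiased) versus the divide-by-n version (biased but less variable).

```python
# Compare two estimators of sigma^2 by bias, variance, and MSE
# (MSE = bias^2 + variance), using repeated random sampling.
import random
import statistics

random.seed(2)
mu, sigma, n, reps = 0.0, 5.0, 10, 20000
true_var = sigma ** 2

est_unbiased, est_divn = [], []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)       # divides by n - 1
    est_unbiased.append(s2)
    est_divn.append(s2 * (n - 1) / n)      # divides by n instead

for name, est in [("s^2 (n-1)", est_unbiased), ("div by n", est_divn)]:
    bias = statistics.mean(est) - true_var
    var = statistics.variance(est)
    mse = bias**2 + var                    # MSE = bias^2 + variance
    print(f"{name}: bias={bias:.3f}  variance={var:.2f}  MSE={mse:.2f}")
```

The divide-by-n estimator shows a clear negative bias but can still win on MSE, which is why "unbiased" is desirable but not always decisive.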
From Section 5.1-5.3 (Confidence Intervals) you should be able to:
· Understand the principle of confidence intervals
o General form: estimate ± (critical value)(standard error of estimate)
o margin of error = half-width of interval
§ measures amount of random sampling variability (only)
o specifies range of plausible values for parameter based on sample statistic
· Interpret “confidence” in your own words
o without using the words “confidence” or “sure” or “chance” or “probability of parameter”…
· Understand limitations, misinterpretations of confidence intervals
· Predict how intervals are affected by changes in sample size, confidence level, population size, parameter value
· Calculate and interpret confidence intervals for population mean, population proportion
o Know the “validity conditions” for each method and how to check them
§ CI for μ: X̄ ± t_{α/2, n−1} · s/√n, if SRS; n > 30 or pop normal (normal prob plot) -- t-interval
§ Prediction interval: X̄ ± t_{α/2, n−1} · s · √(1 + 1/n), if SRS; pop normal
§ CI for p:
· p̂ ± z_{α/2} · √(p̂(1 − p̂)/n), if SRS; np̂ > 10, n(1 − p̂) > 10 -- Wald
· (95%) p̃ ± 1.96 · √(p̃(1 − p̃)/(n + 4)), where p̃ = (X + 2)/(n + 4), if SRS -- Agresti-Coull (aka "Plus Four")
· Use a bootstrap distribution to estimate a confidence interval for other statistics (e.g., median)
· Determine necessary sample size n to ensure desired width or half-width for given confidence level
· Know how and when to calculate a prediction interval for an individual value (recognize language asking for this)
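The two proportion intervals above, side by side, with invented data (x = 12 successes out of n = 40):

```python
# 95% Wald and Agresti-Coull ("Plus Four") intervals for p,
# using hypothetical data.
import math

x, n = 12, 40          # successes and sample size (made up)
z = 1.96               # z_{0.025} for 95% confidence

# Wald: phat +/- z * sqrt(phat(1-phat)/n)
phat = x / n
me_wald = z * math.sqrt(phat * (1 - phat) / n)
print(f"Wald:          ({phat - me_wald:.3f}, {phat + me_wald:.3f})")

# Agresti-Coull: ptilde = (x+2)/(n+4), same form with n+4 in place of n
ptilde = (x + 2) / (n + 4)
me_ac = z * math.sqrt(ptilde * (1 - ptilde) / (n + 4))
print(f"Agresti-Coull: ({ptilde - me_ac:.3f}, {ptilde + me_ac:.3f})")
```

Note how "Plus Four" pulls the midpoint toward 1/2 and (here) slightly shrinks the margin of error because of the larger denominator.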
Earlier material to be especially aware of:
· Describing distributions of data numerically and graphically (and in context)
· What is probability?
· What is a random variable?
· What is the expected value of a random variable? Standard deviation?
· What is a pmf, cdf, pdf? How do I graph them? How do I express the function for all x?
· Being able to identify the appropriate discrete probability distribution.
· Calculating probabilities and expected value involving famous continuous and discrete distributions (e.g., binomial, normal) and generic distributions (integration?).
· Normal approximation to binomial, binomial approximation to hypergeometric
· Independence
· Conditional probability
· The distinction between “data” and “model” and between a distribution of data and a probability distribution
SOME PROBLEM SOLVING STRATEGIES

To find a probability:
1. Are you finding a conditional or an unconditional probability?
2. If it involves X̄ or p̂, can you use the central limit theorem to state the probability distribution?
3. If it involves a random variable which follows one of the common probability distributions (e.g., binomial, gamma) then use the formulas page (e.g., cdf).
Look for the phrase "approximate probability" in case one of the approximations (e.g., normal to binomial, Poisson to binomial) might apply.
You may have to recognize whether the random variable belongs to a known discrete probability family yourself.
4. If it involves a random variable but you are given the pmf or pdf, determine the probability directly (summing or integrating).
5. If it does not involve a random variable, use techniques from chapter 2 (permutations, combinations, addition rule, multiplication rule). Make sure you are not applying any results without checking assumptions first (e.g., mutually exclusive, independent).
6. If you are told only a situation, you could be asked to outline or interpret a simulation to determine empirical probabilities.
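Strategy 6 in miniature: estimate an empirical probability by repeating the random process many times. The toy process here (sum of two dice) is invented for illustration:

```python
# Empirical probability by simulation: estimate P(sum of two dice = 7)
# and compare to the exact value 6/36 = 1/6.
import random

random.seed(3)
reps = 100000
hits = sum(1 for _ in range(reps)
           if random.randint(1, 6) + random.randint(1, 6) == 7)
print("empirical P(sum = 7):", hits / reps)   # exact value is 1/6 ≈ 0.1667
```

On an exam you would outline these steps in words (repeat the process, count "successes," divide by the number of repetitions) rather than write code.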
To find an expected value:
1. If the expression is a linear function of random variable(s), first simplify using the rules for expected value
e.g., E(2X+3Y) = 2E(X)+3E(Y)
2. Once you get to E(rv), is the rv a sample mean (X̄) or a sample proportion (p̂)? If so, then E(X̄) = μ = E(X) or E(p̂) = p
3. Once you get to E(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the expected value
e.g., if Y is a binomial random variable, E(Y) = np
4. If Y is not from a common probability distribution family, determine E(Y) or E(h(Y)) directly given the pmf (summing) or pdf (integrating)
discrete: E(Y) = Σ y·P(Y=y) E(h(Y)) = Σ h(y)·P(Y=y)
continuous: E(Y) = ∫ y·f(y) dy E(h(Y)) = ∫ h(y)·f(y) dy
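Step 4 as a quick check, with an invented pmf (exact fractions so the hand calculation can be verified):

```python
# E(Y) and E(h(Y)) directly from a discrete pmf by summing.
from fractions import Fraction

pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}

ey = sum(y * p for y, p in pmf.items())        # E(Y) = sum of y * P(Y=y)
eh = sum(y**2 * p for y, p in pmf.items())     # E(h(Y)) with h(y) = y^2

print("E(Y)   =", ey)      # 2
print("E(Y^2) =", eh)      # 9/2
```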
To find a variance (or standard deviation):
1. If the expression is a linear function of random variable(s), first simplify using the rules for variance
e.g., V(2X+3Y) = 4V(X)+9V(Y) if X and Y are independent
2. Once you get to V(rv), is the rv a sample mean (X̄) or a sample proportion (p̂)? If so, then V(X̄) = σ²/n = V(X)/n or V(p̂) = p(1 − p)/n
3. Once you get to V(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the variance
e.g., if Y is a binomial random variable, V(Y) = np(1 – p)
4. If Y is not from a common probability distribution family, determine V(Y) directly given the pmf or pdf using V(Y) = E(Y²) − [E(Y)]²
discrete: E(Y²) = Σ y²·P(Y=y) continuous: E(Y²) = ∫ y²·f(y) dy
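Step 4 worked with an invented pmf, including a check of the linear-combination rule from step 1:

```python
# V(Y) = E(Y^2) - [E(Y)]^2 for a discrete pmf, plus the rule
# V(aY + b) = a^2 V(Y): a shift doesn't change variance, a scale squares.
from fractions import Fraction

pmf = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

ey = sum(y * p for y, p in pmf.items())        # E(Y) = 1
ey2 = sum(y**2 * p for y, p in pmf.items())    # E(Y^2) = 5/3
vy = ey2 - ey**2                               # V(Y) = 2/3
print("V(Y) =", vy)

# pmf of W = 2Y + 3, computed directly, to confirm V(W) = 4 V(Y)
pmf_w = {2 * y + 3: p for y, p in pmf.items()}
ew = sum(w * p for w, p in pmf_w.items())
vw = sum(w**2 * p for w, p in pmf_w.items()) - ew**2
assert vw == 4 * vy
```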
Note: the previous two sections discuss finding a "mean" or a "standard deviation." Remember, you could also be given a set of data and asked to use the techniques from chapter 1 to find X̄ and s.
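For that data-based case, a one-liner reminder (the data values are invented):

```python
# Computing xbar and s from a data set (chapter 1 techniques).
import statistics

data = [4.2, 5.1, 3.8, 6.0, 4.9]
xbar = statistics.mean(data)    # sample mean
s = statistics.stdev(data)      # sample SD (divides by n - 1)
print(f"xbar = {xbar:.3f}, s = {s:.3f}")
```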
0. Define the parameter in words (e.g., let p = proportion of all Cal Poly students who…)
1. Is it for a population mean or a population proportion?
If a population mean, are you told a value of σ? (If yes, use z; otherwise use t.)
2. Check the validity conditions to see whether our formulas are valid
3. Calculate the interval and write a one-sentence summary (e.g., "I'm 95% confident that …"), being clear about what parameter you are estimating
4. Be able to interpret the phrase “confidence” in your own words if asked (“95% of intervals…”)
5. Know what factors affect the behavior of the confidence intervals (width, midpoint, coverage)
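Steps 1-3 worked through for a population mean with σ unknown (so t, not z); the data and the 95% level are invented for illustration:

```python
# 95% t-interval for mu: xbar +/- t_{alpha/2, n-1} * s / sqrt(n)
import math
import statistics

data = [9.8, 10.4, 10.1, 9.6, 10.7, 10.2, 9.9, 10.3]  # hypothetical SRS
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)

t_crit = 2.365                  # t_{0.025, 7} from a t table (df = n - 1)
me = t_crit * s / math.sqrt(n)  # margin of error

print(f"95% CI for mu: ({xbar - me:.3f}, {xbar + me:.3f})")
```

Step 3's summary sentence would then read: "I'm 95% confident that the population mean (named in context) is between the two endpoints."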
· Know the "validity conditions" required by different procedures and how to check them.
· Be able to make interpretations and explanations of your calculations
· Remember to follow the "of": probability "of what"?!
MINITAB OUTPUT YOU COULD BE EXPECTED TO INTERPRET
· Numerical and graphical summaries of a distribution of data (e.g., histogram, interquartile range)
· Calculate probabilities and cumulative probabilities for known probability distributions (e.g., binomial, normal, gamma, etc.)
· Confidence intervals for μ, p
· Simulation, including through a macro
SOME GENERAL EXAM PREPARATION ADVICE:
· Be prepared to think/explain/interpret
· Understand rather than only memorize
· Don’t plan to rely heavily on the notes pages
· Reread handouts, earlier exams, the text
· Rework examples from class, homework exercises, review problems
· Consider the big ideas from labs
· Make sure you can read/use Minitab output
Notation, Acronyms:
| E(X) | expected value of the random variable X (aka μ) |
| V(X) | variance of the random variable X (aka σ²) |
| Z | standard normal random variable; number of standard deviations from the mean, (X − μ)/σ |
| μ | population mean; expected value of a random variable; mean of normal distribution |
| σ | population standard deviation; standard deviation of a random variable; SD of normal |
| X̄ | sample mean |
| s | sample standard deviation |
| p̂ | sample proportion of successes |
| p | population proportion of successes; probability of success |
| θ | generic unknown parameter |
| θ̂ | estimator of generic unknown parameter value |
| n | sample size (number of trials, number of observations recorded) |
| N | population size |
| q | 1 − p |
| σ_X̄ | standard deviation of sample means, SD(X̄) = σ/√n |
| μ_X̄ | mean of sample means, E(X̄) = μ |
| α, β | parameters of a distribution, e.g., Weibull, Gamma |
| λ | parameter of exponential, Poisson distributions |
| Γ | gamma function, see formulas page |
| Φ | cdf of standard normal distribution (we didn't use) |
| rv | random variable |
| pmf p(x) | probability that a discrete random variable is equal to x |
| pdf f(x) | integrated to determine the probability for a continuous random variable over an interval |
| cdf F(x) | cumulative distribution function, P(X ≤ x) for continuous or discrete random variable |