Stat 321 – Final Review

 

Final Exam Times: Wednesday, 10:10am-1pm

 

Finals week office hours: Tuesday 12-2 and by appointment

Review Session Tuesday?

 

Final Exam Format: This is a closed-book exam. You may bring three 8½ x 11 pages of notes (front and back); you will turn these pages in with the exam. A sample formulas page is available. The exam is cumulative but will emphasize more recent material, with a mixture of interpretation and calculation questions. See me for unclaimed graded assignments (quizzes, labs, exams). I especially encourage reviewing the “big picture” ideas from the labs.

 

Additional Review Problems:

 

In general, you should be able to

·         Distinguish (in words and with symbols) between a sample statistic and a parameter (of a population or of a probability distribution)

o   Understand what is random and what is constant (though perhaps unknown)

·         Understand the definition of (simple) random sample (SRS)

·         Understand what is meant by “sampling distribution of a statistic”

·         Know how to derive the exact sampling distribution of a statistic for small sample spaces (e.g., X discrete, n = 2)

·         Understand how we approximate the sampling distribution of a statistic/estimator using simulation

·         Be able to interpret a “repeated sampling” simulation done in Minitab

·         Interpret probability plots (section 4.10)
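The small-sample-space idea above (e.g., X discrete, n = 2) can be sketched in a few lines of Python; the pmf below is made up for illustration, and exact fractions are used so the answer is exact rather than simulated.

```python
from itertools import product
from fractions import Fraction

# Exact sampling distribution of x-bar for a small sample space:
# X takes values 0, 1, 2 with probabilities 1/2, 1/3, 1/6 (a made-up
# pmf for illustration), and we draw a random sample of size n = 2.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

sampling_dist = {}
for x1, x2 in product(pmf, repeat=2):          # all (x1, x2) outcomes
    xbar = Fraction(x1 + x2, 2)
    sampling_dist[xbar] = sampling_dist.get(xbar, 0) + pmf[x1] * pmf[x2]

# Probabilities sum to 1, and E(x-bar) equals E(X) = 0(1/2) + 1(1/3) + 2(1/6) = 2/3
print(sum(sampling_dist.values()))                   # 1
print(sum(v * p for v, p in sampling_dist.items()))  # 2/3
```

Listing the pairs by hand (as done in class for n = 2) gives the same table; the code just automates the enumeration.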

 

From Section 2.5 you should know how to

·     Derive the expected value of a linear combination of random variables (especially x̄ and p̂)

o   E(x̄) = μ  (always true)

o   E(p̂) = E(X/n) = p (always true when X is binomial)

·     Derive the variance of a linear combination of independent random variables (especially x̄ and p̂)

o   SD(x̄) = σ/√n (if independent rv’s)

o   SD(p̂) = √(p(1 − p)/n) (if random sample from a large population, independent observations)

 

From Section 4.11 you should be able to:

·     Use the Central Limit Theorem to predict the shape of the distribution of x̄

o   Normal if population distribution is normal

o   Approximately normal if sample size is “large” (e.g., n > 30)

·     Use the Central Limit Theorem to predict the shape of the distribution of p̂

o   Approximately normal if np > 10 and n(1 − p) > 10

·         Use simulations to verify the predictions made by the CLT

·     Calculate probabilities for sample statistics (e.g., the probability of observing x̄ > 3.12 for a given value of μ) and understand how these probabilities are affected by changes in sample size and parameter values.

o   Be able to sketch the theoretical sampling distribution (scale and label the horizontal axis).

o   Be able to apply the empirical rule

o   Be able to apply a continuity correction

o   Be able to interpret these probability statements in context.  Be able to make decisions about “claimed” values of the parameter based on this probability.

Note: Can pay less attention to the distribution of the sum or count (convert to x̄ or p̂ instead)
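A sketch of such a CLT calculation for p̂, with a continuity correction, using made-up values p = 0.25 and n = 100 (check np = 25 > 10 and n(1 − p) = 75 > 10 first):

```python
import math
from statistics import NormalDist

# CLT: p-hat is approximately normal with mean p and SD sqrt(p(1-p)/n).
p, n = 0.25, 100
sd = math.sqrt(p * (1 - p) / n)

# P(p-hat > 0.30) without and with a continuity correction.  Since
# X = count is integer-valued, "X > 30" means "X >= 31", so the
# corrected boundary on the p-hat scale is 30.5/n.
approx = 1 - NormalDist(p, sd).cdf(0.30)
corrected = 1 - NormalDist(p, sd).cdf(30.5 / n)
print(round(approx, 4))
print(round(corrected, 4))
```

Sketching the normal curve with the boundary marked (as the review asks) makes it clear which tail area is being computed.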

 

From Section 4.9 and 4.12 (point estimation) you should be able to:

·         Understand the difference between an estimator and an estimate

·         Evaluate the bias of an estimator (formulas, simulation)

o   Explain what “unbiased” means. Why is this a desirable property of an estimator?  Always?

o   Understand the definition of bias in terms of overall systematic tendency

·         Evaluate the variability of an estimator (formulas, simulation)

·         Evaluate the mean square error of an estimator (bias² + variance)

·         For example, be able to compare performance of estimators using Minitab simulation results

o   Different simulation plans:

§  Estimating a probability by repeating a random process many times

§  Random sampling (e.g., estimating bias in a statistic)

·         Sampling from a probability distribution

·         Sampling from a finite population

·         Bootstrapping (sampling from the sample with replacement, useful for estimating SD(statistic))

·         You should also be able to find the expected value of an estimator and use that to suggest the form of an estimator that would be unbiased
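One way to see bias, variance, and MSE working together is to compare two estimators by simulation, in the spirit of the Minitab labs. The example below (a standard one, not prescribed by the course) compares s² (divide by n − 1) with the divide-by-n variance estimator, using a made-up N(0, σ = 2) population:

```python
import random
import statistics

random.seed(7)

# Population variance is sigma^2 = 4; sample size n = 5.
sigma2, n, reps = 4.0, 5, 20000
s2_vals, v_vals = [], []
for _ in range(reps):
    x = [random.gauss(0, 2) for _ in range(n)]
    s2 = statistics.variance(x)        # divide by n-1 (unbiased)
    s2_vals.append(s2)
    v_vals.append(s2 * (n - 1) / n)    # divide by n (biased downward)

def bias_and_mse(estimates, truth):
    bias = statistics.mean(estimates) - truth
    return bias, bias**2 + statistics.pvariance(estimates)

# s^2 has (approximately) zero bias; the divide-by-n estimator is
# biased downward but can still have smaller MSE.
print(bias_and_mse(s2_vals, sigma2))
print(bias_and_mse(v_vals, sigma2))
```

This illustrates the "Always?" question above: unbiasedness is desirable, but a slightly biased estimator can win on MSE.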

 

From Section 5.1-5.3 (Confidence Intervals) you should be able to:

·         Understand the principle of confidence intervals

o   General form: estimate ± (critical value)(standard error of estimate)

o   margin of error = half-width of interval

§  measures amount of random sampling variability (only)

o   specifies range of plausible values for parameter based on sample statistic

·         Interpret “confidence” in your own words

o   without using the words “confidence” or “sure” or “chance” or “probability of parameter”…

·         Understand limitations, misinterpretations of confidence intervals

·         Predict how intervals are affected by changes in sample size, confidence level, population size, parameter value

·         Calculate and interpret confidence intervals for population mean, population proportion

o   Know the “validity conditions” for each method and how to check them

§  CI for μ:  x̄ ± t(α/2, n−1)·s/√n if SRS; n > 30 or pop normal (normal prob plot) -- t-interval

§  Prediction interval: x̄ ± t(α/2, n−1)·s·√(1 + 1/n) if SRS; pop normal

§  CI for p:

·         p̂ ± z(α/2)·√(p̂(1 − p̂)/n) if SRS; np̂ > 10, n(1 − p̂) > 10 -- Wald

·     (95%)  p̃ ± 1.96·√(p̃(1 − p̃)/(n + 4)) where p̃ = (X + 2)/(n + 4), if SRS -- Agresti-Coull (aka “Plus Four”)

·         Use a bootstrap distribution to estimate a confidence interval for other statistics (e.g., median)

·         Determine necessary sample size n to ensure desired width or half-width for given confidence level

·         Know how and when to calculate a prediction interval for an individual value (recognize language asking for this)
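The bootstrap bullet above can be sketched in a few lines of Python (percentile method for a median; the data are invented for illustration):

```python
import random
import statistics

random.seed(3)

# Bootstrap: resample the observed sample WITH replacement many times,
# compute the statistic each time, and take the middle 95% of the
# bootstrap distribution as an interval estimate for the median.
data = [12.1, 9.8, 14.3, 11.0, 10.5, 13.7, 9.2, 12.9, 11.8, 10.1,
        15.2, 11.4, 12.6, 10.9, 13.1]

boot_medians = []
for _ in range(5000):
    resample = random.choices(data, k=len(data))   # with replacement
    boot_medians.append(statistics.median(resample))

boot_medians.sort()
lo = boot_medians[int(0.025 * 5000)]   # 2.5th percentile
hi = boot_medians[int(0.975 * 5000)]   # 97.5th percentile
print(lo, hi)
```

This is useful exactly when no simple formula-based CI exists for the statistic (e.g., the median), as noted above.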

 

Earlier material to be especially aware of:

·         Describing distributions of data numerically and graphically (and in context)

·         What is probability?

·         What is a random variable?

·         What is the expected value of a random variable? Standard deviation?

·         What is a pmf, cdf, pdf? How do I graph them? How do I express the function for all x?

·         Being able to identify the appropriate discrete probability distribution.

·         Calculating probabilities and expected values involving famous continuous and discrete distributions (e.g., binomial, normal) and generic distributions (integration?).

·         Normal approximation to binomial, binomial approximation to hypergeometric

·         Independence            

·         Conditional probability          

·         The distinction between “data” and “model” and between a distribution of data and a probability distribution

 

SOME PROBLEM SOLVING STRATEGIES

If you are asked to determine a probability

1. Are you finding a conditional or an unconditional probability?

2. If it involves x̄ or p̂, can you use the central limit theorem to state the probability distribution?

3. If it involves a random variable which follows one of the common probability distributions (e.g., binomial, gamma) then use the formulas page (e.g., cdf).

Look for the phrase “approximate probability” in case one of the approximations (e.g., normal to binomial, Poisson to binomial) might apply.

You may have to recognize whether the random variable belongs to a known discrete probability family yourself.

4. If it involves a random variable but you are given the pmf or pdf, determine the probability directly (summing or integrating).

5. If it does not involve a random variable, use techniques from chapter 2 (permutations, combinations, addition rule, multiplication rule). Make sure you are not applying any results without checking assumptions first (e.g., mutually exclusive, independent).

6. If you are told only a situation, you could be asked to outline or interpret a simulation to determine empirical probabilities.

 

If you are asked to determine the expected value of an expression

1. If the expression is a linear function of random variable(s), first simplify using the rules for expected value

            e.g., E(2X+3Y) = 2E(X)+3E(Y)

2. Once you get to E(rv), is the rv a sample mean (x̄) or a sample proportion (p̂)?  If so, then E(x̄) = μ = E(X) or E(p̂) = p

3. Once you get to E(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the expected value

            e.g., if Y is a binomial random variable, E(Y) = np

4. If Y is not from a common probability distribution family, determine E(Y) or E(h(Y)) directly given the pmf (summing) or pdf (integrating)

            discrete: E(Y) = Σ y·P(Y=y)      E(h(Y)) = Σ h(y)·P(Y=y)

            continuous: E(Y) = ∫ y·f(y) dy    E(h(Y)) = ∫ h(y)·f(y) dy
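Step 4 in the discrete case can be sketched directly (the pmf below is made up; exact fractions keep the arithmetic clean):

```python
from fractions import Fraction

# Y has pmf P(Y=1)=1/4, P(Y=2)=1/2, P(Y=3)=1/4; compute E(Y) and
# E(h(Y)) for h(y) = y^2 by summing over the support.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}

EY = sum(y * p for y, p in pmf.items())        # Sum of y P(Y=y)
Eh = sum(y**2 * p for y, p in pmf.items())     # Sum of h(y) P(Y=y)
print(EY)   # 1/4 + 1 + 3/4 = 2
print(Eh)   # 1/4 + 2 + 9/4 = 9/2
```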

 

If you are asked to determine the variance of an expression

1. If the expression is a linear function of random variable(s), first simplify using the rules for variance

            e.g., V(2X+3Y) = 4V(X)+9V(Y) if X and Y are independent

2. Once you get to V(rv), is the rv a sample mean (x̄) or a sample proportion (p̂)?  If so, then V(x̄) = σ²/n = V(X)/n or V(p̂) = p(1 − p)/n

3. Once you get to V(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the variance

            e.g., if Y is a binomial random variable, V(Y) = np(1 – p)

4. If Y is not from a common probability distribution family, determine V(Y) directly given the pmf or pdf using V(Y)=E(Y2) - [E(Y)]2

            discrete: E(Y²) = Σ y²·P(Y=y)               continuous: E(Y²) = ∫ y²·f(y) dy
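The shortcut in step 4 can be sketched with another made-up pmf:

```python
from fractions import Fraction

# Y has pmf P(Y=0)=1/2, P(Y=1)=1/4, P(Y=2)=1/4; use the shortcut
# V(Y) = E(Y^2) - [E(Y)]^2.
pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}

EY = sum(y * p for y, p in pmf.items())        # = 3/4
EY2 = sum(y**2 * p for y, p in pmf.items())    # = 5/4
VY = EY2 - EY**2
print(VY)   # 5/4 - 9/16 = 11/16
```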

 

Note: the previous two sections discuss finding a “mean” or a “standard deviation.”  Remember, you could also be given a set of data and asked to use the techniques from chapter 1 to find x̄ and s.

 

If you are asked to compute a confidence interval

0. Define the parameter in words (e.g., let p = proportion of all Cal Poly students who…)

1. Is it for a population mean or a population proportion?

            If a population mean, are you told a value of σ? (if yes use z, otherwise use t)

2. Check the validity conditions to see whether our formulas are valid

3. Calculate the interval and write a one sentence summary (e.g., “I’m 95% confident that” being clear what the parameter is you are estimating)

4. Be able to interpret the phrase “confidence” in your own words if asked (“95% of intervals…”)

5. Know what factors affect the behavior of the confidence intervals (width, midpoint, coverage)
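Steps 0-3 for a t-interval might look like this in Python. The data are made up, and the critical value t(0.025, 14) = 2.145 is hard-coded from a t table, since the standard library has no t quantile function:

```python
import math
import statistics

# Made-up sample of n = 15 fill weights (oz); parameter: mu = mean fill
# weight of all bottles from this process.
data = [16.1, 15.8, 16.3, 15.9, 16.0, 16.2, 15.7, 16.4, 16.1, 15.9,
        16.0, 16.3, 15.8, 16.2, 16.1]
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t_crit = 2.145                         # t(alpha/2 = 0.025, df = 14), from table

# 95% CI for mu: x-bar +/- t * s / sqrt(n)
margin = t_crit * s / math.sqrt(n)
print(round(xbar - margin, 3), round(xbar + margin, 3))
```

The one-sentence summary (step 3) would then read: "I'm 95% confident that the mean fill weight of all bottles from this process is between the two printed endpoints."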

 

·         Know the “validity conditions” required by different procedures and how to check them.

·         Be able to interpret and explain your calculations

·         Remember to follow the “of”: probability of what?!

 

MINITAB OUTPUT YOU COULD BE EXPECTED TO INTERPRET

·         Numerical and graphical summaries of a distribution of data (e.g., histogram, interquartile range)

·         Calculate probabilities and cumulative probabilities for known probability distributions (e.g., binomial, normal, gamma, etc.)

·         Confidence intervals for μ, p

·         Simulation, including through a macro

 

SOME GENERAL EXAM PREPARATION ADVICE:

·         Be prepared to think/explain/interpret

·         Understand rather than only memorize

·         Don’t plan to rely heavily on the notes pages

·         Reread handouts, earlier exams, the text

·         Rework examples from class, homework exercises, review problems

·         Consider the big ideas from labs

·         Make sure you can read/use Minitab output

 

Notation, Acronyms:

E(X)        expected value of the random variable X (aka μ)

V(X)        variance of the random variable X (aka σ²)

Z           standard normal random variable; number of standard deviations from the mean, (X − μ)/σ

μ           population mean; expected value of a random variable; mean of normal distribution

σ           population standard deviation; standard deviation of a random variable; SD of normal

x̄           sample mean

s           sample standard deviation

p̂           sample proportion of successes

p           population proportion of successes; probability of success

θ           generic unknown parameter

θ̂           estimator of generic unknown parameter

n           sample size (number of trials, number of observations recorded)

N           population size

q           1 − p

SD(x̄)       standard deviation of sample means: SD(x̄) = σ/√n = SD(X)/√n

E(x̄)        mean of sample means: E(x̄) = μ = E(X)

α, β        parameters of a distribution, e.g., Weibull, Gamma

λ           parameter of the exponential and Poisson distributions

Γ           gamma function, see formulas page

Φ           cdf of the standard normal distribution (we didn’t use)

rv          random variable

pmf p(x)    probability that a discrete random variable is equal to x

pdf f(x)    integrated to determine the probability for a continuous random variable over an interval

cdf F(x)    cumulative distribution function, P(X ≤ x), for a continuous or discrete random variable