Stat 301 – Final Review

Optional Review Sessions: Sunday 5pm (Zoom), Tuesday 4pm (Zoom)

Final Exam: 12pm section Monday 10:10am-1pm, 1pm section Wednesday 1:10-4pm

Format: You are allowed 3 pages of notes. If I want you to use a datafile, I will post it as a .txt file for easy loading. The computer questions won’t be a huge component, but I do want some demonstration that you can carry out an analysis.

The exam will be more of a “performance assessment.” Question types could include

- Design a study to answer a particular research question

- Explore data and how they agree/don’t agree with a particular research conjecture, suggest steps for “cleaning” the data or next steps in the analysis

- Use R and/or an applet to analyze data to answer a particular research question

- Summarize the conclusions of a study based on the study description and provided output, justifying why you can or cannot do certain things (e.g., significance, estimation, generalization, causation)

- Develop methodology for a new type of comparison or statistic

You will be expected to apply the knowledge you have gained this quarter. Because this may involve more writing, you will be asked to type in your responses, but can also write them out if you prefer. You will have access to R and applets, but if you aren’t sure how to carry out a step in either software, I will likely be able to help you (e.g., I want to do x, but it’s not working). It will be important for you to document any steps for which you use the computer (e.g., tell me what R command or what applet you used, what inputs you used, etc.). You will have the option to screen capture and upload output.

The final exam will be cumulative, including a comparison of methods across chapters. You should focus on the entire statistical process: How do we design a study to achieve particular research goals? How do we describe the data we have? How do we test claims about population parameters or processes and/or estimate parameters? How do we state our final conclusions, especially after considering how the study was conducted? You should also think about the reasoning behind the statistical methods, such as standardizing, significance, and confidence in general.

Advice: Understand and be able to apply the procedures first, then worry about more subtle issues and review the how and why behind the development of the procedure. Also, for all the procedures, know what must be done by hand and what can be done on the computer. Review the Case Study (trying the questions from scratch) as well as the “final exam multiple choice practice” in Canvas.

The focus of Chapter 4 was a quantitative response variable.

From Section 4.1 you should know that

· You can create an “exact randomization” distribution if you list all possible random assignments to the two groups, identify an appropriate statistic (formula) and calculate the statistic for each possible random assignment, count how many of the random assignments give a statistic at least as extreme as the one you observed, and evaluate the p-value.
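As a minimal sketch of that enumeration (written in Python rather than R, with made-up data), the hypothetical function below lists every way to split the pooled responses into groups of the original sizes, uses the difference in group means as the statistic, and counts the splits at least as extreme as the observed one (two-sided). With realistic sample sizes the number of splits explodes, which is why we usually simulate instead.

```python
from itertools import combinations
from statistics import mean


def exact_randomization_pvalue(group1, group2):
    """Exact two-sided randomization p-value for a difference in means.

    Hypothetical helper: enumerates every possible random assignment of the
    pooled responses to groups of the original sizes.
    """
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    observed = mean(group1) - mean(group2)
    count_extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(g1) - mean(g2)
        total += 1
        # Two-sided: at least as far from 0 as observed (small tolerance for float ties)
        if abs(diff) >= abs(observed) - 1e-12:
            count_extreme += 1
    return observed, count_extreme, total, count_extreme / total


# Made-up responses for three treatment units and three control units
obs, count_extreme, total, p_value = exact_randomization_pvalue([12, 15, 14], [9, 10, 11])
print(f"observed diff = {obs:.3f}; {count_extreme} of {total} assignments as extreme; p = {p_value:.3f}")
```

Here only 2 of the 20 equally likely assignments (the observed split and its mirror image) are as extreme, so the exact p-value is 2/20 = 0.10.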

From Section 4.2 you should know how to

· Create numerical and graphical summaries comparing two groups with a quantitative variable

o Compare distributions in terms of shape, center, spread, and unusual observations (citing appropriate numerical evidence when available)

· Including what is meant by “variability” in a distribution and different ways it can be measured

· Possibly explain outliers (know how they can be identified)

o Make sure your comments are always in context (including measurement units)

· Predict the behavior of the sampling distribution of the differences in two sample means from random sampling

(Sampling from two populations applet; see also the population models version), including whether the sampling distribution of the differences in sample means should be normal or approximately normal

o Reasoning behind the standard error formula (adding variances); it should make sense to you that the SE of the difference is larger than the SE of either individual sample mean

o Reasoning behind the t-distribution for the standardized statistic (additional uncertainty from estimating the standard deviations as well as the means)

· Carry out a two-sample t-test for the difference in population means using technology (R, TBI applet)

o State null and alternative hypotheses (in symbols and in words)

o Assess validity of the procedure

o Raw data vs. Summary statistics, Stacked vs. Unstacked data

o Interpret results, including interpretations of test statistic and p-value (in terms of random sampling)

· Factors that influence the p-value and the midpoint and width of the confidence interval

· Determine and interpret a two-sample t-confidence interval for the difference in population means

o Make sure confidence level, mean, variable (context), and “direction” are clear in interpretation
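The standard error and standardized statistic from this section can be computed directly from summary statistics. The minimal Python sketch below uses hypothetical summary values and a hypothetical helper name. It adds the variances of the two sample means for the unpooled SE (so the SE of the difference exceeds either group's own SE), shows the pooled SD and SE for comparison, uses the Welch-Satterthwaite formula for the degrees of freedom, and forms a rough 95% interval with 2 as the multiplier rather than an exact t*. R's `t.test()` defaults to this unpooled (Welch) version (`var.equal = FALSE`).

```python
import math


def two_sample_t_summary(xbar1, s1, n1, xbar2, s2, n2, null_diff=0.0):
    """Two-sample t quantities from summary statistics (hypothetical helper)."""
    var_mean1 = s1**2 / n1  # variance of the first sample mean
    var_mean2 = s2**2 / n2  # variance of the second sample mean
    se = math.sqrt(var_mean1 + var_mean2)  # unpooled SE: variances add
    diff = xbar1 - xbar2
    t_stat = (diff - null_diff) / se
    # Welch-Satterthwaite degrees of freedom for the unpooled t-test
    df = (var_mean1 + var_mean2) ** 2 / (
        var_mean1**2 / (n1 - 1) + var_mean2**2 / (n2 - 1)
    )
    # Pooled version: one common SD estimated from both samples combined
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)
    # Rough 95% CI using 2 as the multiplier (an exact t* would differ slightly)
    ci = (diff - 2 * se, diff + 2 * se)
    return {"diff": diff, "se": se, "t": t_stat, "df": df,
            "sp": sp, "se_pooled": se_pooled, "ci95_approx": ci}


# Hypothetical summaries: group 1 (mean 20, SD 4, n 16), group 2 (mean 17, SD 3, n 9)
result = two_sample_t_summary(20, 4, 16, 17, 3, 9)
for name, value in result.items():
    print(name, value)
```

With these numbers each sample mean has SE 1, while the SE of the difference is √2 ≈ 1.41, which illustrates why the SE of a difference is larger than either individual SE.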

From Section 4.3 you should be able to

· Set up and explain the reasoning process behind simulating a randomization test (Comparing Groups – Quantitative applet)

o How the simulation models the null hypothesis being true and the random assignment in the study

o How to simulate the null distribution (random assignment vs. random sampling)

o How to use the generated null distribution to calculate an empirical p-value (one or two-sided)

o Provide a detailed interpretation of what the p-value is measuring in the context of the research study (and in terms of random assignment)

o Make a decision about the null hypothesis based on the p-value

o Be able to do this for difference in sample means, difference in sample medians, or other sample statistics

· Summarize (and justify) study conclusions in terms of significance, estimation (=confidence interval), causation, and generalizability

· Check the technical conditions to assess the validity of the t procedures

· Carry out two-sample t-tests for comparing two long-run treatment means using technology (R, TBI applet)

o Consider alternatives if technical conditions are not met (e.g., simulation, transformations)

o Be able to test hypothesized differences other than zero

· Calculate (using R, TBI applet) and interpret two-sample t-confidence intervals for the difference in long-run treatment means

o Explore a data transformation for improving the validity of a t-procedure

o Interpret a confidence interval after a log transformation in terms of the original units (multiplicative change in median)

· Explain the effects of the difference in sample means, sample sizes, and within group variability on the p-value, confidence interval, and power
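The shuffling logic behind the Comparing Groups (Quantitative) applet can be sketched in a few lines. The Python function below is a hypothetical stand-in, not the applet's code: it shuffles the responses between the two groups many times and reports the proportion of shuffles giving a difference in means at least as far from the hypothesized value as the observed one (two-sided). The `null_diff` argument follows the recipe in the Other Notes for testing a non-zero difference: subtract the hypothesized effect from group 1, shuffle, then add the effect back.

```python
import random
from statistics import mean


def shuffle_test(group1, group2, null_diff=0.0, reps=10000, seed=301):
    """Simulation-based two-sided randomization test of mu1 - mu2 = null_diff."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = mean(group1) - mean(group2)
    n1 = len(group1)
    # Remove the hypothesized treatment effect so the groups are equivalent under H0
    responses = [x - null_diff for x in group1] + list(group2)
    count_extreme = 0
    for _ in range(reps):
        rng.shuffle(responses)  # a new random assignment to the two groups
        # Add the treatment effect back to whichever units land in group 1
        sim_diff = (mean(responses[:n1]) + null_diff) - mean(responses[n1:])
        if abs(sim_diff - null_diff) >= abs(observed - null_diff) - 1e-12:
            count_extreme += 1
    return observed, count_extreme / reps


treatment = [12, 15, 14]  # made-up data
control = [9, 10, 11]
print(shuffle_test(treatment, control))                 # H0: no treatment effect
print(shuffle_test(treatment, control, null_diff=2.0))  # H0: effect of 2 units
```

For H₀: no difference, the empirical p-value should land near 0.10, the exact randomization p-value for these tiny samples (only 2 of the 20 possible assignments are as extreme).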

From Section 4.4 you should be able to:

· Distinguish between matched pairs designs and two-independent “samples” (or completely randomized) designs

o Consider benefits and disadvantages (including feasibility) of the two types of designs

o Be able to describe how to set up a matched-pairs design, including randomizing the order of the treatments when each unit is measured twice, or randomizing which unit in each pair of similar experimental units receives which treatment

· Determine whether a data collection plan necessitates a matched pairs analysis

· Create numerical and graphical summaries comparing matched pairs data by calculating and examining the differences

· Carry out and interpret a matched pairs test simulation

o Logic behind it (exchangeability of the two responses within each pair)

o Interpreting the results

· Use technology (R, TBI applet on the differences, Matched Pairs applet) to carry out and interpret a matched pairs t-test

· Use technology (R, TBI applet, Matched Pairs applet) to calculate and interpret a matched pairs t-interval

· Use technology to carry out and interpret a sign test with paired data

o Exact binomial and/or normal approximation

· Set up a two-way table with paired categorical data (the two treatments are the two variables = row/column variables)

· Use technology (R, One Proportion applet) to carry out and interpret McNemar’s test

o Exact binomial and/or normal approximation
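Both the sign test and the exact version of McNemar's test reduce to a binomial calculation with probability 0.5. The Python sketch below (hypothetical helper names, made-up data) computes a two-sided exact binomial p-value by adding the probabilities of all outcomes no more likely than the observed count; for π = 0.5 this matches doubling the smaller tail (capped at 1). In R, `binom.test(k, n)` gives an exact binomial p-value of this kind.

```python
from math import comb


def exact_binomial_two_sided(k, n, p=0.5):
    """Sum P(X = i) over every i no more likely than the observed k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return sum(pr for pr in probs if pr <= cutoff)


def sign_test(differences):
    """Sign test on paired differences: drop zero differences, count positives."""
    nonzero = [d for d in differences if d != 0]
    positives = sum(1 for d in nonzero if d > 0)
    return positives, len(nonzero), exact_binomial_two_sided(positives, len(nonzero))


def mcnemar_exact(b, c):
    """Exact McNemar test: only the b + c discordant pairs carry information,
    and under H0 each is equally likely to fall in either off-diagonal cell."""
    return exact_binomial_two_sided(b, b + c)


# Made-up paired differences (after - before) for 8 subjects
print(sign_test([3, -1, 2, 0, 5, 4, 1, 2]))
# Made-up 2x2 table with 2 and 10 pairs in the two discordant cells
print(mcnemar_exact(2, 10))
```

In the sign-test example the zero difference is dropped, leaving 6 positives out of 7, and the two-sided p-value is 16/128 = 0.125.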

Other Notes

· Be able to distinguish the standard deviation of the sample, the pooled standard deviation for two samples (combining the groups together to estimate one SD), and the standard deviation of the distribution of the difference in sample means (the pooled and unpooled versions), ~~the version with the “correlation” between the two sets of measurements~~

· Keep in mind that a confidence interval doesn’t really give you “additional evidence” (against the null hypothesis) but it’s a different way of presenting the same evidence

o Keep in mind that we never can use the procedures in this course to establish evidence for the null hypothesis (“absence of evidence is not evidence of absence”)

· We can easily test non-zero values for the hypothesized difference in means

o How to do this with a t-test

o How to represent this in a simulation (e.g., need to remove the treatment effect, then groups should be equivalent so shuffle, and then add the treatment effect back)

· Be able to distinguish/explain bootstrapping vs. sampling from a finite (hypothetical) population

o Key goal is estimating the sample-to-sample variation in the statistic

o Can compare to theoretical results, but the latter are only available for certain statistics

· Can use bootstrapping to compare two samples

o Bootstrapping from each sample separately vs. pooling together the samples first
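Here is a sketch of the "bootstrap each sample separately" option, in Python with made-up data and a hypothetical helper name: each sample is resampled with replacement at its own size, and the spread of the resulting differences in means estimates the sample-to-sample variation of the statistic. Pooling the two samples before resampling would instead mimic a world where the two populations are the same, which is closer to the setting of a test than to estimating this variability.

```python
import random
from statistics import mean, stdev


def bootstrap_diff_means(sample1, sample2, reps=5000, seed=301):
    """Bootstrap differences in means, resampling each sample separately."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        boot1 = rng.choices(sample1, k=len(sample1))  # with replacement, same n
        boot2 = rng.choices(sample2, k=len(sample2))
        diffs.append(mean(boot1) - mean(boot2))
    return diffs


sample1 = [12, 15, 14, 18, 11, 16, 13, 17]  # made-up data
sample2 = [9, 10, 11, 8, 12, 10]
diffs = bootstrap_diff_means(sample1, sample2)
boot_se = stdev(diffs)
# Theory-based comparison: unpooled SE from the sample SDs
theory_se = (stdev(sample1) ** 2 / len(sample1) + stdev(sample2) ** 2 / len(sample2)) ** 0.5
print(f"observed diff = {mean(sample1) - mean(sample2):.2f}")
print(f"bootstrap SE = {boot_se:.3f}, theory-based SE = {theory_se:.3f}")
```

Expect the bootstrap SE to run a bit smaller than the theory-based SE for samples this small, because resampling treats each sample as if it were the whole population.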

Keep in mind

· When comparing distributions, remember to cite your evidence if you think there is a difference in the groups. In particular, tell me what you see in the summary statistics (e.g., a lower mean improvement for the sleep-deprived group) that leads to your conclusion (e.g., sleep-deprived subjects tend to have smaller improvements)

· Remember that the confidence level refers to the reliability of the method: how often, in the long run, random samples will produce an interval that succeeds in capturing the population parameter

· Remember to think about/decipher the direction of subtraction used by the technology

· Why is variability in the data an important consideration and how can we reduce it? What are advantages and disadvantages to doing so?

· We can use a two-sample t-test even when the sample sizes are small if we have reason to believe the population distributions are themselves normally distributed. If you don’t have past experience with the variable, you can try to judge this from graphs of the sample data. If the sample data look plausibly normally distributed (normal probability plots are a useful tool for making this judgment), you can cite this as evidence that the population distributions are plausibly normal. If you aren’t sure, use an alternative analysis (e.g., simulation-based) instead.

· Keep in mind the two-sample t-test only compares the two means (vs. other aspects of the distributions or other measures of center)

· Try to avoid the word “accurate” without explaining exactly what you mean by it.

· Try to avoid using the word “group” on its own; instead clarify whether you mean the sample, the population, or the treatment in general

· Avoid use of the word “it”

The Cumulative Component (also see old Review handouts)

Things to remember include:

· Identifying observational units and defining variables, samples vs. populations vs. null (sampling/randomization) distributions, parameters vs. statistics, explanatory vs. response variable, bias vs. precision, random assignment vs. random sampling (including goals)

· Experiments vs. Observational Studies

o How to design a randomized experiment, How to properly select a sample

o Scope of conclusions depending on how study was conducted (Can you draw a cause and effect conclusion? Can you generalize to a larger population?)

o Sampling errors, nonsampling errors, and random sample errors (and which of these are measured by the “margin of error”?)

· Describing and comparing distributions of data

o Categorical: segmented bar graphs, conditional percentages, difference in proportions vs. relative risk vs. odds ratio (and how to interpret)

o Quantitative: shape, center, and spread, stemplots, boxplots, histograms, dotplots, resistance of median and IQR

o When describing distributions, if you have access to numerical summaries, use them to support your claims

· How to interpret probability as a long-run relative frequency

· How to carry out a test of significance

o About a population proportion and/or population mean and/or treatment effect and/or difference in population proportions and/or difference in population means

· Make sure you can state Ho and Ha in symbols and in words

o One-sided vs. two-sided alternatives

o Which technical conditions apply and how to check them and what they tell you

· e.g., proportions: nπ > 10 and n(1 − π) > 10; means: n > 30 or normal population

o Interpretation of test statistic (if appropriate)

· General form: (statistic-hypothesized)/(standard error of statistic)

o Ideas and distinctions of sampling distribution and randomization distribution

o How to calculate and/or approximate p-value

o How to make a decision based on the p-value and level of significance

o How to interpret the p-value

· Source of randomness, choice of statistic, observed result, direction, null hypothesis

o Factors that affect the size of the p-value

o ~~Defining (and stating the consequences of) Type I and Type II Errors in context (including direction of Ha)~~

o ~~How to determine the probabilities of a Type I Error and of a Type II Error and Power~~

· ~~Type II/Power is for a particular instance of alternative hypothesis~~

o Explain the idea of power and what things might affect the power of a study/analysis (e.g., sample size(s), type of study design)

· How to calculate and interpret a confidence interval

o General form: observed statistic ± (critical value)×(standard error of statistic)

· Note, sometimes we think of “statistic” as a single number and sometimes as a formula…

o Interpretation: level, parameter, context (with differences/ratios, include “direction”)

· Clarify larger population/process

o Interpret confidence “level” (separate from interpreting interval)

o How to solve for the sample size necessary to obtain a specific margin of error for a stated confidence level

· Duality between intervals and tests: any parameter value not contained in a C% CI will be rejected by a two-sided test at the (100 − C)/100 significance level

· Describe the difference between statistical significance and practical significance (e.g., is it a meaningful difference in context)

· Calculating p-values for Fisher’s Exact Test and/or binomial process (when ok to do)

· How to decide which procedure you should use (quantitative or categorical data, one or two populations, Fisher’s Exact Test vs. binomial vs. normal vs. t)

· For validity of “theory-based” procedures, I tend to worry less about the randomness condition and more about the sample size condition. The randomness condition is more important to scope of conclusions.

· ~~Be able to get t* and z* values using technology~~ use 2 as the approximate multiplier with 95% confidence

· Make sure you know where ()s go in prediction interval standard error

See summary tables (including on technology) and end of chapter examples
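To connect the general forms above (standardized statistic, statistic ± multiplier × SE, and solving for sample size), here is a short Python sketch for a single proportion using made-up counts. The helper names are hypothetical. It uses the hypothesized π₀ in the SE for the test and p̂ in the SE for the interval, 2 as the 95% multiplier, 0.5 as a conservative planning guess for the sample-size calculation, and the "at least 10 successes and at least 10 failures" check from the summary table.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def one_proportion_z(successes, n, pi0):
    phat = successes / n
    valid = successes >= 10 and (n - successes) >= 10  # validity check
    se_null = math.sqrt(pi0 * (1 - pi0) / n)  # SE assuming H0 is true
    z = (phat - pi0) / se_null  # (statistic - hypothesized) / SE
    p_two_sided = 2 * (1 - normal_cdf(abs(z)))
    se_hat = math.sqrt(phat * (1 - phat) / n)  # SE estimated from the sample
    ci = (phat - 2 * se_hat, phat + 2 * se_hat)  # statistic ± 2 × SE
    return z, p_two_sided, ci, valid


def sample_size_for_margin(margin, multiplier=2, planning_guess=0.5):
    """Smallest n with multiplier * sqrt(p(1 - p)/n) <= margin."""
    return math.ceil((multiplier / margin) ** 2 * planning_guess * (1 - planning_guess))


z, p_value, ci, valid = one_proportion_z(60, 100, 0.5)  # made-up: 60 successes in 100
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}, approx 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), valid = {valid}")
print(sample_size_for_margin(0.03))  # n needed for a 3-percentage-point margin
```

Here z = 2, the two-sided p-value is about 0.046, and the interval's lower end sits just above 0.5, so the test and interval agree. The duality holds only approximately in general because the test and interval use different SEs. The sample-size call returns 1112.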

Some big picture stuff

What is Statistical Inference?

The population parameter and the sample statistic summarize the same variable. The population parameter summarizes the variable for the population, which is what we want to know, e.g., π or μ₁ − μ₂. However, we can’t observe the whole population, so we don’t know what the parameter value really is. Instead, we can measure the variable on a sample or on randomized groups and compute a sample statistic, e.g., p̂ or x̄₁ − x̄₂. The question is what we can infer about the parameter based on this statistic. Because these statistics follow a null distribution (a regular pattern) due to the randomness in the study design, we can estimate or calculate probabilities of different values of the statistic occurring. Different statistics follow different distributions, but once we know which distribution we should use, we can draw conclusions about the value of the parameter, e.g., that it is in some interval or that we have evidence it is not a particular value.

Null Distributions

If we specify a value for the population parameter, we can take (or simulate) lots of samples from this population or lots of random assignments and calculate a statistic for each sample/random assignment. This allows us to examine the behavior of the statistic so we can discuss the shape, center, and variability of this “null” distribution. For example, what types of values do we expect the statistic to have, how far away might the statistic stray from the hypothesized value of the parameter?

Simulation vs. “Large sample” (Theory-Based) procedures

In almost every case we have seen two different ways to approximate the p-value: simulation and a mathematical model. We are considering the simulation approaches to always be valid. The mathematical models are only appropriate if the “validity conditions” of the theory-based approach are met. The advantage of the mathematical models is we can easily get a confidence interval as well. So you may want to consider the mathematical model way first but then if the validity conditions aren’t met, use the simulation approach.
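The comparison between the two approaches can be seen in a small simulation. Assuming a made-up study with 60 successes in 100 trials and H₀: π = 0.5, the Python sketch below (hypothetical helper names) flips 100 fair "coins" many times to build the null distribution of the sample proportion, checks its center and spread against the theory-based values π₀ = 0.5 and √(π₀(1 − π₀)/n) = 0.05, and computes an empirical two-sided p-value that you can compare with the normal-approximation p-value of about 0.046.

```python
import math
import random
from statistics import mean, stdev


def simulate_null_proportions(n, pi0, reps=10000, seed=301):
    """Sample proportions from `reps` simulated samples of size n when H0 is true."""
    rng = random.Random(seed)
    return [sum(rng.random() < pi0 for _ in range(n)) / n for _ in range(reps)]


def empirical_p_value(null_stats, observed, hypothesized):
    """Two-sided: proportion of simulated statistics at least as far from H0 as observed."""
    distance = abs(observed - hypothesized)
    extreme = sum(abs(s - hypothesized) >= distance - 1e-12 for s in null_stats)
    return extreme / len(null_stats)


n, pi0, phat = 100, 0.5, 0.60  # made-up study results
null_props = simulate_null_proportions(n, pi0)
print(f"null distribution: mean = {mean(null_props):.3f}, SD = {stdev(null_props):.4f}")
print(f"theory-based SD = {math.sqrt(pi0 * (1 - pi0) / n):.4f}")
print(f"empirical two-sided p-value = {empirical_p_value(null_props, phat, pi0):.4f}")
```

Expect the simulated p-value to be somewhat larger than the normal approximation (near the exact binomial value of about 0.057), because the normal approximation used here ignores the discreteness of the counts.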

Confidence Intervals Estimate population parameter

The goal of a confidence interval is to get a range of plausible values for the population parameter. To do this, we use the sample statistic and a measure of the sampling (or shuffle-to-shuffle) variability of the sample statistic. This lets us form an interval around the sample statistic that should contain the population parameter. Note that we are trying to capture the population parameter in the interval, not the data and not the sample statistic. In fact, for the symmetric intervals in this course, the sample statistic should be the midpoint (center) of the interval.

Tests of Significance Test claim about population parameter

The goal of a test of significance is to make a decision about the population parameter. Here are the steps we use:

1) Define the parameter(s) of interest. (Should also be able to define the OUs and variable)

2) Specify the hypotheses (e.g., H₀: π = 1/3, μ = 50, μ₁ − μ₂ = 0, or no relationship between variable 1 and variable 2 in the population)

Always in terms of the population (parameters) because that’s what is unknown and what we are trying to make statements about (take off the hats!)

The null hypothesis is the “dull hypothesis” or the “ho-hum hypothesis”

The alternative hypothesis specifies something interesting (“a-ha!”)

One or two-sided (decide based on wording of research question)

3) Check the validity conditions, sketch the null distribution of the (test) statistic assuming H₀ is true, and identify the appropriate test procedure by name

If the validity conditions are not met, use a randomization-based (simulation) method instead

4) Compare the data observed in the sample to what’s “expected” from H₀. Find the p-value: the probability of observing a value of the statistic as extreme as or more extreme than the one observed, when H₀ is true.

Know how to get the computer to give you the appropriate one or two-sided p-value

5) Draw a conclusion in context

Decide to reject or fail to reject H₀

Reject H₀ if p-value ≤ α (the significance level); this is synonymous with saying the result is “statistically significant”

Make conclusion about research question of interest (back to English)

If we repeatedly took different samples or random shuffles and calculated the value of the statistic for each sample, the p-value indicates how often we would expect to see the statistic value that we actually observed, or one more extreme, when H₀ is true. If the statistic value is very unlikely (a small p-value), we stop believing H₀ (recall the loaded dice example). We can compare the p-value to the significance level as a benchmark to decide whether it is “too small.”

T vs. Z

With proportions, our observations consist of “yeses” and “nos” for each observational unit in the population. A picture of this population is simply a bar graph. In particular, we don’t worry about its shape or variability. We will always consider approximating the null distribution of the sample proportions with the normal distribution. Thus, we never worry about using the t distribution with proportions. With means, the t distribution is used to take into account the extra variation we will see in the null distribution if we also substitute the sample standard deviation, s, into the equation. The key is that both a z-statistic and a t-statistic have “standardized” our observed statistic onto a comparable scale.

Population vs. Variable vs. Parameter

A population is a group of objects, and a variable is what we measure about the objects, or the question we ask about each one (e.g., height). The observational units are the objects we measure (e.g., buildings, volleyball players). You need to be able to decide how many populations and how many variables you have, e.g., are you measuring two different things/answering two different questions about the objects (e.g., height and age), or are you measuring the same thing on two different groups (e.g., heights of men and heights of women)? Parameters are numbers that describe the population (e.g., the average height of all buildings, the average age of all volleyball players); we just may not know their exact numerical values.

Independence/Matched Pairs

We can also assess the “independence” between samples to justify a two-sample procedure. This is not the same as the independence between variables. Instead we are making sure the responses of one group are not influencing or related to the responses in the other group. If they are, then a better analysis is to take that dependence into account (e.g., a “paired t” procedure).

Independence/Association

First, remember that we talk about independence/association between two variables. We don’t talk about the outcomes of the variables or levels of the variables, but the entire variable. Two variables are associated if they are related to each other, that is, if knowledge of one gives us information about the other.

Other notes:

· With bar graphs, always use percentages as the vertical scale (instead of counts)

· Remember, you are either comparing two or more populations OR examining the association between two variables.

· A p-value is not the probability of a null hypothesis or a conclusion being true.

· Remember your p-value allows you to make a conclusion about whether or not there is evidence against H₀. We can’t say “there is strong evidence of no association” because we assumed no association in the calculations/simulation. So all we can say is “there is not strong evidence of an association.”

· Make the link between your p-value and your decision explicit. Don’t forget to then make a conclusion in context.

Question Translations

If the question asks you to

describe/compare the distribution(s) of a categorical variable	Look at (conditional) proportions
describe/compare the distribution(s) of a quantitative variable	Shape, center, spread, unusual observations; use mean, median, SD, IQR if available
comment on “statistical significance” or “strength of evidence”	Consider the p-value
estimate “how large the difference is” or “plausible values for the parameter”	Consider the confidence interval
comment on generalizability	Consider the data collection methods and specify a reasonable population
comment on causation	Consider whether you have a randomized experiment and statistical significance
describe a confounding variable	Specify a variable and argue how it might differ between the explanatory variable groups and relate to the response variable
describe a parameter	Specify the number (e.g., mean or proportion or slope), the variable (e.g., how you are defining success), and the population (don’t worry too much at this point about whether it’s a reasonable population)
interpret a p-value	Begin the sentence “the probability that…” or “the proportion of …” and put your answer in context of the problem (e.g., what source of randomness are you modelling, what statistic are you talking about, what value was observed, what did the null hypothesis specify in context, what do you mean by “or more extreme”)
interpret a confidence interval	Begin the sentence “I am 95% confident that <<parameter>> is in (XX, XX)”. Clarify the parameter, population, and context. If the interval is about a difference, clarify which population parameter has the higher value: I’m 95% confident that <<>> is XX to XX (times) (larger, smaller) than <<>>
interpret the confidence level	Talk about the reliability of the method: if you repeated the process for different samples, what percentage of the resulting intervals would succeed in capturing the parameter
draw a conclusion from a p-value (evaluate a p-value)	Comment on whether the p-value should be considered small, reject or fail to reject the null hypothesis, and restate the conclusion you are going with in the study context
calculate a confidence interval by hand	Use 2SD short-cut
state hypotheses	Probably want both null and alternative. Could ask for you to do this in words and/or in symbols. Make sure you are clearly talking about the population parameter and in context
identify the procedure	Name the test you would use (e.g., one proportion). Also be prepared to describe a simulation process you could use to estimate a p-value (e.g., flip a coin X times, shuffle X blue and X green poker chips X times)
comment on validity conditions	Consider the sample size condition for the relevant procedure as on the Overview of Statistical Procedures handout

If the question asks you to calculate a p-value

	Simulation	Exact	Theory-based
One proportion	One proportion applet	Random sampling: Binomial distribution (Or hypergeometric if sampling from a finite population)	one-sample z-test (need at least 10 successes and at least 10 failures) – One proportion or TBI applets
One mean	Random sampling: not really, need a population to sample from where the null hypothesis is true (Sampling from Finite Populations applet), could use bootstrapping	Not unless could list out each possible random sample from the population and calculate the statistic for each	one-sample t-test (need n > 30 or symmetric population) – TBI applet or JMP or R
Two proportions	Random sampling: independent random samples from binomial process; Random assignment: Analyzing two-way tables applet	Random sampling: no (but can fix both margins and approximate with Fisher’s); Random assignment: Fisher’s Exact Test (hypergeometric distribution)	two-sample z-test (need at least 5 successes and at least 5 failures in each group) – Analyzing two-way tables applet or TBI applet or JMP or R
Two means	Random sampling: not really, need populations to sample from that have the same population mean, could use bootstrapping or Two Populations applet; Random assignment: Comparing Groups (Quant) applet	Random sampling: no; Random assignment: not really, probably too many different random assignments to list out/find the statistic for each	two-sample t-test (need both n’s > 20 or both populations normal) – TBI applet or JMP or R (can also use Comparing Groups (Quant) applet)
Matched pairs	Quantitative: Matched pairs applet; Categorical: sampling from binomial with n = number of differing responses	Not really; exact binomial (“sign test”)	One-sample t-test on differences (need at least 30 differences or normality of differences); z-test for binomial (need at least 10 successes and failures)

You should also be able to answer questions like – how would this (e.g., p-value, margin of error, conclusion) change if you did this (e.g., change sample size, change hypothesized value, confidence level), as well as more conceptually-based questions (e.g., what does it all mean, explain the reasoning, what is this number measuring).