Stat 301 – Week 2

 

Participation Assignments

·       Daily I wonder questions

·       Week 2 survey (due Friday, Jan 17)

·       Water Usage Survey (due Feb. 3): Keep a journal of your water usage for 7 consecutive days. The variables you will keep track of include

§  How many times did you shower

§  Average shower length

§  How many baths did you take

§  How long do you leave your bathroom sink water running

§  Number of toilet flushes

§  How long do you leave your kitchen faucets running

§  Hand washing/dishwasher use

§  Car washes

§  Miles driven each day

§  How much you recycle, donate

§  Type of diet

§  Budget for pet food

After the 7 days, open the water use survey (you will need to make a copy first) and complete the survey as best you can for your current living arrangement (and indicate CA for the state you live in). Be sure to make any conversions you need before entering your values in column D (e.g., average per day, number per year). When you have completed your journal, use the “Water Survey” link in Canvas to:

(a) Upload a copy/documentation of your journal

(b) Report your result for cell F21

(c) Report any suspected data quality errors.
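
The conversions mentioned above (average per day, number per year) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `per_day` and `per_year` and the journal values are hypothetical, and the survey spreadsheet may expect different units for a given row, so check each row's label before entering a value in column D.

```python
def per_day(daily_values):
    """Average per day over the journal period (e.g., 7 consecutive days)."""
    return sum(daily_values) / len(daily_values)


def per_year(daily_values, days_per_year=365):
    """Scale the daily average up to a yearly count (assumes a 365-day year)."""
    return per_day(daily_values) * days_per_year


if __name__ == "__main__":
    # Hypothetical journal entries: number of showers on each of 7 days.
    showers = [1, 1, 2, 1, 0, 1, 1]
    print("Showers per day:", per_day(showers))
    print("Showers per year:", per_year(showers))
```

Keeping the raw daily entries and converting only at the end makes it easier to document the journal for part (a) and to spot data quality problems for part (c).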

 

HW 2 Assignment – Due midnight, Friday, Jan 17

 

Please consider using 1.5 spacing and/or wider margins so we have space for writing comments. If you are submitting jointly, you both MUST join a HW 2 group first (even if it is the same as your HW 1 group) and include the names of both individuals who worked on the assignment in each file. Please use Word or PDF format only. Remember to integrate your output with your discussion.  Points will be deducted if you are missing output.

 

1) Below are some graphs of answers to questions by the first 70 Stat 301 students to complete the Initial Course Survey.

(a) One of these is a bar graph of answers to the “Mac vs. PC” question and the other is a bar graph of answers to the “Coke vs. Pepsi” question.  Which graph do you think is which? You will be graded on your justification more than the correctness of your matches.  (Be sure to clarify any assumptions you make, e.g., about the variable, about Cal Poly, or about the computer program used to make the graphs when I pasted in the data.)

[Figure: two bar graphs]

 

(b) Assuming this sample is representative of all Cal Poly students, considering 0.6857 as a statistic, identify the corresponding parameter. 

 

(c) Consider the following quantitative variables (again from the survey, but simulating #7). Identify which graph belongs to which variable in the list below. You will be graded on your justification more than the correctness of your matches.  (You can cite “process of elimination” for at most one graph but should give justifications for the others and clearly state any assumptions you make along the way. For example, you might consider whether reasonable numerical values can be placed along the horizontal axis as well as what shape you expect the distribution to have. Be sure you offer conjectures to choose between graphs of similar shape. Some of these will be pure guesses, but provide a justification for your choice based on the behavior of the graph.)

1.     Heights of students

2.     Number of siblings

3.     Number of states visited

4.     Political inclination (conservative, moderate, or liberal)

5.     Cost of last hair cut

6.     Ratings of the value of statistics on a scale of (1)-(9)

7.     Number of heads recorded when asked to toss a coin 50 times

(Make sure you are seeing the entire image!)

 

A. [Figure: graph of blue dots]

B. [Figure: graph of blue dots]

C. [Figure: graph of blue dots]

D. [Figure: graph of blue dots]

E. [Figure: graph of blue dots]

F. [Figure: graph of blue dots]

 

(d) Pick two of the variables you have seen in this question and suggest a possible research question that could be investigated with these data.

 

2) Dogs have a keen sense of smell. They are used for search and rescue, explosive detection, sniffing out illegal drugs in luggage at airports, and locating game while hunting. Can they also tell whether someone has COVID-19 by sniffing a specimen of sweat from a person? (example) We will look at one study on Maika, a 3-year-old female Belgian Malinois whose specialty is search and rescue. See Grandjean, Dominique, et al., “Can the detection dog alert on COVID-19 positive persons by sniffing axillary sweat samples? A proof-of-concept study,” https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243122

Maika completed 57 trials where she would sniff four different sweat specimens, one of which was from a COVID positive person, and then sit in front of the specimen she determined to be the positive specimen. In these 57 trials, Maika correctly chose the COVID positive specimen 47 times.

 

(a) Why is it important to use new specimens each time? [Hint: How does this help us meet the binomial model?]

(b) Identify one advantage and one disadvantage to using multiple trials on the same dog rather than using different dogs.

 

(c) Report the value of the observed statistic.

(d) Define the parameter of interest and use an appropriate symbol to refer to this unknown value.

(e) State the null and alternative hypotheses for Maika, in symbols and in words.

(f) Let X refer to the number of correct identifications by Maika.  If the null hypothesis is true, what probability distribution does X follow? (Give the name of the distribution, and define the inputs of the probability distribution)

(g) Use JMP or R to calculate the exact binomial p-value for this study.  Be sure to include your output (including what information you assumed) and proper notation.

(h) According to the distribution you identified in (f), what are the expected value (mean) and standard deviation of X, assuming the null hypothesis is true? Show your work.

(i) Using your values from (h), how many standard deviations is Maika’s result from the expected number of successes assuming the null hypothesis is true?  Do you consider this value to be convincing evidence against the null hypothesis?  Justify your answer.  Is your answer consistent with (g)?  Explain.

 

(j) Now use JMP or R to calculate the p-value for assessing whether there is convincing evidence that Maika’s probability of successfully identifying the correct specimen is larger than 0.70.  (Include your output.)

(k) How do the p-values in (g) and (j) compare (which is larger)? Why?

 

(l) Calculate a two-sided p-value to assess whether there is convincing evidence that Maika’s probability of successfully identifying the correct specimen differs from 0.70. (Include your output.)

(m) How do the p-values in (j) and (l) compare (which is larger)? Why?

 

(n) Using trial and error, find the smallest plausible value for the probability that Maika correctly identifies the COVID-positive specimen. (Hint: Use 0.05 as the cut-off for deciding whether the p-value is small. Include justification. Use 3 decimal places. Feel free to switch to the applet for this one!)
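
If you want to check your JMP or R output against a second tool, the binomial calculations in this question can be sketched in Python. This is a minimal sketch, not the required method for the assignment. The function names are made up for illustration, and the numbers in the demo are hypothetical values, not Maika's data. The two-sided rule shown (add up every outcome no more likely than the observed one) is only one of several conventions, so your software may report a slightly different value. The search in `smallest_plausible` assumes a one-sided (upper-tail) p-value.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """P(X = k) when X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail(k, n, p):
    """One-sided p-value P(X >= k) under Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def two_sided(k, n, p):
    """Two-sided p-value: total probability of every outcome that is
    no more likely than the observed count k (one convention among several)."""
    probs = [binom_pmf(i, n, p) for i in range(n + 1)]
    cutoff = probs[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return sum(q for q in probs if q <= cutoff)


def standardized(k, n, p):
    """Number of standard deviations between k and the null mean n*p."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    return (k - mean) / sd


def smallest_plausible(k, n, alpha=0.05, step=0.001):
    """Trial and error on a grid: return the smallest p whose upper-tail
    p-value exceeds alpha, i.e. the smallest p not rejected at that cut-off."""
    for j in range(1, round(1 / step)):
        p = j * step
        if upper_tail(k, n, p) > alpha:
            return round(p, 3)
    return None


if __name__ == "__main__":
    n, k, p0 = 20, 9, 0.30  # hypothetical values, NOT the study's data
    print("One-sided p-value:", upper_tail(k, n, p0))
    print("Two-sided p-value:", two_sided(k, n, p0))
    print("Standardized statistic:", standardized(k, n, p0))
    print("Smallest plausible p:", smallest_plausible(k, n))
```

The grid search mirrors doing trial and error by hand: because P(X >= k) increases as p increases, the first grid value whose p-value exceeds the cut-off is the smallest plausible value to 3 decimal places.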

 

Example Extension Assignments

·       Dogs’ sense of smell has been used to detect many other diseases as well. Why do you think this method isn’t used more regularly for disease diagnosis? (See this Ask Marilyn column https://parade.com/929476/marilynvossavant/how-well-can-dogs-detect-cancer/ )

·       Animals have been used to psychically predict the outcomes of sporting events (e.g., Paul the Octopus, World Cup).  Find another study of an animal used to predict outcomes of sporting events.  Discuss the results.

·       One of the first legal cases that relied heavily on statistical analysis was Castaneda v. Partida (Mexican-Americans were underrepresented in grand jury selection). Many employment discrimination cases have since relied heavily on this landmark case. But others have argued that use of the binomial distribution is very questionable in such cases (reference).  Summarize the debate. What do you think?

·       Look into some of the controversy and debate over confidence intervals for a single proportion!

·       Look into some of the controversy and debate over the use of p-values in scientific studies! (e.g., ASA's Statement on p-values)

·       Watch this inappropriate critique of scientific studies by John Oliver (19:27). Identify 2-3 points you agree with and 2-3 points you don’t agree with.

·       Watch this wonderful commentary on the Census from the old Daily Show (4:03). How would you explain statistical sampling?