· How many times did you shower
· Average shower length
· How many baths did you take
· How long do you leave your bathroom sink water running
· Number of toilet flushes
· How long do you leave your kitchen faucets running
· Hand washing/dishwasher use
· Car washes
· Miles driven each day
· How much you recycle, donate
· Type of diet
· Budget for pet food
After the 7 days, open the water use survey (you will need to make a copy first) and complete the survey as best you can for your current living arrangement (and indicate CA for the state you live in). Be sure to make any conversions you need before entering your values in column D (e.g., average per day, number per year). When you have completed your journal, use the “Water Survey” link in Canvas to:
(a) Upload a copy/documentation of your journal.
(b) Report your result for cell F21.
(c) Report any suspected data quality errors.
Please consider using 1.5 spacing and/or wider margins so we have space for writing comments. If you are submitting jointly, you both MUST join a HW 2 group first (even if it is the same as your HW 1 group) and include the names of both individuals who worked on the assignment in each file. Please use Word or PDF format only. Remember to integrate your output with your discussion. Points will be deducted if you are missing output.
1) Below are some graphs of answers to questions by the first 70 Stat 301 students to complete the Initial Course Survey.
(a) One of these is a bar graph of answers to the “Mac vs. PC” question and one is a bar graph of answers to the “Coke vs. Pepsi” question. Which graph do you think is which? You will be graded on your justification more than the correctness of your matches. (Be sure to clarify any assumptions you make, e.g., about the variable, about Cal Poly, or about the computer program used to make the graphs when I pasted in the data.)
(b) Assuming this sample is representative of all Cal Poly students, considering 0.6857 as a statistic, identify the corresponding parameter.
(c) Consider the following quantitative variables (again from the survey, but simulating #7). Identify which graph belongs to which variable in the list below. You will be graded on your justification more than the correctness of your matches. (You can cite “process of elimination” for at most one graph but should give justifications for the others, and clearly state any assumptions you make along the way. For example, you might consider whether reasonable numerical values can be placed along the horizontal axis as well as what shape you expect the distribution to have. Be sure you offer conjectures to choose between graphs of similar shape. Some of these will be pure guesses, but provide a justification for your choice based on the behavior of the graph.)
1. Heights of students
2. Number of siblings
3. Number of states visited
4. Political inclination (conservative, moderate, or liberal)
5. Cost of last haircut
6. Ratings of the value of statistics on a scale of (1)-(9)
7. Number of heads recorded when asked to toss a coin 50 times
(Make sure you are seeing the entire image!)
[Graphs A, B, C, D, E, and F appeared here.]
(d) Pick two of the variables you have seen in this question and suggest a possible research question that could be investigated with these data.
2) Dogs have a keen sense of smell. They are used for search and rescue, explosive detection, sniffing out illegal drugs in luggage at airports, and locating game while hunting. Can they also tell whether someone has COVID-19 by sniffing a specimen of sweat from a person? (example) We will look at one study on Maika, a 3-year-old female Belgian Malinois whose specialty is search and rescue. See Grandjean, Dominique, et al., “Can the detection dog alert on COVID-19 positive persons by sniffing axillary sweat samples? A proof-of-concept study,” https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243122
Maika completed 57 trials where she would sniff four different sweat specimens, one of which was from a COVID positive person, and then sit in front of the specimen she determined to be the positive specimen. In these 57 trials, Maika correctly chose the COVID positive specimen 47 times.
(a) Why is it important to use new specimens each time? [Hint: How does this help us meet the binomial model?]
(b) Identify one advantage and one disadvantage to using multiple trials on the same dog rather than using different dogs.
(c) Report the value of the observed statistic.
(d) Define the parameter of interest and use an appropriate symbol to refer to this unknown value.
(e) State the null and alternative hypotheses for Maika, in symbols and in words.
(f) Let X refer to the number of correct identifications by Maika. If the null hypothesis is true, what probability distribution does X follow? (Give the name of the distribution, and define the inputs of the probability distribution.)
(g) Use JMP or R to calculate the exact binomial p-value for this study. Be sure to include your output (including what information you assumed) and proper notation.
(h) According to the distribution you identified in (f), what are the expected value (mean) and standard deviation of X, assuming the null hypothesis is true? Show your work.
(i) Using your values from (h), how many standard deviations is Maika’s result from the expected number of successes assuming the null hypothesis is true? Do you consider this value to be convincing evidence against the null hypothesis? Justify your answer. Is your answer consistent with (g)? Explain.
(j) Now use JMP or R to calculate the p-value for assessing whether there is convincing evidence that Maika’s probability of successfully identifying the correct specimen is larger than 0.70. (Include your output.)
(k) How do the p-values in (g) and (j) compare (which is larger)? Why?
(l) Calculate a two-sided p-value to assess whether there is convincing evidence that Maika’s probability of successfully identifying the correct specimen differs from 0.70. (Include your output.)
(m) How do the p-values in (j) and (l) compare (which is larger)? Why?
(n) Using trial and error, find the smallest plausible value for the probability of Maika correctly identifying the COVID sample. (Hint: Use 0.05 as the cut-off for deciding whether the p-value is small. Include justification. Use 3 decimal places. Feel free to switch to an applet for this one!)
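The binomial calculations in parts (g) through (n) can be cross-checked with a short script. The sketch below is only an illustration in Python using the standard library; the assignment itself asks for JMP or R output, so this does not replace that output. Several things here are assumptions rather than part of the assignment: the function names are made up for this example, the null value of 0.25 assumes Maika would pick one of the four specimens at random, the two-sided p-value uses the "sum all outcomes no more likely than the observed one" convention, and the search in `smallest_plausible` assumes a one-sided (upper-tail) p-value on a grid of 0.001.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """One-sided p-value P(X >= k) under Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def two_sided(k, n, p):
    """Two-sided p-value: total probability of outcomes that are
    no more likely than the observed count (assumed convention)."""
    cutoff = binom_pmf(k, n, p) * (1 + 1e-7)  # small tolerance for float ties
    probs = (binom_pmf(i, n, p) for i in range(n + 1))
    return sum(q for q in probs if q <= cutoff)

def null_mean_sd(n, p):
    """Mean and standard deviation of a Binomial(n, p) count."""
    return n * p, sqrt(n * p * (1 - p))

def smallest_plausible(k, n, alpha=0.05, grid=1000):
    """Smallest grid value p (steps of 1/grid) whose upper-tail
    p-value exceeds alpha; assumes a one-sided test."""
    for i in range(1, grid):
        p = i / grid
        if upper_tail(k, n, p) > alpha:
            return p
    return None

n, k = 57, 47        # trials and correct identifications from the study
p_guess = 0.25       # ASSUMPTION: random choice among four specimens

mean, sd = null_mean_sd(n, p_guess)
print("mean, sd under the null:", mean, sd)
print("standardized result:", (k - mean) / sd)
print("one-sided p-value vs 0.25:", upper_tail(k, n, p_guess))
print("one-sided p-value vs 0.70:", upper_tail(k, n, 0.70))
print("two-sided p-value vs 0.70:", two_sided(k, n, 0.70))
print("smallest plausible value:", smallest_plausible(k, n))
```

If your JMP or R output disagrees with these numbers, check first whether the software uses a different two-sided convention or a different null value than the ones assumed here.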
Example Extension Assignments
· Dogs’ ability to sniff out disease has been used for many other diseases as well. Why do you think this method isn’t used more regularly for disease diagnosis? (See this Ask Marilyn column: https://parade.com/929476/marilynvossavant/how-well-can-dogs-detect-cancer/)
· Animals have been used to psychically predict the outcomes of sporting events (e.g., Paul the Octopus, World Cup). Find another study of an animal used to predict outcomes of sporting events. Discuss the results.
· One of the first legal cases that relied heavily on statistical analysis was Castaneda v. Partida (Mexican-Americans were underrepresented in grand jury selection). Many employment discrimination cases have since relied heavily on this landmark case. But others have argued that use of the binomial distribution is very questionable in such cases (reference). Summarize the debate. What do you think?
· Look into some of the controversy and debate over confidence intervals for a single proportion!
· Look into some of the controversy and debate over the use of p-values in scientific studies! (e.g., ASA's Statement on p-values)
· Watch this inappropriate critique of scientific studies by John Oliver (19:27). Identify 2-3 points you agree with and 2-3 points you don’t agree with.
· Watch this wonderful commentary on the Census from the old Daily Show (4:03). How would you explain statistical sampling?