*Please use 1.5 spacing and/or wider margins so we have space for writing comments. Include (both) names of individuals who worked on the assignment in each file. Please use Word or PDF format only. Remember to integrate your output with your discussion. Points will be deducted if you are missing output.*

**1)** One of the first legal cases that relied heavily on statistical analysis was *Castaneda v. Partida*. Many employment discrimination cases have since relied on this landmark case. One of the main considerations is what comparison to make. The Castaneda case involved a claim of underrepresentation of Mexican-Americans in grand jury selection. Over an 11-year period, 870 people had been summoned to serve on grand juries in Hidalgo County, but only 338 had been Mexican-American. The Court held that because about 79.1% of the population of Hidalgo County was Mexican-American, about 79.1% of those summoned to serve as grand jurors should be Mexican-American. In other words, if jury selection had been a fair, random process during this 11-year period, then π = 0.791. (Think of the jury selection process as a random process of which we have observed one instance in time.)

(a)
Define the observational units and variable in this study. Which outcome is
being considered success?

(b)
Define the parameter of interest, in words.

(c)
Report the value of the observed statistic.

(d)
State the null and alternative hypotheses in symbols and in words. Is your
alternative hypothesis one-sided or two-sided? Justify your choice.

(e)
Produce (and include) a graph of the binomial distribution under the null
hypothesis, using R, JMP, or the One Proportion Inference applet. (If you use the applet, you can skip drawing samples
and proceed directly to checking the Exact Binomial box.) Also determine (by hand or with the applet) the
theoretical expected value (mean) and standard deviation of this binomial
distribution.

Rather
than calculating the binomial probability, many court cases have adopted the
general rule: "[I]f the difference between the expected value and the
observed number is greater than two or three standard deviations, then the
hypothesis that the ... drawing was random would be suspect to a social
scientist."
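
As a rough illustration of how this rule works, the sketch below computes the mean and standard deviation of a binomial count under a hypothesized probability, then expresses an observed count as a number of standard deviations from that mean. The function name `sd_distance` and the numbers in the usage lines are hypothetical, chosen only for easy checking; they are not the Castaneda data.

```python
import math

def sd_distance(n, pi0, observed):
    """Return (mean, sd, z) for an observed count under Binomial(n, pi0).

    mean = n * pi0, sd = sqrt(n * pi0 * (1 - pi0)), and z is the number
    of standard deviations the observed count lies from the mean.
    """
    mean = n * pi0
    sd = math.sqrt(n * pi0 * (1 - pi0))
    z = (observed - mean) / sd
    return mean, sd, z

# Hypothetical example: 100 trials, hypothesized probability 0.5, 40 successes.
mean, sd, z = sd_distance(n=100, pi0=0.5, observed=40)
print(f"mean={mean:.1f}, sd={sd:.1f}, z={z:.1f}")
```

In this made-up example the observed count sits 2 standard deviations below the expected value, which is right at the lower edge of the "two or three standard deviations" threshold quoted above.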

(f)
Use these mean and standard deviation values to determine how many standard
deviations 338 is from the hypothesized mean. Is this number consistent with
your graph? Explain.

(g)
Based on your answer to (f), do you find convincing evidence to reject the null
hypothesis in favor of your alternative hypothesis? (Justify your answer.)

(h)
(opinion) Why do you think many court cases have adopted the 2 or 3 SD rule
rather than calculating a p-value?

The above analysis assumes the random process follows the
binomial distribution. This is especially questionable for employment
discrimination cases. Why is that?
(Consider reviewing this reference.)

**2)** Dogs have a keen sense of smell. They
are used for search and rescue, explosive detection, sniffing out illegal drugs
in luggage at airports, and locating game while hunting. Can they also tell
whether someone has COVID-19 by sniffing a specimen of sweat from a person? (example)
We will look at one study of Maika, a 3-year-old female Belgian Malinois whose
specialty is search and rescue. See Grandjean, Dominique, et al., "Can
the detection dog alert on COVID-19 positive persons by sniffing axillary sweat
samples? A proof-of-concept study," https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243122

Maika completed 57
trials in which she would sniff four different sweat specimens, one of which was
from a COVID-positive person, and then sit in front of the specimen she
determined to be the positive specimen. In these 57 trials, Maika chose the
COVID-positive specimen 47 times.

(a) Use JMP or R to carry
out a binomial test (define the parameter, state the hypotheses, calculate the exact
p-value, evaluate the size of the p-value, and state a conclusion in context) to determine
whether Maika’s probability of choosing the COVID-positive specimen is larger than we would expect by chance alone.
(Include relevant output.)

(b) Calculate a *two-sided binomial p-value* to decide whether 0.50 is a plausible value for Maika’s probability of choosing the COVID-positive specimen.
Include relevant output.

In JMP, I recommend the
Distribution Calculator here.
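
For anyone who wants to cross-check software output by hand, here is a minimal sketch of exact binomial p-values using only the Python standard library. The helper names (`binom_pmf`, `upper_tail`, `two_sided`) and the example numbers are hypothetical. The two-sided version sums the probabilities of all outcomes that are no more likely than the observed one, which appears to match the convention used by R's `binom.test`; other software may define a two-sided p-value differently, so small discrepancies are possible.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k_obs, n, p0):
    """One-sided exact p-value: P(X >= k_obs) under Binomial(n, p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))

def two_sided(k_obs, n, p0):
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed count (assumed convention)."""
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    # Small relative tolerance so ties are not lost to rounding error.
    cutoff = probs[k_obs] * (1 + 1e-7)
    return sum(pk for pk in probs if pk <= cutoff)

# Hypothetical example: 8 successes in 10 trials, hypothesized probability 0.5.
print(f"one-sided: {upper_tail(8, 10, 0.5):.4f}")  # 0.0547
print(f"two-sided: {two_sided(8, 10, 0.5):.4f}")   # 0.1094
```

Because the hypothesized probability in this made-up example is 0.5, the distribution is symmetric and the two-sided p-value is exactly twice the one-sided value. That will not generally hold for other hypothesized values.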

(c) Calculate a *two-sided binomial p-value* to decide
whether 0.90 is a plausible value for Maika’s probability of choosing the COVID-positive specimen. Include relevant output.

(d) Which of the three
hypothesized probabilities that you have tested here produces the largest
p-value? Explain why that makes sense intuitively.

(e) Using
trial-and-error (document your attempts), what do you think is the smallest plausible value for the probability of Maika correctly
identifying the COVID-positive sample? (*Hint*: Use 0.05 as the cut-off for
deciding the p-value is small.)

(f) Using trial-and-error
(document your attempts), what do you think is the largest plausible
value for the probability of Maika correctly
identifying the COVID-positive sample?

(g) (opinion) Are you
willing to generalize these results to all dogs? Explain why or why not.

Dogs’ ability to sniff out disease has also been applied to
many other diseases. Why do you think this method isn’t used more
regularly for disease diagnosis? (See this Ask Marilyn column: https://parade.com/929476/marilynvossavant/how-well-can-dogs-detect-cancer/)
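
The trial-and-error search in parts (e) and (f) of question 2 amounts to testing many hypothesized probabilities and keeping those whose two-sided p-value is not small. The sketch below automates that idea on a coarse grid. The function names, the grid step of 0.01, and the example data are all hypothetical, and the two-sided p-value uses the "outcomes no more likely than the observed one" convention, which is an assumption and may not match your software exactly.

```python
from math import comb

def two_sided(k_obs, n, p0):
    """Two-sided exact binomial p-value: total probability of outcomes
    no more likely than the observed count (assumed convention)."""
    probs = [comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(n + 1)]
    cutoff = probs[k_obs] * (1 + 1e-7)  # tolerance for floating-point ties
    return sum(pk for pk in probs if pk <= cutoff)

def plausible_values(k_obs, n, alpha=0.05):
    """Grid of hypothesized probabilities (0.01, 0.02, ..., 0.99) whose
    two-sided p-value is at least alpha."""
    grid = [i / 100 for i in range(1, 100)]
    return [p0 for p0 in grid if two_sided(k_obs, n, p0) >= alpha]

# Hypothetical example: 8 successes in 10 trials.
plausible = plausible_values(8, 10)
print(f"smallest plausible value on the grid: {min(plausible)}")
print(f"largest plausible value on the grid:  {max(plausible)}")
```

A finer grid gives more decimal places at the cost of more computation. Documenting a handful of manual attempts near each endpoint, as the question asks, shows the same logic without the loop.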