### Due midnight, Friday, Jan. 20

Please use 1.5 spacing and/or wider margins so we have space for writing comments. Include the names of (both) individuals who worked on the assignment in each file. Please use Word or PDF format only. Remember to integrate your output with your discussion. Points will be deducted if you are missing output.

1) One of the first legal cases that relied heavily on statistical analysis was Castaneda v. Partida. Many employment discrimination cases have since relied heavily on this landmark case. One of the main considerations is what comparison to make. The Castaneda case involved a claim of underrepresentation of Mexican-Americans in grand jury selection. Over an 11-year period, 870 people had been summoned to serve on grand juries in Hidalgo County, but only 338 had been Mexican-American. The Court held that because about 79.1% of the population of Hidalgo County were Mexican-American, about 79.1% of those summoned to serve as grand jurors should be Mexican-American. In other words, if jury selection had been a fair, random process during this 11-year period, then π = 0.791. (Think of the jury selection process as a random process of which we have observed one instance in time.)

(a) Define the observational units and variable in this study. Which outcome is being considered success?

(b) Define the parameter of interest, π, in words.

(c) Report the value of the observed statistic.

(d) State the null and alternative hypotheses in symbols and in words. Is your alternative hypothesis one-sided or two-sided? Justify your choice.

(e) Produce (and include) a graph of the binomial distribution under the null hypothesis, using R, JMP, or the One Proportion Inference applet. (If you use the applet, you can skip drawing samples and proceed directly to checking the Exact Binomial box.) Also determine (by hand or with the applet) the theoretical expected value (mean) and standard deviation for this binomial distribution.

Rather than calculating the binomial probability, many court cases have adopted the general rule: "[I]f the difference between the expected value and the observed number is greater than two or three standard deviations, then the hypothesis that the ... drawing was random would be suspect to a social scientist."

(f) Use these mean and standard deviation values to determine how many standard deviations 338 is from the hypothesized mean. Is this number consistent with your graph? Explain.

(h) (opinion) Why do you think many court cases have adopted the 2 or 3 SD rule rather than calculating a p-value?

The above analysis assumes the random process follows the binomial distribution. This is especially questionable for employment discrimination cases. Why is that?  (Consider reviewing this reference.)
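If you would like to check your by-hand work on the "two or three standard deviations" rule, here is a minimal sketch in Python (not R or JMP; that choice is mine). It uses the standard binomial results, mean = nπ and SD = √(nπ(1 − π)). The function name `sd_rule` and the numbers passed to it are hypothetical illustration values, not the data from this problem, so substitute your own.

```python
from math import sqrt

def sd_rule(n, pi0, observed):
    """Return (mean, sd, z) for a binomial count under hypothesized probability pi0.

    z is the number of standard deviations the observed count
    lies from the hypothesized mean (negative means below the mean).
    """
    mean = n * pi0                       # expected number of successes
    sd = sqrt(n * pi0 * (1 - pi0))       # binomial standard deviation
    z = (observed - mean) / sd
    return mean, sd, z

# Hypothetical illustration values (NOT the assignment's data)
mean, sd, z = sd_rule(n=100, pi0=0.5, observed=40)
print(f"mean = {mean:.2f}, SD = {sd:.2f}, z = {z:.2f}")
```

A value of z beyond roughly 2 or 3 in absolute value is what the quoted rule would flag as suspect.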

2) Dogs have a keen sense of smell. They are used for search and rescue, explosive detection, sniffing out illegal drugs in luggage at airports, and locating game while hunting. Can they also tell whether someone has COVID-19 by sniffing a specimen of sweat from a person? (example) We will look at one study on Maika, a 3-year-old female Belgian Malinois whose specialty is search and rescue. See Grandjean, Dominique, et al., "Can the detection dog alert on COVID-19 positive persons by sniffing axillary sweat samples? A proof-of-concept study," PLOS ONE: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0243122

Maika completed 57 trials where she would sniff four different sweat specimens, one of which was from a COVID positive person, and then sit in front of the specimen she determined to be the positive specimen. In these 57 trials, Maika chose the COVID positive specimen 47 times.

(a) Use JMP or R to carry out a binomial test (define parameter, state hypotheses, calculate exact p-value, evaluate size of p-value, state conclusion in context) to determine whether Maika’s probability of choosing the COVID positive specimen is larger than we would expect by chance alone. (Include relevant output.)

(b) Calculate a two-sided binomial p-value to decide whether 0.50 is a plausible value for Maika’s probability of choosing the COVID positive specimen. Include relevant output.

In JMP, I recommend the Distribution Calculator here.

(c) Calculate a two-sided binomial p-value to decide whether 0.90 is a plausible value for Maika’s probability of choosing the COVID positive specimen. Include relevant output.

(d) Which of the three hypothesized probabilities that you have tested here produces the largest p-value? Explain why that makes sense intuitively.

(e) Using trial-and-error (document your attempts), what do you think is the smallest plausible value for the probability of Maika correctly identifying the COVID positive specimen? (Hint: Use 0.05 as the cut-off for deciding the p-value is small.)

(f) Using trial-and-error (document your attempts), what do you think is the largest plausible value for the probability of Maika correctly identifying the COVID positive specimen?

(g) (opinion) Are you willing to generalize these results to all dogs? Explain why or why not.
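To see what the exact binomial calculations in this problem are doing behind the scenes, here is a minimal Python sketch (not JMP or R output, which you still need to include). It computes a one-sided upper-tail p-value, a two-sided p-value, and a grid version of the trial-and-error search for plausible values. Some assumptions to note: the two-sided p-value here adds up the probabilities of all outcomes that are no more likely than the observed one, which is one common convention (I understand it to be the one R's `binom.test` follows, and JMP may differ slightly); the function names, the grid of 0.001 steps, and the example data are all hypothetical and are not Maika's results.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail_p(k_obs, n, p0):
    """One-sided p-value: P(X >= k_obs) when the success probability is p0."""
    return sum(binom_pmf(k, n, p0) for k in range(k_obs, n + 1))

def two_sided_p(k_obs, n, p0):
    """Two-sided p-value: total probability of outcomes no more likely than k_obs.

    The small relative tolerance guards against floating-point ties
    (an assumption modeled on R's convention).
    """
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    threshold = probs[k_obs] * (1 + 1e-7)
    return sum(pr for pr in probs if pr <= threshold)

def plausible_values(k_obs, n, cutoff=0.05):
    """Grid values of p0 (0.001 to 0.999) whose two-sided p-value exceeds cutoff."""
    grid = [i / 1000 for i in range(1, 1000)]
    return [p0 for p0 in grid if two_sided_p(k_obs, n, p0) > cutoff]

# Hypothetical illustration data (NOT the study's data): 8 successes in 10 trials
k_obs, n = 8, 10
print("one-sided p-value vs 0.5:", round(upper_tail_p(k_obs, n, 0.5), 4))
print("two-sided p-value vs 0.5:", round(two_sided_p(k_obs, n, 0.5), 4))
values = plausible_values(k_obs, n)
print("smallest and largest plausible grid values:", min(values), max(values))
```

The grid search is just an automated version of trial-and-error: each hypothesized value is kept when its two-sided p-value is above the 0.05 cut-off, and the ends of the kept values play the role of the smallest and largest plausible values.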

Dogs’ ability to sniff out disease has been applied to many other diseases as well. Why do you think this method isn’t used more regularly for disease diagnosis? (See this Ask Marilyn column