---
title: "Stat 301 - HW 5, Problem 2"
author: name and section
output: word_document
---
**2) I asked both sections to complete a water usage journal and then submit the results from the water usage calculator. The results for 68 students are here. Define μ as the population mean water usage across all California college students.**
**Load the data into R.**
```{r}
load(url("https://www.rossmanchance.com/iscam3/ISCAM.RData"))
waterusage = read.table("https://www.rossmanchance.com/chance/stat301W24/data/waterusage.txt", sep = "\t", header = TRUE)
#So I don't have to type the data frame name each time
usage = waterusage$usage
head(usage)
length(usage) #use length instead of nrow when just a vector not a data frame
```
**(a) Create a well-labeled dotplot or histogram of the results.**
```{r}
#Pick one of these
iscamdotplot(usage)
hist(usage)
qqnorm(usage, datax = FALSE)
iscamsummary(usage)
```
**Summarize the shape, center, and variability of the distribution in context. Do you have any conjectures for any unusual features of this distribution?**
**(b) Discuss whether you think a one-sample *t*-procedure is valid for these data.**
**(c) Even if you said “no” in (b), calculate a one-sample *t*-confidence interval.**
```{r}
t.test(usage, conf.level = .95)
```
**Interpret your interval in context.**
**(d) The online water usage calculator you were provided included a national average of 1744. Does this seem to be a plausible value for $\mu$ based on these data? Explain your reasoning based on your interval in (c).**
**(e) Discuss whether you think a one-sample *t*-prediction interval is valid for these data.**
**(f) Even if you said “no” in (e), calculate a one-sample *t*-prediction interval.**
```{r}
#Fill in the values and uncomment
# mean =
# sd =
# t = iscaminvt(.95, 68-1, "between")$answer2
#mean - t*sd*sqrt(1 + 1/68)
#mean + t*sd*sqrt(1 + 1/68)
#optional, compare to
#predict(lm(usage ~ 1), newdata = data.frame(usage = 0), interval = "predict")
```
**Interpret your interval in context.**
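
The commented-out formula in (f) can be checked against `predict()` on a made-up sample. The chunk below is only a sketch: it uses simulated right-skewed values (an assumption, not the class data), the `_demo` names are hypothetical, and it swaps in base R's `qt()` for `iscaminvt()` so it runs without the ISCAM workspace.

```{r}
# Simulated right-skewed sample standing in for real data (NOT the class data)
set.seed(301)
x_demo = rexp(68, rate = 1/500)

n_demo = length(x_demo)
xbar_demo = mean(x_demo)
s_demo = sd(x_demo)

# t* for 95% confidence with n - 1 degrees of freedom (base R instead of iscaminvt)
tstar_demo = qt(0.975, df = n_demo - 1)

# Prediction interval for one new observation: xbar +/- t* * s * sqrt(1 + 1/n)
pi_formula = xbar_demo + c(-1, 1) * tstar_demo * s_demo * sqrt(1 + 1/n_demo)

# Same interval from an intercept-only model; newdata only needs to have one row
pi_lm = predict(lm(x_demo ~ 1), newdata = data.frame(dummy = 0), interval = "prediction")

pi_formula
pi_lm
```

The two results should agree, because an intercept-only model estimates the mean and its prediction standard error is the same s * sqrt(1 + 1/n) used in the formula.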
**(g) How do the intervals in (c) and (f) compare (midpoint, width)? Is this what you expected? Explain.**
```{r}
#Bootstrapping
resamples = lapply(1:1000, function(i) sample(usage, 68, replace = TRUE) )
bootstrapmeans = sapply(resamples, mean)
hist(bootstrapmeans)
qqnorm(bootstrapmeans, datax = TRUE)
iscamsummary(bootstrapmeans)
```
**(h) Explain why the mean of the bootstrap means is similar to 555 gallons.**
**(i) Some argue the shape of this distribution should be similar to the shape of the sampling distribution of means. Does the sample size in this study appear to be large enough to assume the distribution of sample means is approximately normal despite the strange-looking sample shape?**
**(j) But what we really care about is the standard deviation of the sampling distribution. Does this bootstrapping procedure appear to accurately estimate the theoretical standard deviation of sample means? Explain how you are deciding.**
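
One way to make the comparison in (j) concrete is to put the bootstrap standard deviation next to the usual s/sqrt(n) estimate. The chunk below is a sketch on simulated right-skewed values (an assumption, not the class data); all `_demo` names are hypothetical.

```{r}
# Simulated right-skewed sample standing in for real data (NOT the class data)
set.seed(301)
x_demo = rexp(68, rate = 1/500)
n_demo = length(x_demo)

# Resample with replacement 1000 times and record each resample mean
boot_means_demo = replicate(1000, mean(sample(x_demo, n_demo, replace = TRUE)))

# Bootstrap estimate of the standard deviation of sample means
boot_se_demo = sd(boot_means_demo)

# Formula-based standard error of the mean
formula_se_demo = sd(x_demo) / sqrt(n_demo)

boot_se_demo
formula_se_demo
```

With 1000 resamples the two values should typically be close but not identical, since the bootstrap value carries some simulation noise.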
*So again, the main use would be for a statistic where we didn’t have a fancy SE formula. Then we could use something like statistic ± 2SE(statistic) to approximate a confidence interval for the parameter.*
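
As an illustration of that last idea, the sketch below bootstraps the sample median, a statistic with no simple SE formula, and forms median ± 2SE. The median is a choice made here for illustration, the data are simulated (an assumption, not the class data), and the `_demo` names are hypothetical.

```{r}
# Simulated right-skewed sample standing in for real data (NOT the class data)
set.seed(301)
x_demo = rexp(68, rate = 1/500)
n_demo = length(x_demo)

# Bootstrap distribution of the sample median
boot_medians_demo = replicate(1000, median(sample(x_demo, n_demo, replace = TRUE)))

# Bootstrap standard error = standard deviation of the bootstrap medians
se_median_demo = sd(boot_medians_demo)

# Approximate 95% interval: statistic +/- 2 * SE(statistic)
ci_median_demo = median(x_demo) + c(-2, 2) * se_median_demo
ci_median_demo
```

This ± 2SE interval leans on the bootstrap distribution being roughly symmetric and bell-shaped, so it is worth looking at `hist(boot_medians_demo)` before trusting it.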