## 6.2 Confidence Interval

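A confidence interval (CI) estimates a parameter by a range of plausible values instead of a single number. Its interpretation can be tricky, and it is usually explained through the nominal coverage probability. Suppose the sampling were repeated many times and an interval computed from each sample. In the long run, a proportion $$1-\alpha$$ of these intervals would contain the true value of the parameter.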
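The nominal coverage probability can be illustrated with a small Monte Carlo sketch in R. It assumes a hypothetical normal population: the values `mu = 1.63` and `sigma = 0.05` are arbitrary choices for illustration. The sketch repeatedly draws samples of size `n = 10` and builds, for each one, the $$t$$-based interval for the mean presented in Section 6.2.2. It then records how often the interval contains the true mean, and that proportion should be close to the nominal $$1-\alpha = 95\%$$.

```r
set.seed(2024)                  # reproducible simulation
mu    <- 1.63                   # hypothetical population mean (assumption)
sigma <- 0.05                   # hypothetical population standard deviation (assumption)
n     <- 10                     # sample size
alpha <- 0.05                   # 1 - alpha = 95% confidence
B     <- 10000                  # number of simulated samples
tq    <- qt(1 - alpha/2, n - 1) # t quantile with n-1 degrees of freedom

# For each simulated sample, check whether its CI contains mu
covered <- replicate(B, {
  x <- rnorm(n, mean = mu, sd = sigma)
  e <- tq * sd(x) / sqrt(n)     # margin of error
  (mean(x) - e <= mu) && (mu <= mean(x) + e)
})
coverage <- mean(covered)       # empirical coverage probability
coverage
```

Each individual interval either contains `mu` or it does not. The 95% describes the long-run proportion of intervals produced by the procedure, not the probability for a single computed interval.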

Exercise 6.5 Access the link https://seeing-theory.brown.edu/frequentist-inference/index.html#section2 and perform the simulation with different distributions, sample sizes ($$n$$) and confidence levels ($$1-\alpha$$).

### 6.2.1 Proportion ($$\pi$$)

The CI for the population proportion $$\pi$$ is given by the expression

$$CI \left[ \pi, 1-\alpha \right] = p \mp z \sqrt{\dfrac{p(1-p)}{n}} = \left[ p - z \sqrt{\dfrac{p(1-p)}{n}}, p + z \sqrt{\dfrac{p(1-p)}{n}} \right] \tag{6.4}$$

where $$1-\alpha$$ is the confidence level of the interval, $$p$$ is the sample proportion and $$n$$ is the sample size. $$z=z_{\frac{\alpha}{2}}$$ is, in absolute value, the quantile of the standard normal distribution that accumulates $$\frac{\alpha}{2}$$ of probability; for $$1-\alpha = 95\%$$, $$z = 1.96$$. For a more detailed discussion, see Agresti and Coull (1998).
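Expression (6.4) can be wrapped in a small R function to show how the confidence level affects the width of the interval. The name `ci_prop` and its arguments are hypothetical, chosen for this sketch. The counts (25 successes out of 125) are the ones used in Example 6.7.

```r
# Hypothetical helper: CI for a proportion, expression (6.4), any confidence level
ci_prop <- function(x, n, conf = 0.95) {
  p <- x / n                              # sample proportion
  z <- qnorm(1 - (1 - conf) / 2)          # |z_{alpha/2}|
  e <- z * sqrt(p * (1 - p) / n)          # margin of error
  c(lower = p - e, upper = p + e)
}

# 25 smokers out of 125 students, at three confidence levels
conf_levels <- c(0.90, 0.95, 0.99)
res <- t(sapply(conf_levels, function(cl) ci_prop(25, 125, cl)))
rownames(res) <- paste0(100 * conf_levels, "%")
round(res, 4)
```

A higher confidence level requires a larger quantile $$z$$ and therefore produces a wider interval.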

Example 6.7 (CI for $$\pi$$) Consider again the data from Example 6.4, where we want to calculate the CI for the proportion of smokers at PUCRS. We have $$\hat{\pi} = p = 25/125 = 0.2$$, $$n=125$$ and $$z=1.96$$. The CI with $$1-\alpha=95\%$$ is

$$CI \left[ \pi, 95\% \right] = 0.2 \mp 1.96 \sqrt{\dfrac{0.2 \left( 1-0.2 \right) } {125}} \approx 0.2 \mp 0.07 = \left[ 0.13, 0.27 \right] = \left[ 13\%, 27\% \right].$$

The margin of error is approximately $$0.07 = 7\%$$. Note the difference in precision between two values. The table gives $$z=-1.96$$ when you look up the probability 0.0250, while the qnorm function returns $$-1.959964$$.

```r
n <- 125
p <- 25/n
z <- abs(qnorm(0.025))         # |-1.959964|
(e <- z*sqrt(p*(1-p)/n))       # Margin of error
## [1] 0.0701218
(Lpi <- p - e)                 # Lower bound
## [1] 0.1298782
(Upi <- p + e)                 # Upper bound
## [1] 0.2701218
# Automatic reporting principle
cat('The 95% CI for the proportion is [',
    round(Lpi,2), ',',
    round(Upi,2), '].')
## The 95% CI for the proportion is [ 0.13 , 0.27 ].
```
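The difference in precision mentioned in Example 6.7 can be checked directly. This short sketch recomputes the margin of error twice, once with the rounded table value $$z = 1.96$$ and once with the qnorm value. The variable names `z_table` and `z_qnorm` are illustrative. Both margins agree to the reported precision, so rounding the quantile does not change the interval $$[0.13, 0.27]$$.

```r
n  <- 125
p  <- 25 / n
se <- sqrt(p * (1 - p) / n)       # standard error of p
z_table <- 1.96                   # value read from the normal table
z_qnorm <- abs(qnorm(0.025))      # value computed by R
c(table = z_table * se, qnorm = z_qnorm * se)  # margins of error
z_table - z_qnorm                 # difference between the two quantiles
```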

Exercise 6.6 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. The answers are in Appendix B; consult them only after trying to solve the exercises.

### 6.2.2 Mean ($$\mu$$)

The most realistic case for calculating the CI for the population mean $$\mu$$ is when $$\sigma$$ is unknown. The CI is then given by the expression

$$CI \left[ \mu, 1-\alpha \right] = \bar{x} \mp t \dfrac{s}{\sqrt{n}} = \left[ \bar{x} - t \dfrac{s}{\sqrt{n}}, \bar{x} + t \dfrac{s}{\sqrt{n}} \right], \tag{6.5}$$

where $$1-\alpha$$ is the confidence level, $$\bar{x}$$ is the sample mean, $$s$$ is the sample standard deviation and $$n$$ is the sample size. $$t=t_{n-1 , \frac{\alpha}{2}}$$ is the quantile of the $$t$$ distribution with $$n-1$$ degrees of freedom that accumulates $$1-\frac{\alpha}{2}$$ of probability. In the less realistic case where $$\sigma$$ is known, $$\sigma$$ replaces $$s$$ and the quantile $$z$$ of the standard normal distribution replaces $$t$$.
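When the raw data are available, expression (6.5) can be computed directly and compared with R's `t.test`. Its `conf.int` component returns the same $$t$$-based interval. The ten heights below are made-up values for illustration only, and `ci_mean` is a hypothetical helper name.

```r
# Hypothetical sample of 10 heights (in metres), for illustration only
x <- c(1.58, 1.65, 1.60, 1.71, 1.62, 1.57, 1.68, 1.63, 1.66, 1.60)

# Hypothetical helper: CI for the mean with sigma unknown, expression (6.5)
ci_mean <- function(x, conf = 0.95) {
  n  <- length(x)
  tq <- qt(1 - (1 - conf) / 2, df = n - 1)   # t quantile, n-1 degrees of freedom
  e  <- tq * sd(x) / sqrt(n)                  # margin of error
  c(lower = mean(x) - e, upper = mean(x) + e)
}

ci_mean(x)
t.test(x, conf.level = 0.95)$conf.int       # same interval from base R
```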

Example 6.8 (CI for $$\mu$$ with $$\sigma$$ unknown) Consider a sample of $$n=10$$ women in which the variable $$X$$: ‘height’ was observed. Suppose that $$X \sim \mathcal{N}(\mu,\sigma)$$, i.e., the variable ‘women’s height’ has a normal distribution with population mean $$\mu$$ and population standard deviation $$\sigma$$, both unknown. From the table of the $$t$$ distribution with $$10-1=9$$ degrees of freedom, the quantiles $$\pm 2.262$$ delimit a central area of approximately $$95\%$$, therefore $$t=2.262$$. If the sample has mean $$\bar{x}_{10} = 1.63$$ and standard deviation $$s=0.05$$, the CI with $$1-\alpha=95\%$$ is

$$CI \left[ \mu, 95\% \right] = 1.63 \mp 2.262 \dfrac{0.05}{\sqrt{10}} \approx 1.63 \mp 0.04 \approx \left[ 1.59, 1.67 \right].$$

The margin of error is approximately $$0.04$$, or 4 cm. This is greater than the margin of error of approximately $$0.03$$ obtained if $$\sigma = 0.05$$ were known, since $$z=1.96 < 2.262=t$$.

```r
n <- 10
m <- 1.63
s <- 0.05                     # sample standard deviation
t <- abs(qt(0.025, n-1))      # |-2.2621572|
(e <- t*s/sqrt(n))            # Margin of error
## [1] 0.03576785
(Lmu <- m - e)                # Lower bound
## [1] 1.594232
(Umu <- m + e)                # Upper bound
## [1] 1.665768
# Automatic reporting principle
cat('The 95% CI for the mean is [',
    round(Lmu,2), ',',
    round(Umu,2), '].')
## The 95% CI for the mean is [ 1.59 , 1.67 ].
```
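The gap between the two cases in (6.5) shrinks as the sample grows. The quantile of the $$t$$ distribution decreases towards $$z = 1.96$$ as the degrees of freedom increase. The quick check below uses arbitrarily chosen degrees of freedom.

```r
nu <- c(9, 29, 99, 999)                 # degrees of freedom n - 1 (arbitrary choices)
tq <- qt(0.975, nu)                     # t quantiles for 95% confidence
z  <- qnorm(0.975)                      # standard normal quantile
round(data.frame(df = nu, t = tq, z = z), 4)
```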

Exercise 6.7 Redo the calculations in Example 6.8, but now assuming that $$\sigma=0.05$$ is known. What changes?

### 6.2.3 Variance ($$\sigma^2$$)

The CI for the variance $$\sigma^2$$ is given by the expression

$$CI \left[ \sigma^2, 1-\alpha \right] = \left[ \frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}}, \frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}} \right] \tag{6.6}$$

where $$1-\alpha$$ is the confidence level, $$s^2$$ is the sample variance and $$n$$ is the sample size. $$\chi^2_{1-\frac{\alpha}{2}}$$ and $$\chi^2_{\frac{\alpha}{2}}$$ are the quantiles of the chi-square distribution with $$\nu = n-1$$ degrees of freedom that accumulate $$1-\frac{\alpha}{2}$$ and $$\frac{\alpha}{2}$$ of probability, respectively.

Example 6.9 (CI for $$\sigma^2$$) Again using the first 10 observations from the table in Example 2.12, the sample variance is $$s^2=0.05^2 = 0.0025$$ and $$\nu = 10-1 = 9$$. According to the chi-square table, $$\chi_{0.025}^2 = 2.70$$ and $$\chi_{0.975}^2 = 19.02$$. The CI with $$1-\alpha = 95\%$$ for $$\sigma^2$$ is

$$CI \left[ \sigma^2, 95\% \right] = \left[ \dfrac{(10-1) \times 0.0025}{19.02}, \dfrac{(10-1) \times 0.0025}{2.70} \right] \approx \left[ 0.0012, 0.0083 \right].$$

```r
s <- 0.05                              # sample standard deviation
n <- 10
gl <- n - 1                            # degrees of freedom
# quantiles via qchisq (more accurate than the table)
qui.025.qchi <- qchisq(.025, gl)
qui.975.qchi <- qchisq(.975, gl)
# CI for the variance via qchisq
(Lvar.qchi <- gl*s^2/qui.975.qchi)     # Lower bound
## [1] 0.001182793
(Uvar.qchi <- gl*s^2/qui.025.qchi)     # Upper bound
## [1] 0.008332131
```
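The table-based quantiles of Example 6.9 can be compared side by side with the qchisq values used above. This sketch recomputes the variance interval with both sets of quantiles; the variable names `q.tab`, `q.qchi`, `ci.tab` and `ci.qchi` are illustrative. The two intervals agree to about three significant digits, so the rounded result $$[0.0012, 0.0083]$$ is the same.

```r
s  <- 0.05
n  <- 10
gl <- n - 1                               # degrees of freedom
q.tab  <- c(2.70, 19.02)                  # chi-square table values (Example 6.9)
q.qchi <- qchisq(c(0.025, 0.975), gl)     # quantiles computed by qchisq
q.qchi
# The lower limit divides by the larger quantile, the upper limit by the smaller one
ci.tab  <- gl * s^2 / q.tab[c(2, 1)]
ci.qchi <- gl * s^2 / q.qchi[c(2, 1)]
rbind(table = ci.tab, qchisq = ci.qchi)
```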

### 6.2.4 Standard deviation ($$\sigma$$)

The CI for the standard deviation $$\sigma$$ is obtained by taking the square roots of the limits of the CI for the variance:

$$CI \left[ \sigma, 1-\alpha \right] = \left[ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}}}, \sqrt{\frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}}} \right] \tag{6.7}$$

where $$1-\alpha$$ is the confidence level and the remaining symbols are as in (6.6).

Example 6.10 (CI for $$\sigma$$) From Example 6.9,

$$CI \left[ \sigma, 95\% \right] = \left[ \sqrt{\dfrac{(10-1) \times 0.0025}{19.02}}, \sqrt{\dfrac{(10-1) \times 0.0025}{2.70}} \right] \approx \left[ 0.0344, 0.0913 \right].$$

```r
# CI for the standard deviation (uses Lvar.qchi and Uvar.qchi from Example 6.9)
(Lsd <- sqrt(Lvar.qchi))  # Lower bound
## [1] 0.03439176
(Usd <- sqrt(Uvar.qchi))  # Upper bound
## [1] 0.09128051
```
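Expressions (6.6) and (6.7) can be combined in a single helper that starts from raw data instead of summary values. The function name `ci_var_sd` and the ten heights are hypothetical, used only for illustration. As with (6.6) and (6.7), the interval relies on the assumption that the population is normal.

```r
# Hypothetical sample of 10 heights (in metres), for illustration only
x <- c(1.58, 1.65, 1.60, 1.71, 1.62, 1.57, 1.68, 1.63, 1.66, 1.60)

# Hypothetical helper: CIs for sigma^2 and sigma, assuming a normal population
ci_var_sd <- function(x, conf = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf
  q     <- qchisq(c(1 - alpha / 2, alpha / 2), df = n - 1)  # larger, smaller quantile
  v     <- (n - 1) * var(x) / q                              # variance limits, (6.6)
  out   <- rbind(variance = v, sd = sqrt(v))                 # sd limits, (6.7)
  colnames(out) <- c("lower", "upper")
  out
}

res <- ci_var_sd(x)
res
```

The square root is an increasing function. Taking square roots of the variance limits therefore keeps the same confidence level $$1-\alpha$$ for the interval for $$\sigma$$.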

Exercise 6.8 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. The answers are in Appendix B; consult them only after trying to solve the exercises.

### References

Agresti, Alan, and Brent A Coull. 1998. “Approximate Is Better Than ‘Exact’ for Interval Estimation of Binomial Proportions.” The American Statistician 52 (2): 119–26. https://www.tandfonline.com/doi/pdf/10.1080/00031305.1998.10480550.