6.2 Confidence Interval

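A confidence interval (CI) is a way of estimating a parameter through an interval rather than a single value. Its interpretation can be tricky, so it is usually explained through the nominal coverage probability: if the sampling procedure were repeated many times, approximately $1-\alpha$ of the intervals built this way would contain the true value of the parameter.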

Exercise 6.5 Access the link https://seeing-theory.brown.edu/frequentist-inference/index.html#section2 and perform the simulation with different distributions, sample sizes ($n$) and confidence levels ($1-\alpha$).
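
The nominal coverage idea can also be checked numerically. The sketch below is only an illustration and is not part of the original material: it assumes a normal population with arbitrarily chosen values (`mu`, `sigma`, `n` and `reps` are all hypothetical) and uses the $t$ interval for the mean described in Section 6.2.2.

```r
# Sketch: empirical coverage of the 95% t interval for the mean.
# Population values and the number of replications are illustrative assumptions.
set.seed(1)
mu    <- 10      # true mean (assumed)
sigma <- 2       # true standard deviation (assumed)
n     <- 30      # sample size
alpha <- 0.05    # 1 - alpha is the nominal confidence level
reps  <- 10000   # number of simulated samples

covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)                 # one simulated sample
  e <- qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)   # margin of error
  (mean(x) - e <= mu) && (mu <= mean(x) + e)           # does the CI contain mu?
})

coverage <- mean(covered)   # proportion of intervals that contain mu
coverage                    # expected to be close to 1 - alpha
```

Changing `n`, `alpha` or the population in this sketch mirrors what the linked simulation does interactively.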

6.2.1 Proportion ($\pi$)

The CI for the population proportion $\pi$ is given by the expression \[\begin{equation} CI \left[ \pi, 1-\alpha \right] = p \mp z \sqrt{\dfrac{p(1-p)}{n}} = \left[ p - z \sqrt{\dfrac{p(1-p)}{n}}, p + z \sqrt{\dfrac{p(1-p)}{n}} \right] \tag{6.4} \end{equation}\]

where $1-\alpha$ is the confidence level of the interval, $p$ is the sample proportion, $n$ is the sample size and $z=z_{\frac{\alpha}{2}}$ is the quantile of the standard normal distribution that leaves $\frac{\alpha}{2}$ of probability in the upper tail, i.e., that accumulates $1-\frac{\alpha}{2}$ of probability. For a more detailed discussion, see (Agresti and Coull 1998).

Example 6.7 (CI for $\pi$) Consider again the data from Example 6.4, where we want to calculate the CI for the proportion of smokers at PUCRS. It is known that $\hat{\pi} = p = 25/125 = 0.2$, $n=125$ and $z=1.96$. The CI of $1-\alpha=95\%$ is \[ CI \left[ \pi, 95\% \right] = 0.2 \mp 1.96 \sqrt{\dfrac{0.2 \left( 1-0.2 \right) } {125}} \approx 0.2 \mp 0.07 = \left[ 0.13, 0.27 \right] = \left[ 13\%, 27\% \right]. \] The margin of error is approximately $0.07 = 7\%$. Note the difference in precision between the table value $z=-1.96$, found by looking up the probability 0.0250, and the value calculated with the qnorm function.

n = 125
p = 25/n
z = abs(qnorm(0.025))         # |-1.959964|
(e = z*sqrt(p*(1-p)/n))       # Error margin

## [1] 0.0701218

(Lpi = p - e)                 # Lower bound

## [1] 0.1298782

(Upi = p + e)                 # Upper bound

## [1] 0.2701218

# Automatic reporting principle
cat('The 95% CI for the proportion is [',
    round(Lpi,2), ',',
    round(Upi,2), '].')

## The 95% CI for the proportion is [ 0.13 , 0.27 ].
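
The calculation above can be wrapped in a small function so that other counts and confidence levels are easy to try. The sketch below is an illustration, not the author's code: `wald_ci` and `agresti_coull_ci` are hypothetical names. The first implements Equation (6.4); the second is a common form of the adjusted interval associated with the cited reference, which adds $z^2/2$ successes and $z^2/2$ failures (about two of each at 95%) before applying the same formula.

```r
# Hypothetical helper: interval of Eq. (6.4) from x successes in n trials.
wald_ci <- function(x, n, conf = 0.95) {
  p <- x / n                              # sample proportion
  z <- qnorm(1 - (1 - conf) / 2)          # quantile accumulating 1 - alpha/2
  e <- z * sqrt(p * (1 - p) / n)          # margin of error
  c(lower = p - e, upper = p + e)
}

# Hypothetical helper: adjusted interval (assumed form, see lead-in).
agresti_coull_ci <- function(x, n, conf = 0.95) {
  z     <- qnorm(1 - (1 - conf) / 2)
  n_adj <- n + z^2                        # adjusted sample size
  p_adj <- (x + z^2 / 2) / n_adj          # adjusted proportion
  e     <- z * sqrt(p_adj * (1 - p_adj) / n_adj)
  c(lower = p_adj - e, upper = p_adj + e)
}

ci_wald <- wald_ci(25, 125)               # same data as Example 6.7
ci_ac   <- agresti_coull_ci(25, 125)
round(rbind(wald = ci_wald, agresti_coull = ci_ac), 4)
```

With $n=125$ the two intervals are close; the adjustment tends to matter more for small samples or proportions near 0 or 1.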

Exercise 6.6 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. Check the answers in Appendix B, but only after trying to solve the exercises.

6.2.2 Mean ($\mu$)

The most realistic case for calculating the CI for the population mean is the one in which $\sigma$ is unknown. It is given by the expression \[\begin{eqnarray} CI \left[ \mu, 1-\alpha \right] = \bar{x} \mp t \dfrac{s}{\sqrt{n}} = \left[ \bar{x} - t \dfrac{s}{\sqrt{n}}, \bar{x} + t \dfrac{s}{\sqrt{n}} \right], \tag{6.5} \end{eqnarray}\] where $1-\alpha$ is the confidence level, $\bar{x}$ is the sample mean, $s$ is the sample standard deviation, $n$ is the sample size and $t=t_{n-1 , \frac{\alpha}{2}}$ is the quantile of the $t$ distribution with $n-1$ degrees of freedom that accumulates $1-\frac{\alpha}{2}$ of probability. In the less realistic case in which $\sigma$ is known, it is used instead of $s$, and the quantile of the standard normal distribution replaces that of the $t$ distribution with $n-1$ degrees of freedom.

Example 6.8 (CI for $\mu$ with $\sigma$ unknown) Consider a sample of $n=10$ women in which the variable $X$: ‘height’ was observed. Suppose that $X \sim \mathcal{N}(\mu,\sigma)$, i.e., the variable ‘women’s height’ has a normal distribution with population mean $\mu$ and population standard deviation $\sigma$, both unknown. From the table of the $t$ distribution with $10-1=9$ degrees of freedom, it is known that the quantiles $\pm 2.262$ delimit an area of approximately $95\%$, therefore $t=2.262$. If the sample yields a mean of $\bar{x}_{10} = 1.63$ and a standard deviation of $s=0.05$, the CI of $1-\alpha=95\%$ is \[ CI \left[ \mu, 95\% \right] = 1.63 \mp 2.262 \dfrac{0.05}{\sqrt{10}} \approx 1.63 \mp 0.04 \approx \left[ 1.59, 1.67 \right]. \] The margin of error is approximately $0.04$, or 4 cm, greater than the margin of error of $0.03$ obtained when $\sigma$ is assumed known, since $z=1.96 < 2.262=t$.

n <- 10
m <- 1.63
s <- 0.05                     # sample standard deviation
t <- abs(qt(0.025, n-1))      # |-2.2621572|
(e <- t*s/sqrt(n))            # Error margin

## [1] 0.03576785

(Lmu <- m - e)                # Lower bound

## [1] 1.594232

(Umu <- m + e)                # Upper bound

## [1] 1.665768

# Automatic reporting principle
cat('The 95% CI for the mean is [',
    round(Lmu,2), ',',
    round(Umu,2), '].')

## The 95% CI for the mean is [ 1.59 , 1.67 ].
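
The comparison $z=1.96 < 2.262=t$ in Example 6.8 reflects a general pattern: the $t$ quantile exceeds the normal quantile and approaches it as $n$ grows. The sketch below illustrates this for a few sample sizes chosen only for illustration (`n_values`, `t_quant` and `z_quant` are hypothetical names).

```r
# Sketch: t quantiles versus the standard normal quantile at 95% confidence.
alpha    <- 0.05
n_values <- c(5, 10, 30, 100, 1000)               # illustrative sample sizes
t_quant  <- qt(1 - alpha/2, df = n_values - 1)    # t quantiles with n - 1 df
z_quant  <- qnorm(1 - alpha/2)                    # standard normal quantile

data.frame(n     = n_values,
           t     = round(t_quant, 3),
           z     = round(z_quant, 3),
           ratio = round(t_quant / z_quant, 3))   # how much wider the t interval is
```

The `ratio` column shows how much wider the interval becomes when $s$ replaces a known $\sigma$, for the same standard deviation value and sample size.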

Exercise 6.7 Redo the calculations in Example 6.8, but now considering $\sigma=0.05$. What changes?

6.2.3 Variance ($\sigma^2$)

The CI for the variance $\sigma^2$ is given by the expression \[\begin{equation} CI \left[ \sigma^2, 1-\alpha \right] = \left[ \frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}}, \frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}} \right] \tag{6.6} \end{equation}\] where $1-\alpha$ is the confidence level, $s^2$ is the sample variance, $n$ is the sample size, $\chi^2_{1-\frac{\alpha}{2}}$ is the quantile of the chi-square distribution with $\nu = n-1$ degrees of freedom that accumulates $1-\frac{\alpha}{2}$ of probability and $\chi^2_{\frac{\alpha}{2}}$ is the quantile of the chi-square distribution with $n-1$ degrees of freedom that accumulates $\frac{\alpha}{2}$ of probability.

Example 6.9 (CI for $\sigma^2$) Again using the first 10 observations from the table in Example 2.12, it is known that the sample variance is $s^2=0.05^2 = 0.0025$ and $\nu = 10-1 = 9$. According to the chi-square table, $\chi_{0.025}^2 = 2.70$ and $\chi_{0.975}^2 = 19.02$. The CI of $1-\alpha = 95\%$ for $\sigma^2$ is \[ CI \left[ \sigma^2, 95\% \right] = \left[ \dfrac{(10-1) \times 0.0025}{19.02}, \dfrac{(10-1) \times 0.0025}{2.70} \right] \approx \left[ 0.0012, 0.0083 \right]. \]

s = 0.05                      # sample standard deviation
n = 10                        # sample size
gl = n-1                      # degrees of freedom
# quantiles via qchisq (more accurate)
qui.025.qchi = qchisq(.025, gl)
qui.975.qchi = qchisq(.975, gl)
# CI for variance via qchisq
(Lvar.qchi <- gl*s^2/qui.975.qchi)  # Lower bound

## [1] 0.001182793

(Uvar.qchi <- gl*s^2/qui.025.qchi)  # Upper bound

## [1] 0.008332131
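
Equation (6.6) can also be written as a reusable function. The sketch below is an illustration under the same normality assumption as the example; `var_ci` is a hypothetical name, not a function from base R or from the original text. It also computes the distance from $s^2$ to each bound, which shows that this interval is not symmetric around the point estimate.

```r
# Hypothetical helper implementing Eq. (6.6) from summary statistics.
var_ci <- function(s2, n, conf = 0.95) {
  alpha <- 1 - conf
  df    <- n - 1                                   # degrees of freedom
  c(lower = df * s2 / qchisq(1 - alpha/2, df),     # larger quantile -> lower bound
    upper = df * s2 / qchisq(alpha/2, df))         # smaller quantile -> upper bound
}

s2     <- 0.05^2                   # sample variance from Example 6.9
ci_var <- var_ci(s2, n = 10)
ci_var

# Distances from the point estimate to each bound (unequal)
c(below = s2 - ci_var[["lower"]], above = ci_var[["upper"]] - s2)
```

The asymmetry comes from the chi-square distribution being skewed, in contrast to the symmetric intervals for $\pi$ and $\mu$.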

6.2.4 Standard deviation ($\sigma$)

The CI for the standard deviation $\sigma$ is obtained by taking the square root of the bounds of the CI for the variance. It is given by the expression \[\begin{equation} CI \left[ \sigma, 1-\alpha \right] = \left[ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}} }, \sqrt{\frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}}} \right] \tag{6.7} \end{equation}\] where $1-\alpha$ is the confidence level, $s^2$ is the sample variance, $n$ is the sample size, $\chi^2_{1-\frac{\alpha}{2}}$ is the quantile of the chi-square distribution with $\nu = n-1$ degrees of freedom that accumulates $1-\frac{\alpha}{2}$ of probability and $\chi^2_{\frac{\alpha}{2}}$ is the quantile of the chi-square distribution with $n-1$ degrees of freedom that accumulates $\frac{\alpha}{2}$ of probability.

Example 6.10 (CI for $\sigma$) From Example 6.9, \[ CI \left[ \sigma, 95\% \right] = \left[ \sqrt{\dfrac{(10-1) \times 0.0025}{19.02}}, \sqrt{\dfrac{(10-1) \times 0.0025}{2.70}} \right] \approx \left[ 0.0344, 0.0913 \right]. \]

# CI for standard deviation
(Lsd <- sqrt(Lvar.qchi))  # Lower bound

## [1] 0.03439176

(Usd <- sqrt(Uvar.qchi))  # Upper bound

## [1] 0.09128051

Exercise 6.8 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. Check the answers in Appendix B, but only after trying to solve the exercises.

References

Agresti, Alan, and Brent A. Coull. 1998. “Approximate Is Better Than ‘Exact’ for Interval Estimation of Binomial Proportions.” The American Statistician 52 (2): 119–26. https://www.tandfonline.com/doi/pdf/10.1080/00031305.1998.10480550.
