6.2 Confidence Interval
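A confidence interval estimates a parameter through a range of plausible values rather than a single number. Its interpretation can be tricky, and it is usually explained through the nominal coverage probability: the proportion of intervals, over repeated samples, that contain the true value of the parameter.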
Exercise 6.5 Access the link https://seeing-theory.brown.edu/frequentist-inference/index.html#section2 and perform the simulation with different distributions, sample sizes (\(n\)) and confidence levels (\(1-\alpha\)).
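The coverage idea can also be checked locally with a small simulation. The sketch below is only an illustration under stated assumptions: the true values `mu` and `sigma`, the sample size, the number of replications `B` and the seed are all hypothetical choices, and the interval used is the t-based interval for the mean presented in Section 6.2.2.

```r
# Sketch: empirical coverage of the t-based 95% CI for a normal mean.
# All numeric settings below are hypothetical, chosen for illustration only.
set.seed(42)           # arbitrary seed, for reproducibility
mu <- 10; sigma <- 2   # assumed true parameters (unknown in practice)
n <- 30                # sample size
alpha <- 0.05          # 1 - alpha is the nominal confidence level
B <- 10000             # number of simulated samples

covered <- replicate(B, {
  x <- rnorm(n, mean = mu, sd = sigma)                 # one simulated sample
  e <- qt(1 - alpha/2, df = n - 1) * sd(x) / sqrt(n)   # margin of error
  (mean(x) - e <= mu) && (mu <= mean(x) + e)           # does the CI contain mu?
})

coverage <- mean(covered)  # proportion of intervals that contain mu
coverage                   # should be close to the nominal 0.95
```

Each individual interval either contains `mu` or it does not; the 95% refers to the long-run proportion of intervals that do.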
6.2.1 Proportion (\(\pi\))
The CI for the population proportion \(\pi\) is given by the expression \[\begin{equation} IC \left[ \pi, 1-\alpha \right] = p \mp z \sqrt{\dfrac{p(1-p)}{n}} = \left[ p - z \sqrt{\dfrac{p(1-p)}{n}}, p + z \sqrt{\dfrac{p(1-p)}{n}} \right] \tag{6.4} \end{equation}\]
where \(1-\alpha\) is the confidence level of the interval, \(p\) is the sample proportion, \(n\) is the sample size and \(z=z_{\frac{\alpha}{2}}\) is the quantile of the standard normal distribution that accumulates \(1-\frac{\alpha}{2}\) of probability, i.e., leaves \(\frac{\alpha}{2}\) in the upper tail. For a more detailed discussion, see (Agresti and Coull 1998).
Example 6.7 (CI for \(\pi\)) Consider again the data from Example 6.4, where we want to calculate the CI for the proportion of smokers in PUCRS. It is known that \(\hat{\pi} = p = 25/125 = 0.2\), \(n=125\) and \(z=1.96\). The CI of \(1-\alpha=95\%\) is \[ IC \left[ \pi, 95\% \right] = 0.2 \mp 1.96 \sqrt{\dfrac{0.2 \left( 1-0.2 \right) } {125}} \approx 0.2 \mp 0.07 = \left[ 0.13, 0.27 \right] = \left[ 13\%, 27\% \right]. \] The margin of error is approximately \(0.07 = 7\%\). Note the difference in precision between the table value, obtained by looking up the probability 0.0250 that corresponds to \(z=-1.96\), and the value calculated with the qnorm function.
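n <- 125
p <- 25/125 # sample proportion
z <- abs(qnorm(0.025)) # |-1.959964|
(e <- z*sqrt(p*(1-p)/n)) # Error margin
## [1] 0.0701218
(Lpi <- p - e) # Lower bound
## [1] 0.1298782
(Upi <- p + e) # Upper bound
## [1] 0.2701218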
# Automatic report principle
cat('The CI 95% for the proportion is [',
round(Lpi,2), ',',
round(Upi,2), '].')
## The CI 95% for the proportion is [ 0.13 , 0.27 ].
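To reuse the calculation in Equation (6.4) with other confidence levels, the steps can be wrapped in a small function. This is a minimal sketch: the name `ci_prop` and its arguments are hypothetical and not part of the original material.

```r
# Hypothetical helper implementing Eq. (6.4).
# x: number of successes, n: sample size, conf: confidence level (1 - alpha).
ci_prop <- function(x, n, conf = 0.95) {
  p <- x / n                       # sample proportion
  z <- qnorm(1 - (1 - conf) / 2)   # quantile accumulating 1 - alpha/2
  e <- z * sqrt(p * (1 - p) / n)   # margin of error
  c(lower = p - e, upper = p + e)
}

ci95 <- ci_prop(25, 125)               # data from Example 6.7
ci99 <- ci_prop(25, 125, conf = 0.99)  # higher confidence, wider interval
round(ci95, 2)
round(ci99, 2)
```

Raising the confidence level from 95% to 99% increases \(z\) and therefore the margin of error, so the interval becomes wider for the same data.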
Exercise 6.6 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. Check Appendix B for the answers to the exercises, but only after trying to solve them.
6.2.2 Mean (\(\mu\))
The most realistic case for calculating the CI for the population mean involves not knowing \(\sigma\). It is given by the expression \[\begin{eqnarray} IC \left[ \mu, 1-\alpha \right] = \bar{x} \mp t \dfrac{s}{\sqrt{n}} = \left[ \bar{x} - t \dfrac{s}{\sqrt{n}}, \bar{x} + t \dfrac{s}{\sqrt{n}} \right], \tag{6.5} \end{eqnarray}\] where \(1-\alpha\) is the confidence level, \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size and \(t=t_{n-1 , \frac{\alpha}{2}}\) is the quantile of the \(t\) distribution with \(n-1\) degrees of freedom that accumulates \(1-\frac{\alpha}{2}\) of probability. In the less realistic case, the known \(\sigma\) is used instead of \(s\), implying the use of a standard normal quantile instead of a \(t\) quantile with \(n-1\) degrees of freedom.
Example 6.8 (CI for \(\mu\) with \(\sigma\) unknown) Consider a sample of \(n=10\) women in which the variable \(X\): ‘height’ was observed. Suppose that \(X \sim \mathcal{N}(\mu,\sigma)\), i.e., the variable ‘women’s height’ has a normal distribution with population mean \(\mu\) and population standard deviation \(\sigma\), both unknown. From the table of the \(t\) distribution with \(10-1=9\) degrees of freedom, it is known that the quantiles \(\pm 2.262\) delimit an area of approximately \(95\%\), therefore \(t=2.262\). If the sample yields a mean of \(\bar{x}_{10} = 1.63\) and a standard deviation of \(s=0.05\), the CI of \(1-\alpha=95\%\) is \[ IC \left[ \mu, 95\% \right] = 1.63 \mp 2.262 \dfrac{0.05}{\sqrt{10}} \approx 1.63 \mp 0.04 \approx \left[ 1.59, 1.67 \right]. \] The margin of error is approximately \(0.04\) m, or 4 cm, greater than the margin of error of \(0.03\) obtained when assuming \(\sigma\) is known, since \(z=1.96 < 2.262=t\).
n <- 10
m <- 1.63
s <- 0.05 # sample standard deviation
t <- abs(qt(0.025, n-1)) # |-2.2621572|
(e <- t*s/sqrt(n)) # Error margin
## [1] 0.03576785
(Lmu <- m - e) # Lower bound
## [1] 1.594232
(Umu <- m + e) # Upper bound
## [1] 1.665768
# Automatic reporting principle
cat('The 95% CI for the mean is [',
round(Lmu,2), ',',
round(Umu,2), '].')
## The 95% CI for the mean is [ 1.59 , 1.67 ].
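When the raw observations are available, the interval in Equation (6.5) can be cross-checked against base R's `t.test`, which reports the same t-based interval in its `conf.int` component. The sketch below uses simulated heights as a stand-in for real data; the vector `x` and the seed are hypothetical.

```r
# Sketch: manual t-based CI versus the interval returned by t.test().
set.seed(1)                             # arbitrary seed
x <- rnorm(10, mean = 1.63, sd = 0.05)  # hypothetical heights in metres

n <- length(x)
e <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)   # margin of error, Eq. (6.5)
manual <- c(mean(x) - e, mean(x) + e)

# t.test() returns the CI with a conf.level attribute; as.numeric() drops it
builtin <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)

all.equal(manual, builtin)  # the two intervals agree
```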
Exercise 6.7 Redo the calculations in Example 6.8, but now considering \(\sigma=0.05\). What changes?
6.2.3 Variance (\(\sigma^2\))
The CI for the variance \(\sigma^2\) is given by the expression \[\begin{equation} IC \left[ \sigma^2, 1-\alpha \right] = \left[ \frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}}, \frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}} \right] \tag{6.6} \end{equation}\] where \(1-\alpha\) is the confidence level, \(s^2\) is the sample variance, \(n\) is the sample size, \(\chi^2_{1-\frac{\alpha}{2}}\) is the quantile of the chi-square distribution with \(\nu = n-1\) degrees of freedom that accumulates \(1-\frac{\alpha}{2}\) of probability and \(\chi^2_{\frac{\alpha}{2}}\) is the quantile of the chi-square distribution with \(n-1\) degrees of freedom that accumulates \(\frac{\alpha}{2}\) of probability.
Example 6.9 (CI for \(\sigma^2\)) Again using the first 10 observations from the table in Example 2.12, it is known that the sample variance is \(s^2=0.05^2 = 0.0025\) and \(\nu = 10-1 = 9\). According to the chi-square table, \(\chi_{0.025}^2 = 2.70\) and \(\chi_{0.975}^2 = 19.02\). The CI of \(1-\alpha = 95\%\) for \(\sigma^2\) is \[ IC \left[ \sigma^2, 95\% \right] = \left[ \dfrac{(10-1) \times 0.0025}{19.02}, \dfrac{(10-1) \times 0.0025}{2.70} \right] \approx \left[ 0.0012, 0.0083 \right]. \]
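The form of this interval, with the larger quantile in the lower bound, follows from a short derivation. As a sketch, assuming the observations come from a normal distribution, the quantity \((n-1)s^2/\sigma^2\) has a chi-square distribution with \(n-1\) degrees of freedom, so \[ P\left( \chi^2_{\frac{\alpha}{2}} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\frac{\alpha}{2}} \right) = 1-\alpha. \] Taking reciprocals reverses the inequalities, and multiplying by \((n-1)s^2\) isolates \(\sigma^2\): \[ P\left( \frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}} \right) = 1-\alpha. \] Because the chi-square distribution is asymmetric, the resulting interval is not centered at \(s^2\).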
s = 0.05
n = 10
gl = n-1
# quantiles via qchisq (more accurate)
qui.025.qchi = qchisq(.025, gl)
qui.975.qchi = qchisq(.975, gl)
# CI for variance via qchisq
(Lvar.qchi <- gl*s^2/qui.975.qchi) # Lower bound
## [1] 0.001182793
(Uvar.qchi <- gl*s^2/qui.025.qchi) # Upper bound
## [1] 0.008332131
6.2.4 Standard deviation (\(\sigma\))
The CI for the standard deviation \(\sigma\) is obtained by taking the square root of the limits of the CI for the variance. It is given by the expression \[\begin{equation} IC \left[ \sigma, 1-\alpha \right] = \left[ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\frac{\alpha}{2}}} }, \sqrt{\frac{(n-1)s^2}{\chi^2_{\frac{\alpha}{2}}}} \right] \tag{6.7} \end{equation}\] where \(1-\alpha\) is the confidence level, \(s^2\) is the sample variance, \(n\) is the sample size, \(\chi^2_{1-\frac{\alpha}{2}}\) is the quantile of the chi-square distribution with \(\nu = n-1\) degrees of freedom that accumulates \(1-\frac{\alpha}{2}\) of probability and \(\chi^2_{\frac{\alpha}{2}}\) is the quantile of the chi-square distribution with \(n-1\) degrees of freedom that accumulates \(\frac{\alpha}{2}\) of probability.
Example 6.10 (CI for \(\sigma\)) From Example 6.9, \[ IC \left[ \sigma, 95\% \right] = \left[ \sqrt{\dfrac{(10-1) \times 0.0025}{19.02}}, \sqrt{\dfrac{(10-1) \times 0.0025}{2.70}} \right] \approx \left[ 0.0344, 0.0913 \right]. \]
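# CI for standard deviation: square root of the variance limits
sqrt(gl*s^2/qui.975.qchi) # Lower bound
## [1] 0.03439176
sqrt(gl*s^2/qui.025.qchi) # Upper bound
## [1] 0.09128051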
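Equations (6.6) and (6.7) can be combined in one reusable function. The sketch below is illustrative only: the name `ci_var` and its arguments are hypothetical, and the interval relies on the assumption that the data are normally distributed.

```r
# Hypothetical helper implementing Eq. (6.6); assumes normally distributed data.
# s2: sample variance, n: sample size, conf: confidence level (1 - alpha).
ci_var <- function(s2, n, conf = 0.95) {
  alpha <- 1 - conf
  df <- n - 1                                       # degrees of freedom
  c(lower = df * s2 / qchisq(1 - alpha / 2, df),    # larger quantile, lower bound
    upper = df * s2 / qchisq(alpha / 2, df))        # smaller quantile, upper bound
}

ci_v <- ci_var(0.05^2, 10)  # CI for the variance, data from Example 6.9
ci_s <- sqrt(ci_v)          # CI for the standard deviation, Eq. (6.7)
ci_v
ci_s
```

Since the square root is an increasing function, applying it to both limits of the variance interval preserves the confidence level.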
Exercise 6.8 Access the material Classic Statistics on RStudio and solve extra exercises 1 to 9 on pages 99 and 100. Check Appendix B for the answers to the exercises, but only after trying to solve them.