4.6 Sample size calculation
The sample size calculation rests on a series of assumptions made by the researcher. The values suggested by the different sample size calculation methods should be treated only as a reference, since the inputs used to obtain them are to some extent arbitrary. Time and cost are two practical constraints that must also be taken into account, and they may override the calculated sample size.
Below, very simple cases are presented, which are nonetheless sufficient to illustrate the principles involved. For more features, the R package pwr (Champely 2020) and the software G*Power (Faul, Erdfelder, Lang, and Buchner 2007; Faul, Erdfelder, Buchner, and Lang 2009) are recommended. For a more theoretical approach, Chow, Wang, and Shao (2007) is recommended.
4.6.1 Average
One way to estimate the sample size for inference about the population mean \(\mu\) is to take the margin of error of Equation (??) and solve for \(n\), which gives \[\begin{equation} n = \left \lceil{ \left( \frac{z \sigma}{\varepsilon} \right)^2 }\right \rceil. \tag{4.5} \end{equation}\]
The operator \(\left \lceil{ x }\right \rceil\) denotes the ceiling function of \(x\), i.e., the smallest integer greater than or equal to \(x\).
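As a brief sketch of where Equation (4.5) comes from, suppose the margin of error takes the usual form \(\varepsilon = z \sigma / \sqrt{n}\) for a mean with known \(\sigma\). This form is an assumption here, since the referenced equation is not reproduced in this section. Solving for \(n\) gives \[ \varepsilon = \frac{z \sigma}{\sqrt{n}} \;\Longrightarrow\; \sqrt{n} = \frac{z \sigma}{\varepsilon} \;\Longrightarrow\; n = \left( \frac{z \sigma}{\varepsilon} \right)^2. \]
Since \(n\) must be an integer and rounding down would leave a margin of error larger than \(\varepsilon\), the result is rounded up with the ceiling function.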
Example 4.30 (Sample size for average) We want to obtain the sample size needed to estimate the average height of PUCRS students. A confidence level of \(1-\alpha = 95\%\) is considered, with a margin of error of \(\varepsilon = 3\) cm. From previous studies, \(\sigma = 15\) cm is assumed. Using Equation (4.5), and knowing from the standard normal distribution table that \(z = 1.96\), we have \[n = \left \lceil{ \left( \frac{1.96 \times 15}{3} \right)^2 }\right \rceil = \left \lceil{ 96.04 }\right \rceil = 97.\]
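# Sample size for a mean, Equation (4.5)
n_m <- function(z, sigma, e) {
  exact <- (z * sigma / e)^2  # value before rounding
  ceil <- ceiling(exact)      # round up to the next integer
  return(list(exact = exact,
              ceiling = ceil))
}
n_m(1.96, 15, 3)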
## $exact
## [1] 96.04
##
## $ceiling
## [1] 97
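Instead of reading \(z\) from the standard normal table, the quantile can be computed with `qnorm()` from base R. The following is a minimal sketch under that approach. The helper name `n_mean_conf` and its argument `conf` are hypothetical and do not come from the text.

```r
# Hypothetical helper (not from the text): sample size for a mean,
# taking the confidence level instead of a tabulated z.
n_mean_conf <- function(conf, sigma, e) {
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)    # two-sided standard normal quantile
  exact <- (z * sigma / e)^2   # Equation (4.5) before rounding up
  list(z = z, exact = exact, ceiling = ceiling(exact))
}

# Same inputs as Example 4.30: 95% confidence, sigma = 15 cm, e = 3 cm
res_mean <- n_mean_conf(conf = 0.95, sigma = 15, e = 3)
res_mean$ceiling
```

With the inputs of Example 4.30, the unrounded quantile is very close to 1.96, so the rounded-up sample size is again 97.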
4.6.2 Proportion
One way to estimate the sample size for inference about the population proportion \(\pi\) is to take the margin of error of Equation (6.4) and solve for \(n\), which gives \[\begin{equation} n = \left \lceil{ \frac{z^2 p (1-p)}{\varepsilon^2} }\right \rceil. \tag{4.6} \end{equation}\]
In some cases prior information about the proportion is available. When nothing is known about it, \(p=\frac{1}{2}\) is used, the point at which \(p(1-p)\) reaches its maximum, which yields the most conservative (largest) sample size.
Exercise 4.15 Obtain the result of Equation (4.6) from the margin of error of Equation (6.4).
Exercise 4.16 Verify that \(p(1-p)\) reaches its maximum when \(p=\frac{1}{2}\).
Example 4.31 (Sample size for proportion) In an electoral survey, we want to calculate the approximate sample size so that the margin of error is \(\varepsilon = 2\%\) with a confidence level of \(1-\alpha = 95\%\). Using Equation (4.6), and knowing from the standard normal distribution table that \(z = 1.96 \approx 2\) and that \(p(1-p)\) reaches its maximum when \(p= \frac{1}{2}\), we have \[\begin{equation} n \approx \left \lceil{ \frac{2^2 \times \frac{1}{2} \times (1-\frac{1}{2})}{\varepsilon^2} }\right \rceil = \left \lceil{ \frac{1}{\varepsilon^2} }\right \rceil \tag{4.7} \end{equation}\]
Therefore, a CI for the proportion with \(\alpha = 5\%\) and a margin of error of \(\varepsilon = 2\%\) requires a sample size of approximately \[ n \approx \left \lceil{ \frac{1}{0.02^2} }\right \rceil = 2500. \]
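# Approximate sample size for a proportion, Equation (4.7)
# (assumes z = 2 and p = 1/2)
n_p <- function(e) {
  exact <- 1 / e^2        # value before rounding
  ceil <- ceiling(exact)  # round up to the next integer
  return(list(exact = exact,
              ceiling = ceil))
}
n_p(0.02)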
## $exact
## [1] 2500
##
## $ceiling
## [1] 2500
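The function above hard-codes the approximations \(z \approx 2\) and \(p = \frac{1}{2}\). As a sketch of the more general Equation (4.6), the following version takes \(p\) and the confidence level as arguments and computes \(z\) with `qnorm()`. The name `n_prop_general` and its arguments are hypothetical and do not come from the text.

```r
# Hypothetical helper (not from the text): Equation (4.6) with
# user-supplied p and confidence level.
n_prop_general <- function(e, p = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)    # two-sided standard normal quantile
  exact <- z^2 * p * (1 - p) / e^2  # Equation (4.6) before rounding up
  list(exact = exact, ceiling = ceiling(exact))
}

# Worst case p = 1/2 with the unrounded quantile
res_worst <- n_prop_general(e = 0.02)
# Illustrative prior guess p = 0.2 (an assumed value, not from the text)
res_prior <- n_prop_general(e = 0.02, p = 0.2)
c(worst = res_worst$ceiling, prior = res_prior$ceiling)
```

With the unrounded quantile, the worst case gives 2401 rather than 2500, which shows how much the shortcut \(z \approx 2\) inflates the result. A prior guess of \(p = 0.2\) lowers the requirement further, to 1537.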
Exercise 4.17 Test the function n_p from Example 4.31 with different margin of error values. Make a graph to analyze the variation in sample size as \(\varepsilon\) increases.