4.6 Sample size calculation

The sample size calculation rests on a series of assumptions made by the researcher. The values suggested by the different sample size calculation methods should be treated only as a reference, since the inputs used to obtain them are to some degree arbitrary. Time and cost are two further constraints that must be taken into account, and they may take precedence over the calculated sample size.
Below, very simple cases are presented that are nonetheless sufficient to illustrate the principles involved. For more features, the R package pwr (Champely 2020) and the software G*Power (Faul, Erdfelder, Lang, and Buchner 2007; Faul, Erdfelder, Buchner, and Lang 2009) are recommended. For a more theoretical approach, Chow, Wang, and Shao (2007) is recommended.

4.6.1 Average

One way to estimate the sample size in the case of inference to the universal average \(\mu\) is to consider the margin of error of Equation (??) and isolate \(n\) in the form \[\begin{equation} n = \left \lceil{ \left( \frac{z \sigma}{\varepsilon} \right)^2 }\right \rceil. \tag{4.5} \end{equation}\]

The operator \(\left \lceil{ x }\right \rceil\) denotes the ceiling function of \(x\), i.e., the smallest integer greater than or equal to \(x\).

Exercise 4.14 Obtain the result of Equation (4.5) from the margin of error of Equation (??).

Example 4.30 (Sample size for average) We want to obtain the sample size needed to estimate the average height of PUCRS students. A confidence level of \(1-\alpha = 95\%\) is considered, with a margin of error of \(\varepsilon = 3\) cm. From previous studies, \(\sigma = 15\) cm is assumed. The standard normal distribution table gives \(z = 1.96\), so by Equation (4.5) \[n = \left \lceil{ \left( \frac{1.96 \times 15}{3} \right)^2 }\right \rceil = \left \lceil{ 96.04 }\right \rceil = 97.\]

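n_m <- function(z, sigma, e) {
  # Equation (4.5): sample size before rounding up
  exact <- (z * sigma / e)^2
  # Round up to the next integer
  ceil <- ceiling(exact)
  return(list(exact = exact,
              ceiling = ceil))
}
# z = 1.96 (95% confidence), sigma = 15 cm, margin of error = 3 cm
n_m(1.96, 15, 3)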

## $exact
## [1] 96.04
## 
## $ceiling
## [1] 97
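
The value \(z = 1.96\) above was read from a table. As a minimal sketch, the same quantile can be computed in R from the confidence level with `qnorm`. The function name `n_m_conf` and its argument `conf` are illustrative assumptions and do not appear in the original example.

```r
# Hypothetical helper (not from the original text): sample size for the mean,
# starting from the confidence level instead of a tabulated z value.
n_m_conf <- function(conf, sigma, e) {
  alpha <- 1 - conf
  z <- qnorm(1 - alpha / 2)      # two-sided standard normal quantile
  exact <- (z * sigma / e)^2     # Equation (4.5) before rounding up
  list(z = z, exact = exact, ceiling = ceiling(exact))
}

res_m <- n_m_conf(conf = 0.95, sigma = 15, e = 3)
res_m$ceiling
```

With the unrounded quantile (about 1.959964) the exact value is slightly below 96.04, but the rounded-up sample size is still 97.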

4.6.2 Proportion

One way to estimate the sample size in the case of inference for the universal proportion \(\pi\) is to consider the margin of error of Equation (6.4) and isolate \(n\) in the form \[\begin{equation} n = \left \lceil{ \frac{z^2 p (1-p)}{\varepsilon^2} }\right \rceil. \tag{4.6} \end{equation}\]

In some cases prior information about the proportion is available. When nothing is known about it, \(p=\frac{1}{2}\) is used, the point at which \(p(1-p)\) reaches its maximum; this gives the largest, and therefore most conservative, sample size.
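
The effect of prior information about \(p\) can be sketched in R by implementing Equation (4.6) directly. The function name `n_prop`, the margin of error of 5% and the prior estimate \(p = 0.3\) are illustrative assumptions, not values from the original text.

```r
# Hypothetical helper (not from the original text): Equation (4.6) with an
# explicit p, defaulting to the conservative choice p = 1/2.
n_prop <- function(z, e, p = 0.5) {
  stopifnot(p >= 0, p <= 1, e > 0)
  exact <- z^2 * p * (1 - p) / e^2   # sample size before rounding up
  list(exact = exact, ceiling = ceiling(exact))
}

conservative <- n_prop(z = 1.96, e = 0.05)           # no prior information
informed     <- n_prop(z = 1.96, e = 0.05, p = 0.3)  # assumed prior estimate
c(conservative = conservative$ceiling, informed = informed$ceiling)
```

Under these assumptions the conservative choice gives 385 and the informed choice gives 323, so a prior estimate away from \(\frac{1}{2}\) reduces the required sample size.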

Exercise 4.15 Obtain the result of Equation (4.6) from the margin of error of Equation (6.4).

Exercise 4.16 Verify that \(p(1-p)\) reaches its maximum when \(p=\frac{1}{2}\).

Example 4.31 (Sample size for proportion) In an electoral survey, we want to calculate the approximate sample size so that the margin of error is \(\varepsilon = 2\%\) with a confidence level of \(1-\alpha = 95\%\). The standard normal distribution table gives \(z = 1.96 \approx 2\), and \(p(1-p)\) reaches its maximum when \(p= \frac{1}{2}\). Substituting these values into Equation (4.6), \[\begin{equation} n \approx \left \lceil{ \frac{2^2 \times \frac{1}{2} \times (1-\frac{1}{2})}{\varepsilon^2} }\right \rceil = \left \lceil{ \frac{1}{\varepsilon^2} }\right \rceil \tag{4.7} \end{equation}\]

Therefore, a 95% confidence interval (\(\alpha = 5\%\)) for the proportion with a margin of error of \(\varepsilon = 2\%\) requires a sample size of approximately \[ n \approx \left \lceil{ \frac{1}{0.02^2} }\right \rceil = 2500. \]

n_p <- function(e) {
  # Equation (4.7): approximation with z = 2 and p = 1/2
  exact <- 1 / e^2
  ceil <- ceiling(exact)
  return(list(exact = exact,
              ceiling = ceil))
}
# Margin of error of 2%
n_p(0.02)

## $exact
## [1] 2500
## 
## $ceiling
## [1] 2500
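
The shortcut \(1/\varepsilon^2\) relies on rounding \(z\) up to 2. As a rough check, the sketch below compares it with Equation (4.6) evaluated at \(p = \frac{1}{2}\) using the unrounded quantile from `qnorm`. The variable names are illustrative assumptions.

```r
# Illustrative comparison (names are assumptions, not from the original text).
e <- 0.02
z_exact <- qnorm(0.975)                      # about 1.959964 for 95% confidence
n_approx <- ceiling(1 / e^2)                 # Equation (4.7), using z = 2
n_exact  <- ceiling(z_exact^2 * 0.25 / e^2)  # Equation (4.6) with p = 1/2
c(approx = n_approx, exact = n_exact)
```

The approximation gives 2500 and the unrounded quantile gives 2401, so rounding \(z\) to 2 overstates the sample size by about 4% in this case.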

Exercise 4.17 Test the function n_p from Example 4.31 with different margin of error values. Make a graph to analyze how the sample size varies as \(\varepsilon\) increases.

References

Champely, Stephane. 2020. Pwr: Basic Functions for Power Analysis. https://CRAN.R-project.org/package=pwr.

Chow, Shein-Chung, Hansheng Wang, and Jun Shao. 2007. Sample Size Calculations in Clinical Research, Second Edition. CRC Press. https://books.google.com.br/books?id=ju-sojS3sa0C&printsec=frontcover&hl=pt-BR#v=onepage&q&f=false.

Faul, Franz, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. “G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences.” Behavior Research Methods, 39, 175-191. https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPower3-BRM-Paper.pdf.

Faul, Franz, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. “Statistical Power Analyses Using G*Power 3.1: Tests for Correlation and Regression Analyses.” Behavior Research Methods, 41, 1149-1160. https://www.psychologie.hhu.de/fileadmin/redaktion/Fakultaeten/Mathematisch-Naturwissenschaftliche_Fakultaet/Psychologie/AAP/gpower/GPower31-BRM-Paper.pdf.