6.1 Point estimation

In point estimation a statistic calculated from the sample is used as a (point) estimate of a certain parameter, according to Definitions 6.1 and 6.2. In other words, a single sample value (point) is used to estimate \(\theta\); the estimate is symbolized by \(\hat{\theta}\) and read as 'theta hat'. From the perspective of Decision Theory, an estimator is called a decision rule (Berger 1985, 9).

Definition 6.1 An estimator \(\hat{\theta}(\boldsymbol{X}) \equiv \hat{\theta}\) is a function of the sample \(\boldsymbol{X}\) that aims to infer about a parameter \(\theta\). \(\\\)

Definition 6.2 An estimate is a particular value obtained from applying sample data to an estimator. \(\\\)

Example 6.1 The sample mean \(\bar{x}\) given by Eq. (2.9) is a point estimator for the universal mean \(\mu\) (Eq. (2.8)).

6.1.1 Unbiased estimators

Definition 6.3 An estimator is said to be unbiased according to a sampling plan \(\lambda\) if

\[\begin{equation} E_\lambda \left[ \hat{\theta} \right] = \theta. \tag{6.1} \end{equation}\]

Sample mean \(\bar{x}\)

The sample mean given by Eq. (2.9) is an unbiased estimator of the universal mean \(\mu\) according to the SRS sampling plan, with or without replacement. This occurs because the expectation is linear, so any dependence between observations does not affect the result. \(\\\)

Example 6.2 Let \(X_1, X_2, \ldots, X_n\) be independent and identically distributed (iid) random variables with \(E(X_i)=\mu\) under an SRS sampling plan, where for simplicity the equivalence \(E_{SRS} \equiv E\) will be considered.

\[\begin{eqnarray} E\left[\bar{X}\right] &=& E\left[\frac{1}{n} \sum_{i=1}^{n} X_i \right] \\ &=& \frac{1}{n} E\left[\sum_{i=1}^{n} X_i \right] \\ &=& \frac{1}{n} \sum_{i=1}^{n} E\left[X_i \right] \\ &=& \frac{1}{n} \sum_{i=1}^{n} \mu \\ &=& \frac{1}{n} n\mu \\ E\left[\bar{X}\right] &=& \mu. \tag{6.2} \end{eqnarray}\]

Example 6.3 The universal mean of the variable age in Example 4.4 is given by \[\mu = \frac{24+32+49}{3} = \frac{105}{3} = 35. \] From Example 4.19 it can be seen that the average (expectation) of the sample means under the SRSwi plan is equal to \(\mu\), i.e., \[E\left[h(\boldsymbol{X})\right] = E\left[\bar{X}\right] = \frac{24.0+28.0+36.5+28.0+32.0+40.5+36.5+40.5+49.0}{9}=\frac{315}{9}=35.\]

X <- c(24,32,49)  # ages of the universe (Example 4.4)
mean(X)           # universal mean mu
## [1] 35

From Example 4.22 we have the vector of sample means mxc <- c(24.0,28.0,36.5,28.0,32.0,40.5,36.5,40.5,49.0).

mean(mxc)  # expectation of the sample means under the SRSwi plan
## [1] 35
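
The exact enumeration above can also be approximated by simulation. The sketch below, in R, draws samples of size \(n=2\) with replacement from the universe of Example 4.4; the number of replications is an illustrative assumption, and the average of the simulated sample means should be close to \(\mu = 35\).

set.seed(1)                                  # reproducibility
pop <- c(24, 32, 49)                         # universe of ages (Example 4.4), mu = 35
n <- 2                                       # sample size of the SRSwi plan
nrep <- 1e5                                  # number of simulated samples (illustrative)
xbar <- replicate(nrep, mean(sample(pop, n, replace = TRUE)))
mean(xbar)                                   # approximately 35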

Exercise 6.1 Check in the SRSwo sampling plan of Example 4.20 that \(E\left[\bar{X}\right] = \mu\). \(\\\)

Sample proportion \(p\)

The sample proportion is an unbiased estimator of the universal proportion \(\pi\) (Eq. (4.1)) according to the SRS sampling plan, with or without replacement. This estimator can be defined by \[\begin{align*} p = \frac{\sum_{i=1}^n x_i}{n}, \tag{6.3} \end{align*}\] where each \(x_i \in \{0,1\}\) indicates respectively the absence or presence of the characteristic of interest.

Example 6.4 (Point estimate of the proportion) Suppose you want to calculate the point estimate for the ‘proportion of PUCRS smokers’, denoted by \(\pi\). The characteristic of interest, or success, is that the interviewee is a ‘smoker’, for which \(x=1\) is associated; in this way, failure is the interviewee being a ‘non-smoker’, for which \(x=0\) is associated. In a sample of \(n = 125\) university attendees, \(\sum_{i=1}^n x_i = 25\) smokers were observed. The point estimate of \(\pi\) is given by \[ \hat{\pi} = \dfrac{25}{125} = 0.2 = 20\%. \]
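
In R, the point estimate of Example 6.4 reduces to dividing the number of successes by the sample size, or equivalently to taking the mean of a vector of zeros and ones. A minimal sketch with the numbers of the example:

n <- 125   # sample size
s <- 25    # observed smokers (successes)
s/n        # point estimate of pi
## [1] 0.2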

Sample variance \(s^2\)

The sample variance is an unbiased estimator of the universal variance \(\sigma^2\) according to the SRSwi sampling plan (with replacement). \(\\\)

Example 6.5 Let \(X_1, X_2, \ldots, X_n\) be independent random variables with \(E(X_i)=\mu\), \(V(X_i)=\sigma^2\), \(E(X_i^2)=\sigma^2+\mu^2\), \(E(\bar{X}^2)=\frac{\sigma^2}{n}+\mu^2\), and a sampling plan of the type SRSwi, where for simplicity the equivalence \(E_{SRSwi} \equiv E\) will be considered. Note that \(E(\bar{X}^2) = V(\bar{X}) + \left[ E(\bar{X}) \right]^2 = \frac{\sigma^2}{n}+\mu^2\), since \(V(\bar{X})=\frac{\sigma^2}{n}\) and \(E(\bar{X})=\mu\) under sampling with replacement.

\[\begin{eqnarray} E\left[S^2\right] &=& E\left[\frac{1}{n-1} \sum_{i=1}^{n} (X_{i}-\bar{X})^2 \right] \\ &=& \frac{1}{n-1} E\left[\sum_{i=1}^{n} X_{i}^2 - 2 \bar{X} \sum_{i=1}^{n} X_{i} + n\bar{X}^2 \right] \\ &=& \frac{1}{n-1} E\left[\sum_{i=1}^{n} X_{i}^2 - n\bar{X}^2 \right] \\ &=& \frac{1}{n-1} \left[\sum_{i=1}^{n} E\left[X_{i}^2\right] - nE\left[\bar{X}^2\right] \right] \\ &=& \frac{1}{n-1} \left[n\sigma^2 + n\mu^2 - \sigma^2 - n\mu^2\right] \\ &=& \frac{(n-1)\sigma^2}{n-1} \\ E\left[S^2\right] &=& \sigma^2, \tag{6.4} \end{eqnarray}\] where the third equality uses \(\sum_{i=1}^{n} X_{i} = n\bar{X}\).
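
The unbiasedness of \(S^2\) can also be checked by simulation. The sketch below, in R, assumes an arbitrary normal universe with \(\mu = 10\) and \(\sigma^2 = 4\) (illustrative values) and compares the average of var() over many samples, which uses the divisor \(n-1\), with \(\sigma^2\); dividing by \(n\) instead underestimates it.

set.seed(1)                                   # reproducibility
mu <- 10; sigma2 <- 4                         # illustrative universal parameters
n <- 5                                        # sample size
nrep <- 1e5                                   # number of simulated samples (illustrative)
s2 <- replicate(nrep, var(rnorm(n, mean = mu, sd = sqrt(sigma2))))
mean(s2)                                      # close to sigma2 = 4 (divisor n-1, unbiased)
mean(s2)*(n-1)/n                              # divisor n: close to 3.2, underestimates sigma2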

Exercise 6.2 Check in the SRSwi sampling plan of Example 4.19 if \(E_{SRSwi}\left[S^2\right] = \sigma^2\). \(\\\)

Exercise 6.3 Check in the SRSwo sampling plan of Example 4.20 if \(E_{SRSwo}\left[S^2\right] = \sigma^2\). \(\\\)

Median

David and Ghosh (1985) show that the median according to Eq. (2.17) is the most bias-resistant estimator in the class of L-statistics with non-negative coefficients that sum to one, for a class of distributions that includes the normal, double exponential and logistic.
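
This bias resistance can be illustrated in R by contaminating a small sample with a single outlier (illustrative data): the mean is pulled strongly towards the outlier while the median is unchanged.

x <- c(24, 32, 49)            # original values
mean(x); median(x)            # 35 and 32
y <- c(24, 32, 490)           # largest value recorded with an extra zero (outlier)
mean(y); median(y)            # mean jumps to 182, median remains 32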

6.1.2 Maximum likelihood estimators

A maximum likelihood estimator estimates \(\theta\) by the value \(\hat{\theta}\) that maximizes the likelihood function of Definition 5.3. According to Barnett (1999), the maximum likelihood method was first used by Johann Heinrich Lambert and Daniel Bernoulli in the mid-1760s, and was developed in detail by Ronald Fisher in the early 1920s.

Example 6.6 Adapted from Casella and Berger (2002, 317–18). Let \(X_1, \ldots, X_n\) be a (conditionally) iid sequence \(\mathcal{Ber}(\theta) \equiv \mathcal{B}(1,\theta)\). The likelihood function is \[\begin{eqnarray} L(\theta|x) &=& \prod_{i=1}^n {1 \choose x_i} \theta^{x_i} (1-\theta)^{1-x_i} \nonumber \\ &=& \theta^{s} (1-\theta)^{n - s}, \end{eqnarray}\] where \(s=\sum_{i=1}^{n} x_i\). Taking the natural logarithm of \(L(\theta|x)\), the properties of logarithms give \[\begin{eqnarray} l(\theta|x) &=& \log(\theta^{s} (1-\theta)^{n - s}) \nonumber \\ &=& s \log(\theta) + (n-s)\log(1-\theta). \end{eqnarray}\] Using principles of Calculus we differentiate \(l(\theta|x)\) with respect to \(\theta\) and equate the derivative to zero, from which we obtain the maximum likelihood estimate \[\begin{eqnarray} \frac{s}{\hat{\theta}} - \frac{n-s}{1-\hat{\theta}} &=& 0 \;\; \therefore \;\; \hat{\theta} = \frac{s}{n}. \end{eqnarray}\]
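
The analytical solution \(\hat{\theta}=s/n\) can also be confirmed numerically. The sketch below, in R, assumes an arbitrary sample of zeros and ones and maximizes the log-likelihood \(l(\theta|x)\) with optimize, comparing the result with the sample proportion.

x <- c(1, 0, 0, 1, 1, 0, 1, 0, 0, 0)   # assumed 0/1 sample, s = 4, n = 10
loglik <- function(theta) sum(x)*log(theta) + (length(x)-sum(x))*log(1-theta)
optimize(loglik, interval = c(0, 1), maximum = TRUE)$maximum  # approximately 0.4
sum(x)/length(x)                       # analytical MLE: 0.4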

Exercise 6.4 Consider the information in Example 6.6.
a. Show, from the definition, that \(L(\theta | x) = \theta^{s} (1-\theta)^{n-s}\), \(s=\sum_{i=1}^{n} x_i\).
b. Show, applying the principles of Calculus, that \(\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} x_i\).

\(\\\)

References

Barnett, Vic. 1999. Comparative Statistical Inference. John Wiley & Sons. https://onlinelibrary.wiley.com/doi/book/10.1002/9780470316955.
Berger, James O. 1985. Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer Science & Business Media. https://www.springer.com/gp/book/9780387960982.
Casella, George, and Roger L. Berger. 2002. Statistical Inference. Duxbury - Thomson Learning.
David, H. A., and J. K. Ghosh. 1985. “The Effect of an Outlier on L-Estimators of Location in Symmetric Distributions.” Biometrika 72 (1): 216–18. https://www.jstor.org/stable/2336355.