## 5.5 Prior distribution

All deductions are made a priori. — Wittgenstein (1921)

• Fundamentals are covered in Chapter 2 of Paulino, Turkman, and Murteira (2003) and Chapter 5 of Press (2003).
• Morris, Oakley, and Crowe (2014) present MATCH, a web tool for eliciting expert probability distributions, available at http://optics.eee.nottingham.ac.uk/match/uncertainty.php.
• Oakley (2021) presents SHELF, a tool that implements multiple methods for eliciting univariate probability distributions from one or more experts. R Shiny applications are included for most methods.
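At their core, elicitation tools of this kind fit a parametric distribution to quantiles judged by the expert. The sketch below illustrates that idea only, with hypothetical elicited values; MATCH and SHELF implement several refinements beyond this least-squares fit.

```python
# Minimal sketch of quantile-based elicitation: fit a Beta distribution
# whose quantiles best match an expert's judgements (hypothetical values).
import numpy as np
from scipy import optimize, stats

# Expert's judgements: P(theta <= quants[i]) = probs[i]
probs = np.array([0.25, 0.50, 0.75])
quants = np.array([0.30, 0.40, 0.55])

def loss(params):
    a, b = params
    if a <= 0 or b <= 0:
        return np.inf
    # Sum of squared differences between fitted and elicited quantiles
    return np.sum((stats.beta.ppf(probs, a, b) - quants) ** 2)

res = optimize.minimize(loss, x0=[2.0, 2.0], method="Nelder-Mead")
a_hat, b_hat = res.x
print(f"Fitted Beta({a_hat:.2f}, {b_hat:.2f})")
```

The fitted median sits close to the elicited 0.40; with three quantiles and two parameters the match is approximate, which is exactly the feedback such tools show the expert.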

### 5.5.1 Jeffreys’ prior

A law of chance is invariant for all transformations of the parameters when the law is differentiable with regard to all parameters. — Jeffreys (1946)

Jeffreys (1946) suggests an invariant form for the prior probability in estimation problems. It is defined by

$\begin{equation} \pi_J \propto \sqrt{\det I(\theta)} \tag{5.3} \end{equation}$

where $$I(\theta)$$ is the Fisher information (Ly et al. 2017), given by

$\begin{equation} I(\theta) = -E \left[ \frac{\partial^2 \log f(x|\theta)}{\partial \theta^2} \right] \tag{5.4} \end{equation}$

The Fisher information is the expected curvature of the log-likelihood, and $$I(\theta) \ge 0$$.

Example 5.7 Let $$X|\theta \sim B(n,\theta)$$ with pmf $p(x|\theta) = {n \choose x}\theta^{x}(1-\theta)^{n-x}, \;\; 0 \le \theta \le 1,$ according to Eq. (3.49). Then \begin{align*} \log p(x|\theta) &= \log {n \choose x} + x \log \theta + (n-x) \log (1-\theta) \\ \frac{\partial \log p(x|\theta)}{\partial \theta} &= \frac{x}{\theta} - \frac{n-x}{1-\theta} \\ \frac{\partial^2 \log p(x|\theta)}{\partial \theta^2} &= -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \end{align*}

Since $$E(X|\theta) = n\theta$$, \begin{align*} I(\theta) &= -E \left[ \frac{\partial^2 \log p(x|\theta)}{\partial \theta^2} \right] \\ &= -E \left[ -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \right] \\ &= \frac{n\theta}{\theta^2} + \frac{n-n\theta}{(1-\theta)^2} \\ &= \frac{n}{\theta} + \frac{n(1-\theta)}{(1-\theta)^2} \\ &= \frac{n}{\theta} + \frac{n}{(1-\theta)} \\ I(\theta) &= \frac{n}{\theta(1-\theta)} \end{align*}
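The derivation above can be checked by simulation. A minimal sketch (the values of $n$ and $\theta$ are arbitrary choices): average the negative second derivative of the log-pmf over binomial draws and compare with $n/(\theta(1-\theta))$.

```python
# Monte Carlo check: for X ~ Binomial(n, theta), the average of
# -d^2 log p(x|theta)/d theta^2 should approach I(theta) = n/(theta(1-theta)).
import numpy as np

rng = np.random.default_rng(0)
n, theta = 20, 0.3
x = rng.binomial(n, theta, size=200_000)

# Second derivative of log p(x|theta), from the derivation above
d2 = -x / theta**2 - (n - x) / (1 - theta) ** 2
I_mc = -d2.mean()
I_exact = n / (theta * (1 - theta))
print(I_mc, I_exact)  # both close to 95.24
```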

This way $\pi_J = \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}} \propto \theta^{\frac{1}{2}-1}(1-\theta)^{\frac{1}{2}-1},$ which is the kernel of a $$Beta\left(\frac{1}{2},\frac{1}{2}\right)$$.
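A quick numerical confirmation (a sketch; $n$ is an arbitrary choice): normalizing $\sqrt{n/(\theta(1-\theta))}$ over $(0,1)$ recovers exactly the $Beta\left(\frac{1}{2},\frac{1}{2}\right)$ density, since the constant $\sqrt{n}$ cancels.

```python
# Normalize the Jeffreys kernel numerically and compare it with the
# Beta(1/2, 1/2) density; the factor sqrt(n) drops out on normalization.
import numpy as np
from scipy import integrate, stats

n = 20
theta = np.linspace(0.01, 0.99, 99)
kernel = np.sqrt(n / (theta * (1 - theta)))  # pi_J up to a constant

# Normalizing constant over (0, 1); equals sqrt(n) * B(1/2, 1/2) = sqrt(n) * pi
const, _ = integrate.quad(lambda t: np.sqrt(n / (t * (1 - t))), 0, 1)
pi_J = kernel / const

print(np.max(np.abs(pi_J - stats.beta.pdf(theta, 0.5, 0.5))))  # ~0
```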

### 5.5.2 Reference prior

Bernardo (1979) proposes a procedure to derive reference posterior distributions, which approximately describe the inferential content of the data without incorporating any other information.

Definition 5.4 Let $$x$$ be the result of an experiment $$\varepsilon=\{X,\Theta,p(x|\theta)\}$$ and let $$C$$ be the class of admissible priors. The reference posterior of $$\theta$$ after observing $$x$$ is defined by $$\pi(\theta|x)=\lim_{k \to \infty} \pi_k(\theta|x)$$, where $$\pi_k(\theta|x) \propto p(x|\theta)\pi_k(\theta)$$ is the posterior density corresponding to the prior $$\pi_k(\theta)$$ that maximizes $$I^{\theta}\{\varepsilon(k),p(\theta)\}$$ in $$C$$. A reference prior for $$\theta$$ is a positive function $$\pi(\theta)$$ that satisfies $$\pi(\theta|x) \propto p(x|\theta)\pi(\theta)$$.

### 5.5.3 Subjective prior

The diversity of man’s beliefs is as wide as the uncounted millions that have been or are now cluttered upon earth. — Lea (1909)

#### Cromwell’s rule

Almost all thinking people agree that you should not have probability 1 (or 0) for any event, other than one demonstrable by logic, like $$2 \times 2 = 4$$. The rule that denies probabilities of 1 or 0 is called Cromwell’s rule (Lindley 2006), named after Oliver Cromwell, who wrote to the Church of Scotland, “think it possible you may be mistaken”.
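Bayes’ theorem makes the rule concrete: since the posterior is proportional to prior times likelihood, a prior probability of exactly 0 survives any amount of evidence. A toy sketch with two hypothetical hypotheses:

```python
# A dogmatic prior assigns probability 0 to H1; even data that strongly
# favour H1 cannot move the posterior away from the prior's verdict.
prior = {"H1": 0.0, "H2": 1.0}          # H1 ruled out a priori
likelihood = {"H1": 0.99, "H2": 1e-6}   # data strongly favour H1

unnorm = {h: prior[h] * likelihood[h] for h in prior}
total = sum(unnorm.values())
posterior = {h: unnorm[h] / total for h in prior}
print(posterior)  # {'H1': 0.0, 'H2': 1.0} -- the zero prior is immovable
```

Any strictly positive prior on H1, however small, would instead let the likelihood ratio of roughly $10^6$ overwhelm it.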

### References

Bernardo, Jose M. 1979. “Reference Posterior Distributions for Bayesian Inference.” Journal of the Royal Statistical Society Series B: Statistical Methodology 41 (2): 113–28. https://people.eecs.berkeley.edu/~jordan/sail/readings/bernardo-1979.pdf.
Jeffreys, Harold. 1946. “An Invariant Form for the Prior Probability in Estimation Problems.” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186 (1007): 453–61. https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.1946.0056.
Lea, Homer. 1909. The Valor of Ignorance. Harper & Brothers. https://archive.org/details/valorofignorance00leahuoft.
Lindley, Dennis V. 2006. Understanding Uncertainty. Hoboken, NJ: John Wiley & Sons. http://www.al-edu.com/wp-content/uploads/2014/05/Lindley-D.V.-Understanding-uncertainty-2006.pdf.
Ly, Alexander, Maarten Marsman, Josine Verhagen, Raoul Grasman, and Eric-Jan Wagenmakers. 2017. “A Tutorial on Fisher Information.” https://doi.org/10.48550/arXiv.1705.01064.
Morris, David E, Jeremy E Oakley, and John A Crowe. 2014. “A Web-Based Tool for Eliciting Probability Distributions from Experts.” Environmental Modelling & Software 52: 1–4. http://dx.doi.org/10.1016/j.envsoft.2013.10.010.
Oakley, Jeremy. 2021. SHELF: Tools to Support the Sheffield Elicitation Framework. https://CRAN.R-project.org/package=SHELF.
Paulino, Carlos Daniel Mimoso, Maria Antónia Amaral Turkman, and Bento Murteira. 2003. Estatística Bayesiana. Fundação Calouste Gulbenkian, Lisboa. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:puc01000334509.
Press, S James. 2003. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, 2nd Edition. John Wiley & Sons. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:oclc(OCoLC)587388980.
Wittgenstein, Ludwig. 1921. Tractatus Logico-Philosophicus. http://public-library.uk/pdfs/9/292.pdf.