5.5 Prior distribution

All deductions are made a priori. (Wittgenstein 1921, 5.133)

5.5.1 Jeffreys’ prior

A law of chance is invariant for all transformations of the parameters when the law is differentiable with regard to all parameters. (Jeffreys 1946, 453)

Jeffreys (1946) suggests an invariant form for the prior probability in estimation problems, defined by

\[\begin{equation} \pi_J \propto \sqrt{\det I(\theta)} \tag{5.3} \end{equation}\]

where \(I(\theta)\) is the Fisher information (Ly et al. 2017), given by

\[\begin{equation} I(\theta) = -E \left[ \frac{\partial^2 \log f(x|\theta)}{\partial \theta^2} \right] \tag{5.4} \end{equation}\]

The Fisher information is the expected curvature (the negative expected second derivative) of the log-likelihood, and \(I(\theta) \ge 0\).
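As a quick numerical check of Eq. (5.4), the sketch below assumes a Poisson(\(\lambda\)) model (not discussed in this section) and computes the expectation exactly over a truncated support, comparing it with the closed form \(I(\lambda) = 1/\lambda\):

```python
import math

# Sketch: numeric check of Eq. (5.4) for a hypothetical Poisson(lam) model,
# whose closed-form Fisher information is I(lam) = 1/lam.

def d2_log_pmf(x, lam):
    # d^2/dlam^2 [x*log(lam) - lam - log(x!)] = -x/lam^2
    return -x / lam**2

def pois_pmf(x, lam):
    return math.exp(-lam) * lam**x / math.factorial(x)

def fisher_info(lam, support=100):
    # exact expectation, truncated to a finite support (tail mass negligible)
    return -sum(pois_pmf(x, lam) * d2_log_pmf(x, lam) for x in range(support))

lam = 4.0
print(fisher_info(lam))   # ~ 0.25
print(1 / lam)            # 0.25
```

The truncation at \(x = 100\) is safe here because the Poisson tail beyond it is astronomically small for \(\lambda = 4\).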

Example 5.7 Let \(X|\theta \sim B(n,\theta)\) with pmf \[p(x|\theta) = {n \choose x }\theta^{x}(1-\theta)^{n-x}, \;\; 0 \le \theta \le 1\] according to Eq. (3.49). Then, up to an additive constant \(\log {n \choose x}\) that does not depend on \(\theta\), \[\begin{align*} \log p(x|\theta) &= x \log \theta + (n-x) \log (1-\theta) \\ \frac{\partial \log p(x|\theta)}{\partial \theta} &= \frac{x}{\theta} - \frac{n-x}{1-\theta} \\ \frac{\partial^2 \log p(x|\theta)}{\partial \theta^2} &= -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \\ \end{align*}\]

Since \(E(X|\theta) = n\theta\), \[\begin{align*} I(\theta) &= -E \left[ \frac{\partial^2 \log p(x|\theta)}{\partial \theta^2} \right] \\ &= -E \left[ -\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2} \right] \\ &= \frac{n\theta}{\theta^2} + \frac{n-n\theta}{(1-\theta)^2} \\ &= \frac{n}{\theta} + \frac{n(1-\theta)}{(1-\theta)^2} \\ &= \frac{n}{\theta} + \frac{n}{(1-\theta)} \\ I(\theta) &= \frac{n}{\theta(1-\theta)} \end{align*}\]

Thus \[\pi_J = \sqrt{\frac{n}{\theta(1-\theta)}} \propto \theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}} = \theta^{\frac{1}{2}-1}(1-\theta)^{\frac{1}{2}-1},\] which is the kernel of a \(Beta\left(\frac{1}{2},\frac{1}{2}\right)\).
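The proportionality can be checked numerically: dividing \(\sqrt{I(\theta)}\) by the \(Beta\left(\frac{1}{2},\frac{1}{2}\right)\) kernel \(\theta^{-\frac{1}{2}}(1-\theta)^{-\frac{1}{2}}\) should give the constant \(\sqrt{n}\) across the whole interval. A minimal sketch (the grid and the value of \(n\) are arbitrary choices):

```python
import math

# Sketch: the ratio of sqrt(I(theta)) to the Beta(1/2, 1/2) kernel
# should be the constant sqrt(n) on every interior point of (0, 1).

n = 10
grid = [i / 100 for i in range(1, 100)]  # interior points of (0, 1)

def jeffreys_kernel(t):
    return math.sqrt(n / (t * (1 - t)))

def beta_half_kernel(t):
    return t ** -0.5 * (1 - t) ** -0.5

ratios = [jeffreys_kernel(t) / beta_half_kernel(t) for t in grid]
print(min(ratios), max(ratios))   # both ~ sqrt(10) = 3.1623...
```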

5.5.2 Reference prior

A procedure is proposed to derive reference posterior distributions which approximately describe the inferential content of the data without incorporating any other information. (Bernardo 1979)

Definition 5.4 Let \(x\) be the result of an experiment \(\varepsilon=\{X,\Theta,p(x|\theta)\}\) and let \(C\) be the class of admissible priors. The reference posterior of \(\theta\) after observing \(x\) is defined by \(\pi(\theta|x)=\lim \pi_k(\theta|x)\), where \(\pi_k(\theta|x) \propto p(x|\theta)\pi_k(\theta)\) is the posterior density corresponding to the prior \(\pi_k(\theta)\) that maximizes \(I^{\theta}\{\varepsilon(k),p(\theta)\}\) in \(C\). A reference prior for \(\theta\) is a positive function \(\pi(\theta)\) satisfying \(\pi(\theta|x) \propto p(x|\theta)\pi(\theta)\).
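As a toy illustration of the maximization step in Definition 5.4, the sketch below restricts \(\theta\) to two values (a hypothetical discretization, not from the text) and finds the prior maximizing the mutual information between parameter and data with the Blahut-Arimoto algorithm (one standard way to carry out this maximization; Bernardo (1979) does not prescribe it):

```python
import math

# Toy sketch: for a single Bernoulli trial with theta in {0.5, 0.9}
# (a hypothetical two-point parameter space), find the prior maximizing
# the mutual information via Blahut-Arimoto fixed-point iteration.

thetas = [0.5, 0.9]
outcomes = (0, 1)

def lik(x, t):                           # Bernoulli likelihood
    return t if x == 1 else 1 - t

def mutual_info(prior):
    marg = {x: sum(p * lik(x, t) for p, t in zip(prior, thetas)) for x in outcomes}
    return sum(p * lik(x, t) * math.log(lik(x, t) / marg[x])
               for p, t in zip(prior, thetas) for x in outcomes)

prior = [0.5, 0.5]
for _ in range(500):
    marg = {x: sum(p * lik(x, t) for p, t in zip(prior, thetas)) for x in outcomes}
    # update: p_new(theta) proportional to exp(E_x[log posterior q(theta|x)])
    new = [math.exp(sum(lik(x, t) * math.log(p * lik(x, t) / marg[x])
                        for x in outcomes))
           for p, t in zip(prior, thetas)]
    prior = [v / sum(new) for v in new]

print(prior, mutual_info(prior))   # maximizing prior and its information
```

In this asymmetric toy problem the maximizing prior is not uniform, which is the point of the construction: the reference prior rewards parameter values the data can distinguish well.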

5.5.3 Subjective prior

The diversity of man’s beliefs is as wide as the uncounted millions that have been or are now cluttered upon earth. (Lea 1909, 3)

Cromwell’s rule

Almost all thinking people agree that you should not have probability 1 (or 0) for any event, other than one demonstrable by logic, like \(2 \times 2 = 4\). The rule that denies probabilities of 1 or 0 is called Cromwell’s rule, named after Oliver Cromwell who said to the Church of Scotland, “think it possible you may be mistaken”. (Lindley 2006, 91)
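Cromwell's rule has a direct consequence for Bayesian updating: a hypothesis given prior probability 0 can never be revived by data, since the posterior is proportional to prior times likelihood. A minimal sketch with a hypothetical two-coin example:

```python
# Sketch of Cromwell's rule: theta is either 0.5 (fair coin) or 0.9
# (biased coin). If the prior puts probability 0 on theta = 0.9,
# no amount of heads can move the posterior off 0.

def posterior(prior_biased, heads, tosses):
    # Bayes' theorem for the two hypotheses
    like_fair = 0.5 ** tosses
    like_biased = 0.9 ** heads * 0.1 ** (tosses - heads)
    num = prior_biased * like_biased
    den = num + (1 - prior_biased) * like_fair
    return num / den

# 100 tosses, 90 heads: overwhelming evidence for the biased coin
print(posterior(0.01, 90, 100))   # close to 1 despite a tiny prior
print(posterior(0.0, 90, 100))    # exactly 0: a zero prior never updates
```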


Bernardo, Jose M. 1979. “Reference Posterior Distributions for Bayesian Inference.” Journal of the Royal Statistical Society Series B: Statistical Methodology 41 (2): 113–28. https://people.eecs.berkeley.edu/~jordan/sail/readings/bernardo-1979.pdf.
Jeffreys, Harold. 1946. “An Invariant Form for the Prior Probability in Estimation Problems.” Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences 186 (1007): 453–61. https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.1946.0056.
Lea, Homer. 1909. The Valor of Ignorance. Harper & Brothers. https://archive.org/details/valorofignorance00leahuoft.
Lindley, Dennis V. 2006. Understanding Uncertainty. Hoboken, New Jersey: John Wiley & Sons. http://www.al-edu.com/wp-content/uploads/2014/05/Lindley-D.V.-Understanding-uncertainty-2006.pdf.
Ly, Alexander, Maarten Marsman, Josine Verhagen, Raoul Grasman, and Eric-Jan Wagenmakers. 2017. “A Tutorial on Fisher Information.” https://doi.org/10.48550/arXiv.1705.01064.
Morris, David E, Jeremy E Oakley, and John A Crowe. 2014. “A Web-Based Tool for Eliciting Probability Distributions from Experts.” Environmental Modelling & Software 52: 1–4. http://dx.doi.org/10.1016/j.envsoft.2013.10.010.
Oakley, Jeremy. 2021. SHELF: Tools to Support the Sheffield Elicitation Framework. https://CRAN.R-project.org/package=SHELF.
Paulino, Carlos Daniel Mimoso, Maria Antónia Amaral Turkman, and Bento Murteira. 2003. Estatística Bayesiana. Fundação Calouste Gulbenkian, Lisboa. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:puc01000334509.
Press, S. James. 2003. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, 2nd Edition. John Wiley & Sons. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:oclc(OCoLC)587388980.
Wittgenstein, Ludwig. 1921. Tractatus Logico-Philosophicus. http://public-library.uk/pdfs/9/292.pdf.