5.4 Likelihood principle

Informally, the likelihood principle states that if two decision-makers have the same degree of knowledge and the same information about \(\theta\), both must reach exactly the same decision about \(\theta\). (Berger 1985) defines it as follows:

The Likelihood Principle. In making inferences or decisions about \(\theta\) after x is observed, all relevant experimental information is contained in the likelihood function for the observed x. Furthermore, two likelihood functions contain the same information about \(\theta\) if they are proportional to each other (as functions of \(\theta\)). (Berger 1985, 28)

(Birnbaum 1962) put it this way:

The likelihood principle states that the “evidential meaning” of experimental results is characterized fully by the likelihood function, without other reference to the structure of an experiment, in contrast with standard methods in which significance and confidence levels are based on the complete experimental model.

Example 5.5 (Likelihood Principle 1, adapted from (Paulino, Turkman, and Murteira 2003, 34)) Consider a sequence of coin tosses, conditionally independent given \(\theta\), the probability of coming up ‘heads’. Suppose the result is \[x = \lbrace H,H,H,H,H,T,H,H,H,H,T,T \rbrace,\] where \(H\): ‘heads’ and \(T\): ‘tails’. This result could have been obtained by several experimental processes or stopping rules, such as
- toss the coin 12 times, a number fixed a priori
- flip the coin until 3 ‘tails’ appear
- flip the coin until 2 consecutive ‘tails’ appear
- toss the coin until the player gets tired of it, which happens on the 12th toss

In every case, the likelihood function is proportional to \(\theta^9 \left( 1 - \theta \right)^3\), i.e., the sample reports 9 successes (heads) and 3 failures (tails). Thus, under the likelihood principle, all the information that \(x\) can provide about \(\theta\) is contained in this expression. Knowing which of the four experimental processes was used (each with a different sample space), or which stopping rule was adopted, adds nothing. Note that even the possibility of the experimenter stopping at their own discretion, once the result \(x\) is judged satisfactory, in no way changes the opinion about \(\theta\).
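As a small illustration of why only the counts matter, the sketch below rebuilds the kernel \(\theta^9 \left( 1 - \theta \right)^3\) directly from the observed sequence and maximizes it numerically. The names `x`, `kernel`, and `theta_hat` are introduced here for illustration only and are not part of the original example.

```r
# Observed sequence from Example 5.5 (H = heads, T = tails)
x <- c("H", "H", "H", "H", "H", "T", "H", "H", "H", "H", "T", "T")

n_heads <- sum(x == "H")  # 9
n_tails <- sum(x == "T")  # 3

# Likelihood kernel shared by all four stopping rules
kernel <- function(theta) theta^n_heads * (1 - theta)^n_tails

# Multiplicative constants do not move the maximizer, so the kernel is enough
theta_hat <- optimize(kernel, interval = c(0, 1), maximum = TRUE)$maximum
theta_hat
```

Up to the numerical tolerance of `optimize`, the maximizer agrees with the sample proportion of heads, \(9/12\), whichever stopping rule generated the data.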

Example 5.6 (Likelihood Principle 2, adapted from (Lindley and Phillips 1976, 113–14), (Berger 1985, 28) and (Paulino, Turkman, and Murteira 2003, 34–35)) Suppose a coin has probability \(\theta\) of coming up ‘heads’. You want to test the hypothesis \(H_0 : \theta = 1/2\) against \(H_1 : \theta > 1/2\). Assume that an experiment is carried out in which a series of tosses is made, resulting in \(x=9\) ‘heads’ and \(n-x=k=3\) ‘tails’. Two experimental processes can be considered:

\(E_1\): toss the coin \(n=12\) times
\(E_2\): flip the coin until \(k=3\) ‘tails’ appear

The observed value \(x=9\) is therefore a realization of either (1) a binomial random variable \(\mathcal{B}(12,\theta)\) (\(X\): number of heads in 12 tosses) or (2) a negative binomial random variable \(\mathcal{BN}(3,1-\theta)\) (\(X\): number of heads (failures) before the third ‘tails’, according to Eq. (3.52)). Note the choice of parameterization: in the negative binomial, ‘tails’ plays the role of success, so its parameter is \(1-\theta\).

From a classical perspective, the critical level (or \(p\)-value, here the probability of obtaining \(X \ge 9\) under \(H_0 : \theta = 1/2\)) differs between the two cases.

In case \(E_1\), \(X\) has a binomial distribution, \(X \sim \mathcal{B} \left(12, \theta \right)\), and the critical level is

\[Pr\left( X \ge 9 \bigg\rvert \theta = \frac{1}{2} \right) = \left[ \binom{12}{9} + \binom {12}{10} + \binom {12}{11} + \binom {12}{12} \right] \left[ \frac{1}{2} \right]^{12} \approx 0.0730.\]

1-pbinom(8, 12, 1/2)

## [1] 0.07299805
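The bracketed sum above can also be checked term by term. The following is a minimal sketch in base R; `tail_counts`, `p_manual`, and `p_pbinom` are illustrative names, not part of the original text.

```r
# Number of sequences with 9, 10, 11 and 12 heads in 12 tosses
tail_counts <- choose(12, 9:12)        # 220 66 12 1

# Under H0 each of the 2^12 sequences is equally likely
p_manual <- sum(tail_counts) / 2^12    # 299 / 4096

# Same upper-tail probability from the binomial distribution function
p_pbinom <- pbinom(8, 12, 1/2, lower.tail = FALSE)

all.equal(p_manual, p_pbinom)
```

Using `lower.tail = FALSE` is equivalent to `1 - pbinom(8, 12, 1/2)` here and avoids the explicit subtraction.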

In case \(E_2\), \(X\) has a negative binomial distribution, \(X \sim \mathcal{BN} \left( 3, 1-\theta \right)\), and the critical level is

\[Pr\left( X \ge 9 \bigg\rvert \theta = \frac{1}{2} \right) = \binom{11}{9} \left( \frac{1}{2} \right)^{9} \left( \frac{1}{2} \right)^{3} + \binom{12}{10} \left( \frac{1}{2} \right)^{10} \left( \frac{1}{2} \right)^{3} + \cdots \approx 0.0327\]

1-pnbinom(8, 3, 1/2)

## [1] 0.03271484
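One way to check this infinite sum is to note that \(X \ge 9\) (at least nine heads before the third ‘tails’) happens exactly when the first 11 tosses contain at most 2 ‘tails’, which is a finite binomial event. The sketch below compares the two computations; the variable names are illustrative.

```r
# X >= 9 heads before the 3rd tail  <=>  at most 2 tails in the first 11 tosses
p_finite <- pbinom(2, 11, 1/2)                       # (1 + 11 + 55) / 2^11

# Direct upper tail of the negative binomial
# (in R's parameterization: failures = heads, successes = tails)
p_pnbinom <- pnbinom(8, size = 3, prob = 1/2, lower.tail = FALSE)

# First two terms of the series written out in the text
first_terms <- dnbinom(9:10, size = 3, prob = 1/2)   # 55/2^12 and 66/2^13

all.equal(p_finite, p_pnbinom)
```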

Therefore, at a significance level of 5%, \(H_0\) is rejected in case \(E_2\) (\(0.0327 < 0.05\)) and not rejected in case \(E_1\) (\(0.0730 > 0.05\)). Under the likelihood principle, the conclusions should be identical in both cases, because in both the likelihood function is proportional to \(\theta^9 \left(1 - \theta \right)^3\). In fact, the likelihoods under \(E_1\) and \(E_2\) are

\[L_1 \left( \theta \mid x = 9 \right) = \binom{12}{9} \theta^{9} \left( 1-\theta \right)^{3} = 220 \; \theta^{9} \left( 1-\theta \right)^{3} \propto \theta^{9} \left( 1-\theta \right)^{3}\]

\[L_2 \left( \theta \mid x = 9 \right) = \binom{11}{9} \theta^{9} \left( 1-\theta \right)^{3} = 55 \; \theta^{9} \left( 1-\theta \right)^{3} \propto \theta^{9} \left( 1-\theta \right)^{3}\]
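The proportionality of \(L_1\) and \(L_2\) can be checked numerically. The sketch below evaluates both likelihoods on a grid of \(\theta\) values; the grid and the names `theta_grid`, `L1`, and `L2` are assumptions made for illustration.

```r
theta_grid <- seq(0.05, 0.95, by = 0.05)

# Likelihood of x = 9 heads under each design
L1 <- dbinom(9, size = 12, prob = theta_grid)       # E1: binomial
L2 <- dnbinom(9, size = 3, prob = 1 - theta_grid)   # E2: negative binomial

# The ratio is the constant 220 / 55 = 4 and does not depend on theta
range(L1 / L2)

# After rescaling each curve to a maximum of 1, the two coincide
all.equal(L1 / max(L1), L2 / max(L2))
```

Any procedure that uses the likelihood only up to a multiplicative constant, such as maximum likelihood estimation or a posterior computed under a given prior, therefore gives the same answer under \(E_1\) and \(E_2\), whereas the two \(p\)-values differ.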

Exercise 5.5

- Comment on the tweet https://twitter.com/agpatriota/status/1487877332627070983.
- Regarding the statement that “the stopping rule provides information about the variability of the estimators”, indicate how and when this occurs.
- Considering the tweet https://twitter.com/agpatriota/status/1487888981329170442, comment on how to decide in real situations that do not involve rare events. In the literature, an event is usually considered rare when its estimated proportion of occurrence in the population is below 1%.


5.4.1 See also

Section 1.6 from (Paulino, Turkman, and Murteira 2003)
Sections 3.3 and 3.4 from (S. J. Press 2003) (Likelihood principle)
The fundamentals are discussed by (Birnbaum 1962), (Savage et al. 1962) and (Wechsler, Pereira, and Marques 2008)

References

Berger, James O. 1985. Statistical Decision Theory and Bayesian Analysis. 2nd ed. Springer Science & Business Media. https://www.springer.com/gp/book/9780387960982.

Birnbaum, Allan. 1962. “On the Foundations of Statistical Inference.” Journal of the American Statistical Association 57 (298): 269–306. https://www.jstor.org/stable/2281640.

Lindley, Dennis V., and Lawrence D. Phillips. 1976. “Inference for a Bernoulli Process (a Bayesian View).” The American Statistician 30 (3): 112–19. https://www.jstor.org/stable/2683855.

Paulino, Carlos Daniel Mimoso, Maria Antónia Amaral Turkman, and Bento Murteira. 2003. Estatística Bayesiana. Fundação Calouste Gulbenkian, Lisboa. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:puc01000334509.

Press, S. James. 2003. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications. 2nd ed. John Wiley & Sons. http://primo-pmtna01.hosted.exlibrisgroup.com/PUC01:PUC01:oclc(OCoLC)587388980.

Savage, Leonard J, George Barnard, Jerome Cornfield, Irwin Bross, IJ Good, DV Lindley, CW Clunies-Ross, et al. 1962. “On the Foundations of Statistical Inference: Discussion.” Journal of the American Statistical Association 57 (298): 307–26. https://www.jstor.org/stable/2281641.

Wechsler, Sérgio, Carlos Alberto de B. Pereira, and Paulo C. F. Marques. 2008. “Birnbaum’s Theorem Redux.” https://www.ime.usp.br/~pmarques/papers/redux.pdf.