5.3 Variational inference

The goal of variational inference is to approximate a conditional density of latent variables given observed variables. (Blei, Kucukelbir, and McAuliffe 2017, 4)

Exercise 5.10 Watch the following videos from Chieh Wu's channel (total running time: 1h39m48s).

5.3.1 Entropy

The information (surprise) of an outcome \(x\) with probability \(p(x)\) is \[\begin{equation} I = \log \left( \frac{1}{p(x)} \right) = -\log p(x) \tag{5.2} \end{equation}\]
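A minimal base-R sketch of Eq. (5.2). The helper name `surprisal` is ours, not from any package, and base 2 is chosen so the results come out in bits and are easy to check by hand: rarer outcomes carry more information, and a certain outcome carries none.

```r
# Information (surprise) of an outcome with probability p, Eq. (5.2).
# `surprisal` is a hypothetical helper name; base = 2 gives the result in bits.
surprisal <- function(p, base = 2) {
  -log(p, base = base)
}

surprisal(1/2)  # a fair coin flip: 1 bit
surprisal(1/8)  # a rarer outcome is more surprising: 3 bits
surprisal(1)    # a certain outcome carries no information: 0 bits
```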

(Shannon 1948, 11) defines the entropy of a discrete random variable \(X\), whose range is \(R_x\), as \[\begin{equation} H = - \sum_{i=1}^{|R_x|} p(x_i) \log p(x_i) \tag{5.3} \end{equation}\]

\(H\) can be interpreted as the average information of a discrete random variable, that is, the expected value of the surprise in Eq. (5.2) over the values of \(X\).
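The sketch below computes Eq. (5.3) in base R and then checks the "average information" reading directly, as the mean of \(-\log_2 p(x_i)\) weighted by \(p(x_i)\). The helper `shannon_entropy` is hypothetical, and dropping zero probabilities follows the usual convention \(0 \log 0 = 0\).

```r
# Shannon entropy of a discrete distribution, Eq. (5.3); natural log (nats) by default.
# `shannon_entropy` is a hypothetical helper; zero probabilities are dropped,
# following the convention 0 * log(0) = 0.
shannon_entropy <- function(p, base = exp(1)) {
  p <- p[p > 0]
  -sum(p * log(p, base = base))
}

shannon_entropy(c(1/2, 1/2), base = 2)  # fair coin: 1 bit
shannon_entropy(c(0.9, 0.1), base = 2)  # biased coin: about 0.469 bits
shannon_entropy(c(1, 0), base = 2)      # degenerate distribution: 0 bits

# H as the average information: mean surprise -log2 p(x_i), weighted by p(x_i)
p <- c(0.9, 0.1)
weighted.mean(-log2(p), w = p)          # same value as shannon_entropy(p, base = 2)
```

The biased coin is less uncertain than the fair one, so its entropy is lower.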

5.3.2 Differential entropy

Shannon assumed, without calculating, that the analog of \(\sum p_i \log p_i\) was \(\int w \log w dx\), and got into trouble for lack of invariance. (Jaynes 1963, 202)

Differential entropy plays the role of average information for a continuous random variable. For a density \(p(x)\), it can be written, in simplified form, as \[\begin{equation} H = - \int_{R_x} p(x) \log p(x) \, dx \tag{5.4} \end{equation}\]
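As a numerical illustration of Eq. (5.4), the sketch below integrates \(-p(x)\log p(x)\) for a normal density with `stats::integrate` and compares the result with the known closed form \(\tfrac{1}{2}\log(2\pi e\sigma^2)\) nats. The helper names are hypothetical, and truncating the integral at \(\pm 20\sigma\) is our assumption (the density is negligible beyond that).

```r
# Differential entropy of a N(0, sigma^2) density, computed numerically from Eq. (5.4).
# `diff_entropy_normal` and `closed_form` are hypothetical helper names.
diff_entropy_normal <- function(sigma) {
  integrand <- function(x) {
    -dnorm(x, sd = sigma) * dnorm(x, sd = sigma, log = TRUE)
  }
  # Assumption: the density is negligible beyond 20 standard deviations
  integrate(integrand, lower = -20 * sigma, upper = 20 * sigma)$value
}

# Known closed form for the normal distribution, in nats
closed_form <- function(sigma) 0.5 * log(2 * pi * exp(1) * sigma^2)

c(numerical = diff_entropy_normal(1),   closed_form = closed_form(1))    # about 1.419
c(numerical = diff_entropy_normal(0.1), closed_form = closed_form(0.1))  # about -0.884
```

Unlike Eq. (5.3), the result can be negative: shrinking \(\sigma\) from 1 to 0.1 lowers the value by \(\log 10 \approx 2.303\) nats. This dependence on the scale of \(x\) is one symptom of the lack of invariance that Jaynes points out.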

For an entropy measure that is invariant in the continuous case, see Eq. (63) of (Jaynes 1963).

5.3.3 Kullback–Leibler divergence

The idea behind VI is to first posit a family of densities and then to find the member of that family which is close to the target. Closeness is measured by Kullback-Leibler divergence. (Blei, Kucukelbir, and McAuliffe 2017, 1)
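To make the quoted idea concrete, here is a toy sketch under loudly stated assumptions: the target `p_target` is a made-up distribution on \(\{0, \dots, 4\}\), the posited family is \(\text{Binomial}(4, \theta)\), and closeness is \(D_{KL}(q_\theta||p)\), the direction used in variational inference. A crude grid search stands in for the coordinate-ascent or gradient-based optimizers used in practice.

```r
# Hypothetical target distribution on {0, 1, 2, 3, 4} (made-up values).
p_target <- c(0.05, 0.15, 0.40, 0.30, 0.10)

# Posited family: q_theta = Binomial(4, theta).
# Closeness: D_KL(q_theta || p_target), the direction used in VI.
kl_to_target <- function(theta) {
  q <- dbinom(0:4, size = 4, prob = theta)
  sum(q * log(q / p_target))
}

# Crude grid search over theta (a stand-in for a real optimizer)
theta_grid <- seq(0.01, 0.99, by = 0.01)
kl_values  <- sapply(theta_grid, kl_to_target)
theta_hat  <- theta_grid[which.min(kl_values)]

theta_hat                # the member of the family closest to the target
kl_to_target(theta_hat)  # its remaining divergence from the target
```

The remaining divergence is not zero because no binomial distribution matches this target exactly; the family restricts how good the approximation can be.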

The Kullback–Leibler divergence is a measure of relative entropy proposed by (Kullback and Leibler 1951). Taking the expectation with respect to \(P\), it is defined as \[\begin{equation} D_{KL}(P||Q) = \sum_{x \in X} P(x) \log \frac{P(x)}{Q(x)} \tag{5.5} \end{equation}\]

Taking the expectation with respect to \(Q\), it is defined as \[\begin{equation} D_{KL}(Q||P) = \sum_{x \in X} Q(x) \log \frac{Q(x)}{P(x)} \tag{5.6} \end{equation}\]

Following (Kullback 1978, 6–7), \(D_{KL}(P||Q)\) and \(D_{KL}(Q||P)\) are called directed divergences. They are non-negative and equal zero only when \(P = Q\), but they are not distances (metrics): they do not satisfy the triangle inequality and, in general, they are not symmetric, \[D_{KL}(P||Q) \ne D_{KL}(Q||P)\]
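A direct base-R transcription of Eqs. (5.5) and (5.6) makes these properties easy to check. `kl_divergence` is a hypothetical helper that assumes \(Q(x) > 0\) wherever \(P(x) > 0\), and the fair and biased coins are made-up examples.

```r
# Discrete Kullback-Leibler divergence D_KL(p || q), Eqs. (5.5)/(5.6), in nats.
# `kl_divergence` is a hypothetical helper; it assumes q > 0 wherever p > 0
# and drops terms with p = 0 (convention 0 * log(0 / q) = 0).
kl_divergence <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

p <- c(0.5, 0.5)  # hypothetical P: fair coin
q <- c(0.9, 0.1)  # hypothetical Q: biased coin

kl_divergence(p, q)  # D_KL(P||Q), about 0.511
kl_divergence(q, p)  # D_KL(Q||P), about 0.368: the order of the arguments matters
kl_divergence(p, p)  # 0: a distribution has zero divergence from itself
```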

5.3.3.1 The `philentropy` package

The philentropy package (Drost 2018) implements optimized distance and similarity measures for comparing probability functions. To illustrate philentropy::kullback_leibler_distance, we use the basic example from the Wikipedia article on the Kullback–Leibler divergence: \(P\) is the binomial distribution \(B(2, 0.4)\) and \(Q\) is the discrete uniform distribution on \(\{0, 1, 2\}\).

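# P: binomial distribution B(2, 0.4); Q: discrete uniform distribution on {0, 1, 2}
P <- c(9/25, 12/25, 4/25)
Q <- c(1/3, 1/3, 1/3)

# Side-by-side bar plots, restoring the previous graphics settings afterwards
op <- par(mfrow = c(1, 2))
barplot(P, main = 'P, B(2,0.4)', names.arg = 0:2)
barplot(Q, main = 'Q, U{0,2}', names.arg = 0:2)
par(op)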

\(x\)	0	1	2
Distribution \(P(x)\)	\(\frac{9}{25}\)	\(\frac{12}{25}\)	\(\frac{4}{25}\)
Distribution \(Q(x)\)	\(\frac{1}{3}\)	\(\frac{1}{3}\)	\(\frac{1}{3}\)

# libs
library(philentropy)

# D(P||Q)
kullback_leibler_distance(
  P = c(9/25,12/25,4/25),
  Q = c(1/3,1/3,1/3),
  unit = 'log', 
  testNA = FALSE,
  epsilon = 0.00001)

## [1] 0.0852996

# D(Q||P)
kullback_leibler_distance(
  P = c(1/3,1/3,1/3),
  Q = c(9/25,12/25,4/25),
  unit = 'log', 
  testNA = FALSE,
  epsilon = 0.00001)

## [1] 0.09745501
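As a hand check of the two outputs, using the natural logarithm selected by `unit = 'log'`, Eqs. (5.5) and (5.6) with the values in the table reduce to

\[\begin{aligned} D_{KL}(P||Q) &= \tfrac{9}{25}\log\tfrac{27}{25} + \tfrac{12}{25}\log\tfrac{36}{25} + \tfrac{4}{25}\log\tfrac{12}{25} \approx 0.0853 \\ D_{KL}(Q||P) &= \tfrac{1}{3}\left(\log\tfrac{25}{27} + \log\tfrac{25}{36} + \log\tfrac{25}{12}\right) = \tfrac{1}{3}\log\tfrac{15625}{11664} \approx 0.0975 \end{aligned}\]

which agree with the values returned by kullback_leibler_distance and confirm that the two directions differ.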

5.3.4 ELBO (Evidence Lower BOund)

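In variational inference, the log evidence \(\log p(x)\) splits into the ELBO plus the Kullback–Leibler divergence from the variational density \(q(z)\) to the posterior \(p(z \mid x)\): \[\text{Evidence} = \text{ELBO} + \text{KL}\] or, more precisely, \[\log p(x) = \mathbb{E}_{q}[\log p(x, z)] - \mathbb{E}_{q}[\log q(z)] + D_{KL}(q(z)||p(z \mid x))\] where the first two terms form the ELBO. Because the KL term is non-negative, the ELBO is a lower bound on the log evidence, and maximizing it over \(q\) is equivalent to minimizing \(D_{KL}(q(z)||p(z \mid x))\).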
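The decomposition can be checked numerically on a toy model where every quantity is a finite sum. All numbers below (`prior`, `likelihood`, `q`) are made up for illustration: a discrete latent \(z \in \{1, 2, 3\}\), a single observed \(x\), and an arbitrary variational density.

```r
# Hypothetical toy model (all numbers made up): discrete latent z in {1, 2, 3},
# one observed x.
prior      <- c(0.5, 0.3, 0.2)  # p(z)
likelihood <- c(0.1, 0.6, 0.3)  # p(x | z) evaluated at the observed x
q          <- c(0.2, 0.5, 0.3)  # an arbitrary variational density q(z)

joint     <- prior * likelihood  # p(x, z)
evidence  <- sum(joint)          # p(x): sum over z of p(x, z)
posterior <- joint / evidence    # p(z | x), the density VI approximates

elbo <- sum(q * (log(joint) - log(q)))  # E_q[log p(x, z)] - E_q[log q(z)]
kl   <- sum(q * log(q / posterior))     # D_KL(q(z) || p(z | x))

log(evidence)          # log p(x)
elbo + kl              # equals log p(x) up to floating-point error
elbo <= log(evidence)  # TRUE, because the KL term is non-negative

# With q equal to the exact posterior, the KL term vanishes and the bound is tight
sum(posterior * (log(joint) - log(posterior)))
```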

Exercise 5.11 Watch the video Variational Inference | Evidence Lower Bound (ELBO) | Intuition & Visualization from the Machine Learning & Simulation channel.

Exercise 5.12 Explore the interactive ELBO app at https://englishprobabilistic-machine-learningelbo-interactive--or5u7m.streamlit.app/ and read https://gregorygundersen.com/blog/2021/04/16/variational-inference/

Exercise 5.13 Write down what you have understood about:

Information theory.
Information.
Entropy.
Differential entropy.
Kullback–Leibler divergence.
ELBO (Evidence Lower BOund).
Variational inference.

5.3.5 Further reading

Carlos Gomes summarizes the topics discussed by Journal Club - VAE, a group dedicated to discussing the Variational Auto-Encoder architecture.

Anna-Lena Popkes offers some remarks on variational inference (VI).

References

Blei, David M, Alp Kucukelbir, and Jon D McAuliffe. 2017. “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association 112 (518): 859–77. https://doi.org/10.1080/01621459.2017.1285773.

Drost, HG. 2018. “Philentropy: Information Theory and Distance Quantification with R.” Journal of Open Source Software 3 (26): 765. https://joss.theoj.org/papers/10.21105/joss.00765.

Jaynes, Edwin T. 1963. “Information Theory and Statistical Mechanics (Notes by the Lecturer).” Statistical Physics 3, 181.

Kullback, Solomon. 1978. Information Theory and Statistics. Dover Publications, Inc. https://store.doverpublications.com/0486696847.html.

Kullback, Solomon, and Richard A Leibler. 1951. “On Information and Sufficiency.” The Annals of Mathematical Statistics 22 (1): 79–86. https://www.jstor.org/stable/2236703.

Shannon, Claude E. 1948. “A Mathematical Theory of Communication.” The Bell System Technical Journal 27 (3): 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.