Mathematical Derivation of the Mean-Field Approximation
In variational inference, the posterior distribution over a set of unobserved variables $\mathbf{Z} = \{Z_1, \dots, Z_n\}$ given some data $\mathbf{X}$ is approximated by a variational distribution, $Q(\mathbf{Z})$:

$$P(\mathbf{Z} \mid \mathbf{X}) \approx Q(\mathbf{Z}).$$

The distribution $Q(\mathbf{Z})$ is restricted to belong to a family of distributions of simpler form than $P(\mathbf{Z} \mid \mathbf{X})$, selected with the intention of making $Q(\mathbf{Z})$ similar to the true posterior, $P(\mathbf{Z} \mid \mathbf{X})$. The lack of similarity is measured in terms of a dissimilarity function $d(Q; P)$, and hence inference is performed by selecting the distribution $Q(\mathbf{Z})$ that minimizes $d(Q; P)$.
The most common type of variational Bayes, known as mean-field variational Bayes, uses the Kullback–Leibler divergence (KL-divergence) of P from Q as the choice of dissimilarity function. This choice makes the minimization tractable. The KL-divergence is defined as

$$D_{\mathrm{KL}}(Q \parallel P) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z} \mid \mathbf{X})}.$$
Note that Q and P are reversed from what one might expect. This use of reversed KL-divergence is conceptually similar to the expectation-maximization algorithm. (Using the KL-divergence in the other way produces the expectation propagation algorithm.)
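To make the definition and its asymmetry concrete, here is a minimal numerical sketch. The distributions Q and P below are hypothetical discrete distributions over three latent states, chosen only for illustration; they are not taken from the text above.

```python
import numpy as np

# Hypothetical discrete distributions over three states of Z (illustration only).
Q = np.array([0.5, 0.3, 0.2])   # variational distribution Q(Z)
P = np.array([0.4, 0.4, 0.2])   # true posterior P(Z | X)

# Reverse KL, the quantity minimized in mean-field variational Bayes:
# D_KL(Q || P) = sum_Z Q(Z) * log( Q(Z) / P(Z|X) )
kl_q_from_p = np.sum(Q * np.log(Q / P))

# The "forward" direction, D_KL(P || Q), is generally a different number;
# using it instead leads to expectation propagation rather than variational Bayes.
kl_p_from_q = np.sum(P * np.log(P / Q))

print(kl_q_from_p, kl_p_from_q)  # the two values differ: KL is asymmetric
```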
The KL-divergence can be written as

$$D_{\mathrm{KL}}(Q \parallel P) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \left[ \log Q(\mathbf{Z}) - \log P(\mathbf{Z}, \mathbf{X}) \right] + \log P(\mathbf{X}),$$

or

$$\log P(\mathbf{X}) = D_{\mathrm{KL}}(Q \parallel P) - \sum_{\mathbf{Z}} Q(\mathbf{Z}) \left[ \log Q(\mathbf{Z}) - \log P(\mathbf{Z}, \mathbf{X}) \right] = D_{\mathrm{KL}}(Q \parallel P) + \mathcal{L}(Q).$$
As the log evidence $\log P(\mathbf{X})$ is fixed with respect to $Q$, maximising the final term $\mathcal{L}(Q)$ minimizes the KL divergence of $Q$ from $P$. By appropriate choice of $Q$, $\mathcal{L}(Q)$ becomes tractable to compute and to maximize. Hence we have both an analytical approximation $Q$ for the posterior $P(\mathbf{Z} \mid \mathbf{X})$, and a lower bound $\mathcal{L}(Q)$ for the evidence $\log P(\mathbf{X})$. The lower bound $\mathcal{L}(Q)$ is known as the (negative) variational free energy because it can also be expressed as an "energy" $\operatorname{E}_{Q}[\log P(\mathbf{Z}, \mathbf{X})]$ plus the entropy of $Q$.
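The decomposition above can be checked numerically. The following sketch uses a hypothetical joint distribution over three latent states (an assumption for illustration, not part of the original text) to verify that the log evidence equals the KL divergence plus the lower bound, and that $\mathcal{L}(Q)$ never exceeds the log evidence.

```python
import numpy as np

# Hypothetical joint P(Z, X = x_obs) over three latent states (illustration only),
# used to check the identity  log P(X) = D_KL(Q || P) + L(Q).
joint = np.array([0.10, 0.25, 0.15])    # P(Z = z, X = x_obs) for z = 1, 2, 3
evidence = joint.sum()                  # P(X = x_obs), obtained by marginalizing Z
posterior = joint / evidence            # P(Z | X = x_obs)

Q = np.array([0.2, 0.5, 0.3])           # an arbitrary variational distribution Q(Z)

# L(Q) = sum_Z Q(Z) * [ log P(Z, X) - log Q(Z) ]   (the evidence lower bound)
elbo = np.sum(Q * (np.log(joint) - np.log(Q)))

# D_KL(Q || P) = sum_Z Q(Z) * log( Q(Z) / P(Z|X) )
kl = np.sum(Q * np.log(Q / posterior))

print(np.log(evidence), elbo + kl)   # the two numbers agree
print(elbo <= np.log(evidence))      # True: L(Q) is a lower bound on log P(X)
```

Because the KL term is nonnegative, making $\mathcal{L}(Q)$ larger necessarily makes $D_{\mathrm{KL}}(Q \parallel P)$ smaller, which is why maximizing the bound and fitting the posterior are the same optimization.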