Mathematical Derivation of the Mean-Field Approximation
In variational inference, the posterior distribution over a set of unobserved variables $\mathbf{Z}$ given some data $\mathbf{X}$ is approximated by a variational distribution, $Q(\mathbf{Z})$:

$$P(\mathbf{Z} \mid \mathbf{X}) \approx Q(\mathbf{Z}).$$
The distribution $Q(\mathbf{Z})$ is restricted to belong to a family of distributions of simpler form than $P(\mathbf{Z} \mid \mathbf{X})$, selected with the intention of making $Q(\mathbf{Z})$ similar to the true posterior, $P(\mathbf{Z} \mid \mathbf{X})$. The lack of similarity is measured in terms of a dissimilarity function $d(Q; P)$, and hence inference is performed by selecting the distribution $Q(\mathbf{Z})$ that minimizes $d(Q; P)$, as spelled out below.
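Written out as an optimization (with $\mathcal{Q}$ denoting the chosen family of simpler distributions; this symbol is introduced here only for illustration), the inference step amounts to

$$Q^{*}(\mathbf{Z}) = \arg\min_{Q \in \mathcal{Q}} \; d\bigl(Q(\mathbf{Z}); \, P(\mathbf{Z} \mid \mathbf{X})\bigr).$$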
The most common type of variational Bayes, known as mean-field variational Bayes, uses the Kullback–Leibler divergence (KL-divergence) of P from Q as the choice of dissimilarity function. This choice makes the minimization tractable. The KL-divergence is defined as

$$D_{\mathrm{KL}}(Q \parallel P) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z} \mid \mathbf{X})}.$$
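As a minimal illustration of this definition, the following Python sketch evaluates the sum above for discrete distributions stored as NumPy arrays; the arrays q and p_post and their values are arbitrary examples, not taken from the derivation.

```python
import numpy as np

def kl_divergence(q, p):
    """D_KL(Q || P) = sum_z Q(z) * log(Q(z) / P(z)) for discrete distributions.

    q, p: 1-D arrays of probabilities over the same support, each summing to 1.
    Terms with Q(z) == 0 contribute 0, by the convention 0 * log 0 = 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return np.sum(q[mask] * np.log(q[mask] / p[mask]))

# Example: a crude approximation to a three-point posterior (illustrative numbers).
p_post = np.array([0.7, 0.2, 0.1])   # stand-in for the true posterior P(Z | X)
q      = np.array([0.8, 0.2, 0.0])   # variational approximation Q(Z)
print(kl_divergence(q, p_post))      # >= 0, and 0 only if Q == P
```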
Note that Q and P are reversed from what one might expect. This use of reversed KL-divergence is conceptually similar to the expectation-maximization algorithm. (Using the KL-divergence in the other way produces the expectation propagation algorithm.)
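To see concretely why the ordering matters, the short sketch below (again using arbitrary example numbers over a shared discrete support) computes the divergence in both directions; the two values generally differ, which is what makes the "reversed" choice worth flagging.

```python
import numpy as np

def kl(a, b):
    """D_KL(A || B) over a shared discrete support (zero terms in A contribute 0)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    m = a > 0
    return np.sum(a[m] * np.log(a[m] / b[m]))

p = np.array([0.7, 0.2, 0.1])  # stand-in for the true posterior P
q = np.array([0.5, 0.3, 0.2])  # stand-in for the variational distribution Q

print(kl(q, p))  # reversed KL, D_KL(Q || P): the quantity minimized in mean-field variational Bayes
print(kl(p, q))  # forward KL, D_KL(P || Q): the direction associated with expectation propagation
```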
Using $P(\mathbf{Z} \mid \mathbf{X}) = P(\mathbf{Z}, \mathbf{X}) / P(\mathbf{X})$, the KL-divergence can be written as

$$D_{\mathrm{KL}}(Q \parallel P) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{Q(\mathbf{Z})}{P(\mathbf{Z}, \mathbf{X})} + \log P(\mathbf{X}),$$

or, rearranging,

$$\log P(\mathbf{X}) = D_{\mathrm{KL}}(Q \parallel P) + \mathcal{L}(Q), \qquad \text{where } \mathcal{L}(Q) = \sum_{\mathbf{Z}} Q(\mathbf{Z}) \log \frac{P(\mathbf{Z}, \mathbf{X})}{Q(\mathbf{Z})}.$$
As the log evidence $\log P(\mathbf{X})$ is fixed with respect to $Q$, maximizing the final term $\mathcal{L}(Q)$ minimizes the KL-divergence of $Q$ from $P$. By appropriate choice of $Q$, $\mathcal{L}(Q)$ becomes tractable to compute and to maximize. Hence we have both an analytical approximation $Q(\mathbf{Z})$ for the posterior $P(\mathbf{Z} \mid \mathbf{X})$ and a lower bound $\mathcal{L}(Q)$ for the log evidence $\log P(\mathbf{X})$ (a lower bound because the KL-divergence is non-negative). The lower bound $\mathcal{L}(Q)$ is known as the (negative) variational free energy, because it can also be expressed as an "energy" $\operatorname{E}_{Q}[\log P(\mathbf{Z}, \mathbf{X})]$ plus the entropy of $Q$.
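The decomposition above can be checked numerically on a toy discrete model. The sketch below (illustrative only; the joint table and the chosen $Q$ are arbitrary) builds a joint $P(\mathbf{Z}, \mathbf{X})$, fixes an observation, and confirms both that the ELBO plus the KL term recovers the log evidence and that the ELBO equals the "energy" term plus the entropy of $Q$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy joint distribution P(Z, X) over 4 latent states and 3 observable values.
joint = rng.random((4, 3))
joint /= joint.sum()

x = 1                                   # a fixed observation X = x
p_zx = joint[:, x]                      # unnormalized column P(Z, X = x)
evidence = p_zx.sum()                   # P(X = x)
posterior = p_zx / evidence             # P(Z | X = x)

# An arbitrary (non-optimal) variational distribution Q(Z).
q = np.array([0.4, 0.3, 0.2, 0.1])

kl_term = np.sum(q * np.log(q / posterior))     # D_KL(Q || P(Z | X))
elbo    = np.sum(q * np.log(p_zx / q))          # L(Q) = E_Q[log P(Z, X)] - E_Q[log Q]

print(np.log(evidence), kl_term + elbo)         # the two numbers agree: log P(X) = KL + L(Q)

# The ELBO also decomposes as an "energy" term plus the entropy of Q.
energy  = np.sum(q * np.log(p_zx))              # E_Q[log P(Z, X)]
entropy = -np.sum(q * np.log(q))                # H(Q)
print(elbo, energy + entropy)                   # these agree as well
```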