Implementation
Gibbs sampling, in its basic incarnation, is a special case of the Metropolis–Hastings algorithm. The point of Gibbs sampling is that given a multivariate distribution it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution. Suppose we want to obtain samples of from a joint distribution . Denote the th sample by . We proceed as follows:
- We begin with some initial value for each variable.
- For each sample, sample each variable from the conditional distribution . That is, sample each variable from the distribution of that variable conditioned on all other variables, making use of the most recent values and updating the variable with its new value as soon as it has been sampled.
The samples then approximate the joint distribution of all variables. Furthermore, the marginal distribution of any subset of variables can be approximated by simply examining the samples for that subset of variables, ignoring the rest. In addition, the expected value of any variable can be approximated by averaging over all the samples.
- The initial values of the variables can be determined randomly or by some other algorithm such as expectation-maximization.
- It is not actually necessary to determine an initial value for the first variable sampled.
- It is common to ignore some number of samples at the beginning (the so-called burn-in period), and then consider only every th sample when averaging values to compute an expectation. For example, the first 1,000 samples might be ignored, and then every 100th sample averaged, throwing away all the rest. The reason for this is that (1) successive samples are not independent of each other but form a Markov chain with some amount of correlation; (2) the stationary distribution of the Markov chain is the desired joint distribution over the variables, but it may take a while for that stationary distribution to be reached. Sometimes, algorithms can be used to determine the amount of autocorrelation between samples and the value of (the period between samples that are actually used) computed from this, but in practice there is a fair amount of "black magic" involved.
- The process of simulated annealing is often used to reduce the "random walk" behavior in the early part of the sampling process (i.e. the tendency to move slowly around the sample space, with a high amount of autocorrelation between samples, rather than moving around quickly, as is desired). Other techniques that may reduce autocorrelation are collapsed Gibbs sampling, blocked Gibbs sampling, and ordered overrelaxation; see below.
Read more about this topic: Gibbs Sampling