Gibbs Sampling - Inference

Inference

Gibbs sampling is commonly used for statistical inference (e.g. determining the best value of a parameter, such as the number of people likely to shop at a particular store on a given day, or the candidate a voter will most likely vote for). The idea is that observed data are incorporated into the sampling process by creating a separate variable for each piece of observed data and fixing those variables to their observed values, rather than sampling them. The distribution of the remaining variables is then effectively a posterior distribution conditioned on the observed data.
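As a minimal sketch of this idea (the model below, a standard bivariate normal with correlation rho and a single observed coordinate, is made up purely for illustration), the observed variable is held fixed at its value and only the unobserved variable is resampled from its full conditional:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: (x1, x2) jointly standard normal with correlation rho.
# x2 is observed, so it is held fixed; only x1 is resampled from its
# full conditional p(x1 | x2), i.e. the posterior given the observation.
rho = 0.8
x2_observed = 1.5          # the observed value is never resampled
x1 = 0.0                   # initial value for the unobserved variable

samples = []
for _ in range(5000):
    # full conditional of x1 given x2 for a standard bivariate normal
    x1 = rng.normal(loc=rho * x2_observed, scale=np.sqrt(1 - rho**2))
    samples.append(x1)

print(np.mean(samples))    # close to rho * x2_observed = 1.2
```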

The most likely value of a desired parameter (the mode) could then simply be selected by choosing the sample value that occurs most commonly; this is essentially equivalent to maximum a posteriori estimation of the parameter. (Since the parameters are usually continuous, it is often necessary to "bin" the sampled values into a finite number of ranges in order to get a meaningful estimate of the mode.) More commonly, however, the expected value (mean or average) of the sampled values is chosen; this is a Bayes estimator that takes advantage of the additional information about the entire distribution available from Bayesian sampling, whereas a maximization algorithm such as expectation maximization (EM) can return only a single point from the distribution. For example, for a unimodal distribution the mean (expected value) is usually similar to the mode (most common value), but if the distribution is skewed in one direction, the mean will be shifted in that direction, which effectively accounts for the extra probability mass there. (If the distribution is multimodal, however, the expected value may not be a meaningful point, and one of the modes is typically a better choice.)
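A short sketch of the two estimators, using draws simulated from an assumed right-skewed distribution purely for illustration (in practice these would be the Gibbs draws of the parameter):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for Gibbs draws of a continuous parameter: a right-skewed
# gamma distribution, chosen only so mean and mode visibly differ.
samples = rng.gamma(shape=2.0, scale=1.0, size=10000)

posterior_mean = samples.mean()        # Bayes estimator under squared-error loss

# Approximate the mode by binning the continuous draws into a histogram
counts, edges = np.histogram(samples, bins=50)
mode_bin = counts.argmax()
approx_mode = 0.5 * (edges[mode_bin] + edges[mode_bin + 1])

# For a right-skewed posterior the mean exceeds the binned mode
print(posterior_mean, approx_mode)
```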

Although some of the variables typically correspond to parameters of interest, others are uninteresting ("nuisance") variables introduced into the model only to express the relationships among variables properly. Although the sampled values represent the joint distribution over all variables, the nuisance variables can simply be ignored when computing expected values or modes; this is equivalent to marginalizing over the nuisance variables. When values for multiple variables are desired, the expected value is simply computed for each variable separately. (When computing the mode, however, all variables must be considered together.)
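A small sketch of this, with made-up variable names and simulated draws standing in for the output of a Gibbs sampler: marginalizing over the nuisance variables amounts to simply not looking at their columns.

```python
import numpy as np

rng = np.random.default_rng(2)

# Suppose each Gibbs iteration records a draw of every variable in the model,
# including nuisance (latent) variables; names and values are illustrative.
draws = {
    "mu":     rng.normal(5.0, 0.3, size=10000),   # parameter of interest
    "sigma2": rng.gamma(2.0, 1.0, size=10000),    # parameter of interest
    "z":      rng.normal(size=(10000, 20)),       # nuisance variables
}

# Ignoring the nuisance draws is equivalent to marginalizing over them;
# expected values are computed for each variable of interest separately.
posterior_means = {name: draws[name].mean() for name in ("mu", "sigma2")}
print(posterior_means)
```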

Supervised learning, unsupervised learning and semi-supervised learning (aka learning with missing values) can all be handled by simply fixing the values of all variables whose values are known, and sampling from the remainder.
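A sketch of the missing-value case, under an assumed model y ~ Normal(mu, 1) with a flat prior on mu (both assumptions are for illustration only): the observed values stay fixed while the missing ones are imputed on each sweep.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: y ~ Normal(mu, 1), flat prior on mu.
# Some y's are observed (fixed), some are missing (sampled each iteration).
y_obs = np.array([2.1, 1.8, 2.5, 2.2])    # known values: never resampled
n_missing = 3
y_mis = np.zeros(n_missing)                # initial guesses for missing values
mu = 0.0

mu_draws = []
for _ in range(5000):
    # sample the missing observations given the current mu (data augmentation)
    y_mis = rng.normal(loc=mu, scale=1.0, size=n_missing)
    # sample mu given all (observed + imputed) data; flat prior gives a
    # Normal(ybar, 1/n) full conditional
    y_all = np.concatenate([y_obs, y_mis])
    mu = rng.normal(loc=y_all.mean(), scale=1.0 / np.sqrt(len(y_all)))
    mu_draws.append(mu)

print(np.mean(mu_draws))   # posterior mean of mu, close to the observed-data mean
```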

For observed data, there will be one variable for each observation — rather than, for example, one variable corresponding to the sample mean or sample variance of a set of observations. In fact, there generally will be no variables at all corresponding to concepts such as "sample mean" or "sample variance". Instead, in such a case there will be variables representing the unknown true mean and true variance, and the determination of sample values for these variables results automatically from the operation of the Gibbs sampler.
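For instance, under an assumed normal model with semi-conjugate priors (the hyperparameters below are illustrative choices), the sampler alternates between the full conditionals of the unknown true mean and true variance, with one fixed variable per observation and no "sample mean" or "sample variance" variable anywhere in the model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical example: y_i ~ Normal(mu, sigma2) with semi-conjugate priors
#   mu ~ Normal(m0, v0)   and   sigma2 ~ Inverse-Gamma(a0, b0).
y = rng.normal(loc=5.0, scale=2.0, size=50)          # observed data, held fixed
n = len(y)
m0, v0, a0, b0 = 0.0, 100.0, 1.0, 1.0                # assumed prior hyperparameters

mu, sigma2 = 0.0, 1.0
draws = []
for _ in range(5000):
    # full conditional of mu: Normal with precision-weighted mean
    prec = 1.0 / v0 + n / sigma2
    mean = (m0 / v0 + y.sum() / sigma2) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # full conditional of sigma2: Inverse-Gamma(a0 + n/2, b0 + sum((y - mu)^2)/2)
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + 0.5 * np.sum((y - mu) ** 2)))
    draws.append((mu, sigma2))

mu_draws, sigma2_draws = np.array(draws).T
print(mu_draws.mean(), sigma2_draws.mean())   # posterior means of true mean and variance
```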

Generalized linear models (i.e. variations of linear regression) can sometimes be handled by Gibbs sampling as well. For example, probit regression for determining the probability of a given binary (yes/no) choice, with normally distributed priors placed over the regression coefficients, can be implemented with Gibbs sampling because it is possible to add additional variables and take advantage of conjugacy. However, logistic regression cannot be handled this way. One possibility is to approximate the logistic function with a mixture of (typically 7-9) normal distributions. More commonly, however, Metropolis-Hastings is used instead of Gibbs sampling.
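A sketch of the probit case using the usual data-augmentation scheme (latent truncated-normal variables, as in Albert and Chib's construction); the simulated data, prior, and chain length below are illustrative assumptions, not a definitive implementation:

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(5)

# Simulated probit data for illustration
n, p = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])        # design matrix
beta_true = np.array([-0.5, 1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)     # binary responses

B0_inv = np.eye(p) / 100.0           # assumed prior: beta ~ Normal(0, 100 I)
V = np.linalg.inv(B0_inv + X.T @ X)  # posterior covariance of beta given latent z
beta = np.zeros(p)

draws = []
for _ in range(2000):
    # 1. sample latent z_i ~ Normal(x_i' beta, 1), truncated to (0, inf) if y_i = 1
    #    and to (-inf, 0) if y_i = 0
    mean = X @ beta
    lower = np.where(y == 1, -mean, -np.inf)   # standardized truncation bounds
    upper = np.where(y == 1, np.inf, -mean)
    z = mean + truncnorm.rvs(lower, upper, size=n, random_state=rng)
    # 2. sample beta from its normal full conditional given z (conjugate update)
    m = V @ (X.T @ z)
    beta = rng.multivariate_normal(m, V)
    draws.append(beta)

print(np.mean(draws[500:], axis=0))   # posterior mean of the coefficients after burn-in
```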
