Mutual Information - Relation To Other Quantities

Relation To Other Quantities

Mutual information can be equivalently expressed as


\begin{align}
I(X;Y) & {} = H(X) - H(X|Y) \\
& {} = H(Y) - H(Y|X) \\
& {} = H(X) + H(Y) - H(X,Y) \\
& {} = H(X,Y) - H(X|Y) - H(Y|X)
\end{align}

where H(X) and H(Y) are the marginal entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X,Y) is the joint entropy of X and Y. Applying Jensen's inequality to the definition of mutual information shows that I(X;Y) is non-negative; consequently, H(X) ≥ H(X|Y).
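As a concrete check of these identities, the following sketch computes the entropies of a small discrete joint distribution and verifies each equality numerically. The joint pmf p_xy and the variable names are invented purely for illustration; entropies are in bits.

import numpy as np

# Hypothetical 2x3 joint pmf p(x, y); any non-negative array summing to 1 would do.
p_xy = np.array([[0.10, 0.20, 0.05],
                 [0.25, 0.15, 0.25]])

def entropy(p):
    # Shannon entropy in bits, skipping zero-probability outcomes.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)   # marginal distribution of X
p_y = p_xy.sum(axis=0)   # marginal distribution of Y

H_X = entropy(p_x)
H_Y = entropy(p_y)
H_XY = entropy(p_xy.ravel())   # joint entropy H(X,Y)

# Conditional entropies computed directly as the p(y)-weighted entropies of p(x|y), and vice versa.
H_X_given_Y = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i, :] / p_x[i]) for i in range(p_xy.shape[0]))

I = H_X - H_X_given_Y
assert np.isclose(I, H_Y - H_Y_given_X)
assert np.isclose(I, H_X + H_Y - H_XY)
assert np.isclose(I, H_XY - H_X_given_Y - H_Y_given_X)
assert I >= -1e-12   # non-negativity, equivalently H(X) >= H(X|Y)
print(f"I(X;Y) = {I:.4f} bits")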

Intuitively, if entropy H(X) is regarded as a measure of uncertainty about a random variable, then H(X|Y) is a measure of what Y does not say about X. This is "the amount of uncertainty remaining about X after Y is known", and thus the right side of the first of these equalities can be read as "the amount of uncertainty in X, minus the amount of uncertainty in X which remains after Y is known", which is equivalent to "the amount of uncertainty in X which is removed by knowing Y". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.

Note that in the discrete case H(X|X) = 0 and therefore H(X) = I(X;X). Since I(X;Y) = H(X) − H(X|Y) ≤ H(X), it follows that I(X;X) ≥ I(X;Y), and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide.
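The identity I(X;X) = H(X) can be verified in the same spirit: the joint distribution of the pair (X, X) places all of its mass on the diagonal, so its joint entropy is just H(X). A minimal sketch, again with a made-up pmf:

import numpy as np

p_x = np.array([0.5, 0.3, 0.2])   # hypothetical pmf of X
p_xx = np.diag(p_x)               # joint pmf of the pair (X, X): all mass on the diagonal

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X = entropy(p_x)
H_XX = entropy(p_xx.ravel())      # equals H(X) because the joint is supported on the diagonal
I_XX = H_X + H_X - H_XX           # I(X;X) = H(X) + H(X) - H(X,X)
assert np.isclose(I_XX, H_X)      # I(X;X) = H(X), equivalently H(X|X) = 0
print(f"I(X;X) = {I_XX:.4f} bits = H(X)")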

Mutual information can also be expressed as a Kullback-Leibler divergence, namely the divergence of the joint distribution p(x,y) of the two random variables X and Y from the product p(x) p(y) of their marginal distributions:


\begin{align}
I(X;Y) & {} = D_{\mathrm{KL}}\bigl(p(x,y) \,\|\, p(x)\,p(y)\bigr).
\end{align}

Furthermore, let p(x|y) = p(x, y) / p(y). Then


\begin{align}
I(X;Y) & {} = \sum_y p(y) \sum_x p(x|y) \log_2 \frac{p(x|y)}{p(x)} \\
& {} = \sum_y p(y) \; D_{\mathrm{KL}}(p(x|y)\|p(x)) \\
& {} = \mathbb{E}_Y\{D_{\mathrm{KL}}(p(x|y)\|p(x))\}.
\end{align}

Thus mutual information can also be understood as the expectation, over the values of Y, of the Kullback-Leibler divergence of the conditional distribution p(x|y) of X given Y from the marginal distribution p(x) of X: the more the distributions p(x|y) differ from p(x) on average, the greater the information gain.
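To make both Kullback-Leibler readings concrete, the sketch below (using the same illustrative joint pmf as above) computes I(X;Y) once as D_KL(p(x,y) ‖ p(x)p(y)) and once as the expectation over Y of D_KL(p(x|y) ‖ p(x)), and confirms that the two agree.

import numpy as np

# Hypothetical joint pmf p(x, y), for illustration only.
p_xy = np.array([[0.10, 0.20, 0.05],
                 [0.25, 0.15, 0.25]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def kl(p, q):
    # D_KL(p || q) in bits; assumes q > 0 wherever p > 0.
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# I(X;Y) as the divergence of the joint from the product of the marginals.
I_product_form = kl(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# I(X;Y) as the Y-expectation of D_KL(p(x|y) || p(x)).
I_expected_form = sum(p_y[j] * kl(p_xy[:, j] / p_y[j], p_x) for j in range(len(p_y)))

assert np.isclose(I_product_form, I_expected_form)
print(f"I(X;Y) = {I_product_form:.4f} bits")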
