Fisher Information Matrix
Let a random variable X have a probability density f(x; α). The partial derivative of the log likelihood function with respect to the (unknown, and to be estimated) parameter α is called the score. The second moment of the score is called the Fisher information:

$$\mathcal{I}(\alpha) = \operatorname{E}\!\left[\left(\frac{\partial}{\partial \alpha} \ln f(X;\alpha)\right)^{2}\right].$$
Since the expectation of the score is zero, the Fisher information is also the second moment of the score centered around its mean: the variance of the score.
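As a quick numerical check of these two facts, here is a minimal Python sketch. The exponential model, sample size, and variable names are our own illustration, not from the source; for the exponential density f(x; α) = α·exp(−αx), the score is 1/α − x and the Fisher information is 1/α².

```python
import numpy as np

# Exponential model f(x; alpha) = alpha * exp(-alpha * x); the score is
# d/d(alpha) log f = 1/alpha - x, and the Fisher information is 1/alpha^2.
alpha = 2.0
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0 / alpha, size=1_000_000)  # numpy uses scale = 1/rate

score = 1.0 / alpha - x
print(score.mean())   # close to 0: the score has zero expectation
print(score.var())    # close to 1/alpha^2 = 0.25: the variance of the score = I(alpha)
```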
If the log likelihood function is twice differentiable with respect to the parameter α, then under certain regularity conditions the Fisher information may also be written as follows (often a more convenient form for calculation):

$$\mathcal{I}(\alpha) = -\operatorname{E}\!\left[\frac{\partial^{2}}{\partial \alpha^{2}} \ln f(X;\alpha)\right].$$
Thus, the Fisher information is the negative of the expectation of the second derivative of the log likelihood function with respect to the parameter α. Fisher information is therefore a measure of the curvature of the log likelihood function of α. A flat log likelihood curve, with low curvature (and therefore a large radius of curvature), carries low Fisher information; a sharply peaked log likelihood curve, with large curvature (and therefore a small radius of curvature), carries high Fisher information. When the Fisher information matrix is evaluated at the estimates of the parameters (giving "the observed Fisher information matrix"), this is equivalent to replacing the true log likelihood surface by a Taylor series approximation taken as far as the quadratic terms. The word information, in the context of Fisher information, refers to information about the parameters: it bears on estimation, sufficiency, and the variances of estimators.
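As an illustration of the observed Fisher information, here is a minimal Python sketch (the Bernoulli model, sample size, and function names are our own assumptions, not from the source) that approximates it as the negative second derivative of the log likelihood at the maximum likelihood estimate, via a central finite difference:

```python
import numpy as np

def loglik(alpha, x):
    # Bernoulli log likelihood: sum_i [x_i log(alpha) + (1 - x_i) log(1 - alpha)]
    return np.sum(x * np.log(alpha) + (1 - x) * np.log(1 - alpha))

def observed_fisher_information(alpha_hat, x, h=1e-5):
    # Negative central-difference second derivative of the log likelihood at alpha_hat
    d2 = (loglik(alpha_hat + h, x) - 2 * loglik(alpha_hat, x)
          + loglik(alpha_hat - h, x)) / h**2
    return -d2

rng = np.random.default_rng(0)
x = rng.binomial(1, 0.3, size=1000)
alpha_hat = x.mean()  # the maximum likelihood estimate of the Bernoulli parameter
print(observed_fisher_information(alpha_hat, x))
# For a Bernoulli sample, the observed information at the MLE equals
# n / (alpha_hat * (1 - alpha_hat)), so the value is easy to verify.
```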
The Cramér–Rao bound states that the inverse of the Fisher information is a lower bound on the variance of any unbiased estimator of the parameter α:

$$\operatorname{var}(\hat{\alpha}) \ge \frac{1}{\mathcal{I}(\alpha)}.$$

The precision with which one can estimate the parameter α is thus limited by the Fisher information of the log likelihood function. The Fisher information is a measure of the minimum error involved in estimating a parameter of a distribution, and it can be viewed as a measure of the resolving power of an experiment needed to discriminate between two alternative hypotheses about a parameter.
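To see the bound in action, the following sketch (again a Bernoulli model of our own choosing) compares the Monte Carlo variance of the maximum likelihood estimator with the Cramér–Rao lower bound 1/(n·I(α)) for n iid observations:

```python
import numpy as np

alpha, n, trials = 0.3, 200, 20_000
rng = np.random.default_rng(1)

# The MLE of the Bernoulli parameter from each simulated sample is the sample mean
estimates = rng.binomial(n, alpha, size=trials) / n

fisher_per_obs = 1.0 / (alpha * (1.0 - alpha))  # I(alpha) for a single observation
crb = 1.0 / (n * fisher_per_obs)                # Cramér–Rao lower bound for n observations

print(f"empirical variance of the MLE: {estimates.var():.6f}")
print(f"Cramér–Rao lower bound:        {crb:.6f}")
# The sample mean attains the bound here: both are about alpha*(1-alpha)/n = 0.00105.
```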
When there are N parameters, the Fisher information takes the form of an N × N positive semidefinite symmetric matrix, the Fisher Information Matrix, with typical element:

$$\mathcal{I}_{i,j}(\boldsymbol{\alpha}) = \operatorname{E}\!\left[\frac{\partial}{\partial \alpha_i} \ln f(X;\boldsymbol{\alpha}) \, \frac{\partial}{\partial \alpha_j} \ln f(X;\boldsymbol{\alpha})\right].$$
Under certain regularity conditions, the Fisher Information Matrix may also be written in the following form, which is often more convenient for computation:

$$\mathcal{I}_{i,j}(\boldsymbol{\alpha}) = -\operatorname{E}\!\left[\frac{\partial^{2}}{\partial \alpha_i \, \partial \alpha_j} \ln f(X;\boldsymbol{\alpha})\right].$$
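As a two-parameter illustration (our own example, not from the source: a normal model N(μ, σ²), whose analytic Fisher information matrix is diag(1/σ², 2/σ²)), the matrix can be estimated as the Monte Carlo average of the outer product of the score vector:

```python
import numpy as np

mu, sigma = 1.0, 2.0
rng = np.random.default_rng(2)
x = rng.normal(mu, sigma, size=500_000)

# Score vector: partial derivatives of log f(x; mu, sigma) for one observation
score_mu = (x - mu) / sigma**2
score_sigma = ((x - mu) ** 2 - sigma**2) / sigma**3

scores = np.stack([score_mu, score_sigma])  # shape (2, n)
fim = scores @ scores.T / x.size            # Monte Carlo estimate of E[score score^T]

print(fim)
# Analytic Fisher information matrix: [[1/sigma^2, 0], [0, 2/sigma^2]] = [[0.25, 0], [0, 0.5]]
```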
With N iid random variables X₁, …, X_N, an N-dimensional "box" can be constructed with sides X_i. Costa and Cover show that the (Shannon) differential entropy h(X) is related to the volume of the typical set (having the sample entropy close to the true entropy), while the Fisher information is related to the surface of this typical set.