Multivariate Normal Distribution - Multivariate Normality Tests

Multivariate normality tests check a given set of data for similarity to the multivariate normal distribution. The null hypothesis is that the data set was drawn from a multivariate normal distribution; a sufficiently small p-value therefore indicates non-normal data. Multivariate normality tests include the Cox-Small test and Smith and Jain's adaptation of the Friedman-Rafsky test.

Mardia's test is based on multivariate extensions of skewness and kurtosis measures. For a sample {x₁, ..., x_n} of k-dimensional vectors we compute

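$\begin{align} & \hat{\boldsymbol\Sigma} = \frac{1}{n} \sum_{j=1}^n (\mathbf{x}_j - \bar{\mathbf{x}})(\mathbf{x}_j - \bar{\mathbf{x}})' \\ & A = \frac{1}{6n} \sum_{i=1}^n \sum_{j=1}^n \Big[ (\mathbf{x}_i - \bar{\mathbf{x}})'\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_j - \bar{\mathbf{x}}) \Big]^3 \\ & B = \frac{\sqrt{n}}{\sqrt{8k(k+2)}}\bigg[ \frac{1}{n} \sum_{i=1}^n \Big[ (\mathbf{x}_i - \bar{\mathbf{x}})'\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}) \Big]^2 - k(k+2) \bigg] \end{align}$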

Under the null hypothesis of multivariate normality, the statistic A has approximately a chi-squared distribution with k(k + 1)(k + 2)/6 degrees of freedom, and B is approximately standard normal N(0,1).
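The two statistics and their asymptotic p-values can be sketched in a few lines of NumPy and SciPy. This is a minimal illustration rather than a reference implementation: the function name `mardia_test` is hypothetical, the kurtosis p-value is assumed to be two-sided, and the asymptotic approximations are only reasonable for fairly large samples.

```python
import numpy as np
from scipy import stats


def mardia_test(x):
    """Return (A, p_A, B, p_B) for an (n, k) array of observations.

    Illustrative sketch: p-values use the asymptotic chi-squared and
    standard normal approximations only.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    centered = x - x.mean(axis=0)
    # Biased (1/n) covariance estimate, matching the definition of Sigma-hat.
    sigma = centered.T @ centered / n
    # d[i, j] = (x_i - xbar)' Sigma^{-1} (x_j - xbar)
    d = centered @ np.linalg.solve(sigma, centered.T)
    # Skewness statistic A: sum of cubed entries, scaled by 1/(6n).
    a = (d ** 3).sum() / (6 * n)
    # Kurtosis statistic B: centred and scaled mean of squared diagonal entries.
    b = np.sqrt(n / (8 * k * (k + 2))) * ((np.diag(d) ** 2).mean() - k * (k + 2))
    dof = k * (k + 1) * (k + 2) / 6
    p_a = stats.chi2.sf(a, dof)        # large A is evidence against normality
    p_b = 2 * stats.norm.sf(abs(b))    # assumption: two-sided kurtosis test
    return a, p_a, b, p_b


# Usage: a sample that really is multivariate normal.
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0.0, 1.0], [[2.0, 0.5], [0.5, 1.0]], size=300)
a, p_a, b, p_b = mardia_test(sample)
print(f"A = {a:.3f} (p = {p_a:.3f}), B = {b:.3f} (p = {p_b:.3f})")
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the covariance estimate explicitly, which tends to be more stable numerically when the estimate is poorly conditioned.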

Mardia's kurtosis statistic is skewed and converges very slowly to the limiting normal distribution. For medium-sized samples, the parameters of the asymptotic distribution of the kurtosis statistic are modified. For small-sample tests, empirical critical values are used. Tables of critical values for both statistics are given by Rencher for k = 2, 3, 4.

Mardia's tests are affine invariant but not consistent. For example, the multivariate skewness test is not consistent against symmetric non-normal alternatives.
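Affine invariance means that the statistics do not change when the data are mapped through x ↦ Mx + c with M nonsingular. The following sketch checks this numerically; the helper `mardia_statistics`, the matrix `m`, and the shift `c` are arbitrary illustrative choices.

```python
import numpy as np


def mardia_statistics(x):
    """Return Mardia's (A, B) for an (n, k) array (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    centered = x - x.mean(axis=0)
    sigma = centered.T @ centered / n
    # d[i, j] = (x_i - xbar)' Sigma^{-1} (x_j - xbar)
    d = centered @ np.linalg.solve(sigma, centered.T)
    a = (d ** 3).sum() / (6 * n)
    b = np.sqrt(n / (8 * k * (k + 2))) * ((np.diag(d) ** 2).mean() - k * (k + 2))
    return a, b


rng = np.random.default_rng(1)
x = rng.standard_normal((200, 3))

# Arbitrary nonsingular matrix (determinant 5) and shift vector.
m = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
c = np.array([5.0, -2.0, 0.5])
y = x @ m.T + c  # each row is M x_i + c

# The two pairs of statistics agree up to floating-point rounding.
print(mardia_statistics(x))
print(mardia_statistics(y))
```

The invariance holds because the matrix of products (x_i − x̄)′Σ̂⁻¹(x_j − x̄) is itself unchanged by the transformation: the factors of M cancel against the transformed covariance estimate.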

The BHEP test computes the norm of the difference between the empirical characteristic function and the theoretical characteristic function of the normal distribution. The norm is calculated in the L2(μ) space of square-integrable functions with respect to the Gaussian weighting function $\mu_\beta(\mathbf{t}) = (2\pi\beta^2)^{-k/2} e^{-|\mathbf{t}|^2/(2\beta^2)}$. The test statistic is

$\begin{align} T_\beta &= \int_{\mathbb{R}^k} \Big| {\textstyle \frac1n \sum_{j=1}^n e^{i\mathbf{t}'\hat{\boldsymbol\Sigma}^{-1/2}(\mathbf{x}_j - \bar{\mathbf{x}})}} - e^{-|\mathbf{t}|^2/2} \Big|^2 \; \mu_\beta(\mathbf{t}) \, d\mathbf{t} \\ &= \frac{1}{n^2} \sum_{i,j=1}^n e^{-\frac{\beta^2}{2}(\mathbf{x}_i-\mathbf{x}_j)'\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i-\mathbf{x}_j)} - \frac{2}{(1 + \beta^2)^{k/2}} \frac{1}{n} \sum_{i=1}^n e^{ -\frac{\beta^2}{2(1+\beta^2)} (\mathbf{x}_i-\bar{\mathbf{x}})'\hat{\boldsymbol\Sigma}^{-1}(\mathbf{x}_i-\bar{\mathbf{x}})} + \frac{1}{(1 + 2\beta^2)^{k/2}} \end{align}$

The limiting distribution of this test statistic is a weighted sum of chi-squared random variables; in practice, however, it is more convenient to compute the sample quantiles using Monte Carlo simulations.
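A Monte Carlo version of the test might look like the sketch below. It makes several assumptions that should be checked against the intended definition: the weight μ_β is taken to be the N(0, β²I) density, so the first exponent carries β²/2; the statistic is treated as affine invariant, so its null distribution is simulated from standard normal samples of the same size; and the names `bhep_statistic` and `bhep_pvalue`, the default β = 1, and the number of simulations are illustrative choices.

```python
import numpy as np


def bhep_statistic(x, beta=1.0):
    """Closed-form BHEP statistic T_beta for an (n, k) array (sketch)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    centered = x - x.mean(axis=0)
    sigma = centered.T @ centered / n  # biased (1/n) covariance estimate
    # m[i, j] = (x_i - xbar)' Sigma^{-1} (x_j - xbar)
    m = centered @ np.linalg.solve(sigma, centered.T)
    norms = np.diag(m)
    # Squared Mahalanobis distance between x_i and x_j.
    dist = norms[:, None] + norms[None, :] - 2 * m
    b2 = beta ** 2
    term1 = np.exp(-0.5 * b2 * dist).sum() / n ** 2
    term2 = 2 * (1 + b2) ** (-k / 2) * np.exp(-b2 / (2 * (1 + b2)) * norms).mean()
    term3 = (1 + 2 * b2) ** (-k / 2)
    return term1 - term2 + term3


def bhep_pvalue(x, beta=1.0, n_sim=200, seed=0):
    """Return (T_beta, Monte Carlo p-value); large T_beta rejects normality."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    rng = np.random.default_rng(seed)
    observed = bhep_statistic(x, beta)
    # Simulate the null distribution from N(0, I_k) samples of the same size
    # (relies on the assumed affine invariance of the statistic).
    sims = np.array(
        [bhep_statistic(rng.standard_normal((n, k)), beta) for _ in range(n_sim)]
    )
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(sims >= observed)) / (n_sim + 1)
    return observed, p_value


# Usage: a clearly non-normal (exponential) sample.
data = np.random.default_rng(42).exponential(size=(100, 2))
t_obs, p = bhep_pvalue(data)
print(f"T_beta = {t_obs:.5f}, Monte Carlo p-value = {p:.4f}")
```

The resolution of the p-value is limited by the number of simulated samples, so `n_sim` should be increased when small significance levels matter.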

A detailed survey of these and other test procedures is available.
