Principal Component Analysis - Discussion

Discussion

Mean subtraction (also called "mean centering") is necessary in PCA to ensure that the first principal component describes the direction of maximum variance. Without it, the first principal component may instead point roughly toward the mean of the data. A zero mean is also needed to find a basis that minimizes the mean squared error of the approximation of the data.
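
The effect of skipping mean subtraction can be illustrated with a small NumPy sketch. Everything in it is an assumption made for illustration: the synthetic data set, the convention that each column of X is one observation, and the helper name leading_direction. The data are offset from the origin along the first axis but vary mostly along the second axis.

```python
import numpy as np

def leading_direction(X):
    """Unit vector w maximizing the average of (w^T x)^2 over the columns x of X."""
    # The maximizer is the leading left singular vector of X.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, 0]

rng = np.random.default_rng(0)
# Synthetic data (hypothetical): 2 variables x 500 observations, offset far
# from the origin along the first axis, with most spread along the second.
offset = np.array([[10.0], [0.0]])
spread = np.array([[0.1], [1.0]])
X = offset + spread * rng.standard_normal((2, 500))

w_raw = leading_direction(X)                                   # no centering
w_centered = leading_direction(X - X.mean(axis=1, keepdims=True))

print("without centering:", np.round(np.abs(w_raw), 3))
print("with centering:   ", np.round(np.abs(w_centered), 3))
```

In this setup the uncentered direction lines up with the offset (the mean), while the centered direction lines up with the axis of largest spread.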

Assuming zero empirical mean (that is, the empirical mean of the distribution has been subtracted from the data set), the first principal component w1 of a data set X can be defined as:

\mathbf{w}_1 = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,\operatorname{Var}\{ \mathbf{w}^{\rm T} \mathbf{X} \} = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,E\left\{ \left( \mathbf{w}^{\rm T} \mathbf{X}\right)^2 \right\}

(See arg max for the notation.) Given the first k − 1 components, the kth component can be found by subtracting the first k − 1 principal components from X:

\mathbf{\hat{X}}_{k - 1} = \mathbf{X} - \sum_{i = 1}^{k - 1} \mathbf{w}_i \mathbf{w}_i^{\rm T} \mathbf{X}

and then treating this residual as the new data set in which to find the leading principal component:

\mathbf{w}_k = \underset{\Vert \mathbf{w} \Vert = 1}{\operatorname{arg\,max}}\,E\left\{ \left( \mathbf{w}^{\rm T} \mathbf{\hat{X}}_{k - 1} \right)^2 \right\}.
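
The deflation procedure above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a recommended implementation: expectations are replaced by averages over the columns of X (one observation per column), each arg max is solved by taking the top eigenvector of X̂ X̂ᵀ, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def leading_direction(X):
    """Unit w maximizing the average of (w^T x)^2 over the columns x of X."""
    # Equivalent to the eigenvector of X X^T with the largest eigenvalue;
    # eigh returns eigenvalues in ascending order, so take the last column.
    _, vecs = np.linalg.eigh(X @ X.T)
    return vecs[:, -1]

def pca_by_deflation(X, k):
    """Return a (d, k) matrix whose columns are w_1, ..., w_k."""
    Xc = X - X.mean(axis=1, keepdims=True)    # enforce zero empirical mean
    W = np.zeros((X.shape[0], 0))             # no components found yet
    for _ in range(k):
        X_hat = Xc - W @ (W.T @ Xc)           # X minus sum_i w_i w_i^T X
        w = leading_direction(X_hat)          # arg max on the residual
        W = np.column_stack([W, w])
    return W

# Hypothetical data: 4 variables with distinct variances, randomly rotated.
rng = np.random.default_rng(1)
scales = np.array([[3.0], [2.0], [1.0], [0.5]])
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
X = Q @ (scales * rng.standard_normal((4, 300)))

W = pca_by_deflation(X, k=3)
print(np.round(W.T @ W, 6))   # Gram matrix of the components found
```

In practice all components are usually obtained at once from a single eigendecomposition or SVD of the centered data; the loop is written out here only to mirror the equations.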

PCA is equivalent to empirical orthogonal functions (EOF), a name which is used in meteorology.

An autoencoder neural network with a linear hidden layer is similar to PCA. Upon convergence, the weight vectors of the K neurons in the hidden layer will form a basis for the space spanned by the first K principal components. Unlike PCA, this technique will not necessarily produce orthogonal vectors.
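
One way to see why orthogonality is not forced is that any basis of the principal subspace, orthogonal or not, gives the same linear reconstruction. The sketch below does not train an autoencoder; it only checks this subspace property with NumPy. The data, the mixing matrix A, and the reading of B as decoder weights with pinv(B) as the matching encoder are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 3 variables x 400 observations (columns).
X = np.array([[3.0], [2.0], [0.5]]) * rng.standard_normal((3, 400))
Xc = X - X.mean(axis=1, keepdims=True)

# Top K = 2 principal components as orthonormal columns of W.
_, vecs = np.linalg.eigh(Xc @ Xc.T)
W = vecs[:, ::-1][:, :2]

# A different, non-orthogonal basis B for the same subspace.
A = np.array([[1.0, 0.5],
              [0.0, 1.0]])        # arbitrary invertible mixing matrix
B = W @ A

recon_pca = W @ (W.T @ Xc)                     # PCA reconstruction
recon_mixed = B @ (np.linalg.pinv(B) @ Xc)     # least-squares encoder for B

same_reconstruction = np.allclose(recon_pca, recon_mixed)
gram = B.T @ B    # nonzero off-diagonal entries: columns are not orthogonal
print(same_reconstruction)
print(np.round(gram, 3))
```

Because both bases yield the same reconstruction error, a training procedure that minimizes that error has no reason to prefer the orthogonal one.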

PCA is a popular primary technique in pattern recognition. It is not, however, optimized for class separability. An alternative is linear discriminant analysis, which does take class separability into account.
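
The difference can be made concrete with a two-class toy example. The sketch below is illustrative only: the data are synthetic, and the discriminant direction is computed with the two-class Fisher formula w ∝ S_w⁻¹(m₁ − m₀), which is one simple form of linear discriminant analysis. The classes are separated along the second axis but have most of their variance along the first.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
noise_scale = np.array([[3.0], [0.2]])   # wide along axis 0, narrow along axis 1
X0 = np.array([[0.0], [-1.0]]) + noise_scale * rng.standard_normal((2, n))
X1 = np.array([[0.0], [1.0]]) + noise_scale * rng.standard_normal((2, n))
X = np.hstack([X0, X1])

# PCA: direction of maximum variance of the pooled, centered data.
Xc = X - X.mean(axis=1, keepdims=True)
_, vecs = np.linalg.eigh(Xc @ Xc.T)
w_pca = vecs[:, -1]

# Two-class Fisher discriminant: solve S_w w = m1 - m0, then normalize.
m0 = X0.mean(axis=1)
m1 = X1.mean(axis=1)
D0 = X0 - m0[:, None]
D1 = X1 - m1[:, None]
Sw = D0 @ D0.T + D1 @ D1.T               # within-class scatter matrix
w_lda = np.linalg.solve(Sw, m1 - m0)
w_lda /= np.linalg.norm(w_lda)

print("PCA direction:", np.round(np.abs(w_pca), 3))
print("LDA direction:", np.round(np.abs(w_lda), 3))
```

Here the PCA direction follows the axis of largest spread, which carries almost no class information, while the discriminant direction follows the axis along which the class means differ.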
