Details
PCA is mathematically defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.
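As a rough numerical check of this defining property, the following Python/NumPy sketch (the names rng, data and pc1 are purely illustrative) compares the variance of the centred data projected onto the first principal component with the variance along randomly chosen unit directions:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3)) @ np.array([[3.0, 0.0, 0.0],
                                             [1.0, 1.0, 0.0],
                                             [0.5, 0.2, 0.1]])
data -= data.mean(axis=0)                      # zero empirical mean

# First principal component = first left singular vector of X (columns = observations)
W, s, Vt = np.linalg.svd(data.T, full_matrices=False)
pc1 = W[:, 0]

var_pc1 = np.var(data @ pc1)
for _ in range(1000):                          # compare against random unit directions
    u = rng.normal(size=3)
    u /= np.linalg.norm(u)
    assert np.var(data @ u) <= var_pc1 + 1e-12
print("first PC captures the largest projected variance:", var_pc1)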
Define a data matrix, X^T, with zero empirical mean (the empirical (sample) mean of the distribution has been subtracted from the data set), where each of the n rows represents a different repetition of the experiment, and each of the m columns gives a particular kind of datum (say, the results from a particular probe). (Note that X^T is defined here and not X itself, and what we are calling X^T is often alternatively denoted as X itself.) The singular value decomposition of X is X = WΣV^T, where the m × m matrix W is the matrix of eigenvectors of the covariance matrix XX^T, the matrix Σ is an m × n rectangular diagonal matrix with nonnegative real numbers on the diagonal, and the n × n matrix V is the matrix of eigenvectors of X^TX. The PCA transformation that preserves dimensionality (that is, gives the same number of principal components as original variables) is then given by:

Y^T = X^T W = VΣ^T W^T W = VΣ^T
V is not uniquely defined in the usual case when m < n − 1, but Y will usually still be uniquely defined. Since W (by definition of the SVD of a real matrix) is an orthogonal matrix, each row of Y^T is simply a rotation of the corresponding row of X^T. The first column of Y^T is made up of the "scores" of the cases with respect to the first principal component, the next column has the scores with respect to the second principal component, and so on.
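This transformation can be sketched numerically. In the following Python/NumPy illustration (the names Xt, W, s, Vt and Yt are ours, and the data are synthetic), Xt plays the role of X^T with n rows of observations and m columns of variables; the sketch checks that Y^T = X^T W coincides with VΣ^T, that each row of Y^T has the same length as the corresponding row of X^T, and that the score variances decrease from column to column:

import numpy as np

rng = np.random.default_rng(1)
n, m = 200, 4
Xt = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))
Xt -= Xt.mean(axis=0)                    # subtract the empirical mean
X = Xt.T                                 # m x n, columns = observations

W, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = W diag(s) Vt
Yt = Xt @ W                              # Y^T = X^T W: each row is a rotated observation

# Equivalent form Y^T = V Sigma^T (the rows of Y^T are the "scores")
assert np.allclose(Yt, Vt.T * s)         # broadcasting multiplies column j by s[j]

# Rotation by the orthogonal matrix W preserves each observation's length
assert np.allclose(np.linalg.norm(Yt, axis=1), np.linalg.norm(Xt, axis=1))
print("score variances (descending):", Yt.var(axis=0, ddof=1))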
If we want a reduced-dimensionality representation, we can project X down into the reduced space defined by only the first L singular vectors, W_L:

Y = W_L^T X = Σ_L V^T

where Σ_L = I_{L×m} Σ, with I_{L×m} the L × m rectangular identity matrix.
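A minimal NumPy sketch of this truncated projection, reusing the conventions of the previous snippet (Xt is the n × m zero-mean data matrix; L, WL and YL are illustrative names), keeps only the first L singular vectors and confirms that X^T W_L equals the first L columns of VΣ^T:

import numpy as np

rng = np.random.default_rng(2)
n, m, L = 200, 5, 2
Xt = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))
Xt -= Xt.mean(axis=0)

W, s, Vt = np.linalg.svd(Xt.T, full_matrices=False)
WL = W[:, :L]                        # first L singular vectors (m x L)
YL = Xt @ WL                         # n x L reduced representation: Y^T = X^T W_L

# Equivalently Y = W_L^T X = Sigma_L V^T: keep only the first L rows of Sigma V^T
assert np.allclose(YL, (Vt.T * s)[:, :L])
print("kept", L, "of", m, "dimensions; retained variance fraction:",
      (s[:L] ** 2).sum() / (s ** 2).sum())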
The matrix W of singular vectors of X is equivalently the matrix of eigenvectors of the matrix of observed covariances C = XX^T, since

XX^T = WΣΣ^T W^T
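This equivalence can be checked numerically; the following NumPy sketch (illustrative names and synthetic data, same conventions as above) compares the eigendecomposition of C = XX^T with the singular value decomposition of X:

import numpy as np

rng = np.random.default_rng(3)
n, m = 300, 4
Xt = rng.normal(size=(n, m)) @ rng.normal(size=(m, m))
Xt -= Xt.mean(axis=0)
X = Xt.T

W, s, Vt = np.linalg.svd(X, full_matrices=False)
C = X @ X.T                                   # m x m matrix of observed covariances (unnormalised)

# Eigenvalues of C are the squared singular values; eigenvectors match the
# columns of W up to sign and ordering (generic case of distinct eigenvalues).
evals, evecs = np.linalg.eigh(C)              # ascending order
assert np.allclose(evals[::-1], s ** 2)
assert np.allclose(np.abs(np.sum(evecs[:, ::-1] * W, axis=0)), 1.0)
assert np.allclose(C, W @ np.diag(s ** 2) @ W.T)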
Given a set of points in Euclidean space, the first principal component corresponds to a line that passes through the multidimensional mean and minimizes the sum of squares of the distances of the points from the line. The second principal component corresponds to the same concept after all correlation with the first principal component has been subtracted from the points. The singular values (in Σ) are the square roots of the eigenvalues of the matrix XX^T. Each eigenvalue is proportional to the portion of the "variance" (more correctly, of the sum of the squared distances of the points from their multidimensional mean) that is associated with the corresponding eigenvector. The sum of all the eigenvalues is equal to the sum of the squared distances of the points from their multidimensional mean. PCA essentially rotates the set of points around their mean in order to align with the principal components. This moves as much of the variance as possible (using an orthogonal transformation) into the first few dimensions. The values in the remaining dimensions therefore tend to be small and may be dropped with minimal loss of information. PCA is often used in this manner for dimensionality reduction. PCA has the distinction of being the optimal orthogonal transformation for keeping the subspace that has the largest "variance" (as defined above). This advantage, however, comes at the price of greater computational requirements when compared, where applicable, to the discrete cosine transform, and in particular to the DCT-II, which is simply known as the "DCT" (introduced by N. Ahmed, T. Natarajan and K. R. Rao in 1974; see Reference 1 of discrete cosine transform). Nonlinear dimensionality reduction techniques tend to be more computationally demanding than PCA.
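Two of these statements, namely that the singular values are the square roots of the eigenvalues of XX^T and that the eigenvalues sum to the total squared distance of the points from their mean, can be illustrated with a short NumPy sketch (variable names and data are illustrative):

import numpy as np

rng = np.random.default_rng(4)
pts = rng.normal(size=(1000, 6)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.2, 0.1])
centred = pts - pts.mean(axis=0)

s = np.linalg.svd(centred.T, compute_uv=False)      # singular values of X
eigvals = s ** 2                                     # eigenvalues of X X^T

total_sq_dist = (centred ** 2).sum()                 # sum of squared distances from the mean
assert np.isclose(eigvals.sum(), total_sq_dist)

print("fraction of total captured by first 2 of 6 components:",
      eigvals[:2].sum() / eigvals.sum())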
PCA is sensitive to the scaling of the variables. If we have just two variables and they have the same sample variance and are positively correlated, then the PCA will entail a rotation by 45° and the "loadings" for the two variables with respect to the first principal component will be equal. But if we multiply all values of the first variable by 100, then the first principal component will be almost the same as that variable, with a small contribution from the other variable, whereas the second component will be almost aligned with the second original variable. This means that whenever the different variables have different units (like temperature and mass), PCA is a somewhat arbitrary method of analysis. (Different results would be obtained if one used Fahrenheit rather than Celsius, for example.) Note that Pearson's original paper was entitled "On Lines and Planes of Closest Fit to Systems of Points in Space" – "in space" implies physical Euclidean space, where such concerns do not arise. One way of making the PCA less arbitrary is to use variables scaled so as to have unit variance.
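The effect of rescaling can be illustrated with a short NumPy sketch (illustrative data and names): two positively correlated variables with equal variance give roughly equal loadings on the first component, multiplying the first variable by 100 makes the first component align almost entirely with it, and standardising to unit variance restores the balanced loadings:

import numpy as np

rng = np.random.default_rng(5)
z = rng.normal(size=(2000, 2))
x = np.column_stack([z[:, 0], 0.6 * z[:, 0] + 0.8 * z[:, 1]])   # equal variances, correlated

def first_loadings(data):
    centred = data - data.mean(axis=0)
    W, _, _ = np.linalg.svd(centred.T, full_matrices=False)
    return W[:, 0]

print("equal scales:      ", first_loadings(x))          # roughly (0.707, 0.707): a 45 degree rotation
x_scaled = x * np.array([100.0, 1.0])
print("first column x100: ", first_loadings(x_scaled))    # roughly (1, 0): dominated by variable 1
x_std = x_scaled / x_scaled.std(axis=0)
print("unit variance:     ", first_loadings(x_std))       # back to roughly equal loadings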