ML&PR_2: Principal Component Analysis: Maximum variance formulation | 12.1.1.
12.1. Principal Component Analysis (PCA)
- PCA is a technique widely used for dimensionality reduction, lossy data compression, feature extraction, and data visualization.
12.1.1. Maximum variance formulation
Consider a data set $\{x_n\}$, where $n = 1, \dots, N$ and $x_n$ is a $D$-dimensional Euclidean variable. Our goal is to project the data onto a space having dimensionality $M < D$ while maximizing the variance of the projected data. We assume the value of $M$ is given; determining a suitable value of $M$ from the data is a separate task.
To begin with, consider the projection onto a one-dimensional space ($M = 1$). We define the direction of this space using a $D$-dimensional vector $u_1$, which we choose to be a unit vector so that $u_1^T u_1 = 1$. Each data point $x_n$ is then projected onto a scalar value $u_1^T x_n$.
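As a concrete illustration, here is a minimal NumPy sketch of this projection step (the data matrix `X` and the direction `u1` are made-up examples, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # made-up data: N = 100 points, D = 3

u1 = np.array([1.0, 2.0, 2.0])
u1 /= np.linalg.norm(u1)        # enforce the unit-norm constraint u1^T u1 = 1

z = X @ u1                      # scalar projections u1^T x_n, one per data point
print(z.shape)                  # (100,)
```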
The mean of the projected data is $u_1^T \bar{x}$, where $\bar{x}$ is the mean of the sample data set:

$$\bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n$$
and the variance of the projected data is given by:

$$\frac{1}{N} \sum_{n=1}^{N} \left( u_1^T x_n - u_1^T \bar{x} \right)^2 = u_1^T S u_1$$
where $S$ is the sample data covariance matrix:

$$S = \frac{1}{N} \sum_{n=1}^{N} (x_n - \bar{x})(x_n - \bar{x})^T$$
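To make this identity concrete, the following sketch (variable names are illustrative, not from the text) checks numerically that the variance of the projected scalars equals $u_1^T S u_1$, using the $1/N$ normalization written above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                 # made-up data: N = 100, D = 3
u1 = np.array([1.0, 2.0, 2.0])
u1 /= np.linalg.norm(u1)                      # unit vector

x_bar = X.mean(axis=0)                        # sample mean
S = (X - x_bar).T @ (X - x_bar) / len(X)      # covariance with 1/N normalization

z = X @ u1                                    # projected data u1^T x_n
var_direct = np.mean((z - u1 @ x_bar) ** 2)   # (1/N) sum (u1^T x_n - u1^T x_bar)^2
var_quadratic = u1 @ S @ u1                   # u1^T S u1

print(np.isclose(var_direct, var_quadratic))  # True
```

Note that `np.cov` defaults to the $1/(N-1)$ normalization, so the covariance is computed by hand here to match the formula above.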
Now we maximize the projected variance $u_1^T S u_1$ with respect to $u_1$. This must be a constrained maximization, because otherwise letting $\|u_1\| \to \infty$ would make the variance arbitrarily large. As mentioned, we impose the constraint $u_1^T u_1 = 1$. To enforce it, we use the Lagrange multiplier method and perform an unconstrained maximization of:

$$u_1^T S u_1 + \lambda_1 \left( 1 - u_1^T u_1 \right)$$
Now we set the derivative with respect to $u_1$ equal to zero, which gives the stationary condition:

$$S u_1 = \lambda_1 u_1$$
This means $u_1$ must be an eigenvector of $S$, and $\lambda_1$ is an eigenvalue of $S$.
Left-multiplying by $u_1^T$ and using $u_1^T u_1 = 1$, we have $u_1^T S u_1 = \lambda_1$, so the projected variance equals the eigenvalue. Since we want to maximize the variance, $\lambda_1$ must be the largest eigenvalue of $S$. The corresponding eigenvector $u_1$ is known as the first principal component.
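As a quick numerical check of this result (again a sketch with made-up, anisotropic data): the projected variance along the top eigenvector of $S$ equals the largest eigenvalue, and no random unit direction exceeds it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ np.diag([3.0, 1.0, 0.5])  # stretch axes unevenly
x_bar = X.mean(axis=0)
S = (X - x_bar).T @ (X - x_bar) / len(X)

# eigh handles symmetric matrices and returns eigenvalues in ascending order
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, -1]                            # eigenvector of the largest eigenvalue

print(np.isclose(u1 @ S @ u1, eigvals[-1]))    # projected variance == lambda_1

# no random unit direction achieves a larger projected variance
dirs = rng.normal(size=(1000, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
variances = np.einsum("id,de,ie->i", dirs, S, dirs)
print(np.all(variances <= eigvals[-1] + 1e-12))  # True
```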
If we consider the general case of an $M$-dimensional projection space, the optimal linear projection for which the variance of the projected data is maximized is now defined by the $M$ eigenvectors $u_1, \dots, u_M$ of $S$ corresponding to the $M$ largest eigenvalues $\lambda_1, \dots, \lambda_M$.
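Putting the pieces together, here is a minimal PCA sketch under the definitions above (the helper `pca_max_variance` is a hypothetical name, not Bishop's code): compute $S$, take its $M$ leading eigenvectors as the projection matrix, and project the centered data:

```python
import numpy as np

def pca_max_variance(X, M):
    """Project X (N x D) onto the M directions of maximum variance."""
    x_bar = X.mean(axis=0)
    S = (X - x_bar).T @ (X - x_bar) / len(X)   # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:M]      # indices of the M largest
    U = eigvecs[:, order]                      # D x M projection matrix
    return (X - x_bar) @ U, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                  # made-up data: N = 200, D = 5
Z, variances = pca_max_variance(X, M=2)
print(Z.shape)      # (200, 2)
print(variances)    # the two largest eigenvalues of S
```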
Reference:
- Section 12.1.1 | Pattern Recognition and Machine Learning | C. M. Bishop.