ML&PR_2: Principal Component Analysis: Maximum variance formulation | 12.1.1.
12.1. Principal Component Analysis (PCA)
- PCA is a technique that is widely used for dimensionality reduction, lossy data compression, feature extraction, and data visualization.
12.1.1. Maximum variance formulation
- Consider a data set of observations $\{x_n\}$, where $n = 1, \dots, N$ and each $x_n$ is a $D$-dimensional Euclidean variable. Our goal is to project the data onto a space having dimensionality $M < D$ while maximizing the variance of the projected data. For now we assume $M$ is given; determining a suitable value of $M$ from the data is a separate task.
- To begin with, consider the projection onto a one-dimensional space ($M = 1$). We define the direction of this space by a $D$-dimensional vector $u_1$, which we choose to be a unit vector so that $u_1^T u_1 = 1$. Each data point $x_n$ is then projected onto a scalar value $u_1^T x_n$.
- The mean of the projected data is $u_1^T \bar{x}$, where $\bar{x}$ is the mean of the sample data set:
  $$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$$
  and the variance of the projected data is given by:
  $$\frac{1}{N}\sum_{n=1}^{N}\left(u_1^T x_n - u_1^T \bar{x}\right)^2 = u_1^T S u_1$$
  where $S$ is the sample data covariance matrix:
  $$S = \frac{1}{N}\sum_{n=1}^{N}(x_n - \bar{x})(x_n - \bar{x})^T$$
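- As a sanity check, the identity above can be verified numerically. Below is a minimal sketch (not from Bishop's text) using NumPy, with arbitrary synthetic data and an arbitrary unit direction $u_1$:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # N = 500 samples, D = 3

x_bar = X.mean(axis=0)                         # sample mean
S = (X - x_bar).T @ (X - x_bar) / X.shape[0]   # covariance with the 1/N convention used above

u1 = np.array([1.0, 2.0, -1.0])
u1 /= np.linalg.norm(u1)                       # enforce the unit-length constraint u1^T u1 = 1

proj = X @ u1                                  # scalar projections u1^T x_n
print(np.isclose(proj.var(), u1 @ S @ u1))     # True: projected variance equals u1^T S u1
```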
- Now we maximize the projected variance $u_1^T S u_1$ with respect to $u_1$. An unconstrained maximization could make $\|u_1\| \to \infty$. However, as we mentioned, we impose $u_1^T u_1 = 1$. To enforce this constraint, we use the Lagrange multiplier method and perform an unconstrained maximization of:
  $$u_1^T S u_1 + \lambda_1\left(1 - u_1^T u_1\right)$$
  Setting the derivative with respect to $u_1$ equal to zero gives:
  $$S u_1 = \lambda_1 u_1$$
  which means $u_1$ must be an eigenvector of $S$ and $\lambda_1$ an eigenvalue of $S$. Left-multiplying by $u_1^T$ and using $u_1^T u_1 = 1$, we have $u_1^T S u_1 = \lambda_1$. Since we want to maximize the variance, $\lambda_1$ must be the largest eigenvalue of $S$.
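- In practice $u_1$ is found by an eigendecomposition of $S$. A minimal sketch, reusing the synthetic data from the previous snippet, confirming that the maximal projected variance equals the largest eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / X.shape[0]

eigvals, eigvecs = np.linalg.eigh(S)     # eigh suits the symmetric S; eigenvalues ascend
u1, lam1 = eigvecs[:, -1], eigvals[-1]   # eigenvector paired with the largest eigenvalue

# the maximal projected variance u1^T S u1 coincides with lambda_1
print(np.isclose(u1 @ S @ u1, lam1))     # True
```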
- If we consider the general case of an $M$-dimensional projection space, the optimal linear projection for which the variance of the projected data is maximized is now defined by the $M$ eigenvectors $u_1, \dots, u_M$ of $S$ corresponding to the $M$ largest eigenvalues $\lambda_1, \dots, \lambda_M$.
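- The $M$-dimensional case follows the same recipe: take the top $M$ eigenvectors as the projection basis. A minimal sketch; the helper name `pca_project` and the synthetic data are illustrative assumptions, not Bishop's notation:

```python
import numpy as np

def pca_project(X, M):
    """Project the rows of X onto the M eigenvectors of S with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)            # centre the data
    S = Xc.T @ Xc / X.shape[0]         # sample covariance matrix
    _, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
    U = eigvecs[:, ::-1][:, :M]        # top-M eigenvectors u_1, ..., u_M as columns
    return Xc @ U                      # N x M matrix of projected coordinates

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))          # N = 500, D = 5
Z = pca_project(X, M=2)                # project onto M = 2 dimensions
print(Z.shape)                         # (500, 2)
```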
Reference:
- Section 12.1.1 | Pattern Recognition and Machine Learning | C. M. Bishop.
