Some interesting problems in PCA

1. For PCA, in order to find U and W we need to minimise the following objective function:

argmin_{U\in\mathbb{R}^{d\times r}:U^TU=I}\sum^n_{i=1}\Vert x_i-UU^Tx_i\Vert_2^2

How can we simplify the equation above such that we transform the problem into a trace maximisation problem? Show the intermediate steps and the new objective function.

Solution:

\sum^n_{i=1}\Vert x_i-UU^Tx_i\Vert^2_2=\Vert X-UU^TX\Vert^2_F

=tr((X-UU^TX)^T(X-UU^TX))=tr(X^TX-2X^TUU^TX+X^TUU^TUU^TX)

=tr(X^TX)-tr(X^TUU^TX)=\Vert X\Vert^2_F-tr(U^TXX^TU)

Here X\in\mathbb{R}^{d\times n} is the matrix whose columns are the \pmb{x}_i and \Vert\cdot\Vert_F is the Frobenius norm; we used U^TU=I to combine the last two terms into -tr(X^TUU^TX), and the cyclic property of the trace to rewrite it as -tr(U^TXX^TU).

Since \Vert X\Vert^2_F does not depend on U, minimising the expression above is equivalent to maximising the trace term. We thus turn the objective into a trace maximisation problem:
argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^TXX^TU)

=argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace\left(U^T\left(\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T\right)U\right)

If we define \Sigma=\frac{1}{n}\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T, this is equivalent (up to the constant factor \frac{1}{n}) to:
argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^T\Sigma U)
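A minimal numerical sketch of this equivalence (assuming, as above, that X is a d x n matrix whose columns are the mean-centered \pmb{x}_i, and using a hypothetical random U with orthonormal columns):

import numpy as np

rng = np.random.default_rng(0)
d, n, r = 5, 100, 2
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)            # mean-centre the columns

# Any U with orthonormal columns works for checking the identity.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

recon_error = np.sum((X - U @ U.T @ X) ** 2)                 # sum_i ||x_i - U U^T x_i||_2^2
trace_form = np.sum(X ** 2) - np.trace(U.T @ X @ X.T @ U)    # ||X||_F^2 - tr(U^T X X^T U)

assert np.isclose(recon_error, trace_form)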


2. With a mean-centered data matrix X, a key component of PCA is the spectral decomposition of a matrix M, defined as M=VDV^T, where:

M\in\mathbb{R}^{d\times d}

V is a matrix whose columns \pmb{v}_1, \pmb{v}_2, ..., \pmb{v}_d  are the eigenvectors of M

D is the diagonal matrix with D_{i,i}=\lambda_i and \lambda_i is the eigenvalue of \pmb{v}_i

Assume that obtaining the square matrix M is computationally expensive. Can you suggest an alternative method to obtain the eigenvectors of M?

Solution:

Consider the n\times n matrix K=X^TX, whose entries are K_{ij}=\langle \pmb{x}_i, \pmb{x}_j\rangle.

Suppose \pmb{v}^* is an eigenvector of K: K\pmb{v}^*=\lambda\pmb{v}^* for some \lambda\in\mathbb{R}.

Multiplying both sides of this equation on the left by \frac{1}{n}X and using the definition of K, we obtain:

\frac{1}{n}XX^TX\pmb{v}^*=\frac{1}{n}\lambda X\pmb{v}^*

Since \Sigma=\frac{1}{n}\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T=\frac{1}{n}XX^T, this gives \Sigma(X\pmb{v}^*)=\frac{\lambda}{n}(X\pmb{v}^*)

Hence \frac{X\pmb{v}^*}{\Vert X\pmb{v}^*\Vert} is an eigenvector of \Sigma with eigenvalue \frac{\lambda}{n} (provided X\pmb{v}^*\neq\pmb{0}, i.e. \lambda\neq 0)

Therefore, we can compute the PCA solution from the eigenvectors of the n\times n matrix K rather than those of the d\times d matrix \Sigma, which is cheaper whenever n<d.
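The trick can be sketched numerically as follows (assuming X is d x n with columns \pmb{x}_i and d much larger than n, so that the n x n matrix K is much cheaper to decompose than the d x d matrix \Sigma; the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 50
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)

K = X.T @ X                        # n x n Gram matrix, K_ij = <x_i, x_j>
lam, V_star = np.linalg.eigh(K)    # eigenvalues in ascending order
v_star = V_star[:, -1]             # leading eigenvector of K

u = X @ v_star
u /= np.linalg.norm(u)             # X v* / ||X v*|| is an eigenvector of Sigma

Sigma = (X @ X.T) / n
assert np.allclose(Sigma @ u, (lam[-1] / n) * u)   # corresponding eigenvalue is lambda / n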


3. In PCA, W is the compression matrix and U is the recovering matrix. The objective for finding W and U is phrased as:

argmin_{W\in\mathbb{R}^{r\times d},U\in\mathbb{R}^{d\times r}}\sum^n_{i=1}\Vert x_i-UWx_i\Vert^2_2

Prove the lemma: let (U,W) be a solution to the objective above. Then the columns of U are orthonormal (that is, U^TU is the r\times r identity matrix) and W=U^T.

Solution:

Choose any U and W and consider the mapping \pmb{x}\rightarrow UW\pmb{x}

The range of this mapping, R=\left\{ UW\pmb{x}:\pmb{x}\in\mathbb{R}^d\right\}, is an r-dimensional linear subspace of \mathbb{R}^d.

Let V\in\mathbb{R}^{d\times r} be a matrix whose columns form an orthonormal basis of this subspace (i.e. V^TV=I ).

Each vector in R can be written as V\pmb{y} , where \pmb{y}\in\mathbb{R}^r .

For every \pmb{x}\in\mathbb{R}^d and \pmb{y}\in\mathbb{R}^r we have

\Vert \pmb{x}-V\pmb{y}\Vert^2_2=\Vert\pmb{x}\Vert^2_2+\Vert\pmb{y}\Vert^2_2-2\pmb{y}^T(V^T\pmb{x}), where we used V^TV=I.

This expression is minimised (set its gradient with respect to \pmb{y} to zero) at \pmb{y}=V^T\pmb{x}.

Therefore, for each \pmb{x} we have that:

V\pmb{y}=VV^T\pmb{x}=argmin_{\tilde{\pmb{x}}\in R}\Vert\pmb{x}-\tilde{\pmb{x}}\Vert^2_2

This holds for all \pmb{x}_i and we can replace U,W by V, V^T without increasing the objective:

\sum^n_{i=1}\Vert\pmb{x}_i-UW\pmb{x}_i\Vert^2_2\geq\sum^n_{i=1}\Vert\pmb{x}_i-VV^T\pmb{x}_i\Vert^2_2

Since this holds for every pair (U,W), replacing an optimal (U,W) by (V,V^T) also yields an optimal solution; hence we may assume that the columns of U are orthonormal and that W=U^T, which proves the lemma.
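The argument can be illustrated numerically (a sketch with a hypothetical random pair (U, W); V holds an orthonormal basis of the range of UW):

import numpy as np

rng = np.random.default_rng(2)
d, n, r = 6, 200, 2
X = rng.standard_normal((d, n))

U = rng.standard_normal((d, r))    # arbitrary recovering matrix
W = rng.standard_normal((r, d))    # arbitrary compression matrix
M = U @ W                          # the mapping x -> U W x

# Orthonormal basis of the range of UW (generically of dimension r).
V = np.linalg.svd(M)[0][:, :np.linalg.matrix_rank(M)]

err_UW = np.sum((X - M @ X) ** 2)          # sum_i ||x_i - U W x_i||^2
err_V = np.sum((X - V @ V.T @ X) ** 2)     # sum_i ||x_i - V V^T x_i||^2
assert err_V <= err_UW + 1e-9              # replacing (U, W) by (V, V^T) never increases the objective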


4. Let \Sigma=VDV^T be the spectral decomposition of \Sigma, where D is a diagonal matrix such that D_{i,i} is the i-th largest eigenvalue of \Sigma, the columns of V are the corresponding eigenvectors, and V^TV=VV^T=I. Show that the solution to argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^T\Sigma U) is the matrix U whose columns are the first r eigenvectors of \Sigma.

 Solution:

Choose a matrix U\in\mathbb{R}^{d\times r} with orthonormal columns and let B=V^TU. Then, VB=VV^TU=U. It follows that

U^T\Sigma U=B^TV^TVDV^TVB=B^TDB

and therefore

trace(U^T\Sigma U)=\sum^d_{j=1}D_{j,j}\sum^r_{i=1}B_{j,i}^2.

Note that B^TB=U^TVV^TU=U^TU=I. Hence the columns of B are orthonormal and \sum^d_{j=1}\sum^r_{i=1}B_{j,i}^2=r.

Define \tilde{B}\in\mathbb{R}^{d\times d} to be a matrix whose first r columns are the columns of B and which satisfies \tilde{B}^T\tilde{B}=I (such a matrix exists because the orthonormal columns of B can be extended to an orthonormal basis of \mathbb{R}^d). Since \tilde{B} is square and orthogonal, its rows are also orthonormal, so for every j: \sum^d_{i=1}\tilde{B}^2_{j,i}=1, which implies that \sum^r_{i=1}B_{j,i}^2\leq 1.

It follows that

trace(U^T\Sigma U)\leq max_{\beta\in [0,1]^d:\Vert\beta\Vert_1\leq r}\sum^d_{j=1}D_{j,j}\beta_j=\sum^r_{j=1}D_{j,j}, where the maximum is attained by setting \beta_j=1 for the r largest diagonal entries D_{j,j}.

Therefore, for every matrix U\in\mathbb{R}^{d\times r} with orthonormal columns, the following inequality holds: trace(U^T\Sigma U)\leq\sum^r_{j=1}D_{j,j}

But if we set U to be the matrix with the r leading eigenvectors of \Sigma as its columns, we obtain trace(U^T\Sigma U)=\sum^r_{j=1}D_{j,j}, which attains the upper bound and is therefore the optimal solution.
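A quick numerical check of the result (a sketch on synthetic data; the leading eigenvectors of \Sigma should beat any random orthonormal U):

import numpy as np

rng = np.random.default_rng(3)
d, n, r = 8, 500, 3
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, V = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
U_opt = V[:, ::-1][:, :r]                # r leading eigenvectors
best = np.trace(U_opt.T @ Sigma @ U_opt)

assert np.isclose(best, np.sort(eigvals)[::-1][:r].sum())   # equals sum of r largest eigenvalues

for _ in range(100):
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))        # random orthonormal columns
    assert np.trace(U.T @ Sigma @ U) <= best + 1e-9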


5. Show that the variance captured by the first r eigenvectors of \Sigma is the sum of its r largest eigenvalues, \sum^r_{i=1}\lambda_i.

Solution:

The variance of a mean-centered dataset X is defined as

var(X)=\frac{1}{n}\sum^n_{i=1}\Vert \pmb{x}_i-\pmb{0}\Vert^2

=\frac{1}{n}\sum^n_{i=1}\Vert x_i\Vert^2=\frac{1}{n}\sum^n_{i=1}\langle x_i,x_i\rangle =tr(\Sigma)=tr(VDV^T)=tr(DV^TV)=tr(D)

=\sum^d_{i=1}D_{i,i}=\sum^d_{i=1}\lambda_i

The variance in a projected dataset WX, with W=[\pmb{v}_1,...,\pmb{v}_r]^T , is defined as

var(WX)=\frac{1}{n}\sum^n_{j=1}\Vert W\pmb{x}_j\Vert^2 =\frac{1}{n}\sum^n_{j=1}\langle  W\pmb{x}_j,W\pmb{x}_j\rangle =\frac{1}{n}\sum^n_{j=1}\pmb{x}_j^T W^TW\pmb{x}_j

=\frac{1}{n}\sum^n_{j=1}\pmb{x}_j^T(\pmb{v}_1\pmb{v}_1^T+...+\pmb{v}_r\pmb{v}_r^T)\pmb{x}_j=\sum^r_{i=1}\pmb{v}_i^T\left(\frac{1}{n}\sum^n_{j=1}\pmb{x}_j\pmb{x}_j^T\right)\pmb{v}_i=

(\pmb{v}_1^T\Sigma\pmb{v}_1+...+\pmb{v}_r^T\Sigma\pmb{v}_r)=\sum^r_{i=1}\pmb{v}_i^T\Sigma\pmb{v}_i=\sum^r_{i=1}\pmb{v}_i^T\lambda_i\pmb{v}_i=\sum^r_{i=1}\lambda_i

(The eigenvectors \pmb{v}_i are orthonormal because V^TV=I in the spectral decomposition of problem 4, so \pmb{v}_i^T\pmb{v}_i=1 in the last step.)
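Both identities can be checked numerically (a sketch under the same conventions: X is d x n and mean-centered, and W stacks the r leading eigenvectors as rows):

import numpy as np

rng = np.random.default_rng(4)
d, n, r = 6, 1000, 2
X = rng.standard_normal((d, n)) * np.arange(1, d + 1)[:, None]   # unequal variances per coordinate
X -= X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, V = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
W = V[:, ::-1][:, :r].T                   # W = [v_1, ..., v_r]^T

var_total = np.sum(X ** 2) / n            # var(X)
var_projected = np.sum((W @ X) ** 2) / n  # var(WX)

assert np.isclose(var_total, eigvals.sum())                          # sum of all eigenvalues
assert np.isclose(var_projected, np.sort(eigvals)[::-1][:r].sum())   # sum of r largest eigenvalues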
