Some interesting problems in PCA

1. For PCA, in order to find U and W we need to minimise the following objective function:

argmin_{U\in\mathbb{R}^{d\times r}:U^TU=I}\sum^n_{i=1}\Vert x_i-UU^Tx_i\Vert_2^2

How can we simplify the equation above such that we transform the problem into a trace maximisation problem? Show the intermediate steps and the new objective function.

Solution:

\sum^n_{i=1}\Vert x_i-UU^Tx_i\Vert^2_2=\Vert X-UU^TX\Vert^2_F

=tr((X-UU^TX)^T(X-UU^TX))=tr(X^TX-2X^TUU^TX+X^TUU^TUU^TX)

=tr(X^TX)-tr(X^TUU^TX)=\Vert X\Vert^2_F-tr(U^TXX^TU)

Here X\in\mathbb{R}^{d\times n} is the matrix whose columns are the \pmb{x}_i and \Vert\cdot\Vert_F is the Frobenius norm; we used U^TU=I to combine the last two terms into -tr(X^TUU^TX), and the cyclic property of the trace to rewrite it as -tr(U^TXX^TU).

Since \Vert X\Vert^2_F does not depend on U, minimising the expression above is equivalent to maximising the trace term. We thus turn the objective into a trace maximisation problem:
argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^TXX^TU)

=argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace\left(U^T\left(\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T\right)U\right)

If we define \Sigma=\frac{1}{n}\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T, this is equivalent (up to the constant factor \frac{1}{n}) to:
argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^T\Sigma U)
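A minimal numerical sketch of this equivalence (assuming, as above, that X is a d x n matrix whose columns are the mean-centered \pmb{x}_i, and using a hypothetical random U with orthonormal columns):

import numpy as np

rng = np.random.default_rng(0)
d, n, r = 5, 100, 2
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)            # mean-centre the columns

# Any U with orthonormal columns works for checking the identity.
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

recon_error = np.sum((X - U @ U.T @ X) ** 2)                 # sum_i ||x_i - U U^T x_i||_2^2
trace_form = np.sum(X ** 2) - np.trace(U.T @ X @ X.T @ U)    # ||X||_F^2 - tr(U^T X X^T U)

assert np.isclose(recon_error, trace_form)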


2. With a mean-centered data matrix X, a key component of PCA is the spectral decomposition of a matrix M, defined as M=VDV^T, where:

M\in\mathbb{R}^{d\times d}

V is a matrix whose columns \pmb{v}_1, \pmb{v}_2, ..., \pmb{v}_d  are the eigenvectors of M

D is the diagonal matrix with D_{i,i}=\lambda_i and \lambda_i is the eigenvalue of \pmb{v}_i

Assume that obtaining the square matrix M is computationally expensive. Can you suggest an alternative method to obtain the eigenvectors of M?

Solution:

Consider the n\times n matrix K=X^TX, whose entries are K_{ij}=\langle \pmb{x}_i, \pmb{x}_j\rangle.

Suppose \pmb{v}^* is an eigenvector of K: K\pmb{v}^*=\lambda\pmb{v}^* for some \lambda\in\mathbb{R}.

Multiplying both sides of this equation on the left by \frac{1}{n}X and using the definition of K, we obtain:

\frac{1}{n}XX^TX\pmb{v}^*=\frac{1}{n}\lambda X\pmb{v}^*

Since \Sigma=\frac{1}{n}\sum^n_{i=1}\pmb{x}_i\pmb{x}_i^T=\frac{1}{n}XX^T, this gives \Sigma(X\pmb{v}^*)=\frac{\lambda}{n}(X\pmb{v}^*)

Hence \frac{X\pmb{v}^*}{\Vert X\pmb{v}^*\Vert} is an eigenvector of \Sigma with eigenvalue \frac{\lambda}{n} (provided X\pmb{v}^*\neq\pmb{0}, i.e. \lambda\neq 0)

Therefore, we can compute the PCA solution from the eigenvectors of the n\times n matrix K rather than those of the d\times d matrix \Sigma, which is cheaper whenever n<d.
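The trick can be sketched numerically as follows (assuming X is d x n with columns \pmb{x}_i and d much larger than n, so that the n x n matrix K is much cheaper to decompose than the d x d matrix \Sigma; the variable names are illustrative):

import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 50
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)

K = X.T @ X                        # n x n Gram matrix, K_ij = <x_i, x_j>
lam, V_star = np.linalg.eigh(K)    # eigenvalues in ascending order
v_star = V_star[:, -1]             # leading eigenvector of K

u = X @ v_star
u /= np.linalg.norm(u)             # X v* / ||X v*|| is an eigenvector of Sigma

Sigma = (X @ X.T) / n
assert np.allclose(Sigma @ u, (lam[-1] / n) * u)   # corresponding eigenvalue is lambda / n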


3. In PCA, W is the compression matrix and U is the recovering matrix. The objective for finding W and U is phrased as:

argmin_{W\in\mathbb{R}^{r\times d},U\in\mathbb{R}^{d\times r}}\sum^n_{i=1}\Vert x_i-UWx_i\Vert^2_2

Prove the lemma: let (U,W) be a solution to the objective above. Then the columns of U are orthonormal (that is, U^TU is the r\times r identity matrix) and W=U^T.

Solution:

Choose any U and W and consider the mapping \pmb{x}\rightarrow UW\pmb{x}

The range of this mapping, R=\left\{ UW\pmb{x}:\pmb{x}\in\mathbb{R}^d\right\}, is an r-dimensional linear subspace of \mathbb{R}^d.

Let V\in\mathbb{R}^{d\times r} be a matrix whose columns form an orthonormal basis of this subspace (i.e. V^TV=I ).

Each vector in R can be written as V\pmb{y} , where \pmb{y}\in\mathbb{R}^r .

For every \pmb{x}\in\mathbb{R}^d and \pmb{y}\in\mathbb{R}^r we have

\Vert \pmb{x}-V\pmb{y}\Vert^2_2=\Vert\pmb{x}\Vert^2_2+\Vert\pmb{y}\Vert^2_2-2\pmb{y}^T(V^T\pmb{x}), where we used V^TV=I.

This expression is minimised (set its gradient with respect to \pmb{y} to zero) at \pmb{y}=V^T\pmb{x}.

Therefore, for each \pmb{x} we have that:

V\pmb{y}=VV^T\pmb{x}=argmin_{\tilde{\pmb{x}}\in R}\Vert\pmb{x}-\tilde{\pmb{x}}\Vert^2_2

This holds for all \pmb{x}_i and we can replace U,W by V, V^T without increasing the objective:

\sum^n_{i=1}\Vert\pmb{x}_i-UW\pmb{x}_i\Vert^2_2\geq\sum^n_{i=1}\Vert\pmb{x}_i-VV^T\pmb{x}_i\Vert^2_2

Since this holds for every pair (U,W), replacing an optimal (U,W) by (V,V^T) also yields an optimal solution; hence we may assume that the columns of U are orthonormal and that W=U^T, which proves the lemma.
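The argument can be illustrated numerically (a sketch with a hypothetical random pair (U, W); V holds an orthonormal basis of the range of UW):

import numpy as np

rng = np.random.default_rng(2)
d, n, r = 6, 200, 2
X = rng.standard_normal((d, n))

U = rng.standard_normal((d, r))    # arbitrary recovering matrix
W = rng.standard_normal((r, d))    # arbitrary compression matrix
M = U @ W                          # the mapping x -> U W x

# Orthonormal basis of the range of UW (generically of dimension r).
V = np.linalg.svd(M)[0][:, :np.linalg.matrix_rank(M)]

err_UW = np.sum((X - M @ X) ** 2)          # sum_i ||x_i - U W x_i||^2
err_V = np.sum((X - V @ V.T @ X) ** 2)     # sum_i ||x_i - V V^T x_i||^2
assert err_V <= err_UW + 1e-9              # replacing (U, W) by (V, V^T) never increases the objective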


4. Let \Sigma=VDV^T be the spectral decomposition of \Sigma, where D is a diagonal matrix such that D_{i,i} is the i-th largest eigenvalue of \Sigma, the columns of V are the corresponding eigenvectors, and V^TV=VV^T=I. Show that the solution to argmax_{U\in\mathbb{R}^{d\times r}:U^TU=I}trace(U^T\Sigma U) is the matrix U whose columns are the first r eigenvectors of \Sigma.

 Solution:

Choose a matrix U\in\mathbb{R}^{d\times r} with orthonormal columns and let B=V^TU. Then, VB=VV^TU=U. It follows that

U^T\Sigma U=B^TV^TVDV^TVB=B^TDB

and therefore

trace(U^T\Sigma U)=\sum^d_{j=1}D_{j,j}\sum^r_{i=1}B_{j,i}^2.

Note that B^TB=U^TVV^TU=U^TU=I. Hence the columns of B are orthonormal and \sum^d_{j=1}\sum^r_{i=1}B_{j,i}^2=r.

Define \tilde{B}\in\mathbb{R}^{d\times d} to be a matrix whose first r columns are the columns of B and which satisfies \tilde{B}^T\tilde{B}=I (such a matrix exists because the orthonormal columns of B can be extended to an orthonormal basis of \mathbb{R}^d). Since \tilde{B} is square and orthogonal, its rows are also orthonormal, so for every j: \sum^d_{i=1}\tilde{B}^2_{j,i}=1, which implies that \sum^r_{i=1}B_{j,i}^2\leq 1.

It follows that

trace(U^T\Sigma U)\leq max_{\beta\in [0,1]^d:\Vert\beta\Vert_1\leq r}\sum^d_{j=1}D_{j,j}\beta_j=\sum^r_{j=1}D_{j,j}, where the maximum is attained by setting \beta_j=1 for the r largest diagonal entries D_{j,j}.

Therefore, for every matrix U\in\mathbb{R}^{d\times r} with orthonormal columns, the following inequality holds: trace(U^T\Sigma U)\leq\sum^r_{j=1}D_{j,j}

But if we set U to be the matrix with the r leading eigenvectors of \Sigma as its columns, we obtain trace(U^T\Sigma U)=\sum^r_{j=1}D_{j,j}, which attains the upper bound and is therefore the optimal solution.
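A quick numerical check of the result (a sketch on synthetic data; the leading eigenvectors of \Sigma should beat any random orthonormal U):

import numpy as np

rng = np.random.default_rng(3)
d, n, r = 8, 500, 3
X = rng.standard_normal((d, n))
X -= X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, V = np.linalg.eigh(Sigma)       # eigenvalues in ascending order
U_opt = V[:, ::-1][:, :r]                # r leading eigenvectors
best = np.trace(U_opt.T @ Sigma @ U_opt)

assert np.isclose(best, np.sort(eigvals)[::-1][:r].sum())   # equals sum of r largest eigenvalues

for _ in range(100):
    U, _ = np.linalg.qr(rng.standard_normal((d, r)))        # random orthonormal columns
    assert np.trace(U.T @ Sigma @ U) <= best + 1e-9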


5. Show that the variance captured by the first r eigenvectors of \Sigma is the sum of its r largest eigenvalues, \sum^r_{i=1}\lambda_i.

Solution:

The variance of a mean-centered dataset X is defined as

var(X)=\frac{1}{n}\sum^n_{i=1}\Vert \pmb{x}_i-\pmb{0}\Vert^2

=\frac{1}{n}\sum^n_{i=1}\Vert x_i\Vert^2=\frac{1}{n}\sum^n_{i=1}\langle x_i,x_i\rangle =tr(\Sigma)=tr(VDV^T)=tr(DV^TV)=tr(D)

=\sum^d_{i=1}D_{i,i}=\sum^d_{i=1}\lambda_i

The variance in a projected dataset WX, with W=[\pmb{v}_1,...,\pmb{v}_r]^T , is defined as

var(WX)=\frac{1}{n}\sum^n_{j=1}\Vert W\pmb{x}_j\Vert^2 =\frac{1}{n}\sum^n_{j=1}\langle  W\pmb{x}_j,W\pmb{x}_j\rangle =\frac{1}{n}\sum^n_{j=1}\pmb{x}_j^T W^TW\pmb{x}_j

=\frac{1}{n}\sum^n_{j=1}\pmb{x}_j^T(\pmb{v}_1\pmb{v}_1^T+...+\pmb{v}_r\pmb{v}_r^T)\pmb{x}_j=\sum^r_{i=1}\pmb{v}_i^T\left(\frac{1}{n}\sum^n_{j=1}\pmb{x}_j\pmb{x}_j^T\right)\pmb{v}_i=

(\pmb{v}_1^T\Sigma\pmb{v}_1+...+\pmb{v}_r^T\Sigma\pmb{v}_r)=\sum^r_{i=1}\pmb{v}_i^T\Sigma\pmb{v}_i=\sum^r_{i=1}\pmb{v}_i^T\lambda_i\pmb{v}_i=\sum^r_{i=1}\lambda_i

(The eigenvectors \pmb{v}_i are orthonormal because V^TV=I in the spectral decomposition of problem 4, so \pmb{v}_i^T\pmb{v}_i=1 in the last step.)
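Both identities can be checked numerically (a sketch under the same conventions: X is d x n and mean-centered, and W stacks the r leading eigenvectors as rows):

import numpy as np

rng = np.random.default_rng(4)
d, n, r = 6, 1000, 2
X = rng.standard_normal((d, n)) * np.arange(1, d + 1)[:, None]   # unequal variances per coordinate
X -= X.mean(axis=1, keepdims=True)
Sigma = (X @ X.T) / n

eigvals, V = np.linalg.eigh(Sigma)        # eigenvalues in ascending order
W = V[:, ::-1][:, :r].T                   # W = [v_1, ..., v_r]^T

var_total = np.sum(X ** 2) / n            # var(X)
var_projected = np.sum((W @ X) ** 2) / n  # var(WX)

assert np.isclose(var_total, eigvals.sum())                          # sum of all eigenvalues
assert np.isclose(var_projected, np.sort(eigvals)[::-1][:r].sum())   # sum of r largest eigenvalues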
