1. For PCA, in order to find U and W we need to find a solution to the following objective function:

$$\underset{W \in \mathbb{R}^{r,d},\; U \in \mathbb{R}^{d,r}}{\operatorname{argmin}} \; \sum_{i=1}^{m} \lVert x_i - UWx_i \rVert_2^2 \qquad (1)$$
How can we simplify the equation above such that we transform the problem into a trace maximisation problem? Show the intermediate steps and the new objective function.
Solution:
By the lemma proved in Question 3, at an optimum the columns of U are orthonormal ($U^\top U = I_r$) and $W = U^\top$, so Objective (1) becomes

$$\underset{U \in \mathbb{R}^{d,r}:\, U^\top U = I}{\operatorname{argmin}} \; \sum_{i=1}^{m} \lVert x_i - UU^\top x_i \rVert_2^2.$$

Expanding each squared norm and using $U^\top U = I$:

$$\lVert x_i - UU^\top x_i \rVert^2 = \lVert x_i \rVert^2 - 2\,x_i^\top UU^\top x_i + x_i^\top U U^\top U U^\top x_i = \lVert x_i \rVert^2 - x_i^\top UU^\top x_i = \lVert x_i \rVert^2 - \operatorname{trace}\!\left(U^\top x_i x_i^\top U\right).$$

Since $\sum_i \lVert x_i \rVert^2$ does not depend on U, minimising the objective is the same as maximising $\sum_i \operatorname{trace}(U^\top x_i x_i^\top U)$. Now we turn the objective into a trace maximisation problem:

$$\underset{U:\, U^\top U = I}{\operatorname{argmax}} \; \operatorname{trace}\!\Big(U^\top \Big(\sum_{i=1}^{m} x_i x_i^\top\Big) U\Big).$$

If we define $A = \sum_{i=1}^{m} x_i x_i^\top = XX^\top$ (where $X \in \mathbb{R}^{d,m}$ is the mean-centered data matrix with one data point per column), this is equivalent (up to a constant factor) to:

$$\underset{U \in \mathbb{R}^{d,r}:\, U^\top U = I}{\operatorname{argmax}} \; \operatorname{trace}(U^\top A U).$$
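A quick numerical check of the identity used above (my own sketch, not part of the original sheet; it assumes the convention that the mean-centered data matrix $X \in \mathbb{R}^{d,m}$ holds one data point per column): for any U with orthonormal columns and $W = U^\top$, the reconstruction error equals $\sum_i \lVert x_i \rVert^2 - \operatorname{trace}(U^\top A U)$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, r = 5, 100, 2

# Mean-centered data matrix with one sample per column: X in R^{d,m}
X = rng.standard_normal((d, m))
X -= X.mean(axis=1, keepdims=True)
A = X @ X.T                       # A = sum_i x_i x_i^T

# Any matrix U with orthonormal columns (here obtained from a QR factorisation)
U, _ = np.linalg.qr(rng.standard_normal((d, r)))

# Reconstruction error with W = U^T ...
recon_err = np.sum((X - U @ U.T @ X) ** 2)

# ... equals sum_i ||x_i||^2 minus trace(U^T A U)
assert np.isclose(recon_err, np.sum(X ** 2) - np.trace(U.T @ A @ U))
```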
2. With a mean-centered data matrix $X \in \mathbb{R}^{d,m}$ (one data point per column), a key component of PCA is the spectral decomposition of the matrix $M = XX^\top$, written as $M = VDV^\top$, where:
$M \in \mathbb{R}^{d,d}$,
V is a matrix whose columns are the eigenvectors of M,
D is the diagonal matrix with $D_{i,i} = \lambda_i$ and $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0$, where
$\lambda_i$ is the $i$-th largest eigenvalue of M.
Assume that obtaining the square matrix M is computationally expensive (e.g. when $d \gg m$). Can you suggest an alternative method to obtain the eigenvectors of M?
Solution:
Consider $K = X^\top X \in \mathbb{R}^{m,m}$, so that $K_{i,j} = x_i^\top x_j$.
Suppose v is an eigenvector of K: $Kv = \lambda v$ for some $\lambda \neq 0$.
Multiplying the equation by X and using the definition of K we obtain:
$$XX^\top X v = \lambda\, Xv.$$
But, using the definition of $M = XX^\top$, we get that
$$M\,(Xv) = \lambda\,(Xv).$$
Hence $\frac{Xv}{\lVert Xv \rVert}$ is a (unit-norm) eigenvector of M
with eigenvalue $\lambda$.
Therefore, we can calculate the PCA solution by computing the eigenvectors and eigenvalues of the $m \times m$ matrix K rather than of the $d \times d$ matrix M, which is cheaper whenever $m < d$.
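A minimal NumPy sketch of this trick (my own illustration, assuming the same column-per-data-point convention $X \in \mathbb{R}^{d,m}$): the non-zero eigenvalues of $K = X^\top X$ and $M = XX^\top$ coincide, and each eigenvector v of K maps to the eigenvector $Xv/\lVert Xv \rVert$ of M.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 500, 20                      # many features, few samples: d >> m
X = rng.standard_normal((d, m))
X -= X.mean(axis=1, keepdims=True)  # mean-center the columns (data points)

# Direct route: eigendecomposition of the d x d matrix M = X X^T
M = X @ X.T
evals_M, evecs_M = np.linalg.eigh(M)

# Cheaper route: eigendecomposition of the m x m matrix K = X^T X
K = X.T @ X
evals_K, evecs_K = np.linalg.eigh(K)

# The non-zero eigenvalues coincide ...
assert np.allclose(np.sort(evals_M)[::-1][:m], np.sort(evals_K)[::-1])

# ... and X v / ||X v|| is a unit eigenvector of M for each eigenvector v of K
v = evecs_K[:, -1]                  # eigenvector of K with the largest eigenvalue
u = X @ v
u /= np.linalg.norm(u)
assert np.allclose(M @ u, evals_K[-1] * u)
```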
3. In PCA, W is the compression matrix and U is the recovering matrix. The corresponding objective of finding W and U is phrased as:

$$\underset{W \in \mathbb{R}^{r,d},\; U \in \mathbb{R}^{d,r}}{\operatorname{argmin}} \; \sum_{i=1}^{m} \lVert x_i - UWx_i \rVert_2^2 \qquad (1)$$

Prove the lemma: Let (U, W) be a solution to Objective (1). Then the columns of U are orthonormal (that is, $U^\top U$ is the identity matrix of $\mathbb{R}^{r,r}$) and $W = U^\top$.
Solution:
Choose any U and W and consider the mapping $x \mapsto UWx$.
The range of this mapping, $R = \{UWx : x \in \mathbb{R}^d\}$, is an (at most) r-dimensional linear subspace of $\mathbb{R}^d$.
Let $V \in \mathbb{R}^{d,r}$ be a matrix whose columns form an orthonormal basis of this subspace (i.e. $V^\top V = I_r$ and the range of V is R).
Each vector in R can be written as $Vy$, where $y \in \mathbb{R}^r$.
For every $x \in \mathbb{R}^d$ and $y \in \mathbb{R}^r$ we have

$$\lVert x - Vy \rVert_2^2 = \lVert x \rVert^2 + y^\top V^\top V y - 2 y^\top V^\top x = \lVert x \rVert^2 + \lVert y \rVert^2 - 2 y^\top (V^\top x).$$

This difference is minimised for $y = V^\top x$ (set the gradient with respect to y to zero).
Therefore, for each x we have that:

$$VV^\top x = \underset{\tilde{x} \in R}{\operatorname{argmin}} \; \lVert x - \tilde{x} \rVert_2^2, \qquad \text{and in particular } \lVert x - VV^\top x \rVert^2 \le \lVert x - UWx \rVert^2 \text{ since } UWx \in R.$$

This holds for all x, and we can replace U, W by $V, V^\top$
without increasing the objective:

$$\sum_{i=1}^{m} \lVert x_i - VV^\top x_i \rVert_2^2 \;\le\; \sum_{i=1}^{m} \lVert x_i - UWx_i \rVert_2^2.$$

As this holds for any U, W, an optimal solution can always be taken of the form $(V, V^\top)$ with $V^\top V = I_r$, and the proof is complete.
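A small sketch of the argument (my own illustration, not from the sheet): for an arbitrary pair (U, W), projecting onto an orthonormal basis V of the range of $x \mapsto UWx$ never increases the reconstruction error, and within that subspace the best coefficients are $y = V^\top x$.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, r = 6, 50, 2

X = rng.standard_normal((d, m))
X -= X.mean(axis=1, keepdims=True)

# An arbitrary (not necessarily orthonormal) recovery/compression pair
U = rng.standard_normal((d, r))
W = rng.standard_normal((r, d))

def objective(U_, W_):
    return np.sum((X - U_ @ W_ @ X) ** 2)

# Orthonormal basis V of the range of x -> U W x (left singular vectors of UW)
left, _, _ = np.linalg.svd(U @ W)
V = left[:, :r]                     # UW has rank <= r

# For a single x, the best approximation V y in that subspace uses y = V^T x
x = X[:, 0]
y_best = V.T @ x
y_other = y_best + rng.standard_normal(r)
assert np.sum((x - V @ y_best) ** 2) <= np.sum((x - V @ y_other) ** 2)

# Replacing (U, W) by (V, V^T) never increases the objective
assert objective(V, V.T) <= objective(U, W) + 1e-9
```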
4. Let $A = VDV^\top$ be the spectral decomposition of $A = \sum_{i=1}^{m} x_i x_i^\top = XX^\top$. D is a diagonal matrix, such that $D_{i,i} = \lambda_i$
is the i-th largest eigenvalue of A. The columns of V are the corresponding eigenvectors, and $V^\top V = VV^\top = I_d$. Then the solution to

$$\underset{U \in \mathbb{R}^{d,r}:\, U^\top U = I}{\operatorname{argmax}} \; \operatorname{trace}(U^\top A U)$$

is the matrix U whose columns are the r first eigenvectors of A.
Solution:
Choose a matrix $U \in \mathbb{R}^{d,r}$ with orthonormal columns and let $B = V^\top U$. Then $VB = U$. It follows that

$$U^\top A U = B^\top V^\top V D V^\top V B = B^\top D B,$$

and therefore

$$\operatorname{trace}(U^\top A U) = \sum_{j=1}^{d} D_{j,j} \sum_{i=1}^{r} B_{j,i}^2.$$

Note that $B^\top B = U^\top V V^\top U = U^\top U = I_r$. Hence the columns of B are orthonormal and
$\sum_{j=1}^{d} \sum_{i=1}^{r} B_{j,i}^2 = r$.
Define $\tilde{B} \in \mathbb{R}^{d,d}$ to be the matrix such that its first r columns are the columns of B and in addition
$\tilde{B}^\top \tilde{B} = I_d$ (i.e. complete B to an orthonormal basis of $\mathbb{R}^d$). Then for every j:

$$\sum_{i=1}^{d} \tilde{B}_{j,i}^2 = 1, \quad \text{which implies that} \quad \sum_{i=1}^{r} B_{j,i}^2 \le 1.$$

It follows that

$$\operatorname{trace}(U^\top A U) \;\le\; \max_{\beta \in [0,1]^d:\, \lVert \beta \rVert_1 \le r} \; \sum_{j=1}^{d} D_{j,j}\, \beta_j \;=\; \sum_{j=1}^{r} D_{j,j},$$

where the maximum is attained by putting $\beta_j = 1$ for the r largest diagonal entries. Therefore, for every matrix $U \in \mathbb{R}^{d,r}$ with orthonormal columns, the following inequality holds:

$$\operatorname{trace}(U^\top A U) \le \sum_{j=1}^{r} D_{j,j}.$$

But if we set U to the matrix with the r leading eigenvectors of A as its columns, we obtain
$\operatorname{trace}(U^\top A U) = \sum_{j=1}^{r} D_{j,j}$,
and thereby the optimal solution.
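A numerical illustration of the theorem (my own sketch, not part of the original solution): the trace objective attained by the r leading eigenvectors equals the sum of the r largest eigenvalues, and no other orthonormal U does better.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, r = 8, 200, 3

X = rng.standard_normal((d, m))
X -= X.mean(axis=1, keepdims=True)
A = X @ X.T

# Spectral decomposition; np.linalg.eigh returns eigenvalues in ascending order
evals, evecs = np.linalg.eigh(A)
U_opt = evecs[:, ::-1][:, :r]       # r leading eigenvectors of A
best = np.trace(U_opt.T @ A @ U_opt)

# The optimum equals the sum of the r largest eigenvalues ...
assert np.isclose(best, np.sort(evals)[::-1][:r].sum())

# ... and no other orthonormal U achieves a larger trace
for _ in range(100):
    U_rand, _ = np.linalg.qr(rng.standard_normal((d, r)))
    assert np.trace(U_rand.T @ A @ U_rand) <= best + 1e-9
```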
5. The variance captured by the first r eigenvectors of the sample covariance matrix $C = \frac{1}{m} XX^\top = \frac{1}{m} A$ is the sum over its r largest eigenvalues,
$\sum_{j=1}^{r} \lambda_j$.
Solution:
The variance in a (mean-centered) dataset X is defined as

$$\operatorname{Var}(X) = \frac{1}{m} \sum_{i=1}^{m} \lVert x_i \rVert^2 = \operatorname{trace}(C).$$

The variance in a projected dataset WX, with $W = U^\top$ and $U^\top U = I_r$, is defined as

$$\operatorname{Var}(U^\top X) = \frac{1}{m} \sum_{i=1}^{m} \lVert U^\top x_i \rVert^2 = \operatorname{trace}\!\Big(U^\top \tfrac{1}{m} XX^\top U\Big) = \operatorname{trace}(U^\top C U).$$

(Based on 4, we know that the eigenvectors are orthonormal.) The eigenvectors of C coincide with those of A, so if the columns of U are the first r eigenvectors then $U^\top C U = \operatorname{diag}(\lambda_1, \dots, \lambda_r)$, where $\lambda_1 \ge \dots \ge \lambda_r$ are the r largest eigenvalues of C. Hence

$$\operatorname{Var}(U^\top X) = \sum_{j=1}^{r} \lambda_j,$$

the sum over the r largest eigenvalues of C.
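A short numerical check of this claim (my own sketch, under the same $X \in \mathbb{R}^{d,m}$ convention): the variance of the data projected onto the first r eigenvectors equals the sum of the r largest eigenvalues of the sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, r = 6, 1000, 2

# Features with unequal scales so the eigenvalues are clearly distinct
X = rng.standard_normal((d, m)) * np.arange(1, d + 1)[:, None]
X -= X.mean(axis=1, keepdims=True)
C = (X @ X.T) / m                    # sample covariance of the centered data

evals, evecs = np.linalg.eigh(C)
U = evecs[:, ::-1][:, :r]            # first r eigenvectors (largest eigenvalues)

# Variance of the projected dataset U^T X ...
proj_var = np.sum((U.T @ X) ** 2) / m

# ... equals the sum of the r largest eigenvalues of C
assert np.isclose(proj_var, np.sort(evals)[::-1][:r].sum())
```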
