Singular Value Decomposition (SVD) Tutorial
BE.400 / 7.548
Singular value decomposition takes a rectangular matrix of gene expression data (defined as A, an n x p matrix) in which the n rows represent the genes and the p columns represent the experimental conditions. The SVD theorem states:
A (n x p) = U (n x n) S (n x p) V^T (p x p)
where

U^T U = I (n x n)
V^T V = I (p x p)

(i.e. U and V are orthogonal).
The columns of U are the left singular vectors (gene coefficient vectors); S (the same dimensions as A) is diagonal, with the singular values on its diagonal (mode amplitudes); and the rows of V^T are the right singular vectors (expression level vectors). The SVD represents an expansion of the original data in a coordinate system where the covariance matrix is diagonal.
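As a quick numerical check of the theorem, the factorization and both orthogonality conditions can be verified with NumPy's np.linalg.svd. The matrix below is an arbitrary stand-in, not the gene expression data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4, 2
A = rng.normal(size=(n, p))          # stand-in for an n x p expression matrix

# full_matrices=True gives U as n x n and V^T as p x p, as in the theorem.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((n, p))
S[:p, :p] = np.diag(s)               # S has the same dimensions as A

print(np.allclose(U.T @ U, np.eye(n)))    # True: U^T U = I (n x n)
print(np.allclose(Vt @ Vt.T, np.eye(p)))  # True: V^T V = I (p x p)
print(np.allclose(U @ S @ Vt, A))         # True: A = U S V^T
```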
Calculating the SVD consists of finding the eigenvalues and eigenvectors of AA^T and A^TA. The eigenvectors of A^TA make up the columns of V; the eigenvectors of AA^T make up the columns of U. Also, the singular values in S are the square roots of the eigenvalues of AA^T or A^TA. The singular values are the diagonal entries of the S matrix and are arranged in descending order. The singular values are always real numbers. If the matrix A is a real matrix, then U and V are also real.
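This relationship between the SVD and the two eigenproblems can be confirmed numerically; a minimal sketch with NumPy, again on an arbitrary matrix rather than the paper's data:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 2))

# A^T A is symmetric, so np.linalg.eigh applies; it returns eigenvalues
# in ascending order, so reverse them to get descending order.
evals, V = np.linalg.eigh(A.T @ A)
svals = np.sqrt(evals[::-1])         # square roots of the eigenvalues

U_svd, s_svd, Vt_svd = np.linalg.svd(A)
print(np.allclose(svals, s_svd))     # True: singular values = sqrt of eigenvalues
```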
To understand how to solve for the SVD, let's take the example of the matrix provided in Kuruvilla et al. In this example the matrix is a 4 x 2 matrix.
We know that for an n x n matrix W, a nonzero vector x is an eigenvector of W if:

W x = λ x

for some scalar λ. The scalar λ is called an eigenvalue of W, and x is said to be an eigenvector of W corresponding to λ.
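The definition can be illustrated with a small concrete case (an illustrative 2 x 2 matrix, not from the tutorial's example):

```python
import numpy as np

W = np.array([[3.0, 1.0],
              [1.0, 3.0]])
lam = 4.0                      # an eigenvalue of W
x = np.array([1.0, 1.0])       # a corresponding (unnormalized) eigenvector

print(np.allclose(W @ x, lam * x))   # True: W x = λ x holds
```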
So to find the eigenvalues of the above matrix, we compute the matrices AA^T and A^TA.
As previously stated, the eigenvectors of AA^T make up the columns of U, so we can do the following analysis to find U. Computing AA^T gives a 4 x 4 matrix W, and we can now determine the eigenvalues of W.
Since W x = λ x, then (W - λI) x = 0.

For a nonzero vector x to satisfy this, the determinant of the matrix (W - λI) must be equal to zero. Thus from the solution of the characteristic equation |W - λI| = 0 we obtain:

λ = 0, λ = 0; λ = 15 + √221 ≈ 29.866; λ = 15 - √221 ≈ 0.134

(four eigenvalues, since the characteristic polynomial is of fourth degree). Each nonzero eigenvalue can be used to determine an eigenvector that is placed in a column of U. Taking λ ≈ 0.134 first, we obtain the following equations:
19.866 x1 + 14 x2 = 0
14 x1 + 9.866 x2 = 0
x3 = 0
x4 = 0
Upon simplifying the first two equations we obtain a ratio which relates the value of x1 to x2. The values of x1 and x2 are chosen such that the eigenvector has unit length (making U orthogonal). Thus a solution that satisfies the above equations is x1 = -0.58 and x2 = 0.82, with x3 = x4 = 0 (this is the second column of the U matrix).
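The same back-substitution can be sketched in NumPy. The constant 20 below is inferred from the coefficient equations above (20 - λ ≈ 19.87); note that an eigenvector is only determined up to sign, so the solver may return the negated version of the hand-calculated column:

```python
import numpy as np

lam = 15 - np.sqrt(221)            # smaller nonzero eigenvalue, ≈ 0.134
ratio = -(20 - lam) / 14           # from (20 - λ) x1 + 14 x2 = 0, so x2 = ratio * x1
x = np.array([1.0, ratio, 0.0, 0.0])
x /= np.linalg.norm(x)             # normalize to unit length

print(np.round(x, 2))              # ≈ ±[0.58, -0.82, 0, 0], a sign flip of the column above
```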
Substituting the other eigenvalue, λ ≈ 29.866, we obtain:

-9.866 x1 + 14 x2 = 0
14 x1 - 19.866 x2 = 0
x3 = 0
x4 = 0
Thus a solution that satisfies this set of equations is x1 = 0.82 and x2 = 0.58, with x3 = x4 = 0 (this is the first column of the U matrix). Combining these columns gives the matrix U.
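The whole computation of U can be reproduced numerically. The matrix A below is an assumption: a 4 x 2 matrix consistent with the coefficient equations above (the original figure showing A is missing from this text). Eigenvector signs from the solver may differ from the hand calculation:

```python
import numpy as np

# Assumed example matrix, consistent with the equations above
# (AA^T then has 20, 14, 10 in its nonzero block).
A = np.array([[2.0, 4.0],
              [1.0, 3.0],
              [0.0, 0.0],
              [0.0, 0.0]])

W = A @ A.T                          # 4 x 4; its eigenvectors give the columns of U
evals, vecs = np.linalg.eigh(W)      # eigenvalues in ascending order

print(np.round(evals[::-1], 3))          # ≈ [29.866, 0.134, 0, 0]
print(np.round(np.abs(vecs[:, -1]), 2))  # leading eigenvector, |.| ≈ [0.82, 0.58, 0, 0]
```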
Similarly, the eigenvectors of A^TA make up the columns of V, so we can do a similar analysis to find the matrix V.
Finally, as mentioned previously, the diagonal entries of S are the square roots of the eigenvalues of AA^T or A^TA, and so can be obtained directly, giving us the matrix S.
Note that s1 ≥ s2 ≥ s3 ≥ …, which is what Figure 4 of the Kuruvilla paper indicates. In that paper the singular values were computed and normalized such that the highest singular value was equal to 1.
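NumPy already returns the singular values in this descending order, and normalizing so that the largest equals 1, as in the paper's figure, is one line. A sketch on random stand-in data, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))

s = np.linalg.svd(A, compute_uv=False)   # singular values only, largest first
print(np.all(s[:-1] >= s[1:]))           # True: descending order
s_norm = s / s[0]                        # normalize: highest singular value -> 1
print(s_norm[0])                         # 1.0
```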
Proof:

A = U S V^T and A^T = V S^T U^T

A^T A = V S^T U^T U S V^T
A^T A = V (S^T S) V^T
A^T A V = V (S^T S)

Since S^T S is diagonal with the squared singular values on its diagonal, this shows that the columns of V are eigenvectors of A^T A, with eigenvalues equal to the squares of the singular values.
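This identity can be checked numerically; a quick sketch on an arbitrary matrix, with S^T S written out explicitly since S is rectangular:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4, 2
A = rng.normal(size=(n, p))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((n, p))
S[:p, :p] = np.diag(s)
V = Vt.T

# A^T A V = V (S^T S): the columns of V are eigenvectors of A^T A,
# with eigenvalues equal to the squared singular values.
print(np.allclose(A.T @ A @ V, V @ (S.T @ S)))   # True
```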
References