The SVD of an m \times n matrix A of rank r, from class, is A = U\Sigma V^T, where U is an m \times r matrix with orthonormal columns (the "best" orthonormal basis for C(A)), \Sigma is an r \times r diagonal matrix of singular values \sigma_k, and V is an n \times r matrix with orthonormal columns (the "best" orthonormal basis for the row space C(A^T)).
Suppose that r=n (i.e. A has full column rank).
(a) (V^T)^{-1} simplifies to ______________ because V in this case is ______________?
(b) Show that the matrix B = (A^TA)^{-1}A^T has an especially simple form in terms of the SVD factors U, \Sigma, V.
(c) B is even simpler if r=n=m: what is it?
(This matrix B plays an important role in least-squares problems, as we shall see.)
(a) V is a square matrix whose columns form an orthonormal basis of \mathbb{R}^n, so V^T = V^{-1} (unitary). That is, it simplifies to (V^T)^{-1} = V because V in this case is unitary (a.k.a. an orthogonal matrix).
(b) We start by computing A^TA. Since \Sigma^T = \Sigma and U^TU = I_r we have
A^TA = (U\Sigma V^T)^T U\Sigma V^T = V\Sigma U^T U\Sigma V^T = V\Sigma^2 V^T.
Since V^TV = I_n we have V^{-1} = V^T and (V^T)^{-1} = V, so
(A^TA)^{-1} = (V\Sigma^2 V^T)^{-1} = (V^T)^{-1}\Sigma^{-2}V^{-1} = V\Sigma^{-2}V^T,
where \Sigma^{-2} stands for (\Sigma^2)^{-1}. It follows that
(A^TA)^{-1}A^T = V\Sigma^{-2}V^T(U\Sigma V^T)^T = V\Sigma^{-2}V^T V\Sigma U^T = V\Sigma^{-1}U^T.
To conclude,
B = V\Sigma^{-1}U^T, \quad \text{where } \Sigma^{-1} = \begin{pmatrix} \sigma_1^{-1} & & & \\ & \sigma_2^{-1} & & \\ & & \ddots & \\ & & & \sigma_n^{-1} \end{pmatrix},
noting that \Sigma^{-1} is especially easy to compute because \Sigma is a diagonal matrix.
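This formula is easy to sanity-check numerically. Here is a short Julia sketch of ours (not part of the problem; the random matrix and variable names are ours): it compares B = (A^TA)^{-1}A^T, computed via the normal equations, with V\Sigma^{-1}U^T from the SVD.

using LinearAlgebra

A = randn(5, 3)                     # a random 5×3 matrix (full column rank with probability 1)
U, σ, V = svd(A)                    # Julia returns the singular values σ as a vector
B_normal = (A'A) \ A'               # B = (AᵀA)⁻¹Aᵀ, without forming an explicit inverse
B_svd = V * Diagonal(1 ./ σ) * U'   # B = VΣ⁻¹Uᵀ
B_normal ≈ B_svd                    # true (up to roundoff)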
(c)
If r = n = m then U^T = U^{-1} (U is unitary), so that
B = V\Sigma^{-1}U^T = (V^T)^{-1}\Sigma^{-1}U^{-1} = (U\Sigma V^T)^{-1} = A^{-1}.
Equivalently, if r = n = m then A is invertible, so
B = (A^TA)^{-1}A^T = A^{-1}\underbrace{(A^T)^{-1}A^T}_{I} = A^{-1}.
Either way, we obtain B = A^{-1}.
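Continuing the sketch above, the square case is a one-line check (again ours, with made-up variables):

A = randn(4, 4)                      # a random square matrix (invertible with probability 1)
U, σ, V = svd(A)
V * Diagonal(1 ./ σ) * U' ≈ inv(A)   # true (up to roundoff)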
Suppose that v_1, \ldots, v_r form any basis for C(A^T) (not necessarily orthogonal as in the SVD), where A is an m \times n matrix of rank r.
The vectors Av_1, \ldots, Av_r are not necessarily orthogonal either, but you can show that they must be linearly independent and hence a basis for C(A). Fill in the blanks of the following proof by contradiction:
If Av_1, \ldots, Av_r were linearly dependent, that would mean that c_1Av_1 + \cdots + c_rAv_r = 0 for some coefficients \vec{c} \neq 0. But that would require c_1v_1 + \cdots + c_rv_r to be in the _______________ space of A, which means that c_1v_1 + \cdots + c_rv_r = ___ because the _______________ space is _______________ to v_1, \ldots, v_r \in C(A^T). But it's impossible to have c_1v_1 + \cdots + c_rv_r = ___ for \vec{c} \neq 0 because the vectors v_1, \ldots, v_r are _______________.
If Av_1, \ldots, Av_r were linearly dependent, that would mean that c_1Av_1 + \cdots + c_rAv_r = 0 for some coefficients \vec{c} \neq 0. But that would require c_1v_1 + \cdots + c_rv_r to be in the null space of A (because c_1Av_1 + \cdots + c_rAv_r = A(c_1v_1 + \cdots + c_rv_r)), which means that c_1v_1 + \cdots + c_rv_r = 0 because the null space is orthogonal to the row space C(A^T) (which contains v_1, \ldots, v_r). But it's impossible to have c_1v_1 + \cdots + c_rv_r = 0 for \vec{c} \neq 0 because the vectors v_1, \ldots, v_r are linearly independent (because they form a basis of C(A^T)).
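Though the proof is all that's needed, the statement is easy to illustrate numerically. In this Julia sketch of ours (random matrices and names of our choosing), we take a random non-orthogonal basis of the row space and check that its image under A still has rank r:

using LinearAlgebra

A = randn(6, 4) * randn(4, 5)   # a random 6×5 matrix of rank r = 4 (with probability 1)
r = rank(A)
V = svd(A).V[:, 1:r]            # columns: an orthonormal basis v₁,…,v_r of C(Aᵀ)
W = V * randn(r, r)             # mix into a random, non-orthogonal basis of C(Aᵀ)
rank(A * W) == r                # true: the images of the basis vectors are independent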
Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} and b = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}.
(a) Compute (by hand) the orthogonal projection of b onto the _____-dimensional column space C(A) using the projection matrix P = A(A^TA)^{-1}A^T derived in class. (As usual, you should be able to multiply P by a vector without explicitly computing any matrix inverses.)
(b) Your answer from (a) could have been obtained by no computation at all, just using geometric reasoning: C(A) in this case is simply the subspace of \mathbb{R}^3 consisting of _______________, and hence the orthogonal projection of any vector onto this subspace is simply _______________.
(c) Compute x = (A^TA)^{-1}A^Tb in Julia (see code below). What is the relationship between x and your answer from (a)?
(d) Compute A \ b in Julia. A is non-square/non-invertible, so this can't be A^{-1}b anymore. What is it?
(e) In terms of the SVD A = U\Sigma V^T computed in Julia with U,Σ,V = svd(A) (see code below), how does UU^Tb (computed in Julia) compare to your answers from the previous parts? Why is this true in general (for any A)?
(a)
The space C(A) is two-dimensional since the columns of A are linearly independent. Letting y stand for the orthogonal projection of b onto C(A), we have y = Pb = A(A^TA)^{-1}A^Tb. (At this point you might think that we should multiply by A^T on both sides to get A^Ty = A^TA(A^TA)^{-1}A^Tb = A^Tb and then solve for y. Why is this the wrong thing to do?)
Let c = A^Tb = [2, 5], so our goal is to solve p = A(A^TA)^{-1}c for p. Let \hat{x} = (A^TA)^{-1}c, so once we have \hat{x} we will get p via p = A\hat{x}. To solve for \hat{x} we note that \hat{x} = (A^TA)^{-1}c is equivalent to c = A^TA\hat{x}. Since A^TA = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} we get the system of equations 2 = \hat{x}_1 + \hat{x}_2 and 5 = \hat{x}_1 + 2\hat{x}_2. The solution is \hat{x}_1 = -1 and \hat{x}_2 = 3, so we get that p = A\hat{x} = [2, 3, 0]. We conclude Pb = [2, 3, 0].
(b)
The key here is to notice that the columns of A are both 0 in the third component. This means that the column space C(A) is the "xy plane" in \mathbb{R}^3 (all points [x, y, 0], i.e. the subspace orthogonal to [0, 0, 1]) and the orthogonal projection of any vector [x, y, z] onto this plane is just [x, y, 0].
Hence, the projection of b = [2, 3, 4] from part (a) could be seen by inspection to be simply Pb = [2, 3, 0].
(c)
A = [1 1; 0 1; 0 0]
b = [2, 3, 4]
x = (A'A) \ (A'b)   # solve the normal equations AᵀA x = Aᵀb, with no explicit inverse
2-element Vector{Float64}:
 -1.0
  3.0
This is the point x (the solution to the normal equations) such that the projection is Pb = Ax, i.e. it is exactly the same as the intermediate step \hat{x} from part (a).
(d)
A\b
2-element Vector{Float64}:
 -1.0
  2.999999999999999
The output from Julia is a vector, essentially the same as x from part (c). We see that A\b, for a non-square "tall" matrix A, solves the least-squares problem which minimizes \|b - Ax\| over x.
(This fact was also noted in lecture, but it's nice to check it explicitly.)
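One can also check the minimization property directly. The sketch below (ours, using the A and b defined above) compares the residual at the computed solution with residuals at randomly perturbed points:

using LinearAlgebra

xhat = A \ b       # the least-squares solution
norm(b - A*xhat)   # the minimum residual ‖b - Ax̂‖; here it is ≈ 4.0
all(norm(b - A*(xhat + 0.1*randn(2))) >= norm(b - A*xhat) for _ in 1:1000)   # true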
(e)
using LinearAlgebra # for the svd(A) function
U,Σ,V = svd(A)
U*U'*b
3-element Vector{Float64}:
 1.9999999999999998
 3.0
 0.0
The output is essentially the same as Pb from part (a).
The reason is that UU^T is the projection matrix onto C(A). To see this we note that, by definition, the columns of U form an orthonormal basis of C(A) (while the columns of V form an orthonormal basis of C(A^T)), so UU^T = U(U^TU)^{-1}U^T (using U^TU = I) is the orthogonal projection onto C(A), by the general projection formula with A replaced by U.
Another way to see this is to check that UU^Tx = x for any x \in C(A) and that UU^Tx = 0 for x \in C(A)^{\perp} = N(A^T). Indeed, UU^TA = UU^TU\Sigma V^T = U\Sigma V^T = A, so UU^Tx = x for any x \in C(A). Next, given x \in N(A^T) we have, by definition, A^Tx = V\Sigma U^Tx = 0, and multiplying by V^T on the left gives \Sigma U^Tx = 0 (since V^TV = I). Multiplying by \Sigma^{-1} on the left gives U^Tx = 0, and hence UU^Tx = 0.
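Both properties are quick to confirm numerically for this example (a check of ours, reusing U from svd(A) and the LinearAlgebra import above):

P = U * U'
P * P ≈ P                     # true: P is idempotent, hence a projection
P * A ≈ A                     # true: P acts as the identity on C(A)
norm(P * [0, 0, 1]) < 1e-14   # true: [0,0,1] ⟂ C(A) is sent to zero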
(From Strang 4.2, problem 10.)
Project a_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} onto the line spanned by a_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. Then project the result back onto the line spanned by a_1. Multiply these projection matrices P_1 P_2: is this a projection?
We can project a_1 onto the line spanned by a_2 by using the projection formula: P_2 a_1 = a_2 \frac{a_2^T a_1}{a_2^T a_2} = \frac{1}{5} \begin{pmatrix} 1 \\ 2 \end{pmatrix}. We can then project the result back onto a_1 using the analogous projection formula: P_1 \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix} = a_1 \frac{a_1^T \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix}}{a_1^T a_1} = \frac{1}{5} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
We can explicitly calculate the components of the projection matrix using the formulae: \begin{align} P_2 &= \frac{a_2 a_2^T}{a_2^Ta_2} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix},\\ P_1 &= \frac{a_1 a_1^T}{a_1^Ta_1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \end{align}
Multiplying these together yields: P = P_1P_2 = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}. This is not a projection matrix. In particular, P^2 = \frac{1}{25} \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} \neq P.
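A quick Julia check of ours confirms the failure of idempotency:

a1, a2 = [1, 0], [1, 2]
P1 = a1*a1' / (a1'a1)   # projection onto the line through a1
P2 = a2*a2' / (a2'a2)   # projection onto the line through a2
P = P1 * P2             # = (1/5) [1 2; 0 0]
P * P ≈ P               # false: P is not a projection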
(Extension of Strang 4.2 problem 19.)
(a) Find the projection matrix P onto the plane x-y-2z=0: choose two vectors in that plane (the null space of what matrix?) and make them columns of A so that the plane is C(A). Then compute (by hand) the projection of the point \begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix} onto this plane.
(b) Alternatively, P = I - \mbox{(rank-1 matrix)} for the rank-1 projection matrix _________ because the plane is the nullspace of _________.
(a)
We first want to find two linearly independent vectors (x, y, z) lying in the plane described by the equation x - y - 2z = 0. This means we want to find the nullspace of the 1 \times 3 matrix \begin{pmatrix} 1 & -1 & -2 \end{pmatrix}. This matrix is in the form where the first column is the pivot column, and the last two columns are free columns. We can then seek two special solutions of the usual form: \begin{align} y = \begin{pmatrix} y_1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; y = \begin{pmatrix} y_2 \\ 0 \\ 1 \end{pmatrix}. \end{align} We then solve to find y_1 = 1 and y_2 = 2, so that a basis for the null space is given by the two vectors \begin{align} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}. \end{align}

We can then define a 3 \times 2 matrix A as A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} so that C(A) is the plane in question. We can then compute the projection p of b = \begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix} onto this plane by p = A\underbrace{(A^TA)^{-1}A^Tb}_{\hat{x}}. As usual, we want to avoid explicitly inverting any matrices if we can, so we will find \hat{x} by solving the normal equations A^TA\hat{x} = A^Tb, and then compute p = A\hat{x}. We compute \begin{align} A^TA &= \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix},\\ A^Tb &= \begin{pmatrix} 6 \\ 12 \end{pmatrix}. \end{align} Then we have the equation \begin{align} \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 6 \\ 12 \end{pmatrix} \end{align} which gives us the linear system \begin{align} 2x_1 + 2x_2 &= 6\\ 2x_1 + 5x_2 &= 12 \end{align} with solution x_1 = 1, x_2 = 2. We can then find the projection p = A\hat{x}: \begin{align} p = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \boxed{\begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix}} \end{align}
(b)
The plane is the nullspace of \begin{pmatrix} 1 & -1 & -2 \end{pmatrix}, so P = I - \frac{aa^T}{a^Ta} where a = [1, -1, -2] and \frac{aa^T}{a^Ta} is the rank-1 projection matrix onto the normal direction a.
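Both approaches are easy to verify in Julia. This sketch of ours checks that the normal-equations projection from (a) and the rank-1 complement from (b) agree:

A = [1 2; 1 0; 0 1]          # columns: a basis for the plane x - y - 2z = 0
b = [0, 6, 12]
a = [1, -1, -2]              # normal vector to the plane
p1 = A * ((A'A) \ (A'b))     # part (a): projection via the normal equations
p2 = b - a * (a'b) / (a'a)   # part (b): (I - aaᵀ/aᵀa) applied to b
p1 ≈ p2 ≈ [5, 1, 2]          # true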
(From Strang, section 4.2, problem 30.)
(a) Find the projection matrix P_C onto the column space C(A) (after looking closely at the matrix!) for A = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix}.
(b) Find the 3 by 3 projection matrix P_R onto the row space of A. (There is a simple way to do it if you realize that the row space is _____-dimensional.) Multiply B = P_C A P_R. Your answer B may be a little surprising at first; can you explain it?
(a) Notice that A only has one linearly independent column, so C(A) is spanned by the vector x_1 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}. We can then find the projection matrix P_C using the simple formula for projecting onto a vector: P_C = \frac{x_1 x_1^T}{x_1^T x_1} = \frac{1}{25}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix}
(b) Similarly, the row space is one-dimensional and is spanned by the vector x_2 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}. We can then find the projection matrix P_R using the simple formula for projecting onto a vector: P_R = \frac{x_2 x_2^T}{x_2^T x_2} = \frac{1}{9}\begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} We can then multiply out B = P_CAP_R: \begin{align} B = P_CAP_R = \frac{1}{25\times 9}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix} \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} = A \end{align}
Since A is rank one, we can write A = uv^T, where u = x_1 and v = x_2 (indeed A = x_1x_2^T). Then \begin{align} AP_R &= \frac{(x_1x_2^T)(x_2x_2^T)}{x_2^Tx_2} = x_1x_2^T = A,\\ P_CA &= \frac{(x_1x_1^T)(x_1x_2^T)}{x_1^Tx_1} = x_1x_2^T = A. \end{align}
So even better than P_CAP_R = A, we have both that P_CA = A and AP_R = A (and then P_CAP_R = A follows from these two).
Why is P_CA = A? Well, two matrices M, N of the same size are equal if and only if Mx = Nx for all possible column vectors x such that the multiplication makes sense. So to see that P_CA = A, we only need to see that P_CAx = Ax for any vector x; but Ax is in the column space C(A), so its projection P_CAx is itself, i.e. P_CAx = Ax. So P_CA = A.
Similarly, to check that AP_R = A we can just check that AP_Rx = Ax for all possible x. Any x in \mathbb{R}^n (let's say that A is m \times n) can be written in the form x = x_n + x_r with x_n in the nullspace and x_r in the row space of A (because these spaces are orthogonal complements). So it's enough to check that AP_Rx_r = Ax_r and AP_Rx_n = Ax_n. In the first case, we have P_Rx_r = x_r by definition of projection, so indeed AP_Rx_r = Ax_r. In the second case, we have P_Rx_n = 0 because N(A) and C(A^T) are orthogonal; as Ax_n = 0 as well by the definition of nullspace, we have AP_Rx_n = Ax_n (= 0). So AP_Rx = Ax for all x, so AP_R = A.
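Finally, a quick Julia check of ours for this example:

A  = [3 6 6; 4 8 8]
x1 = [3, 4]
x2 = [1, 2, 2]
PC = x1*x1' / (x1'x1)   # projection onto C(A)
PR = x2*x2' / (x2'x2)   # projection onto the row space C(Aᵀ)
PC * A ≈ A              # true
A * PR ≈ A              # true
PC * A * PR ≈ A         # true, i.e. B = A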