The SVD of an $m \times n$ matrix $A$ of rank $r$, from class, is $$A = U\Sigma V^T,$$ where $U$ is an $m \times r$ matrix with orthonormal columns (the "best" orthonormal basis for $C(A)$), $\Sigma$ is an $r \times r$ diagonal matrix of singular values $\sigma_k$, and $V$ is an $n \times r$ matrix with orthonormal columns (the "best" orthonormal basis for the row space $C(A^T)$).
Suppose that $r = n$ (i.e. $A$ has full column rank).
(a) $(V^T)^{-1}$ simplifies to ______________ because $V$ in this case is ______________?
(b) Show that the matrix $B = (A^T A)^{-1}A^T$ has an especially simple form in terms of the SVD factors $U,\Sigma,V$.
(c) $B$ is even simpler if $r = n = m$: what is it?
(This matrix $B$ plays an important role in least-squares problems, as we shall see.)
(a) $V$ is a square matrix whose columns form an orthonormal basis of $\mathbb R^n$ so $V^T=V^{-1}$ (unitary). That is, it simplifies to $\boxed{(V^T)^{-1}=V}$ because $V$ in this case is unitary (a.k.a. an orthogonal matrix).
(b) We start by computing $A^TA$. Since $\Sigma^T=\Sigma$ and $U^TU=I_r$ we have
$$ A^TA=(U\Sigma V^T)^TU\Sigma V^T=V\Sigma U^TU\Sigma V^T=V\Sigma^2 V^T. $$Since $V^TV=I_n$ we have $V^{-1}=V^T$ and $(V^T)^{-1}=V$ so
$$(A^TA)^{-1}=(V\Sigma^2 V^T)^{-1}=(V^T)^{-1}\Sigma^{-2}V^{-1}=V\Sigma^{-2}V^T$$where $\Sigma^{-2}$ stands for $(\Sigma^2)^{-1}$. It follows that
$$ (A^TA)^{-1}A^T=V\Sigma^{-2}V^T(U\Sigma V^T)^T=V\Sigma^{-2}V^TV\Sigma U^T=V\Sigma^{-1} U^T. $$To conclude,
$$ \boxed{B=V \Sigma^{-1} U^T} \mbox{ where } \Sigma^{-1} = \begin{pmatrix} \sigma_1^{-1} & & & \\ & \sigma_2^{-1} & & \\ & & \ddots & \\ & & & \sigma_n^{-1} \end{pmatrix} , $$noting that $\Sigma^{-1}$ is especially easy because it is a diagonal matrix.
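As a quick numerical sanity check of this formula (not part of the problem; the random test matrix and variable names here are our own), we can compare $V\Sigma^{-1}U^T$ against $(A^TA)^{-1}A^T$ in Julia:
using LinearAlgebra # for svd and Diagonal
A = randn(5, 3)               # a random 5×3 matrix: full column rank with probability 1
U, σ, V = svd(A)              # thin SVD; σ is the vector of singular values
B = V * Diagonal(1 ./ σ) * U' # the formula B = VΣ⁻¹Uᵀ derived above
B ≈ (A'A) \ Matrix(A')        # true (up to roundoff): matches (AᵀA)⁻¹Aᵀ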
(c)
If $r=n=m$ then $U^T=U^{-1}$ ($U$ is unitary) so that
$$ B=V\Sigma^{-1} U^T=(V^T)^{-1}\Sigma^{-1}U^{-1}=(U\Sigma V^T)^{-1}=A^{-1} . $$Equivalently, if $r=n=m$ then $A$ is invertible, so $$ B = (A^T A)^{-1}A^T = A^{-1} \underbrace{(A^T)^{-1} A^T}_I = A^{-1} . $$
Either way, we obtain $\boxed{B=A^{-1}}$.
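Again as an optional check (our own example, not asked for), for a random square $A$ the same computation should reproduce the inverse:
A = randn(4, 4)               # a random square matrix: invertible with probability 1
(A'A) \ Matrix(A') ≈ inv(A)   # true (up to roundoff): B = A⁻¹ when r = n = m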
Suppose that $v_1, \ldots, v_r$ are any basis for $C(A^T)$ (not necessarily orthogonal as in the SVD), where $A$ is an $m \times n$ matrix of rank $r$.
The vectors $Av_1, \ldots, Av_r$ are not necessarily orthogonal either, but you can show that they must be linearly independent and hence a basis for $C(A)$. Fill in the blanks of the following proof by contradiction:
If $Av_1, \ldots, Av_r$ were linearly dependent, that would mean that $c_1 Av_1 + \cdots + c_rAv_r = 0$ for some coefficients $\vec{c} \ne 0$. But that would require $c_1 v_1 + \cdots + c_r v_r$ to be in the _______________ space of $A$, which means that $c_1 v_1 + \cdots + c_r v_r = \_\_\_$ because the _______________ space is _______________ to $v_1, \ldots, v_r \in C(A^T)$. But it's impossible to have $c_1 v_1 + \cdots + c_r v_r = \_\_\_$ for $\vec{c} \ne 0$ because the vectors $v_1, \ldots, v_r$ are _______________.
If $Av_1, \ldots, Av_r$ were linearly dependent, that would mean that $c_1 Av_1 + \cdots + c_rAv_r = 0$ for some coefficients $\vec{c} \ne 0$. But that would require $c_1 v_1 + \cdots + c_r v_r$ to be in the null space of $A$ (because $c_1 Av_1 + \cdots + c_rAv_r = A(c_1v_1+\cdots +c_rv_r))$, which means that $c_1 v_1 + \cdots + c_r v_r = \boxed{0}$ because the null space is orthogonal to the row space $C(A^T)$ (which contains $v_1, \ldots, v_r$). But it's impossible to have $c_1 v_1 + \cdots + c_r v_r =\boxed{0}$ for $\vec{c} \ne 0$ because the vectors $v_1, \ldots, v_r$ are linearly independent (because they form a basis of $C(A^T)$).
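Here is a small numerical illustration of this fact (our own construction, not asked for): we build a rank-2 matrix, form a deliberately non-orthogonal basis of its row space, and check that the images under $A$ are still independent.
using LinearAlgebra # for svd and rank
A = randn(5, 2) * randn(2, 4) # a random 5×4 matrix of rank r = 2
Vr = svd(A).V[:, 1:2]         # orthonormal basis v₁, v₂ of the row space C(Aᵀ)
W = Vr * [1 1; 0 1]           # columns w₁ = v₁, w₂ = v₁ + v₂: still a basis, not orthogonal
rank(A * W)                   # 2: Aw₁, Aw₂ are linearly independent, as the proof guarantees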
Let $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $b = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}$.
(a) Compute (by hand) the orthogonal projection of $b$ onto the _____-dimensional column space $C(A)$ using the projection matrix $P = A(A^TA)^{-1}A^T$ derived in class. (As usual, you should be able to multiply $P$ by a vector without explicitly computing any matrix inverses.)
(b) Your answer from (a) could have been obtained by no computation at all, just using geometric reasoning: $C(A)$ in this case is simply the subspace of $\mathbb{R}^3$ consisting of _______________, and hence the orthogonal projection of any vector onto this subspace is simply _______________.
(c) Compute $x = (A^T A)^{-1} A^T b$ in Julia (see code below). What is the relationship between $x$ and your answer from (a)?
(d) Compute A \ b in Julia. $A$ is non-square/non-invertible, so this can't be $A^{-1} b$ anymore. What is it?
(e) In terms of the SVD $A = U\Sigma V^T$ computed in Julia with U,Σ,V = svd(A) (see code below), how does $UU^Tb$ (computed in Julia) compare to your answers from the previous parts? Why is this true in general (for any $A$)?
(a)
The space $C(A)$ is two-dimensional since the columns of $A$ are linearly independent. Letting $p$ stand for the orthogonal projection of $b$ onto $C(A)$, we have $p=Pb=A(A^TA)^{-1}A^Tb$. (At this point you might think that we should multiply both sides by $A^T$ to get $A^Tp=A^TA(A^TA)^{-1}A^Tb=A^Tb$ and then solve for $p$. Why is this the wrong thing to do?)
Let $c=A^Tb=[2,5]$, so our goal is to compute $p=A(A^TA)^{-1}c$. Let $\hat{x}=(A^TA)^{-1}c$; once we have $\hat{x}$, we get $p$ via $p=A\hat{x}$. To find $\hat{x}$, note that $\hat{x}=(A^TA)^{-1}c$ is equivalent to $A^TA\hat{x}=c$. Since $A^TA=\begin{pmatrix} 1 & 1\\ 1 &2\end{pmatrix}$ we get the system of equations $\hat{x}_1+\hat{x}_2=2$ and $\hat{x}_1+2\hat{x}_2=5$. The solution is $\hat{x}_1=-1$ and $\hat{x}_2=3$, so $p=A\hat{x}=[2,3,0]$. We conclude $\boxed{Pb=[2,3,0]}$.
(b)
The key here is to notice that the columns of $A$ are both 0 in the third component. This means that the column space $C(A)$ is the "xy plane" in $\mathbb R^3$ (all points $[x,y,0]$, i.e. the subspace orthogonal to $[0,0,1]$) and the orthogonal projection of any vector $[x,y,z]$ onto this plane is just $\boxed{[x,y,0]}$.
Hence, the projection of $b=[2,3,4]$ from part (a) could be seen by inspection to be simply $Pb = [2,3,0]$.
(c)
A = [1 1; 0 1; 0 0]
b = [2, 3, 4]
x = (A'A)\(A'b)
2-element Vector{Float64}:
 -1.0
  3.0
This is the point $x$ (the solution to the normal equations) such that the projection is $Pb = Ax$, i.e. it is exactly the same as the intermediate step $\hat{x}$ from part (a).
(d)
A\b
2-element Vector{Float64}:
 -1.0
  2.999999999999999
The output from Julia is a vector, essentially the same as $x$ from part (c) (up to roundoff error). We see that A\b, for a non-square "tall" matrix $A$, solves the least-squares problem of minimizing $\Vert Ax-b\Vert^2$ over $x$.
(This fact was also noted in lecture, but it's nice to check it explicitly.)
(e)
using LinearAlgebra # for the svd(A) function
U,Σ,V = svd(A)
U*U'*b
3-element Vector{Float64}:
 1.9999999999999998
 3.0
 0.0
The output is essentially the same as $Pb$ from part (a).
The reason is that $UU^T$ is the projection matrix onto $C(A)$. To see this, note that, by definition, the columns of $U$ form an orthonormal basis of $C(A)$ (while the columns of $V$ form an orthonormal basis of $C(A^T)$). Since $U^TU=I$, the general projection formula gives $U(U^TU)^{-1}U^T = UU^T$, which is therefore the orthogonal projection onto $C(U)=C(A)$.
Another way to see this is to check that $UU^Tx=x$ for any $x\in C(A)$ and that $UU^Tx=0$ for any $x\in C(A)^{\perp}=N(A^T)$. Indeed, $UU^TA=UU^TU\Sigma V^T=U\Sigma V^T=A$, so $UU^Tx=x$ for any $x\in C(A)$. Next, given $x\in N(A^T)$ we have, by definition, $A^Tx=V\Sigma U^Tx=0$; multiplying on the left by $V^T$ (and using $V^TV=I$) gives $\Sigma U^Tx=0$. Multiplying by $\Sigma^{-1}$ then gives $U^Tx=0$, and hence $UU^Tx=0$.
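Both properties are easy to check numerically for this particular $A$ (a small check of our own, reusing the variables from the code above; here $N(A^T)$ is spanned by $[0,0,1]$):
U*U'*A ≈ A      # true: UUᵀ fixes everything in C(A)
U*U'*[0, 0, 1]  # essentially the zero vector: UUᵀ annihilates N(Aᵀ)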
(From Strang 4.2, problem 10.)
Project $a_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ onto the line spanned by $a_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}$. Then project the result back onto the line spanned by $a_1$. Multiply these projection matrices $P_1 P_2$: is this a projection?
We can project $a_1$ onto the line spanned by $a_2$ by using the projection formula: $$ P_2 a_1 = a_2 \frac{a_2^T a_1}{a_2^T a_2} = \frac{1}{5} \begin{pmatrix} 1 \\ 2 \end{pmatrix}. $$ We can then project the result back onto $a_1$ using the analogous projection formula: $$ P_1 \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix} = a_1 \frac{a_1^T \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix}}{a_1^T a_1} = \frac{1}{5} \begin{pmatrix} 1 \\ 0 \end{pmatrix}. $$
We can explicitly calculate the components of the projection matrix using the formulae: \begin{align} P_2 &= \frac{a_2 a_2^T}{a_2^Ta_2} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix},\\ P_1 &= \frac{a_1 a_1^T}{a_1^Ta_1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \end{align}
Multiplying these together yields: $$ P = P_1P_2 = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} $$ This is not a projection matrix. In particular, $P^2 = \frac{1}{25} \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} \neq P$.
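A quick Julia check of this conclusion (our own sketch, not part of Strang's problem):
a1 = [1, 0]; a2 = [1, 2]
P1 = a1*a1' / (a1'a1)  # projection onto the line spanned by a₁
P2 = a2*a2' / (a2'a2)  # projection onto the line spanned by a₂
P = P1*P2
P^2 ≈ P                # false: P² = P/5 here, so P₁P₂ is not a projection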
(Extension of Strang 4.2 problem 19.)
(a) Find the projection matrix $P$ onto the plane $x-y-2z=0$: choose two vectors in that plane (the null space of what matrix?) and make them columns of $A$ so that the plane is $C(A)$. Then compute (by hand) the projection of the point $\begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix}$ onto this plane.
(b) Alternatively, $P = I - \mbox{(rank-1 matrix)}$ for the rank-1 projection matrix _________ because the plane is the nullspace of _________.
(a)
We first want to find two linearly independent vectors $(x,y,z)$ lying in the plane described by the equation $x-y-2z=0$. This means we want to find the nullspace of the $1 \times 3$ matrix $A=\begin{pmatrix} 1 & -1 & -2 \end{pmatrix}$. This matrix $A$ is in the form where the first column is the pivot column and the last two columns are free columns. We can then seek two special solutions to $Ay = 0$ of the usual form: \begin{align} y = \begin{pmatrix} y_1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; y = \begin{pmatrix} y_2 \\ 0 \\ 1 \end{pmatrix}. \end{align} We then solve to find $y_1 = 1$ and $y_2 = 2$, so that a basis for the null space of $A$ is given by the two vectors \begin{align} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}. \end{align}
We can then define a $3\times 2$ matrix $A$ as $$ A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} $$ so that $C(A)$ is the plane in question. We can then compute the projection $p$ of $b = \begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix}$ onto this plane by $$ p = A\underbrace{(A^T A)^{-1}A^Tb}_{\hat{x}}. $$ As usual, we want to avoid explicitly inverting any matrices if we can, so we will find $\hat{x}$ by solving the normal equations $A^TA\hat{x} = A^Tb$ and then compute $p=A\hat{x}$. We compute \begin{align} A^TA &= \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix},\\ A^Tb &= \begin{pmatrix} 6 \\ 12 \end{pmatrix}. \end{align} Then we have the equation \begin{align} \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}\begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \end{pmatrix} = \begin{pmatrix} 6 \\ 12 \end{pmatrix}, \end{align} which gives us the linear system \begin{align} 2\hat{x}_1 + 2\hat{x}_2 &= 6,\\ 2\hat{x}_1 + 5\hat{x}_2 &= 12, \end{align} with solution $\hat{x}_1 = 1$, $\hat{x}_2 = 2$. We can then find the projection $p=A\hat{x}$: \begin{align} p = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \boxed{\begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix}}. \end{align}
(b)
The plane is the nullspace of $A=\begin{pmatrix}1 & -1 & -2\end{pmatrix}$, so $P = I -\frac{aa^T}{a^Ta}$ where $a=[1,-1,-2]$ and $\frac{aa^T}{a^Ta}$ is the rank-1 projection matrix onto the normal direction $a$.
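Though the problem asks for a hand computation, both parts are easy to verify in Julia (our own check):
using LinearAlgebra # for I
A = [1 2; 1 0; 0 1]
b = [0, 6, 12]
p = A * ((A'A) \ (A'b))  # [5, 1, 2]: the projection from part (a)
a = [1, -1, -2]
(I - a*a'/(a'a)) * b ≈ p # true: the rank-1 formula from part (b) agrees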
(From Strang, section 4.2, problem 30.)
(a) Find the projection matrix $P_C$ onto the column space $C(A)$ (after looking closely at the matrix!) for $A = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix}$.
(b) Find the 3 by 3 projection matrix $P_R$ onto the row space of $A$. (There is a simple way to do it if you realize that the row space is _____-dimensional.) Multiply $B = P_C A P_R$. Your answer $B$ may be a little surprising at first — can you explain it?
(a) Notice that $A$ only has one linearly independent column, so $C(A)$ is spanned by the vector $x_1 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}$. We can then find the projection matrix $P_C$ using the simple formula for projecting onto a vector: $$ P_C = \frac{x_1 x_1^T}{x_1^T x_1} = \frac{1}{25}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix} $$
(b) Similarly, the row space is one dimensional and is spanned by the vector $x_2 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$. We can then find the projection matrix $P_R$ using the simple formula for projecting onto a vector: $$ P_R = \frac{x_2 x_2^T}{x_2^T x_2} = \frac{1}{9}\begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} $$ We can then multiply $B = P_CAP_R$ \begin{align} B = P_CAP_R = \frac{1}{25\times 9}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix} \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} = A \end{align}
Since $A$ is rank one, we can write $A = uv^T$, where $u = x_1$ and $v = x_2$. Then $$ AP_R = \frac{(x_1x_2^T)(x_2x_2^T)}{x_2^Tx_2} = x_1x_2^T = A, \qquad P_CA = \frac{(x_1x_1^T)(x_1x_2^T)}{x_1^Tx_1} = x_1x_2^T = A. $$
So even better than $P_CAP_R = A$, we have both that $P_CA = A$ and $AP_R = A$ (and then $P_CAP_R = A$ follows from these two).
Why is $P_CA = A$? Well, two matrices $M, N$ of the same size are equal if and only if $Mx = Nx$ for all possible column vectors $x$ such that the multiplication makes sense. So to see that $P_CA = A$, we only need to see that $P_CAx = Ax$ for any vector $x$; but $Ax$ is in the column space $C(A)$, so its projection $P_CAx$ is itself, i.e. $P_CAx = Ax$. So $P_CA = A$.
Similarly, to check that $AP_R = A$ we can just check that $AP_Rx = Ax$ for all possible $x$. Any $x$ in $\mathbb{R}^n$ (let's say that $A$ is $m \times n$) can be written in the form $x = x_n + x_r$ with $x_n$ in the nullspace and $x_r$ in the row space of $A$ (because these spaces are orthogonal complements). So it's enough to check that $AP_Rx_r = Ax_r$ and $AP_Rx_n = Ax_n$. In the first case, we have $P_Rx_r = x_r$ by definition of projection, so indeed $AP_Rx_r = Ax_r$. In the second case, we have $P_Rx_n = 0$ because $N(A)$ and $C(A^T)$ are orthogonal; as $Ax_n = 0$ as well by the definition of nullspace, we have $AP_Rx_n = Ax_n\ (= 0)$. So $AP_Rx = Ax$ for all $x$, and hence $AP_R = A$.
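Finally, all three identities are easy to confirm in Julia (our own check of Strang's problem, with x1 and x2 as above):
A = [3 6 6; 4 8 8]
x1 = [3, 4]; x2 = [1, 2, 2]
PC = x1*x1' / (x1'x1)  # projection onto C(A)
PR = x2*x2' / (x2'x2)  # projection onto C(Aᵀ)
PC*A ≈ A, A*PR ≈ A, PC*A*PR ≈ A  # (true, true, true)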