The SVD of an m \times n matrix A of rank r, from class, is A = U\Sigma V^T, where U is an m \times r matrix with orthonormal columns (the "best" orthonormal basis for C(A)), \Sigma is an r \times r diagonal matrix of singular values \sigma_k, and V is an n \times r matrix with orthonormal columns (the "best" orthonormal basis for the row space C(A^T)).
Suppose that r=n (i.e. A has full column rank).
(a) (V^T)^{-1} simplifies to ______________ because V in this case is ______________?
(b) Show that the matrix B = (A^TA)^{-1}A^T has an especially simple form in terms of the SVD factors U, \Sigma, V.
(c) B is even simpler if r=n=m: what is it?
(This matrix B plays an important role in least-squares problems, as we shall see.)
(a) V is a square matrix whose columns form an orthonormal basis of \mathbb{R}^n, so V^T = V^{-1} (unitary). That is, it simplifies to (V^T)^{-1} = V because V in this case is unitary (a.k.a. an orthogonal matrix).
(b) We start by computing A^TA. Since \Sigma^T = \Sigma and U^TU = I_r we have
A^TA = (U\Sigma V^T)^T U\Sigma V^T = V\Sigma U^T U\Sigma V^T = V\Sigma^2 V^T.
Since V^TV = I_n we have V^{-1} = V^T and (V^T)^{-1} = V, so
(A^TA)^{-1} = (V\Sigma^2 V^T)^{-1} = (V^T)^{-1}\Sigma^{-2}V^{-1} = V\Sigma^{-2}V^T,
where \Sigma^{-2} stands for (\Sigma^2)^{-1}. It follows that
(A^TA)^{-1}A^T = V\Sigma^{-2}V^T(U\Sigma V^T)^T = V\Sigma^{-2}V^T V\Sigma U^T = V\Sigma^{-1}U^T.
To conclude,
B = V\Sigma^{-1}U^T, \quad \text{where } \Sigma^{-1} = \begin{pmatrix} \sigma_1^{-1} & & & \\ & \sigma_2^{-1} & & \\ & & \ddots & \\ & & & \sigma_n^{-1} \end{pmatrix},
noting that \Sigma^{-1} is especially easy to compute because \Sigma is a diagonal matrix.
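This formula is easy to sanity-check numerically. Here is a short Julia sketch of ours (not part of the problem; the random matrix and variable names are ours): it compares B = (A^TA)^{-1}A^T, computed via the normal equations, with V\Sigma^{-1}U^T from the SVD.

using LinearAlgebra

A = randn(5, 3)                     # a random 5×3 matrix (full column rank with probability 1)
U, σ, V = svd(A)                    # Julia returns the singular values σ as a vector
B_normal = (A'A) \ A'               # B = (AᵀA)⁻¹Aᵀ, without forming an explicit inverse
B_svd = V * Diagonal(1 ./ σ) * U'   # B = VΣ⁻¹Uᵀ
B_normal ≈ B_svd                    # true (up to roundoff)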
(c)
If r = n = m then U^T = U^{-1} (U is unitary), so that
B = V\Sigma^{-1}U^T = (V^T)^{-1}\Sigma^{-1}U^{-1} = (U\Sigma V^T)^{-1} = A^{-1}.
Equivalently, if r = n = m then A is invertible, so
B = (A^TA)^{-1}A^T = A^{-1}\underbrace{(A^T)^{-1}A^T}_{I} = A^{-1}.
Either way, we obtain B = A^{-1}.
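Continuing the sketch above, the square case is a one-line check (again ours, with made-up variables):

A = randn(4, 4)                      # a random square matrix (invertible with probability 1)
U, σ, V = svd(A)
V * Diagonal(1 ./ σ) * U' ≈ inv(A)   # true (up to roundoff)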
Suppose that v_1, \ldots, v_r form any basis for C(A^T) (not necessarily orthogonal as in the SVD), where A is an m \times n matrix of rank r.
The vectors Av_1, \ldots, Av_r are not necessarily orthogonal either, but you can show that they must be linearly independent and hence a basis for C(A). Fill in the blanks of the following proof by contradiction:
If Av_1, \ldots, Av_r were linearly dependent, that would mean that c_1Av_1 + \cdots + c_rAv_r = 0 for some coefficients \vec{c} \neq 0. But that would require c_1v_1 + \cdots + c_rv_r to be in the _______________ space of A, which means that c_1v_1 + \cdots + c_rv_r = ___ because the _______________ space is _______________ to v_1, \ldots, v_r \in C(A^T). But it's impossible to have c_1v_1 + \cdots + c_rv_r = ___ for \vec{c} \neq 0 because the vectors v_1, \ldots, v_r are _______________.
If Av_1, \ldots, Av_r were linearly dependent, that would mean that c_1Av_1 + \cdots + c_rAv_r = 0 for some coefficients \vec{c} \neq 0. But that would require c_1v_1 + \cdots + c_rv_r to be in the null space of A (because c_1Av_1 + \cdots + c_rAv_r = A(c_1v_1 + \cdots + c_rv_r)), which means that c_1v_1 + \cdots + c_rv_r = 0 because the null space is orthogonal to the row space C(A^T) (which contains v_1, \ldots, v_r). But it's impossible to have c_1v_1 + \cdots + c_rv_r = 0 for \vec{c} \neq 0 because the vectors v_1, \ldots, v_r are linearly independent (because they form a basis of C(A^T)).
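Though the proof is all that's needed, the statement is easy to illustrate numerically. In this Julia sketch of ours (random matrices and names of our choosing), we take a random non-orthogonal basis of the row space and check that its image under A still has rank r:

using LinearAlgebra

A = randn(6, 4) * randn(4, 5)   # a random 6×5 matrix of rank r = 4 (with probability 1)
r = rank(A)
V = svd(A).V[:, 1:r]            # columns: an orthonormal basis v₁,…,v_r of C(Aᵀ)
W = V * randn(r, r)             # mix into a random, non-orthogonal basis of C(Aᵀ)
rank(A * W) == r                # true: the images of the basis vectors are independent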
Let A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} and b = \begin{pmatrix} 2 \\ 3 \\ 4 \end{pmatrix}.
(a) Compute (by hand) the orthogonal projection of b onto the _____-dimensional column space C(A) using the projection matrix P = A(A^TA)^{-1}A^T derived in class. (As usual, you should be able to multiply P by a vector without explicitly computing any matrix inverses.)
(b) Your answer from (a) could have been obtained by no computation at all, just using geometric reasoning: C(A) in this case is simply the subspace of \mathbb{R}^3 consisting of _______________, and hence the orthogonal projection of any vector onto this subspace is simply _______________.
(c) Compute x = (A^TA)^{-1}A^Tb in Julia (see code below). What is the relationship between x and your answer from (a)?
(d) Compute A \ b in Julia. A is non-square/non-invertible, so this can't be A^{-1}b anymore. What is it?
(e) In terms of the SVD A = U\Sigma V^T computed in Julia with U,Σ,V = svd(A) (see code below), how does UU^Tb (computed in Julia) compare to your answers from the previous parts? Why is this true in general (for any A)?
(a)
The space C(A) is two-dimensional since the columns of A are linearly independent. Letting y stand for the orthogonal projection of b onto C(A), we have y = Pb = A(A^TA)^{-1}A^Tb. (At this point you might think that we should multiply by A^T on both sides to get A^Ty = A^TA(A^TA)^{-1}A^Tb = A^Tb and then solve for y. Why is this the wrong thing to do?)
Let c = A^Tb = [2, 5], so our goal is to solve p = A(A^TA)^{-1}c for p. Let \hat{x} = (A^TA)^{-1}c, so once we have \hat{x} we will get p via p = A\hat{x}. To solve for \hat{x} we note that \hat{x} = (A^TA)^{-1}c is equivalent to c = A^TA\hat{x}. Since A^TA = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} we get the system of equations 2 = \hat{x}_1 + \hat{x}_2 and 5 = \hat{x}_1 + 2\hat{x}_2. The solution is \hat{x}_1 = -1 and \hat{x}_2 = 3, so we get that p = A\hat{x} = [2, 3, 0]. We conclude Pb = [2, 3, 0].
(b)
The key here is to notice that the columns of A are both 0 in the third component. This means that the column space C(A) is the "xy plane" in \mathbb{R}^3 (all points [x, y, 0], i.e. the subspace orthogonal to [0, 0, 1]) and the orthogonal projection of any vector [x, y, z] onto this plane is just [x, y, 0].
Hence, the projection of b = [2, 3, 4] from part (a) could be seen by inspection to be simply Pb = [2, 3, 0].
(c)
A = [1 1; 0 1; 0 0]
b = [2, 3, 4]
x = (A'A) \ (A'b)   # solve the normal equations AᵀA x = Aᵀb, with no explicit inverse
2-element Vector{Float64}:
 -1.0
  3.0
This is the point x (the solution to the normal equations) such that the projection is Pb = Ax, i.e. it is exactly the same as the intermediate step \hat{x} from part (a).
(d)
A\b
2-element Vector{Float64}:
 -1.0
  2.999999999999999
The output from Julia is a vector, essentially the same as x from part (c). We see that A\b, for a non-square "tall" matrix A, solves the least-squares problem which minimizes \|b - Ax\| over x.
(This fact was also noted in lecture, but it's nice to check it explicitly.)
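One can also check the minimization property directly. The sketch below (ours, using the A and b defined above) compares the residual at the computed solution with residuals at randomly perturbed points:

using LinearAlgebra

xhat = A \ b       # the least-squares solution
norm(b - A*xhat)   # the minimum residual ‖b - Ax̂‖; here it is ≈ 4.0
all(norm(b - A*(xhat + 0.1*randn(2))) >= norm(b - A*xhat) for _ in 1:1000)   # true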
(e)
using LinearAlgebra # for the svd(A) function
U,Σ,V = svd(A)
U*U'*b
3-element Vector{Float64}:
 1.9999999999999998
 3.0
 0.0
The output is essentially the same as Pb from part (a).
The reason is that UU^T is the projection matrix onto C(A). To see this we note that, by definition, the columns of U form an orthonormal basis of C(A) (while the columns of V form an orthonormal basis of C(A^T)), so UU^T = U(U^TU)^{-1}U^T (using U^TU = I) is the orthogonal projection onto C(A), by the general projection formula with A replaced by U.
Another way to see this is to check that UU^Tx = x for any x \in C(A) and that UU^Tx = 0 for x \in C(A)^{\perp} = N(A^T). Indeed, UU^TA = UU^TU\Sigma V^T = U\Sigma V^T = A, so UU^Tx = x for any x \in C(A). Next, given x \in N(A^T) we have, by definition, A^Tx = V\Sigma U^Tx = 0, and multiplying by V^T on the left gives \Sigma U^Tx = 0 (since V^TV = I). Multiplying by \Sigma^{-1} on the left gives U^Tx = 0, and hence UU^Tx = 0.
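Both properties are quick to confirm numerically for this example (a check of ours, reusing U from svd(A) and the LinearAlgebra import above):

P = U * U'
P * P ≈ P                     # true: P is idempotent, hence a projection
P * A ≈ A                     # true: P acts as the identity on C(A)
norm(P * [0, 0, 1]) < 1e-14   # true: [0,0,1] ⟂ C(A) is sent to zero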
(From Strang 4.2, problem 10.)
Project a_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} onto the line spanned by a_2 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. Then project the result back onto the line spanned by a_1. Multiply these projection matrices P_1 P_2: is this a projection?
We can project a_1 onto the line spanned by a_2 by using the projection formula: P_2 a_1 = a_2 \frac{a_2^T a_1}{a_2^T a_2} = \frac{1}{5} \begin{pmatrix} 1 \\ 2 \end{pmatrix}. We can then project the result back onto a_1 using the analogous projection formula: P_1 \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix} = a_1 \frac{a_1^T \begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \end{pmatrix}}{a_1^T a_1} = \frac{1}{5} \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
We can explicitly calculate the components of the projection matrix using the formulae: \begin{align} P_2 &= \frac{a_2 a_2^T}{a_2^Ta_2} = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix},\\ P_1 &= \frac{a_1 a_1^T}{a_1^Ta_1} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}. \end{align}
Multiplying these together yields: P = P_1P_2 = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix}. This is not a projection matrix. In particular, P^2 = \frac{1}{25} \begin{pmatrix} 1 & 2 \\ 0 & 0 \end{pmatrix} \neq P.
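A quick Julia check of ours confirms the failure of idempotency:

a1, a2 = [1, 0], [1, 2]
P1 = a1*a1' / (a1'a1)   # projection onto the line through a1
P2 = a2*a2' / (a2'a2)   # projection onto the line through a2
P = P1 * P2             # = (1/5) [1 2; 0 0]
P * P ≈ P               # false: P is not a projection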
(Extension of Strang 4.2 problem 19.)
(a) Find the projection matrix P onto the plane x-y-2z=0: choose two vectors in that plane (the null space of what matrix?) and make them columns of A so that the plane is C(A). Then compute (by hand) the projection of the point \begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix} onto this plane.
(b) Alternatively, P = I - \mbox{(rank-1 matrix)} for the rank-1 projection matrix _________ because the plane is the nullspace of _________.
(a)
We first want to find two linearly independent vectors (x, y, z) lying in the plane described by the equation x - y - 2z = 0. This means we want to find the nullspace of the 1 \times 3 matrix \begin{pmatrix} 1 & -1 & -2 \end{pmatrix}. This matrix is in the form where the first column is the pivot column, and the last two columns are free columns. We can then seek two special solutions of the usual form: \begin{align} y = \begin{pmatrix} y_1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; y = \begin{pmatrix} y_2 \\ 0 \\ 1 \end{pmatrix}. \end{align} We then solve to find y_1 = 1 and y_2 = 2, so that a basis for the null space is given by the two vectors \begin{align} \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \;\;\; \text{and} \;\;\; \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}. \end{align}

We can then define a 3 \times 2 matrix A as A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} so that C(A) is the plane in question. We can then compute the projection p of b = \begin{pmatrix} 0 \\ 6 \\ 12 \end{pmatrix} onto this plane by p = A\underbrace{(A^TA)^{-1}A^Tb}_{\hat{x}}. As usual, we want to avoid explicitly inverting any matrices if we can, so we will find \hat{x} by solving the normal equations A^TA\hat{x} = A^Tb, and then compute p = A\hat{x}. We compute \begin{align} A^TA &= \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix},\\ A^Tb &= \begin{pmatrix} 6 \\ 12 \end{pmatrix}. \end{align} Then we have the equation \begin{align} \begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 6 \\ 12 \end{pmatrix} \end{align} which gives us the linear system \begin{align} 2x_1 + 2x_2 &= 6\\ 2x_1 + 5x_2 &= 12 \end{align} with solution x_1 = 1, x_2 = 2. We can then find the projection p = A\hat{x}: \begin{align} p = \begin{pmatrix} 1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \boxed{\begin{pmatrix} 5 \\ 1 \\ 2 \end{pmatrix}} \end{align}
(b)
The plane is the nullspace of \begin{pmatrix} 1 & -1 & -2 \end{pmatrix}, so P = I - \frac{aa^T}{a^Ta} where a = [1, -1, -2] and \frac{aa^T}{a^Ta} is the rank-1 projection matrix onto the normal direction a.
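Both approaches are easy to verify in Julia. This sketch of ours checks that the normal-equations projection from (a) and the rank-1 complement from (b) agree:

A = [1 2; 1 0; 0 1]          # columns: a basis for the plane x - y - 2z = 0
b = [0, 6, 12]
a = [1, -1, -2]              # normal vector to the plane
p1 = A * ((A'A) \ (A'b))     # part (a): projection via the normal equations
p2 = b - a * (a'b) / (a'a)   # part (b): (I - aaᵀ/aᵀa) applied to b
p1 ≈ p2 ≈ [5, 1, 2]          # true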
(From Strang, section 4.2, problem 30.)
(a) Find the projection matrix P_C onto the column space C(A) (after looking closely at the matrix!) for A = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix}.
(b) Find the 3 by 3 projection matrix P_R onto the row space of A. (There is a simple way to do it if you realize that the row space is _____-dimensional.) Multiply B = P_C A P_R. Your answer B may be a little surprising at first; can you explain it?
(a) Notice that A only has one linearly independent column, so C(A) is spanned by the vector x_1 = \begin{pmatrix} 3 \\ 4 \end{pmatrix}. We can then find the projection matrix P_C using the simple formula for projecting onto a vector: P_C = \frac{x_1 x_1^T}{x_1^T x_1} = \frac{1}{25}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix}
(b) Similarly, the row space is one-dimensional and is spanned by the vector x_2 = \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}. We can then find the projection matrix P_R using the simple formula for projecting onto a vector: P_R = \frac{x_2 x_2^T}{x_2^T x_2} = \frac{1}{9}\begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} We can then multiply out B = P_CAP_R: \begin{align} B = P_CAP_R = \frac{1}{25\times 9}\begin{pmatrix} 9 & 12 \\ 12 & 16 \end{pmatrix} \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} \begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 4 \\ 2 & 4 & 4 \end{pmatrix} = \begin{pmatrix} 3 & 6 & 6 \\ 4 & 8 & 8 \end{pmatrix} = A \end{align}
Since A is rank one, we can write A = uv^T, where u = x_1 and v = x_2 (indeed A = x_1x_2^T). Then \begin{align} AP_R &= \frac{(x_1x_2^T)(x_2x_2^T)}{x_2^Tx_2} = x_1x_2^T = A,\\ P_CA &= \frac{(x_1x_1^T)(x_1x_2^T)}{x_1^Tx_1} = x_1x_2^T = A. \end{align}
So even better than P_CAP_R = A, we have both that P_CA = A and AP_R = A (and then P_CAP_R = A follows from these two).
Why is P_CA = A? Well, two matrices M, N of the same size are equal if and only if Mx = Nx for all possible column vectors x such that the multiplication makes sense. So to see that P_CA = A, we only need to see that P_CAx = Ax for any vector x; but Ax is in the column space C(A), so its projection P_CAx is itself, i.e. P_CAx = Ax. So P_CA = A.
Similarly, to check that AP_R = A we can just check that AP_Rx = Ax for all possible x. Any x in \mathbb{R}^n (let's say that A is m \times n) can be written in the form x = x_n + x_r with x_n in the nullspace and x_r in the row space of A (because these spaces are orthogonal complements). So it's enough to check that AP_Rx_r = Ax_r and AP_Rx_n = Ax_n. In the first case, we have P_Rx_r = x_r by definition of projection, so indeed AP_Rx_r = Ax_r. In the second case, we have P_Rx_n = 0 because N(A) and C(A^T) are orthogonal; as Ax_n = 0 as well by the definition of nullspace, we have AP_Rx_n = Ax_n (= 0). So AP_Rx = Ax for all x, so AP_R = A.
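Finally, a quick Julia check of ours for this example:

A  = [3 6 6; 4 8 8]
x1 = [3, 4]
x2 = [1, 2, 2]
PC = x1*x1' / (x1'x1)   # projection onto C(A)
PR = x2*x2' / (x2'x2)   # projection onto the row space C(Aᵀ)
PC * A ≈ A              # true
A * PR ≈ A              # true
PC * A * PR ≈ A         # true, i.e. B = A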