(1) Let A=UΣVᵀ be the compact SVD of A. Write the projection matrix onto the column space of A in simplest terms, using (possibly) U, Σ, or V.
Solution: In the compact SVD, $\text{col}(U)=\text{col}(A)$. Since the columns of $U$ are orthonormal (whether $U$ is square or "tall-skinny"), the projection onto $\text{col}(U)=\text{col}(A)$ is given by $UU^{\intercal}$.
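Not part of the problem, but a quick numerical sanity check of this claim (a sketch on a random low-rank matrix; all the names below are made up for this check): U*Uᵀ built from the compact SVD should reproduce any vector already lying in the column space.
using LinearAlgebra
M = randn(5, 3) * randn(3, 4)          # 5×4 matrix of rank at most 3
UM, sM, VM = svd(M)                    # thin SVD
r = count(>(1e-10 * sM[1]), sM)        # numerical rank
U1 = UM[:, 1:r]                        # "compact" U: orthonormal basis of col(M)
b = M * randn(4)                       # b lies in col(M) by construction
U1 * U1' * b ≈ b                       # expected: true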
(2) Let A be the matrix below with the full SVD (Note: numbers with an e-16 or e-15 may be considered to be 0)
A = [1 3 1;3 8 2;2 4 0]
3×3 Array{Int64,2}:
 1  3  1
 3  8  2
 2  4  0
using LinearAlgebra
U,s,V =svd(A, full=true)
display(U)
display(s)
display(V)
3×3 Array{Float64,2}:
 -0.31854   -0.369631  -0.872872
 -0.848437  -0.299463   0.436436
 -0.422713   0.879599  -0.218218
3-element Array{Float64,1}:
 10.335397940329974
  1.0860706307708408
  5.333338897353784e-16
3×3 Adjoint{Float64,Array{Float64,2}}:
 -0.358891   0.452251    0.816497
 -0.912783   0.0127012  -0.408248
 -0.195001  -0.8918      0.408248
A) Argue that (b₁,b₂,b₃) is in the column space if and only if 4b₁-2b₂+b₃=0 (you can use Julia or eyeball and approximate the right numbers to two digits or so)
B) What combinations of rows of A gives the zero row?
C) Eyeballing the numbers some more, what combination of columns of A gives the zero column?
Solution:
A) As $A$ has $2$ positive singular values, we conclude that $A$ is of rank $2$ and therefore that $\text{col}(A)$ is the space of dimension $2$ spanned by the first two columns of $U$. Such a space is simply a plane through the origin in $\mathbb{R}^3$ and so exhibiting a single vector $n \in \mathbb{R}^3$ that is orthogonal to $\text{col}(A)$ suffices to describe $\text{col}(A)$. As $U$ is an orthogonal matrix, the third column of $U$ is orthogonal to the space spanned by the first two columns, and so, since $(4,-2,1)$ is parallel to the third column of $U$, $n=(4,-2,1)$ is normal to $\text{col}(A)$. Therefore
$$\text{col}(A)=\{(b_1,b_2,b_3)|(b_1,b_2,b_3) \cdot (4,-2,1)=0\}=\{(b_1,b_2,b_3)|4b_1-2b_2+b_3=0\}$$.
B) Writing $A_1,A_2,A_3$ for the rows of $A$, we have that $c_1 A_1+ c_2 A_2 + c_3 A_3 =0$ precisely when $(c_1,c_2,c_3) \in \text{null}(A^{\intercal})$. As $\text{null}(A^{\intercal})$ is spanned by the third column of $U$, we have that $c_1 A_1+ c_2 A_2 + c_3 A_3 =0$ precisely when $(c_1,c_2,c_3)=\alpha(4,-2,1)$ for some $\alpha \in \mathbb{R}$.
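Not part of the original question, but a quick check (with the matrix A defined above): the same vector $(4,-2,1)$ answers both (A) and (B), since a combination of rows giving the zero row is exactly an element of $\text{null}(A^{\intercal})$.
[4 -2 1] * A          # expected: [0 0 0]  (4*row1 - 2*row2 + 1*row3)
A' * [4, -2, 1]       # expected: [0, 0, 0]  ((4,-2,1) is orthogonal to every column of A)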
C) Analogously to part (B), if we write $K_1,K_2,K_3$ for the columns of $A$, we have that $c_1 K_1+ c_2 K_2 + c_3 K_3 =0$ precisely when $(c_1,c_2,c_3) \in \text{null}(A)$. As $\text{null}(A)$ is spanned by the third column of $V$, we have that $c_1 K_1+ c_2 K_2 + c_3 K_3 =0$ precisely when $(c_1,c_2,c_3)=\alpha(2,-1,1)$ for some $\alpha \in \mathbb{R}$.
# D) What value of i would have made the answer to A and B easier?
# Type it in and execute.
i = 3
U[:,i]/U[i,i]
3-element Array{Float64,1}:
  3.999999999999992
 -1.999999999999997
  1.0
We therefore see that the third column of $U$ is indeed parallel to $(4,-2,1)$.
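The analogous check for part (C), which part (D) did not ask for: scaling the third column of $V$ in the same way recovers the column combination.
V[:, 3] / V[3, 3]     # expected: approximately [2.0, -1.0, 1.0]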
(3) In the below we practice finding the general solution to Ax=b, in the context of a floating point computation. You should be able to eyeball the solutions and write nice numbers, assuming that reasonable rounding and approximate guessing will work out.
A = [1 3 1 2;2 6 4 8; 0 0 2 4]
3×4 Array{Int64,2}:
 1  3  1  2
 2  6  4  8
 0  0  2  4
# using LinearAlgebra
U,s,V =svd(A, full=true)
display(U)
display(s)
display(V)
3×3 Array{Float64,2}:
 -0.297548  -0.494771   0.816497
 -0.903516  -0.130351  -0.408248
 -0.308421   0.859191   0.408248
3-element Array{Float64,1}:
 12.117224233268539
  2.858824387871658
  1.7206525645354256e-15
4×4 Adjoint{Float64,Array{Float64,2}}:
 -0.173685  -0.26426    -0.948683     -1.7461e-16
 -0.521055  -0.792781    0.316228      5.1856e-17
 -0.373721   0.245628    3.05311e-16  -0.894427
 -0.747441   0.491255    6.10623e-16   0.447214
Suppose we calculate (and we ignore floating point effects)
U*U'*[-.5, 2, 3] #Careful this is the full U
3-element Array{Float64,1}:
 -0.5
  1.9999999999999978
  2.9999999999999987
A) What does the above say about whether Ax=b has a solution?
B) Find the right r in the below. Fill it in and execute.
C) What does the below say about the solution to Ax=b?
D) Convert these decimals to simple fractions and check by hand.
r = 2
(V[:,1:r]*((U[:,1:r]'*[-.5,2,3])./s[1:r]))
4-element Array{Float64,1}:
 -0.1999999999999994
 -0.6000000000000001
  0.2999999999999999
  0.5999999999999998
E) Write down the general solution to Ax=b by eyeballing the information in the SVD.
Solution:
A) In the full SVD, $U$ is a square orthogonal matrix and so $UU^{\intercal}=I$, therefore the above calculation says nothing about whether or not $Ax=b$ has a solution.
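A one-line numerical confirmation (using the full U computed above for this problem):
norm(U * U' - I)      # expected: essentially 0, so U*U'*b reproduces any b whatsoever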
B) As $A$ has two positive singular values, it has rank $r=2$.
C) Writing $U_1$ for the first two columns of $U$, we see that $Ax=b$ has at least one solution precisely when $U_1 U_1^{\intercal} b=b$. By direct (Julia) calculation, $U_1 U_1^{\intercal} b=b$ holds, and so $Ax=b$ has at least one solution. In particular, if $A$ has (compact) SVD $A=U_1 \Sigma_r V_1^{\intercal}$, then $x=V_1 \Sigma_r^{-1} U_1^{\intercal}b$ is a solution, as $Ax=U_1 \Sigma_r V_1^{\intercal}V_1 \Sigma_r^{-1} U_1^{\intercal}b=U_1 \Sigma_r \Sigma_r^{-1} U_1^{\intercal}b=U_1 U_1^{\intercal}b=b$. Given that $V_1 \Sigma_r^{-1} U_1^{\intercal}b$ is precisely what is computed in the code above (with $r=2$), we see that $x=\frac{1}{10}(-2,-6,3,6)$ is a solution.
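The "direct (Julia) calculation" referred to above can be written out explicitly as a short check (assuming the U from this problem's SVD is still in scope; b and U1 are new names):
b = [-0.5, 2, 3]
U1 = U[:, 1:2]        # columns of U corresponding to the positive singular values
U1 * U1' * b ≈ b      # expected: true, so Ax = b has at least one solution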
D) $$Ax=\frac{1}{10}\left[\begin{array}{cccc} 1 & 3 & 1 & 2\\ 2 & 6 & 4 & 8\\ 0 & 0 & 2 & 4\end{array}\right] \left[\begin{array}{c} -2\\ -6\\ 3\\ 6\end{array}\right]= \frac{1}{10}\left[\begin{array}{c} -5\\ 20\\ 30\end{array}\right]= \left[\begin{array}{c} -.5\\ 2\\ 3\end{array}\right].$$
E) Writing $x_p=\frac{1}{10}(-2,-6,3,6)$ for the particular solution obtained above, the set of all solutions of $Ax=b$ is given by $\{x_p+n \mid n\in \text{null}(A)\}$. As the nullspace of $A$ is spanned by the columns of $V_2$, where $V_2$ consists of the last two columns of $V$, we conclude that the set of all solutions of $Ax=b$ is precisely $\{x_p+\alpha p +\beta q \mid \alpha,\beta \in \mathbb{R}\}$, where $p=(-3,1,0,0)$ and $q=(0,0,-2,1)$.
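As a quick numerical sanity check of the general solution (a sketch reusing this problem's A; the names below are introduced just for the check):
x_p = [-2, -6, 3, 6] / 10
p, q = [-3, 1, 0, 0], [0, 0, -2, 1]
α, β = randn(), randn()
A * (x_p + α*p + β*q) ≈ [-0.5, 2, 3]    # expected: true for any α and β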
(4) (based on Problem 1 on p. 175 of GS) Use the SVD to show A) that v₁,v₂,v₃ are independent but B) v₁,v₂,v₃,v₄ are dependent.
v₁ = [1,0,0]
v₂ = [1,1,0]
v₃ = [1,1,1]
v₄ = [2,3,4]
A = [v₁ v₂ v₃]
3×3 Array{Int64,2}:
 1  1  1
 0  1  1
 0  0  1
svdvals(A)
3-element Array{Float64,1}:
 2.246979603717467
 0.8019377358048381
 0.5549581320873711
B = [v₁ v₂ v₃ v₄] #fill in the right values (Note you can copy-paste subscripts or type v\_1<tab>)
svdvals(B)
3-element Array{Float64,1}:
 5.737738647419861
 1.3222083275870498
 0.5745610083915811
Solution:
For an $m \times n$ matrix $T$, write $T_1,\ldots,T_n$ where $T_i$ is the vector consisting of all entries in column $i$ of $T$. Then $T_1,\ldots,T_n$ are linearly independent if and only if the only $c_1,\ldots,c_n \in \mathbb{R}$ such that $c_1 T_1+\cdots+c_n T_n=0$ is $c_1=\cdots=c_n=0$. This occurs if and only if the nullspace of $T$ consists only of the zero vector.
A) $A$ has $3$ positive singular values, and so it has rank $3$. $A$ has $3$ columns, and so its nullspace consists only of the zero vector, which by the above implies that the columns of $A$ are independent. As the columns of $A$ are precisely $v_1,v_2,v_3$ we are done.
B) $B$ also has $3$ positive singular values, and so its rank is also $3$. However, $B$ has $4$ columns, and so its nullspace does not consist of only the zero vector, which by the above implies that the columns of $B$ are dependent. As the columns of $B$ are precisely $v_1,v_2,v_3,v_4$ we are done.
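The same conclusion can be read off programmatically; here is a small helper (a sketch, not part of the assignment) that counts singular values above a tolerance and compares against the number of columns:
numrank(M; tol=1e-10) = (σ = svdvals(M); count(>(tol * σ[1]), σ))
numrank(A), size(A, 2)    # (3, 3): rank equals the number of columns, so independent
numrank(B), size(B, 2)    # (3, 4): rank is less than the number of columns, so dependent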
(5) From GS Problem 16 (page 176)
Solution:
A) This subspace consists of all vectors of the form $(x,x,x,x) \in \mathbb{R}^4$, which is a $1$-dimensional subspace of $\mathbb{R}^4$ for which a valid basis is the single vector $(1,1,1,1)$.
B) This subspace consists of all vectors $(a,b,c,d) \in \mathbb{R}^4$ such that $(a,b,c,d) \cdot (1,1,1,1) =0$, which is a $3$-dimensional subspace of $\mathbb{R}^4$, specifically a hyperplane through the origin. (In fact, it has a rather neat relationship to the subspace in part (A): the two subspaces are orthogonal to each other and, together, in a certain sense give the entire space $\mathbb{R}^4$, an idea we will explore soon in class. Foreshadowing!) A valid basis is then any three linearly independent vectors that are all orthogonal to $(1,1,1,1)$, for example the three vectors $(1,0,0,-1),(0,1,0,-1),(0,0,1,-1)$.
C) This subspace consists of all vectors of the form $(a,b,c,d) \in \mathbb{R}^4$ such that $(a,b,c,d) \cdot (1,1,0,0) =0$, and $(a,b,c,d) \cdot (1,0,1,1)=0$. As $(1,1,0,0)$ and $(1,0,1,1)$ are linearly independent, the subspace in this question is of dimension $2$, and so a valid basis would be any two linearly independent vectors that are orthogonal to both $(1,1,0,0)$ and $(1,0,1,1)$, for example the two vectors $(1,-1,-1,0)$ and $(0,0,1,-1)$.
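For part (C) specifically, a quick Julia check (a sketch; the matrix C below is new and simply stacks the two constraint vectors as rows):
C = [1 1 0 0; 1 0 1 1]
C * [1, -1, -1, 0], C * [0, 0, 1, -1]    # both expected: [0, 0], so both proposed vectors lie in the subspace
svdvals(C)                               # 2 positive singular values, so the subspace has dimension 4 - 2 = 2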
D) As $I_4$ is invertible, its column space is all of $\mathbb{R}^4$, and so an example of a basis is the four vectors $(1,0,0,0),(0,1,0,0),(0,0,1,0),(0,0,0,1)$. Similarly, since $I_4$ is invertible, the nullspace of $I_4$ consists exclusively of the zero vector, and so the basis of the nullspace is the empty basis. (To be clear, a basis is a set of vectors; the empty basis is the set with no elements at all. In particular, the empty basis is not the set consisting of only the zero vector.)
(6). From GS Problem 24 (page 177) True or False (with a reason based on the SVD)
(A) If the columns of a matrix are dependent, so are the rows.
(B) The column space of a 2x2 matrix is the same as its row space.
(C) The column space of a 2x2 matrix has the same dimension as the row space.
(D) The columns of a matrix are a basis for the column space.
Solution:
A) False. Consider a matrix $A$ with (compact) SVD $U_1 \Sigma_r V_1^{\intercal}$. Its columns are dependent precisely when $\text{null}(A)$ is non-trivial (we say $\text{null}(A)$ is non-trivial when it contains at least one vector other than the zero vector). Similarly, the rows of $A$ are dependent precisely when $\text{null}(A^{\intercal})$ is non-trivial. If $A$ is $m \times n$ and has rank $r$, then $\text{null}(A)$ is non-trivial precisely when $r<n$ and $\text{null}(A^{\intercal})$ is non-trivial precisely when $r<m$. Let $A$ be an $m \times n$ matrix of rank $r$ in which $m=r$ and $n=r+1$ (for example, take the $m \times m$ identity matrix and glue a column of all zeros onto it). Then the columns of this matrix are dependent as $r<n$, but the rows are independent as $r=m$.
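The example above, checked numerically for $m=3$ (a small sketch; the matrix M is new here):
M = [Matrix(1.0I, 3, 3) zeros(3)]   # the 3×3 identity with a zero column glued on, so M is 3×4
svdvals(M)                          # 3 positive values: rank 3 = #rows < #columns,
                                    # so the rows are independent but the columns are not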
B) False. Let $u,v$ denote two linearly independent unit vectors in $\mathbb{R}^2$, for example $u=(1,0)$ and $v=(0,1)$, and consider the matrix $A=uv^{\intercal}$. $A$ is of rank $1$, so it has a (compact) SVD $U_1 \Sigma_1 V_1^{\intercal}$ where $\Sigma_1$ is the $1 \times 1$ identity matrix (the single entry is $1$) and $U_1$ and $V_1$ are $2 \times 1$ matrices, where $U_1$ consists of the entries of $u$ and $V_1$ consists of the entries of $v$. The column space of $A$ is the space spanned by $u$, but the rowspace of $A$ is the space spanned by $v$. As $u$ and $v$ are linearly independent, these spaces are different.
C) True. If $A$ has rank $r$, then both the column space of $A$ and the rowspace of $A$ have dimension $r$. This follows from the fact that, if $A$ has (compact) SVD $U_1 \Sigma_r V_1^{\intercal}$, a basis of the column space of $A$ is given by the columns of $U_1$ and a basis of the rowspace of $A$ is given by the columns of $V_1$. As $U_1$ and $V_1$ both have $r$ columns, we conclude that these two spaces both have dimension $r$.
D) False. Let $A$ be an $m \times n$ matrix of rank $r$ for which $r<n$ (the example used in the solution to part (A) is such a matrix), and let $A$ have (compact) SVD $U_1 \Sigma_r V_1^{\intercal}$. As noted above, the columns of $U_1$ form a basis of the column space of $A$, and so the column space of $A$ must be of dimension $r$. However, $A$ has $n>r$ columns, and so the columns of $A$ cannot be a basis of the column space of $A$, as they cannot be linearly independent (though, of course, the columns of any matrix $A$ do span the column space of $A$, by definition).
(7). If A=QR, where R is non-singular, A) What is the projection P onto the column space of A, in terms of (possibly) Q and R?
B) Write P$^T$ in terms of P.
C) Write P$^2$ in terms of P.
Solution:
A) If $R$ is invertible, then $\text{col}(A)=\text{col}(Q)$: from $A=QR$ we get $\text{col}(A)\subseteq\text{col}(Q)$, and from $Q=AR^{-1}$ we get $\text{col}(Q)\subseteq\text{col}(A)$. As the columns of $Q$ are orthonormal, the desired projection is simply $QQ^{\intercal}$.
B) $P^{\intercal}=\left(QQ^{\intercal}\right)^{\intercal}=\left(Q^{\intercal}\right)^{\intercal} Q^{\intercal}=QQ^{\intercal}=P$.
C) $P^2=\left(QQ^{\intercal}\right)^2=\left(QQ^{\intercal}\right)\left(QQ^{\intercal}\right)=Q\left(Q^{\intercal}Q\right)Q^{\intercal}=QQ^{\intercal}=P$.
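A small numerical sketch of (A)-(C) on a random tall matrix (the names A7, Q1, P are new and introduced only for this check; LinearAlgebra is assumed to be loaded, as above):
A7 = randn(6, 3)                  # random matrix; its columns are almost surely independent, so R is non-singular
Q, R = qr(A7)
Q1 = Matrix(Q)                    # the "thin" Q with orthonormal columns
P = Q1 * Q1'
norm(P' - P), norm(P^2 - P)       # both expected: ≈ 0
P * A7 ≈ A7                       # P fixes col(A7), as a projection onto it should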
(8) (GS p217 problem 26) If an m by m matrix has $A^2=A$ and is rank m, prove that A=I. (Hint: for any vector x, let w solve Aw=x. Now show Ax=x.)
Solution: $A$ is an $m \times m$ matrix of rank $m$, which immediately implies that $A$ is invertible. We then have $I=AA^{-1}=A^2 A^{-1}=A$.
Alternatively, suppose that $A$ is an $m\times m$ rank $m$ matrix for which $A^2=A$. Since $A$ has full column and row rank, the equation $Aw=x$ has a unique solution for all $x$. Multiplying both sides of the equation by $A$ yields $$A^2w=Ax \implies Aw = Ax \implies x = Ax.$$ If this is true for all $x$, then $\boxed{A=I}$.
(9) (A projection matrix is a symmetric matrix such that $P^2=P$.)
If P is a projection matrix, then show that I-P is as well.
Solution: If $P$ is a projection matrix, we have $(I-P)^2=I^2-IP-PI+P^2=I-2P+P=I-P$ and $(I-P)^{\intercal}=I^{\intercal}-P^{\intercal}=I-P$, which implies that $I-P$ is a projection matrix.
(10) We will do an in class demo of recognizing digits (software a bit involved for a hw, but it would have been fun.) Here is the math.
Suppose in 784-dimensional space, we have 1000 vectors collected in a 784x1000 matrix called apples. We have another 1000 vectors collected in a 784x1000 matrix called oranges. Each matrix has rank 784, but the best rank 50 approximation is very good for both the apples matrix and the oranges matrix.
(Note: much of science and engineering is about learning to deal with the inexact. We all find the exact so much more comforting, so this problem might take you out of your comfort zone, but only a little.)
(A) What is the size of the U matrix for the exact compact SVD for the apples (or the oranges) matrix?
(B) Suppose a new vector comes along and we want to decide if it's best classified as an apple or as an orange.
Would projecting it onto the column space of either U help? Why or why not?
(C) Consider the dot product of the new vector with the first column of the apple U and the orange U. We now have two numbers. What might you hope is true, if the dot product of the orange is much larger than that of the apple?
D) Let's consider taking the first k columns of the apple U and also the first k columns of the orange U. Consider ||Uᵀ * (new vector)|| for the apple and for the orange. What might you expect as k goes from 1 to 784?
Solution:
Let $A$ and $O$ denote, respectively, the apple matrix and the orange matrix. Let $A$ have a (compact) SVD $U_A \Sigma_A V_A^{\intercal}$ and $O$ have a (compact) SVD $U_O \Sigma_O V_O^{\intercal}$.
A) As both $A$ and $O$ have rank $r=784$ and dimensions $m \times n$ where $m=784$ and $n=1000$, we conclude that both $U_A$ and $U_O$ have dimension $m \times r$ which is $784 \times 784$.
B) By part (A), each such $U$ is $784 \times 784$, which immediately implies that the column space of each such $U$ is all of $\mathbb{R}^{784}$ (as the $784$ columns of $U$ form a basis of the column space of $U$, the column space must have dimension $784$; the column space of $U$ must be contained in $\mathbb{R}^{784}$ and so the column space of $U$ is all of $\mathbb{R}^{784}$). Therefore, projecting onto the column space of either $U$ would be of no help, as such a projection is merely the identity (projecting any $v \in \mathbb{R}^{784}$ onto $\mathbb{R}^{784}$ yields the same $v$).
C) Let $u_A$ and $u_O$ denote, respectively, the first column of $U_A$ and of $U_O$. The first column of $U_A$ corresponds to the largest singular value of $A$ and the first column of $U_O$ corresponds to the largest singular value of $O$, and so, if, for some vector $v$, $|v \cdot u_O| \gg |v \cdot u_A|$, then one might hope $v$ is more orange-ish. If, instead, $|v \cdot u_A| \gg |v \cdot u_O|$, then one might hope $v$ is more apple-y.
D) Let $U_{A,k}$ and $U_{O,k}$ denote, respectively, the first $k$ columns of $U_A$ and $U_O$. Then, for a vector $v \in \mathbb{R}^{784}$, $U_{A,k}^{\intercal} v$ is a vector $w \in \mathbb{R}^k$, where the $i^{\text{th}}$ entry of $w$ is simply the dot product of the $i^{\text{th}}$ column of $U_{A,k}$ with $v$, and so $\lVert U_{A,k}^{\intercal} v \rVert$ is simply the square root of the sum of the squares of all of these dot products.
Now fix some vector $v \in \mathbb{R}^{784}$ and consider $\lVert U_{A,k}^{\intercal} v \rVert$ and $\lVert U_{O,k}^{\intercal} v \rVert$ as $k$ goes from $1$ to $784$.
First, we consider what happens as $k$ goes from $1$ to, say, $50$, since by the problem statement the rank $50$ approximations to $A$ and $O$ are quite good. If $v$ really is an apple, then, by the same logic as in part (C), one would expect $\lVert U_{A,k}^{\intercal} v \rVert$ to tend to be larger than $\lVert U_{O,k}^{\intercal} v \rVert$, and this would tend to be more pronounced as $k$ increases (while remaining below $50$), as we are considering dot products of $v$ with singular vectors that correspond to large singular values. Therefore, as $k$ goes from $1$ to $50$, the observation $\lVert U_{A,k}^{\intercal} v \rVert \gg \lVert U_{O,k}^{\intercal} v \rVert$ becomes increasingly better evidence of the apple-y-ness of $v$, and the observation $\lVert U_{A,k}^{\intercal} v \rVert \ll \lVert U_{O,k}^{\intercal} v \rVert$ becomes increasingly better evidence of the orange-osity of $v$.
At the other extreme, when $k=784$, then $\lVert U_{A,784}^{\intercal} v \rVert = \lVert U_{O,784}^{\intercal} v \rVert=\lVert v \rVert$, as $U_{A,784}=U_A$ and $U_{O,784}=U_O$ and $U_A$ and $U_O$ are orthogonal matrices, and applying an orthogonal matrix to a vector does not change the magnitude of that vector.
One would expect that observations of $\lVert U_{A,k}^{\intercal} v \rVert$ and $\lVert U_{O,k}^{\intercal} v \rVert$ give less and less additional information as $k$ goes from $50$ to $784$, due to the fact that, if the rank $50$ approximations to $A$ and $O$ are good, then all singular values of $A$ and $O$ after the first $50$ must be quite small, and so one would expect that the dot product of $v$ with a singular vector corresponding to a small singular value has very little to do with apple-icity, orange-itude, or anything else.
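To make parts (C) and (D) concrete, here is a purely synthetic sketch: the "apples" and "oranges" below are random stand-ins built from two different 50-dimensional subspaces plus a little noise, not actual image data, and every name is made up for this illustration.
using LinearAlgebra
d, n, k = 784, 1000, 50
basisA = Matrix(qr(randn(d, k)).Q)            # orthonormal basis for the "apple" subspace
basisO = Matrix(qr(randn(d, k)).Q)            # orthonormal basis for the "orange" subspace
apples  = basisA * randn(k, n) + 0.01 * randn(d, n)   # rank 784, but nearly rank 50
oranges = basisO * randn(k, n) + 0.01 * randn(d, n)
UA = svd(apples).U
UO = svd(oranges).U
v = apples[:, 1]                              # pretend this is the "new" vector (it is an apple)
score(U, x, j) = norm(U[:, 1:j]' * x)
[(j, score(UA, v, j), score(UO, v, j)) for j in (1, 10, 50, 784)]
# expected: the apple score is clearly larger for small j, and both scores equal ‖v‖ at j = 784
In a real demo the new vector would be compared against both U matrices in exactly this way, classifying it according to whichever truncated projection captures more of its norm.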