Warm up problem. On a clock, 12 uniformly spaced unit vectors point from the center to the hour marks. What is the sum of all of these vectors? Why?
Let $v_i$ be the unit vector pointing to the $i$-th hour on a clockface (so $v_1$ points to 1 o'clock, and so on). Note then that $v_1=-v_7$, $v_2=-v_8$, and so on. Therefore: $$\sum_{i=1}^{12} v_i = \sum_{i=1}^6 v_i + \sum_{i=7}^{12} v_i = \sum_{i=1}^6 v_i - \sum_{i=1}^6 v_i = 0$$
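We can confirm this numerically in Julia (one way to parametrize the hour vectors, measuring angles clockwise from 12 o'clock):
vecs = [ [sin(2π*i/12), cos(2π*i/12)] for i in 1:12 ]  # unit vector pointing to the i-th hour
println(sum(vecs))  # ≈ [0.0, 0.0], up to floating-point roundoff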
Warm up problem. On the neural-net (matrix times vector) diagram that we saw in class, if there are $p$ nodes (circles) on the left and $q$ nodes on the right, then how many connections are there in terms of $p$ and $q$? (In the diagram example there were 15.)
Each node on the left is connected to every node on the right, so each left-hand node contributes $q$ connections. With $p$ nodes on the left, there are $p\times q$ connections in total.
We can also think about this in terms of the matrix representation of the neural net. If $u$ is a vector of length $p$ describing the nodes on the left, and $v$ is a vector of length $q$ describing the nodes on the right, then we can write $v=Au$, where $A$ is the matrix of weights. In order for this matrix equation to make sense, $A$ must be a $q\times p$ matrix, which will have $pq$ components.
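Here is a small Julia sketch; the sizes $p=3$, $q=5$ are an assumption chosen so that $pq=15$, matching the diagram from class:
p, q = 3, 5
A = rand(q, p)      # weight matrix: maps a length-p input u to a length-q output v
u = rand(p)
v = A*u
println(length(A))  # p*q = 15 connections (weights)
println(length(v))  # q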
What is the formula for the inverse of the 2x2 matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$? (The answer is on Strang p. 84.) Under what conditions on $a,b,c,d$ does the inverse exist?
Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then $$A^{-1} = \frac{1}{ad-bc}\begin{pmatrix} d & -b \\ -c & a\end{pmatrix}.$$ This equation holds provided that $ad-bc\neq 0$. If $ad-bc = 0$ then the matrix $A$ would not have an inverse. This quantity is known as the determinant of the matrix, $\text{det}(A)$. Whenever $\text{det}(A)\neq 0$, the matrix $A$ has a unique inverse. However, $\text{det}(A) = ad-bc$ is only the formula for $2\times 2$ matrices. There is a more complicated formula for a general $n\times n$ matrix, which we will see later in the course.
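We can sanity-check this formula numerically in Julia (note that destructuring a matrix reads its entries in column-major order):
A = rand(2,2)
a,c,b,d = A                        # column-major: a = A[1,1], c = A[2,1], b = A[1,2], d = A[2,2]
Ainv = [d -b; -c a] / (a*d - b*c)  # the formula above
display(A * Ainv)                  # should be the 2×2 identity, up to roundoff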
A worked matrix factorization problem is at the end of this notebook.
This is called the 2x2 LU factorization: Given $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, find a factorization of $A$ in the form $ \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix}\begin{pmatrix} u & v \\ 0 & w \end{pmatrix}$.
(Write down $x,u,v,w$ in terms of $a,b,c,d$.) Under what conditions on $a,b,c,d$ does the LU factorization exist? (This answer is different from that of Problem 3.)
We have $$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix}\begin{pmatrix} u & v \\ 0 & w \end{pmatrix}$$ Multiplying out the two matrices on the right-hand side yields: $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} u & v \\ ux & vx+w\end{pmatrix}$$ This gives us four equations: \begin{align*} u &= a\\ v &= b\\ ux &= c\\ vx+w &= d. \end{align*} The first two equations immediately tell us that $u=a$ and $v=b$. The third then gives $x = c/a$, and substituting into the fourth gives $w = d-bc/a$. We have thus found the LU factorization: $$\boxed{A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ c/a & 1 \end{pmatrix}\begin{pmatrix} a & b \\ 0 & d-bc/a \end{pmatrix}}$$
This exists provided that $\boxed{a\neq 0}$. We can check our answer for a random $2\times 2$ matrix in Julia:
A = rand(2,2)
a,c,b,d = A  # column-major: a = A[1,1], c = A[2,1], b = A[1,2], d = A[2,2]
u = a
v = b
x = c/a
w = d - b*c/a
display(A)
display( [1 0;x 1] * [u v;0 w] )  # L*U should reproduce A
2×2 Array{Float64,2}:
 0.655945  0.856256
 0.705002  0.605831

2×2 Array{Float64,2}:
 0.655945  0.856256
 0.705002  0.605831
We can run this code a couple of times to double-check that the results always agree.
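As a further check, Julia's built-in lu recovers the same factors once row pivoting is disabled (via Val(false), the syntax in this version of Julia); with the default pivoting it would generally return different factors:
using LinearAlgebra
F = lu(A, Val(false))  # LU factorization without row pivoting
display(F.L)           # should match [1 0; x 1]
display(F.U)           # should match [u v; 0 w]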
A 2x2 rotation matrix has the form $ \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}$. Remembering $\sin^2 \theta + \cos^2 \theta =1$, what is $ \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix}$ $ \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}$?
This is called the 2x2 QR factorization: Given $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, find a factorization of $A$ in the form $ \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} u & v \\ 0 & w \end{pmatrix}$. The matrix $Q = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}$ is known as a 2x2 rotation matrix. (Hint 1: what are the polar coordinates of the point $(a,c)$? You should be able to write a formula for $u$, $\cos \theta$, and $\sin \theta$ from $a$ and $c$ using only division and square roots, at least if $a$ and $c$ are not both 0. Hint 2: use what we just mentioned about rotation matrices.)
Firstly we multiply out \begin{align*} \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix} \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix} &= \begin{pmatrix} \cos^2{\theta}+\sin^2{\theta} & -\sin{\theta}\cos{\theta}+\sin{\theta}\cos{\theta}\\ - \sin{\theta}\cos{\theta}+\sin{\theta}\cos{\theta} & \cos^2{\theta}+\sin^2{\theta} \end{pmatrix}\\ &= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \end{align*} We have just shown that if $Q = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}$, then $Q^{-1} = \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix}$.
To perform the QR factorization, we want to find $u,v,w,\theta$ such that $$\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} u & v \\ 0 & w \end{pmatrix}.$$ Comparing the first column of both sides of the equation, we obtain $a=u\cos{\theta}$ and $c = u\sin{\theta}$. But these are just the polar equations for the point $(a,c)$, and so: \begin{align*} u&=\sqrt{a^2+c^2}\\ \cos{\theta} &= \frac{a}{\sqrt{a^2+c^2}}\\ \sin{\theta} &= \frac{c}{\sqrt{a^2+c^2}}. \end{align*}
Next we notice that \begin{align*} \begin{pmatrix} b \\ d \end{pmatrix} &= \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} v \\ w \end{pmatrix} \\ &= Q\begin{pmatrix} v \\ w \end{pmatrix} \end{align*} and so \begin{align*} \begin{pmatrix} v \\ w \end{pmatrix} &= Q^{-1} \begin{pmatrix} b \\ d \end{pmatrix}\\ &= \begin{pmatrix} \cos \theta & \sin \theta \\ -\sin \theta & \cos \theta \end{pmatrix}\begin{pmatrix} b \\ d \end{pmatrix}\\ &= \begin{pmatrix} a/u & c/u \\ -c/u & a/u \end{pmatrix}\begin{pmatrix} b \\ d \end{pmatrix}\\ &= \begin{pmatrix} \frac{ab+cd}{\sqrt{a^2+c^2}} \\ \frac{ad-bc}{\sqrt{a^2+c^2}} \end{pmatrix} \end{align*}
These expressions all hold provided that $a$ and $c$ are not both 0. We can double-check the calculation using Julia:
A = rand(2,2)
a,c,b,d = A  # column-major destructuring, as before
u = sqrt(a^2+c^2)
v = (a*b+c*d)/u
w = (a*d-b*c)/u
display(A)
display( [a/u -c/u;c/u a/u]*[u v; 0 w] )  # Q*R should reproduce A
2×2 Array{Float64,2}:
 0.805679  0.139147
 0.49966   0.66488

2×2 Array{Float64,2}:
 0.805679  0.139147
 0.49966   0.66488
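For comparison, Julia's built-in qr (based on Householder reflections) produces the same factorization up to signs: each row of $R$ and the corresponding column of $Q$ may be negated.
using LinearAlgebra
F = qr(A)
display(Matrix(F.Q))  # matches our Q up to column signs
display(F.R)          # matches our R up to row signs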
The 2x2 singular value decomposition: An SVD has the form $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ = (rotation 1)(diagonal with non-negative entries)(rotation 2) = $ \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}$ $ \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}$ $ \begin{pmatrix} \cos \phi & \sin \phi \\ -\sin \phi & \cos \phi \end{pmatrix}$. We will see in this class that the SVD is very important in data science, biology, and many other fields.
Compute $A$ if $\theta=\pi/6$ and $\phi=\pi/3$ and $\sigma_1=\sigma_2=1$.
Since $\sigma_1=\sigma_2=1$, the diagonal factor is the identity, so $A$ is a rotation by $-\phi$ followed by a rotation by $\theta$, i.e. a single rotation by $\theta-\phi = -\pi/6$. We can check this numerically:
θ = π/6
ϕ = π/3
σ₁ = σ₂ = 1
display([cos(θ) -sin(θ);sin(θ) cos(θ)] * [σ₁ 0;0 σ₂] * [cos(ϕ) sin(ϕ);-sin(ϕ) cos(ϕ)])
display([sqrt(3)/2 1/2;-1/2 sqrt(3)/2])  # rotation by -π/6, for comparison
2×2 Array{Float64,2}:
  0.866025  0.5
 -0.5       0.866025

2×2 Array{Float64,2}:
  0.866025  0.5
 -0.5       0.866025
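Since rotations preserve lengths, the singular values of this $A$ should both equal 1, which we can confirm with svdvals from LinearAlgebra:
using LinearAlgebra
display(svdvals([sqrt(3)/2 1/2;-1/2 sqrt(3)/2]))  # ≈ [1.0, 1.0]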
n = 5
D = rand(1:9, n, n) # Create a random n x n matrix with the digits 1 through 9
5×5 Array{Int64,2}:
 6  6  3  8  3
 7  8  9  7  7
 1  9  3  6  1
 8  5  2  7  9
 7  3  5  5  6
Click on the cell above and press Ctrl-Enter to run it a few times without moving to another cell. If we asked how many numbers are stored in this $n\times n$ matrix, you would say $n^2$.
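Indeed, we can count the stored entries directly:
length(D)  # n^2 = 25 numbers stored for the dense n×n matrix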
using LinearAlgebra
n = 5
D = Diagonal(rand(1:9, n)) # Create a random n x n diagonal matrix with the digits 1 through 9
5×5 Diagonal{Int64,Array{Int64,1}}:
 2  ⋅  ⋅  ⋅  ⋅
 ⋅  5  ⋅  ⋅  ⋅
 ⋅  ⋅  8  ⋅  ⋅
 ⋅  ⋅  ⋅  4  ⋅
 ⋅  ⋅  ⋅  ⋅  7
While this is an $n\times n$ matrix, the number of parameters is $n$ (which is all that Julia stores in memory).
B = Bidiagonal(rand(1:9,n), rand(1:9,n-1), :U)
5×5 Bidiagonal{Int64,Array{Int64,1}}:
 7  2  ⋅  ⋅  ⋅
 ⋅  9  5  ⋅  ⋅
 ⋅  ⋅  2  6  ⋅
 ⋅  ⋅  ⋅  1  9
 ⋅  ⋅  ⋅  ⋅  8
There are $n$ entries along the diagonal, and $n-1$ entries along the upper diagonal. Therefore the upper bidiagonal matrix has $\boxed{2n-1}$ parameters.
S = SymTridiagonal(rand(1:9,n), rand(1:9,n-1))
5×5 SymTridiagonal{Int64,Array{Int64,1}}:
 3  8  ⋅  ⋅  ⋅
 8  9  8  ⋅  ⋅
 ⋅  8  8  9  ⋅
 ⋅  ⋅  9  1  2
 ⋅  ⋅  ⋅  2  3
There are $n$ entries along the diagonal, and $n-1$ entries along each of the upper and lower diagonals. However, since $S$ is symmetric, the lower and upper diagonals are identical, and so there are still only $\boxed{2n-1}$ independent parameters.
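We can confirm these counts by inspecting what Julia actually stores; the field names below (diag, dv, ev) are the internal fields of these LinearAlgebra types:
println(length(D.diag))               # n parameters for the Diagonal matrix
println(length(B.dv) + length(B.ev))  # n + (n-1) = 2n-1 for the Bidiagonal matrix
println(length(S.dv) + length(S.ev))  # 2n-1 again for the SymTridiagonal matrix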
# Example
v = [1,2,3,4,5]
[vi*vj for vi∈v, vj∈v]
5×5 Array{Int64,2}:
 1   2   3   4   5
 2   4   6   8  10
 3   6   9  12  15
 4   8  12  16  20
 5  10  15  20  25
v = rand(1:9,n)
println(v)
[vi*vj for vi∈v, vj∈v]
[5, 4, 9, 7, 9]

5×5 Array{Int64,2}:
 25  20  45  35  45
 20  16  36  28  36
 45  36  81  63  81
 35  28  63  49  63
 45  36  81  63  81
Every component of $A$ is just the product of two of the components of the vector $v$, which has length $n$. Since $v$ only depends on $n$ independent parameters, the matrix $A$ will also only depend on $n$ independent parameters.
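To make the structure explicit: the comprehension above is exactly the outer product $vv^T$, a rank-one matrix. A quick check, reusing the $v$ from the previous cell:
using LinearAlgebra
A = [vi*vj for vi∈v, vj∈v]
println(A == v*v')  # the comprehension is exactly the outer product v vᵀ
println(rank(A))    # 1: just n numbers generate all n² entries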
(If it helps, you can assume "generically" that the (1,1) entry is not 0.)
v = rand(1:9,3)
w = rand(1:9,4)
println(v)
println(w)
[vi*wj for vi∈v, wj∈w]
[3, 8, 1]
[5, 6, 3, 4]

3×4 Array{Int64,2}:
 15  18   9  12
 40  48  24  32
  5   6   3   4
Following the reasoning from Problem 9, you might first guess that $A$ depends on $m+n$ parameters. However, consider the case $m=n=1$: then $A$ is a $1\times 1$ matrix containing a single entry, so it depends on only one parameter, even though $m+n=2$. The count $m+n$ must be an overcount.
Instead, we can think as follows. Writing the components of $A$ as $A_{ij}=v_iw_j$, for some vector $v$ of length $m$ and some vector $w$ of length $n$, is not unique: for example, $v/2$ and $2w$ give the same matrix $A$. Assume without loss of generality that the first component of $v$ is nonzero, and consider the rescaling $v\to v/v_1$ and $w\to v_1 w$. This gives rise to the same matrix $A$, but now the first component of $v$ is exactly 1. So we can always write $A_{ij} = v_iw_j$ with $v_1=1$, leaving $m-1$ independent parameters in $v$ and $n$ in $w$; hence $A$ depends on only $n+m-1$ parameters.
Another way to see this is as follows: $$A_{ij} = v_i w_j = \frac{(v_i w_1)(v_1 w_j)}{v_1w_1}=\frac{A_{i1}A_{1j}}{A_{11}}.$$ From this identity, we see that it is sufficient just to know the first row and the first column of the matrix $A$ in order to construct the entire matrix. The first column and row depend only on $n+m-1$ parameters (since they overlap on the entry $A_{11}\neq 0$), and so the matrix $A$ only depends on $n+m-1$ independent parameters.
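We can verify this identity numerically, reusing $v$ and $w$ from the cell above (here $A_{11}\neq 0$ automatically, since the entries were drawn from 1:9):
A = [vi*wj for vi∈v, wj∈w]
recon = [ A[i,1]*A[1,j]/A[1,1] for i in 1:size(A,1), j in 1:size(A,2) ]
println(A ≈ recon)  # true: the first row and column determine every entry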