18.06 Problem Set 4¶

Due Friday, March 4 at 11am.

Problem 1 (10+5)¶

Suppose $x,y,a,b$ are $n$-component column vectors. Then we can compute the following scalar $\alpha$:

$$ \alpha = a^T (y x^T + x y^T) b $$

(a) Find a vector $c$ (made from $y,a,b$) such that we get the same $\alpha$ from: $$ \alpha = c^T x $$

(Fun fact: Any scalar that is a linear function of $x$ can be written as a dot product like this. The fancy name for this idea is the Riesz representation theorem.)

(b) Check your answer in Julia for some random vectors. Note that $x^T$ in Julia is written as x' for real column vectors and matrices.

1(a) solution¶

First, we expand the expression for $\alpha$, getting $\alpha=(a^Ty)(x^Tb)+(a^Tx)(y^Tb)$, where we've used parentheses to help identify the scalar dot-product terms. Now, our goal is to move the $x$ terms to the right.

Since $x^Tb$ is a scalar, we have $x^Tb=b^Tx$. Since $y^Tb$ is a scalar, we have $(a^Tx)(y^Tb)=(y^Tb)(a^Tx)$.

In conclusion, we find $\alpha=(a^Ty)(b^Tx)+(y^Tb)(a^Tx)=\bigl((a^Ty)b^T+(y^Tb)a^T\bigr)x$. Thus $c=\bigl((a^Ty)b^T+(y^Tb)a^T\bigr)^T=\boxed{b(y^Ta)+a(b^Ty)}$.

1(b) solution¶

In [1]:
a = [1,2,7,3]
b = [9,-2,4,5]
y = [0,2,3,1]
x = [-3,2,4,-5]

α = a' * (y * x' + x * y') * b # write aᵀ(yxᵀ+xyᵀ)b in Julia
Out[1]:
-938
In [2]:
c = b * (y' * a) + a * (b' * y) # our formula in terms of a, b, y

# this should output "true":
c' * x ≈ α
Out[2]:
true

Hooray, it matches as desired.

(Since the result is an exact integer with no roundoff errors, we could have equivalently checked for exact equality with c' * x == α rather than using ≈.)

Problem 2 (10+10+5)¶

Let $f(x)$ be the function:

$$ f(x) = \Vert (B + xx^T)^{-1} b \Vert $$

where $B$ is an $n \times n$ matrix, and $b$ is an $n$-component column vector, and the input $x$ is an $n$-component column vector.

(a) Give a formula for $df = f(x+dx) - f(x)$, i.e. to first order in $dx$, as a linear operation on $dx$. (Your answer will have matrix inverses in it; that's okay.)

It might be helpful to break this into steps (a.k.a. a chain rule):

  1. What is $d(\sqrt{\alpha})$ for a small $d\alpha$, i.e. to first order in the change in the scalar $\alpha$? (Just 18.01!)
  2. In class, we saw $d(y^T y) = 2y^T dy$. So what is $d\Vert y\Vert = d\sqrt{y^T y}$ to first order in $dy$, the change for a small change $dy$ in $y$?
  3. What is $d(xx^T)$ to first order in $dx$? (Product rule!)
  4. In class, we saw that $d(A^{-1}) = -A^{-1} dA A^{-1}$. So what is $d\left[(B + xx^T)^{-1} b \right]$ to first order in $dx$?
  5. Put it all together to write $df$ to first order in $dx$.

(b) Using your answer from problem 1, write $df = (\mbox{some vector})^T dx$, and hence give a formula for $\nabla f$.

(c) Check your $\nabla f$ formula by comparing it to the numerically computed $f(x+dx) - f(x)$ for a small arbitrary $dx \in \mathbb{R}^4$.

Solution of 2(a)¶

  1. $d\sqrt{\alpha}= (\sqrt{\alpha})' d\alpha = \boxed{\frac{d\alpha}{2\sqrt{\alpha}}}$. (That is, we just used 18.01 to linearize $d\sqrt{\alpha} = \sqrt{\alpha + d\alpha} - \sqrt{\alpha}$ via the derivative $(\sqrt{\alpha})' = 1/2\sqrt{\alpha}$.)
  2. Putting $\alpha=y^Ty$ into the above formula, we get $$d\lVert y\rVert=\frac{d(y^Ty)}{2\lVert y\rVert}=\boxed{\frac{y^T dy}{\lVert y\rVert}}. $$
  3. From the product rule, we have $d(xx^T)=dx\, x^T+x d(x^T)=\boxed{dx\, x^T+x(dx)^T}$. (Note that $d(x^T) = (x+dx)^T - x^T = (dx)^T$ by linearity of the transpose.)
  4. From the product rule, since $db=0$, we have $d\bigl((B + xx^T)^{-1} b \bigr)=d\bigl((B + xx^T)^{-1}\bigr)b$. Letting $y=(B + xx^T)^{-1} b$, this gives $$dy = -(B + xx^T)^{-1}\,d(B + xx^T)\,(B + xx^T)^{-1}b, $$ and since $dB=0$, this simplifies to $$\boxed{dy = -(B + xx^T)^{-1}\bigl(dx\, x^T+x(dx)^T\bigr)y}. $$
  5. Substituting $y=(B + xx^T)^{-1} b$ into the formula from step 2, we get $$df=\frac{y^Tdy}{f(x)}=\boxed{-\frac{y^T(B + xx^T)^{-1}\bigl(dx\, x^T+x(dx)^T\bigr)y}{f(x)}}. $$
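Though not part of the problem, we can sanity-check the boxed $dy$ formula from step 4 against a finite difference. This is an optional sketch, using the same $B$, $b$, $x$, and $dx$ values that appear in part (c):

```julia
using LinearAlgebra

# values matching part (c):
B = [3   5   6  6
     2  -2  -5  8
     4   3   2  2
     5  -8  -1  3]
b = [1,-1,2,-2]
x = [1,-2,4,6]
dx = [-3,5,0,2] * 1e-8

y = (B + x*x') \ b
dy_formula = -((B + x*x') \ ((dx*x' + x*dx') * y))   # boxed formula from step 4
dy_numeric = (B + (x+dx)*(x+dx)') \ b - y            # finite difference
# dy_formula and dy_numeric should agree to several significant digits
```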

Solution of 2(b)¶

The key thing is to realize that $dx\,x^T + x(dx)^T$ is multiplied by a column vector $y$ on the right and by a row vector on the left, so we can apply our formula from problem 1.

The row vector on the left is $z^T = -y^T (B+xx^T)^{-1} / f(x)$, i.e. $z = -(B^T+xx^T)^{-1} y / f(x)$. Hence, we have $df = z^T \left( dx\,x^T + x(dx)^T \right) y$, which by problem 1 simplifies to: $$ {\underbrace{\left[ y(x^Tz)+z(y^Tx) \right]}_{\nabla f}}^T dx $$ and hence $\nabla f$ is $y(x^Tz)+z(y^Tx) = [yx^T + (y^T x)I] z$ (which could be written in several ways): $$ \nabla f = \boxed{- \frac{\left[yx^T + (y^T x)I \right] (B^T+xx^T)^{-1} y }{f(x)}} . $$

Solution of 2(c)¶

In [3]:
using LinearAlgebra

# some B matrix and b vector
B = [3   5   6  6
     2  -2  -5  8
     4   3   2  2
     5  -8  -1  3]
b = [1,-1,2,-2]

f(x) = norm((B + x*x') \ b) # = ‖(B+xxᵀ)⁻¹b‖

# random test inputs:
x = [1,-2,4,6]
dx = [-3,5,0,2] * 1e-8 # a small change in x

f(x + dx) - f(x)
Out[3]:
8.416810981515255e-6
In [4]:
y = (B + x*x') \ b
Out[4]:
4-element Vector{Float64}:
 17.396551724138078
 -1.6206896551724292
  1.9655172413793267
 -6.948275862069026
In [5]:
∇f = -(1/f(x)) * (y*x' + (x'*y) * I) * ((B' + x*x')\y) # your formula from (b)

# this should match f(x+dx)-f(x) above,
# at least to a few digits:
df = ∇f'*dx
Out[5]:
8.416807484728842e-6

It matches pretty well, agreeing to about 6 significant digits, though not exactly:

In [6]:
approx_df = f(x + dx) - f(x)
exact_df = ∇f'*dx
abs(approx_df - exact_df) / abs(exact_df) # fractional or "relative" error
Out[6]:
4.15452820896041e-7

We shouldn't expect a better match than this because dx here is small but not infinitesimal. So f(x + dx) - f(x) is not exactly equal to the derivative times dx. It is a finite-difference approximation to the derivative.
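To see this first-order behavior concretely, here is an optional sketch (reusing the definitions above) showing that the relative error of the forward difference shrinks roughly in proportion to the step size:

```julia
using LinearAlgebra

B = [3   5   6  6
     2  -2  -5  8
     4   3   2  2
     5  -8  -1  3]
b = [1,-1,2,-2]
f(x) = norm((B + x*x') \ b)

x = [1,-2,4,6]
y = (B + x*x') \ b
∇f = -(1/f(x)) * (y*x' + (x'*y) * I) * ((B' + x*x') \ y)

dx = [-3,5,0,2]
for s in (1e-4, 1e-5, 1e-6)           # shrinking step sizes
    approx_df = f(x + s*dx) - f(x)    # forward difference
    exact_df = ∇f' * (s*dx)           # linearization
    println(abs(approx_df - exact_df) / abs(exact_df))
end
```

The printed relative errors should drop by roughly a factor of 10 per step, the signature of a first-order-accurate forward difference.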

See also lecture 3, part 2 of our matrix-calculus course from IAP 2022.

Problem 3 (10+5+5+5)¶

Let $A_0= \begin{pmatrix} 3 & 1 & 2\\ 7 & 3 & -2\\ -2 & -5 & 4\\ \end{pmatrix}$, and consider the nonlinear function $$ g(x) = \underbrace{\begin{pmatrix} x_1 & & \\ & x_2 & \\ & & x_3 \end{pmatrix}}_{\mathrm{Diagonal}(x)\mbox{ in Julia}} A_0 x + x - \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} $$

Here, you will solve the nonlinear equations $g(x) = \vec{0}$ using Newton's method.

(a) Write down the $3\times 3$ Jacobian matrix $g'(x)$ as a function of $x$.

(b) Write a Julia function J(x) below that computes your Jacobian formula from (a).

(c) Check that g(x+dx) ≈ J(x)*dx for a small dx at some x.

(d) Plug your Jacobian, along with the provided g(x) function, into the Newton iteration below, and use the starting guess $x = (1,1,1)$. Give the resulting solution to at least 5 significant digits (in every component).

Solution of 3(a)¶

Let's denote $\mathrm{Diagonal}(x)$ by $\mathrm{diag}(x)$, for brevity. Note that $d(\text{diag}(x)) = \text{diag}(x + dx) - \text{diag}(x) = \text{diag}(dx)$.

Since $g(x)=\text{diag}(x)A_0x+x-\vec{1}$, where $\vec{1}$ denotes the vector of ones, we have

$$dg(x)=d(\text{diag}(x))A_0x+\text{diag}(x)A_0dx+dx=\text{diag}(dx)A_0x+\text{diag}(x)A_0dx+dx \, .$$

Notice that $\text{diag}(a)b$ is just the component-wise product of $a$ and $b$, so we have $\text{diag}(a)b=\text{diag}(b)a$. Thus:

$$dg(x)=\text{diag}(A_0x)dx+\text{diag}(x)A_0dx+dx=\bigl(\text{diag}(A_0x)+\text{diag}(x)A_0+I\bigr)dx$$

and hence:

$$g'(x)=\boxed{\text{diag}(A_0x)+\text{diag}(x)A_0+I} . $$

Solution of 3(b)¶

In [7]:
using LinearAlgebra

A₀ = [3 1 2; 7 3 -2; -2 -5 4]

g(x) = Diagonal(x) * (A₀*x) + x .- 1
Out[7]:
g (generic function with 1 method)
In [8]:
J(x) = Diagonal(A₀ * x) + Diagonal(x) * A₀ + I # your answer from (b)

J([1,2,3]) # try it — this should give a 3x3 matrix
Out[8]:
3×3 Matrix{Int64}:
 15    1   2
 14   14  -4
 -6  -15  13

Solution of 3(c)¶

In [9]:
x = randn(3)
dx = randn(3) * 1e-8

g(x+dx) - g(x)
Out[9]:
3-element Vector{Float64}:
  2.633208602276227e-7
 -1.0793489657601185e-7
  3.9174144816911394e-8
In [10]:
J(x) * dx # this should match g(x+dx) - g(x) to a few digits
Out[10]:
3-element Vector{Float64}:
  2.6332085894549496e-7
 -1.0793489378176674e-7
  3.9174142869503106e-8

As in problem 2c, this does not match exactly because dx is small but not infinitesimal, but matches to quite a few significant digits:

In [11]:
approx_dg = g(x+dx) - g(x)
exact_dg = J(x) * dx 
norm(approx_dg - exact_dg) / norm(exact_dg) # fractional or "relative" error
Out[11]:
1.2668479872663542e-8

Checking against finite-difference approximations like this is always a good idea when computing complicated derivatives by hand, because it is quite easy to make mistakes. A bug in your derivative code will almost always give a huge mismatch in the derivative, so a crude check like this is fine.

Solution of 3(d): Newton iteration¶

In [12]:
x = [1,1,1] # initial guess

# run 10 iterations of Newton's method
for i = 1:10
    # show some output:
    @show x
    @show g(x)
    println()

    x = x - J(x) \ g(x)
end
x = [1, 1, 1]
g(x) = [6, 8, -3]

x = [0.38888888888888895, 0.7222222222222222, 1.1944444444444444]
g(x) = [1.0524691358024696, 1.5277777777777781, 0.6589506172839501]

x = [0.29521557410237326, 0.42675822525240636, 0.7651521622715932]
g(x) = [0.13442762387231522, 0.20195573737161787, 0.022538924170478092]

x = [0.27616346456373086, 0.3805238232380848, 0.713167520051225]
g(x) = [0.003950645721066737, 0.007771942336304738, -0.0031886163236364284]

x = [0.2755084320562215, 0.3787394184213428, 0.7122008856233119]
g(x) = [3.722399865679904e-6, 1.428446952234097e-5, -6.153161123179096e-6]

x = [0.27550808219966877, 0.3787354883129502, 0.7121989245327538]
g(x) = [3.114397628678489e-12, 4.054756530535997e-11, -2.4525048658574633e-11]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287167]
g(x) = [0.0, 0.0, -2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]

Note, as in class, that it converges extremely quickly, as can be seen by how rapidly $g(x)$ is going to zero.

It requires only 5 iterations to get the requested 5 significant digits, and the resulting solution is $\boxed{x = [0.27551, 0.37874, 0.71220]}$.

In 10 iterations, we have the solution to almost 15 digits, or machine precision: it is limited by the roundoff errors stemming from the finite number of digits that are used in computer arithmetic.

Problem 4 (10+5)¶

In problem 3 of pset 3, you considered $Ax = b$ for $$ A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 6 \\ 0 & 0 & 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} \alpha \\ 6 \\ 1 \end{pmatrix} $$ and found that it only has solutions for $\alpha = 5$.

(a) Find a basis for $N(A^T)$, the left nullspace.

(b) Because $N(A^T)^\perp = C(A)$, the equation $Ax = b$ only has solutions when $b$ is perpendicular to a basis for $N(A^T)$. Plug in your answer from (a) to check that it is orthogonal to $b$ only when $\alpha = 5$, consistent with your solution in pset 3.

Solution of 4(a)¶

Note that $A^T=\begin{pmatrix}1&1&0\\2&2&0\\3&4&1\\4&6&2\end{pmatrix}$. After row operations (elimination, with a row swap to reach echelon form), it reduces to $U = \begin{pmatrix}1&1&0\\0&1&1\\0&0&0\\0&0&0\end{pmatrix}.$ This has rank 2, and 1 special solution $$ x = \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix} $$ Plugging this into $Ux = 0$, we obtain the equations for the pivot variables $$ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} $$ and hence, by backsubstitution, $p_2 = -1$ and $p_1 = 1$.

Hence the special solution basis for $N(A^T)$ is $\boxed{\begin{pmatrix}1\\-1\\1\end{pmatrix}}$.
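As an optional numerical check (not required by the problem), we can multiply $A^T$, for the matrix $A$ quoted from pset 3, by this basis vector in Julia and confirm that we get zero:

```julia
A = [1 2 3 4
     1 2 4 6
     0 0 1 2]
c = [1,-1,1]

A' * c   # should be the zero vector in ℝ⁴
```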

Solution of 4(b)¶

$\begin{pmatrix}1\\-1\\1\end{pmatrix}$ being orthogonal to $b$ is equivalent to $1\times\alpha + (-1) \times 6 + 1\times 1=0$, which gives $\boxed{\alpha=5}$ exactly as in pset 3.
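As a quick optional check in Julia, we can test the orthogonality condition for $\alpha = 5$ and for some other value of $\alpha$ (here $\alpha = 3$, chosen arbitrarily):

```julia
c = [1,-1,1]         # basis for N(Aᵀ) from part (a)

c' * [5, 6, 1]       # α = 5: gives 0, so b ⊥ N(Aᵀ), i.e. b ∈ C(A)
c' * [3, 6, 1]       # α = 3: nonzero, so Ax = b has no solution
```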

Problem 5 (10)¶

Suppose $S$ is spanned by (1,2,3) and (4,5,6). Then $S^\perp$ is the nullspace of what matrix?

Solution of 5¶

From the definition of column spaces, $S$ is the column space of $A=\begin{pmatrix}1&4\\2&5\\3&6\end{pmatrix}$. From class, we know that $C(A)^\perp = N(A^T)$. Thus $S^\perp$ is the nullspace of $A^T=\boxed{\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}}$.
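We can confirm this numerically with Julia's nullspace function: every vector in $N(A^T)$ should be orthogonal to both spanning vectors of $S$. An optional sketch:

```julia
using LinearAlgebra

Aᵀ = [1 2 3
      4 5 6]
V = nullspace(Aᵀ)   # basis for N(Aᵀ) = S⊥ (here, a single column)

Aᵀ * V              # ≈ zero matrix: ⊥ to both (1,2,3) and (4,5,6)
```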