Due Friday, March 4 at 11am.
Suppose $x,y,a,b$ are $n$-component column vectors. Then we can compute the following scalar $\alpha$:
$$ \alpha = a^T (y x^T + x y^T) b $$
(a) Find a vector $c$ (made from $y,a,b$) such that we get the same $\alpha$ from: $$ \alpha = c^T x $$
(Fun fact: Any scalar that is a linear function of $x$ can be written as a dot product like this. The fancy name for this idea is the Riesz representation theorem.)
(b) Check your answer in Julia for some random vectors. Note that $x^T$ in Julia is written as x' for real column vectors and matrices.
First, we expand the expression for $\alpha$, obtaining $\alpha=(a^Ty)(x^Tb)+(a^Tx)(y^Tb)$, where we've used parentheses to help identify the scalar dot-product terms. Now, our goal is to move the $x$ terms to the right.
Since $x^Tb$ is a scalar, we have $x^Tb=b^Tx$. And since $a^Tx$ and $y^Tb$ are both scalars, they commute: $(a^Tx)(y^Tb)=(y^Tb)(a^Tx)$.
In conclusion, we find $\alpha=(a^Ty)(b^Tx)+(y^Tb)(a^Tx)=(a^Tyb^T+y^Tba^T)x$. Thus $c=(a^Tyb^T+y^Tba^T)^T=\boxed{b(y^Ta)+a(b^Ty)}$.
a = [1,2,7,3]
b = [9,-2,4,5]
y = [0,2,3,1]
x = [-3,2,4,-5]
α = a' * (y * x' + x * y') * b # write aᵀ(yxᵀ+xyᵀ)b in Julia
-938
c = b * (y' * a) + a * (b' * y) # our formula in terms of a, b, y
# this should output "true":
c' * x ≈ α
true
Hooray, it matches as desired.
(Since the result is an exact integer with no roundoff errors, we could have equivalently checked for exact equality with c' * x == α rather than using ≈.)
Let $f(x)$ be the function:
$$ f(x) = \Vert (B + xx^T)^{-1} b \Vert $$where $B$ is an $n \times n$ matrix, and $b$ is an $n$-component column vector, and the input $x$ is an $n$-component column vector.
(a) Give a formula for $df = f(x+dx) - f(x)$, i.e. to first order in $dx$, as a linear operation on $dx$. (Your answer will have matrix inverses in it; that's okay.)
It might be helpful to break this into steps (a.k.a. a chain rule).
(b) Using your answer from problem 1, write $df = (\mbox{some vector})^T dx$, and hence give a formula for $\nabla f$.
(c) Check your $\nabla f$ formula by comparing it to the numerically computed $f(x+dx) - f(x)$ for a small arbitrary $dx \in \mathbb{R}^4$.
To differentiate $f$, first let $y=(B + xx^T)^{-1} b$, so that $f(x) = \Vert y \Vert$. Applying the rule $d(A^{-1}) = -A^{-1}\,dA\,A^{-1}$ with $A = B + xx^T$ and $dA = dB + dx\, x^T + x(dx)^T$, we get $$dy = -(B + xx^T)^{-1}\bigl(dB + dx\, x^T+x(dx)^T\bigr)(B + xx^T)^{-1}b .$$ Since $dB=0$, this simplifies to $$dy = \boxed{-(B + xx^T)^{-1}\bigl(dx\, x^T+x(dx)^T\bigr)y}. $$ Combining this with the norm rule $d\Vert y \Vert = \frac{y^T dy}{\Vert y \Vert}$, we get $$df=\frac{y^Tdy}{f(x)}=\boxed{-\frac{y^T(B + xx^T)^{-1}\bigl(dx\, x^T+x(dx)^T\bigr)y}{f(x)}}. $$
The key thing is to realize that $dx\,x^T + x(dx)^T$ is multiplied by a column vector $y$ on the right and by a row vector on the left, so we can apply our formula from problem 1.
The row vector on the left is $z^T = -y^T (B+xx^T)^{-1} / f(x)$, i.e. $z = -(B^T+xx^T)^{-1} y / f(x)$. Hence, we have $df = z^T \left( dx\,x^T + x(dx)^T \right) y$, which by problem 1 simplifies to: $$ {\underbrace{\left[ y(x^Tz)+z(y^Tx) \right]}_{\nabla f}}^T dx $$ and hence $\nabla f$ is $y(x^Tz)+z(y^Tx) = [yx^T + (y^T x)I] z$ (which could be written in several ways): $$ \nabla f = \boxed{- \frac{\left[yx^T + (y^T x)I \right] (B^T+xx^T)^{-1} y }{f(x)}} . $$
using LinearAlgebra
# some B matrix and b vector
B = [3 5 6 6
2 -2 -5 8
4 3 2 2
5 -8 -1 3]
b = [1,-1,2,-2]
f(x) = norm((B + x*x') \ b) # = ‖(B+xxᵀ)⁻¹b‖
# some arbitrary test inputs:
x = [1,-2,4,6]
dx = [-3,5,0,2] * 1e-8 # a small change in x
f(x + dx) - f(x)
8.416810981515255e-6
y = (B + x*x') \ b
4-element Vector{Float64}:
 17.396551724138078
 -1.6206896551724292
  1.9655172413793267
 -6.948275862069026
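As an intermediate sanity check, the formula for $dy$ derived above can itself be compared against a finite difference. A minimal sketch, re-using the B, b, x, dx, and y defined above:
dy_exact = -((B + x*x') \ ((dx*x' + x*dx') * y)) # -(B+xxᵀ)⁻¹(dx xᵀ + x dxᵀ)y
dy_approx = (B + (x+dx)*(x+dx)') \ b - y # finite difference of y
norm(dy_approx - dy_exact) / norm(dy_exact) # should be small, roughly of order ‖dx‖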
∇f = -(1/f(x)) * (y*x' + (x'*y) * I) * ((B' + x*x') \ y) # our formula from (b)
# this should match f(x+dx)-f(x) above,
# at least to a few digits:
df = ∇f'*dx
8.416807484728842e-6
It matches pretty well, to about 6 significant digits, but not exactly:
approx_df = f(x + dx) - f(x)
exact_df = ∇f'*dx
abs(approx_df - exact_df) / abs(exact_df) # fractional or "relative" error
4.15452820896041e-7
We shouldn't expect a better match than this because dx here is small but not infinitesimal, so f(x + dx) - f(x) is not exactly equal to the derivative times dx: it is a finite-difference approximation to the derivative.
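For example, shrinking dx should shrink the relative error roughly in proportion, since the finite-difference error is second order in dx while df itself is first order. A quick sketch, re-using the f, x, and ∇f defined above:
for s in (1e-4, 1e-6, 1e-8)
    δ = [-3,5,0,2] * s # scaled-down versions of our dx direction
    println(s, " => ", abs((f(x+δ) - f(x)) - ∇f'*δ) / abs(∇f'*δ))
end
(Until, of course, the step becomes so small that floating-point roundoff dominates.)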
See also lecture 3, part 2 of our matrix-calculus course from IAP 2022.
Let $A_0= \begin{pmatrix} 3 & 1 & 2\\ 7 & 3 & -2\\ -2 & -5 & 4\\ \end{pmatrix}$, and consider the nonlinear function $$ g(x) = \underbrace{\begin{pmatrix} x_1 & & \\ & x_2 & \\ & & x_3 \end{pmatrix}}_{\mathrm{Diagonal}(x)\mbox{ in Julia}} A_0 x + x - \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} $$
Here, you will solve the nonlinear equations $g(x) = \vec{0}$ using Newton's method.
(a) Write down the $3\times 3$ Jacobian matrix $g'(x)$ as a function of $x$.
(b) Write a Julia function J(x) below that computes your Jacobian formula from (a).
(c) Check that g(x+dx) ≈ J(x)*dx for a small dx at some x.
(d) Plug your Jacobian, along with the provided g(x) function, into the Newton iteration below, and use the starting guess $x = (1,1,1)$. Give the resulting solution to at least 5 significant digits (in every component).
Let's denote $\mathrm{Diagonal}(x)$ by $\text{diag}(x)$, for brevity. Note that $d(\text{diag}(x)) = \text{diag}(x + dx) - \text{diag}(x) = \text{diag}(dx)$.
Since $g(x)=\text{diag}(x)A_0x+x-\mathbf{1}$, where $\mathbf{1}$ denotes the vector of ones, we have
$$dg(x)=d(\text{diag}(x))A_0x+\text{diag}(x)A_0dx+dx=\text{diag}(dx)A_0x+\text{diag}(x)A_0dx+dx \, .$$Notice that $\text{diag}(a)b$ is just the component-wise product of $a$ and $b$, so we have $\text{diag}(a)b=\text{diag}(b)a$. Thus:
$$dg(x)=\text{diag}(A_0x)dx+\text{diag}(x)A_0dx+dx=\bigl(\text{diag}(A_0x)+\text{diag}(x)A_0+I\bigr)dx$$and hence:
$$g'(x)=\boxed{\text{diag}(A_0x)+\text{diag}(x)A_0+I} . $$
using LinearAlgebra
A₀ = [3 1 2; 7 3 -2; -2 -5 4]
g(x) = Diagonal(x) * (A₀*x) + x .- 1
g (generic function with 1 method)
J(x) = Diagonal(A₀ * x) + Diagonal(x) * A₀ + I # our answer from (b)
J([1,2,3]) # try it — this should give a 3x3 matrix
3×3 Matrix{Int64}:
 15    1   2
 14   14  -4
 -6  -15  13
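As a quick numerical check of the identity $\text{diag}(a)b=\text{diag}(b)a$ used in the derivation (a minimal sketch, with fresh arbitrary vectors so as not to clobber the a and b from earlier problems):
u, v = randn(3), randn(3)
Diagonal(u) * v ≈ Diagonal(v) * u # true: both equal the componentwise product u .* v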
x = randn(3)
dx = randn(3) * 1e-8
g(x+dx) - g(x)
3-element Vector{Float64}:
  2.633208602276227e-7
 -1.0793489657601185e-7
  3.9174144816911394e-8
J(x) * dx # this should match g(x+dx) - g(x) to a few digits
3-element Vector{Float64}:
  2.6332085894549496e-7
 -1.0793489378176674e-7
  3.9174142869503106e-8
As in problem 2c, this does not match exactly because dx is small but not infinitesimal, but it matches to quite a few significant digits:
approx_dg = g(x+dx) - g(x)
exact_dg = J(x) * dx
norm(approx_dg - exact_dg) / norm(exact_dg) # fractional or "relative" error
1.2668479872663542e-8
Checking against finite-difference approximations like this is always a good idea when computing complicated derivatives by hand, because it is quite easy to make mistakes. A bug in your derivative code will almost always give a huge mismatch in the derivative, so a crude check like this is fine.
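For instance, the relative-error comparison above can be packaged into a tiny helper (a hypothetical convenience, not part of the assignment) and re-used whenever you derive a new Jacobian:
# hypothetical helper: relative error between the finite difference of g
# and the claimed linearization J(x)*dx
fd_relerr(g, Jx, x, dx) = norm((g(x + dx) - g(x)) - Jx * dx) / norm(Jx * dx)
fd_relerr(g, J(x), x, randn(3) * 1e-8) # should be tiny, as above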
x = [1,1,1] # initial guess
# run 10 iterations of Newton's method
for i = 1:10
# show some output:
@show x
@show g(x)
println()
x = x - J(x) \ g(x)
end
x = [1, 1, 1]
g(x) = [6, 8, -3]

x = [0.38888888888888895, 0.7222222222222222, 1.1944444444444444]
g(x) = [1.0524691358024696, 1.5277777777777781, 0.6589506172839501]

x = [0.29521557410237326, 0.42675822525240636, 0.7651521622715932]
g(x) = [0.13442762387231522, 0.20195573737161787, 0.022538924170478092]

x = [0.27616346456373086, 0.3805238232380848, 0.713167520051225]
g(x) = [0.003950645721066737, 0.007771942336304738, -0.0031886163236364284]

x = [0.2755084320562215, 0.3787394184213428, 0.7122008856233119]
g(x) = [3.722399865679904e-6, 1.428446952234097e-5, -6.153161123179096e-6]

x = [0.27550808219966877, 0.3787354883129502, 0.7121989245327538]
g(x) = [3.114397628678489e-12, 4.054756530535997e-11, -2.4525048658574633e-11]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287167]
g(x) = [0.0, 0.0, -2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]

x = [0.2755080822002063, 0.37873548830102655, 0.7121989245287168]
g(x) = [0.0, -1.1102230246251565e-16, 2.220446049250313e-16]
Note, as in class, that it converges extremely quickly, as can be seen by how rapidly $g(x)$ is going to zero.
It requires only 5 iterations to get the requested 5 significant digits, and the resulting solution is $\boxed{x = [0.27551, 0.37874, 0.71220]}$.
In 10 iterations, we have the solution to almost 15 digits, or machine precision: it is limited by the roundoff errors stemming from the finite number of digits that are used in computer arithmetic.
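To make the quadratic convergence explicit, one could also track $\Vert g(x) \Vert$ at each step; the number of correct digits roughly doubles per iteration until roundoff. A small sketch re-using g and J from above:
x = [1.0, 1.0, 1.0] # same initial guess
for i = 1:7
    println(i, ": ‖g(x)‖ = ", norm(g(x)))
    x = x - J(x) \ g(x) # Newton step
end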
In problem 3 of pset 3, you considered $Ax = b$ for $$ A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 6 \\ 0 & 0 & 1 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} \alpha \\ 6 \\ 1 \end{pmatrix} $$ and found that it only has solutions for $\alpha = 5$.
(a) Find a basis for $N(A^T)$, the left nullspace.
(b) Because $N(A^T)^\perp = C(A)$, the equation $Ax = b$ only has solutions when $b$ is perpendicular to a basis for $N(A^T)$. Plug in your answer from (a) to check that it is orthogonal to $b$ only when $\alpha = 5$, consistent with your solution in pset 3.
Note that $A^T=\begin{pmatrix}1&1&0\\2&2&0\\3&4&1\\4&6&2\end{pmatrix}$. After elimination (row operations), it simplifies to $U = \begin{pmatrix}1&1&0\\0&0&0\\0&1&1\\0&0&0\end{pmatrix}.$ This has rank 2 and one special solution $$ x = \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix} . $$ Plugging this into $Ux = 0$, we obtain the equations for the pivot variables $$ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 \end{pmatrix} $$ and hence, by backsubstitution, $p_2 = -1$ and $p_1 = 1$.
Hence the special solution basis for $N(A^T)$ is $\boxed{\begin{pmatrix}1\\-1\\1\end{pmatrix}}$.
$\begin{pmatrix}1\\-1\\1\end{pmatrix}$ being orthogonal to $b$ is equivalent to $1\times\alpha + (-1) \times 6 + 1\times 1=0$, which gives $\boxed{\alpha=5}$ exactly as in pset 3.
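Both facts are easy to confirm numerically in Julia, e.g.:
A = [1 2 3 4; 1 2 4 6; 0 0 1 2]
n = [1, -1, 1] # our basis for N(Aᵀ)
A' * n # the zero vector, confirming n ∈ N(Aᵀ)
[5, 6, 1]' * n # 0: b with α = 5 is ⊥ n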
Suppose $S$ is spanned by (1,2,3) and (4,5,6). Then $S^\perp$ is the nullspace of what matrix?
From the definition of column spaces, $S$ is the column space of $A=\begin{pmatrix}1&4\\2&5\\3&6\end{pmatrix}$. From class, we know that $C(A)^\perp = N(A^T)$. Thus $S^\perp$ is the nullspace of $A^T=\boxed{\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}}$.
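As a quick numerical check, the nullspace function from Julia's LinearAlgebra package computes an (orthonormal) basis for this nullspace, which should be orthogonal to both spanning vectors:
using LinearAlgebra
A = [1 4; 2 5; 3 6]
N = nullspace(Matrix(A')) # basis for N(Aᵀ) = S⊥
[1 2 3] * N, [4 5 6] * N # both ≈ zero, so S⊥ ⊥ span{(1,2,3),(4,5,6)}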