Due Friday, October 14 at 11am.
Any $n$ linearly independent vectors form a basis for an $n$-dimensional vector space, but some choices of basis can have distorting effects on data. In particular, nearly parallel basis vectors can have unpleasant results.
Consider the vector space $\mathbb{R}^2$ and the nearly parallel basis vectors $\vec{a}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\vec{a}_2 = \begin{pmatrix} 1 \\ 10^{-6} \end{pmatrix}$.
(a) Write the vectors $\vec{b} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \vec{a}_1 x_1 + \vec{a}_2 x_2$ and $\vec{c} = \begin{pmatrix} 1 \\ 0.1 \end{pmatrix} = \vec{a}_1 y_1 + \vec{a}_2 y_2$ in the basis $\vec{a}_1, \vec{a}_2$: find the coefficients $\vec{x}$ and $\vec{y}$.
(b) Compare $\Vert \vec{c}-\vec{b}\Vert$ to $\Vert \vec{y}-\vec{x}\Vert$: what is the ratio $\Vert \vec{y}-\vec{x}\Vert / \Vert \vec{c}-\vec{b}\Vert$ (to 2 significant digits)?
(c) Suppose that you instead have an orthonormal basis given by the columns of $Q$ (an $m\times n$ matrix … not just $\mathbb{R}^2$ in this part). For any vectors $\vec{b} = Q\vec{x}$ and $\vec{c} = Q\vec{y}$ in this basis, show that $\Vert \vec{y}-\vec{x}\Vert = \Vert \vec{c}-\vec{b}\Vert$: orthonormal bases "preserve distances".
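As a numerical illustration of part (c) (an optional sanity check of our own, not a substitute for the proof — the $5\times 3$ size is arbitrary):

```julia
using LinearAlgebra

Q = Matrix(qr(randn(5, 3)).Q)     # 5×3 matrix with orthonormal columns ("thin" Q)
x, y = randn(3), randn(3)
b, c = Q*x, Q*y
@show norm(y - x) ≈ norm(c - b)   # true: orthonormal bases preserve distances
```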
Suppose we have four unknowns $\vec{x} = [x_1, x_2, x_3, x_4] \in \mathbb{R}^4$, and we want to solve the four nonlinear equations: $$ \underbrace{\begin{pmatrix} -5 & -1 & -2 \\ -1 & -2 & 4 \\ -2 & 4 & -8 \end{pmatrix}}_A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = x_4 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \, , \\ x_1^2 + x_2^2 + x_3^2 = 1 \, . $$
(a) Rewrite this equation in the form $\vec{f}(\vec{x}) = \vec{0}$, and compute both $\vec{f}(\vec{x})$ and its Jacobian matrix $J(\vec{x})$.
(b) Fill in the Julia functions below to compute $f$ and $J$, and check that $\vec{f}(\vec{x} + \vec{\delta}) - \vec{f}(\vec{x})\approx J(\vec{x}) \vec{\delta}$ for $\vec{x}=[1,2,3,4]$ and a "random" small vector $\vec{\delta} = [-2,3,4,1] \times 10^{-8}$, using the sample code below.
(c) Starting with $x=[1,2,3,4]$, run a few steps of Newton's method (either copy-and-paste a few steps @show x = x - J(x) \ f(x)
by hand, or write a loop), and verify that it converges quickly. Check that after a few steps you have a very accurate solution: make sure that A*x[1:3]
is nearly x[4]*x[1:3]
and that x[1:3]'*x[1:3]
is nearly 1
. How many Newton steps did it take for x[4]
to have 4 correct decimal digits?
# part (b)
using LinearAlgebra

A = [ -5 -1 -2
      -1 -2  4
      -2  4 -8 ]

function f(x)
    return ???? # return a 4-component vector
end

function J(x)
    return ???? # return a 4×4 matrix of the Jacobian
end
# part (b) continued: check
x = [1,2,3,4]
δ = [-2,3,4,1] * 1e-8 # a "random" small vector
# these should be very close in value:
@show f(x+δ) - f(x)
@show J(x)*δ
# part (c)
x = [1,2,3,4]
# a single Newton step:
@show x = x - J(x) \ f(x)
# part (c): checks
@show f(x)
@show A*x[1:3]
@show x[4]*x[1:3]
@show x[1:3]'*x[1:3] # same as x[1]^2 + x[2]^2 + x[3]^2
# Was x a correct solution?
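If you take the loop option, the iteration can be packaged as a small helper. This is a generic sketch on a toy system of our own, not the problem's `f` and `J` (the names `newton`, `g`, and `Jg` are made up for illustration):

```julia
using LinearAlgebra

# Generic Newton loop: repeatedly solve J(x) δ = f(x) and update x ← x - δ.
function newton(f, J, x; steps=10)
    for _ in 1:steps
        x = x - J(x) \ f(x)   # one Newton step
    end
    return x
end

# Toy system (not the pset's): x₁² + x₂² = 4 and x₁x₂ = 1
g(x)  = [x[1]^2 + x[2]^2 - 4, x[1]*x[2] - 1]
Jg(x) = [2x[1] 2x[2]; x[2] x[1]]

xstar = newton(g, Jg, [2.0, 1.0])
@show norm(g(xstar))   # tiny after quadratic convergence
```

The `\` solve is the same operation as in the `@show` step above; the loop just repeats it.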
In class, we showed how to differentiate some matrix-valued functions: $d(A^2) = A dA + dA A$, which is a linear operator acting on $dA$, and also $d(A^{-1}) = -A^{-1} dA A^{-1}$.
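These identities are easy to sanity-check numerically (an optional check of our own, not part of the problem): compare each differential formula against a finite difference with a small $dA$ — both residuals should be second-order small:

```julia
using LinearAlgebra

A  = [1.0 2; 3 4]
dA = [5.0 -1; 2 3] * 1e-8        # small arbitrary perturbation

# d(A²) = A dA + dA A, up to a second-order error in dA:
@show norm((A + dA)^2 - A^2 - (A*dA + dA*A))

# d(A⁻¹) = -A⁻¹ dA A⁻¹, up to a second-order error in dA:
@show norm(inv(A + dA) - inv(A) + inv(A)*dA*inv(A))
```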
(a) Write $d(A^4)$ as a linear operator acting on $dA$.
(b) We can compute $f(A) = x^T A^{-1} y$ (a scalar-valued function of a square invertible matrix $A$) by solving only one linear system $z = A^{-1}y$ (i.e. solving $Az = y$) and then taking a dot product. Show that $df$ (also a scalar!), for any infinitesimal change $dA$, can be computed by solving only one additional linear system $\_\_\_\_\_\_ = \_\_\_\_\_\_$ (in addition to solving $Az = y$), and then taking some matrix/vector products. (This is called an "adjoint solve" by engineers or "backpropagation" by computer scientists.)
(c) Let $A = \begin{pmatrix} a_1 & a_3 \\ a_2 & a_4 \end{pmatrix}$ be a $2\times 2$ matrix. Let $\text{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4\end{pmatrix} = \vec{a}$ be the vector formed by "stacking" the columns of $A$ (computed by `vec(A)` in Julia). Equivalently, $\vec{a}$ holds the coefficients you get from expanding $A$ in the basis ______________. Let $dA = \begin{pmatrix} da_1 & da_3 \\ da_2 & da_4 \end{pmatrix}$. Show that $\text{vec}(A\,dA + dA\,A) = (\text{some matrix})\,\text{vec}(dA)$: find the $4\times 4$ "some matrix" (which expresses the linear operator $dA \mapsto A\,dA + dA\,A$ in this "vectorized" basis). Check your answer in Julia for an arbitrary choice of `A` and `dA`:
# check your answer for part (c):
A = [1 2
     3 4]
a = vec(A)
dA = [7  9
      6 -3]
some_matrix = ????? # your answer from (c)
vec(A*dA + dA*A) ≈ some_matrix * vec(dA) # should return "true"
(From Strang, section 4.4.)
(a) If $A$ has three orthogonal columns of lengths 1,2,3, what is $A^T A$?
(b) Give an example of a matrix $Q$ with orthonormal columns but $QQ^T \ne I$.
(c) Give an example of two orthogonal vectors that are not linearly independent.
(d) If we have a basis $a_1 = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, a_2, a_3$ of orthogonal vectors in $\mathbb{R}^3$ that are not normalized to length 1, and $$\underbrace{\begin{pmatrix} -1 \\ 3 \\ 2 \end{pmatrix}}_b = Ax = a_1 x_1 + a_2 x_2 + a_3 x_3\,,$$ then $x_1 = \_\_\_\_\_\_\_\_\_\_\_\_$.