Due 11am Wednesday March 30. This will be the last problem set covered on exam 2.
Problem 1

Suppose that $A$ is an $m \times n$ "tall" matrix ($m > n$) with full column rank (rank $n$). Consider the "wide" (underdetermined, i.e. fewer equations than unknowns) system of equations:
$$ A^T x = b $$

You are also given the QR factorization $A = QR$ of $A$, where $Q$ is an $m \times n$ matrix with orthonormal columns and $R$ is an $n \times n$ invertible upper-triangular matrix. (For example, maybe you got this by Gram-Schmidt on $A$.) This corresponds to a factorization $A^T = R^T Q^T$, called an "LQ" factorization of $A^T$ (because $L = R^T$ is lower-triangular), which is used in practice to find minimum-norm solutions to underdetermined systems of equations (as opposed to least-squares solutions of overdetermined systems), as you will explore in this problem.
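For concreteness, here is a quick numerical illustration of this LQ relationship (a sketch using only built-in LinearAlgebra routines; the same qr calls reappear in part (f) below):

using LinearAlgebra
A = randn(5, 3)   # a random "tall" matrix (full column rank with probability 1)
F = qr(A)         # its QR factorization
Q = Matrix(F.Q)   # the 5×3 "thin" Q factor with orthonormal columns
R = F.R           # the 3×3 upper-triangular R factor
A' ≈ R' * Q'      # the "LQ" factorization of Aᵀ: should print "true"
istril(R')        # L = Rᵀ is lower triangular: should print "true"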
(a) $A^T x = b$ has a solution for any $b \in \mathbb{R}^n$ because ____________, but the solution $x$ is not unique because __________.
(b) We can find a unique solution $x_0 \in C(A)$ of $A^T x_0 = b$, given by the formula ____________ in terms of $A,b$.
(c) Alternatively, any $x \in C(A)$ can be written $x = Qy$ for some $y\in \mathbb{R}^n$ because ____________. We can find the same solution $x_0 = Qy$ of $A^T x_0 = b$ as in (b), with $y$ given by the formula ____________ in terms of $Q,R,b$ (plug in $A=QR$ and simplify!).
(d) Is the solution $x_0$ in (b) or (c) generally the same as the "particular" solution we would have gotten from the exam-1 approach of setting the free variables equal to zero? Why or why not? (Maybe try with $A^T = [1\;1]$ and $b=[1]$.)
(e) In terms of $x_0$ from (b) or (c), any solution $x$ of $A^T x = b$ (the "complete" solution) can be written as $x = x_0 + x_1$ where $x_1$ is any vector in the subspace ____________. The solution $\hat{x}$ with the minimum norm $\Vert x \Vert$ is therefore $\hat{x} = \_\_\_\_\_\_$. (Hint: write $\Vert x \Vert^2 = x^Tx$ in terms of $x_0$ and $x_1$, remembering orthogonality of subspaces.)
(f) In Julia, x̂ = W \ b for any "wide" full-row-rank matrix W computes the minimum-norm solution $\hat{x}$ to $Wx = b$. Check that this matches your answers from parts (c) and (e) for a random $A$ using the code below:
using LinearAlgebra
A = randn(10, 3) # a random full-rank "tall" matrix
QR = qr(A) # its QR factorization
Q = Matrix(QR.Q) # the 10x3 "thin" Q factor
R = QR.R # and the 3x3 R factor
b = randn(3) # a random right-hand side
x̂ = A' \ b # the minimum-norm solution to Aᵀx = b using the built-in \
y = ?????? # your solution from (c), a formula in terms of Q, R, and/or b
x₀ = Q * y
x̂ₑ = ???? # your solution from (e)
x̂ₑ ≈ x̂ # should print "true"
Problem 2

(a) Apply Gram-Schmidt to the polynomials $\{1, x, x^2\}$ to find an orthonormal basis $\{q_1,q_2,q_3\}$ of polynomials under the inner ("dot") product (different from the one used in class): $$ f \cdot g = \int_0^\infty f(x) g(x) e^{-x} \, dx $$ (Unlike the Legendre polynomials in class, normalize your polynomials $q_k$ to have $\Vert q_k \Vert = \sqrt{q_k \cdot q_k} = 1$ under this inner product, i.e. to be really orthonormal.)
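If you want to check your work numerically, here is a minimal sketch assuming the QuadGK.jl quadrature package, with a hypothetical helper wdot for this weighted inner product. It also uses the fact $\int_0^\infty x^n e^{-x} \, dx = n!$, which is all you need for the hand computation:

using QuadGK
wdot(f, g) = quadgk(x -> f(x) * g(x) * exp(-x), 0, Inf)[1]  # the inner product f·g above
wdot(x -> 1, x -> x)   # ≈ 1  (∫₀^∞ x e⁻ˣ dx = 1! = 1)
wdot(x -> x, x -> x)   # ≈ 2  (∫₀^∞ x² e⁻ˣ dx = 2! = 2)
# once you have q₁, q₂, q₃, check that wdot(qᵢ, qⱼ) ≈ 1 for i == j and ≈ 0 otherwise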
(b) Consider the function $f(x) = \begin{cases} x & x < 1 \\ 0 & x \ge 1 \end{cases}$. Find the degree-1 polynomial $p(x) = \alpha x + \beta$ that is the "best fit" to $f(x)$ in the sense of minimizing $$ \Vert f - \alpha x - \beta \Vert^2 = \int_0^\infty \left[ f(x) - p(x) \right]^2 e^{-x} \, dx $$ In particular, find $p(x)$ by performing the orthogonal projection (with this dot product) of $f(x)$ onto .........?
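Again, a numerical cross-check is possible. This sketch (reusing the hypothetical wdot helper above) finds $\alpha$ and $\beta$ directly from the 2×2 Gram-matrix normal equations, which should agree with the $p(x)$ you obtain from your projection in (b):

using LinearAlgebra
f(x) = x < 1 ? x : 0.0   # the given piecewise function
g1(x) = x; g0(x) = 1.0   # the basis {x, 1} of degree-1 polynomials
G = [wdot(g1, g1) wdot(g1, g0); wdot(g0, g1) wdot(g0, g0)]  # Gram matrix of the basis
α, β = G \ [wdot(f, g1), wdot(f, g0)]  # normal equations for the best fit p(x) = αx + β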
Problem 3

In a common variant of least-squares fitting called "ridge" or "Tikhonov" regression, used to get more robust solutions to noisy fitting problems, one minimizes: $$ f(x) = \Vert b - Ax \Vert^2 + \lambda \Vert x \Vert^2 $$ instead of just $\Vert b - Ax \Vert^2$, for some "penalty" parameter $\lambda \ge 0$. As $\lambda$ gets bigger and bigger, this favors smaller solutions $x$. Here, $A$ is $m \times n$.
(a) Write this same $f(x)$ as $f(x) = \Vert c - Bx \Vert^2$ for some matrix $B$ and some vector $c$ defined in terms of $A,\lambda,b$. Hint: $\Vert x \Vert^2 + \Vert y \Vert^2$ equals the squared length of a single vector ______ made by stacking $x$ and $y$.
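As in problem 1(f), you can sanity-check your answer in Julia. A minimal sketch, where the ?????? placeholders (as in the code above) stand for your own formulas from (a):

using LinearAlgebra
A = randn(10, 4); b = randn(10); λ = 0.1   # a random instance
B = ??????       # your matrix from (a), in terms of A and λ
c = ??????       # your vector from (a), in terms of b
x = randn(4)     # any test vector
norm(c - B*x)^2 ≈ norm(b - A*x)^2 + λ*norm(x)^2   # should print "true"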
(b) Since part (a) rewrote $f(x)$ as an ordinary-looking least-squares problem, use the normal equations for $B$ and $c$ to write down an equation $(n\times n \mbox{ matrix})\hat{x} = (\mbox{right-hand side})$ for the minimizer $\hat{x}$ in terms of $A,\lambda,b$.
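Continuing the sketch above, the minimizer computed from $B$ and $c$ should satisfy your $n \times n$ equation from (b):

x̂ = B \ c    # least-squares minimizer of ‖c − Bx‖² from (a)
M = ??????    # your n×n matrix from (b), in terms of A and λ
M * x̂ ≈ ??????   # ...your right-hand side from (b); should print "true"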
(c) For $\lambda > 0$, show that (b) has a unique solution even if $A$ does not have full column rank (i.e. even if $A^TA$ is singular). In particular, why does your matrix on the left-hand side have a nullspace of $\{0\}$?
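You can also see (c) numerically: make $A$ rank-deficient and observe that $A^TA$ is singular while your matrix from (b) is still invertible. A sketch, reusing the M = ?????? placeholder from above with this new $A$:

using LinearAlgebra
A = randn(10, 2); A = [A A[:, 1]]   # a 10×3 matrix with a repeated column (rank 2)
rank(A' * A)    # prints 2: AᵀA is singular
M = ??????      # your matrix from (b) with this A and λ = 0.1
rank(M)         # should print 3: invertible even though A lacks full column rank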
A really common strategy in mathematics is to try to rewrite new problems so that they look like old problems, letting us re-use the old solutions!