# 18.06 Pset 7 (Due Wed 10/25 @ 11am)

## Problem 1 (10 points)

Working with a basis that is *orthogonal* but not *orthonormal* (i.e. not normalizing vectors to length 1) is often convenient, since some other normalization may be more natural in a given application (e.g. in hand calculations, square roots are annoying). In this problem, you will see what happens if you don't normalize an orthogonal basis to unit length.

Suppose that we have a basis $v_1, v_2, \ldots, v_n \in \mathbb{R}^m$ for some subspace $S \subseteq \mathbb{R}^m$, whose vectors are orthogonal but not normalized to unit length. That is:
$$
v_i^T v_j = \begin{cases} 0 & \mbox{if } i \ne j \\ s_i & \mbox{otherwise} \end{cases}
$$
where $v_i^T v_i = \Vert v_i \Vert^2 = s_i > 0$ are the squared lengths of the vectors.
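
As a concrete instance of this definition, here is a minimal Julia sketch. The vectors `v1` and `v2` are illustrative choices in $\mathbb{R}^3$, not part of the problem; they are orthogonal but have length $\sqrt{2}$ rather than 1.

```julia
using LinearAlgebra  # for dot

# hypothetical example vectors (assumed for illustration only)
v1 = [1, 1, 0]
v2 = [1, -1, 0]

cross_term = dot(v1, v2)  # the i ≠ j case of the definition
s1 = dot(v1, v1)          # the i = j case: squared length of v1
s2 = dot(v2, v2)          # squared length of v2
```

Rescaling either vector changes $s_1$ or $s_2$ but leaves the orthogonality untouched.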

1. If $V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}$ is the $m\times n$ matrix whose columns are the basis vectors $v_i$, what is $V^T V$?

2. In terms of $V$ and/or the vectors $v_i$, write out the projection matrix $P$ onto $S$ (a) as a product of matrices and (b) as a sum of rank-1 matrices.

3. Why is computing the projection $Pb$ of a vector $b$ much cheaper than computing $A(A^T A)^{-1}A^T b$ for an arbitrary $m \times n$ full-column-rank matrix $A$? (Be quantitative. The latter requires $\sim mn^2 + n^3$ operations, since forming $A^T A$ alone costs $\sim mn^2$; what does $Pb$ require using your expressions from the previous part?)

## Problem 2 (20 points)

Often, in least-squares fitting, one wants to *weight* each data point inversely with the (given) error estimates (variances) $\sigma_i^2 > 0$. Data points with more error should *count for less* in the fit.

That is, if we are trying to fit $y = Ax$ to "observations" $b \in \mathbb{R}^m$ with fit parameters $x \in \mathbb{R}^n$, then we try to minimize:
$$
E = \sum_{i=1}^m \frac{(y_i - b_i)^2}{\sigma_i^2}
$$
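
To see how the weights make noisier points count for less, here is a small Julia sketch that evaluates the sum above term by term. The residuals `r` (standing in for $y_i - b_i$) and the error estimates `σ` are made-up values, not data from this problem.

```julia
# hypothetical residuals y - b and error estimates σ (illustrative values only)
r = [1.0, 1.0, 1.0]     # the same raw residual at every data point
σ = [1.0, 2.0, 10.0]    # increasingly uncertain measurements

terms = (r ./ σ).^2     # per-point contributions (y_i - b_i)² / σ_i²
E = sum(terms)          # the weighted error
```

Although all three raw residuals are equal, the point with $\sigma_i = 10$ contributes only $0.01$ to $E$, while the point with $\sigma_i = 1$ contributes $1$.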

It turns out that this process corresponds to using a *modified dot product* "$\cdot_\sigma$"
$$
y \cdot_\sigma z = \sum_{i=1}^m \frac{y_i z_i}{\sigma_i^2} = y^T W z
$$
where $W$ is the $m \times m$ diagonal matrix
$$
W = \begin{pmatrix} \sigma_1^{-2} & & & & \\
 & \sigma_2^{-2} & & & \\
 & & \sigma_3^{-2} & & \\
 & & & \ddots & \\
 & & & & \sigma_m ^{-2} \end{pmatrix}
$$
We can also define the *modified length* $\Vert y \Vert_\sigma = \sqrt{y \cdot_\sigma y} = \sqrt{y^T W y}$.
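
The two forms of the modified dot product can be compared numerically. The sketch below uses made-up values for `σ`, `y`, and `z` (assumptions for illustration, unrelated to the data in part 5) and builds $W$ explicitly as a dense diagonal matrix.

```julia
# hypothetical error estimates and test vectors (not from the problem)
σ = [1.0, 2.0, 0.5]
y = [1.0, 2.0, 3.0]
z = [4.0, -1.0, 2.0]

# diagonal matrix with entries 1/σ_i² on the diagonal, zeros elsewhere
W = [i == j ? 1 / σ[i]^2 : 0.0 for i in 1:3, j in 1:3]

dot_sum = sum(y .* z ./ σ.^2)   # the sum form of y ⋅_σ z
dot_mat = y' * W * z            # the matrix form yᵀ W z
norm_σ  = sqrt(y' * W * y)      # the modified length of y
```

Both `dot_sum` and `dot_mat` should agree up to roundoff, since multiplying by a diagonal $W$ just rescales each component by $\sigma_i^{-2}$.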

1. Write the error $E$ from above as an expression in terms of $Ax$, $b$, and $\cdot_\sigma$ or $\Vert \cdots \Vert_\sigma$.

2. What is the projection matrix $P_\sigma$ that projects $b$ into a vector $p \in C(A)$ so that the "error" $b - p$ is orthogonal to $C(A)$ under *our modified dot product*? i.e. what is the new form of orthogonal projection?

3. Derive an equation, analogous (but not identical!) to the normal equations $A^TA\hat{x}=A^Tb$, for the $\hat{x}$ that minimizes $E$. (See e.g. the derivation of least-squares in class or in the book; the "by algebra" derivation in section 4.3 is particularly short.)

4. Outline the steps of a Gram-Schmidt process that converts a sequence of vectors $a_1,a_2,\ldots$ into vectors $q_1,q_2,\ldots$ that are orthonormal under our modified dot product. That is, it converts a matrix $A$ into a matrix $Q$ with $Q^T W Q = I$ and $C(A)=C(Q)$.
 - If $A = QR$, then give a formula for $R = \cdots$ in terms of $A$, $Q$, and $W$. (This should give an upper-triangular $R$!)
 
5. Apply your Gram-Schmidt process to the following 4 vectors and $W$ matrix in Julia (let Julia be your calculator for matrix/vector operations), and report the value of the *last* vector $q_4$. (At the end, check using the given code that `Q'*W*Q` is nearly $I$.)

In [None]:
# our four vectors a1,a2,a3,a4 and our W matrix
A = [ -1 1 -5 -4
 4 1 -6 -2
 2 -3 -7 1
 0 1 -6 -8
 -7 1 0 5
 -5 8 -5 -5 ]
a1 = A[:,1]
a2 = A[:,2]
a3 = A[:,3]
a4 = A[:,4]
using LinearAlgebra # needed for diagm in Julia 0.7 and later
W = diagm(0 => [1,2,3,4,5,6]) # diagonal weight matrix

In [None]:
# functions for our modified dot product and norm
mydot(x,y) = x'*W*y
mynorm(x) = sqrt(mydot(x,x))

In [None]:
q1 = a1 / mynorm(a1) # here's the first q vector for you
q2 = ????
q3 = ????
q4 = ????

Check that $Q^T W Q \approx I$:

In [None]:
Q = [q1 q2 q3 q4] # put your four vectors into a matrix
Q' * W * Q

## Problem 3 (10 points)

Suppose that we *reverse the order* of the columns of $A$, do Gram-Schmidt, and then *reverse the order* of the resulting basis to get back a matrix $Q$. Then $A = QS$ for some matrix $S$. What is the expected pattern of nonzero entries in $S$, and why? (i.e. is $S$ upper-triangular or ...?)

The following Julia code tries this process for a random matrix $A$ to get $Q$. Give a formula for $S = \cdots$ and compute it in Julia to check your answer (or to give you a hint about what to look for):

In [None]:
# a randomly chosen 10×4 matrix with small integer entries
A = [ 6 -5 0 6
 9 0 -8 6
 4 -8 -7 0
 -9 6 1 -1
 7 3 -3 -4
 1 4 4 6
 8 4 0 9
 7 0 -5 -5
 -2 8 -8 -3
 -1 7 -9 -1]

In [None]:
Arev = reverse(A, dims=2) # A with the columns in reverse order (flipdim was removed in Julia 1.0)

In [None]:
using LinearAlgebra # needed for qr in Julia 0.7 and later
Qrev = Matrix(qr(Arev).Q) # the thin Q from QR factorizing Arev (equivalent to Gram-Schmidt)
Q = reverse(Qrev, dims=2) # reverse the columns to go back in the same order as A

## Problem 4 (10 points)

(From Strang, section 4.4, problem 18.)

Find orthogonal vectors $q_1, q_2, q_3$ by Gram-Schmidt from $a_1, a_2, a_3$ given by:
$$
a_1 = \begin{pmatrix} 1 \\ -1 \\ 0 \\ 0 \end{pmatrix}, \;
a_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \\ 0 \end{pmatrix}, \;
a_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ -1 \end{pmatrix},
$$
which are a basis for the vectors perpendicular to $d = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}$. If you form the $4\times 3$ matrix $Q = \begin{pmatrix} q_1 & q_2 & q_3 \end{pmatrix}$, then what is $Q^T d$?
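
Before starting Gram-Schmidt, the stated fact that $a_1, a_2, a_3$ are perpendicular to $d$ can be checked numerically. This is only a sanity check on the given vectors; the matrix name `A` is introduced here for convenience.

```julia
# the three given vectors and d, copied from the problem statement
a1 = [1, -1, 0, 0]
a2 = [0, 1, -1, 0]
a3 = [0, 0, 1, -1]
d  = [1, 1, 1, 1]

A = [a1 a2 a3]      # 4×3 matrix with the a_i as columns
checks = A' * d     # entry i is the dot product a_iᵀ d
```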

## Problem 5 (10 points)

(From Strang, section 4.4, problem 37.)

We know that $P = QQ^T$ is the projection onto $C(Q)$, where $Q$ is an $m \times n$ matrix with orthonormal columns. Now add another column $a$ to produce $A = \begin{pmatrix} Q & a \end{pmatrix}$. Gram-Schmidt on $A$ replaces $a$ by what vector $q$? (Give a formula in terms of $a$ and $Q$ and/or $P$.)
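
As a reminder of why $QQ^T$ deserves the name "projection," the sketch below checks its defining properties on a small example. The $3 \times 2$ matrix `Q` is a made-up choice with orthonormal columns, not taken from Strang.

```julia
using LinearAlgebra

# hypothetical 3×2 matrix with orthonormal columns (illustrative only)
Q = [1/sqrt(2) 0; 1/sqrt(2) 0; 0 1]

P = Q * Q'                 # candidate projection matrix onto C(Q)

gram = Q' * Q              # should be the 2×2 identity
idempotent_gap = P * P - P # should be zero: projecting twice changes nothing
symmetry_gap = P' - P      # should be zero: orthogonal projections are symmetric
```

Here `gram`, `idempotent_gap`, and `symmetry_gap` should match the identity and zero matrices up to roundoff.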