Pset 3 (Spring 2020, revised 8pm)
(Optional: if you wish to download this notebook as an .ipynb file, use the download button in the upper right and Option-click (Mac) or Alt-click (Linux and Windows), then choose "Save Link As..." or "Download Linked File As..." or something similar in your browser.)
Problem 1. Let $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ denote the mean of $x$. What is sum(x .- x̄)? (Remember that x .- x̄ is the vector $x$ with $\bar{x}$ subtracted elementwise.) Explain your answer.
If you wish to try some examples in Julia (optional):
using Statistics
n = 8
x = rand(n)
x̄ = mean(x)
sum(x .- x̄)
-2.220446049250313e-16
Solution 1. The sum is zero: $\sum_{i=1}^n(x_i-\bar{x})=\sum_{i=1}^n x_i-n\bar{x}=n\sum_{i=1}^n\frac{x_i}{n}-n\bar{x}=n\bar{x}-n\bar{x}=0$. (In floating-point arithmetic the computed sum is only approximately zero, which is why the example above returns roughly $10^{-16}$ rather than exactly $0$.)
Problem 2. (We may supply some hints as we go, so check back in.)
This problem shows how the QR idea can be used to derive the formulas for best-fit lines.
a) Suppose $(x_i,y_i)$ for $i=1,\dots,m$ are data which we will fit with a best least squares line. Let
a = x .- x̄ and b = y .- ȳ.
The slope of the best line through the $(a_i,b_i)$ is 1) the same 2) bigger 3) smaller
than the slope of the best line through the $(x_i,y_i)$ for $i=1,\dots,m$? Explain.
b) In terms of $a$ and $b$, write an $m \times 2$ matrix $A$ and a right-hand side expressing $A\,[\text{slope};\ \text{intercept}] \approx$ right-hand side.
c) What is the dot product of the first column of $A$ with the second?
d) Use the result in c to derive the QR factorization of $A$ without too much hard work.
e) What is the slope of the best least squares line?
f) The intercept is very simple. Why is this result obvious?
g) What are the slope and intercept for the original data (xᵢ,yᵢ) without any hard work?
Solution 2. a) 1) The same. After translating by $(-\bar{x},-\bar{y})$, the best-fit line for the $(x_i,y_i)$ coincides with that for the $(a_i,b_i)$. This is because parallel translation preserves the distances between points and lines, so the least squares error of a set of points with respect to a line is unchanged when both are translated.
b) $\begin{pmatrix} a_1 & 1 \\ a_2 & 1 \\ \vdots & \vdots \\ a_m & 1 \end{pmatrix}\begin{pmatrix} \text{slope} \\ \text{intercept} \end{pmatrix} \approx \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}$
c) $0$, by Problem 1.
d) $\begin{pmatrix} a_1 & 1 \\ a_2 & 1 \\ \vdots & \vdots \\ a_m & 1 \end{pmatrix}=\begin{pmatrix} a_1 / \|a\| & 1/ \sqrt{m} \\ a_2/\|a\| & 1/ \sqrt{m} \\ \vdots & \vdots \\ a_m/\|a\| & 1/ \sqrt{m} \end{pmatrix}\begin{pmatrix} \|a\| & 0 \\ 0 & \sqrt{m} \end{pmatrix}$
e) $\text{slope} = [1 \ 0]\,R^{-1}Q^Tb = (1/\|a\|)\,[1 \ 0]\,Q^Tb = (a^Tb)/\|a\|^2$, which can be recognized as the formula for $\hat\beta$ in simple linear regression (as on Wikipedia).
f) The intercept is $0$. Once the means are removed, we expect the best-fit line to pass through $(0,0)$. One can also look directly at the second entry of $Q^Tb$, which is $\frac{1}{\sqrt{m}}\sum_{i=1}^m b_i = 0$ by the solution to Problem 1.
g) We expect the line to shift so that it passes through $(\bar{x},\bar{y})$ instead of $(0,0)$.
Parts (a) and (e) tell us that the slope is $\frac{\sum_{i=1}^m a_ib_i}{\sum_{i=1}^m a_i^2}$. We then have $y = (\text{slope})\,x + \text{intercept}$, and we expect $(\bar{x},\bar{y})$ to be on the line, so the intercept has to be $\bar{y}-\frac{\sum_{i=1}^m a_ib_i}{\sum_{i=1}^m a_i^2}\,\bar{x}$.
Problem 3. Suppose $A=QR$ is square, where $Q$ is orthogonal and $R$ is upper triangular and invertible. Write the solution to $Ax=b$ in terms of $b$ and possibly $Q$, $Q^T$, and $R$.
Solution 3. $x=R^{-1}Q^Tb$ because $x=A^{-1}b=(QR)^{-1}b=R^{-1}Q^{-1}b=R^{-1}Q^Tb$.
Problem 4. Set up a matrix least squares problem if we are interested in taking $n$ data points $(x_i,y_i)$ and we wish to find the best function $f(x)=c_1e^x+c_2e^{-x}$ through the data points.
Solution 4.
$$ \begin{pmatrix} e^{x_1} & e^{-x_1} \\ e^{x_2} & e^{-x_2} \\ \vdots & \vdots \\ e^{x_n} & e^{-x_n} \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \approx \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} $$
Problem 5. Find $A^T$ and $A^{-1}$ and $(A^{-1})^T$ and $(A^T)^{-1}$ for $A=\left( \begin{matrix} 1 & 0 \\ 9 & 3 \end{matrix}\right)$.
Solution 5. $A^T=\begin{pmatrix} 1 & 9 \\ 0 & 3\end{pmatrix}$, $A^{-1}= \begin{pmatrix} 1 & 0 \\ -3 & 1/3 \end{pmatrix}$, $(A^{-1})^T= (A^T)^{-1} = \begin{pmatrix} 1 & -3 \\ 0 & 1/3 \end{pmatrix}$.
Problem 6. True or false (explain or give a counterexample):
6a. The block matrix $\begin{pmatrix} 0 & A\\ A & 0 \end{pmatrix}$ is automatically symmetric.
6b. If A and B are symmetric then their product AB is symmetric.
6c. If A is not symmetric then $A^{−1}$ is not symmetric.
6d. When $A,B,C$ are symmetric, the transpose of $ABC$ is $CBA$.
If $A=A^T$ and $B=B^T$, which of these matrices are certainly symmetric?
6e. $A^2−B^2$
6f. $(A+B)(A−B)$
6g. $ABA$
6h. $ABAB$
Solution 6.
6a. True if you assume $A$ is symmetric, false otherwise; the wording of the problem left this unclear, so either answer could be "right." The transpose of the block matrix is
$\begin{pmatrix} 0 & A^T\\ A^T & 0 \end{pmatrix}$, which equals
$\begin{pmatrix} 0 & A\\ A & 0 \end{pmatrix}$ exactly when $A$ is symmetric.
6b. False. $(AB)^T=B^TA^T=BA$, which equals $AB$ only when $A$ and $B$ commute; it is easy to construct counterexamples (the matrices used for 6f and 6h below give one).
6c. True: "if $A$ is not symmetric then $A^{-1}$ is not symmetric."
Logically this is equivalent to the contrapositive: if $A^{-1}$ is symmetric then so is $A$, or equivalently,
if $A$ is symmetric and invertible, so is $A^{-1}$. If $A$ is symmetric, one can transpose $AA^{-1}=I$
to obtain $(A^{-1})^TA^T=I=(A^{-1})^TA$, showing $A^{-1}=(A^{-1})^T$, which means $A^{-1}$ is symmetric.
6d. True: when $A,B,C$ are symmetric, $(ABC)^T=C^TB^TA^T=CBA$, so the transpose of $ABC$ is indeed $CBA$.
6e) $A^2-B^2$ and 6g) $ABA$ must be symmetric by taking transposes: $(A^2-B^2)^T=(A^T)^2-(B^T)^2=A^2-B^2$ and $(ABA)^T=A^TB^TA^T=ABA$. Taking $A=\begin{pmatrix} 1 & 1\\ 1 & 0 \end{pmatrix}$ and $B= \begin{pmatrix} 1 & 1\\ 1 & 1 \end{pmatrix}$ provides non-symmetric examples for 6f and 6h.
Problem 7 (GS p.106 13)
Compute $U$ and $L$ for the symmetric matrix $A$:
$\left( \begin{matrix} a & a & a & a \\ a & b & b & b \\ a & b & c & c \\ a & b & c & d \end{matrix} \right)$.
Find four conditions on a,b,c,d to get A=LU with four pivots.
Solution 7. $\begin{pmatrix} a & a & a & a \\ a & b & b & b \\ a & b & c & c \\ a & b & c & d \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} a & a & a & a \\ 0 & b-a & b-a & b-a \\ 0 & 0 & c-b & c-b \\ 0 & 0 & 0 & d-c \end{pmatrix}$. Hence $\begin{pmatrix} a & a & a & a \\ a & b & b & b \\ a & b & c & c \\ a & b & c & d \end{pmatrix}=\begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}\begin{pmatrix} a & a & a & a \\ 0 & b-a & b-a & b-a \\ 0 & 0 & c-b & c-b \\ 0 & 0 & 0 & d-c \end{pmatrix}$ is the LU factorization. $A$ has four pivots if and only if $a$, $b-a$, $c-b$, and $d-c$ are all non-zero.
Problem 8. First computation of a singular value decomposition.
Here we wish for you to become familiar with an SVD by changing the m and n from what appears here and testing that the results have the required properties. You merely need to execute and turn in a screenshot. Describe in a few words what kind of matrix each of U, V, and Diagonal(Σ) is for your chosen m and n. Give sizes, and use words like orthogonal, tall skinny orthogonal, diagonal, triangular, etc.
One version of the SVD produces $$ A = U * Diagonal(\Sigma) * V',$$ where $U'U=I$, $V'V=I$, and $\Sigma=(\sigma_1,\sigma_2,\dots)$ with $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$.
$U$ is $m \times n$ and $V$ is $n \times n$ if $ m \ge n$, and
$U$ is $m \times m$ and $V$ is $n \times m$ if $ m \lt n$.
using LinearAlgebra
A = rand(5,3)
U,Σ,V = svd(A)
display(Σ)
3-element Array{Float64,1}:
 2.0908607204660132
 0.5438073613767322
 0.1985541583756679
U'U ≈ I
true
V'V ≈ I
true
# A ≈ U * diagm(Σ) * V'
A ≈ U * Diagonal(Σ) * V'
true
Solution 8. The answer depends on the choice of $m$ and $n$. The first case is $m > n$ (e.g. $A$ is a $5 \times 3$ matrix as above): $U$ is a tall skinny orthogonal matrix of size $m \times n$, $Diagonal(\Sigma)$ is a diagonal matrix of size $n \times n$, and $V$ is an orthogonal matrix of size $n \times n$.
The second case is $m < n$: $U$ is an orthogonal matrix of size $m \times m$, $Diagonal(\Sigma)$ is a diagonal matrix of size $m \times m$, and $V$ is a tall skinny orthogonal matrix of size $n \times m$.
The third case is $m = n$: $U$, $Diagonal(\Sigma)$, and $V$ are all of size $m \times m$, with $U$ and $V$ orthogonal and $Diagonal(\Sigma)$ diagonal.
Problem 9. Suppose a square $A$ has an LU factorization $A=LU$ where $L$ and $U$ are invertible. If $A=QR$, what is $r_{11}$ in terms of possibly elements of $L$ and $U$?
Solution 9. $r_{11}=U_{11} \sqrt{\sum_i L_{i1}^2}$. We know that $r_{11}$ is the norm of the first column of $A$, and that column equals $U_{11}$ times the first column of $L$. (Technically $r_{11}$ could be $\pm$ the above answer, depending on sign conventions in the QR factorization.)
Problem 10. Suppose an n x 2 matrix $A$ is written as $QR$, where $Q$ is a tall skinny orthogonal matrix that is also n x 2, and $R = \left( \begin{matrix} 1 & 3 \\ 0 & 4 \end{matrix} \right)$.
What is the norm of the second column of $A$? Briefly explain.
Solution 10. Its norm is $5=\sqrt{3^2+4^2}$, the norm of the second column of $R$. Since $A^TA=R^TQ^TQR=R^TR$, the norm of every column of $A$ equals the norm of the corresponding column of $R$, and the dot products of columns of $A$ equal those of the columns of $R$ as well.