Consider the following $5 \times 5$ matrix, which is almost upper triangular: $$ A = \begin{pmatrix} 2 & -2 & 0 & 1 & 1\\ 2 & -1 & 1 & -2 & 1\\ & 3 & 4 & -3 & 1\\ & & -2 & -5 & -3\\ & & & 7 & 3\\ \end{pmatrix} $$
In Julia, this corresponds to:
A = [2 -2 0 1 1; 2 -1 1 -2 1; 0 3 4 -3 1; 0 0 -2 -5 -3; 0 0 0 7 3];
(a)
Compute (by hand) its LU factorization $A = LU$ via Gaussian elimination (i.e. give both $L$ and $U$).
Do you notice any special pattern to the nonzero entries in $L$ and/or $U$?
(b)
Why doesn't your hand calculation match the output of Julia's lu(A) function? Try it, it's not even close! (Do using LinearAlgebra in order to access the lu function.)
(c)
Suppose you carried out arithmetic at the same rate, but $A$ was 10 times larger: a $50\times 50$ nearly upper triangular matrix (upper triangular + nonzeros just below the diagonal).
About how much longer should part (a) take? 10 times, 100 times, 1000 times?
Is this different from how it would scale for a general matrix with no special pattern of zero entries?
a) We use Gaussian elimination to perform the following row operations on $A$:
$R_2 - R_1$, $R_3 - 3R_2$, $R_4 - (-2)R_3$ (i.e. $R_4 + 2R_3$), and $R_5 - R_4$.
We get $L$ and $U$ as follows:
$$
L = \begin{pmatrix}
1 & 0 & 0 & 0 & 0\\
1 & 1 & 0 & 0 & 0\\
0 & 3 & 1 & 0 & 0\\
0 & 0 &-2 & 1 & 0\\
0 & 0 & 0 & 1 & 1\\
\end{pmatrix}, \quad
U = \begin{pmatrix}
2 & -2 & 0 & 1 & 1\\
0 & 1 & 1 & -3 & 0\\
0 & 0 & 1 & 6 & 1\\
0 & 0 & 0 & 7 & -1\\
0 & 0 & 0 & 0 & 4\\
\end{pmatrix}
$$
The only nonzero entries in $L$ are the ones along the diagonal and the ones just below the diagonal: this special pattern is called a (lower) bidiagonal matrix. It arises here because $A$ has only one nonzero entry below the diagonal in each column, so each column requires only a single elimination step.
$U$, on the other hand, has no special nonzero pattern; it is just a generic upper-triangular matrix, such as we always get from Gaussian elimination.
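As a quick sanity check (not part of the hand calculation), we can enter these factors in Julia and verify that they multiply back to $A$:

L = [1 0 0 0 0; 1 1 0 0 0; 0 3 1 0 0; 0 0 -2 1 0; 0 0 0 1 1];
U = [2 -2 0 1 1; 0 1 1 -3 0; 0 0 1 6 1; 0 0 0 7 -1; 0 0 0 0 4];
L*U == A  # true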
b) The output of lu(A) does not match our hand calculation because Julia's lu chooses to do some row swaps, as we can tell if we look at the permutation:
using LinearAlgebra
lu(A)
LU{Float64, Matrix{Float64}, Vector{Int64}}
L factor:
5×5 Matrix{Float64}:
 1.0  0.0       0.0        0.0       0.0
 0.0  1.0       0.0        0.0       0.0
 0.0  0.0       1.0        0.0       0.0
 0.0  0.0      -0.0        1.0       0.0
 1.0  0.333333  0.166667  -0.166667  1.0
U factor:
5×5 Matrix{Float64}:
 2.0  -2.0   0.0   1.0   1.0
 0.0   3.0   4.0  -3.0   1.0
 0.0   0.0  -2.0  -5.0  -3.0
 0.0   0.0   0.0   7.0   3.0
 0.0   0.0   0.0   0.0   0.666667
lu(A).P # the permutation matrix
5×5 Matrix{Float64}:
 1.0  0.0  0.0  0.0  0.0
 0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  1.0
 0.0  1.0  0.0  0.0  0.0
lu(A).p # the permutation vector (the permuted order of the rows)
5-element Vector{Int64}:
 1
 3
 4
 5
 2
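We can double-check that these pivoted factors fit together as $PA = LU$:

F = lu(A)
F.L * F.U ≈ F.P * A  # true: the factors satisfy PA = LU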
Of course, Julia is free to re-order the rows if it wants to, which results in different $L$ and $U$ matrices, but why? It's not necessary in order to avoid zeros, as you saw by hand.
As we alluded to in Lecture 5, in practice computers almost always do row swaps during Gaussian elimination. It turns out that they are not just worried about pivots being exactly zero: they also want to avoid pivots that are merely small, in order to avoid exacerbating roundoff errors. The algorithm used by Julia (and, in fact, by virtually all numerical linear-algebra software) is called "partial pivoting": it swaps rows to make each pivot as big in magnitude as possible. That explains the permutation above: after the first elimination step, the candidates for the second pivot are 1 (in row 2) and 3 (in row 3), so partial pivoting swaps row 3 up, and the permuted row order comes out as 1, 3, 4, 5, 2.
We can get the same result as our hand calculation by using lu(A, NoPivot()), which forces Julia to avoid row swaps:
using LinearAlgebra;
lu(A, NoPivot())
LU{Float64, Matrix{Float64}, Vector{Int64}}
L factor:
5×5 Matrix{Float64}:
 1.0  0.0   0.0  0.0  0.0
 1.0  1.0   0.0  0.0  0.0
 0.0  3.0   1.0  0.0  0.0
 0.0  0.0  -2.0  1.0  0.0
 0.0  0.0   0.0  1.0  1.0
U factor:
5×5 Matrix{Float64}:
 2.0  -2.0  0.0   1.0   1.0
 0.0   1.0  1.0  -3.0   0.0
 0.0   0.0  1.0   6.0   1.0
 0.0   0.0  0.0   7.0  -1.0
 0.0   0.0  0.0   0.0   4.0
c) Each row operation requires $\Theta(n)$ work (i.e. proportional to the number $n$ of columns). However, for this matrix structure (which is called upper Hessenberg), we only need to eliminate one nonzero per column, which means that we only need to do $n$ row operations (or actually $n-1$ since the last column has nothing to eliminate). Hence, the total number of arithmetic operations grows as $\boxed{\Theta(n^2)}$, i.e. roughly proportional to $n^2$.
Thus, if $A$ were $10$ times larger, we would expect to do about $10^2 = 100$ times as much arithmetic.
This is very different from the scaling for a general matrix with no special pattern of zero entries. For a general matrix, the number of operations for elimination grows as $\Theta(n^3)$, where $n$ is the size of the matrix, because you have to eliminate $n(n-1)/2$ nonzero entries below the diagonal, each of which costs a $\Theta(n)$ row operation. In that case, a $10\times$ larger matrix would require about $1000\times$ as much arithmetic work.
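To see the $\Theta(n^2)$ scaling concretely, here is a minimal sketch of Hessenberg elimination (the function hessenberg_lu is our own illustrative name, not a library routine); timing it for sizes $n$ and $10n$ should show roughly a $100\times$ increase:

using LinearAlgebra

# Sketch: LU of an upper-Hessenberg matrix without pivoting (assumes the
# pivots are safely nonzero). One Θ(n) row operation per column gives
# Θ(n²) arithmetic overall.
function hessenberg_lu(A)
    U = float(copy(A))                  # work on a floating-point copy
    n = size(U, 1)
    L = Matrix{Float64}(I, n, n)
    for j = 1:n-1
        l = U[j+1, j] / U[j, j]         # multiplier for the lone subdiagonal entry
        U[j+1, j:n] .-= l .* U[j, j:n]  # a single Θ(n) row operation
        L[j+1, j] = l                   # store the multiplier: L is lower bidiagonal
    end
    return L, U
end

hessenberg_lu(A)  # reproduces the L and U from part (a)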
Consider the three $4 \times 4$ matrices $$ A = \begin{pmatrix} 1 & -3 & 0 & 3\\ & 1 & 1 & 2\\ & & 1 & -1\\ & & & 1\\ \end{pmatrix}, \; B = \begin{pmatrix} -3 & -1 & 1 & -2\\ -3 & 0 & 0 & -3\\ 1 & 3 & 3 & -3\\ 2 & 3 & 2 & 0\\ \end{pmatrix}, \; C = A (AB)^{-1} $$
(a) Compute the third column of $C^{-1}$ without computing the whole inverse of any matrix. Think before plugging-and-chugging!
(b) Check your answer by explicitly computing $C^{-1}$ in Julia.
# fill these in:
A = ??????
B = ??????
C = A / (A*B) # computes A(AB)⁻¹
C^-1 # computes C⁻¹
(a) We can compute the third column of $C^{-1}$ without computing the whole inverse of any matrix. Let $x$ be the third column of $C^{-1}$. Then $x = C^{-1} e_3$, where $e_3 = (0,0,1,0)$ is the third column of the $4\times 4$ identity matrix $I$. (In fact, this is true for any matrix: if you multiply it by a column of $I$, you get the corresponding column of the matrix.)
$C = A (AB)^{-1}$ implies that $C^{-1} = AB A^{-1}$, and we want to compute $$ x = C^{-1} e_3 = A\underbrace{B \underbrace{A^{-1} e_3}_y}_z $$ We can then do this in three steps:
1. Compute $y = A^{-1} e_3$, i.e. solve $Ay = e_3$ for $y$. As $A$ is upper triangular, we can solve for $y$ by backsubstitution to obtain $y = \begin{pmatrix} -3 \\ -1 \\ 1 \\ 0 \end{pmatrix}$.
2. Compute $z = By$. This is just a matrix–vector multiplication. We obtain $z = \begin{pmatrix} 11 \\ 9 \\ -3 \\ -7 \end{pmatrix}$.
3. Compute $x = Az$. This is just a matrix–vector multiplication. We obtain $x = \begin{pmatrix} -37 \\ -8 \\ 4 \\ -7 \end{pmatrix}$.
(Note that it would be a lot more work to compute any matrix inverses here. It would also be a lot more work to multiply $AB$ and then multiply by $y$ as $(AB)y$, rather than computing $A(By)$ as above. Parentheses can make a big practical difference in the amount of arithmetic, even though they theoretically do not change the result!)
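As a quick numerical check, the same three steps in Julia (using the A and B matrices as entered in part (b) below):

e3 = [0, 0, 1, 0]
y = A \ e3   # solve Ay = e₃; A is upper triangular, so this is backsubstitution
z = B * y    # z = By
x = A * z    # x = Az = C⁻¹e₃, the third column of C⁻¹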
(b) We can compute $C^{-1}$ in Julia as follows:
A = [1 -3 0 3; 0 1 1 2; 0 0 1 -1; 0 0 0 1];
B = [-3 -1 1 -2; -3 0 0 -3; 1 3 3 -3; 2 3 2 0];
C = A / (A*B) # computes A(AB)⁻¹
C^-1 # computes C⁻¹
4×4 Matrix{Float64}:
 12.0  44.0  -37.0  -154.0
  2.0  15.0   -8.0   -50.0
 -1.0  -3.0    4.0    10.0
  2.0   9.0   -7.0   -31.0
As in our hand calculation, the third column is indeed $(-37,-8,4,-7)$.
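We can also pull out just that column directly, without displaying the whole inverse:

(C^-1)[:, 3]  # gives [-37.0, -8.0, 4.0, -7.0], matching part (a)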
(Based on Strang problems.)
Which of the following subsets of $\mathbb{R}^3$ are actually subspaces? (Recall that $[x_1,x_2,x_3]$, with commas, denotes a column vector.) If it is not a subspace, give a rule that it violates.
Nowadays, numerical linear algebra programs deal with small pivots by swapping rows, leading to the $PA = LU$ factorization. Prof. Edelman mentioned that, in the elder days, people would occasionally swap columns instead, leading to a factorization $AP = LU$ for some (different) permutation $P$ (and different $L$ and $U$ matrices).
Given a factorization $AP = LU$ for a nonsingular matrix $A$, outline how you would use it to solve $Ax=b$ for $x$ (analogous to how we outlined the use of $PA=LU$ in class).
(You can exploit the fact, explained in section 2.7 of the textbook, that you can invert permutations "for free": $P^{-1} = P^T$ for any permutation matrix $P$, where the transpose $P^T$ of a matrix is simply formed by swapping the rows of $P$ with the columns of $P$. We will spend more time on this later in 18.06.)
We are given $AP = LU$, which can be rewritten as $A = LUP^{-1}$ ($= LUP^T$) or equivalently $A^{-1} = P U^{-1} L^{-1}$. We can therefore rewrite $Ax = b$ as $LUP^{-1}x = b$, or equivalently
$$ x = A^{-1} b = P \underbrace{U^{-1} \underbrace{L^{-1} b}_c}_y $$This can be solved in three steps:
1. Compute $c = L^{-1} b$, i.e. solve $Lc = b$ for $c$ by forward substitution.
2. Compute $y = U^{-1} c$, i.e. solve $Uy = c$ for $y$ by backsubstitution.
3. Compute $x = Py$, which simply re-orders the entries of $y$ (no arithmetic required).
Note that we didn't actually need to invert the permutation $P$, although it is easy to do if we wanted to.
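A minimal sketch of these steps in Julia, assuming we are given the factors L and U and a permutation vector p such that A[:, p] == L*U (the variable names here are hypothetical, not from the problem):

c = L \ b               # forward substitution: solve Lc = b
y = U \ c               # backsubstitution: solve Uy = c
x = zeros(length(b))
x[p] = y                # x = Py: entry y[j] goes to position p[j]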
Suppose we have a $3 \times 3$ matrix $A$. We will transform this into a new matrix $B$ by doing operations on the rows or columns of $A$ as follows. For each part, explain how to compute $B$ as $B=EA$ or $B=AE$ (say which!) and give the matrix $E$.