Suppose we have a $5\times 5$ diagonalizable matrix $A$ with diagonalization: $$ A = X \underbrace{\begin{pmatrix} 1 & & & & \\ & 2 & & & \\ & & 2 & & \\ & & & 3 & \\ & & & & 4 \end{pmatrix}}_\Lambda X^{-1} \, $$ and another non-diagonalizable matrix $B$ with the Jordan form: $$ B = Y \underbrace{\begin{pmatrix} 1 & & & & \\ & 2 & 1 & & \\ & & 2 & & \\ & & & 3 & \\ & & & & 4 \end{pmatrix}}_J Y^{-1} $$ You are given a function $$ f(\mu, y) = \Vert (M - \mu I)^{-1} y \Vert $$ where $M$ is either $A$ or $B$, but you aren't told which one.
Now, you pick a $y$ at random and evaluate $$\alpha = \frac{f(2.00001, y)}{f(2.0001, y)} \, $$
(a) What would you expect $\alpha$ to be, approximately, if $M = A$? (Imagine expanding $y$ in the basis of the …)
(b) What would you expect $\alpha$ to be, approximately, if $M = B$? (Imagine expanding $y$ in the basis of the …)
(c) Check your predictions from (a) and (b) in Julia for a random `X = randn(5,5)`, a random `Y = randn(5,5)`, and a random `y = randn(5)`. For convenience, the matrices `Λ` and `J` are entered in Julia below.
(a) We are given a diagonalization of the matrix $A$, so we can think in terms of eigenvectors and eigenvalues. In matrix form, $(M-\mu I)^{-1}=(X\Lambda X^{-1}-\mu I)^{-1}=\bigl(X(\Lambda-\mu I)X^{-1}\bigr)^{-1}=X(\Lambda - \mu I)^{-1} X^{-1}$. Equivalently, if we write $y$ in the basis of eigenvectors of $A$ (i.e., the columns of $X$): $y=c_1x_1+\cdots+c_5x_5$, we have $p(A)y=c_1p(\lambda_1)x_1+\cdots+c_5p(\lambda_5)x_5$. Here $p(\lambda)=(\lambda-\mu)^{-1}$, so when $\mu$ approaches $2$, $p(A)y$ is dominated by the terms involving $p(2)$, i.e., $c_2x_2$ and $c_3x_3$. Thus $f(\mu,y)\approx \lVert (2-\mu)^{-1}(c_2x_2+c_3x_3)\rVert$. In conclusion, $\alpha\approx \dfrac{\lvert(2-2.00001)^{-1}\rvert}{\lvert(2-2.0001)^{-1}\rvert}=\boxed{10}$.
(b) We can still expand $y$ in the columns of $Y$: $y=c_1x_1+\cdots+c_5x_5$. But in this case the Jordan form is not diagonal — the $1$ above the diagonal tells us that $B$ is defective, with a repeated eigenvalue $\lambda=2$ that has only one independent eigenvector. Here $x_2$ is the eigenvector and $x_3$ is the generalized eigenvector (Jordan vector) corresponding to the eigenvalue $2$, so $p(B)x_3=p(2)x_3+p'(2)x_2$. Since $p'(2)=-(2-\mu)^{-2}$ blows up faster than $p(2)$ as $\mu\to 2$, $p(B)y$ is dominated by the term involving $p'(2)$, i.e., $c_3p'(2)x_2$. Thus $f(\mu,y)\approx \lVert -(2-\mu)^{-2}c_3x_2\rVert$. In conclusion, $\alpha\approx \dfrac{\lvert-(2-2.00001)^{-2}\rvert}{\lvert-(2-2.0001)^{-2}\rvert}=\boxed{100}$.
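(As a quick sanity check on the derivative rule used above — not required by the problem — here is a minimal Julia sketch verifying that for a $2\times 2$ Jordan block the off-diagonal entry of the resolvent is exactly $p'(2)$. The names `Jblock`, `p`, and `p′` are just illustrative.)
using LinearAlgebra
μ = 1.9                                   # any μ ≠ 2 works for this check
Jblock = [2 1; 0 2]                       # 2×2 Jordan block for λ = 2
p(λ) = inv(λ - μ)                         # p(λ) = (λ − μ)⁻¹
p′(λ) = -inv(λ - μ)^2                     # p′(λ) = −(λ − μ)⁻²
inv(Jblock - μ*I) ≈ [p(2) p′(2); 0 p(2)]  # should print: true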
(c)
using LinearAlgebra
Λ = Diagonal([1,2,2,3,4])
5×5 Diagonal{Int64, Vector{Int64}}:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  2  ⋅  ⋅  ⋅
 ⋅  ⋅  2  ⋅  ⋅
 ⋅  ⋅  ⋅  3  ⋅
 ⋅  ⋅  ⋅  ⋅  4
J = Bidiagonal([1,2,2,3,4],[0,1,0,0],:U)
5×5 Bidiagonal{Int64, Vector{Int64}}:
 1  0  ⋅  ⋅  ⋅
 ⋅  2  1  ⋅  ⋅
 ⋅  ⋅  2  0  ⋅
 ⋅  ⋅  ⋅  3  0
 ⋅  ⋅  ⋅  ⋅  4
Now, let's make a random $A$ from a random $X$ as suggested, and compute the ratio $f(2.00001,y)/f(2.0001,y)$:
X = randn(5,5)
A = X*Λ/X
f(μ,y) = norm((A-μ*I)\y)
y = randn(5)
f(2.00001,y)/f(2.0001,y)
10.000277195061413
Indeed, it is very close to 10 as predicted.
If we do the same thing for $B$, let's call the function $g(\mu,y)$ so that Julia doesn't get confused:
Y = randn(5,5)
B = Y * J / Y
g(μ,y) = norm((B-μ*I)\y)
g(2.00001,y)/g(2.0001,y)
100.01268608114778
Now it is very close to 100 as predicted above.
Suppose $A$ is a positive Markov matrix, so that there is a steady-state eigenvalue $\lambda=1$ with algebraic and geometric multiplicity 1 and all other eigenvalues have $|\lambda|<1$, as stated in class.
Suppose that $A$ is defective, however. Does $A^n x$ still converge as $n \to \infty$ to a steady-state eigenvector for an arbitrary initial $x$? Why or why not? (If we expand $x$ in a basis of eigenvectors + generalized eigenvectors, how does each term behave when you multiply by $A^n$?)
If the matrix is defective, then it may have repeated eigenvalues with too few eigenvectors — we have to use "Jordan vectors" ("generalized eigenvectors") to complete our basis and understand what $A^n$ does to an arbitrary $x$. The key point is that, for a positive Markov matrix, these repeated eigenvalues must have |λ|<1.
To understand what $A^n$ does to an arbitrary vector, it suffices to ask what it does to the eigenvectors and Jordan vectors in our basis. We already know from class what it does to the eigenvectors (the steady state eigenvector stays constant, and the other eigenvectors decay away). But what does it do to the Jordan vectors? We need the Jordan vector terms to decay away too in order to be left with our steady state eigenvector. Do they?
Yes. Consider, as in class, an eigenvalue $\lambda_k$ with multiplicity 2 that has one eigenvector $x_k$ and one Jordan vector $j_k$. From class, we have: $$ A^n j_k = \lambda_k^n j_k + n \lambda_k^{n-1} x_k \, . $$ Both of these terms decay for $|\lambda_k| < 1$: the first term is obvious, and the second term because the exponential decay of $\lambda_k^{n-1}$ beats the linear growth of $n$.
For completeness, we should also consider eigenvalues with multiplicity 3 or higher ("Jordan blocks" of $3\times 3$ or more). I only briefly mentioned those in class, but the key point is that they give terms that look like (higher) derivatives of $\lambda^n$, which are always polynomials in n times powers of $\lambda$. So, if $|\lambda| < 1$, they still decay: exponential decay always beats polynomial growth.
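(Here is a quick numerical illustration of the multiplicity-2 case — a minimal sketch, not a Markov matrix itself: a $2\times 2$ Jordan block with $|\lambda| < 1$ acting on its Jordan vector.)
using LinearAlgebra
λₖ = 0.9                      # any |λₖ| < 1
Jₖ = [λₖ 1; 0 λₖ]             # 2×2 Jordan block: eigenvector xₖ, Jordan vector jₖ
xₖ, jₖ = [1.0, 0.0], [0.0, 1.0]
n = 200
Jₖ^n * jₖ ≈ λₖ^n * jₖ + n*λₖ^(n-1) * xₖ   # true: matches the formula from class
norm(Jₖ^n * jₖ)                            # small — the factor of n loses to the exponential decay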
$A$ is a Hermitian positive-definite matrix with eigenvalues (sorted in descending order) $\lambda_1 > \lambda_2 > \lambda_3 > \cdots > 0$.
We carry out 1000 steps of the power method (repeatedly computing $q \leftarrow \mathrm{normalize}(Aq)$ from a random starting vector, where we define $\mathrm{normalize}(x) = x / \Vert x \Vert$) and we get a pretty good estimate of an eigenvector $q_1$ of the biggest eigenvalue $\lambda_1$.
(a) If we pick another random starting vector $x$, compute $y = x - q_1 q_1^H x$, and apply 1000 steps of the power method, what would you expect to get if $q_1$ were exactly an eigenvector of $\lambda_1$ and the arithmetic is carried out exactly (no tiny roundoff errors in the 15th significant digit)? An eigenvector for which eigenvalue?
(b) If you carry out the steps of (a) but $q_1$ is only a (very good) approximation of an eigenvector (such as you might get from the power method) and/or you make tiny roundoff errors in each arithmetic step, what would you expect to get after 1000 iterations of the power method? An eigenvector for which eigenvalue?
(c) In practice, the power method after the first eigenvector $q_1$ is found (approximately) is better carried out by repeatedly computing $q \leftarrow \mathrm{normalize}((I-q_1q_1^T)Aq)$, a process called "deflation". This should converge to an eigenvector for which eigenvalue?
(d) Check your predictions in parts (b) and (c) numerically below for a random $A$ by filling in the `????` lines.
As usual, it is helpful to imagine expanding $x$ in the basis of eigenvectors, and ask what each step of the algorithm does to each term. To distinguish them from the eigenvector estimates produced by the power method (which may be approximate), let's denote the eigenvectors corresponding to $\lambda_i$ by $v_i$ (chosen orthonormal since $A$ is Hermitian) and suppose that we write $x$ in this basis: $$x=c_1v_1+c_2v_2+c_3v_3+\cdots \, . $$
(a) If $q_1=v_1$ (an exact eigenvector for $\lambda_1$) and we write $y=x-q_1q_1^Hx=c_2v_2+c_3v_3+\cdots$: we have projected $y$ to be orthogonal to $v_1$, so the $v_1$ component is cancelled. If we then multiply (exactly) by $A$ over and over as in the power method, there is no $v_1$ term to amplify, so what we would get is (approximately) proportional to the eigenvector $v_2$ of the second-biggest $|\lambda|$.
More precisely, applying 1000 steps of the power method is mathematically equivalent (in exact arithmetic) to $\text{normalize}(A^{1000} y)=\text{normalize}(c_2\lambda_2^{1000}v_2+c_3\lambda_3^{1000}v_3+\cdots)$. So we expect the result to be approximately $\text{normalize}(c_2\lambda_2^{1000}v_2)=\pm v_2$, an eigenvector corresponding to the second largest-magnitude eigenvalue.
(The reason I keep saying "in exact arithmetic" is that this calculation is impossible on a real computer. For one thing, computing eigenvalues and eigenvectors exactly is impossible even in principle for matrices bigger than $4\times 4$ because solving a quintic equation exactly is impossible. Moreover, real computers can only store a finite number of digits when carrying out calculations, so there will typically be at least some tiny roundoff error in every arithmetic operation.)
(b) If $q_1$ is only an approximation of $v_1$, then the projection $y=x-q_1q_1^Hx$ may not exactly cancel the $c_1 v_1$ term. There will be some tiny leftover term proportional to $v_1$, but when we multiply by $A^{1000}$ (in 1000 power iterations) this tiny leftover is exponentially amplified until it eventually dominates the result again. So, we would expect to get a result (approximately) proportional to $v_1$ (i.e. proportional to $q_1\approx v_1$).
(In fact, even if our starting vector $y$ were exactly orthogonal to $v_1$, arithmetic roundoff errors in the steps of the power method will generally give a tiny $v_1$ component again, which will then be exponentially amplified.)
(c) Here, we are approximately projecting orthogonal to $v_1$ on every iteration of the power method. This forces the $v_1$ component to stay small (if it starts to grow, we immediately project it back to a small quantity), so the result is dominated by the $v_2$ term. So, just as in part (a), what you would get is (approximately) proportional to the eigenvector $v_2$ of the second-biggest |λ|.
Unlike the method in part (a), the method in part (c) actually works in real calculations, as you will verify in the next part.
(A detailed analysis of how errors propagate through these algorithms is a topic for numerical analysis and is outside the scope of 18.06.)
(d) We should check our predictions of parts (b) and (c) by verifying that those algorithms yield approximate eigenvectors of $\lambda_1$ and $\lambda_2$, respectively. So, we should check that `A*q⁽ᵇ⁾ - λ[1]*q⁽ᵇ⁾` is small and that `A*q⁽ᶜ⁾ - λ[2]*q⁽ᶜ⁾` is small (i.e. the `????` should be replaced with `1` and `2` in the two parts) below:
using LinearAlgebra
B = randn(5,5)
A = B'B # a random positive-definite Hermitian matrix
λ = reverse(eigvals(A)) # eigenvalues in descending order
# the power iteration to find q₁
q₁ = normalize(rand(5))
for i = 1:1000
q₁ = normalize(A*q₁)
end
# check that this is approximately an eigenvector of λ₁
# — the following difference should be almost zero:
A*q₁ - λ[1]*q₁
5-element Vector{Float64}:
  0.0
  8.881784197001252e-16
 -4.440892098500626e-16
  0.0
  0.0
# the power iteration in part (b)
q⁽ᵇ⁾ = normalize((I - q₁*q₁')*rand(5))
for i = 1:1000
q⁽ᵇ⁾ = normalize(A*q⁽ᵇ⁾)
end
# check that this is approximately an eigenvector of which λ? ANSWER: λ[1]
A*q⁽ᵇ⁾ - λ[1]*q⁽ᵇ⁾
5-element Vector{Float64}:
 0.0
 8.881784197001252e-16
 0.0
 1.1102230246251565e-16
 0.0
Hooray, it matches our answer in part (b). Now to check part (c):
# the power iteration in part (c)
q⁽ᶜ⁾ = normalize(rand(5))
for i = 1:1000
q⁽ᶜ⁾ = normalize((I - q₁*q₁')*A*q⁽ᶜ⁾)
end
# check that this is approximately an eigenvector of which λ? ANSWER: λ[2]
A*q⁽ᶜ⁾ - λ[2]*q⁽ᶜ⁾
5-element Vector{Float64}:
 -3.552713678800501e-15
 -4.440892098500626e-16
 -1.7763568394002505e-15
 -1.1102230246251565e-15
  3.552713678800501e-15
Again, it matches our answer in part (c).