Suppose that you have the following Markov matrix, an "absorbing" Markov matrix because once you reach state 5 you "never escape":
M = [ 0.899 0.5 0.1 0.5 0
0.1 0.3 0.4 0.0 0
0.0 0.1 0.4 0.0 0
0.001 0.0 0.1 0.49 0
0.0 0.1 0.0 0.01 1 ]
5×5 Matrix{Float64}:
 0.899  0.5  0.1  0.5   0.0
 0.1    0.3  0.4  0.0   0.0
 0.0    0.1  0.4  0.0   0.0
 0.001  0.0  0.1  0.49  0.0
 0.0    0.1  0.0  0.01  1.0
(a) Check (numerically with Julia) that $M^n x$ will eventually (for $n \to \infty$) reach a steady state for any $x$. What must be true of the eigenvalues of $M$ for this to happen?
using LinearAlgebra
λ, X = eigen(M);
# check something about λ?
(b) Suppose that you start with $x = e_1 = [1,0,0,0,0]$ (i.e. in "state 1"). As we saw in class for Chutes & Ladders, the probability $p(n)$ of reaching state $e_5 = [0,0,0,0,1]$ after exactly $n$ steps is

$$p(n) = e_5^T (M^n - M^{n-1}) e_1 .$$

This is plotted below. As you can see, the probability $p(n)$ goes down very slowly after an initial rapid rise. In fact, from the fact that it is almost a straight line on a semilog plot (right), the decay of $p(n)$ is approximately exponential: $p(n) \approx (\text{some coefficient}) e^{-kn}$ for some constant $k$. What is $k$, as determined from the eigenvalues of $M$?
To solve this, imagine expanding $e_1$ in the basis of eigenvectors. What does $(M^n - M^{n-1}) e_1$ look like in that basis? What happens when $n$ is large?
using PyPlot
e₁ = [1,0,0,0,0]
e₅ = [0,0,0,0,1]
n = 1:400
figure(figsize=(10,3))
subplot(1,2,1)
plot(n, [e₅'*(M^n - M^(n-1))*e₁ for n in n], "bo-")
title(L"probability of reaching state $e_5$ in $n$ steps")
xlabel(L"number $n$ of steps")
ylabel("probability")
subplot(1,2,2)
semilogy(n, [e₅'*(M^n - M^(n-1))*e₁ for n in n], "bo-")
title("same plot, semilog scale")
xlabel(L"number $n$ of steps")
(c) As discussed in class, the average number of steps to reach state $e_5$ is therefore:

$$\sum_{n=1}^\infty n \, e_5^T (M^n - M^{n-1}) e_1 = e_5^T \left[ \sum_{n=1}^\infty n (M^n - M^{n-1}) \right] e_1 .$$
Suppose you are given the diagonalization $M = X \operatorname{diag}(\lambda_1, \lambda_2, \lambda_3, \lambda_4, 1) X^{-1}$ of $M$ (noting that one of the eigenvalues must $= 1$). In terms of $X$ and these $\lambda_k$, give an exact formula for the matrix:

$$\sum_{n=1}^\infty n (M^n - M^{n-1}) = \; ?$$

You may use the formula for an arithmetico-geometric series $\sum_{n=1}^\infty n r^n = r/(1-r)^2$ and/or for a geometric series $\sum_{n=0}^\infty r^n = 1/(1-r)$ (note the difference in starting $n$), but you may only apply these formulas to scalars (and be careful when $r = 1$!).
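As a quick numerical sanity check of the two scalar series formulas (not part of the required answer), we can sum many terms for a sample $|r| < 1$ and compare against the closed forms:

```julia
r = 0.6  # any scalar with |r| < 1
arithmetico_geometric = sum(n * r^n for n in 1:10^4)  # should ≈ r/(1-r)^2
geometric = sum(r^n for n in 0:10^4)                  # should ≈ 1/(1-r)
(arithmetico_geometric, r/(1-r)^2, geometric, 1/(1-r))
```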
(d) We can compute the average number of steps numerically, to a good approximation, just by summing a lot of terms:
# average number of steps to reach e₅, summed numerically:
sum([n * e₅'*(M^n-M^(n-1))*e₁ for n in 1:10^5])
75.11223892250649
Check that this answer matches $e_5^T \left[ \sum_{n=1}^\infty n (M^n - M^{n-1}) \right] e_1$ using your formula from part (c) and the $X$ and $\lambda$ computed by Julia.
(a)
x = rand(5) # choose random x
x = x / sum(x) # normalize x to have sum = 1
n = 10000; # choose high power of M
(M^n)*x
5-element Vector{Float64}:
 3.1689731134239033e-60
 5.126029853310557e-61
 8.741192965850824e-62
 2.399200803373035e-62
 1.000000000000001
We see that $M^n x$, for a random $x$, converges to $e_5 = [0,0,0,0,1]$ as $n \to \infty$, which makes sense since state 5 is the absorbing state.
To check this more rigorously, we need to look at the eigenvalues (computed numerically above), and verify that $|\lambda| < 1$ except for the one eigenvalue $\lambda = 1$. We can compute the absolute values (magnitudes) in Julia with:
λ # the eigenvalues
5-element Vector{ComplexF64}:
 0.10181769041350378 + 0.0im
 0.5003800408562611 - 0.04926978942324787im
 0.5003800408562611 + 0.04926978942324787im
 0.9864222278739748 + 0.0im
 1.0 + 0.0im
abs.(λ) # absolute values (magnitudes |λ|) of the eigenvalues
5-element Vector{Float64}:
 0.10181769041350378
 0.5027998582310109
 0.5027998582310109
 0.9864222278739748
 1.0
The criterion for convergence to a steady state from any initial condition is that every eigenvalue $\lambda$ satisfy $\lambda = 1$ or $|\lambda| < 1$. In our case, we see that exactly one $\lambda$ satisfies $\lambda = 1$, while the rest of the eigenvalues satisfy $|\lambda| < 1$.
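A compact way to verify this criterion numerically (a sketch restating the check above; `M` is redefined so the snippet is self-contained):

```julia
using LinearAlgebra
M = [ 0.899 0.5 0.1 0.5  0
      0.1   0.3 0.4 0.0  0
      0.0   0.1 0.4 0.0  0
      0.001 0.0 0.1 0.49 0
      0.0   0.1 0.0 0.01 1 ]
λ = eigvals(M)
num_inside = count(μ -> abs(μ) < 1 - 1e-8, λ)  # eigenvalues strictly inside the unit circle
num_one    = count(μ -> abs(μ - 1) < 1e-8, λ)  # eigenvalues equal to 1 (up to roundoff)
(num_inside, num_one)                          # expect (4, 1)
```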
(b)
We have $p(n) \approx (\text{some coefficient}) e^{-kn}$, so to find $k$ we need to compute $-\frac{\log p(n)}{n} \approx k$ for $n$ large enough.
Let $x_0$ be the steady state of $M$, that is, the eigenvector of $M$ with eigenvalue 1, and let $y_1, y_2, y_3, y_4$ be the remaining eigenvectors. We use the convention (as in the values computed by Julia above) that $\lambda_4$ is the second-biggest eigenvalue (by magnitude), $\lambda_3$ is the third-biggest eigenvalue, etc.

Writing $e_1 = c_0 x_0 + \sum_{i=1}^4 c_i y_i$ for some coefficients $c_i$, we find that $M^{n-1} e_1 = c_0 x_0 + \sum_{i=1}^4 \lambda_i^{n-1} c_i y_i$ and $M^n e_1 = c_0 x_0 + \sum_{i=1}^4 \lambda_i^n c_i y_i$. Hence

$$(M^n - M^{n-1}) e_1 = \sum_{i=1}^4 \lambda_i^{n-1} (\lambda_i - 1) c_i y_i ,$$

where the $\lambda = 1$ term has disappeared, and

$$e_5^T (M^n - M^{n-1}) e_1 = \sum_{i=1}^4 \lambda_i^{n-1} (\lambda_i - 1) c_i \, e_5^T y_i .$$

For $n$ large, the terms $\lambda_1^{n-1}, \lambda_2^{n-1}, \lambda_3^{n-1}$ are much smaller than $\lambda_4^{n-1}$, so

$$p(n) = e_5^T (M^n - M^{n-1}) e_1 \approx \lambda_4^{n-1} (\lambda_4 - 1) c_4 \, e_5^T y_4 .$$

Assuming that $c_4 \ne 0$ (which usually happens in practice, and also in our case, as can be checked numerically), and since $p(n) \approx (\text{some coefficient}) e^{-kn}$, we see that $k = -\log \lambda_4$. Let's do it a bit more carefully: for some coefficient $A$,

$$A e^{-nk} \approx \lambda_4^{n-1} (\lambda_4 - 1) c_4 \, e_5^T y_4 ,$$

so, taking the log of both sides,

$$-nk \approx (n-1) \log \lambda_4 + \log\left( \frac{(\lambda_4 - 1) c_4 \, e_5^T y_4}{A} \right) .$$

Dividing both sides by $-n$, and taking $n \to \infty$, we get $k \approx -\log \lambda_4$. The numerical value of $k$ is:
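The assumption that $c_4 \ne 0$ can be checked numerically: the coefficients of $e_1$ in the eigenvector basis are $X^{-1} e_1$. A sketch (redefining `M` so it is self-contained; with Julia's default eigenvalue ordering, `λ[4]` ≈ 0.9864 and `λ[5]` = 1):

```julia
using LinearAlgebra
M = [ 0.899 0.5 0.1 0.5  0
      0.1   0.3 0.4 0.0  0
      0.0   0.1 0.4 0.0  0
      0.001 0.0 0.1 0.49 0
      0.0   0.1 0.0 0.01 1 ]
λ, X = eigen(M)
c = X \ [1, 0, 0, 0, 0]  # coefficients of e₁ in the eigenvector basis
abs(c[4])                # nonzero ⇒ the λ₄ term really does dominate the slow decay
```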
k=-log(λ[4])
0.013670793044884364 - 0.0im
We see that $k$ is very small (because $\lambda_4$ is very close to 1), which explains why the decay rate in the plot is slow.
It is also instructive (though you were not required to do this) to re-plot the data along with $\lambda_4^n$ to see that $\lambda_4^n = e^{-kn}$ is indeed the observed decay:
n = 1:400
semilogy(n, [e₅'*(M^n - M^(n-1))*e₁ for n in n], "bo-")
semilogy(n, abs(λ[4]).^n, "k--")
title("probability of finishing in n steps")
xlabel(L"number $n$ of steps")
legend(["probability", L"|\lambda_4|^n"])
(c) Using the diagonalization $M = X \Lambda X^{-1}$, where $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_4, 1)$, we have $M^m = X \Lambda^m X^{-1}$ for any integer $m$, so

$$M^n - M^{n-1} = X (\Lambda^n - \Lambda^{n-1}) X^{-1} = X \operatorname{diag}(\lambda_1^n - \lambda_1^{n-1}, \ldots, \lambda_4^n - \lambda_4^{n-1}, 0) X^{-1} .$$

Hence,

$$\sum_{n=1}^\infty n (M^n - M^{n-1}) = X \operatorname{diag}\left( \sum_{n=1}^\infty n (\lambda_1^n - \lambda_1^{n-1}), \ldots, \sum_{n=1}^\infty n (\lambda_4^n - \lambda_4^{n-1}), 0 \right) X^{-1} .$$

For each $i = 1, 2, 3, 4$ we have

$$\sum_{n=1}^\infty n (\lambda_i^n - \lambda_i^{n-1}) = (\lambda_i - 1) \sum_{n=1}^\infty n \lambda_i^{n-1} = (\lambda_i - 1) \sum_{n=1}^\infty (n-1) \lambda_i^{n-1} + (\lambda_i - 1) \sum_{n=1}^\infty \lambda_i^{n-1} .$$

Since $|\lambda_i| < 1$ for $i = 1, 2, 3, 4$, these series converge, and we get

$$\sum_{n=1}^\infty (n-1) \lambda_i^{n-1} = \sum_{n=1}^\infty n \lambda_i^n = \frac{\lambda_i}{(1 - \lambda_i)^2} , \qquad \sum_{n=1}^\infty \lambda_i^{n-1} = \sum_{n=0}^\infty \lambda_i^n = \frac{1}{1 - \lambda_i} .$$

We conclude

$$\sum_{n=1}^\infty n (\lambda_i^n - \lambda_i^{n-1}) = (\lambda_i - 1) \frac{\lambda_i}{(1 - \lambda_i)^2} + (\lambda_i - 1) \frac{1}{1 - \lambda_i} = \frac{1}{\lambda_i - 1} ,$$

and hence

$$\sum_{n=1}^\infty n (M^n - M^{n-1}) = X \operatorname{diag}\left( \frac{1}{\lambda_1 - 1}, \ldots, \frac{1}{\lambda_4 - 1}, 0 \right) X^{-1} .$$
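The key scalar identity $\sum_{n=1}^\infty n(\lambda^n - \lambda^{n-1}) = \frac{1}{\lambda - 1}$ is easy to spot-check numerically for a sample $|\lambda| < 1$ (a sketch, not part of the required answer):

```julia
λi = 0.9                                              # any scalar with |λi| < 1
partial = sum(n * (λi^n - λi^(n-1)) for n in 1:10^5)  # partial sum of the series
(partial, 1/(λi - 1))                                 # the two values should agree
```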
(d)
λ, X = eigen(M);
D = Diagonal([1/(λ[1]-1), 1/(λ[2]-1), 1/(λ[3]-1), 1/(λ[4]-1), 0]); # answer from part (c)
e1 = [1,0,0,0,0]
e5 = [0,0,0,0,1]
exact = e5'*X*D/X*e1
75.11223892250644 - 4.826779000179276e-15im
The answer matches the estimated value (from the explicit sum, above) to ≈ 15 decimal places:
estimated = sum([n * e₅'*(M^n-M^(n-1))*e₁ for n in 1:10^5])
abs(exact - estimated) / abs(exact)
7.595031273151641e-16
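As an optional cross-check (not required by the problem), classical absorbing-chain theory gives the same expected absorption time from the transient block $Q$ (the upper-left $4 \times 4$ of $M$): since $\sum_{n \ge 0} Q^n = (I - Q)^{-1}$, the expected number of steps starting from state 1 is $\mathbf{1}^T (I - Q)^{-1} e_1$. A sketch:

```julia
using LinearAlgebra
M = [ 0.899 0.5 0.1 0.5  0
      0.1   0.3 0.4 0.0  0
      0.0   0.1 0.4 0.0  0
      0.001 0.0 0.1 0.49 0
      0.0   0.1 0.0 0.01 1 ]
Q = M[1:4, 1:4]  # transitions among the transient states 1–4 (columns sum to < 1)
steps = ones(4)' * ((I - Q) \ [1, 0, 0, 0])  # expected steps to absorption from state 1
```

This should agree with the ≈ 75.112 computed above.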
In class, we saw that $o = [1, 1, \ldots, 1, 1]^T$ is an eigenvector of $M^T$ with eigenvalue $\lambda = 1$ for any Markov matrix $M$.
(a) If $x_k$ is an eigenvector of $M$ ($M x_k = \lambda_k x_k$) for any other eigenvalue $\lambda_k \ne 1$ of $M$, show that we must have $o^T x_k = 0$: it must be orthogonal to $o$. (Hint: use $o^T = o^T M$.)
(b) Check your result from (a) numerically for the Markov matrix from problem 1.
(c) If we expand an arbitrary $x$ in an eigenvector basis $x = c_1 x_1 + \cdots + c_m x_m$, letting $x_m$ be a steady-state eigenvector ($\lambda_m = 1$) and supposing all of the other eigenvalues are $\ne 1$, show that $o^T x$ gives us a simple formula for $c_m$. (We derived the same formula in a different way in class.)
(a)
Using the hint, we have $o^T x_k = o^T M x_k = \lambda_k o^T x_k$. If $o^T x_k \ne 0$, then we can divide the last equation by $o^T x_k$ to get $\lambda_k = 1$. But we assumed that $\lambda_k \ne 1$. We conclude that $o^T x_k = 0$.
(b)
M = [ 0.899 0.5 0.1 0.5 0
0.1 0.3 0.4 0.0 0
0.0 0.1 0.4 0.0 0
0.001 0.0 0.1 0.49 0
0.0 0.1 0.0 0.01 1 ];
o=[1 1 1 1 1];
λ, X = eigen(M);
abs.(o*X) #compute absolute values of o^Tx_k for all k.
1×5 Matrix{Float64}:
 4.44089e-16  6.53583e-16  6.53583e-16  1.11022e-15  1.0
We see that $o^T x_k = 0$ (up to roundoff error) for $k = 1, 2, 3, 4$ (which correspond to $\lambda_k \ne 1$), and $o^T x_k = 1$ for $k = 5$ (which corresponds to $\lambda_k = 1$).
(c)
By part (a),

$$o^T x = \sum_{i=1}^m c_i \, o^T x_i = c_m \, o^T x_m .$$

We know that $o^T x_m \ne 0$, since otherwise $o$ would be orthogonal to every vector in the basis $x_1, \ldots, x_m$, forcing $o = 0$, which is false. Hence,

$$c_m = \frac{o^T x}{o^T x_m} .$$
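We can spot-check this formula in Julia with the Markov matrix from problem 1 and a random $x$ (a sketch; with Julia's default eigenvalue ordering, the steady-state eigenvector is the last column of `X`):

```julia
using LinearAlgebra
M = [ 0.899 0.5 0.1 0.5  0
      0.1   0.3 0.4 0.0  0
      0.0   0.1 0.4 0.0  0
      0.001 0.0 0.1 0.49 0
      0.0   0.1 0.0 0.01 1 ]
λ, X = eigen(M)
o = ones(5)
x = rand(5)                      # an arbitrary vector
c = X \ x                        # all coefficients of x in the eigenvector basis
c_m = (o' * x) / (o' * X[:, 5])  # the formula cₘ = oᵀx / oᵀxₘ, for the λ = 1 eigenvector
```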
(Based on Strang, section 6.2, problem 9.)
Suppose we form a sequence of numbers $g_0, g_1, g_2, g_3, \ldots$ by the rule

$$g_{k+2} = (1 - w) g_{k+1} + w g_k$$

for some scalar $w$. If $0 < w < 1$, then $g_{k+2}$ could be thought of as a weighted average of the previous two values in the sequence. For example, for $w = 0.5$ (equal weights) this produces the sequence

$$g_0, g_1, g_2, g_3, \ldots = 0, 1, \frac{1}{2}, \frac{3}{4}, \frac{5}{8}, \frac{11}{16}, \frac{21}{32}, \frac{43}{64}, \frac{85}{128}, \frac{171}{256}, \frac{341}{512}, \frac{683}{1024}, \frac{1365}{2048}, \frac{2731}{4096}, \frac{5461}{8192}, \frac{10923}{16384}, \frac{21845}{32768}, \ldots$$
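This sequence is easy to reproduce in Julia by iterating the rule directly (a hypothetical helper, not part of the problem statement):

```julia
# iterate g_{k+2} = (1-w) g_{k+1} + w g_k starting from g₀, g₁
function weighted_average_sequence(w, g0, g1, nterms)
    g = [float(g0), float(g1)]
    while length(g) < nterms
        push!(g, (1 - w) * g[end] + w * g[end-1])
    end
    return g
end

g = weighted_average_sequence(0.5, 0, 1, 17)  # 0.0, 1.0, 0.5, 0.75, 0.625, …
```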
(a) If we define $x_k = \begin{pmatrix} g_{k+1} \\ g_k \end{pmatrix}$, then write the rule for the sequence in matrix form: $x_{k+1} = A x_k$. What is $A$?
(b) Find the eigenvalues and eigenvectors of $A$ (your answers should be a function of $w$). Check your answers with the `λ, X = eigen(A)` function in Julia for $w = 0.1$.
(c) What happens to the eigenvalues and eigenvectors as $w$ gets closer and closer to $-1$? Is there still a basis of eigenvectors and a diagonalization of $A$ for $w = -1$?
(d) For $0 < w < 1$, the eigenvalues immediately tell you which of these three possibilities occurs: does the sequence diverge, decay, or approach a nonzero constant as $n \to \infty$? Does this behavior depend on the starting vector $x_0$?
(e) Find the limit of $A^n$ as $n \to \infty$ (for $0 < w < 1$) from the diagonalization of $A$. (Your answer should be a function of $w$. Google the formula for the inverse of a $2 \times 2$ matrix if you need it.)
(f) For $w = 0.5$, if $g_0 = 0$ and $g_1 = 1$, i.e. $x_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, then show that the sequence approaches $2/3$.
(a) If $x_k = \begin{pmatrix} g_{k+1} \\ g_k \end{pmatrix}$, then we can use the recurrence relation $g_{k+2} = (1-w) g_{k+1} + w g_k$ to write

$$x_{k+1} = \begin{pmatrix} g_{k+2} \\ g_{k+1} \end{pmatrix} = \begin{pmatrix} 1-w & w \\ 1 & 0 \end{pmatrix} \begin{pmatrix} g_{k+1} \\ g_k \end{pmatrix} = A x_k .$$
(b) We can find the eigenvalues of our matrix $A$ by solving the characteristic equation $\det(A - \lambda I) = 0$: \begin{align} -\lambda(1-w-\lambda) - w = 0 \implies \lambda^2 + (w-1)\lambda - w =0. \end{align} Solving this quadratic yields: \begin{align} \lambda &= \frac{1-w \pm \sqrt {(w-1)^2 + 4w}}{2}\\ &= \frac{1-w \pm \sqrt {(w+1)^2 }}{2} \\ &= 1, -w. \end{align}
To find the eigenvector corresponding to $\lambda_1 = 1$, we solve $(A-I)u_1 = 0$: \begin{align} \begin{pmatrix} -w & w \\ 1 & -1 \end{pmatrix} u_1 = 0 \;\; \implies \;\; u_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix} \end{align}
To find the eigenvector corresponding to $\lambda_2 = -w$, we solve $(A+wI)u_2 = 0$: \begin{align} \begin{pmatrix} 1 & w \\ 1 & w \end{pmatrix} u_2 = 0 \;\; \implies \;\; u_2 = \begin{pmatrix} w \\ -1 \end{pmatrix} \end{align}
For $w = 0.1$ we should have eigenvalues $1, -0.1$ with corresponding eigenvectors parallel to $u_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $u_2 = \begin{pmatrix} 0.1 \\ -1 \end{pmatrix}$. We can check this in Julia:
w = 0.1;
A = [1-w w
1 0];
λ, X = eigen(A)
λ
2-element Vector{Float64}:
 -0.1
  1.0
X
2×2 Matrix{Float64}:
 -0.0995037  0.707107
  0.995037   0.707107
The eigenvalues are the same, but the eigenvectors (the columns of $X$) look different. This is because Julia outputs normalized eigenvectors, and eigenvectors are only determined up to a scalar multiple (including sign). If we divide our $u_1$ by $\sqrt{2}$ and our $u_2$ by $-\sqrt{1^2+0.1^2}$, then we will get the same results:
normalize([1, 1]) # the normalize(x) function computes x / ‖x‖
2-element Vector{Float64}:
 0.7071067811865475
 0.7071067811865475
normalize([-0.1, 1])
2-element Vector{Float64}:
 -0.09950371902099893
  0.9950371902099893
(c) For $w = -1$, the eigenvalues coincide (both equal $1$), and $u_2$ becomes $\begin{pmatrix} -1 \\ -1 \end{pmatrix}$, which is parallel to $u_1$. For this particular value of $w$, the matrix $A$ has only one eigenvalue with only one linearly independent eigenvector. This means that there is no basis of eigenvectors and that $A$ is not diagonalizable ($A$ is "defective").
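We can see this numerically as well (a sketch): at $w = -1$ the computed eigenvector matrix is (numerically) singular, signalling that $A$ is defective:

```julia
using LinearAlgebra
w = -1.0
A = [1-w w; 1 0]  # = [2 -1; 1 0], with characteristic polynomial (λ-1)²
λ, X = eigen(A)
λ                 # both eigenvalues ≈ 1
abs(det(X))       # ≈ 0: the two eigenvector columns are (nearly) parallel
```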
(d) If $x_n = A x_{n-1}$, then $x_n = A^n x_0$. We can write $x_0$ as a linear combination of the eigenvectors: $x_0 = \alpha_1 u_1 + \alpha_2 u_2$.
Then $x_n = A^n x_0 = \alpha_1 u_1 + \alpha_2 (-w)^n u_2$. Since $0 < w < 1$, $w^n \to 0$ as $n \to \infty$, and so $x_n \to \alpha_1 u_1$, i.e. $g_n$ tends to a nonzero constant as $n \to \infty$. However, if $\alpha_1 = 0$ (if $x_0$ is parallel to $u_2$), then $g_n \to 0$, so the behavior does depend on the starting vector.
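A quick numerical illustration of both cases (a sketch, with an assumed sample value $w = 0.25$): a generic start converges to a nonzero multiple of $u_1$, while a start parallel to $u_2$ decays to zero:

```julia
using LinearAlgebra

# apply x ↦ A x   n times (wrapped in a function to avoid global-scope issues)
function iterate_map(A, x, n)
    for _ in 1:n
        x = A * x
    end
    return x
end

w = 0.25
A = [1-w w; 1 0]
x_generic  = iterate_map(A, [1.0, 0.0], 200)  # → α₁ u₁ = (1/(1+w)) [1, 1] = [0.8, 0.8]
x_parallel = iterate_map(A, [w, -1.0], 200)   # starts along u₂ ⇒ decays to 0
```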
(e) From the diagonalization formula, we have $A = X\Lambda X^{-1}$. This means that $A^n = (X\Lambda X^{-1})^n = (X\Lambda X^{-1})\cdots(X\Lambda X^{-1}) = X\Lambda^n X^{-1}$. We can use the formula for the inverse of a $2\times 2$ matrix to obtain $X^{-1}$: \begin{align} X = \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \implies X^{-1} = \frac{1}{w+1}\begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \end{align} So: \begin{align} A^n &= \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & (-w)^n \end{pmatrix} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix}\\ &= \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix}\begin{pmatrix} 1 & w \\ (-w)^n & -(-w)^n \end{pmatrix}\\ &= \frac{1}{w+1} \begin{pmatrix} 1+w(-w)^n & w-w(-w)^n \\ 1-(-w)^n & w+(-w)^n \end{pmatrix} \end{align} But $w^n \to 0$ as $n\to \infty$, and so \begin{align} \boxed{A^n \to \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & w \end{pmatrix}} \end{align}
Alternatively, we could have taken the $n \to \infty$ limit before multiplying the matrices together to get the same result more easily: \begin{align} A^\infty &= \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}}_{\Lambda^\infty} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \\ &= \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 & w \\ 0 & 0 \end{pmatrix} = \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & w \end{pmatrix} \end{align}
Notice that $A^\infty$ is a rank-1 matrix: every column is a multiple of $u_1$. That's because, no matter what vector $x$ you start with, $A^\infty x$ is a multiple of $u_1$ (the $u_2$ component decays to zero).
Let's double-check our $A^n$ formula in Julia for $w = 0.1$:
A^4
2×2 Matrix{Float64}:
 0.9091  0.0909
 0.909   0.091
A⁴ = [ 1+w*(-w)^4 w-w*(-w)^4
1-(-w)^4 w+(-w)^4 ] / (w+1)
2×2 Matrix{Float64}:
 0.9091  0.0909
 0.909   0.091
Yup, they match!
A^4 - A⁴ # zero to roundoff error
2×2 Matrix{Float64}:
 0.0          1.38778e-17
 1.11022e-16  2.77556e-17
We can also check our limiting formula by plugging in a big exponent:
A^1000 * (w+1)
2×2 Matrix{Float64}:
 1.0  0.1
 1.0  0.1
(f) To find the limit of $g_n$ as $n\to\infty$ with $g_0 = 0$ and $g_1 = 1$, we find the limit of $x_n = A^n\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as $n\to \infty$: \begin{align} x_n = A^n \begin{pmatrix} 1 \\ 0 \end{pmatrix} &\to \frac{1}{w+1} \begin{pmatrix} 1 & w \\ 1 & w \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix}\\ &= \frac{1}{w+1} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \end{align} Substituting $w=0.5$, we find that $x_n \to \frac{2}{3} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$, and so $g_n\to 2/3$ as $n\to\infty$.
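Finally, we can confirm this limit numerically by iterating the recurrence directly (a sketch):

```julia
# iterate g_{k+2} = (1-w) g_{k+1} + w g_k and return the last term
function g_limit(w, g0, g1, nsteps)
    a, b = float(g0), float(g1)  # a = gₖ, b = gₖ₊₁
    for _ in 1:nsteps
        a, b = b, (1 - w) * b + w * a
    end
    return b
end

g_limit(0.5, 0, 1, 200)  # ≈ 2/3
```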