Suppose we are solving the ODE $\frac{dx}{dt} = Ax$ with a given initial condition $x(0)$, where $A$ is an $m \times m$ matrix.
(a) Show that we can obtain the solutions $x(\Delta t), x(2\Delta t), x(3\Delta t), \ldots$ at a sequence of discrete times $n\Delta t$ from $Bx(0), B^2x(0), B^3 x(0), \ldots$, i.e. by multiplying each preceding solution by some matrix $B$. (For example, we might use this to plot the solution at a sequence of points.) What is $B$?
(b) We know from class that $x(t)$ must be exponentially decaying if all of the eigenvalues of $A$ have negative real parts. $B^n x(0)$ is decaying if all of the eigenvalues of $B$ have ____________ — why must this follow if the eigenvalues of $A$ have negative real parts?
(a) We can write the solution of the ODE $\frac{dx}{dt}=Ax$ in the form $x(t)=e^{At}x(0)$. Hence the solutions $x(\Delta t), x(2\Delta t), x(3\Delta t), \dots $ are equal to $e^{\Delta tA}x(0), e^{2\Delta tA}x(0), e^{3\Delta tA}x(0),\dots$.
Recall that for square matrices $C, D$ such that $CD=DC$ we have $e^{C+D}=e^{C}e^{D}$. In particular, $e^{\Delta tA}e^{n\Delta t A}= e^{(n+1)\Delta t A}$, so by setting $\boxed{B=e^{\Delta t A}}$ we get $$ x(n\Delta t)=e^{n\Delta t A}x(0)=B^nx(0). $$
(b) $B^nx(0)$ is decaying if all eigenvalues of $B$ have magnitude $<1$. In more detail, recall that for a diagonalizable square matrix $B$ we can write $x(0)=c_1 x_1+c_2x_2+\dots+c_mx_m$ where the $x_i$ are eigenvectors of $B$ and the $c_i$ are scalars determined by $x(0)$. Then $$ B^nx(0)=c_1\mu_1^nx_1+c_2\mu_2^nx_2+\dots+c_m\mu_m^nx_m, $$ where the $\mu_i$ are the corresponding eigenvalues of $B$. Since we make no assumptions about $x(0)$ and the constants $c_i$, for $B^nx(0)$ to decay we need every $\mu_i^n$ to decay as $n\to\infty$, which is exactly the condition $|\mu_i|<1$.
This is consistent with the condition on the decay of the solution $x(t)$: if $\lambda_1, \lambda_2, \dots$ are eigenvalues of $A$ then $e^{\Delta t\lambda_1}, e^{\Delta t\lambda_2}, \dots, e^{\Delta t\lambda_m}$ are eigenvalues of $e^{\Delta tA}=B$. Since $|e^z|=e^{\mathrm{Re}{(z)}}$, the condition $\mathrm{Re}(\lambda_i)<0$ is equivalent to $|e^{\Delta t\lambda_i}|=e^{\Delta t\ \mathrm{Re}(\lambda_i)}<1$ for any $\Delta t > 0$.
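As a quick numerical check (a sketch only; the matrix $A$, the shift used to give its eigenvalues negative real parts, and $\Delta t$ below are arbitrary choices, not part of the problem), we can verify in Julia that $B^n x(0)$ matches the exact solution at $t = n\Delta t$ and that all eigenvalues of $B$ have magnitude less than 1:

```julia
using LinearAlgebra

# arbitrary example: shift a random matrix so that all eigenvalues have Re < 0
m = 4
A = rand(m, m)
A = A - (maximum(real(eigvals(A))) + 1) * I

Δt = 0.1
B = exp(Δt * A)            # matrix exponential: B = e^{Δt A}

x0 = rand(m)
n  = 30
@show B^n * x0 ≈ exp(n * Δt * A) * x0     # B^n x(0) equals the exact x(nΔt)
@show maximum(abs.(eigvals(B))) < 1       # |e^{Δt λᵢ}| < 1 since Re(λᵢ) < 0
```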
Suppose that $A$ is a $3 \times 3$ real-symmetric matrix. (Recall from class that such a matrix has real eigenvalues and orthogonal eigenvectors.) Suppose its eigenvalues are $\lambda_1 = 1, \lambda_2 = -1, \lambda_3 = -2$, and corresponding eigenvectors are $x_1,x_2,x_3$. You are given that $x_1 = [1,0,1]$ (denoting a column vector ala Julia).
(a) Give an approximate solution at $t=100$ to $\frac{dx}{dt}=Ax$ for $x(0) = [1,1,0]$. (Give a specific quantitative vector, even if the vector is very big or very small; an answer like "$\approx 0$" or "$\approx \infty$" is not acceptable.)
(b) If $x_2 = [0,1,0]$, give a possible $x_3$. (You should not use these $x_2, x_3$ to solve part (a).)
(a) To find the solution of $\frac{dx}{dt}=Ax$ with $x(0)=[1,1, 0]$, we should expand $x(0)$ in terms of the eigenvectors $$x(0)=c_1 x_1+c_2x_2+c_3x_3$$ and then use the fact that the solution is given by $$ x(t)=e^{At}x(0)=c_1 e^{\lambda_1 t}x_1+c_2e^{\lambda_2 t}x_2+c_3e^{\lambda_3t}x_3. $$
Note that we are only interested in an approximate solution when $t=100$. We know the eigenvalues: $\lambda_1=1, \lambda_2=-1, \lambda_3=-2$. Plugging them into the solution, we get $$ x(100)=c_1 e^{100}x_1+c_2e^{-100}x_2+c_3e^{-200}x_3. $$ The numbers $e^{-100}, e^{-200}$ are extremely small when compared to $e^{100}$ (namely, $e^{-200}\approx 10^{-87}$), so we can approximate $x(100)$ by $c_1e^{100}x_1$.
The only thing left to compute is $c_1$. Since $x_1, x_2, x_3$ are orthogonal, $c_1 x_1$ is just the orthogonal projection of $x(0)$ on $x_1$, and so $c_1= \frac{x_1^{T}x(0)}{x_1^Tx_1}=\frac{1}{2}$. Thus
$$ \boxed{x(100)\approx \frac{e^{100}}{2}\begin{pmatrix}1\\0\\1\end{pmatrix}}. $$

(b) Since $x_1, x_2, x_3$ are orthogonal, $x_3$ must be a vector orthogonal to both $x_1$ and $x_2$. We can find such a vector by solving the linear system of equations whose rows are $x_1^T, x_2^T$: $$ \begin{pmatrix} 1 & 0 & 1\\ 0 & 1 & 0 \end{pmatrix} x_3=0. $$ The matrix is already upper triangular, so back-substitution gives that $x_3$ must be proportional to $\boxed{\begin{pmatrix}-1\\0\\1\end{pmatrix}}$. (You could also just take the cross product of $x_1$ and $x_2$, since we are in three dimensions.)
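A short Julia sketch of both computations (only $x_1$, $x_2$, and $x(0)$ come from the problem; the rest is just a check):

```julia
using LinearAlgebra

x1 = [1, 0, 1]
x2 = [0, 1, 0]
x0 = [1, 1, 0]

# part (a): coefficient of x1 via orthogonal projection
c1 = (x1' * x0) / (x1' * x1)      # = 0.5
x100 = c1 * exp(100) * x1         # ≈ the boxed answer (e^100 ≈ 2.7e43 fits in Float64)
@show c1
@show x100

# part (b): x3 is orthogonal to x1 and x2; in 3 dimensions the cross product works
x3 = cross(x1, x2)                # = [-1, 0, 1], up to sign and scaling
@show x3, x1' * x3, x2' * x3
```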
Let $X = \begin{pmatrix} x_1 & x_2 & \cdots & x_m \end{pmatrix}$ denote eigenvectors of the diagonalizable $m \times m$ complex matrix $A$, with corresponding eigenvalues $\lambda_1, \ldots, \lambda_m$.
(a) The eigenvalues of $A^H$ must be ________? Check your answer for a random complex matrix in Julia, computed with `A = rand(ComplexF64, 5,5)`; note that $A^H$ in Julia is `A'`.
(b) If $A$ is real (so that $A^H = A^T$), why is your answer in (a) consistent with the statement in class that $A$ and $A^T$ have identical eigenvalues?
(c) From $A = X \Lambda X^{-1}$, derive a relationship between eigenvectors $y_1, \ldots, y_m$ of $A^H$ (the "left eigenvectors" of $A$) and the rows or columns of $X^{-1}$.
(d) Using the eigenvectors $y_k$ from (c), what must be true of $y_1^H x_2$ (and similarly for other dot products)?
(a) The eigenvalues of $A^H$ must be $\overline{\lambda_1}, \overline{\lambda_2}, \dots, \overline{\lambda_m}$. To see this, recall that $A^H=(\overline{A})^T$. Taking the transpose of a matrix does not change its eigenvalues, so the eigenvalues of $A^H$ equal the eigenvalues of $\overline{A}$. For the eigenvalues of $\overline{A}$, take the complex conjugate of the eigenvector equation $Ax_1=\lambda_1 x_1$ to get $\overline{A}\,\overline{x_1}=\overline{\lambda_1}\,\overline{x_1}$, i.e. $\overline{x_1}$ is an eigenvector of $\overline{A}$ with eigenvalue $\overline{\lambda_1}$. Repeating this for the other eigenvalues $\lambda_2, \dots, \lambda_m$ shows that the eigenvalues of $\overline{A}$ (and hence of $A^H$) are $\overline{\lambda_1}, \overline{\lambda_2}, \dots, \overline{\lambda_m}$.
The Julia example is below:

```julia
julia> using LinearAlgebra

julia> A = rand(ComplexF64, 5,5);

julia> eigvals(A)
5-element Vector{ComplexF64}:
 -0.39073144452371034 - 0.41196407706139726im
  -0.3799516195876417 + 0.4840499421387676im
   0.4109387055886737 - 0.6270118203940078im
   0.6691537427475441 - 0.04470363613736018im
   2.1783118287380088 + 2.4309666735163464im

julia> eigvals(A')
5-element Vector{ComplexF64}:
  -0.3907314445237102 + 0.41196407706139726im
  -0.3799516195876413 - 0.48404994213876684im
  0.41093870558867407 + 0.6270118203940089im
   0.6691537427475442 + 0.04470363613736026im
    2.178311828738014 - 2.4309666735163535im
```
(b) If $A$ is real then $A^H=A^T$ and the eigenvalues of $A^H=A^T$ should be $\lambda_1, \dots, \lambda_m$. This is consistent with our answer in part (a) because for any eigenvalue $\lambda_i$ of a real matrix $A$ its complex conjugate $\overline{\lambda_i}$ is also an eigenvalue of $A$ (the set of eigenvalues of $A$ consists of individual real eigenvalues and pairs of complex conjugate eigenvalues). So the collections of numbers $\lambda_1, \dots, \lambda_m$ and $\overline{\lambda_{1}}, \dots, \overline{\lambda_{m}}$ are actually the same.
(Note that you cannot simply say that the eigenvalues are real in this case. Real matrices can have complex eigenvalues!)
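For instance (a minimal illustration, not part of the original problem), the real $2\times 2$ rotation matrix below has a complex-conjugate pair of eigenvalues:

```julia
using LinearAlgebra

A = [0 -1; 1 0]     # real matrix: rotation by 90°
eigvals(A)          # ≈ [-im, im], a complex-conjugate pair
```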
(c) Recall that in the diagonalization $A=X\Lambda X^{-1}$ the columns of $X$ are the eigenvectors of $A$, while the diagonal entries of $\Lambda$ are the corresponding eigenvalues.
Take the conjugate transpose of the diagonalization $A=X\Lambda X^{-1}$. Since $(AB)^H=B^HA^H$, we get $A^H=(X^{-1})^H\Lambda^HX^H$, which is exactly a diagonalization $A^H=Y\Lambda^HY^{-1}$ with $Y=(X^{H})^{-1}$ (we use the fact that $(X^H)^{-1}=(X^{-1})^H$, which holds for any invertible $X$ and was derived in class for the transpose). Hence the columns of $Y=(X^{-1})^H$ are the eigenvectors $y_i$ of $A^H$; equivalently, the eigenvectors $y_i$ are the complex conjugates of the rows of $X^{-1}$ (written as columns).
(d) Note that if $Y$ is the matrix with columns $(y_1, \dots, y_m)$ and $X$ is the matrix with columns $(x_1, \dots, x_m)$ then the product $Y^HX$ gives the dot products between $x_i$ and $y_j$: $$ Y^HX=\begin{pmatrix} y_1^Hx_1 & y_1^Hx_2 & \dots\\ y_2^Hx_1 & y_2^Hx_2&\dots\\ \dots & \dots & \dots \end{pmatrix} $$ (we have used the same trick when saying that $Q^TQ=I$ for a matrix $Q$ with orthonormal columns). The computation in part (c) shows that we can choose $Y=(X^H)^{-1}$, so $$ Y^HX=X^{-1}X=I. $$ This implies that $\boxed{y_1^Hx_2=0}$, since it is an off-diagonal entry in the matrix $Y^HX=I$. More generally, $y_i^Hx_i=1$ for the eigenvectors with the same index, while $y_j^Hx_i=0$ for eigenvectors with different indices, at least with this choice $Y=(X^H)^{-1}$ of normalization of $Y$.
If you try computing $Y^H X$ in Julia via `eigvecs(A')' * eigvecs(A)`, you will see that the off-diagonal entries are indeed zero (up to roundoff errors), but the diagonal entries are not 1 because Julia normalizes its eigenvectors differently.
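Here is a hedged sketch of that check in Julia (the $5\times 5$ random $A$ is the same kind of example as in part (a)); it also confirms that the specific choice $Y=(X^H)^{-1}$ from part (c) makes $Y^HX$ the identity up to roundoff:

```julia
using LinearAlgebra

A = rand(ComplexF64, 5, 5)
λ, X = eigen(A)            # columns of X are the right eigenvectors xₖ

# the choice Y = (Xᴴ)⁻¹ = (X⁻¹)ᴴ from part (c): YᴴX = X⁻¹X = I up to roundoff
Y = inv(X)'
@show norm(Y' * X - I)     # ≈ 0

# Julia's own eigenvectors of Aᴴ: off-diagonal entries ≈ 0, diagonal entries ≠ 1
eigvecs(A')' * eigvecs(A)
```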
Suppose that $A$ is an $m \times m$ real-symmetric matrix ($A = A^T$). Consider the function:
$$ f(x) = \frac{x^T A x}{x^T x} $$ that takes as input a real nonzero vector $x \in \mathbb{R}^m$ and returns a real number.
(a) Compute the gradient $\nabla f$ (with respect to $x$).
(b) Show that $\nabla f = 0$ if and only if $x$ is some eigenvector of $A$, in which case $f(x)$ is equal to the eigenvalue!
(c) If the (real) eigenvalues of $A$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m$, then what are the minimum and maximum possible values of $f(x)$?
(a) Recall that $\nabla f$ is defined by $df=(\nabla f)^T dx$. First we compute $df$, using the product (or, rather, quotient) rule and noting that the numerator and denominator are just scalar functions of $x$: $$ df=d\left(\frac{x^TAx}{x^Tx}\right)=\frac{(x^Tx)d(x^TAx)-(x^TAx)d(x^Tx)}{(x^Tx)^2}=\frac{(x^Tx)((dx)^TAx+x^TAdx)-(x^TAx)((dx)^Tx+x^Tdx)}{(x^Tx)^2}. $$ Now we want to get rid of $(dx)^T$ and move all $dx$ factors to the right. To do this we use the trick of transposing $1\times 1$ matrices, $(dx)^Tx=x^Tdx$ and $(dx)^TAx=x^TAdx$ (recall that $A=A^T$): $$ df=\frac{2(x^Tx)x^TAdx-2(x^TAx)x^Tdx}{(x^Tx)^2}=2\frac{(x^Tx)x^TA-(x^TAx)x^T}{(x^Tx)^2}dx. $$ Hence $(\nabla f)^T=2\frac{(x^Tx)x^TA-(x^TAx)x^T}{(x^Tx)^2}$, or equivalently $$ \boxed{\nabla f= 2\frac{Ax(x^Tx)-x(x^TAx)}{(x^Tx)^2} = \frac{2}{x^T x} \left[ Ax - f(x) x \right]}, $$ where we see that one of the terms simplifies to $f(x)$.
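A quick finite-difference sanity check of this formula (a sketch; the random symmetric matrix and the step size below are arbitrary):

```julia
using LinearAlgebra

f(A, x)  = (x' * A * x) / (x' * x)
∇f(A, x) = (2 / (x' * x)) * (A * x - f(A, x) * x)

A  = Symmetric(rand(5, 5))         # random real-symmetric matrix
x  = rand(5)
δx = 1e-8 * rand(5)                # tiny perturbation

# to first order, f(x + δx) - f(x) ≈ (∇f)ᵀ δx
@show f(A, x + δx) - f(A, x)
@show ∇f(A, x)' * δx
```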
(b) By the computation in part (a), for nonzero $x$ we have $\nabla f(x)=0$ if and only if $$ Ax(x^Tx)-x(x^TAx)=0. $$ This is equivalent to $$ Ax=\frac{x^TAx}{x^Tx} x = f(x) x, $$ which is exactly the statement that $x$ is an eigenvector of $A$ with eigenvalue $f(x)$.
(c) Recall from 18.02 that for a function $f(x)$ of several variables the minimum and maximum can be achieved only at critical points $x$, that is, at points where $\nabla f(x)=0$. By part (b) the minimum and maximum can be achieved only at an eigenvector, and for an eigenvector $x_i$ with eigenvalue $\lambda_i$ we have $$ f(x_i)=\frac{\lambda_ix_i^Tx_i}{x_i^Tx_i}=\lambda_i. $$ So the maximal and minimal values of $f(x)$ must be eigenvalues of $A$: the maximum possible value is $\lambda_1$ and the minimum possible value is $\lambda_m$.
A technical calculus note: to use the argument based on critical points we also need to show that our function actually attains a maximum and a minimum (for example, we need to exclude the possibility that $f(x)$ diverges to $\pm \infty$). To see this, note that for any nonzero scalar $\alpha$ we have $f(x)=f(\alpha x)$, so to find the minimum and maximum it is enough to consider vectors $x$ of magnitude $1$. In other words, it suffices to look for the maximum and minimum of $f(x)$ on the unit sphere, and since the sphere is compact a continuous function must attain a global maximum and minimum there (this is the extreme value theorem from calculus).
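As a numerical illustration (a sketch with a random symmetric matrix; note that Julia's `eigen` for a `Symmetric` matrix returns eigenvalues in increasing order, so its last eigenvalue corresponds to $\lambda_1$ here), $f$ at random vectors always stays between $\lambda_m$ and $\lambda_1$, and it attains those bounds at the corresponding eigenvectors:

```julia
using LinearAlgebra

f(A, x) = (x' * A * x) / (x' * x)

A = Symmetric(rand(5, 5))
λ, X = eigen(A)                     # eigenvalues sorted in increasing order

@show extrema(f(A, randn(5)) for _ in 1:10_000)   # lies inside (λ[1], λ[end])
@show (λ[1], λ[end])                               # = (λ_m, λ_1) in the problem's notation
@show f(A, X[:, end]) ≈ λ[end]                     # maximum attained at the top eigenvector
```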