Chapter 07

Diagonalization

00 — Symbol Glossary


01 — What is Diagonalization?

Definition

A matrix ARn×nA\in\mathbb{R}^{n\times n} is diagonalisable if there exists an invertible matrix PP such that

A=PDP1A=PDP^{-1}

where DD is a diagonal matrix. Equivalently, P1AP=DP^{-1}AP=D.

The columns of PP are nn linearly independent eigenvectors of AA, and the diagonal entries of DD are the corresponding eigenvalues.

Note

Diagonalisation rewrites AA in its "natural" coordinate system — the eigenvector basis — where the transformation is simply scaling along each axis.

Why is this useful?

  • Matrix powers: Ak=PDkP1A^k=PD^kP^{-1}, and Dk=diag(λ1k,,λnk)D^k=\text{diag}(\lambda_1^k,\ldots,\lambda_n^k) is trivial to compute.
  • Matrix exponential: etA=PetDP1e^{tA}=Pe^{tD}P^{-1}, needed for ODEs and Markov chains.
  • Understanding long-run behaviour: as kk\to\infty, the dominant eigenvalue governs Akx0A^k\mathbf{x}_0.
Common mistake

Wrong: every square matrix AA can be written as PDP1PDP^{-1}.
Why it happens: eigenvalues always exist (over C\mathbb{C}), so it feels like diagonalisation should too.
Correct: diagonalisation requires nn linearly independent eigenvectors. Defective matrices (where geometric multiplicity << algebraic multiplicity for some λ\lambda) cannot be diagonalised — they require Jordan form instead.
Check: A=(2102)A=\begin{pmatrix}2&1\\0&2\end{pmatrix} has only one independent eigenvector for λ=2\lambda=2 but the eigenvalue has algebraic multiplicity 2 — not diagonalisable.


02 — Diagonalizability Criterion

Definition

ARn×nA\in\mathbb{R}^{n\times n} is diagonalisable if and only if AA has nn linearly independent eigenvectors.

Sufficient conditions (each independently guarantees diagonalisability):

  1. AA has nn distinct eigenvalues.
  2. AA is symmetric (A=AA=A^\top) — the spectral theorem guarantees an orthonormal eigenbasis.
  3. For every eigenvalue, the geometric multiplicity equals the algebraic multiplicity.
Example

If AA is 3×33\times3 with eigenvalues 1,2,5-1,\,2,\,5 (all distinct), then AA automatically has 3 linearly independent eigenvectors and is diagonalisable, regardless of the rest of the matrix.


03 — The Diagonalization Procedure

Diagonalise $A=\begin{pmatrix}4&1\\2&3\end{pmatrix}$ — eigenvalues $\lambda_1=5$, $\lambda_2=2$ (from Chapter 06)

For λ1=5\lambda_1=5: v1=(11)\mathbf{v}_1=\begin{pmatrix}1\\1\end{pmatrix}. For λ2=2\lambda_2=2: v2=(12)\mathbf{v}_2=\begin{pmatrix}-1\\2\end{pmatrix}.

P=(1112)P=\begin{pmatrix}1&-1\\1&2\end{pmatrix} — column 1 is v1\mathbf{v}_1 for λ1=5\lambda_1=5; column 2 is v2\mathbf{v}_2 for λ2=2\lambda_2=2.

D=(5002)D=\begin{pmatrix}5&0\\0&2\end{pmatrix}d11=λ1=5d_{11}=\lambda_1=5 corresponds to column 1 of PP; d22=λ2=2d_{22}=\lambda_2=2 corresponds to column 2.

det(P)=12(1)1=3\det(P)=1\cdot2-(-1)\cdot1=3 P1=13(2111)P^{-1}=\frac{1}{3}\begin{pmatrix}2&1\\-1&1\end{pmatrix} — use the 2×22\times2 inverse formula: swap diagonal, negate off-diagonal, divide by det\det.

PD=(1112)(5002)=(5254)PD=\begin{pmatrix}1&-1\\1&2\end{pmatrix}\begin{pmatrix}5&0\\0&2\end{pmatrix}=\begin{pmatrix}5&-2\\5&4\end{pmatrix} PDP1=(5254)13(2111)=13(10+2521045+4)=13(12369)=(4123)=APDP^{-1}=\begin{pmatrix}5&-2\\5&4\end{pmatrix}\cdot\frac{1}{3}\begin{pmatrix}2&1\\-1&1\end{pmatrix}=\frac{1}{3}\begin{pmatrix}10+2&5-2\\10-4&5+4\end{pmatrix}=\frac{1}{3}\begin{pmatrix}12&3\\6&9\end{pmatrix}=\begin{pmatrix}4&1\\2&3\end{pmatrix}=A\,\checkmark

Common mistake

Wrong: putting λ1\lambda_1 first in DD but v2\mathbf{v}_2 first in PP.
Why it happens: writing eigenvalues in one order and eigenvectors in another without tracking correspondences.
Correct: the jj-th column of PP must be an eigenvector for the jj-th diagonal entry of DD. Swapping both simultaneously (swapping column jj in PP and entry jj in DD) is fine.
Check: verify AP=PDAP=PD column by column: Avj=λjvjA\mathbf{v}_j=\lambda_j\mathbf{v}_j for each jj.


04 — Computing Matrix Powers via Diagonalization

Ak=PDkP1,Dk=(λ1k00λ2k)A^k = P D^k P^{-1}, \quad D^k = \begin{pmatrix}\lambda_1^k & 0 & \cdots \\ 0 & \lambda_2^k & \cdots \\ \vdots & & \ddots\end{pmatrix}

Compute $A^{10}$ for $A=\begin{pmatrix}4&1\\2&3\end{pmatrix}$

D10=(51000210)=(9765625001024)D^{10}=\begin{pmatrix}5^{10}&0\\0&2^{10}\end{pmatrix}=\begin{pmatrix}9765625&0\\0&1024\end{pmatrix}510=9,765,6255^{10}=9{,}765{,}625; 210=10242^{10}=1024.

A10=13(1112)(9765625001024)(2111)A^{10}=\frac{1}{3}\begin{pmatrix}1&-1\\1&2\end{pmatrix}\begin{pmatrix}9765625&0\\0&1024\end{pmatrix}\begin{pmatrix}2&1\\-1&1\end{pmatrix} — use PP and P1P^{-1} from the diagonalisation step.

PD10=(9765625102497656252048)PD^{10}=\begin{pmatrix}9765625&-1024\\9765625&2048\end{pmatrix} — row ii, col jj: column jj of PP scaled by diagonal entry jj of D10D^{10}.

A10=13(97656252+10249765625102497656252220489765625+22048)A^{10}=\frac{1}{3}\begin{pmatrix}9765625\cdot2+1024&9765625-1024\\9765625\cdot2-2\cdot2048&9765625+2\cdot2048\end{pmatrix} =13(195322749764601195615219769721)=\frac{1}{3}\begin{pmatrix}19532274&9764601\\19561521&9769721\end{pmatrix} These are integers because det(P)=3\det(P)=3 divides all entries in the product.

Note

Without diagonalisation, computing A10A^{10} requires 10 matrix multiplications. With diagonalisation, it requires raising scalars to powers — far cheaper for large kk or large nn.


05 — Orthogonal Diagonalization of Symmetric Matrices

Definition

If A=AA=A^\top (symmetric), then AA is orthogonally diagonalisable: there exists an orthogonal matrix QQ (QQ=IQ^\top Q=I, i.e. Q1=QQ^{-1}=Q^\top) such that

A=QDQA=QDQ^\top

The columns of QQ are orthonormal eigenvectors; the diagonal entries of DD are real eigenvalues.

Example

A covariance matrix Σ\Sigma is symmetric (Σ=Σ\Sigma=\Sigma^\top) and positive semi-definite. By the spectral theorem: Σ=QDQ,D=diag(λ1,,λp),  λi0\Sigma=QDQ^\top, \quad D=\text{diag}(\lambda_1,\ldots,\lambda_p),\;\lambda_i\geq0 The columns of QQ are orthonormal principal directions (PCs); DD contains their variances. This makes PCA a stable, unique decomposition.


06 — Quant Application — Matrix Powers in Markov Chains

A Markov chain transition matrix MM has columns summing to 1 (stochastic matrix). The state after kk steps is sk=Mks0\mathbf{s}_k=M^k\mathbf{s}_0.

If MM is diagonalisable with M=PDP1M=PDP^{-1}:

sk=PDkP1s0\mathbf{s}_k=PD^kP^{-1}\mathbf{s}_0

As kk\to\infty, eigenvalues with λ<1|\lambda|<1 decay to zero. The dominant eigenvalue is always λ1=1\lambda_1=1 (for a stochastic matrix), and MkM^k\to steady-state matrix as kk\to\infty.

In credit risk, rating transition matrices are Markov chains. Computing the 5-year migration probability matrix requires M5=PD5P1M^5=PD^5P^{-1} — diagonalisation makes this tractable.

Similarly, in the Vasicek interest rate model, the mean-reversion dynamics involve a matrix exponential etAe^{tA} which reduces to PetDP1Pe^{tD}P^{-1} when AA is diagonalisable.


Exercises

EXERCISE 7.1

Check whether the matrix has nn distinct eigenvalues (automatic diagonalisability), or has repeated eigenvalues (check geometric vs algebraic multiplicity). A zero eigenvalue does not prevent diagonalisability by itself.

(a) A=(1003)A=\begin{pmatrix}1&0\\0&3\end{pmatrix}: already diagonal; eigenvalues 1,31,3 distinct. Diagonalisable.

(b) B=(2102)B=\begin{pmatrix}2&1\\0&2\end{pmatrix}: λ=2\lambda=2 (multiplicity 2). B2I=(0100)B-2I=\begin{pmatrix}0&1\\0&0\end{pmatrix}; dimker=1<2\dim\ker=1 < 2. Not diagonalisable (defective).

(c) C=(0110)C=\begin{pmatrix}0&-1\\1&0\end{pmatrix}: p(λ)=λ2+1p(\lambda)=\lambda^2+1; roots ±iC\pm i\in\mathbb{C}. Not diagonalisable over R\mathbb{R} (diagonalisable over C\mathbb{C}).

Determine whether each matrix is diagonalisable (over R\mathbb{R}) and explain why:
(a) (1003)\begin{pmatrix}1&0\\0&3\end{pmatrix}, (b) (2102)\begin{pmatrix}2&1\\0&2\end{pmatrix}, (c) (0110)\begin{pmatrix}0&-1\\1&0\end{pmatrix}.

EXERCISE 7.2

Find eigenvalues from det(AλI)=0\det(A-\lambda I)=0. Find eigenvectors for each. Form PP from eigenvectors as columns; DD with eigenvalues matching. Verify AP=PDAP=PD.

A=(3210)A=\begin{pmatrix}3&-2\\1&0\end{pmatrix}.

p(λ)=(3λ)(0λ)(2)(1)=λ23λ+2=(λ1)(λ2)λ1=2,λ2=1p(\lambda)=(3-\lambda)(0-\lambda)-(-2)(1)=\lambda^2-3\lambda+2=(\lambda-1)(\lambda-2) \Rightarrow \lambda_1=2,\,\lambda_2=1.

For λ=2\lambda=2: (A2I)=(1212)(A-2I)=\begin{pmatrix}1&-2\\1&-2\end{pmatrix}; v1=2v2v_1=2v_2; v1=(21)\mathbf{v}_1=\begin{pmatrix}2\\1\end{pmatrix}.

For λ=1\lambda=1: (AI)=(2211)(A-I)=\begin{pmatrix}2&-2\\1&-1\end{pmatrix}; v1=v2v_1=v_2; v2=(11)\mathbf{v}_2=\begin{pmatrix}1\\1\end{pmatrix}.

P=(2111)P=\begin{pmatrix}2&1\\1&1\end{pmatrix}, D=(2001)D=\begin{pmatrix}2&0\\0&1\end{pmatrix}.

det(P)=21=1\det(P)=2-1=1; P1=(1112)P^{-1}=\begin{pmatrix}1&-1\\-1&2\end{pmatrix}.

Check: PDP1=(2111)(2001)(1112)=(4121)(1112)=(3210)=APDP^{-1}=\begin{pmatrix}2&1\\1&1\end{pmatrix}\begin{pmatrix}2&0\\0&1\end{pmatrix}\begin{pmatrix}1&-1\\-1&2\end{pmatrix}=\begin{pmatrix}4&1\\2&1\end{pmatrix}\begin{pmatrix}1&-1\\-1&2\end{pmatrix}=\begin{pmatrix}3&-2\\1&0\end{pmatrix}=A\,\checkmark.

Diagonalise A=(3210)A=\begin{pmatrix}3&-2\\1&0\end{pmatrix}: find PP and DD such that A=PDP1A=PDP^{-1}.

EXERCISE 7.3

Once A=PDP1A=PDP^{-1}, use Ak=PDkP1A^k=PD^kP^{-1}. Compute D6D^6 by raising each diagonal entry to the 6th power, then carry out the two matrix multiplications.

Using A=PDP1A=PDP^{-1} from Exercise 7.2 with λ1=2\lambda_1=2, λ2=1\lambda_2=1.

D6=(260016)=(64001)D^6=\begin{pmatrix}2^6&0\\0&1^6\end{pmatrix}=\begin{pmatrix}64&0\\0&1\end{pmatrix}.

A6=PD6P1=(2111)(64001)(1112)A^6=PD^6P^{-1}=\begin{pmatrix}2&1\\1&1\end{pmatrix}\begin{pmatrix}64&0\\0&1\end{pmatrix}\begin{pmatrix}1&-1\\-1&2\end{pmatrix}.

PD6=(1281641)PD^6=\begin{pmatrix}128&1\\64&1\end{pmatrix}.

A6=(1281641)(1112)=(1271266362)A^6=\begin{pmatrix}128&1\\64&1\end{pmatrix}\begin{pmatrix}1&-1\\-1&2\end{pmatrix}=\begin{pmatrix}127&-126\\63&-62\end{pmatrix}.

Using the diagonalisation from Exercise 7.2, compute A6A^6.

EXERCISE 7.4

For a symmetric matrix, find eigenvalues, then for each eigenvalue find an eigenvector and normalise it to unit length. The resulting unit vectors form the columns of QQ. Verify QQ=IQ^\top Q=I.

S=(2112)S=\begin{pmatrix}2&1\\1&2\end{pmatrix}. p(λ)=(2λ)21=λ24λ+3=(λ3)(λ1)λ1=3,λ2=1p(\lambda)=(2-\lambda)^2-1=\lambda^2-4\lambda+3=(\lambda-3)(\lambda-1) \Rightarrow \lambda_1=3,\,\lambda_2=1.

For λ=3\lambda=3: S3I=(1111)S-3I=\begin{pmatrix}-1&1\\1&-1\end{pmatrix}; v=(11)\mathbf{v}=\begin{pmatrix}1\\1\end{pmatrix}; normalised: q1=12(11)\mathbf{q}_1=\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\end{pmatrix}.

For λ=1\lambda=1: SI=(1111)S-I=\begin{pmatrix}1&1\\1&1\end{pmatrix}; v=(11)\mathbf{v}=\begin{pmatrix}-1\\1\end{pmatrix}; normalised: q2=12(11)\mathbf{q}_2=\frac{1}{\sqrt{2}}\begin{pmatrix}-1\\1\end{pmatrix}.

Q=12(1111)Q=\frac{1}{\sqrt{2}}\begin{pmatrix}1&-1\\1&1\end{pmatrix}, D=(3001)D=\begin{pmatrix}3&0\\0&1\end{pmatrix}, so S=QDQS=QDQ^\top.

Verify: QQ=12(1111)(1111)=12(2002)=IQ^\top Q=\frac{1}{2}\begin{pmatrix}1&1\\-1&1\end{pmatrix}\begin{pmatrix}1&-1\\1&1\end{pmatrix}=\frac{1}{2}\begin{pmatrix}2&0\\0&2\end{pmatrix}=I\,\checkmark.

Orthogonally diagonalise the symmetric matrix S=(2112)S=\begin{pmatrix}2&1\\1&2\end{pmatrix}: find orthogonal QQ and diagonal DD such that S=QDQS=QDQ^\top.

EXERCISE 7.5

Long-run behaviour is governed by the dominant eigenvalue. If λ1>λ2|\lambda_1|>|\lambda_2|, then Akλ1kv1(u1x0)/λ1kA^k\approx\lambda_1^k\mathbf{v}_1(\mathbf{u}_1^\top\mathbf{x}_0)/\lambda_1^k scaled — the component along v2\mathbf{v}_2 decays relative to v1\mathbf{v}_1. What happens if λ>1|\lambda|>1 vs λ<1|\lambda|<1?

Akx0=PDkP1x0=c1λ1kv1+c2λ2kv2A^k\mathbf{x}_0=PD^kP^{-1}\mathbf{x}_0=c_1\lambda_1^k\mathbf{v}_1+c_2\lambda_2^k\mathbf{v}_2 where P1x0=(c1,c2)P^{-1}\mathbf{x}_0=(c_1,c_2)^\top.

Case λ1>1|\lambda_1|>1, λ2<1|\lambda_2|<1: the v2\mathbf{v}_2 component decays. As kk\to\infty, Akx0c1λ1kv1A^k\mathbf{x}_0\approx c_1\lambda_1^k\mathbf{v}_1 — diverges along v1\mathbf{v}_1. The dominant eigenvalue λ1\lambda_1 governs long-run growth rate.

Case λ1=λ2<1|\lambda_1|=|\lambda_2|<1: both components decay. Akx00A^k\mathbf{x}_0\to\mathbf{0} — the system is stable.

Case λ1=1|\lambda_1|=1, λ2<1|\lambda_2|<1 (stochastic matrix): the state converges to the eigenvector v1\mathbf{v}_1 for λ1=1\lambda_1=1 — the steady state distribution.

Suppose A=PDP1A=PDP^{-1} with D=diag(λ1,λ2)D=\text{diag}(\lambda_1,\lambda_2). Describe the long-run behaviour of Akx0A^k\mathbf{x}_0 for three cases: λ1>1>λ2|\lambda_1|>1>|\lambda_2|; λ1=λ2<1|\lambda_1|=|\lambda_2|<1; λ1=1>λ2\lambda_1=1>|\lambda_2|.

EXERCISE 7.6

A credit rating transition matrix MM is stochastic. M4M^4 gives 4-year transitions. Diagonalise MM (or note the special structure), raise DD to the 4th power, and read off the probability of migrating from AA to default in 4 years. Eigenvalue 11 corresponds to the absorbing state (default is absorbing here).

M=(0.90.101)M=\begin{pmatrix}0.9&0.1\\0&1\end{pmatrix} (AA stays AA with prob 0.90.9; default is absorbing).

Upper triangular: eigenvalues λ1=0.9\lambda_1=0.9, λ2=1\lambda_2=1.

M4=(0.9401)=(0.65610.343901)M^4=\begin{pmatrix}0.9^4&\star\\0&1\end{pmatrix}=\begin{pmatrix}0.6561&0.3439\\0&1\end{pmatrix}.

(Top-right entry: 10.94=0.34391-0.9^4=0.3439, since rows must sum to 1 for a stochastic matrix.)

Interpretation: a bond rated AA today has a 34.39%34.39\% probability of having defaulted within 4 years according to this simple model. The dominant eigenvalue λ2=1\lambda_2=1 is the default (absorbing) state; all probability mass eventually flows there as kk\to\infty, because 0.9k00.9^k\to0.

A simplified two-state credit rating model has transition matrix M=(0.90.101)M=\begin{pmatrix}0.9&0.1\\0&1\end{pmatrix} where state 1 = AA-rated, state 2 = default (absorbing). Compute the 4-year transition matrix M4M^4 and interpret the result for a bond currently rated AA.


Chapter Summary

ConceptFormula / Rule
DiagonalisationA=PDP1A=PDP^{-1}; columns of PP = eigenvectors; diagonal of DD = eigenvalues
Diagonalisable conditionnn linearly independent eigenvectors (sufficient: nn distinct eigenvalues)
Matrix powerAk=PDkP1A^k=PD^kP^{-1}; Dk=diag(λ1k,,λnk)D^k=\text{diag}(\lambda_1^k,\ldots,\lambda_n^k)
Matrix exponentialetA=PetDP1e^{tA}=Pe^{tD}P^{-1} for diagonalisable AA
Spectral theoremSymmetric A=QDQA=QDQ^\top; QQ orthogonal
Defective matrixmg<mam_g<m_a for some λ\lambda; requires Jordan form
Dominant eigenvalueGoverns Akx0A^k\mathbf{x}_0 as kk\to\infty

Next chapter: Chapter 08 — Linear Transformations, where we study maps T:VWT:V\to W satisfying T(u+v)=T(u)+T(v)T(\mathbf{u}+\mathbf{v})=T(\mathbf{u})+T(\mathbf{v}) and T(cv)=cT(v)T(c\mathbf{v})=cT(\mathbf{v}), and connect them to matrix representations.