Chapter 20

Numerical Stability & Conditioning

00 · Symbol Glossary

$\kappa(A)$kappa A — condition number

The condition number of an invertible matrix AA measures how much relative errors in b\mathbf{b} or in AA can amplify in the solution of Ax=bA\mathbf{x}=\mathbf{b}. For the 2-norm: κ2(A)=AA1=σ1/σn\kappa_2(A) = \|A\|\,\|A^{-1}\| = \sigma_1/\sigma_n. Large κ(A)\kappa(A) means the problem is ill-conditioned — small input perturbations cause large output changes.

$\varepsilon_{\text{mach}}$epsilon mach — machine epsilon

The smallest floating-point number such that 1+εmach>11 + \varepsilon_{\text{mach}} > 1 in computer arithmetic. For IEEE double precision, εmach2.2×1016\varepsilon_{\text{mach}} \approx 2.2 \times 10^{-16}. All numerical algorithms lose precision at a rate governed by εmachκ(A)\varepsilon_{\text{mach}} \cdot \kappa(A).

$\|A\|$norm of A — matrix norm

A measure of the "size" of a matrix. The operator 2-norm A2=σ1\|A\|_2 = \sigma_1 is the largest stretching factor. The Frobenius norm AF=ijaij2\|A\|_F = \sqrt{\sum_{ij} a_{ij}^2} treats the matrix as a long vector. Condition numbers depend on the norm chosen; κ2\kappa_2 is standard in numerical analysis.

$\tilde{A}$A tilde — computed matrix

The matrix actually stored in floating-point — a perturbed version of the exact AA. Backward stability means the computed answer is the exact answer for a nearby problem A~x=b~\tilde{A}\mathbf{x}=\tilde{\mathbf{b}}.


01 · Why Numerical Stability Matters

Chapters 16–18 derived exact LU, QR, and SVD factorisations. A computer stores numbers to only 16\sim 16 decimal digits. When κ(A)\kappa(A) is large, those digits are not enough — the computed solution can be meaningless even when the algorithm is algebraically correct.

Definition — Well-Conditioned vs Ill-Conditioned

For Ax=bA\mathbf{x}=\mathbf{b}, the relative error in the solution satisfies approximately:

Δxxκ(A)Δbb\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \lesssim \kappa(A) \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}

κ(A)1\kappa(A) \approx 1well-conditioned: input noise barely affects the output.

κ(A)1\kappa(A) \gg 1ill-conditioned: small relative errors in b\mathbf{b} (or AA) can cause large relative errors in x\mathbf{x}.

κ(A)1\kappa(A) \geq 1 always (for any norm). κ(I)=1\kappa(I) = 1.

✓ Example — Portfolio Optimisation Sensitivity

A covariance matrix ΣR100×100\Sigma \in \mathbb{R}^{100 \times 100} has κ(Σ)=106\kappa(\Sigma) = 10^6 — nearly collinear factor exposures make some eigenvalues very small.

A 10810^{-8} relative perturbation in Σ\Sigma (within machine noise) can produce a 108×106=10210^{-8} \times 10^6 = 10^{-2} relative error in optimal weights — a 1%1\% allocation error on a billion-dollar book is 1010 million dollars.

Knowing κ(Σ)\kappa(\Sigma) before inverting tells you whether the optimiser's output is trustworthy.

❌ Failure — Trusting a Solution Without Checking $\kappa(A)$

Solve Ax=bA\mathbf{x}=\mathbf{b} where A=(1111.0001)A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix}, b=(22.0001)\mathbf{b}=\begin{pmatrix}2\\2.0001\end{pmatrix}. Exact solution: x=(11)\mathbf{x}=\begin{pmatrix}1\\1\end{pmatrix}.

Perturb b\mathbf{b} to b~=(22.0002)\tilde{\mathbf{b}}=\begin{pmatrix}2\\2.0002\end{pmatrix} — relative change 104\sim 10^{-4}.

New solution: x=(02)\mathbf{x}=\begin{pmatrix}0\\2\end{pmatrix} — completely different allocation.

Why it breaks: κ(A)2×104\kappa(A) \approx 2 \times 10^4. The near-parallel rows of AA make the system sensitive.

Consequence: a correct LU or QR implementation still returns a useless answer for ill-conditioned AA. The problem is the data, not the code.


02 · The Condition Number via SVD

Definition — Condition Number in the 2-Norm

For invertible ARn×nA \in \mathbb{R}^{n \times n} with singular values σ1σn>0\sigma_1 \geq \cdots \geq \sigma_n > 0:

κ2(A)=σ1σn=A2A12\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = \|A\|_2 \cdot \|A^{-1}\|_2

σ1\sigma_1 — largest singular value; A2=σ1\|A\|_2 = \sigma_1.

σn\sigma_n — smallest singular value; A12=1/σn\|A^{-1}\|_2 = 1/\sigma_n.

κ2(A)1\kappa_2(A) \geq 1. Equality holds for orthogonal matrices (σ1=σn=1\sigma_1 = \sigma_n = 1).

Step-by-step — $\kappa_2(A)$ for $A = \begin{pmatrix}1&2\\2&1\end{pmatrix}$
1

Singular values from Chapter 18: σ1=3\sigma_1 = 3, σ2=1\sigma_2 = 1.

2

Compute condition number: κ2(A)=σ1/σ2=3/1=3\kappa_2(A) = \sigma_1/\sigma_2 = 3/1 = 3.

The 33 comes from the ratio of largest to smallest stretching.

3

Interpret: a relative error in b\mathbf{b} of size 101010^{-10} can cause a relative error in x\mathbf{x} of at most 3×1010\approx 3 \times 10^{-10}. This system is well-conditioned — three digits of safety beyond machine epsilon.

Hilbert Matrix — Canonical Ill-Conditioned Example

The n×nn \times n Hilbert matrix Hij=1/(i+j1)H_{ij} = 1/(i+j-1) has κ2(Hn)\kappa_2(H_n) growing as e3.5n\sim e^{3.5n}. For n=10n=10, κ2(H10)1013\kappa_2(H_{10}) \approx 10^{13} — inversion in double precision loses all reliable digits. The Hilbert matrix is a standard test for numerical linear algebra libraries.


03 · Forward Error vs Backward Error

Definition — Forward and Backward Stability

Forward error: xcomputedxexact\|\mathbf{x}_{\text{computed}} - \mathbf{x}_{\text{exact}}\| — how far the computed solution is from the true answer.

Backward error: the smallest perturbation ΔA\Delta A such that xcomputed\mathbf{x}_{\text{computed}} is the exact solution of (A+ΔA)x=b(A + \Delta A)\mathbf{x} = \mathbf{b}.

An algorithm is backward stable if ΔA/A\|\Delta A\| / \|A\| is small — the computed answer solves a nearby problem. Backward stability does not guarantee small forward error when κ(A)\kappa(A) is large.

❌ Failure — Backward Stable but Forward Inaccurate

LU with partial pivoting is backward stable: ΔA/Aεmach\|\Delta A\|/\|A\| \sim \varepsilon_{\text{mach}}.

For Hilbert H12H_{12}, LU produces ΔA\Delta A tiny but xcomputed\mathbf{x}_{\text{computed}} has no correct digits because κ(H12)1016\kappa(H_{12}) \sim 10^{16}.

Consequence: backward stability is a property of the algorithm; forward accuracy requires both a stable algorithm and a well-conditioned problem. QR and SVD are preferred for least squares precisely because they are backward stable and avoid squaring κ(A)\kappa(A).


04 · Stability of LU, QR, and SVD

Definition — Conditioning of Decomposition Methods
MethodOperationConditioning impact
LU solveSolve Ax=bA\mathbf{x}=\mathbf{b}κ(A)\kappa(A)
Normal equationsForm ATAA^TA, solveκ(A)2\kappa(A)^2
QR least squaresSolve Rx=QTbR\mathbf{x}=Q^T\mathbf{b}κ(A)\kappa(A)
SVD least squaresx^=A+b\hat{\mathbf{x}} = A^+\mathbf{b}κ(A)\kappa(A); can truncate small σi\sigma_i

Normal equations square the condition number. QR and SVD avoid this — Chapter 17's warning made precise.

✓ Example — Regression with $\kappa(A) = 10^4$

Design matrix AR1000×10A \in \mathbb{R}^{1000 \times 10} for factor regression. κ(A)=104\kappa(A) = 10^4.

Normal equations: κ(ATA)=108\kappa(A^TA) = 10^8. Expected relative error 1016108=108\sim 10^{-16} \cdot 10^8 = 10^{-8} — betas unreliable below 10410^{-4} relative precision.

QR: κ(R)=κ(A)=104\kappa(R) = \kappa(A) = 10^4. Expected error 1012\sim 10^{-12} — four extra digits of reliability.

For risk attribution requiring 6 significant figures in factor exposures, QR is mandatory; normal equations fail silently.


05 · Truncated SVD and Regularisation

When κ(A)\kappa(A) is huge because σn0\sigma_n \approx 0 (rank deficiency or near-dependency), truncating tiny singular values stabilises the solution.

Definition — Truncated SVD (Regularised Pseudoinverse)

Replace Σ+\Sigma^+ with a truncated version that sets σi1=0\sigma_i^{-1} = 0 when σi<τ\sigma_i < \tau (threshold):

x^=i:σiτ1σib,uivi\hat{\mathbf{x}} = \sum_{i:\,\sigma_i \geq \tau} \frac{1}{\sigma_i}\langle\mathbf{b},\mathbf{u}_i\rangle\,\mathbf{v}_i

This is Tikhonov regularisation in disguise — trading bias (approximate fit) for lower variance (stability). In portfolio optimisation, it corresponds to discarding near-zero-variance factor directions that amplify noise.

✓ Example — Ridge Regression as Stability Tool

Nearly collinear factors give σ10=108\sigma_{10} = 10^{-8} in the SVD of AA. Including 1/σ101/\sigma_{10} in A+A^+ amplifies noise by 10810^8.

Truncating σ10\sigma_{10} (ridge penalty λ\lambda corresponds to τ=λ\tau = \sqrt{\lambda}) yields stable betas at the cost of slight bias. Cross-validation selects τ\tau — standard in quantitative equity factor models.


06 · Practice Exercises

EXERCISE 20.1

κ2(A)=σmax/σmin\kappa_2(A) = \sigma_{\max}/\sigma_{\min}. For diagonal AA, singular values are aii|a_{ii}|.

A=(2000.01)A = \begin{pmatrix}2&0\\0&0.01\end{pmatrix}. σ1=2\sigma_1 = 2, σ2=0.01\sigma_2 = 0.01.

κ2(A)=2/0.01=200\kappa_2(A) = 2/0.01 = 200.

A 10410^{-4} relative error in b\mathbf{b} can cause up to 200×104=0.02=2%200 \times 10^{-4} = 0.02 = 2\% relative error in x\mathbf{x}.

For A=(2000.01)A = \begin{pmatrix}2&0\\0&0.01\end{pmatrix}, compute κ2(A)\kappa_2(A) from singular values. If Δb/b=104\|\Delta\mathbf{b}\|/\|\mathbf{b}\| = 10^{-4}, bound the relative error in x\mathbf{x}.

EXERCISE 20.2

κ(A)AA1\kappa(A) \approx \|A\|\|A^{-1}\|. For 2×22\times2, compute det(A)\det(A) and A1A^{-1} explicitly.

A=(1111.0001)A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix}. det=1.00011=0.0001\det = 1.0001-1 = 0.0001.

A1=10.0001(1.0001111)=(10001100001000010000)A^{-1} = \frac{1}{0.0001}\begin{pmatrix}1.0001&-1\\-1&1\end{pmatrix} = \begin{pmatrix}10001&-10000\\-10000&10000\end{pmatrix}.

A22\|A\|_2 \approx 2 (order of magnitude). A1220000\|A^{-1}\|_2 \approx 20000.

κ(A)4×104\kappa(A) \approx 4 \times 10^4 — ill-conditioned. Near-duplicate rows cause extreme sensitivity.

Estimate κ(A)\kappa(A) for A=(1111.0001)A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix} using A=2.0001\|A\|_\infty = 2.0001 and A1\|A^{-1}\|_\infty. Explain why this matrix is ill-conditioned.

EXERCISE 20.3

κ(ATA)=κ(A)2\kappa(A^TA) = \kappa(A)^2. If κ(A)=103\kappa(A)=10^3, normal equations lose 6\approx 6 digits.

κ(A)=103\kappa(A) = 10^3. κ(ATA)=106\kappa(A^TA) = 10^6.

Double precision: εmach2×1016\varepsilon_{\text{mach}} \approx 2\times10^{-16}.

Normal equations relative error bound: 1016106=1010\sim 10^{-16} \cdot 10^6 = 10^{-10} — about 10 reliable digits.

QR relative error bound: 1016103=1013\sim 10^{-16} \cdot 10^3 = 10^{-13} — about 13 reliable digits.

QR preserves 3 extra digits — critical for κ(A)104\kappa(A) \geq 10^4.

If κ(A)=103\kappa(A) = 10^3, compare κ(ATA)\kappa(A^TA) to κ(A)\kappa(A). How many digits of accuracy might normal equations lose relative to QR in double precision?

EXERCISE 20.4

Orthogonal matrices have σ1=σn=1\sigma_1 = \sigma_n = 1, so κ(Q)=1\kappa(Q)=1. They are perfectly conditioned.

Q=(0110)Q = \begin{pmatrix}0&-1\\1&0\end{pmatrix} (90° rotation). QTQ=IQ^TQ = I.

Singular values of QQ: both equal 1. κ2(Q)=1/1=1\kappa_2(Q) = 1/1 = 1.

Qx=x\|Q\mathbf{x}\| = \|\mathbf{x}\| for all x\mathbf{x} — rotations preserve length. No amplification of errors.

Therefore orthogonal transformations (QR's QQ factor, SVD's UU and VV) are numerically safe operations.

Show that κ2(Q)=1\kappa_2(Q) = 1 for any orthogonal matrix QQ. Use Q=(0110)Q = \begin{pmatrix}0&-1\\1&0\end{pmatrix} as a concrete example and explain why orthogonal transformations preserve lengths.

EXERCISE 20.5

κ(Hn)\kappa(H_n) grows rapidly with nn. For n=5n=5, κ(H5)5×105\kappa(H_5) \approx 5 \times 10^5. Inversion loses log10(κ)\log_{10}(\kappa) digits.

H5H_5 has entries Hij=1/(i+j1)H_{ij} = 1/(i+j-1).

κ2(H5)4.8×105\kappa_2(H_5) \approx 4.8 \times 10^5.

Digits lost log10(4.8×105)5.7\approx \log_{10}(4.8\times10^5) \approx 5.7.

In double precision (16\sim 16 digits), solving H5x=bH_5\mathbf{x}=\mathbf{b} via LU yields at most 166=1016-6 = 10 reliable digits — and fewer if b\mathbf{b} also has error.

For n=12n=12, κ(H12)1016\kappa(H_{12}) \sim 10^{16} — complete loss of precision. The Hilbert matrix demonstrates that problem conditioning, not algorithm choice, can make the answer unknowable.

The 5×55 \times 5 Hilbert matrix has κ2(H5)4.8×105\kappa_2(H_5) \approx 4.8 \times 10^5. How many digits of accuracy are lost in solving H5x=bH_5\mathbf{x}=\mathbf{b} in double precision? Why does increasing nn make the problem worse?

EXERCISE 20.6

κ(Σ)\kappa(\Sigma) large means eigenvalues span a wide range — near-collinearity. Invert via SVD, truncating small σi\sigma_i. Optimal weights w=Σ1μ\mathbf{w}^* = \Sigma^{-1}\boldsymbol{\mu} become unstable when κ(Σ)1\kappa(\Sigma) \gg 1.

Σ=(10.9990.9991)\Sigma = \begin{pmatrix}1&0.999\\0.999&1\end{pmatrix}. Eigenvalues: 1+0.999=1.9991+0.999=1.999 and 10.999=0.0011-0.999=0.001.

κ(Σ)=1.999/0.0011999\kappa(\Sigma) = 1.999/0.001 \approx 1999.

Σ1=10.001(2.001)(10.9990.9991)\Sigma^{-1} = \frac{1}{0.001(2.001)}\begin{pmatrix}1&-0.999\\-0.999&1\end{pmatrix} — entries 500\sim 500.

A 10610^{-6} perturbation in Σ12\Sigma_{12} changes Σ1\Sigma^{-1} by order 500×106=5×104500 \times 10^{-6} = 5\times10^{-4} — large relative swings in optimal portfolio weights.

Remedy: SVD of Σ\Sigma, truncate σ2=0.001\sigma_2 = 0.001 (or add ridge λI\lambda I). Stable approximate weights sacrifice exact mean-variance optimality for robustness — standard in production portfolio systems.

A covariance matrix Σ=(10.9990.9991)\Sigma = \begin{pmatrix}1&0.999\\0.999&1\end{pmatrix} arises from two nearly identical factors. Compute κ(Σ)\kappa(\Sigma), explain why Σ1\Sigma^{-1} is numerically dangerous, and describe one stabilisation strategy used in portfolio optimisation.


07 · Summary

TermDefinition
Condition number κ(A)\kappa(A)Amplification factor for relative input errors; κ2=σ1/σn\kappa_2 = \sigma_1/\sigma_n
Well-conditionedκ(A)1\kappa(A) \approx 1; small input noise, small output change
Ill-conditionedκ(A)1\kappa(A) \gg 1; solution sensitive to perturbations
Machine epsilonεmach2.2×1016\varepsilon_{\text{mach}} \approx 2.2\times10^{-16} in double precision
Forward errorxcomputedxexact\|\mathbf{x}_{\text{computed}} - \mathbf{x}_{\text{exact}}\|
Backward errorSmallest ΔA\Delta A such that computed x\mathbf{x} is exact for A+ΔAA+\Delta A
Normal equationsκ(ATA)=κ(A)2\kappa(A^TA) = \kappa(A)^2 — avoid in practice
QR / SVDBackward stable; κ\kappa not squared
Truncated SVDRegularisation by discarding small σi\sigma_i

Next: Calculus — Limits & Continuity — the analytical foundations for derivatives, optimisation, and stochastic models that build on the linear algebra toolkit completed in this subject.