Chapter 20

Numerical Stability & Conditioning

00 · Symbol Glossary

$\kappa(A)$ (kappa A) — condition number

The condition number of an invertible matrix $A$ measures how much relative errors in $\mathbf{b}$ or in $A$ can be amplified in the solution of $A\mathbf{x}=\mathbf{b}$. For the 2-norm: $\kappa_2(A) = \|A\|_2\,\|A^{-1}\|_2 = \sigma_1/\sigma_n$. Large $\kappa(A)$ means the problem is ill-conditioned: small input perturbations can cause large output changes.

$\varepsilon_{\text{mach}}$ (epsilon mach) — machine epsilon

The gap between $1$ and the next larger floating-point number. For IEEE double precision, $\varepsilon_{\text{mach}} = 2^{-52} \approx 2.2 \times 10^{-16}$. As a rule of thumb, a backward stable algorithm returns a solution with relative error of order $\varepsilon_{\text{mach}} \cdot \kappa(A)$.
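The value of machine epsilon can be checked directly. The sketch below is a minimal illustration in standard-library Python, assuming IEEE double precision floats (the helper name `machine_epsilon` is made up for this example): it halves a candidate gap until adding half of it to 1.0 no longer changes the result.

```python
import sys

def machine_epsilon() -> float:
    """Halve a candidate gap until half of it is rounded away when added to 1.0."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps

eps = machine_epsilon()
print(eps)                            # gap between 1.0 and the next double
print(eps == sys.float_info.epsilon)  # compare with the value Python reports
print(1.0 + eps / 2.0 == 1.0)         # half the gap is lost to rounding
```

The loop stops at $2^{-52}$ because $1 + 2^{-53}$ lies exactly halfway between two doubles and rounds back to $1$.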

$\|A\|$ (norm of A) — matrix norm

A measure of the "size" of a matrix. The operator 2-norm $\|A\|_2 = \sigma_1$ is the largest stretching factor. The Frobenius norm $\|A\|_F = \sqrt{\sum_{ij} a_{ij}^2}$ treats the matrix as a long vector. Condition numbers depend on the norm chosen; $\kappa_2$ is standard in numerical analysis.

$\tilde{A}$ (A tilde) — computed matrix

The matrix actually stored in floating-point — a perturbed version of the exact $A$ . Backward stability means the computed answer is the exact answer for a nearby problem $\tilde{A}\mathbf{x}=\tilde{\mathbf{b}}$ .

01 · Why Numerical Stability Matters

Chapters 16–18 derived exact LU, QR, and SVD factorisations. A computer stores numbers to only $\sim 16$ decimal digits. When $\kappa(A)$ is large, those digits are not enough — the computed solution can be meaningless even when the algorithm is algebraically correct.

Definition — Well-Conditioned vs Ill-Conditioned

For $A\mathbf{x}=\mathbf{b}$ , the relative error in the solution satisfies approximately:

$$\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \lesssim \kappa(A) \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}$$

$\kappa(A) \approx 1$ — well-conditioned: input noise barely affects the output.

$\kappa(A) \gg 1$ — ill-conditioned: small relative errors in $\mathbf{b}$ (or $A$ ) can cause large relative errors in $\mathbf{x}$ .

$\kappa(A) \geq 1$ always, for any induced (operator) norm, since $1 = \|I\| = \|AA^{-1}\| \leq \|A\|\,\|A^{-1}\|$. $\kappa(I) = 1$.
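For the case where only $\mathbf{b}$ is perturbed and $A$ is exact, the bound in this definition follows in three short steps from the properties of an induced norm. The derivation below is a sketch under that assumption; perturbations in $A$ give the same leading-order factor $\kappa(A)$.

$$
\begin{aligned}
A(\mathbf{x}+\Delta\mathbf{x}) = \mathbf{b}+\Delta\mathbf{b}
  &\;\Longrightarrow\; \Delta\mathbf{x} = A^{-1}\Delta\mathbf{b}
  \;\Longrightarrow\; \|\Delta\mathbf{x}\| \le \|A^{-1}\|\,\|\Delta\mathbf{b}\| \\
\mathbf{b} = A\mathbf{x}
  &\;\Longrightarrow\; \|\mathbf{b}\| \le \|A\|\,\|\mathbf{x}\|
  \;\Longrightarrow\; \frac{1}{\|\mathbf{x}\|} \le \frac{\|A\|}{\|\mathbf{b}\|} \\
\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|}
  &\le \|A\|\,\|A^{-1}\|\,\frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}
  = \kappa(A)\,\frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}
\end{aligned}
$$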

✓ Example — Portfolio Optimisation Sensitivity

A covariance matrix $\Sigma \in \mathbb{R}^{100 \times 100}$ has $\kappa(\Sigma) = 10^6$ — nearly collinear factor exposures make some eigenvalues very small.

A $10^{-8}$ relative perturbation in $\Sigma$ (far below typical estimation noise) can produce a relative error of up to $10^{-8} \times 10^6 = 10^{-2}$ in the optimal weights. A $1\%$ allocation error on a billion-dollar book is $10$ million dollars.

Knowing $\kappa(\Sigma)$ before inverting tells you whether the optimiser's output is trustworthy.

❌ Failure — Trusting a Solution Without Checking $\kappa(A)$

Solve $A\mathbf{x}=\mathbf{b}$ where $A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix}$ , $\mathbf{b}=\begin{pmatrix}2\\2.0001\end{pmatrix}$ . Exact solution: $\mathbf{x}=\begin{pmatrix}1\\1\end{pmatrix}$ .

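Perturb $\mathbf{b}$ to $\tilde{\mathbf{b}}=\begin{pmatrix}2\\2.0002\end{pmatrix}$: one entry moves by $10^{-4}$, a relative change $\|\Delta\mathbf{b}\|/\|\mathbf{b}\| \approx 3.5 \times 10^{-5}$.

New solution: $\tilde{\mathbf{x}}=\begin{pmatrix}0\\2\end{pmatrix}$, a relative change of $100\%$ and a completely different allocation.

Why it breaks: $\kappa_2(A) \approx 4 \times 10^4$. The near-parallel rows of $A$ make the system sensitive: the observed amplification, $1/(3.5\times10^{-5}) \approx 2.8 \times 10^4$, sits just under that bound.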

Consequence: a correct LU or QR implementation still returns a useless answer for ill-conditioned $A$ . The problem is the data, not the code.
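The perturbation experiment above can be reproduced in a few lines. This is a minimal sketch in standard-library Python; the helper names `solve_2x2` and `norm2` are made up for the illustration, and Cramer's rule is used only because the system is $2\times2$.

```python
import math

def solve_2x2(A, b):
    """Solve a 2x2 system by Cramer's rule (adequate for a tiny illustration)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return [x1, x2]

def norm2(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(t * t for t in v))

A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]
b_tilde = [2.0, 2.0002]          # second entry nudged by 1e-4

x = solve_2x2(A, b)              # close to [1, 1]
x_tilde = solve_2x2(A, b_tilde)  # close to [0, 2]

# Relative size of the input change and of the resulting output change.
rel_in = norm2([p - q for p, q in zip(b_tilde, b)]) / norm2(b)
rel_out = norm2([p - q for p, q in zip(x_tilde, x)]) / norm2(x)
amplification = rel_out / rel_in

print(x, x_tilde)
print(rel_in, rel_out, amplification)
```

The printed amplification is the factor by which this particular perturbation was magnified; $\kappa(A)$ is the worst case over all perturbations, so the observed factor can only be smaller.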

02 · The Condition Number via SVD

Definition — Condition Number in the 2-Norm

For invertible $A \in \mathbb{R}^{n \times n}$ with singular values $\sigma_1 \geq \cdots \geq \sigma_n > 0$ :

$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = \|A\|_2 \cdot \|A^{-1}\|_2$$

$\sigma_1$ — largest singular value; $\|A\|_2 = \sigma_1$ .

$\sigma_n$ — smallest singular value; $\|A^{-1}\|_2 = 1/\sigma_n$ .

$\kappa_2(A) \geq 1$ . Equality holds for orthogonal matrices ( $\sigma_1 = \sigma_n = 1$ ).

Step-by-step — $\kappa_2(A)$ for $A = \begin{pmatrix}1&2\\2&1\end{pmatrix}$

Singular values from Chapter 18: $\sigma_1 = 3$ , $\sigma_2 = 1$ .

Compute condition number: $\kappa_2(A) = \sigma_1/\sigma_2 = 3/1 = 3$ .

The $3$ comes from the ratio of largest to smallest stretching.

Interpret: a relative error in $\mathbf{b}$ of size $10^{-10}$ can cause a relative error in $\mathbf{x}$ of at most $3 \times 10^{-10}$. This system is well-conditioned: at most $\log_{10} 3 \approx 0.5$ decimal digits are lost.
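For a $2\times2$ matrix the singular values have a closed form, so $\kappa_2$ can be checked without a linear algebra library. The sketch below assumes a real $2\times2$ input and uses the identities $\sigma_1^2+\sigma_2^2 = \|A\|_F^2$ and $\sigma_1\sigma_2 = |\det A|$; the function names are made up for the illustration.

```python
import math

def singular_values_2x2(A):
    """Singular values of a real 2x2 matrix, largest first.

    Uses sigma1^2 + sigma2^2 = ||A||_F^2 and sigma1 * sigma2 = |det A|.
    """
    (a, b), (c, d) = A
    frob_sq = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    disc = math.sqrt(max(frob_sq * frob_sq - 4.0 * det * det, 0.0))
    sigma1 = math.sqrt((frob_sq + disc) / 2.0)
    if sigma1 == 0.0:                  # zero matrix
        return 0.0, 0.0
    sigma2 = det / sigma1              # avoids cancellation in (frob_sq - disc) / 2
    return sigma1, sigma2

def cond_2(A):
    """2-norm condition number sigma1 / sigma2 (infinite if A is singular)."""
    sigma1, sigma2 = singular_values_2x2(A)
    return math.inf if sigma2 == 0.0 else sigma1 / sigma2

kappa_good = cond_2([[1.0, 2.0], [2.0, 1.0]])     # the worked example
kappa_bad = cond_2([[1.0, 1.0], [1.0, 1.0001]])   # the near-parallel rows from section 01

print(kappa_good, math.log10(kappa_good))   # digits at risk, roughly log10(kappa)
print(kappa_bad, math.log10(kappa_bad))
```

For larger matrices a library SVD would normally be used instead of a closed form.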

Hilbert Matrix — Canonical Ill-Conditioned Example

The $n \times n$ Hilbert matrix $H_{ij} = 1/(i+j-1)$ has $\kappa_2(H_n)$ growing as $\sim e^{3.5n}$. For $n=10$, $\kappa_2(H_{10}) \approx 10^{13}$, so inversion in double precision leaves only about three reliable digits; by $n=12$, where $\kappa_2 \sim 10^{16}$, none remain. The Hilbert matrix is a standard test for numerical linear algebra libraries.

03 · Forward Error vs Backward Error

Definition — Forward and Backward Stability

Forward error: $\|\mathbf{x}_{\text{computed}} - \mathbf{x}_{\text{exact}}\|$ — how far the computed solution is from the true answer.

Backward error: the smallest perturbation $\Delta A$ such that $\mathbf{x}_{\text{computed}}$ is the exact solution of $(A + \Delta A)\mathbf{x} = \mathbf{b}$ .

An algorithm is backward stable if $\|\Delta A\| / \|A\|$ is small — the computed answer solves a nearby problem. Backward stability does not guarantee small forward error when $\kappa(A)$ is large.

❌ Failure — Backward Stable but Forward Inaccurate

LU with partial pivoting is backward stable: $\|\Delta A\|/\|A\| \sim \varepsilon_{\text{mach}}$ .

For Hilbert $H_{12}$, LU produces a tiny $\Delta A$, but $\mathbf{x}_{\text{computed}}$ may have no correct digits because $\kappa(H_{12}) \sim 10^{16}$.

Consequence: backward stability is a property of the algorithm; forward accuracy requires both a stable algorithm and a well-conditioned problem. QR and SVD are preferred for least squares precisely because they are backward stable and avoid squaring $\kappa(A)$ .
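The gap between a small backward error and a large forward error can be observed on Hilbert systems of increasing size. The sketch below is an illustration only, written in standard-library Python: it assumes a true solution of all ones, uses a hand-written Gaussian elimination with partial pivoting in place of a library LU, and takes the relative residual as a proxy for backward error. All function names are made up for the example.

```python
from fractions import Fraction

def hilbert(n):
    """Exact n x n Hilbert matrix; entry 1 / (i + j - 1) with 1-based indices."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def lu_solve(A, b):
    """Solve A x = b in floating point by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]          # augmented copy
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))   # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def experiment(n):
    """Return (relative residual, forward error) for H_n x = b with true x = ones."""
    H_exact = hilbert(n)
    b_exact = [sum(row) for row in H_exact]           # exact H @ ones
    H = [[float(h) for h in row] for row in H_exact]
    b = [float(t) for t in b_exact]
    x = lu_solve(H, b)
    # Residual of the stored float system, evaluated exactly with rationals.
    r = [Fraction(b[i]) - sum(Fraction(H[i][j]) * Fraction(x[j]) for j in range(n))
         for i in range(n)]
    residual = float(max(abs(t) for t in r)) / max(abs(t) for t in b)
    forward = max(abs(t - 1.0) for t in x)            # max-norm error against ones
    return residual, forward

results = {n: experiment(n) for n in (5, 10, 12)}
for n, (residual, forward) in results.items():
    print(n, residual, forward)
```

The residual column should stay near machine precision for every $n$ while the forward error grows with $\kappa(H_n)$, which is the behaviour described above.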

04 · Stability of LU, QR, and SVD

Definition — Conditioning of Decomposition Methods

Method	Operation	Conditioning impact
LU solve	Solve $A\mathbf{x}=\mathbf{b}$	$\kappa(A)$
Normal equations	Form $A^TA$ , solve	$\kappa(A)^2$
QR least squares	Solve $R\mathbf{x}=Q^T\mathbf{b}$	$\kappa(A)$
SVD least squares	$\hat{\mathbf{x}} = A^+\mathbf{b}$	$\kappa(A)$ ; can truncate small $\sigma_i$

Normal equations square the condition number. QR and SVD avoid this — Chapter 17's warning made precise.

✓ Example — Regression with $\kappa(A) = 10^4$

Design matrix $A \in \mathbb{R}^{1000 \times 10}$ for factor regression. $\kappa(A) = 10^4$ .

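Normal equations: $\kappa(A^TA) = 10^8$. Expected relative error $\sim 10^{-16} \cdot 10^8 = 10^{-8}$, so the betas carry about 8 reliable digits.

QR: $\kappa(R) = \kappa(A) = 10^4$. Expected error $\sim 10^{-16} \cdot 10^4 = 10^{-12}$, about 12 reliable digits: four more than normal equations.

For risk attribution requiring 6 significant figures in factor exposures, normal equations leave a margin of only two digits here, and that margin disappears once $\kappa(A)$ reaches $10^5$ ($\kappa(A^TA) = 10^{10}$, about 6 digits), while QR would still deliver about 11. Normal equations fail silently: nothing in their output signals the loss, so QR is the safe default.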
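An extreme case shows how squaring the condition number destroys information before any solve takes place. The matrix below is not from this chapter: it is a standard textbook construction (often attributed to Läuchli), used here as an assumed illustration with a made-up near-dependency size `delta`. Its columns are linearly independent, yet the Gram matrix $A^TA$ formed in double precision is exactly singular.

```python
import math

delta = 1e-8   # hypothetical size of the near-dependency

# Columns (1, delta, 0) and (1, 0, delta): independent, but nearly parallel.
A = [[1.0, 1.0],
     [delta, 0.0],
     [0.0, delta]]

# Gram matrix A^T A formed in floating point, as the normal equations require.
G = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
det_G = G[0][0] * G[1][1] - G[0][1] * G[1][0]

# Exact singular values of A are sqrt(2 + delta^2) and delta.
kappa_A = math.sqrt(2.0 + delta * delta) / delta
kappa_G = kappa_A ** 2

print(G)        # delta^2 is below the rounding threshold next to 1.0
print(det_G)    # the computed Gram matrix is singular
print(kappa_A, kappa_G)
```

Here $\kappa(A) \approx 1.4 \times 10^8$ is still workable in double precision, but $\kappa(A^TA) \approx 2 \times 10^{16}$ exceeds $1/\varepsilon_{\text{mach}}$, so the term $\delta^2$ is rounded away. A QR or SVD factorisation works on $A$ directly and never forms this product.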

05 · Truncated SVD and Regularisation

When $\kappa(A)$ is huge because $\sigma_n \approx 0$ (rank deficiency or near-dependency), truncating tiny singular values stabilises the solution.

Definition — Truncated SVD (Regularised Pseudoinverse)

Replace $\Sigma^+$ with a truncated version that sets $\sigma_i^{-1} = 0$ when $\sigma_i < \tau$ (threshold):

$$\hat{\mathbf{x}} = \sum_{i:\,\sigma_i \geq \tau} \frac{1}{\sigma_i}\langle\mathbf{b},\mathbf{u}_i\rangle\,\mathbf{v}_i$$

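This is closely related to Tikhonov regularisation: both trade bias (approximate fit) for lower variance (stability), but truncation removes small-$\sigma_i$ directions outright whereas Tikhonov damps them smoothly. In portfolio optimisation, truncation corresponds to discarding near-zero-variance factor directions that amplify noise.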

✓ Example — Ridge Regression as Stability Tool

Nearly collinear factors give $\sigma_{10} = 10^{-8}$ in the SVD of $A$ . Including $1/\sigma_{10}$ in $A^+$ amplifies noise by $10^8$ .

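Truncating $\sigma_{10}$ yields stable betas at the cost of slight bias. A ridge penalty $\lambda$ acts similarly but smoothly, scaling each term by $\sigma_i^2/(\sigma_i^2+\lambda)$; this factor equals $1/2$ at $\sigma_i = \sqrt{\lambda}$, so $\tau \approx \sqrt{\lambda}$ is the comparable threshold. Cross-validation selects $\tau$ (or $\lambda$), which is standard in quantitative equity factor models.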
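The effect of both regularisers on noise amplification can be tabulated per singular value. In the sketch below the spectrum, the threshold `tau`, and the penalty `lam` are hypothetical values chosen for illustration (only $\sigma = 10^{-8}$ comes from the example), and the function names are made up. The "gain" is the factor multiplying $\langle\mathbf{b},\mathbf{u}_i\rangle$, which is also the factor by which noise in that direction is amplified.

```python
def tsvd_factor(sigma: float, tau: float) -> float:
    """Truncated SVD filter: keep the direction fully or drop it."""
    return 1.0 if sigma >= tau else 0.0

def ridge_factor(sigma: float, lam: float) -> float:
    """Ridge (Tikhonov) filter: sigma^2 / (sigma^2 + lambda)."""
    return sigma * sigma / (sigma * sigma + lam)

sigmas = [1.0, 1e-2, 1e-8]   # hypothetical spectrum; the last value is the troublesome one
tau = 1e-4                   # hypothetical truncation threshold
lam = tau * tau              # ridge penalty matched via tau = sqrt(lambda)

rows = []
for s in sigmas:
    gain_raw = 1.0 / s                        # plain pseudoinverse
    gain_tsvd = tsvd_factor(s, tau) / s       # truncated SVD
    gain_ridge = ridge_factor(s, lam) / s     # ridge
    rows.append((s, gain_raw, gain_tsvd, gain_ridge))
    print(s, gain_raw, gain_tsvd, gain_ridge)
```

For the well-separated singular values all three gains agree closely; for $\sigma = 10^{-8}$ the plain pseudoinverse amplifies by $10^8$, truncation sets the gain to zero, and ridge reduces it to about one.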

06 · Practice Exercises

EXERCISE 20.1

$\kappa_2(A) = \sigma_{\max}/\sigma_{\min}$ . For diagonal $A$ , singular values are $|a_{ii}|$ .

$A = \begin{pmatrix}2&0\\0&0.01\end{pmatrix}$ . $\sigma_1 = 2$ , $\sigma_2 = 0.01$ .

$\kappa_2(A) = 2/0.01 = 200$ .

A $10^{-4}$ relative error in $\mathbf{b}$ can cause up to $200 \times 10^{-4} = 0.02 = 2\%$ relative error in $\mathbf{x}$ .

For $A = \begin{pmatrix}2&0\\0&0.01\end{pmatrix}$ , compute $\kappa_2(A)$ from singular values. If $\|\Delta\mathbf{b}\|/\|\mathbf{b}\| = 10^{-4}$ , bound the relative error in $\mathbf{x}$ .
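The bound in this exercise is attained, not just an upper limit. The sketch below picks one specific pair, assumed for illustration: a right-hand side along the strongly scaled axis and a perturbation along the weakly scaled one. For a diagonal matrix the solve is an entrywise division.

```python
import math

A_diag = [2.0, 0.01]            # diagonal entries of A
kappa = max(A_diag) / min(A_diag)

b = [1.0, 0.0]                  # right-hand side along the axis scaled by 2
db = [0.0, 1e-4]                # perturbation along the axis scaled by 0.01

x = [bi / ai for bi, ai in zip(b, A_diag)]      # exact solution
dx = [di / ai for di, ai in zip(db, A_diag)]    # change caused by the perturbation

rel_b = math.hypot(*db) / math.hypot(*b)
rel_x = math.hypot(*dx) / math.hypot(*x)

print(kappa, rel_b, rel_x, rel_x / rel_b)
```

With this choice the ratio `rel_x / rel_b` matches $\kappa_2(A)$; other directions for $\mathbf{b}$ and $\Delta\mathbf{b}$ give smaller ratios.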

EXERCISE 20.2

$\kappa(A) = \|A\|\,\|A^{-1}\|$. For $2\times2$, compute $\det(A)$ and $A^{-1}$ explicitly.

$A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix}$ . $\det = 1.0001-1 = 0.0001$ .

$A^{-1} = \frac{1}{0.0001}\begin{pmatrix}1.0001&-1\\-1&1\end{pmatrix} = \begin{pmatrix}10001&-10000\\-10000&10000\end{pmatrix}$ .

$\|A\|_\infty = 2.0001$ (largest absolute row sum). $\|A^{-1}\|_\infty = 10001 + 10000 = 20001$.

$\kappa_\infty(A) = 2.0001 \times 20001 \approx 4 \times 10^4$, which is ill-conditioned. Near-duplicate rows cause extreme sensitivity.

Estimate $\kappa(A)$ for $A = \begin{pmatrix}1&1\\1&1.0001\end{pmatrix}$ using $\|A\|_\infty = 2.0001$ and $\|A^{-1}\|_\infty$ . Explain why this matrix is ill-conditioned.
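The infinity-norm estimate can be verified numerically. This is a minimal sketch in standard-library Python with made-up helper names; the explicit $2\times2$ inverse formula is used only because the matrix is tiny.

```python
def inf_norm(M):
    """Infinity norm of a matrix: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def inverse_2x2(M):
    """Inverse of a 2x2 matrix from the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 1.0], [1.0, 1.0001]]
A_inv = inverse_2x2(A)

norm_A = inf_norm(A)            # 2.0001
norm_A_inv = inf_norm(A_inv)    # close to 20001
kappa_inf = norm_A * norm_A_inv

print(A_inv)
print(norm_A, norm_A_inv, kappa_inf)
```

The inverse itself is computed with a small relative error because the determinant $0.0001$ comes from subtracting two nearly equal numbers, which is the same sensitivity the condition number describes.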

EXERCISE 20.3

$\kappa(A^TA) = \kappa(A)^2$ . If $\kappa(A)=10^3$ , normal equations lose $\approx 6$ digits.

$\kappa(A) = 10^3$ . $\kappa(A^TA) = 10^6$ .

Double precision: $\varepsilon_{\text{mach}} \approx 2\times10^{-16}$ .

Normal equations relative error bound: $\sim 10^{-16} \cdot 10^6 = 10^{-10}$ — about 10 reliable digits.

QR relative error bound: $\sim 10^{-16} \cdot 10^3 = 10^{-13}$ — about 13 reliable digits.

QR preserves 3 extra digits. In general the gap is $\log_{10}\kappa(A)$ digits, which becomes critical for $\kappa(A) \geq 10^4$.

If $\kappa(A) = 10^3$ , compare $\kappa(A^TA)$ to $\kappa(A)$ . How many digits of accuracy might normal equations lose relative to QR in double precision?
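The digit counts in this exercise follow from one rule of thumb, relative error $\approx \varepsilon_{\text{mach}}\cdot\kappa$. The helper below (a made-up name, and a heuristic rather than a rigorous bound) turns it into a count of decimal digits; it uses the exact double precision epsilon, so the results come out slightly below the rounded figures of 13 and 10.

```python
import math
import sys

def reliable_digits(kappa: float) -> float:
    """Rule-of-thumb decimal digits left: -log10(eps * kappa), floored at zero."""
    return max(0.0, -math.log10(sys.float_info.epsilon * kappa))

kappa_A = 1e3
digits_qr = reliable_digits(kappa_A)         # works with kappa(A)
digits_ne = reliable_digits(kappa_A ** 2)    # normal equations: kappa(A)^2

print(digits_qr, digits_ne, digits_qr - digits_ne)
```

The difference between the two counts is $\log_{10}\kappa(A)$, independent of the precision used.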

EXERCISE 20.4

Orthogonal matrices have $\sigma_1 = \sigma_n = 1$ , so $\kappa(Q)=1$ . They are perfectly conditioned.

$Q = \begin{pmatrix}0&-1\\1&0\end{pmatrix}$ (90° rotation). $Q^TQ = I$ .

Singular values of $Q$ : both equal 1. $\kappa_2(Q) = 1/1 = 1$ .

$\|Q\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x}$ — rotations preserve length. No amplification of errors.

Therefore orthogonal transformations (QR's $Q$ factor, SVD's $U$ and $V$ ) are numerically safe operations.

Show that $\kappa_2(Q) = 1$ for any orthogonal matrix $Q$ . Use $Q = \begin{pmatrix}0&-1\\1&0\end{pmatrix}$ as a concrete example and explain why orthogonal transformations preserve lengths.
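The general argument is $\|Q\mathbf{x}\|^2 = \mathbf{x}^TQ^TQ\mathbf{x} = \mathbf{x}^T\mathbf{x}$. A quick numerical check for the rotation in this exercise, with test vectors chosen arbitrarily for illustration and made-up helper names, might look like this:

```python
import math

Q = [[0.0, -1.0], [1.0, 0.0]]     # 90-degree rotation from the exercise

def matvec(M, v):
    """Matrix-vector product for small dense lists."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]

def gram(M):
    """M^T M for a 2x2 matrix."""
    return [[sum(M[k][i] * M[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

QtQ = gram(Q)                     # should be the identity

vectors = [[3.0, 4.0], [1.0, -2.0], [0.5, 0.25]]   # arbitrary test vectors
length_pairs = [(math.hypot(*v), math.hypot(*matvec(Q, v))) for v in vectors]

print(QtQ)
print(length_pairs)
```

Each pair of lengths agrees, so neither the data nor any error in it is stretched, which is why $\sigma_1 = \sigma_2 = 1$.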

EXERCISE 20.5

$\kappa(H_n)$ grows rapidly with $n$. For $n=5$, $\kappa_2(H_5) \approx 4.8 \times 10^5$. Solving loses about $\log_{10}(\kappa)$ digits.

$H_5$ has entries $H_{ij} = 1/(i+j-1)$ .

$\kappa_2(H_5) \approx 4.8 \times 10^5$ .

Digits lost $\approx \log_{10}(4.8\times10^5) \approx 5.7$ .

In double precision ($\sim 16$ digits), solving $H_5\mathbf{x}=\mathbf{b}$ via LU yields roughly $16-6 = 10$ reliable digits, and fewer if $\mathbf{b}$ also has error.

For $n=12$ , $\kappa(H_{12}) \sim 10^{16}$ — complete loss of precision. The Hilbert matrix demonstrates that problem conditioning, not algorithm choice, can make the answer unknowable.

The $5 \times 5$ Hilbert matrix has $\kappa_2(H_5) \approx 4.8 \times 10^5$ . How many digits of accuracy are lost in solving $H_5\mathbf{x}=\mathbf{b}$ in double precision? Why does increasing $n$ make the problem worse?

EXERCISE 20.6

$\kappa(\Sigma)$ large means eigenvalues span a wide range — near-collinearity. Invert via SVD, truncating small $\sigma_i$ . Optimal weights $\mathbf{w}^* = \Sigma^{-1}\boldsymbol{\mu}$ become unstable when $\kappa(\Sigma) \gg 1$ .

$\Sigma = \begin{pmatrix}1&0.999\\0.999&1\end{pmatrix}$ . Eigenvalues: $1+0.999=1.999$ and $1-0.999=0.001$ .

$\kappa(\Sigma) = 1.999/0.001 \approx 1999$ .

$\Sigma^{-1} = \frac{1}{0.001 \times 1.999}\begin{pmatrix}1&-0.999\\-0.999&1\end{pmatrix}$, with entries of magnitude $\sim 500$.

A $10^{-6}$ perturbation in $\Sigma_{12}$ changes the entries of $\Sigma^{-1}$ by about $0.5$, a relative change of $\sim 10^{-3}$: a thousandfold amplification, consistent with $\kappa(\Sigma) \approx 2 \times 10^3$. Optimal portfolio weights inherit these swings.

Remedy: SVD of $\Sigma$ , truncate $\sigma_2 = 0.001$ (or add ridge $\lambda I$ ). Stable approximate weights sacrifice exact mean-variance optimality for robustness — standard in production portfolio systems.

A covariance matrix $\Sigma = \begin{pmatrix}1&0.999\\0.999&1\end{pmatrix}$ arises from two nearly identical factors. Compute $\kappa(\Sigma)$ , explain why $\Sigma^{-1}$ is numerically dangerous, and describe one stabilisation strategy used in portfolio optimisation.
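The numbers in this exercise, and the effect of a ridge term, can be checked with closed-form $2\times2$ formulas. In the sketch below the ridge value `0.01` is a hypothetical choice for illustration, not a recommendation, and the function names are made up.

```python
import math

def sym_eigs(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return mean + radius, mean - radius

def cond_cov(rho, ridge=0.0):
    """Condition number of [[1, rho], [rho, 1]] + ridge * I."""
    hi, lo = sym_eigs(1.0 + ridge, rho, 1.0 + ridge)
    return hi / lo

def inv_diag_entry(rho):
    """Diagonal entry of the inverse of [[1, rho], [rho, 1]]: 1 / (1 - rho^2)."""
    return 1.0 / (1.0 - rho * rho)

rho = 0.999
kappa = cond_cov(rho)                      # close to 1999
kappa_ridge = cond_cov(rho, ridge=0.01)    # hypothetical ridge; much smaller

base = inv_diag_entry(rho)                 # close to 500
bumped = inv_diag_entry(rho + 1e-6)        # correlation nudged by 1e-6
rel_change = abs(bumped - base) / base     # relative swing in the inverse entry

print(kappa, kappa_ridge)
print(base, bumped, rel_change)
```

Adding the ridge shifts both eigenvalues up by the same amount, which barely changes the large one but lifts the small one by an order of magnitude, and that is what brings the condition number down.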

07 · Summary

Term	Definition
Condition number $\kappa(A)$	Amplification factor for relative input errors; $\kappa_2 = \sigma_1/\sigma_n$
Well-conditioned	$\kappa(A) \approx 1$ ; small input noise, small output change
Ill-conditioned	$\kappa(A) \gg 1$ ; solution sensitive to perturbations
Machine epsilon	$\varepsilon_{\text{mach}} \approx 2.2\times10^{-16}$ in double precision
Forward error	$\|\mathbf{x}_{\text{computed}} - \mathbf{x}_{\text{exact}}\|$
Backward error	Smallest $\Delta A$ such that computed $\mathbf{x}$ is exact for $A+\Delta A$
Normal equations	$\kappa(A^TA) = \kappa(A)^2$ — avoid in practice
QR / SVD	Backward stable; $\kappa$ not squared
Truncated SVD	Regularisation by discarding small $\sigma_i$

Next: Calculus — Limits & Continuity — the analytical foundations for derivatives, optimisation, and stochastic models that build on the linear algebra toolkit completed in this subject.