Chapter 17

Matrix Decompositions — QR

00 · Symbol Glossary

$A = QR$ · "A equals Q R" · QR factorisation

The decomposition of a matrix $A$ with linearly independent columns into a factor $Q$ with orthonormal columns and an upper triangular factor $R$. It extends the Gram-Schmidt construction from Chapter 11 into a matrix factorisation usable for least squares and eigenvalue algorithms.

$Q^TQ = I$ · "Q transpose Q equals I" · orthonormal columns

The defining property of the $Q$ factor when $A$ has $n$ linearly independent columns: the columns of $Q$ are orthonormal. Read aloud as "Q transpose Q equals the identity." Inversion is cheap: $Q^T$ is a left inverse of $Q$, and when $Q$ is square, $Q^{-1} = Q^T$.

$\mathbf{r}$ · "bold r" · residual vector

The residual $\mathbf{r} = A\mathbf{x} - \mathbf{b}$ in a least squares problem. At the least squares solution, $-\mathbf{r}$ is the part of $\mathbf{b}$ that the column space of $A$ cannot explain. Minimising $\|\mathbf{r}\|_2$ is the objective of ordinary least squares regression.

$A^TA$ · "A transpose A" · normal equations matrix

The Gram matrix formed from the columns of $A$. The classical least squares normal equations are $A^TA\mathbf{x} = A^T\mathbf{b}$. This system is theoretically correct but numerically dangerous because $\kappa(A^TA) = \kappa(A)^2$ (2-norm condition numbers), so any ill-conditioning in $A$ is squared. QR avoids forming $A^TA$ entirely.

01 · From Gram-Schmidt to QR

Chapter 11 constructed orthonormal vectors from a linearly independent set via the Gram-Schmidt process. QR packages that construction into a single matrix factorisation: the columns of $Q$ are the orthonormalised vectors, and $R$ records the projection coefficients that reconstruct the original columns of $A$ .

Definition — QR Decomposition

If $A \in \mathbb{R}^{m \times n}$ has linearly independent columns ($m \geq n$), then there exists a factorisation:

$$A = QR$$

$Q \in \mathbb{R}^{m \times n}$ has orthonormal columns: $Q^TQ = I_n$ .

$R \in \mathbb{R}^{n \times n}$ is upper triangular with positive diagonal entries $r_{ii} > 0$ .

The columns of $Q$ are $\mathbf{q}_1, \ldots, \mathbf{q}_n$ produced by Gram-Schmidt on the columns of $A$ . The entry $r_{ij} = \langle \mathbf{a}_j, \mathbf{q}_i \rangle$ for $i \leq j$ , and $r_{ij} = 0$ for $i > j$ .

✓ Example — Factor Model Regression Setup

A fund manager runs a 3-factor model on 252 trading days. The design matrix $A \in \mathbb{R}^{252 \times 3}$ has columns: market, size, and value factor returns. The response $\mathbf{b} \in \mathbb{R}^{252}$ holds the fund's daily excess returns.

Factor exposures (betas) solve the overdetermined system $A\mathbf{x} \approx \mathbf{b}$ in the least squares sense. QR factorisation of $A$ gives numerically stable betas without forming $A^TA$, which would square the condition number of $A$; correlated factor columns can already make that condition number large.

❌ Failure — QR on Linearly Dependent Columns

Let $A = \begin{pmatrix}1&2\\2&4\\3&6\end{pmatrix}$ — column 2 equals $2 \times$ column 1.

Gram-Schmidt on $\mathbf{a}_2$ gives $\mathbf{u}_2 = \mathbf{a}_2 - \langle\mathbf{a}_2,\mathbf{q}_1\rangle\mathbf{q}_1 = \mathbf{0}$ . Normalising $\mathbf{u}_2$ requires division by $\|\mathbf{u}_2\| = 0$ .

Why it breaks: $R$ would have $r_{22} = 0$, violating the positive-diagonal requirement. The columns of $A$ do not span a full $n$-dimensional subspace.

Consequence: QR as defined here requires linearly independent columns. Rank-deficient $A$ needs a column-pivoted variant ($AP = QR$, with $P$ a permutation matrix) or SVD-based least squares.
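The breakdown is easy to reproduce numerically. The following is a minimal plain-Python sketch of the first Gram-Schmidt step on this matrix; the relative tolerance `1e-12` used to decide that $r_{22}$ "is zero" is an illustrative assumption, not a standard value.

```python
import math

# Columns of A = [[1, 2], [2, 4], [3, 6]]; the second column is twice the first.
a1 = [1.0, 2.0, 3.0]
a2 = [2.0, 4.0, 6.0]

r11 = math.sqrt(sum(x * x for x in a1))        # ||a_1|| = sqrt(14)
q1 = [x / r11 for x in a1]
r12 = sum(x * y for x, y in zip(a2, q1))        # <a_2, q_1> = 2 * sqrt(14)
u2 = [x - r12 * y for x, y in zip(a2, q1)]      # zero vector in exact arithmetic
r22 = math.sqrt(sum(x * x for x in u2))

# In floating point r22 may be a tiny rounding residue rather than an exact 0,
# so a practical rank test compares it with a tolerance relative to ||a_2||.
tol = 1e-12 * math.sqrt(sum(x * x for x in a2))
rank_deficient = r22 <= tol
```

Because rounding can leave $\mathbf{u}_2$ as a vector of size around $10^{-15}$ instead of exactly $\mathbf{0}$, numerical rank decisions always use a tolerance rather than an exact zero test.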

02 · Constructing $Q$ and $R$ via Gram-Schmidt

The algorithm from Chapter 11, now read as a factorisation.

Definition — QR via Gram-Schmidt

Given linearly independent columns $\mathbf{a}_1, \ldots, \mathbf{a}_n$ of $A$ :

Step 1: $\mathbf{u}_1 = \mathbf{a}_1$ , $\mathbf{q}_1 = \mathbf{u}_1/\|\mathbf{u}_1\|$ , $r_{11} = \|\mathbf{u}_1\|$ .

Step $j$ (for $j = 2, \ldots, n$ ):

$$\mathbf{u}_j = \mathbf{a}_j - \sum_{i=1}^{j-1} r_{ij}\mathbf{q}_i, \quad r_{ij} = \langle \mathbf{a}_j, \mathbf{q}_i \rangle, \quad r_{jj} = \|\mathbf{u}_j\|, \quad \mathbf{q}_j = \mathbf{u}_j/r_{jj}$$

Assemble $Q = [\mathbf{q}_1 \mid \cdots \mid \mathbf{q}_n]$ and the upper triangular $R$ .
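The steps above translate almost line for line into code. Below is a minimal sketch in plain Python (matrices as lists of rows, standard library only). The function name `qr_gram_schmidt` and the relative tolerance `1e-12` for flagging dependent columns are illustrative choices, not part of the chapter's notation. It implements classical Gram-Schmidt as written, computing every $r_{ij}$ from the original column $\mathbf{a}_j$.

```python
import math


def qr_gram_schmidt(A, tol=1e-12):
    """Thin QR factorisation by classical Gram-Schmidt.

    A is an m x n matrix given as a list of rows (m >= n). Returns (Q, R):
    Q is m x n with orthonormal columns, R is n x n upper triangular with a
    positive diagonal. Raises ValueError when a column is numerically
    dependent on the earlier ones (r_jj tiny relative to ||a_j||).
    """
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]  # a_1..a_n
    R = [[0.0] * n for _ in range(n)]
    q_cols = []
    for j, a in enumerate(cols):
        u = a[:]
        for i, q in enumerate(q_cols):
            R[i][j] = sum(ak * qk for ak, qk in zip(a, q))   # r_ij = <a_j, q_i>
            u = [uk - R[i][j] * qk for uk, qk in zip(u, q)]  # remove that component
        R[j][j] = math.sqrt(sum(uk * uk for uk in u))        # r_jj = ||u_j||
        a_norm = math.sqrt(sum(ak * ak for ak in a))
        if R[j][j] <= tol * max(a_norm, 1.0):
            raise ValueError(f"column {j + 1} is linearly dependent on earlier columns")
        q_cols.append([uk / R[j][j] for uk in u])            # q_j = u_j / r_jj
    Q = [[q_cols[j][i] for j in range(n)] for i in range(m)]  # columns -> rows
    return Q, R


Q, R = qr_gram_schmidt([[1, 1], [1, 0], [0, 1]])
# R is approximately [[1.4142, 0.7071], [0.0, 1.2247]]
```

Classical Gram-Schmidt is adequate for teaching, but in floating point it gradually loses orthogonality when $A$ is ill-conditioned. Modified Gram-Schmidt, or Householder reflections (the approach behind LAPACK's dgeqrf), are the usual production remedies.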

Step-by-step — QR of $A = \begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}$ (columns $\mathbf{a}_1, \mathbf{a}_2$)

Column 1: $\mathbf{a}_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}$ . $\|\mathbf{a}_1\| = \sqrt{1+1+0} = \sqrt{2}$ .

$r_{11} = \sqrt{2}$ . $\mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}$ .

Project $\mathbf{a}_2$ onto $\mathbf{q}_1$ : $r_{12} = \langle\mathbf{a}_2, \mathbf{q}_1\rangle = \frac{1}{\sqrt{2}}(1\cdot1 + 0\cdot1 + 1\cdot0) = \frac{1}{\sqrt{2}}$ .

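The factor $\frac{1}{\sqrt{2}}$ is the normalisation of $\mathbf{q}_1$: the dot product of $\mathbf{a}_2$ with the unnormalised $\mathbf{a}_1$ is $1$, and dividing by $\|\mathbf{a}_1\| = \sqrt{2}$ gives $r_{12}$.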

Orthogonalise $\mathbf{a}_2$ : $\mathbf{u}_2 = \mathbf{a}_2 - r_{12}\mathbf{q}_1 = \begin{pmatrix}1\\0\\1\end{pmatrix} - \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix} = \begin{pmatrix}1\\0\\1\end{pmatrix} - \begin{pmatrix}1/2\\1/2\\0\end{pmatrix} = \begin{pmatrix}1/2\\-1/2\\1\end{pmatrix}$ .

$r_{22} = \|\mathbf{u}_2\| = \sqrt{\frac{1}{4}+\frac{1}{4}+1} = \sqrt{\frac{3}{2}} = \frac{\sqrt{6}}{2}$ .

Normalise: $\mathbf{q}_2 = \frac{1}{r_{22}}\begin{pmatrix}1/2\\-1/2\\1\end{pmatrix} = \frac{2}{\sqrt{6}}\begin{pmatrix}1/2\\-1/2\\1\end{pmatrix} = \frac{1}{\sqrt{6}}\begin{pmatrix}1\\-1\\2\end{pmatrix}$ .

Assemble factors:

$$Q = \begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & \frac{-1}{\sqrt{6}} \\ 0 & \frac{2}{\sqrt{6}}\end{pmatrix}, \quad R = \begin{pmatrix}\sqrt{2} & \frac{1}{\sqrt{2}} \\ 0 & \frac{\sqrt{6}}{2}\end{pmatrix}$$

Verify $A = QR$ : $(QR)_{11} = \frac{1}{\sqrt{2}}\cdot\sqrt{2} + \frac{1}{\sqrt{6}}\cdot0 = 1$ ✓.

$(QR)_{21} = \frac{1}{\sqrt{2}}\cdot\sqrt{2} + \frac{-1}{\sqrt{6}}\cdot0 = 1$ ✓.

$(QR)_{32} = 0\cdot\frac{1}{\sqrt{2}} + \frac{2}{\sqrt{6}}\cdot\frac{\sqrt{6}}{2} = 1$ ✓. The remaining entries check the same way: all six entries of the $3 \times 2$ product match $A$.
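All six entries can be checked at once by multiplying the closed-form factors. Here is a small plain-Python sketch; `matmul` is a hand-written helper included so the snippet is self-contained, not a library function.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows (inner dimensions must agree)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
Q = [[1 / s2, 1 / s6],
     [1 / s2, -1 / s6],
     [0.0, 2 / s6]]
R = [[s2, 1 / s2],
     [0.0, s6 / 2]]
A = [[1, 1], [1, 0], [0, 1]]

QR = matmul(Q, R)
# Largest entrywise deviation between QR and A; only rounding error remains.
max_err = max(abs(QR[i][j] - A[i][j]) for i in range(3) for j in range(2))
```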

Reading $R$

The upper triangular $R$ is the coefficient matrix in the expansion $\mathbf{a}_j = r_{1j}\mathbf{q}_1 + r_{2j}\mathbf{q}_2 + \cdots + r_{jj}\mathbf{q}_j$ . Entry $r_{ij}$ is the component of $\mathbf{a}_j$ in direction $\mathbf{q}_i$ . Zeros below the diagonal reflect the fact that $\mathbf{a}_j$ has no component in directions $\mathbf{q}_i$ for $i > j$ .

03 · Properties of the QR Factorisation

Definition — Orthonormal Columns

The columns $\mathbf{q}_1, \ldots, \mathbf{q}_n$ of $Q$ satisfy:

$$Q^TQ = I_n \iff \langle \mathbf{q}_i, \mathbf{q}_j \rangle = \delta_{ij}$$

$\delta_{ij}$ — Kronecker delta: $1$ when $i=j$ , $0$ when $i \neq j$ .

When $Q$ is square ( $m = n$ ), $Q$ is an orthogonal matrix and $QQ^T = I_m$ also holds.

Definition — $R$ Is Upper Triangular

$R = (r_{ij})$ satisfies $r_{ij} = 0$ for all $i > j$ . The diagonal entries $r_{ii} = \|\mathbf{u}_i\|$ are the norms of the orthogonalised vectors before normalisation — strictly positive when columns of $A$ are independent.

❌ Failure — Assuming $QQ^T = I$ for Tall $Q$

For $A \in \mathbb{R}^{m \times n}$ with $m > n$ , $Q \in \mathbb{R}^{m \times n}$ has $Q^TQ = I_n$ but $QQ^T \neq I_m$ .

Setup: $Q$ from the $3 \times 2$ example above.

Computation: $QQ^T = \frac{1}{3}\begin{pmatrix}2&1&1\\1&2&-1\\1&-1&2\end{pmatrix} \neq I_3$. The two columns of $Q$ span only a 2-dimensional subspace of $\mathbb{R}^3$, so $QQ^T$ has rank 2 (its trace is $2$, not $3$).

Why it breaks: $QQ^T = I_m$ requires $m$ orthonormal columns in $\mathbb{R}^m$ . A tall thin $Q$ has only $n < m$ columns.

Consequence: for tall $Q$, $Q^T$ is only a left inverse ($Q^TQ = I_n$). The product $QQ^T$ is the orthogonal projector onto the column space of $A$, not the identity. Do not treat $Q$ as a square orthogonal matrix.
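The contrast between $Q^TQ$ and $QQ^T$ can be checked numerically for the $3 \times 2$ example. This plain-Python sketch uses the closed-form $Q$ from the worked example; `matmul` is a hand-written helper, not a library call.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows (inner dimensions must agree)."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
Q = [[1 / s2, 1 / s6],
     [1 / s2, -1 / s6],
     [0.0, 2 / s6]]                     # 3 x 2, orthonormal columns
Qt = [list(col) for col in zip(*Q)]     # 2 x 3 transpose

QtQ = matmul(Qt, Q)                     # 2 x 2: the identity I_2
QQt = matmul(Q, Qt)                     # 3 x 3: a projector, not I_3
P2 = matmul(QQt, QQt)                   # projector property: (QQ^T)^2 = QQ^T
trace_QQt = sum(QQt[i][i] for i in range(3))   # equals the rank, 2
```

Up to rounding, `QtQ` is $I_2$, while `QQt` squares to itself and has trace 2, the signature of a rank-2 orthogonal projector onto the column space.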

04 · Least Squares via QR

The normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$ solve least squares algebraically. QR solves the same problem without squaring the condition number.

Definition — Least Squares Problem

Given $A \in \mathbb{R}^{m \times n}$ with $m > n$ and full column rank, find the vector $\hat{\mathbf{x}} \in \mathbb{R}^n$ that minimises the residual norm:

$$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \mathbb{R}^n} \|A\mathbf{x} - \mathbf{b}\|_2$$

$\mathbf{r} = A\hat{\mathbf{x}} - \mathbf{b}$ — the residual vector. The minimiser $\hat{\mathbf{x}}$ satisfies the normal equations $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$ .

Definition — QR Least Squares Solution

With $A = QR$ :

$$R\hat{\mathbf{x}} = Q^T\mathbf{b}$$

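Substituting $A = QR$ into the normal equations gives $R^TQ^TQR\hat{\mathbf{x}} = R^TQ^T\mathbf{b}$. Since $Q^TQ = I_n$ and $R^T$ is invertible (its diagonal is positive), this reduces to $R\hat{\mathbf{x}} = Q^T\mathbf{b}$. The vector $Q^T\mathbf{b}$ holds the coordinates of the projection of $\mathbf{b}$ onto the column space of $A$, expressed in the orthonormal basis $\mathbf{q}_1, \ldots, \mathbf{q}_n$. Back-substitution on the small $n \times n$ upper triangular system then gives $\hat{\mathbf{x}}$ directly.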
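Back-substitution and the full QR solve take only a few lines. This is a minimal plain-Python sketch that assumes the thin factors $Q$ ($m \times n$) and $R$ ($n \times n$) are already available; the names `back_substitute` and `lstsq_from_qr` are hypothetical. The usage lines apply it to the chapter's $3 \times 2$ example with $\mathbf{b} = (3, 1, 2)^T$.

```python
import math


def back_substitute(R, c):
    """Solve R x = c for an upper triangular R (list of rows) with nonzero diagonal."""
    n = len(c)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                    # last row first
        s = sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (c[i] - s) / R[i][i]
    return x


def lstsq_from_qr(Q, R, b):
    """Least squares solution of A x ~ b from a thin factorisation A = QR."""
    m, n = len(Q), len(Q[0])
    c = [sum(Q[i][j] * b[i] for i in range(m)) for j in range(n)]  # c = Q^T b
    return back_substitute(R, c)


# Closed-form factors of A = [[1, 1], [1, 0], [0, 1]] from Section 02.
s2, s6 = math.sqrt(2.0), math.sqrt(6.0)
Q = [[1 / s2, 1 / s6], [1 / s2, -1 / s6], [0.0, 2 / s6]]
R = [[s2, 1 / s2], [0.0, s6 / 2]]
x_hat = lstsq_from_qr(Q, R, [3.0, 1.0, 2.0])   # approximately [1.0, 2.0]
```

Solving from the last row upward works because row $i$ of $R$ involves only $x_i, \ldots, x_n$, so each step introduces exactly one new unknown.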

Step-by-step — Least squares for $A = \begin{pmatrix}1\\1\\0\end{pmatrix}$, $\mathbf{b} = \begin{pmatrix}2\\0\\1\end{pmatrix}$ (column vector)

Note: $A$ is $3 \times 1$ here — one regressor. $Q = \mathbf{q}_1 = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\1\\0\end{pmatrix}$ , $R = \begin{pmatrix}\sqrt{2}\end{pmatrix}$ .

Compute $Q^T\mathbf{b}$ : $\frac{1}{\sqrt{2}}(1\cdot2 + 1\cdot0 + 0\cdot1) = \frac{2}{\sqrt{2}} = \sqrt{2}$ .

Solve $R\hat{x} = Q^T\mathbf{b}$ : $\sqrt{2}\,\hat{x} = \sqrt{2}$ , so $\hat{x} = 1$ .

Residual: $A\hat{\mathbf{x}} = \begin{pmatrix}1\\1\\0\end{pmatrix}$, so $\mathbf{r} = A\hat{\mathbf{x}} - \mathbf{b} = \begin{pmatrix}1\\1\\0\end{pmatrix} - \begin{pmatrix}2\\0\\1\end{pmatrix} = \begin{pmatrix}-1\\1\\-1\end{pmatrix}$.

$\|\mathbf{r}\| = \sqrt{1+1+1} = \sqrt{3}$. The part of $\mathbf{b}$ orthogonal to the column of $A$ is $-\mathbf{r} = (1, -1, 1)^T$, which has length $\sqrt{3}$.
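The same numbers can be reproduced in code, together with the property that characterises the least squares minimiser: the residual is orthogonal to every column of $A$ ($A^T\mathbf{r} = \mathbf{0}$, which is the normal equations rearranged). This is a plain-Python sketch for the single-column case; variable names are illustrative.

```python
import math

a = [1.0, 1.0, 0.0]   # the single column of A
b = [2.0, 0.0, 1.0]

r11 = math.sqrt(sum(x * x for x in a))              # R = (sqrt 2)
q = [x / r11 for x in a]                            # Q = q_1
x_hat = sum(qi * bi for qi, bi in zip(q, b)) / r11  # solve R x = Q^T b
r = [x_hat * ai - bi for ai, bi in zip(a, b)]       # residual r = A x_hat - b
r_norm = math.sqrt(sum(ri * ri for ri in r))        # ||r||_2
orth = sum(ai * ri for ai, ri in zip(a, r))         # A^T r, zero at the minimiser
```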

05 · QR vs Normal Equations — Numerical Stability

Definition — Why QR Beats Normal Equations

The normal equations form $A^TA$ , whose condition number satisfies:

$$\kappa(A^TA) = \kappa(A)^2$$

Squaring the condition number amplifies relative errors in floating-point arithmetic. QR works directly with $A$: because $Q$ has orthonormal columns, $R = Q^TA$ has the same singular values as $A$, so the triangular system $R\hat{\mathbf{x}} = Q^T\mathbf{b}$ has $\kappa(R) = \kappa(A)$, with no squaring.

For a regression with $\kappa(A) = 10^4$, the normal equations have $\kappa(A^TA) = 10^8$ and may lose about 8 of the roughly 16 significant digits available in double precision. QR loses about 4, leaving roughly 12 reliable digits.
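The digit counts follow from the rule of thumb "relative error ≈ $\varepsilon_{\text{mach}} \cdot \kappa$". Here is a small sketch of that arithmetic, assuming IEEE double precision with machine epsilon taken from Python's `sys.float_info`. The function name `error_estimates` is hypothetical, and the results are order-of-magnitude estimates, not guaranteed error bounds.

```python
import math
import sys

EPS = sys.float_info.epsilon  # machine epsilon for IEEE double, about 2.22e-16


def error_estimates(kappa_A):
    """Rule-of-thumb relative errors (eps * condition number) and digits lost."""
    return {
        "qr_error": EPS * kappa_A,                 # QR sees kappa(R) = kappa(A)
        "normal_eq_error": EPS * kappa_A ** 2,     # normal equations see kappa(A)^2
        "qr_digits_lost": math.log10(kappa_A),
        "normal_eq_digits_lost": math.log10(kappa_A ** 2),
    }


budget = error_estimates(1e4)
# budget["qr_digits_lost"] is 4 and budget["normal_eq_digits_lost"] is 8 (up to rounding)
```

For $\kappa(A) = 10^3$ the same function gives about 6 digits lost for the normal equations versus 3 for QR.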

✓ Example — Ill-Conditioned Factor Regression

Design matrix $A \in \mathbb{R}^{500 \times 5}$ with nearly collinear factor columns (value and growth proxies with correlation $\rho = 0.99$ ). $\kappa(A) \approx 10^3$ .

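Normal equations: $\kappa(A^TA) \approx 10^6$. Beta estimates on the nearly collinear factors may carry relative errors up to $\sim 10^{-10}$ in double precision ($\varepsilon_{\text{mach}} \cdot \kappa$), about a thousand times larger than the $\sim 10^{-13}$ expected from QR. The gap grows quadratically: once $\kappa(A)$ approaches $10^8$, $\kappa(A^TA)$ reaches $10^{16}$ and normal-equation betas can lose every significant digit.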

QR: solves $R\hat{\mathbf{x}} = Q^T\mathbf{b}$ with $\kappa(R) \approx 10^3$. Same theoretical answer, but stable betas. Standard numerical libraries follow this practice: LAPACK's dgels solves least squares via a QR (or LQ) factorisation and dgelsd via the SVD; neither forms $A^TA$.

❌ Failure — Forming $A^TA$ Explicitly

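For $A = \begin{pmatrix}1&1\\10^{-5}&0\\0&10^{-5}\end{pmatrix}$, the columns are nearly parallel (the angle between them is about $\sqrt{2}\times10^{-5}$ radians). The singular values are $\sqrt{2+10^{-10}}$ and $10^{-5}$, so $\kappa(A) \approx 1.4\times10^5$.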

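$A^TA = \begin{pmatrix}1+10^{-10}&1\\1&1+10^{-10}\end{pmatrix}$. In single precision (unit roundoff $\approx 6\times10^{-8}$), $1+10^{-10}$ rounds to $1$, so the computed $A^TA$ is $\begin{pmatrix}1&1\\1&1\end{pmatrix}$, which is exactly singular. In double precision the entries survive, but $\kappa(A^TA) \approx 2\times10^{10}$ already costs about 10 of the 16 available digits.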

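Consequence: normal equations can destroy information present in $A$. QR on $A$ uses the $10^{-5}$ entries directly (they enter at first order instead of being squared), so the small difference between the two columns survives and $r_{22} \approx \sqrt{2}\times10^{-5}$ stays nonzero. This is why Chapter 20 treats condition numbers as first-class objects.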
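Python floats are IEEE doubles, but single precision can be simulated by rounding every intermediate result through the standard `struct` module's `'f'` (IEEE single) format. The sketch below illustrates the effect under that assumption and does not describe how any particular float32 library computes. It forms $A^TA$ and runs Gram-Schmidt on $A$ with the same rounding, for the matrix above. The helpers `f32` and `dot32` are hypothetical names.

```python
import math
import struct


def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot32(u, v):
    """Dot product with every product and partial sum rounded to single precision."""
    s = 0.0
    for x, y in zip(u, v):
        s = f32(s + f32(x * y))
    return s


eps = f32(1e-5)
a1 = [1.0, eps, 0.0]   # first column of A
a2 = [1.0, 0.0, eps]   # second column of A

# Route 1: normal equations. Form A^T A in single precision.
g11, g12, g22 = dot32(a1, a1), dot32(a1, a2), dot32(a2, a2)
det_gram = f32(f32(g11 * g22) - f32(g12 * g12))   # 1 + 1e-10 has rounded to 1

# Route 2: Gram-Schmidt QR on A itself, same precision.
r11 = f32(math.sqrt(g11))
q1 = [f32(x / r11) for x in a1]
r12 = dot32(a2, q1)
u2 = [f32(x - f32(r12 * y)) for x, y in zip(a2, q1)]
r22 = f32(math.sqrt(dot32(u2, u2)))               # about sqrt(2) * 1e-5, not 0
```

With this rounding, the determinant of the normal-equations matrix comes out as exactly zero, while Gram-Schmidt still produces $r_{22}$ close to the exact value $\sqrt{2}\times10^{-5}$.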

06 · Practice Exercises

EXERCISE 17.1

Find the QR decomposition of $A = \begin{pmatrix}3&0\\0&4\\4&0\end{pmatrix}$. Show $r_{12}=0$ and verify $A=QR$.

Hint: apply Gram-Schmidt to the two columns, with $r_{11} = \|\mathbf{a}_1\|$, $r_{12} = \langle\mathbf{a}_2,\mathbf{q}_1\rangle$, $r_{22} = \|\mathbf{u}_2\|$.

Solution: $\mathbf{a}_1 = \begin{pmatrix}3\\0\\4\end{pmatrix}$. $r_{11} = \sqrt{9+16} = 5$. $\mathbf{q}_1 = \frac{1}{5}\begin{pmatrix}3\\0\\4\end{pmatrix}$.

$r_{12} = \langle\mathbf{a}_2,\mathbf{q}_1\rangle = \frac{1}{5}(3\cdot0 + 0\cdot4 + 4\cdot0) = 0$.

$\mathbf{u}_2 = \mathbf{a}_2 = \begin{pmatrix}0\\4\\0\end{pmatrix}$. $r_{22} = 4$. $\mathbf{q}_2 = \begin{pmatrix}0\\1\\0\end{pmatrix}$.

$$Q = \begin{pmatrix}3/5&0\\0&1\\4/5&0\end{pmatrix}, \quad R = \begin{pmatrix}5&0\\0&4\end{pmatrix}$$

Verify: $R$ is diagonal, so $QR$ scales the first column of $Q$ by $5$ and the second by $4$, recovering both columns of $A$; for example, $(QR)_{31} = \frac{4}{5}\cdot5 = 4$ ✓.
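Because every entry in this exercise is rational, the factors can be verified exactly, with no rounding, using Python's standard `fractions` module. This is a short sketch; `matmul` is a hand-written helper, not a library function.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Product of matrices stored as lists of rows; works with ints and Fractions."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[3, 0], [0, 4], [4, 0]]
Q = [[F(3, 5), 0], [0, 1], [F(4, 5), 0]]   # q_1 = (3, 0, 4)/5, q_2 = (0, 1, 0)
R = [[5, 0], [0, 4]]

r12 = sum(Q[i][0] * A[i][1] for i in range(3))   # <a_2, q_1>, exactly 0
Qt = [list(col) for col in zip(*Q)]
QR = matmul(Q, R)     # equals A entry for entry
QtQ = matmul(Qt, Q)   # equals the 2 x 2 identity
```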

EXERCISE 17.2

For $A = \begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}$, verify $Q^TQ = I_2$ using the $Q$ from the chapter's step-by-step. Compute both diagonal entries and the off-diagonal entry (the product is symmetric, so $(Q^TQ)_{21} = (Q^TQ)_{12}$).

Hint: after building $Q$ and $R$ from Gram-Schmidt, compute the entries of the $2 \times 2$ product $Q^TQ$.

Solution: from the chapter example, $Q = \begin{pmatrix}1/\sqrt{2}&1/\sqrt{6}\\1/\sqrt{2}&-1/\sqrt{6}\\0&2/\sqrt{6}\end{pmatrix}$.

$(Q^TQ)_{11} = \frac{1}{2}+\frac{1}{2}+0 = 1$ ✓.

$(Q^TQ)_{12} = \frac{1}{\sqrt{12}} - \frac{1}{\sqrt{12}} + 0 = 0$ ✓.

$(Q^TQ)_{22} = \frac{1}{6}+\frac{1}{6}+\frac{4}{6} = 1$ ✓.

$Q^TQ = I_2$, so the columns are orthonormal.

EXERCISE 17.3

Using the QR factors for $A = \begin{pmatrix}1&1\\1&0\\0&1\end{pmatrix}$, solve the least squares problem for $\mathbf{b}=\begin{pmatrix}3\\1\\2\end{pmatrix}$ via $R\hat{\mathbf{x}} = Q^T\mathbf{b}$.

Hint: compute $Q^T\mathbf{b}$ first, a vector in $\mathbb{R}^2$. Then back-substitute on $R\hat{\mathbf{x}} = Q^T\mathbf{b}$.

Solution: use the chapter's factors, $R = \begin{pmatrix}\sqrt{2}&1/\sqrt{2}\\0&\sqrt{6}/2\end{pmatrix}$ and the $Q$ from Section 02.

$Q^T\mathbf{b}$: entry 1 $= \frac{1}{\sqrt{2}}(3+1+0) = 4/\sqrt{2} = 2\sqrt{2}$. Entry 2 $= \frac{1}{\sqrt{6}}(3-1+4) = 6/\sqrt{6} = \sqrt{6}$.

Solve the last row first: $\frac{\sqrt{6}}{2}\hat{x}_2 = \sqrt{6}$, so $\hat{x}_2 = 2$.

Then the first row: $\sqrt{2}\,\hat{x}_1 + \frac{1}{\sqrt{2}}(2) = 2\sqrt{2}$, so $\sqrt{2}\,\hat{x}_1 = 2\sqrt{2} - \sqrt{2} = \sqrt{2}$ and $\hat{x}_1 = 1$.

$\hat{\mathbf{x}} = \begin{pmatrix}1\\2\end{pmatrix}$. Check: $A\hat{\mathbf{x}} = (3, 1, 2)^T = \mathbf{b}$, so this $\mathbf{b}$ lies in the column space of $A$ and the residual is zero.

EXERCISE 17.4

For $A = \begin{pmatrix}1\\2\end{pmatrix}$ and $\mathbf{b}=\begin{pmatrix}1\\2\end{pmatrix}$, solve via normal equations and via QR. Confirm both give the same $\hat{x}$ and explain when they would diverge in floating point.

Hint: compare the two solution paths: (1) solve $A^TA\mathbf{x}=A^T\mathbf{b}$; (2) solve $R\mathbf{x}=Q^T\mathbf{b}$. Both give the same $\hat{\mathbf{x}}$ in exact arithmetic.

Solution: here $\mathbf{b} = A \cdot 1$, so the fit is exact.

$A^TA = 5$, $A^T\mathbf{b} = 5$. Normal equations: $\hat{x} = 1$.

QR: $Q = \frac{1}{\sqrt{5}}\begin{pmatrix}1\\2\end{pmatrix}$, $R = \begin{pmatrix}\sqrt{5}\end{pmatrix}$. $Q^T\mathbf{b} = \frac{1}{\sqrt{5}}(1+4) = \sqrt{5}$. $R\hat{x}=\sqrt{5}$, so $\hat{x}=1$ ✓.

Both methods agree. With a single well-conditioned column (the system to solve is $1 \times 1$), the two routes are essentially identical. The advantage of QR appears when $\kappa(A) \gg 1$ and forming $A^TA$ loses precision.

EXERCISE 17.5

A regression design matrix has $\kappa(A) = 10^3$. Estimate $\kappa(A^TA)$. Explain in one paragraph why a quant library would prefer QR over normal equations for computing factor betas.

Hint: $\kappa(A^TA) = \kappa(A)^2$. If $\kappa(A) = 10^3$, how many digits of relative accuracy might normal equations lose in double precision ($\approx 16$ digits)?

Solution: $\kappa(A) = 10^3$, so $\kappa(A^TA) = 10^6$.

In double precision, the relative error in solving a system is roughly $\varepsilon_{\text{mach}} \cdot \kappa \approx 2\times10^{-16} \cdot 10^6 = 2\times10^{-10}$.

Normal equations may lose $\log_{10}(10^6) = 6$ digits of accuracy relative to the ideal answer, leaving $\approx 10$ reliable digits.

QR on $A$ directly: error $\approx 2\times10^{-16}\cdot10^3 = 2\times10^{-13}$, about 1000 times more accurate.

Therefore production least squares uses QR or SVD, not $A^TA$.

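EXERCISE 17.6

Three days of factor returns form $A = \begin{pmatrix}1&0&0\\1&1&0\\1&1&1\end{pmatrix}$ (columns: market, size, value). Fund excess returns are $\mathbf{b}=\begin{pmatrix}0.01\\0.02\\0.03\end{pmatrix}$. Outline the QR steps to estimate factor betas and interpret $\hat{x}_1$ as market exposure.

Hint: build $A$ with columns = factor returns. QR gives betas via $R\hat{\mathbf{x}}=Q^T\mathbf{b}$, where $\mathbf{b}$ holds the fund's excess returns (written $\mathbf{b}$ rather than $\mathbf{r}$ to avoid a clash with the residual). Interpret $\hat{x}_1$ as market beta.

Solution: Gram-Schmidt gives $\mathbf{q}_1 = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix}$, $r_{11}=\sqrt{3}$.

$r_{12} = \frac{2}{\sqrt{3}}$, $\mathbf{u}_2 = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac{2}{\sqrt{3}}\cdot\frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\1\end{pmatrix} = \begin{pmatrix}-2/3\\1/3\\1/3\end{pmatrix}$, $r_{22}=\sqrt{6}/3$, $\mathbf{q}_2=\frac{1}{\sqrt{6}}\begin{pmatrix}-2\\1\\1\end{pmatrix}$.

$r_{13} = \frac{1}{\sqrt{3}}$, $r_{23} = \frac{1}{\sqrt{6}}$, $\mathbf{u}_3 = \begin{pmatrix}0\\0\\1\end{pmatrix} - \frac{1}{3}\begin{pmatrix}1\\1\\1\end{pmatrix} - \frac{1}{6}\begin{pmatrix}-2\\1\\1\end{pmatrix} = \begin{pmatrix}0\\-1/2\\1/2\end{pmatrix}$, $r_{33}=1/\sqrt{2}$, $\mathbf{q}_3=\frac{1}{\sqrt{2}}\begin{pmatrix}0\\-1\\1\end{pmatrix}$.

$Q^T\mathbf{b}$: $c_1 = \frac{0.01+0.02+0.03}{\sqrt{3}} = 0.06/\sqrt{3} \approx 0.0346$, $c_2 = \frac{-0.02+0.02+0.03}{\sqrt{6}} = 0.03/\sqrt{6}$, $c_3 = \frac{-0.02+0.03}{\sqrt{2}} = 0.01/\sqrt{2}$.

Back-substitute on $R\hat{\mathbf{x}}=Q^T\mathbf{b}$ with $R = \begin{pmatrix}\sqrt{3}&2/\sqrt{3}&1/\sqrt{3}\\0&\sqrt{6}/3&1/\sqrt{6}\\0&0&1/\sqrt{2}\end{pmatrix}$: first $\hat{x}_3 = 0.01$; then $\frac{\sqrt{6}}{3}\hat{x}_2 = \frac{0.03}{\sqrt{6}} - \frac{0.01}{\sqrt{6}}$ gives $\hat{x}_2 = 0.01$; then $\sqrt{3}\,\hat{x}_1 = \frac{0.06}{\sqrt{3}} - \frac{2(0.01)}{\sqrt{3}} - \frac{0.01}{\sqrt{3}}$ gives $\hat{x}_1 = 0.01$.

So $\hat{\mathbf{x}} = (0.01, 0.01, 0.01)^T$. Because $A$ is square and invertible (3 days, 3 factors), the fit is exact with zero residual; a real regression needs many more days than factors ($m > n$).

Interpretation: $\hat{x}_1 = 0.01$ is the fund's sensitivity to the market column, holding size and value fixed. The orthogonal coordinates $c_j = \langle\mathbf{b}, \mathbf{q}_j\rangle$ measure what each factor adds after the earlier factors have been removed; back-substitution through $R$ converts them back into betas on the original factors.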
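The back-substitution for this exercise can be checked numerically. This plain-Python sketch hard-codes the Gram-Schmidt factors for the columns $(1,1,1)^T$, $(0,1,1)^T$, $(0,0,1)^T$, with $R$ assembled from $r_{ij} = \langle\mathbf{a}_j, \mathbf{q}_i\rangle$ and $r_{jj} = \|\mathbf{u}_j\|$. The returns vector is called `b` in the code; all variable names are illustrative.

```python
import math

s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)

# Orthonormal columns from Gram-Schmidt on (1,1,1), (0,1,1), (0,0,1).
q_cols = [[1 / s3, 1 / s3, 1 / s3],
          [-2 / s6, 1 / s6, 1 / s6],
          [0.0, -1 / s2, 1 / s2]]
# Upper triangular R: r_ij = <a_j, q_i> above the diagonal, r_jj = ||u_j|| on it.
R = [[s3, 2 / s3, 1 / s3],
     [0.0, s6 / 3, 1 / s6],
     [0.0, 0.0, 1 / s2]]

b = [0.01, 0.02, 0.03]                                         # fund excess returns
c = [sum(q * bi for q, bi in zip(col, b)) for col in q_cols]   # c = Q^T b

betas = [0.0, 0.0, 0.0]
for i in (2, 1, 0):                                            # back substitution
    s = sum(R[i][k] * betas[k] for k in range(i + 1, 3))
    betas[i] = (c[i] - s) / R[i][i]
```

Up to rounding, all three betas come out as 0.01, matching the hand computation.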

07 · Summary

Term	Definition
QR decomposition	$A = QR$ ; $Q$ has orthonormal columns, $R$ upper triangular
$Q^TQ = I_n$	Columns of $Q$ are orthonormal; $Q^{-1} = Q^T$ when square
$r_{ij}$	$r_{ij} = \langle\mathbf{a}_j, \mathbf{q}_i\rangle$ — projection coefficient
Least squares via QR	Solve $R\hat{\mathbf{x}} = Q^T\mathbf{b}$ by back substitution
Normal equations	$A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$ — correct but $\kappa(A^TA)=\kappa(A)^2$
Numerical stability	QR avoids forming $A^TA$ ; preferred for regression
Residual	$\mathbf{r} = A\hat{\mathbf{x}}-\mathbf{b}$ ; minimised in $\ell_2$ norm

Next: Matrix Decompositions — SVD — the universal factorisation that exists for every matrix, connects to PCA and low-rank approximation, and underpins modern quant analytics.