Chapter 13

Norms & Distance Metrics

00 · Symbol Glossary

$\|\mathbf{v}\|_p$ ("v norm p") — p-norm

The $L^p$ norm of $\mathbf{v}$: $\|\mathbf{v}\|_p = \left(\sum_{i=1}^n |v_i|^p\right)^{1/p}$ for $p \geq 1$. The subscript $p$ selects which norm. The cases $p=1$, $p=2$, $p=\infty$ are the three most important in applications. Without a subscript, $\|\mathbf{v}\|$ denotes the L2 norm.

$\|\mathbf{v}\|_1$ ("v norm one") — L1 norm

The sum of absolute values: $\|\mathbf{v}\|_1 = |v_1|+|v_2|+\cdots+|v_n|$. Promotes sparse solutions in optimisation: penalising the L1 norm drives many components exactly to zero. Used in LASSO regression and compressed sensing.

$\|\mathbf{v}\|_\infty$ ("v norm infinity") — max norm

The maximum absolute component: $\|\mathbf{v}\|_\infty = \max_i |v_i|$. The unit ball in this norm is a hypercube. Relevant for worst-case analysis and numerical stability bounds.

$d(\mathbf{u},\mathbf{v})$ ("d u v") — distance

The distance between $\mathbf{u}$ and $\mathbf{v}$ induced by a norm: $d(\mathbf{u},\mathbf{v}) = \|\mathbf{u}-\mathbf{v}\|$. Different norms give different notions of distance. The Euclidean distance $\|\mathbf{u}-\mathbf{v}\|_2$ is the straight-line distance; the Manhattan distance $\|\mathbf{u}-\mathbf{v}\|_1$ is the grid-path distance.

$\|A\|$ ("A norm") — matrix norm

The induced matrix norm: $\|A\| = \max_{\mathbf{x}\neq\mathbf{0}}\frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}$. Measures the maximum factor by which $A$ stretches a vector. The L2-induced matrix norm equals the largest singular value of $A$. Used in bounding numerical errors.
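To make the distance entry above concrete, here is a minimal sketch that computes the distance induced by each of the three norms for one pair of points. The helper name `distance` and the sample points are illustrative assumptions, not taken from the text.

```python
import math

def distance(u, v, p):
    """Distance between u and v induced by the p-norm (p = 1, 2, or math.inf)."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

u = (1.0, 2.0)
v = (4.0, 6.0)

manhattan = distance(u, v, 1)         # |1-4| + |2-6| = 7
euclidean = distance(u, v, 2)         # sqrt(3^2 + 4^2) = 5
chebyshev = distance(u, v, math.inf)  # max(3, 4) = 4
print(manhattan, euclidean, chebyshev)
```

The same two points are 7, 5, or 4 units apart depending on which norm measures the difference vector.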


01 · What Is a Norm?

The Euclidean length $\|\mathbf{v}\|_2=\sqrt{\sum v_i^2}$ from Chapter 1 measures distance in a specific way: it treats all coordinates equally and combines them through the square root of the sum of squares. Other definitions of "length" arise naturally in applications: a trader minimising total position size uses the L1 norm (sum of absolute values); a risk manager bounding the worst single exposure uses the L-infinity norm (maximum). All three satisfy the same axioms.

Definition — Norm

A norm on $\mathbb{R}^n$ is a function $\|\cdot\|: \mathbb{R}^n \to \mathbb{R}$ satisfying:

N1. Non-negativity: $\|\mathbf{v}\| \geq 0$, with $\|\mathbf{v}\|=0 \iff \mathbf{v}=\mathbf{0}$.

N2. Homogeneity: $\|c\mathbf{v}\| = |c|\,\|\mathbf{v}\|$ for all $c \in \mathbb{R}$.

N3. Triangle inequality: $\|\mathbf{u}+\mathbf{v}\| \leq \|\mathbf{u}\| + \|\mathbf{v}\|$.

The triangle inequality is the key structural constraint — it says the direct distance is no greater than the sum of two leg distances.
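The axioms N1 to N3 can be spot-checked numerically. The sketch below tests them for the L1, L2, and L-infinity norms on one pair of sample vectors and one scalar; the function names, the vectors, and the tolerance are illustrative assumptions. A passing check on a few inputs is evidence, not a proof.

```python
import math

def l1(v):
    return sum(abs(x) for x in v)

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def linf(v):
    return max(abs(x) for x in v)

def check_axioms(norm, u, v, c, tol=1e-12):
    """Spot-check N1, N2, N3 for one pair of vectors and one scalar."""
    zero = [0.0] * len(u)
    scaled = [c * x for x in v]
    total = [a + b for a, b in zip(u, v)]
    n1 = norm(v) >= 0 and norm(zero) == 0                # non-negativity
    n2 = abs(norm(scaled) - abs(c) * norm(v)) <= tol      # homogeneity
    n3 = norm(total) <= norm(u) + norm(v) + tol           # triangle inequality
    return n1 and n2 and n3

u = [3.0, -4.0, 0.0]
v = [-1.0, 2.0, 5.0]
results = {f.__name__: check_axioms(f, u, v, c=-2.5) for f in (l1, l2, linf)}
print(results)
```

The tolerance is there only because floating-point arithmetic is inexact.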


02 · The Three Principal Norms

Definition — L1, L2, and L-Infinity Norms

For $\mathbf{v} = (v_1, \ldots, v_n)^T \in \mathbb{R}^n$:

L1 (Manhattan/taxicab):

$$\|\mathbf{v}\|_1 = \sum_{i=1}^n |v_i|$$

L2 (Euclidean):

$$\|\mathbf{v}\|_2 = \sqrt{\sum_{i=1}^n v_i^2}$$

L-infinity (max/Chebyshev):

$$\|\mathbf{v}\|_\infty = \max_{1 \leq i \leq n} |v_i|$$

The $L^p$ family: $\|\mathbf{v}\|_p = \left(\sum_i |v_i|^p\right)^{1/p}$. As $p\to\infty$, the maximum dominates, which is why the limit is called the infinity norm.
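The limit $p\to\infty$ can be observed numerically. This sketch evaluates the $L^p$ formula above for increasing $p$ on a sample vector and compares the result with the largest absolute component; the helper name `p_norm` and the chosen values of $p$ are illustrative assumptions.

```python
def p_norm(v, p):
    """(sum |v_i|^p)^(1/p) for finite p >= 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = [3.0, -4.0, 0.0, 2.0]
max_abs = max(abs(x) for x in v)  # the L-infinity norm of v

for p in (1, 2, 4, 16, 64):
    print(p, p_norm(v, p))

# Distance between the p = 64 value and the L-infinity norm.
gap = p_norm(v, 64) - max_abs
print(gap)
```

The printed values decrease toward `max_abs` as `p` grows. For very large `p` or large entries, `abs(x) ** p` can overflow a float, so this direct formula is meant only as an illustration.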

Step-by-step — Computing all three norms for $\mathbf{v}=\begin{pmatrix}3\\-4\\0\\2\end{pmatrix}$
1. Take absolute values: $|v_1|=3$, $|v_2|=4$ (negative sign removed), $|v_3|=0$, $|v_4|=2$.

2. Compute L1: sum the absolute values. $\|\mathbf{v}\|_1 = 3+4+0+2 = 9$.

3. Compute L2: square each, sum, take root. $3^2=9$, $4^2=16$, $0^2=0$, $2^2=4$. Sum: $9+16+0+4=29$. $\|\mathbf{v}\|_2=\sqrt{29}\approx5.39$.

4. Compute L-infinity: find the largest absolute value. $\max(3,4,0,2)=4$. $\|\mathbf{v}\|_\infty=4$.

5. Compare: $\|\mathbf{v}\|_\infty = 4 \leq \|\mathbf{v}\|_2 \approx 5.39 \leq \|\mathbf{v}\|_1 = 9$. This ordering always holds in $\mathbb{R}^n$: $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1 \leq \sqrt{n}\,\|\mathbf{v}\|_2 \leq n\,\|\mathbf{v}\|_\infty$.
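The five steps above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
import math

v = [3.0, -4.0, 0.0, 2.0]
n = len(v)

l1 = sum(abs(x) for x in v)             # 3 + 4 + 0 + 2
l2 = math.sqrt(sum(x * x for x in v))   # sqrt(29)
linf = max(abs(x) for x in v)           # largest absolute component
print(l1, round(l2, 2), linf)

# Full ordering chain: linf <= l2 <= l1 <= sqrt(n) * l2 <= n * linf
chain = [linf, l2, l1, math.sqrt(n) * l2, n * linf]
chain_holds = all(a <= b for a, b in zip(chain, chain[1:]))
print(chain_holds)
```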


03 · Unit Balls and Geometry

The unit ball $B_p = \{\mathbf{v} \in \mathbb{R}^n \mid \|\mathbf{v}\|_p \leq 1\}$ visualises how a norm defines "nearness to the origin." Different norms have dramatically different shapes.

In $\mathbb{R}^2$:

  • L1 ball: $|v_1|+|v_2|\leq1$, a diamond (square rotated 45°) with vertices at $(\pm1,0)$ and $(0,\pm1)$. The corners of the diamond lie on the axes, which is why L1-penalised optimisation drives solutions to corners where many components are exactly zero.

  • L2 ball: $v_1^2+v_2^2\leq1$, a circle. The round boundary means no direction is special; the optimum can lie anywhere on the boundary.

  • L-infinity ball: $\max(|v_1|,|v_2|)\leq1$, a square aligned with the axes, $[-1,1]\times[-1,1]$. Corners lie at $(\pm1,\pm1)$.
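The three shapes can be compared by testing a few points for membership. The sketch below is illustrative only; the function name `in_unit_ball` and the sample points are assumptions chosen to land in different regions.

```python
import math

def in_unit_ball(point, p):
    """Return True if a 2-D point lies in the closed unit ball of the p-norm."""
    x, y = abs(point[0]), abs(point[1])
    if p == 1:
        return x + y <= 1            # diamond
    if p == 2:
        return x * x + y * y <= 1    # circle
    if p == math.inf:
        return max(x, y) <= 1        # axis-aligned square
    raise ValueError("p must be 1, 2, or math.inf")

points = [(0.4, 0.4), (0.6, 0.6), (0.9, 0.9), (1.0, 0.0)]
membership = {
    pt: tuple(in_unit_ball(pt, p) for p in (1, 2, math.inf)) for pt in points
}
for pt, flags in membership.items():
    print(pt, flags)  # flags are (in L1 ball, in L2 ball, in L-infinity ball)
```

Every point inside the diamond is also inside the circle, and every point inside the circle is also inside the square. That nesting is the geometric counterpart of $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1$.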

✓ Example — Why L1 Promotes Sparsity

In LASSO regression, we minimise $\|X\boldsymbol{\beta}-\mathbf{y}\|_2^2$ subject to $\|\boldsymbol{\beta}\|_1 \leq t$. The constraint region is an L1 ball of radius $t$, a diamond in $\mathbb{R}^2$. The unconstrained OLS solution $\hat{\boldsymbol{\beta}}$ typically lies off the axes. When it also lies outside the diamond, the constrained minimum sits on the diamond's boundary, and it tends to occur at a corner, a point where one component is exactly zero. No such tendency exists for the L2 ball (circle), whose boundary is smooth.

❌ What Breaks — The L0 Pseudo-Norm Is Not a Norm

$\|\mathbf{v}\|_0 = \text{number of nonzero components}$ is sometimes called the L0 norm in compressed sensing and sparse regression. It is not a true norm: $\|c\mathbf{v}\|_0 = \|\mathbf{v}\|_0$ for any $c\neq0$, violating homogeneity $\|c\mathbf{v}\|=|c|\,\|\mathbf{v}\|$ (e.g. $\|2\mathbf{v}\|_0 = \|\mathbf{v}\|_0 \neq 2\|\mathbf{v}\|_0$ whenever $\mathbf{v}\neq\mathbf{0}$). Optimising over $\|\cdot\|_0$ is NP-hard in general; the L1 norm is the tightest convex relaxation that still promotes sparsity.
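The homogeneity failure is easy to see on a concrete vector. In this sketch the function name `l0` and the sample vector are illustrative assumptions.

```python
def l0(v):
    """Count the nonzero components of v (the L0 pseudo-norm)."""
    return sum(1 for x in v if x != 0)

v = [3.0, 0.0, -4.0, 0.0]
scaled = [2 * x for x in v]

lhs = l0(scaled)   # scaling by 2 leaves the nonzero count unchanged
rhs = 2 * l0(v)    # the value homogeneity (N2) would require
print(lhs, rhs, lhs == rhs)
```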


04 · Norm Equivalence

All norms on $\mathbb{R}^n$ are equivalent: each one bounds every other from above and below, up to constant factors.

Definition — Norm Equivalence

Norms $\|\cdot\|_\alpha$ and $\|\cdot\|_\beta$ on $\mathbb{R}^n$ are equivalent if there exist constants $c_1, c_2 > 0$ such that:

$$c_1 \|\mathbf{v}\|_\alpha \leq \|\mathbf{v}\|_\beta \leq c_2 \|\mathbf{v}\|_\alpha \quad \text{for all } \mathbf{v} \in \mathbb{R}^n$$

In $\mathbb{R}^n$, all norms are equivalent. Explicit bounds between L1, L2, and L-infinity:

$$\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 \leq \sqrt{n}\,\|\mathbf{v}\|_\infty, \qquad \|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1 \leq \sqrt{n}\,\|\mathbf{v}\|_2$$

Consequence: convergence in one norm implies convergence in all norms. The choice of norm does not affect which sequences converge; it changes only the constants in the error bounds.
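The explicit bounds between L1, L2, and L-infinity can be checked on random vectors, and two special vectors show that the constants cannot be improved. This is a sketch under stated assumptions: the dimension, sample count, seed, and tolerance are arbitrary illustrative choices.

```python
import math
import random

def three_norms(v):
    l1 = sum(abs(x) for x in v)
    l2 = math.sqrt(sum(x * x for x in v))
    linf = max(abs(x) for x in v)
    return l1, l2, linf

def bounds_hold(v, tol=1e-9):
    """Check linf <= l2 <= sqrt(n)*linf and l2 <= l1 <= sqrt(n)*l2."""
    root_n = math.sqrt(len(v))
    l1, l2, linf = three_norms(v)
    return (linf <= l2 + tol and l2 <= root_n * linf + tol
            and l2 <= l1 + tol and l1 <= root_n * l2 + tol)

random.seed(0)  # fixed seed so the run is repeatable
samples = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(1000)]
all_hold = all(bounds_hold(v) for v in samples)

ones = [1.0] * 5                    # both upper bounds are attained here
basis = [1.0, 0.0, 0.0, 0.0, 0.0]   # both lower bounds are attained here
print(all_hold, three_norms(ones), three_norms(basis))
```

For the all-ones vector, $\|\mathbf{v}\|_1 = n$, $\|\mathbf{v}\|_2=\sqrt{n}$, and $\|\mathbf{v}\|_\infty = 1$, so the factors of $\sqrt{n}$ are reached exactly. For a standard basis vector all three norms equal 1.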


05 · Matrix Norms

Norms extend to matrices. The most useful is the induced (operator) norm: $\|A\|_p = \max_{\|\mathbf{x}\|_p=1}\|A\mathbf{x}\|_p$, the maximum stretch factor.

Definition — Induced Matrix Norms

For $A \in \mathbb{R}^{m\times n}$:

Induced L1: $\|A\|_1 = \max_j \sum_i |a_{ij}|$, the maximum absolute column sum.

Induced L2 (spectral norm): $\|A\|_2 = \sigma_{\max}(A)$, the largest singular value.

Induced L-infinity: $\|A\|_\infty = \max_i \sum_j |a_{ij}|$, the maximum absolute row sum.

Frobenius norm (not induced): $\|A\|_F = \sqrt{\sum_{i,j}a_{ij}^2}$, the Euclidean norm on the entries.

✓ Example — Matrix Norm Computation

$A = \begin{pmatrix}2&-1\\3&4\end{pmatrix}$.

$\|A\|_1$: the absolute column sums are $|2|+|3|=5$ and $|-1|+|4|=5$, so $\|A\|_1 = 5$.

$\|A\|_\infty$: the absolute row sums are $|2|+|-1|=3$ and $|3|+|4|=7$, so $\|A\|_\infty = 7$.

$\|A\|_F = \sqrt{4+1+9+16} = \sqrt{30} \approx 5.48$.
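The same computation can be sketched in plain Python with a matrix stored as a list of rows. The function names are illustrative assumptions. The last helper also evaluates the spectral norm for this example; it is restricted to matrices with exactly two columns, where the largest eigenvalue of the symmetric $2\times2$ matrix $A^TA$ has a closed form. It is not a general singular-value routine.

```python
import math

def induced_l1(A):
    """Maximum absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def induced_linf(A):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)

def frobenius(A):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in A for x in row))

def spectral_two_columns(A):
    """Largest singular value of a matrix with exactly two columns."""
    a = sum(row[0] * row[0] for row in A)   # (A^T A)[0][0]
    b = sum(row[0] * row[1] for row in A)   # (A^T A)[0][1]
    d = sum(row[1] * row[1] for row in A)   # (A^T A)[1][1]
    lam_max = (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)
    return math.sqrt(lam_max)

A = [[2.0, -1.0],
     [3.0, 4.0]]

print(induced_l1(A), induced_linf(A), round(frobenius(A), 2))
print(round(spectral_two_columns(A), 2))
```

For this matrix the spectral norm comes out near 5.02, below the Frobenius norm of about 5.48.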


06 · Quant Application — Regularisation: LASSO and Ridge

OLS minimises $\|X\boldsymbol{\beta}-\mathbf{y}\|_2^2$ with no constraint on $\boldsymbol{\beta}$. When predictors are correlated or $n < p$ (fewer observations than predictors), OLS is unstable or not uniquely determined. Adding a norm penalty stabilises the solution:

Ridge regression (L2 penalty):

$$\hat{\boldsymbol{\beta}}_\text{ridge} = \arg\min_{\boldsymbol{\beta}} \|X\boldsymbol{\beta}-\mathbf{y}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_2^2$$

Closed form: $\hat{\boldsymbol{\beta}}_\text{ridge} = (X^TX + \lambda I)^{-1}X^T\mathbf{y}$. Adding $\lambda I$ with $\lambda > 0$ ensures invertibility; this is the fix for singular $X^TX$ from Chapter 12. Ridge shrinks all coefficients toward zero but generally keeps all of them nonzero.
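The ridge closed form can be tried on a deliberately degenerate design. The sketch below is restricted to two predictors so that the $2\times2$ system can be solved by hand without a linear-algebra library. The function name `ridge_two_predictors` and the data are illustrative assumptions.

```python
def ridge_two_predictors(X, y, lam):
    """Solve (X^T X + lam*I) beta = X^T y for a design with two columns."""
    # Entries of the symmetric 2x2 matrix X^T X + lam*I.
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    # Right-hand side X^T y.
    g0 = sum(r[0] * yi for r, yi in zip(X, y))
    g1 = sum(r[1] * yi for r, yi in zip(X, y))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X + lam*I is singular; use lam > 0")
    # Apply the explicit inverse of a 2x2 matrix.
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

# Two identical columns: X^T X = [[3, 3], [3, 3]] is singular.
X = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]

beta = ridge_two_predictors(X, y, lam=1.0)
print(beta)  # both coefficients equal 6/7
```

With `lam=0.0` the same call raises `ValueError`, because $X^TX$ alone has determinant zero. With `lam=1.0` the system becomes $\begin{pmatrix}4&3\\3&4\end{pmatrix}\boldsymbol{\beta}=\begin{pmatrix}6\\6\end{pmatrix}$, and ridge splits the weight evenly between the two identical predictors.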

LASSO (L1 penalty):

$$\hat{\boldsymbol{\beta}}_\text{LASSO} = \arg\min_{\boldsymbol{\beta}} \|X\boldsymbol{\beta}-\mathbf{y}\|_2^2 + \lambda\|\boldsymbol{\beta}\|_1$$

There is no closed form in general; the solution must be found by convex optimisation. For sufficiently large $\lambda$, LASSO sets many $\hat{\beta}_j$ exactly to zero, producing a sparse model. In factor investing: LASSO automatically selects a small number of relevant factors from a large candidate pool.

The difference: ridge penalises squared coefficient size, which hits large coefficients hardest and barely touches small ones; LASSO penalises absolute size, which keeps pressure on small coefficients until they reach zero. The L1 unit ball's corners, aligned with the coordinate axes, make exact zero solutions geometrically likely at the optimum.


07 · Practice Exercises

EXERCISE 13.1

Absolute-value each component, then: L1 = sum; L2 = square-root of sum of squares; L-infinity = maximum.

$\mathbf{v}=\begin{pmatrix}-6\\2\\-3\\0\end{pmatrix}$.

Absolute values: $6, 2, 3, 0$.

$\|\mathbf{v}\|_1 = 6+2+3+0=11$.

$\|\mathbf{v}\|_2 = \sqrt{36+4+9+0}=\sqrt{49}=7$.

$\|\mathbf{v}\|_\infty = \max(6,2,3,0)=6$.

Ordering: $6 = \|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 = 7 \leq \|\mathbf{v}\|_1 = 11$ ✓.

Bound check: $\|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1 = 11$ ✓ and $\|\mathbf{v}\|_2 \leq \sqrt{4}\,\|\mathbf{v}\|_\infty = 2(6)=12$ ✓.

Compute $\|\mathbf{v}\|_1$, $\|\mathbf{v}\|_2$, and $\|\mathbf{v}\|_\infty$ for $\mathbf{v}=\begin{pmatrix}-6\\2\\-3\\0\end{pmatrix}$. Verify the ordering $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1$.

EXERCISE 13.2

Verify axioms N1, N2, N3 for the L1 norm directly. For N3, use $|u_i+v_i|\leq|u_i|+|v_i|$ component-wise.

N1 (non-negativity): $\|\mathbf{v}\|_1=\sum|v_i|\geq0$ since each $|v_i|\geq0$. Equals zero $\iff$ all $|v_i|=0$ $\iff$ all $v_i=0$ $\iff$ $\mathbf{v}=\mathbf{0}$ ✓.

N2 (homogeneity): $\|c\mathbf{v}\|_1=\sum|cv_i|=\sum|c||v_i|=|c|\sum|v_i|=|c|\,\|\mathbf{v}\|_1$ ✓ (since $|c|$ is a common factor in every term).

N3 (triangle inequality): $\|\mathbf{u}+\mathbf{v}\|_1=\sum_i|u_i+v_i|$. For each $i$: $|u_i+v_i|\leq|u_i|+|v_i|$ (standard absolute value inequality). Summing over $i$: $\sum_i|u_i+v_i|\leq\sum_i|u_i|+\sum_i|v_i|=\|\mathbf{u}\|_1+\|\mathbf{v}\|_1$ ✓.

All three axioms hold. $\|\cdot\|_1$ is a norm.

Verify that the L1 norm $\|\mathbf{v}\|_1=\sum_i|v_i|$ satisfies all three norm axioms (N1 non-negativity, N2 homogeneity, N3 triangle inequality) for vectors in $\mathbb{R}^n$.

EXERCISE 13.3

For the induced L1 matrix norm: compute the absolute column sums and take the maximum. For L-infinity: compute the absolute row sums and take the maximum.

$A=\begin{pmatrix}1&-2&3\\4&0&-1\end{pmatrix}$.

$\|A\|_1$ (max absolute column sum):

Column 1: $|1|+|4|=5$. Column 2: $|-2|+|0|=2$. Column 3: $|3|+|-1|=4$.

$\|A\|_1=\max(5,2,4)=5$.

$\|A\|_\infty$ (max absolute row sum):

Row 1: $|1|+|-2|+|3|=6$. Row 2: $|4|+|0|+|-1|=5$.

$\|A\|_\infty=\max(6,5)=6$.

$\|A\|_F$ (Frobenius):

$\|A\|_F=\sqrt{1+4+9+16+0+1}=\sqrt{31}\approx5.57$.

Compute $\|A\|_1$ (max absolute column sum), $\|A\|_\infty$ (max absolute row sum), and $\|A\|_F$ (Frobenius norm) for $A=\begin{pmatrix}1&-2&3\\4&0&-1\end{pmatrix}$.

EXERCISE 13.4

Show $\|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1$ by squaring both sides: you need $\sum v_i^2 \leq (\sum|v_i|)^2$. The right side expands to include all cross terms $|v_i||v_j|\geq0$.

Proof that $\|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1$:

$$\|\mathbf{v}\|_1^2 = \left(\sum_i|v_i|\right)^2 = \sum_i v_i^2 + 2\sum_{i < j}|v_i||v_j| \geq \sum_i v_i^2 = \|\mathbf{v}\|_2^2$$

since $|v_i||v_j|\geq0$. Taking square roots (both sides non-negative): $\|\mathbf{v}\|_1 \geq \|\mathbf{v}\|_2$.

Proof that $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2$:

$\|\mathbf{v}\|_\infty = \max_i|v_i|$. Since $\max_i|v_i|^2 \leq \sum_i v_i^2$, taking square roots gives $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2$.

Tight bounds: $\|\mathbf{v}\|_2 \leq \sqrt{n}\,\|\mathbf{v}\|_\infty$ because $\sum_i v_i^2 \leq n\max_i v_i^2$. Similarly, $\|\mathbf{v}\|_1 \leq \sqrt{n}\,\|\mathbf{v}\|_2$ by Cauchy-Schwarz applied to $\sum_i|v_i|\cdot1$: $\sum_i|v_i|\cdot1 \leq \left(\sum_i v_i^2\right)^{1/2}\left(\sum_i 1^2\right)^{1/2} = \sqrt{n}\,\|\mathbf{v}\|_2$.

Prove the norm ordering $\|\mathbf{v}\|_\infty \leq \|\mathbf{v}\|_2 \leq \|\mathbf{v}\|_1$ for $\mathbf{v}\in\mathbb{R}^n$. Also state (with brief justification) the reverse bounds $\|\mathbf{v}\|_2\leq\sqrt{n}\,\|\mathbf{v}\|_\infty$ and $\|\mathbf{v}\|_1\leq\sqrt{n}\,\|\mathbf{v}\|_2$.

EXERCISE 13.5

Ridge adds $\lambda I$ to $A^TA$. Show that $A^TA+\lambda I$ is always positive definite for $\lambda>0$ using the definition $\mathbf{x}^T(A^TA+\lambda I)\mathbf{x}>0$.

For any $\mathbf{x}\neq\mathbf{0}$ and $\lambda>0$:

$\mathbf{x}^T(A^TA+\lambda I)\mathbf{x} = \mathbf{x}^TA^TA\mathbf{x}+\lambda\mathbf{x}^T\mathbf{x} = \|A\mathbf{x}\|_2^2 + \lambda\|\mathbf{x}\|_2^2$.

$\|A\mathbf{x}\|_2^2\geq0$ always. $\lambda\|\mathbf{x}\|_2^2>0$ since $\lambda>0$ and $\mathbf{x}\neq\mathbf{0}$.

Therefore $\mathbf{x}^T(A^TA+\lambda I)\mathbf{x}>0$ for all $\mathbf{x}\neq\mathbf{0}$: positive definite $\Rightarrow$ invertible.

This holds regardless of whether $A$ has dependent columns: even if $A^TA$ is singular, $A^TA+\lambda I$ is always invertible for any $\lambda>0$. Ridge regression always has a unique solution.

Eigenvalue interpretation: eigenvalues of $A^TA+\lambda I$ are $\sigma_i^2+\lambda$ (where $\sigma_i$ are singular values of $A$). Since $\sigma_i^2\geq0$ and $\lambda>0$, all eigenvalues are positive, confirming positive definiteness.

Prove that the ridge-regression matrix $A^TA+\lambda I$ is always invertible for $\lambda>0$, even when $A^TA$ is singular. Use the quadratic form definition of positive definiteness.

EXERCISE 13.6

Compute $\|\boldsymbol{\beta}\|_1$, $\|\boldsymbol{\beta}\|_2^2$ for each candidate. The LASSO objective is $\text{RSS}+\lambda\|\boldsymbol{\beta}\|_1$ and the ridge objective is $\text{RSS}+\lambda\|\boldsymbol{\beta}\|_2^2$. Compare total objectives for each candidate.

RSS (sum of squared residuals): given two candidates $\boldsymbol{\beta}^{(1)}=(0.8, 0, 0.3, 0)^T$ and $\boldsymbol{\beta}^{(2)}=(0.5, 0.3, 0.2, 0.1)^T$, assume both have the same RSS $= 2.0$ (illustrating the penalty difference).

$\lambda = 0.5$.

Candidate 1: $\|\boldsymbol{\beta}^{(1)}\|_1=0.8+0+0.3+0=1.1$; $\|\boldsymbol{\beta}^{(1)}\|_2^2=0.64+0+0.09+0=0.73$.

LASSO objective: $2.0+0.5(1.1)=2.55$. Ridge objective: $2.0+0.5(0.73)=2.365$.

Candidate 2: $\|\boldsymbol{\beta}^{(2)}\|_1=0.5+0.3+0.2+0.1=1.1$; $\|\boldsymbol{\beta}^{(2)}\|_2^2=0.25+0.09+0.04+0.01=0.39$.

LASSO objective: $2.0+0.5(1.1)=2.55$. Ridge objective: $2.0+0.5(0.39)=2.195$.

For ridge: candidate 2 is better (smaller squared L2 norm despite the same L1 norm), because ridge prefers spreading weight evenly across coefficients. For LASSO: both have equal objectives (same L1 norm), so the penalty alone does not separate them here; in practice LASSO's geometry drives toward sparse solutions like candidate 1.

A four-factor return model has two candidate coefficient vectors: $\boldsymbol{\beta}^{(1)}=(0.8, 0, 0.3, 0)^T$ (sparse) and $\boldsymbol{\beta}^{(2)}=(0.5, 0.3, 0.2, 0.1)^T$ (dense), with equal residual sum of squares (RSS = 2.0). With $\lambda=0.5$, compute both the LASSO objective ($\text{RSS}+\lambda\|\boldsymbol{\beta}\|_1$) and ridge objective ($\text{RSS}+\lambda\|\boldsymbol{\beta}\|_2^2$) for each candidate. Which does each penalty favour, and why?
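The arithmetic in this exercise can be checked with a short script. This is a minimal sketch; the dictionary keys and function names are illustrative, and the shared RSS of 2.0 is the assumption stated in the exercise.

```python
rss = 2.0   # assumed equal for both candidates, as in the exercise
lam = 0.5

candidates = {
    "sparse": [0.8, 0.0, 0.3, 0.0],
    "dense": [0.5, 0.3, 0.2, 0.1],
}

def lasso_objective(beta):
    """RSS + lambda * L1 norm of beta."""
    return rss + lam * sum(abs(b) for b in beta)

def ridge_objective(beta):
    """RSS + lambda * squared L2 norm of beta."""
    return rss + lam * sum(b * b for b in beta)

for name, beta in candidates.items():
    print(name, round(lasso_objective(beta), 3), round(ridge_objective(beta), 3))
```

The two LASSO objectives coincide, while the ridge objective is lower for the dense candidate.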


08 · Chapter Summary

| Concept | Key Formula |
|---|---|
| Norm axioms | Non-negativity, homogeneity, triangle inequality |
| L1 norm | $\lVert\mathbf{v}\rVert_1=\sum\lvert v_i\rvert$ (sum of absolute values) |
| L2 norm | $\lVert\mathbf{v}\rVert_2=\sqrt{\sum v_i^2}$ (Euclidean length) |
| L-infinity norm | $\lVert\mathbf{v}\rVert_\infty=\max_i\lvert v_i\rvert$ (largest absolute component) |
| Norm ordering | $\lVert\mathbf{v}\rVert_\infty\leq\lVert\mathbf{v}\rVert_2\leq\lVert\mathbf{v}\rVert_1\leq\sqrt{n}\,\lVert\mathbf{v}\rVert_2\leq n\,\lVert\mathbf{v}\rVert_\infty$ |
| Unit ball shapes | L1: diamond; L2: circle; L-infinity: square |
| Norm equivalence | All norms on $\mathbb{R}^n$ equivalent up to constants |
| Induced matrix norm | $\lVert A\rVert_p=\max_{\lVert\mathbf{x}\rVert_p=1}\lVert A\mathbf{x}\rVert_p$ |
| Ridge regression | Add $\lambda\lVert\boldsymbol{\beta}\rVert_2^2$; always invertible; keeps all predictors |
| LASSO regression | Add $\lambda\lVert\boldsymbol{\beta}\rVert_1$; sparse solutions; L1 ball corners |
| L0 pseudo-norm | Counts nonzeros; not a true norm (fails homogeneity) |

Next: Chapter 14 — Positive Definite Matrices establishes four equivalent characterisations of positive definiteness and connects them to eigenvalues, Cholesky decomposition, and the requirement that all valid covariance matrices must be positive semi-definite.