Chapter 12

Projections & Least Squares

00 · Symbol Glossary

$\hat{\mathbf{x}}$ x hat — least squares solution

The least squares solution to $A\mathbf{x}=\mathbf{b}$: the value of $\mathbf{x}$ that minimises $\|A\mathbf{x}-\mathbf{b}\|^2$. The hat accent is standard notation for estimated quantities throughout statistics. $\hat{\mathbf{x}}$ is the solution to the normal equations $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$.

$\hat{\mathbf{b}}$ b hat — projection onto column space

The projection of $\mathbf{b}$ onto the column space of $A$: $\hat{\mathbf{b}} = A\hat{\mathbf{x}}$. It is the closest point in $\text{Col}(A)$ to $\mathbf{b}$. The residual vector $\mathbf{e} = \mathbf{b}-\hat{\mathbf{b}}$ is orthogonal to $\text{Col}(A)$.

$\mathbf{e}$ e — residual vector

The residual $\mathbf{e}=\mathbf{b}-\hat{\mathbf{b}}=\mathbf{b}-A\hat{\mathbf{x}}$. It measures how far the best approximation $\hat{\mathbf{b}}$ is from the target $\mathbf{b}$. The fundamental orthogonality condition is $A^T\mathbf{e}=\mathbf{0}$: the residual is orthogonal to every column of $A$.

$P$ P — projection matrix

The matrix that projects any vector onto the column space of $A$ (for $A$ with independent columns): $P = A(A^TA)^{-1}A^T$. It satisfies $P^2=P$ (idempotent) and $P^T=P$ (symmetric). Applying $P$ twice is the same as applying it once: a vector already in the subspace is left unchanged.

$(A^TA)^{-1}A^T$ A plus — pseudoinverse of $A$

The Moore-Penrose pseudoinverse for a matrix with independent columns: $(A^TA)^{-1}A^T$. It satisfies $[(A^TA)^{-1}A^T]A = I$, so it is a left inverse of $A$. In the general case the pseudoinverse is written $A^+$; the formula above applies only when the columns of $A$ are independent.


01 · Projection onto a Subspace

The projection from Chapter 11 generalised: instead of projecting onto a single vector $\mathbf{u}$, project onto an entire subspace spanned by a matrix's columns.

Definition — Projection Onto the Column Space of $A$

Let $A \in \mathbb{R}^{m \times n}$ have linearly independent columns. The orthogonal projection of $\mathbf{b} \in \mathbb{R}^m$ onto $\text{Col}(A)$ is:

$$\hat{\mathbf{b}} = A(A^TA)^{-1}A^T\mathbf{b}$$

The matrix $P = A(A^TA)^{-1}A^T$ is the projection matrix onto $\text{Col}(A)$. It satisfies:

  • $P^2 = P$ (idempotent: projecting twice gives the same result)
  • $P^T = P$ (symmetric)
  • $P\mathbf{b} \in \text{Col}(A)$ for all $\mathbf{b}$
  • $\mathbf{b} - P\mathbf{b} \perp \text{Col}(A)$ for all $\mathbf{b}$
✓ Example — Projection onto a Line in $\mathbb{R}^3$

Project $\mathbf{b}=\begin{pmatrix}1\\1\\1\end{pmatrix}$ onto the line spanned by $\mathbf{a}=\begin{pmatrix}1\\2\\0\end{pmatrix}$.

Here $A = \begin{pmatrix}1\\2\\0\end{pmatrix}$, so $A^TA = \begin{pmatrix}1&2&0\end{pmatrix}\begin{pmatrix}1\\2\\0\end{pmatrix} = 1+4+0=5$.

$(A^TA)^{-1} = \frac{1}{5}$.

$\hat{\mathbf{b}} = A\cdot\frac{1}{5}\cdot A^T\mathbf{b} = \frac{1}{5}\begin{pmatrix}1\\2\\0\end{pmatrix}\cdot(1+2+0) = \frac{3}{5}\begin{pmatrix}1\\2\\0\end{pmatrix} = \begin{pmatrix}3/5\\6/5\\0\end{pmatrix}$.

Residual: $\mathbf{e}=\begin{pmatrix}1-3/5\\1-6/5\\1\end{pmatrix}=\begin{pmatrix}2/5\\-1/5\\1\end{pmatrix}$. Check: $\mathbf{e}\cdot\mathbf{a} = \frac{2}{5}-\frac{2}{5}+0=0$ ✓.
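The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: the helper name `project_onto_line` is hypothetical, and the standard-library `fractions` module is used so that the results are exact rather than rounded.

```python
from fractions import Fraction

def project_onto_line(a, b):
    """Project b onto span{a}: b_hat = (a.b / a.a) a, with exact fractions."""
    a = [Fraction(v) for v in a]
    b = [Fraction(v) for v in b]
    # For a single column, (A^T A)^{-1} A^T b is the scalar (a.b) / (a.a).
    coeff = sum(x * y for x, y in zip(a, b)) / sum(x * x for x in a)
    b_hat = [coeff * x for x in a]
    e = [y - p for y, p in zip(b, b_hat)]   # residual b - b_hat
    return b_hat, e

a = [1, 2, 0]
b = [1, 1, 1]
b_hat, e = project_onto_line(a, b)
print(b_hat)   # components 3/5, 6/5, 0
print(e)       # components 2/5, -1/5, 1
print(sum(x * y for x, y in zip(a, e)))   # 0, so e is orthogonal to a
```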


02 · Derivation of the Normal Equations

When $A\mathbf{x}=\mathbf{b}$ has no exact solution (overdetermined: more equations than unknowns), the best we can do is find $\hat{\mathbf{x}}$ making $\|A\hat{\mathbf{x}}-\mathbf{b}\|^2$ as small as possible.

The geometric insight: the minimum of $\|A\mathbf{x}-\mathbf{b}\|$ over all $\mathbf{x}$ is achieved when $A\mathbf{x}$ is the point in $\text{Col}(A)$ closest to $\mathbf{b}$, i.e., the projection $\hat{\mathbf{b}}=A\hat{\mathbf{x}}$. The error $\mathbf{e}=\mathbf{b}-A\hat{\mathbf{x}}$ must be orthogonal to every column of $A$.

Definition — Normal Equations

The least squares solution $\hat{\mathbf{x}}$ satisfies the normal equations:

$$A^TA\,\hat{\mathbf{x}} = A^T\mathbf{b}$$

Derivation: Orthogonality condition $\mathbf{e}\perp\text{Col}(A)$ means $A^T\mathbf{e}=\mathbf{0}$, i.e. $A^T(\mathbf{b}-A\hat{\mathbf{x}})=\mathbf{0}$, which rearranges to $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$.

Unique solution: When $A$ has linearly independent columns, $A^TA$ is invertible and $\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$.

Step-by-step — Deriving the normal equations from the orthogonality condition
1

State the residual: $\mathbf{e} = \mathbf{b} - A\hat{\mathbf{x}}$. This is the difference between the target $\mathbf{b}$ and our approximation $A\hat{\mathbf{x}}$.

2

Apply the orthogonality requirement: $\hat{\mathbf{b}}=A\hat{\mathbf{x}}$ is the closest point in $\text{Col}(A)$ to $\mathbf{b}$ $\iff$ $\mathbf{e}\perp\text{Col}(A)$ $\iff$ $\mathbf{a}_j\cdot\mathbf{e}=0$ for every column $\mathbf{a}_j$ of $A$.

Written in matrix form: $A^T\mathbf{e}=\mathbf{0}$.

3

Substitute the residual: $A^T(\mathbf{b}-A\hat{\mathbf{x}})=\mathbf{0}$.

Expand: $A^T\mathbf{b}-A^TA\hat{\mathbf{x}}=\mathbf{0}$.

Rearrange: $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$. These are the normal equations.

4

Solve (when $A^TA$ is invertible):

$$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b}$$

$A^TA$ is a square $n\times n$ matrix; it is invertible $\iff$ the columns of $A$ are linearly independent.
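To see the derivation at work, the sketch below builds the normal equations for a small system and checks that their solution beats nearby candidates. The matrix `A`, the vector `b` and the helper `squared_error` are illustrative assumptions, not data from this chapter. The $2\times2$ system is solved by Cramer's rule purely for brevity.

```python
from fractions import Fraction

# Hypothetical overdetermined system: 3 equations, 2 unknowns.
A = [[1, 0], [1, 1], [1, 2]]
b = [6, 0, 0]

def squared_error(x):
    """Return ||A x - b||^2 for a candidate coefficient vector x."""
    return sum((sum(a * v for a, v in zip(row, x)) - bi) ** 2
               for row, bi in zip(A, b))

# Build the 2x2 normal equations A^T A x = A^T b.
cols = list(zip(*A))
G = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
rhs = [sum(p * q for p, q in zip(ci, b)) for ci in cols]

# Cramer's rule; valid because det != 0 when the columns are independent.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x_hat = [Fraction(rhs[0] * G[1][1] - G[0][1] * rhs[1], det),
         Fraction(G[0][0] * rhs[1] - G[1][0] * rhs[0], det)]

print(x_hat)                  # components 5 and -3
print(squared_error(x_hat))   # 6
# Nearby candidates all give a strictly larger squared error.
for step in ([1, 0], [0, 1], [-1, 1]):
    candidate = [xi + Fraction(si, 10) for xi, si in zip(x_hat, step)]
    print(squared_error(candidate) > squared_error(x_hat))   # True each time
```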

❌ What Breaks — $A^TA$ Is Singular When Columns Are Dependent

If the columns of $A$ are linearly dependent, then $A^TA$ is singular. The normal equations $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$ are still consistent, because $A^T\mathbf{b}$ always lies in the column space of $A^TA$, but they now have infinitely many solutions. The projection $\hat{\mathbf{b}}$ still exists and is unique; what is lost is the uniqueness of the coefficient vector $\hat{\mathbf{x}}$ that represents it. This happens in regression when predictors are perfectly collinear, that is, when one is a linear combination of the others. The fix: either remove dependent predictors or use regularisation (ridge/LASSO, Chapter 13).
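A small numerical illustration of this failure mode, using a made-up matrix whose second column is twice the first. The helper names `gram` and `apply` are hypothetical.

```python
def gram(A):
    """Return A^T A for a matrix stored as a list of rows."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def apply(A, x):
    """Return the matrix-vector product A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

# Hypothetical design matrix: column 2 is exactly 2 x column 1.
A = [[1, 2], [2, 4], [3, 6]]
G = gram(A)
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
print(G, det)   # [[14, 28], [28, 56]] 0

# Two different coefficient vectors produce the same vector A x,
# so a point of Col(A) no longer has unique coordinates.
x1 = [1, 0]
x2 = [-1, 1]
print(apply(A, x1), apply(A, x2))   # [1, 2, 3] [1, 2, 3]
```

The zero determinant confirms that $A^TA$ cannot be inverted, while the two coefficient vectors show why: they describe the same point of $\text{Col}(A)$.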


03 · Why $A^TA$ Is Invertible Exactly When Columns Are Independent

Definition — $A^TA$ and Null Spaces

$A^TA$ is invertible $\iff$ $A$ has linearly independent columns $\iff$ $\ker(A)=\{\mathbf{0}\}$.

Proof: Suppose $A^TA\mathbf{x}=\mathbf{0}$. Multiplying on the left by $\mathbf{x}^T$ gives $\mathbf{x}^TA^TA\mathbf{x}=0$, which equals $\|A\mathbf{x}\|^2=0$, so $A\mathbf{x}=\mathbf{0}$. If the columns of $A$ are independent, the only solution is $\mathbf{x}=\mathbf{0}$, so $\ker(A^TA)=\{\mathbf{0}\}$ and $A^TA$ is invertible.

Conversely, if some $\mathbf{x}\neq\mathbf{0}$ satisfies $A\mathbf{x}=\mathbf{0}$, then $A^TA\mathbf{x}=A^T\mathbf{0}=\mathbf{0}$, so $A^TA$ is singular.

Step-by-step — Least squares fit for data $\{(1,1),(2,3),(3,2),(4,4)\}$ with model $y=\beta_0+\beta_1 x$
1

Write the design matrix: each row is $(1, x_i)$ for the model $y=\beta_0+\beta_1 x$.

$$A = \begin{pmatrix}1&1\\1&2\\1&3\\1&4\end{pmatrix}, \quad \mathbf{b} = \begin{pmatrix}1\\3\\2\\4\end{pmatrix}$$

4 observations, 2 parameters $(\beta_0, \beta_1)$.

2

Compute $A^TA$:

$$A^TA = \begin{pmatrix}1&1&1&1\\1&2&3&4\end{pmatrix}\begin{pmatrix}1&1\\1&2\\1&3\\1&4\end{pmatrix} = \begin{pmatrix}4&10\\10&30\end{pmatrix}$$

Entry $(1,1)$: $1+1+1+1=4$. Entries $(1,2)=(2,1)$: $1+2+3+4=10$. Entry $(2,2)$: $1+4+9+16=30$.

3

Compute $A^T\mathbf{b}$:

$$A^T\mathbf{b} = \begin{pmatrix}1&1&1&1\\1&2&3&4\end{pmatrix}\begin{pmatrix}1\\3\\2\\4\end{pmatrix} = \begin{pmatrix}10\\29\end{pmatrix}$$

Row 1: $1+3+2+4=10$. Row 2: $1(1)+2(3)+3(2)+4(4)=1+6+6+16=29$.

4

Solve $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$:

$\det(A^TA) = 4(30)-10(10) = 120-100 = 20$.

$(A^TA)^{-1} = \frac{1}{20}\begin{pmatrix}30&-10\\-10&4\end{pmatrix}$.

$\hat{\mathbf{x}} = \frac{1}{20}\begin{pmatrix}30&-10\\-10&4\end{pmatrix}\begin{pmatrix}10\\29\end{pmatrix} = \frac{1}{20}\begin{pmatrix}300-290\\-100+116\end{pmatrix} = \frac{1}{20}\begin{pmatrix}10\\16\end{pmatrix} = \begin{pmatrix}0.5\\0.8\end{pmatrix}$.

5

Interpret: $\hat{y} = 0.5 + 0.8x$. The fitted line has intercept $\hat{\beta}_0=0.5$ and slope $\hat{\beta}_1=0.8$. Predicted values: $\hat{y}_1=1.3$, $\hat{y}_2=2.1$, $\hat{y}_3=2.9$, $\hat{y}_4=3.7$. Residuals $e_i=y_i-\hat{y}_i$: $e_1=-0.3$, $e_2=0.9$, $e_3=-0.9$, $e_4=0.3$. Sum of residuals: $0$ ✓ (always true for OLS with intercept).
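The five steps above can be reproduced in a short script. The function `fit_line` is a hypothetical helper that solves the $2\times2$ normal equations in closed form; it is a sketch for straight-line fits only, using exact fractions so the output matches the hand computation.

```python
from fractions import Fraction

def fit_line(xs, ys):
    """Solve the 2x2 normal equations A^T A x = A^T b for y = b0 + b1 x."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)                 # entries of A^T A
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))     # entries of A^T b
    det = n * sxx - sx * sx                                   # det(A^T A)
    if det == 0:
        raise ValueError("A^T A is singular: columns of A are dependent")
    b0 = Fraction(sxx * sy - sx * sxy, det)
    b1 = Fraction(n * sxy - sx * sy, det)
    return b0, b1

xs, ys = [1, 2, 3, 4], [1, 3, 2, 4]
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(b0, b1)            # 1/2 4/5
print(residuals)         # components -3/10, 9/10, -9/10, 3/10
print(sum(residuals))    # 0
```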


04 · The Projection Matrix

The matrix $P=A(A^TA)^{-1}A^T$ can be applied to any $\mathbf{b}$ to get its projection. Its properties follow algebraically from the formula.

✓ Example — Idempotency of the Projection Matrix

$P^2 = [A(A^TA)^{-1}A^T][A(A^TA)^{-1}A^T] = A(A^TA)^{-1}[A^TA](A^TA)^{-1}A^T = A(A^TA)^{-1}A^T = P$ ✓.

The bracketed middle simplifies to $[A^TA](A^TA)^{-1} = I$, eliminating the inner two factors. Projecting twice is the same as projecting once: once you are in the subspace, you stay there.

Complementary Projection

The matrix $I-P$ projects onto the orthogonal complement of $\text{Col}(A)$, the subspace of vectors orthogonal to all columns of $A$. Any vector $\mathbf{b}$ decomposes as $\mathbf{b} = P\mathbf{b} + (I-P)\mathbf{b}$ with $P\mathbf{b} \in \text{Col}(A)$ and $(I-P)\mathbf{b} \in \text{Col}(A)^\perp$. The residual $\mathbf{e}=\mathbf{b}-A\hat{\mathbf{x}} = (I-P)\mathbf{b}$ is the projection of $\mathbf{b}$ by $I-P$.
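The properties of $P$ and $I-P$ can be verified numerically. The sketch below does so for the $4\times2$ design matrix of the worked fit in Section 03. The helpers `matmul`, `transpose` and `inv2` are hypothetical and written for small matrices only. Exact fractions make the equality checks meaningful, which floating point would not.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inv2(M):
    """Invert a 2x2 matrix exactly; raises ZeroDivisionError if singular."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [1, 2], [1, 3], [1, 4]]      # design matrix from the worked fit
b = [[1], [3], [2], [4]]                  # target as a column vector
At = transpose(A)
P = matmul(matmul(A, inv2(matmul(At, A))), At)
n = len(A)
I_minus_P = [[int(i == j) - P[i][j] for j in range(n)] for i in range(n)]

b_hat = matmul(P, b)          # projection onto Col(A)
e = matmul(I_minus_P, b)      # projection onto the orthogonal complement

print(matmul(P, P) == P)                 # True: idempotent
print(transpose(P) == P)                 # True: symmetric
print(matmul(At, e) == [[0], [0]])       # True: residual is orthogonal to Col(A)
print([[u[0] + v[0]] for u, v in zip(b_hat, e)] == b)   # True: b = Pb + (I - P)b
```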


05 · Quant Application — OLS Regression as Projection

Ordinary Least Squares regression is the projection of the response vector $\mathbf{y}$ onto the column space of the design matrix $X$.

Given $T$ observations, $k$ predictors, and the model $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$:

$$\hat{\boldsymbol{\beta}} = (X^TX)^{-1}X^T\mathbf{y}$$

is the OLS estimator. The fitted values $\hat{\mathbf{y}} = X\hat{\boldsymbol{\beta}} = X(X^TX)^{-1}X^T\mathbf{y} = P\mathbf{y}$ are the projection of $\mathbf{y}$ onto $\text{Col}(X)$.

Geometric interpretation: $\hat{\mathbf{y}}$ is the closest point in $\text{Col}(X)$ to $\mathbf{y}$. Minimising $\|\mathbf{y}-X\boldsymbol{\beta}\|^2$ is equivalent to finding the foot of the perpendicular from $\mathbf{y}$ to the plane spanned by the regressors.

Why $X^TX$ must be invertible: $X$ must have no perfectly collinear columns. In finance: if two factor exposures are identical across all assets, $\text{rank}(X)<k$ and $X^TX$ is singular, so the factor contribution is not identified.
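For more than two regressors, the normal equations can be solved by elimination rather than by an explicit inverse. The function `ols` below is a hypothetical sketch, not a production estimator. It uses exact rational arithmetic and Gauss-Jordan elimination, and it reports the collinear case described above as an error. The duplicated-exposure matrix in the second call is made up for illustration.

```python
from fractions import Fraction

def ols(X, y):
    """Solve (X^T X) beta = X^T y by Gauss-Jordan elimination.

    X is a list of T rows with k entries each; y is a list of T responses.
    Raises ValueError when X^T X is singular (perfectly collinear columns).
    """
    cols = list(zip(*X))
    k = len(cols)
    # Augmented matrix [X^T X | X^T y] with exact rational entries.
    M = [[Fraction(sum(a * b for a, b in zip(ci, cj))) for cj in cols]
         + [Fraction(sum(a * b for a, b in zip(ci, y)))] for ci in cols]
    for p in range(k):
        pivot_row = next((r for r in range(p, k) if M[r][p] != 0), None)
        if pivot_row is None:
            raise ValueError("X^T X is singular: regressors are collinear")
        M[p], M[pivot_row] = M[pivot_row], M[p]
        pivot = M[p][p]
        M[p] = [v / pivot for v in M[p]]              # scale pivot row to 1
        for r in range(k):
            if r != p:
                factor = M[r][p]
                M[r] = [vr - factor * vp for vr, vp in zip(M[r], M[p])]
    return [M[i][k] for i in range(k)]

# Intercept plus one regressor: same data as the worked fit in Section 03.
print(ols([[1, 1], [1, 2], [1, 3], [1, 4]], [1, 3, 2, 4]))   # components 1/2, 4/5

# Hypothetical case: two identical factor exposures (columns 2 and 3).
try:
    ols([[1, 2, 2], [1, 3, 3], [1, 5, 5]], [1, 2, 3])
except ValueError as err:
    print(err)   # X^T X is singular: regressors are collinear
```

In floating-point practice one would normally reach for a library least squares routine instead; the exact version here is only meant to mirror the algebra of this section.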


06 · Practice Exercises

EXERCISE 12.1

Project $\mathbf{b}=\begin{pmatrix}1\\2\\3\end{pmatrix}$ onto the column space of $A=\begin{pmatrix}2\\1\\-1\end{pmatrix}$. Compute $\hat{\mathbf{b}}=A(A^TA)^{-1}A^T\mathbf{b}$ and the residual $\mathbf{e}=\mathbf{b}-\hat{\mathbf{b}}$. Verify $A^T\mathbf{e}=0$.

Hint: $A^TA$ is a $1\times1$ scalar here since $A$ has one column. The formula simplifies to $\hat{\mathbf{b}}=\frac{A^T\mathbf{b}}{A^TA}\,A$, which is the projection from Chapter 11.

Solution: $A=\begin{pmatrix}2\\1\\-1\end{pmatrix}$, $\mathbf{b}=\begin{pmatrix}1\\2\\3\end{pmatrix}$.

$A^TA = 4+1+1=6$.

$A^T\mathbf{b} = 2(1)+1(2)+(-1)(3)=2+2-3=1$.

$\hat{\mathbf{x}} = (A^TA)^{-1}A^T\mathbf{b} = \frac{1}{6}$.

$\hat{\mathbf{b}} = A\hat{\mathbf{x}} = \frac{1}{6}\begin{pmatrix}2\\1\\-1\end{pmatrix} = \begin{pmatrix}1/3\\1/6\\-1/6\end{pmatrix}$.

$\mathbf{e}=\mathbf{b}-\hat{\mathbf{b}}=\begin{pmatrix}2/3\\11/6\\19/6\end{pmatrix}$.

Verify: $A^T\mathbf{e}=2\left(\frac{2}{3}\right)+1\left(\frac{11}{6}\right)+(-1)\left(\frac{19}{6}\right)=\frac{4}{3}+\frac{11}{6}-\frac{19}{6}=\frac{8}{6}+\frac{11}{6}-\frac{19}{6}=0$ ✓.

EXERCISE 12.2

Fit the line $y=\beta_0+\beta_1 x$ to the data points $(1,2),(2,2),(3,4)$ using least squares. Set up the normal equations, solve them, write the fitted line, and compute residuals.

Hint: Form the design matrix with one column of 1s (intercept) and one column of $x$-values. Compute $A^TA$, $A^T\mathbf{b}$, and solve the $2\times2$ normal equations.

Solution: Data: $(x,y)=(1,2),(2,2),(3,4)$.

$A=\begin{pmatrix}1&1\\1&2\\1&3\end{pmatrix}$, $\mathbf{b}=\begin{pmatrix}2\\2\\4\end{pmatrix}$.

$A^TA=\begin{pmatrix}3&6\\6&14\end{pmatrix}$. $A^T\mathbf{b}=\begin{pmatrix}8\\18\end{pmatrix}$.

$\det(A^TA)=42-36=6$. $(A^TA)^{-1}=\frac{1}{6}\begin{pmatrix}14&-6\\-6&3\end{pmatrix}$.

$\hat{\mathbf{x}}=\frac{1}{6}\begin{pmatrix}14&-6\\-6&3\end{pmatrix}\begin{pmatrix}8\\18\end{pmatrix}=\frac{1}{6}\begin{pmatrix}112-108\\-48+54\end{pmatrix}=\frac{1}{6}\begin{pmatrix}4\\6\end{pmatrix}=\begin{pmatrix}2/3\\1\end{pmatrix}$.

Fitted line: $\hat{y}=\frac{2}{3}+x$.

Predictions: $\hat{y}_1=5/3$, $\hat{y}_2=8/3$, $\hat{y}_3=11/3$.

Residuals: $e_1=2-5/3=1/3$, $e_2=2-8/3=-2/3$, $e_3=4-11/3=1/3$. Sum: $1/3-2/3+1/3=0$ ✓.

EXERCISE 12.3

For $A=\begin{pmatrix}1\\1\\1\end{pmatrix}$, compute the projection matrix $P=A(A^TA)^{-1}A^T$ explicitly. Verify $P^2=P$ and $P^T=P$. Describe geometrically what $P$ does to a vector $\mathbf{b}\in\mathbb{R}^3$.

Hint: Compute $P=A(A^TA)^{-1}A^T$ explicitly. Check $P^2=P$ by matrix multiplication and $P^T=P$ by inspection.

Solution: $A=\begin{pmatrix}1\\1\\1\end{pmatrix}$.

$A^TA=3$. $(A^TA)^{-1}=\frac{1}{3}$.

$P=\frac{1}{3}\begin{pmatrix}1\\1\\1\end{pmatrix}\begin{pmatrix}1&1&1\end{pmatrix}=\frac{1}{3}\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}$.

Symmetry: $P^T=P$ because entry $(i,j)$ equals entry $(j,i)$ for every pair; all entries are $\frac{1}{3}$ ✓.

Idempotency: $P^2=\frac{1}{9}\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}^2$. Each entry of $\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}^2$ is $1+1+1=3$. So $P^2=\frac{1}{9}\cdot3\cdot\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}=\frac{1}{3}\begin{pmatrix}1&1&1\\1&1&1\\1&1&1\end{pmatrix}=P$ ✓.

Geometric meaning: $P$ projects onto the line spanned by $(1,1,1)^T$, the "constant" direction in $\mathbb{R}^3$. $P\mathbf{b}$ is the vector $\begin{pmatrix}\bar{b}\\\bar{b}\\\bar{b}\end{pmatrix}$ where $\bar{b}=\frac{b_1+b_2+b_3}{3}$ is the mean.
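As a quick numerical companion to this exercise, the sketch below applies the all-ones projection to a made-up vector. The helper `apply` and the vector `b` are illustrative assumptions.

```python
from fractions import Fraction

def apply(M, v):
    """Return the matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

n = 3
P = [[Fraction(1, n)] * n for _ in range(n)]   # P = (1/3) * all-ones matrix

b = [1, 2, 6]                 # hypothetical vector with mean 3
Pb = apply(P, b)
print(Pb)                     # every component equals the mean, 3
print(apply(P, Pb) == Pb)     # True: projecting a second time changes nothing
print(sum(x - m for x, m in zip(b, Pb)))   # 0: the residual b - Pb sums to zero
```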

EXERCISE 12.4

Using the least squares fit from Exercise 12.2, compute the coefficient of determination $R^2 = 1 - \|\mathbf{e}\|^2/\|\mathbf{b}-\bar{b}\mathbf{1}\|^2$. Interpret $R^2$ geometrically in terms of projections.

Hint: $R^2=1-\frac{\|\mathbf{e}\|^2}{\|\mathbf{b}-\bar{b}\mathbf{1}\|^2}$ where $\bar{b}$ is the mean of $\mathbf{b}$. Compute the numerator $\|\mathbf{e}\|^2=\sum e_i^2$ and the denominator $\sum(b_i-\bar{b})^2$.

Solution: From Exercise 12.2: residuals $\mathbf{e}=(1/3,-2/3,1/3)^T$, $\mathbf{b}=(2,2,4)^T$.

$\|\mathbf{e}\|^2 = \frac{1}{9}+\frac{4}{9}+\frac{1}{9}=\frac{6}{9}=\frac{2}{3}$.

$\bar{b}=\frac{2+2+4}{3}=\frac{8}{3}$.

$\|\mathbf{b}-\bar{b}\mathbf{1}\|^2 = (2-8/3)^2+(2-8/3)^2+(4-8/3)^2 = \frac{4}{9}+\frac{4}{9}+\frac{16}{9}=\frac{24}{9}=\frac{8}{3}$.

$R^2=1-\frac{2/3}{8/3}=1-\frac{1}{4}=0.75$.

Interpretation: the line $\hat{y}=\frac{2}{3}+x$ explains $75\%$ of the variance in $y$. The remaining $25\%$ is in the residuals, the component of $\mathbf{b}$ orthogonal to $\text{Col}(A)$.
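The $R^2$ computation can be written as a small function. `r_squared` is a hypothetical helper that takes observed and fitted values and follows the formula in this exercise, assuming the observations are not all equal (otherwise the denominator is zero).

```python
from fractions import Fraction

def r_squared(y, y_hat):
    """R^2 = 1 - ||e||^2 / ||y - mean(y) 1||^2, computed with exact fractions."""
    y = [Fraction(v) for v in y]
    y_hat = [Fraction(v) for v in y_hat]
    mean = sum(y) / len(y)
    sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))   # ||e||^2
    sst = sum((a - mean) ** 2 for a in y)               # ||y - mean 1||^2
    return 1 - sse / sst

xs, ys = [1, 2, 3], [2, 2, 4]
y_hat = [Fraction(2, 3) + x for x in xs]   # fitted line from Exercise 12.2
print(r_squared(ys, y_hat))                # 3/4
```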

EXERCISE 12.5

Prove algebraically that OLS residuals sum to zero ($\sum_{i=1}^n e_i=0$) whenever the design matrix includes a column of ones (intercept). Use the normal equations. State what this means about the residual vector's relationship to the column space of $A$.

Hint: Show that the residuals from an OLS regression always satisfy $\sum e_i = 0$ (when an intercept is included) using the normal equations. The column of ones in $A$ ensures $\mathbf{1}^T\mathbf{e}=0$.

Solution: The design matrix $A$ includes a column of all ones: $\mathbf{a}_1=\mathbf{1}=(1,1,\ldots,1)^T$.

The normal equations require $A^T\mathbf{e}=\mathbf{0}$, which means in particular that $\mathbf{a}_1^T\mathbf{e}=0$.

$\mathbf{a}_1^T\mathbf{e} = \sum_{i=1}^n 1\cdot e_i = \sum_{i=1}^n e_i = 0$.

Consequence: $\sum e_i=0$ is not an assumption; it is a theorem that follows from including an intercept in the model. Geometrically: the residual $\mathbf{e}$ is orthogonal to the intercept column $\mathbf{1}$, which means the residuals are mean-zero.

For each additional column $\mathbf{a}_j$: $\mathbf{a}_j^T\mathbf{e}=0$ means the residuals are uncorrelated with each regressor $X_j$, the fundamental OLS property.

EXERCISE 12.6

A factor model regresses each asset's returns $\mathbf{r}$ on a single market factor $F$ (with intercept). Factor returns are $F=(0.01, 0.02, -0.01, 0.03)^T$ over four periods; asset A returns are $\mathbf{r}_A=(0.015, 0.025, -0.005, 0.04)^T$. Set up the normal equations and describe what the estimated slope (market beta) and $R^2$ measure for this asset.

Hint: Fit separate OLS regressions of each asset's returns on the factor return. Without an intercept the OLS beta is $(F^TF)^{-1}F^T\mathbf{r}$, where $F$ is the factor column and $\mathbf{r}$ is the asset return vector; with an intercept, use the design matrix $A=\begin{pmatrix}\mathbf{1}&F\end{pmatrix}$ and read the beta off the second component of $(A^TA)^{-1}A^T\mathbf{r}$. Then use $R^2$ to measure how much the factor explains.

Solution: Market factor returns $F=(0.01,0.02,-0.01,0.03)^T$, asset returns $\mathbf{r}_A=(0.015,0.025,-0.005,0.04)^T$.

Design matrix with intercept: $A=\begin{pmatrix}1&0.01\\1&0.02\\1&-0.01\\1&0.03\end{pmatrix}$.

$A^TA=\begin{pmatrix}4&0.05\\0.05&0.0015\end{pmatrix}$ (using $0.01+0.02-0.01+0.03=0.05$ and $0.0001+0.0004+0.0001+0.0009=0.0015$).

$A^T\mathbf{r}_A=\begin{pmatrix}0.075\\0.0019\end{pmatrix}$. Row 1: sum of returns $=0.015+0.025-0.005+0.04=0.075$. Row 2: $0.01(0.015)+0.02(0.025)+(-0.01)(-0.005)+0.03(0.04)=0.00015+0.0005+0.00005+0.0012=0.0019$.

Solving the normal equations gives the slope $\hat{\beta}=\frac{\text{Cov}(F,r_A)}{\text{Var}(F)}=\frac{4(0.0019)-(0.05)(0.075)}{4(0.0015)-(0.05)^2}=\frac{0.00385}{0.0035}=1.1$ and the intercept $\frac{0.075-1.1(0.05)}{4}=0.005$. This means that for each 1% market move, asset A moves about $1.1\%$: it is more sensitive to the market than the market itself (beta $>1$).

The $R^2$ measures how much return variance the single factor explains; the residual is idiosyncratic (firm-specific) risk not captured by market exposure.
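The normal equations for this exercise can also be solved exactly in code. The sketch below reads the returns as decimal strings so that `Fraction` represents them without rounding. The variable names (`alpha`, `beta`, `r2`) are illustrative, and the closed-form $2\times2$ solve is specific to a single factor plus intercept.

```python
from fractions import Fraction

F = [Fraction(s) for s in ("0.01", "0.02", "-0.01", "0.03")]       # factor returns
r_A = [Fraction(s) for s in ("0.015", "0.025", "-0.005", "0.04")]  # asset A returns

n = len(F)
sF, sFF = sum(F), sum(f * f for f in F)                  # entries of A^T A
sr, sFr = sum(r_A), sum(f * r for f, r in zip(F, r_A))   # entries of A^T r_A

det = n * sFF - sF * sF                   # det(A^T A)
alpha = (sFF * sr - sF * sFr) / det       # intercept
beta = (n * sFr - sF * sr) / det          # slope: the market beta

resid = [r - (alpha + beta * f) for f, r in zip(F, r_A)]
mean_r = sr / n
r2 = 1 - sum(e * e for e in resid) / sum((r - mean_r) ** 2 for r in r_A)

print(alpha, beta, r2)    # 1/200 11/10 847/855
print(sum(resid))         # 0: residuals sum to zero because of the intercept
```

For these four periods the slope is exactly $11/10$ and $R^2 = 847/855 \approx 0.99$, so almost all of the variation in asset A's returns lies along the market factor.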


07 · Chapter Summary

| Concept | Formula / Rule |
|---|---|
| Projection onto $\text{Col}(A)$ | $\hat{\mathbf{b}}=A(A^TA)^{-1}A^T\mathbf{b}$ |
| Projection matrix | $P=A(A^TA)^{-1}A^T$; $P^2=P$; $P^T=P$ |
| Orthogonality of residual | $A^T\mathbf{e}=\mathbf{0}$; $\mathbf{e}\perp\text{Col}(A)$ |
| Normal equations | $A^TA\hat{\mathbf{x}}=A^T\mathbf{b}$ |
| Least squares solution | $\hat{\mathbf{x}}=(A^TA)^{-1}A^T\mathbf{b}$ when $A$ has independent columns |
| $A^TA$ invertible iff | Columns of $A$ are linearly independent |
| OLS regression | $\hat{\boldsymbol{\beta}}=(X^TX)^{-1}X^T\mathbf{y}$; $\hat{\mathbf{y}}=P\mathbf{y}$ |
| Residuals mean-zero | $\sum e_i=0$ whenever $A$ has an intercept column |
| $R^2$ geometric | Fraction of $\lVert\mathbf{b}-\bar{b}\mathbf{1}\rVert^2$ explained by the projection |

Next: Chapter 13 — Norms & Distance Metrics introduces L1, L2, and L-infinity norms, shows how different norm choices change what "small residual" means, and connects to LASSO (L1) and ridge regression (L2) in quantitative finance.