Projections and least-squares bridge
Projecting $\mathbf{b}$ onto a subspace $W$ picks the vector $\mathbf{p}\in W$ that minimizes $\|\mathbf{b}-\mathbf{p}\|$. The error $\mathbf{e}=\mathbf{b}-\mathbf{p}$ is orthogonal to every vector in $W$, which is the geometric heart of least squares.
When $W=\mathrm{Col}(A)$, the normal equations $A^T A\mathbf{x}=A^T\mathbf{b}$ encode that orthogonality in coordinates. An orthogonal projection matrix $P$ onto $W$ satisfies $P^2=P$ and $P^T=P$, and sends $\mathbf{b}$ to its closest point in $W$.
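The step from orthogonality to the normal equations can be written out in one line. This sketch adds an assumption not stated above: the columns of $A$ are linearly independent, so $A^T A$ is invertible. Requiring the error $\mathbf{b}-A\mathbf{x}$ to be orthogonal to every column of $A$ gives

$$
A^T(\mathbf{b}-A\mathbf{x})=\mathbf{0}
\;\Longrightarrow\;
A^T A\mathbf{x}=A^T\mathbf{b}
\;\Longrightarrow\;
\mathbf{x}=(A^T A)^{-1}A^T\mathbf{b},
\qquad
\mathbf{p}=A\mathbf{x}=\underbrace{A(A^T A)^{-1}A^T}_{P}\,\mathbf{b}.
$$

Multiplying out confirms the projection property: $P^2=A(A^T A)^{-1}(A^T A)(A^T A)^{-1}A^T=P$.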
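Decompose $\mathbf{b}=\mathbf{p}+\mathbf{e}$ with $\mathbf{p}\in W$ and $\mathbf{e}\perp W$. Because $\mathbf{p}\perp\mathbf{e}$, the Pythagorean theorem gives $\|\mathbf{b}\|^2=\|\mathbf{p}\|^2+\|\mathbf{e}\|^2$. The same argument applied to any other $\mathbf{w}\in W$ gives $\|\mathbf{b}-\mathbf{w}\|^2=\|\mathbf{e}\|^2+\|\mathbf{p}-\mathbf{w}\|^2\ge\|\mathbf{e}\|^2$, so shrinking the residual is the same as finding the best approximation inside $W$, and that best approximation is $\mathbf{p}$. In regression language, you keep only the part of $\mathbf{b}$ explainable as a combination of the columns of $A$.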

Example sketch: if $W$ is the line through the origin spanned by $\mathbf{u}\neq\mathbf{0}$, then $\mathbf{p}=\frac{\mathbf{u}\cdot\mathbf{b}}{\mathbf{u}\cdot\mathbf{u}}\,\mathbf{u}$. The residual $\mathbf{e}=\mathbf{b}-\mathbf{p}$ is perpendicular to $\mathbf{u}$, so $\mathbf{u}\cdot\mathbf{e}=0$.
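
The line formula is easy to check numerically. Below is a minimal sketch in plain Python; the helper names `dot` and `project_onto_line` and the sample vectors are illustrative choices, not part of the text above.

```python
def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def project_onto_line(b, u):
    """Orthogonal projection of b onto the line spanned by a nonzero u."""
    uu = dot(u, u)
    if uu == 0:
        raise ValueError("u must be nonzero")
    scale = dot(u, b) / uu          # the coefficient (u.b)/(u.u)
    return [scale * ui for ui in u]


# Illustrative vectors (assumed for this example only).
u = [1.0, 2.0, 2.0]
b = [3.0, 0.0, 3.0]

p = project_onto_line(b, u)
e = [bi - pi for bi, pi in zip(b, p)]   # residual e = b - p

print("p =", p)
print("u . e =", dot(u, e))
print("|b|^2 =", dot(b, b), "and |p|^2 + |e|^2 =", dot(p, p) + dot(e, e))
```

With these particular vectors the coefficient is exactly 1, so $\mathbf{p}=\mathbf{u}$, and both the orthogonality $\mathbf{u}\cdot\mathbf{e}=0$ and the Pythagorean identity can be read off the printed values.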

Numerical note: forming $A^T A$ explicitly squares the condition number, since $\kappa_2(A^T A)=\kappa_2(A)^2$ when $A$ has independent columns. A QR factorization $A=QR$ avoids forming $A^T A$ and stays accurate on ill-conditioned problems that the naive normal equations handle poorly. The dot-product language from this chapter is what makes the normal equations feel inevitable rather than memorized.
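
To make the QR route concrete, here is a small sketch that solves a least-squares problem through $R\mathbf{x}=Q^T\mathbf{b}$ without ever forming $A^T A$. It uses modified Gram-Schmidt because it reads directly in dot-product language; that choice is an assumption made for readability, and numerical libraries typically build QR from Householder reflections instead. The function names and the test matrix are hypothetical.

```python
import math


def dot(x, y):
    """Dot product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))


def qr_mgs(columns):
    """Thin QR by modified Gram-Schmidt.

    `columns` lists the columns of A, assumed linearly independent.
    Returns (q_cols, R): orthonormal columns of Q and upper-triangular R.
    """
    n = len(columns)
    v = [list(col) for col in columns]      # working copies of the columns
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = math.sqrt(dot(v[j], v[j]))
        if R[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        q = [x / R[j][j] for x in v[j]]
        q_cols.append(q)
        # Subtract the q-component from every remaining column right away.
        for k in range(j + 1, n):
            R[j][k] = dot(q, v[k])
            v[k] = [x - R[j][k] * qi for x, qi in zip(v[k], q)]
    return q_cols, R


def lstsq_qr(columns, b):
    """Minimize ||Ax - b|| by solving R x = Q^T b; also return p = Q Q^T b."""
    q_cols, R = qr_mgs(columns)
    n = len(columns)
    rhs = [dot(q, b) for q in q_cols]       # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):            # back substitution
        s = rhs[i] - sum(R[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / R[i][i]
    p = [sum(q_cols[j][i] * rhs[j] for j in range(n)) for i in range(len(b))]
    return x, p


# Hypothetical 3x2 example: columns (1,1,1) and (0,1,2), target b = (6,0,0).
cols = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
b = [6.0, 0.0, 0.0]
x, p = lstsq_qr(cols, b)
e = [bi - pi for bi, pi in zip(b, p)]
print("x =", x)
print("p =", p)
print("residual dotted with each column:", [dot(c, e) for c in cols])
```

Working this example by hand gives $\mathbf{x}=(5,-3)$ and $\mathbf{p}=(5,2,-1)$; the code should agree up to rounding, and the residual should be orthogonal to both columns.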
Picture data points nearly lying on a line but not exactly: least squares finds the line minimizing total squared vertical error. Every step of that story is a projection statement written with dots.
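
The line-fitting picture can be sketched in a few lines of plain Python. The model is $y\approx c+dt$, so $A$ has a column of ones and a column of $t$ values, and the normal equations reduce to a $2\times 2$ system solved here by Cramer's rule. The function name `fit_line` and the data are made up for illustration, and this direct solve is fine only for a tiny, well-conditioned problem.

```python
def fit_line(ts, ys):
    """Least-squares line y ~ c + d*t from the 2x2 normal equations."""
    n = len(ts)
    st = sum(ts)                              # ones . t
    stt = sum(t * t for t in ts)              # t . t
    sy = sum(ys)                              # ones . y
    sty = sum(t * y for t, y in zip(ts, ys))  # t . y
    det = n * stt - st * st
    if det == 0:
        raise ValueError("need at least two distinct t values")
    c = (stt * sy - st * sty) / det
    d = (n * sty - st * sy) / det
    return c, d


# Made-up data lying close to, but not on, the line y = 1 + 2t.
ts = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

c, d = fit_line(ts, ys)
residual = [y - (c + d * t) for t, y in zip(ts, ys)]

print("intercept c =", c, "slope d =", d)
# Orthogonality to both columns of A: the ones column and the t column.
print("ones . residual =", sum(residual))
print("t . residual =", sum(t * r for t, r in zip(ts, residual)))
```

Both printed dot products should vanish up to rounding, which is the statement that the vertical-error vector is perpendicular to $\mathrm{Col}(A)$.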