01.01.09 · foundations / linear-algebra

Gram-Schmidt orthonormalisation and finite-dim inner-product space

shipped · 3 tiers · Lean: none

Anchor (Master): Shilov — *Linear Algebra* Ch. 8; Halmos — *Finite-Dimensional Vector Spaces* §I.6 and §II.13; Axler — *Linear Algebra Done Right* Ch. 6; Golub-Van Loan — *Matrix Computations* (4th ed.) Ch. 5; Trefethen-Bau — *Numerical Linear Algebra* Lectures 7–10; Björck — *Numerical Methods for Least Squares Problems* (SIAM, 1996) Ch. 2

Intuition Beginner

An inner product on a vector space is a way of measuring two things at once: the length of a vector, and the angle between two vectors. The basic example is the dot product on the plane, where $(a, b) \cdot (c, d) = a c + b d$ . The length of $(a, b)$ comes from dotting it with itself; the angle between two vectors comes from the same dot product divided by the lengths. Any space with this kind of two-vector measurement is called an inner-product space.

A set of vectors is orthonormal when every vector has length one and every two distinct vectors are perpendicular. Orthonormal bases are the most pleasant bases. Each coordinate is computed by a single dot product, not by solving a system. The squared length of a vector is the sum of its squared coordinates. Every familiar geometry fact carries through without modification.

The Gram-Schmidt procedure is the recipe for turning any linearly independent set of vectors into an orthonormal one with the same span. Start with the first vector and scale it to length one. For each next vector, remove the part that already points along the previous orthonormal vectors, then scale what is left to length one. The output is an orthonormal set, and the partial spans match the partial spans of the original list.

Visual Beginner

The figure shows the three stages of Gram-Schmidt for a pair of vectors in the plane. On the left, the input pair $v_{1}$ and $v_{2}$ . In the middle, $v_{2}$ has been decomposed into its component along $v_{1}$ (the projection) and the perpendicular complement $u_{2}$ . On the right, both vectors have been rescaled to length one, producing an orthonormal pair $e_{1}$ and $e_{2}$ with a right angle between them.

The geometry is the whole story. Subtracting the projection along an already-chosen direction kills the overlap and leaves only the perpendicular component. Repeating direction by direction produces an orthonormal frame, and the order in which vectors were chosen determines which subspaces the frame spans at each stage.

Worked example Beginner

Apply Gram-Schmidt to the pair $v_{1} = (4, 0)$ and $v_{2} = (3, 2)$ in the plane with the ordinary dot product.

Step 1. Scale $v_{1}$ to length one. The length of $v_{1}$ is $\sqrt{4^{2} + 0^{2}} = 4$. Divide by $4$ to get $e_{1} = (1, 0)$.

Step 2. Compute the dot product of $v_{2}$ with $e_{1}$ . We get $(3, 2) \cdot (1, 0) = 3$ . The component of $v_{2}$ along $e_{1}$ is $3 \cdot e_{1} = (3, 0)$ .

Step 3. Subtract the component along $e_{1}$ from $v_{2}$ . The perpendicular part is $u_{2} = (3, 2) - (3, 0) = (0, 2)$ .

Step 4. Scale $u_{2}$ to length one. The length of $(0, 2)$ is $2$ . Divide by $2$ to get $e_{2} = (0, 1)$ .

The output is the orthonormal pair $\{e_{1}, e_{2}\} = \{(1, 0), (0, 1)\}$. Check: each vector has length one, and their dot product is $0$. The two of them span the same plane as $\{v_{1}, v_{2}\}$, since each $e_{k}$ is in the span of the first $k$ inputs and vice versa.

What this tells us: Gram-Schmidt converts a slanted pair of vectors into the standard horizontal-and-vertical frame whenever the first input already points along the horizontal axis. The procedure does not pick the simplest frame in general; it picks the frame that respects the order of the inputs.
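The four steps above can be sketched in a few lines of Python. This is a minimal illustration for real vectors with the ordinary dot product; the helper names `dot` and `gram_schmidt` are chosen here for the example and are not part of any library.

```python
import math

def dot(a, b):
    # Ordinary dot product of two real vectors of equal length.
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    # Classical Gram-Schmidt; assumes the inputs are linearly independent.
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = dot(v, e)                               # component of v along e
            u = [ui - c * ei for ui, ei in zip(u, e)]   # remove that component
        norm = math.sqrt(dot(u, u))
        basis.append([ui / norm for ui in u])           # rescale to length one
    return basis

e1, e2 = gram_schmidt([(4.0, 0.0), (3.0, 2.0)])
print(e1, e2)  # [1.0, 0.0] [0.0, 1.0]

# Swapping the input order gives a different frame: the first output
# now points along (3, 2) rather than along the horizontal axis.
f1, f2 = gram_schmidt([(3.0, 2.0), (4.0, 0.0)])
```

The second call illustrates the closing remark: the output depends on the order of the inputs, not only on the set of inputs.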


Formal definition Intermediate+

Let $F$ denote either $R$ or $C$ and let $V$ be a finite-dimensional vector space over $F$ in the sense of 01.01.03. An inner product on $V$ is a map $$ \langle \cdot, \cdot \rangle : V \times V \to F $$ satisfying three axioms.

(i) Linearity in the first slot. For all $u, v, w \in V$ and $α, β \in F$ , $$ \langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle. $$

(ii) Conjugate symmetry. For all $v, w \in V$ , $$ \langle v, w \rangle = \overline{\langle w, v \rangle}, $$ where the bar denotes complex conjugation (which is the identity in the real case). Combined with (i), this forces conjugate-linearity in the second slot: $⟨ u, α v + β w ⟩ = \overline{α} ⟨ u, v ⟩ + \overline{β} ⟨ u, w ⟩$ .

(iii) Positive definiteness. For all $v \in V$ , $⟨ v, v ⟩$ is a non-negative real number, with $⟨ v, v ⟩ = 0$ iff $v = 0$ .

The pair $(V, ⟨ \cdot, \cdot ⟩)$ is a finite-dimensional inner-product space. The norm induced by the inner product is $$ \|v\| = \sqrt{\langle v, v \rangle}, $$ a non-negative real number by axiom (iii). Two vectors $v, w \in V$ are orthogonal, written $v ⊥ w$, when $⟨ v, w ⟩ = 0$. A subset $S \subseteq V$ is orthogonal if every pair of distinct vectors in $S$ is orthogonal; $S$ is orthonormal if it is orthogonal and every vector in $S$ has norm one. An orthonormal set is automatically linearly independent: if $\sum_{i} α_{i} e_{i} = 0$, then taking the inner product with $e_{j}$ gives $α_{j} = 0$ for every $j$.

The standard examples. (a) $V = F^{n}$ with the standard inner product $⟨(x_{1}, \dots, x_{n}), (y_{1}, \dots, y_{n})⟩ = \sum_{i = 1}^{n} x_{i} \overline{y_{i}}$ . (b) $V = R [x]_{\leq n}$ (real polynomials of degree at most $n$ ) with $⟨ f, g ⟩ = \int_{- 1}^{1} f (x) g (x) d x$ . (c) $V = M_{n} (C)$ with the Frobenius inner product $⟨ A, B ⟩ = tr (A B^{*})$ , where $B^{*} = \overline{B}^{T}$ is the conjugate transpose.

Counterexamples to common slips

The bilinear form $⟨(x_{1}, x_{2}), (y_{1}, y_{2})⟩ = x_{1} y_{1} - x_{2} y_{2}$ on $R^{2}$ is symmetric but not positive definite: the vector $(0, 1)$ has $⟨(0, 1), (0, 1)⟩ = - 1$ . This is a non-degenerate indefinite form (the Lorentz form), not an inner product. The positive-definiteness axiom is what distinguishes an inner product from a general symmetric bilinear form, in the sense of 01.01.15.
In the complex case, the form $⟨ v, w ⟩ = \sum_{i} v_{i} w_{i}$ (without conjugation) is symmetric and bilinear but not positive definite: the vector $v = (1, i) \in C^{2}$ has $\sum_{i} v_{i} v_{i} = 1 + i^{2} = 0$ even though $v \neq 0$. Conjugation in the second slot is what makes $⟨ v, v ⟩ = \sum_{i} ∣ v_{i} ∣^{2}$ a non-negative real.
The bilinear pairing $⟨ f, g ⟩ = \int_{- 1}^{1} f g \, d x$ on the larger space $C [- 1, 1]$ of continuous functions is positive definite, but on the still larger space of square-integrable functions it is only positive semi-definite: $⟨ f, f ⟩ = 0$ for every $f$ vanishing almost everywhere, so positive definiteness holds only after passing to equivalence classes modulo such functions. The completion of $(C [- 1, 1], ⟨ \cdot, \cdot ⟩)$ is the Hilbert space $L^{2} [- 1, 1]$, which is the natural infinite-dimensional setting in which the normalised Legendre polynomials become a complete orthonormal basis.
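The second counterexample is easy to check by hand, and a short Python sketch makes the role of conjugation concrete. The variable names are illustrative only.

```python
# The vector (1, i) in C^2, using Python's built-in complex numbers.
v = (1 + 0j, 1j)

# Without conjugation: a symmetric bilinear form that is not positive definite.
bilinear = sum(a * b for a, b in zip(v, v))

# With conjugation in the second slot: the standard inner product on C^2.
hermitian = sum(a * b.conjugate() for a, b in zip(v, v))

print(bilinear)   # 0j
print(hermitian)  # (2+0j)
```

The non-zero vector has bilinear "length" zero, while the conjugated form returns the expected $∣ 1 ∣^{2} + ∣ i ∣^{2} = 2$.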

Key theorem with proof Intermediate+

Theorem (Gram-Schmidt orthonormalisation; Shilov §8.3). Let $(V, ⟨ \cdot, \cdot ⟩)$ be a finite-dimensional inner-product space over $F = R$ or $C$, and let $\{v_{1}, \dots, v_{n}\}$ be a linearly independent sequence in $V$. Define inductively, for $k = 1, \dots, n$, $$ u_k = v_k - \sum_{j < k} \langle v_k, e_j \rangle \, e_j, \qquad e_k = \frac{u_k}{\|u_k\|}. $$ Then $\{e_{1}, \dots, e_{n}\}$ is an orthonormal sequence, and $$ \mathrm{span}\{e_1, \ldots, e_k\} = \mathrm{span}\{v_1, \ldots, v_k\} $$ for every $k \in \{1, \dots, n\}$. In particular, every finite-dimensional inner-product space admits an orthonormal basis.

Proof. We argue by induction on $k$ .

Base case $k = 1$. Linear independence of $\{v_{1}\}$ forces $v_{1} \neq 0$, so $∥ v_{1} ∥ > 0$ and $e_{1} = v_{1} /∥ v_{1} ∥$ is well-defined. Then $∥ e_{1} ∥ = 1$ by the homogeneity $∥ α v ∥ = ∣ α ∣ ∥ v ∥$, and $\mathrm{span} \{e_{1}\} = \mathrm{span} \{v_{1}\}$ since $e_{1}$ and $v_{1}$ are non-zero scalar multiples of one another.

Inductive step. Suppose $\{e_{1}, \dots, e_{k - 1}\}$ is orthonormal and $\mathrm{span} \{e_{1}, \dots, e_{k - 1}\} = \mathrm{span} \{v_{1}, \dots, v_{k - 1}\}$. Define $u_{k}$ by the formula. Three facts to check.

First, $u_{k} \neq 0$. If $u_{k} = 0$, then $v_{k} = \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j} \in \mathrm{span} \{e_{1}, \dots, e_{k - 1}\} = \mathrm{span} \{v_{1}, \dots, v_{k - 1}\}$, contradicting linear independence of $\{v_{1}, \dots, v_{k}\}$. So $∥ u_{k} ∥ > 0$ and $e_{k} = u_{k} /∥ u_{k} ∥$ is well-defined and has norm one.

Second, $e_{k}$ is orthogonal to each $e_{i}$ with $i < k$. Compute $$ \langle u_k, e_i \rangle = \langle v_k, e_i \rangle - \sum_{j < k} \langle v_k, e_j \rangle \, \langle e_j, e_i \rangle = \langle v_k, e_i \rangle - \langle v_k, e_i \rangle = 0, $$ where the middle equality uses $⟨ e_{j}, e_{i} ⟩ = δ_{j i}$ from the inductive orthonormality of $\{e_{1}, \dots, e_{k - 1}\}$. Dividing by $∥ u_{k} ∥$ preserves the vanishing, so $⟨ e_{k}, e_{i} ⟩ = 0$.

Third, span preservation. By construction, $e_{k}$ is a linear combination of $v_{k}$ and $\{e_{1}, \dots, e_{k - 1}\}$, hence of $v_{k}$ and $\{v_{1}, \dots, v_{k - 1}\}$ by the inductive span equality. So $e_{k} \in \mathrm{span} \{v_{1}, \dots, v_{k}\}$, giving $\mathrm{span} \{e_{1}, \dots, e_{k}\} \subseteq \mathrm{span} \{v_{1}, \dots, v_{k}\}$. Conversely, $v_{k} = u_{k} + \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j} = ∥ u_{k} ∥ e_{k} + \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j} \in \mathrm{span} \{e_{1}, \dots, e_{k}\}$, and the earlier $v_{i}$ for $i < k$ are in $\mathrm{span} \{e_{1}, \dots, e_{k - 1}\} \subseteq \mathrm{span} \{e_{1}, \dots, e_{k}\}$ by the inductive span equality. So $\mathrm{span} \{v_{1}, \dots, v_{k}\} \subseteq \mathrm{span} \{e_{1}, \dots, e_{k}\}$, and the two spans coincide.

The induction terminates at $k = n$ with an orthonormal sequence whose span matches that of the input. When the input is a basis of $V$ , the output is an orthonormal basis. Since every finite-dimensional vector space has a basis (Steinitz exchange, 01.01.04), every finite-dimensional inner-product space has an orthonormal basis. $□$

Bridge. Gram-Schmidt builds toward every theorem in finite-dimensional inner-product geometry. The foundational reason it works is exactly the iterated projection picture: subtracting the projection onto an already-chosen orthonormal partial frame kills the overlap and leaves the perpendicular component, which is then normalised to length one. This is exactly the construction that appears again in 01.01.12 (singular value decomposition), where the spectral theorem on $A^{*} A$ produces an orthonormal eigenbasis whose role is identical to the orthonormal basis produced here. The central insight is that finite-dimensional inner-product spaces are flexible enough to always admit an orthonormal basis, and Gram-Schmidt is the constructive proof. Putting these together, the existence of an orthonormal basis identifies $V$ with $F^{n}$ as an inner-product space, generalises to the Iwasawa $GL_{n} = O_{n} \cdot Triag_{n}^{+}$ decomposition of the general linear group, and is dual to the polar decomposition of operators. The bridge is the QR factorisation $A = QR$ , which writes Gram-Schmidt at the matrix level and supplies the algorithmic content for least-squares regression and numerical conditioning of linear systems.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Apply Gram-Schmidt to the sequence $v_{1} = (1, 1, 0)$ , $v_{2} = (1, 0, 1)$ , $v_{3} = (0, 1, 1)$ in $R^{3}$ with the standard inner product and write out $e_{1}, e_{2}, e_{3}$ explicitly.

Hint

After Step 1, $e_{1} = (1, 1, 0) / \sqrt{2}$. Compute $⟨ v_{2}, e_{1} ⟩$, subtract the projection, then normalise; repeat for $v_{3}$.

Answer

Step 1: $∥ v_{1} ∥ = \sqrt{2}$, so $e_{1} = (1, 1, 0) / \sqrt{2}$. Step 2: $⟨ v_{2}, e_{1} ⟩ = (1 + 0 + 0) / \sqrt{2} = 1/\sqrt{2}$. The component along $e_{1}$ is $(1/\sqrt{2}) \cdot e_{1} = (1/2, 1/2, 0)$. The perpendicular part is $u_{2} = (1, 0, 1) - (1/2, 1/2, 0) = (1/2, - 1/2, 1)$. Then $∥ u_{2} ∥^{2} = 1/4 + 1/4 + 1 = 3/2$, so $∥ u_{2} ∥ = \sqrt{3/2} = \sqrt{6}/2$ and $e_{2} = (1/2, - 1/2, 1) / (\sqrt{6}/2) = (1, - 1, 2) / \sqrt{6}$. Step 3: $⟨ v_{3}, e_{1} ⟩ = (0 + 1 + 0) / \sqrt{2} = 1/\sqrt{2}$, and $⟨ v_{3}, e_{2} ⟩ = (0 - 1 + 2) / \sqrt{6} = 1/\sqrt{6}$. The combined projection is $(1/\sqrt{2}) e_{1} + (1/\sqrt{6}) e_{2} = (1/2, 1/2, 0) + (1/6, - 1/6, 2/6) = (2/3, 1/3, 1/3)$. The perpendicular part is $u_{3} = (0, 1, 1) - (2/3, 1/3, 1/3) = (- 2/3, 2/3, 2/3)$. Then $∥ u_{3} ∥^{2} = 4/9 + 4/9 + 4/9 = 4/3$ and $∥ u_{3} ∥ = 2/\sqrt{3}$, so $e_{3} = (- 2/3, 2/3, 2/3) / (2/\sqrt{3}) = (- 1, 1, 1) / \sqrt{3}$. Final orthonormal basis: $e_{1} = (1, 1, 0) / \sqrt{2}$, $e_{2} = (1, - 1, 2) / \sqrt{6}$, $e_{3} = (- 1, 1, 1) / \sqrt{3}$. Verification: $⟨ e_{i}, e_{j} ⟩ = δ_{ij}$ by direct dot products. Rubric: full credit for all three $e_{k}$ with denominators $\sqrt{2}$, $\sqrt{6}$, $\sqrt{3}$ and verified orthonormality.
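The intermediate vectors in this answer can be cross-checked in exact rational arithmetic. The sketch below uses the unnormalised form of the recursion, $u_{k} = v_{k} - \sum_{j < k} (⟨ v_{k}, u_{j} ⟩ / ⟨ u_{j}, u_{j} ⟩) u_{j}$, which yields the same perpendicular parts while avoiding square roots; the function name `orthogonalise` is an assumption made for this example.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthogonalise(vectors):
    # Unnormalised Gram-Schmidt: returns the perpendicular parts u_k exactly.
    us = []
    for v in vectors:
        u = [Fraction(x) for x in v]
        for w in us:
            c = dot(v, w) / dot(w, w)   # equals <v, e> / ||w|| for e = w / ||w||
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

u1, u2, u3 = orthogonalise([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
norms_sq = [dot(u, u) for u in (u1, u2, u3)]
print(u2, u3, norms_sq)
```

The squared norms come out as $2$, $3/2$, $4/3$, so dividing each $u_{k}$ by the square root of its squared norm recovers the denominators $\sqrt{2}$, $\sqrt{6}$, $\sqrt{3}$ stated above.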

Exercise 4 (medium, symbolic).

In $V = R [x]_{\leq 2}$ with $⟨ f, g ⟩ = \int_{- 1}^{1} f (x) g (x) d x$ , apply Gram-Schmidt to the monomials $v_{1} = 1$ , $v_{2} = x$ , $v_{3} = x^{2}$ and report the orthonormalised polynomials.

Hint

Compute the relevant inner products: $\int_{- 1}^{1} 1 d x = 2$ , $\int_{- 1}^{1} x d x = 0$ , $\int_{- 1}^{1} x^{2} d x = 2/3$ , $\int_{- 1}^{1} x^{3} d x = 0$ , $\int_{- 1}^{1} x^{4} d x = 2/5$ .

Answer

Step 1: $∥ v_{1} ∥^{2} = \int_{- 1}^{1} 1 \, d x = 2$, so $e_{1} = 1/\sqrt{2}$. Step 2: $⟨ v_{2}, e_{1} ⟩ = (1/\sqrt{2}) \int_{- 1}^{1} x \, d x = 0$, so $u_{2} = x$ and $∥ u_{2} ∥^{2} = \int_{- 1}^{1} x^{2} \, d x = 2/3$, giving $∥ u_{2} ∥ = \sqrt{2/3}$ and $e_{2} = x / \sqrt{2/3} = \sqrt{3/2} \, x$. Step 3: $⟨ v_{3}, e_{1} ⟩ = (1/\sqrt{2}) \int_{- 1}^{1} x^{2} \, d x = (1/\sqrt{2}) (2/3) = \sqrt{2}/3$, and $⟨ v_{3}, e_{2} ⟩ = \sqrt{3/2} \int_{- 1}^{1} x^{3} \, d x = 0$. So $u_{3} = x^{2} - (\sqrt{2}/3) e_{1} = x^{2} - (\sqrt{2}/3) (1/\sqrt{2}) = x^{2} - 1/3$. Then $∥ u_{3} ∥^{2} = \int_{- 1}^{1} (x^{2} - 1/3)^{2} \, d x = \int_{- 1}^{1} (x^{4} - (2/3) x^{2} + 1/9) \, d x = 2/5 - (2/3) (2/3) + 2/9 = 2/5 - 4/9 + 2/9 = 2/5 - 2/9 = (18 - 10) /45 = 8/45$. So $∥ u_{3} ∥ = \sqrt{8/45} = 2 \sqrt{2} / (3 \sqrt{5}) = 2 \sqrt{2/45}$. The third orthonormal polynomial is $e_{3} = (x^{2} - 1/3) / \sqrt{8/45} = \sqrt{45/8} \, (x^{2} - 1/3) = (3 \sqrt{5} / (2 \sqrt{2})) (x^{2} - 1/3) = \sqrt{5/8} \, (3 x^{2} - 1)$, using $x^{2} - 1/3 = (3 x^{2} - 1)/3$. Final normalised Legendre polynomials: $e_{1} = 1/\sqrt{2}$, $e_{2} = \sqrt{3/2} \, x$, $e_{3} = \sqrt{5/8} \, (3 x^{2} - 1)$. These match the standard Legendre polynomials $P_{0} (x) = 1, P_{1} (x) = x, P_{2} (x) = (3 x^{2} - 1) /2$ up to the normalisation factors $\sqrt{(2 k + 1) /2}$ that make them unit-norm in the $L^{2} [- 1, 1]$ inner product. Rubric: full credit for all three $e_{k}$ with correct numerical coefficients.
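The integrals in this answer can be reproduced exactly with rational arithmetic. In the sketch below a polynomial is stored as a list of coefficients, lowest degree first, and the inner product uses $\int_{- 1}^{1} x^{m} \, d x = 2/(m + 1)$ for even $m$ and $0$ for odd $m$. The representation and the name `inner` are assumptions made for this illustration.

```python
from fractions import Fraction

def inner(f, g):
    # Exact integral of f(x) * g(x) over [-1, 1] for coefficient lists f, g.
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            if (i + j) % 2 == 0:            # odd powers integrate to zero
                total += Fraction(a) * Fraction(b) * Fraction(2, i + j + 1)
    return total

monomials = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 1, x, x^2
us = []
for v in monomials:
    u = [Fraction(c) for c in v]
    for w in us:
        c = inner(v, w) / inner(w, w)           # unnormalised projection coefficient
        u = [ui - c * wi for ui, wi in zip(u, w)]
    us.append(u)

norms_sq = [inner(u, u) for u in us]
print(us[2], norms_sq)
```

The third entry of `us` is the coefficient list of $x^{2} - 1/3$, and the squared norms are $2$, $2/3$, $8/45$, matching the hand computation before normalisation.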

Exercise 5 (medium, short-answer).

Prove the Cauchy-Schwarz inequality $∣ ⟨ v, w ⟩ ∣ \leq ∥ v ∥ ∥ w ∥$ for any two vectors $v, w$ in a real inner-product space, with equality iff ${v, w}$ is linearly dependent.

Hint

Expand $∥ v - tw ∥^{2} \geq 0$ as a quadratic in the real variable $t$ and use that its discriminant is non-positive.

Answer

If $w = 0$, both sides vanish and equality holds with $\{v, w\}$ dependent. Assume $w \neq 0$. For every real $t$, $$ 0 \le \|v - tw\|^2 = \langle v, v \rangle - 2 t \langle v, w \rangle + t^2 \langle w, w \rangle. $$ This is a quadratic in $t$ with leading coefficient $⟨ w, w ⟩ > 0$, so it is non-negative for all $t$ iff its discriminant is non-positive: $$ (2 \langle v, w \rangle)^2 - 4 \langle v, v \rangle \langle w, w \rangle \le 0. $$ Dividing by $4$ and taking square roots gives $∣ ⟨ v, w ⟩ ∣ \leq ∥ v ∥ ∥ w ∥$. Equality holds iff the discriminant vanishes, iff the quadratic has a real double root $t_{0}$, iff $v - t_{0} w = 0$, iff $\{v, w\}$ is linearly dependent. In the complex case the discriminant argument is replaced by the non-negativity of $∥ v - λ w ∥^{2}$ at the single value $λ = ⟨ v, w ⟩ /∥ w ∥^{2}$. Rubric: full credit for the discriminant argument plus the equality clause.

Exercise 6 (medium, short-answer).

Show that the QR decomposition exists for every full-column-rank matrix $A \in R^{m \times n}$ with $m \geq n$ : there exist $Q \in R^{m \times n}$ with $Q^{T} Q = I_{n}$ and an upper triangular $R \in R^{n \times n}$ with positive diagonal entries such that $A = QR$ . Identify $r_{k k}$ geometrically.

Hint

Apply Gram-Schmidt to the columns of $A$ . Read off $r_{k k}$ from the normalisation step.

Answer

Let $A = [v_{1} ∣ \dots ∣ v_{n}]$. Apply Gram-Schmidt: $u_{k} = v_{k} - \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j}$, $e_{k} = u_{k} /∥ u_{k} ∥$. By the span-preservation step of the theorem, each $v_{k}$ is a linear combination of $\{e_{1}, \dots, e_{k}\}$, explicitly $$ v_k = \|u_k\| \, e_k + \sum_{j < k} \langle v_k, e_j \rangle \, e_j. $$ Set $Q = [e_{1} ∣ \dots ∣ e_{n}]$ and define $R \in R^{n \times n}$ by $r_{k k} = ∥ u_{k} ∥$ on the diagonal, $r_{j k} = ⟨ v_{k}, e_{j} ⟩$ for $j < k$ (strictly upper triangle), and $r_{j k} = 0$ for $j > k$ (strictly lower triangle). Then $A = QR$ by the column-by-column identity above. Orthonormality of the columns of $Q$ is the conclusion of Gram-Schmidt. Each $r_{k k} = ∥ u_{k} ∥$ is the distance from $v_{k}$ to $\mathrm{span} \{v_{1}, \dots, v_{k - 1}\}$ (the length of the perpendicular component), and positivity follows from linear independence. Rubric: full credit for the column-by-column formula plus the geometric identification of $r_{k k}$.
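The construction in this answer translates directly into code. The following is a minimal sketch for real matrices stored as a list of columns, using classical Gram-Schmidt (not the numerically preferred Householder route); the name `qr_gram_schmidt` and the column-list layout are assumptions for the example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def qr_gram_schmidt(columns):
    # Returns (Q, R): Q is a list of orthonormal columns e_k,
    # R is an n x n upper-triangular nested list with R[j][k] = r_jk.
    n = len(columns)
    q_cols = []
    R = [[0.0] * n for _ in range(n)]
    for k, v in enumerate(columns):
        u = list(v)
        for j, e in enumerate(q_cols):
            R[j][k] = dot(v, e)                        # r_jk = <v_k, e_j>, j < k
            u = [ui - R[j][k] * ei for ui, ei in zip(u, e)]
        R[k][k] = math.sqrt(dot(u, u))                 # r_kk = ||u_k||
        q_cols.append([ui / R[k][k] for ui in u])
    return q_cols, R

cols = [(1.0, 1.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
Q, R = qr_gram_schmidt(cols)

# Rebuild each column as v_k = sum_j r_jk e_j to confirm A = QR.
rebuilt = [[sum(R[j][k] * Q[j][i] for j in range(3)) for i in range(3)]
           for k in range(3)]
```

For these columns (the vectors of Exercise 3) the diagonal of `R` holds the perpendicular distances $\sqrt{2}$, $\sqrt{3/2}$, $2/\sqrt{3}$, up to rounding.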

Exercise 7 (hard, short-answer).

State and prove the uniqueness of the QR decomposition under the constraint that $R$ has positive diagonal entries.

Hint

Suppose $A = Q_{1} R_{1} = Q_{2} R_{2}$ , multiply by $Q_{2}^{T}$ on the left, and observe that the product $Q_{2}^{T} Q_{1}$ has orthonormal columns and is also $R_{2} R_{1}^{- 1}$ — upper triangular with positive diagonal.

Answer

Statement: if $A \in R^{m \times n}$ has full column rank and $A = Q_{1} R_{1} = Q_{2} R_{2}$ with each $Q_{i}$ having orthonormal columns and each $R_{i}$ upper triangular with positive diagonal, then $Q_{1} = Q_{2}$ and $R_{1} = R_{2}$. Proof: multiply the equality $Q_{1} R_{1} = Q_{2} R_{2}$ on the left by $Q_{2}^{T}$ and use $Q_{2}^{T} Q_{2} = I_{n}$ to get $Q_{2}^{T} Q_{1} R_{1} = R_{2}$, so $Q_{2}^{T} Q_{1} = R_{2} R_{1}^{- 1}$. The left side is an orthogonal $n \times n$ matrix: $(Q_{2}^{T} Q_{1})^{T} (Q_{2}^{T} Q_{1}) = Q_{1}^{T} Q_{2} Q_{2}^{T} Q_{1}$, the matrix $Q_{2} Q_{2}^{T}$ is the orthogonal projector onto $col (Q_{2}) = col (A)$, and the columns of $Q_{1}$ also lie in $col (A)$, so $Q_{2} Q_{2}^{T} Q_{1} = Q_{1}$ and the product equals $Q_{1}^{T} Q_{1} = I_{n}$. The right side $R_{2} R_{1}^{- 1}$ is upper triangular with positive diagonal, since the inverse of such a matrix and the product of two such matrices are again of this type. An orthogonal upper-triangular matrix $T$ with positive diagonal is the identity: its first column is $t_{11} e_{1}$ with unit norm, so $t_{11} = 1$; orthogonality of each later column to the earlier ones forces its strictly-upper entries to vanish, and unit norm together with positivity forces $t_{k k} = 1$, by induction on the column index. So $R_{2} R_{1}^{- 1} = Q_{2}^{T} Q_{1} = I_{n}$, giving $R_{1} = R_{2}$ and $Q_{1} = A R_{1}^{- 1} = A R_{2}^{- 1} = Q_{2}$. Rubric: full credit for the orthogonal-upper-triangular-positive-diagonal-is-identity step plus the explicit derivation.

Exercise 8 (hard, short-answer).

Define the Gram determinant $G (v_{1}, \dots, v_{n}) = det (⟨ v_{i}, v_{j} ⟩)_{i, j = 1}^{n}$ for vectors in an inner-product space. Show that $G (v_{1}, \dots, v_{n}) = \prod_{k = 1}^{n} ∥ u_{k} ∥^{2}$ , where $u_{k}$ is the perpendicular component produced by Gram-Schmidt, and conclude that $G \geq 0$ with equality iff ${v_{1}, \dots, v_{n}}$ is linearly dependent.

Hint

Write the matrix of inner products in the Gram-Schmidt-derived QR form. The Gram matrix factors as $R^{T} R$ over the reals.

Answer

Stack the vectors as columns: $V = [v_{1} ∣ \dots ∣ v_{n}] \in R^{m \times n}$ (in the complex case, replace transposes by conjugate transposes). The Gram matrix is $M = V^{T} V$ with entries $M_{ij} = ⟨ v_{i}, v_{j} ⟩$, and $G = det M$. If $\{v_{1}, \dots, v_{n}\}$ is linearly independent, the QR decomposition gives $V = QR$ with $Q^{T} Q = I_{n}$ and $R$ upper triangular with positive diagonal, so $M = V^{T} V = R^{T} Q^{T} QR = R^{T} R$. Then $G = det M = (det R)^{2} = \prod_{k} r_{k k}^{2} = \prod_{k} ∥ u_{k} ∥^{2}$, since $r_{k k} = ∥ u_{k} ∥$ by the QR derivation in Exercise 6. Each $∥ u_{k} ∥^{2} > 0$ under linear independence, so $G > 0$. If $\{v_{1}, \dots, v_{n}\}$ is linearly dependent, there is a non-zero coefficient vector $c$ with $V c = 0$, hence $M c = V^{T} V c = 0$, so $M$ is singular and $G = 0$; this agrees with the product formula, because the first $v_{k}$ lying in $\mathrm{span} \{v_{1}, \dots, v_{k - 1}\}$ has $u_{k} = 0$. Geometric interpretation: $G$ is the squared volume of the $n$-dimensional parallelepiped spanned by $\{v_{1}, \dots, v_{n}\}$, which vanishes iff the parallelepiped collapses to lower dimension. Rubric: full credit for the $R^{T} R$ factorisation of the Gram matrix plus the dependence characterisation.
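As a sanity check on the identity, the Gram determinant and the product of squared perpendicular lengths can both be computed exactly for three vectors in $R^{3}$. The sketch below uses the unnormalised recursion and a hard-coded $3 \times 3$ determinant; `det3` and `perpendicular_parts` are names invented for this example.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(m):
    # Cofactor expansion of a 3 x 3 determinant along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def perpendicular_parts(vectors):
    # Unnormalised Gram-Schmidt in exact arithmetic (independent inputs assumed).
    us = []
    for v in vectors:
        u = [Fraction(x) for x in v]
        for w in us:
            c = dot(v, w) / dot(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    return us

vs = [(1, 1, 0), (1, 0, 1), (0, 1, 1)]
gram_det = det3([[dot(a, b) for b in vs] for a in vs])
product = Fraction(1)
for u in perpendicular_parts(vs):
    product *= dot(u, u)

# A dependent triple (third vector = first + second) has Gram determinant zero.
dep = [(1, 1, 0), (1, 0, 1), (2, 1, 1)]
dep_det = det3([[dot(a, b) for b in dep] for a in dep])
print(gram_det, product, dep_det)  # 4 4 0
```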

Advanced results Master

Theorem (orthogonal projection; Shilov §8.5). Let $W \subseteq V$ be a subspace of a finite-dimensional inner-product space and $\{e_{1}, \dots, e_{k}\}$ an orthonormal basis of $W$. The map $$ P_W : V \to W, \qquad P_W(v) = \sum_{i=1}^k \langle v, e_i \rangle \, e_i $$ is the unique linear map satisfying $P_{W}^{2} = P_{W}$, $im (P_{W}) = W$, and $⟨ P_{W} v, w ⟩ = ⟨ v, P_{W} w ⟩$ for all $v, w \in V$. The orthogonal complement $W^{⊥} = \{v \in V : ⟨ v, w ⟩ = 0 \text{ for all } w \in W\}$ is a subspace, and $V = W \oplus W^{⊥}$ as inner-product spaces, with the splitting realised by $v = P_{W} (v) + (v - P_{W} (v))$. The map $P_{W}$ minimises $∥ v - w ∥$ over $w \in W$: the closest point in $W$ to $v$ is $P_{W} (v)$, and the minimum distance squared is $∥ v ∥^{2} - ∥ P_{W} (v) ∥^{2}$.

The projection-and-complement decomposition is the geometric content of Gram-Schmidt: each $u_{k}$ in the recursion is precisely the projection of $v_{k}$ onto the orthogonal complement of $span {v_{1}, \dots, v_{k - 1}}$ , and the recursion successively builds the projector onto the running partial span.

Theorem (Bessel's inequality and Parseval's identity). Let $V$ be a finite-dimensional inner-product space with orthonormal basis $\{e_{1}, \dots, e_{n}\}$. For every $v \in V$, $$ \sum_{i=1}^n |\langle v, e_i \rangle|^2 = \|v\|^2, $$ the Parseval identity. For every orthonormal subset $\{e_{1}, \dots, e_{k}\}$ with $k \leq n$, $$ \sum_{i=1}^k |\langle v, e_i \rangle|^2 \le \|v\|^2, $$ the Bessel inequality, with equality iff $v \in \mathrm{span} \{e_{1}, \dots, e_{k}\}$. Each inner product $⟨ v, e_{i} ⟩$ is the $i$-th Fourier coefficient of $v$ in the orthonormal basis.

Parseval gives the unitary identification $V ≅ F^{n}$ as inner-product spaces via the orthonormal-basis isomorphism $v \mapsto (⟨ v, e_{i} ⟩)$ , the finite-dimensional analogue of the $L^{2}$ unitary identification underlying classical Fourier analysis.

Theorem (QR decomposition; Golub-Van Loan Ch. 5). Let $A \in F^{m \times n}$ with $m \geq n$ and full column rank. There exist a matrix $Q \in F^{m \times n}$ with orthonormal columns ($Q^{*} Q = I_{n}$) and an upper triangular matrix $R \in F^{n \times n}$ with positive real diagonal such that $A = QR$. The decomposition is unique under the positive-diagonal constraint. In matrix form, applying the Gram-Schmidt recursion column-by-column to $A$, the orthonormal vectors form the columns of $Q$, and the inner products $\langle v_k, e_j \rangle$ for $j < k$ together with $\|u_k\|$ on the diagonal form $R$.

The QR decomposition is the algorithmic embodiment of Gram-Schmidt and is the foundation of every linear least-squares method: $min_{x} ∥ A x - b ∥^{2}$ has unique solution $x = R^{- 1} Q^{*} b$ , computed by one back-substitution. Numerical-analysis variants (modified Gram-Schmidt, Householder reflectors, Givens rotations) produce the same $Q, R$ in exact arithmetic but with very different numerical behaviour in finite precision.
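The formula $x = R^{- 1} Q^{*} b$ can be illustrated on a small line-fitting problem. The sketch below builds $Q$ and $R$ by classical Gram-Schmidt on the columns of $A$ and then solves $R x = Q^{T} b$ by back-substitution. It is a toy illustration for real, well-conditioned data; production codes would use Householder QR instead, and `least_squares_qr` is a hypothetical helper name.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def least_squares_qr(columns, b):
    # Minimise ||A x - b|| where A has the given (independent) columns.
    n = len(columns)
    q_cols, R = [], [[0.0] * n for _ in range(n)]
    for k, v in enumerate(columns):
        u = list(v)
        for j, e in enumerate(q_cols):
            R[j][k] = dot(v, e)
            u = [ui - R[j][k] * ei for ui, ei in zip(u, e)]
        R[k][k] = math.sqrt(dot(u, u))
        q_cols.append([ui / R[k][k] for ui in u])
    rhs = [dot(e, b) for e in q_cols]              # Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):                   # back-substitution on R x = Q^T b
        tail = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rhs[i] - tail) / R[i][i]
    return x

# Fit y = intercept + slope * t to three points that are not collinear.
ts = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 4.0]
intercept, slope = least_squares_qr([[1.0, 1.0, 1.0], ts], ys)
```

For this data the normal equations give intercept $5/6$ and slope $3/2$, and the QR route returns the same values up to rounding.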

Theorem (modified Gram-Schmidt and numerical stability; Björck 1967). Classical Gram-Schmidt loses orthogonality in finite-precision arithmetic at a rate governed by the square of the condition number $κ (A) = σ_{\max} / σ_{\min}$: after one pass through the algorithm in double precision, the computed $\hat{Q}$ may have $\|\hat Q^{*} \hat Q - I\| \sim \kappa(A)^{2} \cdot \epsilon_{\mathrm{machine}}$, which can be order one for moderately ill-conditioned $A$. The modified Gram-Schmidt analysed by Björck (1967) reorganises the recursion: at step $k$, instead of computing all coefficients $\langle v_k, e_j \rangle$ at once, the running residual $u_k^{(0)} = v_k$ is updated column-by-column via $u_k^{(j+1)} = u_k^{(j)} - \langle u_k^{(j)}, e_{j+1} \rangle \, e_{j+1}$, so that each subtraction uses the most recently computed residual. In finite-precision arithmetic, the modified procedure has loss-of-orthogonality bound $\|\hat Q^{*} \hat Q - I\| \sim \kappa(A) \cdot \epsilon_{\mathrm{machine}}$, dramatically better than classical Gram-Schmidt for moderately conditioned $A$. Reorthogonalisation (one additional pass) restores full orthogonality at the cost of a factor of two in work.

The modified procedure is mathematically identical to the classical one in exact arithmetic and produces the same $Q$ and $R$ , but the order of operations matters in floating point. Modern numerical-linear-algebra codes default to Householder QR (which is backwards-stable independent of conditioning) for general problems and to modified Gram-Schmidt for problems where the orthonormal vectors must be produced incrementally (Krylov-subspace methods, Arnoldi iteration).
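The difference between the two orderings shows up already on a $4 \times 3$ example. The sketch below runs both variants on a Läuchli-type matrix, a standard ill-conditioned test case chosen here for illustration, and measures the largest inner product between distinct computed vectors. The only change between the variants is whether the coefficient is taken from the original column or from the running residual; the function names are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(columns, modified):
    basis = []
    for v in columns:
        u = list(v)
        for e in basis:
            # classical: coefficient from the original column v
            # modified:  coefficient from the running residual u
            c = dot(u, e) if modified else dot(v, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(dot(u, u))
        basis.append([ui / norm for ui in u])
    return basis

def orthogonality_loss(basis):
    # Largest |<e_i, e_j>| over distinct pairs; zero for an exactly orthonormal set.
    return max(abs(dot(a, b))
               for i, a in enumerate(basis) for b in basis[i + 1:])

eps = 1e-8   # eps**2 is below double-precision resolution relative to 1
lauchli = [(1.0, eps, 0.0, 0.0), (1.0, 0.0, eps, 0.0), (1.0, 0.0, 0.0, eps)]
loss_classical = orthogonality_loss(gram_schmidt(lauchli, modified=False))
loss_modified = orthogonality_loss(gram_schmidt(lauchli, modified=True))
print(loss_classical, loss_modified)
```

On this input the classical variant returns two vectors whose inner product is about one half, while the modified variant keeps all pairwise inner products at roughly the size of `eps`.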

Theorem (Householder reflectors as QR; Householder 1958). For every non-zero vector $x \in F^{m}$ there exists a unitary matrix $H_{x}$ (the Householder reflector associated to $x$) of the form $H = I - 2 (w w^{*})/(w^{*} w)$ for some non-zero $w$ depending on $x$, such that $H_x x = \pm \|x\| e_1$ for the standard unit vector $e_1$. Successively applying Householder reflectors to the columns of $A$, zeroing the sub-diagonal of one column at a time, produces a unitary $H_n \cdots H_1$ such that $H_n \cdots H_1 A$ is upper triangular. The product $Q = H_1^{*} \cdots H_n^{*}$ is the orthogonal factor in the QR decomposition. Each Householder step has backwards-error bound proportional to $\epsilon_{\mathrm{machine}}$ alone, independent of $\kappa(A)$.

Householder QR is the algorithm of choice for dense least-squares problems. Givens rotations (Givens 1958) are the analogous tool for sparse problems, where the zeroing is done one entry at a time by a $2 \times 2$ rotation acting on a chosen pair of rows; the workload is $O (n)$ per zero rather than $O (m)$ , advantageous when most entries are already zero and only a few sub-diagonals need annihilating.
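A single reflector from the Householder theorem above can be sketched for real vectors as follows. The choice $w = x + \mathrm{sign}(x_{1}) ∥ x ∥ e_{1}$ is the usual one that avoids cancellation, and it sends $x$ to $- \mathrm{sign}(x_{1}) ∥ x ∥ e_{1}$; the function name `householder_apply` is invented for this illustration, and the matrix $H$ is never formed explicitly.

```python
import math

def householder_apply(x):
    # Return H x for H = I - 2 w w^T / (w^T w), w = x + sign(x_1) ||x|| e_1.
    norm = math.sqrt(sum(t * t for t in x))
    w = list(x)
    w[0] += math.copysign(norm, x[0])
    scale = 2.0 * sum(a * b for a, b in zip(w, x)) / sum(a * a for a in w)
    return [xi - scale * wi for xi, wi in zip(x, w)]

reflected = householder_apply([3.0, 4.0])
print(reflected)  # [-5.0, 0.0]
```

Here $∥ x ∥ = 5$ and $w = (8, 4)$, so the reflection maps $(3, 4)$ onto the first coordinate axis with the sub-diagonal entry zeroed.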

Theorem (Cholesky factorisation as right-looking Gram-Schmidt). Let $A \in F^{n \times n}$ be positive definite (Hermitian with $⟨ A v, v ⟩ > 0$ for all $v \neq 0$). There exists a unique lower triangular matrix $L \in F^{n \times n}$ with positive diagonal entries such that $A = L L^{*}$ (the Cholesky factorisation). Equivalently, if $A$ is the Gram matrix $(\langle v_i, v_j \rangle)$ of a linearly independent sequence $\{v_1, \ldots, v_n\}$, then $L$ is the Gram-matrix-side image of the lower-triangular change-of-basis matrix from $\{v_i\}$ to the Gram-Schmidt output $\{e_i\}$.

Cholesky is "Gram-Schmidt without the vectors": it operates on the Gram matrix $A = V^{*} V$ directly without ever materialising the underlying vectors $V$ . In numerical analysis, the Cholesky factorisation is the preferred route to solving positive-definite linear systems (twice as fast as $LU$ on a general matrix) and is the workhorse of Gaussian-elimination-based estimation in statistics, optimisation, and finite-element methods.
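To see the link with Gram-Schmidt concretely, one can factor the Gram matrix of the vectors $(1, 1, 0)$, $(1, 0, 1)$, $(0, 1, 1)$ and compare the diagonal of $L$ with the perpendicular lengths $∥ u_{k} ∥$. The sketch below is a textbook row-by-row Cholesky for a real symmetric positive-definite matrix, written for illustration rather than performance.

```python
import math

def cholesky(a):
    # Lower-triangular L with positive diagonal such that a = L L^T.
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

# Gram matrix of (1,1,0), (1,0,1), (0,1,1) under the standard inner product.
gram = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
L = cholesky(gram)
diag_sq = [L[k][k] ** 2 for k in range(3)]
print(diag_sq)
```

The squared diagonal entries agree with $∥ u_{1} ∥^{2} = 2$, $∥ u_{2} ∥^{2} = 3/2$, $∥ u_{3} ∥^{2} = 4/3$ up to rounding, without the vectors themselves ever being used.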

Theorem (separable Hilbert space and the orthonormal basis). Let $H$ be a separable Hilbert space (an inner-product space whose induced metric is complete and that admits a countable dense subset). Then $H$ admits a countable orthonormal basis $\{e_{n}\}_{n \in N}$ in the sense that every $v \in H$ has a Fourier expansion $v = \sum_{n} ⟨ v, e_{n} ⟩ e_{n}$, convergent in norm, with $∥ v ∥^{2} = \sum_{n} ∣ ⟨ v, e_{n} ⟩ ∣^{2}$ (Parseval). The Gram-Schmidt procedure applied to any countable linearly independent set whose span is dense produces such an orthonormal basis.

The infinite-dimensional case is the natural setting for classical Fourier analysis and quantum mechanics. The classical-orthogonal-polynomial families — Legendre on $[- 1, 1]$ with constant weight, Hermite on $R$ with $e^{- x^{2}}$ weight, Laguerre on $[0, \infty)$ with $e^{- x}$ weight, Chebyshev on $[- 1, 1]$ with $(1 - x^{2})^{- 1/2}$ weight — are all Gram-Schmidt outputs on the monomial basis ${1, x, x^{2}, \dots}$ under the corresponding weighted $L^{2}$ inner products, and they form complete orthonormal bases of the relevant $L^{2}$ spaces.

Synthesis. Gram-Schmidt is the foundational reason every finite-dimensional inner-product space admits an orthonormal basis, and the QR decomposition is exactly the matrix-level packaging of that procedure. The central insight is that orthonormality is constructively reachable from any linearly independent input by iterating projection-and-rescale, and the order in which inputs are processed determines the span filtration of the output. Putting these together, the QR decomposition, the Cholesky factorisation, the Householder and Givens algorithms, and the modified Gram-Schmidt of Björck are different presentations of one structural fact: the matrix group $GL_{n} (F)$ admits the Iwasawa decomposition $GL_{n} (F) = K \cdot A \cdot N$ with $K$ the unitary group, $A$ the diagonal-positive subgroup, and $N$ the upper-triangular unipotent subgroup, and Gram-Schmidt is the constructive proof of this decomposition. This is exactly the bridge that appears again in 01.01.12 (singular value decomposition), where the spectral theorem on $A^{*} A$ produces the orthonormal singular-vector frame that plays the role of the Gram-Schmidt output, and the resulting $A = U Σ V^{*}$ generalises $A = QR$ to arbitrary rectangular matrices. The bridge is that QR identifies the homogeneous space $GL_{n} (F) / AN$ with the unitary group $K$, which acts on it simply transitively from the left; SVD instead describes the double coset space $K \backslash GL_{n} (F) / K$, parametrised by the singular values.

The duality between the geometric statement (orthonormal basis) and the algebraic statement (QR factorisation) is a recurring pattern. The geometric statement says every finite-dimensional inner-product space is unitarily isomorphic to $F^{n}$ with the standard inner product. The algebraic statement says every full-rank rectangular matrix factors as orthonormal-times-triangular. Both are content-free in the language of abstract algebra, both have constructive proofs via Gram-Schmidt, and both have numerically realised algorithms in computational linear algebra. The numerical and abstract presentations diverge in finite precision: classical Gram-Schmidt is unstable, the modified version is better, and Householder is best, even though all three compute the same factorisation in exact arithmetic. The numerical-analysis side of this story — Björck's backward-error analysis, the loss-of-orthogonality bound $κ (A) \cdot ϵ$ , the rounding-error theory of Wilkinson — is the foundational reason QR-based least squares replaced normal-equation least squares in scientific computing in the 1960s.

Full proof set Master

Proposition (Cauchy-Schwarz with full equality clause). In an inner-product space $(V, ⟨ \cdot, \cdot ⟩)$ over $F = R$ or $C$ , $$ |\langle v, w \rangle| \le |v| , |w| $$ for all $v, w \in V$ , with equality iff ${v, w}$ is linearly dependent.

Proof. If $w = 0$ , both sides vanish and $\{v, w\}$ is dependent. Assume $w \neq 0$ . Set $λ = ⟨ v, w ⟩ / ⟨ w, w ⟩ \in F$ and compute $$ 0 \le \|v - \lambda w\|^2 = \langle v - \lambda w, v - \lambda w \rangle = \langle v, v \rangle - \lambda \langle w, v \rangle - \overline{\lambda} \langle v, w \rangle + |\lambda|^2 \langle w, w \rangle. $$ Substituting $λ ⟨ w, v ⟩ = (⟨ v, w ⟩ / ⟨ w, w ⟩) \overline{⟨ v, w ⟩} = ∣ ⟨ v, w ⟩ ∣^{2} / ⟨ w, w ⟩$ and analogously for the third term gives $$ 0 \le \langle v, v \rangle - 2 \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle} + \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle} = \langle v, v \rangle - \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle}. $$ Multiply by $⟨ w, w ⟩ > 0$ to obtain $∣ ⟨ v, w ⟩ ∣^{2} \leq ⟨ v, v ⟩ ⟨ w, w ⟩ = ∥ v ∥^{2} ∥ w ∥^{2}$ . Taking square roots gives the inequality. Equality holds iff $v - λ w = 0$ , iff $v$ is a scalar multiple of $w$ , iff $\{v, w\}$ is linearly dependent. $□$
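
As a quick numerical check of the key identity in this proof (the vectors are an illustrative choice, not from the cited texts), take $v = (1, 2)$ and $w = (3, 1)$ in $\mathbb{R}^{2}$ with the dot product. Then $⟨ v, w ⟩ = 5$ , $⟨ v, v ⟩ = 5$ , $⟨ w, w ⟩ = 10$ , so $λ = 1/2$ and $$ \|v - \lambda w\|^2 = \left\| \left( -\tfrac{1}{2}, \tfrac{3}{2} \right) \right\|^2 = \tfrac{5}{2} = 5 - \tfrac{25}{10} = \langle v, v \rangle - \frac{|\langle v, w \rangle|^2}{\langle w, w \rangle}. $$ The inequality reads $25 \leq 50$ , and it is strict because $v$ is not a scalar multiple of $w$ .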

Proposition (triangle inequality from Cauchy-Schwarz). For all $v, w$ in an inner-product space, $∥ v + w ∥ \leq ∥ v ∥ + ∥ w ∥$ .

Proof. Compute $$ \|v + w\|^2 = \langle v + w, v + w \rangle = \|v\|^2 + \langle v, w \rangle + \langle w, v \rangle + \|w\|^2 = \|v\|^2 + 2 \, \mathrm{Re}\langle v, w \rangle + \|w\|^2. $$ The real part bound $\mathrm{Re} ⟨ v, w ⟩ \leq ∣ ⟨ v, w ⟩ ∣ \leq ∥ v ∥∥ w ∥$ (Cauchy-Schwarz) gives $$ \|v + w\|^2 \le \|v\|^2 + 2 \|v\| \|w\| + \|w\|^2 = (\|v\| + \|w\|)^2. $$ Taking non-negative square roots gives the triangle inequality. $□$

Proposition (Gram-Schmidt theorem; full statement). Let $(V, ⟨ \cdot, \cdot ⟩)$ be a finite-dimensional inner-product space and $\{v_{1}, \dots, v_{n}\} \subseteq V$ linearly independent. The recursion $u_{k} = v_{k} - \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j}$ , $e_{k} = u_{k} /∥ u_{k} ∥$ produces an orthonormal $\{e_{1}, \dots, e_{n}\}$ with $\operatorname{span} \{e_{1}, \dots, e_{k}\} = \operatorname{span} \{v_{1}, \dots, v_{k}\}$ for each $k$ .

Proof. Given in the Intermediate-tier key theorem. The induction on $k$ uses linear independence at each step to ensure $u_{k} \neq 0$ , computes $⟨ u_{k}, e_{i} ⟩ = 0$ for $i < k$ by direct expansion using orthonormality of the previous $e_{j}$ , and verifies the span equality by writing each side as a linear combination of the other's generators. $□$
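
The recursion in the statement can be sketched directly. The code below is a minimal illustration under stated assumptions: it handles real vectors only (no complex conjugation), takes the inner product as a function argument, and uses hypothetical names (`gram_schmidt`, `weighted`, the tolerance `tol`) that are not from the source. The weighted inner product is included to stress that the procedure depends only on the inner-product axioms, not on the dot product.

```python
import math

def gram_schmidt(vectors, inner, tol=1e-12):
    """Orthonormalise `vectors` with respect to the real inner product `inner`.

    Implements u_k = v_k - sum_{j<k} <v_k, e_j> e_j, e_k = u_k / ||u_k||.
    """
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = inner(v, e)  # coefficient <v_k, e_j>, taken from the original v_k
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(inner(u, u))
        if norm <= tol:
            # u_k = 0 happens exactly when v_k lies in the span of its predecessors.
            raise ValueError("input is (numerically) linearly dependent")
        basis.append([ui / norm for ui in u])
    return basis

def weighted(a, b):
    """Illustrative inner product on R^3 with positive weights (1, 2, 3)."""
    weights = (1.0, 2.0, 3.0)
    return sum(w * x * y for w, x, y in zip(weights, a, b))

vs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
es = gram_schmidt(vs, weighted)

# Gram matrix of the output: the identity up to rounding.
gram = [[weighted(a, b) for b in es] for a in es]

# Span filtration: v_k has no component along e_j for j > k.
span_leak = max(abs(weighted(vs[k], es[j]))
                for k in range(3) for j in range(k + 1, 3))
print(gram, span_leak)
```

The vanishing of `span_leak` is the computational face of the span equality: each $v_{k}$ is a combination of $e_{1}, \dots, e_{k}$ only.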

Proposition (QR existence and uniqueness). For $A \in F^{m \times n}$ with $m \geq n$ and full column rank, there exist unique $Q \in F^{m \times n}$ with $Q^{*} Q = I_{n}$ and $R \in F^{n \times n}$ upper triangular with positive real diagonal such that $A = QR$ .

Proof. Existence. Apply Gram-Schmidt to the columns $v_{1}, \dots, v_{n}$ of $A$ . By the previous proposition, the orthonormal output $\{e_{1}, \dots, e_{n}\}$ satisfies $v_{k} = ∥ u_{k} ∥ e_{k} + \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j}$ . Stack the $e_{k}$ as columns of $Q$ and define $R$ by $r_{k k} = ∥ u_{k} ∥$ on the diagonal, $r_{j k} = ⟨ v_{k}, e_{j} ⟩$ for $j < k$ , and $r_{j k} = 0$ for $j > k$ . Then $A = QR$ column-by-column, $Q^{*} Q = I_{n}$ by orthonormality, and $r_{k k} > 0$ by linear independence.

Uniqueness. Suppose $A = Q_{1} R_{1} = Q_{2} R_{2}$ are two such factorisations. Then $Q_{2}^{*} Q_{1} = R_{2} R_{1}^{- 1}$ . The right side is upper triangular with positive diagonal, because the inverse of an upper-triangular matrix with positive diagonal is again of that form and so is the product of two such matrices. The left side has orthonormal columns: $(Q_{2}^{*} Q_{1})^{*} (Q_{2}^{*} Q_{1}) = Q_{1}^{*} Q_{2} Q_{2}^{*} Q_{1} = Q_{1}^{*} Q_{1} = I_{n}$ (using $Q_{2} Q_{2}^{*} Q_{1} = Q_{1}$ , which holds because $Q_{2} Q_{2}^{*}$ is the orthogonal projector onto $\operatorname{col} (A)$ and $Q_{1}$ has columns in that subspace). So $M = Q_{2}^{*} Q_{1}$ is an $n \times n$ unitary matrix that is upper triangular with positive real diagonal. Lemma: such a matrix is the identity. Proof of lemma: argue by induction on the row index. Upper-triangularity leaves $m_{11}$ as the only possibly nonzero entry of the first column, and that column has unit norm, so $∣ m_{11} ∣ = 1$ and hence $m_{11} = 1$ (positive real diagonal). The first row also has unit norm, $∣ m_{11} ∣^{2} + \sum_{k > 1} ∣ m_{1k} ∣^{2} = 1$ , so $m_{1k} = 0$ for $k > 1$ . Now suppose rows $1, \dots, k - 1$ already agree with the identity. Column $k$ then has zeros above the diagonal (from those rows) and below it (upper-triangularity), so its unit norm gives $∣ m_{k k} ∣ = 1$ and $m_{k k} = 1$ ; the unit norm of row $k$ then forces $m_{k j} = 0$ for $j > k$ . Hence $M = I_{n}$ . From $Q_{2}^{*} Q_{1} = I_{n}$ we get $Q_{1} = Q_{2} Q_{2}^{*} Q_{1} = Q_{2}$ and then $R_{1} = Q_{1}^{*} A = Q_{2}^{*} A = R_{2}$ . $□$
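
The existence half of the proof is itself an algorithm. The following sketch assembles $Q$ and $R$ exactly as in the construction above, for a real matrix stored as a list of columns. It is a hedged illustration only: the name `qr_gram_schmidt` and the sample matrix are hypothetical, the classical recursion is used because it mirrors the proof (not because it is numerically preferable), and no rank check is performed.

```python
import math

def qr_gram_schmidt(columns):
    """Return (q_cols, r) with A = QR for a real full-column-rank matrix.

    `columns` is the list of columns v_1..v_n of A; `q_cols` is the list of
    columns e_1..e_n of Q; `r` is the n x n upper-triangular factor.
    """
    n = len(columns)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k, v in enumerate(columns):
        u = list(v)
        for j, e in enumerate(q_cols):
            r[j][k] = sum(x * y for x, y in zip(v, e))  # r_jk = <v_k, e_j>, j < k
            u = [ui - r[j][k] * ei for ui, ei in zip(u, e)]
        r[k][k] = math.sqrt(sum(x * x for x in u))      # r_kk = ||u_k||
        q_cols.append([ui / r[k][k] for ui in u])
    return q_cols, r

# Illustrative 3x2 input, stored by columns.
cols = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]
q_cols, r = qr_gram_schmidt(cols)

# Column k of QR is sum_j r[j][k] * e_j; it should reproduce v_k.
rebuilt = [[sum(r[j][k] * q_cols[j][i] for j in range(len(cols)))
            for i in range(3)]
           for k in range(len(cols))]
print(r, rebuilt)
```

The entries of `r` below the diagonal are never written, which is the code-level counterpart of $r_{j k} = 0$ for $j > k$ .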

Proposition (existence of orthonormal basis; corollary). Every finite-dimensional inner-product space $V$ admits an orthonormal basis.

Proof. Pick any basis $\{v_{1}, \dots, v_{n}\}$ of $V$ (existence by 01.01.04). Apply Gram-Schmidt. The output $\{e_{1}, \dots, e_{n}\}$ is orthonormal and spans $V$ since its span equals $\operatorname{span} \{v_{1}, \dots, v_{n}\} = V$ . A spanning orthonormal set in an $n$ -dimensional space is a basis (orthonormal implies linearly independent, $n$ vectors in an $n$ -space). $□$

Connections Master

Vector space 01.01.03. A finite-dimensional inner-product space is a finite-dimensional vector space carrying the extra structure of a positive-definite pairing that is symmetric bilinear over $\mathbb{R}$ (conjugate-symmetric sesquilinear over $\mathbb{C}$ ). Every inner-product-space construction in this unit reduces to vector-space operations together with one extra map $V \times V \to F$ , and the orthonormal basis produced by Gram-Schmidt is in particular a basis in the sense of 01.01.04.
Subspace, basis, dimension 01.01.04. Gram-Schmidt is the constructive proof that every finite-dimensional inner-product space admits an orthonormal basis. The existence of some basis is the Steinitz exchange theorem from 01.01.04; Gram-Schmidt upgrades the existence statement to a constructive recipe in the presence of an inner product, and the span-preservation property at each step refines the bare basis statement to a flag-of-subspaces statement.
Bilinear and quadratic forms 01.01.15. An inner product is a positive-definite symmetric bilinear form (conjugate-symmetric sesquilinear in the complex case), and the norm $∥ v ∥ = \sqrt{⟨ v, v ⟩}$ is the square root of the associated quadratic form. The classification of bilinear forms over $\mathbb{R}$ by signature (Sylvester's law of inertia) singles out the positive-definite case as the unique signature $(n, 0)$ class, and Gram-Schmidt is the diagonalisation procedure for that class.
Singular value decomposition 01.01.12. The SVD $A = U Σ V^{*}$ generalises the QR decomposition $A = QR$ from full-column-rank matrices to arbitrary rectangular matrices and replaces the upper-triangular factor $R$ by the diagonal $Σ$ . The orthonormal singular-vector frames $U$ and $V$ play the same structural role as the Gram-Schmidt output, and the existence proof of SVD uses the spectral theorem on $A^{*} A$ , which itself depends on the existence of an orthonormal eigenbasis — produced by Gram-Schmidt applied to the eigenspace decomposition.
Eigenvalue, eigenvector, characteristic polynomial 01.01.08. The finite-dimensional spectral theorem says every self-adjoint operator on a finite-dimensional inner-product space admits an orthonormal eigenbasis. The standard proof picks a normalised eigenvector, splits the space as $\operatorname{span} \{e_{1}\} \oplus \operatorname{span} \{e_{1}\}^{⊥}$ (orthogonal-complement decomposition supplied by Gram-Schmidt applied to a basis containing $e_{1}$ ), and iterates. Gram-Schmidt is the structural input.

Historical & philosophical context Master

The orthogonalisation procedure now associated with Gram and Schmidt has roots in the least-squares method of Legendre 1805 (Nouvelles méthodes pour la détermination des orbites des comètes) and Gauss 1809 (Theoria motus corporum coelestium). Pierre-Simon Laplace's 1820 Théorie analytique des probabilités used what is recognisably a Gram-Schmidt-like recursion to orthogonalise observation equations in the analysis of geodetic data, though the procedure was not isolated as an independent object. Jørgen Pedersen Gram's 1883 Journal für die reine und angewandte Mathematik paper "Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate" (94, 41–73) ^{[Gram 1883]} gave the procedure in the context of orthogonal polynomial expansions and the moment problem on the real line, with the explicit formula $u_{k} = v_{k} - \sum_{j < k} ⟨ v_{k}, e_{j} ⟩ e_{j}$ as the iterated-projection recursion. Gram's motivation was the least-squares fitting of polynomial approximations to empirical data; the orthogonalisation step was the device that decoupled the normal equations.

Erhard Schmidt's 1907 Mathematische Annalen paper "Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil" (63, 433–476) ^{[Schmidt 1907]} reframed the procedure in the abstract Hilbert-space setting that David Hilbert was then developing for the spectral theory of integral equations. Schmidt's paper introduced the orthonormal basis of $L^{2}$ as a foundational object, applied Gram-Schmidt to produce orthonormal bases out of arbitrary linearly independent sequences in Hilbert space, and used the resulting Fourier-coefficient expansion to diagonalise compact self-adjoint integral operators (the Hilbert-Schmidt theorem). The textbook-standard formulation of the procedure dates from this paper. The attribution "Gram-Schmidt" is conventional; both Gram and Schmidt acknowledged Laplace as a precursor, and the German-language tradition often called the procedure simply "Orthogonalisierungsverfahren."

The numerical-analysis story is twentieth-century. James H. Wilkinson's 1965 The Algebraic Eigenvalue Problem identified the loss-of-orthogonality phenomenon in floating-point Gram-Schmidt as a serious obstacle to scientific computation. Alston Householder's 1958 Journal of the ACM paper "Unitary triangularization of a nonsymmetric matrix" (5, 339–342) ^{[Householder 1958]} introduced the reflector-based QR algorithm whose backward-stability bounds are independent of the conditioning of $A$ . Wallace Givens's 1958 Journal of the SIAM paper "Computation of plane unitary rotations transforming a general matrix to triangular form" (6, 26–50) ^{[Givens 1958]} introduced the rotation-based QR for sparse matrices. Åke Björck's 1967 BIT Numerical Mathematics paper "Solving linear least squares problems by Gram-Schmidt orthogonalization" (7, 1–21) ^{[Björck 1967]} presented the modified Gram-Schmidt procedure together with its backward-error analysis, establishing the $κ (A) \cdot ϵ_{machine}$ loss-of-orthogonality bound that justifies its use in Krylov-subspace methods. Golub-Van Loan's Matrix Computations (first edition 1983; fourth edition 2013) ^{[GolubVanLoan2013]} became the canonical synthesis. The QR algorithm for eigenvalue computation (Francis 1961, Kublanovskaya 1961) — distinct from the QR decomposition but sharing the orthogonal-triangular factorisation as its inner step — completed the place of orthonormalisation as the foundational primitive of numerical linear algebra.

Bibliography Master

@article{Gram1883,
  author  = {Gram, J. P.},
  title   = {Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate},
  journal = {Journal f\"ur die reine und angewandte Mathematik},
  volume  = {94},
  year    = {1883},
  pages   = {41--73}
}

@article{Schmidt1907,
  author  = {Schmidt, Erhard},
  title   = {Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil},
  journal = {Mathematische Annalen},
  volume  = {63},
  year    = {1907},
  pages   = {433--476}
}

@article{Householder1958,
  author  = {Householder, A. S.},
  title   = {Unitary triangularization of a nonsymmetric matrix},
  journal = {Journal of the ACM},
  volume  = {5},
  year    = {1958},
  pages   = {339--342}
}

@article{Givens1958,
  author  = {Givens, W.},
  title   = {Computation of plane unitary rotations transforming a general matrix to triangular form},
  journal = {Journal of the Society for Industrial and Applied Mathematics},
  volume  = {6},
  year    = {1958},
  pages   = {26--50}
}

@article{Bjorck1967,
  author  = {Bj\"orck, {\AA}ke},
  title   = {Solving linear least squares problems by Gram-Schmidt orthogonalization},
  journal = {BIT Numerical Mathematics},
  volume  = {7},
  year    = {1967},
  pages   = {1--21}
}

@book{GolubVanLoan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{HalmosFDVS,
  author    = {Halmos, Paul R.},
  title     = {Finite-Dimensional Vector Spaces},
  edition   = {2},
  publisher = {Van Nostrand},
  year      = {1958}
}

@book{ShilovLinearAlgebra,
  author    = {Shilov, Georgi E.},
  title     = {Linear Algebra},
  publisher = {Dover Publications},
  year      = {1977},
  note      = {Reprint of the 1971 Prentice-Hall edition; original Russian edition 1969}
}

@book{Axler2015,
  author    = {Axler, Sheldon},
  title     = {Linear Algebra Done Right},
  edition   = {3},
  publisher = {Springer},
  year      = {2015}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  year      = {1997}
}

@book{Bjorck1996,
  author    = {Bj\"orck, {\AA}ke},
  title     = {Numerical Methods for Least Squares Problems},
  publisher = {SIAM},
  year      = {1996}
}

@book{Wilkinson1965,
  author    = {Wilkinson, James H.},
  title     = {The Algebraic Eigenvalue Problem},
  publisher = {Clarendon Press},
  year      = {1965}
}

@book{Laplace1820,
  author    = {Laplace, Pierre-Simon},
  title     = {Th\'eorie analytique des probabilit\'es},
  edition   = {3},
  publisher = {Courcier},
  year      = {1820}
}

Prerequisites

01.01.03
01.01.04
01.01.15

Tier anchors

beginner: 3Blue1Brown — *Essence of Linear Algebra* Ch. 9 (dot product and duality); Strang — *Introduction to Linear Algebra* Ch. 4 (orthogonality)
intermediate: Shilov — *Linear Algebra* §8.3–§8.5 (orthogonalisation procedure); Axler — *Linear Algebra Done Right* §6.B (Gram-Schmidt, orthonormal bases)
master: Shilov — *Linear Algebra* Ch. 8; Halmos — *Finite-Dimensional Vector Spaces* §I.6 and §II.13; Axler — *Linear Algebra Done Right* Ch. 6; Golub-Van Loan — *Matrix Computations* (4th ed.) Ch. 5; Trefethen-Bau — *Numerical Linear Algebra* Lectures 7–10; Björck — *Numerical Methods for Least Squares Problems* (SIAM, 1996) Ch. 2

References

images/Shilov-Linear-Algebra__4cbdee00cc.jpg · Shilov *Linear Algebra* — Fast Track archive cover; Ch. 8 inner-product spaces and Gram-Schmidt orthogonalisation procedure
linear-algebra_compress.pdf · linear-algebra (compressed reference) — orthogonal projection, Gram-Schmidt, and QR decomposition chapters
Gram, J. P. — Ueber die Entwickelung reeller Functionen in Reihen mittelst der Methode der kleinsten Quadrate · Journal für die reine und angewandte Mathematik 94 (1883), 41–73 — the original orthogonalisation recursion in the context of least-squares interpolation and moment problems
Schmidt, E. — Zur Theorie der linearen und nichtlinearen Integralgleichungen, I. Teil · Mathematische Annalen 63 (1907), 433–476 — orthonormalisation in the Hilbert-space theory of integral equations; the form now textbook-standard
Householder, A. S. — Unitary triangularization of a nonsymmetric matrix · Journal of the ACM 5 (1958), 339–342 — reflector-based QR decomposition as a backwards-stable alternative to classical Gram-Schmidt
Givens, W. — Computation of plane unitary rotations transforming a general matrix to triangular form · Journal of the SIAM 6 (1958), 26–50 — rotation-based QR for the same triangularisation task
Björck, Å. — Solving linear least squares problems by Gram-Schmidt orthogonalization · BIT Numerical Mathematics 7 (1967), 1–21 — modified Gram-Schmidt and its backward error analysis
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Ch. 5 (orthogonalization and least squares): classical and modified Gram-Schmidt, Householder, Givens, QR with column pivoting
Halmos, P. R. — Finite-Dimensional Vector Spaces (2nd ed.) · §I.6 (inner products), §II.13 (orthonormal bases and Gram-Schmidt) — the canonical functional-analytic packaging
Axler, S. — Linear Algebra Done Right (3rd ed.) · Ch. 6, §6.A inner products and norms, §6.B orthonormal bases and the Gram-Schmidt procedure

Estimated time

beginner: 18m
intermediate: 40m
master: 75m