43.03.02 · numerical-analysis / 03-direct-linear-solvers

Cholesky factorization and the symmetric positive-definite solve

shipped3 tiersLean: none

Anchor (Master): Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 10 (the backward-error analysis of Cholesky, the unconditional stability with growth factor 1, the SPD perturbation theory); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §4.2-4.4; Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 23

Intuition Beginner

Some matrices are special in a way that makes solving with them much easier. A symmetric positive-definite matrix is one that is its own mirror image across the diagonal, and that always returns a positive number when you sandwich any nonzero vector around it. These matrices show up everywhere: in the energy of a vibrating structure, in the covariance of a dataset, in the normal equations of a least-squares fit. For them, the general factorization of 43.03.01 can be sharpened into something cleaner and cheaper.

The clean version is the Cholesky factorization. Instead of two different triangular pieces $L$ and $U$ , a symmetric positive-definite matrix splits into a single triangular piece times its own transpose: $A = R^{⊤} R$ . One triangle does all the work, because the symmetry of $A$ is carried inside the single factor. You can think of $R$ as the square root of $A$ in a triangular shape — and like an ordinary square root of a positive number, it exists only because $A$ is positive in the right sense.

Because one factor does the job of two, Cholesky costs about half what general elimination costs. And there is a deeper bonus. With ordinary elimination you had to watch for tiny pivots and reorder rows to stay safe, the partial pivoting of 43.03.01. Cholesky never needs that. The positive-definite structure guarantees that the numbers you divide by are never dangerously small relative to the rest, so the method is safe for every such matrix, with no reordering and no luck involved.

There is even a free diagnostic hidden here. If you try to run Cholesky on a matrix and at some point you are asked to take the square root of a negative number, the matrix was not positive-definite after all. The factorization succeeds exactly when the matrix is the special kind, so running it is itself the test.

Visual Beginner

The picture is a square symmetric matrix folding into a single triangle: the upper-right triangle $R$ , which when multiplied by its own flipped copy rebuilds the whole square.

Read the table top to bottom. You compute the triangle one row at a time. The first diagonal entry of $R$ is the square root of the first diagonal entry of $A$ . Each later entry is found by undoing the contributions already accounted for and then, on the diagonal, taking a square root. Because $A$ is positive-definite, every square root is of a positive number, so the process never stalls.

stage	what you compute	why it is safe
start	the symmetric square $A$	$A$ returns positive values
first diagonal	square root of top-left entry	that entry is positive
first row of $R$	divide the top row by the first diagonal	the divisor is not tiny
next diagonal	subtract known parts, take square root	the leftover stays positive
finish	the full upper triangle $R$ with $A = R^{⊤} R$	every step stayed positive

The takeaway: where general elimination produced two triangles and needed pivoting to stay safe, the positive-definite structure collapses the answer into one triangle and removes the need to pivot at all.

Worked example Beginner

Let us factor a small symmetric positive-definite matrix by hand and watch the single triangle $R$ appear.

Take $$ A = \begin{pmatrix} 4 & 2 & 2 \ 2 & 5 & 7 \ 2 & 7 & 19 \end{pmatrix}. $$ This matrix is symmetric (it equals its own transpose) and, as the factorization will confirm, positive-definite. We build the upper triangle $R$ so that $A = R^{⊤} R$ .

Step 1. First diagonal entry. The top-left entry of $A$ is $4$ , so $R_{11} = 4 = 2$ .

Step 2. First row of $R$ . Divide the rest of the top row of $A$ by $R_{11}$ . So $R_{12} = 2/2 = 1$ and $R_{13} = 2/2 = 1$ .

Step 3. Second diagonal entry. Take the $(2, 2)$ entry of $A$ , which is $5$ , and subtract the square of what is already above it in that column of $R$ , namely $R_{12}^{2} = 1$ . Then $R_{22} = 5 - 1 = 4 = 2$ .

Step 4. Rest of the second row. The $(2, 3)$ entry of $A$ is $7$ . Subtract $R_{12} \times R_{13} = 1 \times 1 = 1$ , then divide by $R_{22} = 2$ : so $R_{23} = (7 - 1) /2 = 3$ .

Step 5. Third diagonal entry. The $(3, 3)$ entry of $A$ is $19$ . Subtract $R_{13}^{2} + R_{23}^{2} = 1 + 9 = 10$ . Then $R_{33} = 19 - 10 = 9 = 3$ .

Reading off the triangle, $$ R = \begin{pmatrix} 2 & 1 & 1 \ 0 & 2 & 3 \ 0 & 0 & 3 \end{pmatrix}. $$

Check the bottom-right corner: $R^{⊤} R$ in position $(3, 3)$ is $1^{2} + 3^{2} + 3^{2} = 1 + 9 + 9 = 19$ , the original entry. Every square root we took — of $4$ , of $4$ , of $9$ — was of a positive number.

What this tells us: the symmetric square $A$ has been rebuilt from one upper triangle. No row swaps were needed, and the fact that every square root was of a positive value is the running certificate that $A$ really was positive-definite.

Check your understanding Beginner

Formal definition Intermediate+

Let $A \in F^{n \times n}$ with $F = R$ or $C$ be Hermitian positive-definite (SPD when $F = R$ ): $A = A^{*}$ and $x^{*} A x > 0$ for every nonzero $x \in F^{n}$ , the quadratic-form positivity of 01.01.15. A Cholesky factorization of $A$ is a factorization $$ A = R^{} R, $$ where $R \in F^{n \times n}$ is upper triangular with real positive diagonal entries $R_{k k} > 0$ . (The equivalent lower-triangular convention writes $A = LL^{} $w i t h$ L = R^{*} $; t hi s u ni t u ses t h e u pp er - t r ian g u l a r$ R$ throughout.)

Existence and uniqueness. For every Hermitian positive-definite $A$ the factorization $A = R^{*} R$ exists and is unique among upper-triangular $R$ with positive diagonal. The factorization is the LU factorization of 43.03.01 specialized and symmetrized: writing the symmetric pivoting-free elimination $A = L D L^{*}$ with $L$ unit lower triangular and $D = diag (d_{1}, \dots, d_{n})$ , positive-definiteness forces each pivot $d_{k} > 0$ , so $D^{1/2}$ is real and $R = D^{1/2} L^{*}$ is upper triangular with positive diagonal $R_{k k} = d_{k}$ . The pivot $d_{k}$ equals the ratio of consecutive leading principal minors $det A_{k} / det A_{k - 1}$ , all of which are positive for an SPD matrix by Sylvester's criterion 01.01.15.

The algorithm. Equating entries of $A = R^{*} R$ column by column gives the recurrences, for $k = 1, \dots, n$ and $j > k$ , $$ R_{kk} = \Big(a_{kk} - \sum_{i=1}^{k-1} |R_{ik}|^2\Big)^{1/2}, \qquad R_{kj} = \frac{1}{R_{kk}}\Big(a_{kj} - \sum_{i=1}^{k-1} \overline{R_{ik}},R_{ij}\Big). $$ Each diagonal entry subtracts the already-computed column contributions and takes a square root; each off-diagonal entry subtracts the analogous cross terms and divides by the fresh diagonal. The argument of every square root is positive precisely because $A$ is positive-definite, so the algorithm never breaks down. A failure — a nonpositive radicand — is a certificate that $A$ is not positive-definite, which is the standard numerical test for definiteness.

Operation count. Cholesky forms $R$ in $\frac{1}{3} n^{3} + O (n^{2})$ floating-point operations, half the $\frac{2}{3} n^{3}$ of general LU, because symmetry lets the algorithm compute and store only one triangle. Solving the SPD system $A x = b$ then proceeds in two triangular sweeps — forward substitution $R^{*} y = b$ and back substitution $R x = y$ — each costing $n^{2} + O (n)$ operations, exactly as in 43.03.01.

Counterexamples to common slips

Symmetry alone does not suffice. The matrix $(1221)$ is symmetric but indefinite (its determinant is $- 3 < 0$ ), so $R_{22} = 1 - 4 = - 3$ is not real and no Cholesky factorization exists. Positive-definiteness, not symmetry, is the hypothesis that makes the square roots real.
Cholesky is not LU with $U = L^{⊤}$ for any symmetric matrix. For a symmetric indefinite matrix the symmetric reduction produces $A = L D L^{*}$ with some negative $d_{k}$ , so $D^{1/2}$ is complex and the real Cholesky factor does not exist; the indefinite case needs the block $L D L^{*}$ of Bunch-Kaufman, treated forward in 43.03.08, not Cholesky.
Positive diagonal is part of the definition, not a consequence to be hoped for. Dropping the positive-diagonal normalization destroys uniqueness: $R$ and $D R$ with $D$ a diagonal sign/phase matrix give the same $A = R^{*} R$ unless the diagonal is pinned positive. The convention $R_{k k} > 0$ is what makes the factor unique.
A nonsingular symmetric matrix need not be positive-definite. $(- 1 0 0 - 1)$ is symmetric, nonsingular, and has nonzero leading minors, yet is negative-definite; Cholesky fails at the first step. Invertibility is unrelated to the SPD hypothesis.

Key theorem with proof Intermediate+

The signature result is that Cholesky exists exactly for positive-definite matrices, runs without pivoting, and — the payoff that distinguishes it from general elimination — is backward stable for every SPD matrix, with no growth factor to fear.

Theorem (existence, uniqueness, and unconditional stability of Cholesky). Let $A \in F^{n \times n}$ be Hermitian. Then:

(Existence and uniqueness.) $A$ is positive-definite if and only if it admits a factorization $A = R^{*} R$ with $R$ upper triangular and positive diagonal, in which case $R$ is unique.
(No growth.) During the Cholesky reduction every entry of every active submatrix is bounded in magnitude by the largest diagonal entry of $A$ ; consequently the growth factor in the sense of 43.03.01 satisfies $ρ_{n} \leq 1$ , uniformly over all SPD matrices and independent of $n$ .
(Backward stability.) In the standard floating-point model of 43.01.01 with unit roundoff $ϵ_{mach}$ , the computed factor $\tilde{R}$ satisfies $\tilde{R}^{*} \tilde{R} = A + E$ with $∥ E ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ for a low-degree polynomial $c_{n}$ — and the computed solution $\tilde{x}$ of $A x = b$ satisfies $(A + Δ A) \tilde{x} = b$ with the same order of perturbation, so Cholesky is backward stable for every SPD matrix ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Proof. (Existence.) Induct on $n$ . For $n = 1$ , positive-definiteness means $a_{11} > 0$ , so $R = (a_{11})$ works. For $n > 1$ , partition $$ A = \begin{pmatrix} \alpha & w^{} \ w & B \end{pmatrix}, \qquad \alpha = a_{11} > 0,\ w \in \mathbb{F}^{n-1},\ B \in \mathbb{F}^{(n-1)\times(n-1)}. $$ Set $r_{11} = α$ and $s = w / α$ . Then a direct block multiplication gives $$ A = \begin{pmatrix} \sqrt{\alpha} & 0 \ s & I \end{pmatrix} \begin{pmatrix} 1 & 0 \ 0 & S \end{pmatrix} \begin{pmatrix} \sqrt{\alpha} & s^{} \ 0 & I \end{pmatrix}, \qquad S = B - \frac{w w^{}}{\alpha}, $$ where $S$ is the Schur complement of $α$ in $A$ . The Schur complement of a positive-definite matrix is positive-definite: for nonzero $u \in F^{n - 1}$ , choosing $t = -w^{}u/\alpha $an d t es t in g$ A $a g ain s t$ (t, u^{\top})^{\top} $g i v es$ u^{} S u = (t, u^{}) A (t, u)^{\top} > 0 $. B y t h e in d u c t i v e h y p o t h es i s$ S = R_S^{} R_S $w i t h$ R_S$ upper triangular and positive diagonal. Assembling, $$ R = \begin{pmatrix} \sqrt{\alpha} & s^{} \ 0 & R_S \end{pmatrix} $$ is upper triangular with positive diagonal and satisfies $R^{*} R = A$ . The converse is immediate: if $A = R^{*} R$ with $R$ nonsingular then $x^{*} A x = ∥ R x ∥_{2}^{2} > 0$ for $x \neq = 0$ , so $A$ is positive-definite.

(Uniqueness.) If $A = R_{1}^{*} R_{1} = R_{2}^{*} R_{2}$ with both factors upper triangular and positive-diagonal, then $R_{2} R_{1}^{- 1} = R_{2}^{-*} R_{1}^{*} = (R_{1} R_{2}^{- 1})^{*}$ . The left side is upper triangular and the right side is lower triangular, so $Q := R_{2} R_{1}^{- 1}$ is diagonal; from $Q = Q^{*}$ and $A = R_{1}^{*} Q^{*} Q R_{1}$ giving $Q^{*} Q = I$ , the diagonal $Q$ is unitary with positive diagonal, hence $Q = I$ and $R_{1} = R_{2}$ .

(No growth.) The active submatrix after eliminating $k - 1$ columns is the Schur complement $S^{(k)}$ , which is positive-definite by the argument above. For a positive-definite matrix every entry is bounded by the diagonal: $∣ S_{ij}^{(k)} ∣^{2} \leq S_{ii}^{(k)} S_{j j}^{(k)}$ by positivity of the $2 \times 2$ principal minor, so $∣ S_{ij}^{(k)} ∣ \leq max_{i} S_{ii}^{(k)}$ . Moreover the diagonal cannot grow, since $S_{ii}^{(k + 1)} = S_{ii}^{(k)} - ∣ S_{i, k}^{(k)} ∣^{2} / S_{k, k}^{(k)} \leq S_{ii}^{(k)}$ , so $max_{i} S_{ii}^{(k)} \leq max_{i} a_{ii}$ for all $k$ . Hence no intermediate entry exceeds $max_{i} a_{ii} \leq max_{ij} ∣ a_{ij} ∣$ , giving $ρ_{n} \leq 1$ .

(Backward stability.) Apply the standard model to the recurrences. The computed entries satisfy $\tilde{R}_{k k} = fl ((a_{k k} - \sum_{i} ∣ \tilde{R}_{ik} ∣^{2})^{1/2})$ and analogously for $\tilde{R}_{k j}$ , each rounding step contributing a relative perturbation of size $ϵ_{mach}$ . Collecting these by the product-of-factors lemma of 43.01.01, the computed factor exactly factors a perturbed matrix, $\tilde{R}^{*} \tilde{R} = A + E$ , where each entry $E_{ij}$ accumulates at most $n$ rounding terms, each bounded by $γ_{n} ∣ \tilde{R}^{*} ∣ ∣ \tilde{R} ∣$ entrywise with $γ_{n} = n ϵ_{mach} / (1 - n ϵ_{mach})$ . Because $ρ_{n} \leq 1$ , the entries of $\tilde{R}$ are bounded by $(max_{i} a_{ii})^{1/2}$ , so $∥ ∣ \tilde{R}^{*} ∣ ∣ \tilde{R} ∣ ∥_{2} \leq c_{n} ∥ A ∥_{2}$ with no growth-factor inflation, giving $∥ E ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ . The triangular backward-error result of 43.01.03 adds a perturbation of the same order to the two solves, combining into a single $Δ A$ of the stated size. $□$

Bridge. This theorem is the foundational reason the symmetric positive-definite solve is the safest dense factorization in numerical linear algebra: the growth factor that made 43.03.01 only conditionally stable is here bounded a priori by $1$ , so the master inequality of 43.01.03 specializes with the algorithm's contribution removed entirely. This is exactly the structure of the GEPP backward-error theorem with $ρ_{n}$ — the single quantity that could go wrong there — pinned to its best possible value by positive-definiteness, and the central insight is that the Schur complement of an SPD matrix is again SPD, so the diagonal-dominance-of-self propagates through every step. The result builds toward the conjugate gradient method 43.07.04, whose energy-norm minimization rests on the same SPD structure and whose preconditioners are built from incomplete Cholesky factors, and it appears again in the least-squares comparison 43.04.01, where the normal-equations route solves $A^{*} A x = A^{*} b$ by Cholesky and pays for its speed with a squared condition number. Putting these together, where general elimination gambles on the growth factor and wins in expectation, Cholesky generalises that gamble into a certainty for the positive-definite class, and the bridge is that the same positivity that defines the matrix is what bounds its rounding.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Prove that if $A$ is Hermitian positive-definite then so is its Schur complement $S = B - w w^{*} / α$ , where $A = (α w w^{*} B)$ with $α = a_{11}$ .

Hint

For a candidate vector $u$ , choose the first coordinate $t$ to make the cross terms with $w$ cancel, then read off the positivity of $A$ as positivity of $S$ .

Answer

Let $u \in F^{n - 1}$ be nonzero and set $t = - w^{*} u / α$ . Form the vector $z = (t, u^{⊤})^{⊤}$ . Then $$ z^{}Az = \alpha|t|^2 + t,\overline{w^{}u}/1 \cdot \text{(conjugate terms)} + u^{}Bu = \alpha |t|^2 + t,u^{}w + \bar t, w^{}u + u^{}Bu. $$ Substituting $t = - w^{*} u / α$ , the cross terms give $- 2∣ w^{*} u ∣^{2} / α$ and $α ∣ t ∣^{2} = ∣ w^{*} u ∣^{2} / α$ , so $z^{*} A z = u^{*} B u - ∣ w^{*} u ∣^{2} / α = u^{*} (B - w w^{*} / α) u = u^{*} S u$ . Since $A$ is positive-definite and $z \neq = 0$ (its tail $u$ is nonzero), $z^{*} A z > 0$ , hence $u^{*} S u > 0$ . As $u$ was arbitrary nonzero, $S$ is positive-definite; $S = S^{*}$ is inherited from $A = A^{*}$ .

Exercise 5 (medium, symbolic).

Show that for a Hermitian positive-definite matrix every entry is bounded by the largest diagonal entry: $∣ a_{ij} ∣ \leq max_{k} a_{k k}$ . Explain why this is the source of the growth-factor bound $ρ_{n} \leq 1$ .

Hint

Positive-definiteness forces every $2 \times 2$ principal minor to be positive.

Answer

For indices $i \neq = j$ , the $2 \times 2$ principal submatrix $(a_{ii} \overline{a_{ij}} a_{ij} a_{j j})$ is positive-definite (a principal submatrix of an SPD matrix), so its determinant is positive: $a_{ii} a_{j j} - ∣ a_{ij} ∣^{2} > 0$ , giving $∣ a_{ij} ∣^{2} < a_{ii} a_{j j} \leq (max_{k} a_{k k})^{2}$ , hence $∣ a_{ij} ∣ \leq max_{k} a_{k k}$ ; the diagonal case is immediate. Because each Schur complement in Cholesky is again SPD and its diagonal does not increase, the same bound holds at every step, so no entry ever exceeds the original maximum diagonal: the growth factor satisfies $ρ_{n} \leq 1$ , which is why Cholesky needs no pivoting and is unconditionally stable.

Exercise 7 (hard, symbolic).

Prove the uniqueness half of the Cholesky theorem: if $A = R_{1}^{*} R_{1} = R_{2}^{*} R_{2}$ with both $R_{j}$ upper triangular and positive-diagonal, then $R_{1} = R_{2}$ .

Hint

Form $Q = R_{2} R_{1}^{- 1}$ and show it is simultaneously upper and lower triangular, hence diagonal, then unitary with positive diagonal.

Answer

From $R_{1}^{*} R_{1} = R_{2}^{*} R_{2}$ and the nonsingularity of both factors (positive diagonal entries), $R_{2} R_{1}^{- 1} = R_{2}^{-*} R_{1}^{*}$ . Set $Q = R_{2} R_{1}^{- 1}$ : the left expression is a product of upper-triangular matrices, hence upper triangular, while the right expression $R_{2}^{-*} R_{1}^{*}$ is a product of lower-triangular matrices, hence lower triangular. A matrix that is both upper and lower triangular is diagonal, so $Q$ is diagonal. From $Q = R_{2} R_{1}^{- 1}$ we get $R_{2} = Q R_{1}$ , so $A = R_{2}^{*} R_{2} = R_{1}^{*} Q^{*} Q R_{1}$ , and comparing with $A = R_{1}^{*} R_{1}$ and cancelling the nonsingular $R_{1}^{*}, R_{1}$ gives $Q^{*} Q = I$ . A diagonal $Q$ with $Q^{*} Q = I$ has unit-modulus diagonal entries; but the diagonal of $Q = R_{2} R_{1}^{- 1}$ is the ratio of the positive diagonals of $R_{2}$ and $R_{1}$ , hence positive real. A positive real number of modulus $1$ is $1$ , so $Q = I$ and $R_{2} = R_{1}$ .

Exercise 8 (hard, symbolic).

Let $A$ be SPD with Cholesky factor $R$ . Show that solving the SPD system $A x = b$ via Cholesky achieves a forward error governed by the same condition number as the system itself, with no growth-factor penalty: $∥ \tilde{x} - x ∥/∥ x ∥ = O (κ_{2} (A) ϵ_{mach})$ , where $κ_{2} (A) = λ_{m a x} (A) / λ_{m i n} (A)$ .

Hint

Combine the backward-error theorem ( $ρ_{n} \leq 1$ ) with the standard perturbation bound, and use the spectral characterization $κ_{2} (A) = λ_{m a x} / λ_{m i n}$ from the spectral theorem 01.01.13.

Answer

By the theorem, the computed solution satisfies $(A + Δ A) \tilde{x} = b$ with $∥Δ A ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ , the bound carrying no growth factor because $ρ_{n} \leq 1$ . The standard perturbation result for $A x = b$ gives $∥ \tilde{x} - x ∥/∥ x ∥ \leq κ_{2} (A) ∥Δ A ∥_{2} /∥ A ∥_{2} / (1 - κ_{2} (A) ∥Δ A ∥_{2} /∥ A ∥_{2})$ . Substituting the backward error yields $∥ \tilde{x} - x ∥/∥ x ∥ = O (κ_{2} (A) c_{n} ϵ_{mach})$ . Because $A$ is Hermitian, the spectral theorem 01.01.13 gives $∥ A ∥_{2} = λ_{m a x} (A)$ and $∥ A^{- 1} ∥_{2} = 1/ λ_{m i n} (A)$ , so $κ_{2} (A) = λ_{m a x} (A) / λ_{m i n} (A)$ , a ratio of positive eigenvalues. The forward error is thus the problem's conditioning times machine epsilon, with the algorithm contributing nothing beyond the polynomial $c_{n}$ — the best a direct solver can do.

Advanced results Master

Theorem 1 (Cholesky, $L D L^{*}$ , and the symmetric reduction). For Hermitian positive-definite $A$ , the Cholesky factorization $A = R^{*} R$ , the root-free symmetric factorization $A = L D L^{*}$ with $L$ unit lower triangular and $D$ diagonal positive, and the symmetric Gaussian elimination of 43.03.01 all coincide as descriptions of the same data: $R = D^{1/2} L^{*}$ , $d_{k} = R_{k k}^{2} = det A_{k} / det A_{k - 1}$ , and $L_{ik}$ is the elimination multiplier. The $L D L^{*}$ form avoids the $n$ square roots of Cholesky and is preferred when square roots are costly or when one wants to defer the positive-definiteness test to the signs of the $d_{k}$ ; for a general symmetric indefinite matrix the diagonal $D$ is replaced by a block-diagonal $D$ with $1 \times 1$ and $2 \times 2$ pivots, the Bunch-Kaufman factorization forward-pointed in 43.03.08, whose symmetric pivoting controls growth without destroying symmetry ^{[Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)]}.

Theorem 2 (unconditional backward stability, sharp form). Let $\tilde{R}$ be the Cholesky factor of SPD $A$ computed in floating-point arithmetic with unit roundoff $ϵ_{mach}$ , and let $\tilde{x}$ be the computed solution of $A x = b$ from the two triangular sweeps. Then there is a perturbation $Δ A$ with $$ (A + \Delta A)\tilde{x} = b, \qquad |\Delta A|2 \le \frac{c,n,\epsilon{\mathrm{mach}}}{1 - c,n,\epsilon_{\mathrm{mach}}},|A|_2, $$ for a modest constant $c$ , with no dependence on any growth factor or condition number. The result holds for every SPD matrix without exception, which is the precise sense in which Cholesky is unconditionally backward stable, in contrast to the only conditionally stable GEPP of 43.03.01 whose bound carries the $ρ_{n}$ that can in principle reach $2^{n - 1}$ . A practical corollary is that for SPD systems one need never inspect a growth factor: the factorization is guaranteed safe by the input class alone ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Theorem 3 (Cholesky as a definiteness test and the semidefinite boundary). Running the Cholesky algorithm on a Hermitian matrix is a backward-stable test of positive-definiteness: the algorithm completes with all radicands positive if and only if $A$ is positive-definite, and the first nonpositive radicand localizes the failure of Sylvester's criterion to a specific leading minor. On the boundary, a positive semidefinite $A$ of rank $r$ admits a Cholesky factorization with a pivoted permutation $P^{⊤} A P = R^{*} R$ where $R$ has exactly $r$ nonzero rows; symmetric pivoting on the largest remaining diagonal entry makes this rank-revealing and numerically stable, and the trailing $n - r$ zero pivots expose the null space. The semidefinite case is the bridge from the strict SPD solve to rank-deficient least squares and to the low-rank approximation of covariance matrices ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]}.

Theorem 4 (the normal-equations route and its conditioning cost). For a full-column-rank $A \in F^{m \times n}$ ( $m \geq n$ ), the least-squares problem $min_{x} ∥ A x - b ∥_{2}$ has the normal equations $A^{*} A x = A^{*} b$ with SPD coefficient matrix $A^{*} A$ , solvable by Cholesky in $\frac{1}{3} n^{3} + O (m n^{2})$ operations — the fastest of the three classical least-squares solvers. The price is conditioning: $κ_{2} (A^{*} A) = κ_{2} (A)^{2}$ , so forming and factoring $A^{*} A$ can lose twice as many digits as the orthogonal QR or SVD routes. Cholesky on the normal equations is the method of choice when $A$ is well-conditioned and speed matters, and the squared-conditioning warning is exactly the dividing line at which QR replaces it, the comparison assembled in 43.04.01 ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra]}.

Synthesis. The Cholesky factorization is the foundational reason the symmetric positive-definite solve is treated as the gold standard among direct methods: the single scalar that made 43.03.01 only conditionally trustworthy, the growth factor $ρ_{n}$ , is here pinned to $ρ_{n} \leq 1$ a priori by the positive-definite structure, so the master inequality of 43.01.03 specializes with the algorithm's risk term removed and only the problem's conditioning $κ (A)$ of 43.01.02 surviving. This is exactly the structure of the GEPP backward-error theorem with its worst case dissolved, and the central insight — that the Schur complement of an SPD matrix is again SPD, with its diagonal nonincreasing — is what propagates the entry bound $∣ a_{ij} ∣ \leq max_{k} a_{k k}$ through every step. The factorization generalises in two directions that organize the rest of the chapter: dropping square roots gives the $L D L^{*}$ form and, for the indefinite case, the block Bunch-Kaufman pivoting of 43.03.08; relaxing strict definiteness gives the rank-revealing pivoted Cholesky of the semidefinite boundary. It is dual to the iterative world, where the same SPD hypothesis powers the conjugate gradient method 43.07.04 and where incomplete Cholesky supplies its standard preconditioner; and it feeds the least-squares comparison 43.04.01, where the normal-equations route trades a squared condition number for Cholesky's speed. Putting these together, where general elimination gambles on the growth factor and wins in expectation, the positive-definite class converts that gamble into a theorem, and the bridge is that the very positivity defining the matrix is what bounds the rounding committed in factoring it.

Full proof set Master

Proposition 1 (existence via the Schur complement recursion). Let $A \in F^{n \times n}$ be Hermitian positive-definite. Then $A = R^{}R $f or an u pp er - t r ian g u l a r$ R$ with positive diagonal, constructed recursively through Schur complements.*

Proof. Induct on $n$ . The base case $n = 1$ gives $a_{11} > 0$ and $R = (a_{11})$ . For $n > 1$ partition $A = (α w w^{*} B)$ with $α = a_{11} > 0$ . Define $S = B - w w^{*} / α$ , the Schur complement. For nonzero $u \in F^{n - 1}$ , set $t = - w^{*} u / α$ and $z = (t, u^{⊤})^{⊤} \neq = 0$ ; expanding $z^{*} A z = α ∣ t ∣^{2} + 2 Re (\overset{ˉ}{t} w^{*} u) + u^{*} B u$ and substituting $t$ gives $z^{*} A z = u^{*} B u - ∣ w^{*} u ∣^{2} / α = u^{*} S u$ , which is positive because $A$ is positive-definite; thus $S$ is positive-definite (and Hermitian, inherited from $A$ ). By the inductive hypothesis $S = R_{S}^{*} R_{S}$ with $R_{S}$ upper triangular, positive diagonal. Put $R = (α 0 w^{*} / α R_{S})$ . Then $$ R^{}R = \begin{pmatrix} \alpha & w^{} \ w & ww^{}/\alpha + R_S^{}R_S \end{pmatrix} = \begin{pmatrix} \alpha & w^{} \ w & ww^{}/\alpha + S \end{pmatrix} = \begin{pmatrix} \alpha & w^{*} \ w & B \end{pmatrix} = A, $$ and $R$ is upper triangular with positive diagonal ${α} \cup diag (R_{S})$ . $□$

Proposition 2 (uniqueness from positive-diagonal normalization). The factor $R$ of Proposition 1 is unique among upper-triangular matrices with positive diagonal.

Proof. Suppose $A = R_{1}^{*} R_{1} = R_{2}^{*} R_{2}$ with both factors upper triangular and positive-diagonal, hence nonsingular. Then $R_{2} R_{1}^{- 1} = R_{2}^{-*} R_{1}^{*}$ . The left side is upper triangular (product of upper-triangular matrices and an upper-triangular inverse); the right side is lower triangular. A matrix simultaneously upper and lower triangular is diagonal, so $Q := R_{2} R_{1}^{- 1}$ is diagonal. From $R_{2} = Q R_{1}$ and $R_{2}^{*} R_{2} = R_{1}^{*} R_{1}$ we get $R_{1}^{*} Q^{*} Q R_{1} = R_{1}^{*} R_{1}$ , and cancelling $R_{1}^{*}$ and $R_{1}$ gives $Q^{*} Q = I$ , so $Q$ is unitary. A diagonal unitary matrix has unit-modulus entries; but the $k$ -th diagonal entry of $Q$ is $(R_{2})_{k k} / (R_{1})_{k k}$ , a quotient of positive reals, hence a positive real of modulus $1$ , namely $1$ . Thus $Q = I$ and $R_{1} = R_{2}$ . $□$

Proposition 3 (no growth: $ρ_{n} \leq 1$ ). In the Cholesky reduction of an SPD matrix, every entry of every active submatrix is bounded by $max_{i} a_{ii}$ , so the growth factor satisfies $ρ_{n} \leq 1$ .

Proof. The active submatrix after $k - 1$ steps is the Schur complement $S^{(k)}$ , positive-definite by Proposition 1. For any positive-definite $M$ , the $2 \times 2$ principal minor on indices $i, j$ is positive: $M_{ii} M_{j j} - ∣ M_{ij} ∣^{2} > 0$ , so $∣ M_{ij} ∣ < M_{ii} M_{j j} \leq max_{ℓ} M_{ℓℓ}$ , and the diagonal terms obey the same bound directly; hence every entry of $S^{(k)}$ is at most $max_{i} S_{ii}^{(k)}$ . The diagonal does not grow: the Cholesky update gives $S_{ii}^{(k + 1)} = S_{ii}^{(k)} - ∣ S_{ik}^{(k)} ∣^{2} / S_{k k}^{(k)} \leq S_{ii}^{(k)}$ , so inductively $max_{i} S_{ii}^{(k)} \leq max_{i} a_{ii}$ for all $k$ . Therefore no intermediate entry exceeds $max_{i} a_{ii} \leq max_{ij} ∣ a_{ij} ∣$ , giving $ρ_{n} = max_{ij k} ∣ S_{ij}^{(k)} ∣/ max_{ij} ∣ a_{ij} ∣ \leq 1$ . $□$

Proposition 4 (backward stability of the SPD solve). Let $\tilde{R}$ be the computed Cholesky factor and $\tilde{x}$ the computed solution of $A x = b$ . Then $(A + Δ A) \tilde{x} = b$ with $∥Δ A ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ , no growth factor present.

Proof. The standard model applied to the Cholesky recurrences gives, for each computed entry, $\tilde{R}_{k j}$ equal to its exact counterpart times $(1 + θ)$ with $∣ θ ∣ \leq γ_{n} = n ϵ_{mach} / (1 - n ϵ_{mach})$ , by collecting the inner-product, division, and square-root roundings through the product-of-factors lemma of 43.01.01. Summing the perturbations over the at-most- $n$ steps touching entry $(i, j)$ yields $\tilde{R}^{*} \tilde{R} = A + E$ with $∣ E ∣ \leq γ_{n + 1} ∣ \tilde{R}^{*} ∣ ∣ \tilde{R} ∣$ entrywise. By Proposition 3 the entries of $\tilde{R}$ are bounded by $(max_{i} a_{ii})^{1/2}$ , so $∣ \tilde{R}^{*} ∣ ∣ \tilde{R} ∣_{2} \leq n max_{i} a_{ii} \leq n ∥ A ∥_{2}$ , giving $∥ E ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ with no growth-factor inflation — the contrast with the GEPP bound, whose analogous step carries $ρ_{n}$ . Solving $\tilde{R}^{*} y = b$ and $\tilde{R} x = y$ by substitution, the triangular backward-error result of 43.01.03 gives $(\tilde{R}^{*} + Δ_{1}) \tilde{y} = b$ and $(\tilde{R} + Δ_{2}) \tilde{x} = \tilde{y}$ with $∣ Δ_{j} ∣ \leq γ_{n} ∣ \tilde{R} ∣$ ; eliminating $\tilde{y}$ and substituting $\tilde{R}^{*} \tilde{R} = A + E$ gives $(A + Δ A) \tilde{x} = b$ with $Δ A = E + \tilde{R}^{*} Δ_{2} + Δ_{1} \tilde{R} + Δ_{1} Δ_{2}$ . Each added term is $O (γ_{n} ∥ A ∥_{2})$ by the same entry bound, and dropping the second-order $Δ_{1} Δ_{2}$ , $∥Δ A ∥_{2} \leq c_{n} ϵ_{mach} ∥ A ∥_{2}$ . $□$

Connections Master

Gaussian elimination, LU factorization, and its stability 43.03.01 is the general factorization that Cholesky specializes and improves: the symmetric reduction $A = L D L^{*}$ is the LU factorization of that unit with $U = D L^{*}$ , and the growth factor $ρ_{n}$ that made GEPP only conditionally stable there is bounded a priori by $1$ here. That unit owns the general direct-solver theory and the growth-factor concept; this unit shows that positive-definiteness removes the gamble, needs no pivoting, and halves the cost — the canonical illustration of how matrix structure buys both speed and unconditional stability.
Quadratic forms and positive-definiteness 01.01.15 supplies the defining hypothesis $x^{*} A x > 0$ and, through Sylvester's criterion, the positivity of all leading principal minors $det A_{k}$ that makes every Cholesky pivot $d_{k} = det A_{k} / det A_{k - 1}$ positive and every square root real. The Cholesky factorization is the algorithmic shadow of that unit's classification of positive-definite forms: where it characterizes positivity abstractly by minors or eigenvalue signs, this unit turns the characterization into a constructive $O (n^{3})$ test that also produces the factor.
The spectral theorem 01.01.13 underlies the eigenvalue characterization $κ_{2} (A) = λ_{m a x} (A) / λ_{m i n} (A)$ that governs the conditioning of the SPD solve and identifies $∥ A ∥_{2}$ and $∥ A^{- 1} ∥_{2}$ with the extreme eigenvalues. Cholesky and the spectral decomposition are two factorizations of the same SPD matrix — one triangular and cheap, one orthogonal and diagonalizing — and the connection is that positive-definiteness, defined spectrally as all-positive eigenvalues in that unit, is exactly what guarantees the triangular factor of this unit exists with real positive diagonal.
The conjugate gradient method 43.07.04 is the iterative counterpart that shares the SPD hypothesis: where this unit solves $A x = b$ by a direct $\frac{1}{3} n^{3}$ factorization, conjugate gradient minimizes the $A$ -norm of the error over Krylov subspaces, and its standard preconditioners are built from incomplete Cholesky factors that approximate the $R$ of this unit while preserving sparsity. The two methods divide the SPD solver landscape between dense-direct and sparse-iterative, joined by the shared positive-definite structure.
Least squares: normal equations vs QR vs SVD 43.04.01 uses Cholesky as the engine of its fastest route: the normal equations $A^{*} A x = A^{*} b$ have an SPD coefficient matrix solved by the factorization of this unit, paying for its speed with the squared condition number $κ_{2} (A^{*} A) = κ_{2} (A)^{2}$ . This unit's unconditional stability is what makes the normal-equations Cholesky safe given a well-conditioned $A$ ; the squared conditioning is precisely the warning that sends ill-conditioned least-squares problems to the orthogonal methods compared there.

Historical & philosophical context Master

The factorization is named for André-Louis Cholesky (1875–1918), a French artillery officer and geodesist who devised the symmetric square-root method to solve the normal equations arising in the least-squares adjustment of triangulation networks. He did not publish it; the method became known only after his death in the First World War, through a posthumous note by Commandant Benoit in the Bulletin géodésique (1924) ^{[Cholesky, A.-L. — Sur la résolution numérique des systèmes d'équations linéaires]}, and circulated for decades primarily in the geodetic and structural-engineering communities before entering the mainstream numerical-analysis literature.

The modern stability understanding is twentieth-century. The recognition that Cholesky is exceptionally well-behaved — that the positive-definite structure bounds the growth factor a priori and so dispenses with the pivoting that general Gaussian elimination requires — was made precise in the backward-error framework of James H. Wilkinson and developed in componentwise form by Nicholas Higham, whose Accuracy and Stability of Numerical Algorithms (SIAM, 1996; 2nd ed. 2002) ^{[Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)]} gives the definitive analysis of the unconditional backward stability of the SPD solve. The connection to the symmetric reduction $A = L D L^{*}$ and the indefinite generalization were systematized by James Bunch and Beresford Parlett (1971) and refined by Bunch and Linda Kaufman (1977), placing Cholesky as the definite endpoint of a family of symmetry-preserving factorizations.

Bibliography Master

@book{trefethenbau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}

@book{golubvanloan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{higham2002accuracy,
  author    = {Higham, Nicholas J.},
  title     = {Accuracy and Stability of Numerical Algorithms},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2002}
}

@article{benoit1924cholesky,
  author  = {Benoit, Commandant},
  title   = {Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés (procédé du Commandant Cholesky)},
  journal = {Bulletin géodésique},
  volume  = {2},
  year    = {1924},
  pages   = {67--77}
}

@article{bunchparlett1971,
  author  = {Bunch, James R. and Parlett, Beresford N.},
  title   = {Direct Methods for Solving Symmetric Indefinite Systems of Linear Equations},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {8},
  number  = {4},
  year    = {1971},
  pages   = {639--655}
}

@article{bunchkaufman1977,
  author  = {Bunch, James R. and Kaufman, Linda},
  title   = {Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems},
  journal = {Mathematics of Computation},
  volume  = {31},
  number  = {137},
  year    = {1977},
  pages   = {163--179}
}

Prerequisites

43.03.01
01.01.15
01.01.13

Tier anchors

beginner: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 23 (Cholesky factorization, the SPD-solver story, no pivoting needed); Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) §6.5 (positive-definite matrices and A = R^T R)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 23 (the Cholesky factorization A = R^*R, the algorithm, the a-priori stability statement); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §4.2 (Cholesky existence/uniqueness, the bordered algorithm, the n^3/3 count)
master: Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 10 (the backward-error analysis of Cholesky, the unconditional stability with growth factor 1, the SPD perturbation theory); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §4.2-4.4; Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 23

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra · SIAM, 1997. Lecture 23: the Cholesky factorization A = R^*R of a Hermitian positive-definite matrix, the symmetric reduction exploiting structure to halve the work to n^3/3 operations, the fact that the diagonal of the active submatrix bounds every entry so that no pivoting is required, and the unconditional backward stability of Cholesky for every SPD matrix.
Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.) · SIAM, 2002. Ch. 10: the componentwise backward-error analysis of Cholesky factorization, the bound ||E||_2 <= c_n eps_mach ||A||_2 with no growth-factor dependence (effective growth factor 1 for SPD matrices), the perturbation theory of the SPD solve, and the use of a failed Cholesky attempt as a numerical test of positive-definiteness.
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §4.2: the Cholesky factorization of an SPD matrix, the existence/uniqueness proof from the leading-principal-minor positivity, the bordered (outer-product and gaxpy) algorithms, the n^3/3 flop count, and the connection to the LDL^T factorization for symmetric indefinite matrices.
Cholesky, A.-L. — Sur la résolution numérique des systèmes d'équations linéaires · Bulletin géodésique 2 (1924, posthumous), 67-77. The original method for solving the symmetric normal equations of geodetic least-squares adjustment by the square-root factorization later named after its author.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m