43.03.02 · numerical-analysis / 03-direct-linear-solvers

Cholesky factorization and the symmetric positive-definite solve

shipped3 tiersLean: none

Anchor (Master): Higham 2002 *Accuracy and Stability of Numerical Algorithms* 2e (SIAM) Ch. 10 (the backward-error analysis of Cholesky, the unconditional stability with growth factor 1, the SPD perturbation theory); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §4.2-4.4; Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 23

Intuition Beginner

Some matrices are special in a way that makes solving with them much easier. A symmetric positive-definite matrix is one that is its own mirror image across the diagonal, and that always returns a positive number when you sandwich any nonzero vector around it. These matrices show up everywhere: in the energy of a vibrating structure, in the covariance of a dataset, in the normal equations of a least-squares fit. For them, the general factorization of 43.03.01 can be sharpened into something cleaner and cheaper.

The clean version is the Cholesky factorization. Instead of two different triangular pieces and , a symmetric positive-definite matrix splits into a single triangular piece times its own transpose: . One triangle does all the work, because the symmetry of is carried inside the single factor. You can think of as the square root of in a triangular shape — and like an ordinary square root of a positive number, it exists only because is positive in the right sense.

Because one factor does the job of two, Cholesky costs about half what general elimination costs. And there is a deeper bonus. With ordinary elimination you had to watch for tiny pivots and reorder rows to stay safe, the partial pivoting of 43.03.01. Cholesky never needs that. The positive-definite structure guarantees that the numbers you divide by are never dangerously small relative to the rest, so the method is safe for every such matrix, with no reordering and no luck involved.

There is even a free diagnostic hidden here. If you try to run Cholesky on a matrix and at some point you are asked to take the square root of a negative number, the matrix was not positive-definite after all. The factorization succeeds exactly when the matrix is the special kind, so running it is itself the test.

Visual Beginner

The picture is a square symmetric matrix folding into a single triangle: the upper-right triangle , which when multiplied by its own flipped copy rebuilds the whole square.

Read the table top to bottom. You compute the triangle one row at a time. The first diagonal entry of is the square root of the first diagonal entry of . Each later entry is found by undoing the contributions already accounted for and then, on the diagonal, taking a square root. Because is positive-definite, every square root is of a positive number, so the process never stalls.

stage what you compute why it is safe
start the symmetric square returns positive values
first diagonal square root of top-left entry that entry is positive
first row of divide the top row by the first diagonal the divisor is not tiny
next diagonal subtract known parts, take square root the leftover stays positive
finish the full upper triangle with every step stayed positive

The takeaway: where general elimination produced two triangles and needed pivoting to stay safe, the positive-definite structure collapses the answer into one triangle and removes the need to pivot at all.

Worked example Beginner

Let us factor a small symmetric positive-definite matrix by hand and watch the single triangle appear.

Take $$ A = \begin{pmatrix} 4 & 2 & 2 \ 2 & 5 & 7 \ 2 & 7 & 19 \end{pmatrix}. $$ This matrix is symmetric (it equals its own transpose) and, as the factorization will confirm, positive-definite. We build the upper triangle so that .

Step 1. First diagonal entry. The top-left entry of is , so .

Step 2. First row of . Divide the rest of the top row of by . So and .

Step 3. Second diagonal entry. Take the entry of , which is , and subtract the square of what is already above it in that column of , namely . Then .

Step 4. Rest of the second row. The entry of is . Subtract , then divide by : so .

Step 5. Third diagonal entry. The entry of is . Subtract . Then .

Reading off the triangle, $$ R = \begin{pmatrix} 2 & 1 & 1 \ 0 & 2 & 3 \ 0 & 0 & 3 \end{pmatrix}. $$

Check the bottom-right corner: in position is , the original entry. Every square root we took — of , of , of — was of a positive number.

What this tells us: the symmetric square has been rebuilt from one upper triangle. No row swaps were needed, and the fact that every square root was of a positive value is the running certificate that really was positive-definite.

Check your understanding Beginner

Formal definition Intermediate+

Let with or be Hermitian positive-definite (SPD when ): and for every nonzero , the quadratic-form positivity of 01.01.15. A Cholesky factorization of is a factorization $$ A = R^{} R, $$ where is upper triangular with real positive diagonal entries . (The equivalent lower-triangular convention writes $A = LL^{}L = R^{*}R$ throughout.)

Existence and uniqueness. For every Hermitian positive-definite the factorization exists and is unique among upper-triangular with positive diagonal. The factorization is the LU factorization of 43.03.01 specialized and symmetrized: writing the symmetric pivoting-free elimination with unit lower triangular and , positive-definiteness forces each pivot , so is real and is upper triangular with positive diagonal . The pivot equals the ratio of consecutive leading principal minors , all of which are positive for an SPD matrix by Sylvester's criterion 01.01.15.

The algorithm. Equating entries of column by column gives the recurrences, for and , $$ R_{kk} = \Big(a_{kk} - \sum_{i=1}^{k-1} |R_{ik}|^2\Big)^{1/2}, \qquad R_{kj} = \frac{1}{R_{kk}}\Big(a_{kj} - \sum_{i=1}^{k-1} \overline{R_{ik}},R_{ij}\Big). $$ Each diagonal entry subtracts the already-computed column contributions and takes a square root; each off-diagonal entry subtracts the analogous cross terms and divides by the fresh diagonal. The argument of every square root is positive precisely because is positive-definite, so the algorithm never breaks down. A failure — a nonpositive radicand — is a certificate that is not positive-definite, which is the standard numerical test for definiteness.

Operation count. Cholesky forms in floating-point operations, half the of general LU, because symmetry lets the algorithm compute and store only one triangle. Solving the SPD system then proceeds in two triangular sweeps — forward substitution and back substitution — each costing operations, exactly as in 43.03.01.

Counterexamples to common slips

  • Symmetry alone does not suffice. The matrix is symmetric but indefinite (its determinant is ), so is not real and no Cholesky factorization exists. Positive-definiteness, not symmetry, is the hypothesis that makes the square roots real.

  • Cholesky is not LU with for any symmetric matrix. For a symmetric indefinite matrix the symmetric reduction produces with some negative , so is complex and the real Cholesky factor does not exist; the indefinite case needs the block of Bunch-Kaufman, treated forward in 43.03.08, not Cholesky.

  • Positive diagonal is part of the definition, not a consequence to be hoped for. Dropping the positive-diagonal normalization destroys uniqueness: and with a diagonal sign/phase matrix give the same unless the diagonal is pinned positive. The convention is what makes the factor unique.

  • A nonsingular symmetric matrix need not be positive-definite. is symmetric, nonsingular, and has nonzero leading minors, yet is negative-definite; Cholesky fails at the first step. Invertibility is unrelated to the SPD hypothesis.

Key theorem with proof Intermediate+

The signature result is that Cholesky exists exactly for positive-definite matrices, runs without pivoting, and — the payoff that distinguishes it from general elimination — is backward stable for every SPD matrix, with no growth factor to fear.

Theorem (existence, uniqueness, and unconditional stability of Cholesky). Let be Hermitian. Then:

  1. (Existence and uniqueness.) is positive-definite if and only if it admits a factorization with upper triangular and positive diagonal, in which case is unique.

  2. (No growth.) During the Cholesky reduction every entry of every active submatrix is bounded in magnitude by the largest diagonal entry of ; consequently the growth factor in the sense of 43.03.01 satisfies , uniformly over all SPD matrices and independent of .

  3. (Backward stability.) In the standard floating-point model of 43.01.01 with unit roundoff , the computed factor satisfies with for a low-degree polynomial — and the computed solution of satisfies with the same order of perturbation, so Cholesky is backward stable for every SPD matrix [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Proof. (Existence.) Induct on . For , positive-definiteness means , so works. For , partition $$ A = \begin{pmatrix} \alpha & w^{} \ w & B \end{pmatrix}, \qquad \alpha = a_{11} > 0,\ w \in \mathbb{F}^{n-1},\ B \in \mathbb{F}^{(n-1)\times(n-1)}. $$ Set and . Then a direct block multiplication gives $$ A = \begin{pmatrix} \sqrt{\alpha} & 0 \ s & I \end{pmatrix} \begin{pmatrix} 1 & 0 \ 0 & S \end{pmatrix} \begin{pmatrix} \sqrt{\alpha} & s^{} \ 0 & I \end{pmatrix}, \qquad S = B - \frac{w w^{}}{\alpha}, $$ where is the Schur complement of in . The Schur complement of a positive-definite matrix is positive-definite: for nonzero , choosing $t = -w^{}u/\alphaA(t, u^{\top})^{\top}u^{} S u = (t, u^{}) A (t, u)^{\top} > 0S = R_S^{} R_SR_S$ upper triangular and positive diagonal. Assembling, $$ R = \begin{pmatrix} \sqrt{\alpha} & s^{} \ 0 & R_S \end{pmatrix} $$ is upper triangular with positive diagonal and satisfies . The converse is immediate: if with nonsingular then for , so is positive-definite.

(Uniqueness.) If with both factors upper triangular and positive-diagonal, then . The left side is upper triangular and the right side is lower triangular, so is diagonal; from and giving , the diagonal is unitary with positive diagonal, hence and .

(No growth.) The active submatrix after eliminating columns is the Schur complement , which is positive-definite by the argument above. For a positive-definite matrix every entry is bounded by the diagonal: by positivity of the principal minor, so . Moreover the diagonal cannot grow, since , so for all . Hence no intermediate entry exceeds , giving .

(Backward stability.) Apply the standard model to the recurrences. The computed entries satisfy and analogously for , each rounding step contributing a relative perturbation of size . Collecting these by the product-of-factors lemma of 43.01.01, the computed factor exactly factors a perturbed matrix, , where each entry accumulates at most rounding terms, each bounded by entrywise with . Because , the entries of are bounded by , so with no growth-factor inflation, giving . The triangular backward-error result of 43.01.03 adds a perturbation of the same order to the two solves, combining into a single of the stated size.

Bridge. This theorem is the foundational reason the symmetric positive-definite solve is the safest dense factorization in numerical linear algebra: the growth factor that made 43.03.01 only conditionally stable is here bounded a priori by , so the master inequality of 43.01.03 specializes with the algorithm's contribution removed entirely. This is exactly the structure of the GEPP backward-error theorem with — the single quantity that could go wrong there — pinned to its best possible value by positive-definiteness, and the central insight is that the Schur complement of an SPD matrix is again SPD, so the diagonal-dominance-of-self propagates through every step. The result builds toward the conjugate gradient method 43.07.04, whose energy-norm minimization rests on the same SPD structure and whose preconditioners are built from incomplete Cholesky factors, and it appears again in the least-squares comparison 43.04.01, where the normal-equations route solves by Cholesky and pays for its speed with a squared condition number. Putting these together, where general elimination gambles on the growth factor and wins in expectation, Cholesky generalises that gamble into a certainty for the positive-definite class, and the bridge is that the same positivity that defines the matrix is what bounds its rounding.

Exercises Intermediate+

Advanced results Master

Theorem 1 (Cholesky, , and the symmetric reduction). For Hermitian positive-definite , the Cholesky factorization , the root-free symmetric factorization with unit lower triangular and diagonal positive, and the symmetric Gaussian elimination of 43.03.01 all coincide as descriptions of the same data: , , and is the elimination multiplier. The form avoids the square roots of Cholesky and is preferred when square roots are costly or when one wants to defer the positive-definiteness test to the signs of the ; for a general symmetric indefinite matrix the diagonal is replaced by a block-diagonal with and pivots, the Bunch-Kaufman factorization forward-pointed in 43.03.08, whose symmetric pivoting controls growth without destroying symmetry [Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.)].

Theorem 2 (unconditional backward stability, sharp form). Let be the Cholesky factor of SPD computed in floating-point arithmetic with unit roundoff , and let be the computed solution of from the two triangular sweeps. Then there is a perturbation with $$ (A + \Delta A)\tilde{x} = b, \qquad |\Delta A|2 \le \frac{c,n,\epsilon{\mathrm{mach}}}{1 - c,n,\epsilon_{\mathrm{mach}}},|A|_2, $$ for a modest constant , with no dependence on any growth factor or condition number. The result holds for every SPD matrix without exception, which is the precise sense in which Cholesky is unconditionally backward stable, in contrast to the only conditionally stable GEPP of 43.03.01 whose bound carries the that can in principle reach . A practical corollary is that for SPD systems one need never inspect a growth factor: the factorization is guaranteed safe by the input class alone [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Theorem 3 (Cholesky as a definiteness test and the semidefinite boundary). Running the Cholesky algorithm on a Hermitian matrix is a backward-stable test of positive-definiteness: the algorithm completes with all radicands positive if and only if is positive-definite, and the first nonpositive radicand localizes the failure of Sylvester's criterion to a specific leading minor. On the boundary, a positive semidefinite of rank admits a Cholesky factorization with a pivoted permutation where has exactly nonzero rows; symmetric pivoting on the largest remaining diagonal entry makes this rank-revealing and numerically stable, and the trailing zero pivots expose the null space. The semidefinite case is the bridge from the strict SPD solve to rank-deficient least squares and to the low-rank approximation of covariance matrices [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)].

Theorem 4 (the normal-equations route and its conditioning cost). For a full-column-rank (), the least-squares problem has the normal equations with SPD coefficient matrix , solvable by Cholesky in operations — the fastest of the three classical least-squares solvers. The price is conditioning: , so forming and factoring can lose twice as many digits as the orthogonal QR or SVD routes. Cholesky on the normal equations is the method of choice when is well-conditioned and speed matters, and the squared-conditioning warning is exactly the dividing line at which QR replaces it, the comparison assembled in 43.04.01 [Trefethen, L. N. & Bau, D. — Numerical Linear Algebra].

Synthesis. The Cholesky factorization is the foundational reason the symmetric positive-definite solve is treated as the gold standard among direct methods: the single scalar that made 43.03.01 only conditionally trustworthy, the growth factor , is here pinned to a priori by the positive-definite structure, so the master inequality of 43.01.03 specializes with the algorithm's risk term removed and only the problem's conditioning of 43.01.02 surviving. This is exactly the structure of the GEPP backward-error theorem with its worst case dissolved, and the central insight — that the Schur complement of an SPD matrix is again SPD, with its diagonal nonincreasing — is what propagates the entry bound through every step. The factorization generalises in two directions that organize the rest of the chapter: dropping square roots gives the form and, for the indefinite case, the block Bunch-Kaufman pivoting of 43.03.08; relaxing strict definiteness gives the rank-revealing pivoted Cholesky of the semidefinite boundary. It is dual to the iterative world, where the same SPD hypothesis powers the conjugate gradient method 43.07.04 and where incomplete Cholesky supplies its standard preconditioner; and it feeds the least-squares comparison 43.04.01, where the normal-equations route trades a squared condition number for Cholesky's speed. Putting these together, where general elimination gambles on the growth factor and wins in expectation, the positive-definite class converts that gamble into a theorem, and the bridge is that the very positivity defining the matrix is what bounds the rounding committed in factoring it.

Full proof set Master

Proposition 1 (existence via the Schur complement recursion). Let be Hermitian positive-definite. Then $A = R^{}RR$ with positive diagonal, constructed recursively through Schur complements.*

Proof. Induct on . The base case gives and . For partition with . Define , the Schur complement. For nonzero , set and ; expanding and substituting gives , which is positive because is positive-definite; thus is positive-definite (and Hermitian, inherited from ). By the inductive hypothesis with upper triangular, positive diagonal. Put . Then $$ R^{}R = \begin{pmatrix} \alpha & w^{} \ w & ww^{}/\alpha + R_S^{}R_S \end{pmatrix} = \begin{pmatrix} \alpha & w^{} \ w & ww^{}/\alpha + S \end{pmatrix} = \begin{pmatrix} \alpha & w^{*} \ w & B \end{pmatrix} = A, $$ and is upper triangular with positive diagonal .

Proposition 2 (uniqueness from positive-diagonal normalization). The factor of Proposition 1 is unique among upper-triangular matrices with positive diagonal.

Proof. Suppose with both factors upper triangular and positive-diagonal, hence nonsingular. Then . The left side is upper triangular (product of upper-triangular matrices and an upper-triangular inverse); the right side is lower triangular. A matrix simultaneously upper and lower triangular is diagonal, so is diagonal. From and we get , and cancelling and gives , so is unitary. A diagonal unitary matrix has unit-modulus entries; but the -th diagonal entry of is , a quotient of positive reals, hence a positive real of modulus , namely . Thus and .

Proposition 3 (no growth: ). In the Cholesky reduction of an SPD matrix, every entry of every active submatrix is bounded by , so the growth factor satisfies .

Proof. The active submatrix after steps is the Schur complement , positive-definite by Proposition 1. For any positive-definite , the principal minor on indices is positive: , so , and the diagonal terms obey the same bound directly; hence every entry of is at most . The diagonal does not grow: the Cholesky update gives , so inductively for all . Therefore no intermediate entry exceeds , giving .

Proposition 4 (backward stability of the SPD solve). Let be the computed Cholesky factor and the computed solution of . Then with , no growth factor present.

Proof. The standard model applied to the Cholesky recurrences gives, for each computed entry, equal to its exact counterpart times with , by collecting the inner-product, division, and square-root roundings through the product-of-factors lemma of 43.01.01. Summing the perturbations over the at-most- steps touching entry yields with entrywise. By Proposition 3 the entries of are bounded by , so , giving with no growth-factor inflation — the contrast with the GEPP bound, whose analogous step carries . Solving and by substitution, the triangular backward-error result of 43.01.03 gives and with ; eliminating and substituting gives with . Each added term is by the same entry bound, and dropping the second-order , .

Connections Master

  • Gaussian elimination, LU factorization, and its stability 43.03.01 is the general factorization that Cholesky specializes and improves: the symmetric reduction is the LU factorization of that unit with , and the growth factor that made GEPP only conditionally stable there is bounded a priori by here. That unit owns the general direct-solver theory and the growth-factor concept; this unit shows that positive-definiteness removes the gamble, needs no pivoting, and halves the cost — the canonical illustration of how matrix structure buys both speed and unconditional stability.

  • Quadratic forms and positive-definiteness 01.01.15 supplies the defining hypothesis and, through Sylvester's criterion, the positivity of all leading principal minors that makes every Cholesky pivot positive and every square root real. The Cholesky factorization is the algorithmic shadow of that unit's classification of positive-definite forms: where it characterizes positivity abstractly by minors or eigenvalue signs, this unit turns the characterization into a constructive test that also produces the factor.

  • The spectral theorem 01.01.13 underlies the eigenvalue characterization that governs the conditioning of the SPD solve and identifies and with the extreme eigenvalues. Cholesky and the spectral decomposition are two factorizations of the same SPD matrix — one triangular and cheap, one orthogonal and diagonalizing — and the connection is that positive-definiteness, defined spectrally as all-positive eigenvalues in that unit, is exactly what guarantees the triangular factor of this unit exists with real positive diagonal.

  • The conjugate gradient method 43.07.04 is the iterative counterpart that shares the SPD hypothesis: where this unit solves by a direct factorization, conjugate gradient minimizes the -norm of the error over Krylov subspaces, and its standard preconditioners are built from incomplete Cholesky factors that approximate the of this unit while preserving sparsity. The two methods divide the SPD solver landscape between dense-direct and sparse-iterative, joined by the shared positive-definite structure.

  • Least squares: normal equations vs QR vs SVD 43.04.01 uses Cholesky as the engine of its fastest route: the normal equations have an SPD coefficient matrix solved by the factorization of this unit, paying for its speed with the squared condition number . This unit's unconditional stability is what makes the normal-equations Cholesky safe given a well-conditioned ; the squared conditioning is precisely the warning that sends ill-conditioned least-squares problems to the orthogonal methods compared there.

Historical & philosophical context Master

The factorization is named for André-Louis Cholesky (1875–1918), a French artillery officer and geodesist who devised the symmetric square-root method to solve the normal equations arising in the least-squares adjustment of triangulation networks. He did not publish it; the method became known only after his death in the First World War, through a posthumous note by Commandant Benoit in the Bulletin géodésique (1924) [Cholesky, A.-L. — Sur la résolution numérique des systèmes d'équations linéaires], and circulated for decades primarily in the geodetic and structural-engineering communities before entering the mainstream numerical-analysis literature.

The modern stability understanding is twentieth-century. The recognition that Cholesky is exceptionally well-behaved — that the positive-definite structure bounds the growth factor a priori and so dispenses with the pivoting that general Gaussian elimination requires — was made precise in the backward-error framework of James H. Wilkinson and developed in componentwise form by Nicholas Higham, whose Accuracy and Stability of Numerical Algorithms (SIAM, 1996; 2nd ed. 2002) [Higham, N. J. — Accuracy and Stability of Numerical Algorithms (2nd ed.)] gives the definitive analysis of the unconditional backward stability of the SPD solve. The connection to the symmetric reduction and the indefinite generalization were systematized by James Bunch and Beresford Parlett (1971) and refined by Bunch and Linda Kaufman (1977), placing Cholesky as the definite endpoint of a family of symmetry-preserving factorizations.

Bibliography Master

@book{trefethenbau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {1997}
}

@book{golubvanloan2013,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@book{higham2002accuracy,
  author    = {Higham, Nicholas J.},
  title     = {Accuracy and Stability of Numerical Algorithms},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2002}
}

@article{benoit1924cholesky,
  author  = {Benoit, Commandant},
  title   = {Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés (procédé du Commandant Cholesky)},
  journal = {Bulletin géodésique},
  volume  = {2},
  year    = {1924},
  pages   = {67--77}
}

@article{bunchparlett1971,
  author  = {Bunch, James R. and Parlett, Beresford N.},
  title   = {Direct Methods for Solving Symmetric Indefinite Systems of Linear Equations},
  journal = {SIAM Journal on Numerical Analysis},
  volume  = {8},
  number  = {4},
  year    = {1971},
  pages   = {639--655}
}

@article{bunchkaufman1977,
  author  = {Bunch, James R. and Kaufman, Linda},
  title   = {Some Stable Methods for Calculating Inertia and Solving Symmetric Linear Systems},
  journal = {Mathematics of Computation},
  volume  = {31},
  number  = {137},
  year    = {1977},
  pages   = {163--179}
}