Preconditioning
Anchor (Master): Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 9-10 (the algebra of left/right/split preconditioning, preconditioned CG in the M-inner-product, ILU(0)/ILU(p)/ILUT incomplete factorisations and their existence for M-matrices, approximate-inverse preconditioners) and Ch. 13 (multigrid and domain decomposition as preconditioners); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 8-12 (preconditioners, incomplete Cholesky, the spectral-equivalence convergence theory, and multilevel/Schwarz methods); Elman-Silvester-Wathen 2014 *Finite Elements and Fast Iterative Solvers* 2e (Oxford) Ch. 2-4 (optimal and spectrally equivalent preconditioners for PDE operators)
Intuition Beginner
The Krylov solvers of the last few units — conjugate gradient for symmetric problems, GMRES for general ones — are fast only when the matrix is well behaved. A well-behaved matrix is one whose stretching is roughly the same in every direction, so the bowl you are minimising is nearly round. When the matrix stretches space far more in one direction than another, the bowl is a long thin canyon, and even a clever solver needs many steps to crawl along it.
Preconditioning fixes the matrix before the solver ever runs. The idea is to find a second matrix, call it , that is close to the original but much cheaper to undo. You then hand the solver the reweighted problem instead of the raw one. If really is close to , the reweighted matrix is close to doing nothing at all, so the solver sees an almost-round bowl and finishes in a handful of steps.
There is a balance to strike. A preconditioner that exactly equals would make the reweighted problem perfectly round, but undoing it would be as hard as solving the original system, so nothing is gained. A preconditioner that is too crude is cheap to undo but barely rounds the bowl, so the step count stays high. The art is to pick an that is cheap to apply and still close enough to to round the problem out.
This single trick is what turns the Krylov methods from textbook curiosities into the workhorses behind weather models, structural engineering, and circuit simulation. The solver supplies the engine; the preconditioner supplies the smooth road.
Visual Beginner
The picture is the same long thin canyon seen twice: once as the raw problem the solver dreads, and once after preconditioning has reshaped it into a round bowl the solver finishes quickly.
Read the table top to bottom. The left column is the raw system, where the matrix stretches one direction far more than another and the solver takes many steps. The right column is the preconditioned system, where the reshaped matrix stretches every direction about equally and the same solver finishes fast.
| feature | raw system | preconditioned |
|---|---|---|
| shape of the bowl | long thin canyon | nearly round |
| spread of stretching | very uneven | nearly equal |
| solver steps needed | many | few |
| cost per step | one multiply by | one multiply by plus one undo of |
The takeaway: a Krylov solver is fast on round problems and slow on stretched ones. Preconditioning multiplies in a cheap approximate undo of the matrix, reshaping the stretched problem into a round one before the solver starts. The extra work is one undo of per step; the payoff is far fewer steps, and on hard problems that trade is hugely in your favour.
Worked example Beginner
Take a diagonal stand-in for a stretched matrix, $$ A = \begin{pmatrix} 100 & 0 \ 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 100 \ 1 \end{pmatrix}, $$ whose exact answer is . The matrix stretches the first direction times more than the second, so its condition number — the ratio of the largest stretch to the smallest — is . A Krylov solver is slow when this ratio is large.
Choose the simplest preconditioner: the diagonal of itself, , which here equals . Undoing is one division per entry, very cheap. The reshaped matrix is $$ M^{-1}A = \begin{pmatrix} 1/100 & 0 \ 0 & 1 \end{pmatrix}\begin{pmatrix} 100 & 0 \ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \ 0 & 1 \end{pmatrix}. $$ The reshaped matrix is the identity. Its condition number is , the best possible, down from . The reshaped right-hand side is , and the reshaped problem is solved in one step: , the exact answer.
What this tells us: dividing each row by its own diagonal entry — the Jacobi preconditioner — collapsed a condition number of down to , so the solver finished immediately instead of crawling. The undo cost just two divisions. On a real matrix the diagonal is only an approximation, so the reshaped condition number will not fall all the way to , but the same mechanism shrinks it and speeds the solver up.
Check your understanding Beginner
Formal definition Intermediate+
Let () be nonsingular and . A preconditioner is a nonsingular matrix , chosen so that (i) systems are cheap to solve relative to , and (ii) the preconditioned operator is better conditioned or has a more clustered spectrum than . One never forms explicitly; "applying " means solving one system with .
Left, right, and split preconditioning. There are three algebraically equivalent placements of [Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)]: $$ \underbrace{M^{-1}A,x = M^{-1}b}{\text{left}}, \qquad \underbrace{A M^{-1}u = b,\ \ x = M^{-1}u}{\text{right}}, \qquad \underbrace{L^{-1} A L^{-},w = L^{-1}b,\ \ x = L^{-}w}_{\text{split, } M = L L^}. $$ Left preconditioning changes the residual the Krylov method sees from to ; right preconditioning leaves the true residual unchanged and is preferred when a residual-based stopping test must track . Split preconditioning, available when $M = LL^M^{-1}L^{-1}AL^{-*}A$ — the requirement that makes conjugate gradient applicable.
The spectrum-clustering goal. The convergence theory of 43.07.04 and 43.07.03 shows that the Krylov error after steps is governed by a min-max polynomial on the spectrum of the operator the method runs on: for CG the energy-norm error obeys . A degree- polynomial pinned to is small on the spectrum exactly when the eigenvalues lie in a few tight clusters, or in a short interval bounded away from . Preconditioning is the deliberate construction of an for which is so distributed — ideally clustered near , since when . The relevant figure of merit is the condition number for symmetric problems (replacing in the Chebyshev bound) and the spectral/field-of-values distribution of for nonsymmetric ones.
Preconditioned conjugate gradient (PCG). When and , is generally not symmetric, yet CG can still be applied — by running it in the -inner product , in which is self-adjoint. The resulting algorithm applies exactly once per iteration, to the residual, producing the preconditioned residual :
$$
\alpha_k = \frac{r_k^* z_k}{p_k^* A p_k},\quad
x_{k+1} = x_k + \alpha_k p_k,\quad
r_{k+1} = r_k - \alpha_k A p_k,\quad
z_{k+1} = M^{-1}r_{k+1},\quad
\beta_k = \frac{r_{k+1}^* z_{k+1}}{r_k^* z_k},\quad
p_{k+1} = z_{k+1} + \beta_k p_k.
$$
This is the CG recurrence of 43.07.04 with every Euclidean residual inner product replaced by . Preconditioned GMRES is simpler: it runs the GMRES of 43.07.03 on (left) or (right) directly, since GMRES needs no symmetry.
Classical and modern preconditioners. The Jacobi preconditioner is , undone by one division per entry. The Gauss-Seidel and symmetric SOR (SSOR) preconditioners are built from the stationary splittings of 43.07.01: SSOR uses up to scaling, a Hermitian positive-definite factorisation suitable for PCG. Incomplete LU (ILU) and incomplete Cholesky (IC) compute approximate factors by performing Gaussian elimination but discarding fill-in outside a prescribed sparsity pattern, so keeps 's sparsity. Multigrid, domain decomposition, and sparse approximate inverse preconditioners are treated at high level in the Advanced results.
Counterexamples to common slips
- The preconditioned operator is not symmetric in the Euclidean sense. For and the matrix is self-adjoint in the -inner product, not in the standard one. PCG works because it tacitly runs in ; applying ordinary CG to the explicitly formed would be inconsistent.
- Left and right preconditioning are not interchangeable for stopping. They generate the same iterates in exact arithmetic up to the change of variable, but the residual a left-preconditioned method monitors is , not . A stopping test calibrated to the true residual must use right preconditioning or unwind the preconditioned residual.
- must be definite to precondition CG. A poor indefinite can destroy the positive-definiteness that CG requires, causing the denominator or the inner product to lose its sign. SSOR and incomplete Cholesky are chosen precisely to keep .
- A clustered spectrum, not a small norm, is the goal. Making small is sufficient but not necessary; what the polynomial bound rewards is a few tight eigenvalue clusters. A preconditioner that leaves a handful of outliers but clusters the bulk can still give fast convergence, since the residual polynomial spends a few roots on the outliers.
Key theorem with proof Intermediate+
The signature result is that preconditioned conjugate gradient is exactly ordinary conjugate gradient applied to the symmetrically split operator , so the whole convergence theory of 43.07.04 transfers verbatim with replaced by ; and that the split, left, and -inner-product forms produce the same iterates.
Theorem (PCG equivalence and the preconditioned Chebyshev bound). Let and with a Hermitian factorisation $M = LL^\hat A = L^{-1}AL^{-}\hat b = L^{-1}b\hat x = L^x\hat A \succ 0M^{-1}A\sigma(\hat A) = \sigma(M^{-1}A)\hat x_k\hat A\hat x = \hat bx_k = L^{-}\hat x_k$, exactly to the PCG iterates of the Formal definition. Consequently $$ |x_k - x_\star|A \le 2\left(\frac{\sqrt{\hat\kappa} - 1}{\sqrt{\hat\kappa} + 1}\right)^{k}|x_0 - x\star|A, \qquad \hat\kappa = \kappa(M^{-1}A) = \frac{\lambda{\max}(M^{-1}A)}{\lambda_{\min}(M^{-1}A)}. $$ [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]
Proof. First, is Hermitian since , and for , because and is nonsingular, so . The similarity shows and are similar, hence share a spectrum, which is real and positive because .
Now run ordinary CG of 43.07.04 on from . Its quantities are , , with the recurrences , , , , . Introduce the untransformed variables , , , and . Then , so . Likewise , giving . The iterate update becomes on multiplying by ; the residual update becomes, on multiplying by , . The coefficient , and becomes — precisely after multiplying by and using . These are exactly the PCG recurrences, so the iterates coincide.
Finally, ordinary CG on obeys the Chebyshev bound of 43.07.04 in the -norm: with . The -norm pulls back to the -norm: , using . Substituting gives the stated bound.
Bridge. This equivalence is the foundational reason preconditioning costs nothing in theory and everything in practice: it builds toward the convergence and design theory of the Advanced results by showing that the entire CG apparatus of 43.07.04 — the energy-norm optimality, the conjugacy, the residual-polynomial reformulation — survives intact under preconditioning, with simply overwritten by . This is exactly the spectrum-reshaping principle made quantitative: the Chebyshev bound that turned into now runs on the preconditioned condition number, so the central insight is that a preconditioner's only job is to shrink or cluster , and the count generalises the count of unpreconditioned CG. The split form is dual to the -inner-product form: one symmetrises by changing the operator, the other by changing the geometry, and both yield identical iterates. The same construction appears again in 43.07.03, where preconditioned GMRES runs the Arnoldi-based minimisation on or and the field of values of the preconditioned operator replaces that of . Putting these together, the bridge is that preconditioning is a similarity transform chosen to improve a spectrum, executed implicitly so that the iterate sequence is exactly that of the unpreconditioned method on the better-conditioned operator.
Exercises Intermediate+
Advanced results Master
Theorem 1 (spectral equivalence and mesh-independent convergence). Let be a family of SPD matrices arising from discretising an elliptic operator on a mesh of size , with as . A preconditioner family is spectrally equivalent *to if there are constants , independent of , with for all . Then uniformly in , and PCG converges to a fixed tolerance in a number of iterations bounded independently of .* The bound is the condition-number estimate of Exercise 4 made uniform; a preconditioner achieving it is called optimal, and multigrid and certain domain-decomposition preconditioners achieve it for the model elliptic problems [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)].
Theorem 2 (existence of incomplete factorisations for M-matrices). Let be a symmetric M-matrix (symmetric positive-definite with nonpositive off-diagonal entries), and fix a sparsity pattern . The incomplete Cholesky factorisation $A = \bar L\bar L^ - RP\bar LM = \bar L\bar L^$ is symmetric positive-definite. The Meijerink-van der Vorst theorem guarantees every pivot stays positive during incomplete elimination because dropping a nonpositive fill entry only increases the remaining diagonal of an M-matrix; the same argument gives ILU() existence for nonsymmetric M-matrices. This is the theoretical license for the ICCG method that made preconditioned CG the standard sparse SPD solver [Meijerink, J. A. & van der Vorst, H. A. — An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix].
Theorem 3 (incomplete-factorisation hierarchy and fill control). ILU() retains exactly the sparsity of , discarding all fill; ILU() retains fill of level at most , where the level of an entry records how many elimination steps generated it; ILUT() drops entries below a relative threshold and keeps at most per row. Increasing or lowering makes a more accurate approximation of , clustering more tightly near and reducing the iteration count, at the cost of denser factors and more expensive applies. The hierarchy makes explicit the central tradeoff: the preconditioner's quality (how clustered it renders the spectrum) is bought with its construction and per-application cost, and the optimum minimises total time, not iteration count [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].
Theorem 4 (multigrid, domain decomposition, and sparse approximate inverse as preconditioners). Three modern families supply preconditioners whose applies are matrix-free or embarrassingly parallel. Multigrid uses a hierarchy of grids with smoothing (a stationary iteration of 43.07.01) on each level and coarse-grid correction; one V-cycle is a linear operator that, for the model Poisson problem, is spectrally equivalent to , giving an optimal preconditioner. Domain decomposition (additive Schwarz) splits the domain into overlapping subdomains, solves local problems, and sums the corrections: , with a coarse space added to bound the condition number independently of the number of subdomains. Sparse approximate inverse (SPAI) computes a sparse directly by minimising column by column, a set of independent small least-squares problems of 43.04.01, yielding a preconditioner whose apply is a single sparse matrix-vector product and so is fully parallel. Each trades a different resource — grid hierarchy, subdomain solves, or up-front least-squares work — for a clustered preconditioned spectrum [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].
Synthesis. Preconditioning is one operation — replace by a spectrally improved similar operator that is implicit, never formed — viewed under one invariant, the spectrum of the operator the Krylov method actually runs on, and the foundational reason it works is that the residual-polynomial convergence theory of 43.07.04 and 43.07.03 depends on only through that spectrum, so reshaping it is the entire lever. This is exactly the Chebyshev and min-max polynomial machinery of those units read backward: where CG turned into and rewarded clustered spectra, preconditioning manufactures the small or the tight clusters that the polynomial bound rewards, and the central insight is that the iteration count of preconditioned CG generalises the count of plain CG with overwritten by the preconditioned operator. The split preconditioner is dual to the -inner-product formulation — change the operator or change the geometry, identical iterates — and this duality is the bridge from the stationary splittings of 43.07.01, which were too weak to solve but are exactly strong enough to precondition: Jacobi, Gauss-Seidel, and SSOR all return here as , their spectral radius creeping to one as solvers being precisely the defect a preconditioner need not cure, only approximate.
Putting these together, the chapter closes a circle. The stationary methods of 43.07.01 supplied the splittings; the Krylov methods of 43.07.02, 43.07.03, and 43.07.04 supplied the optimal polynomial acceleration whose rate is set by the spectrum; and preconditioning fuses the two, using a cheap splitting or incomplete factorisation or multilevel operator to reshape the spectrum so the optimal acceleration needs only a handful of steps. The condition number of 43.01.02 is the quantity every member of the family manipulates, and the design space — Jacobi to incomplete Cholesky to multigrid — is a ladder of cost-versus-clustering tradeoffs, with the optimal preconditioner of Theorem 1 the one whose is bounded independently of the problem size while its apply stays cheap, converting the divergent iteration count of the raw discretised operator into a constant.
Full proof set Master
Proposition 1 (split symmetrisation preserves definiteness and spectrum). For and $M = LL^ \succ 0\hat A = L^{-1}AL^{-}M^{-1}A\sigma(M^{-1}A) = \sigma(\hat A) \subset (0,\infty)$.
Proof. , Hermitian. For , since and nonsingular, so . The similarity gives , real and positive by .
Proposition 2 (PCG equals CG on the split operator). The PCG recurrences of the Formal definition produce iterates $x_k = L^{-}\hat x_k\hat x_k\hat A\hat x = \hat b\hat b = L^{-1}b\hat x_0 = L^x_0$.
Proof. This is the Key theorem's computation: substituting , , into the CG recurrences for reproduces the PCG coefficients , and the updates , , , using and . The map is the change of variable.
Proposition 3 (preconditioned Chebyshev bound). Under the hypotheses of Proposition 1, with .
Proof. CG on obeys the Chebyshev bound of 43.07.04 in the -norm with (Proposition 1). The norm identity transports the bound from the -norm to the -norm via Proposition 2.
Proposition 4 (spectral-equivalence condition-number bound). *If for all with , , then .*
Proof. The eigenvalues of are the stationary values of the generalised Rayleigh quotient (the pencil ). The hypothesis bounds , so by Courant-Fischer 01.01.14 for the pencil, and ; with (Proposition 1) this gives .
Proposition 5 (low-rank preconditioner gives finite GMRES termination). If with and nonsingular, preconditioned GMRES reaches the exact solution in at most steps in exact arithmetic.
Proof. has as an eigenvalue of multiplicity , so has at most distinct eigenvalues, all nonzero ( nonsingular). Let () be the distinct eigenvalues and , — well defined since . Then , , and vanishes on , so on a diagonalisable operator. The residual-polynomial bound of 43.07.03 gives , so GMRES terminates by step .
Proposition 6 (incomplete Cholesky exists for symmetric M-matrices). Let be a symmetric M-matrix and a sparsity pattern containing the diagonal. The incomplete Cholesky factorisation with fill restricted to produces with positive diagonal, and $M = \bar L\bar L^ \succ 0$.*
Proof sketch. Incomplete Cholesky performs the recursion but sets to zero any update at a position outside . For an M-matrix the off-diagonal entries are and the Schur complements of the exact factorisation stay M-matrices with positive diagonal (the exact pivots are positive by SPD-ness). Dropping a fill entry at position removes a nonpositive contribution from an off-diagonal and a corresponding nonnegative amount from the diagonal of the remaining Schur complement, so each incomplete pivot is at least as large as the corresponding exact pivot, hence remains positive. The induction (Meijerink-van der Vorst) shows every diagonal of is real and positive, so is nonsingular and is Hermitian positive-definite, with for .
Connections Master
The conjugate gradient method
43.07.04is the solver preconditioning most directly serves: preconditioned CG is, by the Key theorem, ordinary CG run on the symmetrically split operator or equivalently in the -inner product, so the energy-norm optimality, the conjugacy of search directions, and the Chebyshev convergence bound all carry over with replaced by . The clustered-spectrum acceleration that unit identifies — that the count of distinct or well-separated eigenvalues governs the true rate — is exactly what a preconditioner engineers, which is why incomplete-Cholesky-preconditioned CG (ICCG) is the standard solver for the sparse SPD systems of discretised elliptic PDEs.GMRES
43.07.03is the nonsymmetric counterpart: preconditioned GMRES runs the Arnoldi-based minimal-residual minimisation on (left) or (right), and the diagonalisable convergence bound and field-of-values analysis of that unit apply to the preconditioned operator. The any-curve theorem there warns that for non-normal the spectrum alone need not predict convergence, so preconditioner design for GMRES targets the field of values or pseudospectrum, not merely the eigenvalues; the flexible variant FGMRES accommodates a preconditioner that changes from step to step, such as an inner Krylov solve.Stationary iterative methods: Jacobi, Gauss-Seidel, SOR
43.07.01supply the classical preconditioners: the splitting that was a slow solver on its own — its iteration matrix having spectral radius creeping to one as the grid refines — becomes the preconditioner whose only job is to approximate cheaply, not to converge alone. Jacobi (), Gauss-Seidel, and the symmetric SSOR factorisation built from all reappear here, and the SSOR construction yields the symmetric positive-definite that preconditioned CG requires, fusing the stationary and Krylov families that the chapter developed separately.Conditioning and condition numbers of problems
43.01.02furnishes the figure of merit the whole subject optimises: preconditioning is the deliberate manipulation of the condition number , and the spectral-equivalence theory measures a preconditioner by whether is bounded independently of the discretisation parameter. The least-squares machinery of43.04.01underlies the sparse-approximate-inverse preconditioner, whose columns solve independent small least-squares problems .
Historical & philosophical context Master
The idea of transforming a linear system to improve the convergence of an iteration predates the Krylov methods: David Young's 1950 thesis on successive over-relaxation, and the splitting framework of Richard Varga's 1962 Matrix Iterative Analysis, established that the spectral radius of controls a stationary iteration, which is the conceptual seed of using as a preconditioner rather than as a standalone solver. The word preconditioning in its modern sense, and the recognition that it should be paired with conjugate gradients rather than with a stationary iteration, crystallised in the 1970s.
The decisive step was the 1977 paper of Johannes Meijerink and Henk van der Vorst, which introduced the incomplete Cholesky factorisation as a preconditioner, proved its existence and stability for symmetric M-matrices, and demonstrated that incomplete-Cholesky-preconditioned conjugate gradient (ICCG) solved the sparse SPD systems of discretised PDEs far faster than any stationary method [Meijerink, J. A. & van der Vorst, H. A. — An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix]. ICCG converted conjugate gradients from a finite direct method that had disappointed in finite-precision arithmetic — the revival narrative of 43.07.04 — into the dominant iterative solver for elliptic problems. The multigrid method of Achi Brandt (1977), originally a standalone solver, was soon recognised as a spectrally optimal preconditioner; the additive-Schwarz domain-decomposition framework was placed on a rigorous condition-number footing through the 1980s and 1990s by Olof Widlund, Maksymilian Dryja, and collaborators. Yousef Saad's textbook consolidated the algebra of left, right, and split preconditioning, the ILU hierarchy, and approximate-inverse methods into the unified account standard today [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)], and Anne Greenbaum supplied the spectral-equivalence convergence theory that distinguishes optimal from merely useful preconditioners [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)].
Bibliography Master
@article{MeijerinkVanDerVorst1977,
author = {Meijerink, J. A. and van der Vorst, H. A.},
title = {An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric {M}-Matrix},
journal = {Mathematics of Computation},
volume = {31},
number = {137},
year = {1977},
pages = {148--162}
}
@book{Saad2003precond,
author = {Saad, Yousef},
title = {Iterative Methods for Sparse Linear Systems},
edition = {2},
publisher = {Society for Industrial and Applied Mathematics},
year = {2003}
}
@book{Greenbaum1997precond,
author = {Greenbaum, Anne},
title = {Iterative Methods for Solving Linear Systems},
series = {Frontiers in Applied Mathematics},
publisher = {SIAM},
year = {1997}
}
@book{Varga1962,
author = {Varga, Richard S.},
title = {Matrix Iterative Analysis},
publisher = {Prentice-Hall},
year = {1962}
}
@article{Brandt1977,
author = {Brandt, Achi},
title = {Multi-Level Adaptive Solutions to Boundary-Value Problems},
journal = {Mathematics of Computation},
volume = {31},
number = {138},
year = {1977},
pages = {333--390}
}
@book{ElmanSilvesterWathen2014,
author = {Elman, Howard C. and Silvester, David J. and Wathen, Andrew J.},
title = {Finite Elements and Fast Iterative Solvers},
edition = {2},
publisher = {Oxford University Press},
year = {2014}
}
@article{Benzi2002,
author = {Benzi, Michele},
title = {Preconditioning Techniques for Large Linear Systems: A Survey},
journal = {Journal of Computational Physics},
volume = {182},
number = {2},
year = {2002},
pages = {418--477}
}
@book{TrefethenBau1997precond,
author = {Trefethen, Lloyd N. and Bau, David},
title = {Numerical Linear Algebra},
publisher = {SIAM},
address = {Philadelphia},
year = {1997}
}