43.07.05 · numerical-analysis / 07-iterative-krylov-methods

Preconditioning


Anchor (Master): Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 9-10 (the algebra of left/right/split preconditioning, preconditioned CG in the M-inner-product, ILU(0)/ILU(p)/ILUT incomplete factorisations and their existence for M-matrices, approximate-inverse preconditioners) and Ch. 13 (multigrid and domain decomposition as preconditioners); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 8-12 (preconditioners, incomplete Cholesky, the spectral-equivalence convergence theory, and multilevel/Schwarz methods); Elman-Silvester-Wathen 2014 *Finite Elements and Fast Iterative Solvers* 2e (Oxford) Ch. 2-4 (optimal and spectrally equivalent preconditioners for PDE operators)

Intuition Beginner

The Krylov solvers of the last few units — conjugate gradient for symmetric problems, GMRES for general ones — are fast only when the matrix is well behaved. A well-behaved matrix is one whose stretching is roughly the same in every direction, so the bowl you are minimising is nearly round. When the matrix stretches space far more in one direction than another, the bowl is a long thin canyon, and even a clever solver needs many steps to crawl along it.

Preconditioning fixes the matrix before the solver ever runs. The idea is to find a second matrix, call it $M$ , that is close to the original $A$ but much cheaper to undo. You then hand the solver the reweighted problem instead of the raw one. If $M$ really is close to $A$ , the reweighted matrix is close to doing nothing at all, so the solver sees an almost-round bowl and finishes in a handful of steps.

There is a balance to strike. A preconditioner that exactly equals $A$ would make the reweighted problem perfectly round, but undoing it would be as hard as solving the original system, so nothing is gained. A preconditioner that is too crude is cheap to undo but barely rounds the bowl, so the step count stays high. The art is to pick an $M$ that is cheap to apply and still close enough to $A$ to round the problem out.

This single trick is what turns the Krylov methods from textbook curiosities into the workhorses behind weather models, structural engineering, and circuit simulation. The solver supplies the engine; the preconditioner supplies the smooth road.

Visual Beginner

The picture is the same long thin canyon seen twice: once as the raw problem the solver dreads, and once after preconditioning has reshaped it into a round bowl the solver finishes quickly.

Read the table top to bottom. The left column is the raw system, where the matrix stretches one direction far more than another and the solver takes many steps. The right column is the preconditioned system, where the reshaped matrix stretches every direction about equally and the same solver finishes fast.

feature	raw system $A x = b$	preconditioned $M^{- 1} A x = M^{- 1} b$
shape of the bowl	long thin canyon	nearly round
spread of stretching	very uneven	nearly equal
solver steps needed	many	few
cost per step	one multiply by $A$	one multiply by $A$ plus one undo of $M$

The takeaway: a Krylov solver is fast on round problems and slow on stretched ones. Preconditioning multiplies in a cheap approximate undo of the matrix, reshaping the stretched problem into a round one before the solver starts. The extra work is one undo of $M$ per step; the payoff is far fewer steps, and on hard problems that trade is hugely in your favour.

Worked example Beginner

Take a diagonal stand-in for a stretched matrix, $$ A = \begin{pmatrix} 100 & 0 \\ 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 100 \\ 1 \end{pmatrix}, $$ whose exact answer is $x = (1, 1)$. The matrix stretches the first direction $100$ times more than the second, so its condition number (the ratio of the largest stretch to the smallest) is $100/1 = 100$. A Krylov solver is slow when this ratio is large.

Choose the simplest preconditioner: the diagonal of $A$ itself, $M = \begin{pmatrix} 100 & 0 \\ 0 & 1 \end{pmatrix}$, which here equals $A$. Undoing $M$ is one division per entry, very cheap. The reshaped matrix is $$ M^{-1}A = \begin{pmatrix} 1/100 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 100 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. $$ The reshaped matrix is the identity. Its condition number is $1/1 = 1$, the best possible, down from $100$. The reshaped right-hand side is $M^{-1}b = (1, 1)$, and the reshaped problem $Ix = (1, 1)$ is solved in one step: $x = (1, 1)$, the exact answer.

What this tells us: dividing each row by its own diagonal entry — the Jacobi preconditioner — collapsed a condition number of $100$ down to $1$ , so the solver finished immediately instead of crawling. The undo cost just two divisions. On a real matrix the diagonal is only an approximation, so the reshaped condition number will not fall all the way to $1$ , but the same mechanism shrinks it and speeds the solver up.
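The arithmetic of this example can be replayed in a few lines. The sketch below assumes dense Python lists (a real code would store $A$ sparsely), and the helper names are hypothetical, chosen only for this illustration.

```python
# Jacobi (diagonal) preconditioning of the 2x2 worked example, with plain Python lists.
# Helper names are illustrative; production codes use sparse storage.

def jacobi_precondition(A, b):
    """Return (M^{-1} A, M^{-1} b) for M = diag(A): divide row i by A[i][i]."""
    n = len(A)
    MA = [[A[i][j] / A[i][i] for j in range(n)] for i in range(n)]
    Mb = [b[i] / A[i][i] for i in range(n)]
    return MA, Mb

def diagonal_condition_number(D):
    """Condition number of a diagonal matrix with positive diagonal: largest / smallest entry."""
    d = [D[i][i] for i in range(len(D))]
    return max(d) / min(d)

A = [[100.0, 0.0], [0.0, 1.0]]
b = [100.0, 1.0]
MA, Mb = jacobi_precondition(A, b)

print(diagonal_condition_number(A))   # 100.0
print(MA)                             # [[1.0, 0.0], [0.0, 1.0]]
print(Mb)                             # [1.0, 1.0], i.e. x = (1, 1)
print(diagonal_condition_number(MA))  # 1.0
```

The `diagonal_condition_number` helper is only valid for diagonal matrices, which is all this example needs.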

Check your understanding Beginner

Exercise (easy, multiple choice).

What is the job of a preconditioner $M$ in a Krylov method?

A. To replace the solver with a direct factorisation. B. To approximate $A$ while being cheap to undo, so the reshaped matrix is close to the identity and the solver finishes in fewer steps. C. To make the matrix larger so each step does more work. D. To guarantee the exact answer in one step for every matrix.

Hint

A good preconditioner rounds out the bowl without being expensive to apply.

Answer

B. Correct: $M$ approximates $A$ but is cheap to undo, so $M^{-1}A$ is close to the identity, the bowl is nearly round, and the solver converges in far fewer steps. The other options fail: a preconditioner does not replace the iterative solver with elimination (A); it reshapes rather than enlarges the problem (C); and only the special case $M = A$ would give a one-step answer, which is as costly as solving the original system (D).

Formal definition Intermediate+

Let $A \in \mathbb{F}^{n \times n}$ ($\mathbb{F} \in \{\mathbb{R}, \mathbb{C}\}$) be nonsingular and $b \in \mathbb{F}^{n}$. A preconditioner is a nonsingular matrix $M$, chosen so that (i) systems $Mz = r$ are cheap to solve relative to $Ax = b$, and (ii) the preconditioned operator is better conditioned or has a more clustered spectrum than $A$. One never forms $M^{-1}$ explicitly; "applying $M^{-1}$" means solving one system with $M$.

Left, right, and split preconditioning. There are three algebraically equivalent placements of $M$, all with the same solution $x$ ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)]}: $$ \underbrace{M^{-1}A\,x = M^{-1}b}_{\text{left}}, \qquad \underbrace{A M^{-1}u = b,\ \ x = M^{-1}u}_{\text{right}}, \qquad \underbrace{L^{-1} A L^{-*}w = L^{-1}b,\ \ x = L^{-*}w}_{\text{split, } M = L L^*}. $$ Left preconditioning changes the residual the Krylov method sees from $b - Ax$ to $M^{-1}(b - Ax)$; right preconditioning leaves the true residual unchanged and is preferred when a residual-based stopping test must track $\|b - Ax\|$. Split preconditioning, available when $M = LL^*$ is a Hermitian positive-definite factorisation, applies half of $M^{-1}$ on each side so that the preconditioned operator $L^{-1}AL^{-*}$ inherits the symmetry and definiteness of $A$, which is the requirement that makes conjugate gradient applicable.

The spectrum-clustering goal. The convergence theory of 43.07.04 and 43.07.03 shows that the Krylov error after $k$ steps is governed by a min-max polynomial problem on the spectrum of the operator the method runs on: for CG the energy-norm error obeys $\|e_k\|_A/\|e_0\|_A \le \min_{p(0)=1,\ \deg p \le k}\ \max_{\lambda \in \sigma(A)}|p(\lambda)|$. A degree-$k$ polynomial pinned to $p(0) = 1$ can be made small on the spectrum when the eigenvalues lie in a few tight clusters, or in a short interval bounded away from $0$. Preconditioning is the deliberate construction of an $M$ for which $\sigma(M^{-1}A)$ is so distributed, ideally clustered near $1$, since $M^{-1}A \approx I$ when $M \approx A$. The relevant figure of merit is the condition number $\kappa(M^{-1}A)$ for symmetric problems (replacing $\kappa(A)$ in the Chebyshev bound) and the spectral and field-of-values distribution of $M^{-1}A$ for nonsymmetric ones.
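To see what shrinking $\kappa$ buys, one can invert the CG Chebyshev bound $2\big((\sqrt{\kappa} - 1)/(\sqrt{\kappa} + 1)\big)^k$ of 43.07.04 for the number of steps it guarantees at a given tolerance. This is a back-of-envelope sketch: the bound is a worst case, and the function name and the sample values of $\kappa$ are illustrative assumptions.

```python
import math

def chebyshev_steps(kappa, tol):
    """Smallest k with 2 * ((sqrt(kappa) - 1) / (sqrt(kappa) + 1))**k <= tol.

    Inverts the worst-case CG energy-norm bound; real CG runs are often faster,
    especially when the spectrum is clustered.
    """
    if kappa <= 1.0:
        return 1  # rho = 0: the bound vanishes after a single step
    s = math.sqrt(kappa)
    rho = (s - 1.0) / (s + 1.0)
    return math.ceil(math.log(tol / 2.0) / math.log(rho))

# Hypothetical scenario: a preconditioner reduces kappa from 1e4 to 1e2.
print(chebyshev_steps(1e4, 1e-8))  # 956
print(chebyshev_steps(1e2, 1e-8))  # 96
```

A hundredfold reduction in $\kappa$ cuts the guaranteed step count roughly tenfold, reflecting the $\sqrt{\kappa}$ dependence.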

Preconditioned conjugate gradient (PCG). When $A \succ 0$ and $M \succ 0$, $M^{-1}A$ is generally not symmetric, yet CG can still be applied by running it in the $M$-inner product $\langle u, v\rangle_M = u^*Mv$, in which $M^{-1}A$ is self-adjoint. The resulting algorithm applies $M^{-1}$ exactly once per iteration, to the residual, producing the preconditioned residual $z_k = M^{-1}r_k$. Starting from $r_0 = b - Ax_0$, $z_0 = M^{-1}r_0$, and $p_0 = z_0$, it iterates $$ \alpha_k = \frac{r_k^* z_k}{p_k^* A p_k},\quad x_{k+1} = x_k + \alpha_k p_k,\quad r_{k+1} = r_k - \alpha_k A p_k,\quad z_{k+1} = M^{-1}r_{k+1},\quad \beta_k = \frac{r_{k+1}^* z_{k+1}}{r_k^* z_k},\quad p_{k+1} = z_{k+1} + \beta_k p_k. $$ This is the CG recurrence of 43.07.04 with every Euclidean residual inner product $r^*r$ replaced by $r^*z = r^*M^{-1}r$ and each new search direction built from $z_{k+1}$ instead of $r_{k+1}$. Preconditioned GMRES is simpler: it runs the GMRES of 43.07.03 on $M^{-1}A$ (left) or $AM^{-1}$ (right) directly, since GMRES needs no symmetry.
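A minimal sketch of these recurrences, assuming a real SPD $A$ stored as a dense list of lists and a preconditioner passed as a callable `apply_Minv(r)` returning $M^{-1}r$. That interface is a hypothetical choice for illustration; in practice the callable would perform a triangular solve, a multigrid cycle, and so on. The stopping test uses the true relative residual $\|r_k\|/\|b\|$.

```python
def matvec(A, x):
    """Dense matrix-vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    """Real Euclidean inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def pcg(A, b, apply_Minv, x0=None, tol=1e-10, maxiter=1000):
    """Preconditioned CG for real SPD A; apply_Minv(r) must return M^{-1} r with M SPD.

    Returns (x, k), where k is the number of iterations performed.
    """
    n = len(b)
    x = list(x0) if x0 is not None else [0.0] * n
    r = [bi - ax for bi, ax in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    z = apply_Minv(r)                                   # z_0 = M^{-1} r_0
    p = list(z)                                         # p_0 = z_0
    rz = dot(r, z)
    bnorm = dot(b, b) ** 0.5
    for k in range(maxiter):
        if dot(r, r) ** 0.5 <= tol * bnorm:
            return x, k
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                         # alpha_k = r_k^T z_k / p_k^T A p_k
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_Minv(r)                               # the single M^{-1} apply per step
        rz_new = dot(r, z)
        beta = rz_new / rz                              # beta_k = r_{k+1}^T z_{k+1} / r_k^T z_k
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxiter

if __name__ == "__main__":
    n = 10
    A = [[float(i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    identity = lambda r: list(r)                                  # M = I: plain CG
    jacobi = lambda r: [ri / A[i][i] for i, ri in enumerate(r)]   # M = diag(A)
    _, k_plain = pcg(A, b, identity)
    _, k_jacobi = pcg(A, b, jacobi)
    print(k_plain, k_jacobi)
```

On the diagonal test matrix the Jacobi preconditioner makes $M^{-1}A = I$, so PCG stops after one step, while plain CG has to work through ten distinct eigenvalues. Passing `identity` recovers ordinary CG, which makes side-by-side comparisons easy.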

Classical and modern preconditioners. The Jacobi preconditioner is $M = D = \operatorname{diag}(A)$, undone by one division per entry. The Gauss-Seidel and symmetric SOR (SSOR) preconditioners are built from the stationary splitting $A = D - L - U$ of 43.07.01: Gauss-Seidel uses $M = D - L$, and SSOR uses $M = \frac{1}{\omega(2-\omega)}(D - \omega L)D^{-1}(D - \omega U)$, which for symmetric positive-definite $A$ and $0 < \omega < 2$ is Hermitian positive-definite and therefore suitable for PCG (Exercise 8). Incomplete LU (ILU) and incomplete Cholesky (IC) compute approximate factors $A \approx \bar L\bar U$ by performing Gaussian elimination but discarding fill-in outside a prescribed sparsity pattern, so $M = \bar L\bar U$ keeps $A$'s sparsity. Multigrid, domain decomposition, and sparse approximate inverse preconditioners are treated at a high level in the Advanced results.
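Applying SSOR never requires forming $M$. Computing $z = M^{-1}r$ costs one forward triangular solve with $D - \omega L$, a diagonal scaling by $D$, one backward solve with $D - \omega U$, and the scalar factor $\omega(2 - \omega)$. The following is a minimal dense sketch for real $A$, assuming the splitting convention $A = D - L - U$ above, so that $-L$ and $-U$ are the strictly lower and upper parts of $A$. The function names are hypothetical, and `ssor_matrix` builds $M$ explicitly only so that small examples can be checked.

```python
def ssor_apply(A, r, omega):
    """Return z = M^{-1} r for the SSOR preconditioner
    M = (D - omega L) D^{-1} (D - omega U) / (omega (2 - omega)), where A = D - L - U."""
    n = len(A)
    # Forward solve (D - omega L) y = r; the strictly lower part of D - omega L is omega * a_ij.
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (r[i] - omega * s) / A[i][i]
    # Diagonal scaling w = D y.
    w = [A[i][i] * y[i] for i in range(n)]
    # Backward solve (D - omega U) z = w; the strictly upper part is omega * a_ij.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (w[i] - omega * s) / A[i][i]
    # Undo the 1 / (omega (2 - omega)) scaling of M.
    c = omega * (2.0 - omega)
    return [c * zi for zi in z]

def ssor_matrix(A, omega):
    """Form the SSOR matrix M explicitly (for checking small examples only)."""
    n = len(A)
    lower = [[A[i][i] if i == j else (omega * A[i][j] if j < i else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[A[i][i] if i == j else (omega * A[i][j] if j > i else 0.0) for j in range(n)]
             for i in range(n)]
    c = omega * (2.0 - omega)
    return [[sum(lower[i][k] * upper[k][j] / A[k][k] for k in range(n)) / c for j in range(n)]
            for i in range(n)]

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
r = [1.0, 2.0, 3.0]
z = ssor_apply(A, r, 1.2)
M = ssor_matrix(A, 1.2)
print([sum(M[i][j] * z[j] for j in range(3)) for i in range(3)])  # reproduces r up to rounding
```

With $\omega = 1$ this reduces to the symmetric Gauss-Seidel preconditioner. The cost per apply is about that of one multiply by $A$.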

Counterexamples to common slips

The preconditioned operator is not symmetric in the Euclidean sense. For $A ≻ 0$ and $M ≻ 0$ the matrix $M^{- 1} A$ is self-adjoint in the $M$ -inner product, not in the standard one. PCG works because it tacitly runs in $⟨ \cdot, \cdot ⟩_{M}$ ; applying ordinary CG to the explicitly formed $M^{- 1} A$ would be inconsistent.
Left and right preconditioning are not interchangeable for stopping. Both search the same affine space, $x_0 + \mathcal K_k(M^{-1}A, M^{-1}r_0) = x_0 + M^{-1}\mathcal K_k(AM^{-1}, r_0)$, but a minimal-residual method such as GMRES minimises a different residual norm in each case, so the iterates generally differ. Moreover, the residual a left-preconditioned method monitors is $M^{-1}(b - Ax)$, not $b - Ax$. A stopping test calibrated to the true residual must use right preconditioning or compute $b - Ax$ explicitly.
$M$ must be definite to precondition CG. An indefinite $M$ can destroy the positive-definiteness that CG requires. The inner product $r_k^*z_k = r_k^*M^{-1}r_k$ can then vanish or change sign, so $\alpha_k$ and $\beta_k$ break down or lose their meaning; the denominator $p_k^*Ap_k$ itself stays positive because $A \succ 0$. SSOR and incomplete Cholesky are chosen precisely to keep $M \succ 0$.
A clustered spectrum, not a small norm, is the goal. Making $∥ M^{- 1} A - I ∥$ small is sufficient but not necessary; what the polynomial bound rewards is a few tight eigenvalue clusters. A preconditioner that leaves a handful of outliers but clusters the bulk can still give fast convergence, since the residual polynomial spends a few roots on the outliers.

Key theorem with proof Intermediate+

The signature result is that preconditioned conjugate gradient is exactly ordinary conjugate gradient applied to the symmetrically split operator $\hat{A} = L^{- 1} A L^{-*}$ , so the whole convergence theory of 43.07.04 transfers verbatim with $κ (A)$ replaced by $κ (M^{- 1} A)$ ; and that the split, left, and $M$ -inner-product forms produce the same iterates.

Theorem (PCG equivalence and the preconditioned Chebyshev bound). Let $A \succ 0$ and $M \succ 0$ with a Hermitian factorisation $M = LL^*$. Set $\hat A = L^{-1}AL^{-*}$, $\hat b = L^{-1}b$, and $\hat x = L^*x$. Then $\hat A \succ 0$ is similar to $M^{-1}A$ (so $\sigma(\hat A) = \sigma(M^{-1}A)$), and the iterates $\hat x_k$ produced by ordinary CG on $\hat A\hat x = \hat b$ from $\hat x_0 = L^*x_0$ correspond, under $x_k = L^{-*}\hat x_k$, exactly to the PCG iterates of the Formal definition. Consequently $$ \|x_k - x_\star\|_A \le 2\left(\frac{\sqrt{\hat\kappa} - 1}{\sqrt{\hat\kappa} + 1}\right)^{k}\|x_0 - x_\star\|_A, \qquad \hat\kappa = \kappa(M^{-1}A) = \frac{\lambda_{\max}(M^{-1}A)}{\lambda_{\min}(M^{-1}A)}. $$ ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}

Proof. First, $\hat A = L^{-1}AL^{-*}$ is Hermitian since $A = A^*$, and for $w \ne 0$, $w^*\hat Aw = (L^{-*}w)^*A(L^{-*}w) > 0$ because $A \succ 0$ and $L^{-*}$ is nonsingular, so $\hat A \succ 0$. The similarity $L^*(M^{-1}A)L^{-*} = L^*(LL^*)^{-1}AL^{-*} = L^*L^{-*}L^{-1}AL^{-*} = L^{-1}AL^{-*} = \hat A$ shows that $\hat A$ and $M^{-1}A$ are similar, hence share a spectrum, which is real and positive because $\hat A \succ 0$.

Now run ordinary CG of 43.07.04 on $\hat A\hat x = \hat b$ from $\hat x_0 = L^*x_0$. Its quantities are $\hat r_k = \hat b - \hat A\hat x_k$ and $\hat p_k$, starting from $\hat p_0 = \hat r_0$, with the recurrences $\hat\alpha_k = (\hat r_k^*\hat r_k)/(\hat p_k^*\hat A\hat p_k)$, $\hat x_{k+1} = \hat x_k + \hat\alpha_k\hat p_k$, $\hat r_{k+1} = \hat r_k - \hat\alpha_k\hat A\hat p_k$, $\hat\beta_k = (\hat r_{k+1}^*\hat r_{k+1})/(\hat r_k^*\hat r_k)$, $\hat p_{k+1} = \hat r_{k+1} + \hat\beta_k\hat p_k$. Introduce the untransformed variables $x_k = L^{-*}\hat x_k$, $r_k = L\hat r_k$, $p_k = L^{-*}\hat p_k$, and $z_k = M^{-1}r_k$. Then $\hat r_k = L^{-1}r_k$ (indeed $\hat b - \hat A\hat x_k = L^{-1}(b - Ax_k)$), so $\hat r_k^*\hat r_k = r_k^*L^{-*}L^{-1}r_k = r_k^*(LL^*)^{-1}r_k = r_k^*M^{-1}r_k = r_k^*z_k$. Likewise $\hat p_k^*\hat A\hat p_k = \hat p_k^*L^{-1}AL^{-*}\hat p_k = p_k^*Ap_k$, giving $\hat\alpha_k = (r_k^*z_k)/(p_k^*Ap_k) = \alpha_k$. The iterate update $\hat x_{k+1} = \hat x_k + \hat\alpha_k\hat p_k$ becomes $x_{k+1} = x_k + \alpha_kp_k$ on multiplying by $L^{-*}$; the residual update $\hat r_{k+1} = \hat r_k - \hat\alpha_k\hat A\hat p_k$ becomes, on multiplying by $L$, $r_{k+1} = r_k - \alpha_kAL^{-*}\hat p_k = r_k - \alpha_kAp_k$. The coefficient is $\hat\beta_k = (r_{k+1}^*z_{k+1})/(r_k^*z_k) = \beta_k$, and multiplying $\hat p_{k+1} = \hat r_{k+1} + \hat\beta_k\hat p_k$ by $L^{-*}$ gives $p_{k+1} = L^{-*}\hat r_{k+1} + \beta_kp_k = z_{k+1} + \beta_kp_k$, using $L^{-*}\hat r_{k+1} = L^{-*}L^{-1}r_{k+1} = M^{-1}r_{k+1} = z_{k+1}$. The same identity applied to $\hat p_0 = \hat r_0$ gives $p_0 = z_0$. These are exactly the PCG recurrences, so the iterates coincide.

Finally, ordinary CG on $\hat A$ obeys the Chebyshev bound of 43.07.04 in the $\hat A$-norm: $\|\hat x_k - \hat x_\star\|_{\hat A} \le 2\big((\sqrt{\hat\kappa} - 1)/(\sqrt{\hat\kappa} + 1)\big)^k\|\hat x_0 - \hat x_\star\|_{\hat A}$ with $\hat\kappa = \kappa(\hat A) = \kappa(M^{-1}A)$. The $\hat A$-norm pulls back to the $A$-norm: using $\hat x - \hat x_\star = L^*(x - x_\star)$, $\|\hat x - \hat x_\star\|_{\hat A}^2 = (\hat x - \hat x_\star)^*\hat A(\hat x - \hat x_\star) = (x - x_\star)^*LL^{-1}AL^{-*}L^*(x - x_\star) = (x - x_\star)^*A(x - x_\star) = \|x - x_\star\|_A^2$. Substituting gives the stated bound. $\square$
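The equivalence can be checked numerically in the simplest case, a Jacobi preconditioner $M = D = \operatorname{diag}(A)$ with split factor $L = D^{1/2}$. The sketch below is pure Python, and the small SPD test matrix and starting point are arbitrary illustrative choices. It runs PCG on $Ax = b$ and plain CG on $\hat A\hat x = \hat b$ from $\hat x_0 = L^*x_0$, maps the second sequence back through $x_k = L^{-*}\hat x_k$, and reports the largest discrepancy, which should sit at rounding level.

```python
import math

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cg_iterates(A, b, x0, steps, apply_Minv=lambda r: list(r)):
    """Run `steps` (P)CG iterations and return [x_0, ..., x_steps].
    With the default identity apply_Minv this is ordinary CG."""
    x = list(x0)
    r = [bi - ax for bi, ax in zip(b, matvec(A, x))]
    z = apply_Minv(r)
    p = list(z)
    rz = dot(r, z)
    history = [list(x)]
    for _ in range(steps):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_Minv(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
        history.append(list(x))
    return history

# Small SPD test matrix: symmetric, strictly diagonally dominant, positive diagonal.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 2.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
b = [1.0, 2.0, 3.0, 4.0]
x0 = [1.0, -1.0, 0.5, 0.0]
n = len(b)

d = [A[i][i] for i in range(n)]
s = [math.sqrt(di) for di in d]                                           # L = D^{1/2}, M = D
A_hat = [[A[i][j] / (s[i] * s[j]) for j in range(n)] for i in range(n)]   # L^{-1} A L^{-*}
b_hat = [b[i] / s[i] for i in range(n)]                                   # L^{-1} b
x0_hat = [s[i] * x0[i] for i in range(n)]                                 # L^* x_0

pcg_hist = cg_iterates(A, b, x0, 3, apply_Minv=lambda r: [ri / di for ri, di in zip(r, d)])
cg_hat_hist = cg_iterates(A_hat, b_hat, x0_hat, 3)
mapped = [[xh[i] / s[i] for i in range(n)] for xh in cg_hat_hist]         # x_k = L^{-*} x_hat_k

max_gap = max(abs(u - v) for xa, xb in zip(pcg_hist, mapped) for u, v in zip(xa, xb))
print(max_gap)  # rounding-level
```

Only three steps are taken so that no residual vanishes exactly, which would make $\beta_k$ undefined in this bare-bones loop.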

Bridge. This equivalence is why preconditioning costs nothing in theory yet changes everything in practice. The entire CG apparatus of 43.07.04 (energy-norm optimality, conjugacy, the residual-polynomial reformulation) survives intact under preconditioning, with $\kappa(A)$ simply overwritten by $\kappa(M^{-1}A)$, and this is the foundation for the convergence and design theory of the Advanced results. It makes the spectrum-reshaping principle quantitative: the Chebyshev bound that turned $\kappa$ into $\sqrt{\kappa}$ now runs on the preconditioned condition number. A preconditioner's only job is therefore to shrink or cluster $\sigma(M^{-1}A)$, and the $O(\sqrt{\hat\kappa})$ iteration count generalises the $O(\sqrt{\kappa})$ count of unpreconditioned CG. The split form $\hat A = L^{-1}AL^{-*}$ is dual to the $M$-inner-product form: one symmetrises by changing the operator, the other by changing the geometry, and both yield identical iterates. The GMRES of 43.07.03 has the same structure: preconditioned GMRES runs the Arnoldi-based minimisation on $M^{-1}A$ or $AM^{-1}$, and the spectrum and field of values of the preconditioned operator replace those of $A$. In short, preconditioning is a change of variables chosen to improve a spectrum and executed implicitly, so that the iterates are exactly those of the unpreconditioned method on the better-conditioned operator.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Show that for $A ≻ 0$ and $M ≻ 0$ the preconditioned operator $M^{- 1} A$ is self-adjoint with respect to the $M$ -inner product $⟨ u, v ⟩_{M} = u^{*} M v$ , and that its eigenvalues are real and positive.

Hint

Compute $⟨ M^{- 1} A u, v ⟩_{M}$ and $⟨ u, M^{- 1} A v ⟩_{M}$ and compare; for positivity use the similarity to $L^{- 1} A L^{-*}$ with $M = L L^{*}$ .

Answer

Compute $⟨ M^{- 1} A u, v ⟩_{M} = (M^{- 1} A u)^{*} M v = u^{*} A^{*} M^{-*} M v = u^{*} A v$ since $A^{*} = A$ and $M^{-*} M = I$ ( $M$ Hermitian). Likewise $⟨ u, M^{- 1} A v ⟩_{M} = u^{*} M (M^{- 1} A v) = u^{*} A v$ . The two agree, so $M^{- 1} A$ is self-adjoint in $⟨ \cdot, \cdot ⟩_{M}$ . Writing $M = L L^{*}$ , $M^{- 1} A$ is similar to $\hat{A} = L^{- 1} A L^{-*}$ , which is Hermitian positive-definite (Key theorem), so its eigenvalues are real and positive; similarity preserves the spectrum, hence $σ (M^{- 1} A) \subset (0, \infty)$ . This is why CG run in the $M$ -inner product is well defined: $M^{- 1} A$ behaves like an SPD operator in that geometry.
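Both identities can be spot-checked numerically. The sketch assumes a real SPD $A$ and an SPD $M = GG^{\mathsf T}$ supplied through a lower-triangular factor $G$ with positive diagonal, so that applying $M^{-1}$ costs two triangular solves. The random test data and helper names are illustrative only.

```python
import random

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def forward_solve(G, r):
    """Solve G y = r for lower-triangular G."""
    y = [0.0] * len(r)
    for i in range(len(r)):
        y[i] = (r[i] - sum(G[i][j] * y[j] for j in range(i))) / G[i][i]
    return y

def backward_solve_transpose(G, y):
    """Solve G^T z = y for lower-triangular G (so G^T is upper triangular)."""
    n = len(y)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(G[j][i] * z[j] for j in range(i + 1, n))) / G[i][i]
    return z

def apply_Minv(G, r):
    """M^{-1} r for M = G G^T."""
    return backward_solve_transpose(G, forward_solve(G, r))

def m_inner(G, u, v):
    """<u, v>_M = u^T M v = (G^T u) . (G^T v)."""
    n = len(u)
    Gtu = [sum(G[j][i] * u[j] for j in range(n)) for i in range(n)]
    Gtv = [sum(G[j][i] * v[j] for j in range(n)) for i in range(n)]
    return dot(Gtu, Gtv)

random.seed(1)
n = 5
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
A = [[sum(B[i][k] * B[j][k] for k in range(n)) + (n if i == j else 0.0) for j in range(n)]
     for i in range(n)]                                   # B B^T + n I is SPD
G = [[random.uniform(0.5, 2.0) if i == j else (random.uniform(-1, 1) if j < i else 0.0)
      for j in range(n)] for i in range(n)]               # positive diagonal, so M is SPD
u = [random.uniform(-1, 1) for _ in range(n)]
v = [random.uniform(-1, 1) for _ in range(n)]

lhs = m_inner(G, apply_Minv(G, matvec(A, u)), v)          # <M^{-1} A u, v>_M
rhs = m_inner(G, u, apply_Minv(G, matvec(A, v)))          # <u, M^{-1} A v>_M
print(lhs, rhs, dot(u, matvec(A, v)))                     # all three agree up to rounding
```

Taking $v = u$ in the same computation returns $u^{\mathsf T}Au > 0$, the positivity of the quadratic form of $M^{-1}A$ in the $M$-geometry.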

Exercise 4 (medium, symbolic).

Prove the spectral-equivalence condition-number bound: if $M ≻ 0$ and there are constants $0 < c_{1} \leq c_{2}$ with $c_{1} u^{*} M u \leq u^{*} A u \leq c_{2} u^{*} M u$ for all $u$ , then $κ (M^{- 1} A) \leq c_{2} / c_{1}$ .

Hint

The hypothesis bounds the Rayleigh quotient $u^{*} A u / u^{*} M u$ ; relate it to the eigenvalues of $M^{- 1} A$ via the generalised eigenproblem $A u = λ M u$ .

Answer

The eigenvalues of $M^{-1}A$ are the $\lambda$ solving the generalised eigenproblem $Au = \lambda Mu$ with $u \ne 0$, equivalently the stationary values of the generalised Rayleigh quotient $R(u) = (u^*Au)/(u^*Mu)$. The hypothesis says $c_1 \le R(u) \le c_2$ for every $u \ne 0$. By the Courant-Fischer characterisation 01.01.14 applied to the pencil $(A, M)$, $\lambda_{\min}(M^{-1}A) = \min_u R(u) \ge c_1$ and $\lambda_{\max}(M^{-1}A) = \max_u R(u) \le c_2$. Since $\sigma(M^{-1}A) \subset (0, \infty)$ (Exercise 3), $\kappa(M^{-1}A) = \lambda_{\max}/\lambda_{\min} \le c_2/c_1$. When $c_2/c_1$ is bounded independently of the mesh size (spectral equivalence of $M$ and $A$), the preconditioned iteration count is mesh-independent, the defining property of an optimal preconditioner.

Exercise 6 (medium, symbolic).

Show that left and right preconditioning give the same spectrum: $σ (M^{- 1} A) = σ (A M^{- 1})$ . Conclude that the residual-polynomial convergence bounds of 43.07.03 are identical for the two placements.

Hint

$M^{- 1} A$ and $A M^{- 1}$ are similar via $M$ (or $M^{- 1}$ ); use $B C$ and $C B$ sharing nonzero eigenvalues.

Answer

$AM^{-1} = M(M^{-1}A)M^{-1}$, a similarity transform of $M^{-1}A$ by $M$, so the two matrices have identical spectra: $\sigma(M^{-1}A) = \sigma(AM^{-1})$. (Equivalently, with $B = M^{-1}$ and $C = A$, the square products $BC$ and $CB$ always have the same eigenvalues.) The diagonalisable GMRES bound and the residual-polynomial characterisation of 43.07.03 depend on the operator only through its spectrum (and, for non-normal operators, its eigenvector conditioning, which similarity by $M$ may change). The spectral part of the convergence estimate is therefore the same for left and right preconditioning. The two placements differ in the residual they minimise, $M^{-1}(b - Ax)$ versus $b - Ax$, and it is this, rather than the asymptotic rate, that dictates the choice in practice.
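For a $2 \times 2$ case the claim can be checked directly, since similar matrices share trace and determinant, hence the characteristic polynomial and its roots. The sketch also shows that the residual a left-preconditioned method sees differs from the true one. The matrices are arbitrary illustrative choices and the helper names are hypothetical.

```python
import cmath

def matmul2(X, Y):
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)] for i in range(2)]

def inv2(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def eig2(X):
    """Eigenvalues of a 2x2 matrix from its trace and determinant (possibly complex)."""
    tr = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return sorted([(tr + disc) / 2.0, (tr - disc) / 2.0], key=lambda z: (z.real, z.imag))

A = [[2.0, 1.0], [-1.0, 3.0]]   # nonsymmetric
M = [[4.0, 1.0], [0.0, 1.0]]    # an illustrative nonsingular preconditioner
Minv = inv2(M)

left = matmul2(Minv, A)         # M^{-1} A
right = matmul2(A, Minv)        # A M^{-1}
print(eig2(left))
print(eig2(right))              # same eigenvalues

b, x = [1.0, 1.0], [0.0, 0.0]
true_res = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
left_res = [Minv[i][0] * true_res[0] + Minv[i][1] * true_res[1] for i in range(2)]
print(true_res, left_res)       # b - A x versus M^{-1}(b - A x)
```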

Exercise 7 (hard, symbolic).

Suppose $M^{- 1} A = I + R$ where $R$ has rank $r$ . Prove that preconditioned GMRES (in exact arithmetic) converges to the exact solution in at most $r + 1$ steps, regardless of $n$ .

Hint

$I + R$ has eigenvalue $1$ with multiplicity at least $n - r$ and at most $r$ other eigenvalues, so it has at most $r + 1$ distinct eigenvalues; build the annihilating residual polynomial.

Answer

Since $\operatorname{rank}(R) = r$, $R$ has eigenvalue $0$ with multiplicity at least $n - r$, so $M^{-1}A = I + R$ has eigenvalue $1$ with multiplicity at least $n - r$ and at most $r$ further eigenvalues $\mu_1, \dots, \mu_s$ ($s \le r$). The distinct eigenvalues therefore number at most $s + 1 \le r + 1$. By the residual-polynomial characterisation of 43.07.03, the preconditioned residual satisfies $\|\hat r_k\| = \min_{p(0)=1,\ \deg p \le k}\|p(M^{-1}A)\hat r_0\|$. Let $q(t) = \prod_\nu (t - \nu)$ over the at most $r + 1$ distinct eigenvalues $\nu$ and set $p(t) = q(t)/q(0)$, which is valid since $0 \notin \sigma(M^{-1}A)$ (the operator is nonsingular). Then $\deg p \le r + 1$, $p(0) = 1$, and $p$ vanishes on the whole spectrum, so $p(M^{-1}A) = 0$ when $M^{-1}A$ is diagonalisable, giving $\|\hat r_{r+1}\| = 0$. If $I + R$ is not diagonalisable, replace $q$ by the minimal polynomial of $I + R$. Its degree equals the largest dimension of a Krylov space $\mathcal K(I + R, v) = \operatorname{span}\{v, Rv, R^2v, \dots\}$, which lies in $\operatorname{span}\{v\} + \operatorname{range}(R)$ and so has dimension at most $r + 1$. Hence GMRES terminates in at most $r + 1$ steps. A low-rank correction to the identity is an ideal preconditioning target: it clusters all but a few eigenvalues at $1$.
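The Krylov-dimension argument can be watched directly. The sketch below is pure Python; the size, ranks, random seed and tolerance are illustrative choices. It builds $K = I + UV^{\mathsf T}$ with a rank-$r$ correction and runs the Arnoldi process until it breaks down, that is, until the Krylov space becomes invariant under $K$, which is the point at which GMRES terminates.

```python
import random

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    return dot(u, u) ** 0.5

def krylov_dimension(K, v, tol=1e-10):
    """Dimension of span{v, K v, K^2 v, ...} via Arnoldi.

    Each new vector K q is orthogonalised against the basis by modified Gram-Schmidt,
    done twice for stability; the loop stops when what remains is negligible relative
    to ||K q||, i.e. when the Krylov space has become invariant under K.
    """
    n = len(v)
    nv = norm(v)
    Q = [[vi / nv for vi in v]]
    while len(Q) < n:
        w = matvec(K, Q[-1])
        w_before = norm(w)
        for _ in range(2):
            for q in Q:
                c = dot(q, w)
                w = [wi - c * qi for wi, qi in zip(w, q)]
        w_after = norm(w)
        if w_after <= tol * w_before:
            break
        Q.append([wi / w_after for wi in w])
    return len(Q)

def identity_plus_low_rank(n, r, rng):
    """K = I + U V^T with random n x r factors U, V, so rank(K - I) <= r."""
    U = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    V = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]
    return [[(1.0 if i == j else 0.0) + sum(U[i][k] * V[j][k] for k in range(r))
             for j in range(n)] for i in range(n)]

rng = random.Random(0)
for r in (1, 2, 3):
    K = identity_plus_low_rank(8, r, rng)
    v = [rng.gauss(0.0, 1.0) for _ in range(8)]
    print(r, krylov_dimension(K, v))  # never exceeds r + 1
```

For generic random data the reported dimension is exactly $r + 1$. The threshold `tol` only separates genuine breakdown from rounding noise.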

Exercise 8 (hard, symbolic).

Derive the SSOR preconditioner from the stationary splitting $A = D - L - U$ of 43.07.01 (with $D$ diagonal, $- L$ strictly lower, $- U$ strictly upper, $U = L^{*}$ for symmetric $A$ ), and show that for $0 < ω < 2$ the resulting $M$ is symmetric positive-definite, hence usable in PCG.

Hint

SSOR composes a forward and a backward SOR sweep. The preconditioner is $M = \frac{1}{\omega(2-\omega)}(D - \omega L)D^{-1}(D - \omega U)$; for symmetric $A$, write it as $\frac{1}{\omega(2-\omega)}CD^{-1}C^*$ with $C = D - \omega L$ and use $D \succ 0$.

Answer

A forward SOR sweep has iteration matrix built from $(D - \omega L)$, a backward sweep from $(D - \omega U)$; composing them and scaling to match $A$ gives the SSOR preconditioner $$ M = \frac{1}{\omega(2-\omega)}\,(D - \omega L)\,D^{-1}\,(D - \omega U). $$ For symmetric $A$, $U = L^*$, so $D - \omega U = (D - \omega L)^*$ (as $D = D^*$), and writing $C = D - \omega L$, $$ M = \frac{1}{\omega(2-\omega)}\,C\,D^{-1}\,C^*. $$ $M$ is Hermitian because $(CD^{-1}C^*)^* = CD^{-1}C^*$ ($D^{-1}$ is Hermitian). It is positive-definite when the scalar $1/[\omega(2-\omega)]$ is positive and $D \succ 0$. For SPD $A$ the diagonal satisfies $D \succ 0$, and $\omega(2-\omega) > 0$ exactly for $0 < \omega < 2$. Moreover $C = D - \omega L$ is nonsingular (lower-triangular with the positive diagonal of $D$), so for $u \ne 0$, $u^*Mu = \tfrac{1}{\omega(2-\omega)}(C^*u)^*D^{-1}(C^*u) > 0$. Thus $M \succ 0$, and $M = GG^*$ with $G = CD^{-1/2}/\sqrt{\omega(2-\omega)}$ supplies the split factorisation that PCG needs. SSOR therefore turns the stationary splitting of 43.07.01, too slow as a solver, into a valid symmetric preconditioner for CG.
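A numerical spot check of this argument is to form the SSOR matrix $M$ for a small SPD $A$ and attempt a Cholesky factorisation, which succeeds exactly when a symmetric matrix is positive-definite. Outside $0 < \omega < 2$ the scalar $1/[\omega(2-\omega)]$ is negative and the factorisation should fail. The test matrix and helper names are illustrative assumptions, and the code assumes real $A$.

```python
import math

def ssor_matrix(A, omega):
    """M = (D - omega L) D^{-1} (D - omega U) / (omega (2 - omega)) for A = D - L - U (real A)."""
    n = len(A)
    lower = [[A[i][i] if i == j else (omega * A[i][j] if j < i else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[A[i][i] if i == j else (omega * A[i][j] if j > i else 0.0) for j in range(n)]
             for i in range(n)]
    c = omega * (2.0 - omega)
    return [[sum(lower[i][k] * upper[k][j] / A[k][k] for k in range(n)) / c for j in range(n)]
            for i in range(n)]

def is_positive_definite(M):
    """Attempt a Cholesky factorisation of symmetric M; return False at a nonpositive pivot."""
    n = len(M)
    G = [[0.0] * n for _ in range(n)]
    for j in range(n):
        pivot = M[j][j] - sum(G[j][k] ** 2 for k in range(j))
        if pivot <= 0.0:
            return False
        G[j][j] = math.sqrt(pivot)
        for i in range(j + 1, n):
            G[i][j] = (M[i][j] - sum(G[i][k] * G[j][k] for k in range(j))) / G[j][j]
    return True

A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [-1.0, 0.0, -1.0, 4.0]]
for omega in (0.5, 1.0, 1.5, 1.9, 2.5):
    M = ssor_matrix(A, omega)
    symmetric = all(abs(M[i][j] - M[j][i]) < 1e-12 for i in range(4) for j in range(4))
    print(omega, symmetric, is_positive_definite(M))  # SPD for omega in (0, 2), not for 2.5
```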

Advanced results Master

Theorem 1 (spectral equivalence and mesh-independent convergence). *Let $\{A_h\}$ be a family of SPD matrices arising from discretising an elliptic operator on a mesh of size $h$, with $\kappa(A_h) = O(h^{-2}) \to \infty$ as $h \to 0$. A preconditioner family $\{M_h\}$ is spectrally equivalent to $\{A_h\}$ if there are constants $0 < c_1 \le c_2$, independent of $h$, with $c_1u^*M_hu \le u^*A_hu \le c_2u^*M_hu$ for all $u$. Then $\kappa(M_h^{-1}A_h) \le c_2/c_1$ uniformly in $h$, and PCG converges to a fixed tolerance in a number of iterations bounded independently of $h$.* The bound is the condition-number estimate of Exercise 4 made uniform; a preconditioner achieving it is called optimal, and multigrid and certain domain-decomposition preconditioners achieve it for the model elliptic problems ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}.

Theorem 2 (existence of incomplete factorisations for M-matrices). Let $A$ be a symmetric M-matrix (symmetric positive-definite with nonpositive off-diagonal entries), and fix a sparsity pattern $P \supseteq \{(i, i)\}$. The incomplete Cholesky factorisation $A = \bar L\bar L^* - R$, computed by Cholesky elimination with all fill-in outside $P$ set to zero, exists with $\bar L$ having real positive diagonal, and the preconditioner $M = \bar L\bar L^*$ is symmetric positive-definite. The Meijerink-van der Vorst theorem guarantees that every pivot stays positive during incomplete elimination. In an M-matrix each fill entry is nonpositive, so discarding it only raises an off-diagonal entry toward zero, and a Z-matrix that dominates an M-matrix entrywise is again an M-matrix. Every intermediate Schur complement therefore keeps a positive diagonal. The same argument gives ILU($0$) existence for nonsymmetric M-matrices. This is the theoretical license for the ICCG method that made preconditioned CG the standard sparse SPD solver ^{[Meijerink, J. A. & van der Vorst, H. A. — An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix]}.
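A minimal dense sketch of IC(0), with pattern $P$ equal to the nonzero pattern of $A$, applied to the five-point Laplacian on a $3 \times 3$ grid, a standard symmetric M-matrix. The code records every pivot before its square root is taken, so the positivity guaranteed by the theorem can be observed. It also checks the characteristic IC(0) property that $\bar L\bar L^{\mathsf T}$ reproduces $A$ exactly on the pattern, while the dropped fill shows up off it. Dense storage and the helper names are illustrative only; production codes work on sparse structures.

```python
import math

def ic0(A):
    """Incomplete Cholesky with zero fill for real SPD A.
    Returns (Lbar, pivots); raises ValueError on a nonpositive pivot."""
    n = len(A)
    pattern = {(i, j) for i in range(n) for j in range(i + 1) if A[i][j] != 0.0}
    L = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    pivots = []
    for k in range(n):
        pivots.append(L[k][k])
        if L[k][k] <= 0.0:
            raise ValueError("nonpositive pivot at step %d" % k)
        L[k][k] = math.sqrt(L[k][k])
        for i in range(k + 1, n):
            if (i, k) in pattern:
                L[i][k] /= L[k][k]
        # Schur-complement update restricted to the pattern: fill outside P is dropped.
        for j in range(k + 1, n):
            for i in range(j, n):
                if (i, j) in pattern:
                    L[i][j] -= L[i][k] * L[j][k]
    return L, pivots

def laplacian_2d(m):
    """Five-point Laplacian on an m x m grid (Dirichlet), size m^2: a symmetric M-matrix."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for r in range(m):
        for c in range(m):
            i = r * m + c
            A[i][i] = 4.0
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= rr < m and 0 <= cc < m:
                    A[i][rr * m + cc] = -1.0
    return A

A = laplacian_2d(3)
Lbar, pivots = ic0(A)
n = len(A)
M = [[sum(Lbar[i][k] * Lbar[j][k] for k in range(n)) for j in range(n)] for i in range(n)]
on_pattern = max(abs(M[i][j] - A[i][j]) for i in range(n) for j in range(n) if A[i][j] != 0.0)
off_pattern = max(abs(M[i][j]) for i in range(n) for j in range(n) if A[i][j] == 0.0)
print(min(pivots) > 0.0)  # True: every pivot stayed positive
print(on_pattern)         # rounding level: M matches A on the pattern
print(off_pattern)        # nonzero: the dropped fill appears off the pattern
```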

Theorem 3 (incomplete-factorisation hierarchy and fill control). ILU($0$) retains exactly the sparsity of $A$, discarding all fill. ILU($p$) retains fill of level at most $p$, where original nonzeros have level $0$ and a fill entry generated from entries of levels $a$ and $b$ has level $a + b + 1$ (minimised over all ways the entry is generated). ILUT($\tau, \ell$) drops entries below a relative threshold $\tau$ and keeps at most $\ell$ per row. Increasing $p$ or lowering $\tau$ makes $M = \bar L\bar U$ a more accurate approximation of $A$, which typically clusters $\sigma(M^{-1}A)$ more tightly near $1$ and reduces the iteration count, at the cost of denser factors and more expensive applies. The hierarchy makes the central tradeoff explicit: the preconditioner's quality (how clustered it renders the spectrum) is bought with its construction and per-application cost, and the optimum minimises total time, not iteration count ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.

Theorem 4 (multigrid, domain decomposition, and sparse approximate inverse as preconditioners). Three modern families supply preconditioners whose applies are matrix-free or embarrassingly parallel. Multigrid uses a hierarchy of grids with smoothing (a stationary iteration of 43.07.01) on each level and coarse-grid correction. One V-cycle defines a linear operator $M^{-1}$ for which, on the model Poisson problem, $M$ is spectrally equivalent to $A$, giving an optimal $O(n)$ preconditioner. Domain decomposition (additive Schwarz) splits the domain into overlapping subdomains, solves local problems, and sums the corrections: $M^{-1} = \sum_i R_i^*A_i^{-1}R_i$, where $R_i$ restricts to subdomain $i$ and $A_i = R_iAR_i^*$. A coarse space is added to bound the condition number independently of the number of subdomains. Sparse approximate inverse (SPAI) computes a sparse $M \approx A^{-1}$ directly by minimising $\|I - AM\|_F$, which decouples column by column (or $\|I - MA\|_F$, which decouples row by row), into independent small least-squares problems of 43.04.01. The result is a preconditioner whose apply is a single sparse matrix-vector product and so is fully parallel. Each family trades a different resource (grid hierarchy, subdomain solves, or up-front least-squares work) for a clustered preconditioned spectrum ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.
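The smallest instance of SPAI takes a diagonal sparsity pattern for $M$. Column $j$ of the least-squares problem is then one-dimensional, $\min_m\|e_j - m\,a_j\|_2$ with $a_j$ the $j$-th column of $A$, and its solution is $m_{jj} = a_{jj}/\|a_j\|_2^2$ in the real case. The following is a minimal sketch under that assumption; the test matrix and helper names are illustrative. Because each column is solved optimally, the result can be no worse in $\|I - AM\|_F$ than the Jacobi choice $M = D^{-1}$, which has the same pattern.

```python
def diagonal_spai(A):
    """Diagonal M minimising ||I - A M||_F (real A), one column at a time:
    m_jj = a_jj / ||a_j||^2, where a_j is column j of A."""
    n = len(A)
    return [A[j][j] / sum(A[i][j] ** 2 for i in range(n)) for j in range(n)]

def frobenius_residual(A, m):
    """||I - A diag(m)||_F, using (A diag(m))_ij = a_ij * m_j."""
    n = len(A)
    return sum(((1.0 if i == j else 0.0) - A[i][j] * m[j]) ** 2
               for i in range(n) for j in range(n)) ** 0.5

A = [[4.0, -1.0, 0.0],
     [-2.0, 5.0, -1.0],
     [0.0, -3.0, 6.0]]
m_spai = diagonal_spai(A)
m_jacobi = [1.0 / A[j][j] for j in range(len(A))]
print(frobenius_residual(A, m_spai), frobenius_residual(A, m_jacobi))  # SPAI is no larger
```

Richer patterns lead to small dense least-squares problems per column, and these remain independent of one another, which is the source of SPAI's parallelism.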

Synthesis. Preconditioning is one operation, viewed under one invariant. The operation replaces $A$ by an implicit, never-formed operator $M^{-1}A$ with an improved spectrum. The invariant is the spectrum of the operator the Krylov method actually runs on. The method works because the residual-polynomial convergence theory of 43.07.04 and 43.07.03 depends on the operator only through that spectrum (for non-normal operators, together with its eigenvector conditioning), so reshaping the spectrum is the entire lever. This is the Chebyshev and min-max polynomial machinery of those units read backward. Where CG turned $\kappa$ into $\sqrt{\kappa}$ and rewarded clustered spectra, preconditioning manufactures the small $\kappa(M^{-1}A)$ or the tight clusters that the polynomial bound rewards, and the $O(\sqrt{\kappa(M^{-1}A)})$ iteration count of preconditioned CG generalises the $O(\sqrt{\kappa(A)})$ count of plain CG. The split preconditioner $L^{-1}AL^{-*}$ is dual to the $M$-inner-product formulation: change the operator or change the geometry, and the iterates are identical. This duality is the bridge back to the stationary splittings of 43.07.01, which were too weak to solve but are strong enough to precondition: Jacobi, Gauss-Seidel, and SSOR all return here as $M$. The spectral radius of their iteration matrices creeps toward one as the mesh is refined, which ruins them as solvers but is harmless in a preconditioner, whose only task is to approximate $A$.

Putting these together, the chapter closes a circle. The stationary methods of 43.07.01 supplied the splittings; the Krylov methods of 43.07.02, 43.07.03, and 43.07.04 supplied the optimal polynomial acceleration whose rate is set by the spectrum; and preconditioning fuses the two, using a cheap splitting or incomplete factorisation or multilevel operator to reshape the spectrum so the optimal acceleration needs only a handful of steps. The condition number $κ$ of 43.01.02 is the quantity every member of the family manipulates, and the design space — Jacobi to incomplete Cholesky to multigrid — is a ladder of cost-versus-clustering tradeoffs, with the optimal preconditioner of Theorem 1 the one whose $κ (M^{- 1} A)$ is bounded independently of the problem size while its apply stays cheap, converting the divergent $O (h^{- 1})$ iteration count of the raw discretised operator into a constant.

Full proof set Master

Proposition 1 (split symmetrisation preserves definiteness and spectrum). For $A \succ 0$ and $M = LL^* \succ 0$, the split operator $\hat A = L^{-1}AL^{-*}$ is Hermitian positive-definite and similar to $M^{-1}A$, so $\sigma(M^{-1}A) = \sigma(\hat A) \subset (0, \infty)$.

Proof. $\hat A^* = (L^{-1}AL^{-*})^* = L^{-1}A^*L^{-*} = L^{-1}AL^{-*} = \hat A$, so $\hat A$ is Hermitian. For $w \ne 0$, $w^*\hat Aw = (L^{-*}w)^*A(L^{-*}w) > 0$ since $A \succ 0$ and $L^{-*}$ is nonsingular, so $\hat A \succ 0$. The similarity $L^*(M^{-1}A)L^{-*} = L^*(LL^*)^{-1}AL^{-*} = L^{-1}AL^{-*} = \hat A$ gives $\sigma(M^{-1}A) = \sigma(\hat A)$, real and positive by $\hat A \succ 0$. $\square$

Proposition 2 (PCG equals CG on the split operator). The PCG recurrences of the Formal definition produce iterates $x_k = L^{-*}\hat x_k$, where $\hat x_k$ are the ordinary CG iterates on $\hat A\hat x = \hat b$ with $\hat b = L^{-1}b$ and $\hat x_0 = L^*x_0$.

Proof. This is the Key theorem's computation: substituting $\hat r_k = L^{-1}r_k$, $\hat p_k = L^*p_k$, $z_k = M^{-1}r_k$ into the CG recurrences for $\hat A$ reproduces the PCG coefficients $\alpha_k = (r_k^*z_k)/(p_k^*Ap_k)$, $\beta_k = (r_{k+1}^*z_{k+1})/(r_k^*z_k)$ and the updates $x_{k+1} = x_k + \alpha_kp_k$, $r_{k+1} = r_k - \alpha_kAp_k$, $p_{k+1} = z_{k+1} + \beta_kp_k$, using $r_k^*z_k = \hat r_k^*\hat r_k$ and $p_k^*Ap_k = \hat p_k^*\hat A\hat p_k$. The map $\hat x_k \mapsto L^{-*}\hat x_k$ is the change of variable. $\square$

Proposition 3 (preconditioned Chebyshev bound). Under the hypotheses of Proposition 1, $\|x_k - x_\star\|_A \leq 2\bigl((\sqrt{\hat\kappa} - 1)/(\sqrt{\hat\kappa} + 1)\bigr)^{k} \|x_0 - x_\star\|_A$ with $\hat\kappa = \kappa(M^{-1}A)$.

Proof. CG on $\hat{A} \succ 0$ obeys the Chebyshev bound of 43.07.04 in the $\hat{A}$-norm with $\kappa(\hat{A}) = \kappa(M^{-1} A) = \hat\kappa$ (Proposition 1). The norm identity $\|\hat x - \hat x_\star\|_{\hat A}^{2} = (L^{*}(x - x_\star))^{*} \hat{A} (L^{*}(x - x_\star)) = (x - x_\star)^{*} L L^{-1} A L^{-*} L^{*} (x - x_\star) = \|x - x_\star\|_{A}^{2}$ transports the bound from the $\hat{A}$-norm to the $A$-norm via Proposition 2. $\square$
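This is a numerical sanity check of the bound, under illustrative assumptions. The matrix is a badly scaled $A = SKS$, where $S$ is diagonal with entries from $1$ to $100$ and $K = \operatorname{tridiag}(-1, 2.5, -1)$ is well conditioned. The preconditioner is Jacobi, $M = \operatorname{diag}(A)$. All of these choices, and the helper `pcg_energy_errors`, are hypothetical. In this setup $M^{-1}A$ is similar to $K/2.5$, so $\hat\kappa$ is small even though $\kappa(A)$ is large. The $A$-norm errors of PCG should stay below the CG Chebyshev bound $2\bigl((\sqrt{\hat\kappa}-1)/(\sqrt{\hat\kappa}+1)\bigr)^{k}\|x_0 - x_\star\|_A$.

```python
import numpy as np
from scipy.linalg import eigh

def pcg_energy_errors(A, b, apply_Minv, x_star, iters):
    """Run PCG from x_0 = 0 and record ||x_k - x_star||_A for k = 0, ..., iters."""
    energy = lambda e: np.sqrt(e @ A @ e)
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    errs = [energy(x - x_star)]
    for _ in range(iters):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
        errs.append(energy(x - x_star))
    return np.array(errs)

n = 60
K = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # eigenvalues in (0.5, 4.5)
S = np.diag(np.logspace(0, 2, n))                         # scaling entries from 1 to 100
A = S @ K @ S
d = np.diag(A)                                            # Jacobi preconditioner M = diag(A)
b = np.ones(n)
x_star = np.linalg.solve(A, b)

lam = eigh(A, np.diag(d), eigvals_only=True)              # spectrum of the pencil A u = lambda M u
kappa_hat = lam[-1] / lam[0]
rho = (np.sqrt(kappa_hat) - 1) / (np.sqrt(kappa_hat) + 1)

k_max = 12
errs = pcg_energy_errors(A, b, lambda r: r / d, x_star, k_max)
bound = 2 * rho ** np.arange(k_max + 1) * errs[0]
print(f"kappa(A) = {np.linalg.cond(A):.2e}, kappa_hat = {kappa_hat:.2f}")
print("bound respected at every step:", bool(np.all(errs <= bound)))
```

The run is capped at a dozen steps so that the bound stays well above rounding level. Past that point, a comparison of errors against the bound would mostly measure floating-point noise.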

Proposition 4 (spectral-equivalence condition-number bound). *If $c_{1} u^{*} M u \leq u^{*} A u \leq c_{2} u^{*} M u$ for all $u$ with $M ≻ 0$ , $A ≻ 0$ , then $κ (M^{- 1} A) \leq c_{2} / c_{1}$ .*

Proof. The eigenvalues of $M^{-1} A$ are the stationary values of the generalised Rayleigh quotient $R(u) = (u^{*} A u) / (u^{*} M u)$ (the pencil $A u = \lambda M u$). The hypothesis gives $c_1 \leq R(u) \leq c_2$, so by Courant-Fischer 01.01.14 applied to the pencil, $\lambda_{\min}(M^{-1} A) \geq c_1$ and $\lambda_{\max}(M^{-1} A) \leq c_2$. Since $\sigma(M^{-1}A) \subset (0, \infty)$ (Proposition 1), this gives $\kappa = \lambda_{\max} / \lambda_{\min} \leq c_2 / c_1$. $\square$
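The mesh independence that Proposition 4 delivers shows up in a 1D variable-coefficient diffusion model. The coefficient $a(x) = 1 + \tfrac12\sin^{2}(2\pi x)$ and the helper `stiffness_1d` are hypothetical choices. Take $A = D^{T}\operatorname{diag}(a)D$ and the constant-coefficient $M = D^{T}D$. Then $u^{T}Au/u^{T}Mu$ is a weighted average of the edge values of $a$, so $c_1 = \min a$ and $c_2 = \max a$ are valid constants. Consequently $\kappa(M^{-1}A) \leq c_2/c_1 \leq 1.5$ on every mesh, even though $\kappa(A)$ grows as the mesh is refined.

```python
import numpy as np
from scipy.linalg import eigh

def stiffness_1d(a):
    """D^T diag(a) D on n = len(a) - 1 interior nodes, zero Dirichlet ends, h scaling dropped."""
    n = len(a) - 1
    D = np.zeros((n + 1, n))   # (D u)_i = u_{i+1} - u_i, with node values u_0 = u_{n+1} = 0
    for i in range(n + 1):
        if i < n:
            D[i, i] = 1.0
        if i > 0:
            D[i, i - 1] = -1.0
    return D.T @ (a[:, None] * D)

results = []
for n in [16, 64, 256]:
    mid = (np.arange(n + 1) + 0.5) / (n + 1)        # edge midpoints in (0, 1)
    a = 1.0 + 0.5 * np.sin(2 * np.pi * mid) ** 2    # coefficient with values in [1, 1.5]
    A = stiffness_1d(a)
    M = stiffness_1d(np.ones(n + 1))                # constant-coefficient preconditioner
    lam = eigh(A, M, eigvals_only=True)             # eigenvalues of M^{-1} A via the pencil
    c1, c2 = a.min(), a.max()
    cond_A = np.linalg.cond(A)
    results.append((n, lam[0], lam[-1], c1, c2, cond_A))
    print(f"n = {n:4d}  kappa(A) = {cond_A:9.1f}  "
          f"kappa(M^-1 A) = {lam[-1] / lam[0]:.3f}  bound c2/c1 = {c2 / c1:.3f}")
```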

Proposition 5 (low-rank preconditioner gives finite GMRES termination). If $M^{-1} A = I + R$ with $\operatorname{rank}(R) = r$ and $M^{-1} A$ nonsingular, then preconditioned GMRES reaches the exact solution in at most $r + 1$ steps in exact arithmetic.

Proof. $R$ maps $\mathbb{C}^{n}$ into $\operatorname{range}(R)$, an $R$-invariant subspace of dimension $r$. Let $\mu$ be the minimal polynomial of the restriction of $R$ to $\operatorname{range}(R)$. Then $\deg \mu \leq r$ and $\mu(R)R = 0$, so $s(z) = z\,\mu(z)$ annihilates $R$, and $q(z) = s(z - 1)$ annihilates $I + R = M^{-1}A$, with $t = \deg q \leq r + 1$. Each root of $q$ is either $1$ or $1 + \lambda$ for an eigenvalue $\lambda$ of $R$. In the second case it is an eigenvalue of $M^{-1}A$ and hence nonzero, because $M^{-1}A$ is nonsingular. So $q(0) \neq 0$, and $p(z) = q(z)/q(0)$ is well defined with $p(0) = 1$, $\deg p = t \leq r + 1$, and $p(M^{-1}A) = 0$. No diagonalisability of $M^{-1}A$ is needed. The residual-polynomial bound of 43.07.03 gives $\|\hat r_t\| \leq \|p(M^{-1}A)\,\hat r_0\| = 0$, so GMRES terminates by step $t \leq r + 1$. $\square$
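Proposition 5 can be illustrated with a minimal GMRES sketch: plain Arnoldi with modified Gram-Schmidt, written here for illustration rather than taken from any library. The operator $B = I + UV^{T}/n$ stands in for a hypothetical $M^{-1}A$ with a rank-$r$ perturbation, and the helper name `gmres_residuals` is likewise illustrative. Starting from $x_0 = 0$, the residual norm should drop to rounding level at step $r + 1$, where the Krylov space becomes invariant.

```python
import numpy as np

def gmres_residuals(B, b, kmax, tol=1e-12):
    """GMRES from x_0 = 0 on B x = b; returns the residual norm after steps 1, 2, ..."""
    n = b.size
    beta = np.linalg.norm(b)
    Q = np.zeros((n, kmax + 1))
    H = np.zeros((kmax + 1, kmax))
    Q[:, 0] = b / beta
    residuals = []
    for k in range(kmax):
        w = B @ Q[:, k]
        for j in range(k + 1):                 # modified Gram-Schmidt against the Krylov basis
            H[j, k] = Q[:, j] @ w
            w = w - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        rhs = np.zeros(k + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], rhs, rcond=None)   # min ||beta e_1 - H_k y||
        residuals.append(np.linalg.norm(rhs - H[:k + 2, :k + 1] @ y))
        if H[k + 1, k] <= tol * beta:          # Krylov space is invariant: exact solution reached
            break
        Q[:, k + 1] = w / H[k + 1, k]
    return residuals

rng = np.random.default_rng(3)
n, r = 50, 3
U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))
B = np.eye(n) + U @ V.T / n        # I + R with rank(R) = r (generically nonsingular)
b = rng.standard_normal(n)

res = gmres_residuals(B, b, kmax=10)
print("relative residuals:", [f"{x / np.linalg.norm(b):.1e}" for x in res])
```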

Proposition 6 (incomplete Cholesky exists for symmetric M-matrices). Let $A$ be a symmetric M-matrix and $P$ a sparsity pattern containing the diagonal. The incomplete Cholesky factorisation with fill restricted to $P$ produces $\bar L$ with positive diagonal, and $M = \bar L\bar L^{*} \succ 0$.

Proof sketch. Incomplete Cholesky performs the recursion $a_{ij} \leftarrow a_{ij} - \sum_{k < \min(i,j)} \bar l_{ik} \bar l_{jk}$ but sets to zero any update at a position outside $P$. For an M-matrix the off-diagonal entries are $\leq 0$. The Schur complements of the exact factorisation stay M-matrices with positive diagonal, since the exact pivots are positive because a symmetric M-matrix is SPD. Inductively, the off-diagonal factor entries $\bar l_{ik}$, $i > k$, are $\leq 0$. A fill update $-\bar l_{ik} \bar l_{jk}$ at $(i, j) \notin P$ is therefore nonpositive, and dropping it leaves that entry no more negative than in the exact factorisation. By induction, every entry of each incomplete Schur complement is at least the corresponding exact entry. The off-diagonal entries are no more negative, so the diagonal reductions $\bar l_{jk}^{2}$ they induce at later steps are no larger. Each incomplete pivot is therefore at least as large as the corresponding exact pivot, hence positive. The induction (Meijerink-van der Vorst) thus shows that every diagonal entry of $\bar L$ is real and positive. Hence $\bar L$ is nonsingular and $M = \bar L \bar L^{*}$ is Hermitian positive-definite, with $u^{*} M u = \|\bar L^{*} u\|_{2}^{2} > 0$ for $u \neq 0$. $\square$
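Below is a dense, unoptimised sketch of incomplete Cholesky with the zero-fill pattern $P = \{(i,j) : a_{ij} \neq 0\}$, usually called IC(0). It is applied to the 2D five-point Laplacian, which is a symmetric M-matrix. The helper names `laplacian_2d` and `ic0` are hypothetical, and a practical code would use sparse storage. The run should show three things. First, $\bar L$ has a strictly positive diagonal. Second, $M = \bar L\bar L^{T}$ is positive-definite. Third, $\bar L\bar L^{T}$ agrees with $A$ on the pattern and differs from it only at dropped fill positions.

```python
import numpy as np

def laplacian_2d(m):
    """Five-point Laplacian on an m-by-m interior grid (h scaling dropped): a symmetric M-matrix."""
    T = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    return np.kron(T, I) + np.kron(I, T)

def ic0(A):
    """Incomplete Cholesky with fill restricted to the nonzero pattern of A (dense storage)."""
    n = A.shape[0]
    P = A != 0                       # allowed positions; contains the diagonal
    S = A.astype(float).copy()       # running incomplete Schur complement
    L = np.zeros_like(S)
    for k in range(n):
        if S[k, k] <= 0:
            raise ValueError(f"nonpositive pivot at step {k}")
        L[k, k] = np.sqrt(S[k, k])
        rows = np.arange(k + 1, n)
        rows = rows[P[rows, k]]      # entries of column k that lie in the pattern
        L[rows, k] = S[rows, k] / L[k, k]
        for i in rows:               # rank-one update, applied only inside the pattern
            for j in rows:
                if P[i, j]:
                    S[i, j] -= L[i, k] * L[j, k]
    return L

A = laplacian_2d(8)
P = A != 0
Lbar = ic0(A)
M = Lbar @ Lbar.T
print("min diagonal of Lbar:", Lbar.diagonal().min())
print("min eigenvalue of M:", np.linalg.eigvalsh(M).min())
print("M matches A on the pattern:", np.allclose(M[P], A[P]))
print("largest |M - A| (dropped fill):", np.abs(M - A).max())
```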

Connections Master

The conjugate gradient method 43.07.04 is the solver that preconditioning most directly serves. By the Key theorem, preconditioned CG is ordinary CG run on the symmetrically split operator $L^{-1} A L^{-*}$, or equivalently CG on $M^{-1}A$ in the $M$-inner product. The energy-norm optimality, the conjugacy of search directions, and the Chebyshev convergence bound therefore all carry over with $\kappa(A)$ replaced by $\kappa(M^{-1} A)$. That unit identifies a clustered-spectrum acceleration: the count of distinct or well-separated eigenvalues governs the true rate. A preconditioner engineers exactly this clustering, which is why incomplete-Cholesky-preconditioned CG (ICCG) is a standard solver for the sparse SPD systems of discretised elliptic PDEs.
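The ICCG effect can be seen on a model problem in the sketch below, which makes several assumptions. It uses a dense 2D five-point Laplacian on a $20 \times 20$ grid, a right-hand side of ones, and a relative-residual tolerance of $10^{-8}$. The helpers `ic0` (zero-fill incomplete Cholesky) and `pcg_count` are hypothetical. Plain CG is PCG with $M = I$. ICCG applies $M^{-1} = \bar L^{-T}\bar L^{-1}$ through two triangular solves with `scipy.linalg.solve_triangular`. The exact iteration counts depend on these choices, so they are illustrative only.

```python
import numpy as np
from scipy.linalg import solve_triangular

def laplacian_2d(m):
    """Five-point Laplacian on an m-by-m interior grid (h scaling dropped)."""
    T = 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)
    I = np.eye(m)
    return np.kron(T, I) + np.kron(I, T)

def ic0(A):
    """Zero-fill incomplete Cholesky: fill restricted to the nonzero pattern of A."""
    n = A.shape[0]
    P = A != 0
    S = A.astype(float).copy()
    L = np.zeros_like(S)
    for k in range(n):
        L[k, k] = np.sqrt(S[k, k])
        rows = np.arange(k + 1, n)
        rows = rows[P[rows, k]]
        L[rows, k] = S[rows, k] / L[k, k]
        for i in rows:
            for j in rows:
                if P[i, j]:
                    S[i, j] -= L[i, k] * L[j, k]
    return L

def pcg_count(A, b, apply_Minv, tol=1e-8, maxit=1000):
    """PCG from x_0 = 0; stops when ||r_k|| <= tol * ||b||. Returns (iterations, x)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return k, x
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return maxit, x

m = 20
A = laplacian_2d(m)
b = np.ones(m * m)
Lbar = ic0(A)

def apply_ic(r):
    # M^{-1} r = Lbar^{-T} (Lbar^{-1} r): forward substitution, then backward substitution
    y = solve_triangular(Lbar, r, lower=True)
    return solve_triangular(Lbar.T, y, lower=False)

its_cg, x_cg = pcg_count(A, b, lambda r: r)
its_ic, x_ic = pcg_count(A, b, apply_ic)
print(f"CG: {its_cg} iterations, ICCG: {its_ic} iterations")
```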
GMRES 43.07.03 is the nonsymmetric counterpart: preconditioned GMRES runs the Arnoldi-based minimal-residual minimisation on $M^{- 1} A$ (left) or $A M^{- 1}$ (right), and the diagonalisable convergence bound and field-of-values analysis of that unit apply to the preconditioned operator. The any-curve theorem there warns that for non-normal $M^{- 1} A$ the spectrum alone need not predict convergence, so preconditioner design for GMRES targets the field of values or pseudospectrum, not merely the eigenvalues; the flexible variant FGMRES accommodates a preconditioner that changes from step to step, such as an inner Krylov solve.
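A small sketch can compare the operators seen by left- and right-preconditioned GMRES. It assumes a hypothetical nonsymmetric test matrix and the hypothetical preconditioner $M$ equal to the upper triangle of $A$, a Gauss-Seidel-type choice. Since $AM^{-1} = M(M^{-1}A)M^{-1}$, the two operators are similar and share their eigenvalues. After solving the right-preconditioned system $AM^{-1}u = b$, the solution is recovered as $x = M^{-1}u$. Dense solves stand in for the Krylov iteration here.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
A = rng.standard_normal((n, n)) + n * np.eye(n)   # illustrative nonsymmetric matrix
M = np.triu(A)                                     # hypothetical preconditioner: upper triangle of A
b = rng.standard_normal(n)

left = np.linalg.solve(M, A)                       # M^{-1} A (left preconditioning)
right = A @ np.linalg.inv(M)                       # A M^{-1} (right preconditioning)

# Similar matrices share a characteristic polynomial, hence their eigenvalues.
same_spectrum = np.allclose(np.poly(left), np.poly(right))

u = np.linalg.solve(right, b)                      # solve A M^{-1} u = b
x = np.linalg.solve(M, u)                          # recover x = M^{-1} u
print("same spectrum:", same_spectrum, " residual:", np.linalg.norm(A @ x - b))
```

Equal spectra do not mean equal GMRES behaviour. Left preconditioning minimises the preconditioned residual $M^{-1}(b - Ax)$, while right preconditioning minimises the true residual $b - Ax$. For non-normal operators the two can also differ in their field of values.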
Stationary iterative methods: Jacobi, Gauss-Seidel, SOR 43.07.01 supply the classical preconditioners: the splitting $A = M - N$ that was a slow solver on its own — its iteration matrix $M^{- 1} N$ having spectral radius creeping to one as the grid refines — becomes the preconditioner $M$ whose only job is to approximate $A$ cheaply, not to converge alone. Jacobi ( $M = D$ ), Gauss-Seidel, and the symmetric SSOR factorisation built from $A = D - L - U$ all reappear here, and the SSOR construction yields the symmetric positive-definite $M$ that preconditioned CG requires, fusing the stationary and Krylov families that the chapter developed separately.
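Here is a sketch of the SSOR preconditioner built from the splitting $A = D - L - U$. It assumes the scaled form $M_\omega = \frac{1}{\omega(2-\omega)}(D - \omega L)D^{-1}(D - \omega U)$ and the 1D model matrix $\operatorname{tridiag}(-1, 2, -1)$. The helper `ssor_matrix` and the chosen values of $\omega$ are illustrative. For symmetric $A$ we have $U = L^{T}$, so $M_\omega = \frac{1}{\omega(2-\omega)}(D - \omega L)D^{-1}(D - \omega L)^{T}$ is symmetric, and it is positive-definite for $0 < \omega < 2$. The code checks both properties and compares $\kappa(M_\omega^{-1}A)$ with $\kappa(A)$.

```python
import numpy as np
from scipy.linalg import eigh

def ssor_matrix(A, omega):
    """Scaled SSOR preconditioner for A = D - L - U (L, U strictly lower/upper, signs as in the splitting)."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, -1)
    U = -np.triu(A, 1)
    return (D - omega * L) @ np.linalg.solve(D, D - omega * U) / (omega * (2.0 - omega))

n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam_A = np.linalg.eigvalsh(A)
kappa_A = lam_A[-1] / lam_A[0]

kappas = {}
for omega in [0.5, 1.0, 1.5]:
    M = ssor_matrix(A, omega)
    lam = eigh(A, M, eigvals_only=True)   # eigenvalues of M^{-1} A; needs M symmetric positive-definite
    kappas[omega] = lam[-1] / lam[0]
    print(f"omega = {omega}: symmetric = {np.allclose(M, M.T)}, "
          f"min eig(M) = {np.linalg.eigvalsh(M).min():.3f}, kappa(M^-1 A) = {kappas[omega]:.1f}")
print(f"kappa(A) = {kappa_A:.1f}")
```

With $\omega = 1$ (symmetric Gauss-Seidel) on this model matrix, $M = A + \tfrac12\operatorname{diag}(0, 1, \dots, 1)$. The preconditioned condition number is therefore noticeably smaller than $\kappa(A)$, although it still grows as the grid is refined.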
Conditioning and condition numbers of problems 43.01.02 furnishes the figure of merit the whole subject optimises. Preconditioning is the deliberate manipulation of the condition number $\kappa(M^{-1} A)$, and the spectral-equivalence theory measures a preconditioner by whether $\kappa(M^{-1} A)$ is bounded independently of the discretisation parameter. The least-squares machinery of 43.04.01 underlies the sparse-approximate-inverse preconditioner, which minimises $\|I - MA\|_F$ over a prescribed sparsity pattern. Since $\|I - MA\|_F^{2}$ is a sum of independent terms, one per row of $M$, each row solves its own small least-squares problem. The right-preconditioned form $\min\|I - AM\|_F$ decouples by columns instead.
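The row-by-row decoupling can be sketched under the following assumptions: dense storage, a pattern for $M$ taken equal to that of $A$, an illustrative nonsymmetric banded test matrix, and a hypothetical helper `spai_rows`. Real sparse-approximate-inverse codes restrict each small problem to the few relevant rows and columns, and this sketch does not. Each row $m_i^{T}$ of $M$ solves $\min\|A^{T}m_i - e_i\|_2$ over vectors supported on the allowed positions of row $i$. The Jacobi choice $M = D^{-1}$ lies inside the same pattern, so the least-squares optimum can do no worse than Jacobi in $\|I - MA\|_F$.

```python
import numpy as np

def spai_rows(A, pattern):
    """Minimise ||I - M A||_F over M with the given sparsity pattern, one row at a time."""
    n = A.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        J = np.flatnonzero(pattern[i])          # allowed nonzero columns in row i of M
        e_i = np.zeros(n)
        e_i[i] = 1.0
        # Row i of M A is m_J^T A[J, :], so solve min || A[J, :]^T m_J - e_i ||_2.
        m_J, *_ = np.linalg.lstsq(A[J, :].T, e_i, rcond=None)
        M[i, J] = m_J
    return M

rng = np.random.default_rng(4)
n = 30
A = 4.0 * np.eye(n) - np.eye(n, k=1) - 2.0 * np.eye(n, k=-1) + 0.5 * np.eye(n, k=3)
A += np.diag(rng.uniform(0.0, 2.0, n))          # illustrative diagonally dominant, nonsymmetric matrix
pattern = A != 0

M_spai = spai_rows(A, pattern)
M_jacobi = np.diag(1.0 / np.diag(A))
I = np.eye(n)
err_spai = np.linalg.norm(I - M_spai @ A, "fro")
err_jacobi = np.linalg.norm(I - M_jacobi @ A, "fro")
row_terms = [np.linalg.norm(I[i] - M_spai[i] @ A) ** 2 for i in range(n)]
print(f"||I - MA||_F: SPAI {err_spai:.3f}, Jacobi {err_jacobi:.3f}")
```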

Historical & philosophical context Master

The idea of transforming a linear system to improve the convergence of an iteration predates the Krylov methods: David Young's 1950 thesis on successive over-relaxation, and the splitting framework of Richard Varga's 1962 Matrix Iterative Analysis, established that the spectral radius of $M^{- 1} N$ controls a stationary iteration, which is the conceptual seed of using $M$ as a preconditioner rather than as a standalone solver. The word preconditioning in its modern sense, and the recognition that it should be paired with conjugate gradients rather than with a stationary iteration, crystallised in the 1970s.

The decisive step was the 1977 paper of Johannes Meijerink and Henk van der Vorst, which introduced the incomplete Cholesky factorisation as a preconditioner, proved its existence and stability for symmetric M-matrices, and demonstrated that incomplete-Cholesky-preconditioned conjugate gradient (ICCG) solved the sparse SPD systems of discretised PDEs far faster than any stationary method ^{[Meijerink, J. A. & van der Vorst, H. A. — An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix]}. ICCG converted conjugate gradients from a finite direct method that had disappointed in finite-precision arithmetic (the revival narrative of 43.07.04) into the dominant iterative solver for elliptic problems. The multigrid method of Achi Brandt (1977), originally a standalone solver, was soon recognised as a spectrally optimal preconditioner. The additive-Schwarz domain-decomposition framework was placed on a rigorous condition-number footing through the 1980s and 1990s by Olof Widlund, Maksymilian Dryja, and collaborators. Yousef Saad's textbook consolidated the algebra of left, right, and split preconditioning, the ILU hierarchy, and approximate-inverse methods into the unified account standard today ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}. Anne Greenbaum's monograph gave a systematic account of the spectral-equivalence convergence theory that distinguishes optimal from merely useful preconditioners ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}.

Bibliography Master

@article{MeijerinkVanDerVorst1977,
  author  = {Meijerink, J. A. and van der Vorst, H. A.},
  title   = {An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric {M}-Matrix},
  journal = {Mathematics of Computation},
  volume  = {31},
  number  = {137},
  year    = {1977},
  pages   = {148--162}
}

@book{Saad2003precond,
  author    = {Saad, Yousef},
  title     = {Iterative Methods for Sparse Linear Systems},
  edition   = {2},
  publisher = {SIAM},
  year      = {2003}
}

@book{Greenbaum1997precond,
  author    = {Greenbaum, Anne},
  title     = {Iterative Methods for Solving Linear Systems},
  series    = {Frontiers in Applied Mathematics},
  publisher = {SIAM},
  year      = {1997}
}

@book{Varga1962,
  author    = {Varga, Richard S.},
  title     = {Matrix Iterative Analysis},
  publisher = {Prentice-Hall},
  year      = {1962}
}

@article{Brandt1977,
  author  = {Brandt, Achi},
  title   = {Multi-Level Adaptive Solutions to Boundary-Value Problems},
  journal = {Mathematics of Computation},
  volume  = {31},
  number  = {138},
  year    = {1977},
  pages   = {333--390}
}

@book{ElmanSilvesterWathen2014,
  author    = {Elman, Howard C. and Silvester, David J. and Wathen, Andrew J.},
  title     = {Finite Elements and Fast Iterative Solvers},
  edition   = {2},
  publisher = {Oxford University Press},
  year      = {2014}
}

@article{Benzi2002,
  author  = {Benzi, Michele},
  title   = {Preconditioning Techniques for Large Linear Systems: A Survey},
  journal = {Journal of Computational Physics},
  volume  = {182},
  number  = {2},
  year    = {2002},
  pages   = {418--477}
}

@book{TrefethenBau1997precond,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

Prerequisites

43.07.03
43.07.04
43.07.01

Tier anchors

beginner: Reweighting a stretched, awkward problem into a round, well-behaved one before you start solving it, so a fast method that loves round problems finishes in a handful of steps — Shewchuk 1994 *An Introduction to the Conjugate Gradient Method Without the Agonizing Pain* (CMU technical report) §12 (the preconditioner as a change of coordinates that makes the bowl rounder); Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) §11.5 (preconditioners and why they accelerate iterations)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 40 (preconditioning: solving M⁻¹Ax = M⁻¹b with M ≈ A cheap to invert, left/right/split forms, the spectrum-clustering convergence rationale, and the catalogue of classical preconditioners); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) §9.1-9.3 (preconditioned CG and GMRES, the symmetry-preserving split preconditioner, Jacobi/SSOR) and Ch. 10 (ILU factorisations)
master: Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 9-10 (the algebra of left/right/split preconditioning, preconditioned CG in the M-inner-product, ILU(0)/ILU(p)/ILUT incomplete factorisations and their existence for M-matrices, approximate-inverse preconditioners) and Ch. 13 (multigrid and domain decomposition as preconditioners); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 8-12 (preconditioners, incomplete Cholesky, the spectral-equivalence convergence theory, and multilevel/Schwarz methods); Elman-Silvester-Wathen 2014 *Finite Elements and Fast Iterative Solvers* 2e (Oxford) Ch. 2-4 (optimal and spectrally equivalent preconditioners for PDE operators)

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lecture 40: preconditioning of iterative methods. The transformed system M^{-1} A x = M^{-1} b with a preconditioner M chosen so that (i) M^{-1} A is close to the identity / has a clustered or low-condition-number spectrum and (ii) systems M z = r are cheap to solve; left preconditioning M^{-1} A x = M^{-1} b, right preconditioning A M^{-1} u = b with x = M^{-1} u, and split / symmetric preconditioning M = L L^* giving L^{-1} A L^{-*} to preserve symmetry and positive-definiteness for CG; the convergence rationale that Krylov convergence is governed by the spectrum / conditioning of the operator the method actually sees, so reshaping that spectrum is the whole point; the classical preconditioners Jacobi (diagonal), Gauss-Seidel/SSOR built on the stationary splittings, and incomplete LU / Cholesky; the cost-versus-iteration-reduction tradeoff.
Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.) · SIAM, 2003. Ch. 9: preconditioned conjugate gradient and preconditioned GMRES — the derivation that PCG is CG run in the M-inner-product so that it never forms the nonsymmetric M^{-1}A explicitly but only applies M^{-1} once per step, the equivalence of left/right/split preconditioning, and the flexible-GMRES variant FGMRES for a varying preconditioner. Ch. 10: incomplete LU factorisations A ≈ L̄ Ū + R with a prescribed sparsity pattern — ILU(0), ILU(p) by level-of-fill, and the threshold variant ILUT; existence of the incomplete factorisation for M-matrices (Meijerink-van der Vorst); incomplete Cholesky IC for the SPD case; sparse approximate inverse (SPAI) preconditioners minimising ‖I - M A‖_F. Ch. 13: multigrid and Schwarz domain-decomposition methods used as preconditioners.
Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997) · Ch. 8-12: the convergence theory of preconditioned iterations — the spectral-equivalence notion (c1 (M x, x) ≤ (A x, x) ≤ c2 (M x, x) bounding κ(M^{-1}A) ≤ c2/c1 independently of the mesh), the resulting mesh-independent iteration count of an optimal preconditioner, incomplete-Cholesky and modified-IC preconditioners, the comparison of preconditioners by the clustering of the spectrum of M^{-1}A, and multilevel / additive- and multiplicative-Schwarz domain-decomposition preconditioners with their condition-number bounds.
Meijerink, J. A. & van der Vorst, H. A. — An Iterative Solution Method for Linear Systems of Which the Coefficient Matrix is a Symmetric M-Matrix · Mathematics of Computation 31 (1977), 148-162: the incomplete Cholesky / incomplete LU factorisation with a fixed sparsity pattern, the proof that the incomplete factorisation exists and is stable for a symmetric M-matrix (every pivot stays positive), and the ICCG method — incomplete-Cholesky-preconditioned conjugate gradient — as the first widely successful preconditioned Krylov solver for the sparse SPD systems of discretised PDEs.

Estimated time

beginner: 18m
intermediate: 45m
master: 90m