43.07.03 · numerical-analysis / 07-iterative-krylov-methods

GMRES

shipped3 tiersLean: none

Anchor (Master): Saad & Schultz 1986 *SIAM J. Sci. Stat. Comput.* 7(3):856-869 (the original GMRES paper — optimality, finite termination, the minimal-residual property); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 3 (convergence via polynomial approximation, the non-normal subtlety); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) §6.5-6.11 and §6.30 (restarted GMRES, stagnation, field-of-values bounds)

Intuition Beginner

Suppose you face a giant system of equations , so large you can never write the matrix down, but you can do one cheap thing: hand the matrix a vector and get back the matrix times that vector. From this one power you grow a small space of candidate answers — the Krylov space of 43.07.02 — built from your starting guess and its successive products by the matrix. The question of this unit is: among all the candidate answers in that growing space, which one is best?

GMRES answers with a precise rule. For any candidate answer , the residual measures how badly the equation is missed: a residual of zero means is exact. GMRES picks, at each stage, the candidate whose residual is as short as possible. That is why it is called the generalised minimal residual method. As the space grows by one direction each stage, the best residual can only get shorter or stay the same — never longer. So the method marches steadily toward the answer.

There is a clever bookkeeping trick that makes this cheap. Building the Krylov space the tidy way (the Arnoldi process) turns the giant minimisation into a tiny one: at stage you only solve a small lopsided system with rows and columns. And you can read off how short the current residual is without ever assembling the full candidate answer — a single number falls out of the bookkeeping for free, so you know when to stop before paying to form the solution.

GMRES works for a general matrix, even a lopsided non-symmetric one. When the matrix is symmetric and positive-definite, a cheaper cousin called conjugate gradients does the same job with short-term memory; GMRES is the price you pay for generality.

Visual Beginner

The picture is a growing space of candidate answers, with GMRES standing at the true target and, at each stage, picking the candidate closest to it — so the leftover miss shrinks step by step.

Read the table top to bottom. Each row is one stage. The middle column is the length of the shortest residual achievable from the space built so far; it never grows. The right column notes the size of the tiny least-squares problem solved at that stage — always one row taller than it is wide.

stage shortest residual length small problem solved
1
2
3
4

The takeaway: GMRES is "find the nearest point of a growing space to a fixed target". Because each new space contains the last, the nearest point can only get nearer. The work at each stage is a small, tall least-squares fit, and the size of the leftover at the bottom of that fit is exactly the residual length you are driving to zero.

Worked example Beginner

Take the small system $$ A = \begin{pmatrix} 2 & 1 \ 0 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \ 1 \end{pmatrix}, $$ and start from the guess , so the starting residual is , of length .

Stage 1 (the first Krylov direction). The first tidied direction is rescaled to length one: . GMRES looks for the best candidate of the form — a single number to choose. We want to make as short as possible.

Compute . The residual of the candidate is .

Choosing to minimise the length of is a one-variable least-squares fit. Setting the derivative of the squared length to zero: the squared length is with , whose derivative is , zero at . The shortest residual then has squared length , so .

What this tells us: one stage cut the residual from down to about , just by choosing the single best step along the first Krylov direction. A second stage would add the next direction and drive the residual to zero, since the space then fills all of the plane. Each stage solved only a tiny minimisation, never touching except through one matrix-times-vector product.

Check your understanding Beginner

Formal definition Intermediate+

Let with be nonsingular and possibly nonsymmetric, let , and fix an initial guess with residual . The -th GMRES iterate is the unique minimiser $$ x_m = \operatorname*{arg,min}_{x \in x_0 + \mathcal{K}_m(A, r_0)} |b - A x|_2, \qquad \mathcal{K}_m(A, r_0) = \operatorname{span}{r_0, A r_0, \dots, A^{m-1} r_0}, $$ where is the Krylov subspace of 43.07.02 generated by the residual. Equivalently, writing with , GMRES minimises over [Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems].

The Arnoldi reduction. Run the Arnoldi process of 43.07.02 on with starting vector . It produces, before breakdown, an orthonormal basis of with , , together with the upper-Hessenberg matrix satisfying the Arnoldi relation . Parametrise with . Then $$ r_0 - A z = \beta q_1 - A Q_m y = \beta q_1 - Q_{m+1}\tilde H_m y = Q_{m+1}\big(\beta e_1 - \tilde H_m y\big), $$ using . Since has orthonormal columns it preserves the Euclidean norm, so $$ |b - A x|_2 = |r_0 - A z|_2 = \big|,\beta e_1 - \tilde H_m y,\big|2. $$ The large minimisation collapses to the small least-squares problem $\min{y}|\beta e_1 - \tilde H_m y|_2y_mx_m = x_0 + Q_m y_m|\beta e_1 - \tilde H_m y_m|_2$.

Givens-rotation solution and the cheap residual. Because is upper-Hessenberg, its QR factorisation is built by plane (Givens) rotations , each zeroing one subdiagonal entry — the incremental QR of 43.04.01. Applying the accumulated rotations to yields a vector whose first entries feed a triangular back-substitution for and whose last entry satisfies $$ |b - A x_m|2 = |g{m+1}|. $$ The residual norm is therefore available at every stage without forming : one monitors and only assembles once a stopping tolerance is met. Each new stage appends one rotation, so the update is incremental.

Restarted GMRES. Storing costs and the orthogonalisation cost grows like ; for large this is prohibitive. Restarted GMRES, written GMRES, runs stages, forms , discards , and restarts the whole process with and . This caps storage at at the cost of losing the global optimality across restart boundaries.

Counterexamples to common slips

  • GMRES minimises the residual, not the error. The quantity made small is , not . For an ill-conditioned a small residual can coexist with a large error; the error bound carries a factor .
  • Breakdown is a feature. The Arnoldi subdiagonal vanishing, , is a lucky breakdown: it signals that is -invariant and that is the exact solution, residual zero. Unlike the nonsymmetric Lanczos process, GMRES cannot suffer a serious breakdown.
  • The spectrum alone need not predict convergence. For a non-normal , eigenvalues clustered away from the origin do not guarantee fast GMRES convergence; the eigenvector conditioning , the field of values, or the pseudospectrum enters. The diagonalisable bound is only a bound, and a loose one when is ill-conditioned.
  • Restarted GMRES can stagnate. Plain (full) GMRES converges in at most steps in exact arithmetic, but GMRES can stall short of the solution, the residual flat across many restart cycles, when the discarded directions were the ones that mattered.

Key theorem with proof Intermediate+

The signature result is that the GMRES minimisation over the Krylov subspace reduces exactly to the small Hessenberg least-squares problem, that the residual norm is the last rotated component, and that the residual is monotonically non-increasing with finite termination.

Theorem (GMRES least-squares reduction, monotonicity, and finite termination). Let be nonsingular, , , and run Arnoldi on producing and the unreduced Hessenberg . Then the GMRES iterate satisfies $$ x_m = x_0 + Q_m y_m, \qquad y_m = \operatorname*{arg,min}_{y \in \mathbb{F}^m}\big|\beta e_1 - \tilde H_m y\big|_2, \qquad |b - A x_m|2 = \min_y\big|\beta e_1 - \tilde H_m y\big|2. $$ *The residual norms are non-increasing, $|b - A x{m+1}|2 \le |b - A x_m|2\nu \le nr_0h{\nu+1,\nu} = 0x\nu = x\star$ exactly* [Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)].

Proof. The reduction is the computation of the Formal definition. Every is for a unique since has full column rank. The Arnoldi relation and give $$ b - A x = r_0 - A Q_m y = Q_{m+1}(\beta e_1 - \tilde H_m y). $$ The columns of are orthonormal, so for every ; hence . Minimising the left side over is therefore the same as minimising the right side over , which is the stated least-squares problem; its minimiser yields and residual norm equal to the optimal value.

Monotonicity follows from the nesting . The feasible set for stage is contained in that for stage , so the minimum of over the larger set is at most the minimum over the smaller: .

For finite termination, the Krylov dimensions strictly increase by one until the grade of — the least with , established in 43.07.02 — and at that step the Arnoldi remainder vanishes, , so is the square Hessenberg with . The least-squares problem becomes the square system . Since is nonsingular and is -invariant, the restriction of to is nonsingular, so is nonsingular and the system has a solution with zero residual. Then , so , and because .

Bridge. This reduction is the foundational reason GMRES is practical at all: it builds toward the convergence theory of the Advanced results by replacing an -dimensional minimisation with an Hessenberg least-squares problem whose answer carries its own error estimate, the last rotated Givens component. This is exactly the Arnoldi factorisation of 43.07.02 read as a solver rather than an eigenvalue probe — there the projected supplies Ritz values, here the augmented supplies the residual-minimising step — and the monotone residual decrease generalises the monotone Rayleigh-quotient bound of that unit from extremal eigenvalues to the full residual. The least-squares inner solve is exactly the incremental Givens QR of 43.04.01 applied to a Hessenberg matrix, so the per-step cost is one rotation. The construction appears again in 43.07.04, where for symmetric positive-definite the same Krylov optimality is achieved with a short three-term recurrence instead of a growing orthonormal basis; putting these together, the central insight is that GMRES is "least squares on the Arnoldi Hessenberg", and the bridge is that every quantity a practitioner needs — the iterate, the residual norm, the stopping decision — is read off this one small projected problem one column at a time.

Exercises Intermediate+

Advanced results Master

Theorem 1 (minimal-residual optimality and the residual polynomial). Among all the GMRES iterate is the unique minimiser of , and its residual obeys the optimality characterisation $$ |r_m|2 = \min{p \in P_m,, p(0)=1}|p(A) r_0|_2, $$ where is the space of polynomials of degree at most . Uniqueness is convexity: is a strictly convex function on the range of the full-column-rank (an unreduced Hessenberg has rank ), so its minimiser is unique, and the affine map transports uniqueness to . The polynomial form follows because the GMRES residual is with , and minimising over the subspace is minimising over such polynomials. This recasts GMRES convergence as a problem in approximation theory on : how well can a degree- polynomial, pinned to value one at the origin, annihilate [Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems].

Theorem 2 (diagonalisable convergence bound and the clustering heuristic). If is diagonalisable, then $$ \frac{|r_m|2}{|r_0|2} \le \kappa_2(V)\ \min{p\in P_m,, p(0)=1}\ \max{\lambda \in \sigma(A)}|p(\lambda)|. $$ The minimax factor is small whenever the spectrum splits into a few tight clusters bounded away from the origin: a polynomial with one root near each cluster of total degree equal to the number of clusters is already tiny on , so GMRES converges in roughly as many stages as there are clusters. The factor quantifies the eigenvector conditioning; for normal it is one and the bound is essentially tight, but for non-normal it can be astronomically large, and then the bound says little. The clustering heuristic — "GMRES converges fast when the eigenvalues cluster" — is reliable for normal or mildly non-normal matrices and is exactly where preconditioning aims, to reshape into clusters [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].

Theorem 3 (field-of-values bound for the non-normal case). Let $W(A) = {x^ A x : |x|_2 = 1}A0 \notin W(A)\nu = \operatorname{dist}(0, W(A)) > 0W(A)RW(A)\sigma(A)$ alone; a representative Elman-type bound is* $$ \frac{|r_m|_2}{|r_0|_2} \le \Big(1 - \frac{\nu^2}{|A|2^2}\Big)^{m/2}. $$ When is far from normal the spectrum can be deeply misleading — eigenvalues bunched away from zero while the field of values straddles the origin, in which case GMRES stagnates despite a "good" spectrum. The field of values, and more finely the pseudospectrum $\sigma\varepsilon(A) = {z : |(zI - A)^{-1}|_2 \ge \varepsilon^{-1}}$, are the correct convergence-controlling sets: a min-max polynomial small on a pseudospectral region enclosing the spectrum at a safe distance from the origin yields a genuine residual bound by a contour-integral (Cauchy) estimate [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)].

Theorem 4 (any-curve theorem; the spectrum does not determine the GMRES curve). Given any set of nonzero complex numbers (a prescribed spectrum) and any non-increasing sequence , there exists a matrix with that spectrum and a right-hand side such that GMRES applied to produces exactly that residual sequence. The construction (Greenbaum, Pták, Strakoš) decouples the eigenvalues from the convergence history entirely: any admissible residual curve is compatible with any prescribed spectrum, so no theorem bounding GMRES convergence by the eigenvalues alone can exist for general matrices. This is the sharp statement of why the diagonalisable bound's prefactor cannot be removed and why non-normal convergence analysis must invoke the field of values or pseudospectra. For the symmetric case the situation reverts: for the eigenvector matrix is unitary, the prefactor is one, and the spectrum does determine the convergence curve [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)].

Synthesis. GMRES is one construction — minimise over the affine Krylov space — viewed under one invariant, the residual polynomial with , and the foundational reason the method works is that this minimisation is, by the Arnoldi factorisation of 43.07.02, a small Hessenberg least-squares problem whose residual norm is read off a single Givens component. This is exactly the Arnoldi eigenvalue engine of that unit turned to a different end: there the projected delivers Ritz values approximating the spectrum, here the augmented delivers the residual-minimising step, and the same nesting that made the extremal Ritz values monotone makes the GMRES residual monotone. The polynomial recasting of Theorem 1 generalises the static eigenvalue-approximation picture into a dynamical approximation-theory one: the convergence rate is a min-max polynomial value on a spectral set, and the central insight is that whether that set is the spectrum, the field of values, or the pseudospectrum is decided by the normality of .

The non-normal subtlety is where the bridge to honest practice is built: Theorem 2's clustering heuristic is reliable only through the prefactor , Theorem 3 replaces the spectrum by the field of values when is far from normal, and Theorem 4 shows the eigenvalues alone determine nothing about the GMRES curve for general matrices — the any-curve construction is dual to the diagonalisable bound's looseness, each expressing that is not . Putting these together, GMRES is the general-matrix residual-minimiser whose symmetric-positive-definite specialisation reappears in 43.07.04 as conjugate gradients with a short recurrence and a spectrum-determined error bound; the bridge is that the stationary-iteration splitting of 43.07.01, powerless on its own against a spectral radius creeping to one, returns here as a preconditioner whose only job is to cluster the spectrum or shrink the field of values so the GMRES residual polynomial can do its work in a handful of stages.

Full proof set Master

Proposition 1 (least-squares reduction and norm identity). With from Arnoldi on and , the GMRES iterate is with , and .

Proof. Each is for a unique because has orthonormal (hence independent) columns spanning . The Arnoldi relation and give . Orthonormality of yields . Minimising both sides over their (corresponding) feasible sets is the same problem; the minimiser gives and the residual identity.

Proposition 2 (monotonicity of the residual). for all before termination.

Proof. The feasible affine spaces are nested, , since . The minimum of a fixed function over a larger set cannot exceed its minimum over a subset, so .

Proposition 3 (lucky breakdown equals exact solution). If at stage and is nonsingular, then with , and this is the only way GMRES can break down.

Proof. Breakdown closes the Arnoldi relation to , so is -invariant. Restricted to the -dimensional invariant , the nonsingular acts bijectively, so is nonsingular and has the exact solution . The residual is . No serious breakdown is possible: the Arnoldi step fails only by producing the zero remainder , which is precisely , and that case has just been shown to be convergence. (Contrast the two-sided nonsymmetric Lanczos process, where a serious breakdown — a vanishing bilinear product with nonzero vectors — can halt progress without convergence.)

Proposition 4 (finite termination at the grade). Let be the grade of with respect to . Then GMRES produces the exact solution at stage , and , with the degree of the minimal polynomial of .

Proof. By 43.07.02, the Krylov dimensions strictly increase until the grade , where is -invariant and . Proposition 3 then gives . The grade is the degree of the minimal polynomial of restricted to the cyclic subspace generated by , which divides , so . In exact arithmetic GMRES is therefore a finite method; its value is that the residual usually falls below tolerance long before stage .

Proposition 5 (residual-polynomial optimality and the diagonalisable bound). The GMRES residual satisfies , and for diagonalisable , .

Proof. The Krylov subspace is , so any has residual with , and is exactly the image of under (bijectively, the inverse being , polynomial since ). Minimising over equals . For the bound, and gives since is diagonal. Taking the minimum over admissible and dividing by yields the stated inequality.

Proposition 6 (restarted GMRES need not be monotone across cycles in the error, and may stagnate). Full GMRES residuals are monotone and reach zero by stage ; restarted GMRES residuals are monotone within each cycle of stages but the method can stagnate, with constant across many cycles when .

Proof sketch. Within one cycle, GMRES is full GMRES on the current residual, so Proposition 2 gives monotone decrease across its stages. At a restart the basis is discarded and Arnoldi begins afresh from the new residual , whose Krylov subspace need not contain the previously useful directions. If the optimal degree- residual polynomial requires roots that no degree- polynomial can supply — as when the spectrum has more than well-separated clusters, or when is non-normal so that the field of values straddles the origin — then each restarted cycle achieves the same modest reduction and the residual plateaus. The any-curve theorem (Theorem 4) shows such stagnation profiles are realisable. Stagnation is the price of the bounded storage ; remedies include larger , preconditioning to cluster the spectrum, and augmenting the restart subspace with retained approximate eigenvectors (deflated/augmented GMRES).

Connections Master

  • Krylov subspaces, Arnoldi, and Lanczos 43.07.02 is the engine GMRES runs on: the orthonormal basis and the Hessenberg factorisation built there are exactly what turns the -dimensional residual minimisation into the small least-squares problem solved here. The Ritz-value residual identity of that unit and the GMRES residual identity are two readings of the same subdiagonal — eigenvalue accuracy there, linear-solve accuracy here — and the lucky breakdown means exact invariance in both, exact eigenpairs there, exact solution here.

  • Least squares: normal equations, QR, and conditioning 43.04.01 supplies the inner solve: the Hessenberg least-squares problem is solved by the incremental Givens QR of that unit, one plane rotation per stage zeroing one subdiagonal, with the last rotated component of giving the residual norm for free. The conditioning theory there is what warns that GMRES minimises the residual, not the error: a small controls the error only up to , the very condition number that unit analyses.

  • The conjugate gradient method 43.07.04 is the symmetric-positive-definite specialisation of the Krylov-solver idea developed here: where GMRES needs the full growing orthonormal basis to minimise the residual for a general nonsymmetric , conjugate gradients exploits the short three-term Lanczos recurrence available for to minimise the energy norm of the error with storage and no stored basis. Both inherit the Krylov-subspace optimality and the polynomial-approximation convergence picture, but CG's spectrum is real and its eigenvector matrix unitary, so for CG the spectrum determines the convergence curve — the any-curve pathology of GMRES (Theorem 4) cannot occur — and the field-of-values machinery collapses back to the eigenvalue interval.

  • Stationary iterative methods: Jacobi, Gauss-Seidel, SOR 43.07.01 returns as the preconditioner: a splitting that was a weak solver on its own — its spectral radius creeping to one as the grid refines — becomes the operator that GMRES applies to cluster near one or shrink its field of values, converting the slow convergence of even optimal SOR into the handful of stages a well-clustered GMRES needs. The clustering heuristic of Theorem 2 is precisely the design target of that preconditioning.

Historical & philosophical context Master

GMRES was introduced by Yousef Saad and Martin H. Schultz in 1986, in a paper that gave the nonsymmetric linear-system problem its first robust minimal-residual Krylov method with a guaranteed non-increasing residual and finite termination [Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems]. It superseded a generation of earlier nonsymmetric attempts — the unstable variants of the biconjugate gradient family and the ORTHODIR/ORTHOMIN methods — by anchoring the iterate to the numerically reliable Arnoldi orthogonalisation rather than to the short but fragile two-sided recurrences. The restart device GMRES appeared in the same paper as the concession to finite memory, and the stagnation it can suffer has driven the subsequent literature on deflation, augmentation, and flexible preconditioning.

The convergence theory matured more slowly and more surprisingly. The naive expectation — inherited from the symmetric conjugate-gradient case — was that clustered eigenvalues away from the origin guarantee fast convergence. Anne Greenbaum, Vlastimil Pták, and Zdeněk Strakoš showed in 1996 that this expectation is false for nonsymmetric matrices in the strongest possible sense: any non-increasing residual curve is attainable by some matrix with any prescribed spectrum, so the eigenvalues alone carry no information about the GMRES history [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]. This redirected the analysis toward the field of values, for which Howard Elman's 1982 thesis bound and later refinements give genuine rates, and toward the pseudospectra developed by Lloyd N. Trefethen and Mark Embree, whose contour-integral estimates bound the residual polynomial on a region enclosing the spectrum at a controlled distance from the origin [Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]. The textbook consolidation of the algorithm, its preconditioned and restarted forms, and the polynomial-approximation convergence framework is in Saad's Iterative Methods for Sparse Linear Systems [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].

Bibliography Master

@article{SaadSchultz1986,
  author  = {Saad, Yousef and Schultz, Martin H.},
  title   = {{GMRES}: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems},
  journal = {SIAM Journal on Scientific and Statistical Computing},
  volume  = {7},
  number  = {3},
  year    = {1986},
  pages   = {856--869}
}

@book{Saad2003,
  author    = {Saad, Yousef},
  title     = {Iterative Methods for Sparse Linear Systems},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2003}
}

@book{Greenbaum1997,
  author    = {Greenbaum, Anne},
  title     = {Iterative Methods for Solving Linear Systems},
  series    = {Frontiers in Applied Mathematics},
  publisher = {SIAM},
  year      = {1997}
}

@article{GreenbaumPtakStrakos1996,
  author  = {Greenbaum, Anne and Pt{\'a}k, Vlastimil and Strako{\v{s}}, Zden{\v{e}}k},
  title   = {Any Nonincreasing Convergence Curve Is Possible for {GMRES}},
  journal = {SIAM Journal on Matrix Analysis and Applications},
  volume  = {17},
  number  = {3},
  year    = {1996},
  pages   = {465--469}
}

@book{TrefethenEmbree2005,
  author    = {Trefethen, Lloyd N. and Embree, Mark},
  title     = {Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators},
  publisher = {Princeton University Press},
  year      = {2005}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@phdthesis{Elman1982,
  author = {Elman, Howard C.},
  title  = {Iterative Methods for Large, Sparse, Nonsymmetric Systems of Linear Equations},
  school = {Yale University},
  year   = {1982}
}