43.07.03 · numerical-analysis / 07-iterative-krylov-methods

GMRES

Anchor (Master): Saad & Schultz 1986 *SIAM J. Sci. Stat. Comput.* 7(3):856-869 (the original GMRES paper — optimality, finite termination, the minimal-residual property); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 3 (convergence via polynomial approximation, the non-normal subtlety); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) §6.5-6.11 and §6.30 (restarted GMRES, stagnation, field-of-values bounds)

Intuition Beginner

Suppose you face a giant system of equations $A x = b$ , so large you can never write the matrix down, but you can do one cheap thing: hand the matrix a vector and get back the matrix times that vector. From this one power you grow a small space of candidate answers — the Krylov space of 43.07.02 — built from your starting guess and its successive products by the matrix. The question of this unit is: among all the candidate answers in that growing space, which one is best?

GMRES answers with a precise rule. For any candidate answer $x$ , the residual $b - A x$ measures how badly the equation is missed: a residual of zero means $x$ is exact. GMRES picks, at each stage, the candidate whose residual is as short as possible. That is why it is called the generalised minimal residual method. As the space grows by one direction each stage, the best residual can only get shorter or stay the same — never longer. So the method marches steadily toward the answer.

There is a clever bookkeeping trick that makes this cheap. Building the Krylov space the tidy way (the Arnoldi process) turns the giant minimisation into a tiny one: at stage $m$ you only solve a small lopsided system with $m + 1$ rows and $m$ columns. And you can read off how short the current residual is without ever assembling the full candidate answer — a single number falls out of the bookkeeping for free, so you know when to stop before paying to form the solution.

GMRES works for a general matrix, even a lopsided non-symmetric one. When the matrix is symmetric and positive-definite, a cheaper cousin called conjugate gradients does the same job with short-term memory; GMRES is the price you pay for generality.

Visual Beginner

The picture is a growing space of candidate answers, with GMRES standing at the true target and, at each stage, picking the candidate closest to it — so the leftover miss shrinks step by step.

Read the table top to bottom. Each row is one stage. The middle column is the length of the shortest residual achievable from the space built so far; it never grows. The right column notes the size of the tiny least-squares problem solved at that stage — always one row taller than it is wide.

stage $m$	shortest residual length	small problem solved
1	$∥ r_{1} ∥ \leq ∥ r_{0} ∥$	$2 \times 1$
2	$∥ r_{2} ∥ \leq ∥ r_{1} ∥$	$3 \times 2$
3	$∥ r_{3} ∥ \leq ∥ r_{2} ∥$	$4 \times 3$
4	$∥ r_{4} ∥ \leq ∥ r_{3} ∥$	$5 \times 4$

The takeaway: GMRES is "find the nearest point of a growing space to a fixed target". Because each new space contains the last, the nearest point can only get nearer. The work at each stage is a small, tall least-squares fit, and the size of the leftover at the bottom of that fit is exactly the residual length you are driving to zero.

Worked example Beginner

Take the small system $$ A = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, $$ and start from the guess $x_{0} = (0, 0)$, so the starting residual is $r_{0} = b - A x_{0} = b = (1, 1)$, of length $\sqrt{2}$.

Stage 1 (the first Krylov direction). The first tidied direction is $r_{0}$ rescaled to length one: $q_{1} = (1, 1) / \sqrt{2}$. GMRES looks for the best candidate of the form $x_{1} = x_{0} + τ q_{1}$, with a single number $τ$ to choose. We want to make $b - A x_{1}$ as short as possible.

Compute $A q_{1} = \frac{1}{\sqrt{2}} (2 + 1, 0 + 2) = \frac{1}{\sqrt{2}} (3, 2)$. The residual of the candidate is $r_{0} - τ A q_{1} = (1, 1) - \frac{τ}{\sqrt{2}} (3, 2)$.

Choosing $τ$ to minimise the length of $(1 - 3 τ / \sqrt{2}, 1 - 2 τ / \sqrt{2})$ is a one-variable least-squares fit. Write $s = τ / \sqrt{2}$, so the squared length is $(1 - 3 s)^{2} + (1 - 2 s)^{2}$. Its derivative is $- 6 (1 - 3 s) - 4 (1 - 2 s) = - 10 + 26 s$, which is zero at $s = 10/26 = 5/13$. The shortest residual then has squared length $(1 - 15/13)^{2} + (1 - 10/13)^{2} = (2/13)^{2} + (3/13)^{2} = 13/169 = 1/13$, so $∥ r_{1} ∥ = 1/\sqrt{13} \approx 0.277$.

What this tells us: one stage cut the residual from $\sqrt{2} \approx 1.414$ down to about $0.277$, just by choosing the single best step along the first Krylov direction. A second stage would add the next direction and drive the residual to zero, since the space then fills all of the plane. Each stage solved only a tiny minimisation, never touching $A$ except through one matrix-times-vector product.
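The arithmetic of this stage can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names (`tau`, `w`, `r1`) are illustrative choices and not part of the original text.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 1.0])
x0 = np.zeros(2)

r0 = b - A @ x0
q1 = r0 / np.linalg.norm(r0)      # first Krylov direction, (1, 1) / sqrt(2)
w = A @ q1                        # (3, 2) / sqrt(2)

# One-variable least squares: minimise ||r0 - tau * w||_2 over tau.
tau = (w @ r0) / (w @ w)
r1 = r0 - tau * w                 # residual after stage 1

s = tau / np.sqrt(2)              # the rescaled step used in the text
print(s, 5 / 13)                  # both should be 5/13
print(np.linalg.norm(r1), 1 / np.sqrt(13))
```

The closed-form `tau` is just the normal equation of the one-column least-squares problem, which is the same condition as setting the derivative to zero by hand.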

Check your understanding Beginner

Formal definition Intermediate+

Let $A \in F^{n \times n}$ with $F \in \{\mathbb{R}, \mathbb{C}\}$ be nonsingular and possibly nonsymmetric, let $b \in F^{n}$, and fix an initial guess $x_{0}$ with residual $r_{0} = b - A x_{0} \neq 0$. The $m$-th GMRES iterate is the unique minimiser $$ x_m = \operatorname*{arg\,min}_{x \in x_0 + \mathcal{K}_m(A, r_0)} \|b - A x\|_2, \qquad \mathcal{K}_m(A, r_0) = \operatorname{span}\{r_0, A r_0, \dots, A^{m-1} r_0\}, $$ where $\mathcal{K}_m(A, r_0)$, written $K_{m} (A, r_{0})$ or simply $K_{m}$ in running text, is the Krylov subspace of 43.07.02 generated by the residual. Equivalently, writing $x = x_{0} + z$ with $z \in K_{m}$, GMRES minimises $∥ r_{0} - A z ∥_{2}$ over $z \in K_{m}$ ^{[Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems]}.

The Arnoldi reduction. Run the Arnoldi process of 43.07.02 on $A$ with starting vector $r_{0}$. It produces, before breakdown, an orthonormal basis $Q_{m} = [q_{1} \dots q_{m}]$ of $K_{m}$ with $q_{1} = r_{0} / β$, $β = ∥ r_{0} ∥_{2}$, together with the $(m + 1) \times m$ upper-Hessenberg matrix $\tilde{H}_{m}$ satisfying the Arnoldi relation $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$. Parametrise $z = Q_{m} y$ with $y \in F^{m}$. Then $$ r_0 - A z = \beta q_1 - A Q_m y = \beta q_1 - Q_{m+1}\tilde H_m y = Q_{m+1}\big(\beta e_1 - \tilde H_m y\big), $$ using $q_{1} = Q_{m + 1} e_{1}$. Since $Q_{m + 1}$ has orthonormal columns it preserves the Euclidean norm, so $$ \|b - A x\|_2 = \|r_0 - A z\|_2 = \big\|\,\beta e_1 - \tilde H_m y\,\big\|_2. $$ The large minimisation collapses to the small $(m + 1) \times m$ least-squares problem $\min_{y}\|\beta e_1 - \tilde H_m y\|_2$, of the kind treated in 43.04.01. The minimiser $y_m$ gives $x_m = x_0 + Q_m y_m$, and the residual norm equals the optimal least-squares residual $\|\beta e_1 - \tilde H_m y_m\|_2$.
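The reduction can be sketched in a few lines of NumPy. This is an illustrative sketch for the real case only, not a production solver: the function names `arnoldi` and `gmres_lstsq`, the random test matrix, and the absence of any breakdown handling are all assumptions made for the example.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Modified Gram-Schmidt Arnoldi: returns Q (n x (m+1)) and H ((m+1) x m)."""
    n = r0.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]      # assumes no breakdown (h_{j+1,j} != 0)
    return Q, H

def gmres_lstsq(A, b, x0, m):
    """Stage-m GMRES iterate via the small (m+1) x m least-squares problem."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q, H = arnoldi(A, r0, m)
    rhs = np.zeros(m + 1)
    rhs[0] = beta                           # beta * e_1
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    x = x0 + Q[:, :m] @ y
    return x, np.linalg.norm(rhs - H @ y)

rng = np.random.default_rng(0)
n, m = 30, 8
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # arbitrary test matrix
b = rng.standard_normal(n)
x, small_res = gmres_lstsq(A, b, np.zeros(n), m)
true_res = np.linalg.norm(b - A @ x)
print(small_res, true_res)                  # the two norms should agree to rounding
```

The printed pair illustrates the norm identity: the residual of the small problem and the residual of the full system coincide up to rounding error.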

Givens-rotation solution and the cheap residual. Because $\tilde{H}_{m}$ is upper-Hessenberg, its QR factorisation is built by $m$ plane (Givens) rotations $G_{1}, \dots, G_{m}$, each zeroing one subdiagonal entry $h_{j + 1, j}$; this is the incremental QR of 43.04.01. Applying the accumulated rotations to $β e_{1}$ yields a vector $g \in F^{m + 1}$ whose first $m$ entries feed a triangular back-substitution for $y_{m}$ and whose last entry $g_{m + 1}$ satisfies $$ \|b - A x_m\|_2 = |g_{m+1}|. $$ The residual norm is therefore available at every stage without forming $x_{m}$: one monitors $∣ g_{m + 1} ∣$ and only assembles $x_{m} = x_{0} + Q_{m} y_{m}$ once a stopping tolerance is met. Each new stage appends one rotation, so the update is incremental.
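A hedged sketch of the incremental Givens update, again for real matrices only. The function name `gmres_givens`, the relative stopping test `tol * beta`, and the dense triangular solve at the end are assumptions of this example; a real implementation would use a dedicated back-substitution.

```python
import numpy as np

def gmres_givens(A, b, x0, max_m, tol):
    """Full GMRES (real case) with Givens rotations; returns x and |g_{m+1}| per stage."""
    n = b.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, max_m + 1))
    R = np.zeros((max_m + 1, max_m))   # Hessenberg columns, rotated to triangular form
    cs = np.zeros(max_m)
    sn = np.zeros(max_m)
    g = np.zeros(max_m + 1)
    g[0] = beta                        # rotated copy of beta * e_1
    Q[:, 0] = r0 / beta
    history = []
    m = 0
    for j in range(max_m):
        # Arnoldi step (modified Gram-Schmidt) produces column j of the Hessenberg.
        w = A @ Q[:, j]
        for i in range(j + 1):
            R[i, j] = Q[:, i] @ w
            w -= R[i, j] * Q[:, i]
        R[j + 1, j] = np.linalg.norm(w)
        if R[j + 1, j] > 0.0:
            Q[:, j + 1] = w / R[j + 1, j]
        # Apply the previous rotations to the new column.
        for i in range(j):
            t = cs[i] * R[i, j] + sn[i] * R[i + 1, j]
            R[i + 1, j] = -sn[i] * R[i, j] + cs[i] * R[i + 1, j]
            R[i, j] = t
        # New rotation zeroing the subdiagonal entry R[j+1, j].
        rho = np.hypot(R[j, j], R[j + 1, j])
        cs[j], sn[j] = R[j, j] / rho, R[j + 1, j] / rho
        R[j, j], R[j + 1, j] = rho, 0.0
        g[j + 1] = -sn[j] * g[j]
        g[j] = cs[j] * g[j]
        m = j + 1
        history.append(abs(g[j + 1]))  # residual norm, no iterate formed yet
        if history[-1] <= tol * beta:
            break
    y = np.linalg.solve(np.triu(R[:m, :m]), g[:m])   # triangular system
    return x0 + Q[:, :m] @ y, history

rng = np.random.default_rng(0)
n = 30
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # arbitrary test matrix
b = rng.standard_normal(n)
x, history = gmres_givens(A, b, np.zeros(n), max_m=n, tol=1e-10)
print(len(history), history[-1], np.linalg.norm(b - A @ x))
```

The iterate is assembled exactly once, after the loop; inside the loop only the scalar `abs(g[j + 1])` is inspected, which is the point of the construction.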

Restarted GMRES. Storing $Q_{m}$ costs $O (mn)$ and the orthogonalisation cost grows like $O (m^{2} n)$ ; for large $m$ this is prohibitive. Restarted GMRES, written GMRES $(k)$ , runs $k$ stages, forms $x_{k}$ , discards $Q_{k}$ , and restarts the whole process with $x_{0} \leftarrow x_{k}$ and $r_{0} \leftarrow b - A x_{k}$ . This caps storage at $O (k n)$ at the cost of losing the global optimality across restart boundaries.
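The restart loop can be sketched as follows, assuming NumPy and a real matrix. The helper `gmres_cycle`, the parameters `tol` and `max_cycles`, and the test matrix are illustrative assumptions; the cycle uses a dense least-squares solve rather than Givens rotations to keep the example short.

```python
import numpy as np

def gmres_cycle(A, b, x0, k):
    """One cycle of k GMRES stages started from x0 (Arnoldi + small least squares)."""
    n = b.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r0 / beta
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]      # assumes no breakdown inside the cycle
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
    return x0 + Q[:, :k] @ y

def restarted_gmres(A, b, x0, k, tol=1e-10, max_cycles=200):
    """GMRES(k): storage is capped at k + 1 basis vectors per cycle."""
    x = x0.copy()
    norms = [np.linalg.norm(b - A @ x)]
    for _ in range(max_cycles):
        if norms[-1] <= tol * norms[0]:
            break
        x = gmres_cycle(A, b, x, k)        # the basis Q_k is discarded on return
        norms.append(np.linalg.norm(b - A @ x))
    return x, norms

rng = np.random.default_rng(0)
n = 50
A = 4.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # arbitrary test matrix
b = rng.standard_normal(n)
x, norms = restarted_gmres(A, b, np.zeros(n), k=5)
print(len(norms) - 1, norms[-1] / norms[0])   # cycles used, final relative residual
```

On this well-behaved test matrix a short cycle length is enough; the residual at the end of each cycle cannot exceed the one it started from, because the zero correction is always a feasible candidate.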

Counterexamples to common slips

GMRES minimises the residual, not the error. The quantity made small is $∥ b - A x_{m} ∥_{2} = ∥ A (x_{⋆} - x_{m}) ∥_{2}$ , not $∥ x_{⋆} - x_{m} ∥_{2}$ . For an ill-conditioned $A$ a small residual can coexist with a large error; the error bound carries a factor $κ (A)$ .
Breakdown is a feature. The Arnoldi subdiagonal vanishing, $h_{m + 1, m} = 0$ , is a lucky breakdown: it signals that $K_{m}$ is $A$ -invariant and that $x_{m}$ is the exact solution, residual zero. Unlike the nonsymmetric Lanczos process, GMRES cannot suffer a serious breakdown.
The spectrum alone need not predict convergence. For a non-normal $A$ , eigenvalues clustered away from the origin do not guarantee fast GMRES convergence; the eigenvector conditioning $κ (V)$ , the field of values, or the pseudospectrum enters. The diagonalisable bound is only a bound, and a loose one when $V$ is ill-conditioned.
Restarted GMRES $(k)$ can stagnate. Plain (full) GMRES converges in at most $n$ steps in exact arithmetic, but GMRES $(k)$ can stall short of the solution, the residual flat across many restart cycles, when the discarded directions were the ones that mattered.
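A classical small illustration of the last point, sketched under the assumption that NumPy is available: for the cyclic shift matrix with $b = e_{1}$ and $x_{0} = 0$, full GMRES makes no progress until stage $n$, and GMRES $(k)$ with $k < n$ never moves at all. The helper `gmres_step` is hypothetical and uses an explicit Krylov basis, which is acceptable here only because the basis vectors are unit coordinate vectors.

```python
import numpy as np

n = 8
A = np.roll(np.eye(n), 1, axis=0)      # cyclic shift: A e_j = e_{j+1}, A e_n = e_1
b = np.zeros(n)
b[0] = 1.0                             # b = e_1, and x0 = 0 so r0 = b

def gmres_step(A, r0, m):
    """Minimise ||r0 - A z||_2 over z in K_m(A, r0); returns z and the residual norm."""
    K = np.empty((len(r0), m))
    v = r0.copy()
    for j in range(m):
        K[:, j] = v
        v = A @ v
    c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
    z = K @ c
    return z, np.linalg.norm(r0 - A @ z)

# Full GMRES: residual norm at stages 1..n.
full = [gmres_step(A, b, m)[1] for m in range(1, n + 1)]

# GMRES(3): every restart begins from the same residual, so nothing changes.
k = 3
x = np.zeros(n)
restart_norms = []
for cycle in range(5):
    z, rn = gmres_step(A, b - A @ x, k)
    x = x + z
    restart_norms.append(rn)

print(np.round(full, 12))
print(np.round(restart_norms, 12))
```

The reason is visible in the structure: $A K_{m}$ is spanned by $e_{2}, \dots, e_{m + 1}$, all orthogonal to $r_{0} = e_{1}$, so the best correction is zero until the space wraps around at $m = n$.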

Key theorem with proof Intermediate+

The signature result is that the GMRES minimisation over the Krylov subspace reduces exactly to the small Hessenberg least-squares problem, that the residual norm is the last rotated component, and that the residual is monotonically non-increasing with finite termination.

Theorem (GMRES least-squares reduction, monotonicity, and finite termination). Let $A$ be nonsingular, $r_{0} = b - A x_{0} \neq 0$, $β = ∥ r_{0} ∥_{2}$, and run Arnoldi on $(A, r_{0})$ producing $Q_{m + 1}$ and the unreduced $(m + 1) \times m$ Hessenberg $\tilde{H}_{m}$. Then the GMRES iterate satisfies $$ x_m = x_0 + Q_m y_m, \qquad y_m = \operatorname*{arg\,min}_{y \in \mathbb{F}^m}\big\|\beta e_1 - \tilde H_m y\big\|_2, \qquad \|b - A x_m\|_2 = \min_y\big\|\beta e_1 - \tilde H_m y\big\|_2. $$ *The residual norms are non-increasing, $\|b - A x_{m+1}\|_2 \le \|b - A x_m\|_2$, and there is a least index $\nu \le n$ (the grade of $r_0$) at which $h_{\nu+1,\nu} = 0$ and $x_\nu = x_\star$ exactly* ^{[Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997)]}.

Proof. The reduction is the computation of the Formal definition. Every $x \in x_{0} + K_{m}$ is $x = x_{0} + Q_{m} y$ for a unique $y \in F^{m}$ since $Q_{m}$ has full column rank. The Arnoldi relation $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ and $r_{0} = β q_{1} = β Q_{m + 1} e_{1}$ give $$ b - A x = r_0 - A Q_m y = Q_{m+1}(\beta e_1 - \tilde H_m y). $$ The columns of $Q_{m + 1}$ are orthonormal, so $∥ Q_{m + 1} u ∥_{2} = ∥ u ∥_{2}$ for every $u \in F^{m + 1}$ ; hence $∥ b - A x ∥_{2} = ∥ β e_{1} - \tilde{H}_{m} y ∥_{2}$ . Minimising the left side over $x \in x_{0} + K_{m}$ is therefore the same as minimising the right side over $y \in F^{m}$ , which is the stated least-squares problem; its minimiser $y_{m}$ yields $x_{m} = x_{0} + Q_{m} y_{m}$ and residual norm equal to the optimal value.

Monotonicity follows from the nesting $K_{m} \subseteq K_{m + 1}$ . The feasible set $x_{0} + K_{m}$ for stage $m$ is contained in that for stage $m + 1$ , so the minimum of $∥ b - A x ∥_{2}$ over the larger set is at most the minimum over the smaller: $∥ b - A x_{m + 1} ∥_{2} \leq ∥ b - A x_{m} ∥_{2}$ .

For finite termination, the Krylov dimensions $\dim K_{m}$ strictly increase by one until the grade $ν$ of $r_{0}$ (the least $m$ with $K_{m} = K_{m + 1}$, established in 43.07.02), and at that step the Arnoldi remainder vanishes, $h_{ν + 1, ν} = 0$, so the last row of $\tilde{H}_{ν}$ is zero and the Arnoldi relation closes to $A Q_{ν} = Q_{ν} H_{ν}$ with $H_{ν}$ the square $ν \times ν$ Hessenberg block. The least-squares problem becomes the square system $H_{ν} y = β e_{1}$. Since $A$ is nonsingular and $K_{ν}$ is $A$-invariant, the restriction of $A$ to $K_{ν}$ is nonsingular, so $H_{ν} = Q_{ν}^{*} A Q_{ν}$ is nonsingular and the system has a solution $y_{ν}$ with zero residual. Then $b - A x_{ν} = Q_{ν} (β e_{1} - H_{ν} y_{ν}) = 0$, so $x_{ν} = x_{⋆}$, and $ν \leq n$ because $\dim K_{ν} \leq n$. $□$
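Finite termination at the grade can be observed on a small example. The sketch below assumes NumPy; the diagonal matrix, the right-hand side supported on three eigenvectors, and the breakdown tolerance `1e-12 * np.linalg.norm(A)` are all choices made for illustration.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([1.0, 1.0, 1.0, 0.0, 0.0])   # r0 lies in a 3-dimensional invariant subspace
x0 = np.zeros(5)

r0 = b - A @ x0
beta = np.linalg.norm(r0)
n = len(b)
Q = np.zeros((n, n + 1))
H = np.zeros((n + 1, n))
Q[:, 0] = r0 / beta
nu = n
for j in range(n):
    w = A @ Q[:, j]
    for i in range(j + 1):
        H[i, j] = Q[:, i] @ w
        w -= H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    if H[j + 1, j] < 1e-12 * np.linalg.norm(A):   # lucky breakdown (tolerance is an assumption)
        nu = j + 1
        break
    Q[:, j + 1] = w / H[j + 1, j]

# At breakdown the least-squares problem is the square system H_nu y = beta e_1.
rhs = np.zeros(nu)
rhs[0] = beta
y = np.linalg.solve(H[:nu, :nu], rhs)
x = x0 + Q[:, :nu] @ y

print(nu)                                  # grade of r0
print(x, np.linalg.norm(b - A @ x))        # exact solution, residual at rounding level
```

Here the grade is three because $r_{0}$ has components along exactly three eigenvectors with distinct eigenvalues, so Arnoldi stops after three steps and the square solve returns the exact solution.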

Bridge. This reduction is the foundational reason GMRES is practical at all: it builds toward the convergence theory of the Advanced results by replacing an $n$ -dimensional minimisation with an $(m + 1) \times m$ Hessenberg least-squares problem whose answer carries its own error estimate, the last rotated Givens component. This is exactly the Arnoldi factorisation of 43.07.02 read as a solver rather than an eigenvalue probe — there the projected $H_{m}$ supplies Ritz values, here the augmented $\tilde{H}_{m}$ supplies the residual-minimising step — and the monotone residual decrease generalises the monotone Rayleigh-quotient bound of that unit from extremal eigenvalues to the full residual. The least-squares inner solve is exactly the incremental Givens QR of 43.04.01 applied to a Hessenberg matrix, so the per-step cost is one rotation. The construction appears again in 43.07.04, where for symmetric positive-definite $A$ the same Krylov optimality is achieved with a short three-term recurrence instead of a growing orthonormal basis; putting these together, the central insight is that GMRES is "least squares on the Arnoldi Hessenberg", and the bridge is that every quantity a practitioner needs — the iterate, the residual norm, the stopping decision — is read off this one small projected problem one column at a time.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Derive the residual-norm identity $∥ b - A x_{m} ∥_{2} = ∥ β e_{1} - \tilde{H}_{m} y_{m} ∥_{2}$ from the Arnoldi relation, stating where orthonormality of $Q_{m + 1}$ is used.

Hint

Write $x_{m} = x_{0} + Q_{m} y_{m}$ , substitute $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ and $r_{0} = β Q_{m + 1} e_{1}$ , then use $∥ Q_{m + 1} u ∥_{2} = ∥ u ∥_{2}$ .

Answer

With $x_{m} = x_{0} + Q_{m} y_{m}$ the residual is $b - A x_{m} = r_{0} - A Q_{m} y_{m}$ . Substituting $r_{0} = β q_{1} = β Q_{m + 1} e_{1}$ and the Arnoldi relation $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ gives $b - A x_{m} = Q_{m + 1} (β e_{1} - \tilde{H}_{m} y_{m})$ . Because $Q_{m + 1}^{*} Q_{m + 1} = I_{m + 1}$ , the map $u \mapsto Q_{m + 1} u$ preserves the Euclidean norm: $∥ Q_{m + 1} u ∥_{2}^{2} = u^{*} Q_{m + 1}^{*} Q_{m + 1} u = u^{*} u = ∥ u ∥_{2}^{2}$ . Hence $∥ b - A x_{m} ∥_{2} = ∥ β e_{1} - \tilde{H}_{m} y_{m} ∥_{2}$ . Orthonormality of $Q_{m + 1}$ is exactly what turns the $n$ -dimensional residual norm into the $(m + 1)$ -dimensional one.

Exercise 4 (medium, symbolic).

Show that GMRES breakdown — $h_{m + 1, m} = 0$ at stage $m$ — implies $x_{m} = x_{⋆}$ exactly, given $A$ nonsingular.

Hint

Breakdown makes $\tilde{H}_{m} = H_{m}$ square with $A Q_{m} = Q_{m} H_{m}$ ; argue $H_{m}$ is nonsingular, then solve $H_{m} y = β e_{1}$ .

Answer

If $h_{m + 1, m} = 0$ the Arnoldi relation closes to $A Q_{m} = Q_{m} H_{m}$ with $H_{m}$ the square $m \times m$ block, so $K_{m} = \operatorname{range} (Q_{m})$ is $A$-invariant. The restriction $A|_{K_{m}}$ has matrix $H_{m} = Q_{m}^{*} A Q_{m}$ in the orthonormal basis $Q_{m}$. Since $A$ is nonsingular and $K_{m}$ is $A$-invariant of dimension $m$, $A$ maps $K_{m}$ bijectively onto itself, so $H_{m}$ is nonsingular. The least-squares problem $\min_{y} ∥ β e_{1} - H_{m} y ∥$ is now a square nonsingular system $H_{m} y_{m} = β e_{1}$ with exact solution and zero residual; thus $b - A x_{m} = Q_{m} (β e_{1} - H_{m} y_{m}) = 0$, so $x_{m} = x_{⋆}$. The vanishing subdiagonal is a lucky breakdown: the method has converged exactly.

Exercise 5 (medium, symbolic).

Prove the residual-polynomial characterisation $∥ r_{m} ∥_{2} = \min_{p \in P_{m}, p (0) = 1} ∥ p (A) r_{0} ∥_{2}$, where $P_{m}$ is the set of polynomials of degree $\leq m$.

Hint

Every $z \in K_{m}$ has the form $z = q (A) r_{0}$ with $\deg q < m$; set $x_{m} = x_{0} + z$ and write the residual as $p (A) r_{0}$ with $p (t) = 1 - t q (t)$, so $p (0) = 1$.

Answer

Any $x \in x_{0} + K_{m}$ is $x = x_{0} + q (A) r_{0}$ for a polynomial $q$ of degree $< m$, since $K_{m} = \{q (A) r_{0} : \deg q < m\}$. Its residual is $b - A x = r_{0} - A q (A) r_{0} = (I - A q (A)) r_{0} = p (A) r_{0}$ where $p (t) = 1 - t q (t)$. As $q$ ranges over degree $< m$, $p$ ranges over all polynomials of degree $\leq m$ with $p (0) = 1 - 0 = 1$, and conversely every such $p$ arises from $q (t) = (1 - p (t)) / t$ (a polynomial since $p (0) = 1$ makes $1 - p$ vanish at $0$). GMRES minimises $∥ b - A x ∥_{2}$ over this set, so $∥ r_{m} ∥_{2} = \min_{p \in P_{m}, p (0) = 1} ∥ p (A) r_{0} ∥_{2}$. The minimal residual is the smallest norm of $p (A) r_{0}$ achievable by a polynomial $p$ of degree at most $m$ normalised to $p (0) = 1$.
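The characterisation can be checked numerically by computing both sides independently. This is a sketch under assumptions: NumPy is available, the matrix is a small random real test matrix, and the polynomial side uses the monomial basis $A r_{0}, \dots, A^{m} r_{0}$, which is only reasonable for small $m$ because that basis becomes ill-conditioned quickly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 12, 4
A = 3.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # arbitrary test matrix
r0 = rng.standard_normal(n)

# (a) Polynomial side: p(t) = 1 - c_1 t - ... - c_m t^m has p(0) = 1 and
#     p(A) r0 = r0 - sum_j c_j A^j r0, so minimise over the coefficients c.
P = np.empty((n, m))
v = r0.copy()
for j in range(m):
    v = A @ v
    P[:, j] = v                      # column j holds A^(j+1) r0
c, *_ = np.linalg.lstsq(P, r0, rcond=None)
poly_min = np.linalg.norm(r0 - P @ c)

# (b) GMRES side: Arnoldi on (A, r0), then the (m+1) x m Hessenberg least squares.
beta = np.linalg.norm(r0)
Q = np.zeros((n, m + 1))
H = np.zeros((m + 1, m))
Q[:, 0] = r0 / beta
for j in range(m):
    w = A @ Q[:, j]
    for i in range(j + 1):
        H[i, j] = Q[:, i] @ w
        w -= H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    Q[:, j + 1] = w / H[j + 1, j]    # assumes no breakdown
rhs = np.zeros(m + 1)
rhs[0] = beta
y, *_ = np.linalg.lstsq(H, rhs, rcond=None)
gmres_res = np.linalg.norm(rhs - H @ y)

print(poly_min, gmres_res)           # should agree up to rounding
```

The two computations span the same affine set of residuals in different bases, which is why the minima agree; the Arnoldi basis is simply the numerically stable way to reach it.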

Exercise 7 (hard, symbolic).

Prove the diagonalisable convergence bound: if $A = V Λ V^{- 1}$ with $Λ = \operatorname{diag} (λ_{1}, \dots, λ_{n})$, then $$ \frac{\|r_m\|_2}{\|r_0\|_2} \le \kappa_2(V)\,\min_{p\in P_m,\,p(0)=1}\ \max_{1\le i\le n}|p(\lambda_i)|, $$ where $κ_{2} (V) = ∥ V ∥_{2} ∥ V^{- 1} ∥_{2}$.

Hint

Start from $∥ r_{m} ∥_{2} = \min_{p} ∥ p (A) r_{0} ∥_{2}$ (Exercise 5), substitute $p (A) = V p (Λ) V^{- 1}$, and bound $∥ p (Λ) ∥_{2} = \max_{i} ∣ p (λ_{i}) ∣$.

Answer

By Exercise 5, $∥ r_{m} ∥_{2} = \min_{p \in P_{m}, p (0) = 1} ∥ p (A) r_{0} ∥_{2} \leq \min_{p} ∥ p (A) ∥_{2} ∥ r_{0} ∥_{2}$. For diagonalisable $A$, $p (A) = V p (Λ) V^{- 1}$, so by submultiplicativity $∥ p (A) ∥_{2} \leq ∥ V ∥_{2} ∥ p (Λ) ∥_{2} ∥ V^{- 1} ∥_{2} = κ_{2} (V) ∥ p (Λ) ∥_{2}$. Since $p (Λ)$ is diagonal with entries $p (λ_{i})$, its spectral norm is $∥ p (Λ) ∥_{2} = \max_{i} ∣ p (λ_{i}) ∣$. Combining, $$ \|r_m\|_2 \le \kappa_2(V)\,\Big(\min_{p}\max_i |p(\lambda_i)|\Big)\,\|r_0\|_2, $$ and dividing by $∥ r_{0} ∥_{2}$ gives the bound. The minimax factor is a pure approximation-theory quantity: how small a degree-$m$ polynomial normalised by $p (0) = 1$ can be made on the spectrum. When the eigenvalues cluster into a few groups away from the origin, a low-degree polynomial with roots near the clusters makes the factor tiny, explaining fast convergence; the prefactor $κ_{2} (V)$ is where non-normality intrudes, inflating the bound when the eigenvectors are nearly dependent.

Exercise 8 (hard, symbolic).

Show that for normal $A$ (so $V$ can be taken unitary, $κ_{2} (V) = 1$ ) the GMRES convergence is governed purely by the spectrum, and explain why the same conclusion fails for a highly non-normal $A$ even with identical eigenvalues.

Hint

Set $κ_{2} (V) = 1$ in the bound of Exercise 7 and argue it is essentially sharp for normal matrices. For the non-normal case, consider that $κ_{2} (V)$ can be enormous, and recall the residual depends on $p (A)$ , not on $p (Λ)$ alone.

Answer

If $A$ is normal it is unitarily diagonalisable, $A = U Λ U^{*}$ with $U$ unitary, so $κ_{2} (V) = κ_{2} (U) = 1$ and the bound becomes $∥ r_{m} ∥/∥ r_{0} ∥ \leq \min_{p} \max_{i} ∣ p (λ_{i}) ∣$ with no prefactor. This bound is attainable up to the choice of $r_{0}$: $∥ p (A) r_{0} ∥_{2} \leq \max_{i} ∣ p (λ_{i}) ∣ ∥ r_{0} ∥_{2}$ with equality when $r_{0}$ aligns with the eigenvector of the maximising $λ_{i}$, so the minimax polynomial value on the spectrum determines convergence. For a non-normal $A$ with the same eigenvalues, $V$ may be wildly ill-conditioned ($κ_{2} (V) ≫ 1$), and the inequality $∥ p (A) ∥_{2} \leq κ_{2} (V) \max_{i} ∣ p (λ_{i}) ∣$ becomes loose; worse, $∥ p (A) ∥_{2}$ can be many orders larger than $\max_{i} ∣ p (λ_{i}) ∣$ because $p (A)$ depends on the whole Jordan-like structure, not just the eigenvalues. The honest governing quantities are then the field of values $W (A) = \{x^{*} A x : ∥ x ∥ = 1\}$ or the pseudospectrum, on which one bounds $\min_{p} \max_{z} ∣ p (z) ∣$. Identical spectra can produce radically different GMRES curves (the Greenbaum-Pták-Strakoš construction realises any non-increasing residual curve with a prescribed spectrum), so for non-normal problems the spectrum alone is not a reliable predictor.
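An extreme but easily checked instance of "same eigenvalues, different curves" compares the identity with a Jordan-type block whose eigenvalues are also all equal to one. This is a sketch under assumptions: NumPy, the superdiagonal weight 2, the right-hand side $e_{n}$, and the helper `residual_history` are illustrative choices, and the block is not diagonalisable, so it sits beyond the reach of the bound in Exercise 7 altogether.

```python
import numpy as np

def residual_history(A, b, max_m):
    """Full-GMRES residual norms from x0 = 0 at stages 1, 2, ... (Arnoldi + least squares)."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, max_m + 1))
    H = np.zeros((max_m + 1, max_m))
    Q[:, 0] = b / beta
    hist = []
    for j in range(max_m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        Hj = H[: j + 2, : j + 1]
        y, *_ = np.linalg.lstsq(Hj, rhs, rcond=None)
        hist.append(np.linalg.norm(rhs - Hj @ y))
        if H[j + 1, j] < 1e-12:            # lucky breakdown: exact solution reached
            break
        Q[:, j + 1] = w / H[j + 1, j]
    return hist

n = 10
b = np.zeros(n)
b[-1] = 1.0                                 # b = e_n

I = np.eye(n)                               # normal, every eigenvalue equals 1
J = np.eye(n) + 2.0 * np.diag(np.ones(n - 1), k=1)   # non-normal, every eigenvalue equals 1

hist_I = residual_history(I, b, n)
hist_J = residual_history(J, b, n)
print(hist_I)                               # converges at stage 1
print(np.round(hist_J, 4))                  # nearly flat until stage n
```

For the block, the projected matrix is lower bidiagonal and the stage-$m$ residual works out to $1/\sqrt{\sum_{k=0}^{m} 4^{-k}}$ for $m < n$, so the curve stays above $0.86$ until the final stage even though the spectrum is a single point away from the origin.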

Advanced results Master

Theorem 1 (minimal-residual optimality and the residual polynomial). Among all $x \in x_{0} + K_{m} (A, r_{0})$ the GMRES iterate $x_{m}$ is the unique minimiser of $∥ b - A x ∥_{2}$, and its residual obeys the optimality characterisation $$ \|r_m\|_2 = \min_{p \in P_m,\, p(0)=1}\|p(A) r_0\|_2, $$ where $P_{m}$ is the space of polynomials of degree at most $m$. Uniqueness is convexity: because $\tilde{H}_{m}$ has full column rank (an unreduced Hessenberg has rank $m$), the squared norm $y \mapsto ∥ β e_{1} - \tilde{H}_{m} y ∥_{2}^{2}$ is strictly convex, so its minimiser is unique, and the affine map $y \mapsto x_{0} + Q_{m} y$ transports uniqueness to $x_{m}$. The polynomial form follows because the GMRES residual is $p (A) r_{0}$ with $p (0) = 1$, and minimising over the subspace is minimising over such polynomials. This recasts GMRES convergence as a problem in approximation theory on $A$: how well can a degree-$m$ polynomial, pinned to value one at the origin, annihilate $r_{0}$ ^{[Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems]}.

Theorem 2 (diagonalisable convergence bound and the clustering heuristic). If $A = V Λ V^{- 1}$ is diagonalisable, then $$ \frac{\|r_m\|_2}{\|r_0\|_2} \le \kappa_2(V)\ \min_{p\in P_m,\, p(0)=1}\ \max_{\lambda \in \sigma(A)}|p(\lambda)|. $$ The minimax factor is small whenever the spectrum splits into a few tight clusters bounded away from the origin: a polynomial with one root near each cluster of total degree equal to the number of clusters is already tiny on $σ (A)$, so GMRES converges in roughly as many stages as there are clusters. The factor $κ_{2} (V)$ quantifies the eigenvector conditioning; for normal $A$ it is one and the bound is essentially tight, but for non-normal $A$ it can be astronomically large, and then the bound says little. The clustering heuristic ("GMRES converges fast when the eigenvalues cluster") is reliable for normal or mildly non-normal matrices and is exactly where preconditioning aims, to reshape $σ (A)$ into clusters ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.
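The clustering heuristic can be illustrated on a normal matrix with two tight clusters. In this sketch (assuming NumPy) the cluster centres 1 and 5, the cluster width `delta`, and the trial polynomial $p(t) = (1 - t)(1 - t/5)$ are choices made for the example; the polynomial is merely one admissible candidate, so it gives an upper bound rather than the true minimax value.

```python
import numpy as np

rng = np.random.default_rng(2)
delta = 1e-3
eigs = np.concatenate([1.0 + delta * rng.uniform(-1, 1, 20),
                       5.0 + delta * rng.uniform(-1, 1, 20)])
A = np.diag(eigs)                     # diagonal, hence normal: kappa_2(V) = 1
r0 = rng.standard_normal(eigs.size)

def relative_residual(A, r0, m):
    """||r_m|| / ||r_0|| via an explicit Krylov basis (adequate for very small m)."""
    K = np.empty((len(r0), m))
    v = r0.copy()
    for j in range(m):
        K[:, j] = v
        v = A @ v
    c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
    return np.linalg.norm(r0 - A @ K @ c) / np.linalg.norm(r0)

def p(t):
    """Degree-2 trial polynomial with p(0) = 1 and one root at each cluster centre."""
    return (1.0 - t / 1.0) * (1.0 - t / 5.0)

bound = np.max(np.abs(p(eigs)))       # upper bound on the stage-2 relative residual
rel = [relative_residual(A, r0, m) for m in (1, 2)]
print(rel, bound)
```

Stage 1 barely helps because no single root can sit near both clusters, while stage 2 drops the residual to the order of the cluster width, matching the count of clusters.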

Theorem 3 (field-of-values bound for the non-normal case). Let $W(A) = \{x^{*} A x : \|x\|_2 = 1\}$ be the field of values (numerical range) of $A$, and suppose $0 \notin W(A)$, so the distance $\nu = \operatorname{dist}(0, W(A)) > 0$ and $W(A)$ lies in a disc of radius $R$ about some centre. Then GMRES converges linearly with a rate determined by $W(A)$ rather than by $\sigma(A)$ alone; a representative Elman-type bound is $$ \frac{\|r_m\|_2}{\|r_0\|_2} \le \Big(1 - \frac{\nu^2}{\|A\|_2^2}\Big)^{m/2}. $$ When $A$ is far from normal the spectrum can be deeply misleading: the eigenvalues may be bunched away from zero while the field of values straddles the origin, in which case GMRES stagnates despite a "good" spectrum. The field of values, and more finely the pseudospectrum $\sigma_\varepsilon(A) = \{z : \|(zI - A)^{-1}\|_2 \ge \varepsilon^{-1}\}$, are the correct convergence-controlling sets: a min-max polynomial small on a pseudospectral region enclosing the spectrum at a safe distance from the origin yields a genuine residual bound by a contour-integral (Cauchy) estimate ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}.
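The Elman-type bound can be compared against an actual GMRES run. The sketch below assumes NumPy and a real test matrix whose symmetric part is positive definite; in that setting the smallest eigenvalue of $(A + A^{T})/2$ is used in place of $\nu$, on the assumption that it is a lower bound on $\operatorname{dist}(0, W(A))$, which can only weaken the bound. The helper `residual_history` and the test matrix are illustrative.

```python
import numpy as np

def residual_history(A, b, max_m):
    """Full-GMRES residual norms from x0 = 0 at stages 1..max_m (Arnoldi + least squares)."""
    n = len(b)
    beta = np.linalg.norm(b)
    Q = np.zeros((n, max_m + 1))
    H = np.zeros((max_m + 1, max_m))
    Q[:, 0] = b / beta
    hist = []
    for j in range(max_m):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        Hj = H[: j + 2, : j + 1]
        y, *_ = np.linalg.lstsq(Hj, rhs, rcond=None)
        hist.append(np.linalg.norm(rhs - Hj @ y))
        if H[j + 1, j] < 1e-12:            # lucky breakdown
            break
        Q[:, j + 1] = w / H[j + 1, j]
    return np.array(hist)

rng = np.random.default_rng(3)
n = 40
A = 3.0 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # arbitrary test matrix
b = rng.standard_normal(n)

# Smallest eigenvalue of the symmetric part, used here in the role of nu.
sym_min = np.linalg.eigvalsh(0.5 * (A + A.T)).min()
assert sym_min > 0.0                        # the bound needs 0 outside W(A)
rate = np.sqrt(1.0 - sym_min**2 / np.linalg.norm(A, 2) ** 2)

rel = residual_history(A, b, 15) / np.linalg.norm(b)
bounds = rate ** np.arange(1, len(rel) + 1)
print(np.column_stack([rel, bounds]))       # observed relative residual vs. bound
```

On well-conditioned examples like this one the bound is typically quite pessimistic; its value is that it holds without any reference to eigenvector conditioning.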

Theorem 4 (any-curve theorem; the spectrum does not determine the GMRES curve). Given any set of $n$ nonzero complex numbers (a prescribed spectrum) and any non-increasing sequence $∥ r_{0} ∥ \geq ∥ r_{1} ∥ \geq \dots \geq ∥ r_{n - 1} ∥ > ∥ r_{n} ∥ = 0$ , there exists a matrix $A$ with that spectrum and a right-hand side $b$ such that GMRES applied to $A x = b$ produces exactly that residual sequence. The construction (Greenbaum, Pták, Strakoš) decouples the eigenvalues from the convergence history entirely: any admissible residual curve is compatible with any prescribed spectrum, so no theorem bounding GMRES convergence by the eigenvalues alone can exist for general matrices. This is the sharp statement of why the diagonalisable bound's $κ_{2} (V)$ prefactor cannot be removed and why non-normal convergence analysis must invoke the field of values or pseudospectra. For the symmetric case the situation reverts: for $A = A^{*}$ the eigenvector matrix is unitary, the prefactor is one, and the spectrum does determine the convergence curve ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}.

Synthesis. GMRES is one construction — minimise $∥ b - A x ∥_{2}$ over the affine Krylov space $x_{0} + K_{m} (A, r_{0})$ — viewed under one invariant, the residual polynomial $p (A) r_{0}$ with $p (0) = 1$ , and the foundational reason the method works is that this minimisation is, by the Arnoldi factorisation of 43.07.02, a small Hessenberg least-squares problem whose residual norm is read off a single Givens component. This is exactly the Arnoldi eigenvalue engine of that unit turned to a different end: there the projected $H_{m}$ delivers Ritz values approximating the spectrum, here the augmented $\tilde{H}_{m}$ delivers the residual-minimising step, and the same nesting $K_{m} \subseteq K_{m + 1}$ that made the extremal Ritz values monotone makes the GMRES residual monotone. The polynomial recasting of Theorem 1 generalises the static eigenvalue-approximation picture into a dynamical approximation-theory one: the convergence rate is a min-max polynomial value on a spectral set, and the central insight is that whether that set is the spectrum, the field of values, or the pseudospectrum is decided by the normality of $A$ .

The non-normal subtlety is where the bridge to honest practice is built: Theorem 2's clustering heuristic is reliable only through the prefactor $κ_{2} (V)$ , Theorem 3 replaces the spectrum by the field of values when $A$ is far from normal, and Theorem 4 shows the eigenvalues alone determine nothing about the GMRES curve for general matrices — the any-curve construction is dual to the diagonalisable bound's looseness, each expressing that $p (A)$ is not $p (Λ)$ . Putting these together, GMRES is the general-matrix residual-minimiser whose symmetric-positive-definite specialisation reappears in 43.07.04 as conjugate gradients with a short recurrence and a spectrum-determined error bound; the bridge is that the stationary-iteration splitting $M$ of 43.07.01, powerless on its own against a spectral radius creeping to one, returns here as a preconditioner whose only job is to cluster the spectrum or shrink the field of values so the GMRES residual polynomial can do its work in a handful of stages.

Full proof set Master

Proposition 1 (least-squares reduction and norm identity). With $Q_{m + 1}, \tilde{H}_{m}$ from Arnoldi on $(A, r_{0})$ and $β = ∥ r_{0} ∥_{2}$, the GMRES iterate is $x_{m} = x_{0} + Q_{m} y_{m}$ with $y_{m} = \operatorname*{arg\,min}_{y} ∥ β e_{1} - \tilde{H}_{m} y ∥_{2}$, and $∥ b - A x_{m} ∥_{2} = ∥ β e_{1} - \tilde{H}_{m} y_{m} ∥_{2}$.

Proof. Each $x \in x_{0} + K_{m}$ is $x_{0} + Q_{m} y$ for a unique $y$ because $Q_{m}$ has orthonormal (hence independent) columns spanning $K_{m}$ . The Arnoldi relation $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ and $r_{0} = β q_{1} = β Q_{m + 1} e_{1}$ give $b - A x = r_{0} - A Q_{m} y = Q_{m + 1} (β e_{1} - \tilde{H}_{m} y)$ . Orthonormality of $Q_{m + 1}$ yields $∥ b - A x ∥_{2} = ∥ β e_{1} - \tilde{H}_{m} y ∥_{2}$ . Minimising both sides over their (corresponding) feasible sets is the same problem; the minimiser $y_{m}$ gives $x_{m}$ and the residual identity. $□$

Proposition 2 (monotonicity of the residual). $∥ r_{m + 1} ∥_{2} \leq ∥ r_{m} ∥_{2}$ for all $m$ before termination.

Proof. The feasible affine spaces are nested, $x_{0} + K_{m} \subseteq x_{0} + K_{m + 1}$, since $K_{m} \subseteq K_{m + 1}$. The minimum of a fixed function over a larger set cannot exceed its minimum over a subset, so $∥ r_{m + 1} ∥_{2} = \min_{x \in x_{0} + K_{m + 1}} ∥ b - A x ∥_{2} \leq \min_{x \in x_{0} + K_{m}} ∥ b - A x ∥_{2} = ∥ r_{m} ∥_{2}$. $□$

Proposition 3 (lucky breakdown equals exact solution). If $h_{m + 1, m} = 0$ at stage $m$ and $A$ is nonsingular, then $x_{m} = x_{⋆}$ with $r_{m} = 0$ , and this is the only way GMRES can break down.

Proof. Breakdown closes the Arnoldi relation to $A Q_{m} = Q_{m} H_{m}$ , so $K_{m} = range (Q_{m})$ is $A$ -invariant. Restricted to the $m$ -dimensional invariant $K_{m}$ , the nonsingular $A$ acts bijectively, so $H_{m} = Q_{m}^{*} A Q_{m}$ is nonsingular and $H_{m} y_{m} = β e_{1}$ has the exact solution $y_{m} = β H_{m}^{- 1} e_{1}$ . The residual is $b - A x_{m} = Q_{m} (β e_{1} - H_{m} y_{m}) = 0$ . No serious breakdown is possible: the Arnoldi step fails only by producing the zero remainder $w = 0$ , which is precisely $h_{m + 1, m} = 0$ , and that case has just been shown to be convergence. (Contrast the two-sided nonsymmetric Lanczos process, where a serious breakdown — a vanishing bilinear product with nonzero vectors — can halt progress without convergence.) $□$

Proposition 4 (finite termination at the grade). Let $ν$ be the grade of $r_{0}$ with respect to $A$. Then GMRES produces the exact solution at stage $ν$, and $ν \leq \deg m_{A} \leq n$, with $\deg m_{A}$ the degree of the minimal polynomial of $A$.

Proof. By 43.07.02, the Krylov dimensions strictly increase until the grade $ν$, where $K_{ν} = K_{ν + 1}$ is $A$-invariant and $h_{ν + 1, ν} = 0$. Proposition 3 then gives $x_{ν} = x_{⋆}$. The grade is the degree of the minimal polynomial of $A$ restricted to the cyclic subspace generated by $r_{0}$, which divides $m_{A}$, so $ν \leq \deg m_{A} \leq n$. In exact arithmetic GMRES is therefore a finite method; its value is that the residual usually falls below tolerance long before stage $ν$. $□$

Proposition 5 (residual-polynomial optimality and the diagonalisable bound). The GMRES residual satisfies $∥ r_{m} ∥_{2} = \min_{p \in P_{m}, p (0) = 1} ∥ p (A) r_{0} ∥_{2}$, and for diagonalisable $A = V Λ V^{- 1}$, $\frac{∥ r_{m} ∥_{2}}{∥ r_{0} ∥_{2}} \leq κ_{2} (V) \min_{p \in P_{m}, p (0) = 1} \max_{λ \in σ (A)} ∣ p (λ) ∣$.

Proof. The Krylov subspace is $K_{m} = \{q (A) r_{0} : \deg q < m\}$, so any $x \in x_{0} + K_{m}$ has residual $b - A x = (I - A q (A)) r_{0} = p (A) r_{0}$ with $p (t) = 1 - t q (t)$, and $\{p \in P_{m} : p (0) = 1\}$ is exactly the image of $\{q : \deg q < m\}$ under $q \mapsto 1 - t q$ (bijectively, the inverse being $q = (1 - p) / t$, a polynomial since $p (0) = 1$). Minimising $∥ b - A x ∥_{2}$ over $x_{0} + K_{m}$ therefore equals $\min_{p \in P_{m}, p (0) = 1} ∥ p (A) r_{0} ∥_{2}$. For the bound, $∥ p (A) r_{0} ∥_{2} \leq ∥ p (A) ∥_{2} ∥ r_{0} ∥_{2}$ and $p (A) = V p (Λ) V^{- 1}$ gives $∥ p (A) ∥_{2} \leq κ_{2} (V) ∥ p (Λ) ∥_{2} = κ_{2} (V) \max_{i} ∣ p (λ_{i}) ∣$ since $p (Λ)$ is diagonal. Taking the minimum over admissible $p$ and dividing by $∥ r_{0} ∥_{2}$ yields the stated inequality. $□$

Proposition 6 (restarted GMRES need not be monotone across cycles in the error, and may stagnate). Full GMRES residuals are monotone and reach zero by stage $ν$ ; restarted GMRES $(k)$ residuals are monotone within each cycle of $k$ stages but the method can stagnate, with $∥ r ∥$ constant across many cycles when $k < ν$ .

Proof sketch. Within one cycle, GMRES $(k)$ is full GMRES on the current residual, so Proposition 2 gives monotone decrease across its $k$ stages. At a restart the basis $Q_{k}$ is discarded and Arnoldi begins afresh from the new residual $r_{0}^{'} = b - A x_{k}$ , whose Krylov subspace $K_{k} (A, r_{0}^{'})$ need not contain the previously useful directions. If the optimal degree- $(> k)$ residual polynomial requires roots that no degree- $k$ polynomial can supply — as when the spectrum has more than $k$ well-separated clusters, or when $A$ is non-normal so that the field of values straddles the origin — then each restarted cycle achieves the same modest reduction and the residual plateaus. The any-curve theorem (Theorem 4) shows such stagnation profiles are realisable. Stagnation is the price of the bounded storage $O (k n)$ ; remedies include larger $k$ , preconditioning to cluster the spectrum, and augmenting the restart subspace with retained approximate eigenvectors (deflated/augmented GMRES). $□$

Connections Master

Krylov subspaces, Arnoldi, and Lanczos 43.07.02 is the engine GMRES runs on: the orthonormal basis $Q_{m}$ and the Hessenberg factorisation $A Q_{m} = Q_{m + 1} \tilde{H}_{m}$ built there are exactly what turns the $n$ -dimensional residual minimisation into the small $(m + 1) \times m$ least-squares problem solved here. The Ritz-value residual identity $∥ A y - θ y ∥ = ∣ h_{m + 1, m} ∣ ∣ e_{m}^{*} s ∣$ of that unit and the GMRES residual identity $∥ r_{m} ∥ = ∣ g_{m + 1} ∣$ are two readings of the same subdiagonal $h_{m + 1, m}$ — eigenvalue accuracy there, linear-solve accuracy here — and the lucky breakdown $h_{m + 1, m} = 0$ means exact invariance in both, exact eigenpairs there, exact solution here.
Least squares: normal equations, QR, and conditioning 43.04.01 supplies the inner solve: the $(m + 1) \times m$ Hessenberg least-squares problem $\min_{y} ∥ β e_{1} - \tilde{H}_{m} y ∥$ is solved by the incremental Givens QR of that unit, one plane rotation per stage zeroing one subdiagonal entry, with the magnitude of the last rotated component of $β e_{1}$ giving the residual norm for free. The conditioning theory there is what warns that GMRES minimises the residual, not the error: a small relative residual $∥ b - A x_{m} ∥ / ∥ b ∥$ bounds the relative error only up to the factor $κ (A)$, the very condition number that unit analyses.
The conjugate gradient method 43.07.04 is the symmetric-positive-definite specialisation of the Krylov-solver idea developed here: where GMRES needs the full growing orthonormal basis $Q_{m}$ to minimise the residual for a general nonsymmetric $A$, conjugate gradients exploits the short three-term Lanczos recurrence available for $A = A^{*}$ to minimise the energy norm of the error with $O (n)$ storage and no stored basis. Both inherit the Krylov-subspace optimality and the polynomial-approximation convergence picture, but for the Hermitian positive-definite $A$ of CG the spectrum is real and the eigenvector matrix can be chosen unitary, so $κ_{2} (V) = 1$ and the spectrum governs the convergence: the any-curve pathology of GMRES (Theorem 4) cannot occur, and the field-of-values machinery collapses back to the eigenvalue interval.
Stationary iterative methods: Jacobi, Gauss-Seidel, SOR 43.07.01 returns as the preconditioner: a splitting $A = M - N$ that was a weak solver on its own — its spectral radius creeping to one as the grid refines — becomes the operator $M^{- 1}$ that GMRES applies to cluster $σ (M^{- 1} A)$ near one or shrink its field of values, converting the slow $1 - O (h)$ convergence of even optimal SOR into the handful of stages a well-clustered GMRES needs. The clustering heuristic of Theorem 2 is precisely the design target of that preconditioning.

Historical & philosophical context Master

GMRES was introduced by Yousef Saad and Martin H. Schultz in 1986, in a paper that gave the nonsymmetric linear-system problem its first robust minimal-residual Krylov method with a guaranteed non-increasing residual and finite termination ^{[Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems]}. It superseded a generation of earlier nonsymmetric attempts (the unstable variants of the biconjugate gradient family and the ORTHODIR/ORTHOMIN methods) by anchoring the iterate to the numerically reliable Arnoldi orthogonalisation rather than to the short but fragile two-sided recurrences. The restart device GMRES$(k)$ appeared in the same paper as the concession to finite memory, and the stagnation it can suffer has driven the subsequent literature on deflation, augmentation, and flexible preconditioning.

The convergence theory matured more slowly and more surprisingly. The naive expectation — inherited from the symmetric conjugate-gradient case — was that clustered eigenvalues away from the origin guarantee fast convergence. Anne Greenbaum, Vlastimil Pták, and Zdeněk Strakoš showed in 1996 that this expectation is false for nonsymmetric matrices in the strongest possible sense: any non-increasing residual curve is attainable by some matrix with any prescribed spectrum, so the eigenvalues alone carry no information about the GMRES history ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}. This redirected the analysis toward the field of values, for which Howard Elman's 1982 thesis bound and later refinements give genuine rates, and toward the pseudospectra developed by Lloyd N. Trefethen and Mark Embree, whose contour-integral estimates bound the residual polynomial on a region enclosing the spectrum at a controlled distance from the origin ^{[Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997)]}. The textbook consolidation of the algorithm, its preconditioned and restarted forms, and the polynomial-approximation convergence framework is in Saad's Iterative Methods for Sparse Linear Systems ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.

Bibliography Master

@article{SaadSchultz1986,
  author  = {Saad, Yousef and Schultz, Martin H.},
  title   = {{GMRES}: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems},
  journal = {SIAM Journal on Scientific and Statistical Computing},
  volume  = {7},
  number  = {3},
  year    = {1986},
  pages   = {856--869}
}

@book{Saad2003,
  author    = {Saad, Yousef},
  title     = {Iterative Methods for Sparse Linear Systems},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2003}
}

@book{Greenbaum1997,
  author    = {Greenbaum, Anne},
  title     = {Iterative Methods for Solving Linear Systems},
  series    = {Frontiers in Applied Mathematics},
  publisher = {SIAM},
  year      = {1997}
}

@article{GreenbaumPtakStrakos1996,
  author  = {Greenbaum, Anne and Pt{\'a}k, Vlastimil and Strako{\v{s}}, Zden{\v{e}}k},
  title   = {Any Nonincreasing Convergence Curve Is Possible for {GMRES}},
  journal = {SIAM Journal on Matrix Analysis and Applications},
  volume  = {17},
  number  = {3},
  year    = {1996},
  pages   = {465--469}
}

@book{TrefethenEmbree2005,
  author    = {Trefethen, Lloyd N. and Embree, Mark},
  title     = {Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators},
  publisher = {Princeton University Press},
  year      = {2005}
}

@book{TrefethenBau1997,
  author    = {Trefethen, Lloyd N. and Bau, David},
  title     = {Numerical Linear Algebra},
  publisher = {SIAM},
  address   = {Philadelphia},
  year      = {1997}
}

@phdthesis{Elman1982,
  author = {Elman, Howard C.},
  title  = {Iterative Methods for Large, Sparse, Nonsymmetric Systems of Linear Equations},
  school = {Yale University},
  year   = {1982}
}

Prerequisites

43.07.02
43.04.01

Tier anchors

beginner: Finding the best answer reachable from a single matrix-times-vector engine by making the leftover error as small as it can be at every stage — Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) §11.3 (iterative methods and least squares); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 35 (the GMRES idea — minimise the residual over a Krylov subspace)
intermediate: Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 35 (GMRES: the Arnoldi-based least-squares formulation $\min_y \|\,\|b\|e_1 - \tilde H_m y\,\|$, Givens-rotation solution, residual norm available without forming $x_m$); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) §6.5 (the GMRES algorithm, restarting, breakdown)
master: Saad & Schultz 1986 *SIAM J. Sci. Stat. Comput.* 7(3):856-869 (the original GMRES paper — optimality, finite termination, the minimal-residual property); Greenbaum 1997 *Iterative Methods for Solving Linear Systems* (SIAM) Ch. 3 (convergence via polynomial approximation, the non-normal subtlety); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) §6.5-6.11 and §6.30 (restarted GMRES, stagnation, field-of-values bounds)

References

Trefethen, L. N. & Bau, D. — Numerical Linear Algebra (SIAM, 1997) · Lecture 35: GMRES as the Krylov method minimising ||b - A x|| over x in x_0 + K_m; the Arnoldi relation A Q_m = Q_{m+1} H-tilde_m turning the minimisation into the (m+1)-by-m least-squares problem min_y || ||b|| e_1 - H-tilde_m y ||; the solution by Givens rotations applied to the Hessenberg matrix, the residual norm read off the last rotated component without forming x_m; the monotone non-increase of the residual and convergence in at most n steps in exact arithmetic.
Saad, Y. & Schultz, M. H. — GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems · SIAM J. Sci. Stat. Comput. 7(3):856-869, 1986. The original GMRES paper: the minimal-residual property over the Krylov subspace, the Arnoldi-based implementation, the breakdown-equals-convergence (lucky breakdown) theorem, finite termination at the grade, and the restarted GMRES(m) variant introduced to bound storage and orthogonalisation cost.
Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.) · SIAM, 2003. Ch. 6, §6.5: the GMRES algorithm in full, the residual-polynomial characterisation ||r_m|| = min over p in P_m with p(0)=1 of ||p(A) r_0||, the diagonalisable convergence bound ||r_m||/||r_0|| <= kappa(V) min_p max_{lambda in sigma(A)} |p(lambda)|, restarting GMRES(m), and the stagnation of restarted GMRES; §6.11 right/left preconditioning.
Greenbaum, A. — Iterative Methods for Solving Linear Systems (SIAM, 1997) · Ch. 3 and Ch. 6: the GMRES convergence theory via min-max polynomial approximation on the spectrum, the eigenvalue-clustering heuristic and its failure for highly non-normal matrices, the role of the field of values and pseudospectra, the Greenbaum-Pták-Strakoš result that any non-increasing residual curve is attainable by a matrix with prescribed spectrum, and the contrast with the spectrum-determined CG convergence for symmetric positive-definite A.

Estimated time

beginner: 18m
intermediate: 45m
master: 88m