43.07.01 · numerical-analysis / 07-iterative-krylov-methods

Stationary iterative methods: Jacobi, Gauss-Seidel, SOR

shipped · 3 tiers · Lean: none

Anchor (Master): Young 1971 *Iterative Solution of Large Linear Systems* (Academic Press) Ch. 3-6 (the SOR theory, consistently ordered matrices, the optimal omega for the model problem); Varga 2000 *Matrix Iterative Analysis* 2e (Springer) Ch. 3-4 (regular splittings, the Stein-Rosenberg theorem, the SOR analysis); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 4

Intuition Beginner

Solving a large system of equations by elimination, the direct method of 43.03.01, works on every entry of the matrix and costs an amount of arithmetic that grows like the cube of the size. For the enormous, mostly-empty matrices that come from physics simulations — millions of unknowns, but each equation touching only a handful of neighbours — that cubic cost and the memory it fills become impossible. There is a different idea: instead of computing the answer outright, guess it, then repeatedly nudge the guess closer.

The nudge has a simple shape. Split the matrix into an easy part you can solve quickly and a leftover part. At each round, you pretend the leftover acts on your current guess, move it to the right-hand side, and solve the easy part to get a better guess. The easy part is chosen so that solving it is cheap — just the diagonal, or a triangle. You repeat this until the guess stops changing.

Whether the guesses actually march toward the true answer, or wander off, comes down to a single number that measures how much the leftover part amplifies an error each round. If every round shrinks the error, you converge; if some round can grow it, you diverge. That amplification number is the spectral radius you met in 01.01.08.

The three classic recipes differ only in which easy part they pick. Jacobi uses just the diagonal. Gauss-Seidel uses the lower triangle, reusing each freshly updated value right away. Over-relaxation pushes each Gauss-Seidel correction a little further than it asks for, and with the right amount of push, it converges dramatically faster.

Visual Beginner

The picture is a guess being pulled, round after round, toward the true solution, with the size of each pull set by how much the leftover part of the matrix amplifies the current error.

Read the table top to bottom. Each row is one round. The error — the gap between your guess and the truth — is multiplied by the same amplification factor every round. If that factor is below one, the gap shrinks geometrically and you converge; if it is above one, the gap grows and you diverge.

round	error size (factor $0.5$ )	error size (factor $0.9$ )	error size (factor $1.1$ )
start	$1.000$	$1.000$	$1.000$
1	$0.500$	$0.900$	$1.100$
2	$0.250$	$0.810$	$1.210$
3	$0.125$	$0.729$	$1.331$
10	$0.001$	$0.349$	$2.594$

The takeaway: convergence is a yes-or-no question decided by one number below or above one, and the speed is set by how far below one it sits. A factor of $0.5$ gains a digit of accuracy every few rounds; a factor of $0.9$ crawls; a factor above one is hopeless.
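The table can be reproduced in a few lines. The following is a minimal sketch, not part of the original unit: the helper names `error_after` and `rounds_per_digit` are made up for illustration, and the error is assumed to be multiplied by exactly the same factor every round.

```python
import math

def error_after(factor: float, rounds: int) -> float:
    """Error size after `rounds` rounds, starting from an error of 1."""
    return factor ** rounds

def rounds_per_digit(factor: float) -> float:
    """Rounds needed to shrink the error tenfold (only meaningful for factor < 1)."""
    return 1.0 / -math.log10(factor)

# Reproduce the rows "start, 1, 2, 3, 10" of the table for each factor.
for factor in (0.5, 0.9, 1.1):
    sizes = [round(error_after(factor, k), 3) for k in (0, 1, 2, 3, 10)]
    print(factor, sizes)

print(rounds_per_digit(0.5))  # about 3.3 rounds per digit
print(rounds_per_digit(0.9))  # about 21.9 rounds per digit
```

The second helper makes the phrase "a digit every few rounds" quantitative: the closer the factor is to one, the more rounds each extra digit costs.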

Worked example Beginner

Solve the small system $$ \begin{aligned} 4x + y &= 5, \\ x + 3y &= 4, \end{aligned} $$ whose exact answer is $x = 1$, $y = 1$. We use the Jacobi recipe: solve each equation for its own diagonal unknown, using the other unknown from the previous round.

Rearranged, the rules are $x = (5 - y) /4$ and $y = (4 - x) /3$ . Start from the guess $x_{0} = 0$ , $y_{0} = 0$ .

Round 1. Use the old values on the right: $x_{1} = (5 - 0) /4 = 1.25$ and $y_{1} = (4 - 0) /3 = 1.333$ .

Round 2. Use round-1 values: $x_{2} = (5 - 1.333) /4 = 0.917$ and $y_{2} = (4 - 1.25) /3 = 0.917$ .

Round 3. Use round-2 values: $x_{3} = (5 - 0.917) /4 = 1.021$ and $y_{3} = (4 - 0.917) /3 = 1.028$ .

Round 4. Use round-3 values: $x_{4} = (5 - 1.028) /4 = 0.993$ and $y_{4} = (4 - 1.021) /3 = 0.993$ .

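The guesses are closing in on $(1, 1)$, overshooting then undershooting, with the gap shrinking to between a quarter and a third of its size each round. Over two rounds the gap shrinks by exactly $1/12$, so the average per-round factor is $1/\sqrt{12} \approx 0.29$; that is the amplification factor of this system under Jacobi.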

What this tells us: we never touched the matrix as a whole, never did elimination. We solved two one-variable updates per round, repeated, and watched the error fall by a steady factor. For a system this size that is silly; for a system with a million unknowns and only a few entries per row, it is the only thing that works.
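The four rounds above can be replayed mechanically. This is a small sketch under the same assumptions as the worked example (starting guess $(0, 0)$, both unknowns updated from the previous round); the function name `jacobi_2x2` is hypothetical.

```python
def jacobi_2x2(rounds: int):
    """Jacobi rounds for 4x + y = 5, x + 3y = 4, starting from (0, 0)."""
    x, y = 0.0, 0.0
    history = [(x, y)]
    for _ in range(rounds):
        # The right-hand side is evaluated first, so both updates
        # read the previous round's values (a simultaneous update).
        x, y = (5.0 - y) / 4.0, (4.0 - x) / 3.0
        history.append((x, y))
    return history

history = jacobi_2x2(4)
for k, (x, y) in enumerate(history):
    print(k, round(x, 3), round(y, 3))
```

Replacing the tuple assignment by two sequential assignments (update `x`, then use the new `x` for `y`) would turn this into Gauss-Seidel, which is the only difference between the two recipes.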

Check your understanding Beginner

Formal definition Intermediate+

Let $A \in F^{n \times n}$ with $F \in \{\mathbb{R}, \mathbb{C}\}$ be nonsingular, and consider the system $A x = b$. A splitting of $A$ is a decomposition $$ A = M - N, $$ with $M$ nonsingular and chosen so that linear systems $M z = c$ are cheap to solve. Substituting $A = M - N$ into $A x = b$ gives $M x = N x + b$, which suggests the stationary iteration $$ M x_{k+1} = N x_k + b, \qquad \text{equivalently} \qquad x_{k+1} = M^{-1} N x_k + M^{-1} b. $$ The matrix $G = M^{- 1} N$ is the iteration matrix; the map is "stationary" because $G$ and the constant vector $M^{- 1} b$ do not change from step to step. A fixed point $x_{⋆}$ of the iteration satisfies $x_{⋆} = M^{- 1} N x_{⋆} + M^{- 1} b$, i.e. $M x_{⋆} = N x_{⋆} + b$, i.e. $A x_{⋆} = b$: the fixed point is exactly the solution of the system ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.

The error recursion. Subtracting $x_{⋆} = G x_{⋆} + M^{- 1} b$ from $x_{k + 1} = G x_{k} + M^{- 1} b$ gives the homogeneous recursion for the error $e_{k} = x_{k} - x_{⋆}$, $$ e_{k+1} = G e_k, \qquad \text{so} \qquad e_k = G^k e_0. $$ Convergence for every $e_{0}$ is therefore the statement $G^{k} \to 0$, which is controlled by the spectral radius $ρ (G) = \max \{ |λ| : λ \in σ (G) \}$ of 01.01.08.

The three classical splittings. Write $A = D - L - U$, where $D$ is the diagonal of $A$, $- L$ its strictly-lower-triangular part, and $- U$ its strictly-upper-triangular part (so the entries of $L$ and $U$ are the negatives of the corresponding off-diagonal entries of $A$). Assume $D$ nonsingular.

Jacobi: $M = D$ , $N = L + U$ . Iteration matrix $G_{J} = D^{- 1} (L + U)$ . Each component is updated independently from the others' previous values.
Gauss-Seidel: $M = D - L$ , $N = U$ . Iteration matrix $G_{GS} = (D - L)^{- 1} U$ . The lower-triangular $M$ lets each component use the already-updated earlier components within the same sweep.
Successive over-relaxation (SOR): introduce a relaxation parameter $ω$ and split $M = \frac{1}{ω} D - L$ , $N = (\frac{1}{ω} - 1) D + U$ , giving $G_{ω} = (D - ω L)^{- 1} [(1 - ω) D + ω U] .$ The SOR step takes the Gauss-Seidel correction and scales it by $ω$ : $ω = 1$ recovers Gauss-Seidel, $ω > 1$ over-relaxes (pushes further), $0 < ω < 1$ under-relaxes.
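In practice the three splittings are applied component by component rather than by forming $M^{-1} N$. The sketch below is one plain-Python way to write the sweeps for a dense matrix stored as nested lists; the names `jacobi_sweep`, `sor_sweep`, and `solve`, the stopping rule, and the $3 \times 3$ test matrix are illustrative assumptions, not taken from the references.

```python
def jacobi_sweep(A, b, x):
    """One Jacobi sweep: every component uses the previous iterate only."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]

def sor_sweep(A, b, x, omega=1.0):
    """One SOR sweep; omega = 1.0 gives Gauss-Seidel."""
    n = len(b)
    x = list(x)  # updated in place, so later rows see the fresh values
    for i in range(n):
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        gauss_seidel_value = (b[i] - sigma) / A[i][i]
        # Scale the Gauss-Seidel correction by omega.
        x[i] = (1.0 - omega) * x[i] + omega * gauss_seidel_value
    return x

def solve(sweep, A, b, tol=1e-10, max_sweeps=10_000, **kwargs):
    """Repeat a sweep from x = 0 until successive iterates differ by less than tol."""
    x = [0.0] * len(b)
    for k in range(1, max_sweeps + 1):
        x_new = sweep(A, b, x, **kwargs)
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    raise RuntimeError("no convergence within max_sweeps")

# Illustrative strictly diagonally dominant system with exact solution (1, 1, 1).
A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
b = [3.0, 2.0, 3.0]
x_j, sweeps_j = solve(jacobi_sweep, A, b)
x_gs, sweeps_gs = solve(sor_sweep, A, b)
x_sor, sweeps_sor = solve(sor_sweep, A, b, omega=1.05)
print(sweeps_j, sweeps_gs, sweeps_sor)
```

On this example Gauss-Seidel needs fewer sweeps than Jacobi. For a genuinely sparse matrix the inner sums would run only over the stored nonzeros of each row, which is what keeps the cost per sweep proportional to the number of nonzeros.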

Asymptotic convergence rate. When $ρ (G) < 1$, the error norm eventually decays like $ρ (G)^{k}$, so reducing the error by a factor of $10$ takes about $1/ (- \log_{10} ρ (G))$ iterations. The quantity $- \log_{10} ρ (G)$ is the asymptotic rate of convergence; it measures the digits of accuracy gained per iteration, and is the right figure of merit for comparing splittings.
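As a concrete instance, the rate can be worked out by hand for the $2 \times 2$ system of the worked example ($4x + y = 5$, $x + 3y = 4$). A sketch of the computation, using only the definitions above: $$ G_J = D^{-1}(L + U) = \begin{pmatrix} 0 & -\tfrac14 \\ -\tfrac13 & 0 \end{pmatrix}, \qquad λ^2 = \tfrac{1}{12}, \qquad ρ(G_J) = \tfrac{1}{\sqrt{12}} \approx 0.289, $$ $$ G_{GS} = (D - L)^{-1} U = \begin{pmatrix} 0 & -\tfrac14 \\ 0 & \tfrac{1}{12} \end{pmatrix}, \qquad ρ(G_{GS}) = \tfrac{1}{12} \approx 0.083. $$ The asymptotic rates are therefore $-\log_{10} ρ(G_J) \approx 0.54$ and $-\log_{10} ρ(G_{GS}) \approx 1.08$ digits per iteration, so on this system Jacobi needs about $1.9$ iterations per digit and Gauss-Seidel about $0.9$.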

Counterexamples to common slips

A small norm of $G$ is sufficient but not necessary; the spectral radius is the exact criterion. A matrix can have $∥ G ∥ > 1$ in every standard operator norm yet $ρ (G) < 1$ , in which case the iteration still converges (after a possible transient growth). Convergence is governed by $ρ (G)$ , not by any single norm.
Gauss-Seidel is not always better than Jacobi. For symmetric positive-definite matrices Gauss-Seidel always converges while Jacobi may not, and for a large class (consistently ordered matrices) $ρ (G_{GS}) = ρ (G_{J})^{2}$, but there exist matrices for which Jacobi converges and Gauss-Seidel diverges, and vice versa. The Stein-Rosenberg theorem describes when they agree.
Over-relaxation can hurt. SOR accelerates only for $ω$ in a window; outside $0 < ω < 2$ the method cannot converge for every starting vector, whatever the matrix (Kahan's necessary condition $ρ (G_{ω}) \geq |ω - 1|$), and even inside that window a poorly chosen $ω$ can be slower than plain Gauss-Seidel.
The diagonal must be nonsingular. Jacobi and the others divide by the diagonal entries; a zero on the diagonal makes $D$ singular and $M^{- 1}$ undefined. A row permutation (reordering the equations) may be needed first to bring nonzero entries onto the diagonal; a symmetric permutation only reorders the existing diagonal entries and cannot remove a zero.
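The first slip is easy to see numerically. The sketch below uses a hypothetical $2 \times 2$ iteration matrix chosen for illustration (it does not come from any particular splitting): its eigenvalues are both $0.5$, yet its row-sum norm is $2.5$, and the error grows before it decays.

```python
def apply(G, e):
    """Multiply the 2x2 matrix G by the vector e."""
    return [G[0][0] * e[0] + G[0][1] * e[1], G[1][0] * e[0] + G[1][1] * e[1]]

def max_norm(e):
    """Infinity norm of a vector."""
    return max(abs(c) for c in e)

# Upper triangular, so the eigenvalues are the diagonal entries: rho(G) = 0.5.
G = [[0.5, 2.0], [0.0, 0.5]]
row_sum_norm = max(sum(abs(g) for g in row) for row in G)  # infinity norm of G

e = [0.0, 1.0]
norms = [max_norm(e)]
for _ in range(40):
    e = apply(G, e)
    norms.append(max_norm(e))

print(row_sum_norm, norms[:5], norms[-1])
```

The first few error norms exceed the starting value (the transient), after which the geometric factor $0.5^{k}$ takes over and the error falls below any tolerance.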

Key theorem with proof Intermediate+

The signature result identifies convergence of every stationary iteration with a single spectral inequality, and reads the speed off the same quantity.

Theorem (spectral-radius convergence criterion). Let $A = M - N$ be a splitting with $M$ nonsingular, iteration matrix $G = M^{- 1} N$, and let $x_{⋆}$ solve $A x = b$. The stationary iteration $x_{k + 1} = G x_{k} + M^{- 1} b$ converges to $x_{⋆}$ for every initial vector $x_{0}$ if and only if $ρ (G) < 1$. When it converges, the error satisfies $\limsup_{k \to \infty} ∥ e_{k} ∥^{1/ k} \leq ρ (G)$, with equality attained when $e_{0}$ is a dominant eigenvector, so the asymptotic per-step reduction factor is $ρ (G)$ and the rate of convergence is $- \log_{10} ρ (G)$ ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.

Proof. The error obeys $e_{k} = G^{k} e_{0}$ , so convergence for every $x_{0}$ — equivalently every $e_{0}$ — is the statement that $G^{k} \to 0$ as $k \to \infty$ .

Assume first $ρ (G) < 1$. Put $G$ in Jordan form $G = S J S^{- 1}$, so $G^{k} = S J^{k} S^{- 1}$. Each Jordan block for an eigenvalue $λ$ has $k$-th power with entries of the form $\binom{k}{j} λ^{k - j}$; since $|λ| \leq ρ (G) < 1$, every such entry tends to $0$ because the geometric decay $|λ|^{k - j}$ defeats the polynomial growth of $\binom{k}{j}$ in $k$. Hence $J^{k} \to 0$, so $G^{k} \to 0$ and $e_{k} \to 0$.

Conversely, suppose $ρ (G) \geq 1$ . Choose an eigenpair $G v = λ v$ with $∣ λ ∣ = ρ (G) \geq 1$ and take $e_{0} = v$ . Then $e_{k} = G^{k} v = λ^{k} v$ , whose norm $∣ λ ∣^{k} ∥ v ∥$ does not tend to $0$ . So the iteration fails to converge for this starting error, and the criterion is sharp.

For the rate, apply Gelfand's formula $ρ (G) = \lim_{k \to \infty} ∥ G^{k} ∥^{1/ k}$, valid in any submultiplicative norm. From $e_{k} = G^{k} e_{0}$ one has $∥ e_{k} ∥ \leq ∥ G^{k} ∥ ∥ e_{0} ∥$, so $\limsup_{k} ∥ e_{k} ∥^{1/ k} \leq \lim_{k} ∥ G^{k} ∥^{1/ k} = ρ (G)$; and taking $e_{0}$ a dominant eigenvector gives equality, so $\limsup_{k} ∥ e_{k} ∥^{1/ k} = ρ (G)$ in that case. Thus $∥ e_{k} ∥ \approx ρ (G)^{k} ∥ e_{0} ∥$ asymptotically, and the number of iterations to gain one decimal digit is $1/ (- \log_{10} ρ (G))$. $□$

Bridge. This theorem is the foundational reason a splitting can be trusted as a solver: it reduces the entire question of convergence to the single scalar $ρ (M^{- 1} N)$ , and this is exactly the role the growth factor played for direct elimination in 43.03.01 — one number that decides whether the method is safe and how fast it runs. The result builds toward the classical convergence guarantees of the Advanced results — strict diagonal dominance forcing $ρ (G_{J}) < 1$ , and positive-definiteness forcing $ρ (G_{ω}) < 1$ for $0 < ω < 2$ — each of which is a sufficient structural condition that pins this spectral radius below one without computing it. It appears again in 43.07.02 and the Krylov methods that follow, where the same spectral picture, applied now to the residual polynomial $p (A)$ rather than to a fixed iteration matrix, governs convergence; the central insight is that an iterative solver is a dynamical system whose contraction rate is a spectral radius, and the foundational reason stationary methods are slow is that this radius creeps toward one as the problem grows. Putting these together, the splitting supplies the cheap solve and the spectral radius supplies the speed, and the bridge is that making the radius small — by relaxation, or later by building an optimal polynomial in $A$ — is the whole game of iterative linear algebra.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Show that any fixed point of the stationary iteration $x_{k + 1} = M^{- 1} N x_{k} + M^{- 1} b$ solves $A x = b$ , and conversely. Conclude that if the iteration converges, its limit is the exact solution regardless of $x_{0}$ .

Hint

A fixed point satisfies $x_{⋆} = M^{- 1} N x_{⋆} + M^{- 1} b$ . Multiply through by $M$ and use $A = M - N$ .

Answer

If $x_{⋆}$ is a fixed point, $x_{⋆} = M^{- 1} N x_{⋆} + M^{- 1} b$ ; multiplying by $M$ gives $M x_{⋆} = N x_{⋆} + b$ , hence $(M - N) x_{⋆} = b$ , i.e. $A x_{⋆} = b$ . Conversely, if $A x_{⋆} = b$ then $(M - N) x_{⋆} = b$ , so $M x_{⋆} = N x_{⋆} + b$ and $x_{⋆} = M^{- 1} N x_{⋆} + M^{- 1} b$ is a fixed point. Since $A$ is nonsingular the solution is unique, so there is exactly one fixed point. If the iterates converge, the limit $x_{\infty}$ satisfies the fixed-point equation by continuity of the affine map, so $x_{\infty} = x_{⋆}$ no matter where $x_{0}$ started — the method has no spurious limits.

Exercise 6 (medium, symbolic).

Prove that the SOR iteration matrix satisfies $ρ (G_{ω}) \geq ∣ ω - 1∣$ for every $ω$ (Kahan's bound), and conclude that $0 < ω < 2$ is necessary for convergence.

Hint

The determinant of $G_{ω}$ equals the product of its eigenvalues. Compute $\det G_{ω}$ from $G_{ω} = (D - ω L)^{- 1} [(1 - ω) D + ω U]$, using that $D - ω L$ is lower triangular and $(1 - ω) D + ω U$ is upper triangular.

Answer

$\det G_{ω} = \frac{\det [(1 - ω) D + ω U]}{\det (D - ω L)}$. The matrix $(1 - ω) D + ω U$ is upper triangular with diagonal $(1 - ω) d_{ii}$, so its determinant is $(1 - ω)^{n} \prod d_{ii}$. The matrix $D - ω L$ is lower triangular with diagonal $d_{ii}$, determinant $\prod d_{ii}$. Hence $\det G_{ω} = (1 - ω)^{n}$. The spectral radius bounds the geometric mean of the eigenvalue moduli: $ρ (G_{ω})^{n} \geq |\det G_{ω}| = |1 - ω|^{n}$, so $ρ (G_{ω}) \geq |ω - 1|$. For convergence we need $ρ (G_{ω}) < 1$, which forces $|ω - 1| < 1$, i.e. $0 < ω < 2$. This is necessary; sufficiency requires extra structure such as positive-definiteness.
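Kahan's bound can be checked numerically on a small case. The following sketch builds $G_{ω}$ explicitly for a $2 \times 2$ matrix and compares its determinant and spectral radius with the formulas above; the helper names and the test matrix (the one from the Beginner worked example) are choices made here for illustration, and the closed-form $2 \times 2$ eigenvalue formula is used only to stay within the standard library.

```python
import cmath

def sor_matrix_2x2(A, omega):
    """G_omega = (D - omega L)^{-1} [(1 - omega) D + omega U] for 2x2 A = D - L - U."""
    (a, b), (c, d) = A
    # D - omega*L = [[a, 0], [omega*c, d]]; its inverse is written out by hand.
    m_inv = [[1.0 / a, 0.0], [-omega * c / (a * d), 1.0 / d]]
    # (1 - omega) D + omega U, with U = [[0, -b], [0, 0]].
    n = [[(1.0 - omega) * a, -omega * b], [0.0, (1.0 - omega) * d]]
    return [[sum(m_inv[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det_2x2(G):
    return G[0][0] * G[1][1] - G[0][1] * G[1][0]

def spectral_radius_2x2(G):
    """Largest eigenvalue modulus, from the characteristic quadratic."""
    tr = G[0][0] + G[1][1]
    root = cmath.sqrt(tr * tr - 4.0 * det_2x2(G))
    return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))

A = [[4.0, 1.0], [1.0, 3.0]]
for omega in (0.5, 1.0, 1.5, 2.5):
    G = sor_matrix_2x2(A, omega)
    # Columns: omega, det G, (1 - omega)^2, rho(G), |omega - 1|
    print(omega, det_2x2(G), (1.0 - omega) ** 2,
          spectral_radius_2x2(G), abs(omega - 1.0))
```

For $ω = 2.5$ the lower bound $|ω - 1| = 1.5$ already exceeds one, so no amount of structure in $A$ can rescue convergence.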

Exercise 7 (hard, symbolic).

Prove that if $A$ is strictly diagonally dominant by rows, that is $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for every $i$, then the Jacobi iteration converges, i.e. $ρ (G_{J}) < 1$.

Hint

The entries of $G_{J} = D^{- 1} (L + U)$ are $- a_{ij} / a_{ii}$ off the diagonal and $0$ on it. Bound $∥ G_{J} ∥_{\infty}$ , the maximum absolute row sum, and use $ρ (G_{J}) \leq ∥ G_{J} ∥_{\infty}$ .

Answer

The iteration matrix has entries $(G_{J})_{ij} = - a_{ij} / a_{ii}$ for $j \neq i$ and $(G_{J})_{ii} = 0$. The maximum absolute row sum is $$ \|G_J\|_\infty = \max_i \sum_{j \ne i} \frac{|a_{ij}|}{|a_{ii}|} = \max_i \frac{1}{|a_{ii}|}\sum_{j\ne i}|a_{ij}|. $$ Strict row diagonal dominance says $\sum_{j \neq i} |a_{ij}| < |a_{ii}|$ for every $i$, so each ratio is strictly below $1$, and the maximum over the finitely many rows is some number $< 1$. Since the spectral radius is bounded above by any induced operator norm, $ρ (G_{J}) \leq ∥ G_{J} ∥_{\infty} < 1$, so Jacobi converges. The same conclusion $ρ (G_{GS}) < 1$ holds for Gauss-Seidel under strict diagonal dominance, by the argument sketched in Theorem 1 of the Advanced results.
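The row-sum test in this answer is cheap enough to run before starting an iteration. A minimal sketch for dense nested-list matrices, with hypothetical helper names and two small example matrices chosen here for illustration:

```python
def is_strictly_row_dominant(A):
    """True if |a_ii| > sum of |a_ij| over j != i, for every row i."""
    n = len(A)
    return all(
        abs(A[i][i]) > sum(abs(A[i][j]) for j in range(n) if j != i)
        for i in range(n)
    )

def jacobi_inf_norm(A):
    """Maximum absolute row sum of G_J = D^{-1}(L + U)."""
    n = len(A)
    return max(
        sum(abs(A[i][j]) for j in range(n) if j != i) / abs(A[i][i])
        for i in range(n)
    )

dominant = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]
not_dominant = [[1.0, 2.0], [3.0, 1.0]]

print(is_strictly_row_dominant(dominant), jacobi_inf_norm(dominant))
print(is_strictly_row_dominant(not_dominant), jacobi_inf_norm(not_dominant))
```

The two functions answer the same question: the matrix is strictly row dominant exactly when the norm is below one. A norm of one or more does not by itself prove divergence, since the norm is only an upper bound on the spectral radius.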

Exercise 8 (hard, symbolic).

For a consistently ordered matrix with $ρ (G_{J}) = β < 1$, the SOR eigenvalues satisfy Young's functional equation $(λ + ω - 1)^{2} = λ ω^{2} β^{2}$. Use it to derive the optimal parameter $ω_{opt} = 2/ (1 + \sqrt{1 - β^{2}})$ and show $ρ (G_{ω_{opt}}) = ω_{opt} - 1$.

Hint

Treat the functional equation as a quadratic in $λ$ . The two roots collide (the optimal $ω$ minimises the larger root) exactly when the discriminant vanishes.

Answer

Write $μ = \sqrt{λ}$. The equation $(λ + ω - 1)^{2} = λ ω^{2} β^{2}$ becomes $λ + ω - 1 = \pm μ ω β$, i.e. $μ^{2} \mp ω β μ + (ω - 1) = 0$, a quadratic in $μ$ with roots $μ = \frac{1}{2} \left( ω β \pm \sqrt{ω^{2} β^{2} - 4 (ω - 1)} \right)$. As $ω$ increases from $1$, the larger $|λ| = μ^{2}$ decreases until the discriminant $ω^{2} β^{2} - 4 (ω - 1)$ hits zero; beyond that the roots are complex with $|λ| = ω - 1$, which then increases. The minimum of $ρ (G_{ω})$ is at the collision, where $ω^{2} β^{2} - 4 ω + 4 = 0$. Solving this quadratic in $ω$ and taking the root that lies in $(1, 2)$, $ω = \frac{4 - \sqrt{16 - 16 β^{2}}}{2 β^{2}} = \frac{2 (1 - \sqrt{1 - β^{2}})}{β^{2}}$, and rationalising the numerator gives $ω_{opt} = \frac{2}{1 + \sqrt{1 - β^{2}}}$. At that $ω$ the discriminant vanishes so both roots have $|λ| = ω_{opt} - 1$, hence $ρ (G_{ω_{opt}}) = ω_{opt} - 1$.
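The shape of $ρ (G_{ω})$ described in this answer (decreasing up to the collision, then increasing like $ω - 1$) can be tabulated directly from the functional equation. The sketch below assumes, as the exercise does, a consistently ordered matrix whose dominant Jacobi eigenvalue is the real number $β$; the function names and the value $β = 0.99$ are illustrative.

```python
import math

def omega_opt(beta):
    """Optimal relaxation parameter 2 / (1 + sqrt(1 - beta^2))."""
    return 2.0 / (1.0 + math.sqrt(1.0 - beta * beta))

def rho_sor(omega, beta):
    """Largest |lambda| solving (lambda + omega - 1)^2 = lambda omega^2 beta^2, 0 < omega < 2."""
    disc = (omega * beta) ** 2 - 4.0 * (omega - 1.0)
    if disc >= 0.0:
        mu = 0.5 * (omega * beta + math.sqrt(disc))  # larger real root, mu = sqrt(lambda)
        return mu * mu
    return omega - 1.0  # complex pair: |lambda| = |mu|^2 = product of the roots

beta = 0.99
best = omega_opt(beta)
print(best, rho_sor(best, beta), rho_sor(1.0, beta))

# Scan a grid of omega values and locate the smallest spectral radius.
grid = [1.0 + 0.01 * k for k in range(100)]
grid_best = min(grid, key=lambda w: rho_sor(w, beta))
print(grid_best)
```

The scan also shows how sharp the minimum is: the radius drops steeply just below $ω_{opt}$ and rises only linearly above it, which is why overestimating $ω$ slightly is usually considered the safer error.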

Advanced results Master

Theorem 1 (convergence under strict diagonal dominance). If $A \in F^{n \times n}$ is strictly diagonally dominant, by rows or by columns, then both the Jacobi and the Gauss-Seidel iterations converge: $ρ (G_{J}) < 1$ and $ρ (G_{GS}) < 1$, for every right-hand side and every starting vector. For Jacobi under row dominance the bound $ρ (G_{J}) \leq ∥ G_{J} ∥_{\infty} < 1$ is immediate from the row-sum estimate. For Gauss-Seidel one argues by contradiction: $λ$ is an eigenvalue of $G_{GS} = (D - L)^{- 1} U$ exactly when $λ (D - L) - U$ is singular, but for $|λ| \geq 1$ this matrix inherits strict diagonal dominance from $A$ and is therefore nonsingular, contradicting $\det (G_{GS} - λ I) = 0$. Diagonal dominance is the most common structural hypothesis under which the classical iterations are guaranteed to work, and it holds automatically for the diffusion and Markov-chain matrices that motivated the methods ^{[Varga, R. S. — Matrix Iterative Analysis (2nd ed.)]}.

Theorem 2 (the Householder-John theorem; SPD convergence of SOR and Gauss-Seidel). Let $A$ be Hermitian positive-definite with splitting $A = M - N$, $M$ nonsingular. If the matrix $M + M^* - A = M^* + N$ is positive-definite, then $\rho(M^{-1}N) < 1$. Consequently, for Hermitian positive-definite $A$, the SOR iteration converges for every $\omega \in (0, 2)$; in particular Gauss-Seidel ($\omega = 1$) converges. The proof uses the energy norm $\|v\|_A^2 = v^*Av$: one shows $\|G\|_A < 1$ by computing, for the error map, $\|Gv\|_A^2 - \|v\|_A^2 = -w^*(M + M^* - A)w$ with $w = M^{-1}Av$, which is strictly negative when $M + M^* - A \succ 0$ and $v \ne 0$. For SOR, $M + M^* - A = (\tfrac{2}{\omega} - 1)D$, positive-definite exactly when $0 < \omega < 2$, recovering Kahan's window as both necessary and, under positive-definiteness, sufficient ^{[Varga, R. S. — Matrix Iterative Analysis (2nd ed.)]}.

Theorem 3 (Stein-Rosenberg; Jacobi versus Gauss-Seidel). Let the Jacobi iteration matrix $G_{J} = D^{- 1} (L + U)$ be nonnegative (as when $A$ has positive diagonal and nonpositive off-diagonal entries, which includes every M-matrix). Then exactly one of the following holds: (i) $ρ (G_{J}) = ρ (G_{GS}) = 0$; (ii) $0 < ρ (G_{GS}) < ρ (G_{J}) < 1$; (iii) $ρ (G_{J}) = ρ (G_{GS}) = 1$; (iv) $1 < ρ (G_{J}) < ρ (G_{GS})$. Cases (ii) and (iv) say Jacobi and Gauss-Seidel converge or diverge together, and when they converge Gauss-Seidel is strictly faster. For consistently ordered matrices the relation sharpens to the exact identity $ρ (G_{GS}) = ρ (G_{J})^{2}$, so Gauss-Seidel gains digits at exactly twice the Jacobi rate ^{[Varga, R. S. — Matrix Iterative Analysis (2nd ed.)]}.

Theorem 4 (optimal SOR for the model problem and the order-of-magnitude speedup). Let $A$ be consistently ordered with $ρ (G_{J}) = β < 1$ . The SOR spectral radius $ρ (G_{ω})$ is minimised at $$ \omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \beta^2}}, \qquad \rho(G_{\omega_{\mathrm{opt}}}) = \omega_{\mathrm{opt}} - 1 = \frac{1 - \sqrt{1 - \beta^2}}{1 + \sqrt{1 - \beta^2}}. $$ For the model Poisson problem on an $N \times N$ grid, $β = cos (π h)$ with $h = 1/ (N + 1)$ , so $ρ (G_{J}) = 1 - \frac{1}{2} π^{2} h^{2} + O (h^{4})$ , $ρ (G_{GS}) = β^{2} = 1 - π^{2} h^{2} + O (h^{4})$ , but $ρ (G_{ω_{opt}}) = 1 - 2 π h + O (h^{2})$ . The decisive change is in the power of $h$ : Jacobi and Gauss-Seidel have spectral radius $1 - O (h^{2})$ , requiring $O (h^{- 2}) = O (N^{2})$ iterations to converge, whereas optimal SOR has radius $1 - O (h)$ , requiring only $O (h^{- 1}) = O (N)$ iterations. On a grid with $N = 1000$ this is the difference between roughly a million and roughly a thousand iterations — the order-of-magnitude acceleration that made SOR the workhorse iterative solver of the 1950s-70s before Krylov methods and multigrid ^{[Young, D. M. — Iterative Solution of Large Linear Systems]}.
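The scaling in Theorem 4 can be tabulated from the closed-form radii alone, without assembling the Poisson matrix. The sketch below takes the formulas $β = \cos (π h)$, $ρ (G_{GS}) = β^{2}$, and $ρ (G_{ω_{opt}}) = ω_{opt} - 1$ from the theorem as given; the function names are hypothetical, and "sweeps per digit" means $1 / (- \log_{10} ρ)$.

```python
import math

def model_problem_radii(N):
    """Spectral radii (Jacobi, Gauss-Seidel, optimal SOR) on an N x N grid."""
    h = 1.0 / (N + 1)
    beta = math.cos(math.pi * h)
    omega = 2.0 / (1.0 + math.sqrt(1.0 - beta * beta))
    return beta, beta * beta, omega - 1.0

def sweeps_per_digit(rho):
    """Asymptotic number of sweeps needed to gain one decimal digit."""
    return 1.0 / -math.log10(rho)

for N in (10, 100, 1000):
    rho_j, rho_gs, rho_sor = model_problem_radii(N)
    print(N,
          round(sweeps_per_digit(rho_j)),
          round(sweeps_per_digit(rho_gs)),
          round(sweeps_per_digit(rho_sor)))
```

Each tenfold increase in $N$ multiplies the Jacobi and Gauss-Seidel columns by roughly a hundred and the SOR column by roughly ten, which is the $O (N^{2})$ versus $O (N)$ contrast of the theorem in numerical form.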

Synthesis. The stationary iterations are one construction (a splitting $A = M - N$ and the affine map $x \mapsto M^{- 1} (N x + b)$) viewed under one invariant, and the spectral radius $ρ (M^{- 1} N)$ is the foundational reason any of them works or fails: convergence is exactly $ρ < 1$, the speed is exactly $- \log_{10} ρ$, and every named method is a choice of $M$ that trades cost-per-step against this radius. This is exactly the structure of 43.03.01, where a single scalar (there the growth factor, here the spectral radius) decides whether a linear solver is safe and how accurately it performs, and the central insight is that an iterative method is a discrete dynamical system whose fixed point is the solution and whose contraction rate is a spectral radius. The classical theory is the catalogue of structural hypotheses that pin that radius below one without computing it: strict diagonal dominance (Theorem 1), positive-definiteness via the energy norm (the Householder-John Theorem 2), and the nonnegativity that lets Stein-Rosenberg (Theorem 3) rank Jacobi against Gauss-Seidel.

The relaxation parameter generalises the splitting from a fixed choice to a one-parameter family, and optimising it (Theorem 4) is dual to reshaping the spectrum: $ω_{opt}$ collides the SOR eigenvalues to convert a radius $1 - O (h^{2})$ into $1 - O (h)$ , the same order-of-magnitude lever that polynomial acceleration and preconditioning will pull again. Putting these together, the splitting supplies the cheap solve and the spectral radius supplies the verdict, and this same dynamical picture builds toward the Krylov methods. The bridge to the rest of the chapter is that as the grid refines, $ρ \to 1$ for every stationary method, so the iteration count grows with the problem size; this defect appears again in 43.07.02, where the Krylov-subspace methods, and the conjugate gradient and GMRES algorithms built on them, remove it by constructing at each step the optimal polynomial in $A$ rather than re-applying a fixed iteration matrix.

Full proof set Master

Proposition 1 (spectral-radius convergence criterion). The stationary iteration $x_{k + 1} = G x_{k} + c$ with $G = M^{- 1} N$ , $c = M^{- 1} b$ converges to the unique fixed point for every $x_{0}$ if and only if $ρ (G) < 1$ .

Proof. The fixed point $x_{⋆}$ exists and is unique because $I - G = M^{- 1} (M - N) = M^{- 1} A$ is nonsingular, so $x_{⋆} = (I - G)^{- 1} c$ solves $x = G x + c$. The error $e_{k} = x_{k} - x_{⋆}$ satisfies $e_{k + 1} = G e_{k}$, hence $e_{k} = G^{k} e_{0}$. Convergence for every $x_{0}$ is the statement $G^{k} \to 0$. If $ρ (G) < 1$, write $G = S J S^{- 1}$ in Jordan form; a Jordan block of size $m$ for eigenvalue $λ$ has $(J_{λ}^{k})_{ij} = \binom{k}{j - i} λ^{k - (j - i)}$ for $j \geq i$, and since $|λ| < 1$ the factor $|λ|^{k - (j - i)}$ decays geometrically while $\binom{k}{j - i}$ grows only polynomially in $k$, so each entry $\to 0$; thus $J^{k} \to 0$ and $G^{k} = S J^{k} S^{- 1} \to 0$. Conversely, if $ρ (G) \geq 1$, take an eigenpair $G v = λ v$ with $|λ| = ρ (G) \geq 1$; then $e_{0} = v$ gives $e_{k} = λ^{k} v$ with $∥ e_{k} ∥ = |λ|^{k} ∥ v ∥ \not\to 0$, so convergence fails. $□$

Proposition 2 (asymptotic convergence factor). When $ρ (G) < 1$, every starting error satisfies $\limsup_{k \to \infty} ∥ e_{k} ∥^{1/ k} \leq ρ (G)$, with equality attained when $e_{0}$ is a dominant eigenvector, and the number of iterations to reduce the error by a factor $10$ is asymptotically $1/ (- \log_{10} ρ (G))$.

Proof. By Gelfand's formula, $ρ (G) = \lim_{k \to \infty} ∥ G^{k} ∥^{1/ k}$ in any submultiplicative matrix norm. From $e_{k} = G^{k} e_{0}$, $∥ e_{k} ∥^{1/ k} \leq ∥ G^{k} ∥^{1/ k} ∥ e_{0} ∥^{1/ k}$, and $∥ e_{0} ∥^{1/ k} \to 1$, so $\limsup_{k} ∥ e_{k} ∥^{1/ k} \leq ρ (G)$. For the reverse, pick $e_{0}$ a unit eigenvector for an eigenvalue $λ$ of modulus $ρ (G)$; then $∥ e_{k} ∥ = ρ (G)^{k}$, giving $∥ e_{k} ∥^{1/ k} = ρ (G)$ and hence $\limsup_{k} ∥ e_{k} ∥^{1/ k} = ρ (G)$. Thus asymptotically $∥ e_{k} ∥ \sim ρ (G)^{k}$, and solving $ρ (G)^{k} = 10^{- 1}$ for $k$ gives $k = 1/ (- \log_{10} ρ (G))$. $□$

Proposition 3 (Householder-John SPD convergence). Let $A = A^* \succ 0$ and $A = M - N$ with $M$ nonsingular. If $M + M^* - A \succ 0$, then $\rho(M^{-1}N) < 1$.

Proof. Work in the energy inner product $⟨ u, v ⟩_{A} = v^{*} A u$, with norm $\|v\|_{A}^{2} = v^{*} A v > 0$ for $v \neq 0$. The iteration matrix is $G = M^{- 1} N = M^{- 1} (M - A) = I - M^{- 1} A$. For any $v \neq 0$, set $w = M^{- 1} A v$, so $G v = v - w$. Then $$ \|Gv\|_A^2 = (v-w)^*A(v-w) = v^*Av - v^*Aw - w^*Av + w^*Aw. $$ From $w = M^{- 1} A v$ one has $A v = M w$ and, taking adjoints with $A = A^*$, $v^*A = w^*M^*$. Hence $v^*Aw = w^*M^*w$ and $w^*Av = w^*Mw$, so $$ \|Gv\|_A^2 - \|v\|_A^2 = -w^*M^*w - w^*Mw + w^*Aw = -w^*(M + M^* - A)w. $$ By hypothesis $M + M^* - A \succ 0$, and $w = M^{-1}Av \ne 0$ when $v \ne 0$ (as $M, A$ are nonsingular), so the right side is strictly negative: $\|Gv\|_A < \|v\|_A$. Since the unit sphere is compact in finite dimensions, the supremum of $\|Gv\|_A$ over $\|v\|_A = 1$ is attained, so $\|G\|_A < 1$ in the energy-induced operator norm, and $\rho(G) \le \|G\|_A < 1$. $\square$

Proposition 4 (SOR convergence for SPD $A$ , $0 < ω < 2$ ). For Hermitian positive-definite $A$ with diagonal $D$ and the SOR splitting $M = \frac{1}{ω} D - L$ , the iteration converges if and only if $0 < ω < 2$ .

Proof. Necessity is Kahan's bound: $det G_{ω} = (1 - ω)^{n}$ (Exercise 6), so $ρ (G_{ω}) \geq ∣1 - ω ∣$ , forcing $∣1 - ω ∣ < 1$ . For sufficiency, apply Proposition 3. With $A = A^{*} ≻ 0$ , the diagonal $D$ is real positive-definite, and since $A = D - L - U$ with $U = L^{*}$ (Hermitian), $M = \frac{1}{ω} D - L$ gives $$ M + M^* - A = \Big(\tfrac1\omega D - L\Big) + \Big(\tfrac1\omega D - L^*\Big) - (D - L - L^*) = \Big(\tfrac2\omega - 1\Big)D. $$ This is positive-definite exactly when $\frac{2}{ω} - 1 > 0$ , i.e. $0 < ω < 2$ . By Proposition 3, $ρ (G_{ω}) < 1$ there. Combining, the SOR iteration on an SPD matrix converges precisely for $ω \in (0, 2)$ . $□$
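The identity $M + M^* - A = (\tfrac{2}{ω} - 1) D$ at the heart of this proof is easy to confirm on an example. A minimal sketch for a real symmetric matrix stored as nested lists; the function name and the test matrix are illustrative assumptions.

```python
def sor_splitting_defect(A, omega):
    """Return M + M^T - A for the SOR splitting M = D/omega - L of a real symmetric A."""
    n = len(A)
    # M has D/omega on the diagonal and the strictly lower part of A below it
    # (recall that -L is the strictly lower part of A, so -L contributes a_ij).
    M = [
        [A[i][j] if j < i else (A[i][i] / omega if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    return [[M[i][j] + M[j][i] - A[i][j] for j in range(n)] for i in range(n)]

A = [[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]]

inside = sor_splitting_defect(A, 1.5)    # omega inside (0, 2)
outside = sor_splitting_defect(A, 2.5)   # omega outside (0, 2)
print([inside[i][i] for i in range(3)])
print([outside[i][i] for i in range(3)])
```

The off-diagonal entries cancel, leaving a diagonal matrix whose sign is the sign of $\tfrac{2}{ω} - 1$: positive for $ω = 1.5$, negative for $ω = 2.5$, which is where the hypothesis of Proposition 3 stops holding.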

Proposition 5 (optimal $ω$ for consistently ordered matrices). Let $A$ be consistently ordered with $ρ (G_{J}) = β \in [0, 1)$. Then $ρ (G_{ω})$ is minimised at $ω_{opt} = 2/ (1 + \sqrt{1 - β^{2}})$, where $ρ (G_{ω_{opt}}) = ω_{opt} - 1$.

Proof. Consistent ordering yields Young's functional equation: $λ$ is an eigenvalue of $G_{ω}$ if and only if $(λ + ω - 1)^{2} = λ ω^{2} β^{2}$ for some eigenvalue $\pm β$ of $G_{J}$ (the equation depends only on $β^{2}$). Setting $μ = \sqrt{λ}$ gives $μ^{2} - ω β μ + (ω - 1) = 0$ (choosing the branch that maximises $|λ|$), with roots $μ_{\pm} = \frac{1}{2} \left( ω β \pm \sqrt{ω^{2} β^{2} - 4 (ω - 1)} \right)$. For $ω$ below the critical value the discriminant $Δ (ω) = ω^{2} β^{2} - 4 (ω - 1)$ is positive and the dominant root $|λ| = μ_{+}^{2}$ is real and decreasing in $ω$; for $ω$ above it $Δ < 0$, the roots are complex conjugate with $|λ| = |μ|^{2} = ω - 1$ (product of roots), which increases in $ω$. The minimum is at $Δ (ω) = 0$: $ω^{2} β^{2} - 4 ω + 4 = 0$, whose relevant root is $ω_{opt} = (4 - \sqrt{16 - 16 β^{2}}) / (2 β^{2}) = 2 (1 - \sqrt{1 - β^{2}}) / β^{2}$. Multiplying numerator and denominator by $(1 + \sqrt{1 - β^{2}})$ and using $β^{2} = 1 - (1 - β^{2})$ gives $ω_{opt} = 2/ (1 + \sqrt{1 - β^{2}})$. At this point $Δ = 0$, so both roots coincide with modulus $ω_{opt} - 1$, hence $ρ (G_{ω_{opt}}) = ω_{opt} - 1$. $□$

Connections Master

Eigenvalue, eigenvector, and the spectral radius 01.01.08 provides the single invariant on which this entire unit turns: the convergence criterion $ρ (M^{- 1} N) < 1$, the asymptotic factor $ρ (G)$, and the rate $- \log_{10} ρ (G)$ are all read off the spectrum of the iteration matrix. The eigenvalue theory there, static, becomes here the contraction rate of a dynamical system; Young's functional equation relating the Jacobi and SOR spectra is a concrete computation in exactly that eigenvalue language, and the proof of the convergence criterion is a Jordan-form argument straight from that unit.
Gaussian elimination, LU factorization, and its stability 43.03.01 is the direct-solve baseline these iterations are built to compete with: where elimination costs $O (n^{3})$ and fills in the sparse zeros of $A$, a stationary iteration costs $O (\mathrm{nnz})$ per step and preserves sparsity, paying instead in iteration count. The methods are the answer to the regime where 43.03.01 is too expensive, and each Gauss-Seidel or SOR sweep is itself a triangular solve (the cheap solves whose efficiency that unit's factorisation theory underwrites), so the chapter's direct and iterative halves meet at the triangular system.
Krylov subspaces, Arnoldi, and Lanczos 43.07.02 are the successors that this unit motivates: the defect exposed here (that $ρ (G) \to 1$ as the grid refines, so the iteration count grows with problem size) is precisely what the Krylov methods remove. Where a stationary method re-applies the fixed iteration matrix $G$, a Krylov method searches the growing subspace $\operatorname{span} \{ b, A b, \dots, A^{m - 1} b \}$ for the best polynomial in $A$, turning the fixed contraction rate into an optimised one; the splitting matrix $M$ of this unit reappears there as a preconditioner, the lever that clusters the spectrum and restores fast convergence.

Historical & philosophical context Master

The Jacobi method takes its name from Carl Gustav Jacob Jacobi, who in 1845 described an iterative scheme for the normal equations of least-squares astronomy, decoupling the system one variable at a time. The sequential-update variant is attributed to Carl Friedrich Gauss, in correspondence of the 1820s on geodetic least squares, and to Philipp Ludwig von Seidel, who published the systematic procedure in 1874; the pairing of their names is a twentieth-century convention.

The acceleration of these classical schemes was the central achievement of mid-century numerical analysis. David M. Young, in his 1950 Harvard doctoral thesis and the monograph Iterative Solution of Large Linear Systems (Academic Press, 1971), developed the theory of successive over-relaxation, introducing consistently ordered matrices and property A and deriving the optimal relaxation parameter $ω_{opt} = 2/ (1 + \sqrt{1 - ρ (G_{J})^{2}})$ that converts the $O (N^{2})$ iteration count of Gauss-Seidel into $O (N)$ on the model problem ^{[Young, D. M. — Iterative Solution of Large Linear Systems]}. The same period saw Stanley Frankel (1950) propose over-relaxation independently for wartime computation. The structural convergence theory was consolidated by Richard S. Varga in Matrix Iterative Analysis (Prentice-Hall, 1962; 2nd ed. Springer, 2000), which organised the subject around regular splittings, M-matrices, and the Stein-Rosenberg comparison of Jacobi and Gauss-Seidel ^{[Varga, R. S. — Matrix Iterative Analysis (2nd ed.)]}. The modern textbook synthesis, situating the stationary methods as the historical prelude and the preconditioning component of the Krylov methods that superseded them, is Yousef Saad's Iterative Methods for Sparse Linear Systems (PWS, 1996; 2nd ed. SIAM, 2003) ^{[Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)]}.

Bibliography Master

@book{saad2003iterative,
  author    = {Saad, Yousef},
  title     = {Iterative Methods for Sparse Linear Systems},
  edition   = {2},
  publisher = {Society for Industrial and Applied Mathematics},
  year      = {2003}
}

@book{young1971iterative,
  author    = {Young, David M.},
  title     = {Iterative Solution of Large Linear Systems},
  publisher = {Academic Press},
  address   = {New York},
  year      = {1971}
}

@book{varga2000matrix,
  author    = {Varga, Richard S.},
  title     = {Matrix Iterative Analysis},
  edition   = {2},
  publisher = {Springer},
  address   = {Berlin},
  year      = {2000}
}

@book{golubvanloan2013mc,
  author    = {Golub, Gene H. and Van Loan, Charles F.},
  title     = {Matrix Computations},
  edition   = {4},
  publisher = {Johns Hopkins University Press},
  year      = {2013}
}

@article{steinrosenberg1948,
  author  = {Stein, P. and Rosenberg, R. L.},
  title   = {On the Solution of Linear Simultaneous Equations by Iteration},
  journal = {Journal of the London Mathematical Society},
  volume  = {s1-23},
  number  = {2},
  year    = {1948},
  pages   = {111--118}
}

@phdthesis{young1950thesis,
  author = {Young, David M.},
  title  = {Iterative Methods for Solving Partial Difference Equations of Elliptic Type},
  school = {Harvard University},
  year   = {1950}
}

Prerequisites

01.01.08
43.03.01

Tier anchors

beginner: Strang 2016 *Introduction to Linear Algebra* 5e (Wellesley-Cambridge) §11.2-11.3 (iterative methods, Jacobi and Gauss-Seidel as splitting the matrix); Trefethen-Bau 1997 *Numerical Linear Algebra* (SIAM) Lecture 32 (overview of classical iterations — opening discussion)
intermediate: Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 4 (basic iterative methods: splitting A = M - N, the iteration matrix M^{-1}N, convergence iff spectral radius < 1, Jacobi/Gauss-Seidel/SOR); Golub-Van Loan 2013 *Matrix Computations* 4e (Johns Hopkins) §11.2 (the classical splittings and their convergence)
master: Young 1971 *Iterative Solution of Large Linear Systems* (Academic Press) Ch. 3-6 (the SOR theory, consistently ordered matrices, the optimal omega for the model problem); Varga 2000 *Matrix Iterative Analysis* 2e (Springer) Ch. 3-4 (regular splittings, the Stein-Rosenberg theorem, the SOR analysis); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 4

References

Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.) · SIAM, 2003. Ch. 4: the splitting A = M - N, the stationary iteration x_{k+1} = M^{-1}(N x_k + b), the iteration matrix G = M^{-1}N, convergence if and only if the spectral radius rho(G) < 1, the asymptotic convergence rate -log10 rho(G), and the Jacobi, Gauss-Seidel, and SOR splittings with their convergence theorems.
Young, D. M. — Iterative Solution of Large Linear Systems · Academic Press, 1971. Ch. 3-6: the theory of successive over-relaxation, consistently ordered matrices and property A, the eigenvalue functional equation relating the Jacobi and SOR spectra, and the derivation of the optimal relaxation parameter omega_opt = 2 / (1 + sqrt(1 - rho(G_J)^2)) for the model problem.
Varga, R. S. — Matrix Iterative Analysis (2nd ed.) · Springer, 2000. Ch. 3-4: regular splittings and monotone convergence, the Stein-Rosenberg theorem comparing Jacobi and Gauss-Seidel, the Householder-John theorem for SPD splittings, and the spectral-radius analysis of the SOR iteration matrix.
Golub, G. H. & Van Loan, C. F. — Matrix Computations (4th ed.) · Johns Hopkins University Press, 2013. §11.2: the classical stationary iterations as matrix splittings, the convergence criterion via the spectral radius of the iteration matrix, and the order-of-magnitude speedup of optimally relaxed SOR over Jacobi and Gauss-Seidel on the model Poisson problem.

Estimated time

beginner: 20m
intermediate: 45m
master: 90m