Stationary iterative methods: Jacobi, Gauss-Seidel, SOR
Anchor (Master): Young 1971 *Iterative Solution of Large Linear Systems* (Academic Press) Ch. 3-6 (the SOR theory, consistently ordered matrices, the optimal omega for the model problem); Varga 2000 *Matrix Iterative Analysis* 2e (Springer) Ch. 3-4 (regular splittings, the Stein-Rosenberg theorem, the SOR analysis); Saad 2003 *Iterative Methods for Sparse Linear Systems* 2e (SIAM) Ch. 4
Intuition Beginner
Solving a large system of equations by elimination, the direct method of 43.03.01, works on every entry of the matrix and costs an amount of arithmetic that grows like the cube of the size. For the enormous, mostly-empty matrices that come from physics simulations — millions of unknowns, but each equation touching only a handful of neighbours — that cubic cost and the memory it fills become impossible. There is a different idea: instead of computing the answer outright, guess it, then repeatedly nudge the guess closer.
The nudge has a simple shape. Split the matrix into an easy part you can solve quickly and a leftover part. At each round, you pretend the leftover acts on your current guess, move it to the right-hand side, and solve the easy part to get a better guess. The easy part is chosen so that solving it is cheap — just the diagonal, or a triangle. You repeat this until the guess stops changing.
Whether the guesses actually march toward the true answer, or wander off, comes down to a single number that measures how much the leftover part amplifies an error each round. If every round shrinks the error, you converge; if some round can grow it, you diverge. That amplification number is the spectral radius you met in 01.01.08.
The three classic recipes differ only in which easy part they pick. Jacobi uses just the diagonal. Gauss-Seidel uses the lower triangle, reusing each freshly updated value right away. Over-relaxation pushes each Gauss-Seidel correction a little further than it asks for, and with the right amount of push, it converges dramatically faster.
Visual Beginner
The picture is a guess being pulled, round after round, toward the true solution, with the size of each pull set by how much the leftover part of the matrix amplifies the current error.
Read the table top to bottom. Each row is one round. The error — the gap between your guess and the truth — is multiplied by the same amplification factor every round. If that factor is below one, the gap shrinks geometrically and you converge; if it is above one, the gap grows and you diverge.
| round | error size (factor ) | error size (factor ) | error size (factor ) |
|---|---|---|---|
| start | |||
| 1 | |||
| 2 | |||
| 3 | |||
| 10 |
The takeaway: convergence is a yes-or-no question decided by one number below or above one, and the speed is set by how far below one it sits. A factor of gains a digit of accuracy every few rounds; a factor of crawls; a factor above one is hopeless.
Worked example Beginner
Solve the small system $$ \begin{aligned} 4x + y &= 5,\ x + 3y &= 4, \end{aligned} $$ whose exact answer is , . We use the Jacobi recipe: solve each equation for its own diagonal unknown, using the other unknown from the previous round.
Rearranged, the rules are and . Start from the guess , .
Round 1. Use the old values on the right: and .
Round 2. Use round-1 values: and .
Round 3. Use round-2 values: and .
Round 4. Use round-3 values: and .
The guesses are closing in on , overshooting then undershooting, with the gap roughly a third the size each round. That one-third is the amplification factor of this system under Jacobi.
What this tells us: we never touched the matrix as a whole, never did elimination. We solved two one-variable updates per round, repeated, and watched the error fall by a steady factor. For a system this size that is silly; for a system with a million unknowns and only a few entries per row, it is the only thing that works.
Check your understanding Beginner
Formal definition Intermediate+
Let with be nonsingular, and consider the system . A splitting of is a decomposition $$ A = M - N, $$ with nonsingular and chosen so that linear systems are cheap to solve. Substituting into gives , which suggests the stationary iteration $$ M x_{k+1} = N x_k + b, \qquad \text{equivalently} \qquad x_{k+1} = M^{-1} N x_k + M^{-1} b. $$ The matrix is the iteration matrix; the map is "stationary" because and the constant vector do not change from step to step. A fixed point of the iteration satisfies , i.e. , i.e. : the fixed point is exactly the solution of the system [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].
The error recursion. Subtracting from gives the homogeneous recursion for the error ,
$$
e_{k+1} = G e_k, \qquad \text{so} \qquad e_k = G^k e_0.
$$
Convergence for every is therefore the statement , which is controlled by the spectral radius of 01.01.08.
The three classical splittings. Write , where is the diagonal of , its strictly-lower-triangular part, and its strictly-upper-triangular part (so have nonnegative-indexed entries with the sign absorbed). Assume nonsingular.
- Jacobi: , . Iteration matrix . Each component is updated independently from the others' previous values.
- Gauss-Seidel: , . Iteration matrix . The lower-triangular lets each component use the already-updated earlier components within the same sweep.
- Successive over-relaxation (SOR): introduce a relaxation parameter and split , , giving The SOR step takes the Gauss-Seidel correction and scales it by : recovers Gauss-Seidel, over-relaxes (pushes further), under-relaxes.
Asymptotic convergence rate. When , the error norm eventually decays like , so reducing the error by a factor of takes about iterations. The quantity is the asymptotic rate of convergence; it measures the digits of accuracy gained per iteration, and is the right figure of merit for comparing splittings.
Counterexamples to common slips
A small norm of is sufficient but not necessary; the spectral radius is the exact criterion. A matrix can have in every standard operator norm yet , in which case the iteration still converges (after a possible transient growth). Convergence is governed by , not by any single norm.
Gauss-Seidel is not always better than Jacobi. For symmetric positive-definite matrices Gauss-Seidel beats Jacobi, and for a large class (consistently ordered matrices) , but there exist matrices for which Jacobi converges and Gauss-Seidel diverges, and vice versa. The Stein-Rosenberg theorem describes when they agree.
Over-relaxation can hurt. SOR accelerates only for in a window; outside the method diverges for SPD matrices (Kahan's necessary condition ), and even inside that window a poorly chosen can be slower than plain Gauss-Seidel.
The diagonal must be nonsingular. Jacobi and the others divide by the diagonal entries; a zero on the diagonal makes singular and undefined. A symmetric permutation may be needed first to bring nonzero entries onto the diagonal.
Key theorem with proof Intermediate+
The signature result identifies convergence of every stationary iteration with a single spectral inequality, and reads the speed off the same quantity.
Theorem (spectral-radius convergence criterion). Let be a splitting with nonsingular, iteration matrix , and let solve . The stationary iteration converges to for every initial vector if and only if . When it converges, the error satisfies , so the asymptotic per-step reduction factor is and the rate of convergence is [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].
Proof. The error obeys , so convergence for every — equivalently every — is the statement that as .
Assume first . Put in Jordan form , so . Each Jordan block for an eigenvalue has -th power with entries that are products of ; since , every such entry tends to because the geometric decay defeats the polynomial growth . Hence , so and .
Conversely, suppose . Choose an eigenpair with and take . Then , whose norm does not tend to . So the iteration fails to converge for this starting error, and the criterion is sharp.
For the rate, apply Gelfand's formula , valid in any submultiplicative norm. From one has , so ; and taking a dominant eigenvector gives equality, so . Thus asymptotically, and the number of iterations to gain one decimal digit is .
Bridge. This theorem is the foundational reason a splitting can be trusted as a solver: it reduces the entire question of convergence to the single scalar , and this is exactly the role the growth factor played for direct elimination in 43.03.01 — one number that decides whether the method is safe and how fast it runs. The result builds toward the classical convergence guarantees of the Advanced results — strict diagonal dominance forcing , and positive-definiteness forcing for — each of which is a sufficient structural condition that pins this spectral radius below one without computing it. It appears again in 43.07.02 and the Krylov methods that follow, where the same spectral picture, applied now to the residual polynomial rather than to a fixed iteration matrix, governs convergence; the central insight is that an iterative solver is a dynamical system whose contraction rate is a spectral radius, and the foundational reason stationary methods are slow is that this radius creeps toward one as the problem grows. Putting these together, the splitting supplies the cheap solve and the spectral radius supplies the speed, and the bridge is that making the radius small — by relaxation, or later by building an optimal polynomial in — is the whole game of iterative linear algebra.
Exercises Intermediate+
Advanced results Master
Theorem 1 (convergence under strict diagonal dominance). If is strictly diagonally dominant — by rows or by columns — then both the Jacobi and the Gauss-Seidel iterations converge: and , for every right-hand side and every starting vector. For Jacobi the bound is immediate from the row-sum estimate. For Gauss-Seidel one shows that any eigenvalue of with would force the singular matrix — more cleanly, that inherits strict diagonal dominance and hence nonsingularity for , contradicting . Diagonal dominance is the most common structural hypothesis under which the classical iterations are guaranteed to work, and it holds automatically for the diffusion and Markov-chain matrices that motivated the methods [Varga, R. S. — Matrix Iterative Analysis (2nd ed.)].
Theorem 2 (the Householder-John theorem; SPD convergence of SOR and Gauss-Seidel). Let be Hermitian positive-definite with splitting , nonsingular. If the matrix $M + M^ - A = M^* + N\rho(M^{-1}N) < 1A\omega \in (0, 2)\omega = 1|v|_A^2 = v^Av|G|_A < 1|Gv|_A^2 - |v|_A^2 = -w^(M + M^* - A)ww = M^{-1}AvM + M^* - A \succ 0v \ne 0M + M^* - A = (\tfrac{2}{\omega} - 1)D0 < \omega < 2$, recovering Kahan's window as both necessary and — under positive-definiteness — sufficient [Varga, R. S. — Matrix Iterative Analysis (2nd ed.)].
Theorem 3 (Stein-Rosenberg; Jacobi versus Gauss-Seidel). Let the Jacobi iteration matrix be nonnegative (as when has nonpositive off-diagonal entries and positive diagonal, an M-matrix). Then exactly one of the following holds: (i) ; (ii) ; (iii) ; (iv) . Cases (ii) and (iv) say Jacobi and Gauss-Seidel converge or diverge together, and when they converge Gauss-Seidel is strictly faster. For consistently ordered matrices the relation sharpens to the exact identity , so Gauss-Seidel gains digits at exactly twice the Jacobi rate [Varga, R. S. — Matrix Iterative Analysis (2nd ed.)].
Theorem 4 (optimal SOR for the model problem and the order-of-magnitude speedup). Let be consistently ordered with . The SOR spectral radius is minimised at $$ \omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \beta^2}}, \qquad \rho(G_{\omega_{\mathrm{opt}}}) = \omega_{\mathrm{opt}} - 1 = \frac{1 - \sqrt{1 - \beta^2}}{1 + \sqrt{1 - \beta^2}}. $$ For the model Poisson problem on an grid, with , so , , but . The decisive change is in the power of : Jacobi and Gauss-Seidel have spectral radius , requiring iterations to converge, whereas optimal SOR has radius , requiring only iterations. On a grid with this is the difference between roughly a million and roughly a thousand iterations — the order-of-magnitude acceleration that made SOR the workhorse iterative solver of the 1950s-70s before Krylov methods and multigrid [Young, D. M. — Iterative Solution of Large Linear Systems].
Synthesis. The stationary iterations are one construction — a splitting and the affine map — viewed under one invariant, and the spectral radius is the foundational reason any of them works or fails: convergence is exactly , the speed is exactly , and every named method is a choice of that trades cost-per-step against this radius. This is exactly the structure of 43.03.01, where a single scalar — there the growth factor, here the spectral radius — decides whether a linear solver is safe and how accurately it performs, and the central insight is that an iterative method is a discrete dynamical system whose fixed point is the solution and whose contraction rate is a spectral radius. The classical theory is the catalogue of structural hypotheses that pin that radius below one without computing it: strict diagonal dominance (Theorem 1), positive-definiteness via the energy norm (the Householder-John Theorem 2), and the nonnegativity that lets Stein-Rosenberg (Theorem 3) rank Jacobi against Gauss-Seidel.
The relaxation parameter generalises the splitting from a fixed choice to a one-parameter family, and optimising it (Theorem 4) is dual to reshaping the spectrum: collides the SOR eigenvalues to convert a radius into , the same order-of-magnitude lever that polynomial acceleration and preconditioning will pull again. Putting these together, the splitting supplies the cheap solve and the spectral radius supplies the verdict, and this same dynamical picture builds toward the Krylov methods. The bridge to the rest of the chapter is that as the grid refines, for every stationary method, so the iteration count grows with the problem size; this defect appears again in 43.07.02, where the Krylov-subspace methods, and the conjugate gradient and GMRES algorithms built on them, remove it by constructing at each step the optimal polynomial in rather than re-applying a fixed iteration matrix.
Full proof set Master
Proposition 1 (spectral-radius convergence criterion). The stationary iteration with , converges to the unique fixed point for every if and only if .
Proof. The fixed point exists and is unique because is nonsingular, so solves . The error satisfies , hence . Convergence for every is the statement . If , write in Jordan form; a Jordan block of size for eigenvalue has for , and since the factor decays geometrically while grows only polynomially, so each entry ; thus and . Conversely, if , take an eigenpair with ; then gives with , so convergence fails.
Proposition 2 (asymptotic convergence factor). When , , and the number of iterations to reduce the error by a factor is asymptotically .
Proof. By Gelfand's formula, in any submultiplicative matrix norm. From , , and , so . For the reverse, pick a unit eigenvector for an eigenvalue of modulus ; then , giving and hence . Thus asymptotically , and solving for gives .
Proposition 3 (Householder-John SPD convergence). Let $A = A^ \succ 0A = M - NMM + M^* - A \succ 0\rho(M^{-1}N) < 1$.*
Proof. Work in the energy inner product , with norm for . The iteration matrix is . For any , set , so . Then $$ |Gv|_A^2 = (v-w)^*A(v-w) = v^*Av - v^*Aw - w^*Av + w^Aw. $$ From one has and, taking adjoints with $A = A^v^*A = w^M^v^*Aw = w^*M^*ww^*Av = w^*Mw$, so $$ |Gv|_A^2 - |v|_A^2 = -w^*M^w - w^Mw + w^Aw = -w^(M + M^ - A)w. $$ By hypothesis $M + M^ - A \succ 0w = M^{-1}Av \ne 0v \ne 0M, A|Gv|_A < |v|_A|G|_A < 1\rho(G) \le |G|_A < 1\square$
Proposition 4 (SOR convergence for SPD , ). For Hermitian positive-definite with diagonal and the SOR splitting , the iteration converges if and only if .
Proof. Necessity is Kahan's bound: (Exercise 6), so , forcing . For sufficiency, apply Proposition 3. With , the diagonal is real positive-definite, and since with (Hermitian), gives $$ M + M^* - A = \Big(\tfrac1\omega D - L\Big) + \Big(\tfrac1\omega D - L^*\Big) - (D - L - L^*) = \Big(\tfrac2\omega - 1\Big)D. $$ This is positive-definite exactly when , i.e. . By Proposition 3, there. Combining, the SOR iteration on an SPD matrix converges precisely for .
Proposition 5 (optimal for consistently ordered matrices). Let be consistently ordered with . Then is minimised at , where .
Proof. Consistent ordering yields Young's functional equation: is an eigenvalue of if and only if for some eigenvalue of (the equation depends only on ). Setting gives (choosing the branch that maximises ), with roots . For below the critical value the discriminant is positive and the dominant root is real and decreasing in ; for above it , the roots are complex conjugate with (product of roots), which increases in . The minimum is at : , whose relevant root is . Multiplying numerator and denominator by and using gives . At this point , so both roots coincide with modulus , hence .
Connections Master
Eigenvalue, eigenvector, and the spectral radius
01.01.08provides the single invariant on which this entire unit turns: the convergence criterion , the asymptotic factor , and the rate are all read off the spectrum of the iteration matrix. The eigenvalue theory there, static, becomes here the contraction rate of a dynamical system; Young's functional equation relating the Jacobi and SOR spectra is a concrete computation in exactly that eigenvalue language, and the proof of the convergence criterion is a Jordan-form argument straight from that unit.Gaussian elimination, LU factorization, and its stability
43.03.01is the direct-solve baseline these iterations are built to compete with: where elimination costs and fills in the sparse zeros of , a stationary iteration costs per step and preserves sparsity, paying instead in iteration count. The methods are the answer to the regime where43.03.01is too expensive, and each Gauss-Seidel or SOR sweep is itself a triangular solve — the cheap solves whose efficiency that unit's factorisation theory underwrites — so the chapter's direct and iterative halves meet at the triangular system.Krylov subspaces, Arnoldi, and Lanczos
43.07.02are the successors that this unit motivates: the defect exposed here — that as the grid refines, so the iteration count grows with problem size — is precisely what the Krylov methods remove. Where a stationary method re-applies the fixed iteration matrix , a Krylov method searches the growing subspace for the best polynomial in , turning the fixed contraction rate into an optimised one; the splitting matrix of this unit reappears there as a preconditioner, the lever that clusters the spectrum and restores fast convergence.
Historical & philosophical context Master
The Jacobi method takes its name from Carl Gustav Jacob Jacobi, who in 1845 described an iterative scheme for the normal equations of least-squares astronomy, decoupling the system one variable at a time. The sequential-update variant is attributed to Carl Friedrich Gauss, in correspondence of the 1820s on geodetic least squares, and to Philipp Ludwig von Seidel, who published the systematic procedure in 1874; the pairing of their names is a twentieth-century convention.
The acceleration of these classical schemes was the central achievement of mid-century numerical analysis. David M. Young, in his 1950 Harvard doctoral thesis and the monograph Iterative Solution of Large Linear Systems (Academic Press, 1971), developed the theory of successive over-relaxation, introducing consistently ordered matrices and property A and deriving the optimal relaxation parameter that converts the iteration count of Gauss-Seidel into on the model problem [Young, D. M. — Iterative Solution of Large Linear Systems]. The same period saw Stanley Frankel (1950) propose over-relaxation independently for wartime computation. The structural convergence theory was consolidated by Richard S. Varga in Matrix Iterative Analysis (Prentice-Hall, 1962; 2nd ed. Springer, 2000), which organised the subject around regular splittings, M-matrices, and the Stein-Rosenberg comparison of Jacobi and Gauss-Seidel [Varga, R. S. — Matrix Iterative Analysis (2nd ed.)]. The modern textbook synthesis, situating the stationary methods as the historical prelude and the preconditioning component of the Krylov methods that superseded them, is Yousef Saad's Iterative Methods for Sparse Linear Systems (PWS, 1996; 2nd ed. SIAM, 2003) [Saad, Y. — Iterative Methods for Sparse Linear Systems (2nd ed.)].
Bibliography Master
@book{saad2003iterative,
author = {Saad, Yousef},
title = {Iterative Methods for Sparse Linear Systems},
edition = {2},
publisher = {Society for Industrial and Applied Mathematics},
year = {2003}
}
@book{young1971iterative,
author = {Young, David M.},
title = {Iterative Solution of Large Linear Systems},
publisher = {Academic Press},
address = {New York},
year = {1971}
}
@book{varga2000matrix,
author = {Varga, Richard S.},
title = {Matrix Iterative Analysis},
edition = {2},
publisher = {Springer},
address = {Berlin},
year = {2000}
}
@book{golubvanloan2013mc,
author = {Golub, Gene H. and Van Loan, Charles F.},
title = {Matrix Computations},
edition = {4},
publisher = {Johns Hopkins University Press},
year = {2013}
}
@article{steinrosenberg1948,
author = {Stein, P. and Rosenberg, R. L.},
title = {On the Solution of Linear Simultaneous Equations by Iteration},
journal = {Journal of the London Mathematical Society},
volume = {s1-23},
number = {2},
year = {1948},
pages = {111--118}
}
@phdthesis{young1950thesis,
author = {Young, David M.},
title = {Iterative Methods for Solving Partial Difference Equations of Elliptic Type},
school = {Harvard University},
year = {1950}
}