Fixed-Point Iteration, Contraction Convergence, and Order of Convergence
Anchor (Master): Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (the full fixed-point convergence theory closing with the order framework that Newton's method realises at $p=2$); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.1, §12.1 (the contraction-mapping theorem in Banach space, the local convergence theorem, and $R$-/$Q$-order of convergence); Kelley 1995 *Iterative Methods for Linear and Nonlinear Equations* (SIAM) Ch. 4-5 (fixed-point iteration, the standard assumptions, and the local convergence rate)
Intuition Beginner
Suppose you want to solve an equation like , but no formula hands you the answer. There is a second way to phrase the question that often makes it easier to attack. Rearrange the equation so that sits alone on one side: write it as , where is some expression built from the original problem. A number that satisfies this is called a fixed point of , because feeding it into gives back the very same number. It is fixed in the sense that does not move it.
Once the problem reads , a natural procedure suggests itself. Pick any starting guess . Feed it to to get . Feed that back in to get , and keep going. You are repeatedly applying the same rule, hoping the outputs settle down on the number that leaves unmoved. This is fixed-point iteration, and it is one of the oldest and most reusable ideas in all of computation.
Whether the procedure settles down depends on one thing: does pull nearby numbers closer together, or push them apart? If applying always shrinks the gap between two inputs, then each step shrinks the gap between your guess and the true answer, and you home in on it. A rule that always shrinks gaps is called a contraction. If instead stretches gaps apart, your guesses fly away from the answer no matter how good your start was.
This unit measures that homing-in. The bisection method of the previous unit always worked but crawled; here we learn when iteration converges, and the language of order of convergence that says how fast — the same yardstick the chapter uses to crown the fastest methods.
Visual Beginner
The picture for fixed-point iteration is the staircase (or cobweb) diagram. Draw the curve and the straight line . A fixed point is where they cross, since there equals . Starting from a guess on the horizontal axis, you bounce between the curve and the line: go up to the curve to apply , then across to the line to copy the output back as the next input, and repeat.
When the curve is shallower than the line near the crossing — gentle slope — the staircase spirals inward to the crossing, and the iteration converges. When the curve is steeper than the line, the staircase marches outward and the iteration diverges. The single number deciding this is the steepness of at the fixed point.
| slope of at the crossing | what the staircase does | iteration |
|---|---|---|
| between and (shallow) | spirals inward to the crossing | converges |
| above or below (steep) | marches outward | diverges |
| exactly (flat at the crossing) | snaps in very fast | converges fast |
The flat case in the last row is the special prize: when is flat at the fixed point, the error does not just shrink, it shrinks dramatically faster each step. That is the seed of the fast methods later in the chapter.
Worked example Beginner
Let us find the number whose square is — the square root of , about — by fixed-point iteration. Start from the equation . One clean rearrangement into the form is $$ x = \tfrac{1}{2}\left(x + \tfrac{2}{x}\right), $$ which averages a guess with divided by that guess. Call the right side . We will iterate it from the start .
Step 1. Apply to : .
Step 2. Apply to :
Step 3. Apply to :
Step 4. Read off the accuracy. After three steps already matches the true square root of to five decimal places. Compare the bisection unit, where reaching five correct digits took roughly seventeen steps.
What this tells us: the same problem that bisection solved one bit at a time is solved here in three steps, because this particular is flat at the fixed point. The number of correct digits roughly doubles each step. Choosing the right rearrangement is what buys the speed, and the rest of the unit explains exactly why this races while a careless one would crawl or fail.
Check your understanding Beginner
Formal definition Intermediate+
Let . A point is a fixed point of if . Given the root-finding problem 43.02.01, a fixed-point reformulation is any continuous for which on the interval of interest; the canonical choice with works for any nonvanishing weight . The simple iteration (Picard iteration) generated by from a starting point is the sequence
$$
x_{n+1} = g(x_n), \qquad n = 0, 1, 2, \dots
$$
Write for the error at step . The symbols (fixed point), (error), (Lipschitz / contraction constant), and (-th derivative of ) are recorded in _meta/NOTATION.md.
The map is a contraction on with contraction constant if maps into itself and there is with
$$
|g(x) - g(y)| \le L,|x - y| \qquad \text{for all } x, y \in [a, b].
$$
This is the metric-space contraction condition of 02.01.05 specialised to a closed real interval (which, as a closed subset of the complete space , is itself complete). When is differentiable, the mean value theorem 02.05.02 supplies the constant: for some between and , so serves whenever that maximum is below .
A method producing iterates converges with order if there is a constant (the asymptotic error constant, with required when ) such that
$$
\lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^{p}} = C
\qquad\text{(equivalently } |e_{n+1}| \le C,|e_n|^{p} \text{ for large } n).
$$
The case with is linear convergence with rate ; with (or any ) is superlinear; is quadratic. This is the same order framework introduced for bisection 43.02.01, where and . The local convergence question asks whether the iteration converges for every in some neighbourhood of ; the contraction condition is a sufficient global form of it.
Counterexamples to common slips Intermediate+
"Every rearrangement of gives a convergent iteration." The fixed point is the same for all of them, but convergence depends on . For , the rearrangement has , so it does not converge (it oscillates between two values), while has and converges quadratically. Same root, opposite behaviour.
"Convergence is decided by the value of at the fixed point." It is decided by the derivative there. By definition for every reformulation; what separates them is the slope , which governs how the error is scaled each step.
"A contraction constant is necessary for convergence." It is sufficient and global; convergence can be merely local. If but exceeds elsewhere on , the iteration still converges from starts close enough to , because continuity of makes a contraction on a small neighbourhood of even when it is not one on all of .
"Linear convergence with rate $|g'(x^)|pe_{n+1} \approx g'(x^) e_n1e_{n+1} \approx \frac{g^{(p)}(x^)}{p!} e_n^ppgx^*$.
Key theorem with proof Intermediate+
The signature result is the contraction-mapping convergence theorem with its two error bounds, because it is the single statement that certifies the iteration converges, identifies its limit as the fixed point, and quantifies the error both before running (a-priori) and during running (a-posteriori).
Theorem (contraction-mapping convergence with a-priori and a-posteriori bounds). Let satisfy for all with . Then has a unique fixed point $x^ \in [a, b]x_0 \in [a, b]x_{n+1} = g(x_n)x^$; and the errors obey $$ \underbrace{|x_n - x^| \le \frac{L^{,n}}{1 - L},|x_1 - x_0|}_{\text{a-priori}}, \qquad \underbrace{|x_n - x^| \le \frac{L}{1 - L},|x_n - x_{n-1}|}_{\text{a-posteriori}}. $$
Proof. Existence and uniqueness are the Banach fixed-point statement on the complete space 02.01.05; we reproduce the constructive argument because it yields the bounds. Consecutive iterates contract: , so by induction . For the triangle inequality and the geometric sum give
$$
|x_m - x_n| \le \sum_{k=n}^{m-1} |x_{k+1} - x_k| \le |x_1 - x_0|\sum_{k=n}^{m-1} L^k \le \frac{L^n}{1 - L},|x_1 - x_0|.
$$
The right side tends to as , so is a Cauchy sequence; by completeness of it converges to a limit . Continuity of (a contraction is Lipschitz, hence continuous) lets us pass to the limit in , giving , so is a fixed point. If were another, with forces , so the fixed point is unique.
For the a-priori bound, let in the Cauchy estimate above with fixed: gives . For the a-posteriori bound, start from and apply the same geometric tail anchored at index : for , $$ |x_m - x_n| \le \sum_{k=n}^{m-1} L^{,k - n + 1},|x_n - x_{n-1}| \le \frac{L}{1 - L},|x_n - x_{n-1}|, $$ and letting yields .
Bridge. This theorem is the foundational reason fixed-point iteration is the organising abstraction of the whole nonlinear-equations chapter: it certifies convergence from a single inequality on , and the a-priori bound shows the error falls geometrically as , which is exactly linear convergence with rate , the order- case of the framework set up for bisection 43.02.01. The a-posteriori bound is the practical companion — it turns the observable step into a certified error, so it is the stopping criterion the chapter's methods actually use, and it builds toward the local-convergence refinement where is replaced by . The contraction picture generalises directly to several variables, where becomes the Jacobian spectral-radius condition , and it is dual to the bracketing picture of bisection: there convergence comes from halving an interval, here from a derivative below in size. Putting these together, the same convergence appears again in 43.02.03, where Newton's iteration is the fixed-point map with and the geometric rate collapses to quadratic order — the central insight that links this elementary contraction estimate to the fastest method of the chapter.
Exercises Intermediate+
Advanced results Master
Fixed-point iteration is the master template under which every iterative root-finder of the chapter is a special case, and the results below place the scalar order theory inside its sharper and higher-dimensional forms: the exact order- asymptotics, the second-order acceleration that converts linear into quadratic convergence, the Banach-space generalisation that subsumes the multivariable case, and the global pathologies that the local theory deliberately ignores.
Theorem 1 (order of convergence from the leading derivative). Let be in a neighbourhood of a fixed point with and , and suppose the iteration converges to . Then it converges with order exactly and asymptotic error constant , since the Lagrange-remainder Taylor expansion gives with . The order is therefore the index of the first non-vanishing derivative of at the fixed point: gives linear order with rate ; gives quadratic order . Newton's method is engineered precisely to land in the second case [Süli-Mayers §1.4].
Theorem 2 (Aitken acceleration and Steffensen's method). For a linearly convergent sequence with , , the Aitken extrapolate $$ \hat{x}n = x_n - \frac{(x{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n} $$ satisfies : it converges faster than the original sequence by cancelling the leading geometric error term. Folding the extrapolation back into the iteration — applying to every three fixed-point steps — gives Steffensen's method, which attains quadratic convergence for a simple fixed point without evaluating any derivative of , by using three function values to estimate the local rate that Newton would read directly [Atkinson §2.6].
Theorem 3 (contraction mapping and local convergence in Banach space). Let be a Banach space, closed, and a contraction with constant in the norm of . Then has a unique fixed point , the iterates converge from any start in , and the a-priori and a-posteriori bounds hold verbatim with replaced by . The local form: if is Fréchet-differentiable at a fixed point with spectral radius , then for any there is an operator norm in which , so is a contraction near and the iteration converges locally with asymptotic rate . For this is the multivariable fixed-point theorem, the Jacobian spectral radius replacing — the setting in which Newton-on-systems is analysed [Ortega-Rheinboldt §10.1, §12.1].
Theorem 4 (global structure: basins, periodic orbits, and the boundary case). The local theory is silent about starts far from . Globally a fixed-point iteration is a discrete dynamical system: may possess several fixed points with disjoint basins of attraction, attracting periodic orbits (as the involution exhibits a -cycle for ), and at the boundary the linearisation is inconclusive — convergence then depends on the higher-order terms and may be one-sided, algebraically slow, or absent. For analytic on the basin boundaries are Julia sets, and Newton's method for a polynomial with three or more roots has a basin structure of fractal complexity, the historical origin of complex dynamics [Süli-Mayers §1.4].
Synthesis. Fixed-point iteration is the foundational reason the order-of-convergence framework is a single theory rather than a catalogue: every iterative root-finder writes the equation as and runs , so its convergence is governed by one quantity, the derivative of at the fixed point, and its order is exactly the index of the first non-vanishing derivative there. The central insight is the dichotomy this produces — when the error contracts geometrically and convergence is linear with rate , the bisection-level regime 43.02.01; when the geometric term vanishes and the next surviving derivative sets a higher order, so making is exactly how one buys quadratic speed, which is what Newton's method does 43.02.03. This is dual to the bracketing guarantee of bisection: there safety comes from interval halving at the fixed rate ; here speed comes from a small derivative and the rate is problem-dependent and can be driven to zero, but the unconditional guarantee is lost — convergence is only local. Putting these together, the contraction-mapping theorem certifies that the iteration converges and the order theorem says how fast, and the same machinery appears again in 43.02.03, where Newton is the fixed-point map with a vanishing first derivative, and generalises through the Banach-space form to the multivariable Jacobian condition — the bridge from this scalar estimate to nonlinear systems.
Full proof set Master
Proposition 1 (contraction-mapping theorem with a-priori and a-posteriori bounds). Under the hypotheses of the Key theorem, has a unique fixed point $x^|x_n - x^| \le \frac{L^n}{1-L}|x_1 - x_0||x_n - x^| \le \frac{L}{1-L}|x_n - x_{n-1}|$.*
Proof. As established in the Key theorem: by induction on the contraction inequality; the geometric tail bounds for , forcing the Cauchy property and, by completeness of the closed interval 02.01.05, a limit ; continuity of gives ; the contraction inequality with forces uniqueness. Letting in the tail with fixed gives the a-priori bound; anchoring the tail at the latest step and summing gives the a-posteriori bound.
Proposition 2 (local convergence from ). If is near a fixed point $x^|g'(x^)| < 1Ix^gx^I|g'(x^)|g'(x^) \ne 0$.
Proof. Choose with . Continuity of gives with on . For , the mean value theorem 02.05.02 gives , so ; and for , , so is a contraction on with constant . Proposition 1 applies on , giving convergence. For the rate, the mean value theorem gives with between and ; as , , so , hence , linear convergence with that rate.
Proposition 3 (order equals the index of the first non-vanishing derivative). If is near $x^g'(x^) = \cdots = g^{(p-1)}(x^) = 0g^{(p)}(x^) \ne 0x^p|e_{n+1}|/|e_n|^p \to |g^{(p)}(x^)|/p!$.
Proof. Taylor's theorem with Lagrange remainder 02.05.02 about gives with between and . The hypothesis annihilates the sum and cancels the constant, leaving . Thus ; convergence forces , and continuity of gives the limit , which is order exactly .
Proposition 4 (Aitken acceleration cancels the leading linear error). Let $x_n \to x^e_{n+1} = (\lambda + \eta_n)e_n|\lambda| < 1\eta_n \to 0\hat{x}n = x_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n}\Delta x_n = x{n+1} - x_n\Delta^2 x_n = x_{n+2} - 2x_{n+1} + x_n(\hat{x}_n - x^)/(x_n - x^) \to 0$.*
Proof. Write . To leading order , so and . Then $$ \hat{x}_n - x^* = e_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n} = e_n - \frac{(\lambda-1)^2 e_n^2(1+o(1))}{(\lambda-1)^2 e_n(1+o(1))} = e_n - e_n(1 + o(1)) = e_n, o(1). $$ Hence : the extrapolate converges faster than because the geometric error is cancelled by the correction, leaving only the higher-order remainder. Iterating this cancellation within the fixed-point loop is Steffensen's method, whose quadratic order follows from re-deriving Proposition 3 for the composed map -corrected iteration.
Connections Master
The bisection method
43.02.01introduced the order-of-convergence definition and realised it at , through interval halving. This unit shows that same linear case arising instead from a derivative: a fixed-point iteration with converges linearly with rate . The contrast is the heart of the comparison — bisection's rate is a fixed guaranteed unconditionally, while the fixed-point rate is problem-dependent, can exceed and diverge, and is only locally certified, so the two methods trade guarantee against tunable speed.Newton's method
43.02.03is exactly the fixed-point iteration for the map , whose derivative at a simple root is . By the order theorem of this unit, the vanishing of raises the convergence order to , quadratic, with asymptotic constant . Newton is therefore not a separate theory but the engineered instance of this unit's framework in which the first derivative of the iteration map is forced to zero; the secant method is the derivative-free quasi-fixed-point variant of intermediate order .The Banach fixed-point theorem
02.01.05is the abstract existence-and-uniqueness statement on which the convergence proof here rests; this unit is the numerical layer that the metric-space theorem does not develop — the asymptotic rate and the order set by the leading derivative of . The same contraction principle drives Picard-Lindelöf existence for ODEs and the implicit/inverse function theorems, so the convergence-rate analysis here is the discrete-iteration shadow of those continuous existence results.Conditioning, floating-point arithmetic, and backward error
43.01.01floor the attainable accuracy of fixed-point iteration just as they do bisection: once falls to the level of rounding error in evaluating , the a-posteriori bound can no longer certify further accuracy, and the iterates stagnate at a residual of order scaled by the conditioning of the fixed point . This unit's stopping criteria forward-reference that chapter for the precision limit, and the factor in the a-posteriori bound is the same conditioning multiplier.
Historical & philosophical context Master
The method of successive substitution is among the oldest in analysis. The systematic study of iteration as a tool for solving equations runs through the nineteenth century alongside the development of rigorous analysis; Émile Picard's method of successive approximations (1890), built to prove existence for differential equations, is the same contraction argument applied to an integral operator, and it isolated the contraction estimate as the engine of convergence. Stefan Banach's 1922 thesis [Banach 1922] abstracted the argument into the fixed-point theorem for complete metric spaces, giving the existence, uniqueness, and the geometric a-priori bound in the generality that subsumes both the scalar iteration of this unit and the Banach-space form used for nonlinear systems. The numerical reading — that is the asymptotic linear rate and that vanishing derivatives raise the order — is the contribution of the twentieth-century numerical-analysis literature.
Süli and Mayers present the order-of-convergence theorem as the organising result of their nonlinear-equations chapter, with Newton's method appearing as the realisation of order [Süli-Mayers Ch. 1]. The multivariable theory, with the Jacobian spectral-radius condition and the -order / -order refinement of the scalar order index, was given its definitive treatment by Ortega and Rheinboldt in 1970 [Ortega-Rheinboldt §12.1]. Alexander Aitken's 1926 process [Aitken 1926] supplied the acceleration that, folded into the iteration by Steffensen, recovers quadratic order from a linearly convergent scheme without derivatives.
Bibliography Master
@book{sulimayers2003,
author = {S\"{u}li, Endre and Mayers, David F.},
title = {An Introduction to Numerical Analysis},
publisher = {Cambridge University Press},
year = {2003}
}
@book{atkinson1989,
author = {Atkinson, Kendall E.},
title = {An Introduction to Numerical Analysis},
edition = {2},
publisher = {John Wiley \& Sons},
year = {1989}
}
@book{ortegarheinboldt2000,
author = {Ortega, James M. and Rheinboldt, Werner C.},
title = {Iterative Solution of Nonlinear Equations in Several Variables},
series = {Classics in Applied Mathematics},
volume = {30},
publisher = {SIAM},
year = {2000},
note = {Reprint of the 1970 Academic Press edition}
}
@book{kelley1995,
author = {Kelley, C. T.},
title = {Iterative Methods for Linear and Nonlinear Equations},
series = {Frontiers in Applied Mathematics},
volume = {16},
publisher = {SIAM},
year = {1995}
}
@article{banach1922,
author = {Banach, Stefan},
title = {Sur les op\'{e}rations dans les ensembles abstraits et leur application aux \'{e}quations int\'{e}grales},
journal = {Fundamenta Mathematicae},
volume = {3},
year = {1922},
pages = {133--181}
}
@article{aitken1926,
author = {Aitken, Alexander C.},
title = {On Bernoulli's numerical solution of algebraic equations},
journal = {Proceedings of the Royal Society of Edinburgh},
volume = {46},
year = {1926},
pages = {289--305}
}