43.02.02 · numerical-analysis / nonlinear-equations

Fixed-Point Iteration, Contraction Convergence, and Order of Convergence

shipped · 3 tiers · Lean: none

Anchor (Master): Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (the full fixed-point convergence theory closing with the order framework that Newton's method realises at $p=2$); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.1, §12.1 (the contraction-mapping theorem in Banach space, the local convergence theorem, and $R$-/$Q$-order of convergence); Kelley 1995 *Iterative Methods for Linear and Nonlinear Equations* (SIAM) Ch. 4-5 (fixed-point iteration, the standard assumptions, and the local convergence rate)

Intuition Beginner

Suppose you want to solve an equation like $f (x) = 0$ , but no formula hands you the answer. There is a second way to phrase the question that often makes it easier to attack. Rearrange the equation so that $x$ sits alone on one side: write it as $x = g (x)$ , where $g$ is some expression built from the original problem. A number that satisfies this is called a fixed point of $g$ , because feeding it into $g$ gives back the very same number. It is fixed in the sense that $g$ does not move it.

Once the problem reads $x = g (x)$ , a natural procedure suggests itself. Pick any starting guess $x_{0}$ . Feed it to $g$ to get $x_{1} = g (x_{0})$ . Feed that back in to get $x_{2} = g (x_{1})$ , and keep going. You are repeatedly applying the same rule, hoping the outputs settle down on the number that $g$ leaves unmoved. This is fixed-point iteration, and it is one of the oldest and most reusable ideas in all of computation.

Whether the procedure settles down depends on one thing: does $g$ pull nearby numbers closer together, or push them apart? If applying $g$ always shrinks the gap between two inputs, then each step shrinks the gap between your guess and the true answer, and you home in on it. A rule that always shrinks gaps is called a contraction. If instead $g$ stretches gaps apart, your guesses fly away from the answer no matter how good your start was.

This unit measures that homing-in. The bisection method of the previous unit always worked but crawled; here we learn when iteration converges, and the language of order of convergence that says how fast — the same yardstick the chapter uses to crown the fastest methods.

Visual Beginner

The picture for fixed-point iteration is the staircase (or cobweb) diagram. Draw the curve $y = g (x)$ and the straight line $y = x$ . A fixed point is where they cross, since there $g (x)$ equals $x$ . Starting from a guess on the horizontal axis, you bounce between the curve and the line: go up to the curve to apply $g$ , then across to the line to copy the output back as the next input, and repeat.

When the curve is shallower than the line near the crossing — gentle slope — the staircase spirals inward to the crossing, and the iteration converges. When the curve is steeper than the line, the staircase marches outward and the iteration diverges. The single number deciding this is the steepness of $g$ at the fixed point.

slope of $g$ at the crossing	what the staircase does	iteration
between $- 1$ and $1$ (shallow)	spirals inward to the crossing	converges
above $1$ or below $- 1$ (steep)	marches outward	diverges
exactly $0$ (flat at the crossing)	snaps in very fast	converges fast

The flat case in the last row is the special prize: when $g$ is flat at the fixed point, the error does not just shrink, it shrinks dramatically faster each step. That is the seed of the fast methods later in the chapter.

Worked example Beginner

Let us find the number whose square is $2$ — the square root of $2$ , about $1.41421$ — by fixed-point iteration. Start from the equation $x^{2} = 2$ . One clean rearrangement into the form $x = g (x)$ is $$ x = \tfrac{1}{2}\left(x + \tfrac{2}{x}\right), $$ which averages a guess $x$ with $2$ divided by that guess. Call the right side $g (x)$ . We will iterate it from the start $x_{0} = 1$ .

Step 1. Apply $g$ to $x_{0} = 1$ : $x_{1} = \frac{1}{2} (1 + 2/1) = \frac{1}{2} (1 + 2) = 1.5$ .

Step 2. Apply $g$ to $x_{1} = 1.5$ : $x_{2} = \frac{1}{2} (1.5 + 2/1.5) = \frac{1}{2} (1.5 + 1.3333 \dots) = \frac{1}{2} (2.8333 \dots) = 1.41666 \dots$

Step 3. Apply $g$ to $x_{2} = 1.41666 \dots$ : $x_{3} = \frac{1}{2} (1.41666 \dots + 2/1.41666 \dots) = \frac{1}{2} (1.41666 \dots + 1.41176 \dots) = 1.41421 \dots$

Step 4. Read off the accuracy. After three steps $x_{3} = 1.41421 \dots$ already matches the true square root of $2$ to five decimal places. Compare the bisection unit, where reaching five correct digits took roughly seventeen steps.

What this tells us: the same problem that bisection solved one bit at a time is solved here in three steps, because this particular $g$ is flat at the fixed point. The number of correct digits roughly doubles each step. Choosing the right rearrangement $g$ is what buys the speed, and the rest of the unit explains exactly why this $g$ races while a careless one would crawl or fail.
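The four steps above can be replayed in a few lines of Python. This is a minimal sketch, not a library routine: the names `heron` and `iterate` are chosen here for illustration, and `math.sqrt(2)` is used only to display the error.

```python
import math

def heron(x):
    """One step of the rearrangement g(x) = (x + 2/x) / 2 from the worked example."""
    return 0.5 * (x + 2.0 / x)

def iterate(g, x0, steps):
    """Return [x_0, x_1, ..., x_steps] where x_{n+1} = g(x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(g(xs[-1]))
    return xs

xs = iterate(heron, 1.0, 4)
for n, x in enumerate(xs):
    # The error column shows the number of correct digits roughly doubling.
    print(f"n={n}  x_n={x:.10f}  error={abs(x - math.sqrt(2)):.2e}")
```

Swapping `heron` for another rearrangement of the same equation is the quickest way to see how strongly the choice of $g$ affects the behaviour.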

Formal definition Intermediate+

Let $g : [a, b] \to \mathbb{R}$ . A point $x^{*} \in [a, b]$ is a fixed point of $g$ if $g (x^{*}) = x^{*}$ . Given the root-finding problem $f (x) = 0$ 43.02.01, a fixed-point reformulation is any continuous $g$ for which $f (x) = 0 ⟺ x = g (x)$ on the interval of interest; the canonical choice is $g (x) = x - λ (x) f (x)$ , which works for any weight $λ$ with $λ (x) \neq 0$ on that interval. The simple iteration (Picard iteration) generated by $g$ from a starting point $x_{0} \in [a, b]$ is the sequence $$ x_{n+1} = g(x_n), \qquad n = 0, 1, 2, \dots $$ Write $e_{n} = x_{n} - x^{*}$ for the error at step $n$ . The symbols $x^{*}$ (fixed point), $e_{n}$ (error), $L$ (Lipschitz / contraction constant), and $g^{(p)}$ ( $p$ -th derivative of $g$ ) are recorded in _meta/NOTATION.md.

The map $g$ is a contraction on $[a, b]$ with contraction constant $L$ if $g$ maps $[a, b]$ into itself and there is $L \in [0, 1)$ with $$ |g(x) - g(y)| \le L\,|x - y| \qquad \text{for all } x, y \in [a, b]. $$ This is the metric-space contraction condition of 02.01.05 specialised to a closed real interval (which, as a closed subset of the complete space $\mathbb{R}$ , is itself complete). When $g$ is differentiable, the mean value theorem 02.05.02 supplies the constant: $∣ g (x) - g (y) ∣ = ∣ g^{'} (ξ) ∣ ∣ x - y ∣$ for some $ξ$ between $x$ and $y$ , so $L = \max_{[a, b]} ∣ g^{'} ∣$ serves whenever that maximum is below $1$ .

A method producing iterates $x_{n} \to x^{*}$ converges with order $p \geq 1$ if there is a constant $C \geq 0$ (the asymptotic error constant, with $C < 1$ required when $p = 1$ ) such that $$ \lim_{n \to \infty} \frac{|e_{n+1}|}{|e_n|^{p}} = C \qquad\text{(so that } |e_{n+1}| \approx C\,|e_n|^{p} \text{ for large } n). $$ The case $p = 1$ with $0 < C < 1$ is linear convergence with rate $C$ ; $p = 1$ with $C = 0$ (or any $1 < p < 2$ ) is superlinear; $p = 2$ is quadratic. This is the same order framework introduced for bisection 43.02.01, where $p = 1$ and $C = \frac{1}{2}$ . The local convergence question asks whether the iteration converges for every $x_{0}$ in some neighbourhood of $x^{*}$ ; the contraction condition is a sufficient global form of it.
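The order $p$ in this definition can be estimated from computed errors: if $|e_{n+1}| \approx C |e_n|^{p}$, then $p \approx \log(|e_{n+1}|/|e_n|) / \log(|e_n|/|e_{n-1}|)$. The sketch below applies that estimate to one linearly and one quadratically convergent iteration. The helper names are illustrative, and the fixed point of $\cos$ (about $0.7390851332$) is assumed known so that errors can be measured.

```python
import math

def errors_of(g, x0, x_star, steps):
    """Return [|x_0 - x*|, ..., |x_{steps-1} - x*|] for x_{n+1} = g(x_n)."""
    x, errs = x0, []
    for _ in range(steps):
        errs.append(abs(x - x_star))
        x = g(x)
    return errs

def estimate_order(errs):
    """Estimate p from the last three errors via
    p ~ log(e_{n+1}/e_n) / log(e_n/e_{n-1})."""
    e0, e1, e2 = errs[-3:]
    return math.log(e2 / e1) / math.log(e1 / e0)

# Linear case: g = cos, whose fixed point is assumed known here.
p_linear = estimate_order(errors_of(math.cos, 1.0, 0.7390851332151607, 20))

# Quadratic case: g(x) = (x + 2/x)/2 with fixed point sqrt(2).
# Only five errors are used, so the last one is still well above rounding level.
heron = lambda x: 0.5 * (x + 2.0 / x)
p_quadratic = estimate_order(errors_of(heron, 1.0, math.sqrt(2.0), 5))

print(p_linear, p_quadratic)  # close to 1 and close to 2 respectively
```

The estimate is only meaningful while the errors are well above machine precision, which is why the quadratic run is kept short.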

Counterexamples to common slips Intermediate+

"Every rearrangement $x = g (x)$ of $f (x) = 0$ gives a convergent iteration." The fixed point is the same for all of them, but convergence depends on $∣ g^{'} (x^{*}) ∣$ . For $f (x) = x^{2} - 2$ , the rearrangement $g (x) = 2/ x$ has $g^{'} (x^{*}) = - 2/ (x^{*})^{2} = - 1$ , so it does not converge (it oscillates between two values), while $g (x) = \frac{1}{2} (x + 2/ x)$ has $g^{'} (x^{*}) = 0$ and converges quadratically. Same root, opposite behaviour.
"Convergence is decided by the value of $g$ at the fixed point." It is decided by the derivative there. By definition $g (x^{*}) = x^{*}$ for every reformulation; what separates them is the slope $g^{'} (x^{*})$ , which governs how the error is scaled each step.
"A contraction constant $L < 1$ is necessary for convergence." It is sufficient and global; convergence can be merely local. If $∣ g^{'} (x^{*}) ∣ < 1$ but $∣ g^{'} ∣$ exceeds $1$ elsewhere on $[a, b]$ , the iteration still converges from starts close enough to $x^{*}$ , because continuity of $g^{'}$ makes $g$ a contraction on a small neighbourhood of $x^{*}$ even when it is not one on all of $[a, b]$ .
"Linear convergence with rate $|g'(x^{*})|$ and order $p$ from a vanishing derivative are different phenomena." They are the two ends of one theorem. The asymptotic error relation $e_{n+1} \approx g'(x^{*})\, e_n$ is the order- $1$ case of $e_{n+1} \approx \frac{g^{(p)}(x^{*})}{p!}\, e_n^{p}$ ; the order rises to the index $p$ of the first derivative of $g$ that does not vanish at $x^{*}$ .

Key theorem with proof Intermediate+

The signature result is the contraction-mapping convergence theorem with its two error bounds, because it is the single statement that certifies the iteration converges, identifies its limit as the fixed point, and quantifies the error both before running (a-priori) and during running (a-posteriori).

Theorem (contraction-mapping convergence with a-priori and a-posteriori bounds). Let $g : [a, b] \to [a, b]$ satisfy $∣ g (x) - g (y) ∣ \leq L ∣ x - y ∣$ for all $x, y \in [a, b]$ with $L \in [0, 1)$ . Then $g$ has a unique fixed point $x^{*} \in [a, b]$ ; for every $x_{0} \in [a, b]$ the iterates $x_{n+1} = g(x_n)$ converge to $x^{*}$ ; and the errors obey $$ \underbrace{|x_n - x^{*}| \le \frac{L^{n}}{1 - L}\,|x_1 - x_0|}_{\text{a-priori}}, \qquad \underbrace{|x_n - x^{*}| \le \frac{L}{1 - L}\,|x_n - x_{n-1}|}_{\text{a-posteriori}}. $$

Proof. Existence and uniqueness are the Banach fixed-point statement on the complete space $[a, b]$ 02.01.05; we reproduce the constructive argument because it yields the bounds. Consecutive iterates contract: $∣ x_{n + 1} - x_{n} ∣ = ∣ g (x_{n}) - g (x_{n - 1}) ∣ \leq L ∣ x_{n} - x_{n - 1} ∣$ , so by induction $∣ x_{n + 1} - x_{n} ∣ \leq L^{n} ∣ x_{1} - x_{0} ∣$ . For $m > n$ the triangle inequality and the geometric sum give $$ |x_m - x_n| \le \sum_{k=n}^{m-1} |x_{k+1} - x_k| \le |x_1 - x_0|\sum_{k=n}^{m-1} L^k \le \frac{L^n}{1 - L}\,|x_1 - x_0|. $$ The right side tends to $0$ as $n \to \infty$ , so $(x_{n})$ is a Cauchy sequence; by completeness of $[a, b]$ it converges to a limit $x^{*} \in [a, b]$ . Continuity of $g$ (a contraction is Lipschitz, hence continuous) lets us pass to the limit in $x_{n + 1} = g (x_{n})$ , giving $x^{*} = g (x^{*})$ , so $x^{*}$ is a fixed point. If $y^{*}$ were another, $∣ x^{*} - y^{*} ∣ = ∣ g (x^{*}) - g (y^{*}) ∣ \leq L ∣ x^{*} - y^{*} ∣$ with $L < 1$ forces $∣ x^{*} - y^{*} ∣ = 0$ , so the fixed point is unique.

For the a-priori bound, let $m \to \infty$ in the Cauchy estimate above with $n$ fixed: $x_{m} \to x^{*}$ gives $∣ x_{n} - x^{*} ∣ \leq \frac{L ^{n}}{1 - L} ∣ x_{1} - x_{0} ∣$ . For the a-posteriori bound, start from $∣ x_{n + 1} - x_{n} ∣ \leq L ∣ x_{n} - x_{n - 1} ∣$ and apply the same geometric tail anchored at index $n$ : for $m > n$ , $$ |x_m - x_n| \le \sum_{k=n}^{m-1} L^{\,k - n + 1}\,|x_n - x_{n-1}| \le \frac{L}{1 - L}\,|x_n - x_{n-1}|, $$ and letting $m \to \infty$ yields $∣ x_{n} - x^{*} ∣ \leq \frac{L}{1 - L} ∣ x_{n} - x_{n - 1} ∣$ . $□$
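Both bounds can be seen at work in a short numerical sketch. It assumes the illustrative choice $g = \cos$ on $[0, 1]$, which maps the interval into itself and has $L = \max_{[0,1]} |\sin| = \sin 1$ by the mean value theorem. The function names `fixed_point` and `a_priori_steps` are hypothetical helpers written for this example.

```python
import math

def fixed_point(g, x0, L, tol=1e-10, max_iter=1000):
    """Iterate x_{n+1} = g(x_n) until the a-posteriori bound
    L/(1-L) * |x_n - x_{n-1}| is at most tol.
    Returns (x_n, n, certified_bound). Assumes 0 <= L < 1."""
    x_prev = x0
    for n in range(1, max_iter + 1):
        x = g(x_prev)
        bound = L / (1.0 - L) * abs(x - x_prev)  # a-posteriori error bound
        if bound <= tol:
            return x, n, bound
        x_prev = x
    raise RuntimeError("a-posteriori bound did not reach tol")

def a_priori_steps(L, first_step, tol):
    """Smallest n with L**n / (1 - L) * first_step <= tol.
    Assumes 0 < L < 1 and first_step > 0."""
    return math.ceil(math.log(tol * (1.0 - L) / first_step) / math.log(L))

L = math.sin(1.0)  # contraction constant of cos on [0, 1]
x, n_used, bound = fixed_point(math.cos, 1.0, L)
n_predicted = a_priori_steps(L, abs(math.cos(1.0) - 1.0), 1e-10)

print(f"stopped after {n_used} steps, a-priori prediction {n_predicted}")
```

The a-priori count is computed before any iteration and is typically pessimistic, because it uses the worst-case constant $L$ over the whole interval rather than the slope near the fixed point. The a-posteriori test stops as soon as the observed step certifies the tolerance.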

Bridge. This theorem is the foundational reason fixed-point iteration is the organising abstraction of the whole nonlinear-equations chapter: it certifies convergence from a single inequality on $g$ , and the a-priori bound shows the error falls geometrically as $L^{n}$ , which is exactly linear convergence with rate $L$ , the order- $1$ case of the framework set up for bisection 43.02.01. The a-posteriori bound is the practical companion — it turns the observable step $∣ x_{n} - x_{n - 1} ∣$ into a certified error, so it is the stopping criterion the chapter's methods actually use, and it builds toward the local-convergence refinement where $L$ is replaced by $∣ g^{'} (x^{*}) ∣$ . The contraction picture generalises directly to several variables, where $∣ g^{'} (x^{*}) ∣ < 1$ becomes the Jacobian spectral-radius condition $ρ (g^{'} (x^{*})) < 1$ , and it is dual to the bracketing picture of bisection: there convergence comes from halving an interval, here from a derivative below $1$ in size. Putting these together, the same convergence appears again in 43.02.03, where Newton's iteration is the fixed-point map with $g^{'} (x^{*}) = 0$ and the geometric rate collapses to quadratic order — the central insight that links this elementary contraction estimate to the fastest method of the chapter.

Exercises Intermediate+

Exercise 3 (medium, numeric).

The equation $x^{3} + x - 1 = 0$ has a root in $[0, 1]$ . The rearrangement $g (x) = 1/ (1 + x^{2})$ has this root as a fixed point. Compute $g^{'}$ at the root $x^{*} \approx 0.6823$ and state the asymptotic linear rate of convergence.

Hint

Differentiate $g (x) = (1 + x^{2})^{- 1}$ to get $g^{'} (x) = - 2 x / (1 + x^{2})^{2}$ , then evaluate at $x^{*} \approx 0.6823$ . The rate is $∣ g^{'} (x^{*}) ∣$ .

Answer

Rate $\approx 0.635$ . With $g^{'} (x) = - 2 x / (1 + x^{2})^{2}$ and $x^{*} \approx 0.6823$ : $1 + (x^{*})^{2} \approx 1.4655$ , so $(1 + (x^{*})^{2})^{2} \approx 2.1477$ , and $g^{'} (x^{*}) \approx - 2 (0.6823) /2.1477 \approx - 0.6353$ . Hence $∣ g^{'} (x^{*}) ∣ \approx 0.635 < 1$ , and the iteration converges linearly with asymptotic rate about $0.64$ : each step multiplies the error by roughly $0.64$ in the limit. (Any answer consistent with a correct $g^{'} (x^{*})$ evaluation earns full credit; the key point is $∣ g^{'} (x^{*}) ∣ < 1$ giving linear convergence.)
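The value of $|g'(x^{*})|$ can be cross-checked numerically. The sketch below first runs the iteration long enough to obtain $x^{*}$ to machine precision, then compares the derivative formula with the observed ratio of successive errors. The start $x_0 = 0.5$ and the step counts are arbitrary choices for illustration.

```python
def g(x):
    """Rearrangement of x**3 + x - 1 = 0 from the exercise."""
    return 1.0 / (1.0 + x * x)

def dg(x):
    """Derivative g'(x) = -2x / (1 + x**2)**2."""
    return -2.0 * x / (1.0 + x * x) ** 2

# Reference fixed point: iterate until the linear contraction has
# driven the error far below machine precision.
x_star = 0.5
for _ in range(200):
    x_star = g(x_star)

rate = abs(dg(x_star))

# Observed error ratio |e_{n+1}| / |e_n| after 23 to 24 steps.
x, errs = 0.5, []
for _ in range(25):
    errs.append(abs(x - x_star))
    x = g(x)
observed = errs[-1] / errs[-2]

print(f"x* = {x_star:.7f}, |g'(x*)| = {rate:.4f}, observed ratio = {observed:.4f}")
```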

Exercise 4 (medium, symbolic).

Show that the rearrangement $g (x) = 2/ x$ of $x^{2} - 2 = 0$ does not converge by simple iteration, by computing $g^{'} (\sqrt{2})$ and describing the orbit from a start $x_{0} \neq \sqrt{2}$ .

Hint

Compute $g^{'} (x) = - 2/ x^{2}$ at $x^{*} = \sqrt{2}$ . Then note that $g (g (x)) = x$ for all $x \neq 0$ , so the iterates cycle.

Answer

At $x^{*} = \sqrt{2}$ , $g^{'} (x) = - 2/ x^{2}$ gives $g^{'} (\sqrt{2}) = - 2/2 = - 1$ , so $∣ g^{'} (x^{*}) ∣ = 1$ , the boundary case in which the local convergence theorem does not apply. More strongly, $g (g (x)) = 2/ (2/ x) = x$ for every $x \neq 0$ , so $g$ is an involution: the iterates satisfy $x_{2} = x_{0}$ , $x_{3} = x_{1}$ , and the orbit is the $2$ -cycle $\{x_{0}, 2/ x_{0}\}$ , which never approaches $\sqrt{2}$ unless $x_{0} = \sqrt{2}$ exactly. The reformulation has the right fixed point but a slope of size $1$ that forbids convergence. Full credit for $g^{'} (\sqrt{2}) = - 1$ and the periodic-orbit observation.
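The periodic orbit is easy to observe directly. In this small sketch the start $x_0 = 1$ is chosen because $2/1$ and $2/2$ are exact in floating point, so the cycle is reproduced without rounding noise.

```python
def g(x):
    """The non-convergent rearrangement g(x) = 2/x of x**2 - 2 = 0."""
    return 2.0 / x

xs = [1.0]
for _ in range(6):
    xs.append(g(xs[-1]))

print(xs)  # [1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0]
```

The iterates alternate between $x_0$ and $2/x_0$ and never move toward the root of $x^{2} - 2 = 0$.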

Exercise 6 (hard, symbolic).

Prove that if $g$ is $C^{1}$ near a fixed point $x^{*}$ with $∣ g^{'} (x^{*}) ∣ = m < 1$ , then for every $ρ$ with $m < ρ < 1$ there is a closed interval $I = [x^{*} - δ, x^{*} + δ]$ on which $g$ is a contraction with constant $ρ$ , so the iteration converges locally.

Hint

Continuity of $g^{'}$ gives $∣ g^{'} (x) ∣ \leq ρ$ on a small interval around $x^{*}$ . Use the mean value theorem to turn the derivative bound into a Lipschitz bound, and check $g$ maps $I$ into $I$ .

Answer

Since $g^{'}$ is continuous at $x^{*}$ and $∣ g^{'} (x^{*}) ∣ = m < ρ$ , there is $δ > 0$ with $∣ g^{'} (x) ∣ \leq ρ$ for all $x \in I = [x^{*} - δ, x^{*} + δ]$ . For $x \in I$ , the mean value theorem 02.05.02 gives $∣ g (x) - g (x^{*}) ∣ = ∣ g^{'} (ξ) ∣ ∣ x - x^{*} ∣ \leq ρ ∣ x - x^{*} ∣ \leq ρ δ < δ$ , and since $g (x^{*}) = x^{*}$ this says $∣ g (x) - x^{*} ∣ < δ$ , so $g (I) \subseteq I$ : $g$ maps $I$ into itself. For any $x, y \in I$ the mean value theorem again gives $∣ g (x) - g (y) ∣ = ∣ g^{'} (ξ) ∣ ∣ x - y ∣ \leq ρ ∣ x - y ∣$ , so $g$ is a contraction on $I$ with constant $ρ < 1$ . The contraction-mapping theorem then applies on $I$ , giving convergence of the iteration to $x^{*}$ from every start in $I$ . Full credit for using continuity of $g^{'}$ to get the derivative bound, the self-map check $g (I) \subseteq I$ , and the mean-value Lipschitz estimate.

Exercise 7 (hard, symbolic).

Suppose $g$ is $C^{p}$ near its fixed point $x^{*}$ with $g^{'} (x^{*}) = \dots = g^{(p - 1)} (x^{*}) = 0$ and $g^{(p)} (x^{*}) \neq 0$ , and the iteration converges. Use Taylor's theorem to show the order of convergence is exactly $p$ and identify the asymptotic error constant.

Hint

Expand $g (x_{n})$ about $x^{*}$ with Lagrange remainder. All terms below order $p$ vanish, leaving $e_{n + 1} = \frac{g ^{(p)} ( ξ _{n} )}{p !} e_{n}^{p}$ for some $ξ_{n}$ between $x_{n}$ and $x^{*}$ .

Answer

By Taylor's theorem with Lagrange remainder 02.05.02 about $x^{*}$ , for $x_{n}$ near $x^{*}$ , $$ x_{n+1} = g(x_n) = g(x^{*}) + \sum_{j=1}^{p-1}\frac{g^{(j)}(x^{*})}{j!}(x_n - x^{*})^j + \frac{g^{(p)}(\xi_n)}{p!}(x_n - x^{*})^p, $$ with $ξ_{n}$ between $x_{n}$ and $x^{*}$ . Now $g (x^{*}) = x^{*}$ and the hypothesis kills every derivative term from $j = 1$ to $p - 1$ , leaving $$ e_{n+1} = x_{n+1} - x^{*} = \frac{g^{(p)}(\xi_n)}{p!}\,e_n^{\,p}. $$ Dividing, $∣ e_{n + 1} ∣/∣ e_{n} ∣^{p} = ∣ g^{(p)} (ξ_{n}) ∣/ p!$ . Since the iteration converges, $x_{n} \to x^{*}$ forces $ξ_{n} \to x^{*}$ , so by continuity of $g^{(p)}$ , $$ \lim_{n\to\infty}\frac{|e_{n+1}|}{|e_n|^{p}} = \frac{|g^{(p)}(x^{*})|}{p!} =: C \ne 0. $$ This is convergence of order exactly $p$ with asymptotic error constant $C = |g^{(p)}(x^{*})|/p!$ . The order cannot exceed $p$ because $C \ne 0$ , and cannot be less because all lower-order terms vanish. Full credit for the Taylor expansion, the cancellation of the lower derivatives, and the limit identifying $C$ .

Advanced results Master

Fixed-point iteration is the master template under which every iterative root-finder of the chapter is a special case, and the results below place the scalar order theory inside its sharper and higher-dimensional forms: the exact order- $p$ asymptotics, the second-order acceleration that converts linear into quadratic convergence, the Banach-space generalisation that subsumes the multivariable case, and the global pathologies that the local theory deliberately ignores.

Theorem 1 (order of convergence from the leading derivative). Let $g$ be $C^{p}$ in a neighbourhood of a fixed point $x^{*}$ with $g^{'} (x^{*}) = \dots = g^{(p - 1)} (x^{*}) = 0$ and $g^{(p)} (x^{*}) \neq 0$ , and suppose the iteration converges to $x^{*}$ . Then it converges with order exactly $p$ and asymptotic error constant $C = ∣ g^{(p)} (x^{*}) ∣/ p!$ , since the Lagrange-remainder Taylor expansion gives $e_{n + 1} = \frac{g ^{(p)} ( ξ _{n} )}{p !} e_{n}^{p}$ with $ξ_{n} \to x^{*}$ . The order is therefore the index of the first non-vanishing derivative of $g$ at the fixed point: $∣ g^{'} (x^{*}) ∣ \in (0, 1)$ gives linear order $p = 1$ with rate $∣ g^{'} (x^{*}) ∣$ ; $g^{'} (x^{*}) = 0 \neq g^{''} (x^{*})$ gives quadratic order $p = 2$ . Newton's method is engineered precisely to land in the second case ^{[Süli-Mayers §1.4]}.

Theorem 2 (Aitken $Δ^{2}$ acceleration and Steffensen's method). For a linearly convergent sequence with $e_{n + 1} \approx λ e_{n}$ , $∣ λ ∣ < 1$ , the Aitken extrapolate $$ \hat{x}_n = x_n - \frac{(x_{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n} $$ satisfies $(\hat{x}_{n} - x^{*}) / (x_{n} - x^{*}) \to 0$ : it converges faster than the original sequence by cancelling the leading geometric error term. Folding the extrapolation back into the iteration (applying $Δ^{2}$ to every three fixed-point steps) gives Steffensen's method, which attains quadratic convergence for a simple fixed point without evaluating any derivative of $g$ , by using three function values to estimate the local rate $g^{'} (x^{*})$ that Newton would read directly ^{[Atkinson §2.6]}.
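A minimal sketch of Steffensen's method follows, assuming the standard form in which each outer step takes two applications of $g$ and then applies the $Δ^{2}$ formula to the triple $x, g(x), g(g(x))$. The function name, tolerance, and zero-denominator guard are choices made for this illustration rather than part of the theorem.

```python
import math

def steffensen(g, x0, tol=1e-12, max_iter=50):
    """Steffensen's method: Aitken's delta-squared applied to the
    triple (x, g(x), g(g(x))) at every outer step.
    Returns (approximate fixed point, outer steps used)."""
    x = x0
    for n in range(1, max_iter + 1):
        x1 = g(x)
        x2 = g(x1)
        denom = x2 - 2.0 * x1 + x
        if denom == 0.0:
            # Second difference vanished in floating point:
            # the iterates have stagnated, so return the latest one.
            return x2, n
        x_new = x - (x1 - x) ** 2 / denom
        if abs(x_new - x) <= tol:
            return x_new, n
        x = x_new
    raise RuntimeError("Steffensen iteration did not converge")

root, steps = steffensen(math.cos, 1.0)
print(root, steps)
```

For $g = \cos$ the plain iteration contracts by roughly $0.67$ per step, so it needs dozens of steps for the same tolerance, whereas the accelerated loop needs only a handful of outer steps (each costing two evaluations of $g$).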

Theorem 3 (contraction mapping and local convergence in Banach space). Let $X$ be a Banach space, $D \subseteq X$ closed, and $G : D \to D$ a contraction with constant $L < 1$ in the norm of $X$ . Then $G$ has a unique fixed point $x^{*} \in D$ , the iterates converge from any start in $D$ , and the a-priori and a-posteriori bounds hold verbatim with $∣ \cdot ∣$ replaced by $∥ \cdot ∥$ . The local form: if $G$ is Fréchet-differentiable at a fixed point $x^{*}$ with spectral radius $ρ (G^{'} (x^{*})) < 1$ , then for any $ε > 0$ there is an operator norm in which $∥ G^{'} (x^{*}) ∥ \leq ρ (G^{'} (x^{*})) + ε < 1$ , so $G$ is a contraction near $x^{*}$ and the iteration converges locally with asymptotic rate $ρ (G^{'} (x^{*}))$ . For $G : \mathbb{R}^{d} \to \mathbb{R}^{d}$ this is the multivariable fixed-point theorem, the Jacobian spectral radius replacing $∣ g^{'} (x^{*}) ∣$ , which is the setting in which Newton-on-systems is analysed ^{[Ortega-Rheinboldt §10.1, §12.1]}.
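A two-dimensional toy case makes the multivariable statement concrete. The map $G(x, y) = (\tfrac12 \cos y, \tfrac12 \sin x)$ is an example invented for this sketch, not one from the cited texts. It is a contraction in the max-norm with $L = \tfrac12$, since $\cos$ and $\sin$ are $1$-Lipschitz. Its Jacobian has zero diagonal, so the spectral radius is $\tfrac12 \sqrt{|\sin y \cos x|}$.

```python
import math

def G(p):
    """Illustrative contraction on R^2: G(x, y) = (cos(y)/2, sin(x)/2)."""
    x, y = p
    return (0.5 * math.cos(y), 0.5 * math.sin(x))

def dist(p, q):
    """Max-norm distance between two points of R^2."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

L = 0.5  # contraction constant of G in the max-norm
p = (0.0, 0.0)
for n in range(1, 201):
    p_next = G(p)
    bound = L / (1.0 - L) * dist(p_next, p)  # a-posteriori bound, norm version
    p = p_next
    if bound <= 1e-12:
        break

# Spectral radius of the Jacobian [[0, -sin(y)/2], [cos(x)/2, 0]] at the limit.
x, y = p
rho = 0.5 * math.sqrt(abs(math.sin(y) * math.cos(x)))
print(p, n, rho)
```

The spectral radius at the fixed point is smaller than the global constant $L$, which matches the theorem's distinction between the certified global rate and the asymptotic local rate.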

Theorem 4 (global structure: basins, periodic orbits, and the boundary case). The local theory is silent about starts far from $x^{*}$ . Globally a fixed-point iteration is a discrete dynamical system: $g$ may possess several fixed points with disjoint basins of attraction, as well as periodic orbits (the involution $g (x) = 2/ x$ for $x^{2} - 2 = 0$ turns every nonzero start other than $\pm \sqrt{2}$ into a neutral $2$ -cycle, and other maps have attracting cycles), and at the boundary $∣ g^{'} (x^{*}) ∣ = 1$ the linearisation is inconclusive: convergence then depends on the higher-order terms and may be one-sided, algebraically slow, or absent. For analytic $g$ on $\mathbb{C}$ the basin boundaries are Julia sets, and Newton's method for a polynomial with three or more roots has a basin structure of fractal complexity, the historical origin of complex dynamics ^{[Süli-Mayers §1.4]}.

Synthesis. Fixed-point iteration is the foundational reason the order-of-convergence framework is a single theory rather than a catalogue: every iterative root-finder writes the equation as $x = g (x)$ and runs $x_{n + 1} = g (x_{n})$ , so its convergence is governed by one quantity, the derivative of $g$ at the fixed point, and its order is exactly the index of the first non-vanishing derivative there. The central insight is the dichotomy this produces — when $∣ g^{'} (x^{*}) ∣ \in (0, 1)$ the error contracts geometrically and convergence is linear with rate $∣ g^{'} (x^{*}) ∣$ , the bisection-level regime 43.02.01; when $g^{'} (x^{*}) = 0$ the geometric term vanishes and the next surviving derivative sets a higher order, so making $g^{'} (x^{*}) = 0$ is exactly how one buys quadratic speed, which is what Newton's method does 43.02.03. This is dual to the bracketing guarantee of bisection: there safety comes from interval halving at the fixed rate $\frac{1}{2}$ ; here speed comes from a small derivative and the rate is problem-dependent and can be driven to zero, but the unconditional guarantee is lost — convergence is only local. Putting these together, the contraction-mapping theorem certifies that the iteration converges and the order theorem says how fast, and the same machinery appears again in 43.02.03, where Newton is the fixed-point map with a vanishing first derivative, and generalises through the Banach-space form to the multivariable Jacobian condition $ρ (g^{'} (x^{*})) < 1$ — the bridge from this scalar estimate to nonlinear systems.

Full proof set Master

Proposition 1 (contraction-mapping theorem with a-priori and a-posteriori bounds). Under the hypotheses of the Key theorem, $g$ has a unique fixed point $x^{*}$ , the iterates converge to it from any start, and $|x_n - x^{*}| \le \frac{L^n}{1-L}|x_1 - x_0|$ and $|x_n - x^{*}| \le \frac{L}{1-L}|x_n - x_{n-1}|$ .

Proof. As established in the Key theorem: $∣ x_{n + 1} - x_{n} ∣ \leq L^{n} ∣ x_{1} - x_{0} ∣$ by induction on the contraction inequality; the geometric tail bounds $∣ x_{m} - x_{n} ∣ \leq \frac{L ^{n}}{1 - L} ∣ x_{1} - x_{0} ∣$ for $m > n$ , forcing the Cauchy property and, by completeness of the closed interval $[a, b] \subseteq \mathbb{R}$ 02.01.05, a limit $x^{*} \in [a, b]$ ; continuity of $g$ gives $x^{*} = g (x^{*})$ ; the contraction inequality with $L < 1$ forces uniqueness. Letting $m \to \infty$ in the tail with $n$ fixed gives the a-priori bound; anchoring the tail at the latest step $∣ x_{n + 1} - x_{n} ∣ \leq L ∣ x_{n} - x_{n - 1} ∣$ and summing $\sum_{k \geq 0} L^{k + 1} ∣ x_{n} - x_{n - 1} ∣ = \frac{L}{1 - L} ∣ x_{n} - x_{n - 1} ∣$ gives the a-posteriori bound. $□$

Proposition 2 (local convergence from $∣ g^{'} (x^{*}) ∣ < 1$ ). If $g$ is $C^{1}$ near a fixed point $x^{*}$ with $|g'(x^{*})| < 1$ , then there is a closed interval $I$ about $x^{*}$ on which $g$ is a contraction, so the iteration converges to $x^{*}$ from every start in $I$ , with asymptotic linear rate $|g'(x^{*})|$ when $g'(x^{*}) \ne 0$ .

Proof. Choose $ρ$ with $∣ g^{'} (x^{*}) ∣ < ρ < 1$ . Continuity of $g^{'}$ gives $δ > 0$ with $∣ g^{'} (x) ∣ \leq ρ$ on $I = [x^{*} - δ, x^{*} + δ]$ . For $x \in I$ , the mean value theorem 02.05.02 gives $∣ g (x) - x^{*} ∣ = ∣ g (x) - g (x^{*}) ∣ = ∣ g^{'} (ξ) ∣∣ x - x^{*} ∣ \leq ρ ∣ x - x^{*} ∣ \leq ρ δ < δ$ , so $g (I) \subseteq I$ ; and for $x, y \in I$ , $∣ g (x) - g (y) ∣ = ∣ g^{'} (ξ) ∣∣ x - y ∣ \leq ρ ∣ x - y ∣$ , so $g$ is a contraction on $I$ with constant $ρ$ . Proposition 1 applies on $I$ , giving convergence. For the rate, the mean value theorem gives $e_{n + 1} = g (x_{n}) - g (x^{*}) = g^{'} (ξ_{n}) e_{n}$ with $ξ_{n}$ between $x_{n}$ and $x^{*}$ ; as $x_{n} \to x^{*}$ , $ξ_{n} \to x^{*}$ , so $e_{n + 1} / e_{n} = g^{'} (ξ_{n}) \to g^{'} (x^{*})$ , hence $∣ e_{n + 1} ∣/∣ e_{n} ∣ \to ∣ g^{'} (x^{*}) ∣$ , linear convergence with that rate. $□$

Proposition 3 (order equals the index of the first non-vanishing derivative). If $g$ is $C^{p}$ near $x^{*}$ with $g'(x^{*}) = \cdots = g^{(p-1)}(x^{*}) = 0$ , $g^{(p)}(x^{*}) \ne 0$ , and the iteration converges to $x^{*}$ , then it has order exactly $p$ with $|e_{n+1}|/|e_n|^p \to |g^{(p)}(x^{*})|/p!$ .

Proof. Taylor's theorem with Lagrange remainder 02.05.02 about $x^{*}$ gives $x_{n + 1} = g (x_{n}) = g (x^{*}) + \sum_{j = 1}^{p - 1} \frac{g ^{(j)} ( x ^{*} )}{j !} e_{n}^{j} + \frac{g ^{(p)} ( ξ _{n} )}{p !} e_{n}^{p}$ with $ξ_{n}$ between $x_{n}$ and $x^{*}$ . The hypothesis annihilates the sum and $g (x^{*}) = x^{*}$ cancels the constant, leaving $e_{n + 1} = \frac{g ^{(p)} ( ξ _{n} )}{p !} e_{n}^{p}$ . Thus $∣ e_{n + 1} ∣/∣ e_{n} ∣^{p} = ∣ g^{(p)} (ξ_{n}) ∣/ p!$ ; convergence forces $ξ_{n} \to x^{*}$ , and continuity of $g^{(p)}$ gives the limit $∣ g^{(p)} (x^{*}) ∣/ p! \neq 0$ , which is order exactly $p$ . $□$

Proposition 4 (Aitken acceleration cancels the leading linear error). Let $x_n \to x^{*}$ with $e_{n+1} = (\lambda + \eta_n)e_n$ , $|\lambda| < 1$ , $\eta_n \to 0$ . Then the Aitken iterate $\hat{x}_n = x_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n}$ , where $\Delta x_n = x_{n+1} - x_n$ and $\Delta^2 x_n = x_{n+2} - 2x_{n+1} + x_n$ , satisfies $(\hat{x}_n - x^{*})/(x_n - x^{*}) \to 0$ .

Proof. Write $e_{n} = x_{n} - x^{*}$ . To leading order $e_{n + 1} = λ e_{n} (1 + o (1))$ , so $Δ x_{n} = e_{n + 1} - e_{n} = (λ - 1) e_{n} (1 + o (1))$ and $Δ^{2} x_{n} = e_{n + 2} - 2 e_{n + 1} + e_{n} = (λ - 1)^{2} e_{n} (1 + o (1))$ . Then $$ \hat{x}_n - x^{*} = e_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n} = e_n - \frac{(\lambda-1)^2 e_n^2(1+o(1))}{(\lambda-1)^2 e_n(1+o(1))} = e_n - e_n(1 + o(1)) = e_n\, o(1). $$ Hence $(\hat{x}_{n} - x^{*}) / e_{n} \to 0$ : the extrapolate converges faster than $x_{n}$ because the geometric error $λ e_{n}$ is cancelled by the $Δ^{2}$ correction, leaving only the higher-order remainder. Iterating this cancellation within the fixed-point loop is Steffensen's method, whose quadratic order follows from applying Proposition 3 to the Steffensen map $x \mapsto x - \frac{(g (x) - x)^{2}}{g (g (x)) - 2 g (x) + x}$ . $□$
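The cancellation in this proof can be observed on a concrete linearly convergent sequence. The sketch below applies the $Δ^{2}$ formula to the iterates of $g = \cos$ from $x_0 = 1$ and prints the ratio $|\hat{x}_n - x^{*}| / |x_n - x^{*}|$, which should shrink toward zero. The helper name `aitken` and the reference value of the fixed point are assumptions made for the illustration.

```python
import math

def aitken(xs):
    """Apply Aitken's delta-squared formula to each consecutive triple of xs."""
    out = []
    for x0, x1, x2 in zip(xs, xs[1:], xs[2:]):
        d2 = x2 - 2.0 * x1 + x0
        # Fall back to the newest iterate if the second difference vanishes.
        out.append(x0 - (x1 - x0) ** 2 / d2 if d2 != 0.0 else x2)
    return out

X_STAR = 0.7390851332151607  # fixed point of cos, assumed known for measuring errors

xs = [1.0]
for _ in range(11):
    xs.append(math.cos(xs[-1]))

accelerated = aitken(xs)
ratios = [abs(a - X_STAR) / abs(x - X_STAR) for a, x in zip(accelerated, xs)]
for n, r in enumerate(ratios):
    print(f"n={n}  |xhat_n - x*| / |x_n - x*| = {r:.5f}")
```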

Connections Master

The bisection method 43.02.01 introduced the order-of-convergence definition $∣ e_{n + 1} ∣ \leq C ∣ e_{n} ∣^{p}$ and realised it at $p = 1$ , $C = \frac{1}{2}$ through interval halving. This unit shows that same linear case arising instead from a derivative: a fixed-point iteration with $∣ g^{'} (x^{*}) ∣ \in (0, 1)$ converges linearly with rate $∣ g^{'} (x^{*}) ∣$ . The contrast is the heart of the comparison — bisection's rate is a fixed $\frac{1}{2}$ guaranteed unconditionally, while the fixed-point rate is problem-dependent, can exceed $1$ and diverge, and is only locally certified, so the two methods trade guarantee against tunable speed.
Newton's method 43.02.03 is exactly the fixed-point iteration $x_{n + 1} = g (x_{n})$ for the map $g (x) = x - f (x) / f^{'} (x)$ , whose derivative at a simple root is $g^{'} (x^{*}) = f (x^{*}) f^{''} (x^{*}) / f^{'} (x^{*})^{2} = 0$ . By the order theorem of this unit, the vanishing of $g^{'} (x^{*})$ raises the convergence order to $p = 2$ , quadratic, with asymptotic constant $∣ g^{''} (x^{*}) ∣/2! = ∣ f^{''} (x^{*}) /2 f^{'} (x^{*}) ∣$ . Newton is therefore not a separate theory but the engineered instance of this unit's framework in which the first derivative of the iteration map is forced to zero; the secant method is the derivative-free quasi-fixed-point variant of intermediate order $φ = (1 + \sqrt{5}) /2$ .
The Banach fixed-point theorem 02.01.05 is the abstract existence-and-uniqueness statement on which the convergence proof here rests; this unit is the numerical layer that the metric-space theorem does not develop — the asymptotic rate $∣ g^{'} (x^{*}) ∣$ and the order $p$ set by the leading derivative of $g$ . The same contraction principle drives Picard-Lindelöf existence for ODEs and the implicit/inverse function theorems, so the convergence-rate analysis here is the discrete-iteration shadow of those continuous existence results.
Conditioning, floating-point arithmetic, and backward error 43.01.01 floor the attainable accuracy of fixed-point iteration just as they do bisection: once $∣ x_{n} - x_{n - 1} ∣$ falls to the level of rounding error in evaluating $g$ , the a-posteriori bound can no longer certify further accuracy, and the iterates stagnate at a residual of order $∣ x^{*} ∣ ε_{mach}$ scaled by the conditioning of the fixed point $1/∣1 - g^{'} (x^{*}) ∣$ . This unit's stopping criteria forward-reference that chapter for the precision limit, and the factor $1/ (1 - L)$ in the a-posteriori bound plays the role of the same conditioning multiplier, with the worst-case constant $L$ in place of $g^{'} (x^{*})$ .
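The Newton connection above can be checked numerically: for the Newton map, the ratio $|e_{n+1}| / |e_n|^{2}$ should approach $|f''(x^{*}) / (2 f'(x^{*}))|$. The sketch uses $f(x) = x^{3} + x - 1$ as an illustrative test function and obtains a reference root by running Newton to stagnation. The start $x_0 = 1$ and the iteration counts are arbitrary choices.

```python
def f(x):
    return x**3 + x - 1.0

def df(x):
    return 3.0 * x**2 + 1.0

def d2f(x):
    return 6.0 * x

def newton_map(x):
    """Newton's method written as the fixed-point map g(x) = x - f(x)/f'(x)."""
    return x - f(x) / df(x)

# Reference root: iterate far past convergence.
x_star = 1.0
for _ in range(20):
    x_star = newton_map(x_star)

# Predicted asymptotic error constant |g''(x*)|/2 = |f''(x*) / (2 f'(x*))|.
C = abs(d2f(x_star) / (2.0 * df(x_star)))

# Errors e_0..e_4 from x_0 = 1; later errors would sit at rounding level.
x, errs = 1.0, []
for _ in range(5):
    errs.append(abs(x - x_star))
    x = newton_map(x)

ratios = [errs[n + 1] / errs[n] ** 2 for n in range(4)]
print("predicted C =", C)
print("observed |e_{n+1}| / |e_n|^2 =", ratios)
```

Only the first few ratios are informative: once the error reaches machine precision, the quotient is dominated by rounding, which is the accuracy floor described in the last connection.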

Historical & philosophical context Master

The method of successive substitution is among the oldest in analysis. The systematic study of iteration $x_{n + 1} = g (x_{n})$ as a tool for solving equations runs through the nineteenth century alongside the development of rigorous analysis; Émile Picard's method of successive approximations (1890), built to prove existence for differential equations, is the same contraction argument applied to an integral operator, and it isolated the contraction estimate as the engine of convergence. Stefan Banach's 1922 thesis ^{[Banach 1922]} abstracted the argument into the fixed-point theorem for complete metric spaces, giving the existence, uniqueness, and the geometric a-priori bound in the generality that subsumes both the scalar iteration of this unit and the Banach-space form used for nonlinear systems. The numerical reading — that $∣ g^{'} (x^{*}) ∣$ is the asymptotic linear rate and that vanishing derivatives raise the order — is the contribution of the twentieth-century numerical-analysis literature.

Süli and Mayers present the order-of-convergence theorem as the organising result of their nonlinear-equations chapter, with Newton's method appearing as the realisation of order $p = 2$ ^{[Süli-Mayers Ch. 1]}. The multivariable theory, with the Jacobian spectral-radius condition $ρ (g^{'} (x^{*})) < 1$ and the $Q$ -order / $R$ -order refinement of the scalar order index, was given its definitive treatment by Ortega and Rheinboldt in 1970 ^{[Ortega-Rheinboldt §12.1]}. Alexander Aitken's 1926 $Δ^{2}$ process ^{[Aitken 1926]} supplied the acceleration that, folded into the iteration by Steffensen, recovers quadratic order from a linearly convergent scheme without derivatives.

Bibliography Master

@book{sulimayers2003,
  author    = {S\"{u}li, Endre and Mayers, David F.},
  title     = {An Introduction to Numerical Analysis},
  publisher = {Cambridge University Press},
  year      = {2003}
}

@book{atkinson1989,
  author    = {Atkinson, Kendall E.},
  title     = {An Introduction to Numerical Analysis},
  edition   = {2},
  publisher = {John Wiley \& Sons},
  year      = {1989}
}

@book{ortegarheinboldt2000,
  author    = {Ortega, James M. and Rheinboldt, Werner C.},
  title     = {Iterative Solution of Nonlinear Equations in Several Variables},
  series    = {Classics in Applied Mathematics},
  volume    = {30},
  publisher = {SIAM},
  year      = {2000},
  note      = {Reprint of the 1970 Academic Press edition}
}

@book{kelley1995,
  author    = {Kelley, C. T.},
  title     = {Iterative Methods for Linear and Nonlinear Equations},
  series    = {Frontiers in Applied Mathematics},
  volume    = {16},
  publisher = {SIAM},
  year      = {1995}
}

@article{banach1922,
  author  = {Banach, Stefan},
  title   = {Sur les op\'{e}rations dans les ensembles abstraits et leur application aux \'{e}quations int\'{e}grales},
  journal = {Fundamenta Mathematicae},
  volume  = {3},
  year    = {1922},
  pages   = {133--181}
}

@article{aitken1926,
  author  = {Aitken, Alexander C.},
  title   = {On Bernoulli's numerical solution of algebraic equations},
  journal = {Proceedings of the Royal Society of Edinburgh},
  volume  = {46},
  year    = {1926},
  pages   = {289--305}
}

Prerequisites

02.01.05
02.05.02
43.02.01

Tier anchors

beginner: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §1.2-1.3 (rewriting $f(x)=0$ as $x=g(x)$, the simple iteration $x_{n+1}=g(x_n)$, and the staircase/cobweb picture of convergence and divergence); Burden & Faires 2011 *Numerical Analysis* 9e (Brooks/Cole) §2.2 for the hands-on fixed-point examples
intermediate: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §1.3-1.4 (the contraction-mapping theorem on an interval, the a-priori and a-posteriori error bounds, local convergence under $|g'(x^*)|<1$, and the order-of-convergence theorem tying the order to the first non-vanishing derivative of $g$ at $x^*$); Atkinson 1989 *An Introduction to Numerical Analysis* 2e (Wiley) §2.2; Stoer & Bulirsch 2002 *Introduction to Numerical Analysis* 3e (Springer) §5.2
master: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (the full fixed-point convergence theory closing with the order framework that Newton's method realises at $p=2$); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.1, §12.1 (the contraction-mapping theorem in Banach space, the local convergence theorem, and $R$-/$Q$-order of convergence); Kelley 1995 *Iterative Methods for Linear and Nonlinear Equations* (SIAM) Ch. 4-5 (fixed-point iteration, the standard assumptions, and the local convergence rate)

References

Süli, E. & Mayers, D. — An Introduction to Numerical Analysis · Cambridge University Press (2003). Chapter 1 'Solution of equations by iteration' develops the simple (fixed-point) iteration $x_{n+1}=g(x_n)$ from the equivalence $f(x)=0 \iff x=g(x)$; the contraction-mapping theorem on a closed interval $[a,b]$ mapped into itself with $|g'(x)|\le L<1$, giving existence and uniqueness of a fixed point $x^*$ and convergence of the iterates from any start in $[a,b]$; the a-priori bound $|x_n-x^*|\le \frac{L^n}{1-L}|x_1-x_0|$ and the a-posteriori bound $|x_n-x^*|\le \frac{L}{1-L}|x_n-x_{n-1}|$; local convergence with asymptotic linear rate $|g'(x^*)|$ when $|g'(x^*)|<1$ and divergence when $|g'(x^*)|>1$; and the order-of-convergence theorem: if $g'(x^*)=\cdots=g^{(p-1)}(x^*)=0$ and $g^{(p)}(x^*)\ne0$ then the iteration has order $p$ with $e_{n+1}\sim \frac{g^{(p)}(x^*)}{p!}e_n^p$, the framework Newton's method realises at $p=2$.
Atkinson, K. E. — An Introduction to Numerical Analysis · Wiley, 2nd edition (1989). §2.2 presents fixed-point iteration with the same contraction hypothesis, the linear convergence rate $|g'(x^*)|$, the error estimate $x^*-x_n\approx \frac{g'(x^*)}{1-g'(x^*)}(x_n-x_{n-1})$, and the higher-order theorem relating the order of convergence to the order of the first non-vanishing derivative of the iteration map at the fixed point; it places Newton and the secant method as fixed-point and quasi-fixed-point schemes of order $2$ and $\varphi=(1+\sqrt5)/2$, respectively.
Ortega, J. M. & Rheinboldt, W. C. — Iterative Solution of Nonlinear Equations in Several Variables · SIAM Classics in Applied Mathematics 30 (2000 reprint of the 1970 Academic Press edition). §10.1 states the contraction-mapping theorem in a Banach space (the Banach fixed-point theorem with the geometric a-priori bound); §12.1 develops the local convergence theorem for stationary iterative processes and the $Q$-order / $R$-order of convergence framework, the multivariable generalisation of the scalar order-$p$ theory, with the Jacobian spectral-radius condition $\rho(g'(x^*))<1$ replacing $|g'(x^*)|<1$ as the local-convergence condition.

Estimated time

beginner: 20m
intermediate: 50m
master: 85m