43.02.03 · numerical-analysis / nonlinear-equations

Newton's Method and the Secant Method: Superlinear and Quadratic Convergence

shipped3 tiersLean: none

Anchor (Master): Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (Newton as the fixed-point map with $g'(x^*)=0$, the secant's irrational order, and Newton for systems); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.2, §12.6 (the Newton-Kantorovich theorem, affine-invariant convergence, and the $\mathbb{R}^n$ quadratic-convergence theorem with the Jacobian); Deuflhard 2004 *Newton Methods for Nonlinear Problems* (Springer) §1-2 (affine-invariant Lipschitz theory, global Newton, and continuation)

Intuition Beginner

The previous units solved $f (x) = 0$ by trapping a root in a bracket and shrinking it, or by rewriting the problem as $x = g (x)$ and iterating. Both are reliable, but the bracketing method crawls. There is a far faster idea hiding in the picture of the curve itself: instead of just checking signs, use the slope of the curve to aim your next guess straight at the root.

Here is the picture. Stand at your current guess on the curve $y = f (x)$ . Draw the tangent line — the straight line that just grazes the curve there, matching its height and its slope. A straight line is something you can solve exactly: find where the tangent crosses the horizontal axis, and make that crossing your next guess. The tangent is a stand-in for the curve, and near the root the stand-in is so good that each step roughly doubles the number of correct digits. This is Newton's method.

Drawing a tangent needs the slope, and sometimes you do not have a formula for it. The fix is to use a nearby chord instead of the tangent: take your last two guesses, draw the straight line through those two points on the curve, and follow it to the axis. That is the secant method. It is almost as fast as Newton and never asks for the slope directly.

Speed has a price. Bisection could never fail once it had a bracket. Newton and the secant method are fast only when you start close enough; from a bad start they can overshoot, wander, or fly off entirely. The chapter's theme is exactly this trade: guaranteed-but-slow against fast-but-fragile.

Visual Beginner

The picture for Newton's method is the tangent staircase. Draw the curve $y = f (x)$ and the horizontal axis. From your guess $x_{n}$ , go up (or down) to the curve, draw the tangent there, and slide along it to where it hits the axis: that landing point is $x_{n + 1}$ . Repeat, and the landing points race toward the root where the curve crosses the axis.

The secant method replaces each tangent with the straight line through the two most recent points on the curve. You need two starting guesses instead of one, but you never compute a slope — the chord supplies it.

method	what line is used	what it needs	speed near the root
bisection	none (just signs)	sign of $f$	one digit per step
secant	chord through last two points	$f$ only	digits grow by about $1.6 \times$ each step
Newton	tangent at current point	$f$ and its slope $f^{'}$	digits roughly double each step

The bottom two rows are the fast methods. The doubling of digits in Newton's row is what "quadratic convergence" means, and it traces back to the flat-derivative case of the fixed-point staircase from the previous unit.

Worked example Beginner

Let us find the square root of $2$ , about $1.41421356$ , as a root of $f (x) = x^{2} - 2$ by Newton's method. The slope is $f^{'} (x) = 2 x$ , so the tangent-crossing rule $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2x_n} $$ simplifies to the averaging rule $x_{n + 1} = \frac{1}{2} (x_{n} + 2/ x_{n})$ — the very map the previous unit found converged fast. Start from $x_{0} = 1$ .

Step 1. $x_{1} = \frac{1}{2} (1 + 2/1) = \frac{1}{2} (3) = 1.5$ . Already the error is $0.0858$ .

Step 2. $x_{2} = \frac{1}{2} (1.5 + 2/1.5) = \frac{1}{2} (1.5 + 1.33333) = 1.41666 \dots$ The error has dropped to about $0.0025$ .

Step 3. $x_{3} = \frac{1}{2} (1.41666 + 2/1.41666) = \frac{1}{2} (1.41666 + 1.41176) = 1.41421568 \dots$ The error is now about $0.0000021$ .

Step 4. Read off the accuracy. From $x_{1}$ to $x_{3}$ the number of correct digits went roughly $1, 3, 6$ — each step about doubled it.

What this tells us: Newton's tangent rule turned a slow problem into a three-step one, and the doubling of correct digits per step is the hallmark of the method. Bisection needed about fifty steps to reach this precision; Newton reaches it in four. The catch, which the formal sections make precise, is that this race only happens once the guesses are close enough to the root.

Check your understanding Beginner

Exercise (easy, multiple-choice).

The secant method differs from Newton's method because it:

Needs a bracket with a sign change to start.
Uses a chord through the last two guesses instead of the tangent, so it never needs the slope.
Is slower than bisection.
Only works for straight-line functions.

Hint

Think about what line the secant method draws and what information that line needs.

Answer

B. The secant method draws the straight line through the two most recent points on the curve and follows it to the axis, so it estimates the slope from those two points and never asks for $f^{'}$ . Feedback-correct: this makes it usable when no slope formula is available, at a speed close to Newton's. Feedback-wrong: A describes bisection's requirement; C is false since the secant is much faster than bisection; D misreads the chord idea.

Formal definition Intermediate+

Let $f : I \to R$ be differentiable on an open interval $I$ containing a root $x^{*}$ with $f (x^{*}) = 0$ . The root is simple if $f^{'} (x^{*}) \neq = 0$ . Newton's method (the Newton-Raphson iteration) generates iterates from a start $x_{0} \in I$ by $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n = 0, 1, 2, \dots, $$ defined wherever $f^{'} (x_{n}) \neq = 0$ . Geometrically $x_{n + 1}$ is the root of the affine model $ℓ_{n} (x) = f (x_{n}) + f^{'} (x_{n}) (x - x_{n})$ , the tangent line — the first-order Taylor polynomial of $f$ at $x_{n}$ 02.05.02. Newton's method is the fixed-point iteration $x_{n + 1} = g (x_{n})$ 43.02.02 for the Newton map $$ g(x) = x - \frac{f(x)}{f'(x)}. $$ Write $e_{n} = x_{n} - x^{*}$ for the error. The symbols $x^{*}$ (root), $e_{n}$ (error), $g$ (Newton/iteration map), $J$ (Jacobian), and $F$ (vector field) are recorded in _meta/NOTATION.md.

The secant method replaces the derivative by the slope of the chord through the two most recent iterates: given $x_{0}, x_{1} \in I$ , $$ x_{n+1} = x_n - f(x_n),\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}, \qquad n = 1, 2, \dots $$ It uses two starting points and a single new evaluation of $f$ per step (the previous value is reused); it needs no $f^{'}$ .

For a system $F : Ω \subseteq R^{d} \to R^{d}$ with $F (x^{*}) = 0$ and Jacobian $J (x) = F^{'} (x) \in R^{d \times d}$ , Newton's method for systems is $$ \mathbf x_{k+1} = \mathbf x_k - J(\mathbf x_k)^{-1} F(\mathbf x_k), $$ each step solving the linear system $J (x_{k}) Δ_{k} = - F (x_{k})$ for the update $Δ_{k} = x_{k + 1} - x_{k}$ . The simple-root condition becomes invertibility of $J (x^{*})$ , the nonsingularity hypothesis of the inverse function theorem 02.05.04 that makes the local solution and its differentiable dependence well defined; the multivariable Taylor expansion of $F$ 02.05.05 supplies the error analysis.

Recall the order of convergence 43.02.02: iterates converge with order $p \geq 1$ if $∣ e_{n + 1} ∣/∣ e_{n} ∣^{p} \to C$ for a constant $C$ (with $C < 1$ when $p = 1$ ). Newton attains $p = 2$ (quadratic); the secant attains $p = φ = (1 + 5) /2 \approx 1.618$ (superlinear, between linear and quadratic).

Counterexamples to common slips Intermediate+

"Newton's method always converges." It converges quadratically only from a start sufficiently close to a simple root. For $f (x) = arctan x$ with root $0$ , starting too far out makes the tangent overshoot so badly that $∣ x_{n + 1} ∣ > ∣ x_{n} ∣$ and the iterates diverge; for $f (x) = x^{3} - 2 x + 2$ , the start $x_{0} = 0$ produces the $2$ -cycle $0 \to 1 \to 0$ and never reaches the root.
"Newton needs a bracket like bisection." Newton uses no bracket and gives no sign guarantee; a single starting point and the derivative are its only inputs. This is the source of both its speed and its fragility — there is no interval keeping it honest.
"Newton is quadratic for every root." At a multiple root, where $f^{'} (x^{*}) = 0$ , the Newton map has $g^{'} (x^{*}) = 1 - 1/ m \neq = 0$ for a root of multiplicity $m$ , so convergence drops to linear with rate $1 - 1/ m$ . Quadratic order requires a simple root; the modified iteration $x_{n + 1} = x_{n} - m f (x_{n}) / f^{'} (x_{n})$ restores it when $m$ is known.
"The secant method is linear because it lacks the derivative." Its order is $φ \approx 1.618$ , strictly above $1$ : superlinear. The chord slope $(f (x_{n}) - f (x_{n - 1})) / (x_{n} - x_{n - 1})$ converges to $f^{'} (x^{*})$ as the iterates close in, so the secant step approaches the Newton step, and the order falls just short of $2$ only because it uses a one-step-stale slope.

Key theorem with proof Intermediate+

The signature result is the local quadratic convergence of Newton's method at a simple root, because it is the single statement that certifies the method's defining feature — the doubling of correct digits — and exhibits Newton as the realisation of order $2$ in the fixed-point framework 43.02.02.

Theorem (local quadratic convergence of Newton's method). Let $f \in C^{2}$ on an open interval containing a simple root $x^ $($ f(x^) = 0 $,$ f'(x^) \ne 0 $) . T h e n t h er e i s$ \delta > 0 $s u c h t ha t f or e v er y$ x_0 $w i t h$ |x_0 - x^| \le \delta $t h e N e w t o ni t er a t es a r e w e l l d e f in e d, co n v er g e t o$ x^$, and satisfy* $$ \lim_{n\to\infty} \frac{e_{n+1}}{e_n^{,2}} = \frac{f''(x^)}{2,f'(x^)}, \qquad e_n = x_n - x^*, $$ so the convergence is quadratic.

Proof. Since $f^{'}$ is continuous and $f^{'} (x^{*}) \neq = 0$ , choose $δ_{0} > 0$ with $f^{'} (x) \neq = 0$ on $J_{0} = [x^{*} - δ_{0}, x^{*} + δ_{0}]$ ; the Newton map $g (x) = x - f (x) / f^{'} (x)$ is then $C^{1}$ on $J_{0}$ . Differentiating, $$ g'(x) = 1 - \frac{f'(x)^2 - f(x)f''(x)}{f'(x)^2} = \frac{f(x),f''(x)}{f'(x)^2}. $$ At the root $f (x^{*}) = 0$ , so $g^{'} (x^{*}) = 0$ . By the local-convergence proposition for fixed-point iteration 43.02.02, $g^{'} (x^{*}) = 0$ with $∣ g^{'} (x^{*}) ∣ < 1$ gives a closed interval $J \subseteq J_{0}$ about $x^{*}$ on which $g$ is a contraction mapping $J$ into itself, so the iterates converge to $x^{*}$ from any start in $J$ . Take $δ$ with $[x^{*} - δ, x^{*} + δ] \subseteq J$ .

For the rate, expand $f$ at $x_{n}$ by Taylor's theorem with Lagrange remainder 02.05.02, evaluated at the root $x^{*}$ : there is $ξ_{n}$ between $x_{n}$ and $x^{*}$ with $$ 0 = f(x^) = f(x_n) + f'(x_n)(x^ - x_n) + \tfrac{1}{2}f''(\xi_n)(x^* - x_n)^2. $$ Divide by $f^{'} (x_{n})$ (nonzero on $J$ ) and rearrange, using $x^{*} - x_{n} = - e_{n}$ : $$ \frac{f(x_n)}{f'(x_n)} - e_n = -\tfrac{1}{2}\frac{f''(\xi_n)}{f'(x_n)}e_n^2. $$ The left side is $(x_{n} - x_{n + 1}) - (x_{n} - x^{*}) = x^{*} - x_{n + 1} = - e_{n + 1}$ , so $$ e_{n+1} = \frac{f''(\xi_n)}{2,f'(x_n)},e_n^{,2}. $$ As $n \to \infty$ , $x_{n} \to x^{*}$ forces $ξ_{n} \to x^{*}$ and $f^{'} (x_{n}) \to f^{'} (x^{*})$ ; continuity of $f^{''}$ and $f^{'}$ gives $e_{n + 1} / e_{n}^{2} \to f^{''} (x^{*}) / (2 f^{'} (x^{*}))$ . The limit is finite and the order is exactly $2$ . $□$

Bridge. This theorem is the foundational reason Newton's method is the fast end of the nonlinear-equations chapter: the computation $g^{'} (x^{*}) = f (x^{*}) f^{''} (x^{*}) / f^{'} (x^{*})^{2} = 0$ shows Newton is exactly the fixed-point iteration of 43.02.02 engineered so the first derivative of the iteration map vanishes, and the order theorem of that unit then forces order $2$ — this is exactly the flat-staircase case promoted from a curiosity to the design principle. The error law $e_{n + 1} \approx \frac{f ^{''} ( x ^{*} )}{2 f ^{'} ( x ^{*} )} e_{n}^{2}$ is the central insight that makes the digit-doubling precise, and it builds toward the systems theorem where the scalar $f^{''} /2 f^{'}$ becomes a Jacobian-Lipschitz constant. The simple-root hypothesis $f^{'} (x^{*}) \neq = 0$ is dual to the inverse-function nonsingularity $det J (x^{*}) \neq = 0$ 02.05.04, which is why the same proof generalises to $R^{d}$ , and the quadratic law appears again in 43.09.03 indirectly through Newton-based nonlinear solvers. Putting these together, Newton sits atop the order hierarchy bisection 43.02.01 anchors at the bottom, and the contrast — guaranteed linear halving versus local quadratic speed — is the comparison the chapter is built to make.

Exercises Intermediate+

Exercise 5 (medium, symbolic).

At a root $x^{*}$ of multiplicity $m \geq 2$ (so $f (x) = (x - x^{*})^{m} h (x)$ with $h (x^{*}) \neq = 0$ ), show the Newton map satisfies $g^{'} (x^{*}) = 1 - 1/ m$ , so Newton converges only linearly there.

Hint

Near $x^{*}$ , $f / f^{'} \approx (x - x^{*}) / m$ for a pure power; compute $g (x) = x - f / f^{'}$ and its derivative at $x^{*}$ , or use the limit of $f (x) f^{''} (x) / f^{'} (x)^{2}$ .

Answer

For $f (x) = (x - x^{*})^{m} h (x)$ with $h (x^{*}) \neq = 0$ , near $x^{*}$ the leading behaviour is $f \sim (x - x^{*})^{m} h (x^{*})$ and $f^{'} \sim m (x - x^{*})^{m - 1} h (x^{*})$ , so $f / f^{'} \to (x - x^{*}) / m$ . Then $g (x) = x - f / f^{'}$ behaves like $x - (x - x^{*}) / m = x^{*} + (x - x^{*}) (1 - 1/ m)$ , an affine map with slope $1 - 1/ m$ at $x^{*}$ ; hence $g^{'} (x^{*}) = 1 - 1/ m$ . Since $m \geq 2$ gives $0 < 1 - 1/ m < 1$ , the fixed-point theory 43.02.02 yields linear convergence with rate $1 - 1/ m$ , not quadratic. The modified iteration $x_{n + 1} = x_{n} - m f (x_{n}) / f^{'} (x_{n})$ restores $g^{'} (x^{*}) = 0$ and quadratic order. Full credit for $g^{'} (x^{*}) = 1 - 1/ m$ and the linear-rate conclusion.

Exercise 6 (hard, symbolic).

Derive the secant method's order $φ = (1 + 5) /2$ from the error recurrence $e_{n + 1} = C e_{n} e_{n - 1} (1 + o (1))$ with $C \neq = 0$ .

Hint

Posit $∣ e_{n + 1} ∣ \approx K ∣ e_{n} ∣^{p}$ . Substitute into the recurrence and match exponents of $∣ e_{n} ∣$ ; you will get $p^{2} = p + 1$ .

Answer

Suppose the method has order $p$ : $∣ e_{n + 1} ∣ \approx K ∣ e_{n} ∣^{p}$ for large $n$ and some $K > 0$ . Then also $∣ e_{n} ∣ \approx K ∣ e_{n - 1} ∣^{p}$ , so $∣ e_{n - 1} ∣ \approx (∣ e_{n} ∣/ K)^{1/ p}$ . Substituting into the recurrence $∣ e_{n + 1} ∣ \approx ∣ C ∣ ∣ e_{n} ∣ ∣ e_{n - 1} ∣$ , $$ K|e_n|^p \approx |C|,|e_n|,\big(|e_n|/K\big)^{1/p} = |C|K^{-1/p},|e_n|^{,1 + 1/p}. $$ For this to hold as $∣ e_{n} ∣ \to 0$ the exponents of $∣ e_{n} ∣$ must match: $p = 1 + 1/ p$ , i.e. $p^{2} = p + 1$ . The positive root is $p = (1 + 5) /2 = φ \approx 1.618$ . (Matching the constants then fixes $K = ∣ C ∣^{1/ (p + 1)} = ∣ C ∣^{1/ p}$ .) So the secant method is superlinear of order the golden ratio — below Newton's $2$ because it uses a one-step-stale chord slope rather than the exact derivative, but above any linear method, while costing only one new function evaluation per step. Full credit for the exponent equation $p^{2} = p + 1$ and its positive root.

Exercise 7 (hard, symbolic).

For Newton on a system $F : R^{d} \to R^{d}$ with $F (x^{*}) = 0$ , $J (x^{*})$ invertible, and $J$ Lipschitz with constant $L$ near $x^{*}$ , show $∥ e_{k + 1} ∥ \leq β L ∥ e_{k} ∥^{2}$ locally for a constant $β$ depending on $∥ J (x^{*})^{- 1} ∥$ , giving quadratic convergence.

Hint

Use $e_{k + 1} = e_{k} - J (x_{k})^{- 1} F (x_{k})$ and the integral Taylor remainder $F (x^{*}) - F (x_{k}) - J (x_{k}) e_{k}^{-} = \int_{0}^{1} (J (x_{k} + t (x^{*} - x_{k})) - J (x_{k})) (x^{*} - x_{k}) d t$ .

Answer

Write $e_{k} = x_{k} - x^{*}$ . From the iteration, $e_{k + 1} = e_{k} - J (x_{k})^{- 1} F (x_{k}) = J (x_{k})^{- 1} (J (x_{k}) e_{k} - F (x_{k}))$ , using $F (x^{*}) = 0$ . The multivariable Taylor expansion with integral remainder 02.05.05 gives $$ \mathbf 0 = F(\mathbf x^) = F(\mathbf x_k) - J(\mathbf x_k)\mathbf e_k + \int_0^1 \big(J(\mathbf x_k - t\mathbf e_k) - J(\mathbf x_k)\big)(-\mathbf e_k),dt, $$ so $J (x_{k}) e_{k} - F (x_{k}) = \int_{0}^{1} (J (x_{k} - t e_{k}) - J (x_{k})) e_{k} d t$ . The Lipschitz bound makes the integrand at most $L t ∥ e_{k} ∥ \cdot ∥ e_{k} ∥$ , so the integral has norm $\leq \frac{1}{2} L ∥ e_{k} ∥^{2}$ . Near $\mathbf x^ $,$ |J(\mathbf x_k)^{-1}| \le 2|J(\mathbf x^)^{-1}| =: 2\gamma$ by continuity of the inverse. Therefore $$ |\mathbf e_{k+1}| \le |J(\mathbf x_k)^{-1}|\cdot\tfrac{1}{2}L|\mathbf e_k|^2 \le \gamma L,|\mathbf e_k|^2, $$ quadratic convergence with $\beta = \gamma = |J(\mathbf x^)^{-1}| $. T h e in v er t ibi l i t y o f$ J(\mathbf x^) $i s t h e in v er se - f u n c t i o nn o n s in g u l a r i t y [02.05.04] s t an d in g in f or t h esc a l a r$ f'(x^) \ne 0$. Full credit for the integral-remainder estimate and the resulting quadratic bound.

Advanced results Master

Newton's method is the fixed-point iteration engineered so that the first derivative of the iteration map vanishes, and the results below place its scalar quadratic law inside the sharper semilocal and higher-dimensional theory: the affine-invariant Kantorovich theorem that converts data at the starting point into a convergence guarantee, the precise order of the secant family, the systems theorem with the Jacobian, and the global structure that the local theorem ignores.

Theorem 1 (Newton as the order- $2$ fixed-point map). For $f \in C^{2}$ near a simple root $x^{*}$ , the Newton map $g (x) = x - f (x) / f^{'} (x)$ satisfies $g (x^{*}) = x^{*}$ , $g^{'} (x^{*}) = f (x^{*}) f^{''} (x^{*}) / f^{'} (x^{*})^{2} = 0$ , and $g^{''} (x^{*}) = f^{''} (x^{*}) / f^{'} (x^{*})$ . By the order-of-convergence theorem 43.02.02, the first non-vanishing derivative of $g$ at $x^{*}$ has index $2$ , so the iteration has order exactly $2$ with asymptotic error constant $∣ g^{''} (x^{*}) ∣/2! = ∣ f^{''} (x^{*}) / (2 f^{'} (x^{*})) ∣$ . Newton is thus not a method apart from fixed-point iteration but its order- $2$ instance, obtained by choosing the multiplier $λ (x) = 1/ f^{'} (x)$ in the reformulation $g (x) = x - λ (x) f (x)$ so that $g^{'} (x^{*}) = 1 - λ (x^{*}) f^{'} (x^{*}) = 0$ ^{[Süli-Mayers §1.4]}.

Theorem 2 (the secant family and the golden-ratio order). The secant error obeys $e_{n + 1} = f [x_{n - 1}, x_{n}, x^{*}] / f [x_{n - 1}, x_{n}] e_{n} e_{n - 1}$ , where $f [\cdot]$ are divided differences; as $x_{n - 1}, x_{n} \to x^{*}$ the prefactor tends to $f^{''} (x^{*}) / (2 f^{'} (x^{*}))$ , so $e_{n + 1} \approx C e_{n} e_{n - 1}$ with $C = f^{''} (x^{*}) / (2 f^{'} (x^{*}))$ . The ansatz $∣ e_{n + 1} ∣ \sim K ∣ e_{n} ∣^{p}$ forces $p^{2} = p + 1$ , whose positive root $φ = (1 + 5) /2$ is the order. The asymptotic relation $∣ e_{n + 1} ∣/∣ e_{n} ∣^{φ} \to ∣ C ∣^{1/ φ}$ records the superlinear rate. Per function evaluation the secant method is often more efficient than Newton: two secant steps cost two evaluations of $f$ and reduce the error by the factor of order $φ^{2} = φ + 1 \approx 2.618 > 2$ in the exponent, beating Newton's two evaluations of $f$ and $f^{'}$ at one quadratic step when $f^{'}$ is expensive ^{[Atkinson §2.4]}.

Theorem 3 (Newton for systems; the Newton-Kantorovich theorem). Let $F : Ω \subseteq R^{d} \to R^{d}$ be $C^{1}$ with $J = F^{'}$ Lipschitz of constant $L$ on a ball, and let $x_{0}$ satisfy $∥ J (x_{0})^{- 1} ∥ \leq β$ , $∥ J (x_{0})^{- 1} F (x_{0}) ∥ \leq η$ , with $h = β L η \leq \frac{1}{2}$ . Then the Newton iterates $x_{k + 1} = x_{k} - J (x_{k})^{- 1} F (x_{k})$ stay in the ball, converge to a root $x^{*}$ , and converge quadratically when $h < \frac{1}{2}$ , with $∥ x_{k} - x^{*} ∥ \leq (2 η) (2 h)^{2^{k} - 1} / 2^{k}$ . This is a semilocal theorem — its hypotheses are quantities computable at $x_{0}$ , not at the unknown root — and it is affine invariant: replacing $F$ by $A F$ for invertible $A$ leaves both the iteration and the hypothesis $h = β L η$ unchanged, since $J^{- 1} F$ is unaltered. The invertibility of $J$ along the iterates is the inverse-function-theorem nonsingularity 02.05.04 that makes each linearised solve well posed ^{[Ortega-Rheinboldt §12.6]}.

Theorem 4 (global structure: basins, overshoot, and complex dynamics). The local theorem says nothing about distant starts. Globally Newton's method is the discrete dynamical system $x \mapsto g (x)$ , and for $f$ with several roots the plane (or line) partitions into basins of attraction, one per root, whose boundaries are generically fractal: for a polynomial with three or more roots in $C$ , the Newton map is a rational function whose Julia set is the common basin boundary, the historical seed of complex dynamics. On the line, overshoot ( $f^{'}$ small) can throw an iterate far from any root, attracting periodic cycles can trap it (as $x^{3} - 2 x + 2$ from $x_{0} = 0$ ), and a near-horizontal tangent can send $x_{n + 1}$ to numerical infinity. Globalisation strategies — damped Newton $x_{n + 1} = x_{n} - t_{n} f (x_{n}) / f^{'} (x_{n})$ with a line-search step $t_{n} \in (0, 1]$ , trust regions, and homotopy/continuation — restore reliability while preserving the local quadratic phase once the iterates enter the basin ^{[Deuflhard §3]}.

Synthesis. Newton's method is the foundational reason the order-of-convergence hierarchy has a fast top: it is exactly the fixed-point iteration 43.02.02 whose multiplier $λ = 1/ f^{'}$ is chosen to kill the first derivative of the iteration map, and the order theorem then forces quadratic order — this is exactly the flat-staircase case of the previous unit promoted to a design principle. The central insight is the dichotomy this produces against bisection 43.02.01: bisection guarantees convergence and pays with a fixed linear rate $\frac{1}{2}$ , while Newton attains quadratic order and pays with the loss of any global guarantee, so robustness and speed are dual and the chapter's hybrids buy both by wrapping a Newton or secant step inside a bracket. The secant method is the derivative-free interpolation between them — order $φ$ , costing one evaluation, its golden-ratio exponent the root of $p^{2} = p + 1$ that records using a one-step-stale slope. Putting these together, the scalar quadratic law $e_{n + 1} \approx \frac{f ^{''} ( x ^{*} )}{2 f ^{'} ( x ^{*} )} e_{n}^{2}$ generalises through the multivariable Taylor expansion 02.05.05 to the systems theorem, where the simple-root condition $f^{'} (x^{*}) \neq = 0$ becomes the inverse-function nonsingularity $det J (x^{*}) \neq = 0$ 02.05.04, and the Kantorovich theorem turns starting-point data into a guarantee; the same Newton solve appears again wherever a nonlinear equation must be cracked at speed, the bridge from this elementary tangent rule to large-scale scientific computing.

Full proof set Master

Proposition 1 (local quadratic convergence of scalar Newton). If $f \in C^{2}$ near a simple root $x^ $, t h er e i s$ \delta > 0 $s u c h t ha tN e w t o n^{'} s i t er a t es f r o man y$ x_0 \in [x^-\delta, x^+\delta] $co n v er g e t o$ x^ $w i t h$ e_{n+1}/e_n^2 \to f''(x^)/(2f'(x^))$.

Proof. As in the Key theorem: $f^{'} (x^{*}) \neq = 0$ and continuity of $f^{'}$ give an interval $J_{0}$ on which $f^{'} \neq = 0$ , so $g = x - f / f^{'}$ is $C^{1}$ there with $g^{'} = f f^{''} / (f^{'})^{2}$ and $g^{'} (x^{*}) = 0$ . The local-convergence proposition for fixed-point iteration 43.02.02 gives a closed interval $J \subseteq J_{0}$ on which $g$ is a contraction into itself, hence convergence from any start in $J$ ; pick $δ$ with $[x^{*} - δ, x^{*} + δ] \subseteq J$ . Taylor's theorem with Lagrange remainder 02.05.02 gives $0 = f (x^{*}) = f (x_{n}) - f^{'} (x_{n}) e_{n} + \frac{1}{2} f^{''} (ξ_{n}) e_{n}^{2}$ with $ξ_{n}$ between $x_{n}$ and $x^{*}$ ; dividing by $f^{'} (x_{n})$ and using $e_{n + 1} = e_{n} - f (x_{n}) / f^{'} (x_{n})$ yields $e_{n + 1} = \frac{f ^{''} ( ξ _{n} )}{2 f ^{'} ( x _{n} )} e_{n}^{2}$ . Letting $n \to \infty$ , $ξ_{n} \to x^{*}$ and $f^{'} (x_{n}) \to f^{'} (x^{*})$ give the stated limit, order exactly $2$ . $□$

Proposition 2 (secant order is the golden ratio). Under $f \in C^{2}$ near a simple root, the secant error satisfies $e_{n+1} = \dfrac{f[x_{n-1},x_n,x^]}{f[x_{n-1},x_n]},e_n e_{n-1} $, an d t h eor d er o f co n v er g e n ce i s$ \varphi = (1+\sqrt5)/2$.*

Proof. The secant iterate is the root of the linear interpolant $p_{1}$ of $f$ at $x_{n - 1}, x_{n}$ . The interpolation error identity gives $f (x) - p_{1} (x) = f [x_{n - 1}, x_{n}, x] (x - x_{n - 1}) (x - x_{n})$ . Evaluate at $x = x^{*}$ : since $f (x^{*}) = 0$ and $p_{1} (x_{n + 1}) = 0$ , subtracting and using $p_{1} (x) = f (x_{n}) + f [x_{n - 1}, x_{n}] (x - x_{n})$ gives, after rearrangement, $x_{n + 1} - x^{*} = \frac{f [ x _{n - 1} , x _{n} , x ^{*} ]}{f [ x _{n - 1} , x _{n} ]} (x_{n} - x^{*}) (x_{n - 1} - x^{*})$ , i.e. $e_{n + 1} = \frac{f [ x _{n - 1} , x _{n} , x ^{*} ]}{f [ x _{n - 1} , x _{n} ]} e_{n} e_{n - 1}$ . By the mean-value property of divided differences $f [x_{n - 1}, x_{n}] \to f^{'} (x^{*}) \neq = 0$ and $f [x_{n - 1}, x_{n}, x^{*}] \to \frac{1}{2} f^{''} (x^{*})$ , so $e_{n + 1} \approx C e_{n} e_{n - 1}$ with $C = f^{''} (x^{*}) / (2 f^{'} (x^{*}))$ . Writing $∣ e_{n + 1} ∣ \sim K ∣ e_{n} ∣^{p}$ and substituting $∣ e_{n - 1} ∣ \sim (∣ e_{n} ∣/ K)^{1/ p}$ forces the exponent equation $p = 1 + 1/ p$ , i.e. $p^{2} - p - 1 = 0$ , whose positive root is $φ = (1 + 5) /2$ . $□$

Proposition 3 (quadratic convergence of Newton for systems). Let $F : Ω \to R^{d}$ be $C^{1}$ with $J(\mathbf x^) $in v er t ib l e an d$ J $L i p sc hi t z o f co n s t an t$ L $n e a r$ \mathbf x^ $. T h e n t h er e i s aba l l ab o u t$ \mathbf x^ $f r o m w hi c h t h e i t er a t es$ \mathbf x_{k+1} = \mathbf x_k - J(\mathbf x_k)^{-1}F(\mathbf x_k) $co n v er g e w i t h$ |\mathbf e_{k+1}| \le \gamma L|\mathbf e_k|^2 $,$ \gamma = |J(\mathbf x^)^{-1}|$.

Proof. By continuity of $x \mapsto J (x)$ and of matrix inversion, there is a ball $B$ about $x^{*}$ on which $J (x)$ is invertible with $∥ J (x)^{- 1} ∥ \leq 2 γ$ . For $x_{k} \in B$ , the integral form of the multivariable Taylor remainder 02.05.05 gives $$ F(\mathbf x_k) = F(\mathbf x_k) - F(\mathbf x^) = \int_0^1 J(\mathbf x^ + s,\mathbf e_k),\mathbf e_k,ds, $$ so $J (x_{k}) e_{k} - F (x_{k}) = \int_{0}^{1} (J (x_{k}) - J (x^{*} + s e_{k})) e_{k} d s$ , whose norm is at most $\int_{0}^{1} L (1 - s) ∥ e_{k} ∥ \cdot ∥ e_{k} ∥ d s = \frac{1}{2} L ∥ e_{k} ∥^{2}$ by the Lipschitz bound (the displacement from $x_{k}$ to $x^{*} + s e_{k}$ has length $(1 - s) ∥ e_{k} ∥$ ). Since $e_{k + 1} = J (x_{k})^{- 1} (J (x_{k}) e_{k} - F (x_{k}))$ , $$ |\mathbf e_{k+1}| \le |J(\mathbf x_k)^{-1}|\cdot\tfrac12 L|\mathbf e_k|^2 \le \gamma L|\mathbf e_k|^2. $$ Shrinking $B$ so that $γ L ∥ e_{k} ∥ < 1$ makes $∥ e_{k} ∥$ strictly decrease, giving convergence; the bound is the quadratic law. $□$

Proposition 4 (Newton drops to linear order at a multiple root). If $x^ $i s a r oo t o f m u l t i pl i c i t y$ m \ge 2 $o f$ f $(so$ f = (x-x^)^m h $,$ h(x^) \ne 0 $,$ h $s m oo t h), N e w t o n^{'} s m e t h o d co n v er g es l in e a r l y w i t h$ g'(x^) = 1 - 1/m $, an d t h e m o d i f i e d i t er a t i o n$ x_{n+1} = x_n - m f(x_n)/f'(x_n)$ restores quadratic order.

Proof. Write $u (x) = x - x^{*}$ . Then $f = u^{m} h$ , $f^{'} = u^{m - 1} (mh + u h^{'})$ , so $f / f^{'} = u h / (mh + u h^{'})$ , which is $C^{1}$ near $x^{*}$ with value $0$ and derivative at $x^{*}$ equal to $h (x^{*}) / (mh (x^{*})) = 1/ m$ (differentiate $u h / (mh + u h^{'})$ and evaluate at $u = 0$ ). Hence $g = x - f / f^{'}$ has $g^{'} (x^{*}) = 1 - 1/ m \in (0, 1)$ for $m \geq 2$ : linear convergence with that rate by the fixed-point order theory 43.02.02. For the modified map $g_{m} = x - m f / f^{'}$ , $g_{m}^{'} (x^{*}) = 1 - m \cdot (1/ m) = 0$ , so its first non-vanishing derivative has index $\geq 2$ and the order returns to quadratic. $□$

Connections Master

Fixed-point iteration 43.02.02 supplies the order-of-convergence machinery this unit instantiates: Newton's method is precisely the iteration $x_{n + 1} = g (x_{n})$ for $g (x) = x - f (x) / f^{'} (x)$ , and the computation $g^{'} (x^{*}) = f (x^{*}) f^{''} (x^{*}) / f^{'} (x^{*})^{2} = 0$ at a simple root, combined with that unit's theorem that the order equals the index of the first non-vanishing derivative of $g$ , is what makes Newton quadratic. The secant method is the derivative-free quasi-fixed-point variant whose order $φ$ is the golden ratio; Steffensen's method from that unit is the alternative derivative-free route to quadratic order.
The bisection method 43.02.01 anchors the bottom of the order hierarchy at guaranteed linear rate $\frac{1}{2}$ , and this unit anchors the top at conditional quadratic order. The contrast is the organising comparison of the chapter: bisection trades speed for an unconditional bracket guarantee, Newton trades the guarantee for quadratic speed, and the practical hybrids (Dekker, Brent) wrap a Newton or secant step inside a maintained bracket so that the fast step is taken when it lands inside and a bisection step is the fallback otherwise — combining this unit's speed with that unit's safety.
The inverse and implicit function theorems 02.05.04 provide the nonsingularity hypothesis that lifts Newton to systems: the condition $det J (x^{*}) \neq = 0$ is exactly the invertibility that makes the local solution branch of $F (x) = 0$ well defined and differentiable, the multivariable analogue of the simple-root condition $f^{'} (x^{*}) \neq = 0$ . Multivariable Taylor expansion 02.05.05 supplies the integral remainder that turns the Jacobian-Lipschitz bound into the quadratic error law, so the systems theorem is the scalar proof re-run with $f^{''} /2 f^{'}$ replaced by a Jacobian-Lipschitz constant.
Conditioning and floating-point arithmetic 43.01.01 floor the attainable accuracy of Newton just as they do bisection: the quadratic law means the error squares each step until it reaches the level of rounding error in evaluating $f$ and $f^{'}$ , after which the iterates stagnate at a residual scaled by the conditioning $1/∣ f^{'} (x^{*}) ∣$ of the root. The quadratic convergence makes Newton reach that floor in a handful of steps, so in practice the limiting factor is precision, not iteration count — the forward cross-reference to that chapter's accuracy analysis.

Historical & philosophical context Master

The tangent-line root-finder descends from Isaac Newton's De analysi (written 1669, published 1711) and Method of Fluxions, where he solved polynomial and transcendental equations by successive linear corrections, though his presentation was a sequence of algebraic substitutions rather than the derivative-based iteration taught today. Joseph Raphson's 1690 Analysis aequationum universalis gave the iteration in a form closer to the modern $x_{n + 1} = x_{n} - f (x_{n}) / f^{'} (x_{n})$ , which is why the scalar method carries both names ^{[Raphson 1690]}. Thomas Simpson in 1740 cast it in fluxional (derivative) language and extended it to systems of two equations, the first appearance of the Jacobian idea. The recognition that the method is the fixed-point iteration with a vanishing iteration-map derivative, and hence quadratic, belongs to the twentieth-century numerical-analysis synthesis.

The semilocal convergence theory — turning data computable at the starting point into a convergence and error guarantee — was established by Leonid Kantorovich in 1948 ^{[Kantorovich 1948]}, whose theorem in Banach space subsumes the scalar and finite-dimensional cases and underlies the affine-invariant treatment of Deuflhard and others. Süli and Mayers present Newton as the order- $2$ realisation of their fixed-point framework with the secant method as the order- $φ$ derivative-free variant ^{[Süli-Mayers §1.4-1.5]}; the multivariable theory with the Jacobian and the $Q$ -order refinement received its definitive treatment from Ortega and Rheinboldt in 1970 ^{[Ortega-Rheinboldt §10.2]}. The fractal basin structure of Newton's method for complex polynomials, observed by Cayley in 1879 when he could solve the quadratic case but not the cubic, became the historical entry point to complex dynamics.

Bibliography Master

@book{sulimayers2003,
  author    = {S\"{u}li, Endre and Mayers, David F.},
  title     = {An Introduction to Numerical Analysis},
  publisher = {Cambridge University Press},
  year      = {2003}
}

@book{atkinson1989,
  author    = {Atkinson, Kendall E.},
  title     = {An Introduction to Numerical Analysis},
  edition   = {2},
  publisher = {John Wiley \& Sons},
  year      = {1989}
}

@book{ortegarheinboldt2000,
  author    = {Ortega, James M. and Rheinboldt, Werner C.},
  title     = {Iterative Solution of Nonlinear Equations in Several Variables},
  series    = {Classics in Applied Mathematics},
  volume    = {30},
  publisher = {SIAM},
  year      = {2000},
  note      = {Reprint of the 1970 Academic Press edition}
}

@book{deuflhard2004,
  author    = {Deuflhard, Peter},
  title     = {Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms},
  series    = {Springer Series in Computational Mathematics},
  volume    = {35},
  publisher = {Springer},
  year      = {2004}
}

@book{stoerbulirsch2002,
  author    = {Stoer, Josef and Bulirsch, Roland},
  title     = {Introduction to Numerical Analysis},
  edition   = {3},
  series    = {Texts in Applied Mathematics},
  volume    = {12},
  publisher = {Springer},
  year      = {2002}
}

@article{kantorovich1948,
  author  = {Kantorovich, Leonid V.},
  title   = {Functional analysis and applied mathematics},
  journal = {Uspekhi Matematicheskikh Nauk},
  volume  = {3},
  number  = {6},
  year    = {1948},
  pages   = {89--185}
}

@book{raphson1690,
  author    = {Raphson, Joseph},
  title     = {Analysis aequationum universalis},
  publisher = {Thomas Braddyll},
  address   = {London},
  year      = {1690}
}

Prerequisites

02.05.02
02.05.04
02.05.05
43.02.02

Tier anchors

beginner: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §1.4-1.5 (Newton's method as the tangent-line iteration, the secant method as the chord variant, the doubling of correct digits); Burden & Faires 2011 *Numerical Analysis* 9e (Brooks/Cole) §2.3-2.4 for the worked Newton and secant tables
intermediate: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §1.4-1.5 (the local quadratic-convergence theorem via Taylor with remainder under a simple root and $f\in C^2$, the basin/overshoot failure modes, and the secant method's order $\varphi=(1+\sqrt5)/2$); Atkinson 1989 *An Introduction to Numerical Analysis* 2e (Wiley) §2.2-2.5; Stoer & Bulirsch 2002 *Introduction to Numerical Analysis* 3e (Springer) §5.3-5.4
master: Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (Newton as the fixed-point map with $g'(x^*)=0$, the secant's irrational order, and Newton for systems); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.2, §12.6 (the Newton-Kantorovich theorem, affine-invariant convergence, and the $\mathbb{R}^n$ quadratic-convergence theorem with the Jacobian); Deuflhard 2004 *Newton Methods for Nonlinear Problems* (Springer) §1-2 (affine-invariant Lipschitz theory, global Newton, and continuation)

References

Süli, E. & Mayers, D. — An Introduction to Numerical Analysis · Cambridge University Press (2003). Chapter 1 'Solution of equations by iteration', §1.4-1.5: Newton's method $x_{n+1}=x_n-f(x_n)/f'(x_n)$ derived as the root of the tangent line / linearisation of $f$ at $x_n$; the local convergence theorem proving that for a simple root $x^*$ ($f(x^*)=0$, $f'(x^*)\ne0$) with $f\in C^2$ near $x^*$ there is a neighbourhood from which the iterates converge quadratically with $e_{n+1}\to \frac{f''(x^*)}{2f'(x^*)}e_n^2$; Newton read as the fixed-point iteration of $g(x)=x-f(x)/f'(x)$ with $g'(x^*)=0$ (hence order $2$ by the order-of-convergence theorem of §1.3); the failure modes (a vanishing or near-vanishing derivative causing overshoot, non-convergence from a poor start, the basin-of-attraction picture, and the drop to linear convergence at a multiple root); and the secant method $x_{n+1}=x_n-f(x_n)(x_n-x_{n-1})/(f(x_n)-f(x_{n-1}))$ as the derivative-free finite-difference variant whose error obeys $e_{n+1}\approx C e_n e_{n-1}$, giving superlinear order $\varphi=(1+\sqrt5)/2\approx1.618$.
Atkinson, K. E. — An Introduction to Numerical Analysis · Wiley, 2nd edition (1989). §2.2-2.5 present Newton's method and the secant method, the quadratic-convergence proof for Newton at a simple root via Taylor expansion of $f$ about the root, the error relation $e_{n+1}=-\frac{f''(\xi_n)}{2f'(x_n)}e_n^2$; the secant method's order as the positive root of $t^2=t+1$, namely $\varphi=(1+\sqrt5)/2$, derived from the divided-difference error recurrence $e_{n+1}=f[x_n,x_{n-1},x^*]/f[x_n,x_{n-1}]\,e_ne_{n-1}$; and the comparison of cost-per-step (Newton needs $f$ and $f'$; the secant needs only $f$ and reuses the previous value), with the secant method often more efficient per function evaluation.
Ortega, J. M. & Rheinboldt, W. C. — Iterative Solution of Nonlinear Equations in Several Variables · SIAM Classics in Applied Mathematics 30 (2000 reprint of the 1970 Academic Press edition). §10.2 develops Newton's method for systems $F(\mathbf x)=\mathbf 0$ with $F:\mathbb{R}^n\to\mathbb{R}^n$ via the Jacobian $J=F'$: $\mathbf x_{k+1}=\mathbf x_k-J(\mathbf x_k)^{-1}F(\mathbf x_k)$, with the local quadratic-convergence theorem under invertibility of $J(\mathbf x^*)$ and a Lipschitz condition on $J$; §12.6 gives the Newton-Kantorovich theorem (semilocal convergence from computable data at the starting point) and the affine-invariant formulation. The inverse-function-theorem hypothesis $\det J(\mathbf x^*)\ne0$ is exactly the nonsingularity that makes the local solution branch and its differentiable dependence well defined, the multivariable analogue of the simple-root condition $f'(x^*)\ne0$.

Estimated time

beginner: 20m
intermediate: 50m
master: 90m