43.02.03 · numerical-analysis / nonlinear-equations

Newton's Method and the Secant Method: Superlinear and Quadratic Convergence

shipped3 tiersLean: none

Anchor (Master): Süli & Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) Ch. 1 (Newton as the fixed-point map with $g'(x^*)=0$, the secant's irrational order, and Newton for systems); Ortega & Rheinboldt 2000 *Iterative Solution of Nonlinear Equations in Several Variables* (SIAM, reprint) §10.2, §12.6 (the Newton-Kantorovich theorem, affine-invariant convergence, and the $\mathbb{R}^n$ quadratic-convergence theorem with the Jacobian); Deuflhard 2004 *Newton Methods for Nonlinear Problems* (Springer) §1-2 (affine-invariant Lipschitz theory, global Newton, and continuation)

Intuition Beginner

The previous units solved by trapping a root in a bracket and shrinking it, or by rewriting the problem as and iterating. Both are reliable, but the bracketing method crawls. There is a far faster idea hiding in the picture of the curve itself: instead of just checking signs, use the slope of the curve to aim your next guess straight at the root.

Here is the picture. Stand at your current guess on the curve . Draw the tangent line — the straight line that just grazes the curve there, matching its height and its slope. A straight line is something you can solve exactly: find where the tangent crosses the horizontal axis, and make that crossing your next guess. The tangent is a stand-in for the curve, and near the root the stand-in is so good that each step roughly doubles the number of correct digits. This is Newton's method.

Drawing a tangent needs the slope, and sometimes you do not have a formula for it. The fix is to use a nearby chord instead of the tangent: take your last two guesses, draw the straight line through those two points on the curve, and follow it to the axis. That is the secant method. It is almost as fast as Newton and never asks for the slope directly.

Speed has a price. Bisection could never fail once it had a bracket. Newton and the secant method are fast only when you start close enough; from a bad start they can overshoot, wander, or fly off entirely. The chapter's theme is exactly this trade: guaranteed-but-slow against fast-but-fragile.

Visual Beginner

The picture for Newton's method is the tangent staircase. Draw the curve and the horizontal axis. From your guess , go up (or down) to the curve, draw the tangent there, and slide along it to where it hits the axis: that landing point is . Repeat, and the landing points race toward the root where the curve crosses the axis.

The secant method replaces each tangent with the straight line through the two most recent points on the curve. You need two starting guesses instead of one, but you never compute a slope — the chord supplies it.

method what line is used what it needs speed near the root
bisection none (just signs) sign of one digit per step
secant chord through last two points only digits grow by about each step
Newton tangent at current point and its slope digits roughly double each step

The bottom two rows are the fast methods. The doubling of digits in Newton's row is what "quadratic convergence" means, and it traces back to the flat-derivative case of the fixed-point staircase from the previous unit.

Worked example Beginner

Let us find the square root of , about , as a root of by Newton's method. The slope is , so the tangent-crossing rule $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} = x_n - \frac{x_n^2 - 2}{2x_n} $$ simplifies to the averaging rule — the very map the previous unit found converged fast. Start from .

Step 1. . Already the error is .

Step 2. The error has dropped to about .

Step 3. The error is now about .

Step 4. Read off the accuracy. From to the number of correct digits went roughly — each step about doubled it.

What this tells us: Newton's tangent rule turned a slow problem into a three-step one, and the doubling of correct digits per step is the hallmark of the method. Bisection needed about fifty steps to reach this precision; Newton reaches it in four. The catch, which the formal sections make precise, is that this race only happens once the guesses are close enough to the root.

Check your understanding Beginner

Formal definition Intermediate+

Let be differentiable on an open interval containing a root with . The root is simple if . Newton's method (the Newton-Raphson iteration) generates iterates from a start by $$ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}, \qquad n = 0, 1, 2, \dots, $$ defined wherever . Geometrically is the root of the affine model , the tangent line — the first-order Taylor polynomial of at 02.05.02. Newton's method is the fixed-point iteration 43.02.02 for the Newton map $$ g(x) = x - \frac{f(x)}{f'(x)}. $$ Write for the error. The symbols (root), (error), (Newton/iteration map), (Jacobian), and (vector field) are recorded in _meta/NOTATION.md.

The secant method replaces the derivative by the slope of the chord through the two most recent iterates: given , $$ x_{n+1} = x_n - f(x_n),\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}, \qquad n = 1, 2, \dots $$ It uses two starting points and a single new evaluation of per step (the previous value is reused); it needs no .

For a system with and Jacobian , Newton's method for systems is $$ \mathbf x_{k+1} = \mathbf x_k - J(\mathbf x_k)^{-1} F(\mathbf x_k), $$ each step solving the linear system for the update . The simple-root condition becomes invertibility of , the nonsingularity hypothesis of the inverse function theorem 02.05.04 that makes the local solution and its differentiable dependence well defined; the multivariable Taylor expansion of 02.05.05 supplies the error analysis.

Recall the order of convergence 43.02.02: iterates converge with order if for a constant (with when ). Newton attains (quadratic); the secant attains (superlinear, between linear and quadratic).

Counterexamples to common slips Intermediate+

  • "Newton's method always converges." It converges quadratically only from a start sufficiently close to a simple root. For with root , starting too far out makes the tangent overshoot so badly that and the iterates diverge; for , the start produces the -cycle and never reaches the root.

  • "Newton needs a bracket like bisection." Newton uses no bracket and gives no sign guarantee; a single starting point and the derivative are its only inputs. This is the source of both its speed and its fragility — there is no interval keeping it honest.

  • "Newton is quadratic for every root." At a multiple root, where , the Newton map has for a root of multiplicity , so convergence drops to linear with rate . Quadratic order requires a simple root; the modified iteration restores it when is known.

  • "The secant method is linear because it lacks the derivative." Its order is , strictly above : superlinear. The chord slope converges to as the iterates close in, so the secant step approaches the Newton step, and the order falls just short of only because it uses a one-step-stale slope.

Key theorem with proof Intermediate+

The signature result is the local quadratic convergence of Newton's method at a simple root, because it is the single statement that certifies the method's defining feature — the doubling of correct digits — and exhibits Newton as the realisation of order in the fixed-point framework 43.02.02.

Theorem (local quadratic convergence of Newton's method). Let on an open interval containing a simple root $x^f(x^) = 0f'(x^) \ne 0\delta > 0x_0|x_0 - x^| \le \deltax^$, and satisfy* $$ \lim_{n\to\infty} \frac{e_{n+1}}{e_n^{,2}} = \frac{f''(x^)}{2,f'(x^)}, \qquad e_n = x_n - x^*, $$ so the convergence is quadratic.

Proof. Since is continuous and , choose with on ; the Newton map is then on . Differentiating, $$ g'(x) = 1 - \frac{f'(x)^2 - f(x)f''(x)}{f'(x)^2} = \frac{f(x),f''(x)}{f'(x)^2}. $$ At the root , so . By the local-convergence proposition for fixed-point iteration 43.02.02, with gives a closed interval about on which is a contraction mapping into itself, so the iterates converge to from any start in . Take with .

For the rate, expand at by Taylor's theorem with Lagrange remainder 02.05.02, evaluated at the root : there is between and with $$ 0 = f(x^) = f(x_n) + f'(x_n)(x^ - x_n) + \tfrac{1}{2}f''(\xi_n)(x^* - x_n)^2. $$ Divide by (nonzero on ) and rearrange, using : $$ \frac{f(x_n)}{f'(x_n)} - e_n = -\tfrac{1}{2}\frac{f''(\xi_n)}{f'(x_n)}e_n^2. $$ The left side is , so $$ e_{n+1} = \frac{f''(\xi_n)}{2,f'(x_n)},e_n^{,2}. $$ As , forces and ; continuity of and gives . The limit is finite and the order is exactly .

Bridge. This theorem is the foundational reason Newton's method is the fast end of the nonlinear-equations chapter: the computation shows Newton is exactly the fixed-point iteration of 43.02.02 engineered so the first derivative of the iteration map vanishes, and the order theorem of that unit then forces order — this is exactly the flat-staircase case promoted from a curiosity to the design principle. The error law is the central insight that makes the digit-doubling precise, and it builds toward the systems theorem where the scalar becomes a Jacobian-Lipschitz constant. The simple-root hypothesis is dual to the inverse-function nonsingularity 02.05.04, which is why the same proof generalises to , and the quadratic law appears again in 43.09.03 indirectly through Newton-based nonlinear solvers. Putting these together, Newton sits atop the order hierarchy bisection 43.02.01 anchors at the bottom, and the contrast — guaranteed linear halving versus local quadratic speed — is the comparison the chapter is built to make.

Exercises Intermediate+

Advanced results Master

Newton's method is the fixed-point iteration engineered so that the first derivative of the iteration map vanishes, and the results below place its scalar quadratic law inside the sharper semilocal and higher-dimensional theory: the affine-invariant Kantorovich theorem that converts data at the starting point into a convergence guarantee, the precise order of the secant family, the systems theorem with the Jacobian, and the global structure that the local theorem ignores.

Theorem 1 (Newton as the order- fixed-point map). For near a simple root , the Newton map satisfies , , and . By the order-of-convergence theorem 43.02.02, the first non-vanishing derivative of at has index , so the iteration has order exactly with asymptotic error constant . Newton is thus not a method apart from fixed-point iteration but its order- instance, obtained by choosing the multiplier in the reformulation so that [Süli-Mayers §1.4].

Theorem 2 (the secant family and the golden-ratio order). The secant error obeys , where are divided differences; as the prefactor tends to , so with . The ansatz forces , whose positive root is the order. The asymptotic relation records the superlinear rate. Per function evaluation the secant method is often more efficient than Newton: two secant steps cost two evaluations of and reduce the error by the factor of order in the exponent, beating Newton's two evaluations of and at one quadratic step when is expensive [Atkinson §2.4].

Theorem 3 (Newton for systems; the Newton-Kantorovich theorem). Let be with Lipschitz of constant on a ball, and let satisfy , , with . Then the Newton iterates stay in the ball, converge to a root , and converge quadratically when , with . This is a semilocal theorem — its hypotheses are quantities computable at , not at the unknown root — and it is affine invariant: replacing by for invertible leaves both the iteration and the hypothesis unchanged, since is unaltered. The invertibility of along the iterates is the inverse-function-theorem nonsingularity 02.05.04 that makes each linearised solve well posed [Ortega-Rheinboldt §12.6].

Theorem 4 (global structure: basins, overshoot, and complex dynamics). The local theorem says nothing about distant starts. Globally Newton's method is the discrete dynamical system , and for with several roots the plane (or line) partitions into basins of attraction, one per root, whose boundaries are generically fractal: for a polynomial with three or more roots in , the Newton map is a rational function whose Julia set is the common basin boundary, the historical seed of complex dynamics. On the line, overshoot ( small) can throw an iterate far from any root, attracting periodic cycles can trap it (as from ), and a near-horizontal tangent can send to numerical infinity. Globalisation strategies — damped Newton with a line-search step , trust regions, and homotopy/continuation — restore reliability while preserving the local quadratic phase once the iterates enter the basin [Deuflhard §3].

Synthesis. Newton's method is the foundational reason the order-of-convergence hierarchy has a fast top: it is exactly the fixed-point iteration 43.02.02 whose multiplier is chosen to kill the first derivative of the iteration map, and the order theorem then forces quadratic order — this is exactly the flat-staircase case of the previous unit promoted to a design principle. The central insight is the dichotomy this produces against bisection 43.02.01: bisection guarantees convergence and pays with a fixed linear rate , while Newton attains quadratic order and pays with the loss of any global guarantee, so robustness and speed are dual and the chapter's hybrids buy both by wrapping a Newton or secant step inside a bracket. The secant method is the derivative-free interpolation between them — order , costing one evaluation, its golden-ratio exponent the root of that records using a one-step-stale slope. Putting these together, the scalar quadratic law generalises through the multivariable Taylor expansion 02.05.05 to the systems theorem, where the simple-root condition becomes the inverse-function nonsingularity 02.05.04, and the Kantorovich theorem turns starting-point data into a guarantee; the same Newton solve appears again wherever a nonlinear equation must be cracked at speed, the bridge from this elementary tangent rule to large-scale scientific computing.

Full proof set Master

Proposition 1 (local quadratic convergence of scalar Newton). If near a simple root $x^\delta > 0x_0 \in [x^-\delta, x^+\delta]x^e_{n+1}/e_n^2 \to f''(x^)/(2f'(x^))$.

Proof. As in the Key theorem: and continuity of give an interval on which , so is there with and . The local-convergence proposition for fixed-point iteration 43.02.02 gives a closed interval on which is a contraction into itself, hence convergence from any start in ; pick with . Taylor's theorem with Lagrange remainder 02.05.02 gives with between and ; dividing by and using yields . Letting , and give the stated limit, order exactly .

Proposition 2 (secant order is the golden ratio). Under near a simple root, the secant error satisfies $e_{n+1} = \dfrac{f[x_{n-1},x_n,x^]}{f[x_{n-1},x_n]},e_n e_{n-1}\varphi = (1+\sqrt5)/2$.*

Proof. The secant iterate is the root of the linear interpolant of at . The interpolation error identity gives . Evaluate at : since and , subtracting and using gives, after rearrangement, , i.e. . By the mean-value property of divided differences and , so with . Writing and substituting forces the exponent equation , i.e. , whose positive root is .

Proposition 3 (quadratic convergence of Newton for systems). Let be with $J(\mathbf x^)JL\mathbf x^\mathbf x^\mathbf x_{k+1} = \mathbf x_k - J(\mathbf x_k)^{-1}F(\mathbf x_k)|\mathbf e_{k+1}| \le \gamma L|\mathbf e_k|^2\gamma = |J(\mathbf x^)^{-1}|$.

Proof. By continuity of and of matrix inversion, there is a ball about on which is invertible with . For , the integral form of the multivariable Taylor remainder 02.05.05 gives $$ F(\mathbf x_k) = F(\mathbf x_k) - F(\mathbf x^) = \int_0^1 J(\mathbf x^ + s,\mathbf e_k),\mathbf e_k,ds, $$ so , whose norm is at most by the Lipschitz bound (the displacement from to has length ). Since , $$ |\mathbf e_{k+1}| \le |J(\mathbf x_k)^{-1}|\cdot\tfrac12 L|\mathbf e_k|^2 \le \gamma L|\mathbf e_k|^2. $$ Shrinking so that makes strictly decrease, giving convergence; the bound is the quadratic law.

Proposition 4 (Newton drops to linear order at a multiple root). If $x^m \ge 2ff = (x-x^)^m hh(x^) \ne 0hg'(x^) = 1 - 1/mx_{n+1} = x_n - m f(x_n)/f'(x_n)$ restores quadratic order.

Proof. Write . Then , , so , which is near with value and derivative at equal to (differentiate and evaluate at ). Hence has for : linear convergence with that rate by the fixed-point order theory 43.02.02. For the modified map , , so its first non-vanishing derivative has index and the order returns to quadratic.

Connections Master

  • Fixed-point iteration 43.02.02 supplies the order-of-convergence machinery this unit instantiates: Newton's method is precisely the iteration for , and the computation at a simple root, combined with that unit's theorem that the order equals the index of the first non-vanishing derivative of , is what makes Newton quadratic. The secant method is the derivative-free quasi-fixed-point variant whose order is the golden ratio; Steffensen's method from that unit is the alternative derivative-free route to quadratic order.

  • The bisection method 43.02.01 anchors the bottom of the order hierarchy at guaranteed linear rate , and this unit anchors the top at conditional quadratic order. The contrast is the organising comparison of the chapter: bisection trades speed for an unconditional bracket guarantee, Newton trades the guarantee for quadratic speed, and the practical hybrids (Dekker, Brent) wrap a Newton or secant step inside a maintained bracket so that the fast step is taken when it lands inside and a bisection step is the fallback otherwise — combining this unit's speed with that unit's safety.

  • The inverse and implicit function theorems 02.05.04 provide the nonsingularity hypothesis that lifts Newton to systems: the condition is exactly the invertibility that makes the local solution branch of well defined and differentiable, the multivariable analogue of the simple-root condition . Multivariable Taylor expansion 02.05.05 supplies the integral remainder that turns the Jacobian-Lipschitz bound into the quadratic error law, so the systems theorem is the scalar proof re-run with replaced by a Jacobian-Lipschitz constant.

  • Conditioning and floating-point arithmetic 43.01.01 floor the attainable accuracy of Newton just as they do bisection: the quadratic law means the error squares each step until it reaches the level of rounding error in evaluating and , after which the iterates stagnate at a residual scaled by the conditioning of the root. The quadratic convergence makes Newton reach that floor in a handful of steps, so in practice the limiting factor is precision, not iteration count — the forward cross-reference to that chapter's accuracy analysis.

Historical & philosophical context Master

The tangent-line root-finder descends from Isaac Newton's De analysi (written 1669, published 1711) and Method of Fluxions, where he solved polynomial and transcendental equations by successive linear corrections, though his presentation was a sequence of algebraic substitutions rather than the derivative-based iteration taught today. Joseph Raphson's 1690 Analysis aequationum universalis gave the iteration in a form closer to the modern , which is why the scalar method carries both names [Raphson 1690]. Thomas Simpson in 1740 cast it in fluxional (derivative) language and extended it to systems of two equations, the first appearance of the Jacobian idea. The recognition that the method is the fixed-point iteration with a vanishing iteration-map derivative, and hence quadratic, belongs to the twentieth-century numerical-analysis synthesis.

The semilocal convergence theory — turning data computable at the starting point into a convergence and error guarantee — was established by Leonid Kantorovich in 1948 [Kantorovich 1948], whose theorem in Banach space subsumes the scalar and finite-dimensional cases and underlies the affine-invariant treatment of Deuflhard and others. Süli and Mayers present Newton as the order- realisation of their fixed-point framework with the secant method as the order- derivative-free variant [Süli-Mayers §1.4-1.5]; the multivariable theory with the Jacobian and the -order refinement received its definitive treatment from Ortega and Rheinboldt in 1970 [Ortega-Rheinboldt §10.2]. The fractal basin structure of Newton's method for complex polynomials, observed by Cayley in 1879 when he could solve the quadratic case but not the cubic, became the historical entry point to complex dynamics.

Bibliography Master

@book{sulimayers2003,
  author    = {S\"{u}li, Endre and Mayers, David F.},
  title     = {An Introduction to Numerical Analysis},
  publisher = {Cambridge University Press},
  year      = {2003}
}

@book{atkinson1989,
  author    = {Atkinson, Kendall E.},
  title     = {An Introduction to Numerical Analysis},
  edition   = {2},
  publisher = {John Wiley \& Sons},
  year      = {1989}
}

@book{ortegarheinboldt2000,
  author    = {Ortega, James M. and Rheinboldt, Werner C.},
  title     = {Iterative Solution of Nonlinear Equations in Several Variables},
  series    = {Classics in Applied Mathematics},
  volume    = {30},
  publisher = {SIAM},
  year      = {2000},
  note      = {Reprint of the 1970 Academic Press edition}
}

@book{deuflhard2004,
  author    = {Deuflhard, Peter},
  title     = {Newton Methods for Nonlinear Problems: Affine Invariance and Adaptive Algorithms},
  series    = {Springer Series in Computational Mathematics},
  volume    = {35},
  publisher = {Springer},
  year      = {2004}
}

@book{stoerbulirsch2002,
  author    = {Stoer, Josef and Bulirsch, Roland},
  title     = {Introduction to Numerical Analysis},
  edition   = {3},
  series    = {Texts in Applied Mathematics},
  volume    = {12},
  publisher = {Springer},
  year      = {2002}
}

@article{kantorovich1948,
  author  = {Kantorovich, Leonid V.},
  title   = {Functional analysis and applied mathematics},
  journal = {Uspekhi Matematicheskikh Nauk},
  volume  = {3},
  number  = {6},
  year    = {1948},
  pages   = {89--185}
}

@book{raphson1690,
  author    = {Raphson, Joseph},
  title     = {Analysis aequationum universalis},
  publisher = {Thomas Braddyll},
  address   = {London},
  year      = {1690}
}