43.08.04 · numerical-analysis / 08-interpolation-approximation

Best uniform approximation, minimax, and Chebyshev polynomials

shipped · 3 tiers · Lean: none

Anchor (Master): Cheney 1966 *Introduction to Approximation Theory* (McGraw-Hill / AMS Chelsea reprint) Ch. 1-3 (existence, the Haar condition, equioscillation, uniqueness, the Remez algorithm); DeVore-Lorentz 1993 *Constructive Approximation* (Springer Grundlehren 303) Ch. 3 and Ch. 7 (the Chebyshev systems, strong uniqueness, the extremal theory of Chebyshev polynomials); Trefethen 2019 *Approximation Theory and Approximation Practice* extended ed. (SIAM) Ch. 9-10, 16 (near-best Chebyshev interpolation versus true minimax, the Remez algorithm in practice)

Intuition Beginner

In the previous unit you learned that an interpolating polynomial hits the data points exactly but can swing badly in between, and that you only get to control where the error lives by choosing where to sample. This unit asks a sharper question. Forget hitting any point exactly. Among all polynomials of a fixed degree, which one stays closest to a target function across the whole interval — measuring "closest" by the single worst gap anywhere?

This is the minimax idea: minimise the maximum error. You imagine sliding a polynomial of the allowed degree around until its largest deviation from the target, taken over the entire interval, is as small as it can be. The polynomial that wins this contest is the best uniform approximation. It is the right notion when you care about a guarantee — when you need every point to be within some tolerance, not just most points on average.

There is a strikingly clean signature of the winner. When a polynomial is the best one, its error curve does not just stay small; it touches its largest size again and again, flipping sign each time, like a guitar string vibrating between two rails. The error rides up to the maximum, comes down through zero, rides down to the negative of that maximum, comes back up, and so on. This back-and-forth touching is called equioscillation, and it is the fingerprint of the optimal fit.

Why must the best fit wobble like that? Picture a fit that touches its peak error only on one side, say it pokes up high but never dips correspondingly low. Then you could push the whole polynomial up a little and shave that peak without creating a worse one elsewhere. So a one-sided error is never optimal — only a fit balanced between equal-and-opposite extremes cannot be improved.

The hero polynomials of this story are the Chebyshev polynomials. They are built to oscillate between $- 1$ and $+ 1$ as evenly as possible, and that even oscillation makes them the answer to many minimax questions at once — including the choice of interpolation nodes that finally cures the Runge wobble from the last unit.

Visual Beginner

Picture the error curve of a candidate fit drawn across the interval. For a poor fit the curve has one tall spike and is small elsewhere. For the best fit the curve looks like an even ripple: it rises to the same height, falls to the same depth, rises again, touching the top and bottom rails several times with the sign flipping at each touch.

The table below contrasts the two error-measuring philosophies you have now met.

| How you measure error | What the best fit does | Named tool |
| --- | --- | --- |
| worst gap anywhere (uniform) | error equioscillates between equal extremes | Chebyshev polynomials |
| total squared gap (least squares) | error is perpendicular to the fit space | orthogonal polynomials |

The whole point of the top row is the alternation: a fit whose error reaches its maximum size at several points with flipping signs cannot be nudged to do better, and that is exactly what "best in the worst-case sense" means.

Worked example Beginner

Let us find the best constant — a degree-zero polynomial $p (x) = c$ — that approximates $f (x) = x$ on the interval from $0$ to $1$ , measuring by the worst gap.

Step 1. Write the error. The error is $f (x) - c = x - c$ . As $x$ runs from $0$ to $1$ , this runs from $- c$ up to $1 - c$ .

Step 2. Find the worst gap for a given $c$ . The largest size of $x - c$ over the interval is whichever is bigger: $∣ - c ∣ = c$ at the left end, or $∣1 - c ∣$ at the right end. The worst gap is the larger of $c$ and $1 - c$ .

Step 3. Balance the two ends. The larger of $c$ and $1 - c$ is made as small as possible when the two are equal: $c = 1 - c$ , giving $c = 0.5$ . Then both ends have gap $0.5$ , and that is the smallest the worst gap can be.

Step 4. Check the signature. With $c = 0.5$ the error is $x - 0.5$ . At $x = 0$ it equals $- 0.5$ ; at $x = 1$ it equals $+ 0.5$ . The error touches its maximum size $0.5$ at two points, with opposite signs — it equioscillates at two points, exactly the fingerprint of a best fit at this degree.

What this tells us: the best uniform fit is not found by averaging or by matching any single point, but by balancing the error so it reaches equal-and-opposite extremes. Even at the lowliest degree, the equioscillation signature already appears.
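The balancing argument above can be checked numerically. The following is a minimal sketch, not part of the original unit: the helper name `worst_gap` and the grid sizes are illustrative assumptions. It scans candidate constants and reports the one with the smallest worst gap.

```python
# Brute-force check of the best constant for f(x) = x on [0, 1].
# `worst_gap` and the grid resolutions are illustrative choices, not from the unit.

def worst_gap(c, samples=1001):
    """Largest value of |x - c| over an evenly spaced grid on [0, 1]."""
    xs = [i / (samples - 1) for i in range(samples)]
    return max(abs(x - c) for x in xs)

# Candidate constants c = 0.00, 0.01, ..., 1.00.
candidates = [i / 100 for i in range(101)]
best_c = min(candidates, key=worst_gap)

print("best constant:", best_c)
print("worst gap at best constant:", worst_gap(best_c))
print("error at x = 0:", 0.0 - best_c, " error at x = 1:", 1.0 - best_c)
```

The two endpoint errors printed on the last line have equal size and opposite sign, which is the two-point equioscillation of Step 4.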

Check your understanding Beginner

Exercise (easy, multiple choice).

The Chebyshev polynomials are prized in this unit because they:

A. pass through every data point exactly

B. oscillate as evenly as possible between $- 1$ and $+ 1$

C. are the simplest polynomials with no real roots

D. always have positive coefficients

Hint

Think about the even-ripple picture and what makes a polynomial good at the worst-case game.

Answer

B. Chebyshev polynomials oscillate between $- 1$ and $+ 1$ with equal-height ripples, and that even oscillation is what makes them the answer to minimax questions. Feedback-correct: even oscillation is the equioscillation signature, which is why these polynomials are extremal. Feedback-wrong: the other options describe properties Chebyshev polynomials do not generally have or that are irrelevant to the worst-case fit.

Formal definition Intermediate+

Throughout, $f \in C [a, b]$ is a continuous real function, $Π_{n}$ is the space of real polynomials of degree at most $n$ in the sense of 43.08.01, and $∥ g ∥_{\infty} = max_{x \in [a, b]} ∣ g (x)∣$ is the supremum (uniform) norm. The contrast object throughout is the $L^{2}$ best approximation of 43.08.05, which uses the inner product of 02.11.07 in place of this norm.

Definition (minimax error and best uniform approximation). The minimax error (or best uniform approximation error) of $f$ from $Π_{n}$ is $$ E_n(f) = \inf_{p \in \Pi_n} \lVert f - p \rVert_\infty = \inf_{p \in \Pi_n}\ \max_{x \in [a,b]} \lvert f(x) - p(x) \rvert . $$ A polynomial $p_{n}^{*} \in Π_{n}$ attaining the infimum, $∥ f - p_{n}^{*} ∥_{\infty} = E_{n} (f)$ , is a best uniform approximation (or minimax polynomial) of $f$ in $Π_{n}$ .

Definition (alternation / equioscillation set). Let $r = f - p$ be the error of a candidate $p \in Π_{n}$ , and let $L = ∥ r ∥_{\infty}$ . A set of points $a \leq t_{0} < t_{1} < \dots < t_{m} \leq b$ is an alternation set of length $m + 1$ for $r$ if $∣ r (t_{i})∣ = L$ for every $i$ and the signs alternate: $r (t_{i + 1}) = - r (t_{i})$ for each $i$ , i.e. $r (t_{i}) = ε (- 1)^{i} L$ with $ε \in \{- 1, + 1\}$ fixed. The error equioscillates on $m + 1$ points when such a set exists.

Definition (Chebyshev polynomials of the first kind). The Chebyshev polynomial of the first kind $T_{n}$ is the unique polynomial with $T_{n} (cos θ) = cos (n θ)$ for all $θ$ . The substitution $x = cos θ$ makes this an identity on $[- 1, 1]$ ; it extends to a genuine polynomial through the recurrence $$ T_0 = 1, \quad T_1 = x, \quad T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x) \quad (n \geq 1), $$ which follows from $cos ((n + 1) θ) + cos ((n - 1) θ) = 2 cos θ cos (n θ)$ . Then $T_{n}$ has degree $n$ , leading coefficient $2^{n - 1}$ for $n \geq 1$ , satisfies $∥ T_{n} ∥_{\infty} = 1$ on $[- 1, 1]$ , and attains $\pm 1$ alternately at the $n + 1$ Chebyshev extreme points $x_{k} = cos (k π / n)$ , $k = 0, \dots, n$ , with $T_{n} (x_{k}) = (- 1)^{k}$ . Its $n$ roots, the Chebyshev points, are $ξ_{j} = cos (\frac{( 2 j + 1 ) π}{2 n})$ , $j = 0, \dots, n - 1$ .
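The recurrence and the listed properties can be spot-checked numerically. This is a small sketch under stated assumptions: the function name `chebyshev_T`, the degree `n = 5`, and the sample grid are illustrative choices, not part of the unit.

```python
import math

def chebyshev_T(n, x):
    """Evaluate T_n(x) with the three-term recurrence T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t_curr = 1.0, x          # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * x * t_curr - t_prev
    return t_curr

n = 5  # illustrative degree

# Identity T_n(cos theta) = cos(n theta) on a sample of angles in [0, pi].
thetas = [k * math.pi / 40 for k in range(41)]
max_identity_gap = max(
    abs(chebyshev_T(n, math.cos(t)) - math.cos(n * t)) for t in thetas
)

# Values at the n + 1 extreme points x_k = cos(k pi / n): expected (-1)^k.
extreme_values = [chebyshev_T(n, math.cos(k * math.pi / n)) for k in range(n + 1)]

# Values at the n roots xi_j = cos((2j + 1) pi / (2n)): expected 0.
root_values = [
    chebyshev_T(n, math.cos((2 * j + 1) * math.pi / (2 * n))) for j in range(n)
]

print(max_identity_gap)
print([round(v, 12) for v in extreme_values])
print([round(v, 12) for v in root_values])
```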

The symbols $Π_{n}$ , $\prod$ , the supremum norm $∥ \cdot ∥_{\infty}$ , the node polynomial $π_{n + 1}$ , and $C [a, b]$ are recorded in _meta/NOTATION.md; the node polynomial and the minimax error $E_{n} (f)$ are inherited from 43.08.02, where $E_{n} (f)$ governs the near-best interpolation bound.

Counterexamples to common slips Intermediate+

"The best uniform approximation is the interpolant at any $n + 1$ nodes." It is not. Interpolation forces $n + 1$ zeros of the error and pins the error to zero there; the minimax error instead equioscillates, attaining its maximum modulus with alternating signs at $n + 2$ points. The Chebyshev interpolant is near-minimax, within a factor $1 + Λ_{n}$ of $E_{n} (f)$ with $Λ_{n} \sim (2/ π) \log n$ (43.08.02), but it is not the true minimax polynomial.
"Equioscillation on $n + 1$ points characterises the best approximation." The correct count is $n + 2$ alternation points for $Π_{n}$ . The extra point is what makes the de la Vallée-Poussin and exchange arguments work; $n + 1$ alternations is one short and does not force optimality.
"The Chebyshev polynomial $T_{n}$ itself is the minimal-sup-norm polynomial." The monic rescaling $2^{1 - n} T_{n}$ is the minimal-sup-norm element among monic degree- $n$ polynomials on $[- 1, 1]$ ; $T_{n}$ has leading coefficient $2^{n - 1}$ and sup norm $1$ , but the extremal statement is about the monic normalisation, with minimal deviation $2^{1 - n}$ .
"Best $L^{2}$ and best $L^{\infty}$ approximation coincide." They differ. The $L^{2}$ projection of 43.08.05 makes the residual orthogonal to $Π_{n}$ and minimises a weighted average of the squared error; the $L^{\infty}$ best approximation makes the error equioscillate and minimises the worst case. They agree only in special cases (e.g. the best constant for $f (x) = x$ on $[0, 1]$ with unit weight, where the mean value and the midrange are both $1/2$ ); in general the minimax polynomial is not any orthogonal projection.

Key theorem with proof Intermediate+

The signature result is the Chebyshev equioscillation theorem: a polynomial is the best uniform approximation if and only if its error reaches the maximum size at $n + 2$ points with alternating signs. Necessity is a perturbation argument (if the alternation is too short, a correcting polynomial shaves the error), and sufficiency is a clean lower bound. The proof follows Süli-Mayers ^{[Süli-Mayers §8.3]} and Cheney ^{[Cheney Ch. 3]}.

Theorem (Chebyshev equioscillation / alternation). Let $f \in C [a, b]$ and $p \in Π_{n}$ , with error $r = f - p$ and $L = ∥ r ∥_{\infty}$ . Then $p$ is the best uniform approximation of $f$ from $Π_{n}$ if and only if $r$ has an alternation set of length at least $n + 2$ — that is, there exist $n + 2$ points $t_{0} < \dots < t_{n + 1}$ in $[a, b]$ with $r (t_{i}) = ε (- 1)^{i} L$ for a fixed sign $ε$ . The best approximation is unique.

Proof. Assume first that $r$ equioscillates on $n + 2$ points $t_{0} < \dots < t_{n + 1}$ , and suppose for contradiction that some $q \in Π_{n}$ does strictly better: $∥ f - q ∥_{\infty} = L^{'} < L$ . Consider $d = q - p = (f - p) - (f - q) = r - (f - q)$ . At each $t_{i}$ , $∣(f - q) (t_{i})∣ \leq L^{'} < L = ∣ r (t_{i})∣$ , so the sign of $d (t_{i}) = r (t_{i}) - (f - q) (t_{i})$ equals the sign of $r (t_{i})$ , which alternates with $i$ . A continuous function changing sign across each of the $n + 1$ gaps between the $t_{i}$ has at least $n + 1$ zeros; but $d \in Π_{n}$ is a polynomial of degree $\leq n$ , so $d \equiv 0$ , i.e. $q = p$ , contradicting $L^{'} < L$ . Hence no $q$ beats $p$ : $p$ is a best approximation.

For the converse, assume $p$ is a best approximation but $r$ has a longest alternation set of length $m + 1 \leq n + 1$ (so $m \leq n$ ). Partition $[a, b]$ by the points where $r$ attains $\pm L$ into maximal runs of constant extremal sign; there are $m + 1$ such runs, separated by $m$ interior points $s_{1} < \dots < s_{m}$ at which the extremal sign of $r$ flips (each $s_{j}$ lies strictly between a $+ L$ block and a $- L$ block and may be chosen with $∣ r (s_{j})∣ < L$ ). Form the *correcting polynomial* $$ \sigma(x) = \varepsilon \prod_{j=1}^{m}(s_j - x), $$ of degree $m \leq n$ , choosing the sign $ε \in \{\pm 1\}$ so that $σ$ carries the same sign as $r$ on each run (it switches sign exactly at the $s_{j}$ , matching the flips of $r$ ). For small $δ > 0$ the perturbed error $f - (p + δ σ) = r - δ σ$ has strictly smaller maximum modulus: on each extremal run $r$ and $δ σ$ share sign, so subtracting pulls $∣ r ∣$ below $L$ there, while away from the runs $∣ r ∣$ is already bounded away from $L$ and $δ$ is taken small enough that $δ ∥ σ ∥_{\infty}$ does not overshoot. Then $∥ f - (p + δ σ) ∥_{\infty} < L$ , contradicting optimality of $p$ . So the alternation length is at least $n + 2$ .

Uniqueness: suppose $p$ and $\tilde{p}$ are both best, each with $∥ f - p ∥_{\infty} = ∥ f - \tilde{p} ∥_{\infty} = E_{n} (f) = L$ . Their average $\bar{p} = \frac{1}{2} (p + \tilde{p}) \in Π_{n}$ also satisfies $∥ f - \bar{p} ∥_{\infty} \leq \frac{1}{2} ∥ f - p ∥_{\infty} + \frac{1}{2} ∥ f - \tilde{p} ∥_{\infty} = L$ , so $\bar{p}$ is best too and its error equioscillates on $n + 2$ points $t_{i}$ . At each $t_{i}$ , $∣(f - \bar{p}) (t_{i})∣ = L$ forces $(f - p) (t_{i}) = (f - \tilde{p}) (t_{i}) = (f - \bar{p}) (t_{i})$ , since two numbers of modulus $\leq L$ whose average has modulus $L$ must both equal that average. Thus $p - \tilde{p}$ vanishes at the $n + 2$ points $t_{i}$ ; a degree- $\leq n$ polynomial with $n + 2$ zeros is identically zero, so $p = \tilde{p}$ . $□$

Bridge. This theorem is the foundational reason the minimax problem is solvable in closed structural terms: optimality is not an inscrutable minimisation but the visible geometric fact that the error equioscillates on $n + 2$ points, and the correcting-polynomial perturbation is exactly the construction that shows any shorter alternation can be improved. The result builds toward the extremal characterisation of the Chebyshev polynomials in the Advanced section, where $2^{1 - n} T_{n}$ is the minimal-deviation monic polynomial precisely because $T_{n}$ equioscillates on $n + 1$ points, and it appears again in the node-optimisation of 43.08.02, where minimising the node polynomial $π_{n + 1}$ in the sup norm is the same alternation problem with $f \equiv 0$ and a leading-coefficient constraint. The alternation count $n + 2$ generalises the two-point balance of the Beginner constant fit, and the perturbation mechanism is dual to the orthogonality condition of 43.08.05: putting these together, both the $L^{\infty}$ and $L^{2}$ best approximations are characterised by an orthogonality-type condition on the residual — sign-alternation against $Π_{n}$ in the uniform case, perpendicularity to $Π_{n}$ in the quadratic case — which is exactly the central insight unifying the two best-approximation theories of this chapter.
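A small worked instance may help fix the alternation count. It is an illustration added here, using only the definitions above: take $f (x) = x^{2}$ on $[- 1, 1]$ and $n = 1$ , and test the candidate $p (x) = \tfrac{1}{2}$ .

$$ r(x) = x^{2} - \tfrac{1}{2} = \tfrac{1}{2}\,T_{2}(x), \qquad r(-1) = +\tfrac{1}{2}, \quad r(0) = -\tfrac{1}{2}, \quad r(1) = +\tfrac{1}{2}, \qquad \lVert r \rVert_\infty = \tfrac{1}{2}. $$

The error attains its maximum modulus at $n + 2 = 3$ points with alternating signs, so the theorem certifies that the constant $\tfrac{1}{2}$ is the best approximation to $x^{2}$ from $Π_{1}$ , with $E_{1} (x^{2}) = \tfrac{1}{2}$ . No linear term helps, which is consistent with $f$ being even on a symmetric interval.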

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Prove the de la Vallée-Poussin lower bound: if $q \in Π_{n}$ and there are $n + 2$ points $t_{0} < \dots < t_{n + 1}$ where the error $f - q$ takes alternating signs (not necessarily of equal size), then $E_{n} (f) \geq min_{i} ∣(f - q) (t_{i})∣$ .

Hint

Suppose $p^{*}$ is the best approximation with $∥ f - p^{*} ∥_{\infty} = E_{n} (f) < min_{i} ∣(f - q) (t_{i})∣$ . Examine the sign of $q - p^{*} = (f - p^{*}) - (f - q)$ at the $t_{i}$ .

Answer

Let $m = min_{i} ∣(f - q) (t_{i})∣$ and suppose for contradiction $E_{n} (f) < m$ , so a best $p^{*} \in Π_{n}$ has $∣(f - p^{*}) (t_{i})∣ \leq E_{n} (f) < m \leq ∣(f - q) (t_{i})∣$ at every $t_{i}$ . Then $d = q - p^{*} = (f - p^{*}) - (f - q)$ has, at each $t_{i}$ , the sign of $- (f - q) (t_{i})$ , because the $(f - p^{*})$ term is strictly smaller in modulus. Since $f - q$ alternates sign across the $t_{i}$ , so does $d$ , giving $d$ a sign change in each of the $n + 1$ gaps and hence $\geq n + 1$ zeros. But $d \in Π_{n}$ , forcing $d \equiv 0$ , which contradicts $∣ d (t_{i})∣ \geq m - E_{n} (f) > 0$ . So $E_{n} (f) \geq m$ . Rubric: full credit for the sign-domination at the $t_{i}$ , the $n + 1$ sign changes, and the degree contradiction.
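The bound is easy to see in action. The sketch below is an illustration under stated assumptions: the target $f = e^{x}$ on $[- 1, 1]$ , the trial line `q`, and the three reference points are hypothetical choices made for the example, and the upper bound is a grid estimate of the sup norm.

```python
import math

def f(x):
    return math.exp(x)

def q(x):
    # Hypothetical trial polynomial in Pi_1 (not the minimax line).
    return 1.25 + 1.2 * x

def error(x):
    return f(x) - q(x)

# Three reference points (n + 2 with n = 1), chosen by hand for this example.
reference = [-1.0, 0.2, 1.0]
values = [error(t) for t in reference]

# The bound needs alternating signs at consecutive reference points.
alternates = all(values[i] * values[i + 1] < 0 for i in range(len(values) - 1))

# de la Vallee-Poussin lower bound on E_1(f).
lower = min(abs(v) for v in values)

# Upper bound: E_1(f) <= ||f - q||, estimated here on a fine grid.
grid = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
upper = max(abs(error(x)) for x in grid)

print("signs alternate:", alternates)
print("bracket for E_1(f):", lower, "<= E_1(f) <=", upper)
```

Any trial polynomial whose error alternates in sign at $n + 2$ points yields such a bracket; the closer the error values are to equal size, the tighter the bracket.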

Exercise 5 (medium, symbolic).

Show that the Chebyshev polynomial $T_{n}$ equioscillates on $[- 1, 1]$ : it attains $+ 1$ and $- 1$ alternately at the $n + 1$ points $x_{k} = cos (k π / n)$ , and $∥ T_{n} ∥_{\infty} = 1$ .

Hint

Use $T_{n} (cos θ) = cos (n θ)$ . At $θ_{k} = k π / n$ , evaluate $cos (n θ_{k})$ . Bound $∣ cos (n θ)∣$ for all $θ$ .

Answer

For $x = cos θ$ with $θ \in [0, π]$ , $T_{n} (x) = cos (n θ)$ , so $∣ T_{n} (x)∣ = ∣ cos (n θ)∣ \leq 1$ , giving $∥ T_{n} ∥_{\infty} \leq 1$ . At $x_{k} = cos (k π / n)$ , i.e. $θ_{k} = k π / n$ , $T_{n} (x_{k}) = cos (n \cdot k π / n) = cos (k π) = (- 1)^{k}$ , so the value is $+ 1$ for even $k$ and $- 1$ for odd $k$ , alternating across $k = 0, 1, \dots, n$ . These $n + 1$ points have $∣ T_{n} ∣ = 1$ , so the bound is attained and $∥ T_{n} ∥_{\infty} = 1$ , and the alternation has length $n + 1$ . Rubric: full credit for the modulus bound, the value $(- 1)^{k}$ at $x_{k}$ , and the alternation conclusion.

Exercise 7 (hard, symbolic).

Prove the minimal-deviation theorem: among all monic polynomials of degree $n$ on $[- 1, 1]$ , the polynomial $\tilde{T}_{n} = 2^{1 - n} T_{n}$ uniquely minimises the sup norm, with $∥ \tilde{T}_{n} ∥_{\infty} = 2^{1 - n}$ .

Hint

$\tilde{T}_{n}$ is monic (leading coefficient $2^{n - 1} \cdot 2^{1 - n} = 1$ ) and equioscillates at $n + 1$ points. Suppose a monic $q$ has $∥ q ∥_{\infty} < 2^{1 - n}$ ; examine $\tilde{T}_{n} - q$ at the extreme points.

Answer

$\tilde{T}_{n} = 2^{1 - n} T_{n}$ has leading coefficient $2^{1 - n} \cdot 2^{n - 1} = 1$ , so it is monic, and $∥ \tilde{T}_{n} ∥_{\infty} = 2^{1 - n} ∥ T_{n} ∥_{\infty} = 2^{1 - n}$ , attained with alternating sign at the $n + 1$ extreme points $x_{k} = cos (k π / n)$ , where $\tilde{T}_{n} (x_{k}) = 2^{1 - n} (- 1)^{k}$ . Suppose a monic $q \in Π_{n}$ had $∥ q ∥_{\infty} < 2^{1 - n}$ . Then $d = \tilde{T}_{n} - q$ has degree $\leq n - 1$ (the leading terms cancel, both monic), and at each $x_{k}$ , $d (x_{k}) = \tilde{T}_{n} (x_{k}) - q (x_{k})$ has the sign of $\tilde{T}_{n} (x_{k}) = 2^{1 - n} (- 1)^{k}$ , since $∣ q (x_{k})∣ \leq ∥ q ∥_{\infty} < 2^{1 - n} = ∣ \tilde{T}_{n} (x_{k})∣$ . So $d$ alternates sign across the $n + 1$ points $x_{k}$ , giving $\geq n$ sign changes and $\geq n$ zeros; but $\deg d \leq n - 1$ forces $d \equiv 0$ , i.e. $q = \tilde{T}_{n}$ , contradicting $∥ q ∥_{\infty} < 2^{1 - n} = ∥ \tilde{T}_{n} ∥_{\infty}$ . Hence $\tilde{T}_{n}$ is the unique minimiser. Rubric: full credit for the monic-cancellation degree drop, the sign-alternation at the $x_{k}$ , and the contradiction.
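For a concrete check of the statement (an illustration added here, computed from the recurrence), take $n = 3$ . The recurrence gives $T_{3} (x) = 4 x^{3} - 3 x$ , so

$$ \tilde{T}_{3}(x) = 2^{-2}\,T_{3}(x) = x^{3} - \tfrac{3}{4}x, \qquad \tilde{T}_{3}(1) = \tfrac{1}{4}, \quad \tilde{T}_{3}\!\left(\tfrac{1}{2}\right) = -\tfrac{1}{4}, \quad \tilde{T}_{3}\!\left(-\tfrac{1}{2}\right) = \tfrac{1}{4}, \quad \tilde{T}_{3}(-1) = -\tfrac{1}{4}. $$

The four points $x_{k} = cos (k π / 3)$ carry values $2^{- 2} (- 1)^{k}$ , matching $∥ \tilde{T}_{3} ∥_{\infty} = 2^{1 - 3} = \tfrac{1}{4}$ . By comparison the monic polynomial $x^{3}$ has sup norm $1$ on $[- 1, 1]$ , four times larger.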

Exercise 8 (hard, symbolic).

Deduce that the monic node polynomial $π_{n + 1} (x) = \prod_{k = 0}^{n} (x - x_{k})$ on $[- 1, 1]$ has minimal sup norm exactly when its nodes are the roots of $T_{n + 1}$ , and state the consequence for the interpolation-error bound of 43.08.02.

Hint

$π_{n + 1}$ is monic of degree $n + 1$ ; the minimal-deviation theorem (Exercise 7) at degree $n + 1$ identifies the unique minimiser as $2^{- n} T_{n + 1}$ , whose roots are the Chebyshev points.

Answer

The node polynomial $π_{n + 1} = \prod_{k = 0}^{n} (x - x_{k})$ is monic of degree $n + 1$ . By the minimal-deviation theorem applied at degree $n + 1$ (Exercise 7 with $n \to n + 1$ ), the unique monic polynomial of degree $n + 1$ of least sup norm on $[- 1, 1]$ is $2^{- n} T_{n + 1}$ , with $∥ 2^{- n} T_{n + 1} ∥_{\infty} = 2^{- n}$ . Since a monic polynomial is determined by its roots, $π_{n + 1}$ achieves this minimum exactly when its roots $x_{0}, \dots, x_{n}$ are the $n + 1$ roots of $T_{n + 1}$ , the Chebyshev points $ξ_{j} = cos (\frac{( 2 j + 1 ) π}{2 ( n + 1 )})$ . Consequence: in the interpolation-error corollary $∥ f - p_{n} ∥_{\infty} \leq \frac{M _{n + 1}}{( n + 1 )!} ∥ π_{n + 1} ∥_{\infty}$ of 43.08.02, choosing Chebyshev nodes drives the controllable factor down to its minimum $2^{- n}$ , in contrast to the exponentially larger equispaced value, which is the cure for the Runge phenomenon. Rubric: full credit for identifying $2^{- n} T_{n + 1}$ as the minimiser, the root condition on the nodes, and the error-bound consequence.
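The size gap between the two node choices can be measured directly. This is a rough numerical sketch, with the degree `n = 10`, the helper name `node_poly_sup`, and the sampling grid all chosen for illustration; the sup norm is estimated on a grid, so the values are approximations from below.

```python
import math

def node_poly_sup(nodes, samples=4001):
    """Grid estimate of max over [-1, 1] of |prod_k (x - x_k)|."""
    best = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        prod = 1.0
        for xk in nodes:
            prod *= (x - xk)
        best = max(best, abs(prod))
    return best

n = 10  # interpolation degree; n + 1 nodes

# Roots of T_{n+1} (Chebyshev points) versus equally spaced nodes.
cheb_nodes = [math.cos((2 * j + 1) * math.pi / (2 * (n + 1))) for j in range(n + 1)]
equi_nodes = [-1.0 + 2.0 * k / n for k in range(n + 1)]

cheb_sup = node_poly_sup(cheb_nodes)
equi_sup = node_poly_sup(equi_nodes)

print("Chebyshev nodes :", cheb_sup, "(theory: 2**-n =", 2.0 ** -n, ")")
print("equispaced nodes:", equi_sup)
print("ratio           :", equi_sup / cheb_sup)
```

The Chebyshev value reproduces $2^{- n}$ because the maximum of $∣ T_{n + 1} ∣$ is attained at the endpoints $\pm 1$ , which lie on the sampling grid.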

Advanced results Master

The elementary theory establishes existence, the alternation characterisation, and uniqueness; the master layer concerns the abstract Haar-system setting that isolates why polynomials behave well, the strong form of uniqueness, the precise sense in which Chebyshev interpolation is near-minimax, and the Remez algorithm that computes the true minimax polynomial.

Theorem 1 (existence of a best approximation). For $f \in C [a, b]$ and any $n$ , the infimum $E_{n} (f) = \inf_{p \in Π_{n}} ∥ f - p ∥_{\infty}$ is attained. The map $p \mapsto ∥ f - p ∥_{\infty}$ is continuous and coercive on the finite-dimensional $Π_{n}$ (it tends to infinity as $∥ p ∥ \to \infty$ , since $∥ f - p ∥_{\infty} \geq ∥ p ∥_{\infty} - ∥ f ∥_{\infty}$ and norms on $Π_{n}$ are equivalent), so it attains its minimum on the compact sublevel set $\{p : ∥ f - p ∥_{\infty} \leq ∥ f ∥_{\infty}\}$ ^{[Cheney Ch. 1]}. Existence requires only that $Π_{n}$ be a finite-dimensional subspace of a normed space; no Haar condition is needed for existence, only for the alternation characterisation and uniqueness.

Theorem 2 (Haar systems and the Chebyshev characterisation). A system $g_{0}, \dots, g_{n} \in C [a, b]$ is a Haar (Chebyshev) system if every nonzero generalised polynomial $\sum_{k} c_{k} g_{k}$ has at most $n$ distinct zeros in $[a, b]$ , equivalently if every $(n + 1) \times (n + 1)$ collocation matrix $[g_{j} (x_{i})]$ is nonsingular for distinct nodes — the unisolvence of 43.08.01. The monomials $1, x, \dots, x^{n}$ form a Haar system on any interval. For approximation from a Haar subspace $G = \operatorname{span} \{g_{k}\}$ , the equioscillation theorem holds verbatim: $p \in G$ is the best uniform approximation of $f$ iff $f - p$ has an alternation set of length $\geq n + 2$ , and the best approximation is unique ^{[Cheney Ch. 3]}. The Haar condition is exactly the hypothesis that makes the correcting-polynomial perturbation available: a generalised polynomial can be built with prescribed sign changes at up to $n$ interior points.

Theorem 3 (strong uniqueness and Lipschitz dependence). For a Haar subspace, the best approximation is strongly unique: there is a constant $γ = γ (f) > 0$ with $$ \lVert f - p \rVert_\infty \ge \lVert f - p_n^* \rVert_\infty + \gamma\,\lVert p - p_n^* \rVert_\infty \quad \text{for all } p \in G, $$ a lower bound that grows linearly, not merely quadratically, in the distance from $p_{n}^{*}$ , which ordinary uniqueness does not provide ^{[DeVore-Lorentz Ch. 3]}. Strong uniqueness forces the best-approximation operator $f \mapsto p_{n}^{*}$ to be locally Lipschitz (indeed pointwise-Lipschitz with constant $2/ γ$ ), in sharp contrast to the linear $L^{2}$ projection of 43.08.05: the minimax operator is nonlinear, since $p_{n}^{*} (f + g) \neq p_{n}^{*} (f) + p_{n}^{*} (g)$ in general, yet it is stable. The nonlinearity is the price of the uniform norm's lack of an inner product; strong uniqueness is the compensation that keeps the problem well-posed.
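The nonlinearity already shows at degree zero. As an illustration added here: on $[0, 1]$ the two-point balance of the Beginner example gives the best constant of a continuous function as the midpoint of its range, $p_{0}^{*} (f) = \tfrac{1}{2} (max f + min f)$ . Taking $f (x) = x$ and $g (x) = - x^{2}$ ,

$$ p_0^*(f) = \tfrac{1}{2}, \qquad p_0^*(g) = -\tfrac{1}{2}, \qquad p_0^*(f + g) = \tfrac{1}{2}\left(\tfrac{1}{4} + 0\right) = \tfrac{1}{8} \;\neq\; 0 = p_0^*(f) + p_0^*(g), $$

since $f + g = x - x^{2}$ ranges over $[0, \tfrac{1}{4}]$ on $[0, 1]$ . The $L^{2}$ best constant, being a mean value, would add exactly.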

Theorem 4 (near-minimax Chebyshev interpolation). Let $p_{n}^{Cheb}$ interpolate $f$ at the $n + 1$ Chebyshev points on $[- 1, 1]$ . Then $$ \lVert f - p_n^{\mathrm{Cheb}} \rVert_\infty \le (1 + \Lambda_n)\,E_n(f), \qquad \Lambda_n \sim \tfrac{2}{\pi}\log n , $$ so Chebyshev interpolation is within a factor $\frac{2}{π} \log n + O (1)$ of the true minimax error; for $n = 100$ the Lebesgue constant $Λ_{n}$ is below $4$ , so the factor $1 + Λ_{n}$ is below $5$ ^{[Trefethen Ch. 16]}. This is the quantitative payoff of 43.08.02: the Lebesgue constant for Chebyshev nodes grows only logarithmically, where for equispaced nodes it grows faster than any power of $n$ . Chebyshev interpolation therefore delivers near-optimal uniform approximation by a linear projection that requires no iteration, recovering most of the benefit of the minimax polynomial at a fraction of the cost; the true minimax polynomial $p_{n}^{*}$ , which the next theorem computes, improves on it by at most that logarithmic factor.
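The contrast in Lebesgue constants can be estimated with a direct evaluation of the Lebesgue function $\sum_{k} ∣ \ell_{k} (x)∣$ . The following is a sketch only: the degree `n = 10`, the helper name `lebesgue_constant`, and the grid are illustrative assumptions, and a grid maximum slightly underestimates the true constant.

```python
import math

def lebesgue_constant(nodes, samples=2001):
    """Grid estimate of max over [-1, 1] of sum_k |l_k(x)| (Lagrange basis on `nodes`)."""
    worst = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        total = 0.0
        for k, xk in enumerate(nodes):
            basis = 1.0
            for j, xj in enumerate(nodes):
                if j != k:
                    basis *= (x - xj) / (xk - xj)
            total += abs(basis)
        worst = max(worst, total)
    return worst

n = 10  # interpolation degree; n + 1 nodes

cheb_nodes = [math.cos((2 * j + 1) * math.pi / (2 * (n + 1))) for j in range(n + 1)]
equi_nodes = [-1.0 + 2.0 * k / n for k in range(n + 1)]

cheb_lebesgue = lebesgue_constant(cheb_nodes)
equi_lebesgue = lebesgue_constant(equi_nodes)

print("Chebyshev points :", cheb_lebesgue)
print("equispaced points:", equi_lebesgue)
print("(2/pi) log(n + 1):", (2.0 / math.pi) * math.log(n + 1))
```

Increasing `n` in this sketch shows the Chebyshev value creeping up slowly while the equispaced value grows rapidly, which is the behaviour the theorem quantifies.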

Theorem 5 (Remez exchange algorithm). The minimax polynomial is computed by the Remez exchange algorithm: start with a reference of $n + 2$ points $t_{0} < \dots < t_{n + 1}$ ; solve the square linear system $$ f(t_i) - \sum_{j=0}^{n} a_j\, t_i^{\,j} = (-1)^i\,h, \qquad i = 0, \dots, n+1, $$ for the $n + 1$ coefficients $a_{j}$ and the levelled error $h$ (one equation per reference point, $n + 2$ equations in $n + 2$ unknowns, solvable because the reference is a Haar collocation augmented by the alternating $\pm h$ column); then exchange the reference for the points where the current error $f - p$ attains its local extrema, preserving the alternation pattern. The iteration increases $∣ h ∣$ monotonically toward $E_{n} (f)$ , sandwiched below by the de la Vallée-Poussin bound $∣ h ∣ \leq E_{n} (f) \leq ∥ f - p ∥_{\infty}$ , and converges quadratically once the reference is near the true alternation set ^{[DeVore-Lorentz Ch. 7]}. Each iteration's $∣ h ∣$ is a guaranteed lower bound on $E_{n} (f)$ and $∥ f - p ∥_{\infty}$ a guaranteed upper bound, so the algorithm self-certifies its accuracy.
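The exchange loop can be sketched in a few dozen lines. This is a simplified illustration, not a production implementation: it uses the monomial basis, a fixed sampling grid to locate extrema, and a whole-reference exchange that takes one extremum per sign run of the error, stopping if the number of runs is not $n + 2$ . The function names, grid size, and the test function $e^{x}$ with $n = 1$ are assumptions made for the example.

```python
import math

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting; returns x with A x = b."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def poly_eval(coeffs, x):
    """Horner evaluation; coeffs[j] multiplies x**j."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def remez(f, n, a=-1.0, b=1.0, iterations=10, samples=2001, tol=1e-10):
    """Return (coeffs, |h|, grid sup of |f - p|) after the exchange iteration."""
    # Initial reference: n + 2 Chebyshev extreme points mapped to [a, b], increasing.
    ref = [0.5 * (a + b) - 0.5 * (b - a) * math.cos(k * math.pi / (n + 1))
           for k in range(n + 2)]
    grid = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    for _ in range(iterations):
        # Levelled system: sum_j a_j t_i^j + (-1)^i h = f(t_i).
        A = [[t ** j for j in range(n + 1)] + [(-1.0) ** i] for i, t in enumerate(ref)]
        sol = solve_linear(A, [f(t) for t in ref])
        coeffs, h = sol[:n + 1], sol[n + 1]
        err = [f(x) - poly_eval(coeffs, x) for x in grid]
        sup = max(abs(e) for e in err)
        # Exchange: one point of largest |error| from each run of constant sign.
        new_ref, start = [], 0
        for i in range(1, samples + 1):
            if i == samples or (err[i] >= 0) != (err[start] >= 0):
                k = max(range(start, i), key=lambda idx: abs(err[idx]))
                new_ref.append(grid[k])
                start = i
        if len(new_ref) != n + 2 or sup - abs(h) < tol:
            break
        ref = new_ref
    return coeffs, abs(h), sup

coeffs, levelled, sup_err = remez(math.exp, 1)
print("coefficients (a_0, a_1):", coeffs)
print("bracket:", levelled, "<= E_1(exp) <=", sup_err)
```

The two printed numbers are the self-certifying bracket described in the theorem; on a sampled grid the upper value is only an estimate of the sup norm, so the bracket is as trustworthy as the grid is fine.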

Synthesis. The equioscillation theorem is the foundational reason best uniform approximation is a structural rather than a merely variational subject: optimality is the geometric fact that the error alternates on $n + 2$ points, and every other result is read off it. The central insight is that the uniform norm replaces the inner-product orthogonality of 43.08.05 by sign-alternation against the approximating space, so the best-approximation operator is nonlinear yet strongly unique — this is exactly the trade the uniform norm imposes, and it generalises the two-point balance of the elementary constant fit to the full $n + 2$ -point alternation. The Chebyshev polynomials are the fixed point of the whole theory: $T_{n}$ equioscillates on $n + 1$ points, which makes $2^{1 - n} T_{n}$ the minimal-deviation monic polynomial, which makes its roots the optimal interpolation nodes that minimise the node polynomial of 43.08.02 and defeat the Runge phenomenon, and which makes Chebyshev interpolation near-minimax by the logarithmic Lebesgue constant. The bridge is that one extremal object, the equioscillating Chebyshev polynomial, simultaneously solves the minimax node problem, supplies the near-best linear interpolant, and seeds the Remez reference — the same polynomial that the $L^{2}$ theory of 43.08.05 meets through the weight $(1 - x^{2})^{- 1/2}$ , so the uniform and quadratic best-approximation theories are dual faces of one approximation problem, putting these together into the single principle that node placement is governed by the arcsine measure of the interval and the Weierstrass density theorem guarantees the whole tower converges.

Full proof set Master

Proposition 1 (existence by compactness). For $f \in C [a, b]$ , the minimax error $E_{n} (f) = \inf_{p \in Π_{n}} ∥ f - p ∥_{\infty}$ is attained by some $p_{n}^{*} \in Π_{n}$ .

Proof. The functional $Φ (p) = ∥ f - p ∥_{\infty}$ is continuous on $Π_{n}$ (it is $1$ -Lipschitz: $∣ Φ (p) - Φ (q)∣ \leq ∥ p - q ∥_{\infty}$ ). For $p \in Π_{n}$ , the reverse triangle inequality gives $Φ (p) \geq ∥ p ∥_{\infty} - ∥ f ∥_{\infty}$ , so $Φ (p) \leq ∥ f ∥_{\infty}$ forces $∥ p ∥_{\infty} \leq 2 ∥ f ∥_{\infty}$ . The sublevel set $K = \{p \in Π_{n} : Φ (p) \leq ∥ f ∥_{\infty}\}$ is therefore bounded; it is closed by continuity of $Φ$ , and since $Π_{n}$ is finite-dimensional (all norms equivalent), $K$ is compact. The set $K$ is nonempty ( $p = 0$ gives $Φ (0) = ∥ f ∥_{\infty}$ ). A continuous function on a nonempty compact set attains its infimum, so $Φ$ attains its minimum on $K$ , which equals $E_{n} (f)$ because any $p \notin K$ has $Φ (p) > ∥ f ∥_{\infty} \geq min_{K} Φ$ . $□$

Proposition 2 (equioscillation characterisation). For $p \in Π_{n}$ with $r = f - p$ and $L = ∥ r ∥_{\infty}$ , $p$ is a best uniform approximation iff $r$ has an alternation set of length $\geq n + 2$ .

Proof. ( $\Leftarrow$ ) Let $r$ equioscillate on $t_{0} < \dots < t_{n + 1}$ . If some $q \in Π_{n}$ had $∥ f - q ∥_{\infty} < L$ , then $d = q - p = r - (f - q)$ satisfies, at each $t_{i}$ , $sign d (t_{i}) = sign r (t_{i})$ (the subtracted term is strictly smaller in modulus), so $d$ alternates sign across the $n + 1$ gaps and has $\geq n + 1$ zeros; as $d \in Π_{n}$ , $d \equiv 0$ , contradicting $∥ f - q ∥_{\infty} < L = ∥ f - p ∥_{\infty}$ . So $p$ is best. ( $\Rightarrow$ ) Suppose $p$ is best but the maximal alternation length is $m + 1 \leq n + 1$ . Let $s_{1} < \dots < s_{m}$ separate the maximal extremal-sign runs of $r$ , chosen with $∣ r (s_{j})∣ < L$ , and set $σ (x) = ε \prod_{j = 1}^{m} (s_{j} - x) \in Π_{m} \subseteq Π_{n}$ with $ε$ fixed so $σ$ matches the extremal sign of $r$ on every run. Let $ρ < L$ bound $∣ r ∣$ on the compact complement of small neighbourhoods of the extremal runs where $σ$ might disagree in sign; for $δ < (L - ρ) / ∥ σ ∥_{\infty}$ small, $r - δ σ$ has modulus $< L$ everywhere (on each run $r$ and $δ σ$ share sign and $∣ r - δ σ ∣ < L$ ; off the runs $∣ r ∣ \leq ρ$ and the perturbation is small). Then $p + δ σ$ beats $p$ , a contradiction; so $m + 1 \geq n + 2$ . $□$

Proposition 3 (uniqueness). The best uniform approximation from $Π_{n}$ is unique.

Proof. Let $p, \tilde{p}$ be best, $∥ f - p ∥_{\infty} = ∥ f - \tilde{p} ∥_{\infty} = L = E_{n} (f)$ . Then $\bar{p} = \frac{1}{2} (p + \tilde{p}) \in Π_{n}$ has $∥ f - \bar{p} ∥_{\infty} \leq \frac{1}{2} (∥ f - p ∥_{\infty} + ∥ f - \tilde{p} ∥_{\infty}) = L$ , so $\bar{p}$ is best and, by Proposition 2, $f - \bar{p}$ equioscillates on points $t_{0} < \dots < t_{n + 1}$ with $∣(f - \bar{p}) (t_{i})∣ = L$ . At each $t_{i}$ , $(f - \bar{p}) (t_{i}) = \frac{1}{2} [(f - p) (t_{i}) + (f - \tilde{p}) (t_{i})]$ has modulus $L$ while each summand has modulus $\leq L$ ; equality in $∣ \frac{1}{2} (u + v)∣ = L$ with $∣ u ∣, ∣ v ∣ \leq L$ forces $u = v = (f - \bar{p}) (t_{i})$ . Hence $(p - \tilde{p}) (t_{i}) = (f - \tilde{p}) (t_{i}) - (f - p) (t_{i}) = 0$ at the $n + 2$ points $t_{i}$ . A polynomial in $Π_{n}$ with $n + 2$ zeros is identically zero, so $p = \tilde{p}$ . $□$

Proposition 4 (Chebyshev minimal deviation). Among monic degree- $n$ polynomials on $[- 1, 1]$ , $\tilde{T}_{n} = 2^{1 - n} T_{n}$ is the unique minimiser of the sup norm, with $∥ \tilde{T}_{n} ∥_{\infty} = 2^{1 - n}$ .

Proof. The leading coefficient of $T_{n}$ is $2^{n - 1}$ (by induction on the recurrence $T_{n + 1} = 2 x T_{n} - T_{n - 1}$ : the new leading term is $2 x \cdot 2^{n - 1} x^{n}$ ), so $\tilde{T}_{n} = 2^{1 - n} T_{n}$ is monic, and $∥ \tilde{T}_{n} ∥_{\infty} = 2^{1 - n}$ with $\tilde{T}_{n} (x_{k}) = 2^{1 - n} (- 1)^{k}$ alternating at the $n + 1$ points $x_{k} = cos (k π / n)$ . Suppose a monic $q \in Π_{n}$ had $∥ q ∥_{\infty} < 2^{1 - n}$ . The difference $d = \tilde{T}_{n} - q$ has degree $\leq n - 1$ (monic leading terms cancel), and at each $x_{k}$ , $∣ q (x_{k})∣ < 2^{1 - n} = ∣ \tilde{T}_{n} (x_{k})∣$ , so $d (x_{k})$ has the alternating sign of $\tilde{T}_{n} (x_{k})$ . Thus $d$ changes sign across each of the $n$ gaps between consecutive $x_{k}$ , giving $\geq n$ zeros; with $\deg d \leq n - 1$ this forces $d \equiv 0$ , so $q = \tilde{T}_{n}$ , contradicting $∥ q ∥_{\infty} < ∥ \tilde{T}_{n} ∥_{\infty}$ . Uniqueness and minimality follow. $□$

Proposition 5 (de la Vallée-Poussin lower bound). If $q \in Π_{n}$ and $f - q$ takes alternating signs at $n + 2$ points $t_{0} < \dots < t_{n + 1}$ , then $E_{n} (f) \geq min_{i} ∣(f - q) (t_{i})∣$ .

Proof. Set $m = min_{i} ∣(f - q) (t_{i})∣$ and suppose $E_{n} (f) < m$ ; let $p^{*} \in Π_{n}$ be best, $∥ f - p^{*} ∥_{\infty} = E_{n} (f) < m$ . Then $d = q - p^{*} = (f - p^{*}) - (f - q)$ has at each $t_{i}$ the sign of $- (f - q) (t_{i})$ , since $∣(f - p^{*}) (t_{i})∣ \leq E_{n} (f) < m \leq ∣(f - q) (t_{i})∣$ . As $f - q$ alternates across the $t_{i}$ , $d$ has $n + 1$ sign changes, hence $\geq n + 1$ zeros; $d \in Π_{n}$ forces $d \equiv 0$ , contradicting $∣ d (t_{i})∣ \geq m - E_{n} (f) > 0$ . So $E_{n} (f) \geq m$ . Combined with $E_{n} (f) \leq ∥ f - q ∥_{\infty}$ , any near-equioscillating $q$ brackets $E_{n} (f)$ between $min_{i} ∣(f - q) (t_{i})∣$ and $∥ f - q ∥_{\infty}$ , the bracket the Remez iteration tightens. $□$

Connections Master

The minimal node polynomial closes the Runge loop of 43.08.02: minimising $∥ π_{n + 1} ∥_{\infty} = ∥ \prod_{k} (x - x_{k}) ∥_{\infty}$ over monic degree- $(n + 1)$ polynomials is exactly the Chebyshev minimal-deviation problem of Proposition 4, whose solution $2^{- n} T_{n + 1}$ has roots at the Chebyshev points; placing interpolation nodes there drives the interpolation-error corollary's controllable factor to its minimum $2^{- n}$ and makes the Lebesgue constant logarithmic, the precise correction to the exponential equispaced node-polynomial swelling and the cure for the divergence exhibited in 43.08.02.
The uniform best-approximation theory is dual to the $L^{2}$ best-approximation theory of 43.08.05: where this unit characterises optimality by sign-alternation of the residual against $Π_{n}$ , that unit characterises it by orthogonality of the residual to $Π_{n}$ in a weighted inner product; the two meet on the Chebyshev system, since the Chebyshev polynomials are simultaneously the equioscillating minimax extremals here and the orthogonal polynomials for the weight $(1 - x^{2})^{- 1/2}$ there, and the near-minimax property of Chebyshev interpolation is the bridge that makes the cheap $L^{2}$ -flavoured construction nearly solve the $L^{\infty}$ problem.
The Chebyshev points constructed here are the optimal-node half of the Gauss-quadrature story of 43.09.03: Gauss-Chebyshev quadrature places its nodes at these same roots, so the extremal interpolation theory and the extremal integration theory share one node set, and the positivity and stability of the Gauss weights inherit from the same orthogonal-polynomial root structure that the minimal-deviation property exposes.
Existence of a best uniform approximation for each fixed $n$ needs only compactness in the finite-dimensional space $Π_{n}$ ; what the Weierstrass approximation theorem adds is that polynomials approximate continuous functions arbitrarily well, so that the errors decay. The trigonometric/Fourier density established via Fejér in 02.10.01 is the classical density backdrop, guaranteeing $E_{n} (f) \to 0$ for every $f \in C [a, b]$ so that the equioscillation theory operates on a convergent tower rather than on a sequence with a positive floor, tying the numerical minimax theory to the harmonic-analysis density results of the analysis section.
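The first connection above, the minimal node polynomial, can be checked numerically. The sketch below compares the sampled sup norm of $\prod_{k} (x - x_{k})$ for Chebyshev nodes (the roots of $T_{n + 1}$ ) and for equispaced nodes on $[- 1, 1]$ . The helper name `node_poly_sup`, the choice $n = 10$ and the grid resolution are illustrative assumptions, not taken from the text.

```python
import math

def node_poly_sup(nodes, samples=20001):
    """Sampled sup norm on [-1, 1] of the monic node polynomial prod (x - x_k)."""
    best = 0.0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        prod = 1.0
        for t in nodes:
            prod *= (x - t)
        best = max(best, abs(prod))
    return best

n = 10                                    # n + 1 = 11 nodes (assumption)

# Chebyshev nodes: the n + 1 roots of T_{n+1}.
cheb_nodes = [math.cos((2 * k + 1) * math.pi / (2 * (n + 1))) for k in range(n + 1)]
# Equispaced nodes on [-1, 1], endpoints included.
equi_nodes = [-1.0 + 2.0 * k / n for k in range(n + 1)]

cheb_sup = node_poly_sup(cheb_nodes)
equi_sup = node_poly_sup(equi_nodes)

print("Chebyshev nodes :", cheb_sup)
print("predicted 2^-n  :", 2.0 ** (-n))
print("equispaced nodes:", equi_sup)
print("ratio           :", equi_sup / cheb_sup)
```

The Chebyshev value should agree with $2^{- n}$ up to rounding, while the equispaced value is several times larger already at $n = 10$ , the excess coming from the gaps nearest the endpoints.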

Historical & philosophical context Master

The theory originates with Pafnuty Chebyshev, who in his 1854 memoir Théorie des mécanismes connus sous le nom de parallélogrammes studied the linkage design problem of approximating straight-line motion and was led to the polynomials of least deviation from zero; he proved that the monic degree- $n$ polynomial minimising the maximum modulus on a symmetric interval is the rescaled $T_{n}$ , and identified the alternation of the error as the signature of optimality ^{[Chebyshev 1854]}. The polynomials $T_{n} (\cos θ) = \cos (n θ)$ now bear his name, and the symbol $T$ comes from the transliteration "Tchebychef".

The general characterisation theorem, that the best uniform approximation from the polynomials is the one whose error equioscillates on $n + 2$ points, and its uniqueness were brought to their modern form in the early twentieth century, with Émile Borel's 1905 Leçons sur les fonctions de variables réelles presenting the alternation theorem for polynomial approximation. The abstraction to Haar systems is due to Alfréd Haar (1918), who isolated the unisolvence condition under which the alternation theorem and uniqueness survive for a general finite-dimensional subspace. The computational method is Evgeny Remez's 1934 exchange algorithm, which turns the characterisation theorem into an iteration that converges to the equioscillating error ^{[Remez 1934]}. The convergence backdrop is the Weierstrass approximation theorem of 1885, that polynomials are dense in $C [a, b]$ , without which $E_{n} (f) \to 0$ could fail; the synthesis of the minimax theory with the analyticity-controlled convergence rate $E_{n} (f) \sim ρ^{- n}$ via the Bernstein ellipse, and the practical Remez computation, is the subject of the modern treatments by Cheney, DeVore-Lorentz, and Trefethen.

Bibliography Master

@book{sulimayers2003,
  author    = {S\"{u}li, Endre and Mayers, David F.},
  title     = {An Introduction to Numerical Analysis},
  publisher = {Cambridge University Press},
  year      = {2003}
}

@book{cheney1966,
  author    = {Cheney, Elliott Ward},
  title     = {Introduction to Approximation Theory},
  publisher = {McGraw-Hill},
  year      = {1966}
}

@book{devorelorentz1993,
  author    = {DeVore, Ronald A. and Lorentz, George G.},
  title     = {Constructive Approximation},
  series    = {Grundlehren der mathematischen Wissenschaften},
  volume    = {303},
  publisher = {Springer},
  year      = {1993}
}

@book{trefethen2019,
  author    = {Trefethen, Lloyd N.},
  title     = {Approximation Theory and Approximation Practice, Extended Edition},
  publisher = {SIAM},
  year      = {2019}
}

@article{chebyshev1854,
  author  = {Chebyshev, Pafnuty L.},
  title   = {Th\'{e}orie des m\'{e}canismes connus sous le nom de parall\'{e}logrammes},
  journal = {M\'{e}moires des Savants \'{e}trangers pr\'{e}sent\'{e}s \`{a} l'Acad\'{e}mie de Saint-P\'{e}tersbourg},
  volume  = {7},
  year    = {1854},
  pages   = {539--568}
}

@article{remez1934,
  author  = {Remez, Evgeny Ya.},
  title   = {Sur la d\'{e}termination des polyn\^{o}mes d'approximation de degr\'{e} donn\'{e}e},
  journal = {Communications de la Soci\'{e}t\'{e} math\'{e}matique de Kharkov},
  volume  = {10},
  year    = {1934},
  pages   = {41--63}
}

@book{borel1905,
  author    = {Borel, \'{E}mile},
  title     = {Le\c{c}ons sur les fonctions de variables r\'{e}elles et les d\'{e}veloppements en s\'{e}ries de polyn\^{o}mes},
  publisher = {Gauthier-Villars},
  address   = {Paris},
  year      = {1905}
}

Prerequisites

43.08.02
02.11.07

Tier anchors

beginner: Süli-Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §8.1-8.3 (best approximation in the infinity norm and the minimax idea at first-course level); Trefethen 2019 *Approximation Theory and Approximation Practice* extended ed. (SIAM) Ch. 2-3 (Chebyshev points and the wobble-and-equalise picture)
intermediate: Süli-Mayers 2003 *An Introduction to Numerical Analysis* (Cambridge) §8.3-8.5 (the equioscillation theorem, uniqueness, the Chebyshev polynomials and their minimal-sup-norm property); Cheney 1966 *Introduction to Approximation Theory* (McGraw-Hill) Ch. 3 (Haar systems, the characterisation theorem, de la Vallée-Poussin)
master: Cheney 1966 *Introduction to Approximation Theory* (McGraw-Hill / AMS Chelsea reprint) Ch. 1-3 (existence, the Haar condition, equioscillation, uniqueness, the Remez algorithm); DeVore-Lorentz 1993 *Constructive Approximation* (Springer Grundlehren 303) Ch. 3 and Ch. 7 (the Chebyshev systems, strong uniqueness, the extremal theory of Chebyshev polynomials); Trefethen 2019 *Approximation Theory and Approximation Practice* extended ed. (SIAM) Ch. 9-10, 16 (near-best Chebyshev interpolation versus true minimax, the Remez algorithm in practice)

References

Süli, E. & Mayers, D. F. — An Introduction to Numerical Analysis · Cambridge University Press (2003). Chapter 8 ('Polynomial approximation in the infinity-norm') develops best uniform approximation: §8.1 poses the minimax problem of minimising ||f - p||_infinity over p in Pi_n; §8.2 proves existence of a best approximation by a compactness argument on the finite-dimensional Pi_n; §8.3 states and proves the Chebyshev equioscillation (oscillation) theorem characterising the best approximation by the existence of at least n+2 alternation points where the error attains +/- its maximum modulus with alternating signs, and deduces uniqueness; §8.4 introduces the Chebyshev polynomials T_n via T_n(cos theta) = cos(n theta), the three-term recurrence T_{n+1} = 2 x T_n - T_{n-1}, and proves the monic 2^{1-n} T_n minimises the sup norm among monic degree-n polynomials on [-1,1] with value 2^{1-n}; §8.5 connects the minimal node polynomial to the choice of the Chebyshev points as interpolation nodes that nearly minimise the interpolation error.
Cheney, E. W. — Introduction to Approximation Theory · McGraw-Hill (1966); AMS Chelsea reprint (1982/1998). Chapters 1-3 give the abstract theory in a normed linear space: existence of best approximations from a finite-dimensional subspace (Ch. 1), the Haar / Chebyshev-system condition that a basis g_0,...,g_n is unisolvent so that no nonzero generalised polynomial has more than n zeros (Ch. 3 §1), the characterisation theorem that p* is the best uniform approximation of f from a Haar subspace iff the error f - p* equioscillates on at least n+2 points (Ch. 3 §1, the Chebyshev theorem), the uniqueness theorem (Ch. 3 §1), the de la Vallée-Poussin lower bound on the minimax error from any alternating-sign error sample, and the Remez exchange algorithm (Ch. 3 §8) as the iterative reference-set method that converges to the equioscillating error.
DeVore, R. A. & Lorentz, G. G. — Constructive Approximation · Springer, Grundlehren der mathematischen Wissenschaften 303 (1993). Chapter 3 treats the Chebyshev systems and the extremal properties of the Chebyshev polynomials of the first kind, including the minimal-deviation (least-deviation-from-zero among monic polynomials) property, the extremal points cos(k pi / n), and the Markov and Bernstein inequalities for the derivative of a polynomial bounded on [-1,1]; Chapter 7 develops the characterisation of best uniform approximation, the strong uniqueness theorem (the best approximation from a Haar subspace is unique with a linear modulus of continuity in f), and the Remez algorithm with its convergence analysis, together with the relation ||f - p_n||_infinity <= (1 + Lambda_n) E_n(f), for p_n the Chebyshev interpolant of degree n and Lambda_n its Lebesgue constant, tying near-best Chebyshev interpolation to the true minimax error.
Trefethen, L. N. — Approximation Theory and Approximation Practice (Extended Edition) · SIAM (2019). Chapters 9-10 and 16 treat best approximation computationally: the distinction between the true minimax polynomial p* (the equioscillating best uniform approximation) and the near-best Chebyshev interpolant, whose error exceeds that of p* by at most a factor 1 + Lambda_n, where the Lebesgue constant Lambda_n grows like (2/pi) log n + O(1); and the Remez algorithm as implemented in the chebfun 'minimax' command, alternating a linear solve for the reference error level with an exchange of the reference points to the local extrema of the current error. Chapter 8 gives the convergence rate E_n(f) ~ rho^{-n} controlled by the Bernstein-ellipse parameter rho, the analyticity backdrop against which the equioscillation theory operates.

Estimated time

beginner: 20m
intermediate: 50m
master: 90m