40.07.05 · combinatorics / probabilistic-method

Concentration for Combinatorial Functionals: Azuma and the Bounded-Differences Method

shipped · 3 tiers · Lean: none

Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 7 and §7.7 (Talagrand's inequality, convex distance, certifiability) and Ch. 8 (the LIS and the $\sqrt{\lambda}$ improvement); Azuma 1967 *Tôhoku Math. J.* 19 and Hoeffding 1963 *J. Amer. Statist. Assoc.* 58; Shamir-Spencer 1987 *Combinatorica* 7; McDiarmid 1989; Talagrand 1995 *Publ. Math. IHÉS* 81 (the isoperimetric/convex-distance inequality)

Intuition Beginner

Suppose a quantity depends on a long list of random choices — the colours of all the edges of a random network, the heights dealt to a deck of cards, the bin each of many balls lands in. You would like to know not just its average, but how tightly the quantity hugs that average. Does it wander all over, or does almost every random run land within a narrow band? For a surprising range of such quantities, the answer is: a narrow band, and you can prove it knowing almost nothing about the quantity itself.

The reason is a single, modest fact. If changing any one choice in the list — recolour one edge, swap one card, move one ball — can only nudge the final answer by a small amount, then the answer cannot swing wildly. Many small, independent nudges tend to cancel rather than pile up, the same way a thousand coin flips give a total very close to five hundred heads. The more choices there are, each with a small per-choice influence, the tighter the band around the average becomes, in relative terms.

This "one change moves it a little, so the whole thing stays put" principle is the heart of the bounded-differences method. It is powerful precisely because it asks so little: you never compute the average, you only check that no single input is too influential. The quantity then concentrates near wherever its average happens to be.

Visual Beginner

Imagine revealing the random choices one at a time and, at each step, recording your best current guess for the final answer — the average over all the choices still hidden. Before you reveal anything, your guess is the overall average. As each choice is uncovered, your guess jumps, but only by a little, because no single choice has much pull. By the time everything is revealed, your guess has walked, in small steps, all the way to the true answer.

Step	What is revealed	What the running guess does
Start	nothing	sits at the overall average
Each step	one more choice	moves up or down by a little
End	everything	arrives at the true answer

The walk takes many short steps, so it ends close to where it started: the answer concentrates near its average, in a band whose width grows like the square root of the number of steps, not the number itself.

Worked example Beginner

Throw $9$ balls independently into $9$ bins, each ball landing in a uniformly random bin. Let $E$ be the number of empty bins at the end. We will see why $E$ stays close to its average.

Step 1. Find the average. A fixed bin is missed by one ball with chance $8/9$ , and the nine balls are independent, so that bin is empty with chance $(8/9)^{9}$ . There are nine bins, so the average number of empty bins is $9 \times (8/9)^{9}$ . Numerically $(8/9)^{9} \approx 0.346$ , so the average is about $9 \times 0.346 \approx 3.1$ .

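Step 2. Check the per-choice influence. Think of the random choices as the nine landing spots, one per ball. Move a single ball from one bin to another. Only two bins are affected: the bin it left might become empty (a change of $+ 1$ ) and the bin it entered might stop being empty (a change of $- 1$ ). Each effect has size at most $1$ , so the crude bound is that no single ball can swing $E$ by more than $2$ . (The two effects have opposite signs, so the true swing is at most $1$ ; the cruder constant $2$ is all this example needs, and it only weakens the final bound.)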

Step 3. Read off concentration. Nine choices, each with influence at most $2$ , means the band around the average scales with $\sqrt{9 \times 2^{2}} = \sqrt{36} = 6$ . The bounded-differences rule then says a deviation of size $t$ from the average has chance at most $2 \exp (- 2 t^{2} /36) = 2 \exp (- t^{2} /18)$ .

Step 4. Plug in a number. For a deviation of $t = 6$ empty bins, this chance is at most $2 exp (- 36/18) = 2 e^{- 2} \approx 0.27$ ; for $t = 9$ it is at most $2 exp (- 81/18) = 2 e^{- 4.5} \approx 0.022$ .

What this tells us: without ever pinning the average exactly, the bare fact that one ball changes $E$ by at most $2$ already forces $E$ to sit within a handful of its mean on almost every throw.
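The four steps above can be checked numerically. The following is a minimal sketch, assuming Python's standard `random` module as the source of randomness; the helper names `empty_bins`, `mcdiarmid_two_sided` and `simulate` are illustrative and not part of the unit.

```python
import math
import random

def empty_bins(destinations, n_bins):
    """Count the bins that receive no ball."""
    return n_bins - len(set(destinations))

def mcdiarmid_two_sided(t, constants):
    """Two-sided bounded-differences bound 2*exp(-2 t^2 / sum c_j^2)."""
    return 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in constants))

def simulate(n_balls=9, n_bins=9, runs=20000, seed=0):
    """Empty-bin counts over many independent throws (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [empty_bins([rng.randrange(n_bins) for _ in range(n_balls)], n_bins)
            for _ in range(runs)]

exact_mean = 9 * (8 / 9) ** 9            # Step 1
samples = simulate()
sample_mean = sum(samples) / len(samples)
bound_at_6 = mcdiarmid_two_sided(6, [2] * 9)   # Steps 3-4 with the crude constant 2
```

The simulated mean should land close to `exact_mean`, and `bound_at_6` reproduces the $2 e^{- 2}$ of Step 4. The observed spread is much narrower than the bound suggests, which is expected: the bound uses nothing but the per-ball influence.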

Check your understanding Beginner

Exercise (easy, multiple choice).

A quantity depends on $100$ independent random choices, and changing any one choice alters it by at most $1$ . Which statement does the bounded-differences principle support?

A. The quantity equals its average exactly on every run.

B. The quantity stays within a band of width about $\sqrt{100} = 10$ of its average on almost every run.

C. The quantity can be anywhere between its smallest and largest possible values with equal chance.

D. Nothing can be said without first computing the average.

Hint

Many small influences cancel rather than add. The band scales like the square root of the number of choices, not the number itself.

Answer

B. With $100$ choices each of influence $1$ , the deviation scale is $\sqrt{100} = 10$ , far below the worst-case spread of $100$ . Feedback-correct: this square-root band is the whole point, namely concentration without computing the mean. Feedback-wrong: A overclaims (the value still fluctuates); C ignores cancellation; D is backwards, since the principle needs no knowledge of the average.

Formal definition Intermediate+

Throughout, $(Ω, F, Pr)$ is a probability space carrying a filtration $\{\emptyset, Ω\} = F_{0} \subseteq F_{1} \subseteq \dots \subseteq F_{n} = F$ , an increasing chain of $σ$ -algebras recording how much information has been revealed. The martingale theory used here (conditional expectation, the tower property, and the martingale property itself) is imported from the cross-spine unit 37.04.01 and applied, not reproved.

Definition (Doob exposure martingale). Let $X : Ω \to R$ be an integrable random variable, measurable with respect to $F_{n}$ . Its Doob martingale relative to the filtration is the sequence $Z_{i} = E [X ∣ F_{i}]$ , $0 \leq i \leq n$ . By the tower property $E [Z_{i + 1} ∣ F_{i}] = Z_{i}$ , so $(Z_{i})$ is a martingale with $Z_{0} = E [X]$ and $Z_{n} = X$ . When $X = f (ξ_{1}, \dots, ξ_{n})$ is a function of the inputs and $F_{i} = σ (ξ_{1}, \dots, ξ_{i})$ records the first $i$ inputs, the values $Z_{i}$ are the running conditional means as the inputs are exposed one at a time. For a graph functional on $G (n, p)$ two filtrations are standard: the vertex-exposure martingale, where $F_{i}$ reveals all edges among the first $i$ vertices, and the edge-exposure martingale, where $F_{i}$ reveals the first $i$ edges in a fixed order (so the index runs up to $\binom{n}{2}$ rather than $n$ ).
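For a small finite example the Doob martingale can be computed exactly by enumeration. This is a sketch under simplifying assumptions: each coordinate is uniform on a finite alphabet, $f$ returns an integer, and the function name `doob_martingale` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def doob_martingale(f, alphabets, x):
    """Return [Z_0, ..., Z_n] where Z_i = E[f | first i coordinates equal x[:i]].

    Assumes independent coordinates, coordinate j uniform on alphabets[j].
    """
    n = len(alphabets)
    path = []
    for i in range(n + 1):
        # Average f over every completion of the still-hidden coordinates.
        tails = list(product(*alphabets[i:]))
        total = sum(Fraction(f(tuple(x[:i]) + tail)) for tail in tails)
        path.append(total / len(tails))
    return path

def empty_bins_of_three(x):
    """Number of empty bins among 3 bins, given ball destinations x."""
    return 3 - len(set(x))

# Three balls into three bins, exposed one ball at a time along x = (0, 0, 1).
path = doob_martingale(empty_bins_of_three, [range(3)] * 3, (0, 0, 1))
```

Here `path[0]` is the unconditional mean and `path[-1]` is the realised value, matching $Z_{0} = E [X]$ and $Z_{n} = X$ .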

Definition (bounded increments). The martingale $(Z_{i})$ has increments bounded by $(c_{i})$ if $∣ Z_{i} - Z_{i - 1} ∣ \leq c_{i}$ almost surely for each $i$ .

Definition (Lipschitz / bounded-differences functional). A function $f : \prod_{j = 1}^{n} Λ_{j} \to R$ satisfies the bounded-differences condition with constants $(c_{j})$ if, for every coordinate $j$ and any two inputs $x, x^{'}$ differing only in coordinate $j$ , $$ |f(x) - f(x')| \le c_j . $$ Equivalently, $f$ is $c_{j}$ -Lipschitz in its $j$ -th coordinate with respect to the discrete metric.

Definition (certifiability). For $f \geq 0$ on a product space and a monotone gauge $ψ$ , $f$ is $ψ$ -certifiable if whenever $f (x) \geq s$ there is an index set $I \subseteq [n]$ with $∣ I ∣ \leq ψ (s)$ such that every $y$ agreeing with $x$ on $I$ also has $f (y) \geq s$ : a witness of size $ψ (s)$ certifies the value. The longest increasing subsequence is $ψ$ -certifiable with $ψ (s) = s$ , the increasing subsequence itself being the witness. Certifiability is the hypothesis that upgrades Talagrand's inequality from a bound at scale $\sqrt{n}$ to one at scale $\sqrt{f}$ ^{[Talagrand 1995]}.
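The witness property for the longest increasing subsequence can be illustrated directly. The sketch below assumes the standard patience-sorting recovery of one longest increasing subsequence; `lis_indices` and `witness_certifies` are illustrative names, and the random resampling is a spot check, not a proof.

```python
import bisect
import random

def lis_indices(values):
    """Indices of one longest strictly increasing subsequence (patience sorting)."""
    tails, tail_idx, prev = [], [], [None] * len(values)
    for i, v in enumerate(values):
        k = bisect.bisect_left(tails, v)
        prev[i] = tail_idx[k - 1] if k > 0 else None
        if k == len(tails):
            tails.append(v)
            tail_idx.append(i)
        else:
            tails[k] = v
            tail_idx[k] = i
    out = []
    i = tail_idx[-1] if tail_idx else None
    while i is not None:          # walk the back-pointers from the last element
        out.append(i)
        i = prev[i]
    return out[::-1]

def witness_certifies(values, trials=200, seed=0):
    """Resample every coordinate outside the witness; the LIS must not drop."""
    rng = random.Random(seed)
    witness = set(lis_indices(values))
    for _ in range(trials):
        y = [v if i in witness else rng.random() for i, v in enumerate(values)]
        if len(lis_indices(y)) < len(witness):
            return False
    return True
```

The witness has exactly $f (x)$ coordinates, which is the statement $ψ (s) = s$ .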

Counterexamples to common slips Intermediate+

"Bounded differences needs the coordinates to be identically distributed." It does not. The inputs $ξ_{j}$ need only be independent; they may live in different spaces with different laws. Only the per-coordinate Lipschitz constants $c_{j}$ enter the bound.
"A martingale with small steps must have small range, so concentration is automatic." Small steps bound the range by $\sum c_{i}$ , which can be large; the gain is that the typical deviation is only $\sqrt{\sum c_{i}^{2}}$ . Azuma's exponential bound, not the triangle inequality, supplies that square-root scale.
"McDiarmid follows from independence of $f$ 's contributions." The function $f$ need not decompose as a sum; $f$ may be a wildly non-linear functional (chromatic number, LIS). What is used is only that each coordinate's marginal effect is bounded, which makes the Doob increments bounded.
"Concentration around the mean and around the median are interchangeable." For the bounded-differences scale $\sqrt{\sum c_{i}^{2}}$ they differ by $O (\sqrt{\sum c_{i}^{2}})$ and may be conflated; for Talagrand's $\sqrt{f}$ scale the median is the natural centre and the mean-to-median gap must be controlled separately.

Key theorem with proof Intermediate+

The signature result is the Azuma-Hoeffding inequality: a martingale whose steps are bounded deviates from its start by at most the square root of the summed squared step-bounds, with Gaussian tails. The combinatorial method of this unit is the systematic application of this one inequality to exposure martingales.

Theorem (Azuma-Hoeffding). Let $Z_{0}, Z_{1}, \dots, Z_{n}$ be a martingale with $∣ Z_{i} - Z_{i - 1} ∣ \leq c_{i}$ almost surely for each $i$ . Then for every $t > 0$ , $$ \Pr\big(Z_n - Z_0 \ge t\big) \le \exp\!\Big(\frac{-t^2}{2\sum_{i=1}^n c_i^2}\Big), \qquad \Pr\big(|Z_n - Z_0| \ge t\big) \le 2\exp\!\Big(\frac{-t^2}{2\sum_{i=1}^n c_i^2}\Big). $$ ^{[Azuma 1967]}

Proof. Write $D_{i} = Z_{i} - Z_{i - 1}$ , so $E [D_{i} ∣ F_{i - 1}] = 0$ and $∣ D_{i} ∣ \leq c_{i}$ . The engine is the conditional Hoeffding lemma: if $W$ is a random variable with $E [W ∣ G] = 0$ and $∣ W ∣ \leq c$ , then for any $λ > 0$ , $E [e^{λW} ∣ G] \leq e^{λ^{2} c^{2} /2}$ . To see it, convexity of $w \mapsto e^{λ w}$ on $[- c, c]$ gives $e^{λ w} \leq \frac{c - w}{2 c} e^{- λ c} + \frac{c + w}{2 c} e^{λ c}$ ; taking conditional expectation and using $E [W ∣ G] = 0$ leaves $E [e^{λW} ∣ G] \leq \cosh (λ c) \leq e^{λ^{2} c^{2} /2}$ , the last step from comparing Taylor series term by term.

Now bound the moment generating function of $Z_{n} - Z_{0} = \sum_{i} D_{i}$ by conditioning on $F_{n - 1}$ and peeling off the last increment. For $λ > 0$ , $$ \mathbb{E}\big[e^{\lambda(Z_n - Z_0)}\big] = \mathbb{E}\Big[e^{\lambda \sum_{i<n} D_i}\,\mathbb{E}\big[e^{\lambda D_n}\mid \mathcal{F}_{n-1}\big]\Big] \le e^{\lambda^2 c_n^2 / 2}\,\mathbb{E}\big[e^{\lambda \sum_{i<n} D_i}\big], $$ since $\sum_{i < n} D_{i}$ is $F_{n - 1}$ -measurable and the conditional Hoeffding lemma bounds the inner factor. Iterating over $i = n, n - 1, \dots, 1$ , $$ \mathbb{E}\big[e^{\lambda(Z_n - Z_0)}\big] \le \exp\!\Big(\tfrac{\lambda^2}{2}\sum_{i=1}^n c_i^2\Big). $$ Markov's inequality applied to $e^{λ (Z_{n} - Z_{0})}$ gives $Pr (Z_{n} - Z_{0} \geq t) \leq e^{- λ t} \exp (\frac{λ^{2}}{2} \sum c_{i}^{2})$ . Optimising over $λ$ at $λ = t / \sum c_{i}^{2}$ yields $\exp (- t^{2} / (2 \sum c_{i}^{2}))$ . The two-sided bound follows by applying the one-sided bound to the martingale $- Z_{i}$ . $□$
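As a sanity check, the bound can be compared with an exactly computable case. The sketch below takes the simplest martingale with increments bounded by $1$ , a sum of independent fair $\pm 1$ steps (an illustrative choice, not one made in the unit), and compares its exact upper tail with the Azuma bound; the function names are hypothetical.

```python
import math

def azuma_one_sided(t, constants):
    """Azuma-Hoeffding bound exp(-t^2 / (2 * sum c_i^2))."""
    return math.exp(-t * t / (2.0 * sum(c * c for c in constants)))

def walk_upper_tail(n, t):
    """Exact Pr(S_n >= t) for S_n a sum of n independent fair +/-1 steps."""
    # S_n = 2H - n with H ~ Binomial(n, 1/2), so S_n >= t iff H >= (n + t) / 2.
    k_min = math.ceil((n + t) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

n = 100
comparison = [(t, walk_upper_tail(n, t), azuma_one_sided(t, [1] * n))
              for t in (10, 20, 30)]
```

Each exact tail in `comparison` sits below the corresponding bound; the gap reflects that Azuma uses only the step bound and no finer structure of the walk.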

Corollary (McDiarmid's bounded-differences inequality). Let $ξ_{1}, \dots, ξ_{n}$ be independent and $f$ satisfy the bounded-differences condition with constants $(c_{j})$ . Then $X = f (ξ_{1}, \dots, ξ_{n})$ obeys $Pr (∣ X - E X ∣ \geq t) \leq 2 \exp (- 2 t^{2} / \sum_{j} c_{j}^{2})$ . Proof. Take the Doob martingale $Z_{i} = E [f ∣ ξ_{1}, \dots, ξ_{i}]$ . Independence and the bounded-differences condition confine each increment, conditionally on $F_{i - 1}$ , to an interval of length at most $c_{i}$ : exposing $ξ_{i}$ moves the conditional mean by at most the coordinate's Lipschitz constant. Plain Azuma with $∣ D_{i} ∣ \leq c_{i}$ would give only the exponent $- t^{2} / (2 \sum_{j} c_{j}^{2})$ ; Hoeffding's lemma for a mean-zero variable supported on an interval of length $c_{i}$ replaces the factor $e^{λ^{2} c_{i}^{2} /2}$ by $e^{λ^{2} c_{i}^{2} /8}$ , which improves the constant in the exponent from $\frac{1}{2}$ to $2$ . Running the Azuma argument on $(Z_{i})$ with this factor then delivers the bound, with $Z_{0} = E X$ and $Z_{n} = X$ ^{[McDiarmid 1989]}. $□$
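The effect of the constant $2$ versus $\frac{1}{2}$ in the exponent is easy to quantify. A minimal sketch, with hypothetical helper names, that finds the smallest integer deviation at which each two-sided bound drops below a target probability:

```python
import math

def azuma_two_sided(t, constants):
    """2*exp(-t^2 / (2 * sum c_i^2)): Azuma with |increment| <= c_i."""
    return 2.0 * math.exp(-t * t / (2.0 * sum(c * c for c in constants)))

def mcdiarmid_two_sided(t, constants):
    """2*exp(-2 t^2 / sum c_j^2): bounded-differences constants c_j."""
    return 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in constants))

def smallest_t_for(bound, constants, target):
    """Smallest integer t with bound(t, constants) <= target (linear scan)."""
    t = 0
    while bound(t, constants) > target:
        t += 1
    return t

constants = [1] * 100
t_azuma = smallest_t_for(azuma_two_sided, constants, 0.05)
t_mcdiarmid = smallest_t_for(mcdiarmid_two_sided, constants, 0.05)
```

With $100$ coordinates of constant $1$ and a $5\%$ target, the factor of $4$ in the exponent halves the guaranteed window.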

Bridge. Azuma-Hoeffding is the foundational reason that combinatorial functionals concentrate without any computation of their means: it converts a purely local hypothesis (each exposure step moves the running conditional mean by at most $c_{i}$ ) into a global Gaussian tail at scale $\sqrt{\sum c_{i}^{2}}$ , and this is exactly the move that the bounded-differences method industrialises by reading off the $c_{i}$ from a one-coordinate Lipschitz check. The construction builds toward the Shamir-Spencer chromatic concentration of the Advanced results, where the increments are bounded by $1$ for free, and the same exposure-martingale template appears again in the triangle-count, first-passage-percolation, and longest-increasing-subsequence applications below. Putting these together, the unit is the combinatorial face of one inequality: where the large-deviations spine 37.07.02 extracts the exact exponential rate for sums of independent variables, the Azuma method trades that precision for the freedom to handle arbitrary bounded-difference functionals. The bridge is that both are Cramér-type bounds, one sharp and structured, the other crude and universal.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove the conditional Hoeffding lemma used in the Azuma proof: if $E [W ∣ G] = 0$ and $∣ W ∣ \leq c$ , then $E [e^{λW} ∣ G] \leq e^{λ^{2} c^{2} /2}$ for all $λ$ .

Hint

Bound $e^{λ w}$ above by the chord (secant line) of the convex exponential across $[- c, c]$ , take conditional expectation, then show $cosh (λ c) \leq e^{λ^{2} c^{2} /2}$ by comparing power series.

Answer

Convexity of $w \mapsto e^{λ w}$ gives, for $w \in [- c, c]$ , $e^{λ w} \leq \frac{c - w}{2 c} e^{- λ c} + \frac{c + w}{2 c} e^{λ c}$ , the chord through the endpoints. Taking $E [\cdot ∣ G]$ and using $E [W ∣ G] = 0$ kills the linear-in- $w$ term, leaving $E [e^{λW} ∣ G] \leq \frac{1}{2} (e^{- λ c} + e^{λ c}) = cosh (λ c)$ . Finally $cosh x = \sum_{k \geq 0} x^{2 k} / (2 k)! \leq \sum_{k \geq 0} (x^{2} /2)^{k} / k! = e^{x^{2} /2}$ , since $(2 k)! \geq 2^{k} k!$ , so $cosh (λ c) \leq e^{λ^{2} c^{2} /2}$ . Rubric: full credit for the chord bound, the cancellation of the mean-zero term, and the $cosh \leq e^{x^{2} /2}$ series comparison.
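The two ingredients of the answer can be spot-checked numerically. The sketch below uses the extremal mean-zero variable on $[- c, c]$ , the fair two-point variable $W = \pm c$ , whose moment generating function is exactly $\cosh (λ c)$ ; the function names are illustrative.

```python
import math

def mgf_two_point(lam, c):
    """E[exp(lam*W)] for W = +c or -c with probability 1/2 each (mean zero)."""
    return math.cosh(lam * c)

def hoeffding_mgf_bound(lam, c):
    """The sub-Gaussian bound exp(lam^2 c^2 / 2) from the lemma."""
    return math.exp(lam * lam * c * c / 2.0)

def factorial_gap_holds(k_max):
    """Check (2k)! >= 2^k k! for k = 0..k_max, the termwise series comparison."""
    return all(math.factorial(2 * k) >= 2 ** k * math.factorial(k)
               for k in range(k_max + 1))

lam_grid = [k / 4 for k in range(-20, 21)]
lemma_holds = all(mgf_two_point(lam, c) <= hoeffding_mgf_bound(lam, c)
                  for lam in lam_grid for c in (0.5, 1.0, 2.0))
```

A grid check is not a proof, but it confirms the direction of both inequalities used in the rubric.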

Exercise 4 (medium, symbolic).

Let $f$ count the number of empty bins when $m$ balls are thrown independently into $n$ bins. Show that altering the destination of one ball changes $f$ by at most $1$ , and deduce $Pr (∣ f - E f ∣ \geq t) \leq 2 exp (- 2 t^{2} / m)$ .

Hint

Treat the inputs as the $m$ ball destinations. Moving one ball can empty at most one bin and fill at most one bin — but they cannot both flip in the same direction; the net change in the empty count is at most $1$ in absolute value.

Answer

Encode the configuration as $(ξ_{1}, \dots, ξ_{m})$ , the destination of each ball, independent and uniform on $[n]$ . Reassigning ball $j$ from bin $a$ to bin $b$ can affect only bins $a$ and $b$ : bin $a$ may gain emptiness if ball $j$ was its sole occupant ( $+ 1$ ), and bin $b$ may lose emptiness if it was previously empty ( $- 1$ ). These two effects have opposite sign, so the empty-bin count changes by at most $1$ : $c_{j} = 1$ for each $j$ . With $\sum_{j} c_{j}^{2} = m$ , McDiarmid's inequality gives $Pr (∣ f - E f ∣ \geq t) \leq 2 exp (- 2 t^{2} / m)$ . Rubric: full credit for the per-ball Lipschitz constant $c_{j} = 1$ via the opposite-sign argument and the substitution $\sum c_{j}^{2} = m$ .
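For small parameters the constant $c_{j} = 1$ can be confirmed by exhaustive search. This is a brute-force sketch (the name `max_single_ball_change` is hypothetical) that tries every configuration and every single-ball move:

```python
from itertools import product

def empty_bins(destinations, n_bins):
    """Count the bins that receive no ball."""
    return n_bins - len(set(destinations))

def max_single_ball_change(n_balls, n_bins):
    """Largest change in the empty-bin count from moving exactly one ball."""
    worst = 0
    for config in product(range(n_bins), repeat=n_balls):
        base = empty_bins(config, n_bins)
        for j in range(n_balls):
            for new_bin in range(n_bins):
                moved = config[:j] + (new_bin,) + config[j + 1:]
                worst = max(worst, abs(empty_bins(moved, n_bins) - base))
    return worst
```

The search is exponential in the number of balls, so it is only practical for a handful of balls and bins; the opposite-sign argument in the answer covers the general case.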

Exercise 5 (medium, symbolic).

Let $L_{n}$ be the length of the longest increasing subsequence of a uniformly random permutation of $[n]$ , viewed as a function of the $n$ independent values placed in order. Show the bounded-differences constants are $c_{j} = 1$ and conclude $Pr (∣ L_{n} - E L_{n} ∣ \geq t) \leq 2 exp (- 2 t^{2} / n)$ .

Hint

Model the permutation by $n$ independent uniform $[0, 1]$ values $ξ_{1}, \dots, ξ_{n}$ read in position order; $L_{n}$ is the length of the longest increasing subsequence. Changing one value alters the LIS by at most $1$ , since at most one element of any increasing subsequence sits at that position.

Answer

Represent the random permutation by independent uniform reals $ξ_{1}, \dots, ξ_{n}$ on $[0, 1]$ ; the longest increasing subsequence of the induced order has length $L_{n}$ . Resampling a single $ξ_{j}$ changes $L_{n}$ by at most $1$ : any increasing subsequence uses the position $j$ at most once, so deleting position $j$ shortens the LIS by at most $1$ , and the resampled value can lengthen it by at most $1$ . Hence $c_{j} = 1$ and $\sum_{j} c_{j}^{2} = n$ , giving $Pr (∣ L_{n} - E L_{n} ∣ \geq t) \leq 2 \exp (- 2 t^{2} / n)$ . This bounded-differences bound gives deviations at scale $\sqrt{n}$ ; since $E L_{n} \sim 2 \sqrt{n}$ is itself of order $\sqrt{n}$ , the window is as wide as the mean and far from the true fluctuation scale $n^{1/6}$ , which is what motivates Talagrand's refinement. Rubric: full credit for $c_{j} = 1$ via the "one position per subsequence" argument and the $\sqrt{n}$ scale, with bonus for noting the bound is loose relative to $n^{1/6}$ .
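The Lipschitz constant can be spot-checked by simulation. The sketch below assumes the standard patience-sorting computation of the LIS length; `max_resample_change` is an illustrative helper that resamples one coordinate at a time.

```python
import bisect
import random

def lis_length(values):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[k] = smallest possible last value of an increasing run of length k+1
    for v in values:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

def max_resample_change(n, trials, seed=0):
    """Largest observed change in the LIS length when one value is resampled."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        ys = list(xs)
        ys[rng.randrange(n)] = rng.random()
        worst = max(worst, abs(lis_length(ys) - lis_length(xs)))
    return worst
```

Every trial should report a change of at most $1$ , in line with $c_{j} = 1$ .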

Exercise 7 (hard, symbolic).

Prove the Shamir-Spencer concentration: the chromatic number of $G (n, p)$ satisfies $Pr (∣ χ - E χ ∣ > λ \sqrt{n - 1}) < 2 e^{- λ^{2} /2}$ , using the vertex-exposure martingale. Explain why this needs no estimate of $E χ$ .

Hint

Order the vertices and let $F_{i}$ reveal all edges among the first $i$ vertices. Show the increment $∣ Z_{i} - Z_{i - 1} ∣ \leq 1$ : adding one vertex (with its edges) changes the chromatic number by at most one. Apply Azuma with $n - 1$ informative steps.

Answer

Fix a vertex ordering $v_{1}, \dots, v_{n}$ and let $F_{i}$ be generated by the edges within $\{v_{1}, \dots, v_{i}\}$ ; set $Z_{i} = E [χ (G) ∣ F_{i}]$ , the vertex-exposure Doob martingale, with $Z_{0} = E χ$ and $Z_{n} = χ$ . Increment bound: for any two graphs differing only in the edges incident to $v_{i}$ , the chromatic numbers differ by at most $1$ , since the two graphs agree once $v_{i}$ is removed and a proper colouring of that common graph extends to $v_{i}$ with at most one fresh colour. Averaging this pointwise bound over the still-hidden edges gives $∣ Z_{i} - Z_{i - 1} ∣ \leq 1$ ; the first vertex $v_{1}$ carries no internal edges, so only $n - 1$ steps are informative and $\sum c_{i}^{2} = n - 1$ . Azuma's two-sided inequality yields $Pr (∣ χ - E χ ∣ > λ \sqrt{n - 1}) < 2 \exp (- (λ \sqrt{n - 1})^{2} / (2 (n - 1))) = 2 e^{- λ^{2} /2}$ . No value of $E χ$ enters: the increment bound is a structural Lipschitz fact about $χ$ , and Azuma centres automatically at $Z_{0} = E χ$ , so the width of the concentration window is pinned even though locating the window remains a separate, much harder problem. Rubric: full credit for the vertex-exposure construction, the $\leq 1$ increment via colour-extension, $\sum c_{i}^{2} = n - 1$ , and the explanation that Azuma centres at the unknown mean.
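The pointwise Lipschitz fact behind the increment bound can be tested on small graphs. This is a brute-force sketch, feasible only for a few vertices; the helper names are hypothetical, and the edge probability $p = 1/2$ is an arbitrary choice for the experiment.

```python
from itertools import combinations, product
import random

def chromatic_number(n, edges):
    """Smallest k admitting a proper k-colouring, by brute force (small n only)."""
    for k in range(1, n + 1):
        for colouring in product(range(k), repeat=n):
            if all(colouring[u] != colouring[v] for u, v in edges):
                return k
    return 0  # only reached for the empty vertex set

def rewire_vertex(n, edges, v, rng, p=0.5):
    """Resample every edge incident to v, keeping all other edges."""
    kept = [e for e in edges if v not in e]
    fresh = [(min(u, v), max(u, v)) for u in range(n) if u != v and rng.random() < p]
    return kept + fresh

def max_vertex_change(n, trials, seed=0, p=0.5):
    """Largest observed change in chi when one vertex's edges are resampled."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    worst = 0
    for _ in range(trials):
        edges = [e for e in pairs if rng.random() < p]
        new_edges = rewire_vertex(n, edges, rng.randrange(n), rng, p)
        worst = max(worst, abs(chromatic_number(n, new_edges) - chromatic_number(n, edges)))
    return worst
```

Resampling all edges at one vertex is slightly more than a single vertex-exposure step reveals, so a bound of $1$ here implies the bound used in the answer.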

Exercise 8 (hard, short-answer).

Explain in one paragraph why bounded differences gives the longest increasing subsequence concentration only at scale $\sqrt{n}$ , whereas Talagrand's convex-distance inequality with certifiability sharpens it toward scale $\sqrt{E L_{n}} \approx \sqrt{2 \sqrt{n}} = O (n^{1/4})$ .

Hint

McDiarmid uses $\sum c_{j}^{2} = n$ regardless of how large $f$ is. Talagrand's bound replaces $n$ by the number of coordinates a witness actually uses — for the LIS, a subsequence of length $\approx E L_{n}$ , so the relevant count is $f$ itself, not $n$ .

Answer

McDiarmid charges the full coordinate budget: each of the $n$ positions has Lipschitz constant $1$ , so $\sum c_{j}^{2} = n$ and the deviation scale is $\sqrt{n}$ , independent of how small $L_{n}$ actually is. This is wasteful for the LIS, whose value $\approx 2 \sqrt{n}$ is certified by a short witness: the increasing subsequence itself, of length $\approx 2 \sqrt{n}$ . Talagrand's convex-distance inequality exploits exactly this: for an $f$ -certifiable, $1$ -Lipschitz functional, the deviation is controlled not by $n$ but by the witness size $f$ , giving a bound of the form $Pr (∣ L_{n} - M ∣ \geq t) \leq 4 \exp (- t^{2} / (4 E L_{n}))$ around the median $M$ , hence fluctuations at scale $\sqrt{E L_{n}} = O (n^{1/4})$ rather than $\sqrt{n}$ . The mechanism is that the convex distance to a level set measures how many coordinates must change to push $f$ down, and certifiability caps that count by $f$ , replacing the global $n$ in the exponent with the local $f$ . (The true fluctuation scale $n^{1/6}$ , from the Tracy-Widom limit of Baik-Deift-Johansson, lies beyond even this; Talagrand sharpens but does not saturate.) Rubric: full credit for contrasting $\sum c_{j}^{2} = n$ with the witness-size budget, naming certifiability as the bridge, and locating the Talagrand scale at $\sqrt{E L_{n}} = O (n^{1/4})$ .

Advanced results Master

The exposure martingale and Azuma's inequality open into a graded family: the choice of filtration tunes the increment bounds, the bounded-differences inequality handles arbitrary independent-input functionals, and Talagrand's convex-distance inequality replaces the global coordinate count by a local, value-dependent budget for functionals that are cheaply certifiable.

Theorem 1 (Shamir-Spencer chromatic concentration). For $G (n, p)$ with any $p = p (n)$ , the chromatic number satisfies $Pr (∣ χ - E χ ∣ > λ \sqrt{n - 1}) < 2 e^{- λ^{2} /2}$ , so $χ$ is concentrated in a window of width $O (\sqrt{n})$ about its mean, with no estimate of the mean required ^{[Shamir-Spencer 1987]}. The vertex-exposure martingale has increments bounded by $1$ because adding a vertex changes $χ$ by at most one; the edge-exposure martingale, with $\binom{n}{2}$ steps each of bound $1$ , gives the far weaker scale $\sqrt{\binom{n}{2}} \approx n / \sqrt{2}$ , so the choice of filtration is the substantive design decision. For sparse $p = n^{- α}$ , $α > 5/6$ , Łuczak (1991) and Alon-Krivelevich (1997) sharpened the window to a single value or two consecutive values (two-point concentration) by combining the martingale bound with a structural argument bounding the number of colours, a phenomenon the martingale alone cannot reach.

Theorem 2 (triangle-count concentration). Let $T$ be the number of triangles in $G (n, p)$ . The edge-exposure martingale of $T$ has increments bounded by $n - 2$ (one edge lies in at most $n - 2$ triangles), giving the bounded-differences bound $Pr (∣ T - E T ∣ \geq t) \leq 2 \exp (- 2 t^{2} / (\binom{n}{2} (n - 2)^{2}))$ . This is sharp only in a limited range; the true upper-tail rate for triangle counts is governed by the polynomial concentration of Kim-Vu and the large-deviation rate of Chatterjee-Dembo and Lubetzky-Zhao, where the dominant strategy is the planting of a clique or a hub of high-degree vertices rather than a uniform shift. That is a regime where the bounded-differences philosophy of uniformly small increments breaks down ^{[Alon-Spencer 2016]}.
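The per-edge constant $n - 2$ and the resulting bound can be explored on small graphs. A minimal sketch with hypothetical helper names; the edge probability $1/2$ is an arbitrary choice for the experiment.

```python
from itertools import combinations
import math
import random

def triangle_count(n, edge_set):
    """Number of triangles; edges are stored as sorted pairs (a, b) with a < b."""
    return sum(1 for a, b, c in combinations(range(n), 3)
               if (a, b) in edge_set and (a, c) in edge_set and (b, c) in edge_set)

def max_edge_flip_change(n, trials, seed=0, p=0.5):
    """Largest observed change in the triangle count when one edge is toggled."""
    rng = random.Random(seed)
    pairs = list(combinations(range(n), 2))
    worst = 0
    for _ in range(trials):
        edges = {e for e in pairs if rng.random() < p}
        flipped = edges ^ {rng.choice(pairs)}   # toggle one potential edge
        worst = max(worst, abs(triangle_count(n, flipped) - triangle_count(n, edges)))
    return worst

def triangle_bound(n, t):
    """Bounded-differences bound 2*exp(-2 t^2 / (C(n,2) * (n-2)^2))."""
    return 2.0 * math.exp(-2.0 * t * t / (math.comb(n, 2) * (n - 2) ** 2))
```

Because $\sum c_{j}^{2} = \binom{n}{2} (n - 2)^{2}$ is of order $n^{4}$ , the bound only becomes informative for deviations $t$ of order $n^{2}$ .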

Theorem 3 (first-passage percolation). Assign independent edge weights $\{w_{e}\}$ to the edges of $Z^{d}$ (or $K_{n}$ ), and let $T (u, v) = \min_{γ : u \to v} \sum_{e \in γ} w_{e}$ be the passage time. If the weights are bounded in $[0, c]$ , then changing a single edge weight moves $T (u, v)$ by at most $c$ , and when the passage time depends on at most $L$ edges (geodesics of bounded length $L$ ), McDiarmid gives $Pr (∣ T - E T ∣ \geq t) \leq 2 \exp (- 2 t^{2} / (L c^{2}))$ . The bound captures the correct order in many regimes but not the sublinear $Var T = O (L / \log L)$ of Benjamini-Kalai-Schramm, whose proof replaces the martingale by a hypercontractive (Fourier-analytic) variance bound, the next layer above Azuma.

Theorem 4 (Talagrand's convex-distance inequality). Let $Ω = \prod_{j = 1}^{n} Ω_{j}$ carry a product measure. For $x \in Ω$ and $A \subseteq Ω$ , define the convex distance $d_{T} (x, A) = \sup_{∥ α ∥_{2} \leq 1, α \geq 0} \min_{y \in A} \sum_{j : x_{j} \neq y_{j}} α_{j}$ . Then $Pr (A) Pr (\{x : d_{T} (x, A) \geq t\}) \leq e^{- t^{2} /4}$ ^{[Talagrand 1995]}. For a $1$ -Lipschitz, $ψ$ -certifiable functional $f$ with median $M$ , this gives $Pr (f \leq M - t) \leq 2 e^{- t^{2} / (4 ψ (M))}$ and $Pr (f \geq M + t) \leq 2 e^{- t^{2} / (4 ψ (M + t))}$ , so the deviation scale is $\sqrt{ψ (f)}$ rather than $\sqrt{n}$ . For the LIS with $ψ (s) = s$ and $M \approx 2 \sqrt{n}$ , the fluctuation scale drops from $\sqrt{n}$ to $O (n^{1/4})$ ; for the assignment and Euclidean-TSP functionals the same mechanism gives the right concentration order. The inequality is an isoperimetric statement: it says product measure concentrates on the convex-distance neighbourhood of any set of non-negligible measure, the discrete-product analogue of Gaussian and spherical isoperimetry.

Theorem 5 (the isoperimetric viewpoint). Each concentration result is a statement that a $1$ -Lipschitz function on a metric product space is nearly constant. Azuma corresponds to the Hamming-metric vertex isoperimetry of the cube, where the extremal sets are Hamming balls and the deviation scale is $\sqrt{n}$ ; Talagrand's $d_{T}$ is a strictly finer, convexified metric whose neighbourhoods are smaller, so its concentration is stronger. The chain Azuma $\Rightarrow$ McDiarmid $\Rightarrow$ Talagrand is a chain of progressively sharper isoperimetric inequalities on product spaces, and the entropy method of the co-produced unit on log-Sobolev/entropy concentration is a fourth route to the same family, often with the sharpest constants ^{[McDiarmid 1989]}.

Synthesis. The bounded-differences method is one inequality wearing many combinatorial costumes: Azuma-Hoeffding turns the local hypothesis that each exposure step moves the conditional mean by at most $c_{i}$ into the global Gaussian tail at scale $\sqrt{\sum c_{i}^{2}}$ , and the foundational reason this works is that mean-zero bounded increments have sub-Gaussian moment generating functions that multiply along the Doob filtration. The chromatic, triangle, percolation, and LIS results are exactly this inequality applied to different exposure martingales, and the choice of filtration (vertex versus edge) is what sets the increment bounds and hence the scale, which is why Shamir-Spencer's $O (\sqrt{n})$ window beats the cruder $O (n / \sqrt{2})$ that edge exposure would give. The central insight is that Talagrand's inequality generalises bounded differences by replacing the uniform coordinate budget $\sum c_{j}^{2} = n$ with the convex distance, a value-dependent budget that for certifiable functionals collapses to the witness size $ψ (f)$ ; this is exactly the $\sqrt{n} \to \sqrt{f}$ improvement for the LIS, and it is the isoperimetric statement that product measure concentrates on convex-distance neighbourhoods. Putting these together, concentration of combinatorial measure is a tower (Azuma, then McDiarmid, then Talagrand, then the entropy method), and the bridge upward at each level is a sharper isoperimetry on the same product space, trading the universal Lipschitz bound for ever more local control, while the large-deviations spine 37.07.02 runs alongside as the structured, exact-rate counterpart for the special case of independent sums.

Full proof set Master

Proposition 1 (conditional Hoeffding lemma). If $E [W ∣ G] = 0$ and $∣ W ∣ \leq c$ almost surely, then $E [e^{λW} ∣ G] \leq e^{λ^{2} c^{2} /2}$ for all real $λ$ .

Proof. For $w \in [- c, c]$ , convexity of $w \mapsto e^{λ w}$ bounds it above by its chord: $e^{λ w} \leq \frac{c - w}{2 c} e^{- λ c} + \frac{c + w}{2 c} e^{λ c}$ . Taking $E [\cdot ∣ G]$ and using $E [W ∣ G] = 0$ removes the term linear in $w$ , leaving $E [e^{λW} ∣ G] \leq \cosh (λ c)$ . Comparing power series, $\cosh x = \sum_{k \geq 0} \frac{x^{2 k}}{(2 k)!} \leq \sum_{k \geq 0} \frac{(x^{2} /2)^{k}}{k!} = e^{x^{2} /2}$ , because $(2 k)! \geq 2^{k} k!$ termwise. Hence $E [e^{λW} ∣ G] \leq e^{λ^{2} c^{2} /2}$ . $□$

Proposition 2 (Azuma-Hoeffding). For a martingale $(Z_{i})_{i = 0}^{n}$ with $∣ Z_{i} - Z_{i - 1} ∣ \leq c_{i}$ almost surely, $Pr (Z_{n} - Z_{0} \geq t) \leq exp (- t^{2} / (2 \sum_{i} c_{i}^{2}))$ .

Proof. Set $D_{i} = Z_{i} - Z_{i - 1}$ , so $E [D_{i} ∣ F_{i - 1}] = 0$ and $∣ D_{i} ∣ \leq c_{i}$ . By Proposition 1 with $G = F_{i - 1}$ , $E [e^{λ D_{i}} ∣ F_{i - 1}] \leq e^{λ^{2} c_{i}^{2} /2}$ . Conditioning on $F_{n - 1}$ and pulling out the measurable factor, $$ \mathbb{E}[e^{\lambda(Z_n - Z_0)}] = \mathbb{E}\big[e^{\lambda(Z_{n-1}-Z_0)}\,\mathbb{E}[e^{\lambda D_n}\mid\mathcal{F}_{n-1}]\big] \le e^{\lambda^2 c_n^2/2}\,\mathbb{E}[e^{\lambda(Z_{n-1}-Z_0)}]. $$ Iterating down to $i = 1$ gives $E [e^{λ (Z_{n} - Z_{0})}] \leq e^{(λ^{2} /2) \sum_{i} c_{i}^{2}}$ . Markov's inequality on $e^{λ (Z_{n} - Z_{0})}$ yields $Pr (Z_{n} - Z_{0} \geq t) \leq e^{- λ t + (λ^{2} /2) \sum_{i} c_{i}^{2}}$ ; minimising the exponent at $λ = t / \sum_{i} c_{i}^{2}$ gives $\exp (- t^{2} / (2 \sum_{i} c_{i}^{2}))$ . $□$

Proposition 3 (McDiarmid bounded-differences inequality). Let $ξ_{1}, \dots, ξ_{n}$ be independent and $f$ satisfy $∣ f (x) - f (x^{'}) ∣ \leq c_{j}$ whenever $x, x^{'}$ differ only in coordinate $j$ . Then $Pr (∣ f (ξ) - E f (ξ) ∣ \geq t) \leq 2 exp (- 2 t^{2} / \sum_{j} c_{j}^{2})$ .

Proof. Let $F_{i} = σ (ξ_{1}, \dots, ξ_{i})$ and $Z_{i} = E [f (ξ) ∣ F_{i}]$ , the Doob martingale, with $Z_{0} = E f$ , $Z_{n} = f (ξ)$ . Define the conditional range of step $i$ , $$ \ell_i = \inf_{u}\mathbb{E}[f \mid \mathcal{F}_{i-1}, \xi_i = u], \qquad h_i = \sup_{u}\mathbb{E}[f \mid \mathcal{F}_{i-1}, \xi_i = u]. $$ By independence the conditional expectation given $ξ_{i} = u$ integrates the remaining coordinates against their unchanged product law, so for any two values $u, u^{'}$ the integrands differ pointwise by at most $c_{i}$ (the bounded-differences hypothesis in coordinate $i$ ); hence $h_{i} - ℓ_{i} \leq c_{i}$ . The increment $D_{i} = Z_{i} - Z_{i - 1}$ lies in $[\ell_i - Z_{i-1}, h_i - Z_{i-1}]$ , an interval of length $\le c_i$ , and has conditional mean $0$ . The Hoeffding lemma for a mean-zero variable supported on an interval of length $c_i$ sharpens Proposition 1 to $\mathbb{E}[e^{\lambda D_i}\mid\mathcal{F}_{i-1}] \le e^{\lambda^2 c_i^2/8}$ . Repeating the iteration of Proposition 2 with this improved factor gives $\Pr(f - \mathbb{E}f \ge t) \le \exp(-2t^2/\sum_j c_j^2)$ , and the two-sided bound follows from $-f$ . $\square$

Proposition 4 (Shamir-Spencer chromatic concentration). For $G (n, p)$ , $Pr (∣ χ - E χ ∣ > λ \sqrt{n - 1}) < 2 e^{- λ^{2} /2}$ .

Proof. Order the vertices $v_{1}, \dots, v_{n}$ and let $F_{i}$ be generated by the edges within $\{v_{1}, \dots, v_{i}\}$ ; put $Z_{i} = E [χ (G) ∣ F_{i}]$ , so $Z_{0} = E χ$ , $Z_{n} = χ$ . For graphs $G, G^{'}$ on the same vertex set differing only in the edges incident to $v_{i}$ , a proper colouring of $G - v_{i}$ is a proper colouring of $G^{'} - v_{i}$ , and at most one extra colour is needed to colour $v_{i}$ in either; hence $∣ χ (G) - χ (G^{'}) ∣ \leq 1$ . Conditioning on $F_{i - 1}$ , which fixes the edges among $\{v_{1}, \dots, v_{i - 1}\}$ , and integrating out the edges from $v_{i}$ to earlier vertices against their product law, the pointwise bound $\leq 1$ passes to $∣ Z_{i} - Z_{i - 1} ∣ \leq 1$ . The first vertex contributes no internal edge, so $F_{1} = F_{0}$ and only $n - 1$ increments are nonzero: $\sum_{i} c_{i}^{2} = n - 1$ . Proposition 2 (two-sided) gives $Pr (∣ χ - E χ ∣ > λ \sqrt{n - 1}) < 2 \exp (- (λ \sqrt{n - 1})^{2} / (2 (n - 1))) = 2 e^{- λ^{2} /2}$ . $□$

Proposition 5 (Talagrand certifiability corollary, lower tail). Let $f \geq 0$ be $1$ -Lipschitz and $ψ$ -certifiable on a product space, with median $M$ . Then $Pr (f \leq M - t) \leq 2 exp (- t^{2} / (4 ψ (M)))$ .

Proof. Let $A = \{f \leq M - t\}$ . The claim is that any $x$ with $f (x) \geq M$ has $d_{T} (x, A) \geq t / \sqrt{ψ (M)}$ . Since $f (x) \geq M$ , certifiability supplies an index set $I$ with $∣ I ∣ \leq ψ (M)$ such that any $z$ agreeing with $x$ on $I$ has $f (z) \geq M$ . Take any $y \in A$ and form $z$ by overwriting $y$ with $x$ on the coordinates of $I$ where they disagree; then $f (z) \geq M$ , while $1$ -Lipschitzness gives $f (z) \leq f (y) + ∣ \{j : z_{j} \neq y_{j}\} ∣ \leq (M - t) + ∣ I \cap \{x \neq y\} ∣$ . Hence $∣ I \cap \{x \neq y\} ∣ \geq t$ for every $y \in A$ , so the weight vector $α = 1_{I} / \sqrt{∣ I ∣}$ (unit $ℓ_{2}$ norm) gives $\sum_{j : x_{j} \neq y_{j}} α_{j} \geq t / \sqrt{∣ I ∣} \geq t / \sqrt{ψ (M)}$ for every $y \in A$ , whence $d_{T} (x, A) \geq t / \sqrt{ψ (M)}$ . By the convex-distance inequality, $Pr (A) Pr (f \geq M) \leq Pr (A) Pr (d_{T} (\cdot, A) \geq t / \sqrt{ψ (M)}) \leq e^{- t^{2} / (4 ψ (M))}$ . As $Pr (f \geq M) \geq 1/2$ by the median property, $Pr (A) \leq 2 e^{- t^{2} / (4 ψ (M))}$ . $□$

Connections Master

The martingale theory this unit consumes — conditional expectation, the tower property, and the Doob martingale of an integrable variable along a filtration — is developed in 37.04.01, with the Doob $L^{p}$ and maximal inequalities in 37.04.03; the exposure martingales here are precisely Doob martingales for the vertex- and edge-revealing filtrations, and the foundational reason Azuma applies is that those increments are mean-zero and bounded, exactly the structure the optional-stopping and convergence theory of that spine is built on.
The Azuma-Hoeffding bound is the bounded-difference, structure-free cousin of the large-deviations machinery of 37.07.02: Cramér's theorem extracts the exact exponential rate $I(a) = \sup_{\lambda}(\lambda a - \log \mathbb{E} e^{\lambda \xi})$ for sums of independent variables, where Azuma settles for the universal Gaussian surrogate $e^{-t^{2}/(2\sum c_{i}^{2})}$ obtained by bounding every coordinate's moment generating function uniformly; both are Chernoff-method arguments, and Azuma is what survives when the functional is an arbitrary bounded-difference map rather than a sum.
The chromatic-number concentration proved here pairs with the second-moment estimates of the chromatic number and clique number of $G(n, p)$ in 40.07.03: the second moment locates $\mathbb{E}\chi$ up to lower-order terms by counting colourings and independent sets, while Azuma certifies that $\chi$ sits in an $O(\sqrt{n})$ window around that (still imprecisely located) mean. The two results are complementary halves: one fixes where the window is, the other fixes how narrow it is.
The entropy (log-Sobolev) method of the co-produced unit 40.07.06 is a fourth, often sharpest, route to the same concentration family: where Azuma multiplies bounded conditional moment generating functions along the Doob filtration, the entropy method tensorises a single entropy inequality across independent coordinates, recovering McDiarmid-type bounds and the self-bounding refinements with frequently better constants, and reaching variance proxies the martingale increments cannot see.
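The comparison between Cramér's exact rate and Azuma's Gaussian surrogate can be made concrete in the simplest case, a sum $S_{n}$ of $n$ independent fair $\pm 1$ steps (so every $c_{i} = 1$). This is a minimal sketch under that assumption, with hypothetical function names: Azuma gives $\Pr(S_{n} \geq an) \leq e^{-na^{2}/2}$, while the Cramér rate has the closed form $I(a) = \tfrac{1+a}{2}\log(1+a) + \tfrac{1-a}{2}\log(1-a)$, which is never smaller than $a^{2}/2$.

```python
import math


def cramer_rate_rademacher(a):
    """I(a) = sup_l (l*a - log cosh l) for a fair +/-1 step, in closed form."""
    if not -1.0 <= a <= 1.0:
        return math.inf  # deviations beyond the support are impossible
    if abs(a) == 1.0:
        return math.log(2.0)  # limiting value at the endpoints
    return 0.5 * ((1 + a) * math.log1p(a) + (1 - a) * math.log1p(-a))


def azuma_rate(a):
    """Per-step exponent a^2 / 2 from the Gaussian surrogate with c_i = 1."""
    return a * a / 2.0


if __name__ == "__main__":
    for a in (0.1, 0.5, 0.9):
        print(a, cramer_rate_rademacher(a), azuma_rate(a))
```

The two rates agree to second order near $a = 0$ and separate as $a$ approaches $1$, which is the sense in which Azuma trades sharpness for generality.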

Historical & philosophical context Master

The exponential bound for sums of bounded independent variables is due to Wassily Hoeffding's 1963 paper in the Journal of the American Statistical Association ^{[Hoeffding 1963]}, which established the moment-generating-function inequality at the heart of the method. Kazuoki Azuma's 1967 note in the Tôhoku Mathematical Journal ^{[Azuma 1967]} extended the bound from independent sums to martingales with bounded increments, the form combinatorics uses, though the same inequality appears in unpublished work of Hoeffding and was anticipated by Sergei Bernstein. The decisive combinatorial application is Shamir and Spencer's 1987 Combinatorica paper ^{[Shamir-Spencer 1987]}, which introduced the vertex-exposure martingale and proved that the chromatic number of $G(n, p)$ is concentrated in an $O(\sqrt{n})$ interval without any knowledge of its location, a result that startled the field precisely because it certified concentration of a quantity whose value was, and largely remains, unknown.

Colin McDiarmid's 1989 survey ^{[McDiarmid 1989]} packaged the martingale argument as the bounded-differences inequality, a black box requiring only a one-coordinate Lipschitz check, and catalogued its applications across occupancy, percolation, and combinatorial optimisation. Michel Talagrand's 1995 Publications IHÉS memoir ^{[Talagrand 1995]} reconceived concentration as product-space isoperimetry, defining the convex distance and proving the inequality that, through certifiability, sharpens the longest-increasing-subsequence and Euclidean-functional bounds from scale $\sqrt{n}$ to scale $\sqrt{f}$. The tower from Hoeffding-Azuma through McDiarmid to Talagrand, with the later entropy method of Ledoux and Boucheron-Lugosi-Massart, constitutes the modern theory of concentration of measure as a unifying lens on combinatorial functionals.

Bibliography Master

@article{hoeffding1963,
  author  = {Hoeffding, Wassily},
  title   = {Probability inequalities for sums of bounded random variables},
  journal = {Journal of the American Statistical Association},
  volume  = {58},
  number  = {301},
  pages   = {13--30},
  year    = {1963}
}

@article{azuma1967,
  author  = {Azuma, Kazuoki},
  title   = {Weighted sums of certain dependent random variables},
  journal = {T\^{o}hoku Mathematical Journal},
  volume  = {19},
  number  = {3},
  pages   = {357--367},
  year    = {1967}
}

@article{shamirspencer1987,
  author  = {Shamir, Eli and Spencer, Joel},
  title   = {Sharp concentration of the chromatic number on random graphs $G_{n,p}$},
  journal = {Combinatorica},
  volume  = {7},
  number  = {1},
  pages   = {121--129},
  year    = {1987}
}

@incollection{mcdiarmid1989,
  author    = {McDiarmid, Colin},
  title     = {On the method of bounded differences},
  booktitle = {Surveys in Combinatorics 1989},
  series    = {London Mathematical Society Lecture Note Series},
  volume    = {141},
  publisher = {Cambridge University Press},
  pages     = {148--188},
  year      = {1989}
}

@article{talagrand1995,
  author  = {Talagrand, Michel},
  title   = {Concentration of measure and isoperimetric inequalities in product spaces},
  journal = {Publications Math\'{e}matiques de l'IH\'{E}S},
  volume  = {81},
  pages   = {73--205},
  year    = {1995}
}

@article{bks2003,
  author  = {Benjamini, Itai and Kalai, Gil and Schramm, Oded},
  title   = {First passage percolation has sublinear distance variance},
  journal = {Annals of Probability},
  volume  = {31},
  number  = {4},
  pages   = {1970--1978},
  year    = {2003}
}

@article{alonkrivelevich1997,
  author  = {Alon, Noga and Krivelevich, Michael},
  title   = {The concentration of the chromatic number of random graphs},
  journal = {Combinatorica},
  volume  = {17},
  number  = {3},
  pages   = {303--313},
  year    = {1997}
}

@book{alonspencer2016,
  author    = {Alon, Noga and Spencer, Joel H.},
  title     = {The Probabilistic Method},
  edition   = {4},
  publisher = {Wiley-Interscience},
  year      = {2016}
}

@book{boucheronlugosimassart2013,
  author    = {Boucheron, St\'{e}phane and Lugosi, G\'{a}bor and Massart, Pascal},
  title     = {Concentration Inequalities: A Nonasymptotic Theory of Independence},
  publisher = {Oxford University Press},
  year      = {2013}
}

Prerequisites

40.07.01

Tier anchors

beginner: Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 7 (martingales and tight concentration, the chromatic number of a random graph, exposure martingales); a 'one change moves the answer by a little' analogy for how revealing information step by step pins a quantity near its average
intermediate: Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 7 §7.1-7.4 (Azuma's inequality, edge- and vertex-exposure martingales, Shamir-Spencer chromatic concentration, the four-point and isoperimetric viewpoints); McDiarmid 1989 *Surveys in Combinatorics* (LMS Lecture Notes 141, Cambridge) §3-§6 (the bounded-differences inequality and its applications)
master: Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 7 and §7.7 (Talagrand's inequality, convex distance, certifiability) and Ch. 8 (the LIS and the $\sqrt{\lambda}$ improvement); Azuma 1967 *Tôhoku Math. J.* 19 and Hoeffding 1963 *J. Amer. Statist. Assoc.* 58; Shamir-Spencer 1987 *Combinatorica* 7; McDiarmid 1989; Talagrand 1995 *Publ. Math. IHÉS* 81 (the isoperimetric/convex-distance inequality)

References

Alon, N. & Spencer, J. H. — The Probabilistic Method · Wiley, 4th edition (2016). Chapter 7 develops concentration of measure for combinatorial functionals: the Doob martingale obtained by exposing the input one coordinate at a time (vertex-exposure and edge-exposure martingales of a graph functional), Azuma's inequality for martingales with bounded increments, and its flagship application — the Shamir-Spencer theorem that the chromatic number of $G(n,p)$ is concentrated in an interval of length $O(\sqrt n)$ even when its location is unknown. The chapter treats the bounded-differences (McDiarmid) inequality for functions of independent inputs, the concentration of the number of triangles, the longest increasing subsequence, and chromatic-number sharpenings, and (§7.7) Talagrand's convex-distance inequality with its certifiability refinement that concentrates around the median and gives the $\sqrt{\lambda}$ (not $\sqrt n$) improvement for functions like the LIS. The isoperimetric reading of all three results is made explicit.
McDiarmid, C. — On the method of bounded differences · *Surveys in Combinatorics 1989*, London Mathematical Society Lecture Note Series 141 (Cambridge University Press), pp. 148-188. The canonical survey of the bounded-differences inequality: if $f(x_1,\dots,x_n)$ changes by at most $c_i$ when coordinate $i$ is altered, then for independent inputs $\Pr(|f - \mathbb{E}f| \ge t) \le 2\exp(-2t^2/\sum c_i^2)$. The proof packages the Azuma martingale argument for the Doob filtration of the independent coordinates, and the survey works the applications to balls-in-bins/occupancy, chromatic number, the assignment and travelling-salesman functionals, first-passage percolation, and the longest common subsequence.
Shamir, E. & Spencer, J. — Sharp concentration of the chromatic number on random graphs $G_{n,p}$ · *Combinatorica* 7 (1987), 121-129. The vertex-exposure Doob martingale of $\chi(G(n,p))$ has increments bounded by $1$ (one new vertex changes the chromatic number by at most one), so Azuma's inequality gives $\Pr(|\chi - \mathbb{E}\chi| > \lambda\sqrt{n-1}) < 2e^{-\lambda^2/2}$ — the chromatic number is concentrated in a window of width $O(\sqrt n)$ around its (unknown) mean. The argument needs no estimate of $\mathbb{E}\chi$ whatsoever.
Talagrand, M. — Concentration of measure and isoperimetric inequalities in product spaces · *Publications Mathématiques de l'IHÉS* 81 (1995), 73-205. The convex-distance inequality: for a product probability space and a set $A$, $\Pr(A)\,\Pr(d_T(\cdot,A) \ge t) \le e^{-t^2/4}$, where $d_T$ is the convex (Talagrand) distance. For a $1$-Lipschitz, $f$-certifiable functional this yields concentration around the *median* with deviation scale $\sqrt{f(\cdot)}$ rather than $\sqrt n$, improving the bounded-differences bound for the longest increasing subsequence and similar low-valued functionals.

Estimated time

beginner: 18m
intermediate: 46m
master: 88m