Concentration for Combinatorial Functionals: Azuma and the Bounded-Differences Method
Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 7 and §7.7 (Talagrand's inequality, convex distance, certifiability) and Ch. 8 (the LIS and the $\sqrt{\lambda}$ improvement); Azuma 1967 *Tôhoku Math. J.* 19 and Hoeffding 1963 *J. Amer. Statist. Assoc.* 58; Shamir-Spencer 1987 *Combinatorica* 7; McDiarmid 1989; Talagrand 1995 *Publ. Math. IHÉS* 81 (the isoperimetric/convex-distance inequality)
Intuition Beginner
Suppose a quantity depends on a long list of random choices — the colours of all the edges of a random network, the heights dealt to a deck of cards, the bin each of many balls lands in. You would like to know not just its average, but how tightly the quantity hugs that average. Does it wander all over, or does almost every random run land within a narrow band? For a surprising range of such quantities, the answer is: a narrow band, and you can prove it knowing almost nothing about the quantity itself.
The reason is a single, modest fact. If changing any one choice in the list — recolour one edge, swap one card, move one ball — can only nudge the final answer by a small amount, then the answer cannot swing wildly. Many small, independent nudges tend to cancel rather than pile up, the same way a thousand coin flips give a total very close to five hundred heads. The more choices there are, each with a small per-choice influence, the tighter the band around the average becomes, in relative terms.
This "one change moves it a little, so the whole thing stays put" principle is the heart of the bounded-differences method. It is powerful precisely because it asks so little: you never compute the average, you only check that no single input is too influential. The quantity then concentrates near wherever its average happens to be.
Visual Beginner
Imagine revealing the random choices one at a time and, at each step, recording your best current guess for the final answer — the average over all the choices still hidden. Before you reveal anything, your guess is the overall average. As each choice is uncovered, your guess jumps, but only by a little, because no single choice has much pull. By the time everything is revealed, your guess has walked, in small steps, all the way to the true answer.
| Step | What is revealed | What the running guess does |
|---|---|---|
| Start | nothing | sits at the overall average |
| Each step | one more choice | moves up or down by a little |
| End | everything | arrives at the true answer |
The walk takes many short steps, so it ends close to where it started: the answer concentrates near its average, in a band whose width grows like the square root of the number of steps, not the number itself.
Worked example Beginner
Throw balls independently into bins, each ball landing in a uniformly random bin. Let be the number of empty bins at the end. We will see why stays close to its average.
Step 1. Find the average. A fixed bin is missed by one ball with chance , and the nine balls are independent, so that bin is empty with chance . There are nine bins, so the average number of empty bins is . Numerically , so the average is about .
Step 2. Check the per-choice influence. Think of the random choices as the nine landing spots, one per ball. Move a single ball from one bin to another. This can change the empty-bin count by at most : the bin it left might become empty (a change of ) and the bin it entered might stop being empty (a change of ). No single ball can swing by more than .
Step 3. Read off concentration. Nine choices, each with influence at most , means the band around the average scales with . The bounded-differences rule then says a deviation of size from the average has chance at most .
Step 4. Plug in a number. For a deviation of empty bins, this chance is at most ; for it is at most .
What this tells us: without ever pinning the average exactly, the bare fact that one ball changes by at most already forces to sit within a handful of its mean on almost every throw.
Check your understanding Beginner
Formal definition Intermediate+
Throughout, is a probability space carrying a filtration , an increasing chain of -algebras recording how much information has been revealed. The martingale theory used here — conditional expectation, the tower property, and the martingale property itself — is imported from the cross-spine unit 37.04.01 and applied, not reproved.
Definition (Doob exposure martingale). Let be an integrable random variable measured by . Its Doob martingale relative to the filtration is the sequence , . By the tower property , so is a martingale with and . When is a function of the inputs and records the first inputs, the steps are the running conditional means as the inputs are exposed one at a time. For a graph functional on two filtrations are standard: the vertex-exposure martingale, where reveals all edges among the first vertices, and the edge-exposure martingale, where reveals the first edges in a fixed order.
Definition (bounded increments). The martingale has increments bounded by if almost surely for each .
Definition (Lipschitz / bounded-differences functional). A function satisfies the bounded-differences condition with constants if, for every coordinate and any two inputs differing only in coordinate , $$ |f(x) - f(x')| \le c_j . $$ Equivalently, is -Lipschitz in its -th coordinate with respect to the discrete metric.
Definition (certifiability). For on a product space and a monotone gauge , is -certifiable if whenever there is an index set with such that every agreeing with on also has : a witness of size certifies the value. The longest increasing subsequence is -certifiable with , the increasing subsequence itself being the witness. Certifiability is the hypothesis that upgrades Talagrand's inequality from a bound at scale to one at scale [Talagrand 1995].
Counterexamples to common slips Intermediate+
"Bounded differences needs the coordinates to be identically distributed." It does not. The inputs need only be independent; they may live in different spaces with different laws. Only the per-coordinate Lipschitz constants enter the bound.
"A martingale with small steps must have small range, so concentration is automatic." Small steps bound the range by , which can be large; the gain is that the typical deviation is only . Azuma's exponential bound, not the triangle inequality, supplies that square-root scale.
"McDiarmid follows from independence of 's contributions." The function need not decompose as a sum; may be a wildly non-linear functional (chromatic number, LIS). What is used is only that each coordinate's marginal effect is bounded, which makes the Doob increments bounded.
"Concentration around the mean and around the median are interchangeable." For the bounded-differences scale they differ by and may be conflated; for Talagrand's scale the median is the natural centre and the mean-to-median gap must be controlled separately.
Key theorem with proof Intermediate+
The signature result is the Azuma-Hoeffding inequality: a martingale whose steps are bounded deviates from its start by at most the square root of the summed squared step-bounds, with Gaussian tails. The combinatorial method of this unit is the systematic application of this one inequality to exposure martingales.
Theorem (Azuma-Hoeffding). Let be a martingale with almost surely for each . Then for every , $$ \Pr\big(Z_n - Z_0 \ge t\big) \le \exp!\Big(\frac{-t^2}{2\sum_{i=1}^n c_i^2}\Big), \qquad \Pr\big(|Z_n - Z_0| \ge t\big) \le 2\exp!\Big(\frac{-t^2}{2\sum_{i=1}^n c_i^2}\Big). $$ [Azuma 1967]
Proof. Write , so and . The engine is the conditional Hoeffding lemma: if is a random variable with and , then for any , . To see it, convexity of on gives ; taking conditional expectation and using leaves , the last step from comparing Taylor series term by term.
Now bound the moment generating function of by conditioning outward-in. For , $$ \mathbb{E}\big[e^{\lambda(Z_n - Z_0)}\big] = \mathbb{E}\Big[e^{\lambda \sum_{i<n} D_i},\mathbb{E}\big[e^{\lambda D_n}\mid \mathcal{F}{n-1}\big]\Big] \le e^{\lambda^2 c_n^2 / 2},\mathbb{E}\big[e^{\lambda \sum{i<n} D_i}\big], $$ since is -measurable and the conditional Hoeffding lemma bounds the inner factor. Iterating over , $$ \mathbb{E}\big[e^{\lambda(Z_n - Z_0)}\big] \le \exp!\Big(\tfrac{\lambda^2}{2}\sum_{i=1}^n c_i^2\Big). $$ Markov's inequality applied to gives . Optimising over at yields . The two-sided bound follows by applying the one-sided bound to the martingale .
Corollary (McDiarmid's bounded-differences inequality). Let be independent and satisfy the bounded-differences condition with constants . Then obeys . Proof. Take the Doob martingale . Independence and the bounded-differences condition give increments bounded by — exposing moves the conditional mean by at most the coordinate's Lipschitz constant — and a sharper coupling improves the constant in the exponent from to . Azuma applied to then delivers the bound, with and [McDiarmid 1989].
Bridge. Azuma-Hoeffding is the foundational reason that combinatorial functionals concentrate without any computation of their means: it converts a purely local hypothesis — each exposure step moves the running conditional mean by at most — into a global Gaussian tail at scale , and this is exactly the move that the bounded-differences method industrialises by reading off the from a one-coordinate Lipschitz check. The construction builds toward the Shamir-Spencer chromatic concentration of the Advanced results, where the increments are bounded by for free, and the same exposure-martingale template appears again in the triangle-count, first-passage-percolation, and longest-increasing-subsequence applications below. Putting these together, the unit is the combinatorial face of one inequality: where the large-deviations spine 37.07.02 extracts the exact exponential rate for sums of independent variables, the Azuma method trades that precision for the freedom to handle arbitrary bounded-difference functionals, so the bridge is that both are Cramér-type bounds, one sharp and structured, the other crude and universal.
Exercises Intermediate+
Advanced results Master
The exposure martingale and Azuma's inequality open into a graded family: the choice of filtration tunes the increment bounds, the bounded-differences inequality handles arbitrary independent-input functionals, and Talagrand's convex-distance inequality replaces the global coordinate count by a local, value-dependent budget for functionals that are cheaply certifiable.
Theorem 1 (Shamir-Spencer chromatic concentration). For with any , the chromatic number satisfies , so is concentrated in a window of width about its mean, with no estimate of the mean required [Shamir-Spencer 1987]. The vertex-exposure martingale has increments bounded by because adding a vertex changes by at most one; the edge-exposure martingale, with steps each of bound , gives the far weaker scale , so the choice of filtration is the substantive design decision. For sparse , , Łuczak (1991) and Alon-Krivelevich (1997) sharpened the window to a single value or two consecutive values — two-point concentration — by combining the martingale bound with a structural argument bounding the number of colours, a phenomenon the martingale alone cannot reach.
Theorem 2 (triangle-count concentration). Let be the number of triangles in . The edge-exposure martingale of has increments bounded by (one edge lies in at most triangles), giving the bounded-differences bound . This is sharp only in a limited range; the true upper-tail rate for triangle counts is governed by the polynomial concentration of Kim-Vu and the large-deviation rate of Chatterjee-Dembo and Lubetzky-Zhao, where the dominant strategy is the planting of a clique or a hub of high-degree vertices rather than a uniform shift — a regime where the bounded-differences philosophy of uniformly small increments breaks down [Alon-Spencer 2016].
Theorem 3 (first-passage percolation). Assign independent edge weights to the edges of (or ), and let be the passage time. If the weights are bounded in , then is -Lipschitz in each edge weight along any geodesic of bounded length , so McDiarmid gives . The bound captures the correct order in many regimes but not the sublinear of Benjamini-Kalai-Schramm, whose proof replaces the martingale by a hypercontractive (Fourier-analytic) variance bound — the next layer above Azuma.
Theorem 4 (Talagrand's convex-distance inequality). Let carry a product measure. For and , define the convex distance . Then [Talagrand 1995]. For a -Lipschitz, -certifiable functional with median , this gives and , so the deviation scale is rather than . For the LIS with and , the fluctuation scale drops from to ; for the assignment and Euclidean-TSP functionals the same mechanism gives the right concentration order. The inequality is an isoperimetric statement: it says product measure concentrates on the convex-distance neighbourhood of any set of non-negligible measure, the discrete-product analogue of Gaussian and spherical isoperimetry.
Theorem 5 (the isoperimetric viewpoint). Each concentration result is a statement that a -Lipschitz function on a metric product space is nearly constant. Azuma corresponds to the Hamming-metric vertex isoperimetry of the cube, where the extremal sets are subcubes and the deviation scale is ; Talagrand's is a strictly finer, convexified metric whose neighbourhoods are smaller, so its concentration is stronger. The chain Azuma McDiarmid Talagrand is a chain of progressively sharper isoperimetric inequalities on product spaces, and the entropy method of the co-produced unit on log-Sobolev/entropy concentration is a fourth route to the same family, often with the sharpest constants [McDiarmid 1989].
Synthesis. The bounded-differences method is one inequality wearing many combinatorial costumes: Azuma-Hoeffding turns the local hypothesis each exposure step moves the conditional mean by at most into the global Gaussian tail at scale , and the foundational reason this works is that mean-zero bounded increments have sub-Gaussian moment generating functions that multiply along the Doob filtration. The chromatic, triangle, percolation, and LIS results are exactly this identity applied to different exposure martingales, and the choice of filtration — vertex versus edge — is what sets the increment bounds and hence the scale, which is why Shamir-Spencer's window is dual to the cruder that edge exposure would give. The central insight is that Talagrand's inequality generalises bounded differences by replacing the uniform coordinate budget with the convex distance, a value-dependent budget that for certifiable functionals collapses to the witness size ; this is exactly the improvement for the LIS, and it is the isoperimetric statement that product measure concentrates on convex-distance neighbourhoods. Putting these together, concentration of combinatorial measure is a tower — Azuma, then McDiarmid, then Talagrand, then the entropy method — and the bridge upward at each level is a sharper isoperimetry on the same product space, trading the universal Lipschitz bound for ever more local control, while the large-deviations spine 37.07.02 runs alongside as the structured, exact-rate counterpart for the special case of independent sums.
Full proof set Master
Proposition 1 (conditional Hoeffding lemma). If and almost surely, then for all real .
Proof. For , convexity of bounds it below its chord: . Taking and using removes the term linear in , leaving . Comparing power series, , because termwise. Hence .
Proposition 2 (Azuma-Hoeffding). For a martingale with almost surely, .
Proof. Set , so and . By Proposition 1 with , . Conditioning on and pulling out the measurable factor, $$ \mathbb{E}[e^{\lambda(Z_n - Z_0)}] = \mathbb{E}\big[e^{\lambda(Z_{n-1}-Z_0)},\mathbb{E}[e^{\lambda D_n}\mid\mathcal{F}{n-1}]\big] \le e^{\lambda^2 c_n^2/2},\mathbb{E}[e^{\lambda(Z{n-1}-Z_0)}]. $$ Iterating down to gives . Markov's inequality on yields ; minimising the exponent at gives .
Proposition 3 (McDiarmid bounded-differences inequality). Let be independent and satisfy whenever differ only in coordinate . Then .
Proof. Let and , the Doob martingale, with , . Define the conditional range of step , $$ \ell_i = \inf_{u}\mathbb{E}[f \mid \mathcal{F}{i-1}, \xi_i = u], \qquad h_i = \sup{u}\mathbb{E}[f \mid \mathcal{F}{i-1}, \xi_i = u]. $$ By independence the conditional expectation given integrates the remaining coordinates against their unchanged product law, so for any two values the integrands differ pointwise by at most (the bounded-differences hypothesis in coordinate ); hence . The increment $D_i = Z_i - Z{i-1}[\ell_i - Z_{i-1}, h_i - Z_{i-1}]\le c_i0c_i\mathbb{E}[e^{\lambda D_i}\mid\mathcal{F}_{i-1}] \le e^{\lambda^2 c_i^2/8}\Pr(f - \mathbb{E}f \ge t) \le \exp(-2t^2/\sum_j c_j^2)-f\square$
Proposition 4 (Shamir-Spencer chromatic concentration). For , .
Proof. Order the vertices and let be generated by the edges within ; put , so , . For graphs on the same vertex set differing only in the edges incident to , a proper colouring of is a proper colouring of , and at most one extra colour is needed to colour in either; hence . Conditioning on — which fixes the edges among — and integrating out the edges from to earlier vertices against their product law, the pointwise bound passes to . The first vertex contributes no internal edge, so and only increments are nonzero: . Proposition 2 (two-sided) gives .
Proposition 5 (Talagrand certifiability corollary, lower tail). Let be -Lipschitz and -certifiable on a product space, with median . Then .
Proof. Let . The claim is that any with has . Take achieving the inner minimum in . Since , certifiability supplies an index set with such that any agreeing with on has . Form by overwriting with on the coordinates of where they disagree; then , while -Lipschitzness gives . Hence , so the weight vector (unit norm) certifies , whence . By the convex-distance inequality, . As by the median property, .
Connections Master
The martingale theory this unit consumes — conditional expectation, the tower property, and the Doob martingale of an integrable variable along a filtration — is developed in
37.04.01, with the Doob and maximal inequalities in37.04.03; the exposure martingales here are precisely Doob martingales for the vertex- and edge-revealing filtrations, and the foundational reason Azuma applies is that those increments are mean-zero and bounded, exactly the structure the optional-stopping and convergence theory of that spine is built on.The Azuma-Hoeffding bound is the bounded-difference, structure-free cousin of the large-deviations machinery of
37.07.02: Cramér's theorem extracts the exact exponential rate for sums of independent variables, where Azuma settles for the universal Gaussian surrogate obtained by bounding every coordinate's moment generating function uniformly; both are Chernoff-method arguments, and Azuma is what survives when the functional is an arbitrary bounded-difference map rather than a sum.The chromatic-number concentration proved here pairs with the second-moment estimates of the chromatic number and clique number of in
40.07.03: the second moment locates up to lower-order terms by counting colourings and independent sets, while Azuma certifies that sits in an window around that (still imprecisely located) mean; the two results are complementary halves — one fixes where the window is, the other fixes how narrow it is.The entropy (log-Sobolev) method of the co-produced unit
40.07.06is a fourth, often sharpest, route to the same concentration family: where Azuma multiplies bounded conditional moment generating functions along the Doob filtration, the entropy method tensorises a single entropy inequality across independent coordinates, recovering McDiarmid-type bounds and the self-bounding refinements with frequently better constants, and reaching variance proxies the martingale increments cannot see.
Historical & philosophical context Master
The exponential bound for sums of bounded independent variables is due to Wassily Hoeffding's 1963 paper in the Journal of the American Statistical Association [Hoeffding 1963], which established the moment-generating-function inequality at the heart of the method. Kazuoki Azuma's 1967 note in the Tôhoku Mathematical Journal [Azuma 1967] extended the bound from independent sums to martingales with bounded increments, the form combinatorics uses, though the same inequality appears in unpublished work of Hoeffding and was anticipated by Sergei Bernstein. The decisive combinatorial application is Shamir and Spencer's 1987 Combinatorica paper [Shamir-Spencer 1987], which introduced the vertex-exposure martingale and proved that the chromatic number of is concentrated in an interval without any knowledge of its location — a result that startled the field precisely because it certified concentration of a quantity whose value was, and largely remains, unknown.
Colin McDiarmid's 1989 survey [McDiarmid 1989] packaged the martingale argument as the bounded-differences inequality, a black box requiring only a one-coordinate Lipschitz check, and catalogued its applications across occupancy, percolation, and combinatorial optimisation. Michel Talagrand's 1995 Publications IHÉS memoir [Talagrand 1995] reconceived concentration as product-space isoperimetry, defining the convex distance and proving the inequality that, through certifiability, sharpens the longest-increasing-subsequence and Euclidean-functional bounds from scale to scale . The tower from Hoeffding-Azuma through McDiarmid to Talagrand, with the later entropy method of Ledoux and Boucheron-Lugosi-Massart, constitutes the modern theory of concentration of measure as a unifying lens on combinatorial functionals.
Bibliography Master
@article{hoeffding1963,
author = {Hoeffding, Wassily},
title = {Probability inequalities for sums of bounded random variables},
journal = {Journal of the American Statistical Association},
volume = {58},
number = {301},
pages = {13--30},
year = {1963}
}
@article{azuma1967,
author = {Azuma, Kazuoki},
title = {Weighted sums of certain dependent random variables},
journal = {T\^{o}hoku Mathematical Journal},
volume = {19},
number = {3},
pages = {357--367},
year = {1967}
}
@article{shamirspencer1987,
author = {Shamir, Eli and Spencer, Joel},
title = {Sharp concentration of the chromatic number on random graphs $G_{n,p}$},
journal = {Combinatorica},
volume = {7},
number = {1},
pages = {121--129},
year = {1987}
}
@incollection{mcdiarmid1989,
author = {McDiarmid, Colin},
title = {On the method of bounded differences},
booktitle = {Surveys in Combinatorics 1989},
series = {London Mathematical Society Lecture Note Series},
volume = {141},
publisher = {Cambridge University Press},
pages = {148--188},
year = {1989}
}
@article{talagrand1995,
author = {Talagrand, Michel},
title = {Concentration of measure and isoperimetric inequalities in product spaces},
journal = {Publications Math\'{e}matiques de l'IH\'{E}S},
volume = {81},
pages = {73--205},
year = {1995}
}
@article{bks2003,
author = {Benjamini, Itai and Kalai, Gil and Schramm, Oded},
title = {First passage percolation has sublinear distance variance},
journal = {Annals of Probability},
volume = {31},
number = {4},
pages = {1970--1978},
year = {2003}
}
@article{alonkrivelevich1997,
author = {Alon, Noga and Krivelevich, Michael},
title = {The concentration of the chromatic number of random graphs},
journal = {Combinatorica},
volume = {17},
number = {3},
pages = {303--313},
year = {1997}
}
@book{alonspencer2016,
author = {Alon, Noga and Spencer, Joel H.},
title = {The Probabilistic Method},
edition = {4},
publisher = {Wiley-Interscience},
year = {2016}
}
@book{boucheronlugosimassart2013,
author = {Boucheron, St\'{e}phane and Lugosi, G\'{a}bor and Massart, Pascal},
title = {Concentration Inequalities: A Nonasymptotic Theory of Independence},
publisher = {Oxford University Press},
year = {2013}
}