The Szemerédi Regularity Lemma and the Triangle Removal Lemma
Anchor (Master): Diestel 2017 *Graph Theory* (5th ed., Springer GTM 173) Ch. 7 (regularity, the counting and removal lemmas, the route to Szemerédi's theorem); Tao-Vu 2006 *Additive Combinatorics* (Cambridge Studies in Advanced Mathematics 105) Ch. 10-11 (the regularity and removal lemmas, Roth's theorem via triangle removal, hypergraph regularity); Gowers 1997 *Lower bounds of tower type for Szemerédi's uniformity lemma* (GAFA 7) for the tower-type lower bound; the original sources Szemerédi 1975/1978, Ruzsa-Szemerédi 1978, Roth 1953, Frankl-Rödl and Gowers (hypergraph regularity), Lovász-Szegedy (graph limits)
Intuition Beginner
Take an enormous graph, too big to look at edge by edge, and try to describe it in a few words. The Szemerédi regularity lemma says you always can: chop the dots into a small fixed number of equal-size groups so that, for almost every pair of groups, the lines between them are spread out evenly. Evenly means no secret clumps — zoom into any large chunk of one group and any large chunk of another, and the fraction of possible lines drawn is about the same as for the whole pair. The messy graph becomes a tidy summary: a few groups, and one density number per pair telling you how thickly they connect.
Why is "spread out evenly" so useful? Because a pair of groups joined evenly behaves like a graph drawn at random with that density. Random-like graphs are predictable: you can count how many triangles or other small shapes they hold just from the density numbers, without inspecting a single line. So the regularity lemma turns a wild, specific graph into something you can compute with as if it were random.
The headline payoff is the triangle removal lemma. It says that if a graph has only a tiny number of triangles — too few to matter at scale — then you can wipe out every last triangle by erasing only a tiny number of lines. Few triangles means almost triangle-free, and "almost" can be upgraded to "exactly" cheaply.
Visual Beginner
Picture a big blob of dots cut into four equal groups, . Between each pair of groups, draw lines. For a regular pair, the lines look the same everywhere: take any sizeable handful from one group and any sizeable handful from the other, and the fraction of pairs joined is close to the overall fraction for that pair of groups.
The table below records one density number for each pair of groups. A regular pair earns the label "even"; an irregular pair has a clump somewhere and gets flagged. The lemma promises that the flagged pairs are rare — at most a small fraction of all pairs.
| pair of groups | density (fraction of lines drawn) | even? |
|---|---|---|
| 0.30 | yes | |
| 0.62 | yes | |
| 0.45 | yes | |
| 0.18 | no (a clump) |
Three of these four pairs are even; only hides a clump. With more groups the picture is the same: nearly every pair is even, and the few uneven ones can be safely ignored when you count small shapes.
Worked example Beginner
Count the triangles in a clean three-group example using only the density numbers, the way the counting lemma lets you. Take three groups , each holding 100 dots. Suppose every pair of groups is joined evenly, and the fractions of lines drawn are: - has fraction , - has fraction , and - has fraction .
Step 1. Count the candidate triangles. A triangle here picks one dot from each group: ways to choose one dot per group.
Step 2. For each chosen triple, ask the chance all three connecting lines are present. Because each pair of groups is joined evenly with fraction , each of the three needed lines is present a fraction of the time, and an even (random-like) pair lets us multiply: .
Step 3. Multiply. The expected number of triangles is .
What this tells us: with even pairs, three density numbers settle the triangle count without ever listing a single line. The graph had a million possible triples and the densities alone predict about real triangles. This multiply-the-densities rule is the whole engine: it works precisely because "even" makes each pair behave like a random graph, and randomness lets probabilities multiply.
Check your understanding Beginner
Formal definition Intermediate+
Let be a graph and let be disjoint nonempty vertex sets. Write for the number of edges with one endpoint in each, and define the edge density [Diestel 2017 §7.4] $$ d(X, Y) = \frac{e(X, Y)}{|X|,|Y|} \in [0, 1]. $$ Fix . The pair is -regular if for all and with and one has . Regularity is the statement that the density between large subsets cannot deviate from the global density of the pair: the bipartite graph between and has no dense or sparse clump visible at scale , so it behaves like a random bipartite graph of density .
A partition of is an -regular partition (or equipartition) if (an exceptional set absorbing divisibility remainders), all other parts satisfy , and all but at most of the pairs with are -regular. The number of non-exceptional parts is the order of the partition.
The driving quantity is the energy (or index) of a partition , $$ q(\mathcal{P}) = \sum_{i < j} \frac{|V_i|,|V_j|}{|V|^2}, d(V_i, V_j)^2, $$ the mean square of the densities weighted by pair size. The energy is the mean-square edge density across the partition; it lies in because densities lie in and the weights sum to at most . Refining a partition cannot decrease the energy (a Cauchy-Schwarz / conditional-expectation inequality), and an irregular pair forces a strict, quantified increase under a suitable refinement. The boundedness of against this forced increment is what terminates the partitioning process and yields the lemma.
Counterexamples to common slips
- Regularity is not "the density of every subset equals ". The bound is two-sided and only constrains subsets of size , . Tiny subsets may have wildly different density; an -regular pair can contain an isolated vertex.
- An -regular partition is not a partition into -regular pairs. Up to an -fraction of pairs may be irregular. The lemma controls the number of bad pairs, not their absence.
- Low density does not imply regularity. A bipartite graph that is a perfect matching has density but is highly irregular: some large subset pairs have density and others much higher. Sparseness and evenness are different.
- The bound depends only on , not on or . This uniformity — one bound for all graphs — is the substance of the lemma and the source of its tower-type size.
Key theorem with proof Intermediate+
Theorem (Szemerédi regularity lemma). For every and every integer there is an integer such that every graph on at least vertices admits an -regular equipartition with . (See [Szemerédi 1978], [Diestel 2017 §7.4], [Tao, Vu 2006 Ch. 10].)
Proof (energy increment). Start with any equipartition into parts (plus an exceptional remainder ). Given an equipartition , either it is -regular and we stop, or more than of its pairs are irregular. For each irregular pair pick witnessing subsets , with and , .
The engine is the defect form of Cauchy-Schwarz. For a single pair , partitioning into and its complement and likewise raises the contribution of that pair to the energy: if is -irregular, the refined contribution exceeds . Concretely, writing the densities on the four cells, convexity of gives that the size-weighted mean of the squared cell densities is at least the square of the mean plus a defect term equal to the size-weighted variance, and a witnessing pair of relative density gap on subsets of relative size contributes variance in normalised units.
Now simultaneously refine every part using all the witness sets it participates in: a part is cut by the at most sets into at most cells. Refining a partition never lowers the energy, and the irregular pairs each contribute an extra . Summing over the more than irregular pairs, the energy rises by at least (after accounting for the near-equal part sizes). Then is replaced by a common equipartition refinement with the same energy gain up to a vanishing loss from re-equalising part sizes into the exceptional set.
Because lies in and rises by at least at each non-terminating step, the process stops after at most steps, at an -regular partition. Each step multiplies the number of parts by at most , so is bounded by a tower of 's of height ; this tower is .
Bridge. The regularity lemma builds toward the entire density theory of the chapter and appears again in the counting and removal lemmas below, in the modern proof of Erdős-Stone from 40.05.01, and in the density Ramsey theorems flagged in 40.05.04, because it converts an arbitrary dense graph into a bounded random-like model on which counting is routine. The foundational reason the proof terminates is the monotone-bounded energy: refinement never lowers the mean-square density, an irregular pair forces a quantified increment, and a quantity trapped in cannot increase by a fixed amount forever — this is exactly the index-increment scheme that recurs in the hypergraph and arithmetic regularity lemmas. The counting lemma is dual to the lemma itself: regularity produces the random-like model, and counting reads small-subgraph statistics off that model as if it were random. Putting these together, the regularity method is one strategy — partition until random-like, then count — and the bridge to additive combinatorics is the recognition that the same energy increment, transported to functions on abelian groups, yields the arithmetic regularity behind Roth's and Szemerédi's theorems.
Exercises Intermediate+
Advanced results Master
Theorem (counting lemma). Let be a graph on vertices and let be disjoint vertex sets in a graph . Suppose that for every edge the pair is -regular with density . Then the number of labelled copies of with the -th vertex in is $$ \Big(\prod_{ij \in E(H)} d_{ij} ;\pm; e(H),\varepsilon \Big)\prod_{i=1}^{h} |V_i| . $$ (See [Diestel 2017 §7.5], [Tao, Vu 2006 Ch. 11].) The proof embeds the vertices of one at a time, at each step restricting to the typical vertices whose neighbourhoods into the already-placed parts retain the expected density, which Exercise 3 guarantees lose only an -fraction per step; the regular pairs let each new edge contribute a factor to the count. The triangle case is the workhorse: a regular tripartite configuration of densities bounded below holds triangles.
Theorem (triangle removal lemma). For every there is such that every graph on vertices with at most triangles can be made triangle-free by deleting at most edges. (Ruzsa-Szemerédi [Ruzsa, Szemerédi 1978]; Diestel [Diestel 2017 §7.5].) The deduction is the cleaning argument of Exercise 6: regularise, delete edges in irregular pairs, sparse pairs, and inside parts (total ), and observe that any surviving triangle sits in a regular dense triple, which by the counting lemma forces triangles — so if the triangle count is below that threshold, the cleaned graph is triangle-free. The dependence of on is inverse-tower-type: grows like a tower of height polynomial in , inherited directly from , and no primitive-recursive bound was known until the work of Fox (2011), who removed the regularity lemma from the proof and obtained a tower of bounded height.
Theorem (graph removal lemma). For every fixed graph and every there is such that every -vertex graph with at most copies of can be made -free by deleting at most edges. (Erdős-Frankl-Rödl; Tao-Vu [Tao, Vu 2006 Ch. 11].) The same regularise-clean-count scheme applies, with the counting lemma for general replacing the triangle case; the removal lemma is thereby the canonical structural consequence of regularity, equivalent in strength to a qualitative form of property testing, since it certifies that "few copies of " and "close to -free in edit distance" coincide.
Theorem (lower bound; Gowers). The function in the regularity lemma must grow at least as fast as a tower of 's of height . (Gowers [Gowers 1997].) Gowers constructed graphs forcing the energy to climb in many small increments, so no regular partition with a primitive-recursive number of parts exists; the tower height is intrinsic, not an artefact of the proof. Consequently the removal lemma's cannot be improved to any fixed iterated-exponential rate while the regularity lemma is the engine, which is why Fox's tower-bounded removal lemma had to bypass regularity.
Theorem (sparse and hypergraph regularity, stated). A sparse-graph analogue holds relative to a pseudorandom or upper-regular host (Kohayakawa-Rödl, Conlon-Gowers, Schacht), licensing the transference of the removal method to sparse settings such as random graphs and the primes. The hypergraph regularity lemma of Frankl-Rödl and Gowers, together with its counting lemma, supplies the -uniform analogue from which Szemerédi's theorem — every positive-density subset of contains arbitrarily long arithmetic progressions — follows by the same removal scheme applied to -uniform hypergraphs encoding -term progressions. Layering the sparse transference over hypergraph regularity is one route to the Green-Tao theorem that the primes contain arbitrarily long arithmetic progressions, the relative Szemerédi theorem run against a pseudorandom majorant of the primes.
Synthesis. Putting these together, the regularity lemma, the counting lemma, and the removal lemma are one method seen at three depths: partition any dense graph into a bounded random-like model, read subgraph statistics off the model, and convert "few copies" into "few deletions to none". The foundational reason the method has tower-type cost is the energy increment — the same monotone-bounded quantity that drives the proof also forces, by Gowers's construction, a tower of parts, so the central insight is that boundedness-versus-increment is simultaneously the proof and its price. The counting lemma is dual to the regularity lemma, the two halves of "behaves like a random graph": regularity produces the model and counting exploits it, and the triangle removal lemma is exactly the counting lemma read contrapositively. This is exactly the engine behind Roth's theorem, where the Cartesian-product graph turns 3-term progressions into triangles and the -theorem of Ruzsa-Szemerédi packages the sharpness via Behrend's sets; and it generalises through hypergraph regularity to Szemerédi's theorem and, with sparse transference, to Green-Tao. The bridge to the analytic theory is the graphon: a regular partition is a finite approximation to a measurable limit object, so the regularity lemma is the combinatorial shadow of compactness in the cut metric, and the removal lemma is the statement that the homomorphism density of is continuous there.
Full proof set Master
Proposition 1 (defect Cauchy-Schwarz / energy monotonicity). Let be a partition of into measurable density cells and let refine . Then , with the increment equal to the size-weighted variance of the cell densities of within the cells of .
Proof. Fix a cell of , refined into cells by partitions , . Write , so , and . The edge count is additive, , so , the mean. By the variance identity, $$ \sum_{a,b} w_{ab} d_{ab}^2 = d(V_i,V_j)^2 + \sum_{a,b} w_{ab}\big(d_{ab} - d(V_i,V_j)\big)^2 \geq d(V_i,V_j)^2 . $$ Multiplying by and summing over the cells of gives , the gain being the stated size-weighted variance.
Proposition 2 (irregular pair forces an increment). If is -irregular, witnessed by , with , and , then refining by and by increases this pair's energy contribution by at least .
Proof. By Proposition 1 the increment is the size-weighted variance over the four cells, . Drop all terms except the cell: its weight is , and its deviation satisfies . Hence the variance is , and multiplying by gives the claim.
Proposition 3 (regularity lemma, quantitative termination). The energy-increment process reaches an -regular equipartition in at most steps, yielding with a tower of 's of height .
Proof. If an equipartition with parts is not -regular, more than pairs are irregular. Refining every part by all its witness sets, Proposition 2 contributes per irregular pair, and with parts of near-equal size , the total gain is (the ). Re-equalising the cells into a common equipartition, moving the surplus into , costs an energy loss tending to and keeps . Since and rises by each non-terminating step, there are at most steps. Each step replaces parts by at most cells, so iterating times bounds by a tower of height .
Proposition 4 (triangle counting lemma with explicit error). If are -regular with densities , the number of triangles with one vertex in each part is at least .
Proof. Call typical if and . By Exercise 3 applied to each of the pairs and (with ), at most vertices fail the first condition and at most the second, so at least vertices are typical. Fix a typical and set , ; then and since . As is -regular, , so the number of edges between and — each completing a triangle through — is . Summing over the typical gives the bound.
Proposition 5 (triangle removal lemma). For every there is with: any -vertex graph with at most triangles can be made triangle-free by deleting at most edges.
Proof. Choose and apply Proposition 3 to get an -regular equipartition into parts, exceptional. Delete edges of three types: (i) those meeting — at most ; (ii) those inside a part or between a pair of density — at most ; (iii) those in -irregular pairs — at most . For large and the total is . Any triangle of the cleaned graph has its three vertices in distinct parts , each pair -regular of density ; Proposition 4 then forces triangles in (these triangles already lie in , as cleaning only removes edges). Setting , a graph with triangles cannot have such a surviving regular dense triple, so is triangle-free.
Proposition 6 (Roth's theorem via removal). A set with , fixed, contains a non-degenerate 3-term arithmetic progression once is large.
Proof. Form the tripartite graph on with , , . A triangle yields , , with and , i.e. a 3-AP in . For each and each , the triple is a triangle (the diagonal family), giving triangles, and distinct give distinct edges , so these triangles are pairwise edge-disjoint. If had only degenerate 3-APs (those with ), every triangle would be diagonal, hence the triangles are edge-disjoint; making triangle-free then needs edge deletions. But has triangles, so by Proposition 5 it is made triangle-free with deletions — impossible for fixed and large . Hence has a non-degenerate 3-AP.
Connections Master
Extremal graph theory: Turán and Erdős-Stone-Simonovits
40.05.01. The regularity method gives the modern, conceptual proof of the Erdős-Stone theorem proved by hand in that unit: regularise a graph above the Turán density, find a regular reduced configuration of dense parts, and embed a thick by the counting lemma. The supersaturation and stability phenomena of40.05.01are the removal-lemma's relatives — supersaturation is the counting lemma read above threshold, and stability is the cut-metric closeness that regularity makes precise.Ramsey theory and the density turn
40.05.04. The regularity lemma is the tool Szemerédi invented to prove his arithmetic-progression theorem, the density apex toward which the partition-Ramsey results of40.05.04point; this unit supplies the case (Roth) via triangle removal, and hypergraph regularity supplies the general , so the removal method is the bridge from the colouring statements of40.05.04to their density refinements. Behrend's AP-free sets, which power the -theorem's sharpness here, are the same constructions that bound the Ramsey-type quantities there.The probabilistic method and quasirandomness
40.07.01. The counting lemma's content is that a regular pair is quasirandom — indistinguishable at scale from a random bipartite graph of the same density — so the second-moment and deletion arguments of that unit are the probabilistic skeleton of the regularity method. The removal lemma's near-sharp lower bound uses dense AP-free sets, the same extremal-versus-random tension that unit develops; regularity is the deterministic structure theorem dual to the random constructions there.
Historical & philosophical context Master
Endre Szemerédi introduced the regularity lemma as the central tool in his 1975 proof that every set of integers of positive upper density contains arbitrarily long arithmetic progressions, settling the 1936 conjecture of Erdős and Turán [Roth 1953 gives the $k=3$ case]; the lemma was isolated in its now-standard graph form in his 1978 Orsay colloquium paper Regular partitions of graphs [Szemerédi 1978]. The energy-increment proof, the boundedness of the mean-square density against a forced increment, is the engine, and the resulting bound on the number of parts is tower-type. The earliest application of the regularity philosophy to a removal statement is the 1978 paper of Imre Ruzsa and Szemerédi Triple systems with no six points carrying three triangles [Ruzsa, Szemerédi 1978], which proved the -theorem and, as a consequence, both the triangle removal lemma and a new proof of Roth's 1953 theorem [Roth 1953] on 3-term progressions.
W. T. Gowers proved in 1997 that the tower-type bound is unavoidable: must grow as a tower of height polynomial in [Gowers 1997], showing the lemma's apparent inefficiency is intrinsic. The hypergraph regularity method, developed by Peter Frankl and Vojtěch Rödl and independently by Gowers in the 2000s, extended the scheme to -uniform hypergraphs and yielded combinatorial proofs of Szemerédi's theorem for all progression lengths; Ben Green and Terence Tao used a sparse, relative form of these ideas in their 2008 proof that the primes contain arbitrarily long arithmetic progressions. The analytic completion came with the graph-limit theory of László Lovász and Balázs Szegedy, in which a regular partition is a finite approximation to a graphon, the measurable limit of a graph sequence in the cut metric, recasting the removal lemma as a continuity statement for homomorphism densities.
Bibliography Master
@book{Diestel2017,
author = {Diestel, Reinhard},
title = {Graph Theory},
edition = {5th},
series = {Graduate Texts in Mathematics},
volume = {173},
publisher = {Springer},
year = {2017}
}
@book{TaoVu2006,
author = {Tao, Terence and Vu, Van H.},
title = {Additive Combinatorics},
series = {Cambridge Studies in Advanced Mathematics},
volume = {105},
publisher = {Cambridge University Press},
year = {2006}
}
@incollection{Szemeredi1978,
author = {Szemer\'edi, Endre},
title = {Regular partitions of graphs},
booktitle = {Probl\`emes combinatoires et th\'eorie des graphes (Colloq. Internat. CNRS, Univ. Orsay, 1976)},
series = {Colloq. Internat. CNRS},
volume = {260},
publisher = {CNRS, Paris},
year = {1978},
pages = {399--401}
}
@incollection{RuzsaSzemeredi1978,
author = {Ruzsa, Imre Z. and Szemer\'edi, Endre},
title = {Triple systems with no six points carrying three triangles},
booktitle = {Combinatorics (Keszthely, 1976)},
series = {Colloq. Math. Soc. J\'anos Bolyai},
volume = {18},
publisher = {North-Holland},
year = {1978},
pages = {939--945}
}
@article{Gowers1997,
author = {Gowers, W. Timothy},
title = {Lower bounds of tower type for Szemer\'edi's uniformity lemma},
journal = {Geometric and Functional Analysis},
volume = {7},
year = {1997},
pages = {322--337}
}
@article{Roth1953,
author = {Roth, Klaus F.},
title = {On certain sets of integers},
journal = {Journal of the London Mathematical Society},
volume = {28},
year = {1953},
pages = {104--109}
}
@article{Fox2011,
author = {Fox, Jacob},
title = {A new proof of the graph removal lemma},
journal = {Annals of Mathematics},
volume = {174},
year = {2011},
pages = {561--579}
}
@article{LovaszSzegedy2006,
author = {Lov\'asz, L\'aszl\'o and Szegedy, Bal\'azs},
title = {Limits of dense graph sequences},
journal = {Journal of Combinatorial Theory, Series B},
volume = {96},
year = {2006},
pages = {933--957}
}
@article{GreenTao2008,
author = {Green, Ben and Tao, Terence},
title = {The primes contain arbitrarily long arithmetic progressions},
journal = {Annals of Mathematics},
volume = {167},
year = {2008},
pages = {481--547}
}