40.05.03 · combinatorics / extremal-ramsey

The Szemerédi Regularity Lemma and the Triangle Removal Lemma

shipped · 3 tiers · Lean: none

Anchor (Master): Diestel 2017 *Graph Theory* (5th ed., Springer GTM 173) Ch. 7 (regularity, the counting and removal lemmas, the route to Szemerédi's theorem); Tao-Vu 2006 *Additive Combinatorics* (Cambridge Studies in Advanced Mathematics 105) Ch. 10-11 (the regularity and removal lemmas, Roth's theorem via triangle removal, hypergraph regularity); Gowers 1997 *Lower bounds of tower type for Szemerédi's uniformity lemma* (GAFA 7) for the tower-type lower bound; the original sources Szemerédi 1975/1978, Ruzsa-Szemerédi 1978, Roth 1953, Frankl-Rödl and Gowers (hypergraph regularity), Lovász-Szegedy (graph limits)

Intuition Beginner

Take an enormous graph, too big to look at edge by edge, and try to describe it in a few words. The Szemerédi regularity lemma says you always can: chop the dots into a small fixed number of equal-size groups so that, for almost every pair of groups, the lines between them are spread out evenly. Evenly means no secret clumps — zoom into any large chunk of one group and any large chunk of another, and the fraction of possible lines drawn is about the same as for the whole pair. The messy graph becomes a tidy summary: a few groups, and one density number per pair telling you how thickly they connect.

Why is "spread out evenly" so useful? Because a pair of groups joined evenly behaves like a graph drawn at random with that density. Random-like graphs are predictable: you can count how many triangles or other small shapes they hold just from the density numbers, without inspecting a single line. So the regularity lemma turns a wild, specific graph into something you can compute with as if it were random.

The headline payoff is the triangle removal lemma. It says that if a graph has only a tiny number of triangles — too few to matter at scale — then you can wipe out every last triangle by erasing only a tiny number of lines. Few triangles means almost triangle-free, and "almost" can be upgraded to "exactly" cheaply.

Visual Beginner

Picture a big blob of dots cut into four equal groups, $V_{1}, V_{2}, V_{3}, V_{4}$ . Between each pair of groups, draw lines. For a regular pair, the lines look the same everywhere: take any sizeable handful from one group and any sizeable handful from the other, and the fraction of pairs joined is close to the overall fraction for that pair of groups.

The table below records one density number for each of four of the six pairs of groups. A regular pair earns the label "even"; an irregular pair has a clump somewhere and gets flagged. The lemma promises that the flagged pairs are rare: at most a small fraction of all pairs.

pair of groups	density (fraction of lines drawn)	even?
$V_{1}, V_{2}$	0.30	yes
$V_{1}, V_{3}$	0.62	yes
$V_{2}, V_{3}$	0.45	yes
$V_{3}, V_{4}$	0.18	no (a clump)

Three of these four pairs are even; only $V_{3}, V_{4}$ hides a clump. With more groups the picture is the same: nearly every pair is even, and the few uneven ones can be safely ignored when you count small shapes.

Worked example Beginner

Count the triangles in a clean three-group example using only the density numbers, the way the counting lemma lets you. Take three groups $V_{1}, V_{2}, V_{3}$ , each holding 100 dots. Suppose every pair of groups is joined evenly, and the fractions of lines drawn are: $V_{1}$ - $V_{2}$ has fraction $0.5$ , $V_{1}$ - $V_{3}$ has fraction $0.5$ , and $V_{2}$ - $V_{3}$ has fraction $0.5$ .

Step 1. Count the candidate triangles. A triangle here picks one dot from each group: $100 \times 100 \times 100 = 1{,}000{,}000$ ways to choose one dot per group.

Step 2. For each chosen triple, ask the chance all three connecting lines are present. Because each pair of groups is joined evenly with fraction $0.5$ , each of the three needed lines is present a fraction $0.5$ of the time, and an even (random-like) pair lets us multiply: $0.5 \times 0.5 \times 0.5 = 0.125$ .

Step 3. Multiply. The expected number of triangles is $1{,}000{,}000 \times 0.125 = 125{,}000$ .

What this tells us: with even pairs, three density numbers settle the triangle count without ever listing a single line. The graph had a million possible triples and the densities alone predict about $125{,}000$ real triangles. This multiply-the-densities rule is the whole engine: it works precisely because "even" makes each pair behave like a random graph, and randomness lets probabilities multiply.
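The multiply-the-densities rule can be tried out on a randomly drawn example. The sketch below is an illustration only: it assumes each line between two groups is drawn independently with probability $0.5$ (one way to get "even" pairs), and the helper names `random_bipartite` and `count_triangles` and the seed are arbitrary choices, not part of the text above.

```python
import random

def random_bipartite(n_left, n_right, p, rng):
    """One bitmask per left dot: bit v is set when the line to right dot v is drawn."""
    rows = []
    for _ in range(n_left):
        mask = 0
        for v in range(n_right):
            if rng.random() < p:
                mask |= 1 << v
        rows.append(mask)
    return rows

def count_triangles(adj12, adj13, adj23, n2):
    """Count triples (a, b, c), one dot per group, with all three lines present."""
    total = 0
    for a in range(len(adj12)):
        for b in range(n2):
            if (adj12[a] >> b) & 1:
                # dots of group 3 joined to both a and b
                total += bin(adj13[a] & adj23[b]).count("1")
    return total

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
n, p = 100, 0.5
adj12 = random_bipartite(n, n, p, rng)
adj13 = random_bipartite(n, n, p, rng)
adj23 = random_bipartite(n, n, p, rng)

triangles = count_triangles(adj12, adj13, adj23, n)
predicted = n ** 3 * p ** 3  # 1,000,000 triples times 0.125
print("counted:", triangles, "predicted:", predicted)
```

The counted value should land within a few percent of the predicted $125{,}000$; the exact figure depends on the random draw.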

Formal definition Intermediate+

Let $G = (V, E)$ be a graph and let $X, Y \subseteq V$ be disjoint nonempty vertex sets. Write $e (X, Y)$ for the number of edges with one endpoint in each, and define the edge density ^{[Diestel 2017 §7.4]} $$ d(X, Y) = \frac{e(X, Y)}{|X|\,|Y|} \in [0, 1]. $$ Fix $ε > 0$ . The pair $(X, Y)$ is $ε$ -regular if for all $A \subseteq X$ and $B \subseteq Y$ with $∣ A ∣ \geq ε ∣ X ∣$ and $∣ B ∣ \geq ε ∣ Y ∣$ one has $∣ d (A, B) - d (X, Y) ∣ \leq ε$ . Regularity is the statement that the density between large subsets cannot deviate from the global density of the pair: the bipartite graph between $X$ and $Y$ has no dense or sparse clump visible at scale $ε$ , so it behaves like a random bipartite graph of density $d (X, Y)$ .
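For very small pairs the definition can be checked by brute force. The following is a minimal sketch, assuming the bipartite graph is given as a set of `(x, y)` tuples; the function names (`density`, `large_subsets`, `is_eps_regular`) and the two test graphs are hypothetical illustrations. It enumerates every admissible pair of subsets, so it is exponential in $∣ X ∣ + ∣ Y ∣$ and only meant to make the quantifiers concrete.

```python
import math
from fractions import Fraction
from itertools import combinations

def density(edges, A, B):
    """Edge density d(A, B) = e(A, B) / (|A| |B|), as an exact fraction."""
    e = sum(1 for a in A for b in B if (a, b) in edges)
    return Fraction(e, len(A) * len(B))

def large_subsets(S, min_size):
    """All subsets of S with at least min_size (and at least one) elements."""
    for r in range(max(1, min_size), len(S) + 1):
        yield from combinations(S, r)

def is_eps_regular(edges, X, Y, eps):
    """Brute-force test of |d(A, B) - d(X, Y)| <= eps over all large A, B."""
    d = density(edges, X, Y)
    min_a = math.ceil(eps * len(X))
    min_b = math.ceil(eps * len(Y))
    for A in large_subsets(X, min_a):
        for B in large_subsets(Y, min_b):
            if abs(density(edges, A, B) - d) > eps:
                return False
    return True

X = ["x0", "x1", "x2", "x3"]
Y = ["y0", "y1", "y2", "y3"]
eps = Fraction(1, 4)

# Every pair joined: all densities equal 1, so the pair is regular.
complete = {(x, y) for x in X for y in Y}
# Two disjoint complete blocks: overall density 1/2, but sub-blocks have density 0 or 1.
two_blocks = {(x, y) for i, x in enumerate(X) for j, y in enumerate(Y) if i // 2 == j // 2}

print(is_eps_regular(complete, X, Y, eps))
print(is_eps_regular(two_blocks, X, Y, eps))
```

The first graph passes and the second fails: the singleton pair `("x0",)`, `("y0",)` already has density $1$ against a global density of $1/2$.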

A partition $\{V_{0}, V_{1}, \dots, V_{k}\}$ of $V$ is an $ε$ -regular partition (or equipartition) if $∣ V_{0} ∣ \leq ε ∣ V ∣$ (an exceptional set absorbing divisibility remainders), all other parts satisfy $∣ V_{1} ∣ = \dots = ∣ V_{k} ∣$ , and all but at most $ε \binom{k}{2}$ of the pairs $(V_{i}, V_{j})$ with $1 \leq i < j \leq k$ are $ε$ -regular. The number $k$ of non-exceptional parts is the order of the partition.

The driving quantity is the energy (or index) of a partition $\mathcal{P} = \{V_{1}, \dots, V_{k}\}$ , $$ q(\mathcal{P}) = \sum_{i < j} \frac{|V_i|\,|V_j|}{|V|^2}\, d(V_i, V_j)^2, $$ the mean square of the pair densities, weighted by pair size. It lies in $[0, 1]$ because densities lie in $[0, 1]$ and the weights sum to at most $1$ (indeed to at most $1/2$ , since each unordered pair is counted once). Refining a partition cannot decrease the energy (a Cauchy-Schwarz / conditional-expectation inequality), and an irregular pair forces a strict, quantified increase under a suitable refinement. The boundedness of $q$ against this forced increment is what terminates the partitioning process and yields the lemma.
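A small computation makes the monotonicity claim concrete. This is a sketch under stated assumptions: the graph is stored as a set of `frozenset` edges, the 8-vertex example graph is made up for illustration, and `energy` follows the formula above with the sum running over unordered pairs of parts.

```python
from fractions import Fraction
from itertools import combinations

def density(edges, A, B):
    """d(A, B) for disjoint vertex lists A, B in an undirected graph."""
    e = sum(1 for a in A for b in B if frozenset((a, b)) in edges)
    return Fraction(e, len(A) * len(B))

def energy(edges, parts, n):
    """q(P) = sum over pairs i < j of |V_i||V_j| / n^2 * d(V_i, V_j)^2."""
    q = Fraction(0)
    for Vi, Vj in combinations(parts, 2):
        q += Fraction(len(Vi) * len(Vj), n * n) * density(edges, Vi, Vj) ** 2
    return q

# Illustrative graph on vertices 0..7: {0,1} fully joined to {4,5}, {2,3} to {6,7}.
edges = {frozenset((a, b)) for a in (0, 1) for b in (4, 5)}
edges |= {frozenset((a, b)) for a in (2, 3) for b in (6, 7)}

coarse = [[0, 1, 2, 3], [4, 5, 6, 7]]      # one pair of density 1/2
fine = [[0, 1], [2, 3], [4, 5], [6, 7]]    # refinement that exposes the two blocks

q_coarse = energy(edges, coarse, 8)
q_fine = energy(edges, fine, 8)
print(q_coarse, q_fine)  # 1/16 1/8
```

The coarse pair is irregular (its sub-blocks have densities $0$ and $1$), and splitting along the clumps doubles the energy from $1/16$ to $1/8$.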

Counterexamples to common slips

Regularity is not "the density of every subset equals $d (X, Y)$ ". The bound $ε$ is two-sided and only constrains subsets of size $\geq ε ∣ X ∣$ , $\geq ε ∣ Y ∣$ . Tiny subsets may have wildly different density; an $ε$ -regular pair can contain an isolated vertex.
An $ε$ -regular partition is not a partition into $ε$ -regular pairs. Up to an $ε$ -fraction of pairs may be irregular. The lemma controls the number of bad pairs, not their absence.
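Density alone does not decide regularity. A perfect matching between $X$ and $Y$ has density $1/∣ X ∣$ and is in fact $ε$ -regular once $∣ X ∣ \geq ε^{- 2}$ : subsets $A$ , $B$ of relative size $\geq ε$ span at most $\min (∣ A ∣, ∣ B ∣)$ edges, so $d (A, B) \leq 1/ (ε ∣ X ∣) \leq ε$ . By contrast, the disjoint union of two complete bipartite graphs on the halves of $X$ and $Y$ has density $1/2$ but is highly irregular: some large subset pairs have density $0$ and others density $1$ . Density and evenness are different.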
The bound $M (ε)$ depends only on $ε$ , not on $G$ or $n$ . This uniformity — one bound for all graphs — is the substance of the lemma and the source of its tower-type size.

Key theorem with proof Intermediate+

Theorem (Szemerédi regularity lemma). For every $ε > 0$ and every integer $m \geq 1$ there is an integer $M = M (ε, m)$ such that every graph $G$ on at least $M$ vertices admits an $ε$ -regular equipartition $\{V_{0}, V_{1}, \dots, V_{k}\}$ with $m \leq k \leq M$ . (See ^{[Szemerédi 1978]}, ^{[Diestel 2017 §7.4]}, ^{[Tao, Vu 2006 Ch. 10]}.)

Proof (energy increment). Start with any equipartition $P_{0}$ into $k_{0} = m$ parts (plus an exceptional remainder $V_{0}$ ). Given an equipartition $P$ , either it is $ε$ -regular and we stop, or more than $ε \binom{k}{2}$ of its pairs are irregular. For each irregular pair $(V_{i}, V_{j})$ pick witnessing subsets $A_{ij} \subseteq V_{i}$ , $A_{j i} \subseteq V_{j}$ with $∣ d (A_{ij}, A_{j i}) - d (V_{i}, V_{j}) ∣ > ε$ and $∣ A_{ij} ∣ \geq ε ∣ V_{i} ∣$ , $∣ A_{j i} ∣ \geq ε ∣ V_{j} ∣$ .

The engine is the defect form of Cauchy-Schwarz. For a single pair $(V_{i}, V_{j})$ , partitioning $V_{i}$ into $A_{ij}$ and its complement and likewise $V_{j}$ raises the contribution of that pair to the energy: if $(V_{i}, V_{j})$ is $ε$ -irregular, the refined contribution exceeds $\frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}} (d (V_{i}, V_{j})^{2} + ε^{4})$ . Concretely, writing the densities on the four cells, convexity of $t \mapsto t^{2}$ gives that the size-weighted mean of the squared cell densities is at least the square of the mean plus a defect term equal to the size-weighted variance, and a witnessing pair of relative density gap $> ε$ on subsets of relative size $\geq ε$ contributes variance $\geq ε^{2} \cdot ε^{2} = ε^{4}$ in normalised units.

Now simultaneously refine every part using all the witness sets it participates in: a part $V_{i}$ is cut by the at most $k - 1$ sets $A_{ij}$ into at most $2^{k - 1}$ cells. Refining a partition never lowers the energy, and the irregular pairs each contribute an extra $\geq \frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}} ε^{4}$ . Summing over the more than $ε \binom{k}{2}$ irregular pairs, the energy rises by at least $ε \cdot ε^{4} /2 = ε^{5} /2$ (after accounting for the near-equal part sizes). Then $P$ is replaced by a common equipartition refinement $P^{'}$ with the same energy gain up to a vanishing loss from re-equalising part sizes into the exceptional set.

Because $q$ lies in $[0, 1]$ and rises by at least $ε^{5} /2$ at each non-terminating step, the process stops after at most $2/ ε^{5}$ steps, at an $ε$ -regular partition. Each step multiplies the number of parts by at most $2^{k}$ , so $k$ is bounded by a tower of $2$ 's of height $O (ε^{- 5})$ ; this tower is $M (ε, m)$ . $□$
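To get a feel for how quickly the bound on the number of parts explodes, one can iterate the per-step growth. The sketch below assumes the crude per-step bound $k \mapsto k \cdot 2^{k}$ (each of $k$ parts cut into at most $2^{k}$ cells); the starting value $2$ and the function names are illustrative choices.

```python
def refine_bound(k):
    """Upper bound on the number of cells after one refinement step: k * 2**k."""
    return k * 2 ** k

def iterate_bound(k0, steps):
    """Apply the refinement bound repeatedly, recording each value."""
    history = [k0]
    for _ in range(steps):
        history.append(refine_bound(history[-1]))
    return history

history = iterate_bound(2, 3)
print(history[:3])               # [2, 8, 2048]
print(history[3].bit_length())   # size in bits of the bound after three steps
```

After only three steps the bound is $2048 \cdot 2^{2048} = 2^{2059}$, and the proof allows on the order of $ε^{- 5}$ steps, which is where the tower comes from.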

Bridge. The regularity lemma builds toward the entire density theory of the chapter and appears again in the counting and removal lemmas below, in the modern proof of Erdős-Stone from 40.05.01, and in the density Ramsey theorems flagged in 40.05.04, because it converts an arbitrary dense graph into a bounded random-like model on which counting is routine. The foundational reason the proof terminates is the monotone-bounded energy: refinement never lowers the mean-square density, an irregular pair forces a quantified increment, and a quantity trapped in $[0, 1]$ cannot increase by a fixed amount forever — this is exactly the index-increment scheme that recurs in the hypergraph and arithmetic regularity lemmas. The counting lemma is dual to the lemma itself: regularity produces the random-like model, and counting reads small-subgraph statistics off that model as if it were random. Putting these together, the regularity method is one strategy — partition until random-like, then count — and the bridge to additive combinatorics is the recognition that the same energy increment, transported to functions on abelian groups, yields the arithmetic regularity behind Roth's and Szemerédi's theorems.

Exercises Intermediate+

Exercise 3 (medium, short-answer).

Show that if $(X, Y)$ is $ε$ -regular with density $d$ , and $A \subseteq X$ with $∣ A ∣ \geq ε ∣ X ∣$ , then all but at most $ε ∣ Y ∣$ vertices of $Y$ have at least $(d - ε) ∣ A ∣$ neighbours in $A$ .

Hint

Let $B$ be the set of $y \in Y$ with fewer than $(d - ε) ∣ A ∣$ neighbours in $A$ . Bound $d (A, B)$ and apply regularity if $∣ B ∣ \geq ε ∣ Y ∣$ .

Answer

Let $B = \{y \in Y : ∣ N (y) \cap A ∣ < (d - ε) ∣ A ∣\}$ . Then $e (A, B) < (d - ε) ∣ A ∣ ∣ B ∣$ , so $d (A, B) < d - ε$ . Suppose for contradiction $∣ B ∣ \geq ε ∣ Y ∣$ . Since also $∣ A ∣ \geq ε ∣ X ∣$ , the pair $(A, B)$ qualifies in the regularity condition, giving $∣ d (A, B) - d (X, Y) ∣ \leq ε$ , i.e. $d (A, B) \geq d - ε$ , contradicting $d (A, B) < d - ε$ . Hence $∣ B ∣ < ε ∣ Y ∣$ , so all but fewer than $ε ∣ Y ∣$ vertices of $Y$ have $\geq (d - ε) ∣ A ∣$ neighbours in $A$ . Rubric: full credit for defining the deficient set, bounding its density, and the regularity contradiction.

Exercise 5 (medium, short-answer).

State the triangle counting lemma precisely and explain why the error term forces the partition to be regular rather than merely have bounded order.

Hint

The counting lemma asserts the triangle count is $(d_{12} d_{13} d_{23} \pm g (ε)) ∣ V_{1} ∣∣ V_{2} ∣∣ V_{3} ∣$ with $g (ε) \to 0$ as $ε \to 0$ . What goes wrong if a pair is irregular?

Answer

Counting lemma: if $(V_{1}, V_{2}), (V_{1}, V_{3}), (V_{2}, V_{3})$ are each $ε$ -regular with densities $d_{12}, d_{13}, d_{23}$ , then the number of triangles with one vertex in each $V_{i}$ is $(d_{12} d_{13} d_{23} \pm g (ε)) ∣ V_{1} ∣∣ V_{2} ∣∣ V_{3} ∣$ , where $g (ε) \to 0$ as $ε \to 0$ . The proof uses Exercise 3 twice: most vertices of $V_{1}$ have $\approx d_{12} ∣ V_{2} ∣$ neighbours in $V_{2}$ and $\approx d_{13} ∣ V_{3} ∣$ in $V_{3}$ , and the regular pair $(V_{2}, V_{3})$ then has $\approx d_{23}$ density between those two neighbourhoods, which remain large enough ( $\geq ε ∣ V_{i} ∣$ ) to invoke regularity. An irregular pair would permit the neighbourhoods to land in a low-density clump, so the product $d_{12} d_{13} d_{23}$ could badly over- or under-count; regularity is exactly the hypothesis that licenses multiplying the densities. Rubric: full credit for the statement with vanishing error and the explanation that irregularity breaks the density-multiplication step.

Exercise 6 (medium, short-answer).

Deduce the triangle removal lemma from the regularity and counting lemmas: every graph on $n$ vertices with at most $δ n^{3}$ triangles (for suitable $δ = δ (ε)$ ) can be made triangle-free by deleting at most $ε n^{2}$ edges.

Hint

Take an $ε^{'}$ -regular partition. Delete edges of three kinds: in irregular pairs, in low-density pairs, and inside parts. Any surviving triangle has all three pairs regular and dense, so the counting lemma forces many triangles.

Answer

Apply the regularity lemma with a small $ε^{'}$ and with lower bound $m \geq 1/ ε^{'}$ on the number of parts, to get an equipartition into $k \geq 1/ ε^{'}$ parts. Delete (i) all edges in irregular pairs: at most $ε^{'} \binom{k}{2} (n / k)^{2} \leq ε^{'} n^{2} /2$ edges; (ii) all edges inside parts and between parts of density $< 2 ε^{'}$ : at most $k (n / k)^{2} /2 + 2 ε^{'} n^{2} /2 \leq ε^{'} n^{2} /2 + ε^{'} n^{2}$ edges, using $k \geq 1/ ε^{'}$ ; (iii) edges meeting the exceptional set $V_{0}$ : at most $ε^{'} n \cdot n = ε^{'} n^{2}$ . The total deleted is at most $3 ε^{'} n^{2} \leq ε n^{2}$ for $ε^{'} \leq ε /3$ . If a triangle survives, its three pairs are all $ε^{'}$ -regular with density $\geq 2 ε^{'}$ , so by the counting lemma in its lower-bound form (at least $(1 - 2 ε^{'}) (d_{12} - ε^{'}) (d_{13} - ε^{'}) (d_{23} - ε^{'}) ∣ V_{1} ∣∣ V_{2} ∣∣ V_{3} ∣$ triangles) the configuration holds at least $(1 - 2 ε^{'}) (ε^{'})^{3} ((1 - ε^{'}) n / k)^{3} > 0$ triangles, which is $\geq c (ε^{'}) n^{3}$ because $k \leq M (ε^{'})$ . Contrapositively, if $G$ has at most $δ n^{3} < c (ε^{'}) n^{3}$ triangles, no triangle survives the cleaning, so $G$ becomes triangle-free after $\leq ε n^{2}$ deletions. Rubric: full credit for the three deletion classes, the counting-lemma lower bound on surviving triangles, and the contrapositive.

Exercise 7 (hard, short-answer).

Derive Roth's theorem (a set $A \subseteq \{1, \dots, N\}$ of size $∣ A ∣ = α N$ , with $α > 0$ fixed and $N$ sufficiently large, contains a non-degenerate 3-term arithmetic progression) from the triangle removal lemma.

Hint

Build a tripartite graph on parts $X, Y, Z$ each a copy of $\mathbb{Z}_{n}$ (with $n = 2 N + 1$ , which is odd, so $2$ is invertible, and large enough to avoid wrap-around): join $x \in X, y \in Y$ if $y - x \in A$ ; $y \in Y, z \in Z$ if $z - y \in A$ ; $x \in X, z \in Z$ if $(z - x) /2 \in A$ . A triangle encodes a 3-AP.

Answer

On parts $X = Y = Z = \mathbb{Z}_{n}$ (with $n = 2 N + 1$ , identifying $A$ with a subset of $\mathbb{Z}_{n}$ ), put $x y$ an edge iff $y - x \in A$ , $y z$ an edge iff $z - y \in A$ , $x z$ an edge iff $z - x \in 2 \cdot A := \{2 a : a \in A\}$ . A triangle $(x, y, z)$ has $y - x = a_{1} \in A$ , $z - y = a_{2} \in A$ , $z - x = 2 a_{3}$ with $a_{3} \in A$ ; then $a_{1} + a_{2} = 2 a_{3}$ in $\mathbb{Z}_{n}$ , and since both sides lie in $[2, 2 N]$ and $n > 2 N$ the equation also holds in the integers, so $a_{1}, a_{3}, a_{2}$ form a 3-AP in $A$ . For each $a \in A$ and each $x \in \mathbb{Z}_{n}$ the diagonal triple $x$ , $y = x + a$ , $z = x + 2 a$ is a triangle (here $a_{1} = a_{2} = a_{3} = a$ ). These $∣ A ∣ \cdot n = α N n \geq α n^{2} /3$ triangles are pairwise edge-disjoint, because $n$ is odd and so each edge determines the pair $(x, a)$ . If $A$ had no non-degenerate 3-AP, every triangle would be diagonal, so making the graph triangle-free would require deleting at least one edge per triangle, that is $\geq α n^{2} /3$ deletions; but the graph would then have only $α N n = O (n^{2}) = o (n^{3})$ triangles, so the triangle removal lemma makes it triangle-free with $o (n^{2})$ deletions, a contradiction once $α$ is a fixed positive constant and $n$ is large. Hence $A$ contains a non-degenerate 3-AP. Rubric: full credit for the tripartite encoding, the diagonal edge-disjoint triangles, and the removal-lemma contradiction.

Exercise 8 (hard, short-answer).

State the Ruzsa-Szemerédi $(6, 3)$ -theorem and explain how it corresponds to the triangle removal lemma in graph form (a graph that is a union of $m$ pairwise edge-disjoint triangles and contains no other triangles has $m = o (n^{2})$ ).

Hint

The $(6, 3)$ -theorem: a 3-uniform hypergraph on $n$ vertices in which no $6$ vertices span $3$ edges has $o (n^{2})$ edges. Translate "no 6 points on 3 triples" into "the triangles are edge-disjoint" in a tripartite graph.

Answer

The $(6, 3)$ -theorem (Ruzsa-Szemerédi): if a 3-uniform hypergraph $H$ on $n$ vertices has the property that no $6$ vertices carry $3$ of its triples, then $H$ has $o (n^{2})$ edges. Equivalently, a graph that is the edge-disjoint union of $m$ triangles, with no other triangles, has $m = o (n^{2})$ . The link: given such a graph $G$ with $m$ pairwise edge-disjoint triangles and no others, each triangle is the unique triangle on each of its edges, so removing all triangles requires deleting at least one edge from each, i.e. $\geq m$ edge deletions. But the triangle count is exactly $m \leq \binom{n}{2} /3 = O (n^{2}) = o (n^{3})$ , so the triangle removal lemma makes $G$ triangle-free using $o (n^{2})$ deletions. Therefore $m = o (n^{2})$ . Conversely, a construction of $Ω (n^{2} / e^{c \sqrt{\log n}})$ edge-disjoint triangles (from Behrend's dense AP-free sets) shows $m$ can be $n^{2 - o (1)}$ , so the $o (n^{2})$ bound is close to sharp and the removal lemma cannot have a polynomial-rate quantitative form. Rubric: full credit for the $(6, 3)$ statement, the edge-disjoint-triangle equivalence, the removal-lemma deduction, and the Behrend near-sharpness remark.

Advanced results Master

Theorem (counting lemma). Let $H$ be a graph on vertices $\{1, \dots, h\}$ and let $V_{1}, \dots, V_{h}$ be disjoint vertex sets in a graph $G$ . Suppose that for every edge $ij \in E (H)$ the pair $(V_{i}, V_{j})$ is $ε$ -regular with density $d_{ij} \geq d$ . Then the number of labelled copies of $H$ with the $i$ -th vertex in $V_{i}$ is $$ \Big(\prod_{ij \in E(H)} d_{ij} \;\pm\; e(H)\,\varepsilon \Big)\prod_{i=1}^{h} |V_i| . $$ (See ^{[Diestel 2017 §7.5]}, ^{[Tao, Vu 2006 Ch. 11]}.) The proof embeds the vertices of $H$ one at a time, at each step restricting to the typical vertices whose neighbourhoods into the already-placed parts retain the expected density; by Exercise 3 this loses only an $ε$ -fraction of vertices per step, and the regular pairs let each new edge contribute a factor $d_{ij} \pm ε$ to the count. The triangle case $H = K_{3}$ is the workhorse: a regular tripartite configuration of densities bounded below holds $(d_{12} d_{13} d_{23} \pm 3 ε) ∣ V_{1} ∣∣ V_{2} ∣∣ V_{3} ∣$ triangles.

Theorem (triangle removal lemma). For every $η > 0$ there is $δ > 0$ such that every graph on $n$ vertices with at most $δ n^{3}$ triangles can be made triangle-free by deleting at most $η n^{2}$ edges. (Ruzsa-Szemerédi ^{[Ruzsa, Szemerédi 1978]}; Diestel ^{[Diestel 2017 §7.5]}.) The deduction is the cleaning argument of Exercise 6: regularise, delete edges in irregular pairs, sparse pairs, and inside parts (total $\leq η n^{2}$ ), and observe that any surviving triangle sits in a regular dense triple, which by the counting lemma forces $\geq c (η) n^{3}$ triangles; so if the triangle count is below that threshold, the cleaned graph is triangle-free. The dependence of $δ$ on $η$ is inverse-tower-type: $δ^{- 1}$ grows like a tower of height polynomial in $η^{- 1}$ , inherited directly from $M (ε)$ . The tower height stayed polynomial in $η^{- 1}$ until the work of Fox (2011), who removed the regularity lemma from the proof and obtained a tower of height $O (\log η^{- 1})$ .

Theorem (graph removal lemma). For every fixed graph $H$ and every $η > 0$ there is $δ > 0$ such that every $n$ -vertex graph with at most $δ n^{∣ V (H) ∣}$ copies of $H$ can be made $H$ -free by deleting at most $η n^{2}$ edges. (Erdős-Frankl-Rödl; Tao-Vu ^{[Tao, Vu 2006 Ch. 11]}.) The same regularise-clean-count scheme applies, with the counting lemma for general $H$ replacing the triangle case; the removal lemma is thereby the canonical structural consequence of regularity, equivalent in strength to a qualitative form of property testing, since it certifies that "few copies of $H$ " and "close to $H$ -free in edit distance" coincide.

Theorem (lower bound; Gowers). The function $M (ε)$ in the regularity lemma must grow at least as fast as a tower of $2$ 's of height $Ω (ε^{- 1/16})$ . (Gowers ^{[Gowers 1997]}.) Gowers constructed graphs forcing the energy to climb in many small increments, so no bound on the number of parts by a tower of fixed height is possible; the tower height is intrinsic, not an artefact of the proof. Consequently the removal lemma's $δ (η)$ cannot be brought below a tower of height polynomial in $η^{- 1}$ while the regularity lemma is the engine, which is why Fox's removal lemma, with its tower of height logarithmic in $η^{- 1}$ , had to bypass regularity.

Theorem (sparse and hypergraph regularity, stated). A sparse-graph analogue holds relative to a pseudorandom or upper-regular host (Kohayakawa-Rödl, Conlon-Gowers, Schacht), licensing the transference of the removal method to sparse settings such as random graphs and the primes. The hypergraph regularity lemma of Frankl-Rödl and Gowers, together with its counting lemma, supplies the $k$ -uniform analogue from which Szemerédi's theorem (every positive-density subset of $\mathbb{N}$ contains arbitrarily long arithmetic progressions) follows by the same removal scheme applied to $(k - 1)$ -uniform hypergraphs encoding $k$ -term progressions. Layering the sparse transference over hypergraph regularity is one route to the Green-Tao theorem that the primes contain arbitrarily long arithmetic progressions, the relative Szemerédi theorem run against a pseudorandom majorant of the primes.

Synthesis. Putting these together, the regularity lemma, the counting lemma, and the removal lemma are one method seen at three depths: partition any dense graph into a bounded random-like model, read subgraph statistics off the model, and convert "few copies" into "few deletions to none". The foundational reason the method has tower-type cost is the energy increment — the same monotone-bounded quantity that drives the proof also forces, by Gowers's construction, a tower of parts, so the central insight is that boundedness-versus-increment is simultaneously the proof and its price. The counting lemma is dual to the regularity lemma, the two halves of "behaves like a random graph": regularity produces the model and counting exploits it, and the triangle removal lemma is exactly the $K_{3}$ counting lemma read contrapositively. This is exactly the engine behind Roth's theorem, where the Cartesian-product graph turns 3-term progressions into triangles and the $(6, 3)$ -theorem of Ruzsa-Szemerédi packages the sharpness via Behrend's sets; and it generalises through hypergraph regularity to Szemerédi's theorem and, with sparse transference, to Green-Tao. The bridge to the analytic theory is the graphon: a regular partition is a finite approximation to a measurable limit object, so the regularity lemma is the combinatorial shadow of compactness in the cut metric, and the removal lemma is the statement that the homomorphism density of $H$ is continuous there.

Full proof set Master

Proposition 1 (defect Cauchy-Schwarz / energy monotonicity). Let $P$ be a partition of $V$ , inducing cells $V_{i} \times V_{j}$ of $V \times V$ , and let $P^{'}$ refine $P$ . Then $q (P^{'}) \geq q (P)$ , with the increment equal to the size-weighted variance of the cell densities of $P^{'}$ within the cells of $P$ .

Proof. Fix a cell $V_{i} \times V_{j}$ of $P$ , refined into cells $A_{a} \times B_{b}$ by partitions $V_{i} = ⨆_{a} A_{a}$ , $V_{j} = ⨆_{b} B_{b}$ . Write $w_{ab} = \frac{∣ A _{a} ∣∣ B _{b} ∣}{∣ V _{i} ∣∣ V _{j} ∣}$ , so $\sum_{a, b} w_{ab} = 1$ , and $d_{ab} = d (A_{a}, B_{b})$ . The edge count is additive, $\sum_{a, b} ∣ A_{a} ∣∣ B_{b} ∣ d_{ab} = ∣ V_{i} ∣∣ V_{j} ∣ d (V_{i}, V_{j})$ , so $\sum_{a, b} w_{ab} d_{ab} = d (V_{i}, V_{j})$ , the mean. By the variance identity, $$ \sum_{a,b} w_{ab} d_{ab}^2 = d(V_i,V_j)^2 + \sum_{a,b} w_{ab}\big(d_{ab} - d(V_i,V_j)\big)^2 \geq d(V_i,V_j)^2 . $$ Multiplying by $\frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}}$ and summing over the cells of $P$ gives $q (P^{'}) \geq q (P)$ , the gain being the stated size-weighted variance. $□$
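The variance identity at the heart of this proof is easy to verify on numbers. In the sketch below the four weights and densities are invented for illustration: they correspond to splitting $V_{i}$ into a quarter and three quarters and $V_{j}$ into two halves; `cell_statistics` is a hypothetical helper name.

```python
from fractions import Fraction as F

def cell_statistics(weights, densities):
    """Return (mean, mean of squares, weighted variance) of the cell densities."""
    assert sum(weights) == 1
    mean = sum(w * d for w, d in zip(weights, densities))
    mean_sq = sum(w * d * d for w, d in zip(weights, densities))
    variance = sum(w * (d - mean) ** 2 for w, d in zip(weights, densities))
    return mean, mean_sq, variance

# Cells A x B, A x B^c, A^c x B, A^c x B^c with |A| = |V_i|/4 and |B| = |V_j|/2.
weights = [F(1, 8), F(1, 8), F(3, 8), F(3, 8)]
densities = [F(1), F(1, 2), F(1, 3), F(1, 3)]   # made-up cell densities

mean, mean_sq, variance = cell_statistics(weights, densities)
print(mean, mean_sq, variance)            # 7/16 23/96 37/768
print(mean_sq == mean ** 2 + variance)    # the identity used in the proof
```

Here the mean $7/16$ plays the role of $d (V_{i}, V_{j})$ , and the refined contribution $23/96$ exceeds $(7/16)^{2}$ by exactly the variance $37/768$ .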

Proposition 2 (irregular pair forces an $ε^{4}$ increment). If $(V_{i}, V_{j})$ is $ε$ -irregular, witnessed by $A \subseteq V_{i}$ , $B \subseteq V_{j}$ with $∣ A ∣ \geq ε ∣ V_{i} ∣$ , $∣ B ∣ \geq ε ∣ V_{j} ∣$ and $∣ d (A, B) - d (V_{i}, V_{j}) ∣ > ε$ , then refining $V_{i}$ by $\{A, V_{i} ∖ A\}$ and $V_{j}$ by $\{B, V_{j} ∖ B\}$ increases this pair's energy contribution by at least $\frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}} ε^{4}$ .

Proof. By Proposition 1 the increment is the size-weighted variance $\sum w_{ab} (d_{ab} - d)^{2}$ over the four cells, $d = d (V_{i}, V_{j})$ . Drop all terms except the $A \times B$ cell: its weight is $w = \frac{∣ A ∣∣ B ∣}{∣ V _{i} ∣∣ V _{j} ∣} \geq ε \cdot ε = ε^{2}$ , and its deviation satisfies $(d_{A B} - d)^{2} > ε^{2}$ . Hence the variance is $\geq ε^{2} \cdot ε^{2} = ε^{4}$ , and multiplying by $\frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}}$ gives the claim. $□$

Proposition 3 (regularity lemma, quantitative termination). The energy-increment process reaches an $ε$ -regular equipartition in at most $2 ε^{- 5}$ steps, yielding $k \leq M (ε, m)$ with $M$ a tower of $2$ 's of height $O (ε^{- 5})$ .

Proof. If an equipartition $P$ with $k$ parts is not $ε$ -regular, more than $ε \binom{k}{2}$ pairs are irregular. Refining every part by all its witness sets, Proposition 2 contributes $\geq \frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}} ε^{4}$ per irregular pair, and with parts of near-equal size $\frac{∣ V _{i} ∣∣ V _{j} ∣}{∣ V ∣ ^{2}} \approx k^{- 2}$ , the total gain is about $ε \binom{k}{2} \cdot k^{- 2} ε^{4} = (1 - 1/ k) ε^{5} /2$ , which tends to $ε^{5} /2$ as $k$ grows. Re-equalising the cells into a common equipartition, moving the surplus into $V_{0}$ , costs an energy loss tending to $0$ and keeps $∣ V_{0} ∣ \leq ε ∣ V ∣$ , so for $k$ and $n$ large and $ε$ small each non-terminating step gains at least $ε^{5} /4$ . Since the weights in $q$ sum to at most $1/2$ , we have $q \in [0, 1/2]$ , and there are at most $(1/2) / (ε^{5} /4) = 2 ε^{- 5}$ steps. Each step replaces $k$ parts by at most $k 2^{k}$ cells, so iterating $O (ε^{- 5})$ times bounds $k$ by a tower of height $O (ε^{- 5})$ . $□$

Proposition 4 (triangle counting lemma with explicit error). If $(V_{1}, V_{2}), (V_{1}, V_{3}), (V_{2}, V_{3})$ are $ε$ -regular with densities $d_{12}, d_{13}, d_{23} \geq 2 ε$ , the number of triangles with one vertex in each part is at least $(1 - 2 ε) (d_{12} - ε) (d_{13} - ε) (d_{23} - ε) ∣ V_{1} ∣∣ V_{2} ∣∣ V_{3} ∣$ .

Proof. Call $v \in V_{1}$ typical if $∣ N (v) \cap V_{2} ∣ \geq (d_{12} - ε) ∣ V_{2} ∣$ and $∣ N (v) \cap V_{3} ∣ \geq (d_{13} - ε) ∣ V_{3} ∣$ . By Exercise 3 applied to each of the pairs $(V_{1}, V_{2})$ and $(V_{1}, V_{3})$ (with $V_{1}$ in the role of $Y$ and $A = V_{2}$ , respectively $A = V_{3}$ ), at most $ε ∣ V_{1} ∣$ vertices fail the first condition and at most $ε ∣ V_{1} ∣$ the second, so at least $(1 - 2 ε) ∣ V_{1} ∣$ vertices are typical. Fix a typical $v$ and set $X = N (v) \cap V_{2}$ , $Y = N (v) \cap V_{3}$ ; then $∣ X ∣ \geq (d_{12} - ε) ∣ V_{2} ∣ \geq ε ∣ V_{2} ∣$ and $∣ Y ∣ \geq (d_{13} - ε) ∣ V_{3} ∣ \geq ε ∣ V_{3} ∣$ since $d_{1 j} \geq 2 ε$ . As $(V_{2}, V_{3})$ is $ε$ -regular, $d (X, Y) \geq d_{23} - ε$ , so the number of edges between $X$ and $Y$ (each completing a triangle through $v$ ) is $\geq (d_{23} - ε) ∣ X ∣∣ Y ∣ \geq (d_{12} - ε) (d_{13} - ε) (d_{23} - ε) ∣ V_{2} ∣∣ V_{3} ∣$ . Summing over the $\geq (1 - 2 ε) ∣ V_{1} ∣$ typical $v$ gives the bound. $□$
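To see how close the guaranteed lower bound of Proposition 4 sits to the idealised product of densities, one can plug in numbers. This is only an arithmetic sketch; the function name `triangle_lower_bound` and the sample values (densities $1/2$ , $ε = 1/100$ , parts of size $100$ ) are illustrative assumptions.

```python
from fractions import Fraction as F

def triangle_lower_bound(d12, d13, d23, eps, n1, n2, n3):
    """Lower bound (1 - 2 eps)(d12 - eps)(d13 - eps)(d23 - eps) |V1||V2||V3|."""
    if min(d12, d13, d23) < 2 * eps:
        raise ValueError("Proposition 4 assumes every density is at least 2 * eps")
    return (1 - 2 * eps) * (d12 - eps) * (d13 - eps) * (d23 - eps) * n1 * n2 * n3

half, eps = F(1, 2), F(1, 100)
bound = triangle_lower_bound(half, half, half, eps, 100, 100, 100)
ideal = half ** 3 * 100 ** 3   # product of densities times number of triples

print(float(bound), float(ideal))
```

With these values the guaranteed count is roughly $115{,}296$ against the idealised $125{,}000$ ; the gap shrinks as $ε \to 0$ .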

Proposition 5 (triangle removal lemma). For every $η > 0$ there is $δ > 0$ with: any $n$ -vertex graph $G$ with at most $δ n^{3}$ triangles can be made triangle-free by deleting at most $η n^{2}$ edges.

Proof. Choose $ε = η /4$ and apply Proposition 3 with $m \geq 1/ ε$ to get an $ε$ -regular equipartition into $k$ parts, $m \leq k \leq M$ , with $V_{0}$ exceptional. Delete edges of three types: (i) those meeting $V_{0}$ , at most $∣ V_{0} ∣ n \leq ε n^{2}$ ; (ii) those inside a part or between a pair of density $< 2 ε$ , at most $k \binom{n / k}{2} + \binom{k}{2} 2 ε (n / k)^{2} \leq \frac{n ^{2}}{2 k} + ε n^{2}$ ; (iii) those in $ε$ -irregular pairs, at most $ε \binom{k}{2} (n / k)^{2} \leq ε n^{2} /2$ . Since $k \geq 1/ ε$ gives $\frac{n ^{2}}{2 k} \leq ε n^{2} /2$ , the total is at most $3 ε n^{2} \leq η n^{2}$ . Any triangle of the cleaned graph $G^{'}$ has its three vertices in distinct parts $V_{a}, V_{b}, V_{c}$ , each pair $ε$ -regular of density $\geq 2 ε$ ; Proposition 4 then forces $\geq (1 - 2 ε) ε^{3} ∣ V_{a} ∣∣ V_{b} ∣∣ V_{c} ∣ \geq (1 - 2 ε) ε^{3} ((1 - ε) n / M)^{3} =: c n^{3}$ triangles in $G$ (these triangles already lie in $G$ , as cleaning only removes edges). Setting $δ = c /2$ , a graph with $\leq δ n^{3}$ triangles cannot have such a surviving regular dense triple, so $G^{'}$ is triangle-free. $□$
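The edge-deletion budget in this proof is plain arithmetic, and tabulating it shows why the number of parts must not be too small. The sketch below uses the upper bounds from the proof (exceptional set $ε n^{2}$ , inside parts $n^{2} / (2 k)$ , sparse pairs $ε n^{2}$ , irregular pairs $ε n^{2} /2$ ) with $η = 4 ε$ ; the function name and the sample values of $n$ , $k$ , $ε$ are illustrative assumptions.

```python
from fractions import Fraction as F

def cleaning_budget(n, k, eps):
    """Upper bounds on the edges deleted in each cleaning class."""
    return {
        "exceptional_set": eps * n * n,
        "inside_parts": F(n * n, 2 * k),
        "sparse_pairs": eps * n * n,
        "irregular_pairs": eps * n * n / 2,
    }

def within_budget(n, k, eps):
    """Does the total stay below eta * n^2 with eta = 4 * eps?"""
    total = sum(cleaning_budget(n, k, eps).values())
    return total, total <= 4 * eps * n * n

n, eps = 1000, F(1, 10)
total_many_parts, ok_many = within_budget(n, 40, eps)   # k well above 1/eps
total_few_parts, ok_few = within_budget(n, 2, eps)      # k far too small
print(total_many_parts, ok_many)
print(total_few_parts, ok_few)
```

With $k = 40$ the bounds total $262{,}500 \leq 400{,}000 = η n^{2}$ , while with $k = 2$ the edges inside parts alone can reach $250{,}000$ and the budget is exceeded, which is why a lower bound on the number of parts is needed.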

Proposition 6 (Roth's theorem via removal). A set $A \subseteq \mathbb{Z}_{n}$ with $n$ odd and $∣ A ∣ \geq α n$ , $α > 0$ fixed, contains a non-degenerate 3-term arithmetic progression once $n$ is large.

Proof. Form the tripartite graph $G$ on $X = Y = Z = \mathbb{Z}_{n}$ with $x y \in E \Leftrightarrow y - x \in A$ , $y z \in E \Leftrightarrow z - y \in A$ , $x z \in E \Leftrightarrow z - x \in 2 A$ . A triangle $(x, y, z)$ yields $y - x = a_{1}$ , $z - y = a_{2}$ , $z - x = 2 a_{3}$ with $a_{1}, a_{2}, a_{3} \in A$ and $a_{1} + a_{2} = 2 a_{3}$ , i.e. a 3-AP $(a_{1}, a_{3}, a_{2})$ in $A$ . For each $a \in A$ and each $x \in \mathbb{Z}_{n}$ , the triple $(x, x + a, x + 2 a)$ is a triangle (the diagonal family), giving $∣ A ∣ n \geq α n^{2}$ triangles, and distinct $(x, a)$ give distinct edges $x z$ because $2$ is invertible modulo the odd number $n$ , so these triangles are pairwise edge-disjoint. If $A$ had only degenerate 3-APs (those with $a_{1} = a_{2} = a_{3}$ ), every triangle would be diagonal, so $G$ would consist of exactly $∣ A ∣ n \leq n^{2}$ pairwise edge-disjoint triangles, and making $G$ triangle-free would need $\geq α n^{2}$ edge deletions. But $G$ would then have $O (n^{2}) = o (n^{3})$ triangles, so by Proposition 5 it is made triangle-free with $o (n^{2})$ deletions, which is impossible for fixed $α > 0$ and large $n$ . Hence $A$ has a non-degenerate 3-AP. $□$
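The encoding of progressions as triangles can be checked by brute force on a small cyclic group. This is a minimal sketch assuming an odd modulus; the function name `count_triangles` and the sets chosen in $\mathbb{Z}_{9}$ are illustrative, and the triple loop is only practical for small $n$ .

```python
def count_triangles(n, A):
    """Count triangles (x, y, z) in the tripartite graph on three copies of Z_n
    with edges y - x in A, z - y in A, z - x in 2A (all modulo n, n odd)."""
    A = {a % n for a in A}
    two_A = {(2 * a) % n for a in A}
    count = 0
    for x in range(n):
        for y in range(n):
            if (y - x) % n not in A:
                continue
            for z in range(n):
                if (z - y) % n in A and (z - x) % n in two_A:
                    count += 1
    return count

n = 9
ap_free = {0, 1, 3, 4}   # no non-degenerate 3-AP modulo 9
with_ap = {0, 1, 2}      # contains the progression 0, 1, 2

print(count_triangles(n, ap_free), len(ap_free) * n)   # only diagonal triangles
print(count_triangles(n, with_ap), len(with_ap) * n)   # extra triangles appear
```

For the progression-free set the count equals $∣ A ∣ n = 36$ , the diagonal family alone; for $\{0, 1, 2\}$ the count is $45$ , the extra $18$ triangles coming from the progressions $(0, 1, 2)$ and $(2, 1, 0)$ .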

Connections Master

Extremal graph theory: Turán and Erdős-Stone-Simonovits 40.05.01. The regularity method gives the modern, conceptual proof of the Erdős-Stone theorem proved by hand in that unit: regularise a graph above the Turán density, find a regular reduced configuration of $r + 1$ dense parts, and embed a thick $K_{r + 1} (t)$ by the counting lemma. The supersaturation and stability phenomena of 40.05.01 are the removal lemma's relatives: supersaturation is the counting lemma read above threshold, and stability is the cut-metric closeness that regularity makes precise.
Ramsey theory and the density turn 40.05.04. The regularity lemma is the tool Szemerédi invented to prove his arithmetic-progression theorem, the density apex toward which the partition-Ramsey results of 40.05.04 point; this unit supplies the $k = 3$ case (Roth) via triangle removal, and hypergraph regularity supplies the general $k$ , so the removal method is the bridge from the colouring statements of 40.05.04 to their density refinements. Behrend's AP-free sets, which power the $(6, 3)$ -theorem's sharpness here, are the same constructions that bound the Ramsey-type quantities there.
The probabilistic method and quasirandomness 40.07.01. The counting lemma's content is that a regular pair is quasirandom — indistinguishable at scale $ε$ from a random bipartite graph of the same density — so the second-moment and deletion arguments of that unit are the probabilistic skeleton of the regularity method. The removal lemma's near-sharp $(6, 3)$ lower bound uses dense AP-free sets, the same extremal-versus-random tension that unit develops; regularity is the deterministic structure theorem dual to the random constructions there.

Historical & philosophical context Master

Endre Szemerédi introduced the regularity lemma as the central tool in his 1975 proof that every set of integers of positive upper density contains arbitrarily long arithmetic progressions, settling the 1936 conjecture of Erdős and Turán ^{[Roth 1953 gives the $k=3$ case]}; the lemma was isolated in its now-standard graph form in his 1978 Orsay colloquium paper *Regular partitions of graphs* ^{[Szemerédi 1978]}. The engine of the proof is the energy increment: the mean-square density of a partition is bounded above by 1, while each refinement of a non-regular partition forces it up by a fixed amount depending only on $ε$, so the refinement stops after boundedly many steps, and the resulting bound on the number of parts is tower-type. The earliest application of the regularity philosophy to a removal statement is the 1978 paper of Imre Ruzsa and Szemerédi, *Triple systems with no six points carrying three triangles* ^{[Ruzsa, Szemerédi 1978]}, which proved the $(6, 3)$-theorem and, as a consequence, both the triangle removal lemma and a new proof of Roth's 1953 theorem ^{[Roth 1953]} on 3-term progressions.

W. T. Gowers proved in 1997 that the tower-type bound is unavoidable: $M(ε)$ must grow as a tower of height polynomial in $1/ε$ ^{[Gowers 1997]}, showing the lemma's apparent inefficiency is intrinsic. The hypergraph regularity method, developed by Peter Frankl and Vojtěch Rödl and independently by Gowers in the 2000s, extended the scheme to $k$-uniform hypergraphs and yielded combinatorial proofs of Szemerédi's theorem for all progression lengths; Ben Green and Terence Tao used a sparse, relative form of these ideas in their 2008 proof that the primes contain arbitrarily long arithmetic progressions. The analytic completion came with the graph-limit theory of László Lovász and Balázs Szegedy, in which a regular partition is a finite approximation to a graphon, the measurable limit of a graph sequence in the cut metric, recasting the removal lemma as a continuity statement for homomorphism densities.

Bibliography Master

@book{Diestel2017,
  author    = {Diestel, Reinhard},
  title     = {Graph Theory},
  edition   = {5th},
  series    = {Graduate Texts in Mathematics},
  volume    = {173},
  publisher = {Springer},
  year      = {2017}
}

@book{TaoVu2006,
  author    = {Tao, Terence and Vu, Van H.},
  title     = {Additive Combinatorics},
  series    = {Cambridge Studies in Advanced Mathematics},
  volume    = {105},
  publisher = {Cambridge University Press},
  year      = {2006}
}

@incollection{Szemeredi1978,
  author    = {Szemer\'edi, Endre},
  title     = {Regular partitions of graphs},
  booktitle = {Probl\`emes combinatoires et th\'eorie des graphes (Colloq. Internat. CNRS, Univ. Orsay, 1976)},
  series    = {Colloq. Internat. CNRS},
  volume    = {260},
  publisher = {CNRS, Paris},
  year      = {1978},
  pages     = {399--401}
}

@incollection{RuzsaSzemeredi1978,
  author    = {Ruzsa, Imre Z. and Szemer\'edi, Endre},
  title     = {Triple systems with no six points carrying three triangles},
  booktitle = {Combinatorics (Keszthely, 1976)},
  series    = {Colloq. Math. Soc. J\'anos Bolyai},
  volume    = {18},
  publisher = {North-Holland},
  year      = {1978},
  pages     = {939--945}
}

@article{Gowers1997,
  author  = {Gowers, W. Timothy},
  title   = {Lower bounds of tower type for Szemer\'edi's uniformity lemma},
  journal = {Geometric and Functional Analysis},
  volume  = {7},
  year    = {1997},
  pages   = {322--337}
}

@article{Roth1953,
  author  = {Roth, Klaus F.},
  title   = {On certain sets of integers},
  journal = {Journal of the London Mathematical Society},
  volume  = {28},
  year    = {1953},
  pages   = {104--109}
}

@article{Fox2011,
  author  = {Fox, Jacob},
  title   = {A new proof of the graph removal lemma},
  journal = {Annals of Mathematics},
  volume  = {174},
  year    = {2011},
  pages   = {561--579}
}

@article{LovaszSzegedy2006,
  author  = {Lov\'asz, L\'aszl\'o and Szegedy, Bal\'azs},
  title   = {Limits of dense graph sequences},
  journal = {Journal of Combinatorial Theory, Series B},
  volume  = {96},
  year    = {2006},
  pages   = {933--957}
}

@article{GreenTao2008,
  author  = {Green, Ben and Tao, Terence},
  title   = {The primes contain arbitrarily long arithmetic progressions},
  journal = {Annals of Mathematics},
  volume  = {167},
  year    = {2008},
  pages   = {481--547}
}

Prerequisites

40.05.01

Tier anchors

beginner: Diestel 2017 *Graph Theory* (5th ed., Springer GTM 173) §7.4 (the regularity lemma) read for the even-spread / no-clumping picture of a regular pair; West 2001 *Introduction to Graph Theory* (2nd ed., Prentice Hall) for the basic density-between-sets vocabulary; the folklore 'shuffle the deck into a few boxes so each pair of boxes looks random' description of a regular partition
intermediate: Diestel 2017 *Graph Theory* (5th ed., Springer GTM 173) §7.4-§7.5 (the Szemerédi regularity lemma with the energy-increment proof, the regularity/embedding/counting lemmas, the triangle removal lemma and its corollaries); Bollobás 1998 *Modern Graph Theory* (Springer GTM 184) Ch. IV (the regularity method); Komlós-Simonovits 1996 *Szemerédi's regularity lemma and its applications in graph theory* for the worked applications
master: Diestel 2017 *Graph Theory* (5th ed., Springer GTM 173) Ch. 7 (regularity, the counting and removal lemmas, the route to Szemerédi's theorem); Tao-Vu 2006 *Additive Combinatorics* (Cambridge Studies in Advanced Mathematics 105) Ch. 10-11 (the regularity and removal lemmas, Roth's theorem via triangle removal, hypergraph regularity); Gowers 1997 *Lower bounds of tower type for Szemerédi's uniformity lemma* (GAFA 7) for the tower-type lower bound; the original sources Szemerédi 1975/1978, Ruzsa-Szemerédi 1978, Roth 1953, Frankl-Rödl and Gowers (hypergraph regularity), Lovász-Szegedy (graph limits)

References

Diestel 2017 — Graph Theory (5th edition) · Springer Graduate Texts in Mathematics 173, 2017, Ch. 7 (§7.4 the regularity lemma with the energy/index-increment proof and the tower-type bound; §7.5 applications — the regularity, embedding, and counting lemmas, the triangle removal lemma, and the link to Szemerédi's theorem on arithmetic progressions)
Tao, Vu 2006 — Additive Combinatorics · Cambridge Studies in Advanced Mathematics 105, 2006, Ch. 10-11 (the Szemerédi regularity lemma via the energy increment, the counting and triangle removal lemmas, Roth's theorem on 3-term arithmetic progressions deduced from triangle removal, and an account of hypergraph regularity)
Szemerédi 1978 — Regular partitions of graphs · Problèmes combinatoires et théorie des graphes (Colloq. Internat. CNRS 260, Orsay 1976), CNRS, Paris, 1978, pp. 399-401; the regularity lemma in its standard form, extracted from the 1975 proof of the arithmetic-progression theorem
Ruzsa, Szemerédi 1978 — Triple systems with no six points carrying three triangles · Combinatorics (Keszthely 1976), Colloq. Math. Soc. János Bolyai 18, North-Holland, 1978, pp. 939-945; the (6,3)-theorem and the triangle removal lemma, with the deduction of Roth's theorem on 3-term progressions
Gowers 1997 — Lower bounds of tower type for Szemerédi's uniformity lemma · Geometric and Functional Analysis 7 (1997) 322-337; the tower-type lower bound showing the number of parts M(ε) must grow like a tower of 2's of height polynomial in 1/ε, so the regularity lemma's bound is essentially best possible
Roth 1953 — On certain sets of integers · Journal of the London Mathematical Society 28 (1953) 104-109; the theorem that a set of integers of positive upper density contains a 3-term arithmetic progression, the k=3 case of Szemerédi's theorem, recovered here as a corollary of the triangle removal lemma

Estimated time

beginner: 18m
intermediate: 52m
master: 90m