Linearity of Expectation and the Deletion Method
Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 2-3 and §3.6 (dependent random choice preview); Erdős 1947 *Bull. AMS* 53 (the Ramsey lower bound by counting); Erdős 1959 *Canad. J. Math.* 11 (graphs of high girth and high chromatic number); Szele 1943 *Mat. Fiz. Lapok* 50 (the number of Hamiltonian paths in a tournament)
Intuition Beginner
Suppose you want to prove that some arrangement of a thing is good — a seating chart with few arguments, a way to split a network so that many links cross the divide. Checking every arrangement by hand is hopeless when there are billions of them. Here is a shortcut that feels like cheating but is airtight: compute the average quality over all arrangements. If the average score is , then at least one arrangement scores or better, because no collection of numbers can have every member below its own average. You never had to find the good arrangement; you only had to know it cannot avoid existing.
The reason this is so powerful is a small bookkeeping fact about averages. If a score is a sum of many small pieces — one piece per link, per pair, per edge — then the average of the whole sum is just the sum of the averages of the pieces. You compute each piece on its own, ignore how the pieces tangle together, and add up. This is the linearity of averaging, and it works even when the pieces are wildly dependent on one another.
The second idea fixes a common snag. Often the random arrangement you build is almost perfect but carries a few flaws — a handful of forbidden pairs, a few short loops. Instead of demanding perfection up front, you build the slightly-flawed object, then delete the small number of offending parts. If the flaws are few on average, deleting them costs little, and what survives is both large and clean. This patch-it-afterward trick is the deletion method.
Visual Beginner
The picture shows two ideas side by side. On the left, a row of bars gives the score of every possible arrangement; a dashed line marks their average. Because the bars cannot all sit below the dashed line, at least one bar must reach it or rise above — that tall bar is the good arrangement guaranteed to exist, even though we never pointed to which one it is. On the right, a cluster of dots with a few red "bad" links is drawn first; then the red links and one endpoint of each are crossed out, leaving a smaller all-black cluster. That is "build too much, then delete the flaws."
| Idea | What you compute | What you conclude |
|---|---|---|
| Averaging | the average score | some arrangement meets the average |
| Linearity | each small piece's average, then add | the total average, with no untangling |
| Deletion | average number of flaws | a clean object survives after removing few |
The two columns are the whole unit: average to prove existence, and delete to clean up.
Worked example Beginner
Take six people standing in a circle, and randomly hand each one either a red or a blue card by flipping a fair coin. Call two neighbours a clash if they hold the same colour. We will find the average number of clashes, and use it to guarantee a colouring with few clashes.
Step 1. There are six neighbouring pairs around the circle (each person and the one to their right). For any one pair, the two coin flips match — both red or both blue — exactly half the time. So each pair is a clash with chance .
Step 2. The average number of clashes is the sum, over the six pairs, of each pair's chance of clashing. That is . On average, three of the six neighbour-pairs clash.
Step 3. Because the average number of clashes is , at least one actual colouring has clashes or fewer. (No set of whole numbers can have every value strictly above its average of .)
Step 4. Turn it around for the other side. The average number of non-clashing pairs is also , so some colouring has at least neighbour-pairs coloured differently — a "cut" of size out of edges, namely half.
What this tells us: without checking all colourings, the average alone forces a good one to exist. The count of clashes was a sum of six simple pieces, and we averaged each piece by itself — that is linearity doing the work.
Check your understanding Beginner
Formal definition Intermediate+
Throughout, is a finite probability space — in applications is the set of -colourings of a graph, of orientations of , or of subsets of chosen by independent coin flips — and are random variables. Basic probability and expectation are imported from the cross-spine units and applied here, not reproved.
Definition (expectation). For a random variable on a finite space, its expectation is . For an event with indicator , .
Proposition (linearity of expectation). For random variables on a common finite space and scalars , $$ \mathbb{E}\Big[\sum_{i=1}^{n} c_i X_i\Big] = \sum_{i=1}^{n} c_i, \mathbb{E}[X_i], $$ with no independence hypothesis on the . This is the defining tool: it converts the expectation of a complicated global count into a sum of elementary local expectations [Alon-Spencer Ch. 2].
Definition (first-moment / averaging existence principle). If is a real random variable on a finite space, then there is a sample point with and a sample point with . Equivalently, an object whose quality equals or exceeds the average exists. When counts bad substructures and , some has : a flawless object exists.
Definition (deletion / alteration method). A two-stage existence argument. Stage one: sample a random structure and let count its defects (forbidden cliques, short cycles, intersecting pairs). Stage two: from each defect delete one element of , producing with no defects; the number deleted is at most . One bounds from below to guarantee a defect-free that is still large. The method is used when no single sample is simultaneously large and clean, but the average surplus of size over defects is positive [Alon-Spencer Ch. 3].
Counterexamples to common slips Intermediate+
"Linearity needs independence." It does not. holds for any on a common space. Independence is required only for , a product identity, never invoked in these arguments.
" forces the value to occur." It forces only a value and a value , which may differ; for integer with , no sample has . The principle is an inequality, not an equality.
*" means is usually ."* It means for integer-valued (since ), which is all the existence proof needs — not that is typical.
"Deletion changes the average count, so the bound is circular." The two stages are sequential on a fixed sample. One first reads off on a good sample, then deletes; the expectation is used once, to locate a sample where size minus defects is large.
Key theorem with proof Intermediate+
The signature result is Szele's theorem, the historically first use of the probabilistic method: pure linearity of expectation, applied to the count of Hamiltonian paths in a random tournament, forces a tournament with exponentially many such paths to exist.
A tournament on vertex set is an orientation of the complete graph : for each pair exactly one of the arcs , is present. A Hamiltonian path is a permutation of such that every consecutive arc is present in the tournament.
Theorem (Szele 1943). There is a tournament on vertices with at least Hamiltonian paths. [Szele 1943]
Proof. Form a random tournament by orienting each of the edges independently, each direction with probability . Let be the number of Hamiltonian paths in . For each permutation of , let be the indicator that the path is present, so , the sum over all permutations.
The path requires its consecutive arcs to point the prescribed way. These arcs are on distinct edges, hence oriented independently, each correctly with probability ; therefore . By linearity of expectation, summed over all permutations, $$ \mathbb{E}[X] = \sum_{\sigma} \mathbb{E}[X_\sigma] = n! \cdot 2^{-(n-1)} = \frac{n!}{2^{n-1}}. $$ Here each permutation and its reverse give the same expected contribution; the count treats directed paths, so reversals are counted separately, and the bookkeeping needs no correction. By the averaging existence principle, some outcome of the random orientation has . That tournament has at least Hamiltonian paths.
Corollary (max-cut, half the edges). Every graph has a partition with at least edges crossing between and . Proof. Place each vertex in or by an independent fair coin. For each edge , let indicate that crosses; its two endpoints land on opposite sides with probability , so . Linearity gives , and some partition attains at least this.
Bridge. Szele's theorem is the foundational reason linearity of expectation counts as a constructive-strength existence tool even though it constructs nothing: writing a global count as a sum of indicators and averaging termwise pins down exactly, and the averaging principle converts that number into a guaranteed sample. This is exactly the move that reappears in the deletion arguments below, where instead counts defects and the same termwise averaging makes small enough to delete away; the bridge is that one identity, , drives both the "large object exists" and the "clean object exists" directions. The max-cut corollary builds toward the algorithmic derandomisation of 40.07.03 via conditional expectations, and the Hamiltonian-path count appears again in 40.07.04, where the local structure that independence supplies here is relaxed to the bounded dependence handled by the Lovász Local Lemma. Putting these together, the first moment is the central insight from which the alteration method is the natural next step: when the average defect count is not yet below one, you do not give up — you delete.
Exercises Intermediate+
Advanced results Master
The two elementary tools open into a graded family of refinements: variance and second-moment control where the first moment alone is too blunt, the alteration method tuned to optimise the surviving structure, and the dependent-random-choice technique that selects a high-codegree core by a deletion-flavoured argument.
Theorem 1 (Szele, sharpened; the maximum is ). Szele's lower bound for the maximum number of Hamiltonian paths over -vertex tournaments is tight up to a subexponential factor: for an absolute constant (Alon 1990, via the permanent of the tournament's adjacency-type matrix and the Brégman bound). Thus the first-moment lower bound is essentially best possible, an early instance of the probabilistic method delivering the correct order of magnitude [Alon-Spencer Ch. 2].
Theorem 2 (max-cut, second-moment and the refinement). The bound from linearity is improved by the standard deviation of the cut size. For a graph with edges, the random cut has and, since the are pairwise-dependent only through shared vertices, in many regimes, so there is a cut of size (Edwards' bound gives the sharp ). The averaging principle locates the mean; the second moment quantifies the surplus a single good sample can be pushed to carry.
Theorem 3 (the alteration method, general form). Suppose a random structure has expected size and expected defect count , where each defect can be repaired by deleting one element. Then there is a defect-free structure of size at least . Choosing the sampling parameter to maximise is the optimisation at the heart of every deletion proof: in the Ramsey application , ; in the independent-set application , ; in the girth-chromatic application , . The unifying statement is that is a valid lower bound on the size of the clean object [Alon-Spencer Ch. 3].
Theorem 4 (Erdős girth-chromatic, quantitative). For with , the deletion construction yields, for large, a graph on at least vertices with girth and chromatic number . The result is qualitative-impossible by any local argument: high girth makes the graph locally a tree (locally -colourable), yet its global chromatic number is unbounded — the first demonstration that chromatic number is not a local invariant [Erdős 1959].
Theorem 5 (dependent random choice). For every graph with , average degree , and target , a random vertex has neighbourhood whose expected number of "low-codegree" -subsets is small; deleting those subsets leaves a set of size in which every vertices have at least common neighbours, provided . The technique — sample a neighbourhood, then delete the subsets that fail the codegree requirement — is the deletion method applied to subsets rather than vertices, and it powers modern bounds on Turán numbers of bipartite graphs and Ramsey-Turán theory [Alon-Spencer Ch. 3].
Synthesis. Linearity of expectation and the deletion method are two readings of a single identity, : the foundational reason both work is that a global count decomposes into local indicators whose averages add without regard to dependence. Szele's theorem is exactly this identity counting Hamiltonian paths, and the Ramsey, independent-set, and girth-chromatic theorems are exactly this identity counting defects, with the averaging principle converting into a usable sample in both directions. The central insight is that the deletion method generalises the first moment: where the bare first moment demands and so caps the size of the object, alteration tolerates up to a fraction of the size and removes the excess, which is exactly why improves by a linear factor . This is dual to the second-moment method, which controls concentration rather than mean: linearity locates the average, variance certifies a sample near or beyond it, and deletion repairs a sample that overshoots into defect territory. Putting these together, the probabilistic method is a tower — first moment, then alteration, then second moment, then the Lovász Local Lemma — and the bridge upward at each level is the same termwise averaging, applied to ever more delicate functionals of the random structure, culminating in dependent random choice, where the deletion is performed on subsets to extract a high-codegree core.
Full proof set Master
Proposition 1 (averaging existence principle). Let be a real random variable on a finite probability space with for all . Then there exist with .
Proof. Suppose, for contradiction, for every . Then , a strict inequality of a number with itself. Hence some has . Applying the same argument to produces with .
Proposition 2 (linearity, finite form). For random variables on a common finite space and scalars , .
Proof. Expand the definition and exchange the two finite sums: $$ \mathbb{E}\Big[\sum_i c_i X_i\Big] = \sum_\omega \Pr(\omega)\sum_i c_i X_i(\omega) = \sum_i c_i \sum_\omega \Pr(\omega) X_i(\omega) = \sum_i c_i ,\mathbb{E}[X_i]. $$ The interchange is valid because both index sets are finite; no independence or integrability hypothesis enters.
Proposition 3 (Szele's bound). Some tournament on has at least directed Hamiltonian paths.
Proof. Orient each edge of independently and uniformly. For a permutation , the indicator that the directed path along is present satisfies , as the consecutive arcs lie on distinct edges, each correctly oriented with probability , and distinct edges are oriented independently. With , Proposition 2 gives , and Proposition 1 furnishes a tournament with .
Proposition 4 (Ramsey lower bound by deletion). For all , ; optimising the choice of yields .
Proof. Two-colour the edges of uniformly at random. Let ; then by Proposition 2, each -set being monochromatic with probability . By Proposition 1 fix a colouring with . Remove one vertex from each monochromatic : this deletes at most vertices and leaves a clique on vertices whose induced colouring has no monochromatic . Therefore exceeds that quantity. Writing and maximising over — the optimum is at , i.e. , giving after Stirling — yields the stated asymptotic.
Proposition 5 (independent set by deletion). Every graph with vertices and edges has an independent set of size at least .
Proof. Retain each vertex independently with probability , forming . By Proposition 2, and , since each edge has both endpoints retained with probability . Delete one endpoint of each retained edge to obtain an independent set with , so . Put (valid as ): then . Proposition 1 supplies an outcome with , an independent set of that size.
Proposition 6 (Erdős girth-chromatic). For every there is a graph with girth and .
Proof. Fix , , and form . Let be the number of cycles of length at most . The expected number of -cycles is , so as ; Markov gives . Let . Then , since for large . Choose large enough that both events fail: a sample has and . Delete one vertex from each short cycle, producing on vertices with girth and . Then , which exceeds for large.
Connections Master
The first-moment existence principle of this unit is the elementary case of the broader first-moment method developed in
40.07.01; that unit pushes the same averaging idea to threshold phenomena (when does force with high probability), and the indicator-decomposition used here for Hamiltonian paths and monochromatic cliques is the shared engine. The foundational reason both succeed is that linearity ignores dependence, so the local computation is always elementary.The deletion method's tolerance of a few defects is sharpened to zero defects under bounded dependence by the Lovász Local Lemma in
40.07.04: where deletion removes the rare bad events after the fact, the local lemma certifies that all bad events can be avoided simultaneously when each depends on few others. The Ramsey and hypergraph-colouring applications recur there with the deletion step replaced by the local-lemma symmetric criterion , giving a sharper constant.The max-cut corollary is the existence half of the algorithmic story in
40.07.03: the method of conditional expectations derandomises the random bipartition into a deterministic greedy algorithm that finds a cut of size in linear time, and the proof that the greedy choice never decreases the conditional expectation is exactly Proposition 1 applied one vertex at a time.
Historical & philosophical context Master
The probabilistic method's first published application is Tibor Szele's 1943 paper [Szele 1943] on Hamiltonian paths in tournaments, which computed the expected path count and inferred existence of a tournament beating it — linearity of expectation deployed before the technique had a name. The method's decisive entry into mainstream combinatorics is Paul Erdős's 1947 note [Erdős 1947] establishing by a counting (first-moment) argument: if the expected number of monochromatic 's in a random colouring is below one, a colouring avoiding them exists. The three-page note reframed an extremal question as a probability calculation and is conventionally taken as the birth of the field.
The deletion refinement and its most striking consequence are Erdős's 1959 theorem [Erdős 1959] on graphs of high girth and high chromatic number, which resolved a question of Tutte and Descartes by a non-constructive argument: random graphs have few short cycles and small independence number simultaneously, and deleting one vertex per short cycle yields the desired graph. No explicit construction of comparable strength was known for decades; Lovász (1968) and later Kříž gave constructive versions far weaker quantitatively. Erdős and Spencer's 1974 monograph and Spencer's 1994 Ten Lectures [Spencer 1994] codified linearity, alteration, the second moment, and the local lemma as the four pillars of the method. Alon's 1990 permanent bound showed Szele's first-moment lower bound is order-optimal, closing the loop on the field's founding example.
Bibliography Master
@article{szele1943,
author = {Szele, Tibor},
title = {Kombinatorikai vizsg\'{a}latok az ir\'{a}ny\'{\i}tott teljes gr\'{a}ffal kapcsolatban},
journal = {Matematikai \'{e}s Fizikai Lapok},
volume = {50},
year = {1943},
pages = {223--256}
}
@article{erdos1947ramsey,
author = {Erd{\H{o}}s, Paul},
title = {Some remarks on the theory of graphs},
journal = {Bulletin of the American Mathematical Society},
volume = {53},
year = {1947},
pages = {292--294}
}
@article{erdos1959graphprob,
author = {Erd{\H{o}}s, Paul},
title = {Graph theory and probability},
journal = {Canadian Journal of Mathematics},
volume = {11},
year = {1959},
pages = {34--38}
}
@article{alon1990permanent,
author = {Alon, Noga},
title = {The maximum number of Hamiltonian paths in tournaments},
journal = {Combinatorica},
volume = {10},
year = {1990},
pages = {319--324}
}
@book{spencer1994ten,
author = {Spencer, Joel H.},
title = {Ten Lectures on the Probabilistic Method},
series = {CBMS-NSF Regional Conference Series in Applied Mathematics},
volume = {64},
publisher = {SIAM},
year = {1994}
}
@book{alonspencer2016,
author = {Alon, Noga and Spencer, Joel H.},
title = {The Probabilistic Method},
edition = {4},
publisher = {Wiley-Interscience},
year = {2016}
}
@book{jukna2011extremal,
author = {Jukna, Stasys},
title = {Extremal Combinatorics: With Applications in Computer Science},
edition = {2},
publisher = {Springer},
year = {2011}
}