40.07.02 · combinatorics / probabilistic-method

Linearity of Expectation and the Deletion Method

shipped · 3 tiers · Lean: none

Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 2-3 and §3.6 (dependent random choice preview); Erdős 1947 *Bull. AMS* 53 (the Ramsey lower bound by counting); Erdős 1959 *Canad. J. Math.* 11 (graphs of high girth and high chromatic number); Szele 1943 *Mat. Fiz. Lapok* 50 (the number of Hamiltonian paths in a tournament)

Intuition Beginner

Suppose you want to prove that some arrangement of a thing is good — a seating chart with few arguments, a way to split a network so that many links cross the divide. Checking every arrangement by hand is hopeless when there are billions of them. Here is a shortcut that feels like cheating but is airtight: compute the average quality over all arrangements. If the average score is $50$ , then at least one arrangement scores $50$ or better, because no collection of numbers can have every member below its own average. You never had to find the good arrangement; you only had to know it cannot avoid existing.

The reason this is so powerful is a small bookkeeping fact about averages. If a score is a sum of many small pieces — one piece per link, per pair, per edge — then the average of the whole sum is just the sum of the averages of the pieces. You compute each piece on its own, ignore how the pieces tangle together, and add up. This is the linearity of averaging, and it works even when the pieces are wildly dependent on one another.

The second idea fixes a common snag. Often the random arrangement you build is almost perfect but carries a few flaws — a handful of forbidden pairs, a few short loops. Instead of demanding perfection up front, you build the slightly-flawed object, then delete the small number of offending parts. If the flaws are few on average, deleting them costs little, and what survives is both large and clean. This patch-it-afterward trick is the deletion method.

Visual Beginner

The picture shows two ideas side by side. On the left, a row of bars gives the score of every possible arrangement; a dashed line marks their average. Because the bars cannot all sit below the dashed line, at least one bar must reach it or rise above — that tall bar is the good arrangement guaranteed to exist, even though we never pointed to which one it is. On the right, a cluster of dots with a few red "bad" links is drawn first; then the red links and one endpoint of each are crossed out, leaving a smaller all-black cluster. That is "build too much, then delete the flaws."

Idea	What you compute	What you conclude
Averaging	the average score	some arrangement meets the average
Linearity	each small piece's average, then add	the total average, with no untangling
Deletion	average number of flaws	a clean object survives after removing few

The three rows are the whole unit: average to prove existence, add the pieces to compute the average, and delete to clean up.

Worked example Beginner

Take six people standing in a circle, and randomly hand each one either a red or a blue card by flipping a fair coin. Call two neighbours a clash if they hold the same colour. We will find the average number of clashes, and use it to guarantee a colouring with few clashes.

Step 1. There are six neighbouring pairs around the circle (each person and the one to their right). For any one pair, the two coin flips match — both red or both blue — exactly half the time. So each pair is a clash with chance $\frac{1}{2}$ .

Step 2. The average number of clashes is the sum, over the six pairs, of each pair's chance of clashing. That is $6 \times \frac{1}{2} = 3$ . On average, three of the six neighbour-pairs clash.

Step 3. Because the average number of clashes is $3$ , at least one actual colouring has $3$ clashes or fewer. (No set of whole numbers can have every value strictly above its average of $3$ .)

Step 4. Turn it around for the other side. The average number of non-clashing pairs is also $6 - 3 = 3$ , so some colouring has at least $3$ neighbour-pairs coloured differently — a "cut" of size $3$ out of $6$ edges, namely half.

What this tells us: without checking all $2^{6} = 64$ colourings, the average alone forces a good one to exist. The count of clashes was a sum of six simple pieces, and we averaged each piece by itself — that is linearity doing the work.
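The six-person example is small enough to check exhaustively. The sketch below enumerates all $64$ colourings and compares the exact average number of clashes with the best and worst colourings; the names `N`, `clashes`, `counts` and `average` are illustrative choices, not part of the unit.

```python
from fractions import Fraction
from itertools import product

N = 6  # people standing in the circle

def clashes(colouring):
    """Count neighbouring pairs (i, i+1 mod N) that hold the same colour."""
    return sum(colouring[i] == colouring[(i + 1) % N] for i in range(N))

# One entry per colouring: 2^6 = 64 of them, all equally likely.
counts = [clashes(c) for c in product("RB", repeat=N)]
average = Fraction(sum(counts), len(counts))

print(len(counts), average, min(counts), max(counts))
```

The average comes out as exactly $3$, and the smallest count is at most $3$, as the averaging argument promises. The enumeration is only a sanity check; the argument itself never needs it.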


Formal definition Intermediate+

Throughout, $(Ω, F, Pr)$ is a finite probability space (in applications $Ω$ is the set of $2$ -colourings of a graph, of orientations of $K_{n}$ , or of subsets of $[n]$ chosen by independent coin flips) and $X, X_{i} : Ω \to \mathbb{R}$ are random variables. Basic probability and expectation are imported from the cross-spine units 37.01.* and applied here, not reproved.

Definition (expectation). For a random variable $X$ on a finite space, its expectation is $E [X] = \sum_{ω \in Ω} Pr (ω) X (ω) = \sum_{x} x Pr (X = x)$ . For an event $A$ with indicator $1_{A}$ , $E [1_{A}] = Pr (A)$ .

Proposition (linearity of expectation). For random variables $X_{1}, \dots, X_{n}$ on a common finite space and scalars $c_{1}, \dots, c_{n} \in \mathbb{R}$ , $$ \mathbb{E}\Big[\sum_{i=1}^{n} c_i X_i\Big] = \sum_{i=1}^{n} c_i \, \mathbb{E}[X_i], $$ with no independence hypothesis on the $X_{i}$ . This is the defining tool: it converts the expectation of a complicated global count into a sum of elementary local expectations ^{[Alon-Spencer Ch. 2]}.
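To see that no independence is needed, here is a minimal sketch on a hypothetical example: one fair die, with $X$ the face shown and $Y = 7 - X$ the opposite face, so that $Y$ is completely determined by $X$. The function names are illustrative.

```python
from fractions import Fraction

OMEGA = range(1, 7)  # outcomes of one fair die, each with probability 1/6

def expectation(f):
    """Exact expectation of f over the six equally likely outcomes."""
    return sum(Fraction(f(w), 6) for w in OMEGA)

def face(w):
    return w          # X: the face shown

def opposite(w):
    return 7 - w      # Y: the opposite face, a function of X

mean_of_sum = expectation(lambda w: face(w) + opposite(w))
sum_of_means = expectation(face) + expectation(opposite)
mean_of_product = expectation(lambda w: face(w) * opposite(w))
product_of_means = expectation(face) * expectation(opposite)

print(mean_of_sum, sum_of_means)          # the two agree
print(mean_of_product, product_of_means)  # the two differ
```

The sum identity holds exactly even though the two variables are as dependent as they can be, while the product identity fails; only the latter requires independence.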

Definition (first-moment / averaging existence principle). If $X$ is a real random variable on a finite space, then there is a sample point $ω \in Ω$ with $X (ω) \geq E [X]$ and a sample point $ω^{'}$ with $X (ω^{'}) \leq E [X]$ . Equivalently, an object whose quality equals or exceeds the average exists. When $X$ counts bad substructures and $E [X] < 1$ , some $ω$ has $X (ω) = 0$ : a flawless object exists.

Definition (deletion / alteration method). A two-stage existence argument. Stage one: sample a random structure $S$ and let $X$ count its defects (forbidden cliques, short cycles, intersecting pairs). Stage two: from each defect delete one element of $S$ , producing $S^{'}$ with no defects; the number deleted is at most $X$ . One bounds $E [∣ S ∣] - E [X]$ from below to guarantee a defect-free $S^{'}$ that is still large. The method is used when no single sample is simultaneously large and clean, but the average surplus of size over defects is positive ^{[Alon-Spencer Ch. 3]}.

Counterexamples to common slips Intermediate+

"Linearity needs independence." It does not. $E [X + Y] = E [X] + E [Y]$ holds for any $X, Y$ on a common space. Independence is required only for $E [X Y] = E [X] E [Y]$ , a product identity, never invoked in these arguments.
" $E [X] = μ$ forces the value $μ$ to occur." It forces only a value $\geq μ$ and a value $\leq μ$ , which may differ; for integer $X$ with $μ = 2.7$ , no sample has $X = 2.7$ . The principle is an inequality, not an equality.
" $E [X] < 1$ means $X$ is usually $0$ ." It means $Pr (X = 0) > 0$ for integer-valued $X \geq 0$ (since $E [X] \geq Pr (X \geq 1)$ ), which is all the existence proof needs; it does not say that $X = 0$ is typical.
"Deletion changes the average count, so the bound is circular." The two stages are sequential on a fixed sample. One first reads off $X (ω)$ on a good sample, then deletes; the expectation is used once, to locate a sample where size minus defects is large.

Key theorem with proof Intermediate+

The signature result is Szele's theorem, the historically first use of the probabilistic method: pure linearity of expectation, applied to the count of Hamiltonian paths in a random tournament, forces a tournament with exponentially many such paths to exist.

A tournament on vertex set $[n]$ is an orientation of the complete graph $K_{n}$ : for each pair $\{i, j\}$ exactly one of the arcs $i \to j$ , $j \to i$ is present. A Hamiltonian path is a permutation $σ = (σ_{1}, \dots, σ_{n})$ of $[n]$ such that every consecutive arc $σ_{1} \to σ_{2} \to \dots \to σ_{n}$ is present in the tournament.

Theorem (Szele 1943). There is a tournament on $n$ vertices with at least $n! / 2^{n - 1}$ Hamiltonian paths. ^{[Szele 1943]}

Proof. Form a random tournament $T$ by orienting each of the $\binom{n}{2}$ edges independently, each direction with probability $\frac{1}{2}$ . Let $X$ be the number of Hamiltonian paths in $T$ . For each permutation $σ$ of $[n]$ , let $X_{σ}$ be the indicator that the path $σ_{1} \to σ_{2} \to \dots \to σ_{n}$ is present, so $X = \sum_{σ} X_{σ}$ , the sum over all $n!$ permutations.

The path $σ$ requires its $n - 1$ consecutive arcs to point the prescribed way. These $n - 1$ arcs are on distinct edges, hence oriented independently, each correctly with probability $\frac{1}{2}$ ; therefore $E [X_{σ}] = Pr (X_{σ} = 1) = 2^{- (n - 1)}$ . By linearity of expectation, summed over all $n!$ permutations, $$ \mathbb{E}[X] = \sum_{\sigma} \mathbb{E}[X_\sigma] = n! \cdot 2^{-(n-1)} = \frac{n!}{2^{n-1}}. $$ Here each permutation and its reverse give the same expected contribution; the count $X$ treats directed paths, so reversals are counted separately, and the bookkeeping needs no correction. By the averaging existence principle, some outcome $T$ of the random orientation has $X (T) \geq E [X] = n! / 2^{n - 1}$ . That tournament has at least $n! / 2^{n - 1}$ Hamiltonian paths. $□$
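For very small $n$ the computation in the proof can be checked by brute force. The sketch below (with illustrative names, and feasible only for tiny $n$ since it visits all $2^{\binom{n}{2}}$ tournaments) counts directed Hamiltonian paths in every tournament on $4$ vertices and compares the exact mean with $n!/2^{n-1}$.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import factorial

def hamiltonian_path_counts(n):
    """Number of directed Hamiltonian paths in each tournament on n vertices."""
    pairs = list(combinations(range(n), 2))
    counts = []
    for bits in product((0, 1), repeat=len(pairs)):
        # arc[(i, j)] is True exactly when the arc i -> j is present.
        arc = {}
        for (i, j), b in zip(pairs, bits):
            arc[(i, j)] = bool(b)
            arc[(j, i)] = not b
        counts.append(sum(
            all(arc[(s[t], s[t + 1])] for t in range(n - 1))
            for s in permutations(range(n))
        ))
    return counts

n = 4
counts = hamiltonian_path_counts(n)
mean = Fraction(sum(counts), len(counts))
szele = Fraction(factorial(n), 2 ** (n - 1))

print(mean, szele, max(counts))
```

The exact mean equals Szele's value, and the maximum is at least as large, which is all the averaging principle asserts.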

Corollary (max-cut, half the edges). Every graph $G = (V, E)$ has a partition $V = A ⊔ B$ with at least $∣ E ∣/2$ edges crossing between $A$ and $B$ . Proof. Place each vertex in $A$ or $B$ by an independent fair coin. For each edge $e$ , let $Y_{e}$ indicate that $e$ crosses; its two endpoints land on opposite sides with probability $\frac{1}{2}$ , so $E [Y_{e}] = \frac{1}{2}$ . Linearity gives $E [\sum_{e} Y_{e}] = ∣ E ∣/2$ , and some partition attains at least this. $□$
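The corollary can be illustrated on a small graph by listing every bipartition. The graph below (a $5$-cycle with one chord) is a hypothetical example chosen only for illustration, as are the function and variable names.

```python
from fractions import Fraction
from itertools import product

def cut_sizes(n, edges):
    """Cut size of every assignment of the n vertices to side 0 or side 1."""
    return [sum(side[u] != side[v] for u, v in edges)
            for side in product((0, 1), repeat=n)]

# Hypothetical example graph: a 5-cycle 0-1-2-3-4-0 with the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
sizes = cut_sizes(5, edges)
mean_cut = Fraction(sum(sizes), len(sizes))

print(mean_cut, max(sizes))
```

The exact mean over all $2^{5}$ bipartitions is $∣ E ∣/2 = 3$, so the best bipartition cuts at least $3$ edges.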

Bridge. Szele's theorem is the foundational reason linearity of expectation counts as a constructive-strength existence tool even though it constructs nothing: writing a global count $X$ as a sum of indicators and averaging termwise pins down $E [X]$ exactly, and the averaging principle converts that number into a guaranteed sample. This is exactly the move that reappears in the deletion arguments below, where $X$ instead counts defects and the same termwise averaging makes $E [X]$ small enough to delete away; the bridge is that one identity, $E [\sum X_{i}] = \sum E [X_{i}]$ , drives both the "large object exists" and the "clean object exists" directions. The max-cut corollary builds toward the algorithmic derandomisation of 40.07.03 via conditional expectations, and the Hamiltonian-path count appears again in 40.07.04, where the local structure that independence supplies here is relaxed to the bounded dependence handled by the Lovász Local Lemma. Putting these together, the first moment is the central insight from which the alteration method is the natural next step: when the average defect count is not yet below one, you do not give up — you delete.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove the splitting (unbalancing-lights) bound: for any $\pm 1$ matrix $A = (a_{ij})$ of size $m \times n$ , there are signs $x_{j} = \pm 1$ with $\sum_{i, j} a_{ij} x_{j} \geq 0$ , and, more sharply, signs for which the total absolute row sum $\sum_{i} ∣ \sum_{j} a_{ij} x_{j} ∣$ is at least its average over random signs. State the expectation computed and the existence conclusion.

Hint

Choose each $x_{j} \in \{+ 1, - 1\}$ by a fair coin, independently. Compute $E [\sum_{i, j} a_{ij} x_{j}]$ using $E [x_{j}] = 0$ and linearity; the symmetry $x \mapsto - x$ of the distribution then yields a sign vector whose total is nonnegative.

Answer

Let $R = \sum_{i, j} a_{ij} x_{j} = \sum_{j} x_{j} (\sum_{i} a_{ij})$ with each $x_{j}$ a fair independent sign. By linearity $E [R] = \sum_{j} (\sum_{i} a_{ij}) E [x_{j}] = 0$ since $E [x_{j}] = 0$ . Because the distribution of $R$ is symmetric ( $x \mapsto - x$ sends $R \mapsto - R$ and preserves the uniform distribution on sign vectors), $Pr (R \geq 0) \geq \frac{1}{2} > 0$ , so some sign vector gives $R \geq 0$ . For the sharper "unbalancing lights" version one computes, for each row-sum random variable, $E [∣ \sum_{j} a_{ij} x_{j} ∣] = (\sqrt{2/ π} + o (1)) \sqrt{n}$ by the central limit estimate (each row sum is a sum of $n$ independent fair signs), so summing over the $m$ rows by linearity and applying the averaging principle yields a sign choice making $\sum_{i} ∣ \sum_{j} a_{ij} x_{j} ∣$ at least $(\sqrt{2/ π} + o (1)) m \sqrt{n}$ . Rubric: full credit for the linearity computation $E [R] = 0$ , the symmetry argument for existence, and the correct statement of the existence conclusion.
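Both expectations in this exercise can be computed exactly for a small matrix. The sketch below uses a hypothetical $3 \times 4$ matrix of signs (any other choice would do) and enumerates all $2^{4}$ sign vectors; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Hypothetical 3 x 4 matrix of +-1 entries, chosen only for illustration.
A = [[1, -1, 1, 1],
     [-1, -1, 1, -1],
     [1, 1, 1, -1]]
m, n = len(A), len(A[0])

signs = list(product((1, -1), repeat=n))
# row_sums[s][i] is the i-th row sum under the s-th sign vector.
row_sums = [[sum(a * x for a, x in zip(row, xs)) for row in A] for xs in signs]

mean_R = Fraction(sum(sum(rs) for rs in row_sums), len(signs))
best_R = max(sum(rs) for rs in row_sums)
mean_abs = Fraction(sum(sum(abs(r) for r in rs) for rs in row_sums), len(signs))
best_abs = max(sum(abs(r) for r in rs) for rs in row_sums)

print(mean_R, best_R, mean_abs, best_abs)
```

The mean of $R$ is $0$, so some sign vector has $R \geq 0$. The mean of the total absolute row sum is $m$ times the mean absolute value of a sum of $n$ fair signs (here $3 \times \frac{3}{2}$), and some sign vector attains at least that.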

Exercise 4 (medium, symbolic).

Use the deletion method to prove the Ramsey lower bound $R (k, k) > n - \binom{n}{k} 2^{1 - \binom{k}{2}}$ : from a random $2$ -colouring of $K_{n}$ , deleting one vertex per monochromatic $K_{k}$ leaves a complete graph with no monochromatic $K_{k}$ .

Hint

Let $X$ count monochromatic $K_{k}$ 's; compute $E [X] = \binom{n}{k} 2^{1 - \binom{k}{2}}$ by linearity. Delete one vertex from each, removing all of them and at most $X$ vertices.

Answer

Colour each edge of $K_{n}$ red or blue by a fair independent coin. For a fixed $k$ -set $S$ , the event that $S$ is monochromatic has probability $2 \cdot 2^{- \binom{k}{2}} = 2^{1 - \binom{k}{2}}$ (all $\binom{k}{2}$ edges one colour, two colours). Let $X = \sum_{∣ S ∣ = k} 1 [S \text{ monochromatic}]$ ; by linearity $E [X] = \binom{n}{k} 2^{1 - \binom{k}{2}}$ . Fix a colouring with $X \leq E [X]$ . Delete one vertex from each monochromatic $K_{k}$ ; this removes at most $E [X]$ vertices and destroys every monochromatic $K_{k}$ . The remaining set has at least $n - \binom{n}{k} 2^{1 - \binom{k}{2}}$ vertices and its induced $2$ -colouring has no monochromatic $K_{k}$ , so $R (k, k)$ exceeds this size. Optimising $n$ gives $R (k, k) > (1/ e) (1 + o (1)) k 2^{k /2}$ , a factor of about $k / e$ over the crude first-moment bound $2^{k /2}$ , and a factor $\sqrt{2}$ over the first-moment bound $(1/ (e \sqrt{2})) (1 + o (1)) k 2^{k /2}$ obtained by optimising $n$ subject to $E [X] < 1$ . Rubric: full credit for the linearity computation of $E [X]$ , the deletion of one vertex per clique, and the size estimate of the survivor.
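The two bounds can be compared numerically for a fixed $k$. The sketch below evaluates the expected number of monochromatic $K_{k}$ exactly, finds the largest $n$ with expectation below $1$ (the plain first-moment bound), and maximises $n$ minus the expectation (the deletion bound). The function names and the search limit `n_max` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def expected_mono_cliques(n, k):
    """E[X] for a uniformly random 2-colouring of K_n: C(n,k) * 2^(1 - C(k,2))."""
    return Fraction(comb(n, k), 2 ** (comb(k, 2) - 1))

def first_moment_n(k, n_max):
    """Largest n in [k, n_max] with E[X] < 1, so that R(k,k) > n."""
    return max(n for n in range(k, n_max + 1)
               if expected_mono_cliques(n, k) < 1)

def best_deletion(k, n_max):
    """Maximise n - E[X] over k <= n <= n_max; returns (value, n)."""
    return max((n - expected_mono_cliques(n, k), n)
               for n in range(k, n_max + 1))

k = 10
plain = first_moment_n(k, 300)
value, n_star = best_deletion(k, 300)

print(plain, n_star, float(value))
```

For $k = 10$ the deletion bound is already larger than the plain first-moment bound. For very small $k$ the comparison can go the other way, since the improvement is asymptotic.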

Exercise 5 (medium, symbolic).

Prove the deletion bound for independent sets: every graph $G$ on $n$ vertices with $m$ edges has an independent set of size at least $n^{2} / (4 m)$ (assume $m \geq n /2$ ). Use a random subset and delete one endpoint per surviving edge.

Hint

Keep each vertex independently with probability $p$ . The kept set $S$ has expected size $E [∣ S ∣] = p n$ and expected surviving-edge count $E [e (S)] = p^{2} m$ . Delete one endpoint per surviving edge; optimise $p = n / (2 m)$ .

Answer

Pick a random subset $S$ keeping each vertex independently with probability $p$ . Let $∣ S ∣$ be its size and $e (S)$ the number of edges with both endpoints in $S$ . By linearity $E [∣ S ∣] = p n$ and $E [e (S)] = p^{2} m$ (each edge survives with probability $p^{2}$ ). Delete one endpoint from each surviving edge to obtain an independent set $S^{'}$ with $∣ S^{'} ∣ \geq ∣ S ∣ - e (S)$ . Then $E [∣ S^{'} ∣] \geq E [∣ S ∣] - E [e (S)] = p n - p^{2} m$ . Choosing $p = n / (2 m) \leq 1$ gives $p n - p^{2} m = n^{2} / (2 m) - n^{2} / (4 m) = n^{2} / (4 m)$ . By the averaging principle some $S^{'}$ attains at least this, so $α (G) \geq n^{2} / (4 m)$ . (The sharper Caro-Wei bound $α (G) \geq \sum_{v} 1/ (d_{v} + 1)$ comes from a random-permutation argument rather than from deletion.) Rubric: full credit for the two linearity computations, the deletion step, and the optimisation $p = n / (2 m)$ .
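The argument can be traced exactly on a small graph. The sketch below takes the $8$-cycle as a hypothetical example, for which $p = n / (2 m) = \frac{1}{2}$ makes every subset equally likely, and checks both the expectation $p n - p^{2} m$ and the deletion step. All names are illustrative.

```python
from fractions import Fraction
from itertools import product

def delete_to_independent(kept, edges):
    """Remove one endpoint of every edge still inside `kept`."""
    survivors = set(kept)
    for u, v in edges:
        if u in survivors and v in survivors:
            survivors.discard(v)
    return survivors

def is_independent(vertices, edges):
    return not any(u in vertices and v in vertices for u, v in edges)

n = 8
edges = [(i, (i + 1) % n) for i in range(n)]  # hypothetical example: the 8-cycle
m = len(edges)

surpluses, cleaned = [], []
for mask in product((0, 1), repeat=n):        # p = 1/2: all subsets equally likely
    kept = {v for v in range(n) if mask[v]}
    inside = sum(u in kept and v in kept for u, v in edges)
    surpluses.append(len(kept) - inside)      # |S| - e(S)
    cleaned.append(delete_to_independent(kept, edges))

mean_surplus = Fraction(sum(surpluses), len(surpluses))
best = max(len(s) for s in cleaned)

print(mean_surplus, Fraction(n * n, 4 * m), best)
```

The exact mean of $∣ S ∣ - e (S)$ equals $n^{2} / (4 m) = 2$, every cleaned set is independent, and the largest cleaned set is at least that big.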

Exercise 7 (hard, symbolic).

Prove Erdős's theorem that for every $k$ and $g$ there is a graph with girth $> g$ and chromatic number $> k$ . Sketch the deletion argument: random $G (n, p)$ with $p = n^{θ - 1}$ , $θ < 1/ g$ ; delete a vertex per short cycle; bound the independence number.

Hint

Let $X$ count cycles of length $\leq g$ ; show $E [X] = o (n)$ for $p = n^{θ - 1}$ , $0 < θ < 1/ g$ . Delete one vertex per short cycle to kill girth defects while keeping $\geq n /2$ vertices. Separately show that with high probability $α (G) < ⌈ 3 p^{- 1} \ln n ⌉ = O (n^{1 - θ} \log n)$ , so $χ \geq (n /2) / α$ is large.

Answer

Fix $θ$ with $0 < θ < 1/ g$ and set $p = n^{θ - 1}$ . The expected number of cycles of length $i$ is $\frac{n !}{( n - i )! 2 i} p^{i} \leq \frac{1}{2 i} (n p)^{i} = \frac{1}{2 i} n^{θ i}$ . Summing over $3 \leq i \leq g$ gives $E [X] \leq \sum_{i = 3}^{g} n^{θ i} = O (n^{θ g}) = o (n)$ since $θ g < 1$ . By Markov, $Pr (X \geq n /2) \to 0$ . Separately, let $a = ⌈ \frac{3}{p} \ln n ⌉ = O (n^{1 - θ} \log n)$ ; then $Pr (α (G) \geq a) \leq \binom{n}{a} (1 - p)^{\binom{a}{2}} \leq (n e^{- p (a - 1) /2})^{a} \to 0$ . So with probability tending to $1$ a sample has $X < n /2$ and $α (G) < a$ . Delete one vertex from each of the $< n /2$ short cycles, obtaining $G^{'}$ on $> n /2$ vertices with girth $> g$ and $α (G^{'}) \leq α (G) < a$ . Then $χ (G^{'}) \geq ∣ V (G^{'}) ∣/ α (G^{'}) > (n /2) / a = Ω (n^{θ} / \log n) \to \infty$ , exceeding $k$ for large $n$ . This is the canonical deletion-method proof: the random graph is forced to have few short cycles and small independence number simultaneously, and deletion removes the girth defects while barely shrinking the vertex set. Rubric: full credit for the $E [X] = o (n)$ short-cycle estimate, the independence-number tail bound, the deletion step, and the $χ \geq ∣ V ∣/ α$ conclusion.
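The two quantities driving this proof can be evaluated numerically to get a feel for the scales involved. The sketch below uses illustrative parameter values ($g = 4$, $θ = 0.2$, $n = 10^{12}$, all assumptions) and hypothetical function names; it computes the exact expected number of short cycles and the threshold $a$.

```python
from math import ceil, log, perm

def expected_short_cycles(n, p, g):
    """Exact expected number of cycles of length 3..g in G(n, p)."""
    return sum(perm(n, i) / (2 * i) * p ** i for i in range(3, g + 1))

def independence_threshold(n, p):
    """The value a = ceil(3 ln(n) / p) used in the tail bound for alpha(G)."""
    return ceil(3 * log(n) / p)

# Illustrative values only; theta must satisfy 0 < theta < 1/g.
n, g, theta = 10 ** 12, 4, 0.2
p = n ** (theta - 1)

short = expected_short_cycles(n, p, g)
a = independence_threshold(n, p)
crude = sum((n * p) ** i / (2 * i) for i in range(3, g + 1))

print(short, n / 2, (n / 2) / a)
```

The expected number of short cycles is far below $n /2$, while the resulting lower bound $(n /2) / a$ on the chromatic number grows only like $n^{θ} / \log n$, so it becomes large only for very large $n$.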

Exercise 8 (hard, short-answer).

Explain in one paragraph why the deletion method improves the Ramsey lower bound from $2^{k /2}$ to $(1/ e) (1 + o (1)) k 2^{k /2}$ , and why linearity of expectation alone (no deletion) cannot reach the improved constant.

Hint

The plain first moment needs $E [X] < 1$ to force a perfect colouring at some $n$ . Deletion only needs $E [X]$ small relative to $n$ , tolerating a few bad cliques and removing them.

Answer

The plain first-moment argument requires the expected number of monochromatic $K_{k}$ 's to be below $1$ , $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$ , so that some colouring has zero; the crude estimate caps $n$ at roughly $2^{k /2}$ (a careful Stirling estimate of the same inequality reaches $(1/ (e \sqrt{2})) (1 + o (1)) k 2^{k /2}$ , but no further). The deletion method relaxes the demand: it allows $E [X]$ to be as large as a constant fraction of $n$ , then surgically removes one vertex per monochromatic clique. Because deleting at most $E [X]$ vertices still leaves $n - E [X]$ clean vertices, one can push $n$ up to the point where $n - \binom{n}{k} 2^{1 - \binom{k}{2}}$ is maximised; calculus gives the optimum near $n \approx (1/ e) k 2^{k /2}$ , a gain of a linear factor $k$ (times $1 / e$ ) over the crude bound $2^{k /2}$ that insists on perfection. Linearity alone cannot reach this because it offers no mechanism to use a colouring that has a handful of bad cliques: it can only certify the existence of a flawless one, which forces the smaller $n$ . The deletion step is precisely what converts "few defects on average" into "a clean, large survivor." Rubric: full credit for contrasting the $E [X] < 1$ requirement with the deletion tolerance and for locating the optimum at $n = Θ (k 2^{k /2})$ .

Advanced results Master

The two elementary tools open into a graded family of refinements: variance and second-moment control where the first moment alone is too blunt, the alteration method tuned to optimise the surviving structure, and the dependent-random-choice technique that selects a high-codegree core by a deletion-flavoured argument.

Theorem 1 (Szele, sharpened; the maximum is $n! / 2^{n - 1}$ up to a polynomial factor). Szele's lower bound $n! / 2^{n - 1}$ for the maximum number $P (n)$ of Hamiltonian paths over $n$ -vertex tournaments is tight up to a polynomial factor: $P (n) \leq c n^{3/2} n! / 2^{n - 1}$ for an absolute constant $c$ (Alon 1990, via the permanent of the tournament's adjacency-type matrix and the Brégman bound). Thus the first-moment lower bound is best possible up to that polynomial factor, an early instance of the probabilistic method delivering the correct exponential order of magnitude ^{[Alon-Spencer Ch. 2]}.

Theorem 2 (max-cut, second moment and the $\frac{1}{2} + Ω (1/ \sqrt{m})$ refinement). The bound $∣ E ∣/2$ from linearity is improved by roughly the standard deviation of the cut size. For a graph with $m$ edges, the random cut $Y = \sum_{e} Y_{e}$ has $E [Y] = m /2$ and, since the $Y_{e}$ are pairwise independent (two edges sharing a vertex still cross independently of each other), $Var (Y) = m /4$ , so the standard deviation is $Θ (\sqrt{m})$ . A cut of size $m /2 + Ω (\sqrt{m})$ does exist: Edwards' bound gives the sharp $m /2 + (\sqrt{8 m + 1} - 1) /8$ . The averaging principle locates the mean; the second moment indicates the scale of the surplus a single good sample can carry.
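The mean and variance of the random cut can be computed exactly on a small graph by enumeration. The sketch below reuses a hypothetical example graph (a $5$-cycle with one chord); the function name `cut_moments` is illustrative.

```python
from fractions import Fraction
from itertools import product

def cut_moments(n, edges):
    """Exact mean and variance of the cut size under a uniform random bipartition."""
    sizes = [sum(side[u] != side[v] for u, v in edges)
             for side in product((0, 1), repeat=n)]
    total = len(sizes)
    mean = Fraction(sum(sizes), total)
    second = Fraction(sum(s * s for s in sizes), total)
    return mean, second - mean ** 2

# Hypothetical example graph: a 5-cycle 0-1-2-3-4-0 with the chord 0-2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
mean, variance = cut_moments(5, edges)

print(mean, variance)
```

On this simple graph the variance is exactly $m /4$, the value one gets when the covariances between distinct edge indicators vanish, so the typical fluctuation of the cut around $m /2$ is of order $\sqrt{m}$.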

Theorem 3 (the alteration method, general form). Suppose a random structure has expected size $s$ and expected defect count $d$ , where each defect can be repaired by deleting one element. Then there is a defect-free structure of size at least $s - d$ . Choosing the sampling parameter to maximise $s - d$ is the optimisation at the heart of every deletion proof: in the Ramsey application $s = n$ , $d = \binom{n}{k} 2^{1 - \binom{k}{2}}$ ; in the independent-set application $s = p n$ , $d = p^{2} m$ ; in the girth-chromatic application $s = n$ , $d = E [\# \text{ short cycles}]$ . The unifying statement is that $\max_{\text{params}} (s - d)$ is a valid lower bound on the size of the clean object ^{[Alon-Spencer Ch. 3]}.

Theorem 4 (Erdős girth-chromatic, quantitative). For $p = n^{θ - 1}$ with $0 < θ < 1/ g$ , the deletion construction yields, for $n$ large, a graph on at least $n /2$ vertices with girth $> g$ and chromatic number $χ \geq \frac{1}{2} n^{θ} / (3 \ln n)$ . No local argument can detect this: high girth makes the graph locally a tree (locally $2$ -colourable), yet its global chromatic number is unbounded, the first demonstration that chromatic number is not a local invariant ^{[Erdős 1959]}.

Theorem 5 (dependent random choice). Let $G = (V, E)$ have $∣ V ∣ = n$ vertices and average degree $d = 2∣ E ∣/ n$ , and let $t$ , $m$ , $u$ be positive integers with $\frac{d ^{t}}{n ^{t - 1}} - \binom{n}{t} (\frac{m}{n})^{t} \geq u$ . Then there is a set $U \subseteq V$ of size at least $u$ in which every $t$ vertices have at least $m$ common neighbours. The proof samples $t$ vertices uniformly with repetition and takes their common neighbourhood $A$ : convexity gives $E [∣ A ∣] \geq d^{t} / n^{t - 1}$ , while the expected number of "low-codegree" $t$ -subsets of $A$ (those with fewer than $m$ common neighbours) is at most $\binom{n}{t} (m / n)^{t}$ ; deleting one vertex from each such subset leaves $U$ . The technique (sample a common neighbourhood, then delete a vertex from each subset that fails the codegree requirement) is the deletion method with subsets as the defects, and it powers modern bounds on Turán numbers of bipartite graphs and Ramsey-Turán theory ^{[Alon-Spencer Ch. 3]}.

Synthesis. Linearity of expectation and the deletion method are two readings of a single identity, $E [\sum_{i} X_{i}] = \sum_{i} E [X_{i}]$ : the foundational reason both work is that a global count decomposes into local indicators whose averages add without regard to dependence. Szele's theorem is exactly this identity counting Hamiltonian paths, and the Ramsey, independent-set, and girth-chromatic theorems are exactly this identity counting defects, with the averaging principle converting $E [X]$ into a usable sample in both directions. The central insight is that the deletion method generalises the first moment: where the bare first moment demands $E [\text{defects}] < 1$ and so caps the size of the object, alteration tolerates $E [\text{defects}]$ up to a fraction of the size and removes the excess, which is exactly why the lower bound on $R (k, k)$ improves from the crude $2^{k /2}$ to $(1/ e) (1 + o (1)) k 2^{k /2}$ . This is dual to the second-moment method, which controls concentration rather than mean: linearity locates the average, variance certifies a sample near or beyond it, and deletion repairs a sample that overshoots into defect territory. Putting these together, the probabilistic method is a tower (first moment, then alteration, then second moment, then the Lovász Local Lemma), and the bridge upward at each level is the same termwise averaging, applied to ever more delicate functionals of the random structure, culminating in dependent random choice, where the defects deleted are subsets and the survivor is a high-codegree core.

Full proof set Master

Proposition 1 (averaging existence principle). Let $X$ be a real random variable on a finite probability space with $Pr (ω) > 0$ for all $ω$ . Then there exist $ω_{+}, ω_{-}$ with $X (ω_{+}) \geq E [X] \geq X (ω_{-})$ .

Proof. Suppose, for contradiction, $X (ω) < E [X]$ for every $ω$ . Then $E [X] = \sum_{ω} Pr (ω) X (ω) < \sum_{ω} Pr (ω) E [X] = E [X] \sum_{ω} Pr (ω) = E [X]$ , a strict inequality of a number with itself. Hence some $ω_{+}$ has $X (ω_{+}) \geq E [X]$ . Applying the same argument to $- X$ produces $ω_{-}$ with $X (ω_{-}) \leq E [X]$ . $□$

Proposition 2 (linearity, finite form). For random variables $X_{1}, \dots, X_{n}$ on a common finite space and scalars $c_{i}$ , $E [\sum_{i} c_{i} X_{i}] = \sum_{i} c_{i} E [X_{i}]$ .

Proof. Expand the definition and exchange the two finite sums: $$ \mathbb{E}\Big[\sum_i c_i X_i\Big] = \sum_\omega \Pr(\omega)\sum_i c_i X_i(\omega) = \sum_i c_i \sum_\omega \Pr(\omega) X_i(\omega) = \sum_i c_i \, \mathbb{E}[X_i]. $$ The interchange is valid because both index sets are finite; no independence or integrability hypothesis enters. $□$

Proposition 3 (Szele's bound). Some tournament on $[n]$ has at least $n! / 2^{n - 1}$ directed Hamiltonian paths.

Proof. Orient each edge of $K_{n}$ independently and uniformly. For a permutation $σ$ , the indicator $X_{σ}$ that the directed path along $σ$ is present satisfies $E [X_{σ}] = 2^{- (n - 1)}$ , as the $n - 1$ consecutive arcs lie on distinct edges, each correctly oriented with probability $\frac{1}{2}$ , and distinct edges are oriented independently. With $X = \sum_{σ} X_{σ}$ , Proposition 2 gives $E [X] = n! 2^{- (n - 1)} = n! / 2^{n - 1}$ , and Proposition 1 furnishes a tournament $T$ with $X (T) \geq n! / 2^{n - 1}$ . $□$

Proposition 4 (Ramsey lower bound by deletion). For all $n, k$ , $R (k, k) > n - \binom{n}{k} 2^{1 - \binom{k}{2}}$ ; optimising the choice of $n$ yields $R (k, k) > (1/ e) (1 + o (1)) k 2^{k /2}$ .

Proof. Two-colour the edges of $K_{n}$ uniformly at random. Let $X = \sum_{∣ S ∣ = k} 1 [S \text{ monochromatic}]$ ; then $E [X] = \binom{n}{k} 2^{1 - \binom{k}{2}}$ by Proposition 2, each $k$ -set being monochromatic with probability $2^{1 - \binom{k}{2}}$ . By Proposition 1 fix a colouring $χ$ with $X (χ) \leq E [X]$ . Remove one vertex from each monochromatic $K_{k}$ : this deletes at most $X (χ) \leq E [X]$ vertices and leaves a complete graph on $\geq n - \binom{n}{k} 2^{1 - \binom{k}{2}}$ vertices whose induced colouring has no monochromatic $K_{k}$ . Therefore $R (k, k)$ exceeds that quantity. Write $\binom{n}{k} 2^{1 - \binom{k}{2}} \leq \frac{n ^{k}}{k !} 2^{1 - k (k - 1) /2}$ and maximise $f (n) = n - \frac{n ^{k}}{k !} 2^{1 - k (k - 1) /2}$ over $n$ : the optimum is at $f^{'} (n) = 0$ , i.e. $n^{k - 1} = (k - 1)! \, 2^{k (k - 1) /2 - 1}$ , giving $n = (1 + o (1)) (1/ e) k 2^{k /2}$ after Stirling, and there $f (n) = n (1 - 1/ k)$ , which yields the stated asymptotic. $□$

Proposition 5 (independent set by deletion). Every graph with $n$ vertices and $m \geq n /2$ edges has an independent set of size at least $n^{2} / (4 m)$ .

Proof. Retain each vertex independently with probability $p$ , forming $S$ . By Proposition 2, $E [∣ S ∣] = p n$ and $E [e (S)] = p^{2} m$ , since each edge has both endpoints retained with probability $p^{2}$ . Delete one endpoint of each retained edge to obtain an independent set $S^{'}$ with $∣ S^{'} ∣ \geq ∣ S ∣ - e (S)$ , so $E [∣ S^{'} ∣] \geq p n - p^{2} m$ . Put $p = n / (2 m) \in (0, 1]$ (valid as $m \geq n /2$ ): then $p n - p^{2} m = \frac{n ^{2}}{2 m} - \frac{n ^{2}}{4 m} = \frac{n ^{2}}{4 m}$ . Proposition 1 supplies an outcome with $∣ S^{'} ∣ \geq n^{2} / (4 m)$ , an independent set of that size. $□$

Proposition 6 (Erdős girth-chromatic). For every $g, k$ there is a graph $H$ with $\text{girth} (H) > g$ and $χ (H) > k$ .

Proof. Fix $θ \in (0, 1/ g)$ , $p = n^{θ - 1}$ , and form $G \sim G (n, p)$ . Let $X$ be the number of cycles of length at most $g$ . The expected number of $i$ -cycles is $\binom{n}{i} \frac{( i - 1 )!}{2} p^{i} \leq \frac{1}{2 i} (n p)^{i} = \frac{1}{2 i} n^{θ i}$ , so $E [X] \leq \sum_{i = 3}^{g} \frac{1}{2 i} n^{θ i} = o (n)$ as $θ g < 1$ ; Markov gives $Pr (X \geq n /2) \to 0$ . Let $a = ⌈ 3 p^{- 1} \ln n ⌉$ . Then $Pr (α (G) \geq a) \leq \binom{n}{a} (1 - p)^{\binom{a}{2}} \leq n^{a} e^{- p a (a - 1) /2} = (n e^{- p (a - 1) /2})^{a} \to 0$ , since $p (a - 1) /2 \geq \frac{3}{2} \ln n - O (p) > \ln n$ for large $n$ . Choose $n$ large enough that both events fail: a sample $G$ has $X < n /2$ and $α (G) < a$ . Delete one vertex from each short cycle, producing $H$ on $> n /2$ vertices with girth $> g$ and $α (H) \leq α (G) < a$ . Then $χ (H) \geq ∣ V (H) ∣/ α (H) > (n /2) / a = Ω (n^{θ} / \log n)$ , which exceeds $k$ for $n$ large. $□$

Connections Master

The first-moment existence principle of this unit is the elementary case of the broader first-moment method developed in 40.07.01; that unit pushes the same averaging idea to threshold phenomena (when does $E [X] \to 0$ force $X = 0$ with high probability), and the indicator-decomposition $X = \sum X_{i}$ used here for Hamiltonian paths and monochromatic cliques is the shared engine. The foundational reason both succeed is that linearity ignores dependence, so the local computation is always elementary.
The deletion method's tolerance of a few defects is sharpened to zero defects under bounded dependence by the Lovász Local Lemma in 40.07.04: where deletion removes the rare bad events after the fact, the local lemma certifies that all bad events can be avoided simultaneously when each depends on few others. The Ramsey and hypergraph-colouring applications recur there with the deletion step replaced by the local-lemma symmetric criterion $e p (D + 1) \leq 1$ , giving a sharper constant.
The max-cut corollary $\text{cut} \geq ∣ E ∣/2$ is the existence half of the algorithmic story in 40.07.03: the method of conditional expectations derandomises the random bipartition into a deterministic greedy algorithm that finds a cut of size $\geq ∣ E ∣/2$ in linear time, and the proof that the greedy choice never decreases the conditional expectation is exactly Proposition 1 applied one vertex at a time.
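A minimal sketch of such a greedy procedure, under the assumption that vertices are processed in the order $0, 1, \dots, n - 1$ and each is placed opposite the majority of its already-placed neighbours. The function name `greedy_cut` and the example graph are illustrative, and the details in 40.07.03 may differ.

```python
def greedy_cut(n, edges):
    """Place vertices one at a time, each opposite most already-placed neighbours."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [None] * n
    for v in range(n):
        placed = [side[w] for w in adj[v] if side[w] is not None]
        zeros = placed.count(0)
        ones = len(placed) - zeros
        # Joining side 1 cuts the edges to side-0 neighbours, and vice versa,
        # so this choice cuts at least half of the edges to earlier vertices.
        side[v] = 1 if zeros >= ones else 0
    cut = sum(side[u] != side[v] for u, v in edges)
    return side, cut

# Hypothetical example: the complete graph on 5 vertices (10 edges).
edges = [(u, v) for u in range(5) for v in range(u + 1, 5)]
side, cut = greedy_cut(5, edges)
print(side, cut)
```

Each edge is decided when its later endpoint is placed, and at that moment at least half of the edges back to earlier vertices are cut, so the final cut has at least $∣ E ∣/2$ edges.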

Historical & philosophical context Master

The probabilistic method's first published application is Tibor Szele's 1943 paper ^{[Szele 1943]} on Hamiltonian paths in tournaments, which computed the expected path count and inferred existence of a tournament beating it — linearity of expectation deployed before the technique had a name. The method's decisive entry into mainstream combinatorics is Paul Erdős's 1947 note ^{[Erdős 1947]} establishing $R (k, k) > 2^{k /2}$ by a counting (first-moment) argument: if the expected number of monochromatic $K_{k}$ 's in a random colouring is below one, a colouring avoiding them exists. The three-page note reframed an extremal question as a probability calculation and is conventionally taken as the birth of the field.

The deletion refinement and its most striking consequence are Erdős's 1959 theorem ^{[Erdős 1959]} on graphs of high girth and high chromatic number, which resolved a question of Tutte and Descartes by a non-constructive argument: random graphs have few short cycles and small independence number simultaneously, and deleting one vertex per short cycle yields the desired graph. No explicit construction of comparable strength was known for decades; Lovász (1968) and later Kříž gave constructive versions far weaker quantitatively. Erdős and Spencer's 1974 monograph and Spencer's 1994 Ten Lectures ^{[Spencer 1994]} codified linearity, alteration, the second moment, and the local lemma as the four pillars of the method. Alon's 1990 permanent bound showed Szele's first-moment lower bound is optimal up to a polynomial factor, closing the loop on the field's founding example.

Bibliography Master

@article{szele1943,
  author  = {Szele, Tibor},
  title   = {Kombinatorikai vizsg\'{a}latok az ir\'{a}ny\'{\i}tott teljes gr\'{a}ffal kapcsolatban},
  journal = {Matematikai \'{e}s Fizikai Lapok},
  volume  = {50},
  year    = {1943},
  pages   = {223--256}
}

@article{erdos1947ramsey,
  author  = {Erd{\H{o}}s, Paul},
  title   = {Some remarks on the theory of graphs},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {53},
  year    = {1947},
  pages   = {292--294}
}

@article{erdos1959graphprob,
  author  = {Erd{\H{o}}s, Paul},
  title   = {Graph theory and probability},
  journal = {Canadian Journal of Mathematics},
  volume  = {11},
  year    = {1959},
  pages   = {34--38}
}

@article{alon1990permanent,
  author  = {Alon, Noga},
  title   = {The maximum number of Hamiltonian paths in tournaments},
  journal = {Combinatorica},
  volume  = {10},
  year    = {1990},
  pages   = {319--324}
}

@book{spencer1994ten,
  author    = {Spencer, Joel H.},
  title     = {Ten Lectures on the Probabilistic Method},
  series    = {CBMS-NSF Regional Conference Series in Applied Mathematics},
  volume    = {64},
  publisher = {SIAM},
  year      = {1994}
}

@book{alonspencer2016,
  author    = {Alon, Noga and Spencer, Joel H.},
  title     = {The Probabilistic Method},
  edition   = {4},
  publisher = {Wiley},
  year      = {2016}
}

@book{jukna2011extremal,
  author    = {Jukna, Stasys},
  title     = {Extremal Combinatorics: With Applications in Computer Science},
  edition   = {2},
  publisher = {Springer},
  year      = {2011}
}

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 2 (linearity of expectation, the splitting argument, Hamiltonian paths in tournaments) and Ch. 3 (alterations, the Ramsey lower bound, high girth and high chromatic number); averaging-and-existence framed for a first-time reader
intermediate: Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 2 §2.1-2.5 (linearity, max-cut, Szele's theorem) and Ch. 3 §3.1-3.3 (the deletion method, R(k,k) lower bound, Erdős's girth-chromatic theorem); Jukna 2011 *Extremal Combinatorics* (Springer) Ch. 18
master: Alon-Spencer 2016 *The Probabilistic Method* 4e Ch. 2-3 and §3.6 (dependent random choice preview); Erdős 1947 *Bull. AMS* 53 (the Ramsey lower bound by counting); Erdős 1959 *Canad. J. Math.* 11 (graphs of high girth and high chromatic number); Szele 1943 *Mat. Fiz. Lapok* 50 (the number of Hamiltonian paths in a tournament)

References

Alon, N. & Spencer, J. H. — The Probabilistic Method · 4th edition, Wiley (2016). Chapter 2 develops linearity of expectation: the existence of an object beating the average, the splitting/unbalancing-lights argument, max-cut bounds (every graph has a cut of size at least |E|/2), and Szele's theorem that some tournament on n vertices has at least n!/2^{n-1} Hamiltonian paths. Chapter 3 develops the deletion (alteration) method: the improved Ramsey lower bound R(k,k) > (1/e)(1+o(1)) k 2^{k/2}, independent sets of size at least sum 1/(d_v+1) >= n^2/(2|E|+n) (Turán-type via deletion), and Erdős's theorem that there exist graphs of arbitrarily high girth and arbitrarily high chromatic number, obtained by deleting one vertex from each short cycle of a random graph.
Erdős, P. — Some remarks on the theory of graphs · *Bulletin of the American Mathematical Society* 53 (1947), 292-294. The counting argument giving R(k,k) > 2^{k/2} for k >= 3: if binom(n,k) 2^{1-binom(k,2)} < 1 then a random 2-colouring of K_n has positive probability of avoiding a monochromatic K_k, so such a colouring exists. The seed of the probabilistic method.
Erdős, P. — Graph theory and probability · *Canadian Journal of Mathematics* 11 (1959), 34-38. The existence, for every k and g, of a graph with girth greater than g and chromatic number greater than k. Proof by the deletion method: take a random graph G(n,p) with p = n^{θ-1} for small θ, delete one vertex from each cycle of length at most g, and bound the independence number to force large chromatic number on what remains.
Szele, T. — Kombinatorikai vizsgálatok az irányított teljes gráffal kapcsolatban · *Matematikai és Fizikai Lapok* 50 (1943), 223-256. The first application of the probabilistic method, predating its naming: the expected number of Hamiltonian paths in a random tournament on n vertices is n!/2^{n-1}, so some tournament has at least this many directed Hamiltonian paths.
Spencer, J. H. — Ten Lectures on the Probabilistic Method · CBMS-NSF Regional Conference Series in Applied Mathematics 64, SIAM (1994). Lectures 1-2 on the first moment and alterations, with the Ramsey and Ramsey-Turán applications worked in detail.

Estimated time

beginner: 18m
intermediate: 45m
master: 85m