40.07.01 · combinatorics / probabilistic-method

The Probabilistic Method: First-Moment and Counting Arguments

shipped · 3 tiers · Lean: none

Anchor (Master): Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 1-3, 6 (first moment, deletion, second moment, Lovász Local Lemma); Erdős 1947 *Bull. AMS* 53 (the Ramsey lower bound); Erdős 1963/1964 *Nordisk Mat. Tidskr.* / *Acta Math. Acad. Sci. Hungar.* (property B); Bollobás 1986 *Combinatorics* (Cambridge) Ch. on random colourings

Intuition Beginner

Sometimes the easiest way to prove that a thing exists is to refuse to build it and instead show that a blind random guess would land on it with better-than-zero odds. If you reach into a bag and the chance of pulling out a winning ticket is greater than zero, then at least one winning ticket must be in the bag. That single sentence is the whole engine of the probabilistic method: a property that a random object has with positive probability is a property that some specific object actually has.

The surprise is how far this carries. Suppose you want to colour the connections in a big network so that no small group is connected all in one colour. Building such a colouring by hand looks hopeless. Instead, flip a coin for every connection. If the expected number of "bad" all-one-colour groups comes out to less than one, then some coin-flip outcome must have produced zero bad groups, because you cannot have an average below one if every outcome had at least one. So a good colouring exists, even though we never name it.

This counting-by-averaging idea, due to Paul Erdős, replaced clever explicit constructions with a short calculation. You compare the number of bad arrangements against the total, or you compute an average and find it small, and existence falls out for free.

Visual Beginner

Picture a complete network on six dots, every pair joined by a line, and colour each line red or blue by a coin flip. A "bad event" is a triangle whose three lines all share one colour. We count how many triangles there are and how likely each is to be monochromatic.

quantity	value for $n$ dots, triangle target
number of triangles	$\binom{n}{3} = n(n-1)(n-2)/6$
chance one fixed triangle is all-one-colour	$2 \times (1/2)^{3} = 1/4$
expected number of bad triangles	$\binom{n}{3} \times 1/4$

When the expected number of bad triangles drops below one, some colouring has none. The table makes the trade-off concrete: more dots means more triangles, so the method works only while the count stays small. For larger targets than a triangle the chance shrinks fast, which is exactly what lets the method beat the count.
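The last row of the table can be tabulated directly. The following is a minimal sketch, assuming only Python's standard library; the function name `expected_mono_cliques` and its optional clique size `k` are illustrative choices, not part of the unit.

```python
from fractions import Fraction
from math import comb

def expected_mono_cliques(n: int, k: int = 3) -> Fraction:
    """Expected number of one-colour k-cliques when every edge of the
    complete network on n dots is coloured red/blue by a fair coin."""
    edges_in_clique = comb(k, 2)
    # Two constant patterns (all red, all blue) out of 2**edges_in_clique.
    p_mono = Fraction(2, 2 ** edges_in_clique)
    return comb(n, k) * p_mono

# Triangle target (k = 3) for a few network sizes.
for n in range(3, 8):
    print(n, expected_mono_cliques(n))
```

Exact fractions are used so that the comparison with one never depends on rounding.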

Worked example Beginner

Show that the edges of the complete network on five dots can be coloured red or blue so that no triangle is all one colour.

Step 1. Count the triangles. A triangle is a choice of three dots out of five, and the number of such choices is $\binom{5}{3} = 10$. So there are ten triangles to worry about.

Step 2. Find the chance one fixed triangle goes bad. Its three edges are each red or blue by an independent fair coin. All three red happens with chance $(1/2) (1/2) (1/2) = 1/8$ , and all three blue likewise with chance $1/8$ . So the chance the triangle is monochromatic is $1/8 + 1/8 = 1/4$ .

Step 3. Average the count of bad triangles. The average number of monochromatic triangles, across all random colourings, is the number of triangles times the per-triangle chance: $10 \times 1/4 = 2.5$ .

Step 4. Read off what the average permits. An average of $2.5$ is bigger than one, so this particular bound does not force a good colouring of five dots (one does exist, but the average is too coarse to detect it). Drop to four dots: now there are $\binom{4}{3} = 4$ triangles and the average is $4 \times 1/4 = 1$, still not below one.

Step 5. Push the target up instead. Ask for no all-one-colour set of four mutually joined dots among six dots. There are $\binom{6}{4} = 15$ such sets, each monochromatic with chance $2 \times (1/2)^{6} = 1/32$ (a set of four dots spans six edges), giving an average of $15/32 < 1$.

What this tells us: an average below one guarantees an outcome with zero bad sets, so a colouring of six dots avoiding a one-colour group of four exists. The method trades construction for a single division.
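The five-dot case is small enough to check by exhaustion, which makes the meaning of "average" concrete. The sketch below assumes nothing beyond the standard library; `mono_triangle_counts` is a hypothetical helper that lists the number of one-colour triangles in each of the $2^{10}$ colourings.

```python
from itertools import combinations, product

def mono_triangle_counts(n: int = 5) -> list[int]:
    """Number of one-colour triangles in every red/blue colouring of K_n."""
    edges = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    counts = []
    for colours in product((0, 1), repeat=len(edges)):
        colour_of = dict(zip(edges, colours))
        # A triangle a < b < c is bad when its three edges share a colour.
        bad = sum(
            1
            for a, b, c in triangles
            if colour_of[(a, b)] == colour_of[(a, c)] == colour_of[(b, c)]
        )
        counts.append(bad)
    return counts

counts = mono_triangle_counts(5)
print(sum(counts) / len(counts))  # average over all colourings
print(min(counts))                # fewest bad triangles in any colouring
```

The average reproduces the $2.5$ of Step 3, while the minimum shows what the best single colouring achieves. An average above one leaves the existence question open; it does not answer it negatively.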

Check your understanding Beginner

Formal definition Intermediate+

Fix a finite probability space $(Ω, Pr)$ — a finite sample set $Ω$ with a probability assignment $Pr$ on its subsets. An event is a subset $A \subseteq Ω$ , and a random variable is a function $X : Ω \to R$ with expectation $E [X] = \sum_{ω \in Ω} X (ω) Pr (ω)$ . Expectation is linear: $E [X + Y] = E [X] + E [Y]$ for any random variables, with no independence required. These notions are developed at the basic-probability units 37.01.x; here they are applied, not reproved.

Basic existence principle (first form). If $A \subseteq \Omega$ is an event with $\Pr(A) > 0$, then $A \neq \emptyset$: there is a sample point $\omega$ with $\omega \in A$. Contrapositively, if every $\omega$ fails a property $P$, then $\Pr(P) = 0$.

Basic existence principle (expectation form). Let $X : \Omega \to \mathbb{R}$. There is a point $\omega_{0}$ with $X(\omega_{0}) \leq \mathbb{E}[X]$ and a point $\omega_{1}$ with $X(\omega_{1}) \geq \mathbb{E}[X]$. In particular, if $\mathbb{E}[X] < t$ then some point has $X(\omega) < t$, and if $X$ takes non-negative integer values with $\mathbb{E}[X] < 1$ then some point has $X(\omega) = 0$.

Union bound (the first-moment method). Let $A_{1}, \dots, A_{m}$ be events, the "bad" events to be avoided. Writing $X = \sum_{i = 1}^{m} 1_{A_{i}}$ for the number of bad events that occur, linearity gives $E [X] = \sum_{i = 1}^{m} Pr (A_{i})$ . By the expectation form, if $$ \sum_{i=1}^{m} \Pr(A_i) < 1, $$ then some sample point lies in none of the $A_{i}$ , i.e. $Pr (⋂_{i} \overline{A_{i}}) > 0$ . This is the first-moment method: control the expected number of bad events and conclude existence of a bad-event-free outcome. The bound never uses independence of the $A_{i}$ and never uses the joint distribution beyond the individual probabilities.
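A slightly more quantitative reading is available through Markov's inequality, which is standard but is an assumption here since this unit does not state it. As a sketch, for the count $X = \sum_{i} 1_{A_{i}}$ of bad events: $$ \Pr(X \ge 1) \le \mathbb{E}[X] = \sum_{i=1}^{m} \Pr(A_i), \qquad\text{so}\qquad \Pr\Big(\bigcap_{i} \overline{A_i}\Big) = \Pr(X = 0) \ge 1 - \sum_{i=1}^{m} \Pr(A_i). $$ When the sum is below one the right-hand side is positive, which recovers the criterion; when the sum is close to one the guaranteed probability of a good outcome is correspondingly small.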

The notation $\binom{n}{k}$, $\mathbb{E}[\cdot]$, $1_{A}$ (indicator), and $\overline{A}$ (complement) is registered in _meta/NOTATION.md.

Counterexamples to common slips Intermediate+

"You need the bad events to be independent." The union bound $Pr (⋃ A_{i}) \leq \sum Pr (A_{i})$ and its expectation form hold for arbitrary events; independence is irrelevant. It is precisely this that makes the method robust.
" $E [X] < 1$ gives $Pr (X = 0) > 0$ for any real-valued $X$ ." It gives a point with $X < 1$ . The conclusion $X = 0$ needs $X$ to be a non-negative integer-valued count, where "below one" forces "exactly zero".
"A small expectation means most outcomes are good." It need not: $E [X] < 1$ guarantees one good outcome, not a majority. A few outcomes with large $X$ can coexist with the existence claim; refining "one good outcome" to "many" is the job of the second-moment method (40.07.02).
"The method constructs the object." It certifies existence non-constructively. Turning the guarantee into an algorithm is a separate task — the deletion method, alterations, and the algorithmic Local Lemma address it.

Key theorem with proof Intermediate+

The signature application is Erdős's exponential lower bound on the diagonal Ramsey number, the first triumph of the method ^{[Erdős 1947]}. Recall that $R(k, k)$ is the least $n$ such that every red/blue colouring of the edges of the complete graph $K_{n}$ contains a monochromatic $K_{k}$. The companion unit 40.05.04 states Ramsey's theorem and proves the upper bound $R(k, k) \leq \binom{2k-2}{k-1} \leq 4^{k}$; the complementary lower bound is proved here.

Theorem (Ramsey lower bound; Erdős 1947). *If $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$, then there is a red/blue colouring of the edges of $K_{n}$ with no monochromatic $K_{k}$. Consequently, for $k \geq 3$,* $$ R(k, k) > 2^{k/2}. $$

Proof. Colour each of the $\binom{n}{2}$ edges of $K_{n}$ independently red or blue, each with probability $1/2$; this defines a finite probability space on the $2^{\binom{n}{2}}$ colourings. For a fixed set $S$ of $k$ vertices, let $A_{S}$ be the event that the $\binom{k}{2}$ edges inside $S$ are all the same colour. There are two monochromatic patterns (all red, all blue) out of $2^{\binom{k}{2}}$ equally likely patterns on those edges, so $$ \Pr(A_S) = \frac{2}{2^{\binom{k}{2}}} = 2^{\,1 - \binom{k}{2}}. $$ Let $X = \sum_{|S| = k} 1_{A_{S}}$ count the monochromatic $k$-subsets. By linearity of expectation, summing over the $\binom{n}{k}$ choices of $S$, $$ \mathbb{E}[X] = \sum_{|S| = k} \Pr(A_S) = \binom{n}{k}\, 2^{\,1 - \binom{k}{2}}. $$ By hypothesis $\mathbb{E}[X] < 1$. Since $X$ is a non-negative integer-valued random variable, the expectation form of the existence principle yields a colouring with $X = 0$, that is, a colouring with no monochromatic $K_{k}$. By definition of the Ramsey number, $R(k, k) > n$.

It remains to extract the bound $R(k, k) > 2^{k/2}$. Take $n = \lfloor 2^{k/2} \rfloor$. Using $\binom{n}{k} \leq n^{k}/k!$ and $\binom{k}{2} = k(k-1)/2$, $$ \binom{n}{k}\, 2^{1 - \binom{k}{2}} \;\le\; \frac{n^k}{k!}\, 2^{1 - k(k-1)/2} \;\le\; \frac{2^{k^2/2}}{k!} \cdot 2^{1 - k(k-1)/2} \;=\; \frac{2^{1 + k/2}}{k!}. $$ For $k \geq 3$ the right-hand side is below one: at $k = 3$ it is $2^{2.5}/6 \approx 0.94 < 1$, and each step from $k$ to $k + 1$ multiplies the numerator by $\sqrt{2}$ but the denominator by $k + 1 \geq 4$. So the hypothesis holds with $n = \lfloor 2^{k/2} \rfloor$, and since $R(k, k)$ is an integer, $R(k, k) \geq n + 1 > 2^{k/2}$. $□$
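The hypothesis of the theorem is easy to test numerically, which shows how much room the closing estimate leaves. A minimal sketch, assuming exact integer arithmetic from the standard library; `first_moment_n` is a hypothetical name for the largest $n$ the first moment certifies at a given $k \geq 3$.

```python
from math import comb, isqrt

def first_moment_n(k: int) -> int:
    """Largest n with C(n, k) * 2**(1 - C(k, 2)) < 1, for k >= 3."""
    threshold = 2 ** comb(k, 2)
    n = k
    # The hypothesis is equivalent to 2 * C(n, k) < 2**C(k, 2),
    # and C(n, k) increases with n, so a linear scan suffices.
    while 2 * comb(n + 1, k) < threshold:
        n += 1
    return n

for k in range(3, 11):
    n = first_moment_n(k)
    # isqrt(2**k) is the integer part of 2**(k/2).
    print(k, n, isqrt(2 ** k))
```

Each printed row certifies $R(k, k) > n$; the third column is $\lfloor 2^{k/2} \rfloor$, the value the proof settles for.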

Bridge. This proof is the template for the entire probabilistic-method chapter: define a random object, bound the expected number of defects, and invoke $\mathbb{E}[X] < 1$. The same three steps reappear in the tournament, dominating-set, and property-B arguments below. The reason the template works is the expectation form of the existence principle: a non-negative integer count with mean below one must vanish somewhere. The bound $R(k, k) > 2^{k/2}$ is the lower companion to the upper bound $R(k, k) \leq 4^{k}$ of 40.05.04, so together they pin the diagonal Ramsey number between $2^{k/2}$ and $4^{k}$. The lower side of that gap has improved only by a constant factor since 1947, which is what motivates the deletion, second-moment, and Lovász-Local-Lemma refinements of the units that follow. The first-moment estimate generalises from "no monochromatic $K_{k}$" to any family of forbidden substructures whose total expected count is below one.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Let $G$ be a graph on $n$ vertices with $m$ edges. Prove that $G$ contains a bipartite subgraph with at least $m /2$ edges, by colouring each vertex black or white independently with probability $1/2$ and counting the edges across the cut.

Hint

An edge is "cut" if its endpoints get different colours, which happens with probability $1/2$ . Compute the expected number of cut edges and apply the existence principle.

Answer

Colour each vertex black or white by an independent fair coin. For an edge $e = uv$ , let $1_{e}$ indicate that $u, v$ receive different colours; then $Pr (1_{e} = 1) = 1/2$ (two of the four colour pairs split the edge). Let $X = \sum_{e} 1_{e}$ be the number of edges crossing the black/white cut. By linearity, $E [X] = \sum_{e} 1/2 = m /2$ . By the expectation form of the existence principle, some colouring achieves $X \geq E [X] = m /2$ . The edges crossing that cut form a bipartite subgraph (each part is one colour class) with at least $m /2$ edges. Rubric: full credit for the per-edge probability $1/2$ , the linearity computation $E [X] = m /2$ , and the " $\geq$ mean" direction of the principle.
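The averaging step can be checked by exhaustion on a small graph. The sketch below is illustrative only: the graph (a 5-cycle with one chord) and the helper `cut_sizes` are hypothetical choices, not part of the exercise.

```python
from itertools import product

def cut_sizes(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Cut size of every black/white colouring of vertices 0..n-1."""
    sizes = []
    for colouring in product((0, 1), repeat=n):
        # An edge is cut when its endpoints receive different colours.
        sizes.append(sum(1 for u, v in edges if colouring[u] != colouring[v]))
    return sizes

# Hypothetical test graph: a 5-cycle plus the chord (0, 2), so m = 6.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
sizes = cut_sizes(5, edges)
m = len(edges)
print(sum(sizes) / len(sizes), m / 2)  # average cut versus m / 2
print(max(sizes))                      # best cut over all colourings
```

The two numbers on the first line agree because the uniform average over all colourings is exactly the expectation computed in the answer; the maximum can only be at least that average.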

Exercise 4 (medium, symbolic).

A tournament is a complete graph with each edge oriented. Say it has property $S_{k}$ if for every set of $k$ players there is a player who beats all of them. Show that if $\binom{n}{k} (1 - 2^{-k})^{n-k} < 1$, then some tournament on $n$ players has property $S_{k}$.

Hint

Orient each game by an independent fair coin. For a fixed $k$ -set $K$ , bound the probability that no outside player beats everyone in $K$ : each of the $n - k$ outsiders beats all of $K$ with probability $2^{- k}$ , independently.

Answer

Orient each of the $\binom{n}{2}$ games independently by a fair coin. Fix a $k$-set $K$. A fixed player $v \notin K$ beats all $k$ members of $K$ with probability $2^{-k}$, and these events are independent across the $n - k$ outsiders (they involve disjoint games). So the probability that no outsider beats all of $K$ is $(1 - 2^{-k})^{n-k}$. Let $A_{K}$ be this "bad" event and $X = \sum_{|K| = k} 1_{A_{K}}$. By linearity, $\mathbb{E}[X] = \binom{n}{k} (1 - 2^{-k})^{n-k}$. If this is $< 1$, the existence principle gives an orientation with $X = 0$: every $k$-set is dominated by some outsider, i.e. property $S_{k}$ holds. Rubric: full credit for the independence of outsiders, the bad-event probability $(1 - 2^{-k})^{n-k}$, and the union-bound conclusion. (Since $(1 - 2^{-k})^{n-k} \leq e^{-(n-k) 2^{-k}}$, the hypothesis holds once $n$ is large enough, roughly $k^{2} 2^{k} \ln 2$.)
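To see where the hypothesis first holds, one can evaluate the expectation exactly for small $k$. This is a sketch under the assumption that a plain linear search is acceptable; both function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_undominated_sets(n: int, k: int) -> Fraction:
    """E[X] = C(n, k) * (1 - 2**-k)**(n - k) for a random tournament."""
    return comb(n, k) * (1 - Fraction(1, 2 ** k)) ** (n - k)

def smallest_certified_n(k: int) -> int:
    """Smallest n > k at which the first-moment hypothesis E[X] < 1 holds."""
    n = k + 1
    while expected_undominated_sets(n, k) >= 1:
        n += 1
    return n

for k in (1, 2, 3):
    print(k, smallest_certified_n(k))
```

The search returns the first $n$ at which the criterion applies. It says nothing about smaller $n$, where a tournament with property $S_{k}$ may still exist without the first moment detecting it.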

Exercise 5 (medium, symbolic).

Let $H$ be an $n$ -uniform hypergraph (every edge has exactly $n$ vertices) with fewer than $2^{n - 1}$ edges. Prove $H$ is 2-colourable: its vertices can be coloured red/blue with no edge monochromatic. Conclude $m (n) \geq 2^{n - 1}$ , where $m (n)$ is the least number of edges in a non-2-colourable $n$ -uniform hypergraph.

Hint

Colour each vertex independently by a fair coin. A fixed edge of size $n$ is monochromatic with probability $2 \cdot 2^{- n} = 2^{1 - n}$ . Apply the first-moment method to the number of monochromatic edges.

Answer

Colour each vertex red or blue independently with probability $1/2$. A fixed edge $e$ (with $n$ vertices) is all-red with probability $2^{-n}$ and all-blue with probability $2^{-n}$, so $\Pr(e \text{ monochromatic}) = 2^{1-n}$. Let $X$ count monochromatic edges; by linearity $\mathbb{E}[X] = (\#\text{edges}) \cdot 2^{1-n}$. If $\#\text{edges} < 2^{n-1}$ then $\mathbb{E}[X] < 2^{n-1} \cdot 2^{1-n} = 1$, so some colouring has $X = 0$, a proper 2-colouring. Hence every $n$-uniform hypergraph with fewer than $2^{n-1}$ edges is 2-colourable, so a non-2-colourable one needs at least $2^{n-1}$ edges: $m(n) \geq 2^{n-1}$. Rubric: full credit for the monochromatic-edge probability $2^{1-n}$, the threshold computation, and the conclusion about $m(n)$.
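For $n = 3$ the threshold $2^{n-1} = 4$ can be probed by brute force. The sketch below uses a hypothetical helper `two_colouring` and two example hypergraphs chosen for illustration: an arbitrary one with three edges, and the Fano plane, a standard example of a 3-uniform hypergraph with seven edges that is not 2-colourable.

```python
from itertools import product

def two_colouring(vertices, edges):
    """Return a red/blue colouring with no monochromatic edge, or None.
    Brute force over all 2**len(vertices) colourings; small inputs only."""
    vertices = list(vertices)
    for bits in product((0, 1), repeat=len(vertices)):
        colour = dict(zip(vertices, bits))
        # An edge is properly coloured when it sees both colours.
        if all(len({colour[v] for v in e}) == 2 for e in edges):
            return colour
    return None

# Three 3-element edges: fewer than 2**(3 - 1) = 4, so a colouring must exist.
small = [(1, 2, 3), (1, 4, 5), (2, 4, 6)]
print(two_colouring(range(1, 7), small) is not None)

# The Fano plane: seven 3-element edges, every pair of points on one line.
fano = [(1, 2, 3), (1, 4, 5), (1, 6, 7), (2, 4, 6),
        (2, 5, 7), (3, 4, 7), (3, 5, 6)]
print(two_colouring(range(1, 8), fano) is None)
```

The first check is an instance of the exercise; the second shows that some edge-count hypothesis is genuinely needed.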

Exercise 6 (hard, symbolic).

Let $G$ be a graph on $n$ vertices in which every vertex has degree at least $\delta \geq 1$. Prove $G$ has a dominating set $D$ (every vertex is in $D$ or adjacent to a vertex of $D$) with $$ |D| \le \frac{n\big(1 + \ln(\delta + 1)\big)}{\delta + 1}. $$

Hint

Pick each vertex into a random set $A$ independently with probability $p$. Then add, for each vertex not dominated by $A$, that vertex itself. Bound $\mathbb{E}[|A|]$ and $\mathbb{E}[\#\text{undominated}]$ using $1 - p \leq e^{-p}$, then take $p = \ln(\delta + 1)/(\delta + 1)$.

Answer

Include each vertex in a random set $A$ independently with probability $p$. Let $Y$ be the set of vertices neither in $A$ nor adjacent to $A$; then $D = A \cup Y$ is a dominating set. A fixed vertex $v$ lies in $Y$ only if $v$ and all its $\geq \delta$ neighbours are excluded from $A$, which has probability at most $(1 - p)^{\delta + 1}$ (its closed neighbourhood has $\geq \delta + 1$ vertices). By linearity, $\mathbb{E}[|A|] = np$ and $\mathbb{E}[|Y|] \leq n(1 - p)^{\delta + 1} \leq n e^{-p(\delta + 1)}$. Hence $$ \mathbb{E}[|D|] \le \mathbb{E}[|A|] + \mathbb{E}[|Y|] \le np + n e^{-p(\delta + 1)}. $$ Setting $p = \frac{\ln(\delta + 1)}{\delta + 1}$ gives $np = \frac{n \ln(\delta + 1)}{\delta + 1}$ and $n e^{-p(\delta + 1)} = \frac{n}{\delta + 1}$, so $\mathbb{E}[|D|] \leq \frac{n(1 + \ln(\delta + 1))}{\delta + 1}$. By the existence principle some outcome achieves $|D| \leq \mathbb{E}[|D|]$, giving the bound. Rubric: full credit for the two-stage construction $D = A \cup Y$, the bound $\mathbb{E}[|Y|] \leq n(1 - p)^{\delta + 1}$, the inequality $1 - p \leq e^{-p}$, and the optimal $p$.
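The two-stage construction can be evaluated exactly on a small graph by summing over every possible first-stage set $A$. The sketch below is an illustration under stated assumptions: the test graph (a cycle on 9 vertices, so $\delta = 2$) and the helper `dominating_set_stats` are hypothetical.

```python
from itertools import product
from math import log

def dominating_set_stats(n, adj, p):
    """Exact E[|D|] and min |D| for D = A | Y, where A keeps each vertex
    independently with probability p and Y is what A leaves undominated."""
    expected, best = 0.0, n
    for bits in product((0, 1), repeat=n):
        A = {v for v in range(n) if bits[v]}
        # v is undominated when neither v nor any neighbour lies in A.
        Y = {v for v in range(n) if v not in A and not (adj[v] & A)}
        size = len(A) + len(Y)
        weight = p ** len(A) * (1 - p) ** (n - len(A))
        expected += weight * size
        best = min(best, size)
    return expected, best

# Hypothetical test graph: the cycle on 9 vertices, minimum degree 2.
n, delta = 9, 2
adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
p = log(delta + 1) / (delta + 1)
bound = n * (1 + log(delta + 1)) / (delta + 1)
expected, best = dominating_set_stats(n, adj, p)
print(expected, best, bound)
```

On a regular graph the expectation equals $np + n(1 - p)^{\delta + 1}$ exactly, so the only slack against the stated bound comes from the inequality $1 - p \leq e^{-p}$.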

Exercise 7 (hard, short-answer).

Explain in one paragraph why the first-moment Ramsey lower bound $R(k, k) > 2^{k/2}$ and the counting upper bound $R(k, k) \leq 4^{k}$ are "the same kind of argument run in opposite directions", and name what would be needed to close the gap between the exponent bases $\sqrt{2}$ and $4$.

Hint

Both compare a count of monochromatic $k$-sets against a threshold. The lower bound says "expected count below one, so a good colouring exists"; the upper bound is a deterministic pigeonhole/recursion. The gap between base $\sqrt{2}$ and base $4$ went unimproved for decades.

Answer

Both arguments turn on the count of monochromatic $k$-cliques. The lower bound is the first moment: a random colouring has expected count $\binom{n}{k} 2^{1 - \binom{k}{2}}$, and when this is below one a colouring with zero monochromatic cliques exists, forcing $R(k, k) > n \approx 2^{k/2}$. The upper bound (40.05.04) is the dual deterministic statement: in any 2-colouring of $K_{n}$ with $n \geq \binom{2k-2}{k-1}$, a neighbourhood-majority recursion forces a monochromatic $K_{k}$ to appear, giving $R(k, k) \leq 4^{k}$. So one direction says "few enough expected cliques to avoid them all", the other "enough vertices to force one"; together they trap $R(k, k)^{1/k}$ between the bases $\sqrt{2} = 2^{1/2}$ and $4$. Closing the gap would require either a denser explicit or random-like construction beating the base of $2^{k/2}$ (the known refinements of the random colouring gain only lower-order factors), or a fundamentally better counting upper bound; the recent breakthrough of Campos-Griffiths-Morris-Sahasrabudhe (2023) lowered the base of the upper bound below $4$ for the first time, but an exponential gap between the two bases remains. Rubric: full credit for identifying the shared clique-count, the existence-vs-forcing duality, and naming a denser construction or improved upper-bound counting as the missing ingredient.

Advanced results Master

The first moment is the base of a tower of refinements. Each sharpens the conclusion "one good outcome exists" by adding control over the distribution of the defect count, not merely its mean.

Theorem 1 (alteration / deletion bound for Ramsey). For every $n$ and $k$, $R(k, k) > n - \binom{n}{k} 2^{1 - \binom{k}{2}}$. Two-colour $K_{n}$ at random; with $\mathbb{E}[X] = \binom{n}{k} 2^{1 - \binom{k}{2}}$ monochromatic $k$-sets in expectation, fix a colouring with at most this many, then delete one vertex from each monochromatic $k$-set. The surviving complete graph on at least $n - \mathbb{E}[X]$ vertices has no monochromatic $K_{k}$. Optimising $n$ improves the constant: it yields $R(k, k) > (1 + o(1)) \frac{k}{e} 2^{k/2}$ rather than the bare $2^{k/2}$. The deletion method is developed in 40.07.02.
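The optimisation over $n$ in Theorem 1 can be carried out numerically for small $k$. A minimal sketch, assuming exact rational arithmetic; the names `deletion_value` and `best_deletion` and the search cap `n_max` are hypothetical.

```python
from fractions import Fraction
from math import comb

def deletion_value(n: int, k: int) -> Fraction:
    """n - C(n, k) * 2**(1 - C(k, 2)): the lower bound on R(k, k)
    obtained from a random colouring of K_n followed by deletion."""
    return n - Fraction(2 * comb(n, k), 2 ** comb(k, 2))

def best_deletion(k: int, n_max: int) -> tuple[int, Fraction]:
    """Maximise the deletion bound over k <= n <= n_max."""
    candidates = ((n, deletion_value(n, k)) for n in range(k, n_max + 1))
    return max(candidates, key=lambda pair: pair[1])

print(best_deletion(4, 20))
```

At $k = 4$ the optimum is $n = 7$ with value $189/32 \approx 5.9$, which is no better than the six dots certified by the worked example's $15/32 < 1$. The advantage of deletion over the plain first moment is asymptotic in $k$, not visible at small cases.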

Theorem 2 (the gap and its slow closure). The diagonal Ramsey number satisfies $\sqrt{2} \leq \liminf_{k} R(k, k)^{1/k} \leq \limsup_{k} R(k, k)^{1/k} \leq 4$, where the lower constant is Erdős's first-moment bound (improved only in lower-order factors by deletion and by the Lovász Local Lemma, Spencer 1975) and the upper constant is the Erdős-Szekeres count. The base of the upper bound stood at $4$ from 1935 until Campos, Griffiths, Morris, and Sahasrabudhe (2023) proved $R(k, k) \leq (4 - \varepsilon)^{k}$ for an absolute $\varepsilon > 0$. No exponential-base improvement to the lower bound is known: the first-moment value $\sqrt{2}$ has resisted all attempts, and producing an explicit colouring matching even $2^{k/2}$ is a notorious open problem (the best explicit constructions, starting with Frankl-Wilson, are far weaker).

Theorem 3 (Lovász Local Lemma sharpening). When the bad events $A_{i}$ each have probability $\leq p$ and each is mutually independent of all but at most $d$ of the others, the symmetric Local Lemma gives $\Pr(\bigcap \overline{A_{i}}) > 0$ provided $e p (d + 1) \leq 1$, a local condition replacing the global $\sum \Pr(A_{i}) < 1$. For the Ramsey colouring this gives $R(k, k) > (1 + o(1)) \frac{\sqrt{2}}{e} k\, 2^{k/2}$, a constant factor beyond the deletion bound. For property B it shows that an $n$-uniform hypergraph in which every edge meets at most $d$ others is 2-colourable whenever $e (d + 1) \leq 2^{n-1}$, with no restriction on the total number of edges. The Local Lemma is the chapter's apex existence tool (40.07.03).

Theorem 4 (sharp threshold for property B). The first moment gives $m(n) \geq 2^{n-1}$ and a random-plus-counting argument gives $m(n) = O(n^{2} 2^{n})$ (Erdős 1964); Radhakrishnan-Srinivasan (2000) sharpened the lower bound to $m(n) = \Omega(2^{n} \sqrt{n/\ln n})$ via a clever random recolouring, and the exact growth rate of $m(n)/2^{n}$ remains open between $\sqrt{n/\ln n}$ and $n^{2}$. The two-sided estimate exhibits the method's typical shape: a clean exponential main term pinned quickly, with the polynomial correction the residue of genuine difficulty.

Synthesis. The first-moment method is the foundation on which the rest of the probabilistic method builds: every refinement (deletion, the second moment, the Local Lemma) extracts more from the same random object than its mean, and each reduces in its degenerate case to the plain inequality $\mathbb{E}[X] < 1 \Rightarrow \exists\, \omega : X(\omega) = 0$ for a non-negative integer count $X$. The Ramsey gap $\sqrt{2} \leq R(k, k)^{1/k} \leq 4$ (for $k \geq 3$) is the signature of this: the lower constant is the first moment's verdict and the upper is a forcing count. The two are dual, existence by averaging below a threshold against forcing by pigeonhole above one, so closing the gap means either beating the average from below or sharpening the forcing from above. The single Ramsey calculation thus generalises into a method: define a random structure, identify the bad events, sum their probabilities, and either conclude directly (first moment), delete the few defects (alteration), or localise the dependencies (Local Lemma). All three share one engine, and the deletion and second-moment continuations of 40.07.02 are the next two turns of that engine.

Full proof set Master

Proposition 1 (expectation form of the existence principle). Let $(\Omega, \Pr)$ be a finite probability space and $X : \Omega \to \mathbb{R}$. Then there exist $\omega_{0}, \omega_{1} \in \Omega$ with $X(\omega_{0}) \leq \mathbb{E}[X] \leq X(\omega_{1})$. If $X$ takes non-negative integer values and $\mathbb{E}[X] < 1$ then some $\omega$ has $X(\omega) = 0$.

Proof. Write $\mu = \mathbb{E}[X] = \sum_{\omega} X(\omega) \Pr(\omega)$, a weighted average with weights $\Pr(\omega) \geq 0$ summing to $1$. If $X(\omega) > \mu$ for every $\omega$ in the support (the points with $\Pr(\omega) > 0$), then $\sum_{\omega} X(\omega) \Pr(\omega) > \mu \sum_{\omega} \Pr(\omega) = \mu$, contradicting $\mathbb{E}[X] = \mu$; so some $\omega_{0}$ in the support has $X(\omega_{0}) \leq \mu$. The symmetric argument applied to $-X$ gives $\omega_{1}$ with $X(\omega_{1}) \geq \mu$. For the last clause, if $X$ takes non-negative integer values and $\mu < 1$, take $\omega_{0}$ with $X(\omega_{0}) \leq \mu < 1$; a non-negative integer below $1$ equals $0$. $□$

Proposition 2 (union-bound existence criterion). Let $A_{1}, \dots, A_{m}$ be events. If $\sum_{i = 1}^{m} Pr (A_{i}) < 1$ then $Pr (⋂_{i = 1}^{m} \overline{A_{i}}) > 0$ ; in particular the intersection of complements is nonempty.

Proof. Let $X = \sum_{i=1}^{m} 1_{A_{i}}$, the number of events that occur; $X$ is a non-negative integer-valued random variable. By linearity of expectation, $\mathbb{E}[X] = \sum_{i=1}^{m} \mathbb{E}[1_{A_{i}}] = \sum_{i=1}^{m} \Pr(A_{i}) < 1$. If $\Pr(X = 0)$ were zero, then $X \geq 1$ at every point of positive probability and so $\mathbb{E}[X] \geq 1$, a contradiction. Hence $\Pr(X = 0) > 0$. Since $X(\omega) = 0$ means $\omega \notin A_{i}$ for all $i$, that is $\omega \in \bigcap_{i} \overline{A_{i}}$, we get $\Pr(\bigcap_{i} \overline{A_{i}}) = \Pr(X = 0) > 0$; in particular the intersection is nonempty. $□$

Proposition 3 (Ramsey lower bound, full statement). If $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$ then $R(k, k) > n$; hence $R(k, k) > 2^{k/2}$ for all $k \geq 3$.

Proof. The colouring space is the uniform measure on $\{0, 1\}^{\binom{n}{2}}$. For each $k$-set $S$, the event $A_{S}$ "the $\binom{k}{2}$ edges within $S$ are monochromatic" has probability $2 \cdot 2^{-\binom{k}{2}} = 2^{1 - \binom{k}{2}}$, since exactly two of the $2^{\binom{k}{2}}$ patterns on those edges are constant. By Proposition 2 with the $\binom{n}{k}$ events $\{A_{S}\}$, the hypothesis $\sum_{S} \Pr(A_{S}) = \binom{n}{k} 2^{1 - \binom{k}{2}} < 1$ yields a colouring in $\bigcap_{S} \overline{A_{S}}$: no $k$-set is monochromatic, so this colouring of $K_{n}$ has no monochromatic $K_{k}$ and $R(k, k) > n$. Taking $n = \lfloor 2^{k/2} \rfloor$ and bounding $\binom{n}{k} \leq n^{k}/k! \leq 2^{k^{2}/2}/k!$ gives $\binom{n}{k} 2^{1 - \binom{k}{2}} \leq 2^{1 + k/2}/k! < 1$ for $k \geq 3$, so $R(k, k) > \lfloor 2^{k/2} \rfloor$. Since $R(k, k)$ is an integer, $R(k, k) \geq \lfloor 2^{k/2} \rfloor + 1 > 2^{k/2}$. $□$

Proposition 4 (property B lower bound). Every $n$ -uniform hypergraph with fewer than $2^{n - 1}$ edges is 2-colourable; equivalently $m (n) \geq 2^{n - 1}$ .

Proof. Colour vertices red/blue independently with probability $1/2$ each. For an edge $e$ of size $n$ , the event $B_{e}$ " $e$ is monochromatic" has $Pr (B_{e}) = 2 \cdot 2^{- n} = 2^{1 - n}$ . If the hypergraph has $M < 2^{n - 1}$ edges, then $\sum_{e} Pr (B_{e}) = M \cdot 2^{1 - n} < 2^{n - 1} \cdot 2^{1 - n} = 1$ . By Proposition 2 there is a colouring avoiding every $B_{e}$ , i.e. a proper 2-colouring. Contrapositively, a non-2-colourable $n$ -uniform hypergraph has at least $2^{n - 1}$ edges, so $m (n) \geq 2^{n - 1}$ . $□$

Connections Master

The deletion (alteration) and second-moment continuations live in 40.07.02: where the first moment proves a good outcome exists, deletion repairs an almost-good outcome by removing its few defects, and the second moment upgrades " $E [X] < 1$ forces $X = 0$ somewhere" to " $X$ is concentrated near its mean", which is what lower-bounds rather than upper-bounds counts. Both are read here as further turns of the single engine $E [X] ≶ t \Rightarrow \exists ω$ .
The Ramsey lower bound here is the lower companion to Ramsey's theorem and its upper bound $R(k, k) \leq 4^{k}$ in 40.05.04: that unit owns the statement of Ramsey's theorem and the forcing (pigeonhole-recursion) upper bound and cross-references this unit for the probabilistic lower bound, so together they trap $R(k, k)^{1/k}$ between $\sqrt{2}$ and $4$.
The whole method applies the finite-probability and expectation apparatus of the basic-probability units 37.01.x — finite sample spaces, indicators, and the linearity $E [\sum 1_{A_{i}}] = \sum Pr (A_{i})$ that needs no independence — without reproving any measure theory; this unit is the combinatorial pay-off of that linearity, which is the one property of expectation the first-moment method actually consumes.

Historical & philosophical context Master

The method originates with Paul Erdős's 1947 note in the Bulletin of the American Mathematical Society ^{[Erdős 1947]}, a three-page paper proving $R(k, k) > 2^{k/2}$ by the counting argument above. Erdős did not phrase it in the language of probability: he compared the number of colourings containing a monochromatic $K_{k}$ against the total $2^{\binom{n}{2}}$ and observed that the former is smaller. The calculation is nonetheless exactly the first-moment bound, and it is conventionally regarded as the founding instance of the probabilistic method. The reframing in explicitly probabilistic terms, with expectation and the union bound as named tools, was consolidated over the following decades and codified in the Alon-Spencer monograph ^{[Alon-Spencer 2016]}.

The property-B threshold traces to Erdős's 1963-64 papers ^{[Erdős 1964]}, which introduced $m (n)$ and established $2^{n - 1} \leq m (n) = O (n^{2} 2^{n})$ by pairing the first-moment lower bound with a random construction for the upper. Bernstein had used a counting argument of similar spirit for a colouring problem as early as 1908, but Erdős made the technique systematic and demonstrated its reach across Ramsey theory, hypergraph colouring, and number theory. Spencer's 1975 use of the Lovász Local Lemma and Radhakrishnan-Srinivasan's 2000 recolouring refined the constants without displacing the first moment as the entry point.

Bibliography Master

@article{erdos1947ramsey,
  author  = {Erd\H{o}s, Paul},
  title   = {Some remarks on the theory of graphs},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {53},
  pages   = {292--294},
  year    = {1947}
}

@article{erdos1964propertyB,
  author  = {Erd\H{o}s, Paul},
  title   = {On a combinatorial problem. II},
  journal = {Acta Mathematica Academiae Scientiarum Hungaricae},
  volume  = {15},
  pages   = {445--447},
  year    = {1964}
}

@book{alonspencer2016,
  author    = {Alon, Noga and Spencer, Joel H.},
  title     = {The Probabilistic Method},
  edition   = {4},
  publisher = {Wiley},
  year      = {2016}
}

@article{spencer1975ramsey,
  author  = {Spencer, Joel},
  title   = {Ramsey's theorem -- a new lower bound},
  journal = {Journal of Combinatorial Theory, Series A},
  volume  = {18},
  pages   = {108--115},
  year    = {1975}
}

@article{radhakrishnan2000property,
  author  = {Radhakrishnan, Jaikumar and Srinivasan, Aravind},
  title   = {Improved bounds and algorithms for hypergraph 2-coloring},
  journal = {Random Structures \& Algorithms},
  volume  = {16},
  number  = {1},
  pages   = {4--32},
  year    = {2000}
}

@article{campos2023ramsey,
  author  = {Campos, Marcelo and Griffiths, Simon and Morris, Robert and Sahasrabudhe, Julian},
  title   = {An exponential improvement for diagonal Ramsey},
  journal = {arXiv preprint arXiv:2303.09521},
  year    = {2023}
}

Prerequisites

none — this is a leaf unit

Tier anchors

beginner: Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 1 (the basic existence principle, the Ramsey lower bound by counting colourings); a coin-flip / lottery analogy for 'positive probability forces existence'
intermediate: Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 1-2 (the first moment method, union bound, Ramsey $R(k,k) > 2^{k/2}$, tournaments with property $S_k$, dominating sets, hypergraph 2-colourability); Jukna 2011 *Extremal Combinatorics* 2e (Springer) Ch. 17-18
master: Alon-Spencer 2016 *The Probabilistic Method* 4e (Wiley) Ch. 1-3, 6 (first moment, deletion, second moment, Lovász Local Lemma); Erdős 1947 *Bull. AMS* 53 (the Ramsey lower bound); Erdős 1963/1964 *Nordisk Mat. Tidskr.* / *Acta Math. Acad. Sci. Hungar.* (property B); Bollobás 1986 *Combinatorics* (Cambridge) Ch. on random colourings

References

Alon, N. & Spencer, J. H. — The Probabilistic Method · Wiley, 4th edition (2016). Chapter 1 develops the basic existence principle (a random object with positive probability of a property certifies an object with that property), the first-moment / union-bound method, the Ramsey lower bound $R(k,k) > 2^{k/2}$ via the count $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$, tournaments with the Schütte property $S_k$, and dominating sets with the bound $\mathbb{E}|D| \le n(1 + \ln(\delta + 1))/(\delta + 1)$. Chapter 1 also treats hypergraph 2-colourability (property B) and the bound $m(n) \ge 2^{n-1}$.
Erdős, P. — Some remarks on the theory of graphs · *Bulletin of the American Mathematical Society* 53 (1947), 292-294. The original counting proof that $R(k,k) > 2^{k/2}$: if $\binom{n}{k} 2^{1 - \binom{k}{2}} < 1$ then there is a red/blue colouring of the edges of $K_n$ with no monochromatic $K_k$, giving the exponential lower bound on the diagonal Ramsey number.
Erdős, P. — On a combinatorial problem · *Nordisk Matematisk Tidskrift* 11 (1963), 5-10, and On a combinatorial problem. II · *Acta Mathematica Academiae Scientiarum Hungaricae* 15 (1964), 445-447. Property B and the function $m(n)$: a counting argument shows every $n$-uniform hypergraph with fewer than $2^{n-1}$ edges is 2-colourable, so $m(n) \ge 2^{n-1}$, with a probabilistic-plus-counting upper bound $m(n) = O(n^2 2^n)$.
Jukna, S. — Extremal Combinatorics · Springer, 2nd edition (2011). Chapters 17-18 present the probabilistic method: the basic existence principle, linearity of expectation, the union bound, the Ramsey lower bound, tournaments, dominating sets, and property B, with the deletion method as a refinement.

Estimated time

beginner: 16m
intermediate: 45m
master: 80m