The Borel-Cantelli Lemmas and the Kolmogorov 0-1 Law
Anchor (Master): Kallenberg 2021 *Foundations of Modern Probability* 3e (Springer) Ch. 4; Billingsley 1995 *Probability and Measure* 3e (Wiley) §4; Durrett 2019 §2.3-2.5
Intuition Beginner
Imagine an endless sequence of trials, one after another forever, and for each trial a yes-or-no event that might happen on that trial. A natural question is whether the event keeps recurring no matter how far out you go, or whether it eventually stops happening once and for all. "Recurs forever" is the idea of an event happening infinitely often: past any point in the sequence, you can still find a later trial where it happens.
The Borel-Cantelli lemmas give a clean test for this, based on adding up the chances. Take the probability of the event on trial one, plus the probability on trial two, plus the probability on trial three, and so on forever. If this running total stays finite, the first lemma says the event happens only finitely many times, with certainty: past some point it never happens again. The picture to keep is rainfall. If the chance of rain on day shrinks fast enough that the total expected number of rainy days is finite, then with certainty only finitely many rainy days ever occur.
The second lemma is a partial converse, and it needs one extra ingredient: the events must not influence each other, the property called independence. If the running total of probabilities adds up to infinity and the events are independent, the second lemma says the event happens infinitely often, again with certainty. So for independent events the test is sharp: a finite total means finitely many occurrences, an infinite total means infinitely many.
There is a striking companion fact. For an independent sequence, any event whose truth does not depend on the first ten trials, or the first thousand, or the first million, no matter how large the starting block you ignore, must have probability either zero or one. Such an event is called a tail event, and the statement that every tail event is certain or impossible is the Kolmogorov zero-one law. There is no middle ground, no genuine fifty-fifty for these long-run questions: the answer is already decided, you just may not know which way.
Visual Beginner
The diagram below contrasts the two regimes side by side: a finite sum of probabilities forces the event to die out, while an infinite sum of independent probabilities forces it to recur without end.
The top row is the first Borel-Cantelli lemma: a finite total of probabilities pins the occurrences down to a finite set. The bottom row is the second lemma: independence plus an infinite total forces endless recurrence. The side box is the Kolmogorov zero-one law, which says any question about only the far end of the sequence has a yes-or-no answer locked in advance.
Worked example Beginner
We test recurrence in two sequences using the sum-of-probabilities idea.
Step 1. First sequence. Suppose event has probability on trial . The running total of probabilities is , the sum of reciprocal squares. This famous sum settles down to a finite number near . Because the total is finite, the first Borel-Cantelli lemma applies, and we conclude that with certainty only finitely many of the events occur. Past some trial, the event stops happening for good.
Step 2. Second sequence. Suppose instead event has probability on trial , and suppose the events are independent. The running total is , the harmonic sum, which grows without bound: it eventually passes any number you name. Because the total is infinite and the events are independent, the second Borel-Cantelli lemma applies, and we conclude that with certainty infinitely many of the events occur. No matter how far out you look, more occurrences are still to come.
Step 3. Compare. The two sequences differ only in how fast the probabilities shrink. Probability shrinks fast enough for a finite total, so the events die out. Probability shrinks too slowly, the total is infinite, and with independence the events recur forever. The boundary between dying out and recurring forever is exactly the boundary between a finite and an infinite total of probabilities.
What this tells us: to decide whether an event recurs forever, add up its probabilities across all trials. A finite total means it stops; an infinite total, for independent events, means it never stops.
Check your understanding Beginner
Formal definition Intermediate+
Let be a probability space: a measurable space together with a measure with 02.07.01, 26.02.01. The symbol denotes the -algebra of events; denotes the complement ; and denote countable union and intersection of events.
Definition (limit superior and limit inferior of events). For a sequence of events , the limit superior and limit inferior are The event is the set of outcomes belonging to for infinitely many , written (read " infinitely often"). The event is the set of outcomes belonging to for all but finitely many , written (read " eventually"). Each is measurable, being built from countable unions and intersections of events. The complement relation holds by De Morgan's laws.
Definition (independence of events). A finite family is independent if for every subset , . An infinite family is independent if every finite subfamily is.
Definition (independence of -algebras). A sequence of sub--algebras of is independent if for every finite choice of indices and events , .
Definition (tail -algebra). Given a sequence of -algebras , the tail -algebra is where is the -algebra generated by all events from index onward. An event is a tail event: its membership is unaffected by altering any finite block of the underlying coordinates. A probability measure is -degenerate on when for every .
Counterexamples to common slips Intermediate+
The second lemma genuinely needs independence. Take a single event with and set for every . The total , but , so , which is neither nor . Constant repetition is maximally dependent, and the conclusion fails. Independence is what upgrades a divergent sum into certain recurrence.
The first lemma needs no independence. The finite-sum direction holds for arbitrary events, dependent or not; its proof uses only countable subadditivity. Adding an independence hypothesis to the first lemma is harmless but unnecessary.
Pairwise independence is not enough for the cleanest tail statements but suffices for the second lemma. The second Borel-Cantelli lemma holds under pairwise independence of the events (Erdős-Rényi), though the textbook proof assumes full independence; the Kolmogorov zero-one law, by contrast, genuinely uses independence of the generating -algebras.
A tail event is not merely an event of small probability. Tail membership is a structural condition (insensitivity to finite blocks), not a numerical one. The event "the first toss is heads" is not a tail event even though one can compute its probability; the event "the running averages converge" is a tail event.
Almost-sure constancy is not pointwise constancy. The zero-one law makes a tail event certain or impossible and makes a tail-measurable random variable almost surely equal to a constant, but the variable can still differ from that constant on a null set. The conclusion lives modulo -null sets.
Key theorem with proof Intermediate+
Theorem (Borel-Cantelli lemmas; Borel 1909, Cantelli 1917). Let be events in .
(i) (First lemma.) If , then .
(ii) (Second lemma.) If the are independent and , then .
Proof. For (i), recall for every . By countable subadditivity of , The right side is the tail of a convergent series, so it tends to as . The left side does not depend on , so .
For (ii), it suffices to show for every , since then is a countable intersection of probability-one events, hence has probability one. Equivalently we show . Fix and any . By independence of the , the complements are independent, so Apply the inequality , valid for all real , to each factor: Because , the partial sums as , so the bound tends to . Hence , using continuity from above of . Therefore for every , and .
Bridge. The two lemmas together build toward a sharp dichotomy for independent sequences: the probability is exactly or according as converges or diverges, and this zero-one alternative appears again in the Kolmogorov 0-1 law as the general principle that is a tail event. The foundational reason the recurrence probability cannot land strictly between and is that does not change when any finite block of the is deleted, so it carries no information localized to early trials; for independent events this insensitivity forces the zero-one collapse. The bridge is from the convergence-of-series test, a deterministic statement about the numbers , to a probabilistic certainty about the path of occurrences, and putting these together is exactly what makes the first Borel-Cantelli lemma the workhorse behind almost-sure convergence proofs (a summable sequence of tail probabilities forces almost-sure convergence) and what makes the second lemma, generalised by the Kolmogorov law, the source of the strong law of large numbers and the record-value and radius-of-convergence dichotomies developed below.
Exercises Intermediate+
Advanced results Master
Theorem 1 (Kolmogorov zero-one law; Kolmogorov 1933). Let be independent sub--algebras of , and let be the tail -algebra. Then is -degenerate: for every . Consequently every -measurable random variable is almost surely constant [Kolmogorov 1933].
Proof. Write for the head -algebras and for the strict tails. By the definition of independence of the , the head is independent of for each (any event in depends only on indices , any event in only on indices , and independence of the blocks is inherited from independence of the generators through a -system argument). Since for every , the head is independent of .
The union is a -system (it is increasing, hence closed under finite intersection) generating . Each is independent of , so the generated -algebra is independent of by the - (Dynkin) theorem. But , so is independent of itself. For any , self-independence gives , whence . A -measurable random variable then has for every ; the cumulative distribution jumps from to at the single value , so almost surely.
Theorem 2 (sharp recurrence dichotomy). For independent events , This is the combination of the two Borel-Cantelli lemmas, and it is consistent with Theorem 1: is a tail event of the independent -algebras , so its probability must be or regardless, and the sum criterion identifies which.
Theorem 3 (Hewitt-Savage zero-one law; Hewitt-Savage 1955). If are independent and identically distributed, then every exchangeable event (an event invariant under finite permutations of the coordinates) has probability or . The exchangeable -algebra contains the tail -algebra, so Hewitt-Savage strictly strengthens Kolmogorov for the i.i.d. case; events such as are exchangeable but not tail, and Hewitt-Savage settles them at or where Kolmogorov is silent [Kolmogorov 1933].
Theorem 4 (record values; Rényi 1962). Let be i.i.d. with a continuous distribution, and let be the event that the -th observation is a record. Then are independent with , and since , the second Borel-Cantelli lemma gives : records occur infinitely often almost surely. Moreover the number of records among the first observations has mean , so records thin out logarithmically yet never cease.
Theorem 5 (Borel's normal number theorem; Borel 1909). Almost every real number in is normal in base : the asymptotic frequency of the digit in its binary expansion equals . The proof realises the binary digits as i.i.d. fair coin tosses under Lebesgue measure, applies the strong law of large numbers (itself proved by a Borel-Cantelli truncation argument), and the exceptional non-normal numbers form a tail-type Lebesgue-null set. This was the original application that motivated Borel's 1909 paper and the lemma now bearing his name [Borel 1909].
Theorem 6 (Kochen-Stone refinement; Kochen-Stone 1964). Without any independence, if then When the events are independent the denominator's off-diagonal terms factor and the bound returns , recovering the second lemma; for weakly dependent events it gives a positive lower bound on the recurrence probability where the bare second lemma does not apply.
Synthesis. The Borel-Cantelli-Kolmogorov circle is the foundational reason that long-run questions about independent sequences admit only certain or impossible answers, and the central insight is that the event , the radius of convergence of a random series, and the convergence of a random series are all tail-measurable, so each is forced to collapse to a constant. This is exactly the mechanism that generalises in three directions at once. First, the recurrence dichotomy of Theorem 2 is dual to the convergence-of-series test: a deterministic sum criterion on the numbers is converted into a probabilistic certainty, and putting these together yields the strong law of large numbers through Borel's normal-number argument and its truncation refinements. Second, the tail-degeneracy of Theorem 1 generalises to the exchangeable zero-one law of Hewitt-Savage and, in the dependent regime, relaxes to the Kochen-Stone lower bound, the bridge being the second-moment method that already appeared in the pairwise-independent strengthening of the second lemma. Third, the first lemma's summable-tail-probability criterion is the load-bearing step behind almost-sure convergence proofs throughout probability: martingale convergence and the law of the iterated logarithm route their hardest steps through a Borel-Cantelli estimate. The central insight, restated, is that insensitivity to finite data plus independence equals certainty, recurring downstream in ergodic theory (the tail -algebra is the germ of the invariant -algebra) and in random-graph theory (zero-one laws for monotone connectivity events).
Full proof set Master
Proposition 1 (independence of complements and the i.o. event is a tail event). If are independent events, then lies in the tail -algebra of , and its complement likewise.
Proof. For each , the event is measurable with respect to . Hence for every fixed (the intersection over equals the intersection over because the sets decrease in , so the early terms are redundant). The right side is -measurable for every , so . The complement is then also tail-measurable since is a -algebra.
Proposition 2 (independence of head and strict tail). With and for independent events , the -algebras and are independent.
Proof. The collection of events of the form with together with the whole space forms a -system generating ; similarly finite intersections with generate and form a -system. By the definition of independence of the family , any event from the first -system and any from the second satisfy , because the indices are disjoint and all factor. Two -algebras whose generating -systems are independent are themselves independent, by the Dynkin - theorem applied twice (fix and let the -system be , then fix and repeat).
Proposition 3 (self-independence forces the zero-one collapse). If a -algebra is independent of itself under , then for every .
Proof. Self-independence means for all . Taking gives , so , i.e. , forcing or .
Proposition 4 (a.s. constancy of a tail variable). If is measurable with respect to a -degenerate -algebra , then there is a constant with almost surely.
Proof. For each , the event lies in , so . The function is non-decreasing, right-continuous, takes only the values and , and (if is real-valued) tends to as and to as . Such a function is the indicator-type step for . Then and , so .
Proposition 5 (first lemma is sharp without the second's hypothesis being necessary for ). Even without independence, does not determine , but always yields . There is no missing converse to the first lemma in the dependent case.
Proof. The first lemma's proof used only subadditivity and the convergent-tail estimate, neither of which references independence, so the implication holds for arbitrary events. For the failure of a converse, the constant sequence with has yet ; and a sequence with arranged so that occurrences are forced to coincide can even give when later events are contained in the failure of earlier ones. Thus the divergence half is genuinely conditional on an independence-type hypothesis, while the convergence half is unconditional.
Connections Master
The first Borel-Cantelli lemma is the engine behind almost-sure convergence criteria built on summable tail probabilities, and it feeds directly into the strong law of large numbers and the truncation arguments that establish it; the strong law's measure-theoretic backbone is the integration theory of 02.07.05.
The tail-degeneracy mechanism of the Kolmogorov 0-1 law reappears as the load-bearing structural fact in the construction of conditional expectation and martingale convergence, where uniformly integrable martingales converge almost surely to a tail-measurable limit; the relevant convergence theorems are the dominated and Fatou results of 02.07.05.
The measurability of the limsup and liminf of events, and the - machinery that proves independence of generated -algebras, are pure consequences of the -algebra axioms developed in 02.07.01; the present unit specialises that abstract apparatus to the independence setting.
The recurrence dichotomy and the normal-number theorem are the probabilistic face of the elementary probability rules and distributions in 26.02.01, lifting finite-sample statements to almost-sure limit statements about infinite sequences.
The zero-one phenomenon generalises to the Hewitt-Savage law for i.i.d. exchangeable events and onward to the invariant -algebra of ergodic theory, where tail-degeneracy is the germ of ergodicity; this connects to the strong law via Birkhoff's ergodic theorem 26.02.01.
Historical & philosophical context Master
Émile Borel introduced the convergence half of the lemma in his 1909 paper Les probabilités dénombrables et leurs applications arithmétiques (Rendiconti del Circolo Matematico di Palermo 27, 247-271), where the motivating application was the metric theory of continued fractions and the normality of almost every real number [Borel 1909]. Borel's treatment of the divergence half was incomplete by modern standards; Francesco Paolo Cantelli supplied a rigorous account of the independence-driven converse in 1917 (Atti della Reale Accademia Nazionale dei Lincei 26:1, 39-45), and the paired statement has carried both names since.
Andrey Kolmogorov placed the zero-one law within his axiomatic foundation of probability in the 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer; English translation Foundations of the Theory of Probability, Chelsea 1956), §IV.6, where the tail -algebra is identified as the carrier of asymptotic events and shown to collapse to under independence [Kolmogorov 1933]. The law was the decisive technical instrument that made the strong law of large numbers a theorem about a fixed probability space rather than a limiting statement about finite samples. Edwin Hewitt and Leonard Savage extended the zero-one collapse to exchangeable events for i.i.d. sequences in 1955 (Transactions of the American Mathematical Society 80, 470-501), and Alfréd Rényi developed the record-value theory and the second-moment generalisations in the early 1960s.
Bibliography Master
@article{Borel1909,
author = {Borel, {\'E}mile},
title = {Les probabilit{\'e}s d{\'e}nombrables et leurs applications arithm{\'e}tiques},
journal = {Rendiconti del Circolo Matematico di Palermo},
volume = {27},
pages = {247--271},
year = {1909}
}
@article{Cantelli1917,
author = {Cantelli, Francesco Paolo},
title = {Sulla probabilit{\`a} come limite della frequenza},
journal = {Atti della Reale Accademia Nazionale dei Lincei},
volume = {26},
number = {1},
pages = {39--45},
year = {1917}
}
@book{Kolmogorov1933,
author = {Kolmogorov, Andrey N.},
title = {Grundbegriffe der Wahrscheinlichkeitsrechnung},
publisher = {Springer},
year = {1933},
note = {English translation: Foundations of the Theory of Probability, Chelsea, 1956}
}
@article{HewittSavage1955,
author = {Hewitt, Edwin and Savage, Leonard J.},
title = {Symmetric measures on Cartesian products},
journal = {Transactions of the American Mathematical Society},
volume = {80},
pages = {470--501},
year = {1955}
}
@article{KochenStone1964,
author = {Kochen, Simon and Stone, Charles},
title = {A note on the Borel-Cantelli lemma},
journal = {Illinois Journal of Mathematics},
volume = {8},
pages = {248--251},
year = {1964}
}
@book{Durrett2019,
author = {Durrett, Rick},
title = {Probability: Theory and Examples},
edition = {5},
publisher = {Cambridge University Press},
year = {2019}
}