37.02.01 · probability / 02-independence-laws-of-large-numbers

The Borel-Cantelli Lemmas and the Kolmogorov 0-1 Law

shipped · 3 tiers · Lean: none

Anchor (Master): Kallenberg 2021 *Foundations of Modern Probability* 3e (Springer) Ch. 4; Billingsley 1995 *Probability and Measure* 3e (Wiley) §4; Durrett 2019 §2.3-2.5

Intuition Beginner

Imagine an endless sequence of trials, one after another forever, and for each trial a yes-or-no event that might happen on that trial. A natural question is whether the event keeps recurring no matter how far out you go, or whether it eventually stops happening once and for all. "Recurs forever" is the idea of an event happening infinitely often: past any point in the sequence, you can still find a later trial where it happens.

The Borel-Cantelli lemmas give a clean test for this, based on adding up the chances. Take the probability of the event on trial one, plus the probability on trial two, plus the probability on trial three, and so on forever. If this running total stays finite, the first lemma says the event happens only finitely many times, with certainty: past some point it never happens again. The picture to keep is rainfall. If the chance of rain on day $n$ shrinks fast enough that the total expected number of rainy days is finite, then with certainty only finitely many rainy days ever occur.

The second lemma is a partial converse, and it needs one extra ingredient: the events must not influence each other, the property called independence. If the running total of probabilities adds up to infinity and the events are independent, the second lemma says the event happens infinitely often, again with certainty. So for independent events the test is sharp: a finite total means finitely many occurrences, an infinite total means infinitely many.

There is a striking companion fact. For an independent sequence, any event whose truth does not depend on the first ten trials, or the first thousand, or the first million, no matter how large the starting block you ignore, must have probability either zero or one. Such an event is called a tail event, and the statement that every tail event is certain or impossible is the Kolmogorov zero-one law. There is no middle ground, no genuine fifty-fifty for these long-run questions: the answer is already decided, even if you may not know which way.

Visual Beginner

The diagram below contrasts the two regimes side by side: a finite sum of probabilities forces the event to die out, while an infinite sum of independent probabilities forces it to recur without end.

The top row is the first Borel-Cantelli lemma: a finite total of probabilities pins the occurrences down to a finite set. The bottom row is the second lemma: independence plus an infinite total forces endless recurrence. The side box is the Kolmogorov zero-one law, which says any question about only the far end of the sequence has a yes-or-no answer locked in advance.

Worked example Beginner

We test recurrence in two sequences using the sum-of-probabilities idea.

Step 1. First sequence. Suppose event $A_{n}$ has probability $1/ n^{2}$ on trial $n$ . The running total of probabilities is $1/1 + 1/4 + 1/9 + 1/16 + \dots$ , the sum of reciprocal squares. This famous sum settles down to a finite number near $1.645$ . Because the total is finite, the first Borel-Cantelli lemma applies, and we conclude that with certainty only finitely many of the events $A_{n}$ occur. Past some trial, the event stops happening for good.

Step 2. Second sequence. Suppose instead event $B_{n}$ has probability $1/ n$ on trial $n$ , and suppose the events are independent. The running total is $1/1 + 1/2 + 1/3 + 1/4 + \dots$ , the harmonic sum, which grows without bound: it eventually passes any number you name. Because the total is infinite and the events are independent, the second Borel-Cantelli lemma applies, and we conclude that with certainty infinitely many of the events $B_{n}$ occur. No matter how far out you look, more occurrences are still to come.

Step 3. Compare. The two sequences differ only in how fast the probabilities shrink. Probability $1/ n^{2}$ shrinks fast enough for a finite total, so the events die out. Probability $1/ n$ shrinks too slowly, the total is infinite, and with independence the events recur forever. The boundary between dying out and recurring forever is exactly the boundary between a finite and an infinite total of probabilities.

What this tells us: to decide whether an event recurs forever, add up its probabilities across all trials. A finite total means it stops; an infinite total, for independent events, means it never stops.
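A quick Monte Carlo run can make the contrast concrete, with one caveat: a simulation only ever sees finitely many trials, so it illustrates the lemmas rather than proving them. The Python sketch below uses only the standard `random` and `math` modules. It draws independent events with probabilities $1/n^{2}$ and $1/n$ for $n$ up to a cutoff, and reports the average number of occurrences next to the running total of probabilities. The cutoff, trial count, and seed are arbitrary illustrative choices.

```python
import math
import random


def count_occurrences(prob, n_max, rng):
    """Count how many of the independent events A_1..A_{n_max} occur on one path.

    prob(n) gives P(A_n); each event is an independent biased coin flip.
    """
    return sum(1 for n in range(1, n_max + 1) if rng.random() < prob(n))


def average_count(prob, n_max, trials, seed=0):
    """Average the occurrence count over many simulated paths."""
    rng = random.Random(seed)
    return sum(count_occurrences(prob, n_max, rng) for _ in range(trials)) / trials


def total_probability(prob, n_max):
    """Running total P(A_1) + ... + P(A_{n_max}), the expected number of occurrences."""
    return math.fsum(prob(n) for n in range(1, n_max + 1))


def squares(n):
    return 1 / n**2   # summable: the totals approach pi^2 / 6, about 1.645


def harmonic(n):
    return 1 / n      # not summable: the totals grow like ln(n_max)


N_MAX, TRIALS = 2_000, 200
for name, prob in [("1/n^2", squares), ("1/n", harmonic)]:
    print(f"{name:6s} total prob = {total_probability(prob, N_MAX):7.4f}, "
          f"mean occurrences = {average_count(prob, N_MAX, TRIALS):7.4f}")
```

Raising `N_MAX` barely moves the $1/n^{2}$ row, whose total is capped near $1.645$. The $1/n$ row keeps creeping upward like $\ln N$, and that slow but unbounded growth is the divergent sum behind the second lemma.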

Check your understanding Beginner

Exercise (easy, multiple choice).

The first Borel-Cantelli lemma says that if the total of the probabilities of the events $A_{n}$ is finite, then:

A. The events occur infinitely often with certainty
B. The events occur only finitely many times with certainty
C. The events are independent
D. The probability of each event is zero

Hint

A finite running total means mass cannot keep accumulating, so the events cannot keep recurring.

Answer

B. The events occur only finitely many times with certainty. The first Borel-Cantelli lemma needs no independence: a finite total of probabilities forces, with probability one, that only finitely many of the events occur. Feedback-correct: a finite sum of probabilities pins the occurrences to a finite set almost surely. Feedback-wrong: A is the conclusion of the second lemma and needs both an infinite total and independence; C is a hypothesis of the second lemma, not a conclusion of the first; D is false, since individual probabilities can be positive while the total stays finite.

Formal definition Intermediate+

Let $(\Omega, \mathcal{F}, P)$ be a probability space: a measurable space $(\Omega, \mathcal{F})$ together with a measure $P$ with $P(\Omega) = 1$ (see 02.07.01, 26.02.01). The symbol $\mathcal{F}$ denotes the $\sigma$-algebra of events; $A^{c}$ denotes the complement $\Omega \setminus A$; $\bigcup$ and $\bigcap$ denote countable union and intersection of events.

Definition (limit superior and limit inferior of events). For a sequence of events $A_{n} \in \mathcal{F}$, the limit superior and limit inferior are $\limsup_{n \to \infty} A_{n} = \bigcap_{n=1}^{\infty} \bigcup_{k=n}^{\infty} A_{k}, \qquad \liminf_{n \to \infty} A_{n} = \bigcup_{n=1}^{\infty} \bigcap_{k=n}^{\infty} A_{k}.$ The event $\limsup_{n} A_{n}$ is the set of outcomes belonging to $A_{k}$ for infinitely many $k$, written $\{A_{n} \text{ i.o.}\}$ (read "$A_{n}$ infinitely often"). The event $\liminf_{n} A_{n}$ is the set of outcomes belonging to $A_{k}$ for all but finitely many $k$, written $\{A_{n} \text{ ev.}\}$ (read "$A_{n}$ eventually"). Each is measurable, being built from countable unions and intersections of events. The complement relation $(\limsup_{n} A_{n})^{c} = \liminf_{n} A_{n}^{c}$ holds by De Morgan's laws.
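To check the two definitions and the De Morgan relation concretely, take a finite sample space and a sequence of events that repeats with some period $p$, so that $A_{n+p} = A_{n}$. Every event in the period then recurs forever. As a result, $\limsup_{n} A_{n}$ is the union of one period and $\liminf_{n} A_{n}$ is its intersection. The Python sketch below assumes this periodic setup only because it makes both limits exactly computable. The helper names are hypothetical.

```python
def limsup_periodic(period):
    """lim sup of a periodic event sequence A_{n+p} = A_n.

    An outcome in some A_k of the period also lies in A_{k+p}, A_{k+2p}, ...,
    i.e. in infinitely many A_n, so the lim sup is the union of one period.
    """
    return set().union(*period)


def liminf_periodic(period):
    """lim inf of a periodic sequence: an outcome is in all but finitely many A_n
    exactly when it is in every event of the period, so take the intersection."""
    result = set(period[0])
    for event in period[1:]:
        result &= event
    return result


omega = set(range(6))                      # a six-point sample space
period = [{0, 1, 2}, {1, 2, 3}, {2, 4}]    # A_1, A_2, A_3, then the pattern repeats

upper = limsup_periodic(period)            # outcomes occurring infinitely often
lower = liminf_periodic(period)            # outcomes occurring eventually always
complements = [omega - a for a in period]  # the sequence A_n^c, with the same period

# De Morgan: (lim sup A_n)^c equals lim inf of the complements A_n^c
assert omega - upper == liminf_periodic(complements)
print(sorted(upper), sorted(lower))  # prints: [0, 1, 2, 3, 4] [2]
```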

Definition (independence of events). A finite family $A_{1}, \dots, A_{m} \in \mathcal{F}$ is independent if for every subset $S \subseteq \{1, \dots, m\}$, $P(\bigcap_{i \in S} A_{i}) = \prod_{i \in S} P(A_{i})$. An infinite family $\{A_{n}\}$ is independent if every finite subfamily is.
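The product rule must hold for every subset $S$, not just for pairs, and this gap matters later in the pairwise-independence remarks and in Exercise 8. The Python sketch below checks the definition by brute force on a finite sample space. It uses a standard example chosen for illustration: two fair coin tosses with $A_{1}$ = "first toss heads", $A_{2}$ = "second toss heads", and $A_{3}$ = "the two tosses agree". Every pair satisfies the product rule, but the triple does not.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin tosses; each of the 4 outcomes has probability 1/4.
OMEGA = list(product("HT", repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))


def is_independent(events):
    """Check P(intersection over S) == product of P(A_i) for every subset S of size >= 2."""
    for size in range(2, len(events) + 1):
        for subset in combinations(events, size):
            joint = prob(lambda w: all(e(w) for e in subset))
            product_of_marginals = Fraction(1)
            for e in subset:
                product_of_marginals *= prob(e)
            if joint != product_of_marginals:
                return False
    return True


A1 = lambda w: w[0] == "H"   # first toss heads
A2 = lambda w: w[1] == "H"   # second toss heads
A3 = lambda w: w[0] == w[1]  # the two tosses agree

pairs_ok = all(is_independent(list(p)) for p in combinations([A1, A2, A3], 2))
print(pairs_ok, is_independent([A1, A2, A3]))  # prints: True False
```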

Definition (independence of $\sigma$-algebras). A sequence of sub-$\sigma$-algebras $\mathcal{G}_{1}, \mathcal{G}_{2}, \dots$ of $\mathcal{F}$ is independent if for every finite choice of indices $n_{1} < \dots < n_{m}$ and events $G_{i} \in \mathcal{G}_{n_{i}}$, $P(\bigcap_{i=1}^{m} G_{i}) = \prod_{i=1}^{m} P(G_{i})$.

Definition (tail $\sigma$-algebra). Given a sequence of $\sigma$-algebras $\mathcal{G}_{1}, \mathcal{G}_{2}, \dots$, the tail $\sigma$-algebra is $\mathcal{T} = \bigcap_{n=1}^{\infty} \sigma(\mathcal{G}_{n}, \mathcal{G}_{n+1}, \dots),$ where $\sigma(\mathcal{G}_{n}, \mathcal{G}_{n+1}, \dots)$ is the $\sigma$-algebra generated by all events from index $n$ onward. An event $T \in \mathcal{T}$ is a tail event: whether it occurs is unaffected by altering any finite block of the underlying coordinates. The $\sigma$-algebra $\mathcal{T}$ is called $P$-degenerate when $P(T) \in \{0, 1\}$ for every $T \in \mathcal{T}$.

Counterexamples to common slips Intermediate+

The second lemma genuinely needs independence. Take a single event $A$ with $0 < P(A) < 1$ and set $A_{n} = A$ for every $n$. The total $\sum_{n} P(A_{n}) = \infty$, but $\{A_{n} \text{ i.o.}\} = A$, so $P(\{A_{n} \text{ i.o.}\}) = P(A)$, which is neither $0$ nor $1$. Constant repetition is maximally dependent, and the conclusion fails. Independence is what upgrades a divergent sum into certain recurrence.
The first lemma needs no independence. The finite-sum direction holds for arbitrary events, dependent or not; its proof uses only countable subadditivity. Adding an independence hypothesis to the first lemma is harmless but unnecessary.
Pairwise independence suffices for the second lemma but not for the zero-one law. The second Borel-Cantelli lemma holds under pairwise independence of the events (Erdős-Rényi), though the textbook proof assumes full independence; the Kolmogorov zero-one law, by contrast, genuinely uses independence of the generating $\sigma$-algebras.
A tail event is not merely an event of small probability. Tail membership is a structural condition (insensitivity to finite blocks), not a numerical one. The event "the first toss is heads" is not a tail event even though one can compute its probability; the event "the running averages converge" is a tail event.
Almost-sure constancy is not pointwise constancy. The zero-one law makes a tail event certain or impossible and makes a tail-measurable random variable almost surely equal to a constant, but the variable can still differ from that constant on a null set. The conclusion lives modulo $P$ -null sets.
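The first counterexample is easy to see numerically. A simulation cannot observe "infinitely often" directly, so the Python sketch below uses a proxy chosen for illustration: whether some event in a late window $n \in [N, 2N)$ occurs. With constant repetition, a path draws one uniform number $U$ and sets $A_{n} = \{U < p\}$ for every $n$. With independence, each $A_{n}$ gets its own draw. The probability $p = 0.3$, the window, and the seed are arbitrary.

```python
import random


def late_occurrence_rate(p, window_start, trials, independent, seed=1):
    """Fraction of simulated paths on which some A_n with n in [N, 2N) occurs.

    independent=True : each A_n is a fresh event of probability p.
    independent=False: A_n = A for all n, one shared event of probability p.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if independent:
            occurred = any(rng.random() < p for _ in range(window_start, 2 * window_start))
        else:
            occurred = rng.random() < p  # the same event repeated: one draw decides every n
        hits += occurred
    return hits / trials


p, N, TRIALS = 0.3, 50, 4_000
print("constant   :", late_occurrence_rate(p, N, TRIALS, independent=False))  # near p
print("independent:", late_occurrence_rate(p, N, TRIALS, independent=True))   # near 1
```

The constant-repetition rate stays near $P(A) = 0.3$ however late the window is, matching $P(\{A_{n} \text{ i.o.}\}) = P(A)$. The independent rate is $1 - 0.7^{N}$ in expectation, which is essentially $1$.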

Key theorem with proof Intermediate+

Theorem (Borel-Cantelli lemmas; Borel 1909, Cantelli 1917). Let $\{A_{n}\}$ be events in $(\Omega, \mathcal{F}, P)$.

(i) (First lemma.) If $\sum_{n=1}^{\infty} P(A_{n}) < \infty$, then $P(\{A_{n} \text{ i.o.}\}) = 0$.

(ii) (Second lemma.) If the $A_{n}$ are independent and $\sum_{n=1}^{\infty} P(A_{n}) = \infty$, then $P(\{A_{n} \text{ i.o.}\}) = 1$.

Proof. For (i), recall $\{A_{n} \text{ i.o.}\} = \bigcap_{n} \bigcup_{k \geq n} A_{k} \subseteq \bigcup_{k \geq n} A_{k}$ for every $n$. By monotonicity and countable subadditivity of $P$, $P(\{A_{n} \text{ i.o.}\}) \leq P(\bigcup_{k \geq n} A_{k}) \leq \sum_{k=n}^{\infty} P(A_{k}).$ The right side is the tail of a convergent series, so it tends to $0$ as $n \to \infty$. The left side does not depend on $n$, so $P(\{A_{n} \text{ i.o.}\}) = 0$.

For (ii), it suffices to show $P(\bigcup_{k \geq n} A_{k}) = 1$ for every $n$, since then $\{A_{n} \text{ i.o.}\} = \bigcap_{n} \bigcup_{k \geq n} A_{k}$ is a countable intersection of probability-one events, hence has probability one. Equivalently we show $P(\bigcap_{k \geq n} A_{k}^{c}) = 0$. Fix $n$ and any $m > n$. By independence of the $A_{k}$, the complements $A_{k}^{c}$ are independent, so $P(\bigcap_{k=n}^{m} A_{k}^{c}) = \prod_{k=n}^{m} (1 - P(A_{k})).$ Apply the inequality $1 - x \leq e^{-x}$, valid for all real $x$, to each factor: $\prod_{k=n}^{m} (1 - P(A_{k})) \leq \prod_{k=n}^{m} e^{-P(A_{k})} = \exp\left(-\sum_{k=n}^{m} P(A_{k})\right).$ Because $\sum_{k} P(A_{k}) = \infty$, the partial sums $\sum_{k=n}^{m} P(A_{k}) \to \infty$ as $m \to \infty$, so the bound tends to $0$. Hence $P(\bigcap_{k \geq n} A_{k}^{c}) = \lim_{m \to \infty} P(\bigcap_{k=n}^{m} A_{k}^{c}) = 0$, using continuity from above of $P$. Therefore $P(\bigcup_{k \geq n} A_{k}) = 1$ for every $n$, and $P(\{A_{n} \text{ i.o.}\}) = 1$.

$□$
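Both bounds in the proof can be checked numerically for concrete probabilities. The Python sketch below uses two illustrative choices:

- For part (i), $P(A_{k}) = 1/k^{2}$. The tail sum bounding $P(\{A_{n} \text{ i.o.}\})$ then shrinks roughly like $1/n$.
- For part (ii), $P(A_{k}) = 1/k$. For $n \geq 2$, the product $\prod_{k=n}^{m}(1 - 1/k)$ telescopes to $(n-1)/m$ and sits below $\exp(-\sum_{k=n}^{m} 1/k)$.

The infinite tail sums are truncated at a finite cutoff.

```python
import math
from fractions import Fraction


def tail_bound(n, m_cut):
    """Truncated tail sum_{k=n}^{m_cut} 1/k^2, the part (i) bound when P(A_k) = 1/k^2."""
    return math.fsum(1 / k**2 for k in range(n, m_cut + 1))


def no_occurrence_product(n, m):
    """Exact P(no A_k for n <= k <= m) = prod (1 - 1/k) for independent A_k, P(A_k) = 1/k."""
    result = Fraction(1)
    for k in range(n, m + 1):
        result *= 1 - Fraction(1, k)
    return result


def exp_bound(n, m):
    """Upper bound exp(-sum_{k=n}^m 1/k) obtained from 1 - x <= e^{-x}."""
    return math.exp(-math.fsum(1 / k for k in range(n, m + 1)))


# Part (i): the bound on P(A_n i.o.) goes to 0 as n grows.
for n in (10, 100, 1000):
    print(n, tail_bound(n, 10**5))

# Part (ii): the product telescopes to (n-1)/m and respects the exponential bound.
n, m = 2, 1000
print(no_occurrence_product(n, m), exp_bound(n, m))  # 1/1000 versus roughly 0.0015
```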

Bridge. Together the two lemmas give a sharp dichotomy for independent sequences: $P(\{A_{n} \text{ i.o.}\})$ is exactly $0$ or $1$ according as $\sum_{n} P(A_{n})$ converges or diverges. The same zero-one alternative reappears in the Kolmogorov 0-1 law, because $\{A_{n} \text{ i.o.}\}$ is a tail event. The underlying reason the recurrence probability cannot land strictly between $0$ and $1$ is that $\{A_{n} \text{ i.o.}\}$ does not change when any finite block of the $A_{n}$ is deleted, so it carries no information localized to early trials; for independent events this insensitivity forces the zero-one collapse. The lemmas convert a deterministic statement about the numbers $P(A_{n})$, the convergence or divergence of a series, into a probabilistic certainty about the path of occurrences. This is what makes the first Borel-Cantelli lemma the workhorse behind almost-sure convergence proofs (a summable sequence of tail probabilities forces almost-sure convergence, as in the truncation arguments for the strong law of large numbers). It is also what makes the second lemma, together with the Kolmogorov law, the source of the record-value and radius-of-convergence dichotomies developed below.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Let $\{X_{n}\}$ be random variables (not necessarily independent) with $\sum_{n} P(|X_{n}| > \varepsilon) < \infty$ for every $\varepsilon > 0$. Prove that $X_{n} \to 0$ almost surely.

Hint

For fixed $\varepsilon$, apply the first Borel-Cantelli lemma to $A_{n}^{\varepsilon} = \{|X_{n}| > \varepsilon\}$. Then intersect over a sequence $\varepsilon \downarrow 0$.

Answer

Fix $\varepsilon > 0$ and set $A_{n}^{\varepsilon} = \{|X_{n}| > \varepsilon\}$. Since $\sum_{n} P(A_{n}^{\varepsilon}) < \infty$, the first Borel-Cantelli lemma gives $P(A_{n}^{\varepsilon} \text{ i.o.}) = 0$. On the complement, an event of probability one, only finitely many $|X_{n}|$ exceed $\varepsilon$, so $\limsup_{n} |X_{n}| \leq \varepsilon$.

Now take $\varepsilon = 1/m$ for $m = 1, 2, \dots$. The intersection $\bigcap_{m} \{\limsup_{n} |X_{n}| \leq 1/m\}$ is a countable intersection of probability-one events, hence has probability one, and on it $\limsup_{n} |X_{n}| = 0$, i.e. $X_{n} \to 0$. This is the standard route from a summable-tail-probability hypothesis to almost-sure convergence, and it shows that the first lemma alone (no independence) drives a.s.-convergence criteria.

Exercise 5 (medium, numeric).

Toss a fair coin independently forever and consider runs of consecutive heads. Let $A_{n}$ be the event that tosses $n, n + 1, \dots, n + \lfloor \log_{2} n \rfloor - 1$ are all heads (a head-run of length $\lfloor \log_{2} n \rfloor$ starting at position $n$). Decide whether $P(A_{n} \text{ i.o.})$ is $0$ or $1$.

Hint

$P(A_{n}) = 2^{-\lfloor \log_{2} n \rfloor} \approx 1/n$. Compare $\sum_{n} P(A_{n})$ to the harmonic series. The events overlap, so they are not independent, but a subsequence of non-overlapping blocks yields independent events to which the second lemma still applies.

Answer

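$1$. Here $P(A_{n}) = 2^{-\lfloor \log_{2} n \rfloor}$, which satisfies $1/n \leq P(A_{n}) < 2/n$, so $\sum_{n} P(A_{n}) = \infty$. The events $A_{n}$ overlap and are not independent as stated, so pass to non-overlapping blocks. Set $n_{1} = 2$ and $n_{j+1} = n_{j} + \lfloor \log_{2} n_{j} \rfloor$, so that each block begins right after the previous one ends. Events determined by disjoint sets of independent tosses are independent, so the $A_{n_{j}}$ are independent. In the dyadic range $[2^{k}, 2^{k+1})$ the blocks have length $k$, so roughly $2^{k}/k$ of the $n_{j}$ fall there, each with $P(A_{n_{j}}) = 2^{-k}$. That range therefore contributes about $1/k$ to $\sum_{j} P(A_{n_{j}})$, and summing over $k$ gives a divergent, harmonic-type series. The second Borel-Cantelli lemma applied to the subsequence gives $P(A_{n_{j}} \text{ i.o.}) = 1$, hence $P(A_{n} \text{ i.o.}) = 1$. Arbitrarily long head-runs recur forever almost surely; the answer reported is $1$.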
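The subsequence used in this answer can be built and checked mechanically. The Python sketch below computes $\lfloor \log_{2} n \rfloor$ exactly as `n.bit_length() - 1` for positive integers. It starts at $n_{1} = 2$ (an arbitrary choice; any start $\geq 2$ works), places each block immediately after the previous one, and prints partial sums of $P(A_{n_{j}}) = 2^{-\lfloor \log_{2} n_{j} \rfloor}$ at a few checkpoints. The partial sums grow only like $\ln \log_{2} n$, so a finite run can suggest the divergence but cannot prove it.

```python
def floor_log2(n):
    """floor(log2 n) for a positive integer n, computed exactly with integers."""
    return n.bit_length() - 1


def disjoint_block_starts(limit, start=2):
    """Block starts n_1 = start, n_{j+1} = n_j + floor(log2 n_j), up to limit.

    Block j covers tosses n_j, ..., n_j + floor(log2 n_j) - 1, so consecutive
    blocks touch but never overlap, and the events A_{n_j} are independent.
    """
    starts = []
    n = start
    while n <= limit:
        starts.append(n)
        n += floor_log2(n)   # the next block begins right after this one ends
    return starts


def partial_sum(starts):
    """Sum of P(A_{n_j}) = 2^{-floor(log2 n_j)} over the given block starts."""
    return sum(2.0 ** -floor_log2(n) for n in starts)


for exponent in (10, 15, 20):
    starts = disjoint_block_starts(2**exponent)
    print(f"n_j <= 2^{exponent}: {len(starts)} blocks, partial sum {partial_sum(starts):.3f}")
```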

Exercise 6 (hard, symbolic).

Let $\{a_{n}\}$ be independent random coefficients with $P(a_{n} = 1) = P(a_{n} = -1) = 1/2$, and form the random power series $\sum_{n} a_{n} z^{n}$. Use a zero-one argument to show the radius of convergence $R$ is almost surely a constant, and compute it.

Hint

$1/R = \limsup_{n} |a_{n}|^{1/n}$ by the Cauchy-Hadamard formula. Argue $R$ is a tail random variable, so it is a.s. constant; then evaluate the limsup using $|a_{n}| = 1$.

Answer

By the Cauchy-Hadamard formula, $1/R = \limsup_{n} |a_{n}|^{1/n}$. The value of this limit superior is unchanged if any finite block of the $a_{n}$ is altered, so $R$ is measurable with respect to the tail $\sigma$-algebra of the independent sequence $\{a_{n}\}$. By the Kolmogorov zero-one law, every tail event has probability $0$ or $1$, so the tail random variable $R$ is almost surely equal to a constant $r$: for each threshold $t$, $\{R \leq t\}$ is a tail event with probability $0$ or $1$, and $r = \sup\{t : P(R \leq t) = 0\}$.

To evaluate $r$ : since $∣ a_{n} ∣ = 1$ for every $n$ , $∣ a_{n} ∣^{1/ n} = 1$ , so $lim sup_{n} ∣ a_{n} ∣^{1/ n} = 1$ surely, giving $1/ R = 1$ and $R = 1$ almost surely. The radius of convergence of a random sign series is the deterministic value $1$ , an instance of the general principle that tail-measurable quantities of independent sequences degenerate to constants.

Exercise 7 (hard, symbolic).

Let $\{X_{n}\}$ be independent random variables. Show that the event $\{\sum_{n} X_{n} \text{ converges}\}$ is a tail event, and conclude that the series converges almost surely or diverges almost surely.

Hint

Convergence of a series is unaffected by changing finitely many terms. Express the convergence event in terms of the tail sums.

Answer

Convergence of $\sum_{n} X_{n}$ is equivalent to convergence of the tail $\sum_{n \geq N} X_{n}$ for any fixed $N$, because the two differ by the finite sum $\sum_{n < N} X_{n}$, a finite real number that does not affect whether the series converges. Hence the event $C = \{\sum_{n} X_{n} \text{ converges}\}$ is measurable with respect to $\sigma(X_{N}, X_{N+1}, \dots)$ for every $N$, so $C \in \bigcap_{N} \sigma(X_{N}, X_{N+1}, \dots) = \mathcal{T}$, the tail $\sigma$-algebra.

The $X_{n}$ are independent, so the generated $\sigma$-algebras $\sigma(X_{n})$ are independent, and the Kolmogorov zero-one law applies to $\mathcal{T}$: $P(C) \in \{0, 1\}$. Therefore the random series $\sum_{n} X_{n}$ converges almost surely or diverges almost surely, with no intermediate possibility. Which of the two alternatives holds is decided by the Kolmogorov three-series theorem, but the zero-one structure of the answer is forced by the tail-event argument alone.
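The zero-one law says which kind of answer to expect, and a simulation can hint at which answer actually occurs. The Python sketch below uses random signs $\epsilon_{n} = \pm 1$, an illustrative choice not taken from the exercise, and compares two series:

- $\sum_{n} \epsilon_{n}/n$, which converges almost surely because $\sum_{n} 1/n^{2} < \infty$.
- $\sum_{n} \epsilon_{n}/\sqrt{n}$, which diverges almost surely because $\sum_{n} 1/n = \infty$.

Both facts follow from the three-series theorem; for these bounded, mean-zero terms the variance series is the deciding one. The code measures how far the partial sums still move between $N$ and $2N$, averaged over paths. Finite runs only suggest the behaviour.

```python
import math
import random


def mean_square_increment(coeff, n_start, trials, seed=2):
    """Average of (S_{2N} - S_N)^2 over simulated paths, where S_N = sum eps_n * coeff(n)
    with independent fair random signs eps_n. Only the terms n in (N, 2N] matter."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        increment = sum((1 if rng.random() < 0.5 else -1) * coeff(n)
                        for n in range(n_start + 1, 2 * n_start + 1))
        total += increment**2
    return total / trials


N, TRIALS = 1_000, 200
for name, coeff in [("eps/n", lambda n: 1 / n), ("eps/sqrt(n)", lambda n: 1 / math.sqrt(n))]:
    # expected value is sum_{n=N+1}^{2N} coeff(n)^2: about 1/(2N) for eps/n, about ln 2 for eps/sqrt(n)
    print(name, mean_square_increment(coeff, N, TRIALS))
```

The first increment shrinks like $1/(2N)$ as $N$ grows, which is consistent with convergence. The second stays near $\ln 2$ however large $N$ is, which is consistent with partial sums that never settle.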

Exercise 8 (hard, symbolic).

Prove that the second Borel-Cantelli lemma holds under the weaker hypothesis of pairwise independence: if the $A_{n}$ are pairwise independent with $\sum_{n} P(A_{n}) = \infty$, then $P(A_{n} \text{ i.o.}) = 1$.

Hint

Let $S_{N} = \sum_{n = 1}^{N} 1_{A_{n}}$ count occurrences. Compute $E [S_{N}]$ and $Var (S_{N})$ using pairwise independence, then apply the second-moment (Paley-Zygmund or Chebyshev) method.

Answer

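Let $S_{N} = \sum_{n=1}^{N} 1_{A_{n}}$, so $m_{N} := E[S_{N}] = \sum_{n=1}^{N} P(A_{n}) \to \infty$. Pairwise independence gives $\mathrm{Cov}(1_{A_{i}}, 1_{A_{j}}) = 0$ for $i \neq j$, so the variance is the sum of individual variances: $\mathrm{Var}(S_{N}) = \sum_{n=1}^{N} \mathrm{Var}(1_{A_{n}}) = \sum_{n=1}^{N} P(A_{n})(1 - P(A_{n})) \leq \sum_{n=1}^{N} P(A_{n}) = m_{N}.$ By Chebyshev's inequality, $P(|S_{N} - m_{N}| \geq m_{N}/2) \leq \mathrm{Var}(S_{N})/(m_{N}/2)^{2} \leq 4m_{N}/m_{N}^{2} = 4/m_{N} \to 0$. Hence $P(S_{N} \geq m_{N}/2) \to 1$. The counts $S_{N}$ increase to a limit $S_{\infty} = \sum_{n} 1_{A_{n}} \in [0, \infty]$, the total number of occurrences. For fixed $M$, every $N$ large enough that $m_{N}/2 \geq M$ satisfies $P(S_{\infty} \geq M) \geq P(S_{N} \geq m_{N}/2)$, and letting $N \to \infty$ gives $P(S_{\infty} \geq M) = 1$. Intersecting over $M = 1, 2, \dots$ yields $P(S_{\infty} = \infty) = 1$: with probability one the count of occurrences diverges, which is exactly $P(A_{n} \text{ i.o.}) = 1$.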

This second-moment argument (Erdős-Rényi 1959) shows full independence is more than is needed; pairwise independence and a divergent mean suffice. The contrast with the Kolmogorov zero-one law is instructive: the law genuinely uses independence of generating $σ$ -algebras and does not survive pairwise weakening.
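A classical family shows pairwise independence at work without full independence. This construction is chosen here for illustration and is not part of the exercise:

- Take $k$ independent fair bits.
- For each nonempty subset $s$ of the bit positions, let $E_{s}$ be the event that the bits in $s$ have odd parity.

The resulting $N = 2^{k} - 1$ events each have probability $1/2$. They are pairwise independent but not mutually independent. The Python sketch below enumerates all $2^{k}$ outcomes exactly. It confirms that $\mathrm{Var}(S_{N}) = N/4$, as pairwise independence predicts, and that the exact probability $P(|S_{N} - m_{N}| \geq m_{N}/2)$ stays below the Chebyshev bound $\mathrm{Var}(S_{N})/(m_{N}/2)^{2}$ used in the proof.

```python
from fractions import Fraction
from itertools import product


def parity(bits, mask):
    """XOR of the bits selected by mask (bit i of mask selects bits[i])."""
    return sum(b for i, b in enumerate(bits) if (mask >> i) & 1) % 2


def xor_family_stats(k):
    """Exact mean, variance, and deviation probability for S = number of events E_s that occur.

    E_s = {parity of the bits in s is odd}, one event per nonempty mask s of k fair bits.
    """
    masks = range(1, 2**k)
    outcomes = list(product((0, 1), repeat=k))       # 2^k equally likely outcomes
    weight = Fraction(1, len(outcomes))
    counts = [sum(parity(bits, s) for s in masks) for bits in outcomes]
    mean = sum(counts) * weight
    var = sum((c - mean) ** 2 for c in counts) * weight
    deviation = sum(weight for c in counts if abs(c - mean) >= mean / 2)
    return len(masks), mean, var, deviation


N, mean, var, deviation = xor_family_stats(4)
chebyshev = var / (mean / 2) ** 2
print(N, mean, var, deviation, chebyshev)  # prints: 15 15/2 15/4 1/16 4/15
```

Here $S_{N}$ is far from binomial: it equals either $0$ or $2^{k-1}$, which is the footprint of the missing higher-order independence. The second-moment argument never notices, because it only uses covariances.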

Advanced results Master

Theorem 1 (Kolmogorov zero-one law; Kolmogorov 1933). Let $\mathcal{G}_{1}, \mathcal{G}_{2}, \dots$ be independent sub-$\sigma$-algebras of $(\Omega, \mathcal{F}, P)$, and let $\mathcal{T} = \bigcap_{n} \sigma(\mathcal{G}_{n}, \mathcal{G}_{n+1}, \dots)$ be the tail $\sigma$-algebra. Then $\mathcal{T}$ is $P$-degenerate: $P(T) \in \{0, 1\}$ for every $T \in \mathcal{T}$. Consequently every $\mathcal{T}$-measurable random variable is almost surely constant [Kolmogorov 1933].

Proof. Write $\mathcal{F}_{n} = \sigma(\mathcal{G}_{1}, \dots, \mathcal{G}_{n})$ for the head $\sigma$-algebras and $\mathcal{T}_{n} = \sigma(\mathcal{G}_{n+1}, \mathcal{G}_{n+2}, \dots)$ for the strict tails. By the definition of independence of the $\mathcal{G}_{i}$, the head $\mathcal{F}_{n}$ is independent of $\mathcal{T}_{n}$ for each $n$ (any event in $\mathcal{F}_{n}$ depends only on indices $\leq n$, any event in $\mathcal{T}_{n}$ only on indices $> n$, and independence of the blocks is inherited from independence of the generators through a $\pi$-system argument). Since $\mathcal{T} \subseteq \mathcal{T}_{n}$ for every $n$, the head $\mathcal{F}_{n}$ is independent of $\mathcal{T}$.

The union $\bigcup_{n} \mathcal{F}_{n}$ is a $\pi$-system (it is increasing, hence closed under finite intersection) generating $\mathcal{F}_{\infty} = \sigma(\mathcal{G}_{1}, \mathcal{G}_{2}, \dots)$. Each $\mathcal{F}_{n}$ is independent of $\mathcal{T}$, so the generated $\sigma$-algebra $\mathcal{F}_{\infty}$ is independent of $\mathcal{T}$ by the $\pi$-$\lambda$ (Dynkin) theorem. But $\mathcal{T} \subseteq \mathcal{F}_{\infty}$, so $\mathcal{T}$ is independent of itself. For any $T \in \mathcal{T}$, self-independence gives $P(T) = P(T \cap T) = P(T)P(T) = P(T)^{2}$, whence $P(T) \in \{0, 1\}$. A $\mathcal{T}$-measurable random variable $Y$ then has $P(Y \leq t) \in \{0, 1\}$ for every $t$; the cumulative distribution function jumps from $0$ to $1$ at the single value $c = \inf\{t : P(Y \leq t) = 1\}$, so $Y = c$ almost surely. $□$

Theorem 2 (sharp recurrence dichotomy). For independent events $\{A_{n}\}$, $P(A_{n} \text{ i.o.}) = \begin{cases} 0 & \text{if } \sum_{n} P(A_{n}) < \infty, \\ 1 & \text{if } \sum_{n} P(A_{n}) = \infty. \end{cases}$ This is the combination of the two Borel-Cantelli lemmas, and it is consistent with Theorem 1: $\{A_{n} \text{ i.o.}\}$ is a tail event of the independent $\sigma$-algebras $\sigma(A_{n})$, so its probability must be $0$ or $1$ regardless, and the sum criterion identifies which.

Theorem 3 (Hewitt-Savage zero-one law; Hewitt-Savage 1955). If $X_{1}, X_{2}, \dots$ are independent and identically distributed, then every exchangeable event (an event invariant under finite permutations of the coordinates) has probability $0$ or $1$. The exchangeable $\sigma$-algebra contains the tail $\sigma$-algebra, so Hewitt-Savage strictly strengthens Kolmogorov for the i.i.d. case. With partial sums $S_{n} = X_{1} + \dots + X_{n}$, events such as $\{S_{n} > 0 \text{ i.o.}\}$ are exchangeable but not tail, and Hewitt-Savage settles them at $0$ or $1$ where Kolmogorov is silent [Hewitt-Savage 1955].

Theorem 4 (record values; Rényi 1962). Let $X_{1}, X_{2}, \dots$ be i.i.d. with a continuous distribution, and let $R_{n} = \{X_{n} > \max(X_{1}, \dots, X_{n-1})\}$ be the event that the $n$-th observation is a record (by convention $R_{1} = \Omega$). Then $R_{1}, R_{2}, \dots$ are independent with $P(R_{n}) = 1/n$, and since $\sum_{n} 1/n = \infty$, the second Borel-Cantelli lemma gives $P(R_{n} \text{ i.o.}) = 1$: records occur infinitely often almost surely. Moreover the number of records among the first $N$ observations has mean $\sum_{n=1}^{N} 1/n = \ln N + O(1)$, so records thin out logarithmically yet never cease.
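For a continuous distribution, ties have probability zero and the ranks of $X_{1}, \dots, X_{n}$ form a uniformly random permutation. Record indicators can therefore be studied exactly by enumerating permutations. The Python sketch below does this for $n = 5$, an arbitrary small size. It checks three things:

- $P(R_{j}) = 1/j$.
- The product rule holds for every subset of the $R_{j}$.
- The expected number of records equals $\sum_{j=1}^{5} 1/j$.

```python
import math
from fractions import Fraction
from itertools import combinations, permutations


def record_indicators(ranks):
    """R_j = True if the j-th value exceeds all earlier ones (R_1 = True by convention)."""
    flags, best = [], -1
    for r in ranks:
        flags.append(r > best)
        best = max(best, r)
    return flags


def record_table(n):
    """Record indicators for every one of the n! equally likely rank orders."""
    return [record_indicators(p) for p in permutations(range(n))]


def prob_all(table, indices):
    """Exact P(every R_j with j in indices occurs); indices are 0-based."""
    hits = sum(1 for f in table if all(f[j] for j in indices))
    return Fraction(hits, len(table))


n = 5
table = record_table(n)
marginals = [prob_all(table, [j]) for j in range(n)]
independent = all(
    prob_all(table, s) == math.prod(marginals[j] for j in s)
    for size in range(2, n + 1)
    for s in combinations(range(n), size)
)
mean_records = Fraction(sum(sum(f) for f in table), len(table))
print([str(p) for p in marginals], independent, mean_records)
# prints: ['1', '1/2', '1/3', '1/4', '1/5'] True 137/60
```

Exact enumeration stops being practical beyond small $n$, since there are $n!$ orders. It does, however, verify the two facts the theorem feeds into the second lemma.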

Theorem 5 (Borel's normal number theorem; Borel 1909). Almost every real number in $[0, 1]$ is normal in base $2$: the asymptotic frequency of the digit $1$ in its binary expansion equals $1/2$. The proof realises the binary digits as i.i.d. fair coin tosses under Lebesgue measure, applies the strong law of large numbers (itself proved by a Borel-Cantelli truncation argument), and the exceptional non-normal numbers form a tail-type Lebesgue-null set. This was the original application that motivated Borel's 1909 paper and the lemma now bearing his name [Borel 1909].
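Under Lebesgue measure, the binary digits of a uniformly chosen point of $[0, 1]$ behave like independent fair coin tosses, so the theorem can be illustrated by generating random bits. The Python sketch below uses the standard library's `random.Random.getrandbits` as a stand-in for the digits of a random point. This is an assumption of the illustration, since a pseudo-random generator is deterministic. The code prints the frequency of the digit $1$ among the first $n$ digits.

```python
import random


def ones_frequency(n_digits, seed=3):
    """Fraction of 1s among n_digits pseudo-random binary digits."""
    digits = random.Random(seed).getrandbits(n_digits)
    return bin(digits).count("1") / n_digits   # leading zeros are dropped but are not 1s


for n_digits in (10, 1_000, 1_000_000):
    print(n_digits, ones_frequency(n_digits))
```

The frequencies drift toward $1/2$ as $n$ grows. The theorem's content is that the exceptional digit sequences, for which this fails, form a Lebesgue-null set.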

Theorem 6 (Kochen-Stone refinement; Kochen-Stone 1964). Without any independence, if $\sum_{n} P(A_{n}) = \infty$ then $P(A_{n} \text{ i.o.}) \geq \limsup_{N \to \infty} \frac{\left(\sum_{n=1}^{N} P(A_{n})\right)^{2}}{\sum_{1 \leq i, j \leq N} P(A_{i} \cap A_{j})}.$ When the events are independent, the off-diagonal terms of the denominator factor as $P(A_{i})P(A_{j})$ and the ratio tends to $1$, recovering the second lemma; for weakly dependent events it gives a positive lower bound on the recurrence probability where the bare second lemma does not apply.
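The Kochen-Stone ratio is straightforward to evaluate when the pairwise intersection probabilities are known. The Python sketch below (helper names hypothetical) computes it for two cases from this unit:

- The constant sequence $A_{n} = A$. Here $P(A_{i} \cap A_{j}) = P(A)$ and the ratio equals $P(A)$ exactly, so the bound $P(A_{n} \text{ i.o.}) \geq P(A)$ is attained.
- Independent events with $P(A_{n}) = 1/n$. Here the denominator is $\sum_{i} p_{i} + \sum_{i \neq j} p_{i} p_{j}$ and the ratio creeps toward $1$.

```python
from fractions import Fraction


def kochen_stone_ratio(probs, joint):
    """(sum_n p_n)^2 / sum_{i,j} P(A_i and A_j) for the first len(probs) events.

    probs[i] = P(A_i); joint(i, j) = P(A_i and A_j), with joint(i, i) = probs[i].
    """
    numerator = sum(probs) ** 2
    denominator = sum(joint(i, j) for i in range(len(probs)) for j in range(len(probs)))
    return numerator / denominator


def independent_ratio(probs):
    """Same ratio for independent events, using P(A_i and A_j) = p_i p_j for i != j,
    so that sum_{i,j} P(A_i and A_j) = sum_i p_i + (sum_i p_i)^2 - sum_i p_i^2."""
    s = sum(probs)
    q = sum(x * x for x in probs)
    return s * s / (s + s * s - q)


# Constant repetition A_n = A with P(A) = 3/10: every intersection is A itself.
p = Fraction(3, 10)
print(kochen_stone_ratio([p] * 50, lambda i, j: p))  # prints 3/10

# Independent events with P(A_n) = 1/n: the ratio increases slowly toward 1.
for n_max in (10, 100, 10_000):
    print(n_max, independent_ratio([1 / n for n in range(1, n_max + 1)]))
```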

Synthesis. The Borel-Cantelli-Kolmogorov circle explains why long-run questions about independent sequences admit only certain or impossible answers: the event $\{A_{n} \text{ i.o.}\}$, the radius of convergence of a random series, and the convergence of a random series are all tail-measurable, so each is forced to collapse to a constant. The mechanism generalises in three directions. First, the recurrence dichotomy of Theorem 2 is dual to the convergence-of-series test: a deterministic sum criterion on the numbers $P(A_{n})$ is converted into a probabilistic certainty, and combining this with truncation yields the strong law of large numbers, of which Borel's normal-number theorem is the prototype. Second, the tail-degeneracy of Theorem 1 generalises to the exchangeable zero-one law of Hewitt-Savage and, in the dependent regime, relaxes to the Kochen-Stone lower bound, the bridge being the second-moment method that already appeared in the pairwise-independent strengthening of the second lemma. Third, the first lemma's summable-tail-probability criterion is the load-bearing step behind almost-sure convergence proofs throughout probability; the law of the iterated logarithm, for instance, routes its hardest steps through Borel-Cantelli estimates. The central insight is that insensitivity to finite data plus independence equals certainty, and it recurs downstream in ergodic theory (the tail $\sigma$-algebra is the germ of the invariant $\sigma$-algebra) and in random-graph theory (zero-one laws for monotone connectivity events).

Full proof set Master

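Proposition 1 (the i.o. and eventually events are tail events). For a sequence of events $\{A_{n}\}$, the event $\{A_{n} \text{ i.o.}\}$ lies in the tail $\sigma$-algebra $\mathcal{T}$ of $\{\sigma(A_{n})\}$, and so does $\{A_{n} \text{ ev.}\}$. No independence is needed for this membership; independence enters only when the zero-one law is applied to these events.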

Proof. For each $n \geq m$, the event $\bigcup_{k \geq n} A_{k}$ is measurable with respect to $\sigma(A_{m}, A_{m+1}, \dots)$. Hence $\{A_{n} \text{ i.o.}\} = \bigcap_{n \geq 1} \bigcup_{k \geq n} A_{k} = \bigcap_{n \geq m} \bigcup_{k \geq n} A_{k}$ for every fixed $m$ (the intersection over $n \geq 1$ equals the intersection over $n \geq m$ because the sets $\bigcup_{k \geq n} A_{k}$ decrease in $n$, so the early terms are redundant). The right side is $\sigma(A_{m}, A_{m+1}, \dots)$-measurable for every $m$, so $\{A_{n} \text{ i.o.}\} \in \bigcap_{m} \sigma(A_{m}, A_{m+1}, \dots) = \mathcal{T}$. The same argument applied to the complements, which generate the same $\sigma$-algebras because $\sigma(A_{n}^{c}) = \sigma(A_{n})$, shows $\{A_{n}^{c} \text{ i.o.}\} \in \mathcal{T}$; hence $\{A_{n} \text{ ev.}\} = \{A_{n}^{c} \text{ i.o.}\}^{c} \in \mathcal{T}$, since $\mathcal{T}$ is a $\sigma$-algebra. $□$

Proposition 2 (independence of head and strict tail). With $\mathcal{F}_{n} = \sigma(A_{1}, \dots, A_{n})$ and $\mathcal{T}_{n} = \sigma(A_{n+1}, A_{n+2}, \dots)$ for independent events $\{A_{n}\}$, the $\sigma$-algebras $\mathcal{F}_{n}$ and $\mathcal{T}_{n}$ are independent.

Proof. The collection of events of the form $A_{i_{1}} \cap \dots \cap A_{i_{p}}$ with $i_{1} < \dots < i_{p} \leq n$, together with the whole space $\Omega$, forms a $\pi$-system generating $\mathcal{F}_{n}$; similarly the finite intersections $A_{j_{1}} \cap \dots \cap A_{j_{q}}$ with $n < j_{1} < \dots < j_{q}$, together with $\Omega$, form a $\pi$-system generating $\mathcal{T}_{n}$. By the definition of independence of the family $\{A_{n}\}$, any event $E$ from the first $\pi$-system and any event $H$ from the second satisfy $P(E \cap H) = P(E)P(H)$, because the index sets are disjoint and every factor splits. Two $\sigma$-algebras whose generating $\pi$-systems are independent are themselves independent, by the Dynkin $\pi$-$\lambda$ theorem applied twice: fix $H$ in the second $\pi$-system and note that $\{E : P(E \cap H) = P(E)P(H)\}$ is a $\lambda$-system containing the first $\pi$-system, hence containing $\mathcal{F}_{n}$; then fix $E \in \mathcal{F}_{n}$ and repeat the argument in the second variable. $□$

Proposition 3 (self-independence forces the zero-one collapse). If a $\sigma$-algebra $\mathcal{T}$ is independent of itself under $P$, then $P(T) \in \{0, 1\}$ for every $T \in \mathcal{T}$.

Proof. Self-independence means $P(T_{1} \cap T_{2}) = P(T_{1})P(T_{2})$ for all $T_{1}, T_{2} \in \mathcal{T}$. Taking $T_{1} = T_{2} = T$ gives $P(T) = P(T \cap T) = P(T)^{2}$, so $P(T)^{2} - P(T) = 0$, i.e. $P(T)(P(T) - 1) = 0$, forcing $P(T) = 0$ or $P(T) = 1$. $□$

Proposition 4 (a.s. constancy of a tail variable). If $Y$ is measurable with respect to a $P$-degenerate $\sigma$-algebra $\mathcal{T}$, then there is a constant $c \in [-\infty, \infty]$ with $Y = c$ almost surely.

Proof. For each $t \in \mathbb{R}$, the event $\{Y \leq t\}$ lies in $\mathcal{T}$, so $F(t) := P(Y \leq t) \in \{0, 1\}$. The function $F$ is non-decreasing, right-continuous, takes only the values $0$ and $1$, and (if $Y$ is real-valued) tends to $0$ as $t \to -\infty$ and to $1$ as $t \to +\infty$. Such a function is the step $F(t) = 1[t \geq c]$ for $c = \inf\{t : F(t) = 1\}$. Then $P(Y < c) = \lim_{t \uparrow c} F(t) = 0$ and $P(Y \leq c) = 1$, so $P(Y = c) = 1$. $□$

Proposition 5 (the convergence half is unconditional, the divergence half is not). Without independence, $\sum_{n} P(A_{n}) = \infty$ does not determine $P(A_{n} \text{ i.o.})$, but $\sum_{n} P(A_{n}) < \infty$ always yields $P(A_{n} \text{ i.o.}) = 0$. The first lemma needs no independence, and no converse to it holds in the dependent case.

Proof. The first lemma's proof used only subadditivity and the convergent-tail estimate, neither of which references independence, so the implication $\sum_{n} P(A_{n}) < \infty \Rightarrow P(A_{n} \text{ i.o.}) = 0$ holds for arbitrary events. For the failure of a converse, the constant sequence $A_{n} = A$ with $0 < P(A) < 1$ has $\sum_{n} P(A_{n}) = \infty$ yet $P(A_{n} \text{ i.o.}) = P(A) \notin \{0, 1\}$. A nested sequence can even give probability zero: on $[0, 1]$ with Lebesgue measure, $A_{n} = (0, 1/n)$ has $\sum_{n} P(A_{n}) = \sum_{n} 1/n = \infty$, yet the sets decrease, so $\{A_{n} \text{ i.o.}\} = \bigcap_{n} (0, 1/n) = \varnothing$ and $P(A_{n} \text{ i.o.}) = 0$. Thus the divergence half is genuinely conditional on an independence-type hypothesis, while the convergence half is unconditional. $□$

Connections Master

The first Borel-Cantelli lemma is the engine behind almost-sure convergence criteria built on summable tail probabilities, and it feeds directly into the strong law of large numbers and the truncation arguments that establish it; the strong law's measure-theoretic backbone is the integration theory of 02.07.05.

The Kolmogorov 0-1 law also reappears in martingale theory. For a tail event $T$ of an independent sequence $X_{1}, X_{2}, \dots$, with $\mathcal{F}_{n} = \sigma(X_{1}, \dots, X_{n})$, independence gives $E[1_{T} \mid \mathcal{F}_{n}] = P(T)$, while Lévy's upward theorem gives $E[1_{T} \mid \mathcal{F}_{n}] \to 1_{T}$ almost surely; hence $1_{T} = P(T)$ almost surely and $P(T) \in \{0, 1\}$. The convergence theorems underlying this route are the dominated and Fatou results of 02.07.05.

The measurability of the limsup and liminf of events, and the $π$ - $λ$ machinery that proves independence of generated $σ$ -algebras, are pure consequences of the $σ$ -algebra axioms developed in 02.07.01; the present unit specialises that abstract apparatus to the independence setting.

The recurrence dichotomy and the normal-number theorem are the probabilistic face of the elementary probability rules and distributions in 26.02.01, lifting finite-sample statements to almost-sure limit statements about infinite sequences.

The zero-one phenomenon generalises to the Hewitt-Savage law for i.i.d. exchangeable events and onward to the invariant $\sigma$-algebra of ergodic theory, where tail-degeneracy is the germ of ergodicity; this connects to the strong law via Birkhoff's ergodic theorem (see 26.02.01).

Historical & philosophical context Master

Émile Borel introduced the convergence half of the lemma in his 1909 paper Les probabilités dénombrables et leurs applications arithmétiques (Rendiconti del Circolo Matematico di Palermo 27, 247-271), where the motivating application was the metric theory of continued fractions and the normality of almost every real number [Borel 1909]. Borel's treatment of the divergence half was incomplete by modern standards; Francesco Paolo Cantelli supplied a rigorous account of the independence-driven converse in 1917 (Atti della Reale Accademia Nazionale dei Lincei 26:1, 39-45), and the paired statement has carried both names since.

Andrey Kolmogorov placed the zero-one law within his axiomatic foundation of probability in the 1933 monograph Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer; English translation Foundations of the Theory of Probability, Chelsea 1956), §IV.6, where the tail $\sigma$-algebra is identified as the carrier of asymptotic events and shown to collapse to $\{0, 1\}$ under independence [Kolmogorov 1933]. The law was the decisive technical instrument that made the strong law of large numbers a theorem about a fixed probability space rather than a limiting statement about finite samples. Edwin Hewitt and Leonard Savage extended the zero-one collapse to exchangeable events for i.i.d. sequences in 1955 (Transactions of the American Mathematical Society 80, 470-501), and Alfréd Rényi developed the record-value theory and the second-moment generalisations in the early 1960s.

Bibliography Master

@article{Borel1909,
  author  = {Borel, {\'E}mile},
  title   = {Les probabilit{\'e}s d{\'e}nombrables et leurs applications arithm{\'e}tiques},
  journal = {Rendiconti del Circolo Matematico di Palermo},
  volume  = {27},
  pages   = {247--271},
  year    = {1909}
}

@article{Cantelli1917,
  author  = {Cantelli, Francesco Paolo},
  title   = {Sulla probabilit{\`a} come limite della frequenza},
  journal = {Atti della Reale Accademia Nazionale dei Lincei},
  volume  = {26},
  number  = {1},
  pages   = {39--45},
  year    = {1917}
}

@book{Kolmogorov1933,
  author    = {Kolmogorov, Andrey N.},
  title     = {Grundbegriffe der Wahrscheinlichkeitsrechnung},
  publisher = {Springer},
  year      = {1933},
  note      = {English translation: Foundations of the Theory of Probability, Chelsea, 1956}
}

@article{HewittSavage1955,
  author  = {Hewitt, Edwin and Savage, Leonard J.},
  title   = {Symmetric measures on Cartesian products},
  journal = {Transactions of the American Mathematical Society},
  volume  = {80},
  pages   = {470--501},
  year    = {1955}
}

@article{KochenStone1964,
  author  = {Kochen, Simon and Stone, Charles},
  title   = {A note on the Borel-Cantelli lemma},
  journal = {Illinois Journal of Mathematics},
  volume  = {8},
  pages   = {248--251},
  year    = {1964}
}

@book{Durrett2019,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

Prerequisites

02.07.01
02.07.05
26.02.01

Tier anchors

beginner: Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) §2.3; the infinitely-often picture and the sum-of-probabilities test
intermediate: Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) §2.3, §2.5; Williams 1991 *Probability with Martingales* (Cambridge) §2.7, §4.11
master: Kallenberg 2021 *Foundations of Modern Probability* 3e (Springer) Ch. 4; Billingsley 1995 *Probability and Measure* 3e (Wiley) §4; Durrett 2019 §2.3-2.5

References

Borel — Les probabilités dénombrables et leurs applications arithmétiques · Rendiconti del Circolo Matematico di Palermo 27 (1909), 247-271
Cantelli — Sulla probabilità come limite della frequenza · Atti della Reale Accademia Nazionale dei Lincei 26:1 (1917), 39-45
Kolmogorov — Grundbegriffe der Wahrscheinlichkeitsrechnung · Ergebnisse der Mathematik, Springer 1933; English: Foundations of the Theory of Probability, Chelsea 1956, §IV.6 (the zero-one law)
Durrett — Probability: Theory and Examples, 5e · §2.3 (Borel-Cantelli lemmas), §2.5 (Kolmogorov's 0-1 law)
Williams — Probability with Martingales · §2.7 (first Borel-Cantelli), §4.11 (second Borel-Cantelli, Kolmogorov 0-1 law)
Kallenberg — Foundations of Modern Probability, 3e · Ch. 4, Independence and zero-one laws

Estimated time

beginner: 18m
intermediate: 55m
master: 90m