37.02.03 · probability / 02-independence-laws-of-large-numbers

The Ergodic Theorems: Birkhoff, von Neumann, and Kingman

shipped · 3 tiers · Lean: none

Anchor (Master): Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) Ch. 6; Kallenberg 2002 *Foundations of Modern Probability* 2e (Springer) Ch. 10 (ergodic theorems, subadditivity); Krengel 1985 *Ergodic Theorems* (de Gruyter) Ch. 1, Ch. 6 (the subadditive theorem); Steele 1997 *Probability Theory and Combinatorial Optimization* (SIAM) Ch. 1 (Kingman applied to longest increasing subsequences)

Intuition Beginner

The strong law of large numbers handles a single, very clean situation: you average a long run of independent identical trials, and the average settles to the expected value. But the world is full of sequences that are not independent. The temperature on successive days, the position of a particle bouncing inside a box, the letters in a long stretch of English text — each entry depends on the ones before it. The ergodic theorems are the answer to a simple question: when does the running average of a dependent sequence still settle down to a fixed number?

The key idea is stationarity. A sequence is stationary when its statistical behaviour does not depend on where you start the clock: the chunk of the sequence you see between day 100 and day 200 looks, in distribution, exactly like the chunk between day 1000 and day 1100. Time has no preferred origin. Many real sequences with strong dependence are still stationary in this sense, and stationarity turns out to be the right replacement for independence.

Picture a stationary sequence as a single system evolving by one fixed rule, like a marble rolling on a frictionless table that reflects off the walls. Applying the rule shifts the whole system forward one step, and because the rule preserves the underlying "size" of regions, the long-run behaviour stays balanced. Birkhoff's theorem says: track any measurement of such a system and its time-average over a long run converges. Whether it converges to one universal number, or to a number depending on which run you got, is decided by whether the system is ergodic — whether it eventually visits everywhere, mixing the whole space rather than staying trapped in one region.

The takeaway: the ergodic theorems extend the law of large numbers from independent sequences to stationary ones, replacing the expected value by a time-average that converges for almost every run, and equals one universal number exactly when the system is ergodic.

Visual Beginner

Picture a single point hopping around the surface of a doughnut by a fixed rule, leaving a trail. If the rule is ergodic, the trail eventually covers the whole surface evenly; the fraction of time spent over any patch matches the area of that patch.

The winding trail is one long run of the system; the shaded patch is a region you might measure. The bars show that two different runs reach the same long-run average when the system is ergodic. If the system were not ergodic, the surface would split into separate zones and different runs could settle on different averages.

Worked example Beginner

We compute a time-average for the simplest interesting stationary system: rotating a circle by a fixed angle and tracking how often a point lands in the right half.

Step 1. The system. Take the circle as the numbers from $0$ up to $1$ , wrapping around so that $1$ is the same point as $0$ . The rule is: add $a = 0.3$ and wrap around. Starting from $0.1$ , the orbit is $0.1, 0.4, 0.7, 1.0$ which wraps to $0.0$ , then $0.3, 0.6, 0.9, 0.2, 0.5, 0.8$ , and so on.

Step 2. The measurement. We measure $1$ when the point is in the right half (between $0.5$ and $1$ ) and $0$ otherwise. The first ten positions $0.1, 0.4, 0.7, 0.0, 0.3, 0.6, 0.9, 0.2, 0.5, 0.8$ give measurements $0, 0, 1, 0, 0, 1, 1, 0, 1, 1$ .

Step 3. The running average. Summing those ten measurements gives $5$ , so the average over the first ten steps is $5/10 = 0.5$ . Continue for many more steps and the average stays near $0.5$ .

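Step 4. What the theorem predicts. The step $0.3 = 3/10$ is a rational fraction of the circle, so this orbit repeats every ten steps, cycling through the ten evenly spaced points $0.0, 0.1, \dots, 0.9$ . Five of them lie in the right half, so the average over every full cycle is exactly $0.5$ , and the running average converges to $0.5$ . If the step is instead irrational (for instance $\sqrt{2} - 1$ ), the orbit never repeats and spreads out evenly over the whole circle. The right half is exactly half the circle, so the long-run fraction of time spent there is again $0.5$ , the area of the region we were detecting.

What this tells us: the long-run time-average of "fraction of steps in the right half" equals the size of the right half, $0.5$ . The average over time, taken along one orbit, reproduces the average over space. For an ergodic system, such as a rotation by an irrational step, that equality holds for every interval, not only for this one; for the repeating step $0.3$ it holds here because the ten orbit points split evenly between the two halves. That equality of time-average and space-average is the whole content of the ergodic theorem in miniature.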
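Steps 1 to 3 can be reproduced in a few lines. The sketch below uses exact fractions for the step $0.3$ so that wrap-around is not disturbed by floating-point rounding, and then repeats the experiment with the irrational step $\sqrt{2} - 1$ in floating point. The function names, the irrational step and the run length of 100,000 are illustrative assumptions, not part of the worked example.

```python
from fractions import Fraction
import math


def rotation_orbit(start, step, n):
    """Return the first n points of the orbit x -> (x + step) mod 1."""
    points = []
    x = start
    for _ in range(n):
        points.append(x)
        x = (x + step) % 1
    return points


def right_half_average(points):
    """Fraction of the points that land in the right half [0.5, 1)."""
    hits = sum(1 for x in points if 2 * x >= 1)
    return Fraction(hits, len(points))


# Step 0.3 from 0.1, in exact arithmetic: the ten positions of the worked example.
exact_orbit = rotation_orbit(Fraction(1, 10), Fraction(3, 10), 10)
exact_average = right_half_average(exact_orbit)

# Assumed irrational step: the orbit never repeats, and the average is only approximate.
irrational_orbit = rotation_orbit(0.1, math.sqrt(2) - 1, 100_000)
irrational_average = float(right_half_average(irrational_orbit))

print(exact_average, irrational_average)
```

The exact run returns the value $5/10$ from Step 3; the floating-point run should land close to $0.5$ without hitting it exactly.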

Check your understanding Beginner

Exercise (easy, multiple choice).

A sequence of random quantities is called stationary when:

A. The quantities are independent of one another
B. Every quantity equals the same fixed number
C. The statistical behaviour of a block of the sequence does not depend on where in time the block starts
D. The running average is exactly constant from the first step

Hint

Stationarity is about time having no preferred origin: shifting the whole sequence forward leaves its statistical description unchanged.

Answer

C. The statistical behaviour does not depend on where the block starts.

Feedback-correct: stationarity means the joint distribution of any block is unchanged when you shift the starting time, which is exactly the time-shift-invariance that the ergodic theorems require.
Feedback-wrong: A describes independence, which is a different property: independent quantities need not share a distribution, and stationary quantities need not be independent. B and D describe constant sequences or constant averages, neither of which is what stationarity means.

Formal definition Intermediate+

Throughout, $(Ω, F, P)$ is a probability space. The expectation $E [f] = \int_{Ω} f d P$ is the Lebesgue integral against $P$ , and $L^{p} (P)$ denotes the $L^{p}$ spaces of 02.07.06, with $L^{2} (P)$ the Hilbert space of square-integrable variables.

Definition (measure-preserving transformation). A map $T : Ω \to Ω$ is measurable if $T^{- 1} A \in F$ for all $A \in F$ , and measure-preserving if $P (T^{- 1} A) = P (A)$ for all $A \in F$ . The quadruple $(Ω, F, P, T)$ is then a measure-preserving system. Its associated Koopman operator $U_{T}$ acts on functions by $(U_{T} f) (ω) = f (T ω)$ ; measure-preservation is equivalent to $U_{T}$ being an isometry of $L^{p} (P)$ for every $p$ , since $E [∣ f \circ T ∣^{p}] = E [∣ f ∣^{p}]$ .

Definition (stationary sequence). A sequence $(ξ_{n})_{n \geq 0}$ of random variables is stationary if for every $k \geq 0$ the shifted block $(ξ_{k}, ξ_{k + 1}, \dots)$ has the same joint distribution as $(ξ_{0}, ξ_{1}, \dots)$ . Every stationary sequence arises from a measure-preserving system: set $Ω = R^{\{0, 1, \dots\}}$ with the law of $(ξ_{n})$ , let $T$ be the shift $(T ω)_{n} = ω_{n + 1}$ , and let $f (ω) = ω_{0}$ be the zeroth coordinate, so that $ξ_{n} = f \circ T^{n}$ . An i.i.d. sequence is the special case where $P$ is a product measure.

Definition (invariant $σ$-algebra and ergodicity). A set $A \in F$ is invariant if $T^{- 1} A = A$ , and almost invariant if $P (T^{- 1} A △ A) = 0$ . The almost-invariant sets form a sub-$σ$-algebra $I$ , the invariant $σ$-algebra. The system is ergodic if every almost-invariant set has $P (A) \in \{0, 1\}$ , equivalently if every $I$-measurable function is a.s. constant. A function $g$ is invariant if $g \circ T = g$ a.s.; ergodicity says all invariant functions are a.s. constant.

Definition (subadditive sequence). A doubly indexed family $(X_{m, n})_{0 \leq m < n}$ of integrable variables on a measure-preserving system is subadditive if $X_{0, n} \leq X_{0, m} + X_{m, n} \quad (0 < m < n), \qquad X_{m, n} = X_{0, n - m} \circ T^{m},$ the second relation being stationarity of the increments. The time constant is $γ = \inf_{n \geq 1} E [X_{0, n}] / n$ , finite provided $E [X_{0, n}] \geq - c n$ for some constant $c$ . Choosing $X_{0, n} = \sum_{k = 0}^{n - 1} f \circ T^{k}$ makes the subadditivity an equality and recovers the additive (Birkhoff) setting.

Counterexamples to common slips Intermediate+

Stationary is not i.i.d. The sequence $ξ_{n} = Z$ for a single random variable $Z$ (every term equal) is stationary but maximally dependent. Birkhoff still applies; the time-average is $Z$ itself, an invariant non-constant limit. The system is not ergodic, which is exactly why the limit is a random variable rather than a constant.
Measure-preserving is about preimages, not images. The condition is $P (T^{- 1} A) = P (A)$ . For non-invertible $T$ the forward image $P (T A)$ can differ from $P (A)$ ; the doubling map $ω \mapsto 2 ω mod 1$ on $[0, 1)$ preserves Lebesgue measure through preimages while being two-to-one.
Ergodicity is a property of the pair $(T, P)$ , not of $T$ alone. The same map can be ergodic for one invariant measure and not another. A rational rotation is not ergodic for Lebesgue measure, yet it is ergodic for the uniform measure on any single periodic orbit; since all its orbits are finite, it is ergodic for no atomless invariant measure at all. An irrational rotation has no periodic orbits and is ergodic for Lebesgue measure.
The invariant $σ$-algebra need not be degenerate. For the shift on a product space the invariant $σ$-algebra is contained, up to null sets, in the tail field, which the Kolmogorov zero-one law 37.02.01 forces to be $P$-degenerate (every set has probability $0$ or $1$ ); that is why i.i.d. sequences are ergodic. Drop independence and $I$ can be rich; the conditional expectation $E [f ∣ I]$ is then a genuine random limit.
Subadditivity is one-directional. From $X_{0, n} \leq X_{0, m} + X_{m, n}$ one gets $E [X_{0, n}] / n \to γ = \inf_{n} E [X_{0, n}] / n$ by Fekete's lemma, but the pointwise limit equals $γ$ only after the ergodic-theoretic argument; assuming the pointwise limit is the naive $\lim E [X_{0, n}] / n$ without proof is the slip Kingman's theorem repairs.
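The preimage-versus-image slip can be checked by hand on the doubling map. The sketch below is a minimal illustration, assuming intervals $[a, b)$ inside $[0, 1)$ and using hypothetical helper names: the preimage of $[0, 1/2)$ consists of two intervals of half the length, while the forward image of the same interval is the whole circle.

```python
from fractions import Fraction


def preimage_under_doubling(a, b):
    """Preimage of [a, b), with 0 <= a <= b <= 1, under T(x) = 2x mod 1.

    Each point has two preimages, x/2 and (x + 1)/2, so the preimage is
    two disjoint intervals of half the original length.
    """
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]


def total_length(intervals):
    """Lebesgue measure of a list of disjoint intervals."""
    return sum(hi - lo for lo, hi in intervals)


a, b = Fraction(0), Fraction(1, 2)
preimage = preimage_under_doubling(a, b)
preimage_measure = total_length(preimage)          # same as b - a

# The forward image of [0, 1/2) is {2x : 0 <= x < 1/2} = [0, 1).
image_measure = total_length([(2 * a, 2 * b)])

print(preimage, preimage_measure, image_measure)
```

Here $P (T^{- 1} A) = P (A) = 1/2$ while $P (T A) = 1$ , which is the asymmetry the counterexample describes.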

Key theorem with proof Intermediate+

Theorem (Birkhoff's pointwise ergodic theorem; Birkhoff 1931). Let $(Ω, F, P, T)$ be a measure-preserving system and $f \in L^{1} (P)$ . Then the Birkhoff averages $A_{n} f = \frac{1}{n} \sum_{k = 0}^{n - 1} f \circ T^{k}$ converge almost surely and in $L^{1}$ to $E [f ∣ I]$ , the conditional expectation of $f$ on the invariant $σ$-algebra. If $T$ is ergodic the limit is the constant $E [f]$ .

The analytic core is the maximal ergodic lemma; the pointwise statement follows by a squeeze on the limsup and liminf of the averages.

Lemma (maximal ergodic lemma). For $f \in L^{1}$ write $S_{0} f = 0$ and $S_{k} f = \sum_{j = 0}^{k - 1} f \circ T^{j}$ for $k \geq 1$ , and $M_{n} f = \max_{0 \leq k \leq n} S_{k} f$ . Let $E_{n} = \{M_{n} f > 0\}$ . Then $\int_{E_{n}} f d P \geq 0.$

Proof. Since $S_{0} f = 0$ , we have $M_{n} f \geq 0$ , so $M_{n} f = (M_{n} f)^{+}$ , and $M_{n} f$ is integrable because $∣ M_{n} f ∣ \leq \sum_{j = 0}^{n - 1} ∣ f \circ T^{j} ∣$ . The partial sums satisfy $S_{k} f = f + (S_{k - 1} f) \circ T$ for $k \geq 1$ , so for $1 \leq k \leq n$ , $S_{k} f = f + (S_{k - 1} f) \circ T \leq f + (M_{n} f)^{+} \circ T$ . On $E_{n}$ we have $M_{n} f > 0 = S_{0} f$ , so the maximum defining $M_{n} f$ is attained at some $k \geq 1$ , and taking the maximum over $1 \leq k \leq n$ gives $M_{n} f \leq f + (M_{n} f)^{+} \circ T$ on $E_{n}$ . Therefore on $E_{n}$ , where $M_{n} f = (M_{n} f)^{+}$ , $f \geq (M_{n} f)^{+} - (M_{n} f)^{+} \circ T .$ Integrate over $E_{n}$ , using $(M_{n} f)^{+} = 0$ off $E_{n}$ for the first term and $(M_{n} f)^{+} \circ T \geq 0$ for the second: $\int_{E_{n}} f d P \geq \int_{Ω} (M_{n} f)^{+} d P - \int_{E_{n}} (M_{n} f)^{+} \circ T d P \geq \int_{Ω} (M_{n} f)^{+} d P - \int_{Ω} (M_{n} f)^{+} \circ T d P .$ By measure-preservation $\int (M_{n} f)^{+} \circ T d P = \int (M_{n} f)^{+} d P$ , so the right side is $0$ . $□$
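The lemma can be sanity-checked on a finite system, where every permutation of $N$ points preserves the uniform measure. The sketch below is an illustrative assumption (random permutations of 12 points, random integer-valued $f$ , a hypothetical function name), not part of the proof: it sums $f$ over $E_{n}$ and the lemma predicts that the sum is never negative.

```python
import random


def maximal_lemma_integral(f, T, n):
    """Sum of f over E_n = {M_n f > 0} for a permutation T of range(len(f)).

    Under the uniform measure on N points, the integral of f over E_n
    is this sum divided by N, so the sign is the same.
    """
    total = 0
    for x in range(len(f)):
        partial, best, y = 0, 0, x      # S_0 f = 0, hence M_n f >= 0
        for _ in range(n):
            partial += f[y]             # partial is now S_k f(x) for k = 1, ..., n
            best = max(best, partial)
            y = T[y]
        if best > 0:                    # x belongs to E_n
            total += f[x]
    return total


rng = random.Random(0)
N = 12
results = []
for _ in range(200):
    T = list(range(N))
    rng.shuffle(T)                      # a random permutation of the N points
    f = [rng.randint(-5, 5) for _ in range(N)]
    n = rng.randint(1, 20)
    results.append(maximal_lemma_integral(f, T, n))

print(min(results))
```

A negative entry in `results` would contradict the lemma; the check is only a numerical illustration and proves nothing by itself.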

Proof of the Theorem. Since $E [f ∣ I]$ is invariant, its Birkhoff averages equal itself, so replacing $f$ by $f - E [f ∣ I]$ reduces to the case $E [f ∣ I] = 0$ , for which we must show $A_{n} f \to 0$ a.s. Set $\bar{f} = \limsup_{n} A_{n} f$ . From $S_{n + 1} f = f + (S_{n} f) \circ T$ we get $\frac{n + 1}{n} A_{n + 1} f = \frac{f}{n} + (A_{n} f) \circ T$ ; letting $n \to \infty$ gives $\bar{f} = \bar{f} \circ T$ , so $\bar{f}$ is invariant, hence $I$-measurable. Fix $ε > 0$ and let $B = \{\bar{f} > ε\} \in I$ . Apply the maximal ergodic lemma to $g = (f - ε) 1_{B}$ ; because $B$ is invariant, the Birkhoff sums of $g$ coincide with those of $f - ε$ on $B$ and vanish off $B$ . The lemma gives $\int_{\{M_{n} g > 0\}} g d P \geq 0$ for every $n$ . The sets $\{M_{n} g > 0\}$ increase to $\{\sup_{k} S_{k} g > 0\}$ , which equals $B$ : it is contained in $B$ because $S_{k} g = 0$ off $B$ , and on $B$ the condition $\bar{f} > ε$ forces some average $A_{k} f > ε$ , that is $S_{k} (f - ε) > 0$ . Dominated convergence (with $∣ g ∣ \leq ∣ f ∣ + ε$ ) then gives $0 \leq \int_{B} (f - ε) d P = \int_{B} f d P - ε P (B) = \int_{B} E [f ∣ I] d P - ε P (B) = - ε P (B),$ using $B \in I$ and $E [f ∣ I] = 0$ . Thus $ε P (B) \leq 0$ , forcing $P (B) = 0$ ; as $ε$ was arbitrary, $\bar{f} \leq 0$ a.s. Applying the same argument to $- f$ gives $\liminf_{n} A_{n} f \geq 0$ a.s., so $A_{n} f \to 0$ a.s. The $L^{1}$ convergence follows from a.s. convergence plus uniform integrability of $(A_{n} f)$ , which holds because the family $\{f \circ T^{k}\}$ is uniformly integrable (each has the same distribution as $f \in L^{1}$ ) and Cesàro averages of a uniformly integrable family are uniformly integrable 37.04.03. $□$

Bridge. Birkhoff's theorem builds toward the entire theory of stationary sequences and appears again in the subadditive theorem below, where additive Birkhoff sums are the boundary case of a subadditive family. The foundational reason the average converges is that the invariant $σ$ -algebra carries every limit: the time-average can only land on an $I$ -measurable function, and the maximal ergodic lemma pins that function to be $E [f ∣ I]$ . This is exactly the conditional-expectation structure of 37.04.03, now read dynamically: the projection onto invariants is the conditional expectation, which is why the mean (von Neumann) theorem below is the $L^{2}$ shadow of the same fact and is dual to the pointwise statement. Putting these together, the i.i.d. strong law 37.02.02 is the case where $T$ is the product shift and the zero-one law 37.02.01 collapses $I$ to the constants, so $E [f ∣ I] = E [f]$ ; the central insight is that independence is one route to a degenerate invariant field, and stationarity-plus-ergodicity is the general route. The bridge is the identification of "time-average" with "projection onto the invariant $σ$ -algebra".

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Deduce the i.i.d. strong law of large numbers from Birkhoff's theorem. State precisely which system and which function you use, and why the limit is the constant $E [X_{1}]$ .

Hint

Take the shift on the product space carrying the law of the i.i.d. sequence; use the Kolmogorov zero-one law to identify the invariant $σ$ -algebra.

Answer

Let $(X_{n})_{n \geq 0}$ be i.i.d. with $E ∣ X_{0} ∣ < \infty$ , realised as coordinates on $Ω = R^{\{0, 1, \dots\}}$ under the product law $P$ . Let $T$ be the shift $(T ω)_{n} = ω_{n + 1}$ and $f (ω) = ω_{0}$ , so $X_{n} = f \circ T^{n}$ and $\frac{1}{n} \sum_{k = 0}^{n - 1} X_{k} = A_{n} f$ . The product measure is shift-invariant, so $T$ is measure-preserving, and the invariant $σ$-algebra $I$ is contained in the tail $σ$-algebra, which the Kolmogorov zero-one law 37.02.01 makes $P$-degenerate; hence $T$ is ergodic. Birkhoff gives $A_{n} f \to E [f ∣ I] = E [f] = E [X_{0}]$ a.s. This is exactly the strong law 37.02.02, recovered without the maximal-inequality machinery.

Exercise 5 (medium, symbolic).

Let $U$ be a linear isometry of a Hilbert space $H$ . Show that $ker (I - U) = ker (I - U^{*})$ , and conclude that the orthogonal complement of the invariant subspace is the closure of the range of $I - U$ .

Hint

For $U v = v$ , expand $∥ U^{*} v - v ∥^{2}$ using $⟨ U^{*} v, v ⟩ = ⟨ v, U v ⟩$ and $∥ U^{*} ∥ = ∥ U ∥ \leq 1$ to force $U^{*} v = v$ ; then use $\overline{range (I - U)} = ker (I - U^{*})^{⊥}$ .

Answer

If $U v = v$ then $⟨ U^{*} v, v ⟩ = ⟨ v, U v ⟩ = ∥ v ∥^{2}$ , and $∥ U^{*} v ∥ \leq ∥ v ∥$ since $∥ U^{*} ∥ = ∥ U ∥ \leq 1$ ; equality in the Cauchy-Schwarz inequality then forces $U^{*} v = v$ , so $ker (I - U) \subseteq ker (I - U^{*})$ . The reverse inclusion follows by applying the same argument to the contraction $U^{*}$ , since $U^{**} = U$ . For any bounded operator $A$ , $\overline{range (A)} = ker (A^{*})^{⊥}$ . Take $A = I - U$ , so $A^{*} = I - U^{*}$ and $ker (A^{*}) = ker (I - U^{*}) = ker (I - U)$ . Hence $\overline{range (I - U)} = ker (I - U)^{⊥}$ , giving the orthogonal splitting $H = ker (I - U) \oplus \overline{range (I - U)}$ used in von Neumann's theorem.

Exercise 7 (hard, symbolic).

Show that $X_{0, n} = \log ∥ M_{n} \dots M_{1} ∥$ , for a stationary sequence of integrable random matrices with $E \log^{+} ∥ M_{1} ∥ < \infty$ , satisfies the subadditivity hypotheses of Kingman's theorem, and identify the limit.

Hint

Use submultiplicativity of the operator norm $∥ A B ∥ \leq ∥ A ∥∥ B ∥$ to get subadditivity of the logs, and stationarity of the matrix sequence to get the cocycle relation.

Answer

Write $P_{m, n} = M_{n} \dots M_{m + 1}$ , so $P_{0, n} = P_{m, n} P_{0, m}$ . Submultiplicativity of the operator norm gives $∥ P_{0, n} ∥ \leq ∥ P_{m, n} ∥ ∥ P_{0, m} ∥$ , hence taking logs $X_{0, n} \leq X_{0, m} + X_{m, n}$ with $X_{m, n} = \log ∥ P_{m, n} ∥$ . Stationarity of $(M_{k})$ gives $X_{m, n} = X_{0, n - m} \circ T^{m}$ for the shift $T$ , so the cocycle relation holds. Integrability of $X_{0, n}^{+}$ follows from $E \log^{+} ∥ M_{1} ∥ < \infty$ and submultiplicativity; a lower bound $E [X_{0, n}] \geq - c n$ comes from $∥ P_{0, n} ∥ \geq ∣ \det P_{0, n} ∣^{1/ d} = \prod_{k} ∣ \det M_{k} ∣^{1/ d}$ when the matrices are invertible and $E \log ∣ \det M_{1} ∣ > - \infty$ . Kingman's theorem then gives $X_{0, n} / n \to X$ a.s. and in $L^{1}$ , where $X$ is invariant with $E [X] = γ = \inf_{n} E [X_{0, n}] / n$ ; when the sequence $(M_{k})$ is ergodic, $X = γ$ a.s., and $γ$ is the top Lyapunov exponent of the Furstenberg-Kesten theorem.

Exercise 8 (hard, symbolic).

Prove von Neumann's mean ergodic theorem: if $U$ is a linear isometry of a Hilbert space $H$ and $P$ is the orthogonal projection onto $ker (I - U)$ , then $\frac{1}{n} \sum_{k = 0}^{n - 1} U^{k} v \to P v$ in norm for every $v \in H$ .

Hint

Split $v$ using $H = ker (I - U) \oplus \overline{range (I - U)}$ from Exercise 5. Handle invariant vectors and coboundaries $v = w - U w$ separately, then pass to the closure.

Answer

By Exercise 5, $H = ker (I - U) \oplus \overline{range (I - U)}$ . Write $A_{n} = \frac{1}{n} \sum_{k = 0}^{n - 1} U^{k}$ ; each $∥ A_{n} ∥ \leq 1$ . If $v \in ker (I - U)$ then $U^{k} v = v$ for all $k$ , so $A_{n} v = v = P v$ . If $v = w - U w$ is a coboundary, the sum telescopes: $A_{n} v = \frac{1}{n} (w - U^{n} w)$ , and $∥ A_{n} v ∥ \leq \frac{2∥ w ∥}{n} \to 0 = P v$ . Hence $A_{n} v \to P v$ on the dense subspace $ker (I - U) + range (I - U)$ . For general $v$ in the closure, given $ε > 0$ pick $v^{'}$ in that subspace with $∥ v - v^{'} ∥ < ε$ ; then $∥ A_{n} v - P v ∥ \leq ∥ A_{n} (v - v^{'}) ∥ + ∥ A_{n} v^{'} - P v^{'} ∥ + ∥ P (v^{'} - v) ∥ \leq 2 ε + ∥ A_{n} v^{'} - P v^{'} ∥$ , and the middle term $\to 0$ . So $lim sup_{n} ∥ A_{n} v - P v ∥ \leq 2 ε$ for all $ε$ , giving $A_{n} v \to P v$ . Applied with $U = U_{T}$ on $L^{2} (P)$ , $ker (I - U_{T})$ is the invariant functions and $P = E [\cdot ∣ I]$ , so the averages converge in $L^{2}$ to $E [f ∣ I]$ .
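The two cases of the proof can be seen side by side in the smallest non-trivial example. The sketch below assumes $H = C^{2}$ and the diagonal unitary $U = diag (1, e^{i θ})$ with $θ = 1$ (an illustrative choice, as are the function names): the first coordinate is invariant, the second is a coboundary $w - U w$ , and the error $∥ A_{n} v - P v ∥$ is compared with the telescoping bound $2 ∥ w ∥ / n$ .

```python
import cmath


def cesaro_average(eigs, v, n):
    """A_n v for the diagonal operator U = diag(eigs), by direct summation of U^k v."""
    return [sum(lam ** k for k in range(n)) * x / n for lam, x in zip(eigs, v)]


def norm(v):
    """Euclidean norm of a complex vector."""
    return sum(abs(x) ** 2 for x in v) ** 0.5


theta = 1.0                                  # assumed angle, not a multiple of 2*pi
eigs = [1.0, cmath.exp(1j * theta)]          # U is unitary; ker(I - U) is the first axis
v = [2.0, 3.0 + 1.0j]
Pv = [v[0], 0.0]                             # orthogonal projection of v onto ker(I - U)

# The second coordinate of v equals w - U w for this w, so it is a coboundary.
w = v[1] / (1 - eigs[1])

errors = {}
bounds = {}
for n in (10, 100, 1000):
    An_v = cesaro_average(eigs, v, n)
    errors[n] = norm([a - p for a, p in zip(An_v, Pv)])
    bounds[n] = 2 * abs(w) / n               # the telescoping bound from the proof

print(errors, bounds)
```

In this example every vector already lies in $ker (I - U) + range (I - U)$ , so the closure step of the proof is not exercised; infinite-dimensional examples are needed for that.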

Advanced results Master

Theorem 1 (von Neumann mean ergodic theorem; von Neumann 1932). Let $U$ be a contraction of a Hilbert space $H$ — in particular the Koopman isometry $U_{T}$ of $L^{2} (P)$ . Then $\frac{1}{n} \sum_{k = 0}^{n - 1} U^{k} \to P$ strongly, where $P$ is the orthogonal projection onto the invariant subspace $ker (I - U)$ . For $U = U_{T}$ the projection is conditional expectation onto $I$ , so $A_{n} f \to E [f ∣ I]$ in $L^{2}$ . The mean theorem is the Hilbert-space shadow of Birkhoff: it gives $L^{2}$ rather than pointwise convergence but extends to any contraction and requires only the orthogonal decomposition $H = ker (I - U) \oplus \overline{range (I - U)}$ ^{[von Neumann 1932]}.

Theorem 2 (Birkhoff pointwise ergodic theorem; Birkhoff 1931). For $f \in L^{1}$ and any measure-preserving $T$ , $A_{n} f \to E [f ∣ I]$ almost surely and in $L^{1}$ . The maximal ergodic lemma is the irreducible analytic content. For bounded $f$ the pointwise statement implies $L^{2}$ convergence by dominated convergence, but in general neither theorem contains the other: the mean theorem covers arbitrary Hilbert-space contractions, the pointwise theorem gives a.s. convergence, and $L^{2}$ convergence and a.s. convergence are distinct modes ^{[Birkhoff 1931]}.

Theorem 3 (ergodic decomposition). Every invariant probability measure for a measurable transformation on a standard Borel space is a mixture (a barycentre) of ergodic invariant measures: there is a probability measure on the set of ergodic measures such that $P = \int P_{e} d ρ (e)$ . Consequently the Birkhoff limit $E [f ∣ I]$ evaluated at $ω$ is the integral of $f$ against the ergodic component through $ω$ . This is the structural reason the non-ergodic case reduces to the ergodic one: $I$ indexes the ergodic components.
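A finite system makes the decomposition concrete: for a permutation of finitely many points with the uniform measure, the ergodic components are the cycles, and the Birkhoff limit at a point is the mean of $f$ over the cycle through that point. The sketch below uses a hypothetical five-point system with two cycles; the permutation, the values of $f$ and the function names are illustrative assumptions.

```python
from fractions import Fraction


def cycle_of(T, x):
    """Orbit of x under the permutation T, i.e. the ergodic component through x."""
    orbit, y = [x], T[x]
    while y != x:
        orbit.append(y)
        y = T[y]
    return orbit


def time_average(f, T, x, n):
    """Birkhoff average A_n f(x) = (1/n) * sum over k < n of f(T^k x)."""
    total, y = Fraction(0), x
    for _ in range(n):
        total += f[y]
        y = T[y]
    return total / n


# Two cycles, 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 3: uniform measure is invariant, not ergodic.
T = [1, 2, 0, 4, 3]
f = [6, 0, 3, 10, 20]

component_mean = {}
for x in range(len(T)):
    orbit = cycle_of(T, x)
    component_mean[x] = Fraction(sum(f[y] for y in orbit), len(orbit))

# 600 is a multiple of both cycle lengths, so the time average is exact here.
long_run = {x: time_average(f, T, x, 600) for x in range(len(T))}

print(long_run)
```

The uniform measure is the mixture with weights $3/5$ and $2/5$ of the two cycle measures, and the overall mean $39/5$ is the matching mixture of the two component means $3$ and $15$ .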

Theorem 4 (Kingman's subadditive ergodic theorem; Kingman 1968). Let $(X_{m, n})$ be a subadditive family on a measure-preserving system with $E [X_{0, 1}^{+}] < \infty$ and $\inf_{n} E [X_{0, n}] / n =: γ > - \infty$ . Then $X_{0, n} / n \to X$ a.s. and in $L^{1}$ , where $X$ is an invariant random variable with $E [X] = γ$ ; if $T$ is ergodic, $X = γ$ a.s. The additive case $X_{0, n} = \sum_{k < n} f \circ T^{k}$ is Birkhoff. The standard modern proof (Steele's variant of Kingman, via the Burkholder-style decomposition or the Akcoglu-Krengel superadditive approach) controls $\limsup X_{0, n} / n$ by comparison against truncated additive cocycles and identifies the limit through the inf-formula ^{[Kingman 1968]}.

Theorem 5 (Furstenberg-Kesten / multiplicative ergodic; Furstenberg-Kesten 1960). Let $(M_{k})$ be a stationary ergodic sequence of $d \times d$ random matrices with $E \log^{+} ∥ M_{1} ∥ < \infty$ . Then $\frac{1}{n} \log ∥ M_{n} \dots M_{1} ∥ \to γ_{1}$ a.s., the top Lyapunov exponent, equal to $\inf_{n} \frac{1}{n} E \log ∥ M_{n} \dots M_{1} ∥$ . This is Kingman applied to $X_{0, n} = \log ∥ M_{n} \dots M_{1} ∥$ , submultiplicativity supplying subadditivity. Oseledets' multiplicative ergodic theorem refines this to a full Lyapunov spectrum $γ_{1} \geq \dots \geq γ_{d}$ with an associated measurable filtration ^{[Furstenberg-Kesten 1960]}.
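The step from submultiplicative norms to subadditive logs can be checked numerically. The sketch below makes several illustrative assumptions that are not in the theorem: $2 \times 2$ matrices with i.i.d. entries uniform on $[0.5, 1.5]$ , the operator norm induced by the max-norm (largest absolute row sum) in place of an unspecified operator norm, and hypothetical function names. It verifies $X_{0, n} \leq X_{0, m} + X_{m, n}$ along one sample and reports $X_{0, n} / n$ as a rough one-sample estimate of the growth rate.

```python
import math
import random


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def op_norm_inf(A):
    """Operator norm induced by the max-norm on vectors: largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in A)


def log_norm_of_product(mats, m, n):
    """X_{m,n} = log || M_n ... M_{m+1} ||, with mats[k - 1] playing the role of M_k."""
    P = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(m, n):
        P = matmul(mats[k], P)           # multiply on the left by M_{k+1}
    return math.log(op_norm_inf(P))


rng = random.Random(1)
n_max = 40
mats = [[[rng.uniform(0.5, 1.5) for _ in range(2)] for _ in range(2)] for _ in range(n_max)]

X = {(m, n): log_norm_of_product(mats, m, n) for n in range(1, n_max + 1) for m in range(n)}

# Largest violation of X_{0,n} <= X_{0,m} + X_{m,n}; should be <= 0 up to rounding.
worst_gap = max(X[0, n] - X[0, m] - X[m, n] for n in range(2, n_max + 1) for m in range(1, n))
growth_rate = X[0, n_max] / n_max

print(worst_gap, growth_rate)
```

Any submultiplicative norm would serve for the subadditivity check; the limit $γ_{1}$ itself does not depend on the choice, since all norms on matrices are equivalent and the constants vanish after dividing by $n$ .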

Theorem 6 (longest increasing subsequence and first-passage percolation). For a uniform random permutation of $\{1, \dots, n\}$ , the length $L_{n}$ of the longest increasing subsequence satisfies $L_{n} / \sqrt{n} \to 2$ in probability; the subadditive structure (after Hammersley's Poissonised model) yields existence of the constant via Kingman, and the value $2$ comes from the later Vershik-Kerov / Logan-Shepp analysis. In first-passage percolation on $Z^{d}$ with i.i.d. non-negative edge weights, the passage time $T (0, n x)$ satisfies $T (0, n x) / n \to μ (x)$ a.s. for a deterministic norm-like time constant $μ$ , again by subadditivity. These are the canonical combinatorial applications of Kingman's theorem ^{[Hammersley-Welsh 1965]}.
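The growth of the longest increasing subsequence is easy to observe empirically. The sketch below computes $L_{n}$ by patience sorting (a standard $O (n \log n)$ method, swapped in here for illustration and not the route taken by the proofs cited above) and normalises by $\sqrt{n}$ , the scaling of the Vershik-Kerov / Logan-Shepp limit. The seed, the size $n = 10{,}000$ and the function name are illustrative assumptions.

```python
import bisect
import math
import random


def longest_increasing_subsequence_length(seq):
    """Length of the longest strictly increasing subsequence, via patience sorting."""
    tails = []          # tails[i] = smallest possible last element of an increasing run of length i + 1
    for x in seq:
        i = bisect.bisect_left(tails, x)
        if i == len(tails):
            tails.append(x)     # x extends the longest run found so far
        else:
            tails[i] = x        # x gives a smaller tail for runs of length i + 1
    return len(tails)


rng = random.Random(2)
n = 10_000
perm = list(range(1, n + 1))
rng.shuffle(perm)               # a uniform random permutation of {1, ..., n}

ratio = longest_increasing_subsequence_length(perm) / math.sqrt(n)
print(ratio)
```

At this size the ratio typically sits somewhat below $2$ , because the convergence is slow; the simulation illustrates the scaling and does not establish the constant.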

Synthesis. The four ergodic theorems are one circle of ideas seen at different resolutions, and the foundational reason they cohere is that each identifies a limit of averages with a projection onto the invariant $σ$-algebra. The mean theorem is the $L^{2}$ projection $P = E [\cdot ∣ I]$ obtained from the orthogonal splitting $H = ker (I - U) \oplus \overline{range (I - U)}$ ; the pointwise theorem is exactly the a.s. upgrade of that same projection, the maximal ergodic lemma supplying the control that $L^{2}$ geometry cannot. This is precisely the conditional-expectation and uniform-integrability machinery of 37.04.03 read dynamically, and it is dual to the martingale convergence theorem, where the limiting $σ$-algebra is a filtration tail rather than an invariant field. Putting these together, the i.i.d. strong law 37.02.02 is the ergodic special case in which the Kolmogorov zero-one law 37.02.01 makes $I$ $P$-degenerate, so the central insight (independence is one mechanism producing a degenerate invariant field) generalises to stationarity-plus-ergodicity. The subadditive theorem generalises the additive Birkhoff statement to families that only sub-add, and this single generalisation is what unlocks Lyapunov exponents, longest increasing subsequences, and first-passage percolation: the bridge is that submultiplicativity of norms becomes subadditivity of logs, and Kingman converts it into an a.s. growth rate equal to the inf-formula $γ = \inf_{n} E [X_{0, n}] / n$ .

Full proof set Master

Proposition 1 (Fekete's lemma underlies the time constant). If $(a_{n})$ is a real sequence with $a_{m + n} \leq a_{m} + a_{n}$ for all $m, n \geq 1$ , then $\lim_{n} a_{n} / n = \inf_{n} a_{n} / n \in [- \infty, \infty)$ .

Proof. Let $γ = \inf_{n} a_{n} / n$ . Fix a real number $c > γ$ and choose $m$ with $a_{m} / m < c$ . For $n > m$ write $n = q m + r$ with $0 \leq r < m$ . Subadditivity gives $a_{n} \leq q a_{m} + a_{r}$ (with $a_{0} := 0$ ), so $\frac{a_{n}}{n} \leq \frac{q m}{n} \cdot \frac{a_{m}}{m} + \frac{\max_{0 \leq r < m} a_{r}}{n} .$ As $n \to \infty$ we have $q m / n \to 1$ and the last term vanishes, so $\limsup_{n} a_{n} / n \leq a_{m} / m < c$ . Since $c > γ$ is arbitrary and $a_{n} / n \geq γ$ for every $n$ , $\limsup_{n} a_{n} / n \leq γ \leq \liminf_{n} a_{n} / n$ , giving equality. $□$
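A concrete subadditive sequence shows the lemma at work, including the fact that the infimum need not be attained. The sketch below uses the illustrative choice $a_{n} = γ n + \sqrt{n}$ with $γ = 0.25$ , which is subadditive because $\sqrt{m + n} \leq \sqrt{m} + \sqrt{n}$ ; the sequence and the constants are assumptions made for the example.

```python
import math

gamma = 0.25    # assumed value of the limit, chosen for illustration


def a(n):
    """A subadditive sequence: a_n = gamma * n + sqrt(n)."""
    return gamma * n + math.sqrt(n)


# Check a_{m+n} <= a_m + a_n on a finite range, with a small tolerance for rounding.
N = 200
is_subadditive = all(a(m + n) <= a(m) + a(n) + 1e-12
                     for m in range(1, N) for n in range(1, N))

# a_n / n = gamma + 1 / sqrt(n) decreases to gamma without ever reaching it.
ratios = [a(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]

print(is_subadditive, ratios)
```

Here $a_{n} / n$ stays strictly above $γ$ for every $n$ , so $\lim_{n} a_{n} / n$ equals the infimum although no single term achieves it.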

Proposition 2 (the maximal ergodic lemma yields the Hardy-Littlewood maximal bound). With $f \in L^{1}$ and $f^{*} = \sup_{n \geq 1} A_{n} f$ , for every $λ > 0$ , $P (f^{*} > λ) \leq \frac{1}{λ} E [∣ f ∣ 1_{\{f^{*} > λ\}}] \leq \frac{E ∣ f ∣}{λ} .$

Proof. Apply the maximal ergodic lemma to $g = f - λ$ . The event $\{f^{*} > λ\}$ means some average $A_{n} f > λ$ with $n \geq 1$ , equivalently some partial sum $S_{n} g = S_{n} f - n λ > 0$ , so $\{f^{*} > λ\} = ⋃_{n} \{M_{n} g > 0\}$ . The lemma gives $\int_{\{M_{n} g > 0\}} (f - λ) d P \geq 0$ for each $n$ ; the sets increase in $n$ and $f - λ$ is integrable, so dominated convergence gives $\int_{\{f^{*} > λ\}} (f - λ) d P \geq 0$ , i.e. $λ P (f^{*} > λ) \leq \int_{\{f^{*} > λ\}} f d P \leq E [∣ f ∣ 1_{\{f^{*} > λ\}}]$ . Dividing by $λ$ and bounding the indicator by $1$ gives the weak-type bound. $□$

Proposition 3 (ergodicity is equivalent to a mixing-free averaging criterion). A measure-preserving system is ergodic if and only if for all $A, B \in F$ , $\frac{1}{n} \sum_{k = 0}^{n - 1} P (T^{- k} A \cap B) \to P (A) P (B) .$

Proof. Suppose $T$ is ergodic. Apply Birkhoff to $f = 1_{A}$ : $A_{n} 1_{A} \to P (A)$ a.s. and in $L^{1}$ . Multiply by $1_{B}$ and take expectations, using bounded convergence: $\frac{1}{n} \sum_{k < n} E [1_{A} \circ T^{k} 1_{B}] \to P (A) E [1_{B}] = P (A) P (B)$ , and $E [1_{A} \circ T^{k} 1_{B}] = P (T^{- k} A \cap B)$ . Conversely, suppose the averaging criterion holds and let $A$ be invariant, $T^{- 1} A = A$ . Take $B = A$ : then $T^{- k} A \cap A = A$ for all $k$ , so the left side is constantly $P (A)$ , while the right side is $P (A)^{2}$ ; hence $P (A) = P (A)^{2}$ , forcing $P (A) \in \{0, 1\}$ . So every invariant set is $P$-degenerate. Every almost-invariant set differs from a strictly invariant set by a null set, so it is $P$-degenerate too, and $T$ is ergodic. $□$
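Both directions of the criterion can be seen on a ten-point cyclic system. The sketch below assumes $Ω = \{0, \dots, 9\}$ with the uniform measure and the maps $x \mapsto x + 1$ and $x \mapsto x + 2$ modulo $10$ (an illustrative setup with a hypothetical function name): the first is a single cycle and satisfies the criterion, the second splits into evens and odds and fails it for $A = B =$ the even numbers.

```python
from fractions import Fraction


def averaged_correlation(N, shift, A, B, n):
    """(1/n) * sum over k < n of P(T^{-k} A intersect B), for T(x) = x + shift mod N.

    P is the uniform measure on range(N); A and B are sets of points.
    """
    total = 0
    for k in range(n):
        # x lies in T^{-k} A exactly when T^k x = x + k * shift (mod N) lies in A.
        total += sum(1 for x in B if (x + k * shift) % N in A)
    return Fraction(total, n * N)


N = 10
A = {0, 2, 4, 6, 8}
B = {0, 2, 4, 6, 8}
product = Fraction(len(A), N) * Fraction(len(B), N)         # P(A) P(B)

# Averaging over one full period n = N gives the exact limit in both cases.
ergodic_value = averaged_correlation(N, 1, A, B, n=N)       # shift by 1: a single cycle
non_ergodic_value = averaged_correlation(N, 2, A, B, n=N)   # shift by 2: evens never meet odds

print(product, ergodic_value, non_ergodic_value)
```

For the shift by $2$ the set $A$ is invariant, so the average equals $P (A) = 1/2$ rather than $P (A)^{2} = 1/4$ , which is exactly the obstruction used in the converse half of the proof.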

Proposition 4 (subadditive limit has the stated mean). Under the hypotheses of Kingman's theorem, the a.s. limit $X = \lim_{n} X_{0, n} / n$ satisfies $E [X] = γ = \inf_{n} E [X_{0, n}] / n$ .

Proof. By Proposition 1 applied to $a_{n} = E [X_{0, n}]$ (subadditive in $n$ because $E [X_{0, m + n}] \leq E [X_{0, m}] + E [X_{m, m + n}]$ and $E [X_{m, m + n}] = E [X_{0, n}]$ by measure-preservation), $E [X_{0, n}] / n \to γ$ . Kingman's theorem asserts $X_{0, n} / n \to X$ in $L^{1}$ as well as a.s.; $L^{1}$ convergence implies convergence of means, so $E [X] = \lim_{n} E [X_{0, n} / n] = γ$ . For invariance, subadditivity gives $X_{0, n} \leq X_{0, 1} + X_{1, n}$ with $X_{1, n} = X_{0, n - 1} \circ T$ ; dividing by $n$ and letting $n \to \infty$ gives $X \leq X \circ T$ a.s. Since $E [X \circ T] = E [X]$ by measure-preservation, the non-negative variable $X \circ T - X$ has mean zero, so $X = X \circ T$ a.s. $□$

Connections Master

The strong law of large numbers 37.02.02 is the ergodic special case of Birkhoff's theorem: realise the i.i.d. sequence as coordinates on a product space, take $T$ the shift and $f$ the zeroth coordinate, and the Kolmogorov zero-one law makes the invariant $σ$ -algebra $P$ -degenerate so that $E [f ∣ I]$ collapses to $E [X_{1}]$ . This unit is the stationary-sequence generalisation that this dependency makes precise.
The Borel-Cantelli lemmas and Kolmogorov zero-one law 37.02.01 supply the reason i.i.d. shifts are ergodic: the invariant $σ$ -algebra sits inside the tail field, which the zero-one law makes $P$ -degenerate. Ergodicity is thus the dynamical face of the zero-one law, and the non-ergodic case is what happens when the tail field is replaced by a rich invariant field.
Doob's inequalities, uniform integrability, and $L^{p}$ convergence 37.04.03 furnish the $L^{1}$ half of every ergodic theorem here: the Birkhoff and Kingman limits hold in $L^{1}$ because the Cesàro averages of an identically-distributed family are uniformly integrable, and the maximal ergodic lemma is the additive-cocycle analogue of Doob's maximal inequality for martingales.
The $L^{p}$ and Hilbert-space theory 02.07.06 is the arena for von Neumann's mean theorem: the Koopman operator is an $L^{2}$ isometry, the invariant functions form a closed subspace, and the orthogonal decomposition $L^{2} = ker (I - U_{T}) \oplus \overline{range (I - U_{T})}$ is the Riesz-Fischer completeness of $L^{2}$ at work.
The ergodic theorem for Markov chains 37.05.07 is the Markovian instance: a positive-recurrent irreducible chain is a stationary ergodic sequence under its invariant law, and its time-average theorem $\frac{1}{n} \sum_{k < n} f (X_{k}) \to E_{π} [f]$ is Birkhoff specialised to the shift on path space, with the excursion decomposition replacing the abstract maximal lemma.

The dynamical-systems treatment of ergodic theory — mixing, weak mixing, entropy (Kolmogorov-Sinai), the isomorphism theory of Ornstein, and smooth ergodic theory (Pesin theory, SRB measures) — is a separate forthcoming curriculum spine. This unit deliberately keeps its emphasis on the probabilistic limit theorems for stationary sequences: Birkhoff, von Neumann, and Kingman as the stationary-sequence generalisation of the strong law of large numbers, rather than on the structural dynamics that those theorems also serve.

Historical & philosophical context Master

The ergodic theorems arose from a foundational problem in statistical mechanics: Boltzmann's ergodic hypothesis of the 1870s posited that a mechanical system's long-time average along a single trajectory equals its average over the constant-energy surface, justifying the replacement of intractable time-averages by computable phase-space averages. The hypothesis as literally stated — that a single orbit visits every point of the energy surface — is false on dimensional grounds, and the corrected quasi-ergodic hypothesis (the orbit is dense) was still too weak to license the averaging. The resolution came in 1931-1932 in two papers in the Proceedings of the National Academy of Sciences. John von Neumann's Proof of the quasi-ergodic hypothesis ^{[von Neumann 1932]} established the mean ( $L^{2}$ ) ergodic theorem first, using the spectral theory of unitary operators he had just developed; George Birkhoff's Proof of the ergodic theorem ^{[Birkhoff 1931]} then proved the stronger pointwise statement. Birkhoff's paper appeared in print before von Neumann's despite von Neumann having the result earlier, a priority episode documented in the correspondence reproduced by later historians of the period.

The decisive conceptual move was the recognition, sharpened over the following two decades by Khinchin, Hopf, and Kolmogorov, that the right hypothesis is not a property of individual orbits but the measure-theoretic degeneracy of the invariant $σ$-algebra. This reframed ergodicity as an indecomposability condition on an invariant measure and tied it directly to the law of large numbers: i.i.d. sequences are the basic examples of ergodic stationary sequences, their invariant field being $P$-degenerate by the zero-one law.

John Kingman's 1968 *Journal of the Royal Statistical Society* paper ^{[Kingman 1968]} introduced the subadditive ergodic theorem, motivated by problems in which the natural quantities are only subadditive rather than additive: products of random matrices, studied by Furstenberg and Kesten in 1960 ^{[Furstenberg-Kesten 1960]}, and the first-passage percolation and subadditive-process framework of Hammersley and Welsh in 1965 ^{[Hammersley-Welsh 1965]}. Kingman's theorem made the existence of Lyapunov exponents, time constants in percolation, and the longest-increasing-subsequence growth rate immediate consequences of a single subadditivity principle, and Steele and others subsequently gave shorter proofs that became the textbook standard.

Bibliography Master

@article{Birkhoff1931,
  author  = {Birkhoff, George D.},
  title   = {Proof of the ergodic theorem},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {17},
  number  = {12},
  year    = {1931},
  pages   = {656--660}
}

@article{vonNeumann1932,
  author  = {von Neumann, John},
  title   = {Proof of the quasi-ergodic hypothesis},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {18},
  number  = {1},
  year    = {1932},
  pages   = {70--82}
}

@article{Kingman1968,
  author  = {Kingman, John F. C.},
  title   = {The ergodic theory of subadditive stochastic processes},
  journal = {Journal of the Royal Statistical Society, Series B},
  volume  = {30},
  number  = {3},
  year    = {1968},
  pages   = {499--510}
}

@article{FurstenbergKesten1960,
  author  = {Furstenberg, Harry and Kesten, Harry},
  title   = {Products of random matrices},
  journal = {Annals of Mathematical Statistics},
  volume  = {31},
  number  = {2},
  year    = {1960},
  pages   = {457--469}
}

@incollection{HammersleyWelsh1965,
  author    = {Hammersley, John M. and Welsh, Dominic J. A.},
  title     = {First-passage percolation, subadditive processes, stochastic networks, and generalized renewal theory},
  booktitle = {Bernoulli-Bayes-Laplace Anniversary Volume},
  publisher = {Springer},
  year      = {1965},
  pages     = {61--110}
}

@book{Walters1982,
  author    = {Walters, Peter},
  title     = {An Introduction to Ergodic Theory},
  publisher = {Springer},
  series    = {Graduate Texts in Mathematics},
  volume    = {79},
  year      = {1982}
}

@book{Krengel1985,
  author    = {Krengel, Ulrich},
  title     = {Ergodic Theorems},
  publisher = {de Gruyter},
  year      = {1985}
}

@book{Steele1997,
  author    = {Steele, J. Michael},
  title     = {Probability Theory and Combinatorial Optimization},
  publisher = {SIAM},
  series    = {CBMS-NSF Regional Conference Series in Applied Mathematics},
  volume    = {69},
  year      = {1997}
}

@book{Durrett2019,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{Kallenberg2002,
  author    = {Kallenberg, Olav},
  title     = {Foundations of Modern Probability},
  edition   = {2},
  publisher = {Springer},
  year      = {2002}
}

Prerequisites

37.02.02
02.07.06
37.04.03

Tier anchors

beginner: Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) Ch. 6 (informal); Walters 1982 *An Introduction to Ergodic Theory* (Springer) Ch. 1 (the long-run-average picture for a single transformation)
intermediate: Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) §6.1-6.4 (stationary sequences, Birkhoff, the ergodic case of the SLLN, Kingman); Walters 1982 *An Introduction to Ergodic Theory* (Springer) §1.5-1.6, §2.1
master: Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) Ch. 6; Kallenberg 2002 *Foundations of Modern Probability* 2e (Springer) Ch. 10 (ergodic theorems, subadditivity); Krengel 1985 *Ergodic Theorems* (de Gruyter) Ch. 1, Ch. 6 (the subadditive theorem); Steele 1997 *Probability Theory and Combinatorial Optimization* (SIAM) Ch. 1 (Kingman applied to longest increasing subsequences)

References

Birkhoff — Proof of the ergodic theorem · Proceedings of the National Academy of Sciences 17 (1931), 656-660
von Neumann — Proof of the quasi-ergodic hypothesis · Proceedings of the National Academy of Sciences 18 (1932), 70-82
Kingman — The ergodic theory of subadditive stochastic processes · Journal of the Royal Statistical Society Series B 30 (1968), 499-510
Furstenberg-Kesten — Products of random matrices · Annals of Mathematical Statistics 31 (1960), 457-469
Hammersley-Welsh — First-passage percolation, subadditive processes, stochastic networks, and generalized renewal theory · in Bernoulli-Bayes-Laplace Anniversary Volume, Springer 1965, 61-110
Durrett — Probability: Theory and Examples, 5th edition · Cambridge University Press 2019, Ch. 6 (ergodic theorems)
Kallenberg — Foundations of Modern Probability, 2nd edition · Springer 2002, Ch. 10 (stationarity, invariance, ergodic theorems, subadditivity)

Estimated time

beginner: 18m
intermediate: 58m
master: 95m