37.02.03 · probability / 02-independence-laws-of-large-numbers

The Ergodic Theorems: Birkhoff, von Neumann, and Kingman

shipped3 tiersLean: none

Anchor (Master): Durrett 2019 *Probability: Theory and Examples* 5e (Cambridge) Ch. 6; Kallenberg 2002 *Foundations of Modern Probability* 2e (Springer) Ch. 10 (ergodic theorems, subadditivity); Krengel 1985 *Ergodic Theorems* (de Gruyter) Ch. 1, Ch. 6 (the subadditive theorem); Steele 1997 *Probability Theory and Combinatorial Optimization* (SIAM) Ch. 1 (Kingman applied to longest increasing subsequences)

Intuition Beginner

The strong law of large numbers handles a single, very clean situation: you average a long run of independent identical trials, and the average settles to the expected value. But the world is full of sequences that are not independent. The temperature on successive days, the position of a particle bouncing inside a box, the letters in a long stretch of English text — each entry depends on the ones before it. The ergodic theorems are the answer to a simple question: when does the running average of a dependent sequence still settle down to a fixed number?

The key idea is stationarity. A sequence is stationary when its statistical behaviour does not depend on where you start the clock: the chunk of the sequence you see between day 100 and day 200 looks, in distribution, exactly like the chunk between day 1000 and day 1100. Time has no preferred origin. Many real sequences with strong dependence are still stationary in this sense, and stationarity turns out to be the right replacement for independence.

Picture a stationary sequence as a single system evolving by one fixed rule, like a marble rolling on a frictionless table that reflects off the walls. Applying the rule shifts the whole system forward one step, and because the rule preserves the underlying "size" of regions, the long-run behaviour stays balanced. Birkhoff's theorem says: track any measurement of such a system and its time-average over a long run converges. Whether it converges to one universal number, or to a number depending on which run you got, is decided by whether the system is ergodic — whether it eventually visits everywhere, mixing the whole space rather than staying trapped in one region.

The takeaway: the ergodic theorems extend the law of large numbers from independent sequences to stationary ones, replacing the expected value by a time-average that converges for almost every run, and equals one universal number exactly when the system is ergodic.

Visual Beginner

Picture a single point hopping around the surface of a doughnut by a fixed rule, leaving a trail. If the rule is ergodic, the trail eventually covers the whole surface evenly; the fraction of time spent over any patch matches the area of that patch.

The winding trail is one long run of the system; the shaded patch is a region you might measure. The bars show that two different runs reach the same long-run average when the system is ergodic. If the system were not ergodic, the surface would split into separate zones and different runs could settle on different averages.

Worked example Beginner

We compute a time-average for the simplest interesting stationary system: rotating a circle by a fixed angle and tracking how often a point lands in the right half.

Step 1. The system. Take the circle as the numbers from up to , wrapping around so that is the same point as . The rule is: add and wrap around. Starting from , the orbit is which wraps to , then , and so on.

Step 2. The measurement. We measure when the point is in the right half (between and ) and otherwise. The first ten positions give measurements .

Step 3. The running average. Summing those ten measurements gives , so the average over the first ten steps is . Continue for many more steps and the average stays near .

Step 4. What the theorem predicts. Because is not a simple fraction of the circle in a way that makes the orbit repeat, the orbit spreads out evenly over the whole circle. The right half is exactly half the circle, so the long-run fraction of time spent there is . The time-average of our measurement converges to , which is also the area of the region we were detecting.

What this tells us: for this ergodic rotation, the long-run time-average of "fraction of steps in the right half" equals the size of the right half, . The average over time, taken along one orbit, reproduces the average over space. That equality of time-average and space-average is the whole content of the ergodic theorem in miniature.

Check your understanding Beginner

Formal definition Intermediate+

Throughout, is a probability space. The expectation is the Lebesgue integral against , and denotes the spaces of 02.07.06, with the Hilbert space of square-integrable variables.

Definition (measure-preserving transformation). A map is measurable if for all , and measure-preserving if for all . The quadruple is then a measure-preserving system. Its associated Koopman operator acts on functions by ; measure-preservation is equivalent to being an isometry of for every , since .

Definition (stationary sequence). A sequence of random variables is stationary if for every the shifted block has the same joint distribution as . Every stationary sequence arises from a measure-preserving system: set with the law of , let be the shift , and let be the zeroth coordinate, so that . An i.i.d. sequence is the special case where is a product measure.

Definition (invariant -algebra and ergodicity). A set is invariant if , and almost invariant if . The almost-invariant sets form a sub--algebra , the invariant -algebra. The system is ergodic if every almost-invariant set has , equivalently if every -measurable function is a.s. constant. A function is invariant if a.s.; ergodicity says all invariant functions are a.s. constant.

Definition (subadditive sequence). A doubly indexed family of integrable variables on a measure-preserving system is subadditive if the second relation being stationarity of the increments. The time constant is , finite provided for some constant . Choosing makes the subadditivity an equality and recovers the additive (Birkhoff) setting.

Counterexamples to common slips Intermediate+

  • Stationary is not i.i.d. The sequence for a single random variable (every term equal) is stationary but maximally dependent. Birkhoff still applies; the time-average is itself, an invariant non-constant limit. The system is not ergodic, which is exactly why the limit is a random variable rather than a constant.

  • Measure-preserving is about preimages, not images. The condition is . For non-invertible the forward image can differ from ; the doubling map on preserves Lebesgue measure through preimages while being two-to-one.

  • Ergodicity is a property of the pair , not of alone. The same map can be ergodic for one invariant measure and not another. An irrational rotation is ergodic for Lebesgue measure but not for a point mass on a periodic orbit (there are none) — and a rational rotation is ergodic for no atomless invariant measure at all, since its orbits are finite.

  • The invariant -algebra need not be degenerate. For the shift on a product space the invariant -algebra is the tail/exchangeable field, which the Kolmogorov zero-one law 37.02.01 forces to be -degenerate (every set has probability or ) — that is why i.i.d. sequences are ergodic. Drop independence and can be rich; the conditional expectation is then a genuine random limit.

  • Subadditivity is one-directional. From one gets by Fekete's lemma, but the pointwise limit equals only after the ergodic-theoretic argument; assuming the pointwise limit is the naive without proof is the slip Kingman's theorem repairs.

Key theorem with proof Intermediate+

Theorem (Birkhoff's pointwise ergodic theorem; Birkhoff 1931). Let be a measure-preserving system and . Then the Birkhoff averages converge almost surely and in to , the conditional expectation of on the invariant -algebra. If is ergodic the limit is the constant .

The analytic core is the maximal ergodic lemma; the pointwise statement follows by a squeeze on the limsup and liminf of the averages.

Lemma (maximal ergodic lemma). For write and for , and . Let . Then

Proof. On , for we have , hence rearranges through . Concretely, for , , and also . Taking the maximum over gives pointwise, because and the term is dominated by when , which holds since the right side is the bound forced on every . Therefore on , where so , Integrate over and extend the right side to all of using off : By measure-preservation , so the right side is .

Proof of the Theorem. Replacing by reduces to the case , for which we must show a.s. Set ; since and differ by in the relevant sense, is invariant, hence -measurable. Fix and let . Apply the maximal ergodic lemma to , which is invariant-set-restricted so that its Birkhoff sums on coincide with those of . The lemma gives ; on , holds because there forces some average . Hence using and . Thus , forcing ; as was arbitrary, a.s. Applying the same argument to gives a.s., so a.s. The convergence follows from a.s. convergence plus uniform integrability of , which holds because the family is uniformly integrable (each has the same distribution as ) and Cesàro averages of a uniformly integrable family are uniformly integrable 37.04.03.

Bridge. Birkhoff's theorem builds toward the entire theory of stationary sequences and appears again in the subadditive theorem below, where additive Birkhoff sums are the boundary case of a subadditive family. The foundational reason the average converges is that the invariant -algebra carries every limit: the time-average can only land on an -measurable function, and the maximal ergodic lemma pins that function to be . This is exactly the conditional-expectation structure of 37.04.03, now read dynamically: the projection onto invariants is the conditional expectation, which is why the mean (von Neumann) theorem below is the shadow of the same fact and is dual to the pointwise statement. Putting these together, the i.i.d. strong law 37.02.02 is the case where is the product shift and the zero-one law 37.02.01 collapses to the constants, so ; the central insight is that independence is one route to a degenerate invariant field, and stationarity-plus-ergodicity is the general route. The bridge is the identification of "time-average" with "projection onto the invariant -algebra".

Exercises Intermediate+

Advanced results Master

Theorem 1 (von Neumann mean ergodic theorem; von Neumann 1932). Let be a contraction of a Hilbert space — in particular the Koopman isometry of . Then strongly, where is the orthogonal projection onto the invariant subspace . For the projection is conditional expectation onto , so in . The mean theorem is the Hilbert-space shadow of Birkhoff: it gives rather than pointwise convergence but extends to any contraction and requires only the orthogonal decomposition [von Neumann 1932].

Theorem 2 (Birkhoff pointwise ergodic theorem; Birkhoff 1931). For and any measure-preserving , almost surely and in . The maximal ergodic lemma is the irreducible analytic content; the pointwise statement strictly strengthens the mean theorem on but neither contains the other in general, since convergence and a.s. convergence are distinct modes [Birkhoff 1931].

Theorem 3 (ergodic decomposition). Every invariant probability measure for a measurable transformation on a standard Borel space is a mixture (a barycentre) of ergodic invariant measures: there is a probability measure on the set of ergodic measures such that . Consequently the Birkhoff limit evaluated at is the integral of against the ergodic component through . This is the structural reason the non-ergodic case reduces to the ergodic one: indexes the ergodic components.

Theorem 4 (Kingman's subadditive ergodic theorem; Kingman 1968). Let be a subadditive family on a measure-preserving system with and . Then a.s. and in , where is an invariant random variable with ; if is ergodic, a.s. The additive case is Birkhoff. The standard modern proof (Steele's variant of Kingman, via the Burkholder-style decomposition or the Ackoglu-Krengel superadditive approach) controls by comparison against truncated additive cocycles and identifies the limit through the inf-formula [Kingman 1968].

Theorem 5 (Furstenberg-Kesten / multiplicative ergodic; Furstenberg-Kesten 1960). Let be a stationary ergodic sequence of random matrices with . Then a.s., the top Lyapunov exponent, equal to . This is Kingman applied to , submultiplicativity supplying subadditivity. Oseledets' multiplicative ergodic theorem refines this to a full Lyapunov spectrum with an associated measurable filtration [Furstenberg-Kesten 1960].

Theorem 6 (longest increasing subsequence and first-passage percolation). For a uniform random permutation of , the length of the longest increasing subsequence satisfies a.s.; the subadditive structure (after Hammersley's Poissonised model) yields existence of the constant via Kingman, and the value comes from the later Vershik-Kerov / Logan-Shepp analysis. In first-passage percolation on with i.i.d. non-negative edge weights, the passage time satisfies a.s. for a deterministic norm-like time constant , again by subadditivity. These are the canonical combinatorial applications of Kingman's theorem [Hammersley-Welsh 1965].

Synthesis. The four ergodic theorems are one circle of ideas seen at different resolutions, and the foundational reason they cohere is that each identifies a limit of averages with a projection onto the invariant -algebra. The mean theorem is the projection obtained from the orthogonal splitting ; the pointwise theorem is exactly the a.s. upgrade of that same projection, the maximal ergodic lemma supplying the control that geometry cannot. This is precisely the conditional-expectation and uniform-integrability machinery of 37.04.03 read dynamically, and it is dual to the martingale convergence theorem, where the limiting -algebra is a filtration tail rather than an invariant field. Putting these together, the i.i.d. strong law 37.02.02 is the ergodic special case in which the Kolmogorov zero-one law 37.02.01 makes -degenerate, so the central insight — independence is one mechanism producing a degenerate invariant field — generalises to stationarity-plus-ergodicity. The subadditive theorem generalises the additive Birkhoff statement to families that only sub-add, and this single generalisation is what unlocks Lyapunov exponents, longest increasing subsequences, and first-passage percolation: the bridge is that submultiplicativity of norms becomes subadditivity of logs, and Kingman converts it into an a.s. growth rate equal to the inf-formula .

Full proof set Master

Proposition 1 (Fekete's lemma underlies the time constant). If is a real sequence with for all , then .

Proof. Let . Fix and choose with . For write with . Subadditivity gives (with ), so As the last term vanishes, so . Since is arbitrary, , giving equality.

Proposition 2 (the maximal ergodic lemma yields the Hardy-Littlewood maximal bound). With and , for every ,

Proof. Apply the maximal ergodic lemma to on the invariant-restricted set . The event means some average , equivalently some partial sum , so . The lemma gives for each ; letting and using monotone convergence on the increasing sets, , i.e. . Dividing by and bounding the indicator by gives the weak-type bound.

Proposition 3 (ergodicity is equivalent to a mixing-free averaging criterion). A measure-preserving system is ergodic if and only if for all ,

Proof. Suppose is ergodic. Apply Birkhoff to : a.s. and in . Multiply by and take expectations, using bounded convergence: , and . Conversely, suppose the averaging criterion holds and let be invariant, . Take : then for all , so the left side is constantly , while the right side is ; hence , forcing . So every invariant set is -degenerate and is ergodic.

Proposition 4 (subadditive limit has the stated mean). Under the hypotheses of Kingman's theorem, the a.s. limit satisfies .

Proof. By Proposition 1 applied to (subadditive in by the cocycle relation and ), . Kingman's theorem asserts in as well as a.s.; convergence implies convergence of means, so . Invariance of follows from and the cocycle relation with , giving a.s. after dividing by and letting .

Connections Master

  • The strong law of large numbers 37.02.02 is the ergodic special case of Birkhoff's theorem: realise the i.i.d. sequence as coordinates on a product space, take the shift and the zeroth coordinate, and the Kolmogorov zero-one law makes the invariant -algebra -degenerate so that collapses to . This unit is the stationary-sequence generalisation that this dependency makes precise.

  • The Borel-Cantelli lemmas and Kolmogorov zero-one law 37.02.01 supply the reason i.i.d. shifts are ergodic: the invariant -algebra sits inside the tail field, which the zero-one law makes -degenerate. Ergodicity is thus the dynamical face of the zero-one law, and the non-ergodic case is what happens when the tail field is replaced by a rich invariant field.

  • Doob's inequalities, uniform integrability, and convergence 37.04.03 furnish the half of every ergodic theorem here: the Birkhoff and Kingman limits hold in because the Cesàro averages of an identically-distributed family are uniformly integrable, and the maximal ergodic lemma is the additive-cocycle analogue of Doob's maximal inequality for martingales.

  • The and Hilbert-space theory 02.07.06 is the arena for von Neumann's mean theorem: the Koopman operator is an isometry, the invariant functions form a closed subspace, and the orthogonal decomposition is the Riesz-Fischer completeness of at work.

  • The ergodic theorem for Markov chains 37.05.07 is the Markovian instance: a positive-recurrent irreducible chain is a stationary ergodic sequence under its invariant law, and its time-average theorem is Birkhoff specialised to the shift on path space, with the excursion decomposition replacing the abstract maximal lemma.

The dynamical-systems treatment of ergodic theory — mixing, weak mixing, entropy (Kolmogorov-Sinai), the isomorphism theory of Ornstein, and smooth ergodic theory (Pesin theory, SRB measures) — is a separate forthcoming curriculum spine. This unit deliberately keeps its emphasis on the probabilistic limit theorems for stationary sequences: Birkhoff, von Neumann, and Kingman as the stationary-sequence generalisation of the strong law of large numbers, rather than on the structural dynamics that those theorems also serve.

Historical & philosophical context Master

The ergodic theorems arose from a foundational problem in statistical mechanics: Boltzmann's ergodic hypothesis of the 1870s posited that a mechanical system's long-time average along a single trajectory equals its average over the constant-energy surface, justifying the replacement of intractable time-averages by computable phase-space averages. The hypothesis as literally stated — that a single orbit visits every point of the energy surface — is false on dimensional grounds, and the corrected quasi-ergodic hypothesis (the orbit is dense) was still too weak to license the averaging. The resolution came in 1931-1932 in two papers in the Proceedings of the National Academy of Sciences. John von Neumann's Proof of the quasi-ergodic hypothesis [von Neumann 1932] established the mean () ergodic theorem first, using the spectral theory of unitary operators he had just developed; George Birkhoff's Proof of the ergodic theorem [Birkhoff 1931] then proved the stronger pointwise statement. Birkhoff's paper appeared in print before von Neumann's despite von Neumann having the result earlier, a priority episode documented in the correspondence reproduced by later historians of the period.

The decisive conceptual move was the recognition, sharpened over the following two decades by Khinchin, Hopf, and Kolmogorov, that the right hypothesis is not a property of individual orbits but the measure-theoretic degeneracy of the invariant -algebra. This reframed ergodicity as an indecomposability condition on an invariant measure and tied it directly to the law of large numbers: i.i.d. sequences are exactly the ergodic stationary sequences whose invariant field is -degenerate by the zero-one law.

John Kingman's 1968 Journal of the Royal Statistical Society paper [Kingman 1968] introduced the subadditive ergodic theorem, motivated by problems where the natural quantities only sub-add rather than add — products of random matrices studied by Furstenberg and Kesten in 1960 [Furstenberg-Kesten 1960], and the first-passage percolation and subadditive-process framework of Hammersley and Welsh in 1965 [Hammersley-Welsh 1965]. Kingman's theorem made the existence of Lyapunov exponents, time constants in percolation, and the longest-increasing-subsequence growth rate immediate consequences of a single subadditivity principle, and Steele and others subsequently gave shorter proofs that became the textbook standard.

Bibliography Master

@article{Birkhoff1931,
  author  = {Birkhoff, George D.},
  title   = {Proof of the ergodic theorem},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {17},
  number  = {12},
  year    = {1931},
  pages   = {656--660}
}

@article{vonNeumann1932,
  author  = {von Neumann, John},
  title   = {Proof of the quasi-ergodic hypothesis},
  journal = {Proceedings of the National Academy of Sciences},
  volume  = {18},
  number  = {1},
  year    = {1932},
  pages   = {70--82}
}

@article{Kingman1968,
  author  = {Kingman, John F. C.},
  title   = {The ergodic theory of subadditive stochastic processes},
  journal = {Journal of the Royal Statistical Society, Series B},
  volume  = {30},
  number  = {3},
  year    = {1968},
  pages   = {499--510}
}

@article{FurstenbergKesten1960,
  author  = {Furstenberg, Harry and Kesten, Harry},
  title   = {Products of random matrices},
  journal = {Annals of Mathematical Statistics},
  volume  = {31},
  number  = {2},
  year    = {1960},
  pages   = {457--469}
}

@incollection{HammersleyWelsh1965,
  author    = {Hammersley, John M. and Welsh, Dominic J. A.},
  title     = {First-passage percolation, subadditive processes, stochastic networks, and generalized renewal theory},
  booktitle = {Bernoulli-Bayes-Laplace Anniversary Volume},
  publisher = {Springer},
  year      = {1965},
  pages     = {61--110}
}

@book{Walters1982,
  author    = {Walters, Peter},
  title     = {An Introduction to Ergodic Theory},
  publisher = {Springer},
  series    = {Graduate Texts in Mathematics},
  volume    = {79},
  year      = {1982}
}

@book{Krengel1985,
  author    = {Krengel, Ulrich},
  title     = {Ergodic Theorems},
  publisher = {de Gruyter},
  year      = {1985}
}

@book{Steele1997,
  author    = {Steele, J. Michael},
  title     = {Probability Theory and Combinatorial Optimization},
  publisher = {SIAM},
  series    = {CBMS-NSF Regional Conference Series},
  year      = {1997}
}

@book{Durrett2019,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{Kallenberg2002,
  author    = {Kallenberg, Olav},
  title     = {Foundations of Modern Probability},
  edition   = {2},
  publisher = {Springer},
  year      = {2002}
}