The Mixing Hierarchy: Mixing, Weak Mixing, and Ergodicity
Anchor (Master): Walters 1982 *An Introduction to Ergodic Theory* (Springer GTM 79) Ch. 1-2 (the hierarchy, weak mixing, spectral characterisations); Cornfeld-Fomin-Sinai 1982 *Ergodic Theory* (Springer Grundlehren 245) Ch. 4 (spectral theory, mixing of toral automorphisms and Bernoulli shifts); Halmos 1956 *Lectures on Ergodic Theory* (Chelsea) (mixing, weak mixing, the genericity of weak mixing); Glasner 2003 *Ergodic Theory via Joinings* (AMS) Ch. 6-7 (weak mixing via T×T and disjointness)
Intuition Beginner
Ergodicity, from the previous chapter, says a single orbit eventually visits every region in correct proportion — but it says nothing about how the visits are spread out in time. A rule can be ergodic and still march in lockstep: rotate a circle by a fixed irrational step and the orbit fills the circle evenly, yet two nearby starting points stay exactly the same distance apart forever. The rule rearranges the circle but never scrambles it. Mixing is the stronger demand that the rule genuinely scrambles — that after enough steps the system forgets where it started.
Here is the everyday picture. Pour a drop of cream into coffee and stir. At first the cream is a tight blob; after a few stirs it streaks; after many stirs it is spread so uniformly that every spoonful of coffee holds the same tiny fraction of cream, no matter which spoonful you take. Mixing is exactly this: take any region $A$ and any region $B$; follow $A$ forward many steps; the part of the moved-$A$ that now sits inside $B$ approaches the size of $A$ times the size of $B$, as if $A$ had been smeared evenly across the whole space and simply meets $B$ in proportion to $B$'s size.
Between "fills evenly" (ergodic) and "scrambles completely" (mixing) sits a middle condition called weak mixing: the system scrambles eventually, but is allowed rare bad moments when it briefly lines up again — as long as those bad moments are vanishingly sparse in the long run. So three conditions stack up: mixing is the strongest, weak mixing is in the middle, and ergodicity is the weakest. Each is strictly stronger than the next: there are systems that are ergodic but not weakly mixing, and systems that are weakly mixing but not mixing.
The takeaway: ergodicity guarantees even visiting; mixing guarantees the system forgets its starting point, so that distant-in-time observations become statistically independent; weak mixing is the in-between condition that forgets the past except for a negligibly thin set of exceptional times. This three-step ladder — mixing, then weak mixing, then ergodic — is the mixing hierarchy, and it is the backbone of how we measure how thoroughly a dynamical rule disorders the space.
Visual Beginner
Picture a square blob of dye dropped into a fluid that is being stirred by a fixed rule. Track how much of the dye overlaps a fixed test patch as the stirring proceeds.
The top row shows a rotation merely sliding the blob, so its overlap with the test patch keeps oscillating: even visiting, never scrambling. The middle row shows mixing stretching the blob into fine filaments until it coats the square evenly. The graph contrasts all three: the rotation never settles, weak mixing settles except for ever-sparser spikes, and mixing settles down for good.
Worked example Beginner
We compare a half-turn rotation (a coarse stand-in for the ergodic-but-not-mixing irrational rotation) with doubling-style scrambling on the circle, watching whether overlaps settle.
Step 1. A non-mixing rule. Take the circle as the numbers from $0$ up to $1$, wrapping. The rule adds $1/2$ and wraps. The starting blob $A$ is the arc from $0$ to $1/2$ (half the circle). Apply the rule: $A$ moves to the arc from $1/2$ to $1$. Apply again: it moves back to $0$ to $1/2$. So the moved blob alternates between the two halves forever and never spreads out.
Step 2. Test the overlap. Let $B$ be the arc from $0$ to $1/2$. The size of $A$ is $1/2$ and the size of $B$ is $1/2$, so mixing would require the overlap to settle near $1/4$. But the actual overlap of moved-$A$ with $B$ is the full $1/2$ at even steps and $0$ at odd steps. It bounces between $1/2$ and $0$ and never approaches $1/4$. This rule is not mixing.
Step 3. A scrambling rule. Now use the doubling rule: replace $x$ by $2x$ and wrap. Take $A$ to be the small arc from $0$ to $1/4$ (size $1/4$). One step doubles it to the arc $0$ to $1/2$; the next step doubles that to the whole circle, but because of the wrap it arrives as two evenly spaced copies; after a few more steps the image is many thin evenly spaced strips covering the circle uniformly.
Step 4. Test the overlap again. Let $B$ be any arc, of size $b$. Once the image of $A$ has become many thin evenly spaced strips, the part of those strips landing in $B$ has size equal to the size of $B$, namely $b$, times the total size of $A$, namely $1/4$, giving overlap $b/4$, and it stays there. The overlap settles to size-of-$A$ times size-of-$B$.
What this tells us: the rotation merely relocates a blob, so overlaps oscillate and never settle, which rules out mixing. (The half-turn is periodic, so it is ergodic only at the coarse level of the two halves; an irrational rotation is genuinely ergodic and fails to mix for the same reason.) The doubling rule shreds a blob into evenly spread strips, so overlaps settle to the product of the two sizes, and that settling is mixing. The difference between relocating and shredding is the whole content of the mixing hierarchy.
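The overlaps in Steps 2 and 4 can be checked exactly with rational arithmetic. The following is a minimal sketch under stated assumptions: the circle is taken as $[0,1)$, the half-turn uses $A = B = [0, 1/2)$, the doubling rule uses $A = [0, 1/4)$, and the test arc $B = [0, 1/2)$ for the doubling rule is an arbitrary illustrative choice. The function names are hypothetical.

```python
from fractions import Fraction


def overlap_half_turn(n):
    """Size of (A moved n steps) meeting B for the rule x -> x + 1/2 mod 1,
    with A = B = [0, 1/2)."""
    # An even number of half-turns returns A to itself; an odd number
    # carries it onto the opposite half, which misses B entirely.
    return Fraction(1, 2) if n % 2 == 0 else Fraction(0)


def overlap_doubling(n):
    """Size of {x in B : n doublings of x land in A} for x -> 2x mod 1,
    with A = [0, 1/4) and B = [0, 1/2) (B is an assumed choice)."""
    cells = 2 ** (n + 2)  # split [0, 1) into cells of width 2^-(n+2)
    # n doublings stretch cell j onto the quarter-arc numbered j mod 4,
    # so cell j ends up inside A exactly when j % 4 == 0.
    # Cell j lies in B exactly when j < cells // 2.
    hits = sum(1 for j in range(cells) if j % 4 == 0 and j < cells // 2)
    return Fraction(hits, cells)


if __name__ == "__main__":
    for n in range(6):
        print(n, overlap_half_turn(n), overlap_doubling(n))
```

With these choices the half-turn overlap alternates between $1/2$ and $0$, while the doubling overlap equals $1/4 \cdot 1/2 = 1/8$ from the first step onward.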
Check your understanding Beginner
Formal definition Intermediate+
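Throughout, $(X, \mathcal{B}, \mu, T)$ is a measure-preserving system on a probability space in the sense of 38.04.01, with Koopman operator $U_T f = f \circ T$ acting as an isometry of $L^2(\mu)$ 02.11.08, where $\langle f, g \rangle = \int f \bar g \, d\mu$. Write $L^2_0(\mu) = \{ f \in L^2(\mu) : \int f \, d\mu = 0 \}$ for the orthocomplement of the constants. The correlation of $f, g \in L^2(\mu)$ at lag $n$ is $\langle U_T^n f, g \rangle - \int f \, d\mu \int \bar g \, d\mu$.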
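Definition (strong mixing). The system is (strongly) mixing if for all $A, B \in \mathcal{B}$, $$\lim_{n \to \infty} \mu(T^{-n}A \cap B) = \mu(A)\,\mu(B).$$ Equivalently (by an approximation argument), $\langle U_T^n f, g \rangle \to \int f \, d\mu \int \bar g \, d\mu$ for all $f, g \in L^2(\mu)$; on $L^2_0(\mu)$ this reads $\langle U_T^n f, g \rangle \to 0$: the correlations decay to zero.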
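Definition (weak mixing). The system is weakly mixing if for all $A, B \in \mathcal{B}$, $$\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} \bigl| \mu(T^{-n}A \cap B) - \mu(A)\,\mu(B) \bigr| = 0.$$
The absolute value inside the Cesàro average distinguishes this from ergodicity, where (by 38.04.02, characterisation (4)) only the signed average $\frac{1}{N} \sum_{n=0}^{N-1} \bigl( \mu(T^{-n}A \cap B) - \mu(A)\,\mu(B) \bigr)$ is required to vanish.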
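Definition (eigenfunction; continuous spectrum). A function $f \in L^2(\mu)$, $f \neq 0$, is an eigenfunction of $U_T$ with eigenvalue $\lambda \in \mathbb{C}$ if $U_T f = \lambda f$, i.e. $f(Tx) = \lambda f(x)$ a.e.; necessarily $|\lambda| = 1$ since $U_T$ is an isometry. The constants are always eigenfunctions with $\lambda = 1$. The system has continuous spectrum on $L^2_0(\mu)$ if $U_T$ has no eigenfunction in $L^2_0(\mu)$: the only eigenfunctions are the constants.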
The hierarchy. The three conditions are nested: $$\text{mixing} \implies \text{weak mixing} \implies \text{ergodic}.$$ The first implication is immediate: termwise convergence $a_n \to 0$ forces $\frac{1}{N} \sum_{n=0}^{N-1} |a_n| \to 0$ (Cesàro means of a null sequence vanish), applied to $a_n = \mu(T^{-n}A \cap B) - \mu(A)\,\mu(B)$. The second follows because the signed Cesàro average is dominated by the absolute one, recovering ergodicity characterisation (4). Both implications are strict (canonical examples below).
Canonical examples. (i) Irrational rotation $R_\alpha(x) = x + \alpha \bmod 1$: ergodic but not weakly mixing, since $e(x) = e^{2\pi i x}$ satisfies $U_{R_\alpha} e = e^{2\pi i \alpha} e$ and is therefore a nonconstant eigenfunction. (ii) Doubling map $x \mapsto 2x \bmod 1$ and every Bernoulli shift: mixing for the natural measure. (iii) Hyperbolic toral automorphism $T_M(x) = Mx \bmod \mathbb{Z}^d$, $M \in GL_d(\mathbb{Z})$ with no eigenvalue on the unit circle (the cat map $M = \begin{psmallmatrix}2&1\\1&1\end{psmallmatrix}$ for $d = 2$): mixing. (iv) Chacón's map and rank-one rigid constructions: weakly mixing but not mixing.
Counterexamples to common slips Intermediate+
Ergodic is not weakly mixing. The signed Cesàro average vanishing (ergodicity) is genuinely weaker than the absolute Cesàro average vanishing (weak mixing). The irrational rotation has ergodic equidistribution but its correlations oscillate without their absolute averages decaying, because $e^{2\pi i x}$ is a surviving eigenfunction.
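The gap between the signed and the absolute Cesàro average can be seen numerically on the rotation's eigenfunction. This is an illustrative sketch, assuming the rotation $x \mapsto x + \alpha \bmod 1$ with $\alpha = \sqrt{2}$ and the eigenfunction $e(x) = e^{2\pi i x}$, for which $\langle U^n e, e \rangle = e^{2\pi i n \alpha}$; the function names are hypothetical.

```python
import cmath
import math


def rotation_correlations(alpha, N):
    """Correlations <U^n e, e> = exp(2*pi*i*n*alpha) for n = 0..N-1,
    where e(x) = exp(2*pi*i*x) and the map is x -> x + alpha mod 1."""
    return [cmath.exp(2j * math.pi * n * alpha) for n in range(N)]


def cesaro_signed(c):
    """Modulus of the plain average: small when the terms cancel."""
    return abs(sum(c) / len(c))


def cesaro_absolute(c):
    """Average of the moduli: cancellation cannot help here."""
    return sum(abs(z) for z in c) / len(c)


if __name__ == "__main__":
    c = rotation_correlations(math.sqrt(2), 1000)
    print("signed:", cesaro_signed(c))
    print("absolute:", cesaro_absolute(c))
```

The signed average is small because the unit vectors cancel, while the absolute average stays at $1$ because every term has modulus $1$.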
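Weak mixing is not mixing. Weak mixing only kills correlations along a density-one set of times; a thin set of exceptional times where correlations stay large is permitted. Rigid rank-one constructions exploit exactly this: there is a sequence $n_k \to \infty$ with $U_T^{n_k} \to I$ strongly, so $\langle U_T^{n_k} f, f \rangle \to \|f\|^2$, not $0$; mixing fails while weak mixing holds. Chacón's map fails to mix by a partial form of the same mechanism: along the tower heights a fixed positive proportion of each level returns to itself, which keeps correlations away from $0$.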
Mixing of all orders is a separate, stronger ladder. The condition defined here is mixing of order two. Whether two-fold mixing implies higher-order mixing (Rokhlin's problem) is open in general; do not assume the term "mixing" includes all orders unless stated.
Eigenvalues lie on the unit circle, never inside. Since $U_T$ is an isometry, $|\lambda| = 1$ for every eigenvalue; an apparent eigenvalue with $|\lambda| < 1$ signals a computational error. Weak mixing is the statement that $\lambda = 1$ is the only eigenvalue and it is simple.
Weak mixing is a property of $T \times T$, not of $T$ alone. The clean characterisation is that $T$ is weakly mixing iff $T \times T$ is ergodic; but $T \times T$ ergodic is strictly stronger than $T$ ergodic, and it is exactly this gap that weak mixing fills. Checking ergodicity of $T$ does not detect weak mixing.
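For the irrational rotation the failure of product ergodicity is explicit: the function $F(x, y) = e^{2\pi i (x - y)}$ is nonconstant yet unchanged by $R_\alpha \times R_\alpha$. The sketch below checks this invariance numerically at random points; it assumes $\alpha = \sqrt{2}$, and the helper names are hypothetical.

```python
import cmath
import math
import random


def F(x, y):
    """F(x, y) = e(x) * conj(e(y)) with e(x) = exp(2*pi*i*x)."""
    return cmath.exp(2j * math.pi * (x - y))


def product_rotation(x, y, alpha):
    """One step of R_alpha x R_alpha on the 2-torus [0, 1)^2."""
    return ((x + alpha) % 1.0, (y + alpha) % 1.0)


def max_invariance_defect(alpha, trials=1000, seed=0):
    """Largest |F(T(x, y)) - F(x, y)| seen over random sample points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        worst = max(worst, abs(F(*product_rotation(x, y, alpha)) - F(x, y)))
    return worst


if __name__ == "__main__":
    print("defect:", max_invariance_defect(math.sqrt(2)))
    print("F is nonconstant:", abs(F(0.25, 0.0) - F(0.0, 0.0)))
```

The defect is zero up to floating-point rounding, so $F$ is a nonconstant invariant function and $R_\alpha \times R_\alpha$ is not ergodic even though $R_\alpha$ is.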
Key theorem with proof Intermediate+
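Theorem (Koopman-von Neumann; the equivalence of weak-mixing characterisations). For a measure-preserving system $(X, \mathcal{B}, \mu, T)$ the following are equivalent:
- (1) $T$ is weakly mixing: $\frac{1}{N} \sum_{n=0}^{N-1} \bigl| \mu(T^{-n}A \cap B) - \mu(A)\,\mu(B) \bigr| \to 0$ for all $A, B \in \mathcal{B}$.
- (2) For all $f, g \in L^2(\mu)$, $\frac{1}{N} \sum_{n=0}^{N-1} \bigl| \langle U_T^n f, g \rangle - \int f \, d\mu \int \bar g \, d\mu \bigr| \to 0$.
- (3) $U_T$ has no eigenfunction in $L^2_0(\mu)$: the only eigenfunctions of $U_T$ are the constants (continuous spectrum on $L^2_0(\mu)$).
- (4) $T \times T$ is ergodic on $(X \times X, \mu \otimes \mu)$.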
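The proof rests on the Koopman-von Neumann lemma: a bounded sequence $(a_n)_{n \ge 0}$ of nonnegative reals satisfies $\frac{1}{N} \sum_{n=0}^{N-1} a_n \to 0$ if and only if there is a set $J \subseteq \mathbb{N}$ of density one (meaning $\frac{1}{N} |J \cap [0, N)| \to 1$) with $a_n \to 0$ as $n \to \infty$ along $J$; and this holds iff $\frac{1}{N} \sum_{n=0}^{N-1} a_n^2 \to 0$ [Koopman-von Neumann 1932].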
Proof. We prove the lemma, then close the loop $(3) \Rightarrow (2) \Rightarrow (1) \Rightarrow (3)$ and $(3) \Leftrightarrow (4)$.
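The lemma. Suppose $\frac{1}{N} \sum_{n=0}^{N-1} a_n \to 0$. For each integer $k \ge 1$ the set $E_k = \{ n : a_n \ge 1/k \}$ has upper density zero, since $\frac{1}{N} |E_k \cap [0, N)| \le \frac{k}{N} \sum_{n=0}^{N-1} a_n \to 0$ by Markov's inequality. Choose an increasing sequence $N_1 < N_2 < \cdots$ with $\frac{1}{N} |E_k \cap [0, N)| < 1/k$ for all $N \ge N_k$, and set $J = \mathbb{N} \setminus \bigcup_k \bigl( E_k \cap [N_k, N_{k+1}) \bigr)$. Because $E_1 \subseteq E_2 \subseteq \cdots$, the removed set meets $[0, N)$ inside $E_k$ whenever $N_k \le N < N_{k+1}$, so $J$ has density one, and for $n \in J$ with $n \ge N_k$ one has $a_n < 1/k$, so $a_n \to 0$ along $J$. Conversely, if $a_n \to 0$ along a density-one set $J$ and $a_n \le C$, then for any $N$, $\frac{1}{N} \sum_{n=0}^{N-1} a_n \le \frac{1}{N} \sum_{n < N,\, n \in J} a_n + C \, \frac{|[0, N) \setminus J|}{N}$, and both terms vanish in the limit. The equivalence with $\frac{1}{N} \sum_{n=0}^{N-1} a_n^2 \to 0$ follows by applying the density-one criterion to both $(a_n)$ and $(a_n^2)$ (the same $J$ works since $a_n \to 0$ along $J$ iff $a_n^2 \to 0$ along $J$, using boundedness).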
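$(3) \Rightarrow (2)$. Assume $U_T$ has no nonconstant eigenfunction. It suffices to treat $f, g \in L^2_0(\mu)$ (subtract the means; the constant parts contribute exactly $\int f \, d\mu \int \bar g \, d\mu$). On $L^2_0(\mu)$ we must show $\frac{1}{N} \sum_{n=0}^{N-1} |\langle U_T^n f, g \rangle| \to 0$. The correlation sequence $c_n = \langle U_T^n f, f \rangle$ (extended by $c_{-n} = \overline{c_n}$) is positive-definite, so by the Herglotz-Bochner theorem there is a finite positive measure $\sigma_f$ (the spectral measure of $f$) on the circle $\mathbb{T}$ with $c_n = \int_{\mathbb{T}} z^n \, d\sigma_f(z)$. By the lemma it is enough to show $\frac{1}{N} \sum_{n=0}^{N-1} |c_n|^2 \to 0$. Compute $$\frac{1}{N} \sum_{n=0}^{N-1} |c_n|^2 = \iint \Bigl( \frac{1}{N} \sum_{n=0}^{N-1} (z \bar w)^n \Bigr) \, d\sigma_f(z) \, d\sigma_f(w).$$ The inner average tends to $\mathbf{1}_{\{z = w\}}$ pointwise and is bounded by $1$, so by dominated convergence the limit is $\sum_{z} \sigma_f(\{z\})^2$, the sum of squares of the atoms of $\sigma_f$. An atom of $\sigma_f$ at $z_0$ corresponds to an eigenfunction of $U_T$ with eigenvalue $z_0$ (the projection of $f$ onto the $z_0$-eigenspace is nonzero); since $f \in L^2_0(\mu)$ and there are no nonconstant eigenfunctions, $\sigma_f$ is non-atomic and the limit is $0$. Polarisation extends from $\langle U_T^n f, f \rangle$ to $\langle U_T^n f, g \rangle$, giving (2).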
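$(2) \Rightarrow (1)$ is the specialisation $f = \mathbf{1}_A$, $g = \mathbf{1}_B$, since $\langle U_T^n \mathbf{1}_A, \mathbf{1}_B \rangle = \mu(T^{-n}A \cap B)$ and $\int \mathbf{1}_A \, d\mu \int \mathbf{1}_B \, d\mu = \mu(A)\,\mu(B)$.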
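$(1) \Rightarrow (3)$. Suppose $U_T f = \lambda f$ with $f \in L^2_0(\mu)$, $f \neq 0$; we derive a contradiction. The function $|f|$ then satisfies $|f| \circ T = |\lambda|\,|f| = |f|$, so $|f|$ is $T$-invariant; weak mixing implies ergodicity, so $|f|$ is a.e. a nonzero constant $c$. Normalise $c = 1$. Approximating $f$ by simple functions built from the sets $f^{-1}(I)$ over arcs $I$ of the unit circle, (1) passes from sets to $f$, so the correlation is governed by $\langle U_T^n f, f \rangle = \lambda^n$, whose modulus is $1$ for every $n$; then $\frac{1}{N} \sum_{n=0}^{N-1} |\langle U_T^n f, f \rangle| = 1 \not\to 0$, contradicting $(1) \Rightarrow (2)$ applied to $f$. Hence no nonconstant eigenfunction exists.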
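$(3) \Leftrightarrow (4)$. The Koopman operator of $T \times T$ on $L^2(\mu \otimes \mu)$ is $U_T \otimes U_T$, with eigenvalues the products of eigenvalues of $U_T$. $T \times T$ is ergodic iff $1$ is a simple eigenvalue of $U_T \otimes U_T$ (by 38.04.02, characterisation (3)). If $U_T$ has a nonconstant eigenfunction $f$ with eigenvalue $\lambda$, then $F(x, y) = f(x)\,\overline{f(y)}$ is a nonconstant eigenfunction of $U_T \otimes U_T$ with eigenvalue $\lambda \bar\lambda = 1$, so $T \times T$ is non-ergodic. Conversely, if $U_T$ has only constant eigenfunctions, the spectral measures on $L^2_0(\mu)$ are non-atomic, the same atom-counting computation shows $1$ is simple for $U_T \otimes U_T$, and $T \times T$ is ergodic.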
Bridge. This theorem builds toward the spectral viewpoint of 38.05.02, where the absence of eigenfunctions becomes the statement that the maximal spectral type of $U_T$ on $L^2_0(\mu)$ is continuous, and it appears again in the Halmos-von Neumann classification, which is the opposite extreme: pure point spectrum identifies the system with a group rotation. The foundational reason the equivalences hold is that a single object, the spectral measure furnished by Herglotz-Bochner, encodes the whole correlation sequence, and weak mixing is precisely the vanishing of its atoms; this is the Koopman-von Neumann dichotomy that a Cesàro-null nonnegative sequence is null along density one. The central insight is that weak mixing is the dynamical face of continuous spectrum, the way ergodicity is the dynamical face of a simple eigenvalue, so weak mixing generalises ergodicity from "the eigenvalue $1$ is simple" to "$1$ is the only eigenvalue." Putting these together, the $T \times T$ criterion is dual to the eigenfunction criterion, because an eigenfunction of $U_T$ and its conjugate manufacture an invariant function on the product, and the bridge is that self-joinings detect exactly the rigidity that a thin set of correlation-resonant times can hide.
Exercises Intermediate+
Advanced results Master
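Theorem 1 (the strict hierarchy). For measure-preserving systems, (strong) mixing $\implies$ weak mixing $\implies$ ergodic, and both implications are strict. Strictness of the first is witnessed by weakly mixing rigid rank-one systems: there is $n_k \to \infty$ with $U_T^{n_k} \to I$ strongly, so $\langle U_T^{n_k} f, f \rangle \to \|f\|^2 \neq 0$ for nonzero $f \in L^2_0(\mu)$, defeating mixing, while the system is weakly mixing; Chacón's transformation is a further witness, failing to mix through partial rather than full rigidity. Strictness of the second is witnessed by the irrational rotation, ergodic with the nonconstant eigenfunction $e^{2\pi i x}$ obstructing weak mixing [Halmos 1956].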
Theorem 2 (spectral characterisation of the hierarchy). Let $\sigma$ denote the maximal spectral type of $U_T$ on $L^2_0(\mu)$. Then: $T$ is ergodic iff $U_T$ restricted to $L^2_0(\mu)$ has no eigenvalue $1$ (equivalently $1$ is a simple eigenvalue of $U_T$ on $L^2(\mu)$); $T$ is weakly mixing iff $\sigma$ is continuous (no atoms) on $\mathbb{T}$, i.e. $U_T$ has no eigenfunction off the constants; $T$ is mixing iff $\sigma$ is a Rajchman measure, $\hat\sigma(n) \to 0$ as $|n| \to \infty$. Since every Rajchman measure is continuous but not conversely (there exist continuous measures, e.g. certain Riesz products, with non-vanishing Fourier coefficients along a subsequence), the spectral picture reproduces the strict hierarchy at the level of the spectral measure [Cornfeld-Fomin-Sinai 1982].
Theorem 3 (mixing of algebraic and Bernoulli systems). Bernoulli shifts are mixing of all orders and are the strongest models in the hierarchy (Kolmogorov / K-systems sit just below Bernoulli). A toral automorphism $T_M$ is mixing for Haar measure iff it is ergodic iff no eigenvalue of $M$ is a root of unity; the hyperbolic ones (no eigenvalue on the unit circle, the cat map being the prototype) are Bernoulli by the Adler-Weiss and Sinai theory of Markov partitions, hence mixing of all orders. The character computation $U_{T_M} e_k = e_{M^{\top} k}$, with $e_k(x) = e^{2\pi i \langle k, x \rangle}$, exhibits the mixing as the escape of every nonzero frequency $k$ to infinity under $(M^{\top})^n$ [Walters 1982].
Theorem 4 (genericity: weak mixing is generic, mixing is meagre; Rokhlin, Halmos). In the group of invertible measure-preserving transformations of a Lebesgue probability space, equipped with the weak topology, the weakly mixing transformations form a dense $G_\delta$ (residual) set, while the mixing transformations form a meagre (first-category) set and the ergodic transformations a dense $G_\delta$. Thus the "typical" measure-preserving transformation is weakly mixing but not mixing: the middle rung of the hierarchy is the generic one. The proof uses the Kakutani-Rokhlin tower approximation of 38.04.01 to perturb any transformation into one with prescribed weak-mixing behaviour while showing the mixing transformations lie in a countable union of nowhere-dense sets [Rokhlin 1948].
Theorem 5 (weak mixing as disjointness from rotations; the Koopman-von Neumann dichotomy). A system is weakly mixing iff it is disjoint (in Furstenberg's sense) from every ergodic rotation, equivalently iff it has no nontrivial factor with discrete spectrum. The Koopman-von Neumann decomposition splits $L^2(\mu)$ into the closed span of eigenfunctions (the Kronecker factor, a group rotation by Halmos-von Neumann) and its complement, on which the action is weakly mixing relative to the factor; weak mixing of the whole system is precisely the vanishing of the Kronecker factor. This relative decomposition is the first step of the Furstenberg structure theorem underlying the ergodic-theoretic proof of Szemerédi's theorem [Koopman-von Neumann 1932].
Synthesis. The five results are one statement read at successive depths, and the foundational reason they cohere is that the spectral measure $\sigma$ of the Koopman operator on $L^2_0(\mu)$ is a single invariant whose increasing regularity (having the eigenvalue $1$ absent, then being continuous, then being Rajchman) is exactly the ascent through ergodic, weakly mixing, and mixing. This is the central insight that the dynamical hierarchy and the spectral hierarchy are the same hierarchy: an atom of $\sigma$ at $\lambda$ is an eigenfunction, a continuous $\sigma$ is the death of all eigenfunctions, and a Rajchman $\sigma$ is the termwise decay of correlations. Weak mixing is dual to discrete spectrum (the Kronecker factor of Theorem 5 is the maximal rotation quotient, and weak mixing is exactly its collapse), so the generic-transformation result is the foundational reason the subject is built around weak mixing rather than mixing: putting these together, the typical system has continuous but non-Rajchman spectrum. This is the Koopman-von Neumann dichotomy made structural, and it generalises from a single transformation to the relative theory over a factor, where weak mixing becomes weak mixing relative to the Kronecker factor and the bridge is Furstenberg's tower of such extensions, which carries the whole apparatus forward into additive combinatorics and the spectral theory of 38.05.02.
Full proof set Master
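Proposition 1 (strong mixing implies weak mixing implies ergodic). For any measure-preserving system, mixing $\implies$ weak mixing $\implies$ ergodic.
Proof. Set $a_n = \mu(T^{-n}A \cap B) - \mu(A)\,\mu(B)$. If $T$ is mixing then $a_n \to 0$, so $|a_n| \to 0$ and its Cesàro means vanish, giving weak mixing. If $T$ is weakly mixing then $\frac{1}{N} \sum_{n=0}^{N-1} |a_n| \to 0$, hence $\bigl| \frac{1}{N} \sum_{n=0}^{N-1} a_n \bigr| \le \frac{1}{N} \sum_{n=0}^{N-1} |a_n| \to 0$, which is characterisation (4) of ergodicity from 38.04.02.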
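Proposition 2 (Koopman-von Neumann lemma). A bounded sequence $a_n \ge 0$ has $\frac{1}{N} \sum_{n=0}^{N-1} a_n \to 0$ iff there is $J \subseteq \mathbb{N}$ of density one with $a_n \to 0$ along $J$, iff $\frac{1}{N} \sum_{n=0}^{N-1} a_n^2 \to 0$.
Proof. If the Cesàro mean vanishes, the sets $\{ n : a_n \ge \varepsilon \}$ have density zero by Markov's inequality $\frac{1}{N} |\{ n < N : a_n \ge \varepsilon \}| \le \frac{1}{\varepsilon N} \sum_{n=0}^{N-1} a_n$; a diagonal choice of thresholds assembles a density-one $J$ with $a_n \to 0$ along $J$. Conversely a density-one $J$ with $a_n \to 0$ along it forces $\frac{1}{N} \sum_{n=0}^{N-1} a_n \to 0$. The square equivalence is Cauchy-Schwarz one way ($\frac{1}{N} \sum a_n \le \bigl( \frac{1}{N} \sum a_n^2 \bigr)^{1/2}$) and the bound $a_n^2 \le C a_n$, with $C = \sup_n a_n$, the other.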
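A concrete sequence makes the lemma tangible. The sketch below uses an illustrative choice that is not taken from the text: $a_n = 1$ when $n$ is a perfect square and $a_n = 0$ otherwise. The sequence does not converge, yet its Cesàro mean tends to $0$ and it vanishes identically along the density-one set $J$ of non-squares. The function names are hypothetical, and `math.isqrt` requires Python 3.8 or later.

```python
import math


def a(n):
    """Indicator of the perfect squares: 1 if n is a square, else 0."""
    r = math.isqrt(n)
    return 1 if r * r == n else 0


def cesaro_mean(N):
    """(1/N) * sum of a(n) over 0 <= n < N."""
    return sum(a(n) for n in range(N)) / N


def density_of_good_set(N):
    """Proportion of n < N lying in J = {n : a(n) = 0}."""
    return sum(1 for n in range(N) if a(n) == 0) / N


if __name__ == "__main__":
    for N in (100, 10_000, 1_000_000):
        print(N, cesaro_mean(N), density_of_good_set(N))
```

Below $N$ there are about $\sqrt{N}$ squares, so the Cesàro mean is about $1/\sqrt{N}$ while the density of $J$ is about $1 - 1/\sqrt{N}$.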
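Proposition 3 (weak mixing $\iff$ no nonconstant eigenfunction). $T$ is weakly mixing iff $U_T$ has no eigenfunction in $L^2_0(\mu)$.
Proof. ($\Leftarrow$) For $f \in L^2_0(\mu)$ the spectral measure $\sigma_f$ (Herglotz-Bochner) satisfies $\frac{1}{N} \sum_{n=0}^{N-1} |\langle U_T^n f, f \rangle|^2 \to \sum_{z} \sigma_f(\{z\})^2$. No eigenfunction means $\sigma_f$ non-atomic, so the limit is $0$; the Koopman-von Neumann lemma upgrades this to $\frac{1}{N} \sum_{n=0}^{N-1} |\langle U_T^n f, f \rangle| \to 0$, and polarisation gives the same for $\langle U_T^n f, g \rangle$, hence weak mixing. ($\Rightarrow$) An eigenfunction $f \in L^2_0(\mu)$ gives $|\langle U_T^n f, f \rangle| = \|f\|^2$ constant, so the absolute Cesàro average is bounded below by $\|f\|^2 > 0$, contradicting weak mixing.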
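The atom-counting limit can be checked on a toy spectral measure. This sketch assumes the illustrative measure $\sigma = \tfrac12 \delta_{z_0} + \tfrac12\,\mathrm{Leb}$ on the circle (not taken from the text), whose Fourier coefficients are $c_0 = 1$ and $c_n = \tfrac12 z_0^n$ for $n \neq 0$; the Cesàro average of $|c_n|^2$ should approach the squared atom mass $1/4$. The function names are hypothetical.

```python
import cmath
import math


def fourier_coefficient(n, theta0):
    """n-th coefficient of sigma = (1/2)*delta_{z0} + (1/2)*Lebesgue,
    where z0 = exp(2*pi*i*theta0). Lebesgue contributes only at n = 0."""
    atom_part = 0.5 * cmath.exp(2j * math.pi * n * theta0)
    lebesgue_part = 0.5 if n == 0 else 0.0
    return atom_part + lebesgue_part


def cesaro_of_squares(N, theta0):
    """(1/N) * sum over 0 <= n < N of |c_n|^2."""
    return sum(abs(fourier_coefficient(n, theta0)) ** 2 for n in range(N)) / N


if __name__ == "__main__":
    for N in (10, 100, 4000):
        print(N, cesaro_of_squares(N, 0.3))
```

The average equals $(1 + (N-1)/4)/N$, which tends to $1/4$: the continuous part of the measure contributes nothing in the limit, and only the atom survives.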
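Proposition 4 (weak mixing $\iff$ $T \times T$ ergodic). $T$ is weakly mixing iff $T \times T$ is ergodic on $(X \times X, \mu \otimes \mu)$.
Proof. The product Koopman operator is $U_T \otimes U_T$ and $T \times T$ is ergodic iff its only invariant functions are constants. If $U_T f = \lambda f$ with $f$ nonconstant in $L^2(\mu)$, then $f \otimes \bar f$ is a nonconstant $T \times T$-invariant function, so $T \times T$ is non-ergodic. Conversely, if $U_T$ has no nonconstant eigenfunction, decompose any product-invariant $F$ along $\mathbb{C} \otimes \mathbb{C}$, $\mathbb{C} \otimes L^2_0$, $L^2_0 \otimes \mathbb{C}$, $L^2_0 \otimes L^2_0$; ergodicity of $T$ kills the mixed-constant parts, and on $L^2_0 \otimes L^2_0$ the joint spectral measure $\sigma_f \otimes \sigma_g$ has no mass on $\{ (z, w) : zw = 1 \}$ (both factors non-atomic), so no nonzero invariant vector survives. Thus $F$ is constant and $T \times T$ is ergodic.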
Proposition 5 (Bernoulli and hyperbolic toral systems are mixing). Bernoulli shifts and hyperbolic toral automorphisms are mixing.
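Proof. For the Bernoulli shift $S$ on $\{1, \dots, k\}^{\mathbb{Z}}$ with product measure, cylinder sets $A, B$ depending on coordinates in $[-m, m]$ become independent under the shift once $n > 2m$: $\mu(S^{-n}A \cap B) = \mu(A)\,\mu(B)$ exactly for $n > 2m$, since the supports of $S^{-n}A$ and $B$ then occupy disjoint coordinate blocks and the measure is a product. Cylinders are dense in $\mathcal{B}$, so mixing follows for all $A, B \in \mathcal{B}$. For the hyperbolic toral automorphism, $U_{T_M} e_k = e_{M^{\top} k}$ on characters $e_k(x) = e^{2\pi i \langle k, x \rangle}$ and hyperbolicity sends $|(M^{\top})^n k| \to \infty$ for every nonzero $k \in \mathbb{Z}^d$, so $\langle U_{T_M}^n e_k, e_l \rangle = 0$ for large $n$; density of trigonometric polynomials extends correlation decay to all of $L^2_0(\mu)$.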
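The character computation for the cat map can be traced in exact integer arithmetic. This is a minimal sketch, assuming the matrix $\begin{psmallmatrix}2&1\\1&1\end{psmallmatrix}$ and the convention $e_k(Mx) = e_{M^{\top}k}(x)$; the helper names are hypothetical.

```python
def apply_transpose(M, k):
    """Return M^T k for an integer matrix M (list of rows) and integer vector k."""
    d = len(k)
    return tuple(sum(M[j][i] * k[j] for j in range(d)) for i in range(d))


def frequency_orbit(M, k, steps):
    """Frequencies k, M^T k, (M^T)^2 k, ..., (M^T)^steps k."""
    orbit = [tuple(k)]
    for _ in range(steps):
        orbit.append(apply_transpose(M, orbit[-1]))
    return orbit


def character_correlation(M, k, l, n):
    """<U^n e_k, e_l> for x -> Mx mod 1: it equals 1 when (M^T)^n k == l
    and 0 otherwise, by orthonormality of the characters."""
    return 1 if frequency_orbit(M, k, n)[-1] == tuple(l) else 0


CAT = [[2, 1], [1, 1]]

if __name__ == "__main__":
    print(frequency_orbit(CAT, (1, 0), 5))
    print([character_correlation(CAT, (1, 0), (1, 0), n) for n in range(6)])
```

Starting from $k = (1, 0)$ the frequencies run through $(2,1), (5,3), (13,8), \dots$ and never return, so the self-correlation of $e_k$ is $1$ at lag $0$ and $0$ at every later lag.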
Connections Master
Ergodicity, unique ergodicity, and equidistribution 38.04.02 is the floor of this hierarchy: ergodicity is the demand that the invariant $\sigma$-algebra be degenerate, equivalently that $1$ be a simple eigenvalue of $U_T$, and weak mixing strengthens this from "the eigenvalue $1$ is simple" to "$1$ is the only eigenvalue." The signed-Cesàro correlation criterion (4) for ergodicity proved there is exactly what the absolute-Cesàro criterion here sharpens, and the irrational rotation that is uniquely ergodic there is the canonical ergodic-not-weak-mixing example here.
The ergodic theorems of Birkhoff, von Neumann, and Kingman 37.02.03 supply the analytic backbone: von Neumann's mean ergodic theorem is the projection onto the eigenvalue-$1$ eigenspace, and the Koopman-von Neumann lemma used here is the weak-mixing refinement that controls Cesàro averages of correlations along density-one sets. The pointwise theory underlies the empirical-measure arguments that detect rigidity in the weak-but-not-strong-mixing examples.
Hilbert space and the spectral theorem 02.11.08 provide the operator-theoretic stage: the Koopman operator is an isometry (unitary in the invertible case) of $L^2(\mu)$, its eigenfunctions are the unit-modulus point spectrum, and the Herglotz-Bochner representation of the positive-definite correlation sequence furnishes the spectral measure whose atoms-versus-continuity dichotomy is the whole content of the weak-mixing characterisation.
The spectral theory of dynamical systems 38.05.02 is the direct successor: it develops the maximal spectral type, multiplicity, and the Halmos-von Neumann classification of discrete-spectrum systems as group rotations, recasting the mixing hierarchy proved here as the regularity ladder (point, continuous, Rajchman) of the spectral measure, and isolating the Kronecker factor whose vanishing is weak mixing.
Hyperbolic sets and Anosov systems 38.03.01 supply the geometric source of mixing: hyperbolic toral automorphisms and Anosov diffeomorphisms are mixing (indeed Bernoulli) for their natural smooth invariant measures, the exponential expansion-contraction producing the correlation decay, and Markov partitions code them as Bernoulli shifts, tying the smooth and symbolic faces of mixing together.
Historical & philosophical context Master
The mixing condition entered ergodic theory through statistical mechanics and the work of Eberhard Hopf, whose 1937 Ergodentheorie [Hopf 1937] established mixing for the geodesic flow on surfaces of negative curvature by the argument now called Hopf's method; mixing formalised Gibbs's heuristic that a stirred fluid forgets its initial coarse-grained state. The spectral viewpoint was opened by Bernard Koopman's 1931 observation that composition with a measure-preserving map is a unitary operator, and decisively by the joint 1932 Proceedings of the National Academy of Sciences note of Koopman and von Neumann [Koopman-von Neumann 1932], which introduced continuous spectrum as the operator-theoretic signature of thorough mixing and proved the density-one averaging lemma that bears their names. The recognition that weak mixing — continuous spectrum off the constants — is the precise dividing line between rotation-like (discrete spectrum) and genuinely scrambling behaviour crystallised in this circle.
The strictness of the hierarchy and the surprising genericity of its middle rung were settled in the 1940s and 1950s. Rokhlin's 1948 note [Rokhlin 1948], titled to announce that a general measure-preserving transformation is not mixing, showed via the tower approximation that mixing is exceptional, and Halmos's 1956 Lectures on Ergodic Theory [Halmos 1956] assembled the Baire-category picture in which weak mixing and ergodicity are residual while mixing is meagre. Explicit weakly-but-not-strongly mixing systems were harder to exhibit by hand; Chacón's 1969 rank-one construction gave a concrete example, and the rigidity-type phenomenon it exploits (recurrence of the Koopman powers toward the identity along a sparse sequence) became a central theme. The relative form of weak mixing, splitting off the Kronecker factor, was the structural innovation that Furstenberg turned into the ergodic-theoretic proof of Szemerédi's theorem in 1977, carrying the Koopman-von Neumann dichotomy from a classification tool into a generative principle of additive combinatorics.
Bibliography Master
@article{KoopmanvonNeumann1932,
author = {Koopman, Bernard O. and von Neumann, John},
title = {Dynamical systems of continuous spectra},
journal = {Proceedings of the National Academy of Sciences},
volume = {18},
number = {3},
year = {1932},
pages = {255--263}
}
@book{Hopf1937,
author = {Hopf, Eberhard},
title = {Ergodentheorie},
publisher = {Springer},
series = {Ergebnisse der Mathematik und ihrer Grenzgebiete},
volume = {5},
year = {1937}
}
@article{Rokhlin1948,
author = {Rokhlin, Vladimir A.},
title = {A general measure-preserving transformation is not mixing},
journal = {Doklady Akademii Nauk SSSR},
volume = {60},
year = {1948},
pages = {349--351}
}
@book{Halmos1956,
author = {Halmos, Paul R.},
title = {Lectures on Ergodic Theory},
publisher = {Chelsea Publishing Company},
year = {1956}
}
@article{Chacon1969,
author = {Chac\'on, Rafael V.},
title = {Weakly mixing transformations which are not strongly mixing},
journal = {Proceedings of the American Mathematical Society},
volume = {22},
number = {3},
year = {1969},
pages = {559--562}
}
@article{Furstenberg1977,
author = {Furstenberg, Harry},
title = {Ergodic behavior of diagonal measures and a theorem of Szemer\'edi on arithmetic progressions},
journal = {Journal d'Analyse Math\'ematique},
volume = {31},
year = {1977},
pages = {204--256}
}
@book{Walters1982,
author = {Walters, Peter},
title = {An Introduction to Ergodic Theory},
publisher = {Springer},
series = {Graduate Texts in Mathematics},
volume = {79},
year = {1982}
}
@book{CornfeldFominSinai1982,
author = {Cornfeld, Isaac P. and Fomin, Sergei V. and Sinai, Yakov G.},
title = {Ergodic Theory},
publisher = {Springer},
series = {Grundlehren der mathematischen Wissenschaften},
volume = {245},
year = {1982}
}
@book{Petersen1983,
author = {Petersen, Karl},
title = {Ergodic Theory},
publisher = {Cambridge University Press},
year = {1983}
}
@book{Glasner2003,
author = {Glasner, Eli},
title = {Ergodic Theory via Joinings},
publisher = {American Mathematical Society},
series = {Mathematical Surveys and Monographs},
volume = {101},
year = {2003}
}