Convergence to Equilibrium via Coupling
Anchor (Master): Norris 1997 *Markov Chains* (Cambridge) §1.8-1.10; Levin-Peres 2017 *Markov Chains and Mixing Times* 2e Ch. 4-5, §12.1; Lindvall 2002 *Lectures on the Coupling Method* 2e Ch. I-II; Aldous 1983 *Random walks on finite groups and rapidly mixing Markov chains* (Séminaire de Probabilités XVII)
Intuition Beginner
A Markov chain that is irreducible and aperiodic settles down. No matter where you start it, after enough steps the chance of finding it in any given state stops depending on the starting point and locks onto a single fixed pattern — the equilibrium distribution. Start the weather chain in a sunny state or a rainy state, and a month later the odds of sun are the same either way. The starting point is forgotten. This unit explains why that forgetting happens, using one strikingly visual idea called coupling.
Here is the picture. Run two copies of the same chain side by side. The first starts wherever you like; the second starts already in equilibrium, so it stays in equilibrium at every step. Let the two move independently until the first moment they land on the same state at the same time. Call that the meeting. From the meeting onward, make them move together in lockstep, sharing every future step. Because the second copy was always in equilibrium, the first is now in equilibrium too, since after meeting they are identical.
Why must they meet? Because we ran two independent copies of a well-mixing chain, the pair itself behaves like a single bigger chain that is sure to revisit the diagonal where the two coordinates agree. Aperiodicity is exactly what stops the two copies from forever marching out of step, like two people circling a track who never quite line up. With aperiodicity the lockstep is possible, the meeting is certain, and once it happens the starting point is erased.
The distance between where the chain actually is and where equilibrium says it should be can only shrink, and it shrinks at least as fast as the chance that the two copies have not yet met. So the whole convergence is controlled by one quantity: how long it takes the two copies to meet. That single quantity is also the seed of mixing time, the count of steps a shuffled deck or a randomized algorithm needs before it is "random enough."
The takeaway: an irreducible aperiodic chain forgets its start because you can couple it to a copy already at equilibrium; the two are sure to meet, after which they agree forever, and the speed of forgetting is the speed of meeting.
Visual Beginner
Picture two walkers on the same set of states, moving independently until they collide and then sticking together.
Top and bottom: walker starts at your chosen state; walker starts already in equilibrium. They step independently — separate random choices — so their paths wander apart and together. At the starred column they land on the same state at the same time: the meeting. After the star they share every move, so they are identical forever. Right: the gauge measuring how far the actual chain is from equilibrium can never exceed the chance the walkers have not yet met, so as meeting becomes near-certain the gauge falls to zero.
Worked example Beginner
Take the two-state chain on that, from either state, stays put with probability and switches with probability . This chain is irreducible (you can get from each state to the other) and aperiodic (the self-stay gives a length-one return, so the beat is one). Its equilibrium is the fair coin: each state has probability in the long run. We start the chain at state and watch it forget.
Step 1. Write the one-step odds from the start. Starting at , after one step the chain is at with probability and at with probability . Already after a single step the distribution is exactly the fair coin — for this very symmetric chain, one step suffices.
Step 2. Set up the coupling. Run a second copy started from the fair coin, and let start at . At each step flip the two copies independently. The meeting happens the first time and sit on the same state.
Step 3. Find the chance they have not met after one step. starts at ; starts at or each with probability . Work out the probability that after one independent step the two are on different states. A direct count of the four equally likely combinations of the two independent moves gives a probability that they differ after one step, and the chance of still differing keeps halving: after steps it is .
Step 4. Read off the convergence speed. The distance from equilibrium is at most the chance of not having met, which is . So the gap between the actual distribution and the fair coin shrinks by a factor of two every step, vanishing quickly.
What this tells us: even before computing the exact distribution, the coupling gives a clean upper bound on how far the chain is from equilibrium — here — purely by tracking when two independent copies first agree. The starting state is forgotten geometrically fast, and the rate is set entirely by the meeting time.
Check your understanding Beginner
Formal definition Intermediate+
Throughout, is a time-homogeneous Markov chain on a countable state space with stochastic transition matrix and -step matrix , in the sense of 37.05.01; communication, irreducibility, and period are as in 37.05.02, and the strong Markov property and first-return time are as in 37.05.04. We write for the law of the chain with initial distribution , and for the row-vector , the distribution of under . The invariant distribution (the unique probability vector with , given by on a positive-recurrent irreducible chain) is constructed in the co-produced unit on stationary distributions; here is taken as given and the question is whether .
Definition (total-variation distance). For probability distributions on the countable set , the total-variation distance is The supremum is attained at . Total variation lies in , equals iff , and equals iff have disjoint supports.
Definition (coupling). A coupling of and is a pair of random variables on a common probability space with and (the joint law is unconstrained beyond its marginals). A coupling of two Markov chains with the same matrix is a process on such that each coordinate, viewed alone, is a Markov chain with matrix (possibly from different initial laws).
Definition (coupling time / coalescence). Given a coupling , the coupling time is The coupling is coalescent (or sticky) if for all : once the coordinates agree they stay equal forever.
Definition (the independent coupling and the coalescent modification). The independent coupling runs and as two independent chains with matrix ; its law on is that of the product chain with transition matrix . The associated coalescent coupling follows the product chain until and thereafter sets ; each coordinate is still a -chain, and the coupling is coalescent.
Definition (convergence to equilibrium). The chain converges to equilibrium if as for every initial distribution ; equivalently for all .
Counterexamples to common slips Intermediate+
Aperiodicity is needed, not just irreducibility. The two-cycle deterministically (period ) is irreducible and positive recurrent with , yet alternates between and and never converges. The independent product chain stays forever on the off-diagonal when the two copies start out of phase: they never meet. Aperiodicity is exactly what makes the product chain irreducible and the meeting certain.
The reference copy must start in equilibrium, not at a convenient state. Coupling a chain to a copy started at a fixed state bounds the distance between two runs of the chain, not the distance to . To bound one coordinate must be -distributed throughout, which requires .
The coupling inequality is an inequality, not an identity, for an arbitrary coupling. Only the maximal coupling achieves . Any coupling gives ; a clever coupling makes the bound small, but a poor one need not be tight.
Coalescence must be built in by hand. Two independent copies that happen to be equal at time can separate at time . The coalescent modification (freeze them together after first meeting) is what turns "agree once" into "agree forever," and it is legitimate because the frozen coordinate is still a -chain.
Key theorem with proof Intermediate+
Theorem (convergence to equilibrium via coupling). Let be irreducible and aperiodic on a countable state space , with invariant distribution (so the chain is positive recurrent). Then for every initial distribution , and in particular for all . Moreover, for the coalescent coupling started from with coupling time ,
Proof. Coupling inequality. Let be any coupling with , , each coordinate a -chain. Then and (invariance). For any , because on the two events and coincide and cancel. Taking the supremum over gives .
Coalescent coupling. Run the independent product chain on with matrix , started from , independently, and let be the first hit of the diagonal . Define the coalescent coupling by for and for . By the strong Markov property of 37.05.04 applied to at the stopping time , swapping the two coordinates after does not change the marginal law of either coordinate: is still a -chain started from , hence . Since for , we have , so the coupling inequality upgrades to
The product chain is irreducible and recurrent. The product matrix is irreducible on : given and , aperiodicity and irreducibility of give, by the eventual-positivity lemma of 37.05.02, an with and simultaneously for all , whence . (This is the one step that uses aperiodicity: without it the two coordinates could be forced onto incompatible periodic clocks and never align.) The product chain has invariant distribution (the product of two invariant distributions is invariant for the product matrix), so by the existence of a stationary distribution it is positive recurrent, hence recurrent. A recurrent irreducible chain hits every state — in particular every diagonal state — with probability one, so .
Conclusion. Since almost surely, as by monotone convergence (the events decrease to , a null event). Therefore . Taking and reading off the -coordinate gives .
Bridge. This theorem builds toward the ergodic and mixing-time theory of the whole chapter and appears again in every quantitative bound on how fast a chain randomizes, because it converts the analytic question "does converge?" into the probabilistic question "do two independent copies meet?" — and the answer to the second is forced by recurrence of the product chain. The foundational reason the proof works is that aperiodicity is exactly the condition making irreducible: this is the central insight, that periodicity is a phase obstruction living on the product space, not on a single copy. The coupling inequality generalises the elementary observation that two processes agreeing after a random time can differ only before it, and it is dual to the spectral picture in which convergence rate is the second-eigenvalue gap; putting these together, the meeting-time tail is the probabilistic avatar of the spectral gap, and the bridge is that this single tail simultaneously proves convergence and quantifies its speed, which is exactly the quantity refined into mixing time and the cutoff phenomenon of 07.05.08.
Exercises Intermediate+
Advanced results Master
The coupling time is the master quantity: convergence, its rate, the periodic correction, and the mixing-time theory are four readings of the single tail . The coupling inequality makes total variation a meeting-time problem, and every refinement below sharpens the meeting-time estimate.
Theorem 1 (Doeblin's condition and uniform geometric ergodicity). Suppose there exist , , and a probability measure on with for all (the Doeblin minorization). Then has a unique invariant and . The proof is a coupling: at every steps the two coordinates jointly draw a common state from with probability , coalescing then, so is dominated by times a geometric random variable of success probability . On a finite irreducible aperiodic chain Doeblin's condition always holds with the primitivity exponent of 37.05.02 and , yielding uniform geometric convergence with no spectral computation.
Theorem 2 (the periodic case: convergence along the period and Cesàro). Let be irreducible positive recurrent of period with invariant and cyclic subclasses of 37.05.02. For and , unless , and along the admissible subsequence . Equivalently restricted to each is irreducible aperiodic with invariant , to which Theorem (Key) applies. In every case the Cesàro (time) average converges without a parity restriction:
which is the ergodic-averaging face of convergence and survives periodicity because averaging washes out the rotation through the subclasses.
Theorem 3 (path coupling and contraction on a metric). If carries a path metric with edge set and there is a coupling of one step from adjacent states with for all edges, then the Bubley-Dyer path-coupling lemma extends the contraction to all pairs, giving for every and hence via Markov's inequality and the coupling bound. This reduces a global convergence rate to a one-edge computation and is the standard route to mixing bounds for spin systems and card shuffles, where checking contraction on neighbours is tractable.
Theorem 4 (coupling, mixing time, and cutoff). Define the mixing time . The coupling bound gives , so any coupling with a controlled meeting time upper-bounds mixing. For families of chains indexed by a size parameter (random walks on , on ), the meeting-time concentration produces the cutoff phenomenon: total variation stays near until a sharp time and then plunges to in a window , so for every . The finite-group instances — random transpositions, the riffle shuffle, the hypercube walk — are analysed by representation-theoretic upper bounds and matching coupling or strong-stationary-time lower bounds in 07.05.05 and 07.05.08; coupling supplies the robust, eigenvalue-free upper half of every such cutoff theorem.
Synthesis. The foundational reason the entire convergence theory compresses to one number is the coupling inequality: this is exactly the statement that total variation is the cost of disagreement, so turns an analytic limit into the recurrence of the product chain, and it is dual to the spectral picture where the same rate is read from the second-eigenvalue gap. Putting these together, aperiodicity is the central insight of the proof — the hypothesis that makes irreducible, so the meeting that erases the initial condition is forced — and when it fails the period is the phase by which the copies miss each other, salvaged by the Cesàro average and by passage to . The coupling time generalises from the crude independent meeting to the Doeblin minorization, to path coupling, and to the maximal coupling that attains the inequality; the same tail that proves is the quantity whose sharp concentration is the cutoff phenomenon of 07.05.08. This convergence appears again in 37.05.05: the ergodic theorem averages the strong-Markov excursions of 37.05.04, while convergence to equilibrium couples them to a stationary copy, two faces of the same positive-recurrence input — the reciprocal mean return time that is at once the long-run visit frequency and the equilibrium mass.
Full proof set Master
Proposition 1 (coupling inequality). For any coupling of distributions on countable , , with equality for the maximal coupling.
Proof. For , . The contributions from cancel since there , leaving ; symmetrically for . Sup over gives the inequality. The maximal coupling of Exercise 8 sets , attaining equality, so .
Proposition 2 (the independent product chain is irreducible under aperiodicity). If is irreducible and aperiodic on , then is irreducible on .
Proof. Fix and . By the eventual-positivity lemma for irreducible aperiodic chains of 37.05.02 (proved there from addition-closure of the return set and the numerical-semigroup lemma), there exist with for and for . For , , so in the product chain. As the pair was arbitrary, is irreducible.
Proposition 3 (the meeting time is almost surely finite). With irreducible aperiodic positive recurrent and invariant , the independent product chain started from any law on hits the diagonal in finite time a.s.
Proof. By Proposition 2, is irreducible. The product measure satisfies , so it is invariant for the product chain; an irreducible chain with an invariant probability distribution is positive recurrent, hence recurrent 37.05.04. A recurrent irreducible chain visits every state with probability one, so in particular it hits some diagonal state ; thus a.s.
Proposition 4 (convergence and the rate bound). Under the hypotheses of Proposition 3, for every initial law , , and .
Proof. Form the coalescent coupling from the independent product chain started at by freezing the second coordinate onto the first after ; by the strong Markov property at 37.05.04 the frozen coordinate remains a -chain started from , hence -distributed at every time, while the first is -distributed. Proposition 1 gives . Proposition 3 gives , so . Specializing yields .
Proposition 5 (total variation contracts under ; monotone convergence). For all , ; hence decreases to its limit.
Proof. With , by the triangle inequality and row-stochasticity. Halving gives the contraction. Setting , shows ; the bounded monotone sequence converges, and Proposition 4 identifies its limit as .
Proposition 6 (periodic case: Cesàro convergence). Let be irreducible positive recurrent of period with invariant . Then for all , and for , the subsequence .
Proof. The diagonal-block matrix leaves each cyclic subclass invariant and restricts there to an irreducible aperiodic stochastic matrix 37.05.02, with invariant distribution (since spreads its mass equally over the subclasses by the rotation ). Proposition 4 applied to gives for ; composing with a fixed length- bridge from to gives the stated subsequence limit. For the Cesàro average, partition by residue mod ; on each residue class the terms converge to when admissible and are otherwise, and averaging the classes returns . The bounded-convergence/Cesàro argument removes the parity restriction.
Connections Master
The strong Markov property and recurrence/transience dichotomy
37.05.04is the engine of the proof: the coalescent coupling is legitimized by restarting the product chain at the meeting time (a stopping time), and the almost-sure finiteness is recurrence of the irreducible product chain, the same visit-with-certainty mechanism that there made a recurrent class hit with probability one — convergence to equilibrium is recurrence of read on the diagonal.Class structure, irreducibility, and periodicity
37.05.02supplies the two hypotheses and the one delicate step: irreducibility gives a single class to converge within, and the eventual-positivity lemma ( for all large when aperiodic) is exactly what makes irreducible; periodicity is the obstruction handled by the cyclic-subclass decomposition and the Cesàro average, so the period of that unit is the phase by which two coupled copies fail to meet.The stationary-distribution and ergodic theory
37.05.05, co-produced with this unit, supplies the invariant with that the convergence theorem converges to: existence and positivity of are the positive-recurrence input, the reference copy must be started from , and the ergodic theorem's time-average is the pathwise twin of the distributional convergence proved here.The cutoff phenomenon for random walks on finite groups
07.05.08is the quantitative endpoint: the coupling bound furnishes the eigenvalue-free upper half of cutoff theorems, and the representation-theoretic upper bound lemma07.05.05furnishes the spectral half, the two meeting in the sharp-threshold analysis of shuffles and hypercube walks.
Historical & philosophical context Master
The coupling method originates with Wolfgang Doeblin, whose 1938 memoir on finite Markov chains [Doeblin 1938] introduced the minorization condition now bearing his name and used the device of running two coupled copies until they agree to prove convergence to equilibrium with an explicit geometric rate. Doeblin's argument replaced the spectral and matrix-analytic methods of Markov, Perron, and Frobenius with a purely probabilistic construction, trading eigenvalue estimates for a meeting-time estimate. The modern systematic development of coupling — the coupling inequality, the maximal coupling attaining it, and the path-coupling refinement — is the subject of Lindvall's monograph [Lindvall 2002].
The use of coupling to obtain sharp mixing rates, and the discovery that many natural chains exhibit a sudden transition from un-mixed to mixed, is due to the random-walks-on-groups school. David Aldous's 1983 lecture notes [Aldous 1983] developed coupling and strong-stationary-time techniques alongside the representation-theoretic upper-bound lemma for rapidly mixing chains, and the cutoff phenomenon was named and established for the riffle shuffle and random transpositions by Aldous, Diaconis, and Shahshahani in the same period. The textbook synthesis defining convergence through total variation and proving it by the coalescent coupling, together with the Cesàro statement in the periodic case, follows Norris [Norris 1997] and Levin-Peres [Levin-Peres 2017].
Bibliography Master
@book{Norris1997,
author = {Norris, James R.},
title = {Markov Chains},
series = {Cambridge Series in Statistical and Probabilistic Mathematics},
publisher = {Cambridge University Press},
year = {1997}
}
@article{Doeblin1938,
author = {Doeblin, Wolfgang},
title = {Expos\'e de la th\'eorie des cha\^ines simples constantes de Markoff \`a un nombre fini d'\'etats},
journal = {Revue Math\'ematique de l'Union Interbalkanique},
volume = {2},
year = {1938},
pages = {77--105}
}
@book{Lindvall2002,
author = {Lindvall, Torgny},
title = {Lectures on the Coupling Method},
edition = {2},
publisher = {Dover Publications},
year = {2002}
}
@incollection{Aldous1983,
author = {Aldous, David},
title = {Random walks on finite groups and rapidly mixing Markov chains},
booktitle = {S\'eminaire de Probabilit\'es XVII},
series = {Lecture Notes in Mathematics},
volume = {986},
publisher = {Springer},
year = {1983},
pages = {243--297}
}
@book{LevinPeres2017,
author = {Levin, David A. and Peres, Yuval},
title = {Markov Chains and Mixing Times},
edition = {2},
publisher = {American Mathematical Society},
year = {2017}
}