Random walk on a finite group; Upper Bound Lemma
Anchor (Master): Diaconis-Shahshahani 1981 J. Algebra 76; Diaconis 1988 Group Representations in Probability and Statistics; Saloff-Coste 2004 Random Walks on Finite Groups
Intuition [Beginner]
A random walk on a finite group works like this: pick a random group element at each step, and multiply it onto your current position. After many steps, your position becomes harder to predict — the distribution of where you might be gets closer and closer to uniform (every element equally likely). The mixing time is the number of steps needed to get "close enough" to uniform.
The surprise is that representation theory — the study of how groups act linearly on vector spaces — gives sharp answers to "how many steps?" The key tool is the character: a number attached to each group element that records how a representation acts. Characters of the non-principal representations act like "frequencies" that decay over the course of the walk, and the rate of decay controls the mixing time.
Why does this concept exist? Because the question "how fast does a random walk mix?" reduces to "how fast do the non-principal characters decay?", which is a purely representation-theoretic computation.
Visual [Beginner]
A diagram showing the distribution of a random walk on the cyclic group converging toward the flat uniform line. After one step the distribution is concentrated on two points; after three steps it is spread out but uneven; after eight steps it is nearly flat.
The tall bars shrink and the short bars grow as the walk progresses, converging toward the uniform height of .
Worked example [Beginner]
Consider the random walk on the cyclic group with addition modulo 3. At each step, move forward by 1 or backward by 1, each with probability . Start at 0.
Step 1. After one step: position is 1 with probability , position is 2 with probability . Uniform distribution would give probability to each position.
Step 2. After two steps: from position 1 you reach 0 or 2, and from position 2 you reach 0 or 1. So the probability of being at 0 is , at 1 is , at 2 is . The total variation distance from uniform is .
Step 3. After three steps: convolution gives probability at position 0 and at each of positions 1 and 2. The total variation distance is .
What this tells us: the distance from uniform halves at each step. The characters of are complex numbers of modulus 1; the non-principal characters evaluate to on the step distribution, and the distance from uniform is controlled by .
Check your understanding [Beginner]
Formal definition [Intermediate+]
Let be a finite group and a probability measure on (so for all and ). The random walk driven by is the Markov chain with transition kernel . The distribution after steps starting from the identity is the -fold convolution power , defined recursively by
Let denote the uniform distribution on : for all . The total variation distance between and is
The mixing time for threshold is
Definition (Fourier transform at a representation). Let be an irreducible unitary representation of of dimension . The Fourier transform of at is the matrix
For the convolution: .
Counterexamples to common slips
- Convergence without generation. If the support of does not generate all of , the walk never reaches every element and does not converge to . The walk must be ergodic: generates .
- TV versus . Total variation distance and distance measure different things. A bound in (Plancherel) gives a bound in TV (Cauchy-Schwarz), but the converse fails.
- Aperiodicity. For a non-symmetric the walk may be periodic. Adding a holding probability (where is the identity) guarantees aperiodicity. The symmetric walks studied here are automatically aperiodic.
Key theorem with proof [Intermediate+]
Theorem (Upper Bound Lemma — Diaconis-Shahshahani 1981). Let be a finite group, a probability measure on , and the uniform distribution. Then
where the sum ranges over all non-principal irreducible representations of , , and is the Hilbert-Schmidt (Frobenius) norm.
Proof. Set . The argument has four steps.
Step 1 (TV to ). By definition,
Squaring both sides:
Step 2 ( to via Cauchy-Schwarz). Apply the Cauchy-Schwarz inequality to the sum against the constant function 1:
Step 3 ( to characters via Plancherel). The Plancherel formula for a finite group states that for any function ,
Apply to . The Fourier transform of at is . For the principal representation : (since is a probability measure) and , so . For : by orthogonality of matrix coefficients (see 07.01.04). Hence . Plancherel gives
Step 4 (Combine). Multiplying the Cauchy-Schwarz bound by the Plancherel identity:
Bridge. The Upper Bound Lemma builds toward the riffle shuffle analysis where the character sum is evaluated for with representations indexed by partitions, and appears again in the cutoff phenomenon where the bound is shown to be sharp. The foundational reason the bound works is the Plancherel isomorphism between and the direct product of matrix algebras, which is exactly the non-abelian Fourier transform of 07.01.09. The bridge is that the character orthogonality relations of 07.01.04 kill the principal representation contribution, leaving only the non-principal terms that decay under convolution powers.
Exercises [Intermediate+]
Advanced results [Master]
Theorem 1 (Random transposition mixing time — Diaconis-Shahshahani 1981). For the random transposition walk on , the mixing time satisfies where depends only on . In particular, if and if .
The proof evaluates the character sum for the transposition conjugacy class on using the hook-length formula for and the Frobenius character formula for .
Theorem 2 (Plancherel bound is sharp for conjugacy class walks). For a conjugacy class walk on , the Upper Bound Lemma bound differs from the true total variation distance by at most a factor of 2. The dominant contribution comes from the standard representation .
This was established by Diaconis and Shahshahani in their 1981 paper by computing the asymptotics of the character sum and comparing with the lower bound from the second eigenvalue method.
Theorem 3 (Second eigenvalue bound). For a symmetric random walk on driven by , if has eigenvalues at each irreducible , then the mixing time satisfies where is the largest non-principal operator norm.
This lower bound complements the Upper Bound Lemma's upper bound, pinching the mixing time between two character-theoretic expressions.
Theorem 4 (Comparison theorem — Diaconis-Saloff-Coste 1993). If and are two probability measures on the same group , then where is a comparison constant depending on the support of relative to . This allows bounding the mixing time of one walk in terms of another.
Theorem 5 (Random adjacent transpositions — Wilson 2004). The random adjacent transposition walk on (step: swap positions and uniformly at random) has mixing time . The upper bound uses the Upper Bound Lemma adapted to the generating set of adjacent transpositions; the lower bound uses Wilson's method.
Theorem 6 (Product bound — Diaconis-Shahshahani). For a random walk on driven by the product of measures , the total variation distance factors as . This allows decomposing the character sum over irreducible representations of the product group.
Synthesis. The Upper Bound Lemma is the foundational reason that representation theory controls the rate of convergence of random walks on finite groups. The central insight is that the Plancherel isomorphism translates the probabilistic -norm into a sum over representations, and Cauchy-Schwarz carries this to the -norm that defines total variation distance. Putting these together with the character-theoretic machinery of 07.05.01, the bound becomes a finite computation: enumerate the non-principal irreducibles, compute character values on the support of , and read off the mixing time. This is exactly the content that builds toward the riffle shuffle analysis, where the representations of indexed by partitions control the rate at which shuffles approach uniform. The bridge is between abstract harmonic analysis on groups and concrete probabilistic convergence; the pattern generalises from conjugacy class walks to arbitrary measures via the operator-norm formulation of Theorem 3, and identifies the mixing time with the slowest-decaying non-principal character.
Full proof set [Master]
Proposition 1 (Fourier transform of convolution). For probability measures on and any irreducible representation :
Proof. The convolution is . Fourier-transforming:
Substituting :
Proposition 2 (Character sum for conjugacy class walks). Let be uniform on a conjugacy class of . Then
Proof. Bound each summand by replacing all but the maximum term with the maximum:
Using and bounding each ratio by the maximum:
The remaining sum is bounded by . More precisely, the second moment of weighted by is at most .
Connections [Master]
Non-abelian Fourier transform
07.01.09. The Fourier transform at irreducible representations, which underpins the Upper Bound Lemma, is exactly the non-abelian Fourier transform of07.01.09. The Plancherel isomorphism that converts -norms into character sums is the analytic core of both units.Character orthogonality
07.01.04. The vanishing of for non-principal in the proof of the Upper Bound Lemma is a direct application of the character orthogonality relations of07.01.04. The orthogonality relations guarantee that the uniform distribution has no "energy" in non-principal representations.Symmetric group representation
07.05.01. The primary application domain for the Upper Bound Lemma is the symmetric group . The representations indexed by partitions, their dimensions given by the hook-length formula, and the character values on cycle types developed in07.05.01are the computational ingredients needed to evaluate the character sum.Schur-Weyl duality
07.05.04. The representation-theoretic structure of that makes the Upper Bound Lemma computationally effective is the same structure captured by Schur-Weyl duality. The irreducible representations of that appear in the character sum correspond bijectively to partitions, and the same indexing governs the decomposition of tensor powers.
Historical & philosophical context [Master]
Diaconis and Shahshahani introduced the Upper Bound Lemma in their 1981 paper Generating a Random Permutation with Random Transpositions [DiaconisShahshahani1981], proving that random transpositions suffice to make a permutation uniformly distributed. Their innovation was to recast the classical probability question of mixing times in the language of character theory, using the Plancherel formula to bound total variation distance.
Diaconis expanded this programme in his 1988 monograph Group Representations in Probability and Statistics [Diaconis1988], which systematised the Fourier-analytic approach to random walks on finite groups and introduced the comparison theorem for bounding mixing times of one walk in terms of another. Saloff-Coste 2004 provided a comprehensive survey of the field, integrating the representation-theoretic bounds with geometric and functional-analytic methods.
Bibliography [Master]
@article{DiaconisShahshahani1981,
author = {Diaconis, Persi and Shahshahani, Mehrdad},
title = {Generating a random permutation with random transpositions},
journal = {Z. Wahrsch. Verw. Gebiete},
volume = {57},
year = {1981},
pages = {159--179},
}
@book{Diaconis1988,
author = {Diaconis, Persi},
title = {Group Representations in Probability and Statistics},
publisher = {Institute of Mathematical Statistics},
year = {1988},
series = {IMS Lecture Notes--Monograph Series},
volume = {11},
}
@article{BayerDiaconis1992,
author = {Bayer, Dave and Diaconis, Persi},
title = {Trailing the Dovetail Shuffle to its Lair},
journal = {Ann. Appl. Probab.},
volume = {2},
year = {1992},
pages = {294--313},
}
@incollection{SaloffCoste2004,
author = {Saloff-Coste, Laurent},
title = {Random Walks on Finite Groups},
booktitle = {Probability on Discrete Structures},
publisher = {Springer},
year = {2004},
series = {Encyclopaedia of Mathematical Sciences},
volume = {110},
}
@article{Wilson2004,
author = {Wilson, David B.},
title = {Mixing times of Lozenge tiling and card shuffling Markov chains},
journal = {Ann. Appl. Probab.},
volume = {14},
year = {2004},
pages = {274--325},
}
@article{DiaconisSaloffCoste1993,
author = {Diaconis, Persi and Saloff-Coste, Laurent},
title = {Comparison Theorems for Reversible Markov Chains},
journal = {Ann. Appl. Probab.},
volume = {3},
year = {1993},
pages = {696--730},
}