Riffle shuffle and the 7-shuffle theorem
Anchor (Master): Bayer-Diaconis 1992 Ann. Appl. Probab. 2; Gilbert 1955; Shannon (unpublished); Reeds 1981 (unpublished thesis)
Intuition [Beginner]
How many times must you shuffle a deck of cards before it is well-mixed? Bayer and Diaconis proved in 1992 that seven riffle shuffles suffice for a standard 52-card deck. Fewer than seven leave patterns; many more are wasted effort.
A riffle shuffle models what real card players do: cut the deck roughly in half, then let cards fall from the two halves, interleaving them. The mathematical model (due to Gilbert, Shannon, and Reeds) makes this precise: cut the deck at a random position, then drop cards one at a time from each half, choosing which half to drop from with probability proportional to how many cards remain in that half.
The result is striking because the transition is sharp. After six shuffles the deck is far from random; after seven it is close; after ten it is essentially indistinguishable from a perfectly random permutation. The answer "seven" comes from the representation theory of — the characters of the irreducible representations decay at rates that add up to this particular threshold.
Visual [Beginner]
A diagram showing the total variation distance from uniform for a 52-card deck as a function of the number of shuffles. The distance stays close to 1 for the first few shuffles, drops sharply around seven shuffles, and approaches 0 by twelve shuffles.
The sharp drop is characteristic of what probabilists call a cutoff phenomenon: the walk stays far from uniform for a long time, then converges rapidly in a narrow window.
Worked example [Beginner]
Take a deck of just 5 cards, labelled . After one riffle shuffle, some orderings are much more likely than others. The original ordering and the reverse are the most likely, because a "perfect" shuffle produces them with the highest probability.
Step 1. Cut the deck. The cut position is chosen randomly: each of the 6 possible cut positions (cut before card 1, between cards 1 and 2, ..., after card 5) has probability .
Step 2. Interleave. Given a cut that splits into cards and cards, the number of possible interleavings is . The GSR model assigns equal probability to each interleaving. So after one shuffle, the probability of any specific permutation depends on how many ways it can be obtained by cutting and interleaving.
Step 3. The probability of the identity permutation (no change) after one GSR shuffle of cards is . For : . The uniform probability would be .
What this tells us: after just one shuffle, the identity is about 23 times more likely than it should be under uniform. The deck is far from random.
Check your understanding [Beginner]
Formal definition [Intermediate+]
Let denote the symmetric group on letters (permutations of an -card deck). A rising sequence of a permutation is a maximal consecutive subsequence of that appears in increasing order in the list . Write for the number of rising sequences of .
Definition (Gilbert-Shannon-Reeds distribution). The GSR shuffle is the probability measure on defined as follows:
- Cut. Assign each of the cards independently to a left or right pile with probability each. The left pile has cards with probability .
- Interleave. Given piles of sizes and , drop cards one at a time from the bottom of the two piles, choosing the left pile with probability and the right pile with probability , where and are the number of cards remaining in each pile.
Equivalently (and more useful for computation):
The exact formula, due to Bayer-Diaconis, is cleaner:
but the most useful formulation is the following.
Proposition (GSR distribution via rising sequences). After independent GSR shuffles, the probability of permutation is:
Counterexamples to common slips
- Rising sequences are not cycles. A rising sequence is a subsequence in increasing order, not a cycle in the cycle decomposition. The identity has 1 rising sequence; the reverse permutation has rising sequences.
- GSR is not uniform on . After one shuffle, the identity has probability , far exceeding the uniform probability .
- Perfect shuffle is different. A perfect (Faro) shuffle deterministically interleaves cards one-for-one. The GSR model is random.
Key theorem with proof [Intermediate+]
Theorem (Bayer-Diaconis 1992). After riffle shuffles of a deck of cards, the total variation distance from uniform is
where is the Eulerian number (the number of permutations of with exactly descents). Asymptotically, for with :
where is the standard normal cumulative distribution function.
Proof. The proof has three parts: (i) computing in terms of rising sequences, (ii) converting the total variation sum to an Eulerian number sum, and (iii) the asymptotic analysis.
Part (i): Rising sequence formula. One GSR shuffle corresponds to cutting the deck binomially and interleaving uniformly. After independent shuffles, the resulting permutation has at most rising sequences. The number of permutations with exactly rising sequences that can result from shuffles is the number of ways to write as an ordered union of nonempty increasing subsequences, each counted with multiplicity. This count is . Summing over all possible rising sequence decompositions:
This is a standard counting argument. The in the denominator counts the total number of possible outcomes of independent shuffles (each shuffle has equiprobable cut-and-drop sequences).
Part (ii): Total variation as Eulerian sum. By definition, . The permutations with rising sequences contribute:
The number of permutations with exactly rising sequences equals where are the Eulerian numbers. (Equivalently, the number of permutations of with exactly rising sequences equals the number with exactly descents.)
The total variation becomes:
where is the number of permutations with rising sequences. Using the identity and simplifying the binomial coefficients yields the Eulerian number expression.
Part (iii): Asymptotics. The key observation is that when and when . The transition occurs when , i.e., .
The Eulerian numbers satisfy -related expressions for near . Substituting and applying the normal approximation to the Eulerian distribution yields
Bridge. The Bayer-Diaconis formula builds toward the cutoff phenomenon where the normal distribution shape of the TV curve is universal across random walks on , and appears again in the analysis of other shuffle models. The foundational reason the rising sequence formula works is that the GSR distribution is a convolution power of a measure whose Fourier transform at each irreducible representation of is controlled by the number of rising sequences. This is exactly the Upper Bound Lemma framework applied to a non-conjugacy-class walk; the bridge is between the combinatorial structure of rising sequences and the representation theory of from 07.05.01.
Exercises [Intermediate+]
Advanced results [Master]
Theorem 1 (Rising sequence characterisation of GSR). A permutation can result from a single GSR shuffle if and only if . The probability depends only on . After shuffles, depends only on and equals .
This characterisation is due to Epstein (unpublished) and appears in Bayer-Diaconis 1992. The proof is a direct enumeration of the cut-and-interleave process.
Theorem 2 (Asymptotic cutoff for riffle shuffles). For the GSR shuffle on , the total variation distance satisfies
The cutoff window has width (constant in ). This was established by Bayer-Diaconis 1992 and refined by Lalley 1999.
Theorem 3 (Numerical values for ). The exact total variation distances for a 52-card deck, computed by Bayer and Diaconis:
| Shuffles | |
|---|---|
| 1 | 1.000 |
| 5 | 0.924 |
| 6 | 0.614 |
| 7 | 0.326 |
| 8 | 0.144 |
| 9 | 0.053 |
| 10 | 0.018 |
The sharp drop between 6 and 8 shuffles illustrates the cutoff phenomenon.
Theorem 4 (Inverse riffle shuffle — symmetry). The inverse GSR shuffle (read the permutation backwards) has the same distribution as the forward GSR shuffle. Consequently, the number of rising sequences of equals the number of "falling sequences" of , and the analysis is self-dual.
Theorem 5 (Generalised -shuffles). An -shuffle is the generalisation of the GSR model where the deck is cut into piles (each card independently assigned to one of piles uniformly) and then the piles are dropped in order. An -shuffle followed by a -shuffle is equivalent to an -shuffle. The probability after GSR shuffles (each a 2-shuffle) equals a single -shuffle.
This factorisation property is the key structural fact: is a -shuffle, which gives the rising sequence formula immediately.
Theorem 6 (Lower bound via second eigenvalue — Lalley 1999). For the riffle shuffle on , the second-largest eigenvalue of the transition matrix is (with multiplicity , corresponding to the standard representation ). This gives the lower bound .
Synthesis. The 7-shuffle theorem is the foundational reason that the Bayer-Diaconis rising sequence formula has become the standard benchmark for card-shuffling analysis. The central insight is the factorisation of GSR shuffles into a single -shuffle, which reduces the -fold convolution to a one-step combinatorial computation. Putting these together with the Eulerian number identity (permutations with rising sequences correspond to permutations with descents), the total variation distance becomes a single finite sum amenable to asymptotic analysis. This is exactly the content that generalises to the cutoff phenomenon, where the same sharp transition appears for a wide class of random walks on finite groups. The bridge is between the combinatorics of rising sequences on one side and the representation theory of on the other; the pattern recurs in every shuffle model where the character sum can be evaluated in closed form. Identifying the rising sequence count with the descent count identifies the GSR distribution with a naturally occurring combinatorial distribution on permutations.
Full proof set [Master]
Proposition 1 (-shuffle composition). An -shuffle followed by a -shuffle is equivalent to an -shuffle.
Proof. An -shuffle assigns each card independently to one of labelled piles with probability each, then drops the piles in order (all cards from pile 1, then pile 2, etc.). The result is a permutation where cards from pile retain their relative order and all cards from pile precede all cards from pile for .
After an -shuffle yielding permutation , followed by a -shuffle yielding : the first shuffle places card into pile , and the second places into pile . The composite is equivalent to assigning card to the pair , ordered lexicographically. There are such pairs, and by the independence of the two stages, each card is assigned to each pair with probability . Hence the composite is an -shuffle.
Proposition 2 (Rising sequence formula). The probability of permutation after a single -shuffle is .
Proof. An -shuffle creates a permutation with at most rising sequences: the cards from each of the piles form a single rising sequence. The permutation with rising sequences can be produced by an -shuffle exactly when the pile labels can be assigned to cards so that cards in the same rising sequence get the same label and cards in earlier rising sequences get labels those in later rising sequences. The number of weakly increasing label assignments to the rising sequences from is (stars and bars). Given such a label assignment, the number of ways to assign it to the individual cards is the product of factorials of the rising sequence lengths. The total number of outcomes is (each card independently labelled). The probability simplifies to (after summing over all ways the rising sequences can be distributed among piles).
Connections [Master]
Non-abelian Fourier transform
07.01.09. The Fourier-analytic framework for analysing the GSR shuffle relies on the non-abelian Fourier transform at the irreducible representations of . The rising sequence formula can be derived alternatively via the Fourier transform, providing a representation-theoretic interpretation of the combinatorial result.Symmetric group representation
07.05.01. The representations of indexed by partitions control the eigenvalues of the GSR transition matrix. The standard representation gives the second-largest eigenvalue , and the higher-dimensional representations determine the finer structure of the mixing profile.Character orthogonality
07.01.04. The orthogonality relations for characters of appear in the Fourier-analytic derivation of the rising sequence formula, guaranteeing that the uniform distribution contributes nothing to the non-principal character sums that bound the total variation distance.Schur-Weyl duality
07.05.04. The representation theory of that underpins the shuffle analysis is the same structure captured by Schur-Weyl duality. The partitions indexing the irreducibles in the Upper Bound Lemma are the same partitions that label the Weyl modules in the tensor-power decomposition.
Historical & philosophical context [Master]
The mathematical study of card shuffling originates with Gilbert at Bell Labs in 1955 [Gilbert1955], who formulated the first probabilistic model of the riffle shuffle. Shannon independently developed a similar model (unpublished). Reeds 1981 [Reeds1981] extended the analysis in his unpublished thesis. Diaconis used the representation-theoretic framework in his 1988 monograph [Diaconis1988] to analyse shuffles via the Upper Bound Lemma.
Bayer and Diaconis 1992 [BayerDiaconis1992] gave the exact formula for the total variation distance after shuffles, computed the numerical table for 52 cards, and established the asymptotic cutoff at . Their paper Trailing the Dovetail Shuffle to its Lair in the Annals of Applied Probability remains the definitive reference. The result that "seven shuffles suffice" entered popular mathematics through a 1990 New York Times article preceding the paper's publication.
Bibliography [Master]
@article{BayerDiaconis1992,
author = {Bayer, Dave and Diaconis, Persi},
title = {Trailing the Dovetail Shuffle to its Lair},
journal = {Ann. Appl. Probab.},
volume = {2},
year = {1992},
pages = {294--313},
}
@book{Diaconis1988,
author = {Diaconis, Persi},
title = {Group Representations in Probability and Statistics},
publisher = {Institute of Mathematical Statistics},
year = {1988},
series = {IMS Lecture Notes--Monograph Series},
volume = {11},
}
@techreport{Gilbert1955,
author = {Gilbert, E. N.},
title = {Theory of Shuffling},
institution = {Bell Laboratories},
year = {1955},
type = {Technical Memorandum},
}
@phdthesis{Reeds1981,
author = {Reeds, James},
title = {Theory of Riffle Shuffling},
school = {Harvard University},
year = {1981},
note = {Unpublished manuscript},
}
@article{Lalley1999,
author = {Lalley, Steven P.},
title = {Riffle shuffles and their associated Markov chains},
journal = {Unpublished manuscript},
year = {1999},
}