07.05.07 · representation-theory / symmetric

Riffle shuffle and the 7-shuffle theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Bayer-Diaconis 1992 Ann. Appl. Probab. 2; Gilbert 1955; Shannon (unpublished); Reeds 1981 (unpublished thesis)

Intuition [Beginner]

How many times must you shuffle a deck of cards before it is well-mixed? Bayer and Diaconis proved in 1992 that seven riffle shuffles suffice for a standard 52-card deck. Fewer than seven leave patterns; many more are wasted effort.

A riffle shuffle models what real card players do: cut the deck roughly in half, then let cards fall from the two halves, interleaving them. The mathematical model (due to Gilbert, Shannon, and Reeds) makes this precise: cut the deck at a random position, then drop cards one at a time from each half, choosing which half to drop from with probability proportional to how many cards remain in that half.

The result is striking because the transition is sharp. After six shuffles the deck is far from random; after seven it is close; after ten it is essentially indistinguishable from a perfectly random permutation. The answer "seven" comes from an exact formula for the shuffle probabilities on $S_{52}$ in terms of rising sequences, a combinatorial structure closely tied to the representation theory of the symmetric group.

Visual [Beginner]

A diagram showing the total variation distance from uniform for a 52-card deck as a function of the number of shuffles. The distance stays close to 1 for the first few shuffles, drops sharply around seven shuffles, and approaches 0 by twelve shuffles.

The sharp drop is characteristic of what probabilists call a cutoff phenomenon: the walk stays far from uniform for a long time, then converges rapidly in a narrow window.

Worked example [Beginner]

Take a deck of just 5 cards, labelled $1, 2, 3, 4, 5$. After one riffle shuffle, some orderings are much more likely than others. The original ordering $1, 2, 3, 4, 5$ is the most likely, because every cut position can reproduce it. The reverse $5, 4, 3, 2, 1$ cannot occur at all after a single shuffle: each of the two packets keeps its internal order, whereas a full reversal would flip the order of every pair of cards.

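Step 1. Cut the deck. The cut position is chosen binomially: of the 6 possible cut positions (cut before card 1, between cards 1 and 2, ..., after card 5), the one leaving $k$ cards in the top packet has probability $\binom{5}{k} / 2^{5}$, so cuts near the middle are the most likely.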

Step 2. Interleave. Given a cut that splits into $k$ cards and $5 - k$ cards, the number of possible interleavings is $\binom{5}{k}$. The GSR model assigns equal probability to each interleaving. So after one shuffle, the probability of any specific permutation $π$ depends on how many ways it can be obtained by cutting and interleaving.

Step 3. The probability of the identity permutation (no change) after one GSR shuffle of $n$ cards is $(n + 1) / 2^{n}$ . For $n = 5$ : $6/32 = 3/16 \approx 0.188$ . The uniform probability would be $1/120 \approx 0.0083$ .

What this tells us: after just one shuffle, the identity is about 23 times more likely than it should be under uniform. The deck is far from random.
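The three steps can be checked by brute force. The following is a minimal sketch in Python, assuming the binomial cut and uniform interleaving of the GSR model; the function name `gsr_one_shuffle` is illustrative and not taken from any library. It enumerates all $2^{n}$ equally likely cut-and-interleave outcomes and tallies the resulting orderings.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def gsr_one_shuffle(n):
    """Exact distribution of one GSR shuffle of the deck 1..n (illustrative helper)."""
    counts = Counter()
    for k in range(n + 1):                      # size of the top packet
        top = list(range(1, k + 1))
        bottom = list(range(k + 1, n + 1))
        # Choose which k of the n final positions receive cards from the top packet.
        for slots in combinations(range(n), k):
            slot_set = set(slots)
            t, b = iter(top), iter(bottom)
            ordering = tuple(next(t) if pos in slot_set else next(b)
                             for pos in range(n))
            counts[ordering] += 1
    # Each (cut, interleaving) pair has probability C(n,k)/2^n * 1/C(n,k) = 1/2^n.
    return {ordering: Fraction(c, 2 ** n) for ordering, c in counts.items()}


if __name__ == "__main__":
    dist = gsr_one_shuffle(5)
    print(dist[(1, 2, 3, 4, 5)], len(dist))
```

For $n = 5$ the identity collects 6 of the 32 outcomes, every other reachable ordering collects exactly one, and the reversed deck never appears.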

Check your understanding [Beginner]

Exercise (easy, multiple choice).

The Bayer-Diaconis "7-shuffle theorem" states that for a standard 52-card deck:

A. Exactly 7 shuffles always produce the uniform distribution
B. 7 riffle shuffles bring the total variation distance from uniform to a relatively small value
C. 7 is the minimum number of shuffles needed for any deck size
D. After 7 shuffles, the deck is guaranteed to be in a random order

Hint

The total variation distance does not reach exactly zero after any finite number of shuffles; it merely becomes small.

Answer

B. Feedback-correct: after 7 shuffles of a 52-card deck, the total variation distance drops to approximately 0.33, which Bayer and Diaconis identified as a practical threshold. Feedback-wrong: A is false because the distribution never reaches exact uniformity in finite steps; C is false because the number depends on deck size; D is false because the distribution is close to, but not exactly, uniform.

Formal definition [Intermediate+]

Let $S_{n}$ denote the symmetric group on $n$ letters (permutations of an $n$-card deck). A rising sequence of a permutation $π \in S_{n}$ is a maximal run of consecutive values $i, i + 1, \dots, j$ from $\{1, 2, \dots, n\}$ that appear in increasing order, left to right, in the list $π(1), π(2), \dots, π(n)$. Write $r(π)$ for the number of rising sequences of $π$.
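As a concrete illustration of the definition, here is a small Python sketch (the helper name `rising_sequences` is an assumption made for this example). It counts rising sequences by checking, for each value $v$, whether $v + 1$ sits to the left of $v$ in the list; each such break starts a new rising sequence.

```python
def rising_sequences(ordering):
    """Number of rising sequences of an ordering of the cards 1..n."""
    position = {card: i for i, card in enumerate(ordering)}
    n = len(ordering)
    # A new rising sequence starts at v + 1 whenever v + 1 appears before v.
    breaks = sum(1 for v in range(1, n) if position[v + 1] < position[v])
    return 1 + breaks


if __name__ == "__main__":
    # 1, 2, 3 appear left to right, and so do 4, 5: two rising sequences.
    print(rising_sequences((1, 4, 2, 5, 3)))
```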

Definition (Gilbert-Shannon-Reeds distribution). The GSR shuffle is the probability measure $R$ on $S_{n}$ defined as follows:

Cut. Cut the deck into a left pile (the top $k$ cards) and a right pile (the remaining $n - k$ cards), where the left pile has $k$ cards with probability $\binom{n}{k} / 2^{n}$. This is the pile-size distribution obtained by assigning each of the $n$ cards independently to the left or right pile with probability $1/2$ each.
Interleave. Given piles of sizes $k$ and $n - k$, drop cards one at a time from the bottom of the two piles, choosing the left pile with probability $k_L / (k_L + k_R)$ and the right pile with probability $k_R / (k_L + k_R)$, where $k_L$ and $k_R$ are the numbers of cards currently remaining in the left and right piles.
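The cut and interleave steps can be turned into a sampler. The sketch below is one possible Python rendering under stated assumptions: the function name `gsr_shuffle` and the optional `rng` argument are hypothetical, and cards are dealt from the tops of the piles rather than the bottoms, which gives the same uniform distribution over interleavings by symmetry.

```python
import random


def gsr_shuffle(deck, rng=random):
    """Return one GSR riffle shuffle of `deck` (a list); illustrative sketch."""
    n = len(deck)
    # Cut: the left pile size is Binomial(n, 1/2).
    k = sum(1 for _ in range(n) if rng.random() < 0.5)
    left, right = list(deck[:k]), list(deck[k:])
    out = []
    i = j = 0
    while i < len(left) or j < len(right):
        remaining_left = len(left) - i
        remaining_right = len(right) - j
        # Interleave: pick a pile with probability proportional to its size.
        if rng.random() < remaining_left / (remaining_left + remaining_right):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out


if __name__ == "__main__":
    print(gsr_shuffle(list(range(1, 11)), random.Random(0)))
```

Passing a seeded `random.Random` instance makes runs reproducible; every output is a rearrangement of the input with at most two rising sequences.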

Equivalently (and more useful for computation), the probability depends only on $r(π)$:

$$R(π) = \begin{cases} \dfrac{n + 1}{2^{n}} & \text{if } π \text{ is the identity } (r(π) = 1), \\ \dfrac{1}{2^{n}} & \text{if } r(π) = 2, \\ 0 & \text{if } r(π) \geq 3. \end{cases}$$

The exact formula, due to Bayer-Diaconis, is cleaner and covers all three cases at once:

$$R(π) = \frac{\binom{2 + n - r(π)}{n}}{2^{n}},$$

but the most useful formulation is the following.

Proposition (GSR distribution via rising sequences). After $k$ independent GSR shuffles, the probability of permutation $π$ is:

$$R^{*k}(π) = \frac{\binom{2^{k} + n - r(π)}{n}}{2^{kn}}.$$
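For small decks the proposition can be checked against an exact computation. The following Python sketch is an illustration under assumptions (all function names are invented for this example): it builds the distribution after $k$ shuffles by repeatedly applying every cut-and-interleave outcome, then compares each probability with the binomial formula.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def rising_sequences(ordering):
    position = {card: i for i, card in enumerate(ordering)}
    return 1 + sum(1 for v in range(1, len(ordering))
                   if position[v + 1] < position[v])


def shuffle_outcomes(deck):
    """Yield all 2^n equally likely outcomes of one GSR shuffle of `deck`."""
    n = len(deck)
    for k in range(n + 1):
        for slots in combinations(range(n), k):
            slot_set = set(slots)
            top, bottom = iter(deck[:k]), iter(deck[k:])
            yield tuple(next(top) if p in slot_set else next(bottom)
                        for p in range(n))


def exact_distribution(n, k):
    """Exact law of the deck after k GSR shuffles, starting from 1..n."""
    dist = {tuple(range(1, n + 1)): Fraction(1)}
    for _ in range(k):
        new = defaultdict(Fraction)
        for deck, weight in dist.items():
            for out in shuffle_outcomes(deck):
                new[out] += weight / 2 ** n
        dist = new
    return dist


def formula(ordering, k):
    n = len(ordering)
    return Fraction(comb(2 ** k + n - rising_sequences(ordering), n),
                    2 ** (k * n))


def check(n, k):
    dist = exact_distribution(n, k)
    return all(dist.get(p, 0) == formula(p, k)
               for p in permutations(range(1, n + 1)))


if __name__ == "__main__":
    print(check(4, 2))
```

The enumeration grows like $n! \cdot 2^{n}$ per round, so this is only a sanity check for very small $n$.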

Counterexamples to common slips

Rising sequences are not cycles. A rising sequence is a subsequence in increasing order, not a cycle in the cycle decomposition. The identity has 1 rising sequence; the reverse permutation has $n$ rising sequences.
GSR is not uniform on $S_{n}$ . After one shuffle, the identity has probability $(n + 1) / 2^{n}$ , far exceeding the uniform probability $1/ n!$ .
Perfect shuffle is different. A perfect (Faro) shuffle deterministically interleaves cards one-for-one. The GSR model is random.

Key theorem with proof [Intermediate+]

Theorem (Bayer-Diaconis 1992). After $k$ riffle shuffles of a deck of $n$ cards, the total variation distance from uniform is

$$∥ R^{*k} - U ∥_{TV} = \frac{1}{2} \sum_{r = 1}^{n} A_{n, r - 1} \left| \frac{\binom{2^{k} + n - r}{n}}{2^{kn}} - \frac{1}{n!} \right|$$

where $A_{n, j}$ is the Eulerian number (the number of permutations of $n$ with exactly $j$ descents). Asymptotically, for $n \to \infty$ with $k = \frac{3}{2} \log_{2} n + c$:

$$∥ R^{*k} - U ∥_{TV} = 1 - 2Φ\left(\frac{-2^{-c}}{4\sqrt{3}}\right) + o(1)$$

where $Φ$ is the standard normal cumulative distribution function.

Proof. The proof has three parts: (i) computing $R^{* k} (π)$ in terms of rising sequences, (ii) converting the total variation sum to an Eulerian number sum, and (iii) the asymptotic analysis.

Part (i): Rising sequence formula. One GSR shuffle corresponds to cutting the deck binomially and interleaving uniformly. By the composition property of $a$-shuffles, $k$ independent shuffles are equivalent to a single $2^{k}$-shuffle, which cuts the deck into $2^{k}$ (possibly empty) packets and interleaves them uniformly, so the resulting permutation has at most $2^{k}$ rising sequences. Each of the $2^{kn}$ cut-and-interleave outcomes is equally likely, and an outcome produces $π$ exactly when a cut point falls at each of the $r(π) - 1$ boundaries between consecutive rising sequences of $π$. The remaining $2^{k} - r(π)$ cut points may be placed anywhere among the $n + 1$ gaps of the deck, which by stars and bars can be done in $\binom{2^{k} + n - r(π)}{n}$ ways. Hence:

$$R^{*k}(π) = \frac{\binom{2^{k} + n - r(π)}{n}}{2^{kn}}.$$

This is a standard counting argument. The $2^{k n}$ in the denominator counts the total number of possible outcomes of $k$ independent shuffles (each shuffle has $2^{n}$ equiprobable cut-and-drop sequences).

Part (ii): Total variation as Eulerian sum. By definition, $∥ R^{* k} - U ∥_{TV} = \frac{1}{2} \sum_{π} ∣ R^{* k} (π) - 1/ n! ∣$ . The permutations with $r$ rising sequences contribute:

$$\sum_{π : r(π) = r} R^{*k}(π) = \frac{\binom{2^{k} + n - r}{n}}{2^{kn}} \cdot \left| \{π : r(π) = r\} \right|.$$

The number of permutations with exactly $r$ rising sequences equals $A_{n, r - 1}$, where $A_{n, j}$ are the Eulerian numbers. (Equivalently, the number of permutations of $n$ with exactly $r$ rising sequences equals the number with exactly $r - 1$ descents, because $r(π)$ is one more than the number of descents of $π^{-1}$ and inversion is a bijection of $S_{n}$.)

The total variation becomes:

$$∥ R^{*k} - U ∥_{TV} = \frac{1}{2} \sum_{r = 1}^{n} \left| \frac{\binom{2^{k} + n - r}{n}}{2^{kn}} \cdot N_{r} - \frac{N_{r}}{n!} \right|$$

where $N_{r}$ is the number of permutations with $r$ rising sequences. Substituting $N_{r} = A_{n, r - 1}$ yields the Eulerian number expression.

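Part (iii): Asymptotics. Expanding the binomial coefficient gives $R^{*k}(π) = \frac{1}{n!} \prod_{i = 0}^{n - 1} \left(1 + \frac{n - r(π) - i}{2^{k}}\right)$, so $R^{*k}(π) \approx 1/n!$ for every $π$ once $2^{k} ≫ n^{2}$, while the identity ($r(π) = 1$) has the largest probability, $\binom{2^{k} + n - 1}{n} / 2^{kn}$. For total variation only typical permutations matter: these have $r(π) = \frac{n + 1}{2} + h$ with $h$ of order $\sqrt{n}$, and the part of $\log(n! \, R^{*k}(π))$ that varies with $π$ is approximately $-hn / 2^{k}$, of order $n^{3/2} / 2^{k}$. The transition therefore occurs when $2^{k} \approx n^{3/2}$, i.e., $k \approx \frac{3}{2} \log_{2} n$.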

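The Eulerian distribution $j \mapsto A_{n, j} / n!$ is asymptotically normal with mean $(n - 1)/2$ and variance $(n + 1)/12$, so the number of rising sequences of a uniformly random permutation is approximately Gaussian around $n/2$. Substituting $k = \frac{3}{2} \log_{2} n + c$ and applying this normal approximation to the Eulerian sum yields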

$$∥ R^{*k} - U ∥_{TV} = 1 - 2Φ\left(\frac{-2^{-c}}{4\sqrt{3}}\right) + o(1).$$ □

Bridge. The Bayer-Diaconis formula builds toward the cutoff phenomenon where the normal distribution shape of the TV curve is universal across random walks on $S_{n}$ , and appears again in the analysis of other shuffle models. The foundational reason the rising sequence formula works is that the GSR distribution is a convolution power of a measure whose Fourier transform at each irreducible representation of $S_{n}$ is controlled by the number of rising sequences. This is exactly the Upper Bound Lemma framework applied to a non-conjugacy-class walk; the bridge is between the combinatorial structure of rising sequences and the representation theory of $S_{n}$ from 07.05.01.

Exercises [Intermediate+]

Exercise 3 (medium, multiple choice).

After $k$ riffle shuffles of $n$ cards, a permutation with $r(π) = 1$ (the identity) has probability $R^{*k}(π) = \binom{2^{k} + n - 1}{n} / 2^{kn}$. For $n = 52$ and $k = 7$, which best describes this probability relative to the uniform $1/52!$?

A. They are approximately equal
B. The identity is about 2 times more likely than uniform
C. The identity is much more likely than uniform
D. The identity is less likely than uniform

Hint

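$\binom{2^{k} + n - 1}{n} / 2^{kn} = \frac{1}{n!} \prod_{j = 0}^{n - 1} \left(1 + \frac{j}{2^{k}}\right)$, which is close to $(2^{k})^{n} / (n! \cdot 2^{kn}) = 1/n!$ only when $2^{k} ≫ n^{2}$.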

Answer

C. For $k = 7$, $2^{7} = 128$ is not large enough compared with $n = 52$ for the approximation $\binom{2^{k} + n - 1}{n} / 2^{kn} \approx 1/n!$ to hold. The probability of the identity is $\binom{179}{52} / 2^{364}$, which remains much larger than $1/52!$. The identity is still significantly over-represented after 7 shuffles, though the total variation distance across all permutations has dropped to about 0.33.
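The size of the over-representation can be computed exactly with integer arithmetic. This is a minimal Python sketch (the function name `identity_ratio` is hypothetical) that returns $n! \cdot R^{*k}(\text{identity})$ as an exact fraction.

```python
from fractions import Fraction
from math import comb, factorial


def identity_ratio(n, k):
    """Ratio of the identity's probability after k shuffles to the uniform 1/n!."""
    p_identity = Fraction(comb(2 ** k + n - 1, n), 2 ** (k * n))
    return p_identity * factorial(n)


if __name__ == "__main__":
    print(float(identity_ratio(52, 7)))
```

For $n = 52$, $k = 7$ the ratio is on the order of $10^{4}$, which is why option C is the right description even though the overall total variation distance is already moderate.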

Exercise 7 (hard, symbolic).

Let $R$ be the GSR distribution on $S_{n}$. Show that $R^{*2}(π)$ depends only on $r(π)$, and prove that $R^{*2}(π) = \binom{4 + n - r(π)}{n} / 4^{n}$.

Hint

Two independent GSR shuffles produce a permutation with at most 4 rising sequences. The composition of two GSR shuffles is again a GSR-like distribution: cut into 4 piles (by cutting twice) and interleave.

Answer

Each GSR shuffle can create at most 2 rising sequences (one from each half of the cut). Two independent shuffles can create at most $2 \times 2 = 4$ rising sequences. The combined distribution is obtained by treating the two-stage process as a single operation: by the composition property, two 2-shuffles are equivalent to one 4-shuffle, which cuts the deck into 4 (possibly empty) packets using 3 cut points and interleaves them, each of the $4^{n}$ cut-and-interleave outcomes being equally likely. An outcome yields $π$ exactly when a cut point falls at each of the $r(π) - 1$ boundaries between consecutive rising sequences; the remaining $4 - r(π)$ cut points can be placed freely among the $n + 1$ gaps, in $\binom{4 + n - r(π)}{n}$ ways. The count depends on $π$ only through $r(π)$, and dividing by $4^{n}$ gives $R^{*2}(π) = \binom{4 + n - r(π)}{n} / 4^{n}$.

Advanced results [Master]

Theorem 1 (Rising sequence characterisation of GSR). A permutation $π \in S_{n}$ can result from a single GSR shuffle if and only if $r(π) \leq 2$. The probability $R(π)$ depends only on $r(π)$. After $k$ shuffles, $R^{*k}(π)$ depends only on $r(π)$ and equals $\binom{2^{k} + n - r(π)}{n} / 2^{kn}$.

This characterisation is due to Epstein (unpublished) and appears in Bayer-Diaconis 1992. The proof is a direct enumeration of the cut-and-interleave process.

Theorem 2 (Asymptotic cutoff for riffle shuffles). For the GSR shuffle on $S_{n}$ , the total variation distance satisfies

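$$\lim_{n \to \infty} ∥ R^{*k_{n}} - U ∥_{TV} = \begin{cases} 1 & \text{if } k_{n} \leq \lfloor \frac{3}{2} \log_{2} n - c_{n} \rfloor, \\ 0 & \text{if } k_{n} \geq \lceil \frac{3}{2} \log_{2} n + c_{n} \rceil, \end{cases}$$

for any sequence $c_{n} \to \infty$. For a fixed offset $c$ the limit lies strictly between 0 and 1 and is given by the Bayer-Diaconis asymptotic formula.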

The cutoff window has width $O (1)$ (constant in $n$ ). This was established by Bayer-Diaconis 1992 and refined by Lalley 1999.

Theorem 3 (Numerical values for $n = 52$ ). The exact total variation distances for a 52-card deck, computed by Bayer and Diaconis:

Shuffles $k$	$∥ R^{* k} - U ∥_{TV}$
1	1.000
5	0.924
6	0.614
7	0.334
8	0.167
9	0.085
10	0.043

The sharp drop between 6 and 8 shuffles illustrates the cutoff phenomenon.
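The exact distances can be recomputed from the rising sequence formula together with the Eulerian numbers. The sketch below is one way to do it in Python with exact rational arithmetic; the function names are illustrative, and the Eulerian numbers are generated with the standard recurrence $A_{m, j} = (j + 1) A_{m - 1, j} + (m - j) A_{m - 1, j - 1}$.

```python
from fractions import Fraction
from math import comb, factorial


def eulerian_row(n):
    """Return [A(n,0), ..., A(n,n-1)]: permutations of n counted by descents."""
    row = [1]                                   # A(1, 0) = 1
    for m in range(2, n + 1):
        new = []
        for j in range(m):
            same = row[j] if j < m - 1 else 0   # A(m-1, j)
            prev = row[j - 1] if j >= 1 else 0  # A(m-1, j-1)
            new.append((j + 1) * same + (m - j) * prev)
        row = new
    return row


def tv_distance(n, k):
    """Exact total variation distance from uniform after k GSR shuffles."""
    a = 2 ** k
    eulerian = eulerian_row(n)
    uniform = Fraction(1, factorial(n))
    total = Fraction(0)
    for r in range(1, n + 1):
        # Probability of any one permutation with r rising sequences.
        p = Fraction(comb(a + n - r, n), a ** n)
        # There are A(n, r-1) such permutations.
        total += eulerian[r - 1] * abs(p - uniform)
    return total / 2


if __name__ == "__main__":
    for k in (1, 5, 6, 7, 8, 9, 10):
        print(k, f"{float(tv_distance(52, k)):.3f}")
```

Because everything is kept as `Fraction` values until the final print, there is no floating-point cancellation in the differences $|p - 1/n!|$, which are far below double precision for $n = 52$.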

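Theorem 4 (Inverse riffle shuffle and symmetry). The inverse GSR shuffle assigns to each permutation $π$ the probability $R(π^{-1})$. Since the number of rising sequences of $π$ is one more than the number of descents of $π^{-1}$, the probabilities of the inverse shuffle depend only on the number of descents. Because the uniform distribution is invariant under inversion, the forward and inverse walks are at the same total variation distance from uniform after every number of steps, so either one can be analysed in place of the other.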

Theorem 5 (Generalised $a$-shuffles). An $a$-shuffle is the generalisation of the GSR model where the deck is cut into $a$ packets with multinomially distributed sizes (the sizes obtained by assigning each card independently and uniformly to one of $a$ piles) and the packets are then riffled together, all interleavings being equally likely. An $a$-shuffle followed by a $b$-shuffle is equivalent to an $ab$-shuffle. In particular, $k$ GSR shuffles (each a 2-shuffle) have the same distribution as a single $2^{k}$-shuffle.

This factorisation property is the key structural fact: $R^{* k}$ is a $2^{k}$ -shuffle, which gives the rising sequence formula immediately.

Theorem 6 (Lower bound via second eigenvalue, Lalley 1999). For the riffle shuffle on $S_{n}$, the second-largest eigenvalue of the transition matrix is $1/2$, with multiplicity $\binom{n}{2}$ (more generally, the eigenvalue $2^{-j}$ has multiplicity equal to the number of permutations in $S_{n}$ with $n - j$ cycles). This gives the lower bound $∥ R^{* k} - U ∥_{TV} \geq \frac{1}{2} (1 - e^{- (n - 1) / 2^{k}})$.

Synthesis. The 7-shuffle theorem is the foundational reason that the Bayer-Diaconis rising sequence formula has become the standard benchmark for card-shuffling analysis. The central insight is the factorisation of $k$ GSR shuffles into a single $2^{k}$ -shuffle, which reduces the $k$ -fold convolution to a one-step combinatorial computation. Putting these together with the Eulerian number identity (permutations with $r$ rising sequences correspond to permutations with $r - 1$ descents), the total variation distance becomes a single finite sum amenable to asymptotic analysis. This is exactly the content that generalises to the cutoff phenomenon, where the same sharp transition appears for a wide class of random walks on finite groups. The bridge is between the combinatorics of rising sequences on one side and the representation theory of $S_{n}$ on the other; the pattern recurs in every shuffle model where the character sum can be evaluated in closed form. Identifying the rising sequence count with the descent count identifies the GSR distribution with a naturally occurring combinatorial distribution on permutations.

Full proof set [Master]

Proposition 1 ( $a$ -shuffle composition). An $a$ -shuffle followed by a $b$ -shuffle is equivalent to an $ab$ -shuffle.

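Proof. It is convenient to argue with inverse shuffles: inverting a composition reverses the order of its factors, so it suffices to show that an inverse $b$-shuffle followed by an inverse $a$-shuffle is an inverse $ab$-shuffle. An inverse $a$-shuffle assigns each card independently to one of $a$ labelled piles with probability $1/a$ each, then stacks the piles in order (all cards from pile 1, then pile 2, etc.). In the result, cards from the same pile retain their relative order and all cards from pile $i$ precede all cards from pile $j$ for $i < j$; in other words, the deck is stably sorted by independent uniform labels.

Perform an inverse $b$-shuffle and then an inverse $a$-shuffle, and write $B_{i} \in \{1, \dots, b\}$ and $A_{i} \in \{1, \dots, a\}$ for the labels that card $i$ receives in the first and second stage. The second stable sort orders the cards by $A_{i}$ and breaks ties using the order left by the first stage, that is, by $B_{i}$. The composite is therefore a stable sort by the pair $(A_{i}, B_{i})$, ordered lexicographically. There are $ab$ such pairs, and by the independence of the two stages, each card is assigned to each pair with probability $1/(ab)$, independently of the other cards. Hence the composite is an inverse $ab$-shuffle, and taking inverses gives the statement for forward shuffles. $□$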
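The pile-labelling description used in this proof lends itself to an exhaustive check on small decks. The following Python sketch is an illustration under assumptions (function names invented here): it models a pile-labelling shuffle as a stable sort by labels, enumerates every labelling, and compares the outcome counts of a two-stage shuffle with those of a single shuffle using $ab$ labels.

```python
from collections import Counter
from itertools import product


def sort_by_labels(deck, labels):
    """Stack piles in label order; Python's sort is stable, so ties keep their order."""
    order = sorted(range(len(deck)), key=lambda i: labels[i])
    return tuple(deck[i] for i in order)


def single_stage_counts(n, a):
    """Outcome counts over all a^n labellings of an n-card deck."""
    deck = tuple(range(n))
    return Counter(sort_by_labels(deck, labels)
                   for labels in product(range(a), repeat=n))


def two_stage_counts(n, a, b):
    """Outcome counts when an a-labelling stage is followed by a b-labelling stage."""
    deck = tuple(range(n))
    counts = Counter()
    for first in product(range(a), repeat=n):
        middle = sort_by_labels(deck, first)
        for second in product(range(b), repeat=n):
            counts[sort_by_labels(middle, second)] += 1
    return counts


if __name__ == "__main__":
    print(two_stage_counts(4, 2, 2) == single_stage_counts(4, 4))
```

Both sides enumerate $(ab)^{n}$ equally likely labellings, so equality of the counters is equality of the two distributions.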

Proposition 2 (Rising sequence formula). The probability of permutation $π$ after a single $a$-shuffle is $\frac{\binom{a + n - r(π)}{n}}{a^{n}}$.

Proof. An $a$-shuffle cuts the deck into $a$ consecutive packets (some possibly empty) and interleaves them, so it creates a permutation with at most $a$ rising sequences: the cards from each packet stay in order. Every pair (cut, interleaving) has probability $1/a^{n}$, because a cut with packet sizes $(n_{1}, \dots, n_{a})$ has multinomial probability $\binom{n}{n_{1}, \dots, n_{a}} / a^{n}$ and each of its $\binom{n}{n_{1}, \dots, n_{a}}$ interleavings is equally likely. A given cut can produce the permutation $π$ with $r(π) = r$, and then through exactly one interleaving, precisely when no packet contains two consecutive values $i, i + 1$ with $i + 1$ appearing before $i$ in $π$, that is, when a cut point falls at each of the $r - 1$ boundaries between rising sequences. The remaining $a - 1 - (r - 1) = a - r$ cut points can be placed freely among the $n + 1$ gaps of the deck, repetitions allowed, in $\binom{n + a - r}{n}$ ways (stars and bars). Dividing by the total number of outcomes $a^{n}$ gives $\binom{a + n - r}{n} / a^{n}$; if $r > a$ no valid cut exists and the binomial coefficient is zero. $□$
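The stars-and-bars count in this argument can be confirmed by direct enumeration. The sketch below (Python, with illustrative function names that are not from any library) lists every multiset of $a - 1$ cut points among the $n + 1$ gaps, keeps those that cut at every boundary between rising sequences, and compares the tally with $\binom{a + n - r}{n}$.

```python
from itertools import combinations_with_replacement, permutations
from math import comb


def rising_sequences(ordering):
    position = {card: i for i, card in enumerate(ordering)}
    return 1 + sum(1 for v in range(1, len(ordering))
                   if position[v + 1] < position[v])


def compatible_cuts(ordering, a):
    """Count placements of a-1 cut points that can produce `ordering`."""
    n = len(ordering)
    position = {card: i for i, card in enumerate(ordering)}
    # Gap g separates the values g and g+1; it must be cut when g+1 precedes g.
    forced = {g for g in range(1, n) if position[g + 1] < position[g]}
    count = 0
    # Gaps 0..n, repetitions allowed (repeated cuts give empty packets).
    for cuts in combinations_with_replacement(range(n + 1), a - 1):
        if forced <= set(cuts):
            count += 1
    return count


def check_counts(n, a):
    return all(
        compatible_cuts(p, a) == comb(a + n - rising_sequences(p), n)
        for p in permutations(range(1, n + 1))
    )


if __name__ == "__main__":
    print(check_counts(4, 3))
```

Dividing each count by $a^{n}$ recovers the probability in the proposition.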

Connections [Master]

Non-abelian Fourier transform 07.01.09. The Fourier-analytic framework for analysing the GSR shuffle relies on the non-abelian Fourier transform at the irreducible representations of $S_{n}$ . The rising sequence formula can be derived alternatively via the Fourier transform, providing a representation-theoretic interpretation of the combinatorial result.
Symmetric group representation 07.05.01. The representations of $S_{n}$ indexed by partitions control the eigenvalues of the GSR transition matrix. The standard representation $(n - 1, 1)$ gives the second-largest eigenvalue $1/2$ , and the higher-dimensional representations determine the finer structure of the mixing profile.
Character orthogonality 07.01.04. The orthogonality relations for characters of $S_{n}$ appear in the Fourier-analytic derivation of the rising sequence formula, guaranteeing that the uniform distribution contributes nothing to the non-principal character sums that bound the total variation distance.
Schur-Weyl duality 07.05.04. The representation theory of $S_{n}$ that underpins the shuffle analysis is the same structure captured by Schur-Weyl duality. The partitions indexing the irreducibles in the Upper Bound Lemma are the same partitions that label the Weyl modules in the tensor-power decomposition.

Historical & philosophical context [Master]

The mathematical study of card shuffling originates with Gilbert at Bell Labs in 1955 ^{[Gilbert1955]}, who formulated the first probabilistic model of the riffle shuffle. Shannon independently developed a similar model (unpublished). Reeds 1981 ^{[Reeds1981]} extended the analysis in his unpublished thesis. Diaconis used the representation-theoretic framework in his 1988 monograph ^{[Diaconis1988]} to analyse shuffles via the Upper Bound Lemma.

Bayer and Diaconis 1992 ^{[BayerDiaconis1992]} gave the exact formula for the total variation distance after $k$ shuffles, computed the numerical table for 52 cards, and established the asymptotic cutoff at $\frac{3}{2} \log_{2} n$. Their paper "Trailing the Dovetail Shuffle to its Lair" in the Annals of Applied Probability remains the definitive reference. The result that "seven shuffles suffice" entered popular mathematics through a 1990 New York Times article preceding the paper's publication.

Bibliography [Master]

@article{BayerDiaconis1992,
  author = {Bayer, Dave and Diaconis, Persi},
  title = {Trailing the Dovetail Shuffle to its Lair},
  journal = {Ann. Appl. Probab.},
  volume = {2},
  year = {1992},
  pages = {294--313},
}

@book{Diaconis1988,
  author = {Diaconis, Persi},
  title = {Group Representations in Probability and Statistics},
  publisher = {Institute of Mathematical Statistics},
  year = {1988},
  series = {IMS Lecture Notes--Monograph Series},
  volume = {11},
}

@techreport{Gilbert1955,
  author = {Gilbert, E. N.},
  title = {Theory of Shuffling},
  institution = {Bell Laboratories},
  year = {1955},
  type = {Technical Memorandum},
}

@phdthesis{Reeds1981,
  author = {Reeds, James},
  title = {Theory of Riffle Shuffling},
  school = {Harvard University},
  year = {1981},
  note = {Unpublished manuscript},
}

@article{Lalley1999,
  author = {Lalley, Steven P.},
  title = {Riffle shuffles and their associated Markov chains},
  journal = {Unpublished manuscript},
  year = {1999},
}

Prerequisites

07.01.09
07.05.01

Tier anchors

beginner: Bayer-Diaconis 1992 informal; popular accounts of the 7-shuffle theorem
intermediate: Diaconis Group Representations in Probability and Statistics Ch. 3; Bayer-Diaconis 1992 Ann. Appl. Probab.
master: Bayer-Diaconis 1992 Ann. Appl. Probab. 2; Gilbert 1955; Shannon (unpublished); Reeds 1981 (unpublished thesis)

References

Bayer, D. and Diaconis, P. — Trailing the Dovetail Shuffle to its Lair · Ann. Appl. Probab. 2 (1992), 294-313
Diaconis, P. — Group Representations in Probability and Statistics · IMS Lecture Notes Vol. 11 (1988), Ch. 3
Gilbert, E. — Theory of Shuffling (technical memorandum) · Bell Labs 1955
Reeds, J. — Theory of Riffle Shuffling (unpublished manuscript) · 1981

Reviewer

TBD

Estimated time

beginner: 15m
intermediate: 40m
master: 80m