De Finetti / exchangeability and the symmetric group
Anchor (Master): de Finetti 1937; Hewitt-Savage 1955; Diaconis-Freedman 1980 Ann. Probab. 8
Intuition [Beginner]
Imagine you flip a coin 100 times and record heads and tails. If you shuffle the order of the results, the sequence "looks the same" — the joint probability of any particular pattern depends only on the number of heads, not on where they appear. This symmetry under reordering is called exchangeability.
Exchangeability is a weaker condition than independence. Independent coin flips are exchangeable, but exchangeable sequences can have dependencies — they just have to be symmetric dependencies. De Finetti's theorem says that every infinite exchangeable sequence is a mixture of independent sequences: you first pick a coin with a random bias, then flip that coin independently.
Why does this concept exist? Because exchangeability captures the notion of "symmetric uncertainty" in a way that is both weaker and more realistic than independence, and the symmetric group is the exact algebraic structure encoding this symmetry.
Visual [Beginner]
A diagram showing five binary sequences of length 4, all with the same number of 1s but in different positions. An arrow labelled " acts by permuting positions" connects them, indicating they all have the same probability under exchangeability.
Exchangeability means all sequences with the same number of 1s are equally likely, regardless of the positions of the 1s.
Worked example [Beginner]
Consider sequences of three coin flips, each either H (heads) or T (tails). There are possible sequences.
Step 1. An exchangeable distribution assigns the same probability to all sequences with the same number of heads. There are four types: 0 heads (TTT), 1 head (HTT, THT, TTH), 2 heads (HHT, HTH, THH), and 3 heads (HHH).
Step 2. An exchangeable distribution is specified by four numbers with , where is the probability of each individual sequence with exactly heads.
Step 3. A mixture of i.i.d. coin flips works as follows: pick a bias from some distribution, then flip independently with probability of heads. The probability of a sequence with heads is the total of across all values of , weighted by the distribution of .
What this tells us: de Finetti's theorem says every exchangeable distribution on infinite sequences has this mixture-of-i.i.d. form. For finite sequences, the approximation is close: Diaconis and Freedman showed the error is at most in total variation distance for sequences of length .
Check your understanding [Beginner]
Formal definition [Intermediate+]
Let be a probability space and let be random variables taking values in a measurable space . The sequence is exchangeable if for every permutation :
where denotes equality in distribution. Equivalently, the joint law on is invariant under the action of on by coordinate permutation: .
An infinite sequence is exchangeable if every finite subsequence is exchangeable, for all .
Definition (Mixture of i.i.d.). A probability measure on is a mixture of i.i.d. measures if there exists a probability measure on the space of probability measures on such that:
Every mixture of i.i.d. measures is exchangeable (because the product measure is exchangeable and mixtures preserve exchangeability).
Counterexamples to common slips
- Exchangeability does not imply independence. The Polya urn model produces exchangeable but dependent sequences: drawing balls without replacement from an urn gives exchangeable draws that are negatively correlated.
- Finite exchangeability is strictly weaker than infinite extendibility. A sequence of length can be exchangeable without being extendable to an exchangeable sequence of length . The Diaconis-Freedman theorem quantifies the gap.
- De Finetti's theorem requires infinite sequences. The exact representation as a mixture of i.i.d. measures holds only for infinite exchangeable sequences. For finite sequences, the representation is approximate.
Key theorem with proof [Intermediate+]
Theorem (De Finetti's theorem — de Finetti 1937, Hewitt-Savage 1955). An infinite sequence of -valued random variables is exchangeable if and only if there exists a probability measure on such that for all and all :
Proof. We prove the binary case using the method of moments and the law of large numbers.
Step 1 (Exchangeability implies the frequency is sufficient). Define . By exchangeability, the probability depends only on . So the exchangeable law is determined by the numbers for .
Step 2 (Existence of the mixing measure). Define . By exchangeability and the de Finetti strong law (which we prove next), the empirical frequency converges almost surely to some random variable taking values in . Define as the distribution of .
Step 3 (Conditioning on the limit gives independence). Let be the tail -algebra. By the Kolmogorov zero-one law applied conditionally, is -measurable, and conditional on , the random variables are i.i.d. Bernoulli.
Step 4 (Verify the mixture formula). For any :
Bridge. De Finetti's theorem builds toward the spectral analysis of permutation data in 07.05.11 by identifying exchangeable sequences as the -invariant measures on the product space, and appears again in 07.05.13 where the Gelfand pair structure extends the exchangeability analysis to partial observations. The foundational reason the theorem works is that the tail -algebra provides the mixing measure, which is exactly the quotient of the product space by the infinite symmetric group action. The central insight is that exchangeability is invariance under , and this is exactly the condition that makes the Fourier analysis on the symmetric group relevant to probability theory; the bridge is between the group-theoretic notion of -invariance and the probabilistic notion of conditional independence, putting these together via the representation-theoretic machinery of 07.05.05.
Exercises [Intermediate+]
Advanced results [Master]
Theorem 1 (Hewitt-Savage 1955: general de Finetti theorem). Let be a standard Borel space. An infinite sequence of -valued random variables is exchangeable if and only if there exists a random probability measure on (with distribution on ) such that conditional on , the sequence is i.i.d. with law :
The mixing measure is unique. This extends de Finetti's binary theorem to arbitrary measurable spaces.
Theorem 2 (Diaconis-Freedman 1980: finite exchangeability approximation). For any exchangeable distribution on (where is finite), there exists a mixture of i.i.d. measures on such that . For binary sequences (), the bound is .
This was proved by Diaconis and Freedman 1980 in Ann. Probab. 8 using a direct combinatorial comparison between exchangeable and mixture-of-i.i.d. distributions.
Theorem 3 (Exchangeability and the Choquet theorem). The set of exchangeable measures on is a convex set. By the Choquet theorem, every exchangeable measure is an integral over the extreme points. The extreme points are the ergodic measures (measures that cannot be decomposed as a non-identity convex combination of other exchangeable measures). For infinite sequences, de Finetti's theorem identifies the extreme points as the product measures .
Theorem 4 (Aldous 1985: exchangeability and representation theory). The set of exchangeable probability measures on can be identified with the set of -invariant measures on , which in turn corresponds to the positive cone in the fixed-point subspace of acting on the space of measures. The spectral decomposition of this fixed-point subspace via the irreducible representations of gives the structure of the exchangeable measures.
This was developed by Aldous 1985 in Exchangeability and Related Topics and connects the probabilistic notion to the representation-theoretic framework.
Theorem 5 (Exchangeability and sufficiency). For an exchangeable sequence , the order statistic (the sorted values) is a sufficient statistic. This is a consequence of exchangeability: the likelihood depends on the data only through the empirical distribution.
Theorem 6 (Exchangeable pairs and Stein's method). An exchangeable pair is a pair of random variables such that . Stein's method for proving central limit theorems uses exchangeable pairs: if is exchangeable and , then is approximately Gaussian with variance controlled by and .
This was developed by Stein 1972 and systematised by Chatterjee 2008 in Stein's Method for Concentration Inequalities.
Synthesis. De Finetti's theorem provides the foundational reason that the symmetric group enters probability theory through exchangeability. The central insight is that -invariance of the joint law on is the probabilistic content of exchangeability, and putting these together with the Choquet theorem, the extreme -invariant measures are the product measures. This is exactly the content that builds toward the spectral analysis in 07.05.11 where the -Fourier transform decomposes exchangeable functions, and appears again in the partial ranking framework of 07.05.13 where -invariance replaces pure exchangeability. The bridge is between the group-theoretic notion of invariance and the probabilistic notion of conditional independence; the pattern generalises from binary sequences to arbitrary state spaces via the Hewitt-Savage theorem, and identifies the representation theory of as the natural language for symmetric dependence structures. The Diaconis-Freedman bound quantifies how closely finite exchangeability approximates the mixture-of-i.i.d. ideal, putting a precise number on the gap between the finite and infinite theories.
Full proof set [Master]
Proposition 1 (Exchangeable measures are -invariant). A probability measure on is exchangeable if and only if it is invariant under the action of on by coordinate permutation.
Proof. By definition, is exchangeable iff for every permutation and every measurable rectangle :
This is precisely the statement that for all and , which is -invariance of the measure.
Proposition 2 (Mixtures of i.i.d. are exchangeable). If is any probability measure on and , then is exchangeable.
Proof. For any and measurable :
The third equality uses that the product is commutative, so reordering the factors does not change the value.
Connections [Master]
Spectral analysis of permutation data
07.05.11. Exchangeable measures on are the -invariant measures, and the spectral decomposition of with respect to the -action provides the Fourier-analytic framework for studying them. The first-order spectral component in07.05.11corresponds to the marginal distribution (the "mean" of the mixing measure), and higher components correspond to higher moments of the mixing measure.Random walk upper bound lemma
07.05.05. The Upper Bound Lemma bounds the distance from an -invariant measure to uniformity by character sums, which is a spectral bound on exchangeable measures. The mixing time of random walks on finite groups produces exchangeable distributions at each time step, and the spectral analysis of exchangeability connects to the convergence analysis in07.05.05.Partially ranked data
07.05.13. The Gelfand pair extends the exchangeability framework: partial ranking data is exchangeable under a larger symmetry group (the subgroup permuting unranked items among themselves). The spectral decomposition of partial rankings in07.05.13is the Gelfand pair analogue of the de Finetti decomposition for full rankings.Character orthogonality
07.01.04. The character orthogonality relations of07.01.04are the analytical tool that makes the spectral decomposition of exchangeable measures work. The characters span the space of -invariant functions on , and the orthogonality relations ensure the decomposition is unique and explicit.
Historical & philosophical context [Master]
De Finetti introduced the concept of exchangeability and proved the representation theorem for binary sequences in his 1937 paper La prévision: ses lois logiques, ses sources subjectives [deFinetti1937] published in Ann. Inst. H. Poincaré 7. His motivation was to provide a subjective Bayesian foundation for probability: instead of assuming a "true" parameter , one assumes exchangeable observations and derives the existence of as a mathematical consequence.
Hewitt and Savage 1955 generalised de Finetti's theorem to arbitrary standard Borel spaces in Symmetric Measures on Cartesian Products [HewittSavage1955], establishing the result in its modern form using the Choquet theorem on extreme points of convex sets.
Diaconis and Freedman 1980 proved the finite exchangeability approximation in Finite Exchangeable Sequences [DiaconisFreedman1980] published in Ann. Probab. 8, showing that the total variation distance between an exchangeable distribution on variables and the closest mixture of i.i.d. is at most . This result quantifies the practical relevance of de Finetti's theorem for finite data.
Bibliography [Master]
@article{deFinetti1937,
author = {de Finetti, Bruno},
title = {La pr\'{e}vision: ses lois logiques, ses sources subjectives},
journal = {Ann. Inst. H. Poincar\'{e}},
volume = {7},
year = {1937},
pages = {1--68},
}
@article{HewittSavage1955,
author = {Hewitt, Edwin and Savage, Leonard J.},
title = {Symmetric Measures on Cartesian Products},
journal = {Trans. Amer. Math. Soc.},
volume = {80},
year = {1955},
pages = {470--501},
}
@article{DiaconisFreedman1980,
author = {Diaconis, Persi and Freedman, David},
title = {Finite Exchangeable Sequences},
journal = {Ann. Probab.},
volume = {8},
year = {1980},
pages = {739--759},
}
@book{Aldous1985,
author = {Aldous, David J.},
title = {Exchangeability and Related Topics},
publisher = {Springer},
year = {1985},
series = {Lecture Notes in Mathematics},
volume = {1117},
}