07.05.14 · representation-theory / symmetric

De Finetti / exchangeability and the symmetric group

shipped · 3 tiers · Lean: none

Anchor (Master): de Finetti 1937; Hewitt-Savage 1955; Diaconis-Freedman 1980 Ann. Probab. 8

Intuition [Beginner]

Imagine you flip a coin 100 times and record heads and tails. If you shuffle the order of the results, the sequence "looks the same" — the joint probability of any particular pattern depends only on the number of heads, not on where they appear. This symmetry under reordering is called exchangeability.

Exchangeability is a weaker condition than independence. Independent coin flips are exchangeable, but exchangeable sequences can have dependencies — they just have to be symmetric dependencies. De Finetti's theorem says that every infinite exchangeable sequence is a mixture of independent sequences: you first pick a coin with a random bias, then flip that coin independently.

Why does this concept exist? Because exchangeability captures the notion of "symmetric uncertainty" in a way that is both weaker and more realistic than independence, and the symmetric group $S_{n}$ is the exact algebraic structure encoding this symmetry.

Visual [Beginner]

A diagram showing five binary sequences of length 4, all with the same number of 1s but in different positions. An arrow labelled " $S_{4}$ acts by permuting positions" connects them, indicating they all have the same probability under exchangeability.

Exchangeability means all sequences with the same number of 1s are equally likely, regardless of the positions of the 1s.

Worked example [Beginner]

Consider sequences of three coin flips, each either H (heads) or T (tails). There are $2^{3} = 8$ possible sequences.

Step 1. An exchangeable distribution assigns the same probability to all sequences with the same number of heads. There are four types: 0 heads (TTT), 1 head (HTT, THT, TTH), 2 heads (HHT, HTH, THH), and 3 heads (HHH).

Step 2. An exchangeable distribution is specified by four numbers $p_{0}, p_{1}, p_{2}, p_{3}$ with $p_{0} + 3 p_{1} + 3 p_{2} + p_{3} = 1$ , where $p_{k}$ is the probability of each individual sequence with exactly $k$ heads.

Step 3. A mixture of i.i.d. coin flips works as follows: pick a bias $θ$ from some distribution, then flip independently with probability $θ$ of heads. The probability of a sequence with $k$ heads is the total of $θ^{k} (1 - θ)^{3 - k}$ across all values of $θ$ , weighted by the distribution of $θ$ .

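What this tells us: de Finetti's theorem says every exchangeable distribution on infinite sequences has this mixture-of-i.i.d. form. For finite sequences the representation is only approximate: Diaconis and Freedman showed that the first $k$ coordinates of an exchangeable binary sequence of length $n$ are within $4 k / n$ in variation distance of a mixture of i.i.d. coin flips, so the approximation is good when $k$ is small compared with $n$.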
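
The mixture construction in Step 3 can be checked by direct enumeration. The sketch below assumes, purely for illustration, a two-point mixing distribution (bias 1/4 or 3/4, each with weight 1/2); this choice and the names `MIXING` and `mixture_prob` are hypothetical and do not come from the text.

```python
from fractions import Fraction
from itertools import product

# Hypothetical mixing distribution: bias 1/4 or 3/4, each with weight 1/2.
MIXING = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

def mixture_prob(seq, mixing=MIXING):
    """Probability of one specific H/T sequence under the mixture of i.i.d. flips."""
    k, n = seq.count("H"), len(seq)
    # Average theta^k (1 - theta)^(n - k) over the mixing distribution.
    return sum(w * theta**k * (1 - theta)**(n - k) for theta, w in mixing.items())

# Exact probabilities of all 8 sequences of three flips.
probs = {"".join(s): mixture_prob("".join(s)) for s in product("HT", repeat=3)}

# Probability that the first two flips are both heads.
p_hh = probs["HHH"] + probs["HHT"]
```

With this choice every sequence with one or two heads has probability 3/32 and the two constant sequences have probability 7/32, so the eight values sum to 1. The flips are exchangeable but not independent: `p_hh` equals 5/16, whereas independent flips with the same marginal (1/2) would give 1/4.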

Check your understanding [Beginner]

Exercise (easy, multiple choice).

A sequence of random variables $X_{1}, X_{2}, \dots, X_{n}$ is exchangeable if:

A. The variables are independent
B. The joint distribution is unchanged by any permutation of the indices
C. Each variable has the same marginal distribution
D. The variables are normally distributed

Hint

Exchangeability means the random vector $(X_{1}, X_{2}, \dots, X_{n})$ has the same law as $(X_{σ (1)}, X_{σ (2)}, \dots, X_{σ (n)})$ for every permutation $σ$.

Answer

B. Feedback-correct: exchangeability is invariance of the joint distribution under the action of $S_{n}$ permuting the coordinates. Feedback-wrong: A implies exchangeability but is strictly stronger; C is necessary but not sufficient (variables can have the same marginals without being exchangeable); D is unrelated to the symmetry structure.

Exercise (easy, numeric).

For exchangeable binary sequences of length $n = 4$ (each entry 0 or 1), how many free parameters does the distribution have? An exchangeable distribution is determined by the probabilities of the number of 1s, which ranges from 0 to $n$ .

Hint

The exchangeable distribution is determined by $p_{0}, p_{1}, p_{2}, p_{3}, p_{4}$ subject to one constraint (they sum to 1, weighted by the binomial coefficients). How many free parameters remain?

Answer

4. The distribution is specified by the per-sequence probabilities $p_{0}, p_{1}, p_{2}, p_{3}, p_{4}$, where $p_{k}$ is the probability of each individual sequence with exactly $k$ ones. There are $\binom{4}{k}$ such sequences, so the normalisation constraint is $\binom{4}{0} p_{0} + \binom{4}{1} p_{1} + \binom{4}{2} p_{2} + \binom{4}{3} p_{3} + \binom{4}{4} p_{4} = p_{0} + 4 p_{1} + 6 p_{2} + 4 p_{3} + p_{4} = 1$. Five values minus one constraint leaves 4 free parameters.
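
The count of 4 free parameters can be confirmed by grouping the 16 binary sequences of length 4 by their number of ones. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

n = 4
# Number of sequences in {0,1}^n for each possible count of ones.
orbit_sizes = Counter(sum(seq) for seq in product((0, 1), repeat=n))

# One probability per count of ones, minus the single normalisation constraint.
free_parameters = len(orbit_sizes) - 1
```

The class sizes are 1, 4, 6, 4, 1, which are the binomial coefficients appearing in the normalisation constraint.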

Formal definition [Intermediate+]

Let $(Ω, F, P)$ be a probability space and let $X_{1}, X_{2}, \dots, X_{n}$ be random variables taking values in a measurable space $(X, B)$ . The sequence is exchangeable if for every permutation $σ \in S_{n}$ :

$$(X_{1}, X_{2}, \dots, X_{n}) \overset{d}{=} (X_{σ (1)}, X_{σ (2)}, \dots, X_{σ (n)}),$$

where $\overset{d}{=}$ denotes equality in distribution. Equivalently, the joint law $μ$ on $X^{n}$ is invariant under the action of $S_{n}$ on $X^{n}$ by coordinate permutation: $σ \cdot (x_{1}, \dots, x_{n}) = (x_{σ (1)}, \dots, x_{σ (n)})$.

An infinite sequence $(X_{i})_{i = 1}^{\infty}$ is exchangeable if every finite subsequence $(X_{1}, \dots, X_{n})$ is exchangeable, for all $n$ .

Definition (Mixture of i.i.d.). A probability measure $μ$ on $X^{n}$ is a mixture of i.i.d. measures if there exists a probability measure $ν$ on the space $P (X)$ of probability measures on $X$ such that:

$$μ (A_{1} \times \dots \times A_{n}) = \int_{P (X)} \prod_{i = 1}^{n} ρ (A_{i}) \, d ν (ρ) .$$

Every mixture of i.i.d. measures is exchangeable (because the product measure $ρ^{n}$ is exchangeable and mixtures preserve exchangeability).

Counterexamples to common slips

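Exchangeability does not imply independence. Urn models produce exchangeable but dependent sequences: drawing balls without replacement from an urn gives exchangeable draws that are negatively correlated, while the Polya urn (draw a ball, then return it with one additional ball of the same colour) gives exchangeable draws that are positively correlated.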
Finite exchangeability is strictly weaker than infinite extendibility. A sequence of length $n$ can be exchangeable without being extendable to an exchangeable sequence of length $n + 1$ . The Diaconis-Freedman theorem quantifies the gap.
De Finetti's theorem requires infinite sequences. The exact representation as a mixture of i.i.d. measures holds only for infinite exchangeable sequences. For finite sequences, the representation is approximate.
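
The dependence in the first counterexample can be made concrete by computing the covariance of the first two draws exactly for two urn schemes. This is a sketch under stated assumptions: the function name `cov_first_two`, the `step` parameter, and the urn sizes are illustrative choices, with `step = -1` modelling sampling without replacement and `step = +1` modelling a reinforcing (Polya-type) urn.

```python
from fractions import Fraction

def cov_first_two(a, b, step):
    """Cov(X1, X2) for red-indicator draws from an urn with a red and b black balls.

    step = -1: the drawn ball is removed (sampling without replacement).
    step = +1: the drawn ball is returned with one extra ball of its colour.
    """
    p1 = Fraction(a, a + b)                        # P(X1 = 1)
    p11 = p1 * Fraction(a + step, a + b + step)    # P(X1 = 1, X2 = 1)
    # Both schemes are exchangeable, so P(X2 = 1) = P(X1 = 1) = p1.
    return p11 - p1 * p1

without_replacement = cov_first_two(2, 3, step=-1)
reinforcing_urn = cov_first_two(2, 3, step=+1)
```

Both schemes give exchangeable draws, and in both the covariance is non-zero, so the draws are dependent; the sign of the dependence differs between the two schemes.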

Key theorem with proof [Intermediate+]

Theorem (De Finetti's theorem: de Finetti 1937, Hewitt-Savage 1955). An infinite sequence $(X_{i})_{i = 1}^{\infty}$ of $\{0, 1\}$-valued random variables is exchangeable if and only if there exists a probability measure $ν$ on $[0, 1]$ such that for all $n$ and all $(x_{1}, \dots, x_{n}) \in \{0, 1\}^{n}$:

$$P (X_{1} = x_{1}, \dots, X_{n} = x_{n}) = \int_{0}^{1} \prod_{i = 1}^{n} θ^{x_{i}} (1 - θ)^{1 - x_{i}} \, d ν (θ) .$$

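Proof. We prove the binary case by conditioning on the limiting frequency of ones, whose existence follows from a strong law for exchangeable sequences. The converse direction (every such mixture is exchangeable) is immediate, since the integrand depends on $(x_{1}, \dots, x_{n})$ only through the number of ones.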

Step 1 (Exchangeability implies the frequency is sufficient). Define $S_{n} = X_{1} + \dots + X_{n}$ . By exchangeability, the probability $P (X_{1} = x_{1}, \dots, X_{n} = x_{n})$ depends only on $s_{n} = x_{1} + \dots + x_{n}$ . So the exchangeable law is determined by the numbers $P (S_{n} = k)$ for $k = 0, 1, \dots, n$ .

Step 2 (Existence of the mixing measure). Define $\bar{X}_{n} = S_{n} / n$. By the strong law for exchangeable sequences (a consequence of the reverse martingale convergence theorem, taken as given here), the empirical frequency $\bar{X}_{n}$ converges almost surely to a random variable $Θ$ taking values in $[0, 1]$. Define $ν$ as the distribution of $Θ$.

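Step 3 (Conditioning on the limit gives independence). Let $T = \bigcap_{n = 1}^{\infty} σ (X_{n + 1}, X_{n + 2}, \dots)$ be the tail $σ$-algebra. The limit of the empirical frequencies is unchanged when finitely many coordinates are altered, so $Θ$ is $T$-measurable. (The Kolmogorov zero-one law does not apply directly, since the $X_{i}$ need not be independent and $Θ$ is in general non-degenerate.) Exchangeability together with a reverse-martingale argument then shows that, conditional on $Θ = θ$, the random variables $X_{1}, X_{2}, \dots$ are i.i.d. Bernoulli $(θ)$.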

Step 4 (Verify the mixture formula). For any $(x_{1}, \dots, x_{n})$ :

$$P (X_{1} = x_{1}, \dots, X_{n} = x_{n}) = E [P (X_{1} = x_{1}, \dots, X_{n} = x_{n} \mid Θ)] = E \left[ \prod_{i = 1}^{n} Θ^{x_{i}} (1 - Θ)^{1 - x_{i}} \right] = \int_{0}^{1} \prod_{i = 1}^{n} θ^{x_{i}} (1 - θ)^{1 - x_{i}} \, d ν (θ) . \quad □$$
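
The mixture formula can be evaluated exactly for a concrete mixing measure. The sketch below assumes, for illustration only, that $ν$ is the uniform distribution on $[0, 1]$ (a choice not made in the text) and integrates the polynomial $θ^{k} (1 - θ)^{n - k}$ term by term; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def seq_prob_uniform_mixing(k, n):
    """Exact integral of theta^k (1 - theta)^(n - k) over [0, 1].

    This is the probability of one specific binary sequence of length n with
    k ones when the mixing measure is uniform on [0, 1] (an assumed example).
    """
    m = n - k
    # Expand (1 - theta)^m by the binomial theorem and integrate each monomial.
    return sum(Fraction((-1) ** j * comb(m, j), k + j + 1) for j in range(m + 1))

# Probability of a specific length-3 sequence with exactly one 1.
p = seq_prob_uniform_mixing(1, 3)

# Distribution of the number of ones S_3 under this mixture.
count_law = [comb(3, k) * seq_prob_uniform_mixing(k, 3) for k in range(4)]
```

Under uniform mixing each sequence with $k$ ones has probability $1 / ((n + 1) \binom{n}{k})$, so the number of ones is uniformly distributed on $\{0, \dots, n\}$; the value depends on the sequence only through $k$, as exchangeability requires.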

Bridge. De Finetti's theorem builds toward the spectral analysis of permutation data in 07.05.11 by identifying exchangeable sequences as the $S_{n}$ -invariant measures on the product space, and appears again in 07.05.13 where the Gelfand pair structure extends the exchangeability analysis to partial observations. The foundational reason the theorem works is that the tail $σ$ -algebra provides the mixing measure, which is exactly the quotient of the product space by the infinite symmetric group action. The central insight is that exchangeability is invariance under $S_{n}$ , and this is exactly the condition that makes the Fourier analysis on the symmetric group relevant to probability theory; the bridge is between the group-theoretic notion of $S_{n}$ -invariance and the probabilistic notion of conditional independence, putting these together via the representation-theoretic machinery of 07.05.05.

Exercises [Intermediate+]

Exercise 1 (easy, numeric).

For an exchangeable sequence of 3 binary random variables with $P (X_{1} = 1) = 0.4$ and $P (X_{1} = 1, X_{2} = 1) = 0.2$ , compute $P (X_{1} = 1, X_{2} = 0, X_{3} = 1)$ . Use exchangeability to note that this equals $P (X_{1} = 1, X_{2} = 1, X_{3} = 0)$ .

Hint

By exchangeability, $P (X_{1} = 1, X_{2} = 1, X_{3} = 0) = P (X_{1} = 1, X_{2} = 0, X_{3} = 1)$ . Also $P (S_{3} = 2) = 3 \cdot P (X_{1} = 1, X_{2} = 1, X_{3} = 0)$ and $P (X_{1} = 1, X_{2} = 1) = P (S_{2} = 2) = P (X_{1} = 1, X_{2} = 1, X_{3} = 0) + P (X_{1} = 1, X_{2} = 1, X_{3} = 1)$ .

Answer

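Write $p_{k}$ for the probability of each individual sequence with exactly $k$ ones, so that by exchangeability $P (110) = P (101) = P (011) = p_{2}$ and $P (100) = P (010) = P (001) = p_{1}$.

The two given values translate to $P (X_{1} = 1) = P (100) + P (110) + P (101) + P (111) = p_{1} + 2 p_{2} + p_{3} = 0.4$ and $P (X_{1} = 1, X_{2} = 1) = P (110) + P (111) = p_{2} + p_{3} = 0.2$. Subtracting gives $p_{1} + p_{2} = 0.2$, so $P (101) = p_{2} = 0.2 - P (111)$.

The data therefore determine $P (101)$ only up to one free parameter: every value $p_{2} \in [0, 0.2]$ is attained by some exchangeable distribution (take $p_{1} = p_{3} = 0.2 - p_{2}$ and $p_{0} = 0.2 + p_{2}$). The i.i.d. Bernoulli(0.4) model is not consistent with the data, since it would give $P (X_{1} = 1, X_{2} = 1) = 0.16 \neq 0.2$. Fixing one more quantity, for example $P (111)$, pins down the answer.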

Exercise 3 (medium, symbolic).

Show that the Polya urn model (start with $a$ red and $b$ black balls; at each step, draw a ball and return it with one additional ball of the same colour) produces an exchangeable sequence of colour indicators.

Hint

Compute $P (X_{1} = x_{1}, \dots, X_{n} = x_{n})$ and show it depends only on the number of red draws, not on their positions.

Answer

Let $s_{n} = \sum_{i = 1}^{n} x_{i}$ be the number of red draws. The probability of a specific sequence $(x_{1}, \dots, x_{n})$ with $s_{n}$ red draws is:

$$P (x_{1}, \dots, x_{n}) = \frac{a (a + 1) \cdots (a + s_{n} - 1) \cdot b (b + 1) \cdots (b + n - s_{n} - 1)}{(a + b) (a + b + 1) \cdots (a + b + n - 1)} .$$

The numerator depends only on $s_{n}$ (the count of red draws) and the denominator depends only on $n$ . Since any permutation of $(x_{1}, \dots, x_{n})$ has the same value of $s_{n}$ , the probability is the same, proving exchangeability.
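
The argument can be checked by simulating the urn step by step in exact arithmetic. This is a minimal sketch: `polya_seq_prob` is an illustrative name, and the starting composition of one red and one black ball is an arbitrary choice.

```python
from fractions import Fraction
from itertools import permutations

def polya_seq_prob(seq, a, b):
    """Probability of a draw sequence (1 = red, 0 = black) from a Polya urn
    that starts with a red and b black balls."""
    red, black, prob = a, b, Fraction(1)
    for x in seq:
        # Chance of drawing the recorded colour from the current urn.
        prob *= Fraction(red if x else black, red + black)
        # Return the ball together with one extra ball of the same colour.
        if x:
            red += 1
        else:
            black += 1
    return prob

# All orderings of two red and two black draws receive the same probability.
values = {polya_seq_prob(order, 1, 1) for order in permutations((1, 1, 0, 0))}
```

The set `values` contains a single element, matching the closed-form product: the factors in the numerator appear in a different order for different sequences, but their product is the same.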

Exercise 5 (medium, multiple choice).

The connection between de Finetti's theorem and the symmetric group is:

A. Exchangeability is invariance of the joint distribution under the coordinate permutation action of $S_{n}$
B. The mixing measure is the character table of $S_{n}$
C. The symmetric group appears only in the proof, not in the statement
D. De Finetti's theorem is about representations of $S_{n}$

Hint

The definition of exchangeability is that the joint distribution on $X^{n}$ is invariant under $σ \cdot (x_{1}, \dots, x_{n}) = (x_{σ (1)}, \dots, x_{σ (n)})$ for all $σ \in S_{n}$ .

Answer

A. Feedback-correct: exchangeability is precisely the statement that the joint law is an $S_{n}$ -invariant measure on the product space, which makes the symmetric group the intrinsic symmetry group of the problem. Feedback-wrong: B confuses the mixing measure with the representation theory; C understates the role of $S_{n}$ ; D overstates it — the theorem is about probability measures, but the $S_{n}$ -invariance is the entry point.

Exercise 6 (medium, symbolic).

Let $μ$ be an $S_{n}$-invariant probability measure on $X^{n}$ (i.e., an exchangeable distribution). Show that the projection of $μ$ onto the first $k$ coordinates (the joint marginal of $(X_{1}, \dots, X_{k})$) determines the projection onto any $k$-subset of coordinates, for $k \leq n$.

Hint

Use exchangeability to permute any $k$-subset into positions $\{1, \dots, k\}$.

Answer

For any subset $I = \{i_{1}, \dots, i_{k}\} \subseteq \{1, \dots, n\}$ with $|I| = k$, choose a permutation $σ \in S_{n}$ with $σ (j) = i_{j}$ for $j = 1, \dots, k$. By exchangeability, $(X_{σ (1)}, \dots, X_{σ (n)}) \overset{d}{=} (X_{1}, \dots, X_{n})$, and restricting both sides to the first $k$ coordinates gives $(X_{i_{1}}, \dots, X_{i_{k}}) \overset{d}{=} (X_{1}, \dots, X_{k})$. So the joint distribution of any $k$ coordinates equals the joint distribution of the first $k$ coordinates. In particular, all one-dimensional marginals coincide with the marginal of $X_{1}$. The marginal of $X_{1}$ alone does not determine the $k$-dimensional marginals for $k \geq 2$: i.i.d. fair coin flips and the sequence $X_{1} = X_{2} = \dots = X_{n}$ with $X_{1}$ a fair coin flip are both exchangeable with the same one-dimensional marginal.
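
The equality of marginals over different coordinate subsets can be verified on a small example. The sketch below assumes a particular exchangeable law on $\{0, 1\}^{4}$ (per-sequence probability $1 / (5 \binom{4}{k})$ for a sequence with $k$ ones, chosen only for illustration); `marginal` is a hypothetical helper.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

n = 4
# Assumed exchangeable pmf: the probability depends only on the number of ones.
pmf = {x: Fraction(1, (n + 1) * comb(n, sum(x))) for x in product((0, 1), repeat=n)}

def marginal(pmf, coords):
    """Joint law of the coordinates listed in coords (0-based, in that order)."""
    out = defaultdict(Fraction)
    for x, p in pmf.items():
        out[tuple(x[i] for i in coords)] += p
    return dict(out)

first_two = marginal(pmf, (0, 1))
other_pair = marginal(pmf, (3, 1))
```

Here `first_two` and `other_pair` are identical dictionaries, as the permutation argument predicts for any pair of distinct coordinates.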

Exercise 7 (hard, symbolic).

Prove that for binary exchangeable sequences of length $n$, the set of exchangeable distributions on $\{0, 1\}^{n}$ is a simplex whose extreme points are the hypergeometric (urn) distributions: for each $k \in \{0, 1, \dots, n\}$, the extreme point $δ_{k}$ assigns probability $1 / \binom{n}{k}$ to each sequence with exactly $k$ ones.

Hint

Show that (a) the hypergeometric distributions are extreme points, and (b) every exchangeable distribution is a convex combination of them. For (a), use that an extreme point of the set of $S_{n}$ -invariant measures must be supported on a single $S_{n}$ -orbit.

Answer

The exchangeable distributions on $\{0, 1\}^{n}$ are precisely the $S_{n}$-invariant probability measures. The group $S_{n}$ acts on $\{0, 1\}^{n}$ with $n + 1$ orbits $O_{0}, O_{1}, \dots, O_{n}$, where $O_{k}$ consists of all sequences with exactly $k$ ones, so $|O_{k}| = \binom{n}{k}$. An $S_{n}$-invariant measure must be constant on each orbit, so it is determined by the numbers $μ (O_{k}) / |O_{k}|$ for $k = 0, \dots, n$, subject to $\sum_{k} μ (O_{k}) = 1$. Let $δ_{k}$ be the uniform distribution on $O_{k}$: it assigns probability $1 / \binom{n}{k}$ to each element of $O_{k}$ and 0 to all other elements. Every exchangeable distribution can then be written as $μ = \sum_{k = 0}^{n} α_{k} δ_{k}$ with $α_{k} = μ (O_{k}) \geq 0$ and $\sum_{k} α_{k} = 1$, and the weights are unique because the $δ_{k}$ have disjoint supports. Hence the set of exchangeable distributions is an $n$-dimensional simplex whose $n + 1$ extreme points are $δ_{0}, \dots, δ_{n}$.
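
The orbit decomposition can be carried out explicitly for a small case. This is a sketch under stated assumptions: the per-sequence probabilities in `per_seq` are a hypothetical exchangeable law on $\{0, 1\}^{3}$, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb

def orbit_weights(pmf, n):
    """alpha_k = total mass of the orbit O_k (sequences with exactly k ones)."""
    alpha = [Fraction(0)] * (n + 1)
    for x, p in pmf.items():
        alpha[sum(x)] += p
    return alpha

def extreme_point(k, n):
    """delta_k: the uniform distribution on O_k, as a dict over {0,1}^n."""
    return {x: (Fraction(1, comb(n, k)) if sum(x) == k else Fraction(0))
            for x in product((0, 1), repeat=n)}

# Hypothetical exchangeable pmf: per-sequence probability indexed by number of ones.
n = 3
per_seq = [Fraction(7, 32), Fraction(3, 32), Fraction(3, 32), Fraction(7, 32)]
pmf = {x: per_seq[sum(x)] for x in product((0, 1), repeat=n)}

alpha = orbit_weights(pmf, n)
# Rebuild the pmf as the convex combination sum_k alpha_k * delta_k.
rebuilt = {x: sum(alpha[k] * extreme_point(k, n)[x] for k in range(n + 1)) for x in pmf}
```

The weights `alpha` are non-negative and sum to 1, and `rebuilt` coincides with `pmf`, which is the convex decomposition described in the answer.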

Exercise 8 (hard, symbolic).

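State the Diaconis-Freedman finite exchangeability approximation theorem for binary sequences, and prove the following elementary version of it: if $(X_{1}, \dots, X_{n})$ is exchangeable with values in $\{0, 1\}$ and $μ_{k}$ denotes the law of $(X_{1}, \dots, X_{k})$ for $k \leq n$, then there exists a mixture of i.i.d. measures $μ_{k}^{*}$ on $\{0, 1\}^{k}$ such that $∥ μ_{k} - μ_{k}^{*} ∥_{TV} \leq k (k - 1) / (2 n)$, where $∥ μ - ν ∥_{TV} = \sup_{A} |μ (A) - ν (A)|$.

Hint

Decompose $μ$ into the uniform distributions on the orbits $O_{j}$ (Exercise 7), place the mixing mass at the frequencies $j / n$, and compare sampling without replacement (hypergeometric) with sampling with replacement (binomial).

Answer

Statement. Diaconis and Freedman (1980) show that $μ_{k}$ lies within $2 c k / n$ in variation distance of a mixture of i.i.d. measures when the variables take $c$ values ($4 k / n$ in the binary case), and within $k (k - 1) / n$ in general. No comparable bound holds for $k = n$: the uniform distribution on the sequences with exactly $n / 2$ ones gives probability 1 to the event $\{S_{n} = n / 2\}$, while every mixture of i.i.d. measures gives it probability at most $\binom{n}{n / 2} 2^{- n}$, which tends to 0.

Proof of the elementary bound. Let $μ$ be exchangeable on $\{0, 1\}^{n}$ and write $μ = \sum_{j = 0}^{n} α_{j} δ_{j}$ with $α_{j} = μ (O_{j})$, where $δ_{j}$ is the uniform distribution on $O_{j}$. Under $δ_{j}$, the first $k$ coordinates are $k$ draws without replacement from an urn containing $j$ ones and $n - j$ zeros; call their law $H_{j}$. Let $B_{j}$ be the law of $k$ draws with replacement from the same urn, that is, of $k$ i.i.d. Bernoulli $(j / n)$ variables. Define the mixing measure $ν^{*}$ on $[0, 1]$ by putting mass $α_{j}$ at the point $j / n$, and set $μ_{k}^{*} = \sum_{j} α_{j} B_{j}$, which is a mixture of i.i.d. measures. Then $μ_{k} - μ_{k}^{*} = \sum_{j} α_{j} (H_{j} - B_{j})$, so $∥ μ_{k} - μ_{k}^{*} ∥_{TV} \leq \max_{j} ∥ H_{j} - B_{j} ∥_{TV}$.

The key step is the comparison of the two sampling schemes. Draw $k$ balls with replacement; conditional on the event that the $k$ draws hit distinct balls, the sample has the law of $k$ draws without replacement. Hence $∥ H_{j} - B_{j} ∥_{TV}$ is at most the probability that some ball is drawn twice, which by a union bound over pairs of draws is at most $\binom{k}{2} / n = k (k - 1) / (2 n)$.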
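
The gap between sampling without and with replacement can be computed exactly for small urns. The sketch below is illustrative: the function name `tv_with_vs_without` and the urn sizes are assumptions, and the distance is taken in the convention $\sup_{A} |μ (A) - ν (A)|$. Because both sampling laws are exchangeable, the distance between the sequence laws equals the distance between the laws of the number of ones, which is what the code compares.

```python
from fractions import Fraction
from math import comb

def tv_with_vs_without(j, n, k):
    """Total variation distance between k draws without replacement and k draws
    with replacement from an urn holding j ones among n balls."""
    theta = Fraction(j, n)
    total = Fraction(0)
    for s in range(k + 1):
        # Hypergeometric: s ones among k draws without replacement.
        hyper = Fraction(comb(j, s) * comb(n - j, k - s), comb(n, k))
        # Binomial: s ones among k independent draws with success chance j/n.
        binom = comb(k, s) * theta**s * (1 - theta) ** (k - s)
        total += abs(hyper - binom)
    return total / 2

small_k = tv_with_vs_without(10, 20, 2)    # two draws from a balanced urn of 20
full_k = tv_with_vs_without(10, 20, 20)    # all 20 balls drawn
```

For two draws the distance is 1/38, below the elementary coupling bound $k (k - 1) / (2 n) = 1/20$ (a standard birthday-style estimate, not the sharper Diaconis-Freedman bound). When all 20 balls are drawn the distance exceeds 0.8, so the comparison is informative only when $k$ is small relative to $n$.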

Advanced results [Master]

Theorem 1 (Hewitt-Savage 1955: general de Finetti theorem). Let $X$ be a standard Borel space. An infinite sequence $(X_{i})_{i \geq 1}$ of $X$ -valued random variables is exchangeable if and only if there exists a random probability measure $ρ$ on $X$ (with distribution $ν$ on $P (X)$ ) such that conditional on $ρ$ , the sequence is i.i.d. with law $ρ$ :

$$P (X_{1} \in A_{1}, \dots, X_{n} \in A_{n}) = \int_{P (X)} \prod_{i = 1}^{n} ρ (A_{i}) \, d ν (ρ) .$$

The mixing measure $ν$ is unique. This extends de Finetti's binary theorem to standard Borel spaces.

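Theorem 2 (Diaconis-Freedman 1980: finite exchangeability approximation). Let $μ$ be an exchangeable distribution on $X^{n}$ with $X$ finite, and for $k \leq n$ let $μ_{k}$ be its marginal on the first $k$ coordinates. Then there exists a mixture of i.i.d. measures $μ_{k}^{*}$ on $X^{k}$ such that $∥ μ_{k} - μ_{k}^{*} ∥_{TV} \leq 2 |X| k / n$. For binary sequences ($|X| = 2$) the bound is $4 k / n$; for general $X$ the bound $k (k - 1) / n$ holds. The approximation is useful only when $k$ is small relative to $n$: the full law $μ$ itself need not be close to any mixture of i.i.d. measures.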

This was proved by Diaconis and Freedman 1980 in Ann. Probab. 8 using a direct combinatorial comparison between exchangeable and mixture-of-i.i.d. distributions.

Theorem 3 (Exchangeability and the Choquet theorem). The set of exchangeable measures on $X^{n}$ is a convex set. By the Choquet theorem, every exchangeable measure is an integral over the extreme points. The extreme points are the ergodic measures (measures that cannot be written as a nontrivial convex combination of other exchangeable measures). For infinite sequences, de Finetti's theorem identifies the extreme points as the product measures $ρ^{\otimes \infty}$; for finite $n$ and finite $X$ the extreme points are instead the uniform measures on single $S_{n}$-orbits, which are not product measures in general.

Theorem 4 (Aldous 1985: exchangeability and representation theory). The set of exchangeable probability measures on $X^{n}$ can be identified with the set of $S_{n}$ -invariant measures on $X^{n}$ , which in turn corresponds to the positive cone in the fixed-point subspace of $S_{n}$ acting on the space of measures. The spectral decomposition of this fixed-point subspace via the irreducible representations of $S_{n}$ gives the structure of the exchangeable measures.

This was developed by Aldous 1985 in Exchangeability and Related Topics and connects the probabilistic notion to the representation-theoretic framework.

Theorem 5 (Exchangeability and sufficiency). For an exchangeable sequence $(X_{1}, \dots, X_{n})$ , the order statistic $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$ (the sorted values) is a sufficient statistic. This is a consequence of exchangeability: the likelihood depends on the data only through the empirical distribution.

Theorem 6 (Exchangeable pairs and Stein's method). An exchangeable pair $(X, X')$ is a pair of random variables such that $(X, X') \overset{d}{=} (X', X)$. Stein's method for proving central limit theorems uses exchangeable pairs: if $(X, X')$ is exchangeable and $E [X' \mid X] = (1 - λ) X$, then $X$ is approximately Gaussian with variance controlled by $λ$ and $E [(X' - X)^{2} \mid X]$.

This was developed by Stein 1972 and systematised by Chatterjee 2008 in Stein's Method for Concentration Inequalities.
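
A standard example of an exchangeable pair satisfying the linearity condition can be checked by enumeration. The sketch below assumes an illustrative construction not taken from the text: $X$ is a sum of $n$ independent fair signs, and $X'$ is obtained by resampling one uniformly chosen sign, which gives $λ = 1 / n$. The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pair_law(n):
    """Exact joint law of (X, X'): X is a sum of n fair +/-1 signs and X' is X
    after one uniformly chosen sign is replaced by a fresh fair sign."""
    law = defaultdict(Fraction)
    weight = Fraction(1, 2**n * n * 2)   # signs x chosen index x fresh sign
    for eps in product((-1, 1), repeat=n):
        x = sum(eps)
        for i in range(n):
            for fresh in (-1, 1):
                law[(x, x - eps[i] + fresh)] += weight
    return dict(law)

def cond_mean(law, x0):
    """E[X' | X = x0] computed from the joint law."""
    mass = sum(p for (x, _), p in law.items() if x == x0)
    return sum(y * p for (x, y), p in law.items() if x == x0) / mass

law = pair_law(4)
```

For $n = 4$ the joint law is symmetric in its two arguments, and `cond_mean(law, x0)` equals $(1 - 1/4) \, x_{0}$ for every attainable value $x_{0}$, which is the condition $E [X' \mid X] = (1 - λ) X$ with $λ = 1 / 4$.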

Synthesis. De Finetti's theorem provides the foundational reason that the symmetric group enters probability theory through exchangeability. The central insight is that $S_{n}$-invariance of the joint law on $X^{n}$ is the probabilistic content of exchangeability, and putting these together with the Choquet theorem, the extreme exchangeable measures on infinite sequences are the product measures. This is exactly the content that builds toward the spectral analysis in 07.05.11 where the $S_{n}$-Fourier transform decomposes exchangeable functions, and appears again in the partial ranking framework of 07.05.13 where $S_{n} \times S_{n - k}$-invariance replaces pure exchangeability. The bridge is between the group-theoretic notion of invariance and the probabilistic notion of conditional independence; the pattern generalises from binary sequences to standard Borel state spaces via the Hewitt-Savage theorem, and identifies the representation theory of $S_{n}$ as the natural language for symmetric dependence structures. The Diaconis-Freedman bound quantifies how closely the first $k$ coordinates of a finite exchangeable sequence of length $n$ approximate the mixture-of-i.i.d. ideal, putting a precise number on the gap between the finite and infinite theories.

Full proof set [Master]

Proposition 1 (Exchangeable measures are $S_{n}$ -invariant). A probability measure $μ$ on $X^{n}$ is exchangeable if and only if it is invariant under the action of $S_{n}$ on $X^{n}$ by coordinate permutation.

Proof. By definition, $μ$ is exchangeable iff for every permutation $σ \in S_{n}$ and every measurable rectangle $A = A_{1} \times \dots \times A_{n}$ :

$$μ (A_{1} \times \dots \times A_{n}) = μ (A_{σ (1)} \times \dots \times A_{σ (n)}) .$$

This is precisely the statement that $μ (σ \cdot A) = μ (A)$ for all $σ$ and $A$ , which is $S_{n}$ -invariance of the measure. $□$

Proposition 2 (Mixtures of i.i.d. are exchangeable). If $ν$ is any probability measure on $P (X)$ and $μ = \int ρ^{\otimes n} d ν (ρ)$ , then $μ$ is exchangeable.

Proof. For any $σ \in S_{n}$ and measurable $A = A_{1} \times \dots \times A_{n}$ :

$$μ (σ \cdot A) = \int ρ^{\otimes n} (A_{σ (1)} \times \dots \times A_{σ (n)}) \, d ν (ρ) = \int \prod_{i = 1}^{n} ρ (A_{σ (i)}) \, d ν (ρ) = \int \prod_{i = 1}^{n} ρ (A_{i}) \, d ν (ρ) = μ (A) .$$

The third equality uses that the product is commutative, so reordering the factors does not change the value. $□$

Connections [Master]

Spectral analysis of permutation data 07.05.11. Exchangeable measures on $X^{n}$ are the $S_{n}$ -invariant measures, and the spectral decomposition of $L^{2} (X^{n})$ with respect to the $S_{n}$ -action provides the Fourier-analytic framework for studying them. The first-order spectral component in 07.05.11 corresponds to the marginal distribution (the "mean" of the mixing measure), and higher components correspond to higher moments of the mixing measure.
Random walk upper bound lemma 07.05.05. The Upper Bound Lemma bounds the distance from an $S_{n}$ -invariant measure to uniformity by character sums, which is a spectral bound on exchangeable measures. The mixing time of random walks on finite groups produces exchangeable distributions at each time step, and the spectral analysis of exchangeability connects to the convergence analysis in 07.05.05.
Partially ranked data 07.05.13. The Gelfand pair $(S_{n}, S_{n - k})$ extends the exchangeability framework: partial ranking data is exchangeable under a larger symmetry group (the subgroup permuting unranked items among themselves). The spectral decomposition of partial rankings in 07.05.13 is the Gelfand pair analogue of the de Finetti decomposition for full rankings.
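Character orthogonality 07.01.04. The character orthogonality relations of 07.01.04 are the analytical tool that makes the spectral decomposition of exchangeable measures work. The irreducible characters form an orthonormal basis of the class functions on $S_{n}$, and the orthogonality relations give explicit projections onto the isotypic components of the $S_{n}$-action on functions on $X^{n}$; the projection attached to the one-dimensional identity representation is the averaging map onto the $S_{n}$-invariant functions.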

Historical & philosophical context [Master]

De Finetti introduced the concept of exchangeability and proved the representation theorem for binary sequences in his 1937 paper La prévision: ses lois logiques, ses sources subjectives ^{[deFinetti1937]} published in Ann. Inst. H. Poincaré 7. His motivation was to provide a subjective Bayesian foundation for probability: instead of assuming a "true" parameter $θ$ , one assumes exchangeable observations and derives the existence of $θ$ as a mathematical consequence.

Hewitt and Savage 1955 generalised de Finetti's theorem to arbitrary standard Borel spaces in Symmetric Measures on Cartesian Products ^{[HewittSavage1955]}, establishing the result in its modern form using the Choquet theorem on extreme points of convex sets.

Diaconis and Freedman 1980 proved the finite exchangeability approximation in Finite Exchangeable Sequences ^{[DiaconisFreedman1980]} published in Ann. Probab. 8, showing that the variation distance between the law of the first $k$ of $n$ exchangeable variables and the closest mixture of i.i.d. laws is at most $2 |X| k / n$ for a finite state space $X$, and at most $k (k - 1) / n$ in general. This result quantifies the practical relevance of de Finetti's theorem for finite data.

Bibliography [Master]

@article{deFinetti1937,
  author = {de Finetti, Bruno},
  title = {La pr\'{e}vision: ses lois logiques, ses sources subjectives},
  journal = {Ann. Inst. H. Poincar\'{e}},
  volume = {7},
  year = {1937},
  pages = {1--68},
}

@article{HewittSavage1955,
  author = {Hewitt, Edwin and Savage, Leonard J.},
  title = {Symmetric Measures on Cartesian Products},
  journal = {Trans. Amer. Math. Soc.},
  volume = {80},
  year = {1955},
  pages = {470--501},
}

@article{DiaconisFreedman1980,
  author = {Diaconis, Persi and Freedman, David},
  title = {Finite Exchangeable Sequences},
  journal = {Ann. Probab.},
  volume = {8},
  year = {1980},
  pages = {745--764},
}

@book{Aldous1985,
  author = {Aldous, David J.},
  title = {Exchangeability and Related Topics},
  publisher = {Springer},
  year = {1985},
  series = {Lecture Notes in Mathematics},
  volume = {1117},
}