07.05.11 · representation-theory / symmetric

Spectral analysis of permutation-valued data

shipped · 3 tiers · Lean: none

Anchor (Master): Diaconis 1989 J. Amer. Statist. Assoc. 84; Marden 1995 Analyzing and Modeling Rank Data

Intuition [Beginner]

Imagine you collect survey responses where people rank five items from best to worst. Each person's response is a permutation — a reordering of the items. You have hundreds of such rankings and want to find the main "patterns" in the data.

Fourier analysis on the symmetric group does for permutation data what ordinary Fourier analysis does for time-series data: it decomposes a complicated signal into simple frequency components. Instead of sines and cosines, the "frequencies" are the irreducible representations of the symmetric group, indexed by partitions. Each component captures a different layer of structure in the ranking data.

Why does this concept exist? Because classical statistical tools assume numerical data, but rankings are permutations — and the representation theory of the symmetric group provides the natural harmonic analysis for functions defined on permutations.

Visual [Beginner]

A diagram showing a bar chart of ranking data on three items (six possible rankings) being decomposed into three spectral components. The first component is the uniform average; the second captures "item 1 tends to be ranked first"; the third captures a pairwise swap effect.

The uniform component gives the overall level; the first-order component identifies which items are consistently preferred; higher components capture finer interactions.

Worked example [Beginner]

Consider ranking data on $n = 3$ items. There are $3! = 6$ possible rankings. Imagine we observe the following counts from 60 respondents: the identity ranking (1,2,3) appears 20 times, the swap (2,1,3) appears 10 times, and each of the remaining four rankings appears 7 or 8 times.

Step 1. Convert counts to a probability distribution $f$ on the six permutations. For the identity $f (e) = 20/60 = 1/3$ ; for the swap $f ((12)) = 10/60 = 1/6$ ; for the others roughly $1/8$ .

Step 2. Compute the first spectral component: the "mean" level, which is the uniform distribution $U$ giving $1/6$ to each permutation. The deviation $f - U$ measures how far the data strays from uniform.

Step 3. The next spectral component projects onto the standard representation, which detects whether each item is systematically ranked too high or too low. Here, $f (e) - 1/6 = 1/6$ and $f ((12)) - 1/6 = 0$ , so the excess over uniform sits on the identity ranking. Rankings with item 1 first account for 27 or 28 of the 60 responses, against the 20 expected under uniformity.

What this tells us: the spectral decomposition separates the overall level (uniform), the first-order preference effects (item 1 is preferred), and the residual structure, giving a hierarchical summary of the ranking data.
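The three steps can be traced numerically. The sketch below is only an illustration: the text says the four remaining rankings appear 7 or 8 times each, so the 8/7/8/7 split used here is an assumption, as are the names `counts` and `marginals`. Each ranking is read as a tuple $(π (1), π (2), π (3))$ , so the first entry is the position given to item 1.

```python
from fractions import Fraction

# Assumed counts: the text fixes 20 and 10; the 8/7/8/7 split is hypothetical.
counts = {
    (1, 2, 3): 20,
    (2, 1, 3): 10,
    (1, 3, 2): 8,
    (3, 2, 1): 7,
    (2, 3, 1): 8,
    (3, 1, 2): 7,
}
total = sum(counts.values())

# Step 1: counts -> probability distribution f.
f = {pi: Fraction(c, total) for pi, c in counts.items()}

# Step 2: deviation from the uniform distribution U = 1/6.
uniform = Fraction(1, len(f))
deviation = {pi: p - uniform for pi, p in f.items()}

# Step 3: first-order summary, marginals[i][j] = P(item i+1 is in position j+1).
n = 3
marginals = [[Fraction(0)] * n for _ in range(n)]
for pi, p in f.items():
    for item in range(n):
        marginals[item][pi[item] - 1] += p

for item, row in enumerate(marginals, start=1):
    print(f"item {item}:", [str(x) for x in row])
```

With these assumed counts item 1 sits in position 1 with probability 7/15, against 1/3 under uniformity.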

Check your understanding [Beginner]

Formal definition [Intermediate+]

Let $S_{n}$ be the symmetric group on $n$ letters with irreducible representations $V^{λ}$ indexed by partitions $λ ⊢ n$ , of dimension $d_{λ}$ . Let $ρ^{λ} : S_{n} \to GL (V^{λ})$ denote the irreducible representation indexed by $λ$ .

For a function $f : S_{n} \to C$ (e.g., a probability distribution or a data histogram), the Fourier transform of $f$ at $λ$ is the $d_{λ} \times d_{λ}$ matrix

$$\hat{f} (λ) = \sum_{π \in S_{n}} f (π) ρ^{λ} (π) .$$

The spectral decomposition of $f$ is the collection $\{\hat{f} (λ)\}_{λ ⊢ n}$ of Fourier coefficients at all irreducible representations. The inverse Fourier formula reconstructs $f$ from its spectrum:

$$f (π) = \frac{1}{n !} \sum_{λ ⊢ n} d_{λ} tr (ρ^{λ} (π^{- 1}) \hat{f} (λ)) .$$

For ranking data, the key observation is that the isotypic components correspond to interpretable statistical effects. The component at $λ = (n)$ is the overall mean (uniform distribution). The component at $λ = (n - 1, 1)$ captures first-order effects (marginal preferences for individual items). The components at $λ = (n - 2, 2)$ and $(n - 2, 1, 1)$ capture second-order effects (pairwise interactions).
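The transform and its inverse can be checked by hand on $S_{3}$ , which has only three irreducibles. The sketch below is a minimal illustration under stated assumptions: the standard representation is realised on the sum-zero subspace in the (non-orthogonal) basis $e_{0} - e_{1}, e_{1} - e_{2}$ , the input weights are hypothetical, and every function name is made up for the example. Exact rational arithmetic makes the round trip an equality rather than an approximation.

```python
from fractions import Fraction
from itertools import permutations

PERMS = list(permutations(range(3)))  # p[i] is the image of i (0-based)

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def sign(p):
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

def standard(p):
    """Matrix of p on the sum-zero subspace in the basis (e0 - e1, e1 - e2)."""
    cols = []
    for a, b in ((0, 1), (1, 2)):
        x = [0, 0, 0]
        x[p[a]] += 1  # the image of e_a - e_b is e_{p(a)} - e_{p(b)}
        x[p[b]] -= 1
        cols.append((x[0], -x[2]))  # x = x0*(e0 - e1) + (-x2)*(e1 - e2)
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

IRREPS = {
    "(3)": lambda p: [[1]],
    "(2,1)": standard,
    "(1,1,1)": lambda p: [[sign(p)]],
}

def fourier(f, rho):
    """fhat = sum over p of f(p) * rho(p)."""
    d = len(rho(PERMS[0]))
    out = [[Fraction(0)] * d for _ in range(d)]
    for p in PERMS:
        m = rho(p)
        for r in range(d):
            for c in range(d):
                out[r][c] += f[p] * m[r][c]
    return out

def trace_product(a, b):
    return sum(a[r][k] * b[k][r] for r in range(len(a)) for k in range(len(a)))

def inverse_fourier(spectrum, p):
    """f(p) = (1/3!) * sum over irreps of d * tr(rho(p^-1) fhat)."""
    total = Fraction(0)
    for name, rho in IRREPS.items():
        fhat = spectrum[name]
        total += len(fhat) * trace_product(rho(inverse(p)), fhat)
    return total / 6

weights = [20, 10, 8, 7, 8, 7]  # hypothetical counts, 60 in total
f = {p: Fraction(w, 60) for p, w in zip(PERMS, weights)}
spectrum = {name: fourier(f, rho) for name, rho in IRREPS.items()}
recovered = {p: inverse_fourier(spectrum, p) for p in PERMS}
print(recovered == f)
```

The inverse formula only uses traces, so it holds in any basis; identities involving the Hilbert-Schmidt norm additionally need an orthonormal basis.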

Counterexamples to common slips

Confusing Fourier coefficients with eigenvalues. The Fourier coefficient $\hat{f} (λ)$ is a matrix, not a scalar. Its eigenvalues carry spectral information, but the coefficient itself is the basic object.
Omitting the dimension weighting. The inverse formula weights by $d_{λ}$ , not by 1. For $S_{n}$ the dimensions $d_{λ}$ vary widely (e.g., the standard representation has $d_{(n - 1, 1)} = n - 1$ ), so this weighting is essential.
Assuming abelian Fourier analysis suffices. For abelian groups every irreducible is 1-dimensional and the Fourier transform produces scalars. For $S_{n}$ the dimensions satisfy $\sum_{λ ⊢ n} d_{λ}^{2} = n!$ , so individual irreducibles can have dimension approaching (but never exceeding) $\sqrt{n!}$ , and the matrix-valued transform carries much more information.
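How widely the dimensions vary can be tabulated directly. The sketch below assumes the hook-length formula $d_{λ} = n! / \prod \text{hooks}$ for the dimensions; the helper names `partitions` and `dimension` are illustrative. It also checks the identity $\sum_{λ} d_{λ}^{2} = n!$ for small $n$ .

```python
from math import factorial, isqrt

def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def dimension(shape):
    """Dimension d_lambda from the hook-length formula n! / prod(hooks)."""
    n = sum(shape)
    col_len = [sum(1 for row in shape if row > c) for c in range(shape[0])]
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1          # cells to the right in the same row
            leg = col_len[j] - i - 1   # cells below in the same column
            hooks *= arm + leg + 1
    return factorial(n) // hooks

for n in range(3, 9):
    dims = {shape: dimension(shape) for shape in partitions(n)}
    assert sum(d * d for d in dims.values()) == factorial(n)
    print(n, "standard:", dims[(n - 1, 1)], "largest:", max(dims.values()),
          "floor(sqrt(n!)):", isqrt(factorial(n)))
```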

Key theorem with proof [Intermediate+]

Theorem (Spectral decomposition for ranking data, Diaconis 1989). Let $f : S_{n} \to [0, 1]$ be a probability distribution on permutations and let $U$ be the uniform distribution. Define the first-order projection $f_{1}$ and the second-order projection $f_{2}$ by projecting $f$ onto the isotypic subspaces corresponding to the partitions $(n - 1, 1)$ and $\{(n - 2, 2), (n - 2, 1, 1)\}$ respectively. Then:

$$∥ f - U ∥_{2}^{2} = \frac{n - 1}{n !} ∥ \hat{f} ((n - 1, 1)) ∥_{HS}^{2} + \sum_{\substack{λ ⊢ n \\ λ \neq (n), (n - 1, 1)}} \frac{d_{λ}}{n !} ∥ \hat{f} (λ) ∥_{HS}^{2},$$

and, with $ρ^{(n - 1, 1)}$ taken in orthogonal form, the squared norm of the first-order coefficient equals the sum of squared deviations of the marginal ranking probabilities from uniform:

$$∥ \hat{f} ((n - 1, 1)) ∥_{HS}^{2} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} \left( P (π (i) = j) - \frac{1}{n} \right)^{2} .$$

Proof. The proof has three steps.

Step 1 (Plancherel decomposition). By the Plancherel formula for $S_{n}$ :

$$\sum_{π \in S_{n}} ∣ f (π) - U (π) ∣^{2} = \frac{1}{n !} \sum_{λ ⊢ n} d_{λ} ∥ \widehat{f - U} (λ) ∥_{HS}^{2} .$$

The uniform distribution has $\hat{U} (λ) = 0$ for every $λ \neq (n)$ , so $\widehat{f - U} (λ) = \hat{f} (λ)$ there. Since $\widehat{f - U} ((n)) = \hat{f} ((n)) - 1 = 0$ (as $f$ is a probability measure, its Fourier transform at the trivial representation equals 1), the sum starts at $λ = (n - 1, 1)$ .

Step 2 (First-order identification). The standard representation $V^{(n - 1, 1)}$ has a concrete realisation: it is the $(n - 1)$ -dimensional subspace of $C^{n}$ consisting of vectors whose coordinates sum to zero, with $S_{n}$ acting by permutation of coordinates. Its orthogonal complement is the line spanned by $(1, \dots, 1)$ , which carries the trivial representation, so the permutation representation $R$ on $C^{n}$ splits as $V^{(n)} ⊕ V^{(n - 1, 1)}$ . The Fourier coefficient $\hat{f} ((n - 1, 1))$ can be computed entry by entry:

$$\hat{f} ((n - 1, 1))_{a, b} = \sum_{π} f (π) \cdot [ρ^{(n - 1, 1)} (π)]_{a, b} .$$

The Fourier transform of $f$ at $R$ is the $n \times n$ matrix of marginals: since $R (π)_{j, i} = 1$ exactly when $π (i) = j$ , its $(j, i)$ entry is $P (π (i) = j) = \sum_{π : π (i) = j} f (π)$ . In an orthonormal basis adapted to the splitting, $\hat{f} (R)$ is block diagonal with blocks $\hat{f} ((n)) = 1$ and $\hat{f} ((n - 1, 1))$ , and the Hilbert-Schmidt norm does not change under an orthonormal change of basis. Hence

$$\sum_{i, j} P (π (i) = j)^{2} = 1 + ∥ \hat{f} ((n - 1, 1)) ∥_{HS}^{2} .$$

Each row and each column of the marginal matrix sums to 1, so $\sum_{i, j} (P (π (i) = j) - \frac{1}{n})^{2} = \sum_{i, j} P (π (i) = j)^{2} - 1$ , and therefore

$$∥ \hat{f} ((n - 1, 1)) ∥_{HS}^{2} = \sum_{i, j} \left( P (π (i) = j) - \frac{1}{n} \right)^{2} .$$

Multiplying by $d_{(n - 1, 1)} / n! = (n - 1) / n!$ gives the first-order term of the Plancherel sum.

Step 3 (Combine). Separating the $(n - 1, 1)$ term from the rest of the Plancherel sum gives the stated decomposition. The first-order term captures all marginal (single-item) effects; the remaining terms capture all higher-order interactions. $□$
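Both parts of the argument can be checked numerically for small $n$ . The sketch below is an illustration under stated assumptions, not the construction used in the source: it realises the standard representation orthogonally through a Helmert-type basis of the sum-zero subspace, uses the normalisation $\hat{f} (λ) = \sum_{π} f (π) ρ^{λ} (π)$ , and draws random test distributions. It compares $∥ \hat{f} ((n - 1, 1)) ∥_{HS}^{2}$ with the sum of squared marginal deviations on $S_{4}$ , then verifies the full Plancherel split on $S_{3}$ , where the only remaining irreducible is the sign. All function names are hypothetical.

```python
import math
import random
from itertools import permutations

def helmert_basis(n):
    """Orthonormal basis (as rows) of the sum-zero subspace of R^n."""
    rows = []
    for k in range(1, n):
        norm = math.sqrt(k * (k + 1))
        rows.append([1 / norm] * k + [-k / norm] + [0.0] * (n - k - 1))
    return rows

def standard_matrix(p, basis):
    """Orthogonal matrix of p on the sum-zero subspace; p sends e_i to e_{p[i]}."""
    n = len(p)
    out = []
    for w in basis:
        row = []
        for v in basis:
            moved = [0.0] * n
            for i in range(n):
                moved[p[i]] = v[i]
            row.append(sum(w[k] * moved[k] for k in range(n)))
        out.append(row)
    return out

def first_order_terms(f):
    """Return (||fhat((n-1,1))||_HS^2, sum_ij (P(pi(i)=j) - 1/n)^2)."""
    n = len(next(iter(f)))
    basis = helmert_basis(n)
    fhat = [[0.0] * (n - 1) for _ in range(n - 1)]
    marg = [[0.0] * n for _ in range(n)]
    for p, w in f.items():
        rho = standard_matrix(p, basis)
        for r in range(n - 1):
            for c in range(n - 1):
                fhat[r][c] += w * rho[r][c]
        for i in range(n):
            marg[i][p[i]] += w
    hs = sum(x * x for row in fhat for x in row)
    dev = sum((x - 1 / n) ** 2 for row in marg for x in row)
    return hs, dev

def random_distribution(n, rng):
    perms = list(permutations(range(n)))
    raw = [rng.random() for _ in perms]
    total = sum(raw)
    return {p: x / total for p, x in zip(perms, raw)}

def sign(p):
    inv = sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))
    return -1 if inv % 2 else 1

rng = random.Random(0)

# First-order identity on S_4.
hs4, dev4 = first_order_terms(random_distribution(4, rng))

# Full decomposition on S_3: ||f - U||^2 = (2/6)||fhat(2,1)||^2 + (1/6) fhat(1,1,1)^2.
f3 = random_distribution(3, rng)
hs3, _ = first_order_terms(f3)
sign_coeff = sum(w * sign(p) for p, w in f3.items())
lhs = sum((w - 1 / 6) ** 2 for w in f3.values())
rhs = (2 / 6) * hs3 + (1 / 6) * sign_coeff ** 2
print(abs(hs4 - dev4), abs(lhs - rhs))
```

Both printed gaps should be at the level of floating-point rounding.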

Bridge. This spectral decomposition builds toward the analysis of partial rankings in 07.05.13 where the coset structure $S_{n} / S_{n - k}$ requires the representation theory of Gelfand pairs, and appears again in the metric analysis of 07.05.12 where the character-theoretic expressions for distances between permutations emerge from the same Fourier coefficients. The foundational reason the decomposition works is that the group algebra $C [S_{n}]$ decomposes as a direct product of matrix algebras $\prod_{λ} M_{d_{λ}} (C)$ , which is exactly the Artin-Wedderburn decomposition. The bridge is between statistical data analysis on permutations and the harmonic analysis on $S_{n}$ developed in 07.05.01; putting these together, quadratic test statistics on ranking data, such as the squared $L^{2}$ distance from uniform, become weighted sums of squared Hilbert-Schmidt norms of Fourier coefficients.

Exercises [Intermediate+]

Exercise 2 (easy, multiple choice).

In the spectral decomposition of ranking data, the component corresponding to the partition $(n - 1, 1)$ captures which statistical effect?

A. The overall uniform distribution
B. First-order effects: how likely each item is to be ranked in each position
C. The sign of each permutation in the data
D. The total number of inversions

Hint

The representation $(n - 1, 1)$ is the standard representation, which records how items are displaced from their "expected" positions.

Answer

B. Feedback-correct: the first-order projection measures deviations of the marginal probabilities $P (π (i) = j)$ from $1/ n$ , which is exactly how likely item $i$ is to be in position $j$ . Feedback-wrong: A is the identity representation $(n)$ ; C is the sign representation $(1^{n})$ ; D is a scalar function, not tied to a single isotypic component.

Exercise 4 (medium, numeric).

For the Mallows model on $S_{3}$ with center $π_{0} = e$ (identity) and dispersion parameter $θ = 1$ : $f (π) = e^{- d (π, e)} / Z$ where $d$ is Cayley distance and $Z$ is the normalising constant. Compute $\hat{f} ((2, 1))$ , which is a $2 \times 2$ matrix. Give its trace.

Hint

The Cayley distance $d (π, e)$ equals $n$ minus the number of cycles of $π$ . For $S_{3}$ : $d (e, e) = 0$ , $d (transp) = 1$ , $d (3-cycle) = 2$ . So $f (e) = 1/ Z$ , $f (transp) = e^{- 1} / Z$ , $f (3-cycle) = e^{- 2} / Z$ . Compute $Z = 1 + 3 e^{- 1} + 2 e^{- 2}$ . The character of $(2, 1)$ is $χ_{(2, 1)} (e) = 2$ , $χ_{(2, 1)} (transp) = 0$ , $χ_{(2, 1)} (3-cycle) = - 1$ .

Answer

Since the Mallows model with Cayley distance is conjugation-invariant (it depends only on the cycle type, i.e., the conjugacy class), Schur's lemma makes the Fourier transform at every irreducible a scalar matrix: $\hat{f} (λ) = c_{λ} I_{d_{λ}}$ with $c_{λ} = \sum_{π} f (π) χ_{λ} (π) / d_{λ}$ .

For $λ = (2, 1)$ , summing over the identity, the three transpositions and the two 3-cycles: $c_{(2, 1)} = (f (e) \cdot 2 + 3 f (transp) \cdot 0 + 2 f (3-cycle) \cdot (- 1)) /2$ .

$Z = 1 + 3 e^{- 1} + 2 e^{- 2} \approx 1 + 1.1036 + 0.2707 = 2.3743$ .

$f (e) = 1/2.3743 \approx 0.4212$ , $f (3-cycle) = e^{- 2} /2.3743 \approx 0.0570$ , so the two 3-cycles together carry probability $2 \cdot 0.0570 = 0.1140$ .

$c_{(2, 1)} = (0.4212 \cdot 2 + 0.1140 \cdot (- 1)) /2 = (0.8424 - 0.1140) /2 = 0.7284/2 = 0.3642$ .

Trace $= c_{(2, 1)} \cdot d_{(2, 1)} = 0.3642 \cdot 2 \approx$ 0.73.
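The arithmetic can be confirmed by brute force over the six permutations. The sketch below assumes the Cayley distance $n$ minus the number of cycles and the character formula $χ_{(n - 1, 1)} (π) = fix (π) - 1$ ; the function names are illustrative.

```python
import math
from itertools import permutations

def cycle_count(p):
    """Number of cycles of the permutation p (p[i] is the image of i)."""
    seen, cycles = set(), 0
    for start in range(len(p)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = p[j]
    return cycles

def mallows_cayley(n, theta):
    """Mallows model centred at the identity with Cayley distance n - cycles."""
    weights = {p: math.exp(-theta * (n - cycle_count(p)))
               for p in permutations(range(n))}
    z = sum(weights.values())
    return {p: w / z for p, w in weights.items()}

def standard_character(p):
    """chi_(n-1,1)(p) = number of fixed points minus 1."""
    return sum(1 for i, j in enumerate(p) if i == j) - 1

f = mallows_cayley(3, 1.0)
trace = sum(prob * standard_character(p) for p, prob in f.items())
c = trace / 2  # scalar c_(2,1), since fhat((2,1)) = c * I_2
print(round(trace, 4), round(c, 4))
```

The trace equals $(2 - 2 e^{- 2}) / Z$ , which simplifies to $2 (1 - e^{- 1}) / (1 + 2 e^{- 1})$ after cancelling the common factor $1 + e^{- 1}$ .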

Exercise 5 (medium, multiple choice).

Why is the spectral decomposition of ranking data called "spectral"?

A. Because it uses the spectrum of light
B. Because it decomposes the data into eigencomponents of the group algebra, analogous to the spectral theorem for matrices
C. Because it was developed by a person named Specter
D. Because it requires spectral sequences

Hint

Think about the analogy with the spectral theorem in linear algebra, which decomposes a matrix into eigenvalues and eigenvectors.

Answer

B. Feedback-correct: the group algebra $C [S_{n}]$ decomposes as $\prod_{λ} M_{d_{λ}} (C)$ by Artin-Wedderburn, and projecting a data function onto the matrix blocks is the analogue of diagonalising a matrix — the "spectrum" is the collection of Fourier coefficients. Feedback-wrong: A is a physics analogy without mathematical content; C is a fabrication; D confuses representation theory with homological algebra.

Exercise 7 (hard, symbolic).

Prove the Plancherel formula for $S_{n}$ : for any $f, g : S_{n} \to C$ ,

$$\sum_{π \in S_{n}} f (π) \overline{g (π)} = \frac{1}{n !} \sum_{λ ⊢ n} d_{λ} tr (\hat{f} (λ) \hat{g} (λ)^{*}) .$$

Hint

Use the orthogonality of matrix coefficients: the functions $\sqrt{d_{λ} / n!} \, ρ_{ij}^{λ}$ form an orthonormal basis of $L^{2} (S_{n})$ .

Answer

Take each $ρ^{λ}$ in real orthogonal form, which is possible for $S_{n}$ . The matrix coefficients $\{ρ_{ij}^{λ} : λ ⊢ n, 1 \leq i, j \leq d_{λ}\}$ form an orthogonal basis of $L^{2} (S_{n})$ with $⟨ ρ_{ij}^{λ}, ρ_{k l}^{μ} ⟩ = (n! / d_{λ}) δ_{λ μ} δ_{ik} δ_{j l}$ . Since $⟨ f, ρ_{ij}^{λ} ⟩ = \sum_{π} f (π) ρ_{ij}^{λ} (π) = \hat{f} (λ)_{ij}$ , expanding $f$ in this basis gives $f (π) = \frac{1}{n !} \sum_{λ} d_{λ} \sum_{i, j} \hat{f} (λ)_{ij} ρ_{ij}^{λ} (π)$ , and likewise for $g$ . Then $⟨ f, g ⟩ = \sum_{π} f (π) \overline{g (π)}$ . Substituting the expansions and using orthogonality, only terms with matching $(λ, i, j)$ survive, and each contributes $(\frac{d _{λ}}{n !})^{2} \cdot \frac{n !}{d _{λ}} \cdot \hat{f} (λ)_{ij} \overline{\hat{g} (λ)_{ij}} = \frac{d _{λ}}{n !} \hat{f} (λ)_{ij} \overline{\hat{g} (λ)_{ij}}$ . Summing over $i, j$ gives $\frac{d _{λ}}{n !} tr (\hat{f} (λ) \hat{g} (λ)^{*})$ . Summing over $λ$ gives the Plancherel formula.

Exercise 8 (hard, symbolic).

For the Mallows model $f (π) = e^{- θ \cdot d_{K} (π, π_{0})} / Z$ where $d_{K}$ is the Kendall tau distance and $π_{0}$ is the identity, show that the first-order spectral coefficient $\hat{f} ((n - 1, 1))$ determines the parameter $θ$ uniquely (for $θ > 0$ ).

Hint

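The Kendall tau distance $d_{K} (π, e)$ is the number of inversions of $π$ . It is right-invariant but not constant on conjugacy classes (in $S_{3}$ the transpositions $(12)$ and $(13)$ have 1 and 3 inversions), so $\hat{f} ((n - 1, 1))$ need not be a scalar matrix. Use instead that $\hat{f} ((n - 1, 1))$ , together with $\hat{f} ((n)) = 1$ , determines the marginal probabilities $P (π (i) = j)$ . Compute $P (π (1) = 1)$ in terms of $q = e^{- θ}$ and show it is strictly monotone in $θ$ .

Answer

The permutation representation on $C^{n}$ is the direct sum of the trivial and the standard representation, and its Fourier coefficient is the matrix of marginals $P (π (i) = j)$ . Because $\hat{f} ((n)) = 1$ for every probability distribution, the coefficient $\hat{f} ((n - 1, 1))$ determines every marginal, in particular $P (π (1) = 1)$ . Write $q = e^{- θ}$ and $[k]_{q} = 1 + q + \dots + q^{k - 1}$ . The normalising constant is the inversion generating function $Z = \sum_{π} q^{inv (π)} = [1]_{q} [2]_{q} \cdots [n]_{q}$ . A permutation with $π (1) = 1$ has no inversion involving position 1, so its inversions are those of a permutation of the remaining $n - 1$ letters and $\sum_{π : π (1) = 1} q^{inv (π)} = [1]_{q} \cdots [n - 1]_{q}$ . Hence $P (π (1) = 1) = 1 / [n]_{q} = (1 - q) / (1 - q^{n})$ . For $n \geq 2$ the sum $[n]_{q}$ is strictly increasing in $q \in (0, 1)$ , so $P (π (1) = 1)$ is strictly decreasing in $q$ , that is, strictly increasing in $θ > 0$ . The map $θ \mapsto P (π (1) = 1)$ is therefore injective, and $θ$ is determined by the first-order coefficient.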
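A brute-force check for small $n$ can make the dependence on $θ$ concrete. The sketch below is an illustration with hypothetical function names, assuming $d_{K} (π, e)$ counts inversions: it computes the first-order marginal $P (π (1) = 1)$ under the Kendall tau Mallows model, compares it with $(1 - q) / (1 - q^{n})$ for $q = e^{- θ}$ , and tabulates the inversion counts found inside each conjugacy class to test whether the model is a class function.

```python
import math
from itertools import permutations

def inversions(p):
    """Kendall tau distance from p to the identity."""
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))

def cycle_type(p):
    """Cycle lengths of p in decreasing order."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def first_marginal(n, theta):
    """P(pi(1) = 1) under the Kendall tau Mallows model centred at the identity."""
    q = math.exp(-theta)
    perms = list(permutations(range(n)))
    z = sum(q ** inversions(p) for p in perms)
    return sum(q ** inversions(p) for p in perms if p[0] == 0) / z

def inversions_by_class(n):
    """Map each cycle type to the set of inversion counts found in that class."""
    table = {}
    for p in permutations(range(n)):
        table.setdefault(cycle_type(p), set()).add(inversions(p))
    return table

for theta in (0.5, 1.0, 2.0):
    q = math.exp(-theta)
    print(theta, first_marginal(4, theta), (1 - q) / (1 - q ** 4))
print(inversions_by_class(3))
```

A class with more than one inversion count shows that the Kendall tau model, unlike the Cayley model, is not constant on conjugacy classes.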

Advanced results [Master]

Theorem 1 (Diaconis 1989: spectral analysis generalises ANOVA). The decomposition of $L^{2} (S_{n})$ into isotypic components generalises the classical ANOVA decomposition for factorial designs. The identity component gives the grand mean; the $(n - 1, 1)$ component gives the main effects; the $(n - 2, 2)$ and $(n - 2, 1, 1)$ components give the two-factor interactions; and so on through the partition lattice.

This was established in Diaconis 1989 J. Amer. Statist. Assoc. 84, where the isomorphism between isotypic components and ANOVA effects was made explicit through the relationship between irreducible characters and the inclusion-exclusion structure of marginal probabilities.

Theorem 2 (Mallows model: explicit Fourier coefficients). Let $f (π) = e^{- θ d (π, e)} / Z$ be a Mallows model centred at the identity and put $q = e^{- θ}$ . If $d$ is the Cayley distance, $f$ is a class function, so $\hat{f} (λ) = c_{λ} I_{d_{λ}}$ where $c_{λ}$ is a rational function of $q$ ; at $λ = (n - 1, 1)$ it equals $(1 - q) / (1 + (n - 1) q)$ . If $d = d_{K}$ is the Kendall tau distance, $f$ is not a class function for $n \geq 3$ and $\hat{f} (λ)$ is in general not scalar, but the first-order marginals are still rational in $q$ ; for instance $P (π (1) = 1) = (1 - q) / (1 - q^{n})$ .

Closed-form expressions for Mallows-type models are treated in Diaconis 1989 and Marden 1995. Both displayed quantities are strictly decreasing in $q$ , providing a direct link between the dispersion parameter and the first-order spectral component.

Theorem 3 (Consistency of spectral estimates). If $X_{1}, \dots, X_{N}$ are i.i.d. draws from a distribution $f$ on $S_{n}$ , then the empirical Fourier coefficients $\hat{f}_{N} (λ) = \frac{1}{N} \sum_{k = 1}^{N} ρ^{λ} (X_{k})$ satisfy $\hat{f}_{N} (λ) \to \hat{f} (λ)$ almost surely as $N \to \infty$ , and $\sqrt{N} (\hat{f}_{N} (λ) - \hat{f} (λ))$ converges in distribution to a matrix-valued Gaussian.

This follows from the multivariate central limit theorem applied to the matrix entries of $ρ^{λ} (X_{k})$ , which are bounded random variables.
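A small simulation illustrates the convergence. The sketch below is only a demonstration under assumptions: the distribution on $S_{3}$ is hypothetical, the standard representation is taken in the integer basis $e_{0} - e_{1}, e_{1} - e_{2}$ of the sum-zero subspace, and the helper names are made up. It reports the largest entrywise gap between the empirical and exact first-order coefficients, which typically shrinks on the order of $1 / \sqrt{N}$ .

```python
import random
from itertools import permutations

PERMS = list(permutations(range(3)))  # p[i] is the image of i (0-based)

def standard(p):
    """Matrix of p on the sum-zero subspace in the basis (e0 - e1, e1 - e2)."""
    cols = []
    for a, b in ((0, 1), (1, 2)):
        x = [0, 0, 0]
        x[p[a]] += 1
        x[p[b]] -= 1
        cols.append((x[0], -x[2]))
    return [[cols[0][0], cols[1][0]], [cols[0][1], cols[1][1]]]

def fourier_standard(weights):
    """Sum of weights[p] * rho(p) over S_3, for rho the standard representation."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for p, w in weights.items():
        m = standard(p)
        for r in range(2):
            for c in range(2):
                out[r][c] += w * m[r][c]
    return out

def max_error(n_samples, f, rng):
    """Largest entrywise gap between empirical and exact coefficients."""
    draws = rng.choices(PERMS, weights=[f[p] for p in PERMS], k=n_samples)
    empirical = {p: 0.0 for p in PERMS}
    for p in draws:
        empirical[p] += 1 / n_samples
    exact, est = fourier_standard(f), fourier_standard(empirical)
    return max(abs(exact[r][c] - est[r][c]) for r in range(2) for c in range(2))

# Hypothetical distribution on S_3 (weights sum to 1).
f = dict(zip(PERMS, [20 / 60, 10 / 60, 8 / 60, 7 / 60, 8 / 60, 7 / 60]))
rng = random.Random(1)
for n_samples in (100, 10_000, 100_000):
    print(n_samples, max_error(n_samples, f, rng))
```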

Theorem 4 (Testing uniformity via spectral components). To test $H_{0} : f = U$ against $H_{1} : f \neq U$ , the test statistic $T = \sum_{π} (f_{N} (π) - 1/ n!)^{2}$ , where $f_{N}$ is the empirical distribution of the sample, has a representation-theoretic decomposition $T = \frac{1}{n !} \sum_{λ \neq (n)} d_{λ} ∥ \hat{f}_{N} (λ) ∥_{HS}^{2}$ , and the first-order term $∥ \hat{f}_{N} ((n - 1, 1)) ∥_{HS}^{2}$ provides the most powerful invariant test against alternatives with first-order structure.

This result appears in Diaconis 1989 and connects to the Neyman-Pearson lemma for group-invariant testing problems.

Theorem 5 (Diaconis 1989: sufficient statistics from spectral components). For the exponential family $f (π) = exp (\sum_{λ} ⟨ Θ_{λ}, ρ^{λ} (π)⟩ - A (Θ))$ , where $Θ_{λ}$ is a $d_{λ} \times d_{λ}$ matrix parameter for each partition $λ$ and $A (Θ)$ is the log-normalising constant, the collection of empirical Fourier coefficients $\{\hat{f}_{N} (λ)\}$ forms a sufficient statistic.

This is the exponential family on $S_{n}$ with natural parameters in each isotypic component; sufficiency follows from the factorisation theorem since the log-likelihood depends on the data only through the Fourier coefficients.
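The factorisation step can be written out in one line of algebra. This is a sketch under two assumptions not fixed by the text: the pairing $⟨ \cdot, \cdot ⟩$ is linear in its second argument (for example the trace pairing), and $A (Θ)$ denotes the full log-normaliser, so that any separate constant $Z$ is absorbed into it.

```latex
\begin{aligned}
\log \prod_{k=1}^{N} f(X_{k})
  &= \sum_{k=1}^{N} \Big( \sum_{λ} \langle Θ_{λ}, ρ^{λ}(X_{k}) \rangle - A(Θ) \Big) \\
  &= N \Big( \sum_{λ} \Big\langle Θ_{λ}, \frac{1}{N} \sum_{k=1}^{N} ρ^{λ}(X_{k}) \Big\rangle - A(Θ) \Big) \\
  &= N \Big( \sum_{λ} \langle Θ_{λ}, \hat{f}_{N}(λ) \rangle - A(Θ) \Big).
\end{aligned}
```

The sample enters only through the matrices $\hat{f}_{N} (λ)$ , which is what the factorisation theorem requires.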

Theorem 6 (Spectral analysis of paired comparisons). For paired-comparison data (where subjects choose between pairs of items), the Bradley-Terry model has a spectral interpretation: the Bradley-Terry preference parameters are the eigenvalues of the first-order Fourier coefficient $\hat{f} ((n - 1, 1))$ .

This was observed by Marden 1995 and connects classical psychometric models to the spectral framework.

Synthesis. Spectral analysis of permutation-valued data is the foundational reason that representation theory enters statistics through the symmetric group. The central insight is that the Artin-Wedderburn decomposition of $C [S_{n}]$ identifies the space of complex-valued functions on permutations with the direct product of matrix algebras $\prod_{λ} M_{d_{λ}} (C)$ , and putting these together with the Plancherel isomorphism, every statistical question about ranking data has a spectral reformulation. This is exactly the content that builds toward the metric analysis in 07.05.12 where distances between permutations become spectral norms, and appears again in 07.05.13 where partial rankings extend the framework to cosets. The bridge is between the combinatorics of partitions and the statistics of rankings; the pattern generalises from full rankings to partial rankings to paired comparisons, identifying the Fourier transform on $S_{n}$ as the universal tool for permutation-valued data analysis.

Full proof set [Master]

Proposition 1 (Inverse Fourier formula). For $f : S_{n} \to C$ :

$$f (π) = \frac{1}{n !} \sum_{λ ⊢ n} d_{λ} tr (ρ^{λ} (π^{- 1}) \hat{f} (λ)) .$$

Proof. Expand $\hat{f} (λ) = \sum_{σ} f (σ) ρ^{λ} (σ)$ . Then:

$$\frac{1}{n !} \sum_{λ} d_{λ} tr (ρ^{λ} (π^{- 1}) \hat{f} (λ)) = \frac{1}{n !} \sum_{λ} d_{λ} \sum_{σ} f (σ) tr (ρ^{λ} (π^{- 1}) ρ^{λ} (σ)) .$$

By the orthogonality of characters: $\sum_{λ} d_{λ} tr (ρ^{λ} (π^{- 1}) ρ^{λ} (σ)) = \sum_{λ} d_{λ} χ_{λ} (π^{- 1} σ) = n! \cdot δ_{π, σ}$ (column orthogonality of characters at the identity). Hence the right side collapses to $f (π)$ . $□$

Proposition 2 (First-order coefficient and marginals). For a probability distribution $f$ on $S_{n}$ , the trace of $\hat{f} ((n - 1, 1))$ equals $\sum_{i = 1}^{n} P (π (i) = i) - 1$ .

Proof. The character of the standard representation at $π$ is $χ_{(n - 1, 1)} (π) = fix (π) - 1$ where $fix (π)$ is the number of fixed points of $π$ . So:

$$tr (\hat{f} ((n - 1, 1))) = \sum_{π} f (π) χ_{(n - 1, 1)} (π) = \sum_{π} f (π) (fix (π) - 1) = \sum_{i = 1}^{n} P (π (i) = i) - 1. \quad □$$
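The last equality swaps the order of summation. Spelled out, with $\mathbf{1}[\cdot]$ denoting an indicator (notation introduced here for the illustration):

```latex
\begin{aligned}
\sum_{π} f(π)\, fix(π)
  &= \sum_{π} f(π) \sum_{i=1}^{n} \mathbf{1}[π(i) = i]
   = \sum_{i=1}^{n} \sum_{π : π(i) = i} f(π)
   = \sum_{i=1}^{n} P(π(i) = i), \\
\sum_{π} f(π) \cdot 1 &= 1 .
\end{aligned}
```

Subtracting the second line from the first gives the stated trace.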

Connections [Master]

Random walk upper bound lemma 07.05.05. The spectral decomposition of permutation data is the static counterpart to the dynamic analysis in 07.05.05. The Upper Bound Lemma bounds the distance of a random walk distribution from uniform using the same character sums that appear here as spectral components; the random walk at time $k$ has Fourier coefficients $\hat{Q} (λ)^{k}$ , and their decay rates control mixing, while the spectral analysis of data examines the same coefficients for a fixed empirical distribution.
Symmetric group representation 07.05.01. The irreducible representations of $S_{n}$ indexed by partitions, their characters, and the hook-length formula for dimensions developed in 07.05.01 are the raw material for the spectral decomposition. Every Fourier coefficient $\hat{f} (λ)$ is computed using the matrix representations $ρ^{λ}$ and the characters $χ_{λ}$ from that unit.
Metrics on the symmetric group 07.05.12. The character-theoretic expressions for distances between permutations developed in the next unit are spectral norms of the difference of Fourier coefficients. The Cayley, Hamming, and Kendall tau distances all have representations as weighted character sums, making the spectral decomposition of this unit the common framework unifying distance computations on permutations.
Schur-Weyl duality 07.05.04. The decomposition of tensor powers of $C^{n}$ via Schur-Weyl duality produces the same indexing by partitions that governs the spectral analysis. The isotypic components of $L^{2} (S_{n})$ are dual to the irreducible $GL_{n}$ -modules in the tensor algebra, linking the statistical framework to the representation theory of the general linear group.

Historical & philosophical context [Master]

Diaconis introduced spectral analysis of permutation-valued data in his 1989 paper A Generalization of Spectral Analysis ^{[Diaconis1989]} published in J. Amer. Statist. Assoc. 84. The key insight was that the ANOVA decomposition for factorial designs, which separates main effects from interactions, has a natural generalisation to permutations via the isotypic decomposition of the group algebra of $S_{n}$ .

The statistical framework was systematised in Diaconis's 1988 monograph Group Representations in Probability and Statistics ^{[Diaconis1988]}, which placed the spectral analysis alongside the random walk and shuffling results as part of a unified programme applying representation theory to probability and statistics. Marden 1995 extended the framework to a wide variety of ranking models in Analyzing and Modeling Rank Data ^{[Marden1995]}, including the Mallows model, the Bradley-Terry model, and the Plackett-Luce model.

Bibliography [Master]

@article{Diaconis1989,
  author = {Diaconis, Persi},
  title = {A Generalization of Spectral Analysis},
  journal = {J. Amer. Statist. Assoc.},
  volume = {84},
  year = {1989},
  pages = {694--701},
}

@book{Diaconis1988,
  author = {Diaconis, Persi},
  title = {Group Representations in Probability and Statistics},
  publisher = {Institute of Mathematical Statistics},
  year = {1988},
  series = {IMS Lecture Notes--Monograph Series},
  volume = {11},
}

@book{Marden1995,
  author = {Marden, John I.},
  title = {Analyzing and Modeling Rank Data},
  publisher = {Chapman \& Hall},
  year = {1995},
}