37.08.01 · probability / 08-random-matrices

The Wigner Semicircle Law and the Moment Method

shipped · 3 tiers · Lean: none

Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) Ch. 2; Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) Ch. 2; Mehta, Random Matrices 3e; Erdős-Yau, A Dynamical Approach to Random Matrix Theory (2017); Tracy-Widom, Commun. Math. Phys. 159 (1994)

Intuition Beginner

Take a large square grid of numbers, fill every entry with an independent random value, and then make the grid symmetric by reflecting it across the diagonal. Such a grid has a list of special numbers attached to it — its eigenvalues — that measure how it stretches space along certain natural directions. You might expect the eigenvalues of a random grid to be a random mess with no recognisable shape. The surprise of this subject is that they are not: as the grid grows, the histogram of its eigenvalues settles onto one fixed, smooth shape every single time.

That shape is a half-circle. If you rescale the entries so the eigenvalues stay inside a fixed window, their histogram fills out the upper half of a circle: dense in the middle, thinning smoothly to zero at the two edges. It does not matter whether the entries were plus-or-minus coins, bell-curve draws, or any other reasonable random rule — the limiting picture is the same half-circle. This independence from the fine print is the first hint of a phenomenon called universality.

The reason a clean shape emerges is the same reason a bell curve emerges from adding many small independent contributions: averaging over an enormous number of random entries washes out their individual quirks and leaves only a few summary numbers. Here those summary numbers are the averaged powers of the grid, and they conspire to single out the half-circle and nothing else.

Visual Beginner

Picture three histograms stacked left to right, one for a small grid, one for a medium grid, one for a large grid. Each histogram plots how many eigenvalues land in each narrow vertical strip across the window from minus two to plus two. The small grid gives a ragged, spiky histogram. The medium grid is smoother. The large grid traces out a clean half-circle: tall in the middle near zero, falling away on both sides, and touching the floor exactly at minus two and plus two.

The dashed reference curve is the same on all three panels. The point of the picture is that the random histogram does not wander toward different limits for different random rules: it always presses up against this one half-circle as the grid grows, and the wiggles around the curve shrink as the size increases.

Worked example Beginner

We check the simplest summary number — the average of the squared eigenvalues — for a small symmetric grid, because that average controls how wide the half-circle is.

Step 1. Take a two-by-two symmetric grid with diagonal entries $a$ and $c$ and off-diagonal entry $b$ (appearing in both off-diagonal slots because the grid is symmetric). Its two eigenvalues $λ_{1}$ and $λ_{2}$ satisfy a useful bookkeeping fact: the sum of their squares equals the sum of the squares of all the grid entries counted with their multiplicities, namely $a^{2} + c^{2} + 2 b^{2}$ .

Step 2. Now make the entries random with average value zero and average squared value one. The average of $a^{2}$ is one, the average of $c^{2}$ is one, and the average of $b^{2}$ is one, so the average of $a^{2} + c^{2} + 2 b^{2}$ is $1 + 1 + 2 = 4$ .

Step 3. Divide by the grid size, which is two, to get the average squared eigenvalue per direction: $4 \div 2 = 2$ . Without any rescaling of the entries this number grows with the grid size: a size- $n$ grid has $n^{2}$ entries of average square one, so the average comes out to $n^{2} \div n = n$ , pushing the eigenvalues off to infinity.

Step 4. The rescaling fix is to shrink every entry by the square root of the grid size. For a size- $n$ grid that divides each squared entry by $n$ , and the average squared eigenvalue settles to one rather than growing. A half-circle of radius two has exactly average squared value one, which is why the limiting window is minus two to plus two.

Step 5. What this tells us: the single rescaling — shrink entries by the square root of the size — is what freezes the eigenvalue window in place. Once frozen, the averaged powers of the grid stop depending on the size and lock onto the fixed numbers that draw the half-circle.
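The bookkeeping in Steps 1 to 4 can be checked numerically. The sketch below is illustrative only: the helper names `eigenvalues_2x2` and `mean_square_eigenvalue` are made up for this example, and coin-flip entries (plus or minus one) are assumed as one convenient rule with average zero and average square one.

```python
import math
import random

def eigenvalues_2x2(a, b, c):
    """Eigenvalues of the symmetric grid [[a, b], [b, c]] via the quadratic formula."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def mean_square_eigenvalue(n, rng):
    """Average squared eigenvalue of an n x n coin-flip grid scaled by 1/sqrt(n).

    Uses the fact from Step 1: the sum of squared eigenvalues equals the
    sum of squared entries, so no eigenvalue solver is needed.
    """
    total = 0.0
    for i in range(n):
        for j in range(i, n):
            entry = rng.choice((-1.0, 1.0)) / math.sqrt(n)
            # Off-diagonal entries appear twice in a symmetric grid.
            total += entry**2 if i == j else 2 * entry**2
    return total / n

# Step 1: sum of squared eigenvalues equals a^2 + c^2 + 2 b^2.
a, b, c = 0.3, -1.2, 0.7
lam1, lam2 = eigenvalues_2x2(a, b, c)
lhs = lam1**2 + lam2**2
rhs = a**2 + c**2 + 2 * b**2
print(lhs, rhs)

# Step 4: after rescaling, the average squared eigenvalue no longer grows.
rng = random.Random(0)
for n in (2, 10, 50):
    print(n, mean_square_eigenvalue(n, rng))
```

With coin-flip entries every squared entry is exactly one, so the rescaled average equals one for each sample up to rounding; for other entry rules it equals one only on average.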

Check your understanding Beginner

Exercise (easy, multiple choice).

As the random symmetric grid grows large (with entries properly rescaled), the histogram of its eigenvalues settles onto which shape?

A. A bell curve
B. A flat rectangle
C. The upper half of a circle
D. A pair of spikes at the two ends

Hint

Recall the dashed reference curve in the visual: dense in the middle, fading smoothly to zero at both edges.

Answer

C. The upper half of a circle. The limiting histogram is the semicircle: densest at the center and tapering to zero at the two edges of a fixed window. Feedback-correct: the half-circle is the universal limit, independent of the fine details of the entry distribution. Feedback-wrong: A is the limit for sums of independent numbers, not eigenvalues of random matrices; B and D never arise for these ensembles.

Formal definition Intermediate+

Fix a probability space $(Ω, F, P)$ . A Wigner ensemble is a family of random symmetric (or Hermitian) matrices built from two independent collections of real (or complex) random variables. Let $\{a_{ij} : 1 \leq i < j\}$ be i.i.d. with $E [a_{ij}] = 0$ and $E [∣ a_{ij} ∣^{2}] = 1$ , and let $\{a_{ii} : i \geq 1\}$ be i.i.d. (independent of the off-diagonal family) with $E [a_{ii}] = 0$ and $E [a_{ii}^{2}] = σ_{d}^{2} < \infty$ . Assume all entries have finite moments of every order (the moment hypothesis; it is relaxed in the universality discussion below). For each $n$ define the $n \times n$ symmetric (Hermitian) matrix $A_{n}$ by $(A_{n})_{ij} = a_{ij}$ for $i < j$ , $(A_{n})_{j i} = \overline{a_{ij}}$ , and $(A_{n})_{ii} = a_{ii}$ , and set the normalised Wigner matrix $$ M_n = \frac{1}{\sqrt{n}}\, A_n . $$ The $1/\sqrt{n}$ scaling is the variance normalisation that keeps the spectrum in a bounded window: it makes $E [∣ (M_{n})_{ij} ∣^{2}] = 1/ n$ for $i \neq j$ , so a typical row has squared Euclidean length of order one.

Because $M_{n}$ is symmetric (Hermitian) it has $n$ real eigenvalues $λ_{1} (M_{n}) \leq \dots \leq λ_{n} (M_{n})$ [from 37.02.02 the relevant convergence notions; the spectral theorem supplies reality]. The empirical spectral distribution (ESD) is the random probability measure $$ \mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(M_n)} , $$ a random element of the space of probability measures on $R$ . Its moments are trace functionals: for each integer $k \geq 0$ , $$ \int x^k\, d\mu_n(x) = \frac{1}{n}\sum_{i=1}^n \lambda_i(M_n)^k = \frac{1}{n}\,\mathrm{tr}\,M_n^k , $$ since the trace of a power equals the sum of the eigenvalues raised to that power. This identity is the engine of the moment method: it converts a statement about eigenvalues into a statement about expected traces, which expand combinatorially over matrix entries.

The standard semicircle law is the deterministic probability measure $μ_{sc}$ with density $$ \rho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\;\mathbf{1}_{[-2,2]}(x). $$ Its odd moments vanish by symmetry, and its even moments are the Catalan numbers: $\int x^{2k}\, d\mu_{\mathrm{sc}}(x) = C_k = \frac{1}{k+1}\binom{2k}{k}$ , while $\int x^{2k+1}\, d\mu_{\mathrm{sc}} = 0$ . In particular $\int x^2\, d\mu_{\mathrm{sc}} = C_1 = 1$ , fixing the window $[-2,2]$ .
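As a sanity check on the trace identity, the first few empirical moments of one sampled matrix can be compared with the Catalan targets. This is a minimal sketch under stated assumptions: Gaussian entries with unit variance on and off the diagonal (so $σ_{d}^{2} = 1$ here), a size of 120 chosen only to keep pure-Python arithmetic fast, and hypothetical helper names `sample_wigner` and `empirical_moments`.

```python
import math
import random
from operator import mul

def sample_wigner(n, rng):
    """Real symmetric n x n matrix with N(0, 1) entries, scaled by 1/sqrt(n)."""
    scale = 1.0 / math.sqrt(n)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0.0, 1.0) * scale
    return m

def empirical_moments(m):
    """Return (1/n) tr M^k for k = 1, 2, 3, 4 using only M and M^2."""
    n = len(m)
    # M is symmetric, so (M^2)_{ij} is the dot product of rows i and j.
    m2 = [[sum(map(mul, m[i], m[j])) for j in range(n)] for i in range(n)]
    tr1 = sum(m[i][i] for i in range(n))
    tr2 = sum(m2[i][i] for i in range(n))
    tr3 = sum(sum(map(mul, m2[i], m[i])) for i in range(n))   # tr(M^2 M)
    tr4 = sum(sum(map(mul, m2[i], m2[i])) for i in range(n))  # tr(M^2 M^2)
    return [t / n for t in (tr1, tr2, tr3, tr4)]

def catalan(k):
    return math.comb(2 * k, k) // (k + 1)

moments = empirical_moments(sample_wigner(120, random.Random(0)))
targets = [0, catalan(1), 0, catalan(2)]
for k, (got, want) in enumerate(zip(moments, targets), start=1):
    print(k, round(got, 3), want)
```

The printed values are random and only approximate the targets at this size; the gap shrinks as $n$ grows.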

Counterexamples to common slips Intermediate+

The semicircle is not a Gaussian. The limiting eigenvalue density has compact support $[- 2, 2]$ and falls to zero like a square root at the edges; it is not the bell curve. The bell curve is the limit for sums of independent scalars, not for eigenvalues of independent-entry matrices.
The diagonal does not matter to the limit. The off-diagonal variance must equal one, but the diagonal variance $σ_{d}^{2}$ may be any finite number: there are only $n$ diagonal entries against $\sim n^{2} /2$ off-diagonal ones, so the diagonal contributes negligibly to $\frac{1}{n} tr M_{n}^{k}$ . Forgetting this and demanding $σ_{d}^{2} = 1$ is unnecessary.
Without the $1/\sqrt{n}$ scaling there is no limit. Replacing $1/\sqrt{n}$ by $1/ n$ collapses every eigenvalue to zero; using no scaling sends the spectrum to infinity. Only the square-root normalisation freezes the window.
Convergence is of the empirical measure, not of individual eigenvalues. The statement $μ_{n} \Rightarrow μ_{sc}$ is about the bulk histogram. A single eigenvalue, such as $λ_{n} (M_{n})$ at the edge, fluctuates on a much finer scale governed by the Tracy-Widom law, not by the semicircle.
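The scaling and diagonal points above can be made concrete with the exact expected second moment. The sketch assumes the entries are scaled by $n^{-α}$ for a tunable exponent $α$ (a parameter introduced here only for illustration) and uses the hypothetical helper name `expected_second_moment`; the formula follows from summing the entry variances.

```python
def expected_second_moment(n, alpha, diag_var=1.0):
    """E[(1/n) tr (n^{-alpha} A_n)^2] when the off-diagonal variance is 1.

    The n diagonal entries contribute diag_var each and the n(n-1)
    off-diagonal slots contribute 1 each, before the n^{-2 alpha} scaling.
    """
    return (n * diag_var + n * (n - 1)) / (n * n ** (2 * alpha))

for n in (10, 100, 1000):
    print(
        n,
        expected_second_moment(n, alpha=0.0),   # no scaling: grows like n
        expected_second_moment(n, alpha=0.5),   # square-root scaling: stays at 1
        expected_second_moment(n, alpha=1.0),   # 1/n scaling: shrinks like 1/n
        expected_second_moment(n, alpha=0.5, diag_var=25.0),  # diagonal washes out
    )
```

The last column shows a diagonal variance of 25 changing the second moment only by $24/ n$ , which is the sense in which the diagonal does not matter to the limit.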

Key theorem with proof Intermediate+

Theorem (Wigner's semicircle law, moment form). Let $(M_{n})$ be a sequence of normalised Wigner matrices satisfying the moment hypothesis. Then for every integer $k \geq 0$ , $$ \lim_{n \to \infty} \mathbb{E}\!\left[\frac{1}{n}\,\mathrm{tr}\,M_n^{k}\right] = \int x^k\, d\mu_{\mathrm{sc}}(x) = \begin{cases} C_{k/2}, & k \text{ even},\\[2pt] 0, & k \text{ odd}, \end{cases} $$ where $C_{m} = \frac{1}{m + 1} \binom{2 m}{m}$ is the $m$ -th Catalan number. Consequently the expected empirical spectral distribution converges weakly to the semicircle law, $E μ_{n} \Rightarrow μ_{sc}$ .

Proof. Expand the normalised trace into a sum over closed walks on the index set $\{1, \dots, n\}$ . Writing $M_{n} = n^{- 1/2} A_{n}$ , $$ \mathbb{E}\!\left[\frac{1}{n}\,\mathrm{tr}\,M_n^{k}\right] = \frac{1}{n^{1 + k/2}} \sum_{i_1, i_2, \dots, i_k = 1}^{n} \mathbb{E}\big[ a_{i_1 i_2}\, a_{i_2 i_3} \cdots a_{i_k i_1} \big]. $$ Each index sequence $i = (i_{1}, \dots, i_{k}, i_{1})$ is a closed walk of length $k$ on the complete graph with vertex set $\{1, \dots, n\}$ ; the summand is the expectation of the product of the entries traversed. Because the entries are independent and centred, the expectation of a product vanishes unless every edge $\{i_{r}, i_{r + 1}\}$ traversed by the walk is traversed at least twice: an edge visited only once contributes an independent mean-zero factor that kills the whole expectation.

Group the walks by the partition of the $k$ steps induced by edge-equality. The number of distinct vertices visited by a walk in which every edge appears at least twice is at most $k /2 + 1$ : a connected walk with $e$ distinct edges visits at most $e + 1$ vertices, and each edge used $\geq 2$ times forces $e \leq k /2$ . The factor $n^{- 1 - k /2}$ assigns weight $n^{(\#\,\text{distinct vertices}) - 1 - k /2}$ to each walk class (the sum over choices of distinct vertex labels contributes $n^{\#\,\text{distinct vertices}}$ up to lower order). The exponent is non-positive, and it equals zero exactly when the walk has $k /2$ distinct edges each used exactly twice and visits exactly $k /2 + 1$ distinct vertices.

For odd $k$ , parity is fatal: a walk in which every edge is traversed at least twice has at most $(k - 1) /2$ distinct edges, hence visits at most $(k + 1) /2$ distinct vertices, so its exponent is at most $(k + 1) /2 - 1 - k /2 = - 1/2$ . Every surviving walk class therefore carries weight $O (n^{- 1/2})$ and the limit is $0$ .

For even $k = 2 m$ , the surviving walks are exactly those that traverse a tree on $m + 1$ vertices, crossing each of its $m$ edges once in each direction. Such a walk is a depth-first traversal of a rooted plane tree, and it corresponds bijectively to a non-crossing pair partition of the $2 m$ steps: pair each step with the step that re-traverses the same edge in the opposite direction. The number of rooted plane trees with $m$ edges — equivalently, the number of non-crossing pair partitions of $2 m$ points, equivalently the number of Dyck paths of length $2 m$ — is the Catalan number $C_{m}$ . Each surviving walk contributes $E [∣ a_{ij} ∣^{2}]^{m} = 1$ in the limit (each of the $m$ doubled edges supplies one second moment, normalised to one), and the count of vertex labellings is $n (n - 1) \dots (n - m) \sim n^{m + 1}$ . Multiplying by $n^{- 1 - m}$ gives $C_{m}$ in the limit. Diagonal entries, coincidences forcing fewer than $m + 1$ vertices, and edges used more than twice all live in walk classes with strictly negative exponent and vanish.

Therefore $E [\frac{1}{n} tr M_{n}^{2 m}] \to C_{m}$ and $E [\frac{1}{n} tr M_{n}^{2 m + 1}] \to 0$ . These limits are exactly the moments of $μ_{sc}$ . Since the semicircle law has compact support, it is uniquely determined by its moments (its moment generating function is finite near zero), so by the method of moments — convergence of all moments to those of a moment-determinate target forces weak convergence [from 37.03.01] — $E μ_{n} \Rightarrow μ_{sc}$ . $□$
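The counting step of the proof can be checked by brute force for small sizes. The sketch below enumerates every closed walk, keeps those with every edge used at least twice and exactly $k /2 + 1$ distinct vertices, and compares the count with $C_{m} \cdot n (n - 1) \cdots (n - m)$ . The function name `count_leading_walks` and the zero-based vertex labels are choices made for this illustration.

```python
from collections import Counter
from itertools import product
from math import comb, perm

def count_leading_walks(n, k):
    """Count closed walks of length k on {0, ..., n-1} in which every edge is
    used at least twice and exactly k/2 + 1 distinct vertices are visited."""
    count = 0
    for walk in product(range(n), repeat=k):
        # Undirected edges of the closed walk; a repeated index gives a loop.
        edges = Counter(
            frozenset((walk[r], walk[(r + 1) % k])) for r in range(k)
        )
        if min(edges.values()) >= 2 and 2 * len(set(walk)) == k + 2:
            count += 1
    return count

def catalan(m):
    return comb(2 * m, m) // (m + 1)

for n, k in [(4, 2), (4, 4), (5, 6)]:
    m = k // 2
    # Brute-force count versus Catalan number times the number of labellings.
    print(n, k, count_leading_walks(n, k), catalan(m) * perm(n, m + 1))
```

Dividing either count by $n^{m + 1}$ gives the leading contribution to the normalised trace, which tends to $C_{m}$ as $n$ grows. The enumeration costs $n^{k}$ steps, so it is only practical for very small $n$ and $k$ .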

Bridge. This computation builds toward the full almost-sure semicircle law, the local laws of Erdős-Yau, and the free-probability reading of the limit, and it appears again in the variance bound below that upgrades convergence in expectation to convergence in probability and almost surely. The foundational reason the Catalan numbers appear is that only edge-balanced walks survive the $1/\sqrt{n}$ scaling, and the maximally efficient balanced walks are tree traversals, which are counted by non-crossing pair partitions; this is exactly the combinatorial skeleton that free probability later identifies as freeness, so the semicircle is to free independence what the Gaussian is to classical independence. The moment method generalises the characteristic-function route to the classical central limit theorem 37.03.01: there the cumulants beyond the second wash out and the Gaussian survives, here the crossing pairings wash out and the semicircle survives. Putting these together, the bridge is that a single scaling exponent (vertices minus one minus half the walk length) sorts every term, and the central insight is that the limit law is whatever distribution has the surviving combinatorial count as its moment sequence.

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Derive the Catalan recurrence $C_{m + 1} = \sum_{j = 0}^{m} C_{j} C_{m - j}$ from the rooted-plane-tree picture, and identify it with the non-crossing-pairing decomposition of the $2 (m + 1)$ walk steps.

Hint

In a non-crossing pairing of $2 (m + 1)$ points, fix the partner $2 r$ of point $1$ ; the chord $(1, 2 r)$ splits the remaining points into an inside block and an outside block.

Answer

In any non-crossing pairing of the ordered points $1, \dots, 2 (m + 1)$ , the partner of point $1$ must be an even-indexed point $2 r$ , so that the $2 r - 2$ points strictly between $1$ and $2 r$ pair among themselves (an inside block of $j = r - 1$ chords) and the remaining $2 (m + 1) - 2 r$ points pair among themselves (an outside block of $m - j$ chords); non-crossing forbids any chord linking the two blocks. The inside block admits $C_{j}$ non-crossing pairings and the outside $C_{m - j}$ , and summing over $j = 0, \dots, m$ gives $C_{m + 1} = \sum_{j = 0}^{m} C_{j} C_{m - j}$ . In the tree picture this is the decomposition of a rooted plane tree into the subtree hanging off the root's first child and the rest. The generating function $C (x) = \sum_{m} C_{m} x^{m}$ therefore satisfies $C (x) = 1 + x C (x)^{2}$ , giving $C (x) = \frac{1 - \sqrt{1 - 4 x}}{2 x}$ , whose coefficients are the Catalan numbers and whose branch point at $x = 1/4$ encodes the edge of the spectrum at $\pm 2$ .
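The recurrence and the closed form of the generating function are easy to cross-check. A minimal sketch, with the evaluation point $x = 0.1$ chosen arbitrarily inside the radius of convergence $1/4$ and the helper name `catalan_by_recurrence` introduced for this example:

```python
from math import comb, sqrt

def catalan_by_recurrence(count):
    """First `count` Catalan numbers from C_{m+1} = sum_j C_j C_{m-j}, C_0 = 1."""
    c = [1]
    for m in range(count - 1):
        c.append(sum(c[j] * c[m - j] for j in range(m + 1)))
    return c

numbers = catalan_by_recurrence(60)
closed_form = [comb(2 * m, m) // (m + 1) for m in range(60)]
print(numbers[:8], numbers == closed_form)

# Generating function: truncated power series versus (1 - sqrt(1 - 4x)) / (2x).
x = 0.1
series = sum(c * x**m for m, c in enumerate(numbers))
formula = (1 - sqrt(1 - 4 * x)) / (2 * x)
print(series, formula)
```

Closer to $x = 1/4$ the series converges more slowly, which is the numerical trace of the branch point.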

Exercise 5 (medium, symbolic).

Show that the Stieltjes transform $s (z) = \int \frac{d μ _{sc} ( x )}{x - z}$ of the semicircle law satisfies the self-consistent equation $s (z)^{2} + z s (z) + 1 = 0$ , and solve for $s$ .

Hint

The generating function $\sum_{m} C_{m} z^{- (2 m + 1)}$ of the moments is $- s (z)$ for large $∣ z ∣$ ; use $C (x) = 1 + x C (x)^{2}$ with $x = z^{- 2}$ .

Answer

For large $∣ z ∣$ , expanding $\frac{1}{x - z} = - \sum_{k \geq 0} x^{k} z^{- (k + 1)}$ and integrating against $μ_{sc}$ gives $s (z) = - \sum_{m \geq 0} C_{m} z^{- (2 m + 1)} = - z^{- 1} C (z^{- 2})$ , where $C$ is the Catalan generating function. Substituting $C (x) = 1 + x C (x)^{2}$ at $x = z^{- 2}$ and writing $s = - z^{- 1} C (z^{- 2})$ yields, after clearing denominators, $s (z)^{2} + z s (z) + 1 = 0$ . Solving the quadratic, $s (z) = \frac{1}{2} (- z + \sqrt{z^{2} - 4})$ , with the branch chosen so that $s (z) \sim - 1/ z$ as $z \to \infty$ . The square-root branch points at $z = \pm 2$ are the spectral edges, and recovering the density via $ρ_{sc} (x) = \frac{1}{π} \lim_{ε ↓ 0} \mathrm{Im}\, s (x + i ε) = \frac{1}{2 π} \sqrt{4 - x^{2}}$ reproduces the semicircle. This is the alternative proof route, complementary to the moment method.
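The three claims in this answer (the quadratic, the moment series, and the inversion formula) can be tested numerically. The sketch below is one way to do it, not the only one: it picks the branch by choosing the root of smaller modulus, which works because the two roots of $s^{2} + z s + 1 = 0$ multiply to one. The test points $z = 3 + i$ and $x = 0.5$ and the function names are arbitrary choices for illustration.

```python
import cmath
import math

def s_semicircle(z):
    """Root of s^2 + z s + 1 = 0 that decays like -1/z at infinity.

    The two roots multiply to 1, so the decaying one has the smaller modulus.
    """
    root = cmath.sqrt(z * z - 4)
    s_plus, s_minus = (-z + root) / 2, (-z - root) / 2
    return s_plus if abs(s_plus) <= abs(s_minus) else s_minus

def catalan_series(z, terms=80):
    """Partial sum of -sum_m C_m z^{-(2m+1)}, valid for |z| > 2."""
    return -sum(
        math.comb(2 * m, m) // (m + 1) * z ** (-(2 * m + 1)) for m in range(terms)
    )

z = 3 + 1j
s = s_semicircle(z)
print(abs(s * s + z * s + 1))      # residual of the quadratic
print(abs(s - catalan_series(z)))  # agreement with the moment series

# Stieltjes inversion at x = 0.5: (1/pi) Im s(x + i eps) versus the density.
x, eps = 0.5, 1e-8
recovered = s_semicircle(complex(x, eps)).imag / math.pi
density = math.sqrt(4 - x * x) / (2 * math.pi)
print(recovered, density)
```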

Exercise 6 (hard, short-answer).

Prove the variance bound $Var (\frac{1}{n} tr M_{n}^{k}) = O (n^{- 2})$ for fixed $k$ , and explain why it upgrades convergence in expectation to convergence in probability.

Hint

Expand $Var (\frac{1}{n} tr M_{n}^{k}) = \frac{1}{n ^{2 + k}} \sum_{i, j} Cov (W_{i}, W_{j})$ over pairs of closed walks, where $W_{i}$ is the product of the unnormalised entries $a_{ij}$ along walk $i$ . A covariance is nonzero only when the two walks share an edge.

Answer

Write $\frac{1}{n} tr M_{n}^{k} = \frac{1}{n ^{1 + k /2}} \sum_{i} W_{i}$ with $W_{i} = a_{i_{1} i_{2}} \dots a_{i_{k} i_{1}}$ . Then $Var (\frac{1}{n} tr M_{n}^{k}) = \frac{1}{n ^{2 + k}} \sum_{i, j} Cov (W_{i}, W_{j})$ . By independence, $Cov (W_{i}, W_{j}) = 0$ unless the two walks share at least one edge (otherwise $W_{i}$ and $W_{j}$ are independent) and every edge of the union $i \cup j$ is used at least twice (otherwise a mean-zero factor appearing once kills both $E [W_{i} W_{j}]$ and $E [W_{i}] E [W_{j}]$ ). The union is then a connected graph with $2 k$ edge-steps and at most $k$ distinct edges, so it has at most $k + 1$ vertices, and $k + 1$ is attained only if the union is a tree with every edge used exactly twice. That case is excluded: a closed walk on a tree crosses each of its edges an even number of times, so a shared edge would be used at least twice by each walk, four times in total. A contributing pair therefore visits at most $k$ distinct vertices, compared with $k + 2$ for two disjoint edge-balanced walks, so the vertex-labelling count is $O (n^{k})$ against the normalisation $n^{- (2 + k)}$ , giving $Var = O (n^{- 2})$ . By Chebyshev's inequality, $P (∣ \frac{1}{n} tr M_{n}^{k} - E \frac{1}{n} tr M_{n}^{k} ∣ > ε) \leq ε^{- 2} O (n^{- 2}) \to 0$ , so each moment converges in probability to its limit $C_{k /2}$ (or $0$ ); convergence of all moments in probability to the moment-determinate semicircle gives $μ_{n} \Rightarrow μ_{sc}$ in probability. Since $\sum_{n} n^{- 2} < \infty$ , Borel-Cantelli [from 37.02.02] upgrades this to almost-sure convergence.
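For $k = 2$ the variance can be written down exactly and compared with simulation. The sketch below assumes standard Gaussian entries on and off the diagonal, so that $Var (a^{2}) = 2$ for every entry; under that assumption $Var (\frac{1}{n} tr M_{n}^{2}) = (4 n^{2} - 2 n) / n^{4}$ . The function names and the sample count of 400 are illustrative choices.

```python
import random
import statistics

def second_moment(n, rng):
    """(1/n) tr M_n^2 for Gaussian entries, via the sum of squared entries."""
    total = 0.0
    for i in range(n):
        for j in range(i, n):
            a = rng.gauss(0.0, 1.0)
            # Off-diagonal entries appear twice in the symmetric matrix.
            total += a * a if i == j else 2 * a * a
    return total / (n * n)

def exact_variance(n):
    """Var((1/n) tr M_n^2) when all entries are N(0, 1), using Var(a^2) = 2."""
    return (2 * n + 4 * n * (n - 1)) / n**4

rng = random.Random(0)
for n in (10, 20, 40):
    samples = [second_moment(n, rng) for _ in range(400)]
    estimate = statistics.variance(samples)
    # Both rescaled columns should stay bounded as n grows.
    print(n, n * n * estimate, n * n * exact_variance(n))
```

The rescaled exact value is $4 - 2/ n$ , so $n^{2}$ times the variance stays bounded, which is the $O (n^{- 2})$ rate in its simplest instance. The simulated column is a noisy estimate of the same quantity.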

Exercise 7 (hard, short-answer).

Explain why the diagonal entries and the precise off-diagonal distribution (beyond mean and variance) do not affect the limiting moments, and state precisely what universality claims.

Hint

Track which walks survive the scaling exponent and which moments of the entries those walks use.

Answer

In the leading term of $E [\frac{1}{n} tr M_{n}^{2 m}]$ only walks that are tree double-traversals survive, and each such walk uses each of its $m$ edges exactly twice, so it depends on the entries only through their second moment $E ∣ a_{ij} ∣^{2} = 1$ . Higher moments of the entries enter only via walks that revisit an edge three or more times or that use a diagonal entry; both classes visit strictly fewer than $m + 1$ vertices and so carry a strictly negative scaling exponent, vanishing as $n \to \infty$ . The diagonal contributes $n$ entries versus $\sim n^{2} /2$ off-diagonal, an additional suppression. Universality is the statement that the conclusion $μ_{n} \Rightarrow μ_{sc}$ holds for every entry distribution with mean zero and variance one (only the second moment is fixed; the moment hypothesis can be reduced to a finite-variance Lindeberg condition by truncation), and — at the much deeper level of local spectral statistics — that the spacing and edge fluctuation laws are also independent of the entry distribution within this class.

Advanced results Master

The convergence holds almost surely, not merely in expectation. The variance bound $Var (\frac{1}{n} tr M_{n}^{k}) = O (n^{- 2})$ derived above is summable in $n$ , so Borel-Cantelli [from 37.02.02] gives $\frac{1}{n} tr M_{n}^{k} \to C_{k /2}$ almost surely for each fixed $k$ ; a diagonal argument over a countable dense set of test polynomials then yields $μ_{n} \Rightarrow μ_{sc}$ almost surely. The same conclusion follows from concentration of measure: for the Gaussian ensembles the map $A \mapsto \int f d μ_{n}$ is Lipschitz in the entries with constant $O (n^{- 1})$ for $1$ -Lipschitz $f$ , and Gaussian concentration gives exponential tail bounds $P (∣ \int f d μ_{n} - E \int f d μ_{n} ∣ > t) \leq 2 exp (- c n^{2} t^{2})$ , far stronger than the polynomial Chebyshev bound and stable under removing the finite-moment hypothesis.

The Stieltjes-transform proof is the analytic complement to the moment method and survives when entries have only two moments. Writing $s_{n} (z) = \frac{1}{n} tr (M_{n} - z)^{- 1}$ for $z$ in the upper half-plane, a Schur-complement (resolvent) expansion produces the self-consistent equation $s_{n} (z) \approx - (z + s_{n} (z))^{- 1}$ , whose stable solution is the semicircle Stieltjes transform $s_{sc} (z) = \frac{1}{2} (- z + \sqrt{z^{2} - 4})$ solving $s^{2} + z s + 1 = 0$ . Inverting via the Stieltjes inversion formula recovers $ρ_{sc}$ . This route also yields local laws: the semicircle approximation for $s_{n} (z)$ holds down to imaginary parts $Im z ≫ 1/ n$ , which controls eigenvalue counts in windows containing only $n^{ε}$ eigenvalues and is the technical heart of the Erdős-Yau program.

The Gaussian ensembles are the exactly solvable representatives. The Gaussian Orthogonal Ensemble (real symmetric, $β = 1$ ), Gaussian Unitary Ensemble (complex Hermitian, $β = 2$ ), and Gaussian Symplectic Ensemble ( $β = 4$ ) have joint eigenvalue density proportional to $\prod_{i < j} ∣ λ_{i} - λ_{j} ∣^{β} exp (- \frac{β}{4} \sum_{i} λ_{i}^{2})$ , a log-gas at inverse temperature $β$ . The Vandermonde repulsion $\prod_{i < j} ∣ λ_{i} - λ_{j} ∣^{β}$ encodes eigenvalue repulsion and, through orthogonal-polynomial (Hermite) analysis, gives exact $n$ -level correlation functions whose large- $n$ bulk limit is the sine kernel and whose edge limit is the Airy kernel.

At the spectral edge the fluctuations are not Gaussian but Tracy-Widom. The largest eigenvalue obeys $n^{2/3} (λ_{\max} (M_{n}) - 2) \Rightarrow TW_{β}$ , where $TW_{β}$ is the Tracy-Widom distribution of index $β$ , expressible through the Hastings-McLeod solution of the Painlevé II equation $q^{''} = s q + 2 q^{3}$ via $F_{2} (s) = \exp (- \int_{s}^{\infty} (x - s) q (x)^{2} d x)$ for $β = 2$ . The $n^{2/3}$ scaling and the square-root vanishing of $ρ_{sc}$ at the edge are linked: the soft edge of a density vanishing like $(2 - x)^{1/2}$ forces exactly this fluctuation exponent. Universality, proved by Erdős-Schlein-Yau and Tao-Vu, extends both the bulk sine-kernel and the edge Tracy-Widom laws from the Gaussian ensembles to all Wigner matrices whose first four entry moments match the Gaussian ones.

Synthesis. The foundational reason a single half-circle organises this entire subject is that the $1/\sqrt{n}$ scaling selects exactly the edge-balanced closed walks, and among them the non-crossing tree double-traversals dominate, so the limiting moments are the Catalan numbers and nothing else; this is exactly the combinatorial content that free probability re-reads as the moments of a free-semicircular element, making the semicircle to free independence what the Gaussian is to classical independence. Putting these together, the moment method, the Stieltjes self-consistent equation $s^{2} + z s + 1 = 0$ , and the log-gas free-energy minimisation are three routes to the same density, and they are dual to one another: the branch point of the Catalan generating function at $x = 1/4$ , the square-root branch point of $s_{sc}$ at $z = \pm 2$ , and the edge of the equilibrium measure are the same edge seen three ways. The central insight is that universality is a statement about which terms survive a scaling limit, and it generalises the classical central limit theorem 37.03.01: there only the second cumulant survives and the Gaussian appears, here only the second moment of the entries survives and the semicircle appears. The bridge to the frontier is that the bulk universality of sine-kernel statistics and the edge universality of the Tracy-Widom law are the same selection principle pushed from global moments down to local correlations, which is what the local-law program makes quantitative.

Full proof set Master

The moment computation, the Catalan identification, and the variance bound are proved in full above. The remaining Master claims are recorded here.

Proposition (moment-determinacy of the semicircle law). The semicircle law $μ_{sc}$ is the unique probability measure whose moments are $C_{m}$ (even orders) and $0$ (odd orders); equivalently, convergence of all moments to this sequence implies weak convergence to $μ_{sc}$ .

Proof. The measure $μ_{sc}$ is supported on the bounded set $[- 2, 2]$ , so $∣ \int x^{2 m} d μ_{sc} ∣ = C_{m} \leq 4^{m}$ . Hence the moment generating function $\sum_{m} \frac{t ^{2 m}}{( 2 m )!} C_{m}$ converges for all $t$ , and Carleman's condition $\sum_{m} C_{m}^{- 1/ (2 m)} = \infty$ holds (since $C_{m} \leq 4^{m}$ gives $C_{m}^{- 1/ (2 m)} \geq 1/2$ ), so the moment problem is determinate: no other measure shares these moments. A sequence of probability measures whose every moment converges to that of a determinate compactly-moment-bounded target converges weakly to it, because tightness follows from the bounded second moments and any weak subsequential limit must share all moments, hence equal $μ_{sc}$ [from 37.03.01]. $□$

Proposition (the semicircle density has Catalan even moments). With $ρ_{sc} (x) = \frac{1}{2 π} \sqrt{4 - x^{2}}$ on $[- 2, 2]$ , one has $\int_{- 2}^{2} x^{2 m} ρ_{sc} (x) d x = C_{m}$ .

Proof. Substitute $x = 2 \sin θ$ , $d x = 2 \cos θ \, d θ$ , $\sqrt{4 - x^{2}} = 2 \cos θ$ , with $θ \in [- π /2, π /2]$ . Then $$ \int_{-2}^{2} x^{2m}\rho_{\mathrm{sc}}\, dx = \frac{1}{2\pi}\int_{-\pi/2}^{\pi/2} (2\sin\theta)^{2m}\,(2\cos\theta)\,(2\cos\theta)\, d\theta = \frac{2^{2m+1}}{\pi}\int_{-\pi/2}^{\pi/2}\sin^{2m}\theta\,\cos^2\theta\, d\theta. $$ Using $\cos^{2} θ = 1 - \sin^{2} θ$ and the Wallis integral $\frac{1}{π} \int_{- π /2}^{π /2} \sin^{2 k} θ \, d θ = \binom{2 k}{k} 4^{- k}$ , the normalised integral $\frac{1}{π} \int_{- π /2}^{π /2} \sin^{2 m} θ \cos^{2} θ \, d θ$ evaluates to $\binom{2 m}{m} 4^{- m} - \binom{2 m + 2}{m + 1} 4^{- (m + 1)}$ . Multiplying by $2^{2 m + 1} = 2 \cdot 4^{m}$ and simplifying with $\binom{2 m + 2}{m + 1} = \frac{(2 m + 2) (2 m + 1)}{(m + 1)^{2}} \binom{2 m}{m}$ collapses the expression to $\frac{1}{m + 1} \binom{2 m}{m} = C_{m}$ . $□$
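The trigonometric form of the integral lends itself to a quick numerical confirmation. This sketch applies a plain midpoint rule (a choice made here for simplicity, with 2000 steps) to the $θ$ -integrals from the proof and compares them with the Catalan numbers and the Wallis values; the helper names are hypothetical.

```python
import math

def midpoint(f, a, b, steps=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def even_moment(m):
    """(2^{2m+1}/pi) times the integral of sin^{2m} cos^2 over [-pi/2, pi/2]."""
    integral = midpoint(
        lambda t: math.sin(t) ** (2 * m) * math.cos(t) ** 2,
        -math.pi / 2, math.pi / 2,
    )
    return 2 ** (2 * m + 1) / math.pi * integral

def wallis(k):
    """(1/pi) times the integral of sin^{2k} over [-pi/2, pi/2]."""
    return midpoint(lambda t: math.sin(t) ** (2 * k), -math.pi / 2, math.pi / 2) / math.pi

for m in range(6):
    catalan = math.comb(2 * m, m) // (m + 1)
    # Columns: m, numerical moment, C_m, numerical Wallis value, binom(2m, m) / 4^m.
    print(m, even_moment(m), catalan, wallis(m), math.comb(2 * m, m) / 4**m)
```

The integrands are trigonometric polynomials over a full period, so the midpoint rule is accurate here far beyond what its generic error bound suggests.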

Proposition (Stieltjes self-consistency). The Stieltjes transform $s_{sc} (z) = \int \frac{d μ _{sc} ( x )}{x - z}$ , $z \in C ∖ [- 2, 2]$ , satisfies $s_{sc} (z)^{2} + z s_{sc} (z) + 1 = 0$ and equals $\frac{1}{2} (- z + \sqrt{z^{2} - 4})$ with the branch $s_{sc} (z) \sim - 1/ z$ at infinity.

Proof. For $∣ z ∣ > 2$ , expand $\frac{1}{x - z} = - \sum_{k \geq 0} x^{k} z^{- k - 1}$ and integrate term by term against $μ_{sc}$ , using the moments above: $s_{sc} (z) = - \sum_{m \geq 0} C_{m} z^{- 2 m - 1}$ . The Catalan generating function $C (w) = \sum_{m} C_{m} w^{m} = \frac{1 - \sqrt{1 - 4 w}}{2 w}$ satisfies $C = 1 + w C^{2}$ . With $w = z^{- 2}$ and $s_{sc} = - z^{- 1} C (z^{- 2})$ , that is $C = - z s_{sc}$ , multiply $C = 1 + z^{- 2} C^{2}$ by $z^{- 1}$ : $- s_{sc} = z^{- 1} + z^{- 3} (z s_{sc})^{2} = z^{- 1} + z^{- 1} s_{sc}^{2}$ , i.e. $- z s_{sc} = 1 + s_{sc}^{2}$ , which is the stated quadratic. Solving and matching the asymptotic $- 1/ z$ selects the root $\frac{1}{2} (- z + \sqrt{z^{2} - 4})$ . Analytic continuation extends the identity from $∣ z ∣ > 2$ to all of $C ∖ [- 2, 2]$ . $□$

Proposition (square-root edge and the $n^{2/3}$ scaling heuristic). The semicircle density vanishes like $ρ_{sc} (x) \sim \frac{1}{π} (2 - x)^{1/2}$ as $x ↑ 2$ , and the typical spacing of the top eigenvalues is of order $n^{- 2/3}$ .

Proof. Near $x = 2$ , $4 - x^{2} = (2 - x) (2 + x) \sim 4 (2 - x)$ , so $ρ_{sc} (x) \sim \frac{1}{2 π} \cdot 2 \sqrt{2 - x} = \frac{1}{π} \sqrt{2 - x}$ . The expected number of eigenvalues in $[2 - δ, 2]$ is $n \int_{2 - δ}^{2} ρ_{sc} \sim n \cdot \frac{2}{3 π} δ^{3/2}$ . Setting this count to order one, the scale at which individual edge eigenvalues are resolved, gives $δ \sim n^{- 2/3}$ , the Tracy-Widom window width. $□$
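The heuristic can be checked against the exact semicircle mass near the edge. The sketch below uses the closed-form distribution function of the semicircle law (obtained by integrating the density; it is not stated in the text above) and solves $n \cdot \frac{2}{3 π} δ^{3/2} = 1$ for $δ$ . The function names `semicircle_cdf` and `edge_window` are made up for this example.

```python
import math

def semicircle_cdf(x):
    """Distribution function of the semicircle law for x in [-2, 2]."""
    return 0.5 + x * math.sqrt(4 - x * x) / (4 * math.pi) + math.asin(x / 2) / math.pi

def edge_window(n):
    """delta solving n * (2 / (3 pi)) * delta^{3/2} = 1."""
    return (3 * math.pi / (2 * n)) ** (2 / 3)

for n in (10**2, 10**4, 10**6):
    delta = edge_window(n)
    # Exact expected count in [2 - delta, 2] under the semicircle density.
    exact_count = n * (1 - semicircle_cdf(2 - delta))
    # Last column is delta * n^{2/3}, which should not depend on n.
    print(n, delta, exact_count, delta * n ** (2 / 3))
```

The exact count approaches one as $n$ grows, and $δ \, n^{2/3}$ stays constant, consistent with a window of width of order $n^{- 2/3}$ .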

Connections Master

The strong law of large numbers 37.02.02 supplies the almost-sure upgrade. The summable variance bound $Var (\frac{1}{n} tr M_{n}^{k}) = O (n^{- 2})$ feeds the Borel-Cantelli lemma established in that unit to turn convergence in expectation of each empirical moment into almost-sure convergence; the law-of-large-numbers intuition that averaging over many weakly-dependent contributions yields a deterministic limit is exactly what the empirical spectral distribution realises at the level of the whole spectrum.

Characteristic functions and the Lévy continuity theorem 37.03.01 are the moment-method analogue and the source of the determinacy step. The semicircle law is identified by its Catalan moment sequence the way a classical limit is identified by its characteristic function, and the method-of-moments convergence criterion — all moments converge to those of a moment-determinate target — is the moment-side companion to the continuity theorem; the central limit theorem proved there is the scalar shadow of which the semicircle law is the matrix-valued analogue, with second cumulant replaced by second moment of the entries.

The QFT large- $N$ matrix model and topological expansion 08.14.06 is the field-theoretic counterpart of this probabilistic theorem. There the same Gaussian matrix integral is organised by ribbon graphs graded by the genus of the surface they tile, and the planar (genus-zero) sector reproduces the semicircle as the leading large- $N$ density; the non-crossing pair partitions counted here by the Catalan numbers are precisely the planar Wick contractions there, so this unit is the ensemble theorem and that unit is the saddle-point / topological-expansion derivation of the same object.

The Itô integral and Itô's formula 02.15.02 connect through the dynamical approach to the semicircle law: Dyson Brownian motion evolves the eigenvalues as interacting diffusions $d λ_{i} = d B_{i} + \frac{β}{2 n} \sum_{j \neq i} \frac{d t}{λ _{i} - λ _{j}}$ , an SDE whose stationary measure is the log-gas and whose hydrodynamic limit is the semicircle; the Itô calculus of that unit is the machinery that makes this eigenvalue flow and its local-law analysis rigorous.

Historical & philosophical context Master

Eugene Wigner introduced the semicircle law in 1955 while modelling the statistics of energy levels of heavy nuclei, where the exact Hamiltonian is unknown but its level density and spacing might be captured by a random matrix of the appropriate symmetry class [Wigner 1955]. The 1955 paper treated symmetric matrices with bounded entries; the 1958 follow-up [Wigner 1958] gave the moment argument identifying the even moments with the Catalan numbers and established the law for a broad class of distributions, founding the method of moments in random matrix theory. The Gaussian ensembles and their orthogonal-polynomial solution were systematised by Freeman Dyson and Madan Lal Mehta around 1960-1963, with Mehta's monograph [Mehta 2004] becoming the standard reference; Dyson's threefold classification by time-reversal and spin symmetry produced the $β = 1, 2, 4$ ensembles.

The edge fluctuation law was discovered by Craig Tracy and Harold Widom in 1994 [Tracy 1994], who expressed the limiting largest-eigenvalue distribution of the Gaussian Unitary Ensemble through a Painlevé II transcendent; the same distribution was subsequently found governing the longest increasing subsequence of a random permutation, last-passage percolation, and a growing family of models in the Kardar-Parisi-Zhang universality class, far outside the matrix setting in which it was found. The universality conjecture (that bulk and edge statistics depend only on the symmetry class and not on the entry distribution) was proved in the late 2000s by the Erdős-Schlein-Yau local-law program and independently by Tao and Vu, completing a line of inquiry that began with Wigner's hypothesis that random matrices model the universal statistics of complex spectra.

Bibliography Master

@article{wigner1955,
  author  = {Wigner, Eugene P.},
  title   = {Characteristic vectors of bordered matrices with infinite dimensions},
  journal = {Annals of Mathematics},
  volume  = {62},
  number  = {3},
  pages   = {548--564},
  year    = {1955}
}

@article{wigner1958,
  author  = {Wigner, Eugene P.},
  title   = {On the distribution of the roots of certain symmetric matrices},
  journal = {Annals of Mathematics},
  volume  = {67},
  number  = {2},
  pages   = {325--327},
  year    = {1958}
}

@book{agz2010,
  author    = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
  title     = {An Introduction to Random Matrices},
  series    = {Cambridge Studies in Advanced Mathematics},
  volume    = {118},
  publisher = {Cambridge University Press},
  year      = {2010}
}

@book{tao2012,
  author    = {Tao, Terence},
  title     = {Topics in Random Matrix Theory},
  series    = {Graduate Studies in Mathematics},
  volume    = {132},
  publisher = {American Mathematical Society},
  year      = {2012}
}

@article{tracywidom1994,
  author  = {Tracy, Craig A. and Widom, Harold},
  title   = {Level-spacing distributions and the Airy kernel},
  journal = {Communications in Mathematical Physics},
  volume  = {159},
  number  = {1},
  pages   = {151--174},
  year    = {1994}
}

@book{mehta2004,
  author    = {Mehta, Madan Lal},
  title     = {Random Matrices},
  edition   = {3rd},
  publisher = {Elsevier/Academic Press, Amsterdam},
  year      = {2004}
}

@book{erdosyau2017,
  author    = {Erd{\H o}s, L\'aszl\'o and Yau, Horng-Tzer},
  title     = {A Dynamical Approach to Random Matrix Theory},
  series    = {Courant Lecture Notes},
  volume    = {28},
  publisher = {American Mathematical Society},
  year      = {2017}
}

Prerequisites

37.02.02
37.03.01

Tier anchors

beginner: Tao, Topics in Random Matrix Theory §2.4 (the semicircle picture, histograms of eigenvalues); Trefethen-Embree numerical-eigenvalue intuition; the physical picture of an energy-level histogram filling a fixed window
intermediate: Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices §2.1 (Wigner's theorem by the moment method); Tao §2.3-2.4; Bai-Silverstein, Spectral Analysis of Large Dimensional Random Matrices §2
master: Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) Ch. 2; Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) Ch. 2; Mehta, Random Matrices 3e; Erdős-Yau, A Dynamical Approach to Random Matrix Theory (2017); Tracy-Widom, Commun. Math. Phys. 159 (1994)

References

Wigner — Characteristic vectors of bordered matrices with infinite dimensions · Annals of Mathematics 62 (1955), 548-564 (original semicircle law for symmetric matrices)
Wigner — On the distribution of the roots of certain symmetric matrices · Annals of Mathematics 67 (1958), 325-327 (moment method, Catalan numbers)
Anderson, Guionnet, Zeitouni — An Introduction to Random Matrices · Cambridge University Press, 2010, Ch. 2 (Wigner's theorem via moments, concentration)
Tao — Topics in Random Matrix Theory · AMS Graduate Studies in Mathematics 132, 2012, Ch. 2 (semicircle law, moment and Stieltjes-transform proofs)
Tracy, Widom — Level-spacing distributions and the Airy kernel · Commun. Math. Phys. 159 (1994), 151-174 (edge fluctuation law)
Mehta — Random Matrices · Elsevier/Academic Press, 3rd ed., 2004 (Gaussian ensembles, orthogonal polynomials)

Estimated time

beginner: 20m
intermediate: 55m
master: 95m