37.08.01 · probability / 08-random-matrices

The Wigner Semicircle Law and the Moment Method

shipped3 tiersLean: none

Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) Ch. 2; Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) Ch. 2; Mehta, Random Matrices 3e; Erdős-Yau, A Dynamical Approach to Random Matrix Theory (2017); Tracy-Widom, Commun. Math. Phys. 159 (1994)

Intuition Beginner

Take a large square grid of numbers, fill every entry with an independent random value, and then make the grid symmetric by reflecting it across the diagonal. Such a grid has a list of special numbers attached to it — its eigenvalues — that measure how it stretches space along certain natural directions. You might expect the eigenvalues of a random grid to be a random mess with no recognisable shape. The surprise of this subject is that they are not: as the grid grows, the histogram of its eigenvalues settles onto one fixed, smooth shape every single time.

That shape is a half-circle. If you rescale the entries so the eigenvalues stay inside a fixed window, their histogram fills out the upper half of a circle: dense in the middle, thinning smoothly to zero at the two edges. It does not matter whether the entries were plus-or-minus coins, bell-curve draws, or any other reasonable random rule — the limiting picture is the same half-circle. This independence from the fine print is the first hint of a phenomenon called universality.

The reason a clean shape emerges is the same reason a bell curve emerges from adding many small independent contributions: averaging over an enormous number of random entries washes out their individual quirks and leaves only a few summary numbers. Here those summary numbers are the averaged powers of the grid, and they conspire to single out the half-circle and nothing else.

Visual Beginner

Picture three histograms stacked left to right, one for a small grid, one for a medium grid, one for a large grid. Each histogram plots how many eigenvalues land in each narrow vertical strip across the window from minus two to plus two. The small grid gives a ragged, spiky histogram. The medium grid is smoother. The large grid traces out a clean half-circle: tall in the middle near zero, falling away on both sides, and touching the floor exactly at minus two and plus two.

The dashed reference curve is the same on all three panels. The point of the picture is that the random histogram does not wander toward different limits for different random rules: it always presses up against this one half-circle as the grid grows, and the wiggles around the curve shrink as the size increases.

Worked example Beginner

We check the simplest summary number — the average of the squared eigenvalues — for a small symmetric grid, because that average controls how wide the half-circle is.

Step 1. Take a two-by-two symmetric grid with diagonal entries and and off-diagonal entry (appearing in both off-diagonal slots because the grid is symmetric). Its two eigenvalues and satisfy a useful bookkeeping fact: the sum of their squares equals the sum of the squares of all the grid entries counted with their multiplicities, namely .

Step 2. Now make the entries random with average value zero and average squared value one. The average of is one, the average of is one, and the average of is one, so the average of is .

Step 3. Divide by the grid size, which is two, to get the average squared eigenvalue per direction: . If we had not rescaled the entries, this number would keep growing with the grid size, pushing the eigenvalues off to infinity.

Step 4. The rescaling fix is to shrink every entry by the square root of the grid size. For a size- grid that divides each squared entry by , and the average squared eigenvalue settles to one rather than growing. A half-circle of radius two has exactly average squared value one, which is why the limiting window is minus two to plus two.

Step 5. What this tells us: the single rescaling — shrink entries by the square root of the size — is what freezes the eigenvalue window in place. Once frozen, the averaged powers of the grid stop depending on the size and lock onto the fixed numbers that draw the half-circle.

Check your understanding Beginner

Formal definition Intermediate+

Fix a probability space . A Wigner ensemble is a family of random symmetric (or Hermitian) matrices built from two independent collections of real (or complex) random variables. Let be i.i.d. with and , and let be i.i.d. (independent of the off-diagonal family) with and . Assume all entries have finite moments of every order (the moment hypothesis; it is relaxed in the universality discussion below). For each define the symmetric matrix by for , , and , and set the normalised Wigner matrix $$ M_n = \frac{1}{\sqrt n}, A_n . $$ The scaling is the variance normalisation that keeps the spectrum in a bounded window: it makes , so a typical row has squared Euclidean length of order one.

Because is symmetric (Hermitian) it has real eigenvalues [from 37.02.02 the relevant convergence notions; the spectral theorem supplies reality]. The empirical spectral distribution (ESD) is the random probability measure $$ \mu_n = \frac{1}{n}\sum_{i=1}^n \delta_{\lambda_i(M_n)} , $$ a random element of the space of probability measures on . Its moments are trace functionals: for each integer , $$ \int x^k, d\mu_n(x) = \frac{1}{n}\sum_{i=1}^n \lambda_i(M_n)^k = \frac{1}{n},\mathrm{tr},M_n^k , $$ since the trace of a power equals the sum of the eigenvalues raised to that power. This identity is the engine of the moment method: it converts a statement about eigenvalues into a statement about expected traces, which expand combinatorially over matrix entries.

The standard semicircle law is the deterministic probability measure with density $$ \rho_{\mathrm{sc}}(x) = \frac{1}{2\pi}\sqrt{4 - x^2};\mathbf{1}{[-2,2]}(x). $$ Its odd moments vanish by symmetry, and its even moments are the Catalan numbers: $\int x^{2k}, d\mu{\mathrm{sc}}(x) = C_k = \frac{1}{k+1}\binom{2k}{k}\int x^{2k+1}, d\mu_{\mathrm{sc}} = 0\int x^2, d\mu_{\mathrm{sc}} = C_1 = 1[-2,2]$.

Counterexamples to common slips Intermediate+

  • The semicircle is not a Gaussian. The limiting eigenvalue density has compact support and falls to zero like a square root at the edges; it is not the bell curve. The bell curve is the limit for sums of independent scalars, not for eigenvalues of independent-entry matrices.
  • The diagonal does not matter to the limit. The off-diagonal variance must equal one, but the diagonal variance may be any finite number: there are only diagonal entries against off-diagonal ones, so the diagonal contributes negligibly to . Forgetting this and demanding is unnecessary.
  • Without the scaling there is no limit. Replacing by collapses every eigenvalue to zero; using no scaling sends the spectrum to infinity. Only the square-root normalisation freezes the window.
  • Convergence is of the empirical measure, not of individual eigenvalues. The statement is about the bulk histogram. A single eigenvalue, such as at the edge, fluctuates on a much finer scale governed by the Tracy-Widom law, not by the semicircle.

Key theorem with proof Intermediate+

Theorem (Wigner's semicircle law, moment form). Let be a sequence of normalised Wigner matrices satisfying the moment hypothesis. Then for every integer , $$ \lim_{n \to \infty} \mathbb{E}!\left[\frac{1}{n},\mathrm{tr},M_n^{k}\right] = \int x^k, d\mu_{\mathrm{sc}}(x) = \begin{cases} C_{k/2}, & k \text{ even},\[2pt] 0, & k \text{ odd}, \end{cases} $$ where is the -th Catalan number. Consequently the expected empirical spectral distribution converges weakly to the semicircle law, .

Proof. Expand the normalised trace into a sum over closed walks on the index set . Writing , $$ \mathbb{E}!\left[\frac{1}{n},\mathrm{tr},M_n^{k}\right] = \frac{1}{n^{1 + k/2}} \sum_{i_1, i_2, \dots, i_k = 1}^{n} \mathbb{E}\big[ a_{i_1 i_2}, a_{i_2 i_3} \cdots a_{i_k i_1} \big]. $$ Each index sequence is a closed walk of length on the complete graph with vertex set ; the summand is the expectation of the product of the entries traversed. Because the entries are independent and centred, the expectation of a product vanishes unless every edge traversed by the walk is traversed at least twice: an edge visited only once contributes an independent mean-zero factor that kills the whole expectation.

Group the walks by the partition of the steps induced by edge-equality. The number of distinct vertices visited by a walk in which every edge appears at least twice is at most : a connected walk with distinct edges visits at most vertices, and each edge used times forces . The factor assigns weight to each walk class (the sum over choices of distinct vertex labels contributes up to lower order). The exponent is non-positive, and it equals zero exactly when the walk has distinct edges each used exactly twice and visits exactly distinct vertices.

For odd , the parity obstruction is fatal: is not an integer, no edge-balanced walk of odd length can have every edge used an even number of times while returning to start, so the leading exponent is strictly negative and the limit is .

For even , the surviving walks are exactly those that traverse a tree on vertices, crossing each of its edges once in each direction. Such a walk is a depth-first traversal of a rooted plane tree, and it corresponds bijectively to a non-crossing pair partition of the steps: pair each step with the step that re-traverses the same edge in the opposite direction. The number of rooted plane trees with edges — equivalently, the number of non-crossing pair partitions of points, equivalently the number of Dyck paths of length — is the Catalan number . Each surviving walk contributes in the limit (each of the doubled edges supplies one second moment, normalised to one), and the count of vertex labellings is . Multiplying by gives in the limit. Diagonal entries, coincidences forcing fewer than vertices, and edges used more than twice all live in walk classes with strictly negative exponent and vanish.

Therefore and . These limits are exactly the moments of . Since the semicircle law has compact support, it is uniquely determined by its moments (its moment generating function is finite near zero), so by the method of moments — convergence of all moments to those of a moment-determinate target forces weak convergence [from 37.03.01] — .

Bridge. This computation builds toward the full almost-sure semicircle law, the local laws of Erdős-Yau, and the free-probability reading of the limit, and it appears again in the variance bound below that upgrades convergence in expectation to convergence in probability and almost surely. The foundational reason the Catalan numbers appear is that only edge-balanced walks survive the scaling, and the maximally efficient balanced walks are tree traversals, which are counted by non-crossing pair partitions; this is exactly the combinatorial skeleton that free probability later identifies as freeness, so the semicircle is to free independence what the Gaussian is to classical independence. The moment method generalises the characteristic-function route to the classical central limit theorem 37.03.01: there the cumulants beyond the second wash out and the Gaussian survives, here the crossing pairings wash out and the semicircle survives. Putting these together, the bridge is that a single scaling exponent — vertices minus one minus half the walk length — sorts every term, and the central insight is that the limit law is whatever distribution has the surviving combinatorial count as its moment sequence.

Exercises Intermediate+

Advanced results Master

The convergence holds almost surely, not merely in expectation. The variance bound derived above is summable in , so Borel-Cantelli [from 37.02.02] gives almost surely for each fixed ; a diagonal argument over a countable dense set of test polynomials then yields almost surely. The same conclusion follows from concentration of measure: for the Gaussian ensembles the map is Lipschitz in the entries with constant for -Lipschitz , and Gaussian concentration gives exponential tail bounds , far stronger than the polynomial Chebyshev bound and stable under removing the finite-moment hypothesis.

The Stieltjes-transform proof is the analytic complement to the moment method and survives when entries have only two moments. Writing for in the upper half-plane, a Schur-complement (resolvent) expansion produces the self-consistent equation , whose stable solution is the semicircle Stieltjes transform solving . Inverting via the Stieltjes inversion formula recovers . This route also yields local laws: the semicircle approximation for holds down to imaginary parts , which controls eigenvalue counts in windows containing only eigenvalues and is the technical heart of the Erdős-Yau program.

The Gaussian ensembles are the exactly solvable representatives. The Gaussian Orthogonal Ensemble (real symmetric, ), Gaussian Unitary Ensemble (complex Hermitian, ), and Gaussian Symplectic Ensemble () have joint eigenvalue density proportional to , a log-gas at inverse temperature . The Vandermonde repulsion encodes eigenvalue repulsion and, through orthogonal-polynomial (Hermite) analysis, gives exact -level correlation functions whose large- bulk limit is the sine kernel and whose edge limit is the Airy kernel.

At the spectral edge the fluctuations are not Gaussian but Tracy-Widom. The largest eigenvalue obeys , where is the Tracy-Widom distribution of index , expressible through the Hastings-McLeod solution of the Painlevé II equation via for . The scaling and the square-root vanishing of at the edge are linked: the soft edge of a density vanishing like forces exactly this fluctuation exponent. Universality, proved by Erdős-Schlein-Yau and Tao-Vu, extends both the bulk sine-kernel and the edge Tracy-Widom laws from the Gaussian ensembles to all Wigner matrices with the matching first four moments.

Synthesis. The foundational reason a single half-circle organises this entire subject is that the scaling selects exactly the edge-balanced closed walks, and among them the non-crossing tree double-traversals dominate, so the limiting moments are the Catalan numbers and nothing else; this is exactly the combinatorial content that free probability re-reads as the moments of a free-semicircular element, making the semicircle to free independence what the Gaussian is to classical independence. Putting these together, the moment method, the Stieltjes self-consistent equation , and the log-gas free-energy minimisation are three routes to the same density, and they are dual to one another: the branch point of the Catalan generating function at , the square-root branch point of at , and the edge of the equilibrium measure are the same edge seen three ways. The central insight is that universality is a statement about which terms survive a scaling limit, and it generalises the classical central limit theorem 37.03.01: there only the second cumulant survives and the Gaussian appears, here only the second moment of the entries survives and the semicircle appears. The bridge to the frontier is that the bulk universality of sine-kernel statistics and the edge universality of the Tracy-Widom law are the same selection principle pushed from global moments down to local correlations, and this is the central insight that the local-law program makes quantitative.

Full proof set Master

The moment computation, the Catalan identification, and the variance bound are proved in full above. The remaining Master claims are recorded here.

Proposition (moment-determinacy of the semicircle law). The semicircle law is the unique probability measure whose moments are (even orders) and (odd orders); equivalently, convergence of all moments to this sequence implies weak convergence to .

Proof. The measure is supported on the bounded set , so . Hence the moment generating function converges for all , and Carleman's condition holds (since gives ), so the moment problem is determinate: no other measure shares these moments. A sequence of probability measures whose every moment converges to that of a determinate compactly-moment-bounded target converges weakly to it, because tightness follows from the bounded second moments and any weak subsequential limit must share all moments, hence equal [from 37.03.01].

Proposition (the semicircle density has Catalan even moments). With on , one has .

Proof. Substitute , , , with . Then $$ \int_{-2}^{2} x^{2m}\rho_{\mathrm{sc}}, dx = \frac{1}{2\pi}\int_{-\pi/2}^{\pi/2} (2\sin\theta)^{2m},(2\cos\theta),(2\cos\theta), d\theta = \frac{2^{2m+1}}{\pi}\int_{-\pi/2}^{\pi/2}\sin^{2m}\theta,\cos^2\theta, d\theta. $$ Using and the Wallis integral , the bracket evaluates to . Multiplying by and simplifying with collapses the expression to .

Proposition (Stieltjes self-consistency). The Stieltjes transform , , satisfies and equals with the branch at infinity.

Proof. For , expand and integrate term by term against , using the moments above: . The Catalan generating function satisfies . With and , multiply by : , i.e. , which is the stated quadratic. Solving and matching the asymptotic selects the root . Analytic continuation extends the identity from to all of .

Proposition (square-root edge and the scaling heuristic). The semicircle density vanishes like as , and the typical spacing of the top eigenvalues is of order .

Proof. Near , , so . The expected number of eigenvalues in is . Setting this count to order one — the scale at which individual edge eigenvalues are resolved — gives , the Tracy-Widom window width.

Connections Master

The strong law of large numbers 37.02.02 supplies the almost-sure upgrade. The summable variance bound feeds the Borel-Cantelli lemma established in that unit to turn convergence in expectation of each empirical moment into almost-sure convergence; the law-of-large-numbers intuition that averaging over many weakly-dependent contributions yields a deterministic limit is exactly what the empirical spectral distribution realises at the level of the whole spectrum.

Characteristic functions and the Lévy continuity theorem 37.03.01 are the moment-method analogue and the source of the determinacy step. The semicircle law is identified by its Catalan moment sequence the way a classical limit is identified by its characteristic function, and the method-of-moments convergence criterion — all moments converge to those of a moment-determinate target — is the moment-side companion to the continuity theorem; the central limit theorem proved there is the scalar shadow of which the semicircle law is the matrix-valued analogue, with second cumulant replaced by second moment of the entries.

The QFT large- matrix model and topological expansion 08.14.06 is the field-theoretic counterpart of this probabilistic theorem. There the same Gaussian matrix integral is organised by ribbon graphs graded by the genus of the surface they tile, and the planar (genus-zero) sector reproduces the semicircle as the leading large- density; the non-crossing pair partitions counted here by the Catalan numbers are precisely the planar Wick contractions there, so this unit is the ensemble theorem and that unit is the saddle-point / topological-expansion derivation of the same object.

The Itô integral and Itô's formula 02.15.02 connect through the dynamical approach to the semicircle law: Dyson Brownian motion evolves the eigenvalues as interacting diffusions , an SDE whose stationary measure is the log-gas and whose hydrodynamic limit is the semicircle; the Itô calculus of that unit is the machinery that makes this eigenvalue flow and its local-law analysis rigorous.

Historical & philosophical context Master

Eugene Wigner introduced the semicircle law in 1955 while modelling the statistics of energy levels of heavy nuclei, where the exact Hamiltonian is unknown but its level density and spacing might be captured by a random matrix of the appropriate symmetry class [Wigner 1955]. The 1955 paper treated symmetric matrices with bounded entries; the 1958 follow-up [Wigner 1958] gave the moment argument identifying the even moments with the Catalan numbers and established the law for a broad class of distributions, founding the method of moments in random matrix theory. The Gaussian ensembles and their orthogonal-polynomial solution were systematised by Freeman Dyson and Madan Lal Mehta around 1960-1963, with Mehta's monograph [Mehta 2004] becoming the standard reference; Dyson's threefold classification by time-reversal and spin symmetry produced the ensembles.

The edge fluctuation law was discovered by Craig Tracy and Harold Widom in 1994 [Tracy 1994], who expressed the limiting largest-eigenvalue distribution of the Gaussian Unitary Ensemble through a Painlevé II transcendent; the same distribution was subsequently found governing the longest increasing subsequence of a random permutation, last-passage percolation, and a growing family of models in the Kardar-Parisi-Zhang universality class, far outside the matrix setting in which it was found. The universality conjecture — that bulk and edge statistics depend only on the symmetry class and not on the entry distribution — was proved in the late 2000s by the Erdős-Schlein-Yau local-law program and independently by Tao and Vu, completing a line of inquiry that began with Wigner's hypothesis that random matrices model the universal statistics of complex spectra.

Bibliography Master

@article{wigner1955,
  author  = {Wigner, Eugene P.},
  title   = {Characteristic vectors of bordered matrices with infinite dimensions},
  journal = {Annals of Mathematics},
  volume  = {62},
  number  = {3},
  pages   = {548--564},
  year    = {1955}
}

@article{wigner1958,
  author  = {Wigner, Eugene P.},
  title   = {On the distribution of the roots of certain symmetric matrices},
  journal = {Annals of Mathematics},
  volume  = {67},
  number  = {2},
  pages   = {325--327},
  year    = {1958}
}

@book{agz2010,
  author    = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
  title     = {An Introduction to Random Matrices},
  series    = {Cambridge Studies in Advanced Mathematics},
  volume    = {118},
  publisher = {Cambridge University Press},
  year      = {2010}
}

@book{tao2012,
  author    = {Tao, Terence},
  title     = {Topics in Random Matrix Theory},
  series    = {Graduate Studies in Mathematics},
  volume    = {132},
  publisher = {American Mathematical Society},
  year      = {2012}
}

@article{tracywidom1994,
  author  = {Tracy, Craig A. and Widom, Harold},
  title   = {Level-spacing distributions and the Airy kernel},
  journal = {Communications in Mathematical Physics},
  volume  = {159},
  number  = {1},
  pages   = {151--174},
  year    = {1994}
}

@book{mehta2004,
  author    = {Mehta, Madan Lal},
  title     = {Random Matrices},
  edition   = {3rd},
  publisher = {Elsevier/Academic Press, Amsterdam},
  year      = {2004}
}

@book{erdosyau2017,
  author    = {Erd{\H o}s, L\'aszl\'o and Yau, Horng-Tzer},
  title     = {A Dynamical Approach to Random Matrix Theory},
  series    = {Courant Lecture Notes},
  volume    = {28},
  publisher = {American Mathematical Society},
  year      = {2017}
}