The Largest Eigenvalue and the Operator-Norm Bound
Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.1.6 (operator norm, high-trace method); Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) §2.3; Bai-Yin, Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue, Annals of Probability 16 (1988), 1729-1741; Tracy-Widom, Commun. Math. Phys. 159 (1994), 151-174
Intuition Beginner
The semicircle picture tells you where the bulk of a random matrix's eigenvalues live: spread smoothly across a fixed window from minus two to plus two once the entries are rescaled. But the histogram of the bulk leaves one question open. The single largest eigenvalue — the rightmost spike in the picture — could in principle poke out past the edge of the window, sitting alone far to the right of where the bulk ends. Does it? The answer is that it does not. As the matrix grows, the top eigenvalue presses right up against the edge of the semicircle and stops there, settling onto the value two.
This is a statement about the strongest stretching the matrix can do in any direction. The largest eigenvalue measures the most a symmetric matrix can amplify a vector. So saying it converges to two says the matrix never stretches anything by much more than two, no matter which direction you try, once you have rescaled it. There is no rogue direction of runaway amplification hiding outside the bulk.
Two different styles of argument pin this down. One counts very high powers of the matrix, because raising to a big power makes the largest eigenvalue dominate everything else, so its size leaks out of an averaged power. The other covers the sphere of all directions with a fine grid and checks them all at once. Both lead to the same wall at two.
Visual Beginner
Picture the familiar semicircle histogram of eigenvalues, a smooth half-dome sitting over the window from minus two to plus two. Now mark the single rightmost eigenvalue with a tall thin pointer. For a small matrix the pointer wobbles: sometimes it lands a little inside the right edge, sometimes it pokes slightly past two. As you redraw the picture for bigger and bigger matrices, the pointer's wobble shrinks and it locks onto the right edge at two, hugging the spot where the half-dome meets the floor.
The dashed vertical line at two is the wall. The point of the picture is that the rightmost eigenvalue does not escape past this wall as the matrix grows: its random wandering is confined to a shrinking neighbourhood of the right edge of the semicircle, and the amount it pokes past two shrinks to nothing.
Worked example Beginner
We watch how raising a matrix to a high power makes its largest eigenvalue stand out, using a tiny example with two known eigenvalues.
Step 1. Suppose a symmetric matrix has just two eigenvalues, and . The largest is . We want to recover the number from an averaged power without looking at the eigenvalues one at a time.
Step 2. The sum of the squares of the eigenvalues is . Take the square root: about . That overshoots the true largest value of , because the smaller eigenvalue still contributes.
Step 3. Now use the sixth powers instead of the squares. The sum is . Take the sixth root: about . Much closer to , because the bigger eigenvalue raised to the sixth power towers over the smaller one.
Step 4. Push to the twentieth powers. The sum is . The twentieth root of that is about . The smaller eigenvalue has been almost completely drowned out, and the averaged high power has handed us back the largest eigenvalue to high accuracy.
Step 5. What this tells us: a high enough power of a matrix is dominated by its largest eigenvalue, so an averaged high power gives a bound on that eigenvalue. For a random matrix you do not know the eigenvalues, but you can compute the average of a high power from the entries — and that is exactly the handle the moment method uses to fence the largest eigenvalue against the edge at two.
Check your understanding Beginner
Formal definition Intermediate+
Let be a normalised Wigner matrix as in 37.08.01: is real symmetric (or Hermitian) with independent centred entries on and above the diagonal, off-diagonal variance , and finite higher moments where stated. Its eigenvalues are real, and the largest eigenvalue is . Because is Hermitian, its operator norm (spectral norm) is the largest eigenvalue in absolute value,
$$
|M_n|{\mathrm{op}} = \max{|x|2 = 1}|M_n x|2 = \max_i |\lambda_i(M_n)| = \max\big(\lambda{\max}(M_n),, -\lambda{\min}(M_n)\big).
$$
The symmetry of the entry distribution makes the spectrum symmetric in law, so and have the same limit; the edge analysis below is stated for the right edge and applies verbatim to the left.
The right edge of the semicircle law is the right endpoint of its support . The semicircle is the limit of the empirical spectral distribution 37.08.01, a statement about the bulk histogram; it does not by itself control the single extreme eigenvalue, because moving one eigenvalue does not change the limiting empirical measure. The edge statement is therefore a genuinely sharper claim. Three forms appear, in increasing strength:
- Operator-norm upper bound (in probability): for every , . No eigenvalue strays a fixed distance past the edge.
- Bai-Yin convergence (almost sure): if the entries have a finite fourth moment, almost surely; the fourth-moment condition is necessary as well as sufficient.
- Tracy-Widom fluctuation: the centred and rescaled top eigenvalue converges in distribution to the Tracy-Widom law , where is the symmetry class.
The first is the operator-norm bound proper; the second is its almost-sure sharpening under the optimal moment hypothesis; the third describes the fluctuations once the location is pinned. The high-trace (moment) method proves the upper bound by comparing to the trace of the -th power, and the -net method proves a non-asymptotic version by discretising the unit sphere.
Counterexamples to common slips Intermediate+
- The semicircle law does not imply the edge. Convergence of the empirical measure allows a vanishing fraction of eigenvalues — even a single one — to sit far outside without affecting the limiting histogram. A separate argument is required to confine the extreme eigenvalue.
- A fixed power does not reach the edge. For each fixed , and only in the limit ; one must let the exponent grow with to extract the value rather than a strict underestimate.
- Finite fourth moment is exactly the threshold. With infinitely many moments the upper bound is easy, but the sharp Bai-Yin result is that almost surely holds if and only if . If the fourth moment is infinite, the largest entry alone produces eigenvalues escaping to infinity, and .
- Operator norm is not the bulk. The norm reads off the extreme eigenvalue, not a typical one. A bound on controls the sum of -th powers, hence the maximum, but the converse fails: knowing the bulk says nothing about the extreme.
Key theorem with proof Intermediate+
Theorem (operator-norm upper bound via the high-trace method). Let be normalised Wigner matrices with entries having finite moments of every order (the moment hypothesis of 37.08.01). Then converges to in probability from above: for every ,
$$
\lim_{n\to\infty}\mathbb{P}\big(|M_n|_{\mathrm{op}} > 2 + \varepsilon\big) = 0.
$$
Proof. The largest eigenvalue is bounded by any even trace power, because every eigenvalue contributes a non-negative term to an even trace:
$$
\lambda_{\max}(M_n)^{2k} \le \sum_{i=1}^n \lambda_i(M_n)^{2k} = \mathrm{tr},M_n^{2k},
$$
and the same bound holds for , hence . Taking expectations and applying Markov's inequality, for any threshold ,
$$
\mathbb{P}\big(|M_n|{\mathrm{op}} > t\big) = \mathbb{P}\big(|M_n|{\mathrm{op}}^{2k} > t^{2k}\big) \le t^{-2k},\mathbb{E},\mathrm{tr},M_n^{2k} = t^{-2k}, n,\mathbb{E}\Big[\tfrac1n\mathrm{tr},M_n^{2k}\Big].
$$
The moment method of 37.08.01 computed the leading order of for fixed as the Catalan number . To reach the edge the exponent must be allowed to grow with , and the combinatorics must be controlled uniformly in . A refinement of the walk-counting argument gives the non-asymptotic trace bound: there is an absolute constant such that for all and all ,
$$
\mathbb{E},\mathrm{tr},M_n^{2k} \le n, C_k, \big(1 + o(1)\big) \le n, \frac{4^k}{k^{3/2}},C'
$$
for a constant , using the closed form . The error terms collected from walks visiting fewer than vertices, or using an edge more than twice, are each suppressed by a factor relative to the leading , so as long as grows slower than a power of the leading Catalan term dominates.
Now choose the exponent to grow slowly: set , so while . Fix and take . Then $$ \mathbb{P}\big(|M_n|{\mathrm{op}} > 2 + \varepsilon\big) \le (2 + \varepsilon)^{-2k_n}\cdot n,\frac{4^{k_n}}{k_n^{3/2}},C' = \frac{n,C'}{k_n^{3/2}}\left(\frac{4}{(2+\varepsilon)^2}\right)^{k_n} = \frac{n,C'}{k_n^{3/2}},\rho^{k_n}, $$ where . With , the factor decays faster than any power of , so . Therefore . The matching lower bound $\liminf_n\lambda{\max}\ge 2 - \varepsilon\mu_n\Rightarrow\mu_{\mathrm{sc}}\mu_{\mathrm{sc}}(2 - \varepsilon, 2]2 - \varepsilon\lambda_{\max}\ge 2 - \varepsilon|M_n|_{\mathrm{op}}\to 2\square$
Bridge. This high-trace computation builds toward the Bai-Yin almost-sure theorem and the Tracy-Widom edge fluctuation, and it appears again in the -net route below, which reaches the same wall by discretising directions rather than by powering. The foundational reason the value is exactly is that the growth rate of the even traces is , whose exponential rate is the square of the edge: the moment method counts the same non-crossing tree walks as in the bulk, but letting the walk length converts the growth constant into the location of the edge. This is exactly the slowly-growing-exponent device that turns a bulk moment computation into an edge bound, and it is dual to the resolvent analysis of 37.08.02, where the square-root branch point of at locates the same edge analytically. Putting these together, the bridge is that a single growth constant — the in the Catalan asymptotics — generalises into the edge location once the exponent is allowed to scale with the matrix size, and the central insight is that the extreme eigenvalue is controlled by trading the precision of a fixed moment for the reach of a growing one.
Exercises Intermediate+
Advanced results Master
The almost-sure operator-norm convergence under the optimal moment hypothesis is the Bai-Yin theorem [Bai 1988]: for a Wigner matrix with i.i.d. centred off-diagonal entries of unit variance, almost surely if and only if (and the corresponding diagonal moment is finite). The proof truncates each entry at level with ; the finite fourth moment makes the truncation error negligible in operator norm and ensures almost surely, after which the high-trace bound with produces a tail summable in , so Borel-Cantelli gives the almost-sure upper bound; the lower bound is the bulk argument. The necessity is sharp: if the fourth moment diverges, does not vanish and the single largest entry forces , so no finite limit exists.
The non-asymptotic theory gives bounds valid at every finite rather than only in the limit. For an symmetric matrix with independent sub-gaussian entries of variance , the -net argument [Vershynin 2018] yields for absolute constants : cover by a net of cardinality , apply a sub-gaussian (Hanson-Wright) tail to each fixed quadratic form , and union-bound. The constant this delivers is not sharp — it exceeds the true edge — but the bound is uniform in and carries an exponentially small failure probability, which the moment method does not provide directly. The two methods are complementary: the high-trace method gives the sharp constant but only a polynomial tail at fixed , while the net gives the sharp tail but a loose constant. Sharpening the net constant to requires a chaining (generic-chaining / Dudley) refinement that replaces the single-scale net by a hierarchy of nets.
The edge fluctuations are governed by the Tracy-Widom law [Tracy 1994], a distribution with no elementary closed form, defined through the Fredholm determinant of the Airy kernel on , equivalently through Painlevé II. The left tail of decays like and the right tail like , the asymmetry reflecting that pushing the top eigenvalue inward against the bulk's repulsion is harder than pulling it outward into empty spectrum. The same distribution governs the largest eigenvalue of the Gaussian ensembles exactly, and edge universality, proved by the Erdős-Schlein-Yau program and by Tao-Vu, extends it to all Wigner matrices whose entries match the Gaussian moments through fourth order. The rescaling and the Airy kernel both descend from the softening of the semicircle at the edge: a soft edge of a density vanishing like a square root is exactly the universality class of the Airy point process.
The operator norm of non-Wigner ensembles follows the same architecture with a shifted edge. For sample-covariance (Wishart) matrices with aspect ratio , the largest eigenvalue converges to the right edge of the Marchenko-Pastur law and fluctuates, after rescaling, according to Tracy-Widom; the high-trace and net methods both transfer, with the trace growth constant set by the Marchenko-Pastur edge rather than by . The principle is that the operator norm is the exponential growth rate of the even traces, and that growth rate is the square of the right edge of whatever limiting spectral distribution the ensemble has.
Synthesis. The foundational reason the operator norm converges to exactly is that is the exponential growth rate of the -th moments of the semicircle — the in the Catalan asymptotics is — and the high-trace method extracts that rate by letting the exponent slowly, so the bulk moment computation of 37.08.01 becomes an edge statement with no new combinatorics. Putting these together, the moment route, the -net route, and the resolvent route of 37.08.02 are three roads to the same edge: the growth constant of the traces, the covering-number-versus-sub-gaussian-tail balance on the sphere, and the square-root branch point of at are the same edge seen three ways, and this is exactly the duality that the operator-norm bound, the net bound, and the analytic edge analysis share. The central insight is that the extreme eigenvalue is an exponential-growth-rate functional, so it is governed by the tail of the moment sequence and by the right endpoint of the support, not by the bulk; this generalises to every ensemble whose limiting law has a hard or soft edge, with the soft square-root edge carrying the universal Tracy-Widom fluctuation. The bridge to the frontier is that edge universality is the operator-norm bound made quantitative down to the fluctuation scale, the same selection principle that pinned the location now pinning the law, and this is dual to the bulk universality the local semicircle law makes quantitative.
Full proof set Master
The high-trace upper bound, the slowly-growing-exponent device, and the matching lower bound are proved in full above. The remaining Master claims are recorded here.
Proposition (operator norm equals the largest even-trace growth rate). For any Hermitian matrix , .
Proof. Let . Each term of is non-negative, so , the lower bound from the single largest term and the upper from bounding all terms by the largest. Taking -th roots, , and as , so the limit is .
Proposition (covering number of the sphere). For the unit sphere admits an -net with .
Proof. Take a maximal -separated subset (it exists by Zorn or by greedy selection since the sphere is compact). Maximality forces to be an -net: any sphere point within of no net point could be added, contradicting maximality. The open balls of radius about the points of are disjoint and contained in the shell , which sits inside the ball of radius . Comparing volumes, , so .
Proposition (net comparison for the operator norm). Let be Hermitian and an -net of with . Then .
Proof. The left inequality is immediate since each . For the right, let with attain (a maximiser exists by compactness and the variational characterisation of the spectral norm for Hermitian ). Pick with . By sesquilinearity, $$ \langle x^, Mx^\rangle - \langle y, My\rangle = \langle x^, Mx^\rangle - \langle x^* - h, M(x^* - h)\rangle = 2\operatorname{Re}\langle x^, Mh\rangle - \langle h, Mh\rangle, $$ so $|\langle x^, Mx^\rangle - \langle y, My\rangle|\le 2|M|,|x^|,|h| + |M|,|h|^2\le(2\varepsilon + \varepsilon^2)|M|\le 2\varepsilon|M|\varepsilon < 1\varepsilon^2|M| = |\langle x^, Mx^\rangle|\le|\langle y, My\rangle| + 2\varepsilon|M|\le\max_{x\in\mathcal N}|\langle x, Mx\rangle| + 2\varepsilon|M|\square$
Proposition (high-trace tail with growing exponent). Under the moment hypothesis, with the trace bound valid uniformly for , the choice gives for all large , which is summable in .
Proof. By the Markov step in the Key theorem, with . With , for large enough that . Setting gives the stated bound. Since (the terms decay faster than for large ), Borel-Cantelli yields eventually almost surely; intersecting over a sequence gives a.s. under the full moment hypothesis.
Proposition (edge count and the scale). The expected number of eigenvalues of in is asymptotic to , so the natural edge fluctuation scale solving is .
Proof. Using the semicircle density and the edge expansion as , the expected proportion of eigenvalues in is . Multiplying by gives the count . The top eigenvalue is resolved at the scale where this count is of order one, namely , which is the width of the Tracy-Widom window.
Connections Master
The Wigner semicircle law and the moment method 37.08.01 supplies both the trace-power machinery and the matching lower bound. The high-trace upper bound reuses the closed-walk expansion of developed there, now with the exponent instead of fixed, and the lower bound is read directly off the convergence proved in that unit; the operator-norm statement is the edge refinement of which the semicircle law is the bulk statement.
The Stieltjes transform and the semicircle law via the resolvent 37.08.02 is the analytic dual of the edge analysis here. The branch point of at is the same spectral edge the operator-norm bound locates combinatorially, and the resolvent method, refined to imaginary parts of order , is what makes the Tracy-Widom edge fluctuation quantitative where the moment method only pins the location.
The strong law of large numbers and Borel-Cantelli 37.02.02 is the engine of the almost-sure Bai-Yin upgrade. The summable tail produced by the growing-exponent high-trace bound is exactly the input Borel-Cantelli needs to convert convergence in probability into almost-sure convergence of the largest eigenvalue.
The spectral concentration via log-Sobolev and the Herbst argument 37.08.07 underlies the -net route and its sharpening. The Hanson-Wright quadratic-form tail bounding each and the union bound over an exponentially large net are concentration statements, and the functional-inequality machinery of that unit gives the dimension-free Gaussian concentration of the operator norm around its mean; the net method trades the sharp constant of the moment method for the exponentially small failure probability that concentration provides.
Historical & philosophical context Master
The high-trace method for the operator norm was introduced by Zoltán Füredi and János Komlós in 1981 [Füredi 1981], who refined Wigner's moment computation by letting the trace exponent grow with the matrix dimension and tracking the walk combinatorics uniformly, obtaining with quantitative error terms for matrices with bounded entries. The sharp almost-sure result under the minimal hypothesis is due to Zhidong Bai and Yong-Qua Yin in 1988 [Bai 1988], who proved that a finite fourth moment of the entries is both necessary and sufficient for almost surely, identifying the exact integrability threshold and closing the question of when the largest eigenvalue is asymptotically governed by the bulk edge rather than by a single anomalous entry.
The edge fluctuation law was found by Craig Tracy and Harold Widom in 1994 [Tracy 1994] for the Gaussian ensembles, who expressed the limiting distribution of through the Airy kernel and a Painlevé II transcendent; the same distribution was subsequently identified governing the longest increasing subsequence of a random permutation by Baik, Deift and Johansson, and a wide family of models in the Kardar-Parisi-Zhang universality class. The -net approach to non-asymptotic matrix norms, by contrast, grew out of geometric functional analysis and the local theory of Banach spaces, and was systematised for high-dimensional probability and data science by Roman Vershynin [Vershynin 2018]; edge universality, extending the Tracy-Widom law from the Gaussian ensembles to general Wigner matrices, was established by the Erdős-Schlein-Yau local-law program and by Tao and Vu in the late 2000s.
Bibliography Master
@article{furedikomlos1981,
author = {F\"uredi, Zolt\'an and Koml\'os, J\'anos},
title = {The eigenvalues of random symmetric matrices},
journal = {Combinatorica},
volume = {1},
number = {3},
pages = {233--241},
year = {1981}
}
@article{baiyin1988,
author = {Bai, Zhidong and Yin, Yong-Qua},
title = {Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a {Wigner} matrix},
journal = {Annals of Probability},
volume = {16},
number = {4},
pages = {1729--1741},
year = {1988}
}
@book{agz2010,
author = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
title = {An Introduction to Random Matrices},
series = {Cambridge Studies in Advanced Mathematics},
volume = {118},
publisher = {Cambridge University Press},
year = {2010}
}
@book{vershynin2018,
author = {Vershynin, Roman},
title = {High-Dimensional Probability: An Introduction with Applications in Data Science},
series = {Cambridge Series in Statistical and Probabilistic Mathematics},
publisher = {Cambridge University Press},
year = {2018}
}
@article{tracywidom1994,
author = {Tracy, Craig A. and Widom, Harold},
title = {Level-spacing distributions and the Airy kernel},
journal = {Communications in Mathematical Physics},
volume = {159},
number = {1},
pages = {151--174},
year = {1994}
}
@book{tao2012,
author = {Tao, Terence},
title = {Topics in Random Matrix Theory},
series = {Graduate Studies in Mathematics},
volume = {132},
publisher = {American Mathematical Society},
year = {2012}
}