37.08.06 · probability / 08-random-matrices

The Largest Eigenvalue and the Operator-Norm Bound


Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.1.6 (operator norm, high-trace method); Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) §2.3; Bai-Yin, Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue, Annals of Probability 16 (1988), 1729-1741; Tracy-Widom, Commun. Math. Phys. 159 (1994), 151-174

Intuition Beginner

The semicircle picture tells you where the bulk of a random matrix's eigenvalues live: spread smoothly across a fixed window from minus two to plus two once the entries are rescaled. But the histogram of the bulk leaves one question open. The single largest eigenvalue — the rightmost spike in the picture — could in principle poke out past the edge of the window, sitting alone far to the right of where the bulk ends. Does it? The answer is that it does not. As the matrix grows, the top eigenvalue presses right up against the edge of the semicircle and stops there, settling onto the value two.

This is a statement about the strongest stretching the matrix can do in any direction. The largest eigenvalue measures the most a symmetric matrix can amplify a vector. So saying it converges to two says the matrix never stretches anything by much more than two, no matter which direction you try, once you have rescaled it. There is no rogue direction of runaway amplification hiding outside the bulk.

Two different styles of argument pin this down. One counts very high powers of the matrix, because raising to a big power makes the largest eigenvalue dominate everything else, so its size leaks out of an averaged power. The other covers the sphere of all directions with a fine grid and checks them all at once. Both lead to the same wall at two.

Visual Beginner

Picture the familiar semicircle histogram of eigenvalues, a smooth half-dome sitting over the window from minus two to plus two. Now mark the single rightmost eigenvalue with a tall thin pointer. For a small matrix the pointer wobbles: sometimes it lands a little inside the right edge, sometimes it pokes slightly past two. As you redraw the picture for bigger and bigger matrices, the pointer's wobble shrinks and it locks onto the right edge at two, hugging the spot where the half-dome meets the floor.

The dashed vertical line at two is the wall. The point of the picture is that the rightmost eigenvalue does not escape past this wall as the matrix grows: its random wandering is confined to a shrinking neighbourhood of the right edge of the semicircle, and the amount it pokes past two shrinks to nothing.

Worked example Beginner

We watch how raising a matrix to a high power makes its largest eigenvalue stand out, using a tiny example with two known eigenvalues.

Step 1. Suppose a symmetric matrix has just two eigenvalues, $2$ and $1$ . The largest is $2$ . We want to recover the number $2$ from an averaged power without looking at the eigenvalues one at a time.

Step 2. The sum of the squares of the eigenvalues is $2^{2} + 1^{2} = 4 + 1 = 5$ . Take the square root: about $2.24$ . That overshoots the true largest value of $2$ , because the smaller eigenvalue still contributes.

Step 3. Now use the sixth powers instead of the squares. The sum is $2^{6} + 1^{6} = 64 + 1 = 65$ . Take the sixth root: about $2.01$ . Much closer to $2$ , because the bigger eigenvalue raised to the sixth power towers over the smaller one.

Step 4. Push to the twentieth powers. The sum is $2^{20} + 1^{20} = 1048576 + 1 = 1048577$ . The twentieth root of that is about $2.0000001$ . The smaller eigenvalue has been almost completely drowned out, and the averaged high power has handed us back the largest eigenvalue to high accuracy.

Step 5. What this tells us: a high enough power of a matrix is dominated by its largest eigenvalue, so an averaged high power gives a bound on that eigenvalue. For a random matrix you do not know the eigenvalues, but you can compute the average of a high power from the entries — and that is exactly the handle the moment method uses to fence the largest eigenvalue against the edge at two.
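The arithmetic in Steps 2 to 4 can be replayed in a few lines. This is a minimal sketch; the helper name `power_sum_root` is hypothetical and chosen only for this illustration.

```python
def power_sum_root(eigenvalues, p):
    """Return (sum_i |lambda_i|**p) ** (1/p), which is at least max_i |lambda_i|."""
    return sum(abs(lam) ** p for lam in eigenvalues) ** (1.0 / p)


eigenvalues = [2.0, 1.0]  # the two eigenvalues from Step 1
for p in (2, 6, 20):
    # As p grows, the printed value drops towards the largest eigenvalue, 2.
    print(f"p = {p:2d}: {power_sum_root(eigenvalues, p):.7f}")
```

The value always sits at or above the largest eigenvalue, which is why a high power gives an upper bound rather than an estimate from below.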


Formal definition Intermediate+

Let $M_{n} = n^{- 1/2} A_{n}$ be a normalised Wigner matrix as in 37.08.01: $A_{n}$ is $n \times n$ real symmetric (or Hermitian) with independent centred entries on and above the diagonal, off-diagonal variance $E |a_{ij}|^{2} = 1$ , and finite higher moments where stated. Its eigenvalues $λ_{1} (M_{n}) \leq \dots \leq λ_{n} (M_{n})$ are real, and the largest eigenvalue is $λ_{\max} (M_{n}) = λ_{n} (M_{n})$ . Because $M_{n}$ is Hermitian, its operator norm (spectral norm) is the largest eigenvalue in absolute value, $$ \|M_n\|_{\mathrm{op}} = \max_{\|x\|_2 = 1}\|M_n x\|_2 = \max_i |\lambda_i(M_n)| = \max\big(\lambda_{\max}(M_n),\, -\lambda_{\min}(M_n)\big). $$ Since $- M_{n}$ is again a normalised Wigner matrix satisfying the same hypotheses, every limit statement for $λ_{\max}$ applies equally to $- λ_{\min}$ , so the two have the same limit; the edge analysis below is stated for the right edge and applies verbatim to the left.

The right edge of the semicircle law $μ_{sc}$ is the right endpoint of its support $[- 2, 2]$ . The semicircle is the limit of the empirical spectral distribution 37.08.01, a statement about the bulk histogram; it does not by itself control the single extreme eigenvalue, because moving one eigenvalue does not change the limiting empirical measure. The edge statement is therefore a genuinely sharper claim. Three forms appear, in increasing strength:

Operator-norm upper bound (in probability): for every $ε > 0$ , $P (∥ M_{n} ∥_{op} > 2 + ε) \to 0$ . No eigenvalue strays a fixed distance past the edge.
Bai-Yin convergence (almost sure): if the entries have a finite fourth moment, $λ_{m a x} (M_{n}) \to 2$ almost surely; the fourth-moment condition is necessary as well as sufficient.
Tracy-Widom fluctuation: the centred and rescaled top eigenvalue $n^{2/3} (λ_{\max} (M_{n}) - 2)$ converges in distribution to the Tracy-Widom law $TW_{β}$ , where $β \in \{1, 2\}$ is the symmetry class.

The first is the operator-norm bound proper; the second is its almost-sure sharpening under the optimal moment hypothesis; the third describes the fluctuations once the location is pinned. The high-trace (moment) method proves the upper bound by comparing $λ_{m a x}^{2 k}$ to the trace of the $2 k$ -th power, and the $ε$ -net method proves a non-asymptotic version by discretising the unit sphere.
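The first statement can be watched numerically. The sketch below is an illustration under stated assumptions, not part of the theory: it takes Gaussian entries (one admissible choice; the definition does not require it), and it estimates the operator norm by power iteration, which returns a lower estimate of $∥ M_{n} ∥_{op}$ . The names `wigner_matrix` and `operator_norm_estimate` are hypothetical helpers written for this example.

```python
import math
import random


def wigner_matrix(n, rng):
    """Symmetric n x n matrix with independent N(0, 1) entries on and above
    the diagonal, scaled by n ** -0.5 (Gaussian entries are an assumption)."""
    m = [[0.0] * n for _ in range(n)]
    scale = 1.0 / math.sqrt(n)
    for i in range(n):
        for j in range(i, n):
            m[i][j] = m[j][i] = rng.gauss(0.0, 1.0) * scale
    return m


def operator_norm_estimate(m, rng, iterations=200):
    """Power iteration: returns ||M v|| for a unit vector v, which never
    exceeds ||M||_op and approaches it as the iteration count grows."""
    n = len(m)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    length = math.sqrt(sum(x * x for x in v))
    v = [x / length for x in v]
    estimate = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in m]
        estimate = math.sqrt(sum(x * x for x in w))
        v = [x / estimate for x in w]
    return estimate


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (20, 80):
    print(n, operator_norm_estimate(wigner_matrix(n, rng), rng))
```

Pure Python keeps the example dependency-free at the cost of speed; for larger $n$ a numerical linear-algebra library would be the natural replacement.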

Counterexamples to common slips Intermediate+

The semicircle law does not imply the edge. Convergence of the empirical measure $μ_{n} \Rightarrow μ_{sc}$ allows a vanishing fraction of eigenvalues — even a single one — to sit far outside $[- 2, 2]$ without affecting the limiting histogram. A separate argument is required to confine the extreme eigenvalue.
A fixed power does not reach the edge. For each fixed $k$ , $E [\frac{1}{n} tr M_{n}^{2 k}] \to C_{k}$ and $(C_{k})^{1/2 k} \to 2$ only in the limit $k \to \infty$ ; one must let the exponent $k = k_{n}$ grow with $n$ to extract the value $2$ rather than a strict underestimate.
Finite fourth moment is exactly the threshold. When all moments are finite the upper bound is easy, but the sharp Bai-Yin result is that $λ_{\max} \to 2$ almost surely holds if and only if $E [a_{ij}^{4}] < \infty$ . If the fourth moment is infinite, the largest entry alone produces eigenvalues escaping to infinity, and $\limsup_{n} λ_{\max} (M_{n}) = \infty$ almost surely.
Operator norm is not the bulk. The norm reads off the extreme eigenvalue, not a typical one. A bound on $\frac{1}{n} tr M_{n}^{2 k}$ controls the sum of $2 k$ -th powers, hence the maximum, but the converse fails: knowing the bulk says nothing about the extreme.
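The second slip can be checked directly: for every fixed $k$ the root $(C_{k})^{1/2 k}$ stays strictly below $2$ and only creeps up to it as $k$ grows. A minimal sketch, with hypothetical helper names `catalan` and `catalan_root`:

```python
from math import comb


def catalan(k):
    """Catalan number C_k = binom(2k, k) / (k + 1), computed exactly."""
    return comb(2 * k, k) // (k + 1)


def catalan_root(k):
    """(C_k) ** (1 / (2k)), the edge estimate read off from a fixed moment."""
    return catalan(k) ** (1.0 / (2 * k))


for k in (1, 5, 25, 125):
    # Each value is below 2; the sequence increases towards 2.
    print(k, catalan_root(k))
```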

Key theorem with proof Intermediate+

Theorem (operator-norm upper bound via the high-trace method). Let $(M_{n})$ be normalised Wigner matrices with entries having finite moments of every order (the moment hypothesis of 37.08.01). Then $∥ M_{n} ∥_{op}$ is asymptotically at most $2$ in probability: for every $ε > 0$ , $$ \lim_{n\to\infty}\mathbb{P}\big(\|M_n\|_{\mathrm{op}} > 2 + \varepsilon\big) = 0. $$ Together with the bulk lower bound, this gives $∥ M_{n} ∥_{op} \to 2$ in probability.

Proof. The largest eigenvalue is bounded by any even trace power, because every eigenvalue contributes a non-negative term to an even trace: $$ \lambda_{\max}(M_n)^{2k} \le \sum_{i=1}^n \lambda_i(M_n)^{2k} = \mathrm{tr}\,M_n^{2k}, $$ and the same bound holds for $|λ_{\min}|^{2 k}$ , hence $∥ M_{n} ∥_{op}^{2 k} \leq tr M_{n}^{2 k}$ . Applying Markov's inequality to this non-negative quantity, for any threshold $t > 0$ , $$ \mathbb{P}\big(\|M_n\|_{\mathrm{op}} > t\big) = \mathbb{P}\big(\|M_n\|_{\mathrm{op}}^{2k} > t^{2k}\big) \le t^{-2k}\,\mathbb{E}\,\mathrm{tr}\,M_n^{2k} = t^{-2k}\, n\,\mathbb{E}\Big[\tfrac1n\mathrm{tr}\,M_n^{2k}\Big]. $$ The moment method of 37.08.01 computed the leading order of $E [\frac{1}{n} tr M_{n}^{2 k}]$ for fixed $k$ as the Catalan number $C_{k}$ . To reach the edge the exponent must be allowed to grow with $n$ , and the combinatorics must be controlled uniformly in $k$ . A refinement of the walk-counting argument gives the non-asymptotic trace bound: there is a constant $C'$ , independent of $n$ and $k$ , such that for all $n$ and all $k \leq n^{1/2}$ , $$ \mathbb{E}\,\mathrm{tr}\,M_n^{2k} \le n\, C_k\, \big(1 + o(1)\big) \le C'\, n\, \frac{4^k}{k^{3/2}}, $$ using the closed form $C_{k} = \frac{1}{k + 1} \binom{2k}{k} \sim \frac{4^{k}}{\sqrt{π}\, k^{3/2}}$ . The error terms collected from walks visiting fewer than $k + 1$ vertices, or using an edge more than twice, are each suppressed by a factor $k^{O (1)} / n$ relative to the leading $C_{k}$ , so as long as $k$ grows slower than a suitable power of $n$ the leading Catalan term dominates.

Now choose the exponent to grow slowly: set $k = k_{n} = \lfloor (\log n)^{2} \rfloor$ , so $k_{n} \to \infty$ while $k_{n} = o (n^{1/2})$ . Fix $ε > 0$ and take $t = 2 + ε$ . Then $$ \mathbb{P}\big(\|M_n\|_{\mathrm{op}} > 2 + \varepsilon\big) \le (2 + \varepsilon)^{-2k_n}\cdot C'\,n\,\frac{4^{k_n}}{k_n^{3/2}} = \frac{C'\,n}{k_n^{3/2}}\left(\frac{4}{(2+\varepsilon)^2}\right)^{k_n} = \frac{C'\,n}{k_n^{3/2}}\,\rho^{k_n}, $$ where $ρ = 4/ (2 + ε)^{2} < 1$ . With $k_{n} = \lfloor (\log n)^{2} \rfloor$ , the factor $ρ^{k_{n}} = \exp (k_{n} \log ρ) \leq \exp (- c (\log n)^{2})$ for a constant $c = c (ε) > 0$ and all large $n$ ; this decays faster than any power of $n$ , so $n ρ^{k_{n}} \to 0$ . Therefore $P (∥ M_{n} ∥_{op} > 2 + ε) \to 0$ . The matching lower bound $\liminf_{n} λ_{\max} \geq 2 - ε$ follows from the bulk: since $μ_{n} \Rightarrow μ_{sc}$ and $μ_{sc}$ puts positive mass on $(2 - ε, 2]$ , with high probability at least one eigenvalue exceeds $2 - ε$ , so $λ_{\max} \geq 2 - ε$ . Combining, $∥ M_{n} ∥_{op} \to 2$ in probability. $\square$
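To see how quickly the final bound collapses, the sketch below evaluates the logarithm of $n ρ^{k_{n}}$ with $k_{n} = \lfloor (\log n)^{2} \rfloor$ . It deliberately drops the prefactor $C' / k_{n}^{3/2}$ , whose value the proof does not specify, and the function name `log_high_trace_bound` is hypothetical.

```python
import math


def log_high_trace_bound(n, eps):
    """Natural log of n * rho**k_n, with k_n = floor((log n)^2) and
    rho = 4 / (2 + eps)^2. The constant prefactor C' / k_n^{3/2} is omitted."""
    k_n = math.floor(math.log(n) ** 2)
    rho = 4.0 / (2.0 + eps) ** 2
    return math.log(n) + k_n * math.log(rho)


for n in (10, 10**3, 10**6, 10**9):
    # A positive value means the bound exceeds 1 and says nothing yet;
    # a large negative value means a tiny failure probability.
    print(n, log_high_trace_bound(n, eps=0.5))
```

For small $n$ or small $ε$ the bound is vacuous (its logarithm is positive); the statement is asymptotic, and the sketch only illustrates the eventual super-polynomial decay.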

Bridge. This high-trace computation builds toward the Bai-Yin almost-sure theorem and the Tracy-Widom edge fluctuation, and it appears again in the $ε$ -net route below, which reaches the same wall by discretising directions rather than by powering. The foundational reason the value is exactly $2$ is that the growth rate of the even traces is $C_{k} \sim 4^{k} k^{- 3/2}$ , whose exponential rate $4 = 2^{2}$ is the square of the edge: the moment method counts the same non-crossing tree walks as in the bulk, but letting the walk length $k_{n} \to \infty$ converts the $4^{k}$ growth constant into the location $2$ of the edge. This is exactly the slowly-growing-exponent device that turns a bulk moment computation into an edge bound, and it is dual to the resolvent analysis of 37.08.02, where the square-root branch point of $z^{2} - 4$ at $z = 2$ locates the same edge analytically. Putting these together, the bridge is that a single growth constant — the $4^{k}$ in the Catalan asymptotics — generalises into the edge location once the exponent is allowed to scale with the matrix size, and the central insight is that the extreme eigenvalue is controlled by trading the precision of a fixed moment for the reach of a growing one.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Carry out the Markov-inequality step of the proof explicitly: starting from $∥ M_{n} ∥_{op}^{2 k} \leq tr M_{n}^{2 k}$ and the trace bound $E tr M_{n}^{2 k} \leq n 4^{k}$ , find the fastest decay rate of the tail bound on $P (∥ M_{n} ∥_{op} > 2 + ε)$ obtainable by optimising over $k$ for fixed $n$ .

Hint

Minimise $n (4/ (2 + ε)^{2})^{k}$ over $k$ ; with $ρ < 1$ fixed it decreases in $k$ , so the binding constraint is the largest admissible $k \approx n^{1/2}$ .

Answer

From $P (∥ M_{n} ∥_{op} > 2 + ε) \leq n ρ^{k}$ with $ρ = 4/ (2 + ε)^{2} < 1$ , the bound decreases monotonically in $k$ , so one pushes $k$ as large as the trace estimate permits, $k \sim c n^{1/2}$ . Then $n ρ^{k} = \exp (\log n - c n^{1/2} |\log ρ|) \to 0$ at a stretched-exponential rate $\exp (- c' n^{1/2})$ . Choosing instead $k_{n} = (\log n)^{2}$ as in the proof gives the weaker but sufficient rate $\exp (- c (\log n)^{2})$ , which already beats every power of $n$ and is itself summable; the larger $k$ only strengthens the Borel-Cantelli step in the almost-sure version, since $\exp (- c' n^{1/2})$ is summable with room to spare. The optimisation shows the moment method delivers not just convergence in probability but a tail bound strong enough to upgrade to almost-sure convergence.

Exercise 4 (medium, short-answer).

State the $ε$ -net bound for the operator norm: if $N$ is an $ε$ -net of the unit sphere $S^{n - 1}$ with $ε < 1/2$ , then $∥ M ∥_{op} \leq (1 - 2 ε)^{- 1} \max_{x \in N} |⟨ x, M x ⟩|$ for Hermitian $M$ . Prove it.

Hint

For the maximiser $x^{*}$ of the quadratic form, pick a net point $y$ within $ε$ and expand $⟨ x^{*}, M x^{*} ⟩ - ⟨ y, M y ⟩$ using bilinearity and $∥ M ∥ = ∣ ⟨ x^{*}, M x^{*} ⟩ ∣$ .

Answer

For Hermitian $M$ , $∥ M ∥_{op} = \max_{∥ x ∥ = 1} |⟨ x, M x ⟩|$ ; let $x^{*}$ attain it. Choose $y \in N$ with $∥ x^{*} - y ∥ \leq ε$ and write $h = x^{*} - y$ . Adding and subtracting $⟨ x^{*}, M y ⟩$ , $$ \langle x^*, Mx^*\rangle - \langle y, My\rangle = \langle x^*, Mh\rangle + \langle h, My\rangle, $$ and since $∥ x^{*} ∥ = ∥ y ∥ = 1$ , $|⟨ x^{*}, M h ⟩ + ⟨ h, M y ⟩| \leq ∥ M ∥ ∥ x^{*} ∥ ∥ h ∥ + ∥ M ∥ ∥ h ∥ ∥ y ∥ \leq 2 ε ∥ M ∥$ . Hence $∥ M ∥ = |⟨ x^{*}, M x^{*} ⟩| \leq |⟨ y, M y ⟩| + 2 ε ∥ M ∥$ , and because $1 - 2 ε > 0$ this rearranges to $∥ M ∥ \leq (1 - 2 ε)^{- 1} \max_{x \in N} |⟨ x, M x ⟩|$ . The use is that $S^{n - 1}$ has an $ε$ -net of size at most $(1 + 2/ ε)^{n}$ , so a union bound over $N$ of a sub-gaussian tail for each fixed quadratic form $⟨ x, M x ⟩$ controls the norm at the cost of an $e^{C n}$ factor that the tail must beat.
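The inequality can be sanity-checked in the plane, where an explicit net is available: $m$ equally spaced points on the unit circle leave every point within chord distance $2 \sin (π / (2 m))$ of the net. The sketch below assumes real symmetric $2 \times 2$ matrices and $ε = 1/4$ ; all function names are hypothetical.

```python
import math


def op_norm_2x2(a, b, d):
    """Operator norm of the real symmetric matrix [[a, b], [b, d]]."""
    centre = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return max(abs(centre + radius), abs(centre - radius))


def circle_net(m):
    """m equally spaced points on the unit circle."""
    return [(math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
            for j in range(m)]


def max_quadratic_form(a, b, d, net):
    """max over net points (x, y) of |<v, M v>| = |a x^2 + 2 b x y + d y^2|."""
    return max(abs(a * x * x + 2 * b * x * y + d * y * y) for x, y in net)


eps = 0.25
m = 16  # 2 * sin(pi / (2 * 16)) is about 0.196 <= eps, so this is an eps-net
net = circle_net(m)

a, b, d = 1.0, 2.0, -0.5  # an arbitrary test matrix
lower = max_quadratic_form(a, b, d, net)
upper = lower / (1.0 - 2.0 * eps)
print(lower, op_norm_2x2(a, b, d), upper)  # the norm sits between the two
```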

Exercise 5 (medium, short-answer).

Use the $ε$ -net bound to prove a non-asymptotic operator-norm estimate for an $n \times n$ symmetric matrix with independent sub-gaussian entries of variance $1/ n$ : with high probability $∥ M_{n} ∥_{op} \leq C$ for an absolute constant $C$ .

Hint

For fixed unit $x$ , $⟨ x, M_{n} x ⟩ = \sum_{ij} (M_{n})_{ij} x_{i} x_{j}$ is a sub-gaussian quadratic form with variance $O (1/ n) \cdot ∥ x ∥^{4} = O (1/ n)$ in the off-diagonal sum; use the Hanson-Wright tail and a union bound over a net of size $e^{C n}$ .

Answer

Fix a unit vector $x$ . The quadratic form $⟨ x, M_{n} x ⟩ = \sum_{i \leq j} c_{ij} (M_{n})_{ij}$ (with $c_{ij}$ collecting the $x_{i} x_{j}$ weights) is a linear combination of independent sub-gaussian entries, hence sub-gaussian with parameter of order $(\sum_{ij} (x_{i} x_{j})^{2} / n)^{1/2} = ∥ x ∥^{2} / \sqrt{n} = n^{- 1/2}$ . Because the form is linear in the independent entries, the general Hoeffding inequality for sub-gaussian sums gives $P (|⟨ x, M_{n} x ⟩| > t) \leq 2 \exp (- c n t^{2})$ for every $t > 0$ . Take a $\frac{1}{4}$ -net $N$ with $|N| \leq 9^{n}$ . Union bound: $P (\max_{x \in N} |⟨ x, M_{n} x ⟩| > t) \leq 9^{n} \cdot 2 e^{- c n t^{2}}$ , which is $\leq 2 e^{- c n}$ once $t = C$ is large enough that $c C^{2} \geq \log 9 + c$ . On the complementary event the net bound gives $∥ M_{n} ∥_{op} \leq 2 \max_{x \in N} |⟨ x, M_{n} x ⟩| \leq 2 C$ . This proves $∥ M_{n} ∥_{op} = O (1)$ with probability at least $1 - 2 e^{- c n}$ : a non-asymptotic bound, weaker in constant than the sharp edge value $2$ but valid for every $n$ and with an exponentially small failure probability the moment method does not directly provide.

Exercise 6 (hard, short-answer).

Using the bulk semicircle law, show that $λ_{\max} (M_{n}) \geq 2 - ε$ eventually almost surely for every $ε > 0$ , so that $\liminf_{n} λ_{\max} (M_{n}) \geq 2$ , and explain why this lower bound cannot be improved. Then explain where the finite fourth moment enters the Bai-Yin upper bound.

Hint

For the lower bound, count eigenvalues in $(2 - ε, 2]$ via $μ_{n} \Rightarrow μ_{sc}$ . For the fourth moment, consider the largest single entry $\max_{ij} |a_{ij}| / \sqrt{n}$ of $M_{n}$ as a crude lower bound on the norm.

Answer

Lower bound. Since $μ_{n} \Rightarrow μ_{sc}$ almost surely 37.08.01 and $μ_{sc} ((2 - ε, 2)) = δ_{ε} > 0$ , the portmanteau theorem applied to the open set $(2 - ε, 2)$ gives $\liminf_{n} μ_{n} ((2 - ε, 2)) \geq δ_{ε} > 0$ , so for large $n$ at least $n μ_{n} ((2 - ε, 2)) \geq \frac{1}{2} n δ_{ε} \geq 1$ eigenvalues lie above $2 - ε$ ; hence $λ_{\max} (M_{n}) \geq 2 - ε$ eventually, for every $ε$ , giving $\liminf λ_{\max} \geq 2$ a.s. Combined with the high-trace upper bound $\limsup λ_{\max} \leq 2$ , the limit is exactly $2$ . The value cannot be improved because the bulk genuinely reaches the edge: the semicircle has support up to $2$ , not less. Fourth moment. The operator norm dominates every entry of $M_{n}$ in absolute value: testing against standard basis vectors, $∥ M_{n} ∥_{op} \geq |⟨ e_{i}, M_{n} e_{j} ⟩| = |a_{ij}| / \sqrt{n}$ , so $∥ M_{n} ∥_{op} \geq \max_{i, j} |a_{ij}| / \sqrt{n}$ . If $E [a_{ij}^{4}] = \infty$ , then for every $δ > 0$ the series $\sum_{n} n P (|a| > δ \sqrt{n})$ diverges, by comparison with the integral $\int_{0}^{\infty} x P (|a| > δ \sqrt{x}) \, d x = E [a^{4}] / (2 δ^{4})$ . About $n$ new independent entries appear when the matrix grows from size $n - 1$ to $n$ , so the second Borel-Cantelli lemma shows that some new entry exceeds $δ \sqrt{n}$ for infinitely many $n$ . Since $δ$ is arbitrary, $\limsup_{n} \max_{ij} |a_{ij}| / \sqrt{n} = \infty$ , hence $\limsup_{n} ∥ M_{n} ∥_{op} = \infty$ and convergence to $2$ breaks. Finiteness of the fourth moment is precisely the borderline that keeps $\max_{ij} |a_{ij}| = o (\sqrt{n})$ a.s., which is what the Bai-Yin truncation argument needs to push the high-trace bound to almost-sure convergence.

Exercise 7 (hard, short-answer).

State the Tracy-Widom edge theorem precisely and explain, via the square-root vanishing of the semicircle density at the edge, why the fluctuation scale is $n^{- 2/3}$ rather than the bulk scale $n^{- 1}$ .

Hint

The number of eigenvalues within distance $δ$ of the edge is $n \int_{2 - δ}^{2} ρ_{sc} \sim n δ^{3/2}$ ; set this to order one.

Answer

Statement. For a Wigner matrix in symmetry class $β \in \{1, 2\}$ (real symmetric or complex Hermitian) with entries matching the Gaussian moments to fourth order, $n^{2/3} (λ_{\max} (M_{n}) - 2) \Rightarrow TW_{β}$ , where $TW_{β}$ is the Tracy-Widom distribution, expressible for $β = 2$ through the Hastings-McLeod solution $q$ of Painlevé II ( $q'' = s q + 2 q^{3}$ ) via $F_{2} (s) = \exp (- \int_{s}^{\infty} (x - s) q (x)^{2} \, d x)$ . Scaling heuristic. Near the edge $ρ_{sc} (x) \sim \frac{1}{π} \sqrt{2 - x}$ , so the expected number of eigenvalues in $[2 - δ, 2]$ is $n \int_{2 - δ}^{2} \frac{1}{π} \sqrt{2 - x} \, d x = n \cdot \frac{2}{3 π} δ^{3/2}$ . The top eigenvalue lives at the scale $δ$ where this count is of order one, i.e. $n δ^{3/2} \sim 1$ , giving $δ \sim n^{- 2/3}$ . The mean eigenvalue spacing at the edge is therefore $n^{- 2/3}$ , far coarser than the bulk spacing $n^{- 1}$ where the density is order one; the square-root softening of the density spreads the top eigenvalues out, producing the $n^{- 2/3}$ window and the non-Gaussian Tracy-Widom law in place of the Gaussian fluctuations seen for sums of independent scalars. This is the same edge that the operator-norm bound locates at $2$ and the resolvent 37.08.02 locates through the branch point of $\sqrt{z^{2} - 4}$ .

Advanced results Master

The almost-sure operator-norm convergence under the optimal moment hypothesis is the Bai-Yin theorem ^{[Bai 1988]}: for a Wigner matrix with i.i.d. centred off-diagonal entries of unit variance, $λ_{\max} (M_{n}) \to 2$ almost surely if and only if $E [a_{ij}^{4}] < \infty$ (and the corresponding diagonal moment is finite). The proof truncates each entry at level $δ_{n} \sqrt{n}$ with $δ_{n} ↓ 0$ ; the finite fourth moment makes the truncation error negligible in operator norm and ensures $\max_{ij} |a_{ij}| = o (\sqrt{n})$ almost surely, after which the high-trace bound with $k = k_{n} \sim \log n$ produces a tail $P (λ_{\max} > 2 + ε) \leq n^{- A}$ summable in $n$ , so Borel-Cantelli gives the almost-sure upper bound; the lower bound is the bulk argument. The necessity is sharp: if the fourth moment diverges, $\max_{ij} |a_{ij}| / \sqrt{n}$ does not vanish and the single largest entry forces $\limsup λ_{\max} = \infty$ , so no finite limit exists.

The non-asymptotic theory gives bounds valid at every finite $n$ rather than only in the limit. For an $n \times n$ symmetric matrix with independent sub-gaussian entries of variance $1/ n$ , the $ε$ -net argument ^{[Vershynin 2018]} yields $P (∥ M_{n} ∥_{op} > C) \leq 2 e^{- c n}$ for absolute constants $C, c$ : cover $S^{n - 1}$ by a net of cardinality $e^{C n}$ , apply a sub-gaussian (Hanson-Wright) tail to each fixed quadratic form $⟨ x, M_{n} x ⟩$ , and union-bound. The constant $C$ this delivers is not sharp — it exceeds the true edge $2$ — but the bound is uniform in $n$ and carries an exponentially small failure probability, which the moment method does not provide directly. The two methods are complementary: the high-trace method gives the sharp constant $2$ but only a polynomial tail at fixed $k$ , while the net gives the sharp $e^{- c n}$ tail but a loose constant. Sharpening the net constant to $2$ requires a chaining (generic-chaining / Dudley) refinement that replaces the single-scale net by a hierarchy of nets.

The edge fluctuations are governed by the Tracy-Widom law $TW_{β}$ ^{[Tracy 1994]}, a distribution with no elementary closed form, defined through the Fredholm determinant of the Airy kernel $K_{Ai} (x, y) = \frac{Ai ( x ) Ai ^{'} ( y ) - Ai ^{'} ( x ) Ai ( y )}{x - y}$ on $[s, \infty)$ , equivalently through Painlevé II. The left tail of $TW_{β}$ decays like $exp (- c ∣ s ∣^{3})$ and the right tail like $exp (- c s^{3/2})$ , the asymmetry reflecting that pushing the top eigenvalue inward against the bulk's repulsion is harder than pulling it outward into empty spectrum. The same distribution governs the largest eigenvalue of the Gaussian ensembles exactly, and edge universality, proved by the Erdős-Schlein-Yau program and by Tao-Vu, extends it to all Wigner matrices whose entries match the Gaussian moments through fourth order. The $n^{2/3}$ rescaling and the Airy kernel both descend from the $(2 - x)^{1/2}$ softening of the semicircle at the edge: a soft edge of a density vanishing like a square root is exactly the universality class of the Airy point process.

The operator norm of non-Wigner ensembles follows the same architecture with a shifted edge. For sample-covariance (Wishart) matrices $S_{n} = \frac{1}{n} X X^{*}$ with aspect ratio $γ$ , the largest eigenvalue converges to the right edge $(1 + \sqrt{γ})^{2}$ of the Marchenko-Pastur law and fluctuates, after $n^{2/3}$ rescaling, according to Tracy-Widom; the high-trace and net methods both transfer, with the trace growth constant set by the Marchenko-Pastur edge rather than by $4$ . The principle is that the operator norm is the exponential growth rate of the even traces, and that growth rate is the square of the right edge of whatever limiting spectral distribution the ensemble has.

Synthesis. The foundational reason the operator norm converges to exactly $2$ is that $2$ is the exponential growth rate of the $2 k$ -th moments of the semicircle — the $4^{k}$ in the Catalan asymptotics is $(2^{2})^{k}$ — and the high-trace method extracts that rate by letting the exponent $k = k_{n} \to \infty$ slowly, so the bulk moment computation of 37.08.01 becomes an edge statement with no new combinatorics. Putting these together, the moment route, the $ε$ -net route, and the resolvent route of 37.08.02 are three roads to the same edge: the growth constant $4^{k}$ of the traces, the covering-number-versus-sub-gaussian-tail balance on the sphere, and the square-root branch point of $z^{2} - 4$ at $z = 2$ are the same edge seen three ways, and this is exactly the duality that the operator-norm bound, the net bound, and the analytic edge analysis share. The central insight is that the extreme eigenvalue is an exponential-growth-rate functional, so it is governed by the tail of the moment sequence and by the right endpoint of the support, not by the bulk; this generalises to every ensemble whose limiting law has a hard or soft edge, with the soft square-root edge carrying the universal $n^{2/3}$ Tracy-Widom fluctuation. The bridge to the frontier is that edge universality is the operator-norm bound made quantitative down to the fluctuation scale, the same selection principle that pinned the location now pinning the law, and this is dual to the bulk universality the local semicircle law makes quantitative.

Full proof set Master

The high-trace upper bound, the slowly-growing-exponent device, and the matching lower bound are proved in full above. The remaining Master claims are recorded here.

Proposition (operator norm equals the largest even-trace growth rate). For any Hermitian matrix $M$ , $∥ M ∥_{op} = lim_{k \to \infty} (tr M^{2 k})^{1/2 k}$ .

Proof. Let $r = ∥ M ∥_{op} = max_{i} ∣ λ_{i} ∣$ . Each term of $tr M^{2 k} = \sum_{i} λ_{i}^{2 k}$ is non-negative, so $r^{2 k} \leq tr M^{2 k} \leq n r^{2 k}$ , the lower bound from the single largest term and the upper from bounding all $n$ terms by the largest. Taking $2 k$ -th roots, $r \leq (tr M^{2 k})^{1/2 k} \leq n^{1/2 k} r$ , and $n^{1/2 k} \to 1$ as $k \to \infty$ , so the limit is $r$ . $□$

Proposition (covering number of the sphere). For $ε \in (0, 1)$ the unit sphere $S^{n - 1} \subset R^{n}$ admits an $ε$ -net $N$ with $∣ N ∣ \leq (1 + 2/ ε)^{n}$ .

Proof. Take a maximal $ε$ -separated subset $N \subseteq S^{n - 1}$ (it exists by Zorn or by greedy selection since the sphere is compact). Maximality forces $N$ to be an $ε$ -net: any sphere point within $ε$ of no net point could be added, contradicting maximality. The open balls of radius $ε /2$ about the points of $N$ are disjoint and contained in the shell ${x : 1 - ε /2 \leq ∥ x ∥ \leq 1 + ε /2}$ , which sits inside the ball of radius $1 + ε /2$ . Comparing volumes, $∣ N ∣ (ε /2)^{n} \leq (1 + ε /2)^{n}$ , so $∣ N ∣ \leq (1 + 2/ ε)^{n}$ . $□$
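The greedy selection in the proof can be run on the circle $S^{1}$ . The sketch below works on a fine grid of candidate points rather than the whole circle (an assumption made so the loop is finite), so the result is an $ε$ -net of the grid; the helper name `greedy_separated_set` is hypothetical.

```python
import math


def greedy_separated_set(points, eps):
    """Keep each point that is farther than eps from every point kept so far.
    The result is eps-separated, and every input point is within eps of it."""
    kept = []
    for p in points:
        if all(math.dist(p, q) > eps for q in kept):
            kept.append(p)
    return kept


eps = 0.25
grid = [(math.cos(2 * math.pi * j / 2000), math.sin(2 * math.pi * j / 2000))
        for j in range(2000)]
net = greedy_separated_set(grid, eps)

# Compare the net size with the volumetric bound (1 + 2/eps)^n for n = 2.
print(len(net), (1 + 2 / eps) ** 2)
```

The volumetric bound is far from tight here, since the circle is one-dimensional while the bound is computed from two-dimensional volumes; it is the exponential dependence on $n$ that matters in the union bound.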

Proposition (net comparison for the operator norm). Let $M$ be Hermitian and $N$ an $ε$ -net of $S^{n - 1}$ with $ε < 1/2$ . Then $max_{x \in N} ∣ ⟨ x, M x ⟩ ∣ \leq ∥ M ∥_{op} \leq (1 - 2 ε)^{- 1} max_{x \in N} ∣ ⟨ x, M x ⟩ ∣$ .

Proof. The left inequality is immediate since $|⟨ x, M x ⟩| \leq ∥ M ∥_{op}$ for every unit vector $x$ . For the right, let $x^{*}$ with $∥ x^{*} ∥ = 1$ attain $∥ M ∥_{op} = |⟨ x^{*}, M x^{*} ⟩|$ (a maximiser exists by compactness and the variational characterisation of the spectral norm for Hermitian $M$ ). Pick $y \in N$ with $∥ x^{*} - y ∥ \leq ε$ and set $h = x^{*} - y$ . Adding and subtracting $⟨ x^{*}, M y ⟩$ , $$ \langle x^*, Mx^*\rangle - \langle y, My\rangle = \langle x^*, Mh\rangle + \langle h, My\rangle, $$ so $|⟨ x^{*}, M x^{*} ⟩ - ⟨ y, M y ⟩| \leq ∥ M ∥ ∥ x^{*} ∥ ∥ h ∥ + ∥ M ∥ ∥ h ∥ ∥ y ∥ \leq 2 ε ∥ M ∥$ , using $∥ x^{*} ∥ = ∥ y ∥ = 1$ . Hence $∥ M ∥ = |⟨ x^{*}, M x^{*} ⟩| \leq |⟨ y, M y ⟩| + 2 ε ∥ M ∥ \leq \max_{x \in N} |⟨ x, M x ⟩| + 2 ε ∥ M ∥$ , and rearranging, which is legitimate because $1 - 2 ε > 0$ , gives the claim. $\square$

Proposition (high-trace tail with growing exponent). Under the moment hypothesis, with the trace bound $E tr M_{n}^{2 k} \leq n 4^{k}$ valid uniformly for $k \leq c \sqrt{n}$ , the choice $k_{n} = \lfloor (\log n)^{2} \rfloor$ gives $P (∥ M_{n} ∥_{op} > 2 + ε) \leq \exp (- c_{ε} (\log n)^{2})$ for all large $n$ , which is summable in $n$ .

Proof. By the Markov step in the Key theorem, $P (∥ M_{n} ∥_{op} > 2 + ε) \leq (2 + ε)^{- 2 k_{n}} n 4^{k_{n}} = n ρ^{k_{n}}$ with $ρ = 4/ (2 + ε)^{2} < 1$ . With $k_{n} = (lo g n)^{2}$ , $n ρ^{k_{n}} = exp (lo g n + (lo g n)^{2} lo g ρ) = exp (- ∣ lo g ρ ∣ (lo g n)^{2} + lo g n) \leq exp (- \frac{1}{2} ∣ lo g ρ ∣ (lo g n)^{2})$ for $n$ large enough that $(lo g n)^{2} ∣ lo g ρ ∣/2 > lo g n$ . Setting $c_{ε} = ∣ lo g ρ ∣/2$ gives the stated bound. Since $\sum_{n} exp (- c_{ε} (lo g n)^{2}) < \infty$ (the terms decay faster than $n^{- 2}$ for large $n$ ), Borel-Cantelli yields $∥ M_{n} ∥_{op} \leq 2 + ε$ eventually almost surely; intersecting over a sequence $ε ↓ 0$ gives $lim sup_{n} ∥ M_{n} ∥_{op} \leq 2$ a.s. under the full moment hypothesis. $□$

Proposition (edge count and the $n^{2/3}$ scale). The expected number of eigenvalues of $M_{n}$ in $[2 - δ, 2]$ is asymptotic to $\frac{2}{3 π} n δ^{3/2}$ , so the natural edge fluctuation scale solving $n δ^{3/2} ≍ 1$ is $δ ≍ n^{- 2/3}$ .

Proof. Using the semicircle density $ρ_{sc} (x) = \frac{1}{2 π} \sqrt{4 - x^{2}}$ and the edge expansion $4 - x^{2} = (2 - x) (2 + x) \sim 4 (2 - x)$ as $x ↑ 2$ , the expected proportion of eigenvalues in $[2 - δ, 2]$ is $\int_{2 - δ}^{2} ρ_{sc} (x) \, d x \sim \int_{2 - δ}^{2} \frac{1}{2 π} \cdot 2 \sqrt{2 - x} \, d x = \frac{1}{π} \int_{0}^{δ} \sqrt{u} \, d u = \frac{2}{3 π} δ^{3/2}$ . Multiplying by $n$ gives the count $\frac{2}{3 π} n δ^{3/2}$ . The top eigenvalue is resolved at the scale $δ$ where this count is of order one, namely $δ ≍ n^{- 2/3}$ , which is the width of the Tracy-Widom window. $□$
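The edge count can be checked numerically by integrating the semicircle density $\frac{1}{2 π} \sqrt{4 - x^{2}}$ over $[2 - δ, 2]$ and comparing with the leading term $\frac{2}{3 π} δ^{3/2}$ . A minimal sketch using the midpoint rule; the function names and the step count are choices made for this illustration.

```python
import math


def semicircle_density(x):
    """Semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2], zero outside."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def edge_mass(delta, steps=20000):
    """Midpoint-rule approximation of the semicircle mass of [2 - delta, 2]."""
    h = delta / steps
    return h * sum(semicircle_density(2.0 - delta + (i + 0.5) * h)
                   for i in range(steps))


def edge_mass_leading(delta):
    """Leading-order edge mass (2 / (3 pi)) * delta^{3/2}."""
    return 2.0 / (3.0 * math.pi) * delta ** 1.5


def edge_scale(n):
    """The delta solving n * (2 / (3 pi)) * delta^{3/2} = 1."""
    return (3.0 * math.pi / (2.0 * n)) ** (2.0 / 3.0)


for delta in (0.5, 0.1, 0.01):
    # The ratio of the two masses approaches 1 as delta shrinks.
    print(delta, edge_mass(delta) / edge_mass_leading(delta))

for n in (10**3, 8 * 10**3):
    # Multiplying n by 8 divides the edge scale by 8^(2/3) = 4.
    print(n, edge_scale(n))
```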

Connections Master

The Wigner semicircle law and the moment method 37.08.01 supplies both the trace-power machinery and the matching lower bound. The high-trace upper bound reuses the closed-walk expansion of $E tr M_{n}^{2 k}$ developed there, now with the exponent $k = k_{n} \to \infty$ instead of fixed, and the lower bound $λ_{m a x} \geq 2 - ε$ is read directly off the convergence $μ_{n} \Rightarrow μ_{sc}$ proved in that unit; the operator-norm statement is the edge refinement of which the semicircle law is the bulk statement.

The Stieltjes transform and the semicircle law via the resolvent 37.08.02 is the analytic dual of the edge analysis here. The branch point of $s_{sc} (z) = \frac{1}{2} (- z + \sqrt{z^{2} - 4})$ at $z = 2$ is the same spectral edge the operator-norm bound locates combinatorially, and the resolvent method, refined to imaginary parts of order $n^{- 2/3}$ , is what makes the Tracy-Widom edge fluctuation quantitative where the moment method only pins the location.

The strong law of large numbers and Borel-Cantelli 37.02.02 is the engine of the almost-sure Bai-Yin upgrade. The summable tail $P (∥ M_{n} ∥_{op} > 2 + ε) \leq exp (- c (lo g n)^{2})$ produced by the growing-exponent high-trace bound is exactly the input Borel-Cantelli needs to convert convergence in probability into almost-sure convergence of the largest eigenvalue.

The spectral concentration via log-Sobolev and the Herbst argument 37.08.07 underlies the $ε$ -net route and its sharpening. The Hanson-Wright quadratic-form tail bounding each $⟨ x, M_{n} x ⟩$ and the union bound over an exponentially large net are concentration statements, and the functional-inequality machinery of that unit gives the dimension-free Gaussian concentration of the operator norm around its mean; the net method trades the sharp constant of the moment method for the exponentially small failure probability that concentration provides.

Historical & philosophical context Master

The high-trace method for the operator norm was introduced by Zoltán Füredi and János Komlós in 1981 ^{[Füredi 1981]}, who refined Wigner's moment computation by letting the trace exponent grow with the matrix dimension and tracking the walk combinatorics uniformly, obtaining $∥ M_{n} ∥_{op} = 2 + o (1)$ with quantitative error terms for matrices with bounded entries. The sharp almost-sure result under the minimal hypothesis is due to Zhidong Bai and Yong-Quan Yin in 1988 ^{[Bai 1988]}, who proved that a finite fourth moment of the entries is both necessary and sufficient for $λ_{\max} (M_{n}) \to 2$ almost surely, identifying the exact integrability threshold and closing the question of when the largest eigenvalue is asymptotically governed by the bulk edge rather than by a single anomalous entry.

The edge fluctuation law was found by Craig Tracy and Harold Widom in 1994 ^{[Tracy 1994]} for the Gaussian ensembles; they expressed the limiting distribution of $n^{2/3} (λ_{\max} - 2)$ through the Airy kernel and a Painlevé II transcendent. The same distribution was subsequently identified by Baik, Deift and Johansson as governing the longest increasing subsequence of a random permutation, and it also governs a wide family of models in the Kardar-Parisi-Zhang universality class. The $ε$-net approach to non-asymptotic matrix norms, by contrast, grew out of geometric functional analysis and the local theory of Banach spaces, and was systematised for high-dimensional probability and data science by Roman Vershynin ^{[Vershynin 2018]}. Edge universality, extending the Tracy-Widom law from the Gaussian ensembles to general Wigner matrices, was established by the Erdős-Schlein-Yau local-law program and by Tao and Vu in the late 2000s.

Bibliography Master

@article{furedikomlos1981,
  author  = {F\"uredi, Zolt\'an and Koml\'os, J\'anos},
  title   = {The eigenvalues of random symmetric matrices},
  journal = {Combinatorica},
  volume  = {1},
  number  = {3},
  pages   = {233--241},
  year    = {1981}
}

@article{baiyin1988,
  author  = {Bai, Zhidong and Yin, Yong-Quan},
  title   = {Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a {Wigner} matrix},
  journal = {Annals of Probability},
  volume  = {16},
  number  = {4},
  pages   = {1729--1741},
  year    = {1988}
}

@book{agz2010,
  author    = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
  title     = {An Introduction to Random Matrices},
  series    = {Cambridge Studies in Advanced Mathematics},
  volume    = {118},
  publisher = {Cambridge University Press},
  year      = {2010}
}

@book{vershynin2018,
  author    = {Vershynin, Roman},
  title     = {High-Dimensional Probability: An Introduction with Applications in Data Science},
  series    = {Cambridge Series in Statistical and Probabilistic Mathematics},
  publisher = {Cambridge University Press},
  year      = {2018}
}

@article{tracywidom1994,
  author  = {Tracy, Craig A. and Widom, Harold},
  title   = {Level-spacing distributions and the Airy kernel},
  journal = {Communications in Mathematical Physics},
  volume  = {159},
  number  = {1},
  pages   = {151--174},
  year    = {1994}
}

@book{tao2012,
  author    = {Tao, Terence},
  title     = {Topics in Random Matrix Theory},
  series    = {Graduate Studies in Mathematics},
  volume    = {132},
  publisher = {American Mathematical Society},
  year      = {2012}
}

Prerequisites

37.08.01

Tier anchors

beginner: Tao, Topics in Random Matrix Theory §2.3 (operator norm of a random matrix, the spectral edge); Vershynin, High-Dimensional Probability §4.4 (the epsilon-net heuristic for the norm); the physical picture of the eigenvalue histogram and its rightmost spike
intermediate: Tao, Topics in Random Matrix Theory §2.3 (operator norm via high moments); Vershynin, High-Dimensional Probability (Cambridge, 2018) §4.4 (net argument, sub-gaussian norm bound); Bai-Yin, Annals of Probability 16 (1988) (a.s. convergence of the largest eigenvalue)
master: Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.1.6 (operator norm, high-trace method); Tao, Topics in Random Matrix Theory (AMS GSM 132, 2012) §2.3; Bai-Yin, Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue, Annals of Probability 16 (1988), 1729-1741; Tracy-Widom, Commun. Math. Phys. 159 (1994), 151-174

References

Bai, Yin — Necessary and sufficient conditions for almost sure convergence of the largest eigenvalue of a Wigner matrix · Annals of Probability 16 (1988), 1729-1741 (a.s. convergence of the operator norm under a fourth-moment condition)
Füredi, Komlós — The eigenvalues of random symmetric matrices · Combinatorica 1 (1981), 233-241 (high-trace / moment method for the operator-norm bound)
Anderson, Guionnet, Zeitouni — An Introduction to Random Matrices · Cambridge University Press, 2010, §2.1.6 (operator-norm upper bound via growing even traces)
Vershynin — High-Dimensional Probability: An Introduction with Applications in Data Science · Cambridge University Press, 2018, §4.4 (epsilon-net argument, non-asymptotic norm bounds for sub-gaussian matrices)
Tracy, Widom — Level-spacing distributions and the Airy kernel · Communications in Mathematical Physics 159 (1994), 151-174 (Tracy-Widom edge fluctuation law)
Tao — Topics in Random Matrix Theory · AMS Graduate Studies in Mathematics 132, 2012, §2.3 (operator norm, high moments, edge)

Estimated time

beginner: 20m
intermediate: 55m
master: 95m