37.08.07 · probability / 08-random-matrices

Spectral Concentration: Log-Sobolev and the Herbst Argument

shipped · 3 tiers · Lean: none

Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.3, §4.4; Ledoux, The Concentration of Measure Phenomenon (AMS Surveys 89, 2001) Ch. 5; Boucheron-Lugosi-Massart, Concentration Inequalities (Oxford, 2013); Guionnet-Zeitouni, Concentration of the spectral measure for large matrices, Electron. Commun. Probab. 5 (2000)

Intuition Beginner

Imagine you build a large random symmetric grid of numbers, read off one summary statistic of its eigenvalues — say the average of all of them, or the size of the largest one — and write the answer down. Then you throw the whole grid away, build a fresh independent one of the same size, and measure the same statistic again. You might expect two independent random experiments to give noticeably different answers. The remarkable fact is that for a big grid they barely differ: the number you measure is almost the same every time, locked near one fixed value, with only tiny wobble. The statistic is self-averaging.

This is the concentration of measure phenomenon. The reason it happens is that the summary statistic depends on each of the many random entries only a little. A function of a great many independent inputs, none of which can swing the answer much on its own, is overwhelmingly likely to sit close to its average value. The more inputs there are, the tighter the answer clings to that average, and the chance of a large deviation shrinks faster than any power — it shrinks like a bell-curve tail.

There is a clean engine behind this for bell-curve entries. A single inequality about Gaussian draws, fed through a short argument, shows that any statistic which does not change much when you nudge its inputs cannot stray far from its average. That engine is what lets us prove the histogram of eigenvalues is essentially deterministic for large grids.

Visual Beginner

Picture a dartboard view of the same measurement repeated many times. The horizontal axis is the value of one eigenvalue statistic; the vertical axis counts how often each value came up across many independent random grids. For a small grid the darts scatter widely. For a large grid they cluster into a narrow spike centred on the average, and the spike gets narrower as the grid grows.

The table below shows the qualitative trend: a 1-Lipschitz statistic (one no single entry can move much) measured on grids of growing size $n$ , with the typical spread of its values shrinking toward zero.

Grid size $n$	Typical spread of the statistic	Tail beyond spread
small	wide	fat, polynomial
medium	moderate	thinning
large	narrow, $\sim 1/ n$	bell-curve thin
very large	tiny	essentially deterministic

The picture is the visual content of self-averaging: the spread of a Lipschitz spectral statistic collapses as $n$ grows, and the leftover wobble has a thin, bell-curve-style tail rather than a fat one.
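The dartboard picture can be reproduced with a small simulation. The sketch below is illustrative only: it assumes Gaussian entries, the $n^{-1/2}$ matrix scaling, and arbitrarily chosen grid sizes and trial counts (none of which come from the text above), and it measures the spread of the largest eigenvalue across independent grids.

```python
import numpy as np

def largest_eigenvalue_samples(n, trials, rng):
    """Sample the largest eigenvalue of n^{-1/2} * (symmetric Gaussian matrix)."""
    out = np.empty(trials)
    for t in range(trials):
        g = rng.standard_normal((n, n))
        a = (g + g.T) / np.sqrt(2.0)          # symmetric, off-diagonal variance 1
        out[t] = np.linalg.eigvalsh(a / np.sqrt(n))[-1]
    return out

rng = np.random.default_rng(0)                # fixed seed, chosen arbitrarily
spreads = {n: largest_eigenvalue_samples(n, 200, rng).std() for n in (10, 40, 160)}
for n, s in spreads.items():
    print(f"n = {n:4d}   spread of largest eigenvalue ~ {s:.3f}")
```

The printed spreads should shrink as $n$ grows, which is the narrowing spike described above.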

Worked example Beginner

We check, with concrete numbers, that one eigenvalue cannot move far when we nudge a single matrix entry — the property that drives concentration.

Step 1. Take the two-by-two symmetric grid with diagonal entries $3$ and $1$ and off-diagonal entry $0$ . Being diagonal, its eigenvalues are just $3$ and $1$ . The largest eigenvalue is $3$ .

Step 2. Now nudge the off-diagonal entry from $0$ up to a small value $b = 0.2$, leaving the diagonal alone. The grid is now $\begin{pmatrix} 3 & 0.2 \\ 0.2 & 1 \end{pmatrix}$. Its eigenvalues are the two numbers $\frac{3 + 1}{2} \pm \sqrt{(\frac{3 - 1}{2})^{2} + 0.2^{2}} = 2 \pm \sqrt{1 + 0.04} = 2 \pm \sqrt{1.04}$.

Step 3. Compute the square root: $\sqrt{1.04} \approx 1.0198$. So the eigenvalues are about $3.0198$ and $0.9802$. The largest moved from $3$ to about $3.0198$.

Step 4. Compare the change in the answer to the change in the input. The input moved by $0.2$ . The largest eigenvalue moved by about $0.0198$ , which is much less than $0.2$ . The answer changed by less than the nudge.

Step 5. What this tells us: the largest eigenvalue is a 1-Lipschitz function of the entries in the right distance — when you measure the size of the nudge using the square-root-of-sum-of-squares distance on the entries, no eigenvalue ever moves by more than that size. Each single entry has limited leverage on the answer, and that limited leverage, summed over many independent entries, is exactly what forces the answer to concentrate near its average.
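The arithmetic of this worked example can be checked numerically. This is a minimal sketch using NumPy; the variable names are illustrative. Note that the Frobenius size of the nudge counts the off-diagonal entry in both of its symmetric positions, so it is $\sqrt{0.2^{2} + 0.2^{2}}$ rather than $0.2$.

```python
import numpy as np

before = np.array([[3.0, 0.0], [0.0, 1.0]])
after = np.array([[3.0, 0.2], [0.2, 1.0]])

# eigvalsh returns eigenvalues of a symmetric matrix in increasing order.
top_before = np.linalg.eigvalsh(before)[-1]
top_after = np.linalg.eigvalsh(after)[-1]
shift = abs(top_after - top_before)
nudge = np.linalg.norm(after - before)    # Frobenius norm of the perturbation

print(f"largest eigenvalue: {top_before:.4f} -> {top_after:.4f}")
print(f"eigenvalue shift {shift:.4f} <= Frobenius nudge {nudge:.4f}")
```

The shift of the largest eigenvalue stays below the Frobenius size of the nudge, as the 1-Lipschitz property requires.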

Check your understanding Beginner

Exercise (easy, multiple choice).

"Concentration of measure" for an eigenvalue statistic of a large random grid means that the statistic:

A. Grows without bound as the grid grows
B. Takes a nearly fixed value, with vanishing spread, across independent grids
C. Becomes more random and harder to predict as the grid grows
D. Equals zero for every large grid

Hint

Recall the dartboard picture: as the grid grows, the darts cluster into a narrow spike.

Answer

B. Concentration means the statistic is self-averaging: it clusters tightly around one value (its mean), and the spread shrinks as the grid grows. Feedback-correct: this near-deterministic behaviour is exactly what lets the eigenvalue histogram have a fixed limiting shape. Feedback-wrong: A confuses concentration with divergence; C is the opposite of what happens; D is false — the value is fixed but generally nonzero.

Formal definition Intermediate+

Fix the space of $n \times n$ real symmetric (or complex Hermitian) matrices, identified with $R^{N}$ for the appropriate $N$ via the independent entries, and equipped with the Frobenius (Hilbert-Schmidt) norm $∥ A ∥_{2} = (\sum_{i, j} ∣ A_{ij} ∣^{2})^{1/2} = (tr A^{*} A)^{1/2}$ . Order the eigenvalues $λ_{1} (A) \leq \dots \leq λ_{n} (A)$ [the spectral setup is from 37.08.01]. A function $F : R^{N} \to R$ is $L$ -Lipschitz if $∣ F (x) - F (y) ∣ \leq L ∥ x - y ∥_{2}$ for all $x, y$ .

Two notions of Lipschitz spectral statistic recur. The largest eigenvalue $A \mapsto λ_{n} (A)$, and more generally each $A \mapsto λ_{i} (A)$, is $1$-Lipschitz in the Frobenius norm; this is the content of the Hoffman-Wielandt inequality below. The linear eigenvalue statistic $A \mapsto \frac{1}{n} \sum_{i = 1}^{n} f (λ_{i} (A)) = \int f d μ_{A}$, where $μ_{A}$ is the empirical spectral distribution and $f$ is a fixed test function, is $\frac{∥ f ∥_{Lip}}{\sqrt{n}}$-Lipschitz when $f$ is Lipschitz; the $1/\sqrt{n}$ factor is the source of the sharp $n$-dependence in its fluctuations.

A probability measure $μ$ on $R^{N}$ satisfies a logarithmic Sobolev inequality (LSI) with constant $c$ if for every smooth $g$ with $\int g^{2} d μ = 1$, $$ \mathrm{Ent}_\mu(g^2) := \int g^2 \log g^2 \, d\mu \;\le\; 2c \int \|\nabla g\|_2^2 \, d\mu , $$ where the relative entropy functional is $\mathrm{Ent}_\mu(h) = \int h \log h \, d\mu - \big(\int h \, d\mu\big)\log\big(\int h \, d\mu\big)$ for $h \ge 0$. The standard Gaussian measure $\gamma$ on $\mathbb{R}^N$ satisfies LSI with constant $c = 1$. A measure $\mu$ satisfies sub-Gaussian concentration with constant $\sigma^2$ if for every $1$-Lipschitz $F$, $$ \mathbb{P}\big(|F - \mathbb{E}_\mu F| \ge t\big) \le 2\exp\!\Big(-\frac{t^2}{2\sigma^2}\Big), \qquad t \ge 0 . $$ The Herbst argument is the implication LSI $(c)$ $\Rightarrow$ sub-Gaussian concentration with $σ^{2} = c$.

Counterexamples to common slips Intermediate+

Lipschitz in Frobenius, not in operator norm, is what the eigenvalue map satisfies cleanly. The bound $∣ λ_{i} (A) - λ_{i} (B) ∣ \leq ∥ A - B ∥_{op}$ (Weyl) is also true, but for concentration from independent entries the Frobenius geometry is the right one, because the Gaussian/product measure on entries is isotropic in $∥ \cdot ∥_{2}$ , not in $∥ \cdot ∥_{op}$ .
Concentration is around the mean (or median), not around the limiting value. LSI gives $P (∣ F - E F ∣ \geq t)$ small; identifying $E F$ with a deterministic limit (e.g. $E λ_{n} \to 2$ ) is a separate computation. Confusing the two conflates fluctuation control with a law of large numbers.
The $1/n$ normalisation of linear statistics is not optional. The map $A \mapsto \frac{1}{n} \sum f (λ_{i})$ has Lipschitz constant $∥ f ∥_{Lip} / \sqrt{n}$, giving tails $\exp (- c n t^{2})$ for entries of order-one variance. Dropping the normalisation gives $\sum f (λ_{i})$, whose Lipschitz constant is $\sqrt{n} ∥ f ∥_{Lip}$ and which does not concentrate at the same rate.
LSI is strictly stronger than a Poincaré (spectral-gap) inequality. Poincaré controls the variance and gives only exponential (not sub-Gaussian) tails for Lipschitz functions; LSI is needed for the Gaussian-square tail. A bounded-support entry distribution may fail LSI yet still concentrate via the bounded-differences route.

Key theorem with proof Intermediate+

Theorem (Herbst argument: LSI implies sub-Gaussian concentration). Let $μ$ be a probability measure on $R^{N}$ satisfying a logarithmic Sobolev inequality with constant $c$. Let $F : R^{N} \to R$ be $1$-Lipschitz with respect to the Euclidean norm. Then $F$ is integrable, and for every $λ \in R$, $$ \mathbb{E}_\mu\big[e^{\lambda (F - \mathbb{E}_\mu F)}\big] \le \exp\!\Big(\frac{c\,\lambda^2}{2}\Big), $$ and consequently $P (∣ F - E_{μ} F ∣ \geq t) \leq 2 \exp (- t^{2} / (2 c))$ for all $t \geq 0$.

Proof. Assume first that $F$ is bounded and smooth with $∥\nabla F ∥_{2} \leq 1$ everywhere; the general $1$-Lipschitz case follows by mollification and truncation, since the bound is preserved under the limits. Both sides of the LSI scale linearly in $g^{2}$, so it holds in the form $\mathrm{Ent}_\mu(g^{2}) \leq 2 c \int ∥\nabla g ∥_{2}^{2} d μ$ without the normalisation $\int g^{2} d μ = 1$. Apply it to $g = e^{λ F /2}$, so that $g^{2} = e^{λ F}$ and $\nabla g = \frac{λ}{2} e^{λ F /2} \nabla F$, giving $∥\nabla g ∥_{2}^{2} = \frac{λ^{2}}{4} e^{λ F} ∥\nabla F ∥_{2}^{2} \leq \frac{λ^{2}}{4} e^{λ F}$. Define the Laplace transform $H (λ) = E_{μ} [e^{λ F}]$. The LSI reads $$ \mathbb{E}_\mu\big[\lambda F \, e^{\lambda F}\big] - H(\lambda)\log H(\lambda) = \mathrm{Ent}_\mu(e^{\lambda F}) \le 2c\cdot \frac{\lambda^2}{4}\,\mathbb{E}_\mu\big[e^{\lambda F}\big] = \frac{c\lambda^2}{2}H(\lambda). $$ Recognise the left side through $H'(\lambda) = \mathbb{E}_\mu[F e^{\lambda F}]$: it equals $\lambda H'(\lambda) - H(\lambda)\log H(\lambda)$. Divide by $\lambda^2 H(\lambda) > 0$ and set $K(\lambda) = \tfrac{1}{\lambda}\log H(\lambda)$. A direct computation gives $$ K'(\lambda) = \frac{\lambda H'(\lambda) - H(\lambda)\log H(\lambda)}{\lambda^2 H(\lambda)} \le \frac{c}{2}. $$ As $λ \to 0$, $K (λ) = \frac{1}{λ} \log (1 + λ E_{μ} F + O (λ^{2})) \to E_{μ} F$. Integrating $K^{'} \leq c /2$ from $0$ to $λ$ yields $K (λ) \leq E_{μ} F + \frac{c λ}{2}$ for $λ > 0$, i.e. $\log H (λ) \leq λ E_{μ} F + \frac{c λ^{2}}{2}$; the same integration on $[λ, 0]$ handles $λ < 0$. Subtracting $λ E_{μ} F$ gives the stated Laplace bound on $F - E_{μ} F$.

For the tail, Markov's inequality applied to $e^{λ (F - E F)}$ gives, for $t, λ > 0$ , $P (F - E F \geq t) \leq e^{- λ t} E [e^{λ (F - E F)}] \leq exp (\frac{c λ ^{2}}{2} - λ t)$ . Optimising over $λ$ at $λ = t / c$ gives $exp (- t^{2} / (2 c))$ . Applying the same to $- F$ (also $1$ -Lipschitz) and summing the two one-sided bounds yields the two-sided tail. $□$
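The tail bound can be compared with simulation in the simplest case. The sketch below assumes the standard Gaussian measure ($c = 1$) and the $1$-Lipschitz function $F(x) = ∥x∥_{2}$; the dimension, sample count, and thresholds are arbitrary illustrative choices, and the sample mean is used in place of $E F$.

```python
import numpy as np

rng = np.random.default_rng(1)                # fixed seed, chosen arbitrarily
dim, samples = 50, 20000
x = rng.standard_normal((samples, dim))
f = np.linalg.norm(x, axis=1)                 # F(x) = ||x||_2 is 1-Lipschitz
dev = np.abs(f - f.mean())                    # sample mean stands in for E F

ts = np.array([0.5, 1.0, 1.5, 2.0])
empirical = np.array([(dev >= t).mean() for t in ts])
herbst = 2.0 * np.exp(-ts**2 / 2.0)           # 2 exp(-t^2 / (2c)) with c = 1
for t, e, h in zip(ts, empirical, herbst):
    print(f"t = {t:.1f}   empirical tail {e:.4f}   Herbst bound {h:.4f}")
```

The empirical tails should sit below the Herbst bound at every threshold. The bound is not tight for this particular $F$, since it holds uniformly over all $1$-Lipschitz functions.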

Bridge. This argument underlies every Gaussian-entry spectral concentration estimate in random matrix theory and appears again in the linear-statistic and largest-eigenvalue bounds below, where the only extra ingredient is the Lipschitz constant of the relevant spectral map. The proof works because of the differential inequality $K^{'} \leq c /2$ for $K (λ) = λ^{- 1} \log E e^{λ F}$: entropy controls the derivative of the free energy, and integrating that control is the passage from a local smoothness statement (LSI) to a global tail statement. This is the entropy method, and it generalises the classical Chernoff bound: there one bounds the cumulant generating function by hand for a specific distribution, whereas here LSI bounds it uniformly for every Lipschitz observable at once. The Herbst argument is dual to the moment method of 37.08.01: moments identify the limiting shape of the spectrum, while the Herbst tail certifies that each spectral statistic sticks to its mean. The central insight is that low coordinate-sensitivity plus a log-Sobolev inequality is enough to force sub-Gaussian self-averaging.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Show that the linear eigenvalue statistic $A \mapsto \frac{1}{n} \sum_{i = 1}^{n} f (λ_{i} (A))$ is $\frac{∥ f ∥_{Lip}}{\sqrt{n}}$-Lipschitz in the Frobenius norm, and deduce the tail bound for Gaussian entries.

Hint

Combine the Lipschitz bound $∣ \sum_{i} (f (λ_{i} (A)) - f (λ_{i} (B))) ∣ \leq ∥ f ∥_{Lip} \sum_{i} ∣ λ_{i} (A) - λ_{i} (B) ∣$ with Cauchy-Schwarz and Hoffman-Wielandt.

Answer

Write $Φ (A) = \frac{1}{n} \sum_{i} f (λ_{i} (A))$. Then $∣Φ (A) - Φ (B) ∣ \leq \frac{∥ f ∥_{Lip}}{n} \sum_{i} ∣ λ_{i} (A) - λ_{i} (B) ∣$. By Cauchy-Schwarz, $\sum_{i} ∣ λ_{i} (A) - λ_{i} (B) ∣ \leq \sqrt{n} (\sum_{i} (λ_{i} (A) - λ_{i} (B))^{2})^{1/2} \leq \sqrt{n} ∥ A - B ∥_{2}$ using Hoffman-Wielandt. Hence $∣Φ (A) - Φ (B) ∣ \leq \frac{∥ f ∥_{Lip}}{n} \sqrt{n} ∥ A - B ∥_{2} = \frac{∥ f ∥_{Lip}}{\sqrt{n}} ∥ A - B ∥_{2}$, so $Φ$ is $∥ f ∥_{Lip} / \sqrt{n}$-Lipschitz. If the entry law satisfies LSI with constant $c$ (for standard Gaussian entries, $c = 1$), Herbst applied to $Φ / L$ with $L = ∥ f ∥_{Lip} / \sqrt{n}$ gives $P (∣Φ - E Φ∣ \geq t) \leq 2 \exp (- \frac{n t^{2}}{2 c ∥ f ∥_{Lip}^{2}})$, the speed-$n$ concentration of the empirical spectral distribution against a fixed test function. Under the $n^{- 1/2}$ matrix scaling the entry law has $c$ of order $1/n$, and the same bound becomes a speed-$n^{2}$ estimate.
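A numerical sanity check of the Lipschitz constant $∥ f ∥_{Lip} / \sqrt{n}$ that Cauchy-Schwarz and Hoffman-Wielandt produce. The sketch assumes the test function $f(x) = |x|$ and random symmetric matrices; the helper names `linear_statistic` and `random_symmetric` are hypothetical. It also shows that the constant cannot be improved: shifting $A$ by $ε I$ with $f(x) = x$ attains it exactly.

```python
import numpy as np

def linear_statistic(a, f):
    """Phi(A) = (1/n) * sum_i f(lambda_i(A)) for a symmetric matrix A."""
    return f(np.linalg.eigvalsh(a)).mean()

def random_symmetric(n, rng):
    g = rng.standard_normal((n, n))
    return (g + g.T) / 2.0

rng = np.random.default_rng(2)                # fixed seed, chosen arbitrarily
n = 30
bound = 1.0 / np.sqrt(n)                      # ||f||_Lip / sqrt(n) with ||f||_Lip = 1

worst = 0.0
for _ in range(200):
    a, b = random_symmetric(n, rng), random_symmetric(n, rng)
    diff = abs(linear_statistic(a, np.abs) - linear_statistic(b, np.abs))
    worst = max(worst, diff / np.linalg.norm(a - b))   # Frobenius norm

# Extremal case: B = A + eps * I moves every eigenvalue by eps.
a = random_symmetric(n, rng)
shift = 0.1 * np.eye(n)
identity = lambda x: x
sharp = abs(linear_statistic(a + shift, identity) - linear_statistic(a, identity)) / np.linalg.norm(shift)

print(f"largest observed ratio {worst:.4f}   bound 1/sqrt(n) = {bound:.4f}")
print(f"identity-shift ratio   {sharp:.4f}")
```

Random pairs stay well below the bound, while the identity shift reaches it, so $1/\sqrt{n}$ is the sharp Frobenius-Lipschitz constant for a normalised linear statistic.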

Exercise 4 (medium, symbolic).

Prove the tensorisation of LSI: if each $μ_{i}$ on $R$ satisfies LSI with constant $c$ , then the product $μ = μ_{1} \otimes \dots \otimes μ_{N}$ satisfies LSI with the same constant $c$ .

Hint

Use the subadditivity of entropy: $Ent_{μ} (g^{2}) \leq \sum_{i} \int Ent_{μ_{i}} (g^{2}) d μ$ , where the inner entropy is taken in the $i$ -th variable with the others fixed.

Answer

Subadditivity of entropy for product measures gives $Ent_{μ} (g^{2}) \leq \sum_{i = 1}^{N} \int Ent_{μ_{i}} (g^{2} (\cdot)) d μ$ , where $Ent_{μ_{i}}$ acts on the $i$ -th coordinate with the rest frozen. Apply the one-dimensional LSI to each inner term: $Ent_{μ_{i}} (g^{2}) \leq 2 c \int (\partial_{i} g)^{2} d μ_{i}$ . Summing and integrating, $Ent_{μ} (g^{2}) \leq 2 c \sum_{i} \int (\partial_{i} g)^{2} d μ = 2 c \int ∥\nabla g ∥_{2}^{2} d μ$ . The constant $c$ is unchanged by the dimension $N$ — this dimension-free stability is precisely why LSI yields $n$ -independent (per-coordinate) concentration and is the structural advantage of LSI over a naive union bound.
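The subadditivity step can be checked on a small discrete product space. This is a sketch under stated assumptions: two finite marginals with random weights and a random positive function, all chosen for illustration; `entropy` is a hypothetical helper implementing $Ent_{w}(h)$ for a finitely supported measure.

```python
import numpy as np

def entropy(h, w):
    """Ent_w(h) = sum w h log h - (sum w h) log(sum w h), for h > 0 and weights w."""
    m = np.sum(w * h)
    return np.sum(w * h * np.log(h)) - m * np.log(m)

rng = np.random.default_rng(3)                # fixed seed, chosen arbitrarily
p = rng.dirichlet(np.ones(4))                 # marginal mu_1 on 4 points
q = rng.dirichlet(np.ones(5))                 # marginal mu_2 on 5 points
h = rng.uniform(0.1, 3.0, size=(4, 5))        # positive function on the product

total = entropy(h, np.outer(p, q))
# Entropy in coordinate 1 with coordinate 2 frozen, averaged over mu_2.
part1 = sum(q[j] * entropy(h[:, j], p) for j in range(5))
# Entropy in coordinate 2 with coordinate 1 frozen, averaged over mu_1.
part2 = sum(p[i] * entropy(h[i, :], q) for i in range(4))

print(f"Ent_mu(h) = {total:.6f} <= {part1 + part2:.6f} = sum of coordinate entropies")
```

The joint entropy is dominated by the sum of averaged coordinate-wise entropies, which is the inequality that lets the one-dimensional LSI be applied coordinate by coordinate.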

Exercise 5 (medium, symbolic).

Derive the variance bound (a Poincaré inequality) from LSI by linearising: applying LSI to $g = 1 + ε h$ and letting $ε \to 0$ , show $Var_{μ} (h) \leq c \int ∥\nabla h ∥_{2}^{2} d μ$ .

Hint

Expand $Ent_{μ} ((1 + ε h)^{2})$ to second order in $ε$ ; the leading nonzero term is $2 ε^{2} Var_{μ} (h)$ .

Answer

Put $g = 1 + ε h$ with $\int h d μ = 0$ (no generality is lost, since neither $Var_{μ} (h)$ nor $\nabla h$ changes when the mean is subtracted). Then $g^{2} = 1 + 2 ε h + ε^{2} h^{2}$, and a Taylor expansion of $u \log u$ about $u = 1$ gives $\int g^{2} \log g^{2} d μ = 2 ε E h + 3 ε^{2} E h^{2} + O (ε^{3}) = 3 ε^{2} E h^{2} + O (ε^{3})$, while $\int g^{2} d μ = 1 + ε^{2} E h^{2}$ contributes $(\int g^{2} d μ) \log (\int g^{2} d μ) = ε^{2} E h^{2} + O (ε^{4})$. Subtracting, $Ent_{μ} (g^{2}) = 2 ε^{2} E h^{2} + O (ε^{3}) = 2 ε^{2} Var_{μ} (h) + O (ε^{3})$. Meanwhile $\int ∥\nabla g ∥_{2}^{2} d μ = ε^{2} \int ∥\nabla h ∥_{2}^{2} d μ$. The LSI $Ent_{μ} (g^{2}) \leq 2 c \int ∥\nabla g ∥_{2}^{2} d μ$ becomes $2 ε^{2} Var_{μ} (h) + O (ε^{3}) \leq 2 c ε^{2} \int ∥\nabla h ∥_{2}^{2} d μ$. Dividing by $2 ε^{2}$ and letting $ε \to 0$ gives $Var_{μ} (h) \leq c \int ∥\nabla h ∥_{2}^{2} d μ$, the Poincaré inequality. Thus LSI implies Poincaré with the same constant; the converse fails, which is why LSI delivers the strictly thinner sub-Gaussian tail.
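The second-order expansion $Ent_{μ} ((1 + ε h)^{2}) = 2 ε^{2} Var_{μ} (h) + O (ε^{3})$ holds for any probability measure, so it can be checked on a finitely supported one. The sketch below uses an arbitrary six-point measure and a random centred $h$, both illustrative assumptions.

```python
import numpy as np

def entropy(u, w):
    """Ent_w(u) = sum w u log u - (sum w u) log(sum w u), for u > 0."""
    m = np.sum(w * u)
    return np.sum(w * u * np.log(u)) - m * np.log(m)

rng = np.random.default_rng(4)                # fixed seed, chosen arbitrarily
w = rng.dirichlet(np.ones(6))                 # a probability measure on 6 points
h = rng.standard_normal(6)
h = h - np.sum(w * h)                         # centre h so that its mean is zero
variance = np.sum(w * h**2)

ratios = {}
for eps in (1e-1, 1e-2, 1e-3):
    ratios[eps] = entropy((1.0 + eps * h) ** 2, w) / (2.0 * eps**2)
    print(f"eps = {eps:.0e}   Ent / (2 eps^2) = {ratios[eps]:.6f}   Var(h) = {variance:.6f}")
```

As $ε$ decreases, the ratio $Ent / (2 ε^{2})$ approaches $Var_{μ} (h)$, confirming the coefficient $2$ in the leading term.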

Exercise 6 (hard, short-answer).

Prove the bounded-differences (McDiarmid) inequality and explain how it gives concentration of $λ_{n} (M_{n})$ for bounded non-Gaussian entries without any LSI.

Hint

Build the Doob martingale $D_{k} = E [F ∣ X_{1}, \dots, X_{k}]$ and apply the Azuma-Hoeffding inequality to its increments, bounded by the per-coordinate difference $c_{k}$ .

Answer

Let $F (X_{1}, \dots, X_{m})$ satisfy the bounded-difference condition $\sup_{x, x_{k}^{'}} ∣ F (\dots, x_{k}, \dots) - F (\dots, x_{k}^{'}, \dots) ∣ \leq c_{k}$ for each coordinate $k$, with the $X_{k}$ independent. Form the Doob martingale $D_{k} = E [F ∣ X_{1}, \dots, X_{k}]$, so $D_{0} = E F$ and $D_{m} = F$. Each increment $D_{k} - D_{k - 1}$ is bounded in range by $c_{k}$ (averaging over $X_{k}$ cannot exceed the worst single-coordinate swing), so by the Azuma-Hoeffding inequality $P (F - E F \geq t) \leq \exp (- 2 t^{2} / \sum_{k} c_{k}^{2})$, and the two-sided version follows. For $λ_{n} (M_{n}) = λ_{n} (n^{- 1/2} A_{n})$ with entries bounded by $K$, changing one off-diagonal pair perturbs $A_{n}$ in Frobenius norm by at most $2 \sqrt{2} K$, hence $λ_{n} (M_{n})$ by at most $c_{k} = 2 \sqrt{2} K / \sqrt{n}$ via Hoffman-Wielandt. With $\sim n^{2} /2$ coordinates, $\sum_{k} c_{k}^{2} = O (n K^{2})$, giving $P (∣ λ_{n} - E λ_{n} ∣ \geq t) \leq 2 \exp (- c t^{2} / (n K^{2}))$. This is the non-Gaussian route: it needs only boundedness and independence, never a log-Sobolev inequality, but because it charges the full budget $\sum_{k} c_{k}^{2}$ the resulting bound degrades with $n$ and falls short of the speed-$n$ rate that the entropy method and Talagrand's inequality achieve.

Exercise 7 (hard, short-answer).

Explain Talagrand's convex-distance concentration and why it is the right tool for $λ_{n} (M_{n})$ when entries are bounded but the bounded-differences constants $c_{k}$ are individually large.

Hint

McDiarmid charges $\sum_{k} c_{k}^{2}$ ; Talagrand's convex-Lipschitz inequality charges only the Lipschitz constant of a convex function on a product of bounded intervals, decoupling from the number of coordinates.

Answer

Talagrand's inequality states that for independent coordinates in a product of intervals of length $\leq 1$ and a $1$-Lipschitz convex function $F$ with median $M_{F}$, $P (∣ F - M_{F} ∣ \geq t) \leq 4 \exp (- t^{2} /4)$: a dimension-free sub-Gaussian bound depending only on the Lipschitz constant, not on $\sum_{k} c_{k}^{2}$. The largest eigenvalue $λ_{n} (A) = \sup_{∥ v ∥ = 1} v^{*} A v$ is a supremum of linear functions of the entries, hence convex, and $1$-Lipschitz in Frobenius norm by Hoffman-Wielandt. So Talagrand applies directly to $λ_{n}$ on bounded-entry ensembles and delivers $P (∣ λ_{n} - M_{λ_{n}} ∣ \geq t) \leq 4 \exp (- c n t^{2})$ after the $1/\sqrt{n}$ scaling, sharper than McDiarmid, which charges the full squared budget $\sum_{k} c_{k}^{2}$ even when each coordinate moves $λ_{n}$ only a little. Convexity is the essential extra hypothesis that buys the decoupling from coordinate count and matches the Gaussian-LSI rate.
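The dimension-free character of this bound can be seen in simulation. The sketch below assumes symmetric matrices with independent entries uniform on $\{-1/2, +1/2\}$ (an interval of length $1$) and looks at the largest eigenvalue of the unscaled matrix; sizes, trial counts, and the helper name are illustrative choices.

```python
import numpy as np

def top_eigenvalue_samples(n, trials, rng):
    """Largest eigenvalue of an unscaled symmetric matrix with entries +-1/2."""
    out = np.empty(trials)
    for t in range(trials):
        x = rng.choice([-0.5, 0.5], size=(n, n))
        a = np.triu(x) + np.triu(x, 1).T      # symmetrise from the upper triangle
        out[t] = np.linalg.eigvalsh(a)[-1]
    return out

rng = np.random.default_rng(5)                # fixed seed, chosen arbitrarily
means, spreads = {}, {}
for n in (20, 80):
    lam = top_eigenvalue_samples(n, 300, rng)
    means[n], spreads[n] = lam.mean(), lam.std()
    print(f"n = {n:3d}   mean {means[n]:6.3f}   spread {spreads[n]:.3f}")
```

The mean of the unscaled largest eigenvalue grows with $n$ while its spread stays bounded; dividing by $\sqrt{n}$ turns that bounded spread into the speed-$n$ tail quoted above.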

Advanced results Master

The Gaussian ensembles satisfy a dimension-free log-Sobolev inequality at the level of the entries, and this is the cleanest source of spectral concentration. For the Gaussian Orthogonal and Unitary Ensembles the entries are independent centred Gaussians, the joint law is a standard Gaussian on $R^{N}$ after scaling, and Gross's theorem supplies LSI with constant of order one; tensorisation keeps the constant dimension-free. Composing with the Frobenius-Lipschitz bounds of the spectral maps, the Herbst argument yields: $P (∣ λ_{n} (M_{n}) - E λ_{n} (M_{n}) ∣ \geq t) \leq 2 e^{- c n t^{2}}$ for the largest eigenvalue, and $P (∣ \int f d μ_{n} - E \int f d μ_{n} ∣ \geq t) \leq 2 e^{- c n^{2} t^{2} /∥ f ∥_{Lip}^{2}}$ for a linear statistic against a Lipschitz $f$ . The speed- $n^{2}$ in the second bound — versus speed- $n$ for a single eigenvalue — records that a smoothed statistic averages $n$ eigenvalues and so concentrates far more tightly; this is the quantitative form of self-averaging of the empirical spectral distribution.

The concentration is around the mean, and converting it into convergence of $μ_{n}$ to the semicircle requires separately controlling $E \int f d μ_{n} \to \int f d μ_{sc}$ , supplied by the moment method 37.08.01. The two pieces combine cleanly: the deterministic mean tracks the semicircle while the fluctuation is sub-Gaussian at speed $n^{2}$ , so the Borel-Cantelli lemma gives almost-sure weak convergence $μ_{n} \Rightarrow μ_{sc}$ without ever computing a variance by hand — the concentration inequality replaces the combinatorial variance bound used in 37.08.01 and is robust to relaxing the finite-moment hypothesis to finite second moment plus a Lipschitz observable.

For entries that are not Gaussian, three routes recover concentration. First, the bounded-differences (McDiarmid) inequality via a Doob martingale needs only boundedness and independence, charging $\sum_{k} c_{k}^{2}$; for $λ_{n}$ exposed entry by entry this budget grows with $n$, so the resulting bound is dimension-dependent and weaker than the Gaussian rate. Second, Talagrand's convex-distance inequality exploits that $λ_{n} (A) = \sup_{∥ v ∥ = 1} v^{*} A v$ is a convex Lipschitz function of the entries and delivers a dimension-free sub-Gaussian bound on bounded product spaces, matching the Gaussian rate. Third, the entropy method of Boucheron-Lugosi-Massart builds a modified log-Sobolev inequality directly from the bounded differences of $F$ and reproduces the Herbst conclusion combinatorially. A measure satisfying LSI is automatically sub-Gaussian, and LSI itself survives bounded perturbations of a Gaussian law via the Holley-Stroock perturbation lemma, which multiplies the LSI constant by $e^{osc V}$ for a bounded potential perturbation $V$.

The sharpest statements concern fluctuations rather than tails. While concentration shows $\int f d μ_{n} - E \int f d μ_{n}$ is $O (1/ n)$ in size, the centred linear statistic $\sum_{i} f (λ_{i}) - E \sum_{i} f (λ_{i})$ — note: no $1/ n$ — converges to a Gaussian with an $O (1)$ variance and no $n$ normalisation, a central limit theorem without the usual scaling, reflecting the strong eigenvalue rigidity that concentration first signals. The Herbst tail is the soft, non-asymptotic shadow of this rigidity: concentration is what makes the limiting Gaussian fluctuation a small correction to an essentially frozen spectrum.

Synthesis. The foundational reason a single inequality organises spectral concentration is that the log-Sobolev inequality controls the derivative of the free energy $λ^{- 1} \log E e^{λ F}$ uniformly over all Lipschitz observables, and the Herbst integration turns that local control into a global sub-Gaussian tail; this is exactly the entropy method, and it is dual to the moment method of 37.08.01 in the precise sense that moments pin the mean of each spectral statistic while Herbst pins its fluctuation around that mean. The three pillars (Hoffman-Wielandt making the eigenvalue map $1$-Lipschitz, the LSI of the Gaussian entry law, and the Herbst argument converting LSI to tails) compose into a single pipeline, and the $1/\sqrt{n}$ Lipschitz constant of a linear statistic versus the $1$ of a single eigenvalue is exactly what splits one self-averaging rate into two, speed $n$ and speed $n^{2}$. The central insight is that concentration is a statement about coordinate-sensitivity: a spectral functional with small Lipschitz constant in the Frobenius geometry of an LSI measure cannot stray from its mean, and this is the bridge from the deterministic limiting shape of the spectrum to the near-deterministic behaviour of every individual large matrix, which the non-Gaussian Talagrand and bounded-differences routes show is robust far beyond the Gaussian case where the constants are sharpest.

Full proof set Master

The Herbst Laplace-transform bound, its tail corollary, tensorisation, the Poincaré linearisation, the bounded-differences inequality, and the linear-statistic Lipschitz constant are proved in full above. The remaining Master claims are recorded here.

Proposition (Hoffman-Wielandt inequality). For Hermitian $A, B$ with eigenvalues $λ_{1} (A) \leq \dots \leq λ_{n} (A)$ and $λ_{1} (B) \leq \dots \leq λ_{n} (B)$ , $\sum_{i = 1}^{n} (λ_{i} (A) - λ_{i} (B))^{2} \leq ∥ A - B ∥_{2}^{2}$ .

Proof. Diagonalise $A = U diag (α) U^{*}$ and $B = V diag (β) V^{*}$ with $α, β$ increasing. Then $∥ A - B ∥_{2}^{2} = ∥ diag (α) - W diag (β) W^{*} ∥_{2}^{2}$ with $W = U^{*} V$ unitary. Expanding, $∥ A - B ∥_{2}^{2} = \sum_{i} α_{i}^{2} + \sum_{j} β_{j}^{2} - 2 \sum_{i, j} ∣ W_{ij} ∣^{2} α_{i} β_{j}$ . The matrix $P_{ij} = ∣ W_{ij} ∣^{2}$ is doubly stochastic, so $\sum_{i, j} P_{ij} α_{i} β_{j}$ is a convex combination of values $\sum_{i} α_{i} β_{π (i)}$ over permutations $π$ (Birkhoff's theorem: doubly stochastic matrices are convex combinations of permutation matrices). Maximising the bilinear form over permutations is achieved when $α$ and $β$ are sorted the same way (the rearrangement inequality), i.e. by the identity permutation given both are increasing. Hence $\sum_{i, j} P_{ij} α_{i} β_{j} \leq \sum_{i} α_{i} β_{i}$ , so $∥ A - B ∥_{2}^{2} \geq \sum_{i} (α_{i}^{2} + β_{i}^{2} - 2 α_{i} β_{i}) = \sum_{i} (α_{i} - β_{i})^{2}$ . $□$
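The Hoffman-Wielandt inequality is easy to test numerically. The following sketch draws random real symmetric pairs of assorted sizes (an illustrative choice of ensemble) and records the smallest observed gap between the two sides.

```python
import numpy as np

def random_symmetric(n, rng):
    g = rng.standard_normal((n, n))
    return (g + g.T) / 2.0

rng = np.random.default_rng(6)                # fixed seed, chosen arbitrarily
worst_gap = np.inf
for _ in range(500):
    n = int(rng.integers(2, 12))
    a, b = random_symmetric(n, rng), random_symmetric(n, rng)
    # eigvalsh returns sorted eigenvalues, so the pairing is index by index.
    lhs = np.sum((np.linalg.eigvalsh(a) - np.linalg.eigvalsh(b)) ** 2)
    rhs = np.linalg.norm(a - b) ** 2          # squared Frobenius norm
    worst_gap = min(worst_gap, rhs - lhs)

print(f"smallest observed ||A - B||^2 - sum_i (lambda_i(A) - lambda_i(B))^2: {worst_gap:.6f}")
```

A non-negative smallest gap is consistent with the proposition; equality occurs when $A$ and $B$ commute with compatibly ordered eigenvectors.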

Proposition (Gaussian log-Sobolev inequality, Gross). The standard Gaussian measure $γ$ on $R^{N}$ satisfies LSI with constant $c = 1$ : $Ent_{γ} (g^{2}) \leq 2 \int ∥\nabla g ∥_{2}^{2} d γ$ .

Proof. By tensorisation it suffices to take $N = 1$. Use the Ornstein-Uhlenbeck semigroup $P_{t}$ with generator $L = Δ - x \cdot \nabla$ and stationary measure $γ$, satisfying the commutation $\nabla P_{t} = e^{- t} P_{t} \nabla$. For $h = g^{2} \geq 0$ with $\int h d γ = 1$, write $Ent_{γ} (h) = - \int_{0}^{\infty} \frac{d}{d t} \int P_{t} h \log P_{t} h \, d γ \, d t$, using $P_{0} h = h$ and $P_{\infty} h = 1$. Differentiating and integrating by parts gives $\frac{d}{d t} \int P_{t} h \log P_{t} h \, d γ = - \int \frac{∥\nabla P_{t} h ∥^{2}}{P_{t} h} d γ$. The commutation and Cauchy-Schwarz bound $∥\nabla P_{t} h ∥^{2} \leq e^{- 2 t} (P_{t} ∥\nabla h ∥)^{2} \leq e^{- 2 t} P_{t} h \cdot P_{t} \frac{∥\nabla h ∥^{2}}{h}$, so $\int \frac{∥\nabla P_{t} h ∥^{2}}{P_{t} h} d γ \leq e^{- 2 t} \int \frac{∥\nabla h ∥^{2}}{h} d γ$. Integrating $e^{- 2 t}$ over $t \geq 0$ yields $Ent_{γ} (h) \leq \frac{1}{2} \int \frac{∥\nabla h ∥^{2}}{h} d γ$. Substituting $h = g^{2}$, $\frac{∥\nabla h ∥^{2}}{h} = 4∥\nabla g ∥^{2}$, gives $Ent_{γ} (g^{2}) \leq 2 \int ∥\nabla g ∥^{2} d γ$. $□$
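The one-dimensional Gaussian LSI can be checked by quadrature. The sketch below uses NumPy's probabilists' Gauss-Hermite rule; the two test functions are illustrative assumptions. For $g = e^{a x /2}$ both sides equal $\frac{a^{2}}{2} e^{a^{2}/2}$, so exponentials saturate the inequality, while $g = 2 + \sin x$ gives a strict inequality.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

nodes, weights = hermegauss(60)               # rule for the weight exp(-x^2 / 2)
weights = weights / weights.sum()             # normalise to the standard Gaussian law

def lsi_sides(g, dg):
    """Return (Ent_gamma(g^2), 2 * int g'^2 d gamma) computed by quadrature."""
    h = g(nodes) ** 2
    mass = np.sum(weights * h)
    ent = np.sum(weights * h * np.log(h)) - mass * np.log(mass)
    return ent, 2.0 * np.sum(weights * dg(nodes) ** 2)

a = 0.7
ent_exp, rhs_exp = lsi_sides(lambda x: np.exp(a * x / 2), lambda x: 0.5 * a * np.exp(a * x / 2))
ent_osc, rhs_osc = lsi_sides(lambda x: 2.0 + np.sin(x), np.cos)

print(f"g = exp(a x / 2):  Ent = {ent_exp:.8f}   2 int g'^2 = {rhs_exp:.8f}")
print(f"g = 2 + sin x:     Ent = {ent_osc:.8f}   2 int g'^2 = {rhs_osc:.8f}")
```

The equality case shows that the constant $c = 1$ in Gross's inequality cannot be lowered.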

Proposition (Holley-Stroock perturbation). If $μ$ satisfies LSI with constant $c$ and $d \tilde{μ} = Z^{- 1} e^{- V} d μ$ with $V$ bounded and oscillation $osc (V) = \sup V - \inf V$, then $\tilde{μ}$ satisfies LSI with constant $c e^{osc (V)}$.

Proof. Use the variational formula $Ent_{\tilde{μ}} (g^{2}) = \inf_{a > 0} \int (g^{2} \log \frac{g^{2}}{a} - g^{2} + a) d \tilde{μ}$, whose integrand is pointwise non-negative. The density ratio satisfies $e^{- \sup V} / Z \leq d \tilde{μ} / d μ \leq e^{- \inf V} / Z$. Bounding the density from above for each fixed $a$ and then taking the infimum over $a$ gives $Ent_{\tilde{μ}} (g^{2}) \leq \frac{e^{- \inf V}}{Z} Ent_{μ} (g^{2})$. Applying the LSI for $μ$ and then bounding the density from below, $Ent_{μ} (g^{2}) \leq 2 c \int ∥\nabla g ∥^{2} d μ \leq 2 c Z e^{\sup V} \int ∥\nabla g ∥^{2} d \tilde{μ}$. Combining the two estimates, the factors of $Z$ cancel and $Ent_{\tilde{μ}} (g^{2}) \leq 2 c e^{\sup V - \inf V} \int ∥\nabla g ∥^{2} d \tilde{μ}$, which is the LSI for $\tilde{μ}$ with constant $c e^{osc (V)}$. $□$

Proposition (concentration of the empirical spectral distribution). For a Gaussian Wigner ensemble and $1$ -Lipschitz $f$ , $P (∣ \int f d μ_{n} - E \int f d μ_{n} ∣ \geq t) \leq 2 exp (- c n^{2} t^{2})$ , and consequently $\int f d μ_{n} \to \int f d μ_{sc}$ almost surely.

Proof. The map $A \mapsto \int f d μ_{A}$ is $1/\sqrt{n}$-Lipschitz in Frobenius norm (Exercise 3 with $∥ f ∥_{Lip} = 1$). The Gaussian entry law satisfies LSI with constant of order $1/ n$ after the $n^{- 1/2}$ matrix scaling, so Herbst applies with $σ^{2} = (1/\sqrt{n})^{2} \cdot O (1/ n) = O (1/ n^{2})$, the squared Lipschitz constant times the LSI constant. This yields $P (∣ \int f d μ_{n} - E \int f d μ_{n} ∣ \geq t) \leq 2 e^{- c n^{2} t^{2}}$. Since $\sum_{n} e^{- c n^{2} t^{2}} < \infty$ for every $t > 0$, Borel-Cantelli gives $\int f d μ_{n} - E \int f d μ_{n} \to 0$ almost surely; combined with $E \int f d μ_{n} \to \int f d μ_{sc}$ from the moment method 37.08.01 for polynomial and then Lipschitz $f$ by approximation, $\int f d μ_{n} \to \int f d μ_{sc}$ almost surely. $□$
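The speed-$n^{2}$ rate predicts that the standard deviation of $\int f d μ_{n}$ scales like $1/n$. The sketch below probes this on simulated Gaussian matrices with the $n^{- 1/2}$ scaling and $f(x) = |x|$; the sizes, trial counts, and helper name are illustrative assumptions.

```python
import numpy as np

def linear_statistic_samples(n, trials, rng, f=np.abs):
    """Samples of (1/n) * sum_i f(lambda_i(M_n)) for a scaled Gaussian matrix M_n."""
    out = np.empty(trials)
    for t in range(trials):
        g = rng.standard_normal((n, n))
        m = (g + g.T) / np.sqrt(2.0 * n)      # off-diagonal variance 1/n
        out[t] = f(np.linalg.eigvalsh(m)).mean()
    return out

rng = np.random.default_rng(7)                # fixed seed, chosen arbitrarily
stds = {n: linear_statistic_samples(n, 300, rng).std() for n in (20, 80)}
for n, s in stds.items():
    print(f"n = {n:3d}   std = {s:.5f}   n * std = {n * s:.3f}")
```

If the product `n * std` stays roughly constant across sizes, the fluctuation is of order $1/n$, in line with the sub-Gaussian bound $2 e^{- c n^{2} t^{2}}$.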

Connections Master

The Wigner semicircle law and the moment method 37.08.01 supply the deterministic half that concentration leaves open. Herbst controls fluctuations around the mean $E \int f d μ_{n}$ , while the moment computation identifies that mean's limit as $\int f d μ_{sc}$ ; together they give almost-sure weak convergence of the empirical spectral distribution, with the concentration inequality replacing the combinatorial variance bound and surviving the relaxation of the finite-moment hypothesis.

Characteristic functions and the Lévy continuity theorem 37.03.01 are the moment-determinacy companion that turns moment-wise or test-function-wise convergence into weak convergence; concentration gives the probabilistic upgrade (a.s. convergence of each $\int f d μ_{n}$ ) while the continuity theorem gives the measure-theoretic one (pointwise-in- $f$ convergence implies $μ_{n} \Rightarrow μ_{sc}$ ), and the two are used in tandem in the proof above.

Cramér's theorem and large-deviation rate functions 37.07.02 sit on the other side of the same coin: concentration bounds the probability of an $O (1)$ deviation by $e^{- c n^{2} t^{2}}$ , while a large-deviation principle gives the exact exponential rate of rare spectral events, and the log-Sobolev constant controls the Gaussian-tail regime that the LDP rate function reproduces near its minimiser. The Herbst sub-Gaussian bound is the soft envelope of the sharp LDP speed.

The strong law of large numbers and Borel-Cantelli 37.02.02 is the engine that converts the summable concentration tail $\sum_{n} e^{- c n^{2} t^{2}} < \infty$ into almost-sure convergence; the self-averaging of a Lipschitz spectral statistic is the spectral-measure realisation of the law-of-large-numbers intuition that an average over many weakly-dependent contributions is deterministic in the limit.

Historical & philosophical context Master

The logarithmic Sobolev inequality for Gaussian measure was proved by Leonard Gross in 1975 ^{[Gross 1975]}, who established its equivalence with Edward Nelson's hypercontractivity of the Ornstein-Uhlenbeck semigroup; the inequality had appeared implicitly in quantum field theory before Gross gave it its measure-theoretic form and dimension-free constant. The exponential-integrability argument turning a log-Sobolev inequality into a sub-Gaussian Laplace-transform bound is due to Ira Herbst in an unpublished letter, transmitted through the functional-analysis community and recorded in the monographs of Davies-Simon and Ledoux ^{[Herbst]}; it is the differential-inequality computation $K^{'} \leq c /2$ reproduced above.

The application to random matrices was systematised by Alice Guionnet and Ofer Zeitouni in 2000 ^{[Guionnet 2000]}, who used the Lipschitz property of linear eigenvalue statistics together with Gaussian and Talagrand concentration to prove the empirical spectral measure self-averages at speed $n^{2}$ . The non-Gaussian product-space route rests on Michel Talagrand's 1995 convex-distance inequality ^{[Talagrand 1995]}, which gave dimension-free concentration on product spaces without any smoothness of the underlying law and reshaped the field; the concentration-of-measure phenomenon itself was identified earlier by Vitali Milman in the geometry of high-dimensional convex bodies and developed into a systematic theory in the surveys of Michel Ledoux. The bounded-differences inequality via the Doob martingale traces to Colin McDiarmid's 1989 survey and the earlier Azuma-Hoeffding martingale bound.

Bibliography Master

@article{gross1975,
  author  = {Gross, Leonard},
  title   = {Logarithmic Sobolev inequalities},
  journal = {American Journal of Mathematics},
  volume  = {97},
  number  = {4},
  pages   = {1061--1083},
  year    = {1975}
}

@article{guionnetzeitouni2000,
  author  = {Guionnet, Alice and Zeitouni, Ofer},
  title   = {Concentration of the spectral measure for large matrices},
  journal = {Electronic Communications in Probability},
  volume  = {5},
  pages   = {119--136},
  year    = {2000}
}

@article{talagrand1995,
  author  = {Talagrand, Michel},
  title   = {Concentration of measure and isoperimetric inequalities in product spaces},
  journal = {Publications Math\'ematiques de l'IH\'ES},
  volume  = {81},
  pages   = {73--205},
  year    = {1995}
}

@book{ledoux2001,
  author    = {Ledoux, Michel},
  title     = {The Concentration of Measure Phenomenon},
  series    = {Mathematical Surveys and Monographs},
  volume    = {89},
  publisher = {American Mathematical Society},
  year      = {2001}
}

@book{blm2013,
  author    = {Boucheron, St\'ephane and Lugosi, G\'abor and Massart, Pascal},
  title     = {Concentration Inequalities: A Nonasymptotic Theory of Independence},
  publisher = {Oxford University Press},
  year      = {2013}
}

@book{agz2010,
  author    = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
  title     = {An Introduction to Random Matrices},
  series    = {Cambridge Studies in Advanced Mathematics},
  volume    = {118},
  publisher = {Cambridge University Press},
  year      = {2010}
}

@incollection{mcdiarmid1989,
  author    = {McDiarmid, Colin},
  title     = {On the method of bounded differences},
  booktitle = {Surveys in Combinatorics, 1989},
  series    = {London Mathematical Society Lecture Note Series},
  volume    = {141},
  publisher = {Cambridge University Press},
  pages     = {148--188},
  year      = {1989}
}

Prerequisites

37.08.01

Tier anchors

beginner: Tao, Topics in Random Matrix Theory §2.3 (concentration of eigenvalue statistics); the physical picture of a self-averaging histogram that barely moves when you resample the matrix; Boucheron-Lugosi-Massart, Concentration Inequalities §1 (the bounded-differences picture)
intermediate: Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices §2.3 and §4.4 (Hoffman-Wielandt, log-Sobolev, Herbst); Boucheron-Lugosi-Massart, Concentration Inequalities Ch. 3, 5, 6; Ledoux, The Concentration of Measure Phenomenon Ch. 5
master: Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.3, §4.4; Ledoux, The Concentration of Measure Phenomenon (AMS Surveys 89, 2001) Ch. 5; Boucheron-Lugosi-Massart, Concentration Inequalities (Oxford, 2013); Guionnet-Zeitouni, Concentration of the spectral measure for large matrices, Electron. Commun. Probab. 5 (2000)

References

Gross — Logarithmic Sobolev inequalities · American Journal of Mathematics 97 (1975), 1061-1083 (the log-Sobolev inequality for Gaussian measure, equivalence with hypercontractivity)
Herbst — unpublished letter, reproduced in Davies-Simon · the exponential-integrability argument from a log-Sobolev inequality; see Ledoux, Concentration of Measure §2.3 and AGZ §2.3.2
Guionnet, Zeitouni — Concentration of the spectral measure for large matrices · Electronic Communications in Probability 5 (2000), 119-136 (Lipschitz concentration of linear eigenvalue statistics)
Talagrand — Concentration of measure and isoperimetric inequalities in product spaces · Publications Mathématiques de l'IHÉS 81 (1995), 73-205 (convex-Lipschitz concentration, the non-Gaussian route)
Anderson, Guionnet, Zeitouni — An Introduction to Random Matrices · Cambridge University Press, 2010, §2.3 and §4.4 (Hoffman-Wielandt, log-Sobolev, Herbst, spectral concentration)
Ledoux — The Concentration of Measure Phenomenon · American Mathematical Society, Mathematical Surveys and Monographs 89, 2001 (log-Sobolev, Herbst, Lipschitz concentration)

Estimated time

beginner: 20m
intermediate: 55m
master: 95m