Spectral Concentration: Log-Sobolev and the Herbst Argument
Anchor (Master): Anderson-Guionnet-Zeitouni, An Introduction to Random Matrices (Cambridge, 2010) §2.3, §4.4; Ledoux, The Concentration of Measure Phenomenon (AMS Surveys 89, 2001) Ch. 5; Boucheron-Lugosi-Massart, Concentration Inequalities (Oxford, 2013); Guionnet-Zeitouni, Concentration of the spectral measure for large matrices, Electron. Commun. Probab. 5 (2000)
Intuition Beginner
Imagine you build a large random symmetric grid of numbers, read off one summary statistic of its eigenvalues — say the average of all of them, or the size of the largest one — and write the answer down. Then you throw the whole grid away, build a fresh independent one of the same size, and measure the same statistic again. You might expect two independent random experiments to give noticeably different answers. The remarkable fact is that for a big grid they barely differ: the number you measure is almost the same every time, locked near one fixed value, with only tiny wobble. The statistic is self-averaging.
This is the concentration of measure phenomenon. The reason it happens is that the summary statistic depends on each of the many random entries only a little. A function of a great many independent inputs, none of which can swing the answer much on its own, is overwhelmingly likely to sit close to its average value. The more inputs there are, the tighter the answer clings to that average, and the chance of a large deviation shrinks faster than any power — it shrinks like a bell-curve tail.
There is a clean engine behind this for bell-curve entries. A single inequality about Gaussian draws, fed through a short argument, shows that any statistic which does not change much when you nudge its inputs cannot stray far from its average. That engine is what lets us prove the histogram of eigenvalues is essentially deterministic for large grids.
Visual Beginner
Picture a dartboard view of the same measurement repeated many times. The horizontal axis is the value of one eigenvalue statistic; the vertical axis counts how often each value came up across many independent random grids. For a small grid the darts scatter widely. For a large grid they cluster into a narrow spike centred on the average, and the spike gets narrower as the grid grows.
The table below shows the qualitative trend: a 1-Lipschitz statistic (one no single entry can move much) measured on grids of growing size $N$, with the typical spread of its values shrinking toward zero.
| Grid size | Typical spread of the statistic | Tail beyond spread |
|---|---|---|
| small | wide | fat, polynomial |
| medium | moderate | thinning |
| large | narrow | bell-curve thin |
| very large | tiny | essentially deterministic |
The picture is the visual content of self-averaging: the spread of a Lipschitz spectral statistic collapses as $N$ grows, and the leftover wobble has a thin, bell-curve-style tail rather than a fat one.
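The trend in the table can be reproduced with a small simulation. The following is a minimal sketch, assuming Gaussian entries scaled by $1/\sqrt{N}$ (a normalisation chosen here for illustration) and using the largest eigenvalue as the statistic; the helper names `goe_like` and `spread_of_largest_eigenvalue` are hypothetical.

```python
import numpy as np

def goe_like(n, rng):
    """Symmetric Gaussian matrix, scaled so the spectrum stays of order one (assumed scaling)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def spread_of_largest_eigenvalue(n, trials, rng):
    """Standard deviation of the largest eigenvalue over independent draws."""
    tops = [np.linalg.eigvalsh(goe_like(n, rng))[-1] for _ in range(trials)]
    return float(np.std(tops))

rng = np.random.default_rng(0)
spreads = {n: spread_of_largest_eigenvalue(n, 200, rng) for n in (10, 40, 160)}
for n, s in spreads.items():
    print(f"N = {n:4d}   spread of largest eigenvalue ~ {s:.3f}")
```

Each row repeats the "build a grid, measure, throw away" experiment 200 times; the printed spread should shrink as the grid grows.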
Worked example Beginner
We check, with concrete numbers, that one eigenvalue cannot move far when we nudge a single matrix entry — the property that drives concentration.
Step 1. Take the two-by-two symmetric grid with diagonal entries $3$ and $1$ and off-diagonal entry $0$. Being diagonal, its eigenvalues are just $3$ and $1$. The largest eigenvalue is $3$.
Step 2. Now nudge the off-diagonal entry from $0$ up to a small value $0.1$, leaving the diagonal alone. The grid is now $\begin{pmatrix} 3 & 0.1 \\ 0.1 & 1 \end{pmatrix}$. Its eigenvalues are the two numbers $2 \pm \sqrt{1 + 0.1^2}$.
Step 3. Compute the square root: $\sqrt{1.01} \approx 1.005$. So the eigenvalues are about $3.005$ and $0.995$. The largest moved from $3$ to about $3.005$.
Step 4. Compare the change in the answer to the change in the input. The input moved by $0.1$. The largest eigenvalue moved by about $0.005$, which is much less than $0.1$. The answer changed by less than the nudge.
Step 5. What this tells us: the largest eigenvalue is a 1-Lipschitz function of the entries in the right distance — when you measure the size of the nudge using the square-root-of-sum-of-squares distance on the entries, no eigenvalue ever moves by more than that size. Each single entry has limited leverage on the answer, and that limited leverage, summed over many independent entries, is exactly what forces the answer to concentrate near its average.
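The same check can be run numerically. This is a small sketch using the closed-form eigenvalues of a two-by-two symmetric matrix; the values `a = 3`, `b = 1`, `eps = 0.1` are illustrative assumptions, and `eigenvalues_2x2` is a hypothetical helper.

```python
import math

def eigenvalues_2x2(a, b, off):
    """Eigenvalues of the symmetric matrix [[a, off], [off, b]], smallest first."""
    mid = (a + b) / 2
    radius = math.hypot((a - b) / 2, off)
    return mid - radius, mid + radius

a, b, eps = 3.0, 1.0, 0.1            # illustrative values, not fixed by the theory
before = eigenvalues_2x2(a, b, 0.0)[1]
after = eigenvalues_2x2(a, b, eps)[1]
shift = after - before               # how far the largest eigenvalue moved

# Both off-diagonal slots change, so the square-root-of-sum-of-squares
# size of the nudge is sqrt(2) * eps.
frobenius_nudge = math.sqrt(2) * eps
print(f"eigenvalue shift = {shift:.6f}, nudge size = {frobenius_nudge:.6f}")
```

The shift stays below the size of the nudge, which is the 1-Lipschitz property in action for this one pair of matrices.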
Check your understanding Beginner
Formal definition Intermediate+
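Fix the space $\mathcal{H}_N$ of $N \times N$ real symmetric (or complex Hermitian) matrices, identified with $\mathbb{R}^d$ for the appropriate $d$ via the independent entries, and equipped with the Frobenius (Hilbert-Schmidt) norm $\|X\|_F = \big(\sum_{i,j} |X_{ij}|^2\big)^{1/2}$. Order the eigenvalues $\lambda_1(X) \le \cdots \le \lambda_N(X)$ [the spectral setup is from 37.08.01]. A function $F : \mathcal{H}_N \to \mathbb{R}$ is $L$-Lipschitz if $|F(X) - F(Y)| \le L\,\|X - Y\|_F$ for all $X, Y \in \mathcal{H}_N$.
Two notions of Lipschitz spectral statistic recur. The largest eigenvalue $\lambda_{\max}(X) = \lambda_N(X)$, and more generally each $\lambda_k(X)$, is $1$-Lipschitz in the Frobenius norm; this is the content of the Hoffman-Wielandt inequality below. The linear eigenvalue statistic $\int f\, d\mu_N = \frac{1}{N}\sum_{i=1}^N f(\lambda_i(X))$, where $\mu_N$ is the empirical spectral distribution and $f$ is a fixed test function, is $\|f\|_{\mathrm{Lip}}/\sqrt{N}$-Lipschitz when $f$ is Lipschitz; the $1/\sqrt{N}$ prefactor is the source of the sharp $N$-dependence in its fluctuations.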
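The $1/\sqrt{N}$ Lipschitz constant of a linear statistic can be probed numerically. Below is a minimal sketch, assuming the test function $f(x) = |x|$ (which is $1$-Lipschitz) and random symmetric pairs; `linear_statistic` and `random_symmetric` are hypothetical helper names. A finite sample can only fail to contradict the bound, not prove it.

```python
import numpy as np

def linear_statistic(x, f):
    """(1/N) * sum of f over the eigenvalues of the symmetric matrix x."""
    return float(np.mean(f(np.linalg.eigvalsh(x))))

def random_symmetric(n, rng):
    a = rng.standard_normal((n, n))
    return (a + a.T) / 2

rng = np.random.default_rng(1)
n = 50
f = np.abs                      # a 1-Lipschitz test function (assumed choice)
worst = 0.0
for _ in range(200):
    x = random_symmetric(n, rng)
    y = x + 0.1 * random_symmetric(n, rng)
    change = abs(linear_statistic(x, f) - linear_statistic(y, f))
    worst = max(worst, change / np.linalg.norm(x - y, "fro"))

bound = 1 / np.sqrt(n)          # claimed Lipschitz constant for ||f||_Lip = 1
print(f"largest observed ratio = {worst:.4f}, bound 1/sqrt(N) = {bound:.4f}")
```

Multiplying the statistic by $N$ (dropping the prefactor) multiplies every observed ratio by $N$ as well, so the corresponding bound becomes $\sqrt{N}$.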
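A probability measure $\mu$ on $\mathbb{R}^N$ satisfies a logarithmic Sobolev inequality (LSI) with constant $c > 0$ if for every smooth $g$ with $\int g^2 \, d\mu = 1$, $$ \mathrm{Ent}_\mu(g^2) := \int g^2 \log g^2 \, d\mu \;\le\; 2c \int \|\nabla g\|_2^2 \, d\mu , $$ where the relative entropy functional is $\mathrm{Ent}_\mu(h) = \int h \log h \, d\mu - \big(\int h \, d\mu\big)\log\big(\int h \, d\mu\big)$ for $h \ge 0$. Both sides are homogeneous of degree two in $g$, so the inequality $\mathrm{Ent}_\mu(g^2) \le 2c \int \|\nabla g\|_2^2 \, d\mu$ then holds for every smooth $g$ without the normalisation. The standard Gaussian measure $\gamma$ on $\mathbb{R}^N$ satisfies LSI with $c = 1$. A measure $\mu$ has sub-Gaussian concentration with variance proxy $\sigma^2$ if for every $1$-Lipschitz $F$, $$ \mathbb{P}\big(|F - \mathbb{E}_\mu F| \ge t\big) \le 2\exp\!\Big(-\frac{t^2}{2\sigma^2}\Big), \qquad t \ge 0 . $$ The Herbst argument is the implication LSI $\Rightarrow$ sub-Gaussian concentration with $\sigma^2 = c$.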
Counterexamples to common slips Intermediate+
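- Lipschitz in Frobenius, not in operator norm, is what the eigenvalue map satisfies cleanly. The bound $|\lambda_k(X) - \lambda_k(Y)| \le \|X - Y\|_{\mathrm{op}}$ (Weyl) is also true, but for concentration from independent entries the Frobenius geometry is the right one, because the Gaussian/product measure on entries is isotropic in $\|\cdot\|_F$, not in $\|\cdot\|_{\mathrm{op}}$.
- Concentration is around the mean (or median), not around the limiting value. LSI gives $|F - \mathbb{E}F|$ small; identifying $\mathbb{E}F$ with a deterministic limit (e.g. $\mathbb{E}\int f\, d\mu_N \to \int f\, d\mu_{\mathrm{sc}}$ for the semicircle law $\mu_{\mathrm{sc}}$) is a separate computation. Confusing the two conflates fluctuation control with a law of large numbers.
- The $1/N$ prefactor for linear statistics is not optional. The map $X \mapsto \frac{1}{N}\sum_i f(\lambda_i(X))$ has Lipschitz constant $\|f\|_{\mathrm{Lip}}/\sqrt{N}$, giving tails of order $\exp\!\big(-N^2 t^2/(C\|f\|_{\mathrm{Lip}}^2)\big)$ for Wigner-scaled Gaussian entries. Dropping the $1/N$ gives $\sum_i f(\lambda_i(X))$, a $\sqrt{N}\,\|f\|_{\mathrm{Lip}}$-Lipschitz object that does not concentrate at the same rate.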
- LSI is strictly stronger than a Poincaré (spectral-gap) inequality. Poincaré controls the variance and gives only exponential (not sub-Gaussian) tails for Lipschitz functions; LSI is needed for the Gaussian-square tail. A bounded-support entry distribution may fail LSI yet still concentrate via the bounded-differences route.
Key theorem with proof Intermediate+
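Theorem (Herbst argument: LSI implies sub-Gaussian concentration). Let $\mu$ be a probability measure on $\mathbb{R}^N$ satisfying a logarithmic Sobolev inequality with constant $c$. Let $F : \mathbb{R}^N \to \mathbb{R}$ be $1$-Lipschitz with respect to the Euclidean norm. Then $F$ is integrable, and for every $\lambda \in \mathbb{R}$, $$ \mathbb{E}_\mu\big[e^{\lambda (F - \mathbb{E}_\mu F)}\big] \le \exp\!\Big(\frac{c\,\lambda^2}{2}\Big), $$ and consequently $\mathbb{P}\big(|F - \mathbb{E}_\mu F| \ge t\big) \le 2\exp\!\big(-t^2/(2c)\big)$ for all $t \ge 0$.
Proof. Assume first that $F$ is bounded and smooth with $\|\nabla F\|_2 \le 1$ everywhere; the general $1$-Lipschitz case follows by mollification and truncation, since the bound is preserved under the limits. Apply the LSI to $g = e^{\lambda F/2}$, so that $g^2 = e^{\lambda F}$ and $\nabla g = \tfrac{\lambda}{2} e^{\lambda F/2}\,\nabla F$, giving $\|\nabla g\|_2^2 \le \tfrac{\lambda^2}{4} e^{\lambda F}$. Define the Laplace transform $H(\lambda) = \mathbb{E}_\mu\big[e^{\lambda F}\big]$. The LSI reads $$ \mathbb{E}_\mu\big[\lambda F\, e^{\lambda F}\big] - H(\lambda)\log H(\lambda) = \mathrm{Ent}_\mu(e^{\lambda F}) \le 2c\cdot \frac{\lambda^2}{4}\,\mathbb{E}_\mu\big[e^{\lambda F}\big] = \frac{c\lambda^2}{2}H(\lambda). $$ Recognise the left side through $H'(\lambda) = \mathbb{E}_\mu[F e^{\lambda F}]$ as $\lambda H'(\lambda) - H(\lambda)\log H(\lambda)$. Dividing by $\lambda^2 H(\lambda) > 0$ suggests the function $K(\lambda) = \tfrac{1}{\lambda}\log H(\lambda)$. A direct computation gives $$ K'(\lambda) = \frac{\lambda H'(\lambda) - H(\lambda)\log H(\lambda)}{\lambda^2 H(\lambda)} \le \frac{c}{2}. $$ As $\lambda \to 0$, $K(\lambda) \to H'(0)/H(0) = \mathbb{E}_\mu F$. Integrating from $0$ to $\lambda$ yields $K(\lambda) \le \mathbb{E}_\mu F + \tfrac{c\lambda}{2}$ for $\lambda > 0$, i.e. $H(\lambda) \le \exp\!\big(\lambda\,\mathbb{E}_\mu F + \tfrac{c\lambda^2}{2}\big)$; the same integration on $(\lambda, 0)$ handles $\lambda < 0$. Subtracting $\lambda\,\mathbb{E}_\mu F$ in the exponent gives the stated Laplace bound on $F - \mathbb{E}_\mu F$.
For the tail, Markov's inequality applied to $e^{\lambda (F - \mathbb{E}_\mu F)}$ gives, for $\lambda > 0$, $\mathbb{P}(F - \mathbb{E}_\mu F \ge t) \le \exp\!\big(-\lambda t + \tfrac{c\lambda^2}{2}\big)$. Optimising over $\lambda$ at $\lambda = t/c$ gives $\exp\!\big(-t^2/(2c)\big)$. Applying the same to $-F$ (also $1$-Lipschitz) and summing the two one-sided bounds yields the two-sided tail.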
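A quick Monte Carlo comparison makes the tail bound concrete. The sketch below assumes the standard Gaussian on $\mathbb{R}^{50}$ (so $c = 1$) and the $1$-Lipschitz function $F(x) = \|x\|_2$; both are illustrative choices, `herbst_tail_bound` is a hypothetical helper, and the sample mean stands in for $\mathbb{E}_\mu F$.

```python
import numpy as np

def herbst_tail_bound(t, c=1.0, lipschitz=1.0):
    """Two-sided bound 2 * exp(-t^2 / (2 c L^2)) for an L-Lipschitz function under LSI(c)."""
    return 2.0 * np.exp(-t**2 / (2.0 * c * lipschitz**2))

rng = np.random.default_rng(2)
samples = rng.standard_normal((20000, 50))      # draws from the standard Gaussian on R^50
values = np.linalg.norm(samples, axis=1)        # F(x) = ||x||_2 is 1-Lipschitz
centred = np.abs(values - values.mean())        # sample mean used in place of E[F]

rows = []
for t in (0.5, 1.0, 1.5, 2.0):
    empirical = float(np.mean(centred >= t))
    rows.append((t, empirical, float(herbst_tail_bound(t))))
    print(f"t = {t:.1f}   empirical tail = {empirical:.4f}   Herbst bound = {rows[-1][2]:.4f}")
```

The bound is not tight for this particular $F$; its strength is that the same expression covers every $1$-Lipschitz observable at once.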
Bridge. This argument builds toward every Gaussian-entry spectral concentration estimate in random matrix theory and appears again in the linear-statistic and largest-eigenvalue bounds below, where the only extra ingredient is the Lipschitz constant of the relevant spectral map. The proof works because of the differential inequality $K'(\lambda) \le c/2$ for $K(\lambda) = \lambda^{-1}\log H(\lambda)$: entropy controls the derivative of the free energy, and integrating that control is the passage from a local smoothness statement (LSI) to a global tail statement. This is the entropy method, and it generalises the classical Chernoff bound: there one bounds the cumulant generating function by hand for a specific distribution, whereas here LSI bounds it uniformly for every Lipschitz observable at once. The Herbst argument is dual to the moment method of 37.08.01: moments identify the limiting shape of the spectrum, while the Herbst tail certifies that each spectral statistic sticks to its mean. Low coordinate-sensitivity plus a log-Sobolev inequality is enough to force sub-Gaussian self-averaging.
Exercises Intermediate+
Advanced results Master
The Gaussian ensembles satisfy a dimension-free log-Sobolev inequality at the level of the entries, and this is the cleanest source of spectral concentration. For the Gaussian Orthogonal and Unitary Ensembles the entries are independent centred Gaussians, the joint law is a standard Gaussian on $\mathbb{R}^d$ after scaling, and Gross's theorem supplies LSI with constant of order one; tensorisation keeps the constant dimension-free. Composing with the Frobenius-Lipschitz bounds of the spectral maps under the Wigner scaling (entry variance of order $1/N$), the Herbst argument yields, for an absolute constant $C > 0$: $\mathbb{P}\big(|\lambda_{\max} - \mathbb{E}\lambda_{\max}| \ge t\big) \le 2e^{-N t^2/C}$ for the largest eigenvalue, and $\mathbb{P}\big(|\int f\, d\mu_N - \mathbb{E}\int f\, d\mu_N| \ge t\big) \le 2e^{-N^2 t^2/(C\|f\|_{\mathrm{Lip}}^2)}$ for a linear statistic against a Lipschitz $f$. The speed-$N^2$ in the second bound, versus speed-$N$ for a single eigenvalue, records that a smoothed statistic averages $N$ eigenvalues and so concentrates far more tightly; this is the quantitative form of self-averaging of the empirical spectral distribution.
The concentration is around the mean, and converting it into convergence of $\int f\, d\mu_N$ to the semicircle value $\int f\, d\mu_{\mathrm{sc}}$ requires separately controlling $\mathbb{E}\int f\, d\mu_N$, supplied by the moment method 37.08.01. The two pieces combine cleanly: the deterministic mean tracks the semicircle while the fluctuation is sub-Gaussian at speed $N^2$, so the Borel-Cantelli lemma gives almost-sure weak convergence without ever computing a variance by hand. The concentration inequality replaces the combinatorial variance bound used in 37.08.01 and is robust to relaxing the finite-moment hypothesis to finite second moment plus a Lipschitz observable.
For entries that are not Gaussian, three routes recover concentration. First, the bounded-differences (McDiarmid) inequality via a Doob martingale needs only boundedness and independence, charging each independent coordinate its maximal influence $c_i$ on the statistic; it gives the tail $2\exp\!\big(-2t^2/\sum_i c_i^2\big)$, a sub-Gaussian bound with suboptimal constants. Second, Talagrand's convex-distance inequality exploits that $\lambda_{\max}$ is a convex Lipschitz function of the entries and delivers a dimension-free sub-Gaussian bound on bounded product spaces, matching the Gaussian rate. Third, the entropy method of Boucheron-Lugosi-Massart builds a modified log-Sobolev inequality directly from the bounded differences of $F$ and reproduces the Herbst conclusion combinatorially. A measure satisfying LSI is automatically sub-Gaussian; conversely, LSI itself persists under bounded perturbations of Gaussians by the Holley-Stroock perturbation lemma, which multiplies the LSI constant by $e^{\mathrm{osc}(V)}$ for a bounded potential perturbation $V$.
The sharpest statements concern fluctuations rather than tails. While concentration shows $\int f\, d\mu_N - \mathbb{E}\int f\, d\mu_N$ is $O(1/N)$ in size, the centred linear statistic $\sum_i f(\lambda_i) - \mathbb{E}\sum_i f(\lambda_i)$ (note: no $1/N$) converges, for sufficiently smooth $f$, to a Gaussian with an $O(1)$ variance and no normalisation, a central limit theorem without the usual $\sqrt{N}$ scaling, reflecting the strong eigenvalue rigidity that concentration first signals. The Herbst tail is the soft, non-asymptotic shadow of this rigidity: concentration is what makes the limiting Gaussian fluctuation a small correction to an essentially frozen spectrum.
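The order-one variance of an unnormalised linear statistic can be seen in a short simulation. This sketch assumes a GOE-type scaling (off-diagonal variance $1/N$, diagonal variance $2/N$) and the test function $f(x) = x^2$, chosen only because $\sum_i \lambda_i^2$ equals the sum of squared entries, so a direct computation gives variance $4 + 4/N$ under this scaling; `goe` and `unnormalised_statistic` are hypothetical helpers.

```python
import numpy as np

def goe(n, rng):
    """GOE-type matrix: off-diagonal variance 1/n, diagonal variance 2/n (assumed scaling)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def unnormalised_statistic(x, f):
    """Sum of f over the eigenvalues, with no 1/N prefactor."""
    return float(np.sum(f(np.linalg.eigvalsh(x))))

rng = np.random.default_rng(4)
f = np.square                        # illustrative test function
variances = {}
for n in (20, 80):
    draws = [unnormalised_statistic(goe(n, rng), f) for _ in range(1000)]
    variances[n] = float(np.var(draws))
    print(f"N = {n:3d}   Var(sum f(lambda_i)) ~ {variances[n]:.2f}   (exact: {4 + 4 / n:.2f})")
```

A sum of $N$ independent order-one terms would have variance growing like $N$; here it stays near $4$ as $N$ quadruples.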
Synthesis. A single inequality organises spectral concentration because the log-Sobolev inequality controls the derivative of the free energy uniformly over all Lipschitz observables, and the Herbst integration turns that local control into a global sub-Gaussian tail; this is the entropy method, and it is dual to the moment method of 37.08.01 in the precise sense that moments pin the mean of each spectral statistic while Herbst pins its fluctuation around that mean. The three pillars (Hoffman-Wielandt making the eigenvalue map $1$-Lipschitz, the LSI of the Gaussian entry law, and the Herbst argument converting LSI to tails) compose into a single pipeline, and the $\|f\|_{\mathrm{Lip}}/\sqrt{N}$ Lipschitz constant of a linear statistic versus the $1$ of a single eigenvalue is exactly what splits one self-averaging rate into two, speed $N^2$ and speed $N$. Concentration is a statement about coordinate-sensitivity: a spectral functional with small Lipschitz constant in the Frobenius geometry of an LSI measure cannot stray from its mean, and this is the bridge from the deterministic limiting shape of the spectrum to the near-deterministic behaviour of every individual large matrix, which the non-Gaussian Talagrand and bounded-differences routes show is robust far beyond the Gaussian case where the constants are sharpest.
Full proof set Master
The Herbst Laplace-transform bound, its tail corollary, tensorisation, the Poincaré linearisation, the bounded-differences inequality, and the linear-statistic Lipschitz constant are proved in full above. The remaining Master claims are recorded here.
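Proposition (Hoffman-Wielandt inequality). For Hermitian $A, B$ with eigenvalues $\alpha_1 \le \cdots \le \alpha_N$ and $\beta_1 \le \cdots \le \beta_N$, $\sum_{i=1}^N |\alpha_i - \beta_i|^2 \le \|A - B\|_F^2$.
Proof. Diagonalise $A = U D_\alpha U^*$ and $B = V D_\beta V^*$ with $\alpha_i, \beta_i$ increasing. Then $\|A - B\|_F^2 = \|D_\alpha W - W D_\beta\|_F^2$ with $W = U^* V$ unitary. Expanding, $\|A - B\|_F^2 = \sum_i \alpha_i^2 + \sum_j \beta_j^2 - 2\sum_{i,j} \alpha_i \beta_j |W_{ij}|^2$. The matrix $\big(|W_{ij}|^2\big)_{i,j}$ is doubly stochastic, so $\sum_{i,j} \alpha_i \beta_j |W_{ij}|^2$ is a convex combination of values $\sum_i \alpha_i \beta_{\pi(i)}$ over permutations $\pi$ (Birkhoff's theorem: doubly stochastic matrices are convex combinations of permutation matrices). Maximising the bilinear form over permutations is achieved when $\alpha$ and $\beta$ are sorted the same way (the rearrangement inequality), i.e. by the identity permutation given both are increasing. Hence $\sum_{i,j} \alpha_i \beta_j |W_{ij}|^2 \le \sum_i \alpha_i \beta_i$, so $\|A - B\|_F^2 \ge \sum_i (\alpha_i - \beta_i)^2$.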
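The inequality is easy to spot-check on random Hermitian pairs. This is a minimal sketch with hypothetical helper names `random_hermitian` and `hoffman_wielandt_gap`; it relies on `numpy.linalg.eigvalsh` returning eigenvalues in ascending order, which matches the sorted pairing in the statement.

```python
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def hoffman_wielandt_gap(a, b):
    """||A - B||_F^2 minus sum_i (alpha_i - beta_i)^2; the inequality says this is >= 0."""
    alpha = np.linalg.eigvalsh(a)    # ascending order
    beta = np.linalg.eigvalsh(b)     # ascending order
    return float(np.linalg.norm(a - b, "fro") ** 2 - np.sum((alpha - beta) ** 2))

rng = np.random.default_rng(3)
gaps = [hoffman_wielandt_gap(random_hermitian(6, rng), random_hermitian(6, rng))
        for _ in range(1000)]
print(f"smallest gap over 1000 random pairs = {min(gaps):.6f}")

# Commuting case: two diagonal matrices with sorted entries give equality.
diag_gap = hoffman_wielandt_gap(np.diag([1.0, 2.0, 5.0]), np.diag([0.0, 3.0, 4.0]))
```

The diagonal pair shows the bound is attained when $W$ is the identity, which is the extremal permutation in the proof.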
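Proposition (Gaussian log-Sobolev inequality, Gross). The standard Gaussian measure $\gamma$ on $\mathbb{R}^N$ satisfies LSI with constant $c = 1$: $\mathrm{Ent}_\gamma(g^2) \le 2\int \|\nabla g\|_2^2 \, d\gamma$.
Proof. By tensorisation it suffices to take $N = 1$. Use the Ornstein-Uhlenbeck semigroup $(P_t)_{t \ge 0}$ with generator $Lh = h'' - x h'$ and stationary measure $\gamma$, satisfying the commutation $(P_t h)' = e^{-t} P_t(h')$. For smooth $h > 0$ bounded away from $0$ and $\infty$, write $\mathrm{Ent}_\gamma(h) = -\int_0^\infty \frac{d}{dt}\int P_t h \log P_t h \, d\gamma \, dt$, using $P_0 h = h$ and $P_t h \to \int h \, d\gamma$ as $t \to \infty$. Differentiating and integrating by parts gives $\frac{d}{dt}\int P_t h \log P_t h \, d\gamma = -\int \frac{|(P_t h)'|^2}{P_t h} \, d\gamma$. The commutation and Cauchy-Schwarz bound $|(P_t h)'|^2 = e^{-2t}|P_t(h')|^2 \le e^{-2t}\, P_t\big(h'^2/h\big)\, P_t h$, so $\int \frac{|(P_t h)'|^2}{P_t h} \, d\gamma \le e^{-2t}\int \frac{h'^2}{h} \, d\gamma$. Integrating over $t \in [0, \infty)$ yields $\mathrm{Ent}_\gamma(h) \le \frac{1}{2}\int \frac{h'^2}{h} \, d\gamma$. Substituting $h = g^2$, $h' = 2 g g'$, gives $\mathrm{Ent}_\gamma(g^2) \le 2\int g'^2 \, d\gamma$.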
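The one-dimensional inequality can be checked by quadrature. The sketch below uses Gauss-Hermite nodes from `numpy.polynomial.hermite_e.hermegauss`; the two test functions are illustrative choices, and `lsi_sides` is a hypothetical helper. For $g = e^{ax/2}$ both sides equal $\tfrac{a^2}{2}e^{a^2/2}$, so exponentials attain equality.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-x^2/2); normalise to a probability measure.
nodes, weights = hermegauss(60)
weights = weights / weights.sum()

def gaussian_mean(values):
    return float(np.dot(weights, values))

def lsi_sides(g, dg):
    """Return (Ent_gamma(g^2), 2 * integral of g'^2) under the standard Gaussian on R."""
    g2 = g(nodes) ** 2
    mass = gaussian_mean(g2)
    entropy = gaussian_mean(g2 * np.log(g2)) - mass * np.log(mass)
    energy = 2.0 * gaussian_mean(dg(nodes) ** 2)
    return entropy, energy

a = 0.8  # illustrative parameter
ent_exp, en_exp = lsi_sides(lambda x: np.exp(a * x / 2),
                            lambda x: (a / 2) * np.exp(a * x / 2))
ent_sin, en_sin = lsi_sides(lambda x: 1 + 0.5 * np.sin(x),
                            lambda x: 0.5 * np.cos(x))
print(f"exponential: entropy = {ent_exp:.6f}, energy side = {en_exp:.6f}")
print(f"1 + sin/2  : entropy = {ent_sin:.6f}, energy side = {en_sin:.6f}")
```

The exponential case is the same family $e^{\lambda F/2}$ with linear $F$ that the Herbst argument feeds into the LSI.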
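Proposition (Holley-Stroock perturbation). If $\mu$ satisfies LSI with constant $c$ and $d\nu = Z^{-1} e^{-V} \, d\mu$ with $V$ bounded and oscillation $\mathrm{osc}(V) = \sup V - \inf V$, then $\nu$ satisfies LSI with constant $c\, e^{\mathrm{osc}(V)}$.
Proof. For the entropy with respect to $\nu$, the variational formula $\mathrm{Ent}_\nu(h) = \inf_{s > 0} \int \big(h \log(h/s) - h + s\big) \, d\nu$ shows the integrand is non-negative and pointwise comparable across $\mu$ and $\nu$ up to the density ratio $d\nu/d\mu = Z^{-1} e^{-V}$. Hence $\mathrm{Ent}_\nu(g^2) \le Z^{-1} e^{-\inf V}\, \mathrm{Ent}_\mu(g^2)$ and $\int \|\nabla g\|_2^2 \, d\mu \le Z e^{\sup V} \int \|\nabla g\|_2^2 \, d\nu$; chaining these through the LSI for $\mu$ multiplies the LSI constant by $e^{\sup V - \inf V} = e^{\mathrm{osc}(V)}$.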
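Proposition (concentration of the empirical spectral distribution). For a Gaussian Wigner ensemble and $1$-Lipschitz $f$, there is an absolute constant $C > 0$ with $\mathbb{P}\big(|\int f\, d\mu_N - \mathbb{E}\int f\, d\mu_N| \ge t\big) \le 2\exp\!\big(-N^2 t^2/C\big)$, and consequently $\int f\, d\mu_N \to \int f\, d\mu_{\mathrm{sc}}$ almost surely, where $\mu_{\mathrm{sc}}$ is the semicircle law.
Proof. The map $X \mapsto \int f\, d\mu_N$ is $N^{-1/2}$-Lipschitz in Frobenius norm (Exercise 3 with $\|f\|_{\mathrm{Lip}} = 1$). The Gaussian entry law satisfies LSI with constant of order $1/N$ after the matrix scaling, so the composed observable has effective LSI-Lipschitz product $cL^2$ of order $N^{-1} \cdot N^{-1}$, giving $\sigma^2 = O(N^{-2})$. Herbst then yields $\mathbb{P}\big(|\int f\, d\mu_N - \mathbb{E}\int f\, d\mu_N| \ge t\big) \le 2\exp\!\big(-N^2 t^2/C\big)$. Since $\sum_N 2\exp\!\big(-N^2 t^2/C\big) < \infty$ for every $t > 0$, Borel-Cantelli gives $\int f\, d\mu_N - \mathbb{E}\int f\, d\mu_N \to 0$ almost surely; combined with $\mathbb{E}\int f\, d\mu_N \to \int f\, d\mu_{\mathrm{sc}}$ from the moment method 37.08.01 for polynomial $f$ and then Lipschitz $f$ by approximation, $\int f\, d\mu_N \to \int f\, d\mu_{\mathrm{sc}}$ almost surely.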
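The convergence can be watched on single draws. This is a rough sketch assuming a GOE-type scaling with off-diagonal variance $1/N$ (so the semicircle is supported on $[-2, 2]$) and the $1$-Lipschitz test function $f(x) = |x|$, whose semicircle integral is $8/(3\pi)$; `goe` and `linear_statistic` are hypothetical helpers.

```python
import numpy as np

def goe(n, rng):
    """GOE-type matrix with off-diagonal variance 1/n (assumed scaling)."""
    a = rng.standard_normal((n, n))
    return (a + a.T) / np.sqrt(2 * n)

def linear_statistic(x, f):
    """(1/N) * sum of f over the eigenvalues of x."""
    return float(np.mean(f(np.linalg.eigvalsh(x))))

# Integral of |x| against the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2].
semicircle_value = 8 / (3 * np.pi)

rng = np.random.default_rng(5)
errors = {}
for n in (50, 200, 800):
    errors[n] = abs(linear_statistic(goe(n, rng), np.abs) - semicircle_value)
    print(f"N = {n:4d}   |single-draw statistic - semicircle value| = {errors[n]:.5f}")
```

Each line uses one matrix only; no averaging over trials is needed, which is the practical meaning of almost-sure convergence at speed $N^2$.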
Connections Master
The Wigner semicircle law and the moment method 37.08.01 supply the deterministic half that concentration leaves open. Herbst controls fluctuations around the mean $\mathbb{E}\int f\, d\mu_N$, while the moment computation identifies that mean's limit as $\int f\, d\mu_{\mathrm{sc}}$, the integral against the semicircle law; together they give almost-sure weak convergence of the empirical spectral distribution, with the concentration inequality replacing the combinatorial variance bound and surviving the relaxation of the finite-moment hypothesis.
Characteristic functions and the Lévy continuity theorem 37.03.01 are the moment-determinacy companion that turns moment-wise or test-function-wise convergence into weak convergence; concentration gives the probabilistic upgrade (a.s. convergence of each $\int f\, d\mu_N$) while the continuity theorem gives the measure-theoretic one (pointwise-in-$f$ convergence implies $\mu_N \Rightarrow \mu_{\mathrm{sc}}$), and the two are used in tandem in the proof above.
Cramér's theorem and large-deviation rate functions 37.07.02 sit on the other side of the same coin: concentration bounds the probability of an order-one deviation $t$ of a linear statistic by $2e^{-N^2 t^2/C}$, while a large-deviation principle gives the exact exponential rate of rare spectral events, and the log-Sobolev constant controls the Gaussian-tail regime that the LDP rate function reproduces near its minimiser. The Herbst sub-Gaussian bound is the soft envelope of the sharp LDP speed.
The strong law of large numbers and Borel-Cantelli 37.02.02 is the engine that converts the summable concentration tail into almost-sure convergence; the self-averaging of a Lipschitz spectral statistic is the spectral-measure realisation of the law-of-large-numbers intuition that an average over many weakly-dependent contributions is deterministic in the limit.
Historical & philosophical context Master
The logarithmic Sobolev inequality for Gaussian measure was proved by Leonard Gross in 1975 [Gross 1975], who established its equivalence with Edward Nelson's hypercontractivity of the Ornstein-Uhlenbeck semigroup; the inequality had appeared implicitly in quantum field theory before Gross gave it its measure-theoretic form and dimension-free constant. The exponential-integrability argument turning a log-Sobolev inequality into a sub-Gaussian Laplace-transform bound is due to Ira Herbst in an unpublished letter, transmitted through the functional-analysis community and recorded in the monographs of Davies-Simon and Ledoux [Herbst]; it is the differential-inequality computation reproduced above.
The application to random matrices was systematised by Alice Guionnet and Ofer Zeitouni in 2000 [Guionnet 2000], who used the Lipschitz property of linear eigenvalue statistics together with Gaussian and Talagrand concentration to prove the empirical spectral measure self-averages at speed $N^2$. The non-Gaussian product-space route rests on Michel Talagrand's 1995 convex-distance inequality [Talagrand 1995], which gave dimension-free concentration on product spaces without any smoothness of the underlying law and reshaped the field; the concentration-of-measure phenomenon itself was identified earlier by Vitali Milman in the geometry of high-dimensional convex bodies and developed into a systematic theory in the surveys of Michel Ledoux. The bounded-differences inequality via the Doob martingale traces to Colin McDiarmid's 1989 survey and the earlier Azuma-Hoeffding martingale bound.
Bibliography Master
@article{gross1975,
author = {Gross, Leonard},
title = {Logarithmic Sobolev inequalities},
journal = {American Journal of Mathematics},
volume = {97},
number = {4},
pages = {1061--1083},
year = {1975}
}
@article{guionnetzeitouni2000,
author = {Guionnet, Alice and Zeitouni, Ofer},
title = {Concentration of the spectral measure for large matrices},
journal = {Electronic Communications in Probability},
volume = {5},
pages = {119--136},
year = {2000}
}
@article{talagrand1995,
author = {Talagrand, Michel},
title = {Concentration of measure and isoperimetric inequalities in product spaces},
journal = {Publications Math\'ematiques de l'IH\'ES},
volume = {81},
pages = {73--205},
year = {1995}
}
@book{ledoux2001,
author = {Ledoux, Michel},
title = {The Concentration of Measure Phenomenon},
series = {Mathematical Surveys and Monographs},
volume = {89},
publisher = {American Mathematical Society},
year = {2001}
}
@book{blm2013,
author = {Boucheron, St\'ephane and Lugosi, G\'abor and Massart, Pascal},
title = {Concentration Inequalities: A Nonasymptotic Theory of Independence},
publisher = {Oxford University Press},
year = {2013}
}
@book{agz2010,
author = {Anderson, Greg W. and Guionnet, Alice and Zeitouni, Ofer},
title = {An Introduction to Random Matrices},
series = {Cambridge Studies in Advanced Mathematics},
volume = {118},
publisher = {Cambridge University Press},
year = {2010}
}
@article{mcdiarmid1989,
author = {McDiarmid, Colin},
title = {On the method of bounded differences},
journal = {Surveys in Combinatorics (London Math. Soc. Lecture Note Ser. 141)},
pages = {148--188},
year = {1989}
}