The Lindeberg–Feller Central Limit Theorem
Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.4; Feller II §XV.6, §XVI.5-7; Petrov, Sums of Independent Random Variables (Springer, 1975) Ch. IV-V; Gnedenko-Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley, 1954) §§19-22
Intuition Beginner
A bell curve shows up whenever a quantity is the sum of many small independent pushes, and almost nothing about the individual pushes matters. The classical story assumes the pushes are identical copies of one another, but real measurement error, real noise, and real fluctuation rarely come from identical sources. A scale reading might combine a tiny temperature drift, a small vibration, a rounding error, and a faint electrical hiccup, each obeying its own law. The Lindeberg–Feller theorem explains why the total still piles up into a bell curve.
The key requirement is not that the pushes be identical but that no single push dominates the total. If one ingredient is huge compared with the rest, it stamps its own lopsided shape onto the sum and the bell curve never forms. So the theorem needs a precise way to say "every contributor is small relative to the whole." That condition is the Lindeberg condition: once you rescale so the total has spread one, the chance that any one contributor is responsible for a noticeable fraction of the spread must fade away as the number of contributors grows.
The surprising part is that this smallness condition is not just enough to force a bell curve, it is essentially the whole story. Feller proved a converse: if the contributors are all individually small and the sum settles into a bell curve, then the Lindeberg condition must have held. So the smallness of each piece and the emergence of the bell curve are two sides of one coin.
The one-sentence takeaway: a sum of many independent pieces becomes a bell curve exactly when each piece is negligibly small compared with the whole, and the Lindeberg condition is the precise meaning of "negligibly small."
Visual Beginner
Picture a row of independent contributions stacked into one total, then a second row with more contributions, then a third row with still more. In each row the contributions are first rescaled so the whole row sums to a quantity with spread one. As you move down the rows, the individual bars get thinner and more numerous, and the histogram of the row total slides closer to the standard bell shape.
The picture also shows what breaks the theorem. If in some row one bar stays fat while the others shrink, the "largest single share" gauge never reaches zero, the histogram keeps a permanent lump or skew, and no bell curve forms. The bell curve appears precisely when every bar's share of the total spread is driven down to nothing.
Worked example Beginner
Suppose in row number $n$ you add up $n$ independent contributions. The first contribution is a coin-flip worth plus or minus $1$, each with chance one half. The other $n-1$ contributions are tiny coin-flips, each worth plus or minus the small amount $1$ divided by the square root of $n$. We watch whether the total can become a bell curve.
Step 1. Find the spread of each contribution. The spread of a plus-or-minus value with equal chances is the size of that value. So the first contribution has spread $1$, and each tiny one has spread $1$ divided by the square root of $n$.
Step 2. Find the total spread squared, which adds across independent contributions. The first contributes $1$. Each tiny one contributes $1$ divided by $n$, and there are $n-1$ of them, contributing $n-1$ divided by $n$, which is just under $1$. So the total spread squared is about $2$.
Step 3. Find the largest single share. After rescaling so the total spread squared is $1$, the first contribution still carries about one half of the total spread squared, because $1$ out of $2$ is one half. That share does not shrink as $n$ grows.
Step 4. Read the verdict. Because one contribution permanently owns half the spread, not every piece is negligible, the Lindeberg condition fails, and the total cannot converge to a bell curve. Indeed the total always carries the non-bell imprint of that one fat coin flip.
Step 5. What this tells us: simply piling up more and more pieces does not guarantee a bell curve. The tiny pieces alone would converge to a bell curve, but the one stubborn large piece blocks it. The bell curve needs every contributor's share of the total spread to vanish, which is exactly what the Lindeberg condition demands.
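The arithmetic of Steps 1 to 3 can be checked exactly. The sketch below is illustrative only: the function name `dominant_share` is introduced here, and it reads the example as one flip of size 1 together with n - 1 flips of size 1/sqrt(n) in row n.

```python
from fractions import Fraction

def dominant_share(n: int) -> Fraction:
    """Share of the total variance carried by the single +/-1 flip in row n.

    Assumes row n holds one flip of size 1 and n - 1 flips of size 1/sqrt(n).
    """
    big_variance = Fraction(1)         # variance of a +/-1 flip is 1**2
    tiny_variance = Fraction(1, n)     # variance of a +/-1/sqrt(n) flip is 1/n
    total = big_variance + (n - 1) * tiny_variance
    return big_variance / total

for n in (2, 10, 100, 10_000):
    # The share equals n / (2n - 1): it decreases toward 1/2 but never below it.
    print(n, float(dominant_share(n)))
```

Exact fractions are used so that the share can be compared with one half without rounding error.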
Check your understanding Beginner
Formal definition Intermediate+
The natural setting is a triangular array. For each $n \ge 1$ let $X_{n,1}, \dots, X_{n,r_n}$ be independent (within the row) real random variables with $\mathbb{E}[X_{n,k}] = 0$ and $\sigma_{n,k}^2 = \mathbb{E}[X_{n,k}^2] < \infty$. Write the row sum and its variance as $$ S_n = \sum_{k=1}^{r_n} X_{n,k}, \qquad s_n^2 = \mathrm{Var}(S_n) = \sum_{k=1}^{r_n} \sigma_{n,k}^2 . $$ We standardise so that $s_n^2 = 1$ for every $n$ (replace $X_{n,k}$ by $X_{n,k}/s_n$); this loses no generality and is assumed throughout. The classical i.i.d. case is recovered by taking $r_n = n$ and $X_{n,k} = (Y_k - \mu)/(\sigma\sqrt{n})$ for an i.i.d. sequence $(Y_k)$ with mean $\mu$ and variance $\sigma^2$ [from 26.04.01].
Definition (Lindeberg condition). The array satisfies the Lindeberg condition if, with $s_n^2 = 1$, $$ L_n(\varepsilon) := \sum_{k=1}^{r_n} \mathbb{E}\!\left[ X_{n,k}^2 \,;\, |X_{n,k}| > \varepsilon \right] = \sum_{k=1}^{r_n} \mathbb{E}\!\left[ X_{n,k}^2 \,\mathbf{1}_{\{|X_{n,k}| > \varepsilon\}} \right] \xrightarrow[n \to \infty]{} 0 \quad \text{for every } \varepsilon > 0 . $$ Here $\mathbb{E}[Y; A]$ abbreviates $\mathbb{E}[Y \mathbf{1}_A]$. The condition asks that the variance contributed by the "large" parts of the variables be asymptotically negligible at every threshold.
Definition (uniform asymptotic negligibility). The array is uniformly asymptotically negligible (UAN), also called holospoudic, if $$ \max_{1 \le k \le r_n} \sigma_{n,k}^2 = \max_{1 \le k \le r_n} \mathbb{E}[X_{n,k}^2] \xrightarrow[n\to\infty]{} 0 . $$ This is the precise form of "no single contributor carries a fixed share of the total variance," since $\sum_{k} \sigma_{n,k}^2 = 1$.
Definition (Lyapunov condition). The array satisfies the Lyapunov condition of order $2+\delta$ for some $\delta > 0$ if $$ \Lambda_n(\delta) := \sum_{k=1}^{r_n} \mathbb{E}\!\left[ |X_{n,k}|^{2+\delta} \right] \xrightarrow[n\to\infty]{} 0 . $$
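The Lindeberg condition implies UAN: for any $\varepsilon > 0$, $\sigma_{n,k}^2 = \mathbb{E}[X_{n,k}^2; |X_{n,k}| \le \varepsilon] + \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] \le \varepsilon^2 + L_n(\varepsilon)$, so $\max_k \sigma_{n,k}^2 \le \varepsilon^2 + L_n(\varepsilon)$, and sending $n \to \infty$ then $\varepsilon \downarrow 0$ gives the negligibility. The Lyapunov condition implies Lindeberg: on $\{|X_{n,k}| > \varepsilon\}$ one has $X_{n,k}^2 \le \varepsilon^{-\delta} |X_{n,k}|^{2+\delta}$, so $L_n(\varepsilon) \le \varepsilon^{-\delta} \Lambda_n(\delta)$. Thus Lyapunov $\Rightarrow$ Lindeberg $\Rightarrow$ UAN, and none of the reverse implications holds in general.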
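The three functionals and the two implications between them can be evaluated exactly for arrays of discrete variables. The following is a minimal sketch, not a library routine: the helper names and the test array `rademacher_row` (n symmetric flips of size 1/sqrt(n)) are assumptions made here for illustration.

```python
import math

def lindeberg(row, eps):
    """L_n(eps) for a row given as a list of discrete laws.

    Each law is a list of (value, probability) pairs; the row is assumed
    centred and standardised so that the variances sum to 1.
    """
    return sum(p * x * x for law in row for x, p in law if abs(x) > eps)

def lyapunov(row, delta):
    """Lambda_n(delta): sum of absolute (2 + delta)-th moments over the row."""
    return sum(p * abs(x) ** (2 + delta) for law in row for x, p in law)

def max_variance(row):
    """Largest single variance in the row (the UAN quantity)."""
    return max(sum(p * x * x for x, p in law) for law in row)

def rademacher_row(n):
    """Hypothetical test array: n symmetric flips of size 1/sqrt(n)."""
    a = 1 / math.sqrt(n)
    return [[(a, 0.5), (-a, 0.5)] for _ in range(n)]

eps, delta = 0.05, 1.0
for n in (10, 100, 1000):
    row = rademacher_row(n)
    L, Lam, m = lindeberg(row, eps), lyapunov(row, delta), max_variance(row)
    lyapunov_controls_lindeberg = L <= eps ** (-delta) * Lam + 1e-12
    lindeberg_controls_uan = m <= eps ** 2 + L + 1e-12
    print(n, L, Lam, m, lyapunov_controls_lindeberg, lindeberg_controls_uan)
```

For this array every summand is bounded by 1/sqrt(n), so the truncated sum drops to zero as soon as 1/sqrt(n) falls below the threshold.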
Counterexamples to common slips Intermediate+
- Finite variances do not suffice. The worked Beginner example (a fixed $\pm 1$ flip plus many tiny flips) has every variance finite and the row variance bounded, yet the dominant flip keeps half the variance, UAN fails, Lindeberg fails, and there is no normal limit. Smallness of pieces, not finiteness of variances, is the load-bearing hypothesis.
- Lindeberg is strictly weaker than Lyapunov. There are arrays satisfying Lindeberg but no Lyapunov condition for any $\delta > 0$ (mass placed far out on rare events so that $(2+\delta)$-moments blow up while truncated second moments behave). Lyapunov is the convenient sufficient condition, not the sharp one.
- Independence within rows is essential, across rows irrelevant. The variables in a single row must be independent; nothing is assumed about the relationship between rows, and indeed in the i.i.d. embedding the rows overlap heavily. Reading the array as one long independent sequence is a mistake.
- Without the Lindeberg condition the limit need not be normal. Keep negligibility but drop the Lindeberg condition and the possible limits of row sums expand to the whole class of infinitely divisible laws; the normal law is singled out precisely by UAN together with vanishing variance on the large values.
Key theorem with proof Intermediate+
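Theorem (Lindeberg–Feller). Let $\{X_{n,k} : 1 \le k \le r_n,\ n \ge 1\}$ be a triangular array of row-wise independent random variables with $\mathbb{E}[X_{n,k}] = 0$ and $\sum_{k=1}^{r_n} \sigma_{n,k}^2 = 1$. Then:
(Sufficiency, Lindeberg.) If the Lindeberg condition $L_n(\varepsilon) \to 0$ holds for every $\varepsilon > 0$, then $S_n \xrightarrow{d} N(0,1)$.
(Converse, Feller.) Conversely, if the array is uniformly asymptotically negligible ($\max_k \sigma_{n,k}^2 \to 0$) and $S_n \xrightarrow{d} N(0,1)$, then the Lindeberg condition holds.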
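Proof. Write $\varphi_{n,k}(t) = \mathbb{E}[e^{itX_{n,k}}]$ for the characteristic function of $X_{n,k}$ [from 37.03.01]. By row independence the characteristic function of $S_n$ is the product $\varphi_{S_n}(t) = \prod_{k=1}^{r_n} \varphi_{n,k}(t)$. By the Lévy continuity theorem [from 37.03.01] it suffices to show $\varphi_{S_n}(t) \to e^{-t^2/2}$ for every fixed $t \in \mathbb{R}$.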
Sufficiency. Fix $t \in \mathbb{R}$. Two elementary bounds drive the argument. For real $x$, integrating the Taylor remainder gives $$ \left| e^{ix} - \Big(1 + ix - \tfrac{x^2}{2}\Big) \right| \le \min\!\left( |x|^3,\ x^2 \right), $$ the cubic bound from one extra term of Taylor, the quadratic bound from estimating $e^{ix} - 1 - ix$ and $\tfrac{x^2}{2}$ each within $\tfrac{x^2}{2}$. Apply this with $x = tX_{n,k}$ and take expectations, using $\mathbb{E}[X_{n,k}] = 0$ and $\mathbb{E}[X_{n,k}^2] = \sigma_{n,k}^2$: $$ \left| \varphi_{n,k}(t) - \Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \le \mathbb{E}\!\left[ \min\!\big( |t|^3 |X_{n,k}|^3,\ t^2 X_{n,k}^2 \big) \right]. $$ Split the expectation at the threshold $|X_{n,k}| \le \varepsilon$ versus $|X_{n,k}| > \varepsilon$, for an arbitrary $\varepsilon > 0$: on the small set use the cubic bound, $|t|^3 |X_{n,k}|^3 \le |t|^3 \varepsilon X_{n,k}^2$; on the large set use the quadratic bound, $t^2 X_{n,k}^2$. Summing over $k$, $$ \sum_{k=1}^{r_n} \left| \varphi_{n,k}(t) - \Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \le |t|^3 \varepsilon \sum_k \sigma_{n,k}^2 + t^2 \sum_k \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] = |t|^3 \varepsilon + t^2 L_n(\varepsilon). $$
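Letting $n \to \infty$ the second term vanishes by Lindeberg, and then letting $\varepsilon \downarrow 0$ kills the first; the whole sum tends to $0$. Next pass from the additive approximation to the product. Using the lemma that for complex numbers $a_1, \dots, a_m$ and $b_1, \dots, b_m$ with $|a_k|, |b_k| \le 1$ one has $\big| \prod_k a_k - \prod_k b_k \big| \le \sum_k |a_k - b_k|$, with $a_k = \varphi_{n,k}(t)$ and $b_k = 1 - \tfrac{t^2}{2}\sigma_{n,k}^2$ (both of modulus at most $1$ once $t^2 \max_k \sigma_{n,k}^2 \le 2$, which holds eventually by UAN), we get $$ \left| \varphi_{S_n}(t) - \prod_{k=1}^{r_n}\Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \xrightarrow[n\to\infty]{} 0 . $$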
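Finally evaluate the product. With $c_{n,k} := \tfrac{t^2}{2}\sigma_{n,k}^2 \to 0$ uniformly in $k$ (UAN) and $\sum_k c_{n,k} = \tfrac{t^2}{2}$, the bound $|\log(1 - c) + c| \le c^2$ for small $c \ge 0$ gives $\big| \sum_k \log(1 - c_{n,k}) + \tfrac{t^2}{2} \big| \le \sum_k c_{n,k}^2 \le \tfrac{t^2}{2} \max_k c_{n,k} \to 0$, so the product converges to $e^{-t^2/2}$. Combining, $\varphi_{S_n}(t) \to e^{-t^2/2}$, and Lévy continuity yields $S_n \xrightarrow{d} N(0,1)$.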
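Two ingredients of the sufficiency argument lend themselves to a numerical sanity check: the two-regime Taylor bound, and the convergence of the product of characteristic functions. The sketch below uses an array chosen here purely for illustration, namely $X_{n,k} = \sqrt{k}\,\epsilon_k / s_n$ with independent fair signs $\epsilon_k$ and $s_n^2 = n(n+1)/2$; its summands are non-identical, bounded by $\sqrt{2/(n+1)}$, and each has characteristic function $\cos(t\sqrt{k}/s_n)$. A grid check is evidence, not a proof.

```python
import cmath
import math

def taylor_gap(x: float) -> float:
    """|e^{ix} - (1 + ix - x^2/2)| for real x."""
    return abs(cmath.exp(1j * x) - (1 + 1j * x - x * x / 2))

def char_fn_row_sum(n: int, t: float) -> float:
    """Characteristic function of S_n for X_{n,k} = sqrt(k) * sign_k / s_n.

    A symmetric +/-a flip has characteristic function cos(t * a), and
    independence turns the row sum into a product of such factors.
    """
    s_n = math.sqrt(n * (n + 1) / 2)          # s_n^2 = 1 + 2 + ... + n
    return math.prod(math.cos(t * math.sqrt(k) / s_n) for k in range(1, n + 1))

# Step 1: the bound min(|x|^3, x^2) on a grid of x in [-20, 20].
grid = [k / 100 for k in range(-2000, 2001)]
bound_holds = all(taylor_gap(x) <= min(abs(x) ** 3, x * x) + 1e-12 for x in grid)
print("Taylor bound holds on grid:", bound_holds)

# Step 2: the product approaches exp(-t^2 / 2) as the row gets longer.
t = 1.5
target = math.exp(-t * t / 2)
for n in (5, 50, 500, 5000):
    print(n, char_fn_row_sum(n, t), target)
```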
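Converse. Assume UAN and $S_n \xrightarrow{d} N(0,1)$, so $\prod_k \varphi_{n,k}(t) \to e^{-t^2/2}$, which is never zero. Under UAN, $|\varphi_{n,k}(t) - 1| \le \tfrac{t^2}{2}\sigma_{n,k}^2 \to 0$ uniformly in $k$, so each factor is close to $1$ and the principal logarithm is available. The identity $\log z = (z - 1) + O(|z - 1|^2)$ for small $|z - 1|$ converts the product convergence into $\sum_k \big(\varphi_{n,k}(t) - 1\big) \to -\tfrac{t^2}{2}$. Taking real parts and using $\mathrm{Re}\,\varphi_{n,k}(t) = \mathbb{E}[\cos(t X_{n,k})]$, $$ \sum_{k=1}^{r_n} \mathbb{E}\big[\,1 - \cos(t X_{n,k})\,\big] \xrightarrow[n\to\infty]{} \tfrac{t^2}{2}. $$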
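Now use the two bounds $1 - \cos u \le \tfrac{u^2}{2}$ and $1 - \cos u \le 2$, valid for all real $u$. Fix $\varepsilon > 0$ and split each expectation at $|X_{n,k}| \le \varepsilon$ and $|X_{n,k}| > \varepsilon$. On the small set $1 - \cos(tX_{n,k}) \le \tfrac{t^2}{2} X_{n,k}^2$; summing the small-set parts is at most $\tfrac{t^2}{2} \sum_k \mathbb{E}[X_{n,k}^2; |X_{n,k}| \le \varepsilon] = \tfrac{t^2}{2}\big(1 - L_n(\varepsilon)\big)$. On the large set $1 - \cos(tX_{n,k}) \le 2$, and $X_{n,k}^2 > \varepsilon^2$ there, so by Chebyshev the large-set parts sum to at most $2 \sum_k \mathbb{P}(|X_{n,k}| > \varepsilon) \le 2\varepsilon^{-2} \sum_k \sigma_{n,k}^2 = 2/\varepsilon^2$. Inserting both bounds into the limit of the cosine sums gives $$ \tfrac{t^2}{2} \le \tfrac{t^2}{2}\Big(1 - \limsup_n L_n(\varepsilon)\Big) + \tfrac{2}{\varepsilon^2}. $$
Dividing by $\tfrac{t^2}{2}$ and rearranging gives $\limsup_n L_n(\varepsilon) \le \tfrac{4}{t^2 \varepsilon^2}$ for every $t \ne 0$. Holding $\varepsilon$ fixed and letting $t \to \infty$ gives $\limsup_n L_n(\varepsilon) = 0$ for each fixed $\varepsilon > 0$. The Lindeberg condition follows.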
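The pivot of the converse is the limit $\sum_k \mathbb{E}[1 - \cos(tX_{n,k})] \to t^2/2$. As a hedged illustration, the sketch below evaluates that sum for an array chosen here for convenience, $X_{n,k} = \sqrt{k}\,\epsilon_k/s_n$ with independent fair signs $\epsilon_k$ and $s_n^2 = n(n+1)/2$; this array is an assumption of the example, not part of the proof.

```python
import math

def cosine_sum(n: int, t: float) -> float:
    """sum_k E[1 - cos(t X_{n,k})] for X_{n,k} = sqrt(k) * sign_k / s_n."""
    s_n = math.sqrt(n * (n + 1) / 2)
    # A +/-a flip has E[1 - cos(t X)] = 1 - cos(t a), because cos is even.
    return sum(1 - math.cos(t * math.sqrt(k) / s_n) for k in range(1, n + 1))

t = 2.0
for n in (10, 100, 1000, 10000):
    # The sum stays below t^2 / 2 (since 1 - cos u <= u^2 / 2) and climbs toward it.
    print(n, cosine_sum(n, t), t * t / 2)
```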
Bridge. This theorem builds toward the full classification of limit laws for sums of independent variables and appears again in the proof of the Berry–Esseen rate and in the functional central limit theorem (Donsker's invariance principle), where triangular arrays of increments are exactly the objects in play. The foundational reason the result holds is that the characteristic function turns the row sum into a product, so the sum becomes additive in the logarithm, and the Lindeberg condition is precisely what controls the Taylor remainder that separates the true product from its Gaussian surrogate $e^{-t^2/2}$. This is exactly the i.i.d. central limit theorem when $X_{n,k} = (Y_k - \mu)/(\sigma\sqrt{n})$, recovered by checking Lindeberg through a single dominated-convergence step; the triangular-array statement generalises that special case to non-identical, non-stationary summands. Feller's converse closes the loop by showing that negligibility plus a normal limit can come from nothing but the Lindeberg condition, so sufficiency and necessity meet at one sharp hypothesis.
Exercises Intermediate+
Advanced results Master
The sufficiency direction quantifies into a convergence rate. When third moments are present, the Berry–Esseen theorem bounds the uniform distance between the distribution function of the standardised sum and the standard normal distribution function $\Phi$. For i.i.d. $Y_1, Y_2, \dots$ with mean $0$, variance $\sigma^2$, and $\rho = \mathbb{E}|Y_1|^3 < \infty$, $$ \sup_{x \in \mathbb{R}} \left| \mathbb{P}\!\left( \frac{1}{\sigma\sqrt n}\sum_{k=1}^n Y_k \le x \right) - \Phi(x) \right| \le \frac{C\,\rho}{\sigma^3 \sqrt n}, $$ with an absolute constant $C$; the admissible value of $C$ has been lowered repeatedly since Esseen's original constant, toward the conjectured extremal value attained by the two-point Bernoulli law. The non-identically-distributed version replaces $\rho/(\sigma^3\sqrt n)$ by the Lyapunov ratio $\Lambda_n(1) = \sum_k \mathbb{E}|X_{n,k}|^3$ of the standardised array. The proof is the Esseen smoothing inequality from 37.03.01 applied to $\varphi_{S_n}(t) - e^{-t^2/2}$: the cubic Taylor remainder, integrated against $1/|t|$ over a window $|t| \le T$ with $T$ of order $\sqrt n$, produces exactly the $n^{-1/2}$ rate.
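The $n^{-1/2}$ rate can be observed exactly for sums of fair $\pm 1$ flips, where $\sigma = \rho = 1$ and the distribution of the sum is binomial. The following sketch (function names are introduced here, and no particular value of $C$ is asserted) computes the uniform distance by checking both sides of every atom and prints it alongside its product with $\sqrt n$, which should remain bounded.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def kolmogorov_distance(n: int) -> float:
    """sup_x |P(S_n / sqrt(n) <= x) - Phi(x)| for a sum of n fair +/-1 flips.

    The distance between a step function and a continuous increasing one
    is attained at a jump, so both sides of every atom are checked.
    """
    worst, cdf = 0.0, 0.0
    for heads in range(n + 1):
        x = (2 * heads - n) / math.sqrt(n)       # atom of the standardised sum
        phi = normal_cdf(x)
        worst = max(worst, abs(cdf - phi))       # left limit at the atom
        cdf += math.comb(n, heads) / 2 ** n      # exact binomial mass
        worst = max(worst, abs(cdf - phi))       # value at the atom
    return worst

for n in (10, 100, 1000):
    d = kolmogorov_distance(n)
    print(n, d, d * math.sqrt(n))
```

For this symmetric lattice law the distance is driven by the jump at the centre of the distribution, which is why the rate cannot be improved beyond $n^{-1/2}$ in general.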
Without a third moment but with the bare Lindeberg condition, no universal rate exists (the convergence can be arbitrarily slow), yet a bound expressed through the truncated-moment functional $L_n(\varepsilon)$ (the Lindeberg ratio or its Zolotarev/Katz refinement) governs the speed, interpolating between the bounded-third-moment Berry–Esseen rate and the general qualitative statement. This is the genuinely sharp quantitative form of the Lindeberg theorem: the same truncated-second-moment functional that appears in the qualitative condition reappears as the rate.
Dropping the Lindeberg condition while keeping negligibility opens the door to non-normal limits, and the complete answer is the classification of limit laws for sums of independent variables. Under UAN, the possible weak limits of row sums of triangular arrays are exactly the infinitely divisible laws, with characteristic function of Lévy–Khinchine form [from 37.03.01]. The Gaussian case is singled out by the Lindeberg-type condition $L_n(\varepsilon) \to 0$, which forces the Lévy measure of the limit to vanish; relaxing it to convergence of those truncated-tail sums to a measure produces a general infinitely divisible limit. The Lindeberg–Feller theorem is thus the Gaussian fibre of the Gnedenko–Kolmogorov classification.
The condition is also exactly what is needed for the martingale and dependent generalisations. The martingale central limit theorem replaces row independence by a martingale-difference structure and replaces $\sum_k \sigma_{n,k}^2 = 1$ by convergence of the conditional variances $\sum_k \mathbb{E}[X_{n,k}^2 \mid \mathcal{F}_{n,k-1}] \to 1$ in probability, retaining a conditional Lindeberg condition verbatim. This is the route to the central limit theorem for stationary sequences, Markov chains, and stochastic-approximation algorithms, and it is the discrete skeleton of the functional central limit theorem (Donsker), where the array of increments of a random walk converges to Brownian motion.
Synthesis. The central insight is that the characteristic function converts the row sum into the product $\prod_k \varphi_{n,k}(t)$, and combining this with the Taylor remainder bound shows that the only obstruction to the Gaussian limit is the variance carried by large values, measured by $L_n(\varepsilon)$. So the Lindeberg condition is the foundational reason the bell curve appears, and Feller's converse shows it is dual to the conclusion itself, being recoverable from a normal limit under negligibility.
This is exactly why the i.i.d. theorem, the Lyapunov criterion, and the Berry–Esseen rate are one phenomenon viewed at three resolutions: identical summands are the immediate verification of Lindeberg, the Lyapunov $(2+\delta)$-moment is a convenient overshoot of it, and the Berry–Esseen bound quantifies the very same truncated-moment functional. The same mechanism generalises upward to the Gnedenko–Kolmogorov classification, where dropping the Lindeberg condition under negligibility lets the Lévy measure survive and the limit becomes a general infinitely divisible law, and sideways to the martingale central limit theorem, where conditioning replaces independence but the conditional Lindeberg condition is the bridge that carries the whole argument across.
Full proof set Master
The Lindeberg–Feller theorem in both directions, the implication chain among the conditions, and the recovery of the i.i.d. theorem are proved in the Key theorem and Exercises sections. The remaining Master claims are recorded here.
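Proposition (Lyapunov central limit theorem). Let $\{X_{n,k}\}$ be a row-wise independent triangular array with $\mathbb{E}[X_{n,k}] = 0$ and $\sum_k \sigma_{n,k}^2 = 1$. If $\Lambda_n(\delta) \to 0$ for some $\delta > 0$, then $S_n \xrightarrow{d} N(0,1)$.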
Proof. By the implication established in the Formal definition section, the Lyapunov condition gives $L_n(\varepsilon) \le \varepsilon^{-\delta} \Lambda_n(\delta) \to 0$ for every fixed $\varepsilon > 0$, so the Lindeberg condition holds. Apply the sufficiency half of the Lindeberg–Feller theorem.
Proposition (a normal limit need not give Lindeberg without negligibility). There is a row-wise independent array with $\mathbb{E}[X_{n,k}] = 0$ and $S_n \xrightarrow{d} N(0,1)$ for which the Lindeberg condition fails. Hence UAN is indispensable in Feller's converse.
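Proof. Let $Z \sim N(0,1)$ and let $W_n$ be independent of $Z$, taking values $\pm n$ each with probability $\tfrac{1}{2n^2}$ and $0$ with probability $1 - \tfrac{1}{n^2}$, so $\mathbb{E}[W_n] = 0$ and $\mathbb{E}[W_n^2] = 1$. Regard the row as the two variables $X_{n,1} = Z$ and $X_{n,2} = W_n$ (here $r_n = 2$), with $S_n = Z + W_n$. Then $S_n \xrightarrow{d} N(0,1)$ since $Z \sim N(0,1)$ and $W_n \to 0$ in probability. But $\sigma_{n,1}^2 = \sigma_{n,2}^2 = 1$, so UAN fails, and the large component keeps order-one variance at every threshold below $n$: for $\varepsilon < n$, $\mathbb{E}[W_n^2; |W_n| > \varepsilon] = 1$, so $L_n(\varepsilon) \ge 1$. The normal limit coexists with the failure of Lindeberg precisely because negligibility is absent. In this array $s_n^2 = 2$ rather than $1$; a standardised instance of the same phenomenon is the row $X_{n,1} = Z$, $X_{n,2} = 0$, where $S_n = Z$ is exactly normal while $L_n(\varepsilon) = \mathbb{E}[Z^2; |Z| > \varepsilon] > 0$ does not depend on $n$.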
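The mechanism of this counterexample, a rare spike that vanishes in probability yet keeps its full variance, can be made concrete with exact arithmetic. The sketch assumes, for the sake of having numbers, a spike variable $W_n$ equal to $\pm n$ with probability $1/(2n^2)$ each and $0$ otherwise; the helper names are hypothetical.

```python
from fractions import Fraction

def spike_law(n: int):
    """Assumed law of W_n: +/-n with probability 1/(2 n^2) each, else 0."""
    p = Fraction(1, 2 * n * n)
    return [(Fraction(n), p), (Fraction(-n), p), (Fraction(0), 1 - 2 * p)]

def truncated_second_moment(law, eps):
    """E[W^2; |W| > eps] for a discrete law given as (value, prob) pairs."""
    return sum(p * x * x for x, p in law if abs(x) > eps)

for n in (2, 10, 100):
    law = spike_law(n)
    prob_nonzero = sum(p for x, p in law if x != 0)
    # The spike becomes rarer (probability 1 / n^2) while its truncated
    # second moment stays equal to 1 for every threshold below n.
    print(n, float(prob_nonzero), truncated_second_moment(law, Fraction(1, 2)))
```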
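Proposition (Berry–Esseen via smoothing, i.i.d. case). With $Y_1, Y_2, \dots$ i.i.d., mean $0$, variance $\sigma^2$, $\rho = \mathbb{E}|Y_1|^3 < \infty$, and $F_n$ the distribution function of $\frac{1}{\sigma\sqrt n}\sum_{k=1}^n Y_k$, one has $\sup_x |F_n(x) - \Phi(x)| \le \frac{C\rho}{\sigma^3\sqrt n}$.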
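Proof sketch (full proof requires the Esseen constant optimisation). Standardise so $\sigma = 1$. The characteristic function of $n^{-1/2}\sum_{k=1}^n Y_k$ is $\varphi(t/\sqrt n)^n$ where $\varphi$ is the common characteristic function of $Y_k$. The cubic Taylor bound gives $\varphi(s) = 1 - \tfrac{s^2}{2} + R(s)$ with $|R(s)| \le \rho |s|^3$, so for $|t| \le T$ one obtains $|\varphi(t/\sqrt n)^n - e^{-t^2/2}| \le C_1\, \rho\, |t|^3 n^{-1/2} e^{-t^2/4}$ for an absolute constant $C_1$, by the elementary inequality $|a^n - b^n| \le n\,|a - b| \max(|a|,|b|)^{n-1}$ together with the Gaussian envelope. Feed this into the Esseen smoothing inequality from 37.03.01,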
$$
\sup_x |F_n(x) - \Phi(x)| \le \frac{1}{\pi}\int_{-T}^{T}\left|\frac{\varphi(t/\sqrt n)^n - e^{-t^2/2}}{t}\right|dt + \frac{24}{\pi T}\sup_x \Phi'(x),
$$
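with $T = c\sqrt n/\rho$ for a small absolute constant $c > 0$. The integral is $O(\rho/\sqrt n)$ and the boundary term is $O(1/T) = O(\rho/\sqrt n)$ by the choice of $T$; together they give the stated bound with an explicit, though non-optimal, $C$.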
Connections Master
Characteristic functions, inversion, and the Lévy continuity theorem 37.03.01 are the engine of this entire unit. The row sum's characteristic function is the product $\prod_k \varphi_{n,k}(t)$ by independence, convergence to $e^{-t^2/2}$ is read back as the normal law by Lévy continuity, and the quantitative Berry–Esseen rate is the Esseen smoothing inequality of that unit applied to the cubic Taylor remainder; every step here is a use of that machinery on a product of fingerprints rather than a single one.
The strong law of large numbers 37.02.02 is the companion limit theorem on the same i.i.d. sums: where the strong law identifies the almost-sure first-order behaviour $\tfrac{1}{n}\sum_{k=1}^n Y_k \to \mu$, the central limit theorem here describes the second-order Gaussian fluctuation around that limit, and the truncation arguments that drive Kolmogorov's three-series route to the strong law are the same-spirited truncations that produce the Lindeberg split here.
Sampling distributions and the central limit theorem 26.04.01 is the statistical face of this result: the i.i.d. corollary derived here is exactly the theorem that legitimises the normal approximation of the sample mean and the standard-error formula $\sigma/\sqrt{n}$, and the Lindeberg generalisation is what licenses normal approximations for weighted estimators and regression residuals where the summands are independent but not identically distributed.
The Lévy–Khinchine representation and infinitely divisible laws (a downstream unit in this chapter, 37.03.03) is the generalisation in which the negligibility-plus-Lindeberg pair is relaxed: dropping the truncated-variance condition lets the Lévy measure of the limit be nonzero, and the Lindeberg–Feller theorem becomes the Gaussian fibre of the full Gnedenko–Kolmogorov classification of limits of triangular arrays.
Historical & philosophical context Master
The problem of identifying the conditions under which a sum of independent variables is approximately normal was the central open problem of classical probability after Laplace. Lyapunov supplied the first general sufficient condition in 1901 [Lyapunov 1901], using the $(2+\delta)$-moment hypothesis and an early form of characteristic-function estimation. The decisive advance came from Jarl Waldemar Lindeberg, whose 1922 paper [Lindeberg 1922] introduced both the condition now bearing his name and a self-contained proof by direct comparison (the "Lindeberg swapping" method) that avoids characteristic functions entirely, replacing each summand by a Gaussian one term at a time. William Feller in 1935–37 [Feller 1935] proved the converse: under uniform asymptotic negligibility the Lindeberg condition is necessary as well as sufficient, sharpening Lindeberg's sufficient condition into a characterisation and completing the theorem now jointly named for them. Paul Lévy reached closely related results independently in the same period.
The quantitative theory was founded by Carl-Gustav Esseen, whose 1945 monograph [Esseen 1945] introduced the smoothing inequality and proved the $n^{-1/2}$ rate under a third moment, the result independently obtained by Andrew Berry; the sharp constant remains an active subject. The full classification of limit laws for triangular arrays under negligibility (the infinitely divisible laws) was completed by Khinchine, Lévy, Gnedenko, and Kolmogorov in the 1930s, placing the Lindeberg–Feller theorem as the Gaussian special case of a single representation theorem. The conceptual content is that normality is not a property of any individual summand but an emergent property of aggregation under negligibility: the Lindeberg condition isolates the exact sense in which the parts must be small for the whole to forget their individual shapes.
Bibliography Master
@article{lindeberg1922,
author = {Lindeberg, Jarl Waldemar},
title = {Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung},
journal = {Mathematische Zeitschrift},
volume = {15},
pages = {211--225},
year = {1922}
}
@article{feller1935,
author = {Feller, Willy},
title = {\"Uber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung},
journal = {Mathematische Zeitschrift},
volume = {40},
pages = {521--559},
year = {1935}
}
@article{lyapunov1901,
author = {Lyapunov, Aleksandr M.},
title = {Nouvelle forme du th\'eor\`eme sur la limite de probabilit\'e},
journal = {M\'emoires de l'Acad\'emie des Sciences de St.-P\'etersbourg},
volume = {12},
pages = {1--24},
year = {1901}
}
@article{esseen1945,
author = {Esseen, Carl-Gustav},
title = {Fourier analysis of distribution functions: a mathematical study of the Laplace-Gaussian law},
journal = {Acta Mathematica},
volume = {77},
pages = {1--125},
year = {1945}
}
@book{petrov1975,
author = {Petrov, Valentin V.},
title = {Sums of Independent Random Variables},
publisher = {Springer-Verlag, Berlin},
year = {1975}
}
@book{gnedenko1954,
author = {Gnedenko, Boris V. and Kolmogorov, Andrey N.},
title = {Limit Distributions for Sums of Independent Random Variables},
publisher = {Addison-Wesley, Cambridge, MA},
year = {1954}
}
@book{durrett2019cltff,
author = {Durrett, Rick},
title = {Probability: Theory and Examples},
edition = {5th},
publisher = {Cambridge University Press},
year = {2019}
}