37.03.02 · probability / 03-clt-characteristic-functions

The Lindeberg–Feller Central Limit Theorem


Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.4; Feller II §XV.6, §XVI.5-7; Petrov, Sums of Independent Random Variables (Springer, 1975) Ch. IV-V; Gnedenko-Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley, 1954) §§19-22

Intuition Beginner

A bell curve shows up whenever a quantity is the sum of many small independent pushes, and almost nothing about the individual pushes matters. The classical story assumes the pushes are identical copies of one another, but real measurement error, real noise, and real fluctuation rarely come from identical sources. A scale reading might combine a tiny temperature drift, a small vibration, a rounding error, and a faint electrical hiccup, each obeying its own law. The Lindeberg–Feller theorem explains why the total still piles up into a bell curve.

The key requirement is not that the pushes be identical but that no single push dominates the total. If one ingredient is huge compared with the rest, it stamps its own lopsided shape onto the sum and the bell curve never forms. So the theorem needs a precise way to say "every contributor is small relative to the whole." That condition is the Lindeberg condition: once you rescale so the total has spread one, the chance that any one contributor is responsible for a noticeable fraction of the spread must fade away as the number of contributors grows.

The surprising part is that this smallness condition is not just enough to force a bell curve, it is essentially the whole story. Feller proved a converse: if the contributors are all individually small and the sum settles into a bell curve, then the Lindeberg condition must have held. So the smallness of each piece and the emergence of the bell curve are two sides of one coin.

The one-sentence takeaway: a sum of many independent pieces becomes a bell curve exactly when each piece is negligibly small compared with the whole, and the Lindeberg condition is the precise meaning of "negligibly small."

Visual Beginner

Picture a row of independent contributions stacked into one total, then a second row with more contributions, then a third row with still more. In each row the contributions are first rescaled so the whole row sums to a quantity with spread one. As you move down the rows, the individual bars get thinner and more numerous, and the histogram of the row total slides closer to the standard bell shape.

The picture also shows what breaks the theorem. If in some row one bar stays fat while the others shrink, the "largest single share" gauge never reaches zero, the histogram keeps a permanent lump or skew, and no bell curve forms. The bell curve appears precisely when every bar's share of the total spread is driven down to nothing.

Worked example Beginner

Suppose in row number $n$ you add up $n$ independent contributions. The first contribution is a coin-flip worth plus or minus $1$ , each with chance one half. The other $n - 1$ contributions are tiny coin-flips, each worth plus or minus the small amount $1$ divided by the square root of $n$ . We watch whether the total can become a bell curve.

Step 1. Find the spread of each contribution. The spread of a plus-or-minus value with equal chances is the size of that value. So the first contribution has spread $1$ , and each tiny one has spread $1$ divided by the square root of $n$ .

Step 2. Find the total spread squared, which adds across independent contributions. The first contributes $1$ . Each tiny one contributes $1$ divided by $n$ , and there are $n - 1$ of them, contributing $(n - 1)$ divided by $n$ , which is just under $1$ . So the total spread squared is about $2$ .

Step 3. Find the largest single share. After rescaling so the total spread squared is $1$ , the first contribution still carries about one half of the total spread squared, because $1$ out of $2$ is one half. That share does not shrink as $n$ grows.

Step 4. Read the verdict. Because one contribution permanently owns half the spread, no single piece is negligible, the Lindeberg condition fails, and the total cannot converge to a bell curve. Indeed the total always carries the lopsided imprint of that one fat coin flip.

Step 5. What this tells us: simply piling up more and more pieces does not guarantee a bell curve. The tiny pieces alone would converge to a bell curve, but the one stubborn large piece blocks it. The bell curve needs every contributor's share of the total spread to vanish, which is exactly what the Lindeberg condition demands.
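The arithmetic of Steps 2 and 3 can be checked numerically. The following is a minimal sketch, not part of the original exercise; the function name `largest_share` is a hypothetical helper introduced only for illustration.

```python
def largest_share(n: int) -> float:
    """Share of the total spread squared owned by the single +/-1 flip in row n."""
    big_variance = 1.0            # spread squared of the +/-1 coin flip
    small_variance = 1.0 / n      # spread squared of each +/-1/sqrt(n) flip
    total = big_variance + (n - 1) * small_variance
    return big_variance / total


# The share stays near one half however many tiny flips are added.
for n in (10, 100, 10_000):
    print(n, round(largest_share(n), 4))
```

The printed share never drops below one half, which is the numerical face of "one contribution permanently owns half the spread."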

Check your understanding Beginner

Exercise (easy, multiple choice).

What does the Lindeberg condition require of the individual contributions to a sum?

A. They are identically distributed. B. Each one is negligibly small compared with the total spread. C. Each one has the same mean. D. They are positively correlated.

Hint

The theorem drops the assumption that the pieces are copies of one another, but keeps one essential smallness requirement.

Answer

B. Each one is negligibly small compared with the total spread. The Lindeberg condition says that after rescaling so the total spread is one, the part of the spread coming from contributions exceeding any fixed size fades to zero as the number of contributions grows. Feedback-correct: this is exactly the "no single piece dominates" requirement that makes the bell curve emerge. Feedback-wrong: A is the stronger assumption of the classical theorem, which Lindeberg-Feller deliberately removes; C and D are neither required nor sufficient, and independence (not correlation) is what is assumed.

Formal definition Intermediate+

The natural setting is a triangular array. For each $n \geq 1$ let $X_{n,1}, X_{n,2}, \dots, X_{n,r_n}$ be independent (within the row) real random variables with $\mathbb{E}[X_{n,k}] = 0$ and $\mathrm{Var}(X_{n,k}) = \sigma_{n,k}^2 < \infty$. Write the row sum and its variance as $$ S_n = \sum_{k=1}^{r_n} X_{n,k}, \qquad s_n^2 = \mathrm{Var}(S_n) = \sum_{k=1}^{r_n} \sigma_{n,k}^2 . $$ We standardise so that $s_n^2 = 1$ for every $n$ (replace $X_{n,k}$ by $X_{n,k}/s_n$); this loses no generality and is assumed throughout. The classical i.i.d. case is recovered by taking $r_n = n$ and $X_{n,k} = (Y_k - \mu)/(\sigma\sqrt{n})$ for an i.i.d. sequence $Y_k$ with mean $\mu$ and variance $\sigma^2$ [from 26.04.01].

Definition (Lindeberg condition). The array satisfies the Lindeberg condition if, with $s_n^2 = 1$, $$ L_n(\varepsilon) := \sum_{k=1}^{r_n} \mathbb{E}\!\left[ X_{n,k}^2 \,;\, |X_{n,k}| > \varepsilon \right] = \sum_{k=1}^{r_n} \mathbb{E}\!\left[ X_{n,k}^2 \,\mathbf{1}_{\{|X_{n,k}| > \varepsilon\}} \right] \xrightarrow[n \to \infty]{} 0 \quad \text{for every } \varepsilon > 0 . $$ Here $\mathbb{E}[Z; A]$ abbreviates $\mathbb{E}[Z\,\mathbf{1}_A]$. The condition asks that the variance contributed by the "large" parts of the variables be asymptotically negligible at every threshold.

Definition (uniform asymptotic negligibility). The array is uniformly asymptotically negligible (UAN), also called holospoudic, if $$ \max_{1 \le k \le r_n} \sigma_{n,k}^2 = \max_{1 \le k \le r_n} \mathbb{E}[X_{n,k}^2] \xrightarrow[n\to\infty]{} 0 . $$ This is the precise form of "no single contributor carries a fixed share of the total variance," since $s_{n}^{2} = 1$ .

Definition (Lyapunov condition). The array satisfies the Lyapunov condition of order $2 + \delta$ for some $\delta > 0$ if $$ \Lambda_n(\delta) := \sum_{k=1}^{r_n} \mathbb{E}\!\left[ |X_{n,k}|^{2+\delta} \right] \xrightarrow[n\to\infty]{} 0 . $$

The Lindeberg condition implies UAN: for any $\varepsilon > 0$, $\sigma_{n,k}^2 = \mathbb{E}[X_{n,k}^2; |X_{n,k}| \le \varepsilon] + \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] \le \varepsilon^2 + L_n(\varepsilon)$, so $\max_k \sigma_{n,k}^2 \le \varepsilon^2 + L_n(\varepsilon)$, and sending $n \to \infty$ then $\varepsilon \to 0$ gives the negligibility. The Lyapunov condition implies Lindeberg: on $\{|X_{n,k}| > \varepsilon\}$ one has $X_{n,k}^2 \le |X_{n,k}|^{2+\delta}/\varepsilon^{\delta}$, so $L_n(\varepsilon) \le \varepsilon^{-\delta}\Lambda_n(\delta) \to 0$. Thus Lyapunov $\Rightarrow$ Lindeberg $\Rightarrow$ UAN, and none of the reverse implications holds in general.
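The three quantities in the chain can be computed exactly for arrays whose variables take finitely many values. The sketch below is an illustration under stated assumptions: the row representation (each variable a list of `(value, probability)` pairs, mean zero) and all function names are hypothetical choices made here, and the test array is the standardised fair-sign row $X_{n,k} = \pm 1/\sqrt{n}$.

```python
from math import sqrt

# A row is a list of variables; each variable is a list of (value, probability)
# pairs and is assumed to have mean zero.


def lindeberg_sum(row, eps):
    """L_n(eps): second moment carried by values with |x| > eps, summed over the row."""
    return sum(p * x * x for var in row for x, p in var if abs(x) > eps)


def lyapunov_sum(row, delta):
    """Lambda_n(delta): sum over the row of E|X|^(2 + delta)."""
    return sum(p * abs(x) ** (2 + delta) for var in row for x, p in var)


def max_variance(row):
    """Largest single variance in the row (the UAN quantity)."""
    return max(sum(p * x * x for x, p in var) for var in row)


def rademacher_row(n):
    """Row of n independent fair signs scaled to +/- 1/sqrt(n)."""
    a = 1.0 / sqrt(n)
    return [[(-a, 0.5), (a, 0.5)] for _ in range(n)]


eps, delta = 0.1, 1.0
for n in (50, 200, 800):
    row = rademacher_row(n)
    L = lindeberg_sum(row, eps)
    Lam = lyapunov_sum(row, delta)
    m = max_variance(row)
    # Both inequalities of the chain hold row by row.
    print(n, L, Lam, m, L <= eps ** -delta * Lam, m <= eps ** 2 + L)
```

For this array $L_n(\varepsilon)$ equals the full variance while $1/\sqrt{n} > \varepsilon$ and drops to zero afterwards, so the Lindeberg sum vanishes abruptly rather than gradually.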

Counterexamples to common slips Intermediate+

Finite variances do not suffice. The worked Beginner example — a fixed $\pm 1$ flip plus many tiny flips — has every variance finite and the row variance bounded, yet the dominant flip keeps half the variance, UAN fails, Lindeberg fails, and there is no normal limit. Smallness of pieces, not finiteness of variances, is the load-bearing hypothesis.
Lindeberg is strictly weaker than Lyapunov. There are arrays satisfying Lindeberg but no Lyapunov condition for any $δ > 0$ (mass placed far out on rare events so that $2 + δ$ moments blow up while truncated second moments behave). Lyapunov is the convenient sufficient condition, not the sharp one.
Independence within rows is essential, across rows irrelevant. The variables in a single row must be independent; nothing is assumed about the relationship between rows, and indeed in the i.i.d. embedding the rows overlap heavily. Reading the array as one long independent sequence is a mistake.
Without the Lindeberg condition the limit need not be normal. Under UAN alone the possible limits of row sums range over the infinitely divisible laws (and without UAN a single dominant summand can impose its own law on the sum); the normal law is singled out precisely by UAN together with vanishing truncated-tail variance $L_n(\varepsilon) \to 0$.

Key theorem with proof Intermediate+

Theorem (Lindeberg–Feller). Let $\{X_{n,k} : 1 \le k \le r_n\}$ be a triangular array of row-wise independent random variables with $\mathbb{E}[X_{n,k}] = 0$ and $\sum_k \mathrm{Var}(X_{n,k}) = s_n^2 = 1$. Then:

(Sufficiency, Lindeberg.) If the Lindeberg condition $L_{n} (ε) \to 0$ holds for every $ε > 0$ , then $S_{n} = \sum_{k} X_{n, k} \Rightarrow N (0, 1)$ .

(Converse, Feller.) Conversely, if the array is uniformly asymptotically negligible ($\max_k \sigma_{n,k}^2 \to 0$) and $S_n \Rightarrow N(0,1)$, then the Lindeberg condition holds.

Proof. Write $φ_{n, k} (t) = E [e^{i t X_{n, k}}]$ for the characteristic function of $X_{n, k}$ [from 37.03.01]. By row independence the characteristic function of $S_{n}$ is the product $φ_{S_{n}} (t) = \prod_{k = 1}^{r_{n}} φ_{n, k} (t)$ . By the Lévy continuity theorem [from 37.03.01] it suffices to show $φ_{S_{n}} (t) \to e^{- t^{2} /2}$ for every fixed $t$ .

Sufficiency. Fix $t$. Two elementary bounds drive the argument. For real $x$, integrating the Taylor remainder gives $$ \left| e^{ix} - \Big(1 + ix - \tfrac{x^2}{2}\Big) \right| \le \min\!\left( |x|^3,\ x^2 \right), $$ the cubic bound from one extra term of Taylor, the quadratic bound from estimating $e^{ix} - (1 + ix)$ and the term $\tfrac{x^2}{2}$ each by $x^2/2$. Apply this with $x = tX_{n,k}$ and take expectations, using $\mathbb{E}[X_{n,k}] = 0$ and $\mathbb{E}[X_{n,k}^2] = \sigma_{n,k}^2$: $$ \left| \varphi_{n,k}(t) - \Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \le \mathbb{E}\!\left[ \min\!\big( |t|^3 |X_{n,k}|^3,\ t^2 X_{n,k}^2 \big) \right]. $$ Split the expectation at the threshold $|X_{n,k}| \le \varepsilon$ versus $|X_{n,k}| > \varepsilon$: on the small set use the cubic bound, $|t|^3 |X_{n,k}|^3 \le |t|^3 \varepsilon X_{n,k}^2$; on the large set use the quadratic bound, $t^2 X_{n,k}^2$. Summing over $k$, $$ \sum_{k=1}^{r_n} \left| \varphi_{n,k}(t) - \Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \le |t|^3 \varepsilon \sum_k \sigma_{n,k}^2 + t^2 \sum_k \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] = |t|^3 \varepsilon + t^2 L_n(\varepsilon). $$

Letting $n \to \infty$ the second term vanishes by Lindeberg, and then letting $\varepsilon \to 0$ kills the first; the whole sum tends to $0$. Next pass from the additive approximation to the product. Using the lemma that for complex numbers with $|a_k|, |b_k| \le 1$ one has $|\prod a_k - \prod b_k| \le \sum |a_k - b_k|$, with $a_k = \varphi_{n,k}(t)$ and $b_k = 1 - \tfrac{t^2}{2}\sigma_{n,k}^2$ (both of modulus at most $1$ once $\max_k \sigma_{n,k}^2 \le 1/t^2$, which holds eventually by UAN, itself a consequence of Lindeberg), we get $$ \left| \varphi_{S_n}(t) - \prod_{k=1}^{r_n}\Big(1 - \tfrac{t^2}{2}\sigma_{n,k}^2\Big) \right| \xrightarrow[n\to\infty]{} 0 . $$

Finally evaluate the product. With $z_{n,k} = \tfrac{t^2}{2}\sigma_{n,k}^2 \to 0$ uniformly in $k$ (UAN) and $\sum_k z_{n,k} = \tfrac{t^2}{2}$, the bound $|\log(1 - z) + z| \le z^2$ for $0 \le z \le \tfrac12$ gives $\big|\sum_k \log(1 - z_{n,k}) + \tfrac{t^2}{2}\big| \le \sum_k z_{n,k}^2 \le (\max_k z_{n,k}) \sum_k z_{n,k} \to 0$, so the product converges to $e^{-t^2/2}$. Combining, $\varphi_{S_n}(t) \to e^{-t^2/2}$, and Lévy continuity yields $S_n \Rightarrow N(0,1)$.
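The last step, convergence of the Gaussian surrogate product to $e^{-t^2/2}$, can be seen numerically. This is a minimal sketch under assumptions chosen here for illustration: the helper names are hypothetical, and two variance profiles are tried, the equal one $\sigma_{n,k}^2 = 1/n$ and an unequal one with $\sigma_{n,k}^2$ proportional to $k$, both summing to one and both uniformly negligible.

```python
from math import exp


def gaussian_surrogate(t, variances):
    """Product over the row of (1 - t^2 sigma_k^2 / 2)."""
    prod = 1.0
    for v in variances:
        prod *= 1.0 - 0.5 * t * t * v
    return prod


def equal_variances(n):
    return [1.0 / n] * n


def linear_variances(n):
    """Variances proportional to k, normalised to sum to one."""
    total = n * (n + 1) / 2
    return [k / total for k in range(1, n + 1)]


t = 1.5
target = exp(-t * t / 2)
for n in (10, 100, 10_000):
    print(
        n,
        gaussian_surrogate(t, equal_variances(n)) - target,
        gaussian_surrogate(t, linear_variances(n)) - target,
    )
```

Both error columns shrink as $n$ grows, in line with the bound $\sum_k z_{n,k}^2 \le (\max_k z_{n,k}) \sum_k z_{n,k}$ used in the proof.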

Converse. Assume UAN and $S_n \Rightarrow N(0,1)$, so $\varphi_{S_n}(t) = \prod_k \varphi_{n,k}(t) \to e^{-t^2/2}$, which is never zero. Under UAN, $|\varphi_{n,k}(t) - 1| \le \tfrac{t^2}{2}\sigma_{n,k}^2 \to 0$ uniformly in $k$, so each factor is close to $1$ and the principal logarithm is available. The inequality $|e^w - 1 - w| \le |w|^2$ for small $w$ converts the product convergence into $\sum_k (\varphi_{n,k}(t) - 1) \to -\tfrac{t^2}{2}$. Taking real parts and using $\mathrm{Re}(\varphi_{n,k}(t) - 1) = \mathbb{E}[\cos(tX_{n,k}) - 1] = -\mathbb{E}[1 - \cos(tX_{n,k})]$, $$ \sum_{k=1}^{r_n} \mathbb{E}\big[\,1 - \cos(tX_{n,k})\,\big] \xrightarrow[n\to\infty]{} \tfrac{t^2}{2}. $$

Now compare with the variance. Since $\sum_k \mathbb{E}[X_{n,k}^2] = 1$, the limit above can be rewritten as $$ \sum_{k=1}^{r_n} \mathbb{E}\Big[\tfrac{t^2}{2}X_{n,k}^2 - \big(1 - \cos(tX_{n,k})\big)\Big] \xrightarrow[n\to\infty]{} 0, $$ and every integrand is nonnegative because $1 - \cos y \le \tfrac{y^2}{2}$ for all real $y$. Fix $\varepsilon > 0$ and split each expectation at $|X_{n,k}| \le \varepsilon$ and $|X_{n,k}| > \varepsilon$. The small-set parts are nonnegative and can be discarded. On the large set, $1 - \cos(tX_{n,k}) \le 2$ and $X_{n,k}^2 > \varepsilon^2$, so $2 < 2X_{n,k}^2/\varepsilon^2$ and the integrand is at least $\big(\tfrac{t^2}{2} - \tfrac{2}{\varepsilon^2}\big)X_{n,k}^2$. Summing over $k$, $$ \Big(\tfrac{t^2}{2} - \tfrac{2}{\varepsilon^2}\Big) L_n(\varepsilon) \le \sum_{k=1}^{r_n} \mathbb{E}\Big[\tfrac{t^2}{2}X_{n,k}^2 - \big(1 - \cos(tX_{n,k})\big)\Big] \xrightarrow[n\to\infty]{} 0 . $$

Choosing $t$ with $t^2 > 4/\varepsilon^2$ makes the prefactor strictly positive, which forces $L_n(\varepsilon) \to 0$. Since $\varepsilon > 0$ was arbitrary, the Lindeberg condition follows. $□$
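The limit $\sum_k \mathbb{E}[1 - \cos(tX_{n,k})] \to t^2/2$ that drives the converse can be observed on the simplest array. The sketch below assumes the fair-sign row $X_{n,k} = \pm 1/\sqrt{n}$ (an illustrative choice, for which each expectation is just $1 - \cos(t/\sqrt{n})$); the function name is hypothetical.

```python
from math import cos, sqrt


def sum_one_minus_cos(t, n):
    """Sum over the row of E[1 - cos(t X)] when every X equals +/- 1/sqrt(n)."""
    return n * (1.0 - cos(t / sqrt(n)))


t = 2.0
for n in (10, 1_000, 1_000_000):
    value = sum_one_minus_cos(t, n)
    gap = t * t / 2 - value  # nonnegative, since 1 - cos(y) <= y^2 / 2
    print(n, value, gap)
```

The gap column is nonnegative and shrinks to zero; it is exactly this gap, restricted to the large set, that the converse argument plays against the tail variance.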

Bridge. This theorem builds toward the full classification of limit laws for sums of independent variables and appears again in the proof of the Berry–Esseen rate and in the functional central limit theorem (Donsker's invariance principle), where triangular arrays of increments are exactly the objects in play. The foundational reason the result holds is that the characteristic function turns the row sum into a product, so a sum becomes additive in the logarithm and the Lindeberg condition is precisely what controls the cubic Taylor remainder that separates the true product from its Gaussian surrogate $\prod (1 - \frac{t ^{2}}{2} σ_{n, k}^{2})$ . This is exactly the i.i.d. central limit theorem when $σ_{n, k}^{2} = 1/ n$ , recovered by checking Lindeberg through a single dominated-convergence step; the triangular-array statement generalises that special case to non-identical, non-stationary summands, and the bridge is that Feller's converse closes the loop by showing negligibility plus a normal limit can come from nothing but the Lindeberg condition, so sufficiency and necessity meet at one sharp hypothesis.

Exercises Intermediate+

Exercise 1 (easy, symbolic).

Show that the classical i.i.d. central limit theorem is a special case of Lindeberg–Feller. Let $Y_1, Y_2, \dots$ be i.i.d. with mean $0$ and variance $\sigma^2 \in (0, \infty)$, and set $X_{n,k} = Y_k/(\sigma\sqrt{n})$ for $1 \le k \le n$. Verify the standardisation $s_n^2 = 1$ and write down the Lindeberg sum $L_n(\varepsilon)$.

Hint

Each row variable is a copy of $Y_1/(\sigma\sqrt{n})$, so the Lindeberg sum collapses to $n$ identical terms.

Answer

Each $X_{n,k}$ has variance $\sigma^2/(\sigma^2 n) = 1/n$, so $s_n^2 = \sum_{k=1}^n 1/n = 1$. The Lindeberg sum is $$ L_n(\varepsilon) = \sum_{k=1}^n \mathbb{E}\!\left[\tfrac{Y_k^2}{\sigma^2 n}; \tfrac{|Y_k|}{\sigma\sqrt n} > \varepsilon\right] = \tfrac{1}{\sigma^2}\,\mathbb{E}\!\left[ Y_1^2; |Y_1| > \varepsilon\sigma\sqrt n \right]. $$ As $n \to \infty$ the threshold $\varepsilon\sigma\sqrt{n} \to \infty$, and since $\mathbb{E}[Y_1^2] = \sigma^2 < \infty$, dominated convergence gives $\mathbb{E}[Y_1^2; |Y_1| > \varepsilon\sigma\sqrt{n}] \to 0$. So $L_n(\varepsilon) \to 0$ for every $\varepsilon$ and Lindeberg–Feller yields $\frac{1}{\sigma\sqrt{n}} \sum_{k=1}^n Y_k \Rightarrow N(0,1)$.

Exercise 2 (easy, symbolic).

Prove the elementary implication chain on the array conditions: Lyapunov of order $2 + δ$ implies Lindeberg, and Lindeberg implies uniform asymptotic negligibility.

Hint

On $\{|X_{n,k}| > \varepsilon\}$ trade $\delta$ powers of $|X_{n,k}|$ for $\delta$ powers of the threshold $\varepsilon$; for UAN bound a single variance by the squared threshold plus the Lindeberg leftover.

Answer

For Lyapunov $\Rightarrow$ Lindeberg: on $\{|X_{n,k}| > \varepsilon\}$, $X_{n,k}^2 = |X_{n,k}|^{2+\delta} |X_{n,k}|^{-\delta} \le |X_{n,k}|^{2+\delta} \varepsilon^{-\delta}$, so $L_n(\varepsilon) \le \varepsilon^{-\delta} \sum_k \mathbb{E}|X_{n,k}|^{2+\delta} = \varepsilon^{-\delta}\Lambda_n(\delta) \to 0$. For Lindeberg $\Rightarrow$ UAN: for each $k$, $\sigma_{n,k}^2 = \mathbb{E}[X_{n,k}^2; |X_{n,k}| \le \varepsilon] + \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] \le \varepsilon^2 + L_n(\varepsilon)$, a bound uniform in $k$, hence $\max_k \sigma_{n,k}^2 \le \varepsilon^2 + L_n(\varepsilon)$. Send $n \to \infty$ then $\varepsilon \to 0$ to get $\max_k \sigma_{n,k}^2 \to 0$.

Exercise 3 (medium, symbolic).

Let $X_{n,k}$, $1 \le k \le n$, be independent with $X_{n,k}$ uniform on $\{-a_{n,k}, a_{n,k}\}$, where the $a_{n,k} > 0$ are constants normalised so that $\sum_{k=1}^n a_{n,k}^2 = 1$ for every $n$. Find a clean sufficient condition on the weights $(a_{n,k})$ guaranteeing $S_n \Rightarrow N(0,1)$, and show it via Lindeberg.

Hint

Here $|X_{n,k}| = a_{n,k}$ deterministically, so the event $\{|X_{n,k}| > \varepsilon\}$ is the deterministic statement $a_{n,k} > \varepsilon$.

Answer

Since $|X_{n,k}| = a_{n,k}$ surely, $\mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] = a_{n,k}^2\,\mathbf{1}_{\{a_{n,k} > \varepsilon\}}$, so $L_n(\varepsilon) = \sum_{k=1}^n a_{n,k}^2\,\mathbf{1}_{\{a_{n,k} > \varepsilon\}}$. With $\sum_k a_{n,k}^2 = 1$, this tends to $0$ for every $\varepsilon$ precisely when $\max_{1 \le k \le n} a_{n,k} \to 0$ as $n \to \infty$: if the largest weight vanishes then eventually no $a_{n,k}$ exceeds $\varepsilon$ and the sum is empty, while if some weight exceeds $\varepsilon$ for infinitely many $n$ then $L_n(\varepsilon) > \varepsilon^2$ along those $n$. Thus $\max_k a_{n,k} \to 0$ (equivalently UAN, which here coincides with Lindeberg because the variables are bounded by their own scale) is sufficient, and $S_n \Rightarrow N(0,1)$. The condition fails in the worked Beginner example, where one weight stays of order $1$.
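For sign variables of this kind the characteristic function of the row sum is the product of $\cos(t \cdot \text{weight})$ over the row, so the effect of the largest weight can be checked directly. The sketch below is an illustration only; the two weight profiles and all function names are hypothetical choices, the second profile mimicking the Beginner example by giving one variable half of the variance.

```python
from math import cos, exp, sqrt


def row_sum_cf(t, weights):
    """Characteristic function at t of a sum of independent fair +/- weight variables."""
    prod = 1.0
    for a in weights:
        prod *= cos(t * a)  # E[exp(i t X)] = cos(t a) when X = +/- a with equal chance
    return prod


def equal_weights(n):
    return [1.0 / sqrt(n)] * n


def dominated_weights(n):
    """One weight keeps half the variance; the other n - 1 share the rest (n >= 2)."""
    return [sqrt(0.5)] + [sqrt(0.5 / (n - 1))] * (n - 1)


t, n = 1.0, 10_000
print("normal target  ", exp(-t * t / 2))
print("equal weights  ", row_sum_cf(t, equal_weights(n)))
print("one fat weight ", row_sum_cf(t, dominated_weights(n)))
```

With equal weights the value sits next to $e^{-t^2/2}$; with one fat weight it settles near $\cos(t/\sqrt{2})\,e^{-t^2/4}$ instead, so the limit law is not normal.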

Exercise 4 (medium, symbolic).

Prove the product-difference lemma used in the theorem: if $a_{1}, \dots, a_{m}$ and $b_{1}, \dots, b_{m}$ are complex numbers with $∣ a_{k} ∣ \leq 1$ and $∣ b_{k} ∣ \leq 1$ for all $k$ , then $∣ \prod_{k = 1}^{m} a_{k} - \prod_{k = 1}^{m} b_{k} ∣ \leq \sum_{k = 1}^{m} ∣ a_{k} - b_{k} ∣$ .

Hint

Telescope: replace $a$ 's by $b$ 's one factor at a time and bound each swap.

Answer

Write the telescoping identity $$ \prod_{k=1}^m a_k - \prod_{k=1}^m b_k = \sum_{j=1}^m \Big(\prod_{k<j} a_k\Big)(a_j - b_j)\Big(\prod_{k>j} b_k\Big). $$ Each $\prod_{k < j} a_{k}$ and $\prod_{k > j} b_{k}$ has modulus at most $1$ by the assumption $∣ a_{k} ∣, ∣ b_{k} ∣ \leq 1$ , so the $j$ -th term has modulus at most $∣ a_{j} - b_{j} ∣$ . The triangle inequality over $j$ gives the claim. This is exactly the step that lets pointwise control of individual characteristic functions $φ_{n, k}$ propagate to control of the product $φ_{S_{n}}$ .
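A randomised spot check of the lemma is easy to run. This is a sanity-check sketch, not a proof; the sampling scheme (random modulus and argument, fixed seed) and the helper names are assumptions made here for illustration.

```python
import cmath
import math
import random


def random_point_in_unit_disk(rng):
    # Modulus in [0, 1) and argument in [0, 2*pi); not uniform in area, which is fine here.
    return cmath.rect(rng.random(), rng.uniform(0.0, 2.0 * math.pi))


def product(values):
    out = 1 + 0j
    for v in values:
        out *= v
    return out


def lemma_slack(a, b):
    """sum |a_k - b_k| minus |prod a_k - prod b_k|; the lemma says this is >= 0."""
    return sum(abs(x - y) for x, y in zip(a, b)) - abs(product(a) - product(b))


rng = random.Random(0)  # fixed seed so repeated runs agree
smallest_slack = min(
    lemma_slack(
        [random_point_in_unit_disk(rng) for _ in range(8)],
        [random_point_in_unit_disk(rng) for _ in range(8)],
    )
    for _ in range(2000)
)
print(smallest_slack)
```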

Exercise 5 (medium, symbolic).

Establish the Taylor bound $\left| e^{ix} - (1 + ix - \tfrac{x^2}{2}) \right| \le \min(|x|^3, x^2)$ for all real $x$.

Hint

Use the integral form of the remainder twice: once stopping at the quadratic term, once at the linear term.

Answer

From $e^{ix} = 1 + i\int_0^x e^{is}\,ds$, iterate to get the remainder $R_2(x) = e^{ix} - (1 + ix - \tfrac{x^2}{2}) = i^3 \int_0^x \tfrac{(x-s)^2}{2} e^{is}\,ds$, giving $|R_2(x)| \le \int_0^{|x|} \tfrac{(|x|-s)^2}{2}\,ds = \tfrac{|x|^3}{6} \le |x|^3$. Stopping one term earlier, $e^{ix} - (1 + ix) = i^2 \int_0^x (x-s) e^{is}\,ds$ has modulus $\le \tfrac{x^2}{2}$, and $|-\tfrac{x^2}{2}| = \tfrac{x^2}{2}$, so $|R_2(x)| \le \tfrac{x^2}{2} + \tfrac{x^2}{2} = x^2$. Combining, $|R_2(x)| \le \min(|x|^3, x^2)$ (the sharper constant $\tfrac16$ on the cubic side is not needed). This is the inequality whose cubic branch handles small variables and whose quadratic branch handles large ones in the Lindeberg split.
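The bound can be spot-checked on a grid, including the sharper cubic constant $\tfrac16$ that the derivation produces. This is a numerical sanity check under assumptions chosen here (grid range, a small floating-point tolerance, hypothetical names), not a substitute for the proof.

```python
import cmath


def taylor_remainder(x):
    """|e^{ix} - (1 + ix - x^2/2)| for real x."""
    return abs(cmath.exp(1j * x) - (1 + 1j * x - x * x / 2))


TOLERANCE = 1e-12  # allowance for floating-point rounding
grid = [k / 100 for k in range(-2000, 2001)]  # x from -20 to 20 in steps of 0.01

violations = [
    x
    for x in grid
    if taylor_remainder(x) > min(abs(x) ** 3 / 6, x * x) + TOLERANCE
]
print(len(violations))
```

The cubic branch is the smaller one for $|x| < 6$ and the quadratic branch takes over beyond that, which matches their roles on the small and large sets of the Lindeberg split.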

Exercise 6 (hard, short-answer).

Let $X_1, X_2, \dots$ be independent with $X_k$ uniform on $\{-k, k\}$ (so $\mathrm{Var}(X_k) = k^2$). Set $S_n = \sum_{k=1}^n X_k$ and $s_n^2 = \sum_{k=1}^n k^2 = \frac{n(n+1)(2n+1)}{6}$. Decide whether $S_n/s_n \Rightarrow N(0,1)$ using the Lindeberg condition.

Hint

The row variable $X_{n, k} = X_{k} / s_{n}$ deterministically equals $\pm k / s_{n}$ in magnitude; the largest is $n / s_{n}$ . Estimate $n / s_{n}$ .

Answer

Here $|X_{n,k}| = k/s_n$ is deterministic, and the largest is $n/s_n$. Since $s_n^2 \sim \tfrac{n^3}{3}$, we have $s_n \sim n^{3/2}/\sqrt{3}$, so $\max_k |X_{n,k}| = n/s_n \sim \sqrt{3/n} \to 0$. Then for any $\varepsilon$, eventually $k/s_n \le n/s_n < \varepsilon$ for all $k \le n$, so the event $\{|X_{n,k}| > \varepsilon\}$ is empty and $L_n(\varepsilon) = 0$ for all large $n$. The Lindeberg condition holds and $S_n/s_n \Rightarrow N(0,1)$. The growing-variance summands still produce a Gaussian because, on the standardised scale, even the largest contributor's share $n^2/s_n^2 \sim 3/n \to 0$ vanishes.
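The two quantities in this answer have closed forms, so they can be tabulated directly. The sketch below uses hypothetical helper names and simply evaluates the largest standardised value $n/s_n$ and the Lindeberg sum for the array $X_{n,k} = X_k/s_n$.

```python
from math import sqrt


def row_variance(n):
    """s_n^2 = 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) / 6


def largest_standardised(n):
    """max_k |X_{n,k}| = n / s_n."""
    return n / sqrt(row_variance(n))


def lindeberg_sum_ex6(n, eps):
    """L_n(eps): only indices with k / s_n > eps contribute k^2 / s_n^2."""
    s2 = row_variance(n)
    cutoff = eps * sqrt(s2)
    return sum(k * k for k in range(1, n + 1) if k > cutoff) / s2


for n in (10, 1_000, 100_000):
    print(n, largest_standardised(n), sqrt(3 / n), lindeberg_sum_ex6(n, 0.1))
```

The second and third columns agree more and more closely, and the Lindeberg sum is exactly zero as soon as $n/s_n$ falls below the threshold.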

Exercise 7 (hard, short-answer).

Construct a triangular array that satisfies the Lindeberg condition but fails the Lyapunov condition for every $δ > 0$ , demonstrating that Lindeberg is strictly weaker. Then explain why such an array still produces a normal limit.

Hint

Place a small amount of mass very far out, so truncated second moments are tame but raw $(2 + δ)$ -moments diverge. Mix a bulk Gaussian-like part with a rare large jump in each variable.

Answer

Take $r_n = n$ for $n \ge 3$ and let the $X_{n,k}$, $1 \le k \le n$, be identically distributed within the row: each equals $\pm b_n$ with total probability $1 - p_n$ split evenly (the bulk) and $\pm c_n$ with total probability $p_n$ split evenly (the rare jump). Choose $c_n = n$ and $p_n = 1/(n^3 \log n)$, so the jump contributes $p_n c_n^2 = 1/(n \log n)$ to the variance, and fix the bulk size by $(1 - p_n) b_n^2 = \tfrac{1}{n} - \tfrac{1}{n \log n}$, so that each variance is $\tfrac{1}{n}$ and $s_n^2 = 1$. Then $b_n \le 1/\sqrt{n} \to 0$. Fix $\varepsilon > 0$. Once $b_n \le \varepsilon < c_n$, only the jumps exceed the threshold, so $L_n(\varepsilon) = n\, p_n c_n^2 = 1/\log n \to 0$ and Lindeberg holds. On the other hand, for every $\delta > 0$, $\Lambda_n(\delta) \ge n\, p_n c_n^{2+\delta} = n^{\delta}/\log n \to \infty$, so the Lyapunov condition fails for every $\delta > 0$ simultaneously. The array still gives $N(0,1)$ because Lindeberg, not Lyapunov, is the sharp sufficient condition: the rare far-out jumps inflate the raw $(2+\delta)$-moments but carry a vanishing share of the variance, which is all the theorem requires.

Advanced results Master

The sufficiency direction quantifies into a convergence rate. When third moments are present, the Berry–Esseen theorem bounds the uniform distance between the distribution function $F_n$ of the standardised sum and the standard normal $\Phi$. For i.i.d. $Y_k$ with mean $0$, variance $\sigma^2$, and $\rho = \mathbb{E}|Y_1|^3 < \infty$, $$ \sup_{x \in \mathbb{R}} \left| \mathbb{P}\!\left( \frac{1}{\sigma\sqrt n}\sum_{k=1}^n Y_k \le x \right) - \Phi(x) \right| \le \frac{C\,\rho}{\sigma^3 \sqrt n}, $$ with an absolute constant $C$; the historical value $C \le 0.7975$ (Esseen's own constant was larger) has been improved toward the conjectured extremal $C = (\sqrt{10} + 3)/(6\sqrt{2\pi}) \approx 0.4097$ attained by the two-point Bernoulli law. The non-identically-distributed version replaces $\rho/(\sigma^3\sqrt{n})$ by the Lyapunov ratio $\sum_k \mathbb{E}|X_{n,k}|^3 / (\sum_k \sigma_{n,k}^2)^{3/2}$. The proof is the Esseen smoothing inequality from 37.03.01 applied to $\varphi_{S_n}(t) - e^{-t^2/2}$: the cubic Taylor remainder, integrated against $1/t$ over a window $|t| \le T \sim \sqrt{n}$, produces exactly the $\rho/(\sigma^3\sqrt{n})$ rate.
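The $1/\sqrt{n}$ rate can be observed exactly for fair signs, where $\sigma = \rho = 1$ and the distribution of the sum is binomial. The sketch below is an illustration with hypothetical function names; it evaluates the supremum at the jump points of the step function $F_n$, which is where the largest gap to the continuous $\Phi$ must occur.

```python
from math import comb, erf, sqrt


def normal_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def kolmogorov_distance_signs(n):
    """sup_x |F_n(x) - Phi(x)| for the standardised sum of n fair +/-1 signs."""
    total = 2 ** n
    cumulative = 0  # number of sign patterns with sum strictly below the current atom
    worst = 0.0
    for j in range(n + 1):  # j plus-signs give the sum 2j - n
        x = (2 * j - n) / sqrt(n)
        phi = normal_cdf(x)
        left = cumulative / total  # left limit F_n(x-)
        cumulative += comb(n, j)
        right = cumulative / total  # value F_n(x)
        worst = max(worst, abs(left - phi), abs(right - phi))
    return worst


for n in (25, 100, 400):
    d = kolmogorov_distance_signs(n)
    print(n, d, sqrt(n) * d)  # the last column is bounded by the constant C
```

The rescaled column $\sqrt{n}\,\sup_x |F_n - \Phi|$ stays bounded and below the value $0.7975$ quoted above, as the theorem requires for this law.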

Without a third moment but with the bare Lindeberg condition, no universal rate exists — the convergence can be arbitrarily slow — yet a rate of the form $\sum_{k} E [∣ X_{n, k} ∣^{2} min (∣ X_{n, k} ∣, 1)]$ (the Lindeberg ratio or Zolotarev/Katz refinement) governs the speed, interpolating between the bounded-third-moment Berry–Esseen rate and the general qualitative statement. This is the genuinely sharp quantitative form of the Lindeberg theorem: the same truncated-second-moment functional that appears in the qualitative condition reappears as the rate.

Dropping the Lindeberg condition while keeping negligibility opens the door to non-normal limits, and the complete answer is the classification of limit laws for sums of independent variables. Under UAN, the possible weak limits of row sums of triangular arrays are exactly the infinitely divisible laws, with characteristic function of Lévy–Khinchine form $\exp\!\big(ibt - \tfrac{1}{2} a t^2 + \int (e^{itx} - 1 - itx\,\mathbf{1}_{\{|x| < 1\}})\, d\nu(x)\big)$ [from 37.03.01]. The Gaussian case $\nu = 0$ is singled out by the Lindeberg-type condition $\sum_k \mathbb{E}[X_{n,k}^2; |X_{n,k}| > \varepsilon] \to 0$, which forces the Lévy measure of the limit to vanish; relaxing it to convergence of those truncated-tail sums to a measure $\nu$ produces a general infinitely divisible limit. The Lindeberg–Feller theorem is thus the Gaussian fibre of the Gnedenko–Kolmogorov classification.

The condition is also exactly what is needed for the martingale and dependent generalisations. The martingale central limit theorem replaces row independence by a martingale-difference structure and replaces $\sum_{k} σ_{n, k}^{2} \to 1$ by convergence of the conditional variances $\sum_{k} E [X_{n, k}^{2} ∣ F_{n, k - 1}] \to 1$ in probability, retaining a conditional Lindeberg condition verbatim. This is the route to the central limit theorem for stationary sequences, Markov chains, and stochastic-approximation algorithms, and it is the discrete skeleton of the functional central limit theorem (Donsker), where the array of increments of a random walk converges to Brownian motion.

Synthesis. The central insight is that the characteristic transform linearises the row sum into a product $\prod_{k} φ_{n, k}$ , and putting these together with the cubic Taylor remainder shows that the only obstruction to the Gaussian product $\prod (1 - \frac{t ^{2}}{2} σ_{n, k}^{2}) \to e^{- t^{2} /2}$ is the large-deviation variance measured by $L_{n} (ε)$ . So the Lindeberg condition is the foundational reason the bell curve appears, and Feller's converse shows it is dual to the conclusion itself, being recoverable from a normal limit under negligibility.

This is exactly why the i.i.d. theorem, the Lyapunov criterion, and the Berry–Esseen rate are one phenomenon viewed at three resolutions: identical summands are the immediate verification of Lindeberg, the Lyapunov $(2 + δ)$ -moment is a convenient overshoot of it, and the Berry–Esseen bound quantifies the very same truncated-moment functional. The same mechanism generalises upward to the Gnedenko–Kolmogorov classification, where dropping negligibility lets the Lévy measure survive and the limit becomes a general infinitely divisible law, and sideways to the martingale central limit theorem, where conditioning replaces independence but the conditional Lindeberg condition is the bridge that carries the whole argument across.

Full proof set Master

The Lindeberg–Feller theorem in both directions, the implication chain among the conditions, and the recovery of the i.i.d. theorem are proved in the Key theorem and Exercises sections. The remaining Master claims are recorded here.

Proposition (Lyapunov central limit theorem). Let $\{X_{n,k}\}$ be a row-wise independent triangular array with $\mathbb{E}[X_{n,k}] = 0$ and $\sum_k \sigma_{n,k}^2 = 1$. If $\sum_k \mathbb{E}|X_{n,k}|^{2+\delta} \to 0$ for some $\delta > 0$, then $S_n \Rightarrow N(0,1)$.

Proof. By the implication established in the Formal definition section, the Lyapunov condition $Λ_{n} (δ) \to 0$ gives $L_{n} (ε) \leq ε^{- δ} Λ_{n} (δ) \to 0$ for every fixed $ε > 0$ , so the Lindeberg condition holds. Apply the sufficiency half of the Lindeberg–Feller theorem. $□$

Proposition (a normal limit need not give Lindeberg without negligibility). There is a row-wise independent array with $\sum_{k} σ_{n, k}^{2} = 1$ and $S_{n} \Rightarrow N (0, 1)$ for which the Lindeberg condition fails. Hence UAN is indispensable in Feller's converse.

Proof. Let $Z_n \sim N(0, 1 - n^{-1})$ and let $W_n$ be independent of $Z_n$, taking values $\pm 1$ each with probability $\tfrac{1}{2n}$ and $0$ with probability $1 - n^{-1}$, so $\mathrm{Var}(W_n) = n^{-1}$. Regard the row as the two variables $X_{n,1} = Z_n$ and $X_{n,2} = W_n$ (here $r_n = 2$), with $\sigma_{n,1}^2 + \sigma_{n,2}^2 = 1$. Then $S_n = Z_n + W_n \Rightarrow N(0,1)$ since $Z_n \Rightarrow N(0,1)$ and $W_n \to 0$ in probability. But $\max_k \sigma_{n,k}^2 = 1 - n^{-1} \to 1 \neq 0$, so UAN fails, and the large component $Z_n$ keeps order-one variance at every threshold below $1$: for $\varepsilon < 1$, $\mathbb{E}[Z_n^2; |Z_n| > \varepsilon] \to \mathbb{E}[Z^2; |Z| > \varepsilon] > 0$ with $Z \sim N(0,1)$, so $L_n(\varepsilon) \not\to 0$. The normal limit coexists with the failure of Lindeberg precisely because negligibility is absent. $□$
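The failure of Lindeberg in this two-variable row can be quantified with the closed form $\mathbb{E}[Z^2; |Z| > \varepsilon] = v\,\big(2(1 - \Phi(a)) + 2a\,\phi(a)\big)$ for $Z \sim N(0, v)$ and $a = \varepsilon/\sqrt{v}$, which follows from one integration by parts. The sketch below uses hypothetical function names and assumes $n \ge 2$ so that the Gaussian variance is positive.

```python
from math import erfc, exp, pi, sqrt


def gaussian_tail_second_moment(variance, eps):
    """E[Z^2; |Z| > eps] for Z ~ N(0, variance)."""
    a = eps / sqrt(variance)
    upper_tail = 0.5 * erfc(a / sqrt(2.0))        # P(N(0,1) > a)
    density = exp(-0.5 * a * a) / sqrt(2.0 * pi)  # standard normal density at a
    return variance * 2.0 * (upper_tail + a * density)


def lindeberg_two_variable_row(n, eps):
    """L_n(eps) for the row (Z_n, W_n) of the counterexample, n >= 2."""
    z_part = gaussian_tail_second_moment(1.0 - 1.0 / n, eps)
    w_part = 1.0 / n if eps < 1.0 else 0.0  # W_n is +/-1 with total probability 1/n
    return z_part + w_part


for n in (10, 1_000, 1_000_000):
    print(n, lindeberg_two_variable_row(n, 0.5))
```

The values stay close to a positive constant instead of tending to zero, even though the row sum is asymptotically standard normal.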

Proposition (Berry–Esseen via smoothing, i.i.d. case). With $Y_k$ i.i.d., mean $0$, variance $\sigma^2$, $\rho = \mathbb{E}|Y_1|^3 < \infty$, and $F_n$ the distribution function of $(\sigma\sqrt{n})^{-1} \sum_{k \le n} Y_k$, one has $\sup_x |F_n(x) - \Phi(x)| \le C\rho/(\sigma^3\sqrt{n})$.

Proof sketch (full proof requires the Esseen constant optimisation). Standardise so $\sigma = 1$. The characteristic function of $F_n$ is $\varphi(t/\sqrt{n})^n$ where $\varphi$ is the common characteristic function of $Y_1$. The cubic Taylor bound gives $\varphi(s) = 1 - \tfrac{s^2}{2} + r(s)$ with $|r(s)| \le \rho|s|^3/6$, so for $|t| \le T = \tfrac{\sqrt{n}}{4\rho}$ one obtains $|\varphi(t/\sqrt{n})^n - e^{-t^2/2}| \le C'\rho|t|^3 n^{-1/2} e^{-t^2/4}$ by the elementary inequality $|a^n - b^n| \le n \max(|a|, |b|)^{n-1} |a - b|$ together with the Gaussian envelope. Feed this into the Esseen smoothing inequality from 37.03.01, $$ \sup_x |F_n(x) - \Phi(x)| \le \frac{1}{\pi}\int_{-T}^{T}\left|\frac{\varphi(t/\sqrt n)^n - e^{-t^2/2}}{t}\right|dt + \frac{24}{\pi T}\sup_x \Phi'(x), $$ with $\sup \Phi' = (2\pi)^{-1/2}$. The integral is $O(\rho n^{-1/2} \int |t|^2 e^{-t^2/4}\,dt) = O(\rho n^{-1/2})$ and the boundary term is $O(\rho n^{-1/2})$ by the choice of $T$; together they give the stated bound with an explicit, though non-optimal, $C$. $□$

Connections Master

Characteristic functions, inversion, and the Lévy continuity theorem 37.03.01 are the engine of this entire unit. The row sum's characteristic function is the product $\prod_{k} φ_{n, k}$ by independence; convergence to $e^{- t^{2} /2}$ is read back as the normal law by Lévy continuity; and the quantitative Berry–Esseen rate is the Esseen smoothing inequality of that unit applied to the cubic Taylor remainder. Every step here applies that machinery to a product of fingerprints rather than a single one.

The strong law of large numbers 37.02.02 is the companion limit theorem on the same i.i.d. sums: where the strong law identifies the almost-sure first-order behaviour $\overset{ˉ}{Y}_{n} \to μ$, the central limit theorem here describes the second-order Gaussian fluctuation $\sqrt{n} (\overset{ˉ}{Y}_{n} - μ) \Rightarrow N (0, σ^{2})$ around that limit, and the truncation arguments that drive Kolmogorov's three-series route to the strong law are close in spirit to the truncations that produce the Lindeberg split here.

Sampling distributions and the central limit theorem 26.04.01 is the statistical face of this result: the i.i.d. corollary derived here is exactly the theorem that legitimises the normal approximation of the sample mean and the standard-error formula $σ / \sqrt{n}$, and the Lindeberg generalisation is what licenses normal approximations for weighted estimators and regression residuals where the summands are independent but not identically distributed.
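For a weighted sum $\sum_{k} w_{k} Y_{k}$ of i.i.d. mean-zero, finite-variance $Y_{k}$, the normalised summands have variances $w_{k}^{2} / \sum_{j} w_{j}^{2}$, so negligibility amounts to $\max_{k} w_{k}^{2} / \sum_{k} w_{k}^{2} \to 0$, and in this i.i.d. setting that ratio tending to zero is also enough for the Lindeberg condition. A minimal sketch of the check follows; the two weight schemes and all function names are illustrative assumptions, not taken from the text.

```python
import math


def negligibility_ratio(weights):
    """max_k w_k^2 / sum_k w_k^2: the largest share of the total
    variance carried by a single weighted summand."""
    squares = [w * w for w in weights]
    return max(squares) / sum(squares)


def harmonic_weights(n):
    """w_k = 1/k: the squared weights are summable, so the first
    term keeps a fixed share of the variance."""
    return [1.0 / k for k in range(1, n + 1)]


def root_weights(n):
    """w_k = 1/sqrt(k): the squared weights sum like log n, so every
    individual share tends to zero."""
    return [1.0 / math.sqrt(k) for k in range(1, n + 1)]


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n,
              negligibility_ratio(harmonic_weights(n)),
              negligibility_ratio(root_weights(n)))
```

With weights $1 / k$ the ratio stays above $6 / π^{2}$, so the first summand never becomes negligible and no normal approximation is licensed by this route; with weights $1 / \sqrt{k}$ the ratio decays like $1 / \log n$, slowly but to zero.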

The Lévy–Khinchine representation and infinitely divisible laws (a downstream unit in this chapter, 37.03.03) is the generalisation in which the negligibility-plus-Lindeberg pair is relaxed: dropping the truncated-variance condition lets the Lévy measure of the limit be nonzero, and the Lindeberg–Feller theorem becomes the Gaussian fibre of the full Gnedenko–Kolmogorov classification of limits of triangular arrays.

Historical & philosophical context Master

The problem of identifying the conditions under which a sum of independent variables is approximately normal was the central open problem of classical probability after Laplace. Lyapunov supplied the first general sufficient condition in 1901 ^{[Lyapunov 1901]}, using the $(2 + δ)$ -moment hypothesis and an early form of characteristic-function estimation. The decisive advance came from Jarl Waldemar Lindeberg, whose 1922 paper ^{[Lindeberg 1922]} introduced both the condition now bearing his name and a self-contained proof by direct comparison (the "Lindeberg swapping" method) that avoids characteristic functions entirely, replacing each summand by a Gaussian one term at a time. William Feller in 1935–37 ^{[Feller 1935]} proved the converse: under uniform asymptotic negligibility the Lindeberg condition is necessary as well as sufficient, sharpening Lindeberg's sufficient condition into a characterisation and completing the theorem now jointly named for them. Paul Lévy reached closely related results independently in the same period.

The quantitative theory was founded by Carl-Gustav Esseen, whose 1945 monograph ^{[Esseen 1945]} introduced the smoothing inequality and proved the rate $O (n^{- 1/2})$ under a third moment, the result independently obtained by Andrew Berry; the sharp constant remains an active subject. The full classification of limit laws for triangular arrays under negligibility — the infinitely divisible laws — was completed by Khinchine, Lévy, Gnedenko, and Kolmogorov in the 1930s, placing the Lindeberg–Feller theorem as the Gaussian special case of a single representation theorem. The conceptual content is that normality is not a property of any individual summand but an emergent property of aggregation under negligibility: the Lindeberg condition isolates the exact sense in which the parts must be small for the whole to forget their individual shapes.

Bibliography Master

@article{lindeberg1922,
  author  = {Lindeberg, Jarl Waldemar},
  title   = {Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung},
  journal = {Mathematische Zeitschrift},
  volume  = {15},
  pages   = {211--225},
  year    = {1922}
}

@article{feller1935,
  author  = {Feller, Willy},
  title   = {\"Uber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung},
  journal = {Mathematische Zeitschrift},
  volume  = {40},
  pages   = {521--559},
  year    = {1935}
}

@article{lyapunov1901,
  author  = {Lyapunov, Aleksandr M.},
  title   = {Nouvelle forme du th\'eor\`eme sur la limite de probabilit\'e},
  journal = {M\'emoires de l'Acad\'emie des Sciences de St.-P\'etersbourg},
  volume  = {12},
  pages   = {1--24},
  year    = {1901}
}

@article{esseen1945,
  author  = {Esseen, Carl-Gustav},
  title   = {Fourier analysis of distribution functions: a mathematical study of the Laplace-Gaussian law},
  journal = {Acta Mathematica},
  volume  = {77},
  pages   = {1--125},
  year    = {1945}
}

@book{petrov1975,
  author    = {Petrov, Valentin V.},
  title     = {Sums of Independent Random Variables},
  publisher = {Springer-Verlag, Berlin},
  year      = {1975}
}

@book{gnedenko1954,
  author    = {Gnedenko, Boris V. and Kolmogorov, Andrey N.},
  title     = {Limit Distributions for Sums of Independent Random Variables},
  publisher = {Addison-Wesley, Cambridge, MA},
  year      = {1954}
}

@book{durrett2019cltff,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5th},
  publisher = {Cambridge University Press},
  year      = {2019}
}

Prerequisites

37.03.01
37.02.02
26.04.01

Tier anchors

beginner: Durrett, Probability: Theory and Examples 5e §3.4 (informal Lindeberg picture); Grimmett-Stirzaker, Probability and Random Processes 3e §5.10; physical intuition from many small independent jolts averaging into a bell curve
intermediate: Durrett, Probability: Theory and Examples 5e §3.4.1-3.4.3 (Lindeberg-Feller theorem and Lyapunov condition); Billingsley, Probability and Measure 3e §27; Feller, An Introduction to Probability Theory and Its Applications II §XV.6, §XVI
master: Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.4; Feller II §XV.6, §XVI.5-7; Petrov, Sums of Independent Random Variables (Springer, 1975) Ch. IV-V; Gnedenko-Kolmogorov, Limit Distributions for Sums of Independent Random Variables (Addison-Wesley, 1954) §§19-22

References

Lindeberg — Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung · Math. Z. 15 (1922), 211-225
Feller — Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung · Math. Z. 40 (1935), 521-559; 42 (1937), 301-312
Lyapunov — Nouvelle forme du théorème sur la limite de probabilité · Mém. Acad. Sci. St. Pétersbourg 12 (1901), 1-24
Esseen — Fourier analysis of distribution functions: a mathematical study of the Laplace-Gaussian law · Acta Math. 77 (1945), 1-125
Durrett — Probability: Theory and Examples · Cambridge University Press, 5th ed., 2019, §3.4
Petrov — Sums of Independent Random Variables · Springer, 1975, Ch. IV-V

Estimated time

beginner: 20m
intermediate: 55m
master: 95m