37.03.03 · probability / 03-clt-characteristic-functions

Donsker's Invariance Principle and the Functional Central Limit Theorem

shipped3 tiersLean: none

Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §8.1-8.6; Billingsley, Convergence of Probability Measures (Wiley 2nd ed., 1999) Ch. 1-3; Ethier-Kurtz, Markov Processes: Characterization and Convergence (Wiley, 1986) Ch. 3; Whitt, Stochastic-Process Limits (Springer, 2002) Ch. 3-13

Intuition Beginner

The ordinary central limit theorem looks at one number: the position of a random walker after many steps. It says that single endpoint, rescaled, follows a bell curve. But a walk is not just its endpoint — it is a whole trajectory, a wiggly line traced out over time. The invariance principle asks the bolder question: what does the entire rescaled path look like when the walk is long?

The answer is that the whole path, shrunk down and viewed from far away, looks like a single fixed shape of randomness: Brownian motion, the continuous jittering path you would get from a speck of pollen in water. No matter what the individual steps look like — coin flips, dice rolls, any spread-out shocks with no built-in drift and a finite typical size — the coarse-grained trajectory forgets those details and settles into the same continuous random curve. That insensitivity to the fine print is why it is called an invariance principle: the limit is invariant under the choice of step distribution.

This is powerful because once you know the limiting shape, you can read off the behaviour of any quantity that depends on the path, not just the endpoint. How high did the walk ever get? How long did it spend on the positive side? When did it last return to zero? Each of these is a feature of the whole trajectory, and each has a limit you can compute by asking the same question of Brownian motion instead. One limit theorem about paths replaces a whole zoo of separate calculations.

The one-sentence takeaway: rescale a long random walk and the entire path converges to Brownian motion, so every question about the path's shape gets answered by the same universal continuous curve.

Visual Beginner

Picture three panels side by side. The leftmost shows a short random walk: a jagged staircase of a few up-and-down steps, blocky and visibly made of discrete jumps. The middle panel shows the same kind of walk with many more steps, drawn smaller so it fits the same box; the staircase is finer, the blockiness fading. The rightmost panel shows a very long walk squeezed into the box: the steps are now so small and so numerous that the line looks like a smooth-but-everywhere-jittery continuous curve — Brownian motion.

The key to the picture is the two different squeezes. Time is compressed by the number of steps, so the whole walk fits in one unit of width. Height is compressed only by the square root of that number, because a random walk's typical spread grows like the square root of time, not like time. Use the wrong vertical squeeze and the picture either flattens to a straight line or blows up to noise; use the square-root squeeze and the discrete walk locks onto the continuous Brownian shape.

Worked example Beginner

Take the simplest walk: at each step flip a fair coin and move up by on heads, down by on tails. After steps the walker sits at a position we call , the running total of the coin steps. We want to see how to rescale so that the path settles down as grows.

Step 1. Find the typical spread after steps. Each step has spread (it is plus or minus ). Spreads-squared add over independent steps, so after steps the spread-squared is and the spread is the square root of . So typically sits a distance of about the square root of from zero.

Step 2. Rescale the height. To keep the typical spread fixed at no matter how big is, divide the position by the square root of . The rescaled height is divided by the square root of , which by the ordinary central limit theorem follows a standard bell curve.

Step 3. Rescale time. We want the whole walk of steps to occupy the time window from to . So step number is placed at time divided by . At time between and , the walk has taken about times steps, and its rescaled height is at step , divided by the square root of .

Step 4. Read the limit at a sample time. Take . The walk has taken about steps, with spread-squared , so the rescaled height at time has spread-squared divided by , which is , and spread . A Brownian motion at time also has spread . The numbers match.

Step 5. What this tells us: with time squeezed by and height squeezed by the square root of , the rescaled coin-flip path lines up with Brownian motion time-point by time-point — and the invariance principle says the whole continuous path lines up too, not just one instant.

Check your understanding Beginner

Formal definition Intermediate+

Let be independent and identically distributed real random variables with and . Write and for the random walk. The rescaled path is the random function defined by linear interpolation of the rescaled partial sums: $$ W_n(t) = \frac{1}{\sigma\sqrt{n}}\Big( S_{\lfloor nt\rfloor} + (nt - \lfloor nt\rfloor),X_{\lfloor nt\rfloor + 1} \Big), \qquad t \in [0,1]. $$ At the grid points this is ; between them it interpolates linearly, so is a genuine continuous path and is a random element of the space of continuous functions on , equipped with the supremum norm and its Borel -algebra. (Some authors use the right-continuous step interpolation , a random element of the Skorokhod space ; the two interpolations differ by at most in probability, so they share the same weak limit.)

Definition (weak convergence in a metric space). Let be a metric space with Borel -algebra. A sequence of probability measures on converges weakly to , written , if for every bounded continuous . For random elements of this is , meaning their laws converge weakly. On this is convergence of the full path law, strictly stronger than convergence of any fixed finite collection of coordinates.

Definition (finite-dimensional distributions). For the finite-dimensional projection sends . The finite-dimensional distributions (fdds) of a random path are the laws of these projections. Convergence of all fdds, for every finite tuple, is necessary for but not sufficient.

Definition (tightness). A family of probability measures on a metric space is tight if for every there is a compact set with for all . On , the Arzelà-Ascoli theorem characterises compactness by a uniform bound and equicontinuity, so tightness is controlled by the modulus of continuity : the family is tight precisely when is tight in and $$ \lim_{\delta\downarrow 0}\ \limsup_{n\to\infty}\ \mathbb{P}\big(\omega_{W_n}(\delta) \ge \varepsilon\big) = 0 \quad\text{for every } \varepsilon > 0. $$

Counterexamples to common slips Intermediate+

  • Fdd convergence alone does not give path convergence. Take supported on a tall thin spike of width and height placed at a uniformly random location; every fixed coordinate tends to , so all fdds converge to the zero path, yet does not converge to . The mass escapes between the sampled times. Tightness is exactly the hypothesis that rules this out.
  • Tightness is not automatic from a variance bound. Bounded second moments of at each control no oscillation; one needs the joint modulus-of-continuity control, which for partial sums comes from a maximal inequality (Etemadi or Kolmogorov), not from marginal variances.
  • The limit is on , not on a sequence space. The walk's increments are independent, but the limit object is a single continuous path; reading Donsker as a statement about the sequence rather than about the law on path space loses the continuous-mapping payoff entirely.
  • Drift must vanish. If , the centred path converges to Brownian motion but the un-centred walk has a term that diverges; the correct scaling for nonzero drift is the law of large numbers at scale , with the Brownian fluctuation a lower-order correction.

Key theorem with proof Intermediate+

Theorem (Donsker's invariance principle / functional CLT). Let be i.i.d. with and , and let be the rescaled interpolated path above. Then in , where is standard Brownian motion 02.15.01.

Proof. By Prokhorov's theorem the strategy is fdd convergence plus tightness: on a complete separable metric space, if is tight and every finite-dimensional projection converges to that of , then . We supply both ingredients, taking the tightness route through the Skorokhod embedding, which also makes the fdd convergence transparent.

Step 1: Skorokhod embedding. There is a stopping time for a standard Brownian motion with and (the strong Markov property 37.05.04 underwrites the construction and the independent restarts below). Iterating with i.i.d. copies, there are stopping times with i.i.d. increments , , such that the embedded sums match the walk in law: $$ (B_{T_1}, B_{T_2}, \dots, B_{T_k}, \dots) \overset{d}{=} (S_1, S_2, \dots, S_k, \dots). $$ So we may assume the walk is the Brownian motion sampled at the random times , i.e. .

Step 2: the embedded times are nearly deterministic. By the strong law of large numbers, almost surely, so uniformly in (a monotone-limit / Dini argument upgrades the pointwise convergence of the increasing functions to a uniform one, since the limit is continuous). Hence the random clock tracks the deterministic clock to leading order.

Step 3: Brownian scaling. Define the rescaled Brownian motion . By the scaling invariance of Brownian motion 02.15.01, is itself a standard Brownian motion on for every ; in particular its law is exactly that of , not merely close to it.

Step 4: compare the walk's path to the scaled Brownian motion. The grid value of is , while . Their difference at any is governed by the gap , which is uniformly by Step 2, together with the uniform continuity (on compacts) of the Brownian path. Quantitatively, for the modulus of continuity of over the time horizon , $$ \sup_{t\in[0,1]} \big| W_n(t) - \widetilde B_n(t) \big| \le \frac{1}{\sigma\sqrt n}, \sup_{\substack{|u-v|\le \varepsilon_n \sigma^2 n}} |B_u - B_v| \xrightarrow[n\to\infty]{\ \mathbb{P}\ } 0, $$ where a.s. The right-hand side tends to in probability because Brownian motion is uniformly continuous on compacts and, after the rescaling, the oscillation over time-windows of length is exactly a Brownian oscillation over a window of rescaled length . The linear-interpolation correction adds at most in probability.

Step 5: conclude. Since exactly and in probability, the converging-together (Slutsky) lemma for the metric space gives . The fdd convergence and the tightness demanded by Prokhorov are both consequences of this uniform comparison: fdds converge because finitely many coordinates of are within of coordinates of an exact Brownian motion, and tightness holds because is a single tight law (the law of ) perturbed by a sup-norm-null sequence.

Bridge. Donsker's theorem builds toward the entire theory of weak convergence on path spaces and appears again in empirical-process theory, where the rescaled empirical distribution function converges to a Brownian bridge by the same Prokhorov tightness-plus-fdd template. The foundational reason it holds is that the random walk, embedded into Brownian motion by Skorokhod's stopping times, is a Brownian motion read on a clock that the law of large numbers forces to be asymptotically linear, so the discrete object and its continuous limit literally share a sample path up to a vanishing sup-norm error. This is exactly the functional generalisation of the Lindeberg-Feller central limit theorem 37.03.02: where Lindeberg-Feller turns a triangular array of increments into a single Gaussian endpoint through the characteristic-function product, Donsker turns the same increments — now read as a process — into the whole Gaussian path, and the central insight is that the finite-dimensional Gaussian limits of 37.03.02 are precisely the fdds of Brownian motion, with tightness the one extra ingredient that lifts a family of endpoint theorems to a single statement about trajectories. The bridge is that fdd convergence is the Lindeberg-Feller content and tightness is the path-space content, and Prokhorov's theorem is the gear that meshes them.

Exercises Intermediate+

Advanced results Master

The continuous-mapping theorem is the engine that converts the single weak limit into a library of corollaries. For a measurable whose discontinuity set has Wiener measure zero, gives . Three functionals carry the weight. The maximum is Lipschitz, so , half-normal. The occupation time is continuous off the Wiener-null set of paths lingering at level zero, so the positive-time fraction converges to the arcsine law with density . The last zero and the argmax are continuous off Wiener-null sets and both converge to arcsine-distributed limits; the coincidence of these three arcsine laws is Lévy's theorem on Brownian occupation, last exit, and the location of the maximum.

The argmax functional is the path-space face of a recurring statistical phenomenon: the location of the extreme of a random walk, suitably rescaled, is arcsine, hence concentrated near the endpoints of the interval. The same continuous-mapping mechanism, applied to the functional , gives the limit law of as , whose Laplace transform is computable from the Cameron-Martin-Girsanov theory and underlies the asymptotics of unit-root statistics in econometrics.

Two structural extensions deserve statement. First, the multivariate and triangular-array Donsker theorem: for a triangular array of row-wise independent mean-zero increments satisfying a Lindeberg condition 37.03.02, the interpolated row-sum process converges to Brownian motion in — this is the genuine functional generalisation, with the Lindeberg condition controlling fdd convergence (via Lindeberg-Feller) and a Lindeberg-type maximal inequality controlling tightness. Second, the empirical-process invariance principle: the rescaled empirical distribution function of i.i.d. uniforms converges in to the Brownian bridge , the Donsker theorem of empirical-process theory and the foundation of the Kolmogorov-Smirnov and Cramér-von Mises goodness-of-fit limits.

Tightness on the step-interpolation space requires the Skorokhod topology rather than the supremum norm, because is not separable under . The metric allows small horizontal as well as vertical perturbations of jump locations, making a Polish space on which Prokhorov's theorem applies; the modulus of continuity is replaced by Billingsley's -modulus, which permits a single jump within each small window. For continuous limits such as Brownian motion the and uniform topologies agree on , so the distinction is invisible in Donsker's theorem itself but essential for limits with jumps (Lévy processes, the topic of the infinitely-divisible classification).

Synthesis. The central insight is that one weak limit on path space, , generated by Prokhorov's tightness-plus-fdd criterion and proved through the Skorokhod embedding, dissolves an entire catalogue of separate random-walk limit theorems into evaluations of continuous functionals of a single Gaussian path. Putting these together, the maximum, the argmax, the last zero, and the positive-occupation fraction are not four theorems but one — the continuous-mapping theorem applied to Brownian motion — and the arcsine law that governs three of them is the foundational reason the extreme of a long walk sits near its ends rather than its middle. This is exactly the functional lift of Lindeberg-Feller 37.03.02: the finite-dimensional Gaussian limits there are dual to the fdds of Brownian motion here, and tightness is the bridge that promotes a family of endpoint statements to a statement about whole trajectories. The same architecture generalises upward to the empirical-process invariance principle, where the limit is the Brownian bridge and the payoff is the Kolmogorov-Smirnov law, and sideways to the theory, where relaxing continuity to the topology lets the limit acquire jumps and reconnects Donsker to the Gnedenko-Kolmogorov classification of infinitely divisible laws.

Full proof set Master

The invariance principle, the two-interpolation equivalence, the maximum and occupation-time corollaries, and the Slutsky lemma are proved in the Key theorem and Exercises sections. The remaining Master claims are recorded here.

Proposition (Skorokhod embedding). Let be a real random variable with and . There is a stopping time for a standard Brownian motion (on a possibly enlarged space carrying an independent randomisation) with and .

Proof. For a two-point law with and mean zero (so , ), let , the exit time of the interval. Since is a bounded martingale, optional stopping 02.15.01 gives , and the only boundary values are , forcing ; thus . Optional stopping on gives . For a general centred law, write it as a mixture of mean-zero two-point laws (the Chacón-Walsh / potential-theoretic construction): condition on an independent random pair drawn so that the conditional law is two-point mean-zero and the mixture reproduces the law of ; embed each two-point law by an interval-exit time and add an independent randomisation selecting the pair. The resulting satisfies and, by the tower property applied to , by the mean-zero two-point variance identity . The strong Markov property 37.05.04 guarantees that after each exit the post- process is an independent Brownian motion, so the i.i.d. iteration in the main proof is legitimate.

Proposition (Brownian arcsine law for occupation time). Let be the time a standard Brownian motion spends positive on . Then for .

Proof sketch. The discrete arcsine law of Lévy and Erdős-Kac [Erdős-Kac 1946] for the simple random walk states that the number of positive partial sums among , divided by , converges to the arcsine law; this is proved by a generating-function (Sparre Andersen) combinatorial identity counting sign patterns. Donsker's theorem transfers the limit to via the a.e.-continuity of the occupation functional, established by noting that the Wiener-measure of paths with positive Lebesgue time at level is zero (the Brownian zero set is a.s. Lebesgue-null, being a.s. of Hausdorff dimension ). Alternatively, a direct computation via the Feynman-Kac formula for the resolvent of Brownian motion killed by the rate yields the double Laplace transform , whose inversion is exactly the arcsine density. Both routes give the distribution function .

Proposition (Brownian bridge as the empirical-process limit). Let be i.i.d. uniform on with empirical distribution function , and let . Then in , where is the Brownian bridge.

Proof sketch. The fdds of are multivariate-CLT limits: for , the vector is a normalised sum of i.i.d. indicator vectors, converging by the multivariate central limit theorem 37.03.02 to a centred Gaussian with covariance , which is exactly the Brownian-bridge covariance. Tightness in the topology follows from a moment bound for , which controls Billingsley's -modulus. Prokhorov's theorem then upgrades fdd convergence plus tightness to weak convergence, and the covariance identifies the limit as .

Connections Master

The Lindeberg-Feller central limit theorem 37.03.02 is the finite-dimensional core of this unit: the convergence of every finite-dimensional projection to the corresponding Gaussian vector is precisely a multivariate Lindeberg-Feller statement about the increments, and the triangular-array Donsker theorem uses the Lindeberg condition verbatim to drive both the fdd convergence and, through a Lindeberg-type maximal inequality, the tightness; Donsker is Lindeberg-Feller read on the whole path rather than at the endpoint.

Brownian motion and the Wiener process 02.15.01 is the limit object and the proof's scaffolding at once: the rescaled walk converges to Brownian motion, the Skorokhod embedding realises the walk as a Brownian motion sampled at random times, Brownian scaling invariance turns the time-changed motion back into a standard one, and the reflection principle and the Lebesgue-null zero set are what make the maximum and arcsine corollaries computable on the limit.

The strong Markov property, recurrence and transience 37.05.04 underwrites the Skorokhod embedding: the independent restart of Brownian motion after each interval-exit time is exactly the strong Markov property at a stopping time, which is what licenses iterating the single-variable embedding into an i.i.d. sequence of stopping times whose sums match the walk, and the a.s. finiteness of those exit times is a recurrence statement for one-dimensional Brownian motion.

The Gnedenko-Kolmogorov classification of triangular-array limits as infinitely divisible laws [37.03.03 successors in this chapter] is the destination once the continuity hypothesis is relaxed: replacing by with the Skorokhod topology lets the functional limit acquire jumps, and the invariance principle then converges to a general Lévy process whose marginal is the infinitely divisible law singled out by dropping the Lindeberg negligibility condition.

Historical & philosophical context Master

The functional point of view originated with the 1946 paper of Paul Erdős and Mark Kac [Erdős-Kac 1946], who computed limit laws for the maximum and for the number of positive partial sums of a random walk by an invariance argument: they observed that these laws depend only on the increment variance, not on the increment distribution, and so could be evaluated on the most convenient case. Monroe Donsker, in his 1951 memoir [Donsker 1951], turned this heuristic into a theorem by proving weak convergence of the rescaled walk to Brownian motion in the path space , so that every continuous functional inherits its limit law from Brownian motion in one stroke. Yuri Prokhorov's 1956 work [Prokhorov 1956] supplied the abstract foundation — the equivalence of tightness and relative compactness for laws on a complete separable metric space — that makes the tightness-plus-fdd method rigorous and general.

The embedding route to Donsker's theorem is due to Anatoliy Skorokhod [Skorokhod 1965], who showed any centred finite-variance law can be realised as Brownian motion stopped at a suitable random time, reducing the invariance principle to a law-of-large-numbers control on the embedded clock. Patrick Billingsley's monograph systematised weak convergence on and and made the modulus-of-continuity tightness criteria standard. Donsker's principle is the prototype of a now-pervasive pattern in probability: a discrete combinatorial model, viewed at its natural scaling, converges to a universal continuous object whose law absorbs the details, a pattern that recurs in the convergence of interface models to the Schramm-Loewner evolution and of random trees to the Brownian continuum random tree.

Bibliography Master

@article{donsker1951,
  author  = {Donsker, Monroe D.},
  title   = {An invariance principle for certain probability limit theorems},
  journal = {Memoirs of the American Mathematical Society},
  volume  = {6},
  pages   = {1--12},
  year    = {1951}
}

@article{prokhorov1956,
  author  = {Prokhorov, Yuri V.},
  title   = {Convergence of random processes and limit theorems in probability theory},
  journal = {Theory of Probability and Its Applications},
  volume  = {1},
  number  = {2},
  pages   = {157--214},
  year    = {1956}
}

@book{skorokhod1965,
  author    = {Skorokhod, Anatoliy V.},
  title     = {Studies in the Theory of Random Processes},
  publisher = {Addison-Wesley, Reading, MA},
  year      = {1965}
}

@article{erdoskac1946,
  author  = {Erd\H{o}s, Paul and Kac, Mark},
  title   = {On certain limit theorems of the theory of probability},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {52},
  pages   = {292--302},
  year    = {1946}
}

@book{billingsley1999,
  author    = {Billingsley, Patrick},
  title     = {Convergence of Probability Measures},
  edition   = {2nd},
  publisher = {John Wiley \& Sons, New York},
  year      = {1999}
}

@book{ethierkurtz1986,
  author    = {Ethier, Stewart N. and Kurtz, Thomas G.},
  title     = {Markov Processes: Characterization and Convergence},
  publisher = {John Wiley \& Sons, New York},
  year      = {1986}
}

@book{durrett2019donsker,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5th},
  publisher = {Cambridge University Press},
  year      = {2019}
}