37.03.03 · probability / 03-clt-characteristic-functions

Donsker's Invariance Principle and the Functional Central Limit Theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §8.1-8.6; Billingsley, Convergence of Probability Measures (Wiley 2nd ed., 1999) Ch. 1-3; Ethier-Kurtz, Markov Processes: Characterization and Convergence (Wiley, 1986) Ch. 3; Whitt, Stochastic-Process Limits (Springer, 2002) Ch. 3-13

Intuition Beginner

The ordinary central limit theorem looks at one number: the position of a random walker after many steps. It says that single endpoint, rescaled, follows a bell curve. But a walk is not just its endpoint — it is a whole trajectory, a wiggly line traced out over time. The invariance principle asks the bolder question: what does the entire rescaled path look like when the walk is long?

The answer is that the whole path, shrunk down and viewed from far away, looks like a single fixed shape of randomness: Brownian motion, the continuous jittering path you would get from a speck of pollen in water. No matter what the individual steps look like — coin flips, dice rolls, any spread-out shocks with no built-in drift and a finite typical size — the coarse-grained trajectory forgets those details and settles into the same continuous random curve. That insensitivity to the fine print is why it is called an invariance principle: the limit is invariant under the choice of step distribution.

This is powerful because once you know the limiting shape, you can read off the behaviour of any quantity that depends on the path, not just the endpoint. How high did the walk ever get? How long did it spend on the positive side? When did it last return to zero? Each of these is a feature of the whole trajectory, and each has a limit you can compute by asking the same question of Brownian motion instead. One limit theorem about paths replaces a whole zoo of separate calculations.

The one-sentence takeaway: rescale a long random walk and the entire path converges to Brownian motion, so every question about the path's shape gets answered by the same universal continuous curve.

Visual Beginner

Picture three panels side by side. The leftmost shows a short random walk: a jagged staircase of a few up-and-down steps, blocky and visibly made of discrete jumps. The middle panel shows the same kind of walk with many more steps, drawn smaller so it fits the same box; the staircase is finer, the blockiness fading. The rightmost panel shows a very long walk squeezed into the box: the steps are now so small and so numerous that the line looks like a smooth-but-everywhere-jittery continuous curve — Brownian motion.

The key to the picture is the two different squeezes. Time is compressed by the number of steps, so the whole walk fits in one unit of width. Height is compressed only by the square root of that number, because a random walk's typical spread grows like the square root of time, not like time. Use the wrong vertical squeeze and the picture either flattens to a straight line or blows up to noise; use the square-root squeeze and the discrete walk locks onto the continuous Brownian shape.

Worked example Beginner

Take the simplest walk: at each step flip a fair coin and move up by $1$ on heads, down by $1$ on tails. After $n$ steps the walker sits at a position we call $S_{n}$ , the running total of the coin steps. We want to see how to rescale so that the path settles down as $n$ grows.

Step 1. Find the typical spread after $n$ steps. Each step has spread $1$ (it is plus or minus $1$ ). Spreads-squared add over independent steps, so after $n$ steps the spread-squared is $n$ and the spread is the square root of $n$ . So $S_{n}$ typically sits a distance of about the square root of $n$ from zero.

Step 2. Rescale the height. To keep the typical spread fixed at $1$ no matter how big $n$ is, divide the position by the square root of $n$ . The rescaled height is $S_{n}$ divided by the square root of $n$ , which by the ordinary central limit theorem follows a standard bell curve.

Step 3. Rescale time. We want the whole walk of $n$ steps to occupy the time window from $0$ to $1$ . So step number $k$ is placed at time $k$ divided by $n$ . At time $t$ between $0$ and $1$ , the walk has taken about $n t$ steps (rounded down to a whole number), and its rescaled height is the position $S$ at that step, divided by the square root of $n$ .

Step 4. Read the limit at a sample time. Take $t = \frac{1}{4}$ . The walk has taken about $n /4$ steps, with spread-squared $n /4$ , so the rescaled height at time $\frac{1}{4}$ has spread-squared $(n /4)$ divided by $n$ , which is $\frac{1}{4}$ , and spread $\frac{1}{2}$ . A Brownian motion at time $\frac{1}{4}$ also has spread $\frac{1}{2}$ . The numbers match.

Step 5. What this tells us: with time squeezed by $n$ and height squeezed by the square root of $n$ , the rescaled coin-flip path lines up with Brownian motion time-point by time-point — and the invariance principle says the whole continuous path lines up too, not just one instant.
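The spread computed in Step 4 can be checked by simulation. The following is a minimal sketch, assuming fair coin-flip steps; the function names `rescaled_height` and `empirical_spread`, the trial count, and the seed are illustrative choices, not part of the text above.

```python
import math
import random

def rescaled_height(n, t, rng):
    """Return S_{floor(n t)} / sqrt(n) for one fair coin-flip walk."""
    steps = int(n * t)                      # number of steps taken by time t
    position = sum(rng.choice((-1, 1)) for _ in range(steps))
    return position / math.sqrt(n)          # height squeezed by sqrt(n)

def empirical_spread(n, t, trials=4000, seed=0):
    """Root-mean-square of the rescaled height at time t over many walks."""
    rng = random.Random(seed)
    total = sum(rescaled_height(n, t, rng) ** 2 for _ in range(trials))
    return math.sqrt(total / trials)

if __name__ == "__main__":
    # Step 4 predicts a spread close to 1/2 at t = 1/4 for every n.
    for n in (16, 100, 400):
        print(n, round(empirical_spread(n, 0.25), 3))
```

With the square-root height squeeze the estimated spread stays close to $\frac{1}{2}$ as $n$ changes; dividing by $n$ instead would send it to $0$ .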

Check your understanding Beginner

Exercise (easy, multiple choice).

To make a long random walk converge to Brownian motion, how should you rescale position and time?

A. Divide both position and time by the number of steps. B. Divide position by the number of steps and time by its square root. C. Divide time by the number of steps and position by the square root of the number of steps. D. Leave time alone and divide position by the number of steps.

Hint

A walk's typical spread grows like the square root of elapsed time, so the height squeeze must match that growth.

Answer

C. Divide time by the number of steps and position by the square root of the number of steps. Time is compressed by $n$ so the walk fills one unit of time; position is compressed by the square root of $n$ to match the square-root growth of the walk's spread. Feedback-correct: this is exactly the scaling under which Brownian motion is self-similar, which is why the discrete walk locks onto it. Feedback-wrong: A, B, and D all divide position by the number of steps, which over-squeezes the height and flattens the path to a line; D also leaves time uncompressed, so the walk never fits into one unit of time. Only the square-root height squeeze keeps the spread fixed.

Formal definition Intermediate+

Let $X_{1}, X_{2}, \dots$ be independent and identically distributed real random variables with $E [X_{1}] = 0$ and $Var (X_{1}) = σ^{2} \in (0, \infty)$ . Write $S_{0} = 0$ and $S_{k} = X_{1} + \dots + X_{k}$ for the random walk. The rescaled path is the random function $W_{n} \in C [0, 1]$ defined by linear interpolation of the rescaled partial sums: $$ W_n(t) = \frac{1}{\sigma\sqrt{n}}\Big( S_{\lfloor nt\rfloor} + (nt - \lfloor nt\rfloor)\,X_{\lfloor nt\rfloor + 1} \Big), \qquad t \in [0,1]. $$ At the grid points $t = k / n$ this is $S_{k} / (σ \sqrt{n})$ ; between them it interpolates linearly, so $W_{n}$ is a genuine continuous path, that is, a random element of the space $C [0, 1]$ of continuous functions on $[0, 1]$ , equipped with the supremum norm $∥ f ∥ = \sup_{t} ∣ f (t) ∣$ and its Borel $σ$ -algebra. (Some authors use the right-continuous step interpolation $S_{⌊ n t ⌋} / (σ \sqrt{n})$ , a random element of the Skorokhod space $D [0, 1]$ ; the two interpolations differ by at most $\max_{k \leq n} ∣ X_{k} ∣/ (σ \sqrt{n}) \to 0$ in probability, so they share the same weak limit.)
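To make the definition concrete, here is a small sketch that evaluates the linear interpolation $W_{n} (t)$ and the step interpolation for a given list of increments. The function names `W_linear` and `W_step` and the sample increments are hypothetical, chosen only for illustration.

```python
import math

def partial_sums(steps):
    """Return [S_0, S_1, ..., S_n] for the given increments X_1, ..., X_n."""
    sums = [0.0]
    for x in steps:
        sums.append(sums[-1] + x)
    return sums

def W_linear(steps, sigma, t):
    """Linearly interpolated rescaled path W_n(t), for t in [0, 1]."""
    n = len(steps)
    sums = partial_sums(steps)
    k = min(int(math.floor(n * t)), n)       # floor(n t)
    frac = n * t - k                         # n t - floor(n t)
    nxt = steps[k] if k < n else 0.0         # X_{floor(nt)+1}; not needed at t = 1
    return (sums[k] + frac * nxt) / (sigma * math.sqrt(n))

def W_step(steps, sigma, t):
    """Right-continuous step interpolation S_{floor(n t)} / (sigma sqrt(n))."""
    n = len(steps)
    k = min(int(math.floor(n * t)), n)
    return partial_sums(steps)[k] / (sigma * math.sqrt(n))

if __name__ == "__main__":
    steps = [1, -1, 1, 1]                    # n = 4 increments, sigma taken as 1
    for t in (0.0, 0.25, 0.375, 0.5, 1.0):
        print(t, W_linear(steps, 1.0, t), W_step(steps, 1.0, t))
```

At grid points the two versions agree, and between grid points they differ by at most one rescaled increment, which is the bound quoted in the parenthetical remark.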

Definition (weak convergence in a metric space). Let $(S, d)$ be a metric space with Borel $σ$ -algebra. A sequence of probability measures $μ_{n}$ on $S$ converges weakly to $μ$ , written $μ_{n} \Rightarrow μ$ , if $\int f d μ_{n} \to \int f d μ$ for every bounded continuous $f : S \to R$ . For random elements $W_{n}, W$ of $S$ this is $W_{n} \Rightarrow W$ , meaning their laws converge weakly. On $S = C [0, 1]$ this is convergence of the full path law, strictly stronger than convergence of any fixed finite collection of coordinates.

Definition (finite-dimensional distributions). For $0 \leq t_{1} < \dots < t_{m} \leq 1$ the finite-dimensional projection $π_{t_{1}, \dots, t_{m}} : C [0, 1] \to R^{m}$ sends $f \mapsto (f (t_{1}), \dots, f (t_{m}))$ . The finite-dimensional distributions (fdds) of a random path are the laws of these projections. Convergence of all fdds, $π_{t_{1}, \dots, t_{m}} (W_{n}) \Rightarrow π_{t_{1}, \dots, t_{m}} (W)$ for every finite tuple, is necessary for $W_{n} \Rightarrow W$ but not sufficient.

Definition (tightness). A family $\{μ_{n}\}$ of probability measures on a metric space is tight if for every $ε > 0$ there is a compact set $K$ with $μ_{n} (K) \geq 1 - ε$ for all $n$ . On $C [0, 1]$ , the Arzelà-Ascoli theorem characterises compactness by a uniform bound and equicontinuity, so tightness is controlled by the modulus of continuity $ω_{f} (δ) = \sup_{∣ s - t ∣ \leq δ} ∣ f (s) - f (t) ∣$ : the laws of the $W_{n}$ form a tight family precisely when $\{W_{n} (0)\}$ is tight in $R$ and $$ \lim_{\delta\downarrow 0}\ \limsup_{n\to\infty}\ \mathbb{P}\big(\omega_{W_n}(\delta) \ge \varepsilon\big) = 0 \quad\text{for every } \varepsilon > 0. $$

Counterexamples to common slips Intermediate+

Fdd convergence alone does not give path convergence. Take $W_{n}$ supported on a tall thin spike of width $1/ n$ and height $1$ placed at a uniformly random location; every fixed coordinate $W_{n} (t)$ tends to $0$ , so all fdds converge to the zero path, yet $sup_{t} W_{n} (t) = 1$ does not converge to $0$ . The mass escapes between the sampled times. Tightness is exactly the hypothesis that rules this out.
Tightness is not automatic from a variance bound. Bounded second moments of $W_{n} (t)$ at each $t$ control no oscillation; one needs the joint modulus-of-continuity control, which for partial sums comes from a maximal inequality (Etemadi or Kolmogorov), not from marginal variances.
The limit is on $C [0, 1]$ , not on a sequence space. The walk's increments are independent, but the limit object is a single continuous path; reading Donsker as a statement about the sequence $(X_{k})$ rather than about the law on path space loses the continuous-mapping payoff entirely.
Drift must vanish. If $E [X_{1}] = μ \neq 0$ , the centred path $W_{n}$ (built from the increments $X_{k} - μ$ ) converges to Brownian motion, but the un-centred walk $S_{⌊ n t ⌋} / (σ \sqrt{n})$ carries a drift term of size $μ \sqrt{n}\, t / σ$ that diverges; the correct scaling for nonzero drift is the law of large numbers at scale $n$ , with the Brownian fluctuation a lower-order correction.

Key theorem with proof Intermediate+

Theorem (Donsker's invariance principle / functional CLT). Let $X_{1}, X_{2}, \dots$ be i.i.d. with $E [X_{1}] = 0$ and $Var (X_{1}) = σ^{2} \in (0, \infty)$ , and let $W_{n} \in C [0, 1]$ be the rescaled interpolated path above. Then $W_{n} \Rightarrow B$ in $C [0, 1]$ , where $B = (B_{t})_{t \in [0, 1]}$ is standard Brownian motion 02.15.01.

Proof. By Prokhorov's theorem the strategy is fdd convergence plus tightness: on a complete separable metric space, if $\{W_{n}\}$ is tight and every finite-dimensional projection converges to that of $B$ , then $W_{n} \Rightarrow B$ . We supply both ingredients, taking the tightness route through the Skorokhod embedding, which also makes the fdd convergence transparent.

Step 1: Skorokhod embedding. There is a stopping time $T$ for a standard Brownian motion $B$ with $E [T] = σ^{2}$ and $B_{T} \overset{d}{=} X_{1}$ (the strong Markov property 37.05.04 underwrites the construction and the independent restarts below). Iterating with i.i.d. copies, there are stopping times $0 = T_{0} \leq T_{1} \leq T_{2} \leq \dots$ with i.i.d. increments $τ_{k} = T_{k} - T_{k - 1}$ , $E [τ_{k}] = σ^{2}$ , such that the embedded sums match the walk in law: $$ (B_{T_1}, B_{T_2}, \dots, B_{T_k}, \dots) \overset{d}{=} (S_1, S_2, \dots, S_k, \dots). $$ So we may assume the walk is the Brownian motion sampled at the random times $T_{k}$ , i.e. $S_{k} = B_{T_{k}}$ .

Step 2: the embedded times are nearly deterministic. By the strong law of large numbers, $T_{k} / k \to σ^{2}$ almost surely, so $T_{⌊ n t ⌋} / n \to σ^{2} t$ uniformly in $t \in [0, 1]$ (a monotone-limit / Dini argument upgrades the pointwise convergence of the increasing functions $t \mapsto T_{⌊ n t ⌋} / n$ to a uniform one, since the limit $t \mapsto σ^{2} t$ is continuous). Hence the random clock $T_{⌊ n t ⌋}$ tracks the deterministic clock $σ^{2} n t$ to leading order.

Step 3: Brownian scaling. Define the rescaled Brownian motion $\widetilde B_{n} (t) = B_{σ^{2} n t} / (σ \sqrt{n})$ . By the scaling invariance of Brownian motion 02.15.01, $\widetilde B_{n}$ is itself a standard Brownian motion on $[0, 1]$ for every $n$ ; in particular its law is exactly that of $B$ , not merely close to it.
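As a quick check of the scaling claim in Step 3, written here for the rescaled process $\widetilde B_{n} (t) = B_{σ^{2} n t} / (σ \sqrt{n})$ , the covariance can be computed directly from $E [B_{u} B_{v}] = u \land v$ : $$ E\big[\widetilde B_n(s)\,\widetilde B_n(t)\big] = \frac{E\big[B_{\sigma^2 n s}\,B_{\sigma^2 n t}\big]}{\sigma^2 n} = \frac{\sigma^2 n\,(s \land t)}{\sigma^2 n} = s \land t, \qquad s, t \in [0,1]. $$ Since $\widetilde B_{n}$ is a centred Gaussian process with continuous paths, started at $0$ , this covariance identifies it as a standard Brownian motion.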

Step 4: compare the walk's path to the scaled Brownian motion. The grid value of $W_{n}$ is $W_{n} (k / n) = B_{T_{k}} / (σ \sqrt{n})$ , while $\widetilde B_{n} (k / n) = B_{σ^{2} k} / (σ \sqrt{n})$ . Their difference at any $t$ is governed by the gap $∣ T_{⌊ n t ⌋} - σ^{2} n t ∣$ , which is $o (n)$ uniformly by Step 2, together with the uniform continuity (on compacts) of the Brownian path. Quantitatively, for the modulus of continuity of $B$ over the time horizon $[0, σ^{2} n (1 + o (1))]$ , $$ \sup_{t\in[0,1]} \big| W_n(t) - \widetilde B_n(t) \big| \le \frac{1}{\sigma\sqrt n}\, \sup_{\substack{|u-v|\le \varepsilon_n \sigma^2 n}} |B_u - B_v| \xrightarrow[n\to\infty]{\ \mathbb{P}\ } 0, $$ where $ε_{n} = \sup_{t} ∣ T_{⌊ n t ⌋} / (σ^{2} n) - t ∣ \to 0$ a.s., so that the clock gap $∣ T_{⌊ n t ⌋} - σ^{2} n t ∣$ is at most $ε_{n} σ^{2} n$ . The right-hand side tends to $0$ in probability because Brownian motion is uniformly continuous on compacts and, after the $σ \sqrt{n}$ rescaling, the oscillation over time-windows of length $ε_{n} σ^{2} n$ is exactly a Brownian oscillation over a window of rescaled length $ε_{n} \to 0$ . The linear-interpolation correction adds at most $\max_{k \leq n} ∣ X_{k} ∣/ (σ \sqrt{n}) \to 0$ in probability.

Step 5: conclude. Since $\widetilde B_{n} \overset{d}{=} B$ exactly and $∥ W_{n} - \widetilde B_{n} ∥ \to 0$ in probability, the converging-together (Slutsky) lemma for the metric space $C [0, 1]$ gives $W_{n} \Rightarrow B$ . The fdd convergence and the tightness demanded by Prokhorov are both consequences of this uniform comparison: fdds converge because finitely many coordinates of $W_{n}$ are within $o (1)$ of coordinates of an exact Brownian motion, and tightness holds because $\{\widetilde B_{n}\}$ is a single tight law (the law of $B$ ) perturbed by a sup-norm-null sequence. $□$

Bridge. Donsker's theorem builds toward the entire theory of weak convergence on path spaces and appears again in empirical-process theory, where the rescaled empirical distribution function converges to a Brownian bridge by the same Prokhorov tightness-plus-fdd template. The foundational reason it holds is that the random walk, embedded into Brownian motion by Skorokhod's stopping times, is a Brownian motion read on a clock that the law of large numbers forces to be asymptotically linear, so the discrete object and its continuous limit literally share a sample path up to a vanishing sup-norm error. This is exactly the functional generalisation of the Lindeberg-Feller central limit theorem 37.03.02: where Lindeberg-Feller turns a triangular array of increments into a single Gaussian endpoint through the characteristic-function product, Donsker turns the same increments — now read as a process — into the whole Gaussian path, and the central insight is that the finite-dimensional Gaussian limits of 37.03.02 are precisely the fdds of Brownian motion, with tightness the one extra ingredient that lifts a family of endpoint theorems to a single statement about trajectories. The bridge is that fdd convergence is the Lindeberg-Feller content and tightness is the path-space content, and Prokhorov's theorem is the gear that meshes them.

Exercises Intermediate+

Exercise 2 (easy, symbolic).

Show that the two interpolations, the linear interpolation $W_{n}$ and the step interpolation $W_{n}^{\circ} (t) = S_{⌊ n t ⌋} / (σ \sqrt{n})$ , have the same weak limit by bounding their sup-norm difference.

Hint

On each grid interval the linear and step versions differ by at most one increment, rescaled.

Answer

On $[k / n, (k + 1) / n)$ the step version is constant at $S_{k} / (σ \sqrt{n})$ while the linear version moves from $S_{k} / (σ \sqrt{n})$ to $S_{k + 1} / (σ \sqrt{n})$ , so $∣ W_{n} (t) - W_{n}^{\circ} (t) ∣ \leq ∣ X_{k + 1} ∣/ (σ \sqrt{n})$ . Taking the supremum, $∥ W_{n} - W_{n}^{\circ} ∥ \leq \max_{1 \leq k \leq n} ∣ X_{k} ∣/ (σ \sqrt{n})$ . Since $E [X_{1}^{2}] < \infty$ , $P (\max_{k} ∣ X_{k} ∣ > ε σ \sqrt{n}) \leq n P (∣ X_{1} ∣ > ε σ \sqrt{n}) \leq ε^{- 2} E [X_{1}^{2}; ∣ X_{1} ∣ > ε σ \sqrt{n}] / σ^{2} \to 0$ by dominated convergence, so the difference is sup-norm-null in probability. By the converging-together lemma the two share the limit $B$ .

Exercise 3 (medium, symbolic).

Use Donsker's theorem and the continuous-mapping theorem to find the limit law of the rescaled maximum $M_{n} = \max_{0 \leq k \leq n} S_{k} / (σ \sqrt{n})$ .

Hint

The maximum functional $f \mapsto max_{t \in [0, 1]} f (t)$ is continuous on $C [0, 1]$ ; then use the reflection principle for Brownian motion.

Answer

The functional $Φ (f) = \max_{t \in [0, 1]} f (t)$ is continuous in the supremum norm (indeed $1$ -Lipschitz: $∣Φ (f) - Φ (g) ∣ \leq ∥ f - g ∥$ ). Since the maximum of the piecewise-linear path $W_{n}$ is attained at a grid point, $M_{n} = Φ (W_{n})$ , and Donsker plus the continuous-mapping theorem give $M_{n} \Rightarrow Φ (B) = \max_{t \in [0, 1]} B_{t}$ . By the reflection principle 02.15.01, $\max_{t \in [0, 1]} B_{t} \overset{d}{=} ∣ B_{1} ∣$ , so $M_{n} \Rightarrow ∣ N (0, 1) ∣$ , the half-normal law: $P (M_{n} \leq x) \to 2Φ (x) - 1$ for $x \geq 0$ , where $Φ$ in this last formula denotes the standard normal distribution function, not the maximum functional. The continuous limit does the work; no direct combinatorics on the walk is needed.
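The half-normal limit can be compared with a simulation. This is a rough sketch under stated assumptions: fair $\pm 1$ steps (so $σ = 1$ ), an illustrative walk length and trial count, and hypothetical function names. The half-normal distribution function is written as `erf(x / sqrt(2))`, which equals twice the standard normal distribution function minus one.

```python
import math
import random

def rescaled_max(n, rng):
    """M_n = max_{0<=k<=n} S_k / sqrt(n) for a fair +/-1 walk (sigma = 1)."""
    s = best = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        best = max(best, s)                  # running maximum, includes S_0 = 0
    return best / math.sqrt(n)

def empirical_cdf_of_max(x, n=400, trials=2000, seed=0):
    """Fraction of simulated walks with rescaled maximum at most x."""
    rng = random.Random(seed)
    return sum(rescaled_max(n, rng) <= x for _ in range(trials)) / trials

def half_normal_cdf(x):
    """P(|N(0,1)| <= x) for x >= 0."""
    return math.erf(x / math.sqrt(2.0))

if __name__ == "__main__":
    for x in (0.5, 1.0, 2.0):
        print(x, empirical_cdf_of_max(x), round(half_normal_cdf(x), 4))
```

For finite $n$ the empirical values sit slightly above the limit because the walk lives on a lattice; the gap shrinks as $n$ grows.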

Exercise 4 (medium, symbolic).

State the modulus-of-continuity tightness criterion on $C [0, 1]$ and explain why a maximal inequality for partial sums (e.g. $P (max_{k \leq m} ∣ S_{k} ∣ \geq λ) \leq C Var (S_{m}) / λ^{2}$ ) is what supplies it for Donsker.

Hint

Tightness on $C [0, 1]$ needs $lim sup_{n} P (ω_{W_{n}} (δ) \geq ε) \to 0$ as $δ ↓ 0$ ; the oscillation of $W_{n}$ over a window of width $δ$ is the max of a partial sum over $\approx δ n$ steps.

Answer

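A family $\{W_{n}\}$ with $W_{n} (0) = 0$ is tight on $C [0, 1]$ iff $\lim_{δ ↓ 0} \limsup_{n} P (ω_{W_{n}} (δ) \geq ε) = 0$ for all $ε > 0$ . The oscillation of $W_{n}$ over an interval of length $δ$ is, up to the interpolation error, $\max_{∣ j - k ∣ \leq δ n} ∣ S_{j} - S_{k} ∣/ (σ \sqrt{n})$ , the maximum fluctuation of the walk over about $δ n$ steps. A second-moment maximal inequality bounds $P (\max_{i \leq δ n} ∣ S_{i} ∣ \geq ε σ \sqrt{n})$ by $C (δ n σ^{2}) / (ε^{2} σ^{2} n) = C δ / ε^{2}$ , uniform in $n$ and vanishing as $δ \to 0$ . Covering $[0, 1]$ takes a union bound over $⌈ 1/ δ ⌉$ windows, so the per-window probability must be $o (δ)$ , and the second-moment bound alone only yields a total of order $1/ ε^{2}$ . The gap is closed by a maximal inequality of Etemadi or Ottaviani type, which bounds the probability for the running maximum by a constant multiple of the largest tail probability of a single partial sum at a comparable level; by the central limit theorem that single-sum tail is, for large $n$ , Gaussian in $ε / \sqrt{δ}$ and hence $o (δ)$ , so the sum over windows vanishes as $δ \to 0$ . The maximal inequality, not the marginal variance, controls the oscillation, which is the path-space content of the theorem.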
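The tightness criterion can be explored numerically by estimating the probability of a large oscillation for shrinking window widths. The sketch below is an assumption-laden illustration: it uses fair $\pm 1$ steps, evaluates the modulus only at grid points (ignoring the interpolation error mentioned above), and the names `grid_modulus` and `prob_large_oscillation` are hypothetical.

```python
import math
import random

def rescaled_walk(n, rng):
    """Grid values W_n(k/n) = S_k / sqrt(n) for a fair +/-1 walk (sigma = 1)."""
    s, path = 0, [0.0]
    for _ in range(n):
        s += rng.choice((-1, 1))
        path.append(s / math.sqrt(n))
    return path

def grid_modulus(path, delta):
    """Largest |W(j/n) - W(k/n)| over grid points with |j - k| <= delta * n."""
    n = len(path) - 1
    window = max(1, int(delta * n))
    worst = 0.0
    for j in range(n + 1):
        for k in range(j + 1, min(j + window, n) + 1):
            worst = max(worst, abs(path[k] - path[j]))
    return worst

def prob_large_oscillation(delta, eps, n=200, trials=200, seed=0):
    """Estimate P(grid modulus at width delta >= eps) over simulated walks."""
    rng = random.Random(seed)
    hits = sum(grid_modulus(rescaled_walk(n, rng), delta) >= eps
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for delta in (0.2, 0.05, 0.01):
        print(delta, prob_large_oscillation(delta, eps=0.5))
```

Because every call reuses the same seed, the estimates are computed on the same simulated paths, so they can only decrease as the window shrinks.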

Exercise 5 (medium, symbolic).

Prove the converging-together (Slutsky) lemma in a metric space: if $W_{n} \Rightarrow W$ and $d (W_{n}, V_{n}) \to 0$ in probability, then $V_{n} \Rightarrow W$ .

Hint

Use a bounded Lipschitz test function and split on the event that $d (W_{n}, V_{n})$ is small.

Answer

It suffices to show $E [g (V_{n})] \to E [g (W)]$ for every bounded Lipschitz $g$ with constant $L$ and bound $K$ (these determine weak convergence on a metric space). Write $∣ E g (V_{n}) - E g (W) ∣ \leq ∣ E g (V_{n}) - E g (W_{n}) ∣ + ∣ E g (W_{n}) - E g (W) ∣$ . The second term $\to 0$ since $W_{n} \Rightarrow W$ . For the first, fix $ε > 0$ and split on $A_{n} = {d (W_{n}, V_{n}) \leq ε}$ : on $A_{n}$ , $∣ g (V_{n}) - g (W_{n}) ∣ \leq L ε$ ; off $A_{n}$ , it is $\leq 2 K$ . So $∣ E g (V_{n}) - E g (W_{n}) ∣ \leq L ε + 2 K P (d (W_{n}, V_{n}) > ε)$ . The probability $\to 0$ by hypothesis, leaving $L ε$ ; let $ε \to 0$ . Both terms vanish, so $V_{n} \Rightarrow W$ .

Exercise 6 (hard, short-answer).

Derive the limit law of the rescaled occupation time $L_{n} = \frac{1}{n} \# \{1 \leq k \leq n : S_{k} > 0\}$ , the fraction of time the walk spends positive, and identify the resulting distribution by name.

Hint

The occupation functional $f \mapsto \int_{0}^{1} 1_{\{f (t) > 0\}} d t$ is not continuous everywhere on $C [0, 1]$ , but it is continuous at $B$ -almost-every path; use the a.e.-continuous form of the continuous-mapping theorem.

Answer

Let $Ψ (f) = \int_{0}^{1} 1_{\{f (t) > 0\}} d t$ . Then $L_{n} = Ψ (W_{n}^{\circ}) + o (1)$ , the Riemann-sum version of $Ψ$ along the grid. $Ψ$ fails to be sup-norm continuous only at paths spending positive Lebesgue time at level $0$ ; Brownian motion does so with probability $0$ (its zero set has Lebesgue measure $0$ ). So $Ψ$ is continuous on a set of full Wiener measure, and the continuous-mapping theorem in its almost-everywhere form gives $L_{n} \Rightarrow Ψ (B) = \int_{0}^{1} 1_{\{B_{t} > 0\}} d t$ . By Lévy's arcsine law this occupation time has the arcsine distribution on $[0, 1]$ , with density $\frac{1}{π \sqrt{x (1 - x)}}$ and distribution function $P (Ψ (B) \leq x) = \frac{2}{π} \arcsin \sqrt{x}$ . The fraction of time spent positive is therefore most likely near $0$ or $1$ , not near $\frac{1}{2}$ : the counterintuitive arcsine phenomenon, which Donsker transfers verbatim from Brownian motion to the random walk (the Erdős-Kac theorem ^{[Erdős-Kac 1946]}).
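The arcsine limit for the positive-time fraction can be checked against a simulation. The sketch below assumes fair $\pm 1$ steps and uses the arcsine distribution function $\frac{2}{π} \arcsin \sqrt{x}$ ; the function names, walk length, and trial count are illustrative assumptions.

```python
import math
import random

def positive_fraction(n, rng):
    """L_n = (1/n) * #{1 <= k <= n : S_k > 0} for a fair +/-1 walk."""
    s = positive = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        positive += s > 0                    # strict positivity, as in L_n
    return positive / n

def arcsine_cdf(x):
    """Arcsine distribution function on [0, 1]: (2/pi) * asin(sqrt(x))."""
    return (2.0 / math.pi) * math.asin(math.sqrt(x))

def empirical_cdf(x, n=400, trials=2000, seed=0):
    """Fraction of simulated walks whose positive-time fraction is at most x."""
    rng = random.Random(seed)
    return sum(positive_fraction(n, rng) <= x for _ in range(trials)) / trials

if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, empirical_cdf(x), round(arcsine_cdf(x), 4))
```

At $x = \frac{1}{4}$ the limit is $\frac{2}{π} \cdot \frac{π}{6} = \frac{1}{3}$ , so roughly a third of long walks spend at most a quarter of their time positive, far more than a bell-shaped intuition would suggest.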

Exercise 7 (hard, short-answer).

Sketch why convergence of all finite-dimensional distributions does not imply weak convergence on $C [0, 1]$ , and state precisely what Prokhorov's theorem adds.

Hint

Construct mass that escapes between the sampled times; then recall what relative compactness on a Polish space is equivalent to.

Answer

Let $g_{n}$ be the triangular spike of height $1$ and base width $1/ n$ centred at a point $U_{n}$ uniform on $[0, 1]$ , and set $V_{n} = g_{n}$ . For any fixed $t$ , $P (V_{n} (t) \neq 0) \leq 1/ n \to 0$ , so $V_{n} (t) \to 0$ in probability and every fdd converges to that of the zero path. But $∥ V_{n} ∥ = 1$ for all $n$ , so $Φ (V_{n}) = \max_{t} V_{n} (t) = 1$ does not converge to $0 = Φ (0)$ ; the law of $V_{n}$ does not converge to the point mass at $0$ . The fdds miss the mass concealed between sample times. Prokhorov's theorem supplies the missing ingredient: on a Polish space, a family is relatively compact (every subsequence has a weakly convergent sub-subsequence) iff it is tight. With tightness, every subsequential limit exists, and fdd convergence then pins down that limit uniquely (fdds determine a Borel measure on $C [0, 1]$ ). Tightness fails for the spikes because their modulus of continuity $ω_{V_{n}} (δ) \to 1$ does not vanish.
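The spike construction is easy to probe numerically. In this sketch (the names `spike` and `prob_nonzero_at`, the sample time, and the trial count are assumptions made for illustration), the chance that a fixed time sees the spike is about $1/ n$ , while the peak height stays at $1$ .

```python
import random

def spike(n, centre, t):
    """Triangular spike of height 1 and base width 1/n centred at `centre`."""
    half_width = 0.5 / n
    return max(0.0, 1.0 - abs(t - centre) / half_width)

def prob_nonzero_at(n, t, trials=20000, seed=0):
    """Estimate P(V_n(t) != 0) when the centre is uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = sum(spike(n, rng.random(), t) > 0.0 for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The first number shrinks like 1/n; the peak value is always 1.
        print(n, prob_nonzero_at(n, 0.3), spike(n, 0.42, 0.42))
```

Every fixed coordinate goes quiet while the supremum does not, which is the failure of tightness in miniature.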

Advanced results Master

The continuous-mapping theorem is the engine that converts the single weak limit $W_{n} \Rightarrow B$ into a library of corollaries. For a measurable $Φ : C [0, 1] \to R$ whose discontinuity set $D_{Φ}$ has Wiener measure zero, $W_{n} \Rightarrow B$ gives $Φ (W_{n}) \Rightarrow Φ (B)$ . Three functionals carry the weight. The maximum $Φ (f) = \max_{t} f (t)$ is Lipschitz, so $\max_{k \leq n} S_{k} / (σ \sqrt{n}) \Rightarrow \max_{t} B_{t} \overset{d}{=} ∣ B_{1} ∣$ , half-normal. The occupation time $Φ (f) = \int_{0}^{1} 1_{\{f (t) > 0\}} d t$ is continuous off the Wiener-null set of paths lingering at level zero, so the positive-time fraction converges to the arcsine law with density $1/ (π \sqrt{x (1 - x)})$ . The last zero $Φ (f) = \sup \{t : f (t) = 0\}$ and the argmax $Φ (f) = \arg\max_{t} f (t)$ are continuous off Wiener-null sets and both converge to arcsine-distributed limits; the coincidence of these three arcsine laws is Lévy's theorem on Brownian occupation, last exit, and the location of the maximum.

The argmax functional is the path-space face of a recurring statistical phenomenon: the location of the extreme of a random walk, suitably rescaled, is arcsine, hence concentrated near the endpoints of the interval. The same continuous-mapping mechanism, applied to the functional $f \mapsto \int_{0}^{1} f (t)^{2} d t$ , gives the limit law of $\sum_{k \leq n} S_{k}^{2} / (σ^{2} n^{2})$ as $\int_{0}^{1} B_{t}^{2} d t$ , whose Laplace transform is computable from the Cameron-Martin-Girsanov theory and underlies the asymptotics of unit-root statistics in econometrics.
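A simple sanity check on the squared-integral functional is its mean: by Fubini, $E \int_{0}^{1} B_{t}^{2} d t = \int_{0}^{1} t \, d t = \frac{1}{2}$ , and for fair $\pm 1$ steps the walk statistic has mean exactly $(n + 1) / (2 n)$ . The sketch below estimates that mean by simulation; the function names and sample sizes are illustrative assumptions.

```python
import random

def rescaled_square_integral(n, rng):
    """sum_{k<=n} S_k^2 / n^2 for a fair +/-1 walk (sigma = 1)."""
    s = total = 0
    for _ in range(n):
        s += rng.choice((-1, 1))
        total += s * s
    return total / (n * n)

def mean_square_integral(n=200, trials=2000, seed=0):
    """Monte Carlo mean of the rescaled sum of squared partial sums."""
    rng = random.Random(seed)
    return sum(rescaled_square_integral(n, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    n = 200
    # Compare the simulated mean with the exact walk value (n + 1) / (2 n).
    print(mean_square_integral(n), (n + 1) / (2 * n))
```

Matching means is of course much weaker than matching laws, but it confirms that the $σ^{2} n^{2}$ normalisation is the right one.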

Two structural extensions deserve statement. First, the multivariate and triangular-array Donsker theorem: for a triangular array of row-wise independent mean-zero increments satisfying a Lindeberg condition 37.03.02, the interpolated row-sum process converges to Brownian motion in $C [0, 1]$ ; this is the genuine functional generalisation, with the Lindeberg condition controlling fdd convergence (via Lindeberg-Feller) and a Lindeberg-type maximal inequality controlling tightness. Second, the empirical-process invariance principle: the rescaled empirical distribution function $α_{n} (t) = \sqrt{n} (F_{n} (t) - t)$ of i.i.d. uniforms converges in $D [0, 1]$ to the Brownian bridge $B_{t}^{0} = B_{t} - t B_{1}$ , the Donsker theorem of empirical-process theory and the foundation of the Kolmogorov-Smirnov and Cramér-von Mises goodness-of-fit limits.

Tightness on the step-interpolation space $D [0, 1]$ requires the Skorokhod $J_{1}$ topology rather than the supremum norm, because $D [0, 1]$ is not separable under $∥ \cdot ∥$ . The $J_{1}$ metric allows small horizontal as well as vertical perturbations of jump locations, making $D [0, 1]$ a Polish space on which Prokhorov's theorem applies; the modulus of continuity is replaced by Billingsley's $w^{''}$ -modulus, which permits a single jump within each small window. For continuous limits such as Brownian motion the $J_{1}$ and uniform topologies agree on $C [0, 1]$ , so the distinction is invisible in Donsker's theorem itself but essential for limits with jumps (Lévy processes, the topic of the infinitely-divisible classification).

Synthesis. The central insight is that one weak limit on path space, $W_{n} \Rightarrow B$ , generated by Prokhorov's tightness-plus-fdd criterion and proved through the Skorokhod embedding, dissolves an entire catalogue of separate random-walk limit theorems into evaluations of continuous functionals of a single Gaussian path. Putting these together, the maximum, the argmax, the last zero, and the positive-occupation fraction are not four theorems but one — the continuous-mapping theorem applied to Brownian motion — and the arcsine law that governs three of them is the foundational reason the extreme of a long walk sits near its ends rather than its middle. This is exactly the functional lift of Lindeberg-Feller 37.03.02: the finite-dimensional Gaussian limits there are dual to the fdds of Brownian motion here, and tightness is the bridge that promotes a family of endpoint statements to a statement about whole trajectories. The same architecture generalises upward to the empirical-process invariance principle, where the limit is the Brownian bridge and the payoff is the Kolmogorov-Smirnov law, and sideways to the $D [0, 1]$ theory, where relaxing continuity to the $J_{1}$ topology lets the limit acquire jumps and reconnects Donsker to the Gnedenko-Kolmogorov classification of infinitely divisible laws.

Full proof set Master

The invariance principle, the two-interpolation equivalence, the maximum and occupation-time corollaries, and the Slutsky lemma are proved in the Key theorem and Exercises sections. The remaining Master claims are recorded here.

Proposition (Skorokhod embedding). Let $X$ be a real random variable with $E [X] = 0$ and $E [X^{2}] = σ^{2} < \infty$ . There is a stopping time $T$ for a standard Brownian motion $B$ (on a possibly enlarged space carrying an independent randomisation) with $B_{T} \overset{d}{=} X$ and $E [T] = σ^{2}$ .

Proof. For a two-point law $X \in \{- a, b\}$ with $a, b > 0$ and mean zero (so $P (X = b) = a / (a + b)$ , $P (X = - a) = b / (a + b)$ ), let $T = \inf \{t : B_{t} \notin (- a, b)\}$ , the exit time of the interval. Since $B_{t \land T}$ is a bounded martingale, optional stopping 02.15.01 gives $E [B_{T}] = 0$ , and the only boundary values are $- a, b$ , forcing $P (B_{T} = b) = a / (a + b)$ ; thus $B_{T} \overset{d}{=} X$ . Optional stopping on $B_{t}^{2} - t$ gives $E [T] = E [B_{T}^{2}] = a b = σ^{2}$ . For a general centred law, write it as a mixture of mean-zero two-point laws (the Chacón-Walsh / potential-theoretic construction): condition on an independent random pair $(- U, V)$ with $U, V \geq 0$ , drawn so that the conditional law is two-point mean-zero and the mixture reproduces the law of $X$ ; embed each two-point law by an interval-exit time and add an independent randomisation selecting the pair. The resulting $T$ satisfies $B_{T} \overset{d}{=} X$ and, by the tower property applied to $E [T ∣ pair] = U V$ , $E [T] = E [U V] = E [X^{2}] = σ^{2}$ by the mean-zero two-point variance identity $Var = a b$ . The strong Markov property 37.05.04 guarantees that after each exit the post- $T$ process is an independent Brownian motion, so the i.i.d. iteration in the main proof is legitimate. $□$
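The two-point step of the proof has a discrete analogue that is easy to simulate. The sketch below replaces Brownian motion by a fair $\pm 1$ walk (an assumption made purely for illustration, with integer barriers $a$ and $b$ ); for that walk the same optional-stopping argument gives exit probability $a / (a + b)$ at the upper barrier and mean exit time $a b$ . The function names are hypothetical.

```python
import random

def exit_walk(a, b, rng):
    """Run a fair +/-1 walk from 0 until it leaves (-a, b).

    Returns the exit point (either -a or b) and the number of steps taken.
    Assumes a and b are positive integers.
    """
    position = steps = 0
    while -a < position < b:
        position += rng.choice((-1, 1))
        steps += 1
    return position, steps

def exit_statistics(a, b, trials=20000, seed=0):
    """Estimate P(exit at b) and E[exit time] by simulation."""
    rng = random.Random(seed)
    upper = total_steps = 0
    for _ in range(trials):
        position, steps = exit_walk(a, b, rng)
        upper += position == b
        total_steps += steps
    return upper / trials, total_steps / trials

if __name__ == "__main__":
    a, b = 2, 3
    p_upper, mean_time = exit_statistics(a, b)
    # Optional stopping predicts a / (a + b) and a * b respectively.
    print(p_upper, a / (a + b), mean_time, a * b)
```

The stopped walk therefore carries the mean-zero two-point law on $\{- a, b\}$ , whose variance is $a b$ , mirroring the identity $E [T] = E [B_{T}^{2}]$ used above.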

Proposition (Brownian arcsine law for occupation time). Let $A^{+} = \int_{0}^{1} 1_{\{B_{t} > 0\}} d t$ be the time a standard Brownian motion spends positive on $[0, 1]$ . Then $P (A^{+} \leq x) = \frac{2}{π} \arcsin \sqrt{x}$ for $x \in [0, 1]$ .

Proof sketch. The discrete arcsine law of Lévy and Erdős-Kac ^{[Erdős-Kac 1946]} for the simple random walk states that the number of positive partial sums among $S_{1}, \dots, S_{2 n}$ , divided by $2 n$ , converges to the arcsine law; this is proved by a generating-function (Sparre Andersen) combinatorial identity counting sign patterns. Donsker's theorem transfers the limit to $A^{+}$ via the a.e.-continuity of the occupation functional, established by noting that the Wiener-measure of paths with positive Lebesgue time at level $0$ is zero (the Brownian zero set is a.s. Lebesgue-null, being a.s. of Hausdorff dimension $1/2$ ). Alternatively, a direct computation via the Feynman-Kac formula for the resolvent of Brownian motion killed by the rate $λ 1_{\{x > 0\}}$ yields the double Laplace transform $\int_{0}^{\infty} e^{- q s} E [e^{- λ A_{s}^{+}}] d s = ((q + λ) q)^{- 1/2}$ , whose inversion is exactly the arcsine density. Both routes give the distribution function $\frac{2}{π} \arcsin \sqrt{x}$ . $□$

Proposition (Brownian bridge as the empirical-process limit). Let $U_{1}, U_{2}, \dots$ be i.i.d. uniform on $[0, 1]$ with empirical distribution function $F_{n}$ , and let $α_{n} (t) = \sqrt{n} (F_{n} (t) - t)$ . Then $α_{n} \Rightarrow B^{0}$ in $D [0, 1]$ , where $B_{t}^{0} = B_{t} - t B_{1}$ is the Brownian bridge.

Proof sketch. The fdds of $α_{n}$ are multivariate-CLT limits: for $0 \leq t_{1} < \dots < t_{m} \leq 1$ , the vector $(α_{n} (t_{1}), \dots, α_{n} (t_{m}))$ is a normalised sum of i.i.d. indicator vectors, converging by the multivariate central limit theorem 37.03.02 to a centred Gaussian with covariance $t_{i} \land t_{j} - t_{i} t_{j}$ , which is exactly the Brownian-bridge covariance. Tightness in the $J_{1}$ topology follows from a moment bound $E [(α_{n} (t) - α_{n} (s))^{2} (α_{n} (r) - α_{n} (t))^{2}] \leq C (r - s)^{2}$ for $s \leq t \leq r$ , which controls Billingsley's $w^{''}$ -modulus. Prokhorov's theorem then upgrades fdd convergence plus tightness to weak convergence, and the covariance $t \land s - t s$ identifies the limit as $B^{0}$ . $□$
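The covariance identification in the fdd step can be checked by simulation. The sketch below takes the empirical process with the $\sqrt{n}$ normalisation, $\sqrt{n} (F_{n} (t) - t)$ , and estimates its covariance at two times, which should be close to the Brownian-bridge value $s \land t - s t$ for every $n$ ; the function names and sample sizes are assumptions for illustration.

```python
import math
import random

def alpha(sample, t):
    """Empirical process sqrt(n) * (F_n(t) - t) for a sample of uniforms."""
    n = len(sample)
    F_n = sum(u <= t for u in sample) / n    # empirical distribution function
    return math.sqrt(n) * (F_n - t)

def empirical_covariance(s, t, n=50, trials=4000, seed=0):
    """Monte Carlo estimate of E[alpha_n(s) * alpha_n(t)] (both have mean 0)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        total += alpha(sample, s) * alpha(sample, t)
    return total / trials

if __name__ == "__main__":
    s, t = 0.3, 0.7
    # Brownian-bridge covariance: min(s, t) - s * t.
    print(empirical_covariance(s, t), min(s, t) - s * t)
```

The covariance matches exactly at every $n$ because it is the covariance of two indicator variables; the role of the central limit theorem is to make the joint law Gaussian.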

Connections Master

The Lindeberg-Feller central limit theorem 37.03.02 is the finite-dimensional core of this unit: the convergence of every finite-dimensional projection $π_{t_{1}, \dots, t_{m}} (W_{n})$ to the corresponding Gaussian vector is precisely a multivariate Lindeberg-Feller statement about the increments, and the triangular-array Donsker theorem uses the Lindeberg condition verbatim to drive both the fdd convergence and, through a Lindeberg-type maximal inequality, the tightness; Donsker is Lindeberg-Feller read on the whole path rather than at the endpoint.

Brownian motion and the Wiener process 02.15.01 is the limit object and the proof's scaffolding at once: the rescaled walk converges to Brownian motion, the Skorokhod embedding realises the walk as a Brownian motion sampled at random times, Brownian scaling invariance turns the time-changed motion back into a standard one, and the reflection principle and the Lebesgue-null zero set are what make the maximum and arcsine corollaries computable on the limit.

The strong Markov property, recurrence and transience 37.05.04 underwrites the Skorokhod embedding: the independent restart of Brownian motion after each interval-exit time is exactly the strong Markov property at a stopping time, which is what licenses iterating the single-variable embedding into an i.i.d. sequence of stopping times whose sums match the walk, and the a.s. finiteness of those exit times is a recurrence statement for one-dimensional Brownian motion.

The Gnedenko-Kolmogorov classification of triangular-array limits as infinitely divisible laws [37.03.03 successors in this chapter] is the destination once the continuity hypothesis is relaxed: replacing $C [0, 1]$ by $D [0, 1]$ with the Skorokhod $J_{1}$ topology lets the functional limit acquire jumps, and the rescaled partial-sum process then converges to a general Lévy process whose time-one marginal is the infinitely divisible law that arises once the Lindeberg negligibility condition is dropped.

Historical & philosophical context Master

The functional point of view originated with the 1946 paper of Paul Erdős and Mark Kac ^{[Erdős-Kac 1946]}, who computed limit laws for the maximum and for the number of positive partial sums of a random walk by an invariance argument: they observed that these laws depend only on the increment variance, not on the increment distribution, and so could be evaluated on the most convenient case. Monroe Donsker, in his 1951 memoir ^{[Donsker 1951]}, turned this heuristic into a theorem by proving weak convergence of the rescaled walk to Brownian motion in the path space $C [0, 1]$ , so that every continuous functional inherits its limit law from Brownian motion in one stroke. Yuri Prokhorov's 1956 work ^{[Prokhorov 1956]} supplied the abstract foundation — the equivalence of tightness and relative compactness for laws on a complete separable metric space — that makes the tightness-plus-fdd method rigorous and general.

The embedding route to Donsker's theorem is due to Anatoliy Skorokhod ^{[Skorokhod 1965]}, who showed any centred finite-variance law can be realised as Brownian motion stopped at a suitable random time, reducing the invariance principle to a law-of-large-numbers control on the embedded clock. Patrick Billingsley's monograph systematised weak convergence on $C [0, 1]$ and $D [0, 1]$ and made the modulus-of-continuity tightness criteria standard. Donsker's principle is the prototype of a now-pervasive pattern in probability: a discrete combinatorial model, viewed at its natural scaling, converges to a universal continuous object whose law absorbs the details, a pattern that recurs in the convergence of interface models to the Schramm-Loewner evolution and of random trees to the Brownian continuum random tree.

Bibliography Master

@article{donsker1951,
  author  = {Donsker, Monroe D.},
  title   = {An invariance principle for certain probability limit theorems},
  journal = {Memoirs of the American Mathematical Society},
  volume  = {6},
  pages   = {1--12},
  year    = {1951}
}

@article{prokhorov1956,
  author  = {Prokhorov, Yuri V.},
  title   = {Convergence of random processes and limit theorems in probability theory},
  journal = {Theory of Probability and Its Applications},
  volume  = {1},
  number  = {2},
  pages   = {157--214},
  year    = {1956}
}

@book{skorokhod1965,
  author    = {Skorokhod, Anatoliy V.},
  title     = {Studies in the Theory of Random Processes},
  publisher = {Addison-Wesley, Reading, MA},
  year      = {1965}
}

@article{erdoskac1946,
  author  = {Erd\H{o}s, Paul and Kac, Mark},
  title   = {On certain limit theorems of the theory of probability},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {52},
  pages   = {292--302},
  year    = {1946}
}

@book{billingsley1999,
  author    = {Billingsley, Patrick},
  title     = {Convergence of Probability Measures},
  edition   = {2nd},
  publisher = {John Wiley \& Sons, New York},
  year      = {1999}
}

@book{ethierkurtz1986,
  author    = {Ethier, Stewart N. and Kurtz, Thomas G.},
  title     = {Markov Processes: Characterization and Convergence},
  publisher = {John Wiley \& Sons, New York},
  year      = {1986}
}

@book{durrett2019donsker,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5th},
  publisher = {Cambridge University Press},
  year      = {2019}
}

Prerequisites

37.03.02
02.15.01
37.05.04

Tier anchors

beginner: Durrett, Probability: Theory and Examples 5e §8.1 (the rescaled random walk and the invariance picture); Mörters-Peres, Brownian Motion (CUP, 2010) §5.3 (random walk to Brownian motion); physical intuition of a coarse-grained random walk filling out a continuous diffusion
intermediate: Durrett, Probability: Theory and Examples 5e §8.1-8.2 (Donsker's theorem, the continuous-mapping consequences); Billingsley, Convergence of Probability Measures 2e §§7-8, 13-14; Karatzas-Shreve, Brownian Motion and Stochastic Calculus 2e §2.4
master: Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §8.1-8.6; Billingsley, Convergence of Probability Measures (Wiley 2nd ed., 1999) Ch. 1-3; Ethier-Kurtz, Markov Processes: Characterization and Convergence (Wiley, 1986) Ch. 3; Whitt, Stochastic-Process Limits (Springer, 2002) Ch. 3-13

References

Donsker — An invariance principle for certain probability limit theorems · Mem. Amer. Math. Soc. 6 (1951), 1-12
Prokhorov — Convergence of random processes and limit theorems in probability theory · Theory Probab. Appl. 1 (1956), 157-214
Skorokhod — Studies in the Theory of Random Processes · Addison-Wesley, 1965 (the embedding theorem)
Erdős-Kac — On certain limit theorems of the theory of probability · Bull. Amer. Math. Soc. 52 (1946), 292-302
Billingsley — Convergence of Probability Measures · Wiley, 2nd ed., 1999, Ch. 1-3
Durrett — Probability: Theory and Examples · Cambridge University Press, 5th ed., 2019, §8.1-8.6

Estimated time

beginner: 20m
intermediate: 55m
master: 95m