Schilder's Theorem: Small-Noise Large Deviations for Brownian Motion
Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §5.1-§5.2 (Mogulskii, Schilder; the Cameron-Martin rate, exponential tightness in sup norm) and §5.6 (Freidlin-Wentzell); Deuschel & Stroock 1989 *Large Deviations* §1.3-§3.4; Freidlin & Wentzell 2012 *Random Perturbations of Dynamical Systems* 3rd ed. (Springer) §3 (the contraction from Schilder to diffusions)
Intuition Beginner
Watch a particle jiggle under random kicks — Brownian motion — but now slowly turn the kicks down by a small dial. As the noise shrinks, the wandering path is squeezed toward the one boring path that does nothing: it sits at the origin. Schilder's theorem answers the natural follow-up question. If, despite the tiny noise, the path manages to trace out some specific interesting shape instead of staying flat, how unlikely is that, and exactly how is the cost of each shape decided?
The answer is a single, very physical number attached to each candidate shape: its energy. Imagine the shape as the trip of a runner over one unit of time. At each instant the runner has a speed. Square that speed, add it up over the whole trip (and halve it), and you get the energy of that trip. A lazy, slow, smooth path has small energy and is cheap; a frantic, fast, wiggly path has huge energy and is wildly expensive. Schilder's theorem says the chance of the small-noise path looking like a given shape decays exponentially, and the number in the exponent is exactly that shape's energy divided by the size of the noise.
There is a catch that does real work. Only paths with a well-defined speed at (almost) every instant have a finite energy. A genuinely jagged shape — one that has no speed because it changes direction infinitely fast, the way a true Brownian path does — has infinite energy. So in the small-noise limit such jagged shapes are infinitely costly: the rare paths the particle is willing to draw are smooth ones, even though the typical (un-rationed) Brownian path is the jagged kind. The dial flips which paths are cheap.
This is the same recipe you have already seen for a single average, lifted to a whole path. For one average, the cost of landing at a wrong value came from a convex function — for the small-noise Gaussian, that cost was just half the value squared. A path is nothing but its readings at finitely many times, each reading a Gaussian increment with its own half-squared cost; add those costs along the trip, refine the grid, and the sum becomes the energy integral. Schilder's theorem is exactly this: many tiny Gaussian costs, glued along time.
Visual Beginner
Figure: a single time-axis from $t=0$ to $t=1$. Faint grey: a cloud of jagged near-flat sample paths hugging the zero line, the typical small-noise behaviour. Bold: one smooth curve $f$ rising and falling, a candidate shape. Tangent arrows along $f$ mark its speed at sample times; a side gauge sums the squared speeds along the trip and halves the total to read off the energy. A caption notes that a hypothetical infinitely-jagged bold curve would send the gauge to infinity, so only smooth shapes get a finite price.
value
| bold smooth candidate f
| __
| ___/ \___ speed arrows: -> -> -> ->
| __/ \__ square & sum them, then halve:
0 |~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~ energy E(f) = 1/2 * sum (speed)^2
| (grey: jagged near-flat ----------------------------------
| typical small-noise paths) cost of f = E(f) / (noise size)
+--------------------------------- time
0 1
smooth f -> finite energy -> finite, payable cost
jagged f -> no speed -> infinite energy -> forbidden in the limit
Worked example Beginner
Take the simplest interesting shape: the straight ramp , rising at constant speed from to over the unit time interval. We compute its energy and read off the small-noise cost.
Step 1. Find the speed. A straight ramp has constant speed. Going from height to height in time means speed at every instant.
Step 2. Square the speed. The squared speed is , and it is the same constant all along the trip.
Step 3. Add it up over the trip and halve. The squared speed is over a time interval of length , so the total is . Halving gives the energy .
Step 4. Read off the small-noise cost. If the noise size is , the chance the small-noise path looks like this ramp decays like . At that is ; at it is .
What this tells us. A cheaper shape is one that reaches the same place more lazily. A path that climbs to the same final height with a slow start or a slow finish must move faster than the straight ramp somewhere in between, and because the cost is the squared speed, the fast stretch costs more than the slow stretches save. In fact, among all paths that start at the origin and reach the same final height over the unit time interval, the straight ramp has the least energy, because constant speed is the most economical way to cover a fixed rise. Schilder's theorem turns this "laziest path wins" principle into the precise exponential rate at which rare path-shapes appear as the noise vanishes.
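The "laziest path wins" principle can be checked numerically. The following is a minimal sketch, not part of the original example: the final height `HEIGHT = 1.0`, the grid size, and the two competing shapes (`late_sprint`, `wiggly`) are assumptions chosen only for illustration. It approximates the energy by summing squared speeds on a fine uniform grid and halving.

```python
import math

def energy(values):
    """Discrete energy of a path sampled on a uniform grid of [0, 1].

    values[i] is the height at time i / n, where n = len(values) - 1.
    Returns 1/2 * sum over cells of (speed in the cell)^2 * dt.
    """
    n = len(values) - 1
    dt = 1.0 / n
    return 0.5 * sum(((values[i + 1] - values[i]) / dt) ** 2 * dt for i in range(n))

def sample(shape, n=1000):
    """Sample a shape t -> height at the grid times 0, 1/n, ..., 1."""
    return [shape(i / n) for i in range(n + 1)]

HEIGHT = 1.0  # assumed final height, for illustration only

# Three paths from height 0 to HEIGHT over the unit time interval.
ramp = sample(lambda t: HEIGHT * t)                      # constant speed
late_sprint = sample(lambda t: HEIGHT * t * t)           # slow start, fast finish
wiggly = sample(lambda t: HEIGHT * t + 0.1 * math.sin(6 * math.pi * t))

if __name__ == "__main__":
    for name, path in [("ramp", ramp), ("late sprint", late_sprint), ("wiggly", wiggly)]:
        print(f"{name:12s} energy = {energy(path):.4f}")
```

Under these assumptions the ramp comes out cheapest, and any shape that speeds up somewhere to make up for a slow stretch pays more.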
Check your understanding Beginner
Formal definition Intermediate+
Fix the path space $\mathcal{C}_0=C_0([0,1];\mathbb{R}^d)$ of continuous paths $f:[0,1]\to\mathbb{R}^d$ with $f(0)=0$, equipped with the supremum norm $\|f\|_\infty=\sup_{t\in[0,1]}|f(t)|$, a separable Banach space. Let $W$ be a standard $d$-dimensional Brownian motion 02.15.01 and, for $\varepsilon>0$, let $\mu_\varepsilon$ be the law on $\mathcal{C}_0$ of the small-noise path $\sqrt\varepsilon\,W$. The speed is $1/\varepsilon$.
Definition (Cameron-Martin space). The Cameron-Martin space is $$ H^1_0 := \Big\{ f\in\mathcal{C}_0 : f \text{ is absolutely continuous},\ f(0)=0,\ \dot f\in L^2([0,1];\mathbb{R}^d) \Big\}, $$ a Hilbert space under $\langle f,g\rangle_{H}=\int_0^1\langle\dot f(t),\dot g(t)\rangle\,dt$. It embeds continuously in $\mathcal{C}_0$, since $|f(t)-f(s)|\le\|\dot f\|_{L^2}|t-s|^{1/2}$ and hence $\|f\|_\infty\le\|\dot f\|_{L^2}$, and it has $\mu_\varepsilon$-measure zero: Brownian paths are almost surely not absolutely continuous.
Definition (Cameron-Martin energy / Schilder rate function). The Schilder rate function is $$ \boxed{\;I(f) \;=\; \begin{cases} \dfrac12\displaystyle\int_0^1 |\dot f(t)|^2\,dt, & f\in H^1_0,\\[1.2ex] +\infty, & f\in\mathcal{C}_0\setminus H^1_0.\end{cases}\;} $$ Equivalently $I(f)=\tfrac12\|f\|_H^2$ on $H^1_0$ and $I(f)=+\infty$ off it. This is the half-energy of the path, the same quadratic form that appears in the Cameron-Martin theorem governing admissible translations of Wiener measure [Cameron & Martin 1944].
Definition (good rate function, recalled). A function $I:\mathcal{C}_0\to[0,\infty]$ is a good rate function if it is lower-semicontinuous and its sublevel sets $\{I\le c\}$ are compact in $\mathcal{C}_0$ 37.07.01. For the Schilder rate, $\{I\le c\}$ is a $\|\cdot\|_\infty$-bounded, uniformly equicontinuous (Hölder-$\tfrac12$ with constant $\sqrt{2c}$) set, hence relatively compact in $\mathcal{C}_0$ by Arzelà-Ascoli, and closed by lower semicontinuity, so $I$ is good.
Theorem statement (Schilder). The family $\{\mu_\varepsilon\}_{\varepsilon>0}$ satisfies the large deviation principle on $(\mathcal{C}_0,\|\cdot\|_\infty)$ at speed $1/\varepsilon$ with good rate function $I$: for every closed $F$ and open $G$ in $\mathcal{C}_0$, $$ \limsup_{\varepsilon\to0}\varepsilon\log\mu_\varepsilon(F)\le-\inf_F I,\qquad \liminf_{\varepsilon\to0}\varepsilon\log\mu_\varepsilon(G)\ge-\inf_G I. $$
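The two bounds can be seen numerically on the simplest event whose probability is available in closed form. The sketch below assumes $d=1$ and the closed set $F=\{f: f(1)\ge a\}$ (an illustrative choice, not one made in the text). Here $\inf_F I=a^2/2$, attained by the ramp $f(t)=at$, and $\mu_\varepsilon(F)=\mathbb{P}(\sqrt\varepsilon\,W_1\ge a)$ is a Gaussian tail.

```python
import math

def scaled_log_prob(a, eps):
    """Return eps * log P(sqrt(eps) * W_1 >= a) for one-dimensional Brownian motion.

    W_1 is standard normal, so the probability is the Gaussian tail
    P(N(0,1) >= a / sqrt(eps)) = erfc(z / sqrt(2)) / 2 with z = a / sqrt(eps).
    """
    z = a / math.sqrt(eps)
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return eps * math.log(tail)

if __name__ == "__main__":
    a = 1.0  # assumed threshold, for illustration only
    for eps in (0.1, 0.01, 0.001):
        print(f"eps = {eps:<6} eps*log P = {scaled_log_prob(a, eps):.4f}"
              f"   (limit -a^2/2 = {-0.5 * a * a:.4f})")
```

The values approach $-a^2/2$ from below as $\varepsilon\to0$; the gap is the sub-exponential prefactor that the large deviation principle ignores. Values of `eps` much smaller than those shown would underflow the tail probability in double precision.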
Counterexamples to common slips
- The rate lives on a null set, and that is the point. $I$ is finite only on $H^1_0$, which carries zero Wiener mass; the typical path is jagged and infinitely costly, while the rare path that the small-noise process draws is smooth. Reading "rate function supported on $H^1_0$" as "Brownian motion is in $H^1_0$" inverts the logic: the rate scores deviations, and the cheapest non-flat deviations are smooth precisely because the process resists them least.
- The topology is the sup norm, not an $L^2$ or weak topology. Schilder's LDP holds in the uniform topology on $\mathcal{C}_0$; the exponential tightness that closes the upper bound is a sup-norm modulus-of-continuity estimate, and replacing $\|\cdot\|_\infty$ by a coarser topology weakens the statement (fewer closed sets) while a finer one (e.g. Hölder-$\alpha$ for $\alpha<\tfrac12$) requires a stronger tightness input. The rate function is unchanged, but the topology in which the bounds are asserted is part of the theorem.
- The factor $\tfrac12$ and the speed $1/\varepsilon$ are tied to the Gaussian. The increment of $\sqrt\varepsilon\,W$ over $[s,t]$ is Gaussian with variance $\varepsilon(t-s)$, whose Cramér rate is $|y|^2/(2(t-s))$; the $\tfrac12$ is the Gaussian $\Lambda^*$. For a non-Gaussian random walk the half-square is replaced by the increment's $\Lambda^*$ and one obtains Mogulskii's rate $\int_0^1\Lambda^*(\dot f(t))\,dt$ instead. Carrying the Gaussian $\tfrac12|\cdot|^2$ into the random-walk statement is the error the next theorem corrects.
Key theorem with proof Intermediate+
We prove Schilder's theorem by the projective route: a finite-dimensional Cramér LDP on increments 37.07.02, glued across time-grids by the Dawson-Gärtner projective limit, and closed in sup norm by exponential tightness 37.07.09.
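Theorem (Schilder). The laws $\mu_\varepsilon$ of $\sqrt\varepsilon\,W$ satisfy the LDP on $(\mathcal{C}_0,\|\cdot\|_\infty)$ at speed $1/\varepsilon$ with good rate $I(f)=\tfrac12\int_0^1|\dot f(t)|^2\,dt$ on $H^1_0$ and $I=+\infty$ elsewhere. [Dembo & Zeitouni §5.2]
Proof. Finite-dimensional marginals. Fix a grid $\pi=\{0=t_0<t_1<\dots<t_n=1\}$ and the evaluation map $p_\pi:\mathcal{C}_0\to(\mathbb{R}^d)^n$, $p_\pi f=(f(t_1),\dots,f(t_n))$. Under $\mu_\varepsilon$ the vector $p_\pi f$ has independent Gaussian increments $f(t_i)-f(t_{i-1})\sim N(0,\varepsilon(t_i-t_{i-1})I_d)$. Writing $y_i$ for the values and $\Delta_i=t_i-t_{i-1}$, the increment $y_i-y_{i-1}$ (with $y_0=0$) is, on the scale $1/\varepsilon$, an empirical-mean-type Gaussian whose Cramér rate 37.07.02 is the Gaussian $\Lambda^*$, namely $|y_i-y_{i-1}|^2/(2\Delta_i)$. By independence the joint rate is the sum,
$$
I_\pi(y)=\sum_{i=1}^n\frac{|y_i-y_{i-1}|^2}{2\,(t_i-t_{i-1})},
$$
and Cramér's theorem gives the LDP for $\mu_\varepsilon\circ p_\pi^{-1}$ on $(\mathbb{R}^d)^n$ at speed $1/\varepsilon$ with this good rate. (Concretely, $\sqrt\varepsilon\,W$ evaluated on $\pi$ is a linear image of a vector of independent Gaussian increments, and $I_\pi$ is the half-squared-Euclidean-norm rate transported through that linear map.)
Compatibility and the projective rate. The grids, directed by refinement, form a projective system with $\mathcal{C}_0$ embedded in its limit through the evaluations, since a continuous path is determined by its values on a dense set of times. Coarsening a grid is a continuous linear projection, and the contraction principle forces compatibility $I_{\pi'}(p_{\pi'}f)\le I_\pi(p_\pi f)$ whenever $\pi'\subset\pi$: dropping an intermediate node $t_k$ replaces the two terms $\frac{|f(t_k)-f(t_{k-1})|^2}{2(t_k-t_{k-1})}+\frac{|f(t_{k+1})-f(t_k)|^2}{2(t_{k+1}-t_k)}$ by the single merged term $\frac{|f(t_{k+1})-f(t_{k-1})|^2}{2(t_{k+1}-t_{k-1})}$, which is no larger by convexity of $|\cdot|^2$ (Jensen / the parallelogram law for the optimal interior value). Hence the Dawson-Gärtner theorem 37.07.09 yields the LDP for $\mu_\varepsilon$ on the projective limit with rate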
$$
I(f)=\sup_\pi I_\pi(p_\pi f)=\sup_\pi\sum_{i=1}^n\frac{|f(t_i)-f(t_{i-1})|^2}{2(t_i-t_{i-1})}.
$$
Identification with the energy. The supremum of the grid sums is exactly the Cameron-Martin energy. For $f\in H^1_0$, each term is $\le\tfrac12\int_{t_{i-1}}^{t_i}|\dot f(t)|^2\,dt$ by Jensen, so $I_\pi(p_\pi f)\le\tfrac12\|\dot f\|_{L^2}^2$ for every grid, and refining the grid makes the piecewise-constant approximation of $\dot f$ converge in $L^2$, pushing the sum up to $\tfrac12\|\dot f\|_{L^2}^2$. For $f\notin H^1_0$ the supremum diverges: a path that is not absolutely continuous, or whose distributional derivative is not in $L^2$, has grid sums unbounded above (the $f$-increments fail the Hölder-$\tfrac12$/finite-energy bound on some sequence of refinements). Thus $I=\tfrac12\|\dot f\|_{L^2}^2$ on $H^1_0$ and $I=+\infty$ off it, the stated rate, and the projective-limit goodness 37.07.09 together with the Arzelà-Ascoli compactness of sublevel sets makes $I$ good.
Sup-norm topology via exponential tightness. The Dawson-Gärtner construction delivers the LDP in the projective-limit (pointwise/cylinder) topology; upgrading to the sup norm requires exponential tightness in $(\mathcal{C}_0,\|\cdot\|_\infty)$, supplied by a Brownian modulus-of-continuity estimate. For each $\eta>0$,
$$
\limsup_{\varepsilon\to0}\varepsilon\log\mathbb{P}\Big(\sup_{|t-s|\le h}|\sqrt\varepsilon(W_t-W_s)|>\eta\Big)\xrightarrow{h\to0}-\infty,
$$
a Garsia-Rodemich-Rumsey / reflection bound; the equicontinuous sets are compact by Arzelà-Ascoli and capture all but exponentially thin mass at speed $1/\varepsilon$. Exponential tightness then closes the compact-set upper bound to all closed sets and confirms the projective and sup-norm topologies give the same LDP 37.07.09.
Bridge. This theorem builds toward the Freidlin-Wentzell theory of randomly perturbed dynamical systems and appears again in every small-noise computation of exit times, metastability rates, and quasipotentials, where the Cameron-Martin energy reappears as the action that the optimal escape path minimises. This is exactly the path-space realisation of the projective-limit machinery 37.07.09: a path is its readings at finitely many times, each reading carries the Gaussian Cramér rate 37.07.02, and Dawson-Gärtner glues the increment costs into the energy integral by the supremum over grids. The foundational reason the limit rate is the half-energy is the compatibility forced by convexity of $|\cdot|^2$: coarsening averages the speed and Jensen lowers the cost, so the supremum over refinements is exactly the limit and generalises the single-increment Gaussian rate to the whole trajectory. Putting these together, Schilder's theorem is dual to Mogulskii's: the Gaussian $\tfrac12|\cdot|^2$ is replaced by the random-walk $\Lambda^*$ in the same projective gluing, and the bridge is the contraction principle 37.07.08, which carries this Brownian LDP through the Itô solution map to the diffusion rate of Freidlin-Wentzell.
Exercises Intermediate+
Advanced results Master
The Cameron-Martin space as the reproducing structure of Wiener measure
The rate function is not an accident of the proof: it is the energy of the Cameron-Martin Hilbert space $H^1_0$, the unique Hilbert space continuously embedded in $\mathcal{C}_0$ whose elements are exactly the admissible shifts of Wiener measure [Cameron & Martin 1944]. The Cameron-Martin theorem says that translating $W$ by $h$ leaves the law quasi-invariant exactly when $h\in H^1_0$, with Radon-Nikodym density $\exp\big(\int_0^1\langle\dot h(t),dW_t\rangle-\tfrac12\int_0^1|\dot h(t)|^2\,dt\big)$; the exponent's deterministic part is precisely $-I(h)$. Schilder's theorem is the large-deviation shadow of this: the cost of the small-noise path resembling $h$ is the same quadratic energy that prices the translation. The abstract-Wiener-space generalisation (Gross) replaces $(H^1_0,\mathcal{C}_0)$ by any triple $(H,B,\mu)$ with $H\hookrightarrow B$ dense and $\mu$ Gaussian, and Schilder becomes the Donsker-Varadhan-Stroock LDP with rate $\tfrac12\|x\|_H^2$, the half-square of the Cameron-Martin norm, $+\infty$ off $H$ [Deuschel & Stroock §3.4].
Mogulskii, the action integral, and the variational principle
Mogulskii's theorem [Mogulskii 1976] places Schilder inside a family: any i.i.d.-increment random walk, polygonally interpolated and suitably scaled, obeys a sample-path LDP with rate $\int_0^1\Lambda^*(\dot f(t))\,dt$, the action with Lagrangian $\Lambda^*$. The Gaussian case $\Lambda^*(v)=\tfrac12|v|^2$ is Schilder, with Lagrangian the kinetic energy; the general $\Lambda^*$ is the Legendre dual of the increment's cumulant generating function, so the path-space rate is a classical action whose Euler-Lagrange equations are the most-likely deviation paths. This is the entry point to the calculus of variations in large deviations: the infimum over an event is solved by a geodesic of the Lagrangian $\Lambda^*$, and the minimiser is the instanton, the dominant rare trajectory. For Schilder the minimisers are straight lines (free kinetic Lagrangian), which is why every Schilder variational problem reduces to a Cauchy-Schwarz computation.
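The Legendre dual that serves as the Lagrangian can be evaluated numerically. The sketch below is an illustration under stated assumptions: it works in one dimension, approximates the supremum over a bounded interval of $\lambda$ by ternary search (valid because $\lambda v-\Lambda(\lambda)$ is concave), and uses Rademacher ($\pm1$) increments as a hypothetical non-Gaussian example that the text does not single out.

```python
import math

def legendre_dual(cgf, v, lo=-50.0, hi=50.0, iters=200):
    """Approximate sup over lam in [lo, hi] of lam * v - cgf(lam).

    The objective is concave in lam because a cumulant generating function
    is convex, so ternary search converges to the maximiser.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * v - cgf(m1) < m2 * v - cgf(m2):
            lo = m1
        else:
            hi = m2
    lam = 0.5 * (lo + hi)
    return lam * v - cgf(lam)

def gaussian_cgf(lam):
    """Cumulant generating function of a standard normal increment."""
    return 0.5 * lam * lam

def rademacher_cgf(lam):
    """Cumulant generating function of a +/-1 coin-flip increment (assumed example)."""
    return math.log(math.cosh(lam))

def rademacher_rate(v):
    """Closed-form Legendre dual of rademacher_cgf, valid for |v| < 1."""
    return 0.5 * ((1 + v) * math.log(1 + v) + (1 - v) * math.log(1 - v))

if __name__ == "__main__":
    for v in (0.2, 0.5, 0.8):
        print(f"v = {v}: Gaussian {legendre_dual(gaussian_cgf, v):.6f} (v^2/2 = {0.5 * v * v:.6f}), "
              f"Rademacher {legendre_dual(rademacher_cgf, v):.6f} (closed form {rademacher_rate(v):.6f})")
```

For the Gaussian the numerical dual reproduces the kinetic Lagrangian $\tfrac12 v^2$; for the coin-flip walk the Lagrangian is steeper, and the half-square no longer applies.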
Freidlin-Wentzell as the contraction of Schilder
The payoff is the Freidlin-Wentzell theory of small random perturbations of $\dot x=b(x)$ [Freidlin & Wentzell §3]. For additive noise and Lipschitz $b$ the Itô map $w\mapsto\phi$, $\phi_t=x_0+\int_0^t b(\phi_s)\,ds+w_t$, is continuous on $\mathcal{C}_0$, so the contraction principle 37.07.08 transports Schilder's LDP to the diffusion $dX^\varepsilon_t=b(X^\varepsilon_t)\,dt+\sqrt\varepsilon\,dW_t$, producing the rate $\tfrac12\int_0^1|\dot\phi(t)-b(\phi(t))|^2\,dt$: the energy of the control the noise must inject to steer the deterministic flow along $\phi$. The quasipotential governs exit times, metastable transition rates (Eyring-Kramers), and the structure of the invariant measure's small-noise asymptotics. For multiplicative noise with diffusion coefficient $\sigma$ the Itô map is only continuous after a Wong-Zakai / rough-path correction, but the conclusion persists with $|\dot\phi-b(\phi)|^2$ replaced by the $(\sigma\sigma^\top)^{-1}$-weighted norm; Schilder remains the seed LDP at the top of the construction.
Synthesis. The central insight of Schilder's theorem is that the small-noise Brownian LDP is exactly the projective gluing of finite-dimensional Gaussian Cramér rates 37.07.02 into the Cameron-Martin energy, so the infinite-dimensional principle generalises the single-increment half-square to the whole path with no direct infinite-dimensional estimate beyond a sup-norm tightness bound 37.07.09. The foundational reason the rate is the energy is the convexity of $|\cdot|^2$, which makes the grid rates compatible and their supremum the action; this is dual to Mogulskii's $\int_0^1\Lambda^*(\dot f(t))\,dt$, the same gluing with the Gaussian $\tfrac12|\cdot|^2$ replaced by an arbitrary increment's Legendre dual. Putting these together with the contraction principle 37.07.08 yields Freidlin-Wentzell: the bridge is the Itô map, which carries the Cameron-Martin energy to the diffusion action $\tfrac12\int_0^1|\dot\phi(t)-b(\phi(t))|^2\,dt$, and the quasipotential that prices metastability appears again in Eyring-Kramers exit-rate asymptotics and the small-noise structure of invariant measures. This is exactly the architecture promised by the chapter: Cramér supplies the local cost, Dawson-Gärtner lifts it to path space, exponential tightness fixes the topology, and contraction propagates the rate to every downstream stochastic system.
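As a concrete reading of the additive-noise action, the sketch below discretises $\tfrac12\int_0^1|\dot\phi-b(\phi)|^2\,dt$ in one dimension. The linear drift $b(x)=-x$, the starting points, and the midpoint discretisation are assumptions made for illustration only. A path that follows the deterministic flow should cost essentially nothing, while a path that fights the drift pays for the control the noise has to supply.

```python
import math

def fw_action(path, drift):
    """Discretised action 1/2 * integral of (phi' - b(phi))^2 over [0, 1].

    path[i] is phi(i / n); the velocity is a finite difference and the drift
    is evaluated at the midpoint of each cell.
    """
    n = len(path) - 1
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        velocity = (path[i + 1] - path[i]) / dt
        midpoint = 0.5 * (path[i] + path[i + 1])
        total += (velocity - drift(midpoint)) ** 2 * dt
    return 0.5 * total

def drift(x):
    """Assumed drift b(x) = -x (pull toward the origin)."""
    return -x

N = 1000
flow_path = [math.exp(-i / N) for i in range(N + 1)]  # solves x' = -x from x(0) = 1
ramp_path = [i / N for i in range(N + 1)]             # climbs from 0 to 1 against the drift

if __name__ == "__main__":
    print(f"action along the flow: {fw_action(flow_path, drift):.2e}")
    print(f"action of the ramp:    {fw_action(ramp_path, drift):.6f}  (exact value 7/6)")
```

With $b\equiv0$ the same function reduces to the Cameron-Martin energy, which is the sense in which Schilder is the seed of the construction.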
Full proof set Master
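Proposition 1 (finite-dimensional increment LDP). Fix a grid $\pi=\{0=t_0<t_1<\dots<t_n=1\}$. The laws of $p_\pi(\sqrt\varepsilon\,W)=(\sqrt\varepsilon\,W_{t_1},\dots,\sqrt\varepsilon\,W_{t_n})$ satisfy the LDP on $(\mathbb{R}^d)^n$ at speed $1/\varepsilon$ with good rate $I_\pi(y)=\sum_{i=1}^n\frac{|y_i-y_{i-1}|^2}{2(t_i-t_{i-1})}$, $y_0=0$.
Proof. The increment vector $\xi=(\xi_1,\dots,\xi_n)$, $\xi_i=\sqrt\varepsilon\,(W_{t_i}-W_{t_{i-1}})$, has independent coordinates $\xi_i\sim N(0,\varepsilon\Delta_i I_d)$, where $\Delta_i=t_i-t_{i-1}$. For a single Gaussian $\xi_i$, writing it as $\sqrt\varepsilon$ times $N(0,\Delta_i I_d)$, the small-$\varepsilon$ LDP at speed $1/\varepsilon$ has rate the Gaussian $\Lambda^*$: the cumulant generating function of $N(0,\Delta_i I_d)$ is $\Lambda(\lambda)=\tfrac12\Delta_i|\lambda|^2$, with Legendre dual $\Lambda^*(x)=|x|^2/(2\Delta_i)$ 37.07.02. By independence the joint rate is the sum $\sum_{i=1}^n|x_i|^2/(2\Delta_i)$, and the linear change of coordinates $x_i=y_i-y_{i-1}$ (a bijection with unit Jacobian on $(\mathbb{R}^d)^n$) transports the rate to $I_\pi(y)$ by the contraction principle along a continuous bijection. Goodness holds because $I_\pi$ is a positive-definite quadratic form, with compact (ellipsoidal) sublevel sets.
Proposition 2 (compatibility under coarsening). If $\pi'\subset\pi$ drops the node $t_k$, then $I_{\pi'}(p_{\pi'}f)\le I_\pi(p_\pi f)$ for all $f\in\mathcal{C}_0$, so the family $(I_\pi)$ is compatible (monotone non-decreasing along refinement).
Proof. Coarsening merges the two terms over $[t_{k-1},t_k]$ and $[t_k,t_{k+1}]$ into one over $[t_{k-1},t_{k+1}]$. Set $u=f(t_k)-f(t_{k-1})$, $w=f(t_{k+1})-f(t_k)$, $\alpha=t_k-t_{k-1}$, $\beta=t_{k+1}-t_k$. The two-term cost is $\frac{|u|^2}{2\alpha}+\frac{|w|^2}{2\beta}$ and the merged cost is $\frac{|u+w|^2}{2(\alpha+\beta)}$. By the weighted Cauchy-Schwarz inequality (or the convexity of $|\cdot|^2$ applied to the average $\frac{\alpha}{\alpha+\beta}\frac{u}{\alpha}+\frac{\beta}{\alpha+\beta}\frac{w}{\beta}$), $$ \frac{|u+w|^2}{\alpha+\beta}=\frac{\big|\alpha\tfrac{u}{\alpha}+\beta\tfrac{w}{\beta}\big|^2}{\alpha+\beta}\le\alpha\Big|\frac{u}{\alpha}\Big|^2+\beta\Big|\frac{w}{\beta}\Big|^2=\frac{|u|^2}{\alpha}+\frac{|w|^2}{\beta}, $$ the inequality being Jensen for the convex $|\cdot|^2$ with weights $\frac{\alpha}{\alpha+\beta},\frac{\beta}{\alpha+\beta}$. Halving gives the merged term $\le$ the two-term sum, so $I_{\pi'}(p_{\pi'}f)\le I_\pi(p_\pi f)$. Dropping several nodes iterates this.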
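The merging inequality is easy to stress-test. The sketch below is a sanity check rather than a proof: it draws random increments $u,w\in\mathbb{R}^3$ and random cell lengths $\alpha,\beta$ (dimension, sample size and seed are arbitrary choices), then records the smallest observed gap between the two-term cost and the merged cost. It also checks the equality case, which occurs when the two cells share the same velocity, $u/\alpha=w/\beta$.

```python
import random

def sq_norm(x):
    """Squared Euclidean norm of a vector given as a list."""
    return sum(c * c for c in x)

def two_term_cost(u, w, alpha, beta):
    """Cost of crossing two adjacent cells separately."""
    return sq_norm(u) / (2.0 * alpha) + sq_norm(w) / (2.0 * beta)

def merged_cost(u, w, alpha, beta):
    """Cost after dropping the shared node and merging the two cells."""
    total = [a + b for a, b in zip(u, w)]
    return sq_norm(total) / (2.0 * (alpha + beta))

def smallest_gap(trials=10000, dim=3, seed=0):
    """Minimum of two_term_cost - merged_cost over random draws."""
    rng = random.Random(seed)
    smallest = float("inf")
    for _ in range(trials):
        u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        alpha = rng.uniform(0.01, 1.0)
        beta = rng.uniform(0.01, 1.0)
        smallest = min(smallest, two_term_cost(u, w, alpha, beta) - merged_cost(u, w, alpha, beta))
    return smallest

if __name__ == "__main__":
    print("smallest observed gap:", smallest_gap())
    # Equality case: both cells are crossed at the same velocity v.
    v, alpha, beta = [1.0, -2.0, 0.5], 0.3, 0.7
    u = [alpha * c for c in v]
    w = [beta * c for c in v]
    print("gap at equal velocities:", two_term_cost(u, w, alpha, beta) - merged_cost(u, w, alpha, beta))
```

The gap equals $|\beta u-\alpha w|^2/(2\alpha\beta(\alpha+\beta))$, which makes the non-negativity and the equality case visible at once.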
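Proposition 3 (identification of the projective rate with the energy). For $f\in\mathcal{C}_0$, $\sup_\pi I_\pi(p_\pi f)=\tfrac12\int_0^1|\dot f(t)|^2\,dt$ if $f\in H^1_0$, and $\sup_\pi I_\pi(p_\pi f)=+\infty$ otherwise.
Proof. Write $\Delta_i=t_i-t_{i-1}$. If $f\in H^1_0$: on $[t_{i-1},t_i]$, Jensen for $|\cdot|^2$ gives $\big|\tfrac{1}{\Delta_i}\int_{t_{i-1}}^{t_i}\dot f\,dt\big|^2\le\tfrac{1}{\Delta_i}\int_{t_{i-1}}^{t_i}|\dot f|^2\,dt$, so $\frac{|f(t_i)-f(t_{i-1})|^2}{2\Delta_i}\le\tfrac12\int_{t_{i-1}}^{t_i}|\dot f|^2\,dt$; summing, $I_\pi(p_\pi f)\le\tfrac12\|\dot f\|_{L^2}^2$ for every $\pi$, so $\sup_\pi I_\pi(p_\pi f)\le\tfrac12\|\dot f\|_{L^2}^2$. Conversely, the piecewise-constant function $g_\pi=\sum_i\frac{f(t_i)-f(t_{i-1})}{\Delta_i}\mathbf{1}_{(t_{i-1},t_i]}$ is the $L^2$-conditional expectation of $\dot f$ on the grid $\sigma$-algebra, and $I_\pi(p_\pi f)=\tfrac12\|g_\pi\|_{L^2}^2$; as $\pi$ refines, $g_\pi\to\dot f$ in $L^2$ by the martingale convergence / Lebesgue differentiation theorem, so $I_\pi(p_\pi f)\to\tfrac12\|\dot f\|_{L^2}^2$. Hence $\sup_\pi I_\pi(p_\pi f)=\tfrac12\|\dot f\|_{L^2}^2$. If $f\notin H^1_0$: either $f$ is not absolutely continuous or $\dot f\notin L^2$. In both cases the conditional-expectation norms are unbounded over refinements: if $\tfrac12\|g_\pi\|_{L^2}^2$ stayed bounded by $M$, the $g_\pi$ would be an $L^2$-bounded martingale converging in $L^2$ to an $L^2$ derivative, forcing $f\in H^1_0$ with $\tfrac12\|\dot f\|_{L^2}^2\le M$, a contradiction. So $\sup_\pi I_\pi(p_\pi f)=+\infty$.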
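Both halves of this dichotomy show up in a short computation. The sketch below assumes $d=1$ and uniform dyadic grids, and picks two test paths that are illustrative choices only: $f(t)=\sin(\pi t)$, which lies in $H^1_0$ with energy $\pi^2/4$, and $f(t)=\sqrt t$, which is absolutely continuous but has $\dot f=1/(2\sqrt t)\notin L^2$.

```python
import math

def grid_rate(f, n):
    """Grid sum I_pi(p_pi f) on the uniform grid with n cells of length 1/n."""
    dt = 1.0 / n
    return sum((f((i + 1) / n) - f(i / n)) ** 2 for i in range(n)) / (2.0 * dt)

def smooth(t):
    """Test path in the Cameron-Martin space; its energy is pi^2 / 4."""
    return math.sin(math.pi * t)

def rough(t):
    """Test path outside the Cameron-Martin space: derivative not square-integrable."""
    return math.sqrt(t)

if __name__ == "__main__":
    print("  cells   smooth     rough")
    for k in range(2, 17, 2):
        n = 2 ** k
        print(f"{n:7d}   {grid_rate(smooth, n):.6f}   {grid_rate(rough, n):.6f}")
    print("energy of the smooth path:", math.pi ** 2 / 4)
```

Along the refinements the smooth column increases to $\pi^2/4$, while the rough column keeps growing, roughly like $\tfrac18\log n$; the divergence is slow but unbounded, which is all the proposition requires.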
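Proposition 4 (goodness of the Schilder rate). $I=\tfrac12\|\dot f\|_{L^2}^2$ on $H^1_0$, $+\infty$ off it, is a good rate function on $(\mathcal{C}_0,\|\cdot\|_\infty)$.
Proof. Lower semicontinuity: $I=\sup_\pi I_\pi\circ p_\pi$ is a supremum of the continuous (hence lsc) maps $f\mapsto I_\pi(p_\pi f)$, each $p_\pi$ being sup-norm continuous and each $I_\pi$ continuous, so $I$ is lsc. Compact sublevel sets: $\{I\le c\}=\{f\in H^1_0:\|\dot f\|_{L^2}\le\sqrt{2c}\}$. By the embedding estimate $|f(t)-f(s)|\le\sqrt{2c}\,|t-s|^{1/2}$ and $\|f\|_\infty\le\sqrt{2c}$, the set is uniformly bounded and uniformly equicontinuous, hence relatively compact in $\mathcal{C}_0$ by Arzelà-Ascoli; it is closed by lower semicontinuity of $I$. A closed subset of a relatively compact set is compact, so $\{I\le c\}$ is compact and $I$ is good.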
Connections Master
- Schilder's theorem is the canonical application of the Dawson-Gärtner projective limit 37.07.09: the path space is the projective limit of its finite-time evaluation marginals, each marginal carries a Gaussian Cramér LDP, and the projective-limit rate is the Cameron-Martin energy. The exponential tightness in sup norm that closes the upper bound is exactly the closure datum that unit isolates, supplied here by a Brownian modulus-of-continuity estimate.
- The finite-dimensional increment rates are instances of Cramér's theorem 37.07.02: the standard Gaussian increment has cumulant generating function $\tfrac12|\lambda|^2$ and Legendre dual $\tfrac12|x|^2$, so the per-increment cost is the Gaussian $\Lambda^*$ and the path cost is its glued sum; replacing the Gaussian by a general increment's $\Lambda^*$ turns Schilder into Mogulskii.
- The object whose small-noise scaling is studied is the Wiener process 02.15.01: the scaling $\sqrt\varepsilon\,W$ is Brownian motion with variance dialled down by $\varepsilon$, and the proof leans on Brownian increment independence, Gaussianity, and the sup-norm modulus of continuity. The rate function lives on the Cameron-Martin subspace, a Wiener-null set, encoding that the rare small-noise paths are the smooth ones the process most resists.
- The downstream payoff is Freidlin-Wentzell theory via the contraction principle 37.07.08: pushing Schilder's LDP through the continuous Itô solution map produces the diffusion rate $\tfrac12\int_0^1|\dot\phi(t)-b(\phi(t))|^2\,dt$, whose quasipotential governs exit times and metastability. The contraction principle is the exact transport, and the Itô calculus 02.15.02 is the map.
Historical & philosophical context Master
The small-noise large-deviation principle for Wiener integrals was established by Michael Schilder in 1966 [Schilder 1966], who computed the asymptotics of Wiener integrals for the rescaled Wiener measure and identified the exponential rate with the Cameron-Martin energy; Varadhan, in the same year [Varadhan 1966], gave the principle its modern variational form and connected it to small-parameter asymptotics of partial differential equations. The energy functional itself predates the large-deviation reading: Cameron and Martin had isolated the space $H^1_0$ and its quadratic form in 1944 [Cameron & Martin 1944] as the admissible translations of Wiener measure, so Schilder's rate is the large-deviation incarnation of a structure already central to Gaussian analysis.
The random-walk analogue was proved by Anatolii Mogulskii in 1976 [Mogulskii 1976], replacing the Gaussian half-square by the increment's Legendre dual and exhibiting the path-space rate as a classical action; the projective-limit organisation that unifies the two — and that the present proof follows — is the systematic treatment of Dembo and Zeitouni [Dembo & Zeitouni §5.1-§5.2] and of Deuschel and Stroock [Deuschel & Stroock §3.4]. The application that made the theorem indispensable is the Freidlin-Wentzell theory of randomly perturbed dynamical systems [Freidlin & Wentzell §3], developed through the 1970s, which contracts Schilder's Brownian principle along the Itô map to price the rare excursions of a diffusion against its deterministic drift.
Bibliography Master
@article{schilder1966asymptotic,
author = {Schilder, Michael},
title = {Some asymptotic formulas for Wiener integrals},
journal = {Transactions of the American Mathematical Society},
volume = {125},
number = {1},
pages = {63--85},
year = {1966}
}
@article{varadhan1966asymptotic,
author = {Varadhan, S. R. S.},
title = {Asymptotic probabilities and differential equations},
journal = {Communications on Pure and Applied Mathematics},
volume = {19},
number = {3},
pages = {261--286},
year = {1966}
}
@article{cameronmartin1944transformations,
author = {Cameron, R. H. and Martin, W. T.},
title = {Transformations of Wiener integrals under translations},
journal = {Annals of Mathematics},
volume = {45},
number = {2},
pages = {386--396},
year = {1944}
}
@article{mogulskii1976large,
author = {Mogulskii, A. A.},
title = {Large deviations for trajectories of multidimensional random walks},
journal = {Theory of Probability and its Applications},
volume = {21},
number = {2},
pages = {300--315},
year = {1976}
}
@book{dembozeitouni1998ldp,
author = {Dembo, Amir and Zeitouni, Ofer},
title = {Large Deviations Techniques and Applications},
edition = {2nd},
series = {Applications of Mathematics},
number = {38},
publisher = {Springer},
year = {1998}
}
@book{deuschelstroock1989large,
author = {Deuschel, Jean-Dominique and Stroock, Daniel W.},
title = {Large Deviations},
series = {Pure and Applied Mathematics},
number = {137},
publisher = {Academic Press},
year = {1989}
}
@book{freidlinwentzell2012random,
author = {Freidlin, Mark I. and Wentzell, Alexander D.},
title = {Random Perturbations of Dynamical Systems},
edition = {3rd},
series = {Grundlehren der mathematischen Wissenschaften},
number = {260},
publisher = {Springer},
year = {2012}
}