37.07.10 · probability / 07-large-deviations

Schilder's Theorem: Small-Noise Large Deviations for Brownian Motion

shipped · 3 tiers · Lean: none

Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §5.1-§5.2 (Mogulskii, Schilder; the Cameron-Martin rate, exponential tightness in sup norm) and §5.6 (Freidlin-Wentzell); Deuschel & Stroock 1989 *Large Deviations* §1.3-§3.4; Freidlin & Wentzell 2012 *Random Perturbations of Dynamical Systems* 3rd ed. (Springer) §3 (the contraction from Schilder to diffusions)

Intuition Beginner

Watch a particle jiggle under random kicks — Brownian motion — but now slowly turn the kicks down by a small dial. As the noise shrinks, the wandering path is squeezed toward the one boring path that does nothing: it sits at the origin. Schilder's theorem answers the natural follow-up question. If, despite the tiny noise, the path manages to trace out some specific interesting shape instead of staying flat, how unlikely is that, and exactly how is the cost of each shape decided?

The answer is a single, very physical number attached to each candidate shape: its energy. Imagine the shape as the trip of a runner over one unit of time. At each instant the runner has a speed. Square that speed, add it up over the whole trip (and halve it), and you get the energy of that trip. A lazy, slow, smooth path has small energy and is cheap; a frantic, fast, wiggly path has huge energy and is wildly expensive. Schilder's theorem says the chance of the small-noise path looking like a given shape decays exponentially, and the number in the exponent is exactly that shape's energy divided by the size of the noise.

There is a catch that does real work. Only paths with a well-defined speed at (almost) every instant have a finite energy. A genuinely jagged shape — one that has no speed because it changes direction infinitely fast, the way a true Brownian path does — has infinite energy. So in the small-noise limit such jagged shapes are infinitely costly: the rare paths the particle is willing to draw are smooth ones, even though the typical (un-rationed) Brownian path is the jagged kind. The dial flips which paths are cheap.

This is the same recipe you have already seen for a single average, lifted to a whole path. For one average, the cost of landing at a wrong value came from a convex function — for the small-noise Gaussian, that cost was just half the value squared. A path is nothing but its readings at finitely many times, each reading a Gaussian increment with its own half-squared cost; add those costs along the trip, refine the grid, and the sum becomes the energy integral. Schilder's theorem is exactly this: many tiny Gaussian costs, glued along time.

Visual Beginner

Figure: a single time-axis from $0$ to $1$ . Faint grey: a cloud of jagged near-flat sample paths hugging the zero line, the typical small-noise behaviour. Bold: one smooth curve $f$ rising and falling — a candidate shape. Tangent arrows along $f$ mark its speed at sample times; a side gauge sums the squared speeds along the trip and halves the total to read off the energy. A caption notes that a hypothetical infinitely-jagged bold curve would send the gauge to infinity, so only smooth shapes get a finite price.

   value
     |            bold smooth candidate f
     |               __
     |          ___/    \___              speed arrows: ->  ->   ->  ->
     |       __/             \__          square & sum them, then halve:
   0 |~^~^~^~^~^~^~^~^~^~^~^~^~^~^~^~      energy  E(f) = 1/2 * sum (speed)^2
     |  (grey: jagged near-flat            ----------------------------------
     |   typical small-noise paths)        cost of f  =  E(f) / (noise size)
     +--------------------------------- time
     0                                 1

   smooth f  -> finite energy -> finite, payable cost
   jagged f  -> no speed       -> infinite energy -> forbidden in the limit

Worked example Beginner

Take the simplest interesting shape: the straight ramp $f (t) = t$ , rising at constant speed from $0$ to $1$ over the unit time interval. We compute its energy and read off the small-noise cost.

Step 1. Find the speed. A straight ramp has constant speed. Going from height $0$ to height $1$ in time $1$ means speed $= 1$ at every instant.

Step 2. Square the speed. The squared speed is $1 \times 1 = 1$ , and it is the same constant $1$ all along the trip.

Step 3. Add it up over the trip and halve. The squared speed is $1$ over a time interval of length $1$ , so the total is $1 \times 1 = 1$ . Halving gives the energy $E (f) = \frac{1}{2} \times 1 = 0.5$ .

Step 4. Read off the small-noise cost. If the noise size is $ε$ , the chance the small-noise path looks like this ramp decays like $e^{- E (f) / ε} = e^{- 0.5/ ε}$ . At $ε = 0.1$ that is $e^{- 5} \approx 0.0067$ ; at $ε = 0.01$ it is $e^{- 50} \approx 2 \times 10^{- 22}$ .

What this tells us. A cheaper shape is one that reaches the same place more lazily. Any path from $0$ to height $1$ that dawdles somewhere must hurry somewhere else to make up the rise, and because the cost squares the speed, the hurried stretch adds more energy than the dawdling saves. So among all paths from $0$ to height $1$ , the straight ramp has the least energy: constant speed is the most economical way to cover a fixed rise. Schilder's theorem turns this "laziest path wins" principle into the precise exponential rate at which rare path-shapes appear as the noise vanishes.
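
The four steps above can be replayed numerically. The following is a minimal sketch, not a definitive implementation: the helper `energy`, the comparison path `late_sprint` ($t^2$, a slow start with a fast finish) and the grid size are illustrative assumptions that do not appear in the text.

```python
import numpy as np

def energy(f, n=100_000):
    """Approximate E(f) = 1/2 * integral_0^1 f'(t)^2 dt by finite differences."""
    t = np.linspace(0.0, 1.0, n + 1)
    dt = np.diff(t)
    speed = np.diff(f(t)) / dt          # average speed on each small time step
    return 0.5 * np.sum(speed**2 * dt)  # halve the summed squared speed

ramp = lambda t: t                      # the straight ramp: constant speed 1
late_sprint = lambda t: t**2            # hypothetical rival path with the same endpoint

E_ramp = energy(ramp)                   # close to 0.5
E_sprint = energy(late_sprint)          # close to 2/3, larger than the ramp's energy

def cost(E, eps):
    """Leading-order small-noise probability exp(-E / eps)."""
    return np.exp(-E / eps)

print(E_ramp, E_sprint)
print(cost(E_ramp, 0.1), cost(E_ramp, 0.01))
```

The rival path ends at the same height but has to sprint near the end, and its energy comes out above the ramp's, in line with the "laziest path wins" reading.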

Formal definition Intermediate+

Fix the path space $C_{0} := C_{0} ([0, 1]; R^{d})$ of continuous paths $f : [0, 1] \to R^{d}$ with $f (0) = 0$ , equipped with the supremum norm $∥ f ∥_{\infty} = \sup_{t \in [0, 1]} ∣ f (t) ∣$ , a separable Banach space. Let $W = (W_{t})_{t \in [0, 1]}$ be a standard $d$ -dimensional Brownian motion 02.15.01 and, for $ε > 0$ , let $μ_{ε}$ be the law on $C_{0}$ of the small-noise path $\sqrt{ε} W$ , whose coordinates have variance $ε t$ at time $t$ . The speed is $a_{ε} = ε \to 0$ .

Definition (Cameron-Martin space). The Cameron-Martin space is $$ H^1_0 := \Big\{ f\in C_0 : f \text{ is absolutely continuous},\ f(0)=0,\ \dot f\in L^2([0,1];\mathbb{R}^d) \Big\}, $$ a Hilbert space under $\langle f,g\rangle_{H}=\int_0^1\langle\dot f(t),\dot g(t)\rangle\,dt$ . It embeds continuously and densely into $C_{0}$ (by Cauchy-Schwarz, $∣ f (t) - f (s) ∣ \leq ∥ \dot{f} ∥_{L^{2}} ∣ t - s ∣^{1/2}$ , so $∥ f ∥_{\infty} \leq ∥ \dot{f} ∥_{L^{2}}$ ), but is itself a Borel set of $μ_{ε}$ -measure zero: Brownian paths are almost surely not absolutely continuous.

Definition (Cameron-Martin energy / Schilder rate function). The Schilder rate function $I : C_{0} \to [0, \infty]$ is $$ \boxed{\;I(f) \;=\; \begin{cases} \dfrac12\displaystyle\int_0^1 |\dot f(t)|^2\,dt, & f\in H^1_0,\\[1.2ex] +\infty, & f\in C_0\setminus H^1_0.\end{cases}\;} $$ Equivalently $I (f) = \frac{1}{2} ∥ f ∥_{H}^{2}$ on $H_{0}^{1}$ and $+ \infty$ off it. This is the half-energy of the path, the same quadratic form that appears in the Cameron-Martin theorem governing admissible translations of Wiener measure ^{[Cameron & Martin 1944]}.

Definition (good rate function, recalled). A function $I : C_{0} \to [0, \infty]$ is a good rate function if it is lower-semicontinuous and its sublevel sets $Ψ_{I} (α) = \{f : I (f) \leq α\}$ are compact in $(C_{0}, ∥ \cdot ∥_{\infty})$ 37.07.01. For the Schilder rate, $Ψ_{I} (α) = \{f \in H_{0}^{1} : ∥ \dot{f} ∥_{L^{2}}^{2} \leq 2 α\}$ is a $∥ \cdot ∥_{\infty}$ -bounded, uniformly equicontinuous (Hölder- $\frac{1}{2}$ with constant $\sqrt{2 α}$ ) set, hence relatively compact in $C_{0}$ by Arzelà-Ascoli, and closed by lower semicontinuity, so $I$ is good.

Theorem statement (Schilder). The family $\{μ_{ε}\} = \{\mathrm{law} (\sqrt{ε} W)\}$ satisfies the large deviation principle on $(C_{0}, ∥ \cdot ∥_{\infty})$ at speed $a_{ε} = ε$ with good rate function $I$ : for every closed $F$ and open $G$ in $C_{0}$ , $$ \limsup_{\varepsilon\to0}\varepsilon\log\mu_\varepsilon(F)\le-\inf_F I,\qquad \liminf_{\varepsilon\to0}\varepsilon\log\mu_\varepsilon(G)\ge-\inf_G I. $$

Counterexamples to common slips

The rate lives on a null set, and that is the point. $I$ is finite only on $H_{0}^{1}$ , which carries zero Wiener mass; the typical path is jagged and infinitely costly, while the rare path that the small-noise process draws is smooth. Reading "rate function supported on $H_{0}^{1}$ " as "Brownian motion is in $H_{0}^{1}$ " inverts the logic: the rate scores deviations, and the cheapest non-flat deviations are smooth precisely because the process resists them least.
The topology is the sup norm, not an $L^{2}$ or weak topology. Schilder's LDP holds in the uniform topology on $C_{0}$ ; the exponential tightness that closes the upper bound is a sup-norm modulus-of-continuity estimate, and replacing $∥ \cdot ∥_{\infty}$ by a coarser topology weakens the statement (fewer closed sets) while a finer one (e.g. Hölder- $β$ for $β < \frac{1}{2}$ ) requires a stronger tightness input. The rate function is unchanged, but the topology in which the bounds are asserted is part of the theorem.
The factor $\frac{1}{2}$ and the speed $ε$ are tied to the Gaussian. The increment of $\sqrt{ε} W$ over $[s, t]$ is Gaussian with variance $ε (t - s)$ , whose Cramér rate is $\frac{1}{2} ∣ \cdot ∣^{2} / (t - s)$ ; the $\frac{1}{2}$ is the Gaussian $Λ^{*}$ . For a non-Gaussian random walk the half-square is replaced by $Λ^{*}$ and one obtains Mogulskii's rate instead. Carrying the Gaussian $\frac{1}{2}$ into the random-walk statement is the error that Mogulskii's theorem (Exercise 5) corrects.

Key theorem with proof Intermediate+

We prove Schilder's theorem by the projective route: a finite-dimensional Cramér LDP on increments 37.07.02, glued across time-grids by the Dawson-Gärtner projective limit, and closed in sup norm by exponential tightness 37.07.09.

Theorem (Schilder). The laws $μ_{ε}$ of $\sqrt{ε} W$ satisfy the LDP on $(C_{0} ([0, 1]; R^{d}), ∥ \cdot ∥_{\infty})$ at speed $ε$ with good rate $I (f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2} d t$ on $H_{0}^{1}$ and $+ \infty$ elsewhere. ^{[Dembo & Zeitouni §5.2]}

Proof. Finite-dimensional marginals. Fix a grid $π : 0 = t_{0} < t_{1} < \dots < t_{n} = 1$ and the evaluation map $p_{π} : C_{0} \to (R^{d})^{n}$ , $p_{π} (f) = (f (t_{1}), \dots, f (t_{n}))$ . Under $μ_{ε}$ the vector $p_{π} (\sqrt{ε} W)$ has independent Gaussian increments $\sqrt{ε} (W_{t_{i}} - W_{t_{i - 1}}) \sim N (0, ε (t_{i} - t_{i - 1}) I_{d})$ . Writing $y = (y_{1}, \dots, y_{n})$ for the values and $Δ_{i} = t_{i} - t_{i - 1}$ , the increment $y_{i} - y_{i - 1}$ (with $y_{0} = 0$ ) is, on the scale $ε$ , an empirical-mean-type Gaussian whose Cramér rate 37.07.02 is the Gaussian $Λ^{*}$ , namely $\frac{1}{2} ∣ y_{i} - y_{i - 1} ∣^{2} / Δ_{i}$ . By independence the joint rate is the sum, $$ I_\pi(y)=\sum_{i=1}^n\frac{|y_i-y_{i-1}|^2}{2\,(t_i-t_{i-1})}, $$ and Cramér's theorem gives the LDP for $\{(p_{π})_{*} μ_{ε}\}$ on $(R^{d})^{n}$ at speed $ε$ with this good rate. (Concretely $\sqrt{ε} W$ evaluated on $π$ is a linear image of an independent Gaussian increment vector, and $I_{π}$ is the half-Euclidean-norm rate transported through that linear map.)

Compatibility and the projective rate. The grids, directed by refinement, form a projective system with $C_{0} = \varprojlim_{π} (R^{d})^{∣ π ∣}$ through the evaluations, since a continuous path is determined by its values on a dense set of times. Coarsening a grid is a continuous linear projection, and the contraction principle forces compatibility $I_{π^{'}} (p_{π^{'} π} y) \leq I_{π} (y)$ : dropping an intermediate node $t_{j}$ replaces the two terms $\frac{∣ y _{j} - y _{j - 1} ∣ ^{2}}{2 Δ _{j}} + \frac{∣ y _{j + 1} - y _{j} ∣ ^{2}}{2 Δ _{j + 1}}$ by the single merged term $\frac{∣ y _{j + 1} - y _{j - 1} ∣ ^{2}}{2 ( Δ _{j} + Δ _{j + 1} )}$ , which is no larger by convexity of $v \mapsto ∣ v ∣^{2}$ (Jensen / the parallelogram law for the optimal interior value). Hence the Dawson-Gärtner theorem 37.07.09 yields the LDP for $\{μ_{ε}\}$ on the projective limit with rate $$ I(f)=\sup_\pi I_\pi(p_\pi f)=\sup_\pi\sum_{i=1}^n\frac{|f(t_i)-f(t_{i-1})|^2}{2(t_i-t_{i-1})}. $$

Identification with the energy. The supremum of the grid sums is exactly the Cameron-Martin energy. For $f \in H_{0}^{1}$ , each term is $\frac{1}{2} Δ_{i} \Big| \frac{1}{Δ _{i}} \int_{t_{i - 1}}^{t_{i}} \dot{f} \Big|^{2} \leq \frac{1}{2} \int_{t_{i - 1}}^{t_{i}} ∣ \dot{f} ∣^{2}$ by Jensen, so $I_{π} (p_{π} f) \leq \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2}$ for every grid, and refining the grid makes the piecewise-constant approximation of $\dot{f}$ converge in $L^{2}$ , pushing the sum up to $\frac{1}{2} ∥ \dot{f} ∥_{L^{2}}^{2}$ . For $f \notin H_{0}^{1}$ the supremum diverges: a path that is not absolutely continuous, or whose distributional derivative is not in $L^{2}$ , has grid sums unbounded above (the $f$ -increments fail the Hölder- $\frac{1}{2}$ /finite-energy bound on some sequence of refinements). Thus $I (f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2}$ on $H_{0}^{1}$ and $+ \infty$ off it, the stated rate, and the projective-limit goodness 37.07.09 together with the Arzelà-Ascoli compactness of sublevel sets makes $I$ good.

Sup-norm topology via exponential tightness. The Dawson-Gärtner construction delivers the LDP in the projective-limit (pointwise/cylinder) topology; upgrading to the sup norm requires exponential tightness in $∥ \cdot ∥_{\infty}$ , supplied by a Brownian modulus-of-continuity estimate. For each $η > 0$ , $$ \limsup_{\varepsilon\to0}\varepsilon\log\mathbb{P}\Big(\sup_{|t-s|\le h}|\sqrt\varepsilon(W_t-W_s)|>\eta\Big)\xrightarrow{h\to0}-\infty, $$ a Garsia-Rodemich-Rumsey / reflection bound; the equicontinuous sets $K_{M} = \{f : ∥ f ∥_{\infty} \leq R_{M}, ω_{f} (h) \leq ϕ_{M} (h)\}$ are compact by Arzelà-Ascoli and capture all but exponentially-thin mass at rate $M$ . Exponential tightness then closes the compact-set upper bound to all closed sets and confirms the projective and sup-norm topologies give the same LDP 37.07.09. $□$

Bridge. This theorem builds toward the Freidlin-Wentzell theory of randomly perturbed dynamical systems and appears again in every small-noise computation of exit times, metastability rates, and quasipotentials, where the Cameron-Martin energy reappears as the action that the optimal escape path minimises. This is exactly the path-space realisation of the projective-limit machinery 37.07.09: a path is its readings at finitely many times, each reading carries the Gaussian Cramér rate 37.07.02, and Dawson-Gärtner glues the increment costs into the energy integral by the supremum over grids. The foundational reason the limit rate is the half-energy is the compatibility $I_{π^{'}} (p_{π^{'} π} \cdot) \leq I_{π}$ forced by convexity of $∣ v ∣^{2}$ : coarsening averages the speed and Jensen lowers the cost, so the supremum over refinements is exactly the $L^{2}$ limit $\frac{1}{2} \int ∣ \dot{f} ∣^{2}$ and generalises the single-increment Gaussian rate to the whole trajectory. Putting these together, Schilder's theorem is the Gaussian counterpart of Mogulskii's: the Gaussian $\frac{1}{2} ∣ v ∣^{2}$ is replaced by the random-walk $Λ^{*} (v)$ in the same projective gluing, and the bridge is the contraction principle 37.07.08, which carries this Brownian LDP through the Itô solution map to the diffusion rate $\frac{1}{2} \int ∣ \dot{φ} - b (φ) ∣^{2}$ of Freidlin-Wentzell.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Verify the grid-sum identity at the heart of the proof: for a fixed grid $π$ , the finite-dimensional rate $I_{π} (y) = \sum_{i} \frac{∣ y _{i} - y _{i - 1} ∣ ^{2}}{2 Δ _{i}}$ equals the Schilder energy $\frac{1}{2} \int_{0}^{1} ∣ \dot{g} ∣^{2}$ of the piecewise-linear interpolant $g$ of the data $(t_{i}, y_{i})$ . Conclude $I_{π} (p_{π} f) = I (g_{π})$ where $g_{π}$ is the polygonal interpolation of $f$ on $π$ .

Hint

On each $[t_{i - 1}, t_{i}]$ the interpolant has constant speed $(y_{i} - y_{i - 1}) / Δ_{i}$ .

Answer

On $[t_{i - 1}, t_{i}]$ the polygonal interpolant $g$ is linear with constant derivative $\dot{g} = (y_{i} - y_{i - 1}) / Δ_{i}$ , so $\int_{t_{i - 1}}^{t_{i}} ∣ \dot{g} ∣^{2} d t = Δ_{i} \cdot ∣ y_{i} - y_{i - 1} ∣^{2} / Δ_{i}^{2} = ∣ y_{i} - y_{i - 1} ∣^{2} / Δ_{i}$ . Summing over $i$ and halving, $\frac{1}{2} \int_{0}^{1} ∣ \dot{g} ∣^{2} = \sum_{i} \frac{∣ y _{i} - y _{i - 1} ∣ ^{2}}{2 Δ _{i}} = I_{π} (y)$ . Taking $y = p_{π} f$ gives $I_{π} (p_{π} f) = I (g_{π})$ . Since $g_{π}$ interpolates $f$ at the grid points, $I (g_{π}) \leq I (f)$ (the polygon is the least-energy path through those nodes), and refining $π$ sends $g_{π} \to f$ with $I (g_{π}) ↑ I (f)$ , which is exactly the supremum-over-grids identification.
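
The monotone convergence of the grid sums can be watched numerically. This is a small sketch under stated assumptions: the test path $f (t) = t^{2}$ (with $I (f) = \frac{1}{2} \int_{0}^{1} (2 t)^{2} d t = \frac{2}{3}$ ), the nested dyadic grids and the helper name `grid_rate` are illustrative choices, not part of the exercise.

```python
import numpy as np

def grid_rate(f, grid):
    """I_pi(p_pi f) = sum_i |f(t_i) - f(t_{i-1})|^2 / (2 * (t_i - t_{i-1}))."""
    y = f(grid)
    return np.sum(np.diff(y) ** 2 / (2.0 * np.diff(grid)))

f = lambda t: t**2   # illustrative path in H^1_0 with energy 2/3

# Dyadic grids with 2, 4, ..., 1024 intervals; each one refines the previous.
rates = [grid_rate(f, np.linspace(0.0, 1.0, 2**k + 1)) for k in range(1, 11)]

for k, r in enumerate(rates, start=1):
    print(2**k, r)   # grid sums climb toward 2/3 without exceeding it
```

Each grid sum is the energy of the polygon through the sampled nodes, so the printed values increase with refinement and stay below the energy of $f$ .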

Exercise 4 (medium, symbolic).

Prove the embedding bound $∥ f ∥_{\infty} \leq ∥ \dot{f} ∥_{L^{2}}$ for $f \in H_{0}^{1} ([0, 1]; R^{d})$ , and use it to show that the Schilder sublevel set $Ψ_{I} (α)$ is bounded and uniformly Hölder- $\frac{1}{2}$ in $∥ \cdot ∥_{\infty}$ .

Hint

$f (t) = \int_{0}^{t} \dot{f}$ ; bound the integral by Cauchy-Schwarz on $[0, t]$ and on $[s, t]$ .

Answer

For $f \in H_{0}^{1}$ , $f (t) = \int_{0}^{t} \dot{f} (u) d u$ , so by Cauchy-Schwarz $∣ f (t) ∣ \leq (\int_{0}^{t} 1)^{1/2} (\int_{0}^{t} ∣ \dot{f} ∣^{2})^{1/2} \leq t^{1/2} ∥ \dot{f} ∥_{L^{2}} \leq ∥ \dot{f} ∥_{L^{2}}$ ; taking the sup over $t$ gives $∥ f ∥_{\infty} \leq ∥ \dot{f} ∥_{L^{2}}$ . The same estimate on $[s, t]$ gives $∣ f (t) - f (s) ∣ \leq ∣ t - s ∣^{1/2} ∥ \dot{f} ∥_{L^{2}}$ , a Hölder- $\frac{1}{2}$ bound. On $Ψ_{I} (α) = \{∥ \dot{f} ∥_{L^{2}}^{2} \leq 2 α\}$ we have $∥ \dot{f} ∥_{L^{2}} \leq \sqrt{2 α}$ , so $∥ f ∥_{\infty} \leq \sqrt{2 α}$ and $∣ f (t) - f (s) ∣ \leq \sqrt{2 α} ∣ t - s ∣^{1/2}$ . The sublevel set is therefore uniformly bounded and uniformly equicontinuous, hence relatively compact in $C_{0}$ by Arzelà-Ascoli, which is the compactness half of the goodness of $I$ .
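
As a sanity check of the two bounds, here is a short numerical sketch. The path $f (t) = \sin (2 π t)$ , the grid size and the variable names are illustrative assumptions; the $L^{2}$ norm is computed with a hand-written trapezoid rule.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 401)
f = np.sin(2 * np.pi * t)                    # illustrative path with f(0) = 0
fdot = 2 * np.pi * np.cos(2 * np.pi * t)     # its exact derivative

# Trapezoid rule for the L^2 norm of the derivative.
l2_norm = np.sqrt(np.sum(0.5 * (fdot[1:] ** 2 + fdot[:-1] ** 2) * np.diff(t)))
sup_norm = np.max(np.abs(f))

# Largest Hölder-1/2 quotient |f(t) - f(s)| / |t - s|^(1/2) over grid pairs.
ds = np.abs(t[:, None] - t[None, :])
df = np.abs(f[:, None] - f[None, :])
mask = ds > 0
holder = np.max(df[mask] / np.sqrt(ds[mask]))

print(sup_norm, holder, l2_norm)             # both quantities sit below l2_norm
```

Both the sup norm and the Hölder quotient come out below $∥ \dot{f} ∥_{L^{2}} = π \sqrt{2}$ , as the Cauchy-Schwarz argument predicts.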

Exercise 5 (medium, symbolic).

State and prove Mogulskii's theorem schematically as the random-walk analogue of Schilder. For the polygonal interpolation $Z_{ε}$ of a centred i.i.d.-increment random walk with cumulant generating function $Λ$ , scaled to $C_{0} ([0, 1])$ , show the projective route gives rate $J (f) = \int_{0}^{1} Λ^{*} (\dot{f}) d t$ , and recover Schilder as the Gaussian case.

Hint

Replace the Gaussian increment rate $\frac{1}{2} ∣ v ∣^{2} /Δ$ by the Cramér rate $Δ Λ^{*} (v /Δ)$ ; the supremum over grids of the Riemann sums is the action $\int Λ^{*} (\dot{f})$ .

Answer

On a grid $π$ the marginal $(Z_{ε} (t_{1}), \dots, Z_{ε} (t_{n}))$ is a vector of independent scaled increment-sums, so Cramér's theorem 37.07.02 gives the finite-dimensional rate $J_{π} (y) = \sum_{i} Δ_{i} Λ^{*} (\frac{y _{i} - y _{i - 1}}{Δ _{i}})$ , the action of the polygonal interpolant evaluated with the convex integrand $Λ^{*}$ . Convexity of $Λ^{*}$ makes coarsening reduce the sum (Jensen), so the marginals are compatible and Dawson-Gärtner 37.07.09 yields $J (f) = \sup_{π} J_{π} (p_{π} f) = \int_{0}^{1} Λ^{*} (\dot{f} (t)) d t$ for absolutely continuous $f$ , $+ \infty$ otherwise; exponential approximation handles the polygonal-vs-sampled gap in sup norm. For the Gaussian increment, $Λ (λ) = \frac{1}{2} ∣ λ ∣^{2}$ so $Λ^{*} (v) = \frac{1}{2} ∣ v ∣^{2}$ , and $J (f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2} = I (f)$ , which is Schilder's theorem. Mogulskii is thus the same projective gluing with the Gaussian half-square replaced by the increment's $Λ^{*}$ .
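
To see how the Lagrangian changes with the increment law, the Legendre dual can be approximated numerically. This is a sketch only: the fair $\pm 1$ coin increment (with $Λ (λ) = \log \cosh λ$ ), the bounded grid of dual variables and the helper `legendre` are illustrative assumptions, not part of the exercise.

```python
import numpy as np

# Grid of dual variables; the bounded range is a numerical assumption.
lam = np.linspace(-20.0, 20.0, 400_001)

def legendre(cgf, v):
    """Numerical Legendre transform: max over the lambda-grid of lam * v - cgf(lam)."""
    return np.max(lam * v - cgf(lam))

gauss_cgf = lambda l: 0.5 * l**2                       # standard Gaussian increment
coin_cgf = lambda l: np.logaddexp(l, -l) - np.log(2)   # fair +-1 coin: log cosh(l)

v = 0.6
gauss_rate = legendre(gauss_cgf, v)   # closed form v^2 / 2
coin_rate = legendre(coin_cgf, v)     # closed form below
coin_exact = 0.5 * ((1 + v) * np.log(1 + v) + (1 - v) * np.log(1 - v))

print(gauss_rate, coin_rate, coin_exact)
```

At the same speed $v$ the coin walk pays more than the Gaussian, which is the sense in which carrying the half-square over to a random walk misprices the path.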

Exercise 6 (hard, symbolic).

Derive the contraction to Freidlin-Wentzell: assuming the Itô solution map $Φ : C_{0} \to C$ sending the driving path to the solution of $d X^{ε} = b (X^{ε}) d t + \sqrt{ε} d W$ is continuous in sup norm (additive-noise case), use the contraction principle 37.07.08 on Schilder's LDP to obtain the LDP for $X^{ε}$ with rate $S (φ) = \frac{1}{2} \int_{0}^{1} ∣ \dot{φ} - b (φ) ∣^{2} d t$ on absolutely continuous $φ$ with $φ (0) = x_{0}$ .

Hint

For additive noise the controlled ODE $\dot{φ} = b (φ) + \dot{f}$ inverts to $\dot{f} = \dot{φ} - b (φ)$ ; push $I (f) = \frac{1}{2} \int ∣ \dot{f} ∣^{2}$ through $φ = Φ (f)$ by the contraction infimum.

Answer

The contraction principle 37.07.08 states that if $\{μ_{ε}\}$ satisfies the LDP with good rate $I$ and $Φ$ is continuous, then $\{Φ_{*} μ_{ε}\}$ satisfies the LDP with rate $S (φ) = \inf \{I (f) : Φ (f) = φ\}$ . For additive noise the skeleton equation is $\dot{φ} = b (φ) + \dot{f}$ , $φ (0) = x_{0}$ , which has the unique inverse $\dot{f} = \dot{φ} - b (φ)$ : the fibre $\{f : Φ (f) = φ\}$ is the single point $f = φ - x_{0} - \int_{0}^{\cdot} b (φ)$ when $φ (0) = x_{0}$ , with finite energy exactly when $φ$ is absolutely continuous and $\dot{φ} - b (φ) \in L^{2}$ , and the fibre is empty when $φ (0) \neq x_{0}$ . Hence $S (φ) = I (f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2} = \frac{1}{2} \int_{0}^{1} ∣ \dot{φ} (t) - b (φ (t)) ∣^{2} d t$ for such $φ$ and $+ \infty$ otherwise. This is the Freidlin-Wentzell action ^{[Freidlin & Wentzell §3]}: the Cameron-Martin energy of the control $f$ that the noise must supply to steer the deterministic flow $\dot{x} = b (x)$ along $φ$ . Schilder is the case $b \equiv 0$ , where $S = I$ .
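
The action can be evaluated numerically along a few candidate paths. The following is a minimal sketch: the linear drift $b (x) = - x$ , the starting point $x_{0} = 1$ , the midpoint discretisation and the helper `fw_action` are hypothetical choices made for illustration only.

```python
import numpy as np

def fw_action(phi, b, n=20_000):
    """Approximate S(phi) = 1/2 * int_0^1 |phi'(t) - b(phi(t))|^2 dt on a uniform grid."""
    t = np.linspace(0.0, 1.0, n + 1)
    x = phi(t)
    dt = np.diff(t)
    phidot = np.diff(x) / dt            # finite-difference velocity on each step
    xmid = 0.5 * (x[1:] + x[:-1])       # state near the middle of each step
    return 0.5 * np.sum((phidot - b(xmid)) ** 2 * dt)

b = lambda x: -x                        # hypothetical drift pulling toward 0
x0 = 1.0

flow = lambda t: x0 * np.exp(-t)        # follows the deterministic flow: no noise needed
stay = lambda t: x0 + 0.0 * t           # holds position against the drift

S_flow = fw_action(flow, b)             # essentially zero
S_stay = fw_action(stay, b)             # 1/2 * int_0^1 1 dt = 1/2
S_free = fw_action(lambda t: t, lambda x: 0.0 * x)   # b = 0: Schilder energy of the ramp

print(S_flow, S_stay, S_free)
```

Following the flow is free, resisting it costs energy, and with $b \equiv 0$ the action of the straight ramp reduces to its Schilder energy $\frac{1}{2}$ .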

Exercise 7 (hard, symbolic).

Prove the sup-norm exponential tightness used in the proof, at the level of a scaling reduction: show that the modulus-of-continuity bound for $\sqrt{ε} W$ reduces to one for $W$ , i.e. that $$ \mathbb{P}\Big(\sup_{|t-s|\le h}|\sqrt\varepsilon(W_t-W_s)|>\eta\Big)=\mathbb{P}\Big(\sup_{|t-s|\le h}|W_t-W_s|>\eta/\sqrt\varepsilon\Big), $$ and explain why a Gaussian tail of the standard modulus then yields the $- \infty$ exponential rate as $h \to 0$ .

Hint

Factor out $\sqrt{ε}$ inside the supremum; the right side has threshold $η / \sqrt{ε} \to \infty$ , and the Brownian modulus has Gaussian-type tails.

Answer

Since $\sqrt{ε}$ is a deterministic positive scalar, $∣ \sqrt{ε} (W_{t} - W_{s}) ∣ = \sqrt{ε} ∣ W_{t} - W_{s} ∣$ , and the event $\{\sqrt{ε} \sup ∣ W_{t} - W_{s} ∣ > η\}$ equals $\{\sup ∣ W_{t} - W_{s} ∣ > η / \sqrt{ε}\}$ , giving the identity. The Brownian modulus of continuity satisfies a Gaussian-type tail bound: there exist $C, c > 0$ with $P (\sup_{∣ t - s ∣ \leq h} ∣ W_{t} - W_{s} ∣ > u) \leq C h^{- 1} \exp (- c u^{2} / h)$ for $u$ large relative to $\sqrt{h \log (1/ h)}$ (a reflection/chaining estimate). Setting $u = η / \sqrt{ε}$ , so that $u^{2} = η^{2} / ε$ , $$ \varepsilon\log\mathbb{P}(\cdots)\le\varepsilon\log(Ch^{-1})-c\,\frac{\eta^2}{h}\xrightarrow{\varepsilon\to0}-\frac{c\eta^2}{h}, $$ and letting $h \to 0$ sends this to $- \infty$ . Thus the probability of an equicontinuity defect decays at an exponential rate $c η^{2} / h$ that can be made arbitrarily large by taking $h$ small, which is exactly the exponential-tightness input 37.07.09 that upgrades the cylinder-topology LDP to the sup-norm LDP.

Exercise 8 (hard, symbolic).

Use Schilder's theorem to compute the small-noise rate of a sup-functional: find $\lim_{ε \to 0} ε \log P (\sup_{t \in [0, 1]} \sqrt{ε} W_{t} \geq a)$ for $a > 0$ , by solving the variational problem $\inf \{I (f) : \sup_{t} f (t) \geq a\}$ .

Hint

The functional $f \mapsto \sup_{t} f (t)$ is continuous in sup norm; minimise the energy over paths reaching height $a$ . The cheapest path rises to $a$ at constant speed and may then stay flat.

Answer

The map $f \mapsto \sup_{t} f (t)$ is $∥ \cdot ∥_{\infty}$ -continuous, and the set $\{\sup_{t} f \geq a\}$ has the same closure/interior infimum, so the LDP (or the contraction principle 37.07.08) gives $\lim_{ε} ε \log P (\sup_{t} \sqrt{ε} W_{t} \geq a) = - \inf \{I (f) : \sup_{t} f (t) \geq a\}$ . To minimise $\frac{1}{2} \int_{0}^{1} \dot{f}^{2}$ subject to $f$ reaching height $a$ at some time $τ \in (0, 1]$ : on $[0, τ]$ the cheapest rise to $a$ is the straight ramp with energy $\frac{1}{2} a^{2} / τ$ (Exercise 2 scaled), and on $[τ, 1]$ the path costs nothing by staying flat. This is decreasing in $τ$ , minimised at $τ = 1$ , giving $\inf I = \frac{1}{2} a^{2}$ . Hence $\lim_{ε} ε \log P (\sup_{t} \sqrt{ε} W_{t} \geq a) = - \frac{1}{2} a^{2}$ , recovering the rate of the exact reflection-principle formula $P (\sup_{t} W_{t} \geq a / \sqrt{ε}) = 2 P (W_{1} \geq a / \sqrt{ε})$ , which equals $e^{- a^{2} / (2 ε)}$ to leading exponential order.
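
The limit can be checked against the exact reflection-principle probability in dimension one. This is a small sketch: the choice $a = 1$ , the three values of $ε$ and the helper `scaled_log_prob` are illustrative assumptions, and smaller $ε$ would underflow double precision.

```python
import math

def scaled_log_prob(a, eps):
    """eps * log P(sup_{t <= 1} sqrt(eps) * W_t >= a) via the reflection principle.

    P(sup_{t <= 1} W_t >= x) = 2 * P(W_1 >= x) = erfc(x / sqrt(2)).
    """
    x = a / math.sqrt(eps)
    return eps * math.log(math.erfc(x / math.sqrt(2.0)))

a = 1.0
values = [scaled_log_prob(a, eps) for eps in (0.1, 0.01, 0.001)]

for eps, val in zip((0.1, 0.01, 0.001), values):
    print(eps, val)   # increases toward -a^2 / 2 = -0.5 as eps shrinks
```

The scaled log-probabilities approach $- \frac{1}{2} a^{2}$ from below; the gap is the sub-exponential prefactor of the Gaussian tail, which the large deviation rate does not see.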

Advanced results Master

The Cameron-Martin space as the reproducing structure of Wiener measure

The rate function $I = \frac{1}{2} ∥ \cdot ∥_{H}^{2}$ is not an accident of the proof: it is the energy of the Cameron-Martin Hilbert space $H_{0}^{1}$ , the Hilbert space continuously embedded in $C_{0}$ whose elements are exactly the admissible shifts of Wiener measure ^{[Cameron & Martin 1944]}. The Cameron-Martin theorem says that translating $W$ by $h$ leaves the law quasi-invariant exactly when $h \in H_{0}^{1}$ , with Radon-Nikodym density $\exp (\int_{0}^{1} \dot{h} d W - \frac{1}{2} ∥ h ∥_{H}^{2})$ ; the exponent's deterministic part is precisely $- I (h)$ . Schilder's theorem is the large-deviation shadow of this: the cost of the small-noise path resembling $h$ is the same quadratic energy that prices the translation. The abstract-Wiener-space generalisation (Gross) replaces $(C_{0}, H_{0}^{1}, μ)$ by any $(B, H, μ)$ with $H ↪ B$ dense and $μ$ Gaussian, and Schilder becomes the Donsker-Varadhan-Stroock LDP for $ε \log μ (\cdot / \sqrt{ε})$ with rate $\frac{1}{2} ∥ \cdot ∥_{H}^{2}$ , the half-square of the Cameron-Martin norm, $+ \infty$ off $H$ ^{[Deuschel & Stroock §3.4]}.

Mogulskii, the action integral, and the variational principle

Mogulskii's theorem ^{[Mogulskii 1976]} places Schilder inside a family: any i.i.d.-increment random walk, polygonally interpolated and diffusively scaled, obeys a sample-path LDP with rate $J (f) = \int_{0}^{1} Λ^{*} (\dot{f}) d t$ , the action with Lagrangian $Λ^{*}$ . The Gaussian case $Λ^{*} (v) = \frac{1}{2} ∣ v ∣^{2}$ is Schilder, with Lagrangian the kinetic energy; the general $Λ^{*}$ is the Legendre dual of the increment's cumulant generating function, so the path-space rate is a classical action whose Euler-Lagrange equations are the most-likely deviation paths. This is the entry point to the calculus of variations in large deviations: the infimum $\inf \{J (f) : f \in A\}$ over an event $A$ is solved by a geodesic of the Lagrangian $Λ^{*}$ , and the minimiser is the instanton, the dominant rare trajectory. For Schilder the minimisers are straight lines (free kinetic Lagrangian), which is why every Schilder variational problem reduces to a Cauchy-Schwarz computation.
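
For the Schilder Lagrangian the straight-line claim can be sketched in two lines. Here $x$ denotes a prescribed endpoint $f (1) = x$ , a boundary condition assumed only for illustration. The Euler-Lagrange equation for $\frac{1}{2} ∣ v ∣^{2}$ reads $$ \frac{d}{dt}\,\frac{\partial}{\partial v}\Big(\tfrac12|v|^2\Big)\Big|_{v=\dot f}=\ddot f=0\quad\Longrightarrow\quad f(t)=t\,x,\qquad I(f)=\tfrac12|x|^2, $$ and Cauchy-Schwarz confirms that this critical path is the minimiser: any $g \in H_{0}^{1}$ with $g (1) = x$ satisfies $$ |x|^2=\Big|\int_0^1\dot g(t)\,dt\Big|^2\le\int_0^1|\dot g(t)|^2\,dt=2\,I(g). $$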

Freidlin-Wentzell as the contraction of Schilder

The payoff is the Freidlin-Wentzell theory of small random perturbations of $\dot{x} = b (x)$ ^{[Freidlin & Wentzell §3]}. For additive noise the Itô map is continuous on $C_{0}$ , so the contraction principle 37.07.08 transports Schilder's LDP to the diffusion $d X^{ε} = b (X^{ε}) d t + \sqrt{ε} d W$ , producing the rate $S (φ) = \frac{1}{2} \int_{0}^{1} ∣ \dot{φ} - b (φ) ∣^{2} d t$ , the energy of the control the noise must inject to steer the deterministic flow along $φ$ . The quasipotential $V (x_{0}, x) = \inf \{S (φ) : φ (0) = x_{0}, φ (T) = x, T > 0\}$ governs exit times, metastable transition rates (Eyring-Kramers), and the structure of the invariant measure's small-noise asymptotics. For multiplicative noise $σ (X) d W$ the Itô map is only continuous after a Wong-Zakai / rough-path correction, but the conclusion persists with $∣ \dot{φ} - b ∣^{2}$ replaced by the $σ σ^{⊤}$ -weighted norm, so Schilder remains the seed LDP at the top of the construction.

Synthesis. The central insight of Schilder's theorem is that the small-noise Brownian LDP is exactly the projective gluing of finite-dimensional Gaussian Cramér rates 37.07.02 into the Cameron-Martin energy, so the infinite-dimensional principle generalises the single-increment half-square to the whole path with no direct infinite-dimensional estimate beyond a sup-norm tightness bound 37.07.09. The foundational reason the rate is the energy $\frac{1}{2} \int ∣ \dot{f} ∣^{2}$ is the convexity of $∣ v ∣^{2}$ , which makes the grid rates compatible and their supremum the $L^{2}$ action; this parallels Mogulskii's $\int Λ^{*} (\dot{f})$ , the same gluing with the Gaussian replaced by an arbitrary increment's Legendre dual. Putting these together with the contraction principle 37.07.08 yields Freidlin-Wentzell: the bridge is the Itô map, which carries the Cameron-Martin energy to the diffusion action $\frac{1}{2} \int ∣ \dot{φ} - b (φ) ∣^{2}$ , and the quasipotential that prices metastability appears again in Eyring-Kramers exit-rate asymptotics and the small-noise structure of invariant measures. This is exactly the architecture promised by the chapter: Cramér supplies the local cost, Dawson-Gärtner lifts it to path space, exponential tightness fixes the topology, and contraction propagates the rate to every downstream stochastic system.

Full proof set Master

Proposition 1 (finite-dimensional increment LDP). Fix a grid $π : 0 = t_{0} < \dots < t_{n} = 1$ . The laws of $p_{π} (\sqrt{ε} W) = (\sqrt{ε} W_{t_{1}}, \dots, \sqrt{ε} W_{t_{n}})$ satisfy the LDP on $(R^{d})^{n}$ at speed $ε$ with good rate $I_{π} (y) = \sum_{i = 1}^{n} \frac{∣ y _{i} - y _{i - 1} ∣ ^{2}}{2 ( t _{i} - t _{i - 1} )}$ , $y_{0} = 0$ .

Proof. The increment vector $ξ = (ξ_{1}, \dots, ξ_{n})$ , $ξ_{i} = \sqrt{ε} (W_{t_{i}} - W_{t_{i - 1}})$ , has independent coordinates $ξ_{i} \sim N (0, ε Δ_{i} I_{d})$ . For a single Gaussian $N (0, ε Δ_{i} I_{d})$ , writing it as $\sqrt{ε}$ times $N (0, Δ_{i} I_{d})$ , the small- $ε$ LDP at speed $ε$ has rate the Gaussian $Λ^{*}$ : the cumulant generating function of $N (0, Δ_{i} I_{d})$ is $Λ_{i} (λ) = \frac{Δ _{i}}{2} ∣ λ ∣^{2}$ , with Legendre dual $Λ_{i}^{*} (v) = \frac{1}{2 Δ _{i}} ∣ v ∣^{2}$ 37.07.02. By independence the joint rate is the sum $\sum_{i} Λ_{i}^{*} (ξ_{i})$ , and the linear change of coordinates $ξ_{i} = y_{i} - y_{i - 1}$ (a bijection with unit Jacobian on $(R^{d})^{n}$ ) transports the rate to $I_{π} (y) = \sum_{i} \frac{∣ y _{i} - y _{i - 1} ∣ ^{2}}{2 Δ _{i}}$ by the contraction principle along a continuous bijection. Goodness holds because $I_{π}$ is a positive-definite quadratic form, with compact (ellipsoidal) sublevel sets. $□$
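
As a worked check of the Legendre dual quoted in the proof, using only the cumulant generating function $Λ_{i} (λ) = \frac{Δ _{i}}{2} ∣ λ ∣^{2}$ stated there, the supremum can be computed by setting the gradient in $λ$ to zero: $$ \Lambda_i^*(v)=\sup_{\lambda\in\mathbb{R}^d}\Big\{\langle\lambda,v\rangle-\frac{\Delta_i}{2}|\lambda|^2\Big\},\qquad v-\Delta_i\lambda=0\ \Longrightarrow\ \lambda^*=\frac{v}{\Delta_i}, $$ $$ \Lambda_i^*(v)=\frac{|v|^2}{\Delta_i}-\frac{\Delta_i}{2}\cdot\frac{|v|^2}{\Delta_i^2}=\frac{|v|^2}{2\Delta_i}. $$ The objective is strictly concave in $λ$ , so the critical point is the maximiser.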

Proposition 2 (compatibility under coarsening). If $π^{'} \subset π$ drops the node $t_{j}$ , then $I_{π^{'}} (p_{π^{'} π} y) \leq I_{π} (y)$ for all $y$ , so the family $\{I_{π}\}$ is compatible (monotone non-decreasing along refinement).

Proof. Coarsening merges the two terms over $[t_{j - 1}, t_{j}]$ and $[t_{j}, t_{j + 1}]$ into one over $[t_{j - 1}, t_{j + 1}]$ . Set $u = y_{j} - y_{j - 1}$ , $w = y_{j + 1} - y_{j}$ , $α = Δ_{j}$ , $β = Δ_{j + 1}$ . The two-term cost is $\frac{∣ u ∣ ^{2}}{2 α} + \frac{∣ w ∣ ^{2}}{2 β}$ and the merged cost is $\frac{∣ u + w ∣ ^{2}}{2 ( α + β )}$ . By the weighted-Cauchy-Schwarz (or the convexity of $v \mapsto ∣ v ∣^{2}$ applied to the average $\frac{α \cdot ( u / α ) + β \cdot ( w / β )}{α + β}$ ), $$ \frac{|u+w|^2}{\alpha+\beta}=\frac{\big|\alpha\tfrac{u}{\alpha}+\beta\tfrac{w}{\beta}\big|^2}{\alpha+\beta}\le\alpha\Big|\frac{u}{\alpha}\Big|^2+\beta\Big|\frac{w}{\beta}\Big|^2=\frac{|u|^2}{\alpha}+\frac{|w|^2}{\beta}, $$ the inequality being Jensen for the convex $∣ \cdot ∣^{2}$ with weights $α / (α + β), β / (α + β)$ . Halving gives the merged term $\leq$ the two-term sum, so $I_{π^{'}} (p_{π^{'} π} y) \leq I_{π} (y)$ . Dropping several nodes iterates this. $□$
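
The inequality can be spot-checked numerically in dimension one. This is a sketch under stated assumptions: the non-uniform grid, the seeded random node values and the helper `grid_rate` are illustrative choices, not part of the proof.

```python
import numpy as np

def grid_rate(y, grid):
    """I_pi(y) = sum_i |y_i - y_{i-1}|^2 / (2 * Delta_i), with y_0 = 0 at grid[0] = 0."""
    y_full = np.concatenate(([0.0], y))
    return np.sum(np.diff(y_full) ** 2 / (2.0 * np.diff(grid)))

rng = np.random.default_rng(0)                  # fixed seed, illustrative data only
grid = np.array([0.0, 0.2, 0.5, 0.7, 1.0])      # t_0 < t_1 < ... < t_4
y = rng.normal(size=4)                          # arbitrary values at t_1, ..., t_4

fine = grid_rate(y, grid)

# Drop each interior node t_1, t_2, t_3 in turn, keeping the endpoints 0 and 1.
coarse = [grid_rate(np.delete(y, j), np.delete(grid, j + 1)) for j in range(3)]

print(fine, coarse)   # every coarse value is at most the fine value
```

Whatever values are placed at the nodes, merging two adjacent intervals never raises the cost, which is the monotonicity that makes the supremum over grids well behaved.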

Proposition 3 (identification of the projective rate with the energy). For $f \in C_{0}$ , $\sup_{π} I_{π} (p_{π} f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2} d t$ if $f \in H_{0}^{1}$ , and $= + \infty$ otherwise.

Proof. If $f \in H_{0}^{1}$ : on $[t_{i - 1}, t_{i}]$ , Jensen for $∣ \cdot ∣^{2}$ gives $\Big| \frac{1}{Δ _{i}} \int_{t_{i - 1}}^{t_{i}} \dot{f} \Big|^{2} \leq \frac{1}{Δ _{i}} \int_{t_{i - 1}}^{t_{i}} ∣ \dot{f} ∣^{2}$ , so $\frac{∣ f ( t _{i} ) - f ( t _{i - 1} ) ∣ ^{2}}{2 Δ _{i}} \leq \frac{1}{2} \int_{t_{i - 1}}^{t_{i}} ∣ \dot{f} ∣^{2}$ ; summing, $I_{π} (p_{π} f) \leq \frac{1}{2} ∥ \dot{f} ∥_{L^{2}}^{2}$ for every $π$ , so $\sup_{π} I_{π} \leq \frac{1}{2} ∥ \dot{f} ∥_{L^{2}}^{2}$ . Conversely, the piecewise-constant function $\dot{f}_{π} := \sum_{i} (\frac{1}{Δ _{i}} \int_{t_{i - 1}}^{t_{i}} \dot{f}) 1_{[t_{i - 1}, t_{i})}$ is the $L^{2}$ -conditional expectation of $\dot{f}$ on the grid $σ$ -algebra, and $∥ \dot{f}_{π} ∥_{L^{2}}^{2} = 2 I_{π} (p_{π} f)$ ; as $π$ refines, $\dot{f}_{π} \to \dot{f}$ in $L^{2}$ by the martingale convergence / Lebesgue differentiation theorem, so $2 I_{π} (p_{π} f) = ∥ \dot{f}_{π} ∥_{L^{2}}^{2} ↑ ∥ \dot{f} ∥_{L^{2}}^{2}$ . Hence $\sup_{π} I_{π} (p_{π} f) = \frac{1}{2} ∥ \dot{f} ∥_{L^{2}}^{2}$ . If $f \notin H_{0}^{1}$ : either $f$ is not absolutely continuous or $\dot{f} \notin L^{2}$ . Define $\dot{f}_{π}$ through the difference quotients $(f (t_{i}) - f (t_{i - 1})) / Δ_{i}$ , so that again $∥ \dot{f}_{π} ∥_{L^{2}}^{2} = 2 I_{π} (p_{π} f)$ . In both cases these norms are unbounded over refinements: if they stayed bounded by $C$ , the $\dot{f}_{π}$ would be an $L^{2}$ -bounded martingale converging in $L^{2}$ to an $L^{2}$ derivative, forcing $f \in H_{0}^{1}$ with $∥ \dot{f} ∥_{L^{2}}^{2} \leq C$ , a contradiction. So $\sup_{π} I_{π} (p_{π} f) = + \infty$ . $□$

Proposition 4 (goodness of the Schilder rate). $I (f) = \frac{1}{2} \int_{0}^{1} ∣ \dot{f} ∣^{2}$ on $H_{0}^{1}$ , $+ \infty$ off it, is a good rate function on $(C_{0}, ∥ \cdot ∥_{\infty})$ .

Proof. Lower semicontinuity: $I = \sup_{π} I_{π} \circ p_{π}$ is a supremum of the continuous (hence lsc) maps $f \mapsto I_{π} (p_{π} f)$ (each $p_{π}$ is sup-norm continuous and each $I_{π}$ is a continuous quadratic form), so $I$ is lsc. Compact sublevel sets: $Ψ_{I} (α) = \{f \in H_{0}^{1} : ∥ \dot{f} ∥_{L^{2}}^{2} \leq 2 α\}$ . By the embedding estimate $∣ f (t) - f (s) ∣ \leq ∣ t - s ∣^{1/2} ∥ \dot{f} ∥_{L^{2}} \leq \sqrt{2 α} ∣ t - s ∣^{1/2}$ (Cauchy-Schwarz applied to $f (t) - f (s) = \int_{s}^{t} \dot{f}$ ) and, taking $s = 0$ with $f (0) = 0$ , $∥ f ∥_{\infty} \leq \sqrt{2 α}$ , the set is uniformly bounded and uniformly equicontinuous, hence relatively compact in $C_{0}$ by Arzelà-Ascoli; it is closed by lower semicontinuity of $I$ . A closed subset of a relatively compact set is compact, so $Ψ_{I} (α)$ is compact and $I$ is good. $□$
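The equicontinuity behind Arzelà-Ascoli can be probed numerically. The sketch below is illustrative only: it draws random piecewise-linear paths normalised to sit exactly on the energy level $α$ (so $∥ \dot{f} ∥_{L^{2}}^{2} = 2 α$ ) and checks the Hölder-$1/2$ quotient and the sup norm against $\sqrt{2 α}$ at grid points. The grid size, the value of $α$ , the case $d = 1$ , and the helper name `random_path_with_energy` are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 256, 0.7
dt = 1.0 / n

def random_path_with_energy(alpha):
    """Piecewise-linear f with f(0) = 0 and (1/2) * integral of |f'|^2 equal to alpha."""
    slopes = rng.normal(size=n)
    # Rescale so that sum(slopes^2) * dt, the squared L^2 norm of f', equals 2 * alpha.
    slopes *= np.sqrt(2.0 * alpha / (np.sum(slopes**2) * dt))
    return np.concatenate([[0.0], np.cumsum(slopes) * dt])

t = np.linspace(0.0, 1.0, n + 1)
gap = np.abs(t[:, None] - t[None, :])   # |t - s| for all grid pairs
off_diag = gap > 0

worst_holder, worst_sup = 0.0, 0.0
for _ in range(50):
    f = random_path_with_energy(alpha)
    diff = np.abs(f[:, None] - f[None, :])  # |f(t) - f(s)| for all grid pairs
    worst_holder = max(worst_holder, np.max(diff[off_diag] / np.sqrt(gap[off_diag])))
    worst_sup = max(worst_sup, np.max(np.abs(f)))

bound = np.sqrt(2.0 * alpha)
print("sqrt(2 alpha)          :", bound)
print("worst Hölder quotient  :", worst_holder)
print("worst sup norm         :", worst_sup)
```

Every sampled path respects the same modulus $\sqrt{2 α} ∣ t - s ∣^{1/2}$ regardless of how its slopes are arranged, which is the uniformity that Arzelà-Ascoli needs.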

Connections Master

Schilder's theorem is the canonical application of the Dawson-Gärtner projective limit 37.07.09: the path space is the projective limit of its finite-time evaluation marginals, each marginal carries a Gaussian Cramér LDP, and the projective-limit rate $\sup_{π} I_{π} \circ p_{π}$ is the Cameron-Martin energy. The exponential tightness in sup norm that closes the upper bound is exactly the closure datum that unit isolates, supplied here by a Brownian modulus-of-continuity estimate.
The finite-dimensional increment rates are instances of Cramér's theorem 37.07.02: the Gaussian increment $N (0, Δ_{i} I_{d})$ has cumulant generating function $\frac{Δ _{i}}{2} ∣ λ ∣^{2}$ and Legendre dual $\frac{1}{2 Δ _{i}} ∣ v ∣^{2}$ , so the per-increment cost is the Gaussian $Λ^{*}$ and the path cost is its glued sum; replacing the Gaussian by a general increment's $Λ^{*}$ turns Schilder into Mogulskii.
The object whose small-noise scaling is studied is the Wiener process 02.15.01: the scaling $\sqrt{ε} W$ is Brownian motion with variance dialled down by $ε$ , and the proof leans on Brownian increment independence, Gaussianity, and the sup-norm modulus of continuity. The rate function lives on the Cameron-Martin subspace, a Wiener-null set, encoding that the rare small-noise paths are the smooth ones the process most resists.
The downstream payoff is Freidlin-Wentzell theory via the contraction principle 37.07.08: pushing Schilder's LDP through the continuous Itô solution map produces the diffusion rate $\frac{1}{2} \int ∣ \dot{φ} - b (φ) ∣^{2}$ , whose quasipotential governs exit times and metastability; the contraction principle is the exact transport, and the Itô calculus 02.15.02 is the map.
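The Gaussian Legendre duality quoted in the Cramér connection above, $Λ (λ) = \frac{Δ}{2} ∣ λ ∣^{2}$ with dual $Λ^{*} (v) = \frac{1}{2 Δ} ∣ v ∣^{2}$ , can be confirmed by brute force. The sketch below takes the supremum over a grid of $λ$ values in dimension $d = 1$ ; the step size `delta`, the grid range, the test points `vs`, and the helper name `legendre_dual_numeric` are assumptions chosen for illustration.

```python
import numpy as np

def legendre_dual_numeric(v, delta, lam_grid):
    """Grid approximation of sup over lam of lam * v - (delta / 2) * lam^2 (d = 1)."""
    return np.max(lam_grid * v - 0.5 * delta * lam_grid**2)

delta = 0.25                                  # hypothetical time step Delta_i
lam_grid = np.linspace(-50.0, 50.0, 200001)   # wide enough to contain every v / delta
vs = np.array([-2.0, -0.5, 0.0, 0.3, 1.0, 3.0])

numeric = np.array([legendre_dual_numeric(v, delta, lam_grid) for v in vs])
closed_form = vs**2 / (2.0 * delta)           # the per-increment Schilder cost

for v, a, b in zip(vs, numeric, closed_form):
    print(f"v = {v:5.2f}   grid sup: {a:.6f}   v^2 / (2 delta): {b:.6f}")
```

The maximiser is $λ = v / Δ$ , so the grid must cover that value for the approximation to be meaningful; swapping in another increment law's cumulant generating function in place of the quadratic is the numerical counterpart of passing from Schilder to Mogulskii.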

Historical & philosophical context Master

The small-noise large-deviation principle for Wiener integrals was established by Michael Schilder in 1966 ^{[Schilder 1966]}, who computed the asymptotics of $\int e^{F / ε} d μ_{ε}$ for the rescaled Wiener measure and identified the exponential rate with the Cameron-Martin energy; Varadhan, in the same year ^{[Varadhan 1966]}, gave the principle its modern variational form and connected it to small-parameter asymptotics of partial differential equations. The energy functional itself predates the large-deviation reading: Cameron and Martin had isolated the space $H_{0}^{1}$ and its quadratic form in 1944 ^{[Cameron & Martin 1944]} as the admissible translations of Wiener measure, so Schilder's rate is the large-deviation incarnation of a structure already central to Gaussian analysis.

The random-walk analogue was proved by Anatolii Mogulskii in 1976 ^{[Mogulskii 1976]}, replacing the Gaussian half-square by the increment's Legendre dual and exhibiting the path-space rate as a classical action; the projective-limit organisation that unifies the two — and that the present proof follows — is the systematic treatment of Dembo and Zeitouni ^{[Dembo & Zeitouni §5.1-§5.2]} and of Deuschel and Stroock ^{[Deuschel & Stroock §3.4]}. The application that made the theorem indispensable is the Freidlin-Wentzell theory of randomly perturbed dynamical systems ^{[Freidlin & Wentzell §3]}, developed through the 1970s, which contracts Schilder's Brownian principle along the Itô map to price the rare excursions of a diffusion against its deterministic drift.

Bibliography Master

@article{schilder1966asymptotic,
  author  = {Schilder, Michael},
  title   = {Some asymptotic formulas for Wiener integrals},
  journal = {Transactions of the American Mathematical Society},
  volume  = {125},
  number  = {1},
  pages   = {63--85},
  year    = {1966}
}

@article{varadhan1966asymptotic,
  author  = {Varadhan, S. R. S.},
  title   = {Asymptotic probabilities and differential equations},
  journal = {Communications on Pure and Applied Mathematics},
  volume  = {19},
  number  = {3},
  pages   = {261--286},
  year    = {1966}
}

@article{cameronmartin1944transformations,
  author  = {Cameron, R. H. and Martin, W. T.},
  title   = {Transformations of Wiener integrals under translations},
  journal = {Annals of Mathematics},
  volume  = {45},
  number  = {2},
  pages   = {386--396},
  year    = {1944}
}

@article{mogulskii1976large,
  author  = {Mogulskii, A. A.},
  title   = {Large deviations for trajectories of multidimensional random walks},
  journal = {Theory of Probability and its Applications},
  volume  = {21},
  number  = {2},
  pages   = {300--315},
  year    = {1976}
}

@book{dembozeitouni1998ldp,
  author    = {Dembo, Amir and Zeitouni, Ofer},
  title     = {Large Deviations Techniques and Applications},
  edition   = {2nd},
  series    = {Applications of Mathematics},
  number    = {38},
  publisher = {Springer},
  year      = {1998}
}

@book{deuschelstroock1989large,
  author    = {Deuschel, Jean-Dominique and Stroock, Daniel W.},
  title     = {Large Deviations},
  series    = {Pure and Applied Mathematics},
  number    = {137},
  publisher = {Academic Press},
  year      = {1989}
}

@book{freidlinwentzell2012random,
  author    = {Freidlin, Mark I. and Wentzell, Alexander D.},
  title     = {Random Perturbations of Dynamical Systems},
  edition   = {3rd},
  series    = {Grundlehren der mathematischen Wissenschaften},
  number    = {260},
  publisher = {Springer},
  year      = {2012}
}

Prerequisites

37.07.02
37.07.09
02.15.01

Tier anchors

beginner: Touchette 2009 *The large deviation approach to statistical mechanics* (Physics Reports 478) §3-§4 (small-parameter rare events and the action picture); den Hollander 2000 *Large Deviations* (AMS Fields Institute Monographs) §VI (the small-noise heuristic for paths, the energy as cost)
intermediate: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §5.2 (Theorem 5.2.3 Schilder), §5.1 (Theorem 5.1.2 Mogulskii); Deuschel & Stroock 1989 *Large Deviations* (Academic Press) §1.3-§3.4 (Schilder via the Cameron-Martin space)
master: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §5.1-§5.2 (Mogulskii, Schilder; the Cameron-Martin rate, exponential tightness in sup norm) and §5.6 (Freidlin-Wentzell); Deuschel & Stroock 1989 *Large Deviations* §1.3-§3.4; Freidlin & Wentzell 2012 *Random Perturbations of Dynamical Systems* 3rd ed. (Springer) §3 (the contraction from Schilder to diffusions)

References

Dembo, A. & Zeitouni, O. — Large Deviations Techniques and Applications, 2nd ed. (Springer, 1998) · §5.1 (Theorem 5.1.2 Mogulskii); §5.2 (Theorem 5.2.3 Schilder; the Cameron-Martin rate I(f)=½∫|ḟ|²); §5.6.1 (Theorem 5.6.3 Freidlin-Wentzell via contraction)
Schilder, M. — Some asymptotic formulas for Wiener integrals · Transactions of the American Mathematical Society 125 (1966), 63-85 (the original small-noise Wiener LDP)
Mogulskii, A. A. — Large deviations for trajectories of multidimensional random walks · Theory of Probability and its Applications 21 (1976), 300-315 (the random-walk sample-path LDP with rate ∫Λ*(ḟ))
Deuschel, J.-D. & Stroock, D. W. — Large Deviations (Academic Press, 1989) · §1.3-§3.4 (Schilder via Cameron-Martin; exponential tightness in C_0[0,1])
Freidlin, M. I. & Wentzell, A. D. — Random Perturbations of Dynamical Systems, 3rd ed. (Springer, 2012) · §3 (the Freidlin-Wentzell rate S(φ)=½∫|φ̇-b(φ)|² obtained by contracting Schilder along the Itô map)
Cameron, R. H. & Martin, W. T. — Transformations of Wiener integrals under translations · Annals of Mathematics 45 (1944), 386-396 (the Cameron-Martin space H^1_0 and the admissible-shift quadratic form)
Varadhan, S. R. S. — Asymptotic probabilities and differential equations · Communications on Pure and Applied Mathematics 19 (1966), 261-286 (the Wiener small-noise principle and the variational rate)

Estimated time

beginner: 18m
intermediate: 45m
master: 80m