37.07.01 · probability / 07-large-deviations

The Large Deviation Principle: Rate Functions, Bounds, and Goodness

shipped · 3 tiers · Lean: none

Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §1.2, §4.1 (Lemmas 1.2.18, 4.1.4-4.1.18, uniqueness, exponential tightness); Deuschel & Stroock 1989 *Large Deviations* (Academic Press) §2.1; Varadhan 1984 *Large Deviations and Applications* (SIAM CBMS-NSF 46)

Intuition Beginner

Flip a fair coin a thousand times and the fraction of heads will sit very near one half. Could it instead land near $0.6$? Yes, but only with a probability so small it is hard to write down. The large deviation principle is the precise statement of how small: not just that rare events are unlikely, but that their unlikeliness shrinks at a clean, predictable exponential rate as the experiment grows.

Picture a whole family of random quantities indexed by a smallness parameter, say the average of $n$ coin flips, with the family getting sharper as $n$ grows. For each possible outcome value, there is a number that measures how costly that value is to reach. Outcomes near the typical answer cost nothing; outcomes far away cost a lot. The probability of landing near a costly value decays like $e^{-n \times \text{cost}}$. That cost-per-unit-size function is the rate function, and it is the central character of the whole theory.

A useful way to feel it: think of the rate function as the height of a landscape. The typical outcome sits at the bottom of a valley at height zero. To be seen at any other point, the random average must "climb" to that point's height, and the chance of being seen there falls off exponentially in the height times the size of the experiment. Watching where the landscape is low tells you where the random quantity likes to be; watching where it is high tells you which rare events are merely rare versus essentially impossible.

The principle comes in two halves, one for each direction of estimate. The first half says rare events are at most as likely as their cheapest point allows — an upper bound on probability. The second half says they are at least that likely — a matching lower bound. When the two halves agree, the rate function is pinned down exactly, and we say the family obeys a large deviation principle.

One last refinement matters. A rate function is called good when its low-lying regions are not just bounded but compact — the valleys never run off to infinity at a fixed height. Goodness is the technical comfort that makes the optimisation "find the cheapest point in a set" actually attain its minimum, which is what most applications quietly rely on.

Visual Beginner

Figure: a landscape-style plot. The horizontal axis is the value of a random average; the vertical axis is the rate function (the "cost"). The curve dips to height zero at the typical value and rises on both sides like a valley. A horizontal dashed line at height $α$ cuts the valley into a low region (the sublevel set) and two high flanks. For a "good" rate function the low region is a closed bounded interval; an inset shows a "bad" rate function whose valley flattens to a horizontal ray, so its low region runs off to infinity.

  cost I(x)
    |\                                   /
    | \                                 /
    |  \                               /
    |   \                             /
 a -+----\---------------------------/----   sublevel set {I <= a}
    |     \                         /         = the segment between the
    |      \_____               ___/            two crossings (compact
    |            \____     ____/                 when I is "good")
    |                \___ ___/
----+--------------------V------------------- x
    |              typical value (I = 0)

   good rate function: every horizontal cut leaves a compact (closed,
   bounded) low region.  bad rate function: some cut leaves a low region
   that escapes to infinity.

Worked example Beginner

Take the average of $n$ fair coin flips, scoring each flip $1$ for heads and $0$ for tails. The average $\bar{X}_{n}$ is the fraction of heads. We ask how the cost lands for a target fraction $x$ between $0$ and $1$.

Step 1. Name the cost. For fair coins the rate function turns out to be $$ I(x) = x\log(2x) + (1-x)\log\big(2(1-x)\big), $$ the "distance" of a biased coin showing fraction $x$ from a fair coin. We will read values off it; the derivation lives at the higher tiers.

Step 2. Check the typical value. Put $x = \frac{1}{2}$. Then $2x = 1$ and $2(1 - x) = 1$, so both logarithms are $\log 1 = 0$ and $I(\frac{1}{2}) = 0$. The typical fraction costs nothing, exactly as the valley-bottom picture demands.

Step 3. Price a rare value. Put $x = 0.6$ . Then $2 x = 1.2$ and $2 (1 - x) = 0.8$ : $$ I(0.6) = 0.6\log(1.2) + 0.4\log(0.8) = 0.6(0.1823) + 0.4(-0.2231) = 0.1094 - 0.0893 = 0.0201. $$

Step 4. Read off the probability scale. With $n = 1000$ flips, the chance of seeing roughly $60\%$ heads decays like $$ e^{-n\,I(0.6)} = e^{-1000 \times 0.0201} = e^{-20.1} \approx 1.9 \times 10^{-9}. $$

What this tells us. A modest-looking shift, from $50\%$ to $60\%$ heads, already costs about two parts in a billion at a thousand flips, and at ten thousand flips it would be $e^{-201}$, astronomically smaller. The rate function converts a vague "very unlikely" into an exact exponential rate, and that conversion is the entire point of the large deviation principle.
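The four steps above can be checked numerically. The following is a minimal sketch, not part of the original unit: the helper names `coin_rate` and `exact_tail_rate` are hypothetical, and the comparison with the exact binomial tail is an added illustration of how close the exponential rate already is at $n = 1000$.

```python
import math

def coin_rate(x: float) -> float:
    """Rate function I(x) = x log(2x) + (1-x) log(2(1-x)) for fair-coin averages."""
    def term(p: float) -> float:
        # Convention 0 * log 0 = 0 at the endpoints x = 0 and x = 1.
        return 0.0 if p == 0.0 else p * math.log(2.0 * p)
    return term(x) + term(1.0 - x)

def exact_tail_rate(n: int, k_min: int) -> float:
    """-(1/n) log P(at least k_min heads in n fair flips), from exact integer counts."""
    tail_count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return -(math.log(tail_count) - n * math.log(2.0)) / n

n = 1000
predicted = coin_rate(0.6)          # the cost read off in Step 3
observed = exact_tail_rate(n, 600)  # the exact exponential rate at finite n

print(f"I(0.5)                      = {coin_rate(0.5):.4f}")
print(f"I(0.6)                      = {predicted:.4f}")
print(f"e^(-n I(0.6))               = {math.exp(-n * predicted):.2e}")
print(f"-(1/n) log P(>= 600 heads)  = {observed:.4f}")
```

The exact rate sits slightly above $I(0.6)$ at finite $n$, because the true probability carries a sub-exponential prefactor that the rate function deliberately ignores.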


Formal definition Intermediate+

Throughout, $\mathcal{X}$ is a topological space equipped with its Borel $\sigma$-algebra $\mathcal{B}_{\mathcal{X}}$ (regular and Hausdorff where uniqueness is asserted), and $\{\mu_\varepsilon\}_{\varepsilon > 0}$ is a family of Borel probability measures on $\mathcal{X}$, carried on a common probability space 37.01.01. A speed is a family $\{a_\varepsilon\}_{\varepsilon > 0}$ of positive reals with $a_\varepsilon \to 0$ as $\varepsilon \to 0$ (in the index-by-$n$ convention one takes $\varepsilon = 1/n$ and $a_\varepsilon = 1/n$, so $a_\varepsilon \log \mu_\varepsilon$ becomes $\frac{1}{n} \log \mathbb{P}$). The role of $a_\varepsilon$ is to be the reciprocal of the size of the experiment.

Definition (rate function). A rate function is a lower-semicontinuous map $I : \mathcal{X} \to [0, +\infty]$, not identically $+\infty$; equivalently, every sublevel set $$ \Psi_I(\alpha) := \{x \in \mathcal{X} : I(x) \leq \alpha\}, \qquad \alpha \geq 0, $$ is closed. A rate function is good if in addition every $\Psi_I(\alpha)$ is compact. The effective domain is $D_I := \{x : I(x) < \infty\}$.
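To make the sublevel sets concrete, here is a minimal sketch that computes $\Psi_I(\alpha)$ for the fair-coin rate function of the Beginner worked example. The helper names `coin_rate` and `sublevel_interval` are hypothetical, and the bisection relies on two facts specific to that example: $I$ is increasing on $[\frac12, 1]$ and symmetric about $\frac12$.

```python
import math

def coin_rate(x: float) -> float:
    """Fair-coin rate function I(x) = x log(2x) + (1-x) log(2(1-x)) on [0, 1]."""
    def term(p: float) -> float:
        return 0.0 if p == 0.0 else p * math.log(2.0 * p)
    return term(x) + term(1.0 - x)

def sublevel_interval(alpha: float, tol: float = 1e-12) -> tuple[float, float]:
    """Return (lo, hi) with {x in [0, 1] : I(x) <= alpha} = [lo, hi]."""
    if alpha >= math.log(2.0):
        # I(0) = I(1) = log 2 is the largest value I takes on [0, 1].
        return 0.0, 1.0
    lo, hi = 0.5, 1.0  # invariant: I(lo) <= alpha < I(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if coin_rate(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    right = lo
    return 1.0 - right, right  # symmetry I(x) = I(1 - x)

alpha = coin_rate(0.6)
left, right = sublevel_interval(alpha)
print(f"alpha = {alpha:.4f}: sublevel set = [{left:.6f}, {right:.6f}]")
```

Every sublevel set here is a closed bounded interval, so this particular rate function is good.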

Definition (large deviation principle). The family $\{\mu_\varepsilon\}$ satisfies a large deviation principle (LDP) at speed $a_\varepsilon$ with rate function $I$ if, for every Borel set $\Gamma$, writing $\Gamma^\circ$ for its interior and $\overline{\Gamma}$ for its closure, $$ -\inf_{x \in \Gamma^\circ} I(x) \;\;\leq\;\; \liminf_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(\Gamma) \;\;\leq\;\; \limsup_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(\Gamma) \;\;\leq\;\; -\inf_{x \in \overline{\Gamma}} I(x). $$ Unpacked, the LDP is the conjunction of two bounds ^{[Dembo & Zeitouni §1.2]}:

(Lower bound: open sets.) For every open $G \subseteq \mathcal{X}$,
$$ \liminf_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(G) \;\geq\; -\inf_{x \in G} I(x). $$
(Upper bound: closed sets.) For every closed $F \subseteq \mathcal{X}$,
$$ \limsup_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(F) \;\leq\; -\inf_{x \in F} I(x). $$

The convention $\inf_{\emptyset} I = +\infty$ makes both bounds vacuously hold on the empty set. For a set $\Gamma$ that is an $I$-continuity set, meaning $\inf_{\Gamma^\circ} I = \inf_{\overline{\Gamma}} I =: I(\Gamma)$, the two bounds collapse to the exact exponential rate $\lim_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(\Gamma) = -I(\Gamma)$.
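The collapse to an exact rate on an $I$-continuity set can be watched numerically. The sketch below assumes, for illustration only, the Gaussian family $\mu_\varepsilon = N(0, \varepsilon)$ at speed $a_\varepsilon = \varepsilon$ with rate function $I(x) = x^2/2$ (a standard example, not one taken from this unit), and the set $\Gamma = [1, \infty)$, for which $\inf_{\Gamma^\circ} I = \inf_{\overline{\Gamma}} I = \frac12$. The function name `scaled_log_tail` is hypothetical.

```python
import math

def scaled_log_tail(eps: float, a: float = 1.0) -> float:
    """eps * log mu_eps([a, inf)) for mu_eps = N(0, eps), via the complementary error function."""
    # P(X >= a) with X ~ N(0, eps) equals 0.5 * erfc(a / sqrt(2 * eps)).
    tail = 0.5 * math.erfc(a / math.sqrt(2.0 * eps))
    return eps * math.log(tail)

# As eps shrinks, the scaled log-probability should approach -I(1) = -0.5.
for eps in (0.1, 0.01, 0.001):
    print(f"eps = {eps:<6} eps * log mu_eps([1, inf)) = {scaled_log_tail(eps):.4f}")
```

The values approach $-\frac12$ from below: the gap is the sub-exponential prefactor of the Gaussian tail, which vanishes after multiplication by $\varepsilon$.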

Definition (weak LDP). The family satisfies a weak LDP if the lower bound holds for all open sets and the upper bound holds only for all compact sets $K$: $$ \limsup_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon(K) \;\leq\; -\inf_{x \in K} I(x), \qquad K \text{ compact}. $$ The weak LDP is genuinely weaker: it controls probabilities of escape only into compact regions, saying nothing about mass that leaks to infinity. The device that closes this gap is the following.

Definition (exponential tightness). The family $\{\mu_\varepsilon\}$ is exponentially tight if for every $M < \infty$ there is a compact $K_M \subseteq \mathcal{X}$ with $$ \limsup_{\varepsilon \to 0} a_\varepsilon \log \mu_\varepsilon\big(K_M^{c}\big) \;\leq\; -M. $$ Exponential tightness is the large-deviation analogue of ordinary tightness: ordinary tightness controls the mass outside compacts, exponential tightness controls its exponential rate. It is the bridge that upgrades a weak LDP to a full LDP, and it forces the rate function to be good.

Counterexamples to common slips

The open/closed asymmetry is not cosmetic. Take $\mathcal{X} = \mathbb{R}$ and the deterministic family $\mu_\varepsilon = \delta_0$ for all $\varepsilon$, with $I(0) = 0$ and $I(x) = +\infty$ otherwise. The closed set $F = \{0\}$ has $\mu_\varepsilon(F) = 1$ and $-\inf_F I = 0$: the upper bound is tight. For the open set $G = (0, 1)$, $\mu_\varepsilon(G) = 0$, so $a_\varepsilon \log \mu_\varepsilon(G) = -\infty = -\inf_G I$ and the lower bound holds with equality. Had the lower bound been read on the closure $\overline{G} = [0, 1]$ instead, it would assert $\liminf_\varepsilon a_\varepsilon \log \mu_\varepsilon(G) \geq -\inf_{\overline{G}} I = 0$, which is false. The two bounds must be read on the correct topological side.
Lower semicontinuity is mandatory; a non-lsc "rate function" is not unique. If one drops lower semicontinuity, the value of $I$ at a single non-isolated point can be raised without changing any $\inf_G I$ over open $G$, so the bounds cannot pin $I$ down pointwise. For instance, with $\mu_\varepsilon = N(0, \varepsilon)$ and $I(x) = x^2/2$, resetting $I(1)$ to any larger value leaves both bounds intact, because the Gaussian puts no mass on a single point. Lower semicontinuity is exactly the regularity that the infima over shrinking open neighbourhoods recover, which is why uniqueness holds only within the class of lsc functions.
Good is strictly stronger than lsc on non-compact spaces. On $\mathcal{X} = \mathbb{R}$ the function $I(x) = 0$ for all $x$ is lsc but not good: $\Psi_I(0) = \mathbb{R}$ is closed but not compact. A rate function can be a perfectly valid lsc rate function while failing goodness, and then the variational problem $\inf_F I$ over a closed unbounded $F$ may not be attained.

Key theorem with proof Intermediate+

We prove the two structural pillars an LDP user relies on first: that the rate function is determined by the family (uniqueness), and that exponential tightness plus a weak LDP delivers the full LDP and goodness.

Theorem (uniqueness and the weak-to-full upgrade). Let $\{\mu_\varepsilon\}$ be Borel probability measures on a Hausdorff regular space $\mathcal{X}$ at speed $a_\varepsilon$.

(i) (Uniqueness.) If $\{\mu_\varepsilon\}$ satisfies the LDP with rate function $I$ and also with rate function $J$, and both are lower-semicontinuous, then $I = J$.

(ii) (Upgrade.) If $\{\mu_\varepsilon\}$ satisfies a weak LDP with rate function $I$ and is exponentially tight, then it satisfies the full LDP with the same $I$, and $I$ is good.

Proof of (i). Fix $x \in \mathcal{X}$. For any open neighbourhood $G \ni x$, the lower bound with rate $I$ (applied to $G$), the monotonicity $\mu_\varepsilon(G) \leq \mu_\varepsilon(\overline{G})$, and the upper bound with rate $J$ (applied to $\overline{G}$) give $$ -\inf_{G} I \;\leq\; \liminf_\varepsilon a_\varepsilon\log\mu_\varepsilon(G) \;\leq\; \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(\overline G) \;\leq\; -\inf_{\overline G} J. $$ Hence $\inf_{\overline{G}} J \leq \inf_G I \leq I(x)$ for every open $G \ni x$. By regularity, every open neighbourhood $U$ of $x$ contains the closure $\overline{G}$ of a smaller open neighbourhood $G \ni x$, so $\inf_{\overline{G}} J \geq \inf_U J$ and therefore $\sup_{G \ni x} \inf_{\overline{G}} J \geq \sup_{U \ni x} \inf_U J = J(x)$, the last equality being lower semicontinuity of $J$. Combining, $J(x) \leq I(x)$. The roles of $I$ and $J$ are symmetric, so $I(x) \leq J(x)$ as well, and $I = J$. $\square$

Proof of (ii). The lower bound is already part of the weak LDP, so only the upper bound on a general closed set $F$ needs proof, and goodness needs to be derived. Fix $M < \infty$ and choose the compact $K_M$ from exponential tightness. Decompose $F = (F \cap K_M) \cup (F \cap K_M^c)$. The set $F \cap K_M$ is compact (closed subset of a compact set), so the weak LDP applies to it: $$ \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(F\cap K_M) \;\leq\; -\inf_{F\cap K_M} I \;\leq\; -\inf_{F} I. $$ For the tail piece, $\mu_\varepsilon(F \cap K_M^c) \leq \mu_\varepsilon(K_M^c)$, so $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F \cap K_M^c) \leq -M$. Combining two exponential rates uses the elementary bound $a_\varepsilon \log(p + q) \leq a_\varepsilon \log 2 + \max\{a_\varepsilon \log p, a_\varepsilon \log q\}$, and $a_\varepsilon \log 2 \to 0$; therefore $$ \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(F) \;\leq\; \max\Big\{ -\inf_{F} I,\; -M\Big\}. $$ Letting $M \to \infty$ yields $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F) \leq -\inf_F I$, the full upper bound.

For goodness, fix $\alpha \geq 0$ and take $M > \alpha$ with its compact $K_M$. Since $\mathcal{X}$ is Hausdorff, $K_M$ is closed and $K_M^c$ is open, so the lower bound applies to it: $$ -\inf_{K_M^c} I \;\leq\; \liminf_\varepsilon a_\varepsilon\log\mu_\varepsilon(K_M^c) \;\leq\; \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(K_M^c) \;\leq\; -M. $$ Hence $\inf_{K_M^c} I \geq M > \alpha$, so every $x \notin K_M$ has $I(x) > \alpha$. Contraposing, $\Psi_I(\alpha) = \{I \leq \alpha\} \subseteq K_M$. Being a closed subset (lsc) of the compact $K_M$, the sublevel set $\Psi_I(\alpha)$ is compact. Hence $I$ is good. $\square$

Bridge. This theorem builds toward every concrete large-deviation result and appears again in the proof of Cramér's theorem, where one first establishes the weak LDP from the Chernoff bound and a local lower bound, then invokes exponential tightness from the finiteness of the cumulant generating function near the origin to obtain the full principle. This is exactly the mechanism by which the abstract axioms become usable: the weak LDP is what the Chernoff/tilting estimates naturally produce, and exponential tightness is the separate, geometric input that compactifies the problem. The foundational reason uniqueness holds is that lower semicontinuity makes a rate function the upper envelope of its values over shrinking open neighbourhoods, so the open-set lower bound and the closed-set upper bound between them recover $I$ pointwise; putting these together with the upgrade shows that exponential tightness is precisely the hypothesis that generalises the compact-set control of the weak LDP to all closed sets and simultaneously forces goodness. The weak-LDP-plus-tightness route is dual to the projective-limit (Dawson-Gärtner) route, which builds the same full LDP from finite-dimensional marginals instead of from a tightness estimate.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove that if $I$ is a good rate function then for every closed set $F$ with $\inf_F I < \infty$ the infimum is attained: there exists $x_* \in F$ with $I(x_*) = \inf_F I$.

Hint

Intersect $F$ with a sublevel set $\Psi_I(\alpha)$ for $\alpha$ slightly above $\inf_F I$ and use compactness plus lower semicontinuity.

Answer

Let $m = \inf_F I < \infty$ and fix any $\alpha > m$. The set $F \cap \Psi_I(\alpha)$ is the intersection of a closed set with the compact sublevel set $\Psi_I(\alpha)$, hence compact, and it is non-empty because $m < \alpha$ forces some point of $F$ to have $I < \alpha$. The restriction of the lsc function $I$ to a compact set attains its minimum (a lsc function on a compact set is bounded below and attains its infimum). That minimiser $x_*$ has $I(x_*) = \min_{F \cap \Psi_I(\alpha)} I = m$, since every point of $F$ with value below $\alpha$ already lies in $\Psi_I(\alpha)$, so the infimum over $F$ and over $F \cap \Psi_I(\alpha)$ coincide. Goodness is exactly what makes the relevant set compact; without it the infimum can fail to be attained.

Exercise 4 (medium, symbolic).

Establish the finite-union (max) rule: if $F = F_1 \cup \dots \cup F_k$ is a finite union of sets, then $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F) = \max_j \limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F_j)$. Why does this fail for countable unions?

Hint

Use $\max_j \mu_\varepsilon(F_j) \leq \mu_\varepsilon(F) \leq \sum_j \mu_\varepsilon(F_j) \leq k \max_j \mu_\varepsilon(F_j)$ and $a_\varepsilon \log k \to 0$.

Answer

From $\max_j \mu_\varepsilon(F_j) \leq \mu_\varepsilon(F) \leq k \max_j \mu_\varepsilon(F_j)$, take $a_\varepsilon \log$: the lower side gives $a_\varepsilon \log \mu_\varepsilon(F) \geq \max_j a_\varepsilon \log \mu_\varepsilon(F_j)$ and the upper side gives $a_\varepsilon \log \mu_\varepsilon(F) \leq a_\varepsilon \log k + \max_j a_\varepsilon \log \mu_\varepsilon(F_j)$. Since $a_\varepsilon \log k \to 0$ and the $\limsup$ of a maximum of finitely many terms is the maximum of their $\limsup$s, both sides give $\max_j \limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F_j)$. This is the "principle of the largest term": on the exponential scale a finite sum is dominated by its biggest summand. It fails for countable unions because the vanishing prefactor $a_\varepsilon \log k$ is no longer available: a countable union of negligible events can carry full mass (e.g. $\mathbb{R} = \bigcup_{n \in \mathbb{Z}} [n, n + 1]$), which is precisely why an upper bound known only on compact sets extends to general closed sets only via exponential tightness.
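The principle of the largest term is easy to see numerically. In this minimal sketch the probabilities are taken, purely as an assumption for illustration, to be exact exponentials $p_j = e^{-r_j/\varepsilon}$ with hypothetical rates $r_j$ at speed $a_\varepsilon = \varepsilon$; the function name `scaled_log_sum` is likewise hypothetical. The sum is evaluated in log-space to avoid underflow.

```python
import math

def scaled_log_sum(rates: list[float], eps: float) -> float:
    """eps * log(sum_j exp(-rates[j] / eps)), evaluated stably with log-sum-exp."""
    logs = [-r / eps for r in rates]
    top = max(logs)
    # Factor out the largest term so every exponent below is <= 0.
    return eps * (top + math.log(sum(math.exp(v - top) for v in logs)))

rates = [1.0, 2.0, 3.0]   # hypothetical exponential rates r_j
largest = -min(rates)     # max_j of eps * log p_j, independent of eps
for eps in (1.0, 0.1, 0.01):
    value = scaled_log_sum(rates, eps)
    upper = largest + eps * math.log(len(rates))  # a_eps log k + max_j ...
    print(f"eps = {eps:<5} value = {value:.6f}  bounds = [{largest:.6f}, {upper:.6f}]")
```

The sandwich from the hint is visible directly: the value always lies between the largest term and the largest term plus $\varepsilon \log k$, and the gap closes as $\varepsilon \to 0$.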

Exercise 5 (medium, symbolic).

Show that exponential tightness implies goodness of any rate function $I$ governing a weak LDP, without re-proving the full upgrade: isolate just the compactness argument.

Hint

For $\alpha < M$, show $\Psi_I(\alpha) \subseteq K_M$ by applying the open-set lower bound to $K_M^c$.

Answer

Fix $\alpha$ and choose $M > \alpha$ with compact $K_M$ from exponential tightness. In a Hausdorff space $K_M$ is closed, so $K_M^c$ is open and the lower bound of the weak LDP applies to it; no closed-set upper bound is needed. Together with exponential tightness this gives $-\inf_{K_M^c} I \leq \liminf_\varepsilon a_\varepsilon \log \mu_\varepsilon(K_M^c) \leq \limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(K_M^c) \leq -M$. Hence every $x \notin K_M$ has $I(x) \geq \inf_{K_M^c} I \geq M > \alpha$, so $x \notin \Psi_I(\alpha)$. Thus $\Psi_I(\alpha) \subseteq K_M$ is a closed subset of a compact set, hence compact, and $I$ is good. The content is that exponential tightness pins the low-cost region inside a fixed compact at each level.

Exercise 6 (hard, symbolic).

Prove the Cramér instance in one dimension for bounded variables: let $X_1, X_2, \dots$ be i.i.d. real with $|X_i| \leq c$ and mean $0$, $\Lambda(\lambda) = \log \mathbb{E}\,e^{\lambda X_1}$. Show the upper bound $\limsup_n \frac{1}{n} \log \mathbb{P}(\bar{X}_n \geq a) \leq -\Lambda^*(a)$ for $a > 0$, where $\Lambda^*(a) = \sup_\lambda (\lambda a - \Lambda(\lambda))$.

Hint

Apply Markov's inequality to $e^{\lambda n \bar{X}_n}$ with $\lambda \geq 0$, use independence, then optimise over $\lambda$.

Answer

For $\lambda \geq 0$, Markov's inequality on the non-negative variable $e^{\lambda \sum_i X_i}$ gives $$ \mathbb{P}(\bar X_n \ge a) = \mathbb{P}\Big(\sum_i X_i \ge na\Big) \le e^{-\lambda n a}\,\mathbb{E}\,e^{\lambda\sum_i X_i} = e^{-\lambda na}\big(\mathbb{E}\,e^{\lambda X_1}\big)^n = e^{-n(\lambda a - \Lambda(\lambda))}, $$ using independence to factor the joint exponential moment. Taking $\frac{1}{n} \log$ gives $\frac{1}{n} \log \mathbb{P}(\bar{X}_n \geq a) \leq -(\lambda a - \Lambda(\lambda))$ for every $\lambda \geq 0$. Optimising the right side over $\lambda \geq 0$ yields $-\sup_{\lambda \geq 0} (\lambda a - \Lambda(\lambda))$. For $a > 0 = \mathbb{E} X_1$ this constrained supremum equals the unconstrained one $\Lambda^*(a)$: by Jensen's inequality $\Lambda(\lambda) \geq \lambda\,\mathbb{E} X_1 = 0$, so for $\lambda < 0$ we have $\lambda a - \Lambda(\lambda) \leq \lambda a < 0 = 0 \cdot a - \Lambda(0)$, and negative $\lambda$ never helps. Hence $\limsup_n \frac{1}{n} \log \mathbb{P}(\bar{X}_n \geq a) \leq -\Lambda^*(a)$. This is the Chernoff half of Cramér's theorem; boundedness guarantees $\Lambda(\lambda) < \infty$ for all $\lambda$, so $\Lambda^*$ is a good rate function.
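The Chernoff bound above can be checked on a concrete bounded, mean-zero example. The sketch below assumes Rademacher signs ($X_i = \pm 1$ with probability $\frac12$, so $c = 1$ and $\Lambda(\lambda) = \log\cosh\lambda$), a choice made here for illustration and not taken from the exercise. The function names are hypothetical, and the ternary search relies on the concavity of $\lambda \mapsto \lambda a - \Lambda(\lambda)$.

```python
import math

def cgf(lam: float) -> float:
    """Lambda(lam) = log E exp(lam * X) for a Rademacher sign X."""
    return math.log(math.cosh(lam))

def legendre(a: float, lam_max: float = 20.0, iters: int = 200) -> float:
    """sup over 0 <= lam <= lam_max of lam * a - cgf(lam), by ternary search."""
    lo, hi = 0.0, lam_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if m1 * a - cgf(m1) < m2 * a - cgf(m2):
            lo = m1
        else:
            hi = m2
    lam = 0.5 * (lo + hi)
    return lam * a - cgf(lam)

def closed_form(a: float) -> float:
    """Known conjugate of log cosh for |a| < 1, attained at lam = atanh(a)."""
    return 0.5 * ((1 + a) * math.log(1 + a) + (1 - a) * math.log(1 - a))

def exact_tail(n: int, a: float) -> float:
    """P(mean of n Rademacher signs >= a), summed exactly."""
    # The sum equals 2H - n with H ~ Binomial(n, 1/2), so mean >= a iff H >= n(1 + a)/2.
    k_min = math.ceil(n * (1 + a) / 2 - 1e-9)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

n, a = 100, 0.2
rate = legendre(a)
print(f"numeric Lambda*(a) = {rate:.6f}, closed form = {closed_form(a):.6f}")
print(f"exact tail = {exact_tail(n, a):.4e} <= Chernoff bound = {math.exp(-n * rate):.4e}")
```

The bound holds at every finite $n$, not only in the limit, which is what makes the Chernoff half the easy direction of Cramér's theorem.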

Exercise 7 (hard, symbolic).

Let $\{\mu_\varepsilon\}$ on $\mathbb{R}$ satisfy the LDP with good rate $I$ at speed $\varepsilon$, and let $T : \mathbb{R} \to \mathbb{R}$ be continuous. Prove the contraction principle: the pushforwards $\{\mu_\varepsilon \circ T^{-1}\}$ satisfy the LDP with good rate $J(y) = \inf\{I(x) : T(x) = y\}$.

Hint

For the upper bound use that $T^{-1}(F)$ is closed; for the lower bound use that $T^{-1}(G)$ is open; for goodness use that continuous images of compact sublevel sets are compact.

Answer

First, $J$ is a good rate function. The infimum defining $J(y)$ is taken over the closed set $T^{-1}(\{y\})$ and is attained whenever it is finite (Exercise 3), so $\Psi_J(\alpha) = \{y : \exists x,\ T(x) = y,\ I(x) \leq \alpha\} = T(\Psi_I(\alpha))$, the continuous image of a compact set, hence compact and in particular closed. For a closed $F \subseteq \mathbb{R}$, $T^{-1}(F)$ is closed by continuity, so $$ \limsup_\varepsilon \varepsilon\log(\mu_\varepsilon\circ T^{-1})(F) = \limsup_\varepsilon \varepsilon\log\mu_\varepsilon(T^{-1}F) \le -\inf_{T^{-1}F} I = -\inf_{y\in F}\inf_{T(x)=y}I(x) = -\inf_F J. $$ For open $G$, $T^{-1}(G)$ is open, so $\liminf_\varepsilon \varepsilon \log (\mu_\varepsilon \circ T^{-1})(G) \geq -\inf_{T^{-1}G} I = -\inf_G J$ by the same rewriting of the double infimum. Both LDP bounds hold for $J$, which is good, completing the proof. The principle says a continuous deterministic readout of a large-deviation system again obeys an LDP, with cost the cheapest pre-image cost.
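A numerical illustration of the contraction principle, as a sketch under assumptions not made in the exercise: the Gaussian family $\mu_\varepsilon = N(0, \varepsilon)$ with $I(x) = x^2/2$ and the continuous map $T(x) = x^2$, for which $J(y) = y/2$ when $y \geq 0$ and $+\infty$ otherwise. The function names are hypothetical.

```python
import math

def contracted_rate(y: float) -> float:
    """J(y) = inf{x^2 / 2 : x^2 = y}, i.e. y / 2 for y >= 0 and +inf otherwise."""
    return 0.5 * y if y >= 0 else math.inf

def scaled_log_pushforward_tail(eps: float, y: float) -> float:
    """eps * log P(T(X) >= y) for X ~ N(0, eps), T(x) = x**2 and y > 0."""
    # P(X^2 >= y) = P(|X| >= sqrt(y)) = erfc(sqrt(y / (2 * eps))).
    return eps * math.log(math.erfc(math.sqrt(y / (2.0 * eps))))

y = 2.0
# J is increasing on [0, inf), so inf of J over [y, inf) is J(y).
for eps in (0.1, 0.01, 0.002):
    print(f"eps = {eps:<6} scaled log-prob = {scaled_log_pushforward_tail(eps, y):.4f}"
          f"   -J(y) = {-contracted_rate(y):.4f}")
```

The scaled log-probabilities of the pushforward approach $-J(y)$, the cost of the cheapest pre-image.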

Exercise 8 (hard, symbolic).

Give a family $\{\mu_\varepsilon\}$ on $\mathbb{R}$ that satisfies a weak LDP but not the full LDP, exhibiting the failure of exponential tightness.

Hint

Let mass escape to $+\infty$ at a rate that no compact set captures, while the local picture near each point still matches a rate function.

Answer

Let $\mu_\varepsilon = (1 - e^{-1/\varepsilon})\,\delta_0 + e^{-1/\varepsilon}\,\delta_{1/\varepsilon}$ at speed $a_\varepsilon = \varepsilon$. For any fixed compact $K$ and all sufficiently small $\varepsilon$, the atom at $1/\varepsilon$ has left $K$, so $\mu_\varepsilon(K) = (1 - e^{-1/\varepsilon})\,\mathbf{1}_{0 \in K} \to \mathbf{1}_{0 \in K}$; the compact upper bound and the open lower bound therefore hold with $I(0) = 0$, $I(x) = +\infty$ for $x \neq 0$, giving a weak LDP with this $I$. But the full upper bound fails. For $r > 0$ take the closed set $F_r = [r, \infty)$: a full LDP with rate $I$ would require $\limsup_\varepsilon \varepsilon \log \mu_\varepsilon(F_r) \leq -\inf_{F_r} I = -\infty$. Yet for $r$ fixed and $\varepsilon \leq 1/r$, $1/\varepsilon \in F_r$, so $\mu_\varepsilon(F_r) = e^{-1/\varepsilon}$ and $\varepsilon \log \mu_\varepsilon(F_r) = -1$, not $-\infty$. The escaping atom carries rate $-1$ to infinity, defeating any compact capture: exponential tightness fails (no $K_M$ works for $M > 1$), and with it the full LDP.
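The escaping atom can be tracked directly. This is a minimal sketch of the computation in the answer; the function name `scaled_log_mass_tail` is hypothetical, and the log-mass is computed symbolically (as $-1/\varepsilon$) so that no floating-point underflow occurs.

```python
import math

def scaled_log_mass_tail(eps: float, r: float) -> float:
    """eps * log mu_eps([r, inf)) for mu_eps = (1 - e^{-1/eps}) delta_0 + e^{-1/eps} delta_{1/eps}, r > 0."""
    if 1.0 / eps >= r:
        # Only the escaping atom at 1/eps lies in [r, inf); its log-mass is -1/eps.
        return eps * (-1.0 / eps)
    # Neither atom lies in [r, inf): the set has mass zero.
    return -math.inf

eps = 1e-4
for r in (1.0, 10.0, 1000.0):
    print(f"r = {r:<7} eps * log mu_eps([r, inf)) = {scaled_log_mass_tail(eps, r)}")
```

However large $r$ is chosen, the scaled log-mass outside $[-r, r]$ returns to $-1$ once $\varepsilon \leq 1/r$, so no compact set achieves a rate better than $-1$.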

Advanced results Master

The rate function as a generating object: Varadhan's lemma

Once an LDP with good rate $I$ holds for $\{\mu_\varepsilon\}$ at speed $a_\varepsilon$, the rate function controls not only probabilities but exponential integrals. Varadhan's lemma ^{[Varadhan 1984]} states that for a continuous $\phi : \mathcal{X} \to \mathbb{R}$ satisfying the moment condition $\limsup_\varepsilon a_\varepsilon \log \int e^{\gamma \phi / a_\varepsilon}\,d\mu_\varepsilon < \infty$ for some $\gamma > 1$, $$ \lim_{\varepsilon\to0} a_\varepsilon \log \int_{\mathcal{X}} e^{\phi(x)/a_\varepsilon}\,\mu_\varepsilon(dx) \;=\; \sup_{x\in\mathcal{X}}\big(\phi(x) - I(x)\big). $$ This is the infinite-dimensional, probabilistic Laplace method: the integral is dominated by the point where the gain $\phi$ most exceeds the cost $I$, a Legendre-Fenchel-type pairing of the functional $\phi$ against the rate function. The good-rate-function hypothesis is what makes the supremum attained and the upper estimate uniform over level sets of $I$.
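Varadhan's lemma can be tested numerically in one dimension. The sketch below assumes, for illustration only, the Gaussian family $\mu_\varepsilon = N(0, \varepsilon)$ with $I(x) = x^2/2$ at speed $\varepsilon$ and the bounded continuous test function $\phi = \sin$; the function names, the truncation to $[-6, 6]$, and the grid step are all choices made here, not part of the lemma.

```python
import math

def grid(half_width: float = 6.0, step: float = 1e-3) -> list[float]:
    """Uniform grid on [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    return [-half_width + i * step for i in range(n + 1)]

def varadhan_lhs(eps: float, phi, step: float = 1e-3) -> float:
    """eps * log of the integral of exp(phi/eps) against N(0, eps), as a Riemann sum in log-space."""
    exponents = [(phi(x) - 0.5 * x * x) / eps for x in grid(step=step)]
    top = max(exponents)
    total = sum(math.exp(e - top) for e in exponents)  # every exponent is <= 0 here
    # Restore the factored-out peak, the grid step, and the Gaussian normalisation.
    log_integral = top + math.log(total * step) - 0.5 * math.log(2 * math.pi * eps)
    return eps * log_integral

def varadhan_rhs(phi, step: float = 1e-3) -> float:
    """sup_x (phi(x) - x^2 / 2), approximated by the maximum over the same grid."""
    return max(phi(x) - 0.5 * x * x for x in grid(step=step))

rhs = varadhan_rhs(math.sin)
for eps in (0.1, 0.01, 0.001):
    print(f"eps = {eps:<6} lhs = {varadhan_lhs(eps, math.sin):.5f}   sup(phi - I) = {rhs:.5f}")
```

The left side converges to the supremum at rate $O(\varepsilon)$; the residual is the usual Laplace-method prefactor, which is invisible on the exponential scale.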

Inverse Varadhan and Bryc's lemma

The implication reverses under exponential tightness. Bryc's lemma states that if $\{\mu_\varepsilon\}$ is exponentially tight and the limit $\Lambda(\phi) := \lim_\varepsilon a_\varepsilon \log \int e^{\phi / a_\varepsilon}\,d\mu_\varepsilon$ exists for every bounded continuous $\phi$, then $\{\mu_\varepsilon\}$ satisfies the LDP with good rate function $$ I(x) = \sup_{\phi\in C_b(\mathcal{X})}\big(\phi(x) - \Lambda(\phi)\big), $$ the Legendre-Fenchel transform of $\Lambda$ over the space of bounded continuous test functions. Thus the LDP and the convergence of all exponential moments are equivalent data, modulo exponential tightness, and the rate function is recovered as a conjugate, the abstract shadow of the cumulant-conjugate identity of the Cramér theory.

Uniqueness sharpened: the role of the topology

Uniqueness of the rate function (Key theorem (i)) is a statement about a fixed topology on $X$ . Strengthening the topology enlarges the class of open sets, hence tightens the lower bound, and can increase the rate function pointwise; weakening it can decrease it. A family may satisfy an LDP in the weak topology with rate $I_{w}$ and in the strong topology with rate $I_{s} \geq I_{w}$ , the two agreeing on a common core but differing at the boundary of the effective domain. This is the precise sense in which "the" rate function is topology-relative, and it is why Sanov's theorem distinguishes the weak and $τ$ -topologies on the space of empirical measures.

Exponential tightness from exponential moment bounds

In practice exponential tightness is verified through a coercive functional. If there is a function $U : \mathcal{X} \to [0, \infty]$ with compact sublevel sets and $C := \limsup_\varepsilon a_\varepsilon \log \int e^{U / a_\varepsilon}\,d\mu_\varepsilon < \infty$, then $\{\mu_\varepsilon\}$ is exponentially tight: the set $K_M = \{U \leq c_M\}$ with $c_M = M + C$ is compact, and Markov's inequality on $e^{U / a_\varepsilon}$ gives $\mu_\varepsilon(K_M^c) \leq e^{-c_M / a_\varepsilon} \int e^{U / a_\varepsilon}\,d\mu_\varepsilon$, hence $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(K_M^c) \leq -c_M + C = -M$. For Cramér's theorem on $\mathbb{R}^d$, $U(x) = \delta \|x\|$ with $\delta > 0$ small works whenever $\Lambda$ is finite in a neighbourhood of $0$; for Schilder's theorem on path space the coercive functional is built from a Hölder-type modulus of continuity (the Cameron-Martin norm itself is almost surely infinite on Brownian paths), and exponential tightness is the Arzelà-Ascoli equicontinuity estimate transported to the exponential scale.

Schilder's theorem: a preview on path space

The canonical infinite-dimensional instance is Schilder's theorem ^{[Schilder 1966]}. For scaled Brownian motion $\sqrt{\varepsilon}\,W$ on $C_0([0, 1])$, the laws $\mu_\varepsilon = \mathrm{Law}(\sqrt{\varepsilon}\,W)$ satisfy an LDP at speed $\varepsilon$ with good rate function $$ I(f) = \tfrac12\int_0^1 |\dot f(t)|^2\,dt $$ on the Cameron-Martin space of absolutely continuous $f$ with $f(0) = 0$ and square-integrable derivative, and $I(f) = +\infty$ otherwise. The cost of a Brownian path of small noise following a prescribed shape $f$ is its kinetic-energy action, the same Onsager-Machlup action that reappears in the path-integral treatment of fluctuations. Goodness of $I$ here is the statement that bounded-action paths form a compact set in the uniform topology, an Arzelà-Ascoli fact dressed exponentially.
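The action functional is simple to evaluate for explicit paths. The following sketch discretises $I(f) = \frac12 \int_0^1 |\dot f|^2\,dt$ with forward differences on a uniform grid; the function name `discrete_action` and the two test paths are illustrative choices, not part of the theorem.

```python
def discrete_action(f, steps: int = 1000) -> float:
    """Approximate I(f) = 0.5 * integral over [0, 1] of f'(t)^2 dt using forward differences."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        slope = (f((i + 1) * h) - f(i * h)) / h  # finite-difference derivative on [t_i, t_{i+1}]
        total += 0.5 * slope * slope * h
    return total

straight = discrete_action(lambda t: t)      # exact action is 1/2
curved = discrete_action(lambda t: t * t)    # exact action is 0.5 * integral of 4 t^2 = 2/3
print(f"I(t)   ~ {straight:.6f}")
print(f"I(t^2) ~ {curved:.6f}")
```

Both paths start at $0$ and end at $1$, and the straight line is cheaper: by the Cauchy-Schwarz inequality $I(f) \geq \frac12 f(1)^2$, with equality only for constant speed, so the straight line is the least costly way for the small-noise path to reach a given endpoint.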

Synthesis. The central insight of this unit is that a single lower-semicontinuous cost function $I$ governs a whole family of exponential asymptotics, and the LDP axioms are the minimal bookkeeping that generalises the elementary Cramér exponent to arbitrary topological spaces. This is exactly why the weak LDP and exponential tightness are separated: the weak LDP is the local, tilting-driven content that appears again in every change-of-measure lower bound and Chernoff upper bound, while exponential tightness is the global compactness input that upgrades it and is dual to ordinary tightness in the theory of weak convergence. The foundational reason the rate function is unique and good is the interplay of lower semicontinuity with the open/closed asymmetry of the two bounds: lsc makes $I$ the upper envelope of neighbourhood infima, and exponential tightness pins its sublevel sets inside fixed compacta. Putting these together, Varadhan's lemma and its Bryc inverse show that the LDP, the rate function, and the convergence of all exponential moments are three encodings of one datum, with the rate function recovered as a Legendre-Fenchel conjugate — the bridge is the conjugacy that the next unit 37.07.03 makes explicit, where $I = Λ^{*}$ for the Cramér cumulant generating function $Λ$ .

Full proof set Master

Proposition 1 (lower semicontinuity is forced by the lower bound). Suppose the open-set lower bound holds for some function $I : \mathcal{X} \to [0, \infty]$. Then it also holds for the lower-semicontinuous regularisation $I_{\mathrm{lsc}}(x) := \sup_{G \ni x \text{ open}} \inf_{y \in G} I(y) \leq I(x)$, and $I_{\mathrm{lsc}}$ is lsc. Hence one may always take the rate function lower-semicontinuous.

Proof. For each open $G \ni x$ we have $\inf_G I \leq I(x)$, so $I_{\mathrm{lsc}}(x) \leq I(x)$ and hence $\inf_G I_{\mathrm{lsc}} \leq \inf_G I$ for every open $G$. Conversely, if $z \in G$ with $G$ open, then $G$ is itself an admissible neighbourhood of $z$, so $I_{\mathrm{lsc}}(z) \geq \inf_G I$; taking the infimum over $z \in G$ gives $\inf_G I_{\mathrm{lsc}} \geq \inf_G I$. Thus $\inf_G I_{\mathrm{lsc}} = \inf_G I$ for every open $G$, and the lower bound $\liminf_\varepsilon a_\varepsilon \log \mu_\varepsilon(G) \geq -\inf_G I = -\inf_G I_{\mathrm{lsc}}$ holds with $I_{\mathrm{lsc}}$ in place of $I$. Lower semicontinuity of $I_{\mathrm{lsc}}$: for $\alpha \in \mathbb{R}$ and $x$ with $I_{\mathrm{lsc}}(x) > \alpha$, by definition some open $G \ni x$ has $\inf_G I > \alpha$, whence $I_{\mathrm{lsc}}(z) \geq \inf_G I > \alpha$ for all $z \in G$; so $\{I_{\mathrm{lsc}} > \alpha\}$ is open and $\{I_{\mathrm{lsc}} \leq \alpha\}$ closed. $\square$

Proposition 2 (the upper bound on compacts gives the upper bound on all closed sets under exponential tightness). Let the weak LDP hold with rate $I$ and let $\{\mu_\varepsilon\}$ be exponentially tight. Then for every closed $F$, $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F) \leq -\inf_F I$.

Proof. This is the upgrade clause of the Key theorem; we record it as a standalone proposition with the combination step made explicit. Fix $M$ and the compact $K_M$. Then $F = (F \cap K_M) \cup (F \cap K_M^c)$ and by finite subadditivity $\mu_\varepsilon(F) \leq \mu_\varepsilon(F \cap K_M) + \mu_\varepsilon(K_M^c)$. For any two non-negative families $p_\varepsilon, q_\varepsilon$, $a_\varepsilon \log(p_\varepsilon + q_\varepsilon) \leq a_\varepsilon \log 2 + \max\{a_\varepsilon \log p_\varepsilon, a_\varepsilon \log q_\varepsilon\}$, and $a_\varepsilon \log 2 \to 0$. Applying this with $p_\varepsilon = \mu_\varepsilon(F \cap K_M)$, $q_\varepsilon = \mu_\varepsilon(K_M^c)$ and the compact upper bound $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(F \cap K_M) \leq -\inf_{F \cap K_M} I \leq -\inf_F I$ together with $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(K_M^c) \leq -M$, $$ \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(F) \le \max\{-\inf_F I,\; -M\}. $$ Take $M \uparrow \infty$. $\square$

Proposition 3 (the rate function vanishes somewhere when $\mu_\varepsilon$ are probability measures). If $\{\mu_\varepsilon\}$ are probability measures satisfying the LDP with rate $I$, then $\inf_{x \in \mathcal{X}} I(x) = 0$.

Proof. Apply the upper bound to the closed set $F = \mathcal{X}$: $\limsup_\varepsilon a_\varepsilon \log \mu_\varepsilon(\mathcal{X}) \leq -\inf_{\mathcal{X}} I$. The left side is $a_\varepsilon \log 1 = 0$, so $-\inf_{\mathcal{X}} I \geq 0$, i.e. $\inf_{\mathcal{X}} I \leq 0$; since $I \geq 0$, $\inf_{\mathcal{X}} I = 0$. If moreover $I$ is good, the infimum is attained at some $x_*$ with $I(x_*) = 0$, the large-deviation expression of the law of large numbers: the family concentrates exponentially on the zero set of $I$. $\square$

Connections Master

The rate function defined here is identified concretely as a Legendre-Fenchel conjugate in 37.07.03: for the Cramér family the rate function is $\Lambda^*$, the convex conjugate of the cumulant generating function $\Lambda(\lambda) = \log \mathbb{E}\,e^{\langle \lambda, X \rangle}$, and the goodness this unit demands abstractly is the compact-sublevel-set property that the convex-duality unit proves from finiteness of $\Lambda$ near the origin.
The exponential-tightness-plus-weak-LDP machinery is the engine of the Gärtner-Ellis theorem 37.07.04: that theorem produces a weak LDP from the differentiability and steepness of the limiting cumulant generating function, then closes to a full LDP exactly via the exponential tightness isolated here, so the bridge concept of this unit is the load-bearing hypothesis there.
The change-of-measure lower bound that powers every Cramér-type LDP is an exponential tilt $d\mu_\varepsilon^{\lambda} / d\mu_\varepsilon \propto e^{\langle \lambda, \cdot \rangle / a_\varepsilon}$, a Radon-Nikodym derivative in the sense of 02.07.08; absolute continuity of the tilted family with respect to the original is what licenses transferring probability estimates across the tilt.
The liminf/limsup bookkeeping of the two LDP bounds, and the passage of the lower bound through integrals in Varadhan's lemma, rest on the dominated-convergence and Fatou control of 02.07.05; the carrier probability space on which the whole family $\{\mu_\varepsilon\}$ is realised is the Kolmogorov construction of 37.01.01.

Historical & philosophical context Master

The exponential decay of rare-event probabilities was first computed by Harald Cramér in 1938 ^{[Cramér 1938]}, who found the rate for sums of i.i.d. variables under an analytic-density assumption and identified it as the conjugate of the cumulant generating function. Independent threads ran through Khinchin and the early statistical-mechanics literature on entropy, where the same exponential-of-an-extensive-quantity structure appears as $W \approx e^{S / k_{B}}$ . The abstract formulation — an LDP as a pair of bounds indexed by open and closed sets, governed by a single lower-semicontinuous rate function — is due to S. R. S. Varadhan in 1966 ^{[Varadhan 1966]}, who in the same period proved the integral lemma now bearing his name; this is the formulation systematised in Dembo and Zeitouni ^{[Dembo & Zeitouni §1.2]}.

The path-space instance was settled the same year by Schilder ^{[Schilder 1966]}, whose small-noise asymptotics for Wiener integrals gave the kinetic-action rate function and seeded the Freidlin-Wentzell theory of randomly perturbed dynamical systems. The notions of weak LDP and exponential tightness were isolated as the right abstraction by Deuschel and Stroock and by Dembo and Zeitouni, reflecting the recognition that the local (tilting) content of a large-deviation estimate and the global (compactness) content are logically separate, the second supplied by tightness exactly as in the Prokhorov theory of weak convergence. The goodness condition formalises the requirement, implicit since Cramér, that the variational problems $\inf_{F} I$ attached to an LDP actually attain their minima.

Bibliography Master

@book{dembozeitouni1998ldp,
  author    = {Dembo, Amir and Zeitouni, Ofer},
  title     = {Large Deviations Techniques and Applications},
  edition   = {2nd},
  series    = {Applications of Mathematics},
  number    = {38},
  publisher = {Springer},
  year      = {1998}
}

@book{varadhan1984large,
  author    = {Varadhan, S. R. S.},
  title     = {Large Deviations and Applications},
  series    = {CBMS-NSF Regional Conference Series in Applied Mathematics},
  number    = {46},
  publisher = {SIAM},
  year      = {1984}
}

@article{varadhan1966asymptotic,
  author  = {Varadhan, S. R. S.},
  title   = {Asymptotic probabilities and differential equations},
  journal = {Communications on Pure and Applied Mathematics},
  volume  = {19},
  pages   = {261--286},
  year    = {1966}
}

@article{schilder1966asymptotic,
  author  = {Schilder, M.},
  title   = {Some asymptotic formulas for {W}iener integrals},
  journal = {Transactions of the American Mathematical Society},
  volume  = {125},
  pages   = {63--85},
  year    = {1966}
}

@article{cramer1938nouveau,
  author  = {Cram\'er, Harald},
  title   = {Sur un nouveau th\'eor\`eme-limite de la th\'eorie des probabilit\'es},
  journal = {Actualit\'es Scientifiques et Industrielles},
  volume  = {736},
  pages   = {5--23},
  year    = {1938}
}

@book{deuschelstroock1989large,
  author    = {Deuschel, Jean-Dominique and Stroock, Daniel W.},
  title     = {Large Deviations},
  series    = {Pure and Applied Mathematics},
  number    = {137},
  publisher = {Academic Press},
  year      = {1989}
}

@book{denhollander2000large,
  author    = {den Hollander, Frank},
  title     = {Large Deviations},
  series    = {Fields Institute Monographs},
  number    = {14},
  publisher = {American Mathematical Society},
  year      = {2000}
}

Prerequisites

37.01.01
02.07.05
02.07.08

Tier anchors

beginner: Touchette 2009 *The large deviation approach to statistical mechanics* (Physics Reports 478) §3; Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §1.1 (informal statement, Cramér picture)
intermediate: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §1.2 (the LDP, rate functions, weak LDP, exponential tightness); den Hollander 2000 *Large Deviations* (AMS Fields Institute Monographs) §I.1-§I.3
master: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §1.2, §4.1 (Lemmas 1.2.18, 4.1.4-4.1.18, uniqueness, exponential tightness); Deuschel & Stroock 1989 *Large Deviations* (Academic Press) §2.1; Varadhan 1984 *Large Deviations and Applications* (SIAM CBMS-NSF 46)

References

Dembo, A. & Zeitouni, O. — Large Deviations Techniques and Applications, 2nd ed. (Springer, 1998) · §1.2 (Definition 1.2.1 the LDP; Lemma 1.2.18 uniqueness; Definition 1.2.7 exponential tightness; Lemma 1.2.15 weak-to-full upgrade); §4.1
Varadhan, S. R. S. — Large Deviations and Applications (SIAM CBMS-NSF Regional Conference Series 46, 1984) · §1-§3; the abstract LDP and Varadhan's integral lemma
den Hollander, F. — Large Deviations (AMS Fields Institute Monographs 14, 2000) · §I.1-§I.3 (LDP definition, Cramér's theorem, rate-function goodness)
Deuschel, J.-D. & Stroock, D. W. — Large Deviations (Academic Press, 1989) · §2.1 (the large deviation principle and its first properties)
Cramér, H. — Sur un nouveau théorème-limite de la théorie des probabilités · Actualités Scientifiques et Industrielles 736 (1938), 5-23
Varadhan, S. R. S. — Asymptotic probabilities and differential equations · Communications on Pure and Applied Mathematics 19 (1966), 261-286; the first abstract formulation of the LDP
Schilder, M. — Some asymptotic formulas for Wiener integrals · Transactions of the AMS 125 (1966), 63-85; small-noise LDP for Brownian motion

Estimated time

beginner: 17m
intermediate: 42m
master: 75m