37.07.08 · probability / 07-large-deviations

The Contraction Principle and the Inverse Contraction Principle

Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §4.2.1-§4.2.2 (Theorems 4.2.1, 4.2.4, 4.2.13 approximate contraction); Deuschel & Stroock 1989 *Large Deviations* (Academic Press) §2.1 (the contraction lemma); Varadhan 1984 *Large Deviations and Applications* (SIAM CBMS-NSF 46) §3

Intuition Beginner

Suppose you already know the exact exponential price of every rare outcome of some random system — its full cost landscape, the rate function from the large deviation principle. Now you pass that system through a deterministic machine: you only get to see a processed reading of it, not the raw state. The average of a sample instead of the whole sample; the total instead of the parts; a colour instead of the pixels. The question is whether the reading still has a clean cost landscape of its own, and what it looks like.

The contraction principle answers yes, and tells you exactly how to build the new landscape from the old one. The cost of a reading value is the cheapest cost among all raw states that produce that reading. If many different raw states map to the same output, you pick the one that was least surprising to begin with — nature takes the path of least resistance, so a rare reading is exactly as expensive as its cheapest explanation.

A picture helps. Imagine the cost landscape of the raw system as a hilly terrain, and the machine as a way of grouping points of the terrain into bins, one bin per possible reading. To find the cost of a bin, you do not average the heights in it; you find the lowest point in the bin. The new landscape over the readings is the "floor profile" of the old one, bin by bin. The name fits: the map contracts the rich raw landscape down onto a coarser one, and the rule for the new heights is to take the minimum over each fibre.

There is a second, subtler direction. Sometimes you know the landscape of the readings and want to recover or constrain the landscape upstairs. That reverse step is more delicate, because mass can secretly leak away to infinity without ever showing up in a reading; the inverse contraction principle says that as long as the system cannot lose mass to infinity at the exponential scale — a control called exponential tightness — the reading's landscape determines the raw one along the relevant directions.

The single most useful payoff is that one master result yields a family of others for free. Knowing the cost of the full data shape lets you read off the cost of the average, the maximum, the spread, or any continuous summary, each by the same minimise-over-the-fibre rule. This is how the large deviation principle for averages drops out of the large deviation principle for whole distributions, with no new probability calculation at all.

Visual Beginner

Figure: two stacked cost landscapes joined by a downward map. The top panel shows the rate function of the raw system as a wiggly curve over a wide horizontal axis (the raw states). A many-to-one arrow bundle points down to the bottom panel, whose horizontal axis is the smaller set of readings. Over each reading, several raw points are grouped; a short vertical tick marks the lowest of their heights. The bottom curve traces these lowest values — the contracted rate function — so each output's cost is the floor of its fibre, never the average.

  raw cost I(x)
    |   .                 .-.
    |  / \      .-.       /   \      .--.
    | /   \    /   \     /     \    /    \
    |/     \  /     \   /       \  /      \
    +---o----o---o----o----o-----o----o----  x  (raw states)
        |    |   |    |    |     |    |
        |    grouped into fibres by the map f
        v    v   v    v    v     v    v
    +------------------------------------- y  (readings)
    |     *           *          *
    |      \         / \        /          contracted cost
    |       \       /   \      /           I'(y) = min over the
    |        *-----*     *----*             fibre f^{-1}(y)
    new cost I'(y) = lowest height in each bundle, not the average

Worked example Beginner

Take a fair coin flipped many times, scoring heads $1$ and tails $0$ . The "raw" reading is the full pair of frequencies (fraction of heads, fraction of tails); its cost for a target heads-fraction $p$ is the relative entropy against a fair coin, $$ H(p) = p\log(2p) + (1-p)\log\big(2(1-p)\big). $$ The "machine" is the mean: it turns the frequency pair into a single number, the average score, which here equals the heads-fraction $p$ itself. We will see the contraction rule reproduce a cost we can check by hand.

Step 1. Identify the fibres. A reading is an average value $a$ between $0$ and $1$ . Because the average score equals the heads-fraction, each reading $a$ comes from exactly one frequency shape: $p = a$ . The fibre over $a$ is a single point.

Step 2. Apply the minimise-over-the-fibre rule. When a fibre is a single point, the minimum is just the value there. So the contracted cost of the average being $a$ is $$ I'(a) = \min_{p = a} H(p) = H(a). $$

Step 3. Price a specific reading. Take $a = 0.6$ . Then $2 a = 1.2$ and $2 (1 - a) = 0.8$ : $$ I'(0.6) = 0.6\log(1.2) + 0.4\log(0.8) = 0.6(0.1823) + 0.4(-0.2231) = 0.1094 - 0.0893 = 0.0201. $$

Step 4. Read off the probability scale. With $n = 1000$ flips the chance the average score lands near $0.6$ decays like $$ e^{-n\,I'(0.6)} = e^{-1000 \times 0.0201} = e^{-20.1} \approx 1.9 \times 10^{-9}. $$

What this tells us. The cost of the average came straight out of the cost of the full frequency shape by the contraction rule, with no new probability computation. Here the fibre was a single point so the rule was easy; the power of the principle shows when the machine is genuinely many-to-one, and the cheapest explanation in a whole fibre sets the price. That is exactly how the cost of a sample mean is extracted from the cost of a sample distribution.
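
The arithmetic in Steps 3 and 4 can be checked numerically. The sketch below is illustrative only: the helper names `coin_rate` and `log_binomial_pmf` are hypothetical, and comparing against the exact probability of exactly $0.6n$ heads is one convenient way (an assumption of this sketch, not part of the worked example) to watch $-\frac{1}{n}\log P$ approach the rate.

```python
import math

def coin_rate(a: float) -> float:
    """Relative entropy H(a) of heads-fraction a against a fair coin (natural log)."""
    total = 0.0
    for q in (a, 1.0 - a):
        if q > 0.0:  # convention: 0 * log(0) = 0 at the endpoints
            total += q * math.log(2.0 * q)
    return total

def log_binomial_pmf(n: int, k: int) -> float:
    """log P(exactly k heads in n fair flips), via log-gamma to avoid overflow."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose - n * math.log(2.0)

rate = coin_rate(0.6)
print(f"I'(0.6) = {rate:.4f}")
print(f"exp(-1000 * I'(0.6)) = {math.exp(-1000 * rate):.2e}")

# The exact probabilities carry a polynomial prefactor, so the agreement
# with the rate improves only slowly as n grows.
for n in (100, 1000, 10000):
    k = round(0.6 * n)
    print(f"n = {n:>5}: -(1/n) log P(S_n = {k}) = {-log_binomial_pmf(n, k) / n:.4f}")
```

The remaining gap at finite $n$ comes from the sub-exponential prefactor, which the large deviation scale ignores by design.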

Check your understanding Beginner

Exercise (easy, multiple choice).

A system has a known cost landscape (rate function) $I$ , and you observe it only through a continuous deterministic readout $f$ . The contraction principle says the cost of seeing reading value $y$ is:

A. the average of $I (x)$ over all raw states $x$ with $f (x) = y$
B. the maximum of $I (x)$ over all raw states $x$ with $f (x) = y$
C. the minimum of $I (x)$ over all raw states $x$ with $f (x) = y$
D. the value of $I$ at the single most typical raw state, regardless of $f$

Hint

A rare reading is as likely as its likeliest cause, so it is as expensive as its cheapest cause.

Answer

C. the minimum of $I (x)$ over the fibre $\{x : f (x) = y\}$ .

Feedback-correct: correct — nature realises a rare reading through its cheapest explanation, so the contracted cost is the infimum of the raw cost over the fibre. Feedback-wrong: it is not an average and not a maximum; the cheapest pre-image dominates because probabilities of the pre-images add on the natural scale and the largest one (smallest cost) wins.

Formal definition Intermediate+

Throughout, $X$ and $Y$ are Hausdorff topological spaces (regular where exponential tightness is invoked), each carrying its Borel $σ$ -algebra, and $\{μ_{ε}\}_{ε > 0}$ is a family of Borel probability measures on $X$ satisfying a large deviation principle 37.07.01 at speed $a_{ε} \to 0$ with rate function $I : X \to [0, + \infty]$ . For a measurable map $f : X \to Y$ the pushforward family is $f_{*} μ_{ε} := μ_{ε} \circ f^{- 1}$ , the law of $f$ under $μ_{ε}$ .

Definition (pushforward rate function). For a map $f : X \to Y$ and a rate function $I$ on $X$ , the pushforward rate function $f_{*} I : Y \to [0, + \infty]$ is $$ (f_*I)(y) \;:=\; \inf\{\,I(x) : x\in X,\ f(x)=y\,\}, \qquad y\in Y, $$ with the convention $\inf \emptyset = + \infty$ , so $(f_{*} I) (y) = + \infty$ when $y$ lies outside the range of $f$ . The set $f^{- 1} (\{y\})$ is the fibre over $y$ , and the defining quantity is the infimum of the raw cost over the fibre.
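
On a finite state space the definition reduces to a minimum per fibre, which can be sketched directly. Everything here is a toy assumption made for illustration: the function name `pushforward_rate`, the six-point state space, the cost $I(x) = (x-1)^2/2$ and the readout $f(x) = |x|$.

```python
import math

def pushforward_rate(rate, f, targets):
    """Fibrewise minimum of `rate` under the map f.

    `rate` maps each raw state x to its cost I(x); `targets` lists the readings
    to report. A reading with an empty fibre keeps the value +inf, matching
    the convention inf(empty set) = +inf.
    """
    out = {y: math.inf for y in targets}
    for x, cost in rate.items():
        y = f(x)
        out[y] = min(out.get(y, math.inf), cost)
    return out

# Toy raw landscape on the states -2, ..., 3 with its minimum at x = 1.
raw = {x: (x - 1) ** 2 / 2.0 for x in range(-2, 4)}

# The readout |x| glues x and -x together; each reading keeps the cheaper one.
contracted = pushforward_rate(raw, abs, targets=range(0, 5))
print(contracted)  # {0: 0.5, 1: 0.0, 2: 0.5, 3: 2.0, 4: inf}
```

Reading $2$ has the two pre-images $-2$ (cost $4.5$) and $2$ (cost $0.5$) and inherits the cheaper one, while reading $4$ has no pre-image and gets $+\infty$.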

Theorem (contraction principle). *If $\{μ_{ε}\}$ satisfies the LDP on $X$ at speed $a_{ε}$ with good rate function $I$ , and $f : X \to Y$ is continuous, then $\{f_{*} μ_{ε}\}$ satisfies the LDP on $Y$ at speed $a_{ε}$ with good rate function $f_{*} I$.* ^{[Dembo & Zeitouni §4.2.1]}

The hypothesis that $I$ is good is essential, not cosmetic: it is what makes the defining infimum attained on every fibre and the pushforward sublevel sets compact. With $I$ merely lower-semicontinuous, $f_{*} I$ may fail lower semicontinuity, and the transported bounds need not assemble into an LDP.

Definition (exponential tightness, recalled). The family $\{μ_{ε}\}$ on $X$ is exponentially tight 37.07.01 if for every $M < \infty$ there is a compact $K_{M} \subseteq X$ with $\limsup_{ε} a_{ε} \log μ_{ε} (K_{M}^{c}) \leq - M$ . This is the control that prevents mass escaping to infinity on the exponential scale.

Theorem (inverse contraction principle). *Let $f : X \to Y$ be continuous and injective, suppose $\{f_{*} μ_{ε}\}$ satisfies the LDP on $Y$ with good rate function $J$ , and suppose $\{μ_{ε}\}$ is exponentially tight on $X$ . Then $\{μ_{ε}\}$ satisfies the LDP on $X$ with the good rate function $I = J\circ f$.* ^{[Dembo & Zeitouni §4.2.1]}

The forward principle costs nothing beyond continuity; the inverse principle requires exponential tightness because a downstairs LDP cannot, by itself, rule out upstairs mass leaking to infinity inside a fibre. Exponential tightness is exactly the missing control, and it forces the recovered $I$ to be good.

Counterexamples to common slips

The infimum, not the sum or the average. A common error is to write $(f_{*} I) (y)$ as some average of $I$ over the fibre. On the exponential scale, probabilities of the pre-images of $y$ combine by the principle of the largest term, so the smallest cost in the fibre dominates: $a_{ε} \log \sum_{k} e^{- I (x_{k}) / a_{ε}} \to - \min_{k} I (x_{k})$ . The minimise rule is forced, not chosen.
Continuity is needed, not just measurability. If $f$ is merely Borel, $f^{- 1}$ of an open set need not be open and $f^{- 1}$ of a closed set need not be closed, so the two LDP bounds cannot transport. The map $f (x) = \mathbf{1}_{\{x \geq 0\}}$ on $\mathbb{R}$ destroys the open-set lower bound at the jump: a neighbourhood of the reading $1$ pulls back to $[0, \infty)$ , which is not open, and the lower bound fails there.
Inverse contraction genuinely needs exponential tightness. Take the escaping-atom family $μ_{ε} = (1 - e^{- 1/ ε}) δ_{0} + e^{- 1/ ε} δ_{1/ ε}$ on $\mathbb{R}$ and let $f$ collapse $\mathbb{R}$ to a point. Then $f_{*} μ_{ε} = δ_{\mathrm{pt}}$ has the degenerate point-mass LDP, but $\{μ_{ε}\}$ has no full LDP (mass leaks to $+ \infty$ at rate $1$ ), so the downstairs LDP cannot recover the upstairs one. Exponential tightness fails, and so does the conclusion.
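
The principle of the largest term behind the first slip can be watched numerically. This is a minimal sketch under stated assumptions: the four costs are made up for illustration, the speed is taken to be $a_{ε} = ε$, and `scaled_log_sum` is a hypothetical helper name.

```python
import math

def scaled_log_sum(costs, eps):
    """eps * log(sum_k exp(-costs[k] / eps)), evaluated without underflow."""
    m = min(costs)
    # Factor out the dominant term exp(-m / eps); the remaining exponents are <= 0.
    tail = sum(math.exp(-(c - m) / eps) for c in costs)
    return -m + eps * math.log(tail)

# Hypothetical costs of four pre-images of one reading; the cheapest is 0.3.
costs = [0.7, 0.3, 1.5, 0.3001]
for eps in (1.0, 0.1, 0.01, 0.001, 0.0001):
    print(f"eps = {eps:<7} eps*log(sum) = {scaled_log_sum(costs, eps):.5f}")
print(f"-min(costs) = {-min(costs):.5f}")
```

The near-tie between $0.3$ and $0.3001$ slows the convergence but does not change the limit: only the minimum survives on the exponential scale.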

Key theorem with proof Intermediate+

We prove the forward contraction principle in full, isolating the two transport identities and the goodness preservation that do the work.

Theorem (contraction principle). *Let $\{μ_{ε}\}$ satisfy the LDP on $X$ at speed $a_{ε}$ with good rate function $I$ , and let $f : X \to Y$ be continuous. Then $\{f_{*} μ_{ε}\}$ satisfies the LDP at speed $a_{ε}$ with good rate function $(f_{*} I)(y)=\inf_{f(x)=y}I(x)$.*

Proof.

*Step 1: $f_{*} I$ is a good rate function.* Fix $α \geq 0$ . The sublevel set is $$ \Psi_{f_*I}(\alpha) = \{y : \inf_{f(x)=y}I(x)\le\alpha\}. $$ We claim it equals the continuous image $f (Ψ_{I} (α))$ of the raw sublevel set $Ψ_{I} (α) = \{x : I (x) \leq α\}$ . If $y = f (x)$ with $I (x) \leq α$ then $f_{*} I (y) \leq I (x) \leq α$ , giving $f (Ψ_{I} (α)) \subseteq Ψ_{f_{*} I} (α)$ . Conversely if $f_{*} I (y) \leq α$ then $\inf_{f (x) = y} I (x) \leq α$ ; because $I$ is good its restriction to the closed fibre $f^{- 1} (\{y\})$ is lower-semicontinuous with compact sublevel sets, so the infimum over the fibre is attained at some $x_{*}$ with $f (x_{*}) = y$ and $I (x_{*}) = f_{*} I (y) \leq α$ ; hence $y \in f (Ψ_{I} (α))$ . (Attainment: the set $f^{- 1} (\{y\}) \cap Ψ_{I} (α')$ for $α' > f_{*} I (y)$ is closed inside the compact $Ψ_{I} (α')$ , hence compact and non-empty, and a lsc function attains its minimum on a compact set.) Thus $Ψ_{f_{*} I} (α) = f (Ψ_{I} (α))$ is the continuous image of a compact set, so it is compact, hence closed; as this holds for every $α$ , $f_{*} I$ is lower-semicontinuous with compact sublevel sets, i.e. a good rate function. Properness: $I \not\equiv + \infty$ gives some $x$ with $I (x) < \infty$ , and $f_{*} I (f (x)) \leq I (x) < \infty$ , so $f_{*} I$ is not identically $+ \infty$ .

*Step 2: upper bound on closed sets.* Let $F \subseteq Y$ be closed. By continuity $f^{- 1} (F)$ is closed in $X$ , and $(f_{*} μ_{ε}) (F) = μ_{ε} (f^{- 1} (F))$ . The closed-set upper bound for $\{μ_{ε}\}$ gives $$ \limsup_\varepsilon a_\varepsilon\log(f_*\mu_\varepsilon)(F) = \limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(f^{-1}F) \le -\inf_{x\in f^{-1}F}I(x). $$ The infimum over the preimage splits fibre by fibre: $f^{- 1} (F)$ is the disjoint union of the fibres $f^{- 1} (\{y\})$ over $y \in F$ , so $$ \inf_{x\in f^{-1}F}I(x) = \inf_{y\in F}\ \inf_{f(x)=y}I(x) = \inf_{y\in F}(f_*I)(y). $$ Hence $\limsup_\varepsilon a_\varepsilon\log(f_*\mu_\varepsilon)(F)\le -\inf_F f_*I$.

*Step 3: lower bound on open sets.* Let $G \subseteq Y$ be open. By continuity $f^{- 1} (G)$ is open, and the open-set lower bound for $\{μ_{ε}\}$ gives $$ \liminf_\varepsilon a_\varepsilon\log(f_*\mu_\varepsilon)(G) = \liminf_\varepsilon a_\varepsilon\log\mu_\varepsilon(f^{-1}G) \ge -\inf_{x\in f^{-1}G}I(x) = -\inf_{y\in G}(f_*I)(y), $$ the last equality by the same fibre-by-fibre rewriting of the infimum as in Step 2. Both LDP bounds hold for $f_{*} μ_{ε}$ with the good rate function $f_{*} I$ , completing the proof. $\square$
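
A concrete instance makes the two transported bounds visible. The sketch below assumes the standard Gaussian family $μ_{ε} = N(0, ε)$ at speed $a_{ε} = ε$ with rate $I(x) = x^{2}/2$ and the readout $f(x) = |x|$; these choices and the helper names are illustrative assumptions, not taken from the text. For the set $[y, \infty)$ the upper and lower bounds coincide, so $ε \log P(|X| \geq y)$ should approach $-y^{2}/2$.

```python
import math

def scaled_log_tail(y: float, eps: float) -> float:
    """eps * log P(|X| >= y) for X ~ N(0, eps), from the exact Gaussian tail."""
    # For a centred Gaussian of variance eps, P(|X| >= y) = erfc(y / sqrt(2 * eps)).
    return eps * math.log(math.erfc(y / math.sqrt(2.0 * eps)))

def contracted_rate(y: float) -> float:
    """(f_* I)(y) for I(x) = x^2 / 2 and f(x) = |x|; the fibre over y >= 0 is {-y, y}."""
    return y * y / 2.0 if y >= 0.0 else math.inf

y = 1.0
# eps is kept above roughly 0.001 so that erfc does not underflow to zero.
for eps in (0.1, 0.02, 0.005, 0.002):
    print(f"eps = {eps:<6} eps*log P(|X| >= {y}) = {scaled_log_tail(y, eps):.4f}")
print(f"-(f_* I)({y}) = {-contracted_rate(y):.4f}")
```

Negative readings have empty fibres, which is why `contracted_rate` returns $+\infty$ there.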

Bridge. This theorem underlies every derived large-deviation statement obtained by a continuous readout, and appears again in the proof that Cramér's theorem 37.07.02 is the image of Sanov's theorem 37.07.05 under the mean functional. The mechanism that turns one master LDP into a whole family is the pair of transport identities $f^{- 1} (\overline{\cdot}) \supseteq \overline{f^{- 1} (\cdot)}$ and $f^{- 1} ((\cdot)^{\circ}) \subseteq (f^{- 1} (\cdot))^{\circ}$ , which for continuous $f$ make preimages of closed sets closed and of open sets open; they are the reason the two bounds survive the pushforward. The compact-image argument of Step 1 shows that goodness passes from the raw landscape to the contracted one, so the contracted rate function inherits attainment of its variational problems. The forward principle is paired with the inverse contraction principle, which runs the same transport backwards under the extra hypotheses of injectivity and exponential tightness; that is the bridge to recovering an upstairs LDP from a downstairs one when no mass escapes to infinity.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Prove the combination (tensor) of LDPs: if $\{μ_{ε}\}$ on $X$ and $\{ν_{ε}\}$ on $Y$ satisfy LDPs with good rates $I$ and $J$ at the same speed and are independent, then $\{μ_{ε} \otimes ν_{ε}\}$ on $X \times Y$ satisfies the LDP with good rate $(I \oplus J) (x, y) = I (x) + J (y)$ . Then deduce, via contraction along $(x, y) \mapsto x + y$ on $\mathbb{R}$ , the infimal-convolution rate $(I \mathbin{\square} J) (s) = \inf_{x + y = s} (I (x) + J (y))$ for the sum.

Hint

For the product LDP use that product sets approximate Borel sets and $a_{ε} \log$ turns products of probabilities into sums of rates; for the sum apply the contraction principle to the continuous addition map.

Answer

For rectangles $A \times B$ , independence gives $(μ_{ε} \otimes ν_{ε}) (A \times B) = μ_{ε} (A) ν_{ε} (B)$ , so $a_{ε} \log (μ_{ε} \otimes ν_{ε}) (A \times B) = a_{ε} \log μ_{ε} (A) + a_{ε} \log ν_{ε} (B)$ . On open boxes $G_{1} \times G_{2}$ the lower bounds add to $- \inf_{G_{1}} I - \inf_{G_{2}} J = - \inf_{G_{1} \times G_{2}} (I \oplus J)$ ; on compact boxes the upper bounds add likewise, and a general open set is a union of open boxes while a general compact set is covered by finitely many, so the weak LDP holds with rate $I \oplus J$ . Exponential tightness of the product follows from that of each factor (product of the two compacts), upgrading to the full LDP 37.07.01; $I \oplus J$ is good since $Ψ_{I \oplus J} (α) \subseteq Ψ_{I} (α) \times Ψ_{J} (α)$ is a closed subset of a compact product. Now apply the contraction principle to the continuous map $\operatorname{add} (x, y) = x + y$ : the rate of the sum is $(\operatorname{add}_{*} (I \oplus J)) (s) = \inf_{x + y = s} (I (x) + J (y)) = (I \mathbin{\square} J) (s)$ , the infimal convolution. Independent large deviations add their rates; summing the variables convolves the rates infimally.
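
The infimal convolution can be checked on a case with a known closed form. The sketch assumes two Gaussian-type rates $I(x) = x^{2}/(2σ_{1}^{2})$ and $J(y) = y^{2}/(2σ_{2}^{2})$, for which the infimal convolution is $s^{2}/(2(σ_{1}^{2} + σ_{2}^{2}))$ (variances add); the grid search and the name `infimal_convolution` are illustrative choices, not part of the exercise.

```python
def infimal_convolution(I, J, s, grid):
    """Approximate inf over x + y = s of I(x) + J(y) by scanning x over `grid`."""
    return min(I(x) + J(s - x) for x in grid)

var_x, var_y = 1.0, 3.0  # assumed variances for the two independent summands

def I(x):
    return x * x / (2.0 * var_x)

def J(y):
    return y * y / (2.0 * var_y)

# Scan x over [-5, 5] in steps of 0.001.
grid = [i / 1000.0 for i in range(-5000, 5001)]

for s in (0.0, 1.0, 2.0):
    numeric = infimal_convolution(I, J, s, grid)
    closed_form = s * s / (2.0 * (var_x + var_y))
    print(f"s = {s}: grid minimum = {numeric:.6f}, closed form = {closed_form:.6f}")
```

The minimiser splits $s$ in proportion to the variances, $x = s\,σ_{1}^{2}/(σ_{1}^{2} + σ_{2}^{2})$: the cheapest way to produce an unusual sum is to share the deviation between the summands.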

Exercise 4 (medium, symbolic).

Show that Cramér's theorem 37.07.02 for bounded i.i.d. real variables is the contraction of Sanov's theorem 37.07.05 through the mean functional $T (ν) = \int x \, d ν (x)$ , giving $Λ^{*} (a) = \inf \{H (ν ∥ μ) : \int x \, d ν = a\}$ .

Hint

The empirical mean is $T$ of the empirical measure; check $T$ is weakly continuous for bounded variables, then apply the contraction principle.

Answer

The empirical mean is a continuous functional of the empirical measure $L_{n} = \frac{1}{n} \sum_{i} δ_{X_{i}}$ : $\bar{X}_{n} = \frac{1}{n} \sum_{i} X_{i} = \int x \, d L_{n} = T (L_{n})$ , and $T : M_{1} (Σ) \to \mathbb{R}$ is weakly continuous when $Σ$ is bounded (because $x \mapsto x$ is then bounded continuous). Sanov 37.07.05 gives the LDP for $L_{n}$ with good rate $H (\cdot ∥ μ)$ . The contraction principle applied to $T$ yields the LDP for $T (L_{n}) = \bar{X}_{n}$ with good rate $$ J(a)=(T_*H)(a)=\inf\{H(\nu\Vert\mu) : T(\nu)=a\}=\inf\Big\{H(\nu\Vert\mu):\int x\,d\nu=a\Big\}. $$ The constrained minimiser is the exponential tilt $d ν_{⋆} = e^{λ_{a} x - Λ (λ_{a})} d μ$ with $Λ' (λ_{a}) = a$ , for which $H(\nu_\star\Vert\mu)=\lambda_a a-\Lambda(\lambda_a)=\Lambda^*(a)$. By uniqueness of the rate function $J=\Lambda^*$, so Cramér is the contraction of Sanov: level-2 contracts to level-1.
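
The identity $Λ^{*}(a) = \inf\{H(ν ∥ μ) : \int x\,dν = a\}$ can be tested on a small alphabet where the fibre of the mean functional is genuinely more than one point. The sketch assumes $μ$ uniform on $\{0, 1, 2\}$ (an illustrative choice): one side solves for the exponential tilt by bisection, the other scans the one-parameter family of laws with the prescribed mean. All function names are hypothetical.

```python
import math

support = (0.0, 1.0, 2.0)
base = (1 / 3, 1 / 3, 1 / 3)  # assumed base law mu: uniform on {0, 1, 2}

def relative_entropy(nu, mu):
    """H(nu || mu) with the convention 0 * log(0) = 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0.0)

def tilted(lam):
    """Exponential tilt of the base law, together with Lambda(lam) = log E[exp(lam X)]."""
    weights = [q * math.exp(lam * x) for x, q in zip(support, base)]
    z = sum(weights)
    return [w / z for w in weights], math.log(z)

def cramer_rate(a, lo=-50.0, hi=50.0):
    """Lambda^*(a) = lam_a * a - Lambda(lam_a); lam_a found by bisection on the tilted mean."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        nu, _ = tilted(mid)
        if sum(x * p for x, p in zip(support, nu)) < a:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam * a - tilted(lam)[1]

def contracted_rate(a, steps=20000):
    """Minimum of H(nu || mu) over laws nu = (1 - a + t, a - 2t, t), all with mean a."""
    t_lo, t_hi = max(0.0, a - 1.0), a / 2.0
    best = math.inf
    for i in range(steps + 1):
        t = t_lo + (t_hi - t_lo) * i / steps
        best = min(best, relative_entropy((1.0 - a + t, a - 2.0 * t, t), base))
    return best

for a in (0.6, 1.0, 1.4):
    print(f"a = {a}: tilt gives {cramer_rate(a):.6f}, fibre scan gives {contracted_rate(a):.6f}")
```

The two columns should agree to the resolution of the scan, with both vanishing at the true mean $a = 1$.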

Exercise 5 (medium, symbolic).

Let $\{μ_{ε}\}$ on $\mathbb{R}^{d}$ satisfy the LDP with good rate $I$ , and let $A$ be a linear map $\mathbb{R}^{d} \to \mathbb{R}^{m}$ . Show the linear image $\{A_{*} μ_{ε}\}$ obeys the LDP with rate $(A_{*} I) (y) = \inf \{I (x) : A x = y\}$ , and that if $I$ is convex then $A_{*} I$ is convex.

Hint

Linear maps are continuous, so contraction applies; for convexity use that the infimum of a convex function over the affine fibres of a linear map is convex.

Answer

A linear $A$ is continuous, so the contraction principle gives the LDP for $A_{*} μ_{ε}$ with good rate $(A_{*} I) (y) = \inf_{A x = y} I (x)$ . For convexity, take $y_{0}, y_{1}$ and $t \in [0, 1]$ . If either $(A_{*} I) (y_{i})$ is $+ \infty$ the inequality is trivial, so assume both are finite and pick $x_{0}, x_{1}$ with $A x_{i} = y_{i}$ and $I (x_{i}) = (A_{*} I) (y_{i})$ (the infimum is attained by goodness). Then $A (t x_{1} + (1 - t) x_{0}) = t y_{1} + (1 - t) y_{0}$ , so by convexity of $I$ , $$ (A_*I)(ty_1+(1-t)y_0)\le I(tx_1+(1-t)x_0)\le tI(x_1)+(1-t)I(x_0)=t(A_*I)(y_1)+(1-t)(A_*I)(y_0). $$ Hence $A_{*} I$ is convex. Marginals and linear combinations of a large-deviation system thus carry convex contracted rates, the partial-Legendre-transform fact underlying the projection formulas of Cramér theory.

Exercise 6 (hard, symbolic).

Prove the inverse contraction principle: if $f : X \to Y$ is continuous and injective, $\{f_{*} μ_{ε}\}$ satisfies the LDP with good rate $J$ , and $\{μ_{ε}\}$ is exponentially tight, then $\{μ_{ε}\}$ satisfies the LDP with good rate $I = J \circ f$ .

Hint

Exponential tightness gives a weak LDP along subsequences; identify the rate via the pushforward and injectivity, then upgrade using tightness 37.07.01.

Answer

Exponential tightness implies that every subsequence of $\{μ_{ε}\}$ has a further subsequence satisfying a full LDP with some good rate $\tilde{I}$ (the tightness-plus-compactness extraction of 37.07.01). Along such a subsequence the contraction principle gives that $\{f_{*} μ_{ε}\}$ satisfies the LDP with rate $f_{*} \tilde{I}$ . By assumption $\{f_{*} μ_{ε}\}$ already satisfies the LDP with rate $J$ , and the rate function of an LDP is unique on the Hausdorff space $Y$ , so $f_{*} \tilde{I} = J$ , i.e. $\inf_{f (x) = y} \tilde{I} (x) = J (y)$ for all $y$ . Injectivity of $f$ makes each fibre $f^{- 1} (\{y\})$ a single point (or empty), so the infimum is just the value there: for every $x \in X$ , $\tilde{I} (x) = J (f (x))$ , while for $y \notin \operatorname{range} (f)$ the fibre is empty and $J (y) = + \infty$ . Thus $\tilde{I} = J \circ f =: I$ on all of $X$ , independent of the subsequence. Since every subsequence has a further subsequence converging to the same limit $I$ , the full family satisfies the LDP with good rate $I = J \circ f$ ; goodness of $I$ follows from goodness of $J$ and continuity of $f$ (sublevel sets $Ψ_{I} (α) = f^{- 1} (Ψ_{J} (α))$ are closed, and exponential tightness pins them inside compacts).

Exercise 7 (hard, symbolic).

State and prove the approximate contraction principle: if continuous maps $f_{m} : X \to Y$ converge to $f$ uniformly on each compact sublevel set $Ψ_{I} (α)$ , that is, $\sup_{x : I (x) \leq α} d_{Y} (f_{m} (x), f (x)) \to 0$ for each $α$ , and each $\{(f_{m})_{*} μ_{ε}\}$ satisfies the LDP with rate $(f_{m})_{*} I$ , then $\{f_{*} μ_{ε}\}$ satisfies the LDP with rate $f_{*} I$ .

Hint

Use the exponential-equivalence lemma: families whose disagreement is exponentially negligible share the same LDP; control the disagreement of $f_{m}$ and $f$ on sublevel sets and outside via goodness.

Answer

Two families $\{P_{ε}\}, \{Q_{ε}\}$ on $Y$ are exponentially equivalent if they can be coupled as the laws of $Y_{ε}$ and $\tilde{Y}_{ε}$ so that for every $δ > 0$ , $\limsup_{ε} a_{ε} \log P (d_{Y} (Y_{ε}, \tilde{Y}_{ε}) > δ) = - \infty$ ; exponentially equivalent families satisfy the same LDP (the bounds transfer because the $δ$ -discrepancy event is exponentially negligible). Couple $f_{m} (X)$ and $f (X)$ through the same $X \sim μ_{ε}$ . Fix $δ > 0$ and $α$ ; on $\{I \leq α\}$ uniform convergence gives $d_{Y} (f_{m} (x), f (x)) \leq δ$ for all $m \geq m_{0} (α, δ)$ , so for such $m$ the discrepancy event $\{d_{Y} (f_{m} (X), f (X)) > δ\}$ is contained in $\{X \notin Ψ_{I} (α)\}$ ; by the upper bound $\limsup_{ε} a_{ε} \log μ_{ε} (Ψ_{I} (α)^{c}) \leq - α$ . Letting $m \to \infty$ and then $α \to \infty$ gives $\lim_{m} \limsup_{ε} a_{ε} \log P (d_{Y} (f_{m} (X), f (X)) > δ) = - \infty$ , so the $f_{m} (X)$ become exponentially equivalent to $f (X)$ in the limit of large $m$ . Since each $(f_{m})_{*} μ_{ε}$ has the LDP with rate $(f_{m})_{*} I$ and $(f_{m})_{*} I \to f_{*} I$ pointwise with the rates converging in the $Γ$ -sense from the uniform convergence on sublevel sets, the exponential-equivalence lemma, applied along this approximating sequence, transfers the LDP to $\{f_{*} μ_{ε}\}$ with rate $f_{*} I$ . This is the device that extends contraction to maps that are only limits of continuous maps, e.g. in Mogulskii- and Schilder-type path results.

Exercise 8 (hard, symbolic).

Give an explicit example showing the contraction principle fails if $I$ is lower-semicontinuous but not good: exhibit $X = \mathbb{R}^{2}$ , a non-good $I$ , and a continuous $f$ for which $f_{*} I$ is not lower-semicontinuous.

Hint

Let $I$ be finite on an unbounded set whose continuous image accumulates a smaller value at a point not in the image; the infimum over the fibre is then not attained, and the pushforward jumps down in the limit.

Answer

Take $X = \mathbb{R}^{2}$ and the projection $f (x_{1}, x_{2}) = x_{1}$ , which is continuous. Couple the second coordinate to the first along a curve: let $I (x_{1}, x_{2}) = e^{- x_{2}}$ if $x_{2} \geq 0$ and $x_{1} = 1/ (1 + x_{2})$ , and $I = + \infty$ otherwise. Then $I$ is lsc (its sublevel sets are closed arcs of this curve) but not good ( $Ψ_{I} (α)$ for $α > 0$ is an unbounded arc running off to $x_{2} \to \infty$ , $x_{1} \to 0^{+}$ , hence not compact). The fibre over $y \in (0, 1]$ meets the curve in the single point with $x_{2} (y) = 1/ y - 1$ , so $(f_{*} I) (y) = e^{- x_{2} (y)}$ there, and $x_{2} (y) \to \infty$ as $y \to 0^{+}$ ; thus $(f_{*} I) (y) \to 0$ as $y \downarrow 0$ , while $y = 0$ is not the first coordinate of any point of the curve, so $(f_{*} I) (0) = + \infty$ . The pushforward jumps from a limit of $0$ up to $+ \infty$ at $y = 0$ : $f_{*} I$ is not lower-semicontinuous, so it is not a rate function and the transported bounds do not assemble into an LDP. Goodness of $I$ is exactly what would have forced the infimum to be attained and the limiting value to be realised in the range, repairing lower semicontinuity.
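
The failure of lower semicontinuity at $y = 0$ can be tabulated directly from the formula $(f_{*} I)(y) = e^{-(1/y - 1)}$ on $(0, 1]$. The short sketch below only evaluates that formula; the function name `pushforward_rate` is an illustrative choice.

```python
import math

def pushforward_rate(y: float) -> float:
    """(f_* I)(y) for the curve-supported rate of this example.

    For 0 < y <= 1 the fibre meets the curve at x2 = 1/y - 1, where I = exp(-x2);
    every other reading has no finite-cost pre-image, so the value is +inf.
    """
    if 0.0 < y <= 1.0:
        return math.exp(-(1.0 / y - 1.0))
    return math.inf

for y in (1.0, 0.5, 0.1, 0.01, 0.0):
    print(f"y = {y:<5} (f_* I)(y) = {pushforward_rate(y)}")
```

The values shrink towards $0$ as $y \downarrow 0$ while the value at $y = 0$ itself is $+\infty$, so $\liminf_{y \to 0} (f_{*} I)(y) = 0 < (f_{*} I)(0)$, which is the violation of lower semicontinuity.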

Advanced results Master

The contraction principle as functoriality of the LDP

The contraction principle expresses that the assignment "(family, speed) $\mapsto$ rate function" is covariant under continuous maps: a continuous $f : X \to Y$ sends an LDP on $X$ to an LDP on $Y$ , and the rate transforms by the pushforward $f_{*} I (y) = \inf_{f (x) = y} I (x)$ . This pushforward is the epigraphical image: identifying a rate function with its epigraph $\{(x, t) : t \geq I (x)\}$ , $f_{*} I$ has epigraph the image of $I$ 's epigraph under $f \times \mathrm{id}$ , closed because $f$ restricted to good sublevel sets is a closed map. Composition is respected, $(g \circ f)_{*} I = g_{*} (f_{*} I)$ , so the construction is functorial on the category of Hausdorff spaces with continuous maps; the inverse contraction principle is the partial inverse available when the map is injective and the source is exponentially tight.
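
The composition rule $(g \circ f)_{*} I = g_{*}(f_{*} I)$ is a statement about nested minima and can be checked on a finite toy model. The state space, the cost and the two maps below are arbitrary assumptions chosen only so that both maps are many-to-one; `pushforward` is a hypothetical helper.

```python
import math

def pushforward(rate, f):
    """Fibrewise minimum of a finite cost table `rate` under the map f."""
    out = {}
    for x, cost in rate.items():
        y = f(x)
        out[y] = min(out.get(y, math.inf), cost)
    return out

# Toy landscape on the integers -6, ..., 6 with its minimum at x = 3.
rate = {x: (x - 3) ** 2 / 2.0 for x in range(-6, 7)}

def f(x):
    return x * x   # glues x and -x together

def g(y):
    return y % 5   # glues readings that agree modulo 5

direct = pushforward(rate, lambda x: g(f(x)))   # (g o f)_* I
staged = pushforward(pushforward(rate, f), g)   # g_* (f_* I)
print(direct)
print(direct == staged)  # True
```

Minimising over the fibre of $g \circ f$ in one go, or first over fibres of $f$ and then over fibres of $g$, visits the same raw states, which is why the two tables agree exactly.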

Infimal convolution and the algebra of independent deviations

Applied to the addition map on $\mathbb{R}^{d}$ , contraction shows the rate of a sum of independent large-deviation systems is the infimal convolution of the summand rates, $(I \mathbin{\square} J) (s) = \inf_{x + y = s} (I (x) + J (y))$ ^{[den Hollander §III.5]}. Infimal convolution is the operation dual under the Legendre-Fenchel transform to addition of convex conjugates: $(I \mathbin{\square} J)^{*} = I^{*} + J^{*}$ . For Cramér rates $I = Λ_{X}^{*}$ , $J = Λ_{Y}^{*}$ this reproduces $Λ_{X + Y} = Λ_{X} + Λ_{Y}$ (independence adds cumulant generating functions), so the contraction-of-a-sum and the additivity of log-moment generating functions are Legendre-dual faces of one fact. The rate function thus carries an algebra: tensor for independent joints, infimal convolution for sums, ordinary pushforward for general continuous readouts.

The inverse principle and the role of injectivity

The inverse contraction principle ^{[Dembo & Zeitouni §4.2.1]} recovers an LDP upstairs from one downstairs when $f$ is a continuous injection and $\{μ_{ε}\}$ is exponentially tight. Injectivity is what makes the recovered rate unambiguous: a non-injective $f$ would leave the upstairs rate undetermined along fibres, since only the fibrewise minimum is visible downstairs. Exponential tightness supplies the compactness that the downstairs LDP cannot see: without it, mass escaping to infinity inside the source contributes nothing to $f_{*} μ_{ε}$ yet breaks the upstairs LDP. The subsequence-extraction proof shows the principle is really a uniqueness statement: tightness guarantees subsequential limit rates exist and are good, the downstairs LDP pins them all to the same $J \circ f$ , and injectivity removes the fibrewise ambiguity.

Approximate contraction and limits of continuous maps

When the readout is not continuous but is a uniform-on-sublevel-sets limit of continuous maps, the approximate contraction principle ^{[Dembo & Zeitouni §4.2.2]} still transports the LDP, via exponential equivalence: families differing by an exponentially negligible amount share a rate function, and the uniform control on each compact sublevel set $Ψ_{I} (α)$ makes the discrepancy between $f_{m}$ and $f$ negligible at each finite level, with the tail controlled by the upper bound $\limsup_{ε} a_{ε} \log μ_{ε} (Ψ_{I} (α)^{c}) \leq - α$ . This is the route by which Schilder's theorem on path space and the Mogulskii theorem for polygonal interpolations are obtained: the relevant functionals (suprema, integrals against rough test paths) are approximated by genuinely continuous ones and the rate is transported in the limit.

Topology-relativity and the choice of target space

Contraction interacts with the topology-relativity of rate functions 37.07.01. Pushing forward to a coarser target topology on $Y$ enlarges open sets and can only decrease the contracted rate; to a finer one it increases. The contraction is cleanest when $f$ is continuous from the source topology to the chosen target topology, and choosing the target topology is a real modelling decision: the empirical-measure space carries the weak and the stronger $τ$ -topology, and a functional continuous in one but not the other contracts an LDP only in that one. This is why Sanov is stated in the weak topology when the readout of interest, the mean of bounded variables, is weakly continuous, and in the $τ$ -topology when integrals against bounded measurable but discontinuous test functions are read out.

Synthesis. The central insight is that the rate function is a transportable object: a single LDP generalises into a whole family by pushing forward along continuous readouts, with the new cost the fibrewise infimum of the old. This is exactly the functoriality that makes Cramér 37.07.02 a corollary of Sanov 37.07.05 — contracting the empirical-measure LDP through the mean functional realises the scalar rate $Λ^{*}$ as a constrained relative-entropy minimisation, and putting these together with the addition map shows independent deviations combine by infimal convolution, the Legendre dual of cumulant additivity. The foundational reason goodness is the load-bearing hypothesis is that it forces the fibrewise infimum to be attained and the contracted sublevel sets to be the compact continuous images of the source sublevel sets, so lower semicontinuity survives the pushforward; this is exactly the property that fails in the non-good counterexample, where the pushforward jumps down at an unattained boundary value. The forward principle is dual to the inverse contraction principle, whose extra inputs — injectivity to remove fibrewise ambiguity and exponential tightness to forbid invisible escape to infinity — are precisely the bridge from a downstairs LDP back to an upstairs one. The whole apparatus appears again in the approximate-contraction route to path-space large deviations and in the Legendre-Fenchel duality of 37.07.03, where pushforward and conjugation are the two faces of how a rate function moves through the curriculum.

Full proof set Master

Proposition 1 (the pushforward of a good rate function along a continuous map is good). *Let $I$ be a good rate function on $X$ and $f : X \to Y$ continuous. Then $f_{*} I (y) = \inf_{f (x) = y} I (x)$ is a good rate function on $Y$ , and $Ψ_{f_{*} I} (α) = f (Ψ_{I} (α))$ for every $α \geq 0$ .*

Proof. The inclusion $f (Ψ_{I} (α)) \subseteq Ψ_{f_{*} I} (α)$ is immediate: $y = f (x)$ with $I (x) \leq α$ gives $f_{*} I (y) \leq α$ . Conversely, fix $y$ with $f_{*} I (y) \leq α$ and any $β > α$ . The set $C := f^{- 1} (\{y\}) \cap Ψ_{I} (β)$ is the intersection of the closed fibre (preimage of a closed point under continuous $f$ ) with the compact $Ψ_{I} (β)$ , hence compact; it is non-empty because $\inf_{f (x) = y} I (x) = f_{*} I (y) \leq α < β$ supplies points with $I < β$ in the fibre. The lsc function $I$ attains its minimum on the compact $C$ at some $x_{*}$ , and that minimum equals $f_{*} I (y) \leq α$ (points of the fibre with $I \leq α$ already lie in $C$ ). Thus $I (x_{*}) \leq α$ and $f (x_{*}) = y$ , so $y \in f (Ψ_{I} (α))$ . Hence $Ψ_{f_{*} I} (α) = f (Ψ_{I} (α))$ , the continuous image of a compact set, so it is compact and closed. Since every sublevel set is closed, $f_{*} I$ is lower-semicontinuous; since each is compact, $f_{*} I$ is good; and $f_{*} I \not\equiv + \infty$ because some fibre meets $\{I < \infty\}$ . $\square$

Proposition 2 (both LDP bounds transport under a continuous map). *Let $\{μ_{ε}\}$ satisfy the LDP with rate $I$ and $f$ be continuous. Then for closed $F$ and open $G$ in $Y$, $\limsup_{\varepsilon} a_{\varepsilon} \log (f_{*} \mu_{\varepsilon}) (F) \le - \inf_{F} f_{*} I$ and $\liminf_{\varepsilon} a_{\varepsilon} \log (f_{*} \mu_{\varepsilon}) (G) \ge - \inf_{G} f_{*} I$.*

Proof. Continuity makes $f^{- 1} (F)$ closed and $f^{- 1} (G)$ open. With $(f_{*} μ_{ε}) (B) = μ_{ε} (f^{- 1} B)$ and the staged-infimum identity $\inf_{f^{- 1} (B)} I = \inf_{y \in B} f_{*} I (y)$ (Exercise 1), $$ \limsup_\varepsilon a_\varepsilon\log(f_{*}\mu_\varepsilon)(F)=\limsup_\varepsilon a_\varepsilon\log\mu_\varepsilon(f^{-1}F)\le-\inf_{f^{-1}F}I=-\inf_F f_{*}I, $$ $$ \liminf_\varepsilon a_\varepsilon\log(f_{*}\mu_\varepsilon)(G)=\liminf_\varepsilon a_\varepsilon\log\mu_\varepsilon(f^{-1}G)\ge-\inf_{f^{-1}G}I=-\inf_G f_{*}I, $$ applying the closed-set upper bound and open-set lower bound of the source LDP. $□$
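A one-dimensional Gaussian gives a quick numerical sanity check of the transported bounds. The sketch assumes $μ_{ε} = N (0, ε)$ at speed $a_{ε} = ε$, whose rate is $I (x) = x^{2} / 2$, and the readout $f (x) = x^{2}$, so that $f_{*} I (y) = y / 2$ for $y \geq 0$. These choices, the threshold `c` and the function name are assumptions for illustration. For the closed set $F = [c, \infty)$ the predicted decay exponent is $- c / 2$.

```python
import math


def scaled_log_tail(eps, c):
    """eps * log P(X^2 >= c) for X ~ N(0, eps)."""
    # For a centred Gaussian of variance eps, P(|X| >= sqrt(c)) = erfc(sqrt(c / (2 * eps))).
    return eps * math.log(math.erfc(math.sqrt(c / (2 * eps))))


c = 1.0
predicted = -c / 2      # minus the infimum of the pushed rate y / 2 over [c, infinity)
estimates = [scaled_log_tail(eps, c) for eps in (0.1, 0.01, 0.001)]

for eps, value in zip((0.1, 0.01, 0.001), estimates):
    print(eps, value, value - predicted)
```

The gap to the predicted exponent shrinks as $ε$ decreases; the residual is the polynomial prefactor of the Gaussian tail, which the exponential scale of the LDP does not see.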

Proposition 3 (inverse contraction). *Let $f : X \to Y$ be a continuous injection, $\{f_{*} \mu_{\varepsilon}\}$ satisfy the LDP with good rate $J$, and $\{\mu_{\varepsilon}\}$ be exponentially tight. Then $\{\mu_{\varepsilon}\}$ satisfies the LDP with good rate $J \circ f$.*

Proof. By exponential tightness, every subsequence of $\{μ_{ε}\}$ admits a further subsequence along which a full LDP holds with some good rate $\tilde{I}$ (the Pukhalskii/tightness extraction of 37.07.01: exponential tightness yields subsequential weak-LDP limits and then upgrades them to full LDPs with good rates). Along it, Proposition 2 gives the LDP for $\{f_{*} μ_{ε}\}$ with rate $f_{*} \tilde{I}$. By hypothesis $\{f_{*} μ_{ε}\}$ obeys the LDP with rate $J$, and rate functions are unique on the Hausdorff target, so $f_{*} \tilde{I} = J$. Injectivity collapses each fibre to one point: $\tilde{I} (x) = \inf_{f (x') = f (x)} \tilde{I} (x') = f_{*} \tilde{I} (f (x)) = J (f (x))$ for every $x \in X$. For $y \notin \operatorname{range} (f)$ the fibre is empty, so $J (y) = f_{*} \tilde{I} (y) = + \infty$, which is consistent and places no further constraint on $\tilde{I}$. Hence $\tilde{I} = J \circ f$, the same limit for every subsequence, so the whole family obeys the LDP with rate $J \circ f$. Goodness: $Ψ_{J \circ f} (α) = f^{- 1} (Ψ_{J} (α))$ is closed by continuity, and exponential tightness confines it inside a compact $K_{M}$ for $M > α$ (the argument of 37.07.01 forcing sublevel sets into the tightness compacts), so it is compact. $□$

Proposition 4 (contraction of a sum gives infimal convolution). *Let $\{μ_{ε}\}$, $\{ν_{ε}\}$ be independent on $\mathbb{R}^{d}$ with good rates $I, J$ at speed $a_{ε}$. Then $\{\mu_{\varepsilon} \ast \nu_{\varepsilon}\}$ (the law of the sum) satisfies the LDP with good rate $(I \square J) (s) = \inf_{x + y = s} (I (x) + J (y))$.*

Proof. The product family $\{μ_{ε} \otimes ν_{ε}\}$ on $\mathbb{R}^{d} \times \mathbb{R}^{d}$ satisfies the LDP with good rate $(I \oplus J) (x, y) = I (x) + J (y)$. On open boxes the independent lower bounds add, and on compact boxes the upper bounds add. A general open set is a countable union of open boxes, so the lower bound passes to it by monotonicity. Exponential tightness of each factor gives that of the product, which upgrades the resulting weak LDP to the full one 37.07.01. The rate $I \oplus J$ is good since $Ψ_{I \oplus J} (α) \subseteq Ψ_{I} (α) \times Ψ_{J} (α)$ is closed in a compact product. The addition map $\mathrm{add} (x, y) = x + y$ is continuous, and the law of $\mathrm{add} (X, Y) = X + Y$ is the convolution $μ_{ε} * ν_{ε}$. Contraction (Propositions 1 and 2) gives the LDP with good rate $$ (\mathrm{add}_{*}(I\oplus J))(s)=\inf_{x+y=s}(I(x)+J(y))=(I\square J)(s). \qquad\square $$
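The infimal convolution in Proposition 4 can be evaluated by brute force in one dimension. The sketch below assumes two Gaussian-type rates $I (x) = x^{2} / (2 σ_{1}^{2})$ and $J (y) = y^{2} / (2 σ_{2}^{2})$ with illustrative variances, for which the sum is again Gaussian and the expected answer is $s^{2} / (2 (σ_{1}^{2} + σ_{2}^{2}))$. The grid and the helper name are hypothetical choices for the example.

```python
def inf_convolution(I, J, s, grid):
    """Approximate inf over x + y = s of I(x) + J(y) by scanning x over a grid."""
    return min(I(x) + J(s - x) for x in grid)


sigma1_sq, sigma2_sq = 1.0, 3.0                    # assumed variances
I = lambda x: x * x / (2 * sigma1_sq)
J = lambda y: y * y / (2 * sigma2_sq)

grid = [k / 1000 for k in range(-10000, 10001)]    # x in [-10, 10], step 0.001

results = []
for s in (0.0, 1.0, 2.5):
    numeric = inf_convolution(I, J, s, grid)
    closed_form = s * s / (2 * (sigma1_sq + sigma2_sq))
    results.append((s, numeric, closed_form))
    print(s, numeric, closed_form)
```

The minimising split puts the fraction $σ_{1}^{2} / (σ_{1}^{2} + σ_{2}^{2})$ of the deviation on the first summand: the cheapest way to realise a rare sum is to let the more variable term carry more of it.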

Connections Master

The contraction principle is the transport law on top of the abstract LDP scaffold of 37.07.01: it consumes the open-set lower bound and closed-set upper bound defined there, preserves goodness via the compact-image argument, and the inverse direction is exactly the exponential-tightness-plus-uniqueness extraction isolated in that unit, so the bridge concept of 37.07.01 is the load-bearing input here.
Contracting Sanov's empirical-measure LDP 37.07.05 through the mean functional is how Cramér's theorem 37.07.02 is recovered as a corollary, with the Cramér rate $Λ^{*} (a) = \inf \{H (ν ∥ μ) : \int x \, d ν = a\}$ realised as a constrained relative-entropy minimisation whose minimiser is the exponential tilt; this is the worked application that motivates the whole unit.
The infimal-convolution rate of a sum is the Legendre-Fenchel dual 37.07.03 of cumulant additivity: $(I \square J)^{*} = I^{*} + J^{*}$, so contracting through the addition map and adding log-moment generating functions are conjugate faces of one identity, tying the pushforward construction to the convex-duality machinery that produces Cramér rates.
The approximate-contraction refinement is the device by which Varadhan's integral lemma and the Laplace principle 37.07.07 transport an LDP through functionals that are only uniform limits of continuous maps, so the exponential-equivalence argument used there is the same one that extends contraction to path-space results such as Schilder's theorem.
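The conjugacy $(I \square J)^{*} = I^{*} + J^{*}$ noted above can also be checked on a grid. This is a rough sketch with assumed quadratic rates $I (x) = x^{2} / 2$ and $J (y) = y^{2} / 6$, whose conjugates are $θ^{2} / 2$ and $3 θ^{2} / 2$; the grid bounds and helper names are hypothetical, and the discrete suprema are only trustworthy while the maximisers stay inside the grid.

```python
def legendre(fn, theta, grid):
    """Approximate sup over x of theta * x - fn(x), restricted to a finite grid."""
    return max(theta * x - fn(x) for x in grid)


def inf_conv(I, J, grid):
    """Return s -> approximate inf over x of I(x) + J(s - x), scanning x over the grid."""
    return lambda s: min(I(x) + J(s - x) for x in grid)


I = lambda x: x * x / 2          # assumed rate with conjugate theta^2 / 2
J = lambda y: y * y / 6          # assumed rate with conjugate 3 * theta^2 / 2
grid = [k / 20 for k in range(-200, 201)]    # [-10, 10], step 0.05

pairs = []
for theta in (0.0, 0.5, 1.0):
    lhs = legendre(inf_conv(I, J, grid), theta, grid)             # (I box J)^*(theta)
    rhs = legendre(I, theta, grid) + legendre(J, theta, grid)     # I^*(theta) + J^*(theta)
    pairs.append((theta, lhs, rhs))
    print(theta, lhs, rhs)
```

Both sides should come out close to $2 θ^{2}$ for these rates, which is the log-moment generating function of a sum of independent Gaussians with the assumed variances $1$ and $3$.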

Historical & philosophical context Master

The contraction principle was isolated as the natural transport law for large deviations by S. R. S. Varadhan in his 1966 abstract formulation and the 1984 lecture notes ^{[Varadhan 1984]}, where the LDP is presented as a structure stable under continuous maps with the rate transforming by fibrewise infimum. The name reflects the passage from a rich source space to a coarser image, and the principle was already implicit in Harald Cramér's 1938 computation ^{[Cramér 1938]}, whose scalar rate is, in hindsight, the contraction of the empirical-measure rate through the mean. The systematic treatment, including the inverse contraction principle under exponential tightness and the approximate contraction principle under uniform-on-sublevel-sets convergence, was codified by Dembo and Zeitouni ^{[Dembo & Zeitouni §4.2.1]} and, in parallel, by Deuschel and Stroock ^{[Deuschel & Stroock §2.1]} as the contraction lemma.

Den Hollander ^{[den Hollander §III.5]} presents the contraction principle together with the rule for combining independent LDPs, making explicit the infimal-convolution algebra that the principle induces on rate functions and the derivation of Cramér from Sanov. The recognition that the inverse direction requires both injectivity and exponential tightness, rather than continuity alone, sharpened the understanding that a downstairs LDP underdetermines the upstairs one precisely along the fibres and along the directions where mass can escape to infinity unobserved.

Bibliography Master

@book{dembozeitouni1998ldp,
  author    = {Dembo, Amir and Zeitouni, Ofer},
  title     = {Large Deviations Techniques and Applications},
  edition   = {2nd},
  series    = {Applications of Mathematics},
  number    = {38},
  publisher = {Springer},
  year      = {1998}
}

@book{varadhan1984large,
  author    = {Varadhan, S. R. S.},
  title     = {Large Deviations and Applications},
  series    = {CBMS-NSF Regional Conference Series in Applied Mathematics},
  number    = {46},
  publisher = {SIAM},
  year      = {1984}
}

@book{denhollander2000large,
  author    = {den Hollander, Frank},
  title     = {Large Deviations},
  series    = {Fields Institute Monographs},
  number    = {14},
  publisher = {American Mathematical Society},
  year      = {2000}
}

@book{deuschelstroock1989large,
  author    = {Deuschel, Jean-Dominique and Stroock, Daniel W.},
  title     = {Large Deviations},
  series    = {Pure and Applied Mathematics},
  number    = {137},
  publisher = {Academic Press},
  year      = {1989}
}

@article{cramer1938nouveau,
  author  = {Cram\'er, Harald},
  title   = {Sur un nouveau th\'eor\`eme-limite de la th\'eorie des probabilit\'es},
  journal = {Actualit\'es Scientifiques et Industrielles},
  volume  = {736},
  pages   = {5--23},
  year    = {1938}
}

@article{varadhan1966asymptotic,
  author  = {Varadhan, S. R. S.},
  title   = {Asymptotic probabilities and differential equations},
  journal = {Communications on Pure and Applied Mathematics},
  volume  = {19},
  pages   = {261--286},
  year    = {1966}
}

Prerequisites

37.07.01
37.07.05

Tier anchors

beginner: Touchette 2009 *The large deviation approach to statistical mechanics* (Physics Reports 478) §3.6 (contraction); den Hollander 2000 *Large Deviations* (AMS Fields Institute Monographs) §III.5 (informal contraction picture)
intermediate: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §4.2.1 (Theorem 4.2.1 the contraction principle; Theorem 4.2.4 the inverse contraction principle); den Hollander 2000 *Large Deviations* (AMS Fields Institute Monographs) §III.5
master: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §4.2.1-§4.2.2 (Theorems 4.2.1, 4.2.4, 4.2.13 approximate contraction); Deuschel & Stroock 1989 *Large Deviations* (Academic Press) §2.1 (the contraction lemma); Varadhan 1984 *Large Deviations and Applications* (SIAM CBMS-NSF 46) §3

References

Dembo, A. & Zeitouni, O. — Large Deviations Techniques and Applications, 2nd ed. (Springer, 1998) · §4.2.1 (Theorem 4.2.1 contraction principle; Theorem 4.2.4 inverse contraction principle); §4.2.2 (Theorem 4.2.13 approximate contraction)
Varadhan, S. R. S. — Large Deviations and Applications (SIAM CBMS-NSF Regional Conference Series 46, 1984) · §3 (the contraction principle and abstract LDP transport under continuous maps)
den Hollander, F. — Large Deviations (AMS Fields Institute Monographs 14, 2000) · §III.5 (contraction principle, combining LDPs, Cramér from Sanov)
Deuschel, J.-D. & Stroock, D. W. — Large Deviations (Academic Press, 1989) · §2.1 (the contraction lemma and transport of the LDP under continuous maps)
Cramér, H. — Sur un nouveau théorème-limite de la théorie des probabilités · Actualités Scientifiques et Industrielles 736 (1938), 5-23 (the scalar rate recovered here as a contracted divergence)

Estimated time

beginner: 16m
intermediate: 42m
master: 74m