37.07.04 · probability / 07-large-deviations

The Gärtner-Ellis Theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.3-§2.4 (Gärtner-Ellis in $\mathbb{R}^d$, Theorem 2.3.6, Lemma 2.3.9 exposed hyperplanes, Corollary 2.3.7 steepness); Ellis 1984 *Large deviations for a general class of random vectors* (Annals of Probability 12); Bryc 1990 *On the large deviation principle for stationary processes* (and the additive-functional applications)

Intuition Beginner

Cramér's theorem priced the rare deviations of an average of many independent, identical samples. But most real data are neither independent nor identical: today's measurement leans on yesterday's, a Markov chain remembers its last state, a Gaussian process is correlated across its whole length. The Gärtner-Ellis theorem keeps the rare-event accounting of Cramér while throwing out the independence that made it easy.

The trick is to stop asking about single samples and ask only one thing of the whole sequence: how fast does its tuned-average grow? Take your quantity $Z_{n}$ — already an average-like object built from $n$ pieces — and tune it with a dial $λ$ , exactly as in Cramér. Look at the logarithm of the tuned expectation, divided by $n$ to keep it from blowing up. If, as $n$ grows, this settles down to a definite limiting curve $Λ (λ)$ , that single curve is all the dependence structure you need. You never have to know how the pieces correlate; you only need the limit to exist.

From that limiting curve the rate function is read off by the same geometric flip as before: the Legendre-Fenchel transform turns the dial-language curve $Λ$ into the outcome-language cost $Λ^{*}$ . The cost of seeing $Z_{n}$ near a forbidden value $x$ decays like $e^{- n Λ^{*} (x)}$ .

There is one new piece of fine print, and it has teeth. To make the cheap upper bound and the constructive lower bound meet — to push the rare event into existence by tilting, as Cramér did — the limiting curve must be steep: its slope must run off to infinity as the dial approaches the edge of where the tuned average stays finite. Steepness guarantees that every target value is hit by some dial setting, so the tilt that realises it always exists. When the curve is steep and smooth, the two bounds close and the rate is exactly $Λ^{*}$ . When it is not, the lower bound can leak at the values no dial can reach.

So Gärtner-Ellis is Cramér with the independence removed and one regularity condition added: give me a limiting tuned-growth curve that is smooth and steep, and I will give you the exponential cost of every deviation, dependence and all.

Visual Beginner

Figure: three stacked panels. The top panel shows several curves $\Lambda_n(\lambda)/n$ for increasing $n$ converging to a thick limiting curve $\Lambda(\lambda)$ over the dial-axis $\lambda$; the limiting curve is a convex bowl. The middle panel marks the edge of the domain of $\Lambda$ with a dashed vertical wall and shows the slope of $\Lambda$ shooting up to vertical as the dial nears that wall: the steepness condition. The bottom panel shows the resulting rate function $\Lambda^{*}(x)$ as a convex valley over the outcome-axis $x$, touching zero at the typical value, with an arrow labelled "Legendre-Fenchel flip" carrying a slope in the middle panel to a point in the bottom panel.

   Lambda_n(lambda) -> Lambda(lambda)         "dial" language
      |        _                        curves for n=1,2,3,... pile up
      |   __ ./ \. __                   onto a limiting convex bowl
------+----o-------o-----  lambda
      |
      |  slope of Lambda  -> infinity   STEEPNESS: as lambda nears the
      |  as lambda nears the wall ||    edge of its domain, the tangent
------+------------------------||--     turns vertical, so every target
      |                        ||       slope x is realised by some dial
                |
                |  Legendre-Fenchel flip  (slope x  ->  point x)
                v
   I(x)=Lambda*(x)                          "outcome" language
      |\                 /            valley bottom at the typical value,
      | \               /             where I = 0; I(x) > 0 elsewhere is
      |  \___       ___/              the exponential cost e^{-n I(x)} of
------+------\_____/----------- x     that deviation of the dependent data

Worked example Beginner

Consider a two-state weather chain: each day is Sun ( $+ 1$ ) or Rain ( $- 1$ ). It is sticky — a sunny day is followed by sun with probability $0.8$ , a rainy day by rain with probability $0.8$ . Let $Z_{n}$ be the average daily score over $n$ days. The days are not independent, so Cramér does not apply directly; we use the Gärtner-Ellis input instead.

Step 1. Identify the tuned-growth limit. For such a chain the limiting curve $\Lambda(\lambda)$ is the logarithm of the largest growth-rate of the tuned transition table, its dominant eigenvalue. The tuned table has entries $0.8e^{\lambda},\ 0.2e^{-\lambda}$ in the Sun row and $0.2e^{\lambda},\ 0.8e^{-\lambda}$ in the Rain row, so its trace is $1.6\cosh\lambda$ and its determinant is $0.64-0.04=0.6$, and the limit works out to $$ \Lambda(\lambda) = \log\Big( 0.8\cosh\lambda + \sqrt{0.64\cosh^2\lambda - 0.6}\,\Big). $$ As a check, $\Lambda(0)=\log(0.8+0.2)=0$. We do not need the full derivation here, only its values.

Step 2. Locate the typical value. The slope at zero, $\Lambda'(0)$, is the long-run average score. By symmetry $\Lambda'(0) = 0$: equal sun and rain, average $0$. That is the no-cost value.

Step 3. Price a deviation by the flip. Ask the cost of the average sitting at $x = 0.5$ (three-quarters sunny). The flip gives $\Lambda^{*}(0.5) = \lambda^{*}\cdot 0.5 - \Lambda(\lambda^{*})$ at the dial $\lambda^{*}$ whose slope matches $0.5$. Differentiating Step 1 gives $\Lambda'(\lambda)=0.8\sinh\lambda/\sqrt{0.64\cosh^2\lambda-0.6}$; solving $\Lambda'(\lambda^{*}) = 0.5$ gives $\sinh^2\lambda^{*}=1/48$, that is $\lambda^{*}=\tfrac12\log\tfrac43\approx 0.1438$, with $\Lambda(\lambda^{*})=\log(0.6\sqrt3)\approx 0.0385$, so $$ \Lambda^*(0.5) \approx 0.1438 \times 0.5 - 0.0385 = 0.0719 - 0.0385 = 0.0334. $$

Step 4. Read off the probability scale. Over $n = 100$ days, the chance the average sits near $0.5$ decays like $e^{-n\Lambda^{*}(0.5)} = e^{-100 \times 0.0334} = e^{-3.34} \approx 3.5 \times 10^{-2}$. For comparison, independent fair days would cost $\Lambda^{*}(0.5)\approx 0.131$ per day: the stickiness makes long sunny runs much cheaper.

What this tells us. The dependence did not change the shape of the recipe at all — one limiting curve, one flip, one exponent. The stickiness of the weather is fully absorbed into the curve $Λ$ ; everything downstream is identical to Cramér. That absorption is the whole point of Gärtner-Ellis.
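The four steps can be reproduced numerically. The sketch below is illustrative only: it assumes the tuned table is $P_{xy}e^{\lambda f(y)}$ with $f(\text{Sun})=+1$, $f(\text{Rain})=-1$, and the helper names (`tilted_matrix`, `solve_tilt`) are invented for this example.

```python
import math

STAY = 0.8  # probability that tomorrow repeats today's weather (from the example)

def tilted_matrix(lam):
    # (P_lam)[x][y] = P[x][y] * exp(lam * f(y)), with f(Sun) = +1, f(Rain) = -1
    return [[STAY * math.exp(lam), (1 - STAY) * math.exp(-lam)],
            [(1 - STAY) * math.exp(lam), STAY * math.exp(-lam)]]

def Lambda(lam):
    # log of the dominant eigenvalue of the 2x2 tilted matrix (quadratic formula)
    m = tilted_matrix(lam)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.log(0.5 * (tr + math.sqrt(tr * tr - 4 * det)))

def slope(lam, h=1e-6):
    # central finite difference for Lambda'(lam)
    return (Lambda(lam + h) - Lambda(lam - h)) / (2 * h)

def solve_tilt(x, lo=0.0, hi=5.0):
    # bisection: Lambda' is increasing because Lambda is convex
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if slope(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x = 0.5
lam_star = solve_tilt(x)
rate = lam_star * x - Lambda(lam_star)
print(f"lambda* = {lam_star:.4f}  Lambda(lambda*) = {Lambda(lam_star):.4f}  "
      f"rate = {rate:.4f}  exp(-100*rate) = {math.exp(-100 * rate):.3g}")
```

Changing `STAY` to `0.5` recovers independent fair days, which is a quick way to see how much of the cost is due to stickiness.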

Check your understanding Beginner

Exercise (easy, multiple choice).

The Gärtner-Ellis theorem generalises Cramér's theorem by dropping which assumption?

A. that the rate function is the Legendre-Fenchel conjugate
B. that the samples are independent and identically distributed
C. that the deviation costs decay exponentially
D. that the limiting curve is convex

Hint

Think about what kind of data Cramér requires and what Gärtner-Ellis allows — Markov chains, Gaussian processes, correlated sequences.

Answer

B. independence and identical distribution.

Feedback-correct: correct — Gärtner-Ellis keeps the conjugate-rate-function picture and the exponential decay, but applies to dependent, non-identically-distributed sequences. Feedback-wrong: the conjugate rate, the exponential decay, and convexity of the limit are all retained; only the i.i.d. hypothesis is dropped.

Formal definition Intermediate+

Let $\{Z_n\}_{n\ge 1}$ be a sequence of $\mathbb{R}^d$-valued random vectors on a common probability space (no independence, no identical distribution assumed). The object that replaces the single-sample cumulant generating function of Cramér is its scaled limit.

Definition (scaled / limiting logarithmic moment generating function). For each $n$ set $$ \Lambda_n(\lambda) := \log \mathbb{E}\, e^{n\langle \lambda, Z_n\rangle}, \qquad \lambda \in \mathbb{R}^d, $$ the logarithmic moment generating function of $Z_n$ at speed $n$. The limiting logarithmic moment generating function is the pointwise limit, assumed to exist in $(-\infty, +\infty]$, $$ \boxed{\;\Lambda(\lambda) \;:=\; \lim_{n\to\infty} \frac{1}{n}\,\Lambda_n(\lambda) \;=\; \lim_{n\to\infty}\frac{1}{n}\log\mathbb{E}\, e^{n\langle\lambda, Z_n\rangle}.\;} $$ We assume throughout that $0 \in \operatorname{int}(\operatorname{dom}\Lambda)$, where $\operatorname{dom}\Lambda = \{\lambda : \Lambda(\lambda) < \infty\}$. The function $\Lambda$ is convex (a pointwise limit of the convex $\frac{1}{n}\Lambda_n$, each convex by Hölder as in 37.07.03) and satisfies $\Lambda(0) = 0$.

For an i.i.d. average $Z_n = \bar{S}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, one has $\Lambda_n(\lambda) = \log\mathbb{E}\,e^{\langle\lambda, S_n\rangle} = n\log\mathbb{E}\,e^{\langle\lambda, X_1\rangle}$ with $S_n=\sum_{i=1}^n X_i$, so $\Lambda(\lambda) = \log\mathbb{E}\,e^{\langle\lambda, X_1\rangle}$ is exactly the Cramér cumulant generating function: Gärtner-Ellis specialises to Cramér 37.07.02 in that case.

Definition (Fenchel-Legendre rate function). The candidate rate function is the conjugate of $\Lambda$ from 37.07.03, $$ \Lambda^*(x) := \sup_{\lambda \in \mathbb{R}^d}\big(\langle\lambda, x\rangle - \Lambda(\lambda)\big), \qquad x \in \mathbb{R}^d. $$ By the conjugate machinery of 37.07.03, $\Lambda^*$ is convex, lower-semicontinuous, non-negative, and (because $0 \in \operatorname{int}(\operatorname{dom}\Lambda)$) a good rate function with compact sublevel sets.
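As a sanity check on the definition, the conjugate can be computed numerically in the i.i.d. fair $\pm1$ coin case, where $\Lambda(\lambda)=\log\cosh\lambda$ and the conjugate has the closed form $\tfrac12[(1+x)\log(1+x)+(1-x)\log(1-x)]$. The sketch below is a minimal one-dimensional illustration; the function names and the search window $[-20,20]$ are choices made for this example, not part of the theorem.

```python
import math

def conjugate(Lambda, x, lam_lo=-20.0, lam_hi=20.0, iters=200):
    # sup over lam of lam*x - Lambda(lam); the objective is concave,
    # so a ternary search on a bracketing window finds the maximum
    g = lambda lam: lam * x - Lambda(lam)
    lo, hi = lam_lo, lam_hi
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

def Lambda_coin(lam):
    # log cosh(lam), written to avoid overflow for large |lam|
    a = abs(lam)
    return a + math.log1p(math.exp(-2 * a)) - math.log(2)

def rate_coin(x):
    # closed-form conjugate of log cosh on (-1, 1)
    return 0.5 * ((1 + x) * math.log(1 + x) + (1 - x) * math.log(1 - x))

for x in (0.0, 0.25, 0.5, 0.9):
    print(x, conjugate(Lambda_coin, x), rate_coin(x))
```

The window must contain the maximising tilt; for targets close to $\pm1$ the optimal $\lambda$ grows without bound and the window would need widening.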

Definition (exposed point and exposing hyperplane). A point $x \in \mathbb{R}^d$ is an exposed point of $\Lambda^*$ if there exists $\lambda \in \mathbb{R}^d$ (the exposing hyperplane) such that $$ \langle\lambda, x\rangle - \Lambda^*(x) > \langle\lambda, y\rangle - \Lambda^*(y) \quad\text{for all } y \neq x. $$ Write $\mathcal{F}$ for the set of exposed points whose exposing $\lambda$ lies in $\operatorname{int}(\operatorname{dom}\Lambda)$.

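Definition (essential smoothness / steepness). $\Lambda$ is essentially smooth if $\operatorname{int}(\operatorname{dom}\Lambda) \neq \emptyset$, $\Lambda$ is differentiable on $\operatorname{int}(\operatorname{dom}\Lambda)$, and $\Lambda$ is steep: $|\nabla\Lambda(\lambda_k)| \to \infty$ whenever $\lambda_k \to \lambda_0 \in \partial(\operatorname{dom}\Lambda)$ with $\lambda_k\in\operatorname{int}(\operatorname{dom}\Lambda)$, as in 37.07.03.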

Theorem statement (Gärtner-Ellis). Assume $Λ$ exists as above with $0 \in int (dom Λ)$ and is lower-semicontinuous.

(Upper bound.) For every closed $F \subseteq \mathbb{R}^d$, $$ \limsup_n \tfrac1n\log\mathbb{P}(Z_n\in F)\le -\inf_{x\in F}\Lambda^*(x). $$

(Lower bound at exposed points.) For every open $G \subseteq \mathbb{R}^d$, $$ \liminf_n \tfrac1n\log\mathbb{P}(Z_n\in G)\ge -\inf_{x\in G\cap\mathcal{F}}\Lambda^*(x). $$

(Full LDP.) If $\Lambda$ is additionally essentially smooth, then $\mathcal{F}\supseteq\operatorname{int}(\operatorname{dom}\Lambda^*)$, so the exposed lower bound matches the upper bound, and $\{Z_n\}$ satisfies the LDP at speed $1/n$ with good rate function $\Lambda^*$.

Counterexamples to common slips

Existence of $Λ$ is a real hypothesis, not automatic. If the scaled log moment generating functions oscillate — e.g. $Z_{n}$ alternates between two laws with different cumulant behaviour along even and odd $n$ — the limit $Λ (λ)$ may fail to exist, and Gärtner-Ellis says nothing. One must verify convergence, typically via a spectral or mixing argument, before invoking the theorem.
The bare theorem gives only an exposed-point lower bound. Without essential smoothness, the lower bound is $-\inf_{G\cap\mathcal{F}}\Lambda^*$, which can be strictly smaller (a weaker bound) than $-\inf_{G}\Lambda^*$ when $G$ meets $\operatorname{dom}\Lambda^*$ only at non-exposed points. The full LDP with rate $\Lambda^*$ requires steepness; otherwise the true rate may be the lower-semicontinuous regularisation of $\Lambda^*$ restricted to exposed points.
$Λ$ may be non-differentiable, signalling a phase transition. A corner in $Λ$ corresponds to a flat segment (an affine stretch) in $Λ^{*}$ , i.e. an interval of values sharing one tilt. This is not a pathology to be smoothed away; in statistical mechanics it is a first-order phase transition, and the non-exposed points on the flat segment are exactly where the lower bound needs care.
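The corner-to-flat-segment correspondence in the last item can be seen on a toy curve. The sketch below uses the hypothetical $\Lambda(\lambda)=\tfrac12\lambda^2+|\lambda|$, chosen only because it has a corner at $0$ with one-sided slopes $\pm1$; it is not claimed to be the limit of any particular sequence. Its conjugate should vanish on $[-1,1]$ and equal $\tfrac12(|x|-1)^2$ outside.

```python
def Lambda_corner(lam):
    # convex, Lambda(0) = 0, with a corner at 0 (one-sided slopes -1 and +1)
    return 0.5 * lam * lam + abs(lam)

def conjugate_on_grid(Lambda, x, lam_max=10.0, steps=200001):
    # brute-force sup of lam*x - Lambda(lam) over an evenly spaced grid
    best = float("-inf")
    for i in range(steps):
        lam = -lam_max + 2 * lam_max * i / (steps - 1)
        best = max(best, lam * x - Lambda(lam))
    return best

def expected_rate(x):
    # flat at zero on [-1, 1], quadratic beyond
    return 0.5 * max(abs(x) - 1.0, 0.0) ** 2

for x in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0):
    print(x, conjugate_on_grid(Lambda_corner, x), expected_rate(x))
```

Every $x$ in the flat stretch shares the single tilt $\lambda=0$, which is why no strict exposing inequality can single out one of them.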

Key theorem with proof Intermediate+

We prove the Gärtner-Ellis theorem in $\mathbb{R}^d$ in three steps: the Chernoff upper bound from convexity of $\Lambda$ alone, then the exposed-point tilting lower bound, then the essential-smoothness upgrade that makes them meet. The arguments are the Cramér arguments of 37.07.02 run against the limiting $\Lambda$ rather than a single-sample cumulant generating function.

Theorem (Gärtner-Ellis). Let $\{Z_n\}$ in $\mathbb{R}^d$ have limiting logarithmic moment generating function $\Lambda(\lambda) = \lim_n \frac{1}{n}\log\mathbb{E}\,e^{n\langle\lambda, Z_n\rangle}$, lower-semicontinuous with $0 \in \operatorname{int}(\operatorname{dom}\Lambda)$. Then the upper bound holds on all closed sets and the lower bound holds at exposed points with interior exposing tilt; if $\Lambda$ is essentially smooth, $\{Z_n\}$ satisfies the LDP at speed $1/n$ with good rate $\Lambda^*$.

Proof: upper bound (Chernoff against $\Lambda$). Fix $\lambda \in \operatorname{dom}\Lambda$. By Markov's inequality applied to the non-negative variable $e^{n\langle\lambda, Z_n\rangle}$, for the closed half-space $H_{\lambda,a} = \{x : \langle\lambda, x\rangle \geq \langle\lambda, a\rangle\}$, $$ \mathbb{P}(Z_n\in H_{\lambda,a})=\mathbb{P}(\langle\lambda,Z_n\rangle\ge\langle\lambda,a\rangle)\le e^{-n\langle\lambda,a\rangle}\,\mathbb{E}\,e^{n\langle\lambda,Z_n\rangle}=e^{-n\langle\lambda,a\rangle+\Lambda_n(\lambda)}. $$ Taking $\frac{1}{n}\log$ and $\limsup_n$, and using $\frac{1}{n}\Lambda_n(\lambda) \to \Lambda(\lambda)$, $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in H_{\lambda,a})\le -\big(\langle\lambda,a\rangle-\Lambda(\lambda)\big). $$

Now let $K$ be compact, $m = \inf_K\Lambda^*$, and fix $\delta>0$. For each $x \in K$ pick $\lambda_x$ with $\langle\lambda_x, x\rangle - \Lambda(\lambda_x) > \min(m, 1/\delta) - \delta$ (possible by definition of $\Lambda^*$; the cap $1/\delta$ handles the case $m = +\infty$). The open half-spaces $\{y : \langle\lambda_x, y\rangle > \langle\lambda_x, x\rangle - \delta\}$ cover $K$, so a finite subcover indexed by $x_1, \dots, x_N$ exists. Each is contained in the closed half-space $H_i=\{y : \langle\lambda_{x_i}, y\rangle \ge \langle\lambda_{x_i}, x_i\rangle - \delta\}$, so $\mathbb{P}(Z_n \in K) \leq \sum_{i=1}^{N}\mathbb{P}(Z_n \in H_i)$. The half-space bound, with its threshold lowered by $\delta$, and the finite-union (max) rule of 37.07.01 give $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le -\min_i\big(\langle\lambda_{x_i},x_i\rangle-\Lambda(\lambda_{x_i})\big)+\delta\le -\big(\min(m,1/\delta)-2\delta\big). $$ Letting $\delta \downarrow 0$ gives $\limsup_n \frac{1}{n}\log\mathbb{P}(Z_n \in K) \leq -\inf_K\Lambda^*$ on compacts.

Exponential tightness (supplied by finiteness of $\Lambda$ on a ball $\bar{B}(0,\rho) \subset \operatorname{dom}\Lambda$, which forces $\Lambda^*(x) \geq \rho\|x\| - \max_{\|\lambda\|\leq\rho}\Lambda(\lambda) \to \infty$ as in 37.07.03 Exercise 7) upgrades the compact bound to all closed sets via 37.07.01.

Proof: lower bound at exposed points (tilting against $\Lambda$). Let $x$ be an exposed point with exposing tilt $\eta \in \operatorname{int}(\operatorname{dom}\Lambda)$, so $\langle\eta, x\rangle - \Lambda^*(x) > \langle\eta, y\rangle - \Lambda^*(y)$ for $y \neq x$, equivalently $\Lambda^*(x) = \langle\eta, x\rangle - \Lambda(\eta)$ with $\eta \in \partial\Lambda^*(x)$ (Fenchel-Young equality 37.07.03). Fix open $G \ni x$ and $\delta > 0$ with $B(x,\delta) \subseteq G$. Define the tilted laws $\widetilde{\mathbb{P}}_n$ of $Z_n$ by $$ \frac{d\widetilde{\mathbb{P}}_n}{d\mathbb{P}}=\exp\!\big(n\langle\eta,Z_n\rangle-\Lambda_n(\eta)\big), $$ a probability measure since $\mathbb{E}\,e^{n\langle\eta, Z_n\rangle - \Lambda_n(\eta)} = 1$. Reversing the change of measure on $A_n = \{Z_n \in B(x,\delta)\}$, $$ \mathbb{P}(A_n)=\widetilde{\mathbb{E}}_n\big[\mathbf 1_{A_n}\,e^{-n\langle\eta,Z_n\rangle+\Lambda_n(\eta)}\big]. $$ On $A_n$, $\langle\eta, Z_n\rangle \leq \langle\eta, x\rangle + \|\eta\|\delta$, so $e^{-n\langle\eta, Z_n\rangle} \geq e^{-n\langle\eta, x\rangle - n\|\eta\|\delta}$, giving $$ \mathbb{P}(A_n)\ge e^{-n\langle\eta,x\rangle-n\|\eta\|\delta+\Lambda_n(\eta)}\,\widetilde{\mathbb{P}}_n(A_n). $$ Taking $\frac{1}{n}\log$, using $\frac{1}{n}\Lambda_n(\eta) \to \Lambda(\eta)$ and $\Lambda^*(x) = \langle\eta, x\rangle - \Lambda(\eta)$, $$ \liminf_n\tfrac1n\log\mathbb{P}(A_n)\ge -\Lambda^*(x)-\|\eta\|\delta+\liminf_n\tfrac1n\log\widetilde{\mathbb{P}}_n(A_n). $$

It remains to show $\widetilde{\mathbb{P}}_n(A_n) \to 1$, i.e. that under the tilt $Z_n$ concentrates at $x$. The tilted scaled log moment generating function is $\lambda \mapsto \Lambda(\eta + \lambda) - \Lambda(\eta)$, whose gradient at $\lambda = 0$ is $\nabla\Lambda(\eta) = x$ (the exposing relation reads $x = \nabla\Lambda(\eta)$). Its conjugate has a unique zero at $x$; applying the already-proved upper bound under $\widetilde{\mathbb{P}}_n$ to the closed set $B(x,\delta)^c$ shows $\widetilde{\mathbb{P}}_n(B(x,\delta)^c) \to 0$, hence $\widetilde{\mathbb{P}}_n(A_n) \to 1$ and its $\frac{1}{n}\log \to 0$. Therefore $\liminf_n\tfrac1n\log\mathbb{P}(A_n)\ge-\Lambda^*(x)-\|\eta\|\delta$; let $\delta\downarrow0$. Taking the supremum over exposed $x\in G$ gives $\liminf_n\tfrac1n\log\mathbb{P}(Z_n\in G)\ge-\inf_{G\cap\mathcal F}\Lambda^*$.

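Proof: the essential-smoothness upgrade. When $\Lambda$ is essentially smooth, steepness makes $\nabla\Lambda$ map $\operatorname{int}(\operatorname{dom}\Lambda)$ onto $\operatorname{int}(\operatorname{dom}\Lambda^*)$: every $x \in \operatorname{int}(\operatorname{dom}\Lambda^*)$ equals $\nabla\Lambda(\eta)$ for some interior $\eta$, and convex duality 37.07.03 makes such $x$ exposed with exposing tilt $\eta$. Thus $\mathcal{F} \supseteq \operatorname{int}(\operatorname{dom}\Lambda^*)$. Since $\Lambda^*$ is convex, $\inf_G\Lambda^* = \inf_{G\cap\operatorname{int}(\operatorname{dom}\Lambda^*)}\Lambda^*$ for open $G$ (slide any point of $G\cap\operatorname{dom}\Lambda^*$ slightly toward an interior point; convexity keeps the value from jumping up), so the exposed lower bound equals $-\inf_G\Lambda^*$. The matching upper and lower bounds are the LDP with good rate $\Lambda^*$. $\square$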

Bridge. This theorem builds toward the entire dependent-data large-deviations toolkit — Markov-chain occupation costs, queueing under correlated input, Gaussian-field excursions — and appears again in the Donsker-Varadhan theory 37.07.06, where the limiting $Λ$ is computed as a principal eigenvalue and then fed into this exact conjugation. This is exactly the Cramér proof of 37.07.02 with the single-sample cumulant generating function replaced by the limit $Λ$ : the Chernoff upper bound and the exponential-tilting lower bound are reused verbatim, now indexed by $Λ_{n}$ . The central insight is that all dependence is compressed into one convex limit $Λ$ , after which the rare-event cost is its conjugate by the same Fenchel-Young equality of 37.07.03; putting these together, Gärtner-Ellis generalises Cramér 37.07.02 from independent to arbitrary correlated sequences and is dual to the moment description through the Legendre-Fenchel transform, with steepness the precise condition that makes the construction's optimal tilt always exist.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

A two-state Markov chain on ${0, 1}$ has additive functional $S_{n} = \sum_{i = 1}^{n} f (X_{i})$ with $f (0) = 0, f (1) = 1$ . The limiting curve is $Λ (λ) = lo g ρ (P_{λ})$ , the log of the dominant eigenvalue of the tilted matrix $P_{λ} = P diag (e^{λ f (0)}, e^{λ f (1)})$ . For $P=\begin{psmallmatrix}1-a&a\\b&1-b\end{psmallmatrix}$ , write $P_{λ}$ and the characteristic equation for $ρ (P_{λ})$ .

Hint

Multiply column $1$ by $e^{0} = 1$ and column $2$ by $e^{λ}$ , then solve $det (P_{λ} - ρ I) = 0$ .

Answer

$P_\lambda=\begin{psmallmatrix}1-a & a\,e^{\lambda}\\ b & (1-b)e^{\lambda}\end{psmallmatrix}$. The characteristic equation $\det(P_\lambda - \rho I) = 0$ is $$ \rho^2-\big((1-a)+(1-b)e^{\lambda}\big)\rho+e^{\lambda}\big((1-a)(1-b)-ab\big)=0, $$ so $$ \rho(P_\lambda) = \tfrac12\Big[(1-a)+(1-b)e^{\lambda} + \sqrt{\big((1-a)+(1-b)e^{\lambda}\big)^{2} - 4e^{\lambda}\big((1-a)(1-b)-ab\big)}\,\Big] $$ and $\Lambda(\lambda) = \log\rho(P_\lambda)$. This $\Lambda$ is the Gärtner-Ellis input for the chain's additive functional; the Perron-Frobenius eigenvalue is smooth and steep, so the full LDP holds with rate $\Lambda^*$.
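The closed form can be cross-checked against a direct eigenvalue computation. The sketch below compares the larger root of the characteristic equation with a plain power iteration on $P_\lambda$; the values $a=0.3$, $b=0.2$, $\lambda=0.7$ are arbitrary test inputs, and the function names are invented for the illustration.

```python
import math

def rho_closed_form(a, b, lam):
    # larger root of rho^2 - tr*rho + det = 0 for the tilted matrix
    tr = (1 - a) + (1 - b) * math.exp(lam)
    det = math.exp(lam) * ((1 - a) * (1 - b) - a * b)
    return 0.5 * (tr + math.sqrt(tr * tr - 4 * det))

def rho_power_iteration(a, b, lam, iters=500):
    # P_lam scales column 2 of P by exp(lam); its entries are positive,
    # so power iteration converges to the Perron-Frobenius eigenvalue
    e = math.exp(lam)
    m = [[1 - a, a * e], [b, (1 - b) * e]]
    v = [0.5, 0.5]
    growth = 1.0
    for _ in range(iters):
        w = [m[0][0] * v[0] + m[0][1] * v[1],
             m[1][0] * v[0] + m[1][1] * v[1]]
        growth = w[0] + w[1]          # v sums to 1, so this estimates rho
        v = [w[0] / growth, w[1] / growth]
    return growth

a, b, lam = 0.3, 0.2, 0.7
print(rho_closed_form(a, b, lam), rho_power_iteration(a, b, lam))
print("Lambda(0) =", math.log(rho_closed_form(a, b, 0.0)))
```

At $\lambda=0$ the discriminant collapses to $(a+b)^2$ and the eigenvalue is $1$, matching $\Lambda(0)=0$.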

Exercise 4 (medium, symbolic).

Show that if $Λ (λ) = \frac{1}{2} ⟨ Σ λ, λ ⟩$ for a symmetric positive-definite $Σ$ (the general Gaussian limit), then $Λ^{*} (x) = \frac{1}{2} ⟨ Σ^{- 1} x, x ⟩$ , and confirm $Λ$ is essentially smooth so the full LDP holds.

Hint

Conjugate as in 37.07.03 Exercise 2; for steepness note $dom Λ = R^{d}$ .

Answer

Maximising $⟨ λ, x ⟩ - \frac{1}{2} ⟨ Σ λ, λ ⟩$ gives stationary point $x - Σ λ = 0$ , i.e. $λ = Σ^{- 1} x$ , and $Λ^{*} (x) = ⟨ Σ^{- 1} x, x ⟩ - \frac{1}{2} ⟨ Σ Σ^{- 1} x, Σ^{- 1} x ⟩ = \frac{1}{2} ⟨ Σ^{- 1} x, x ⟩$ . Since $dom Λ = R^{d}$ and $Λ$ is everywhere differentiable, the boundary of the domain is empty, so steepness holds vacuously and $Λ$ is essentially smooth. Hence Gärtner-Ellis gives the full LDP with the Gaussian rate $\frac{1}{2} ⟨ Σ^{- 1} x, x ⟩$ ; $Σ$ is the long-run covariance, into which all correlations are folded.
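A small numerical instance makes the stationary-point calculation concrete. The sketch below uses a hypothetical $2\times2$ covariance and target, solves $\Sigma\lambda=x$ by Cramer's rule, and compares the maximised objective with $\tfrac12\langle\Sigma^{-1}x,x\rangle$; all values are illustrative.

```python
SIGMA = [[2.0, 0.5], [0.5, 1.0]]   # hypothetical symmetric positive-definite matrix
x = (1.0, -1.0)                    # hypothetical target value

def solve_2x2(m, b):
    # Cramer's rule for m @ lam = b
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] * b[0] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det)

def objective(lam):
    # <lam, x> - (1/2) <SIGMA lam, lam>
    quad = sum(lam[i] * SIGMA[i][j] * lam[j] for i in range(2) for j in range(2))
    return lam[0] * x[0] + lam[1] * x[1] - 0.5 * quad

lam_star = solve_2x2(SIGMA, x)                            # stationary tilt
rate = 0.5 * (lam_star[0] * x[0] + lam_star[1] * x[1])    # (1/2) <SIGMA^{-1} x, x>
print(lam_star, objective(lam_star), rate)
print(objective((lam_star[0] + 0.1, lam_star[1])))        # perturbed tilt scores lower
```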

Exercise 6 (hard, symbolic).

Carry out the tilting lower bound for a half-line: with $d = 1$ , $x > Λ^{'} (0)$ an exposed point with interior exposing tilt $η > 0$ ( $Λ^{'} (η) = x$ ), show $lim inf_{n} \frac{1}{n} lo g P (Z_{n} \geq x) \geq - Λ^{*} (x)$ .

Hint

Tilt by $η$ ; restrict to ${x \leq Z_{n} < x + δ}$ and bound the density factor $e^{- n η Z_{n}}$ there using $η > 0$ .

Answer

Tilt by $\eta$ via $d\widetilde{\mathbb{P}}_n/d\mathbb{P} = e^{n\eta Z_n - \Lambda_n(\eta)}$. Then $$ \mathbb{P}(Z_n\ge x)\ge\widetilde{\mathbb{E}}_n\big[\mathbf 1_{\{x\le Z_n<x+\delta\}}\,e^{-n\eta Z_n+\Lambda_n(\eta)}\big]. $$ On $\{Z_n < x + \delta\}$ with $\eta > 0$, $e^{-n\eta Z_n} \geq e^{-n\eta(x+\delta)}$, so $$ \mathbb{P}(Z_n\ge x)\ge e^{-n\eta(x+\delta)+\Lambda_n(\eta)}\,\widetilde{\mathbb{P}}_n(x\le Z_n<x+\delta). $$ Take $\frac{1}{n}\log$: using $\frac{1}{n}\Lambda_n(\eta) \to \Lambda(\eta)$ and $\Lambda^*(x) = \eta x - \Lambda(\eta)$, the exponential factor tends to $-\Lambda^*(x) - \eta\delta$. Under $\widetilde{\mathbb{P}}_n$ the tilted curve $\Lambda(\eta + \cdot) - \Lambda(\eta)$ has slope $x$ at $0$, so $Z_n$ concentrates at $x$ and $\widetilde{\mathbb{P}}_n(x \leq Z_n < x + \delta)$ is bounded below by a positive constant (mass on one side of the tilted mean), giving $\frac{1}{n}\log \to 0$. Hence $\liminf_n \frac{1}{n}\log\mathbb{P}(Z_n \geq x) \geq -\Lambda^*(x) - \eta\delta$; let $\delta \downarrow 0$.

Exercise 7 (hard, symbolic).

Exhibit a non-steep limit $Λ$ where the bare Gärtner-Ellis lower bound fails to reach $- in f_{G} Λ^{*}$ , and identify the non-exposed values.

Hint

Reuse $\Lambda(\lambda) = \sqrt{1 + \lambda^{2}}$ from 37.07.03: $\nabla\Lambda$ has bounded range $(-1, 1)$.

Answer

Suppose a sequence $Z_n$ has limit $\Lambda(\lambda) = \sqrt{1 + \lambda^{2}} - 1$ (subtracting $1$ enforces $\Lambda(0) = 0$), which is smooth and strictly convex with $\Lambda'(\lambda) = \lambda/\sqrt{1 + \lambda^{2}} \in (-1, 1)$, bounded, hence not steep. The conjugate is $\Lambda^*(x) = 1 - \sqrt{1 - x^{2}}$ on $[-1, 1]$ and $+\infty$ outside. Only $x \in (-1, 1)$ are realised as $\Lambda'(\eta)$ for finite $\eta$, so the exposed set is $\mathcal{F} = (-1, 1)$; the endpoints $x = \pm 1$, where $\Lambda^* = 1$, are non-exposed (no finite tilt reaches them). For $G$ a neighbourhood of $1$ within $[1, \infty)$, $\inf_{G\cap\mathcal{F}}\Lambda^*$ is taken over points $< 1$ and the bare lower bound is governed by interior values, while the true infimum $\inf_G\Lambda^* = \Lambda^*(1) = 1$ is attained only at the non-exposed endpoint. Steepness is exactly what would close this gap; without it the lower bound is genuinely weaker.
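The conjugate pair in this answer can be checked numerically. The sketch below takes the curve to be $\Lambda(\lambda)=\sqrt{1+\lambda^2}-1$ (the reading consistent with the stated slope range $(-1,1)$), verifies $\Lambda^*(x)=1-\sqrt{1-x^2}$ through the explicit tilt $\eta=x/\sqrt{1-x^2}$, and shows that at the endpoint $x=1$ the objective $\lambda-\Lambda(\lambda)$ approaches $1$ without any finite tilt attaining it. Function names are illustrative.

```python
import math

def Lambda(lam):
    return math.sqrt(1 + lam * lam) - 1

def slope(lam):
    # Lambda'(lam), always strictly inside (-1, 1)
    return lam / math.sqrt(1 + lam * lam)

def rate_closed_form(x):
    return 1 - math.sqrt(1 - x * x)

def rate_via_tilt(x):
    # the tilt solving slope(eta) = x exists only for |x| < 1
    eta = x / math.sqrt(1 - x * x)
    return eta * x - Lambda(eta)

for x in (0.0, 0.5, 0.9, 0.999):
    print(x, rate_via_tilt(x), rate_closed_form(x))

# endpoint x = 1: the objective increases towards 1 as the tilt grows
endpoint_values = [lam - Lambda(lam) for lam in (1.0, 10.0, 100.0, 1000.0)]
print(endpoint_values)
```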

Exercise 8 (hard, symbolic).

Prove that the limiting curve $Λ (λ) = lim_{n} \frac{1}{n} lo g E e^{n ⟨ λ, Z_{n} ⟩}$ is convex whenever the limit exists, with no independence assumption on $Z_{n}$ .

Hint

Each $\frac{1}{n} Λ_{n}$ is convex by Hölder; a pointwise limit of convex functions is convex.

Answer

Fix $θ \in [0, 1]$ and $λ_{1}, λ_{2}$ . For each $n$ , applying Hölder's inequality with exponents $1/ θ, 1/ (1 - θ)$ to the factorisation $e^{n ⟨ θ λ_{1} + (1 - θ) λ_{2}, Z_{n} ⟩} = e^{n θ ⟨ λ_{1}, Z_{n} ⟩} \cdot e^{n (1 - θ) ⟨ λ_{2}, Z_{n} ⟩}$ gives $$ \mathbb{E},e^{n\langle\theta\lambda_1+(1-\theta)\lambda_2,Z_n\rangle}\le\big(\mathbb{E},e^{n\langle\lambda_1,Z_n\rangle}\big)^{\theta}\big(\mathbb{E},e^{n\langle\lambda_2,Z_n\rangle}\big)^{1-\theta}, $$ which uses no independence — only the single-expectation Hölder inequality. Taking $\frac{1}{n} lo g$ , $\frac{1}{n} Λ_{n} (θ λ_{1} + (1 - θ) λ_{2}) \leq θ \frac{1}{n} Λ_{n} (λ_{1}) + (1 - θ) \frac{1}{n} Λ_{n} (λ_{2})$ . Passing to the limit $n \to \infty$ preserves the inequality, so $Λ (θ λ_{1} + (1 - θ) λ_{2}) \leq θ Λ (λ_{1}) + (1 - θ) Λ (λ_{2})$ : $Λ$ is convex. Convexity is what guarantees $Λ^{*}$ is a sensible (convex, lsc) rate function via 37.07.03.
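The Hölder step can be spot-checked numerically. The sketch below draws an arbitrary finite-support law for a one-dimensional $Z_n$ (a hypothetical stand-in; the same law is reused for every $n$) and confirms that the convexity gap of $\tfrac1n\Lambda_n$ is non-negative at randomly chosen tilts.

```python
import math
import random

random.seed(0)
values = [random.uniform(-2, 2) for _ in range(6)]   # support points of Z_n
weights = [random.random() for _ in range(6)]
probs = [w / sum(weights) for w in weights]          # arbitrary probabilities

def scaled_log_mgf(lam, n):
    # (1/n) log E exp(n * lam * Z_n), with a log-sum-exp shift for stability
    top = max(n * lam * z for z in values)
    total = sum(p * math.exp(n * lam * z - top) for p, z in zip(probs, values))
    return (top + math.log(total)) / n

def convexity_gap(l1, l2, theta, n):
    # theta*f(l1) + (1-theta)*f(l2) - f(theta*l1 + (1-theta)*l2), should be >= 0
    mix = theta * l1 + (1 - theta) * l2
    return (theta * scaled_log_mgf(l1, n)
            + (1 - theta) * scaled_log_mgf(l2, n)
            - scaled_log_mgf(mix, n))

gaps = [convexity_gap(random.uniform(-3, 3), random.uniform(-3, 3),
                      random.random(), n)
        for n in (1, 5, 50) for _ in range(100)]
print(min(gaps))
```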

Advanced results Master

Markov additive functionals and the tilted-generator eigenvalue

For an ergodic finite-state Markov chain $\{X_i\}$ with transition matrix $P$ and a function $f$, the additive functional $S_n = \sum_{i=1}^{n} f(X_i)$ has $Z_n = S_n/n$ with limiting curve $$ \Lambda(\lambda)=\log\rho\big(P_\lambda\big),\qquad (P_\lambda)_{xy}=P_{xy}\,e^{\lambda f(y)}, $$ where $\rho(P_\lambda)$ is the Perron-Frobenius eigenvalue of the tilted matrix $P_\lambda$. That this is the limit follows from $\mathbb{E}\,e^{\lambda S_n} = \langle\mu_0, P_\lambda^{n}\mathbf{1}\rangle$ and the spectral-gap asymptotics $P_\lambda^{n} \sim \rho(P_\lambda)^{n}\,v_\lambda w_\lambda^{\top}$; the irreducibility and positivity of $P_\lambda$ (for $\lambda$ where entries stay finite) make $\rho(P_\lambda)$ a simple, analytic, strictly-log-convex eigenvalue by Perron-Frobenius, hence $\Lambda$ is smooth and steep on the interior of its domain ^{[den Hollander §III.4]}. Gärtner-Ellis then yields the LDP for $S_n/n$ with rate $\Lambda^*$, the level-1 rate for the chain. Contracting the level-2 occupation-measure LDP (whose rate is the Donsker-Varadhan functional) along the map $\nu \mapsto \int f\,d\nu$ recovers the same $\Lambda^*$, exhibiting Gärtner-Ellis as the contracted shadow of the empirical-measure principle.
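The convergence $\tfrac1n\log\langle\mu_0,P_\lambda^n\mathbf 1\rangle\to\log\rho(P_\lambda)$ can be watched directly. The sketch below uses a hypothetical two-state chain with $f=(0,1)$ and a uniform initial law, and applies $P_\lambda$ repeatedly with rescaling to avoid overflow; the helper names and parameter values are assumptions for the example.

```python
import math

def scaled_log_mgf(P, f, lam, mu0, n):
    # (1/n) log E exp(lam * S_n) = (1/n) log <mu0, P_lam^n 1>,
    # accumulating the log of the normalising constants
    k = len(P)
    v = [1.0] * k
    log_scale = 0.0
    for _ in range(n):
        v = [sum(P[x][y] * math.exp(lam * f[y]) * v[y] for y in range(k))
             for x in range(k)]
        s = sum(v)
        v = [c / s for c in v]
        log_scale += math.log(s)
    return (log_scale + math.log(sum(m * c for m, c in zip(mu0, v)))) / n

def log_rho_2x2(P, f, lam):
    # log Perron-Frobenius eigenvalue of the 2x2 tilted matrix
    m = [[P[x][y] * math.exp(lam * f[y]) for y in range(2)] for x in range(2)]
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.log(0.5 * (tr + math.sqrt(tr * tr - 4 * det)))

P = [[0.7, 0.3], [0.2, 0.8]]   # hypothetical ergodic chain
f = [0.0, 1.0]
mu0 = [0.5, 0.5]
lam = 0.7
limit = log_rho_2x2(P, f, lam)
for n in (10, 100, 2000):
    print(n, scaled_log_mgf(P, f, lam, mu0, n), limit)
```

The gap closes at rate $O(1/n)$, the contribution of the eigenvector overlap $\langle\mu_0,v_\lambda\rangle\langle w_\lambda,\mathbf 1\rangle$.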

Stationary Gaussian processes and the spectral-density rate

For a centred stationary Gaussian sequence with summable autocovariance $r(k) = \operatorname{Cov}(X_0, X_k)$ and spectral density $g(\omega) = \sum_k r(k)e^{ik\omega}$, the sample mean $Z_n = \bar{X}_n$ has $\operatorname{Var}(n\bar{X}_n) = \sum_{|k|<n}(n - |k|)\,r(k) \sim n\,g(0)$, so $\tfrac1n\Lambda_n(\lambda) = \tfrac12\lambda^{2}\operatorname{Var}(n\bar{X}_n)/n$ yields $$ \Lambda(\lambda)=\tfrac12\,g(0)\,\lambda^2,\qquad \Lambda^*(x)=\frac{x^2}{2\,g(0)}. $$ The long-run variance $g(0) = \sum_k r(k)$, the spectral density at frequency zero, is the single scalar into which all temporal correlation collapses. Because $\operatorname{dom}\Lambda = \mathbb{R}$ has empty boundary, $\Lambda$ is steep by vacuity and the full LDP holds. Quadratic functionals (e.g. empirical covariances) give a non-Gaussian $\Lambda$ involving $\log\det$ of a tilted Toeplitz operator, where steepness must be checked against the spectral support, and phase-transition corners can appear when $g$ touches zero.
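For a concrete instance, take a stationary AR(1) sequence $X_t=\varphi X_{t-1}+\varepsilon_t$ with innovation variance $\sigma^2$ (an assumed example, not taken from the text above). Its autocovariance is $r(k)=\sigma^2\varphi^{|k|}/(1-\varphi^2)$ and the long-run variance is $g(0)=\sigma^2/(1-\varphi)^2$. The sketch below checks that $\operatorname{Var}(n\bar X_n)/n$ approaches $g(0)$ and evaluates the resulting rate.

```python
def scaled_variance(phi, sigma2, n):
    # Var(n * sample mean) / n = sum over |k| < n of (1 - |k|/n) * r(k)
    r0 = sigma2 / (1 - phi * phi)
    return sum((1 - abs(k) / n) * r0 * phi ** abs(k) for k in range(-n + 1, n))

def long_run_variance(phi, sigma2):
    # g(0) = sum over all k of r(k) for the AR(1) autocovariance
    return sigma2 / (1 - phi) ** 2

def gaussian_rate(x, g0):
    # Lambda*(x) = x^2 / (2 g(0))
    return x * x / (2 * g0)

phi, sigma2 = 0.5, 1.0
g0 = long_run_variance(phi, sigma2)
for n in (10, 100, 10000):
    print(n, scaled_variance(phi, sigma2, n), g0)
print("rate at x = 0.5:", gaussian_rate(0.5, g0))
```

Positive correlation ($\varphi>0$) inflates $g(0)$ above the marginal variance $r(0)$ and therefore lowers the cost of a given deviation.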

The role of steepness and the gap at non-exposed points

The proof delivers the upper bound from convexity of $\Lambda$ alone and the lower bound only at exposed points. The map closing the gap is $\nabla\Lambda : \operatorname{int}(\operatorname{dom}\Lambda) \to \operatorname{int}(\operatorname{dom}\Lambda^*)$. Steepness forces this map to be onto: as $\eta$ runs to the boundary of $\operatorname{dom}\Lambda$, $|\nabla\Lambda(\eta)| \to \infty$, so the image exhausts $\operatorname{int}(\operatorname{dom}\Lambda^*)$, and every interior target is exposed with interior exposing tilt. Drop steepness and $\nabla\Lambda$ has bounded range; targets beyond that range are non-exposed, the tilt realising them does not exist, and the lower bound stalls. This is the same loss of surjectivity of $\nabla\Lambda$ recorded for $\sqrt{1 + \lambda^{2}}$ in 37.07.03, now deciding whether the dependent-data LDP rate is the full conjugate or only its exposed restriction. A non-differentiable $\Lambda$ (a corner) marks a flat segment of $\Lambda^*$ and, in statistical-mechanics applications, a first-order phase transition.

The abstract sub-additive route

When the scaled log moment generating function is hard to compute, the rate can still be defined intrinsically. If $c_n(x) = -\log\mathbb{P}(Z_n \in B(x,\delta))$ is approximately sub-additive, $c_{m+n} \leq c_m + c_n + o(m+n)$, Fekete's lemma makes $\lim_n \frac{1}{n}c_n(x)$ exist and serve as the rate, with no recourse to $\Lambda$. This abstract Cramér theorem of Bahadur and Zabell ^{[Bahadur & Zabell 1979]} subsumes Gärtner-Ellis: when $\Lambda$ exists and is essentially smooth, the Fekete limit equals $\Lambda^*$, but the sub-additive formulation continues to apply to stationary mixing sequences and Banach-space-valued averages where no manageable $\Lambda$ is available ^{[Bryc 1990]}.

Synthesis. The central insight of Gärtner-Ellis is that arbitrary dependence in ${Z_{n}}$ is compressed into a single convex limit $Λ (λ) = lim_{n} \frac{1}{n} lo g E e^{n ⟨ λ, Z_{n} ⟩}$ , after which the rare-event cost is exactly its Fenchel-Legendre conjugate $Λ^{*}$ , so the principle is the dependent-data realisation of the conjugacy proved in 37.07.03. The foundational reason the bounds coincide is that both compute the same supremum: the Chernoff upper bound optimises the exponential tilt and the tilting lower bound realises the optimal tilt as a change of measure, with Fenchel-Young equality identifying them at exposed points — this is exactly the two-bets structure of Cramér 37.07.02, now run against $Λ$ rather than a single-sample cumulant generating function. Putting these together, Gärtner-Ellis generalises Cramér from independent to correlated data and is dual to the moment-generating description, while the bridge is steepness: it makes $\nablaΛ$ surject onto $int (dom Λ^{*})$ , so every interior value is exposed and the exposed lower bound upgrades to the full LDP. The construction appears again in the Donsker-Varadhan eigenvalue formula 37.07.06, which supplies the explicit $Λ$ for Markov additive functionals, and the bridge is that the level-1 rate $Λ^{*}$ proved here is the contraction of the level-2 occupation-measure rate along the mean map.

Full proof set Master

Proposition 1 (Chernoff upper bound on a half-space against $\Lambda$). Let $\{Z_n\}$ in $\mathbb{R}^d$ have $\frac{1}{n}\Lambda_n(\lambda) \to \Lambda(\lambda)$ for $\lambda \in \operatorname{dom}\Lambda$. For the half-space $H = \{x : \langle\lambda, x\rangle \geq \langle\lambda, a\rangle\}$, $\limsup_n \frac{1}{n}\log\mathbb{P}(Z_n \in H) \leq -(\langle\lambda, a\rangle - \Lambda(\lambda))$.

Proof. Markov's inequality on the non-negative variable $e^{n\langle\lambda, Z_n\rangle}$: $$ \mathbb{P}(Z_n\in H)=\mathbb{P}(\langle\lambda,Z_n\rangle\ge\langle\lambda,a\rangle)\le e^{-n\langle\lambda,a\rangle}\,\mathbb{E}\,e^{n\langle\lambda,Z_n\rangle}=e^{-n\langle\lambda,a\rangle+\Lambda_n(\lambda)}. $$ Take $\frac{1}{n}\log$ to get $\frac{1}{n}\log\mathbb{P}(Z_n \in H) \leq -\langle\lambda, a\rangle + \frac{1}{n}\Lambda_n(\lambda)$, then $\limsup_n$ and $\frac{1}{n}\Lambda_n(\lambda) \to \Lambda(\lambda)$ give the claim. $\square$
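In the simplest setting the Chernoff bound can be compared with an exact probability. The sketch below takes $Z_n$ to be the mean of $n$ independent fair $\pm1$ coins (the Cramér special case, chosen because the tail is an exact binomial sum), where optimising the half-line bound over $\lambda$ gives $\mathbb{P}(Z_n\ge a)\le e^{-n\Lambda^*(a)}$ with $\Lambda^*(a)=\tfrac12[(1+a)\log(1+a)+(1-a)\log(1-a)]$.

```python
import math

def rate_coin(a):
    # conjugate of log cosh, the rate for fair +/-1 coins
    return 0.5 * ((1 + a) * math.log(1 + a) + (1 - a) * math.log(1 - a))

def log_tail(n, a):
    # log P(mean of n fair +/-1 coins >= a), summed exactly with integers;
    # the mean is (2k - n)/n when k coins show +1
    k_min = math.ceil(n * (1 + a) / 2 - 1e-12)
    count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return math.log(count) - n * math.log(2)

a = 0.5
for n in (10, 100, 1000):
    # the exact value stays below the Chernoff exponent and approaches it
    print(n, log_tail(n, a) / n, -rate_coin(a))
```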

Proposition 2 (compact upper bound). Under the hypotheses of Proposition 1 with $0 \in \operatorname{int}(\operatorname{dom}\Lambda)$, for every compact $K$, $\limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le-\inf_K\Lambda^*$.

Proof. Fix $\delta > 0$. For each $x \in K$ choose $\lambda_x \in \operatorname{dom}\Lambda$ with $\langle\lambda_x, x\rangle - \Lambda(\lambda_x) \geq \min(\Lambda^*(x), 1/\delta) - \delta$, possible by definition of $\Lambda^*(x) = \sup_\lambda(\langle\lambda, x\rangle - \Lambda(\lambda))$. The open half-spaces $U_x = \{y : \langle\lambda_x, y\rangle > \langle\lambda_x, x\rangle - \delta\}$ contain $x$ and cover $K$; extract a finite subcover $U_{x_1}, \dots, U_{x_N}$. Each $U_{x_i}$ is contained in the closed half-space $\{y : \langle\lambda_{x_i}, y\rangle \ge \langle\lambda_{x_i}, x_i\rangle - \delta\}$, to which the Markov step of Proposition 1 applies with threshold $\langle\lambda_{x_i}, x_i\rangle-\delta$. By the finite-union rule of 37.07.01, $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le-\min_{1\le i\le N}\big(\langle\lambda_{x_i},x_i\rangle-\Lambda(\lambda_{x_i})-\delta\big)\le-\big(\min(\inf_K\Lambda^*,1/\delta)-2\delta\big). $$ The shift is exactly $\delta$ whatever the size of the selected tilts, so no bound on them is needed. Let $\delta \downarrow 0$. $\square$

Proposition 3 (exposed-point lower bound). Let $x$ be exposed for $\Lambda^*$ with exposing tilt $\eta\in\operatorname{int}(\operatorname{dom}\Lambda)$ and $\nabla\Lambda(\eta)=x$. Then for every open $G\ni x$, $\liminf_n\tfrac1n\log\mathbb{P}(Z_n\in G)\ge-\Lambda^*(x)$.

Proof. Pick $\delta > 0$ with $B(x,\delta) \subseteq G$ and let $A_n = \{Z_n \in B(x,\delta)\}$. Define $\widetilde{\mathbb{P}}_n$ by $d\widetilde{\mathbb{P}}_n/d\mathbb{P} = e^{n\langle\eta, Z_n\rangle - \Lambda_n(\eta)}$ (total mass $1$). Reversing the tilt, $$ \mathbb{P}(A_n)=\widetilde{\mathbb{E}}_n\big[\mathbf 1_{A_n}\,e^{-n\langle\eta,Z_n\rangle+\Lambda_n(\eta)}\big]\ge e^{-n\langle\eta,x\rangle-n\|\eta\|\delta+\Lambda_n(\eta)}\,\widetilde{\mathbb{P}}_n(A_n), $$ using $\langle\eta, Z_n\rangle \leq \langle\eta, x\rangle + \|\eta\|\delta$ on $A_n$. Take $\frac{1}{n}\log$; the exponent of the prefactor, divided by $n$, tends to $-(\langle\eta, x\rangle - \Lambda(\eta)) - \|\eta\|\delta = -\Lambda^*(x) - \|\eta\|\delta$ by Fenchel-Young equality at the exposed point 37.07.03. Under $\widetilde{\mathbb{P}}_n$ the scaled log moment generating function is $\lambda \mapsto \Lambda(\eta + \lambda) - \Lambda(\eta)$, finite near $\lambda=0$ because $\eta$ is interior, with gradient $\nabla\Lambda(\eta) = x$ at $\lambda = 0$ and conjugate vanishing only at $x$. Applying the upper bound under $\widetilde{\mathbb{P}}_n$ (Proposition 2, extended to closed sets by exponential tightness of the tilted laws) to the closed set $B(x,\delta)^{c}$ gives $\widetilde{\mathbb{P}}_n(B(x,\delta)^c)\to0$, hence $\widetilde{\mathbb{P}}_n(A_n) \to 1$ and $\frac{1}{n}\log\widetilde{\mathbb{P}}_n(A_n) \to 0$. Hence $\liminf_n \frac{1}{n}\log\mathbb{P}(A_n) \geq -\Lambda^*(x) - \|\eta\|\delta$; let $\delta \downarrow 0$. $\square$
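The change-of-measure identity and the concentration under the tilt can both be verified exactly in a toy case. The sketch below again assumes $Z_n$ is the mean of $n$ independent fair $\pm1$ coins, for which $\Lambda_n(\eta)=n\log\cosh\eta$ and the tilted law makes each coin $+1$ with probability $e^{\eta}/(2\cosh\eta)$; the function name and the parameter values are illustrative.

```python
import math

def tilt_check(n, x, delta):
    eta = math.atanh(x)                        # solves Lambda'(eta) = tanh(eta) = x
    log_mgf = n * math.log(math.cosh(eta))     # Lambda_n(eta) for n fair coins
    p_up = math.exp(eta) / (2 * math.cosh(eta))
    direct = reweighted = tilted_mass = 0.0
    for k in range(n + 1):                     # k = number of +1 outcomes
        z = (2 * k - n) / n
        if abs(z - x) < delta:                 # the event A_n = {Z_n in B(x, delta)}
            direct += math.comb(n, k) * 0.5 ** n
            q = math.comb(n, k) * p_up ** k * (1 - p_up) ** (n - k)
            tilted_mass += q
            reweighted += q * math.exp(-n * eta * z + log_mgf)
    return direct, reweighted, tilted_mass

direct, reweighted, tilted_mass = tilt_check(n=400, x=0.5, delta=0.1)
print("P(A_n) directly:        ", direct)
print("P(A_n) via tilted law:  ", reweighted)
print("tilted mass of A_n:     ", tilted_mass)
```

The first two numbers agree because the reweighting is an identity; the third is close to $1$, which is the concentration step that removes the last term in the lower bound.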

Proposition 4 (essential smoothness gives the full LDP). If $\Lambda$ is essentially smooth and lsc with $0 \in \operatorname{int}(\operatorname{dom}\Lambda)$, then every $x\in\operatorname{int}(\operatorname{dom}\Lambda^*)$ is exposed with interior exposing tilt, and $\{Z_n\}$ satisfies the LDP at speed $1/n$ with good rate $\Lambda^*$.

Proof. By steepness, $\nablaΛ : int (dom Λ) \to \mathbb{R}^{d}$ has image all of $int (dom Λ^{*})$ : differentiability and strict convexity (on the relative interior) make $\nablaΛ$ injective, and steepness ( $∣\nablaΛ (η_{k}) ∣ \to \infty$ at the domain boundary) prevents the image from having a finite boundary inside $int (dom Λ^{*})$ , so the open image equals $int (dom Λ^{*})$ 37.07.03. For each such $x = \nablaΛ (η)$ , the Fenchel-Young equality and strict convexity give the strict inequality defining exposedness with exposing tilt $η \in int (dom Λ)$ . Hence Proposition 3 applies at every interior $x$ , and since $Λ^{*}$ is good (lsc with compact sublevels, by $0 \in int dom Λ$ as in 37.07.03) we have $\inf_{G} Λ^{*} = \inf_{G \cap int dom Λ^{*}} Λ^{*}$ for open $G$ . Thus the lower bound reads $- \inf_{G} Λ^{*}$ , matching Proposition 2's upper bound extended to closed sets by exponential tightness 37.07.01. The two bounds are the LDP. $□$
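
The role of steepness, that every interior target is the gradient of $Λ$ at some tilt, can be seen in a one-dimensional example. The sketch below assumes the Exp(1) cumulant generating function $Λ (λ) = - \log (1 - λ)$ for $λ < 1$ (an illustrative choice), whose slope $1 / (1 - λ)$ blows up at the boundary $λ = 1$ and whose conjugate is $Λ^{*} (x) = x - 1 - \log x$ for $x > 0$ ; the helper names and the bisection bracket are hypothetical.

```python
import math

def cgf(lam: float) -> float:
    """Lambda(lam) = -log(1 - lam), the Exp(1) cumulant generating function; finite only for lam < 1."""
    return -math.log(1.0 - lam)

def cgf_slope(lam: float) -> float:
    """Lambda'(lam) = 1 / (1 - lam); increasing, and unbounded as lam rises to 1 (steepness)."""
    return 1.0 / (1.0 - lam)

def exposing_tilt(x: float, lo: float = -1.0e6, hi: float = 1.0 - 1.0e-12,
                  iters: int = 200) -> float:
    """Bisection for eta with Lambda'(eta) = x, using monotonicity of the slope."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cgf_slope(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

results = []
for x in (0.01, 0.5, 1.0, 5.0, 100.0):
    eta = exposing_tilt(x)
    fenchel_young = eta * x - cgf(eta)   # <eta, x> - Lambda(eta)
    closed_form = x - 1.0 - math.log(x)  # Lambda^*(x) for Exp(1)
    results.append((x, eta, fenchel_young, closed_form))

for x, eta, fy, cf in results:
    print(f"x={x:7.2f}  eta={eta:9.4f}  <eta,x>-Lambda(eta)={fy:.6f}  Lambda*(x)={cf:.6f}")
```

Every target in $(0, \infty) = int (dom Λ^{*})$ is reached by a tilt strictly inside $dom Λ$ , including large targets that need $η$ close to the boundary, and the Fenchel-Young value at that tilt agrees with the closed-form conjugate.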

Connections Master

Gärtner-Ellis is the dependent-data generalisation of Cramér's theorem 37.07.02: the Chernoff upper bound and the exponential-tilting lower bound are imported wholesale, with the single-sample cumulant generating function replaced by the limiting $Λ$ , so the i.i.d. case where $Λ (λ) = \log \mathbb{E} e^{⟨ λ, X_{1} ⟩}$ recovers Cramér exactly.
The conjugacy $I = Λ^{*}$ , the Fenchel-Young equality at exposed points, the essential-smoothness/steepness condition, and the goodness of $Λ^{*}$ are all the convex-duality results of 37.07.03; steepness there decides surjectivity of $\nablaΛ$ , which is precisely what turns the exposed-point lower bound into the full LDP here.
The Donsker-Varadhan variational formula 37.07.06 computes the limiting $Λ$ for a Markov additive functional as the log principal eigenvalue of the tilted generator, supplying the explicit input that Gärtner-Ellis conjugates; the level-1 rate $Λ^{*}$ proved here is the contraction of that unit's level-2 occupation-measure rate along the integration map $ν \mapsto \int f d ν$ .
Sanov's theorem 37.07.05 is the level-2 empirical-measure principle whose projective-limit machinery underlies abstract Gärtner-Ellis in function spaces; restricting Sanov to a bounded test function and contracting gives a scalar Gärtner-Ellis statement with the same limiting log moment generating function as input.
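
The Markov-additive-functional connection can be made concrete for a two-state chain. The sketch below assumes a transition matrix $P = [[1 - a, a], [b, 1 - b]]$ with $f = (0, 1)$ , so $Z_{n}$ is the fraction of time spent in the second state, and takes $Λ (λ)$ to be the log principal eigenvalue of the tilted kernel $P_{λ} (i, j) = P (i, j) e^{λ f (j)}$ ; the parameter values, the search bracket, and the function names are illustrative assumptions.

```python
import math

def scaled_cgf(lam: float, a: float = 0.2, b: float = 0.5) -> float:
    """Lambda(lam) = log principal eigenvalue of the tilted kernel
    P_lam[i][j] = P[i][j] * exp(lam * f(j)), with P = [[1-a, a], [b, 1-b]], f = (0, 1)."""
    A, B = 1.0 - a, a * math.exp(lam)
    C, D = b, (1.0 - b) * math.exp(lam)
    # Largest eigenvalue of the 2x2 matrix [[A, B], [C, D]] in closed form.
    rho = 0.5 * (A + D + math.sqrt((A - D) ** 2 + 4.0 * B * C))
    return math.log(rho)

def legendre(x: float, lo: float = -20.0, hi: float = 20.0, iters: int = 200) -> float:
    """Lambda^*(x) = sup_lam (lam * x - Lambda(lam)) by ternary search (the objective is concave)."""
    def g(lam: float) -> float:
        return lam * x - scaled_cgf(lam)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))

stationary_mean = 0.2 / (0.2 + 0.5)  # pi(second state) = a / (a + b)
h = 1.0e-5
slope_at_zero = (scaled_cgf(h) - scaled_cgf(-h)) / (2.0 * h)  # numerical Lambda'(0)
rate_at_mean = legendre(stationary_mean)
rate_at_06 = legendre(0.6)

print(f"Lambda(0)={scaled_cgf(0.0):.2e}  Lambda'(0)={slope_at_zero:.6f}  mean={stationary_mean:.6f}")
print(f"Lambda*(mean)={rate_at_mean:.2e}  Lambda*(0.6)={rate_at_06:.6f}")
```

The rate vanishes at the stationary mean and is strictly positive away from it, as a rate function for the occupation fraction should be; only the eigenvalue curve $Λ$ enters, never the joint law of the chain.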

Historical & philosophical context Master

Jürgen Gärtner proved the dependent-data theorem in 1977 ^{[Gärtner 1977]} (Theory of Probability and its Applications 22, 24-39), studying large deviations from the invariant measure of a diffusion and isolating the convergence of the scaled logarithmic moment generating function as the only structural input needed. The $\mathbb{R}^{d}$ formulation under the now-standard essential-smoothness hypothesis, with the exposed-point analysis controlling the lower bound, is due to Richard Ellis in 1984 ^{[Ellis 1984]} (Annals of Probability 12, 1-12), whence the theorem's joint attribution. The systematic textbook treatment, including the half-space cover for the upper bound and the steepness upgrade, is that of Dembo and Zeitouni ^{[Dembo & Zeitouni §2.3]}.

The abstract sub-additive route that subsumes the theorem when $Λ$ is intractable was developed by Bahadur and Zabell in 1979 ^{[Bahadur & Zabell 1979]} (Annals of Probability 7, 587-621) for averages in general vector spaces, and extended to stationary mixing sequences by Bryc ^{[Bryc 1990]}. The Markov-additive-functional applications, where $Λ$ is the Perron-Frobenius eigenvalue of a tilted kernel, connect the theorem to the Donsker-Varadhan large-deviation theory of occupation measures developed across the same period; the identification of $Λ$ with a free energy and $Λ^{*}$ with an entropy places the non-differentiable case of $Λ$ in correspondence with first-order phase transitions, the reading emphasised in Ellis's monograph and in den Hollander's account ^{[den Hollander §III.4]}.

Bibliography Master

@article{gartner1977large,
  author  = {G\"artner, J\"urgen},
  title   = {On large deviations from the invariant measure},
  journal = {Theory of Probability and its Applications},
  volume  = {22},
  number  = {1},
  pages   = {24--39},
  year    = {1977}
}

@article{ellis1984large,
  author  = {Ellis, Richard S.},
  title   = {Large deviations for a general class of random vectors},
  journal = {Annals of Probability},
  volume  = {12},
  number  = {1},
  pages   = {1--12},
  year    = {1984}
}

@article{bahadurzabell1979large,
  author  = {Bahadur, R. R. and Zabell, S. L.},
  title   = {Large deviations of the sample mean in general vector spaces},
  journal = {Annals of Probability},
  volume  = {7},
  number  = {4},
  pages   = {587--621},
  year    = {1979}
}

@article{bryc1990large,
  author  = {Bryc, W{\l}odzimierz},
  title   = {On the large deviation principle for stationary processes},
  journal = {Studia Mathematica},
  volume  = {95},
  number  = {2},
  pages   = {131--145},
  year    = {1990}
}

@book{dembozeitouni1998ldp,
  author    = {Dembo, Amir and Zeitouni, Ofer},
  title     = {Large Deviations Techniques and Applications},
  edition   = {2nd},
  series    = {Applications of Mathematics},
  number    = {38},
  publisher = {Springer},
  year      = {1998}
}

@book{denhollander2000large,
  author    = {den Hollander, Frank},
  title     = {Large Deviations},
  series    = {Fields Institute Monographs},
  number    = {14},
  publisher = {American Mathematical Society},
  year      = {2000}
}

Prerequisites

37.07.02
37.07.03

Tier anchors

beginner: Touchette 2009 *The large deviation approach to statistical mechanics* (Physics Reports 478) §4.4 (the Gärtner-Ellis route for correlated data); Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.3 (the dependent-data generalisation of Cramér)
intermediate: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.3 (the Gärtner-Ellis theorem, Theorem 2.3.6; the upper bound for compacts, the exposed-point lower bound, the essential-smoothness upgrade); den Hollander 2000 *Large Deviations* (AMS Fields Institute Monographs) §III.4
master: Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.3-§2.4 (Gärtner-Ellis in $\mathbb{R}^d$, Theorem 2.3.6, Lemma 2.3.9 exposed hyperplanes, Corollary 2.3.7 steepness); Ellis 1984 *Large deviations for a general class of random vectors* (Annals of Probability 12); Bryc 1990 *On the large deviation principle for stationary processes* (and the additive-functional applications)

References

Dembo, A. & Zeitouni, O. — Large Deviations Techniques and Applications, 2nd ed. (Springer, 1998) · §2.3 (Theorem 2.3.6 Gärtner-Ellis; Lemma 2.3.9 exposed points; Corollary 2.3.7 essential smoothness gives the full LDP)
Gärtner, J. — On large deviations from the invariant measure · Theory of Probability and its Applications 22 (1977), 24-39 (the original dependent-data theorem)
Ellis, R. S. — Large deviations for a general class of random vectors · Annals of Probability 12 (1984), 1-12 (the $\mathbb{R}^d$ theorem under essential smoothness)
den Hollander, F. — Large Deviations (AMS Fields Institute Monographs 14, 2000) · §III.4 (Gärtner-Ellis, Markov additive functionals, Gaussian processes)
Bryc, W. — On the large deviation principle for stationary processes · Studia Mathematica 95 (1990), 131-145 (additive functionals of mixing sequences)
Bahadur, R. R. & Zabell, S. L. — Large deviations of the sample mean in general vector spaces · Annals of Probability 7 (1979), 587-621 (the abstract sub-additive Cramér theorem)

Estimated time

beginner: 18m
intermediate: 44m
master: 78m