37.07.04 · probability / 07-large-deviations

The Gärtner-Ellis Theorem

shipped3 tiersLean: none

Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.3-§2.4 (Gärtner-Ellis in $\mathbb{R}^d$, Theorem 2.3.6, Lemma 2.3.9 exposed hyperplanes, Corollary 2.3.7 steepness); Ellis 1984 *Large deviations for a general class of random vectors* (Annals of Probability 12); Bryc 1990 *On the large deviation principle for stationary processes* (and the additive-functional applications)

Intuition Beginner

Cramér's theorem priced the rare deviations of an average of many independent, identical samples. But most real data are neither independent nor identical: today's measurement leans on yesterday's, a Markov chain remembers its last state, a Gaussian process is correlated across its whole length. The Gärtner-Ellis theorem keeps the rare-event accounting of Cramér while throwing out the independence that made it easy.

The trick is to stop asking about single samples and ask only one thing of the whole sequence: how fast does its tuned-average grow? Take your quantity — already an average-like object built from pieces — and tune it with a dial , exactly as in Cramér. Look at the logarithm of the tuned expectation, divided by to keep it from blowing up. If, as grows, this settles down to a definite limiting curve , that single curve is all the dependence structure you need. You never have to know how the pieces correlate; you only need the limit to exist.

From that limiting curve the rate function is read off by the same geometric flip as before: the Legendre-Fenchel transform turns the dial-language curve into the outcome-language cost . The cost of seeing near a forbidden value decays like .

There is one new piece of fine print, and it has teeth. To make the cheap upper bound and the constructive lower bound meet — to push the rare event into existence by tilting, as Cramér did — the limiting curve must be steep: its slope must run off to infinity as the dial approaches the edge of where the tuned average stays finite. Steepness guarantees that every target value is hit by some dial setting, so the tilt that realises it always exists. When the curve is steep and smooth, the two bounds close and the rate is exactly . When it is not, the lower bound can leak at the values no dial can reach.

So Gärtner-Ellis is Cramér with the independence removed and one regularity condition added: give me a limiting tuned-growth curve that is smooth and steep, and I will give you the exponential cost of every deviation, dependence and all.

Visual Beginner

Figure: three stacked panels. The top panel shows several curves for increasing converging to a thick limiting curve over the dial-axis ; the limiting curve is a convex bowl. The middle panel marks the edge of the domain of with a dashed vertical wall and shows the slope of shooting up to vertical as the dial nears that wall — the steepness condition. The bottom panel shows the resulting rate function $\Lambda^(x)x$, touching zero at the typical value, with an arrow labelled "Legendre-Fenchel flip" carrying a slope in the middle panel to a point in the bottom panel.*

   Lambda_n(lambda) -> Lambda(lambda)         "dial" language
      |        _                        curves for n=1,2,3,... pile up
      |   __ ./ \. __                   onto a limiting convex bowl
------+----o-------o-----  lambda
      |
      |  slope of Lambda  -> infinity   STEEPNESS: as lambda nears the
      |  as lambda nears the wall ||    edge of its domain, the tangent
------+------------------------||--     turns vertical, so every target
      |                        ||       slope x is realised by some dial
                |
                |  Legendre-Fenchel flip  (slope x  ->  point x)
                v
   I(x)=Lambda*(x)                          "outcome" language
      |\                 /            valley bottom at the typical value,
      | \               /             where I = 0; I(x) > 0 elsewhere is
      |  \___       ___/              the exponential cost e^{-n I(x)} of
------+------\_____/----------- x     that deviation of the dependent data

Worked example Beginner

Consider a two-state weather chain: each day is Sun () or Rain (). It is sticky — a sunny day is followed by sun with probability , a rainy day by rain with probability . Let be the average daily score over days. The days are not independent, so Cramér does not apply directly; we use the Gärtner-Ellis input instead.

Step 1. Identify the tuned-growth limit. For such a chain the limiting curve is the logarithm of the largest growth-rate of the tuned transition table — its dominant eigenvalue. For this symmetric chain one can show the limit works out to $$ \Lambda(\lambda) = \log\Big( 0.8\cosh\lambda + \sqrt{0.64\cosh^2\lambda - 0.36},\Big). $$ We do not need its derivation here, only its values, which we read off numerically.

Step 2. Locate the typical value. The slope at zero, , is the long-run average score. By symmetry : equal sun and rain, average . That is the no-cost value.

Step 3. Price a deviation by the flip. Ask the cost of the average sitting at (three-quarters sunny). The flip gives at the dial whose slope matches . Solving numerically gives , with , so $$ \Lambda^*(0.5) = 0.60 \times 0.5 - 0.236 = 0.300 - 0.236 = 0.064. $$

Step 4. Read off the probability scale. Over days, the chance the average sits near decays like .

What this tells us. The dependence did not change the shape of the recipe at all — one limiting curve, one flip, one exponent. The stickiness of the weather is fully absorbed into the curve ; everything downstream is identical to Cramér. That absorption is the whole point of Gärtner-Ellis.

Check your understanding Beginner

Formal definition Intermediate+

Let be a sequence of -valued random vectors on a common probability space (no independence, no identical distribution assumed). The object that replaces the single-sample cumulant generating function of Cramér is its scaled limit.

Definition (scaled / limiting logarithmic moment generating function). For each set $$ \Lambda_n(\lambda) := \log \mathbb{E}, e^{n\langle \lambda, Z_n\rangle}, \qquad \lambda \in \mathbb{R}^d, $$ the logarithmic moment generating function of at speed . The limiting logarithmic moment generating function is the pointwise limit, assumed to exist in , $$ \boxed{;\Lambda(\lambda) ;:=; \lim_{n\to\infty} \frac{1}{n},\Lambda_n(\lambda) ;=; \lim_{n\to\infty}\frac{1}{n}\log\mathbb{E}, e^{n\langle\lambda, Z_n\rangle}.;} $$ We assume throughout that , where . The function is convex (a pointwise limit of the convex , each convex by Hölder as in 37.07.03) and satisfies .

For an i.i.d. average , one has , so is exactly the Cramér cumulant generating function — Gärtner-Ellis specialises to Cramér 37.07.02 in that case.

Definition (Fenchel-Legendre rate function). The candidate rate function is the conjugate of from 37.07.03, $$ \Lambda^(x) := \sup_{\lambda \in \mathbb{R}^d}\big(\langle\lambda, x\rangle - \Lambda(\lambda)\big), \qquad x \in \mathbb{R}^d. $$ By the conjugate machinery of 37.07.03, $\Lambda^0 \in \operatorname{int}(\operatorname{dom}\Lambda)$ — a good rate function with compact sublevel sets.

Definition (exposed point and exposing hyperplane). A point is an exposed point of if there exists (the exposing hyperplane) such that $$ \langle\lambda, x\rangle - \Lambda^(x) > \langle\lambda, y\rangle - \Lambda^(y) \quad\text{for all } y \neq x. $$ Write for the set of exposed points whose exposing lies in .

Definition (essential smoothness / steepness). is essentially smooth if , is differentiable on , and is steep: whenever , as in 37.07.03.

Theorem statement (Gärtner-Ellis). Assume exists as above with and is lower-semicontinuous.

(Upper bound.) For every closed , $$ \limsup_n \tfrac1n\log\mathbb{P}(Z_n\in F)\le -\inf_{x\in F}\Lambda^(x). $$ (Lower bound at exposed points.) For every open , $$ \liminf_n \tfrac1n\log\mathbb{P}(Z_n\in G)\ge -\inf_{x\in G\cap\mathcal{F}}\Lambda^(x). $$ (Full LDP.) If is additionally essentially smooth, then $\mathcal{F}=\operatorname{int}(\operatorname{dom}\Lambda^){Z_n}1/n\Lambda^$.

Counterexamples to common slips

  • Existence of is a real hypothesis, not automatic. If the scaled log moment generating functions oscillate — e.g. alternates between two laws with different cumulant behaviour along even and odd — the limit may fail to exist, and Gärtner-Ellis says nothing. One must verify convergence, typically via a spectral or mixing argument, before invoking the theorem.
  • The bare theorem gives only an exposed-point lower bound. Without essential smoothness, the lower bound is , which can be strictly larger (a weaker bound) than when meets only at non-exposed points. The full LDP with rate requires steepness; otherwise the true rate may be the lower-semicontinuous regularisation of restricted to exposed points.
  • may be non-differentiable, signalling a phase transition. A corner in corresponds to a flat segment (an affine stretch) in , i.e. an interval of values sharing one tilt. This is not a pathology to be smoothed away; in statistical mechanics it is a first-order phase transition, and the non-exposed points on the flat segment are exactly where the lower bound needs care.

Key theorem with proof Intermediate+

We prove the Gärtner-Ellis theorem in in the two-step form: the Chernoff upper bound from convexity of alone, then the exposed-point tilting lower bound, then the essential-smoothness upgrade that makes them meet. The arguments are the Cramér arguments of 37.07.02 run against the limiting rather than a single-sample cumulant generating function.

Theorem (Gärtner-Ellis). Let in have limiting logarithmic moment generating function , lower-semicontinuous with . Then the upper bound holds on all closed sets and the lower bound holds at exposed points with interior exposing tilt; if is essentially smooth, satisfies the LDP at speed with good rate $\Lambda^$.*

Proof — upper bound (Chernoff against ). Fix . By Markov's inequality applied to the non-negative variable , for the closed half-space , $$ \mathbb{P}(Z_n\in H_{\lambda,a})=\mathbb{P}(\langle\lambda,Z_n\rangle\ge\langle\lambda,a\rangle)\le e^{-n\langle\lambda,a\rangle}\mathbb{E},e^{n\langle\lambda,Z_n\rangle}=e^{-n\langle\lambda,a\rangle+\Lambda_n(\lambda)}. $$ Taking and , and using , $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in H_{\lambda,a})\le -\big(\langle\lambda,a\rangle-\Lambda(\lambda)\big). $$ Now let be compact and . For each pick with (possible by definition of , capping at a large value where ); the open half-spaces cover , so a finite subcover exists. Then , and the finite-union (max) rule of 37.07.01 gives $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le -\min_i\big(\langle\lambda_{x_i},x_i\rangle-\Lambda(\lambda_{x_i})\big)+O(\delta)\le -(m-O(\delta)). $$ Letting gives on compacts. Exponential tightness — supplied by finiteness of on a ball , which forces as in 37.07.03 Exercise 7 — upgrades the compact bound to all closed sets via 37.07.01.

Proof — lower bound at exposed points (tilting against ). Let be an exposed point with exposing tilt , so for , equivalently with (Fenchel-Young equality 37.07.03). Fix open and with . Define the tilted laws of by $$ \frac{d\widetilde{\mathbb{P}}_n}{d\mathbb{P}}=\exp!\big(n\langle\eta,Z_n\rangle-\Lambda_n(\eta)\big), $$ a probability measure since . Reversing the change of measure on , $$ \mathbb{P}(A_n)=\widetilde{\mathbb{E}}n\big[\mathbf 1{A_n},e^{-n\langle\eta,Z_n\rangle+\Lambda_n(\eta)}\big]. $$ On , , so , giving $$ \mathbb{P}(A_n)\ge e^{-n\langle\eta,x\rangle-n|\eta|\delta+\Lambda_n(\eta)},\widetilde{\mathbb{P}}_n(A_n). $$ Taking , using and , $$ \liminf_n\tfrac1n\log\mathbb{P}(A_n)\ge -\Lambda^(x)-|\eta|\delta+\liminf_n\tfrac1n\log\widetilde{\mathbb{P}}_n(A_n). $$ It remains to show , i.e. that under the tilt concentrates at . The tilted scaled log moment generating function is , whose gradient at is (the exposing relation reads ). Its conjugate has a unique zero at ; applying the already-proved upper bound under to the closed set shows , hence and its . Therefore $\liminf_n\tfrac1n\log\mathbb{P}(A_n)\ge-\Lambda^(x)-|\eta|\delta\delta\downarrow0x\in G\liminf_n\tfrac1n\log\mathbb{P}(Z_n\in G)\ge-\inf_{G\cap\mathcal F}\Lambda^*$.

Proof — the essential-smoothness upgrade. When is essentially smooth, steepness makes a surjection of onto : every equals for some interior , and convex duality 37.07.03 makes such exposed with exposing tilt . Thus , and since is good (hence by lower-semicontinuity), the exposed lower bound equals . The matching upper and lower bounds are the LDP with good rate .

Bridge. This theorem builds toward the entire dependent-data large-deviations toolkit — Markov-chain occupation costs, queueing under correlated input, Gaussian-field excursions — and appears again in the Donsker-Varadhan theory 37.07.06, where the limiting is computed as a principal eigenvalue and then fed into this exact conjugation. This is exactly the Cramér proof of 37.07.02 with the single-sample cumulant generating function replaced by the limit : the Chernoff upper bound and the exponential-tilting lower bound are reused verbatim, now indexed by . The central insight is that all dependence is compressed into one convex limit , after which the rare-event cost is its conjugate by the same Fenchel-Young equality of 37.07.03; putting these together, Gärtner-Ellis generalises Cramér 37.07.02 from independent to arbitrary correlated sequences and is dual to the moment description through the Legendre-Fenchel transform, with steepness the precise condition that makes the construction's optimal tilt always exist.

Exercises Intermediate+

Advanced results Master

Markov additive functionals and the tilted-generator eigenvalue

For an ergodic finite-state Markov chain with transition matrix and a function , the additive functional has with limiting curve $$ \Lambda(\lambda)=\log\rho\big(P_\lambda\big),\qquad (P_\lambda){xy}=P{xy},e^{\lambda f(y)}, $$ where is the Perron-Frobenius eigenvalue of the tilted matrix . That this is the limit follows from and the spectral-gap asymptotics ; the irreducibility and positivity of (for where entries stay finite) make a simple, analytic, strictly-log-convex eigenvalue by Perron-Frobenius, hence is smooth and steep on the interior of its domain [den Hollander §III.4]. Gärtner-Ellis then yields the LDP for with rate , the level-1 rate for the chain. Contracting the level-2 occupation-measure LDP (whose rate is the Donsker-Varadhan functional) along the map recovers the same , exhibiting Gärtner-Ellis as the contracted shadow of the empirical-measure principle.

Stationary Gaussian processes and the spectral-density rate

For a centred stationary Gaussian sequence with summable autocovariance and spectral density , the sample mean has , so yields $$ \Lambda(\lambda)=\tfrac12,g(0),\lambda^2,\qquad \Lambda^*(x)=\frac{x^2}{2,g(0)}. $$ The long-run variance — the spectral density at frequency zero — is the single scalar into which all temporal correlation collapses. Because has empty boundary, is steep by vacuity and the full LDP holds. Quadratic functionals (e.g. empirical covariances) give a non-Gaussian involving of a tilted Toeplitz operator, where steepness must be checked against the spectral support, and phase-transition corners can appear when touches zero.

The role of steepness and the gap at non-exposed points

The proof delivers the upper bound from convexity of alone and the lower bound only at exposed points. The map closing the gap is . Steepness forces this map to be onto: as runs to the boundary of , , so the image exhausts , and every interior target is exposed with interior exposing tilt. Drop steepness and has bounded range; targets beyond that range are non-exposed, the tilt realising them does not exist, and the lower bound stalls — the same loss of surjectivity of recorded for in 37.07.03, now deciding whether the dependent-data LDP rate is the full conjugate or only its exposed restriction. A non-differentiable (a corner) marks a flat segment of and, in statistical-mechanics applications, a first-order phase transition.

The abstract sub-additive route

When the scaled log moment generating function is hard to compute, the rate can still be defined intrinsically. If is approximately super-additive, , Fekete's lemma makes exist and serve as the rate, with no recourse to . This abstract Cramér theorem of Bahadur and Zabell [Bahadur & Zabell 1979] subsumes Gärtner-Ellis: when exists and is essentially smooth, the Fekete limit equals , but the sub-additive formulation continues to apply to stationary mixing sequences and Banach-space-valued averages where no manageable is available [Bryc 1990].

Synthesis. The central insight of Gärtner-Ellis is that arbitrary dependence in is compressed into a single convex limit , after which the rare-event cost is exactly its Fenchel-Legendre conjugate , so the principle is the dependent-data realisation of the conjugacy proved in 37.07.03. The foundational reason the bounds coincide is that both compute the same supremum: the Chernoff upper bound optimises the exponential tilt and the tilting lower bound realises the optimal tilt as a change of measure, with Fenchel-Young equality identifying them at exposed points — this is exactly the two-bets structure of Cramér 37.07.02, now run against rather than a single-sample cumulant generating function. Putting these together, Gärtner-Ellis generalises Cramér from independent to correlated data and is dual to the moment-generating description, while the bridge is steepness: it makes surject onto , so every interior value is exposed and the exposed lower bound upgrades to the full LDP. The construction appears again in the Donsker-Varadhan eigenvalue formula 37.07.06, which supplies the explicit for Markov additive functionals, and the bridge is that the level-1 rate proved here is the contraction of the level-2 occupation-measure rate along the mean map.

Full proof set Master

Proposition 1 (Chernoff upper bound on a half-space against ). Let in have for . For the half-space , .

Proof. Markov's inequality on the non-negative variable : $$ \mathbb{P}(Z_n\in H)=\mathbb{P}(\langle\lambda,Z_n\rangle\ge\langle\lambda,a\rangle)\le e^{-n\langle\lambda,a\rangle}\mathbb{E},e^{n\langle\lambda,Z_n\rangle}=e^{-n\langle\lambda,a\rangle+\Lambda_n(\lambda)}. $$ Take to get , then and give the claim.

Proposition 2 (compact upper bound). Under the hypotheses of Proposition 1 with , for every compact , $\limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le-\inf_K\Lambda^$.*

Proof. Fix . For each choose with , possible by definition of . The open half-spaces contain and cover ; extract a finite subcover . Each (a closed half-space with the constant shifted by ), so by Proposition 1 and the finite-union rule of 37.07.01, $$ \limsup_n\tfrac1n\log\mathbb{P}(Z_n\in K)\le-\min_{1\le i\le N}\big(\langle\lambda_{x_i},x_i\rangle-\Lambda(\lambda_{x_i})-\delta|\lambda_{x_i}|\big)\le-\big(\inf_K\Lambda^*-c\delta\big), $$ for a constant from the bounded selected tilts. Let .

Proposition 3 (exposed-point lower bound). Let be exposed for $\Lambda^\eta\in\operatorname{int}(\operatorname{dom}\Lambda)\nabla\Lambda(\eta)=xG\ni x\liminf_n\tfrac1n\log\mathbb{P}(Z_n\in G)\ge-\Lambda^(x)$.

Proof. Pick with and let . Define by (total mass ). Reversing the tilt, $$ \mathbb{P}(A_n)=\widetilde{\mathbb{E}}n\big[\mathbf 1{A_n}e^{-n\langle\eta,Z_n\rangle+\Lambda_n(\eta)}\big]\ge e^{-n\langle\eta,x\rangle-n|\eta|\delta+\Lambda_n(\eta)},\widetilde{\mathbb{P}}_n(A_n), $$ using on . Take ; the exponential prefactor tends to by Fenchel-Young equality at the exposed point 37.07.03. Under the scaled log moment generating function is , with gradient at and conjugate vanishing only at ; applying Proposition 2 under to the compact-in- pieces of gives , so . Hence ; let .

Proposition 4 (essential smoothness gives the full LDP). If is essentially smooth and lsc with , then every $x\in\operatorname{int}(\operatorname{dom}\Lambda^){Z_n}1/n\Lambda^$.

Proof. By steepness, has image all of : differentiability and strict convexity (on the relative interior) make injective, and steepness ( at the domain boundary) prevents the image from having a finite boundary inside , so the open image equals 37.07.03. For each such , the Fenchel-Young equality and strict convexity give the strict inequality defining exposedness with exposing tilt . Hence Proposition 3 applies at every interior , and since is good (lsc with compact sublevels, by as in 37.07.03) we have for open . Thus the lower bound reads , matching Proposition 2's upper bound extended to closed sets by exponential tightness 37.07.01. The two bounds are the LDP.

Connections Master

  • Gärtner-Ellis is the dependent-data generalisation of Cramér's theorem 37.07.02: the Chernoff upper bound and the exponential-tilting lower bound are imported wholesale, with the single-sample cumulant generating function replaced by the limiting , so the i.i.d. case where recovers Cramér exactly.

  • The conjugacy , the Fenchel-Young equality at exposed points, the essential-smoothness/steepness condition, and the goodness of are all the convex-duality results of 37.07.03; steepness there decides surjectivity of , which is precisely what turns the exposed-point lower bound into the full LDP here.

  • The Donsker-Varadhan variational formula 37.07.06 computes the limiting for a Markov additive functional as the log principal eigenvalue of the tilted generator, supplying the explicit input that Gärtner-Ellis conjugates; the level-1 rate proved here is the contraction of that unit's level-2 occupation-measure rate along the integration map .

  • Sanov's theorem 37.07.05 is the level-2 empirical-measure principle whose projective-limit machinery underlies abstract Gärtner-Ellis in function spaces; restricting Sanov to a bounded test function and contracting gives a scalar Gärtner-Ellis statement with the same limiting log moment generating function as input.

Historical & philosophical context Master

Jürgen Gärtner proved the dependent-data theorem in 1977 [Gärtner 1977] (Theory of Probability and its Applications 22, 24-39), studying large deviations from the invariant measure of a diffusion and isolating the convergence of the scaled logarithmic moment generating function as the only structural input needed. The formulation under the now-standard essential-smoothness hypothesis, with the exposed-point analysis controlling the lower bound, is due to Richard Ellis in 1984 [Ellis 1984] (Annals of Probability 12, 1-12), whence the theorem's joint attribution. The systematic textbook treatment, including the half-space cover for the upper bound and the steepness upgrade, is that of Dembo and Zeitouni [Dembo & Zeitouni §2.3].

The abstract sub-additive route that subsumes the theorem when is intractable was developed by Bahadur and Zabell in 1979 [Bahadur & Zabell 1979] (Annals of Probability 7, 587-621) for averages in general vector spaces, and extended to stationary mixing sequences by Bryc [Bryc 1990]. The Markov-additive-functional applications, where is the Perron-Frobenius eigenvalue of a tilted kernel, connect the theorem to the Donsker-Varadhan large-deviation theory of occupation measures developed across the same period; the identification of with a free energy and with an entropy places the non-differentiable case of in correspondence with first-order phase transitions, the reading emphasised in Ellis's monograph and in den Hollander's account [den Hollander §III.4].

Bibliography Master

@article{gartner1977large,
  author  = {G\"artner, J\"urgen},
  title   = {On large deviations from the invariant measure},
  journal = {Theory of Probability and its Applications},
  volume  = {22},
  number  = {1},
  pages   = {24--39},
  year    = {1977}
}

@article{ellis1984large,
  author  = {Ellis, Richard S.},
  title   = {Large deviations for a general class of random vectors},
  journal = {Annals of Probability},
  volume  = {12},
  number  = {1},
  pages   = {1--12},
  year    = {1984}
}

@article{bahadurzabell1979large,
  author  = {Bahadur, R. R. and Zabell, S. L.},
  title   = {Large deviations of the sample mean in general vector spaces},
  journal = {Annals of Probability},
  volume  = {7},
  number  = {4},
  pages   = {587--621},
  year    = {1979}
}

@article{bryc1990large,
  author  = {Bryc, W{\l}odzimierz},
  title   = {On the large deviation principle for stationary processes},
  journal = {Studia Mathematica},
  volume  = {95},
  number  = {2},
  pages   = {131--145},
  year    = {1990}
}

@book{dembozeitouni1998ldp,
  author    = {Dembo, Amir and Zeitouni, Ofer},
  title     = {Large Deviations Techniques and Applications},
  edition   = {2nd},
  series    = {Applications of Mathematics},
  number    = {38},
  publisher = {Springer},
  year      = {1998}
}

@book{denhollander2000large,
  author    = {den Hollander, Frank},
  title     = {Large Deviations},
  series    = {Fields Institute Monographs},
  number    = {14},
  publisher = {American Mathematical Society},
  year      = {2000}
}