The Legendre-Fenchel Transform and Convex Duality of Rate Functions
Anchor (Master): Dembo & Zeitouni 1998 *Large Deviations Techniques and Applications* 2nd ed. (Springer) §2.2-§2.3; Rockafellar 1970 *Convex Analysis* (Princeton) §12-§16, §23-§26; Hiriart-Urruty & Lemaréchal 2001 *Fundamentals of Convex Analysis* (Springer) Ch. E
Intuition Beginner
Imagine a function as a hill: at each location it reports a height . Most of the time you describe the hill by listing heights position by position. But there is a second, equally complete way to describe it: record, for every possible slope , the tangent line of that slope that touches the hill from below, and write down how far that line dips below zero. This second description, indexed by slope instead of position, is the conjugate function .
Why bother with a slope-based description? Because some questions are easier in slope language. In probability, the rare-event behaviour of an average is controlled by a quantity called the cumulant generating function, which is naturally a function of a "tilting" slope . The cost of seeing a rare value — the rate function — is its conjugate. Switching between the two is the whole game.
You have met a version of this before. In mechanics, the Legendre transform swaps velocity for momentum; in thermodynamics, it swaps entropy for temperature. Those used smooth, bowl-shaped functions where each slope occurs exactly once, so the swap is reversible and clean. The Legendre-Fenchel transform is the rugged-terrain version. It works even when the hill has flat stretches, sharp corners, or vertical cliffs, where a single slope might touch a whole flat segment, or where no tangent line of a given slope exists at all.
The defining recipe is a maximisation:
For each slope , you tilt the hill by subtracting the straight line , then ask how high the tilted hill rises. That highest value is the conjugate. No derivatives, no inverting — just a maximisation, which always makes sense even on rough terrain.
The miracle is that for nicely shaped (convex, downward-cupping-free) hills, doing this twice returns you exactly where you started: . The slope-description loses nothing. This round-trip is the source of "convex duality," and in probability it is the reason the rate function and the cumulant generating function are two faces of one object.
Visual Beginner
Figure: a convex curve with several tangent lines drawn beneath it. Each tangent line has a slope and crosses the vertical axis at some height $-f^(\lambda)$. As the slope is dialed up, the tangent rolls along the underside of the curve, and the family of all these supporting lines reconstructs the curve completely. A second small panel shows a curve with a flat segment: there, one single slope supports the whole flat stretch, and the conjugate develops a corner.*
f(x) conjugate viewpoint
| . every slope lambda picks
| . the tangent line that
| . touches f from below;
| .. its axis-crossing is -f*(lambda)
| .. \ tangent slope L
|.. \ slope L -> intercept -f*(L)
----+-----------\----- x ------------------------------
| \ a flat piece of f -> a corner of f*
| (intercept = -f*(L)) a corner of f -> a flat piece of f*
Worked example Beginner
Take the simplest interesting hill, a parabola . We compute its conjugate by the recipe and check the round-trip.
Step 1. Write down the tilted hill. For a fixed slope , form . We want the largest value of over all .
Step 2. Find the top of the tilted hill. The tilted hill is itself a downward parabola. Its peak sits where the slope of is zero. The slope of is , which is zero at .
Step 3. Read off the height. Plug back in:
So the conjugate of is — the same shape, now read in slope language. The parabola is its own conjugate.
Step 4. Round-trip check. Conjugating again, , which by the identical computation gives . We returned to the start.
What this tells us. The conjugate is genuinely a faithful re-encoding: nothing was lost, because the round-trip recovered exactly. For the parabola the two descriptions even look identical, but that is special. For a lopsided hill the conjugate has a different shape, yet the round-trip still lands back on whenever is convex.
Check your understanding Beginner
Formal definition Intermediate+
Let be a function, not identically , with effective domain . Such an is proper if and never takes the value . It is lower-semicontinuous (lsc) if is closed for every , equivalently if everywhere. It is convex if its epigraph is a convex subset of .
Definition (Fenchel conjugate). The Legendre-Fenchel transform (or convex conjugate) of is the function defined by
Here is the standard pairing. The supremum is over ; it may equal for some . As a pointwise supremum of the affine functions , the conjugate is always convex and lsc, regardless of any regularity of [Rockafellar §12].
The biconjugate is obtained by conjugating once more:
A convex function is closed if it is lsc; equivalently its epigraph is a closed convex set. (The only proper convex function that is lsc but whose epigraph would need separate handling is the constant , excluded by properness.)
Definition (essential smoothness and steepness). A proper convex is essentially smooth if the interior is non-empty, is differentiable there, and is steep: whenever . It is essentially strictly convex if it is strictly convex on every convex subset of . A closed proper convex is a Legendre function if it is both essentially smooth and essentially strictly convex; for such the gradient is a bijection of onto , with inverse , and the classical Legendre transform of 09.04.01 and 11.01.02 is recovered exactly [Rockafellar §26].
Counterexamples to common slips
- The conjugate of a non-convex is the conjugate of its convex envelope. If (non-convex), then coincides with the conjugate of the closed convex hull , and . Conjugation cannot detect the non-convex dimple; it always returns a convex function.
- A corner of becomes a flat piece of , and vice versa. For , the conjugate is the indicator for and otherwise — a flat plateau with a vertical wall. The non-differentiable corner of at corresponds to the flat segment in the domain of . Differentiability of one side is dual to strict convexity of the other.
- Steepness is not optional in the bijection. The function is smooth and strictly convex but not steep: stays bounded. Its conjugate has effective domain only , so is not onto . Steepness is exactly what forces .
Key theorem with proof Intermediate+
We prove the two structural pillars: the Fenchel-Young inequality, and the Fenchel-Moreau biconjugation theorem, the latter resting on the geometric Hahn-Banach separation of 02.11.02.
Theorem (Fenchel-Young inequality and biconjugation). Let be proper.
(i) (Fenchel-Young.) For all ,
with equality if and only if , the subdifferential of at .
(ii) (Fenchel-Moreau.) If in addition is convex and lsc, then .
Proof of (i). By the definition of the supremum, for every ,
which rearranges to . Equality holds precisely when attains the supremum, i.e. when is maximised at . That maximisation condition is for all , which is exactly the statement .
Proof of (ii). The inequality is immediate from (i): for every ,
since for each . The content is the reverse inequality .
Suppose, for contradiction, that at some . The epigraph is a closed convex subset of because is convex and lsc. The point lies strictly below , hence is not in it. The second geometric Hahn-Banach theorem 02.11.02 separates the compact convex point from the closed convex set strictly: there is a non-zero and with
Because ranges up to on the epigraph, . First take ; rescale so and set . Evaluating the right inequality at gives , i.e. for all , so . Then
a contradiction. The boundary case (a vertical separating hyperplane) forces for all ; combined with any for which (one exists because is proper, by an application of (i) at any subgradient point or a recession argument), the functionals drive as , so , again contradicting . Hence .
Bridge. This theorem builds toward the entire large-deviations dictionary and appears again in the analysis of every rate function. This is exactly the closed-convex involution that generalises the smooth, invertible Legendre transform of 09.04.01 and 11.01.02: where mechanics required to be a Legendre function so that inverts, biconjugation needs only closedness and convexity, recovering the smooth case when is essentially smooth and essentially strictly convex. The foundational reason the cumulant generating function and the rate function are interchangeable is that one is the Fenchel conjugate of the other, and putting these together with the Fenchel-Young equality condition pins down the optimal exponential tilt that realises a given rare value. The supporting-hyperplane step is dual to the subgradient existence guaranteed by Hahn-Banach separation 02.11.02.
Exercises Intermediate+
Advanced results Master
Conjugacy of subdifferentials and the duality of differentiability with strict convexity
For a closed proper convex , the Fenchel-Young equality condition promotes to an inversion of set-valued maps: if and only if , so as relations on . Single-valuedness of at (differentiability of ) corresponds to being constant on an exposed face (strict convexity of along that direction), and conversely. This is the precise sense in which differentiability of is dual to strict convexity of . When is a Legendre function in the sense of [Rockafellar §26], everywhere on , the inverse map is , and the smooth Legendre transform of mechanics 09.04.01 and thermodynamics 11.01.02 is recovered as the differentiable core of the general theory.
Cramér's theorem: the conjugate as rate function
Let be i.i.d. in with cumulant generating function , finite in a neighbourhood of the origin. Cramér's theorem [Cramér 1938] asserts that the empirical means satisfy a large deviation principle with good rate function : for closed and open ,
The upper bound is the Chernoff argument: for any , Markov's inequality on gives uniformly, and optimising over produces the conjugate . The lower bound uses the exponential change of measure , the tilted law, whose mean is ; the Fenchel-Young equality at the tilt solving identifies the cost. That is a good rate function is Exercise 7.
The Gärtner-Ellis theorem and steepness
When the are not i.i.d. but the scaled limit exists as a lsc convex function that is essentially smooth with , the Gärtner-Ellis theorem [Dembo & Zeitouni §2.3] gives the large deviation principle with rate function . Steepness is the load-bearing hypothesis: it guarantees that every in the interior of is exposed, i.e. realised as for some , so the lower bound's tilt exists. Without steepness, exposed points may be missing and the rate function can fail to be the conjugate at non-exposed boundary points — the same phenomenon visible in Exercise 8, now with probabilistic teeth.
Varadhan's lemma and the contraction principle
Two consequences extend the dictionary. Varadhan's lemma evaluates exponential integrals: if satisfies an LDP with good rate and is continuous with a mild tail condition, then , a Fenchel-type variational formula generalising the Laplace method. The contraction principle transports an LDP through a continuous map : the pushed-forward rate function is , an infimal projection that is conjugacy-compatible — conjugation turns infimal projections into restrictions, so in the linear case. The Legendre-Fenchel transform threads through all of these as the operation converting the multiplicative, moment-side description into the additive, cost-side description.
Synthesis. The Legendre-Fenchel transform is exactly the involution that generalises the smooth Legendre transform of mechanics 09.04.01 and thermodynamics 11.01.02 to the closed-convex category, and the central insight of large-deviation theory is that the cumulant generating function and the rate function are a conjugate pair. The foundational reason Cramér's theorem holds is that the Chernoff upper bound optimised over exponential tilts is the supremum defining , while the lower bound's change of measure realises the Fenchel-Young equality at the optimal tilt; putting these together, steepness and essential smoothness control exactly which points are exposed and hence where the conjugate is sharp, the same regularity that makes the mechanical transform a diffeomorphism. The construction is dual to the smooth case in the literal Fenchel sense, appears again in statistical mechanics where is the entropy and the free energy, and builds toward the full Cramér 37.07.01, Sanov, and Gärtner-Ellis 37.07.04 apparatus. The bridge is biconjugation: on the convex closure is what lets one pass freely between the rare-event cost and the moment generating data.
Full proof set Master
Proposition 1 (conjugacy inverts subdifferentials). Let be closed proper convex. Then $\lambda \in \partial f(x) \iff x \in \partial f^(\lambda) \iff f(x) + f^(\lambda) = \langle\lambda, x\rangle$.
Proof. The Fenchel-Young theorem (Key theorem, part (i)) gives . By Fenchel-Moreau (part (ii)), , so applying the same equivalence to in place of yields . The three conditions coincide.
Proposition 2 ( is convex and is a good rate function). Let be an -valued random vector with for in an open neighbourhood of . Then is convex on , , and $\Lambda^$ is non-negative, lsc, convex, with compact sublevel sets.*
Proof. Convexity is the Hölder argument of Exercise 6 applied coordinatewise to the pairing : for and , by Hölder with exponents ; take logarithms. . Non-negativity of : . Convexity and lower-semicontinuity of hold for any conjugate (Exercise 5). For compact sublevels, pick with the closed ball and set (a maximum of a continuous convex function on a compact set). For each choose (for ): . Hence , bounded; closed by lsc; therefore compact.
Proposition 3 (Chernoff upper bound). With as above and for i.i.d. copies, for any closed half-space with in the domain of ,
Proof. By Markov's inequality applied to the non-negative variable ,
using independence in the last step. Taking gives . Optimising the bound over admissible and over half-spaces containing a closed set recovers , the Cramér upper bound.
Connections Master
The smooth, invertible Legendre transform of Lagrangian-to-Hamiltonian mechanics
09.04.01is the special case of this transform restricted to essentially smooth, essentially strictly convex (Legendre) functions, where is a diffeomorphism and the supremum is attained at the unique stationary point; the general lsc-convex theory developed here is what that unit's "Legendre-Fenchel" coda gestures at without proving.The thermodynamic potentials of
11.01.02— Helmholtz and Gibbs free energies, enthalpy — are Legendre-Fenchel conjugates of the internal energy, and the convex-duality structure here is exactly the Legendre-duality between entropy and free energy that statistical mechanics makes precise; the rate function of large deviations is the thermodynamic entropy and the scaled free energy.The biconjugation theorem rests on the geometric Hahn-Banach separation of a point from the closed convex epigraph proved in
02.11.02; that unit's subdifferential and Fenchel-Moreau material is the functional-analytic engine, specialised here from a general normed space to with the additional probabilistic payload.Cramér's theorem
37.07.01takes the conjugate constructed here as its rate function, and the Gärtner-Ellis theorem37.07.04extends the identification to non-i.i.d. arrays under the steepness hypothesis whose meaning is pinned down in this unit's essential-smoothness discussion.
Historical & philosophical context Master
The one-variable Legendre transformation dates to Legendre's work on differential equations and the theory of surfaces in the 1780s, and entered mechanics through Hamilton (1834) and thermodynamics through Massieu (1869) and Gibbs (1873). Its convex-analytic generalisation is due to Werner Fenchel, who in On conjugate convex functions [Fenchel 1949] (Canadian Journal of Mathematics 1, 73-77) defined the conjugate of a general convex function via the supremum formula and proved the biconjugation duality without any smoothness hypothesis, replacing the requirement that be invertible by supporting-hyperplane arguments. Jean-Jacques Moreau extended the theory to infinite-dimensional spaces in the early 1960s, and R. Tyrrell Rockafellar's Convex Analysis [Rockafellar §12, §26] (Princeton, 1970) gave the now-standard systematic account, including the Legendre-function characterisation that locates the classical smooth transform inside the general framework.
The probabilistic significance traces to Harald Cramér's 1938 paper [Cramér 1938] (Actualités Scientifiques et Industrielles 736), which computed the exponential decay rate of large deviations of sums of i.i.d. variables and identified the rate as the conjugate of the cumulant generating function, although Cramér worked under analytic-density assumptions later removed. The general formulation as a large deviation principle is due to Varadhan (1966) and was placed in its convex-duality setting by Lanford, Ellis, Gärtner, and others through the 1970s; Dembo and Zeitouni [Dembo & Zeitouni §2.2] systematised the i.i.d. and Gärtner-Ellis routes. The identification of with thermodynamic entropy and with free energy was made explicit by Ellis (1985) and recast for physicists by Touchette (2009).
Bibliography Master
@article{fenchel1949conjugate,
author = {Fenchel, Werner},
title = {On conjugate convex functions},
journal = {Canadian Journal of Mathematics},
volume = {1},
pages = {73--77},
year = {1949}
}
@book{rockafellar1970convex,
author = {Rockafellar, R. Tyrrell},
title = {Convex Analysis},
series = {Princeton Mathematical Series},
number = {28},
publisher = {Princeton University Press},
year = {1970}
}
@book{dembozeitouni1998ldp,
author = {Dembo, Amir and Zeitouni, Ofer},
title = {Large Deviations Techniques and Applications},
edition = {2nd},
series = {Applications of Mathematics},
number = {38},
publisher = {Springer},
year = {1998}
}
@article{cramer1938nouveau,
author = {Cram\'er, Harald},
title = {Sur un nouveau th\'eor\`eme-limite de la th\'eorie des probabilit\'es},
journal = {Actualit\'es Scientifiques et Industrielles},
volume = {736},
pages = {5--23},
year = {1938}
}
@book{ellis1985entropy,
author = {Ellis, Richard S.},
title = {Entropy, Large Deviations, and Statistical Mechanics},
series = {Grundlehren der mathematischen Wissenschaften},
number = {271},
publisher = {Springer},
year = {1985}
}
@article{touchette2009large,
author = {Touchette, Hugo},
title = {The large deviation approach to statistical mechanics},
journal = {Physics Reports},
volume = {478},
number = {1--3},
pages = {1--69},
year = {2009}
}
@book{hiriarturruty2001fundamentals,
author = {Hiriart-Urruty, Jean-Baptiste and Lemar\'echal, Claude},
title = {Fundamentals of Convex Analysis},
series = {Grundlehren Text Editions},
publisher = {Springer},
year = {2001}
}