Characteristic Functions, Inversion, and the Lévy Continuity Theorem
Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.3-3.4; Kallenberg, Foundations of Modern Probability 2e Ch. 5; Feller II §XV-XVII; Lukacs, Characteristic Functions 2e; Bochner, Harmonic Analysis and the Theory of Probability Ch. 3-4
Intuition Beginner
Every probability distribution carries a kind of fingerprint. Instead of describing the distribution by its density or by the chance of landing in each interval, you describe it by how it responds to being wrapped around a circle at every possible speed. Spin the values of a random quantity around a clock face at speed $t$, take the average position of the resulting cloud of points, and record that average. Doing this for every speed produces a single function, the characteristic function, and that function pins down the distribution completely.
The reason this fingerprint is worth the trouble is that it turns hard operations on distributions into easy operations on functions. Adding two independent random quantities is a messy smearing operation on densities, but their fingerprints simply multiply. The bell-shaped normal law has an especially clean fingerprint, and that cleanliness is exactly why sums of many small independent contributions tend toward a bell curve: when you multiply many fingerprints together and rescale, the product slides toward the normal fingerprint.
The deepest payoff is a translation rule for convergence. A sequence of distributions settling down to a limit is hard to check directly, but it corresponds to their fingerprints settling down pointwise. As long as the limiting fingerprint is well-behaved right at speed zero, pointwise settling of the fingerprints guarantees the distributions themselves are converging. This dictionary between fingerprints and distributions is the engine behind nearly every classical limit theorem.
The one-sentence takeaway: the characteristic function is an invertible fingerprint of a distribution that converts convolution into multiplication and converts convergence of distributions into pointwise convergence of functions.
Visual Beginner
Picture the values of a random quantity scattered along a line. Now wrap that line around a unit circle at a chosen winding speed: each value lands at some angle, becoming a point on the circle. The characteristic function at that speed is the center of mass of all those circle points, a single dot somewhere inside the disk. When the values are tightly clustered, the dots land close together and their center of mass sits near the rim. When the values are spread out, the dots smear all the way around the circle and their center of mass collapses toward the middle.
At winding speed zero every value maps to the same point, so the center of mass sits exactly on the rim and the characteristic function equals one. As the speed grows, faster winding spreads the dots and pulls the center of mass inward, which is why the fingerprint starts at one and generally fades. How fast it fades records how spread out the distribution is.
Worked example Beginner
Take a fair coin scored as plus one for heads and minus one for tails, each with chance one half. We compute its fingerprint as a function of winding speed $t$.
Step 1. Wrapping a value $x$ around the circle at speed $t$ sends it to the point with angle $tx$, whose horizontal and vertical coordinates are the cosine and sine of $tx$. The fingerprint at speed $t$ is the average of these circle points over the distribution.
Step 2. Our value is plus one with chance one half and minus one with chance one half. So the average circle point has horizontal coordinate equal to one half times the cosine of $t$ plus one half times the cosine of $-t$. Because cosine is symmetric, the cosine of $-t$ equals the cosine of $t$, and the two halves combine to give exactly the cosine of $t$.
Step 3. The vertical coordinate is one half times the sine of $t$ plus one half times the sine of $-t$. Because sine is antisymmetric, the sine of $-t$ is the negative of the sine of $t$, and the two halves cancel to zero. So the fingerprint is purely horizontal: it equals the cosine of $t$.
Step 4. Check the speed-zero value: the cosine of zero is one, matching the rule that every fingerprint starts at one. Check a quarter turn: at the speed where $t$ equals a quarter of the full turn, $t = \pi/2$, the cosine is zero, meaning the two circle points sit at opposite ends of a diameter and their center of mass is the origin.
Step 5. What this tells us: the symmetry of the coin shows up as a real, even fingerprint with no vertical part at all. A lopsided coin would tilt the average off the horizontal axis, producing a vertical component, so the imaginary part of a fingerprint records the asymmetry of a distribution. The whole distribution is recoverable from the single function $\cos t$, which is the simplest instance of the inversion idea developed later.
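The five steps above can be replayed numerically. The sketch below is illustrative only: the helper name `char_fn` and the lopsided weights 0.7 and 0.3 are assumptions made for the example, not part of the worked example itself. It averages the circle points for the fair coin and compares the result with the cosine.

```python
import cmath
import math

def char_fn(values, probs, t):
    """Average of the circle points e^{i t x} over a finite discrete law."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

# Fair coin: +1 and -1, each with chance one half.
coin_values = [1.0, -1.0]
coin_probs = [0.5, 0.5]

for t in [0.0, 0.5, math.pi / 2, 2.0]:
    phi = char_fn(coin_values, coin_probs, t)
    # Horizontal part should match cos(t); vertical part should vanish.
    print(f"t={t:.4f}  real={phi.real:+.6f}  imag={phi.imag:+.6f}  cos(t)={math.cos(t):+.6f}")

# A lopsided coin (hypothetical weights) acquires a vertical component.
lopsided = char_fn(coin_values, [0.7, 0.3], 1.0)
print("lopsided coin at t=1:", lopsided)
```

For the lopsided weights the vertical part works out to $0.4 \sin t$, which is the asymmetry showing up in the imaginary part.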
Check your understanding Beginner
Formal definition Intermediate+
Let $\mu$ be a Borel probability measure on $\mathbb{R}^n$ and let $X$ be a random vector with law $\mu$, defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ [from 26.02.01]. The characteristic function of $X$ (equivalently, of $\mu$) is the function $\varphi_X : \mathbb{R}^n \to \mathbb{C}$ defined by $$ \varphi_X(t) = \mathbb{E}\!\left[ e^{i\, t \cdot X} \right] = \int_{\mathbb{R}^n} e^{i\, t \cdot x} \, d\mu(x), \qquad t \in \mathbb{R}^n, $$ where $t \cdot x$ is the Euclidean inner product. The integrand has modulus one and $\mu$ is a probability measure, so the integral converges absolutely for every $t$, with $|\varphi_X(t)| \le 1$. Up to the sign in the exponent and the absence of a $2\pi$-dependent normalisation constant, $\varphi_X$ is the Fourier transform of the measure $\mu$ [from 02.10.04]; the probabilistic convention places $+i\, t \cdot x$ (no $2\pi$) in the exponent so that derivatives at the origin read off moments cleanly.
The expected-value integral is the one introduced for random vectors in 26.03.01, applied to the bounded complex-valued function $x \mapsto e^{i\, t \cdot x}$.
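Basic properties. For random vectors $X, Y$ in $\mathbb{R}^n$, a matrix $A \in \mathbb{R}^{m \times n}$, and $b \in \mathbb{R}^m$:
- Normalisation and bound. $\varphi_X(0) = 1$ and $|\varphi_X(t)| \le 1$ for all $t$.
- Hermitian symmetry. $\varphi_X(-t) = \overline{\varphi_X(t)}$. In particular $\varphi_X$ is real-valued iff $X$ and $-X$ have the same law (symmetric $\mu$).
- Affine maps. $\varphi_{AX + b}(t) = e^{i\, t \cdot b}\, \varphi_X(A^{\mathsf{T}} t)$ for $t \in \mathbb{R}^m$.
- Independence. If $X$ and $Y$ are independent then $\varphi_{X+Y}(t) = \varphi_X(t)\, \varphi_Y(t)$; the law of $X + Y$ is the convolution $\mu_X * \mu_Y$, and the characteristic function turns convolution into pointwise product.
- Uniform continuity. $\varphi_X$ is uniformly continuous on $\mathbb{R}^n$.
- Positive-definiteness. For all $t_1, \dots, t_m \in \mathbb{R}^n$ and $c_1, \dots, c_m \in \mathbb{C}$, $\sum_{j,k} c_j \overline{c_k}\, \varphi_X(t_j - t_k) \ge 0$.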
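The independence and Hermitian-symmetry properties in the list above can be checked exactly on small discrete laws. The following is a minimal sketch in one dimension; the helper names `char_fn` and `convolve`, and the choice of a die and a coin, are assumptions for illustration.

```python
import cmath

def char_fn(pmf, t):
    """Characteristic function of a finite law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

def convolve(pmf_x, pmf_y):
    """Law of X + Y for independent X and Y with finite laws."""
    out = {}
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] = out.get(x + y, 0.0) + px * py
    return out

die = {k: 1 / 6 for k in range(1, 7)}
coin = {-1: 0.5, 1: 0.5}
total = convolve(die, coin)  # law of die + coin

t = 0.8
lhs = char_fn(total, t)                    # transform of the convolution
rhs = char_fn(die, t) * char_fn(coin, t)   # product of the transforms
herm_gap = abs(char_fn(die, -t) - char_fn(die, t).conjugate())

print("product-rule gap:", abs(lhs - rhs))
print("Hermitian-symmetry gap:", herm_gap)
```

Both gaps should be at the level of floating-point rounding, since for finite laws the identities hold exactly.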
Definition (positive-definite function). A function $\varphi : \mathbb{R}^n \to \mathbb{C}$ is positive-definite when for every finite collection $t_1, \dots, t_m \in \mathbb{R}^n$ the Hermitian matrix $\big[\varphi(t_j - t_k)\big]_{j,k=1}^{m}$ is positive semidefinite. Every characteristic function is continuous, positive-definite, and equal to one at the origin; Bochner's theorem asserts the converse, identifying characteristic functions with exactly the continuous positive-definite functions normalised at the origin.
Definition (moments via derivatives). If $\mathbb{E}|X|^k < \infty$ then $\varphi_X \in C^k(\mathbb{R}^n)$ and the mixed partial derivatives at the origin recover the moments: for a multi-index $\alpha$ with $|\alpha| \le k$, $$ \partial^\alpha \varphi_X(0) = i^{|\alpha|}\, \mathbb{E}\!\left[ X^\alpha \right], \qquad X^\alpha = X_1^{\alpha_1} \cdots X_n^{\alpha_n}. $$ In one dimension this reads $\varphi_X^{(k)}(0) = i^k\, \mathbb{E}[X^k]$, so $\mathbb{E}[X] = -i\, \varphi_X'(0)$ and $\mathbb{E}[X^2] = -\varphi_X''(0)$.
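The one-dimensional moment rule can be illustrated with finite differences. This is a rough numerical sketch, not a proof: the fair die, the helper name `char_fn`, and the step size `h` are assumptions chosen for the example, and central differences only approximate the derivatives at the origin.

```python
import cmath

def char_fn(pmf, t):
    """Characteristic function of a finite law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

die = {k: 1 / 6 for k in range(1, 7)}
h = 1e-3  # illustrative step for the central differences

# Central-difference approximations to phi'(0) and phi''(0).
d1 = (char_fn(die, h) - char_fn(die, -h)) / (2 * h)
d2 = (char_fn(die, h) - 2 * char_fn(die, 0.0) + char_fn(die, -h)) / h**2

mean_from_phi = (d1 / 1j).real        # phi'(0) = i E[X]
second_moment_from_phi = (-d2).real   # phi''(0) = -E[X^2]

mean_exact = sum(p * x for x, p in die.items())
second_exact = sum(p * x**2 for x, p in die.items())

print(mean_from_phi, mean_exact)
print(second_moment_from_phi, second_exact)
```

The recovered values should agree with the directly computed mean and second moment up to the discretisation error of the difference quotients.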
Counterexamples to common slips Intermediate+
- Pointwise convergence alone is not enough. The continuity-at-zero hypothesis in the Lévy theorem is essential. The characteristic functions $\varphi_n(t) = e^{-n t^2 / 2}$ of the centred Gaussian with variance $n$ converge pointwise to the function equal to one at $t = 0$ and zero elsewhere. That limit is discontinuous at the origin, the mass escapes to infinity, and there is no limiting probability measure. Without continuity at zero the conclusion fails.
- Differentiability of $\varphi_X$ does not guarantee a finite mean. The implication "$\mathbb{E}|X|^k < \infty \Rightarrow \varphi_X \in C^k$" is one-directional in general. There exist distributions with no finite mean whose characteristic function is differentiable at the origin; the clean converse holds for even orders, where finite $\varphi_X^{(2k)}(0)$ does force $\mathbb{E}[X^{2k}] < \infty$.
- Real characteristic function does not mean real random variable. $\varphi_X$ is real exactly when $X$ is symmetric about the origin, not when $X$ takes real values (it always does). The fair-coin fingerprint is real because the law is symmetric.
- A modulus-one value off the origin signals a lattice. If $|\varphi_X(t_0)| = 1$ for some $t_0 \ne 0$, then $X$ is supported on a coset of the lattice $(2\pi / t_0)\mathbb{Z}$. Continuous distributions have $|\varphi_X(t)| < 1$ for all $t \ne 0$.
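Two of the slips above are easy to see numerically. The sketch below assumes the Gaussian family with variance $n$ for the first bullet and, for the last bullet, a hypothetical lattice law uniform on $\{1, 4, 7\}$ (a coset of $3\mathbb{Z}$); the function names are illustrative.

```python
import cmath
import math

def gaussian_cf(t, variance):
    """Characteristic function of a centred Gaussian with the given variance."""
    return math.exp(-variance * t * t / 2)

# Escaping mass: the value at 0 stays 1 while every other point collapses to 0.
for n in [1, 10, 100, 10_000]:
    print(n, gaussian_cf(0.0, n), gaussian_cf(0.5, n))

def char_fn(pmf, t):
    """Characteristic function of a finite law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())

# Lattice law on 1 + 3Z: the modulus returns to one at t0 = 2*pi/3.
lattice = {1: 1 / 3, 4: 1 / 3, 7: 1 / 3}
t0 = 2 * math.pi / 3
modulus_at_t0 = abs(char_fn(lattice, t0))
modulus_elsewhere = abs(char_fn(lattice, 1.0))
print(modulus_at_t0, modulus_elsewhere)
```

The pointwise limit of the Gaussian family is visibly discontinuous at the origin, and the lattice law reaches modulus one exactly at the frequency dual to its span.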
Key theorem with proof Intermediate+
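Theorem (Lévy inversion and the continuity theorem). Let $\mu, \mu_1, \mu_2, \dots$ be Borel probability measures on $\mathbb{R}$ with characteristic functions $\varphi, \varphi_1, \varphi_2, \dots$.
(Inversion.) For any $a < b$ that are continuity points of $\mu$ (i.e. $\mu(\{a\}) = \mu(\{b\}) = 0$), $$ \mu\big((a, b)\big) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-i t a} - e^{-i t b}}{i t}\, \varphi(t)\, dt. $$ Consequently $\varphi$ determines $\mu$ uniquely.
(Continuity.) If $\varphi_n(t) \to \varphi(t)$ for every $t$, then $\mu_n \Rightarrow \mu$ (weak convergence). Conversely, if $\varphi_\infty(t) = \lim_n \varphi_n(t)$ exists for every $t$ and $\varphi_\infty$ is continuous at $0$, then $\varphi_\infty$ is the characteristic function of a probability measure $\mu_\infty$ and $\mu_n \Rightarrow \mu_\infty$.
Proof. Inversion. Fix $a < b$ and define $I_T = \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-ita} - e^{-itb}}{it}\, \varphi(t)\, dt$. Substitute $\varphi(t) = \int_{\mathbb{R}} e^{itx}\, d\mu(x)$ and apply Fubini [from 02.10.04], which is legitimate because the integrand is bounded by $b - a$ uniformly in $t$ (since $\frac{e^{-ita} - e^{-itb}}{it} = \int_a^b e^{-ity}\, dy$), so the double integrand is integrable on $[-T, T] \times \mathbb{R}$: $$ I_T = \int_{\mathbb{R}} \left( \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{it(x - a)} - e^{it(x - b)}}{it}\, dt \right) d\mu(x). $$ The inner integral is evaluated using the Dirichlet integral: with $S(\theta) = \int_0^{\theta} \frac{\sin u}{u}\, du$, the bracket equals $\frac{1}{\pi}\big[ S(T(x - a)) - S(T(x - b)) \big]$ after combining the real parts (the imaginary parts cancel by oddness). As $T \to \infty$, $S(Tc) \to \frac{\pi}{2}\, \mathrm{sgn}(c)$, so the bracket tends to the function equal to $1$ for $a < x < b$, equal to $\tfrac12$ at $x \in \{a, b\}$, and equal to $0$ otherwise. The functions are uniformly bounded (the sine integral is bounded), so by bounded convergence $I_T \to \mu\big((a, b)\big) + \tfrac12 \mu(\{a\}) + \tfrac12 \mu(\{b\})$. When $a, b$ are continuity points the boundary term vanishes, giving the stated formula. Uniqueness follows: two measures with the same $\varphi$ assign the same value to every interval with continuity-point endpoints, and such intervals form a $\pi$-system generating the Borel $\sigma$-algebra, so the measures coincide.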
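As a sanity check on the inversion formula, one can evaluate the truncated integral numerically for a law whose interval probabilities are known. The sketch below assumes the standard normal law, a plain midpoint rule, and illustrative values of `T` and `steps`; none of these choices come from the proof itself.

```python
import cmath
import math

def inversion_integral(cf, a, b, T=10.0, steps=4000):
    """Midpoint-rule value of (1/2pi) int_{-T}^{T} (e^{-ita} - e^{-itb})/(it) cf(t) dt."""
    h = 2 * T / steps
    total = 0.0 + 0.0j
    for k in range(steps):
        t = -T + (k + 0.5) * h  # with an even step count the midpoints skip t = 0
        kernel = (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t)
        total += kernel * cf(t) * h
    return total / (2 * math.pi)

def normal_cf(t):
    """Characteristic function of the standard normal law."""
    return math.exp(-t * t / 2)

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 1.0
approx = inversion_integral(normal_cf, a, b)
exact = normal_cdf(b) - normal_cdf(a)
print(approx.real, exact, approx.imag)
```

The real part should reproduce the probability of the interval and the imaginary part should cancel, mirroring the oddness argument in the proof.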
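Continuity, forward direction. Suppose $\varphi_n \to \varphi$ pointwise, with $\varphi$ the characteristic function of $\mu$. Weak convergence is equivalent to $\int f\, d\mu_n \to \int f\, d\mu$ for all bounded continuous $f$. One could try to test against the exponentials $x \mapsto e^{itx}$ and approximate general test functions by their combinations, but the clean route is through tightness. The family $\{\mu_n\}$ is tight: continuity of $\varphi$ at $0$, together with the bound (proved below) $\mu_n\big(\{x : |x| \ge 2/u\}\big) \le \frac1u \int_{-u}^{u} \big(1 - \mathrm{Re}\, \varphi_n(t)\big)\, dt$, forces uniform control of the tails because the right side tends to $0$ as $u \to 0$ uniformly in $n$ (dominated convergence using $|1 - \mathrm{Re}\, \varphi_n| \le 2$ and $\varphi_n \to \varphi$ with $\varphi$ continuous at $0$). By Prokhorov's theorem tightness yields a weakly convergent subsequence $\mu_{n_k} \Rightarrow \nu$; its characteristic function is $\varphi$, so $\nu = \mu$ by uniqueness. Every subsequence has a further subsequence converging to the same $\mu$, hence $\mu_n \Rightarrow \mu$.
Continuity, converse. Assume only that $\varphi_\infty = \lim_n \varphi_n$ exists pointwise and is continuous at $0$. The tail bound above uses only $\varphi_n$, and continuity of $\varphi_\infty$ at $0$ with $\varphi_\infty(0) = 1$ gives, by dominated convergence, $\lim_n \frac1u \int_{-u}^{u} \big(1 - \mathrm{Re}\, \varphi_n(t)\big)\, dt = \frac1u \int_{-u}^{u} \big(1 - \mathrm{Re}\, \varphi_\infty(t)\big)\, dt \to 0$ as $u \to 0$. So $\{\mu_n\}$ is tight, Prokhorov supplies a subsequential weak limit $\mu_\infty$ with characteristic function $\varphi_\infty$, and $\varphi_\infty$ is therefore the characteristic function of a probability measure. Uniqueness of the subsequential limit forces $\mu_n \Rightarrow \mu_\infty$.
The tail bound. For any probability measure $\rho$ on $\mathbb{R}$ with characteristic function $\varphi$ and any $u > 0$, Fubini gives $$ \frac1u \int_{-u}^{u}\big(1 - \mathrm{Re}\,\varphi(t)\big)\, dt = \int_{\mathbb{R}} \left( \frac1u \int_{-u}^{u}(1 - \cos(tx))\, dt \right) d\rho(x) = \int_{\mathbb{R}} 2\Big(1 - \frac{\sin(ux)}{ux}\Big)\, d\rho(x). $$ Since $1 - \frac{\sin y}{y} \ge 0$ for all $y$ and $1 - \frac{\sin y}{y} \ge \tfrac12$ whenever $|y| \ge 2$, the integrand is at least $\mathbf{1}\{|x| \ge 2/u\}$, yielding $\rho\big(\{x : |x| \ge 2/u\}\big) \le \frac1u \int_{-u}^{u} \big(1 - \mathrm{Re}\, \varphi(t)\big)\, dt$, which is the bound used above.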
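The tail bound can be spot-checked for a concrete law. The sketch below assumes the standard normal law (whose tail mass is available through `math.erfc`), a midpoint rule for the averaged defect, and an illustrative grid of `u` values; it is a numerical illustration rather than part of the proof.

```python
import math

def tail_mass_normal(r):
    """P(|Z| >= r) for a standard normal Z."""
    return math.erfc(r / math.sqrt(2))

def normal_cf(t):
    """Characteristic function of the standard normal law (real-valued)."""
    return math.exp(-t * t / 2)

def averaged_defect(cf_real, u, steps=2000):
    """Midpoint-rule value of (1/u) * int_{-u}^{u} (1 - Re cf(t)) dt."""
    h = 2 * u / steps
    total = sum(1 - cf_real(-u + (k + 0.5) * h) for k in range(steps)) * h
    return total / u

# Each row: (u, tail mass beyond 2/u, averaged defect of the transform near 0).
rows = [(u, tail_mass_normal(2 / u), averaged_defect(normal_cf, u))
        for u in [0.1, 0.5, 1.0, 2.0, 4.0]]
for u, lhs, rhs in rows:
    print(f"u={u:<4}  tail={lhs:.6f}  bound={rhs:.6f}")
```

In every row the tail mass should sit below the averaged defect, and the defect shrinks as $u \to 0$, which is the mechanism that delivers tightness.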
Bridge. This theorem builds toward the central limit theorem and the entire theory of infinitely divisible laws, and the same inversion-plus-continuity mechanism appears again in the Lévy-Khinchine representation and in the proof of the law of large numbers via characteristic functions. The foundational reason the continuity theorem holds is that the characteristic transform is injective and bicontinuous between two topologies, weak convergence of measures and pointwise convergence of fingerprints, once tail mass is prevented from escaping; this is exactly the role tightness plays, converting pointwise control of $\varphi_n$ into uniform control of the measures. The continuity theorem generalises the inversion formula from a single fixed measure to a whole convergent sequence, and the bridge is that Bochner's theorem completes the circle: it tells us precisely which limit functions can arise, namely the continuous positive-definite ones, so that the continuity-at-zero hypothesis is not a technical patch but the exact boundary between fingerprints that come from probability measures and those that do not.
Exercises Intermediate+
Advanced results Master
The characteristic transform $\mu \mapsto \varphi_\mu$ is a homeomorphism from the space of probability measures on $\mathbb{R}^n$, under weak convergence, onto the set of continuous positive-definite functions normalised at the origin, under uniform convergence on compacts. Bochner's theorem is the surjectivity half: a function $\varphi : \mathbb{R}^n \to \mathbb{C}$ is the characteristic function of some probability measure iff $\varphi$ is continuous, positive-definite, and satisfies $\varphi(0) = 1$. The proof recovers the measure by an inversion-with-regularisation argument: multiply $\varphi$ by a Gaussian damping $e^{-\varepsilon |t|^2 / 2}$, invert the now-integrable product to obtain a non-negative density (non-negativity is exactly positive-definiteness), and pass $\varepsilon \to 0$ using the continuity theorem. This is the finite-dimensional model whose infinite-dimensional generalisation is the Bochner-Minlos theorem on nuclear spaces [from 02.10.06]: there the dual of a nuclear space replaces $\mathbb{R}^n$, the continuity hypothesis is upgraded to continuity in the nuclear topology, and the resulting measure lives on the topological dual.
Moment expansions sharpen the inversion-continuity dictionary into rates. If $\mathbb{E}|X|^k < \infty$ then $$ \varphi_X(t) = \sum_{j=0}^{k} \frac{(it)^j}{j!}\, \mathbb{E}[X^j] + o(|t|^k), \qquad t \to 0, $$ and the error term is controlled by $\mathbb{E}\big[\min\big(|tX|^{k+1}/(k+1)!,\ 2|tX|^k/k!\big)\big]$. This Taylor control at the origin is what turns the central limit theorem into a quantitative statement: standardising and raising to the $n$-th power, the quadratic term survives and the cubic and higher terms wash out at rate $n^{-1/2}$, the Berry-Esseen rate when a third moment is present.
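The survival of the quadratic term can be watched directly for the fair coin, whose fingerprint is $\cos t$. The sketch below is a minimal illustration under that assumption: the standardised sum of $n$ coin flips has characteristic function $\cos(t/\sqrt{n})^n$, and the frequency `t = 1.5` is an arbitrary choice. Because the coin is symmetric its third moment vanishes, so the gap here shrinks faster than the general $n^{-1/2}$ rate.

```python
import math

def standardized_coin_cf(t, n):
    """Characteristic function of (X_1 + ... + X_n) / sqrt(n) for fair +/-1 coins."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-t * t / 2)  # standard normal characteristic function at t

errors = {n: abs(standardized_coin_cf(t, n) - target) for n in [10, 100, 10_000]}
for n, err in errors.items():
    print(f"n={n:<6} cf={standardized_coin_cf(t, n):.6f}  normal={target:.6f}  gap={err:.2e}")
```

The gaps decrease with $n$, which is the transform-side picture of the central limit theorem for this example.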
The continuity theorem upgrades to a metric statement. The Lévy metric and the bounded-Lipschitz metric both metrise weak convergence, and a characteristic-function distance over a finite window $|t| \le T$, combined with a tail bound from a smoothing inequality, dominates the Kolmogorov distance between distribution functions. This is the Esseen smoothing inequality: for distributions $F, G$ with $G$ having bounded density $|G'| \le m$, $$ \sup_x |F(x) - G(x)| \le \frac{1}{\pi}\int_{-T}^{T} \left| \frac{\varphi_F(t) - \varphi_G(t)}{t} \right| dt + \frac{24\, m}{\pi T}. $$ This converts pointwise closeness of characteristic functions on a finite window into uniform closeness of distribution functions, the analytic core of the Berry-Esseen theorem.
Stability and infinite divisibility are read directly off the fingerprint. A law is infinitely divisible iff for every $n \ge 1$ its characteristic function has an $n$-th root that is again a characteristic function, and the Lévy-Khinchine formula gives the universal form $\varphi(t) = \exp\!\Big\{ i b t - \tfrac12 \sigma^2 t^2 + \int_{\mathbb{R} \setminus \{0\}} \big( e^{itx} - 1 - i t x\, \mathbf{1}_{\{|x| \le 1\}} \big)\, \nu(dx) \Big\}$ with $\nu$ the Lévy measure. The Gaussian ($\nu = 0$) and the Poisson ($\sigma = 0$, $\nu$ atomic) are the extreme cases; the whole class is the set of possible limits of triangular arrays of small independent summands, characterised entirely through the continuity theorem applied to products of fingerprints.
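The $n$-th root criterion is easy to see for the Poisson law, whose characteristic function is $\exp\{\lambda(e^{it} - 1)\}$: its $n$-th root is the characteristic function of a Poisson law with rate $\lambda / n$. The sketch below checks this, and also checks the closed form against a truncated sum over the probability mass function. The parameter values and the truncation at 80 terms are assumptions for illustration.

```python
import cmath
import math

def poisson_cf(t, lam):
    """Closed-form characteristic function of a Poisson(lam) law."""
    return cmath.exp(lam * (cmath.exp(1j * t) - 1))

def poisson_cf_from_pmf(t, lam, terms=80):
    """Truncated sum of e^{itk} P(N = k), as a cross-check of the closed form."""
    total = 0j
    p = math.exp(-lam)  # P(N = 0)
    for k in range(terms):
        total += p * cmath.exp(1j * t * k)
        p *= lam / (k + 1)  # P(N = k + 1) from P(N = k)
    return total

lam, n, t = 3.0, 7, 1.1
root = poisson_cf(t, lam / n)  # fingerprint of one of n independent Poisson(lam/n) pieces
gap_root = abs(root ** n - poisson_cf(t, lam))
gap_pmf = abs(poisson_cf_from_pmf(t, lam) - poisson_cf(t, lam))
print(gap_root, gap_pmf)
```

Both gaps should be at rounding level, reflecting that a Poisson variable splits into $n$ independent Poisson pieces for every $n$.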
Synthesis. The foundational reason this transform organises classical probability is that it linearises convolution: the messy operation of adding independent laws becomes multiplication of fingerprints, and the central insight is that multiplication is analysable through its behaviour near the origin, where moments live. Putting these together, inversion makes the transform injective, Bochner's theorem makes its range exactly the continuous positive-definite functions, and the continuity theorem makes it bicontinuous once tightness blocks the escape of mass — three facts that are dual to one another, since injectivity plus surjectivity onto a closed set plus bicontinuity is precisely a homeomorphism. This is exactly why the central limit theorem is a one-line computation in the transform domain: the standardised sum's fingerprint is a power of a fingerprint expanded near zero, the quadratic Gaussian term survives, and the continuity theorem reads the surviving exponential back as the normal law. The same machinery generalises upward to the Lévy-Khinchine classification of all limit laws and appears again in the nuclear-space Bochner-Minlos theorem, where the finite-dimensional positive-definiteness condition becomes the seed of a measure on an infinite-dimensional dual.
Full proof set Master
The inversion formula, the uniqueness corollary, the continuity theorem in both directions, and the tail bound are proved in full in the Key theorem section. The remaining Master claims are recorded here.
Proposition (uniform continuity and positive-definiteness). Every characteristic function on $\mathbb{R}^n$ is uniformly continuous and positive-definite.
Proof. For uniform continuity, estimate $|\varphi_X(t + h) - \varphi_X(t)| \le \mathbb{E}\big| e^{i\, h \cdot X} - 1 \big|$, a bound independent of $t$. As $h \to 0$ the integrand tends to $0$ pointwise and is dominated by $2$, so dominated convergence gives $\sup_t |\varphi_X(t + h) - \varphi_X(t)| \to 0$, which is uniform continuity. For positive-definiteness, given $t_1, \dots, t_m \in \mathbb{R}^n$ and $c_1, \dots, c_m \in \mathbb{C}$, $$ \sum_{j,k} c_j \overline{c_k}\, \varphi_X(t_j - t_k) = \mathbb{E}\Big[\sum_{j,k} c_j \overline{c_k}\, e^{i(t_j - t_k)\cdot X}\Big] = \mathbb{E}\Big[\Big| \sum_j c_j\, e^{i t_j \cdot X}\Big|^2\Big] \ge 0. \qquad \square $$
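The positive-definiteness identity in this proof can be tested on random inputs. The sketch below assumes the fair-coin fingerprint $\cos t$, random frequencies and complex coefficients drawn with a fixed seed, and hypothetical helper names; it compares the quadratic form with the expected squared modulus computed directly.

```python
import cmath
import math
import random

def quadratic_form(cf, ts, cs):
    """Sum over j, k of c_j * conj(c_k) * cf(t_j - t_k)."""
    return sum(
        cj * ck.conjugate() * cf(tj - tk)
        for tj, cj in zip(ts, cs)
        for tk, ck in zip(ts, cs)
    )

def expected_square(ts, cs):
    """E|sum_j c_j e^{i t_j X}|^2 for the fair coin X = +1 or -1."""
    plus = abs(sum(c * cmath.exp(1j * t) for t, c in zip(ts, cs))) ** 2
    minus = abs(sum(c * cmath.exp(-1j * t) for t, c in zip(ts, cs))) ** 2
    return 0.5 * plus + 0.5 * minus

random.seed(0)  # fixed seed so the run is reproducible
smallest = float("inf")
largest_gap = 0.0
for _ in range(200):
    m = random.randint(1, 6)
    ts = [random.uniform(-5.0, 5.0) for _ in range(m)]
    cs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(m)]
    q = quadratic_form(math.cos, ts, cs)
    smallest = min(smallest, q.real)
    largest_gap = max(largest_gap, abs(q - expected_square(ts, cs)))

print("smallest quadratic form:", smallest)
print("largest gap to E|...|^2:", largest_gap)
```

The quadratic form should never be negative beyond rounding error, and it should match the expected squared modulus on every trial.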
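Proposition (moments from derivatives). If $\mathbb{E}|X|^k < \infty$ in dimension one, then $\varphi_X \in C^k(\mathbb{R})$ and $\varphi_X^{(k)}(0) = i^k\, \mathbb{E}[X^k]$.
Proof. The function $t \mapsto e^{itX}$ has $j$-th $t$-derivative $(iX)^j e^{itX}$ with modulus $|X|^j$, integrable for $j \le k$ by hypothesis. Differentiation under the expectation is justified $k$ times by dominated convergence with dominating functions $|X|^j$, giving $\varphi_X^{(j)}(t) = \mathbb{E}\big[(iX)^j e^{itX}\big]$. Continuity of each $\varphi_X^{(j)}$ follows from the argument of the previous proposition applied to the finite signed measure $x^j\, d\mu(x)$. Setting $t = 0$ yields $\varphi_X^{(k)}(0) = i^k\, \mathbb{E}[X^k]$.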
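Proposition (Bochner, regularised-inversion proof). A continuous positive-definite $\varphi : \mathbb{R} \to \mathbb{C}$ with $\varphi(0) = 1$ is the characteristic function of a probability measure on $\mathbb{R}$.
Proof. For $\varepsilon > 0$ set $\varphi_\varepsilon(t) = \varphi(t)\, e^{-\varepsilon t^2 / 2}$, which is positive-definite (a product of positive-definite functions, by the Schur product theorem) and integrable. Define $f_\varepsilon(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-itx}\, \varphi_\varepsilon(t)\, dt$. Positive-definiteness of $\varphi_\varepsilon$ makes $f_\varepsilon \ge 0$: write the inverse transform as a limit of Riemann sums of positive-definite quadratic forms and pass to the limit. Total mass is $\varphi_\varepsilon(0) = 1$ in the limit, so each $f_\varepsilon$ is (after the normalisation that the Gaussian convolution supplies) a probability density, and its characteristic function is $\varphi_\varepsilon$. As $\varepsilon \to 0$ these fingerprints converge pointwise to $\varphi$, which is continuous at $0$; by the Lévy continuity theorem the measures $f_\varepsilon(x)\, dx$ converge weakly to a probability measure with characteristic function $\varphi$.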
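Proposition (lattice characterisation). If $|\varphi_X(t_0)| = 1$ for some $t_0 \ne 0$, then $\mathbb{P}\big(X \in a + (2\pi / t_0)\mathbb{Z}\big) = 1$ for some $a \in \mathbb{R}$.
Proof. Write $\varphi_X(t_0) = e^{i t_0 a}$. Then $\mathbb{E}\big[e^{i t_0 (X - a)}\big] = 1$, so $\mathbb{E}\big[1 - \cos\big(t_0 (X - a)\big)\big] = 0$. The integrand is non-negative and vanishes only where $t_0 (X - a) \in 2\pi \mathbb{Z}$, so that event has probability one.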
Connections Master
The Fourier transform on $\mathbb{R}^n$ 02.10.04 is the analytic engine underneath this unit. The characteristic function is the Fourier transform of the law under the probabilistic sign-and-normalisation convention, the inversion formula is the measure-valued Fourier inversion theorem, and the Plancherel isometry of that unit reappears here as the Parseval identity relating $\int |\varphi_X(t)|^2\, dt$ to the energy $\int f^2$ of an absolutely continuous law with density $f$; the smoothing-and-regularisation techniques in the Bochner proof are the Gaussian approximate-identity arguments from the Fourier unit.
The Bochner-Minlos theorem and characteristic functionals on nuclear spaces 02.10.06 are the infinite-dimensional continuation of the finite-dimensional Bochner theorem proved here. The positive-definiteness condition that singles out characteristic functions on $\mathbb{R}^n$ becomes, on a nuclear space, the condition that singles out characteristic functionals of measures on the topological dual; the continuity-at-the-origin hypothesis here is the seed of the nuclear-continuity hypothesis there.
Probability theory: rules and distributions 26.02.01 supplies the measure-theoretic foundation (probability spaces, laws, independence, and convolution) on which the characteristic-function calculus is built. The independence property $\varphi_{X+Y} = \varphi_X \varphi_Y$ is the transform-side image of the convolution-of-laws rule defined there, and the affine and symmetry properties of $\varphi_X$ are read directly off the distribution-transformation rules of that unit.
Random variables and expected value 26.03.01 provides the expectation integral that defines $\varphi_X$ and the moment functionals $\mathbb{E}[X^k]$ recovered by differentiating $\varphi_X$ at the origin; the dominated-convergence justifications for differentiating under the expectation and for the uniform-continuity estimate all run on the integration theory established there.
Historical & philosophical context Master
The characteristic function entered probability through Laplace and Cauchy in the early nineteenth century as the Fourier transform of a density, but its decisive use as a tool for limit theorems is due to Paul Lévy, whose 1925 Calcul des probabilités [Lévy 1925] introduced the continuity theorem in essentially its modern form and made the characteristic function the central instrument of the subject. Lévy's 1937 Théorie de l'addition des variables aléatoires [Lévy 1937] developed the connection to infinitely divisible laws that became the Lévy-Khinchine formula. The abstract characterisation of which functions can be characteristic functions was settled by Salomon Bochner, whose 1932 Vorlesungen über Fouriersche Integrale [Bochner 1932] proved that the continuous positive-definite functions are exactly the Fourier-Stieltjes transforms of finite measures. The uniqueness of factorisation results — for instance Cramér's 1936 theorem [Cramér 1936] that a sum of independent variables is normal only if each summand is normal — were among the first deep applications of the inversion machinery.
The tightness step that completes the continuity theorem was placed on a general topological footing by Yuri Prokhorov in 1956 [Prokhorov 1956], whose theorem identifies relative compactness in the weak topology with tightness and supplies the subsequential limits the proof requires. The conceptual content is that two genuinely different descriptions of a distribution — the measure itself and its transform — carry the same information and the same notion of convergence, with positive-definiteness marking the exact image of the transform and continuity at the origin marking the exact condition under which a limit of transforms is again a transform.
Bibliography Master
@book{levy1925,
author = {L\'evy, Paul},
title = {Calcul des probabilit\'es},
publisher = {Gauthier-Villars, Paris},
year = {1925}
}
@book{levy1937,
author = {L\'evy, Paul},
title = {Th\'eorie de l'addition des variables al\'eatoires},
publisher = {Gauthier-Villars, Paris},
year = {1937}
}
@book{bochner1932,
author = {Bochner, Salomon},
title = {Vorlesungen \"uber Fouriersche Integrale},
publisher = {Akademische Verlagsgesellschaft, Leipzig},
year = {1932}
}
@article{cramer1936,
author = {Cram\'er, Harald},
title = {\"Uber eine Eigenschaft der normalen Verteilungsfunktion},
journal = {Mathematische Zeitschrift},
volume = {41},
pages = {405--414},
year = {1936}
}
@article{prokhorov1956,
author = {Prokhorov, Yuri V.},
title = {Convergence of random processes and limit theorems in probability theory},
journal = {Theory of Probability and Its Applications},
volume = {1},
number = {2},
pages = {157--214},
year = {1956}
}
@book{durrett2019,
author = {Durrett, Rick},
title = {Probability: Theory and Examples},
edition = {5th},
publisher = {Cambridge University Press},
year = {2019}
}
@book{kallenberg2002,
author = {Kallenberg, Olav},
title = {Foundations of Modern Probability},
edition = {2nd},
publisher = {Springer-Verlag, New York},
year = {2002}
}