37.03.01 · probability / 03-clt-characteristic-functions

Characteristic Functions, Inversion, and the Lévy Continuity Theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.3-3.4; Kallenberg, Foundations of Modern Probability 2e Ch. 5; Feller II §XV-XVII; Lukacs, Characteristic Functions 2e; Bochner, Harmonic Analysis and the Theory of Probability Ch. 3-4

Intuition Beginner

Every probability distribution carries a kind of fingerprint. Instead of describing the distribution by its density or by the chance of landing in each interval, you describe it by how it responds to being wrapped around a circle at every possible speed. Spin the values of a random quantity around a clock face at speed $t$ , take the average position of the resulting cloud of points, and record that average. Doing this for every speed $t$ produces a single function — the characteristic function — and that function pins down the distribution completely.

The reason this fingerprint is worth the trouble is that it turns hard operations on distributions into easy operations on functions. Adding two independent random quantities is a messy smearing operation on densities, but their fingerprints simply multiply. The bell-shaped normal law has an especially clean fingerprint, and that cleanliness is exactly why sums of many small independent contributions tend toward a bell curve: when you multiply many fingerprints together and rescale, the product slides toward the normal fingerprint.

The deepest payoff is a translation rule for convergence. A sequence of distributions settling down to a limit is hard to check directly, but it corresponds to their fingerprints settling down pointwise. As long as the limiting fingerprint is well-behaved right at speed zero, pointwise settling of the fingerprints guarantees the distributions themselves are converging. This dictionary between fingerprints and distributions is the engine behind nearly every classical limit theorem.

The one-sentence takeaway: the characteristic function is an invertible fingerprint of a distribution that converts convolution into multiplication and converts convergence of distributions into pointwise convergence of functions.

Visual Beginner

Picture the values of a random quantity scattered along a line. Now wrap that line around a unit circle at a chosen winding speed: each value lands at some angle, becoming a point on the circle. The characteristic function at that speed is the center of mass of all those circle points, a single dot somewhere inside the disk. When the values are tightly clustered, the dots land close together and their center of mass sits near the rim. When the values are spread out, the dots smear all the way around the circle and their center of mass collapses toward the middle.

At winding speed zero every value maps to the same point, so the center of mass sits exactly on the rim and the characteristic function equals one. As the speed grows, faster winding spreads the dots and pulls the center of mass inward, which is why the fingerprint starts at one and generally fades. How fast it fades records how spread out the distribution is.

Worked example Beginner

Take a fair coin scored as plus one for heads and minus one for tails, each with chance one half. We compute its fingerprint as a function of winding speed $t$ .

Step 1. Wrapping a value $x$ around the circle at speed $t$ sends it to the point with angle $t x$ , whose horizontal and vertical coordinates are the cosine and sine of $t x$ . The fingerprint at speed $t$ is the average of these circle points over the distribution.

Step 2. Our value is plus one with chance one half and minus one with chance one half. So the average circle point has horizontal coordinate equal to one half times the cosine of $t$ plus one half times the cosine of $- t$ . Because cosine is symmetric, the cosine of $- t$ equals the cosine of $t$ , and the two halves combine to give exactly the cosine of $t$ .

Step 3. The vertical coordinate is one half times the sine of $t$ plus one half times the sine of $- t$ . Because sine is antisymmetric, the sine of $- t$ is the negative of the sine of $t$ , and the two halves cancel to zero. So the fingerprint is purely horizontal: it equals the cosine of $t$ .

Step 4. Check the speed-zero value: the cosine of zero is one, matching the rule that every fingerprint starts at one. Check a quarter turn: at the speed where $t$ equals a quarter of the full turn the cosine is zero, meaning the two circle points sit at opposite ends of a diameter and their center of mass is the origin.

Step 5. What this tells us: the symmetry of the coin shows up as a real, even fingerprint — no vertical part at all. A lopsided coin would tilt the average off the horizontal axis, producing a vertical component, so the imaginary part of a fingerprint measures the asymmetry of a distribution. The whole distribution is recoverable from the single function the cosine of $t$ , which is the simplest instance of the inversion idea developed later.
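The worked example can be replayed numerically. The following is a minimal sketch, assuming a hypothetical helper `char_fn` that averages the circle points $e^{i t x}$ over a finite distribution; the 80/20 coin is an invented illustration of the lopsided case, not part of the example above.

```python
import cmath
import math

def char_fn(values, probs, t):
    """Average of the circle points e^{i t x} over a finite distribution."""
    return sum(p * cmath.exp(1j * t * x) for x, p in zip(values, probs))

t = 0.7
fair = char_fn([1, -1], [0.5, 0.5], t)       # fair coin from the worked example
lopsided = char_fn([1, -1], [0.8, 0.2], t)   # hypothetical 80/20 coin

print(fair.real, math.cos(t))   # horizontal part agrees with cos(t)
print(fair.imag)                # vertical part is zero up to rounding
print(lopsided.imag)            # nonzero: equals (0.8 - 0.2) * sin(t)
```

The lopsided coin keeps a vertical component because the two sine terms no longer cancel, which is the asymmetry signal described in Step 5.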

Check your understanding Beginner

Exercise (easy, multiple choice).

What is the value of every characteristic function at winding speed zero?

A. Zero
B. One
C. The mean of the distribution
D. The variance of the distribution

Hint

At speed zero every value maps to the same point on the circle, so the center of mass sits exactly there.

Answer

B. One. At winding speed zero each value $x$ maps to angle zero, the point at the rim with horizontal coordinate one and vertical coordinate zero. Averaging identical points gives that same point, so the characteristic function equals one. Feedback-correct: the speed-zero value is always one because the total probability is one. Feedback-wrong: A would require the points to average to the origin, which happens at certain nonzero speeds, not at zero; C and D are read off from the slope and curvature of the fingerprint near zero, not from its value at zero.

Formal definition Intermediate+

Let $μ$ be a Borel probability measure on $R^{n}$ and let $X$ be a random vector with law $μ$ , defined on a probability space $(Ω, F, P)$ [from 26.02.01]. The characteristic function of $μ$ (equivalently, of $X$ ) is the function $φ_{X} : R^{n} \to C$ defined by $$ \varphi_X(t) = \mathbb{E}\!\left[ e^{i\, t \cdot X} \right] = \int_{\mathbb{R}^n} e^{i\, t \cdot x} \, d\mu(x), \qquad t \in \mathbb{R}^n, $$ where $t \cdot x = \sum_{j = 1}^{n} t_{j} x_{j}$ is the Euclidean inner product. The integrand has modulus one and $μ$ is a probability measure, so the integral converges absolutely for every $t$ , with $∣ φ_{X} (t) ∣ \leq 1$ . Up to the sign in the exponent and the absence of the $2 π$ normalisation, $φ_{X}$ is the Fourier transform of the measure $μ$ [from 02.10.04]; the probabilistic convention places $i t \cdot x$ (no $2 π$ ) in the exponent so that derivatives at the origin read off moments cleanly.

The expected-value integral is the one introduced for random vectors in 26.03.01, applied to the bounded complex-valued function $x \mapsto e^{i t \cdot x}$ .

Basic properties. For random vectors $X, Y$ , a matrix $A \in R^{m \times n}$ , and $b \in R^{m}$ :

Normalisation and bound. $φ_{X} (0) = 1$ and $∣ φ_{X} (t) ∣ \leq 1$ for all $t$ .
Hermitian symmetry. $φ_{X} (- t) = \overline{φ_{X} (t)}$ . In particular $φ_{X}$ is real-valued iff $X$ and $- X$ have the same law (symmetric $μ$ ).
Affine maps. $φ_{A X + b} (t) = e^{i t \cdot b} φ_{X} (A^{T} t)$ .
Independence. If $X$ and $Y$ are independent then $φ_{X + Y} (t) = φ_{X} (t) φ_{Y} (t)$ ; the law of $X + Y$ is the convolution $μ_{X} * μ_{Y}$ , and the characteristic function turns convolution into pointwise product.
Uniform continuity. $φ_{X}$ is uniformly continuous on $R^{n}$ .
Positive-definiteness. For all $t_{1}, \dots, t_{m} \in R^{n}$ and $c_{1}, \dots, c_{m} \in C$ , $\sum_{j, k} c_{j} \overline{c_{k}} φ_{X} (t_{j} - t_{k}) \geq 0$ .
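Several of the properties listed above can be checked directly on finite laws. This is a sketch under stated assumptions: the helpers `char_fn` and `convolve` and the die and coin laws are hypothetical illustrations, and laws are represented as plain dictionaries from value to probability.

```python
import cmath

def char_fn(law, t):
    """Characteristic function of a finite law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in law.items())

def convolve(law_x, law_y):
    """Law of X + Y for independent X ~ law_x and Y ~ law_y."""
    out = {}
    for x, p in law_x.items():
        for y, q in law_y.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out

die = {k: 1 / 6 for k in range(1, 7)}
coin = {-1: 0.5, 1: 0.5}
t = 1.3

# Independence: fingerprint of the convolution equals the product of fingerprints.
independence_gap = abs(char_fn(convolve(die, coin), t) - char_fn(die, t) * char_fn(coin, t))

# Hermitian symmetry: phi(-t) is the complex conjugate of phi(t).
hermitian_gap = abs(char_fn(die, -t) - char_fn(die, t).conjugate())

# Affine map in one dimension: phi_{2X+3}(t) = e^{3it} * phi_X(2t).
affine = {2 * x + 3: p for x, p in die.items()}
affine_gap = abs(char_fn(affine, t) - cmath.exp(3j * t) * char_fn(die, 2 * t))

print(independence_gap, hermitian_gap, affine_gap)  # all at rounding level
```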

Definition (positive-definite function). A function $φ : R^{n} \to C$ is positive-definite when for every finite collection $t_{1}, \dots, t_{m} \in R^{n}$ the Hermitian matrix $[φ (t_{j} - t_{k})]_{j, k = 1}^{m}$ is positive semidefinite. Every characteristic function is continuous, positive-definite, and equal to one at the origin; Bochner's theorem asserts the converse, identifying characteristic functions with exactly the continuous positive-definite functions normalised at the origin.

Definition (moments via derivatives). If $E [∣ X ∣^{k}] < \infty$ then $φ_{X} \in C^{k} (R^{n})$ and the mixed partial derivatives at the origin recover the moments: for a multi-index $α$ with $∣ α ∣ \leq k$ , $$ \partial^\alpha \varphi_X(0) = i^{|\alpha|} \, \mathbb{E}\!\left[ X^\alpha \right], \qquad X^\alpha = X_1^{\alpha_1} \cdots X_n^{\alpha_n}. $$ In one dimension this reads $φ_{X}^{(k)} (0) = i^{k} E [X^{k}]$ , so $E [X] = - i φ_{X}^{'} (0)$ and $E [X^{2}] = - φ_{X}^{''} (0)$ .
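The one-dimensional formulas $E [X] = - i φ_{X}^{'} (0)$ and $E [X^{2}] = - φ_{X}^{''} (0)$ can be sanity-checked with finite differences. The sketch below assumes a fair six-sided die as the test law and a step size `h` chosen by hand; both are illustrative choices rather than anything fixed by the definition.

```python
import cmath

def phi_die(t):
    """Characteristic function of a fair six-sided die."""
    return sum(cmath.exp(1j * t * k) for k in range(1, 7)) / 6

h = 1e-4  # finite-difference step (an assumption; small but not rounding-dominated)
d1 = (phi_die(h) - phi_die(-h)) / (2 * h)                   # approximates phi'(0)
d2 = (phi_die(h) - 2 * phi_die(0.0) + phi_die(-h)) / h**2   # approximates phi''(0)

mean = (-1j * d1).real     # E[X]   = -i * phi'(0)
second = (-d2).real        # E[X^2] = -phi''(0)
print(mean, second)        # close to 7/2 and 91/6
```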

Counterexamples to common slips Intermediate+

Pointwise convergence alone is not enough. The continuity-at-zero hypothesis in the Lévy theorem is essential. The characteristic functions $φ_{n} (t) = e^{- n t^{2} /2}$ of the centred Gaussian with variance $n$ converge pointwise to the function equal to one at $t = 0$ and zero elsewhere. That limit is discontinuous at the origin, the mass escapes to infinity, and there is no limiting probability measure. Without continuity at zero the conclusion fails.
Differentiability of $φ$ does not guarantee a finite mean. The implication " $E ∣ X ∣^{k} < \infty \Rightarrow φ \in C^{k}$ " is one-directional in general. There exist distributions with no finite mean whose characteristic function is differentiable at the origin; the clean converse holds for even orders, where $φ^{''} (0)$ finite does force $E [X^{2}] < \infty$ .
Real characteristic function does not mean real random variable. $φ_{X}$ is real exactly when $μ$ is symmetric about the origin, not when $X$ takes real values (it always does). The fair-coin fingerprint $cos t$ is real because the law is symmetric.
A modulus-one value off the origin signals a lattice. If $∣ φ_{X} (t_{0}) ∣ = 1$ for some $t_{0} \neq 0$ , then $X$ is supported on a coset of the lattice $\{x : t_{0} \cdot x \in 2 π Z\}$ . Absolutely continuous distributions have $∣ φ_{X} (t) ∣ < 1$ for all $t \neq 0$ .
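The first counterexample can be watched numerically. The sketch below assumes the hypothetical helpers `phi_n` and `mass_in_window`; the window half-width 10 is an arbitrary choice used to show the mass leaving every bounded set.

```python
import math

def phi_n(n, t):
    """Characteristic function of the centred Gaussian with variance n."""
    return math.exp(-n * t * t / 2)

def mass_in_window(n, K):
    """P(|X| <= K) for X ~ N(0, n), written with the error function."""
    return math.erf(K / math.sqrt(2 * n))

for n in (1, 100, 10_000):
    # phi_n(0) stays at 1, phi_n(0.5) collapses to 0, and the mass in [-10, 10] drains away.
    print(n, phi_n(n, 0.0), phi_n(n, 0.5), mass_in_window(n, 10.0))
```

The pointwise limit is one at the origin and zero elsewhere, and the shrinking window mass is the escape to infinity that the continuity-at-zero hypothesis rules out.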

Key theorem with proof Intermediate+

Theorem (Lévy inversion and the continuity theorem). Let $μ, μ_{1}, μ_{2}, \dots$ be Borel probability measures on $R$ with characteristic functions $φ, φ_{1}, φ_{2}, \dots$ .

(Inversion.) For any $a < b$ that are continuity points of $μ$ (i.e. $μ (\{a\}) = μ (\{b\}) = 0$ ), $$ \mu\big((a, b)\big) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-i t a} - e^{-i t b}}{i t} \, \varphi(t) \, dt. $$ Consequently $φ$ determines $μ$ uniquely.

(Continuity.) If $φ_{n} (t) \to φ (t)$ for every $t \in R$ , then $μ_{n} \Rightarrow μ$ (weak convergence). Conversely, if $ψ (t) = lim_{n} φ_{n} (t)$ exists for every $t$ and $ψ$ is continuous at $t = 0$ , then $ψ$ is the characteristic function of a probability measure $μ$ and $μ_{n} \Rightarrow μ$ .

Proof. Inversion. Fix $a < b$ and define $I_{T} = \frac{1}{2 π} \int_{- T}^{T} \frac{e^{- i t a} - e^{- i t b}}{i t} φ (t) \, d t$ . Substitute $φ (t) = \int_{R} e^{i t x} d μ (x)$ and apply Fubini [from 02.10.04], which is legitimate because the integrand is bounded by $\frac{∣ e^{- i t a} - e^{- i t b} ∣}{∣ t ∣} \leq b - a$ uniformly in $x$ , so the double integrand is integrable on $[- T, T] \times R$ : $$ I_T = \int_{\mathbb{R}} \left( \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{it(x - a)} - e^{it(x - b)}}{it} \, dt \right) d\mu(x). $$ The inner integral is evaluated using the Dirichlet integral: with $S (θ, T) = \frac{1}{π} \int_{0}^{T} \frac{\sin (θ t)}{t} d t = \frac{1}{π} Si (θ T)$ , the bracket equals $S (x - a, T) + S (b - x, T)$ . Indeed $\frac{e^{i θ t}}{i t} = \frac{\sin (θ t)}{t} - i \frac{\cos (θ t)}{t}$ ; the difference of the two cosine terms is bounded and odd in $t$ , so it integrates to zero over $[- T, T]$ , while the sine terms are even and contribute $\frac{1}{π} \int_{0}^{T} \frac{\sin ((x - a) t) - \sin ((x - b) t)}{t} d t$ . As $T \to \infty$ , $\frac{1}{π} Si (θ T) \to \frac{1}{2} sgn (θ)$ , so the bracket tends to the function $g (x)$ equal to $1$ for $a < x < b$ , equal to $\frac{1}{2}$ at $x \in \{a, b\}$ , and equal to $0$ otherwise. The functions $x \mapsto \frac{1}{2 π} \int_{- T}^{T} \frac{\dots}{i t} d t$ are uniformly bounded (the sine integral $Si$ is bounded), so by bounded convergence $I_{T} \to \int_{R} g d μ = μ ((a, b)) + \frac{1}{2} μ (\{a, b\})$ . When $a, b$ are continuity points the boundary term vanishes, giving the stated formula. Uniqueness follows: two measures with the same $φ$ assign the same value to every interval with continuity-point endpoints, and such intervals form a $π$ -system generating the Borel $σ$ -algebra, so the measures coincide.
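As a numerical illustration of the inversion formula, the truncated integral $I_{T}$ can be approximated for the standard normal law, whose characteristic function is $e^{- t^{2} /2}$ . This is a rough sketch, not part of the proof: the midpoint rule, the cutoff `T = 12`, the step count, and the endpoints `a = -1`, `b = 2` are all assumptions made for illustration.

```python
import cmath
import math

def inversion_integral(phi, a, b, T, steps=20_000):
    """Midpoint-rule value of (1/2pi) * int_{-T}^{T} (e^{-ita} - e^{-itb})/(it) * phi(t) dt."""
    h = 2 * T / steps
    total = 0j
    for k in range(steps):
        t = -T + (k + 0.5) * h   # with an even step count the midpoints never hit t = 0
        total += (cmath.exp(-1j * t * a) - cmath.exp(-1j * t * b)) / (1j * t) * phi(t)
    return (total * h / (2 * math.pi)).real

def normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def phi_normal(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)

a, b = -1.0, 2.0
approx = inversion_integral(phi_normal, a, b, T=12.0)
exact = normal_cdf(b) - normal_cdf(a)
print(approx, exact)   # the two values should agree closely
```

Because the Gaussian fingerprint decays rapidly, a modest cutoff already captures essentially all of the integral; for slowly decaying fingerprints such as $cos t$ the limit in $T$ converges much more slowly.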

Continuity, forward direction. Suppose $φ_{n} \to φ$ pointwise, with $φ$ the characteristic function of $μ$ . Weak convergence $μ_{n} \Rightarrow μ$ is equivalent to $\int f d μ_{n} \to \int f d μ$ for all bounded continuous $f$ . One could test against trigonometric polynomials and extend by density, but the clean route is through tightness. The family $\{μ_{n}\}$ is tight: the bound (proved below) gives $μ_{n} (\{∣ x ∣ > 2/ u\}) \leq \frac{1}{u} \int_{- u}^{u} (1 - Re φ_{n} (t)) d t$ , and for fixed $u$ the right side converges to $\frac{1}{u} \int_{- u}^{u} (1 - Re φ (t)) d t$ by dominated convergence (the integrands are bounded by $2$ ). Since $φ$ is continuous at $0$ with $φ (0) = 1$ , this limit tends to $0$ as $u \to 0$ , so the tails of $μ_{n}$ are small for all large $n$ ; the finitely many remaining measures are individually tight. By Prokhorov's theorem tightness yields a weakly convergent subsequence $μ_{n_{k}} \Rightarrow ν$ ; its characteristic function is $\lim_{k} φ_{n_{k}} = φ$ , so $ν = μ$ by uniqueness. The same argument applies to any subsequence, so every subsequence has a further subsequence converging to the same $μ$ , hence $μ_{n} \Rightarrow μ$ .

Continuity, converse. Assume only that $ψ = \lim_{n} φ_{n}$ exists pointwise and is continuous at $0$ . The tail bound above uses only $φ_{n}$ , and continuity of $ψ$ at $0$ with $ψ (0) = \lim φ_{n} (0) = 1$ gives, by dominated convergence, $\limsup_{n} μ_{n} (\{∣ x ∣ > 2/ u\}) \leq \frac{1}{u} \int_{- u}^{u} (1 - Re ψ (t)) d t \to 0$ as $u \to 0$ . So $\{μ_{n}\}$ is tight, Prokhorov supplies a subsequential weak limit $ν$ with characteristic function $ψ$ , and $ψ$ is therefore the characteristic function of a probability measure. Uniqueness of the subsequential limit forces $μ_{n} \Rightarrow ν$ .

The tail bound. For any probability measure $ρ$ on $R$ with characteristic function $φ$ and any $u > 0$ , Fubini gives $$ \frac1u \int_{-u}^{u}\big(1 - \mathrm{Re}\,\varphi(t)\big)\, dt = \int_{\mathbb{R}} \left( \frac1u \int_{-u}^{u}(1 - \cos(tx))\, dt \right) d\rho(x) = \int_{\mathbb{R}} 2\Big(1 - \frac{\sin(ux)}{ux}\Big)\, d\rho(x). $$ Since $1 - \frac{\sin s}{s} \geq 0$ for all $s$ and $1 - \frac{\sin s}{s} \geq \frac{1}{2}$ whenever $∣ s ∣ \geq 2$ , the integrand is at least $1_{\{∣ ux ∣ \geq 2\}}$ , yielding $\frac{1}{u} \int_{- u}^{u} (1 - Re φ) d t \geq ρ (\{∣ x ∣ \geq 2/ u\})$ , which is the bound used above. $□$
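The tail bound is easy to test on a finite law, using the closed form $2 (1 - \frac{\sin (ux)}{ux})$ for the inner integral. The law below is an invented example and the helper names `tail_bound_rhs` and `tail_mass` are hypothetical; the sketch only illustrates the inequality and proves nothing.

```python
import math

def tail_bound_rhs(law, u):
    """(1/u) * int_{-u}^{u} (1 - Re phi(t)) dt, via the closed form 2 * (1 - sin(ux)/(ux))."""
    total = 0.0
    for x, p in law.items():
        s = u * x
        if s != 0:                      # the term vanishes at s = 0
            total += p * 2 * (1 - math.sin(s) / s)
    return total

def tail_mass(law, u):
    """Mass of the set {|x| >= 2/u}."""
    return sum(p for x, p in law.items() if abs(x) >= 2 / u)

law = {0: 0.5, 1: 0.2, -3: 0.2, 10: 0.1}   # hypothetical finite law
for u in (0.1, 0.5, 1.0, 2.0):
    print(u, tail_mass(law, u), tail_bound_rhs(law, u))   # left value never exceeds right
```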

Bridge. This theorem builds toward the central limit theorem and the entire theory of infinitely divisible laws, and the same inversion-plus-continuity mechanism appears again in the Lévy-Khinchine representation and in the proof of the law of large numbers via characteristic functions. The foundational reason the continuity theorem holds is that the characteristic transform is injective and bicontinuous between two topologies — weak convergence of measures and pointwise convergence of fingerprints — once tail mass is prevented from escaping; this is exactly the role tightness plays, converting pointwise control of $φ_{n}$ into uniform control of the measures. The continuity theorem generalises the inversion formula from a single fixed measure to a whole convergent net, and the bridge is that Bochner's theorem completes the circle: it tells us precisely which limit functions $ψ$ can arise, namely the continuous positive-definite ones, so that the continuity-at-zero hypothesis is not a technical patch but the exact boundary between fingerprints that come from probability measures and those that do not.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Compute the characteristic function of the standard normal law $N (0, 1)$ with density $\frac{1}{\sqrt{2 π}} e^{- x^{2} /2}$ , and confirm that it is again a Gaussian of the same shape, with no normalising constant.

Hint

Complete the square in the exponent $i t x - x^{2} /2$ , or differentiate $φ$ and solve the resulting ODE $φ^{'} (t) = - tφ (t)$ .

Answer

Differentiating under the integral sign, $φ^{'} (t) = \frac{1}{\sqrt{2 π}} \int i x e^{i t x} e^{- x^{2} /2} d x$ . Integrate by parts using $x e^{- x^{2} /2} = - \frac{d}{d x} e^{- x^{2} /2}$ : the boundary terms vanish and $φ^{'} (t) = - t φ (t)$ . With $φ (0) = 1$ this linear ODE has the unique solution $φ (t) = e^{- t^{2} /2}$ . So the standard normal characteristic function is $e^{- t^{2} /2}$ , a Gaussian in $t$ of the same shape as the density: the self-transforming property carried over from the Fourier picture [from 02.10.04]. This clean exponential form is what makes the normal law a natural limit: products of Gaussian fingerprints are again Gaussian, with the exponents adding.
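The closed form can be cross-checked by integrating $e^{i t x}$ against the standard normal density numerically. A minimal sketch, assuming a midpoint rule on the truncated range $[- 12, 12]$ ; the cutoff and step count are illustrative choices.

```python
import cmath
import math

def normal_char_fn(t, L=12.0, steps=20_000):
    """Midpoint-rule approximation of int e^{itx} * (2pi)^{-1/2} e^{-x^2/2} dx over [-L, L]."""
    h = 2 * L / steps
    total = 0j
    for k in range(steps):
        x = -L + (k + 0.5) * h
        total += cmath.exp(1j * t * x) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

for t in (0.0, 0.5, 1.0, 2.0):
    # real part should track exp(-t^2/2); imaginary part should be near zero
    print(t, normal_char_fn(t).real, math.exp(-t * t / 2))
```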

Exercise 4 (medium, symbolic).

Use characteristic functions to prove that the sum of independent $N (μ_{1}, σ_{1}^{2})$ and $N (μ_{2}, σ_{2}^{2})$ random variables is $N (μ_{1} + μ_{2}, σ_{1}^{2} + σ_{2}^{2})$ .

Hint

The $N (μ, σ^{2})$ characteristic function is $e^{i μ t - σ^{2} t^{2} /2}$ . Multiply the two fingerprints and match the form.

Answer

From Exercises 2 and 3, $N (μ, σ^{2})$ has characteristic function $φ (t) = e^{i μ t - σ^{2} t^{2} /2}$ . For independent summands the fingerprint of the sum is the product: $$ \varphi_{X_1 + X_2}(t) = e^{i\mu_1 t - \sigma_1^2 t^2/2} \, e^{i\mu_2 t - \sigma_2^2 t^2/2} = e^{i(\mu_1 + \mu_2)t - (\sigma_1^2 + \sigma_2^2)t^2/2}. $$ This is exactly the $N (μ_{1} + μ_{2}, σ_{1}^{2} + σ_{2}^{2})$ fingerprint, and by the uniqueness half of the inversion theorem the sum has that law. The convolution of two Gaussians, an awkward integral, collapses to addition of means and variances in the exponent.

Exercise 6 (hard, short-answer).

Prove the weak law of large numbers via characteristic functions: if $X_{1}, X_{2}, \dots$ are independent and identically distributed with finite mean $μ$ , then the sample average $\bar{X}_{n} = \frac{1}{n} \sum_{k = 1}^{n} X_{k}$ converges in distribution to the constant $μ$ , and hence in probability.

Hint

Write the fingerprint of $\bar{X}_{n}$ as $φ (t / n)^{n}$ and use the first-order expansion $φ (s) = 1 + i μ s + o (s)$ valid when the mean is finite.

Answer

Finiteness of the mean gives the expansion $φ (s) = 1 + i μ s + o (s)$ as $s \to 0$ (one derivative at the origin). The sample average has fingerprint $φ_{\bar{X}_{n}} (t) = φ (t / n)^{n} = (1 + \frac{i μ t}{n} + o (1/ n))^{n}$ . Taking logarithms (principal branch, which is valid for large $n$ because the argument tends to $1$ ), $n \log (1 + \frac{i μ t}{n} + o (1/ n)) = i μ t + o (1) \to i μ t$ , so $φ_{\bar{X}_{n}} (t) \to e^{i μ t}$ for every $t$ . The limit $e^{i μ t}$ is continuous at $0$ and is the fingerprint of the point mass at $μ$ . By the converse half of the Lévy continuity theorem, $\bar{X}_{n} \Rightarrow δ_{μ}$ . Convergence in distribution to a constant is equivalent to convergence in probability, giving the weak law. The whole argument is a single application of the continuity theorem to the multiplicative structure of fingerprints under sums.
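The convergence $φ (t / n)^{n} \to e^{i μ t}$ can be observed on a concrete law. The sketch below assumes the Exponential(1) distribution, whose characteristic function is $\frac{1}{1 - i s}$ and whose mean is $1$ ; that choice of law and the test point `t = 2` are illustrative assumptions.

```python
import cmath

def phi_exp(s):
    """Characteristic function of the Exponential(1) law, which has mean 1."""
    return 1 / (1 - 1j * s)

def phi_sample_mean(t, n):
    """Fingerprint of the average of n independent Exponential(1) variables."""
    return phi_exp(t / n) ** n

t = 2.0
target = cmath.exp(1j * t)   # fingerprint of the point mass at mu = 1
errors = [abs(phi_sample_mean(t, n) - target) for n in (1, 10, 100, 10_000)]
print(errors)                # the gap shrinks as n grows
```

For this law the gap behaves roughly like $t^{2} / (2 n)$ , which is faster than the bare $o (1)$ the proof needs, because a second moment happens to exist.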

Exercise 7 (hard, short-answer).

Show that the function $ψ (t) = e^{- ∣ t ∣}$ is a characteristic function by exhibiting the law whose fingerprint it is, and contrast with $e^{- ∣ t ∣^{α}}$ for $α > 2$ , which is not a characteristic function.

Hint

Invert $e^{- ∣ t ∣}$ using the inversion formula, or recognise it as the Fourier transform of a standard density. For the second part, examine $ψ^{''} (0)$ .

Answer

The inversion integral $\frac{1}{2 π} \int_{R} e^{- i t x} e^{- ∣ t ∣} d t = \frac{1}{π} \frac{1}{1 + x ^{2}}$ is the standard Cauchy density, so $e^{- ∣ t ∣}$ is the characteristic function of the Cauchy law (a stable law of index $1$ ). More generally $e^{- ∣ t ∣^{α}}$ is a characteristic function exactly for $0 < α \leq 2$ , the symmetric $α$ -stable family. For $α > 2$ the function $ψ (t) = e^{- ∣ t ∣^{α}}$ is twice differentiable at $0$ with $ψ^{''} (0) = 0$ , which would force $E [X^{2}] = 0$ and hence $X = 0$ almost surely, with fingerprint identically one — a contradiction. So no probability law has $e^{- ∣ t ∣^{α}}$ as its characteristic function when $α > 2$ ; the boundary $α = 2$ is the Gaussian, the only stable law with finite variance.
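The Cauchy inversion can be verified numerically. Since $e^{- ∣ t ∣}$ is even, the inversion integral reduces to $\frac{1}{π} \int_{0}^{\infty} \cos (t x) e^{- t} d t$ ; the sketch below approximates it with a midpoint rule truncated at an assumed cutoff `T = 40`, where the integrand is already negligible.

```python
import math

def invert_exp_abs(x, T=40.0, steps=40_000):
    """Midpoint-rule value of (1/pi) * int_0^T cos(t x) * e^{-t} dt."""
    h = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += math.cos(t * x) * math.exp(-t)
    return total * h / math.pi

def cauchy_density(x):
    """Standard Cauchy density 1 / (pi * (1 + x^2))."""
    return 1 / (math.pi * (1 + x * x))

for x in (0.0, 1.0, 3.0):
    print(x, invert_exp_abs(x), cauchy_density(x))   # columns should agree closely
```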

Advanced results Master

The characteristic transform is a homeomorphism from the space of probability measures on $R^{n}$ , under weak convergence, onto the set of continuous positive-definite functions normalised at the origin, under uniform convergence on compacts. Bochner's theorem is the surjectivity half: a function $φ : R^{n} \to C$ is the characteristic function of some probability measure iff $φ$ is continuous, positive-definite, and satisfies $φ (0) = 1$ . The proof recovers the measure by an inversion-with-regularisation argument — multiply $φ$ by a Gaussian damping $e^{- ε ∣ t ∣^{2} /2}$ , invert the now-integrable product to obtain a non-negative density (non-negativity is exactly positive-definiteness), and pass $ε \to 0$ using the continuity theorem. This is the finite-dimensional model whose infinite-dimensional generalisation is the Bochner-Minlos theorem on nuclear spaces [from 02.10.06]: there the dual of a nuclear space replaces $R^{n}$ , the continuity hypothesis is upgraded to continuity in the nuclear topology, and the resulting measure lives on the topological dual.

Moment expansions sharpen the inversion-continuity dictionary into rates. If $E ∣ X ∣^{k} < \infty$ then $$ \varphi_X(t) = \sum_{j=0}^{k} \frac{(it)^j}{j!} \, \mathbb{E}[X^j] + o(|t|^k), \qquad t \to 0, $$ and the error term is controlled by $\frac{∣ t ∣^{k}}{k !} E [\min (2∣ X ∣^{k}, ∣ t ∣ ∣ X ∣^{k + 1} / (k + 1))]$ . This Taylor control at the origin is what turns the central limit theorem into a quantitative statement: standardising and raising to the $n$ -th power, the quadratic term $- \frac{1}{2} t^{2}$ survives and the cubic and higher terms wash out at rate $n^{- 1/2}$ , the Berry-Esseen rate when a third moment is present.
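The survival of the quadratic term can be seen in the simplest case. The sketch below assumes fair $\pm 1$ coins, for which the standardised sum of $n$ of them has fingerprint $\cos (t / \sqrt{n})^{n}$ ; the test point `t = 1.5` is arbitrary.

```python
import math

def phi_standardised_sum(t, n):
    """Fingerprint of (X_1 + ... + X_n) / sqrt(n) for independent fair +/-1 coins."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.5
limit = math.exp(-t * t / 2)   # standard normal fingerprint at t
gaps = [abs(phi_standardised_sum(t, n) - limit) for n in (1, 10, 100, 10_000)]
print(gaps)                    # the gap to the Gaussian fingerprint shrinks with n
```

For this symmetric law the cubic term vanishes, so the gap decays like $1/ n$ rather than the general $n^{- 1/2}$ ; an asymmetric law with a nonzero third cumulant would show the slower rate.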

The continuity theorem upgrades to a metric statement. The Lévy metric and the bounded-Lipschitz metric both metrise weak convergence, and the characteristic-function distance $\sup_{∣ t ∣ \leq T} ∣ φ_{X} (t) - φ_{Y} (t) ∣$ , combined with a tail bound from a smoothing inequality, dominates the Kolmogorov distance between distribution functions. This is the Esseen smoothing inequality: for distributions $F, G$ with $G$ having bounded density $m = \sup ∣ G^{'} ∣$ , $$ \sup_x |F(x) - G(x)| \le \frac{1}{\pi}\int_{-T}^{T} \left| \frac{\varphi_F(t) - \varphi_G(t)}{t} \right| dt + \frac{24\, m}{\pi T}. $$ This converts pointwise closeness of characteristic functions on a finite window into uniform closeness of distribution functions, the analytic core of the Berry-Esseen theorem.

Stability and infinite divisibility are read directly off the fingerprint. A law is infinitely divisible iff for every $n$ its characteristic function has an $n$ -th root that is again a characteristic function, and the Lévy-Khinchine formula gives the universal form $φ (t) = exp (ib t - \frac{1}{2} a t^{2} + \int (e^{i t x} - 1 - i t x 1_{∣ x ∣ < 1}) d ν (x))$ with $ν$ the Lévy measure. The Gaussian ( $ν = 0$ ) and the Poisson ( $a = 0$ , $ν$ atomic) are the extreme cases; the whole class is the set of possible limits of triangular arrays of small independent summands, characterised entirely through the continuity theorem applied to products of fingerprints.

Synthesis. The foundational reason this transform organises classical probability is that it linearises convolution: the messy operation of adding independent laws becomes multiplication of fingerprints, and the central insight is that multiplication is analysable through its behaviour near the origin, where moments live. Putting these together, inversion makes the transform injective, Bochner's theorem makes its range exactly the continuous positive-definite functions, and the continuity theorem makes it bicontinuous once tightness blocks the escape of mass — three facts that are dual to one another, since injectivity plus surjectivity onto a closed set plus bicontinuity is precisely a homeomorphism. This is exactly why the central limit theorem is a one-line computation in the transform domain: the standardised sum's fingerprint is a power of a fingerprint expanded near zero, the quadratic Gaussian term survives, and the continuity theorem reads the surviving exponential $e^{- t^{2} /2}$ back as the normal law. The same machinery generalises upward to the Lévy-Khinchine classification of all limit laws and appears again in the nuclear-space Bochner-Minlos theorem, where the finite-dimensional positive-definiteness condition becomes the seed of a measure on an infinite-dimensional dual.

Full proof set Master

The inversion formula, the uniqueness corollary, the continuity theorem in both directions, and the tail bound are proved in full in the Key theorem section. The remaining Master claims are recorded here.

Proposition (uniform continuity and positive-definiteness). Every characteristic function $φ_{X}$ on $R^{n}$ is uniformly continuous and positive-definite.

Proof. For uniform continuity, estimate $∣ φ_{X} (t + h) - φ_{X} (t) ∣ = ∣ E [e^{i t \cdot X} (e^{ih \cdot X} - 1)] ∣ \leq E ∣ e^{ih \cdot X} - 1∣$ , a bound independent of $t$ . As $h \to 0$ the integrand tends to $0$ pointwise and is dominated by $2$ , so dominated convergence gives $E ∣ e^{ih \cdot X} - 1∣ \to 0$ , which is uniform continuity. For positive-definiteness, given $t_{1}, \dots, t_{m}$ and $c_{1}, \dots, c_{m}$ , $$ \sum_{j,k} c_j \overline{c_k} \, \varphi_X(t_j - t_k) = \mathbb{E}\Big[\sum_{j,k} c_j \overline{c_k} \, e^{i(t_j - t_k)\cdot X}\Big] = \mathbb{E}\Big[\Big| \sum_j c_j \, e^{i t_j \cdot X}\Big|^2\Big] \ge 0. \qquad \square $$
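The identity at the heart of the positive-definiteness proof can be checked on a small example. This is a sketch under stated assumptions: the law is a fair die, and the points `ts` and weights `cs` are arbitrary values picked for illustration.

```python
import cmath

def phi_die(t):
    """Characteristic function of a fair six-sided die."""
    return sum(cmath.exp(1j * t * k) for k in range(1, 7)) / 6

def quadratic_form(phi, ts, cs):
    """sum over j, k of c_j * conj(c_k) * phi(t_j - t_k)."""
    return sum(cj * ck.conjugate() * phi(tj - tk)
               for tj, cj in zip(ts, cs)
               for tk, ck in zip(ts, cs))

def expected_square(ts, cs):
    """E | sum_j c_j e^{i t_j X} |^2 for a fair die X."""
    return sum(abs(sum(c * cmath.exp(1j * t * x) for t, c in zip(ts, cs))) ** 2
               for x in range(1, 7)) / 6

ts = [0.3, -1.1, 2.4]              # arbitrary test points
cs = [1 + 2j, -0.5j, 0.7 - 1j]     # arbitrary complex weights
q = quadratic_form(phi_die, ts, cs)
print(q, expected_square(ts, cs))  # q is real and non-negative up to rounding, matching the expectation
```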

Proposition (moments from derivatives). If $E ∣ X ∣^{k} < \infty$ in dimension one, then $φ_{X} \in C^{k} (R)$ and $φ_{X}^{(k)} (0) = i^{k} E [X^{k}]$ .

Proof. The function $t \mapsto e^{i tX}$ has $j$ -th $t$ -derivative $(i X)^{j} e^{i tX}$ with modulus $∣ X ∣^{j}$ , integrable for $j \leq k$ by hypothesis. Differentiation under the expectation is justified $k$ times by dominated convergence with dominating functions $∣ X ∣^{j}$ , giving $φ_{X}^{(j)} (t) = E [(i X)^{j} e^{i tX}]$ . Continuity of each $φ_{X}^{(j)}$ follows from the previous proposition applied to the finite measure $∣ x ∣^{j} d μ$ . Setting $t = 0$ yields $φ_{X}^{(k)} (0) = E [(i X)^{k}] = i^{k} E [X^{k}]$ . $□$

Proposition (Bochner, regularised-inversion proof). A continuous positive-definite $φ$ with $φ (0) = 1$ is the characteristic function of a probability measure on $R^{n}$ .

Proof. For $ε > 0$ set $φ_{ε} (t) = φ (t) e^{- ε ∣ t ∣^{2} /2}$ , which is positive-definite (a product of positive-definite functions, by the Schur product theorem) and integrable. Define $p_{ε} (x) = (2 π)^{- n} \int φ_{ε} (t) e^{- i t \cdot x} d t$ . Positive-definiteness of $φ_{ε}$ makes $p_{ε} \geq 0$ : writing the inverse-transform as a limit of Riemann sums $\sum_{j, k} φ (t_{j} - t_{k}) \dots \geq 0$ and passing to the limit. Total mass is $\int p_{ε} d x = φ_{ε} (0) = e^{0} = 1$ in the limit, so each $p_{ε}$ is (after the normalisation that the Gaussian convolution supplies) a probability density, and its characteristic function is $φ (t) e^{- ε ∣ t ∣^{2} /2}$ . As $ε \to 0$ these fingerprints converge pointwise to $φ$ , which is continuous at $0$ ; by the Lévy continuity theorem the measures $p_{ε} d x$ converge weakly to a probability measure $μ$ with characteristic function $φ$ . $□$

Proposition (lattice characterisation). If $∣ φ_{X} (t_{0}) ∣ = 1$ for some $t_{0} \neq 0$ , then $P (t_{0} \cdot X \in θ + 2 π Z) = 1$ for some $θ \in R$ .

Proof. Write $φ_{X} (t_{0}) = e^{i θ}$ . Then $E [e^{i (t_{0} \cdot X - θ)}] = 1$ , so $E [1 - cos (t_{0} \cdot X - θ)] = 0$ . The integrand is non-negative and vanishes only where $t_{0} \cdot X - θ \in 2 π Z$ , so that event has probability one. $□$
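The lattice statement can be illustrated with integer-valued laws, where $t_{0} = 2 π$ gives modulus one. The sketch below assumes a fair die and a copy shifted by the arbitrary offset $0.25$ ; the phase $θ$ recovered from the shifted law should then be $2 π \cdot 0.25 = π /2$ .

```python
import cmath
import math

def phi(law, t):
    """Characteristic function of a finite law given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in law.items())

die = {k: 1 / 6 for k in range(1, 7)}              # supported on the integers
shifted = {k + 0.25: 1 / 6 for k in range(1, 7)}   # supported on the coset 0.25 + Z

t0 = 2 * math.pi
print(abs(phi(die, t0)), abs(phi(shifted, t0)))    # both moduli equal 1 up to rounding
print(abs(phi(die, 1.0)))                          # strictly below 1 at a generic speed

theta = cmath.phase(phi(shifted, t0))              # phase records the coset offset
print(theta, math.pi / 2)
```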

Connections Master

The Fourier transform on $R^{n}$ 02.10.04 is the analytic engine underneath this unit. The characteristic function is the Fourier transform of the law $μ$ under the probabilistic sign-and-normalisation convention, the inversion formula is the measure-valued Fourier inversion theorem, and the Plancherel isometry of that unit reappears here as the Parseval identity relating $\int ∣ φ ∣^{2}$ to the energy of an absolutely continuous law; the smoothing-and-regularisation techniques in the Bochner proof are the Gaussian approximate-identity arguments from the Fourier unit.

The Bochner-Minlos theorem and characteristic functionals on nuclear spaces 02.10.06 are the infinite-dimensional continuation of the finite-dimensional Bochner theorem proved here. The positive-definiteness condition that singles out characteristic functions on $R^{n}$ becomes, on a nuclear space, the condition that singles out characteristic functionals of measures on the topological dual; this unit is the classical finite-dimensional companion to that one, and the continuity-at-the-origin hypothesis here is the seed of the nuclear-continuity hypothesis there.

Probability theory: rules and distributions 26.02.01 supplies the measure-theoretic foundation — probability spaces, laws, independence, and convolution — on which the characteristic-function calculus is built. The independence property $φ_{X + Y} = φ_{X} φ_{Y}$ is the transform-side image of the convolution-of-laws rule defined there, and the affine and symmetry properties of $φ$ are read directly off the distribution-transformation rules of that unit.

Random variables and expected value 26.03.01 provides the expectation integral that defines $φ_{X} (t) = E [e^{i t \cdot X}]$ and the moment functionals $E [X^{k}]$ recovered by differentiating $φ$ at the origin; the dominated-convergence justifications for differentiating under the expectation and for the uniform-continuity estimate all run on the integration theory established there.

Historical & philosophical context Master

The characteristic function entered probability through Laplace and Cauchy in the early nineteenth century as the Fourier transform of a density, but its decisive use as a tool for limit theorems is due to Paul Lévy, whose 1925 Calcul des probabilités ^{[Lévy 1925]} introduced the continuity theorem in essentially its modern form and made the characteristic function the central instrument of the subject. Lévy's 1937 Théorie de l'addition des variables aléatoires ^{[Lévy 1937]} developed the connection to infinitely divisible laws that became the Lévy-Khinchine formula. The abstract characterisation of which functions can be characteristic functions was settled by Salomon Bochner, whose 1932 Vorlesungen über Fouriersche Integrale ^{[Bochner 1932]} proved that the continuous positive-definite functions are exactly the Fourier-Stieltjes transforms of finite measures. Factorisation results, for instance Cramér's 1936 theorem ^{[Cramér 1936]} that a sum of independent variables is normal only if each summand is normal, were among the first deep applications of the inversion machinery.

The tightness step that completes the continuity theorem was placed on a general topological footing by Yuri Prokhorov in 1956 ^{[Prokhorov 1956]}, whose theorem identifies relative compactness in the weak topology with tightness and supplies the subsequential limits the proof requires. The conceptual content is that two genuinely different descriptions of a distribution — the measure itself and its transform — carry the same information and the same notion of convergence, with positive-definiteness marking the exact image of the transform and continuity at the origin marking the exact condition under which a limit of transforms is again a transform.

Bibliography Master

@book{levy1925,
  author    = {L\'evy, Paul},
  title     = {Calcul des probabilit\'es},
  publisher = {Gauthier-Villars, Paris},
  year      = {1925}
}

@book{levy1937,
  author    = {L\'evy, Paul},
  title     = {Th\'eorie de l'addition des variables al\'eatoires},
  publisher = {Gauthier-Villars, Paris},
  year      = {1937}
}

@book{bochner1932,
  author    = {Bochner, Salomon},
  title     = {Vorlesungen \"uber Fouriersche Integrale},
  publisher = {Akademische Verlagsgesellschaft, Leipzig},
  year      = {1932}
}

@article{cramer1936,
  author  = {Cram\'er, Harald},
  title   = {\"Uber eine Eigenschaft der normalen Verteilungsfunktion},
  journal = {Mathematische Zeitschrift},
  volume  = {41},
  pages   = {405--414},
  year    = {1936}
}

@article{prokhorov1956,
  author  = {Prokhorov, Yuri V.},
  title   = {Convergence of random processes and limit theorems in probability theory},
  journal = {Theory of Probability and Its Applications},
  volume  = {1},
  number  = {2},
  pages   = {157--214},
  year    = {1956}
}

@book{durrett2019,
  author    = {Durrett, Rick},
  title     = {Probability: Theory and Examples},
  edition   = {5th},
  publisher = {Cambridge University Press},
  year      = {2019}
}

@book{kallenberg2002,
  author    = {Kallenberg, Olav},
  title     = {Foundations of Modern Probability},
  edition   = {2nd},
  publisher = {Springer-Verlag, New York},
  year      = {2002}
}

Prerequisites

02.10.04
02.10.06
26.02.01
26.03.01

Tier anchors

beginner: Durrett, Probability: Theory and Examples 5e §3.3 (informal characteristic-function picture); Grimmett-Stirzaker, Probability and Random Processes 3e §5.7; physical intuition from the spectrum of a signal and the fingerprint of a distribution
intermediate: Durrett, Probability: Theory and Examples 5e §3.3 (characteristic functions, inversion, continuity theorem); Billingsley, Probability and Measure 3e §26-29; Williams, Probability with Martingales §16; Feller, An Introduction to Probability Theory and Its Applications II §XV
master: Durrett, Probability: Theory and Examples (Cambridge 5e, 2019) §3.3-3.4; Kallenberg, Foundations of Modern Probability 2e Ch. 5; Feller II §XV-XVII; Lukacs, Characteristic Functions 2e; Bochner, Harmonic Analysis and the Theory of Probability Ch. 3-4

References

Lévy — Calcul des probabilités · Gauthier-Villars, Paris, 1925, Ch. VI (fonctions caractéristiques, théorème de continuité)
Lévy — Théorie de l'addition des variables aléatoires · Gauthier-Villars, Paris, 1937, Ch. II-III
Bochner — Vorlesungen über Fouriersche Integrale · Akademische Verlagsgesellschaft, Leipzig, 1932 (Satz 23: positive-definite functions)
Cramér — Über eine Eigenschaft der normalen Verteilungsfunktion · Math. Z. 41 (1936), 405-414
Durrett — Probability: Theory and Examples · Cambridge University Press, 5th ed., 2019, §3.3-3.4
Prokhorov — Convergence of random processes and limit theorems in probability theory · Theory Probab. Appl. 1 (1956), 157-214
Kallenberg — Foundations of Modern Probability · Springer, 2nd ed., 2002, Ch. 5

Estimated time

beginner: 20m
intermediate: 55m
master: 95m