02.14.04 · analysis / microlocal-analysis

The theory of distributions and the Schwartz kernel theorem

Anchor (Master): Hörmander *The Analysis of Linear Partial Differential Operators* Vol. I §2–§7 (test functions, distributions, $\mathcal{S}'$, Fourier transform, kernel theorem); Schwartz *Théorie des distributions* (Hermann, 1950/51); Trèves *Topological Vector Spaces, Distributions and Kernels* (Academic Press, 1967) Part III

Intuition Beginner

You already know how to differentiate a smooth function. But many objects in physics and analysis are not smooth: a point mass, a sudden jump in voltage, the charge of an electron sitting at a single point. The theory of distributions is the machine that lets you treat these singular objects as if they were functions you can differentiate as often as you like.

The trick is to stop asking what value the object has at each point. Instead, you only ever ask how it integrates against a smooth test bump. A test bump is a smooth function that is non-zero only on a small region. A distribution is any rule that eats a test bump and returns a number, in a way that is linear and continuous. Ordinary functions become distributions by integration; the point mass becomes the rule "hand me a bump, I give you its value at the origin."

Once you describe an object by how it pairs with bumps, differentiation becomes free. You move the derivative onto the bump, where it is harmless, and flip a sign. So the point mass has a derivative, and a second derivative, and so on, even though it was never a function in the first place.

Visual Beginner

Picture two columns. On the left sits the world of smooth test bumps: gentle hills that rise and fall and are flat zero outside a small window. On the right sits a single number, the output. A distribution is an arrow from the left column to the right column: feed it a bump, read off a number.

The Dirac delta is the simplest interesting arrow: it reads the height of the incoming bump right at the origin and reports that one number. A jump function is a slightly richer arrow: it reports the area of the bump to the right of the jump. The deepest picture in this unit is that every continuous linear machine taking bumps to distributions is itself just one big distribution living on the product of the two underlying spaces: a single generalized kernel that encodes the whole operator at once.

Worked example Beginner

Let us compute the derivative of the Heaviside jump $H$ on the line. The jump function equals zero for negative inputs and one for positive inputs. It has a cliff at the origin, so classically it has no derivative there. As a distribution it has a perfectly good one.

Step 1. Write the rule for $H$ as a distribution: hand it a test bump $\varphi$, and it returns the area under the bump over the positive half-line, $\langle H, \varphi\rangle = \int_0^\infty \varphi(x)\,dx$.

Step 2. Define the distributional derivative by the rule that moves the derivative onto the bump and flips the sign. So the derivative $H'$, paired with a bump, equals minus the area under the bump's slope over the positive half-line: $\langle H', \varphi\rangle = -\int_0^\infty \varphi'(x)\,dx$.

Step 3. The area under a slope is just the change in height. The bump is flat zero far to the right and has some height at the origin. So minus the change in height, read from the origin out to infinity, is exactly the height of the bump at the origin: $-\int_0^\infty \varphi'(x)\,dx = -(0 - \varphi(0)) = \varphi(0)$.

Step 4. But "return the height of the bump at the origin" is the defining rule of the Dirac delta. So the distributional derivative of the jump is the delta: $H' = \delta$.

What this tells us: a cliff in a function turns into a spike in its derivative, and the size of the spike is the size of the cliff. This single calculation is the seed of every fundamental-solution computation in this unit.
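
Here is a quick numerical sanity check of Steps 1 to 4. It is a sketch, not part of the theory, and it rests on two choices. The first is one particular bump: the standard $\exp(-1/(1-s^2))$ profile, shifted by an arbitrarily chosen $0.3$ so that its height at the origin is not zero. The second is a midpoint rule, used to approximate the area under the slope. The two printed numbers should agree to many digits.

```python
import math

CENTER, RADIUS = 0.3, 1.0  # arbitrary choice: the bump lives on (-0.7, 1.3)

def bump(x: float) -> float:
    """Smooth test bump exp(-1/(1 - s^2)) with s = (x - CENTER)/RADIUS, zero for |s| >= 1."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - s * s))

def bump_slope(x: float) -> float:
    """Exact derivative of bump(x), by the chain rule."""
    s = (x - CENTER) / RADIUS
    if abs(s) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * s / (1.0 - s * s) ** 2) / RADIUS

def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Step 2: <H', phi> = -<H, phi'> = -(area under phi' over the positive half-line).
# The bump vanishes beyond CENTER + RADIUS, so integrating up to there loses nothing.
pairing_H_prime = -midpoint_integral(bump_slope, 0.0, CENTER + RADIUS)

# Step 4: the Dirac delta reports the height of the bump at the origin.
pairing_delta = bump(0.0)

print(pairing_H_prime, pairing_delta)
```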

Check your understanding Beginner

Formal definition Intermediate+

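Let $X \subset \mathbb{R}^n$ be open. The test-function space $\mathcal{D}(X) = C_c^\infty(X)$ consists of smooth functions with compact support in $X$, equipped with the inductive-limit (LF) topology 02.11.06: a sequence $\varphi_j \to 0$ in $\mathcal{D}(X)$ iff all supports lie in a fixed compact $K \subset X$ and $\partial^\alpha \varphi_j \to 0$ uniformly for every multi-index $\alpha$. The space $\mathcal{E}(X) = C^\infty(X)$ carries the Fréchet topology of uniform convergence of all derivatives on compact sets, and the Schwartz space $\mathcal{S}(\mathbb{R}^n)$ of rapidly decreasing smooth functions 02.10.04 carries the Fréchet topology of the seminorms $\sup_x |x^\beta \partial^\alpha \varphi(x)|$.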

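Definition (distribution). A distribution on $X$ is a continuous linear functional $u: \mathcal{D}(X) \to \mathbb{C}$. The space of distributions is the continuous dual $\mathcal{D}'(X)$. The pairing is written $\langle u, \varphi\rangle = u(\varphi)$. Continuity is equivalent to a local estimate: for every compact $K \subset X$ there are $C_K > 0$ and an integer $N_K \ge 0$ with $|\langle u, \varphi\rangle| \le C_K \sum_{|\alpha| \le N_K} \sup |\partial^\alpha \varphi|$ for all $\varphi \in \mathcal{D}(X)$ supported in $K$. The least $N$ such that $N_K = N$ works uniformly over all $K$ (when one exists) is the order of $u$.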

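Definition (compact support and tempered). $\mathcal{E}'(X)$ is the dual of $\mathcal{E}(X)$; it is identified with the distributions of compact support in $X$. The tempered distributions $\mathcal{S}'(\mathbb{R}^n)$ form the dual of $\mathcal{S}(\mathbb{R}^n)$. The inclusions $\mathcal{E}'(\mathbb{R}^n) \subset \mathcal{S}'(\mathbb{R}^n) \subset \mathcal{D}'(\mathbb{R}^n)$ hold, dual to the dense continuous inclusions $\mathcal{D}(\mathbb{R}^n) \subset \mathcal{S}(\mathbb{R}^n) \subset \mathcal{E}(\mathbb{R}^n)$.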

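Definition (operations). For $u \in \mathcal{D}'(X)$, $a \in C^\infty(X)$, and a multi-index $\alpha$, define $$ \langle \partial^\alpha u, \varphi\rangle = (-1)^{|\alpha|}\langle u, \partial^\alpha \varphi\rangle, \qquad \langle a u, \varphi\rangle = \langle u, a\varphi\rangle, $$ for all $\varphi \in \mathcal{D}(X)$. Both produce distributions. The convolution of $u \in \mathcal{D}'(\mathbb{R}^n)$ with $\varphi \in \mathcal{D}(\mathbb{R}^n)$ is the smooth function $(u * \varphi)(x) = \langle u, \varphi(x - \cdot)\rangle$, and convolution extends to a pairing $\mathcal{D}'(\mathbb{R}^n) \times \mathcal{E}'(\mathbb{R}^n) \to \mathcal{D}'(\mathbb{R}^n)$.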
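
The definitions above can be mimicked in a few lines of Python by treating a distribution as a rule that takes a test function (a Python callable) to a number. This is only a sketch, and it makes three simplifications:

- Exact differentiation of the test function is replaced by a central difference. This is an assumption that introduces a small numerical error.
- The helper names (`dirac`, `derivative`, `times_smooth`) are hypothetical.
- The test function is a Gaussian, which is a Schwartz function rather than a compactly supported bump. This is harmless here because $\delta$ and its derivatives only see the test function near the origin.

The code checks three facts: $\langle\delta',\varphi\rangle = -\varphi'(0)$, $\langle\delta'',\varphi\rangle = \varphi''(0)$, and the identity $x\delta' = -\delta$. The last one follows from $\langle x\delta',\varphi\rangle = -(x\varphi)'(0) = -\varphi(0)$.

```python
import math
from typing import Callable

TestFunction = Callable[[float], float]
Distribution = Callable[[TestFunction], float]  # linear rule: test function -> number

def central_difference(phi: TestFunction, h: float = 1e-4) -> TestFunction:
    """Approximate phi' by (phi(x + h) - phi(x - h)) / (2h)."""
    return lambda x: (phi(x + h) - phi(x - h)) / (2.0 * h)

def dirac(a: float) -> Distribution:
    """<delta_a, phi> = phi(a)."""
    return lambda phi: phi(a)

def derivative(u: Distribution) -> Distribution:
    """<u', phi> = -<u, phi'>: move the derivative onto the test function, flip the sign."""
    return lambda phi: -u(central_difference(phi))

def times_smooth(a: TestFunction, u: Distribution) -> Distribution:
    """<a u, phi> = <u, a phi> for a smooth multiplier a."""
    return lambda phi: u(lambda x: a(x) * phi(x))

phi = lambda x: math.exp(-(x - 1.0) ** 2)   # phi(0) = 1/e, phi'(0) = 2/e, phi''(0) = 2/e

delta = dirac(0.0)
print(derivative(delta)(phi), -2 / math.e)                 # <delta', phi>  = -phi'(0)
print(derivative(derivative(delta))(phi), 2 / math.e)      # <delta'', phi> =  phi''(0)
print(times_smooth(lambda x: x, derivative(delta))(phi), -1 / math.e)  # x delta' = -delta
```

The closures make the "move the derivative onto the test function" rule literal: `derivative` never differentiates `u`, only the test function it is handed.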

Definition (support and singular support). The support $\operatorname{supp} u$ is the complement of the largest open set on which $u$ vanishes (meaning $\langle u, \varphi\rangle = 0$ for every $\varphi$ supported there). The singular support $\operatorname{sing\,supp} u$ is the complement of the largest open set on which $u$ equals a smooth function.

Notation and conventions

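  • $\langle u, \varphi\rangle$: the action of $u \in \mathcal{D}'(X)$ on $\varphi \in \mathcal{D}(X)$. Conjugate-linear pairings are avoided; everything is bilinear over $\mathbb{C}$.
  • $\hat\varphi = \mathcal{F}\varphi$: the Fourier transform on $\mathcal{S}(\mathbb{R}^n)$, defined by $\hat\varphi(\xi) = \int e^{-ix\cdot\xi}\varphi(x)\,dx$, with inverse $\varphi(x) = (2\pi)^{-n}\int e^{ix\cdot\xi}\hat\varphi(\xi)\,d\xi$ 02.10.04.
  • $u \otimes v$: the tensor product distribution on $X \times Y$, characterised by $\langle u\otimes v, \varphi\otimes\psi\rangle = \langle u, \varphi\rangle\langle v, \psi\rangle$ for $\varphi \in \mathcal{D}(X)$, $\psi \in \mathcal{D}(Y)$, where $(\varphi\otimes\psi)(x,y) = \varphi(x)\psi(y)$, extended by density of finite sums $\sum_j \varphi_j\otimes\psi_j$ in $\mathcal{D}(X\times Y)$.
  • $\delta_a$: the Dirac distribution at $a$, $\langle \delta_a, \varphi\rangle = \varphi(a)$, with $\delta = \delta_0$. The principal value $\operatorname{p.v.}\frac{1}{x}$ is the order-one distribution defined below.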

Fourier transform on the tempered dual

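The Fourier transform $\mathcal{F}: \mathcal{S}(\mathbb{R}^n) \to \mathcal{S}(\mathbb{R}^n)$ is a topological isomorphism 02.10.04. By duality, $\langle \hat u, \varphi\rangle = \langle u, \hat\varphi\rangle$, it is an isomorphism $\mathcal{F}: \mathcal{S}'(\mathbb{R}^n) \to \mathcal{S}'(\mathbb{R}^n)$ extending the Plancherel transform on $L^2(\mathbb{R}^n)$. One has $\widehat{\partial^\alpha u} = (i\xi)^\alpha \hat u$ and $\widehat{x^\alpha u} = (i\partial_\xi)^\alpha \hat u$, turning differentiation into multiplication. The Paley-Wiener-Schwartz theorem characterises $\mathcal{F}(\mathcal{E}'(\mathbb{R}^n))$: a tempered distribution is the Fourier transform of a compactly supported distribution iff it extends to an entire function of exponential type with polynomial bounds on the real axis (pointer; full statement in the Advanced section).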
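
The following is a numerical check of the derivative rule, offered as a sketch. It assumes the convention $\hat\varphi(\xi) = \int e^{-ix\xi}\varphi(x)\,dx$ on $\mathbb{R}$. Under this convention $\widehat{\varphi'}(\xi) = i\xi\,\hat\varphi(\xi)$, and the Gaussian $e^{-x^2/2}$ has transform $\sqrt{2\pi}\,e^{-\xi^2/2}$. The integral is truncated to $[-12, 12]$ and approximated by a midpoint rule. This is extremely accurate for such a rapidly decaying smooth integrand.

```python
import cmath
import math

def fourier(f, xi: float, L: float = 12.0, n: int = 4000) -> complex:
    """Midpoint-rule approximation of f^(xi) = integral of exp(-i x xi) f(x) dx over [-L, L]."""
    h = 2.0 * L / n
    total = 0j
    for k in range(n):
        x = -L + (k + 0.5) * h
        total += cmath.exp(-1j * x * xi) * f(x)
    return total * h

phi = lambda x: math.exp(-x * x / 2.0)          # Gaussian, a Schwartz function
dphi = lambda x: -x * math.exp(-x * x / 2.0)    # its exact derivative

for xi in (0.0, 0.7, 1.5, 3.0):
    lhs = fourier(dphi, xi)              # F(phi')(xi)
    rhs = 1j * xi * fourier(phi, xi)     # i xi F(phi)(xi)
    exact = math.sqrt(2 * math.pi) * math.exp(-xi * xi / 2.0)
    print(xi, abs(lhs - rhs), abs(fourier(phi, xi) - exact))  # both differences are tiny
```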

Counterexamples to common slips

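  • Distributions cannot, in general, be multiplied together. The product $\delta^2 = \delta\cdot\delta$ on $\mathbb{R}$ has no meaning; the order-of-singularity bookkeeping that rescues some products is the wave-front-set calculus of 02.14.01.
  • The convolution $u * v$ of two distributions needs a support condition (one factor compactly supported, or convolvable supports). Without it associativity fails, as the chain $(1 * \delta') * H = 0 * H = 0$ versus $1 * (\delta' * H) = 1 * \delta = 1$ shows.
  • A distribution of infinite order exists: $u = \sum_{k=1}^{\infty} \delta_k^{(k)}$ on $\mathbb{R}$ (the $k$-th derivative of the Dirac mass at $x = k$) has no global order, only locally finite order.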

Key theorem with proof Intermediate+

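Theorem (Schwartz kernel theorem; Hörmander Vol. I Theorem 5.2.1; Schwartz 1950–51 [Hörmander Vol. I]). Let $X \subset \mathbb{R}^m$ and $Y \subset \mathbb{R}^n$ be open. For every continuous linear map $A: \mathcal{D}(Y) \to \mathcal{D}'(X)$ there is a unique distribution $K_A \in \mathcal{D}'(X\times Y)$, the Schwartz kernel of $A$, such that $$ \langle A u, \varphi\rangle = \langle K_A, \varphi \otimes u\rangle, \qquad \varphi \in \mathcal{D}(X),\ u \in \mathcal{D}(Y). $$ Conversely, every $K \in \mathcal{D}'(X\times Y)$ defines such a map by this formula. The correspondence $A \mapsto K_A$ is a linear bijection $L(\mathcal{D}(Y), \mathcal{D}'(X)) \to \mathcal{D}'(X\times Y)$.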

Proof. Uniqueness is the easy half. Finite sums $\sum_j \varphi_j \otimes u_j$ are dense in $\mathcal{D}(X\times Y)$, and the displayed identity fixes $K_A$ on every such sum, hence everywhere by continuity. So at most one kernel exists, and the displayed formula determines it.

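Existence is the content. The map $B(\varphi, u) = \langle Au, \varphi\rangle$ is a separately continuous bilinear form on $\mathcal{D}(X) \times \mathcal{D}(Y)$. It is linear and continuous in $\varphi$ because $Au \in \mathcal{D}'(X)$, and linear and continuous in $u$ because $A$ is continuous into $\mathcal{D}'(X)$. The whole theorem reduces to showing that every separately continuous bilinear form on a product of test-function spaces extends to a continuous linear functional on $\mathcal{D}(X\times Y)$, that is, has the form $B(\varphi, u) = \langle K, \varphi\otimes u\rangle$.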

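This is where the topology earns its keep. The space $\mathcal{D}(X)$ is a nuclear space (Schwartz; Grothendieck). Its projective and injective tensor-product topologies on $\mathcal{D}(X) \otimes \mathcal{D}(Y)$ coincide, the algebraic tensor product completes to $\mathcal{D}(X)\,\hat\otimes\,\mathcal{D}(Y) \cong \mathcal{D}(X\times Y)$, and continuous bilinear forms on the factors are exactly continuous linear forms on the completed tensor product. The nuclearity of $\mathcal{D}$ follows from that of $\mathcal{S}$. That in turn holds because the seminorm system of $\mathcal{S}$ is generated by the eigenfunction expansion of the harmonic oscillator, whose eigenvalue growth is fast enough to make every inclusion between seminorm-Hilbert-completions a trace-class (nuclear) map. To apply this, restrict $B$ to test functions supported in fixed compacts $K_1 \subset X$ and $K_2 \subset Y$. There the test-function spaces are Fréchet, so separate continuity upgrades to joint continuity. Granting nuclearity, $B$ then extends uniquely to a continuous functional on the completion $\mathcal{D}(X)\,\hat\otimes\,\mathcal{D}(Y) \cong \mathcal{D}(X\times Y)$. The extensions for larger compacts are compatible, and the resulting functional is the kernel $K_A$.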

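The converse direction is direct. Given $K \in \mathcal{D}'(X\times Y)$, the formula $\langle Au, \varphi\rangle = \langle K, \varphi\otimes u\rangle$ defines a continuous linear map $A: \mathcal{D}(Y) \to \mathcal{D}'(X)$, because the test function $\varphi \otimes u$ depends continuously on $\varphi$ and on $u$ in $\mathcal{D}(X\times Y)$.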

Bridge. The kernel theorem underlies the whole operator-theoretic side of microlocal analysis. It reappears in 02.14.02, where the Schwartz kernel of a pseudo-differential operator is smooth off the diagonal and singular along it. It is the distributional analogue of the statement that a matrix is the array of its entries: $K_A(x,y)$ is the continuous-index "matrix" of the operator $A$. The pairing $\langle K_A, \varphi\otimes u\rangle$, formally $\iint K_A(x,y)\,\varphi(x)\,u(y)\,dx\,dy$, plays the role of the bilinear form $\sum_{i,j}\varphi_i A_{ij} u_j$. This is why distributions are the right setting for linear PDE. Every linear operator one meets (differential, integral, solution, or propagator) is represented by such a kernel, and the singular support of the kernel records where the operator fails to smooth its input. Nuclearity of the test-function spaces is what collapses "separately continuous bilinear form" to "single distribution on the product". In this dictionary the differential operator $P = \sum_{|\alpha|\le m} a_\alpha(x)\partial^\alpha$ has kernel $K_P(x,y) = \sum_{|\alpha|\le m} a_\alpha(x)\,\partial_x^\alpha\delta(x-y)$, a distribution supported exactly on the diagonal. Singular support on the diagonal is precisely the pseudolocality that the wave-front-set calculus of 02.14.01 refines into a microlocal statement.
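
The matrix analogy can be made literal on a finite grid, as a sketch. A linear map $A$ on $\mathbb{R}^n$ is recovered from the numbers $K[i][j] = \langle A e_j, e_i\rangle$. These are obtained by testing $A$ against the discrete "tensor products" $e_i \otimes e_j$, just as the uniqueness half of the proof tests $K_A$ on $\varphi \otimes u$. The grid size, the step $h$, and the helper names are illustrative choices.

The two operators behave differently:

- The identity has a purely diagonal kernel, a discrete $\delta(x-y)$.
- A central-difference derivative has a kernel within one grid step of the diagonal, a discrete $\partial_x\delta(x-y)$.

```python
from typing import Callable, List

Vector = List[float]
Operator = Callable[[Vector], Vector]

def pairing(f: Vector, g: Vector) -> float:
    """Discrete analogue of <f, g> (grid weights omitted)."""
    return sum(a * b for a, b in zip(f, g))

def basis(n: int, j: int) -> Vector:
    return [1.0 if k == j else 0.0 for k in range(n)]

def kernel_of(A: Operator, n: int) -> List[Vector]:
    """K[i][j] = <A e_j, e_i>: the 'matrix entries' of A."""
    columns = [A(basis(n, j)) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]

def identity(u: Vector) -> Vector:
    return list(u)

def central_difference(u: Vector, h: float = 0.1) -> Vector:
    """(u[i+1] - u[i-1]) / (2h), treating values outside the grid as zero."""
    n = len(u)
    get = lambda k: u[k] if 0 <= k < n else 0.0
    return [(get(i + 1) - get(i - 1)) / (2.0 * h) for i in range(n)]

n = 6
K_id = kernel_of(identity, n)
K_d = kernel_of(central_difference, n)

# Offsets i - j at which each kernel is non-zero.
print(sorted({i - j for i in range(n) for j in range(n) if K_id[i][j] != 0.0}))  # [0]
print(sorted({i - j for i in range(n) for j in range(n) if K_d[i][j] != 0.0}))   # [-1, 1]

# <A u, phi> = sum_{i,j} phi[i] K[i][j] u[j], the discrete form of <K_A, phi (x) u>.
u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
phi = [0.5, -1.0, 2.0, 0.0, 3.0, 1.0]
lhs = pairing(central_difference(u), phi)
rhs = sum(phi[i] * K_d[i][j] * u[j] for i in range(n) for j in range(n))
print(lhs, rhs)
```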

Exercises Intermediate+

Advanced results Master

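Theorem (structure theorem; Hörmander Vol. I Theorem 2.1.6 [Hörmander Vol. I]). Let $u \in \mathcal{D}'(X)$. For every compact $K \subset X$ there are an integer $N$ and finitely many continuous functions $f_\alpha$ on $X$, indexed by $|\alpha| \le N$, such that $$ u = \sum_{|\alpha|\le N} \partial^\alpha f_\alpha \quad\text{on a neighbourhood of } K. $$ If $u$ has compact support, the representation is global with finitely many $f_\alpha$. Thus every distribution is locally a finite sum of derivatives of continuous functions. The number $N$ is controlled by the distributional order of $u$ near $K$, but it can exceed that order: on $\mathbb{R}$, $\delta$ has order $0$, yet it is not of the form $f_0 + f_1'$ with $f_0, f_1$ continuous.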

This is the precise sense in which distributions are "functions differentiated finitely many times." The proof embeds the local estimate of order $k$ into a Sobolev-type inequality. It then integrates the test function a further finite number of times to reach a bound by an integral norm, and transfers the derivatives onto representing continuous functions by duality.
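
The simplest instance of the structure theorem is $\delta = \partial^2 x_+$ on $\mathbb{R}$, where $x_+ = \max(x, 0)$ is continuous. Indeed, $\langle \partial^2 x_+, \varphi\rangle = \langle x_+, \varphi''\rangle = \int_0^\infty x\varphi''(x)\,dx = \varphi(0)$ after two integrations by parts. The sketch below checks this numerically. It assumes a shifted Gaussian as the test function, which is legitimate here because $x_+$ is tempered. It also truncates the integral at $x = 12$, where the integrand is negligible.

```python
import math

def phi(x: float) -> float:
    """Shifted Gaussian test function; phi(0) = exp(-1/4)."""
    return math.exp(-(x - 0.5) ** 2)

def phi_second(x: float) -> float:
    """Exact second derivative of phi."""
    return (4 * (x - 0.5) ** 2 - 2) * math.exp(-(x - 0.5) ** 2)

def ramp(x: float) -> float:
    """x_+ : continuous, but not differentiable at 0."""
    return max(x, 0.0)

def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# <d^2/dx^2 x_+, phi> = (-1)^2 <x_+, phi''>; the integrand vanishes for x < 0.
lhs = midpoint_integral(lambda x: ramp(x) * phi_second(x), 0.0, 12.0)
print(lhs, phi(0.0))   # both approximately exp(-0.25)
```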

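Theorem (Paley-Wiener-Schwartz; Hörmander Vol. I Theorem 7.3.1 [Hörmander Vol. I]). Let $u \in \mathcal{E}'(\mathbb{R}^n)$ with $\operatorname{supp} u \subset \{|x| \le R\}$. Then its Fourier transform $\hat u$ extends to an entire function on $\mathbb{C}^n$ with the bound $|\hat u(\zeta)| \le C(1+|\zeta|)^N e^{R|\operatorname{Im}\zeta|}$ for some $C, N$. Conversely every entire function with such a bound is $\hat u$ for a unique $u \in \mathcal{E}'(\mathbb{R}^n)$ supported in $\{|x| \le R\}$. The smooth-compactly-supported case $u \in C_c^\infty(\mathbb{R}^n)$ corresponds to rapid decay in every horizontal strip (the bound $|\hat u(\zeta)| \le C_N(1+|\zeta|)^{-N} e^{R|\operatorname{Im}\zeta|}$ for all $N$).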

The exponential type encodes the support radius, and the polynomial factor encodes the order. This is the analytic engine behind the support theorems and the parametrix constructions of 02.13.02, and it is the reason the Fourier transform of a compactly supported distribution is a genuine (analytic) function rather than another singular object.
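
Two worked examples show the two ingredients separately. They use the convention $\hat u(\zeta) = \langle u, e^{-ix\zeta}\rangle$ on $\mathbb{R}$, an assumption that matches the Fourier convention of Hörmander. A derivative of a point mass gives pure polynomial growth, with type set by its location. The indicator of an interval gives exact exponential type equal to its support radius.

```latex
% Point mass at a, differentiated k times (order k, support {a}):
\widehat{\delta_a^{(k)}}(\zeta) = (-1)^k \partial_x^k e^{-ix\zeta}\big|_{x=a}
  = (i\zeta)^k e^{-ia\zeta},
\qquad
\bigl|\widehat{\delta_a^{(k)}}(\zeta)\bigr| = |\zeta|^k e^{a\,\operatorname{Im}\zeta}
  \le (1+|\zeta|)^k e^{|a|\,|\operatorname{Im}\zeta|}.

% Indicator of [-R, R] (order 0, support radius R):
\hat\chi_R(\zeta) = \int_{-R}^{R} e^{-ix\zeta}\,dx = \frac{2\sin(R\zeta)}{\zeta},
\qquad
\hat\chi_R(i\eta) = \frac{2\sinh(R\eta)}{\eta} \sim \frac{e^{R|\eta|}}{|\eta|}
  \quad (|\eta| \to \infty).
```

In the first example the exponent $|a|$ is the support radius and the power $k$ is the order.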

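Theorem (Schwartz kernel theorem, nuclear form; Trèves Theorem 51.6; Schwartz 1950–51 [Trèves]). Because $\mathcal{S}$ and $\mathcal{D}$ are nuclear Fréchet (resp. LF) spaces, the canonical maps $$ \mathcal{D}'(X\times Y) \cong \mathcal{D}'(X)\,\hat\otimes\,\mathcal{D}'(Y) \cong L(\mathcal{D}(Y), \mathcal{D}'(X)) $$ are isomorphisms of topological vector spaces, and the same holds with $\mathcal{S}$ in place of $\mathcal{D}$ throughout. Continuity of the operator in stronger topologies refines the kernel. $A$ maps $\mathcal{D}(Y)$ continuously into $C^\infty(X)$ iff $K_A$ is smooth in $x$ with values in $\mathcal{D}'(Y)$. $A$ is smoothing, that is, extends to a continuous map $\mathcal{E}'(Y) \to C^\infty(X)$, iff $K_A \in C^\infty(X\times Y)$.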

The nuclear form is the conceptual home of the kernel theorem. It is not an accident of the concrete spaces $\mathcal{D}(X)$ and $\mathcal{D}(Y)$ but a consequence of their nuclearity, the property Grothendieck isolated precisely to make the tensor-product topology unambiguous. The regularity dictionary at the end is what lets 02.14.02 read pseudo-differential order off the diagonal singularity of the kernel.

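Theorem (Schwartz impossibility of multiplication). There is no associative bilinear product on $\mathcal{D}'(\mathbb{R})$ that has all of the following properties: it extends pointwise multiplication of continuous functions and multiplication of distributions by smooth functions, it is compatible with differentiation by the Leibniz rule, and $\delta$ has a square in it. The obstruction is the associativity chain against $\operatorname{p.v.}\frac{1}{x}$. Since $x\delta = 0$ and $x \cdot \operatorname{p.v.}\frac{1}{x} = 1$, associativity would force $\delta = (\operatorname{p.v.}\tfrac{1}{x}\cdot x)\,\delta = \operatorname{p.v.}\tfrac{1}{x}\,(x\,\delta) = 0$.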

This impossibility is a central motivation for microlocal analysis. Products of distributions have a canonical definition when their wave-front sets are compatible, meaning that no $(x,\xi) \in \operatorname{WF}(u)$ has $(x,-\xi) \in \operatorname{WF}(v)$ (02.14.01), so the multiplicative theory is microlocal rather than global. Colombeau algebras circumvent the obstruction by relaxing the compatibility requirements, at the cost of a coarser equivalence.

Synthesis. The theory of distributions is the foundation on which every later microlocal construction rests. It extends the function concept so that differentiation is always defined, the Fourier transform is an isomorphism on the tempered class, and every continuous linear operator is a kernel. The three main theorems share a local-to-global pattern:

- the structure theorem shows that distributions are locally continuous functions differentiated finitely often;
- the kernel theorem shows that operators are distributions on the product;
- the Paley-Wiener-Schwartz theorem shows that compact support corresponds to entire Fourier transforms of exponential type.

Nuclearity of $\mathcal{D}$ and $\mathcal{S}$ is the single hypothesis that collapses bilinear forms to kernels and makes the operator-as-kernel dictionary exact. Duality, by transposing derivatives onto test functions, is what makes differentiation a continuous operation. The link to microlocal analysis is the failure of multiplication. The impossibility theorem above is refined by the wave-front-set product calculus of 02.14.01, so the one operation distributions lack globally is recovered microlocally exactly when the wave-front directions of the two factors never cancel, that is, when no $(x,\xi)$ in one is matched by $(x,-\xi)$ in the other. Every theorem of this unit reappears in the consuming units: the kernel theorem in 02.14.02, the distributional Fourier transform and fundamental solutions in 02.13.02, and the singular-support bookkeeping in 02.14.01. Those units assume this foundation throughout.

Full proof set Master

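Proposition (well-definedness of the distributional derivative). For $u \in \mathcal{D}'(X)$ and a multi-index $\alpha$, the formula $\langle \partial^\alpha u, \varphi\rangle = (-1)^{|\alpha|}\langle u, \partial^\alpha\varphi\rangle$ defines a distribution, and on smooth $u$ it agrees with the classical derivative.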

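Proof. The map $\partial^\alpha: \mathcal{D}(X) \to \mathcal{D}(X)$ is continuous for the LF topology. It does not enlarge supports, and $\sup|\partial^\beta(\partial^\alpha\varphi)| = \sup|\partial^{\alpha+\beta}\varphi|$ is again one of the defining seminorms. Composing with the continuous functional $u$ and multiplying by $(-1)^{|\alpha|}$ gives a continuous linear functional, so $\partial^\alpha u \in \mathcal{D}'(X)$. Now take $u \in C^\infty(X)$, identified with a distribution by integration. Integrating $\int u\,\partial^\alpha\varphi\,dx$ by parts produces no boundary terms because $\varphi$ has compact support, so the distributional and classical derivatives coincide.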

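Proposition (Schwartz kernel of the identity is the diagonal delta). The kernel of the inclusion $I: \mathcal{D}(X) \to \mathcal{D}'(X)$ is the distribution $\delta_\Delta$ on $X \times X$ defined by $\langle \delta_\Delta, \Phi\rangle = \int_X \Phi(x,x)\,dx$, i.e. $K_I(x,y) = \delta(x-y)$.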

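Proof. For $\varphi, u \in \mathcal{D}(X)$, the kernel theorem requires $\langle K_I, \varphi\otimes u\rangle = \langle u, \varphi\rangle = \int_X \varphi(x)u(x)\,dx$. The functional $\Phi \mapsto \int_X \Phi(x,x)\,dx$ is continuous on $\mathcal{D}(X\times X)$ and agrees with the required value on the dense set of finite sums $\sum_j \varphi_j\otimes u_j$. Hence it equals $K_I$ by the uniqueness half of the kernel theorem. Its support is the diagonal $\{(x,x) : x \in X\}$, and in the suggestive notation it is $\delta(x-y)$.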

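Proposition ($\operatorname{p.v.}\frac{1}{x}$ is a distribution of order one). The principal value $\langle \operatorname{p.v.}\frac{1}{x}, \varphi\rangle = \lim_{\varepsilon\to 0^+}\int_{|x|>\varepsilon} \frac{\varphi(x)}{x}\,dx$ exists for every $\varphi \in \mathcal{D}(\mathbb{R})$ and defines a distribution of order $1$, not $0$.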

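Proof. Existence: fix $R$ with $\operatorname{supp}\varphi \subset [-R, R]$ and split $\varphi(x) = \varphi(0) + x\psi(x)$ with $\psi(x) = \int_0^1 \varphi'(tx)\,dt$ smooth. The term $\varphi(0)/x$ integrates to zero over the symmetric set $\{\varepsilon < |x| < R\}$ by oddness of $1/x$. The term $x\psi(x)/x = \psi(x)$ gives $\int_{\varepsilon<|x|<R}\psi(x)\,dx$, which converges as $\varepsilon \to 0$. So the limit exists and $|\langle \operatorname{p.v.}\frac{1}{x}, \varphi\rangle| \le 2R\sup|\psi| \le 2R\sup|\varphi'|$ on a fixed compact, an order-one estimate. Order is not zero. Take test functions $\varphi_k \in \mathcal{D}(\mathbb{R})$ supported in $[0,2]$, with $0 \le \varphi_k \le 1$ and $\varphi_k = 1$ on $[1/k, 1]$, so that they rise steeply near $0$. They give $\langle \operatorname{p.v.}\frac{1}{x}, \varphi_k\rangle \ge \int_{1/k}^1 \frac{dx}{x} = \log k \to \infty$, although $\sup|\varphi_k| \le 1$. So no order-zero bound holds.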
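
The existence argument can be watched numerically, as a sketch under two stated assumptions:

- The test function is the standard bump $\exp(-1/(1-s^2))$ with $s = x - 0.3$. The shift is arbitrary and is chosen so that $\varphi(0) \neq 0$ and $\varphi$ is not even.
- Integrals use a midpoint rule.

The truncated integrals $\int_{\varepsilon<|x|<R}\varphi(x)/x\,dx$ approach the symmetrised integral $\int_0^R (\varphi(x)-\varphi(-x))/x\,dx$, in which the $\varphi(0)/x$ part has already cancelled. The error is roughly $2|\varphi'(0)|\,\varepsilon$.

```python
import math

def phi(x: float) -> float:
    """Smooth bump supported in [-0.7, 1.3]; phi(0) != 0 and phi is not even."""
    s = x - 0.3
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1.0 else 0.0

def midpoint_integral(f, a: float, b: float, n: int = 100_000) -> float:
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

R = 1.3  # supp(phi) lies in [-R, R]

def truncated_pv(eps: float) -> float:
    """Integral of phi(x)/x over eps < |x| < R."""
    right = midpoint_integral(lambda x: phi(x) / x, eps, R)
    left = midpoint_integral(lambda x: phi(x) / x, -R, -eps)
    return right + left

# The phi(0)/x parts of the two halves cancel; what remains is integrable near 0.
limit = midpoint_integral(lambda x: (phi(x) - phi(-x)) / x, 0.0, R)

for eps in (1e-1, 1e-2, 1e-3):
    value = truncated_pv(eps)
    print(eps, value, abs(value - limit))  # the gap shrinks roughly in proportion to eps
```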

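Proposition (smoothing operators have smooth kernels). A continuous linear map $A: \mathcal{D}(Y) \to C^\infty(X)$ has Schwartz kernel $K_A$ that is smooth in the $x$-variable. If moreover $A$ extends continuously to $\mathcal{E}'(Y) \to C^\infty(X)$, then $K_A \in C^\infty(X\times Y)$.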

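Proof. By the kernel theorem $K_A$ exists in $\mathcal{D}'(X\times Y)$. For fixed $u \in \mathcal{D}(Y)$, $Au$ is smooth, so the partial pairing $\varphi \mapsto \langle K_A, \varphi\otimes u\rangle$ is given by integration against the smooth function $Au$. This says $K_A$ is smooth in $x$ with values in $\mathcal{D}'(Y)$. If $A$ extends to $\mathcal{E}'(Y)$, then pairing with $u = \delta_y$ is admissible and $(x, y) \mapsto (A\delta_y)(x)$ is smooth as well. So $K_A(x,y) = (A\delta_y)(x)$ is jointly smooth by the regularity dictionary of the nuclear kernel theorem.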

Connections Master

  • Fourier transform and Schwartz space 02.10.04. The Schwartz space $\mathcal{S}$ and its dual $\mathcal{S}'$ are introduced there as the natural home of the Fourier transform. This unit promotes that dual to a full calculus, with $\widehat{\partial^\alpha u} = (i\xi)^\alpha \hat u$ turning the differential operators of PDE into multiplication operators on the Fourier side. The Plancherel isometry on $L^2(\mathbb{R}^n)$ is the special case of the tempered Fourier isomorphism restricted to square-integrable distributions.

  • Fundamental solution of the Laplacian 02.13.02. For the suitably normalised Newtonian potential $E$, being a fundamental solution is exactly the statement $\Delta E = \delta$ in $\mathcal{D}'(\mathbb{R}^n)$, which is meaningless without the distributional derivative defined here. A fundamental solution of an operator $P$ is a distribution $E$ with $PE = \delta$, so $P_x$ maps the kernel $E(x-y)$ to $\delta(x-y)$, a delta on the diagonal. The kernel theorem identifies the solution operator with convolution against that fundamental solution.

  • Topological vector spaces 02.11.06. The LF topology on $\mathcal{D}$, the Fréchet topology on $\mathcal{S}$, and the nuclearity that powers the kernel theorem all live in the world of topological vector spaces. The kernel theorem is the headline application of Grothendieck's nuclearity. It fails for general Banach spaces and holds for $\mathcal{D}$ and $\mathcal{S}$ precisely because their seminorm systems make the tensor-product topology unambiguous.

  • Wave-front set of a distribution 02.14.01. The singular support defined here is refined there into the wave-front set, which records not just where but in which directions a distribution fails to be smooth. The Schwartz impossibility of multiplication stated here is exactly what the wave-front-set product theorem of that unit repairs microlocally. This unit is the foundation 02.14.01 recalls in a single sentence and then assumes throughout.

  • Pseudo-differential operators 02.14.02. The kernel theorem proved here is the unstated foundation of that unit. A pseudo-differential operator is a continuous map $P: \mathcal{D}(X) \to C^\infty(X)$ whose Schwartz kernel $K_P$ is smooth off the diagonal, and the entire symbol calculus is a bookkeeping of the diagonal singularity of $K_P$. The regularity dictionary at the end of the nuclear kernel theorem is what lets pseudo-differential order be read off the kernel.

Historical & philosophical context Master

Distribution theory was created by Laurent Schwartz in Théorie des distributions (Hermann, Tomes I–II, 1950–51) [Schwartz 1950], synthesising and superseding a generation of partial precursors: Heaviside's operational calculus, Dirac's delta in The Principles of Quantum Mechanics (1930), Sobolev's 1936 weak solutions, Bochner's generalised Fourier integrals, and Wiener's generalised harmonic analysis. Schwartz's decisive move was topological — to define generalised functions as the continuous dual of a space of test functions, so that the singular objects of physics inherit a rigorous calculus from the duality. For this he received the Fields Medal in 1950. The kernel theorem (théorème des noyaux) is the structural summit of the theory: it asserts that the abstract notion of a continuous linear operator between test-function spaces coincides exactly with the concrete notion of a distribution on the product, the analyst's version of "an operator is its matrix."

The conceptual completion came from Alexander Grothendieck, whose thesis Produits tensoriels topologiques et espaces nucléaires (Mem. Amer. Math. Soc. 16, 1955) isolated nuclearity as the precise property of and that makes the kernel theorem true, recasting it as a statement about tensor products of nuclear spaces. Hörmander's The Analysis of Linear Partial Differential Operators Vol. I (Springer, 1983) [Hörmander Vol. I] gave the definitive PDE-oriented treatment, developing the structure theorem, the Paley-Wiener-Schwartz characterisation, and the kernel theorem as the launch pad for the wave-front set and pseudo-differential calculus of Vols. I and III. Christian Gérard's Microlocal Analysis of Quantum Fields on Curved Spacetimes (EMS, 2019) [Gérard] opens by developing distribution theory from scratch in Chapter 1, because the Hadamard-state programme of algebraic quantum field theory rests entirely on reading the wave-front set of an operator's Schwartz kernel — the philosophical payoff being that the "generalized functions" Dirac wrote down heuristically in 1930 are, sixty years later, the exact mathematical carriers of the vacuum structure of quantum fields on a curved spacetime.

Bibliography Master

@book{Schwartz1950Distributions,
  author    = {Schwartz, Laurent},
  title     = {Th{\'e}orie des distributions},
  publisher = {Hermann},
  address   = {Paris},
  note      = {Tomes I--II},
  year      = {1950}
}

@book{HormanderVolI,
  author    = {H{\"o}rmander, Lars},
  title     = {The Analysis of Linear Partial Differential Operators, Vol. {I}: Distribution Theory and Fourier Analysis},
  publisher = {Springer-Verlag},
  series    = {Grundlehren der mathematischen Wissenschaften},
  volume    = {256},
  year      = {1983}
}

@book{Treves1967TVS,
  author    = {Tr{\`e}ves, Fran{\c c}ois},
  title     = {Topological Vector Spaces, Distributions and Kernels},
  publisher = {Academic Press},
  address   = {New York},
  year      = {1967}
}

@book{FriedlanderJoshiDistributions,
  author    = {Friedlander, F. G. and Joshi, M.},
  title     = {Introduction to the Theory of Distributions},
  publisher = {Cambridge University Press},
  edition   = {2},
  year      = {1998}
}

@article{Grothendieck1955Nuclear,
  author  = {Grothendieck, Alexander},
  title   = {Produits tensoriels topologiques et espaces nucl{\'e}aires},
  journal = {Mem. Amer. Math. Soc.},
  volume  = {16},
  year    = {1955}
}

@book{Gerard2019Microlocal,
  author    = {G{\'e}rard, Christian},
  title     = {Microlocal Analysis of Quantum Fields on Curved Spacetimes},
  publisher = {European Mathematical Society},
  series    = {ESI Lectures in Mathematics and Physics},
  year      = {2019}
}