05.15.01 · symplectic / optimal-transport

Wasserstein metric and Otto's formal Riemannian calculus

shipped · 3 tiers · Lean: none

Anchor (Master): Monge 1781 *Mémoire sur la théorie des déblais et des remblais* (Histoire de l'Académie Royale des Sciences, originator of the transport problem); Kantorovich 1942 *On the translocation of masses* (Dokl. Akad. Nauk SSSR 37, linear-programming relaxation); Brenier 1991 *Polar factorization and monotone rearrangement of vector-valued functions* (Comm. Pure Appl. Math. 44); Otto 2001 *The geometry of dissipative evolution equations: the porous medium equation* (Comm. Partial Differential Equations 26); Jordan-Kinderlehrer-Otto 1998 (SIAM J. Math. Anal. 29); Villani 2008 *Optimal Transport: Old and New* (Springer); Ambrosio-Gigli-Savaré 2005 *Gradient Flows in Metric Spaces and in the Space of Probability Measures* (Birkhäuser); Arnold-Khesin *Topological Methods in Hydrodynamics* 2nd ed.

Intuition Beginner

Imagine a pile of sand on a beach and a hole next to it. You want to move every grain of sand into the hole, paying for each grain a cost proportional to the square of the distance you carry it. The cheapest plan for moving the entire pile is the optimal transport plan, and the minimum total cost (the square root of it, to be precise) is the Wasserstein distance between the initial mass distribution on the pile and the final distribution in the hole. Gaspard Monge wrote down this problem in 1781 to plan military earthworks; it is the original problem in optimal transport.

What is striking is that Wasserstein distance is more than a number. Felix Otto noticed in 2001 that you can put a Riemannian-geometry structure on the space of probability distributions in such a way that Wasserstein distance becomes the geodesic distance. The straightest path from one distribution to another is a continuous interpolation that shifts mass at constant speed along straight lines — the visual is mass flowing without pressure. And the geodesic equation, viewed in this geometry, turns out to be the equation describing dust particles moving freely under inertia (the pressureless Euler equation).

The remarkable consequence is that the heat equation, which describes how a hot spot of mass diffuses outward, becomes in Otto's picture the gradient descent of the entropy functional in Wasserstein geometry. Diffusion is descent down the entropy landscape, with the descent direction set by the Wasserstein metric. Since the late 1990s this observation has reorganised a large part of the analysis of evolution equations around one unifying idea: many evolution equations are gradient flows on the space of probability distributions, and the geometry of these flows controls their stability, regularity, and long-time behaviour.

Visual Beginner

Picture three things stacked vertically. At the bottom, two mass distributions on the real line: say a Gaussian bell curve centred at a point $a$ and another centred at a point $b$, both the same width. In the middle, a continuous slow-motion movie of one bell sliding rigidly along the $x$-axis until it lands on the other; this is the Wasserstein geodesic. At the top, the abstract space of all probability measures with finite second moment, drawn as a curved surface, with the two bells appearing as two points and the geodesic as a smooth arc joining them.

The visual captures Otto's insight: optimal transport gives the space of probability distributions a geometry. Geodesics are interpolations by mass-preserving sliding. The heat equation runs downhill on the entropy potential in this geometry.

Worked example Beginner

The simplest optimal transport is between two point masses. Place a single grain of sand at a point $x_0$ and ask how to move it to a point $x_1$.

Step 1. The initial distribution is the Dirac mass $\delta_{x_0}$, supported entirely at the point $x_0$. The final distribution is $\delta_{x_1}$.

Step 2. The only transport plan is to carry the single grain from $x_0$ to $x_1$. The cost is the squared distance $|x_1 - x_0|^2$.

Step 3. The Wasserstein-2 distance is $W_2(\delta_{x_0}, \delta_{x_1}) = |x_1 - x_0|$. In this simplest case the distance between Dirac masses is the ordinary distance between their supports.

Step 4. The geodesic interpolation is the Dirac mass at the point $(1-t)x_0 + t x_1$, written $\mu_t = \delta_{(1-t)x_0 + t x_1}$ for $t \in [0, 1]$. The grain moves at constant speed from $x_0$ to $x_1$, just as a free particle would.

What this tells us: Wasserstein distance generalises ordinary Euclidean distance from points to whole distributions, and Wasserstein geodesics generalise straight lines. When the distributions are spread out (not single points), the geodesic does not move mass uniformly: each tiny piece of mass takes its own straight-line route to its image under the optimal transport map.
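On the real line the picture above can be sketched concretely: for two equal-size clouds of equally weighted grains and squared-distance cost, sorting both clouds and matching them in order is an optimal plan. The following Python is a minimal sketch under those assumptions (equal sizes, equal weights, one dimension); the function names and the sample positions are illustrative choices, not part of the unit.

```python
import math

def w2_sorted_1d(xs, ys):
    """W2 between two equal-size, equally weighted point clouds on the line."""
    if len(xs) != len(ys):
        raise ValueError("both clouds must contain the same number of points")
    # For squared-distance cost in 1D, matching sorted points in order is optimal.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

def displacement_interpolation_1d(xs, ys, t):
    """Grain positions at time t in [0, 1]: each grain moves on a straight line."""
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]

# A single grain moved from 0 to 3 (hypothetical positions).
single_grain_distance = w2_sorted_1d([0.0], [3.0])

# A three-grain cloud, a stretched and shifted target, and the halfway cloud.
source = [0.0, 1.0, 2.0]
target = [5.0, 7.0, 9.0]
halfway = displacement_interpolation_1d(source, target, 0.5)
```

Each grain in `halfway` has covered half of its own route, so the cloud is stretched and shifted at the same time, and its distance from `source` is half the total distance: the interpolation has constant speed.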

Formal definition Intermediate+

Let $(\mathcal{X}, d)$ denote a Polish metric space (the canonical case is $\mathbb{R}^n$ with Euclidean distance). Write $\mathcal{P}(\mathcal{X})$ for the space of Borel probability measures on $\mathcal{X}$.

Definition (second moment). For $\mu \in \mathcal{P}(\mathcal{X})$ and a chosen base point $x_0 \in \mathcal{X}$, the second moment is $M_2(\mu) = \int_{\mathcal{X}} d(x, x_0)^2 \, d\mu(x)$. Let $$ \mathcal{P}_2(\mathcal{X}) = \{\mu \in \mathcal{P}(\mathcal{X}) : M_2(\mu) < \infty\}. $$ The space $\mathcal{P}_2(\mathcal{X})$ is independent of the choice of $x_0$.

Definition (coupling). For $\mu, \nu \in \mathcal{P}(\mathcal{X})$, a coupling of $\mu$ and $\nu$ is a probability measure $\pi$ on $\mathcal{X} \times \mathcal{X}$ with marginals $\mu$ and $\nu$. The set of couplings is denoted $\Pi(\mu, \nu)$.

Definition (Wasserstein-2 distance). The Wasserstein-2 distance between $\mu, \nu \in \mathcal{P}_2(\mathcal{X})$ is $$ W_2(\mu, \nu) = \left(\inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathcal{X} \times \mathcal{X}} d(x, y)^2 \, d\pi(x, y)\right)^{1/2}. $$ The infimum is attained (existence via the direct method on the weak-* compact set $\Pi(\mu, \nu)$ combined with lower-semicontinuity of the cost functional); an attaining coupling is called an optimal coupling.
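For two uniform measures on $n$ points each, the infimum over couplings can be checked by hand: the couplings form the polytope of doubly stochastic matrices (scaled by $1/n$), whose extreme points are permutation matrices, so the linear cost attains its minimum at a permutation. The sketch below enumerates all permutations, which is only feasible for very small $n$; it is meant to illustrate the definition, not to be an efficient solver, and the function name and sample points are hypothetical.

```python
import itertools
import math

def w2_uniform_bruteforce(xs, ys):
    """W2 between uniform measures on two equal-size lists of points in R^d.

    Brute force over all permutations: factorial cost, small inputs only.
    """
    if len(xs) != len(ys):
        raise ValueError("both measures must have the same number of atoms")
    n = len(xs)

    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    # Each permutation sigma is the coupling that sends atom i to atom sigma[i].
    best_cost = min(
        sum(sq_dist(xs[i], ys[sigma[i]]) for i in range(n)) / n
        for sigma in itertools.permutations(range(n))
    )
    return math.sqrt(best_cost)

# Corners of the unit square and the same square translated by (3, 4).
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 3.0, y + 4.0) for x, y in square]
distance = w2_uniform_bruteforce(square, shifted)
```

For a pure translation the optimal plan moves every atom by the same vector, so the distance is the length of that vector, here 5.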

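Theorem (Kantorovich duality). For $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, $$ W_2^2(\mu, \nu) = \sup_{(\phi, \psi)} \left\{\int_{\mathbb{R}^n} \phi \, d\mu + \int_{\mathbb{R}^n} \psi \, d\nu : \phi(x) + \psi(y) \leq |x-y|^2 \text{ for all } x, y\right\}, $$ with the supremum taken over $\phi \in L^1(\mu)$, $\psi \in L^1(\nu)$. The supremum is attained at a pair of $c$-conjugate functions for the cost $c(x, y) = |x-y|^2$, called the Kantorovich potentials. (The name Kantorovich-Rubinstein is usually reserved for the $W_1$ version of this duality.)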

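Definition (Monge map). A Monge map for the transport problem from $\mu$ to $\nu$ is a Borel map $T : \mathbb{R}^n \to \mathbb{R}^n$ with $T_{\#}\mu = \nu$ minimising the Monge cost $\int_{\mathbb{R}^n} |x - T(x)|^2 \, d\mu(x)$. In the setting of Brenier's theorem below, this is equivalent to the coupling $(\mathrm{id} \times T)_{\#}\mu$ being optimal in the Kantorovich sense.

Theorem (Brenier 1991). Let $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$ with $\mu$ absolutely continuous with respect to Lebesgue measure. Then there exists a unique (up to $\mu$-null modifications) Monge map $T$, and $T = \nabla u$ for a convex function $u$. Conversely, the gradient of any convex $u$ with $(\nabla u)_{\#}\mu = \nu$ is optimal.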
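A one-dimensional instance makes the statement concrete. The measures below are chosen purely for illustration: take $\mu$ uniform on $[0, 1]$ and $\nu$ uniform on $[0, 2]$. The stretching map $T(x) = 2x$ pushes $\mu$ to $\nu$ and is the derivative of the convex function $u(x) = x^2$, so by the converse part of the theorem it is optimal, and $$ W_2^2(\mu, \nu) = \int_0^1 |T(x) - x|^2 \, dx = \int_0^1 x^2 \, dx = \frac{1}{3}, \qquad W_2(\mu, \nu) = \frac{1}{\sqrt{3}}. $$ In one dimension "gradient of a convex function" simply means "non-decreasing map", which is why the monotone rearrangement is the optimal map on the line.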

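Definition (Otto Riemannian structure on $\mathcal{P}_2(\mathbb{R}^n)$). Otto 2001 introduced the formal Riemannian structure on $\mathcal{P}_2(\mathbb{R}^n)$:

  • Tangent space at $\mu$: $T_\mu \mathcal{P}_2 = \overline{\{\nabla\phi : \phi \in C_c^\infty(\mathbb{R}^n)\}}^{L^2(\mu)}$, identifying tangent vectors with gradient vector fields, two fields being identified when they agree in $L^2(\mu)$.
  • Metric: $g_\mu(\nabla\phi, \nabla\psi) = \int_{\mathbb{R}^n} \nabla\phi \cdot \nabla\psi \, d\mu$.
  • Continuity equation: a curve $t \mapsto \mu_t$ has tangent vector $\nabla\phi_t$ iff $\partial_t \mu_t + \nabla \cdot (\mu_t \nabla\phi_t) = 0$ in the weak sense.

Definition (Wasserstein gradient flow). A curve $\mu_t$ in $\mathcal{P}_2(\mathbb{R}^n)$ is the $W_2$-gradient flow of a functional $F$ if $$ \partial_t \mu_t + \nabla \cdot (\mu_t \nabla \phi_t) = 0, \qquad \nabla \phi_t = -\nabla_{\mu_t} F, $$ where $\nabla_\mu F$ denotes the Otto gradient computed from the first variation $\delta F / \delta \mu$ by $\nabla_\mu F = \nabla (\delta F / \delta \mu)$.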

Counterexamples to common slips

  • Brenier's theorem needs absolute continuity of $\mu$. If $\mu$ has atoms, the optimal coupling need not be supported on a graph: splitting mass is generically required, so no Monge map exists.
  • The Otto tangent space is not all of $L^2(\mu; \mathbb{R}^n)$. Its orthogonal complement consists of the fields $w$ with $\nabla \cdot (\mu w) = 0$ in the weak sense; these do not change the measure to first order and instead generate $\mu$-preserving rearrangements of mass. A general vector field splits into a gradient part, which moves $\mu$, and such a part, which does not.
  • Wasserstein-2 is not Hilbertian on $\mathcal{P}_2(\mathbb{R}^n)$. Wasserstein space over $\mathcal{X} = \mathbb{R}^n$ is a non-negatively curved metric space in the Alexandrov sense, and tangent cones (rather than tangent spaces) are the rigorous replacement of Otto's heuristic.
  • Geodesic interpolation is not linear interpolation. The interpolation $(1-t)\mu_0 + t\mu_1$ is a curve in $\mathcal{P}_2(\mathbb{R}^n)$ but it is not the Wasserstein geodesic; the geodesic is $\mu_t = ((1-t)\,\mathrm{id} + tT)_{\#}\mu_0$ with $T$ the optimal map, supported on a sliding deformation of $\mathrm{supp}\,\mu_0$.
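The last slip can be checked by hand for two Dirac masses at points $a \neq b$ (an illustrative choice). The only coupling of $\delta_a$ with the mixture $(1-t)\delta_a + t\delta_b$ is the product measure, so $$ W_2^2\big(\delta_a, (1-t)\delta_a + t\delta_b\big) = (1-t)\,|a - a|^2 + t\,|b - a|^2 = t\,|b-a|^2, \qquad W_2 = \sqrt{t}\,|b-a|. $$ A constant-speed geodesic would have to satisfy $W_2(\mu_0, \mu_t) = t\,W_2(\mu_0, \mu_1) = t\,|b-a|$, and $\sqrt{t} \neq t$ for $0 < t < 1$. The sliding Dirac mass $\delta_{(1-t)a + tb}$ does satisfy it, since $W_2(\delta_a, \delta_{(1-t)a+tb}) = t\,|b-a|$.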

Key theorem with proof Intermediate+

Theorem (Otto 2001: gradient flow of entropy is the heat equation). Let $H$ denote the Boltzmann entropy $$ H(\rho) = \int_{\mathbb{R}^n} \rho(x) \log \rho(x) \, dx $$ for measures $\mu = \rho \, dx$ absolutely continuous with respect to Lebesgue measure, with $H(\mu) = +\infty$ for singular $\mu$. The Otto-Wasserstein gradient flow of $H$ on $\mathcal{P}_2(\mathbb{R}^n)$ is the heat equation $\partial_t \rho_t = \Delta \rho_t$.

Proof. The first variation of $H$ at $\rho$ in the direction $h$ (a signed measure with $\int h \, dx = 0$) is $$ \delta H(\rho)[h] = \int_{\mathbb{R}^n} (\log \rho(x) + 1) h(x) \, dx, $$ so the Euclidean ($L^2$) gradient is $\delta H / \delta \rho = \log \rho + 1$.

By the Otto recipe, the Wasserstein gradient is the gradient of the first variation: $$ \nabla_\rho H = \nabla \left(\frac{\delta H}{\delta \rho}\right) = \nabla \log \rho = \frac{\nabla \rho}{\rho}. $$

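The Wasserstein gradient flow equation in the Otto sense reads (via the continuity equation linking the tangent vector $\partial_t \rho_t$ to $\phi_t$ through $\partial_t \rho_t + \nabla \cdot (\rho_t \nabla \phi_t) = 0$ with $\nabla \phi_t = -\nabla_{\rho_t} H$): $$ \partial_t \rho_t + \nabla \cdot \left(\rho_t \cdot \left(-\frac{\nabla \rho_t}{\rho_t}\right)\right) = 0. $$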

Simplifying gives $$ \partial_t \rho_t - \nabla \cdot \nabla \rho_t = \partial_t \rho_t - \Delta \rho_t = 0, $$ i.e. the heat equation.
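As a consistency check on the gradient-flow reading, one can compute how fast the entropy decreases along the heat equation. The following is a formal computation, assuming $\rho_t$ is smooth, positive, and decays fast enough for the boundary terms in the integration by parts to vanish: $$ \frac{d}{dt} H(\rho_t) = \int_{\mathbb{R}^n} (\log \rho_t + 1)\, \Delta \rho_t \, dx = -\int_{\mathbb{R}^n} \nabla \log \rho_t \cdot \nabla \rho_t \, dx = -\int_{\mathbb{R}^n} |\nabla \log \rho_t|^2 \, \rho_t \, dx. $$ The right-hand side is minus the squared Otto norm of the Wasserstein gradient $\nabla \log \rho_t$, which is exactly the dissipation identity $\frac{d}{dt} F = -|\nabla F|^2$ expected of a gradient flow.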

Theorem (Otto 2001: the geodesic equation is pressureless Euler). A curve $\rho_t$ in $\mathcal{P}_2(\mathbb{R}^n)$ is a Wasserstein geodesic iff there is a velocity field $v_t = \nabla \phi_t$ such that $$ \partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0, \qquad \partial_t v_t + (v_t \cdot \nabla) v_t = 0, $$ i.e. iff $(\rho_t, v_t)$ solves the pressureless Euler system with potential velocity field.

Proof. The Riemannian geodesic equation for an abstract Riemannian manifold is $\nabla_{\dot\gamma} \dot\gamma = 0$, where $\nabla$ is the Levi-Civita connection. On $\mathcal{P}_2(\mathbb{R}^n)$ with the Otto metric, the tangent vector $\dot\mu_t$ corresponds to a gradient field $\nabla \phi_t$ via the continuity equation $\partial_t \mu_t + \nabla \cdot (\mu_t \nabla \phi_t) = 0$.

The Otto Levi-Civita connection computes the covariant time-derivative of $\dot\mu_t$ along $\mu_t$ as $$ (\nabla_{\dot\mu_t} \dot\mu_t)_x = \partial_t \nabla\phi_t(x) + (\nabla\phi_t \cdot \nabla) \nabla\phi_t(x), $$ which is the material derivative of the velocity field along the flow. Setting this to zero yields $$ \partial_t \nabla\phi_t + (\nabla\phi_t \cdot \nabla) \nabla\phi_t = 0. $$ Writing $v_t = \nabla \phi_t$ and using that $(v \cdot \nabla) v = \nabla(\tfrac{1}{2}|v|^2)$ for irrotational $v$, this is equivalent to $$ \partial_t v_t + (v_t \cdot \nabla) v_t = 0, $$ the pressureless Euler equation. The continuity equation gives the mass-transport component.
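The pressureless Euler equation can be verified directly on the displacement interpolation. This is a formal check, assuming an optimal map $T$ from $\mu_0$ to $\mu_1$ exists and that $x \mapsto x_t$ below is injective for $t < 1$. Each particle starting at $x$ follows the straight line $$ x_t = (1-t)\,x + t\,T(x), \qquad \dot{x}_t = T(x) - x, $$ so the Eulerian velocity field is defined by $v_t(x_t) = T(x) - x$, which is constant in $t$ along each trajectory. Differentiating along the trajectory gives $$ 0 = \frac{d}{dt}\, v_t(x_t) = \partial_t v_t(x_t) + (\dot{x}_t \cdot \nabla) v_t(x_t) = \big(\partial_t v_t + (v_t \cdot \nabla) v_t\big)(x_t), $$ which is the second equation of the system: particles feel no force and keep their initial velocity.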

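Bridge. Otto's calculus identifies $\mathcal{P}_2(\mathbb{R}^n)$ with the dual orbit-space picture of the Euler-Arnold programme 05.09.05. The diffeomorphism group $\mathrm{Diff}(\mathbb{R}^n)$ acts on a fixed reference measure $\mu_0$ by push-forward, and the quotient of $\mathrm{Diff}(\mathbb{R}^n)$ by the subgroup of $\mu_0$-preserving diffeomorphisms is (formally) identified with $\mathcal{P}_2(\mathbb{R}^n)$. The right-invariant metric on $\mathrm{Diff}(\mathbb{R}^n)$ descends to the Otto-Wasserstein metric on the quotient. The pressureless Euler equation is the geodesic equation on the quotient, and the JKO scheme discretises gradient flow in this geometry.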

Lean formalization Intermediate+

Mathlib does not yet support the Wasserstein metric, the Brenier theorem, or Otto's formal Riemannian calculus on $\mathcal{P}_2(\mathbb{R}^n)$. The unit is formalisation-free at the symbolic level; meaningful Lean statements would require the entire upstream chain documented in the Mathlib gap analysis. The theorem statement that would be the target, once the infrastructure exists, has the schematic form:

-- Aspirational, not currently realisable in Mathlib.
theorem otto_heat_gradient_flow
    (ρ : ℝ → ProbabilityMeasure.WithFiniteSecondMoment ℝ^n) :
    IsWassersteinGradientFlow ρ BoltzmannEntropy ↔
    ∀ t, ∂ ρ.density t / ∂t = Laplacian (ρ.density t) :=
sorry

The statement requires ProbabilityMeasure.WithFiniteSecondMoment (the space $\mathcal{P}_2$), WassersteinDistance (Monge-Kantorovich infimum), IsWassersteinGradientFlow (the Otto-AGS gradient-flow framework on metric measure spaces), and BoltzmannEntropy (with its first variation $\log \rho + 1$). None of these names exists in current Mathlib; they are placeholders for definitions yet to be written. The closest existing infrastructure is MeasureTheory.ProbabilityMeasure and the MeasureTheory.Lp Bochner-integral machinery; Otto's tangent-space identification with $L^2(\mu)$-gradients would be the bridge. Tracked as a long-horizon contribution roadmap.

Advanced results Master

Brenier's theorem and Monge-Ampère regularity (1991, Caffarelli 1992-2000). Brenier 1991 [Brenier 1991] (Comm. Pure Appl. Math. 44) proved the existence and uniqueness of optimal transport maps for quadratic cost on $\mathbb{R}^n$ when the source is absolutely continuous. The map is the gradient $T = \nabla u$ of a convex function $u$, and $u$ solves the Monge-Ampère equation $\det D^2 u(x) = f(x) / g(\nabla u(x))$ where $\mu = f \, dx$, $\nu = g \, dy$. Caffarelli 1992 (J. Amer. Math. Soc. 5) proved interior $C^{1,\alpha}$-regularity of $u$ when the target domain is convex and the densities are bounded away from zero and infinity; under stronger assumptions (uniform convexity, smooth densities) the regularity bootstraps to $C^\infty$. The Monge-Ampère viewpoint underlies the entire regularity theory of optimal transport and has driven significant developments in fully nonlinear elliptic PDE.

McCann interpolation and displacement convexity (1995-1997). McCann 1995 [McCann 1995] (Duke Math. J. 80) and McCann 1997 [McCann 1997] (Adv. Math. 128) introduced displacement interpolation: the curve $\mu_t = ((1-t)\,\mathrm{id} + tT)_{\#}\mu_0$ is the Wasserstein geodesic from $\mu_0$ to $\mu_1 = T_{\#}\mu_0$. The notion of displacement convexity (convexity along these geodesics) gave a new perspective on classical functional inequalities (sharp Brunn-Minkowski, Prékopa-Leindler, sharp Sobolev) by linearising the relevant quantities along the geodesic. McCann's PhD thesis (Princeton 1994, advised by Lieb) is widely viewed as the bridge from the static Monge-Kantorovich theory to the modern geometric optimal-transport programme.

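Otto's Riemannian calculus (1998-2001). Otto's Comm. PDE 26 paper [Otto 2001] is the canonical reference; the 1998 Bonn preprint (with Westdickenberg and others) developed the calculus piecewise. The formal structure: the tangent space is $T_\mu \mathcal{P}_2 = \overline{\{\nabla\phi\}}^{L^2(\mu)}$, the Levi-Civita connection along a curve is $\nabla_{\dot\mu_t} \dot\mu_t = \partial_t \nabla\phi_t + (\nabla\phi_t \cdot \nabla)\nabla\phi_t$, and the Otto Hessian of a functional $F$ at $\mu$ in the direction $\nabla\phi$ is $\mathrm{Hess}_\mu F(\nabla\phi, \nabla\phi) = \int \big(|D^2\phi|^2 + D^2 V(\nabla\phi, \nabla\phi)\big) \, d\mu$ when $F(\rho) = \int \rho \log \rho \, dx + \int V \rho \, dx$, with $V$ the potential. This Hessian formula, combined with the Bakry-Émery curvature-dimension condition $D^2 V \geq \lambda I$, yields the $\lambda$-displacement convexity of free energy and hence functional inequalities. The Otto calculus was made rigorous in metric-measure-space generality by Ambrosio-Gigli-Savaré 2005 [Ambrosio-Gigli-Savaré 2005] Gradient Flows in Metric Spaces and in the Space of Probability Measures (Birkhäuser).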

JKO scheme and Wasserstein gradient flows (1998). Jordan-Kinderlehrer-Otto 1998 [JKO 1998] (SIAM J. Math. Anal. 29) discovered that the heat equation arises as the gradient flow of Boltzmann entropy in Wasserstein geometry, via the iterative variational scheme $\rho^{k+1} = \arg\min_{\rho} \left\{ H(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho^k) \right\}$. The convergence as $\tau \to 0$ to the heat equation is the prototype JKO theorem, and the scheme generalises: Fokker-Planck is the $W_2$-gradient flow of $\int \rho \log \rho \, dx + \int V \rho \, dx$; porous medium is the gradient flow of $\frac{1}{m-1} \int \rho^m \, dx$ (Otto 2001); $p$-Laplacian flows, granular media (Carrillo-McCann-Villani 2003 Rev. Mat. Iberoam. 19), McKean-Vlasov flows with nonlocal interaction, Patlak-Keller-Segel chemotaxis (Blanchet-Calvez-Carrillo 2008). The JKO scheme is now a standard tool for proving existence, uniqueness, and long-time behaviour of degenerate parabolic PDEs.
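The scheme can be run by hand in one special case. The sketch below assumes the minimisation is restricted to centred one-dimensional Gaussians $N(0, s^2)$, a simplification made here for tractability. On that family the entropy is $-\log s$ up to a constant and $W_2^2(N(0, s^2), N(0, s_{\mathrm{prev}}^2)) = (s - s_{\mathrm{prev}})^2$, so each step minimises $-\log s + (s - s_{\mathrm{prev}})^2 / (2\tau)$, whose stationarity condition is the quadratic $s^2 - s_{\mathrm{prev}} s - \tau = 0$. The function names are hypothetical.

```python
import math

def jko_step_gaussian(s_prev, tau):
    """One JKO step for the entropy on centred 1D Gaussians N(0, s^2).

    Minimises -log(s) + (s - s_prev)**2 / (2 * tau) over s > 0.
    The objective is convex, and its stationary point is the positive
    root of s**2 - s_prev * s - tau = 0.
    """
    return 0.5 * (s_prev + math.sqrt(s_prev ** 2 + 4.0 * tau))

def jko_flow_gaussian(s0, total_time, n_steps):
    """Iterate the JKO step n_steps times with tau = total_time / n_steps."""
    tau = total_time / n_steps
    s = s0
    for _ in range(n_steps):
        s = jko_step_gaussian(s, tau)
    return s

# The heat equation d(rho)/dt = Laplacian(rho) in 1D sends the variance
# s0**2 to s0**2 + 2 * t, so starting from s0 = 1 the variance at t = 1 is 3.
s_final = jko_flow_gaussian(s0=1.0, total_time=1.0, n_steps=1000)
variance_error = abs(s_final ** 2 - 3.0)
```

Each step is an implicit Euler step for $\dot{s} = 1/s$, so the error in the variance shrinks proportionally to $\tau$ as the number of steps grows.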

Lott-Sturm-Villani synthetic Ricci theory (2005-2009). Lott-Villani 2009 [Lott-Villani 2009] (Ann. Math. 169) and Sturm 2006 [Sturm 2006] (Acta Math. 196) used displacement convexity of entropy along Wasserstein geodesics to define synthetic Ricci-curvature lower bounds on metric measure spaces $(X, d, \mathfrak{m})$. The $\mathrm{CD}(K, N)$ condition reads: for every Wasserstein-2 geodesic in $\mathcal{P}_2(X)$, suitable entropy functionals satisfy displacement convexity with parameter $K$ and dimension $N$. The theory recovers smooth Ricci-curvature bounds on Riemannian manifolds (Cordero-Erausquin-McCann-Schmuckenschläger 2001), is stable under measured Gromov-Hausdorff convergence, gives a self-contained framework for Ricci-limit spaces (Cheeger-Colding 1997 ff.), and has been refined by Ambrosio-Gigli-Savaré 2014 (the $\mathrm{RCD}$ Riemannian-curvature-dimension condition), Bacher-Sturm 2010 (reduced $\mathrm{CD}^*(K, N)$), and Cavalletti-Mondino 2017 ($\mathrm{CD}(K, N)$ on essentially non-branching spaces).

Dynamic formulation and connection to fluid mechanics (Benamou-Brenier 2000). Benamou-Brenier 2000 [Benamou-Brenier 2000] (Numer. Math. 84) recast the Wasserstein-2 distance as $$ W_2^2(\mu_0, \mu_1) = \inf_{\rho_t, v_t} \int_0^1 \int |v_t|^2 \rho_t \, dx \, dt, $$ subject to $\partial_t \rho_t + \nabla \cdot (\rho_t v_t) = 0$, $\rho_0 = \mu_0$, $\rho_1 = \mu_1$. The minimisers have $v_t = \nabla \phi_t$ irrotational, $\rho_t$ the Wasserstein geodesic, and $\phi_t$ the Eulerian potential. This dynamic or fluid formulation makes the connection to pressureless Euler explicit and underlies efficient numerical algorithms (Benamou-Brenier alternating-minimisation, Sinkhorn iterations via entropy regularisation due to Cuturi 2013 NeurIPS).
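The dynamic formula can be checked by hand when both endpoints are Dirac masses, $\mu_0 = \delta_{x_0}$ and $\mu_1 = \delta_{x_1}$. The check below is restricted, as an illustrative assumption, to competitors of the form $\rho_t = \delta_{x(t)}$ for a path $x(t)$ with $x(0) = x_0$, $x(1) = x_1$ and velocity $v_t(x(t)) = \dot{x}(t)$. The action is then $\int_0^1 |\dot{x}(t)|^2 \, dt$, and by the Cauchy-Schwarz inequality $$ \int_0^1 |\dot{x}(t)|^2 \, dt \;\geq\; \left| \int_0^1 \dot{x}(t) \, dt \right|^2 = |x_1 - x_0|^2, $$ with equality exactly when $\dot{x}$ is constant, i.e. for the straight-line path $x(t) = (1-t)x_0 + t x_1$. The minimal action equals $W_2^2(\delta_{x_0}, \delta_{x_1}) = |x_1 - x_0|^2$, and the minimiser is the constant-speed geodesic.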

Optimal transport on Riemannian manifolds and beyond. McCann 2001 (Geom. Funct. Anal. 11) extended Brenier's theorem to compact Riemannian manifolds with cost $c(x, y) = \tfrac{1}{2} d(x, y)^2$, $d$ the geodesic distance; the optimal map is $T(x) = \exp_x(-\nabla\phi(x))$ for $\phi$ $c$-concave. Ma-Trudinger-Wang 2005 (Arch. Rational Mech. Anal. 177) identified the curvature-of-cost condition controlling regularity. Loeper 2009 (Acta Math. 202) related MTW non-negativity to the regularity of optimal maps and treated the round sphere, opening the regularity theory in non-flat geometries. Figalli-Kim-McCann 2013 (Comm. PDE 38) extended this to general MTW manifolds.

Connections to PDE, probability, and machine learning. Wasserstein gradient flow has reshaped the analysis of dissipative PDE: porous medium (Otto 2001), thin-film equations (Bertozzi-Pugh 1996, recast in Wasserstein by Matthes-McCann-Savaré 2009), Cahn-Hilliard (Lisini-Matthes-Savaré 2012), Patlak-Keller-Segel (Blanchet 2013). In probability: Talagrand's inequality (1996), the Otto-Villani 2000 HWI inequality, Bakry-Émery hypercontractivity. In machine learning: Wasserstein GANs (Arjovsky-Chintala-Bottou 2017 ICML), sliced Wasserstein and entropy-regularised transport (Sinkhorn algorithm: Cuturi 2013), Wasserstein gradient descent for training neural networks (Chizat-Bach 2018 NeurIPS), and optimal-transport-based domain adaptation (Courty-Flamary-Tuia-Rakotomamonjy 2017 IEEE TPAMI).

Arnold-Khesin synthesis (2021). The 2nd edition of Arnold-Khesin [Arnold-Khesin] adds a new chapter integrating Otto's calculus with the Euler-Arnold programme. The identification places Wasserstein geometry in the same right-invariant-metric framework as ideal fluid hydrodynamics, with the quotient metric on producing the Otto-Wasserstein metric and the geodesic equation producing the pressureless Euler system. The full Euler equation with pressure is recovered on the volume-preserving subgroup via Arnold's 1966 reduction. Khesin-Misiolek 2003 (Adv. Math. 176) developed the moment-map description: is the cotangent space at of the abelian Lie algebra , and the symplectic structure is the canonical one. The Madelung-Brenier framework (Brenier 1989, Carlen-Gangbo 2003) realises the quantum-mechanical wavefunction equation as a Wasserstein gradient flow of the Fisher information, completing the unification of optimal transport, fluid mechanics, and quantum hydrodynamics under a single geometric heading.

Full proof set Master

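Lemma (existence of optimal couplings). For $\mu, \nu \in \mathcal{P}_2(\mathcal{X})$, the infimum $\inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^2 \, d\pi(x, y)$ is attained.

Proof. The set $\Pi(\mu, \nu)$ is convex and weakly-* compact in $\mathcal{P}(\mathcal{X} \times \mathcal{X})$ (tightness follows from the marginal constraints and Prokhorov's theorem applied to $\mu$ and $\nu$ separately). The cost functional $\pi \mapsto \int d(x, y)^2 \, d\pi$ is weak-* lower semicontinuous: the non-negative continuous integrand $d(x, y)^2$ is an increasing limit of bounded continuous functions, so the functional is a supremum of weak-* continuous functionals. It is finite on $\Pi(\mu, \nu)$ because both marginals have finite second moments. A lower-semicontinuous functional on a weak-* compact set attains its infimum.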

Theorem (Brenier 1991: existence and uniqueness of the Monge map). Let $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$ with $\mu \ll \mathcal{L}^n$. There exists a unique (up to $\mu$-null modifications) map $T = \nabla u$ with $u$ convex and $T_{\#}\mu = \nu$, and $(\mathrm{id} \times T)_{\#}\mu$ is the unique optimal Kantorovich coupling.

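Proof. Step 1 (duality). Apply Kantorovich duality: $W_2^2(\mu, \nu) = \sup \left\{ \int \phi \, d\mu + \int \psi \, d\nu \right\}$ over pairs $(\phi, \psi)$ with $\phi(x) + \psi(y) \leq |x-y|^2$ everywhere. By a standard argument (replacing $\phi$ with its $c$-conjugate and iterating), we may assume the dual is achieved at a pair with $\phi(x) = \inf_y \big(|x-y|^2 - \psi(y)\big)$ and $\psi(y) = \inf_x \big(|x-y|^2 - \phi(x)\big)$, both $c$-concave functions.

Step 2 (envelope characterisation). In the quadratic-cost case, $\phi(x) = |x|^2 - \sup_y \big(2 x \cdot y - |y|^2 + \psi(y)\big)$. Defining $u(x) = \tfrac{1}{2}\big(|x|^2 - \phi(x)\big) = \sup_y \big(x \cdot y - \tfrac{1}{2}(|y|^2 - \psi(y))\big)$ exhibits $u$ as a supremum of affine functions, hence convex.

Step 3 (optimality at the graph of $\nabla u$). The complementary-slackness condition $\phi(x) + \psi(y) = |x-y|^2$ holds on $\mathrm{supp}\,\pi$ for any optimal coupling $\pi$. Substituting and rewriting in terms of $u$ gives the condition $y \in \partial u(x)$ ($\partial u$ the subdifferential) on $\mathrm{supp}\,\pi$. Since $\mu \ll \mathcal{L}^n$ and convex functions are Lebesgue-a.e. differentiable (Rademacher's theorem, as convex functions are locally Lipschitz; Alexandrov's 1939 theorem gives a.e. second-order differentiability), $\partial u(x) = \{\nabla u(x)\}$ for $\mu$-a.e. $x$. Hence $\pi$ is supported on the graph of $\nabla u$, and $T = \nabla u$ is the optimal Monge map.

Step 4 (uniqueness). Suppose two convex potentials $u_1, u_2$ both yield optimal maps $T_1 = \nabla u_1$, $T_2 = \nabla u_2$, with optimal couplings $\pi_1, \pi_2$. Since the cost is linear in the coupling and $\Pi(\mu, \nu)$ is convex, the average $\tfrac{1}{2}(\pi_1 + \pi_2)$ is again optimal, so by Step 3 it is supported on the graph of a single map. That is only possible if $T_1 = T_2$ $\mu$-a.e. This gives uniqueness up to $\mu$-null modifications of the map, and up to an additive constant on the potential where $\mu$ has connected support.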

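Theorem (Otto 2001: heat equation is the entropy gradient flow, full version). Let $H(\rho) = \int \rho \log \rho \, dx$ on $\mathcal{P}_2(\mathbb{R}^n)$. A curve $\rho_t$ is the Wasserstein gradient flow of $H$ iff $\partial_t \rho_t = \Delta \rho_t$ in the sense of distributions.

Proof. Necessity. Otto's recipe identifies the Wasserstein gradient as $\nabla_\rho H = \nabla \log \rho = \nabla \rho / \rho$. The gradient flow equation in the Otto sense reads (via the continuity equation linking the tangent vector $\partial_t \rho_t$ to $\phi_t$ through $\partial_t \rho_t + \nabla \cdot (\rho_t \nabla \phi_t) = 0$ with $\nabla \phi_t = -\nabla_{\rho_t} H$): $$ \partial_t \rho_t + \nabla \cdot (\rho_t \cdot (-\nabla\rho_t/\rho_t)) = 0 \quad \Longleftrightarrow \quad \partial_t \rho_t = \nabla \cdot \nabla \rho_t = \Delta\rho_t. $$

Sufficiency. For the converse, work with the JKO scheme. Define $\rho^{k+1}_\tau = \arg\min_\rho \left\{ H(\rho) + \frac{1}{2\tau} W_2^2(\rho, \rho^k_\tau) \right\}$. The Euler-Lagrange equation at the minimiser gives (using a first-order perturbation argument and the Brenier-map characterisation of $W_2$): there exists convex $u$ with $\nabla u$ being the optimal map from $\rho^{k+1}_\tau$ to $\rho^k_\tau$, and $(\nabla u - \mathrm{id})/\tau = \nabla \log \rho^{k+1}_\tau$ in the $L^2(\rho^{k+1}_\tau)$ sense. Passing to the limit $\tau \to 0$ (justified by AGS gradient-flow compactness machinery) recovers the continuous equation $\partial_t \rho_t = \Delta \rho_t$. The argument generalises to Fokker-Planck and to gradient flows of any displacement-convex functional. Full detail: JKO 1998 [JKO 1998] and Ambrosio-Gigli-Savaré 2005 [Ambrosio-Gigli-Savaré 2005] Ch. 11.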

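Theorem (geodesic equation = pressureless Euler, formal proof). On $\mathcal{P}_2(\mathbb{R}^n)$ with Otto's formal Riemannian structure, the geodesic equation in the variables $(\rho_t, \phi_t)$ is the pressureless Euler system $\partial_t \rho_t + \nabla \cdot (\rho_t \nabla\phi_t) = 0$, $\partial_t \nabla\phi_t + (\nabla\phi_t \cdot \nabla)\nabla\phi_t = 0$.

Proof. The energy functional along a curve $\rho_t$ with tangent $\nabla\phi_t$ is $$ E[\rho, \phi] = \int_0^1 \int |\nabla\phi_t|^2 \rho_t \, dx \, dt. $$ The constraint is $\partial_t \rho_t + \nabla \cdot (\rho_t \nabla\phi_t) = 0$. Introducing a Lagrange multiplier $\lambda_t(x)$ and computing first variations: the variation in $\phi$ gives $\nabla \cdot \big(\rho_t (2\nabla\phi_t - \nabla\lambda_t)\big) = 0$, so $\lambda_t = 2\phi_t$, and the variation in $\rho$ (after integration by parts) gives $|\nabla\phi_t|^2 - \partial_t \lambda_t - \nabla\lambda_t \cdot \nabla\phi_t = 0$. Substituting back: $\partial_t \phi_t + \tfrac{1}{2}|\nabla\phi_t|^2 = 0$, and taking the gradient (using symmetry of $D^2\phi_t$ and the identity $(\nabla\phi \cdot \nabla)\nabla\phi = \nabla(\tfrac{1}{2}|\nabla\phi|^2)$ for $\nabla\phi$ irrotational), so $$ \partial_t \nabla\phi_t + (\nabla\phi_t \cdot \nabla)\nabla\phi_t = 0, $$ which is the pressureless Euler equation. The continuity equation is the constraint. Geodesics are precisely the solutions of this coupled system with endpoint conditions $\rho_0 = \mu_0$, $\rho_1 = \mu_1$, and $\nabla\phi_0(x) = \nabla u(x) - x$ where $u$ is the Brenier potential from $\mu_0$ to $\mu_1$.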

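Corollary (closed-form Gaussian Wasserstein). For Gaussians $\mu_i = \mathcal{N}(m_i, \Sigma_i)$, $i = 0, 1$, on $\mathbb{R}^n$ with $\Sigma_0$ invertible, $$ W_2^2(\mu_0, \mu_1) = |m_1 - m_0|^2 + \mathrm{tr}(\Sigma_0 + \Sigma_1 - 2(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}). $$ The optimal map is affine: $T(x) = m_1 + A(x - m_0)$ with $A = \Sigma_0^{-1/2}(\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\Sigma_0^{-1/2}$, the unique positive-symmetric solution of $A \Sigma_0 A = \Sigma_1$.

Proof. The convex potential $u(x) = \tfrac{1}{2}(x - m_0)^\top A (x - m_0) + m_1 \cdot x$ has gradient $\nabla u(x) = m_1 + A(x - m_0) = T(x)$. Push-forward: $T_{\#}\mathcal{N}(m_0, \Sigma_0) = \mathcal{N}(m_1, A\Sigma_0 A) = \mathcal{N}(m_1, \Sigma_1)$. By Brenier's theorem $T$ is optimal. The cost is computed directly: $\int |T(x) - x|^2 \, d\mu_0(x) = |m_1 - m_0|^2 + \mathrm{tr}\big((A - I)\Sigma_0 (A - I)\big)$, which simplifies using $A\Sigma_0 A = \Sigma_1$ and $\mathrm{tr}(A\Sigma_0) = \mathrm{tr}\big((\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2})^{1/2}\big)$ to the stated formula.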
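The closed form is easy to evaluate numerically. The sketch below is restricted to dimension 2 so that it needs no linear-algebra library, and it swaps in a shortcut that is specific to that dimension: for a $2 \times 2$ positive semidefinite matrix $M$, $\mathrm{tr}(M^{1/2}) = \sqrt{\mathrm{tr}\,M + 2\sqrt{\det M}}$, and $\Sigma_0^{1/2}\Sigma_1\Sigma_0^{1/2}$ has the same trace and determinant as $\Sigma_0\Sigma_1$. The function name and the sample parameters are hypothetical.

```python
import math

def gaussian_w2_squared_2d(m0, cov0, m1, cov1):
    """Squared W2 distance between N(m0, cov0) and N(m1, cov1) on R^2.

    Means are length-2 sequences; covariances are symmetric positive
    semidefinite 2x2 nested lists.
    """
    mean_term = (m1[0] - m0[0]) ** 2 + (m1[1] - m0[1]) ** 2
    trace0 = cov0[0][0] + cov0[1][1]
    trace1 = cov1[0][0] + cov1[1][1]
    # tr(cov0 @ cov1), written out entry by entry.
    trace_prod = (cov0[0][0] * cov1[0][0] + cov0[0][1] * cov1[1][0]
                  + cov0[1][0] * cov1[0][1] + cov0[1][1] * cov1[1][1])
    det0 = cov0[0][0] * cov0[1][1] - cov0[0][1] * cov0[1][0]
    det1 = cov1[0][0] * cov1[1][1] - cov1[0][1] * cov1[1][0]
    # 2D shortcut: tr(sqrt(M)) = sqrt(tr M + 2 sqrt(det M)) for PSD M,
    # applied to M = cov0^{1/2} cov1 cov0^{1/2}; max(...) guards rounding.
    det_prod = max(det0 * det1, 0.0)
    cross_term = math.sqrt(max(trace_prod + 2.0 * math.sqrt(det_prod), 0.0))
    return mean_term + trace0 + trace1 - 2.0 * cross_term

# Isotropic example: N((0,0), I) versus N((3,4), 4I).
identity = [[1.0, 0.0], [0.0, 1.0]]
four_identity = [[4.0, 0.0], [0.0, 4.0]]
w2_squared = gaussian_w2_squared_2d((0.0, 0.0), identity, (3.0, 4.0), four_identity)
```

For isotropic covariances $s_0^2 I$ and $s_1^2 I$ the formula reduces to $|m_1 - m_0|^2 + 2(s_1 - s_0)^2$, which for the example gives $25 + 2 = 27$.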

Connections Master

  • Euler-Arnold equations 05.09.05. The pressureless Euler equation is the geodesic equation on with the Otto metric, and the identification places optimal transport inside Arnold's 1966 framework of geodesics on Lie groups. The volume-preserving subgroup gives the incompressible Euler equation; the full diffeomorphism group modulo volume-preservation gives the optimal-transport / pressureless picture. Khesin-Misiolek 2003 (Adv. Math. 176) developed the moment-map description connecting the two.

  • Hamiltonian vector field 05.02.05. Otto's calculus realises $\mathcal{P}_2(\mathbb{R}^n)$ as a (formal) Riemannian manifold whose Levi-Civita connection has a Hamiltonian interpretation: the Hamilton-Jacobi equation $\partial_t \phi_t + \tfrac{1}{2}|\nabla\phi_t|^2 = 0$ generates Wasserstein geodesics. The Wasserstein geodesic flow is a Hamiltonian flow on $T^*\mathcal{P}_2(\mathbb{R}^n)$ for the kinetic-energy Hamiltonian.

  • Liouville volume 05.02.07. The connection between optimal transport and Liouville's theorem is via the dual / Eulerian description: along a Wasserstein geodesic, the volume element is conserved by the continuity equation, just as the symplectic volume is conserved by Hamiltonian flow. The Madelung-Bohm transformation connects Wasserstein geometry to the Madelung quantum-hydrodynamic equations and to Liouville volume on phase space.

  • Probability theory rules and distributions 26.02.01. The Wasserstein metric metrizes weak convergence on $\mathcal{P}_2(\mathcal{X})$ supplemented by second-moment convergence, a refinement of the weak topology. It can be compared with total variation in a precise sense (Kantorovich-Rubinstein for $W_1$; for $W_2$ the comparison goes through Pinsker's inequality and entropy). Convergence rates in central limit theorems, large deviations, and concentration inequalities are often naturally expressed in Wasserstein distance.

  • Ideal-fluid hydrodynamics (Arnold-Khesin Ch. I). Arnold-Khesin 2nd-edition new chapter on optimal-transport geometry of integrates the Wasserstein-Otto framework with the topological-fluid-mechanics programme. The pressureless Euler equation on and the incompressible Euler equation on are two faces of a single geodesic-flow picture, with the link supplied by the moment-map / coadjoint-orbit identifications.

  • Information geometry and statistical inference. The Fisher-Rao metric on is a different Riemannian structure on probability distributions, equivalent to the Otto metric only at the level of infinitesimal second-order analysis (both have the same Hessian of Boltzmann entropy at any reference measure). The Otto metric is the unique structure for which the heat equation is gradient flow of entropy; the Fisher-Rao metric is the unique invariant structure under sufficient-statistic transformations. The Madelung-Brenier picture (Carlen-Gangbo 2003) realises the quantum-mechanical Schrödinger equation as Wasserstein gradient flow of the Fisher information , unifying the two metrics within a single PDE.

Historical & philosophical context Master

Gaspard Monge 1781 [Monge 1781] (Histoire de l'Académie Royale des Sciences) posed the original transport problem in the context of military earthworks: given a pile of earth at a construction site and a hole to be filled, find the cheapest assignment of mass from source to destination. Monge proved that the optimal map has the geometric property that transport rays do not cross — anticipating, by 200 years, the convex-potential characterisation of Brenier. The problem was studied sporadically through the nineteenth century by Appell, Hadamard, and others, but did not attract sustained attention until Leonid Kantorovich's 1942 paper [Kantorovich 1942] (Dokl. Akad. Nauk SSSR 37) introduced the linear-programming relaxation: instead of insisting on a deterministic transport map, allow mass to split and look for the optimal coupling. Kantorovich's reformulation made the problem mathematically tractable and earned him the 1975 Nobel Prize in Economics (jointly with T. Koopmans) for applications to resource allocation.

Yann Brenier 1987 (C. R. Acad. Sci. Paris 305) and 1991 [Brenier 1991] (Comm. Pure Appl. Math. 44) revolutionised the theory by proving that the optimal map with quadratic cost is the gradient of a convex function. This polar-factorisation theorem connected optimal transport to the Monge-Ampère equation and to the theory of convex duality, opening the door to regularity theory (Caffarelli 1992, J. Amer. Math. Soc. 5). Robert McCann's 1994 PhD thesis (Princeton, advised by Lieb) and his 1995 [McCann 1995] Duke Math. J. and 1997 [McCann 1997] Adv. Math. papers introduced displacement interpolation and displacement convexity, providing the bridge from static Monge-Kantorovich to dynamic Wasserstein geometry.

Felix Otto's 1998-2001 work [Otto 2001] (Comm. Partial Differential Equations 26) introduced the formal Riemannian calculus on $\mathcal{P}_2(\mathbb{R}^n)$, recognising that the Wasserstein-2 distance defines a geodesic Riemannian structure whose tangent spaces are $L^2(\mu)$-gradients and whose geodesic equation is the pressureless Euler system. Jordan-Kinderlehrer-Otto 1998 [JKO 1998] (SIAM J. Math. Anal. 29) had already shown the heat equation is gradient flow of entropy in this geometry. Cédric Villani's 2003 [Villani 2003] Topics in Optimal Transportation (AMS) and 2008 Optimal Transport: Old and New (Springer) made the modern theory accessible to a broad mathematical audience; Villani received the 2010 Fields Medal partly for this work.

Ambrosio-Gigli-Savaré 2005 [Ambrosio-Gigli-Savaré 2005] (Birkhäuser) provided rigorous analytic foundations for Otto's formal calculus, developing the theory of gradient flows on metric spaces in full generality. Lott-Villani 2009 [Lott-Villani 2009] (Ann. Math. 169) and Sturm 2006 [Sturm 2006] (Acta Math. 196) used Wasserstein geometry to define synthetic Ricci-curvature lower bounds on metric measure spaces, extending Cheeger-Colding theory to a fully axiomatic setting. Otto-Villani 2000 (J. Funct. Anal. 173) and Bakry-Émery 1985 (Sém. Probab. XIX) linked Wasserstein gradient flow to log-Sobolev inequalities and to the Bakry-Émery $\Gamma_2$-criterion. By the 2010s optimal transport had become a unifying language across PDE, probability, geometric analysis, computer science, and machine learning, and the 2nd edition of Arnold-Khesin [Arnold-Khesin] integrated it explicitly into the Euler-Arnold programme.

Bibliography Master

@article{Monge1781,
  author = {Monge, Gaspard},
  title = {M{\'e}moire sur la th{\'e}orie des d{\'e}blais et des remblais},
  journal = {Histoire de l'Acad{\'e}mie Royale des Sciences de Paris},
  year = {1781},
  pages = {666--704},
}

@article{Kantorovich1942,
  author = {Kantorovich, Leonid V.},
  title = {On the translocation of masses},
  journal = {Doklady Akademii Nauk SSSR},
  volume = {37},
  year = {1942},
  pages = {199--201},
  note = {Reprinted in J. Math. Sci. 133 (2006), 1381--1382},
}

@article{Brenier1991,
  author = {Brenier, Yann},
  title = {Polar factorization and monotone rearrangement of vector-valued functions},
  journal = {Communications on Pure and Applied Mathematics},
  volume = {44},
  year = {1991},
  pages = {375--417},
}

@article{Caffarelli1992,
  author = {Caffarelli, Luis A.},
  title = {The regularity of mappings with a convex potential},
  journal = {Journal of the American Mathematical Society},
  volume = {5},
  year = {1992},
  pages = {99--104},
}

@article{McCann1995,
  author = {McCann, Robert J.},
  title = {Existence and uniqueness of monotone measure-preserving maps},
  journal = {Duke Mathematical Journal},
  volume = {80},
  year = {1995},
  pages = {309--323},
}

@article{McCann1997,
  author = {McCann, Robert J.},
  title = {A convexity principle for interacting gases},
  journal = {Advances in Mathematics},
  volume = {128},
  year = {1997},
  pages = {153--179},
}

@article{JKO1998,
  author = {Jordan, Richard and Kinderlehrer, David and Otto, Felix},
  title = {The variational formulation of the {F}okker--{P}lanck equation},
  journal = {SIAM Journal on Mathematical Analysis},
  volume = {29},
  year = {1998},
  pages = {1--17},
}

@article{Otto2001,
  author = {Otto, Felix},
  title = {The geometry of dissipative evolution equations: the porous medium equation},
  journal = {Communications in Partial Differential Equations},
  volume = {26},
  year = {2001},
  pages = {101--174},
}

@article{BenamouBrenier2000,
  author = {Benamou, Jean-David and Brenier, Yann},
  title = {A computational fluid mechanics solution to the {M}onge--{K}antorovich mass transfer problem},
  journal = {Numerische Mathematik},
  volume = {84},
  year = {2000},
  pages = {375--393},
}

@article{OttoVillani2000,
  author = {Otto, Felix and Villani, C{\'e}dric},
  title = {Generalization of an inequality by {T}alagrand and links with the logarithmic {S}obolev inequality},
  journal = {Journal of Functional Analysis},
  volume = {173},
  year = {2000},
  pages = {361--400},
}

@book{Villani2003,
  author = {Villani, C{\'e}dric},
  title = {Topics in Optimal Transportation},
  publisher = {American Mathematical Society},
  series = {Graduate Studies in Mathematics},
  volume = {58},
  year = {2003},
}

@book{Villani2008,
  author = {Villani, C{\'e}dric},
  title = {Optimal Transport: Old and New},
  publisher = {Springer},
  series = {Grundlehren der mathematischen Wissenschaften},
  volume = {338},
  year = {2008},
}

@book{AmbrosioGigliSavare2005,
  author = {Ambrosio, Luigi and Gigli, Nicola and Savar{\'e}, Giuseppe},
  title = {Gradient Flows in Metric Spaces and in the Space of Probability Measures},
  publisher = {Birkh{\"a}user},
  series = {Lectures in Mathematics ETH Z{\"u}rich},
  year = {2005},
}

@article{LottVillani2009,
  author = {Lott, John and Villani, C{\'e}dric},
  title = {{R}icci curvature for metric-measure spaces via optimal transport},
  journal = {Annals of Mathematics},
  volume = {169},
  year = {2009},
  pages = {903--991},
}

@article{Sturm2006,
  author = {Sturm, Karl-Theodor},
  title = {On the geometry of metric measure spaces {I, II}},
  journal = {Acta Mathematica},
  volume = {196},
  year = {2006},
  pages = {65--131; 133--177},
}

@article{CEMS2001,
  author = {Cordero-Erausquin, Dario and McCann, Robert J. and Schmuckenschl{\"a}ger, Michael},
  title = {A {R}iemannian interpolation inequality {\`a} la {B}orell, {B}rascamp and {L}ieb},
  journal = {Inventiones Mathematicae},
  volume = {146},
  year = {2001},
  pages = {219--257},
}

@book{Santambrogio2015,
  author = {Santambrogio, Filippo},
  title = {Optimal Transport for Applied Mathematicians},
  publisher = {Birkh{\"a}user},
  series = {Progress in Nonlinear Differential Equations and Their Applications},
  volume = {87},
  year = {2015},
}

@book{ArnoldKhesin2021,
  author = {Arnold, Vladimir I. and Khesin, Boris A.},
  title = {Topological Methods in Hydrodynamics},
  publisher = {Springer},
  edition = {2nd},
  series = {Applied Mathematical Sciences},
  volume = {125},
  year = {2021},
}

@article{KhesinMisiolek2003,
  author = {Khesin, Boris and Misio{\l}ek, Gerard},
  title = {{E}uler equations on homogeneous spaces and {V}irasoro orbits},
  journal = {Advances in Mathematics},
  volume = {176},
  year = {2003},
  pages = {116--144},
}

@article{Cuturi2013,
  author = {Cuturi, Marco},
  title = {{S}inkhorn distances: lightspeed computation of optimal transport},
  journal = {Advances in Neural Information Processing Systems},
  volume = {26},
  year = {2013},
  pages = {2292--2300},
}