02.13.03 · analysis / pde

Heat Equation, Heat Kernel, and Duhamel's Principle

shipped · 3 tiers · Lean: none

Anchor (Master): Evans §2.3; John §7; Friedman, Partial Differential Equations of Parabolic Type (Prentice-Hall 1964); Lieberman, Second Order Parabolic Differential Equations (World Scientific 1996); Stroock-Varadhan, Multidimensional Diffusion Processes (Springer 1979)

Intuition Beginner

The heat equation is the partial differential equation that describes how temperature evens out over time. Drop a hot coin into a metal block and the block warms near the coin, then warms further away, then warms more and more uniformly until the whole block sits at a single intermediate temperature. The heat equation is the local rule that produces this global behaviour.

At every point in space and at every instant in time, the rate at which the temperature changes equals a positive constant (the thermal diffusivity) times the Laplacian of the temperature field. The Laplacian measures how much a function bows above or below the average of its surroundings; a positive Laplacian means the point is colder than its neighbours, so heat flows in and the temperature rises.

The same equation governs an enormous range of physical processes. Salt diffusing through water, ink spreading in a glass, smoke dissipating in a still room, oxygen diffusing across a cell membrane, voltage smoothing across a passive electrical cable, even the probability density of a Brownian particle: all of these obey a heat-equation-style law. The unifying picture is diffusion, the gradual smoothing of a quantity that is locally conserved but whose flux moves from high concentration toward low concentration in proportion to the local gradient. The heat equation is the canonical example of a parabolic PDE, the class of equations describing irreversible time-evolution toward equilibrium.

Two qualitative properties make the heat equation stand apart from the wave equation and the Laplace equation. First, it smooths. If the initial temperature distribution has a jagged spike, even an honest discontinuity, the temperature at any later positive time is infinitely smooth in space. Diffusion is a relentless smoothing operator: high-frequency fluctuations decay first and fastest, low-frequency fluctuations decay last. After a millisecond the spike has rounded into a bump; after a second the bump has spread into a gentle hump; after a minute the hump is barely a ripple above the background.

Second, the heat equation has infinite propagation speed. A point heated at one location instantly raises the temperature, by an exponentially small amount, at every other point in the universe. This is mathematical fact and physical fiction; the heat equation is an excellent model on intermediate scales but breaks down at the relativistic limit, where finite-speed corrections are needed. Engineers and physicists use the heat equation knowing it is an idealisation; the breakdown matters only when the heated region is so small that mean-free-path effects (and ultimately the speed of light) intrude.

A third property, beyond the two qualitative ones above, is the one that converts the heat equation into a powerful computational tool: a single explicit function does all the work. The heat kernel is the function that records the temperature distribution that develops from a unit of heat dropped at a single point at a single instant. It is a Gaussian bump centred at the source location, broader and shorter as time advances, whose total area is conserved (this is conservation of energy: the total heat stays fixed, only its spatial distribution changes).

Once we know the heat kernel, we can solve the heat equation for any initial distribution by adding up the contributions of each initial point, weighted by the heat kernel that has spread out from that point over the elapsed time. The recipe is the same convolution idea that solved the Poisson equation, now adapted to the time-dependent setting.

The non-homogeneous version of the equation, with a heat source that varies in space and time, is solved by Duhamel's principle. The idea is simple and physical: a continuously acting heat source is equivalent to an infinite stream of instantaneous sources fired one after another. Each instantaneous source produces a heat-kernel-shaped contribution that subsequently spreads and decays; the total temperature is the integral over time of these contributions. Duhamel's principle is the parabolic version of variation-of-parameters from ordinary differential equations, the universal trick for converting a homogeneous solution operator into a non-homogeneous one.

The takeaway in a single sentence: the heat equation describes diffusion, the heat kernel is its fundamental solution, convolution with the heat kernel solves the initial-value problem, and Duhamel's principle solves the inhomogeneous problem.

Visual Beginner

Picture a long thin metal rod heated at one point at one instant. Just after the heat is delivered, the temperature is a tall narrow spike at the source. A moment later the spike has melted into a slightly broader and slightly shorter bump. A bit later the bump is broader and shorter still. The shape stays Gaussian throughout, a smooth bell curve, but the width grows in proportion to the square root of the elapsed time and the height shrinks correspondingly so that the area under the curve stays fixed at one unit of heat.

This single picture, the spreading Gaussian, is the heat kernel. It is the response of the rod to a unit of heat dropped at one point at one instant. Every other heat-equation solution on the rod is built from this one picture by superposition: stack up shifted and rescaled copies of the heat kernel, one for each initial point, weighted by the initial temperature there, and the sum is the solution.

The same picture extends to two or three dimensions. The heat kernel in three-dimensional space is a three-dimensional Gaussian, a fuzzy ball of concentrated temperature that spreads radially outward over time. The width of the ball grows as the square root of time, and the peak height drops in proportion so that the total heat is conserved. Adding up copies of the three-dimensional heat kernel, weighted by initial temperatures, gives the temperature field for any initial heat distribution in three dimensions.

The visual contrast with the wave equation is sharp. A wave on a string preserves the shape of an initial pulse, moving it bodily without distortion (in one space dimension), or at most spreading it in a controlled wavefront with sharp edges (in three space dimensions). The heat equation has no such structure: every initial pulse, no matter its shape, is perfectly smooth at any positive time, and a localised pulse looks more and more like a single Gaussian-shaped blob as time goes on. The wave equation conserves the initial regularity; the heat equation destroys the initial roughness, replacing it with perfect smoothness.

Worked example Beginner

We solve the heat equation on an infinite metal bar with a simple piecewise-constant initial temperature. The setup: at time zero, the temperature is equal to one degree on the segment from minus one to one, and zero elsewhere. The bar is infinite in both directions. We compute the temperature at the origin at time one (in units where the diffusion constant is one).

Step 1. Set up the heat equation. The temperature $u (x, t)$ satisfies $u_{t} = u_{xx}$ for every $x$ on the real line and every $t > 0$ , with initial condition $u (x, 0) = 1$ for $∣ x ∣ \leq 1$ and $u (x, 0) = 0$ for $∣ x ∣ > 1$ .

Step 2. The heat kernel in one space dimension is the Gaussian $Φ (x, t) = \frac{1}{\sqrt{4 π t}} e^{- x^{2} / (4 t)} .$ The solution of the initial-value problem at the point $x$ and time $t$ is obtained by adding up (in the continuous-summation sense) the contributions to the temperature at $x$ from each source point $y$ . The contribution from source point $y$ is the value $Φ (x - y, t)$ of the heat kernel, weighted by the initial temperature $u (y, 0)$ at that source point. We write this continuous sum in the standard convolution shorthand $u (\cdot, t) = Φ (\cdot, t) * u (\cdot, 0)$ .

Step 3. Use the initial condition. Only source points with $∣ y ∣ \leq 1$ contribute (the others have zero initial temperature). So $u (x, t)$ equals the continuous-sum of $Φ (x - y, t)$ as $y$ varies over the interval $- 1 \leq y \leq 1$ . In standard symbols this is the area under the curve $y \mapsto Φ (x - y, t)$ between $y = - 1$ and $y = 1$ .

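Step 4. Specialise to $x = 0$ and $t = 1$ , where the kernel is $\frac{1}{\sqrt{4 π}} e^{- y^{2} /4}$ , and substitute $z = y /2$ (so $d y = 2 \, d z$ , and the factor $2$ combines with $\frac{1}{\sqrt{4 π}} = \frac{1}{2 \sqrt{π}}$ to leave $\frac{1}{\sqrt{π}}$ ). The target becomes the area under $\frac{1}{\sqrt{π}} e^{- z^{2}}$ between $z = - 1/2$ and $z = 1/2$ .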

Step 5. Recognise the error function. The standard error function $\operatorname{erf} (a)$ is defined as twice the area under $\frac{1}{\sqrt{π}} e^{- z^{2}}$ between $z = 0$ and $z = a$ . Our target is the area between $z = - 1/2$ and $z = 1/2$ , which by symmetry equals $\operatorname{erf} (1/2)$ exactly. So $u (0, 1) = \operatorname{erf} (1/2) .$ Numerically $\operatorname{erf} (1/2) \approx 0.5205$ .

Step 6. Interpret. At time zero the temperature at the origin was one degree. At time one, the temperature has dropped to about $0.52$ degrees as heat has spread out beyond the initial segment. The bar to the left and to the right of the origin has warmed slightly above zero, drawing heat away from the central region. As $t \to \infty$ , $u (x, t) \to 0$ for every fixed $x$ , since the total amount of heat is finite (equal to $2$ degree-units) and it spreads over an infinite bar.

What this tells us: the heat kernel converts an initial-data area-computation into an error-function value. The same recipe handles every reasonable initial condition: convolve with the Gaussian, evaluate either symbolically (often in terms of error functions) or numerically (a single Gaussian area). The error function and its close relatives are the workhorses of explicit heat-equation calculations.
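The value in Step 5 can be cross-checked numerically. The following is a minimal sketch, not part of the original worked example: it approximates the convolution integral with a midpoint rule and compares it with the standard-library `math.erf`. The function names `heat_kernel_1d` and `box_solution` and the grid size are illustrative choices.

```python
import math

def heat_kernel_1d(x, t):
    """One-dimensional heat kernel with diffusion constant 1, for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def box_solution(x, t, half_width=1.0, n=20_000):
    """Midpoint-rule approximation of the integral of Phi(x - y, t)
    over the segment |y| <= half_width, where the initial temperature is 1."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += heat_kernel_1d(x - y, t)
    return total * h

u01 = box_solution(0.0, 1.0)
print(u01, math.erf(0.5))  # the two numbers should agree to many digits
```

Changing the arguments of `box_solution` gives the temperature at other points and times, which is one way to watch the central value decay towards zero.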

Check your understanding Beginner

Exercise (easy, multiple choice).

Which equation is the heat equation in one space dimension, with $u (x, t)$ the temperature, with diffusion constant set to one?

A. $u_{xx} = 0$
B. $u_{t} = u_{xx}$
C. $u_{tt} = u_{xx}$
D. $u_{t} = u_{x}$

Hint

The heat equation relates the rate of change in time of the temperature to the second spatial derivative of the temperature.

Answer

B. $u_{t} = u_{xx}$ . The heat equation in one space dimension sets the time derivative of the temperature equal to the second spatial derivative (with the diffusion constant absorbed into the units). Feedback-correct: this is the canonical parabolic equation. Feedback-wrong: A is the steady-state Laplace equation (no time dependence); C is the wave equation (second time derivative on the left); D is a first-order transport equation whose solutions $u (x, t) = F (x + t)$ move a signal to the left at unit speed without diffusion.

Formal definition Intermediate+

Let $U \subseteq R^{n}$ be an open set and let $T > 0$ . Write $U_{T} = U \times (0, T]$ for the parabolic cylinder. The heat equation is the second-order linear PDE $u_{t} - Δ u = 0 \quad \text{on } U_{T},$ where $u_{t} = \partial u / \partial t$ and $Δ u = \sum_{i = 1}^{n} \partial^{2} u / \partial x_{i}^{2}$ is the spatial Laplacian. The inhomogeneous heat equation has a forcing term $f : U_{T} \to R$ : $u_{t} - Δ u = f \quad \text{on } U_{T} .$ A classical solution is a function $u \in C^{2, 1} (U_{T})$ (twice continuously differentiable in $x$ , once in $t$ ) satisfying the equation pointwise ^{[Evans 2010 §2.3]}.

Cauchy problem. The Cauchy problem (initial-value problem) on the whole space $R^{n}$ asks for $u : R^{n} \times [0, T] \to R$ satisfying $\begin{cases} u_{t} - Δ u = 0 & \text{on } R^{n} \times (0, T], \\ u (\cdot, 0) = g & \text{on } R^{n}, \end{cases}$ where $g : R^{n} \to R$ is the initial datum. The inhomogeneous Cauchy problem replaces the right side of the PDE by a given source $f (x, t)$ and asks for the same initial condition $u (\cdot, 0) = g$ .

Definition (heat kernel). The heat kernel on $R^{n}$ is the function $Φ (x, t) = \begin{cases} \frac{1}{(4 π t)^{n /2}} \exp \left( - \frac{∣ x ∣^{2}}{4 t} \right), & t > 0, \\ 0, & t \leq 0, \end{cases}$ defined for $x \in R^{n}$ and $t \in R$ . For each fixed $t > 0$ , $Φ (\cdot, t)$ is a Gaussian probability density on $R^{n}$ with mean zero and covariance matrix $2 t I_{n}$ .

Properties of the heat kernel. Direct calculation shows:

$Φ \in C^{\infty} (R^{n} \times (0, \infty))$ .
$Φ_{t} - Δ_{x} Φ = 0$ on $R^{n} \times (0, \infty)$ . That is, $Φ$ itself solves the heat equation, away from $t = 0$ .
$\int_{R^{n}} Φ (x, t) d x = 1$ for every $t > 0$ (total heat is conserved).
As $t \to 0^{+}$ , $Φ (\cdot, t) \to δ_{0}$ in the sense of distributions: for every $φ \in C_{b} (R^{n})$ (continuous bounded), $\int Φ (x, t) φ (x) d x \to φ (0)$ .
Semigroup property: $Φ (\cdot, t + s) = Φ (\cdot, t) * Φ (\cdot, s)$ for $t, s > 0$ , where $*$ denotes spatial convolution.
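Two of the listed properties are easy to sanity-check numerically in one space dimension. The sketch below is illustrative only: the helper names `phi`, `total_mass` and `pde_residual`, the quadrature window and the step sizes are assumptions of this example. It estimates the total mass by a midpoint rule and the residual $Φ_{t} - Φ_{xx}$ by central differences.

```python
import math

def phi(x, t):
    """Heat kernel on the real line (n = 1); zero for t <= 0, as in the definition."""
    if t <= 0.0:
        return 0.0
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def total_mass(t, half_width=40.0, n=40_000):
    """Midpoint-rule approximation of the integral of phi(., t) over [-half_width, half_width]."""
    h = 2.0 * half_width / n
    return h * sum(phi(-half_width + (i + 0.5) * h, t) for i in range(n))

def pde_residual(x, t, h=1e-3):
    """Central-difference estimate of phi_t - phi_xx at the point (x, t)."""
    phi_t = (phi(x, t + h) - phi(x, t - h)) / (2.0 * h)
    phi_xx = (phi(x + h, t) - 2.0 * phi(x, t) + phi(x - h, t)) / (h * h)
    return phi_t - phi_xx

masses = [total_mass(t) for t in (0.1, 1.0, 5.0)]
residuals = [abs(pde_residual(x, t)) for x in (-1.0, 0.3, 2.0) for t in (0.5, 1.0)]
print(masses)          # each entry should be very close to 1
print(max(residuals))  # should be small, limited by the finite-difference step
```

The residual is not exactly zero because central differences carry an error of order $h^{2}$ ; shrinking `h` moderately should shrink it accordingly.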

The heat kernel is the fundamental solution of the heat operator $\partial_{t} - Δ$ , in the sense that it solves the operator equation $(\partial_{t} - Δ) Φ = δ_{0} \otimes δ_{0}$ on $R^{n} \times R$ , where $δ_{0} \otimes δ_{0}$ is the Dirac mass at the space-time origin.

Solution formula (homogeneous Cauchy problem). For $g \in C_{b} (R^{n})$ (continuous bounded), the function $u (x, t) = \int_{R^{n}} Φ (x - y, t) g (y) d y = (Φ (\cdot, t) * g) (x), x \in R^{n}, t > 0,$ is $C^{\infty}$ on $R^{n} \times (0, \infty)$ , solves the heat equation $u_{t} - Δ u = 0$ on $R^{n} \times (0, \infty)$ , and satisfies $u (x, t) \to g (x_{0})$ as $(x, t) \to (x_{0}, 0)$ with $t > 0$ for every continuity point $x_{0}$ of $g$ .

Definition (Duhamel formula). For $g \in C_{b} (R^{n})$ and $f \in C (R^{n} \times [0, T])$ with $f$ bounded and suitably regular, the Duhamel formula gives the solution of the inhomogeneous Cauchy problem $u_{t} - Δ u = f$ with $u (\cdot, 0) = g$ as $u (x, t) = \int_{R^{n}} Φ (x - y, t) g (y) d y + \int_{0}^{t} \int_{R^{n}} Φ (x - y, t - s) f (y, s) d y d s .$ The first integral is the homogeneous solution from the initial data; the second is the time-integrated convolution against the source.
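As an illustration of the Duhamel formula, consider the hypothetical test case $n = 1$ , $g = 0$ , $f (x, t) = \cos x$ (chosen for this sketch, not taken from the text). Since $Φ (\cdot, τ) * \cos = e^{- τ} \cos$ , the formula predicts $u (x, t) = (1 - e^{- t}) \cos x$ , which can be verified directly against $u_{t} - u_{xx} = \cos x$ . The code below evaluates the double integral numerically; the substitution $y = x + 2 \sqrt{τ} \, w$ is a choice of this example that keeps the quadrature accurate when $τ = t - s$ is small.

```python
import math

def heat_average(func, x, tau, n=400, w_max=8.0):
    """Approximate (Phi(., tau) * func)(x) after substituting y = x + 2*sqrt(tau)*w,
    which turns the kernel into the fixed weight exp(-w^2) / sqrt(pi)."""
    h = 2.0 * w_max / n
    scale = 2.0 * math.sqrt(tau)
    total = 0.0
    for i in range(n):
        w = -w_max + (i + 0.5) * h
        total += math.exp(-w * w) * func(x + scale * w)
    return total * h / math.sqrt(math.pi)

def duhamel(source, x, t, n_time=200):
    """Midpoint rule in s for the Duhamel integral with zero initial data."""
    ds = t / n_time
    total = 0.0
    for j in range(n_time):
        s = (j + 0.5) * ds
        total += heat_average(lambda y: source(y, s), x, t - s)
    return total * ds

def source(y, s):
    """Hypothetical time-independent source f(y, s) = cos(y)."""
    return math.cos(y)

x, t = 0.7, 1.5
numeric = duhamel(source, x, t)
exact = (1.0 - math.exp(-t)) * math.cos(x)
print(numeric, exact)  # the two values should agree closely
```

The same `duhamel` helper accepts any bounded source with the signature `source(y, s)`, although the quadrature parameters may need tuning for rougher sources.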

Counterexamples to common slips Intermediate+

Bounded continuous initial data is the natural class. The convolution formula extends to $g \in L^{p} (R^{n})$ for $1 \leq p \leq \infty$ , but the behaviour as $t \to 0^{+}$ degrades: for $g \in L^{p}$ with $1 \leq p < \infty$ one recovers $u (\cdot, t) \to g$ in $L^{p}$ -norm rather than pointwise everywhere. For $g$ merely measurable and locally integrable without a growth bound, the convolution can diverge; the standard growth restriction $∣ g (x) ∣ \leq C e^{a ∣ x ∣^{2}}$ for some $a < 1/ (4 T)$ guarantees the integral converges for $t \in (0, T)$ .
The heat equation cannot be solved backwards in time. Given a temperature distribution at time $T > 0$ , attempting to reconstruct the temperature at $t = 0$ is an ill-posed problem in the sense of Hadamard: the solution does not exist for arbitrary final data (only for final data in the image of the forward semigroup), and even when it exists it depends discontinuously on the data. The phenomenon reflects the smoothing property of the forward equation: forward evolution irreversibly destroys high-frequency information.
Uniqueness requires a growth condition. Tikhonov's 1935 counterexample exhibits a non-zero $C^{\infty}$ function on $R \times (- \infty, \infty)$ that solves the heat equation everywhere, vanishes identically at $t = 0$ , and grows super-exponentially in $x$ for $t > 0$ . Without a growth condition on the solution, the Cauchy problem has infinitely many solutions; uniqueness in the class $∣ u (x, t) ∣ \leq C e^{a ∣ x ∣^{2}}$ for some $a > 0$ holds (Widder 1944).
Infinite propagation speed is mathematical, not physical. The heat kernel is strictly positive everywhere for every $t > 0$ : a unit of heat placed at the origin instantly raises the temperature, by an exponentially small amount, at every point in space. This is a feature of the linear parabolic model; relativistic generalisations (Cattaneo's equation, telegrapher's equation) modify the heat equation to enforce finite propagation speed. The standard heat equation remains the workhorse model because its analytic tractability outweighs the unphysical-speed flaw on every length scale relevant to ordinary physics.
The maximum principle holds on bounded domains, not the whole space. On a bounded parabolic cylinder $\bar{U}_{T} = \bar{U} \times [0, T]$ , the maximum of a $C^{2, 1}$ subsolution (and, for a solution, the minimum as well) is attained on the parabolic boundary $Γ_{T} = \bar{U}_{T} \setminus U_{T} = (\bar{U} \times \{0\}) \cup (\partial U \times [0, T])$ . The top face $\bar{U} \times \{T\}$ is not part of $Γ_{T}$ : the weak principle says the maximum value is already attained on $Γ_{T}$ , and the strong version says that, for connected $U$ , a subsolution attaining its maximum at a point of $U_{T}$ (top face included) is constant up to that time. On the unbounded whole space, the heat kernel itself is a counterexample to a naive maximum principle: $Φ (0, t) \to \infty$ as $t \to 0^{+}$ even though $Φ$ vanishes everywhere at $t = 0$ .

Key theorem with proof Intermediate+

Theorem (heat kernel solves the Cauchy problem). Let $g \in C_{b} (R^{n})$ (continuous bounded). Define $u : R^{n} \times (0, \infty) \to R$ by $u (x, t) = \int_{R^{n}} Φ (x - y, t) g (y) d y .$ Then $u \in C^{\infty} (R^{n} \times (0, \infty))$ , $u_{t} - Δ u = 0$ on $R^{n} \times (0, \infty)$ , $\sup_{R^{n} \times (0, \infty)} ∣ u ∣ \leq \sup_{R^{n}} ∣ g ∣$ , and $\lim_{(x, t) \to (x_{0}, 0^{+})} u (x, t) = g (x_{0}) \quad \text{for every } x_{0} \in R^{n}$ ^{[Evans 2010 §2.3, Theorem 1]}.

Proof. Throughout, fix $g \in C_{b} (R^{n})$ with $M = \sup ∣ g ∣ < \infty$ . The proof has four steps.

Step 1 (smoothness). Fix any compact $K \subset R^{n} \times (0, \infty)$ , so $K \subseteq \{(x, t) : ∣ x ∣ \leq R, \ 1/ R \leq t \leq R\}$ for some $R > 0$ . On $K$ , $Φ$ and all its $x$ - and $t$ -derivatives are bounded by computable Gaussian-decay expressions in $∣ x - y ∣$ that are integrable in $y$ uniformly over $K$ . For instance, $∣ \partial_{t} Φ (x - y, t) ∣ \leq C (R) (1 + ∣ x - y ∣^{2}) e^{- ∣ x - y ∣^{2} / (8 R)}$ on $K$ , which is integrable in $y$ and bounded uniformly over $K$ . Differentiation under the integral sign is justified by the dominated convergence theorem 02.07.04 applied to the difference quotients, giving $\partial_{t} u (x, t) = \int_{R^{n}} \partial_{t} Φ (x - y, t) g (y) d y, \quad \partial_{x_{i}} u (x, t) = \int_{R^{n}} \partial_{x_{i}} Φ (x - y, t) g (y) d y,$ and similarly for all higher derivatives. Repeating gives $u \in C^{\infty} (R^{n} \times (0, \infty))$ .

Step 2 (heat equation). From the explicit formula for $Φ$ , direct calculation gives $Φ_{t} = Δ_{x} Φ$ on $R^{n} \times (0, \infty)$ . (Compute: $\log Φ = - \frac{n}{2} \log (4 π t) - \frac{∣ x ∣^{2}}{4 t}$ , so $Φ_{t} /Φ = - n / (2 t) + ∣ x ∣^{2} / (4 t^{2})$ and $\partial_{x_{i}} \log Φ = - x_{i} / (2 t)$ , $\partial_{x_{i}}^{2} \log Φ = - 1/ (2 t)$ ; since $Δ_{x} Φ/Φ = Δ_{x} \log Φ + ∣ \nabla_{x} \log Φ ∣^{2}$ , this gives $Δ_{x} Φ/Φ = - n / (2 t) + ∣ x ∣^{2} / (4 t^{2}) = Φ_{t} /Φ$ .) By the differentiation-under-the-integral identity from Step 1: $u_{t} (x, t) - Δ u (x, t) = \int_{R^{n}} (Φ_{t} (x - y, t) - Δ_{x} Φ (x - y, t)) g (y) d y = \int_{R^{n}} 0 \cdot g (y) d y = 0.$ So $u_{t} - Δ u = 0$ on $R^{n} \times (0, \infty)$ .

Step 3 (sup bound). For every $t > 0$ and $x \in R^{n}$ : $∣ u (x, t) ∣ \leq \int_{R^{n}} Φ (x - y, t) ∣ g (y) ∣ d y \leq M \int_{R^{n}} Φ (x - y, t) d y = M \cdot 1 = M,$ using positivity of $Φ$ and the unit-mass property $\int Φ (\cdot, t) = 1$ .

Step 4 (initial condition). Fix $x_{0} \in R^{n}$ and $ε > 0$ . By continuity of $g$ , choose $δ > 0$ so that $∣ g (y) - g (x_{0}) ∣ < ε$ whenever $∣ y - x_{0} ∣ < δ$ . Compute, for $∣ x - x_{0} ∣ < δ /2$ and $t > 0$ : $u (x, t) - g (x_{0}) = \int_{R^{n}} Φ (x - y, t) (g (y) - g (x_{0})) d y,$ using $\int Φ (\cdot, t) = 1$ to absorb the constant $g (x_{0})$ into the kernel-integral. Split the integration domain into $A = B_{δ} (x_{0})$ and $A^{c} = R^{n} \setminus A$ : $∣ u (x, t) - g (x_{0}) ∣ \leq \underbrace{\int_{A} Φ (x - y, t) ∣ g (y) - g (x_{0}) ∣ d y}_{I} + \underbrace{\int_{A^{c}} Φ (x - y, t) ∣ g (y) - g (x_{0}) ∣ d y}_{I I} .$

For $I$ : $∣ y - x_{0} ∣ < δ$ so $∣ g (y) - g (x_{0}) ∣ < ε$ , and $I \leq ε \int_{A} Φ (x - y, t) d y \leq ε$ .

For $I I$ : $∣ g (y) - g (x_{0}) ∣ \leq 2 M$ . For $y \in A^{c}$ and $∣ x - x_{0} ∣ < δ /2$ , $∣ y - x ∣ \geq ∣ y - x_{0} ∣ - ∣ x_{0} - x ∣ \geq δ - δ /2 = δ /2$ , so $∣ x - y ∣ \geq δ /2$ . Then $I I \leq 2 M \int_{∣ y - x ∣ \geq δ /2} Φ (x - y, t) d y = 2 M \int_{∣ z ∣ \geq δ /2} Φ (z, t) d z .$ The Gaussian tail integral satisfies $\int_{∣ z ∣ \geq r} Φ (z, t) d z \to 0$ as $t \to 0^{+}$ for every fixed $r > 0$ . (Concretely, the change of variables $z = 2 \sqrt{t} \, w$ gives $\int_{∣ z ∣ \geq r} (4 π t)^{- n /2} e^{- ∣ z ∣^{2} / (4 t)} d z = π^{- n /2} \int_{∣ w ∣ \geq r / (2 \sqrt{t})} e^{- ∣ w ∣^{2}} d w \leq 2^{n /2} e^{- r^{2} / (8 t)}$ , using $e^{- ∣ w ∣^{2}} \leq e^{- r^{2} / (8 t)} e^{- ∣ w ∣^{2} /2}$ on the region of integration.) Choose $t_{0} > 0$ small enough that $I I < ε$ for $t < t_{0}$ .

Combining: for $∣ x - x_{0} ∣ < δ /2$ and $0 < t < t_{0}$ , $∣ u (x, t) - g (x_{0}) ∣ < 2 ε$ . Since $ε$ was arbitrary, $u (x, t) \to g (x_{0})$ as $(x, t) \to (x_{0}, 0^{+})$ . $□$
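The decay of the Gaussian tail used in Step 4 can be made concrete in one dimension, where the tail mass is exactly $\operatorname{erfc} (r / (2 \sqrt{t}))$ . The sketch below compares it with the explicit bound $2^{n /2} e^{- r^{2} / (8 t)}$ ; this particular bound is one valid choice obtained from the substitution $z = 2 \sqrt{t} \, w$ and is an assumption of this example, as are the helper names `tail_mass` and `tail_bound`.

```python
import math

def tail_mass(r, t):
    """Exact mass of the 1D heat kernel outside (-r, r): erfc(r / (2*sqrt(t)))."""
    return math.erfc(r / (2.0 * math.sqrt(t)))

def tail_bound(r, t, n=1):
    """Explicit bound 2^(n/2) * exp(-r^2 / (8t)) on the tail mass in dimension n."""
    return 2.0 ** (n / 2.0) * math.exp(-r * r / (8.0 * t))

r = 0.5
rows = [(t, tail_mass(r, t), tail_bound(r, t)) for t in (1.0, 0.1, 0.01, 0.001)]
for t, mass, bound in rows:
    print(f"t={t:<6} tail={mass:.3e} bound={bound:.3e}")
```

Both columns collapse towards zero as $t \to 0^{+}$ with $r$ fixed, which is exactly what makes the term $I I$ in the proof small.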

Bridge. The same convolution recipe that worked for the Poisson equation 02.13.02 works again for the heat equation, with the parabolic heat kernel $Φ (x, t)$ replacing the elliptic Newtonian-potential kernel $Φ (x)$ . The added ingredient is time: $Φ (x, t)$ is no longer a fixed kernel but a family of kernels parametrised by $t$ , each of which is a smoothing operator whose smoothing strength grows with $t$ . Duhamel's principle below packages the time-dependent inhomogeneous problem into the same convolution-against-fundamental-solution form. The pattern recurs in 02.13.04 the wave equation via retarded Green functions (the Kirchhoff and d'Alembert formulae), in 02.13.07 separation of variables and Fourier-series expansions on bounded domains, and in the modern Brownian-motion picture (the heat kernel is the transition density of $n$ -dimensional Brownian motion, the bridge to stochastic analysis and Feynman-Kac formulas).

Exercises Intermediate+

Exercise 4 (medium, symbolic).

Prove the semigroup property of the heat kernel: $Φ (x, t + s) = (Φ (\cdot, t) * Φ (\cdot, s)) (x)$ for $x \in R^{n}$ and $t, s > 0$ .

Hint

Use the Fourier transform. The Fourier transform of the Gaussian $Φ (\cdot, t)$ is $e^{- t ∣ ξ ∣^{2}}$ . Multiplication of Fourier transforms corresponds to convolution in real space.

Answer

Take the spatial Fourier transform of $Φ (\cdot, t)$ (with convention $\hat{f} (ξ) = \int f (x) e^{- i x \cdot ξ} d x$ ): direct computation gives $\widehat{Φ (\cdot, t)} (ξ) = e^{- t ∣ ξ ∣^{2}}$ (the Fourier transform of a Gaussian is a Gaussian, with the dual width).

Convolution in real space becomes multiplication in Fourier space: $\widehat{Φ (\cdot, t) * Φ (\cdot, s)} (ξ) = \widehat{Φ (\cdot, t)} (ξ) \cdot \widehat{Φ (\cdot, s)} (ξ) = e^{- t ∣ ξ ∣^{2}} \cdot e^{- s ∣ ξ ∣^{2}} = e^{- (t + s) ∣ ξ ∣^{2}} = \widehat{Φ (\cdot, t + s)} (ξ) .$ By Fourier inversion (legitimate on Schwartz space or $L^{2}$ ), $Φ (\cdot, t) * Φ (\cdot, s) = Φ (\cdot, t + s)$ as functions on $R^{n}$ .

The semigroup law encodes the physical statement: diffusing for time $t$ and then for additional time $s$ is the same as diffusing for total time $t + s$ . The operator family $S_{t} f = Φ (\cdot, t) * f$ is a strongly continuous semigroup on $L^{p} (R^{n})$ for $1 \leq p < \infty$ , the heat semigroup, with infinitesimal generator $Δ$ .
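The semigroup identity can also be checked numerically in one dimension. This is a rough sketch rather than a proof: it approximates the convolution $Φ (\cdot, t) * Φ (\cdot, s)$ by a midpoint rule and compares it with $Φ (\cdot, t + s)$ at a few sample points. The names `phi` and `convolve_kernels` and the sample points are illustrative assumptions.

```python
import math

def phi(x, t):
    """One-dimensional heat kernel for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def convolve_kernels(x, t, s, half_width=30.0, n=12_000):
    """Midpoint-rule approximation of (phi(., t) * phi(., s))(x)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        total += phi(x - y, t) * phi(y, s)
    return total * h

checks = [(x, t, s) for x in (0.0, 0.8, -2.5) for (t, s) in ((0.3, 0.7), (1.0, 2.0))]
errors = [abs(convolve_kernels(x, t, s) - phi(x, t + s)) for x, t, s in checks]
print(max(errors))  # should be close to rounding error
```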

Exercise 5 (medium, numeric).

Solve the heat equation on the real line with initial data $g (x) = e^{- x^{2}}$ and compute $u (0, 1)$ . (Hint: Gaussian convolution stays Gaussian.)

Hint

The convolution of two Gaussians is a Gaussian whose variance is the sum of the variances. Initial Gaussian has variance $1/2$ (since $e^{- x^{2}}$ corresponds to a Gaussian with $σ^{2} = 1/2$ ); the heat kernel at time $t = 1$ has variance $2 t = 2$ .

Answer

$u (0, 1) = 1/ \sqrt{5}$ . The initial datum $g (x) = e^{- x^{2}}$ is a Gaussian: writing $g (x) = e^{- x^{2} / (2 σ_{0}^{2})}$ requires $σ_{0}^{2} = 1/2$ . The heat kernel at $t = 1$ has variance $σ_{t}^{2} = 2 t = 2$ .

The convolution of two centred Gaussians, with variances $σ_{0}^{2}$ and $σ_{t}^{2}$ , is a centred Gaussian with variance $σ_{0}^{2} + σ_{t}^{2} = 1/2 + 2 = 5/2$ , multiplied by an overall amplitude. More carefully: $u (x, 1) = \int_{- \infty}^{\infty} \frac{1}{\sqrt{4 π}} e^{- (x - y)^{2} /4} e^{- y^{2}} d y .$

Expand the exponent in the integrand: $- (x - y)^{2} /4 - y^{2} = - (x^{2} - 2 x y + y^{2}) /4 - y^{2} = - x^{2} /4 + x y /2 - y^{2} /4 - y^{2} = - x^{2} /4 + x y /2 - 5 y^{2} /4$ . Complete the square in $y$ : $- 5 y^{2} /4 + x y /2 = - \frac{5}{4} (y - x /5)^{2} + x^{2} /20$ . So the exponential factor becomes $e^{- x^{2} /4 + x^{2} /20} e^{- 5 (y - x /5)^{2} /4} = e^{- x^{2} /5} e^{- 5 (y - x /5)^{2} /4}$ .

Integrate in $y$ : $\int e^{- 5 (y - x /5)^{2} /4} d y = \sqrt{4 π /5} = 2 \sqrt{π /5}$ . So $u (x, 1) = \frac{1}{\sqrt{4 π}} \cdot 2 \sqrt{π /5} \cdot e^{- x^{2} /5} = \frac{1}{\sqrt{5}} e^{- x^{2} /5} .$

At $x = 0$ : $u (0, 1) = 1/ \sqrt{5} \approx 0.447$ .
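A quick numerical cross-check of this answer, offered as a sketch: the code integrates the convolution by a midpoint rule and compares it with the closed form $u (x, t) = (1 + 4 t)^{- 1/2} e^{- x^{2} / (1 + 4 t)}$ , which follows from the same variance-addition argument for general $t$ and gives $1/ \sqrt{5} \approx 0.447$ at $x = 0$ , $t = 1$ . The function names and quadrature window are choices made for this example.

```python
import math

def u_gaussian(x, t, half_width=12.0, n=24_000):
    """Midpoint-rule approximation of (Phi(., t) * g)(x) for g(y) = exp(-y^2)."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        y = -half_width + (i + 0.5) * h
        kernel = math.exp(-(x - y) ** 2 / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)
        total += kernel * math.exp(-y * y)
    return total * h

def u_closed_form(x, t):
    """Closed form (1 + 4t)^(-1/2) * exp(-x^2 / (1 + 4t)) for the same initial datum."""
    return math.exp(-x * x / (1.0 + 4.0 * t)) / math.sqrt(1.0 + 4.0 * t)

print(u_gaussian(0.0, 1.0), u_closed_form(0.0, 1.0), 1.0 / math.sqrt(5.0))
```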

Exercise 6 (medium, symbolic).

Use Duhamel's principle to write the solution of the inhomogeneous heat equation $u_{t} - u_{xx} = 1$ on $R \times (0, \infty)$ with $u (\cdot, 0) = 0$ , then evaluate the answer.

Hint

Duhamel formula with $g = 0$ and $f = 1$ : $u (x, t) = \int_{0}^{t} \int_{- \infty}^{\infty} Φ (x - y, t - s) \cdot 1 d y d s$ . The inner integral is the unit-mass property of the heat kernel.

Answer

$u (x, t) = t$ . The inner spatial integral evaluates by the unit-mass property: $\int_{- \infty}^{\infty} Φ (x - y, t - s) d y = 1$ for every $s < t$ . So $u (x, t) = \int_{0}^{t} 1 \cdot 1 d s = t .$ The solution is uniform in space and grows linearly in time. The physical interpretation: a uniform heat source produces a uniform temperature that increases linearly with elapsed time, with no spatial variation because the source has none and the initial condition has none. The answer also matches direct verification: $u_{t} = 1$ , $u_{xx} = 0$ , so $u_{t} - u_{xx} = 1$ and $u (x, 0) = 0$ , both confirmed.

Exercise 7 (hard, symbolic).

Prove the weak maximum principle: if $u \in C^{2, 1} (U_{T}) \cap C (\bar{U}_{T})$ satisfies $u_{t} - Δ u \leq 0$ on $U_{T}$ , where $U \subset R^{n}$ is open and bounded, then $\max_{\bar{U}_{T}} u = \max_{Γ_{T}} u$ , where $Γ_{T} = (\bar{U} \times \{0\}) \cup (\partial U \times [0, T])$ is the parabolic boundary.

Hint

Set $v = u - εt$ for small $ε > 0$ so that $v_{t} - Δ v = u_{t} - Δ u - ε \leq - ε < 0$ . Argue that $v$ cannot attain an interior maximum, then send $ε \to 0$ .

Answer

Fix $ε > 0$ and define $v (x, t) = u (x, t) - εt$ on $\bar{U}_{T}$ . Then $v_{t} - Δ v = u_{t} - Δ u - ε \leq - ε < 0$ on $U_{T}$ .

Suppose for contradiction that $v$ attains its maximum over $\bar{U}_{T}$ at a point $(x_{0}, t_{0}) \in U_{T}$ (so $x_{0} \in U$ and $0 < t_{0} \leq T$ ; the case $t_{0} = T$ is included). At such a point $v_{t} (x_{0}, t_{0}) \geq 0$ : if $t_{0} < T$ the point is an interior maximum in $t$ and $v_{t} = 0$ , while if $t_{0} = T$ the inequality $v (x_{0}, t) \leq v (x_{0}, T)$ for $t < T$ makes the backward difference quotient non-negative. Also $Δ v (x_{0}, t_{0}) \leq 0$ since $x_{0}$ is an interior maximum of $v (\cdot, t_{0})$ in $U$ . So $v_{t} - Δ v \geq 0$ at $(x_{0}, t_{0})$ , contradicting $v_{t} - Δ v < 0$ .

So the maximum of $v$ on $\bar{U}_{T}$ is attained on $Γ_{T}$ . Hence $\max_{\bar{U}_{T}} v = \max_{Γ_{T}} v$ , i.e., $\max_{\bar{U}_{T}} (u - εt) = \max_{Γ_{T}} (u - εt) .$

For any $(x, t) \in \bar{U}_{T}$ : $u (x, t) - εt \leq \max_{Γ_{T}} (u - εt) \leq \max_{Γ_{T}} u$ , so $u (x, t) \leq \max_{Γ_{T}} u + εT$ . Sending $ε \to 0$ : $u (x, t) \leq \max_{Γ_{T}} u$ for every $(x, t) \in \bar{U}_{T}$ , so $\max_{\bar{U}_{T}} u \leq \max_{Γ_{T}} u$ . The reverse inequality is automatic since $Γ_{T} \subset \bar{U}_{T}$ . Equality follows.
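The weak maximum principle has a discrete counterpart that is easy to observe. The sketch below uses the explicit forward-time, centred-space (FTCS) scheme for $u_{t} = u_{xx}$ on $[0, 1]$ ; this scheme, the initial profile and the Dirichlet data are all hypothetical choices made for the illustration. With mesh ratio $λ = Δt / Δx^{2} \leq 1/2$ , each interior update is a convex combination of three values at the previous time level, so interior values can never exceed the maximum over the discrete parabolic boundary.

```python
import math

def ftcs_max_check(n_space=50, n_steps=2000, lam=0.4):
    """Run FTCS for u_t = u_xx on [0, 1] and return
    (max over interior nodes at positive times, max over the parabolic boundary)."""
    dx = 1.0 / n_space
    dt = lam * dx * dx  # lam <= 1/2 keeps each update a convex combination
    # Hypothetical initial profile on the bottom face t = 0.
    u = [math.sin(3.0 * math.pi * i * dx) + i * dx for i in range(n_space + 1)]
    boundary_max = max(u)
    interior_max = -math.inf
    for step in range(1, n_steps + 1):
        t = step * dt
        new = u[:]
        for i in range(1, n_space):
            new[i] = u[i] + lam * (u[i + 1] - 2.0 * u[i] + u[i - 1])
        new[0] = 0.5 * math.sin(5.0 * t)  # hypothetical left Dirichlet data
        new[-1] = 1.0                     # hypothetical right Dirichlet data
        boundary_max = max(boundary_max, new[0], new[-1])
        interior_max = max(interior_max, max(new[1:-1]))
        u = new
    return interior_max, boundary_max

interior, boundary = ftcs_max_check()
print(interior, boundary)  # the interior maximum should not exceed the boundary maximum
```

Raising `lam` above one half breaks the convex-combination structure, and the scheme is then free to overshoot, which mirrors the role of the sign condition in the continuous proof.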

Exercise 8 (hard, symbolic).

Prove uniqueness for the initial-boundary-value problem on a bounded domain by the energy method: if $u_{1}, u_{2} \in C^{2, 1} (U_{T}) \cap C (\bar{U}_{T})$ both solve $u_{t} - Δ u = f$ on $U_{T}$ with the same Dirichlet data on $\partial U \times [0, T]$ and the same initial data on $\bar{U} \times \{0\}$ , then $u_{1} = u_{2}$ on $\bar{U}_{T}$ .

Hint

Set $w = u_{1} - u_{2}$ and compute the time derivative of the energy $E (t) = \int_{U} w (x, t)^{2} d x$ . Use integration by parts and the Dirichlet boundary condition on $w$ (which is zero on $\partial U$ ).

Answer

Let $w = u_{1} - u_{2}$ . Then $w_{t} - Δ w = 0$ on $U_{T}$ , $w = 0$ on $\partial U \times [0, T]$ , and $w (\cdot, 0) = 0$ on $\bar{U}$ .

Define the energy $E (t) = \int_{U} w (x, t)^{2} d x$ . Differentiate under the integral (justified by the smoothness of $w$ and the bounded domain $U$ ): $E^{'} (t) = \int_{U} 2 w w_{t} d x = \int_{U} 2 w Δ w d x,$ using $w_{t} = Δ w$ from the PDE.

Apply integration by parts (Green's first identity): $\int_{U} w Δ w d x = \int_{\partial U} w (\partial_{ν} w) d S - \int_{U} ∣\nabla w ∣^{2} d x$ . The boundary term vanishes because $w = 0$ on $\partial U \times [0, T]$ . So $E^{'} (t) = - 2 \int_{U} ∣\nabla w (x, t) ∣^{2} d x \leq 0.$

The energy is non-increasing in time. Combined with $E (0) = \int_{U} w (x, 0)^{2} d x = 0$ and $E (t) \geq 0$ , this forces $E (t) = 0$ for every $t \in [0, T]$ .

$E (t) = \int_{U} w (x, t)^{2} d x = 0$ together with continuity of $w$ gives $w (x, t) = 0$ for every $(x, t) \in U_{T}$ . Continuity to the boundary extends this to $\bar{U}_{T}$ . So $u_{1} = u_{2}$ on $\bar{U}_{T}$ .

The energy method gives uniqueness without using the maximum principle, and extends naturally to other equations (the wave equation, the Schrödinger equation, the Navier-Stokes equations) where the appropriate energy identity replaces the heat-equation energy decay.
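The energy decay $E^{'} (t) \leq 0$ can be observed in a discrete setting. The following sketch is an illustration under stated assumptions, not a substitute for the proof: it evolves a hypothetical difference $w$ with zero Dirichlet data on $[0, 1]$ using the explicit FTCS scheme with mesh ratio at most one half, and records the discrete energy $\sum_{i} w_{i}^{2} \, Δx$ after every step.

```python
import math

def energy_history(n_space=64, n_steps=400, lam=0.4):
    """FTCS evolution of w_t = w_xx on [0, 1] with w = 0 at both ends;
    returns the list of discrete energies sum(w_i^2) * dx, one per time level."""
    dx = 1.0 / n_space
    # Hypothetical initial difference: two sine modes that vanish at the endpoints.
    w = [math.sin(math.pi * i * dx) + 0.5 * math.sin(7.0 * math.pi * i * dx)
         for i in range(n_space + 1)]
    w[0] = w[-1] = 0.0  # enforce the Dirichlet condition exactly
    energies = [dx * sum(v * v for v in w)]
    for _ in range(n_steps):
        new = w[:]
        for i in range(1, n_space):
            new[i] = w[i] + lam * (w[i + 1] - 2.0 * w[i] + w[i - 1])
        w = new
        energies.append(dx * sum(v * v for v in w))
    return energies

energies = energy_history()
print(energies[0], energies[-1])  # the energy should have decreased
```

If the initial difference is identically zero the recorded energies stay at zero, which is the discrete shadow of the uniqueness statement.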

Advanced results Master

The modern theory of the heat equation organises around six pillars: the fundamental-solution apparatus and Cauchy-problem theory, the maximum-principle framework on bounded domains, uniqueness theorems and their counterexamples (Tikhonov), regularity theory and the Nash-De Giorgi-Moser apparatus, the parabolic Harnack inequality, and the probabilistic interpretation via Brownian motion.

Theorem 1 (heat kernel via Fourier transform; Fourier 1822). The heat kernel $Φ (x, t)$ on $R^{n}$ is the inverse Fourier transform of $e^{- t ∣ ξ ∣^{2}}$ : $Φ (x, t) = \frac{1}{(2 π)^{n}} \int_{R^{n}} e^{i x \cdot ξ} e^{- t ∣ ξ ∣^{2}} d ξ = \frac{1}{(4 π t)^{n /2}} e^{- ∣ x ∣^{2} / (4 t)} \quad \text{for } t > 0$ ^{[Fourier 1822]}. Conversely, Fourier-transforming the Cauchy problem $u_{t} - Δ u = 0$ , $u (\cdot, 0) = g$ in the space variable gives, for each fixed $ξ$ , the linear ODE $\hat{u}_{t} = - ∣ ξ ∣^{2} \hat{u}$ in $t$ with $\hat{u} (ξ, 0) = \hat{g} (ξ)$ , whose solution is $\hat{u} (ξ, t) = \hat{g} (ξ) e^{- t ∣ ξ ∣^{2}}$ . Fourier inversion recovers the convolution formula $u (x, t) = (Φ (\cdot, t) * g) (x)$ . The Fourier-transform derivation was Fourier's original method in his 1822 Théorie analytique de la chaleur: Fourier introduced both the Fourier series (for the heat equation on a bounded interval) and the Fourier integral (for the heat equation on the line), inventing the modern technique of decomposition into eigenmodes of the spatial differential operator.
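The inversion formula in Theorem 1 can be checked numerically for $n = 1$ . The sketch below is a minimal illustration with assumed quadrature parameters and helper names: it integrates $\frac{1}{2 π} \cos (x ξ) e^{- t ξ^{2}}$ over a truncated frequency window (the sine part of $e^{i x ξ}$ cancels by symmetry) and compares the result with the closed-form Gaussian.

```python
import math

def phi(x, t):
    """Closed-form one-dimensional heat kernel for t > 0."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def inverse_fourier_gaussian(x, t, xi_max=40.0, n=40_000):
    """Midpoint-rule value of (1 / (2*pi)) * integral of cos(x*xi) * exp(-t*xi^2) d xi
    over [-xi_max, xi_max]; the odd sine part integrates to zero and is omitted."""
    h = 2.0 * xi_max / n
    total = 0.0
    for i in range(n):
        xi = -xi_max + (i + 0.5) * h
        total += math.cos(x * xi) * math.exp(-t * xi * xi)
    return total * h / (2.0 * math.pi)

samples = [(0.0, 0.1), (1.0, 0.5), (3.0, 2.0)]
errors = [abs(inverse_fourier_gaussian(x, t) - phi(x, t)) for x, t in samples]
print(max(errors))  # should be close to rounding error
```

For much smaller $t$ the frequency window `xi_max` would need to grow, since $e^{- t ξ^{2}}$ then decays more slowly in $ξ$ .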

Theorem 2 (smoothing and infinite propagation speed). Let $g \in L^{p} (R^{n})$ for some $1 \leq p \leq \infty$ , and let $u (x, t) = (Φ (\cdot, t) * g) (x)$ for $t > 0$ . Then $u \in C^{\infty} (R^{n} \times (0, \infty))$ , and for every $t > 0$ , $u (\cdot, t)$ is real-analytic in $x$ . Moreover, if $g \geq 0$ with $g \not\equiv 0$ , then $u (x, t) > 0$ for every $(x, t) \in R^{n} \times (0, \infty)$ (strict positivity at every space-time point with $t > 0$ ). The smoothing reflects the regularising character of the heat semigroup: arbitrary $L^{p}$ data become real-analytic after any positive time. The strict positivity reflects infinite propagation speed: an arbitrarily small amount of heat anywhere instantly produces a strictly positive temperature everywhere else, however far away.

The analyticity is quantitative: for each $t > 0$, $u (\cdot, t)$ extends to an entire function on $C^{n}$, with the bound $∣ u (x + i y, t) ∣ \leq C_{n, p} \, t^{- n / (2 p)} e^{∣ y ∣^{2} / (4 t)} ∥ g ∥_{L^{p}}$ for $x, y \in R^{n}$. The proof uses the Gaussian decay of the heat kernel together with the fact that the heat kernel itself extends to a complex-analytic function in $x$: since $∣ Φ (x + i y, t) ∣ = e^{∣ y ∣^{2} / (4 t)} Φ (x, t)$, the convolution integral still converges after complexification, and Hölder's inequality gives the bound. This is the parabolic analogue of the elliptic real-analyticity for harmonic functions; the parabolic version is sharper because it gives an explicit bound on the complex extension.
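Both conclusions of Theorem 2 can be seen on a closed-form example. The sketch below assumes $n = 1$ and the discontinuous datum $g = 1_{[-1, 1]}$ (an illustrative choice, not from the text), for which the convolution reduces to error functions: $u (x, t) = \frac{1}{2} [erfc ((x - 1) / (2 \sqrt{t})) - erfc ((x + 1) / (2 \sqrt{t}))]$. The function name `step_solution` is hypothetical.

```python
import math

def step_solution(x: float, t: float) -> float:
    """u(x, t) = (Phi(., t) * g)(x) for g = indicator of [-1, 1], in one dimension."""
    s = 2.0 * math.sqrt(t)
    # erfc keeps relative accuracy in the far tail, where a difference of
    # erf values would cancel to exactly 0.0 in floating point.
    return 0.5 * (math.erfc((x - 1.0) / s) - math.erfc((x + 1.0) / s))

t = 0.1
near = step_solution(0.0, t)    # inside the initial hot region: slightly below 1
far = step_solution(10.0, t)    # nine units outside the support of g

print(f"u(0, {t})  = {near:.6f}")
print(f"u(10, {t}) = {far:.3e}  (tiny, but strictly positive)")
```

The datum jumps at $x = \pm 1$, yet for $t > 0$ the solution is a difference of smooth error functions, and its value far from the support is astronomically small but not zero.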

Theorem 3 (strong maximum principle; Nirenberg 1953). Let $U \subset R^{n}$ be open, connected, and bounded, and let $u \in C^{2, 1} (U_{T}) \cap C (\bar{U}_{T})$ be a subsolution of the heat equation: $u_{t} - Δ u \leq 0$ on $U_{T}$. If $u$ attains its maximum over $\bar{U}_{T}$ at a point $(x_{0}, t_{0}) \in U \times (0, T]$, then $u$ is constant on $\bar{U} \times [0, t_{0}]$. The strong maximum principle sharpens the weak maximum principle of Exercise 7 in two ways. First, it is a rigidity statement: the weak version only locates the maximum on the parabolic boundary (the bottom and the lateral sides of the cylinder), whereas the strong version says that a maximum attained anywhere in $U \times (0, T]$, including on the top face $t_{0} = T$, forces $u$ to be constant. Second, it propagates the maximum backward in time: if the max is attained at $(x_{0}, t_{0})$, then $u$ equals the same value everywhere in the parabolic cylinder up to time $t_{0}$. The mechanism is the Hopf-type lemma for parabolic equations and a connectedness argument.

Theorem 4 (Tikhonov non-uniqueness counterexample; Tikhonov 1935). Without a growth restriction, the Cauchy problem for the heat equation on $R$ has infinitely many solutions. Concretely, define $ψ (t) = \begin{cases} e^{- 1/ t^{2}}, & t > 0, \\ 0, & t \leq 0, \end{cases}$ and consider the power series $u (x, t) = \sum_{k = 0}^{\infty} \frac{ψ^{(k)} (t)}{(2 k)!} x^{2 k} .$ Tikhonov's 1935 Mat. Sbornik paper ^{[Tikhonov 1935]} proved that this series converges for every $(x, t) \in R \times R$, defines a $C^{\infty}$ function on $R \times R$, satisfies $u_{t} - u_{xx} = 0$ on all of $R \times R$, and vanishes identically for $t \leq 0$. So $u$ is a non-zero solution of the Cauchy problem with zero initial data, and every multiple $c u$ with $c \in R$ is another. The non-uniqueness reflects rapid growth in $x$: the natural estimate is $∣ u (x, t) ∣ \leq \exp (x^{2} / (θ t) - 1 / (2 t^{2}))$ for some $θ > 0$ (see John §7), a Gaussian in $x$ whose rate $1 / (θ t)$ blows up as $t \to 0^{+}$, and $u$ obeys no bound of the form $C e^{a ∣ x ∣^{2}}$ uniformly on $R \times (0, T]$. The Widder uniqueness theorem (next item) restores uniqueness under a Gaussian-growth restriction.

Theorem 5 (Widder uniqueness; Widder 1944). Let $u : R^{n} \times [0, T] \to R$ be $C^{2, 1}$ on $R^{n} \times (0, T]$, continuous on $R^{n} \times [0, T]$, and satisfy $u_{t} - Δ u = 0$ on $R^{n} \times (0, T]$ with $u (\cdot, 0) = 0$ on $R^{n}$. Assume the growth condition $∣ u (x, t) ∣ \leq C e^{a ∣ x ∣^{2}}$ for some $a < \frac{1}{4 T}$ and $C > 0$. Then $u \equiv 0$ on $R^{n} \times [0, T]$ ^{[Widder 1944]}. The theorem identifies Gaussian growth as the critical class: solutions bounded by a fixed Gaussian $C e^{a ∣ x ∣^{2}}$ uniformly in time are unique, while Tikhonov's example, whose Gaussian rate blows up as $t \to 0^{+}$, escapes every such class. The restriction $a < 1/ (4 T)$ is what a single comparison step requires; for larger $a$ one subdivides $[0, T]$ into intervals of length less than $1/ (4 a)$ and applies the theorem on each in turn (Evans §2.3), so uniqueness in fact holds for every $a > 0$. The proof uses the Phragmén-Lindelöf principle for parabolic equations, comparing the candidate solution to an explicit Gaussian barrier. Widder's 1944 paper handled the positive-data case; the symmetric two-sided result is a standard extension.

Theorem 6 (backward uniqueness; Lions-Malgrange 1960). Let $u, v$ be two classical solutions of the heat equation on $R^{n} \times (0, T]$ with appropriate growth conditions. If $u (x, T) = v (x, T)$ for every $x \in R^{n}$ , then $u \equiv v$ on $R^{n} \times (0, T]$ (and on the closure if both extend continuously). The backward uniqueness theorem says that distinct trajectories of the heat semigroup cannot converge to the same final state: the forward evolution, though irreversible in the sense that the inverse problem is ill-posed, is injective on its image. The ill-posedness is in stability (the inverse depends discontinuously on the data), not in uniqueness.

The backward heat equation $u_{t} + Δ u = 0$, which is what one gets by trying to evolve $- Δ$ forward in time, is ill-posed in the sense of Hadamard: arbitrarily small initial data can produce large $L^{\infty}$ norms at later times, with no bound on the amplification. Concretely, the function $u_{n} (x, t) = e^{- n^{2} (T - t)} \sin (n x)$ solves $u_{t} + u_{xx} = 0$ on $R \times [0, T]$, has initial data $u_{n} (x, 0) = e^{- n^{2} T} \sin (n x)$ with supremum $e^{- n^{2} T}$ (which tends to zero as $n \to \infty$), but final data $u_{n} (x, T) = \sin (n x)$ with supremum $1$: the amplification factor $e^{n^{2} T}$ is unbounded in $n$. Read in reversed time, the same family says that recovering an earlier state of the heat equation from a later one amplifies the mode $\sin (n x)$ by $e^{n^{2} T}$. The issue is that small perturbations in high-frequency modes amplify under backward evolution. The ill-posedness of the backward heat equation underlies the regularisation theory of inverse problems (Tikhonov regularisation, named for the same Tikhonov who produced the non-uniqueness counterexample) and the engineering challenges of inverse-conduction problems (reconstructing initial temperature from a present-time measurement).
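The size of the amplification is easy to tabulate. The following sketch takes the single Fourier mode $e^{- n^{2} (T - t)} \sin (n x)$, whose exponent $n^{2}$ is what the equation $u_{t} + u_{xx} = 0$ forces, checks the equation by centred finite differences, and prints the ratio of the sup norm at time $T$ to the sup norm at time $0$. The helper names `u_n` and `residual`, the sample point, and the step `h` are illustrative assumptions.

```python
import math

def u_n(n: int, x: float, t: float, T: float) -> float:
    """The mode exp(-n^2 (T - t)) sin(n x), a solution of u_t + u_xx = 0."""
    return math.exp(-n * n * (T - t)) * math.sin(n * x)

def residual(n: int, x: float, t: float, T: float, h: float = 1e-4) -> float:
    """Centred finite-difference approximation of u_t + u_xx at (x, t)."""
    u_t = (u_n(n, x, t + h, T) - u_n(n, x, t - h, T)) / (2.0 * h)
    u_xx = (u_n(n, x + h, t, T) - 2.0 * u_n(n, x, t, T) + u_n(n, x - h, t, T)) / (h * h)
    return u_t + u_xx

T = 1.0
for n in (1, 2, 3, 4):
    # sup|u_n(., T)| = 1 and sup|u_n(., 0)| = exp(-n^2 T)
    amplification = 1.0 / math.exp(-n * n * T)
    print(n, f"amplification = {amplification:.3e}",
          f"residual = {residual(n, 0.3, 0.5, T):.1e}")
```

Already at $n = 4$ and $T = 1$ the factor is $e^{16}$, several million, which is why measurement noise in high frequencies overwhelms any naive backward reconstruction.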

Theorem 7 (parabolic Harnack inequality; Moser 1964). Let $u : R^{n} \times (0, T) \to R$ be a non-negative solution of the heat equation. Fix a parabolic cylinder $Q = B_{r} (x_{0}) \times [t_{0} - r^{2}, t_{0}]$ contained in $R^{n} \times (0, T)$ (spatial radius $r$ and temporal height $r^{2}$, matching the parabolic scaling $∣ x ∣^{2} \sim t$). Then there exists a constant $C (n) > 0$ such that $\sup_{(y, s) \in B_{r /2} (x_{0}) \times [t_{0} - 3 r^{2} /4, \, t_{0} - r^{2} /2]} u (y, s) \leq C \inf_{(y, s) \in B_{r /2} (x_{0}) \times [t_{0} - r^{2} /4, \, t_{0}]} u (y, s) .$ The parabolic Harnack inequality says: in any parabolic-scaled cylinder, a non-negative solution's supremum over an earlier sub-cylinder is controlled by its infimum over a later sub-cylinder. The time lag between the two sub-cylinders is essential; no such bound holds at equal times. Moser's 1964 Comm. Pure Appl. Math. paper ^{[Moser 1964]} proved the inequality by an iteration scheme that converts $L^{2}$ energy estimates into $L^{\infty}$ control. The inequality is the parabolic analogue of the classical elliptic Harnack inequality and underlies the Nash-De Giorgi-Moser regularity theory for divergence-form equations.
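Why the Harnack inequality compares an earlier time with a later one can be seen on shifted heat kernels. The sketch below assumes the one-dimensional family $u (z, τ) = Φ (z - ξ, τ)$ of non-negative solutions, indexed by a pole position $ξ$ (the sample points, times and grid of poles are illustrative choices). For this family, elementary calculus gives $\sup_{ξ} \log [u (x, s) / u (y, t)] = \frac{1}{2} \log (t / s) + ∣ x - y ∣^{2} / (4 (t - s))$ when $s < t$, whereas at equal times the ratio is unbounded in $ξ$. Logarithms are used to avoid floating-point underflow.

```python
import math

def log_kernel(x: float, t: float) -> float:
    """log of the 1-D heat kernel Phi(x, t) = (4 pi t)^(-1/2) exp(-x^2 / (4 t))."""
    return -0.5 * math.log(4.0 * math.pi * t) - x * x / (4.0 * t)

def log_ratio(x: float, s: float, y: float, t: float, pole: float) -> float:
    """log[ u(x, s) / u(y, t) ] for the solution u(z, tau) = Phi(z - pole, tau)."""
    return log_kernel(x - pole, s) - log_kernel(y - pole, t)

x, y, s, t = 0.0, 1.0, 1.0, 2.0
poles = [0.1 * k for k in range(-1000, 1001)]   # pole positions in [-100, 100]

# Earlier time s < t: the ratio is bounded uniformly in the pole position.
worst_lagged = max(log_ratio(x, s, y, t, p) for p in poles)
bound = 0.5 * math.log(t / s) + (x - y) ** 2 / (4.0 * (t - s))

# Equal times: the ratio grows without bound as the pole moves away.
worst_same_time = max(log_ratio(x, t, y, t, p) for p in poles)

print(f"lagged:    max log-ratio = {worst_lagged:.4f}, bound = {bound:.4f}")
print(f"same time: max log-ratio = {worst_same_time:.4f} (grows with the pole range)")
```

Widening the range of poles leaves the lagged maximum unchanged but increases the equal-time maximum linearly, which is the failure that the time lag in the theorem repairs.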

Theorem 8 (Aronson Gaussian bounds; Aronson 1967). For any divergence-form parabolic operator $L = \partial_{t} - \sum_{i, j} \partial_{i} (a_{ij} (x, t) \partial_{j})$ with bounded measurable coefficients $a_{ij}$ satisfying uniform ellipticity $λ I \leq (a_{ij}) \leq Λ I$ , the fundamental solution $Γ_{L} (x, t; y, s)$ satisfies two-sided Gaussian bounds $\frac{c _{1}}{( t - s ) ^{n /2}} e^{- c_{2} ∣ x - y ∣^{2} / (t - s)} \leq Γ_{L} (x, t; y, s) \leq \frac{c _{3}}{( t - s ) ^{n /2}} e^{- c_{4} ∣ x - y ∣^{2} / (t - s)}$ for $t > s$ , where the constants $c_{1}, c_{2}, c_{3}, c_{4} > 0$ depend only on $n, λ, Λ$ ^{[Aronson 1967]}. The Aronson bounds say that even for non-smooth, non-symmetric, time-dependent coefficient matrices, the fundamental solution behaves like a Gaussian with explicit constants. The result is the parabolic analogue of the elliptic Green-function pointwise bounds, and is the keystone of modern parabolic regularity theory and stochastic analysis with rough coefficients.

Theorem 9 (Brownian motion connection; Wiener 1923, Einstein 1905, Kac 1949). Let $B_{t}$ be standard $n$-dimensional Brownian motion on a probability space, with $B_{0} = 0$. Then the heat kernel is the transition density of the rescaled process $\sqrt{2} B_{t}$: $P (\sqrt{2} B_{t} \in A) = \int_{A} Φ (x, t) d x$ for every Borel set $A \subseteq R^{n}$; equivalently, $B_{t}$ itself has density $Φ (x, t /2)$. More generally $Φ (x - y, t)$ is the density of $y + \sqrt{2} B_{t}$, the rescaled process started at $y$. The heat equation appears as the Kolmogorov forward equation of Brownian motion: the density $p (x, t)$ of $B_{t}$ satisfies $p_{t} = \frac{1}{2} Δ p$ with $p (\cdot, 0) = δ_{0}$ (the factor $1/2$ is the standard probabilistic normalisation; the analyst's heat kernel uses the normalisation $p_{t} = Δ p$, which corresponds to Brownian motion with variance $2 t$ per coordinate instead of $t$). The Brownian-motion connection was identified by Einstein 1905 Ann. Phys. 17 ^{[Einstein 1905]} (heuristic diffusion-equation derivation from molecular kinetic theory) and rigorously formalised by Wiener 1923 J. Math. Phys. 2 ^{[Wiener 1923]} (Wiener measure on path space).
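The normalisation can be confirmed by direct quadrature. The sketch below assumes $n = 1$ and integrates the analyst's kernel $Φ (x, t) = (4 π t)^{- 1/2} e^{- x^{2} / (4 t)}$ with the trapezoid rule; the function names, the integration window and the grid size are illustrative choices. It checks that $Φ (\cdot, t)$ has total mass $1$ and second moment $2 t$, twice the variance of standard Brownian motion at time $t$.

```python
import math

def heat_kernel(x: float, t: float) -> float:
    """Analyst's 1-D heat kernel, the fundamental solution of u_t = u_xx."""
    return math.exp(-x * x / (4.0 * t)) / math.sqrt(4.0 * math.pi * t)

def moments(t: float, half_width: float = 40.0, n: int = 40001):
    """Trapezoid-rule mass and second moment of Phi(., t) on [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    mass = second = 0.0
    for i in range(n):
        x = -half_width + i * h
        w = h if 0 < i < n - 1 else 0.5 * h   # half weight at the two endpoints
        p = heat_kernel(x, t)
        mass += w * p
        second += w * x * x * p
    return mass, second

t = 1.5
mass, variance = moments(t)
print(f"mass = {mass:.10f}, variance = {variance:.10f}, 2t = {2.0 * t}")
```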

Theorem 10 (Feynman-Kac formula; Kac 1949). Let $V : R^{n} \to R$ be a continuous bounded function (or measurable with appropriate decay), and let $u (x, t)$ solve the heat equation with potential: $u_{t} - Δ u + V u = 0$ on $R^{n} \times (0, \infty)$, $u (\cdot, 0) = g$. Then for every $x \in R^{n}$ and $t > 0$: $u (x, t) = E^{x} [g (X_{t}) \exp (- \int_{0}^{t} V (X_{s}) d s)],$ where $X_{s} = x + \sqrt{2} B_{s}$ is Brownian motion started at $x$ in the analyst's normalisation (generator $Δ$) and $E^{x}$ denotes the corresponding expectation ^{[Kac 1949]}; with standard Brownian motion $x + B_{s}$ in place of $X_{s}$, the same formula solves $u_{t} - \frac{1}{2} Δ u + V u = 0$. The Feynman-Kac formula identifies the solution operator of the heat equation with a potential as the path integral of an exponential weight against Brownian paths, and is the bridge between PDE theory and stochastic analysis. Wick-rotating $t \to i t$ (replacing imaginary time by real time) converts the heat equation with potential into the Schrödinger equation with the same potential, and the Feynman-Kac formula becomes Feynman's path-integral formula for quantum mechanics (Feynman 1948 Rev. Mod. Phys. 20). The bridge is the most striking instance of the deep mathematical similarity between diffusion and quantum mechanics.
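The simplest sanity check of the Feynman-Kac formula is a constant potential, where the path integral collapses to a number. The worked computation below assumes $V \equiv c$ for a constant $c$ (an illustrative special case, not from the text) and takes $X_{s} = x + \sqrt{2} B_{s}$, so that $X_{t}$ has density $Φ (\cdot - x, t)$; $v$ denotes the solution $Φ (\cdot, t) * g$ of the potential-free heat equation.

```latex
% Assumptions for illustration: V \equiv c constant; X_s = x + \sqrt{2} B_s (generator \Delta);
% v(x,t) = (\Phi(\cdot,t) * g)(x), so that v_t - \Delta v = 0.
\begin{aligned}
u(x,t) &= E^{x}\!\left[ g(X_t)\, \exp\!\Big(-\int_0^t c \, ds\Big) \right]
        = e^{-ct} \int_{R^n} \Phi(x-y,t)\, g(y)\, dy
        = e^{-ct}\, v(x,t), \\
u_t - \Delta u + c\,u
       &= e^{-ct}\bigl( v_t - c\,v \bigr) - e^{-ct}\,\Delta v + c\,e^{-ct}\,v
        = e^{-ct}\bigl( v_t - \Delta v \bigr) = 0 .
\end{aligned}
```

A constant potential therefore only multiplies the free solution by $e^{- c t}$: absorption at rate $c$ when $c > 0$, growth when $c < 0$.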

Theorem 11 (Nash-De Giorgi regularity; Nash 1958, De Giorgi 1957). For a divergence-form parabolic operator $L = \partial_{t} - \sum_{i, j} \partial_{i} (a_{ij} (x, t) \partial_{j})$ with bounded measurable coefficients satisfying uniform ellipticity, every weak solution $u$ of $Lu = 0$ on a parabolic cylinder is Hölder continuous, with Hölder exponent and norm depending only on $n, λ, Λ$ ^{[Nash 1958]} ^{[De Giorgi 1957]}. The Nash-De Giorgi theorem is the foundational regularity result for second-order divergence-form equations with rough coefficients. Nash 1958 and De Giorgi 1957 proved the theorem independently, with very different methods: Nash by a probabilistic argument tracking entropy along Brownian motion, De Giorgi by an iteration scheme converting $L^{2}$ control into Hölder control. Moser 1964 unified the two approaches via the Harnack inequality (Theorem 7 above). The Nash-De Giorgi-Moser apparatus is the foundation of modern PDE regularity theory: it provides Hölder continuity for solutions of divergence-form equations without any smoothness assumption on the coefficients, and is the entry point to the modern theory of elliptic and parabolic equations on Riemannian manifolds (Saloff-Coste, Grigor'yan), to homogenisation theory, and to the theory of stochastic differential equations with non-smooth drift.

Synthesis. The heat equation is the prototype parabolic equation, and its solution apparatus (heat kernel, convolution formula, Duhamel principle, maximum principle, energy method, Harnack inequality, Brownian-motion identification) is the prototype of every parabolic regularity and existence theory. The pattern recurs in three main escalations. First, replace the Laplacian by a divergence-form operator with rough coefficients: the convolution formula is replaced by a Green-function representation with Aronson Gaussian bounds, the smoothing property becomes Nash-De Giorgi-Moser Hölder regularity, and the Harnack inequality becomes the centrepiece of the modern theory. Second, replace the Euclidean background by a Riemannian manifold: the heat kernel becomes the Riemannian heat kernel, with Li-Yau gradient estimates replacing the explicit Gaussian formula, and the long-time behaviour becomes a probe of the underlying geometry (Hamilton's Ricci flow, Perelman's $L$ -functional, the Atiyah-Singer index theorem via heat-kernel asymptotics). Third, replace the linear equation by a nonlinear equation: the porous-medium equation, the $p$ -Laplacian, the Hele-Shaw flow, the Stefan problem, and ultimately mean-curvature flow and Ricci flow all build on the linear-heat-equation apparatus as their starting point.

The probabilistic side of the equation has been equally fertile. The identification of the heat kernel with Brownian motion's transition density, due to Einstein and Wiener, opened the modern theory of stochastic processes. The Feynman-Kac formula, due to Kac, is the bridge to quantum mechanics via Wick rotation. The Itô-Stratonovich calculus extends the framework to stochastic differential equations with drift and diffusion. The Malliavin calculus, the Wiener chaos decomposition, and the modern theory of large deviations (Freidlin-Wentzell, Varadhan) all build on the foundational identification of the heat semigroup with Brownian motion.

The conceptual closure is the recognition that the heat equation packages five distinct mathematical phenomena into a single equation: the spectral decomposition of the Laplacian (Fourier 1822, Sturm-Liouville 1836), the smoothing property of irreversible time evolution, the probabilistic structure of Brownian motion, the analytic continuation to the Schrödinger equation by Wick rotation, and the foundational example of a parabolic PDE underlying modern regularity theory. The arc from Fourier's 1822 Théorie analytique de la chaleur to modern Ricci-flow theory is a two-century lineage in which the same equation, the same heat kernel, and the same convolution recipe have been continuously refined into ever more general and ever more powerful tools.

Full proof set Master

Proposition 1 (heat kernel via Fourier transform). For $t > 0$ and $x \in R^{n}$ , $Φ (x, t) = (2 π)^{- n} \int_{R^{n}} e^{i x \cdot ξ - t ∣ ξ ∣^{2}} d ξ$ .

Proof. The Fourier transform of a Gaussian is a Gaussian. Specifically, the standard $n$-dimensional Gaussian integral with linear shift is $\int_{R^{n}} e^{- a ∣ ξ ∣^{2} + i x \cdot ξ} d ξ = (\frac{π}{a})^{n /2} e^{- ∣ x ∣^{2} / (4 a)}$ for $a > 0$. The proof factors the integral over coordinates (since both the exponential of a sum-of-squares and the inner product decompose into one-dimensional pieces) and reduces to the one-dimensional identity $\int_{R} e^{- a ξ^{2} + i x ξ} d ξ = \sqrt{π / a} \cdot e^{- x^{2} / (4 a)}$, which follows from completing the square and shifting the contour of integration.

Set $a = t$ to get $\int_{R^{n}} e^{- t ∣ ξ ∣^{2} + i x \cdot ξ} d ξ = (π / t)^{n /2} e^{- ∣ x ∣^{2} / (4 t)}$ . Multiply both sides by $(2 π)^{- n}$ : $\frac{1}{( 2 π ) ^{n}} \int_{R^{n}} e^{i x \cdot ξ - t ∣ ξ ∣^{2}} d ξ = \frac{1}{( 2 π ) ^{n}} (π / t)^{n /2} e^{- ∣ x ∣^{2} / (4 t)} = \frac{1}{( 4 π t ) ^{n /2}} e^{- ∣ x ∣^{2} / (4 t)} = Φ (x, t) .$

The intermediate identity $(2 π)^{- n} (π / t)^{n /2} = (4 π t)^{- n /2}$ follows from algebraic simplification: $(2 π)^{- n} (π / t)^{n /2} = 2^{- n} π^{- n} π^{n /2} t^{- n /2} = 2^{- n} π^{- n /2} t^{- n /2} = (2^{2} π t)^{- n /2} = (4 π t)^{- n /2}$ . $□$
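The one-dimensional Gaussian identity behind Proposition 1 can be checked numerically. The sketch below approximates $\int_{R} e^{- a ξ^{2} + i x ξ} d ξ$ by the trapezoid rule on a truncated interval and compares it with $\sqrt{π / a} \, e^{- x^{2} / (4 a)}$; the function name, cutoff, grid size and the sample values of $a$ and $x$ are illustrative assumptions.

```python
import math

def gaussian_fourier_integral(a: float, x: float, cutoff: float = 12.0, n: int = 24001) -> float:
    """Trapezoid approximation of the integral of exp(-a xi^2 + i x xi) over the real line.

    The imaginary part vanishes by symmetry, so only exp(-a xi^2) cos(x xi) is integrated.
    """
    half = cutoff / math.sqrt(a)          # exp(-a * half^2) = exp(-144), negligible
    h = 2.0 * half / (n - 1)
    total = 0.0
    for i in range(n):
        xi = -half + i * h
        w = h if 0 < i < n - 1 else 0.5 * h
        total += w * math.exp(-a * xi * xi) * math.cos(x * xi)
    return total

a, x = 0.7, 1.3
numeric = gaussian_fourier_integral(a, x)
closed_form = math.sqrt(math.pi / a) * math.exp(-x * x / (4.0 * a))

# With n = 1 and t = a, dividing by 2 pi gives the heat kernel value Phi(x, t).
heat_kernel_value = numeric / (2.0 * math.pi)
print(numeric, closed_form, heat_kernel_value)
```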

Proposition 2 (Duhamel principle). Let $g \in C_{b} (R^{n})$ and $f \in C (R^{n} \times [0, T])$ with $f$ bounded and locally Hölder-continuous in space uniformly in time. Then the function $u (x, t) = \int_{R^{n}} Φ (x - y, t) g (y) d y + \int_{0}^{t} \int_{R^{n}} Φ (x - y, t - s) f (y, s) d y d s$ is in $C^{2, 1} (R^{n} \times (0, T])$ and solves $u_{t} - Δ u = f$ on $R^{n} \times (0, T)$ with $u (\cdot, 0) = g$ on $R^{n}$ .

Proof. Write $u = u_{0} + u_{f}$ where $u_{0} (x, t) = \int_{R^{n}} Φ (x - y, t) g (y) d y, u_{f} (x, t) = \int_{0}^{t} \int_{R^{n}} Φ (x - y, t - s) f (y, s) d y d s .$ By the Key Theorem of the Intermediate tier, $u_{0} \in C^{\infty} (R^{n} \times (0, \infty))$ solves $u_{0, t} - Δ u_{0} = 0$ on $R^{n} \times (0, \infty)$ with $u_{0} (\cdot, 0) = g$ . It suffices to show $u_{f}$ is in $C^{2, 1}$ and solves $u_{f, t} - Δ u_{f} = f$ on $R^{n} \times (0, T)$ with $u_{f} (\cdot, 0) = 0$ .

The convergence to zero at $t = 0$ is immediate: $∣ u_{f} (x, t) ∣ \leq t \cdot ∥ f ∥_{\infty}$ by the unit-mass property of $Φ$ , so $u_{f} (\cdot, t) \to 0$ uniformly as $t \to 0$ .

For the PDE, compute the time derivative. Change variables in the time integral via $r = t - s$ (so $r$ runs from $t$ down to $0$ ): $u_{f} (x, t) = \int_{0}^{t} \int_{R^{n}} Φ (x - y, t - s) f (y, s) d y d s = \int_{0}^{t} \int_{R^{n}} Φ (x - y, r) f (y, t - r) d y d r .$

Take $\partial_{t}$ . The integrand depends on $t$ both through the upper limit and through $f (y, t - r)$ : $u_{f, t} (x, t) = \int_{R^{n}} Φ (x - y, t) f (y, 0) d y + \int_{0}^{t} \int_{R^{n}} Φ (x - y, r) f_{t} (y, t - r) d y d r,$ provided $f$ is $C^{1}$ in $t$ (this is the cleanest case; for merely continuous $f$ with Hölder regularity in $x$ , an integration-by-parts argument shifts the time derivative onto $Φ$ instead).

For the general case, compute $u_{f, t} - Δ u_{f}$ directly. First, $Δ u_{f} (x, t) = \int_{0}^{t} \int_{R^{n}} Δ_{x} Φ (x - y, t - s) f (y, s) d y d s,$ where the differentiation under the integral is justified by the Gaussian decay of $Φ$ (the singularity at $s = t$ is controlled by the Hölder regularity of $f$; a standard cutoff and limiting argument handles the diagonal behaviour). Second, $u_{f, t} (x, t) = \lim_{ε \to 0^{+}} [\int_{R^{n}} Φ (x - y, ε) f (y, t - ε) d y + \int_{0}^{t - ε} \int_{R^{n}} Φ_{t} (x - y, t - s) f (y, s) d y d s]$ splits the integral into a near-the-diagonal piece (which captures $f (x, t)$ in the limit $ε \to 0$, via the same approximation-to-the-identity argument as Step 4 of the Key Theorem proof) and a far-from-the-diagonal piece (where $Φ_{t} = Δ_{x} Φ$, by Step 2 of the Key Theorem proof, so the integrand cancels against the corresponding piece of $Δ u_{f}$).

Concretely: $u_{f, t} (x, t) - Δ u_{f} (x, t) = \lim_{ε \to 0^{+}} \int_{R^{n}} Φ (x - y, ε) f (y, t - ε) d y = f (x, t),$ where the last limit uses the approximation-to-the-identity property of $Φ (\cdot, ε)$ and the continuity of $f$ at $(x, t)$.

So $u_{f, t} - Δ u_{f} = f$ on $R^{n} \times (0, T)$ .

$□$
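The Duhamel integral of Proposition 2 can be evaluated numerically and compared with a closed-form solution. The sketch below assumes $n = 1$, zero initial data, and the source $f (y, s) = \cos y$ (an illustrative choice, not from the text), for which $u (x, t) = (1 - e^{- t}) \cos x$ solves $u_{t} - u_{xx} = \cos x$ with $u (\cdot, 0) = 0$. The space integral uses the substitution $y = x + 2 \sqrt{r} z$, which removes the singular concentration of $Φ (\cdot, r)$ as $r \to 0$; the function names and grid sizes are hypothetical.

```python
import math

def heat_semigroup(f, x: float, r: float, n: int = 241, cutoff: float = 6.0) -> float:
    """(Phi(., r) * f)(x) in one dimension, after substituting y = x + 2 sqrt(r) z."""
    h = 2.0 * cutoff / (n - 1)
    total = 0.0
    for i in range(n):
        z = -cutoff + i * h
        w = h if 0 < i < n - 1 else 0.5 * h
        total += w * math.exp(-z * z) * f(x + 2.0 * math.sqrt(r) * z)
    return total / math.sqrt(math.pi)

def duhamel(f, x: float, t: float, steps: int = 200) -> float:
    """Midpoint rule in s for the integral over [0, t] of (Phi(., t - s) * f(., s))(x)."""
    ds = t / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        total += ds * heat_semigroup(lambda y: f(y, s), x, t - s)
    return total

def source(y: float, s: float) -> float:
    """Hypothetical time-independent source term f(y, s) = cos(y)."""
    return math.cos(y)

x, t = 0.4, 1.0
numeric = duhamel(source, x, t)
exact = (1.0 - math.exp(-t)) * math.cos(x)
print(f"Duhamel quadrature = {numeric:.8f}, closed form = {exact:.8f}")
```

Each time slice contributes the free evolution of the source over the remaining time $t - s$, which is exactly the superposition that the formula expresses.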

Proposition 3 (Tikhonov non-uniqueness, sketch). Define $ψ (t) = e^{- 1/ t^{2}}$ for $t > 0$ and $ψ (t) = 0$ for $t \leq 0$. Then the series $u (x, t) = \sum_{k = 0}^{\infty} \frac{ψ^{(k)} (t)}{(2 k)!} x^{2 k}$ converges absolutely and uniformly on every compact subset of $R \times R$, defines a $C^{\infty}$ function $u$, satisfies $u_{t} = u_{xx}$ on $R \times R$, and vanishes identically on $R \times (- \infty, 0]$ (in particular $u (\cdot, 0) = 0$). The function $u$ is not identically zero: $u (x, t) \neq 0$ for every $t > 0$ and almost every $x \in R$.

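Proof sketch. The function $ψ$ is in $C^{\infty} (R)$, with all derivatives vanishing at $t = 0$ (it is the standard example of a $C^{\infty}$ function that is not real-analytic). Cauchy's integral formula on a circle of radius $θ t$ about $t > 0$ gives $∣ ψ^{(k)} (t) ∣ \leq \frac{k!}{(θ t)^{k}} e^{- 1 / (2 t^{2})}$ for a fixed small $θ > 0$ independent of $k$ and $t$. Since $k! / (2 k)! \leq 1 / k!$, the series is dominated by $\sum_{k} \frac{1}{k!} (\frac{x^{2}}{θ t})^{k} e^{- 1 / (2 t^{2})} = \exp (\frac{x^{2}}{θ t} - \frac{1}{2 t^{2}})$, which gives absolute convergence of the series, uniform on bounded space-time domains; the same estimate handles the term-by-term differentiated series.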

Term-by-term verification of $u_{t} = u_{xx}$ : $\partial_{t} [ψ^{(k)} (t) x^{2 k} / (2 k)!] = ψ^{(k + 1)} (t) x^{2 k} / (2 k)!$ , and $\partial_{x}^{2} [ψ^{(k)} (t) x^{2 k} / (2 k)!] = ψ^{(k)} (t) (2 k) (2 k - 1) x^{2 k - 2} / (2 k)! = ψ^{(k)} (t) x^{2 k - 2} / (2 k - 2)!$ . Reindexing $k \to k + 1$ in the second sum gives $ψ^{(k + 1)} (t) x^{2 k} / (2 k)!$ , matching the time derivative term-by-term.

That $u$ is not identically zero is immediate: $u (0, t) = ψ (t) > 0$ for $t > 0$. Moreover, for each fixed $t > 0$ the function $x \mapsto u (x, t)$ is given by a convergent power series with non-zero constant term, so it is real-analytic and not identically zero, and its zeros are isolated. The detailed verification is in Tikhonov 1935 Mat. Sbornik 42, 199-216 ^{[Tikhonov 1935]} and in John 1982 PDE 4e ^{[John 1982]} §7. $□$

Proposition 4 (energy decay for the linear heat equation). Let $u : R^{n} \times (0, T] \to R$ solve $u_{t} - Δ u = 0$ with $u (\cdot, 0) = g \in L^{2} (R^{n})$ . Then $∥ u (\cdot, t) ∥_{L^{2}}^{2}$ is non-increasing in $t$ .

Proof. Multiply the equation by $u$ and integrate over $R^{n}$ : $\int_{R^{n}} u u_{t} d x = \int_{R^{n}} u Δ u d x .$ The left side is $\frac{1}{2} \frac{d}{d t} \int_{R^{n}} u^{2} d x$ (assuming sufficient decay to interchange the derivative and the integral, which holds for $g \in L^{2}$ since the solution decays at infinity by the Gaussian heat kernel). The right side is $- \int_{R^{n}} ∣\nabla u ∣^{2} d x$ (integration by parts, with vanishing boundary terms at infinity by the decay). So $\frac{1}{2} \frac{d}{d t} ∥ u (\cdot, t) ∥_{L^{2}}^{2} = - ∥\nabla u (\cdot, t) ∥_{L^{2}}^{2} \leq 0.$

So $∥ u (\cdot, t) ∥_{L^{2}}^{2}$ is non-increasing in $t$, and Plancherel together with the explicit Fourier-side semigroup $e^{- t ∣ ξ ∣^{2}}$ gives the sharper identity $∥ u (\cdot, t) ∥_{L^{2}}^{2} = \frac{1}{(2 π)^{n}} \int_{R^{n}} e^{- 2 t ∣ ξ ∣^{2}} ∣ \hat{g} (ξ) ∣^{2} d ξ$ (with the Fourier convention of Proposition 1, in which the inverse transform carries the factor $(2 π)^{- n}$), which decays in $t$ at a rate determined by the frequency content of $g$ near the origin. $□$
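A discrete analogue of the energy identity can be observed with a finite-difference scheme. The sketch below is an illustration under stated assumptions rather than the setting of Proposition 4: it replaces $R^{n}$ by a periodic one-dimensional grid, uses the explicit Euler scheme with ratio `lam` $= Δt / Δx^{2} \leq 1/2$ (the usual stability restriction for this scheme), and starts from a discontinuous square pulse. All names and parameters are hypothetical.

```python
import math

def heat_step(u, lam):
    """One explicit Euler step of u_t = u_xx on a periodic grid; lam = dt / dx^2."""
    n = len(u)
    # u[j - 1] wraps around for j = 0 through Python's negative indexing.
    return [u[j] + lam * (u[(j + 1) % n] - 2.0 * u[j] + u[j - 1]) for j in range(n)]

def l2_norm_sq(u, dx):
    """Discrete squared L^2 norm: dx times the sum of squares."""
    return dx * sum(v * v for v in u)

n = 200
dx = 2.0 * math.pi / n
lam = 0.4                      # stability of this explicit scheme needs lam <= 1/2
# Hypothetical rough initial datum: a square pulse on a quarter of the circle.
u = [1.0 if n // 4 <= j < n // 2 else 0.0 for j in range(n)]

energies = [l2_norm_sq(u, dx)]
for _ in range(500):
    u = heat_step(u, lam)
    energies.append(l2_norm_sq(u, dx))

print(f"initial energy = {energies[0]:.6f}, final energy = {energies[-1]:.6f}")
```

The recorded energies decrease at every step, mirroring $\frac{d}{d t} ∥ u ∥_{L^{2}}^{2} = - 2 ∥\nabla u ∥_{L^{2}}^{2} \leq 0$; on the periodic grid they level off at the energy of the mean value, which diffusion cannot remove.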

Connections Master

Laplace equation and harmonic functions 02.13.01. The steady-state limit of the heat equation. Setting $u_{t} = 0$ in $u_{t} - Δ u = f$ gives the Poisson equation $- Δ u = f$ (next item); setting $u_{t} = 0$ and $f = 0$ gives the Laplace equation $Δ u = 0$ . The heat equation's long-time behaviour on bounded domains with time-independent boundary data is convergence to the steady-state harmonic-function solution: $u (x, t) \to u_{\infty} (x)$ exponentially fast as $t \to \infty$ , with $u_{\infty}$ harmonic and satisfying the same boundary data. The heat equation is the parabolic relaxation toward elliptic equilibrium.
Poisson equation and Newtonian potential 02.13.02. The elliptic precursor of the heat equation. The Poisson equation $- Δ u = f$ is the steady state of the inhomogeneous heat equation with a time-independent source. The Newtonian potential $Φ_{Poisson} * f$ is the elliptic analogue of the Duhamel formula, and the elliptic fundamental solution is the long-time integral of the parabolic heat kernel: $\int_{0}^{\infty} Φ_{heat} (x, t) d t = (const) Φ_{Poisson} (x)$ when the integral converges (in dimension $n \geq 3$ ). The parabolic heat kernel is the time-resolved version of the elliptic fundamental solution.
Lebesgue integral and dominated convergence 02.07.04. Supplies the integration framework on which the heat kernel apparatus rests. The convolution $Φ (\cdot, t) * g$ is a Lebesgue integral, and the differentiation-under-the-integral arguments of the Key Theorem proof rely on the dominated convergence theorem to pass derivatives through the integral. The $L^{p}$ -semigroup theory of the heat semigroup, the energy method for uniqueness, and the Brownian-motion picture all assume the full Lebesgue-integration apparatus.
Wave equation 02.13.04. The hyperbolic cousin of the heat equation. The wave equation $u_{tt} - Δ u = 0$ has finite propagation speed (signals travel at unit speed), conserves an energy $\int (u_{t}^{2} + ∣\nabla u ∣^{2})$ , and does not smooth (initial regularity is preserved). The heat equation has infinite propagation speed, dissipates energy (the $L^{2}$ norm decays), and smooths arbitrarily (any $L^{p}$ data become real-analytic at any positive time). The two equations are the prototypes of parabolic and hyperbolic theory respectively, and many problems of mathematical physics combine both (the telegrapher's equation $u_{tt} + 2 α u_{t} - Δ u = 0$ interpolates between them; the Klein-Gordon equation adds a potential to the wave equation).
Fourier series and Fourier transform [02.10]. The diagonalisation framework for the heat equation. The Fourier transform diagonalises the spatial Laplacian: $\widehat{Δ u} (ξ) = - ∣ ξ ∣^{2} \hat{u} (ξ)$. The heat equation becomes the linear ODE $\hat{u}_{t} = - ∣ ξ ∣^{2} \hat{u}$ for each fixed $ξ$, with solution $\hat{u} (ξ, t) = e^{- t ∣ ξ ∣^{2}} \hat{g} (ξ)$. Inverse-Fourier-transforming recovers the convolution formula. The Fourier-series version (for the heat equation on a bounded interval or ring) was Fourier's original 1822 method and the historical motivation for the entire Fourier-analytic apparatus.
Separation of variables 02.13.07. The bounded-domain analogue of the Fourier-transform method. For the heat equation on a bounded interval $[0, L]$ with Dirichlet boundary conditions $u (0, t) = u (L, t) = 0$ , the eigenfunctions of $- Δ$ are $sin (k π x / L)$ for $k = 1, 2, 3, \dots$ , with eigenvalues $(k π / L)^{2}$ . The solution of the heat equation with initial data $g$ is $u (x, t) = \sum_{k} a_{k} e^{- (k π / L)^{2} t} sin (k π x / L)$ where $a_{k} = (2/ L) \int_{0}^{L} g (x) sin (k π x / L) d x$ are the Fourier sine coefficients. The Sturm-Liouville theory (Sturm 1836, Liouville 1836) ^{[Sturm 1836]} ^{[Liouville 1836]} generalises the spectral decomposition to second-order ODEs with variable coefficients, providing the foundational tool for heat-equation problems on bounded domains with general geometry or variable material coefficients.
Schrödinger equation and Wick rotation [12.04]. The quantum-mechanical analogue of the heat equation, related by analytic continuation in time. The free Schrödinger equation $i u_{t} = - Δ u$ (with $ℏ = 1$, $m = 1/2$) becomes the heat equation under the substitution $t \to - i t$. The free-particle propagator in quantum mechanics is the analytic continuation of the heat kernel: $K (x, y; t) = (4 π i t)^{- n /2} e^{i ∣ x - y ∣^{2} / (4 t)}$. The Feynman-Kac formula on the heat-equation side becomes the Feynman path integral on the Schrödinger-equation side. Analytic continuation to imaginary time is the precise form of the similarity between diffusion and quantum mechanics.
Brownian motion and stochastic processes [11.05]. The probabilistic interpretation of the heat equation. The heat kernel $Φ (x - y, t)$ is the transition density of $n$-dimensional Brownian motion normalised to have generator $Δ$ (variance $2 t$ per coordinate): the probability that a Brownian particle starting at $y$ lies in a volume element $d x$ around $x$ at time $t$ is $Φ (x - y, t) d x$. The heat equation is the Kolmogorov forward equation for Brownian motion. Diffusion processes with drift and variable diffusion coefficients satisfy more general parabolic equations of the form $u_{t} - div (A \nabla u) - b \cdot \nabla u - c u = 0$, with the Aronson Gaussian bounds (Theorem 8) connecting the parabolic-equation theory to the underlying stochastic differential equation.
Thermodynamics and Brownian motion [11.01]. The physical origin of the heat equation. Fourier's 1822 derivation of the heat equation modelled the diffusion of thermal energy in a solid, with the diffusion constant determined by the thermal conductivity, specific heat, and density of the material. Einstein's 1905 paper ^{[Einstein 1905]} derived the heat equation for the position density of a Brownian particle in a fluid, with the diffusion constant given by the Stokes-Einstein relation $D = k_{B} T / (6 π η r)$ (with $k_{B}$ Boltzmann's constant, $T$ absolute temperature, $η$ fluid viscosity, $r$ particle radius). The Brownian-motion-derived diffusion constant gave the first measurement of Avogadro's number and confirmed the atomic-molecular picture of matter.
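The sine-series recipe in the separation-of-variables item above can be sketched in a few lines. The example assumes $L = 1$ and the initial temperature $g (x) = x (L - x)$, which vanishes at both ends (an illustrative choice, not from the text); the coefficients $a_{k}$ are computed by the midpoint rule, and the function names, number of modes and quadrature size are hypothetical. For this $g$ the exact coefficients are $8 L^{2} / (k π)^{3}$ for odd $k$ and $0$ for even $k$, which gives an independent check.

```python
import math

def sine_coefficients(g, L: float, modes: int, quad: int = 2000):
    """a_k = (2/L) * integral over [0, L] of g(x) sin(k pi x / L) dx, by the midpoint rule."""
    h = L / quad
    xs = [(i + 0.5) * h for i in range(quad)]
    return [
        (2.0 / L) * h * sum(g(x) * math.sin(k * math.pi * x / L) for x in xs)
        for k in range(1, modes + 1)
    ]

def heat_solution(coeffs, L: float, x: float, t: float) -> float:
    """u(x, t) = sum over k of a_k exp(-(k pi / L)^2 t) sin(k pi x / L)."""
    return sum(
        a * math.exp(-((k * math.pi / L) ** 2) * t) * math.sin(k * math.pi * x / L)
        for k, a in enumerate(coeffs, start=1)
    )

L = 1.0

def g(x: float) -> float:
    """Hypothetical initial temperature, zero at x = 0 and x = L."""
    return x * (L - x)

coeffs = sine_coefficients(g, L, modes=50)
u_mid = heat_solution(coeffs, L, 0.5, 0.1)
print(f"a_1 = {coeffs[0]:.6f}, a_2 = {coeffs[1]:.2e}, u(0.5, 0.1) = {u_mid:.6f}")
```

By $t = 0.1$ the $k = 3$ mode has decayed by a factor $e^{- 0.9 π^{2}}$, so the midpoint temperature is already dominated by the first mode: high frequencies die first.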

Historical & philosophical context Master

Fourier's 1822 Théorie analytique de la chaleur ^{[Fourier 1822]} is the founding document of the modern theory of partial differential equations. Fourier derived the heat equation from a physical model of heat conduction in a solid, introducing the now-standard idea that the flux of heat at a point is proportional to the negative gradient of the temperature (Fourier's law of heat conduction). The conservation of energy then gives the local PDE $\partial_{t} u = k Δ u$ for a homogeneous isotropic material with thermal diffusivity $k$ . Fourier solved the equation by his eponymous method: decompose the temperature into eigenmodes of the spatial Laplacian (sines and cosines on a bounded interval; complex exponentials in the integral version on the whole line), evolve each mode independently as a damped exponential in time, and reassemble the solution by superposition. The Fourier series and Fourier integral were both invented for this purpose; their applications to other branches of mathematics (number theory, harmonic analysis, quantum mechanics, signal processing) came later.

Sturm and Liouville's 1836 papers ^{[Sturm 1836]} ^{[Liouville 1836]} in the Journal de Mathématiques Pures et Appliquées gave the first systematic treatment of the eigenvalue problem for second-order linear ODEs with variable coefficients, the Sturm-Liouville theory. Their motivation was the heat equation on bounded domains with general geometry: separation of variables reduces the PDE to an ODE eigenvalue problem, and the Sturm-Liouville theory guarantees the existence of a complete orthogonal basis of eigenfunctions in $L^{2}$, generalising the Fourier-sine basis on the interval to arbitrary Sturm-Liouville problems. Sturm's oscillation theorem (the $k$-th eigenfunction has exactly $k - 1$ interior zeros, and the zeros of consecutive eigenfunctions interlace) is one of the foundational results of spectral theory.

Tikhonov's 1935 Matematicheskii Sbornik paper ^{[Tikhonov 1935]} gave the explicit non-uniqueness counterexample for the Cauchy problem without growth restrictions. Tikhonov's example was a landmark: it showed that the natural-looking PDE $u_{t} - u_{xx} = 0$ on $R \times (- \infty, \infty)$ with zero initial data has infinitely many smooth solutions if one drops the boundedness assumption. The example exhibits the kind of growth at which uniqueness fails: super-exponential in $∣ x ∣$ for any fixed $t > 0$, at a rate that is not uniform as $t \to 0^{+}$. Widder's 1944 Trans. Amer. Math. Soc. paper ^{[Widder 1944]} proved the matching uniqueness theorem: uniqueness holds for solutions in the growth class $∣ u ∣ \leq C e^{a ∣ x ∣^{2}}$ with $a < 1/ (4 T)$. The Tikhonov-Widder pair frames the modern understanding of well-posedness for the heat-equation Cauchy problem: the growth class matters, the critical growth is Gaussian in $∣ x ∣$, and the boundedness assumption that mathematicians take for granted is not gratuitous but a precise technical hypothesis at the edge of the uniqueness theorem.

Einstein's 1905 Annalen der Physik paper on Brownian motion ^{[Einstein 1905]} showed that the heat equation governs the position density of a Brownian particle suspended in a fluid. Einstein derived the diffusion constant from kinetic theory and the equipartition of energy, giving the Stokes-Einstein relation $D = k_{B} T / (6 π η r)$ . Jean Perrin's experimental verification (1909, Nobel Prize 1926) of Einstein's predictions for Brownian motion gave the first direct measurement of Avogadro's number and was the decisive confirmation of the atomic-molecular theory of matter. The heat equation, originally a tool for predicting how heat diffuses in iron rods, turned out to be the precise mathematical statement of the existence of atoms.

Wiener's 1923 Journal of Mathematics and Physics paper ^{[Wiener 1923]} gave the first rigorous construction of Brownian motion as a stochastic process on the path space of continuous functions $C ([0, T]; R^{n})$ . Wiener constructed what is now called Wiener measure on this path space: a probability measure on continuous paths starting at the origin, with the property that the finite-dimensional marginal distributions are exactly those of an $n$ -dimensional Brownian motion. The heat kernel appears as the transition density of the process. Wiener's construction was the foundation of modern stochastic analysis and the bridge between probability theory and PDE theory.

Kac's 1949 Transactions of the American Mathematical Society paper ^{[Kac 1949]} gave the Feynman-Kac formula representing solutions of the heat equation with potential as path integrals against Wiener measure. The formula is the probabilistic analogue of the variation-of-parameters formula in ODE theory, and is the bridge between the analyst's view of the heat equation (solve via Fourier transform and convolution) and the probabilist's view (compute the expected value of a functional of Brownian paths). The Wick rotation $t \to i t$ converts the Feynman-Kac formula into Feynman's path-integral formula for quantum mechanics (Feynman 1948), making the heat equation and the Schrödinger equation two analytic-continuation versions of the same mathematical structure.

Nash 1958 American Journal of Mathematics ^{[Nash 1958]} and De Giorgi 1957 Mem. Accad. Sci. Torino ^{[De Giorgi 1957]} independently proved the Hölder regularity of solutions of divergence-form equations with bounded measurable coefficients (De Giorgi for elliptic equations, Nash for both parabolic and elliptic equations), the foundational result of modern PDE regularity theory. Nash's method tracked the entropy and moments of the fundamental solution, an argument with a statistical-mechanical flavour; De Giorgi's was a deterministic iteration scheme converting $L^{2}$ control into Hölder control. Moser 1964 ^{[Moser 1964]} recovered both results via the parabolic Harnack inequality. The Nash-De Giorgi-Moser apparatus is the keystone of the modern theory of partial differential equations with rough coefficients, underlying the theory of homogenisation, the modern theory of stochastic differential equations with non-smooth coefficients, and analysis on metric-measure spaces.

Aronson's 1967 Bulletin of the AMS paper ^{[Aronson 1967]} proved two-sided Gaussian bounds on the fundamental solution of divergence-form parabolic equations with bounded measurable coefficients, sharpening the Nash-De Giorgi-Moser regularity into pointwise estimates. The Aronson bounds say that the fundamental solution of an arbitrary uniformly parabolic equation in divergence form with rough coefficients is bounded above and below by Gaussians of heat-kernel type, with constants depending only on the dimension and the ellipticity bounds. The result is the parabolic analogue of the Green-function pointwise bounds for the Laplacian and is a basic tool of modern parabolic theory.
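In symbols, the two-sided bound has the following shape. The notation is an assumption introduced for illustration: $\Gamma(x, t; y, s)$ denotes the fundamental solution of $\partial_{t} u = \operatorname{div}(A(x, t) \nabla u)$ on $\mathbb{R}^{n}$, where $A$ is measurable with ellipticity constants $0 < \lambda \le \Lambda$, and $c_{1}, c_{2}, C_{1}, C_{2}$ are positive constants depending only on $n$, $\lambda$, and $\Lambda$.

$$
\frac{c_{1}}{(t - s)^{n/2}} \exp\left( - \frac{C_{1} |x - y|^{2}}{t - s} \right)
\le \Gamma(x, t; y, s) \le
\frac{c_{2}}{(t - s)^{n/2}} \exp\left( - \frac{C_{2} |x - y|^{2}}{t - s} \right),
\qquad t > s.
$$

For the constant-coefficient case $A = I$ the fundamental solution is the heat kernel $(4 \pi (t - s))^{-n/2} \exp(- |x - y|^{2} / (4 (t - s)))$, which satisfies both inequalities with equality when $c_{1} = c_{2} = (4 \pi)^{-n/2}$ and $C_{1} = C_{2} = 1/4$.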

The heat equation now appears as the foundational example in essentially every textbook of partial differential equations and mathematical physics, and the heat-kernel apparatus underlies a vast range of modern mathematics: spectral geometry (heat-kernel expansions on Riemannian manifolds; Atiyah-Singer index theorem), geometric flows (Hamilton's Ricci flow; Perelman's resolution of the Poincaré conjecture), stochastic analysis (Itô's stochastic calculus; the Malliavin calculus; large deviations theory), quantum field theory (Wick rotation and the Schwinger-DeWitt expansion), and computational physics (finite-difference and finite-element methods for diffusion problems in engineering and science). The arc from Fourier's 1822 Théorie analytique de la chaleur to modern Ricci-flow theory is a two-century lineage in which the same equation, the same heat kernel, and the same convolution recipe have been continuously refined into ever more general and ever more powerful tools.

Bibliography Master

@book{Fourier1822,
  author    = {Fourier, Jean Baptiste Joseph},
  title     = {Th\'eorie analytique de la chaleur},
  publisher = {Firmin Didot},
  address   = {Paris},
  year      = {1822}
}

@article{Sturm1836,
  author  = {Sturm, Charles},
  title   = {M\'emoire sur les \'equations diff\'erentielles lin\'eaires du second ordre},
  journal = {Journal de Math\'ematiques Pures et Appliqu\'ees},
  volume  = {1},
  year    = {1836},
  pages   = {106--186}
}

@article{Liouville1836,
  author  = {Liouville, Joseph},
  title   = {Sur le d\'eveloppement des fonctions ou parties de fonctions en s\'eries},
  journal = {Journal de Math\'ematiques Pures et Appliqu\'ees},
  volume  = {1},
  year    = {1836},
  pages   = {253--265}
}

@article{Tikhonov1935,
  author  = {Tikhonov, Andrey N.},
  title   = {Th\'eor\`emes d'unicit\'e pour l'\'equation de la chaleur},
  journal = {Matematicheskii Sbornik (N.S.)},
  volume  = {42},
  year    = {1935},
  pages   = {199--216}
}

@article{Widder1944,
  author  = {Widder, David V.},
  title   = {Positive temperatures on an infinite rod},
  journal = {Transactions of the American Mathematical Society},
  volume  = {55},
  year    = {1944},
  pages   = {85--95}
}

@article{Nash1958,
  author  = {Nash, John},
  title   = {Continuity of solutions of parabolic and elliptic equations},
  journal = {American Journal of Mathematics},
  volume  = {80},
  year    = {1958},
  pages   = {931--954}
}

@article{DeGiorgi1957,
  author  = {De Giorgi, Ennio},
  title   = {Sulla differenziabilit\`a e l'analiticit\`a delle estremali degli integrali multipli regolari},
  journal = {Memorie dell'Accademia delle Scienze di Torino. Classe di Scienze Fisiche, Matematiche e Naturali. Serie 3},
  volume  = {3},
  year    = {1957},
  pages   = {25--43}
}

@article{Moser1964,
  author  = {Moser, J\"urgen},
  title   = {A {H}arnack inequality for parabolic differential equations},
  journal = {Communications on Pure and Applied Mathematics},
  volume  = {17},
  year    = {1964},
  pages   = {101--134}
}

@article{Aronson1967,
  author  = {Aronson, Donald G.},
  title   = {Bounds for the fundamental solution of a parabolic equation},
  journal = {Bulletin of the American Mathematical Society},
  volume  = {73},
  year    = {1967},
  pages   = {890--896}
}

@article{Einstein1905,
  author  = {Einstein, Albert},
  title   = {\"Uber die von der molekularkinetischen Theorie der W\"arme geforderte Bewegung von in ruhenden Fl\"ussigkeiten suspendierten Teilchen},
  journal = {Annalen der Physik},
  volume  = {17},
  year    = {1905},
  pages   = {549--560}
}

@article{Wiener1923,
  author  = {Wiener, Norbert},
  title   = {Differential-space},
  journal = {Journal of Mathematics and Physics},
  volume  = {2},
  year    = {1923},
  pages   = {131--174}
}

@article{Kac1949,
  author  = {Kac, Mark},
  title   = {On distributions of certain {W}iener functionals},
  journal = {Transactions of the American Mathematical Society},
  volume  = {65},
  year    = {1949},
  pages   = {1--13}
}

@book{Evans2010,
  author    = {Evans, Lawrence C.},
  title     = {Partial Differential Equations},
  edition   = {2},
  publisher = {American Mathematical Society},
  series    = {Graduate Studies in Mathematics},
  volume    = {19},
  year      = {2010}
}

@book{John1982,
  author    = {John, Fritz},
  title     = {Partial Differential Equations},
  edition   = {4},
  publisher = {Springer},
  year      = {1982}
}

@book{Strauss2008,
  author    = {Strauss, Walter A.},
  title     = {Partial Differential Equations: An Introduction},
  edition   = {2},
  publisher = {Wiley},
  year      = {2008}
}

@book{Friedman1964,
  author    = {Friedman, Avner},
  title     = {Partial Differential Equations of Parabolic Type},
  publisher = {Prentice-Hall},
  year      = {1964}
}

@book{Lieberman1996,
  author    = {Lieberman, Gary M.},
  title     = {Second Order Parabolic Differential Equations},
  publisher = {World Scientific},
  year      = {1996}
}

@book{StroockVaradhan1979,
  author    = {Stroock, Daniel W. and Varadhan, S. R. Srinivasa},
  title     = {Multidimensional Diffusion Processes},
  publisher = {Springer},
  series    = {Grundlehren der mathematischen Wissenschaften},
  volume    = {233},
  year      = {1979}
}

Prerequisites

02.07.04
02.13.01
02.13.02

Tier anchors

beginner: Strauss, Partial Differential Equations: An Introduction, 2e (Wiley 2008), §2-§3; physics-anchored diffusion of heat in a metal bar; Fourier 1822 Théorie analytique de la chaleur (originator: Fourier-series solution of the heat equation in a slab and a ring)
intermediate: Evans, Partial Differential Equations, 2e (AMS GSM 19, 2010), §2.3; Strauss §2-§3; John, Partial Differential Equations, 4e (Springer 1982), §7
master: Evans §2.3; John §7; Friedman, Partial Differential Equations of Parabolic Type (Prentice-Hall 1964); Lieberman, Second Order Parabolic Differential Equations (World Scientific 1996); Stroock-Varadhan, Multidimensional Diffusion Processes (Springer 1979)

References

Fourier — Théorie analytique de la chaleur · Firmin Didot (Paris 1822); originator: heat equation, Fourier-series solution, Fourier integral
Sturm — Mémoire sur les équations différentielles linéaires du second ordre · Journal de Mathématiques Pures et Appliquées 1 (1836), 106-186
Liouville — Sur le développement des fonctions ou parties de fonctions en séries · Journal de Mathématiques Pures et Appliquées 1 (1836), 253-265
Tikhonov — Théorèmes d'unicité pour l'équation de la chaleur · Matematicheskii Sbornik (N.S.) 42 (1935), 199-216
Widder — Positive temperatures on an infinite rod · Transactions of the American Mathematical Society 55 (1944), 85-95
Nash — Continuity of solutions of parabolic and elliptic equations · American Journal of Mathematics 80 (1958), 931-954
De Giorgi — Sulla differenziabilità e l'analiticità delle estremali degli integrali multipli regolari · Memorie dell'Accademia delle Scienze di Torino. Classe di Scienze Fisiche, Matematiche e Naturali, Serie 3, 3 (1957), 25-43
Moser — A Harnack inequality for parabolic differential equations · Communications on Pure and Applied Mathematics 17 (1964), 101-134
Aronson — Bounds for the fundamental solution of a parabolic equation · Bulletin of the American Mathematical Society 73 (1967), 890-896
Einstein — Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen · Annalen der Physik 17 (1905), 549-560
Wiener — Differential-space · Journal of Mathematics and Physics 2 (1923), 131-174
Kac — On distributions of certain Wiener functionals · Transactions of the American Mathematical Society 65 (1949), 1-13
Evans — Partial Differential Equations, 2e · AMS Graduate Studies in Mathematics 19 (2010), §2.3
John — Partial Differential Equations, 4e · Springer (1982), §7
Strauss — Partial Differential Equations: An Introduction, 2e · Wiley (2008), §2-§3
Friedman — Partial Differential Equations of Parabolic Type · Prentice-Hall (1964)
Lieberman — Second Order Parabolic Differential Equations · World Scientific (1996)
Stroock-Varadhan — Multidimensional Diffusion Processes · Springer Grundlehren 233 (1979)

Estimated time

beginner: 25m
intermediate: 60m
master: 100m