02.07.10 · analysis / measure-theory

Rademacher's theorem

shipped · 3 tiers · Lean: none

Anchor (Master): Federer 1969 Geometric Measure Theory (Springer) §3.1.6; Evans-Gariepy 2015 Measure Theory and Fine Properties of Functions (CRC) Ch. 3; Maly-Ziemer 1997 Fine Regularity of Solutions of Elliptic PDE §1

Intuition Beginner

A function is Lipschitz when there is a fixed speed limit on how fast its output can change relative to its input: move the input a little, and the output moves at most a fixed multiple of that little. The absolute-value function on the real line is Lipschitz with speed limit one — its graph rises and falls at slope plus or minus one, never steeper. Rademacher's theorem says something surprising about every such speed-limited function: even though it might have sharp corners, it has a well-defined slope at almost every point.

The word "almost" is doing careful work. The absolute-value function has no slope at the single point where its corner sits, but a single point is negligibly small — it has zero length. Everywhere else the slope is plain. Rademacher's theorem promises that for any Lipschitz function, even one with infinitely many corners, the set of bad points where the slope fails to exist is negligible in the sense of measure.

Why care? A speed limit on a function looks like a weak assumption — it allows corners, kinks, and ridges. The theorem says this weak assumption secretly buys you a strong conclusion: a genuine derivative almost everywhere. That bargain is the foundation for doing calculus with rough functions, the kind that arise when you measure surface areas of jagged shapes or change variables under maps that are merely Lipschitz rather than smooth.

Visual Beginner

Picture the graph of a Lipschitz function in one variable as a path of a hiker who is forbidden from walking too steeply: the path can have sharp ridges and valleys, but its steepness is capped. At a smooth stretch the hiker has a clear instantaneous heading. At a sharp ridge the heading is ambiguous — left-going and right-going slopes disagree. Rademacher's theorem says these ambiguous ridge points are so sparse that their total length is zero.

In two variables the picture becomes a creased surface, like a tent with fold lines. Across the smooth panels the surface has a tangent plane; along the fold lines it does not. Rademacher's theorem says the fold lines, being lower-dimensional, occupy zero area, so the tangent plane exists almost everywhere on the tent.

Worked example Beginner

Take the function $f (x, y) = ∣ x ∣ + ∣ y ∣$ on the plane. It is Lipschitz: moving the input by a small step changes the output by at most the size of that step times a fixed constant. We find where its slope is well defined.

Step 1. Away from the two axis lines, both $∣ x ∣$ and $∣ y ∣$ are plain smooth pieces. If $x > 0$ and $y > 0$ , then $f (x, y) = x + y$ , a flat tilted plane with a clear tangent. The same holds in each of the four open quadrants, where $f$ is one of $\pm x \pm y$ .

Step 2. On the line $x = 0$ the piece $∣ x ∣$ has a corner, and on the line $y = 0$ the piece $∣ y ∣$ has a corner. So the slope can fail only on these two lines.

Step 3. Measure the bad set. The two axis lines together form a cross. A line in the plane has zero area — you can cover it by a strip of any tiny width you like. So the cross has zero area.

Step 4. Conclude. The slope of $f$ is well defined at every point off the cross, and the cross has zero area. So $f$ has a well-defined slope at almost every point of the plane: the only exceptions lie on a negligible cross.

What this tells us: a function built from absolute values has plenty of corners, but the corners collect along thin lower-dimensional sets of zero area, and the slope survives everywhere else. That is Rademacher's theorem in a hand-checkable case.
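The Lipschitz claim and the slopes found in Steps 1 to 4 can be made explicit. The following is a minimal sketch; the points $p = (x_{1}, y_{1})$ , $q = (x_{2}, y_{2})$ and the sign function $\operatorname{sgn}$ are notation introduced here for illustration and do not appear in the example above.

```latex
% Lipschitz bound for f(x, y) = |x| + |y|, with p = (x_1, y_1), q = (x_2, y_2).
\begin{align*}
|f(p) - f(q)|
  &\le \bigl|\, |x_1| - |x_2| \,\bigr| + \bigl|\, |y_1| - |y_2| \,\bigr| \\
  &\le |x_1 - x_2| + |y_1 - y_2| \\
  &\le \sqrt{2}\, \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
   = \sqrt{2}\, |p - q| .
\end{align*}
% Slope off the cross {x = 0} \cup {y = 0}: one constant gradient per quadrant.
\[
\nabla f(x, y) = (\operatorname{sgn} x,\ \operatorname{sgn} y),
\qquad
|\nabla f(x, y)| = \sqrt{2}
\quad \text{for } x \neq 0,\ y \neq 0 .
\]
```

Since the gradient has length $\sqrt{2}$ in every open quadrant, the constant $\sqrt{2}$ in the bound cannot be lowered.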

Check your understanding Beginner

Exercise (easy, multiple choice).

Rademacher's theorem guarantees that a Lipschitz function on $R^{n}$ is differentiable:

A. Everywhere
B. At almost every point (the bad set has measure zero)
C. Only at points where it is smooth in the classical sense
D. Nowhere, in general

Hint

Think about the absolute-value function: where does its slope fail, and how big is that bad set?

Answer

B. At almost every point. The bad set where the derivative fails to exist has Lebesgue measure zero. Feedback-correct: this is the exact content of Rademacher's theorem — Lipschitz buys differentiability off a null set, not everywhere. Feedback-wrong: A is false because corners exist (absolute value at the origin); C is too restrictive — a Lipschitz function can be differentiable at points where it is not classically smooth; D is wrong because the bad set is small, not all of space.

Formal definition Intermediate+

Let $U \subseteq R^{n}$ be open. A map $f : U \to R^{m}$ is Lipschitz with constant $L \geq 0$ when $∣ f (x) - f (y) ∣ \leq L ∣ x - y ∣$ for all $x, y \in U$ ; it is locally Lipschitz when each point of $U$ has a neighbourhood on which $f$ is Lipschitz. The smallest admissible $L$ on a set is the Lipschitz constant $Lip (f)$ .

Definition (directional derivative). For $v \in R^{n}$ and $x \in U$ , the directional derivative of $f$ at $x$ in direction $v$ is $\partial_{v} f (x) = \lim_{t \to 0} \frac{f ( x + t v ) - f ( x )}{t},$ when the limit exists. For the standard basis vectors $e_{i}$ this gives the partial derivatives $\partial_{e_{i}} f (x)$ , assembled into the gradient $\nabla f (x) = (\partial_{e_{1}} f (x), \dots, \partial_{e_{n}} f (x))$ when $f$ is scalar-valued.

Definition (total differentiability). The map $f$ is (totally) differentiable at $x$ when there is a linear map $D f (x) : R^{n} \to R^{m}$ with $\lim_{y \to x} \frac{∣ f ( y ) - f ( x ) - D f ( x ) ( y - x ) ∣}{∣ y - x ∣} = 0.$ Total differentiability is strictly stronger than the existence of all directional derivatives: it requires a single linear map approximating $f$ uniformly over all directions, not merely a slope along each line.

Throughout, $L^{n}$ denotes Lebesgue measure on $R^{n}$ , and "a.e." means outside a set of $L^{n}$ -measure zero.

Counterexamples to common slips Intermediate+

Existence of all directional derivatives does not give total differentiability. The function $g (x, y) = x^{3} / (x^{2} + y^{2})$ for $(x, y) \neq (0, 0)$ and $g (0, 0) = 0$ has every directional derivative at the origin but is not differentiable there. Rademacher's force is precisely that for Lipschitz maps this gap closes a.e.
Continuity is not enough. The Weierstrass function on the line is uniformly continuous yet nowhere differentiable; the Lipschitz hypothesis is what powers the conclusion. Hölder continuity of exponent $α < 1$ is likewise insufficient.
"A.e." cannot be improved to "everywhere". The exceptional null set can be nonempty and even uncountable: the absolute value is the one-point prototype, and on the line the distance function to the middle-thirds Cantor set is Lipschitz yet non-differentiable at every point of that null set. The theorem is sharp at the level of the null exceptional set.
The a.e. gradient is the weak gradient. For Lipschitz $f$ , the pointwise $\nabla f$ defined a.e. coincides with the distributional gradient, placing $f$ in $W_{loc}^{1, \infty}$ . The two notions agree because integration by parts against test functions sees only the a.e. values.
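The first counterexample in the list above can be checked by hand. The computation below is a sketch under the definition of $\partial_{v}$ given earlier; the test direction $(1, 1)$ is a choice made here for illustration.

```latex
% Directional derivatives of g(x, y) = x^3 / (x^2 + y^2) at the origin.
\begin{align*}
\partial_v g(0,0)
  &= \lim_{t \to 0} \frac{g(t v_1, t v_2)}{t}
   = \lim_{t \to 0} \frac{t^3 v_1^3}{t \cdot t^2 (v_1^2 + v_2^2)}
   = \frac{v_1^3}{v_1^2 + v_2^2},
  \qquad v = (v_1, v_2) \neq (0, 0), \\
\partial_{e_1} g(0,0) &= 1, \qquad
\partial_{e_2} g(0,0) = 0, \qquad
\nabla g(0,0) = (1, 0), \\
\partial_{(1,1)} g(0,0) &= \tfrac{1}{2}
  \;\neq\; 1 = (1, 1) \cdot \nabla g(0,0) .
\end{align*}
```

The map $v \mapsto \partial_{v} g (0, 0)$ is not linear, so no differential can exist at the origin. Because $g$ is homogeneous of degree one and smooth away from the origin, its gradient is bounded there, so $g$ is itself Lipschitz; the origin is simply one of the null exceptional points that Rademacher's theorem permits.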

Key theorem with proof Intermediate+

Theorem (Rademacher 1919). Let $f : R^{n} \to R^{m}$ be locally Lipschitz. Then $f$ is differentiable at $L^{n}$ -almost every point of $R^{n}$ . At each point of differentiability the differential acts by $D f (x) v = \partial_{v} f (x)$ ; for scalar-valued $f$ this reads $\partial_{v} f (x) = v \cdot \nabla f (x)$ , where $\nabla f$ is the a.e.-defined gradient.

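Proof. It suffices to treat $m = 1$ : a vector-valued map is differentiable at $x$ exactly when each scalar component is, and a finite intersection of full-measure sets is full measure. Differentiability is also a local property: $R^{n}$ is a countable union of open balls, on each of which $f$ is Lipschitz, and the restriction of $f$ to such a ball extends to a globally Lipschitz function (McShane extension) that agrees with $f$ on the ball. So assume $f : R^{n} \to R$ is Lipschitz with constant $L$ .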

Step 1 (directional derivatives exist a.e.). Fix a unit vector $v$ . For each $x$ , the function $t \mapsto f (x + t v)$ is Lipschitz on $R$ , hence absolutely continuous, hence differentiable for $L^{1}$ -a.e. $t$ by the a.e.-differentiability of monotone and absolutely continuous functions 02.07.05. Let $A_{v} = \{x : \partial_{v} f (x) \text{ fails to exist}\}$ . This set is Borel, since it is where the upper and lower limits of the continuous difference quotients $(f (x + t v) - f (x)) / t$ as $t \to 0$ disagree. Restricting to lines parallel to $v$ and using that $A_{v}$ meets each such line in an $L^{1}$ -null set, Fubini's theorem (the integral of the line-wise null indicator vanishes) gives $L^{n} (A_{v}) = 0$ . So $\partial_{v} f$ exists a.e., and $∣ \partial_{v} f ∣ \leq L$ a.e. since $f$ is Lipschitz.

Step 2 (the gradient and the linearity identity). Apply Step 1 to the basis directions $e_{1}, \dots, e_{n}$ : the gradient $\nabla f (x) = (\partial_{e_{1}} f (x), \dots, \partial_{e_{n}} f (x))$ exists a.e. Fix a unit vector $v$ . For any test function $φ \in C_{c}^{\infty} (R^{n})$ , the difference-quotient functions $\frac{f ( x + t v ) - f ( x )}{t}$ are bounded by $L$ in absolute value and converge a.e. to $\partial_{v} f (x)$ as $t \to 0$ . The dominated convergence theorem 02.07.05 with the constant dominator $L ∣ φ ∣$ lets us pass the limit through the integral: $\int_{R^{n}} \partial_{v} f (x) φ (x) d x = \lim_{t \to 0} \int_{R^{n}} \frac{f ( x + t v ) - f ( x )}{t} φ (x) d x = - \int_{R^{n}} f (x) \partial_{v} φ (x) d x,$ the last equality by the change of variables $x \mapsto x - t v$ inside the first integral term and recognising the difference quotient of $φ$ in the limit. Since $\partial_{v} φ = v \cdot \nabla φ = \sum_{i} v_{i} \partial_{e_{i}} φ$ , the right-hand side equals $- \sum_{i} v_{i} \int f \partial_{e_{i}} φ = \sum_{i} v_{i} \int \partial_{e_{i}} f φ = \int (v \cdot \nabla f) φ$ . As $φ$ was arbitrary, $\partial_{v} f (x) = v \cdot \nabla f (x)$ for a.e. $x$ .
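The change of variables used in Step 2 can be spelled out line by line. This is a sketch of the standard discrete integration by parts, valid for each fixed $t \neq 0$ ; it uses only the translation invariance of Lebesgue measure and the compact support of $φ$ .

```latex
% Move the difference quotient from f onto the test function \varphi.
\begin{align*}
\int_{\mathbb{R}^n} \frac{f(x + t v) - f(x)}{t}\, \varphi(x)\, dx
  &= \frac{1}{t} \int_{\mathbb{R}^n} f(x + t v)\, \varphi(x)\, dx
   - \frac{1}{t} \int_{\mathbb{R}^n} f(x)\, \varphi(x)\, dx \\
  % substitute x -> x - t v in the first integral only
  &= \frac{1}{t} \int_{\mathbb{R}^n} f(x)\, \varphi(x - t v)\, dx
   - \frac{1}{t} \int_{\mathbb{R}^n} f(x)\, \varphi(x)\, dx \\
  &= - \int_{\mathbb{R}^n} f(x)\, \frac{\varphi(x) - \varphi(x - t v)}{t}\, dx
  \;\xrightarrow[t \to 0]{}\;
   - \int_{\mathbb{R}^n} f(x)\, \partial_v \varphi(x)\, dx .
\end{align*}
```

The final limit holds because the difference quotient of the smooth, compactly supported $φ$ converges uniformly to $\partial_{v} φ$ , while $f$ is locally bounded.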

Step 3 (countable dense directions). Choose a countable dense set $\{v_{k}\}$ in the unit sphere $S^{n - 1}$ . Let $E$ be the full-measure set on which $\nabla f$ exists and the identity $\partial_{v_{k}} f (x) = v_{k} \cdot \nabla f (x)$ holds simultaneously for every $k$ (a countable intersection of full-measure sets is full measure). Fix $x \in E$ .

Step 4 (promotion to total differentiability). For $v \in S^{n - 1}$ and $t \neq 0$ , define $Q (x, v, t) = \frac{f ( x + t v ) - f ( x )}{t} - v \cdot \nabla f (x) .$ For each fixed $v$ and $v^{'}$ , the Lipschitz bound gives $∣ Q (x, v, t) - Q (x, v^{'}, t) ∣ \leq \frac{∣ f ( x + t v ) - f ( x + t v ^{'} ) ∣}{∣ t ∣} + ∣ (v - v^{'}) \cdot \nabla f (x) ∣ \leq (L + ∣\nabla f (x) ∣) ∣ v - v^{'} ∣.$ So $v \mapsto Q (x, v, t)$ is Lipschitz uniformly in $t$ . Given $ε > 0$ , pick finitely many $v_{k_{1}}, \dots, v_{k_{N}}$ from the dense set so every $v \in S^{n - 1}$ lies within $ε$ of some $v_{k_{j}}$ . For each $v_{k_{j}}$ , $Q (x, v_{k_{j}}, t) \to 0$ as $t \to 0$ by the choice of $E$ , so there is $δ > 0$ with $∣ Q (x, v_{k_{j}}, t) ∣ < ε$ for all $j$ and $0 < ∣ t ∣ < δ$ . For general $v$ , choosing the nearest $v_{k_{j}}$ gives $∣ Q (x, v, t) ∣ \leq ∣ Q (x, v_{k_{j}}, t) ∣ + (L + ∣\nabla f (x) ∣) ε < ε (1 + L + ∣\nabla f (x) ∣)$ for $0 < ∣ t ∣ < δ$ . Since $ε$ was arbitrary, $Q (x, v, t) \to 0$ uniformly in $v$ , which is exactly the statement that $f$ is differentiable at $x$ with differential $v \mapsto v \cdot \nabla f (x)$ . As $E$ has full measure, $f$ is differentiable a.e. $□$

Bridge. Rademacher's theorem builds toward the area and coarea formulas 02.07.11, where the a.e. existence of the Jacobian of a Lipschitz map is the foundational reason the non-smooth change-of-variables integrals are even well posed; the bridge is that an a.e.-defined differential is enough to integrate against, since integration sees only a.e. values. The central insight is the two-stage reduction: a.e. differentiability along lines 02.07.05 is upgraded to a.e. total differentiability by a Fubini-plus-dense-directions argument, and this is exactly the mechanism that generalises to the Stepanov theorem and to Lipschitz maps between metric measure spaces. Putting these together, the result places Lipschitz functions inside $W^{1, \infty}$ with pointwise gradient equal to the weak gradient, which appears again in the rectifiability theory of currents where Lipschitz parametrisations of rectifiable sets carry a.e.-defined tangent planes. The same pattern generalises to the Sobolev and BV settings, where a control on a derivative in an integral sense forces pointwise differentiability off a small set.

Exercises Intermediate+

Exercise 3 (medium, symbolic).

Show that the distance function $d (x) = dist (x, C) = \inf_{c \in C} ∣ x - c ∣$ to a nonempty closed set $C \subseteq R^{n}$ is Lipschitz with constant $1$ , hence differentiable a.e. by Rademacher.

Hint

For any $x, y$ and any $c \in C$ , bound $d (x)$ by $∣ x - c ∣ \leq ∣ x - y ∣ + ∣ y - c ∣$ and take the infimum over $c$ .

Answer

Fix $x, y \in R^{n}$ . For every $c \in C$ , the triangle inequality gives $∣ x - c ∣ \leq ∣ x - y ∣ + ∣ y - c ∣$ . Taking the infimum over $c \in C$ on both sides yields $d (x) \leq ∣ x - y ∣ + d (y)$ , so $d (x) - d (y) \leq ∣ x - y ∣$ . By symmetry $d (y) - d (x) \leq ∣ x - y ∣$ , hence $∣ d (x) - d (y) ∣ \leq ∣ x - y ∣$ . So $d$ is Lipschitz with constant $1$ .

By Rademacher's theorem $d$ is differentiable at $L^{n}$ -a.e. point, with $∣\nabla d ∣ \leq 1$ a.e.; in fact $∣\nabla d ∣ = 1$ a.e. on $R^{n} ∖ C$ , where $\nabla d (x)$ points away from the nearest point of $C$ .
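The simplest instance of this exercise is the single-point set $C = \{0\}$ , chosen here purely for illustration, where everything can be computed in closed form.

```latex
% Distance to the origin: gradient of unit length off C, a corner on C.
\[
C = \{0\} \subset \mathbb{R}^n : \qquad
d(x) = |x|, \qquad
\nabla d(x) = \frac{x}{|x|}, \qquad
|\nabla d(x)| = 1 \quad (x \neq 0),
\]
\[
\lim_{t \to 0^{\pm}} \frac{d(t e_1) - d(0)}{t} = \pm 1
\quad \Longrightarrow \quad
d \text{ is not differentiable at } 0 .
\]
```

The gradient $x / ∣ x ∣$ points directly away from the nearest point of $C$ , and the exceptional set is the single point $0$ , which is $L^{n}$ -null.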

Exercise 4 (medium, symbolic).

Let $f : R^{n} \to R$ be Lipschitz. Prove that the pointwise gradient $\nabla f$ (defined a.e.) is the weak gradient, that is, $\int f \partial_{e_{i}} φ = - \int \partial_{e_{i}} f φ$ for all $φ \in C_{c}^{\infty} (R^{n})$ .

Hint

Use the difference-quotient identity from Step 2 of the key theorem with $v = e_{i}$ , justifying the limit interchange by dominated convergence with a constant dominator.

Answer

Fix $i$ and $φ \in C_{c}^{\infty}$ . The difference quotients $D_{t} f (x) = (f (x + t e_{i}) - f (x)) / t$ are bounded by $L$ uniformly and converge a.e. to $\partial_{e_{i}} f (x)$ as $t \to 0$ . A change of variables gives $\int D_{t} f (x) φ (x) d x = - \int f (x) \frac{φ ( x ) - φ ( x - t e _{i} )}{t} d x .$ The left integrand is dominated by $L ∣ φ ∣ \in L^{1}$ and converges a.e. to $\partial_{e_{i}} f φ$ ; the right difference quotient of $φ$ converges uniformly to $\partial_{e_{i}} φ$ on the compact support. Dominated convergence 02.07.05 on the left and uniform convergence on the right give $\int \partial_{e_{i}} f φ = - \int f \partial_{e_{i}} φ,$ so $\partial_{e_{i}} f$ is the weak partial derivative. As all weak partials are bounded, $f \in W_{loc}^{1, \infty}$ .
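As a sanity check of the identity just proved, one can verify it directly for $f (x) = ∣ x ∣$ on the line, whose pointwise derivative is $\operatorname{sgn} (x)$ off the origin. This one-dimensional example is an illustration added here and is not part of the exercise.

```latex
% Direct verification of \int f \varphi' = - \int f' \varphi for f(x) = |x|.
\begin{align*}
\int_{\mathbb{R}} |x|\, \varphi'(x)\, dx
  &= \int_0^{\infty} x\, \varphi'(x)\, dx
   - \int_{-\infty}^{0} x\, \varphi'(x)\, dx \\
  % integrate by parts on each half-line; boundary terms vanish because
  % \varphi has compact support and x \varphi(x) = 0 at x = 0
  &= \Bigl( \bigl[ x \varphi \bigr]_0^{\infty} - \int_0^{\infty} \varphi\, dx \Bigr)
   - \Bigl( \bigl[ x \varphi \bigr]_{-\infty}^{0} - \int_{-\infty}^{0} \varphi\, dx \Bigr) \\
  &= - \int_0^{\infty} \varphi\, dx + \int_{-\infty}^{0} \varphi\, dx
   = - \int_{\mathbb{R}} \operatorname{sgn}(x)\, \varphi(x)\, dx .
\end{align*}
```

The corner at the origin contributes nothing, which is the concrete meaning of "integration sees only a.e. values".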

Exercise 5 (medium, symbolic).

Prove the a.e. chain rule: if $f : R^{n} \to R^{m}$ is Lipschitz and $g : R^{m} \to R$ is $C^{1}$ , then $g \circ f$ is Lipschitz and $\nabla (g \circ f) (x) = (D g) (f (x)) \cdot D f (x)$ for a.e. $x$ .

Hint

Composition of Lipschitz with $C^{1}$ -on-compacts is Lipschitz on bounded sets. Apply the classical chain rule at points where both $f$ is differentiable (a.e. by Rademacher) and $g$ is differentiable (everywhere).

Answer

On any bounded set $B$ , the image $f (B)$ is bounded, so it lies in a closed ball on which the gradient of the $C^{1}$ function $g$ is bounded; hence $g$ is Lipschitz on that ball. The composition of Lipschitz maps is Lipschitz, so $g \circ f$ is locally Lipschitz and differentiable a.e. by Rademacher.

Let $E$ be the full-measure set where $f$ is totally differentiable. Fix $x \in E$ . Since $g$ is differentiable at $f (x)$ and $f$ is differentiable at $x$ , the classical chain rule for compositions of differentiable maps applies at $x$ : $D (g \circ f) (x) = (D g) (f (x)) \circ D f (x),$ which in gradient form reads $\nabla (g \circ f) (x) = (D g) (f (x)) \cdot D f (x)$ . As $E$ is full measure, the identity holds a.e. The point of the exercise is that Rademacher supplies the a.e. differentiability of the inner Lipschitz map, the only ingredient the classical chain rule lacks.
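A one-variable instance shows why the chain rule is only claimed a.e. The choice $f (x) = ∣ x ∣$ , $g (s) = s^{2}$ is made here for illustration.

```latex
% Chain rule off the corner of f, and what happens at the corner.
\[
f(x) = |x|, \quad g(s) = s^2 : \qquad
(g \circ f)'(x) = g'(f(x))\, f'(x)
  = 2|x| \operatorname{sgn}(x) = 2x
\quad (x \neq 0),
\]
\[
(g \circ f)(x) = x^2
\quad \Longrightarrow \quad
(g \circ f)'(0) = 0
\ \text{ even though } f'(0) \text{ does not exist.}
\]
```

The formula is valid on the full-measure set where $f$ is differentiable; at the remaining point the composition happens to be differentiable anyway, but the chain rule is not what shows it.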

Exercise 6 (hard, symbolic).

Prove that a Lipschitz function $f : R^{n} \to R$ whose a.e. gradient vanishes ( $\nabla f = 0$ a.e.) is constant.

Hint

Use the weak-gradient identification from Exercise 4 together with mollification, or argue along lines using absolute continuity on lines.

Answer

By Exercise 4, the a.e. gradient is the weak gradient, so the distributional gradient of $f$ is zero. Mollify: let $f_{ε} = f * ρ_{ε}$ with a standard mollifier $ρ_{ε}$ . Then $\nabla f_{ε} = (\nabla f) * ρ_{ε} = 0$ , so each $f_{ε}$ is a smooth function with zero gradient, hence constant $c_{ε}$ on each connected component (all of $R^{n}$ here). As $ε \to 0$ , $f_{ε} \to f$ locally uniformly (since $f$ is continuous), so $f = lim_{ε} c_{ε}$ is constant.

Alternatively, argue on lines: for a.e. line parallel to $e_{i}$ , the restriction is absolutely continuous with a.e. derivative $\partial_{e_{i}} f = 0$ , hence constant along that line; combining the coordinate directions and using continuity of $f$ forces $f$ constant. Both routes lean on Rademacher only through the a.e. gradient being the genuine derivative along almost every line.

Exercise 7 (hard, open-ended).

Sketch how Rademacher's theorem fails to extend verbatim to infinite dimensions, and name a class of Banach spaces where an analogue (a.e. Gâteaux differentiability of Lipschitz maps in the sense of Gaussian-null exceptional sets) does hold.

Hint

Lebesgue measure has no infinite-dimensional translation-invariant analogue, so "a.e." must be reinterpreted. Consider Aronszajn-null or Gauss-null sets and the role of the Radon-Nikodym property.

Answer

In infinite dimensions there is no nonzero translation-invariant locally finite measure, so the phrase "a.e." in the Lebesgue sense has no direct counterpart; the finite-dimensional Fubini reduction also breaks because one cannot integrate over uncountably many independent directions. The substitute is a notion of negligible set adapted to the geometry: Aronszajn-null / Gauss-null sets (countable unions of sets null along lines in each direction of a spanning sequence, or null for every nondegenerate Gaussian measure).

For Banach-space targets, the Radon-Nikodym property is the right hypothesis: a Lipschitz map from a separable Banach space into a space with the Radon-Nikodym property is Gâteaux differentiable outside a Gauss-null set (Aronszajn 1976; Christensen; Mankiewicz). Fréchet differentiability is far subtler and fails in general; sharp positive results (Lindenstrauss-Preiss-Tišer) require strong geometric assumptions on the domain. The finite-dimensional theorem is the clean special case where Lebesgue-null is the universal notion of negligibility.

Lean formalization Intermediate+

lean_status: none. Mathlib formalizes the one-variable building blocks — MonotoneOn.ae_differentiableWithinAt, the a.e. differentiability of monotone real functions, and LipschitzWith.ae_differentiableAt for one real variable via the Lebesgue differentiation theorem MeasureTheory.ae_tendsto_average — but it does not contain the multivariable Rademacher theorem: the a.e. total differentiability of Lipschitz maps f : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin m). The directional-derivative-plus-Fubini reduction, the dense-directions promotion to full differentiability, and the identification of the pointwise gradient with the distributional gradient (placing Lipschitz maps in W^{1,∞}) are the formalization targets. The statement one would register is below; the proof is the open gap.

import Mathlib.Analysis.Calculus.FDeriv.Basic
import Mathlib.MeasureTheory.Measure.Lebesgue.Basic
import Mathlib.Topology.MetricSpace.Lipschitz

open MeasureTheory

-- Target statement (proof is the Mathlib gap):
abbrev CodexRademacher
    {n m : ℕ} (f : (EuclideanSpace ℝ (Fin n)) → (EuclideanSpace ℝ (Fin m)))
    (L : NNReal) (hf : LipschitzWith L f) : Prop :=
  ∀ᵐ x ∂(volume : Measure (EuclideanSpace ℝ (Fin n))),
    DifferentiableAt ℝ f x

Advanced results Master

The theory around Rademacher's theorem splits into four strands: the Stepanov sharpening that drops the global Lipschitz hypothesis, the Lebesgue-point characterisation of the gradient, the second-order Alexandrov theorem for convex and semiconvex functions, and the metric-space generalisations that carry differentiability into settings without a linear structure.

Theorem 1 (Stepanov 1923). Let $f : R^{n} \to R^{m}$ be measurable. Then $f$ is differentiable at $L^{n}$ -a.e. point of the set $S_{f} = \{x : \limsup_{y \to x} \frac{∣ f ( y ) - f ( x ) ∣}{∣ y - x ∣} < \infty\}$ on which $f$ is pointwise Lipschitz. Rademacher is the special case $S_{f} = R^{n}$ ^{[Stepanov 1923]}.

The Stepanov theorem is proved by reducing to Rademacher: cover $S_{f}$ by countably many sets on which $f$ agrees with a globally Lipschitz function (built by a McShane-Kirszbraun extension of $f$ restricted to a level set of the local Lipschitz constant), apply Rademacher to each global extension, and intersect. Differentiability then transfers back to $f$ at almost every point of each covering set, namely at the density points of the set where the extension is differentiable.
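To see that the pointwise hypothesis is genuinely weaker than a local Lipschitz bound, consider the following one-variable function, chosen here as an illustration. It lies in the scope of Stepanov's hypothesis at every point, yet it is not Lipschitz on any neighbourhood of the origin.

```latex
% A function with S_f = R that is not locally Lipschitz at 0.
\[
f(x) = x^2 \sin\!\bigl(1/x^2\bigr) \ (x \neq 0), \qquad f(0) = 0,
\]
\[
\limsup_{y \to 0} \frac{|f(y) - f(0)|}{|y|}
  = \limsup_{y \to 0} \bigl| y \sin(1/y^2) \bigr| = 0 < \infty ,
\]
\[
f'(x) = 2x \sin\!\bigl(1/x^2\bigr) - \frac{2}{x} \cos\!\bigl(1/x^2\bigr)
\ (x \neq 0), \qquad
f'\!\Bigl(\tfrac{1}{\sqrt{2\pi k}}\Bigr) = -2\sqrt{2\pi k}
  \;\xrightarrow[k \to \infty]{}\; -\infty .
\]
```

Here $f$ happens to be differentiable everywhere, so the conclusion of the theorem is not in doubt; the example only illustrates that $S_{f} = R$ can hold where the locally Lipschitz hypothesis of Rademacher's theorem fails.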

Theorem 2 (Lebesgue-point gradient). For Lipschitz $f$ , at $L^{n}$ -a.e. $x$ the gradient is recovered as the blow-up limit $\nabla f (x) \cdot v = \lim_{r \to 0} \frac{1}{r} \fint_{B (x, r)} (f (y + r v) - f (y)) d y$ and $x$ is a Lebesgue point of $\nabla f$ . The differential is the a.e.-unique linear map approximating the blow-ups $f_{x, r} (z) = (f (x + r z) - f (x)) / r$ , which converge locally uniformly to the linear function $z \mapsto \nabla f (x) \cdot z$ .

Theorem 3 (Alexandrov 1939). A convex function $f : R^{n} \to R$ is twice differentiable $L^{n}$ -a.e.: at a.e. $x$ there is a symmetric matrix $D^{2} f (x)$ with $f (x + h) = f (x) + \nabla f (x) \cdot h + \frac{1}{2} h^{⊤} D^{2} f (x) h + o (∣ h ∣^{2})$ . Since convex functions are locally Lipschitz, Rademacher gives the a.e. first derivative; Alexandrov's theorem promotes this to a.e. second-order expansion using the monotonicity of the subdifferential ^{[Evans-Gariepy Ch. 6]}.
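A one-dimensional convex example, chosen here for illustration, shows what the second-order expansion looks like and what it does not record.

```latex
% f(x) = |x| on the line: exact expansion at every x \neq 0.
\[
|x + h| = |x| + \operatorname{sgn}(x)\, h
\quad \text{for } |h| < |x|,
\qquad \text{so} \quad
\nabla f(x) = \operatorname{sgn}(x), \quad D^2 f(x) = 0
\quad (x \neq 0) .
\]
% The distributional second derivative still sees the corner.
\[
f'' = 2\, \delta_0 \quad \text{in the sense of distributions.}
\]
```

The pointwise Hessian vanishes a.e. while the distributional second derivative is a point mass at the corner. Unlike the first-order case, the a.e. second derivative of a convex function need not coincide with its distributional one.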

Theorem 4 (metric differentiation, Kirchheim 1994). A Lipschitz map $f : R^{n} \to X$ into a metric space is metrically differentiable $L^{n}$ -a.e.: at a.e. $x$ there is a seminorm $md_{x} f$ on $R^{n}$ with $d_{X} (f (y), f (x)) = md_{x} f (y - x) + o (∣ y - x ∣)$ . This is the form of Rademacher that survives when the target has no linear structure, foundational for the Ambrosio-Kirchheim theory of metric currents.
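When the target is Euclidean, the metric differential reduces to the norm of the ordinary differential. The short derivation below is a sketch for $X = R^{m}$ at a point $x$ where $f$ is differentiable; the notation $\operatorname{md}_{x} f$ follows the statement above.

```latex
% Reverse triangle inequality turns the usual expansion into the metric one.
\[
\bigl|\, |f(y) - f(x)| - |Df(x)(y - x)| \,\bigr|
  \le |f(y) - f(x) - Df(x)(y - x)|
  = o(|y - x|) ,
\]
\[
\text{so} \qquad
\operatorname{md}_x f(v) = |Df(x)\, v|
\quad \text{for } v \in \mathbb{R}^n .
\]
```

This is a seminorm on $R^{n}$ , and it is a norm exactly when $D f (x)$ is injective.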

Theorem 5 (Cheeger 1999). On a doubling metric measure space supporting a Poincaré inequality, every Lipschitz function is differentiable a.e. with respect to a measurable cotangent structure of finite dimension. This Rademacher analogue in the absence of a smooth structure is the cornerstone of analysis on metric measure spaces and of the Cheeger-Kleiner rigidity theory.

Theorem 6 (failure of full differentiability sets, Preiss 1990). There is a Lipschitz function on $R^{2}$ whose set of points of differentiability, while of full measure, is not the complement of any $σ$ -porous set; the fine structure of the exceptional set is genuinely intricate. The existence of small universal differentiability sets (Lindenstrauss-Preiss-Tišer) quantifies how small a set can still capture a point of differentiability for every Lipschitz function.

Synthesis. Rademacher's theorem is the foundational reason that the non-smooth calculus underlying geometric measure theory is well posed: a single uniform speed limit forces a genuine differential off a null set, and this is exactly the structural fact that the area and coarea formulas 02.07.11 exploit when they integrate Jacobians of merely Lipschitz maps. The central insight is the reduction of total differentiability to two layers (a.e. differentiability along lines 02.07.05, promoted to a.e. total differentiability by Fubini and a dense set of directions), and putting these together identifies the pointwise gradient with the weak gradient, placing Lipschitz functions in $W_{loc}^{1, \infty}$ . This pattern generalises in three directions at once: Stepanov drops the global hypothesis to pointwise Lipschitz, Alexandrov promotes first-order to second-order differentiability for convex functions, and Kirchheim-Cheeger carry the differentiability into metric measure spaces where no linear structure is available; the bridge in each case is that a control on difference quotients in an integral or doubling sense is enough to manufacture a derivative almost everywhere. The result appears again in the rectifiability theory of currents, where Lipschitz parametrisations carry a.e. tangent planes, and the same mechanism that organises Rademacher reappears whenever rough regularity is upgraded to pointwise structure.

Full proof set Master

Proposition 1 (Lipschitz on lines and absolute continuity). Let $f : R^{n} \to R$ be Lipschitz with constant $L$ and fix a unit vector $v$ . For every $x$ , the function $γ_{x} (t) = f (x + t v)$ is absolutely continuous on $R$ , and $γ_{x}^{'} (t)$ exists for $L^{1}$ -a.e. $t$ with $∣ γ_{x}^{'} ∣ \leq L$ .

Proof. For $s < t$ , $∣ γ_{x} (t) - γ_{x} (s) ∣ = ∣ f (x + t v) - f (x + s v) ∣ \leq L ∣ t - s ∣$ , so $γ_{x}$ is Lipschitz on $R$ . A Lipschitz function on an interval is absolutely continuous: given $ε$ , the choice $δ = ε / L$ makes the total variation over any finite disjoint family of intervals of total length below $δ$ less than $ε$ . By the fundamental theorem of calculus for absolutely continuous functions, $γ_{x}$ is differentiable a.e. with $γ_{x} (t) - γ_{x} (0) = \int_{0}^{t} γ_{x}^{'}$ , and the Lipschitz bound forces $∣ γ_{x}^{'} ∣ \leq L$ a.e. $□$

Proposition 2 (Fubini promotion of line-wise a.e. statements). Let $A \subseteq R^{n}$ be measurable and $v$ a unit vector. If $A$ meets almost every line in direction $v$ (with respect to $L^{n - 1}$ on the hyperplane orthogonal to $v$ ) in an $L^{1}$ -null set, then $L^{n} (A) = 0$ .

Proof. Choose coordinates so $v = e_{n}$ ; write $x = (x^{'}, x_{n})$ with $x^{'} \in R^{n - 1}$ . The hypothesis states that for $L^{n - 1}$ -a.e. $x^{'}$ , the slice $A_{x^{'}} = \{x_{n} : (x^{'}, x_{n}) \in A\}$ has $L^{1}$ -measure zero. By Tonelli's theorem applied to $χ_{A}$ , $L^{n} (A) = \int_{R^{n - 1}} L^{1} (A_{x^{'}}) d x^{'} = \int_{R^{n - 1}} 0 d x^{'} = 0. □$
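The slicing argument can be seen at work on the cross from the beginner example. The sketch below takes $n = 2$ and $v = e_{2}$ , so each slice is a vertical line; the set name $A$ matches the proposition, and the rest is illustration.

```latex
% The cross A = {x = 0} \cup {y = 0} in the plane, sliced by vertical lines.
\[
A_{x} = \{ y : (x, y) \in A \} =
\begin{cases}
  \{0\}, & x \neq 0, \\
  \mathbb{R}, & x = 0,
\end{cases}
\qquad
\mathcal{L}^1(A_{x}) =
\begin{cases}
  0, & x \neq 0, \\
  \infty, & x = 0 .
\end{cases}
\]
% One bad slice sits over the null set {x = 0}, so it does not affect the integral.
\[
\mathcal{L}^2(A) = \int_{\mathbb{R}} \mathcal{L}^1(A_{x})\, dx = 0 .
\]
```

The single slice of infinite length is exactly why the hypothesis is phrased for almost every line rather than for every line.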

Proposition 3 (a.e. gradient is the weak gradient). For Lipschitz $f : R^{n} \to R$ , each pointwise partial $\partial_{e_{i}} f$ (defined a.e.) satisfies $\int f \partial_{e_{i}} φ = - \int \partial_{e_{i}} f φ$ for all $φ \in C_{c}^{\infty} (R^{n})$ ; hence $f \in W_{loc}^{1, \infty}$ with $∥\nabla f ∥_{L^{\infty}} \leq Lip (f)$ .

Proof. The difference quotients $D_{t} f (x) = (f (x + t e_{i}) - f (x)) / t$ obey $∣ D_{t} f ∣ \leq L$ and converge a.e. to $\partial_{e_{i}} f$ . Translation invariance of Lebesgue measure gives $\int (D_{t} f) φ d x = - \int f (D_{- t} φ) d x, \qquad D_{- t} φ (x) = \frac{φ ( x ) - φ ( x - t e _{i} )}{t} .$ On the left, dominated convergence 02.07.05 with dominator $L ∣ φ ∣ \in L^{1}$ passes the limit inside to give $\int \partial_{e_{i}} f φ$ . On the right, $D_{- t} φ \to \partial_{e_{i}} φ$ uniformly on the compact support of $φ$ , so the right side tends to $- \int f \partial_{e_{i}} φ$ . Equating limits gives the weak-derivative identity. Since $∣ \partial_{e_{i}} f ∣ \leq L$ a.e., the weak gradient is bounded, so $f \in W_{loc}^{1, \infty}$ . $□$

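Proposition 4 (sharpness: a null set of corners). Let $K \subseteq R$ be a nonempty $L^{1}$ -null compact set. Then the Lipschitz function $f (x) = dist (x, K)$ fails to be differentiable on a dense subset of $K$ ; when $K$ is the middle-thirds Cantor set, $f$ fails to be differentiable at every point of $K$ .

Proof. By the argument of Exercise 3, $f$ is Lipschitz with constant $1$ , and $f \geq 0$ with $f = 0$ exactly on $K$ . Since $K$ is null it has empty interior, so $K^{c}$ is a union of open intervals (gaps) whose finite endpoints lie in $K$ and are dense in $K$ . Let $a \in K$ be the right endpoint of a gap. For $x < a$ close to $a$ we have $f (x) = a - x$ , so the left difference quotient at $a$ equals $- 1$ ; for $x > a$ the quotient $f (x) / (x - a)$ is nonnegative. Hence no two-sided limit exists and $f$ is not differentiable at $a$ ; left endpoints are symmetric. For the middle-thirds Cantor set, every $a \in K$ has, at each scale $3^{- k}$ , a gap midpoint $x$ with $∣ x - a ∣ \leq 3^{- k}$ and $f (x) = 3^{- k - 1} / 2$ , so $f (x) / ∣ x - a ∣ \geq 1 / 6$ along a sequence $x \to a$ , while the quotient is $0$ along sequences in $K$ ; the limit fails to exist. This realises an uncountable null exceptional set and shows the "a.e." in Rademacher cannot be upgraded to "everywhere". $□$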

Proposition 5 (blow-up convergence). Let $f : R^{n} \to R$ be Lipschitz and let $x$ be a point of differentiability with differential $ℓ (z) = \nabla f (x) \cdot z$ . Then the rescalings $f_{x, r} (z) = (f (x + r z) - f (x)) / r$ converge to $ℓ$ locally uniformly as $r \to 0$ .

Proof. For $∣ z ∣ \leq R$ , $∣ f_{x, r} (z) - ℓ (z) ∣ = \frac{∣ f ( x + r z ) - f ( x ) - \nabla f ( x ) \cdot ( r z ) ∣}{r} = ∣ z ∣ \cdot \frac{∣ f ( x + r z ) - f ( x ) - \nabla f ( x ) \cdot ( r z ) ∣}{∣ r z ∣} .$ The differentiability of $f$ at $x$ makes the last quotient tend to $0$ as $r z \to 0$ , uniformly for $z$ in the ball $∣ z ∣ \leq R$ because $r ∣ z ∣ \leq r R \to 0$ . Hence $sup_{∣ z ∣ \leq R} ∣ f_{x, r} (z) - ℓ (z) ∣ \to 0$ , which is local uniform convergence of the blow-ups to the linear differential. $□$
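The blow-ups of $f (x) = ∣ x ∣$ on the line, computed here as an illustration, show both behaviours: convergence to the linear differential at a point of differentiability, and a non-linear limit at the corner.

```latex
% Blow-ups f_{x,r}(z) = (f(x + r z) - f(x)) / r for f(x) = |x|.
\[
x \neq 0 : \qquad
f_{x, r}(z) = \frac{|x + r z| - |x|}{r} = \operatorname{sgn}(x)\, z
\quad \text{whenever } r |z| < |x| ,
\]
\[
x = 0 : \qquad
f_{0, r}(z) = \frac{|r z|}{r} = |z|
\quad \text{for every } r > 0 .
\]
```

Away from the origin the blow-ups are eventually equal to the linear map $z \mapsto \operatorname{sgn} (x) z$ on each bounded set. At the origin every blow-up is the same non-linear function $∣ z ∣$ , so no linear limit exists, matching the failure of differentiability there.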

Connections Master

The line-wise step of the proof rests on the a.e. differentiability of monotone and absolutely continuous functions developed alongside the convergence theorems in 02.07.05; the dominated convergence theorem there is the exact tool that passes the difference-quotient limit through the integral in the weak-gradient identity.
The Carathéodory construction and Hausdorff measures of 02.07.02 supply the measure-theoretic substrate — Lebesgue null sets and the Fubini slicing of 02.07.02's product structure — without which "differentiable almost everywhere" has no meaning; the sharpness constructions use null sets built by the outer-measure machinery there.
Rademacher is the hinge prerequisite for the area and coarea formulas 02.07.11, where the a.e.-defined Jacobian of a Lipschitz map makes the non-smooth change-of-variables integrals well posed; that unit consumes this theorem directly in its Step 1.
The placement of Lipschitz functions inside $W^{1, \infty}$ connects to the Sobolev-space theory used throughout the analysis-side PDE chapter, where Lipschitz test and barrier functions are differentiated a.e. against weak formulations, and to the rectifiable-currents units 02.13.07 and 02.13.11, whose Lipschitz parametrisations of rectifiable sets carry a.e.-defined tangent planes by exactly this theorem.

Historical & philosophical context Master

Hans Rademacher proved the theorem in 1919 as a tool for the transformation of double integrals, embedding it in a study of partial and total differentiability of functions of several variables ^{[Rademacher 1919]}. The result sat at the confluence of two earlier currents: Lebesgue's theory of the a.e. differentiability of monotone functions of one variable, and the nineteenth-century discovery, through Weierstrass, that continuity alone guarantees no differentiability whatsoever. Rademacher's contribution was to locate the precise regularity threshold — a global speed limit — at which a.e. differentiability returns in arbitrary dimension.

Stepanov sharpened the statement four years later, replacing the global Lipschitz hypothesis by pointwise Lipschitz behaviour on a measurable set ^{[Stepanov 1923]}. The theorem then became structural infrastructure in Federer's encyclopedic codification of geometric measure theory, where it underwrites the area and coarea formulas and the rectifiability of integral currents ^{[Federer 1969]}. The modern textbook treatment in Evans and Gariepy isolates the directional-derivative-plus-Fubini argument as the canonical proof and connects it forward to the fine properties of Sobolev and BV functions ^{[Evans-Gariepy Ch. 3]}. The later metric-space extensions of Kirchheim and Cheeger reinterpreted differentiability itself as a measurable structure, showing that Rademacher's bargain — integral control of difference quotients yielding pointwise derivatives — persists far beyond the Euclidean setting.

Bibliography Master

@article{Rademacher1919,
  author  = {Rademacher, Hans},
  title   = {{\"U}ber partielle und totale Differenzierbarkeit von Funktionen mehrerer Variabeln und {\"u}ber die Transformation der Doppelintegrale},
  journal = {Mathematische Annalen},
  volume  = {79},
  pages   = {340--359},
  year    = {1919}
}

@article{Stepanov1923,
  author  = {Stepanov, W.},
  title   = {{\"U}ber totale Differenzierbarkeit},
  journal = {Mathematische Annalen},
  volume  = {90},
  pages   = {318--320},
  year    = {1923}
}

@book{Federer1969,
  author    = {Federer, Herbert},
  title     = {Geometric Measure Theory},
  series    = {Die Grundlehren der mathematischen Wissenschaften, Band 153},
  publisher = {Springer-Verlag},
  address   = {New York},
  year      = {1969}
}

@book{EvansGariepy2015,
  author    = {Evans, Lawrence C. and Gariepy, Ronald F.},
  title     = {Measure Theory and Fine Properties of Functions},
  edition   = {Revised},
  series    = {Textbooks in Mathematics},
  publisher = {CRC Press},
  address   = {Boca Raton},
  year      = {2015}
}

@book{Heinonen2005,
  author    = {Heinonen, Juha},
  title     = {Lectures on Lipschitz Analysis},
  series    = {Report. University of Jyv{\"a}skyl{\"a}},
  publisher = {University of Jyv{\"a}skyl{\"a}},
  year      = {2005}
}

@article{Kirchheim1994,
  author  = {Kirchheim, Bernd},
  title   = {Rectifiable metric spaces: local structure and regularity of the Hausdorff measure},
  journal = {Proceedings of the American Mathematical Society},
  volume  = {121},
  number  = {1},
  pages   = {113--123},
  year    = {1994}
}

@article{Cheeger1999,
  author  = {Cheeger, Jeff},
  title   = {Differentiability of Lipschitz functions on metric measure spaces},
  journal = {Geometric and Functional Analysis},
  volume  = {9},
  number  = {3},
  pages   = {428--517},
  year    = {1999}
}

Prerequisites

02.07.02
02.07.05

Tier anchors

beginner: A function that never changes too fast has a well-defined slope almost everywhere, even where it has corners; the absolute-value picture and the staircase intuition
intermediate: Evans-Gariepy 2015 Measure Theory and Fine Properties of Functions (CRC) §3.1.2; Heinonen 2005 Lectures on Lipschitz Analysis §3
master: Federer 1969 Geometric Measure Theory (Springer) §3.1.6; Evans-Gariepy 2015 Measure Theory and Fine Properties of Functions (CRC) Ch. 3; Maly-Ziemer 1997 Fine Regularity of Solutions of Elliptic PDE §1

References

Rademacher — Über partielle und totale Differenzierbarkeit von Funktionen mehrerer Variabeln und über die Transformation der Doppelintegrale · Mathematische Annalen 79 (1919), 340-359
Stepanov — Über totale Differenzierbarkeit · Mathematische Annalen 90 (1923), 318-320
Federer — Geometric Measure Theory · §3.1.6, a.e. differentiability of Lipschitz maps
Evans, Gariepy — Measure Theory and Fine Properties of Functions, revised edition · Ch. 3, §3.1.2 (Rademacher's theorem)
Heinonen — Lectures on Lipschitz Analysis · §3, differentiability of Lipschitz functions

Estimated time

beginner: 16m
intermediate: 45m
master: 80m