Rademacher's theorem
Anchor (Master): Federer 1969 Geometric Measure Theory (Springer) §3.1.6; Evans-Gariepy 2015 Measure Theory and Fine Properties of Functions (CRC) Ch. 3; Maly-Ziemer 1997 Fine Regularity of Solutions of Elliptic PDE §1
Intuition Beginner
A function is Lipschitz when there is a fixed speed limit on how fast its output can change relative to its input: move the input a little, and the output moves at most a fixed multiple of that little. The absolute-value function on the real line is Lipschitz with speed limit one — its graph rises and falls at slope plus or minus one, never steeper. Rademacher's theorem says something surprising about every such speed-limited function: even though it might have sharp corners, it has a well-defined slope at almost every point.
The word "almost" is doing careful work. The absolute-value function has no slope at the single point where its corner sits, but a single point is negligibly small — it has zero length. Everywhere else the slope is plain. Rademacher's theorem promises that for any Lipschitz function, even one with infinitely many corners, the set of bad points where the slope fails to exist is negligible in the sense of measure.
Why care? A speed limit on a function looks like a weak assumption — it allows corners, kinks, and ridges. The theorem says this weak assumption secretly buys you a strong conclusion: a genuine derivative almost everywhere. That bargain is the foundation for doing calculus with rough functions, the kind that arise when you measure surface areas of jagged shapes or change variables under maps that are merely Lipschitz rather than smooth.
Visual Beginner
Picture the graph of a Lipschitz function in one variable as a path of a hiker who is forbidden from walking too steeply: the path can have sharp ridges and valleys, but its steepness is capped. At a smooth stretch the hiker has a clear instantaneous heading. At a sharp ridge the heading is ambiguous — left-going and right-going slopes disagree. Rademacher's theorem says these ambiguous ridge points are so sparse that their total length is zero.
In two variables the picture becomes a creased surface, like a tent with fold lines. Across the smooth panels the surface has a tangent plane; along the fold lines it does not. Rademacher's theorem says the fold lines, being lower-dimensional, occupy zero area, so the tangent plane exists almost everywhere on the tent.
Worked example Beginner
Take the function on the plane. It is Lipschitz: moving the input by a small step changes the output by at most the size of that step times a fixed constant. We find where its slope is well defined.
Step 1. Away from the two axis lines, both and are plain smooth pieces. If and , then , a flat tilted plane with a clear tangent. The same holds in each of the four open quadrants, where is one of .
Step 2. On the line the piece has a corner, and on the line the piece has a corner. So the slope can fail only on these two lines.
Step 3. Measure the bad set. The two axis lines together form a cross. A line in the plane has zero area — you can cover it by a strip of any tiny width you like. So the cross has zero area.
Step 4. Conclude. The slope of is well defined at every point off the cross, and the cross has zero area. So has a well-defined slope at almost every point of the plane, with the single exception of a negligible cross.
What this tells us: a function built from absolute values has plenty of corners, but the corners collect along thin lower-dimensional sets of zero area, and the slope survives everywhere else. That is Rademacher's theorem in a hand-checkable case.
Check your understanding Beginner
Formal definition Intermediate+
Let be open. A map is Lipschitz with constant when for all ; it is locally Lipschitz when each point of has a neighbourhood on which is Lipschitz. The smallest admissible on a set is the Lipschitz constant .
Definition (directional derivative). For and , the directional derivative of at in direction is when the limit exists. For the standard basis vectors this gives the partial derivatives , assembled into the gradient when is scalar-valued.
Definition (total differentiability). The map is (totally) differentiable at when there is a linear map with Total differentiability is strictly stronger than the existence of all directional derivatives: it requires a single linear map approximating uniformly over all directions, not merely a slope along each line.
Throughout, denotes Lebesgue measure on , and "a.e." means outside a set of -measure zero.
Counterexamples to common slips Intermediate+
Existence of all directional derivatives does not give total differentiability. The function for and has every directional derivative at the origin but is not differentiable there. Rademacher's force is precisely that for Lipschitz maps this gap closes a.e.
Continuity is not enough. The Weierstrass function on the line is uniformly continuous yet nowhere differentiable; the Lipschitz hypothesis is what powers the conclusion. Hölder continuity of exponent is likewise insufficient.
"A.e." cannot be improved to "everywhere". For any -null set one can construct a Lipschitz non-differentiable at each point of a dense set; the absolute value is the one-point prototype. The theorem is sharp at the level of the null exceptional set.
The a.e. gradient is the weak gradient. For Lipschitz , the pointwise defined a.e. coincides with the distributional gradient, placing in . The two notions agree because integration by parts against test functions sees only the a.e. values.
Key theorem with proof Intermediate+
Theorem (Rademacher 1919). Let be locally Lipschitz. Then is differentiable at -almost every point of . At each point of differentiability the differential has the components , where is the a.e.-defined gradient.
Proof. It suffices to treat : a vector-valued map is differentiable at exactly when each scalar component is, and a finite intersection of full-measure sets is full measure. So assume is Lipschitz with constant .
Step 1 (directional derivatives exist a.e.). Fix a unit vector . For each , the function is Lipschitz on , hence absolutely continuous, hence differentiable for -a.e. by the a.e.-differentiability of monotone and absolutely continuous functions 02.07.05. Let . Restricting to lines parallel to and using that meets each such line in a -null set, Fubini's theorem (the integral of the line-wise null indicator vanishes) gives . So exists a.e., and a.e. since is Lipschitz.
Step 2 (the gradient and the linearity identity). Apply Step 1 to the basis directions : the gradient exists a.e. Fix a unit vector . For any test function , the difference-quotient functions
are bounded by in absolute value and converge a.e. to as . The dominated convergence theorem 02.07.05 with the constant dominator lets us pass the limit through the integral:
the last equality by the change of variables inside the first integral term and recognising the difference quotient of in the limit. Since , the right-hand side equals . As was arbitrary, for a.e. .
Step 3 (countable dense directions). Choose a countable dense set in the unit sphere . Let be the full-measure set on which exists and the identity holds simultaneously for every (a countable intersection of full-measure sets is full measure). Fix .
Step 4 (promotion to total differentiability). For and , define For each fixed and , the Lipschitz bound gives So is Lipschitz uniformly in . Given , pick finitely many from the dense set so every lies within of some . For each , as by the choice of , so there is with for all and . For general , choosing the nearest gives for . Since was arbitrary, uniformly in , which is exactly the statement that is differentiable at with differential . As has full measure, is differentiable a.e.
Bridge. Rademacher's theorem builds toward the area and coarea formulas 02.07.11, where the a.e. existence of the Jacobian of a Lipschitz map is the foundational reason the non-smooth change-of-variables integrals are even well posed; the bridge is that an a.e.-defined differential is enough to integrate against, since integration sees only a.e. values. The central insight is the two-stage reduction: a.e. differentiability along lines 02.07.05 is upgraded to a.e. total differentiability by a Fubini-plus-dense-directions argument, and this is exactly the mechanism that generalises to the Stepanov theorem and to Lipschitz maps between metric measure spaces. Putting these together, the result places Lipschitz functions inside with pointwise gradient equal to the weak gradient, which appears again in the rectifiability theory of currents where Lipschitz parametrisations of rectifiable sets carry a.e.-defined tangent planes. The same pattern generalises to the Sobolev and BV settings, where a control on a derivative in an integral sense forces pointwise differentiability off a small set.
Exercises Intermediate+
Lean formalization Intermediate+
lean_status: none. Mathlib formalizes the one-variable building blocks — MonotoneOn.ae_differentiableWithinAt, the a.e. differentiability of monotone real functions, and LipschitzWith.ae_differentiableAt for one real variable via the Lebesgue differentiation theorem MeasureTheory.ae_tendsto_average — but it does not contain the multivariable Rademacher theorem: the a.e. total differentiability of Lipschitz maps f : EuclideanSpace ℝ (Fin n) → EuclideanSpace ℝ (Fin m). The directional-derivative-plus-Fubini reduction, the dense-directions promotion to full differentiability, and the identification of the pointwise gradient with the distributional gradient (placing Lipschitz maps in W^{1,∞}) are the formalization targets. The statement one would register is below; the proof is the open gap.
import Mathlib.Analysis.Calculus.FDeriv.Basic
import Mathlib.MeasureTheory.Measure.Lebesgue.Basic
import Mathlib.Topology.MetricSpace.Lipschitz
open MeasureTheory
-- Target statement (proof is the Mathlib gap):
abbrev CodexRademacher
{n m : ℕ} (f : (EuclideanSpace ℝ (Fin n)) → (EuclideanSpace ℝ (Fin m)))
(L : NNReal) (hf : LipschitzWith L f) : Prop :=
∀ᵐ x ∂(volume : Measure (EuclideanSpace ℝ (Fin n))),
DifferentiableAt ℝ f xAdvanced results Master
The theory around Rademacher's theorem splits into four strands: the Stepanov sharpening that drops the global Lipschitz hypothesis, the Lebesgue-point characterisation of the gradient, the second-order Alexandrov theorem for convex and semiconvex functions, and the metric-space generalisations that carry differentiability into settings without a linear structure.
Theorem 1 (Stepanov 1923). Let be measurable. Then is differentiable at -a.e. point of the set on which is pointwise Lipschitz. Rademacher is the special case [Stepanov 1923].
The Stepanov theorem is proved by reducing to Rademacher: cover by countably many sets on which agrees with a globally Lipschitz function (built by a McShane-Kirszbraun extension of restricted to a level set of the local Lipschitz constant), apply Rademacher to each global extension, and intersect. The pointwise differentiability of then transfers from its global agreement on a set of density one at each point.
Theorem 2 (Lebesgue-point gradient). For Lipschitz , at -a.e. the gradient is recovered as the blow-up limit and is a Lebesgue point of . The differential is the a.e.-unique linear map approximating the blow-ups , which converge locally uniformly to the linear function .
Theorem 3 (Alexandrov 1939). A convex function is twice differentiable -a.e.: at a.e. there is a symmetric matrix with . Since convex functions are locally Lipschitz, Rademacher gives the a.e. first derivative; Alexandrov's theorem promotes this to a.e. second-order expansion using the monotonicity of the subdifferential [Evans-Gariepy Ch. 6].
Theorem 4 (metric differentiation, Kirchheim 1994). A Lipschitz map into a metric space is metrically differentiable -a.e.: at a.e. there is a seminorm on with . This is the form of Rademacher that survives when the target has no linear structure, foundational for the Ambrosio-Kirchheim theory of metric currents.
Theorem 5 (Cheeger 1999). On a doubling metric measure space supporting a Poincaré inequality, every Lipschitz function is differentiable a.e. with respect to a measurable cotangent structure of finite dimension. This Rademacher analogue in the absence of a smooth structure is the cornerstone of analysis on metric measure spaces and of the Cheeger-Kleiner rigidity theory.
Theorem 6 (failure of full differentiability sets, Preiss 1990). There is a Lipschitz function on whose set of points of differentiability, while of full measure, is not the complement of any -porous set; the fine structure of the exceptional set is genuinely intricate. The existence of small universal differentiability sets (Lindenstrauss-Preiss-Tišer) quantifies how small a set can still capture a point of differentiability for every Lipschitz function.
Synthesis. Rademacher's theorem is the foundational reason that the non-smooth calculus underlying geometric measure theory is well posed: a single integrable speed limit forces a genuine differential off a null set, and this is exactly the structural fact that the area and coarea formulas 02.07.11 exploit when they integrate Jacobians of merely Lipschitz maps. The central insight is the reduction of total differentiability to two layers — a.e. differentiability along lines 02.07.05 promoted to a.e. total differentiability by Fubini and a dense set of directions — and putting these together identifies the pointwise gradient with the weak gradient, placing Lipschitz functions in . This pattern generalises in three directions at once: Stepanov drops the global hypothesis to pointwise Lipschitz, Alexandrov promotes first-order to second-order differentiability for convex functions, and Kirchheim-Cheeger carry the differentiability into metric measure spaces where no linear structure is available; the bridge in each case is that a control on difference quotients in an integral or doubling sense is enough to manufacture a derivative almost everywhere. The result appears again in the rectifiability theory of currents, where Lipschitz parametrisations carry a.e. tangent planes, and the same mechanism that organises Rademacher reappears whenever rough regularity is upgraded to pointwise structure.
Full proof set Master
Proposition 1 (Lipschitz on lines and absolute continuity). Let be Lipschitz with constant and fix a unit vector . For every , the function is absolutely continuous on , and exists for -a.e. with .
Proof. For , , so is Lipschitz on . A Lipschitz function on an interval is absolutely continuous: given , the choice makes the total variation over any finite disjoint family of intervals of total length below less than . By the fundamental theorem of calculus for absolutely continuous functions, is differentiable a.e. with , and the Lipschitz bound forces a.e.
Proposition 2 (Fubini promotion of line-wise a.e. statements). Let be measurable and a unit vector. If meets -a.e. line in direction in a -null set, then .
Proof. Choose coordinates so ; write with . The hypothesis states that for -a.e. , the slice has -measure zero. By Tonelli's theorem applied to ,
Proposition 3 (a.e. gradient is the weak gradient). For Lipschitz , each pointwise partial (defined a.e.) satisfies for all ; hence with .
Proof. The difference quotients obey and converge a.e. to . Translation invariance of Lebesgue measure gives
On the left, dominated convergence 02.07.05 with dominator passes the limit inside to give . On the right, uniformly on the compact support of , so the right side tends to . Equating limits gives the weak-derivative identity. Since a.e., the weak gradient is bounded, so .
Proposition 4 (sharpness: a null set of corners). For every -null compact set there is a Lipschitz function failing to be differentiable at each point of .
Proof. Take . By the argument of Exercise 3, is Lipschitz with constant . At any point , and for near (when is not isolated in on both sides), so the right and left difference quotients of at take values approaching and along sequences in , while equalling along sequences in ; the limit fails to exist. Thus is non-differentiable on a dense-in- subset of , and by enlarging to a fat-Cantor-style null set one realises an uncountable null exceptional set. This shows the "a.e." in Rademacher cannot be upgraded to "everywhere".
Proposition 5 (blow-up convergence). Let be Lipschitz and let be a point of differentiability with differential . Then the rescalings converge to locally uniformly as .
Proof. For , The differentiability of at makes the last quotient tend to as , uniformly for in the ball because . Hence , which is local uniform convergence of the blow-ups to the linear differential.
Connections Master
The line-wise step of the proof rests on the a.e. differentiability of monotone and absolutely continuous functions developed alongside the convergence theorems in
02.07.05; the dominated convergence theorem there is the exact tool that passes the difference-quotient limit through the integral in the weak-gradient identity.The Carathéodory construction and Hausdorff measures of
02.07.02supply the measure-theoretic substrate — Lebesgue null sets and the Fubini slicing of02.07.02's product structure — without which "differentiable almost everywhere" has no meaning; the sharpness constructions use null sets built by the outer-measure machinery there.Rademacher is the hinge prerequisite for the area and coarea formulas
02.07.11, where the a.e.-defined Jacobian of a Lipschitz map makes the non-smooth change-of-variables integrals well posed; that unit consumes this theorem directly in its Step 1.The placement of Lipschitz functions inside connects to the Sobolev-space theory used throughout the analysis-side PDE chapter, where Lipschitz test and barrier functions are differentiated a.e. against weak formulations, and to the rectifiable-currents units
02.13.07and02.13.11, whose Lipschitz parametrisations of rectifiable sets carry a.e.-defined tangent planes by exactly this theorem.
Historical & philosophical context Master
Hans Rademacher proved the theorem in 1919 as a tool for the transformation of double integrals, embedding it in a study of partial and total differentiability of functions of several variables [Rademacher 1919]. The result sat at the confluence of two earlier currents: Lebesgue's theory of the a.e. differentiability of monotone functions of one variable, and the nineteenth-century discovery, through Weierstrass, that continuity alone guarantees no differentiability whatsoever. Rademacher's contribution was to locate the precise regularity threshold — a global speed limit — at which a.e. differentiability returns in arbitrary dimension.
Stepanov sharpened the statement four years later, replacing the global Lipschitz hypothesis by pointwise Lipschitz behaviour on a measurable set [Stepanov 1923]. The theorem then became structural infrastructure in Federer's encyclopedic codification of geometric measure theory, where it underwrites the area and coarea formulas and the rectifiability of integral currents [Federer 1969]. The modern textbook treatment in Evans and Gariepy isolates the directional-derivative-plus-Fubini argument as the canonical proof and connects it forward to the fine properties of Sobolev and BV functions [Evans-Gariepy Ch. 3]. The later metric-space extensions of Kirchheim and Cheeger reinterpreted differentiability itself as a measurable structure, showing that Rademacher's bargain — integral control of difference quotients yielding pointwise derivatives — persists far beyond the Euclidean setting.
Bibliography Master
@article{Rademacher1919,
author = {Rademacher, Hans},
title = {{\"U}ber partielle und totale Differenzierbarkeit von Funktionen mehrerer Variabeln und {\"u}ber die Transformation der Doppelintegrale},
journal = {Mathematische Annalen},
volume = {79},
pages = {340--359},
year = {1919}
}
@article{Stepanov1923,
author = {Stepanov, W.},
title = {{\"U}ber totale Differenzierbarkeit},
journal = {Mathematische Annalen},
volume = {90},
pages = {318--320},
year = {1923}
}
@book{Federer1969,
author = {Federer, Herbert},
title = {Geometric Measure Theory},
series = {Die Grundlehren der mathematischen Wissenschaften, Band 153},
publisher = {Springer-Verlag},
address = {New York},
year = {1969}
}
@book{EvansGariepy2015,
author = {Evans, Lawrence C. and Gariepy, Ronald F.},
title = {Measure Theory and Fine Properties of Functions},
edition = {Revised},
series = {Textbooks in Mathematics},
publisher = {CRC Press},
address = {Boca Raton},
year = {2015}
}
@book{Heinonen2005,
author = {Heinonen, Juha},
title = {Lectures on Lipschitz Analysis},
series = {Report. University of Jyv{\"a}skyl{\"a}},
publisher = {University of Jyv{\"a}skyl{\"a}},
year = {2005}
}
@article{Kirchheim1994,
author = {Kirchheim, Bernd},
title = {Rectifiable metric spaces: local structure and regularity of the Hausdorff measure},
journal = {Proceedings of the American Mathematical Society},
volume = {121},
number = {1},
pages = {113--123},
year = {1994}
}
@article{Cheeger1999,
author = {Cheeger, Jeff},
title = {Differentiability of Lipschitz functions on metric measure spaces},
journal = {Geometric and Functional Analysis},
volume = {9},
number = {3},
pages = {428--517},
year = {1999}
}