02.19.01 · analysis / calderon-zygmund-singular-integrals

The Hardy-Littlewood Maximal Function and the Vitali Covering Lemma

shipped · 3 tiers · Lean: none

Anchor (Master): Stein 1993 *Harmonic Analysis* (Princeton) Ch. I-II; Grafakos 2014 *Classical Fourier Analysis* 3e (Springer) §2.1; Stein 1970 *Singular Integrals* (Princeton) Ch. I

Intuition Beginner

Suppose you have a temperature reading at every point along a metal rod, and the readings are noisy. A natural way to smooth them is to replace the value at each point by the average of the readings over a small window around that point. If the window is wide the average is very smooth; if it is narrow the average tracks the original closely. The maximal function asks a different question: across all possible window sizes centred at a point, what is the largest average of the absolute readings you can get? That single number measures how big the function looks near that point, in the most generous averaging you are allowed.

Why bother with the worst-case window rather than one fixed window? Because many questions in analysis hinge on controlling averages at every scale at once. If you want to know that the small-window averages settle down to the true value as the window shrinks, the clean way is to first control the supremum over all windows, then push the window to zero. The maximal function is the tool that bundles every scale into one object you can estimate.

The key fact is that this worst-case-average operation cannot blow up too badly. Even though taking a supremum over infinitely many windows looks dangerous, the set of points where the maximal function exceeds a height stays small in a precise, measurable way. That smallness is what lets the maximal function recover one of the founding theorems of integration theory: that the averages of an integrable function over shrinking balls converge back to the function almost everywhere.

The one-sentence takeaway: the maximal function records the largest averaging-window value at each point, and the surprising control on how often it is large is the engine behind almost-everywhere convergence and the whole later theory of singular integrals.

Visual Beginner

Picture a bump-shaped function on a line: zero far out, rising to a peak in the middle. Centre a window of half-width $r$ at a point $x$ and compute the average of the function's height over that window. As you grow $r$ from very small to very large, the average rises, reaches a best value, then falls off again because the window starts swallowing the flat zero region. The maximal function value at $x$ is that best average over all choices of $r$ .

The covering picture is the companion idea. Suppose many windows overlap in a tangle. The covering lemma says you can throw away the redundant ones and keep a non-overlapping handful that, once each is grown by a fixed factor of five, still covers everything the original tangle did. That trade — disjointness now, controlled regrowth later — is what keeps the total size bookkeeping honest.

Worked example Beginner

We compute the maximal function of the simplest informative function on the line: the indicator of an interval.

Step 1. Let $f$ be $1$ on the interval from $- 1$ to $1$ and $0$ everywhere else. Pick a point $x = 3$ , which sits to the right of the interval. We will find the largest window average of $f$ centred at $3$ .

Step 2. A window centred at $3$ with half-width $r$ runs from $3 - r$ to $3 + r$ and has total length $2 r$. It starts catching the interval only when $3 - r$ drops below $1$, that is when $r > 2$. For $2 \leq r \leq 4$ the window overlaps the interval from $3 - r$ up to $1$, an overlap of length $r - 2$; once $r \geq 4$ the window contains the whole interval and the overlap length is $2$.

Step 3. The average over the window is the overlap length divided by the window length. For $2 \leq r \leq 4$ this is $(r - 2) / (2 r) = 1/2 - 1/ r$, which increases with $r$ and reaches $1/4$ at $r = 4$. For $r \geq 4$ it is $2 / (2 r) = 1/ r$, which decreases from $1/4$.

Step 4. For $r$ below $2$ the window misses the interval entirely and the average is $0$. So the best average is attained at $r = 4$, the smallest window that swallows the whole interval, and the maximal function of $f$ at the point $3$ equals $1/4$.

What this tells us: even at a point where the original function is zero, the maximal function is positive because some window reaches the mass nearby. The same computation at any point $x > 1$ gives the value $1/ (x + 1)$, which decays like $1/ x$ as the point moves far from the interval. That is exactly the borderline rate that makes the maximal function of an integrable function fail to be integrable itself while still being controlled in the weak measure-of-large-values sense.
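The window scan in Steps 1 to 4 can also be checked numerically. The following is a minimal sketch, assuming a uniform grid of radii up to a cutoff `r_max`; the function names and the grid parameters are illustrative choices, not part of the text. The printed value can be compared with the hand computation.

```python
def window_average(x, r, a=-1.0, b=1.0):
    """Average of the indicator of [a, b] over the window [x - r, x + r]."""
    overlap = max(0.0, min(b, x + r) - max(a, x - r))
    return overlap / (2.0 * r)


def centred_maximal_of_indicator(x, a=-1.0, b=1.0, r_max=50.0, steps=200_000):
    """Approximate the supremum over r in (0, r_max] on a uniform grid of radii.

    The grid and the cutoff r_max are assumptions of this sketch; the true
    maximal function takes the supremum over every r > 0.
    """
    best = 0.0
    for k in range(1, steps + 1):
        r = r_max * k / steps
        best = max(best, window_average(x, r, a, b))
    return best


if __name__ == "__main__":
    # Largest window average of the indicator of [-1, 1] centred at x = 3.
    print(centred_maximal_of_indicator(3.0))
```

Because the average vanishes for small $r$ and decays like $1/ r$ for large $r$, a bounded grid of radii is enough to locate the supremum for this particular function.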

Check your understanding Beginner

Exercise (easy, multiple choice).

The Hardy-Littlewood maximal function value at a point $x$ is defined as:

A. The value of the function at $x$
B. The average of the function over one fixed window around $x$
C. The largest average of the absolute value of the function over all windows centred at $x$
D. The largest value the function itself takes near $x$

Hint

The word "maximal" refers to taking a supremum over all window sizes, and the quantity being averaged is the absolute value of the function.

Answer

C. The largest average of the absolute value of the function over all windows centred at $x$ .

Feedback-correct: the maximal function is a supremum of averages of $∣ f ∣$ over all radii, not the function value or a single average. Feedback-wrong: A and B fix a scale instead of optimising over scales; D takes a supremum of values rather than of averages, which would ignore how much mass actually sits near $x$ .

Formal definition Intermediate+

Throughout, $B (x, r)$ denotes the open Euclidean ball of centre $x$ and radius $r$ in $R^{n}$ , $∣ E ∣$ denotes the Lebesgue measure of a measurable set $E$ , and $f \in L_{loc}^{1} (R^{n})$ means $f$ is Lebesgue measurable and integrable over every ball.

Definition (centred Hardy-Littlewood maximal operator). For $f \in L_{loc}^{1} (R^{n})$ the centred maximal function is $M f (x) = \sup_{r > 0} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ f (y) ∣ \, d y .$ The value $M f (x)$ lies in $[0, \infty]$ and is the supremum over all radii of the ball-average of $∣ f ∣$ centred at $x$.

Definition (uncentred maximal operator). The uncentred maximal function replaces the centred balls by all balls containing $x$: $\widetilde{M} f (x) = \sup_{B ∋ x} \frac{1}{∣ B ∣} \int_{B} ∣ f (y) ∣ \, d y,$ the supremum taken over every open ball $B$ (of any centre) with $x \in B$. Since every centred ball $B (x, r)$ is one admissible $B$, one has $M f \leq \widetilde{M} f$; conversely a ball of radius $r$ containing $x$ is contained in $B (x, 2 r)$, whose volume is $2^{n}$ times larger, giving $\widetilde{M} f \leq 2^{n} M f$. The two operators are therefore comparable, and every boundedness statement transfers between them up to the dimensional constant $2^{n}$.

Definition (weak-type $(1, 1)$). A sublinear operator $T$ is of weak type $(1, 1)$ if there is a constant $C$ with $∣ \{x : ∣ T f (x) ∣ > λ\} ∣ \leq \frac{C}{λ} ∥ f ∥_{L^{1}} \quad (λ > 0)$ for all $f \in L^{1}$. This is strictly weaker than the strong type $(1, 1)$ bound $∥ T f ∥_{L^{1}} \leq C ∥ f ∥_{L^{1}}$, which $M$ does not satisfy.

Definition (dyadic maximal operator). Let $D$ be the dyadic cubes of $R^{n}$: cubes of the form $2^{- k} (m + [0, 1)^{n})$ with $k \in Z$, $m \in Z^{n}$. The dyadic maximal function is $M_{D} f (x) = \sup_{Q \in D, \, x \in Q} \frac{1}{∣ Q ∣} \int_{Q} ∣ f (y) ∣ \, d y .$ Distinct dyadic cubes are either nested or disjoint, a structural feature that makes $M_{D}$ amenable to stopping-time arguments and gives it a weak-type $(1, 1)$ bound with constant $1$.

Counterexamples to common slips Intermediate+

$M f$ need not be integrable even when $f$ is. For $f = χ_{[- 1, 1]}$ on $R$ , the worked example shows $M f (x) ≳ 1/∣ x ∣$ for large $∣ x ∣$ , and $1/∣ x ∣$ is not integrable at infinity. So $M$ is not of strong type $(1, 1)$ ; the weak-type bound is the correct endpoint statement.
The supremum over a continuum of radii is genuinely measurable. One does not need the full continuum: for fixed $x$ the average is continuous in $r$, so the supremum over $r > 0$ equals the supremum over rational $r$, a countable supremum of measurable functions of $x$. Hence $M f$ is measurable, and in fact lower semicontinuous, so $\{M f > λ\}$ is open.
Centred and uncentred maximal functions are comparable but not equal. The factor $2^{n}$ between them is real: in dimension one the uncentred maximal function of $χ_{[0, 1]}$ at a point just left of $0$ uses an off-centre interval and is strictly larger than the centred value there. Statements proved for one transfer to the other only up to this constant.
The covering constant $5$ is not optimal but the method needs a fixed enlargement factor. For a finite family the greedy selection already works with the factor $3$, as the estimate $∣ z - x_{k} ∣ < 3 r_{k}$ in the proof of the covering lemma shows; for infinite families, where a ball of largest radius need not exist, any factor strictly greater than $3$ works, and $5$ is the standard safe choice. Besicovitch's covering theorem removes the enlargement entirely at the cost of a dimension-dependent bounded-overlap constant.

Key theorem with proof Intermediate+

Theorem (Hardy-Littlewood weak-type $(1, 1)$ maximal inequality; Hardy-Littlewood 1930 Acta Math. 54, 81; $n$-dimensional form Wiener 1939 Duke Math. J. 5, 1). There is a constant $C_{n}$, depending only on the dimension $n$, such that for every $f \in L^{1} (R^{n})$ and every $λ > 0$, $∣ \{x \in R^{n} : M f (x) > λ\} ∣ \leq \frac{C_{n}}{λ} \int_{R^{n}} ∣ f (y) ∣ \, d y .$ One may take $C_{n} = 5^{n}$ (centred operator, via the $5 r$-covering lemma).

Proof. Fix $λ > 0$ and set $E_{λ} = \{M f > λ\}$. Because $M f$ is lower semicontinuous, $E_{λ}$ is open. Fix a compact subset $K \subseteq E_{λ}$; it suffices to bound $∣ K ∣$ by $(5^{n} / λ) ∥ f ∥_{L^{1}}$, since by inner regularity $∣ E_{λ} ∣ = \sup_{K} ∣ K ∣$ over compact $K \subseteq E_{λ}$.

Step 1 (each point of $K$ selects a heavy ball). For each $x \in K$ we have $M f (x) > λ$, so by definition of the supremum there is a radius $r_{x} > 0$ with $\frac{1}{∣ B (x, r_{x}) ∣} \int_{B (x, r_{x})} ∣ f ∣ \, d y > λ,$ equivalently $∣ B (x, r_{x}) ∣ < \frac{1}{λ} \int_{B (x, r_{x})} ∣ f ∣ \, d y .$ The balls $\{B (x, r_{x}) : x \in K\}$ form an open cover of the compact set $K$. Extract a finite subcover $B_{1}, \dots, B_{N}$ with $B_{j} = B (x_{j}, r_{j})$.

Step 2 (Vitali $5 r$-selection). Apply the finite Vitali covering lemma (Lemma below) to $B_{1}, \dots, B_{N}$: there is a subcollection of pairwise disjoint balls $B_{j_{1}}, \dots, B_{j_{M}}$ such that $\bigcup_{i = 1}^{N} B_{i} \subseteq \bigcup_{k = 1}^{M} 5 B_{j_{k}},$ where $5 B$ denotes the ball concentric with $B$ of five times the radius. Since the chosen balls are disjoint and each satisfies the heaviness estimate of Step 1, $∣ K ∣ \leq ∣ \bigcup_{i} B_{i} ∣ \leq \sum_{k = 1}^{M} ∣ 5 B_{j_{k}} ∣ = 5^{n} \sum_{k = 1}^{M} ∣ B_{j_{k}} ∣ < \frac{5^{n}}{λ} \sum_{k = 1}^{M} \int_{B_{j_{k}}} ∣ f ∣ \, d y .$

Step 3 (disjointness collapses the sum). Because the $B_{j_{k}}$ are pairwise disjoint, the integrals over them add to an integral over their union, which is at most the integral over all of $R^{n}$: $\sum_{k = 1}^{M} \int_{B_{j_{k}}} ∣ f ∣ \, d y = \int_{\bigcup_{k} B_{j_{k}}} ∣ f ∣ \, d y \leq \int_{R^{n}} ∣ f ∣ \, d y = ∥ f ∥_{L^{1}} .$ Combining, $∣ K ∣ \leq (5^{n} / λ) ∥ f ∥_{L^{1}}$. Taking the supremum over compact $K \subseteq E_{λ}$ gives $∣ E_{λ} ∣ \leq (5^{n} / λ) ∥ f ∥_{L^{1}}$, the claim with $C_{n} = 5^{n}$. $□$
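The inequality can be sanity-checked on a discrete model. The sketch below assumes a function sampled on an integer grid, counting measure in place of Lebesgue measure, and extension by zero outside the sampled range; these choices and the names `centred_maximal` and `level_set_size` are illustrative assumptions, not part of the theorem. It compares the size of the level set with $5 ∥ f ∥_{L^{1}} / λ$, the $n = 1$ bound.

```python
import random


def centred_maximal(f):
    """Discrete centred maximal function of a list f, extended by zero.

    Windows are the index sets {i - r, ..., i + r}, of size 2r + 1.
    """
    n = len(f)
    prefix = [0.0]
    for v in f:
        prefix.append(prefix[-1] + abs(v))
    out = []
    for i in range(n):
        best = 0.0
        for r in range(n):  # r = n - 1 already covers the whole array
            lo = max(0, i - r)
            hi = min(n, i + r + 1)
            best = max(best, (prefix[hi] - prefix[lo]) / (2 * r + 1))
        out.append(best)
    return out


def level_set_size(values, lam):
    """Number of grid points where the value exceeds lam."""
    return sum(1 for v in values if v > lam)


if __name__ == "__main__":
    random.seed(0)
    f = [random.expovariate(1.0) for _ in range(200)]
    mf = centred_maximal(f)
    l1_norm = sum(abs(v) for v in f)
    for lam in (0.5, 1.0, 2.0, 4.0):
        # Level-set size next to the weak-type bound 5 * ||f||_1 / lam.
        print(lam, level_set_size(mf, lam), 5 * l1_norm / lam)
```

A numerical check of this kind illustrates the inequality on one sample; it does not replace the covering argument.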

Lemma (finite Vitali $5 r$ -covering lemma; Vitali 1908 Atti Accad. Torino 43, 75). Let $B_{1}, \dots, B_{N}$ be a finite collection of open balls in $R^{n}$ . There is a subcollection of pairwise disjoint balls $B_{j_{1}}, \dots, B_{j_{M}}$ with $⋃_{i} B_{i} \subseteq ⋃_{k} 5 B_{j_{k}}$ .

Proof. Greedy selection by radius. Choose $B_{j_{1}}$ to be a ball of largest radius. Having chosen $B_{j_{1}}, \dots, B_{j_{ℓ}}$, discard every remaining ball that meets one of them, and from the survivors choose one of largest radius as $B_{j_{ℓ + 1}}$. The process terminates since the family is finite, and the chosen balls are pairwise disjoint by construction. Now let $B = B (y, s)$ be any of the original balls. If $B$ was itself chosen, then $B \subseteq 5 B$ and there is nothing to prove. Otherwise $B$ was discarded because it met some chosen ball; let $B_{j_{k}} = B (x_{k}, r_{k})$ be the first chosen ball that $B$ meets. Then $r_{k} \geq s$, since $B$ was still available when $B_{j_{k}}$ was picked and the pick took a ball of largest radius among those available. For any $z \in B$, $∣ z - x_{k} ∣ \leq ∣ z - y ∣ + ∣ y - x_{k} ∣ < s + (s + r_{k}) = 2 s + r_{k} \leq 3 r_{k} < 5 r_{k},$ using $∣ y - x_{k} ∣ < s + r_{k}$ from the balls meeting and $s \leq r_{k}$. Hence $B \subseteq 5 B_{j_{k}}$, proving the covering inclusion. $□$
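The greedy selection in the proof translates directly into a short routine. The following is a sketch in dimension one, assuming each ball is an open interval given as a `(centre, radius)` pair; the function names are hypothetical. Processing the balls in decreasing-radius order and keeping a ball only when it is disjoint from everything kept so far is the same procedure as "discard what meets a chosen ball, then pick a largest survivor".

```python
def vitali_select(balls):
    """Greedy Vitali selection for open intervals given as (centre, radius).

    Returns a pairwise disjoint subfamily, chosen in decreasing-radius order.
    """
    chosen = []
    for c, r in sorted(balls, key=lambda ball: ball[1], reverse=True):
        # Open intervals (c - r, c + r) and (c2 - r2, c2 + r2) are disjoint
        # exactly when the centres are at least r + r2 apart.
        if all(abs(c - c2) >= r + r2 for c2, r2 in chosen):
            chosen.append((c, r))
    return chosen


def is_covered(ball, chosen, factor):
    """True if `ball` lies inside the `factor`-fold enlargement of a chosen ball."""
    c, r = ball
    return any(abs(c - c2) + r <= factor * r2 for c2, r2 in chosen)


if __name__ == "__main__":
    balls = [(0, 4), (3, 2), (7, 3), (12, 1), (13, 2), (-6, 1)]
    chosen = vitali_select(balls)
    print(chosen)
    print(all(is_covered(b, chosen, 5) for b in balls))
```

The helper `is_covered` takes the enlargement factor as a parameter so that the same family can be tested against both $5$ and the sharper factor $3$ that the triangle-inequality estimate actually delivers.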

Bridge. The weak-type bound builds toward the full $L^{p}$ mapping theory of $M$ and appears again in every later chapter of singular-integral theory, where the maximal function is the universal device for controlling pointwise objects by their integral size. The central insight is that the supremum over a continuum of scales is tamed by a single disjoint subfamily: the Vitali lemma trades the uncontrolled overlap of all heavy balls for a disjoint core whose total measure is bounded by $∥ f ∥_{L^{1}} / λ$ , and this is exactly the foundational reason a sup-over-scales operator can still be of weak type. Putting these together, the disjointness in Step 3 is what generalises — the same $5 r$ accounting reappears for the Calderón-Zygmund decomposition, where the heavy dyadic cubes play the role of the chosen balls, and the bridge is that controlling a maximal average at height $λ$ is dual to selecting the cubes where the average first exceeds $λ$ .

Exercises Intermediate+

Exercise 1 (easy, numeric).

On $R$ , let $f = χ_{[0, 1]}$ . Compute $M f (x)$ for $x = 2$ using the centred operator. Give a single number.

Hint

A window centred at $2$ with half-width $r$ meets $[0, 1]$ in a set of positive length only when $r > 1$, and it contains all of $[0, 1]$ once $r \geq 2$. Compare the averages in the two ranges $1 \leq r \leq 2$ and $r \geq 2$.

Answer

$1/4$. A centred window $[2 - r, 2 + r]$ reaches $[0, 1]$ only when $r \geq 1$, and at $r = 1$ the window is $[1, 3]$, overlapping $[0, 1]$ in the single point $\{1\}$ of measure zero. For $1 \leq r \leq 2$ the overlap is $[2 - r, 1]$, of length $r - 1$, so the average is $(r - 1) / (2 r) = 1/2 - 1/ (2 r)$, which increases in $r$ and reaches $1/4$ at $r = 2$. For $r \geq 2$ the window contains all of $[0, 1]$, the overlap length is $1$, and the average $1/ (2 r)$ decreases from $1/4$ toward $0$. The supremum is therefore attained at $r = 2$, the smallest window that captures the whole unit interval, and $M f (2) = 1/4$. The same argument gives $M f (x) = 1/ (2 x)$ for every $x > 1$.

Exercise 4 (medium, symbolic).

Prove the dyadic maximal function $M_{D}$ satisfies the weak-type $(1, 1)$ bound $∣ \{M_{D} f > λ\} ∣ \leq λ^{- 1} ∥ f ∥_{L^{1}}$ with constant $1$.

Hint

For each $x$ with $M_{D} f (x) > λ$ select a maximal dyadic cube $Q$ with average exceeding $λ$ ; distinct maximal cubes are disjoint.

Answer

For $x$ with $M_{D} f (x) > λ$ there is a dyadic cube $Q ∋ x$ with $\frac{1}{∣ Q ∣} \int_{Q} ∣ f ∣ > λ$. Among all such cubes containing $x$, choose one that is maximal with respect to inclusion: this is possible because the average exceeding $λ$ forces $∣ Q ∣ < λ^{- 1} ∥ f ∥_{L^{1}}$, bounding the side length, so the ascending chain of dyadic cubes through $x$ with average $> λ$ has a largest element. Let $\{Q_{k}\}$ be the collection of all such maximal cubes as $x$ ranges over $\{M_{D} f > λ\}$. By maximality and the nested-or-disjoint property of dyadic cubes, the $Q_{k}$ are pairwise disjoint, and they cover $\{M_{D} f > λ\}$. Therefore $∣ \{M_{D} f > λ\} ∣ \leq \sum_{k} ∣ Q_{k} ∣ \leq \sum_{k} \frac{1}{λ} \int_{Q_{k}} ∣ f ∣ = \frac{1}{λ} \int_{\bigcup_{k} Q_{k}} ∣ f ∣ \leq \frac{1}{λ} ∥ f ∥_{L^{1}} .$ The constant is $1$, better than the $5^{n}$ from the covering route, because the dyadic structure supplies disjointness for free without any enlargement.
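The constant-$1$ bound can be observed on a discrete model. The sketch below assumes a list of length $2^{k}$, with the dyadic cubes replaced by the dyadic blocks of indices inside the list and Lebesgue measure replaced by counting measure; the name `dyadic_maximal` is hypothetical. Block sums are built bottom-up by merging sibling blocks, which mirrors the nested-or-disjoint structure used in the argument.

```python
def dyadic_maximal(f):
    """Discrete dyadic maximal function of a list whose length is a power of two.

    Only dyadic blocks contained in the list are used (an assumption of this
    sketch); block sizes run through 1, 2, 4, ..., len(f).
    """
    n = len(f)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = [abs(v) for v in f]    # blocks of size 1
    sums = [abs(v) for v in f]   # sums over the current generation of blocks
    size = 1
    while size < n:
        size *= 2
        # Each parent block is the union of two sibling blocks.
        sums = [sums[2 * j] + sums[2 * j + 1] for j in range(len(sums) // 2)]
        for j, s in enumerate(sums):
            avg = s / size
            for i in range(j * size, (j + 1) * size):
                if avg > out[i]:
                    out[i] = avg
    return out


if __name__ == "__main__":
    f = [0, 0, 9, 1, 0, 0, 0, 2]
    mf = dyadic_maximal(f)
    for lam in (0.5, 1.0, 2.0):
        # Level-set size next to the bound ||f||_1 / lam with constant 1.
        print(lam, sum(1 for v in mf if v > lam), sum(f) / lam)
```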

Exercise 5 (medium, symbolic).

Using Marcinkiewicz interpolation against the elementary $L^{\infty}$ bound, deduce the strong-type $(p, p)$ inequality $∥ M f ∥_{L^{p}} \leq A_{n, p} ∥ f ∥_{L^{p}}$ for $1 < p \leq \infty$ .

Hint

$M$ is of weak type $(1, 1)$ and at once of strong (hence weak) type $(\infty, \infty)$ with $∥ M f ∥_{\infty} \leq ∥ f ∥_{\infty}$ . Interpolate.

Answer

The operator $M$ is sublinear: $M (f + g) \leq M f + M g$ pointwise, since the average of $∣ f + g ∣$ is at most the sum of the averages. Two endpoint controls hold. First, the weak-type $(1, 1)$ bound from the Key Theorem: $∣ {M f > λ} ∣ \leq (C_{n} / λ) ∥ f ∥_{L^{1}}$ . Second, the $L^{\infty}$ bound: every average of $∣ f ∣$ is at most $∥ f ∥_{\infty}$ , so $M f (x) \leq ∥ f ∥_{\infty}$ for all $x$ , i.e. $M$ is of strong type $(\infty, \infty)$ with constant $1$ , and a fortiori of weak type $(\infty, \infty)$ . The Marcinkiewicz interpolation theorem 02.07.06 applied to a sublinear operator of weak types $(1, 1)$ and $(\infty, \infty)$ yields strong type $(p, p)$ for every $p \in (1, \infty)$ , with norm $∥ M f ∥_{L^{p}} \leq A_{n, p} ∥ f ∥_{L^{p}}, A_{n, p} \leq (\frac{p}{p - 1})^{1/ p} \cdot 2 \cdot C_{n}^{1/ p},$ the constant blowing up like $(p - 1)^{- 1}$ as $p \to 1^{+}$ , reflecting the failure of strong type $(1, 1)$ . The endpoint $p = \infty$ is the elementary bound itself.

Exercise 6 (medium, symbolic).

Show that the centred maximal operator $M$ and the uncentred maximal operator $\widetilde{M}$ satisfy $M f \leq \widetilde{M} f \leq 2^{n} M f$ pointwise.

Hint

Every centred ball is an admissible uncentred ball; conversely a ball of radius $r$ containing $x$ lies inside $B (x, 2 r)$ .

Answer

For the left inequality, every centred ball $B (x, r)$ is a ball containing $x$, so it is one of the competitors in the uncentred supremum; taking the sup over the larger family can only increase the value, giving $M f (x) \leq \widetilde{M} f (x)$, where $\widetilde{M}$ denotes the uncentred operator.

For the right inequality, let $B = B (y, r)$ be any ball with $x \in B$, so $∣ x - y ∣ < r$. Then for $z \in B$, $∣ z - x ∣ \leq ∣ z - y ∣ + ∣ y - x ∣ < r + r = 2 r$, hence $B \subseteq B (x, 2 r)$. Therefore $\frac{1}{∣ B ∣} \int_{B} ∣ f ∣ \leq \frac{∣ B (x, 2 r) ∣}{∣ B ∣} \cdot \frac{1}{∣ B (x, 2 r) ∣} \int_{B (x, 2 r)} ∣ f ∣ = 2^{n} \cdot \frac{1}{∣ B (x, 2 r) ∣} \int_{B (x, 2 r)} ∣ f ∣ \leq 2^{n} M f (x),$ using $∣ B (x, 2 r) ∣ / ∣ B ∣ = (2 r)^{n} / r^{n} = 2^{n}$ and dropping to the centred ball $B (x, 2 r)$, which is admissible for $M$. Taking the supremum over admissible $B$ gives $\widetilde{M} f (x) \leq 2^{n} M f (x)$, where $\widetilde{M}$ is the uncentred operator.
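The two-sided comparison can be tested on a one-dimensional discrete model, where the dimensional constant is $2^{1} = 2$. The sketch assumes an integer grid with counting measure and a function extended by zero outside the sampled range; the names below are illustrative. On the grid, an interval of $L$ points containing $i$ sits inside a centred window of at most $2 L - 1$ points, which is the discrete form of $B \subseteq B (x, 2 r)$.

```python
def clipped_sum(prefix, lo, hi):
    """Sum of |f| over indices lo..hi-1, with f extended by zero outside the list."""
    n = len(prefix) - 1
    lo, hi = max(0, lo), min(n, hi)
    return prefix[hi] - prefix[lo] if hi > lo else 0


def centred_and_uncentred(f):
    """Return (centred, uncentred) discrete maximal functions of the list f."""
    n = len(f)
    prefix = [0]
    for v in f:
        prefix.append(prefix[-1] + abs(v))
    centred, uncentred = [], []
    for i in range(n):
        # Centred windows {i - r, ..., i + r}; parts outside the list contribute zero.
        centred.append(max(clipped_sum(prefix, i - r, i + r + 1) / (2 * r + 1)
                           for r in range(n)))
        # Uncentred windows: every index interval [a, b] with a <= i <= b.
        uncentred.append(max((prefix[b + 1] - prefix[a]) / (b - a + 1)
                             for a in range(i + 1) for b in range(i, n)))
    return centred, uncentred


if __name__ == "__main__":
    f = [0, 0, 0, 5, 1, 0, 0, 0, 0, 3]
    centred, uncentred = centred_and_uncentred(f)
    print(all(c <= u <= 2 * c for c, u in zip(centred, uncentred)))
```

Uncentred windows that stick out of the list are omitted on purpose: clipping such a window to the list keeps the sum and shortens the window, so the clipped interval always gives an average at least as large.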

Exercise 7 (hard, symbolic).

Prove the Lebesgue differentiation theorem: for $f \in L_{loc}^{1} (R^{n})$, $\lim_{r \to 0^{+}} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ f (y) - f (x) ∣ \, d y = 0 \quad \text{for a.e. } x .$

Hint

Bound the oscillation $Ω f (x) = \limsup_{r \to 0} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ f - f (x) ∣$ using a continuous approximant $g$, the maximal function of $f - g$, and the weak-(1,1) inequality.

Answer

It suffices to work on a ball $B (0, N)$ and treat $f \in L^{1}$. Define the oscillation $Ω f (x) = \limsup_{r \to 0^{+}} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ f (y) - f (x) ∣ \, d y .$ We show $Ω f = 0$ a.e. Fix $ε > 0$. Since $C_{c} (R^{n})$ is dense in $L^{1}$ 02.07.06, choose continuous compactly supported $g$ with $∥ f - g ∥_{L^{1}} < ε$. For continuous $g$, $Ω g (x) = 0$ everywhere, because the average of $∣ g (y) - g (x) ∣$ over $B (x, r)$ tends to $0$ as $r \to 0$ by continuity. Writing $f = g + (f - g)$ and using the triangle inequality inside the average, $Ω f (x) \leq Ω g (x) + \limsup_{r \to 0} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ (f - g) (y) ∣ \, d y + ∣ (f - g) (x) ∣ \leq M (f - g) (x) + ∣ (f - g) (x) ∣.$ For $α > 0$, the set $\{Ω f > α\}$ is contained in $\{M (f - g) > α /2\} \cup \{∣ f - g ∣ > α /2\}$. By the weak-(1,1) inequality (Key Theorem) and Chebyshev's inequality, $∣ \{Ω f > α\} ∣ \leq \frac{C_{n}}{α /2} ∥ f - g ∥_{L^{1}} + \frac{1}{α /2} ∥ f - g ∥_{L^{1}} = \frac{2 (C_{n} + 1)}{α} ∥ f - g ∥_{L^{1}} < \frac{2 (C_{n} + 1) ε}{α} .$ Since $ε > 0$ is arbitrary, $∣ \{Ω f > α\} ∣ = 0$ for every $α > 0$, hence $Ω f = 0$ a.e., which is the assertion. The fundamental theorem of calculus on the line 02.04.04 is the $n = 1$ special case for the centred symmetric derivative of the integral.

Exercise 8 (hard, symbolic).

Prove that almost every point of a measurable set $E \subseteq R^{n}$ of finite measure is a point of density one: $\lim_{r \to 0} \frac{∣ E \cap B (x, r) ∣}{∣ B (x, r) ∣} = 1$ for a.e. $x \in E$, and the limit is $0$ for a.e. $x \notin E$.

Hint

Apply the Lebesgue differentiation theorem to $f = χ_{E}$ .

Answer

Take $f = χ_{E} \in L^{1} (R^{n})$ (finite measure). For a Lebesgue point $x$ of $f$, Exercise 7 gives $\lim_{r \to 0} \frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} ∣ χ_{E} (y) - χ_{E} (x) ∣ \, d y = 0.$ The average $\frac{1}{∣ B (x, r) ∣} \int_{B (x, r)} χ_{E} = \frac{∣ E \cap B (x, r) ∣}{∣ B (x, r) ∣}$ is the local density of $E$ at $x$. By the Lebesgue point conclusion this density converges to $χ_{E} (x)$ as $r \to 0$. For $x \in E$ (with $χ_{E} (x) = 1$) the limit is $1$; for $x \notin E$ (with $χ_{E} (x) = 0$) the limit is $0$. The differentiation theorem applies to almost every $x$, so almost every point of $E$ has density $1$ and almost every point of the complement has density $0$. This Lebesgue density theorem shows measurable sets have, up to null sets, no genuinely fuzzy boundary at the infinitesimal scale, and underlies the approximate-continuity and approximate-differentiability theory.
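For an interval the local density can be written down exactly, which makes the "almost every" in the statement concrete. The sketch below assumes $E = [a, b]$ on the line; the function name `density` is hypothetical. Interior points have density $1$ for small $r$, exterior points have density $0$, and the two endpoints, a set of measure zero, have density $1/2$.

```python
def density(x, r, a=0.0, b=1.0):
    """Local density |E ∩ B(x, r)| / |B(x, r)| of E = [a, b] on the line."""
    overlap = max(0.0, min(b, x + r) - max(a, x - r))
    return overlap / (2.0 * r)


if __name__ == "__main__":
    for r in (0.1, 0.01, 0.001):
        # Interior point, endpoint, exterior point of E = [0, 1].
        print(r, density(0.5, r), density(0.0, r), density(2.0, r))
```

The endpoint shows that a point of $E$ can fail to be a density point; the theorem only asserts that such exceptional points form a null set.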

Advanced results Master

Theorem 1 (strong-type $(p, p)$ bound; Hardy-Littlewood 1930 Acta Math. 54, 81; Wiener 1939). For $1 < p \leq \infty$ the operator $M$ is bounded on $L^{p} (R^{n})$: there is $A_{n, p}$ with $∥ M f ∥_{L^{p}} \leq A_{n, p} ∥ f ∥_{L^{p}}$. For $p = \infty$ one has $A_{n, \infty} = 1$. The bound fails at $p = 1$: $M$ is of weak type $(1, 1)$ but not strong type $(1, 1)$, since $M f \notin L^{1}$ whenever $f \not\equiv 0$ (the tail decay $M f (x) ≳ ∥ f ∥_{L^{1}} ∣ x ∣^{- n}$ at infinity is never integrable). The boundedness follows by Marcinkiewicz interpolation 02.07.06 between the weak-$(1, 1)$ endpoint and the elementary $L^{\infty}$ endpoint ^{[Hardy-Littlewood 1930]}.

Theorem 2 (local $L \log L$ integrability of $M f$; Stein 1969). On a bounded set, $M f$ is integrable if and only if $∣ f ∣ \log^{+} ∣ f ∣$ is integrable: for a ball $B$, $\int_{B} M f \, d x < \infty \iff \int_{B} ∣ f ∣ \log^{+} ∣ f ∣ \, d x < \infty,$ with the quantitative two-sided estimate $\int_{B} M f \approx ∥ f ∥_{L \log L (2 B)} + ∥ f ∥_{L^{1}}$. This identifies the Zygmund class $L \log L$ as the precise local integrability threshold for the maximal function, the endpoint refinement of the failure of strong type $(1, 1)$ ^{[Stein 1970]}.

Theorem 3 (Besicovitch covering theorem; Besicovitch 1945 Proc. Cambridge Philos. Soc. 41, 103). There is a constant $N_{n}$ depending only on $n$ with the following property. Let $A \subseteq R^{n}$ be bounded and let each $x \in A$ carry a ball $B (x, r_{x})$ . Then there is a countable subfamily covering $A$ that decomposes into at most $N_{n}$ subfamilies, each consisting of pairwise disjoint balls. Unlike the Vitali $5 r$ -lemma, Besicovitch enlarges no ball; the price is bounded overlap rather than disjointness, and the constant $N_{n}$ is geometric (a packing number of the sphere). Besicovitch's theorem is what permits the maximal-function and differentiation theory for arbitrary Radon measures $μ$ in place of Lebesgue measure, where the doubling property may fail ^{[Besicovitch 1945]}.

Theorem 4 (differentiation of measures via Besicovitch). Let $μ, ν$ be Radon measures on $R^{n}$ with $ν$ finite. The symmetric derivative $D_{μ} ν (x) = lim_{r \to 0} \frac{ν ( B ( x , r ))}{μ ( B ( x , r ))}$ exists $μ$ -a.e. and equals the Radon-Nikodym density of the absolutely continuous part of $ν$ with respect to $μ$ ; the singular part of $ν$ concentrates on the set where $D_{μ} ν = + \infty$ . The proof replaces the Vitali lemma by the Besicovitch covering theorem to build the $μ$ -weak-type estimate for the $μ$ -maximal operator $M_{μ} ν (x) = sup_{r} ν (B (x, r)) / μ (B (x, r))$ , then runs the density argument of Exercise 7 against $μ$ ^{[Besicovitch 1945]}.

Theorem 5 (the dyadic maximal function dominates after a shift; one-third trick). There exist $3^{n}$ translated dyadic lattices $D^{(1)}, \dots, D^{(3^{n})}$ such that every ball $B \subseteq R^{n}$ is contained in some dyadic cube $Q$ from one of these lattices with $∣ Q ∣ \leq C_{n} ∣ B ∣$ . Consequently the centred maximal function is pointwise comparable to a maximum of finitely many dyadic maximal functions: $M f (x) \leq C_{n} max_{i} M_{D^{(i)}} f (x)$ . This reduces every $L^{p}$ and weak- $(1, 1)$ statement about $M$ to the constant- $1$ dyadic estimate of Exercise 4, bypassing the geometric covering lemmas entirely and giving the cleanest route to sharp constants ^{[Stein 1993]}.
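The mechanism behind the one-third trick can be seen in a restricted one-dimensional sketch. The assumptions here are stronger than in the theorem and are chosen only for illustration: two lattices (the standard dyadic intervals and their translate by $1/3$), intervals $[a, b)$ of length below $1/3$, and exact rational arithmetic. At every scale $2^{- k}$ with $k \geq 0$ the endpoints of the two lattices stay at least $2^{- k} /3$ apart, so a short interval cannot straddle an endpoint of both. The function name `dyadic_cover` is hypothetical.

```python
from fractions import Fraction
import math

# Shifts of the two lattices used in this sketch (an illustrative choice).
SHIFTS = (Fraction(0), Fraction(1, 3))


def dyadic_cover(a, b):
    """Find a shifted dyadic interval containing [a, b), for 0 < b - a < 1/3.

    Returns (shift, k, left): the interval [left, left + 2**-k) belongs to the
    dyadic lattice translated by `shift`, contains [a, b), and has length at
    most 6 * (b - a).
    """
    length = b - a
    assert 0 < length < Fraction(1, 3)
    k = 0
    # Largest k >= 0 with 2**-k / 3 > length.
    while Fraction(1, 3 * 2 ** (k + 1)) > length:
        k += 1
    h = Fraction(1, 2 ** k)
    for shift in SHIFTS:
        left = shift + math.floor((a - shift) / h) * h
        if b <= left + h:
            return shift, k, left
    raise AssertionError("not reached for lengths below 1/3")


if __name__ == "__main__":
    a, b = Fraction(63, 64), Fraction(65, 64)  # straddles the dyadic point 1
    print(dyadic_cover(a, b))
```

Handling all scales, including intervals longer than $1/3$, needs scale-dependent shifts; that refinement is outside this sketch.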

Theorem 6 (vector-valued and Fefferman-Stein extensions). The maximal operator obeys the Fefferman-Stein vector-valued inequality $∥ (\sum_{j} ∣ M f_{j} ∣^{q})^{1/ q} ∥_{L^{p}} \leq C_{n, p, q} ∥ (\sum_{j} ∣ f_{j} ∣^{q})^{1/ q} ∥_{L^{p}}$ for $1 < p, q < \infty$, and the Fefferman-Stein weighted inequality $\int (M f)^{p} w \leq C \int ∣ f ∣^{p} M w$. These promote the scalar theory to the vector-valued and weighted settings that drive the modern theory of Littlewood-Paley square functions and $A_{p}$ weights ^{[Stein 1993]}.

Synthesis. The maximal function is the foundational reason that almost-everywhere convergence statements reduce to a single quantitative inequality, and this is exactly the structural device that organises the entire Calderón-Zygmund program. The central insight is a dictionary between three superficially different tools — the Vitali $5 r$ -covering, the dyadic stopping-time selection, and the Besicovitch bounded-overlap covering — each of which converts a supremum over a continuum of scales into a disjoint or finitely-overlapping family whose measure is controlled by $∥ f ∥_{L^{1}} / λ$ . Putting these together, the weak-(1,1) bound is dual to the Calderón-Zygmund decomposition: selecting the cubes where the maximal average first crosses height $λ$ is the same act as splitting $f$ into a bounded good part and a mean-zero bad part, and this is the bridge from the maximal function to the boundedness of singular integral operators. The pattern generalises in three directions that recur throughout harmonic analysis: vertically, from Lebesgue measure to arbitrary Radon measures via Besicovitch (Theorem 4); horizontally, from scalar to vector-valued and weighted estimates via Fefferman-Stein (Theorem 6); and structurally, from the geometric covering lemmas to the purely combinatorial dyadic model via the one-third trick (Theorem 5), which is dual to the martingale maximal inequality of probability and is the central insight unifying the real-variable and probabilistic faces of the subject.

Full proof set Master

Proposition 1 (lower semicontinuity of $M f$ ). For $f \in L_{loc}^{1} (R^{n})$ the function $M f$ is lower semicontinuous; in particular ${M f > λ}$ is open for every $λ$ .

Proof. Fix $λ$ and $x_{0}$ with $M f (x_{0}) > λ$ . Choose $r$ with $\frac{1}{∣ B ( x _{0} , r ) ∣} \int_{B (x_{0}, r)} ∣ f ∣ > λ$ . The map $x \mapsto \int_{B (x, r)} ∣ f ∣ d y = \int ∣ f (y) ∣ χ_{B (0, r)} (x - y) d y = (∣ f ∣ * χ_{B (0, r)}) (x)$ is a convolution of an $L_{loc}^{1}$ function with an $L^{1}$ function of compact support, hence continuous in $x$ . Since $∣ B (x, r) ∣$ is constant in $x$ , the average $A_{r} (x) = \frac{1}{∣ B ( x , r ) ∣} \int_{B (x, r)} ∣ f ∣$ is continuous, so $A_{r} (x) > λ$ on a neighbourhood of $x_{0}$ . On that neighbourhood $M f \geq A_{r} > λ$ , proving ${M f > λ}$ is open. $□$

Proposition 2 (sublinearity and homogeneity). $M (f + g) \leq M f + M g$ and $M (c f) = ∣ c ∣ M f$ for scalars $c$ .

Proof. For each ball $B (x, r)$ , $\frac{1}{∣ B ∣} \int_{B} ∣ f + g ∣ \leq \frac{1}{∣ B ∣} \int_{B} ∣ f ∣ + \frac{1}{∣ B ∣} \int_{B} ∣ g ∣ \leq M f (x) + M g (x)$ by the triangle inequality for $∣ \cdot ∣$ and monotonicity of the integral. Taking the supremum over $r$ gives $M (f + g) (x) \leq M f (x) + M g (x)$ . Homogeneity is immediate from $∣ c f ∣ = ∣ c ∣ ∣ f ∣$ pulled out of the average. $□$

Proposition 3 (weak-(1,1) bound; restatement with the $5^{n}$ constant). $∣ {M f > λ} ∣ \leq 5^{n} λ^{- 1} ∥ f ∥_{L^{1}}$ .

Proof. This is the Key Theorem; the proof there reduces to a compact subset $K$ , extracts a finite subcover of heavy balls, applies the finite Vitali $5 r$ -lemma to obtain a disjoint subfamily whose $5$ -fold enlargements cover $K$ , and sums the heaviness estimates over the disjoint family. The disjointness collapses the sum of integrals into $\int_{⋃ B_{j_{k}}} ∣ f ∣ \leq ∥ f ∥_{L^{1}}$ , yielding $∣ K ∣ \leq 5^{n} λ^{- 1} ∥ f ∥_{L^{1}}$ ; inner regularity passes to $∣ {M f > λ} ∣$ . $□$

Proposition 4 (distribution-function form of strong $(p, p)$ ). For $1 < p < \infty$ and $f \in L^{p}$ , $\int_{R^{n}} (M f)^{p} d x = p \int_{0}^{\infty} λ^{p - 1} ∣ {M f > λ} ∣ d λ \leq A_{n, p}^{p} ∥ f ∥_{L^{p}}^{p} .$

Proof. The first equality is the layer-cake (Cavalieri) formula for the $L^{p}$ -norm of a non-negative measurable function. To bound the right side, split $f = f χ_{{∣ f ∣ > λ /2}} + f χ_{{∣ f ∣ \leq λ /2}} =: f_{1}^{λ} + f_{2}^{λ}$ . The second piece has $∥ f_{2}^{λ} ∥_{\infty} \leq λ /2$ , so $M f_{2}^{λ} \leq λ /2$ pointwise, hence ${M f > λ} \subseteq {M f_{1}^{λ} > λ /2}$ by sublinearity (Proposition 2). Apply weak-(1,1) to $f_{1}^{λ}$ : $∣ {M f > λ} ∣ \leq \frac{2 C _{n}}{λ} \int_{{∣ f ∣ > λ /2}} ∣ f ∣ d x .$ Insert this into the layer-cake integral and exchange the order of integration via Tonelli 02.07.07: $\int (M f)^{p} \leq p \int_{0}^{\infty} λ^{p - 1} \cdot \frac{2 C _{n}}{λ} \int_{{∣ f ∣ > λ /2}} ∣ f ∣ d x d λ = 2 C_{n} p \int_{R^{n}} ∣ f (x) ∣ \int_{0}^{2∣ f (x) ∣} λ^{p - 2} d λ d x .$ The inner $λ$ -integral is $\frac{( 2∣ f ( x ) ∣ ) ^{p - 1}}{p - 1}$ (here $p > 1$ makes it converge at $0$ ), giving $\int (M f)^{p} \leq \frac{2 C _{n} p}{p - 1} 2^{p - 1} \int_{R^{n}} ∣ f (x) ∣^{p} d x = \frac{2 ^{p} C _{n} p}{p - 1} ∥ f ∥_{L^{p}}^{p} .$ Thus $A_{n, p}^{p} \leq 2^{p} C_{n} p / (p - 1)$ , exhibiting the $(p - 1)^{- 1}$ blow-up as $p \to 1^{+}$ . $□$
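The closing bound $A_{n, p}^{p} \leq 2^{p} C_{n} p / (p - 1)$ is easy to tabulate. The sketch below evaluates its $p$-th root, assuming the covering constant $C_{n} = 5^{n}$ from the weak-type theorem unless another value is passed in; the function name and the `weak_const` parameter are hypothetical.

```python
def strong_type_bound(p, n=1, weak_const=None):
    """Upper bound 2 * (C_n * p / (p - 1)) ** (1 / p) for the L^p norm of M.

    C_n defaults to 5 ** n, the constant obtained from the 5r-covering lemma;
    pass weak_const to use a different weak-type (1, 1) constant.
    """
    if p <= 1:
        raise ValueError("the bound requires p > 1")
    c = 5.0 ** n if weak_const is None else float(weak_const)
    return 2.0 * (c * p / (p - 1.0)) ** (1.0 / p)


if __name__ == "__main__":
    for p in (1.01, 1.1, 1.5, 2.0, 4.0, 16.0):
        print(p, strong_type_bound(p, n=1))
```

The values grow without bound as $p \to 1^{+}$, matching the $(p - 1)^{- 1}$ blow-up. As $p \to \infty$ the bound tends to $2$ rather than the true $L^{\infty}$ constant $1$, because the splitting at height $λ /2$ in the proof costs a factor of $2$.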

Proposition 5 (Lebesgue points). For $f \in L_{loc}^{1} (R^{n})$ , almost every $x$ is a Lebesgue point: $lim_{r \to 0} \frac{1}{∣ B ( x , r ) ∣} \int_{B (x, r)} ∣ f (y) - f (x) ∣ d y = 0$ .

Proof. This is Exercise 7: bound the oscillation $Ω f$ by $M (f - g) + ∣ f - g ∣$ for a continuous approximant $g$ with $∥ f - g ∥_{L^{1}} < ε$ , then use weak-(1,1) and Chebyshev to force $∣ {Ω f > α} ∣ < 2 (C_{n} + 1) ε / α$ for all $ε$ , hence $Ω f = 0$ a.e. The set of Lebesgue points is exactly ${Ω f = 0}$ , of full measure. $□$

Proposition 6 (a.e. convergence of approximate identities). Let $φ \in L^{1} (R^{n})$ have a radially decreasing integrable majorant $ψ (x) = Ψ (∣ x ∣)$ with $Ψ$ non-increasing and $\int ψ < \infty$ , and set $φ_{t} (x) = t^{- n} φ (x / t)$ . Then for $f \in L^{p}$ ( $1 \leq p < \infty$ ), $(f * φ_{t}) (x) \to (\int φ) f (x)$ as $t \to 0$ for almost every $x$ , and the maximal operator $sup_{t} ∣ f * φ_{t} ∣$ is controlled by $(\int ψ) M f$ .

Proof. The pointwise bound $\sup_{t > 0} ∣ (f * φ_{t}) (x) ∣ \leq (\int ψ) M f (x)$ follows by the standard layer-cake estimate: a radially decreasing $L^{1}$ kernel is a superposition $ψ = \int_{0}^{\infty} \frac{χ_{B (0, s)}}{∣ B (0, s) ∣} \, d μ (s)$ of normalised ball-indicators against a positive measure $μ$ of total mass $\int ψ$, and the same representation holds for each dilate $ψ_{t}$ with the same total mass. Convolution of $∣ f ∣$ against each $χ_{B (0, s)} / ∣ B (0, s) ∣$ is an average bounded by $M f (x)$; integrating against $μ$ and using $∣ φ ∣ \leq ψ$ gives the claim. This furnishes the maximal control; combined with a.e. convergence on the dense class $C_{c}$ (where it is uniform) and the weak-(1,1) bound for the dominating maximal operator, the standard density argument upgrades to a.e. convergence for all $f \in L^{p}$. $□$

Connections Master

$L^{p}$ spaces, Hölder, Minkowski, Riesz-Fischer completeness 02.07.06. The direct prerequisite carrying both the function-space framework and the Marcinkiewicz interpolation theorem used to pass from the weak- $(1, 1)$ endpoint and the $L^{\infty}$ endpoint to the strong-type $(p, p)$ boundedness of $M$ for $1 < p < \infty$ . The density of $C_{c}$ in $L^{1}$ , proved there, is the approximation input to the Lebesgue differentiation theorem; the layer-cake distribution-function machinery of the $L^{p}$ chapter is the bookkeeping behind Proposition 4.
Fubini-Tonelli and product measures 02.07.07. The direct prerequisite supplying the Tonelli interchange that converts the layer-cake integral $\int (M f)^{p} = p \int λ^{p - 1} ∣ \{M f > λ\} ∣ \, d λ$ into an integral over $\mathbb{R}^{n}$ after inserting the weak-type bound for the truncated function $f_{1}^{λ}$. Tonelli also justifies writing the ball-average as a convolution $∣ f ∣ * χ_{B (0, r)}$ in the proof of lower semicontinuity.
Fundamental theorems of calculus 02.04.04. The one-dimensional ancestor: the Lebesgue differentiation theorem is the $n$ -dimensional, measure-theoretic completion of the statement that the derivative of $\int_{a}^{x} f$ recovers $f$ . On the line the centred symmetric difference quotient of the indefinite integral is exactly the centred ball-average, so the FTC for Lebesgue-integrable $f$ is the $n = 1$ case of Proposition 5.
Calderón-Zygmund decomposition and singular integrals [forward: 02.19.02]. The principal successor. The weak- $(1, 1)$ proof and the dyadic stopping-time selection are the two halves of the Calderón-Zygmund decomposition: at height $λ$ one selects the maximal dyadic cubes where the average of $∣ f ∣$ exceeds $λ$ , splits $f$ into a bounded good part and a mean-zero bad part supported on those cubes, and the maximal-function bound controls the good part while cancellation controls the bad part. Every singular-integral boundedness theorem in the chapter routes through this decomposition.
Marcinkiewicz interpolation 02.07.06. The lateral tool. The maximal operator is the canonical example for which real interpolation is essential rather than convenient: $M$ is of weak type $(1, 1)$ but genuinely fails strong type $(1, 1)$ , so the Riesz-Thorin complex method (which interpolates strong-type endpoints) does not apply, and only the Marcinkiewicz real method, which accepts weak-type endpoints, delivers the $L^{p}$ bounds.
Ergodic maximal theorem and martingale maximal inequalities [forward: 37.02.03]. The structural cousin. Wiener's 1939 maximal ergodic theorem and Doob's martingale maximal inequality are the dynamical and probabilistic avatars of the Hardy-Littlewood inequality: the dyadic maximal function with its constant- $1$ weak-type bound is precisely the martingale maximal function for the dyadic filtration, and the one-third trick (Theorem 5) is the bridge identifying the real-variable and probabilistic theories.
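
The failure of strong type $(1, 1)$ mentioned in the Marcinkiewicz item above can be checked by hand on the indicator of $[- 1, 1]$ on the line, whose centred maximal function equals $1$ on $(- 1, 1)$ and $1 / (∣ x ∣ + 1)$ for $∣ x ∣ \geq 1$. The sketch below compares that closed form with a brute-force supremum over a grid of radii and evaluates the partial integrals exactly. All function names are hypothetical, and the radius grid is an assumption of the illustration.

```python
import math


def maximal_of_indicator(x: float) -> float:
    """Closed form of the centred maximal function of the indicator of [-1, 1]:
    1 inside (-1, 1), and 1 / (|x| + 1) for |x| >= 1."""
    return 1.0 if abs(x) < 1 else 1.0 / (abs(x) + 1.0)


def maximal_brute_force(x: float, r_max: float = 50.0, step: float = 0.01) -> float:
    """Approximate the sup over r of |(x - r, x + r) intersect [-1, 1]| / (2r)
    by scanning a uniform grid of radii (an approximation, not the exact sup)."""
    best = 0.0
    r = step
    while r <= r_max:
        overlap = max(0.0, min(x + r, 1.0) - max(x - r, -1.0))
        best = max(best, overlap / (2.0 * r))
        r += step
    return best


def partial_integral(big_r: float) -> float:
    """Exact integral of the closed form over [-R, R] for R >= 1:
    2 + 2 * log((R + 1) / 2)."""
    return 2.0 + 2.0 * math.log((big_r + 1.0) / 2.0)


for big_r in (1e1, 1e3, 1e6):
    print(f"R = {big_r:.0e}  integral of Mf over [-R, R] = {partial_integral(big_r):.3f}")
```

The indicator has $L^{1}$ norm $2$, while the partial integrals of its maximal function grow like $2 \log R$, so no inequality of the form $∥ M f ∥_{L^{1}} \leq C ∥ f ∥_{L^{1}}$ can hold.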

Historical & philosophical context Master

The maximal function was introduced by Godfrey Harold Hardy and John Edensor Littlewood in their 1930 Acta Mathematica paper A maximal theorem with function-theoretic applications ^{[Hardy-Littlewood 1930]}, where it arose not from real analysis but from the theory of analytic functions: their motivating problem concerned the boundary behaviour of functions in Hardy spaces $H^{p}$ on the disc, and they framed the one-dimensional maximal operator through a now-famous cricket analogy of a batsman computing his best possible running average. Their original setting was the circle and the line; the operator was a device for dominating boundary maximal functions of harmonic extensions by an averaging operator on the boundary.

Norbert Wiener, in his 1939 Duke Mathematical Journal paper The ergodic theorem ^{[Wiener 1939]}, extended the maximal inequality to $\mathbb{R}^{n}$ and recognised the covering-lemma mechanism as the geometric heart of the estimate, connecting it to the pointwise ergodic theorem of Birkhoff. The covering principle itself predates both: Giuseppe Vitali's 1908 Atti della Accademia delle Scienze di Torino note ^{[Vitali 1908]} established the covering theorem in the course of studying differentiation, and Henri Lebesgue's 1910 Annales de l'École Normale Supérieure memoir ^{[Lebesgue 1910]} proved the differentiation theorem that the maximal inequality streamlines. Abram Besicovitch's 1945 Proceedings of the Cambridge Philosophical Society covering theorem ^{[Besicovitch 1945]} removed the enlargement factor at the cost of bounded overlap, the technical advance that freed the differentiation theory from the doubling hypothesis and extended it to arbitrary Radon measures.

The interpolation viewpoint that makes the $L^{p}$ theory clean is due to Józef Marcinkiewicz, whose 1939 Comptes Rendus announcement ^{[Marcinkiewicz 1939]} of the real-interpolation theorem appeared shortly before his death in the Katyn massacre in 1940; the detailed theory was reconstructed and published by Antoni Zygmund in 1956. The synthesis into the modern real-variable method belongs to Elias Stein, whose 1970 monograph Singular Integrals and Differentiability Properties of Functions ^{[Stein 1970]} placed the maximal function at the entrance to the Calderón-Zygmund theory and made the weak-(1,1) inequality the organising endpoint estimate of twentieth-century harmonic analysis.

Bibliography Master

@article{HardyLittlewood1930,
  author  = {Hardy, G. H. and Littlewood, J. E.},
  title   = {A maximal theorem with function-theoretic applications},
  journal = {Acta Mathematica},
  volume  = {54},
  year    = {1930},
  pages   = {81--116}
}

@article{Wiener1939,
  author  = {Wiener, Norbert},
  title   = {The ergodic theorem},
  journal = {Duke Mathematical Journal},
  volume  = {5},
  year    = {1939},
  pages   = {1--18}
}

@article{Vitali1908,
  author  = {Vitali, Giuseppe},
  title   = {Sui gruppi di punti e sulle funzioni di variabili reali},
  journal = {Atti della Accademia delle Scienze di Torino},
  volume  = {43},
  year    = {1908},
  pages   = {75--92}
}

@article{Lebesgue1910,
  author  = {Lebesgue, Henri},
  title   = {Sur l'int\'egration des fonctions discontinues},
  journal = {Annales Scientifiques de l'\'Ecole Normale Sup\'erieure},
  volume  = {27},
  year    = {1910},
  pages   = {361--450}
}

@article{Besicovitch1945,
  author  = {Besicovitch, A. S.},
  title   = {A general form of the covering principle and relative differentiation of additive functions},
  journal = {Proceedings of the Cambridge Philosophical Society},
  volume  = {41},
  year    = {1945},
  pages   = {103--110}
}

@article{Marcinkiewicz1939,
  author  = {Marcinkiewicz, J\'ozef},
  title   = {Sur l'interpolation d'op\'erations},
  journal = {Comptes Rendus de l'Acad\'emie des Sciences Paris},
  volume  = {208},
  year    = {1939},
  pages   = {1272--1273}
}

@book{Stein1970,
  author    = {Stein, Elias M.},
  title     = {Singular Integrals and Differentiability Properties of Functions},
  publisher = {Princeton University Press},
  year      = {1970}
}

@book{Stein1993,
  author    = {Stein, Elias M.},
  title     = {Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals},
  publisher = {Princeton University Press},
  year      = {1993}
}

@book{Grafakos2014,
  author    = {Grafakos, Loukas},
  title     = {Classical Fourier Analysis},
  edition   = {3},
  publisher = {Springer},
  year      = {2014}
}

@book{SteinShakarchi2005,
  author    = {Stein, Elias M. and Shakarchi, Rami},
  title     = {Real Analysis: Measure Theory, Integration, and Hilbert Spaces},
  publisher = {Princeton University Press},
  year      = {2005}
}

Prerequisites

02.07.06
02.07.07
02.04.04

Tier anchors

beginner: Stein-Shakarchi 2005 *Real Analysis* (Princeton) Ch. 3, §1; informal averaging-radius picture
intermediate: Stein 1970 *Singular Integrals and Differentiability Properties of Functions* (Princeton) Ch. I §1; Folland 1999 *Real Analysis* 2e (Wiley) §3.4
master: Stein 1993 *Harmonic Analysis* (Princeton) Ch. I-II; Grafakos 2014 *Classical Fourier Analysis* 3e (Springer) §2.1; Stein 1970 *Singular Integrals* (Princeton) Ch. I

References

Hardy-Littlewood — A maximal theorem with function-theoretic applications · Acta Mathematica 54 (1930), 81-116
Wiener — The ergodic theorem · Duke Math. J. 5 (1939), 1-18
Vitali — Sui gruppi di punti e sulle funzioni di variabili reali · Atti Accad. Sci. Torino 43 (1908), 75-92
Lebesgue — Sur l'intégration des fonctions discontinues · Ann. Sci. École Norm. Sup. (3) 27 (1910), 361-450
Marcinkiewicz — Sur l'interpolation d'opérations · C. R. Acad. Sci. Paris 208 (1939), 1272-1273
Stein — Singular Integrals and Differentiability Properties of Functions · Ch. I, §1, the maximal function
Stein — Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals · Ch. I-II
Grafakos — Classical Fourier Analysis, 3e · §2.1, maximal functions
Besicovitch — A general form of the covering principle and relative differentiation of additive functions · Proc. Cambridge Philos. Soc. 41 (1945), 103-110

Estimated time

beginner: 18m
intermediate: 55m
master: 90m