The Hardy-Littlewood Maximal Function and the Vitali Covering Lemma
Anchor (Master): Stein 1993 *Harmonic Analysis* (Princeton) Ch. I-II; Grafakos 2014 *Classical Fourier Analysis* 3e (Springer) §2.1; Stein 1970 *Singular Integrals* (Princeton) Ch. I
Intuition Beginner
Suppose you have a temperature reading at every point along a metal rod, and the readings are noisy. A natural way to smooth them is to replace the value at each point by the average of the readings over a small window around that point. If the window is wide the average is very smooth; if it is narrow the average tracks the original closely. The maximal function asks a different question: across all possible window sizes centred at a point, what is the largest average of the absolute readings you can get? That single number measures how big the function looks near that point, in the most generous averaging you are allowed.
Why bother with the worst-case window rather than one fixed window? Because many questions in analysis hinge on controlling averages at every scale at once. If you want to know that the small-window averages settle down to the true value as the window shrinks, the clean way is to first control the supremum over all windows, then push the window to zero. The maximal function is the tool that bundles every scale into one object you can estimate.
The key fact is that this worst-case-average operation cannot blow up too badly. Even though taking a supremum over infinitely many windows looks dangerous, the set of points where the maximal function exceeds a height stays small in a precise, measurable way. That smallness is what lets the maximal function recover one of the founding theorems of integration theory: that the averages of an integrable function over shrinking balls converge back to the function almost everywhere.
The one-sentence takeaway: the maximal function records the largest averaging-window value at each point, and the surprising control on how often it is large is the engine behind almost-everywhere convergence and the whole later theory of singular integrals.
Visual Beginner
Picture a bump-shaped function on a line: zero far out, rising to a peak in the middle. Centre a window of half-width $r$ at a point $x$ and compute the average of the function's height over that window. As you grow $r$ from very small to very large, the average rises, reaches a best value, then falls off again because the window starts swallowing the flat zero region. The maximal function value at $x$ is that best average over all choices of $r$.
The covering picture is the companion idea. Suppose many windows overlap in a tangle. The covering lemma says you can throw away the redundant ones and keep a non-overlapping handful that, once each is grown by a fixed factor of five, still covers everything the original tangle did. That trade — disjointness now, controlled regrowth later — is what keeps the total size bookkeeping honest.
Worked example Beginner
We compute the maximal function of the simplest informative function on the line: the indicator of an interval.
Step 1. Let be on the interval from to and everywhere else. Pick a point , which sits to the right of the interval. We will find the largest window average of centred at .
Step 2. A window centred at with half-width runs from to , a window of total length . This window starts catching the interval only when reaches down to , that is when is at least . For such , the window overlaps the part of the interval from up to the smaller of and , which is all of it once , giving an overlap length of .
Step 3. The average over the window is the overlap length divided by the window length: divided by , which is . This is largest when is as small as allowed, namely , giving an average of .
Step 4. For windows with between and larger values the average only decreases, and for below the window misses the interval entirely and the average is . So the best average is , and the maximal function of at the point equals .
What this tells us: even at a point where the original function is zero, the maximal function is positive because some window reaches the mass nearby. The value decays like $1/|x|$ as the point $x$ moves far from the interval, which is exactly the borderline rate that makes the maximal function of an integrable function fail to be integrable itself while still being controlled in the weak measure-of-large-values sense.
Check your understanding Beginner
Formal definition Intermediate+
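Throughout, $B(x,r)$ denotes the open Euclidean ball of centre $x$ and radius $r$ in $\mathbb{R}^n$, $|E|$ denotes the Lebesgue measure of a measurable set $E$, and $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ means $f$ is Lebesgue measurable and integrable over every ball.
Definition (centred Hardy-Littlewood maximal operator). For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ the centred maximal function is
$$Mf(x) = \sup_{r>0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y)|\,dy.$$
The value lies in $[0,\infty]$ and is the supremum over all radii of the ball-average of $|f|$ centred at $x$.
Definition (uncentred maximal operator). The uncentred maximal function replaces the centred balls by all balls containing $x$:
$$\widetilde{M}f(x) = \sup_{B \ni x} \frac{1}{|B|} \int_{B} |f(y)|\,dy,$$
the supremum taken over every open ball $B$ (of any centre) with $x \in B$. Since every centred ball is one admissible $B$, one has $Mf \le \widetilde{M}f$; conversely a ball $B$ of radius $r$ containing $x$ is contained in $B(x,2r)$, whose volume is $2^n$ times larger, giving $\widetilde{M}f \le 2^n Mf$. The two operators are therefore comparable, and every boundedness statement transfers between them up to the dimensional constant $2^n$.
Definition (weak-type $(1,1)$). A sublinear operator $T$ is of weak type $(1,1)$ if there is a constant $C$ with
$$|\{x : |Tf(x)| > \lambda\}| \le \frac{C}{\lambda}\,\|f\|_{L^1}$$
for all $f \in L^1$ and all $\lambda > 0$. This is strictly weaker than the strong type $(1,1)$ bound $\|Tf\|_{L^1} \le C\|f\|_{L^1}$, which $M$ does not satisfy.
Definition (dyadic maximal operator). Let $\mathcal{D}$ be the dyadic cubes of $\mathbb{R}^n$: cubes of the form $2^{-k}\big([0,1)^n + m\big)$ with $k \in \mathbb{Z}$, $m \in \mathbb{Z}^n$. The dyadic maximal function is
$$M_{\mathcal{D}}f(x) = \sup_{Q \in \mathcal{D},\ Q \ni x} \frac{1}{|Q|} \int_{Q} |f(y)|\,dy.$$
Distinct dyadic cubes are either nested or disjoint, a structural feature that makes $M_{\mathcal{D}}$ amenable to stopping-time arguments and gives it a weak-type $(1,1)$ bound with constant $1$.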
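The nested-or-disjoint structure of dyadic cubes is easy to see in a discrete model. The sketch below is an illustration under stated assumptions: the "space" is a list of $2^k$ cells, the dyadic cubes are the aligned index blocks of length $1, 2, 4, \dots$, and the function name `dyadic_maximal` and the sample data are hypothetical.

```python
def dyadic_maximal(values):
    """Discrete dyadic maximal function on a list whose length is a power of two.

    For each cell, returns the largest average of |values| over all aligned
    dyadic blocks (of length 1, 2, 4, ...) that contain the cell.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    abs_vals = [abs(v) for v in values]
    best = abs_vals[:]                     # blocks of length 1
    size = 2
    while size <= n:
        for start in range(0, n, size):    # aligned blocks are pairwise disjoint
            avg = sum(abs_vals[start:start + size]) / size
            for i in range(start, start + size):
                best[i] = max(best[i], avg)
        size *= 2
    return best


f = [0, 0, 0, 8, 0, 0, 0, 0]               # hypothetical data: one spike
Mf = dyadic_maximal(f)
print(Mf)                                  # [2.0, 2.0, 4.0, 8, 1.0, 1.0, 1.0, 1.0]
```

With counting measure on the cells, the number of cells where `Mf` exceeds a height $\lambda$ never exceeds `sum(|f|) / λ` in this example, which is the discrete shadow of the constant-$1$ weak-type bound.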
Counterexamples to common slips Intermediate+
$Mf$ need not be integrable even when $f$ is. For $f = \mathbf{1}_I$, the indicator of a bounded interval $I$ on $\mathbb{R}$, the worked example shows $Mf(x) \ge c/|x|$ for large $|x|$ with a constant $c > 0$, and $1/|x|$ is not integrable at infinity. So $M$ is not of strong type $(1,1)$; the weak-type bound is the correct endpoint statement.
The supremum over a continuum of radii is genuinely measurable. One does not need the full continuum: for fixed $x$ the average $\frac{1}{|B(x,r)|}\int_{B(x,r)}|f|$ is continuous in $r$, so the supremum over $r > 0$ equals the supremum over rational $r$, a countable supremum of measurable functions of $x$. Hence $Mf$ is measurable, and in fact lower semicontinuous, so $\{Mf > \lambda\}$ is open.
Centred and uncentred maximal functions are comparable but not equal. The factor $2^n$ between them is real: in dimension one the uncentred maximal function of $\mathbf{1}_I$ at a point just left of the interval $I$ uses an off-centre interval and is strictly larger than the centred value there. Statements proved for one transfer to the other only up to this constant.
The covering constant $5$ is not optimal but the method needs a fixed enlargement factor. Any factor strictly greater than $3$ works for the basic greedy selection (the precise threshold depends on how ties in radius are broken); $5$ is the standard safe choice. Besicovitch's covering theorem removes the enlargement entirely at the cost of a dimension-dependent bounded-overlap constant.
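The gap between the centred and uncentred operators can be seen numerically in dimension one. The following sketch assumes the indicator of $[0,1]$, a sample point $x = -0.25$ just left of the interval, and a finite grid of window sizes; all of these, and the function names, are illustrative choices rather than anything fixed by the text.

```python
def overlap(lo, hi, a=0.0, b=1.0):
    """Length of the intersection of [lo, hi] with [a, b]."""
    return max(0.0, min(hi, b) - max(lo, a))


def centred(x, grid):
    """Largest average of the indicator of [0, 1] over windows (x - r, x + r)."""
    return max(overlap(x - r, x + r) / (2.0 * r) for r in grid)


def uncentred(x, grid):
    """Largest average over windows (x - s, x + t); x need not be the centre."""
    return max(overlap(x - s, x + t) / (s + t) for s in grid for t in grid)


grid = [k / 100.0 for k in range(1, 301)]   # sizes 0.01, 0.02, ..., 3.00
x = -0.25                                   # hypothetical point left of [0, 1]
c = centred(x, grid)
u = uncentred(x, grid)
print(c, u, u / c)
```

The off-centre window hugs the point on the left and stretches right to cover the interval, so the ratio `u / c` comes out close to the one-dimensional comparability constant $2$ without exceeding it.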
Key theorem with proof Intermediate+
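Theorem (Hardy-Littlewood weak-type maximal inequality; Hardy-Littlewood 1930 Acta Math. 54, 81; $n$-dimensional form Wiener 1939 Duke Math. J. 5, 1). There is a constant $C_n$, depending only on the dimension $n$, such that for every $f \in L^1(\mathbb{R}^n)$ and every $\lambda > 0$,
$$|\{x \in \mathbb{R}^n : Mf(x) > \lambda\}| \le \frac{C_n}{\lambda}\,\|f\|_{L^1}.$$
One may take $C_n = 5^n$ (centred operator, via the $5r$-covering lemma).
Proof. Fix $\lambda > 0$ and set $E_\lambda = \{x : Mf(x) > \lambda\}$. Because $Mf$ is lower semicontinuous, $E_\lambda$ is open. Fix a compact subset $K \subset E_\lambda$; it suffices to bound $|K|$ by $\frac{5^n}{\lambda}\|f\|_{L^1}$, since $|E_\lambda| = \sup_K |K|$ by inner regularity over compact $K \subset E_\lambda$.
Step 1 (each point of $K$ selects a heavy ball). For each $x \in K$ we have $Mf(x) > \lambda$, so by definition of the supremum there is a radius $r_x > 0$ with
$$\frac{1}{|B(x,r_x)|}\int_{B(x,r_x)} |f| > \lambda, \qquad\text{that is,}\qquad |B(x,r_x)| < \frac{1}{\lambda}\int_{B(x,r_x)} |f|.$$
The balls $\{B(x,r_x)\}_{x \in K}$ form an open cover of the compact set $K$. Extract a finite subcover $B_1, \dots, B_N$ with $K \subset \bigcup_{i=1}^N B_i$.
Step 2 (Vitali $5r$-selection). Apply the finite Vitali covering lemma (Lemma below) to $\{B_1, \dots, B_N\}$: there is a subcollection $B_{i_1}, \dots, B_{i_m}$ of pairwise disjoint balls such that
$$K \subset \bigcup_{i=1}^N B_i \subset \bigcup_{j=1}^m 5B_{i_j},$$
where $5B$ denotes the ball concentric with $B$ of five times the radius. Since each chosen ball satisfies the heaviness estimate of Step 1,
$$|K| \le \sum_{j=1}^m |5B_{i_j}| = 5^n \sum_{j=1}^m |B_{i_j}| \le \frac{5^n}{\lambda} \sum_{j=1}^m \int_{B_{i_j}} |f|.$$
Step 3 (disjointness collapses the sum). Because the $B_{i_j}$ are pairwise disjoint, the integrals over them add to an integral over their union, which is at most the integral over all of $\mathbb{R}^n$:
$$\sum_{j=1}^m \int_{B_{i_j}} |f| = \int_{\bigcup_j B_{i_j}} |f| \le \|f\|_{L^1}.$$
Combining, $|K| \le \frac{5^n}{\lambda}\|f\|_{L^1}$. Taking the supremum over compact $K \subset E_\lambda$ gives $|E_\lambda| \le \frac{5^n}{\lambda}\|f\|_{L^1}$, the claim with $C_n = 5^n$.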
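A quick sanity check of the inequality in dimension one, offered as a sketch under assumptions: the test function is the indicator of $[0,1]$, and the closed form used for its centred maximal function (value $1$ inside the interval, $1/(2x)$ for $x \ge 1$, $1/(2(1-x))$ for $x \le 0$) is derived by hand for this example rather than quoted from the text. The function names and grid parameters are hypothetical.

```python
def maximal_of_unit_indicator(x):
    """Centred maximal function of the indicator of [0, 1] on the line (closed form)."""
    if 0.0 < x < 1.0:
        return 1.0
    if x >= 1.0:
        return 1.0 / (2.0 * x)
    return 1.0 / (2.0 * (1.0 - x))


def level_set_measure(lam, half_width=100.0, step=1e-3):
    """Midpoint Riemann-sum estimate of |{x in [-half_width, half_width] : Mf(x) > lam}|."""
    n = int(round(2 * half_width / step))
    count = sum(
        1 for k in range(n)
        if maximal_of_unit_indicator(-half_width + (k + 0.5) * step) > lam
    )
    return count * step


norm_f = 1.0  # L^1 norm of the indicator of [0, 1]
for lam in (0.01, 0.1, 0.4, 0.75):
    measure = level_set_measure(lam)
    # weak-type bound with the constant 5^n for n = 1
    print(lam, measure, 5.0 * norm_f / lam)
```

For this example the product `lam * measure` stays below $1$ at every height, comfortably inside the bound $5\|f\|_{L^1}$, even though the tails of the maximal function are not integrable.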
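Lemma (finite Vitali $5r$-covering lemma; Vitali 1908 Atti Accad. Torino 43, 75). Let $\{B_1, \dots, B_N\}$ be a finite collection of open balls in $\mathbb{R}^n$. There is a subcollection $B_{i_1}, \dots, B_{i_m}$ of pairwise disjoint balls with $\bigcup_{i=1}^N B_i \subset \bigcup_{j=1}^m 5B_{i_j}$.
Proof. Greedy selection by radius. Choose $B_{i_1}$ to be a ball of largest radius. Having chosen $B_{i_1}, \dots, B_{i_k}$, discard every remaining ball that meets one of them, and from the survivors choose one of largest radius as $B_{i_{k+1}}$. The process terminates since the family is finite, and the chosen balls are pairwise disjoint by construction. Now let $B = B(c,r)$ be any of the original balls. If $B$ was chosen, it lies in its own enlargement and there is nothing to prove. Otherwise it was discarded because it met some chosen ball; let $B' = B(c',r')$ be the first such. Then $r' \ge r$ (the chosen ball had radius at least that of $B$, since selection went in decreasing-radius order and $B$ was still available when $B'$ was picked). Pick $z \in B \cap B'$. Then for any $y \in B$,
$$|y - c'| \le |y - z| + |z - c'| < 2r + r' \le 3r' < 5r',$$
using $|y - z| < 2r$ for two points of $B$, $|z - c'| < r'$ from $z \in B'$, and $r \le r'$. Hence $B \subset 5B'$, proving the covering inclusion.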
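The greedy selection in the proof can be sketched directly for balls on the line. This is a minimal illustration, not a general-purpose implementation: balls are assumed to be open intervals stored as `(centre, radius)` pairs, the random test family is hypothetical, and the function names are invented for the example.

```python
import random


def vitali_select(balls):
    """Greedy Vitali selection for 1-D open balls given as (centre, radius) pairs.

    Scans the balls in order of decreasing radius and keeps a ball exactly when
    it is disjoint from every ball kept so far. Returns the kept balls.
    """
    chosen = []
    for centre, radius in sorted(balls, key=lambda b: b[1], reverse=True):
        # open intervals (c - r, c + r) and (c' - r', c' + r') are disjoint
        # exactly when |c - c'| >= r + r'
        if all(abs(centre - c) >= radius + r for c, r in chosen):
            chosen.append((centre, radius))
    return chosen


def covered_by_enlargement(ball, chosen, factor=5.0):
    """True if `ball` lies inside the `factor`-fold enlargement of some chosen ball."""
    centre, radius = ball
    return any(abs(centre - c) + radius <= factor * r for c, r in chosen)


rng = random.Random(0)  # fixed seed so the hypothetical family is reproducible
balls = [(rng.uniform(0.0, 10.0), rng.uniform(0.1, 1.0)) for _ in range(200)]
chosen = vitali_select(balls)
print(len(balls), len(chosen))
```

Scanning in decreasing-radius order and skipping any ball that meets an earlier pick is the same procedure as "choose the largest survivor, discard what it meets", just written as a single pass.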
Bridge. The weak-type $(1,1)$ bound builds toward the full $L^p$ mapping theory of $M$ and appears again in every later chapter of singular-integral theory, where the maximal function is the universal device for controlling pointwise objects by their integral size. The central insight is that the supremum over a continuum of scales is tamed by a single disjoint subfamily: the Vitali lemma trades the uncontrolled overlap of all heavy balls for a disjoint core whose total measure is bounded by $\|f\|_{L^1}/\lambda$, and this is exactly the foundational reason a sup-over-scales operator can still be of weak type $(1,1)$. Putting these together, the disjointness in Step 3 is what generalises: the same accounting reappears for the Calderón-Zygmund decomposition, where the heavy dyadic cubes play the role of the chosen balls, and the bridge is that controlling a maximal average at height $\lambda$ is dual to selecting the cubes where the average first exceeds $\lambda$.
Exercises Intermediate+
Advanced results Master
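Theorem 1 (strong-type $(p,p)$ bound; Hardy-Littlewood 1930 Acta Math. 54, 81; Wiener 1939). For $1 < p \le \infty$ the operator $M$ is bounded on $L^p(\mathbb{R}^n)$: there is $C_{n,p}$ with $\|Mf\|_{L^p} \le C_{n,p}\|f\|_{L^p}$. For $p = \infty$ one has $\|Mf\|_{L^\infty} \le \|f\|_{L^\infty}$. The bound fails at $p = 1$: $M$ is of weak type $(1,1)$ but not strong type $(1,1)$, since $Mf \notin L^1(\mathbb{R}^n)$ whenever $f \not\equiv 0$ (the tail decay $|x|^{-n}$ at infinity is never integrable). The boundedness follows by Marcinkiewicz interpolation 02.07.06 between the weak-$(1,1)$ endpoint and the elementary $L^\infty$ endpoint [Hardy-Littlewood 1930].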
Theorem 2 (local integrability of ; Stein 1969). On a bounded set, is integrable if and only if is integrable: for a ball , with the quantitative two-sided estimate . This identifies the Zygmund class as the precise local integrability threshold for the maximal function, the endpoint refinement of the failure of strong type [Stein 1970].
Theorem 3 (Besicovitch covering theorem; Besicovitch 1945 Proc. Cambridge Philos. Soc. 41, 103). There is a constant $N_n$ depending only on $n$ with the following property. Let $A \subset \mathbb{R}^n$ be bounded and let each $x \in A$ carry a ball $B(x, r_x)$. Then there is a countable subfamily covering $A$ that decomposes into at most $N_n$ subfamilies, each consisting of pairwise disjoint balls. Unlike the Vitali $5r$-lemma, Besicovitch enlarges no ball; the price is bounded overlap rather than disjointness, and the constant $N_n$ is geometric (a packing number of the sphere). Besicovitch's theorem is what permits the maximal-function and differentiation theory for arbitrary Radon measures in place of Lebesgue measure, where the doubling property may fail [Besicovitch 1945].
Theorem 4 (differentiation of measures via Besicovitch). Let be Radon measures on with finite. The symmetric derivative exists -a.e. and equals the Radon-Nikodym density of the absolutely continuous part of with respect to ; the singular part of concentrates on the set where . The proof replaces the Vitali lemma by the Besicovitch covering theorem to build the -weak-type estimate for the -maximal operator , then runs the density argument of Exercise 7 against [Besicovitch 1945].
Theorem 5 (the dyadic maximal function dominates after a shift; one-third trick). There exist finitely many translated dyadic lattices $\mathcal{D}^1, \dots, \mathcal{D}^N$, with $N$ depending only on $n$ ($N = 3^n$ suffices), such that every ball $B$ is contained in some dyadic cube $Q$ from one of these lattices with $|Q| \le C_n |B|$. Consequently the centred maximal function is pointwise comparable to a maximum of finitely many dyadic maximal functions: $Mf(x) \le C_n \max_{1 \le j \le N} M_{\mathcal{D}^j} f(x)$. This reduces every $L^p$ and weak-$(1,1)$ statement about $M$ to the constant-$1$ dyadic estimate of Exercise 4, bypassing the geometric covering lemmas entirely and giving the cleanest route to sharp constants [Stein 1993].
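Theorem 6 (vector-valued and Fefferman-Stein extensions). The maximal operator obeys the Fefferman-Stein vector-valued inequality
$$\Big\| \Big( \sum_k |Mf_k|^q \Big)^{1/q} \Big\|_{L^p} \le C_{n,p,q} \Big\| \Big( \sum_k |f_k|^q \Big)^{1/q} \Big\|_{L^p}$$
for $1 < p, q < \infty$, and the Fefferman-Stein weighted inequality
$$\int_{\mathbb{R}^n} (Mf)^p\, w \,dx \le C_{n,p} \int_{\mathbb{R}^n} |f|^p\, Mw \,dx$$
for $1 < p < \infty$ and weights $w \ge 0$. These promote the scalar theory to the vector-valued and weighted settings that drive the modern theory of Littlewood-Paley square functions and $A_p$ weights [Stein 1993].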
Synthesis. The maximal function is the foundational reason that almost-everywhere convergence statements reduce to a single quantitative inequality, and this is exactly the structural device that organises the entire Calderón-Zygmund program. The central insight is a dictionary between three superficially different tools (the Vitali $5r$-covering, the dyadic stopping-time selection, and the Besicovitch bounded-overlap covering), each of which converts a supremum over a continuum of scales into a disjoint or finitely-overlapping family whose measure is controlled by $\|f\|_{L^1}/\lambda$. Putting these together, the weak-(1,1) bound is dual to the Calderón-Zygmund decomposition: selecting the cubes where the maximal average first crosses height $\lambda$ is the same act as splitting $f$ into a bounded good part and a mean-zero bad part, and this is the bridge from the maximal function to the boundedness of singular integral operators. The pattern generalises in three directions that recur throughout harmonic analysis: vertically, from Lebesgue measure to arbitrary Radon measures via Besicovitch (Theorem 4); horizontally, from scalar to vector-valued and weighted estimates via Fefferman-Stein (Theorem 6); and structurally, from the geometric covering lemmas to the purely combinatorial dyadic model via the one-third trick (Theorem 5), which is dual to the martingale maximal inequality of probability and is the central insight unifying the real-variable and probabilistic faces of the subject.
Full proof set Master
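Proposition 1 (lower semicontinuity of $Mf$). For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ the function $Mf$ is lower semicontinuous; in particular $\{Mf > \lambda\}$ is open for every $\lambda$.
Proof. Fix $\lambda$ and $x_0$ with $Mf(x_0) > \lambda$. Choose $r > 0$ with $\frac{1}{|B(x_0,r)|}\int_{B(x_0,r)} |f| > \lambda$. The map $x \mapsto \int_{B(x,r)} |f| = (|f| * \mathbf{1}_{B(0,r)})(x)$ is a convolution of an $L^1_{\mathrm{loc}}$ function with an $L^\infty$ function of compact support, hence continuous in $x$. Since $|B(x,r)|$ is constant in $x$, the average is continuous, so $\frac{1}{|B(x,r)|}\int_{B(x,r)} |f| > \lambda$ on a neighbourhood of $x_0$. On that neighbourhood $Mf > \lambda$, proving $\{Mf > \lambda\}$ is open.
Proposition 2 (sublinearity and homogeneity). $M(f+g) \le Mf + Mg$ and $M(cf) = |c|\,Mf$ for scalars $c$.
Proof. For each ball $B$, $\frac{1}{|B|}\int_B |f+g| \le \frac{1}{|B|}\int_B |f| + \frac{1}{|B|}\int_B |g|$ by the triangle inequality for $|\cdot|$ and monotonicity of the integral. Taking the supremum over $B$ gives $M(f+g) \le Mf + Mg$. Homogeneity is immediate from $|cf| = |c|\,|f|$ pulled out of the average.
Proposition 3 (weak-(1,1) bound; restatement with the constant). $|\{Mf > \lambda\}| \le \frac{5^n}{\lambda}\|f\|_{L^1}$.
Proof. This is the Key Theorem; the proof there reduces to a compact subset $K \subset \{Mf > \lambda\}$, extracts a finite subcover of heavy balls, applies the finite Vitali $5r$-lemma to obtain a disjoint subfamily whose $5$-fold enlargements cover $K$, and sums the heaviness estimates over the disjoint family. The disjointness collapses the sum of integrals into $\int_{\bigcup_j B_{i_j}} |f| \le \|f\|_{L^1}$, yielding $|K| \le \frac{5^n}{\lambda}\|f\|_{L^1}$; inner regularity passes to $\{Mf > \lambda\}$.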
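Proposition 4 (distribution-function form of strong $(p,p)$). For $1 < p < \infty$ and $f \in L^p(\mathbb{R}^n)$,
$$\|Mf\|_{L^p}^p = p \int_0^\infty \lambda^{p-1}\,|\{Mf > \lambda\}|\,d\lambda \le \frac{2^p\,5^n\,p}{p-1}\,\|f\|_{L^p}^p.$$
Proof. The first equality is the layer-cake (Cavalieri) formula for the $L^p$-norm of a non-negative measurable function. To bound the right side, split $f = f_1 + f_2$ with $f_1 = f\,\mathbf{1}_{\{|f| > \lambda/2\}}$ and $f_2 = f\,\mathbf{1}_{\{|f| \le \lambda/2\}}$. The second piece has $\|f_2\|_{L^\infty} \le \lambda/2$, so $Mf_2 \le \lambda/2$ pointwise, hence $\{Mf > \lambda\} \subset \{Mf_1 > \lambda/2\}$ by sublinearity (Proposition 2). Apply weak-(1,1) to $f_1$:
$$|\{Mf > \lambda\}| \le |\{Mf_1 > \lambda/2\}| \le \frac{2 \cdot 5^n}{\lambda} \int_{\{|f| > \lambda/2\}} |f(x)|\,dx.$$
Insert this into the layer-cake integral and exchange the order of integration via Tonelli 02.07.07:
$$\|Mf\|_{L^p}^p \le 2 \cdot 5^n\, p \int_0^\infty \lambda^{p-2} \int_{\{|f| > \lambda/2\}} |f(x)|\,dx\,d\lambda = 2 \cdot 5^n\, p \int_{\mathbb{R}^n} |f(x)| \int_0^{2|f(x)|} \lambda^{p-2}\,d\lambda\,dx.$$
The inner $\lambda$-integral is $\frac{(2|f(x)|)^{p-1}}{p-1}$ (here $p > 1$ makes it converge at $\lambda = 0$), giving
$$\|Mf\|_{L^p}^p \le \frac{2^p\,5^n\,p}{p-1} \int_{\mathbb{R}^n} |f(x)|^p\,dx.$$
Thus $\|Mf\|_{L^p} \le 2\left(\frac{5^n\,p}{p-1}\right)^{1/p} \|f\|_{L^p}$, exhibiting the blow-up as $p \to 1^+$.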
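To make the blow-up concrete, the sketch below tabulates one possible form of the constant. It assumes the bound $2\,(5^n p/(p-1))^{1/p}$, which is what the layer-cake argument produces when the weak-type constant is $5^n$ and the function is split at height $\lambda/2$; the function name and the sample exponents are hypothetical, and the constant is not claimed to be sharp.

```python
def strong_type_bound(p, n=1):
    """Assumed L^p operator-norm bound 2 * (5**n * p / (p - 1)) ** (1 / p) for p > 1."""
    if p <= 1.0:
        raise ValueError("the bound is only defined for p > 1")
    return 2.0 * (5.0 ** n * p / (p - 1.0)) ** (1.0 / p)


# hypothetical sample exponents approaching the endpoint p = 1
bounds = {p: strong_type_bound(p) for p in (4.0, 2.0, 1.5, 1.1, 1.01)}
for p, value in bounds.items():
    print(p, value)
```

The values grow without bound as $p$ decreases to $1$, consistent with the failure of the strong-type estimate at the endpoint.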
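Proposition 5 (Lebesgue points). For $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, almost every $x$ is a Lebesgue point: $\lim_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y) - f(x)|\,dy = 0$.
Proof. This is Exercise 7: writing $\Omega f(x) = \limsup_{r \to 0} \frac{1}{|B(x,r)|} \int_{B(x,r)} |f(y) - f(x)|\,dy$, bound the oscillation $\Omega f(x)$ by $M(f-g)(x) + |f(x) - g(x)|$ for a continuous approximant $g$ with $\|f - g\|_{L^1} < \varepsilon$, then use weak-(1,1) and Chebyshev to force $|\{\Omega f > \lambda\}| = 0$ for all $\lambda > 0$, hence $\Omega f = 0$ a.e. The set of Lebesgue points is exactly $\{\Omega f = 0\}$, of full measure.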
Proposition 6 (a.e. convergence of approximate identities). Let $\phi$ have a radially decreasing integrable majorant $\psi(x) = \psi_0(|x|) \ge |\phi(x)|$ with $\psi_0$ non-increasing and $\int \phi = 1$, and set $\phi_t(x) = t^{-n}\phi(x/t)$. Then for $f \in L^p$ ($1 \le p < \infty$), $(f * \phi_t)(x) \to f(x)$ as $t \to 0$ for almost every $x$, and the maximal operator $\sup_{t > 0} |f * \phi_t|$ is controlled by $\|\psi\|_{L^1}\,Mf$.
Proof. The pointwise bound $\sup_{t > 0} |(f * \phi_t)(x)| \le \|\psi\|_{L^1}\,Mf(x)$ follows by the standard layer-cake estimate: a radially decreasing kernel is a superposition of normalised ball-indicators against a positive measure of total mass $\|\psi\|_{L^1}$, and convolution against each is an average bounded by $Mf$; integrating against that measure gives the claim. This furnishes the maximal control; combined with a.e. convergence on the dense class $C_c$ (where it is uniform) and the weak-(1,1) bound for the dominating maximal operator, the standard density argument upgrades to a.e. convergence for all $f \in L^p$.
Connections Master
$L^p$ spaces, Hölder, Minkowski, Riesz-Fischer completeness
02.07.06. The direct prerequisite carrying both the $L^p$ function-space framework and the Marcinkiewicz interpolation theorem used to pass from the weak-$(1,1)$ endpoint and the $L^\infty$ endpoint to the strong-type boundedness of $M$ for $1 < p < \infty$. The density of $C_c(\mathbb{R}^n)$ in $L^1(\mathbb{R}^n)$, proved there, is the approximation input to the Lebesgue differentiation theorem; the layer-cake distribution-function machinery of the chapter is the bookkeeping behind Proposition 4.
Fubini-Tonelli and product measures
02.07.07. The direct prerequisite supplying the Tonelli interchange that converts the layer-cake integral $p\int_0^\infty \lambda^{p-1}\,|\{Mf > \lambda\}|\,d\lambda$ into an integral over $\mathbb{R}^n$ after inserting the weak-type bound for the truncated function $f\,\mathbf{1}_{\{|f| > \lambda/2\}}$. Tonelli also justifies writing the ball-average as a convolution in the proof of lower semicontinuity.
Fundamental theorems of calculus
02.04.04. The one-dimensional ancestor: the Lebesgue differentiation theorem is the $n$-dimensional, measure-theoretic completion of the statement that the derivative of $\int_a^x f(t)\,dt$ recovers $f(x)$. On the line the centred symmetric difference quotient of the indefinite integral is exactly the centred ball-average, so the FTC for Lebesgue-integrable $f$ is the $n = 1$ case of Proposition 5.
Calderón-Zygmund decomposition and singular integrals
[forward: 02.19.02]. The principal successor. The weak-$(1,1)$ proof and the dyadic stopping-time selection are the two halves of the Calderón-Zygmund decomposition: at height $\lambda$ one selects the maximal dyadic cubes where the average of $|f|$ exceeds $\lambda$, splits $f$ into a bounded good part and a mean-zero bad part supported on those cubes, and the maximal-function bound controls the good part while cancellation controls the bad part. Every singular-integral boundedness theorem in the chapter routes through this decomposition.
Marcinkiewicz interpolation
02.07.06. The lateral tool. The maximal operator is the canonical example for which real interpolation is essential rather than convenient: $M$ is of weak type $(1,1)$ but genuinely fails strong type $(1,1)$, so the Riesz-Thorin complex method (which interpolates strong-type endpoints) does not apply, and only the Marcinkiewicz real method, which accepts weak-type endpoints, delivers the $L^p$ bounds.
Ergodic maximal theorem and martingale maximal inequalities
[forward: 37.02.03]. The structural cousin. Wiener's 1939 maximal ergodic theorem and Doob's martingale maximal inequality are the dynamical and probabilistic avatars of the Hardy-Littlewood inequality: the dyadic maximal function with its constant-$1$ weak-type bound is precisely the martingale maximal function for the dyadic filtration, and the one-third trick (Theorem 5) is the bridge identifying the real-variable and probabilistic theories.
Historical & philosophical context Master
The maximal function was introduced by Godfrey Harold Hardy and John Edensor Littlewood in their 1930 Acta Mathematica paper A maximal theorem with function-theoretic applications [Hardy-Littlewood 1930], where it arose not from real analysis but from the theory of analytic functions: their motivating problem concerned the boundary behaviour of functions in Hardy spaces on the disc, and they framed the one-dimensional maximal operator through a now-famous cricket analogy of a batsman computing his best possible running average. Their original setting was the circle and the line; the operator was a device for dominating boundary maximal functions of harmonic extensions by an averaging operator on the boundary.
Norbert Wiener, in his 1939 Duke Mathematical Journal paper The ergodic theorem [Wiener 1939], extended the maximal inequality to $\mathbb{R}^n$ and recognised the covering-lemma mechanism as the geometric heart of the estimate, connecting it to the pointwise ergodic theorem of Birkhoff. The covering principle itself predates both: Giuseppe Vitali's 1908 Atti della Accademia delle Scienze di Torino note [Vitali 1908] established the covering theorem in the course of studying differentiation (his non-measurable set had appeared separately in 1905), and Henri Lebesgue's 1910 Annales de l'École Normale Supérieure memoir [Lebesgue 1910] proved the differentiation theorem that the maximal inequality streamlines. Abram Besicovitch's 1945 Proceedings of the Cambridge Philosophical Society covering theorem [Besicovitch 1945] removed the enlargement factor at the cost of bounded overlap, the technical advance that freed the differentiation theory from the doubling hypothesis and extended it to arbitrary Radon measures.
The interpolation viewpoint that makes the theory clean is due to Józef Marcinkiewicz, whose 1939 Comptes Rendus announcement [Marcinkiewicz 1939] of the real-interpolation theorem appeared shortly before his death in the Katyn massacre in 1940; the detailed theory was reconstructed and published by Antoni Zygmund in 1956. The synthesis into the modern real-variable method belongs to Elias Stein, whose 1970 monograph Singular Integrals and Differentiability Properties of Functions [Stein 1970] placed the maximal function at the entrance to the Calderón-Zygmund theory and made the weak-(1,1) inequality the organising endpoint estimate of twentieth-century harmonic analysis.
Bibliography Master
@article{HardyLittlewood1930,
author = {Hardy, G. H. and Littlewood, J. E.},
title = {A maximal theorem with function-theoretic applications},
journal = {Acta Mathematica},
volume = {54},
year = {1930},
pages = {81--116}
}
@article{Wiener1939,
author = {Wiener, Norbert},
title = {The ergodic theorem},
journal = {Duke Mathematical Journal},
volume = {5},
year = {1939},
pages = {1--18}
}
@article{Vitali1908,
author = {Vitali, Giuseppe},
title = {Sui gruppi di punti e sulle funzioni di variabili reali},
journal = {Atti della Accademia delle Scienze di Torino},
volume = {43},
year = {1908},
pages = {75--92}
}
@article{Lebesgue1910,
author = {Lebesgue, Henri},
title = {Sur l'int\'egration des fonctions discontinues},
journal = {Annales Scientifiques de l'\'Ecole Normale Sup\'erieure},
volume = {27},
year = {1910},
pages = {361--450}
}
@article{Besicovitch1945,
author = {Besicovitch, A. S.},
title = {A general form of the covering principle and relative differentiation of additive functions},
journal = {Proceedings of the Cambridge Philosophical Society},
volume = {41},
year = {1945},
pages = {103--110}
}
@article{Marcinkiewicz1939,
author = {Marcinkiewicz, J\'ozef},
title = {Sur l'interpolation d'op\'erations},
journal = {Comptes Rendus de l'Acad\'emie des Sciences Paris},
volume = {208},
year = {1939},
pages = {1272--1273}
}
@book{Stein1970,
author = {Stein, Elias M.},
title = {Singular Integrals and Differentiability Properties of Functions},
publisher = {Princeton University Press},
year = {1970}
}
@book{Stein1993,
author = {Stein, Elias M.},
title = {Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals},
publisher = {Princeton University Press},
year = {1993}
}
@book{Grafakos2014,
author = {Grafakos, Loukas},
title = {Classical Fourier Analysis},
edition = {3},
publisher = {Springer},
year = {2014}
}
@book{SteinShakarchi2005,
author = {Stein, Elias M. and Shakarchi, Rami},
title = {Real Analysis: Measure Theory, Integration, and Hilbert Spaces},
publisher = {Princeton University Press},
year = {2005}
}