Littlewood-Paley Theory and the Square Function
Anchor (Master): Stein 1970 *Singular Integrals* (Princeton) Ch. IV; Stein 1993 *Harmonic Analysis* (Princeton) Ch. I §8, VI-VII; Grafakos 2014 *Modern Fourier Analysis* 3e §6; Frazier-Jawerth-Weiss 1991 *Littlewood-Paley Theory and the Study of Function Spaces* (AMS CBMS 79)
Intuition Beginner
A sound engineer never works with a song as one undivided wall of noise. She splits it into frequency bands — the bass octave, the next octave up, and so on — and reads a separate level meter for each band. Two facts make this useful. First, the bands recombine: stack all the band signals back up and you recover the original song exactly. Second, the loudness of the whole song is faithfully reported by combining the band meters: a song is loud overall exactly when its bands carry a lot of energy between them, and quiet exactly when they do not. Littlewood-Paley theory is this idea turned into a theorem about functions.
The frequency axis gets chopped into "octaves" — bands where the frequency size roughly doubles from one band to the next. Each band picks out the part of the function vibrating at that scale, called a frequency piece. The square function is the band level meter: at each location you take the size of every frequency piece, square them, add them up, and take the square root, exactly the way you combine the sides of a right triangle to get the hypotenuse. This single number measures how much total energy, across all scales at once, the function carries near that point.
The headline claim is a faithfulness statement. Measuring the size of a function the ordinary way and measuring it through this band-combining square function give comparable answers — neither can be large without the other being large. So you may always trade the function for its menu of frequency pieces and lose nothing about its size. That trade is what makes scale-by-scale arguments legal in analysis.
The one-sentence takeaway: split a function into octave-frequency pieces, combine their sizes with a square-root-of-sum-of-squares, and the result faithfully reports the function's original size, so you can analyze any function one scale at a time.
Visual Beginner
Picture the frequency axis as a ruler. Place a row of smooth, overlapping bumps along it, each bump twice as wide and twice as far out as the one before: the first covers low frequencies, the next covers the octave above, and so on outward forever. Each bump is a band filter. Multiply a function's frequency content by one bump and you keep only that octave; the rest is silenced. Because the bumps add up to one everywhere, dropping nothing, stacking all the filtered pieces rebuilds the original.
The bottom panel shows the payoff. The three light curves are the function seen through three successive octave filters — each wigglier than the last, since higher octaves vibrate faster. The heavy curve below is the square function: at every point it bundles the three local sizes into one through the right-triangle combination. Where the heavy curve is tall, the function is carrying real energy nearby at some scale; where it is short, the function is locally calm at every scale. The theorem says the area-style size of this heavy curve matches the size of the original function.
Worked example Beginner
We build a square function for a signal made of two pure tones and read off how it combines their strengths.
Step 1. Take a signal with exactly two frequency components: a low tone of strength sitting in band one, and a high tone of strength sitting in band two. Every other band is silent. So the first frequency piece has size , the second has size , and all the rest have size .
Step 2. Form the square function at a point where these are the local sizes. Square each piece: squared is , and squared is . The silent bands contribute .
Step 3. Add the squares: . This total is the combined squared energy across all bands at that point.
Step 4. Take the square root: the square root of is . So the square function reads at that point.
What this tells us: the two tones of strength and do not simply add to , nor does the loud one alone () dominate. They combine through the right-triangle rule into , the hypotenuse of the - triangle. This is exactly how the square function pools energy from different scales: a function spread across many octaves registers a size that blends all of them, and the Littlewood-Paley theorem promises this blended reading tracks the function's true overall size.
Check your understanding Beginner
Formal definition Intermediate+
Throughout, is the Fourier transform on of 02.10.04, normalized by , and ranges over the Schwartz class unless stated otherwise. Fix a Littlewood-Paley bump whose Fourier transform is supported in the annulus , is non-negative, and satisfies
Such a exists: take any non-negative equal to on and supported in , set , which is supported in and telescopes to the required partition.
Definition (Littlewood-Paley projections). The Littlewood-Paley projection at frequency scale is the Fourier multiplier The frequency-localization keeps the part of with frequencies . The partition of unity gives the dyadic frequency decomposition with convergence in (and in modulo polynomials), since for by the partition property.
Definition (Littlewood-Paley square function). The square function of is the -norm of the projections, taken pointwise:
By Plancherel 02.10.04 and the almost-orthogonality of the (at most three overlap at any ), , so the equivalence is built in; the content of the theory is the same equivalence on every .
Definition (Stein -function). Let be the Poisson semigroup, , so is the harmonic extension of to the upper half-space . The Littlewood-Paley -function is a continuous-parameter square function measuring, at each , the -energy of the vertical derivative of the harmonic extension. The Lusin area integral refines it by integrating over a cone :
Counterexamples to common slips Intermediate+
The square function is not the same as the partial sum. takes the -norm of the before summing the geometric data; the reconstruction sums the with their signs. Cancellation in the second is exactly what the first discards, which is why controls from both sides while does not.
The equivalence is elementary; the equivalence is not. On the comparison is one application of Plancherel and bounded overlap. There is no analogue of Plancherel on for , so the statement genuinely requires the vector-valued Calderón-Zygmund theorem or the -function machinery; treating it as "Plancherel again" is the central error.
The choice of bump does not matter — but only up to constants. Two admissible partitions give comparable square functions, , with constants depending on . The space and the qualitative equivalence are independent of ; the numerical constant is not, and chasing the sharp constant is a separate and harder question.
The lower bound is not automatic from the upper bound. It is recovered by polarization/duality, not by reversing inequalities: one uses the upper bound for the dual exponent together with the reproducing identity for a fattened companion . Assuming the reverse bound for free is a common gap.
Key theorem with proof Intermediate+
Theorem (Littlewood-Paley; ). Let . There are constants , depending only on , , and the bump , such that for all ,
Proof. The two directions split.
Upper bound via vector-valued Calderón-Zygmund. Regard the family as a single operator from scalar functions to -valued functions. Its kernel is the -valued convolution kernel , acting by . Three facts make a (Banach-space-valued) Calderón-Zygmund operator with target :
bound. By Plancherel 02.10.04, , since at most three of the annuli overlap at any and . So .
Hörmander regularity of the -valued kernel. We must bound . By Minkowski's integral inequality 02.07.06 it suffices to control the kernel scale by scale and sum. Each is a Schwartz function at scale with and concentrated on with rapid decay; the mean value bound gives, for fixed ,
where the last estimate is the standard summation of the scale-localized differences over (the small scales contribute the factor , the large scales the rapid decay; the geometric series sums to ). Integrating over gives a dimensional constant . Thus satisfies the -valued Hörmander condition.
With the bound and the Hörmander condition, the vector-valued extension of the Calderón-Zygmund theorem 02.19.03 — whose proof is verbatim the scalar one, with absolute values replaced by the -norm throughout the decomposition argument — yields weak type for , and by Marcinkiewicz interpolation 02.07.06 and duality,
Lower bound via duality and a reproducing identity. Choose a fattened companion bump with on and supported in , so , hence on . For , polarizing the decomposition and using (the multiplier ),
By Cauchy-Schwarz in pointwise, then Hölder 02.07.06 with exponents ,
where . The upper bound already proved (applied to the companion family, which satisfies the same hypotheses) gives . Taking the supremum over and using the duality ,
which is the lower bound with .
Bridge. This theorem builds toward every scale-by-scale argument in modern harmonic analysis and dispersive PDE, and it appears again in the proof of the Marcinkiewicz and Hörmander-Mikhlin multiplier theorems, in the definition of Besov and Triebel-Lizorkin spaces, and in the Strichartz machinery downstream. The foundational reason it works is that the square function is exactly a Calderón-Zygmund operator with values in the Hilbert space , so the scalar theory of 02.19.03 transfers wholesale once absolute values become -norms; this is exactly the mechanism by which "one bound on plus kernel smoothness" propagates to the full open scale, now for the vector-valued kernel. The central insight is that orthogonality of frequencies, which on is the bare Plancherel identity, survives onto every — not as orthogonality but as the -square-function equivalence — and this is dual to itself: the lower bound is the upper bound for the conjugate exponent, fed through the reproducing identity . Putting these together, the Littlewood-Paley theorem generalises Plancherel from to , and the bridge is the Khintchine inequality, which in the Master tier replaces the vector-valued Calderón-Zygmund black box by an explicit randomization that turns the square function into an average of ordinary multiplier operators.
Exercises Intermediate+
Advanced results Master
Theorem 1 (Littlewood-Paley equivalence; Littlewood-Paley 1937 Proc. London Math. Soc. 42, 52; Stein 1970 Singular Integrals Ch. IV). For , with constants depending only on . The square function is a Hilbert-space-valued Calderón-Zygmund operator; the equivalence is the vector-valued 02.19.03 applied to the -valued kernel , with the lower bound recovered by duality through the reproducing identity [Stein 1970-SI].
Theorem 2 (Khintchine randomization and the multiplier route; Marcinkiewicz 1939 Studia Math. 8, 78). With Rademacher signs , pointwise by Khintchine, so . Each is a Mikhlin multiplier with -independent constants, whence and the upper bound follows without the vector-valued kernel. This is the proof Stein attributes to the randomization idea, making the orthogonality survive onto as an average of scalar operators [Marcinkiewicz 1939].
Theorem 3 (Stein -function equivalence; Stein 1970 Topics in Harmonic Analysis AMS Studies 63). For , , where . The proof parallels Theorem 1 with the continuous Hilbert space in place of : the -valued kernel is jointly homogeneous of degree , satisfies the -valued size and Hörmander bounds, and is -bounded with norm , so the vector-valued Calderón-Zygmund theorem applies and duality gives the lower bound [Stein 1970-Topics].
Theorem 4 (Lusin area integral; Marcinkiewicz-Zygmund, Stein). The area integral , integrating the full gradient of the harmonic extension over a cone, satisfies for , with constants depending on the aperture . The pointwise relation and a Fubini argument over cones link the area integral to the -function; the area integral additionally controls nontangential boundary behavior, making it the tool for almost-everywhere convergence and for the / theory [Stein 1970-SI].
Theorem 5 (Besov and Triebel-Lizorkin spaces; Frazier-Jawerth-Weiss 1991 CBMS 79). The Littlewood-Paley pieces define the scale of smoothness spaces: (Triebel-Lizorkin) and (Besov). For the Triebel-Lizorkin norm is the square-function norm, so () is exactly Theorem 1; Sobolev spaces are , Hölder-Zygmund spaces are , and the whole functional-analytic apparatus of PDE is encoded in the [Frazier-Jawerth-Weiss 1991].
Theorem 6 (Hörmander-Mikhlin and the BMO endpoint). The square function reproves the Hörmander-Mikhlin multiplier theorem (a symbol with up to order is -bounded) by per-octave control, and it is exactly the analytic object behind the Carleson-measure characterization of 02.20.01: iff is a Carleson measure, i.e. iff the area integral of is controlled in the averaged sense. The square function thus bridges the theory and the - endpoint theory [Stein 1970-SI].
Synthesis. The square function is the foundational reason that frequency orthogonality, an phenomenon, governs the entire scale : the central insight is that is a Calderón-Zygmund operator valued in a Hilbert space, so the scalar singular-integral theorem of 02.19.03 transfers verbatim, and this is exactly what converts Plancherel's single identity into the equivalence . Putting these together, three reformulations carry the same content: the discrete square function (Theorem 1), the randomized multiplier average via Khintchine (Theorem 2), and the continuous - and area-integrals built from the Poisson semigroup (Theorems 3-4) — the bridge between them is that each is a vector-valued kernel (, the Rademacher average, ) over which the same Hörmander estimate runs. This generalises in two directions at once: vertically it is dual to itself, the lower bound being the upper bound for the conjugate exponent through ; horizontally it generalises the identity into the full Besov-Triebel-Lizorkin scale (Theorem 5), so that Sobolev regularity, Hölder regularity, and Hardy-space membership are all read off the same dyadic pieces. The deepest reading is that the square function is dual to the maximal function: where the Hardy-Littlewood maximal operator controls size by taking suprema of averages, the square function controls size by taking -norms of frequency increments, and the Fefferman-Stein theory of 02.20.01 shows these two are the twin pillars on which the , , and scales rest.
Full proof set Master
Proposition 1 ( two-sided bound for ). for all .
Proof. By Plancherel 02.10.04, with . At most three annuli overlap, so . For the lower bound, the partition with at most three nonzero terms forces some term , whence . Thus .
Proposition 2 (-valued Hörmander bound). The kernel satisfies for a dimensional .
Proof. Bound the -norm by the -norm, then sum scales. Each obeys the Schwartz bounds and . By the mean value theorem, for , , using ; this difference is also bounded directly by . Taking the minimum of the two bounds and summing the geometric series in (small scales use the difference bound times ; large scales use the rapid decay) yields . Hence , and .
Proposition 3 (upper bound ). For , is bounded.
Proof. By Proposition 1, is bounded with norm . By Proposition 2 the -valued kernel satisfies the Hörmander condition. The Calderón-Zygmund argument of 02.19.03 applies with scalars replaced by : the decomposition at height gives a good part handled by the bound through Chebyshev, and a bad part handled off the doubled cubes by the -valued Hörmander estimate (the mean-zero cancellation subtracts in ). This yields weak type for ; Marcinkiewicz interpolation 02.07.06 between and gives , and duality of with gives .
Proposition 4 (lower bound ). For , .
Proof. With the fattened companion ( on ), the reproducing identity (verified on the Fourier side: , then partition) gives for
by Cauchy-Schwarz in and Hölder 02.07.06. The companion family satisfies the same hypotheses, so Proposition 3 gives . Taking the supremum over and using - duality, .
Proposition 5 (-function identity). .
Proof. By Plancherel and Tonelli as in Exercise 6, , and the inner integral equals after . Hence .
Proposition 6 (Khintchine upper-bound reduction). , where .
Proof. Khintchine (Exercise 3) with gives pointwise. Integrating in and applying Tonelli 02.07.06 to swap and , . Each is Mikhlin with -independent constants, so by 02.19.03.
Connections Master
Calderón-Zygmund singular integral operators
02.19.03. The direct prerequisite and the engine. The square function is literally a Calderón-Zygmund operator with values in ; the weak-/interpolation/duality machinery proved there transfers verbatim with absolute values replaced by the -norm, and the Mikhlin multiplier theorem from there bounds each randomized operator uniformly. The Littlewood-Paley theorem is the vector-valued payoff of the scalar singular-integral theory.Fourier transform on and the Plancherel theorem
02.10.04. The source of the skeleton. The projections are Fourier multipliers, the partition of unity and bounded annular overlap give the two-sided bound by Plancherel, and the -function's identity is a Plancherel computation on the Poisson symbol .L^p spaces, Hölder, Minkowski, Marcinkiewicz interpolation
02.07.06. The function-space scaffolding. Minkowski's integral inequality assembles the -valued Hörmander bound, Hölder pairs the two square functions in the duality lower bound, Marcinkiewicz interpolates the vector-valued weak- and endpoints, and Tonelli swaps the spatial integral with the Rademacher expectation in the Khintchine reduction.Hilbert space
02.11.08. The structural home of the construction. The square function takes values in , the -function in , the area integral in of a cone; the vector-valued Calderón-Zygmund theorem requires precisely a Hilbert-space target so that orthogonality (Plancherel/Bessel) supplies the bound and the inner product supplies the duality pairing.BMO and the John-Nirenberg inequality
02.20.01. The endpoint partner. The Carleson-measure characterization of is the statement that the area-integral energy of the Poisson extension is a Carleson measure; the square function is thus the analytic object that ports the cube-oscillation picture of to the half-space and links the theory to the - endpoint.Strichartz estimates via the TT* method
02.21.02. The downstream application in dispersive PDE. Frequency-localized (Littlewood-Paley) pieces are the standard device for proving Strichartz and bilinear estimates: one decomposes the solution into dyadic frequency blocks, proves the estimate per block by the method, and reassembles with the square-function equivalence, exactly the scale-by-scale strategy this unit licenses.
Historical & philosophical context Master
The square function was introduced by John Edensor Littlewood and Raymond Paley in a sequence of 1937 papers in the Proceedings of the London Mathematical Society [Littlewood-Paley 1937], working on the circle with lacunary and dyadic blocks of Fourier series. Their motivation was to extend M. Riesz's theory of conjugate functions to a statement about the sizes of dyadic Fourier blocks, proving that the norm of a function is comparable to the norm of the square function of its dyadic Fourier pieces. The original proofs used complex-analytic methods specific to the disc — Blaschke products and the theory of analytic functions in — and did not transfer to several real variables.
The real-variable theory was created by Elias Stein, who recast the square function as a singular integral with values in a Hilbert space and built the continuous-parameter -function from the Poisson semigroup, developing the systematic theory in his 1970 monograph Singular Integrals and Differentiability Properties of Functions and the companion Topics in Harmonic Analysis Related to the Littlewood-Paley Theory [Stein 1970-Topics]. The randomization argument using Rademacher functions and the Khintchine inequality, which reduces the square function to an average of ordinary multiplier operators, traces to Józef Marcinkiewicz and to the Marcinkiewicz-Zygmund theory of the 1930s [Marcinkiewicz 1939]. The discrete smooth dyadic decomposition in the form used today, and its role in defining Besov and Triebel-Lizorkin spaces, was systematized by Michael Frazier, Björn Jawerth, and Guido Weiss [Frazier-Jawerth-Weiss 1991], who recast the whole theory in terms of the -transform and atomic/molecular decompositions.
Bibliography Master
@article{LittlewoodPaley1937,
author = {Littlewood, John E. and Paley, Raymond E. A. C.},
title = {Theorems on Fourier series and power series},
journal = {Proceedings of the London Mathematical Society},
volume = {42},
year = {1937},
pages = {52--89}
}
@article{Marcinkiewicz1939,
author = {Marcinkiewicz, J\'ozef},
title = {Sur les multiplicateurs des s\'eries de Fourier},
journal = {Studia Mathematica},
volume = {8},
year = {1939},
pages = {78--91}
}
@book{Stein1970SI,
author = {Stein, Elias M.},
title = {Singular Integrals and Differentiability Properties of Functions},
publisher = {Princeton University Press},
year = {1970}
}
@book{Stein1970Topics,
author = {Stein, Elias M.},
title = {Topics in Harmonic Analysis Related to the Littlewood-Paley Theory},
series = {Annals of Mathematics Studies},
number = {63},
publisher = {Princeton University Press},
year = {1970}
}
@book{Stein1993,
author = {Stein, Elias M.},
title = {Harmonic Analysis: Real-Variable Methods, Orthogonality, and Oscillatory Integrals},
publisher = {Princeton University Press},
year = {1993}
}
@book{Grafakos2014Modern,
author = {Grafakos, Loukas},
title = {Modern Fourier Analysis},
edition = {3},
publisher = {Springer},
year = {2014}
}
@book{FrazierJawerthWeiss1991,
author = {Frazier, Michael and Jawerth, Bj\"orn and Weiss, Guido},
title = {Littlewood-Paley Theory and the Study of Function Spaces},
series = {CBMS Regional Conference Series in Mathematics},
number = {79},
publisher = {American Mathematical Society},
year = {1991}
}