# Basic Littlewood-Paley theory I: frequency projections

I have written some notes on Littlewood-Paley theory for a masterclass, which I thought I would share here as well. This is the first part, covering some motivation, the case of a single frequency projection and its vector-valued generalisation. References I have used in preparing these notes include Stein’s “Singular integrals and differentiability properties of functions”, Duoandikoetxea’s “Fourier Analysis”, Grafakos’ “Classical Fourier Analysis” and, as usual, some material by Tao, both from his blog and from the notes for his courses. Prerequisites are some basic Fourier transform theory, Calderón-Zygmund theory of euclidean singular integrals and its vector-valued generalisation (to Hilbert spaces only; we won’t need Banach spaces).

0. Introduction
Harmonic analysis makes a fundamental use of divide-et-impera approaches. A particularly fruitful one is the decomposition of a function in terms of the frequencies that compose it, which is prominently incarnated in the theory of the Fourier transform and Fourier series. In many applications, however, it is neither necessary nor even useful to resolve the function ${f}$ at the level of single frequencies, and it suffices instead to consider how wildly different frequency components behave. One example of this is the (formal) decomposition of functions on ${\mathbb{R}}$ given by

$\displaystyle f = \sum_{j \in \mathbb{Z}} \Delta_j f,$

where ${\Delta_j}$ denotes the operator defined by

$\displaystyle \Delta_j f (x) := \int_{\{\xi \in \mathbb{R} : 2^j \leq |\xi| < 2^{j+1}\}} \widehat{f}(\xi) e^{2\pi i \xi \cdot x} d\xi,$

commonly referred to as a (dyadic) frequency projection. Thus ${\Delta_j f}$ represents the portion of ${f}$ with frequencies of magnitude ${\sim 2^j}$. The Fourier inversion formula can be used to justify the above decomposition if, for example, ${f \in L^2(\mathbb{R})}$. Heuristically, since any two ${\Delta_j f, \Delta_{k} f}$ oscillate at significantly different frequencies when ${|j-k|}$ is large, we would expect that for most ${x}$ the different contributions to the sum cancel out more or less randomly; a probabilistic argument typical of random walks (see Exercise 1) leads to the conjecture that ${|f|}$ should behave “most of the time” like ${\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}}$ (the last expression is an example of a square function). While this is not true in a pointwise sense, we will see in these notes that the two are indeed interchangeable from the point of view of ${L^p}$-norms: more precisely, we will show that for any ${1 < p < \infty}$ it holds that

$\displaystyle \boxed{ \|f\|_{L^p (\mathbb{R})} \sim_p \Big\|\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}\Big\|_{L^p (\mathbb{R})}. }\ \ \ \ \ (\dagger)$

This is a result historically due to Littlewood and Paley, which explains the name given to the related theory. The ${p=2}$ case is an immediate consequence of Plancherel’s theorem, to which the statement is essentially equivalent. Therefore one could interpret the above as a substitute for Plancherel’s theorem in generic ${L^p}$ spaces when ${p\neq 2}$.
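To get a feel for ($\dagger$) one can experiment numerically. The following Python sketch is only a discretised, periodic stand-in for the statement on ${\mathbb{R}}$ (an assumption made for convenience, not part of the theory above): functions are sampled at ${N}$ points of the torus ${[0,1)}$, the FFT plays the role of the Fourier transform, and the hypothetical helper `dyadic_pieces` keeps the integer frequencies ${2^j \leq |k| < 2^{j+1}}$. Since the zero frequency is not covered by any piece, ${f}$ is taken with mean zero. For ${p=2}$ the printed ratio is ${1}$ up to rounding, as Plancherel predicts; for other ${p}$ the ratios depend on ${f}$ and ${N}$, and the theorem only asserts that they stay between constants depending on ${p}$.

```python
import numpy as np

def dyadic_pieces(f):
    """Periodic analogues of Delta_j f, j = 0, ..., log2(N) - 1.

    Delta_j keeps the integer frequencies k with 2**j <= |k| < 2**(j+1);
    the zero frequency is not covered, so f should have mean zero.
    """
    N = f.size
    k = np.fft.fftfreq(N, d=1.0 / N)       # integer frequencies in [-N/2, N/2)
    F = np.fft.fft(f)
    J = int(np.log2(N))
    return [np.fft.ifft(np.where((2**j <= np.abs(k)) & (np.abs(k) < 2**(j + 1)), F, 0))
            for j in range(J)]

def lp_norm(g, p):
    """Discrete L^p norm on the torus [0, 1) with N equispaced samples."""
    return np.mean(np.abs(g) ** p) ** (1.0 / p)

rng = np.random.default_rng(0)
N = 1024
f = rng.standard_normal(N)
f -= f.mean()                              # remove the zero frequency

pieces = dyadic_pieces(f)
square_fn = np.sqrt(sum(np.abs(d) ** 2 for d in pieces))

for p in (1.5, 2.0, 4.0):
    print(p, lp_norm(square_fn, p) / lp_norm(f, p))
```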

In developing a framework that allows us to prove ($\dagger$) we will encounter some variants of the square function above, including ones with smoother frequency projections that are useful in a variety of contexts. We will moreover show some applications of the above fact and its variants. One of these applications will be a proof of the boundedness of the spherical maximal function ${\mathscr{M}_{\mathbb{S}^{d-1}}}$ (almost verbatim the one on Tao’s blog).

Notation: We will use ${A \lesssim B}$ to denote the estimate ${A \leq C B}$ where ${C>0}$ is some absolute constant, and ${A\sim B}$ to denote the fact that ${A \lesssim B \lesssim A}$. If the constant ${C}$ depends on a list of parameters ${L}$ we will write ${A \lesssim_L B}$.

1. Motivation: estimates for the heat equation with sources
In this section we present certain natural objects that cannot be dealt with by (euclidean) Calderón-Zygmund theory alone, as a way to motivate the study of Littlewood-Paley theory.

Consider the heat equation on ${\mathbb{R}^d}$

$\displaystyle \begin{cases} \partial_t u - \Delta u = f, \\ u(x,0) = f_0(x), \end{cases} \ \ \ \ \ (2)$

with ${f = f(x,t)}$ representing the instantaneous heat generated by the source at point ${x}$ at instant ${t}$. If the total heat introduced by the sources is small, we expect to see that the solution ${u}$ evolves slowly (that is, ${\partial_t u}$ is also small). One sense in which this conjecture can be made rigorous is to control the ${L^p (\mathbb{R}^d \times \mathbb{R})}$-norms of ${\partial_t u}$ by the corresponding norms of the data ${f}$. If we take a Fourier transform in both ${x}$ and ${t}$ we have

$\displaystyle \begin{aligned} \widehat{\partial_t u}(\xi, \tau) & = 2\pi i \tau \widehat{u}(\xi, \tau), \\ \widehat{\Delta u}(\xi, \tau) & = - 4\pi^2 |\xi|^2 \widehat{u}(\xi, \tau). \end{aligned}$

Now, we are trying to relate ${\partial_t u}$ and ${f}$ on the Fourier side (that is, we are trying to use spectral calculus). The simplest way to do so is to observe that, by taking the Fourier transform of both sides of (2), we have

$\displaystyle \widehat{\partial_t u} - \widehat{\Delta u} = \widehat{f}, \quad \text{ that is } \quad (2\pi i\tau + 4\pi^2|\xi|^2) \widehat{u}(\xi, \tau) = \widehat{f}(\xi, \tau);$

thus we can write with a little algebra

$\displaystyle \widehat{\partial_t u}(\xi, \tau)= 2\pi i \tau \widehat{u} = \frac{2 \pi i \tau}{2\pi i\tau + 4\pi^2|\xi|^2} \big(2\pi i\tau + 4\pi^2|\xi|^2\big) \widehat{u} = \frac{2 \pi i \tau}{2\pi i\tau + 4\pi^2|\xi|^2} \widehat{f} = \frac{\tau}{\tau - 2\pi i |\xi|^2} \widehat{f}.$

Let ${m(\xi, \tau) := \frac{\tau}{\tau - 2\pi i |\xi|^2}}$, so that the result can be restated as ${\widehat{\partial_t u} = m \widehat{f}}$. Observe that both the real and imaginary parts of ${m}$ are discontinuous at ${(0,0)}$ and nowhere else. By Fourier inversion we can interpret this equality as defining a linear operator ${T_m}$, given by

$\displaystyle T_m g(x, t) = \int_{\mathbb{R}^d \times \mathbb{R}} m(\xi,\tau) \widehat{g}(\xi, \tau) e^{2\pi i(\xi\cdot x + t \tau)} d \xi d \tau;$

thus we have come to realise that the partial derivative ${\partial_t u}$ can be obtained (at least formally) from the function ${f}$ by ${\partial_t u = T_m f}$. To bound ${\|\partial_t u\|_{L^p(\mathbb{R}^d \times \mathbb{R})}}$ is therefore equivalent to bounding ${\|T_m f\|_{L^p(\mathbb{R}^d \times \mathbb{R})}}$: if ${T_m}$ is bounded from ${L^p}$ into ${L^p}$ then we will have

$\displaystyle \|\partial_t u\|_{L^p(\mathbb{R}^d \times \mathbb{R})} = \|T_m f\|_{L^p(\mathbb{R}^d \times \mathbb{R})} \lesssim \|f\|_{L^p(\mathbb{R}^d \times \mathbb{R})},$

which is the type of control we are looking for.
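A quick numerical sanity check of the algebra leading to ${m(\xi,\tau) = \tau/(\tau - 2\pi i|\xi|^2)}$, and of the two features of ${m}$ that drive the rest of this section, may be reassuring. In the Python sketch below (with ${d=3}$ and a random point chosen arbitrarily), we verify that ${2\pi i\tau/(2\pi i\tau + 4\pi^2|\xi|^2) = m(\xi,\tau)}$, that ${m(\lambda\xi, \lambda^2\tau) = m(\xi,\tau)}$ for several ${\lambda>0}$ (the parabolic scaling invariance), and that ${m}$ has different limits at the origin along the ${\tau}$-axis and along the ${\xi}$-axis. This checks a handful of points only; it is an illustration, not a proof.

```python
import numpy as np

def m(xi, tau):
    """Multiplier m(xi, tau) = tau / (tau - 2 pi i |xi|^2), with xi in R^d."""
    xi = np.asarray(xi, dtype=float)
    return tau / (tau - 2j * np.pi * np.dot(xi, xi))

rng = np.random.default_rng(0)
xi, tau = rng.standard_normal(3), rng.standard_normal()   # d = 3, arbitrary point

# The "little algebra": 2 pi i tau / (2 pi i tau + 4 pi^2 |xi|^2) equals m.
direct = 2j * np.pi * tau / (2j * np.pi * tau + 4 * np.pi**2 * np.dot(xi, xi))
print(abs(direct - m(xi, tau)))

# Parabolic scaling: m(lam * xi, lam**2 * tau) = m(xi, tau) for lam > 0.
for lam in (1e-3, 0.5, 10.0, 1e3):
    print(lam, abs(m(lam * xi, lam**2 * tau) - m(xi, tau)))

# Different behaviour near the origin: m = 1 on the tau-axis, m = 0 on the xi-axis.
print(m(np.zeros(3), 1e-9), m(np.array([1e-9, 0.0, 0.0]), 0.0))
```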

Observe that by the properties of the Fourier transform our operator is actually a convolution operator: precisely¹,

$\displaystyle T_m f (x,t) = f \ast \widehat{m}(x,t) = \int_{\mathbb{R}^d} \int_{\mathbb{R}} f(x - y, t -s) \widehat{m}(y,s) d y d s.$

If we knew that the convolution kernel $\widehat{m}$ satisfied $\|\widehat{m}\|_{L^1} < \infty$ we would be done, since we could appeal to Young’s inequality. Unfortunately, this is not the case. This is the same issue that one encounters in dealing with singular integral kernels such as $\mathrm{p.v.}\, \frac{dt}{t}$, so the natural thing to do is to check whether $\widehat{m}$ is a Calderón-Zygmund kernel. However, this is also not the case! Indeed, for $K$ to be a (euclidean) Calderón-Zygmund convolution kernel on $\mathbb{R}^n$, it has to satisfy certain smoothness properties such as $|\nabla K(z)| \lesssim |z|^{-n-1}$; since $\widehat{m}$ lives on $\mathbb{R}^{d+1}$, this would read $|\nabla \widehat{m}(y,s)| \lesssim |(y,s)|^{-d-2}$. However, $\widehat{m}$ enjoys a certain (parabolic) scaling invariance that makes it incompatible with this inequality: indeed, one can easily see that for any $\lambda > 0$

$\displaystyle \widehat{m}(y,s) = \lambda^{d+2} \widehat{m}(\lambda y, \lambda^2 s)$

and deduce from this a similar anisotropic rescaling invariance for ${\nabla \widehat{m}}$. Since ${|(y,s)|^{-d-2}}$ rescales isotropically rather than in this anisotropic fashion, the inequality ${|\nabla\widehat{m}(y,s)|\lesssim |(y,s)|^{-d-2}}$ cannot possibly hold for all ${y,s}$.
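For completeness, here is one way to carry out the verification, at a formal level and under the assumption that ${\widehat{m}}$ stands for the inverse Fourier transform ${\widehat{m}(y,s) = \int m(\xi,\tau) e^{2\pi i (\xi\cdot y + \tau s)}\, d\xi\, d\tau}$ (to be interpreted in the sense of distributions, as ${m}$ is bounded but not integrable). The only inputs are the invariance ${m(\lambda\xi,\lambda^2\tau) = m(\xi,\tau)}$, which is immediate from the formula for ${m}$, and the change of variables ${\eta = \lambda\xi}$, ${\sigma = \lambda^2\tau}$, for which ${d\xi\, d\tau = \lambda^{-d-2}\, d\eta\, d\sigma}$:

$\displaystyle \begin{aligned} \widehat{m}(\lambda y, \lambda^2 s) = & \int m(\xi,\tau) e^{2\pi i(\lambda\xi\cdot y + \lambda^2\tau s)} d\xi d\tau \\ = & \; \lambda^{-d-2} \int m(\lambda^{-1}\eta, \lambda^{-2}\sigma) e^{2\pi i(\eta\cdot y + \sigma s)} d\eta d\sigma \\ = & \; \lambda^{-d-2}\, \widehat{m}(y,s). \end{aligned}$

Differentiating ${\widehat{m}(y,s) = \lambda^{d+2}\widehat{m}(\lambda y,\lambda^2 s)}$ away from the origin gives ${\nabla_y \widehat{m}(y,s) = \lambda^{d+3} (\nabla_y \widehat{m})(\lambda y, \lambda^2 s)}$ and ${\partial_s \widehat{m}(y,s) = \lambda^{d+4} (\partial_s \widehat{m})(\lambda y, \lambda^2 s)}$: the derivatives in ${y}$ and in ${s}$ rescale with different powers of ${\lambda}$.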

One solution to this issue is to develop a theory of parabolic singular integrals that extends the results of classical Calderón-Zygmund theory to kernels satisfying anisotropic rescaling identities of the kind above. While this is a viable approach, we will take a different one in these notes, which does not make use of said identities and is thus more general in nature. With the methods developed from the next section onwards we will be able to show that ${T_m}$ is indeed ${L^p \rightarrow L^p}$ bounded for all ${1 < p < \infty}$.

2. Frequency projections

We begin by considering the frequency projections introduced in the beginning. Let ${I}$ be an interval and define the frequency projection ${\Delta_I}$ to be

$\displaystyle \Delta_I f(x) := \int \mathbf{1}_{I}(\xi) \widehat{f}(\xi) e^{2\pi i \xi x} d\xi.$

An equivalent way of defining this is to work directly on the frequency side of things and stipulate that ${\Delta_I}$ is the operator that satisfies

$\displaystyle \widehat{\Delta_I f}(\xi) = \mathbf{1}_{I}(\xi) \widehat{f}(\xi);$

this is well-defined for ${f \in L^2}$, and in particular for ${f \in L^2 \cap L^p}$; an ${L^p}$ bound on this dense class then extends ${\Delta_I}$ to all of ${L^p}$. We claim that ${\Delta_I}$ defines an ${L^p\rightarrow L^p}$ bounded operator for any ${1 < p < \infty}$ and any (possibly semi-infinite) interval ${I}$ and, importantly, that its ${L^p \rightarrow L^p}$ norm is bounded independently of ${I}$. The point is that the operator ${\Delta_I}$ is essentially a linear combination of two (modulated) Hilbert transforms, which are bounded in the same range. Indeed, with the Hilbert transform normalised as ${Hf := f \ast \mathrm{p.v.}\, 1/x}$, observe that the Fourier transform of its kernel ${\mathrm{p.v.}\, 1/x}$ is ${-i \pi \mathrm{sign}(\xi)}$; it is easy to show that, if ${I=(a,b)}$, then

$\displaystyle \mathbf{1}_{(a,b)}(\xi) = \frac{\mathrm{sign}(\xi - a) - \mathrm{sign}(\xi-b)}{2}.$

Now, the Fourier transform exchanges translations with modulations, and specifically ${\widehat{f}(\xi + \theta) = (f(\cdot) e^{-2\pi i \theta \cdot})^{\wedge}(\xi)}$; we see therefore that we can write, with ${H}$ the Hilbert transform,

$\displaystyle \begin{aligned} -i\pi \int \widehat{f}(\xi) & \mathrm{sign}(\xi - a) e^{2\pi i \xi x} d\xi \\ = & \; -i \pi e^{2\pi i a x}\int \widehat{f}(\xi + a) \mathrm{sign}(\xi) e^{2\pi i \xi x} d\xi \\ = & \; e^{2\pi i a x} H (e^{-2\pi i a \cdot} f(\cdot))(x). \end{aligned}$

If we let ${\mathrm{Mod}_{\theta}}$ denote the modulation operator ${\mathrm{Mod}_{\theta} f(x) := e^{-2\pi i \theta x} f(x)}$, we have therefore shown that

$\displaystyle -i\pi \Delta_{(a,b)} = \frac{1}{2} \big( \mathrm{Mod}_{-a} \circ H \circ \mathrm{Mod}_{a} - \mathrm{Mod}_{-b} \circ H \circ \mathrm{Mod}_{b} \big),$

and from the ${L^p \rightarrow L^p}$ boundedness of ${H}$ it follows that

$\displaystyle \| \Delta_I f \|_{L^p} \lesssim_p \|f\|_{L^p} \ \ \ \ \ (3)$

(since ${\mathrm{Mod}_{\theta}}$ does not change the ${L^p}$ norms at all).
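The identity for ${-i\pi\Delta_{(a,b)}}$ can be checked numerically. The Python sketch below is a periodic, discretised analogue rather than the operator on ${\mathbb{R}}$: the FFT on ${N}$ points replaces the Fourier transform, `hilbert` multiplies by ${-i\pi\,\mathrm{sign}(k)}$, `mod` implements ${\mathrm{Mod}_\theta}$ for integer ${\theta}$, and all helper names are hypothetical. Two assumptions keep the comparison exact: ${f}$ is band-limited, so that modulation never wraps frequencies around the discrete spectrum, and the Fourier coefficients of ${f}$ at the endpoints ${a,b}$ are set to zero, because ${\mathrm{sign}(0) = 0}$ gives the endpoints weight ${1/2}$ in the identity.

```python
import numpy as np

N = 256
x = np.arange(N) / N                       # grid on the torus [0, 1)
k = np.fft.fftfreq(N, d=1.0 / N)           # integer frequencies in [-N/2, N/2)

def mod(theta, f):
    """Mod_theta f(x) = exp(-2 pi i theta x) f(x), for integer theta."""
    return np.exp(-2j * np.pi * theta * x) * f

def hilbert(f):
    """Periodic analogue of H: Fourier multiplier -i pi sign(k)."""
    return np.fft.ifft(-1j * np.pi * np.sign(k) * np.fft.fft(f))

def delta_sharp(a, b, f):
    """Keep only the frequencies in the open interval (a, b)."""
    mask = (k > a) & (k < b)
    return np.fft.ifft(np.where(mask, np.fft.fft(f), 0))

def delta_via_hilbert(a, b, f):
    """(1 / (-i pi)) * (1/2) (Mod_{-a} H Mod_a - Mod_{-b} H Mod_b) f."""
    rhs = 0.5 * (mod(-a, hilbert(mod(a, f))) - mod(-b, hilbert(mod(b, f))))
    return rhs / (-1j * np.pi)

rng = np.random.default_rng(0)
K, a, b = 40, -7, 19                       # band |k| <= K; shifted bands stay inside [-N/2, N/2)
F = np.zeros(N, dtype=complex)
band = np.abs(k) <= K
F[band] = rng.standard_normal(band.sum()) + 1j * rng.standard_normal(band.sum())
F[(k == a) | (k == b)] = 0                 # endpoints carry weight 1/2 in the identity
f = np.fft.ifft(F)

err = np.max(np.abs(delta_sharp(a, b, f) - delta_via_hilbert(a, b, f)))
print(err)
```

Dropping the line that zeroes the endpoint coefficients should make `err` visibly nonzero, which is a convenient way to see the ${1/2}$ weights at ${a}$ and ${b}$.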

At this point, another important thing to notice is that when ${I,J}$ are disjoint intervals, the operators ${\Delta_I, \Delta_J}$ have orthogonal ranges, that is ${\Delta_I f \perp \Delta_J g}$ in ${L^2}$ for all ${f, g \in L^2}$. Indeed, one has by Parseval’s identity

$\displaystyle \begin{aligned} \langle \Delta_I f, \Delta_J g \rangle = & \Big\langle \widehat{\Delta_I f}, \widehat{\Delta_J g} \Big\rangle \\ = & \int \mathbf{1}_I(\xi) \widehat{f}(\xi) \mathbf{1}_J(\xi) \overline{\widehat{g}(\xi)} d\xi \\ = & \int \mathbf{1}_{I \cap J} (\xi) \widehat{f}(\xi)\overline{\widehat{g}(\xi)} d\xi = 0. \end{aligned}$

A consequence of this fact, together with Plancherel’s theorem, is the ${p=2}$ case of the result that we mentioned in the introduction, that is inequality ($\dagger$); and actually, one has more, namely that for any collection ${\mathscr{I}}$ of disjoint intervals that partition ${\mathbb{R}}$ one has

$\displaystyle \Big\|\Big(\sum_{I \in \mathscr{I}} |\Delta_I f|^2 \Big)^{1/2}\Big\|_{L^2 (\mathbb{R})} = \|f\|_{L^2}. \ \ \ \ \ (4)$

Prove this in Exercise 2 to familiarise yourself with frequency projections.
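Without spoiling Exercise 2, here is a numerical illustration of the orthogonality relation and of (4), again in a periodic, discretised setting adopted for convenience: the FFT on ${N}$ points stands in for the Fourier transform, intervals are half-open intervals of integer frequencies, and `project` is a hypothetical helper. The random partition of the discrete spectrum is arbitrary; any partition should give equality up to rounding.

```python
import numpy as np

N = 512
k = np.fft.fftfreq(N, d=1.0 / N)           # integer frequencies in [-N/2, N/2)
rng = np.random.default_rng(1)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g = rng.standard_normal(N) + 1j * rng.standard_normal(N)

def project(h, lo, hi):
    """Periodic analogue of Delta_I for I = [lo, hi): keep frequencies lo <= k < hi."""
    return np.fft.ifft(np.where((k >= lo) & (k < hi), np.fft.fft(h), 0))

# Orthogonality for the disjoint intervals I = [-30, 5) and J = [5, 60).
inner = np.mean(project(f, -30, 5) * np.conj(project(g, 5, 60)))
print(abs(inner))

# A random partition of [-N/2, N/2) into 21 consecutive intervals.
cuts = np.sort(rng.choice(np.arange(-N // 2 + 1, N // 2), size=20, replace=False))
edges = np.concatenate(([-N // 2], cuts, [N // 2]))
square_sum = sum(np.abs(project(f, lo, hi)) ** 2 for lo, hi in zip(edges[:-1], edges[1:]))

lhs = np.sqrt(np.mean(square_sum))         # discrete L^2 norm of the square function
rhs = np.sqrt(np.mean(np.abs(f) ** 2))     # discrete L^2 norm of f
print(lhs, rhs)
```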

3. Vector-valued frequency projections

As a step towards proving ($\dagger$) (and also important for the applications), it will be useful to study vector-valued analogues of these inequalities, which are easier to prove in general. We change the setting to ${\mathbb{R}^d}$ to work in more generality: so, if ${R}$ is an axis-parallel rectangle (that is, of the form ${I_1 \times \ldots \times I_d}$ with ${I_j}$ intervals, possibly semi-infinite) in ${\mathbb{R}^d}$, we define the frequency projection ${\Delta_R}$ to be the operator that satisfies for any ${f \in L^2(\mathbb{R}^d)}$

$\displaystyle \widehat{\Delta_R f}(\xi) = \mathbf{1}_{R}(\xi) \widehat{f}(\xi).$

The following is an abstract vector-valued analogue of (3).

Proposition 1 (Vector-valued square functions) Let ${(\Gamma, d\mu)}$ be a measure space with ${d\mu}$ a ${\sigma}$-finite measure, and let ${\mathscr{H}}$ denote the Hilbert space ${L^2(\Gamma, d\mu)}$, that is the space of functions ${G : \Gamma \rightarrow \mathbb{C}}$ such that ${\|G\|_{\mathscr{H}}:= (\int |G(\gamma)|^2 d\mu(\gamma))^{1/2} < \infty}$. We let ${(R_\gamma)_{\gamma \in \Gamma}}$ be a measurable collection of arbitrary axis-parallel rectangles of ${\mathbb{R}^d}$ (that is, the mapping ${\mathfrak{R}: \gamma \mapsto R_\gamma}$ is a measurable function²).

Given any vector-valued function ${\mathbf{f}(x) = (f_\gamma (x))_{\gamma \in \Gamma} \in L^2 (\mathbb{R}^d;\mathscr{H})}$ (that is, a function of ${\mathbb{R}^d}$ that takes values in ${\mathscr{H}}$), we can define its vector-valued frequency projection ${\Delta_{\mathfrak{R}}}$ by

$\displaystyle \Delta_{\mathfrak{R}}\mathbf{f} := ( \Delta_{R_\gamma} f_\gamma )_{\gamma \in \Gamma}.$

Then we have that for any ${1 < p < \infty}$ it holds for all functions ${\mathbf{f} = (f_\gamma)_{\gamma \in \Gamma} \in L^p (\mathbb{R}^d;\mathscr{H})}$ that

$\displaystyle \| \Delta_{\mathfrak{R}} \mathbf{f} \|_{L^p(\mathbb{R}^d;\mathscr{H})} \lesssim_p \| \mathbf{f} \|_{L^p(\mathbb{R}^d;\mathscr{H})}.$

In particular, the constant is independent of the collection of rectangles and even of the measure space ${(\Gamma, d\mu)}$.

The level of abstraction in the previous statement is quite high, and it might take the reader a while to fully unpack its meaning. The following special case of Proposition 1 should help:

Corollary 2 Let ${(R_j)_{j \in \mathbb{N}}}$ be a collection of arbitrary axis-parallel rectangles in ${\mathbb{R}^d}$. If ${\mathbf{f} = (f_j)_{j \in \mathbb{N}}}$ is a sequence of functions such that ${(\sum_{j} |f_j|^2)^{1/2} = \|\mathbf{f}\|_{\ell^2(\mathbb{N})}}$ is in ${L^p(\mathbb{R}^d)}$, we have

$\displaystyle \Big\| \Big(\sum_{j} |\Delta_{R_j} f_j|^2 \Big)^{1/2} \Big\|_{L^p(\mathbb{R}^d)} \lesssim_p \Big\| \Big(\sum_{j} |f_j|^2 \Big)^{1/2} \Big\|_{L^p(\mathbb{R}^d)},$

with constant independent of the collection of rectangles.
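Corollary 2 is easy to probe numerically in ${d=2}$. The following Python sketch works on the discretised torus ${[0,1)^2}$ instead of ${\mathbb{R}^2}$, with random half-open rectangles of integer frequencies; `random_rectangle` and `delta_rect` are hypothetical helpers. It compares the two sides of the inequality for a few values of ${p}$. For ${p=2}$ the ratio is at most ${1}$, since by Plancherel ${\|\Delta_{R_j} f_j\|_{L^2} \leq \|f_j\|_{L^2}}$ for each ${j}$; for other ${p}$ the printed ratios are specific to this random example and only illustrate that nothing blows up.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 128
k = np.fft.fftfreq(N, d=1.0 / N)           # integer frequencies in each variable
K1, K2 = np.meshgrid(k, k, indexing="ij")

def random_rectangle():
    """Hypothetical helper: a random rectangle [a1, b1) x [a2, b2) of frequencies."""
    a1, b1 = np.sort(rng.integers(-N // 2, N // 2, size=2))
    a2, b2 = np.sort(rng.integers(-N // 2, N // 2, size=2))
    return (a1, b1), (a2, b2)

def delta_rect(f, rect):
    """Periodic analogue of Delta_R on the 2-torus: multiply fft2(f) by 1_R."""
    (a1, b1), (a2, b2) = rect
    mask = (K1 >= a1) & (K1 < b1) & (K2 >= a2) & (K2 < b2)
    return np.fft.ifft2(np.where(mask, np.fft.fft2(f), 0))

def lp_norm(g, p):
    """Discrete L^p norm on the torus [0, 1)^2."""
    return np.mean(np.abs(g) ** p) ** (1.0 / p)

J = 16
fs = [rng.standard_normal((N, N)) for _ in range(J)]
rects = [random_rectangle() for _ in range(J)]

S_in = np.sqrt(sum(np.abs(f) ** 2 for f in fs))                                 # (sum |f_j|^2)^(1/2)
S_out = np.sqrt(sum(np.abs(delta_rect(f, R)) ** 2 for f, R in zip(fs, rects)))  # (sum |Delta_{R_j} f_j|^2)^(1/2)

for p in (1.5, 2.0, 4.0):
    print(p, lp_norm(S_out, p) / lp_norm(S_in, p))
```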

To prove ($\dagger$) we will only need Corollary 2, but for the application of Littlewood-Paley theory we will give in a subsequent post (that is, the Marcinkiewicz multiplier theorem) we will need the full power of Proposition 1. Prove in Exercise 3 that the corollary is indeed just a special case of the proposition, before looking at the proof below.

Remark 1 It is important to notice that there is no assumption on the family of rectangles other than the fact that they are all axis-parallel – in particular, they are not necessarily pairwise disjoint or even distinct.

Proof: The proof is a vector-valued generalisation of what we said for frequency projections over intervals in the previous section.

First consider ${d=1}$ and introduce the vector-valued Hilbert transform ${\mathbf{H}}$ given by

$\displaystyle \mathbf{H}\mathbf{f} := (H f_\gamma)_{\gamma \in \Gamma};$

we claim that this operator is ${L^p(\mathbb{R};\mathscr{H}) \rightarrow L^p(\mathbb{R};\mathscr{H})}$ bounded. Indeed, ${\mathbf{H}}$ is given by convolution with the $B(\mathscr{H},\mathscr{H})$-valued³ kernel ${\mathbf{K}(x):=(\mathrm{p.v.} 1/x)_{\gamma\in\Gamma}}$, and this kernel satisfies (as you can easily verify)

1. ${\|\mathbf{K}(x)\|_{B(\mathscr{H},\mathscr{H})} \lesssim |x|^{-1}}$,
2. ${\int_{|x|>2|y|} \|\mathbf{K}(x-y) - \mathbf{K}(x)\|_{B(\mathscr{H},\mathscr{H})} dx \lesssim 1 }$ uniformly in ${y \neq 0}$,
3. ${\int_{r < |x| < R} \mathbf{K}(x) dx = \mathbf{0}}$ for any ${0 < r < R}$.

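Properties 1, 2 and 3 then imply ${L^p(\mathbb{R};\mathscr{H})}$ boundedness for ${1 < p < \infty}$ by vector-valued Calderón-Zygmund theory. (The ${L^2(\mathbb{R};\mathscr{H})}$ bound is in fact immediate: by Tonelli’s theorem and Plancherel’s theorem applied to each component, ${\|\mathbf{H}\mathbf{f}\|_{L^2(\mathbb{R};\mathscr{H})} = \pi \|\mathbf{f}\|_{L^2(\mathbb{R};\mathscr{H})}}$.)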

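Next, observe that if ${d=1}$ and the intervals ${\mathfrak{I}=(I_\gamma)_{\gamma\in\Gamma}}$ are arbitrary, we can use the same trick used to prove (3) (rewriting ${\Delta_{I_\gamma}}$ as a combination of Hilbert transforms conjugated with modulations) to deduce boundedness of the corresponding vector-valued operator ${\Delta_{\mathfrak{I}}}$ from the boundedness of ${\mathbf{H}}$ above. The modulation frequencies now depend on ${\gamma}$, but each modulation multiplies the corresponding component by a unimodular factor, so it leaves ${\|\mathbf{f}(x)\|_{\mathscr{H}}}$ unchanged for every ${x}$.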

Finally, in the case of general ${d>1}$ one should observe that if ${R= I_1 \times \ldots \times I_d}$ then ${\Delta_R}$ factorises as⁴

$\displaystyle \Delta_R = \Delta_{I_1}^{(1)} \circ \ldots \circ \Delta_{I_d}^{(d)},$

where ${\Delta_{I}^{(k)}}$ denotes frequency projection in the ${x_k}$ variable. Therefore, writing ${R_\gamma = I^1_\gamma \times R'_\gamma}$ and ${x = (x_1, x') \in \mathbb{R} \times \mathbb{R}^{d-1}}$, we have by the one-dimensional result applied to ${x_1}$ that

$\displaystyle \begin{aligned} \|\Delta_{\mathfrak{R}}\mathbf{f} \|_{L^p(\mathbb{R}^d;\mathscr{H})}^p = & \iint \Big\|(\Delta_{R_\gamma} f_\gamma (x_1,x'))_{\gamma \in \Gamma}\Big\|_{\mathscr{H}}^p dx_1 dx' \\ = & \iint \Big\|(\Delta_{I^1_\gamma}^{(1)} \Delta_{R'_\gamma} f_\gamma (x_1,x'))_{\gamma \in \Gamma}\Big\|_{\mathscr{H}}^p dx_1 dx' \\ \lesssim_p & \iint \Big\|(\Delta_{R'_\gamma} f_\gamma (x_1,x'))_{\gamma \in \Gamma}\Big\|_{\mathscr{H}}^p dx_1 dx'; \end{aligned}$

iterating the argument for each variable, we obtain the claimed boundedness for generic ${d}$. $\Box$
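The factorisation ${\Delta_R = \Delta_{I_1}^{(1)} \circ \ldots \circ \Delta_{I_d}^{(d)}}$, which drives the induction on the dimension, is transparent on the Fourier side, where it reduces to ${\mathbf{1}_R(\xi) = \prod_k \mathbf{1}_{I_k}(\xi_k)}$. The Python sketch below checks it for ${d=2}$ in a periodic, discretised setting (an assumption made for convenience): it applies one-dimensional FFT projections along each axis in turn and compares the result with a single two-dimensional projection. The helper `delta_axis` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
N1, N2 = 64, 96
k1 = np.fft.fftfreq(N1, d=1.0 / N1)        # integer frequencies in x_1
k2 = np.fft.fftfreq(N2, d=1.0 / N2)        # integer frequencies in x_2
f = rng.standard_normal((N1, N2))

def delta_axis(h, lo, hi, axis):
    """Project onto lo <= k < hi in one variable only (1D FFT along `axis`)."""
    kk = k1 if axis == 0 else k2
    shape = [1, 1]
    shape[axis] = -1                       # broadcast the 1D mask along the other axis
    mask = ((kk >= lo) & (kk < hi)).reshape(shape)
    return np.fft.ifft(np.fft.fft(h, axis=axis) * mask, axis=axis)

I1, I2 = (-10, 7), (3, 40)                 # the rectangle R = [-10, 7) x [3, 40)
iterated = delta_axis(delta_axis(f, *I2, axis=1), *I1, axis=0)

# Single 2D projection with the product mask 1_{I1}(k1) * 1_{I2}(k2).
mask2d = ((k1 >= I1[0]) & (k1 < I1[1]))[:, None] & ((k2 >= I2[0]) & (k2 < I2[1]))[None, :]
direct = np.fft.ifft2(np.where(mask2d, np.fft.fft2(f), 0))

print(np.max(np.abs(iterated - direct)))
```

Swapping the order of the two one-dimensional projections should give the same result, reflecting the fact that the ${\Delta_{I_k}^{(k)}}$ commute.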

In the next part of the notes we will consider smooth frequency projections and finally prove ($\dagger$). We will also see how the theory generalises to higher dimensions.

EXERCISES

Exercise 1 Let ${(\epsilon_k)_{k \in \mathbb{N}}}$ be independent random variables taking values in ${\{+1,-1\}}$ such that ${\epsilon_k}$ takes each value with equal probability. Show that

$\displaystyle \mathbb{E}\Big[\Big|\sum_{k=1}^{N} \epsilon_k\Big|^2\Big] = N.$

Generalise this to the case where the sum is ${\sum_{k=1}^{N} \epsilon_k a_k}$ with ${a_k}$ some complex coefficients.
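If you want to test your answer to Exercise 1 (including the generalisation to complex coefficients), the expectation can be computed exactly for small ${N}$ by averaging over all ${2^N}$ equally likely sign patterns. The brute-force Python sketch below does this; it is a check, not a proof, and `expected_square` is a hypothetical name.

```python
import itertools

import numpy as np

def expected_square(a):
    """E|sum_k eps_k a_k|^2, computed exactly by averaging over all 2**N sign patterns."""
    a = np.asarray(a, dtype=complex)
    total = 0.0
    for signs in itertools.product((1, -1), repeat=a.size):
        total += abs(np.dot(signs, a)) ** 2
    return total / 2 ** a.size

rng = np.random.default_rng(4)
a = rng.standard_normal(10) + 1j * rng.standard_normal(10)
print(expected_square(np.ones(10)))              # the case a_k = 1 of the exercise
print(expected_square(a), np.sum(np.abs(a) ** 2))
```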

Exercise 2 Show that (4) holds for any collection ${\mathscr{I}}$ of disjoint intervals that partition ${\mathbb{R}}$.

Exercise 3 Show that Corollary 2 is a special case of Proposition 1. What is the measure space ${(\Gamma,d\mu)}$ that gives the corollary? Go to the proof of the proposition and spell out all the Hilbert norms in the argument in terms of the measure space you found.

Footnotes:
1: With a slight abuse of notation we have used ${\widehat{m}}$ to denote the inverse Fourier transform as well, in place of the more usual ${\check{m}}$, motivated by the fact that on ${\mathbb{R}^d}$ the two differ only by a reflection. We will continue to do so in the rest of these notes as it will always be clear from context which one is meant.
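2: Notice that a rectangle is determined by two opposite corners, so the space of rectangles of ${\mathbb{R}^d}$ can be identified with ${\mathbb{R}^d \times \mathbb{R}^d}$ (with coordinates in ${[-\infty, \infty]}$ if semi-infinite rectangles are allowed).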
3: ${B(\mathscr{H},\mathscr{H})}$ is the space of bounded linear operators on ${\mathscr{H}}$. Specifically, ${\mathbf{K}(x)}$ is ${1/x}$ times the identity operator, i.e. it multiplies every component of a vector in ${\mathscr{H}}$ by ${1/x}$.
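4: For ${f \in L^2}$ the identity is immediate on the Fourier side, since ${\mathbf{1}_R(\xi) = \prod_{k} \mathbf{1}_{I_k}(\xi_k)}$. In general it can be proved by appealing to the fact that it is trivial for tensor products, and finite linear combinations of tensor products of single-variable functions are dense in ${L^p(\mathbb{R}^d;\mathscr{H})}$.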