# Oscillatory integrals II: several-variables phases

This is the second part of a two-part post on the theory of oscillatory integrals. In the first part we studied the theory of oscillatory integrals whose phases are functions of a single variable. In this post we will instead study the case in which the phase is a function of several variables (and we integrate in all of them). Here the theory becomes weaker, because these objects can indeed behave worse. We will proceed by analogy, following the same steps as in the single-variable case.
Part I

3. Oscillatory integrals in several variables

In the previous section we have analysed the situation for single variable phases, that is for integrals over (intervals of) ${\mathbb{R}}$. In this section, we will instead study the higher dimensional situation, when the phase is a function of several variables. Things are unfortunately generally not as nice as in the single variable case, as you will see.

In order to avoid having to worry about connected open sets of ${\mathbb{R}^d}$ (see Exercise 18 for the sort of issues that arise in trying to deal with general open sets of ${\mathbb{R}^d}$), in this section we will study mainly objects of the form

$\displaystyle \mathcal{I}_{\psi}(\lambda) := \int_{\mathbb{R}^d} e^{i \lambda u(x)} \psi(x) dx,$

where ${\psi}$ has compact support. We have switched to ${u}$ for the phase to remind the reader of the fact that it is a function of several variables now.

3.1. Principle of non-stationary phase – several variables

The principle of non-stationary phase we saw in Section 2 of part I continues to hold in the several variables case.
Given a phase ${u}$, we say that ${x_0}$ is a critical point of ${u}$ if

$\displaystyle \nabla u(x_0) = (0,\ldots,0).$

Proposition 8 (Principle of non-stationary phase – several variables) Let ${\psi \in C^\infty_c(\mathbb{R}^d)}$ (that is, smooth and compactly supported) and let the phase ${u\in C^\infty}$ be such that ${u}$ does not have critical points in the support of ${\psi}$. Then for any ${N >0}$ we have

$\displaystyle |\mathcal{I}_\psi(\lambda)|\lesssim_{N,\psi,u} |\lambda|^{-N}.$
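To see the principle in action numerically, here is a quick sanity check in dimension $d=1$ (a special case of the proposition): we take the phase $u(x) = x$, which has no critical points at all, and the standard bump $\psi(x) = e^{-1/(1-x^2)}$. This is only an illustrative sketch and not part of the notes; the function names are mine.

```python
import math, cmath

def psi(x):
    # smooth compactly supported amplitude on (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def I(lam, n=200_000):
    # I_psi(lam) = \int e^{i lam u(x)} psi(x) dx with phase u(x) = x (no
    # critical points); trapezoidal rule, which converges very fast here
    # since psi vanishes to infinite order at the endpoints of [-1, 1]
    h = 2.0 / n
    s = 0j
    for k in range(n + 1):
        x = -1.0 + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * psi(x) * cmath.exp(1j * lam * x)
    return h * s
```

Already at moderate $\lambda$ the integral is orders of magnitude smaller than $\|\psi\|_{L^1} = |I(0)|$, consistent with decay faster than any polynomial.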

# Oscillatory integrals I: single-variable phases

You might remember that I contributed some lecture notes on Littlewood-Paley theory to a masterclass, which I then turned into a series of three posts (I, II, III). I have also contributed lecture notes on some basic theory of oscillatory integrals, and therefore I am going to do the same and share them here as a blog post in two parts. The presentation largely follows the one in Stein’s “Harmonic Analysis: Real-variable methods, orthogonality, and oscillatory integrals“, with inputs from Stein and Shakarchi’s “Functional Analysis: Introduction to Further Topics in Analysis“, from some lecture notes by Terry Tao for his 247B course, from a very interesting paper by Carbery, Christ and Wright and from a number of other places that I would now have trouble tracking down.
In this first part we will discuss the theory of oscillatory integrals when the phase is a function of a single variable. There are extensive exercises included that are to be considered part of the lecture notes; indeed, in order to keep the actual notes short and engage the reader, I have turned many things into exercises. If you are interested in learning about oscillatory integrals, you should not ignore them.
In the next post, we will study instead the case where the phases are functions of several variables.

0. Introduction

A large part of modern harmonic analysis is concerned with understanding cancellation phenomena happening between different contributions to a sum or integral. Loosely speaking, we want to know how much better we can do than if we had taken absolute values everywhere. A prototypical example of this is the oscillatory integral of the form

$\displaystyle \int e^{i \phi_\lambda (x)} \psi(x) dx.$

Here ${\psi}$, called the amplitude, is usually understood to be “slowly varying” with respect to the real-valued ${\phi_\lambda}$, called the phase, where ${\lambda}$ denotes a parameter or list of parameters and $\phi'_\lambda$ gets larger as $\lambda$ grows; for example ${\phi_\lambda(x) = \lambda \phi(x)}$. Thus the oscillatory behaviour is given mainly by the complex exponential ${e^{i \phi_\lambda(x)}}$.
Expressions of this form arise quite naturally in several problems, as we will see in Section 1, and typically one seeks to provide an upper bound on the absolute value of the integral above in terms of the parameters ${\lambda}$. Intuitively, as ${\lambda}$ gets larger the phase ${\phi_\lambda}$ changes faster and therefore ${e^{i \phi_\lambda}}$ oscillates faster, producing more cancellation between the contributions of different intervals to the integral. We expect then the integral to decay as ${\lambda}$ grows larger, and usually seek upper bounds of the form ${|\lambda|^{-\alpha}}$. Notice that if you take absolute values inside the integral above you just obtain ${\|\psi\|_{L^1}}$, a bound that does not decay in ${\lambda}$ at all.
The main tool we will use is simply integration by parts. In the exercises you will also use a little basic complex analysis to obtain more precise information on certain special oscillatory integrals.
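As a concrete preview of such bounds, here is a small numerical experiment (a sketch with names of my own choosing) for the model phase $\phi_\lambda(x) = \lambda x^2$ with a standard bump amplitude: the stationary phase method, developed later in these notes, predicts decay $|\lambda|^{-1/2}$, so quadrupling $\lambda$ should roughly halve the integral.

```python
import math, cmath

def psi(x):
    # smooth bump supported in (-1, 1)
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def osc(lam, n=200_000):
    # \int e^{i lam x^2} psi(x) dx by the trapezoidal rule; the phase
    # lam * x^2 has a single stationary point at x = 0
    h = 2.0 / n
    s = 0j
    for k in range(n + 1):
        x = -1.0 + k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * psi(x) * cmath.exp(1j * lam * x * x)
    return h * s
```

Taking absolute values inside the integral would give the constant $\|\psi\|_{L^1}$ for every $\lambda$; the oscillatory integral instead decays like $\lambda^{-1/2}$, which the ratio $|osc(200)|/|osc(800)| \approx 2$ reflects.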

1. Motivation

In this section we shall showcase the appearance of oscillatory integrals in analysis with a couple of examples. The reader can find other interesting examples in the exercises.

1.1. Fourier transform of radial functions

Let ${f : \mathbb{R}^d \rightarrow \mathbb{C}}$ be a radially symmetric function, that is, there exists a function ${f_0: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${f(x) = f_0(|x|)}$ for every ${x \in \mathbb{R}^d}$. Let’s suppose for simplicity that ${f\in L^1(\mathbb{R}^d)}$ (equivalently, that ${f_0 \in L^1(\mathbb{R}, r^{d-1} dr)}$), so that it has a well-defined Fourier transform. It is easy to see (by composing ${f}$ with a rotation and using a change of variable in the integral defining ${\widehat{f}}$) that ${\widehat{f}}$ must also be radially symmetric, that is, there must exist ${g: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${\widehat{f}(\xi) = g(|\xi|)}$; we want to understand its relationship with ${f_0}$. Therefore we write, using polar coordinates,

\displaystyle \begin{aligned} \widehat{f}(\xi) = & \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi} dx \\ = & \int_{0}^{\infty}\int_{\mathbb{S}^{d-1}} f_0(r) e^{-2\pi i r \omega\cdot \xi} r^{d-1} d\sigma_{d-1}(\omega) dr \\ = & \int_{0}^{\infty} f_0(r) r^{d-1} \Big(\int_{\mathbb{S}^{d-1}} e^{-2\pi i r \omega\cdot \xi} d\sigma_{d-1}(\omega)\Big) dr \end{aligned}

where ${d\sigma_{d-1}}$ denotes the surface measure on the unit ${(d-1)}$-dimensional sphere ${\mathbb{S}^{d-1}}$ induced by the Lebesgue measure on the ambient space ${\mathbb{R}^{d}}$. By inspection, we see that the integral in brackets above is radially symmetric in ${\xi}$, and so if we define

$\displaystyle J(t) := \int_{\mathbb{S}^{d-1}} e^{-2\pi i t \omega\cdot \mathbf{e}_1} d\sigma_{d-1}(\omega),$

with ${\mathbf{e}_1 = (1, 0, \ldots, 0)}$, we have

$\displaystyle \widehat{f}(\xi) = g(|\xi|) = \int_{0}^{\infty} f_0(r) r^{d-1} J(r|\xi|) dr. \ \ \ \ \ (1)$

This is the relationship we were looking for: it allows one to calculate the Fourier transform of ${f}$ directly from the radial information ${f_0}$.
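As a sanity check of (1), one can test it on a function whose Fourier transform is known exactly: the Gaussian $f(x) = e^{-\pi|x|^2}$, which satisfies $\widehat{f} = f$. In dimension $d=2$ the integral $J$ reduces to an integral over the angle (it equals $2\pi J_0(2\pi t)$, a Bessel function, though we do not need this fact). The following sketch (function names mine) verifies $g(1) = e^{-\pi}$ numerically:

```python
import math

def J(t, n=512):
    # J(t) = \int_{S^1} e^{-2 pi i t w . e_1} dsigma(w)
    #      = \int_0^{2 pi} cos(2 pi t cos(theta)) dtheta
    # (the imaginary part cancels by symmetry); the trapezoidal rule on a
    # periodic integrand is extremely accurate
    h = 2 * math.pi / n
    return h * sum(math.cos(2 * math.pi * t * math.cos(k * h)) for k in range(n))

def g(rho, rmax=6.0, n=3000):
    # right-hand side of (1) in d = 2 for f_0(r) = e^{-pi r^2}; the
    # Gaussian tail beyond rmax = 6 is utterly negligible
    h = rmax / n
    s = 0.0
    for k in range(n + 1):
        r = k * h
        w = 0.5 if k in (0, n) else 1.0
        s += w * math.exp(-math.pi * r * r) * r * J(r * rho)
    return h * s
```

Here $g(1)$ matches $\widehat{f}(\xi) = e^{-\pi|\xi|^2}$ at $|\xi| = 1$ to several digits.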

# Marcinkiewicz-type multiplier theorem for q-variation (q > 1)

Not long ago we discussed one of the main direct applications of the Littlewood-Paley theory, namely the Marcinkiewicz multiplier theorem. Recall that the single-variable version of this theorem can be formulated as follows:

Theorem 1 [Marcinkiewicz multiplier theorem]: Let ${m}$ be a function on $\mathbb{R}$ such that

1. $m \in L^\infty$
2. for every Littlewood-Paley dyadic interval $L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k]$ with $k \in \mathbb{Z}$

$\displaystyle \|m\|_{V(L)} \leq C,$

where $\|m\|_{V(L)}$ denotes the total variation of ${m}$ over the interval $L$.

Then for any ${1 < p < \infty}$ the multiplier ${T_m}$ defined by $\widehat{T_m f} = m \widehat{f}$ for functions $f \in L^2(\mathbb{R})$ extends to an $L^p \to L^p$ bounded operator,

$\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.$

You should also recall that the total variation $V(I)$ above is defined as

$\displaystyle \sup_{N}\sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \sum_{j=1}^{N} |m(t_j) - m(t_{j-1})|,$

and if ${m}$ is absolutely continuous then ${m'}$ exists as a measurable function and the total variation over the interval $I$ is given equivalently by $\int_{I} |m'(\xi)|d\xi$. We have seen that the “dyadic total variation condition” 2.) above is to be seen as a generalisation of the pointwise condition $|m'(\xi)|\lesssim |\xi|^{-1}$, which in dimension 1 happens to coincide with the classical differential Hörmander condition (in higher dimensions the pointwise Marcinkiewicz conditions are of product type, while the pointwise Hörmander(-Mikhlin) conditions are of radial type; see the relevant post). Thus the Marcinkiewicz multiplier theorem in dimension 1 can deal with multipliers whose symbol is somewhat rougher than differentiable. It is interesting to wonder how much rougher the symbols can get while still preserving their $L^p$ mapping properties (or maybe giving up some of the range; recall though that the range of boundedness for a multiplier must be symmetric around 2, since the adjoint of $T_m$ is the multiplier $T_{\overline{m}}$, which has the same boundedness properties).
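For instance, here is the one-line computation showing that the pointwise condition implies the dyadic total variation condition: if $m$ is absolutely continuous and $|m'(\xi)| \leq A |\xi|^{-1}$, then for any Littlewood-Paley dyadic interval $L = [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k]$ we have

$\displaystyle \|m\|_{V(L)} = \int_L |m'(\xi)| \, d\xi \leq 2 A \int_{2^k}^{2^{k+1}} \frac{d\xi}{\xi} = (2\log 2)\, A,$

a bound independent of $k$, so that condition 2.) holds with $C = (2 \log 2) A$.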

Coifman, Rubio de Francia and Semmes came up with a very interesting answer to this question. They generalised the Marcinkiewicz multiplier theorem (in dimension 1) to multipliers that have bounded ${q}$-variation with $q > 1$. Let us define this quantity rigorously.

Definition: Let $q \geq 1$ and let $I$ be an interval. Given a function $f : \mathbb{R} \to \mathbb{R}$, its ${q}$-variation over the interval ${I}$ is

$\displaystyle \|f\|_{V_q(I)} := \sup_{N} \sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \Big(\sum_{j=1}^{N} |f(t_j) - f(t_{j-1})|^q\Big)^{1/q}.$

Notice that, with respect to the notation above, we have $\|m\|_{V(I)} = \|m\|_{V_1(I)}$. From the fact that $\|\cdot\|_{\ell^q} \leq \|\cdot \|_{\ell^p}$ when $p \leq q$ we see that we always have $\|f\|_{V_q (I)} \leq \|f\|_{V_p(I)}$, and therefore the higher the ${q}$ the less stringent the condition of having bounded ${q}$-variation becomes (this is linked to the Hölder regularity of the function getting worse). In particular, if we wanted to weaken hypothesis 2.) in the Marcinkiewicz multiplier theorem above, we could simply replace it with the condition that for any Littlewood-Paley dyadic interval $L$ we have instead $\|m\|_{V_q(L)} \leq C$. This is indeed what Coifman, Rubio de Francia and Semmes did, and they were able to show the following:

Theorem 2 [Coifman-Rubio de Francia-Semmes, ’88]: Let $q\geq 1$ and let ${m}$ be a function on $\mathbb{R}$ such that

1. $m \in L^\infty$
2. for every Littlewood-Paley dyadic interval $L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k]$ with $k \in \mathbb{Z}$

$\displaystyle \|m\|_{V_q(L)} \leq C.$

Then for any ${1 < p < \infty}$ such that ${\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }$ the multiplier ${T_m}$ defined by $\widehat{T_m f} = m \widehat{f}$ extends to an $L^p \to L^p$ bounded operator,

$\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.$

The statement is essentially the same as before, except that now we are imposing control of the ${q}$-variation instead, and as a consequence we have the restriction that the Lebesgue exponent ${p}$ satisfy ${\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }$. Taking a closer look at this condition, we see that when the variation parameter is $1 \leq q \leq 2$ the condition is vacuous, that is, there is no restriction on the range of boundedness of $T_m$: it is still the full range $1 < p < \infty$. As ${q}$ grows larger and larger, the range of boundedness shrinks around the exponent $p=2$ (for which the multiplier is always bounded, by Plancherel). This is a very interesting behaviour, which points to the fact that there is a certain dichotomy between variation in the range below 2 and the range above 2, with $2$-variation being the critical case. This is not an isolated case: for example, the Variation Norm Carleson theorem is false for ${q}$-variation with ${q \leq 2}$; similarly, the Lépingle inequality is false for 2-variation and below (and this is related to the properties of Brownian motion).
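As an aside, the $q$-variation of the definition above is perfectly computable over a finite set of sample points: the supremum is then over subsequences, and can be found by dynamic programming. The following sketch (all names mine) computes it for $\sin$ on $[0, 2\pi]$, where the $1$-variation is $4$ (the total variation) while the $2$-variation is $(1^2 + 2^2 + 1^2)^{1/2} = \sqrt{6}$, the optimal partition passing through the extrema; it also illustrates the monotonicity $\|f\|_{V_q} \leq \|f\|_{V_p}$ for $p \leq q$.

```python
import math

def q_variation(samples, q):
    # sup over increasing subsequences of the sample points of
    # (sum_j |f(t_j) - f(t_{j-1})|^q)^{1/q}, via O(n^2) dynamic programming:
    # best[i] = largest q-power sum over subsequences ending at index i
    n = len(samples)
    best = [0.0] * n
    for i in range(n):
        for j in range(i):
            cand = best[j] + abs(samples[i] - samples[j]) ** q
            if cand > best[i]:
                best[i] = cand
    return max(best) ** (1.0 / q)

# sample sin on [0, 2*pi]; 400 steps hits the extrema pi/2 and 3*pi/2 exactly
ts = [2 * math.pi * k / 400 for k in range(401)]
fs = [math.sin(t) for t in ts]
```

For $q > 1$ splitting an increment $a + b$ into $a^q + b^q$ only loses mass, which is why the optimal subsequence uses only the extrema.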

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part II

This is the second and final part of an entry dedicated to a very interesting and inventive trick due to Bourgain. In part I we saw a lemma on maximal Fourier projections due to Bourgain, together with the context it arises from (the study of pointwise ergodic theorems for polynomial sequences); we also saw a baby version of the idea to come, that we used to prove the Rademacher-Menshov theorem (recall that the idea was to represent the indices in the supremum in their binary positional notation form and to rearrange the supremum accordingly). Today we finally get to see Bourgain’s trick.

Before we start, recall the statement of Bourgain’s lemma:

Lemma 1 [Bourgain]: Let $K$ be an integer and let $\Lambda = \{\lambda_1, \ldots, \lambda_K \}$ be a set of ${K}$ distinct frequencies. Define the maximal frequency projections

$\displaystyle \mathcal{B}_\Lambda f(x) := \sup_{j} \Big|\sum_{k=1}^{K} (\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}\Big|,$

where the supremum is restricted to those ${j \geq j_0}$ with $j_0 = j_0(\Lambda)$ being the smallest integer such that $2^{-j_0} \leq \frac{1}{2}\min \{ |\lambda_k - \lambda_{k'}| : 1\leq k\neq k'\leq K \}$.
Then

$\displaystyle \|\mathcal{B}_\Lambda f\|_{L^2} \lesssim (\log \#\Lambda)^2 \|f\|_{L^2}.$

Here we are using the notation $(\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}$ in the statement in place of the expanded formula $\int_{|\xi - \lambda_k| < 2^{-j}} \widehat{f}(\xi) e^{2\pi i \xi x} d\xi$. Observe that by the definition of $j_0$ we have that the intervals $[\lambda_k - 2^{-j_0}, \lambda_k + 2^{-j_0}]$ are disjoint (and $j_0$ is precisely minimal with respect to this condition: for $j < j_0$ some of the intervals would overlap).
We will need to do some reductions before we can get to the point where the trick makes its appearance. These reductions are the subject of the next section.

3. Initial reductions

A first important reduction is that we can safely replace the characteristic functions $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}$ by smooth bump functions with comparable support. Indeed, this is the result of a very standard square-function argument which was already essentially presented in Exercise 22 of the 3rd post on basic Littlewood-Paley theory. Briefly then, let $\varphi$ be a Schwartz function such that $\widehat{\varphi}$ is a smooth bump function compactly supported in the interval $[-1,1]$ and such that $\widehat{\varphi} \equiv 1$ on the interval $[-1/2, 1/2]$. Let $\varphi_j (x) := \frac{1}{2^j} \varphi \Big(\frac{x}{2^j}\Big)$ (so that $\widehat{\varphi_j}(\xi) = \widehat{\varphi}(2^j \xi)$) and let for convenience $\theta_j$ denote the difference $\theta_j := \mathbf{1}_{[-2^{-j}, 2^{-j}]} - \widehat{\varphi_j}$. We have that the difference

$\displaystyle \sup_{j\geq j_0(\Lambda)} \Big|\sum_{k=1}^{K} ((\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} - \widehat{\varphi_j}(\cdot - \lambda_k)) \widehat{f})^{\vee}\Big|$

is an $L^2$ bounded operator with norm $O(1)$ (that is, independent of $K$). Indeed, observe that $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}(\xi) - \widehat{\varphi_j}(\xi - \lambda_k) = \theta_j (\xi - \lambda_k)$, and bounding the supremum by the $\ell^2$ sum we have that the $L^2$ norm (squared) of the operator above is bounded by

$\displaystyle \sum_{j \geq j_0(\Lambda)} \Big\|\sum_{k=1}^{K} (\theta_j(\cdot - \lambda_k)\widehat{f})^{\vee}\Big\|_{L^2}^2,$

where the summation in ${j}$ is restricted in the same way as the supremum is in the lemma (that is, the intervals $[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]$ must be pairwise disjoint). By an application of Plancherel we see that the above is equal to

$\displaystyle \sum_{k=1}^{K} \Big\| \widehat{f}(\xi) \Big[\sum_{j \geq j_0} \theta_j(\xi - \lambda_k) \Big]\Big\|_{L^2}^2;$

but notice that the functions $\theta_j$ have supports disjoint in ${j}$, and therefore the multiplier satisfies $\sum_{j\geq j_0} \theta_j(\xi - \lambda_k) \lesssim 1$ in a neighbourhood of $\lambda_k$, and vanishes outside such neighbourhood. A final application of Plancherel allows us to conclude that the above is bounded by $\lesssim \|f\|_{L^2}^2$ by orthogonality (these neighbourhoods being all disjoint as well).
By the triangle inequality, we see therefore that in order to prove Lemma 1 it suffices to prove that the operator

$\displaystyle \sup_{j} \Big|\sum_{k=1}^{K} (\widehat{\varphi_j}(\cdot - \lambda_k) \widehat{f})^{\vee}\Big|$

is $L^2$ bounded with norm at most $O((\log \#\Lambda)^2)$.
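The disjointness of the supports of the $\theta_j$, which drove the square-function argument above, is easy to check concretely. In the sketch below (my own stand-in construction) $\widehat{\varphi}$ is modelled by a piecewise-polynomial function that is $1$ on $[-1/2,1/2]$ and supported in $[-1,1]$; smoothness beyond $C^1$ is irrelevant for the support count.

```python
def phi_hat(xi):
    # stand-in for the bump: 1 on [-1/2, 1/2], 0 outside [-1, 1], with a
    # C^1 "smoothstep" transition in between (for illustration only)
    a = abs(xi)
    if a <= 0.5:
        return 1.0
    if a >= 1.0:
        return 0.0
    t = (a - 0.5) / 0.5
    return 1.0 - (3 * t * t - 2 * t ** 3)

def theta(j, xi):
    # theta_j(xi) = 1_{[-2^{-j}, 2^{-j}]}(xi) - phi_hat(2^j xi)
    ind = 1.0 if abs(xi) <= 2.0 ** (-j) else 0.0
    return ind - phi_hat(2.0 ** j * xi)

js = range(-10, 11)
grid = [0.001 * m for m in range(1, 4000)]  # frequencies xi in (0, 4)
# each xi lies in the support of at most one theta_j (the dyadic annulus
# 2^{-j-1} < |xi| <= 2^{-j}), and 0 <= sum_j theta_j <= 1
max_live = max(sum(1 for j in js if abs(theta(j, xi)) > 1e-12) for xi in grid)
max_sum = max(abs(sum(theta(j, xi) for j in js)) for xi in grid)
```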

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part I

If you are reading this blog, you have probably heard that Jean Bourgain – one of the greatest analysts of the last century – unfortunately passed away last December. It is fair to say that the progress of analysis will slow down significantly without him. I am not in any position to give a eulogy to this giant, but I thought it would be nice to commemorate him by talking occasionally on this blog about some of his many profound papers and his crazily inventive tricks. That’s something everybody agrees on: Bourgain was able to come up with a variety of insane tricks in a way that no one else was. The man was a problem solver and an overall magician: the first time you see one of his tricks, you don’t believe what’s happening in front of you. And that’s just the tricks part!

In this two-part post I am going to talk about a certain trick that, loosely speaking, involves representing points of an arbitrary set in a fashion similar to how integers are represented, say, in binary. I don’t know if this trick came straight out of Bourgain’s magical top hat or if he learned it from somewhere else; I haven’t seen it used elsewhere except in papers that cite Bourgain himself, so I’m inclined to attribute it to him – but please, correct me if I’m wrong.
Today we introduce the context for the trick (a famous lemma by Bourgain for maximal frequency projections on the real line) and present a toy version of the idea in a proof of the Rademacher-Menshov theorem. In the second part we will finally see the trick.

1. Ergodic averages along arithmetic sequences
First, some context. The trick I am going to talk about can be found in one of Bourgain’s major papers, among those cited in the motivation for his Fields Medal. I am talking about the paper on a.e. convergence of ergodic averages along arithmetic sequences. The main result of that paper is stated as follows: let $(X,T,\mu)$ be an ergodic system, that is

1. $\mu$ is a probability measure on $X$;
2. $T: X \to X$ satisfies $\mu(T^{-1} A) = \mu(A)$ for all $\mu$-measurable sets $A$ (this is the invariance condition);
3. $T^{-1} A = A$ implies $\mu(A) = 0 \text{ or } 1$ (this is the ergodicity condition).

Then the result is

Theorem: [Bourgain, ’89] Let $(X,T,\mu)$ be an ergodic system and let $p(n)$ be a polynomial with integer coefficients. If $f \in L^q(d\mu)$ with $q > 1$, then the averages $A_N f(x) := \frac{1}{N}\sum_{n=1}^{N}f(T^{p(n)} x)$ converge $\mu$-a.e. as $N \to \infty$; moreover, if ${T}$ is weakly mixing, we have more precisely

$\displaystyle \lim_{N \to \infty} A_N f(x) = \int_X f d\mu$

for $\mu$-a.e. ${x}$.

For comparison, the more classical pointwise ergodic theorem of Birkhoff states the same for the case $p(n) = n$ and $f \in L^1(d\mu)$ (notice this is the largest of the $L^p(X,d\mu)$ spaces because $\mu$ is finite), in which case the theorem is deduced as a consequence of the $L^1 \to L^{1,\infty}$ boundedness of the Hardy-Littlewood maximal function. The dense class to appeal to is roughly speaking $L^2(X,d\mu)$, thanks to the ergodic theorem of von Neumann, which states that $A_N f$ converges in $L^2$ norm for $f \in L^2(X,d\mu)$. However, the details are non-trivial. Heuristically, these ergodic theorems embody a quantitative version of the idea that the orbits $\{T^n x\}_{n\in\mathbb{N}}$ fill up the entire space ${X}$ uniformly. I don’t want to enter into details because here I am just providing some context for those interested; there are plenty of introductions to ergodic theory where these results are covered in depth.
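A toy numerical illustration (not from the paper; the setup and names are mine): take the circle rotation $Tx = x + \alpha \mod 1$ with $\alpha$ irrational, which is ergodic with respect to Lebesgue measure (though not weakly mixing), and $f(x) = \cos(2\pi x)$, so that $\int f \, d\mu = 0$. For this particular system the convergence of the averages along $p(n) = n$ and $p(n) = n^2$ can be traced back to Weyl's equidistribution theorem, and is visible numerically:

```python
import math

def ergodic_average(x, N, alpha, p):
    # A_N f(x) = (1/N) sum_{n=1}^{N} f(T^{p(n)} x) for the rotation
    # T x = x + alpha mod 1 and f(y) = cos(2 pi y)
    s = 0.0
    for n in range(1, N + 1):
        s += math.cos(2 * math.pi * ((x + p(n) * alpha) % 1.0))
    return s / N

alpha = math.sqrt(2) - 1  # a badly approximable irrational
lin = ergodic_average(0.3, 100_000, alpha, lambda n: n)       # Birkhoff averages
quad = ergodic_average(0.3, 100_000, alpha, lambda n: n * n)  # averages along the squares
```

Both averages approach $\int f \, d\mu = 0$, the quadratic one more slowly: its error is governed by Weyl sums, of size roughly $\sqrt{N}$ for such $\alpha$.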

# Kovač’s solution of the maximal Fourier restriction problem

About 2 years ago, Müller, Ricci and Wright published a paper that opened a new line of investigation in the field of Fourier restriction: that is, the study of the pointwise meaning of the Fourier restriction operators. Here is an account of a recent contribution to this problem that largely sorts it out.

1. Maximal Fourier Restriction
Recall that, given a smooth submanifold $\Sigma$ of $\mathbb{R}^d$ with surface measure $d\sigma$, the restriction operator ${R}$ is defined (initially) for Schwartz functions as

$\displaystyle f \mapsto Rf:= \widehat{f}\Big|_{\Sigma};$

it is only after having proven an a priori estimate such as $\|Rf\|_{L^q(\Sigma,d\sigma)} \lesssim \|f\|_{L^p(\mathbb{R}^d)}$ that we can extend ${R}$ to an operator over the whole of $L^p(\mathbb{R}^d)$, by density of the Schwartz functions. However, it is no longer clear what the relationship is between this operator-theoretic extension and the original operator, which had a clear pointwise definition. In particular, a non-trivial question to ask is whether for $d\sigma$-a.e. point $\xi \in \Sigma$ we have

$\displaystyle \lim_{r \to 0} \frac{1}{|B(0,r)|} \int_{\eta \in B(0,r)} \widehat{f}(\xi - \eta) d\eta = \widehat{f}(\xi), \ \ \ \ \ (1)$

where $B(0,r)$ is the ball of radius ${r}$ and center ${0}$. Observe that the Lebesgue differentiation theorem already tells us that for a.e. element of $\mathbb{R}^d$ in the Lebesgue sense the above holds; but the submanifold $\Sigma$ has Lebesgue measure zero, and therefore the differentiation theorem cannot give us any information. In this sense, the question above is about the structure of the set of the Lebesgue points of $\widehat{f}$ and can be reformulated as:

Q: can the complement of the set of Lebesgue points of $\widehat{f}$ contain a copy of the manifold $\Sigma$?

# Basic Littlewood-Paley theory III: applications

This is the last part of a 3-part series on the basics of Littlewood-Paley theory. Today we discuss a couple of applications, namely the Marcinkiewicz multiplier theorem and the boundedness of the spherical maximal function (the latter being an application of frequency decompositions in general, and not so much of square functions – though one appears, but only for $L^2$ estimates, where one does not need the sophistication of Littlewood-Paley theory).
Part I: frequency projections
Part II: square functions

7. Applications of Littlewood-Paley theory

In this section we will present two applications of the Littlewood-Paley theory developed so far. You can find further applications in the exercises (see particularly Exercise 22 and Exercise 23).

7.1. Marcinkiewicz multipliers

Given an ${L^\infty (\mathbb{R}^d)}$ function ${m}$, one can define the operator ${T_m}$ given by

$\displaystyle \widehat{T_m f}(\xi) := m(\xi) \widehat{f}(\xi)$

for all ${f \in L^2(\mathbb{R}^d)}$. The operator ${T_m}$ is called a multiplier and the function ${m}$ is called the symbol of the multiplier. Since ${m \in L^\infty}$, Plancherel’s theorem shows that ${T_m}$ is a linear operator bounded in ${L^2}$; its definition can then be extended to ${L^2 \cap L^p}$ functions (which are dense in ${L^p}$). A natural question to ask is: for which values of ${p}$ in ${1 \leq p \leq \infty}$ is the operator ${T_m}$ an ${L^p \rightarrow L^p}$ bounded operator? When ${T_m}$ is bounded in a certain ${L^p}$ space, we say that it is an ${L^p}$ multiplier.
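Here is a minimal discrete caricature of a multiplier, sketched with a naive $O(n^2)$ DFT (the names are mine, and in practice one would use an FFT): the symbol acts by pointwise multiplication on the frequency side, so a cut-off symbol acts as a low-pass filter.

```python
import cmath, math

def dft(f):
    # naive discrete Fourier transform (O(n^2); fine for a demo)
    n = len(f)
    return [sum(f[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(F):
    n = len(F)
    return [sum(F[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def apply_multiplier(f, m):
    # discrete caricature of T_m: multiply the DFT of f by the symbol m
    F = dft(f)
    n = len(F)
    # DFT index k represents frequency k for k <= n/2, and k - n otherwise
    G = [m(k if k <= n // 2 else k - n) * F[k] for k in range(n)]
    return idft(G)

n = 128
f = [math.cos(2 * math.pi * 3 * t / n) + math.cos(2 * math.pi * 40 * t / n)
     for t in range(n)]
# the sharp cut-off symbol m(k) = 1_{|k| <= 10} removes the frequency-40 wave
lowpass = apply_multiplier(f, lambda k: 1.0 if abs(k) <= 10 else 0.0)
```

The output is (up to rounding) exactly the frequency-3 wave, since the symbol is $1$ on the surviving frequencies and $0$ on the rest.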

The operator ${T_m}$ introduced in Section 1 of the first post in this series is an example of a multiplier, with symbol ${m(\xi,\tau) = \tau / (\tau - 2\pi i |\xi|^2)}$. It is the linear operator that satisfies the formal identity $T \circ (\partial_t - \Delta) = \partial_t$. We have seen that it cannot be a (euclidean) Calderón-Zygmund operator, and thus in particular it cannot be a Hörmander-Mikhlin multiplier. This can be seen more directly by the fact that any Hörmander-Mikhlin condition of the form ${|\partial^{\alpha}m(\xi,\tau)| \lesssim_\alpha |(\xi,\tau)|^{-|\alpha|} = (|\xi|^2 + \tau^2)^{-|\alpha|/2}}$ is clearly incompatible with the rescaling invariance of the symbol ${m}$, which satisfies ${m(\lambda \xi, \lambda^2 \tau) = m(\xi,\tau)}$ for any ${\lambda \neq 0}$. However, the derivatives of ${m}$ actually satisfy some other superficially similar conditions that are of interest to us. Indeed, letting ${(\xi,\tau) \in \mathbb{R}^2}$ for simplicity, we can see for example that ${\partial_\xi \partial_\tau m(\xi, \tau) = \lambda^3 \partial_\xi \partial_\tau m(\lambda\xi, \lambda^2\tau)}$. When ${|\tau|\lesssim |\xi|^2}$ we can therefore argue that ${|\partial_\xi \partial_\tau m(\xi, \tau)| = |\xi|^{-3} |\partial_\xi \partial_\tau m(1, \tau |\xi|^{-2})| \lesssim |\xi|^{-1} |\tau|^{-1} \sup_{|\eta|\lesssim 1} |\partial_\xi \partial_\tau m(1, \eta)|}$, and similarly when ${|\tau|\gtrsim |\xi|^2}$; this shows that for any ${(\xi, \tau)}$ with ${\xi,\tau \neq 0}$ one has

$\displaystyle |\partial_\xi \partial_\tau m(\xi, \tau)| \lesssim |\xi|^{-1} |\tau|^{-1}.$

This condition is comparable with the corresponding Hörmander-Mikhlin condition only when ${|\xi| \sim |\tau|}$, and is vastly different otherwise, being of product type (also notice that the inequality above is compatible with the rescaling invariance of ${m}$, as it should be).
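The product-type bound just derived can be double-checked numerically by finite differences (an illustrative sketch with my own names; in fact, for this symbol a direct computation gives $\sup_{\xi,\tau} |\xi| |\tau| |\partial_\xi \partial_\tau m(\xi,\tau)| = 1$, attained where $\tau = 2\pi\xi^2$):

```python
import math

TWO_PI = 2 * math.pi

def m(xi, tau):
    # the symbol m(xi, tau) = tau / (tau - 2 pi i xi^2) from the example above
    return tau / (tau - 1j * TWO_PI * xi * xi)

def mixed_partial(xi, tau):
    # central finite-difference approximation of d^2 m / (dxi dtau)
    hx, ht = 1e-5 * abs(xi), 1e-5 * abs(tau)
    return (m(xi + hx, tau + ht) - m(xi + hx, tau - ht)
            - m(xi - hx, tau + ht) + m(xi - hx, tau - ht)) / (4 * hx * ht)

# sample several decades in each variable and record the worst case of
# |xi| |tau| |d_xi d_tau m|, which should stay bounded by 1
worst = 0.0
for xi in [0.05, 0.3, 1.0, 7.0, 40.0]:
    for tau in [0.01, 0.5, 2.0, 30.0, 1000.0]:
        worst = max(worst, abs(xi * tau * mixed_partial(xi, tau)))
```

Note that the sampled points cover both regimes $|\tau| \lesssim |\xi|^2$ and $|\tau| \gtrsim |\xi|^2$, and the bound holds uniformly across them, as the argument in the text predicts.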

# Basic Littlewood-Paley theory II: square functions

This is the second part of the series on basic Littlewood-Paley theory, which has been extracted from some lecture notes I wrote for a masterclass. In this part we will prove the Littlewood-Paley inequalities, namely that for any ${1 < p < \infty}$ it holds that

$\displaystyle \|f\|_{L^p (\mathbb{R})} \sim_p \Big\|\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}\Big\|_{L^p (\mathbb{R})}. \ \ \ \ \ (\dagger)$

This time there are also plenty more exercises, some of which I think are fairly interesting (one of them is a theorem of Rudin in disguise).
Part I: frequency projections.

4. Smooth square function

In this subsection we will consider a variant of the square function appearing at the right-hand side of ($\dagger$) where we replace the frequency projections ${\Delta_j}$ by better behaved ones.

Let ${\psi}$ denote a smooth function with the properties that ${\psi}$ is compactly supported in the intervals ${[-4,-1/2] \cup [1/2, 4]}$ and is identically equal to ${1}$ on the intervals ${[-2,-1] \cup [1,2]}$. We define the smooth frequency projections ${\widetilde{\Delta}_j}$ by stipulating

$\displaystyle \widehat{\widetilde{\Delta}_j f}(\xi) := \psi(2^{-j} \xi) \widehat{f}(\xi);$

notice that the function ${\psi(2^{-j} \xi)}$ is supported in ${[-2^{j+2},-2^{j-1}] \cup [2^{j-1}, 2^{j+2}]}$ and identically ${1}$ in ${[-2^{j+1},-2^{j}] \cup [2^{j}, 2^{j+1}]}$. The reason why such projections are better behaved resides in the fact that the functions ${\psi(2^{-j}\xi)}$ are now smooth, unlike the characteristic functions ${\mathbf{1}_{[2^j,2^{j+1}]}}$. Indeed, they are actually Schwartz functions and you can see by the Fourier inversion formula that ${\widetilde{\Delta}_j f = f \ast (2^{j} \widehat{\psi}(2^{j}\cdot))}$; the convolution kernel ${2^{j} \widehat{\psi}(2^{j}\cdot)}$ is uniformly in ${L^1}$ and therefore the operator is trivially ${L^p \rightarrow L^p}$ bounded for any ${1 \leq p \leq \infty}$ by Young’s inequality, without having to resort to the boundedness of the Hilbert transform.
We will show that the following smooth analogue of (one half of) ($\dagger$) is true (you can study the other half in Exercise 6).

Proposition 3 Let ${\widetilde{S}}$ denote the square function

$\displaystyle \widetilde{S}f := \Big(\sum_{j \in \mathbb{Z}} \big|\widetilde{\Delta}_j f \big|^2\Big)^{1/2}.$

Then for any ${1 < p < \infty}$ we have that the inequality

$\displaystyle \big\|\widetilde{S}f\big\|_{L^p(\mathbb{R})} \lesssim_p \|f\|_{L^p(\mathbb{R})} \ \ \ \ \ (1)$

holds for any ${f \in L^p(\mathbb{R})}$.

We will give two proofs of this fact, to illustrate different techniques. We remark that the boundedness will depend on the smoothness and the support properties of ${\psi}$ only, and as such extends to a larger class of square functions.
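The support properties just used can be made concrete. In the sketch below (my own stand-in for $\psi$, with $C^1$ transitions, which is enough for a numerical check) one verifies the two facts that make the family $\{\psi(2^{-j}\cdot)\}_j$ a good dyadic partition: each frequency $\xi \neq 0$ lies in the support of at most $3$ of the functions, and $\sum_j |\psi(2^{-j}\xi)|^2 \geq 1$ always.

```python
def psi_standin(xi):
    # stand-in with the stated properties: supported in [-4,-1/2] U [1/2,4]
    # and identically 1 on [-2,-1] U [1,2]; C^1 smoothstep transitions
    def rise(t):  # smooth increase from 0 to 1 as t goes from 0 to 1
        t = min(max(t, 0.0), 1.0)
        return 3 * t * t - 2 * t ** 3
    a = abs(xi)
    if a <= 0.5 or a >= 4.0:
        return 0.0
    if a < 1.0:
        return rise((a - 0.5) / 0.5)
    if a <= 2.0:
        return 1.0
    return rise((4.0 - a) / 2.0)

js = range(-12, 13)
grid = [0.01 * m for m in range(1, 1000)]  # frequencies xi in (0, 10)
# at most 3 dyadic dilates are supported at any xi, and since every xi > 0
# lies in some [2^j, 2^{j+1}] the squares always sum to at least 1
overlap = max(sum(1 for j in js if psi_standin(2.0 ** (-j) * xi) > 0) for xi in grid)
lower = min(sum(psi_standin(2.0 ** (-j) * xi) ** 2 for j in js) for xi in grid)
```

The bounded overlap is what makes $\widetilde{S}$ comparable to $f$ in $L^2$, and the lower bound is what powers the reverse square function estimate of Exercise 6.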

# Quadratically modulated Bilinear Hilbert transform

Here is a simple but surprising fact.

Recall that the Hilbert transform $Hf(x) := p.v. \int f(x-t) \frac{dt}{t}$ is $L^p \to L^p$ bounded for all ${1 < p < \infty}$ (and even $L^1 \to L^{1,\infty}$ bounded, of course). The quadratically modulated Hilbert transform is the operator

$\displaystyle H_q f(x) := p.v. \int f(x-t) e^{-i t^2} \frac{dt}{t};$

this operator is also known to be $L^p \to L^p$ bounded for all ${1 < p < \infty}$, but the proof is not a corollary of that for $H$: it is a different beast, requiring oscillatory integral techniques and almost orthogonality, and is due to Ricci and Stein (interestingly though, $H_q$ is also $L^1 \to L^{1,\infty}$ bounded, and this can indeed be obtained by a clever adaptation of Calderón-Zygmund theory due to Chanillo and Christ).

The bilinear Hilbert transform instead is the operator

$\displaystyle BHT(f,g)(x) := p.v. \int f(x-t)g(x+t)\frac{dt}{t}$

and it is known, thanks to foundational work of Lacey and Thiele, to be $L^p \times L^q \to L^r$ bounded at least in the range given by $p,q>1, r>2/3$ with the exponents satisfying the Hölder condition $1/p + 1/q = 1/r$ (this condition is strictly necessary to have boundedness, due to the scaling invariance of the operator). This operator has an interesting modulation invariance (corresponding to the fact that its bilinear multiplier is $\mathrm{sgn}(\xi - \eta)$, which is invariant with respect to translations along the diagonal): indeed, if $\mathrm{Mod}_{\theta}$ denotes the modulation operator $\mathrm{Mod}_{\theta} f(x) := e^{- i \theta x} f(x)$, we have

$\displaystyle BHT(\mathrm{Mod}_{\theta}f,\mathrm{Mod}_{\theta}g) = \mathrm{Mod}_{2 \theta} BHT(f,g);$

it is this fact that suggests one should use time-frequency analysis to deal with this operator.
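Incidentally, the necessity of the Hölder condition from the scaling invariance mentioned above can be seen in one line. Writing $f_\lambda(x) := f(\lambda x)$ for $\lambda > 0$, the change of variables $t \mapsto t/\lambda$ in the integral gives

$\displaystyle BHT(f_\lambda, g_\lambda)(x) = BHT(f,g)(\lambda x),$

and since $\|f_\lambda\|_{L^p} = \lambda^{-1/p}\|f\|_{L^p}$, an estimate $\|BHT(f,g)\|_{L^r} \lesssim \|f\|_{L^p}\|g\|_{L^q}$ applied to the rescaled functions yields $\lambda^{-1/r} \lesssim \lambda^{-1/p-1/q}$ for every $\lambda > 0$, which forces $\frac{1}{p} + \frac{1}{q} = \frac{1}{r}$.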
Now, analogously to the linear case, one can consider the quadratically modulated bilinear Hilbert transform, given simply by

$\displaystyle BHT_q(f,g)(x) := p.v. \int f(x-t)g(x+t) e^{-i t^2} \frac{dt}{t}.$

One might be tempted to think, by analogy, that this operator is harder to bound than $BHT$ – at least, I would naively think so at first sight. However, due to the particular structure of the bilinear Hilbert transform, the boundedness of $BHT_q$ is a trivial corollary of that of $BHT$! Indeed, this is due to the trivial polynomial identity

$\displaystyle (x+t)^2 + (x-t)^2 = 2x^2 + 2t^2;$

thus if $\mathrm{QMod}_{\theta}$ denotes the quadratic modulation operator $\mathrm{QMod}_{\theta}f(x) = e^{-i \theta x^2} f(x)$ we have

\displaystyle \begin{aligned} BHT_q(f,g)(x) = & \int f(x-t)g(x+t) e^{-it^2} \frac{dt}{t} \\ = & \int f(x-t)g(x+t) e^{ix^2}e^{-i(x+t)^2/2}e^{-i(x-t)^2/2} \frac{dt}{t} \\ = & e^{ix^2}\int e^{-i(x-t)^2/2}f(x-t)e^{-i(x+t)^2/2}g(x+t) \frac{dt}{t} \\ = & \big[ \mathrm{QMod}_{-1} BHT( \mathrm{QMod}_{1/2} f, \mathrm{QMod}_{1/2} g )\big](x). \end{aligned}
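The whole trick rests on the exponential identity used in the second line above, namely $e^{-it^2} = e^{ix^2} e^{-i(x+t)^2/2} e^{-i(x-t)^2/2}$, which one can even double-check mechanically (a throwaway sketch):

```python
import cmath, random

# verify e^{-i t^2} = e^{i x^2} e^{-i (x+t)^2 / 2} e^{-i (x-t)^2 / 2}
# at randomly sampled points
random.seed(0)
max_err = 0.0
for _ in range(1000):
    x = random.uniform(-3.0, 3.0)
    t = random.uniform(-3.0, 3.0)
    lhs = cmath.exp(-1j * t * t)
    rhs = (cmath.exp(1j * x * x)
           * cmath.exp(-1j * (x + t) ** 2 / 2)
           * cmath.exp(-1j * (x - t) ** 2 / 2))
    max_err = max(max_err, abs(lhs - rhs))
```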

Of course this trick is limited to quadratic modulations, so for example already the cubic modulation of $BHT$,

$\displaystyle p.v. \int f(x-t)g(x+t) e^{-i t^3} \frac{dt}{t},$

is non-trivial to bound (but the boundedness of the cubic modulation of the trilinear Hilbert transform would again be a trivial consequence of the boundedness of the trilinear Hilbert transform itself… too bad we don’t know if it is bounded at all!). Polynomial modulations of bilinear singular integrals (thus a bilinear analogue of the Ricci-Stein work) have been shown to be bounded by Christ, Li, Tao and Thiele in “On multilinear oscillatory integrals, nonsingular and singular”.
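
To see why the trick breaks down for cubic modulations, note that the analogous polynomial identity fails: for cubes one has

$\displaystyle (x+t)^3 + (x-t)^3 = 2x^3 + 6xt^2,$

and indeed no constants $\alpha, \beta, \gamma$ can satisfy $t^3 = \alpha (x+t)^3 + \beta (x-t)^3 + \gamma x^3$, since matching the $t^3$ coefficients forces $\alpha - \beta = 1$ while matching the $x^2 t$ coefficients forces $\alpha - \beta = 0$. For the trilinear Hilbert transform (say in the form $p.v. \int f(x+t)g(x+2t)h(x+3t)\frac{dt}{t}$) the obstruction disappears: a Vandermonde computation gives $t^3 = -\frac{1}{6}x^3 + \frac{1}{2}(x+t)^3 - \frac{1}{2}(x+2t)^3 + \frac{1}{6}(x+3t)^3$, so a cubic modulation in $t$ can be absorbed into cubic modulations of the three input functions and of the output.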

UPDATE: Interesting synchronicity just happened: today Dong, Maldague and Villano have uploaded to the arXiv their paper “Special cases of power decay in multilinear oscillatory integrals”, in which they extend the work of Christ, Li, Tao and Thiele to some special cases that were left out. Maybe I should check my email for the arXiv digest before posting next time.

# Basic Littlewood-Paley theory I: frequency projections

I have written some notes on Littlewood-Paley theory for a masterclass, which I thought I would share here as well. This is the first part, covering some motivation, the case of a single frequency projection and its vector-valued generalisation. References I have used in preparing these notes include Stein’s “Singular integrals and differentiability properties of functions”, Duoandikoetxea’s “Fourier Analysis”, Grafakos’ “Classical Fourier Analysis” and, as usual, some material by Tao, both from his blog and from the notes for his courses. Prerequisites are some basic Fourier transform theory and the Calderón-Zygmund theory of Euclidean singular integrals with its vector-valued generalisation (to Hilbert spaces; we won’t need Banach spaces).

0. Introduction
Harmonic analysis makes fundamental use of divide-et-impera approaches. A particularly fruitful one is the decomposition of a function in terms of the frequencies that compose it, most prominently incarnated in the theory of the Fourier transform and Fourier series. In many applications, however, it is not necessary (or even useful) to resolve the function ${f}$ at the level of single frequencies: it suffices instead to understand how frequency components of widely different magnitudes interact with each other. One example of this is the (formal) decomposition of functions of ${\mathbb{R}}$ given by

$\displaystyle f = \sum_{j \in \mathbb{Z}} \Delta_j f,$

where ${\Delta_j f}$ denotes the operator

$\displaystyle \Delta_j f (x) := \int_{\{\xi \in \mathbb{R} : 2^j \leq |\xi| < 2^{j+1}\}} \widehat{f}(\xi) e^{2\pi i \xi \cdot x} d\xi,$

commonly referred to as a (dyadic) frequency projection. Thus ${\Delta_j f}$ represents the portion of ${f}$ with frequencies of magnitude ${\sim 2^j}$. The Fourier inversion formula can be used to justify the above decomposition if, for example, ${f \in L^2(\mathbb{R})}$. Heuristically, since any two ${\Delta_j f, \Delta_{k} f}$ oscillate at significantly different frequencies when ${|j-k|}$ is large, we would expect that for most ${x}$’s the different contributions to the sum cancel out more or less randomly; a probabilistic argument typical of random walks (see Exercise 1) leads to the conjecture that ${|f|}$ should behave “most of the time” like ${\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}}$ (the last expression is an example of a square function). While this is not true in a pointwise sense, we will see in these notes that the two are indeed interchangeable from the point of view of ${L^p}$-norms: more precisely, we will show that for any ${1 < p < \infty}$ it holds that

$\displaystyle \boxed{ \|f\|_{L^p (\mathbb{R})} \sim_p \Big\|\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}\Big\|_{L^p (\mathbb{R})}. }\ \ \ \ \ (\dagger)$

This is a result historically due to Littlewood and Paley, which explains the name given to the related theory. The ${p=2}$ case is immediate from Plancherel’s theorem, to which the statement is essentially equivalent; one can therefore interpret the above as a substitute for Plancherel’s theorem in ${L^p}$ spaces with ${p\neq 2}$.
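
The ${p=2}$ case lends itself to a quick numerical illustration. In the Python sketch below (the sampled function, grid and band range are arbitrary choices made for illustration), the projections ${\Delta_j}$ are implemented with the FFT; since the dyadic bands are disjoint and cover all nonzero grid frequencies, the squared ${L^2}$ norms of the pieces add up exactly to that of ${f}$, as Plancherel predicts.

```python
import numpy as np

N = 1024
dx = 40.0 / N
x = -20.0 + dx * np.arange(N)
f = np.exp(-x**2) * np.cos(7 * x)   # an arbitrary test function

fhat = np.fft.fft(f)
xi = np.fft.fftfreq(N, d=dx)        # frequency grid for this sampling
fhat[xi == 0] = 0.0                 # drop the zero frequency (a measure-zero
                                    # set in the continuum, so harmless)

# Dyadic frequency projections Delta_j f for bands 2^j <= |xi| < 2^{j+1};
# the range j = -6,...,3 covers every nonzero frequency on this grid.
pieces = []
for j in range(-6, 4):
    mask = (2.0**j <= np.abs(xi)) & (np.abs(xi) < 2.0**(j + 1))
    pieces.append(np.fft.ifft(fhat * mask))

# (i) the projections reassemble f (minus its mean, which we discarded)
recon = np.sum(pieces, axis=0)
assert np.allclose(recon, np.fft.ifft(fhat))

# (ii) Plancherel: the bands are disjoint, so the squared norms add up
lhs = np.sum(np.abs(np.fft.ifft(fhat))**2)
rhs = np.sum(np.abs(np.vstack(pieces))**2)
assert np.isclose(lhs, rhs)
```

Of course this only illustrates the trivial ${p=2}$ case; the content of ($\dagger$) is that a comparable statement survives, in the ${L^p}$ sense, for all ${1 < p < \infty}$.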

In developing a framework that will allow us to prove ($\dagger$), we will encounter some variants of the square function above, including ones with smoother frequency projections that are useful in a variety of contexts. We will moreover present some applications of the above fact and its variants. One of these applications will be a proof of the boundedness of the spherical maximal function ${\mathscr{M}_{\mathbb{S}^{d-1}}}$ (following almost verbatim the one on Tao’s blog).

Notation: We will use ${A \lesssim B}$ to denote the estimate ${A \leq C B}$ where ${C>0}$ is some absolute constant, and ${A\sim B}$ to denote the fact that ${A \lesssim B \lesssim A}$. If the constant ${C}$ depends on a list of parameters ${L}$ we will write ${A \lesssim_L B}$.