# Carbery's proof of the Stein-Tomas theorem

While writing the article on Bourgain’s proof of the spherical maximal function theorem, I suddenly recalled another interesting proof that uses a trick very similar to Bourgain’s, and apparently directly inspired by it. Recall that the “trick” consists of the following observation: if we restrict our inputs to characteristic functions, then we can split the operator in two, estimate each part in a different Lebesgue space, and finally combine the two estimates into an estimate in a single $L^p$ space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if $R$ denotes the Fourier restriction operator of the sphere $\mathbb{S}^{d-1}$ (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

$\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}$

(defined initially on Schwartz functions), then

Stein-Tomas theorem: $R$ satisfies the a priori inequality

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)$

for all exponents ${p}$ such that $1 \leq p \leq \frac{2(d+1)}{d+3}$ (and this is sharp, by the Knapp example).
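The sharpness claim can be double-checked by bookkeeping powers of $\delta$ in the Knapp example. The Python sketch below assumes the usual heuristic set-up: a modulated bump $f$ adapted to a $\delta^{-1} \times \cdots \times \delta^{-1} \times \delta^{-2}$ slab, so that $\widehat{f}$ has size about $\delta^{-(d+1)}$ on a cap of surface measure about $\delta^{d-1}$. The function names are made up for this illustration, and the code only tracks exponents; it does not prove anything.

```python
from fractions import Fraction


def knapp_exponents(d, p):
    """Powers of delta in the Knapp example (heuristic scaling, d >= 2).

    Assumed set-up: f is a modulated bump adapted to a slab of dimensions
    delta^-1 x ... x delta^-1 x delta^-2, so ||f||_p ~ delta^(-(d+1)/p), while
    f-hat has size ~ delta^-(d+1) on a cap of surface measure ~ delta^(d-1).
    Returns (power for ||Rf||_{L^2(sigma)}, power for ||f||_{L^p}).
    """
    lhs = -(d + 1) + Fraction(d - 1, 2)  # delta^-(d+1) * (delta^(d-1))^(1/2)
    rhs = -Fraction(d + 1) / p           # (delta^-(d+1))^(1/p)
    return lhs, rhs


def knapp_allows(d, p):
    """(1) can survive delta -> 0 only if delta^lhs <~ delta^rhs, i.e. lhs >= rhs."""
    lhs, rhs = knapp_exponents(d, p)
    return lhs >= rhs


def stein_tomas_exponent(d):
    """The endpoint exponent 2(d+1)/(d+3)."""
    return Fraction(2 * (d + 1), d + 3)


for d in range(2, 6):
    p_st = stein_tomas_exponent(d)
    print(d, p_st, knapp_allows(d, p_st), knapp_allows(d, p_st + Fraction(1, 100)))
```

At $p = \frac{2(d+1)}{d+3}$ both powers equal $-\frac{d+3}{2}$, which is why the endpoint is exactly where the example stops ruling out (1).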

There are a number of proofs of this statement; it was originally proven by Tomas for every exponent except the endpoint, and then Stein combined Tomas’s argument with his complex interpolation method to obtain the endpoint as well (this is still one of the finest examples around of the power of the method).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore yields inequality (1) for all exponents $1 \leq p < \frac{2(d+1)}{d+3}$ by interpolation (in its Lorentz-space form) with the $p=1$ case, which is a trivial consequence of the $L^1 \to L^\infty$ case of the Hausdorff-Young inequality.

In other words, Carbery proves that for any (Borel) measurable set ${E}$ one has

$\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)$

where the RHS is precisely the $L^{2(d+1)/(d+3)}$ norm of the characteristic function $\mathbf{1}_E$, since $\|\mathbf{1}_E\|_{L^p} = |E|^{1/p}$. Notice that we could equivalently write the inequality as $\|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}$.

# Bourgain's proof of the spherical maximal function theorem

I recently presented Stein’s proof of the boundedness of the spherical maximal function: it was in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

$\displaystyle \mathcal{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,$

where $A_t$ denotes the spherical average at radius ${t}$, that is

$\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),$

where $d\sigma_{d-1}$ denotes the spherical measure on the $(d-1)$-dimensional sphere (we will omit the subscript from now on and just write $d\sigma$ since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: Let $d \geq 3$. The maximal operator $\mathcal{M}_{\mathbb{S}^{d-1}}$ is $L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)$ bounded for any $\frac{d}{d-1} < p \leq \infty$.

There is, however, an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler: instead of splitting the function into countably many dyadic frequency pieces, it splits the spherical measure into just two frequency pieces. Otherwise, the ingredients of the two proofs are pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives, and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus: remember that Stein’s argument essentially shows $L^p \to L^p$ boundedness of the operator quite directly for every $\frac{d}{d-1} < p \leq 2$; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for $\mathcal{M}_{\mathbb{S}^{d-1}}$! The latter means that for any measurable $E$ of finite (Lebesgue) measure and any $\alpha > 0$ we have

$\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathcal{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)$

which is exactly the $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the $L^{d/(d-1),1} \to L^{d/(d-1),\infty}$ inequality). The downside of Bourgain’s argument is that it only works in dimension $d \geq 4$, and thus misses the dimension $d=3$ that is instead covered by Stein’s theorem.
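The optimisation step behind such a restricted weak-type bound reduces to a few lines of arithmetic. In the Python sketch below, the input bounds are hypothetical and chosen only for illustration; they are not taken from the argument described here. The low-frequency piece at scale $2^j$ is assumed to be dominated by $2^j$ times the Hardy-Littlewood maximal function, hence weak-type $(1,1)$ with constant $\sim 2^j$. The high-frequency piece is assumed to be $L^2$-bounded with constant $\sim 2^{-j\beta}$. Chebyshev then gives $|\{\mathcal{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E > \alpha\}| \lesssim (2^j\alpha^{-1} + 2^{-2j\beta}\alpha^{-2})|E|$, and choosing $j$ to balance the two terms turns this into a single power of $\alpha$. With the hypothetical value $\beta = (d-2)/2$, that power is exactly the $d/(d-1)$ of (1). The code only checks this exponent arithmetic; it says nothing about the dimensions in which the input bounds actually hold.

```python
from fractions import Fraction


def optimised_exponent(beta):
    """Exponent e such that min over j of (2^j/alpha + 2^(-2*j*beta)/alpha^2) ~ alpha^(-e).

    Hypothetical input bounds, for illustration only:
      low-frequency part:  weak-type (1,1) with constant ~ 2^j  -> term 2^j |E| / alpha
      high-frequency part: L^2 bound with constant ~ 2^(-j*beta)  -> by Chebyshev,
                           term 2^(-2*j*beta) |E| / alpha^2
    The two terms balance when 2^j = alpha^(-1/(1 + 2*beta)).
    """
    beta = Fraction(beta)
    return 1 + 1 / (1 + 2 * beta)


def optimised_bound(alpha, beta, js=range(-40, 41)):
    """Brute-force minimum over integer j of the two-term bound, normalised to |E| = 1."""
    b = float(beta)
    return min(2.0 ** j / alpha + 2.0 ** (-2 * j * b) / alpha ** 2 for j in js)


for d in range(4, 8):
    beta = Fraction(d - 2, 2)  # assumed decay rate of the high-frequency part
    print(d, optimised_exponent(beta), Fraction(d, d - 1))
```

Restricting $j$ to integers only costs a bounded factor, as the brute-force minimum shows.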

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known; it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (I, II) on his positional-notation trick for sets.

# Oscillatory integrals II: several-variables phases

This is the second part of a two-part post on the theory of oscillatory integrals. In the first part we studied the theory of oscillatory integrals whose phases are functions of a single variable. In this post we will instead study the case in which the phase is a function of several variables (and we integrate in all of them). Here the theory is weaker, because these integrals can genuinely behave worse. We will proceed by analogy, following in the footsteps of the single-variable case.

3. Oscillatory integrals in several variables

In the previous section we analysed the situation for single-variable phases, that is, for integrals over (intervals of) ${\mathbb{R}}$. In this section we will instead study the higher-dimensional situation, in which the phase is a function of several variables. Unfortunately, things are generally not as nice as in the single-variable case, as you will see.

In order to avoid worrying about the geometry of the domain of integration (see Exercise 18 for the sort of issues that arise when trying to deal with general open sets of ${\mathbb{R}^d}$), in this section we will mainly study objects of the form

$\displaystyle \mathcal{I}_{\psi}(\lambda) := \int_{\mathbb{R}^d} e^{i \lambda u(x)} \psi(x) dx,$

where ${\psi}$ has compact support. We have switched to ${u}$ for the phase to remind the reader that it is now a function of several variables.

3.1. Principle of non-stationary phase – several variables

The principle of non-stationary phase we saw in Section 2 of part I continues to hold in the several variables case.
Given a phase ${u}$, we say that ${x_0}$ is a critical point of ${u}$ if

$\displaystyle \nabla u(x_0) = (0,\ldots,0).$

Proposition 8 (Principle of non-stationary phase – several variables) Let ${\psi \in C^\infty_c(\mathbb{R}^d)}$ (that is, smooth and compactly supported) and let the phase ${u\in C^\infty}$ be such that ${u}$ does not have critical points in the support of ${\psi}$. Then for any ${N >0}$ we have

$\displaystyle |\mathcal{I}_\psi(\lambda)|\lesssim_{N,\psi,u} |\lambda|^{-N}.$
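Here is a quick numerical illustration of Proposition 8, contrasted with a phase that does have a critical point. To keep the computation cheap, the Python sketch below picks a separable amplitude ${\psi(x_1,x_2) = b(x_1)b(x_2)}$ with ${b(t) = e^{-1/(1-t^2)}}$ on ${(-1,1)}$; this choice is an assumption made purely for illustration. By Fubini, each 2D integral is then the square of a 1D integral. The phase ${u(x) = x_1 + x_2}$ has ${\nabla u = (1,1)}$, hence no critical points. The phase ${u(x) = x_1^2 + x_2^2}$ has a critical point at the origin, where stationary phase predicts ${|\mathcal{I}_\psi(\lambda)| \approx \pi b(0)^2/\lambda = \pi e^{-2}/\lambda}$ for large ${\lambda}$. The trapezoid rule is used because the integrand vanishes to infinite order at ${\pm 1}$.

```python
import cmath
import math


def bump(t):
    """Smooth bump b(t) = exp(-1/(1 - t^2)) on (-1, 1), extended by zero."""
    if abs(t) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - t * t))


def osc_1d(lam, phase, n=4000):
    """Trapezoid approximation of int_{-1}^{1} exp(i*lam*phase(t)) * b(t) dt.

    b vanishes to infinite order at t = +-1, which makes the trapezoid rule
    extremely accurate here; the endpoint terms are zero and are skipped."""
    h = 2.0 / n
    total = 0j
    for k in range(1, n):
        t = -1.0 + k * h
        total += cmath.exp(1j * lam * phase(t)) * bump(t)
    return total * h


def I_nonstationary(lam):
    # u(x1, x2) = x1 + x2: grad u = (1, 1) never vanishes; psi = b(x1) b(x2)
    return osc_1d(lam, lambda t: t) ** 2


def I_stationary(lam):
    # u(x1, x2) = x1^2 + x2^2: critical point at the origin; same psi
    return osc_1d(lam, lambda t: t * t) ** 2


for lam in (10.0, 50.0, 100.0, 200.0):
    print(lam, abs(I_nonstationary(lam)), abs(I_stationary(lam)))
```

Printing both quantities for increasing ${\lambda}$ shows the first collapsing far faster than the ${\lambda^{-1}}$ rate of the second.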

# Oscillatory integrals I: single-variable phases

You might remember that I contributed some lecture notes on Littlewood-Paley theory to a masterclass, which I then turned into a series of three posts (I, II, III). I have also contributed lecture notes on some basic theory of oscillatory integrals, so I am going to do the same and share them here as a blog post in two parts. The presentation largely follows the one in Stein’s “Harmonic Analysis: Real-variable methods, orthogonality, and oscillatory integrals”, with inputs from Stein and Shakarchi’s “Functional Analysis: Introduction to Further Topics in Analysis”, from some lecture notes by Terry Tao for his 247B course, from a very interesting paper by Carbery, Christ and Wright and from a number of other places that I would now have trouble tracking down.
In this first part we will discuss the theory of oscillatory integrals when the phase is a function of a single variable. There are extensive exercises included that are to be considered part of the lecture notes; indeed, in order to keep the actual notes short and engage the reader, I have turned many things into exercises. If you are interested in learning about oscillatory integrals, you should not ignore them.
In the next post we will instead study the case where the phases are functions of several variables.

0. Introduction

A large part of modern harmonic analysis is concerned with understanding cancellation phenomena happening between different contributions to a sum or integral. Loosely speaking, we want to know how much better we can do than if we had taken absolute values everywhere. A prototypical example of this is the oscillatory integral of the form

$\displaystyle \int e^{i \phi_\lambda (x)} \psi(x) dx.$

Here ${\psi}$, called the amplitude, is usually understood to be “slowly varying” with respect to the real-valued ${\phi_\lambda}$, called the phase, where ${\lambda}$ denotes a parameter or list of parameters and $\phi'_\lambda$ gets larger as $\lambda$ grows; for example ${\phi_\lambda(x) = \lambda \phi(x)}$. Thus the oscillatory behaviour is given mainly by the complex exponential ${e^{i \phi_\lambda(x)}}$.
Expressions of this form arise quite naturally in several problems, as we will see in Section 1, and typically one seeks an upper bound on the absolute value of the integral above in terms of the parameters ${\lambda}$. Intuitively, as ${\lambda}$ gets larger the phase ${\phi_\lambda}$ changes faster and therefore ${e^{i \phi_\lambda}}$ oscillates faster, producing more cancellation between the contributions of different intervals to the integral. We then expect the integral to decay as ${\lambda}$ grows larger, and usually seek upper bounds of the form ${|\lambda|^{-\alpha}}$. Notice that if you take absolute values inside the integral above you just obtain ${\|\psi\|_{L^1}}$, a bound that does not decay in ${\lambda}$ at all.
The main tool we will use is simply integration by parts. In the exercises you will also use a little basic complex analysis to obtain more precise information on certain special oscillatory integrals.
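For instance, in the model case ${\phi_\lambda = \lambda\phi}$, assume for this illustration that ${\psi}$ is smooth with compact support and that ${\phi' \neq 0}$ on the support of ${\psi}$. Then a single integration by parts already gives decay. Write ${e^{i\lambda\phi} = \frac{1}{i\lambda\phi'}\frac{d}{dx}e^{i\lambda\phi}}$ and move the derivative onto ${\psi/\phi'}$; there are no boundary terms because ${\psi}$ has compact support. This gives

$\displaystyle \int e^{i\lambda\phi(x)}\psi(x)\, dx = -\frac{1}{i\lambda}\int e^{i\lambda\phi(x)} \Big(\frac{\psi}{\phi'}\Big)'(x)\, dx,$

and therefore

$\displaystyle \Big|\int e^{i\lambda\phi(x)}\psi(x)\, dx\Big| \leq \frac{1}{|\lambda|}\Big\|\Big(\frac{\psi}{\phi'}\Big)'\Big\|_{L^1}.$

Repeating the same step ${N}$ times (which requires ${\phi}$ to be smooth as well) gives decay of order ${|\lambda|^{-N}}$.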

1. Motivation

In this section we shall showcase the appearance of oscillatory integrals in analysis with a couple of examples. The reader can find other interesting examples in the exercises.

1.1. Fourier transform of radial functions

Let ${f : \mathbb{R}^d \rightarrow \mathbb{C}}$ be a radially symmetric function, that is, there exists a function ${f_0: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${f(x) = f_0(|x|)}$ for every ${x \in \mathbb{R}^d}$. Let’s suppose for simplicity that ${f\in L^1(\mathbb{R}^d)}$ (equivalently, that ${f_0 \in L^1(\mathbb{R}^{+}, r^{d-1} dr)}$), so that it has a well-defined Fourier transform. It is easy to see (by composing ${f}$ with a rotation and using a change of variables in the integral defining ${\widehat{f}}$) that ${\widehat{f}}$ must also be radially symmetric, that is, there must exist ${g: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${\widehat{f}(\xi) = g(|\xi|)}$; we want to understand its relationship with ${f_0}$. Therefore we write, using polar coordinates,

$\displaystyle \begin{aligned} \widehat{f}(\xi) = & \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi} dx \\ = & \int_{0}^{\infty}\int_{\mathbb{S}^{d-1}} f_0(r) e^{-2\pi i r \omega\cdot \xi} r^{d-1} d\sigma_{d-1}(\omega) dr \\ = & \int_{0}^{\infty} f_0(r) r^{d-1} \Big(\int_{\mathbb{S}^{d-1}} e^{-2\pi i r \omega\cdot \xi} d\sigma_{d-1}(\omega)\Big) dr, \end{aligned}$

where ${d\sigma_{d-1}}$ denotes the surface measure on the unit ${(d-1)}$-dimensional sphere ${\mathbb{S}^{d-1}}$ induced by the Lebesgue measure on the ambient space ${\mathbb{R}^{d}}$. By inspection, we see that the integral in brackets above is radially symmetric in ${\xi}$, and so if we define

$\displaystyle J(t) := \int_{\mathbb{S}^{d-1}} e^{-2\pi i t \omega\cdot \mathbf{e}_1} d\sigma_{d-1}(\omega),$

with ${\mathbf{e}_1 = (1, 0, \ldots, 0)}$, we have

$\displaystyle \widehat{f}(\xi) = g(|\xi|) = \int_{0}^{\infty} f_0(r) r^{d-1} J(r|\xi|) dr. \ \ \ \ \ (1)$

This is the relationship we were looking for: it allows one to calculate the Fourier transform of ${f}$ directly from the radial information ${f_0}$.
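As a sanity check of (1), one can take ${d = 3}$, where ${J}$ has a closed form. Writing ${\omega_1 = \cos\theta}$ gives ${J(t) = 2\pi\int_{-1}^{1} e^{-2\pi i t s}\, ds = 2\sin(2\pi t)/t}$, with ${J(0) = 4\pi}$. Now take ${f_0 = \mathbf{1}_{[0,R]}}$, so that ${f}$ is the indicator of a ball. Then the integral in (1) can be done by hand and gives ${\widehat{f}(\xi) = (\sin a - a\cos a)/(2\pi^2|\xi|^3)}$ with ${a = 2\pi R|\xi|}$. The Python sketch below evaluates (1) numerically and compares it with this closed form. Its function names are made up for the illustration, and the midpoint rule is an arbitrary choice of quadrature.

```python
import math


def J3(t):
    """J(t) for d = 3. With omega_1 = cos(theta), the integral over S^2 reduces to
    2*pi * int_{-1}^{1} exp(-2*pi*i*t*s) ds = 2*sin(2*pi*t)/t, and J(0) = 4*pi."""
    if t == 0.0:
        return 4.0 * math.pi
    return 2.0 * math.sin(2.0 * math.pi * t) / t


def radial_ft(f0, rho, r_max, n=20000):
    """Formula (1) for d = 3, i.e. g(rho) = int_0^inf f0(r) r^2 J(r*rho) dr,
    truncated to [0, r_max] (f0 is assumed to vanish beyond r_max) and
    approximated with the midpoint rule on n subintervals."""
    h = r_max / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += f0(r) * r * r * J3(r * rho)
    return total * h


def ball_ft_closed_form(R, rho):
    """Fourier transform of the indicator of the ball of radius R in R^3 at |xi| = rho > 0."""
    a = 2.0 * math.pi * R * rho
    return (math.sin(a) - a * math.cos(a)) / (2.0 * math.pi ** 2 * rho ** 3)


def indicator_unit_interval(r):
    # f0 = 1_[0,1], so that f is the indicator of the unit ball
    return 1.0 if r <= 1.0 else 0.0


for rho in (0.5, 1.0, 2.5):
    print(rho, radial_ft(indicator_unit_interval, rho, 1.0), ball_ft_closed_form(1.0, rho))
```

At ${\xi = 0}$ the same routine should return the volume ${4\pi/3}$ of the unit ball.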

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part I

If you are reading this blog, you have probably heard that Jean Bourgain, one of the greatest analysts of the last century, unfortunately passed away last December. It is fair to say that the progress of analysis will slow down significantly without him. I am not in any position to give a eulogy to this giant, but I thought it would be nice to commemorate him by occasionally talking on this blog about some of his many profound papers and his crazily inventive tricks. That’s something everybody agrees on: Bourgain was able to come up with a variety of insane tricks in a way that no one else could. The man was a problem solver and an overall magician: the first time you see one of his tricks, you don’t believe what’s happening in front of you. And that’s just the tricks part!

In this two-part post I am going to talk about a certain trick that, loosely speaking, involves representing points of an arbitrary set in a fashion similar to how integers are represented in, say, binary. I don’t know if this trick came straight out of Bourgain’s magical top hat or if he learned it from somewhere else; I haven’t seen it used elsewhere except in papers that cite Bourgain himself, so I’m inclined to attribute it to him, but please correct me if I’m wrong.
Today we introduce the context for the trick (a famous lemma by Bourgain for maximal frequency projections on the real line) and present a toy version of the idea in a proof of the Rademacher-Menshov theorem. In the second part we will finally see the trick.

1. Ergodic averages along arithmetic sequences
First, some context. The trick I am going to talk about can be found in one of Bourgain’s major papers, one of those cited in the motivation for his Fields Medal: the paper on a.e. convergence of ergodic averages along arithmetic sequences. Its main result is stated as follows: let $(X,T,\mu)$ be an ergodic system, that is,

1. $\mu$ is a probability measure on $X$;
2. $T: X \to X$ satisfies $\mu(T^{-1} A) = \mu(A)$ for all $\mu$-measurable sets $A$ (this is the invariance condition);
3. $T^{-1} A = A$ implies $\mu(A) = 0 \text{ or } 1$ (this is the ergodicity condition).

Then the result is

Theorem: [Bourgain, ’89] Let $(X,T,\mu)$ be an ergodic system and let $p(n)$ be a polynomial with integer coefficients. If $f \in L^q(d\mu)$ with $q > 1$, then the averages $A_N f(x) := \frac{1}{N}\sum_{n=1}^{N}f(T^{p(n)} x)$ converge $\mu$-a.e. as $N \to \infty$; moreover, if ${T}$ is weakly mixing¹, we have more precisely

$\displaystyle \lim_{N \to \infty} A_N f(x) = \int_X f d\mu$

for $\mu$-a.e. ${x}$.

For comparison, the more classical pointwise ergodic theorem of Birkhoff states the same for the case $p(n) = n$ and $f \in L^1(d\mu)$ (notice this is the largest of the $L^p(X,d\mu)$ spaces because $\mu$ is finite), in which case the theorem is deduced as a consequence of the $L^1 \to L^{1,\infty}$ boundedness of the Hardy-Littlewood maximal function. Roughly speaking, the dense class to appeal to is $L^2(X,d\mu)$, thanks to the ergodic theorem of von Neumann, which states that $A_N f$ converges in $L^2$ norm for $f \in L^2(X,d\mu)$. However, the details are non-trivial. Heuristically, these ergodic theorems incarnate a quantitative version of the idea that the orbits $\{T^n x\}_{n\in\mathbb{N}}$ fill up the entire space ${X}$ uniformly. I don’t want to go into details because here I am just providing some context for those interested; there are plenty of introductions to ergodic theory where these results are covered in depth.
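To see the averages $A_N f$ in a concrete and much simpler case, one can take $X = [0,1)$ with Lebesgue measure and the rotation $T x = x + \theta \pmod 1$. For irrational $\theta$ this rotation is ergodic. Note that rotations are not weakly mixing, so the last clause of the theorem does not apply to them. The fact that the averages along $p(n) = n^2$ still tend to $\int_X f\,d\mu$ in this example comes instead from Weyl’s equidistribution theorem for $(n^2\theta)$. The Python sketch below just evaluates $A_N f(x)$ directly. The choices of $\theta$, $x$ and the test function are arbitrary, and the function names are made up for the illustration.

```python
import math


def rotation_average(f, x, theta, N, p=lambda n: n * n):
    """A_N f(x) = (1/N) sum_{n=1}^{N} f(T^{p(n)} x) for the rotation T x = x + theta mod 1,
    so that T^k x = x + k*theta mod 1."""
    total = 0.0
    for n in range(1, N + 1):
        total += f((x + p(n) * theta) % 1.0)
    return total / N


def mean_zero_test_function(x):
    # integral over [0, 1) is 0, so the expected limit is 0
    return math.cos(2.0 * math.pi * x)


theta = math.sqrt(2.0)  # an (arbitrarily chosen) irrational rotation number
for N in (10**2, 10**3, 10**4, 10**5):
    print(N, rotation_average(mean_zero_test_function, 0.1, theta, N))
```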

# Basic Littlewood-Paley theory III: applications

This is the last part of a three-part series on the basics of Littlewood-Paley theory. Today we discuss a couple of applications, namely the Marcinkiewicz multiplier theorem and the boundedness of the spherical maximal function. The latter is an application of frequency decompositions in general, and not so much of square functions; one does appear, but only for $L^2$ estimates, where one does not need the sophistication of Littlewood-Paley theory.
Part I: frequency projections
Part II: square functions

7. Applications of Littlewood-Paley theory

In this section we will present two applications of the Littlewood-Paley theory developed so far. You can find further applications in the exercises (see particularly Exercise 22 and Exercise 23).

7.1. Marcinkiewicz multipliers

Given an ${L^\infty (\mathbb{R}^d)}$ function ${m}$, one can define the operator ${T_m}$ given by

$\displaystyle \widehat{T_m f}(\xi) := m(\xi) \widehat{f}(\xi)$

for all ${f \in L^2(\mathbb{R}^d)}$. The operator ${T_m}$ is called a multiplier and the function ${m}$ is called the symbol of the multiplier¹. Since ${m \in L^\infty}$, Plancherel’s theorem shows that ${T_m}$ is a linear operator bounded in ${L^2}$, with ${\|T_m f\|_{L^2} \leq \|m\|_{L^\infty}\|f\|_{L^2}}$. In particular, ${T_m}$ is defined on ${L^2 \cap L^p}$, which is dense in ${L^p}$ for ${p < \infty}$. A natural question to ask is: for which values of ${p}$ in ${1 \leq p \leq \infty}$ is the operator ${T_m}$ an ${L^p \rightarrow L^p}$ bounded operator? When ${T_m}$ is bounded in a certain ${L^p}$ space, we say that it is an ${L^p}$ multiplier.
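The ${L^2}$ bound is just Plancherel, and it is easy to see in a discrete model. The Python sketch below relies on an assumption made for illustration: ${\mathbb{R}^d}$ is replaced by the cyclic group ${\mathbb{Z}/n\mathbb{Z}}$ and the Fourier transform by a naive DFT. In this model, ${T_m}$ becomes “take the DFT, multiply by ${m}$, invert”. The symbol used is a discrete analogue of the Hilbert transform, ${m = -i\,\mathrm{sign}}$, chosen only as an example with ${\|m\|_{\infty} \leq 1}$. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform X_j = sum_k x_k exp(-2 pi i j k / n)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)) for j in range(n)]


def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [sum(X[j] * cmath.exp(2j * math.pi * j * k / n) for j in range(n)) / n for k in range(n)]


def apply_multiplier(m, f):
    """Discrete analogue of T_m: multiply the DFT of f by the symbol m(j, n) and invert."""
    n = len(f)
    F = dft(f)
    return idft([m(j, n) * F[j] for j in range(n)])


def discrete_hilbert_symbol(j, n):
    """-i * sign(frequency); frequency 0 and the Nyquist frequency are sent to 0. |m| <= 1."""
    if j == 0 or 2 * j == n:
        return 0.0
    return -1j if 2 * j < n else 1j


def l2(x):
    return math.sqrt(sum(abs(v) ** 2 for v in x))


f = [math.sin(k) + (k % 3) for k in range(64)]
g = apply_multiplier(discrete_hilbert_symbol, f)
print(l2(g), l2(f))  # Plancherel: l2(g) <= max|m| * l2(f) = l2(f)
```

Since the DFT satisfies ${\sum_j |F_j|^2 = n\sum_k |f_k|^2}$, multiplying by a symbol bounded by ${1}$ cannot increase the ${\ell^2}$ norm. No such argument is available for ${p \neq 2}$, which is what makes the ${L^p}$ question interesting.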

The operator ${T_m}$ introduced in Section 1 of the first post in this series is an example of a multiplier, with symbol ${m(\xi,\tau) = \tau / (\tau - 2\pi i |\xi|^2)}$. It is the linear operator that satisfies the formal identity $T_m \circ (\partial_t - \Delta) = \partial_t$. We have seen that it cannot be a (Euclidean) Calderón-Zygmund operator, and thus in particular it cannot be a Hörmander-Mikhlin multiplier. This can be seen more directly from the fact that any Hörmander-Mikhlin condition of the form ${|\partial^{\alpha}m(\xi,\tau)| \lesssim_\alpha |(\xi,\tau)|^{-|\alpha|} = (|\xi|^2 + \tau^2)^{-|\alpha|/2}}$ is clearly incompatible with the rescaling invariance of the symbol ${m}$, which satisfies ${m(\lambda \xi, \lambda^2 \tau) = m(\xi,\tau)}$ for any ${\lambda \neq 0}$. However, the derivatives of ${m}$ actually satisfy some other superficially similar conditions that are of interest to us. Indeed, letting ${(\xi,\tau) \in \mathbb{R}^2}$ for simplicity, we can see for example that ${\partial_\xi \partial_\tau m(\xi, \tau) = \lambda^3 \partial_\xi \partial_\tau m(\lambda\xi, \lambda^2\tau)}$. When ${|\tau|\lesssim |\xi|^2}$ we can therefore argue that ${|\partial_\xi \partial_\tau m(\xi, \tau)| = |\xi|^{-3} |\partial_\xi \partial_\tau m(1, \tau |\xi|^{-2})| \lesssim |\xi|^{-1} |\tau|^{-1} \sup_{|\eta|\lesssim 1} |\partial_\xi \partial_\tau m(1, \eta)|}$, and similarly when ${|\tau|\gtrsim |\xi|^2}$; this shows that for any ${(\xi, \tau)}$ with ${\xi,\tau \neq 0}$ one has

$\displaystyle |\partial_\xi \partial_\tau m(\xi, \tau)| \lesssim |\xi|^{-1} |\tau|^{-1}.$

This condition is comparable with the corresponding Hörmander-Mikhlin condition only when ${|\xi| \sim |\tau|}$, and is vastly different otherwise, being of product type (also notice that the inequality above is compatible with the rescaling invariance of ${m}$, as it should be).
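For this particular symbol the product-type bound can also be verified by hand. Differentiating ${m(\xi,\tau) = \tau/(\tau - 2\pi i\xi^2)}$ gives ${\partial_\xi\partial_\tau m(\xi,\tau) = -4\pi i\,\xi\,(\tau + 2\pi i\xi^2)/(\tau - 2\pi i\xi^2)^3}$. Hence ${|\xi||\tau||\partial_\xi\partial_\tau m(\xi,\tau)| = 2ab/(a^2+b^2) \leq 1}$, with ${a = |\tau|}$ and ${b = 2\pi\xi^2}$. The Python sketch below compares this formula with a central finite-difference approximation and checks the rescaling identity ${\partial_\xi\partial_\tau m(\xi,\tau) = \lambda^3\partial_\xi\partial_\tau m(\lambda\xi,\lambda^2\tau)}$ numerically. The function names and the step size ${h}$ are illustrative choices.

```python
import math

C = 2j * math.pi  # so that m(xi, tau) = tau / (tau - C * xi^2)


def m(xi, tau):
    return tau / (tau - C * xi ** 2)


def mixed_derivative_exact(xi, tau):
    # d/dxi d/dtau of tau / (tau - C xi^2), computed by hand:
    # -2 C xi (tau + C xi^2) / (tau - C xi^2)^3
    return -2 * C * xi * (tau + C * xi ** 2) / (tau - C * xi ** 2) ** 3


def mixed_derivative_fd(xi, tau, h=1e-4):
    # central finite-difference approximation of the mixed partial derivative
    return (m(xi + h, tau + h) - m(xi + h, tau - h)
            - m(xi - h, tau + h) + m(xi - h, tau - h)) / (4 * h * h)


for xi, tau in [(1.0, 2.0), (0.5, -3.0), (3.0, 100.0)]:
    exact = mixed_derivative_exact(xi, tau)
    # |xi| |tau| |d_xi d_tau m| should never exceed 1
    print(xi, tau, abs(xi * tau * exact), abs(mixed_derivative_fd(xi, tau) - exact))
```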