# Carbery's proof of the Stein-Tomas theorem

While writing the article on Bourgain’s proof of the spherical maximal function theorem, I suddenly recalled another interesting proof that uses a trick very similar to that of Bourgain – and apparently directly inspired by it. Recall that the “trick” consists of the following fact: if we consider only characteristic functions as our inputs, then we can split the operator in two, estimate these parts each in a different Lebesgue space, and at the end we can combine the estimates into an estimate in a single $L^p$ space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if $R$ denotes the Fourier restriction operator of the sphere $\mathbb{S}^{d-1}$ (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

$\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}$

(defined initially on Schwartz functions), then

Stein-Tomas theorem: $R$ satisfies the a-priori inequality

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)$

for all exponents ${p}$ such that $1 \leq p \leq \frac{2(d+1)}{d+3}$ (and this is sharp, by the Knapp example).
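
For the reader who wants to see where the endpoint comes from, here is the usual heuristic form of the Knapp example. This is only a sketch with all constants suppressed; $\delta$, $f_\delta$ and the dual exponent $p'$ (with $\frac{1}{p} + \frac{1}{p'} = 1$) are notation introduced just for this computation. Take $0 < \delta \ll 1$ and let $f_\delta$ be such that $\widehat{f_\delta}$ is a smooth bump adapted to a slab of dimensions $\delta \times \cdots \times \delta \times \delta^2$ containing a cap of radius $\sim \delta$ on the sphere. Then $|f_\delta| \sim \delta^{d+1}$ on a large portion of a dual tube of dimensions $\delta^{-1} \times \cdots \times \delta^{-1} \times \delta^{-2}$ and decays rapidly away from it, so

$\displaystyle \|Rf_\delta\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \sim \delta^{\frac{d-1}{2}}, \qquad \|f_\delta\|_{L^p(\mathbb{R}^d)} \sim \delta^{d+1} \cdot \delta^{-\frac{d+1}{p}} = \delta^{\frac{d+1}{p'}}.$

Letting $\delta \to 0$, inequality (1) forces $\frac{d-1}{2} \geq \frac{d+1}{p'}$, that is $p' \geq \frac{2(d+1)}{d-1}$, which is the same as $p \leq \frac{2(d+1)}{d+3}$.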

There are a number of proofs of this statement; originally it was proven by Tomas for every exponent except the endpoint, and then Stein combined the proof of Tomas with his complex interpolation method to obtain the endpoint too (and this is still one of the finest examples of the power of the method around).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore obtains inequality (1) for all exponents $1 \leq p < \frac{2(d+1)}{d+3}$ by interpolation of Lorentz spaces with the $p=1$ case (which is a trivial consequence of the Hausdorff-Young inequality).
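
Spelled out, the $p=1$ case is just the $L^1 \to L^\infty$ bound for the Fourier transform combined with the fact that the sphere has finite measure:

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \leq \sigma(\mathbb{S}^{d-1})^{1/2} \, \|\widehat{f}\|_{L^\infty(\mathbb{R}^d)} \leq \sigma(\mathbb{S}^{d-1})^{1/2} \, \|f\|_{L^1(\mathbb{R}^d)}.$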

In other words, Carbery proves that for any (Borel) measurable set ${E}$ one has

$\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)$

where the RHS is clearly the $L^{2(d+1)/(d+3)}$ norm of the characteristic function $\mathbf{1}_E$. Notice that we could write the inequality equivalently as $\|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}$.
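
To make the connection with the trick recalled at the beginning concrete, here is a sketch of how a splitting-and-optimising argument can produce (2). It is only a schematic outline under standard assumptions, and not necessarily the argument in the exact form Carbery gives it: we assume the decay $|\widehat{d\sigma}(x)| \lesssim (1+|x|)^{-\frac{d-1}{2}}$, and the names $\chi$, $\lambda$, $K_1$, $K_2$ are introduced here purely for illustration. Since $\|R\mathbf{1}_E\|_{L^2(d\sigma)}^2 = \langle \mathbf{1}_E \ast \widehat{d\sigma}, \mathbf{1}_E \rangle$, fix a smooth cutoff $\chi$ equal to 1 near the origin and a parameter $\lambda \geq 1$, and split $\widehat{d\sigma} = K_1 + K_2$ with $K_1 := \widehat{d\sigma} \, \chi(\cdot/\lambda)$. Then

$\displaystyle |\langle \mathbf{1}_E \ast K_1, \mathbf{1}_E \rangle| \leq \|\widehat{K_1}\|_{L^\infty} \|\mathbf{1}_E\|_{L^2}^2 \lesssim \lambda |E|, \qquad |\langle \mathbf{1}_E \ast K_2, \mathbf{1}_E \rangle| \leq \|K_2\|_{L^\infty} \|\mathbf{1}_E\|_{L^1}^2 \lesssim \lambda^{-\frac{d-1}{2}} |E|^2.$

The first is an $L^2$ estimate and the second an $L^1 \to L^\infty$ estimate, for two different operators. The two bounds balance when $\lambda = |E|^{\frac{2}{d+1}}$, giving

$\displaystyle \|R \mathbf{1}_E\|_{L^2(d\sigma)}^2 \lesssim |E|^{1 + \frac{2}{d+1}} = |E|^{\frac{d+3}{d+1}},$

which is (2) when $|E| \geq 1$; when $|E| < 1$ the trivial bound $\|R\mathbf{1}_E\|_{L^2(d\sigma)} \lesssim |E|$ is already stronger.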

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part II

This is the second and final part of an entry dedicated to a very interesting and inventive trick due to Bourgain. In part I we saw a lemma on maximal Fourier projections due to Bourgain, together with the context it arises from (the study of pointwise ergodic theorems for polynomial sequences); we also saw a baby version of the idea to come, that we used to prove the Rademacher-Menshov theorem (recall that the idea was to represent the indices in the supremum in their binary positional notation form and to rearrange the supremum accordingly). Today we finally get to see Bourgain’s trick.
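
As a reminder of what “positional notation” buys us in the baby version, recall that the binary expansion of an index $n$ splits $[0,n)$ into at most $\log_2 n + 1$ dyadic intervals, one for each non-zero binary digit. The following is a minimal Python sketch of that decomposition; the function name `dyadic_blocks` is made up for this illustration.

```python
def dyadic_blocks(n):
    """Split [0, n) into dyadic intervals, one per non-zero binary digit of n."""
    blocks = []
    start = 0
    # Read the binary digits of n from the most significant one down.
    for bit in range(n.bit_length() - 1, -1, -1):
        if (n >> bit) & 1:
            length = 1 << bit
            # start is a multiple of length here, so [start, start + length) is dyadic.
            blocks.append((start, start + length))
            start += length
    return blocks


# 13 is 1101 in binary, so [0, 13) splits into blocks of length 8, 4 and 1.
print(dyadic_blocks(13))
```

A partial sum up to $n$ is then a sum of at most $\log_2 n + 1$ dyadic pieces, which is what allows the supremum to be rearranged scale by scale.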

Before we start, recall the statement of Bourgain’s lemma:

Lemma 1 [Bourgain]: Let $K$ be an integer and let $\Lambda = \{\lambda_1, \ldots, \lambda_K \}$ be a set of ${K}$ distinct frequencies. Define the maximal frequency projections

$\displaystyle \mathcal{B}_\Lambda f(x) := \sup_{j} \Big|\sum_{k=1}^{K} (\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}\Big|,$

where the supremum is restricted to those ${j \geq j_0}$ with $j_0 = j_0(\Lambda)$ being the smallest integer such that $2^{-j_0} \leq \frac{1}{2}\min \{ |\lambda_k - \lambda_{k'}| : 1\leq k\neq k'\leq K \}$.
Then

$\displaystyle \|\mathcal{B}_\Lambda f\|_{L^2} \lesssim (\log \#\Lambda)^2 \|f\|_{L^2}.$

Here we are using the notation $(\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}$ in the statement in place of the expanded formula $\int_{|\xi - \lambda_k| < 2^{-j}} \widehat{f}(\xi) e^{2\pi i \xi x} d\xi$. Observe that by the definition of $j_0$ we have that the intervals $[\lambda_k - 2^{-j_0}, \lambda_k + 2^{-j_0}]$ are disjoint (and $j_0$ is precisely minimal with respect to this condition).
We will need to do some reductions before we can get to the point where the trick makes its appearance. These reductions are the subject of the next section.
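
To make the definition of $j_0(\Lambda)$ concrete, here is a small Python sketch that computes it with exact rational arithmetic and checks the disjointness claim. The function names are hypothetical and chosen only for this illustration; the input is assumed to contain at least two distinct frequencies.

```python
from fractions import Fraction
from itertools import combinations


def smallest_admissible_scale(freqs):
    """Return j_0: the smallest integer j with 2^(-j) <= (1/2) * (minimal gap)."""
    points = [Fraction(x) for x in freqs]
    half_gap = min(abs(a - b) for a, b in combinations(points, 2)) / 2
    j = 0
    # Decrease j while the next coarser scale 2^(-(j-1)) is still admissible.
    while Fraction(2) ** (-(j - 1)) <= half_gap:
        j -= 1
    # Increase j until the scale 2^(-j) becomes admissible.
    while Fraction(2) ** (-j) > half_gap:
        j += 1
    return j


def interiors_disjoint(freqs, j):
    """Check that the intervals [l - 2^(-j), l + 2^(-j)] meet at most at endpoints."""
    points = sorted(Fraction(x) for x in freqs)
    radius = Fraction(2) ** (-j)
    return all(a + radius <= b - radius for a, b in zip(points, points[1:]))


freqs = [Fraction(0), Fraction(1, 4), Fraction(1)]
j0 = smallest_admissible_scale(freqs)
# The intervals are disjoint at scale j0 but not at the coarser scale j0 - 1.
print(j0, interiors_disjoint(freqs, j0), interiors_disjoint(freqs, j0 - 1))
```

With the frequencies $0, \frac{1}{4}, 1$ the minimal gap is $\frac{1}{4}$, so the admissible scales are those with $2^{-j} \leq \frac{1}{8}$.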

3. Initial reductions

A first important reduction is that we can safely replace the characteristic functions $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}$ by smooth bump functions with comparable support. Indeed, this is the result of a very standard square-function argument which was already essentially presented in Exercise 22 of the 3rd post on basic Littlewood-Paley theory. Briefly then, let $\varphi$ be a Schwartz function such that $\widehat{\varphi}$ is a smooth bump function compactly supported in the interval $[-1,1]$ and such that $\widehat{\varphi} \equiv 1$ on the interval $[-1/2, 1/2]$. Let $\varphi_j (x) := \frac{1}{2^j} \varphi \Big(\frac{x}{2^j}\Big)$ (so that $\widehat{\varphi_j}(\xi) = \widehat{\varphi}(2^j \xi)$) and let for convenience $\theta_j$ denote the difference $\theta_j := \mathbf{1}_{[-2^{-j}, 2^{-j}]} - \widehat{\varphi_j}$. We have that the difference

$\displaystyle \sup_{j\geq j_0(\Lambda)} \Big|\sum_{k=1}^{K} ((\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} - \widehat{\varphi_j}(\cdot - \lambda_k)) \widehat{f})^{\vee}\Big|$

is an $L^2$ bounded operator with norm $O(1)$ (that is, independent of $K$). Indeed, observe that $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}(\xi) - \widehat{\varphi_j}(\xi - \lambda_k) = \theta_j (\xi - \lambda_k)$, and bounding the supremum by the $\ell^2$ sum we have that the $L^2$ norm (squared) of the operator above is bounded by

$\displaystyle \sum_{j \geq j_0(\Lambda)} \Big\|\sum_{k=1}^{K} (\theta_j(\cdot - \lambda_k)\widehat{f})^{\vee}\Big\|_{L^2}^2,$

where the summation in ${j}$ is restricted in the same way as the supremum is in the lemma (that is, the intervals $[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]$ must be pairwise disjoint). By an application of Plancherel we see that the above is equal to

$\displaystyle \sum_{k=1}^{K} \Big\| \widehat{f}(\xi) \Big[\sum_{j \geq j_0} \theta_j(\xi - \lambda_k) \Big]\Big\|_{L^2}^2;$

but notice that the functions $\theta_j$ have supports disjoint in ${j}$, and therefore the multiplier satisfies $\big|\sum_{j\geq j_0} \theta_j(\xi - \lambda_k)\big| \lesssim 1$ in a neighbourhood of $\lambda_k$, and vanishes outside this neighbourhood. Since these neighbourhoods are pairwise disjoint as well, a final application of Plancherel allows us to conclude by orthogonality that the above is bounded by $\lesssim \|f\|_{L^2}^2$.
By the triangle inequality, we see therefore that in order to prove Lemma 1 it suffices to prove that the operator

$\displaystyle \sup_{j} \Big|\sum_{k=1}^{K} (\widehat{\varphi_j}(\cdot - \lambda_k) \widehat{f})^{\vee}\Big|$

is $L^2$ bounded with norm at most $O((\log \#\Lambda)^2)$.
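
For completeness, the two facts about $\widehat{\varphi_j}$ and $\theta_j$ used in this reduction can be checked directly. Here we assume the Fourier transform is normalised as $\widehat{f}(\xi) = \int f(x) e^{-2\pi i x \xi} dx$, consistently with the inversion formula used after the statement of Lemma 1. Changing variables $x = 2^j y$,

$\displaystyle \widehat{\varphi_j}(\xi) = \int 2^{-j} \varphi(2^{-j} x) e^{-2\pi i x \xi} \, dx = \int \varphi(y) e^{-2\pi i y (2^j \xi)} \, dy = \widehat{\varphi}(2^j \xi),$

so $\widehat{\varphi_j}$ is supported in $[-2^{-j}, 2^{-j}]$ and is identically 1 on $[-2^{-j-1}, 2^{-j-1}]$. Consequently

$\displaystyle \mathrm{supp}\, \theta_j \subseteq \{ \xi : 2^{-j-1} \leq |\xi| \leq 2^{-j} \}, \qquad |\theta_j| \leq 1 + \|\widehat{\varphi}\|_{L^\infty},$

and these sets overlap for different values of ${j}$ only at their endpoints, which is the disjointness in ${j}$ used above.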

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part I

If you are reading this blog, you have probably heard that Jean Bourgain – one of the greatest analysts of the last century – unfortunately passed away last December. It is fair to say that the progress of analysis will slow down significantly without him. I am not in any position to give a eulogy to this giant, but I thought it would be nice to commemorate him by talking occasionally on this blog about some of his many profound papers and his crazily inventive tricks. That’s something everybody agrees on: Bourgain was able to come up with a variety of insane tricks in a way that no one else could. The man was a problem solver and an overall magician: the first time you see one of his tricks, you don’t believe what’s happening in front of you. And that’s just the tricks part!

In this two-part post I am going to talk about a certain trick that, loosely speaking, involves representing points of an arbitrary set in a fashion similar to how integers are represented, say, in base 2. I don’t know if this trick came straight out of Bourgain’s magical top hat or if he learned it from somewhere else; I haven’t seen it used elsewhere except for papers that cite Bourgain himself, so I’m inclined to attribute it to him – but please, correct me if I’m wrong.
Today we introduce the context for the trick (a famous lemma by Bourgain for maximal frequency projections on the real line) and present a toy version of the idea in a proof of the Rademacher-Menshov theorem. In the second part we will finally see the trick.

1. Ergodic averages along arithmetic sequences
First, some context. The trick I am going to talk about can be found in one of Bourgain’s major papers, which was among those cited in the motivation for his Fields Medal. I am talking about the paper on a.e. convergence of ergodic averages along arithmetic sequences. The main result of that paper is stated as follows: let $(X,T,\mu)$ be an ergodic system, that is

1. $\mu$ is a probability measure on $X$;
2. $T: X \to X$ satisfies $\mu(T^{-1} A) = \mu(A)$ for all $\mu$-measurable sets $A$ (this is the invariance condition);
3. $T^{-1} A = A$ implies $\mu(A) = 0 \text{ or } 1$ (this is the ergodicity condition).

Then the result is

Theorem: [Bourgain, ’89] Let $(X,T,\mu)$ be an ergodic system and let $p(n)$ be a polynomial with integer coefficients. If $f \in L^q(d\mu)$ with $q > 1$, then the averages $A_N f(x) := \frac{1}{N}\sum_{n=1}^{N}f(T^{p(n)} x)$ converge $\mu$-a.e. as $N \to \infty$; moreover, if ${T}$ is weakly mixing$^1$, we have more precisely

$\displaystyle \lim_{N \to \infty} A_N f(x) = \int_X f d\mu$

for $\mu$-a.e. ${x}$.
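
To get a feeling for the averages $A_N f$, here is a small numerical experiment in Python. Everything in it is an assumption made for illustration: the system is the rotation $Tx = x + \alpha \bmod 1$ on $[0,1)$ with $\alpha = \sqrt{2}$, and $f(x) = \cos(2\pi x)$, which has integral zero. This rotation is ergodic but not weakly mixing, so the second part of the theorem does not apply as stated; for this particular $f$ the averages still tend to $\int f = 0$ by Weyl's equidistribution theorem for $n\alpha$ and $n^2\alpha$. The experiment only illustrates the statement and proves nothing.

```python
import math


def ergodic_average(f, alpha, x, N, p):
    """A_N f(x) = (1/N) * sum_{n=1}^{N} f(T^{p(n)} x) for T x = x + alpha mod 1."""
    total = 0.0
    for n in range(1, N + 1):
        # Applying the rotation p(n) times shifts x by p(n) * alpha (mod 1).
        total += f((x + p(n) * alpha) % 1.0)
    return total / N


def f(t):
    return math.cos(2 * math.pi * t)  # integral over [0, 1) is zero


alpha = math.sqrt(2)  # irrational rotation number (assumed for this example)
for N in (100, 1000, 10000):
    linear = ergodic_average(f, alpha, 0.3, N, lambda n: n)
    square = ergodic_average(f, alpha, 0.3, N, lambda n: n * n)
    print(N, round(linear, 4), round(square, 4))
```

The case $p(n) = n$ corresponds to Birkhoff's averages and the case $p(n) = n^2$ to the simplest non-linear polynomial covered by the theorem.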

For comparison, the more classical pointwise ergodic theorem of Birkhoff states the same for the case $p(n) = n$ and $f \in L^1(d\mu)$ (notice this is the largest of the $L^p(X,d\mu)$ spaces because $\mu$ is finite), in which case the theorem is deduced as a consequence of the $L^1 \to L^{1,\infty}$ boundedness of the Hardy-Littlewood maximal function. The dense class to appeal to is, roughly speaking, $L^2(X,d\mu)$, thanks to the ergodic theorem of von Neumann, which states that $A_N f$ converges in $L^2$ norm for $f \in L^2(X,d\mu)$. However, the details are non-trivial. Heuristically, these ergodic theorems incarnate a quantitative version of the idea that the orbits $\{T^n x\}_{n\in\mathbb{N}}$ fill up the entire space ${X}$ uniformly. I don’t want to enter into details because here I am just providing some context for those interested; there are plenty of introductions to ergodic theory where these results are covered in depth.