# Carbery's proof of the Stein-Tomas theorem

Writing the article on Bourgain’s proof of the spherical maximal function theorem I suddenly recalled another interesting proof that uses a trick very similar to that of Bourgain – and apparently directly inspired by it. Recall that the “trick” consists of the following fact: if we consider only characteristic functions as our inputs, then we can split the operator in two, estimate each part in a different Lebesgue space, and at the end combine the estimates into an estimate in a single $L^p$ space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if $R$ denotes the Fourier restriction operator of the sphere $\mathbb{S}^{d-1}$ (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

$\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}$

(defined initially on Schwartz functions), then

Stein-Tomas theorem: $R$ satisfies the a priori inequality

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)$

for all exponents ${p}$ such that $1 \leq p \leq \frac{2(d+1)}{d+3}$ (and this is sharp, by the Knapp example).

There are a number of proofs of this statement; originally it was proven by Tomas for every exponent except the endpoint, and then Stein combined Tomas's proof with his complex interpolation method to obtain the endpoint too (and this remains one of the finest examples of the power of that method).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore obtains inequality (1) for all exponents $1 \leq p < \frac{2(d+1)}{d+3}$ by interpolation in Lorentz spaces with the $p=1$ case (which is a trivial consequence of the Hausdorff-Young inequality).

In other words, Carbery proves that for any (Borel) measurable set ${E}$ one has

$\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)$

where the RHS is clearly the $L^{2(d+1)/(d+3)}$ norm of the characteristic function $\mathbf{1}_E$, since $\|\mathbf{1}_E\|_{L^p} = |E|^{1/p}$. Notice that we could write the inequality equivalently as $\|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}$.

1. The proof

We prove inequality (2) right away. The proof follows the $T^\ast T$ method to start: we have

$\displaystyle \begin{aligned} \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1}, d\sigma)}^2 = & \int_{\mathbb{S}^{d-1}} |\widehat{\mathbf{1}_E}|^2 d\sigma \\ = & \int_{\widehat{\mathbb{R}^d}} \overline{\widehat{\mathbf{1}_E}} \cdot (\widehat{\mathbf{1}_E} d\sigma) \\ = & \int_{\mathbb{R}^d} \mathbf{1}_E \cdot (\mathbf{1}_E \ast (d\sigma)^{\vee}) dx, \end{aligned}$

where in the last line we have used Plancherel (in the manipulations above I am being a bit liberal; you can check that everything can be justified); also, $(d\sigma)^{\vee}$ is the inverse Fourier transform of the spherical measure. Notice that the operator $f \mapsto f \ast (d\sigma)^{\vee}$ is exactly $R^\ast R$, where $R^\ast$ is the (formal) adjoint of $R$.
We will use Hölder’s inequality to bound the above, but FIRST we will split the spherical measure $d\sigma$ into two parts in terms of its frequencies, according to a cutoff that we will choose carefully at the end of the argument. This is exactly how Bourgain proceeds for the spherical maximal function!
Let therefore $\varphi$ denote a Schwartz function such that $\varphi^{\vee} \equiv 1$ on the unit ball $B(0,1)$ and $\varphi^{\vee} \equiv 0$ outside $B(0,2)$. For a certain positive parameter $\lambda$ that will be chosen later (this is the threshold that separates low frequencies from high frequencies), we define $\sigma_1, \sigma_2$ to be given by

$\displaystyle (d\sigma)^{\vee}(x) = (d\sigma)^{\vee}(x)\varphi^{\vee}(\lambda^{-1}x) + (d\sigma)^{\vee}(x)(1 - \varphi^{\vee}(\lambda^{-1}x)) =: \sigma_1^{\vee}(x) + \sigma_2^{\vee}(x);$

thus $\sigma_1$ is the low frequency part (frequencies $\lesssim \lambda$) and $\sigma_2$ is the high frequency part (frequencies $\gtrsim \lambda$).
[The terminology here is a bit misleading: the surface measure $d\sigma$ already lives in the frequency space $\widehat{\mathbb{R}^d}$, so $(d\sigma)^{\vee}$ actually lives in physical space $\mathbb{R}^{d}$; but the two are equivalent, and so we can retain this point of view.]
Using these two functions we can split the operator $R^\ast R$ into the operators $T_1, T_2$:

$\displaystyle f \ast (d\sigma)^{\vee}(x) = f \ast \sigma_1^{\vee}(x) + f\ast \sigma_2^{\vee}(x) =: T_1 f(x) + T_2 f(x);$

we have therefore

$\displaystyle \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)}^2 = \langle \mathbf{1}_E, T_1 \mathbf{1}_E \rangle + \langle \mathbf{1}_E, T_2 \mathbf{1}_E \rangle. \ \ \ \ \ \ (3)$

Precisely as in Bourgain’s case, it will not be necessary to estimate the operators $T_1, T_2$ in the same norms: we can estimate them separately in different $L^p$ spaces and then optimise in the parameter $\lambda$, taking advantage of the fact that $\|\mathbf{1}_E \|_{L^p} = |E|^{1/p}$ and therefore we can pass from one Lebesgue space to another by changing the exponent of $|E|$.
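This "split and optimise" mechanism can be illustrated with a toy numerical sketch (pure Python; the constants $A$, $B$, the exponent $\alpha = (d-1)/2$ and all names are mine, standing in for the two estimates we are about to prove): choosing $\lambda$ so that the two contributions balance is, up to a constant factor, as good as truly minimising their sum.

```python
import math

def split_bound(lam, A, B, alpha):
    # total bound: low-frequency contribution A*lam plus
    # high-frequency contribution B*lam^(-alpha)
    return A * lam + B * lam ** (-alpha)

def balanced_lam(A, B, alpha):
    # choose lam so the two contributions match: A*lam = B*lam^(-alpha)
    return (B / A) ** (1.0 / (1.0 + alpha))

A, B, alpha = 1.0, 1000.0, 0.5  # alpha = (d-1)/2 with d = 2
lam_star = balanced_lam(A, B, alpha)

# brute-force search for the true minimum of the combined bound
grid = [10 ** (k / 100.0) for k in range(-300, 500)]
true_min = min(split_bound(l, A, B, alpha) for l in grid)

# the balanced choice is within a constant factor (here 2) of the true minimum
assert split_bound(lam_star, A, B, alpha) <= 2 * true_min
```

The balancing choice is what the argument below uses: one never needs the exact minimiser, only the value of $\lambda$ at which the two estimates agree.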

It turns out that it is extremely easy to estimate $T_1, T_2$, since the spaces that do the job are very convenient ones – in the end, we will reduce to a couple of $L^\infty$ estimates, namely $\|\sigma_1\|_{L^\infty}$ and $\|\sigma_2^{\vee}\|_{L^\infty}$.

1.1. $L^1 \to L^\infty$ estimate for $T_2 \mathbf{1}_E$

Starting with $T_2 \mathbf{1}_E$, which we will estimate in $L^\infty$, we argue trivially that

$\displaystyle \|T_2 \mathbf{1}_E\|_{L^\infty(\mathbb{R}^d)} = \|\sigma_2^{\vee} \ast \mathbf{1}_E\|_{L^\infty(\mathbb{R}^d)} \leq \|\sigma_2^{\vee}\|_{L^\infty(\mathbb{R}^d)} \|\mathbf{1}_E\|_{L^1(\mathbb{R}^d)} = \|\sigma_2^{\vee}\|_{L^\infty(\mathbb{R}^d)} |E|;$

but estimating $\|\sigma_2^{\vee}\|_{L^\infty}$ is easy: we have the standard stationary-phase decay estimate $|(d\sigma)^{\vee}(x)|\lesssim |x|^{-(d-1)/2}$, and since $\sigma_2^{\vee}$ is supported where $|x| \geq \lambda$ (because $1 - \varphi^{\vee}(\lambda^{-1}x)$ vanishes on $B(0,\lambda)$), this immediately implies $\|\sigma_2^{\vee}\|_{L^\infty} \lesssim \lambda^{-(d-1)/2}$. We have shown

$\displaystyle |\langle \mathbf{1}_E, T_2 \mathbf{1}_E \rangle| \lesssim \lambda^{-(d-1)/2} |E|^2. \ \ \ \ \ \ (4)$
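As a sanity check on the decay estimate used above: in $d = 2$, with the convention $(d\sigma)^{\vee}(x) = \int_{\mathbb{S}^1} e^{i x \cdot \omega}\, d\sigma(\omega)$, the inverse Fourier transform of arc-length measure on the circle is $2\pi J_0(|x|)$, so the claimed decay is $|x|^{-1/2}$. A quick pure-Python check (the quadrature scheme and sample points are mine):

```python
import math

def j0(r, n=20000):
    # Bessel J_0 via its integral representation
    # J_0(r) = (1/pi) * integral_0^pi cos(r sin t) dt  (midpoint rule)
    h = math.pi / n
    return sum(math.cos(r * math.sin((k + 0.5) * h)) for k in range(n)) * h / math.pi

# check that sqrt(r) * |J_0(r)| stays bounded as r grows
# (asymptotically the envelope is sqrt(2/pi) ~ 0.80)
vals = [math.sqrt(r) * abs(j0(r)) for r in (10.0, 40.0, 160.0, 640.0)]
assert max(vals) < 1.0
```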

1.2. $L^2 \to L^2$ estimate for $T_1 \mathbf{1}_E$

Now we estimate $T_1\mathbf{1}_E$ instead – in $L^2$ it will be easy. Indeed, we have by Plancherel (twice)

$\displaystyle \|T_1 \mathbf{1}_E\|_{L^2(\mathbb{R}^d)} = \|\sigma_1 \widehat{\mathbf{1}_E}\|_{L^2(\widehat{\mathbb{R}^d})} \leq \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} \|\mathbf{1}_E\|_{L^2(\mathbb{R}^d)} = \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} |E|^{1/2},$

so it suffices to estimate $\|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})}$ (notice this is an estimate in frequency, unlike the one for $\sigma_2^{\vee}$ – indeed, we are using $\widehat{\mathbb{R}^d}$ to denote the space of frequencies of $\mathbb{R}^d$ for clarity). Now, by definition of $\sigma_1$ we have $\sigma_1 = d\sigma \ast \varphi_\lambda$, where $\varphi_\lambda(\xi) := \lambda^d \varphi(\lambda \xi)$ is the Fourier transform of $\varphi^{\vee}(\lambda^{-1}\cdot)$; to lighten the notation we will simply write $\varphi$ for $\varphi_\lambda$ in what follows. Let us use this information heuristically first, to guess the maximum height of $\sigma_1$. The function $|\varphi|$ is essentially a bump function concentrated on the ball $B(0,O(\lambda^{-1}))$ and normalised so that it has integral $\sim 1$; therefore $d\sigma \ast \varphi$ is an average of $d\sigma$ at scale $\lambda^{-1}$, and it should be largest at points $\xi$ close to the unit sphere $\mathbb{S}^{d-1}$, where it is about $\sigma(B(\xi,\lambda^{-1}) \cap \mathbb{S}^{d-1}) / |B(\xi,\lambda^{-1})| \sim (\lambda^{-1})^{d-1} / (\lambda^{-1})^d = \lambda$. This is our guess.
It is not hard to turn the above heuristic into a rigorous estimate. Indeed, by splitting dyadically, we have

$\displaystyle |\sigma_1(\xi)| \leq \int_{B(\xi, \lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) + \sum_{k>0} \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega),$

and

$\displaystyle \int_{B(\xi, \lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) \leq \|\varphi\|_{L^\infty} \sigma(B(\xi,\lambda^{-1}) \cap \mathbb{S}^{d-1}) \lesssim \lambda^{d} \cdot \lambda^{-(d-1)} = \lambda,$

as desired. As for the other terms, since $\varphi$ is a Schwartz function we have (taking note of the normalisation) $|\varphi(\eta)| \lesssim \lambda^{d} (1 + \lambda|\eta|)^{-N}$ for a large ${N}$; thus

$\displaystyle \begin{aligned} \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) \lesssim & \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} \lambda^{d} \lambda^{-N}|\xi - \omega|^{-N} d\sigma(\omega) \\ \sim & \lambda^{d} 2^{-kN} \sigma((B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})) \cap \mathbb{S}^{d-1}) \\ \sim & \lambda^{d} 2^{-kN} (2^k\lambda^{-1})^{d-1} = \lambda 2^{-(N-d)k}, \end{aligned}$

and summing in ${k}$ (the geometric series converges, provided we take $N > d$) we obtain another contribution of $\sim \lambda$, which proves the claim we derived heuristically.
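The estimate $\|\sigma_1\|_{L^\infty} \lesssim \lambda$ can also be tested numerically in $d = 2$, replacing the Schwartz bump by a Gaussian stand-in $\varphi_\lambda(\eta) = \lambda^2 e^{-\lambda^2|\eta|^2}$ (my choice, purely for illustration) and evaluating the convolution at a point of the circle: the height should double when $\lambda$ doubles.

```python
import math

def sigma1_height(lam, n=200000):
    # sigma_1 at the point xi = (1, 0) of the circle: integral over S^1 of the
    # Gaussian stand-in bump phi_lam(eta) = lam^2 * exp(-lam^2 |eta|^2)
    # against arc-length measure, by the midpoint rule
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        dist2 = 2.0 - 2.0 * math.cos(t)   # |xi - omega(t)|^2
        total += lam ** 2 * math.exp(-lam ** 2 * dist2)
    return total * h

# the height should grow linearly in lam: doubling lam doubles the height
r1 = sigma1_height(100.0) / sigma1_height(50.0)
r2 = sigma1_height(200.0) / sigma1_height(100.0)
assert abs(r1 - 2.0) < 0.1 and abs(r2 - 2.0) < 0.1
```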
[In Bourgain’s proof we had to estimate $\|\sup_{|y|\geq |x|}|\sigma_1(y)|\|_{L^1}$ instead, and that required a little more attention; here we only need $\|\sigma_1\|_{L^\infty}$ and this is much easier.]
Thus we have shown (by Cauchy-Schwarz)

$\displaystyle |\langle \mathbf{1}_E, T_1 \mathbf{1}_E \rangle| \lesssim \lambda |E|. \ \ \ \ \ \ (5)$

Remark: one could try to argue that $\|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} \leq \|\sigma_1^\vee\|_{L^1(\mathbb{R}^d)}$ by Hausdorff-Young and then use the decay estimates for $(d\sigma)^\vee$ again to estimate the latter; but the bound thus obtained would be too large to be of any use. Indeed, since $\sigma_1^{\vee}$ is supported in $B(0,2\lambda)$, this route only gives $\|\sigma_1^{\vee}\|_{L^1} \lesssim \int_{|x| \leq 2\lambda} (1+|x|)^{-(d-1)/2} dx \sim \lambda^{(d+1)/2}$, which is much larger than $\lambda$. Hausdorff-Young is only efficient for functions that look like Gaussians, and $\sigma_1^{\vee}$ is far from looking like one.

1.3. Concluding the argument

Putting (3), (4) and (5) together we have obtained

$\displaystyle \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)}^2 \lesssim \lambda |E| + \lambda^{-(d-1)/2}|E|^2;$

to optimise this estimate we choose $\lambda$ so that the two terms balance, that is $\lambda|E| \sim \lambda^{-(d-1)/2} |E|^2$, which gives $\lambda \sim |E|^{2/(d+1)}$. With this value of $\lambda$ the right-hand side of the above inequality becomes $\sim \lambda|E| \sim |E|^{1 + 2/(d+1)} = |E|^{(d+3)/(d+1)}$, and this is precisely the RHS of (2) squared, as desired! The proof is concluded. $\Box$
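The exponent bookkeeping in this final optimisation can be double-checked with exact arithmetic (a small sanity script of my own; the variable names are mine):

```python
from fractions import Fraction

for d in range(2, 11):
    # choose lam = |E|^beta so the two contributions balance:
    # the exponent of |E| in lam*|E| must equal the one in lam^{-(d-1)/2}*|E|^2
    beta = Fraction(2, d + 1)                 # lam ~ |E|^(2/(d+1))
    low = beta + 1                            # exponent of |E| in lam * |E|
    high = -Fraction(d - 1, 2) * beta + 2     # exponent in lam^{-(d-1)/2} * |E|^2
    assert low == high == Fraction(d + 3, d + 1)
    # half of this exponent is the restricted Stein-Tomas exponent 1/p
    assert low / 2 == Fraction(d + 3, 2 * (d + 1))
```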