Carbery's proof of the Stein-Tomas theorem

While writing the article on Bourgain’s proof of the spherical maximal function theorem I suddenly recalled another interesting proof that uses a trick very similar to Bourgain’s – and apparently directly inspired by it. Recall that the “trick” consists of the following fact: if we consider only characteristic functions as our inputs, then we can split the operator in two, estimate each part in a different Lebesgue space, and at the end combine the estimates into an estimate in a single L^p space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if R denotes the Fourier restriction operator of the sphere \mathbb{S}^{d-1} (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}

(defined initially on Schwartz functions), then

Stein-Tomas theorem: R satisfies the a priori inequality

\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)


for all exponents {p} such that 1 \leq p \leq \frac{2(d+1)}{d+3} (and this is sharp, by the Knapp example).

There are a number of proofs of this statement; originally it was proven by Tomas for every exponent except the endpoint, and then Stein combined Tomas’s argument with his complex interpolation method to obtain the endpoint as well (this remains one of the finest examples of the power of that method).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore recovers inequality (1) for all exponents 1 \leq p < \frac{2(d+1)}{d+3} by interpolation of Lorentz spaces with the p=1 case (which is a trivial consequence of the Hausdorff-Young inequality).
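[For the record, here is one standard way to carry out this last interpolation step, in sketch form: inequality (2) below says that R is of restricted strong type (p_0, 2) for p_0 = \frac{2(d+1)}{d+3} , which is the same as saying that R maps the Lorentz space L^{p_0,1} boundedly into L^2(d\sigma) ; since R also maps L^1 = L^{1,1} boundedly into L^2(d\sigma) , real interpolation gives the strong bound (1) for every exponent {p} with

\displaystyle \frac{1}{p} = (1-\theta) + \theta \, \frac{d+3}{2(d+1)}, \qquad 0 < \theta < 1,

which is exactly the open range 1 < p < \frac{2(d+1)}{d+3} .]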

In other words, Carbery proves that for any (Borel) measurable set {E} one has

\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)

where the RHS is clearly the L^{2(d+1)/(d+3)} norm of the characteristic function \mathbf{1}_E . Notice that we could write the inequality equivalently as \|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}} .
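[Explicitly: with p_0 := \frac{2(d+1)}{d+3} we have

\displaystyle \|\mathbf{1}_E\|_{L^{p_0}(\mathbb{R}^d)} = |E|^{1/p_0} = |E|^{\frac{d+3}{2(d+1)}},

so (2) is exactly the p = p_0 case of (1) tested on characteristic functions.]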

1. The proof

We prove inequality (2) right away. The proof begins with the T^\ast T method: we have

\displaystyle \begin{aligned} \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1}, d\sigma)}^2 = &  \int_{\mathbb{S}^{d-1}} |\widehat{\mathbf{1}_E}|^2 d\sigma \\ = & \int_{\widehat{\mathbb{R}^d}} \overline{\widehat{\mathbf{1}_E}} \cdot (\widehat{\mathbf{1}_E} d\sigma) \\ = & \int_{\mathbb{R}^d} \mathbf{1}_E \cdot (\mathbf{1}_E \ast (d\sigma)^{\vee}) dx, \end{aligned}

where in the last line we have used Plancherel (in the manipulations above I am being a bit liberal; you can check that everything can be justified); also, (d\sigma)^{\vee} is the inverse Fourier transform of the spherical measure. Notice that the operator f \mapsto f \ast (d\sigma)^{\vee} is exactly R^\ast R , where R^\ast is the (formal) adjoint of R .
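[To see this, fix the normalisation \widehat{f}(\xi) = \int f(x)\, e^{-2\pi i x\cdot\xi}\, dx (say). Then the adjoint computes to

\displaystyle R^{\ast} g(x) = \int_{\mathbb{S}^{d-1}} g(\omega)\, e^{2\pi i x\cdot\omega}\, d\sigma(\omega) = (g\, d\sigma)^{\vee}(x),

and therefore R^{\ast} R f = (\widehat{f}\, d\sigma)^{\vee} = f \ast (d\sigma)^{\vee} .]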
We will use Hölder’s inequality to bound the above, but FIRST we will split the spherical measure d\sigma into two parts in terms of its frequencies, according to a cutoff that we will choose carefully at the end of the argument. This is exactly how Bourgain proceeds for the spherical maximal function!
Let therefore \varphi denote a Schwartz function such that \varphi^{\vee} \equiv 1 on the unit ball B(0,1)  and \varphi^{\vee} \equiv 0 outside B(0,2)  . For a certain positive parameter \lambda that will be chosen later (this is the threshold that separates low frequencies from high frequencies), we define \sigma_1, \sigma_2 to be given by

\displaystyle (d\sigma)^{\vee}(x) = (d\sigma)^{\vee}(x)\varphi^{\vee}(\lambda^{-1}x) + (d\sigma)^{\vee}(x)(1 - \varphi^{\vee}(\lambda^{-1}x)) =: \sigma_1^{\vee}(x) + \sigma_2^{\vee}(x);

thus \sigma_1 is the low frequency part (frequencies \lesssim \lambda) and \sigma_2 is the high frequency part (frequencies \gtrsim \lambda ).
[The terminology here is a bit misleading: the surface measure d\sigma already lives in the frequency space \widehat{\mathbb{R}^d} , so (d\sigma)^{\vee} actually lives in physical space \mathbb{R}^{d} ; but physical space and frequency space play symmetric roles under the Fourier transform, and so we can retain this point of view.]
Using these two functions we can split the operator R^\ast R into the operators T_1, T_2 :

\displaystyle f \ast (d\sigma)^{\vee}(x) = f \ast \sigma_1^{\vee}(x) + f \ast \sigma_2^{\vee}(x) =: T_1 f(x) + T_2 f(x);

we have therefore

\displaystyle \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)}^2 = \langle \mathbf{1}_E, T_1 \mathbf{1}_E \rangle + \langle \mathbf{1}_E, T_2 \mathbf{1}_E \rangle. \ \ \ \ \ \ (3)


Precisely as in Bourgain’s case, it will not be necessary to estimate the operators T_1, T_2 in the same norms: we can estimate them separately in different L^p spaces and then optimise in the parameter \lambda , taking advantage of the fact that \|\mathbf{1}_E \|_{L^p} = |E|^{1/p} , so that we can pass from one Lebesgue space to another simply by changing the exponent of |E| .

It turns out that it is extremely easy to estimate T_1, T_2 , since the spaces that do the job are very convenient ones – in the end, we will reduce to a couple of L^\infty estimates, namely \|\sigma_1\|_{L^\infty} and \|\sigma_2^{\vee}\|_{L^\infty} .

1.1. L^1 \to L^\infty estimate for T_2 \mathbf{1}_E

Starting with T_2 \mathbf{1}_E , which we will estimate in L^\infty , we argue trivially that

\displaystyle \|T_2 \mathbf{1}_E\|_{L^\infty(\mathbb{R}^d)} = \|\sigma_2^{\vee}  \ast \mathbf{1}_E\|_{L^\infty(\mathbb{R}^d)} \leq \|\sigma_2^{\vee}\|_{L^\infty(\mathbb{R}^d)} \|\mathbf{1}_E\|_{L^1(\mathbb{R}^d)} =  \|\sigma_2^{\vee}\|_{L^\infty(\mathbb{R}^d)} |E|;

but estimating \|\sigma_2^{\vee}\|_{L^\infty} is easy: we have the standard stationary-phase decay estimate |(d\sigma)^{\vee}(x)|\lesssim |x|^{-(d-1)/2} , and since \sigma_2^{\vee} is supported where |x| \geq \lambda (recall that 1 - \varphi^{\vee}(\lambda^{-1}x) vanishes on B(0,\lambda) ), this immediately implies \|\sigma_2^{\vee}\|_{L^\infty} \lesssim \lambda^{-(d-1)/2} . We have shown

\displaystyle |\langle \mathbf{1}_E, T_2 \mathbf{1}_E \rangle| \lesssim \lambda^{-(d-1)/2} |E|^2. \ \ \ \ \ \ (4)
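As a quick numerical sanity check of the decay estimate – not part of the proof – note that in d = 3 the inverse Fourier transform of the spherical measure has the closed form (d\sigma)^{\vee}(x) = 2\sin(2\pi|x|)/|x| (with the normalisation of the earlier aside), which decays exactly like |x|^{-(d-1)/2} = |x|^{-1} . A minimal Monte Carlo verification in Python (all names below are mine):

import numpy as np

# d = 3: (dσ)^∨(x) = ∫_{S²} e^{2πi x·ω} dσ(ω) = 2 sin(2π|x|)/|x|,
# hence |(dσ)^∨(x)| ≤ 2/|x|: decay of order |x|^{-(d-1)/2} with d = 3.
rng = np.random.default_rng(0)
omega = rng.normal(size=(1_000_000, 3))
omega /= np.linalg.norm(omega, axis=1, keepdims=True)  # uniform samples on S²

for r in [2.3, 5.7, 11.4]:
    x = np.array([0.0, 0.0, r])
    # Monte Carlo average of the character over S², times the total area 4π:
    mc = 4 * np.pi * np.mean(np.exp(2j * np.pi * (omega @ x))).real
    exact = 2 * np.sin(2 * np.pi * r) / r
    print(f"|x| = {r:4.1f}   MC = {mc:+.3f}   closed form = {exact:+.3f}   2/|x| = {2/r:.3f}")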

1.2. L^2 \to L^2 estimate for T_1 \mathbf{1}_E

Now we estimate T_1\mathbf{1}_E instead – in L^2 it will be easy. Indeed, we have by Plancherel (twice)

\displaystyle \|T_1 \mathbf{1}_E\|_{L^2(\mathbb{R}^d)} = \|\sigma_1 \widehat{\mathbf{1}_E}\|_{L^2(\widehat{\mathbb{R}^d})} \leq \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} \|\mathbf{1}_E\|_{L^2(\mathbb{R}^d)} = \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} |E|^{1/2},

so it suffices to estimate \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} (notice this is in frequency, unlike the estimate for \sigma_2 – indeed, we are using \widehat{\mathbb{R}^d} to refer to the space of frequencies of \mathbb{R}^d for clarity). Now, by definition of \sigma_1 we have \sigma_1 = d\sigma \ast \varphi_\lambda , where \varphi_\lambda(\xi) := \lambda^d \varphi(\lambda \xi) ; to lighten the notation, in the rest of this subsection we will simply write \varphi for \varphi_\lambda . Let us use this information heuristically first, to guess the maximum height of \sigma_1 . We have that |\varphi| is essentially a bump function concentrated on the ball B(0,O(\lambda^{-1})) and normalised so that it has integral \sim 1 ; therefore d\sigma \ast \varphi is an average of d\sigma at scale \lambda^{-1} , and it should be largest close to the unit sphere \mathbb{S}^{d-1}, where it is about \sigma(B(\lambda^{-1}) \cap \mathbb{S}^{d-1}) / |B(\lambda^{-1})| \sim (\lambda^{-1})^{d-1} / (\lambda^{-1})^d = \lambda . This is our guess.
It is not hard to turn the above heuristic into a rigorous estimate. Indeed, by splitting dyadically, we have

\displaystyle |\sigma_1(\xi)| \leq \int_{B(\xi, \lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) + \sum_{k>0} \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega),

and

\displaystyle \int_{B(\xi, \lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) \leq \|\varphi\|_{L^\infty} \sigma(B(\xi,\lambda^{-1}) \cap \mathbb{S}^{d-1}) \lesssim \lambda^{d} \cdot \lambda^{-(d-1)} = \lambda,

as desired. As for the other terms, since \varphi is a Schwartz function we have (taking note of the normalisation) |\varphi(\eta)| \lesssim \lambda^{d} (1 + \lambda|\eta|)^{-N} for a large {N} ; thus

\displaystyle \begin{aligned} \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} |\varphi(\xi - \omega)| d\sigma(\omega) \lesssim &  \int_{B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})} \lambda^{d} \lambda^{-N}|\xi - \omega|^{-N} d\sigma(\omega) \\ \sim & \lambda^{d} 2^{-kN} \sigma((B(\xi, 2^{k+1}\lambda^{-1}) \backslash B(\xi, 2^{k}\lambda^{-1})) \cap \mathbb{S}^{d-1}) \\ \sim & \lambda^{d} 2^{-kN} (2^k\lambda^{-1})^{d-1} = \lambda 2^{-(N-d)k}, \end{aligned}

and summing the geometric series in {k} (which converges since we may take N > d ) we obtain another contribution of \sim \lambda , so the claim derived heuristically is proven to be correct.
[In Bourgain’s proof we had to estimate \|\sup_{|y|\geq |x|}|\sigma_1(y)|\|_{L^1} instead, and that required a little more attention; here we only need \|\sigma_1\|_{L^\infty} and this is much easier.]
Thus we have shown (by Cauchy-Schwarz)

\displaystyle |\langle \mathbf{1}_E, T_1 \mathbf{1}_E \rangle| \lesssim \lambda |E|. \ \ \ \ \ \ (5)
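Another quick numerical sanity check, again not part of the proof: in d = 3 one can compute the height of d\sigma \ast \varphi_\lambda at a point of the sphere and watch it grow linearly in \lambda . In the sketch below a normalised Gaussian of width \lambda^{-1} stands in for the bump \varphi_\lambda (an assumption made purely for illustration; any normalised bump at that scale should behave the same way up to constants):

import numpy as np
from scipy.integrate import quad

# d = 3 check of the heuristic ||σ₁||_∞ ~ λ, with a normalised Gaussian of
# width 1/λ standing in for the rescaled bump φ_λ (illustration only).
def smoothed_height(lam):
    # (dσ * φ_λ) evaluated at the pole ξ = (0, 0, 1), where it is largest;
    # this reduces to a 1-D integral via |ξ - ω|² = 2 - 2cos(t) for ω ∈ S².
    phi = lambda r: (lam / np.sqrt(2 * np.pi))**3 * np.exp(-(lam * r)**2 / 2)
    integrand = lambda t: 2 * np.pi * phi(np.sqrt(2.0 - 2.0 * np.cos(t))) * np.sin(t)
    return quad(integrand, 0.0, np.pi, points=[5.0 / lam], limit=200)[0]

for lam in [10, 40, 160]:
    # the ratio stabilises as λ grows (to 1/√(2π) ≈ 0.40 for this Gaussian)
    print(lam, smoothed_height(lam) / lam)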

Remark: one could argue that \|\sigma_1\|_{L^\infty(\widehat{\mathbb{R}^d})} \leq \|\sigma_1^\vee\|_{L^1(\mathbb{R}^d)} by Hausdorff-Young and then use again the decay estimates for (d\sigma)^\vee to estimate the latter; but the bound thus obtained would be too large to be of any use. Indeed, Hausdorff-Young is only efficient for functions that look like Gaussians, and \sigma_1^\vee is far from looking like one.
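[Concretely, a back-of-the-envelope computation: \sigma_1^{\vee} is supported in the ball B(0,2\lambda) , so the decay estimate for (d\sigma)^{\vee} only gives

\displaystyle \|\sigma_1^{\vee}\|_{L^1(\mathbb{R}^d)} \lesssim \int_{|x| \leq 2\lambda} (1+|x|)^{-\frac{d-1}{2}} \, dx \sim \lambda^{\frac{d+1}{2}},

which is much larger than the \sim \lambda obtained above.]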

1.3. Concluding the argument

Putting (3), (4) and (5) together we have obtained

\displaystyle \|\widehat{\mathbf{1}_E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)}^2 \lesssim \lambda |E| + \lambda^{-(d-1)/2}|E|^2;

to optimise this estimate we need to choose \lambda so that \lambda|E| \sim \lambda^{-(d-1)/2} |E|^2 , that is \lambda \sim |E|^{2/(d+1)}. With this value of \lambda the RHS of the above inequality becomes \sim \lambda|E| \sim |E|^{1 + 2/(d+1)} = |E|^{(d+3)/(d+1)} , and this is precisely the RHS of (2) squared, as desired! The proof is concluded. \Box
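If you want to double-check the exponent bookkeeping mechanically, here is a small sympy sketch (the variable names are of course mine):

import sympy as sp

# Exponent bookkeeping for the optimisation in λ at the end of the proof.
d, E = sp.symbols('d E', positive=True)
lam = E**(2 / (d + 1))                   # the balancing choice λ ~ |E|^{2/(d+1)}
term_T1 = lam * E                        # contribution of T₁, from (5)
term_T2 = lam**(-(d - 1) / 2) * E**2     # contribution of T₂, from (4)
print(sp.simplify(term_T1 - term_T2))    # 0: the two terms balance exactly
print(sp.simplify(term_T1))              # E**((d + 3)/(d + 1)): the RHS of (2), squared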
