# Bourgain's proof of the spherical maximal function theorem

I recently presented Stein’s proof of the boundedness of the spherical maximal function in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

$\displaystyle \mathscr{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,$

where $A_t$ denotes the spherical average at radius ${t}$, that is

$\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),$

where $d\sigma_{d-1}$ denotes the spherical measure on the $(d-1)$-dimensional sphere (we will omit the subscript from now on and just write $d\sigma$ since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: Let $d \geq 3$. The maximal operator $\mathscr{M}_{\mathbb{S}^{d-1}}$ is $L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)$ bounded for any $\frac{d}{d-1} < p \leq \infty$.

There is, however, an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler: instead of splitting the function into countably many dyadic frequency pieces, it splits the spherical measure into only two frequency pieces. The other ingredients of the two proofs are otherwise pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives, and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus. Remember that Stein’s argument essentially shows $L^p \to L^p$ boundedness of the operator for every $\frac{d}{d-1} < p \leq 2$ quite directly; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for $\mathscr{M}_{\mathbb{S}^{d-1}}$! The latter means that for any measurable $E$ of finite (Lebesgue) measure we have

$\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathscr{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)$

which is exactly the $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the $L^{d/(d-1),1} \to L^{d/(d-1),\infty}$ inequality). The downside of Bourgain’s argument is that it only works in dimensions $d \geq 4$, and thus misses the dimension $d=3$, which is instead covered by Stein’s theorem.

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known – it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (III) on his positional-notation trick for sets.

1. Overview of the proof

As we mentioned above, the main conceptual difference between Stein’s proof and Bourgain’s proof is the following:

• Stein splits the function ${f}$ into countably many pieces, each being frequency-localised on a dyadic annulus:

$\displaystyle f = P_{\leq k}f + \sum_{j > k} P_j f,$

where $\widehat{P_j f}(\xi) = \psi(2^{-j}\xi)\widehat{f}(\xi)$ with $\psi$ a smooth bump function supported in $\{\xi \; : \; 1/2\leq |\xi| \leq 2\}$, and $P_{\leq k} = \sum_{j\leq k} P_j$ (here ${k}$ is fixed by the fact that we are estimating $\sup_{r \sim 2^{-k}} |A_r f|$).

• Bourgain, on the other hand, does not split the function ${f}$ at all, and instead splits the spherical measure $d\sigma$ itself in frequency in just two parts:

$\displaystyle d\sigma = d\sigma^{1} + d\sigma^{2},$

where $d\sigma^{1}$ is the low-frequency part of $d\sigma$ and $d\sigma^{2}$ is the high-frequency part, and the threshold separating high from low is carefully chosen according to $\alpha$ (in the restricted weak-type endpoint estimate (1) above).

With this in mind, we proceed to illustrate the structure of the proof.

First of all, observe that to prove the boundedness of the spherical maximal function in the full range it will suffice to prove (1). Indeed, we can use interpolation of Lorentz spaces to interpolate that inequality with the trivial $L^{\infty} \to L^{\infty}$ inequality to conclude strong $L^p \to L^p$ boundedness for any exponent $p \in \big(\frac{d}{d-1}, \infty\big]$, as detailed in an older post.

Let $\varphi$ be a Schwartz function with the properties that $\widehat{\varphi}$ is compactly supported in $B(0,2)$ and $\widehat{\varphi}(\xi) = 1$ if $|\xi| \leq 1$ (that is, the Fourier transform of $\varphi$ is a smooth bump function). For any $\lambda >0$ we let $\varphi_\lambda(x) := \lambda^d \varphi(\lambda x)$. Then $\int \varphi_\lambda = \widehat{\varphi}(0) = 1$, the $L^1$ norm $\|\varphi_\lambda\|_{L^1} = \|\varphi\|_{L^1}$ is independent of $\lambda$, and $\varphi_\lambda$ is concentrated in the ball of radius $\lambda^{-1}$. We then decompose the spherical measure $d\sigma$ as follows: for a value of $\lambda$ that will be chosen later (it is convenient to think of it as being quite large), let

$\displaystyle \sigma^1 := d\sigma \ast \varphi_\lambda,$

and consequently let $\sigma^2 := d\sigma - \sigma^1$, so that we have the decomposition

$\displaystyle d\sigma = \sigma^1 + \sigma^2.$

Notice that in frequency this decomposition reads

$\displaystyle \widehat{d\sigma}(\xi) = \widehat{d\sigma}(\xi) \widehat{\varphi}(\lambda^{-1}\xi) + \widehat{d\sigma}(\xi) (1 - \widehat{\varphi}(\lambda^{-1}\xi)),$

and by the properties of $\varphi$ we have therefore that $\sigma^1$ consists of the low-frequency part of $d\sigma$ and $\sigma^2$ consists of the high-frequency part, the threshold between the two being roughly $\lambda$ (that we have to choose). We split the spherical maximal function $\mathscr{M}_{\mathbb{S}^{d-1}}$ accordingly: define

$\displaystyle \begin{aligned} M^{(1)}f := & \sup_{t>0} | f \ast \sigma^1_t|, \\ M^{(2)}f := & \sup_{t>0} | f \ast \sigma^2_t|, \end{aligned}$

where, for future convenience, we adopt the notation opposite to the one used for $\varphi_\lambda$. That is, here we set $\sigma^1_t(y) := \frac{1}{t^d} \sigma^1\big(\frac{y}{t}\big)$, so that $f \ast \sigma^1_t (x) = \int f(x-y) \frac{1}{t^d} \sigma^1\Big(\frac{y}{t}\Big) \,dy$ (and similarly for $\sigma^2_t$). Then we have pointwise

$\displaystyle \mathscr{M}_{\mathbb{S}^{d-1}}f \leq M^{(1)}f + M^{(2)}f;$

in the interest of proving (1), we can use this pointwise inequality to bound (for a generic function ${f}$ instead of a characteristic function only)

$\displaystyle \begin{aligned} |\{x \in \mathbb{R}^d \; : \; \mathscr{M}_{\mathbb{S}^{d-1}}f(x) > \alpha \}| \leq & \Big|\Big\{x \in \mathbb{R}^d \; : \; M^{(1)}f(x) > \frac{\alpha}{2} \Big\}\Big| \\ & + \Big|\Big\{x \in \mathbb{R}^d \; : \; M^{(2)}f(x) > \frac{\alpha}{2} \Big\}\Big|. \end{aligned}$
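To make the two-piece split tangible, here is a minimal Python sketch of the radial multipliers $\widehat{\varphi}(\lambda^{-1}\xi)$ and $1 - \widehat{\varphi}(\lambda^{-1}\xi)$. These are the factors multiplying $\widehat{d\sigma}$ in $\sigma^1$ and $\sigma^2$. The specific bump (built from the classical $e^{-1/x}$ construction) and the function names are illustrative assumptions; any $\widehat{\varphi}$ with the properties stated above would do. The sketch shows that the high-frequency multiplier vanishes identically for $|\xi| \leq \lambda$ and the low-frequency one vanishes for $|\xi| \geq 2\lambda$.

```python
import math


def _h(x: float) -> float:
    """Smooth function vanishing to infinite order at 0: e^{-1/x} for x > 0, else 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0


def smooth_step(t: float) -> float:
    """C^infinity step: 0 for t <= 0, 1 for t >= 1 (the denominator never vanishes)."""
    return _h(t) / (_h(t) + _h(1.0 - t))


def phi_hat(rho: float) -> float:
    """Hypothetical radial bump: equals 1 for rho <= 1 and 0 for rho >= 2."""
    return 1.0 - smooth_step(rho - 1.0)


def split_multipliers(rho: float, lam: float) -> tuple:
    """Factors multiplying hat(dsigma) at |xi| = rho in sigma^1 (low) and sigma^2 (high)."""
    low = phi_hat(rho / lam)
    return low, 1.0 - low


if __name__ == "__main__":
    lam = 50.0
    for rho in (0.0, 25.0, 50.0, 75.0, 100.0, 400.0):
        low, high = split_multipliers(rho, lam)
        print(f"|xi| = {rho:6.1f}: low = {low:.4f}, high = {high:.4f}")
```

The transition region $\lambda \leq |\xi| \leq 2\lambda$ is the only place where both pieces are nonzero. This is why the threshold between low and high frequencies is only "roughly $\lambda$".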

At this point, we might be tempted to look for an $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ inequality for both $M^{(1)}$ and $M^{(2)}$ to conclude. We will not be able to do so; we will not even be able to bound the operators $M^{(1)}, M^{(2)}$ in the same Lebesgue spaces! This will not be an insurmountable obstacle, though. The only downside is that we will have to give up any hope of proving a weak-type inequality and content ourselves with a restricted weak-type inequality instead (although see the last section of this post for why we are not really giving up anything). The underlying mechanism is general, and very important to keep in mind when proving inequalities by splitting operators. For a characteristic function $f = \mathbf{1}_E$, all the norms $\|f\|_{L^p} = |E|^{1/p}$ are powers of the single quantity $|E|$. Bounds for the pieces in different spaces can therefore still be combined, with the splitting parameter chosen depending on the level $\alpha$. Let us see how the argument works.
We will show that for arbitrary functions ${f}$ we have

$\displaystyle |\{x \in \mathbb{R}^d \; : \; M^{(1)}f(x) > \alpha \}| \lesssim \frac{\lambda}{\alpha} \|f\|_{L^1}, \ \ \ \ \ \ (2)$

and that

$\displaystyle \| M^{(2)}f \|_{L^2} \lesssim \lambda^{-(d-2)/2} \|f\|_{L^2}. \ \ \ \ \ \ (3)$

The first one is an $L^1 \to L^{1,\infty}$ estimate and the second one is an $L^2 \to L^2$ estimate. But look at what happens when we feed these inequalities a characteristic function $f = \mathbf{1}_E$: inequality (2) says that

$\displaystyle \Big|\Big\{x \in \mathbb{R}^d \; : \; M^{(1)}\mathbf{1}_{E}(x) > \frac{\alpha}{2} \Big\}\Big| \lesssim \frac{\lambda}{\alpha} |E|,$

and inequality (3) says, by Chebyshev’s inequality, that

$\displaystyle \Big|\Big\{x \in \mathbb{R}^d \; : \; M^{(2)}\mathbf{1}_{E}(x) > \frac{\alpha}{2} \Big\}\Big| \lesssim \frac{\lambda^{-(d-2)}}{\alpha^2} |E|,$

which together imply

$\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathscr{M}_{\mathbb{S}^{d-1}}\mathbf{1}_{E}(x) > \alpha \}| \lesssim \Big(\frac{\lambda}{\alpha} + \frac{\lambda^{-(d-2)}}{\alpha^2}\Big) |E|$

by the pointwise domination. Now recall that we have not yet chosen the parameter $\lambda$. If we choose $\lambda$ so as to (essentially) optimise the factor above, that is, so as to make its two terms equal, we see that we have to choose¹ $\lambda = \alpha^{-1/(d-1)}$, which gives

$\displaystyle \frac{\lambda}{\alpha} + \frac{\lambda^{-(d-2)}}{\alpha^2} = \alpha^{-1 -1/(d-1)} + \alpha^{-2 +(d-2)/(d-1)} = 2\alpha^{-d/(d-1)},$

and therefore we obtain precisely (1) !
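The choice of $\lambda$ is elementary calculus, and a few lines of Python make the bookkeeping visible. The sketch below (function names are ours) compares the balancing choice $\lambda = \alpha^{-1/(d-1)}$ with the exact minimiser of $\lambda/\alpha + \lambda^{-(d-2)}/\alpha^2$. Setting the derivative to zero gives that minimiser as $\lambda^{d-1} = (d-2)/\alpha$. Both choices give a factor comparable to $\alpha^{-d/(d-1)}$. Balancing gives exactly twice $\alpha^{-d/(d-1)}$, while the true minimum is a constant depending only on $d$ (between $1$ and $2$) times $\alpha^{-d/(d-1)}$. The difference is invisible in the $\lesssim$ of (1).

```python
def level_set_factor(lam: float, alpha: float, d: int) -> float:
    """Factor lam/alpha + lam^{-(d-2)}/alpha^2 multiplying |E| in the level-set bound."""
    return lam / alpha + lam ** (-(d - 2)) / alpha ** 2


def balanced_lambda(alpha: float, d: int) -> float:
    """Makes the two terms equal: lam^{d-1} = 1/alpha."""
    return alpha ** (-1.0 / (d - 1))


def optimal_lambda(alpha: float, d: int) -> float:
    """Critical point of level_set_factor in lam: lam^{d-1} = (d-2)/alpha."""
    return ((d - 2) / alpha) ** (1.0 / (d - 1))


if __name__ == "__main__":
    for d in (4, 5, 10):
        for alpha in (1e-2, 1e-4, 1e-6):
            target = alpha ** (-d / (d - 1))  # the power of alpha in (1)
            bal = level_set_factor(balanced_lambda(alpha, d), alpha, d) / target
            opt = level_set_factor(optimal_lambda(alpha, d), alpha, d) / target
            # bal is 2 up to rounding; opt is a d-dependent constant between 1 and 2
            print(f"d={d}, alpha={alpha:.0e}: balanced {bal:.4f}, optimal {opt:.4f}")
```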

In the rest of this post we will therefore prove inequalities (2) and (3), each in its own section.

2. Proof of inequality (2) for $M^{(1)}$

This inequality is the easier of the two. It will follow immediately from the pointwise bound

$\displaystyle M^{(1)}f \lesssim \lambda Mf,$

where ${M}$ here is the Hardy-Littlewood maximal function (it is instructive to keep track of the parallels between this proof and Stein’s proof). Heuristically, this inequality is straightforward. Observe that $\sigma^1 = d\sigma \ast \varphi_\lambda$ is essentially concentrated in the $\lambda^{-1}$-neighbourhood of the unit sphere $\mathbb{S}^{d-1}$ (a shell), has integral $1$ and has $L^1$ norm $O(1)$. Since the volume of the neighbourhood is $\sim \lambda^{-1}$, $\sigma^1$ is of size $\sim \lambda$ in this shell and nearly vanishes everywhere else. At scale ${t}$, the shell carrying $\sigma^1_t$ is the difference of two balls of radii $\sim t$ that differ by $\sim t\lambda^{-1}$. Splitting $f \ast \sigma^1_t$ accordingly into two terms, we see that each of them is dominated (morally) by $\lesssim \lambda \frac{1}{|B(0,t)|}\int_{B(0,t)}|f(x - y)|\,dy$. The claimed bound follows once we take the supremum in ${t}$.

Now we turn the heuristic into a rigorous argument, although the result will be a bit cumbersome. We will show that for a large ${N}$ we have the coarse estimate

$\displaystyle |\sigma^1(y)| \lesssim_N \lambda \mathbf{1}_{B(0,3/2)}(y) + \lambda^{d-N} \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) |y|^{-N}; \ \ \ \ \ \ (4)$

as a consequence, the radial non-increasing majorant of $\sigma^1$, defined as $\sigma^\ast(x) := \sup_{y \,:\,|y|\geq |x|} |\sigma^1(y)|$, is dominated by the same RHS above (since the latter is already radial and non-increasing). Therefore, by a trivial calculation, we have $\|\sigma^\ast\|_{L^1} \lesssim \lambda$. This is enough to argue that $M^{(1)}f \lesssim \lambda Mf$, by a very standard argument (see Stein’s “Harmonic Analysis: real-variable methods, orthogonality and oscillatory integrals”, Ch. II, § 2.1).
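For the record, here is the "trivial calculation" together with the standard argument, written out in LaTeX under the assumptions $N > d$ and $\lambda \geq 1$. Here $|\mathbb{S}^{d-1}|$ denotes the surface area of the unit sphere, and $\sigma^\ast_t(y) := t^{-d}\sigma^\ast(y/t)$; this notation for the dilates of $\sigma^\ast$ is ours. The first line integrates (4) in polar coordinates and uses $\lambda^{d-N} \leq \lambda$. The second line uses $|\sigma^1_t| \leq \sigma^\ast_t$ and the standard lemma $\int |f(x-y)|K(y)\,dy \leq \|K\|_{L^1} Mf(x)$ for radial non-increasing integrable $K$, together with $\|\sigma^\ast_t\|_{L^1} = \|\sigma^\ast\|_{L^1}$.

```latex
\begin{aligned}
\|\sigma^\ast\|_{L^1}
  &\lesssim_N \lambda\,|B(0,3/2)| + \lambda^{d-N}\int_{|y|\geq 3/2}|y|^{-N}\,dy
   = \lambda\,|B(0,3/2)| + \lambda^{d-N}\,|\mathbb{S}^{d-1}|\,\frac{(3/2)^{d-N}}{N-d}
   \lesssim_N \lambda, \\[4pt]
\sup_{t>0}|f\ast\sigma^1_t(x)|
  &\leq \sup_{t>0}\int_{\mathbb{R}^d} |f(x-y)|\,\sigma^\ast_t(y)\,dy
   \leq \|\sigma^\ast\|_{L^1}\, Mf(x) \lesssim \lambda\, Mf(x).
\end{aligned}
```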
We will prove something stronger than the bound (4). Since we have the estimate $|\varphi(x)| \lesssim_N (1 + |x|)^{-N}$ for any ${N}>0$ (due to smoothness of $\widehat{\varphi}$), we have

$\displaystyle |\sigma^1(y)| = \Big|\int_{\mathbb{S}^{d-1}} \varphi_\lambda (y-\omega)d\sigma(\omega)\Big| \lesssim_N \lambda^d \int_{\mathbb{S}^{d-1}} (1 + \lambda|y-\omega|)^{-N} d\sigma(\omega)$

and it will therefore make sense to partition according to the possible dyadic values that $\lambda|y-\omega|$ can take. This is what we do next.
For $k \in \mathbb{N}\backslash\{0\}$ such that $2^k \leq \lambda$, define the shells

$\displaystyle S_k := \{z \in \mathbb{R}^d \; : \; 2^{k}\lambda^{-1} < \mathrm{dist}(z,\mathbb{S}^{d-1}) \leq 2^{k+1} \lambda^{-1}\},$

and for $k =0$ let instead

$\displaystyle S_0 := \{z \in \mathbb{R}^d \; : \; \mathrm{dist}(z,\mathbb{S}^{d-1}) \leq 2\lambda^{-1}\}.$

The sets $S_k$ look like pairs of concentric shells (except for the one where $\lambda/2 \leq 2^k \leq \lambda$, which looks like a shell and a ball; but this is just an inconsequential observation). The complement of the union of all the shells $S_k$ is the complement of a certain ball and it is clearly contained in, say, the complement of $B(0,3/2)$; thus we can bound

$\displaystyle |\sigma^1(y)| \leq \sum_{k \in \mathbb{N} \,: 2^k \leq \lambda} \mathbf{1}_{S_k}(y) |\sigma^1(y)| + \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) |\sigma^1(y)|.$

The rightmost term on the RHS is easily bounded pointwise: since in $\mathbb{R}^d \backslash B(0,3/2)$ we have $|y-\omega| \sim |y|$ for any $\omega \in \mathbb{S}^{d-1}$, we simply have

$\displaystyle \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) |\sigma^1(y)| \lesssim \lambda^{d} \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) \int_{\mathbb{S}^{d-1}}(\lambda|y|)^{-N} d\sigma \sim \lambda^{d-N} \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) |y|^{-N},$

which gives us the second term in (4). For the shells instead, define the following subsets of the sphere: for $j \in \mathbb{Z}$

$\displaystyle \Sigma_{j,y} := \{\omega \in \mathbb{S}^{d-1} \; : \; 2^{j}\lambda^{-1} < |y-\omega| \leq 2^{j+1}\lambda^{-1}\}.$

It is immediate that if $y \in S_k$ then $\Sigma_{j,y} = \emptyset$ for all $j < k$ (and similarly, once $2^j \lambda^{-1} \gg 1$, the set $\Sigma_{j,y}$ will be empty). It is also clear that the sets $\Sigma_{j,y}$ partition the sphere, up to the single point $\omega = y$ when $y \in \mathbb{S}^{d-1}$, which has measure zero. Finally, observe that the typical set $\Sigma_{j,y}$ is geometrically very close to being a dyadic annulus of radius $\sim 2^j \lambda^{-1}$ in $(d-1)$ dimensions. In any case $\Sigma_{j,y} \subset \mathbb{S}^{d-1} \cap B(y, 2^{j+1}\lambda^{-1})$, and in particular $\sigma(\Sigma_{j,y}) \lesssim (2^j \lambda^{-1})^{d-1}$.
With $k \in \mathbb{N}\backslash\{0\}$ fixed, $y \in S_k$ and $N > d-1$, we can therefore write (only the sets $\Sigma_{j,y}$ with $j \geq k$ contribute)

$\displaystyle \begin{aligned} \lambda^d \int_{\mathbb{S}^{d-1}} (1 + \lambda|y-\omega|)^{-N} d\sigma(\omega) = & \lambda^d \sum_{j \geq k} \int_{\Sigma_{j,y}} (1 + \lambda|y-\omega|)^{-N} d\sigma(\omega) \\ \sim & \lambda^d \sum_{j \geq k} 2^{-j N} \sigma(\Sigma_{j,y}) \\ \lesssim & \lambda^d \sum_{j \geq k} 2^{-j N} 2^{j(d-1)} \lambda^{-(d-1)} \\ \sim & \lambda 2^{-k(N - d + 1)}; \end{aligned}$

a similar argument for $S_0$ shows that $|\sigma^1(y)| \lesssim \lambda$ when $y \in S_0$. Thus we have shown that

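$\displaystyle |\sigma^1(y)| \lesssim_N \lambda \sum_{k \in \mathbb{N} \,: 2^k \leq \lambda} \mathbf{1}_{S_k}(y) 2^{-k N'} + \lambda^{d-N} \mathbf{1}_{\mathbb{R}^d \backslash B(0,3/2)}(y) |y|^{-N}, \qquad N' := N - d + 1 > 0.$

It remains to compare the sum over the shells with the RHS of (4). The shells $S_k$ are pairwise disjoint, so the sum is at most $1$ everywhere, and in particular at most $\mathbf{1}_{B(0,3/2)}(y)$ on $B(0,3/2)$. Outside $B(0,3/2)$, a point $y \in S_k$ satisfies $2^{k+1}\lambda^{-1} \geq \mathrm{dist}(y,\mathbb{S}^{d-1}) \geq 1/2$, that is, $2^k \geq \lambda/4$, and also $|y| \leq 3$. Hence there $\lambda 2^{-kN'} \lesssim \lambda^{1-N'} = \lambda^{d-N} \lesssim \lambda^{d-N}|y|^{-N}$. This proves (4).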
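As a sanity check on (4), one can evaluate the envelope $\lambda^d\int_{\mathbb{S}^{d-1}}(1+\lambda|y-\omega|)^{-N}\,d\sigma(\omega)$ numerically. The Python sketch below does this in $d = 3$, where (by Archimedes' hat-box theorem) the distance $t = |y-\omega|$ from a point with $|y| = r > 0$ to a uniformly distributed $\omega \in \mathbb{S}^2$ has density $t/(2r)$ on $[|r-1|, r+1]$. Several choices here are our own simplifications: working in $d=3$, replacing $\varphi$ by its decay envelope $(1+|x|)^{-N}$, and the explicit constants ($1$ inside $B(0,3/2)$, $3^N$ outside). The argument for (4) itself works in every dimension. For $N = 6$ the value on the sphere ($r = 1$) is about $\lambda/40$, comfortably below $\lambda$.

```python
def sigma1_envelope(r: float, lam: float, N: int, n: int = 20000) -> float:
    """Midpoint-rule value of lam^3 * int_{S^2} (1 + lam|y - omega|)^{-N} dsigma(omega), |y| = r > 0.

    dsigma is normalised surface measure on S^2, so t = |y - omega| has density
    t / (2r) on the interval [|r - 1|, r + 1].
    """
    a, b = abs(r - 1.0), r + 1.0
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h  # midpoint of the i-th cell
        total += (1.0 + lam * t) ** (-N) * t
    return lam ** 3 * total * h / (2.0 * r)


def bound_4(r: float, lam: float, N: int, d: int = 3) -> float:
    """Right-hand side of (4) with the explicit constants 1 (inside) and 3^N (outside)."""
    return lam if r < 1.5 else 3.0 ** N * lam ** (d - N) * r ** (-N)


if __name__ == "__main__":
    lam, N = 100.0, 6
    for r in (0.5, 0.9, 1.0, 1.1, 1.5, 2.0, 3.0):
        ratio = sigma1_envelope(r, lam, N) / bound_4(r, lam, N)
        print(f"r = {r}: envelope / bound = {ratio:.3e}")
```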

3. Proof of inequality (3) for $M^{(2)}$

The proof of (3) will boil down to just a few routine $L^2$ estimates, using the oscillatory integral bounds for $\widehat{d\sigma}$ and its derivatives. It is in these estimates that the restriction $d\geq 4$ will arise.

Recall that now $\sigma^2_t(y) := \frac{1}{t^d} \sigma^2\big(\frac{y}{t}\big)$. We need a way to estimate the supremum $\sup_{t>0} |f \ast \sigma^2_t(x)|$, and we will proceed analogously to Stein’s proof by using the Fundamental Theorem of Calculus. In that case there was some transparent smoothness in (the equivalent of) the parameter ${t}$, due to the frequency localisation imposed on ${f}$, which made it locally roughly constant at a certain scale. In this case it is not too clear to me why we should expect a similar smoothness in ${t}$; however, the proof works, so we proceed anyway.
Recall the trick we want to use: if $\phi$ is a differentiable function with $\phi(0)=0$, we can write by Cauchy-Schwarz (here $\phi$ has nothing to do with the bump $\varphi$ and may be complex-valued, since $\frac{d}{ds}\frac{1}{2}|\phi(s)|^2 = \mathrm{Re}\big(\overline{\phi(s)}\phi'(s)\big)$)

$\displaystyle \begin{aligned} \sup_{t>0} \frac{1}{2}|\phi(t)|^2 = & \sup_{t>0} \Big|\int_{0}^{t} \mathrm{Re}\big(\overline{\phi(s)} \phi'(s)\big) \,ds \Big| \\ \leq & \int_{0}^{\infty} |\phi(s)||\phi'(s)| \,ds \\ \leq & \Big(\int_{0}^{\infty} |\phi(s)|^2 \,\frac{ds}{s} \Big)^{1/2} \Big(\int_{0}^{\infty} |s \phi'(s)|^2 \,\frac{ds}{s} \Big)^{1/2}; \end{aligned}$

in this case, unlike what we did in Stein’s proof (see inequality (7) there), we can stop here. We will not need to rebalance the two factors, because the geometric mean of their bounds will turn out to be exactly what we need.
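The Sobolev-type inequality above is easy to test numerically. In the Python sketch below we plug in the test function $\phi(s) = s e^{-s}$ (our choice; it satisfies $\phi(0) = 0$). Both integrals against $ds/s$ are computed after the substitution $s = e^u$, which turns $ds/s$ into $du$, and the supremum is approximated on a grid. For this $\phi$ the exact values are $\sup \frac12|\phi|^2 = e^{-2}/2 \approx 0.068$, $\int |\phi|^2\,\frac{ds}{s} = 1/4$ and $\int |s\phi'|^2\,\frac{ds}{s} = 1/8$, so the right-hand side is $1/\sqrt{32} \approx 0.177$.

```python
import math


def integral_ds_over_s(g, u_min: float = -20.0, u_max: float = 5.0, n: int = 200_000) -> float:
    """Approximate int_0^inf g(s) ds/s by the midpoint rule in u = log s (so ds/s = du)."""
    h = (u_max - u_min) / n
    return h * sum(g(math.exp(u_min + (i + 0.5) * h)) for i in range(n))


def sobolev_sides(phi, dphi, s_max: float = 50.0, n_grid: int = 100_000) -> tuple:
    """Return (grid sup of |phi|^2 / 2, geometric mean of the two square integrals)."""
    lhs = 0.5 * max(abs(phi(s_max * (i + 1) / n_grid)) ** 2 for i in range(n_grid))
    first = integral_ds_over_s(lambda s: abs(phi(s)) ** 2)
    second = integral_ds_over_s(lambda s: abs(s * dphi(s)) ** 2)
    return lhs, math.sqrt(first * second)


def phi(s: float) -> float:
    """Test function with phi(0) = 0 (our choice)."""
    return s * math.exp(-s)


def dphi(s: float) -> float:
    """Its derivative."""
    return (1.0 - s) * math.exp(-s)


if __name__ == "__main__":
    lhs, rhs = sobolev_sides(phi, dphi)
    print(f"sup |phi|^2 / 2 = {lhs:.6f} <= {rhs:.6f}")
```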
Applied to $M^{(2)}f(x)$ the above inequality gives pointwise

$\displaystyle M^{(2)}f(x)^2 \lesssim \Big(\int_{0}^{\infty} |f \ast \sigma^2_s(x)|^2 \,\frac{ds}{s} \Big)^{1/2} \Big(\int_{0}^{\infty} \Big|f \ast \big[s \frac{d}{ds}\sigma^2_s\big](x)\Big|^2 \,\frac{ds}{s} \Big)^{1/2}, \ \ \ \ \ \ (5)$

where the expressions on the RHS can be considered square functions of continuous type. Let me remark that we can indeed apply the inequality to $M^{(2)}f(x)$ because we have $\lim_{s \to 0} f\ast \sigma^{2}_s (x) = 0$, so that the $\phi(0)=0$ condition is satisfied. You can check that this is the case using the fact that $\widehat{\sigma^2_s}(\xi) = \widehat{d\sigma}(s\xi) (1 - \widehat{\varphi}(s\lambda^{-1}\xi))$, which vanishes for $|\xi| \leq \lambda s^{-1}$. By a further application of Cauchy-Schwarz (in the $x$ variable), in order to estimate $\|M^{(2)}f\|_{L^2}$ it will suffice to estimate each factor on the RHS of the last inequality separately in $L^2$.

3.1. First factor in the RHS of (5).

The first quantity we have to estimate is therefore

$\displaystyle \Big\|\Big(\int_{0}^{\infty} |f \ast \sigma^2_s|^2 \frac{ds}{s} \Big)^{1/2} \Big\|_{L^2} = \Big(\int_{\mathbb{R}^d} \int_{0}^{\infty} |f \ast \sigma^2_s(x)|^2 \frac{ds}{s} dx \Big)^{1/2};$

by Fubini and Plancherel, it is the same to estimate

$\displaystyle \Big( \int_{0}^{\infty} \int_{\mathbb{R}^d}|\widehat{f}(\xi) \widehat{\sigma^2_s} (\xi)|^2 \,d\xi \,\frac{ds}{s}\Big)^{1/2}.$

From the definition of $\sigma^2_s$ we can see that

$\displaystyle \widehat{\sigma^2_s} (\xi) = \widehat{\sigma^2} (s\xi) = \widehat{d\sigma}(s\xi) (1 - \widehat{\varphi}(s \lambda^{-1} \xi)),$

and recall that we have the bound $|\widehat{d\sigma}(\xi)| \lesssim (1 + |\xi|)^{-(d-1)/2}$ for the decay of the Fourier transform of $d\sigma$
(see the lecture notes on oscillatory integrals, III). We combine this bound with two facts. First, $1 - \widehat{\varphi}(s \lambda^{-1} \xi) = 0$ when $|\xi| \leq \lambda s^{-1}$ (and is $O(1)$ when $|\xi| > \lambda s^{-1}$). Second, $s|\xi| \geq \lambda \geq 1$ on the remaining region. Together these show that the expression above is controlled by

$\displaystyle \Big( \int_{0}^{\infty} \int_{\{\xi \,:\, |\xi|\gtrsim \lambda s^{-1}\}}|\widehat{f}(\xi)|^2 (s|\xi|)^{-(d-1)} \,d\xi \,\frac{ds}{s}\Big)^{1/2}.$

Using Fubini again we control this by

$\displaystyle \begin{aligned} \Big( \int_{\mathbb{R}^d} & |\widehat{f}(\xi)|^2 |\xi|^{-(d-1)} \Big[ \int_{s \gtrsim \lambda |\xi|^{-1}} s^{-(d-1)}\,\frac{ds}{s} \Big] \,d\xi \Big)^{1/2} \\ & \sim \Big( \int_{\mathbb{R}^d} |\widehat{f}(\xi)|^2 |\xi|^{-(d-1)} \lambda^{-(d-1)} |\xi|^{d-1} \,d\xi \Big)^{1/2} = \lambda^{-(d-1)/2} \|f\|_{L^2}, \ \ \ \ \ \ (6) \end{aligned}$

the last equality by Plancherel, obviously. This is the estimate for the first factor.
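The heart of (6) is that $ds/s$ is invariant under dilations. After Plancherel, the squared $L^2$ norm of the square function is $\int |\widehat{f}(\xi)|^2 \big(\int_0^\infty |\widehat{\sigma^2}(s\xi)|^2\,\frac{ds}{s}\big)\,d\xi$, and the inner integral does not depend on $|\xi|$. The Python sketch below checks this for a model symbol, $m(\eta) = \eta^{-(d-1)/2}\mathbf{1}_{\eta > \lambda}$. It replaces $|\widehat{d\sigma}|$ by its decay bound and the smooth cutoff $1 - \widehat{\varphi}$ by a sharp one; both are simplifying assumptions. The integral is computed on a fixed logarithmic grid in $s$, independent of $|\xi|$, and compared with the exact value $\lambda^{-(d-1)}/(d-1)$.

```python
import math


def model_square_integral(rho: float, lam: float, d: int,
                          u_min: float = -30.0, u_max: float = 30.0,
                          n: int = 60_000) -> float:
    """Midpoint approximation of int_0^inf |m(s*rho)|^2 ds/s with s = e^u,
    for the model symbol m(eta) = eta^{-(d-1)/2} * 1_{eta > lam}."""
    h = (u_max - u_min) / n
    total = 0.0
    for i in range(n):
        eta = math.exp(u_min + (i + 0.5) * h) * rho
        if eta > lam:  # sharp cutoff standing in for 1 - phi_hat
            total += eta ** (-(d - 1))
    return total * h


if __name__ == "__main__":
    d, lam = 4, 10.0
    exact = lam ** (-(d - 1)) / (d - 1)
    for rho in (1e3, 1e5, 1e7):
        ratio = model_square_integral(rho, lam, d) / exact
        print(f"|xi| = {rho:.0e}: integral / exact = {ratio:.4f}")
```

The ratio is essentially $1$ for every $|\xi|$, which is the numerical counterpart of the Fubini step leading to (6).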

3.2. Second factor in the RHS of (5).

Now it remains to estimate the $L^2$ norm of the second factor of (5), namely (the second line by Fubini and Plancherel)

$\displaystyle \begin{aligned} \Big(\int_{\mathbb{R}^d} \int_{0}^{\infty} & \Big|f \ast \big[s \frac{d}{ds}\sigma^2_s\big](x)\Big|^2 \,\frac{ds}{s} \, dx\Big)^{1/2} \\ & = \Big(\int_{0}^{\infty} \int_{\mathbb{R}^d} \Big|\widehat{f}(\xi) \big[\widehat{s \frac{d}{ds}\sigma^2_s}(\xi)\big]\Big|^2 \,d\xi \,\frac{ds}{s} \Big)^{1/2} \end{aligned}$

and this will be done in an entirely analogous way. It is clearly convenient to compute the Fourier transform of $s \frac{d}{ds}\sigma^2_s$ right away. Since differentiation in ${s}$ commutes with the Fourier transform, the product rule and the chain rule give

$\displaystyle \begin{aligned} \widehat{s \frac{d}{ds}\sigma^2_s}(\xi) = \,& s \frac{d}{ds}[\widehat{\sigma^2}(s \xi)] = s \frac{d}{ds}\Big[\widehat{d\sigma}(s\xi)(1 - \widehat{\varphi}(s \lambda^{-1} \xi))\Big] \\ = \,& s (\nabla \widehat{d\sigma}(s\xi)\cdot \xi)(1 - \widehat{\varphi}(s \lambda^{-1} \xi)) \\ & \qquad - s \lambda^{-1} \widehat{d\sigma}(s\xi) (\nabla \widehat{\varphi}(s \lambda^{-1} \xi) \cdot \xi). \end{aligned}$

We will estimate these two contributions separately. Thus, by triangle inequality, we begin by estimating

$\displaystyle \Big( \int_{0}^{\infty} \int_{\mathbb{R}^d} |\widehat{f}(\xi) s (\nabla \widehat{d\sigma}(s\xi)\cdot \xi)(1 - \widehat{\varphi}(s \lambda^{-1} \xi))|^2 \,d\xi \, \frac{ds}{s} \Big)^{1/2}.$

Recall that the gradient $\nabla \widehat{d\sigma}$ satisfies the same² decay estimates as $\widehat{d\sigma}$, namely $|\nabla \widehat{d\sigma}(\xi)| \lesssim (1 + |\xi|)^{-(d-1)/2}$. Once again, we feed the decay information into the above expression and combine it with the fact that $1 - \widehat{\varphi}(s \lambda^{-1} \xi) \neq 0$ only if $|\xi| > s^{-1} \lambda$, so we obtain the bound

$\displaystyle \begin{aligned} \lesssim & \,\Big( \int_{0}^{\infty} \int_{|\xi|\gtrsim s^{-1} \lambda} |\widehat{f}(\xi)|^2 s^2 (s|\xi|)^{-(d-1)} |\xi|^2 \,d\xi \,\frac{ds}{s} \Big)^{1/2} \\ = & \,\Big( \int_{0}^{\infty} \int_{|\xi|\gtrsim s^{-1} \lambda} |\widehat{f}(\xi)|^2 |\xi|^{-(d-3)} s^{-(d-3)} \,d\xi \,\frac{ds}{s} \Big)^{1/2}. \end{aligned}$
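Since this step hinges on the decay of $\nabla\widehat{d\sigma}$, it may help to see it concretely. In $d = 5$ the Bessel function behind $\widehat{d\sigma}$ is elementary. We take for granted the standard closed form: for normalised surface measure and the convention $\widehat{g}(\xi) = \int g(x)e^{-2\pi i x\cdot\xi}\,dx$, the radial profile is $3(\sin x - x\cos x)/x^3$ with $x = 2\pi|\xi|$. The Python sketch below assumes this formula and uses its derivative computed by hand. It checks that both $\widehat{d\sigma}$ and its radial derivative decay like $|\xi|^{-(d-1)/2} = |\xi|^{-2}$, and that the derivative does not decay like $|\xi|^{-(d-1)}$: $|\xi|^{4}$ times it is unbounded.

```python
import math


def sigma_hat_5d(rho: float) -> float:
    """Radial profile of hat(dsigma) on S^4 (assumed closed form), with x = 2*pi*rho."""
    x = 2.0 * math.pi * rho
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3


def sigma_hat_5d_deriv(rho: float) -> float:
    """Signed radial derivative of sigma_hat_5d; its absolute value is |grad hat(dsigma)|."""
    x = 2.0 * math.pi * rho
    g_prime = (x * x * math.sin(x) - 3.0 * math.sin(x) + 3.0 * x * math.cos(x)) / x ** 4
    return 2.0 * math.pi * 3.0 * g_prime  # chain rule: d/drho = 2*pi * d/dx


def max_weighted(f, R: float, power: float, n: int = 20_000) -> float:
    """Max of rho^power * |f(rho)| over an evenly spaced grid on [R, 2R]."""
    return max((R + R * i / n) ** power * abs(f(R + R * i / n)) for i in range(n + 1))


if __name__ == "__main__":
    for R in (10.0, 100.0, 1000.0):
        print(R,
              max_weighted(sigma_hat_5d, R, 2.0),        # stays bounded
              max_weighted(sigma_hat_5d_deriv, R, 2.0),  # stays bounded
              max_weighted(sigma_hat_5d_deriv, R, 4.0))  # grows like R^2
```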

Once again we use Fubini, but now the integral in ${s}$ that results is $\int_{s \gtrsim \lambda |\xi|^{-1}} s^{-(d-3)} \frac{ds}{s}$ !

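This integral is finite if and only if $d\geq 4$ (for $d=3$ it is $\int_{s \gtrsim \lambda|\xi|^{-1}} \frac{ds}{s}$, which diverges logarithmically), and this is the source of the restriction on the dimension.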
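The dichotomy between $d = 3$ and $d \geq 4$ is visible in a one-line integral. Writing $a$ for $\lambda|\xi|^{-1}$, the Python sketch below approximates $\int_a^R s^{-(d-3)}\,\frac{ds}{s}$, again via $s = e^u$. For $d \geq 4$ it stays below $a^{-(d-3)}/(d-3)$ however large $R$ is, while for $d = 3$ it equals $\log(R/a)$ and grows without bound. The function name and the choice $a = 1/2$ are ours.

```python
import math


def s_integral(m: int, a: float, R: float, n: int = 100_000) -> float:
    """Midpoint approximation of int_a^R s^{-m} ds/s in the variable u = log s."""
    lo, hi = math.log(a), math.log(R)
    h = (hi - lo) / n
    return h * sum(math.exp(-m * (lo + (i + 0.5) * h)) for i in range(n))


if __name__ == "__main__":
    a = 0.5
    for d in (3, 4, 5):
        m = d - 3  # exponent of s in the integrand
        values = [s_integral(m, a, R) for R in (1e3, 1e6, 1e12)]
        limit = "infinity" if m == 0 else f"{a ** (-m) / m:.4f}"
        print(f"d = {d}: {[round(v, 4) for v in values]} -> {limit}")
```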

So, since $d \geq 4$, we can perform the integration in ${s}$ and estimate the above by

$\displaystyle \Big( \int_{\mathbb{R}^d} |\widehat{f}(\xi)|^2 |\xi|^{-(d-3)} \lambda^{-(d-3)} |\xi|^{d-3} \,d\xi \Big)^{1/2} = \lambda^{-(d-3)/2} \|f\|_{L^2}, \ \ \ \ \ \ (7)$

and this completes the estimate of the first term coming from $s \frac{d}{ds}\widehat{\sigma^2_s}$.
Now we are almost done. It remains to estimate (by Plancherel and triangle inequality, recall) the quantity

$\displaystyle \Big( \int_{\mathbb{R}^d} \int_{0}^{\infty} |\widehat{f}(\xi) s \lambda^{-1} \widehat{d\sigma}(s\xi) (\nabla \widehat{\varphi}(s \lambda^{-1} \xi) \cdot \xi)|^2 \,\frac{ds}{s} \,d\xi \Big)^{1/2}.$

[We anticipate that this will essentially be just an error term, giving a smaller contribution than the preceding term.]
The gradient $\nabla \widehat{\varphi}(s \lambda^{-1} \xi)$, by definition of $\widehat{\varphi}$, is supported in the annulus $\lambda s^{-1} \leq |\xi| \leq 2\lambda s^{-1}$, where it is $O(1)$ in size. Using the decay estimate for $\widehat{d\sigma}(s\xi)$ and this support information, we see that the last expression is bounded by

$\displaystyle \Big( \int_{0}^{\infty} \int_{|\xi|\sim s^{-1} \lambda} |\widehat{f}(\xi)|^2 s^2 \lambda^{-2} (s|\xi|)^{-(d-1)}|\xi|^2 \,\frac{ds}{s} \,d\xi \Big)^{1/2},$

which by Fubini and integration in ${s}$ is itself bounded by the expression below. This time the $s$-integral runs over the dyadic range $s \sim \lambda|\xi|^{-1}$, so it is finite in every dimension and no restriction on $d$ is needed here:

$\displaystyle \Big(\int_{\mathbb{R}^d} |\widehat{f}(\xi)|^2 |\xi|^{-(d-3)}\lambda^{-2} \lambda^{-(d-3)} |\xi|^{d-3} \,d\xi \Big)^{1/2} = \lambda^{-(d-1)/2}\|f\|_{L^2}.$

Observe that, since $\lambda \gg 1$, this bound we have obtained is much smaller than the contribution given by (7), and is therefore to be considered an error term.

3.3. Conclusion of the argument for $M^{(2)}$.

Now we are ready to conclude. We use the pointwise bound (5) for $M^{(2)}f$, Cauchy-Schwarz, estimate (6) for the first factor, and estimate (7) together with the error-term bound above for the second factor. This shows that

$\displaystyle \|M^{(2)}f\|_{L^2} \lesssim \big(\lambda^{-(d-1)/2} \|f\|_{L^2} \cdot \lambda^{-(d-3)/2} \|f\|_{L^2} \big)^{1/2} = \lambda^{-(d-2)/2} \|f\|_{L^2},$

which is precisely the desired estimate (3). This concludes the proof. $\Box$

4. Concluding remarks

At some point during the above discussion, you might have wondered whether the weak-type endpoint estimate for $\mathscr{M}_{\mathbb{S}^{d-1}}$ is true, given that the restricted one is. Of course, as explained, the proof above cannot possibly show that, since we have bounds in different function spaces for the two pieces of the operator.
It turns out that the weak-type endpoint estimate is actually false! This has been known since Stein’s original paper. Recall that, as seen in Exercise 17 of the 3rd part of the lecture notes on Littlewood-Paley theory, the spherical maximal function is not $L^{d/(d-1)} \to L^{d/(d-1)}$ bounded. The counterexample is extremely simple: if $B$ is the unit ball then $\mathscr{M}_{\mathbb{S}^{d-1}}\mathbf{1}_{B}(x) \sim (1 + |x|)^{-(d-1)}$, and this function is in $L^p$ only for $p > d/(d-1)$. It does, however, belong to $L^{d/(d-1),\infty}$, so it is no counterexample to the weak-type estimate. Stein’s counterexample for the $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ boundedness is different, although not particularly exciting: it is the function

$\displaystyle f(x) := |x|^{-(d-1)} [\log(1/|x|)]^{-1} \mathbf{1}_{B(0,1/2)}(x).$

Indeed, we can see that $f \in L^{d/(d-1)}$ easily by a dyadic decomposition:

$\displaystyle \begin{aligned} \int |f(x)|^{d/(d-1)} \,dx \sim & \sum_{k \geq 1} (2^{k(d-1)})^{d/(d-1)} k^{-d/(d-1)} \mathrm{Vol}(B(0,2^{-k})\backslash B(0,2^{-k-1})) \\ \sim & \sum_{k \geq 1} k^{-d/(d-1)} < \infty. \end{aligned}$

However, for any large $x \in \mathbb{R}^d$ the spherical maximal function $\mathscr{M}_{\mathbb{S}^{d-1}}f(x)$ is actually equal to $\infty$: indeed, for ${x}$ large the intersection $B(0,1/2) \cap (x + |x|\mathbb{S}^{d-1})$ is approximately flat, and therefore (at least heuristically) by symmetry

$\displaystyle \begin{aligned} \mathscr{M}_{\mathbb{S}^{d-1}}f(x) \gtrsim & \,|x|^{-(d-1)} \int_{x' \in \mathbb{R}^{d-1} : |x'|<1/2} |x'|^{-(d-1)} [\log(1/|x'|)]^{-1}\,dx' \\ \sim & \,|x|^{-(d-1)} \sum_{k \geq 1} 2^{k(d-1)} k^{-1} 2^{-k(d-1)} = |x|^{-(d-1)} \sum_{k \geq 1} k^{-1} = \infty. \end{aligned}$
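Both computations in this counterexample reduce, in polar coordinates, to the one-dimensional integral $\int_0^{1/2} \frac{dr}{r\,[\log(1/r)]^{p}}$. The $L^{d/(d-1)}$ norm of $f$ gives $p = d/(d-1)$, because the factor $|x|^{-d}$ against the Jacobian $r^{d-1}$ leaves $r^{-1}$. The flat $(d-1)$-dimensional slice gives $p = 1$ in the same way. Substituting $u = \log(1/r)$ turns the integral into $\int_{\log 2}^{\infty} u^{-p}\,du$, which is finite exactly when $p > 1$. The Python sketch below (function name ours) evaluates the truncated integrals $\int_\varepsilon^{1/2}$ in the variable $u$. One saturates, while the other grows like $\log\log(1/\varepsilon)$.

```python
import math


def truncated_radial(p: float, eps: float, n: int = 200_000) -> float:
    """int_eps^{1/2} dr / (r * log(1/r)^p), computed as int_{log 2}^{log(1/eps)} u^{-p} du."""
    lo, hi = math.log(2.0), math.log(1.0 / eps)
    h = (hi - lo) / n
    return h * sum((lo + (i + 0.5) * h) ** (-p) for i in range(n))


if __name__ == "__main__":
    d = 4
    p = d / (d - 1)
    limit = math.log(2.0) ** (1.0 - p) / (p - 1.0)  # full integral, finite since p > 1
    for eps in (1e-10, 1e-100, 1e-300):
        print(f"eps = {eps:.0e}: p = d/(d-1) gives {truncated_radial(p, eps):.4f} "
              f"(limit {limit:.4f}); p = 1 gives {truncated_radial(1.0, eps):.4f}")
```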

So, Bourgain’s result is quite optimal, save for the moderate dimensionality restriction.
[For completeness, let me mention that in the $d=2$ case – that we have totally ignored (because it is an entirely different beast) – even the restricted weak-type inequality is false, as shown by Seeger, Tao and Wright – and the counterexample is exciting, being based on a Besicovitch-type construction.]

Footnotes:
1: Notice that inequality (1) is only significant when $\alpha \ll 1$ (because $\mathscr{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E \leq 1$), and therefore $\lambda=\alpha^{-1/(d-1)}$ is indeed pretty large.

2: Indeed, when written out explicitly, these two oscillatory integrals only differ in the amplitude; but it is the phase that dictates the decay here, and they have the same phase.