# Proof of the square-function characterisation of L(log L)^{1/2}: part I

This is a follow-up on the post on the Chang-Wilson-Wolff inequality and how it can be proven using a lemma of Tao-Wright. The latter consists of a square-function characterisation of the Orlicz space $L(\log L)^{1/2}$ analogous in spirit to the better known one for the Hardy spaces.
In this post we will commence the proof of the Tao-Wright lemma, as promised. We will start by showing how the lemma, which is stated for smooth frequency projections, can be deduced from its discrete variant stated in terms of Haar coefficients (or equivalently, martingale differences with respect to the standard dyadic filtration). This is a minor part of the overall argument but it is slightly tricky so I thought I would spell it out.

Recall that the Tao-Wright lemma is as follows. We write $\widetilde{\Delta}_j f$ for the smooth frequency projection defined by $\widehat{\widetilde{\Delta}_j f}(\xi) = \widehat{\psi}(2^{-j}\xi) \widehat{f}(\xi)$, where $\widehat{\psi}$ is a smooth function compactly supported in $1/2 \leq |\xi| \leq 4$ and identically equal to 1 on $1 \leq |\xi| \leq 2$.

Lemma 1 – Square-function characterisation of $L(\log L)^{1/2}$ [Tao-Wright, 2001]:
For any $j \in \mathbb{Z}$, let

$\displaystyle \phi_j(x) := \frac{2^j}{(1 + 2^j |x|)^{3/2}}$

(notice $\phi_j$ is concentrated in $[-2^{-j},2^{-j}]$ and $\|\phi_j\|_{L^1} \sim 1$).
If the function ${f}$ is in $L(\log L)^{1/2}([-R,R])$ and such that $\int f(x) \,dx = 0$, then there exists a collection $(F_j)_{j \in \mathbb{Z}}$ of non-negative functions such that:

1. pointwise for any $j \in \mathbb{Z}$

$\displaystyle \big|\widetilde{\Delta}_j f\big| \lesssim F_j \ast \phi_j ;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{Z}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}}.$
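The normalisation $\|\phi_j\|_{L^1} \sim 1$ can be checked by hand: the substitution $u = 2^j x$ removes the dependence on $j$, and the integral over $|x| \leq M 2^{-j}$ equals $4(1 - (1+M)^{-1/2})$, which tends to $4$ as $M \to \infty$. Below is a minimal numerical sketch of this, assuming nothing beyond the definition of $\phi_j$; the function names and the cutoff parameter `M` are my own choices for illustration and are not part of the Tao-Wright lemma.

```python
def phi(j, x):
    """The kernel phi_j(x) = 2^j / (1 + 2^j |x|)^(3/2) from the lemma."""
    scale = 2.0 ** j
    return scale / (1.0 + scale * abs(x)) ** 1.5


def l1_mass(j, M, n=100000):
    """Midpoint-rule approximation of the integral of phi_j over |x| <= M * 2^-j.

    M (the cutoff in units of 2^-j) and n (number of nodes) are
    illustrative parameters, not quantities from the lemma.
    """
    half_width = M * 2.0 ** (-j)
    h = half_width / n
    # phi_j is even, so integrate over [0, half_width] and double.
    return 2.0 * h * sum(phi(j, (i + 0.5) * h) for i in range(n))


def l1_mass_exact(M):
    """Closed form of the same integral; note it does not depend on j."""
    return 4.0 * (1.0 - (1.0 + M) ** -0.5)


if __name__ == "__main__":
    for j in (-3, 0, 5):
        print(j, l1_mass(j, 100.0), l1_mass_exact(100.0))
```

The printed masses agree across different values of $j$, which is the scale invariance behind $\|\phi_j\|_{L^1} \sim 1$.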

# The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space $L(\log L)^{1/2}$.

## 1. The Chang-Wilson-Wolff inequality

Consider the collection $\mathcal{D}$ of standard dyadic intervals that are contained in $[0,1]$. We let $\mathcal{D}_j$ for each $j \in \mathbb{N}$ denote the subcollection of intervals $I \in \mathcal{D}$ such that $|I|= 2^{-j}$. Notice that these subcollections generate a filtration of $\mathcal{D}$, that is $(\sigma(\mathcal{D}_j))_{j \in \mathbb{N}}$, where $\sigma(\mathcal{D}_j)$ denotes the sigma-algebra generated by the collection $\mathcal{D}_j$. We can associate to this filtration the conditional expectation operators

$\displaystyle \mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],$

and therefore define the martingale differences

$\displaystyle \mathbf{D}_j f:= \mathbf{E}_{j+1} f - \mathbf{E}_{j}f.$

With this notation, we have the formal telescopic identity

$\displaystyle f = \mathbf{E}_0 f + \sum_{j \in \mathbb{N}} \mathbf{D}_j f.$

Demystification: the expectation $\mathbf{E}_j f(x)$ is simply $\frac{1}{|I|} \int_I f(y) \,dy$, where $I$ is the unique dyadic interval in $\mathcal{D}_j$ such that $x \in I$.

Letting $f_j := \mathbf{E}_j f$ for brevity, the sequence of functions $(f_j)_{j \in \mathbb{N}}$ is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is

$\displaystyle \mathbf{E}_{j} f_{j+1} = f_j.$

In the following we will only be interested in functions with zero average, that is functions such that $\mathbf{E}_0 f = 0$. Given such a function $f : [0,1] \to \mathbb{R}$ then, we can define its martingale square function $S_{\mathcal{D}}f$ to be

$\displaystyle S_{\mathcal{D}} f := \Big(\sum_{j \in \mathbb{N}} |\mathbf{D}_j f|^2 \Big)^{1/2}.$
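To make the definitions concrete, here is a small sketch that computes $\mathbf{E}_j f$, $\mathbf{D}_j f$ and $S_{\mathcal{D}} f$ for a function that is constant on the dyadic intervals of length $2^{-n}$, so that it can be stored as a list of $2^n$ values. The discretisation and the function names are assumptions made for illustration only.

```python
import math


def conditional_expectation(values, j):
    """E_j f: replace f by its average on each dyadic interval of length 2^-j.

    `values` holds f on 2^n equal cells of [0,1]; this sampling is an
    assumption of the sketch.
    """
    n_cells = len(values)
    block = n_cells >> j  # number of cells inside one interval of D_j
    if block == 0 or (block << j) != n_cells:
        raise ValueError("need len(values) = 2^n and 0 <= j <= n")
    out = []
    for start in range(0, n_cells, block):
        average = sum(values[start:start + block]) / block
        out.extend([average] * block)
    return out


def martingale_differences(values):
    """Return [D_0 f, ..., D_{n-1} f], where D_j f = E_{j+1} f - E_j f."""
    depth = len(values).bit_length() - 1  # len(values) == 2**depth
    levels = [conditional_expectation(values, j) for j in range(depth + 1)]
    return [[b - a for a, b in zip(levels[j], levels[j + 1])]
            for j in range(depth)]


def square_function(values):
    """Pointwise martingale square function (sum_j |D_j f|^2)^(1/2)."""
    diffs = martingale_differences(values)
    return [math.sqrt(sum(d[i] ** 2 for d in diffs))
            for i in range(len(values))]


if __name__ == "__main__":
    f = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, -2.0, -9.0]  # mean zero: E_0 f = 0
    print(martingale_differences(f))
    print(square_function(f))
```

For such a step function the telescoping sum is finite, since $\mathbf{E}_n f = f$, and the martingale differences are orthogonal in $L^2$, so that $\|f\|_{L^2} = \|S_{\mathcal{D}} f\|_{L^2}$ when $\mathbf{E}_0 f = 0$.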

With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.

C-W-W inequality: Let ${f : [0,1] \to \mathbb{R}}$ be such that $\mathbf{E}_0 f = 0$. For any ${2\leq p < \infty}$ it holds that

$\displaystyle \boxed{\|f\|_{L^p([0,1])} \lesssim p^{1/2}\, \|S_{\mathcal{D}}f\|_{L^p([0,1])}.} \ \ \ \ \ \ (\text{CWW}_1)$

An important point about the above inequality is the behaviour of the constant in the Lebesgue exponent ${p}$, which is sharp. This can be seen by taking a “lacunary” function ${f}$ (essentially one where $\mathbf{D}_jf = a_j \in \mathbb{C}$, a constant) and randomising the signs using Khintchine’s inequality (indeed, ${p^{1/2}}$ is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which $[0,1]$ is replaced by the unit cube $[0,1]^d$ and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
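The growth in $p$ can be observed numerically in the simplest lacunary example. Take $f = \sum_{j<n} r_j$ with $r_j$ the Rademacher functions, so that $|\mathbf{D}_j f| = 1$ and $S_{\mathcal{D}} f = n^{1/2}$ identically; then $\|f\|_{L^p}$ is the $L^p$ norm of a sum of $n$ independent random signs, which can be computed exactly from the binomial distribution. The sketch below does this; the choice $n = 100$ and the function names are mine, for illustration.

```python
from math import comb, sqrt


def rademacher_lp_norm(n, p):
    """Exact L^p norm of eps_1 + ... + eps_n for independent uniform signs.

    The sum equals n - 2k with probability comb(n, k) / 2^n.
    """
    total = sum(comb(n, k) * abs(n - 2 * k) ** p for k in range(n + 1))
    return (total / 2 ** n) ** (1.0 / p)


def cww_ratio(n, p):
    """||f||_p / ||S f||_p for f the sum of the first n Rademacher functions."""
    return rademacher_lp_norm(n, p) / sqrt(n)


if __name__ == "__main__":
    n = 100  # illustrative choice
    for p in (2, 4, 8, 16):
        print(p, cww_ratio(n, p), cww_ratio(n, p) / sqrt(p))
```

The ratio equals $1$ at $p = 2$ and keeps growing with $p$, while staying below $p^{1/2}$, consistent with the $p^{1/2}$ behaviour of the constant.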

# Hausdorff-Young inequality and interpolation

The Hausdorff-Young inequality is one of the most fundamental results about the mapping properties of the Fourier transform: it says that

$\displaystyle \| \widehat{f} \|_{L^{p'}(\mathbb{R}^d)} \leq \|f\|_{L^p(\mathbb{R}^d)}$

for all ${1 \leq p \leq 2}$, where $\frac{1}{p} + \frac{1}{p'} = 1$. It is important because it tells us that the Fourier transform maps $L^p$ continuously into $L^{p'}$, something which is not obvious when the exponent ${p}$ is not 1 or 2. When the underlying group is the torus, the corresponding Hausdorff-Young inequality is instead

$\displaystyle \| \widehat{f} \|_{\ell^{p'}(\mathbb{Z}^d)} \leq \|f\|_{L^p(\mathbb{T}^d)}.$

The optimal constant is actually less than 1 in general, and it has been calculated for $\mathbb{R}^d$ (and proven to be optimal) by Beckner, but this will not concern us here (if you want to find out what it is, take ${f}$ to be a gaussian). In the notes on Littlewood-Paley theory we also saw (in Exercise 7) that the inequality is false for ${p}$ greater than 2, and we proved so using a randomisation trick enabled by Khintchine’s inequality.
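A quick way to experiment with the inequality is to replace the torus by the finite cyclic group $\mathbb{Z}/N\mathbb{Z}$, which is my own choice of toy model and not something discussed in the post. With the normalised counting measure on the physical side and the counting measure on the frequency side, the same two endpoints hold ($p = 1$ trivially, $p = 2$ by Parseval), so the inequality holds with constant 1 by the same interpolation argument. A minimal sketch, with hypothetical function names:

```python
import cmath
import math


def fourier_coefficients(f):
    """hat f(k) = (1/N) sum_n f(n) e^{-2 pi i k n / N} on Z/NZ."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) / N
            for k in range(N)]


def norm_physical(f, p):
    """L^p norm with respect to the normalised counting measure."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1.0 / p)


def norm_frequency(g, q):
    """l^q norm with respect to the counting measure (q may be infinity)."""
    if math.isinf(q):
        return max(abs(v) for v in g)
    return sum(abs(v) ** q for v in g) ** (1.0 / q)


def hausdorff_young_gap(f, p):
    """||f||_p - ||hat f||_{p'}; should be non-negative for 1 <= p <= 2."""
    q = math.inf if p == 1 else p / (p - 1.0)
    return norm_physical(f, p) - norm_frequency(fourier_coefficients(f), q)


if __name__ == "__main__":
    f = [1, 2 - 1j, 0, -3, 0.5j, 4, -1, 2]
    for p in (1, 1.2, 1.5, 2):
        print(p, hausdorff_young_gap(f, p))
```

The gap vanishes at $p = 2$ (Parseval) and is non-negative for the other exponents in $[1,2]$.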

Today I would like to talk about how the Hausdorff-Young inequality (H-Y) is proven and how important (or not) interpolation theory is to this inequality. I won’t be saying anything new or important, and ultimately this detour into H-Y will take us nowhere; but I hope the ride will be enjoyable.

## I found these fantastic lecture notes on Penrose’s…


I found these fantastic lecture notes on Penrose’s aperiodic tilings by Alexander F. Ritter, which he wrote for a masterclass at Oxford in 2014.

There is not much systematic well-structured material on aperiodic tilings around, so I thought I would share this for whoever is interested. The material is accessible to every undergrad that has taken a class in real analysis (some nice application of extracting a converging subsequence from a sequence in a compact space, in the proof of the Extension Theorem), and includes many painstaking hand-drawings that are very helpful in following the arguments. Besides being overall well-written, the notes are also good at hammering home some important points – for example, the fact that the Composition and Decomposition operations that one can make on a tiling are unique and thus reversible in the case of tilings by Penrose’s Kites and Darts tiles, which has a cascade of consequences (they prove aperiodicity of the tilings in a few lines, for example) which do not apply to regular periodic tilings for this very reason (non-uniqueness).

# Carbery's proof of the Stein-Tomas theorem

Writing the article on Bourgain’s proof of the spherical maximal function theorem I suddenly recalled another interesting proof that uses a trick very similar to that of Bourgain – and apparently directly inspired from it. Recall that the “trick” consists of the following fact: if we consider only characteristic functions as our inputs, then we can split the operator in two, estimate these parts each in a different Lebesgue space, and at the end we can combine the estimates into an estimate in a single $L^p$ space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if $R$ denotes the Fourier restriction operator of the sphere $\mathbb{S}^{d-1}$ (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

$\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}$

(defined initially on Schwartz functions), then

Stein-Tomas theorem: $R$ satisfies the a-priori inequality

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)$

for all exponents ${p}$ such that $1 \leq p \leq \frac{2(d+1)}{d+3}$ (and this is sharp, by the Knapp example).

There are a number of proofs of this statement; originally it was proven by Tomas for every exponent except the endpoint, and then Stein combined the proof of Tomas with his complex interpolation method to obtain the endpoint too (and this is still one of the finest examples of the power of the method around).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore obtains inequality (1) for all exponents $1 \leq p < \frac{2(d+1)}{d+3}$ by interpolation of Lorentz spaces with the $p=1$ case (which is a trivial consequence of the Hausdorff-Young inequality).

In other words, Carbery proves that for any (Borel) measurable set ${E}$ one has

$\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)$

where the RHS is clearly the $L^{2(d+1)/(d+3)}$ norm of the characteristic function $\mathbf{1}_E$. Notice that we could write the inequality equivalently as $\|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}$.

# Bourgain's proof of the spherical maximal function theorem

Recently I have presented Stein’s proof of the boundedness of the spherical maximal function: it was in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

$\displaystyle \mathcal{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,$

where $A_t$ denotes the spherical average at radius ${t}$, that is

$\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),$

where $d\sigma_{d-1}$ denotes the spherical measure on the $(d-1)$-dimensional sphere (we will omit the subscript from now on and just write $d\sigma$ since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: The maximal operator $\mathcal{M}_{\mathbb{S}^{d-1}}$ is $L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)$ bounded for any $\frac{d}{d-1} < p \leq \infty$.

There is however an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler, in that instead of splitting the function into countably many dyadic frequency pieces it splits the spherical measure into two frequency pieces only. The other ingredients in the two proofs are otherwise pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus: remember that Stein’s argument essentially shows $L^p \to L^p$ boundedness of the operator for every $2 \geq p > \frac{d}{d-1}$ quite directly; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for $\mathcal{M}_{\mathbb{S}^{d-1}}$! The latter means that for any measurable $E$ of finite (Lebesgue) measure we have

$\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathcal{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)$

which is exactly the $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the $L^{d/(d-1),1} \to L^{d/(d-1),\infty}$ inequality). The downside of Bourgain’s argument is that it only works in dimension $d \geq 4$, and thus misses the dimension $d=3$ that is instead covered by Stein’s theorem.

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known – it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (I, II) on his positional-notation trick for sets.

# Oscillatory integrals II: several-variables phases

This is the second part of a two-part post on the theory of oscillatory integrals. In the first part we studied the theory of oscillatory integrals whose phases are functions of a single variable. In this post we will instead study the case in which the phase is a function of several variables (and we integrate in all of them). Here the theory becomes weaker because these objects can indeed have a worse behaviour. We will proceed by analogy following the same footsteps as in the single-variable case.

3. Oscillatory integrals in several variables

In the previous section we have analysed the situation for single variable phases, that is for integrals over (intervals of) ${\mathbb{R}}$. In this section, we will instead study the higher dimensional situation, when the phase is a function of several variables. Things are unfortunately generally not as nice as in the single variable case, as you will see.

In order to avoid having to worry about connected open sets of ${\mathbb{R}^d}$ (see Exercise 18 for the sort of issues that arise in trying to deal with general open sets of ${\mathbb{R}^d}$), in this section we will study mainly objects of the form

$\displaystyle \mathcal{I}_{\psi}(\lambda) := \int_{\mathbb{R}^d} e^{i \lambda u(x)} \psi(x) dx,$

where ${\psi}$ has compact support. We have switched to ${u}$ for the phase to remind the reader of the fact that it is a function of several variables now.

3.1. Principle of non-stationary phase – several variables

The principle of non-stationary phase we saw in Section 2 of part I continues to hold in the several variables case.
Given a phase ${u}$, we say that ${x_0}$ is a critical point of ${u}$ if

$\displaystyle \nabla u(x_0) = (0,\ldots,0).$

Proposition 8 (Principle of non-stationary phase – several variables) Let ${\psi \in C^\infty_c(\mathbb{R}^d)}$ (that is, smooth and compactly supported) and let the phase ${u\in C^\infty}$ be such that ${u}$ does not have critical points in the support of ${\psi}$. Then for any ${N >0}$ we have

$\displaystyle |\mathcal{I}_\psi(\lambda)|\lesssim_{N,\psi,u} |\lambda|^{-N}.$
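The contrast between phases with and without critical points is easy to see numerically in dimension $d = 2$. The sketch below evaluates $\mathcal{I}_\psi(\lambda)$ by a midpoint rule for the product bump $\psi(x) = b(x_1) b(x_2)$ with $b(t) = e^{-1/(1-t^2)}$ on $(-1,1)$, once for the linear phase $u(x) = x_1 + x_2/2$ (no critical points) and once for $u(x) = |x|^2$ (a critical point at the origin). The specific bump, the two phases, the grid size and all names are illustrative assumptions of mine, not taken from the notes.

```python
import cmath
import math


def bump(t):
    """Smooth bump supported in [-1, 1] (an illustrative choice of amplitude)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0


def oscillatory_integral(phase, lam, n=160):
    """Midpoint-rule approximation of the integral of e^{i lam u(x)} psi(x) on [-1,1]^2."""
    h = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * h for i in range(n)]
    weights = [bump(t) for t in nodes]
    total = 0j
    for x1, w1 in zip(nodes, weights):
        for x2, w2 in zip(nodes, weights):
            total += cmath.exp(1j * lam * phase(x1, x2)) * w1 * w2
    return total * h * h


def linear_phase(x1, x2):
    """Gradient (1, 1/2) never vanishes: no critical points."""
    return x1 + 0.5 * x2


def quadratic_phase(x1, x2):
    """Gradient vanishes at the origin, inside the support of psi."""
    return x1 * x1 + x2 * x2


if __name__ == "__main__":
    for lam in (0.0, 10.0, 50.0):
        print(lam,
              abs(oscillatory_integral(linear_phase, lam)),
              abs(oscillatory_integral(quadratic_phase, lam)))
```

For the linear phase the computed values drop off very quickly in $\lambda$, as Proposition 8 predicts, while for the quadratic phase the decay is visibly much slower.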

# Oscillatory integrals I: single-variable phases

You might remember that I contributed some lecture notes on Littlewood-Paley theory to a masterclass, which I then turned into a series of three posts (I, II, III). I have also contributed a lecture note on some basic theory of oscillatory integrals, and therefore I am going to do the same and share them here as a blog post in two parts. The presentation largely follows the one in Stein’s “Harmonic Analysis: Real-variable methods, orthogonality, and oscillatory integrals”, with inputs from Stein and Shakarchi’s “Functional Analysis: Introduction to Further Topics in Analysis”, from some lecture notes by Terry Tao for his 247B course, from a very interesting paper by Carbery, Christ and Wright and from a number of other places that I would now have trouble tracking down.
In this first part we will discuss the theory of oscillatory integrals when the phase is a function of a single variable. There are extensive exercises included that are to be considered part of the lecture notes; indeed, in order to keep the actual notes short and engage the reader, I have turned many things into exercises. If you are interested in learning about oscillatory integrals, you should not ignore them.
In the next post, we will study instead the case where the phases are functions of several variables.

0. Introduction

A large part of modern harmonic analysis is concerned with understanding cancellation phenomena happening between different contributions to a sum or integral. Loosely speaking, we want to know how much better we can do than if we had taken absolute values everywhere. A prototypical example of this is the oscillatory integral of the form

$\displaystyle \int e^{i \phi_\lambda (x)} \psi(x) dx.$

Here ${\psi}$, called the amplitude, is usually understood to be “slowly varying” with respect to the real-valued ${\phi_\lambda}$, called the phase, where ${\lambda}$ denotes a parameter or list of parameters and $\phi'_\lambda$ gets larger as $\lambda$ grows; for example ${\phi_\lambda(x) = \lambda \phi(x)}$. Thus the oscillatory behaviour is given mainly by the complex exponential ${e^{i \phi_\lambda(x)}}$.
Expressions of this form arise quite naturally in several problems, as we will see in Section 1, and typically one seeks to provide an upper bound on the absolute value of the integral above in terms of the parameters ${\lambda}$. Intuitively, as ${\lambda}$ gets larger the phase ${\phi_\lambda}$ changes faster and therefore ${e^{i \phi_\lambda}}$ oscillates faster, producing more cancellation between the contributions of different intervals to the integral. We expect then the integral to decay as ${\lambda}$ grows larger, and usually seek upper bounds of the form ${|\lambda|^{-\alpha}}$. Notice that if you take absolute values inside the integral above you just obtain ${\|\psi\|_{L^1}}$, a bound that does not decay in ${\lambda}$ at all.
The main tool we will use is simply integration by parts. In the exercises you will also use a little basic complex analysis to obtain more precise information on certain special oscillatory integrals.

1. Motivation

In this section we shall showcase the appearance of oscillatory integrals in analysis with a couple of examples. The reader can find other interesting examples in the exercises.

1.1. Fourier transform of radial functions

Let ${f : \mathbb{R}^d \rightarrow \mathbb{C}}$ be a radially symmetric function, that is there exists a function ${f_0: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${f(x) = f_0(|x|)}$ for every ${x \in \mathbb{R}^d}$. Let’s suppose for simplicity that ${f\in L^1(\mathbb{R}^d)}$ (equivalently, that ${f_0 \in L^1(\mathbb{R}^{+}, r^{d-1} dr)}$), so that it has a well-defined Fourier transform. It is easy to see (by composing ${f}$ with a rotation and using a change of variable in the integral defining ${\widehat{f}}$) that ${\widehat{f}}$ must also be radially symmetric, that is there must exist ${g: \mathbb{R}^{+} \rightarrow \mathbb{C}}$ such that ${\widehat{f}(\xi) = g(|\xi|)}$; we want to understand its relationship with ${f_0}$. Therefore we write using polar coordinates

$\displaystyle \begin{aligned} \widehat{f}(\xi) = & \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot \xi} dx \\ = & \int_{0}^{\infty}\int_{\mathbb{S}^{d-1}} f_0(r) e^{-2\pi i r \omega\cdot \xi} r^{d-1} d\sigma_{d-1}(\omega) dr \\ = & \int_{0}^{\infty} f_0(r) r^{d-1} \Big(\int_{\mathbb{S}^{d-1}} e^{-2\pi i r \omega\cdot \xi} d\sigma_{d-1}(\omega)\Big) dr, \end{aligned}$

where ${d\sigma_{d-1}}$ denotes the surface measure on the unit ${(d-1)}$-dimensional sphere ${\mathbb{S}^{d-1}}$ induced by the Lebesgue measure on the ambient space ${\mathbb{R}^{d}}$. By inspection, we see that the integral in brackets above is radially symmetric in ${\xi}$, and so if we define

$\displaystyle J(t) := \int_{\mathbb{S}^{d-1}} e^{-2\pi i t \omega\cdot \mathbf{e}_1} d\sigma_{d-1}(\omega),$

with ${\mathbf{e}_1 = (1, 0, \ldots, 0)}$, we have

$\displaystyle \widehat{f}(\xi) = g(|\xi|) = \int_{0}^{\infty} f_0(r) r^{d-1} J(r|\xi|) dr. \ \ \ \ \ (1)$

This is the relationship we were looking for: it allows one to calculate the Fourier transform of ${f}$ directly from the radial information ${f_0}$.
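In dimension $d = 3$ the function $J$ can be written down explicitly. The map $\omega \mapsto \omega \cdot \mathbf{e}_1$ pushes the surface measure of $\mathbb{S}^2$ forward to $2\pi \,ds$ on $[-1,1]$, so that $J(t) = 2\pi \int_{-1}^{1} e^{-2\pi i t s} ds = 2\sin(2\pi t)/t$. The sketch below compares a direct quadrature of the zonal integral with this closed form; the restriction to $d = 3$, the quadrature and the function names are choices made here for illustration.

```python
import cmath
import math


def J_quadrature(t, n=4000):
    """Midpoint rule for J(t) on S^2, using that the height s = omega . e_1
    is distributed as 2 pi ds on [-1, 1] (valid for d = 3 only)."""
    h = 2.0 / n
    total = 0j
    for i in range(n):
        s = -1.0 + (i + 0.5) * h
        total += cmath.exp(-2j * math.pi * t * s)
    return 2.0 * math.pi * h * total


def J_closed_form(t):
    """J(t) = 2 sin(2 pi t) / t for d = 3, with J(0) = 4 pi (area of S^2)."""
    if t == 0:
        return 4.0 * math.pi
    return 2.0 * math.sin(2.0 * math.pi * t) / t


if __name__ == "__main__":
    for t in (0.0, 0.25, 1.0, 2.5):
        print(t, J_quadrature(t), J_closed_form(t))
```

Plugging the closed form into (1) gives, for $d = 3$, the formula $\widehat{f}(\xi) = \frac{2}{|\xi|} \int_0^\infty f_0(r)\, r \sin(2\pi r |\xi|)\, dr$, in which the oscillatory nature of the kernel is explicit.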

# Marcinkiewicz-type multiplier theorem for q-variation (q > 1)

Not long ago we discussed one of the main direct applications of the Littlewood-Paley theory, namely the Marcinkiewicz multiplier theorem. Recall that the single-variable version of this theorem can be formulated as follows:

Theorem 1 [Marcinkiewicz multiplier theorem]: Let ${m}$ be a function on $\mathbb{R}$ such that

1. $m \in L^\infty$
2. for every Littlewood-Paley dyadic interval $L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k]$ with $k \in \mathbb{Z}$

$\displaystyle \|m\|_{V(L)} \leq C,$

where $\|m\|_{V(L)}$ denotes the total variation of ${m}$ over the interval $L$.

Then for any ${1 < p < \infty}$ the multiplier ${T_m}$ defined by $\widehat{T_m f} = m \widehat{f}$ for functions $f \in L^2(\mathbb{R})$ extends to an $L^p \to L^p$ bounded operator,

$\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.$

You should also recall that the total variation $V(I)$ above is defined as

$\displaystyle \sup_{N}\sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \sum_{j=1}^{N} |m(t_j) - m(t_{j-1})|,$

and if ${m}$ is absolutely continuous then ${m'}$ exists as a measurable function and the total variation over interval $I$ is given equivalently by $\int_{I} |m'(\xi)|d\xi$. We have seen that the “dyadic total variation condition” 2.) above is to be seen as a generalisation of the pointwise condition $|m'(\xi)|\lesssim |\xi|^{-1}$, which in dimension 1 happens to coincide with the classical differential Hörmander condition (in higher dimensions the pointwise Marcinkiewicz conditions are of product type, while the pointwise Hörmander(-Mikhlin) conditions are of radial type; see the relevant post). Thus the Marcinkiewicz multiplier theorem in dimension 1 can deal with multipliers whose symbol is somewhat rougher than being differentiable. It is an interesting question to wonder how much rougher the symbols can get while still preserving their $L^p$ mapping properties (or maybe giving up some range – recall though that the range of boundedness for multipliers must be symmetric around 2 because multipliers are self-adjoint).

Coifman, Rubio de Francia and Semmes came up with an answer to this question that is very interesting. They generalise the Marcinkiewicz multiplier theorem (in dimension 1) to multipliers that have bounded ${q}$-variation with $q > 1$. Let us define this quantity rigorously.

Definition: Let $q \geq 1$ and let $I$ be an interval. Given a function $f : \mathbb{R} \to \mathbb{R}$, its ${q}$-variation over the interval ${I}$ is

$\displaystyle \|f\|_{V_q(I)} := \sup_{N} \sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \Big(\sum_{j=1}^{N} |f(t_j) - f(t_{j-1})|^q\Big)^{1/q}.$
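For a function known only through finitely many samples $f(s_0), \ldots, f(s_M)$ with $s_0 < \ldots < s_M$, the supremum in the definition runs over finitely many increasing subsequences and can be computed by dynamic programming. The following is a minimal sketch of this discrete analogue (the restriction to sampled data and the function name are assumptions of the sketch; it does not compute the $q$-variation of a general function).

```python
def q_variation(samples, q):
    """q-variation of a finite sequence: the maximum over increasing index
    chains t_0 < ... < t_N of (sum_j |a[t_j] - a[t_{j-1}]|^q)^(1/q).

    best[i] stores the largest q-th power sum over chains ending at index i
    (a chain consisting of the single index i contributes 0).
    """
    if q < 1:
        raise ValueError("the definition requires q >= 1")
    best = [0.0] * len(samples)
    for i in range(len(samples)):
        for k in range(i):
            candidate = best[k] + abs(samples[i] - samples[k]) ** q
            if candidate > best[i]:
                best[i] = candidate
    return max(best, default=0.0) ** (1.0 / q)


if __name__ == "__main__":
    a = [0, 1, -1, 2, 0, 3]
    for q in (1, 2, 4):
        print(q, q_variation(a, q))
```

For $q = 1$ the optimal chain uses every sample and one recovers the sum of consecutive increments; as $q$ grows the computed value can only decrease, since $\|\cdot\|_{\ell^q} \leq \|\cdot\|_{\ell^p}$ for $p \leq q$, and for a monotone sequence with $q > 1$ the optimum is the single jump between the endpoints.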

Notice that, with respect to the notation above, we have $\|m\|_{V(I)} = \|m\|_{V_1(I)}$. From the fact that $\|\cdot\|_{\ell^q} \leq \|\cdot \|_{\ell^p}$ when $p \leq q$ we see that we have always $\|f\|_{V_q (I)} \leq \|f\|_{V_p(I)}$, and therefore the higher the ${q}$ the less stringent the condition of having bounded ${q}$-variation becomes (this is linked to the Hölder regularity of the function getting worse). In particular, if we wanted to weaken hypothesis 2.) in the Marcinkiewicz multiplier theorem above, we could simply replace it with the condition that for any Littlewood-Paley dyadic interval $L$ we have instead $\|m\|_{V_q(L)} \leq C$. This is indeed what Coifman, Rubio de Francia and Semmes do, and they were able to show the following:

Theorem 2 [Coifman-Rubio de Francia-Semmes, ’88]: Let $q\geq 1$ and let ${m}$ be a function on $\mathbb{R}$ such that

1. $m \in L^\infty$
2. for every Littlewood-Paley dyadic interval $L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k]$ with $k \in \mathbb{Z}$

$\displaystyle \|m\|_{V_q(L)} \leq C.$

Then for any ${1 < p < \infty}$ such that ${\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }$ the multiplier ${T_m}$ defined by $\widehat{T_m f} = m \widehat{f}$ extends to an $L^p \to L^p$ bounded operator,

$\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.$

The statement is essentially the same as before, except that now we are imposing control of the ${q}$-variation instead and as a consequence we have the restriction that our Lebesgue exponent ${p}$ satisfy ${\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }$. Taking a closer look at this condition, we see that when the variation parameter is $1 \leq q \leq 2$ the condition is empty, that is there is no restriction on the range of boundedness of $T_m$: it is still the full range $1 < p < \infty$, and as ${q}$ grows larger and larger the range of boundedness restricts itself to be smaller and smaller around the exponent $p=2$ (for which the multiplier is always necessarily bounded, by Plancherel). This is a very interesting behaviour, which points to the fact that there is a certain dichotomy between variation in the range below 2 and the range above 2, with $2$-variation being the critical case. This is not an isolated case: for example, the Variation Norm Carleson theorem is false for ${q}$-variation with ${q \leq 2}$; similarly, the Lépingle inequality is false for 2-variation and below (and this is related to the properties of Brownian motion).
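Unwinding the condition, $\big|\frac{1}{2} - \frac{1}{p}\big| < \frac{1}{q}$ together with $1 < p < \infty$ says that $\frac{1}{p}$ lies in the open interval with endpoints $\max(0, \frac{1}{2} - \frac{1}{q})$ and $\min(1, \frac{1}{2} + \frac{1}{q})$; for $q > 2$ this is the range $\frac{2q}{q+2} < p < \frac{2q}{q-2}$. A small sketch that computes the endpoints exactly (the function name is hypothetical):

```python
import math
from fractions import Fraction


def boundedness_range(q):
    """Endpoints (p_min, p_max) of the open range of exponents p allowed by
    |1/2 - 1/p| < 1/q together with 1 < p < infinity."""
    q = Fraction(q)
    if q < 1:
        raise ValueError("the theorem is stated for q >= 1")
    lower_reciprocal = max(Fraction(0), Fraction(1, 2) - 1 / q)  # 1/p exceeds this
    upper_reciprocal = min(Fraction(1), Fraction(1, 2) + 1 / q)  # 1/p is below this
    p_min = 1 / upper_reciprocal
    p_max = math.inf if lower_reciprocal == 0 else 1 / lower_reciprocal
    return p_min, p_max


if __name__ == "__main__":
    for q in (1, 2, 3, 4, 10):
        print(q, boundedness_range(q))
```

For instance $q = 4$ gives the range $\frac{4}{3} < p < 4$, while any $q \leq 2$ returns the full range $1 < p < \infty$.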

# Representing points in a set in positional-notation fashion (a trick by Bourgain): part II

This is the second and final part of an entry dedicated to a very interesting and inventive trick due to Bourgain. In part I we saw a lemma on maximal Fourier projections due to Bourgain, together with the context it arises from (the study of pointwise ergodic theorems for polynomial sequences); we also saw a baby version of the idea to come, that we used to prove the Rademacher-Menshov theorem (recall that the idea was to represent the indices in the supremum in their binary positional notation form and to rearrange the supremum accordingly). Today we finally get to see Bourgain’s trick.

Before we start, recall the statement of Bourgain’s lemma:

Lemma 1 [Bourgain]: Let $K$ be an integer and let $\Lambda = \{\lambda_1, \ldots, \lambda_K \}$ be a set of ${K}$ distinct frequencies. Define the maximal frequency projections

$\displaystyle \mathcal{B}_\Lambda f(x) := \sup_{j} \Big|\sum_{k=1}^{K} (\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}\Big|,$

where the supremum is restricted to those ${j \geq j_0}$ with $j_0 = j_0(\Lambda)$ being the smallest integer such that $2^{-j_0} \leq \frac{1}{2}\min \{ |\lambda_k - \lambda_{k'}| : 1\leq k\neq k'\leq K \}$.
Then

$\displaystyle \|\mathcal{B}_\Lambda f\|_{L^2} \lesssim (\log \#\Lambda)^2 \|f\|_{L^2}.$

Here we are using the notation $(\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}$ in the statement in place of the expanded formula $\int_{|\xi - \lambda_k| < 2^{-j}} \widehat{f}(\xi) e^{2\pi i \xi x} d\xi$. Observe that by the definition of $j_0$ we have that the intervals $[\lambda_k - 2^{-j_0}, \lambda_k + 2^{-j_0}]$ are disjoint (and, since $j_0$ is the smallest such integer, $2^{-j_0}$ is the largest dyadic radius allowed by this condition).
We will need to do some reductions before we can get to the point where the trick makes its appearance. These reductions are the subject of the next section.
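As a concrete illustration of the definition of $j_0(\Lambda)$, the following sketch computes it for a finite list of frequencies using exact rational arithmetic, to avoid rounding issues when the half-gap is itself a power of 2. The function name and the input convention are my own.

```python
from fractions import Fraction
from itertools import combinations


def smallest_admissible_scale(frequencies):
    """Smallest integer j0 with 2^-j0 <= (1/2) min |lambda_k - lambda_k'|.

    Expects at least two distinct frequencies, given as ints, Fractions or
    decimal strings (converted exactly by Fraction).
    """
    freqs = [Fraction(x) for x in frequencies]
    if len(freqs) < 2:
        raise ValueError("need at least two frequencies")
    half_gap = min(abs(a - b) for a, b in combinations(freqs, 2)) / 2
    if half_gap == 0:
        raise ValueError("frequencies must be distinct")
    j = 0
    while Fraction(2) ** (-j) > half_gap:          # radius too large: refine
        j += 1
    while Fraction(2) ** (-(j - 1)) <= half_gap:   # coarser scale still fits
        j -= 1
    return j


if __name__ == "__main__":
    print(smallest_admissible_scale([0, 1, 3]))
    print(smallest_admissible_scale([0, "0.3"]))
    print(smallest_admissible_scale([0, 8]))
```

Note that $j_0$ can be negative when the frequencies are far apart, in which case the supremum in the lemma also includes some coarse scales $2^{-j} > 1$.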

3. Initial reductions

A first important reduction is that we can safely replace the characteristic functions $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}$ by smooth bump functions with comparable support. Indeed, this is the result of a very standard square-function argument which was already essentially presented in Exercise 22 of the 3rd post on basic Littlewood-Paley theory. Briefly then, let $\varphi$ be a Schwartz function such that $\widehat{\varphi}$ is a smooth bump function compactly supported in the interval $[-1,1]$ and such that $\widehat{\varphi} \equiv 1$ on the interval $[-1/2, 1/2]$. Let $\varphi_j (x) := \frac{1}{2^j} \varphi \Big(\frac{x}{2^j}\Big)$ (so that $\widehat{\varphi_j}(\xi) = \widehat{\varphi}(2^j \xi)$) and let for convenience $\theta_j$ denote the difference $\theta_j := \mathbf{1}_{[-2^{-j}, 2^{-j}]} - \widehat{\varphi_j}$. We have that the difference

$\displaystyle \sup_{j\geq j_0(\Lambda)} \Big|\sum_{k=1}^{K} ((\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} - \widehat{\varphi_j}(\cdot - \lambda_k)) \widehat{f})^{\vee}\Big|$

is an $L^2$ bounded operator with norm $O(1)$ (that is, independent of $K$). Indeed, observe that $\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}(\xi) - \widehat{\varphi_j}(\xi - \lambda_k) = \theta_j (\xi - \lambda_k)$, and bounding the supremum by the $\ell^2$ sum we have that the $L^2$ norm (squared) of the operator above is bounded by

$\displaystyle \sum_{j \geq j_0(\Lambda)} \Big\|\sum_{k=1}^{K} (\theta_j(\cdot - \lambda_k)\widehat{f})^{\vee}\Big\|_{L^2}^2,$

where the summation in ${j}$ is restricted in the same way as the supremum is in the lemma (that is, the intervals $[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]$ must be pairwise disjoint). By an application of Plancherel we see that the above is equal to

$\displaystyle \sum_{k=1}^{K} \Big\| \widehat{f}(\xi) \Big[\sum_{j \geq j_0} \theta_j(\xi - \lambda_k) \Big]\Big\|_{L^2}^2;$

but notice that the functions $\theta_j$ have supports disjoint in ${j}$, and therefore the multiplier satisfies $\sum_{j\geq j_0} \theta_j(\xi - \lambda_k) \lesssim 1$ in a neighbourhood of $\lambda_k$, and vanishes outside such neighbourhood. A final application of Plancherel allows us to conclude that the above is $\lesssim \|f\|_{L^2}^2$ by orthogonality (these neighbourhoods being all disjoint as well).
By triangle inequality, we see therefore that in order to prove Lemma 1 it suffices to prove that the operator

$\displaystyle \sup_{j} \Big|\sum_{k=1}^{K} (\widehat{\varphi_j}(\cdot - \lambda_k) \widehat{f})^{\vee}\Big|$

is $L^2$ bounded with norm at most $O((\log \#\Lambda)^2)$.