Proof of the square-function characterisation of L(log L)^{1/2}: part I

This is a follow-up on the post on the Chang-Wilson-Wolff inequality and how it can be proven using a lemma of Tao-Wright. The latter consists of a square-function characterisation of the Orlicz space L(\log L)^{1/2} analogous in spirit to the better known one for the Hardy spaces.
In this post we will commence the proof of the Tao-Wright lemma, as promised. We will start by showing how the lemma, which is stated for smooth frequency projections, can be deduced from its discrete variant stated in terms of Haar coefficients (or equivalently, martingale differences with respect to the standard dyadic filtration). This is a minor part of the overall argument but it is slightly tricky so I thought I would spell it out.

Recall that the Tao-Wright lemma is as follows. We write \widetilde{\Delta}_j f for the smooth frequency projection defined by \widehat{\widetilde{\Delta}_j f}(\xi) = \widehat{\psi}(2^{-j}\xi) \widehat{f}(\xi) , where \widehat{\psi} is a smooth function compactly supported in 1/2 \leq |\xi| \leq 4 and identically equal to 1 on 1 \leq |\xi| \leq 2 .


Lemma 1 – Square-function characterisation of L(\log L)^{1/2} [Tao-Wright, 2001]:
For any j \in \mathbb{Z}, let

\displaystyle  \phi_j(x) := \frac{2^j}{(1 + 2^j |x|)^{3/2}}

(notice \phi_j is concentrated in [-2^{-j},2^{-j}] and \|\phi_j\|_{L^1} \sim 1).
If the function {f} is in L(\log L)^{1/2}([-R,R]) and such that \int f(x) \,dx = 0 , then there exists a collection (F_j)_{j \in \mathbb{Z}} of non-negative functions such that:

  1. pointwise for any j \in \mathbb{Z}

    \displaystyle \big|\widetilde{\Delta}_j f\big| \lesssim F_j \ast \phi_j ;

  2. they satisfy the square-function estimate

    \displaystyle  \Big\|\Big(\sum_{j \in \mathbb{Z}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}}.
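As a quick sanity check on the normalisation of \phi_j : the change of variables u = 2^j x gives \int_{-R}^{R} \phi_j(x)\,dx = 4\big(1 - (1 + 2^j R)^{-1/2}\big) , so the total mass is 4 for every j and a fixed fraction 1 - 2^{-1/2} of it sits in [-2^{-j},2^{-j}] . The Python sketch below compares this closed form with a crude numerical integration; the function names and the choice of the midpoint rule are assumptions made here for illustration and are not part of the lemma.

```python
def phi(j, x):
    """The kernel phi_j(x) = 2^j / (1 + 2^j |x|)^(3/2) from Lemma 1."""
    s = 2.0 ** j
    return s / (1.0 + s * abs(x)) ** 1.5


def mass_numeric(j, radius, n=100_000):
    """Midpoint-rule approximation of the integral of phi_j over [-radius, radius]."""
    h = radius / n
    half = sum(phi(j, (k + 0.5) * h) for k in range(n)) * h
    return 2.0 * half  # phi_j is even, so integrate over [0, radius] and double


def mass_exact(j, radius):
    """Closed form 4 * (1 - (1 + 2^j * radius)^(-1/2)), obtained via u = 2^j x."""
    return 4.0 * (1.0 - (1.0 + 2.0 ** j * radius) ** -0.5)


for j in (-3, 0, 5):
    radius = 1000.0 * 2.0 ** -j  # the same window at every scale, in units of 2^-j
    print(j, mass_numeric(j, radius), mass_exact(j, radius))
```

The printed values should not depend on j, which is the scale invariance behind \|\phi_j\|_{L^1} \sim 1 ; the slow convergence to 4 as the window grows reflects the heavy |x|^{-3/2} tail.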


The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space L(\log L)^{1/2} .

1. The Chang-Wilson-Wolff inequality

Consider the collection \mathcal{D} of standard dyadic intervals that are contained in [0,1] . We let \mathcal{D}_j for each j \in \mathbb{N} denote the subcollection of intervals I \in \mathcal{D} such that |I|= 2^{-j} . Notice that these subcollections generate a filtration of \mathcal{D}, that is (\sigma(\mathcal{D}_j))_{j \in \mathbb{N}}, where \sigma(\mathcal{D}_j) denotes the sigma-algebra generated by the collection \mathcal{D}_j . We can associate to this filtration the conditional expectation operators

\displaystyle  \mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],

and therefore define the martingale differences

\displaystyle  \mathbf{D}_j f:= \mathbf{E}_{j+1} f - \mathbf{E}_{j}f.

With this notation, we have the formal telescopic identity

\displaystyle  f = \mathbf{E}_0 f + \sum_{j \in \mathbb{N}} \mathbf{D}_j f.

Demystification: the expectation \mathbf{E}_j f(x) is simply \frac{1}{|I|} \int_I f(y) \,dy, where I is the unique dyadic interval in \mathcal{D}_j such that x \in I .

Letting f_j := \mathbf{E}_j f for brevity, the sequence of functions (f_j)_{j \in \mathbb{N}} is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is

\displaystyle  \mathbf{E}_{j} f_{j+1} = f_j.
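To make the definitions concrete, here is a minimal Python sketch of \mathbf{E}_j and \mathbf{D}_j for a function that is constant on the dyadic intervals of length 2^{-J} , represented by its list of 2^J values. The representation and the names cond_exp and mart_diff are assumptions of this sketch. It checks the telescopic identity and the martingale property on a small example.

```python
def cond_exp(f, j):
    """E_j f: replace f by its average on each dyadic interval of length 2^-j.

    f is a list of 2^J values, one per dyadic interval of length 2^-J in [0, 1].
    """
    n = len(f)
    block = n >> j  # number of samples inside one interval of D_j
    if block == 0 or block << j != n:
        raise ValueError("need len(f) = 2^J with J >= j")
    out = []
    for start in range(0, n, block):
        avg = sum(f[start:start + block]) / block
        out.extend([avg] * block)
    return out


def mart_diff(f, j):
    """D_j f = E_{j+1} f - E_j f."""
    finer, coarser = cond_exp(f, j + 1), cond_exp(f, j)
    return [a - b for a, b in zip(finer, coarser)]


f = [3, 1, 4, 1, 5, 9, 2, 6]  # J = 3
J = 3

# Telescopic identity: f = E_0 f + sum_{j < J} D_j f (the sum is finite here).
recon = cond_exp(f, 0)
for j in range(J):
    recon = [r + d for r, d in zip(recon, mart_diff(f, j))]
print(recon)

# Martingale property: E_j f_{j+1} = f_j, where f_j = E_j f.
for j in range(J):
    print(j, cond_exp(cond_exp(f, j + 1), j) == cond_exp(f, j))
```

Since the sample values are integers and all averages are over blocks of size a power of 2, the arithmetic here is exact in floating point.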

In the following we will only be interested in functions with zero average, that is functions such that \mathbf{E}_0 f = 0. Given such a function f : [0,1] \to \mathbb{R} then, we can define its martingale square function S_{\mathcal{D}}f to be

\displaystyle  S_{\mathcal{D}} f := \Big(\sum_{j \in \mathbb{N}} |\mathbf{D}_j f|^2 \Big)^{1/2}.

With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.

C-W-W inequality: Let {f : [0,1] \to \mathbb{R}} be such that \mathbf{E}_0 f = 0. For any {2\leq p < \infty} it holds that

\displaystyle  \boxed{\|f\|_{L^p([0,1])} \lesssim p^{1/2}\, \|S_{\mathcal{D}}f\|_{L^p([0,1])}.} \ \ \ \ \ \ (\text{CWW}_1)

An important point about the above inequality is the behaviour of the constant in the Lebesgue exponent {p} , which is sharp. This can be seen by taking a “lacunary” function {f} (essentially one where \mathbf{D}_jf = a_j \in \mathbb{C} , a constant) and randomising the signs using Khintchine’s inequality (indeed, {p^{1/2}} is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which [0,1] is replaced by the unit cube [0,1]^d and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
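The “lacunary” example mentioned above can be made explicit with Rademacher functions: if r_j equals +1 on the left half and -1 on the right half of each I \in \mathcal{D}_j , then \mathbf{D}_j r_k = r_j when k = j and 0 otherwise, so f = \sum_j a_j r_j has \mathbf{D}_j f = a_j r_j and S_{\mathcal{D}} f is the constant (\sum_j |a_j|^2)^{1/2} . The Python sketch below verifies this on a finite grid; the discretisation, the coefficients and the function names are assumptions made for illustration.

```python
import math


def rademacher(j, J):
    """r_j sampled on the 2^J dyadic intervals of length 2^-J (requires j < J)."""
    return [1.0 if ((i >> (J - 1 - j)) & 1) == 0 else -1.0 for i in range(2 ** J)]


def cond_exp(f, j):
    """E_j f for f given by its values on the finest dyadic intervals."""
    n = len(f)
    block = n >> j
    out = []
    for start in range(0, n, block):
        out.extend([sum(f[start:start + block]) / block] * block)
    return out


def square_function(f, J):
    """S_D f = (sum_{j < J} |D_j f|^2)^(1/2); D_j f = 0 for j >= J on this grid."""
    total = [0.0] * len(f)
    for j in range(J):
        finer, coarser = cond_exp(f, j + 1), cond_exp(f, j)
        total = [t + (a - b) ** 2 for t, a, b in zip(total, finer, coarser)]
    return [math.sqrt(t) for t in total]


J = 4
a = [1.0, 0.5, 0.25, 0.125]  # hypothetical coefficients a_j
r = [rademacher(j, J) for j in range(J)]
f = [sum(a[j] * r[j][i] for j in range(J)) for i in range(2 ** J)]

print(cond_exp(f, 0)[0])                   # the average E_0 f
print(set(square_function(f, J)))          # a single value: S_D f is constant
print(math.sqrt(sum(c * c for c in a)))    # the l^2 norm of the coefficients
```

For such f the inequality (\text{CWW}_1) reduces to Khintchine’s inequality for the signs r_j , which is where the sharpness of p^{1/2} comes from.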


Bourgain's proof of the spherical maximal function theorem

Recently I have presented Stein’s proof of the boundedness of the spherical maximal function: it was in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

\displaystyle \mathscr{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,

where A_t denotes the spherical average at radius {t} , that is

\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),

where d\sigma_{d-1} denotes the spherical measure on the (d-1) -dimensional sphere (we will omit the subscript from now on and just write d\sigma since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: The maximal operator \mathscr{M}_{\mathbb{S}^{d-1}} is L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d) bounded for any \frac{d}{d-1} < p \leq \infty .

There is however an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler, in that instead of splitting the function into countably many dyadic frequency pieces it splits the spherical measure into two frequency pieces only. The other ingredients in the two proofs are otherwise pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus: remember that Stein’s argument essentially shows L^p \to L^p boundedness of the operator for every 2 \geq p > \frac{d}{d-1} quite directly; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for \mathscr{M}_{\mathbb{S}^{d-1}} ! The latter means that for any measurable E of finite (Lebesgue) measure we have

\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathscr{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)


which is exactly the L^{d/(d-1)} \to L^{d/(d-1),\infty} inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the L^{d/(d-1),1} \to L^{d/(d-1),\infty} inequality). The downside of Bourgain’s argument is that it only works in dimension d \geq 4 , and thus misses the dimension d=3 that is instead covered by Stein’s theorem.

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known – it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (III) on his positional-notation trick for sets.


Marcinkiewicz-type multiplier theorem for q-variation (q > 1)

Not long ago we discussed one of the main direct applications of the Littlewood-Paley theory, namely the Marcinkiewicz multiplier theorem. Recall that the single-variable version of this theorem can be formulated as follows:

Theorem 1 [Marcinkiewicz multiplier theorem]: Let {m} be a function on \mathbb{R} such that

  1. m \in L^\infty
  2. for every Littlewood-Paley dyadic interval L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k] with k \in \mathbb{Z}

    \displaystyle \|m\|_{V(L)} \leq C,

    where \|m\|_{V(L)} denotes the total variation of {m} over the interval L .

Then for any {1 < p < \infty} the multiplier {T_m} defined by \widehat{T_m f} = m \widehat{f} for functions f \in L^2(\mathbb{R}) extends to an L^p \to L^p bounded operator,

\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.

You should also recall that the total variation \|m\|_{V(I)} of {m} over an interval I is defined as

\displaystyle \sup_{N}\sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \sum_{j=1}^{N} |m(t_j) - m(t_{j-1})|,

and if {m} is absolutely continuous then {m'} exists as a measurable function and the total variation over the interval I is given equivalently by \int_{I} |m'(\xi)|d\xi . We have seen that the “dyadic total variation condition” 2.) above is to be seen as a generalisation of the pointwise condition |m'(\xi)|\lesssim |\xi|^{-1} , which in dimension 1 happens to coincide with the classical differential Hörmander condition (in higher dimensions the pointwise Marcinkiewicz conditions are of product type, while the pointwise Hörmander(-Mikhlin) conditions are of radial type; see the relevant post). Thus the Marcinkiewicz multiplier theorem in dimension 1 can deal with multipliers whose symbol is somewhat rougher than being differentiable. It is interesting to ask how much rougher the symbols can get while still preserving their L^p mapping properties (or maybe giving up some range – recall though that the range of boundedness for multipliers must be symmetric around 2 because multipliers are self-adjoint).

Coifman, Rubio de Francia and Semmes came up with an answer to this question that is very interesting. They generalise the Marcinkiewicz multiplier theorem (in dimension 1) to multipliers that have bounded {q} -variation with {q} > 1. Let us define this quantity rigorously.

Definition: Let q \geq 1 and let I be an interval. Given a function f : \mathbb{R} \to \mathbb{R}, its {q} -variation over the interval {I} is

\displaystyle \|f\|_{V_q(I)} := \sup_{N} \sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \Big(\sum_{j=1}^{N} |f(t_j) - f(t_{j-1})|^q\Big)^{1/q}.

Notice that, with respect to the notation above, we have \|m\|_{V(I)} = \|m\|_{V_1(I)} . From the fact that \|\cdot\|_{\ell^q} \leq \|\cdot \|_{\ell^p} when p \leq q we see that we have always \|f\|_{V_q (I)} \leq \|f\|_{V_p(I)} , and therefore the higher the {q} the less stringent the condition of having bounded {q} -variation becomes (this is linked to the Hölder regularity of the function getting worse). In particular, if we wanted to weaken hypothesis 2.) in the Marcinkiewicz multiplier theorem above, we could simply replace it with the condition that for any Littlewood-Paley dyadic interval L we have instead \|m\|_{V_q(L)} \leq C . This is indeed what Coifman, Rubio de Francia and Semmes do, and they were able to show the following:
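To get a feeling for the definition, one can compute the {q} -variation of a finite list of samples exactly: the supremum over partitions becomes a maximum over increasing subsequences of indices, which a simple dynamic programme finds in quadratic time. The Python sketch below does this; the function name is hypothetical, and for an actual function the value obtained from samples is only a lower bound for \|f\|_{V_q(I)} .

```python
def q_variation(values, q):
    """Exact q-variation of a finite sequence: max over all subsequences.

    best[i] is the largest value of sum |v[t_k] - v[t_{k-1}]|^q over chains of
    indices t_0 < ... < t_N that end at t_N = i.
    """
    if q < 1:
        raise ValueError("q must be at least 1")
    best = [0.0] * len(values)
    for i in range(1, len(values)):
        best[i] = max(best[k] + abs(values[i] - values[k]) ** q for k in range(i))
    return max(best, default=0.0) ** (1.0 / q)


oscillating = [0, 1, 0, 1]
monotone = [0, 1, 2, 3]

for q in (1, 2, 4):
    print(q, q_variation(oscillating, q), q_variation(monotone, q))
```

For the monotone samples the coarsest partition is optimal and every {q} -variation equals the total increase, while for the oscillating samples the value decreases as {q} grows, in agreement with \|f\|_{V_q (I)} \leq \|f\|_{V_p(I)} for p \leq q .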

Theorem 2 [Coifman-Rubio de Francia-Semmes, ’88]: Let q\geq 1 and let {m} be a function on \mathbb{R} such that

  1. m \in L^\infty
  2. for every Littlewood-Paley dyadic interval L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k] with k \in \mathbb{Z}

    \displaystyle \|m\|_{V_q(L)} \leq C.

Then for any {1 < p < \infty} such that {\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} } the multiplier {T_m} defined by \widehat{T_m f} = m \widehat{f} extends to an L^p \to L^p bounded operator,

\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.

The statement is essentially the same as before, except that now we are imposing control of the {q} -variation instead and as a consequence we have the restriction that our Lebesgue exponent {p} satisfy {\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }. Taking a closer look at this condition, we see that when the variation parameter is 1 \leq q \leq 2 the condition is empty, that is there is no restriction on the range of boundedness of T_m : it is still the full range {1} < {p} < \infty , and as {q} grows larger and larger the range of boundedness restricts itself to be smaller and smaller around the exponent p=2 (for which the multiplier is always necessarily bounded, by Plancherel). This is a very interesting behaviour, which points to the fact that there is a certain dichotomy between variation in the range below 2 and the range above 2, with 2 -variation being the critical case. This is not an isolated case: for example, the Variation Norm Carleson theorem is false for {q} -variation with {q \leq 2} ; similarly, the Lépingle inequality is false for 2-variation and below (and this is related to the properties of Brownian motion).
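The range of exponents can be made explicit: for q > 2 the condition \big|\frac{1}{2} - \frac{1}{p}\big| < \frac{1}{q} is equivalent to \frac{2q}{q+2} < p < \frac{2q}{q-2} , while for 1 \leq q \leq 2 it holds for every 1 < p < \infty . The small Python helper below (the name crs_range is hypothetical) just encodes this arithmetic with exact fractions.

```python
from fractions import Fraction


def crs_range(q):
    """Open interval of p with |1/2 - 1/p| < 1/q, as a pair (low, high).

    high is None when the interval is unbounded, i.e. the full range (1, infinity).
    """
    q = Fraction(q)
    if q < 1:
        raise ValueError("q must be at least 1")
    if q <= 2:
        return (Fraction(1), None)
    return (2 * q / (q + 2), 2 * q / (q - 2))


for q in (1, 2, 3, 4, 10):
    print(q, crs_range(q))
```

The two endpoints for q > 2 are conjugate exponents of each other, consistent with the range being symmetric around p = 2 in the sense of duality.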


Basic Littlewood-Paley theory III: applications

This is the last part of a 3-part series on the basics of Littlewood-Paley theory. Today we discuss a couple of applications, namely the Marcinkiewicz multiplier theorem and the boundedness of the spherical maximal function (the latter being an application of frequency decompositions in general, and not so much of square functions – though one appears, but only for L^2 estimates where one does not need the sophistication of Littlewood-Paley theory).
Part I: frequency projections
Part II: square functions

7. Applications of Littlewood-Paley theory

In this section we will present two applications of the Littlewood-Paley theory developed so far. You can find further applications in the exercises (see particularly Exercise 22 and Exercise 23).

7.1. Marcinkiewicz multipliers

Given an {L^\infty (\mathbb{R}^d)} function {m}, one can define the operator {T_m} given by

\displaystyle  \widehat{T_m f}(\xi) := m(\xi) \widehat{f}(\xi)

for all {f \in L^2(\mathbb{R}^d)}. The operator {T_m} is called a multiplier and the function {m} is called the symbol of the multiplier. Since {m \in L^\infty}, Plancherel’s theorem shows that {T_m} is a linear operator bounded in {L^2}; its definition can then be extended to {L^2 \cap L^p} functions (which are dense in {L^p}). A natural question to ask is: for which values of {p} in {1 \leq p \leq \infty} is the operator {T_m} an {L^p \rightarrow L^p} bounded operator? When {T_m} is bounded in a certain {L^p} space, we say that it is an {L^p} multiplier.

The operator {T_m} introduced in Section 1 of the first post in this series is an example of a multiplier, with symbol {m(\xi,\tau) = \tau / (\tau - 2\pi i |\xi|^2)}. It is the linear operator that satisfies the formal identity T_m \circ (\partial_t - \Delta) = \partial_t . We have seen that it cannot be a (euclidean) Calderón-Zygmund operator, and thus in particular it cannot be a Hörmander-Mikhlin multiplier. This can be seen more directly by the fact that any Hörmander-Mikhlin condition of the form {|\partial^{\alpha}m(\xi,\tau)| \lesssim_\alpha |(\xi,\tau)|^{-|\alpha|} = (|\xi|^2 + \tau^2)^{-|\alpha|/2}} is clearly incompatible with the rescaling invariance of the symbol {m}, which satisfies {m(\lambda \xi, \lambda^2 \tau) = m(\xi,\tau)} for any {\lambda \neq 0}. However, the derivatives of {m} actually satisfy some other superficially similar conditions that are of interest to us. Indeed, letting {(\xi,\tau) \in \mathbb{R}^2} for simplicity, we can see for example that {\partial_\xi \partial_\tau m(\xi, \tau) = \lambda^3 \partial_\xi \partial_\tau m(\lambda\xi, \lambda^2\tau)}. When {|\tau|\lesssim |\xi|^2} we can therefore argue that {|\partial_\xi \partial_\tau m(\xi, \tau)| = |\xi|^{-3} |\partial_\xi \partial_\tau m(1, \tau |\xi|^{-2})| \lesssim |\xi|^{-1} |\tau|^{-1} \sup_{|\eta|\lesssim 1} |\partial_\xi \partial_\tau m(1, \eta)|}, and similarly when {|\tau|\gtrsim |\xi|^2}; this shows that for any {(\xi, \tau)} with {\xi,\tau \neq 0} one has

\displaystyle  |\partial_\xi \partial_\tau m(\xi, \tau)| \lesssim |\xi|^{-1} |\tau|^{-1}.

This condition is comparable with the corresponding Hörmander-Mikhlin condition only when {|\xi| \sim |\tau|}, and is vastly different otherwise, being of product type (also notice that the inequality above is compatible with the rescaling invariance of {m}, as it should be).
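For this particular symbol everything can be checked by hand: a direct computation (done for this sketch, so treat it as an assumption to be verified) gives \partial_\xi \partial_\tau m(\xi,\tau) = 4\pi \xi (2\pi \xi^2 - i\tau)/(\tau - 2\pi i \xi^2)^3 , hence |\xi| |\tau| |\partial_\xi \partial_\tau m| = 2ab/(a^2 + b^2) \leq 1 with a = |\tau| and b = 2\pi \xi^2 . The Python sketch below compares the closed form with a finite difference, tests the product-type bound on a grid, and shows the Hörmander-Mikhlin quantity |(\xi,\tau)|^2 |\partial_\xi \partial_\tau m| growing along the parabola \tau = 2\pi \xi^2 . All function names are hypothetical.

```python
import math


def m(xi, tau):
    """The symbol m(xi, tau) = tau / (tau - 2 pi i xi^2) in dimension 1 + 1."""
    return tau / (tau - 2j * math.pi * xi ** 2)


def mixed_derivative(xi, tau):
    """Closed form of d_xi d_tau m, computed by hand for this sketch."""
    d = tau - 2j * math.pi * xi ** 2
    return 4 * math.pi * xi * (2 * math.pi * xi ** 2 - 1j * tau) / d ** 3


def mixed_derivative_fd(xi, tau, h=1e-4):
    """Central finite-difference approximation of d_xi d_tau m."""
    return (m(xi + h, tau + h) - m(xi + h, tau - h)
            - m(xi - h, tau + h) + m(xi - h, tau - h)) / (4 * h * h)


def product_quantity(xi, tau):
    """|xi| |tau| |d_xi d_tau m|, which should stay bounded."""
    return abs(xi) * abs(tau) * abs(mixed_derivative(xi, tau))


def hm_quantity(xi, tau):
    """|(xi, tau)|^2 |d_xi d_tau m|, which a Hormander-Mikhlin bound would control."""
    return (xi ** 2 + tau ** 2) * abs(mixed_derivative(xi, tau))


print(abs(mixed_derivative(1.0, 2.0) - mixed_derivative_fd(1.0, 2.0)))
print(abs(m(3.0 * 0.7, 9.0 * 1.3) - m(0.7, 1.3)))  # rescaling invariance, lambda = 3

grid = [s * 10.0 ** e for e in range(-3, 4) for s in (-1.0, 1.0)]
print(max(product_quantity(xi, tau) for xi in grid for tau in grid))

for xi in (1.0, 10.0, 100.0):
    print(xi, hm_quantity(xi, 2 * math.pi * xi ** 2))
```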

Basic Littlewood-Paley theory II: square functions

This is the second part of the series on basic Littlewood-Paley theory, which has been extracted from some lecture notes I wrote for a masterclass. In this part we will prove the Littlewood-Paley inequalities, namely that for any {1 < p < \infty} it holds that

\displaystyle \|f\|_{L^p (\mathbb{R})} \sim_p \Big\|\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}\Big\|_{L^p (\mathbb{R})}. \ \ \ \ \ (\dagger)


This time there are also plenty more exercises, some of which I think are fairly interesting (one of them is a theorem of Rudin in disguise).
Part I: frequency projections.

4. Smooth square function

In this subsection we will consider a variant of the square function appearing at the right-hand side of (\dagger ) where we replace the frequency projections {\Delta_j} by better behaved ones.

Let {\psi} denote a smooth function with the properties that {\psi} is compactly supported in the intervals {[-4,-1/2] \cup [1/2, 4]} and is identically equal to {1} on the intervals {[-2,-1] \cup [1,2]}. We define the smooth frequency projections {\widetilde{\Delta}_j} by stipulating

\displaystyle  \widehat{\widetilde{\Delta}_j f}(\xi) := \psi(2^{-j} \xi) \widehat{f}(\xi);

notice that the function {\psi(2^{-j} \xi)} is supported in {[-2^{j+2},-2^{j-1}] \cup [2^{j-1}, 2^{j+2}]} and identically {1} in {[-2^{j+1},-2^{j}] \cup [2^{j}, 2^{j+1}]}. The reason why such projections are better behaved resides in the fact that the functions {\psi(2^{-j}\xi)} are now smooth, unlike the characteristic functions {\mathbf{1}_{[2^j,2^{j+1}]}}. Indeed, they are actually Schwartz functions and you can see by Fourier inversion formula that {\widetilde{\Delta}_j f = f \ast (2^{j} \widehat{\psi}(2^{j}\cdot))}; the convolution kernel {2^{j} \widehat{\psi}(2^{j}\cdot)} is uniformly in {L^1} and therefore the operator is trivially {L^p \rightarrow L^p} bounded for any {1 \leq p \leq \infty} by Young’s inequality, without having to resort to the boundedness of the Hilbert transform.
We will show that the following smooth analogue of (one half of) (\dagger ) is true (you can study the other half in Exercise 6).

Proposition 3 Let {\widetilde{S}} denote the square function

\displaystyle  \widetilde{S}f := \Big(\sum_{j \in \mathbb{Z}} \big|\widetilde{\Delta}_j f \big|^2\Big)^{1/2}.

Then for any {1 < p < \infty} we have that the inequality

\displaystyle  \big\|\widetilde{S}f\big\|_{L^p(\mathbb{R})} \lesssim_p \|f\|_{L^p(\mathbb{R})} \ \ \ \ \ (1)

holds for any {f \in L^p(\mathbb{R})}.

We will give two proofs of this fact, to illustrate different techniques. We remark that the boundedness will depend on the smoothness and the support properties of {\psi} only, and as such extends to a larger class of square functions.

Basic Littlewood-Paley theory I: frequency projections

I have written some notes on Littlewood-Paley theory for a masterclass, which I thought I would share here as well. This is the first part, covering some motivation, the case of a single frequency projection and its vector-valued generalisation. References I have used in preparing these notes include Stein’s “Singular integrals and differentiability properties of functions”, Duoandikoetxea’s “Fourier Analysis”, Grafakos’ “Classical Fourier Analysis” and as usual some material by Tao, both from his blog and the notes for his courses. Prerequisites are some basic Fourier transform theory, Calderón-Zygmund theory of euclidean singular integrals and its vector-valued generalisation (to Hilbert spaces, we won’t need Banach spaces).

0. Introduction
Harmonic analysis makes fundamental use of divide-et-impera approaches. A particularly fruitful one is the decomposition of a function in terms of the frequencies that compose it, which is prominently incarnated in the theory of the Fourier transform and Fourier series. In many applications however it is not necessary or even useful to resolve the function {f} at the level of single frequencies and it suffices instead to consider how wildly different frequency components behave. One example of this is the (formal) decomposition of functions on {\mathbb{R}} given by

\displaystyle f = \sum_{j \in \mathbb{Z}} \Delta_j f,

where {\Delta_j f} denotes the operator

\displaystyle \Delta_j f (x) := \int_{\{\xi \in \mathbb{R} : 2^j \leq |\xi| < 2^{j+1}\}} \widehat{f}(\xi) e^{2\pi i \xi \cdot x} d\xi,

commonly referred to as a (dyadic) frequency projection. Thus {\Delta_j f} represents the portion of {f} with frequencies of magnitude {\sim 2^j}. The Fourier inversion formula can be used to justify the above decomposition if, for example, {f \in L^2(\mathbb{R})}. Heuristically, since any two {\Delta_j f, \Delta_{k} f} oscillate at significantly different frequencies when {|j-k|} is large, we would expect that for most {x}’s the different contributions to the sum cancel out more or less randomly; a probabilistic argument typical of random walks (see Exercise 1) leads to the conjecture that {|f|} should behave “most of the time” like {\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}} (the last expression is an example of a square function). While this is not true in a pointwise sense, we will see in these notes that the two are indeed interchangeable from the point of view of {L^p}-norms: more precisely, we will show that for any {1 < p < \infty} it holds that

\displaystyle  \boxed{ \|f\|_{L^p (\mathbb{R})} \sim_p \Big\|\Big(\sum_{j \in \mathbb{Z}} |\Delta_j f|^2 \Big)^{1/2}\Big\|_{L^p (\mathbb{R})}. }\ \ \ \ \ (\dagger)

This is a result historically due to Littlewood and Paley, which explains the name given to the related theory. The {p=2} case follows immediately from Plancherel’s theorem, to which the statement is essentially equivalent. Therefore one could interpret the above as a substitute for Plancherel’s theorem in generic {L^p} spaces when {p\neq 2}.
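The {p=2} case can be illustrated numerically in a discrete model, where {\mathbb{R}} is replaced by the cyclic group \mathbb{Z}/N\mathbb{Z} and the Fourier transform by the discrete Fourier transform. This replacement is an assumption of the sketch and not the setting of these notes; the zero frequency has to be treated separately since no dyadic block contains it. The pieces \Delta_j f have disjoint frequency supports, so they are orthogonal and the squares of their norms add up.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (quadratic cost, fine for small N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def signed_freq(k, n):
    """Representative of the frequency k in the range [-n/2, n/2)."""
    return k if k < n // 2 else k - n


def dyadic_projection(x, j):
    """Discrete analogue of Delta_j: keep frequencies with 2^j <= |k| < 2^(j+1)."""
    n = len(x)
    coeffs = dft(x)
    kept = [c if 2 ** j <= abs(signed_freq(k, n)) < 2 ** (j + 1) else 0
            for k, c in enumerate(coeffs)]
    return idft(kept)


n = 32
x = [float(((7 * t * t + 3 * t) % 11) - 5) for t in range(n)]  # arbitrary test data
mean = sum(x) / n                                              # zero-frequency part
pieces = [dyadic_projection(x, j) for j in range(5)]           # |k| from 1 to 16

recon = [mean + sum(p[t] for p in pieces) for t in range(n)]
print(max(abs(recon[t] - x[t]) for t in range(n)))             # reconstruction error

energy_pieces = n * mean ** 2 + sum(sum(abs(v) ** 2 for v in p) for p in pieces)
print(energy_pieces, sum(v * v for v in x))                    # the two should agree
```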

In developing a framework that allows us to prove (\dagger ) we will encounter some variants of the square function above, including ones with smoother frequency projections that are useful in a variety of contexts. We will moreover show some applications of the above fact and its variants. One of these applications will be a proof of the boundedness of the spherical maximal function {\mathscr{M}_{\mathbb{S}^{d-1}}} (almost verbatim the one on Tao’s blog).

Notation: We will use {A \lesssim B} to denote the estimate {A \leq C B} where {C>0} is some absolute constant, and {A\sim B} to denote the fact that {A \lesssim B \lesssim A}. If the constant {C} depends on a list of parameters {L} we will write {A \lesssim_L B}.
