# The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space $L(\log L)^{1/2}$.

## 1. The Chang-Wilson-Wolff inequality

Consider the collection $\mathcal{D}$ of standard dyadic intervals that are contained in $[0,1]$. We let $\mathcal{D}_j$ for each $j \in \mathbb{N}$ denote the subcollection of intervals $I \in \mathcal{D}$ such that $|I|= 2^{-j}$. Notice that these subcollections generate a filtration of $\mathcal{D}$, that is $(\sigma(\mathcal{D}_j))_{j \in \mathbb{N}}$, where $\sigma(\mathcal{D}_j)$ denotes the sigma-algebra generated by the collection $\mathcal{D}_j$. We can associate to this filtration the conditional expectation operators

$\displaystyle \mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],$

and therefore define the martingale differences

$\displaystyle \mathbf{D}_j f:= \mathbf{E}_{j+1} f - \mathbf{E}_{j}f.$

With this notation, we have the formal telescopic identity

$\displaystyle f = \mathbf{E}_0 f + \sum_{j \in \mathbb{N}} \mathbf{D}_j f.$

Demystification: the expectation $\mathbf{E}_j f(x)$ is simply $\frac{1}{|I|} \int_I f(y) \,dy$, where $I$ is the unique dyadic interval in $\mathcal{D}_j$ such that $x \in I$.

Letting $f_j := \mathbf{E}_j f$ for brevity, the sequence of functions $(f_j)_{j \in \mathbb{N}}$ is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is

$\displaystyle \mathbf{E}_{j} f_{j+1} = f_j.$

In the following we will only be interested in functions with zero average, that is functions such that $\mathbf{E}_0 f = 0$. Given such a function $f : [0,1] \to \mathbb{R}$ then, we can define its martingale square function $S_{\mathcal{D}}f$ to be

$\displaystyle S_{\mathcal{D}} f := \Big(\sum_{j \in \mathbb{N}} |\mathbf{D}_j f|^2 \Big)^{1/2}.$
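To make these definitions concrete, here is a minimal Python sketch. It assumes (my simplification, not part of the discussion above) that $f$ is a step function, constant on the $2^J$ dyadic cells of length $2^{-J}$, so that it can be stored as a list of values; the helper names `cond_exp`, `martingale_differences` and `square_function` are hypothetical.

```python
import math

def cond_exp(values, j):
    """E_j f for a step function f given by its values on 2**J equal cells of [0,1]."""
    n = len(values)
    block = n >> j  # number of fine cells inside each dyadic interval of length 2**-j
    out = []
    for start in range(0, n, block):
        avg = sum(values[start:start + block]) / block
        out.extend([avg] * block)
    return out

def martingale_differences(values):
    """List [D_0 f, ..., D_{J-1} f], each sampled on the fine grid."""
    J = len(values).bit_length() - 1  # len(values) is assumed to be 2**J
    levels = [cond_exp(values, j) for j in range(J + 1)]
    return [[a - b for a, b in zip(levels[j + 1], levels[j])] for j in range(J)]

def square_function(values):
    """S_D f sampled on the fine grid."""
    diffs = martingale_differences(values)
    return [math.sqrt(sum(d[k] ** 2 for d in diffs)) for k in range(len(values))]

f = [3.0, 1.0, -2.0, -2.0]             # mean zero, so E_0 f = 0
diffs = martingale_differences(f)      # D_0 f = [2, 2, -2, -2], D_1 f = [1, -1, 0, 0]
telescoped = [sum(d[k] for d in diffs) for k in range(len(f))]
print(telescoped)                      # the telescoping sum recovers f, since E_0 f = 0
print(square_function(f))
```

For a step function at level $J$ the telescoping sum is finite, which is why the sketch can recover $f$ exactly from its martingale differences.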

With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.

C-W-W inequality: Let ${f : [0,1] \to \mathbb{R}}$ be such that $\mathbf{E}_0 f = 0$. For any ${2\leq p < \infty}$ it holds that

$\displaystyle \boxed{\|f\|_{L^p([0,1])} \lesssim p^{1/2}\, \|S_{\mathcal{D}}f\|_{L^p([0,1])}.} \ \ \ \ \ \ (\text{CWW}_1)$

An important point about the above inequality is the behaviour of the constant in the Lebesgue exponent ${p}$, which is sharp. This can be seen by taking a “lacunary” function ${f}$ (essentially one where $\mathbf{D}_jf = a_j \in \mathbb{C}$, a constant) and randomising the signs using Khintchine’s inequality (indeed, ${p^{1/2}}$ is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which $[0,1]$ is replaced by the unit cube $[0,1]^d$ and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
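The $p^{1/2}$ growth can be observed numerically on the simplest lacunary example. The sketch below assumes the model $f = N^{-1/2}(r_1 + \dots + r_N)$ with $r_j$ the Rademacher functions (my choice of example), for which $|\mathbf{D}_j f|$ is constant and $S_{\mathcal{D}}f \equiv 1$; the function name is hypothetical, and $p$ is restricted to even integers so that the moments can be computed exactly from binomial coefficients.

```python
import math

def lp_norm_of_normalised_rademacher_sum(N, p):
    """L^p([0,1]) norm of f = N**-0.5 * (r_1 + ... + r_N), for an even integer p.

    The sum r_1 + ... + r_N takes the value N - 2k on a set of measure C(N, k) / 2**N.
    """
    moment = sum(math.comb(N, k) * (N - 2 * k) ** p for k in range(N + 1)) / 2 ** N
    return moment ** (1.0 / p) / math.sqrt(N)

N = 400
for p in (2, 4, 8, 16):
    norm = lp_norm_of_normalised_rademacher_sum(N, p)
    # Since S_D f = 1 here, norm / sqrt(p) is the ratio ||f||_p / (p**0.5 * ||S_D f||_p).
    print(p, norm, norm / math.sqrt(p))
```

The ratio in the last column stays of the same order as $p$ grows, which is consistent with the constant in (CWW1) being of size $p^{1/2}$ and no smaller.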

This inequality was proven in “Some weighted norm inequalities concerning the Schrödinger operators” by Chang, Wilson and Wolff. However, you won’t find it in there in the form (CWW1) written above; rather, you will find (essentially formula (3.1) in their paper) the following equivalent distributional inequality:

$\displaystyle \boxed{ |\{x \in [0,1] \,: \, |f(x) - \mathbf{E}_0 f| \geq \lambda\}| \leq e^{-c \lambda^{2} / \|S_{\mathcal{D}}f\|_{L^{\infty}}^2} } \ \ \ \ \ \ \ (\text{CWW}_2)$

That $(\text{CWW}_1) \Leftrightarrow (\text{CWW}_2)$ is not at all elementary, and there are other versions that are all referred to as “Chang-Wilson-Wolff inequality” in the literature. In the interest of clarity, we describe them and their relations below.
Another inequality that is called C-W-W inequality is the endpoint inequality (again, assuming $\mathbf{E}_0 f = 0$)

$\displaystyle \boxed{ \|f\|_{\exp(L^2)} \lesssim \|S_{\mathcal{D}}f\|_{L^\infty}, } \ \ \ \ \ \ \ (\text{CWW}_3)$

where $\exp(L^2)$ denotes the Orlicz space with norm

$\displaystyle \|f\|_{\exp(L^2)} := \inf \Big\{\mu > 0 \, : \, \int_{0}^{1} e^{|f(x)|^2 / \mu^2} \,dx \leq 2 \Big\}$

(this space is dual to $L(\log L)^{1/2}$, something that will be useful later). Yet another variant is the version of (CWW1) where we replace $\|S_{\mathcal{D}}f\|_{L^p}$ with $\|S_{\mathcal{D}}f\|_{L^\infty}$ (assuming yet again that $\mathbf{E}_0 f = 0$ and that ${2 \leq p < \infty}$):

$\displaystyle \boxed{ \|f\|_{L^p([0,1])} \lesssim p^{1/2} \, \|S_{\mathcal{D}}f\|_{L^\infty([0,1])}. } \ \ \ \ \ \ \ (\text{CWW}_4)$
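To get a feel for the $\exp(L^2)$ norm appearing in (CWW3), here is a small Python sketch that evaluates it by bisection. It assumes $f$ is a step function on $[0,1]$ given by its values on equal cells; the function name `exp_l2_norm` and the bracketing interval are my own choices.

```python
import math

def exp_l2_norm(values, tol=1e-12):
    """Luxemburg norm in exp(L^2) of a step function on [0,1] with equal cells."""
    n = len(values)
    M = max(abs(v) for v in values)
    if M == 0.0:
        return 0.0

    def modular(mu):
        # integral over [0,1] of exp(|f|^2 / mu^2); it is decreasing in mu
        return sum(math.exp((v / mu) ** 2) for v in values) / n

    lo = M / math.sqrt(math.log(2 * n))  # here the largest cell alone contributes 2
    hi = M / math.sqrt(math.log(2))      # here every cell contributes at most 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

f = [0.0, 1.0, -1.0, 3.0]
print(exp_l2_norm(f))
```

The bracket is chosen so that the exponentials never exceed $2n$, which avoids floating-point overflow during the bisection.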

The inequalities $(\text{CWW}_i)$ for $i \in \{1,2,3,4\}$ are all equivalent. In particular, we can show the following relationships:

1. $(\text{CWW}_1) \Rightarrow (\text{CWW}_4)$; this is simply an immediate consequence of Hölder's inequality, that is of the fact that $\|S_{\mathcal{D}}f\|_{L^p([0,1])} \leq \|S_{\mathcal{D}}f\|_{L^{\infty}([0,1])}$ (recall that $[0,1]$ has measure 1).
2. $(\text{CWW}_4) \Rightarrow (\text{CWW}_3)$; this is a cute exercise (actually, the reverse direction is also true, essentially by reversing the argument below). For $\mu$ to be chosen, we Taylor-expand the exponential, and we have by exchanging sum and integral

$\displaystyle \int_{0}^{1} e^{|f(x)|^2 \mu^{-2}} \,dx = \sum_{p \in \mathbb{N}} \frac{1}{p! \mu^{2p}} \int_{0}^{1}|f(x)|^{2p} \,dx,$

which by (CWW4) is bounded by

$\displaystyle \sum_{p \in \mathbb{N}} \frac{C^{2p} \,p^{p}\, \|S_{\mathcal{D}}f\|_{L^\infty}^{2p}}{p! \mu^{2p}} = \sum_{p \in \mathbb{N}} \Big(\frac{C \|S_{\mathcal{D}}f\|_{L^{\infty}}}{\mu}\Big)^{2p} \frac{p^p}{p!};$

by Stirling’s formula we can bound $\frac{p^p}{p!} \lesssim e^{p}$ and therefore we see that we can make the above sum less than 2 if we choose $\mu$ to be a sufficiently large multiple of $\|S_{\mathcal{D}}f\|_{L^{\infty}}$. This implies (CWW3) by definition of Orlicz norm.

3. $(\text{CWW}_3) \Rightarrow (\text{CWW}_2)$; this is just a consequence of Markov/Chebyshev’s inequality, since (remember $\mathbf{E}_0 f = 0$)

$\displaystyle e^{\lambda^2 \mu^{-2}} |\{x \in [0,1] \,: \, |f(x)| \geq \lambda\}| \leq \int_{0}^{1} e^{|f(x)|^2 \mu^{-2}}\,dx,$

and if we choose $\mu = C \|S_{\mathcal{D}}f\|_{L^\infty}$ to be the RHS of (CWW3) then the RHS of the above is $\leq 2$ and (CWW2) follows (up to a harmless constant factor 2 on its right-hand side).

4. $(\text{CWW}_2) \Rightarrow (\text{CWW}_1)$; this is the only non-elementary implication and will be the subject of an upcoming post. Roughly speaking the idea is: show that (CWW2) implies a good-$\lambda$ inequality for $S_{\mathcal{D}}$ with a very good constant (gaussian decay); use some fine properties of $A_1$ weights and the good-$\lambda$ inequality to show the weighted inequality $\|f\|_{L^2(w)} \lesssim [w]_{A_1}^{1/2} \|S_{\mathcal{D}}f\|_{L^2(w)}$; use the weighted inequality and a trick of Rubio de Francia to conclude (CWW1). As you can see, the proof is somewhat convoluted, and will take up an entire post of its own.
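The choice of $\mu$ in implication 2 can be made quantitative. Writing $x = (C \|S_{\mathcal{D}}f\|_{L^\infty} / \mu)^2$, the series to control is $\sum_{p \geq 0} x^p \, p^p / p!$, and since $p^p / p! \leq e^p$ it is dominated by the geometric series $\sum_p (ex)^p = (1 - ex)^{-1}$, which is $\leq 2$ as soon as $x \leq 1/(2e)$. The Python sketch below checks this numerically; the function name and the truncation at 400 terms are my own choices.

```python
import math

def taylor_majorant(x, terms=400):
    """Partial sum of sum_{p>=0} x**p * p**p / p!, for 0 < x < 1/e."""
    total = 1.0  # the p = 0 term
    for p in range(1, terms):
        # work with logarithms to avoid overflow in p**p and p!
        total += math.exp(p * math.log(x) + p * math.log(p) - math.lgamma(p + 1))
    return total

x = 1 / (2 * math.e)        # corresponds to mu = (2e)**0.5 * C * ||S_D f||_inf
print(taylor_majorant(x))   # compare with the geometric bound 1 / (1 - e*x) = 2
```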

Inequality (CWW1) was originally proven by first proving (CWW2) directly using a bit of basic martingale theory (this is a clever argument due to Rubin) and then using the implications above.
In this post we will prove (CWW3) directly by means of a Lemma of Tao-Wright, that we will introduce in the next section. Before we move on though, I would like to comment a little on the motivation behind the C-W-W inequality.

### 1.1. What can you do with the C-W-W inequality?

Chang, Wilson and Wolff used the C-W-W inequality above as a stepping stone to prove a similar inequality for a continuous square function, the Lusin Area Integral. Let ${P}$ denote the Poisson kernel on $\mathbb{R}^d$ (that is $P(x) = c_d (1+|x|^2)^{-(d+1)/2}$) and let $P_t(x):= t^{-d} P(x / t)$, so that if $f \,:\, \mathbb{R}^d \to \mathbb{C}$ then $F(x,t):= f \ast P_t(x)$ is the harmonic extension of ${f}$ to $\mathbb{R}^{d+1}_{+}$; we also let $\psi_t(x) := t \nabla_x P_t(x) = t^{-d} (\nabla P)(x/t)$ and observe that $\int \psi_t(x) \,dx = 0$. Given a fixed ${\gamma \in (0,\infty)}$ we define the Lusin Area Integral of a function ${f}$ to be

$\displaystyle A_\gamma f(x) := \Big(\iint\limits_{\{(y,t) \in \mathbb{R}^{d+1}_{+} : |x-y|\leq \gamma t\}} t^{-d} |\psi_t \ast f(y)|^2 \,dy\, \frac{dt}{t}\Big)^{1/2}.$

Although not immediate to the eye, this square function is an operator in the same spirit as the smooth annular square function described in the 2nd post on Littlewood-Paley theory. This can be seen by converting the $\frac{dt}{t}$ integral into a summation over dyadic scales and considering the fact that the Fourier transform of $\psi_t$ is morally concentrated at frequencies $|\xi| \sim t^{-1}$ (and the integration in $dy$ is a further average at the same scale, which is ${t}$ in space).
Chang, Wilson and Wolff used (CWW2) to prove the exponential integrability property

$\displaystyle \int_{Q} \exp\Big(c_{d,\gamma} \frac{|f(x) - \mathbf{E}_Q f|^2}{\|A_\gamma f\|_{L^\infty}^2}\Big) \,dx \leq C_{d,\gamma}|Q|$

for $Q$ any cube in $\mathbb{R}^d$ (here $\mathbf{E}_Q f$ denotes the average of ${f}$ over $Q$). Compare this with (CWW3) above. This answered a question posed by Stein about the sharp order of local integrability of a function with $A_\gamma f \in L^\infty$ (they already knew that $\exp\Big(c \frac{|f(x) - \mathbf{E}_Q f|}{\|A_\gamma f\|_{L^\infty}}\Big)$ was in $L^1_{\mathrm{loc}}$ because $A_\gamma f \in L^\infty$ implies $f \in BMO$ and all $BMO$ functions have this property; but having $|f|^2$ rather than $|f|$ is much stronger). All the work in their “Some weighted norm inequalities concerning the Schrödinger operators” paper is inspired by the study of sufficient conditions on the potentials $V$ for establishing the positivity of the Schrödinger operator $-\Delta + V(x)$.
Other results of the above type for a number of continuous square functions have been obtained by Wilson, and most can be found in his book on the subject “Weighted Littlewood-Paley theory and exponential-square integrability“.

Another remarkable use of the C-W-W inequality was made by Bourgain in his paper “On the behaviour of the constant in the Littlewood-Paley inequality“, in which he studied the sharp asymptotic behaviour in the exponent ${p}$ of the constants $c_p, C_p$ in the Littlewood-Paley inequality $c_p \|f\|_{L^p(\mathbb{R})}\leq \|Sf\|_{L^p(\mathbb{R})}\leq C_p \|f\|_{L^p(\mathbb{R})}$ (here ${S}$ is the Littlewood-Paley square function $Sf = \Big(\sum_{k \in \mathbb{Z}} |\Delta_k f|^2\Big)^{1/2}$ with $\widehat{\Delta_k f}(\xi) = \mathbf{1}_{[2^k,2^{k+1}]\cup[-2^{k+1},-2^k]}(\xi) \widehat{f}(\xi)$). I will probably say more about this paper of Bourgain in the future. For the moment, it suffices to say that he used the C-W-W inequality in order to show that $C_p \sim (p-1)^{-3/2}$ for ${p \to 1^{+}}$ (with $1/2$ of that $3/2$ exponent coming precisely from the exponent of ${p}$ in the C-W-W inequality (CWW1)).

Yet another application was found by Seeger and Trebels in “Low regularity classes and entropy numbers“, in which they used the C-W-W inequality to obtain embeddings of certain Besov spaces denoted by ${B_{0,q}^{\infty}}$ (spaces whose norms control regularity properties of functions) into Orlicz spaces of the type ${\exp(L^{q'})}$. A particular instance of their results is the inequality

$\displaystyle \|f\|_{L^p} \lesssim p^{1/2} \Big(\sum_{j \in \mathbb{Z}} \|\widetilde{\Delta}_j f\|_{L^\infty}^2\Big)^{1/2}$

for $p \geq 2$, where ${\widetilde{\Delta}_j}$ is a smooth dyadic frequency projection (see next section for a definition). Notice that if the $L^\infty$ and the $\ell^2$ norms on the RHS were reversed, this would be the analogue of (CWW4) for the smooth Littlewood-Paley square function! However, Minkowski’s inequality tells us that the RHS above is always larger.

A bunch of other applications of the C-W-W inequality, such as several weighted estimates for square functions of different kind, can be found by combing through the papers that cite the CWW one (duh).

## 2. The square-function characterisation of $L (\log L)^{1/2}$

In this section we introduce formally the aforementioned lemma of Tao and Wright, which will then be used in the next section to provide a proof of the C-W-W inequality (CWW3).

Tao and Wright’s motivation was the study of the endpoint mapping properties of Marcinkiewicz multipliers in dimension 1; in particular, they wanted to answer (among others) the question

“What is the “smallest” Orlicz space $\Phi(L)$ such that $T_m$ is $\Phi(L) \to L^{1,\infty}$ bounded whenever $m$ is a Marcinkiewicz multiplier symbol and $\widehat{T_m f} = m \widehat{f}$?”

For context, recall that Hörmander multipliers give rise to Calderón-Zygmund operators, and as such are $L^1 \to L^{1,\infty}$ bounded by the classical theory of singular integral operators; however, Marcinkiewicz multipliers are somewhat rougher (and of a different nature), and if you recall the proof of the Marcinkiewicz multiplier theorem that we gave you’ll see that the argument only produced $L^p$ estimates for ${1 < p < \infty}$, leaving the question of the behaviour at or near ${p=1}$ open.
Tao and Wright answered the question above in “Endpoint multiplier theorems of Marcinkiewicz type”, in which they showed that a Marcinkiewicz multiplier $T_m$ maps<sup>1</sup> $L (\log L)^{1/2} \to L^{1,\infty}$ boundedly and the exponent $1/2$ is sharp, in the sense that for any ${r < 1/2}$ there is always a Marcinkiewicz multiplier that is not bounded from $L(\log L)^r$ to $L^{1,\infty}$. Furthermore, they showed that Marcinkiewicz multipliers map the Hardy space $H^1$ into $L^{1,\infty}$ boundedly and characterised the values of $r,q$ such that Marcinkiewicz multipliers are (locally) bounded from $L(\log L)^{r}$ into the Lorentz space $L^{1,q}$. Their methods also prove endpoint mapping properties for the rougher Marcinkiewicz operators with bounded ${q}$-variation with ${1\leq q < 2}$, whose boundedness we have discussed before.

At the heart of their proofs of endpoint mapping properties there is -as can probably be guessed- an enhanced and adapted (vector-valued) Calderón-Zygmund decomposition; but behind that, there are certain square-function characterisations of the relevant spaces. For context, when we say “square-function characterisation” what we have in mind is the example of the Hardy space $H^1$, for which we have $\|f\|_{H^1} \sim \|\mathcal{S}f\|_{L^1}$ (see e.g. Grafakos). Here $\mathcal{S}$ denotes the smooth square function $\mathcal{S}f := \Big(\sum_{j \in \mathbb{Z}} |\widetilde{\Delta}_j f|^2\Big)^{1/2}$ and $\widehat{\widetilde{\Delta}_j f}(\xi) = \psi(2^{-j}\xi)\widehat{f}(\xi)$ with $\psi$ a smooth function supported in $1/2 \leq |\xi| \leq 4$ and identically 1 on $1 \leq |\xi| \leq 2$.
[There is a connection here between the Hardy space $H^1$ and the Orlicz spaces $L(\log L)^r$ that should be kept in mind: if $f \in H^1$ is positive on some compact set $K$, then $f \in L \log L (K)$ (this is due to the maximal-function characterisation of $H^1$ and the fact that the Hardy-Littlewood maximal function maps $L \log L$ into $L^1$ boundedly; see Stein’s “Harmonic Analysis“, Ch. III, §5.3 for details).]
Tao and Wright found themselves in need of a suitable square-function characterisation of $L(\log L)^{1/2}$ that could be analogous to the one for $H^1$, in order for their arguments to extend to the case of Orlicz spaces. They were ultimately able to find one that -although not as neat- does the job admirably and is quite deep:

Lemma 1 – Square-function characterisation of $L(\log L)^{1/2}$ [Tao-Wright, 2001]:
For any $j \in \mathbb{Z}$ let

$\displaystyle \phi_j(x) := \frac{2^j}{(1 + 2^j |x|)^{3/2}}$

(notice $\phi_j$ is concentrated in $[-2^{-j},2^{-j}]$ and $\|\phi_j\|_{L^1} \sim 1$).
If the function ${f}$ is in $L(\log L)^{1/2}([-R,R])$ and such that $\int f \,dx = 0$, then there exists a collection $(F_j)_{j \in \mathbb{Z}}$ of non-negative functions such that:

1. pointwise for any $j \in \mathbb{Z}$

$\displaystyle \big|\widetilde{\Delta}_j f\big| \lesssim F_j \ast \phi_j ;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{Z}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}([-R,R])}.$

It should be noted that this lemma would be essentially trivial to prove with $f \in H^1$ rather than $f \in L(\log L)^{1/2}$: indeed, in that case it would suffice to take directly $F_j = |\widetilde{\Delta}_j f|$ or a similar projection to slightly enlarged frequency intervals. Point 2. above would then be just the square-function characterisation of $H^1$ itself.
In the paper they illustrate very well why the same choice of $F_j = |\widetilde{\Delta}_j f|$ does not work in the $L(\log L)^{1/2}$ case. Consider for ${N}$ large the function $f = N^{-1/2} \psi_N$, where $\psi_j(x) = 2^j \psi(2^j x)$ are smooth bump functions adapted to the interval $[-2^{-j},2^{-j}]$ and with $\int \psi_j = 0$. We have that $\|f\|_{L(\log L)^{1/2}} \sim 1$ pretty clearly. When $1 \ll j \ll N$ we have – at least morally, but can be made more rigorous – that $|\widetilde{\Delta}_j f| \approx N^{-1/2} \psi_j$. Since $\psi_j$ is concentrated in $[-2^{-j},2^{-j}]$ we can pretend that $|\psi_j| \approx 2^j \mathbf{1}_{[-2^{-j},2^{-j}]}$ and therefore we see that for $x \in [2^{-i}, 2^{-i+1}]$ we have

$\displaystyle \Big(\sum_{1 \ll j \ll N} |\widetilde{\Delta}_j f(x)|^2 \Big)^{1/2} \approx \Big(\sum_{1 \ll j \leq i} 2^{2j}N^{-1} \Big)^{1/2} \sim N^{-1/2}\,2^i.$

Therefore we can estimate

$\displaystyle \int \Big(\sum_{1 \ll j \ll N} |\widetilde{\Delta}_j f(x)|^2 \Big)^{1/2}\,dx \gtrsim N^{-1/2} \int \sum_{1\ll i \ll N} 2^i \mathbf{1}_{[2^{-i}, 2^{-i+1}]}(x) \,dx \sim N^{1/2},$

which is far off the desired RHS! The problem here, as exemplified in the calculations above, is that the functions $|\widetilde{\Delta}_j f|$ have supports of very different sizes ($2^{-j}$ in this case, which is the scale at which $|\widetilde{\Delta}_j f|$ is approximately constant) and are therefore almost disjoint in a sense. The result is that the $\ell^2$ sum that defines the square function $\big(\sum_{j} |\widetilde{\Delta}_j f(x)|^2 \big)^{1/2}$ behaves more like an $\ell^1$ sum: there is little to no “cancellation” overall. The above lemma is carefully designed to avoid this issue: adding the convolution with $\phi_j$ on the RHS of the pointwise domination of $|\widetilde{\Delta}_j f|$ leaves us free to take the $F_j$'s with much more concentrated supports, because convolution with $\phi_j$ produces a “smearing” of exactly $2^{-j}$, the expected scale. Meanwhile, with the supports overlapping much more now, the summation in the square function behaves like a genuine $\ell^2$ sum (as it should), which makes it smaller overall. For example, in the above example of $f = N^{-1/2} \psi_N$, for $1 \ll j \ll N$ it would suffice to take $F_j = N^{-1/2}|\psi_N|$ for all of them; thus one would have $|\widetilde{\Delta}_j f| \approx N^{-1/2} |\psi_j| \lesssim N^{-1/2} |\psi_N| \ast \phi_j = F_j \ast \phi_j$ quite easily, and on the other hand

$\displaystyle \Big(\sum_{1 \ll j \ll N} |F_j|^2 \Big)^{1/2} \sim N^{-1/2} (N |\psi_N|^2 )^{1/2} = |\psi_N|,$

whose integral is $\sim 1$ as desired.
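The two computations above can be compared numerically in the same heuristic model. The Python sketch below takes at face value the approximations $|\widetilde{\Delta}_j f| \approx N^{-1/2} 2^j \mathbf{1}_{[-2^{-j},2^{-j}]}$ and $|\psi_N| \approx 2^N \mathbf{1}_{[-2^{-N},2^{-N}]}$ for all $1 \leq j \leq N$ (an idealisation, not a rigorous statement), and integrates the resulting step functions exactly over dyadic shells; the function names are hypothetical.

```python
import math

def naive_square_function_l1(N):
    """L^1 norm of (sum_{j=1}^N |N**-0.5 * 2**j * 1_{|x| <= 2**-j}|**2)**0.5."""
    total = 0.0
    for i in range(1, N + 1):
        # On the shell 2**-(i+1) < |x| <= 2**-i exactly the scales j = 1..i contribute;
        # for i = N we take the whole core |x| <= 2**-N instead.
        measure = 2.0 ** (-i) if i < N else 2.0 ** (1 - N)
        height = math.sqrt(sum(4 ** j for j in range(1, i + 1)) / N)
        total += measure * height
    return total

def improved_square_function_l1(N):
    """L^1 norm of (sum_{j=1}^N F_j**2)**0.5 with F_j = N**-0.5 * 2**N * 1_{|x| <= 2**-N}."""
    height = math.sqrt(N * (2.0 ** N / math.sqrt(N)) ** 2)
    return height * 2.0 ** (1 - N)

for N in (16, 64, 256):  # keep N <= 256 here so that the floats do not overflow
    print(N, naive_square_function_l1(N) / math.sqrt(N), improved_square_function_l1(N))
```

In this model the first quantity grows like $N^{1/2}$ while the second stays bounded, matching the discussion above.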

Although we won’t be proving the Tao-Wright lemma today (that’s the subject of upcoming posts), we have to mention that it is deduced from a martingale-differences variant of it. We introduce this variant because we will need it in the next section, in order to prove the C-W-W inequality (CWW3).
The result will be best stated in terms of Haar functions. Recall that for any dyadic interval $I \in \mathcal{D}$ the Haar function $h_I$ is defined as

$\displaystyle h_I := \frac{1}{|I|^{1/2}}(\mathbf{1}_{I_{+}} - \mathbf{1}_{I_{-}})$

where $I_{+}$ denotes the left half of $I$ and $I_{-}$ denotes the right half. We have $\langle h_I, h_J \rangle = \delta_{I,J}$, that is they are an orthonormal system.
The martingale-differences version of the lemma above is then as follows.

Lemma 2 – Square-function characterisation of $L(\log L)^{1/2}$ for martingale-differences:
For any function $f : [0,1] \to \mathbb{R}$ in $L(\log L)^{1/2}([0,1])$ there exists a collection $(F_j)_{j \in \mathbb{N}}$ of non-negative functions such that:

1. for any $j \in \mathbb{N}$ and any $I \in \mathcal{D}_j$

$\displaystyle |\langle f, h_I \rangle|\lesssim \frac{1}{|I|^{1/2}} \int_{I} F_j \,dx;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{N}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}([0,1])}.$

You might be rightly wondering what the connection with martingale differences is in the above statement, since they don’t appear explicitly. However, we claim they are really there. Indeed, given an interval ${I} \in \mathcal{D}_j$, for any $x \in I_{+}$ we can write

$\displaystyle \begin{aligned} \langle f, h_I \rangle = & \frac{1}{|I|^{1/2}} \Big[\int_{I_{+}} f - \int_{I_{-}} f \Big] \\ = & \frac{1}{|I|^{1/2}} \Big[2 \int_{I_{+}} f - \int_{I} f \Big] \\ = & |I|^{1/2} \Big[\frac{2}{|I|} \int_{I_{+}} f - \frac{1}{|I|}\int_{I} f \Big] \\ = & |I|^{1/2} [ \mathbf{E}_{j+1}f(x) - \mathbf{E}_j f(x)] \\ = & |I|^{1/2} \mathbf{D}_j f(x); \end{aligned}$

similarly, for $x \in I_{-}$ we get $\langle f, h_I \rangle = - |I|^{1/2} \mathbf{D}_j f(x)$. Thus we could have equivalently stated property 1. in the lemma above as

$\displaystyle |\mathbf{D}_j f(x)| \lesssim \frac{1}{|I|} \int_{I} F_j \,dx = \mathbf{E}_j F_j(x),$

where ${I}$ is the dyadic interval of length $2^{-j}$ that contains ${x}$. Thus the relationship between Lemma 2 and Lemma 1 should be a bit less obscure now: we have replaced the smooth frequency projections ${\widetilde{\Delta}_j}$ with the martingale differences ${\mathbf{D}_j}$ and we have replaced convolution with ${\phi_j}$ with the conditional expectations ${\mathbf{E}_j}$. Effectively, Lemma 2 is a discrete version of Lemma 1.
The proof of the conversion from the martingale-differences lemma to the smooth frequency projections one is somewhat sketched in the paper, so I will reproduce an expanded version in this blog in the near future. For now, we accept that Lemma 2 implies Lemma 1 at face value.
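The identity $\langle f, h_I \rangle = |I|^{1/2} \mathbf{D}_j f(x)$ for $x \in I_{+}$ is easy to sanity-check numerically. The Python sketch below assumes that $f$ is a step function on $2^J$ equal cells of $[0,1]$ and follows the convention above that $I_{+}$ is the left half of $I$; the function names are hypothetical.

```python
import math

def haar_coefficient(values, j, k):
    """<f, h_I> for I = [k 2^-j, (k+1) 2^-j), with f a step function on 2**J cells (j < J)."""
    n = len(values)
    block = n >> j                      # number of fine cells inside I
    left, mid, right = k * block, k * block + block // 2, (k + 1) * block
    integral_plus = sum(values[left:mid]) / n     # integral of f over I_+ (left half)
    integral_minus = sum(values[mid:right]) / n   # integral of f over I_- (right half)
    return (integral_plus - integral_minus) / math.sqrt(2.0 ** -j)

def martingale_difference_on_left_half(values, j, k):
    """D_j f(x) = E_{j+1} f(x) - E_j f(x) for x in the left half of the same interval I."""
    n = len(values)
    block = n >> j
    left, mid, right = k * block, k * block + block // 2, (k + 1) * block
    average_plus = sum(values[left:mid]) / (mid - left)
    average_full = sum(values[left:right]) / block
    return average_plus - average_full

f = [3.0, 1.0, -2.0, -2.0, 0.5, -0.5, 4.0, -4.0]
for j in range(3):
    for k in range(2 ** j):
        lhs = haar_coefficient(f, j, k)
        rhs = math.sqrt(2.0 ** -j) * martingale_difference_on_left_half(f, j, k)
        print(j, k, lhs, rhs)  # the last two columns should agree up to rounding
```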

### 2.1. An immediate application: two-lines-proof of an inequality of Zygmund

There is an inequality of Zygmund that, in its simplest form, says the following:

if ${f: \mathbb{T} \to \mathbb{C}}$ is a function with lacunary<sup>2</sup> Fourier series (in particular, $\widehat{f}$ is supported in $\{2^j \,: \, j \in \mathbb{N}\}$), then we have

$\displaystyle \Big(\sum_{j \in \mathbb{N}}|\widehat{f}(2^j)|^2 \Big)^{1/2} \lesssim \|f\|_{L(\log L)^{1/2}}. \ \ \ \ \ \ \ (ZY)$

Does this ring a bell? No? Well, it should! At least if you have done the exercises in the 2nd post on Littlewood-Paley theory, because I basically tricked you into proving the dual form of this inequality in Exercise 8. See if you can reconstruct a proof of (ZY) starting from there (using Orlicz duality).
A nice application of the powerful square-function characterisation of $L(\log L)^{1/2}$ is a quick proof of (ZY) as follows. Ignoring the differences between $\mathbb{T}$ and $\mathbb{R}$ for brevity, we have by Hausdorff-Young inequality

$\displaystyle |\widehat{f}(2^j)| = \big|\widehat{\widetilde{\Delta}_j f}(2^j)\big| \leq \|\widetilde{\Delta}_j f\|_{L^1},$

which by 1. of the Tao-Wright lemma is bounded by $\|F_j \ast \phi_j\|_{L^1} \lesssim \|F_j\|_{L^1}$ (the latter by Young’s convolution inequality). Thus we have by Minkowski and 2. of the Tao-Wright lemma that

$\displaystyle \begin{aligned} \Big(\sum_{j \in \mathbb{N}}|\widehat{f}(2^j)|^2 \Big)^{1/2} \lesssim & \Big(\sum_{j \in \mathbb{N}} \|F_j\|_{L^1}^2 \Big)^{1/2} \\ \leq & \Big\|\Big(\sum_{j \in \mathbb{N}} |F_j|^2 \Big)^{1/2}\Big\|_{L^1} \\ \lesssim & \|f\|_{L (\log L)^{1/2}}, \end{aligned}$

and we are done.

## 3. Proof of the C-W-W inequality using the square-function characterisation of $L(\log L)^{1/2}$

We are now ready to show that the Haar-version of the square-function characterisation above implies the C-W-W inequality in its endpoint form given by (CWW3), which we restate for convenience: we are going to prove that for a function with $\mathbf{E}_0 f = 0$ we have

$\displaystyle \|f\|_{\exp(L^2)} \lesssim \|S_{\mathcal{D}}f\|_{L^\infty}, \ \ \ \ \ \ \ (\text{CWW}_3)$

using the Haar-version of the lemma above.
I learned about this proof from my colleague Odysseas Bakas who came up with it (he also kindly helped me iron out some nasty details in the conversion from Lemma 2 to Lemma 1 above). His own work relates to the Tao-Wright paper discussed in Section 2. The proof is very short and elegant.

We are going to prove (CWW3) using duality in Orlicz spaces: we have that

$\displaystyle \|f\|_{\exp(L^2)} \sim \sup_{g \,: \, \|g\|_{L(\log L)^{1/2}} \leq 1} \Big|\int f g \,dx\Big|.$

In other words, $L(\log L)^{1/2}$ is in duality with the Orlicz space $\exp(L^2)$. This is not at all obvious and indeed I was a bit surprised the first time I ever came across this. For details, see for example Zygmund’s “Trigonometric Series“, Vol. 1, Ch. IV, §10.
Since $\mathbf{E}_0 f = 0$ and the Haar functions, together with the constant function $\mathbf{1}_{[0,1]}$, form an orthonormal basis of $L^2([0,1])$ we can write

$\displaystyle \begin{aligned} \Big|\int fg \, dx \Big| = & \Big|\int \Big(\sum_{I \in \mathcal{D}} \langle f, h_I\rangle h_I \Big)\Big(\sum_{J \in \mathcal{D}} \langle g, h_J\rangle h_J \Big) \,dx \Big| \\ =& \Big|\sum_{I \in \mathcal{D}} \langle f, h_I\rangle \langle g, h_I \rangle\Big| \\ \leq & \sum_{j \in\mathbb{N}} \sum_{I \in \mathcal{D}_j} |\langle f, h_I\rangle| |\langle g, h_I \rangle|; \end{aligned}$

notice that it is $\mathbf{E}_0 f = 0$ that kills off the contribution of $\mathbf{E}_0 g$. Since the function $g$ is in the unit ball of $L(\log L)^{1/2}$, Lemma 2 gives us functions $G_j$ such that $|\langle g, h_I \rangle | \lesssim |I|^{-1/2} \int_I G_j \,dx$ (where $|I|=2^{-j}$) and such that $\big\|\big(\sum_{j} |G_j|^2 \big)^{1/2}\big\|_{L^1} \lesssim 1$. Thus we can bound the last expression above by

$\displaystyle \sum_{j \in\mathbb{N}} \sum_{I \in \mathcal{D}_j} |\langle f, h_I\rangle| |I|^{-1/2} \int_I G_j \,dx,$

which we rewrite as

$\displaystyle \int \sum_{j \in\mathbb{N}} \sum_{I \in \mathcal{D}_j} |\langle f, h_I\rangle| |I|^{-1/2} \mathbf{1}_{I} G_j \,dx.$

Inverting the sums and using Cauchy-Schwarz in $j$ this is bounded by

$\displaystyle \begin{aligned} \int & \Big(\sum_{j \in\mathbb{N}} \Big[\sum_{I \in \mathcal{D}_j} \frac{|\langle f, h_I\rangle|}{|I|^{1/2}}\mathbf{1}_{I}\Big]^2 \Big)^{1/2} \Big(\sum_{j \in\mathbb{N}} |G_j|^2 \Big)^{1/2} \,dx \\ & = \int \Big(\sum_{I \in \mathcal{D}} \frac{|\langle f, h_I\rangle|^2}{|I|}\mathbf{1}_{I}\Big)^{1/2} \Big(\sum_{j \in\mathbb{N}} |G_j|^2 \Big)^{1/2} \,dx \end{aligned}$

(the latter because the intervals $I \in \mathcal{D}_j$ are pairwise disjoint). It is a matter of a simple calculation (which I have already done for you above) to show that the factor $\Big(\sum_{I \in \mathcal{D}} \frac{|\langle f, h_I\rangle|^2}{|I|}\mathbf{1}_{I}\Big)^{1/2}$ (sometimes called the Haar square-function) is actually equal to $S_{\mathcal{D}}f$. Therefore we can bound (using the properties of the $G_j$'s)

$\displaystyle \int S_{\mathcal{D}}f \Big(\sum_{j \in\mathbb{N}} |G_j|^2 \Big)^{1/2} \,dx \leq \|S_{\mathcal{D}}f\|_{L^{\infty}} \Big\|\Big(\sum_{j \in\mathbb{N}} |G_j|^2 \Big)^{1/2} \Big\|_{L^1} \lesssim \|S_{\mathcal{D}}f\|_{L^{\infty}}.$

Taking the supremum over all possible functions $g$ in the unit ball of $L(\log L)^{1/2}$ shows by duality that (CWW3) indeed holds!
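The Cauchy-Schwarz and Hölder steps of the argument can be tested numerically, with one important caveat: the functions $G_j$ provided by Lemma 2 are not computable from the statement alone, so the Python sketch below substitutes the naive choice $G_j = |\mathbf{D}_j g|$. This choice satisfies property 1. with equality (because $|\mathbf{D}_j g|$ is constant on each $I \in \mathcal{D}_j$), but in general it does not satisfy property 2., so the sketch only illustrates the chain $|\int fg| \leq \int S_{\mathcal{D}}f \, S_{\mathcal{D}}g \leq \|S_{\mathcal{D}}f\|_{L^\infty} \|S_{\mathcal{D}}g\|_{L^1}$ and not the full strength of the proof. The random step functions and all names are my own choices.

```python
import math
import random

def cond_exp(values, j):
    """E_j of a step function given by its values on 2**J equal cells of [0,1]."""
    n = len(values)
    block = n >> j
    out = []
    for start in range(0, n, block):
        out.extend([sum(values[start:start + block]) / block] * block)
    return out

def square_function(values):
    """S_D of a step function, sampled on the fine grid."""
    J = len(values).bit_length() - 1
    levels = [cond_exp(values, j) for j in range(J + 1)]
    return [math.sqrt(sum((levels[j + 1][k] - levels[j][k]) ** 2 for j in range(J)))
            for k in range(len(values))]

random.seed(0)
n = 2 ** 6
f = [random.gauss(0.0, 1.0) for _ in range(n)]
mean_f = sum(f) / n
f = [v - mean_f for v in f]                      # enforce E_0 f = 0
g = [random.gauss(0.0, 1.0) for _ in range(n)]   # g is arbitrary

Sf, Sg = square_function(f), square_function(g)
pairing = abs(sum(a * b for a, b in zip(f, g)) / n)   # |integral of f g|
middle = sum(a * b for a, b in zip(Sf, Sg)) / n       # integral of S_D f * S_D g
upper = max(Sf) * sum(Sg) / n                         # ||S_D f||_inf * ||S_D g||_1
print(pairing, middle, upper)
```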

We are done for today. In the next posts we will see how to prove Lemma 2.

Footnotes:
1: For clarity, the Orlicz space $L(\log L)^r$ is the space of functions with finite Orlicz norm

$\displaystyle \|f\|_{L(\log L)^r} := \inf\Big\{\mu > 0 \,:\, \int \frac{|f(x)|}{\mu} \Big(\log \Big(2 + \frac{|f(x)|}{\mu}\Big)\Big)^r \,dx \leq 1 \Big\}.$

In general, if $\Phi\,:\, \mathbb{R}_{+} \to \mathbb{R}_{+}$ is a convex function such that $\Phi(t)/t \to 0$ as $t \to 0$ and $\Phi(t)/t \to \infty$ as $t \to \infty$, we can define the Orlicz space $\Phi(L)$ as the space of functions with finite Orlicz norm (actually, it should be called Luxemburg norm)

$\displaystyle \|f\|_{\Phi(L)} := \inf\Big\{\mu > 0 \,:\, \int \Phi\Big(\frac{|f(x)|}{\mu} \Big)\,dx \leq 1 \Big\}.$


2: Actually, this particular inequality holds as well for a generic function ${f}$ – that is, we can drop the lacunarity assumption (indeed, the proof we give above works just fine for any function!). This is no longer the case for the inequality that was to be proven in Exercise 8.
