This is a follow-up on the post on the Chang-Wilson-Wolff inequality and how it can be proven using a lemma of Tao-Wright. The latter consists of a square-function characterisation of the Orlicz space $L(\log L)^{1/2}$, analogous in spirit to the better known one for the Hardy spaces.

In this post we will commence the proof of the Tao-Wright lemma, as promised. We will start by showing how the lemma, which is stated for smooth frequency projections, can be deduced from its discrete variant stated in terms of Haar coefficients (or equivalently, martingale differences with respect to the standard dyadic filtration). This is a minor part of the overall argument but it is slightly tricky so I thought I would spell it out.

Recall that the Tao-Wright lemma is as follows. We write for the smooth frequency projection defined by , where is a smooth function compactly supported in and identically equal to 1 on .

Lemma 1 – Square-function characterisation of [Tao-Wright, 2001]:

Let for any

(notice is concentrated in and ).

If the function is in and such that , then there exists a collection of non-negative functions such that:

- pointwise for any

- they satisfy the square-function estimate

The exponent in the definition of above is not very important: any exponent bigger than 1 would work for the arguments in the Tao-Wright paper. Indeed, below we will show that we can prove the lemma above with an arbitrarily large exponent.

As mentioned before, Lemma 1 is a consequence of a discrete version of the lemma that we now state. Recall that denotes the Haar function associated to the dyadic interval , that is

where are respectively the left and right halves of .
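For concreteness, here is a minimal numerical sketch of the Haar functions. It assumes the common $L^2$-normalised convention (value $+|I|^{-1/2}$ on the left half of $I$ and $-|I|^{-1/2}$ on the right half); the sign and normalisation are assumptions, and the names `haar` and `inner` are purely illustrative.

```python
import numpy as np

def haar(k, m, x):
    """Haar function of the dyadic interval I = [m 2^-k, (m+1) 2^-k).

    Assumed convention: +|I|^(-1/2) on the left half of I,
    -|I|^(-1/2) on the right half, and 0 outside I.
    """
    length = 2.0 ** (-k)
    left = m * length
    mid = left + length / 2
    right = left + length
    on_left = (x >= left) & (x < mid)
    on_right = (x >= mid) & (x < right)
    return (on_left.astype(float) - on_right.astype(float)) / np.sqrt(length)

# Midpoint grid on [0, 1); every dyadic interval used below is a union of grid cells.
N = 2 ** 10
x = (np.arange(N) + 0.5) / N

def inner(f, g):
    """Riemann sum for the L^2([0,1)) inner product (exact for step functions on this grid)."""
    return float(np.sum(f * g) / N)

h_a = haar(2, 1, x)   # I = [1/4, 1/2)
h_b = haar(3, 2, x)   # J = [1/4, 3/8), contained in the left half of I
h_c = haar(2, 3, x)   # K = [3/4, 1), disjoint from I

print(inner(h_a, h_a), inner(h_a, h_b), inner(h_a, h_c))
```

The three printed numbers illustrate that Haar functions have unit $L^2$ norm and are mutually orthogonal, both for nested and for disjoint dyadic intervals.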

Lemma 2 – Square-function characterisation of for martingale-differences:

For any function in there exists a collection of non-negative functions such that:

- for any and any

- they satisfy the square-function estimate

We also recall that condition 1. can be rephrased as in the notation of the previous post.

The idea of the proof that Lemma 2 implies Lemma 1 is quite simply to apply Lemma 2 to translations of and then average the results (translated back) with respect to the translation parameter. This is of course an oversimplification – we will need to average over a well-chosen subset of translations in order to get good estimates.

In the following we will let denote the rescaled function . We will build the functions explicitly. We are given a function with average zero, which we assume supported in for convenience and we define for its translation (the has no particular significance). Using Lemma 2 on a given we obtain a collection of functions satisfying and . The functions will end up being averages of these, not just of but averages of all the both in and in ; we anticipate that in the end the average we shall choose will be , but we will get there step by step and for the moment it’s best not to concentrate too much on this expression.

Before we proceed, I should point out that since is supported in the unit interval we don’t have to worry about its low frequencies. In particular, for we will simply choose where is a smooth frequency projection much like but on a slightly bigger frequency interval. Say, for example with a smooth function supported in and such that it is identically equal to 1 on . With this choice we have that , so that condition 1. of Lemma 2 is easily seen to be satisfied (as ). As for condition 2., we claim that so that we have

and since we will be done with the case. To see the claim, simply observe that since

and since (by the compact support of ) we also have by the Fundamental Theorem of Calculus

this last expression is integrable with mass and therefore by integrating in we have the claim.

From now on we concentrate solely on .

We can write by expanding in the Haar basis and linearity

and therefore we can bound using Lemma 2

We want to produce good estimates for the factor .

In the following I will replace with to save room.
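The Haar expansion used above can be checked numerically. The following is a minimal sketch, assuming the $L^2$-normalised Haar functions on $[0,1)$ and a function that is constant on dyadic cells of length $2^{-n}$; the helper names `haar_coefficients` and `reconstruct` are illustrative and do not come from the Tao-Wright paper.

```python
import numpy as np

def haar_coefficients(f):
    """Haar coefficients {(k, m): <f, h_I>} of a step function on [0, 1).

    f holds 2**n samples (one value per dyadic cell of length 2**-n) and
    I = [m 2^-k, (m+1) 2^-k). Also returns the mean of f over [0, 1).
    """
    n = len(f).bit_length() - 1          # len(f) is assumed to be 2**n
    avg = np.asarray(f, dtype=float)     # averages over cells of the current scale
    coeffs = {}
    for k in range(n - 1, -1, -1):
        left, right = avg[0::2], avg[1::2]
        # <f, h_I> = |I|^(1/2) * (average on left half - average on right half) / 2
        scale = np.sqrt(2.0 ** (-k)) / 2
        for m, c in enumerate(scale * (left - right)):
            coeffs[(k, m)] = float(c)
        avg = (left + right) / 2
    return coeffs, float(avg[0])

def reconstruct(coeffs, mean, n):
    """Sum mean + sum_I <f, h_I> h_I on the grid of 2**n cells."""
    N = 2 ** n
    out = np.full(N, mean, dtype=float)
    for (k, m), c in coeffs.items():
        cells = N >> k                   # number of grid cells inside I
        start, half = m * cells, cells // 2
        amp = c / np.sqrt(2.0 ** (-k))   # value of c * h_I on the left half of I
        out[start:start + half] += amp
        out[start + half:start + cells] -= amp
    return out

rng = np.random.default_rng(0)
f = rng.standard_normal(2 ** 6)
f -= f.mean()                            # mean zero, as in the setting of the post

coeffs, mean = haar_coefficients(f)
rebuilt = reconstruct(coeffs, mean, n=6)
print(np.max(np.abs(rebuilt - f)))
```

Grouping the coefficients by the scale `k` gives the martingale differences with respect to the standard dyadic filtration, which is the form in which Lemma 2 is stated.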

## 1. Estimates for

There are two scales involved in the expression : one is the scale of , which is , and the other is the scale of , which is . Accordingly, we distinguish two cases: either or .

### 1.1. Case

In this case we have , the latter being the scale of . Since we write

the latter by the fact that is a Schwartz function. We will want to replace the factor of in the denominator with a factor like instead; the reason why we choose instead of the simpler will become clear later, when we consider the case . We anticipate that this is because we want to take advantage of all the cancellation we can, and there will be cancellation coming from inside as well.

To efficiently estimate the above we distinguish two further cases for precision.

- : in this case, we have and thus . This means that , and consequently by (1) we have

- : in this case we have that for some it is . Using this fact we have
and therefore we can bound

and in turn we have the same estimate (2) as before.

To summarise, we have shown that when we have for any

### 1.2. Case

In this case the above estimates produce a factor of which is now larger than , so we need a different argument. We can use the decay of and also the fact that . We will also restrict by imposing it belongs to a special set depending on .

We distinguish again two further cases.

- : in this case or ; say we are in the former one. We bound by the triangle inequality
We have by the cancellation property of that

where we choose . Since now we have that , and thus the latter expression is

(the latter inequality simply because and so ).

Our problem now is the following: this expression as it stands does not provide *any* a priori decay of the form (compare with (2) and (3) above). We need that decay because we will have to sum over all indices to obtain , and therefore we have to force it in, so to speak. One way to do so is to impose that is never too small. How large do we need it to be? It is a matter of a simple calculation to see that to have we need to impose . Recalling that , this will have the effect of shaving off a little bit of the set of possible translations. Moreover, since we want this decay for every , we will need to shave off several parts of the set of possible translations – as such, we will need at some point to make sure that we are not shaving off all of the possible translations! Fortunately, this will indeed not be the case. If we impose the distance condition above we obtain that the set of allowable translations is given by the following, up to parameters to be chosen later (with and in general):

Indeed, the endpoints of any interval belong to a lattice, hence the definition above.

To reiterate, if we have

notice that since the last expression is and thus . We split and insert the lower bound in the estimate (4) to obtain

For the remaining quantity we do not need to appeal to the cancellation properties of but rather we bound right away and notice that the resulting integral is bounded by the one just estimated since . Thus we have shown that when and

Now notice that since we have and therefore

is an equally good bound but has the advantage of being symmetric in . The same bound clearly holds for and thus for all .

- : in this case we don’t need to split the integration over , and we simply bound
Since we have that for it holds that , the last expression is – compare this with (4). Since we have , the argument given for (i) above repeats essentially unchanged, save for having in place of . Thus we have when

In particular, since we have and therefore even for the bound (5) continues to hold (and the right-hand sides are comparable as well).

To summarise, we have shown that when we have for any that (5) holds.

## 2. Estimating

We now have sufficiently many estimates in order to treat . As said before, it suffices to consider . We let for convenience

for some and some . Indeed, summing both sides over all dyadic intervals with fixed will allow us to replace the RHS with a full convolution of and ; on the LHS we will have instead the -th term of the sum that we are using to bound pointwise (see beginning of the proof). We remark that if then and that similarly if then .

We distinguish the two cases as before.

### 2.1. Case

In this case there are hardly any problems. Indeed, using the bound (3) proven before for this case we have

and we want to show that the second factor in the integral is bounded pointwise by for some .

Again we distinguish the two further cases as before:

- : since we have ; therefore we also have and similarly for all . This means that , and therefore we have when
that is, when estimate holds.

- : in this case we have and therefore for some
and hence . It thus follows right away that estimate holds in this case as well.

### 2.2. Case

We will consider only the contribution coming from the term of (5) with at the denominator, since the other term can be treated symmetrically. We have to bound

We distinguish two further cases:

- : in this case we have for all that . Thus we have (by the hypothesis that in belongs to )
and therefore

[*Of course this is not a good bound, because we are losing a large power of here; however, the decay introduced previously is more than enough to compensate for this loss.*]

Using this bound on the quantity we have to estimate, we see that it is controlled by

and therefore holds. Notice that by taking sufficiently larger than we can make sure that the exponent is always negative.

- : in this case we have more simply that , and therefore (for some )
which means . This gives immediately

and therefore holds.

It is now clear that we can set for example and and we will have

both exponents being at least , and therefore we have shown that for any the estimate is always true (for an appropriately large constant).

### 2.3. Conclusion of the estimation

The conclusion of the estimation of is therefore that for any we have (with constant uniform in such )

Observe now that in defining we have not shaved too much off the set of translations. We actually have , because

with as above it suffices to take to have that .
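To get a feeling for why removing neighbourhoods of the lattice points at every scale still leaves most translations available, here is a toy numerical sketch. It is not the definition of the set used in the post: the exclusion radius `eps * 2**(-k * (1 + gamma))` around the lattice $2^{-k}\mathbb{Z}$ and the parameters `eps`, `gamma`, `k_max` are hypothetical stand-ins chosen only to illustrate the union-bound argument.

```python
import numpy as np

def allowed_fraction(eps, gamma, k_max, n_grid=2 ** 16):
    """Grid estimate of the measure of the theta in [0, 1) that satisfy
    dist(theta, 2^-k Z) >= eps * 2^(-k (1 + gamma)) for every 1 <= k <= k_max.

    The form of the condition is a hypothetical model, not the post's definition.
    """
    theta = (np.arange(n_grid) + 0.5) / n_grid
    ok = np.ones(n_grid, dtype=bool)
    for k in range(1, k_max + 1):
        scaled = theta * 2 ** k
        dist = np.abs(scaled - np.round(scaled)) / 2 ** k   # distance to 2^-k Z
        ok &= dist >= eps * 2.0 ** (-k * (1 + gamma))
    return float(ok.mean())

def union_bound(eps, gamma, k_max):
    """Upper bound for the removed measure: at scale k the excluded windows
    inside [0, 1) have total length 2^k * 2 * eps * 2^(-k (1 + gamma))."""
    return sum(2 * eps * 2.0 ** (-gamma * k) for k in range(1, k_max + 1))

frac = allowed_fraction(eps=0.1, gamma=1.0, k_max=10)
bound = union_bound(eps=0.1, gamma=1.0, k_max=10)
print(frac, 1 - bound)
```

In this model the removed measure is at most a geometric series in `k`, so taking `eps` small keeps the allowed set of measure comparable to 1; the union bound is not sharp here, since the lattices are nested and the excluded windows overlap.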

We can now average the RHS of (6) over (since the estimate is uniform in such parameter) and have that

Since the are non-negative and we have that , and therefore if we define

then these are the desired functions for the smooth frequency version of the Tao-Wright lemma, because the above has just shown that

## 3. Conclusion of the proof that Lemma 2 implies Lemma 1

We have so far constructed functions that satisfy 1. of Lemma 1. It remains to verify that 2. of said lemma is also satisfied. This will be a simple consequence of Minkowski’s inequality.

Observe that by the Cauchy-Schwarz inequality

Therefore we have

Integrating both sides in and using Minkowski’s inequality in the summation and the integral on the RHS we have

with the latter equalities by Fubini and translation invariance. By hypothesis (that is, by Lemma 2) the quantity in square brackets is bounded by , and therefore we have 2. of Lemma 1 and the proof is concluded.

In the next post we will finally prove Lemma 2.