This is the 3rd post in a series that started with the post on the Chang-Wilson-Wolff inequality:

1. The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

2. Proof of the square-function characterisation of L(log L)^{1/2}: part I

In today’s post we will finally complete the proof of the Tao-Wright lemma. Recall that in the 2nd post of the series we proved that the Tao-Wright lemma follows from its discrete version for Haar/dyadic-martingale-differences, which is as follows:

Lemma 2 – Square-function characterisation of L(log L)^{1/2} for martingale-differences:

For any function in there exists a collection of non-negative functions such that:

- for any and any

- they satisfy the square-function estimate

Today we will prove this lemma.

## Proof of Lemma 2

In the following we will use to denote the function for .

Definition:

We let denote the collection of all dyadic intervals in . If we denote:

- by the unique dyadic parent of ;
- by the, respectively, left and right children of (that is, the unique dyadic intervals such that );
- by its sibling (that is, the unique dyadic interval such that ). In particular, and vice versa.

We let denote the collection of all dyadic intervals in of length .

Given an interval we let be the dyadic intervals contained in and be the dyadic intervals contained in and such that their length is at least .
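Since the displayed notation for these operations did not survive above, here is a small sketch of the dyadic-grid combinatorics just described; the encoding of the interval $[k2^{-n},(k+1)2^{-n})$ as a pair `(n, k)` is an illustrative choice, not notation from the post.

```python
# A dyadic interval [k/2^n, (k+1)/2^n) is encoded as a pair (n, k);
# this encoding is an illustrative choice, not the post's notation.

def parent(I):
    n, k = I
    return (n - 1, k // 2)          # the unique dyadic parent

def children(I):
    n, k = I
    return (n + 1, 2 * k), (n + 1, 2 * k + 1)   # left and right children

def sibling(I):
    n, k = I
    return (n, k ^ 1)               # flip the last bit of k: the sibling

I = (3, 5)                           # the interval [5/8, 6/8)
left, right = children(parent(I))
assert I in (left, right)            # I is a child of its own parent
assert sibling(sibling(I)) == I      # the sibling of the sibling is I itself
assert parent(sibling(I)) == parent(I)
```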

We will also use to denote the Haar function associated to the interval , that is

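The displayed formula for the Haar function is missing above; the sketch below assumes the standard $L^2$-normalised convention $h_I = |I|^{-1/2}(\mathbf{1}_{I_{\mathrm{right}}} - \mathbf{1}_{I_{\mathrm{left}}})$ (the sign convention is an assumption) and checks orthonormality numerically.

```python
import numpy as np

N = 2 ** 10                                   # dyadic discretisation of [0, 1)
x = (np.arange(N) + 0.5) / N                  # midpoints of the N cells

def haar(n, k):
    """L^2-normalised Haar function on I = [k/2^n, (k+1)/2^n).
    The sign convention (negative on the left child) is an assumption."""
    a, b = k / 2 ** n, (k + 1) / 2 ** n
    m = (a + b) / 2
    h = np.zeros(N)
    h[(x >= a) & (x < m)] = -1.0
    h[(x >= m) & (x < b)] = 1.0
    return h * 2 ** (n / 2)                   # |I|^{-1/2} normalisation

inner = lambda f, g: (f * g).sum() / N        # discrete L^2 inner product

assert abs(inner(haar(2, 1), haar(2, 1)) - 1.0) < 1e-12   # unit L^2 norm
assert abs(inner(haar(2, 1), haar(2, 2))) < 1e-12         # disjoint supports
assert abs(inner(haar(2, 1), haar(1, 0))) < 1e-12         # nested: mean zero wins
assert abs(haar(2, 1).sum() / N) < 1e-12                  # zero integral
```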
Lemma 2 follows rather directly from the case in which is the characteristic function of a measurable set (which we can further assume to be -measurable, since we can then take a limit in ):

Theorem:

There exists a constant such that the following is true of every . Let such an be fixed and let (a set in the sigma-algebra generated by ). Then we can find a collection of functions such that

- is supported in ;
- for every

- we have the square-function estimate

Indeed, by the atomic decomposition of (as seen in the interlude post) we can write where each atom is the characteristic function of the set normalised so that and . Applying the Theorem to each atom we obtain functions with and , and if we let we will have that 1. of Lemma 2 is satisfied trivially because

and similarly 2. of Lemma 2 is satisfied because by Minkowski’s inequality we have

Remark 1: Notice that the statement of the Theorem rescales very naturally, and we can replace the interval by any dyadic interval and replace with . We need to be a little careful in adapting the Orlicz norm: the correct definition is

in which case the bound on the norm of the square function becomes .

This will be important in the proof of the Theorem below because we will use induction and will need to assume the Theorem holds on smaller intervals.

Remark 2: Another important observation is that the theorem is easily obtained for any of sufficiently large mass: if , then we can choose the functions to be ; properties i), ii) are trivially verified, and we have by Hölder’s inequality and the orthogonality of the Haar functions

Thus it will suffice to assume that is small () throughout the argument.

We want to prove the theorem by inducting on , or more precisely, by inducting on the *depth* of the dyadic system used. When is small we have necessarily , so that Remark 2 applies and proves the base case. Suppose then that is not small and imagine that we have some partition of given by a collection of dyadic intervals (and ). Since for the dyadic grids have depth strictly less than , we can use the inductive hypothesis on each of them: this would give us collections for each , with the properties stated in the theorem above; in particular (cf. Remark 1 above)

A necessary step in the inductive proof would then be, at some point, to estimate the contribution to the square function coming from all these intervals in . Since the are supported in and the are disjoint, we would have

thus if we are to proceed by induction we will need a way to make sure that these contributions sum up to at most __minus something__ (minus something because there will be other contributions to the square function, and we will need the *total* contributions to be less than , in order to be able to close the induction). The appearance of the quantity suggests that it should play a special rôle in our analysis, and as a consequence we define the *density* of to be

Notice in particular that

Moreover, we have the following convexity relation:

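Assuming the density just defined is the relative measure $|E \cap I|/|I|$ (consistent with how it is used in the rest of the argument), the convexity relation amounts to the fact that the density of a parent interval is the average of the densities of its two children. A quick numerical check of this, on a randomly generated dyadic-measurable set:

```python
import numpy as np

N = 2 ** 10
rng = np.random.default_rng(0)
E = rng.random(N) < 0.3               # indicator of a dyadic-measurable set E

def density(n, k):
    """delta(I) = |E ∩ I| / |I| for I = [k/2^n, (k+1)/2^n) (assumed definition)."""
    w = N // 2 ** n
    return E[k * w:(k + 1) * w].mean()

# convexity: the density of a parent is the average of its children's densities
for n in range(1, 10):
    for k in range(2 ** n):
        d = density(n, k)
        dl, dr = density(n + 1, 2 * k), density(n + 1, 2 * k + 1)
        assert abs(d - (dl + dr) / 2) < 1e-12
```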
Imagine now that the intervals happen to all have large density, in particular

for all . Then we would have a simple bound for the sum above, namely

if we choose for some constant , we will have that this contribution is indeed more-or-less proportional to , which goes in the right direction for us. This observation suggests that we will have to partition the dyadic intervals according to their densities to complete the inductive step. It is not clear a priori into how many buckets we will have to partition, but it turns out that few buckets will suffice: in particular, we will define the intervals of *intermediate density* to be the collection of intervals such that

for two constants that we will choose later. Notice that while we have given a heuristic reason for choosing the upper bound on the right, we have not given any such reason to choose a lower bound matching its form (that would be ). Indeed, we remain at this stage open to the possibility that we will have to choose to depend on later (however this will not be the case in the end). Notice that by the convexity relation (2), if are of intermediate density, then is also of intermediate density (though we will not use this fact).

Once the collection of intervals of intermediate density is chosen, there is an obvious “complement” to it: the collection of intervals that are not in and are *maximal* with respect to . The maximality (and the combinatorics of the dyadic grid) immediately implies that the intervals in are all pairwise disjoint; but we claim that more is true, and that they actually partition . Indeed, let and let be the dyadic interval of length that contains ; since is -measurable, we have either or , and therefore . If (as it will be; see Remark 2 for why we have the liberty of assuming so), then cannot be in the collection , and therefore will be contained in an interval of ; thus every point of is covered by , as claimed.
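The stopping-time construction just described can be sketched as follows. Starting from $[0,1)$, we descend the dyadic tree and stop at the first interval that fails to be of intermediate density; maximality then forces the stopped intervals to tile the whole interval. The thresholds delimiting the intermediate-density band below are placeholders, not the constants of the post.

```python
import numpy as np

N = 2 ** 10
rng = np.random.default_rng(1)
E = rng.random(N) < 0.3                      # indicator of a dyadic-measurable set

def density(n, k):
    w = N // 2 ** n
    return E[k * w:(k + 1) * w].mean()

def stopping_intervals(n=0, k=0, lo=0.1, hi=0.6):
    """Descend from [0,1); stop at the first interval whose density leaves
    the 'intermediate' band [lo, hi] (the thresholds are placeholders)."""
    if not (lo <= density(n, k) <= hi) or 2 ** n == N:
        return [(n, k)]
    return (stopping_intervals(n + 1, 2 * k, lo, hi)
            + stopping_intervals(n + 1, 2 * k + 1, lo, hi))

F = stopping_intervals()
# maximality + the combinatorics of the dyadic grid: the stopped intervals tile [0,1)
assert abs(sum(2.0 ** -n for n, _ in F) - 1.0) < 1e-12
ends = sorted((k / 2 ** n, (k + 1) / 2 ** n) for n, k in F)
assert all(b1 == a2 for (_, b1), (a2, _) in zip(ends, ends[1:]))  # no gaps, no overlaps
```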

Another interesting property of is the following: if then . Indeed, this follows from the convexity relation (2), since , because is of intermediate density, by the maximality^{1} of . So, overall there will be two types of intervals : there will be the intervals in

and those in

We stress that the intervals in are the *denser* of the two, but not dense in absolute terms: indeed, will in general be much smaller than .

In order to use the inductive hypothesis, we have to exclude the case first. This is simple: if we choose , then since we will have that automatically, and thus .

We then construct as before, and use the inductive hypothesis on each interval , giving us collections . We will let denote the square function

for later convenience.

It remains to assign functions to the remaining intervals that are missing so far, that is the dyadic intervals that are not contained in any interval of : observe that these intervals are necessarily unions of two or more intervals of (by the combinatorics of the dyadic grid); moreover, these intervals belong to , by maximality of the ‘s. We denote this collection by .

It is absolutely not obvious how to choose these remaining functions – and bear in mind that this is the only step of the construction where we are actually assigning functions to intervals, since the rest is handled recursively. To gain some understanding, we try with a naive choice first and see what it gets us. This naive choice would be the one made in Remark 2, that is, let us set^{2} for all . Since this choice, as already remarked, already satisfies properties 1., 2. as per the statement of the Theorem, we only have to estimate .

Here we make a *fundamental observation*: since the intervals strictly contain intervals , in evaluating the Haar coefficients for these “large” intervals we can forget about the finer distribution of inside the ‘s. In particular, if we let

To see this, simply write and observe that either or ; in each case, is identically verified because is constant on intervals . Identity (3) is important for the following reason: has much smaller norm than does! Indeed, we have

and at the same time

logarithmic convexity of the Lebesgue norms gives therefore that

This is to be compared to the fact that , which is in general much larger than because is very small – our density-based decomposition has given us a certain notable gain, because now we can estimate

a marked improvement over the trivial estimate with at the right-hand side.
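The fundamental observation behind identity (3), namely that Haar coefficients at scales coarser than the partition are unchanged when the indicator function is replaced by its averages on the cells of the partition, can be checked numerically. The fixed-length partition below is a stand-in for the stopped intervals of the post, and the Haar convention is the same assumed one as before.

```python
import numpy as np

N = 2 ** 8
rng = np.random.default_rng(2)
f = (rng.random(N) < 0.3).astype(float)      # f = 1_E for a dyadic-measurable E

# Replace f by its average on each cell of a partition into intervals of
# length 2^{-4} (a stand-in for the stopped intervals of the post).
w = N // 2 ** 4
g = np.repeat(f.reshape(-1, w).mean(axis=1), w)

def haar_coeff(u, n, k):
    """<u, h_I> for I = [k/2^n, (k+1)/2^n), h_I L^2-normalised (assumed convention)."""
    wI = N // 2 ** n
    blk = u[k * wI:(k + 1) * wI]
    return (blk[wI // 2:].sum() - blk[:wI // 2].sum()) / N * 2 ** (n / 2)

# coefficients at scales strictly coarser than the cells see only the averages
for n in range(4):
    for k in range(2 ** n):
        assert abs(haar_coeff(f, n, k) - haar_coeff(g, n, k)) < 1e-10
```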

Armed with this favourable estimate, we proceed to bound . We have, using the facts collected so far,

then we use Cauchy-Schwarz on the sum over , and since we obtain that the above is bounded by

On the face of it, this looks like a really good bound, since is of smaller order than the main term . However, closing the induction is a very delicate business, and we need to be careful to prove that the final bound on is *exactly* . We will see that the bound above, while good, is not good enough – the reason being that all other error terms have order strictly smaller than , and therefore we cannot make their total contribution be negative. In the end, we will have to change the way we choose the functions for intervals in .

Before doing that, we estimate the errors arising from other terms. Recall that we have to estimate , and that by inductive hypothesis this is bounded by

(this is (1)); we split the sum into . For we have

where the stands for Main (term) and the stands for Error (term). Observe that the second term in the RHS (the error ) is negative, because since . We keep the first term in the RHS (the main term) as it is, and for the error term () we write, using the identity,

where each term in the sum is now positive. We want to estimate each term of the sum from below, because of the minus sign, to get a negative upper bound for the error. Observe then that since we have

since and we can assume , we also have after some algebra

and this is strictly larger than 1 because .

Overall we have shown that

with , and we should observe that we have a lower bound for as well: indeed,

Hence we have shown the error estimate

Next we estimate the contribution of analogously. We have

now things are a bit different because the second term in the RHS is now positive, since . We will therefore look for upper bounds for each term in the error. As before, we have

and since , we have

Splitting the collection dyadically in we are then able to bound

Overall we have therefore shown that for the error arising from we have the error estimate

where is a numerical constant. Notice that if is sufficiently small and sufficiently large we can have .

Summarising this discussion, we have for the *main terms*

which is precisely the term we can afford to close the induction, and for the *error terms* we have by (5) and (6)

an overall negative quantity (provided is large and is small). As you can see, the magnitude of the error terms is , which is considerably smaller than the we got for the contribution of the tentative ‘s for . In order to close the induction, we will need to choose functions for so that their contribution to the square function’s mass is of magnitude as well.

Now that we know what we are aiming for, let us go back to the choice of functions for and make a more sophisticated one than . We will proceed driven by simplicity considerations.

In order for properties 1.,2. of the Theorem to be satisfied, we can always choose of the form

where the are positive coefficients and the are positive functions supported in , and where we impose that and that . However, in order to reduce the complexity of the task, we can limit our choices to , that is, functions that are independent of the specific interval and depend on the interval only. Thus we look for functions of the form

where the functions are to be determined (subject to the assumptions above, that is supported in and that ). Regarding the weights , there are two obvious choices here: we could choose or ; in both cases, we would have automatically .

Let us write for shortness and let

(notice that ), and proceed to estimate the square function. The choices for will arise along the way. We have, using the support properties (recall also the definition of )

Observe that the expression in square brackets is simply a numerical coefficient. It would therefore be very convenient at this point to simply choose in order to collect an factor from the sum, so we make this choice and proceed (always aided by the disjointness of the intervals ):

we see therefore that the above is equal to

At this point, to deal with the square root, we could proceed by using the trivial inequality . We will do so for pedagogical reasons, but __we warn the reader in advance that this will not be good enough for us__. We will show how much we fail by proceeding this way, and then we will backtrack and use a better inequality.

Using said inequality, we bound the quantity (9) above by

The first term is bounded by inductive hypothesis by , which we have already estimated: we have a main contribution that is exactly bounded by (this is (7)) and a negative error bounded by (8), which is of size . For the second term above, we multiply and divide each term by and use Cauchy-Schwarz in (and the fact that ) to bound it by

If we choose then we have simply

and therefore the second term is bounded thanks to (4) by

As discussed before, this contribution is too large to be neutralised by the error terms , so that’s no good to us. We can try using instead. Proceeding in the same fashion, we multiply and divide each term by and use Cauchy-Schwarz in , so that the second term is bounded by

since is of intermediate density, we can say that , and therefore using (4) again we bound the above by

which is again of the wrong size .

The issue we are having can be described as follows: the estimate that we are using is not efficient in our regime. A better estimate in this situation would be the only-slightly-less-trivial estimate

which is better than the previous one when .
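The displayed inequality is missing above; presumably the slightly-less-trivial estimate is $\sqrt{a+b} \le \sqrt{a} + \frac{b}{2\sqrt{a}}$ (this exact form is an assumption, obtained from the concavity of the square root), which beats $\sqrt{a}+\sqrt{b}$ precisely when $b < 4a$. A quick numerical comparison in the regime $b \ll a$:

```python
import math

def trivial(a, b):        # sqrt(a+b) <= sqrt(a) + sqrt(b)
    return math.sqrt(a) + math.sqrt(b)

def sharper(a, b):        # sqrt(a+b) <= sqrt(a) + b / (2 sqrt(a)), by concavity
    return math.sqrt(a) + b / (2 * math.sqrt(a))

a, b = 1.0, 1e-4          # regime b << a, as in the proof
s = math.sqrt(a + b)
assert s <= sharper(a, b) <= trivial(a, b)    # the sharper bound wins when b < 4a
# loss in the trivial bound is ~ sqrt(b); in the sharper one it is ~ b / (2 sqrt(a))
assert trivial(a, b) - s > 100 * (sharper(a, b) - s)
```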

Remark 3

For a suggestive calculation, assuming that every term in (9) contributed the same on average, we would have for the magnitudes

that is, the expected contribution of the second term would be of the correct order! Of course this is not rigorous, but it is indication that we are on the right track.

Using the slightly-less-trivial estimate on (9) then, we get the same first term as before, , which we have already efficiently estimated, and we get for the second term

The presence of in the denominator in the above expression heavily suggests choosing , which is what we do. The expression then becomes, after reversing the order of summation,

and this is easily seen to be controlled by provided is small: indeed,

because , and moreover we can ensure that

by imposing , which we clearly can do.

Therefore, (10) is bounded by

the last line by the fact that is of intermediate density; thus, by (4) again, we have a bound of

which *finally* matches the magnitude of the other errors! Adding this contribution to the error terms in (8) we see that the total error is

and, since the main term is exactly what we wanted, to close the induction we only need to make sure that the constant in square brackets is negative. It suffices to make sure that

choosing to be small, we choose large (recall that ) and very large, and we see that the inequality can certainly be satisfied. Therefore the induction closes and we are done for every set with sufficiently small (depending on , which in turn depends on ); and for the rest, we just appeal to Remark 2.

The proof of the Tao-Wright lemma is thus fully concluded!

**Footnotes:**

^{1}: One should realise at this point that the construction of is essentially a stopping-time argument. [go back]

^{2}: Observe that this choice would cause for all , by the recursive nature of the construction, and it was already observed in the paper that this would not work. [go back]