Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space $L(\log L)^{1/2}$.
1. The Chang-Wilson-Wolff inequality
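Consider the collection $\mathcal{D}$ of standard dyadic intervals that are contained in $[0,1]$. We let $\mathcal{D}_j$ for each $j \geq 0$ denote the subcollection of intervals $I \in \mathcal{D}$ such that $|I| = 2^{-j}$. Notice that these subcollections generate a filtration of $\sigma(\mathcal{D})$, that is $\sigma(\mathcal{D}_0) \subset \sigma(\mathcal{D}_1) \subset \sigma(\mathcal{D}_2) \subset \ldots$, where $\sigma(\mathcal{D}_j)$ denotes the sigma-algebra generated by the collection $\mathcal{D}_j$. We can associate to this filtration the conditional expectation operators
$$\mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],$$
and therefore define the martingale differences
$$\mathbf{D}_j f := \mathbf{E}_{j+1} f - \mathbf{E}_j f.$$
With this notation, we have the formal telescopic identity
$$f = \mathbf{E}_0 f + \sum_{j \geq 0} \mathbf{D}_j f.$$
Demystification: the expectation $\mathbf{E}_j f(x)$ is simply $\frac{1}{|I|} \int_I f(y)\,dy$, where $I$ is the unique dyadic interval in $\mathcal{D}_j$ such that $x \in I$.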
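To make the averaging description concrete, here is a minimal numerical sketch. It assumes that the function is sampled on a grid of $2^n$ equal cells of $[0,1]$ (so it is constant on dyadic intervals of length $2^{-n}$) and that the martingale difference at level $j$ is the level-$(j+1)$ average minus the level-$j$ average; the names `cond_exp` and `mart_diff` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def cond_exp(f, j):
    """Average f over each dyadic interval of length 2**-j (f has 2**n samples)."""
    n = f.size.bit_length() - 1
    block = 2 ** (n - j)                       # grid cells per dyadic interval
    means = f.reshape(2 ** j, block).mean(axis=1)
    return np.repeat(means, block)

def mart_diff(f, j):
    """Martingale difference at level j: (level j+1 average) - (level j average)."""
    return cond_exp(f, j + 1) - cond_exp(f, j)

n = 8
rng = np.random.default_rng(0)
f = rng.standard_normal(2 ** n)

# Telescoping identity: f = E_0 f + sum of the martingale differences.
# The sum stops at level n - 1 because f is constant on cells of length 2**-n.
recon = cond_exp(f, 0) + sum(mart_diff(f, j) for j in range(n))
telescoping_error = float(np.max(np.abs(recon - f)))

# Each martingale difference has zero average on every interval of its own level.
zero_mean_error = max(
    float(np.max(np.abs(cond_exp(mart_diff(f, j), j)))) for j in range(n)
)
print(telescoping_error, zero_mean_error)
```

Both printed quantities should be at the level of rounding error; for a general function the telescoping sum is infinite and the identity is only formal, as stated above.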
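Letting $f_j := \mathbf{E}_j f$ for brevity, the sequence of functions $(f_j)_{j \geq 0}$ is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is
$$\mathbf{E}_j f_{j+1} = f_j.$$
In the following we will only be interested in functions with zero average, that is functions such that $\mathbf{E}_0 f = \int_0^1 f(x)\,dx = 0$. Given such a function then, we can define its martingale square function $Sf$ to be
$$Sf := \Big(\sum_{j \geq 0} |\mathbf{D}_j f|^2\Big)^{1/2}.$$
With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.
Theorem – Chang-Wilson-Wolff inequality:
Let $f$ be such that $\mathbf{E}_0 f = 0$. Then for any $2 \leq p < \infty$ it holds that
$$\|f\|_{L^p([0,1])} \lesssim p^{1/2} \|Sf\|_{L^p([0,1])}. \qquad \text{(CWW1)}$$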
An important point about the above inequality is the behaviour of the constant $p^{1/2}$ in the Lebesgue exponent $p$, which is sharp. This can be seen by taking a “lacunary” function (essentially one where $|\mathbf{D}_j f| \equiv a_j$, a constant) and randomising the signs using Khintchine’s inequality (indeed, $p^{1/2}$ is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
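The growth of the constant can be observed numerically on the simplest example of this kind. The sketch below is an illustration under assumptions of my own choosing, not the argument referred to above: it takes $f = r_0 + \dots + r_{N-1}$, a sum of Rademacher functions, for which every martingale difference has modulus 1 and so the square function is identically $\sqrt{N}$. The value of $f$ is distributed like a sum of $N$ independent signs, which lets us compute $\|f\|_{L^p}$ exactly from binomial coefficients. The helper name `lp_norm_of_sign_sum` is hypothetical.

```python
from math import comb, sqrt

def lp_norm_of_sign_sum(N, p):
    """L^p([0,1]) norm of r_0 + ... + r_{N-1}.

    The sum takes the value N - 2k on a set of measure C(N, k) / 2**N.
    """
    total = sum(comb(N, k) * abs(N - 2 * k) ** p for k in range(N + 1))
    return (total / 2 ** N) ** (1.0 / p)

N = 400
exponents = [2, 4, 8, 16, 32]

# Here the square function is constant, so its L^p norm is sqrt(N) for every p.
ratios = {p: lp_norm_of_sign_sum(N, p) / sqrt(N) for p in exponents}

for p in exponents:
    # Second column: ratio of the norms; third column: the same divided by sqrt(p).
    print(p, ratios[p], ratios[p] / sqrt(p))
```

The ratio of the two norms grows with $p$, while the last column stays within a bounded range, which is what a constant of order $p^{1/2}$ predicts.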
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which $[0,1]$ is replaced by the unit cube $[0,1]^d$ and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
This inequality was proven in “Some weighted norm inequalities concerning the Schrödinger operators” by Chang, Wilson and Wolff. However, you won’t find it in there in the form (CWW1) written above; rather, you will find (essentially formula (3.1) in their paper) the following equivalent distributional inequality: for every $\lambda > 0$ (and assuming $\mathbf{E}_0 f = 0$)
$$|\{x \in [0,1] : |f(x)| > \lambda\}| \lesssim \exp\Big(-c\,\frac{\lambda^2}{\|Sf\|_{L^\infty}^2}\Big), \qquad \text{(CWW2)}$$
where $c > 0$ is an absolute constant.
That (CWW1) and (CWW2) are equivalent is not at all elementary, and there are other versions that are all referred to as “Chang-Wilson-Wolff inequality” in the literature. In the interest of clarity, we describe them and their relations below.
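Another inequality that is called C-W-W inequality is the endpoint inequality (again, assuming $\mathbf{E}_0 f = 0$)
$$\|f\|_{\exp(L^2)([0,1])} \lesssim \|Sf\|_{L^\infty([0,1])}, \qquad \text{(CWW3)}$$
where $\exp(L^2)$ denotes the Orlicz space with norm
$$\|f\|_{\exp(L^2)([0,1])} := \inf\Big\{\mu > 0 : \int_0^1 \exp\Big(\frac{|f(x)|^2}{\mu^2}\Big)\,dx \leq 2\Big\}$$
(this space is dual to $L(\log L)^{1/2}$, something that will be useful later). Yet another variant is the version of (CWW1) where we replace $\|Sf\|_{L^p}$ with $\|Sf\|_{L^\infty}$ (assuming yet again that $\mathbf{E}_0 f = 0$ and that $2 \leq p < \infty$):
$$\|f\|_{L^p([0,1])} \lesssim p^{1/2} \|Sf\|_{L^\infty([0,1])}. \qquad \text{(CWW4)}$$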
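The inequalities (CWW$j$) for $j = 1, 2, 3, 4$ are all equivalent. In particular, we can show the following relationships:
- (CWW1) $\Rightarrow$ (CWW4); this is simply an immediate consequence of Hölder’s inequality, that is of the fact that $\|Sf\|_{L^p([0,1])} \leq \|Sf\|_{L^\infty([0,1])}$.
- (CWW4) $\Rightarrow$ (CWW3); this is a cute exercise (actually, the reverse direction is also true, essentially by reversing the argument below). For $\mu > 0$ to be chosen, we Taylor-expand the exponential, and we have by exchanging sum and integral
$$\int_0^1 \exp\Big(\frac{|f(x)|^2}{\mu^2}\Big)\,dx = \sum_{k \geq 0} \frac{1}{k!\,\mu^{2k}} \|f\|_{L^{2k}}^{2k},$$
which by (CWW4) is bounded by
$$1 + \sum_{k \geq 1} \frac{\big(C (2k)^{1/2} \|Sf\|_{L^\infty}\big)^{2k}}{k!\,\mu^{2k}} = 1 + \sum_{k \geq 1} \frac{k^k}{k!} \Big(\frac{2 C^2 \|Sf\|_{L^\infty}^2}{\mu^2}\Big)^k;$$
by Stirling’s formula we can bound $k^k / k! \leq e^k$ and therefore we see that we can make the above sum less than 2 if we choose $\mu$ to be a sufficiently large multiple of $\|Sf\|_{L^\infty}$. This implies (CWW3) by definition of Orlicz norm.
- (CWW3) $\Rightarrow$ (CWW2); this is just a consequence of Markov/Chebyshev’s inequality, since (remember $\mathbf{E}_0 f = 0$), with $\mu := \|f\|_{\exp(L^2)} \lesssim \|Sf\|_{L^\infty}$,
$$|\{x \in [0,1] : |f(x)| > \lambda\}| \leq e^{-\lambda^2/\mu^2} \int_0^1 \exp\Big(\frac{|f(x)|^2}{\mu^2}\Big)\,dx \leq 2 e^{-\lambda^2/\mu^2}.$$
- (CWW2) $\Rightarrow$ (CWW1); this is the only non-elementary implication and will be the subject of an upcoming post. Roughly speaking the idea is: show that (CWW2) implies a good-$\lambda$ inequality with a very good constant (gaussian decay); use some fine properties of weights and the good-$\lambda$ inequality to show a weighted inequality controlling $f$ by $Sf$; use the weighted inequality and a trick of Rubio de Francia to conclude (CWW1). As you can see, the proof is somewhat convoluted, and will take up an entire post of its own.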
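The Taylor-expansion step in the implication from (CWW4) to (CWW3) can be checked numerically. The sketch below assumes the series takes the form $\sum_{k \geq 1} \frac{k^k}{k!} t^k$ with $t = 2C^2 \|Sf\|_{L^\infty}^2 / \mu^2$, where $C$ stands for the (unspecified) implied constant in (CWW4); both the name `C` and the helper functions are hypothetical, introduced only for illustration. It finds by bisection the largest $t$ for which the series is at most 1, so that the full exponential integral is at most 2.

```python
from math import exp, lgamma, log, sqrt

def tail_series(t, terms=2000):
    """Partial sum of sum_{k>=1} (k^k / k!) t^k, computed through logarithms."""
    return sum(
        exp(k * log(k) - lgamma(k + 1) + k * log(t)) for k in range(1, terms + 1)
    )

def critical_t(lo=0.01, hi=0.36, steps=60):
    """Bisection for the largest t with tail_series(t) <= 1.

    The series converges only for t < 1/e (about 0.368), so hi stays below that.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if tail_series(mid) <= 1.0:
            lo = mid
        else:
            hi = mid
    return lo

t_star = critical_t()
# t = 2 C^2 ||Sf||^2 / mu^2, so mu = sqrt(2 / t) * C * ||Sf||_inf.
multiple = sqrt(2.0 / t_star)
print(t_star, multiple)
```

Under these assumptions, any $\mu$ at least `multiple` times $C \|Sf\|_{L^\infty}$ works; the Stirling bound used in the text gives a slightly worse but simpler threshold.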
Inequality (CWW1) was originally proven by first proving (CWW2) directly using a bit of basic martingale theory (this is a clever argument due to Rubin) and then using the implications above.
In this post we will prove (CWW3) directly by means of a Lemma of Tao-Wright, that we will introduce in the next section. Before we move on though, I would like to comment a little on the motivation behind the C-W-W inequality.
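As a sanity check of what (CWW3) says, one can compute the $\exp(L^2)$ norm explicitly in a toy case. The sketch below assumes the norm is normalised as the smallest $\mu$ for which the integral of $\exp(|f|^2/\mu^2)$ over $[0,1]$ is at most 2, and it takes $f = r_0 + \dots + r_{N-1}$ (a sum of Rademacher functions, my choice of example), whose martingale square function is identically $\sqrt{N}$. The function names are hypothetical.

```python
from math import comb, exp, sqrt

def exp_square_integral(N, mu):
    """Integral over [0,1] of exp(|f|^2 / mu^2) for f = r_0 + ... + r_{N-1}.

    f takes the value N - 2k on a set of measure C(N, k) / 2**N.
    """
    total = sum(comb(N, k) * exp((N - 2 * k) ** 2 / mu ** 2) for k in range(N + 1))
    return total / 2 ** N

def exp_l2_norm(N, steps=60):
    """Bisection for the smallest mu with exp_square_integral(N, mu) <= 2."""
    # By Jensen the integral exceeds 2 at mu = sqrt(N); it is below 2 at 2 sqrt(N).
    lo, hi = sqrt(N), 2 * sqrt(N)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if exp_square_integral(N, mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi

# Ratio of the exp(L^2) norm of f to the sup norm of its square function, sqrt(N).
ratios = {N: exp_l2_norm(N) / sqrt(N) for N in (1, 4, 16, 64)}
print(ratios)
```

The ratios stay bounded as $N$ grows, in agreement with (CWW3) for this particular family of functions.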
1.1. What can you do with the C-W-W inequality?
Chang, Wilson and Wolff used the C-W-W inequality above as a stepping stone to prove a similar inequality for a continuous square function, the Lusin Area Integral. Let denote the Poisson kernel on (that is ) and let , so that if then is the harmonic extension of to ; we also let and observe that . Given a fixed we define the Lusin Area Integral of a function to be
Although not immediate to the eye, this square function is an operator in the same spirit as the smooth annular square function described in the 2nd post on Littlewood-Paley theory. This can be seen by converting the integral into a summation over dyadic scales and considering the fact that the Fourier transform of is morally concentrated at frequencies (and the integration in is a further average at the same scale, which is in space).
Chang, Wilson and Wolff used (CWW2) to prove the exponential integrability property
for any cube in (here denotes the average of over ). Compare this with (CWW3) above. This answered a question posed by Stein about the sharp order of local integrability of a function with (they already knew that was in because implies and all functions have this property; but having rather than is much stronger). All the work in their “Some weighted norm inequalities concerning the Schrödinger operators” paper is inspired by the study of sufficient conditions on the potentials for establishing the positivity of the Schrödinger operator .
Other results of the above type for a number of continuous square functions have been obtained by Wilson, and most can be found in his book on the subject “Weighted Littlewood-Paley theory and exponential-square integrability“.
Another remarkable use of the C-W-W inequality was made by Bourgain in his paper “On the behaviour of the constant in the Littlewood-Paley inequality“, in which he studied the sharp asymptotic behaviour in the exponent of the constants in the Littlewood-Paley inequality (here is the Littlewood-Paley square function with ). I will probably say more about this paper of Bourgain in the future. For the moment, it suffices to say that he used the C-W-W inequality in order to show that for (with of that exponent coming precisely from the exponent of in the C-W-W inequality (CWW1)).
Yet another application was found by Seeger and Trebels in “Low regularity classes and entropy numbers“, in which they used the C-W-W inequality to obtain embeddings of certain Besov spaces denoted by (spaces whose norms control regularity properties of functions) into Orlicz spaces of the type . A particular instance of their results is the inequality
for , where is a smooth dyadic frequency projection (see next section for a definition). Notice that if the and the norms on the RHS were reversed, this would be the analogue of (CWW4) for the smooth Littlewood-Paley square function! However, Minkowski’s inequality tells us that the RHS above is always larger.
A bunch of other applications of the C-W-W inequality, such as several weighted estimates for square functions of different kinds, can be found by combing through the papers that cite the CWW one (duh).
2. The square-function characterisation of $L(\log L)^{1/2}$
In this section we introduce formally the aforementioned lemma of Tao and Wright, which will then be used in the next section to provide a proof of the C-W-W inequality (CWW3).
Tao and Wright’s motivation was the study of the endpoint mapping properties of Marcinkiewicz multipliers in dimension 1; in particular, they wanted to answer (among others) the question
“What is the “smallest” Orlicz space such that is bounded whenever is a Marcinkiewicz multiplier symbol and ?”
For context, recall that Hörmander multipliers give rise to Calderón-Zygmund operators, and as such are $L^1 \to L^{1,\infty}$ bounded by the classical theory of singular integral operators; however, Marcinkiewicz multipliers are somewhat rougher (and of a different nature), and if you recall the proof of the Marcinkiewicz multiplier theorem that we gave you’ll see that the argument only produced $L^p \to L^p$ estimates for $1 < p < \infty$, leaving the question of the behaviour at or near $L^1$ open.
Tao and Wright answered the question above in “Endpoint multiplier theorems of Marcinkiewicz type”, in which they showed that a Marcinkiewicz multiplier maps$^1$ $L(\log L)^{1/2}$ into $L^{1,\infty}$ boundedly and the exponent $1/2$ is sharp, in the sense that for any $r < 1/2$ there is always a Marcinkiewicz multiplier that is not bounded from $L(\log L)^{r}$ to $L^{1,\infty}$. Furthermore, they showed that Marcinkiewicz multipliers map the Hardy space $H^1$ into $L^{1,\infty}$ boundedly and characterised the values of $r, q$ such that Marcinkiewicz multipliers are (locally) bounded from $L(\log L)^{r}$ into the Lorentz space $L^{1,q}$. Their methods also prove endpoint mapping properties for the rougher Marcinkiewicz operators with bounded $q$-variation for suitable $q$, whose boundedness we have discussed before.
At the heart of their proofs of endpoint mapping properties there is -as can probably be guessed- an enhanced and adapted (vector-valued) Calderón-Zygmund decomposition; but behind that, there are certain square-function characterisations of the relevant spaces. For context, when we say “square-function characterisation” what we have in mind is the example of the Hardy space , for which we have (see e.g. Grafakos). Here denotes the smooth square function and with a smooth function supported in and identically 1 on .
[There is a connection here between the Hardy space and the Orlicz spaces that should be kept in mind: if is positive on some compact set , then (this is due to the maximal-function characterisation of and the fact that the Hardy-Littlewood maximal function maps into boundedly; see Stein’s “Harmonic Analysis“, Ch. III, §5.3 for details).]
Tao and Wright found themselves in need of a suitable square-function characterisation of $L(\log L)^{1/2}$ that could be analogous to the one for $H^1$, in order for their arguments to extend to the Orlicz spaces case. They were ultimately able to find one that, although not as neat, does the job admirably and is quite deep:
Lemma 1 – Square-function characterisation of $L(\log L)^{1/2}$ [Tao-Wright, 2001]:
Let for any
(notice is concentrated in and ).
If the function is in and such that , then there exists a collection of non-negative functions such that:
- pointwise for any
- they satisfy the square-function estimate
It should be noted that this proposition would be essentially trivial to prove with rather than : indeed, in that case it would suffice to take directly or a similar projection to slightly enlarged frequency intervals. Point 2. above would then be just the square-function characterisation of itself.
In the paper they illustrate very well why the same choice of does not work in the case. Consider for large the function , where are smooth bump functions adapted to the interval and with . We have that pretty clearly. When we have – at least morally, but can be made more rigorous – that . Since is concentrated in we can pretend that and therefore we see that for we have
Therefore we can estimate
which is very off the desired RHS! The problem here, as exemplified in the calculations above, is that the functions have supports of very different sizes ( in this case, which is the scale at which is approximately constant) and therefore almost disjoint in a sense. The result is that the sum that defines the square function behaves more like an sum – there is little to none “cancellation” overall. The above lemma is carefully designed to avoid this issue: adding the convolution with on the RHS of the pointwise domination of lets us free to take the ‘s with much more concentrated supports, because convolution with produces a “smearing” of exactly – the expected scale. Meanwhile, with the supports overlapping much more now, the summation in the square function behaves more correctly like an one – as it should – which makes it smaller overall. For example, in the above example of , for it would suffice to take for all of them; thus one would have quite easily, and on the other hand
whose integral is as desired.
Although we won’t be proving the Tao-Wright lemma today (that’s the subject of upcoming posts), we have to mention that it is deduced from a martingale-differences variant of it. We introduce this variant because we will need it in the next section, in order to prove the C-W-W inequality (CWW3).
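The result will be best stated in terms of Haar functions. Recall that for any dyadic interval $I$ the Haar function $h_I$ is defined as
$$h_I := \frac{1}{|I|^{1/2}} \big(\mathbf{1}_{I_l} - \mathbf{1}_{I_r}\big),$$
where $I_l$ denotes the left half of $I$ and $I_r$ denotes the right half. We have $\langle h_I, h_J \rangle = \delta_{I,J}$, that is they are an orthonormal system.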
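Orthonormality of the Haar system is easy to confirm numerically. The sketch below assumes the normalisation in which the Haar function of a dyadic interval of length $2^{-j}$ takes the value $+2^{j/2}$ on the left half and $-2^{j/2}$ on the right half (the sign convention is my assumption), and it samples everything on a grid of $2^n$ cells, on which inner products of such step functions are exact Riemann sums. The helper `haar` is hypothetical.

```python
import numpy as np

def haar(j, m, n):
    """Haar function of the interval [m 2**-j, (m+1) 2**-j), sampled on 2**n cells."""
    N = 2 ** n
    length = N >> j                  # number of grid cells inside the interval
    half = length // 2
    amp = 2.0 ** (j / 2)             # equals |I|^(-1/2)
    h = np.zeros(N)
    start = m * length
    h[start:start + half] = amp              # left half
    h[start + half:start + length] = -amp    # right half
    return h

n = 5
N = 2 ** n
# All Haar functions that are resolved by the grid: levels j = 0, ..., n - 1.
H = np.array([haar(j, m, n) for j in range(n) for m in range(2 ** j)])

# Gram matrix of inner products <h_I, h_J> = integral over [0,1] of h_I * h_J.
gram = H @ H.T / N
gram_error = float(np.max(np.abs(gram - np.eye(H.shape[0]))))
print(H.shape[0], gram_error)
```

The Gram matrix of these $2^n - 1$ functions should coincide with the identity up to rounding; adding the constant function $1$ would complete them to a basis of the step functions on the grid.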
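The martingale-differences version of the lemma above is then as follows.
Lemma 2 – Square-function characterisation of $L(\log L)^{1/2}$ for martingale-differences [Tao-Wright, 2001]:
For any function $f \in L(\log L)^{1/2}([0,1])$ there exists a collection $(F_j)_{j \geq 0}$ of non-negative functions such that:
- for any $j \geq 0$ and any $I \in \mathcal{D}_j$
$$|\langle f, h_I \rangle| \lesssim \frac{1}{|I|^{1/2}} \int_I F_j(y)\,dy;$$
- they satisfy the square-function estimate
$$\Big\|\Big(\sum_{j \geq 0} |F_j|^2\Big)^{1/2}\Big\|_{L^1([0,1])} \lesssim \|f\|_{L(\log L)^{1/2}([0,1])}.$$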
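You might be rightly wondering what the connection with martingale differences is in the above statement, since they don’t appear explicitly. However, we claim they are really there. Indeed, given an interval $I \in \mathcal{D}_j$, for any $x \in I_l$ we can write
$$\mathbf{D}_j f(x) = \frac{1}{|I_l|} \int_{I_l} f - \frac{1}{|I|} \int_I f = \frac{1}{|I|} \Big(\int_{I_l} f - \int_{I_r} f\Big) = \frac{\langle f, h_I \rangle}{|I|^{1/2}};$$
similarly, for $x \in I_r$ we get $\mathbf{D}_j f(x) = -\langle f, h_I \rangle / |I|^{1/2}$. Thus we could have equivalently stated property 1. in the lemma above as
$$|\mathbf{D}_j f(x)| \lesssim \frac{1}{|I|} \int_I F_j(y)\,dy = \mathbf{E}_j F_j(x),$$
where $I$ is the dyadic interval of length $2^{-j}$ that contains $x$. Thus the relationship between Lemma 2 and Lemma 1 should be a bit less obscure now: we have replaced the smooth frequency projections with the martingale differences $\mathbf{D}_j$ and we have replaced the convolution in Lemma 1 with the conditional expectations $\mathbf{E}_j$. Effectively, Lemma 2 is a discrete version of Lemma 1.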
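The identification of martingale differences with Haar coefficients can also be verified numerically. The sketch below assumes the conventions that the level-$j$ martingale difference is the level-$(j+1)$ average minus the level-$j$ average, and that Haar functions are positive on the left half; under these assumptions the level-$j$ martingale difference should equal the sum of $\langle f, h_I \rangle h_I$ over the intervals $I$ of length $2^{-j}$, and the “Haar square function” should coincide with the martingale square function. The helpers `cond_exp` and `haar` are hypothetical.

```python
import numpy as np

def cond_exp(f, j):
    """Average f over each dyadic interval of length 2**-j (f has 2**n samples)."""
    block = f.size >> j
    return np.repeat(f.reshape(2 ** j, block).mean(axis=1), block)

def haar(j, m, n):
    """Haar function of [m 2**-j, (m+1) 2**-j) on 2**n cells, positive on the left."""
    length = 2 ** (n - j)
    h = np.zeros(2 ** n)
    h[m * length:m * length + length // 2] = 2.0 ** (j / 2)
    h[m * length + length // 2:(m + 1) * length] = -(2.0 ** (j / 2))
    return h

n = 7
N = 2 ** n
rng = np.random.default_rng(1)
f = rng.standard_normal(N)
f -= f.mean()                                  # zero average on [0,1]

max_err = 0.0
haar_sq = np.zeros(N)                          # sum of |<f,h_I>|^2 1_I / |I|
mart_sq = np.zeros(N)                          # sum of |D_j f|^2
for j in range(n):
    d_j = cond_exp(f, j + 1) - cond_exp(f, j)
    proj = np.zeros(N)
    for m in range(2 ** j):
        h = haar(j, m, n)
        coeff = np.dot(f, h) / N               # <f, h_I> as an exact Riemann sum
        proj += coeff * h
        haar_sq += coeff ** 2 * (h != 0) * 2.0 ** j
    max_err = max(max_err, float(np.max(np.abs(d_j - proj))))
    mart_sq += d_j ** 2

square_fn_err = float(np.max(np.abs(haar_sq - mart_sq)))
print(max_err, square_fn_err)
```

Both errors should be at rounding level, which is the discrete counterpart of the pointwise identity derived above.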
The proof of the conversion from the martingale-differences lemma to the smooth frequency projections one is somewhat sketched in the paper, so I will reproduce an expanded version in this blog in the near future. For now, we accept that Lemma 2 implies Lemma 1 at face value.
2.1. An immediate application: two-lines-proof of an inequality of Zygmund
There is an inequality of Zygmund that, in its simplest form, says the following:
if is a function with lacunary2 Fourier series (in particular, is supported in ), then we have
Does this ring a bell? No? Well, it should! At least if you have done the exercises in the 2nd post on Littlewood-Paley theory, because I basically tricked you into proving the dual form of this inequality in Exercise 8. See if you can reconstruct a proof of (ZY) starting from there (using Orlicz duality).
A nice application of the powerful square-function characterisation of is a quick proof of (ZY) as follows. Ignoring the differences between and for brevity, we have by Hausdorff-Young inequality
and we are done.
3. Proof of the C-W-W inequality using the square-function characterisation of $L(\log L)^{1/2}$
We are now ready to show that the Haar-version of the square-function characterisation above implies the C-W-W inequality in its endpoint form given by (CWW3), which we restate for convenience: we are going to prove that for a function $f$ with $\mathbf{E}_0 f = 0$ we have
$$\|f\|_{\exp(L^2)([0,1])} \lesssim \|Sf\|_{L^\infty([0,1])} \qquad \text{(CWW3)}$$
using the Haar-version of the lemma above.
I learned about this proof from my colleague Odysseas Bakas who came up with it (he also kindly helped me iron out some nasty details in the conversion from Lemma 2 to Lemma 1 above). His own work relates to the Tao-Wright paper discussed in Section 2. The proof is very short and elegant.
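We are going to prove (CWW3) using duality in Orlicz spaces: we have that
$$\|f\|_{\exp(L^2)([0,1])} \sim \sup\Big\{\Big|\int_0^1 f(x) g(x)\,dx\Big| : \|g\|_{L(\log L)^{1/2}([0,1])} \leq 1\Big\}.$$
In other words, $\exp(L^2)$ is in duality with the Orlicz space $L(\log L)^{1/2}$. This is not at all obvious and indeed I was a bit surprised the first time I ever came across this. For details, see for example Zygmund’s “Trigonometric Series”, Vol. 1, Ch. IV, §10.
Since $\mathbf{1}_{[0,1]}$ and the Haar functions form an orthonormal basis of $L^2([0,1])$ we can write
$$\int_0^1 f(x) g(x)\,dx = \sum_{I \in \mathcal{D}} \langle f, h_I \rangle \langle g, h_I \rangle;$$
notice that it is $\mathbf{E}_0 f = 0$ that kills off the contribution of $\mathbf{1}_{[0,1]}$. Since the function $g$ is in the unit ball of $L(\log L)^{1/2}$, Lemma 2 gives us functions $G_j$ such that $|\langle g, h_I \rangle| \lesssim |I|^{-1/2} \int_I G_j$ (where $I \in \mathcal{D}_j$, that is $|I| = 2^{-j}$) and such that $\big\|\big(\sum_j |G_j|^2\big)^{1/2}\big\|_{L^1} \lesssim 1$. Thus we can bound the last expression above (up to a constant) by
$$\sum_{j \geq 0} \sum_{I \in \mathcal{D}_j} |\langle f, h_I \rangle| \frac{1}{|I|^{1/2}} \int_I G_j(x)\,dx,$$
which we rewrite as
$$\sum_{j \geq 0} \int_0^1 G_j(x) \sum_{I \in \mathcal{D}_j} \frac{|\langle f, h_I \rangle|}{|I|^{1/2}} \mathbf{1}_I(x)\,dx.$$
Exchanging the sum in $j$ with the integral and using Cauchy-Schwarz in $j$ this is bounded by
$$\int_0^1 \Big(\sum_{j \geq 0} \Big(\sum_{I \in \mathcal{D}_j} \frac{|\langle f, h_I \rangle|}{|I|^{1/2}} \mathbf{1}_I\Big)^2\Big)^{1/2} \Big(\sum_{j \geq 0} |G_j|^2\Big)^{1/2} dx = \int_0^1 \Big(\sum_{j \geq 0} \sum_{I \in \mathcal{D}_j} \frac{|\langle f, h_I \rangle|^2}{|I|} \mathbf{1}_I\Big)^{1/2} \Big(\sum_{j \geq 0} |G_j|^2\Big)^{1/2} dx$$
(the latter because the intervals $I \in \mathcal{D}_j$ are pairwise disjoint). It is a matter of a simple calculation (which I have already done for you above) to show that the factor $\big(\sum_{I \in \mathcal{D}} |\langle f, h_I \rangle|^2 \mathbf{1}_I / |I|\big)^{1/2}$ (sometimes called the Haar square-function) is actually equal to $Sf$. Therefore we can bound (using the properties of the $G_j$’s)
$$\Big|\int_0^1 f(x) g(x)\,dx\Big| \lesssim \int_0^1 Sf \Big(\sum_{j \geq 0} |G_j|^2\Big)^{1/2} dx \leq \|Sf\|_{L^\infty} \Big\|\Big(\sum_{j \geq 0} |G_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|Sf\|_{L^\infty}.$$
Taking the supremum over all possible functions $g$ in the unit ball of $L(\log L)^{1/2}$ shows by duality that (CWW3) indeed holds!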
In general, if is a convex function such that as and as , we can define the Orlicz space as the space of functions with finite Orlicz norm (actually, it should be called Luxemburg norm)
2: Actually, this particular inequality holds as well for a generic function – that is, we can drop the lacunarity assumption (indeed, the proof we give above works just fine for any function!). This is no longer the case for the inequality that was to be proven in Exercise 8.