# Proof of the square-function characterisation of L(log L)^{1/2}: part II

This is the 3rd post in a series that started with the post on the Chang-Wilson-Wolff inequality.

In today’s post we will finally complete the proof of the Tao-Wright lemma. Recall that in the 2nd post of the series we proved that the Tao-Wright lemma follows from its discrete version for Haar/dyadic-martingale-differences, which is as follows:

Lemma 2 – Square-function characterisation of $L(\log L)^{1/2}$ for martingale-differences:
For any function $f : [0,1] \to \mathbb{R}$ in $L(\log L)^{1/2}([0,1])$ there exists a collection $(F_j)_{j \in \mathbb{N}}$ of non-negative functions such that:

1. for any $j \in \mathbb{N}$ and any $I \in \mathcal{D}_j$

$\displaystyle |\langle f, h_I \rangle|\lesssim \frac{1}{|I|^{1/2}} \int_{I} F_j \,dx;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{N}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}([0,1])}.$

Today we will prove this lemma.

## Proof of Lemma 2

In the following we will use ${\log^{+}}$ to denote the function ${\log^{+} t := \log(2 + t)}$ for ${t>0}$.

Definition:
We let ${\mathcal{D}}$ denote the collection of all dyadic intervals in ${\mathbb{R}}$. If ${I \in \mathcal{D}}$ we denote:

• by ${\widehat{I}}$ the unique dyadic parent of ${I}$;
• by ${I_{+}, I_{-}}$ the, respectively, left and right children of ${I}$ (that is, the unique dyadic intervals such that ${\widehat{I_{+}} = I = \widehat{I_{-}}}$);
• by ${\widetilde{I}}$ its sibling (that is, the unique dyadic interval such that ${I \sqcup \widetilde{I} = \widehat{I}}$). In particular, $\widetilde{I_{+}} = I_{-}$ and vice versa.

We let ${\mathcal{D}_N}$ denote the collection of all dyadic intervals in ${[0,1]}$ of length ${\geq 2^{-N}}$.
Given an interval ${I \in \mathcal{D}}$ we let ${\mathcal{D}(I)}$ be the dyadic intervals contained in ${I}$ and ${\mathcal{D}_N(I)}$ be the dyadic intervals contained in ${I}$ and such that their length is at least ${2^{-N}|I|}$.

We will also use ${h_I}$ to denote the Haar function associated to the interval ${I}$, that is

$\displaystyle h_I := \frac{1}{|I|^{1/2}}(\mathbf{1}_{I_{+}} - \mathbf{1}_{I_{-}}).$
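To make the definition concrete, here is a minimal Python sketch (an illustration, not part of the argument) that computes the Haar coefficients ${\langle \mathbf{1}_E, h_I \rangle}$ when ${E}$ is a union of dyadic cells of length ${2^{-N}}$. Representing ${E}$ as a list of 0/1 values on those cells, and the name `haar_coefficients`, are assumptions made only for this example.

```python
import math

def haar_coefficients(cells):
    """Haar coefficients of 1_E, where cells[i] in {0, 1} is the value of 1_E
    on the i-th dyadic cell of length 1/len(cells) in [0, 1].

    Returns a dict keyed by (level, k), standing for I = [k 2^-level, (k+1) 2^-level).
    """
    n_cells = len(cells)  # assumed to be a power of two
    coeffs = {}
    size, level = n_cells, 0  # size = number of cells making up I
    while size >= 2:
        half = size // 2
        length = size / n_cells  # |I|
        for k in range(n_cells // size):
            start = k * size
            left = sum(cells[start:start + half]) / n_cells          # |E ∩ I_+|
            right = sum(cells[start + half:start + size]) / n_cells  # |E ∩ I_-|
            coeffs[(level, k)] = (left - right) / math.sqrt(length)
        size, level = half, level + 1
    return coeffs

cells = [1, 0, 0, 0, 1, 1, 0, 0]  # E = [0, 1/8) ∪ [1/2, 3/4)
coeffs = haar_coefficients(cells)
```

Intervals of length exactly ${2^{-N}}$ are skipped because ${\mathbf{1}_E}$ is constant on them, so their Haar coefficients vanish.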

Lemma 2 follows rather directly from the case in which ${f}$ is the characteristic function of a measurable set (which we can further assume to be $\sigma(\mathcal{D}_N)$-measurable, since we can then take a limit in $N$):

Theorem:
There exists a constant ${C>0}$ such that the following is true of every ${N>0}$. Let such an ${N}$ be fixed and let ${E \in \sigma(\mathcal{D}_N)}$ (a set in the sigma-algebra generated by ${\mathcal{D}_N}$). Then we can find a collection of functions ${(f_I)_{I \in \mathcal{D}_N}}$ such that

1. ${f_I}$ is supported in ${I}$;
2. for every ${I \in \mathcal{D}_N}$

$\displaystyle |\langle \mathbf{1}_E, h_I \rangle| \leq \frac{1}{|I|^{1/2}} \int |f_I| ;$

3. we have the square-function estimate

$\displaystyle {\Big\|\Big(\sum_{I \in \mathcal{D}_N} |f_I|^2 \Big)^{1/2} \Big\|_{L^1([0,1])} \leq C |E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}.$

Indeed, by the atomic decomposition of $L(\log L)^{1/2}$ (as seen in the interlude post) we can write $f = \sum_{i} \alpha_i a_{E_i}$ where each atom $a_{E_i}$ is the characteristic function of the set $E_i$ normalised so that $\|a_{E_i}\|_{L(\log L)^{1/2}} = 1$ and $\sum_{i} |\alpha_i| \lesssim \|f\|_{L(\log L)^{1/2}}$. Applying the Theorem to each atom we obtain functions $(A^i_I)_{I \in \mathcal{D}}$ with $|\langle a_{E_i}, h_I \rangle| \lesssim |I|^{-1/2} \int A^i_I$ and $\big\|\big(\sum_{I} |A^i_I|^2 \big)^{1/2} \big\|_{L^1} \lesssim 1$, and if we let $f_I := \sum_{i} |\alpha_i| A^i_I$ we will have that 1. of Lemma 2 is satisfied trivially because

$\displaystyle \begin{aligned} |\langle f, h_I \rangle| \leq & \sum_{i}|\alpha_i||\langle a_{E_i},h_I\rangle | \\ \lesssim & \sum_{i}|\alpha_i| |I|^{-1/2} \int A^i_I = |I|^{-1/2} \int f_I \end{aligned}$

and similarly 2. of Lemma 2 is satisfied because by Minkowski’s inequality we have

$\displaystyle \begin{aligned} \Big\| \Big(\sum_{I \in \mathcal{D}} |f_I|^2 \Big)^{1/2} \Big\|_{L^1} = & \Big\| \Big(\sum_{I \in \mathcal{D}} \Big[\sum_{i} |\alpha_i| A^i_I \Big]^2 \Big)^{1/2} \Big\|_{L^1} \\ \leq & \Big\| \sum_{i} |\alpha_i| \Big(\sum_{I \in \mathcal{D}} |A^i_I|^2 \Big)^{1/2} \Big\|_{L^1} \\ = & \sum_{i} |\alpha_i| \Big\|\Big(\sum_{I \in \mathcal{D}} |A^i_I|^2 \Big)^{1/2} \Big\|_{L^1} \\ \lesssim & \sum_{i} |\alpha_i| \lesssim \|f\|_{L (\log L)^{1/2}}. \end{aligned}$

Remark 1 Notice that the statement of the Theorem rescales very naturally, and we can replace the interval ${[0,1]}$ by any dyadic interval ${I_0}$ and replace ${\mathcal{D}_N}$ with ${\mathcal{D}_N(I_0)}$. We need to be a little careful in adapting the Orlicz norm: the correct definition is

$\displaystyle \|g\|_{L(\log L)^{1/2}(I)} := \inf \Big\{ \mu |I| \text{ s.t. } \frac{1}{|I|} \int_{I} \frac{|g(x)|}{\mu} \Big[\log^{+}\Big(\frac{|g(x)|}{\mu}\Big)\Big]^{1/2} \,dx \leq 1 \Big\},$

in which case the bound on the ${L^1(I_0)}$ norm of the square function becomes ${C|E| \Big(\log^{+} \frac{|I_0|}{|E|} \Big)^{1/2}}$.
This will be important in the proof of the Theorem below because we will use induction and will need to assume the Theorem holds on smaller intervals.

Remark 2 Another important observation is that the theorem is easily obtained for any ${E}$ of sufficiently large mass: if ${|E| > \epsilon}$, then we can choose the functions ${f_I}$ to be ${f_I := |I|^{-1/2} |\langle \mathbf{1}_E, h_I \rangle| \mathbf{1}_I}$; properties 1., 2. are trivially verified, and we have by Hölder’s inequality and the orthogonality of the Haar functions

$\displaystyle \begin{aligned} \Big\|\Big(\sum_{I \in \mathcal{D}_N} |f_I|^2 \Big)^{1/2} \Big\|_{L^1([0,1])} \leq & \Big\|\Big(\sum_{I \in \mathcal{D}_N} |f_I|^2 \Big)^{1/2} \Big\|_{L^2([0,1])} \\ = & \Big(\sum_{I \in \mathcal{D}_N} \|f_I\|_{L^2(I)}^2 \Big)^{1/2} \\ = & \Big(\sum_{I \in \mathcal{D}_N} |\langle \mathbf{1}_E, h_I \rangle|^2 \Big)^{1/2} \\ \leq & \| \mathbf{1}_E\|_{L^2} = |E|^{1/2} = |E|^{-1/2} |E| \\ \leq & \epsilon^{-1/2} |E| \leq \epsilon^{-1/2} |E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}. \end{aligned}$

Thus it will suffice to assume that ${|E|}$ is small (${|E| \leq \epsilon}$) throughout the argument.
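As a sanity check of Remark 2, the following Python sketch evaluates the ${L^1}$ norm of the square function for the trivial choice ${f_I = |I|^{-1/2} |\langle \mathbf{1}_E, h_I \rangle| \mathbf{1}_I}$ and compares it with ${|E|^{1/2}}$. The 0/1 list encoding of ${E}$ on cells of length ${2^{-N}}$ and the function name are assumptions made for illustration.

```python
import math

def trivial_square_function_norm(cells):
    """L^1 norm of (sum_I |f_I|^2)^{1/2} for f_I = |I|^{-1/2} |<1_E, h_I>| 1_I,
    where cells[i] in {0, 1} is the value of 1_E on the i-th dyadic cell."""
    n_cells = len(cells)  # assumed to be a power of two
    sq = [0.0] * n_cells  # value of sum_I |f_I|^2 on each cell
    size = n_cells
    while size >= 2:
        half = size // 2
        length = size / n_cells  # |I|
        for start in range(0, n_cells, size):
            left = sum(cells[start:start + half]) / n_cells
            right = sum(cells[start + half:start + size]) / n_cells
            coeff_sq = (left - right) ** 2 / length  # |<1_E, h_I>|^2
            for i in range(start, start + size):
                sq[i] += coeff_sq / length  # |f_I|^2 equals |<1_E, h_I>|^2 / |I| on I
        size = half
    # The square function is constant on each cell, so the integral is an average.
    return sum(math.sqrt(v) for v in sq) / n_cells

cells = [1, 0, 0, 0, 1, 1, 0, 0]
measure = sum(cells) / len(cells)          # |E|
norm = trivial_square_function_norm(cells)
```

In this encoding the chain of inequalities of Remark 2 reads `norm <= math.sqrt(measure)`.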

We want to prove the theorem by inducting on ${N}$, or more precisely, by inducting on the depth of the dyadic system used. When ${N}$ is small we have necessarily ${|E| \geq \epsilon}$, so that Remark 2 applies and proves the base case. Suppose then that ${N}$ is not small and imagine that we have some partition of ${[0,1]}$ given by a collection ${\mathbf{J}}$ of dyadic intervals (and ${\mathbf{J} \neq \{[0,1]\}}$). Since for ${J \in \mathbf{J}}$ the dyadic grids ${\mathcal{D}(J) \cap \mathcal{D}_N}$ have depth strictly less than ${N}$, we can use the inductive hypothesis on each of them: this would give us collections ${(f_I)_{I \subseteq J}}$ for each ${J \in \mathbf{J}}$, with the properties stated in the theorem above; in particular (cf. Remark 1 above)

$\displaystyle \Big\|\Big(\sum_{I \in \mathcal{D}_N : I \subseteq J} |f_I|^2 \Big)^{1/2} \Big\|_{L^1 (J)} \leq C |E \cap J| \Big(\log^{+} \frac{|J|}{|E\cap J|} \Big)^{1/2}.$

A necessary step in the inductive proof would then be, at some point, to estimate the contribution to the square function coming from all these intervals in ${\mathbf{J}}$. Since the ${f_I}$ are supported in ${I \subseteq J}$ and the ${J}$ are disjoint, we would have

$\displaystyle \begin{aligned} \Big\|\Big(\sum_{J \in \mathbf{J}}\sum_{I \subseteq J} |f_I|^2 \Big)^{1/2} \Big\|_{L^1([0,1])} = & \sum_{J \in \mathbf{J}} \Big\|\Big(\sum_{I \subseteq J} |f_I|^2 \Big)^{1/2} \Big\|_{L^1(J)} \\ \leq & C \sum_{J \in \mathbf{J}} |E \cap J| \Big(\log^{+} \frac{|J|}{|E\cap J|} \Big)^{1/2}; \ \ \ \ \ \ \ (1) \end{aligned}$

thus if we are to proceed by induction we will need a way to make sure that these contributions sum up to at most ${C|E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}$ minus something (minus something because there will be other contributions to the square function, and we will need the total contributions to be less than ${C|E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}$, in order to be able to close the induction). The appearance of the quantity ${\frac{|J|}{|E\cap J|}}$ suggests that it should play a special rôle in our analysis, and as a consequence we define the density of ${J}$ to be

$\displaystyle \delta_J := \frac{|E \cap J|}{|J|}.$

Notice in particular that

$\displaystyle |E \cap J| \Big(\log^{+} \frac{|J|}{|E\cap J|} \Big)^{1/2} = |J| \delta_J \Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2}.$

Moreover, we have the following convexity relation:

$\displaystyle \delta_{\widehat{J}} = \frac{1}{2} (\delta_{J} + \delta_{\widetilde{J}}); \ \ \ \ \ (2)$

indeed,

$\displaystyle \frac{|E \cap \widehat{J}|}{|\widehat{J}|} = \frac{|(E\cap J) \cup (E \cap \widetilde{J})|}{2|J|} = \frac{1}{2} \Big(\frac{|E \cap J|}{|J|} + \frac{|E \cap \widetilde{J}|}{|\widetilde{J}|}\Big).$
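The convexity relation (2) is easy to test numerically. The sketch below uses exact rational arithmetic on a small example; the encoding of ${E}$ as a 0/1 list on dyadic cells and the helper name `density` are assumptions made for illustration.

```python
from fractions import Fraction

def density(cells, start, size):
    """delta_J = |E ∩ J| / |J| for the dyadic interval J made of cells[start:start+size]."""
    return Fraction(sum(cells[start:start + size]), size)

cells = [1, 0, 0, 0, 1, 1, 0, 0]   # E = [0, 1/8) ∪ [1/2, 3/4)
parent = density(cells, 0, 8)      # J-hat = [0, 1]
child = density(cells, 0, 4)       # J = [0, 1/2)
sibling = density(cells, 4, 4)     # J-tilde = [1/2, 1)
```

Here `parent` is the average of `child` and `sibling`, as (2) predicts.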

Imagine now that the intervals ${J}$ happen to all have large density, in particular

$\displaystyle \delta_J > B$

for all ${J \in \mathbf{J}}$. Then we would have a simple bound for the sum above, namely

$\displaystyle \sum_{J \in \mathbf{J}} |E \cap J| \Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} \leq \sum_{J \in \mathbf{J}} |E \cap J| \Big(\log^{+} \frac{1}{B} \Big)^{1/2} = |E| \Big(\log^{+} \frac{1}{B} \Big)^{1/2};$

if we choose ${B = \beta |E|}$ for some constant ${\beta >0}$, we will have that this contribution is indeed more-or-less proportional to ${|E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}$, which goes in the right direction for us. This observation suggests that we will have to partition the dyadic intervals according to their densities to complete the inductive step. It is not clear a priori into how many buckets we will have to partition, but it turns out that few buckets will suffice: in particular, we will define the intervals of intermediate density to be the collection ${\mathbf{I} \subset \mathcal{D}_N}$ of intervals ${I}$ such that

$\displaystyle \alpha|E| < \delta_I < \beta|E|,$

for two constants ${\alpha,\beta>0}$ that we will choose later. Notice that while we have given a heuristic reason for choosing the upperbound ${\beta|E|}$ on the right, we have not given any such reason to choose a lowerbound matching its form (that would be ${\alpha|E|}$). Indeed, we remain at this stage open to the possibility that we will have to choose ${\alpha}$ to depend on ${|E|}$ later (however this will not be the case in the end). Notice that by the convexity relation (2), if ${I,\widetilde{I}}$ are of intermediate density, then ${\widehat{I}}$ is also of intermediate density (though we will not use this fact).
Once the collection ${\mathbf{I}}$ of intervals of intermediate density is chosen, there is an obvious “complement” to it: the collection ${\mathbf{J}}$ of intervals that are not in ${\mathbf{I}}$ and are maximal with respect to ${\subseteq}$. The maximality (and the combinatorics of the dyadic grid) immediately implies that the intervals in ${\mathbf{J}}$ are all pairwise disjoint; but we claim that more is true, and that they actually partition ${[0,1]}$. Indeed, let ${x \in [0,1]}$ and let ${K}$ be the dyadic interval of length ${2^{-N}}$ that contains ${x}$; since ${E}$ is ${\sigma(\mathcal{D}_N)}$-measurable, we have either ${E \cap K = K}$ or ${E \cap K = \emptyset}$, and therefore ${\delta_K \in \{0,1\}}$. If ${\beta|E| < 1}$ (as it will be; see Remark 2 for why we have the liberty of assuming so), then ${K}$ cannot be in the collection ${\mathbf{I}}$, and therefore will be contained in an interval of ${\mathbf{J}}$; thus every point of ${[0,1]}$ is covered by ${\mathbf{J}}$, as claimed.
Another interesting property of ${\mathbf{J}}$ is the following: if ${J \in \mathbf{J}}$ then ${\delta_J \leq 2\beta |E|}$. Indeed, this follows from the convexity relation (2), since ${\delta_J = 2\delta_{\widehat{J}} - \delta_{\widetilde{J}} \leq 2\delta_{\widehat{J}} < 2 \beta |E|}$, because ${\widehat{J}}$ is of intermediate density, by the maximality of ${J}$. So, overall there will be two types of intervals ${J}$: there will be the intervals in

$\displaystyle \mathbf{J}_{\mathrm{dense}} := \{ J \in \mathbf{J} \; : \; \beta|E| \leq \delta_J \leq 2\beta|E| \}$

and those in

$\displaystyle \mathbf{J}_{\mathrm{thin}} := \{ J \in \mathbf{J} \; : \; \delta_J \leq \alpha |E| \}.$

We stress that the intervals in ${\mathbf{J}_{\mathrm{dense}}}$ are the denser of the two, but not dense in absolute terms: indeed, ${2\beta|E|}$ will in general be much smaller than ${1}$.
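The selection of ${\mathbf{J}}$ is a stopping-time construction: starting from ${[0,1]}$ we keep subdividing as long as the current interval is of intermediate density. A minimal Python sketch of this procedure follows, assuming ${E}$ is encoded as a 0/1 list on the dyadic cells of length ${2^{-N}}$; the function name, the `(start, size)` encoding of intervals and the sample values of ${\alpha, \beta}$ are all hypothetical choices made for illustration.

```python
from fractions import Fraction

def split_by_density(cells, alpha, beta):
    """Return (above, maximal), two lists of dyadic intervals encoded as (start, size).

    above:   intervals of intermediate density whose ancestors are all of
             intermediate density as well;
    maximal: the maximal dyadic intervals that are not of intermediate density.
    """
    n_cells = len(cells)  # assumed to be a power of two
    measure = Fraction(sum(cells), n_cells)  # |E|
    above, maximal = [], []
    stack = [(0, n_cells)]
    while stack:
        start, size = stack.pop()
        delta = Fraction(sum(cells[start:start + size]), size)
        # A single cell has density 0 or 1, hence is never intermediate when beta|E| < 1;
        # the size check is only a guard against inputs violating that assumption.
        if alpha * measure < delta < beta * measure and size > 1:
            above.append((start, size))
            half = size // 2
            stack.extend([(start, half), (start + half, half)])
        else:
            maximal.append((start, size))
    return above, maximal

cells = [0] * 16
cells[0] = cells[5] = 1  # |E| = 1/8
above, maximal = split_by_density(cells, Fraction(1, 2), 3)
```

In this example the intervals in `maximal` tile ${[0,1]}$, and each one is either thin (${\delta_J \leq \alpha|E|}$) or dense (${\beta|E| \leq \delta_J \leq 2\beta|E|}$).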

In order to use the inductive hypothesis, we have to exclude the case ${\mathbf{J} = \{[0,1]\}}$ first. This is simple: if we choose ${\beta > 1 > \alpha}$, then since ${\delta_{[0,1]} = |E|}$ we will have that ${[0,1] \in \mathbf{I}}$ automatically, and thus ${\mathbf{J} \neq \{[0,1]\}}$.
We then construct ${\mathbf{I}, \mathbf{J}}$ as before, and use the inductive hypothesis on each interval ${J \in \mathbf{J}}$, giving us collections ${(f_I)_{I \subseteq J}}$. We will let ${F_J}$ denote the square function

$\displaystyle F_J := \Big(\sum_{I \subseteq J} |f_I|^2 \Big)^{1/2}$

for later convenience.
It remains to assign functions to the remaining intervals that are missing so far, that is the dyadic intervals that are not contained in any interval of ${\mathbf{J}}$: observe that these intervals are necessarily unions of two or more intervals of ${\mathbf{J}}$ (by the combinatorics of the dyadic grid); moreover, these intervals belong to ${\mathbf{I}}$, by maximality of the ${J}$'s. We denote this collection by ${\mathbf{I}'}$.

It is absolutely not obvious how to choose these remaining functions – and bear in mind that this is the only step of the construction where we are actually assigning functions to intervals, since the rest is handled recursively. To gain some understanding, we try with a naive choice first and see what it gets us. This naive choice would be the one made in Remark 2, that is, let us set ${\widetilde{f}_I := |I|^{-1/2}|\langle \mathbf{1}_E, h_I \rangle| \mathbf{1}_I}$ for all ${I \in \mathbf{I}'}$. Since this choice, as already remarked, satisfies properties 1., 2. as per the statement of the Theorem, we only have to estimate ${\big\|\big(\sum_{I \in \mathbf{I}'} |\widetilde{f}_I|^2\big)^{1/2} \big\|_{L^1}}$.
Here we make a fundamental observation: since the intervals ${I \in \mathbf{I}'}$ strictly contain intervals ${J \in \mathbf{J}}$, in evaluating the Haar coefficients ${\langle \mathbf{1}_E, h_I \rangle}$ for these “large” intervals ${I \in \mathbf{I}'}$ we can forget about the finer distribution of ${E}$ inside the ${J}$'s. In particular, if we let

$\displaystyle g := \sum_{J \in \mathbf{J}} \delta_J \mathbf{1}_J$

then we have for any ${I \in \mathbf{I}'}$ that

$\displaystyle \boxed{\langle \mathbf{1}_E, h_I \rangle = \langle g, h_I \rangle. \ \ \ \ \ (3) }$

To see this, simply write ${\mathbf{1}_E = \sum_{J \in \mathbf{J}} \mathbf{1}_{E\cap J}}$ and observe that either ${J \subseteq I_{+}}$ or ${J\subseteq I_{-}}$; in each case, ${\langle \mathbf{1}_{E\cap J}, h_I \rangle = (|E \cap J| / |J|) \langle \mathbf{1}_J, h_I \rangle}$ is identically verified because ${h_I}$ is constant on intervals ${I_{+}, I_{-}}$. Identity (3) is important for the following reason: ${g}$ has much smaller ${L^2}$ norm than ${\mathbf{1}_E}$ does! Indeed, we have

$\displaystyle \|g\|_{L^1} = \sum_{J \in \mathbf{J}} \delta_J |J| = |E|$

and at the same time

$\displaystyle \|g\|_{L^\infty} = \sup_{J \in \mathbf{J}} \delta_J \leq 2\beta |E|;$

logarithmic convexity of the Lebesgue norms gives therefore that

$\displaystyle \|g\|_{L^2} \leq (2\beta)^{1/2} |E|.$

This is to be compared to the fact that ${\|\mathbf{1}_E\|_{L^2} = |E|^{1/2}}$, which is in general much larger than ${O(|E|)}$ because ${|E|}$ is very small – our density-based decomposition has given us a certain notable gain, because now we can estimate

$\displaystyle \Big(\sum_{I \in \mathbf{I}'} |\langle \mathbf{1}_E, h_I \rangle|^2 \Big)^{1/2} \leq (2\beta)^{1/2} |E|, \ \ \ \ \ (4)$

a marked improvement over the trivial estimate with ${\|\mathbf{1}_E\|_{L^2}}$ at the right-hand side.
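Identity (3) and estimate (4) can be checked on a small example. The Python sketch below rebuilds ${\mathbf{J}}$ and ${\mathbf{I}'}$ by the stopping-time procedure, forms ${g}$, and compares Haar coefficients. The 0/1 list encoding of ${E}$, the helper names and the values ${\alpha = 1/2}$, ${\beta = 3}$ are assumptions made only for illustration.

```python
import math

def mass(values, start, size, n_cells):
    """Integral of the step function `values` over the cells [start, start + size)."""
    return sum(values[start:start + size]) / n_cells

def haar_coefficient(values, start, size, n_cells):
    """<values, h_I> for I = cells[start:start+size], with I_+ the left half."""
    half = size // 2
    diff = mass(values, start, half, n_cells) - mass(values, start + half, half, n_cells)
    return diff / math.sqrt(size / n_cells)

cells = [0] * 16
cells[0] = cells[5] = 1
n_cells = len(cells)
measure = sum(cells) / n_cells  # |E| = 1/8
alpha, beta = 0.5, 3.0

# Subdivide while the density is intermediate: `above` plays the role of I',
# `maximal` the role of J.
above, maximal, stack = [], [], [(0, n_cells)]
while stack:
    start, size = stack.pop()
    delta = sum(cells[start:start + size]) / size
    if alpha * measure < delta < beta * measure and size > 1:
        above.append((start, size))
        stack += [(start, size // 2), (start + size // 2, size // 2)]
    else:
        maximal.append((start, size))

# g = sum_J delta_J 1_J, stored by its values on the cells.
g = [0.0] * n_cells
for start, size in maximal:
    delta = sum(cells[start:start + size]) / size
    for i in range(start, start + size):
        g[i] = delta

coeffs_E = [haar_coefficient(cells, s, n, n_cells) for s, n in above]
coeffs_g = [haar_coefficient(g, s, n, n_cells) for s, n in above]
sum_squares = sum(c * c for c in coeffs_E)  # left-hand side of (4), squared
```

The two coefficient lists agree, and `sum_squares` stays below ${2\beta|E|^2}$, in line with (3) and (4).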

Armed with this favourable estimate, we proceed to bound ${\big\|\big(\sum_{I \in \mathbf{I}'} |\widetilde{f}_I|^2\big)^{1/2} \big\|_{L^1}}$. We have, using the facts collected so far,

$\displaystyle \begin{aligned} \Big\|\Big(\sum_{I \in \mathbf{I}'} |\widetilde{f}_I|^2\Big)^{1/2} \Big\|_{L^1} = & \Big\|\Big(\sum_{I \in \mathbf{I}'} \frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \mathbf{1}_I\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\|\Big(\sum_{I \in \mathbf{I}'} \frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \sum_{J \in \mathbf{J} : J \subset I}\mathbf{1}_J\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\|\Big(\sum_{J \in \mathbf{J}} \mathbf{1}_J \sum_{I \in \mathbf{I}' : I \supset J}\frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \Big)^{1/2} \Big\|_{L^1} \\ = & \Big\|\sum_{J \in \mathbf{J}} \mathbf{1}_J \Big[\sum_{I \in \mathbf{I}' : I \supset J}\frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \Big]^{1/2} \Big\|_{L^1} \\ = & \sum_{J \in \mathbf{J}} |J| \Big[\sum_{I \in \mathbf{I}' : I \supset J}\frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \Big]^{1/2}; \end{aligned}$

then we use Cauchy-Schwarz on the sum over ${\mathbf{J}}$, and since ${\sum_{J \in \mathbf{J}} |J| = 1}$ we obtain that the above is bounded by

$\displaystyle \begin{aligned} & \Big( \sum_{J \in \mathbf{J}} |J| \sum_{I \in \mathbf{I}' : I \supset J}\frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \Big)^{1/2} \\ & = \Big( \sum_{I \in \mathbf{I}'} \frac{|\langle \mathbf{1}_E, h_I \rangle|^2}{|I|} \sum_{J \in \mathbf{J} : J \subset I} |J| \Big)^{1/2} \\ & = \Big( \sum_{I \in \mathbf{I}'} |\langle \mathbf{1}_E, h_I \rangle|^2 \Big)^{1/2} \leq (2\beta)^{1/2}|E|. \end{aligned}$

On the face of it, this looks like a really good bound, since ${|E|}$ is of smaller order than the main term ${|E| (\log^{+} 1/|E|)^{1/2}}$. However, closing the induction is a very delicate business, and we need to be careful to prove that the final bound on ${\big\|\big(\sum_{I \in \mathcal{D}_N} |f_I|^2\big)^{1/2} \big\|_{L^1}}$ is exactly ${C|E|(\log^{+} 1/|E|)^{1/2}}$. We will see that the bound above, while good, is not good enough – the reason being that all other error terms have order strictly smaller than ${|E|}$, and therefore we cannot make their total contribution be negative. In the end, we will have to change the way we choose the functions for intervals in ${\mathbf{I}'}$.

Before doing that, we estimate the errors arising from other terms. Recall that we have to estimate ${\big\|\big(\sum_{J \in \mathbf{J}}\sum_{I \subseteq J} |f_I|^2\big)^{1/2} \big\|_{L^1}}$, and that by inductive hypothesis this is bounded by

$\displaystyle C \sum_{J \in \mathbf{J}} |E \cap J| \Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2}$

(this is (1)); we split the sum into $\sum_{J \in \mathbf{J}_{\mathrm{dense}}} + \sum_{J \in \mathbf{J}_{\mathrm{thin}}}$. For ${\mathbf{J}_{\mathrm{dense}}}$ we have

$\displaystyle \begin{aligned} \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J| & \Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} \\ = & \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2} \\ & + \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J| \Big[\Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} - \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}\Big] \\ =: & \; \mathscr{M}_1 + \mathscr{E}_1, \end{aligned}$

where the $\mathscr{M}$ stands for Main (term) and the $\mathscr{E}$ stands for Error (term). Observe that the second term in the RHS (the error $\mathscr{E}_1$) is negative, because ${1/\delta_J \leq 1/(\beta|E|) < 1/|E|}$ since ${J \in \mathbf{J}_{\mathrm{dense}}}$. We keep the first term in the RHS (the main term) as it is, and for the error term ($\mathscr{E}_1$) we write, using the ${(a-b)(a+b) = a^2 - b^2}$ identity,

$\displaystyle \mathscr{E}_1 = - \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J| \frac{\log^{+} \frac{1}{|E|} - \log^{+} \frac{1}{\delta_J}}{\Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} + \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}},$

where each term in the sum is now positive. We want to estimate each term of the sum from below, because of the minus sign, to get a negative upper bound for the error. Observe then that since ${1 / \delta_J \leq 1/(\beta|E|) < 1/|E|}$ we have

$\displaystyle \frac{\log^{+} \frac{1}{|E|} - \log^{+} \frac{1}{\delta_J}}{\Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} + \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}} \geq \frac{\log\frac{2 + 1/|E|}{2 + 1/\delta_J}}{2\Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}};$

since ${1/\delta_J \leq 1/(\beta |E|)}$ and we can assume ${2\beta|E| < 1}$, we also have after some algebra

$\displaystyle \frac{2 + 1/|E|}{2 + 1/\delta_J} \geq \frac{2 + 1/|E|}{2 + 1/(\beta |E|)} \geq \frac{\beta + 1}{2},$

and this is strictly larger than 1 because ${\beta > 1}$.
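For completeness, the algebra behind the second inequality can be spelled out as follows (a routine verification that uses only ${\beta > 1}$ and ${2\beta|E| < 1}$): multiplying numerator and denominator by ${\beta|E|}$ we get

$\displaystyle \frac{2 + 1/|E|}{2 + 1/(\beta |E|)} = \frac{2\beta|E| + \beta}{2\beta|E| + 1} = 1 + \frac{\beta - 1}{2\beta|E| + 1} \geq 1 + \frac{\beta - 1}{2} = \frac{\beta + 1}{2},$

where the inequality holds because ${\beta - 1 > 0}$ and ${2\beta|E| + 1 < 2}$.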
Overall we have shown that

$\displaystyle \mathscr{E}_1 \leq - C_\beta \Big(\log^{+} \frac{1}{|E|} \Big)^{- 1/2} \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J|,$

with ${C_\beta = (1/2)\log((\beta + 1)/2) > 0}$, and we should observe that we have a lowerbound for ${ \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J|}$ as well: indeed,

$\displaystyle \begin{aligned} \sum_{J \in \mathbf{J}_{\mathrm{dense}}} |E \cap J| =& |E| - \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \\ =& |E| - \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |J|\delta_J \\ \geq & |E| - \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |J| \alpha |E| \\ \geq & (1 - \alpha)|E|. \end{aligned}$

Hence we have shown the error estimate

$\displaystyle \mathscr{E}_1 \leq - C_\beta (1 - \alpha) \frac{|E|}{\Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}. \ \ \ \ \ (5)$
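The per-interval gain behind (5) can be probed numerically. The sketch below checks, on a grid of densities ${\delta \in [\beta|E|, 2\beta|E|]}$, that ${(\log^{+} 1/\delta)^{1/2} - (\log^{+} 1/|E|)^{1/2} \leq -C_\beta (\log^{+} 1/|E|)^{-1/2}}$. The sample values of ${\beta}$ and ${|E|}$ and the function names are assumptions chosen for illustration; the check is not a substitute for the proof.

```python
import math

def log_plus(t):
    """log^+ t := log(2 + t), as defined at the start of the proof."""
    return math.log(2.0 + t)

def dense_gain(delta, measure):
    """(log^+ 1/delta)^{1/2} - (log^+ 1/|E|)^{1/2}, the bracket appearing in E_1."""
    return math.sqrt(log_plus(1.0 / delta)) - math.sqrt(log_plus(1.0 / measure))

def claimed_bound(beta, measure):
    """-C_beta (log^+ 1/|E|)^{-1/2} with C_beta = (1/2) log((beta + 1)/2)."""
    c_beta = 0.5 * math.log((beta + 1.0) / 2.0)
    return -c_beta / math.sqrt(log_plus(1.0 / measure))

beta = 4.0
checks = []
for measure in (1e-1, 1e-2, 1e-4):  # each satisfies 2 * beta * |E| < 1
    for step in range(11):
        delta = beta * measure * (1.0 + step / 10.0)  # beta|E| <= delta <= 2 beta|E|
        checks.append(dense_gain(delta, measure) <= claimed_bound(beta, measure))
```

Every entry of `checks` is expected to be `True` under these assumptions.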

Next we estimate the contribution of ${\mathbf{J}_{\mathrm{thin}}}$ analogously. We have

$\displaystyle \begin{aligned} \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| & \Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} \\ = & \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}\\ & + \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \Big[\Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} - \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}\Big] \\ =: & \; \mathscr{M}_2 + \mathscr{E}_2 ; \end{aligned}$

now things are a bit different because the second term in the RHS is now positive, since ${1 / \delta_J \geq 1/(\alpha|E|) > 1 / |E|}$ (intervals with ${\delta_J = 0}$ contribute nothing, since then ${|E \cap J| = 0}$). We will therefore look for upper bounds for each term in the error. As before, we have

$\displaystyle \begin{aligned} \mathscr{E}_2 = & \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \frac{ \log^{+} \frac{1}{\delta_J} - \log^{+} \frac{1}{|E|}}{\Big(\log^{+} \frac{1}{\delta_J} \Big)^{1/2} + \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}} \\ \leq & \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \frac{ \log \frac{2 + 1/\delta_J}{2 + 1/|E|}}{\Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}, \end{aligned}$

and since ${1 / \delta_J \geq 1/(\alpha|E|) > 1 / |E| > 2}$, we have

$\displaystyle \frac{2 + 1/\delta_J}{2 + 1/|E|} \leq \frac{|E|}{\delta_{J}}.$

Splitting the collection ${\mathbf{J}_{\mathrm{thin}}}$ dyadically in ${\delta_J / |E|}$ we are then able to bound

$\displaystyle \begin{aligned} \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |E \cap J| \log \frac{|E|}{\delta_{J}} = & \sum_{J \in \mathbf{J}_{\mathrm{thin}}} |J| \delta_J \log \frac{|E|}{\delta_{J}} \\ = & \sum_{k : 2^{-k}\leq \alpha} \sum_{\substack{J \in \mathbf{J}_{\mathrm{thin}} : \\ \delta_J \sim 2^{-k}|E|}} |J| \delta_J \log \frac{|E|}{\delta_{J}} \\ \lesssim & \sum_{k : 2^{-k}\leq \alpha} \sum_{\substack{J \in \mathbf{J}_{\mathrm{thin}} : \\ \delta_J \sim 2^{-k}|E|}} |J| 2^{-k}|E| k \\ = & |E| \sum_{k : 2^{-k}\leq \alpha} 2^{-k}k \sum_{\substack{J \in \mathbf{J}_{\mathrm{thin}} : \\ \delta_J \sim 2^{-k}|E|}} |J| \\ \leq & |E| \sum_{k : 2^{-k}\leq \alpha} 2^{-k}k \\ \lesssim & |E| \alpha^{1/2}. \end{aligned}$
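The last step uses that the tail ${\sum_{k : 2^{-k} \leq \alpha} k 2^{-k}}$ is of size roughly ${\alpha \log(1/\alpha)}$, hence ${\lesssim \alpha^{1/2}}$. A quick numerical look at this, restricted for simplicity to ${\alpha = 2^{-m}}$ (an assumption of this sketch, as are the function name and the truncation length):

```python
def tail_sum(m, terms=200):
    """sum_{k >= m} k 2^{-k}, truncated after `terms` terms (the remainder is negligible)."""
    return sum(k * 2.0 ** (-k) for k in range(m, m + terms))

ratios = []
for m in range(0, 30):
    alpha = 2.0 ** (-m)
    ratios.append(tail_sum(m) / alpha ** 0.5)  # tail divided by alpha^{1/2}
```

The ratios stay bounded over this range and decay for large ${m}$, consistent with the tail being ${O(\alpha^{1/2})}$ with a numerical constant.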

Overall we have therefore shown that for the error arising from ${\mathbf{J}_{\mathrm{thin}}}$ we have the error estimate

$\displaystyle \mathscr{E}_2 \leq C_0 \alpha^{1/2} \frac{|E|}{\Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}, \ \ \ \ \ (6)$

where ${C_0}$ is a numerical constant. Notice that if ${\alpha}$ is sufficiently small and ${\beta}$ sufficiently large we can have ${C_0 \alpha^{1/2} \ll C_{\beta}(1-\alpha)}$.

Summarising this discussion, we have for the main terms

$\displaystyle C [\mathscr{M}_1 + \mathscr{M}_2] = C |E| \Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}, \ \ \ \ \ \ (7)$

which is precisely the term we can afford to close the induction, and for the error terms we have by (5) and (6)

$\displaystyle C[\mathscr{E}_1 + \mathscr{E}_2] \leq \Big[ -C C_\beta (1-\alpha) + C C_0 \alpha^{1/2} \Big] \frac{|E|}{\Big(\log^{+} \frac{1}{|E|} \Big)^{1/2}}, \ \ \ \ \ (8)$

an overall negative quantity (provided ${\beta}$ is large and ${\alpha}$ is small). As you can see, the magnitude of the error terms is ${O(|E| (\log^{+}1/|E|)^{-1/2})}$, which is appreciably smaller than the ${O(|E|)}$ we got for the contribution of the tentative ${\widetilde{f}_I}$'s for ${I \in \mathbf{I}'}$. In order to close the induction, we will need to choose functions ${f_I}$ for ${I \in \mathbf{I}'}$ so that their contribution to the square function’s ${L^1}$ mass is of magnitude ${O(|E| (\log^{+}1/|E|)^{-1/2})}$ as well.

Now that we know what we are aiming for, let us go back to the choice of functions ${f_I}$ for ${I \in \mathbf{I}'}$ and make a more sophisticated one than ${\widetilde{f}_I}$. We will proceed driven by simplicity considerations.
In order for properties 1.,2. of the Theorem to be satisfied, we can always choose ${f_I}$ of the form

$\displaystyle |I|^{1/2} |\langle \mathbf{1}_E, h_I \rangle|\sum_{J \in \mathbf{J} : J \subset I} c_{I,J} \, g_{I,J},$

where the ${c_{I,J}}$ are positive coefficients and the ${g_{I,J}}$ are positive functions supported in ${J}$, and where we impose that ${\|g_{I,J}\|_{L^1} = 1}$ and that ${\sum_{J \subset I} c_{I,J}=1}$. However, in order to reduce the complexity of the task, we can limit our choices to ${g_{I,J} = g_J}$, that is, functions that are independent of the specific interval ${I \in \mathbf{I}'}$ and depend on the interval ${J \in \mathbf{J}}$ only. Thus we look for functions of the form

$\displaystyle f_I = |I|^{1/2} |\langle \mathbf{1}_E, h_I \rangle|\sum_{J \in \mathbf{J} : J \subset I} c_{I,J} \, g_{J},$

where the functions ${g_J}$ are to be determined (subject to the assumptions above, that ${g_J}$ is supported in ${J}$ and that ${\|g_J\|_{L^1} = 1}$). Regarding the weights ${c_{I,J}}$, there are two obvious choices here: we could choose ${c_{I,J} = |J|/|I|}$ or ${c_{I,J} = |E\cap J|/|E \cap I|}$; in both cases, we would have automatically ${\sum_{J \subset I} c_{I,J} = 1}$.
Let us write for shortness ${a_I := |\langle \mathbf{1}_E, h_I \rangle|}$ and let

$\displaystyle \mathbf{I}_0 := \{I \in \mathcal{D}_N \text{ s.t. } I \subseteq J \text{ for some } J \in \mathbf{J}\}$

(notice that ${\mathbf{I}_{0} = \mathcal{D}_N \backslash \mathbf{I}'}$), and proceed to estimate the square function. The choices for ${c_{I,J}, g_J}$ will arise along the way. We have, using the support properties (recall also the definition of ${F_J}$)

$\displaystyle \begin{aligned} \Big\| & \Big(\sum_{I \in \mathcal{D}_N} |f_I|^2\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \Big(\sum_{I \in \mathbf{I}_0} |f_I|^2 + \sum_{I \in \mathbf{I}'} |f_I|^2\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \Big(\sum_{J \in \mathbf{J}}\sum_{I \subseteq J} |f_I|^2 + \sum_{I \in \mathbf{I}'} \Big||I|^{1/2} a_I \sum_{J \in \mathbf{J} : J \subset I} c_{I,J} \, g_{J}\Big|^2\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \Big(\sum_{J \in \mathbf{J}}F_J^2 + \sum_{I \in \mathbf{I}'} |I| a_I^2 \sum_{J \in \mathbf{J} : J \subset I} c_{I,J}^2 g_{J}^2 \Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \Big(\sum_{J \in \mathbf{J}} \Big(F_J^2 + g_{J}^2 \Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big] \Big)\Big)^{1/2} \Big\|_{L^1}. \end{aligned}$

Observe that the expression in square brackets is simply a numerical coefficient. It would therefore be very convenient at this point to simply choose ${g_J := F_J / \|F_J\|_{L^1}}$ in order to collect an ${F_J}$ factor from the sum, so we make this choice and proceed (always aided by the disjointness of the intervals $J$):

$\displaystyle \begin{aligned} \Big\| \Big( & \sum_{J \in \mathbf{J}} \Big(F_J^2 + F_J^2 \|F_J\|_{L^1}^{-2}\Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big] \Big)\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \Big( \sum_{J \in \mathbf{J}} F_J^2 \Big(1 + \|F_J\|_{L^1}^{-2}\Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big] \Big)\Big)^{1/2} \Big\|_{L^1} \\ = & \Big\| \sum_{J \in \mathbf{J}} F_J \Big(1 + \|F_J\|_{L^1}^{-2}\Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big]\Big)^{1/2} \Big\|_{L^1}; \end{aligned}$

we see therefore that the above is equal to

$\displaystyle \begin{aligned} \sum_{J \in \mathbf{J}} & \|F_J\|_{L^1(J)} \Big(1 + \|F_J\|_{L^1}^{-2}\Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big]\Big)^{1/2} \\ & = \, \sum_{J \in \mathbf{J}} \Big(\|F_J\|_{L^1}^2 + \Big[\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big]\Big)^{1/2}. \ \ \ \ \ \ (9) \end{aligned}$

At this point, to deal with the square root, we could proceed by using the trivial inequality ${(a^2 + b^2)^{1/2} \leq a + b}$. We will do so for pedagogical reasons, but we warn the reader in advance that this will not be good enough for us. We will show how much we fail by proceeding this way, and then we will backtrack and use a better inequality.
Using said inequality, we bound the quantity (9) above by

$\displaystyle \sum_{J \in \mathbf{J}} \|F_J\|_{L^1} + \sum_{J \in \mathbf{J}} \Big(\sum_{I \in \mathbf{I}'\,:\, I \supset J} |I| a_I^2 c_{I,J}^2\Big)^{1/2}.$

The first term is bounded by inductive hypothesis by ${C \sum_{J \in \mathbf{J}} |E\cap J| (\log^{+} 1/\delta_J)^{1/2}}$, which we have already estimated: we have a main contribution that is exactly bounded by ${C|E|(\log^{+} 1/|E|)^{1/2}}$ (this is (7)) and a negative error bounded by (8), which is of size ${O(|E|(\log^{+} 1/|E|)^{-1/2})}$. For the second term above, we multiply and divide each term by ${|J|^{1/2}}$ and use Cauchy-Schwarz in ${\mathbf{J}}$ (and the fact that ${\sum_{J\in \mathbf{J}} |J| = 1}$) to bound it by

$\displaystyle \begin{aligned} \Big( & \sum_{J \in \mathbf{J}} \sum_{I \in \mathbf{I}'\,:\, I \supset J} \frac{|I|}{|J|} a_I^2 c_{I,J}^2\Big)^{1/2} \\ & = \Big( \sum_{I \in \mathbf{I}'} a_I^2 \sum_{J \in \mathbf{J} \,:\, J \subset I} \frac{|I|}{|J|} c_{I,J}^2\Big)^{1/2}. \end{aligned}$

If we choose ${c_{I,J} = |J|/|I|}$ then we have simply

$\displaystyle \sum_{J \in \mathbf{J} \,:\, J \subset I} \frac{|I|}{|J|} c_{I,J}^2 = \sum_{J \in \mathbf{J} \,:\, J \subset I} \frac{|J|}{|I|} = 1,$

and therefore the second term is bounded thanks to (4) by

$\displaystyle \Big( \sum_{I \in \mathbf{I'}} a_I^2 \Big)^{1/2} \leq (2\beta)^{1/2} |E|.$

As discussed before, this contribution is too large to be neutralised by the error terms $\mathscr{E}_1 + \mathscr{E}_2$, so that’s no good to us. We can try using ${c_{I,J} = |E\cap J|/|E\cap I|}$ instead. Proceeding in the same fashion, we multiply and divide each term by ${|E\cap J|^{1/2}}$ and use Cauchy-Schwarz in ${\mathbf{J}}$, so that the second term is bounded by

$\displaystyle \begin{aligned} \Big(& \sum_{J \in \mathbf{J}}|E \cap J| \Big)^{1/2} \Big(\sum_{J \in \mathbf{J}} \sum_{I \in \mathbf{I}'\,:\, I \supset J} \frac{|I|}{|E\cap J|} \frac{|E\cap J|^2}{|E\cap I|^2} a_I^2 \Big)^{1/2} \\ & = |E|^{1/2} \Big(\sum_{I \in \mathbf{I}'} a_I^2 \sum_{J \in \mathbf{J} \,:\, J \subset I} \frac{|I|}{|E\cap I|} \frac{|E\cap J|}{|E\cap I|} \Big)^{1/2} \\ & = |E|^{1/2} \Big(\sum_{I \in \mathbf{I}'} a_I^2 \frac{1}{\delta_I} \Big)^{1/2}; \end{aligned}$

since ${I \in \mathbf{I}'}$ is of intermediate density, we can say that ${1/\delta_I \leq 1/(\alpha |E|)}$, and therefore using (4) again we bound the above by

$\displaystyle |E|^{1/2} \Big(\frac{1}{\alpha|E|} \sum_{I \in \mathbf{I}'} a_I^2 \Big)^{1/2} \leq \Big(\frac{2\beta}{\alpha}\Big)^{1/2}|E|,$

which is again of the wrong size ${O(|E|)}$.
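
The two inner sums over ${J \subset I}$ computed above can be sanity-checked on a toy configuration. The following is only a sketch: the interval lengths and the measures of ${E \cap J}$ are made-up values (not taken from the argument), and exact rational arithmetic is used so that the identities hold with no rounding.

```python
from fractions import Fraction as Fr

# Hypothetical toy data (not from the post): I = [0, 1) is split into dyadic
# intervals J of these lengths, and E meets each J in a set of the given measure.
I_len = Fr(1)
J_len = [Fr(1, 2), Fr(1, 4), Fr(1, 8), Fr(1, 8)]
E_in_J = [Fr(1, 16), Fr(1, 8), Fr(1, 32), Fr(1, 16)]

E_in_I = sum(E_in_J)          # |E ∩ I|, since the J's partition I
delta_I = E_in_I / I_len      # density of E in I

# Choice c_{I,J} = |J| / |I|: inner sum of (|I| / |J|) * c^2 over J.
lebesgue_sum = sum((I_len / j) * (j / I_len) ** 2 for j in J_len)

# Choice c_{I,J} = |E ∩ J| / |E ∩ I|: inner sum of (|I| / |E ∩ J|) * c^2 over J.
density_sum = sum((I_len / e) * (e / E_in_I) ** 2 for e in E_in_J)

print(lebesgue_sum, density_sum, 1 / delta_I)
```

The first sum equals 1 and the second equals ${1/\delta_I}$, whatever the toy data, as long as the ${J}$'s partition ${I}$ and each ${|E \cap J|}$ is positive.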

The issue we are having can be described as follows: the estimate ${(a^2 + b^2)^{1/2} \leq a + b}$ that we are using is not efficient in our regime. A better estimate in this situation would be the only-slightly-less-trivial estimate

$\displaystyle (a^2 + b^2)^{1/2} \leq a + \frac{b^2}{2a},$

which is better than the previous one as soon as ${b < 2a}$, and much better when ${b \ll a}$.
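
To see why the refined estimate holds, note that squaring its right-hand side gives ${a^2 + b^2 + b^4/(4a^2) \geq a^2 + b^2}$. The short script below is a numerical illustration only; the function names and the sample pairs ${(a,b)}$ are chosen here for the example.

```python
import math

def crude_bound(a: float, b: float) -> float:
    """The bound a + b for (a^2 + b^2)^(1/2)."""
    return a + b

def refined_bound(a: float, b: float) -> float:
    """The bound a + b^2 / (2a); squaring it gives a^2 + b^2 + b^4 / (4 a^2)."""
    return a + b * b / (2.0 * a)

# Hypothetical sample values, ranging from b much smaller than a to b > 2a.
samples = [(1.0, 0.01), (1.0, 0.5), (1.0, 2.0), (1.0, 5.0)]

for a, b in samples:
    exact = math.hypot(a, b)
    print(f"a={a}, b={b}: exact={exact:.6f}, "
          f"refined={refined_bound(a, b):.6f}, crude={crude_bound(a, b):.6f}")
```

Both bounds are always valid; the refined one is the smaller of the two exactly when ${b \leq 2a}$, and the gain is largest when ${b}$ is tiny compared to ${a}$.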

Remark 3
For a suggestive calculation, assuming that every term in (9) contributed the same on average, we would have for the magnitudes

\displaystyle \begin{aligned} \Big((|E|(\log^{+} 1/|E|)^{1/2})^2 + |E|^2 \Big)^{1/2} & \leq |E|(\log^{+} 1/|E|)^{1/2} + \frac{|E|^2}{|E|(\log^{+} 1/|E|)^{1/2}} \\ & = |E|(\log^{+} 1/|E|)^{1/2} + \frac{|E|}{(\log^{+} 1/|E|)^{1/2}}, \end{aligned}

that is, the expected contribution of the second term would be of the correct order! Of course this is not rigorous, but it is an indication that we are on the right track.

Using the slightly-less-trivial estimate on (9) then, we get the same first term as before, ${\sum_{J \in \mathbf{J}} \|F_J\|_{L^1} \leq C \sum_{J\in\mathbf{J}}|E\cap J| (\log^{+} 1/\delta_J)^{1/2}}$, which we have already efficiently estimated, and we get for the second term

$\displaystyle \sum_{J \in \mathbf{J}} \frac{\sum_{I \in \mathbf{I}' \,:\, I \supset J} a_I^2 |I| c_{I,J}^2}{2C |E\cap J| (\log^{+}1/\delta_J)^{1/2}}.$

The presence of ${|E\cap J|}$ in the denominator of the above expression strongly suggests choosing ${c_{I,J} := |E\cap J| / |E\cap I|}$, which is what we do. The expression then becomes, after reversing the order of summation,

$\displaystyle \frac{1}{2C} \sum_{I \in \mathbf{I}'} a_I^2 \sum_{J \in \mathbf{J}\,:\, J \subset I} \frac{|I|}{|E\cap I|} \frac{|E\cap J|}{|E \cap I|} \frac{1}{(\log^{+}1/\delta_J)^{1/2}}; \ \ \ \ \ (10)$

since ${\delta_J \leq 2\beta |E|}$, we have that

$\displaystyle \frac{1}{(\log^{+}1/\delta_J)^{1/2}} \leq \frac{1}{(\log^{+}1/(2\beta |E|))^{1/2}},$

and this is easily seen to be controlled by a constant multiple of ${(\log^{+}1/|E|)^{-1/2}}$ provided ${|E|}$ is small: indeed,

\displaystyle \begin{aligned} \log\Big(2 + \frac{1}{2\beta |E|}\Big) = & \log\Big(\frac{1}{2\beta} \Big(2(2\beta - 1) + 2 + \frac{1}{|E|}\Big)\Big) \\ = & - \log(2\beta) + \log\Big(2(2\beta - 1) + 2 + \frac{1}{|E|}\Big) \\ \geq & - \log(2\beta) + \log\Big(2 + \frac{1}{|E|}\Big) \end{aligned}

because ${\beta > 1}$, and moreover we can ensure that

$\displaystyle \log(2\beta) \leq \frac{1}{2} \log\Big(2 + \frac{1}{|E|}\Big)$

by imposing ${|E| \leq 1/(4\beta^2 - 2)}$, which we clearly can do.
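
The chain of inequalities just described, namely ${\log^{+}(1/(2\beta|E|)) \geq \log^{+}(1/|E|) - \log(2\beta) \geq \frac{1}{2}\log^{+}(1/|E|)}$, can be checked numerically. This is only a sketch: the values of ${\beta}$ and the scaling factors below are arbitrary choices made for illustration, and ${|E|}$ is taken strictly below the threshold ${1/(4\beta^2 - 2)}$ to stay clear of rounding issues at equality.

```python
import math

def log_plus(t: float) -> float:
    """log^+ t := log(2 + t), the convention used in this post."""
    return math.log(2.0 + t)

def chain(beta: float, size_E: float):
    """Return the three quantities compared in the argument; they are in
    decreasing order when beta > 1 and size_E <= 1 / (4 beta^2 - 2)."""
    lhs = log_plus(1.0 / (2.0 * beta * size_E))
    middle = log_plus(1.0 / size_E) - math.log(2.0 * beta)
    rhs = 0.5 * log_plus(1.0 / size_E)
    return lhs, middle, rhs

# Hypothetical sample values of beta; sizes are strictly below the threshold.
results = []
for beta in (1.5, 4.0, 32.0):
    threshold = 1.0 / (4.0 * beta ** 2 - 2.0)
    for factor in (0.9, 0.1, 1e-4):
        lhs, middle, rhs = chain(beta, factor * threshold)
        results.append(lhs >= middle >= rhs)
        print(beta, factor, lhs >= middle >= rhs)
```
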
Combining the last two inequalities gives ${\log^{+}(1/(2\beta|E|)) \geq \frac{1}{2} \log^{+}(1/|E|)}$, that is, ${(\log^{+}1/(2\beta|E|))^{-1/2} \leq 2^{1/2} (\log^{+}1/|E|)^{-1/2}}$. Therefore, (10) is bounded by

\displaystyle \begin{aligned} \frac{1}{2^{1/2}C} \sum_{I \in \mathbf{I}'} a_I^2 & \sum_{J \in \mathbf{J}\,:\, J \subset I} \frac{|I|}{|E\cap I|} \frac{|E\cap J|}{|E \cap I|} \frac{1}{(\log^{+}1/|E|)^{1/2}} \\ = & \frac{1}{2^{1/2}C(\log^{+}1/|E|)^{1/2}} \sum_{I \in \mathbf{I}'} a_I^2 \frac{|I|}{|E\cap I|} \sum_{J \in \mathbf{J}\,:\, J \subset I} \frac{|E\cap J|}{|E \cap I|} \\ = & \frac{1}{2^{1/2}C(\log^{+}1/|E|)^{1/2}} \sum_{I \in \mathbf{I}'} a_I^2 \frac{1}{\delta_I} \\ \leq & \frac{1}{2^{1/2}C(\log^{+}1/|E|)^{1/2}} \frac{1}{\alpha |E|} \sum_{I \in \mathbf{I}'} a_I^2, \end{aligned}

the last line by the fact that ${I \in \mathbf{I}'}$ is of intermediate density; thus, by (4) again, we have a bound of

$\displaystyle \frac{2^{1/2}\beta}{C\alpha}\frac{|E|}{(\log^{+}1/|E|)^{1/2}},$

which finally matches the magnitude of the other errors! Adding this contribution to the error terms in (8) we see that the total error is

$\displaystyle \Big[ -C C_\beta (1-\alpha) + C C_0 \alpha^{1/2} + \frac{2^{1/2}\beta}{C \alpha} \Big] \frac{|E|}{\Big(\log^{+} \frac{1}{|E|}\Big)^{1/2}},$

and, since the main term is exactly what we wanted, to close the induction we only need to make sure that the constant in square brackets is non-positive. Dividing by ${C}$, this amounts to

$\displaystyle -C_\beta (1-\alpha) + C_0 \alpha^{1/2} + \frac{2^{1/2}\beta}{C^2 \alpha} \leq 0;$

choosing first ${\alpha}$ small, then ${\beta}$ large (recall that ${C_\beta = (1/2) \log((\beta+1)/2)}$) and finally ${C}$ very large (depending on ${\alpha}$ and ${\beta}$), we see that the inequality can certainly be satisfied. Therefore the induction closes and we are done for every set ${E}$ with ${|E|}$ sufficiently small (depending on ${\beta}$, which in turn depends on ${\alpha}$); and for the rest, we just appeal to Remark 2.
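
The order in which the constants are chosen can be illustrated with a small script. This is a sketch under stated assumptions: the values of `C0` and `kappa` are placeholders (the absolute constant ${C_0}$ is not computed here, and `kappa` stands for whatever numerical factor multiplies ${\beta/(C^2\alpha)}$ in the last term), and the function names are introduced only for this example.

```python
import math

def c_beta(beta: float) -> float:
    """C_beta = (1/2) log((beta + 1) / 2), as recalled in the text."""
    return 0.5 * math.log((beta + 1.0) / 2.0)

def bracket_over_C(alpha, beta, C, C0, kappa):
    """-C_beta (1 - alpha) + C0 alpha^(1/2) + kappa beta / (C^2 alpha)."""
    return (-c_beta(beta) * (1.0 - alpha)
            + C0 * math.sqrt(alpha)
            + kappa * beta / (C * C * alpha))

def choose_constants(alpha, C0, kappa):
    """Given alpha in (0, 1), pick beta and then C so that the bracket is negative."""
    beta = 2.0
    # Double beta until the first two terms alone are strictly negative;
    # this terminates because C_beta grows without bound as beta grows.
    while -c_beta(beta) * (1.0 - alpha) + C0 * math.sqrt(alpha) >= 0.0:
        beta *= 2.0
    slack = c_beta(beta) * (1.0 - alpha) - C0 * math.sqrt(alpha)
    # Choose C so that the last term equals half of the available slack.
    C = math.sqrt(2.0 * kappa * beta / (alpha * slack))
    return beta, C

# Hypothetical values: C0 and kappa stand in for absolute constants.
alpha, C0, kappa = 0.01, 1.0, 1.0
beta, C = choose_constants(alpha, C0, kappa)
print(beta, C, bracket_over_C(alpha, beta, C, C0, kappa))
```

The script only shows that some admissible triple ${(\alpha, \beta, C)}$ exists for the placeholder constants; it makes no attempt to optimise them.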

The proof of the Tao-Wright lemma is thus fully concluded!

Footnotes:
1: One should realise at this point that the construction of ${\mathbf{J}}$ is essentially a stopping-time argument.
2: Observe that this choice would cause ${f_I = |I|^{-1/2}|\langle \mathbf{1}_E, h_I \rangle| \mathbf{1}_I}$ for all ${I \in \mathcal{D}_N}$, by the recursive nature of the construction, and it was already observed in the paper that this would not work.