Some thoughts on the smoothing effect of convolution on measures

A question by Ben Krause, whom I met here at the Hausdorff Institute, made me think back to one of the earliest posts of this blog. The question is essentially how to make sense of the fact that the (perhaps iterated) convolution of a (singular) measure with itself is in general smoother than the measure you started with, in a variety of settings. It’s interesting to me because in this phase of my PhD experience I’m constantly trying to build up a good intuition and learn how to use heuristics effectively.

So, let’s take a measure ${\mu}$ on ${\mathbb{R}^d}$ with compact support (assume inside the unit ball wlog). We ask what we can say about ${\mu \ast \mu}$, or higher iterates ${\mu \ast \mu \ast \mu \ast \ldots}$ and more often${}^1$ about ${\mu \ast \tilde{\mu}}$. In particular, we’re interested in the case where ${\mu}$ is singular, i.e. its support has zero Lebesgue measure.

Before starting though, I would like to give a little motivation as to why such convolutions are interesting. Consider the model case where you have an operator defined by ${Tf = \sum_{j\in\mathbb{Z}}{T_j f}:= \sum_{j \in \mathbb{Z}}f\ast \mu_j}$ where ${\mu_j}$ are some singular measures and ${f \in L^2}$. One asks whether the operator ${T}$ is bounded on ${L^2}$, and the natural tool to use is the Cotlar-Stein lemma, or almost-orthogonality (from which this blog takes its name). Then we need to verify that

$\displaystyle \sup_{j}{\sum_{k}{\|T_j T_k^\ast\|^{1/2}}} < \infty,$

and same for ${T^\ast_j T_k}$. But what is ${T_j T^\ast_k f}$? It’s simply

$\displaystyle T_j T^\ast_k f = f \ast d\tilde{\mu}_k \ast d\mu_j = f \ast (d\tilde{\mu}_k \ast d\mu_j),$

i.e. another convolution operator. Estimates on the convolution${}^2$ ${d\tilde{\mu}_k \ast d\mu_j}$ are then likely to help estimate the norm of ${T_j T_k^\ast}$. But if this measure is not smooth enough, one can go further, and since

$\displaystyle \|T T^\ast \| \leq \|T\|^{1/2} \|T^\ast T T^\ast\|^{1/2}$

one sees that estimates on ${d\tilde{\mu}_k \ast d\mu_j \ast d\tilde{\mu}_k }$ are likely to help, and so on, until a sufficient number of iterations gives a sufficiently smooth measure. This isn’t quite the iteration of a measure with itself, but in many cases one has an operator ${Tf = f \ast \mu}$ which then splits into the above sum by a spatial or frequency cutoff at dyadic scales. Then it becomes a matter of rescaling and the case ${d\tilde{\mu}_k \ast d\mu_j}$ can be reduced to that of ${d\tilde{\mu}_0 \ast d\mu_j}$ and further reduced to that of ${d\tilde{\mu}_0 \ast d\mu_0}$ by exploiting an iterate of the above norm inequality, namely that

$\displaystyle \|T_j T_0^\ast\|\leq \|T_j^\ast\|^{1-2^{-\ell}}\|T_j (T_0^\ast T_0)^{2^{\ell-1}}\|^{2^{-\ell}}.$
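
As a sanity check on the one-step norm inequality used above, here is a short derivation sketch. It only assumes the identity ${\|A\|^2 = \|A A^\ast\|}$ for bounded operators on a Hilbert space, together with the fact that ${TT^\ast}$ is self-adjoint:

$\displaystyle \|TT^\ast\|^2 = \|(TT^\ast)(TT^\ast)^\ast\| = \|T\,(T^\ast T T^\ast)\| \leq \|T\|\,\|T^\ast T T^\ast\|,$

and taking square roots gives ${\|TT^\ast\| \leq \|T\|^{1/2}\|T^\ast T T^\ast\|^{1/2}}$. Repeating the same step on the last factor produces the iterated versions.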

Another possibility is to write ${d\nu = d\tilde{\mu}_k \ast d\mu_j}$ and consider working with ${d\nu \ast d\nu}$ instead, to obtain results in terms of ${|j-k|}$. I will say more at the end; here I just wanted to show that these convolutions arise as natural objects.

1. Convolution of measures as push-forward of product measures

A first, almost trivial observation is that if ${\widehat{\mu}}$ enjoys some decay at ${\infty}$ (e.g. because of curvature of the support), say ${|\widehat{\mu}(\xi)|\lesssim (1+|\xi|)^{-\delta}}$, then the convolution does even better since ${|\widehat{\mu \ast \mu}(\xi)| = |\widehat{\mu}(\xi) \widehat{\mu}(\xi)| \lesssim (1+|\xi|)^{-2\delta}}$.

Now before proceeding I’ll recall the technical result in the previous post, because it can be used to gain information on the convolution:

Proposition 1 Let ${\Phi\,:\, \mathbb{R}^{d+k} \rightarrow \mathbb{R}^d}$, with ${k\geq 0}$, be a smooth function whose Jacobian has rank ${d}$ a.e., and let ${\psi\,:\, \mathbb{R}^{d+k} \rightarrow \mathbb{R}}$ be a function with enough continuous derivatives (${C^1}$ should suffice) and supported in the unit ball ${B_{d+k}}$ (we’re not assuming positivity here). We have a measure ${d\mu = \psi \,dx}$ on ${B_{d+k}}$.

If the regularity assumption below is satisfied, then the measure ${d\nu:=\Phi_\ast (d\mu)}$ on ${\mathbb{R}^d}$ (i.e. given by ${\int_{\mathbb{R}^d}{f}\,d\nu = \int_{B_{d+k}}{f(\Phi(x))}\,d\mu(x)}$) is absolutely continuous w.r.t. Lebesgue measure in ${\mathbb{R}^d}$ and ${L^1}$-Hölder continuous for any exponent ${\delta < \frac{1}{2|\alpha|}}$, i.e.

$\displaystyle \int_{\mathbb{R}^d}{|\nu(x+y)-\nu(x)|}\,dx \lesssim_\delta |y|^\delta.$

The regularity assumption is that there exists a ${d\times d}$ minor ${M}$ of the Jacobian ${J\Phi}$ s.t., with ${m = \det M \neq 0}$ a.e., there exists a multi-index ${\alpha \neq 0}$ s.t. ${|\partial^\alpha m| > c_0 >0}$ on all of ${B_{d+k}}$.

I’ll comment later on the usefulness of an ${L^1}$-Hölder estimate. The exposition in that post was really bad since it was just technical and lacked any example – but I’m making up for that now, hopefully.

So, how does the proposition above apply to the case of convolution of measures? It’s really simple: turning back to our euclidean case for example, we write out what ${\mu\ast \mu}$ really is, i.e.

$\displaystyle \int_{\mathbb{R}^d}{f(x)}\,d(\mu\ast \mu)(x) = \int_{\mathbb{R}^d}{\int_{\mathbb{R}^d}{f(x+y)}\,d\mu (x)}\,d\mu(y),$

assume for the moment ${d\mu}$ is absolutely continuous so that we have ${d\mu = \psi \,dx}$, and then

$\displaystyle \int_{\mathbb{R}^d}{f(x)}\,d(\mu\ast \mu)(x) = \int_{\mathbb{R}^d}{\int_{\mathbb{R}^d}{f(x+y)\psi(x)\psi(y)}\,dx}\,dy.$

Thus in this case ${\Phi\,:\, \mathbb{R}^d \times \mathbb{R}^d \rightarrow \mathbb{R}^d}$ is ${(x,y) \mapsto x+y}$, and the measure to transport is the product measure ${\psi\, dx \otimes \psi \,dy}$, and so

$\displaystyle \boxed{\mu\ast \mu = \Phi_\ast (d\mu \otimes d\mu).}$

This is simple yet very important.

Now, we are being a little naive. The Jacobian is

$\displaystyle J\Phi (x,y) = \begin{pmatrix} \mathbb{I}_d & \mathbb{I}_d \end{pmatrix},$

so that all the derivatives of all minors are indeed zero. We can’t apply the proposition in this setting, but things are good anyway since obviously ${d\nu = \psi\ast \psi (x) \,dx}$, so it is absolutely continuous, and with some additional hypotheses one can even have ${L^1}$-Lipschitz continuity, e.g. with the (very) coarse estimate

$\displaystyle \int{|\psi \ast \psi (x+y) - \psi\ast \psi (x)|}\,dx \leq \|\psi\|_{L^1} \|\nabla \psi\|_{L^1} |y|.$
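
The coarse estimate can be checked numerically in a simple case. The following is only a sketch under assumptions that are not in the text: ${d=1}$, ${\psi}$ is the tent function ${\max(0,1-|x|)}$ (so ${\|\psi\|_{L^1}=1}$ and ${\|\psi'\|_{L^1}=2}$), and integrals are replaced by Riemann sums on a grid of step `h`. The function names are made up for the illustration.

```python
def tent(x):
    """psi(x) = max(0, 1 - |x|): Lipschitz and supported in [-1, 1]."""
    return max(0.0, 1.0 - abs(x))


def self_convolution(h=0.01):
    """Riemann-sum samples of g = psi * psi on the grid (k - 2n) * h, n = 1/h."""
    n = round(1.0 / h)
    psi = [tent(i * h) for i in range(-n, n + 1)]
    g = [0.0] * (2 * len(psi) - 1)
    for a, pa in enumerate(psi):
        for b, pb in enumerate(psi):
            g[a + b] += pa * pb * h
    return g


def l1_modulus(g, shift, h=0.01):
    """Riemann sum for the integral of |g(x + shift*h) - g(x)| dx."""
    padded = [0.0] * shift + g + [0.0] * shift
    diffs = (abs(padded[i + shift] - padded[i])
             for i in range(len(padded) - shift))
    return sum(diffs) * h


if __name__ == "__main__":
    h = 0.01
    g = self_convolution(h)
    for shift in (1, 5, 20):
        y = shift * h
        # left: measured L^1 modulus; right: the bound ||psi||_1 ||psi'||_1 |y| = 2|y|
        print(y, l1_modulus(g, shift, h), 2.0 * y)
```

In this example the measured modulus stays below the bound ${2|y|}$, as the estimate predicts.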

A slightly finer estimate could be the following: assume ${\psi}$ has compact Fourier support, thus ${\psi\ast\psi}$ does as well, and therefore there exists a Schwartz function ${\eta}$ s.t. ${\psi \ast \psi = \psi \ast \psi \ast \eta}$, and by Young’s inequality${}^3$

$\displaystyle \int{|\psi \ast \psi \ast \eta (x+y) - \psi\ast \psi \ast \eta (x)|}\,dx = \|(\psi \ast \psi) \ast (T_{-y} \eta - \eta)\|_{L^1}$

$\displaystyle \leq \|\psi \ast \psi\|_{L^1}\int{|\eta(x+y)-\eta(x)|}\,dx \lesssim_\varepsilon \|\psi\|_{L^1}^2 |y|^\varepsilon,$

for every ${0<\varepsilon\leq 1}$, because ${\eta}$ is smooth.

The problem with the above is that we have assumed ${\mu}$ to be absolutely continuous, while we're rather interested exactly in the opposite case of singular measures, as said above. But before moving to that case, we want to notice something: things get better if we insert some "curvature" in the setting. Convolution requires a group structure, so instead of the Euclidean ${\mathbb{R}^d}$ consider the Heisenberg group ${\mathbb{H}^1}$, with group law

$\displaystyle x\cdot y = (x_1,x_2,x_3)\cdot (y_1,y_2,y_3) = (x_1+y_1 \;, x_2+y_2 \;, x_3+y_3 \;+\frac{1}{2}(x_1 y_2 - x_2 y_1)).$

The non commutative group law changes the rules a bit. Indeed the convolution is now

$\displaystyle \int_{\mathbb{R}^3}{f(x)}\,d(\mu\ast_{\mathbb{H}^1} \mu)(x) = \int_{\mathbb{R}^3}{\int_{\mathbb{R}^3}{f(x\cdot y)}\,d\mu (x)}\,d\mu(y)$

and with ${\Phi\,:\, (x,y) \mapsto x\cdot y}$

$\displaystyle J\Phi (x,y) = \begin{pmatrix} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 1 & 0 \\ \frac{1}{2}y_2 & -\frac{1}{2}y_1 & 1 & -\frac{1}{2}x_2 & \frac{1}{2}x_1 & 1 \end{pmatrix},$

and a non trivial minor (columns ${x_1, x_2, y_2}$) is for example

$\displaystyle M= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ \frac{1}{2}y_2 & -\frac{1}{2}y_1 & \frac{1}{2}x_1 \\ \end{pmatrix},$

so ${m(x,y) = \frac{1}{2}(x_1+y_1)}$, and ${|\partial_{x_1}m| = \frac{1}{2}>0}$. Then we can indeed apply the proposition, and we thus get that ${\mu\ast_{\mathbb{H}^1} \mu= \int_{\mathbb{R}^3}{\psi(x\cdot y^{-1})\psi(y)}\,dy = \rho(x_1,x_2,x_3) \,dx_1\,dx_2\,dx_3}$, and

$\displaystyle \int_{\mathbb{R}^3}{|\rho(x+y)-\rho(x)|}\,dx \lesssim_\delta |y|^\delta$

for any ${\delta < 1/2}$. Notice that this is not expressed in the group law of ${\mathbb{H}^1}$, and is thus unlikely to be a useful estimate as it is. The point I wanted to make was that now the Jacobian has non trivial derivatives. Anyhow, I should mention the following: if the ${L^1}$-Hölder continuity holds for ${\rho\circ \exp_G}$ compactly supported in the Lie algebra ${\mathfrak{g}}$ of some unimodular Lie group ${G}$, then it holds on ${G}$ for ${\rho}$ as well, with the group law:

$\displaystyle \int_{G}{|\rho(x\cdot \exp_G(Y))-\rho(x)|}\,dx \lesssim_\delta \|Y\|^\delta,$

where ${Y \in \mathfrak{g}}$. The converse statement is also true. See [RS].

2. Examples: a line and a parabola

So, after having explored cases where the proposition doesn’t apply or where it doesn’t give useful estimates, now we can finally talk about the interesting case of ${\mu}$ a singular measure, which we assume supported on a submanifold of ${\mathbb{R}^d}$. Consider the example of the straight line in ${\mathbb{R}^2}$ given by ${(t,t)}$ for ${t\in \mathbb{R}}$, and define the measure ${d\mu}$ by ${\int{f}\,d\mu = \int{f(t,t) \phi(t)}\,dt}$, where ${\phi}$ is a bump function (Schwartz, say). One immediately sees that integration against ${d(\mu\ast \mu)}$ is given by

$\displaystyle \int{f(t,t) \phi\ast\phi (t)}\,dt,$

i.e. that the measure is again singular, because ${\mathrm{Supp}(\mu) + \mathrm{Supp}(\mu)}$ lies on the same line as ${\mathrm{Supp}(\mu)}$. We can spot the failure in the regularity conditions required by the proposition: ${\Phi}$ is the map ${\Phi(t,u) = (t+u,t+u)}$ and thus its Jacobian is constant and all its derivatives are zero.

Now we introduce curvature instead, not through the group law but on the submanifold itself: consider the measure ${\mu}$ on the parabola ${\mathbb{P}^1\subset \mathbb{R}^2}$

$\displaystyle \int{f}\,d\mu = \int{f(t,t^2)\phi(t)}\,dt,$

where ${\phi}$ is supported in ${[1,2]}$; then

$\displaystyle \int{f}\,d(\mu\ast\mu) = \int{f(t+u,t^2+u^2)\phi(t)\phi(u)}\,dt\,du.$

Now, we must consider not the Jacobian of ${(x,y) \mapsto x+y}$ on ${\mathbb{R}^2}$, but instead the Jacobian of it as a map ${\Phi\,:\, \mathbb{P}^1\times \mathbb{P}^1 \rightarrow \mathbb{R}^2}$, and indeed we use a parametrization of the parabola (which we can always do at least locally for measures supported on submanifolds) to reduce this to ${\tilde{\Phi}\,:\, \mathbb{R}^1 \times \mathbb{R}^1 \rightarrow \mathbb{R}^2}$, given by ${\tilde{\Phi}(t,u) = (t+u,t^2+u^2)}$. For the sake of simplicity we estimate instead ${\mu \ast \tilde{\mu}}$, i.e. the measure s.t.

$\displaystyle \int{f}\,d(\mu \ast \tilde{\mu}) = \int{f(x-y)}\,d\mu(x)\,d\tilde{\mu}(y),$

and therefore

$\displaystyle \tilde{\Phi}(t,u) = (t-u,t^2-u^2).$

We calculate

$\displaystyle \det J\tilde{\Phi}(t,u) = \det \begin{pmatrix} 1 & -1 \\ 2t & -2u \end{pmatrix}= 2(t-u),$

which is therefore nonzero a.e. (the range is shaped like a propeller blade centered at the origin, and the two lobes are the images of the regions ${t > u}$ and ${t < u}$). Thus the proposition applies and ${\mu \ast \tilde{\mu} = \tilde{\Phi}_\ast (\phi(t)\phi(u)\,dt\, du)}$ is an absolutely continuous measure ${\rho\, dx \,dy}$ with ${L^1}$-Hölder continuity for any exponent ${\delta < 1/2}$.

We can also compute pointwise estimates on the density ${\rho}$ of ${\mu \ast \tilde{\mu}}$: a change of variable shows (${x=t-u, y=t^2 - u^2}$) that

$\displaystyle \int{f(t-u,t^2 - u^2) \phi(t)\phi(u)}\,dt \,du= \int{f(x,y)\frac{\phi\left(\frac{y+x^2}{2x}\right)\phi\left(\frac{y-x^2}{2x}\right)}{2|x|}}\,dx\,dy,$

so that trivially ${|\rho(x,y)|\lesssim 1/|x|}$, and since ${|y| = |t+u||t-u|}$ one has ${|y|\sim |x|}$ for ${|x|\ll 1}$, or

$\displaystyle |\rho(x,y)|\lesssim \frac{1}{|(x,y)|}.$
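
The change of variables can be sanity-checked numerically. The sketch below (function names are made up for the illustration) inverts ${x=t-u,\ y=t^2-u^2}$ as ${t=(y+x^2)/2x}$, ${u=(y-x^2)/2x}$, and compares a finite-difference Jacobian determinant with the factor ${2|x| = 2|t-u|}$ that appears in the density.

```python
def forward(t, u):
    """The map (t, u) -> (x, y) = (t - u, t^2 - u^2)."""
    return (t - u, t * t - u * u)


def inverse(x, y):
    """Inverse of `forward` away from x = 0, using t + u = y / x."""
    return ((y + x * x) / (2 * x), (y - x * x) / (2 * x))


def jacobian_det(t, u, eps=1e-6):
    """Central finite-difference approximation of det J(forward) at (t, u)."""
    plus_t, minus_t = forward(t + eps, u), forward(t - eps, u)
    plus_u, minus_u = forward(t, u + eps), forward(t, u - eps)
    d_t = [(a - b) / (2 * eps) for a, b in zip(plus_t, minus_t)]  # (dx/dt, dy/dt)
    d_u = [(a - b) / (2 * eps) for a, b in zip(plus_u, minus_u)]  # (dx/du, dy/du)
    return d_t[0] * d_u[1] - d_u[0] * d_t[1]


if __name__ == "__main__":
    for t, u in [(1.2, 1.7), (1.9, 1.1), (1.5, 1.4)]:
        x, y = forward(t, u)
        print((t, u), inverse(x, y), abs(jacobian_det(t, u)), 2 * abs(x))
```

For each sample point the inverse should return the original ${(t,u)}$, and the last two printed numbers should agree up to rounding.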

One can also estimate ${|\nabla \rho|}$, for example (writing ${\phi_1}$ for ${\phi((y+x^2)/2x)}$ etc)

$\displaystyle |\partial_x \rho(x,y)| = \frac{|\pm2\phi_1\phi_2 - 2|x|(\phi_2 \partial_x \phi_1 + \phi_1 \partial_x \phi_2)|}{4|x|^2} = \frac{|\phi(t)\phi(u) + u\phi'(t)\phi(u)+t\phi(t)\phi'(u)|}{2|u-t|^2}$

$\displaystyle \lesssim \frac{1}{|u-t|^2}=\frac{1}{|x|^2} \sim \frac{1}{|(x,y)|^2}.$

The proposition would’ve applied to ${\mu\ast\mu}$ as well but we wouldn’t have had the opportunity to find pointwise estimates around the origin (as the domain is far from it in that case). Let me remark that in this case, unlike in the previous one of the line, the set ${\mathrm{Supp}(\mu)+(-\mathrm{Supp}(\mu))}$ contains an open set in ${\mathbb{R}^2}$, a necessary condition for ${\mu \ast \tilde{\mu}}$ to be absolutely continuous. And it is so because ${\mathrm{Supp}(\mu)}$ is curved: notice that in general if the regularity assumption in the proposition is satisfied with ${|\alpha|=1}$ then ${\partial^\alpha J\Phi}$ contains second order derivatives, and is thus somehow measuring the existence of curvature. If you need higher derivatives, it means you have a flatter surface and thus you get worse bounds.
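
The contrast between the line and the parabola can be seen in a crude numerical experiment. The sketch below is only a heuristic illustration and not part of the argument: it pushes a uniform grid of parameters ${(t,u)\in[1,2]^2}$ forward through the two maps and counts how many cells of a fixed square grid in ${\mathbb{R}^2}$ are hit. The grid sizes and function names are arbitrary choices.

```python
import math


def line_sum(t, u):
    """Parameters of mu * mu for the line (t, t): the image stays on the diagonal."""
    return (t + u, t + u)


def parabola_difference(t, u):
    """Parameters of mu * mu~ for the parabola (t, t^2)."""
    return (t - u, t * t - u * u)


def occupied_cells(push, n=200, cell=0.05):
    """Count cells of side `cell` hit by push(t, u) for (t, u) on a grid in [1, 2]^2."""
    hit = set()
    for i in range(n + 1):
        t = 1.0 + i / n
        for k in range(n + 1):
            u = 1.0 + k / n
            x, y = push(t, u)
            hit.add((math.floor(x / cell), math.floor(y / cell)))
    return len(hit)


if __name__ == "__main__":
    print("line:    ", occupied_cells(line_sum))
    print("parabola:", occupied_cells(parabola_difference))
```

The image of the line map only meets the cells along a diagonal segment, while the image of the parabola map fills a two-dimensional region, in line with the remark that ${\mathrm{Supp}(\mu)+(-\mathrm{Supp}(\mu))}$ contains an open set.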

3. ${L^1}$-Hölder estimates put to good use

It might’ve seemed a little pointless to look for these ${L^1}$-Hölder estimates so far, so this section is dedicated to an example of how they can be put to use. One just needs some cancellation, indeed.

Assume ${K}$ is an ${L^1}$ function on ${\mathbb{R}^d}$ with compact support, say in ${B(0,1)}$, and satisfying the cancellation condition

$\displaystyle \int{K(x)}\,dx=0.$

Define

$\displaystyle K_j (x) := 2^{-jd} K(2^{-j}x)$

and consider the operator defined by

$\displaystyle Tf=\sum_{j\in\mathbb{Z}}{T_j f}:=\sum_{j\in\mathbb{Z}}{f\ast K_j}.$

Here we assume one more condition: that ${K}$ satisfies the ${L^1}$-Hölder condition

$\displaystyle \int_{\mathbb{R}^d}{|K(x+y)-K(x)|}\,dx \lesssim |y|^\delta.$

Then one can prove

Proposition 2 ${T}$ is bounded on ${L^2}$.

Proof: Adapted from [GS]. We prove it by using the Cotlar-Stein lemma, as mentioned in the introduction. We have to prove that ${\|T_k^\ast T_j\|\lesssim 2^{-\delta|j-k|}}$ and the same for ${T_k T_j^\ast}$. Here we deal with the former case, as the latter can be dealt with in the same way. Moreover, assume ${j>k}$, as the opposite case can be dealt with in the same manner.

Thus, we have to prove

$\displaystyle \|f\ast K_j \ast \tilde{K}_k\|_{L^2} \lesssim 2^{\delta(k-j)}\|f\|_{L^2},$

which in turn will follow from Young’s inequality if we prove

$\displaystyle \|K_j \ast \tilde{K}_k\|_{L^1} \lesssim 2^{\delta(k-j)}.$

We notice that by the definition

$\displaystyle K_j \ast \tilde{K}_k (x) = \int{2^{-jd} K(2^{-j}x- 2^{-j}y)\tilde{K}_k(y)}\,dy = \int{K(2^{-j}x - y) \tilde{K}_k(2^j y)}\,dy$

$\displaystyle = 2^{-jd}\int{K(2^{-j}x - y) \tilde{K}_{k-j}(y)}\,dy;$

moreover, because of the cancellation condition (which holds as well for the ${K_j}$‘s) we have that the last expression is equal to

$\displaystyle 2^{-jd}\int{(K(2^{-j}x - y) - K(2^{-j}x)) \tilde{K}_{k-j}(y)}\,dy.$

Therefore by a scaling and Fubini

$\displaystyle \|K_j \ast \tilde{K}_k\|_{L^1} \leq \int{\int{|K(x+y)-K(x)||\tilde{K}_{k-j} (y)|}\,dx}\,dy \lesssim \int{|y|^\delta |\tilde{K}_{k-j}(y)|}\,dy \lesssim \|K\|_{L^1} 2^{\delta(k-j)},$

where the last inequality follows from the fact that ${\tilde{K}_{k-j}}$ is supported in a ball of diameter ${\sim 2^{k-j}}$ and has the same ${L^1}$ norm as ${K}$. The proposition follows. $\Box$
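
The decay of ${\|K_j \ast \tilde{K}_k\|_{L^1}}$ can be observed numerically in a toy case. The sketch below rests on assumptions that are not in the text: ${d=1}$, ${k=0}$, and ${K}$ is the Haar-type function equal to ${+1}$ on ${[0,1/2)}$ and ${-1}$ on ${[1/2,1)}$, which has mean zero and is ${L^1}$-Lipschitz (${\delta=1}$). Integrals are replaced by Riemann sums, and the function names are made up for the illustration.

```python
def haar(x):
    """Mean-zero kernel K: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0


def l1_norm_kj_conv_k0_reflected(j, n_per_unit=64):
    """Riemann-sum approximation of || K_j * K~_0 ||_{L^1} for j >= 1, d = 1.

    Uses (K_j * K~_0)(x) = integral of K_j(x + s) K(s) ds, where
    K_j(x) = 2^{-j} K(2^{-j} x) and K~_0(x) = K(-x).
    """
    h = 1.0 / n_per_unit
    n_j = (2 ** j) * n_per_unit
    # midpoint samples of K_j on [0, 2^j) and of K on [0, 1)
    kj = [2.0 ** (-j) * haar((i + 0.5) * h / 2 ** j) for i in range(n_j)]
    k0 = [haar((m + 0.5) * h) for m in range(n_per_unit)]
    total = 0.0
    for n in range(-n_per_unit, n_j):      # x = n * h ranges over (-1, 2^j)
        acc = 0.0
        for m in range(n_per_unit):        # s = (m + 1/2) * h ranges over [0, 1)
            i = n + m
            if 0 <= i < n_j:
                acc += kj[i] * k0[m]
        total += abs(acc * h) * h
    return total


if __name__ == "__main__":
    for j in range(1, 5):
        print(j, l1_norm_kj_conv_k0_reflected(j))
```

In this example the norms should roughly halve each time ${j}$ increases by one, consistent with the bound ${2^{\delta(k-j)}}$ for ${\delta = 1}$ and ${k=0}$.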

A particular case of the above is when ${K}$ is homogeneous of critical degree ${-d}$; then the cancellation condition is automatically satisfied, and if you define ${K_{j}(x) = K(x) (\phi(2^{-j}x)-\phi(2^{-j+1}x))}$, where ${\phi \in \mathcal{S}(\mathbb{R}^d)}$ is a radial bump function centered at ${0}$, then ${f\ast K = \sum_{j}{f\ast K_j}}$, and by homogeneity ${K_j(x)=2^{-jd}K_0 (2^{-j}x)}$. One then needs only to assume ${L^1}$-Hölder continuity for ${K_0}$ (which is ${L^1}$ since ${K}$ is ${L^1_\mathrm{loc}}$) and gets that ${f\ast K}$ is ${L^2}$ bounded, without assumptions on the smoothness of ${K}$ as in classical Calderón-Zygmund theory${}^4$.

In this regard, notice that forgetting about homogeneity and assuming decay in ${\widehat{K}}$ one can prove:

Proposition 3 Let ${K}$ be ${L^1}$ and supported on ${B(0,1)}$, with cancellation condition

$\displaystyle \int{K(x)}\,dx =0$

(same as above, so far), and such that

$\displaystyle |\widehat{K}(\xi)|\lesssim |\xi|^{-\alpha}.$

Define ${K_j}$ as above and ${Tf:=\sum_{j\in\mathbb{Z}}{f\ast K_j}}$. Then ${T}$ is bounded on ${L^p}$ for all ${1 < p < \infty}$.

Proof: This is just a sketch, the full proof can be found in [Duo].

The ${L^2}$ boundedness in this case follows immediately from Plancherel and the decay condition on ${\widehat{K}}$. As for the general ${L^p}$ case, one can reduce to standard Calderón-Zygmund theory by means of a dyadic frequency cutoff: take ${\psi}$ radial Schwartz function s.t. ${\mathrm{Supp}(\widehat{\psi})\subset \{1/2 < |\xi|<2\}}$ and s.t. ${\sum_{j\in\mathbb{Z}}{\widehat{\psi_j}}\equiv 1}$, where ${\psi_j(x)=2^{-jd}\psi(2^{-j}x)}$. Then ${K_j = \sum_{k}{K_j\ast \psi_{j+k}}}$, and

$\displaystyle T f = \sum_{j,k}{f \ast K_j\ast \psi_{j+k}} = \sum_{k}{\tilde{T}_k f},$

with ${\tilde{T}_k f = \sum_{j}{f \ast K_j \ast \psi_{j+k}}}$. The idea is that now ${\tilde{T}_k}$ are Calderón-Zygmund operators, whose ${L^p\rightarrow L^p}$ norms are summable in ${k}$. Indeed, with Plancherel again one proves ${\|\tilde{T}_k\|_{L^2\rightarrow L^2} \lesssim 2^{-\delta|k|}}$, and then proves ${\tilde{T}_k}$ maps ${L^1}$ to ${L^{1,\infty}}$ by proving the Hörmander condition${}^5$ for its kernel. The norm in this case blows up like ${\sim |k|}$, but then by Marcinkiewicz interpolation one gets an ${L^p \rightarrow L^p}$ norm summable in ${k}$ if ${p>1}$. $\Box$

Thus, the principle behind all of the above can be summarized in the following:

${L^1}$-Hölder estimates can act as a substitute for Fourier decay estimates.

This doesn’t just address cases where it’s hard to extract information from the Fourier transform: it allows one to tackle problems in the nilpotent setting, in which you don’t have a nice Fourier transform available at all (you can define one, but it’s not nearly as nice as the Euclidean one). This is exactly what is done in [RS], and with some modifications also in [Ch].

Footnotes:
1: here ${\int{f}\,d\tilde{\mu} = \int{f(-x)}\,d\mu(x)}$.
2: for example of the kind ${\|d\tilde{\mu}_k \ast d\mu_j\|_{L^1}\lesssim 2^{-\delta|k-j|}}$.
3: ${T_{-y}}$ is translation by ${-y}$, thus ${T_y f(x) = f(x-y)}$.
4: The argument sketched here can be adapted to nilpotent groups assuming ${K}$ is homogeneous of the correct degree and by choosing suitable automorphic dilations.
5: which in turn follows just by using the hypotheses on the ${L^1}$ boundedness and the compact support of ${K}$.

References:

[Duo] J. Duoandikoetxea, Fourier Analysis, American Mathematical Society, Graduate Studies in Mathematics, vol. 29, 2001.

[Ch] M. Christ, The strong maximal function on a nilpotent group, Transactions of the American Mathematical Society, vol. 331, n. 1, 1992.

[GS] D. Geller, E. M. Stein, Estimates for singular convolution operators on the Heisenberg group, Mathematische Annalen 267, 1-15, 1984.

[RS] F. Ricci, E. M. Stein, Harmonic Analysis on Nilpotent Groups and Singular Integrals II: Singular Kernels Supported on Submanifolds, Journal of Functional Analysis 78, 56-84, 1988.