Affine Restriction estimates imply Affine Isoperimetric inequalities

One thing I absolutely love about harmonic analysis is that it really has something interesting to say about nearly every other field of Analysis. Today’s example is exactly of this kind: I will show how a Fourier Restriction estimate can say something about Affine Geometry. This was first noted by Carbery and Ziesler (see below for references).

1. Affine Isoperimetric Inequality

Recall the Affine Invariant Surface Measure that we have defined in a previous post. Given a hypersurface \Sigma \subset \mathbb{R}^d sufficiently smooth to have a well-defined Gaussian curvature \kappa_{\Sigma}(\xi) (where \xi ranges over \Sigma ) and with surface measure denoted by d\sigma_{\Sigma} , we can define the Affine Invariant Surface measure as the weighted surface measure

\displaystyle d\Omega_{\Sigma}(\xi) := |\kappa_{\Sigma}(\xi)|^{1/(d+1)} \, d\sigma_{\Sigma}(\xi);

this measure has the property of being invariant under the action of SL(\mathbb{R}^d) – hence the name. Here invariant means that if \varphi is an equi-affine map (thus volume preserving) then

\displaystyle \Omega_{\varphi(\Sigma)}(\varphi(E)) = \Omega_{\Sigma}(E)

for any measurable E \subseteq \Sigma .
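To make the invariance concrete, here is a small numerical sketch in the plane (d = 2). For a parametrised curve t \mapsto (x(t), y(t)) the density |\kappa|^{1/3} d\sigma simplifies to |x'y'' - y'x''|^{1/3} dt, which is what the code integrates. The names `affine_length`, `circle` and `sheared`, the finite-difference scheme and the sample matrix are illustrative choices of mine, not something taken from the references.

```python
import math

def affine_length(curve, n=2000):
    """Approximate the affine arclength of a closed planar curve.

    `curve(t)` returns (x, y) for t in [0, 2*pi). In dimension d = 2 the
    density |kappa|^(1/3) d(sigma) equals |x'y'' - y'x''|^(1/3) dt.
    """
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = i * h
        (xm, ym), (x0, y0), (xp, yp) = curve(t - h), curve(t), curve(t + h)
        # central finite differences for the first and second derivatives
        dx, dy = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
        ddx, ddy = (xp - 2 * x0 + xm) / h**2, (yp - 2 * y0 + ym) / h**2
        total += abs(dx * ddy - dy * ddx) ** (1 / 3) * h
    return total

def circle(t):
    return math.cos(t), math.sin(t)

def sheared(t):
    # image of the circle under the matrix [[2, 1], [1, 1]], which has determinant 1
    x, y = circle(t)
    return 2 * x + y, x + y

print(affine_length(circle), affine_length(sheared))
```

Both numbers should agree with 2\pi up to discretisation error, even though the second curve is a rather eccentric ellipse. A linear map of determinant D would instead multiply the affine length by |D|^{1/3}, which is why only volume-preserving maps leave it unchanged.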
The Affine Invariant Surface measure can be used to formulate a very interesting result in Affine Differential Geometry – an inequality of isoperimetric type. Let K \subset \mathbb{R}^d be a convex body – say, centred at the origin and symmetric with respect to it, i.e. K = - K . We denote by \partial K the boundary of the convex body K and we can assume for the sake of the argument that \partial K is sufficiently smooth – for example, piecewise C^2-regular, so that the Gaussian curvature is defined at every point except possibly on an \mathcal{H}^{d-1} -null set. Then the Affine Isoperimetric Inequality says that (with \Omega = \Omega_{\partial K} )

\displaystyle \boxed{ \Omega(\partial K)^{d+1} \lesssim |K|^{d-1}.  } \ \ \ \ \ \ \ (\dagger)


Notice that the inequality is indeed invariant under the action of SL(\mathbb{R}^d) , thanks to the fact that d\Omega is. Observe also the curious fact that this inequality goes in the opposite direction to the better-known Isoperimetric Inequality of Geometric Measure Theory! Indeed, the latter says (let’s say in the usual \mathbb{R}^d ) that (a power of) the volume of a measurable set is controlled by (a power of) the perimeter of the set; more precisely, for any measurable E \subset \mathbb{R}^d

\displaystyle |E|^{d-1} \lesssim P(E)^d,

where P(E) denotes the perimeter of E – in case E = K a symmetric convex body as above we would have P(K) = \sigma(\partial K) . But in the affine context the “affine perimeter” is \Omega(\partial K) and is controlled by the volume instead of vice versa. This makes perfect sense: if K is taken to be a cube Q then \kappa_{\partial Q} = 0 on every face and so the “affine perimeter” vanishes and cannot control anything. Notice also that the power of the perimeter is d for the standard isoperimetric inequality and it is instead d+1 for the affine isoperimetric inequality. Informally speaking, this is related to the fact that the affine perimeter is measuring curvature too instead of just area.
So, the inequality should actually be called something like “Affine anti-Isoperimetric inequality” to better reflect this, but I don’t get to choose the names.
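As a sanity check of (\dagger) in the plane, where it reads \Omega(\partial K)^3 \lesssim |K| , one can compute the ratio \Omega(\partial K)^3 / |K| numerically for a couple of convex bodies. The sketch below describes a body through its support function h(\theta) and relies on some standard facts that are not taken from the text above: the radius of curvature is \rho = h + h'' , the affine perimeter is \int \rho^{2/3} d\theta , the area is \frac{1}{2}\int h \rho \, d\theta , and the classical sharp constant in the plane is 8\pi^2 , attained by ellipses. The function names and the two sample bodies are hypothetical choices of mine.

```python
import math

def affine_isoperimetric_ratio(support, n=4000):
    """Return Omega(boundary)**3 / area for the planar convex body
    whose support function is `support` (a 2*pi-periodic function)."""
    step = 2 * math.pi / n
    omega = area = 0.0
    for i in range(n):
        th = i * step
        h0 = support(th)
        second = (support(th + step) - 2 * h0 + support(th - step)) / step**2
        rho = h0 + second                      # radius of curvature h + h''
        omega += abs(rho) ** (2 / 3) * step    # kappa^(1/3) d(sigma) = rho^(2/3) d(theta)
        area += 0.5 * h0 * rho * step          # area = (1/2) * integral of h * rho
    return omega**3 / area

def ellipse(th, a=2.0, b=0.5):
    # support function of the ellipse with semi-axes a and b
    return math.sqrt((a * math.cos(th)) ** 2 + (b * math.sin(th)) ** 2)

def wobbly(th, eps=0.05):
    # convex and origin-symmetric for eps < 1/15, since h + h'' = 1 - 15*eps*cos(4*th)
    return 1.0 + eps * math.cos(4 * th)

print(affine_isoperimetric_ratio(ellipse), affine_isoperimetric_ratio(wobbly), 8 * math.pi**2)
```

The ellipse should reproduce 8\pi^2 up to discretisation error, while the perturbed disc gives a strictly smaller ratio, in agreement with the direction of the inequality.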

The inequality above is formulated for convex bodies since those are the most relevant objects for Affine Geometry. However, below we will see that Harmonic Analysis provides a sweeping generalisation of the inequality to arbitrary hypersurfaces that are not necessarily boundaries of convex bodies. Before showing this generalisation, we need to introduce Affine Fourier restriction estimates, which we do in the next section.


The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space L(\log L)^{1/2} .

1. The Chang-Wilson-Wolff inequality

Consider the collection \mathcal{D} of standard dyadic intervals that are contained in [0,1] . We let \mathcal{D}_j for each j \in \mathbb{N} denote the subcollection of intervals I \in \mathcal{D} such that |I|= 2^{-j} . Notice that these subcollections generate a filtration of \mathcal{D}, that is (\sigma(\mathcal{D}_j))_{j \in \mathbb{N}}, where \sigma(\mathcal{D}_j) denotes the sigma-algebra generated by the collection \mathcal{D}_j . We can associate to this filtration the conditional expectation operators

\displaystyle  \mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],

and therefore define the martingale differences

\displaystyle  \mathbf{D}_j f:= \mathbf{E}_{j+1} f - \mathbf{E}_{j}f.

With this notation, we have the formal telescopic identity

\displaystyle  f = \mathbf{E}_0 f + \sum_{j \in \mathbb{N}} \mathbf{D}_j f.

Demystification: the expectation \mathbf{E}_j f(x) is simply \frac{1}{|I|} \int_I f(y) \,dy, where I is the unique dyadic interval in \mathcal{D}_j such that x \in I .

Letting f_j := \mathbf{E}_j f for brevity, the sequence of functions (f_j)_{j \in \mathbb{N}} is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is

\displaystyle  \mathbf{E}_{j} f_{j+1} = f_j.

In the following we will only be interested in functions with zero average, that is functions such that \mathbf{E}_0 f = 0. Given such a function f : [0,1] \to \mathbb{R} , we can then define its martingale square function S_{\mathcal{D}}f to be

\displaystyle  S_{\mathcal{D}} f := \Big(\sum_{j \in \mathbb{N}} |\mathbf{D}_j f|^2 \Big)^{1/2}.
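For readers who prefer to see these objects concretely, here is a minimal sketch for a function sampled on a uniform grid of 2^n points of [0,1] (so that f is constant on the intervals of \mathcal{D}_n and \mathbf{D}_j f = 0 for j \geq n ). The function names and the sample data are my own illustrative choices.

```python
import math

def cond_exp(f, j):
    """E_j f for f sampled on 2**n grid points: replace each value by the
    mean over the dyadic interval of length 2**-j that contains it."""
    block = len(f) >> j                      # samples per interval of D_j
    out = []
    for start in range(0, len(f), block):
        mean = sum(f[start:start + block]) / block
        out.extend([mean] * block)
    return out

def martingale_differences(f):
    """List of D_j f = E_{j+1} f - E_j f for j = 0, ..., n-1."""
    n = len(f).bit_length() - 1              # assumes len(f) == 2**n
    exps = [cond_exp(f, j) for j in range(n + 1)]
    return [[a - b for a, b in zip(exps[j + 1], exps[j])] for j in range(n)]

def square_function(f):
    """Pointwise values of the martingale square function S_D f."""
    diffs = martingale_differences(f)
    return [math.sqrt(sum(d[i] ** 2 for d in diffs)) for i in range(len(f))]

sample = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
diffs = martingale_differences(sample)
mean = cond_exp(sample, 0)
telescoped = [mean[i] + sum(d[i] for d in diffs) for i in range(len(sample))]
print(telescoped == sample, square_function(sample))
```

On such a grid one can check directly the telescopic identity, the martingale property \mathbf{E}_j f_{j+1} = f_j , and the L^2 identity \|f - \mathbf{E}_0 f\|_{L^2} = \|S_{\mathcal{D}} f\|_{L^2} that comes from the orthogonality of the martingale differences.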

With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.

C-W-W inequality: Let {f : [0,1] \to \mathbb{R}} be such that \mathbf{E}_0 f = 0. For any {2\leq p < \infty} it holds that

\displaystyle  \boxed{\|f\|_{L^p([0,1])} \lesssim p^{1/2}\, \|S_{\mathcal{D}}f\|_{L^p([0,1])}.} \ \ \ \ \ \ (\text{CWW}_1)

An important point about the above inequality is the behaviour of the constant in the Lebesgue exponent {p} , which is sharp. This can be seen by taking a “lacunary” function {f} (essentially one where \mathbf{D}_jf = a_j \in \mathbb{C} , a constant) and randomising the signs using Khintchine’s inequality (indeed, {p^{1/2}} is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which [0,1] is replaced by the unit cube [0,1]^d and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
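Going back to the sharpness of the factor p^{1/2} : a quick way to see the growth numerically is to take f = r_0 + \ldots + r_{n-1} , where r_j is the Rademacher function equal to +1 on the left half and -1 on the right half of each interval of \mathcal{D}_j . Then \mathbf{D}_j f = r_j , so S_{\mathcal{D}} f \equiv n^{1/2} , while f takes the value n - 2k on a set of measure \binom{n}{k} 2^{-n} . The sketch below (the function name and the choice of f are mine, used only for illustration) computes the ratio \|f\|_{L^p} / \|S_{\mathcal{D}} f\|_{L^p} exactly from this distribution.

```python
import math

def lacunary_ratio(p, n):
    """||f||_p / ||S f||_p for f = r_0 + ... + r_{n-1} (Rademacher functions)."""
    # f equals n - 2k on a set of measure C(n, k) / 2**n
    moment = sum(math.comb(n, k) * abs(n - 2 * k) ** p for k in range(n + 1)) / 2 ** n
    return moment ** (1 / p) / math.sqrt(n)

for p in (2, 4, 8, 16):
    print(p, lacunary_ratio(p, 64))
```

The ratio is exactly 1 at p = 2 and (3 - 2/n)^{1/4} at p = 4 , and it keeps increasing with p, which is consistent with (though of course not a proof of) the p^{1/2} behaviour of the constant.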


Bourgain's proof of the spherical maximal function theorem

Recently I have presented Stein’s proof of the boundedness of the spherical maximal function: it was in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

\displaystyle \mathcal{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,

where A_t denotes the spherical average at radius {t} , that is

\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),

where d\sigma_{d-1} denotes the spherical measure on the (d-1) -dimensional sphere (we will omit the subscript from now on and just write d\sigma since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: The maximal operator \mathcal{M}_{\mathbb{S}^{d-1}} is L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d) bounded for any \frac{d}{d-1} < p \leq \infty .

There is however an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler, in that instead of splitting the function into countably many dyadic frequency pieces it splits the spherical measure into two frequency pieces only. The other ingredients in the two proofs are otherwise pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus: remember that Stein’s argument essentially shows L^p \to L^p boundedness of the operator for every 2 \geq p > \frac{d}{d-1} quite directly; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for \mathcal{M}_{\mathbb{S}^{d-1}} ! The latter means that for any measurable E of finite (Lebesgue) measure we have

\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathcal{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)


which is exactly the L^{d/(d-1)} \to L^{d/(d-1),\infty} inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the L^{d/(d-1),1} \to L^{d/(d-1),\infty} inequality). The downside of Bourgain’s argument is that it only works in dimension d \geq 4 , and thus misses the dimension d=3 that is instead covered by Stein’s theorem.

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known – it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (I, II) on his positional-notation trick for sets.


Marcinkiewicz-type multiplier theorem for q-variation (q > 1)

Not long ago we discussed one of the main direct applications of the Littlewood-Paley theory, namely the Marcinkiewicz multiplier theorem. Recall that the single-variable version of this theorem can be formulated as follows:

Theorem 1 [Marcinkiewicz multiplier theorem]: Let {m} be a function on \mathbb{R} such that

  1. m \in L^\infty
  2. for every Littlewood-Paley dyadic interval L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k] with k \in \mathbb{Z}

    \displaystyle \|m\|_{V(L)} \leq C,

    where \|m\|_{V(L)} denotes the total variation of {m} over the interval L .

Then for any {1 < p < \infty} the multiplier {T_m} defined by \widehat{T_m f} = m \widehat{f} for functions f \in L^2(\mathbb{R}) extends to an L^p \to L^p bounded operator,

\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.

You should also recall that the total variation V(I) above is defined as

\displaystyle \sup_{N}\sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \sum_{j=1}^{N} |m(t_j) - m(t_{j-1})|,

and if {m} is absolutely continuous then {m'} exists as a measurable function and the total variation over the interval I is given equivalently by \int_{I} |m'(\xi)|d\xi . We have seen that the “dyadic total variation condition” 2.) above is to be seen as a generalisation of the pointwise condition |m'(\xi)|\lesssim |\xi|^{-1} , which in dimension 1 happens to coincide with the classical differential Hörmander condition (in higher dimensions the pointwise Marcinkiewicz conditions are of product type, while the pointwise Hörmander(-Mikhlin) conditions are of radial type; see the relevant post). Thus the Marcinkiewicz multiplier theorem in dimension 1 can deal with multipliers whose symbol is somewhat rougher than being differentiable. It is an interesting question to wonder how much rougher the symbols can get while still preserving their L^p mapping properties (or maybe giving up some range – recall though that the range of boundedness for multipliers must be symmetric around 2 by duality, since the adjoint of T_m is the multiplier T_{\overline{m}} ).

Coifman, Rubio de Francia and Semmes came up with an answer to this question that is very interesting. They generalise the Marcinkiewicz multiplier theorem (in dimension 1) to multipliers that have bounded {q} -variation with {q} > 1. Let us define this quantity rigorously.

Definition: Let q \geq 1 and let I be an interval. Given a function f : \mathbb{R} \to \mathbb{R}, its {q} -variation over the interval {I} is

\displaystyle \|f\|_{V_q(I)} := \sup_{N} \sup_{\substack{t_0, \ldots, t_N \in I : \\ t_0 < \ldots < t_N}} \Big(\sum_{j=1}^{N} |f(t_j) - f(t_{j-1})|^q\Big)^{1/q}.
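When the function is only known at finitely many points, the supremum in the definition runs over finitely many increasing chains and can be computed exactly by a simple dynamic programme. The following is a minimal sketch under that assumption (the name `q_variation` and the quadratic-time recursion are my own illustrative choices, not something from the paper).

```python
def q_variation(values, q):
    """q-variation of a finite sequence: supremum over all increasing
    chains of indices of the l^q norm of the consecutive differences."""
    # best[i] = largest sum of q-th powers of jumps over chains ending at index i
    best = [0.0] * len(values)
    for i in range(len(values)):
        for k in range(i):
            best[i] = max(best[i], best[k] + abs(values[i] - values[k]) ** q)
    return max(best, default=0.0) ** (1 / q)

zigzag = [0.0, 1.0, 0.0, 1.0]
ramp = [0.0, 1.0, 2.0, 3.0]
print(q_variation(zigzag, 1), q_variation(zigzag, 2))
print(q_variation(ramp, 1), q_variation(ramp, 2))
```

For the zigzag the best chain uses every point (giving 3 for q = 1 and \sqrt{3} for q = 2 ), while for the monotone ramp with q = 2 it is better to jump straight from the first point to the last, which gives 3 again. In both cases the 2-variation does not exceed the 1-variation.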

Notice that, with respect to the notation above, we have \|m\|_{V(I)} = \|m\|_{V_1(I)} . From the fact that \|\cdot\|_{\ell^q} \leq \|\cdot \|_{\ell^p} when p \leq q we see that we have always \|f\|_{V_q (I)} \leq \|f\|_{V_p(I)} , and therefore the higher the {q} the less stringent the condition of having bounded {q} -variation becomes (this is linked to the Hölder regularity of the function getting worse). In particular, if we wanted to weaken hypothesis 2.) in the Marcinkiewicz multiplier theorem above, we could simply replace it with the condition that for any Littlewood-Paley dyadic interval L we have instead \|m\|_{V_q(L)} \leq C . This is indeed what Coifman, Rubio de Francia and Semmes do, and they were able to show the following:

Theorem 2 [Coifman-Rubio de Francia-Semmes, ’88]: Let q\geq 1 and let {m} be a function on \mathbb{R} such that

  1. m \in L^\infty
  2. for every Littlewood-Paley dyadic interval L := [2^k, 2^{k+1}] \cup [-2^{k+1},-2^k] with k \in \mathbb{Z}

    \displaystyle \|m\|_{V_q(L)} \leq C.

Then for any {1 < p < \infty} such that {\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} } the multiplier {T_m} defined by \widehat{T_m f} = m \widehat{f} extends to an L^p \to L^p bounded operator,

\displaystyle \|T_m f\|_{L^p} \lesssim_p (\|m\|_{L^\infty} + C) \|f\|_{L^p}.

The statement is essentially the same as before, except that now we are imposing control of the {q} -variation instead and as a consequence we have the restriction that our Lebesgue exponent {p} satisfy {\Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} }. Taking a closer look at this condition, we see that when the variation parameter is 1 \leq q \leq 2 the condition is empty, that is there is no restriction on the range of boundedness of T_m : it is still the full range {1} < {p} < \infty , and as {q} grows larger and larger the range of boundedness restricts itself to be smaller and smaller around the exponent p=2 (for which the multiplier is always necessarily bounded, by Plancherel). This is a very interesting behaviour, which points to the fact that there is a certain dichotomy between variation in the range below 2 and the range above 2, with 2 -variation being the critical case. This is not an isolated case: for example, the Variation Norm Carleson theorem is false for {q} -variation with {q \leq 2} ; similarly, the Lépingle inequality is false for 2-variation and below (and this is related to the properties of Brownian motion).
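To see how the range shrinks, one can solve the condition \Big|\frac{1}{2} - \frac{1}{p}\Big| < \frac{1}{q} for p: when q > 2 it is equivalent to \frac{2q}{q+2} < p < \frac{2q}{q-2} , and when q \leq 2 it is implied by 1 < p < \infty . A tiny helper (the name `boundedness_range` is hypothetical) makes this explicit.

```python
import math

def boundedness_range(q):
    """Endpoints of the open interval of exponents p allowed by
    |1/2 - 1/p| < 1/q, intersected with 1 < p < infinity."""
    if q <= 2:
        return 1.0, math.inf          # the condition is empty
    return 2 * q / (q + 2), 2 * q / (q - 2)

for q in (1, 2, 3, 4, 10, 100):
    print(q, boundedness_range(q))
```

For instance q = 4 gives the range \frac{4}{3} < p < 4 , and both endpoints tend to 2 as q \to \infty . Note that the two endpoints are always dual exponents of each other.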


Representing points in a set in positional-notation fashion (a trick by Bourgain): part II

This is the second and final part of an entry dedicated to a very interesting and inventive trick due to Bourgain. In part I we saw a lemma on maximal Fourier projections due to Bourgain, together with the context it arises from (the study of pointwise ergodic theorems for polynomial sequences); we also saw a baby version of the idea to come, that we used to prove the Rademacher-Menshov theorem (recall that the idea was to represent the indices in the supremum in their binary positional notation form and to rearrange the supremum accordingly). Today we finally get to see Bourgain’s trick.

Before we start, recall the statement of Bourgain’s lemma:

Lemma 1 [Bourgain]: Let K be an integer and let \Lambda = \{\lambda_1, \ldots, \lambda_K \} be a set of {K} distinct frequencies. Define the maximal frequency projections

\displaystyle \mathcal{B}_\Lambda f(x) := \sup_{j} \Big|\sum_{k=1}^{K} (\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee}\Big|,

where the supremum is restricted to those {j \geq j_0} with j_0 = j_0(\Lambda) being the smallest integer such that 2^{-j_0} \leq \frac{1}{2}\min \{ |\lambda_k - \lambda_{k'}| : 1\leq k\neq k'\leq K \}.
Then

\displaystyle \|\mathcal{B}_\Lambda f\|_{L^2} \lesssim (\log \#\Lambda)^2 \|f\|_{L^2}.

Here we are using the notation (\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} \widehat{f})^{\vee} in the statement in place of the expanded formula \int_{|\xi - \lambda_k| < 2^{-j}} \widehat{f}(\xi) e^{2\pi i \xi x} d\xi. Observe that by the definition of j_0 we have that the intervals [\lambda_k - 2^{-j_0}, \lambda_k + 2^{-j_0}] are disjoint (and j_0 is precisely minimal with respect to this condition).
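As a concrete illustration of the definition of j_0(\Lambda) , here is a small sketch that computes it for a finite list of frequencies (it assumes at least two distinct frequencies; the name `smallest_scale` is my own).

```python
def smallest_scale(freqs):
    """Smallest integer j0 such that 2**-j0 <= (1/2) * (minimal gap between
    distinct frequencies). Assumes at least two distinct frequencies."""
    pts = sorted(freqs)
    half_gap = min(b - a for a, b in zip(pts, pts[1:])) / 2
    j0 = 0
    while 2.0 ** -j0 > half_gap:            # scale too coarse: refine
        j0 += 1
    while 2.0 ** -(j0 - 1) <= half_gap:     # a coarser scale would still work
        j0 -= 1
    return j0

print(smallest_scale([0.0, 1.0, 3.0]), smallest_scale([0.0, 0.3]), smallest_scale([0.0, 10.0]))
```

With frequencies 0, 1, 3 the minimal gap is 1 and j_0 = 1 , so the intervals of radius \frac{1}{2} around the frequencies touch at most at endpoints; well-separated frequencies such as 0 and 10 give a negative j_0 .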
We will need to do some reductions before we can get to the point where the trick makes its appearance. These reductions are the subject of the next section.

3. Initial reductions

A first important reduction is that we can safely replace the characteristic functions \mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} by smooth bump functions with comparable support. Indeed, this is the result of a very standard square-function argument which was already essentially presented in Exercise 22 of the 3rd post on basic Littlewood-Paley theory. Briefly then, let \varphi be a Schwartz function such that \widehat{\varphi} is a smooth bump function compactly supported in the interval [-1,1] and such that \widehat{\varphi} \equiv 1 on the interval [-1/2, 1/2]. Let \varphi_j (x) := \frac{1}{2^j} \varphi \Big(\frac{x}{2^j}\Big) (so that \widehat{\varphi_j}(\xi) = \widehat{\varphi}(2^j \xi)) and let for convenience \theta_j denote the difference \theta_j := \mathbf{1}_{[-2^{-j}, 2^{-j}]} - \widehat{\varphi_j}. We have that the difference

\displaystyle \sup_{j\geq j_0(\Lambda)} \Big|\sum_{k=1}^{K} ((\mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]} - \widehat{\varphi_j}(\cdot - \lambda_k)) \widehat{f})^{\vee}\Big|

is an L^2 bounded operator with norm O(1) (that is, independent of K ). Indeed, observe that \mathbf{1}_{[\lambda_k - 2^{-j}, \lambda_k + 2^{-j}]}(\xi) - \widehat{\varphi_j}(\xi - \lambda_k) = \theta_j (\xi - \lambda_k), and bounding the supremum by the \ell^2 sum we have that the L^2 norm (squared) of the operator above is bounded by

\displaystyle \sum_{j \geq j_0(\Lambda)} \Big\|\sum_{k=1}^{K} (\theta_j(\cdot - \lambda_k)\widehat{f})^{\vee}\Big\|_{L^2}^2,

where the summation in {j} is restricted in the same way as the supremum is in the lemma (that is, the intervals [\lambda_k - 2^{-j}, \lambda_k + 2^{-j}] must be pairwise disjoint). By an application of Plancherel we see that the above is equal to

\displaystyle \sum_{k=1}^{K} \Big\| \widehat{f}(\xi) \Big[\sum_{j \geq j_0} \theta_j(\xi - \lambda_k) \Big]\Big\|_{L^2}^2;

but notice that the functions \theta_j have supports disjoint in {j} , and therefore the multiplier satisfies \sum_{j\geq j_0} \theta_j(\xi - \lambda_k) \lesssim 1 in a neighbourhood of \lambda_k , and vanishes outside such a neighbourhood. A final application of Plancherel allows us to conclude that the above is bounded by \lesssim \|f\|_{L^2}^2 by orthogonality (these neighbourhoods being all disjoint as well).
By the triangle inequality, we see therefore that in order to prove Lemma 1 it suffices to prove that the operator

\displaystyle \sup_{j} \Big|\sum_{k=1}^{K} (\widehat{\varphi_j}(\cdot - \lambda_k) \widehat{f})^{\vee}\Big|

is L^2 bounded with norm at most O((\log \#\Lambda)^2).
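The key point in the reduction above, namely that the functions \theta_j live on disjoint dyadic annuli and hence add up to something bounded by 1, is easy to check numerically. In the sketch below \widehat{\varphi} is replaced by a piecewise-linear stand-in (equal to 1 on [-1/2, 1/2] and vanishing outside [-1,1] ); this is only continuous rather than smooth, an assumption made purely to keep the illustration short.

```python
def bump_hat(xi):
    """Stand-in for hat(phi): equal to 1 on [-1/2, 1/2], 0 outside [-1, 1],
    linear in between (continuous but not smooth: illustration only)."""
    return min(1.0, max(0.0, 2.0 - 2.0 * abs(xi)))

def theta(j, xi):
    """theta_j(xi) = 1_{[-2^-j, 2^-j]}(xi) - hat(phi)(2^j * xi)."""
    indicator = 1.0 if abs(xi) <= 2.0 ** -j else 0.0
    return indicator - bump_hat(2.0 ** j * xi)

grid = [k / 1000 for k in range(-1000, 1001)]
sums = [sum(theta(j, xi) for j in range(12)) for xi in grid]
overlaps = [sum(1 for j in range(12) if theta(j, xi) != 0.0) for xi in grid]
print(min(sums), max(sums), max(overlaps))
```

Each \theta_j is supported where 2^{-j-1} < |\xi| \leq 2^{-j} , so at every point of the grid at most one term of the sum is non-zero and the sum stays between 0 and 1.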


Representing points in a set in positional-notation fashion (a trick by Bourgain): part I

If you are reading this blog, you have probably heard that Jean Bourgain – one of the greatest analysts of the last century – sadly passed away last December. It is fair to say that the progress of analysis will slow down significantly without him. I am not in any position to give a eulogy to this giant, but I thought it would be nice to commemorate him by talking occasionally on this blog about some of his many profound papers and his crazily inventive tricks. That’s something everybody agrees on: Bourgain was able to come up with a variety of insane tricks in a way that no one else could. The man was a problem solver and an overall magician: the first time you see one of his tricks, you don’t believe what’s happening in front of you. And that’s just the tricks part!

In this two-part post I am going to talk about a certain trick that, loosely speaking, involves representing points of an arbitrary set in a fashion similar to how integers are represented, say, in base 2. I don’t know if this trick came straight out of Bourgain’s magical top hat or if he learned it from somewhere else; I haven’t seen it used elsewhere except for papers that cite Bourgain himself, so I’m inclined to attribute it to him – but please, correct me if I’m wrong.
Today we introduce the context for the trick (a famous lemma by Bourgain for maximal frequency projections on the real line) and present a toy version of the idea in a proof of the Rademacher-Menshov theorem. In the second part we will finally see the trick.

1. Ergodic averages along arithmetic sequences
First, some context. The trick I am going to talk about can be found in one of Bourgain’s major papers, which was among those cited in the motivation for his Fields Medal. I am talking about the paper on a.e. convergence of ergodic averages along arithmetic sequences. The main result of that paper is stated as follows: let (X,T,\mu) be an ergodic system, that is

  1. \mu is a probability measure on X ;
  2. T: X \to X satisfies \mu(T^{-1} A) = \mu(A) for all \mu -measurable sets A (this is the invariance condition);
  3. T^{-1} A = A implies \mu(A) = 0 \text{ or } 1 (this is the ergodicity condition).

Then the result is

Theorem: [Bourgain, ’89] Let (X,T,\mu) be an ergodic system and let p(n) be a polynomial with integer coefficients. If f \in L^q(d\mu) with {q} > 1, then the averages A_N f(x) := \frac{1}{N}\sum_{n=1}^{N}f(T^{p(n)} x) converge \mu -a.e. as N \to \infty ; moreover, if {T} is weakly mixing, we have more precisely

\displaystyle \lim_{N \to \infty} A_N f(x) = \int_X f d\mu

for \mu -a.e. {x} .

For comparison, the more classical pointwise ergodic theorem of Birkhoff states the same for the case p(n) = n and f \in L^1(d\mu) (notice this is the largest of the L^p(X,d\mu) spaces because \mu is finite), in which case the theorem is deduced as a consequence of the L^1 \to L^{1,\infty} boundedness of the Hardy-Littlewood maximal function. The dense class to appeal to is roughly speaking L^2(X,d\mu) , thanks to the ergodic theorem of von Neumann, which states that A_N f converges in L^2 norm for f \in L^2(X,d\mu) . However, the details are non-trivial. Heuristically, these ergodic theorems incarnate a quantitative version of the idea that the orbits \{T^n x\}_{n\in\mathbb{N}} fill up the entire space {X} uniformly. I don’t want to enter into details because here I am just providing some context for those interested; there are plenty of introductions to ergodic theory where these results are covered in depth.
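To get a feeling for the averages A_N f , here is a toy numerical experiment on the simplest ergodic system I can think of: the rotation T x = x + \alpha \mod 1 of the circle, with \alpha = \sqrt{2} . This system, the test function f(x) = \cos(2\pi x) and the helper name `polynomial_average` are all choices of mine for illustration. The rotation is ergodic but not weakly mixing, so the second part of the theorem does not apply to it; for this particular f the limit is nevertheless \int f = 0 , by Weyl’s equidistribution theorem for n^2 \alpha .

```python
import math

def polynomial_average(f, x, alpha, N, poly=lambda n: n * n):
    """A_N f(x) = (1/N) * sum_{n=1}^{N} f(T^{p(n)} x)
    for the circle rotation T x = x + alpha mod 1."""
    total = 0.0
    for n in range(1, N + 1):
        total += f((x + poly(n) * alpha) % 1.0)
    return total / N

def wave(x):
    return math.cos(2 * math.pi * x)   # mean zero on [0, 1)

alpha = math.sqrt(2)
for N in (100, 1000, 10000, 100000):
    print(N, polynomial_average(wave, 0.1, alpha, N))
```

Passing `poly=lambda n: n` instead gives the classical Birkhoff averages, which for this f decay like 1/N ; along the squares the decay is visibly slower, which is one (very modest) indication of why the polynomial case is harder.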
