# Affine Restriction estimates imply Affine Isoperimetric inequalities

One thing I absolutely love about harmonic analysis is that it really has something interesting to say about nearly every other field of Analysis. Today’s example is exactly of this kind: I will show how a Fourier Restriction estimate can say something about Affine Geometry. This was first noted by Carbery and Ziesler (see below for references).

## 1. Affine Isoperimetric Inequality

Recall the Affine Invariant Surface Measure that we have defined in a previous post. Given a hypersurface $\Sigma \subset \mathbb{R}^d$ sufficiently smooth to have a well-defined Gaussian curvature $\kappa_{\Sigma}(\xi)$ (where $\xi$ ranges over $\Sigma$) and with surface measure denoted by $d\sigma_{\Sigma}$, we can define the Affine Invariant Surface measure as the weighted surface measure

$\displaystyle d\Omega_{\Sigma}(\xi) := |\kappa_{\Sigma}(\xi)|^{1/(d+1)} \, d\sigma_{\Sigma}(\xi);$

this measure has the property of being invariant under the action of $SL(\mathbb{R}^d)$ – hence the name. Here invariant means that if $\varphi$ is an equi-affine map (thus volume preserving) then

$\displaystyle \Omega_{\varphi(\Sigma)}(\varphi(E)) = \Omega_{\Sigma}(E)$

for any measurable $E \subseteq \Sigma$.
The Affine Invariant Surface measure can be used to formulate a very interesting result in Affine Differential Geometry – an inequality of isoperimetric type. Let $K \subset \mathbb{R}^d$ be a convex body – say, centred at the origin and symmetric with respect to it, i.e. $K = - K$. We denote by $\partial K$ the boundary of the convex body $K$, and we can assume for the sake of the argument that $\partial K$ is sufficiently smooth – for example, piecewise $C^2$-regular, so that the Gaussian curvature is defined at every point except possibly an $\mathcal{H}^{d-1}$-null set. Then the Affine Isoperimetric Inequality says that (with $\Omega = \Omega_{\partial K}$)

$\displaystyle \boxed{ \Omega(\partial K)^{d+1} \lesssim |K|^{d-1}. } \ \ \ \ \ \ \ (\dagger)$

Notice that the inequality is invariant with respect to the action of $SL(\mathbb{R}^d)$ indeed – thanks to the fact that $d\Omega$ is. Observe also the curious fact that this inequality goes in the opposite direction with respect to the better known Isoperimetric Inequality of Geometric Measure Theory! Indeed, the latter says (let’s say in the usual $\mathbb{R}^d$) that (a power of) the volume of a measurable set is controlled by (a power of) the perimeter of the set; more precisely, for any measurable $E \subset \mathbb{R}^d$

$\displaystyle |E|^{d-1} \lesssim P(E)^d,$

where $P(E)$ denotes the perimeter of $E$ – in case $E = K$ a symmetric convex body as above we would have $P(K) = \sigma(\partial K)$. But in the affine context the “affine perimeter” is $\Omega(\partial K)$ and is controlled by the volume instead of vice versa. This makes perfect sense: if $K$ is taken to be a cube $Q$ then $\kappa_{\partial Q} = 0$ almost everywhere, and so the “affine perimeter” cannot control anything. Notice also that the power of the perimeter is $d$ for the standard isoperimetric inequality and it is instead $d+1$ for the affine isoperimetric inequality. Informally speaking, this is related to the fact that the affine perimeter is measuring curvature too instead of just area.
So, the inequality should actually be called something like “Affine anti-Isoperimetric inequality” to better reflect this, but I don’t get to choose the names.
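Here is a quick numerical sanity check of $(\dagger)$ in the plane ($d = 2$) – a sketch with helper names of my own choosing, not anything definitive. For ellipses the affine perimeter $\Omega(\partial K) = \int |\kappa|^{1/3}\,ds$ can be computed by quadrature, and the ratio $\Omega(\partial K)^3/|K|$ comes out the same for every ellipse – as it must, since every ellipse is a linear image of the disc and both sides of $(\dagger)$ scale in the same way under linear maps (and ellipsoids are known to be the extremisers).

```python
import math

def affine_perimeter_ellipse(a, b, n=20000):
    # Omega(dK) = \int |kappa|^{1/3} ds over the ellipse x^2/a^2 + y^2/b^2 = 1,
    # parametrised as (a cos t, b sin t); here
    # kappa = ab / (a^2 sin^2 t + b^2 cos^2 t)^{3/2} and
    # ds = (a^2 sin^2 t + b^2 cos^2 t)^{1/2} dt, so the integrand simplifies nicely.
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        g = a**2 * math.sin(t)**2 + b**2 * math.cos(t)**2
        kappa = a * b / g**1.5
        total += kappa**(1/3) * math.sqrt(g) * dt
    return total  # exact value: 2*pi*(a*b)^{1/3}

for a, b in [(1.0, 1.0), (2.0, 0.5), (5.0, 0.1)]:
    omega = affine_perimeter_ellipse(a, b)
    area = math.pi * a * b            # |K|
    ratio = omega**3 / area           # d = 2: Omega^{d+1} / |K|^{d-1}
    print(f"a={a}, b={b}: Omega^3/|K| = {ratio:.6f}")   # always 8*pi^2
```

The constant $8\pi^2$ that appears is exactly the value attained by the disc ($\Omega = 2\pi$, $|K| = \pi$).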

The inequality above is formulated for convex bodies since those are the most relevant objects for Affine Geometry. However, below we will see that Harmonic Analysis provides a sweeping generalisation of the inequality to arbitrary hypersurfaces that are not necessarily boundaries of convex bodies. Before showing this generalisation, we need to introduce Affine Fourier restriction estimates, which we do in the next section.

# Affine Invariant Surface Measure

In this short post I want to introduce an instance of certain objects that will be the subject of a few more posts. This particular object arises naturally in Affine Differential Geometry and turned out to have a relevant rôle in Harmonic Analysis too (in both Fourier restriction and in the theory of Radon transforms).

## 1. Affine Invariant measures

Affine Differential Geometry is the study of (differential-)geometric properties that are invariant with respect to $SL(\mathbb{R}^d)$. A very interesting object arising in Affine Geometry is the notion of an Affine Invariant Measure. Sticking to examples rather than theory (since the theory is still quite underdeveloped!), consider a hypersurface $\Sigma \subset \mathbb{R}^{d}$ sufficiently smooth to have well-defined Gaussian curvature, which we denote by $\kappa$ (a function on $\Sigma$). If we let $d\sigma$ denote the surface measure on $\Sigma$ (induced from the Lebesgue measure on the ambient space $\mathbb{R}^d$ for example, or by taking directly $d\sigma = d\mathcal{H}^{d-1}\big|_{\Sigma}$, the restriction of the $(d-1)$-dimensional Hausdorff measure to the hypersurface) then this crafty little object is called Affine Invariant Surface Measure and is given by

$\displaystyle d\Omega(\xi) = |\kappa(\xi)|^{1/(d+1)} \,d\sigma(\xi).$

It was first introduced by Blaschke for $d=3$ (finding the reference seems impossible; it’s [B] in this paper, if you feel luckier) and by Leichtweiss for general $d$. The reason this measure is so interesting is that it is (equi)affine invariant in the sense that if $\varphi(\xi) = A \xi + \eta$ is an equi-affine transformation (thus with $A \in SL(\mathbb{R}^d)$ and so volume-preserving since $\det A = \pm 1$) then, using subscripts to distinguish the two surfaces, we have

$\displaystyle \boxed{ \Omega_{\varphi(\Sigma)}(\varphi(E)) = \Omega_{\Sigma}(E) } \ \ \ \ \ \ \ (1)$

for any measurable $E \subseteq \Sigma$. We remark the following fact: that seemingly mysterious power $\frac{1}{d+1}$ in the definition of $d\Omega$ is the only exponent for which the resulting measure is (equi)affine-invariant.
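To illustrate both the invariance (1) and the special nature of the exponent, here is a small numerical experiment in $d = 2$ (so the exponent is $1/3$; the function names are mine). We compute $\int |\kappa|^p \, ds$ over an arc of the parabola $y = x^2$ and over its image under the equi-affine shear $(x,y) \mapsto (x+y, y) \in SL(\mathbb{R}^2)$: the two values agree for $p = 1/3$ and disagree for any other exponent, e.g. $p = 1/2$.

```python
import math

def curvature_integral(xp, yp, xpp, ypp, p, t0, t1, n=20000):
    # \int |kappa|^p ds for a parametrised plane curve (x(t), y(t)), with
    # kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^{3/2} and ds = (x'^2 + y'^2)^{1/2} dt
    total, dt = 0.0, (t1 - t0) / n
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        sp2 = xp(t)**2 + yp(t)**2
        kappa = abs(xp(t) * ypp(t) - yp(t) * xpp(t)) / sp2**1.5
        total += kappa**p * math.sqrt(sp2) * dt
    return total

# derivatives of the parabola (t, t^2) on 0 <= t <= 1 ...
parab = dict(xp=lambda t: 1.0, yp=lambda t: 2*t, xpp=lambda t: 0.0, ypp=lambda t: 2.0)
# ... and of its image (t + t^2, t^2) under the shear (x, y) -> (x + y, y)
shear = dict(xp=lambda t: 1 + 2*t, yp=lambda t: 2*t, xpp=lambda t: 2.0, ypp=lambda t: 2.0)

for p in (1/3, 1/2):
    a = curvature_integral(**parab, p=p, t0=0.0, t1=1.0)
    b = curvature_integral(**shear, p=p, t0=0.0, t1=1.0)
    print(f"p = {p:.3f}: original arc {a:.6f}, sheared arc {b:.6f}")
# only p = 1/3 = 1/(d+1) gives matching values (both equal 2^{1/3})
```

For the parabola the affine measure is particularly clean: $|\kappa|^{1/3}\,ds = 2^{1/3}\,dt$ identically, which is why both integrals above equal $2^{1/3}$.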

# Proof of the square-function characterisation of L(log L)^{1/2}: part II

This is the 3rd post in a series that started with the post on the Chang-Wilson-Wolff inequality:

In today’s post we will finally complete the proof of the Tao-Wright lemma. Recall that in the 2nd post of the series we proved that the Tao-Wright lemma follows from its discrete version for Haar/dyadic-martingale-differences, which is as follows:

Lemma 2 – Square-function characterisation of $L(\log L)^{1/2}$ for martingale-differences:
For any function $f : [0,1] \to \mathbb{R}$ in $L(\log L)^{1/2}([0,1])$ there exists a collection $(F_j)_{j \in \mathbb{N}}$ of non-negative functions such that:

1. for any $j \in \mathbb{N}$ and any $I \in \mathcal{D}_j$

$\displaystyle |\langle f, h_I \rangle|\lesssim \frac{1}{|I|^{1/2}} \int_{I} F_j \,dx;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{N}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}([0,1])}.$

Today we will prove this lemma.

# Interlude: Atomic decomposition of L(log L)^r

This is going to be a shorter post about a technical fact that will be used in concluding the proof of the Tao-Wright lemma.
What we are going to see today is an atomic decomposition of the Orlicz spaces of $L (\log L)^r$ type. Surprisingly, I could find no classical references that explicitly state this useful little fact – some attribute it to Titchmarsh, Zygmund and Yano; indeed, something resembling the decomposition can be found for example in Zygmund’s book (Volume II, page 120). However, I could only find a proper statement together with a proof in a paper of Tao titled “A Converse Extrapolation Theorem for Translation-Invariant Operators“, where he claims it is a well-known fact and proves it in an appendix. (That paper is about reversing the implication in an old extrapolation theorem of Yano [1951] – a theorem which tells you that if the operator norms $\|T\|_{L^p \to L^p}$ blow up only to finite order as $p \to 1^{+}$, then you can “extrapolate” this into an endpoint inequality of the type $\|Tf\|_{L^1} \lesssim \|f\|_{L(\log L)^r}$.)

Briefly stated, the result is as follows. We will consider only $L(\log L)^r ([0,1])$, that is the Orlicz space of functions on $[0,1]$ with Orlicz/Luxemburg norm

$\displaystyle \|f\|_{L(\log L)^r ([0,1])} = \inf \Big\{\mu > 0 \text{ s.t. } \int_{0}^{1} \frac{|f(x)|}{\mu} \Big(\log \Big(2 + \frac{|f(x)|}{\mu}\Big)\Big)^{r} \,dx \leq 1 \Big\}.$

Our atoms will be quite simply normalised characteristic functions: that is, for any measurable set $E \subset [0,1]$ we let $a_E$ denote the atom associated to $E$, given by

$\displaystyle a_E := \frac{\mathbf{1}_E}{\|\mathbf{1}_E\|_{L(\log L)^r}};$

obviously $\|a_E\|_{L(\log L)^r} = 1$.
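As an aside, the Luxemburg norm above is perfectly computable: the integral in its definition is decreasing in $\mu$, so a bisection finds the infimum. Here is a minimal sketch (the function names and the midpoint quadrature are my own choices) that also checks the homogeneity of the norm:

```python
import math

def luxemburg_norm(f, r, n=4096, tol=1e-10):
    # Orlicz/Luxemburg norm on [0,1] with Phi(s) = s (log(2+s))^r, as in the post:
    # inf { mu > 0 : \int_0^1 Phi(|f|/mu) dx <= 1 }, found by bisection, using
    # that mu -> \int_0^1 Phi(|f|/mu) dx is decreasing in mu.
    def integral(mu):
        total, dx = 0.0, 1.0 / n
        for i in range(n):
            s = abs(f((i + 0.5) * dx)) / mu
            total += s * math.log(2 + s)**r * dx
        return total
    lo, hi = tol, 1.0
    while integral(hi) > 1:
        hi *= 2
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if integral(mid) > 1 else (lo, mid)
    return hi

# the atom normalisation for E = [0, 1/8]: for small |E| the norm of 1_E
# behaves like |E| (log(1/|E|))^r
e = 1 / 8
print(luxemburg_norm(lambda x: 1.0 if x <= e else 0.0, r=0.5))
```

Note that although the Young-type function $s(\log(2+s))^r$ is not homogeneous, the resulting Luxemburg norm is, which the test of $\|2f\| = 2\|f\|$ confirms numerically.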
The statement is then the following.

Atomic decomposition of $L(\log L)^r$:
Let $f \in L(\log L)^{r}([0,1])$. Then there exist measurable sets $(E_j)_j$ and coefficients $(\alpha_j)_j$ such that

$\displaystyle f = \sum_{j} \alpha_j a_{E_j}$

and

$\displaystyle \sum_{j} |\alpha_j| \lesssim \|f\|_{L(\log L)^r}.$

# Proof of the square-function characterisation of L(log L)^{1/2}: part I

This is a follow-up on the post on the Chang-Wilson-Wolff inequality and how it can be proven using a lemma of Tao-Wright. The latter consists of a square-function characterisation of the Orlicz space $L(\log L)^{1/2}$ analogous in spirit to the better known one for the Hardy spaces.
In this post we will commence the proof of the Tao-Wright lemma, as promised. We will start by showing how the lemma, which is stated for smooth frequency projections, can be deduced from its discrete variant stated in terms of Haar coefficients (or equivalently, martingale differences with respect to the standard dyadic filtration). This is a minor part of the overall argument but it is slightly tricky so I thought I would spell it out.

Recall that the Tao-Wright lemma is as follows. We write $\widetilde{\Delta}_j f$ for the smooth frequency projection defined by $\widehat{\widetilde{\Delta}_j f}(\xi) = \widehat{\psi}(2^{-j}\xi) \widehat{f}(\xi)$, where $\widehat{\psi}$ is a smooth function compactly supported in $1/2 \leq |\xi| \leq 4$ and identically equal to 1 on $1 \leq |\xi| \leq 2$.

Lemma 1 – Square-function characterisation of $L(\log L)^{1/2}$ [Tao-Wright, 2001]:
For any $j \in \mathbb{Z}$ let

$\displaystyle \phi_j(x) := \frac{2^j}{(1 + 2^j |x|)^{3/2}}$

(notice $\phi_j$ is concentrated in $[-2^{-j},2^{-j}]$ and $\|\phi_j\|_{L^1} \sim 1$).
If the function ${f}$ is in $L(\log L)^{1/2}([-R,R])$ and such that $\int f(x) \,dx = 0$, then there exists a collection $(F_j)_{j \in \mathbb{Z}}$ of non-negative functions such that:

1. pointwise for any $j \in \mathbb{Z}$

$\displaystyle \big|\widetilde{\Delta}_j f\big| \lesssim F_j \ast \phi_j ;$

2. they satisfy the square-function estimate

$\displaystyle \Big\|\Big(\sum_{j \in \mathbb{Z}} |F_j|^2\Big)^{1/2}\Big\|_{L^1} \lesssim \|f\|_{L(\log L)^{1/2}}.$
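Incidentally, the normalisation $\|\phi_j\|_{L^1} \sim 1$ claimed in the statement is immediate from the substitution $u = 2^j x$, which makes the integral independent of $j$:

$\displaystyle \int_{\mathbb{R}} \frac{2^j}{(1 + 2^j |x|)^{3/2}} \, dx = \int_{\mathbb{R}} \frac{du}{(1 + |u|)^{3/2}} = 2 \int_0^{\infty} (1+u)^{-3/2} \, du = 4.$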

# The Chang-Wilson-Wolff inequality using a lemma of Tao-Wright

Today I would like to introduce an important inequality from the theory of martingales that will be the subject of a few more posts. This inequality will further provide the opportunity to introduce a very interesting and powerful result of Tao and Wright – a sort of square-function characterisation for the Orlicz space $L(\log L)^{1/2}$.

## 1. The Chang-Wilson-Wolff inequality

Consider the collection $\mathcal{D}$ of standard dyadic intervals that are contained in $[0,1]$. We let $\mathcal{D}_j$ for each $j \in \mathbb{N}$ denote the subcollection of intervals $I \in \mathcal{D}$ such that $|I|= 2^{-j}$. Notice that these subcollections generate a filtration, namely $(\sigma(\mathcal{D}_j))_{j \in \mathbb{N}}$, where $\sigma(\mathcal{D}_j)$ denotes the sigma-algebra generated by the collection $\mathcal{D}_j$. We can associate to this filtration the conditional expectation operators

$\displaystyle \mathbf{E}_j f := \mathbf{E}[f \,|\, \sigma(\mathcal{D}_j)],$

and therefore define the martingale differences

$\displaystyle \mathbf{D}_j f:= \mathbf{E}_{j+1} f - \mathbf{E}_{j}f.$

With this notation, we have the formal telescopic identity

$\displaystyle f = \mathbf{E}_0 f + \sum_{j \in \mathbb{N}} \mathbf{D}_j f.$

Demystification: the expectation $\mathbf{E}_j f(x)$ is simply $\frac{1}{|I|} \int_I f(y) \,dy$, where $I$ is the unique dyadic interval in $\mathcal{D}_j$ such that $x \in I$.

Letting $f_j := \mathbf{E}_j f$ for brevity, the sequence of functions $(f_j)_{j \in \mathbb{N}}$ is called a martingale (hence the name “martingale differences” above) because it satisfies the martingale property that the conditional expectation of “future values” at the present time is the present value, that is

$\displaystyle \mathbf{E}_{j} f_{j+1} = f_j.$

In the following we will only be interested in functions with zero average, that is functions such that $\mathbf{E}_0 f = 0$. Given such a function $f : [0,1] \to \mathbb{R}$ then, we can define its martingale square function $S_{\mathcal{D}}f$ to be

$\displaystyle S_{\mathcal{D}} f := \Big(\sum_{j \in \mathbb{N}} |\mathbf{D}_j f|^2 \Big)^{1/2}.$
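To demystify a little further, here is a tiny self-contained sketch (the code and names are mine) that implements $\mathbf{E}_j$ and $\mathbf{D}_j$ on a dyadic grid and verifies both the telescopic identity and the martingale property:

```python
# Dyadic-martingale sanity check on [0,1]: represent f by its values on the
# 2^J intervals of D_J, compute E_j f by averaging over the dyadic interval
# containing each point, and verify f = E_0 f + sum_{j<J} D_j f as well as
# the tower/martingale property E_j f_{j+1} = f_j.
J = 8
N = 2 ** J
f = [((i / N) ** 2 - 0.5) for i in range(N)]   # any sample function, constant on D_J-intervals

def E(j, g):
    # conditional expectation onto sigma(D_j): average g over blocks of length 2^(J-j)
    block = 2 ** (J - j)
    return [sum(g[k - k % block : k - k % block + block]) / block for k in range(N)]

telescoped = E(0, f)
for j in range(J):
    Dj = [a - b for a, b in zip(E(j + 1, f), E(j, f))]   # martingale difference D_j f
    telescoped = [t + d for t, d in zip(telescoped, Dj)]

assert all(abs(a - b) < 1e-12 for a, b in zip(telescoped, f))           # telescoping
assert all(abs(a - b) < 1e-12 for a, b in zip(E(3, E(4, f)), E(3, f)))  # E_3 f_4 = f_3
print("telescoping identity and martingale property verified")
```

The square function $S_{\mathcal{D}} f$ is then just the pointwise $\ell^2$ norm of the lists `Dj` computed in the loop.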

With these definitions in place we can state the Chang-Wilson-Wolff inequality as follows.

C-W-W inequality: Let ${f : [0,1] \to \mathbb{R}}$ be such that $\mathbf{E}_0 f = 0$. For any ${2\leq p < \infty}$ it holds that

$\displaystyle \boxed{\|f\|_{L^p([0,1])} \lesssim p^{1/2}\, \|S_{\mathcal{D}}f\|_{L^p([0,1])}.} \ \ \ \ \ \ (\text{CWW}_1)$

An important point about the above inequality is the behaviour of the constant in the Lebesgue exponent ${p}$, which is sharp. This can be seen by taking a “lacunary” function ${f}$ (essentially one where $\mathbf{D}_jf = a_j \in \mathbb{C}$, a constant) and randomising the signs using Khintchine’s inequality (indeed, ${p^{1/2}}$ is precisely the asymptotic behaviour of the constant in Khintchine’s inequality; see Exercise 5 in the 2nd post on Littlewood-Paley theory).
It should be remarked that the inequality extends very naturally and with no additional effort to higher dimensions, in which $[0,1]$ is replaced by the unit cube $[0,1]^d$ and the dyadic intervals are replaced by the dyadic cubes. We will only be interested in the one-dimensional case here though.
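The $p^{1/2}$ growth mentioned above can also be seen numerically. For the lacunary model with $a_j = 1$, the relevant object is $S = \varepsilon_1 + \dots + \varepsilon_N$, a sum of i.i.d. random signs; since $S = 2B - N$ with $B$ binomial, its moments are exact finite sums. A quick sketch (function name is mine):

```python
import math

def rademacher_Lp(N, p):
    # exact L^p norm of S = eps_1 + ... + eps_N for i.i.d. random signs eps_j:
    # S = 2B - N with B ~ Binomial(N, 1/2), so E|S|^p is a finite (exact) sum
    moment = sum(math.comb(N, k) * abs(2 * k - N)**p for k in range(N + 1)) / 2**N
    return moment ** (1 / p)

N = 1000
for p in (2, 4, 8, 16):
    ratio = rademacher_Lp(N, p) / (math.sqrt(p) * rademacher_Lp(N, 2))
    print(f"p = {p:2d}:  ||S||_p / (p^(1/2) ||S||_2) = {ratio:.4f}")
# the ratio stays between absolute constants: ||S||_p ~ p^(1/2) ||S||_2,
# in accordance with Khintchine's inequality
```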

# Hausdorff-Young inequality and interpolation

The Hausdorff-Young inequality is one of the most fundamental results about the mapping properties of the Fourier transform: it says that

$\displaystyle \| \widehat{f} \|_{L^{p'}(\mathbb{R}^d)} \leq \|f\|_{L^p(\mathbb{R}^d)}$

for all ${1 \leq p \leq 2}$, where $\frac{1}{p} + \frac{1}{p'} = 1$. It is important because it tells us that the Fourier transform maps $L^p$ continuously into $L^{p'}$, something which is not obvious when the exponent ${p}$ is not 1 or 2. When the underlying group is the torus, the corresponding Hausdorff-Young inequality is instead

$\displaystyle \| \widehat{f} \|_{\ell^{p'}(\mathbb{Z}^d)} \leq \|f\|_{L^p(\mathbb{T}^d)}.$

The optimal constant is actually less than 1 in general, and it has been calculated for $\mathbb{R}^d$ (and proven to be optimal) by Beckner, but this will not concern us here (if you want to find out what it is, take ${f}$ to be a Gaussian). In the notes on Littlewood-Paley theory we also saw (in Exercise 7) that the inequality is false for ${p}$ greater than 2, and we proved so using a randomisation trick enabled by Khintchine’s inequality.
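As a quick sanity check of the toral inequality, one can test it numerically on a trigonometric polynomial at $p = 4/3$, $p' = 4$ (in dimension $d = 1$). The sketch below is only an illustration with helper names of my own; the rectangle rule used for the Fourier coefficients is exact for trigonometric polynomials of low degree.

```python
import math, cmath

def fourier_coefficient(f, k, n=4096):
    # c_k = \int_0^1 f(x) e^{-2 pi i k x} dx via the rectangle rule, which is
    # exact (up to rounding) for trigonometric polynomials of degree < n/2
    return sum(f(j / n) * cmath.exp(-2j * math.pi * k * j / n) for j in range(n)) / n

def Lp_norm(f, p, n=4096):
    return (sum(abs(f(j / n))**p for j in range(n)) / n) ** (1 / p)

f = lambda x: math.cos(2 * math.pi * x) + 0.5 * math.cos(6 * math.pi * x)

p, q = 4/3, 4.0                       # conjugate exponents: 1/p + 1/q = 1
lhs = sum(abs(fourier_coefficient(f, k))**q for k in range(-8, 9)) ** (1 / q)
rhs = Lp_norm(f, p)
print(f"l^4 norm of coefficients = {lhs:.4f}  <=  L^(4/3) norm = {rhs:.4f}")
assert lhs <= rhs + 1e-8              # Hausdorff-Young on the torus
```

Here the nonzero coefficients are $c_{\pm 1} = 1/2$ and $c_{\pm 3} = 1/4$, so the left-hand side can also be checked by hand.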

Today I would like to talk about how the Hausdorff-Young inequality (H-Y) is proven and how important (or not) interpolation theory is to this inequality. I won’t be saying anything new or important, and ultimately this detour into H-Y will take us nowhere; but I hope the ride will be enjoyable.

## I found these fantastic lecture notes on Penrose’s…

### Status

I found these fantastic lecture notes on Penrose’s aperiodic tilings by Alexander F. Ritter, which he wrote for a masterclass at Oxford in 2014.

There is not much systematic, well-structured material on aperiodic tilings around, so I thought I would share this for whoever is interested. The material is accessible to any undergrad who has taken a class in real analysis (the proof of the Extension Theorem contains a nice application of extracting a convergent subsequence from a sequence in a compact space), and includes many painstaking hand-drawings that are very helpful in following the arguments. Besides being overall well-written, the notes are also good at hammering home some important points – for example, the fact that the Composition and Decomposition operations that one can perform on a tiling by Penrose’s Kites and Darts tiles are unique, and thus reversible. This has a cascade of consequences (they prove aperiodicity of the tilings in a few lines, for example) which do not apply to regular periodic tilings, precisely because there the operations are not unique.

# Carbery's proof of the Stein-Tomas theorem

Writing the article on Bourgain’s proof of the spherical maximal function theorem I suddenly recalled another interesting proof that uses a trick very similar to that of Bourgain – and apparently directly inspired from it. Recall that the “trick” consists of the following fact: if we consider only characteristic functions as our inputs, then we can split the operator in two, estimate these parts each in a different Lebesgue space, and at the end we can combine the estimates into an estimate in a single $L^p$ space by optimising in some parameter. The end result looks as if we had done “interpolation”, except that we are “interpolating” between distinct estimates for distinct operators!
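Schematically, the optimisation step works as follows: if the two pieces of the operator obey bounds of the form $A\lambda^{a}$ and $B\lambda^{-b}$ in some splitting parameter $\lambda > 0$ (with $a, b > 0$), then choosing $\lambda$ so that the two bounds match, i.e. $\lambda = (B/A)^{1/(a+b)}$, gives

$\displaystyle \inf_{\lambda > 0} \big( A\lambda^{a} + B\lambda^{-b} \big) \sim A^{\frac{b}{a+b}} B^{\frac{a}{a+b}},$

which is exactly the kind of geometric-mean bound that genuine interpolation would have produced.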

The proof I am going to talk about today is a very simple proof given by Tony Carbery of the well-known Stein-Tomas restriction theorem. The reason I want to present it is that I think it is nice to see different incarnations of a single idea, especially if applied to very distinct situations. I will not spend much time discussing restriction because there is plenty of material available on the subject and I want to concentrate on the idea alone. If you are already familiar with the Stein-Tomas theorem you will certainly appreciate Carbery’s proof.

As you might recall, the Stein-Tomas theorem says that if $R$ denotes the Fourier restriction operator of the sphere $\mathbb{S}^{d-1}$ (but of course everything that follows extends trivially to arbitrary positively-curved compact hypersurfaces), that is

$\displaystyle Rf = \widehat{f} \,\big|_{\mathbb{S}^{d-1}}$

(defined initially on Schwartz functions), then

Stein-Tomas theorem: $R$ satisfies the a-priori inequality

$\displaystyle \|Rf\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim_p \|f\|_{L^p(\mathbb{R}^d)} \ \ \ \ \ \ (1)$

for all exponents ${p}$ such that $1 \leq p \leq \frac{2(d+1)}{d+3}$ (and this is sharp, by the Knapp example).

There are a number of proofs of such statement; originally it was proven by Tomas for every exponent except the endpoint, and then Stein combined the proof of Tomas with his complex interpolation method to obtain the endpoint too (and this is still one of the finest examples of the power of the method around).
Carbery’s proof obtains the restricted endpoint inequality directly, and therefore obtains inequality (1) for all exponents $1 \leq p < \frac{2(d+1)}{d+3}$ by interpolation of Lorentz spaces with the $p=1$ case (which is a trivial consequence of the Hausdorff-Young inequality).

In other words, Carbery proves that for any (Borel) measurable set ${E}$ one has

$\displaystyle \|R \mathbf{1}_{E}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}, \ \ \ \ \ \ (2)$

where the RHS is precisely the $L^{2(d+1)/(d+3)}$ norm of the characteristic function $\mathbf{1}_E$. Notice that we could write the inequality equivalently as $\|\widehat{\mathbf{1}_{E}}\|_{L^2(\mathbb{S}^{d-1},d\sigma)} \lesssim |E|^{\frac{d+3}{2(d+1)}}$.

# Bourgain's proof of the spherical maximal function theorem

Recently I have presented Stein’s proof of the boundedness of the spherical maximal function: it was in part III of a set of notes on basic Littlewood-Paley theory. Recall that the spherical maximal function is the operator

$\displaystyle \mathcal{M}_{\mathbb{S}^{d-1}} f(x) := \sup_{t > 0} |A_t f(x)|,$

where $A_t$ denotes the spherical average at radius ${t}$, that is

$\displaystyle A_t f(x) := \int_{\mathbb{S}^{d-1}} f(x - t\omega) d\sigma_{d-1}(\omega),$

where $d\sigma_{d-1}$ denotes the spherical measure on the $(d-1)$-dimensional sphere (we will omit the subscript from now on and just write $d\sigma$ since the dimension will not change throughout the arguments). We state Stein’s theorem for convenience:

Spherical maximal function theorem [Stein]: The maximal operator $\mathcal{M}_{\mathbb{S}^{d-1}}$ is $L^p(\mathbb{R}^d) \to L^p(\mathbb{R}^d)$ bounded for any $\frac{d}{d-1} < p \leq \infty$.

There is however an alternative proof of the theorem due to Bourgain which is very nice and conceptually a bit simpler, in that instead of splitting the function into countably many dyadic frequency pieces it splits the spherical measure into two frequency pieces only. The other ingredients in the two proofs are otherwise pretty much the same: domination by the Hardy-Littlewood maximal function, Sobolev-type inequalities to control suprema by derivatives and oscillatory integral estimates for the Fourier transform of the spherical measure (and its derivative). However, Bourgain’s proof has an added bonus: remember that Stein’s argument essentially shows $L^p \to L^p$ boundedness of the operator for every $\frac{d}{d-1} < p \leq 2$ quite directly; Bourgain’s argument, on the other hand, proves the restricted weak-type endpoint estimate for $\mathcal{M}_{\mathbb{S}^{d-1}}$! The latter means that for any measurable $E$ of finite (Lebesgue) measure we have

$\displaystyle |\{x \in \mathbb{R}^d \; : \; \mathcal{M}_{\mathbb{S}^{d-1}}\mathbf{1}_E(x) > \alpha \}| \lesssim \frac{|E|}{\alpha^{d/(d-1)}}, \ \ \ \ \ \ (1)$

which is exactly the $L^{d/(d-1)} \to L^{d/(d-1),\infty}$ inequality but restricted to characteristic functions of sets (in the language of Lorentz spaces, it is the $L^{d/(d-1),1} \to L^{d/(d-1),\infty}$ inequality). The downside of Bourgain’s argument is that it only works in dimension $d \geq 4$, and thus misses the dimension $d=3$ that is instead covered by Stein’s theorem.

It seems to me that, while Stein’s proof is well-known and has a number of presentations around, Bourgain’s proof is less well-known – it does not help that the original paper is impossible to find. As a consequence, I think it would be nice to share it here. This post is thus another tribute to Jean Bourgain, much in the same spirit as the posts (III) on his positional-notation trick for sets.