# Fine structure of some classical affine-invariant inequalities and near-extremizers (account of a talk by Michael Christ)

I’m currently in Bonn, as mentioned in the previous post, participating in the Trimester Program organized by the Hausdorff Institute of Mathematics, although my time here is almost over. It has been a very pleasant experience: Bonn is lovely, the studio flat they got me is incredibly nice, Germany won the World Cup (nice game btw) and the talks were interesting. The 2nd week was pretty busy, since it had all the main talks plus some unexpected talks in number theory which I attended. The week before that was more relaxed, but I followed a couple of talks then as well. Here I want to report on Christ’s talk on his work of the last few years, because I found it very interesting and because I had the opportunity to follow a second talk, which was more specific to the Hausdorff-Young inequality and helped me clarify some details I was confused about. If you get a chance, go to his talks, they’re really good.

What follows is an account of Christ’s talks – there are probably countless out there, but here’s another one. This is by no means original work, it’s very close to the talks themselves and I’m doing it only as a way to understand better. I’ll stick to Christ’s notation too. Also, I’m afraid the bibliography won’t be very complete, but I have included his papers, you can make your way to the other ones from there.

1. Four classical inequalities and their extremizers

Prof. Christ introduced four famous apparently unrelated inequalities. These are

• the Hausdorff-Young inequality: for all functions ${f \in L^p (\mathbb{R}^d)}$, with ${1\leq p \leq 2}$,

$\displaystyle \boxed{\|\widehat{f}\|_{L^{p'}}\leq \|f\|_{L^p};} \ \ \ \ \ \ \ \ \ \ \text{(H-Y)}$

• the Young inequality for convolution: if ${1+\frac{1}{q_3}=\frac{1}{q_1}+\frac{1}{q_2}}$ then

$\displaystyle \|f \ast g\|_{L^{q_3}} \leq \|f\|_{L^{q_1}}\|g\|_{L^{q_2}};$

for convenience, he put it in trilinear form

$\displaystyle \boxed{ |\left\langle f\ast g, h \right\rangle|\leq \|f\|_{L^{p_1}}\|g\|_{L^{p_2}}\|h\|_{L^{p_3}}; } \ \ \ \ \ \ \ \ \ \ \text{(Y)}$

notice the exponents satisfy ${\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=2}$ (indeed ${q_1=p_1}$ and same for index 2, but ${p_3 = q'_3}$);

• the Brunn-Minkowski inequality: for any two measurable sets ${A,B \subset \mathbb{R}^d}$ of finite measure it is

$\displaystyle \boxed{ |A+B|^{1/d} \geq |A|^{1/d} + |B|^{1/d}; } \ \ \ \ \ \ \ \ \ \ \text{(B-M)}$

• the Riesz-Sobolev inequality: this is a rearrangement inequality, of the form

$\displaystyle \boxed{ \left\langle \chi_A \ast \chi_B, \chi_C \right\rangle \leq\left\langle \chi_{A^\ast} \ast \chi_{B^\ast}, \chi_{C^\ast} \right\rangle,} \ \ \ \ \ \ \ \ \ \ \text{(R-S)}$

where ${A,B,C}$ are measurable sets and, given a set ${E}$, the notation ${E^\ast}$ stands for the symmetrized set given by the ball ${B(0, c_d |E|^{1/d})}$, where ${c_d}$ is a constant s.t. ${|E|=|E^\ast|}$: it’s the ball centered at the origin with the same volume as ${E}$.

These inequalities share a large group of symmetries, indeed they are all invariant w.r.t. the group of affine invertible transformations (which includes dilations and translations) – an uncommon feature. Moreover, for all of them the extremizers exist and have been characterized in the past. A natural question then arises

Is it true that if ${f}$ (or ${E}$, or ${\chi_E}$ where appropriate) is close to realizing the equality, then ${f}$ must also be close (in an appropriate sense) to an extremizer of the inequality?

Another way to put it is to think of these questions as relative to the stability of the extremizers, and that’s why they are referred to as fine structure of the inequalities. If proving the inequality is the first level of understanding it, answering the above question is the second level. As an example, answering the above question for (H-Y) led to a sharpened inequality. Christ’s work was motivated by the fact that nobody seemed to have addressed the question before in the literature, despite being a very natural one to ask.

Now, I’ll get to the heart of the matter soon, but first I have to introduce the extremizers and make the above more rigorous. Let’s look into each inequality separately.

1.1. Hausdorff-Young inequality

First of all, notice that the exponents in the inequality make it so that the inequality is invariant w.r.t. the group of invertible linear transformations ${\mathrm{GL}(\mathbb{R}^d)}$: indeed if ${T\in \mathrm{GL}(\mathbb{R}^d)}$

$\displaystyle \widehat{f\circ T}(\xi) = |\det T|^{-1} \widehat{f}((T^{-1})^\ast \xi),$

and therefore ${\|\widehat{f\circ T}\|_{L^{p'}} = |\det T|^{-1/p} \|\widehat{f}\|_{L^{p'}} \leq |\det T|^{-1/p} \|f\|_{L^p} = \|f \circ T\|_{L^p}}$. Moreover, since translations in physical space are modulations in the frequency space, they don’t change the magnitude of ${\widehat{f}}$, and thus the inequality is invariant w.r.t. translations as well. It follows that the (H-Y) inequality is invariant with respect to the full group of affine invertible transformations. Dually, it’s invariant to modulations as well, and finally to scalar multiplication.

It’s probably well known to anyone that ${1}$ isn’t the best constant in the H-Y inequality for ${\mathbb{R}^d}$ [1]: it was proved by Beckner [Be] that ${\|\widehat{f}\|_{L^{p'}}\leq A_p^d\|f\|_{L^p}}$ with

$\displaystyle A_p = p^{1/2p} p'^{-1/2p'}.$

What Beckner proved is that Gaussians are extremizers of H-Y (i.e. functions that realize the equality ${\|\widehat{f}\|_{L^{p'}}= A_p^d\|f\|_{L^p}}$): a gaussian function is of the form

$\displaystyle \mathcal{G}(x)=C e^{-\left\langle Qx,x\right\rangle + b\cdot x},$

with ${Q}$ a positive definite matrix. Later Lieb proved that actually all the extremizers of H-Y are gaussians (he did so by exploiting the symmetries pointed out above and the tensorial structure of the Fourier transform in multiple dimensions).
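As a sanity check of Beckner’s constant, here is a minimal Python sketch (mine, not from the talk) that compares ${A_p}$ with the ratio ${\|\widehat{f}\|_{L^{p'}}/\|f\|_{L^p}}$ for the gaussian ${f(x)=e^{-\pi x^2}}$ in ${d=1}$. It assumes the normalization ${\widehat{f}(\xi)=\int f(x)e^{-2\pi i x\xi}\,dx}$, under which this gaussian is its own Fourier transform; the function names and the midpoint-rule quadrature are just illustrative choices.

```python
import math


def beckner_constant(p):
    """A_p = p^(1/(2p)) * p'^(-1/(2p')) for 1 < p <= 2, where 1/p + 1/p' = 1."""
    if not 1.0 < p <= 2.0:
        raise ValueError("expected 1 < p <= 2")
    p_dual = p / (p - 1.0)
    return p ** (1.0 / (2.0 * p)) * p_dual ** (-1.0 / (2.0 * p_dual))


def lq_norm_of_gaussian(q, half_width=8.0, steps=20000):
    """Midpoint-rule approximation of the L^q norm of exp(-pi x^2) on the real line."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        total += math.exp(-math.pi * q * x * x) * h
    return total ** (1.0 / q)


def gaussian_hy_ratio(p):
    """||f^||_{p'} / ||f||_p for f(x) = exp(-pi x^2), assumed to be its own Fourier transform."""
    p_dual = p / (p - 1.0)
    return lq_norm_of_gaussian(p_dual) / lq_norm_of_gaussian(p)
```

Calling `gaussian_hy_ratio(1.5)` and `beckner_constant(1.5)` should give the same number up to quadrature error, which is the statement that this gaussian realizes equality.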

An interesting observation is that one can prove Young’s inequality (Y) for a restricted range of exponents from (H-Y) and Hölder: since ${\widehat{f\ast g} = \widehat{f}\widehat{g}}$, with ${1+\frac{1}{r}=\frac{1}{p}+\frac{1}{q}}$, and assuming ${1\leq p,q,r'\leq 2}$

$\displaystyle \|f\ast g\|_{L^r} = \|\widehat{(\widehat{f}\,\widehat{g})}\|_{L^r} \leq \|\widehat{f}\widehat{g}\|_{L^{r'}} \leq \|\widehat{f}\|_{L^{p'}}\|\widehat{g}\|_{L^{q'}} \leq \|f\|_{L^p} \|g\|_{L^q};$

one can indeed verify that ${\frac{1}{r'} = \frac{1}{p'} + \frac{1}{q'}}$.
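The exponent bookkeeping in this argument is easy to get wrong, so here is a small Python sketch of it using exact fractions. It is only my own check of the arithmetic (the function names are made up): given ${p,q}$ it computes ${r}$, verifies that (H-Y) is applicable to ${p}$, ${q}$ and ${r'}$, and verifies the Hölder relation.

```python
from fractions import Fraction


def dual(p):
    """Conjugate exponent p' with 1/p + 1/p' = 1 (requires p > 1)."""
    return p / (p - 1)


def young_exponent_via_hy(p, q):
    """Return r with 1 + 1/r = 1/p + 1/q, after checking the steps of the argument."""
    p, q = Fraction(p), Fraction(q)
    if p <= 1 or q <= 1:
        raise ValueError("this sketch assumes p, q > 1")
    inv_r = 1 / p + 1 / q - 1
    if inv_r <= 0:
        raise ValueError("need 1/p + 1/q > 1")
    r = 1 / inv_r
    # (H-Y) is used for f in L^p, g in L^q and for the product in L^{r'}.
    for exponent in (p, q, dual(r)):
        if not 1 <= exponent <= 2:
            raise ValueError("exponent outside the range of (H-Y)")
    # Hoelder step: 1/r' = 1/p' + 1/q'.
    assert 1 / dual(r) == 1 / dual(p) + 1 / dual(q)
    return r
```

For example ${p=q=4/3}$ gives ${r=2}$, the endpoint of the restricted range.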

Finally, Christ remarked (in the 2nd talk) that (H-Y) enjoys a form of strong non-locality: if you take ${M,N \gg 1}$ and form the set ${E = \bigcup_{j=1}^{N}{\left[ j - \frac{1}{M}, j +\frac{1}{M}\right]}}$ (the union of ${N}$ intervals centered at ${j = 1, \ldots, N}$, each of length ${2M^{-1}}$), then there exists ${c}$ s.t. for ${p>1}$

$\displaystyle \frac{\|\widehat{\chi_E}\|_{L^{p'}}}{\|\chi_E\|_{L^p}}\geq c > 0$

uniformly in ${M, N}$. This is to be interpreted as saying that the ratio is essentially independent of the relative separation of the intervals, and thus one can’t exclude a priori near-extremizing functions that concentrate in more than one point (and thus aren’t close to gaussians, which are concentrated only in one point). This case actually presents itself explicitly in the proof of Thm. 3 below.
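For ${p=4/3}$ (so ${p'=4}$) the ratio can be computed by hand, which makes the non-locality concrete. The sketch below is my own check, not something from the talk: it uses Plancherel in the form ${\|\widehat{g}\|_{L^4}^4=\|g\ast g\|_{L^2}^2}$ and assumes ${M\geq 4}$, so that ${\chi_E\ast\chi_E}$ is a sum of non-overlapping tents centered at the integers ${2,\ldots,2N}$. The function name is hypothetical.

```python
def hy_ratio_p43(num_intervals, M):
    """||chi_E^||_4 / ||chi_E||_{4/3} for E = union of [j - 1/M, j + 1/M], j = 1..N.

    Assumes M >= 4, so the tents making up chi_E * chi_E do not overlap.
    """
    if num_intervals < 1 or M < 4:
        raise ValueError("need N >= 1 and M >= 4")
    length = 2.0 / M  # length of each interval
    # chi_I * chi_I is a tent of height `length` and half-width `length`;
    # its squared L^2 norm is 2 * length^3 / 3.
    tent_l2_squared = 2.0 * length ** 3 / 3.0
    # The tent centered at m appears once for each pair (j, k) with j + k = m.
    multiplicities = [min(m - 1, 2 * num_intervals + 1 - m)
                      for m in range(2, 2 * num_intervals + 1)]
    conv_l2_squared = tent_l2_squared * sum(c * c for c in multiplicities)
    fourier_l4_norm = conv_l2_squared ** 0.25            # Plancherel
    l43_norm = (num_intervals * length) ** 0.75          # |E|^(3/4)
    return fourier_l4_norm / l43_norm
```

The fourth power of the ratio comes out as ${\frac{2}{9}(2+N^{-2})}$: it does not depend on ${M}$ at all and never drops below ${4/9}$, however many intervals one takes.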

1.2. Young’s convolution inequality

The optimal constant for Young’s inequality is ${<1}$ when the exponents are in the range ${]1,\infty[}$, and this again was proved by Beckner, with some more general result by Brascamp-Lieb. For the curious reader, the optimal constant is

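$\displaystyle C_{p_1,p_2,p_3}^d = \left(A_{p_1} A_{p_2} A_{p_3}\right)^{d} = \left(\frac{A_{q_1} A_{q_2}}{A_{q_3}}\right)^{d},$

where the second expression is in terms of the exponents of the bilinear form and uses ${A_{p'} = A_p^{-1}}$ together with ${p_3 = q_3'}$.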

What’s more important here is that again all the extremizers are (particular) triplets of gaussian functions ${(\mathcal{G}_1,\mathcal{G}_2,\mathcal{G}_3)}$, and this is due to Lieb, and there’s a nice proof via the heat flow monotonicity method by Carlen, Lieb and Loss (2004). The proof can be sketched as follows: let ${I_0}$ be the quantity

$\displaystyle I_0:= \int{\int{f_1(x)f_2(y)f_3(x+y)}\,dx}\,dy,$

which is essentially ${\left\langle f_1\ast f_2, f_3 \right\rangle}$. We assume the functions are positive. The idea is to use the ${f_j}$‘s as initial data and let them evolve indefinitely under the heat flow (i.e. treat them as temperatures). It’s known that they will then approach gaussians under appropriate rescaling, but we have to make sure they’ll approach equality in (Y) too. This will be allowed by the fact that the heat equation flow preserves the ${L^1}$ mass. Introducing the time variable ${t}$, let initial data ${f_j^{p_j}}$ evolve under the heat flow, i.e.

$\displaystyle u_j(x,t):= (e^{c_j t \Delta} f_j^{p_j})(x),$

so that ${u_j}$ satisfies

$\displaystyle \frac{\partial u_j}{\partial t} = c_j \Delta u_j;$

then consider the quantity

$\displaystyle I(t) := \int{\int{u_1(x,t)^{1/p_1}u_2(y,t)^{1/p_2}u_3(x+y,t)^{1/p_3}}\,dx}\,dy;$

thus ${I(0)=I_0}$. A direct calculation shows that ${I'(t)\geq 0}$, thus ${I(t)\geq I_0}$. But by the change of variables ${x \mapsto t^{1/2}x, y\mapsto t^{1/2}y}$ and the relationship amongst the exponents, ${I(t)}$ is also equal to the quantity

$\displaystyle \int{\int{\left(t^{d/2} u_1(t^{1/2}x,t)\right)^{1/{p_1}}\left(t^{d/2} u_2(t^{1/2}y,t)\right)^{1/{p_2}}\left(t^{d/2} u_3(t^{1/2}(x+y),t)\right)^{1/{p_3}}}\,dx}\,dy.$

Now, as anticipated

$\displaystyle t^{d/2} u_j (t^{1/2} x, t) \rightarrow \mathcal{G}_j (x) \text{ (gaussian)},$

and thus by Fatou’s lemma [2]

$\displaystyle I_0 \leq \int{\int{\mathcal{G}_1 (x)^{1/{p_1}}\mathcal{G}_2 (y)^{1/{p_2}}\mathcal{G}_3 (x+y)^{1/{p_3}}}\,dx}\,dy.$

Finally, the gimmick is that since mass is preserved by the flow, then ${\|\mathcal{G}_j^{1/{p_j}}\|_{L^{p_j}} = \left(\int{\mathcal{G}_j}\,dx\right)^{1/{p_j}}=\left(\int{u_j (x,0)}\,dx\right)^{1/{p_j}} = \|f_j\|_{L^{p_j}}}$, and thus in the end

$\displaystyle \frac{\left\langle f_1 \ast f_2 , f_3 \right\rangle}{\|f_1\|_{L^{p_1}}\|f_2\|_{L^{p_2}}\|f_3\|_{L^{p_3}}} \leq \frac{\left\langle \mathcal{G}_1^{1/{p_1}} \ast \mathcal{G}_2^{1/{p_2}} , \mathcal{G}_3^{1/{p_3}} \right\rangle}{\|\mathcal{G}_1^{1/{p_1}}\|_{L^{p_1}}\|\mathcal{G}_2^{1/{p_2}}\|_{L^{p_2}}\|\mathcal{G}_3^{1/{p_3}}\|_{L^{p_3}}}.$

We remark here that the (Y) inequality is invariant w.r.t. the affine invertible transformations as well.

An interesting observation is that (Y) implies (H-Y) for even exponents, again thanks to the identity ${\widehat{f\ast g} = \widehat{f}\,\widehat{g}}$: indeed, let ${k\in\mathbb{N}}$, then by Plancherel and (Y)

$\displaystyle \|\widehat{f}\|_{L^{2k}}^{2k} = \|\widehat{f}^k\|_{L^2}^2 = \|\underbrace{f \ast \cdots \ast f}_{k}\|_{L^2}^2 \leq \|\underbrace{f \ast \cdots \ast f}_{k-1}\|_{L^{\frac{2k}{k+1}}}^2 \|f\|_{L^{\frac{2k}{2k-1}}}^2$

$\displaystyle \leq \cdots \leq \|f\|_{L^{\frac{2k}{2k-1}}}^{2k},$

and ${\frac{2k}{2k-1}}$ is the dual exponent to ${2k}$. This is indeed how Young himself proved this particular case of (H-Y), before Hausdorff proved the result for all exponents ${p'\geq 2}$.
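The dots in the chain above hide a sequence of applications of (Y). A quick way to convince oneself that the exponents match is to track them with exact fractions; the following Python sketch (my own, with a made-up function name) lists the reciprocal exponents ${1/a_j}$ such that the ${j}$-fold convolution of ${f}$ is controlled in ${L^{a_j}}$, starting from ${f\in L^{2k/(2k-1)}}$.

```python
from fractions import Fraction


def young_chain(k):
    """Reciprocals 1/a_j, j = 1..k, with the j-fold convolution f * ... * f in L^{a_j}."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    inv_p = Fraction(2 * k - 1, 2 * k)  # f itself lies in L^p with p = 2k/(2k-1)
    chain = [inv_p]
    for _ in range(k - 1):
        # Young: 1 + 1/a_{j+1} = 1/a_j + 1/p
        chain.append(chain[-1] + inv_p - 1)
    return chain
```

The last entry is always ${1/2}$, i.e. the ${k}$-fold convolution lands in ${L^2}$, and the second to last is ${(k+1)/(2k)}$, which is the exponent appearing in the first step of the chain.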

1.3. Brunn-Minkowski inequality

Observe there are no upper bounds on ${|A+B|}$ in terms of ${|A|}$ and ${|B|}$, since one can well have ${|A|=|B|=0}$ but ${|A+B|\neq 0}$ (e.g. take ${A=\{0\}\times [0,1]}$, ${B=[0,1]\times\{0\}}$, so that ${A+B = [0,1]^2}$).

In this case the extremizers are the convex sets: ${A}$ must be convex and ${B}$ must be homothetic to it (i.e. there’s a translation+homothety that is a bijection between ${A}$ and ${B}$). This was proved by Minkowski.

In this case we have again invariance w.r.t. the affine invertible transformations, because ${|T(A)| = |\det T| |A|}$ and a linear map ${T}$ distributes over the Minkowski sum, ${T(A+B) = T(A) + T(B)}$; moreover, the Minkowski sum commutes with translations. Notice that convexity is preserved by affine invertible maps, thus convex sets are transformed into convex sets.
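A case where everything is computable is that of axis-parallel boxes, since the Minkowski sum of two boxes is the box whose sides are the sums of the sides. The Python sketch below is only an illustration of (B-M) on this special family (the function name is mine): it returns the gap ${|A+B|^{1/d}-|A|^{1/d}-|B|^{1/d}}$, which should vanish when the boxes are homothetic and be positive otherwise.

```python
import math


def bm_gap_boxes(sides_a, sides_b):
    """|A+B|^(1/d) - |A|^(1/d) - |B|^(1/d) for axis-parallel boxes with the given side lengths."""
    if len(sides_a) != len(sides_b) or not sides_a:
        raise ValueError("both boxes need the same positive dimension")
    d = len(sides_a)
    vol_a = math.prod(sides_a)
    vol_b = math.prod(sides_b)
    # The Minkowski sum of two boxes is the box with sides a_i + b_i.
    vol_sum = math.prod(a + b for a, b in zip(sides_a, sides_b))
    return vol_sum ** (1.0 / d) - vol_a ** (1.0 / d) - vol_b ** (1.0 / d)
```

With sides `[0, 1]` and `[1, 0]` one recovers the degenerate example above: both boxes have measure zero while the sum is the unit square, so the gap is ${1}$.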

1.4. Riesz-Sobolev inequality

Notice the inequality actually holds for general positive functions if you use the symmetric decreasing rearrangements (but it is equivalent to this one). In the case of characteristic functions though the RHS has a particularly simple expression: for balls ${B_1, B_2, B_3}$ centered in the origin indeed

$\displaystyle \left\langle \chi_{B_1} \ast \chi_{B_2}, \chi_{B_3} \right\rangle = \int_{B_3}{|B_2 \cap (B_1+x)|}\,dx.$
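In ${d=1}$ both sides of (R-S) can be computed exactly when the sets are finite unions of unit cells ${[k,k+1)}$, which gives a cheap way to test the inequality. The following Python sketch is my own illustration (the helper names are made up): the LHS reduces to counting triples, since a triple of cells indexed by ${(a,b,c)}$ contributes ${1/2}$ exactly when ${c-a-b\in\{0,1\}}$, and the RHS is the integral of a trapezoid over a centered interval.

```python
from fractions import Fraction
from itertools import combinations


def lhs_unit_cells(A, B, C):
    """<chi_A * chi_B, chi_C> for unions of unit cells [k, k+1), k in the integer sets A, B, C."""
    cells_c = set(C)
    # Cells [a, a+1) and [b, b+1) convolve to a tent on [a+b, a+b+2);
    # it puts mass 1/2 on the cell a+b and mass 1/2 on the cell a+b+1.
    count = sum((a + b in cells_c) + (a + b + 1 in cells_c) for a in A for b in B)
    return Fraction(count, 2)


def rhs_intervals(a, b, c):
    """<chi_I * chi_J, chi_K> for intervals I, J, K centered at 0 with lengths a, b, c."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    plateau = min(a, b)              # maximum of chi_I * chi_J
    flat = abs(a - b) / 2            # the plateau is |x| <= flat
    support = (a + b) / 2            # chi_I * chi_J vanishes for |x| >= support
    h = c / 2
    if h <= flat:
        return 2 * plateau * h
    if h <= support:
        return 2 * (plateau * flat + support * (h - flat) - (h * h - flat * flat) / 2)
    return a * b


def nonempty_subsets(n):
    """All non-empty subsets of {0, ..., n-1}."""
    return [set(c) for r in range(1, n + 1) for c in combinations(range(n), r)]
```

Looping over all triples from `nonempty_subsets(4)` one can compare `lhs_unit_cells(A, B, C)` with `rhs_intervals(len(A), len(B), len(C))`; equality occurs for instance for ${A=B=[0,2)}$ and ${C=[1,3)}$, where ${C}$ is centered at the center of ${A+B}$.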

In the case of the Riesz-Sobolev inequality for characteristic functions, the extremizers are triplets ${A,B,C}$ that are homothetic to the same ellipsoid – on the condition that

$\displaystyle |C|^{1/d} < |A|^{1/d} + |B|^{1/d}$

holds for all the permutations of ${A,B,C}$ (i.e. their “radii” are comparable). This was proved by Burchard in ’96. The reason why it has to be ellipsoids is the affine invariance that holds for this inequality as well: indeed, consider an invertible linear transformation ${T}$, then ${\chi_A \circ T = \chi_{T^{-1}A}}$ and ${(T^{-1} A)^\ast}$ is a ball of volume ${|T^{-1} A| = |A| |\det T|^{-1}}$, thus ${\chi_{(T^{-1}A)^\ast}(x) = \chi_{A^\ast}(|\det T|^{1/d} x)}$. Thus, since the inequality is certainly an equality for balls centered at the origin, it is so for balls transformed under ${GL(\mathbb{R}^d)}$ too, which are in fact ellipsoids.

Here we can spot another relationship amongst the inequalities: using (R-S) we can prove (B-M). Indeed, consider measurable sets ${A,B}$ and notice that since ${A^\ast, B^\ast}$ are balls then ${|A^\ast + B^\ast| = (|A^\ast|^{1/d}+|B^\ast|^{1/d})^d= (|A|^{1/d}+|B|^{1/d})^d}$. Now it suffices to prove that ${A^\ast + B^\ast \subseteq (A+B)^\ast}$. Since ${\mathrm{Supp}(\chi_A \ast \chi_B) = A+B}$ (except maybe for a set of null measure [3]), by (R-S)

$\displaystyle \int{\chi_A\ast \chi_B}=\int_{A+B}{\chi_A\ast \chi_B} \leq \int_{(A+B)^\ast}{\chi_{A^\ast}\ast \chi_{B^\ast}},$

but ${\int{\chi_A\ast \chi_B} = |A||B|=|A^\ast||B^\ast| = \int{\chi_{A^\ast}\ast \chi_{B^\ast}}}$, and since ${\mathrm{Supp}({\chi_{A^\ast}\ast \chi_{B^\ast}}) = A^\ast + B^\ast}$ we have that ${(A+B)^\ast}$ contains at least this support.

I will have more to say about this in the near future because I’m planning to read the paper relative to the near-extremizers of this inequality [ChRS] (because it seems more approachable than the others, starting from zero as I do).

2. Christ’s results

Let me sum up in a table what has been said so far

| Invariance | H-Y | Y | B-M | R-S |
|---|---|---|---|---|
| ${GL(\mathbb{R}^d)}$ | ✓ | ✓ | ✓ | ✓ |
| translation | ✓ | ✓ | ✓ | ✓ |
| modulation | ✓ | | | |
| extremizers | gaussians | triplets of gaussians | convex sets | ellipsoids |

Remark 1 There are connections amongst all the extremizers seen so far: besides the trivial fact that an ellipsoid is a convex body, we could notice that an ellipsoid ${\mathcal{E}}$ centered at the origin can be specified as

$\displaystyle \mathcal{E} = \{x \in \mathbb{R}^d \,:\, |\left\langle Qx, x\right\rangle|\leq 1\}$

for ${Q}$ a positive definite matrix; in particular, if ${\lambda_1, \ldots, \lambda_d}$ are the eigenvalues of ${Q}$, the semi-axes of ${\mathcal{E}}$ have lengths ${\lambda_1^{-1/2}, \ldots, \lambda_d^{-1/2}}$. Thus we can associate gaussians and ellipsoids in a natural way.

One thing that Christ noticed is that all the above inequalities have some additive combinatorial structure. What does he mean by this? Well, first of all, consider the following objects:

Definition 1 A discrete multi-progression in ${\mathbb{R}^d}$ is a set of the form

$\displaystyle P = \{u + \sum_{i=1}^{r}{n_i v_i}\,:\, 0\leq n_i \leq N_i\},$

where the ${v_i}$‘s are vectors in ${\mathbb{R}^d}$ (not required to be linearly independent) and the ${N_i}$ are arbitrary non-negative integers. One defines the rank of ${P}$ as ${\mathrm{rk} P = r}$. Notice it can be bigger than ${d}$.
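To get a feeling for the additive structure, here is a small Python sketch (my own, with made-up parameters) that builds a rank ${2}$ discrete multi-progression inside ${\mathbb{Z}\subset\mathbb{R}^1}$, so with rank bigger than the dimension, and computes its sumset. For this example, in which all the points ${u+\sum n_i v_i}$ are distinct, the sumset is small in the sense that ${|P+P|\leq 2^{r}|P|}$.

```python
from itertools import product


def multiprogression(u, generators, sizes):
    """The set {u + sum_i n_i * v_i : 0 <= n_i <= N_i} for integer u, v_i (so d = 1)."""
    return {
        u + sum(n * v for n, v in zip(ns, generators))
        for ns in product(*(range(N + 1) for N in sizes))
    }


def sumset(P, Q):
    """Minkowski sum of two finite sets of integers."""
    return {p + q for p in P for q in Q}
```

With `P = multiprogression(7, (1, 100), (5, 3))` one gets ${|P|=24}$ and ${|P+P|=77\leq 4\cdot 24}$, while a generic set of ${24}$ integers would have a sumset of ${300}$ elements.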

Discrete multi-progressions are thought of as a generalization of arithmetic progressions, and they are affine-invariant objects: rank is preserved by an invertible linear transformation. Combining this with the previous observations one is led to

Definition 2 A continuum multi-progression in ${\mathbb{R}^d}$ is a set of the form

$\displaystyle Q = P+K$

where ${P}$ is a discrete multi-progression and ${K}$ is a convex compact set. The rank of ${Q}$ is defined as the rank of ${P}$.

[Figure: a multiprogression of rank 2. The centers of the balls form a discrete multiprogression.]

Thus affine invertible transformations preserve the continuum multi-progressions (from now on referred to simply as multi-progressions). These objects proved to be fundamental in the theory for (Y), (B-M) and (R-S), and finally for (H-Y) by partially reducing to (Y). They encode enough additive structure to be treated with combinatorial results. See below.

Another thing to notice is that the R-S inequality on (say) ${\mathbb{Z}^d}$ is counting the number of pairs ${(a,b)}$ s.t. ${a+b \in C}$. Analogously, one could see (Y) as a weighted estimate on sumsets.

In the rest of this section I will collect the results of Christ for the above inequalities, in the order given above.

Theorem 3 (Sharpened Hausdorff-Young inequality, [ChHY])
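For every ${d\geq 1}$ and ${1<p<2}$ there exists a constant ${c=c(p,d)>0}$ s.t. for every non-null ${f\in L^p(\mathbb{R}^d)}$ it holds (the endpoint ${p=2}$ is excluded because there, by Plancherel, every ${f}$ is an extremizer)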

$\displaystyle \boxed{\|\widehat{f}\|_{L^{p'}}\leq \left(A_p^d - c \frac{\mathrm{dist}_{L^p}(f,\mathcal{G})^2}{\|f\|_{L^p}^2}\right)\|f\|_{L^p},} \ \ \ \ \ (1)$

where ${\mathcal{G}}$ is the set of all gaussian functions.

Moreover, if ${\mathrm{dist}_{L^p}(f,\mathcal{G}) \,\|f\|_{L^p}^{-1}}$ is sufficiently small, then

$\displaystyle \|\widehat{f}\|_{L^{p'}}\leq \left(A_p^d - B_{p,d} \frac{\mathrm{dist}_{L^p}(f,\mathcal{G})^2}{\|f\|_{L^p}^2}\right)\|f\|_{L^p} + \,o\left(\frac{\mathrm{dist}_{L^p}(f,\mathcal{G})^2}{\|f\|_{L^p}^2}\right)\|f\|_{L^p}, \ \ \ \ \ \ \ \ \ \ \text{(1')}$

where ${B_{p,d}:=\frac{(p-1)(2-p)}{2}A_p^d}$.

Prof. Christ also pointed out that ${B_{p,d}}$ is not sharp.

Theorem 4 (Near-extremizers for (Y), [ChY])
If ${f_1, f_2, f_3}$ are such that for some ${1\gg \delta>0}$

$\displaystyle |\left\langle f_1 \ast f_2, f_3\right\rangle|>(1-\delta)C_{p_1,p_2,p_3}^d \|f_1\|_{L^{p_1}}\|f_2\|_{L^{p_2}}\|f_3\|_{L^{p_3}},$

then there exists ${\varepsilon = \varepsilon(\delta)>0}$ and a triplet of gaussians ${(\mathcal{G}_1, \mathcal{G}_2, \mathcal{G}_3)}$ s.t.

$\displaystyle \|f_j - \mathcal{G}_j\|_{L^{p_j}} \leq \varepsilon(\delta) \|f_j\|_{L^{p_j}} \qquad j=1,2,3.$

Moreover, ${\varepsilon(\delta) \rightarrow 0}$ as ${\delta\rightarrow 0}$.

Theorem 5 (Near-extremizers for (B-M), [ChBM])
Let ${A,B\subset \mathbb{R}^d}$ be Borel measurable sets. If ${0<\delta \ll 1}$ and

$\displaystyle |A+B|^{1/d} < |A|^{1/d}+|B|^{1/d} + \delta \max\{|A|,|B|\}^{1/d},$

then there exists a convex set ${K}$ and an ${\varepsilon=\varepsilon(\delta)>0}$ s.t.

$\displaystyle A \subset K \quad \text{ and } |K|\leq (1+\varepsilon) |A|.$

Moreover, ${\varepsilon(\delta) \rightarrow 0}$ as ${\delta \rightarrow 0}$.

Theorem 6 (Sharpened Riesz-Sobolev inequality, [ChRS])
There exists a constant ${c_0>0}$ s.t. if ${A, B,C \subset \mathbb{R}^1}$ are measurable sets of finite measure for which (for a certain ${\delta\leq 1}$) it holds

$\displaystyle |A|+|B|>|C|+\delta \max\{|A|,|B|,|C|\},$

and the same holds for all permutations of ${A,B,C}$ (i.e. their sizes are all comparable), then

$\displaystyle \boxed{ \left\langle \chi_A \ast \chi_B , \chi_C\right\rangle \leq \left\langle \chi_{A^\ast} \ast \chi_{B^\ast} , \chi_{C^\ast}\right\rangle - c_0 \delta^2 \inf_{I \text{ interval}}{|A\Delta I|^2}. } \ \ \ \ \ (7)$

These comments will regard mainly the proof of Theorem 3, but because of the common structure of the inequalities there are several aspects of this proof that are shared by all of the proofs. In particular, I want to point out that the result for (B-M) is needed in the proof of (R-S), which is in turn needed for the proof of (Y), which is in turn needed to prove the result for (H-Y), as I will highlight below. All in all the proof for (H-Y) is a complicated one and I’m just trying to illustrate (and understand myself) its structure – I will be very sketchy. I’m not able to offer any insight on it at the moment.

Assume ${d=1}$ in the following.

A common method to prove existence of extremizers [4] is that of a pre-compactness argument: for a linear operator ${T:A\rightarrow B}$ that satisfies ${\|T f\|_B \leq C\|f\|_A}$, one seeks to prove that if a sequence of functions ${\{f_n\}_n}$ is s.t. ${\|f_n\|_A = 1}$ and ${\|Tf_n\|_B \rightarrow C}$, then there must be a subsequence ${\{f_{n_k}\}_k}$ that converges in ${A}$-norm (in other words, the sequence is pre-compact in ${A}$) and whose limit realizes equality. Part of the proof of Thm. 3 relies on a qualitative result of this kind:

Theorem 7 (Theorem ${\varepsilon}$) For every ${\varepsilon > 0}$, there exists ${\delta >0}$ s.t. if

$\displaystyle \|\widehat{f}\|_{L^{p'}} > (1-\delta)A_p^d \|f\|_{L^p}$

then

$\displaystyle \mathrm{dist}_{L^p}(f,\mathcal{G}) \leq \varepsilon \|f\|_{L^p}.$

One can verify it is indeed equivalent to the above, with the exception that you conclude pre-compactness of a suitably normalized sequence, i.e. ${\{\tilde{f_{n_k}}\}_k}$ obtained by renormalizing with some element of the symmetry group of (H-Y). This is then combined with formula (1′) to deduce Thm. 3. Formula (1′) looks like a Taylor asymptotic expansion, and indeed it is! It’s the asymptotic expansion of the functional ${f \mapsto \|\widehat{f}\|_{p'} / \|f\|_p}$ around a gaussian. Quite surprising, if you ask me. Of course it isn’t trivial to come to (1′) at all, and indeed it required proving a general lemma for the second variation of this and similar functionals, but I don’t want to further comment on it as it’s not as interesting to me as the other ideas in the proof.

This said, the rest of the proof boils down to the proof of Thm. ${\varepsilon}$.

Definition 8 A ${\delta}$-quasi-extremizer for an inequality like ${\|Tf\|_{L^q} \leq C\|f\|_{L^p}}$ is a function ${f}$ s.t. ${\|Tf\|_{L^q} > C \delta \|f\|_{L^p}}$.

A ${\delta}$-near-extremizer is a function ${f}$ s.t. ${\|Tf\|_{L^q} > (1-\delta) C \|f\|_{L^p}}$.
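The difference between the two notions is easy to see on an example. For ${p=4/3}$ and ${f=\chi_I}$ the indicator of an interval, Plancherel gives ${\|\widehat{\chi_I}\|_{L^4}^4 = \|\chi_I\ast\chi_I\|_{L^2}^2 = \frac{2}{3}|I|^3}$, while ${\|\chi_I\|_{L^{4/3}}^4=|I|^3}$, so the ratio is ${(2/3)^{1/4}}$. The Python sketch below (my own illustration, with hypothetical function names) compares this ratio with the sharp constant ${A_{4/3}}$.

```python
def beckner_constant(p):
    """A_p = p^(1/(2p)) * p'^(-1/(2p')) for 1 < p <= 2."""
    p_dual = p / (p - 1.0)
    return p ** (1.0 / (2.0 * p)) * p_dual ** (-1.0 / (2.0 * p_dual))


# Ratio ||chi_I^||_4 / ||chi_I||_{4/3} for an interval I (computation in the text above).
INTERVAL_RATIO = (2.0 / 3.0) ** 0.25


def is_quasi_extremizer(ratio, best_constant, delta):
    """delta-quasi-extremizer: ||Tf|| > delta * C * ||f||."""
    return ratio > delta * best_constant


def is_near_extremizer(ratio, best_constant, delta):
    """delta-near-extremizer: ||Tf|| > (1 - delta) * C * ||f||."""
    return ratio > (1.0 - delta) * best_constant
```

The indicator of an interval reaches about ${96\%}$ of the sharp constant: it is a ${\delta}$-quasi-extremizer for any ${\delta}$ below that, but a ${\delta}$-near-extremizer only for ${\delta}$ larger than roughly ${0.035}$.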

It is an observation that an ${\eta}$-quasi-extremizer of (H-Y) then satisfies

$\displaystyle \||f|^s \ast |f|^s\|_{L^r} > c \eta^\gamma \||f|^s\|_{L^{p/s}}^2$

for some particular values of ${\gamma, r, s}$, i.e. it is a quasi-extremizer for (Y) as well, thus reducing the theory for (H-Y) to that for (Y) (at least for quasi-extr.; but one can iterate to tackle the near-extr. as well). The relationship between the two inequalities is really important, as you see. Proof: Indeed, assume ${\|f\|_{L^p}=1}$, and consider the family

$\displaystyle \{F_z = f |f|^{z-1}\}_{z\in \mathbb{C}},$

which is analytic in ${z}$ and s.t. ${\|F_z\|_{L^{q}}=\|f\|_{q\Re(z)}^{\Re(z)}}$, and in particular for ${\Re (z) = p/2}$ it is ${\|F_z\|_{L^{2}}=\|f\|_{p}^{p/2}=1}$. We want to use the Three Lines Lemma to interpolate: we choose the strip ${p/2 \leq \Re(z) \leq s}$, and on the left edge we just saw that ${\|\widehat{F_z}\|_{L^{2}}}$ is constant (by Plancherel); as for ${\Re(z) = s}$, ${F_z \in L^{p/s}}$ and therefore ${\widehat{F_z} \in L^{(p/s)'}}$. It follows

$\displaystyle \eta < \|\widehat{f}\|_{L^{p'}}=\|\widehat{F_1}\|_{L^{p'}}\leq \sup_{\Re(z) = s}{\|\widehat{F_z}\|_{L^{(p/s)'}}^\theta},$

where ${1 = (1-\theta)\frac{p}{2}+\theta s}$, i.e. ${\theta =(2-p) (2s-p)^{-1}}$ (with this choice one also has ${\frac{1}{p'} = \frac{1-\theta}{2}+\frac{\theta}{(p/s)'}}$, and the contribution of the line ${\Re(z)=p/2}$ is ${1}$ by the normalization). Now, there exists ${\zeta}$ with ${\Re(\zeta)=s}$ that realizes at least half of the supremum, and by (H-Y) ${\|\widehat{F_\zeta}\|_{L^{(p/s)'}}^2 = \|\widehat{F_\zeta}\widehat{F_\zeta}\|_{L^{(p/s)'/2}}\leq \|F_\zeta \ast F_\zeta\|_{L^{((p/s)'/2)'}}}$, and since ${|F_\zeta \ast F_\zeta|\leq |f|^s \ast |f|^s}$ we have

$\displaystyle \||f|^s \ast |f|^s\|_{L^{((p/s)'/2)'}} > c \eta^\gamma \|f^s\|_{L^{p/s}}^2,$

with ${\gamma = 2/\theta = 2(2s-p)/(2-p)}$. One can verify we can take ${s}$ sufficiently close to ${p}$ in order for the exponents to make sense. $\Box$

Now one has to resort to the theory for (Y), a theory that relies heavily on the additive structure. In particular, a quasi-extremizer for (Y) must concentrate its ${L^p}$ mass on a multiprogression of controlled rank and measure. Rigorously:

Lemma 9 (Quasi-extremizers for (Y)) Suppose ${f}$ is a ${\delta}$-quasi-extremizer of (Y) in the sense that ${\|f\ast f\|_{L^r} > \delta \|f\|_{L^p}^2}$. Then there exists a disjoint decomposition of ${f}$ as

$\displaystyle f=g+h$

with ${h}$ small in the sense that ${\|h\|_{L^p} \leq (1-c \delta^\alpha) \|f\|_{L^p}}$ and ${g}$ structured, in the sense that there exists a multiprogression ${P}$ s.t.

• ${\mathrm{Supp}(g) \subset P}$;
• ${\mathrm{rk}(P) \lesssim_\delta 1}$;
• ${\|g\|_{L^\infty} |P|^{1/p} \lesssim_\delta \|f\|_{L^p}}$.

One can adapt this lemma to the case of quasi-extremizers of (H-Y) by the connection pointed out above. And then, one can address the case of near-extremizers of (H-Y) instead, by choosing a parameter ${\varepsilon>0}$ and applying the lemma for quasi-extremizers iteratively ${O_{\varepsilon,p}(1)}$ times to the “remainder” ${h}$ of a ${\delta(\varepsilon)}$-near extremizer ${f}$, thus obtaining ${f = h_N + g_1 + \ldots + g_N}$, with ${\|h_N\|_{L^p} \leq \varepsilon \|f\|_{L^p}}$ and ${g_j}$‘s each supported on some multiprogression ${P_j}$ controlled in terms of ${\varepsilon}$. ${\delta(\varepsilon)}$ is chosen small enough.

One can further refine this decomposition in such a way that all the multiprogressions are contained in a single multiprogression of controlled rank and size (comparable to the biggest multiprogression in the decomposition), and the decomposition is then simply ${f=g+h}$, thus eliminating the problem of having to deal with multiple multiprogressions at once.

Now by scaling one can reduce to the case ${|P|\sim 1}$. If ${P}$ is then made of a single piece, one can deal with it directly because it’s a bounded interval (remember ${d=1}$); the opposite case is that ${P}$ is a proper multiprogression. This case resembles the bad example of ${\chi_E}$ in section 1: it’s an effect of the non-locality that’s featured in (H-Y). Part of the proof is then devoted to proving that this case is actually incompatible with the assumption of near-extremizing.

Assume then ${g}$ is supported in a neighbourhood of ${\mathbb{Z}}$ in ${\mathbb{R}}$. Since near extremizers for the (H-Y) inequality in ${\mathbb{Z}}$ are indeed all concentrated in one point, the idea is to prove that the same happens for ${g}$ by showing it is essentially a function on ${\mathbb{Z}}$. In general, one lifts ${f\,:\, \mathbb{R} \rightarrow \mathbb{C}}$ supported in ${\bigcup_{j\in\mathbb{Z}}[j-\delta, j+\delta]}$ to ${F(n,x)\,:\,\mathbb{Z}\times \mathbb{R}\rightarrow \mathbb{C}}$ by defining

$\displaystyle F(n,x) := f(x+n).$

It can be proved that if such a ${f}$ is a near-extremizer for (H-Y) in ${\mathbb{R}}$ then ${F}$ is a near-extremizer for (H-Y) in ${\mathbb{Z}\times \mathbb{R}}$. This in turn implies that for most ${x}$‘s the function ${F(\cdot, x)}$ is a near extremizer of (H-Y) in ${\mathbb{Z}}$ and therefore ${F(\cdot, x)}$ has ${L^p}$ mass concentrated in one point. This is translated back to ${f}$ yielding that ${f}$ has ${L^p}$ mass nearly all concentrated on an interval ${I}$.
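To make the lifting concrete, here is a toy Python sketch of it. Everything in it is an assumption of mine made for illustration: ${f}$ is a sum of triangular bumps of half-width ${\delta<1/2}$ centered at the integers ${0,1,2}$, and I restrict the second variable of ${F}$ to ${|x|\leq\delta}$ so that each bump is counted once. The check is that the lift preserves the ${L^p}$ mass, i.e. ${\sum_n\int|F(n,x)|^p\,dx=\int|f|^p}$.

```python
def make_bumps(weights, delta):
    """f(x) = sum_j w_j * max(0, 1 - |x - j| / delta), j = 0..len(weights)-1, delta < 1/2."""
    def f(x):
        j = round(x)  # the only integer whose bump can be non-zero at x
        if 0 <= j < len(weights):
            return weights[j] * max(0.0, 1.0 - abs(x - j) / delta)
        return 0.0
    return f


def lift(f, delta):
    """F(n, x) = f(x + n), with x restricted to |x| <= delta (my normalization)."""
    def F(n, x):
        return f(x + n) if abs(x) <= delta else 0.0
    return F


def lifted_mass(F, ns, delta, p, steps=2000):
    """sum over n of the integral of |F(n, x)|^p over [-delta, delta], by the midpoint rule."""
    h = 2.0 * delta / steps
    return sum(
        abs(F(n, -delta + (i + 0.5) * h)) ** p * h
        for n in ns
        for i in range(steps)
    )
```

For these bumps the mass of ${f}$ is known in closed form, ${\int|f|^p=\sum_j |w_j|^p\,\frac{2\delta}{p+1}}$, and `lifted_mass` should reproduce it up to quadrature error.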

Plot-twist: if ${f}$ is a near-extremizer of (H-Y), then so is ${|\widehat{f}|^{p'-2}\widehat{f}}$! And then everything that’s been said of ${f}$ so far applies to ${\widehat{f}}$ as well, and ${\widehat{f}}$ in particular is concentrated on an interval ${J}$. Indeed, if we set ${g = |\widehat{f}|^{p'-2}\widehat{f}}$, then ${\|g\|_{L^p} = \|\widehat{f}\|_{L^{p'}}^{p'-1}}$; then

$\displaystyle \|\widehat{f}\|_{L^{p'}}^{p'} = \int{g \overline{\widehat{f}}} = \int{\widehat{g} \overline{f}}\leq \|\widehat{g}\|_{L^{p'}}\|f\|_{L^p},$

and therefore

$\displaystyle \|g\|_{L^p} \frac{\|\widehat{f}\|_{L^{p'}}}{\|f\|_{L^p}}\leq \|\widehat{g}\|_{L^{p'}},$

but ${\|\widehat{f}\|_{L^{p'}}\|f\|_{L^p}^{-1} > (1-\delta)A_p^d}$, therefore ${ \|\widehat{g}\|_{L^{p'}} > (1-\delta)A_p^d \|g\|_{L^p}}$, which is the exact definition of ${\delta}$-near extremizer for ${g}$.
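The two facts used here, ${\|g\|_{L^p} = \|\widehat{f}\|_{L^{p'}}^{p'-1}}$ and ${\|g\|_{L^p}\|\widehat{f}\|_{L^{p'}}\|f\|_{L^p}^{-1}\leq\|\widehat{g}\|_{L^{p'}}}$, only rely on Parseval and Hölder, so they can be tested numerically in a finite model. The Python sketch below is my own check and replaces ${\mathbb{R}}$ with the cyclic group ${\mathbb{Z}/n\mathbb{Z}}$ and the Fourier transform with the unitary discrete Fourier transform; it says nothing about sharp constants, and all names in it are made up.

```python
import cmath
import random


def unitary_dft(v):
    """Unitary discrete Fourier transform on Z/nZ (so that Parseval holds with constant 1)."""
    n = len(v)
    return [
        sum(v[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)) / n ** 0.5
        for j in range(n)
    ]


def norm(v, q):
    """l^q norm of a finite sequence."""
    return sum(abs(x) ** q for x in v) ** (1.0 / q)


def dual_function(f_hat, p_dual):
    """g = |f^|^(p'-2) * f^, for p' >= 2."""
    return [abs(x) ** (p_dual - 2.0) * x for x in f_hat]


def random_vector(n, seed):
    """A reproducible random complex vector of length n."""
    rng = random.Random(seed)
    return [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
```

For instance, with `f = random_vector(8, 0)`, `p = 4/3` and `p_dual = 4.0`, one can compare `norm(g, p)` with `norm(f_hat, p_dual) ** (p_dual - 1)` and check the displayed inequality with `unitary_dft(g)` in place of ${\widehat{g}}$.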

So, we’ve said so far that if ${f}$ is a near extremizer then it’s concentrated in a time-frequency tile ${I\times J}$. The uncertainty principle tells us that ${|I||J|\gtrsim 1}$, but Christ proves that by construction one has a reverse Heisenberg inequality too, and then ${|I||J|\sim_\varepsilon 1}$, which also implies one can obtain ${|I|+|J|\lesssim_\varepsilon 1}$. At this point, it should be heuristically evident that our function will be close to being a gaussian.

Nevertheless, remember we were interested in the precompactness result of Thm. ${\varepsilon}$. The last result for the time-frequency support can be used to conclude that for a sequence ${\{f_n\}_n}$ s.t. ${\|f_n\|_{L^p}=1}$ as before and ${\|\widehat{f_n}\|_{L^{p'}}\rightarrow A_p^d}$, there must exist a subsequence of renormalized elements ${\{F_{n_k}\}_k}$ (i.e. ${F_{n_k}}$ is obtained from ${f_{n_k}}$ by dilation, translation and modulation, each preserving the ${L^p}$ norm) s.t. ${\{\widehat{F_{n_k}}\}_k}$ is convergent in ${L^{p'}}$. Sadly, we wanted convergence of ${\{F_{n_k}\}_k}$ in ${L^p}$ instead, but we can deduce it from the one for the Fourier transforms because ${\{f_n\}_n}$ is an extremizing sequence and the unit ball in ${L^p}$ is ${\mathrm{weak}^\ast}$-compact. This is all.

You might’ve noticed I’ve become progressively sketchier in the comments, and the reason is the one I’ve pointed out above: it is indeed a very complicated proof, and a great achievement. Also, I’ve followed the outline of the proof in [ChHY].

I think this is enough for this time, but hope to come back on the subject in the near future.

footnotes:

[1] although it is 1 for other Locally Compact Abelian groups like ${\mathbb{T}^d}$ and ${\mathbb{Z}^d}$.
[2] notice ${\mathcal{G}_j^{1/{p_j}}}$ is still a gaussian.
[3] consider the Lebesgue points of the sets…
[4] but it has applications to PDEs as well.

References:

[Be] W. Beckner, Inequalities in Fourier analysis, Annals of Math., 102, 159-182, 1975.

[ChHY] M. Christ, A sharpened Hausdorff-Young inequality, arXiv:1406.1210 [math.CA]

[ChY] M. Christ, Near extremizers of Young’s inequality in ${\mathbb{R}^d}$, arXiv:1112.4875 [math.CA]

[ChBM] M. Christ, Near equality in the Brunn-Minkowski inequality, arXiv:1207.5062 [math.CA]

[ChRS] M. Christ, Near equality in the Riesz-Sobolev inequality, arXiv:1309.5856 [math.CA]