# Hausdorff-Young inequality and interpolation

The Hausdorff-Young inequality is one of the most fundamental results about the mapping properties of the Fourier transform: it says that

$\displaystyle \| \widehat{f} \|_{L^{p'}(\mathbb{R}^d)} \leq \|f\|_{L^p(\mathbb{R}^d)}$

for all ${1 \leq p \leq 2}$, where $\frac{1}{p} + \frac{1}{p'} = 1$. It is important because it tells us that the Fourier transform maps $L^p$ continuously into $L^{p'}$, something which is not obvious when the exponent ${p}$ is not 1 or 2. When the underlying group is the torus, the corresponding Hausdorff-Young inequality is instead

$\displaystyle \| \widehat{f} \|_{\ell^{p'}(\mathbb{Z}^d)} \leq \|f\|_{L^p(\mathbb{T}^d)}.$

The optimal constant is actually less than 1 in general, and it has been calculated for $\mathbb{R}^d$ (and proven to be optimal) by Beckner, but this will not concern us here (if you want to find out what it is, take ${f}$ to be a Gaussian). In the notes on Littlewood-Paley theory we also saw (in Exercise 7) that the inequality is false for ${p}$ greater than 2, and we proved so using a randomisation trick enabled by Khintchine’s inequality¹.
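To follow the hint about Gaussians, here is a minimal sketch in Python. It assumes the normalisation $\widehat{f}(\xi) = \int f(x) e^{-2\pi i \xi x}\,dx$ used in this post, under which $f(x) = e^{-\pi x^2}$ on $\mathbb{R}$ satisfies $\widehat{f} = f$. Then $\int e^{-s\pi x^2}\,dx = s^{-1/2}$ gives $\|f\|_{L^s} = s^{-1/(2s)}$, so the ratio $\|\widehat{f}\|_{L^{p'}}/\|f\|_{L^p}$ has a closed form, $(p^{1/p}/p'^{1/p'})^{1/2}$. The helper names are made up for illustration. The code only checks that this ratio is below 1 for $1 < p < 2$. It does not prove anything about optimality.

```python
import math

def gaussian_lp_norm(s: float) -> float:
    """||exp(-pi x^2)||_{L^s(R)} = s^(-1/(2s)), since int exp(-s pi x^2) dx = s^(-1/2)."""
    return s ** (-1.0 / (2.0 * s))

def gaussian_lp_norm_numeric(s: float, half_width: float = 10.0, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the same norm, as an independent sanity check."""
    h = 2.0 * half_width / n
    total = sum(math.exp(-s * math.pi * (-half_width + (i + 0.5) * h) ** 2) for i in range(n))
    return (total * h) ** (1.0 / s)

def gaussian_hy_ratio(p: float) -> float:
    """||f^||_{p'} / ||f||_p for f(x) = exp(-pi x^2), using f^ = f (valid for 1 < p <= 2)."""
    p_dual = p / (p - 1.0)
    return gaussian_lp_norm(p_dual) / gaussian_lp_norm(p)

for p in (1.1, 1.25, 1.5, 1.75, 2.0):
    print(f"p = {p:4}: ratio = {gaussian_hy_ratio(p):.6f}")
```

The ratio equals 1 at $p = 2$ (Plancherel) and tends to 1 as $p \to 1$. In between it is strictly smaller than 1.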

Today I would like to talk about how the Hausdorff-Young inequality (H-Y) is proven and how important (or not) interpolation theory is to this inequality. I won’t be saying anything new or important, and ultimately this detour into H-Y will take us nowhere; but I hope the ride will be enjoyable.

# 1. Classical proofs of the Hausdorff-Young inequality

When I was a lowly undergraduate, I was baffled by ${L^p}$-norm inequalities that were stated for exponents ${p}$ in a continuous range. I reasoned, based on my very limited experience, that most of the time one can only prove such inequalities for specific exponents. I thought that the instances in which one can prove an inequality for a whole range of exponents directly must be very scarce, and must depend heavily on the expressions involved having a nice structure. [The exceptions I had in mind were examples from linear analysis such as Hölder’s or Young’s inequality.]
It turns out that, in a sense, naive as I was at the time, I was right: now that I have more experience, I know that most of the time we can indeed only prove inequalities for very specific exponents! The piece of the puzzle that I was missing was interpolation theory, which allows one to take two different estimates and get “all the ones in between”.

For example, the Hausdorff-Young inequality is commonly proven as follows: when ${p=2}$, the inequality is actually an equality, commonly known as Plancherel’s identity:

$\displaystyle \|\widehat{f}\|_{L^2} = \|f\|_{L^2}.$

When ${p=1}$ instead, the inequality is a simple consequence of the triangle inequality for integrals and the fact that ${|e^{it}| = 1}$:

$\displaystyle |\widehat{f}(\xi)| = \Big|\int f(x) e^{-2\pi i \xi \cdot x} \,dx\Big| \leq \int |f(x)| \,dx = \|f\|_{L^1}.$

Appealing to the complex interpolation result commonly known as the Riesz-Thorin theorem, we can conclude (since the Fourier transform is a linear map) that the Fourier transform is also bounded $L^{p_\theta} \to L^{q_\theta}$ for all $\theta \in [0,1]$. Here the exponents ${p_\theta, q_\theta}$ are obtained by linear interpolation of the inverse exponents above, namely

$\displaystyle \begin{aligned} \frac{1}{p_{\theta}} =& \frac{1 - \theta}{1} + \frac{\theta}{2}, \\ \frac{1}{q_{\theta}} =& \frac{1 - \theta}{\infty} + \frac{\theta}{2}, \end{aligned}$

with the understanding that $\frac{1}{\infty} = 0$. One can see that $\frac{1}{p_\theta} + \frac{1}{q_\theta} =1$, which simply means $q_\theta = (p_\theta)'$; moreover, we have $L^{p_0} \to L^{q_0} = L^{1} \to L^{\infty}$ and $L^{p_1} \to L^{q_1} = L^2 \to L^2$, which are the inequalities above.
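The exponent bookkeeping can be done with exact rational arithmetic. This is a small sketch; the function name is only illustrative. It interpolates $(1/p, 1/q)$ between the endpoints $(1, 0)$ and $(1/2, 1/2)$ and checks that $q_\theta$ is always the dual exponent of $p_\theta$. Since $A = B = 1$ here, the Riesz-Thorin constant $A^{1-\theta}B^{\theta}$ is 1 for every $\theta$.

```python
from fractions import Fraction

def interpolated_exponents(theta: Fraction) -> tuple[Fraction, Fraction]:
    """Return (1/p_theta, 1/q_theta) interpolating (1/p, 1/q) = (1, 0) and (1/2, 1/2)."""
    inv_p = (1 - theta) * Fraction(1) + theta * Fraction(1, 2)
    inv_q = (1 - theta) * Fraction(0) + theta * Fraction(1, 2)  # 1/infinity = 0
    return inv_p, inv_q

for theta in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(1)):
    inv_p, inv_q = interpolated_exponents(theta)
    assert inv_p + inv_q == 1  # q_theta is the dual exponent of p_theta
    q = "inf" if inv_q == 0 else 1 / inv_q
    print(f"theta = {theta}: p = {1 / inv_p}, q = {q}")
```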

Notice that complex interpolation between inequalities with constants $A,B$ yields an intermediate inequality with constant $A^{1- \theta} B^{\theta}$; in this particular case, $A = B = 1$ and therefore we obtain constant 1 for all intermediate values of ${p}$ with this method.

However, one might argue that using complex interpolation is a bit too much. The Riesz-Thorin theorem is a consequence of Hadamard’s three-lines lemma from complex analysis. That lemma is itself a consequence of the maximum modulus principle, which states that the modulus of a holomorphic function attains its maximum on the boundary of the domain. So complex interpolation rests on an elementary but very deep property of holomorphic functions. One might then argue that such a proof is not a fully real-variable proof of the Hausdorff-Young inequality, since it brings complex analysis into the mix. We can ask the following questions:

Q1: Is there a proof of the Hausdorff-Young inequality that does not use complex interpolation?

Q2: Even better, is there a proof of the Hausdorff-Young inequality that does not use any interpolation at all?

The first question has a trivial answer and a not-so-trivial one. The trivial answer is that, since ${p' \geq p}$ when $p \leq 2$, we can use real interpolation between Lorentz spaces instead of complex interpolation, and reach the same conclusion.

Recall that real interpolation between Lorentz spaces extends Marcinkiewicz real interpolation to the off-diagonal case, in which the exponent of the domain space need not equal that of the target space. In particular, it says that if one has a sub-linear operator $T$ such that $T$ is

$\displaystyle L^{p_0} \to L^{q_0,\infty} \quad \text{ and } \quad L^{p_1} \to L^{q_1,\infty} \quad \text{ bounded}$

(with $p_0 \neq p_1$ and $q_0 \neq q_1$), then, with $p_\theta, q_\theta$ defined by linear interpolation of the inverse exponents as before, for all $1 \leq r \leq \infty$ and $\theta \in (0,1)$ the operator is also²

$\displaystyle L^{p_\theta, r} \to L^{q_\theta,r} \quad \text{ bounded.}$

When $q_\theta \geq p_\theta$ this implies the strong $L^{p_\theta} \to L^{q_\theta}$ boundedness of $T$, because by choosing $r = p_\theta$ and standard properties of Lorentz spaces one has

$\displaystyle \|Tf\|_{L^{q_\theta}} = \|Tf\|_{L^{q_\theta,q_\theta}} \lesssim \|Tf\|_{L^{q_\theta,p_\theta}} \lesssim \|f\|_{L^{p_\theta,p_\theta}} = \|f\|_{L^{p_{\theta}}}.$

In particular, the above applies to the Fourier transform operator, and therefore one can conclude the Hausdorff-Young inequality. Notice, however, the difference in constants. Complex interpolation gives a clean constant $\leq A^{1-\theta} B^{\theta}$, where $A = \|T\|_{L^{p_0} \to L^{q_0}}$ and $B = \|T\|_{L^{p_1} \to L^{q_1}}$ are the constants of the known inequalities we interpolate between. Real interpolation gives instead a “polluted” constant $\lesssim_{p_0,p_1,q_0,q_1,r,\theta} A^{1-\theta} B^{\theta}$. In other words, $A^{1-\theta} B^{\theta}$ gets multiplied by a numerical factor that depends on the various exponents involved and, crucially, blows up near the endpoints, that is, as $\theta \to 0$ or $\theta \to 1$.

Further remark: The caveat about the polluted constants produced by real interpolation is important in general, but in the specific situation of the H-Y inequality there is no need to panic. Indeed, the Fourier transform tensorises nicely, and as a consequence any spurious constants so obtained can be removed by the tensor power trick (see Tao’s post about it).

This certainly constitutes a fully real-variable proof of the Hausdorff-Young inequality. However, it relies on a more sophisticated real interpolation technique involving the theory of Lorentz spaces, instead of the simpler Marcinkiewicz real interpolation (which corresponds to the case $p_0 = q_0$ and $p_1 = q_1$ in the above). Someone who really cares about having the simplest possible proof might ask whether we can do even better and use a simpler real interpolation technique, such as Marcinkiewicz’s. It turns out that we can, if we throw some rearrangement inequalities of Hardy, Littlewood and Paley into the mix! That is the not-so-trivial answer mentioned before; more on this in the next section.

Regarding the second question, consider first the following elementary argument, which works not just for $\mathbb{R}^d$ and $\mathbb{T}^d$ but for generic abelian groups. It proves Hausdorff-Young for some special exponents (besides the cases ${p=1,2}$ already considered) and was originally given by Young himself³.

Young’s argument: Quite simply, we take an exponent ${p}$ such that ${p'}$ is even: say $p = \frac{2k}{2k - 1}$, so that ${p' = 2k}$, where ${k \geq 1}$ is an integer. We observe that

$\displaystyle \|\widehat{f}\|_{L^{2k}} = \|(\widehat{f})^k \|_{L^2}^{1/k},$

and since $\widehat{f}\widehat{g} = \widehat{f \ast g}$ we have by Plancherel’s identity that

$\displaystyle \|(\widehat{f})^k \|_{L^2}^{1/k} = \|f \ast \ldots \ast f \|_{L^2}^{1/k},$

where there are ${k}$ copies of ${f}$ in the convolution on the RHS. At this point we simply apply Young’s convolution inequality repeatedly. One can check that for functions $f_1, \ldots, f_k$ and exponents $1 \leq q, p_1, \ldots, p_k \leq \infty$ satisfying

$\displaystyle k-1+\frac{1}{q} = \frac{1}{p_1} + \ldots + \frac{1}{p_k}$

it holds that

$\displaystyle \|f_1 \ast \ldots \ast f_k\|_{L^q} \leq \|f_1\|_{L^{p_1}} \cdot \ldots \cdot \|f_k\|_{L^{p_k}}.$
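One way to carry out this check is by induction on $k$. The only ingredient is the two-function Young inequality $\|g \ast h\|_{L^q} \leq \|g\|_{L^r}\|h\|_{L^s}$ for $1 + \frac{1}{q} = \frac{1}{r} + \frac{1}{s}$, which holds with constant 1 on $\mathbb{R}^d$ with Lebesgue measure and on $\mathbb{T}^d$ with normalised Haar measure. Here is a sketch, assuming $k \geq 3$ and that the statement holds for $k-1$ functions. Define $r$ by

$\displaystyle \frac{1}{r} := \frac{1}{p_1} + \ldots + \frac{1}{p_{k-1}} - (k-2),$

so that $\frac{1}{r} + \frac{1}{p_k} = 1 + \frac{1}{q}$. Since every $\frac{1}{p_i} \leq 1$, the constraint forces $\frac{1}{p_k} \geq \frac{1}{q}$, hence $\frac{1}{q} \leq \frac{1}{r} \leq 1$ and $r$ is an admissible exponent. Then

$\displaystyle \|f_1 \ast \ldots \ast f_k\|_{L^q} \leq \|f_1 \ast \ldots \ast f_{k-1}\|_{L^r}\, \|f_k\|_{L^{p_k}} \leq \|f_1\|_{L^{p_1}} \cdot \ldots \cdot \|f_k\|_{L^{p_k}},$

where the second step is the inductive hypothesis, applied with $k-2+\frac{1}{r} = \frac{1}{p_1} + \ldots + \frac{1}{p_{k-1}}$.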

Applied to $\|f \ast \ldots \ast f \|_{L^2}$, this generalised Young’s inequality says that

$\displaystyle \|f \ast \ldots \ast f \|_{L^2} \leq \|f\|_{L^{p_0}}^k$

for the exponent ${p_0}$ such that $k - 1 + \frac{1}{2} = \frac{k}{p_0}$; but this exponent is precisely $\frac{2k}{2k - 1} = (2k)'$ as we wanted!
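Young’s argument also works on finite abelian groups, so it can be tested numerically on a cyclic group $\mathbb{Z}_N$. The sketch below makes the following choices: the normalised counting measure on $\mathbb{Z}_N$, counting measure on the dual, and $\widehat{f}(m) = \frac{1}{N}\sum_x f(x) e^{-2\pi i m x/N}$, computed as `np.fft.fft(f) / N`. The helper names are hypothetical. The code checks three things: that convolution becomes multiplication, the identity $\|\widehat{f}\|_{\ell^{2k}} = \|f \ast \ldots \ast f\|_{L^2}^{1/k}$, and the resulting inequality at $p = 2k/(2k-1)$.

```python
import numpy as np

def fourier(f: np.ndarray) -> np.ndarray:
    """Fourier coefficients on Z_N: f^(m) = (1/N) sum_x f(x) exp(-2 pi i m x / N)."""
    return np.fft.fft(f) / f.size

def convolve(f: np.ndarray, g: np.ndarray) -> np.ndarray:
    """(f * g)(x) = (1/N) sum_y f(y) g(x - y), computed directly from the definition."""
    n = f.size
    y = np.arange(n)
    return np.array([np.dot(f, g[(x - y) % n]) for x in range(n)]) / n

def norm_G(f: np.ndarray, p: float) -> float:
    """L^p(G) norm with respect to the normalised counting measure."""
    return float(np.mean(np.abs(f) ** p) ** (1 / p))

def norm_dual(F: np.ndarray, q: float) -> float:
    """l^q norm on the dual group (plain counting measure)."""
    return float(np.sum(np.abs(F) ** q) ** (1 / q))

rng = np.random.default_rng(0)
N, k = 12, 3
f = rng.normal(size=N) + 1j * rng.normal(size=N)
g = rng.normal(size=N)

# The Fourier transform turns convolution into multiplication.
assert np.allclose(fourier(convolve(f, g)), fourier(f) * fourier(g))

# ||f^||_{l^{2k}} = ||f * ... * f||_{L^2}^{1/k}, with k copies of f (by Plancherel).
conv_power = f
for _ in range(k - 1):
    conv_power = convolve(conv_power, f)
lhs = norm_dual(fourier(f), 2 * k)
assert np.isclose(lhs, norm_G(conv_power, 2) ** (1 / k))

# Young's convolution inequality then yields H-Y at p = 2k / (2k - 1).
p = 2 * k / (2 * k - 1)
print(f"{lhs:.4f} <= {norm_G(f, p):.4f}")
```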

This simple argument proves the Hausdorff-Young inequality for all the values of ${p}$ for which ${p'}$ is an even integer, but it says nothing about the exponents in between. I am not aware of an argument that proves H-Y on $\mathbb{R}^d$ or $\mathbb{T}^d$ for all exponents without using any interpolation at all, but there is such a proof for H-Y on finite abelian groups! The argument involves the tensor power trick and is presented in Section 3 below.

# 2. A proof of Hausdorff-Young on $\mathbb{R}^d$ or $\mathbb{T}^d$ using simple REAL interpolation (and rearrangements)

I learned about this proof from my colleague Odysseas Bakas, who learned it from Zygmund’s “Trigonometric series” (the full reference is: Volume II, Chapter XII, section 5). It is based on an inequality for certain rearrangements that is commonly attributed to Paley, but Paley’s contribution was actually in extending the inequality below to general orthonormal systems. The proof of this version for the trigonometric system is originally due to Hardy and Littlewood instead – but I won’t be sticking to their presentation because it is quite different from today’s taste.

The proof itself only works for $\mathbb{T}$ or $\mathbb{R}$, but this is alright because once you have the $d=1$ case sorted, the higher dimensional cases of $\mathbb{T}^d, \mathbb{R}^d$ follow inductively from Minkowski’s inequality and the fact that the Fourier transform “tensorises”. Indeed, assume that you have proven the inequality for dimensions $1$ and $d-1$; write $\xi \in \mathbb{R}^d$ as $\xi = (\xi_1, \xi') \in \mathbb{R} \times \mathbb{R}^{d-1}$ and let $\mathcal{F}_{1}$ denote the Fourier transform in the first component and $\mathcal{F}'$ in the last $d-1$ components (thus $\mathcal{F}_1 \mathcal{F}' f = \widehat{f}$). Then we have

$\displaystyle \begin{aligned} \|\widehat{f}\|_{L^{p'}(\mathbb{R}^d)} = & \Big(\int_{\mathbb{R}^d} |\widehat{f}(\xi)|^{p'} \, d\xi\Big)^{1/{p'}} \\ = & \Big(\int_{\mathbb{R}^{d-1}}\int_{\mathbb{R}} |\mathcal{F}_1 \mathcal{F}'f(\xi_1, \xi')|^{p'} \, d\xi_1 \, d \xi'\Big)^{1/{p'}} \\ \underset{(\text{H-Y in dim } 1)}{\leq} & \Big(\int_{\mathbb{R}^{d-1}} \Big[\int_{\mathbb{R}} | \mathcal{F}'f(x_1, \xi')|^{p} \, dx_1 \Big]^{p'/p} d \xi'\Big)^{1/{p'}} \\ \underset{(\text{Mink})}{\leq} & \Big[ \int_{\mathbb{R}} \Big(\int_{\mathbb{R}^{d-1}} | \mathcal{F}'f(x_1, \xi')|^{p'} \,d \xi' \Big)^{p/{p'}} dx_1 \Big]^{1/p} \\ \underset{(\text{H-Y in dim } d-1)}{\leq} & \Big[ \int_{\mathbb{R}} \int_{\mathbb{R}^{d-1}} |f(x_1, x')|^{p} \,dx' \, dx_1 \Big]^{1/p} \\ = & \|f\|_{L^p (\mathbb{R}^d)}. \end{aligned}$

The argument for $\mathbb{T}^d$ is entirely similar. Notice that we have been able to apply Minkowski because $p \leq p'$ – if ${p}$ were bigger than 2, the reverse would hold and the argument would not work!
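Both ingredients of this induction can be seen on a finite grid. The sketch below uses NumPy's DFT and counting measures as stand-ins, which is an assumption made for illustration; `mixed_norm` is a hypothetical helper. It checks that the two-dimensional DFT factors as a one-dimensional DFT in each variable, which is the discrete analogue of $\mathcal{F}_1 \mathcal{F}' = \mathcal{F}$. It also checks the Minkowski step: for $p \leq p'$, the $\ell^{p'}$ norm of $\ell^p$ norms is at most the $\ell^p$ norm of $\ell^{p'}$ norms.

```python
import numpy as np

def mixed_norm(h: np.ndarray, inner: float, outer: float, inner_axis: int) -> float:
    """l^inner norm along inner_axis, followed by the l^outer norm of what remains."""
    inner_norms = np.sum(np.abs(h) ** inner, axis=inner_axis) ** (1 / inner)
    return float(np.sum(inner_norms ** outer) ** (1 / outer))

rng = np.random.default_rng(1)
a = rng.normal(size=(8, 6)) + 1j * rng.normal(size=(8, 6))

# Tensorisation: the 2D DFT is the composition of 1D DFTs in each variable.
assert np.allclose(np.fft.fft2(a), np.fft.fft(np.fft.fft(a, axis=1), axis=0))

# Minkowski step with p <= p': axis 0 plays the role of x_1, axis 1 that of xi'.
p = 1.4
p_dual = p / (p - 1)
h = rng.normal(size=(8, 6))
left = mixed_norm(h, inner=p, outer=p_dual, inner_axis=0)    # l^{p'}_{xi'} of l^p_{x_1}
right = mixed_norm(h, inner=p_dual, outer=p, inner_axis=1)   # l^p_{x_1} of l^{p'}_{xi'}
print(f"{left:.4f} <= {right:.4f}")
```

Swapping the roles of $p$ and $p'$ (that is, taking $p > 2$) makes the inequality go the other way in general, which is why the induction needs $p \leq 2$.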

## 2.1. Proof of H-Y on $\mathbb{T}$ (with rearrangements)

We will next prove the H-Y inequality on $\mathbb{T}$ for convenience, and at the end of the section we will indicate the modifications needed to adapt the proof to $\mathbb{R}$. First we will prove the inequality of Paley(-Hardy-Littlewood) using Marcinkiewicz’s real-interpolation, a fully real-variable method. In order to avoid confusion, we prove a preliminary version that is more straightforward:

Preliminary Paley(-Hardy-Littlewood)’s inequality: Let ${1 < p \leq 2}$ and $f \in L^p(\mathbb{T})$. Then

$\displaystyle \Big(\sum_{k \in \mathbb{Z}} \frac{|\widehat{f}(k)|^p (1 + |k|)^p}{(1 + |k|)^2} \Big)^{1/p} \lesssim_p \|f\|_{L^p(\mathbb{T})}. \ \ \ \ \ \ (1)$

Proof:
When $p=2$, inequality (1) is actually an equality, and more precisely Plancherel’s identity – you can verify this by yourself.
While the inequality above does not (necessarily) hold when ${p=1}$, we claim that a weak-type inequality does hold at this endpoint; this will enable us to conclude the inequality for the stated range of exponents by using Marcinkiewicz interpolation.
Notice indeed that the LHS of (1) is the $\ell^{p}(\mathbb{Z}; (1 + |k|)^{-2})$ norm (that is, an $\ell^p$ norm with weight $(1 + |k|)^{-2}$) of the sequence $(\widehat{f}(k) (1 + |k|))_{k \in \mathbb{Z}}$. What we claim, therefore, is that

$\displaystyle \big\|(\widehat{f}(k) (1 + |k|))_{k \in \mathbb{Z}}\big\|_{\ell^{1,\infty}(\mathbb{Z}; (1 + |k|)^{-2})} \lesssim \|f\|_{L^1(\mathbb{T})};$

equivalently, we are claiming that for all $\lambda > 0$ we have

$\displaystyle \lambda \sum_{k : |\widehat{f}(k)|(1 + |k|) \geq \lambda} \frac{1}{(1 + |k|)^2} \lesssim \|f\|_{L^1(\mathbb{T})}. \ \ \ \ \ \ (2)$

Notice that the $p=2$ case of the inequality we are trying to prove can also be rephrased as $\big\|(\widehat{f}(k) (1 + |k|))_{k \in \mathbb{Z}}\big\|_{\ell^{2}(\mathbb{Z}; (1 + |k|)^{-2})} \lesssim \|f\|_{L^2(\mathbb{T})}$.
Inequality (2) is actually a very simple consequence of the trivial Hausdorff-Young inequality for $p=1$, that is, the fact that $|\widehat{f}(k)| \leq \|f\|_{L^1(\mathbb{T})}$. Indeed, this implies that the LHS of (2) is bounded by

$\displaystyle \lambda \sum_{k : \|f\|_{L^1(\mathbb{T})}(1 + |k|) \geq \lambda} \frac{1}{(1 + |k|)^2} = \lambda \sum_{|k| +1 \geq \lambda \|f\|_{L^1(\mathbb{T})}^{-1}} \frac{1}{(1 + |k|)^2};$

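since $\sum_{k \,:\, 1 + |k| \geq N} (1 + |k|)^{-2} \lesssim \min(1, N^{-1})$ for every $N > 0$, the RHS of the last expression is bounded by a constant multiple of

$\displaystyle \lambda \Big(\frac{\lambda}{\|f\|_{L^1(\mathbb{T})}} \Big)^{-1} = \|f\|_{L^1(\mathbb{T})},$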

which is precisely the claim.
The proof is therefore concluded by appealing to Marcinkiewicz’s real-interpolation theorem, applied to the linear operator $f \mapsto \widehat{f}(k) (1 + |k|)$.

$\square$
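The only computation in the weak-type estimate (2) is the tail sum $\sum_{1+|k| \geq N}(1+|k|)^{-2}$ with $N = \lambda/\|f\|_{L^1(\mathbb{T})}$. Here is a minimal check using only the Python standard library; the helper names are illustrative. It evaluates the tail exactly via $\sum_{m \geq 1} m^{-2} = \pi^2/6$ and confirms, on a grid of values of $\lambda$, that $\lambda$ times the tail stays below a fixed multiple of $\|f\|_{L^1(\mathbb{T})}$. The ratio depends only on $N$, so taking $\|f\|_{L^1(\mathbb{T})} = 1$ loses nothing.

```python
import math

ZETA2 = math.pi ** 2 / 6  # sum_{m >= 1} 1/m^2

def weighted_tail(n: float) -> float:
    """Exact value of the sum over k in Z with 1 + |k| >= n of (1 + |k|)^(-2).

    With m = 1 + |k|, the value m = 1 comes only from k = 0, while each m >= 2 comes
    from both k = m - 1 and k = -(m - 1).
    """
    start = max(1, math.ceil(n))
    tail = ZETA2 - sum(1.0 / m ** 2 for m in range(1, start))  # sum_{m >= start} 1/m^2
    return 2.0 * tail - (1.0 if start == 1 else 0.0)

def weak_type_ratio(lam: float, l1_norm: float) -> float:
    """lambda * sum_{1 + |k| >= lambda / ||f||_1} (1 + |k|)^(-2), divided by ||f||_1."""
    return lam * weighted_tail(lam / l1_norm) / l1_norm

grid = [10 ** (e / 50) for e in range(-150, 151)]  # lambda from 1e-3 to 1e3, with ||f||_1 = 1
worst = max(weak_type_ratio(lam, 1.0) for lam in grid)
print(f"largest ratio on the grid: {worst:.4f}")
```

For this particular sum the supremum over $\lambda$ works out to $2(\pi^2/3 - 2) \approx 2.58$, reached at $\lambda = 2\|f\|_{L^1(\mathbb{T})}$. Any fixed bound suffices for Marcinkiewicz.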

Now we are able to approach the full Paley(-Hardy-Littlewood) inequality; its statement will require a small definition:

Definition: Let $(c_k)_{k \in \mathbb{Z}}$ be a sequence in $\mathbb{C}$ such that

$\displaystyle \lim_{k \to \pm \infty} |c_k| = 0.$

We define the rearrangement $(c_k^{\ast})_{k \in \mathbb{Z}}$ to be the sequence given by

$\displaystyle c_k^{\ast} = \text{the }(k+1)\text{-th largest element of } (|c_j|)_{j\geq 0}$

when $k \geq 0$ (so that $c_0^{\ast}$ is the largest one), and

$\displaystyle c_k^{\ast} = \text{the }(-k)\text{-th largest element of } (|c_j|)_{j<0}$

when $k < 0$, where elements are counted with multiplicity.

Notice that, according to the definition, we are rearranging the sequence separately in the positive and negative indices.
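Here is a minimal sketch of this rearrangement for a finitely supported sequence, stored as a Python dict from index to value, with missing indices taken to be 0. The function name is hypothetical. It uses the reading in which $c_0^{\ast}$ is the largest value on the non-negative side and $c_{-1}^{\ast}$ the largest on the negative side.

```python
def rearrange(c: dict[int, float]) -> dict[int, float]:
    """Side-by-side rearrangement c^* of a finitely supported sequence (missing indices = 0).

    For k >= 0, c^*_k is the (k+1)-th largest of |c_j| over j >= 0; for k < 0, c^*_k is
    the (-k)-th largest of |c_j| over j < 0. abs() also handles complex entries.
    """
    lo, hi = min(c), max(c)
    nonneg = sorted((abs(c.get(j, 0.0)) for j in range(0, max(hi, 0) + 1)), reverse=True)
    neg = sorted((abs(c.get(j, 0.0)) for j in range(min(lo, -1), 0)), reverse=True)
    out = dict(enumerate(nonneg))                          # k = 0, 1, 2, ...
    out.update({-(i + 1): v for i, v in enumerate(neg)})   # k = -1, -2, ...
    return out

c = {-2: 1.0, -1: 3.0, 0: 0.5, 1: 2.0, 2: -4.0}
print(rearrange(c))  # {0: 4.0, 1: 2.0, 2: 0.5, -1: 3.0, -2: 1.0}
```

Note that the value $3.0$ at index $-1$ never moves to the non-negative side, even though it is larger than $2.0$: the two halves are sorted independently.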
Paley’s inequality is as follows:

Paley(-Hardy-Littlewood)’s inequality: Let ${1 < p \leq 2}$ and $f \in L^p(\mathbb{T})$. Then

$\displaystyle \Big(\sum_{k \in \mathbb{Z}} \frac{|\widehat{f}(k)^\ast|^p (1 + |k|)^p}{(1 + |k|)^2} \Big)^{1/p} \lesssim_p \|f\|_{L^p(\mathbb{T})}, \ \ \ \ \ \ (3)$

where $\widehat{f}(k)^\ast$ is the rearranged sequence of the Fourier coefficients of ${f}$.

The statement is virtually identical to that of the preliminary version (1) above, save for the fact that we now have the rearranged Fourier coefficients on the LHS. What difference does this make? Well, in terms of strength, the new inequality is certainly stronger: indeed, it is not hard to see that the LHS of (3) is always at least as LARGE as that of (1). This is because, since $p \leq 2$, the numerical coefficient $(1 + |k|)^{p-2}$ is non-increasing in $|k|$. Therefore, by rearranging the Fourier coefficients by order of magnitude, we concentrate the mass where these numerical coefficients are largest.
Another difference that must be appreciated is that we cannot prove (3) with word for word the same proof that we used for (1), because the operator $f \mapsto \widehat{f}(k)^\ast (1 + |k|)$ is not even sublinear! However, a small trick will allow us to repeat essentially the same proof.
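The comparison between the two left-hand sides is an instance of the rearrangement inequality. Pairing the largest values with the largest weights maximises a weighted sum. Here is a small numerical illustration on the window $[-n, n]$ with random coefficients; the helper names and the random test data are assumptions made for illustration.

```python
import random

def weighted_sum(values: dict[int, float], p: float) -> float:
    """sum_k |values_k|^p (1 + |k|)^(p - 2): the p-th power of the LHS of (1) or (3)."""
    return sum(abs(v) ** p * (1 + abs(k)) ** (p - 2) for k, v in values.items())

def rearranged(c: dict[int, float], n: int) -> dict[int, float]:
    """Side-by-side decreasing rearrangement of c, restricted to the window [-n, n]."""
    pos = sorted((abs(c.get(j, 0.0)) for j in range(0, n + 1)), reverse=True)
    neg = sorted((abs(c.get(j, 0.0)) for j in range(-n, 0)), reverse=True)
    return {**dict(enumerate(pos)), **{-(i + 1): v for i, v in enumerate(neg)}}

random.seed(0)
n, p = 50, 1.5
c = {k: random.gauss(0.0, 1.0) / (1 + abs(k)) for k in range(-n, n + 1)}
print(f"LHS(1)^p = {weighted_sum(c, p):.4f} <= LHS(3)^p = {weighted_sum(rearranged(c, n), p):.4f}")
```

At $p = 2$ the weights are all equal to 1 and the two sums coincide. This matches the fact that both (1) and (3) reduce to Plancherel there.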

Proof:
The trick mentioned above is simple: we linearise the LHS! Indeed, for any $f \in L^p(\mathbb{T})$, there exists at least one permutation $\sigma$ of $\mathbb{Z}$ such that

$\displaystyle \widehat{f}(k)^\ast = |\widehat{f}(\sigma(k))|$

(by permutation I mean a bijection of $\mathbb{Z}$ with itself). Since the LHS of (3) is a sum of non-negative terms, we can reorder the summation freely; in particular, we have

$\displaystyle \sum_{k \in \mathbb{Z}} \frac{|\widehat{f}(k)^\ast|^p (1 + |k|)^p}{(1 + |k|)^2} = \sum_{k \in \mathbb{Z}} \frac{|\widehat{f}(k)|^p (1 + |\sigma^{-1}(k)|)^p}{(1 + |\sigma^{-1}(k)|)^2}.$

It suffices therefore to prove that the linear operator $T_{\sigma}g(k) := \widehat{g}(k) (1 + |\sigma^{-1}(k)|)$ satisfies

$\displaystyle \|T_{\sigma}g\|_{\ell^{p}(\mathbb{Z};(1 + |\sigma^{-1}(k)|)^{-2})} \lesssim_p \|g\|_{L^p(\mathbb{T})},$

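for then we can apply this inequality to the fixed function $f$ above. This can now be done by repeating verbatim the proof of (1). Indeed, $T_\sigma$ is linear, and the $p=2$ case is again Plancherel. In the weak-type estimate at $p=1$, only the distribution of the weight matters. Since $\sigma$ is a bijection, the multiset $\{1 + |\sigma^{-1}(k)|\}_{k \in \mathbb{Z}}$ coincides with $\{1 + |k|\}_{k \in \mathbb{Z}}$. So the computation is identical, and the constants do not depend on $\sigma$.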

$\square$
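The linearisation step can be made concrete on a finite window $\{-n, \ldots, n\}$, which is an assumption made for illustration (on all of $\mathbb{Z}$ the permutation is not explicitly computable). The sketch builds a bijection $\sigma$ that sorts each half by decreasing modulus, so that $|c_{\sigma(k)}| = c^{\ast}_k$. It then checks the change of variables: the rearranged sum equals a sum of the original coefficients against the weight $(1+|\sigma^{-1}(k)|)^{p-2}$, which no longer depends on the coefficients once $\sigma$ is fixed. The function name is hypothetical.

```python
import math
import random

def sorting_permutation(c: dict[int, float], n: int) -> dict[int, int]:
    """A bijection sigma of {-n, ..., n} with |c[sigma(k)]| = c^*_k.

    Non-negative indices are sorted among themselves by decreasing |c_j|, and so are the
    negative ones, matching the side-by-side rearrangement. Python's sort is stable, so
    ties keep their original order.
    """
    pos = sorted(range(0, n + 1), key=lambda j: -abs(c.get(j, 0.0)))
    neg = sorted(range(-n, 0), key=lambda j: -abs(c.get(j, 0.0)))
    sigma = dict(enumerate(pos))                              # k = 0, 1, ... -> pos[k]
    sigma.update({-(i + 1): j for i, j in enumerate(neg)})    # k = -1, -2, ... -> neg[i]
    return sigma

random.seed(1)
n, p = 30, 1.3
c = {k: random.uniform(-1.0, 1.0) for k in range(-n, n + 1)}
sigma = sorting_permutation(c, n)
sigma_inv = {j: k for k, j in sigma.items()}

# Rearranged sum = sum of |c_j|^p against the fixed weight (1 + |sigma^{-1}(j)|)^(p - 2).
lhs = sum(abs(c[sigma[k]]) ** p * (1 + abs(k)) ** (p - 2) for k in sigma)
rhs = sum(abs(c[j]) ** p * (1 + abs(sigma_inv[j])) ** (p - 2) for j in c)
assert math.isclose(lhs, rhs)
print(f"{lhs:.6f} == {rhs:.6f}")
```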

Now we can finally prove the Hausdorff-Young inequality; from the proof, it will be clear why we needed to bother with the rearrangements.

Proof of H-Y using Paley(-Hardy-Littlewood)’s inequality: We will show that $\|\widehat{f}\|_{\ell^{p'}}$ is always dominated by the LHS of (3).
Assume we have only non-negative frequencies, for ease of notation, and actually assume $\widehat{f}(0) = \widehat{f}(1)=0$ as well so that I don’t have to write them over and over (you can add them back in yourself without changing the argument one bit). We have

$\displaystyle \sum_{k \geq 2} |\widehat{f}(k)^\ast|^p (1 + |k|)^{p-2} = \sum_{j \geq 0} \sum_{k= 1 + 2^j}^{2^{j+1}} |\widehat{f}(k)^\ast|^p (1 + |k|)^{p-2},$

and by the fact that we have rearranged the Fourier coefficients we have that the above is bounded from below by

$\displaystyle \begin{aligned} \sum_{j \geq 0} \sum_{k= 1 + 2^j}^{2^{j+1}} |\widehat{f}(2^{j+1})^\ast|^p (1 + |k|)^{p-2} \sim & \sum_{j \geq 0} \sum_{k= 1 + 2^j}^{2^{j+1}} |\widehat{f}(2^{j+1})^\ast|^p 2^{j(p-2)} \\ \sim & \sum_{j \geq 0} |\widehat{f}(2^{j+1})^\ast|^p 2^{j(p-1)}. \end{aligned}$

Observe now this convenient little miracle: we always have $p-1 = \frac{p}{p'}$. But then, since $\|\cdot\|_{\ell^{p}} \geq \|\cdot\|_{\ell^{p'}}$ (which only holds because $p' \geq p$, which in turn is only true because $p\leq 2$), we have

$\displaystyle \Big(\sum_{j \geq 0} |\widehat{f}(2^{j+1})^\ast|^p 2^{jp/{p'}} \Big)^{1/p} \geq \Big( \sum_{j \geq 0} |\widehat{f}(2^{j+1})^\ast|^{p'} 2^{j} \Big)^{1/{p'}}.$

It is easy to check – again because of the fact that $\widehat{f}(k)^\ast$ is a non-increasing rearrangement – that the latter quantity is

$\displaystyle \sim \Big(\sum_{k \geq 2} |\widehat{f}(k)|^{p'}\Big)^{1/{p'}} = \|\widehat{f}\|_{\ell^{p'}}$

(yes, without the rearrangement now), which is precisely the LHS of the Hausdorff-Young inequality for $\mathbb{T}$! The proof of H-Y on $\mathbb{T}$ using only real-interpolation is now concluded. $\square$
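Tracking constants through this dyadic argument gives an explicit inequality that can be tested numerically. On the $j$-th block we have $(1+k)^{p-2} \geq 2^{(j+2)(p-2)}$. In the other direction, $\sum_{k \geq 2} (c^{\ast}_k)^{p'} \leq 3 \sum_{j \geq 0} 2^j (c^{\ast}_{2^{j+1}})^{p'}$. Together with $\|\cdot\|_{\ell^p} \geq \|\cdot\|_{\ell^{p'}}$, these yield $\text{LHS} \geq 2^{2(p-2)/p}\, 3^{-1/p'} \big(\sum_{k \geq 2} (c^{\ast}_k)^{p'}\big)^{1/p'}$ for any non-increasing non-negative sequence. The sketch below checks this bound on random sequences; the helper names and random data are illustrative only.

```python
import random

def dyadic_chain_sides(c_star: list[float], p: float) -> tuple[float, float]:
    """For a non-increasing, non-negative sequence (c*_k)_{k >= 0}, return
    (sum_{k >= 2} c*_k^p (1+k)^(p-2))^(1/p)  and  (sum_{k >= 2} c*_k^p')^(1/p')."""
    p_dual = p / (p - 1)
    lhs = sum(c_star[k] ** p * (1 + k) ** (p - 2) for k in range(2, len(c_star))) ** (1 / p)
    rhs = sum(c_star[k] ** p_dual for k in range(2, len(c_star))) ** (1 / p_dual)
    return lhs, rhs

def chain_constant(p: float) -> float:
    """Explicit constant from tracking the dyadic argument: 2^(2(p-2)/p) * 3^(-1/p')."""
    p_dual = p / (p - 1)
    return 2 ** (2 * (p - 2) / p) * 3 ** (-1 / p_dual)

random.seed(2)
p = 1.5
c_star = sorted((random.expovariate(1.0) / (1 + k) for k in range(256)), reverse=True)
lhs, rhs = dyadic_chain_sides(c_star, p)
print(f"{lhs:.4f} >= {chain_constant(p):.4f} * {rhs:.4f}")
```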

## 2.2. Proof of H-Y on $\mathbb{R}$ (with rearrangements)

Now, before we move to the next section, we need to address the modifications needed when the group is $\mathbb{R}$ instead of $\mathbb{T}$. They are not complicated, but a few technicalities require some care.
First of all, the inequality that takes the place of the preliminary (1) is

$\displaystyle \Big(\int_{\mathbb{R}} |\widehat{f}(\xi)|^p |\xi|^{p-2} \,d\xi \Big)^{1/p} \lesssim_p \|f\|_{L^p(\mathbb{R})}, \ \ \ \ \ \ \ (1')$

in the same range $p \in (1,2]$; the inequality that takes the place of (3) is then

$\displaystyle \Big(\int_{0}^{\infty} [(\widehat{f})^{\ast}(s)]^p s^{p-2} \,ds \Big)^{1/p} \lesssim_p \|f\|_{L^p(\mathbb{R})}, \ \ \ \ \ \ \ (3')$

where $(\widehat{f})^{\ast}(s)$ is the decreasing rearrangement of $\widehat{f}$, which for a given function $g$ is defined as

$\displaystyle g^{\ast}(s) := \inf \big\{t \, : \, |\{\xi \in \mathbb{R} \,:\, |g(\xi)|>t \}| \leq s \big\}$

(it is not obvious that this is the correct choice, though it is the first guess one might make; but the proof goes through in the end). See the post on Lorentz spaces about the properties of these rearrangements.
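For a step function the definition of $g^{\ast}$ can be evaluated directly, and it agrees with the intuitive recipe "sort the values of $|g|$ in decreasing order". Here is a minimal sketch under that assumption: $g$ takes the value `values[i]` on an interval of length `h` and vanishes elsewhere, so $|\{|g| > t\}| = h \cdot \#\{i : |g_i| > t\}$ for $t \geq 0$. Both function names are hypothetical.

```python
import numpy as np

def rearrangement_by_definition(values: np.ndarray, h: float, s: float) -> float:
    """g^*(s) = inf{t >= 0 : |{|g| > t}| <= s} for the step function equal to values[i]
    on an interval of length h (zero elsewhere). The distribution function only jumps
    at the values |g_i|, so the infimum is attained at 0 or at one of them."""
    mags = np.abs(values)
    candidates = np.concatenate(([0.0], np.sort(mags)))
    for t in candidates:  # increasing order, so the first admissible t is the infimum
        if h * np.count_nonzero(mags > t) <= s:
            return float(t)
    raise AssertionError("unreachable: t = max|g| is always admissible")

def rearrangement_by_sorting(values: np.ndarray, h: float, s: float) -> float:
    """The same function via sorting: g^* is the step function taking the values |g_i|
    in decreasing order, each on an interval of length h, and 0 afterwards."""
    mags = np.sort(np.abs(values))[::-1]
    idx = int(s // h)
    return float(mags[idx]) if idx < mags.size else 0.0

rng = np.random.default_rng(3)
values, h = rng.normal(size=10), 0.25
for m in range(12):
    s = (m + 0.5) * h  # midpoints of the steps avoid floating-point ties at the jumps
    assert rearrangement_by_definition(values, h, s) == rearrangement_by_sorting(values, h, s)
print("definition and sorting agree")
```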
The proof of (1′) goes pretty much the same as the proof of (1): the operator $Tf(\xi) := \widehat{f}(\xi) \xi$ is linear and the inequality to be proven is $\|Tf\|_{L^p(|\xi|^{-2}d\xi)} \lesssim_p \|f\|_{L^p}$; when $p=2$ this is again Plancherel, and when $p=1$ the weak inequality $\|Tf\|_{L^{1,\infty}(|\xi|^{-2}d\xi)} \lesssim \|f\|_{L^1}$ can be proven verbatim; Marcinkiewicz interpolation finishes the proof.
However, the proof of (3′) is a bit trickier. Indeed, we can no longer linearise the rearrangement!

It is FALSE in general that there exists a measurable function $\sigma$ such that

$\displaystyle g^{\ast} = g \circ \sigma.$

[Curiously, the reverse is true, in the sense that there always exists a measurable (and measure-preserving) function $\sigma'$ such that

$\displaystyle g = g^{\ast} \circ \sigma';$

unfortunately, this does not help us. See this paper of John V. Ryff for details.]

So we cannot use the same strategy we used for $\mathbb{T}$, and we are stuck with the operator $Tf(s):= (\widehat{f})^{\ast}(s)\, s$, which is not even sub-linear. However, a little reflection reveals that it is not too far from being sub-linear! Indeed, as seen in the post on Lorentz spaces, the decreasing rearrangement of a sum of functions $f+g$ satisfies

$\displaystyle (f+g)^\ast (2s) \leq f^\ast (s) + g^\ast (s);$

applied to our operator $T$ above this shows that

$\displaystyle T(f+g)(s) \leq 2\, Tf\Big(\frac{s}{2}\Big) + 2\, Tg\Big(\frac{s}{2}\Big);$

it is then a simple exercise to show that, for the particular measure spaces involved (that is, $L^p(\mathbb{R})$ and the weighted space $L^{p}((0,\infty), s^{-2}\,ds)$), the Marcinkiewicz interpolation theorem continues to hold for this operator. This means that, as before, it suffices to prove the endpoint inequalities for $p=1,2$, and these are quite simple. For $p=2$ we use the fact that rearrangements satisfy $\|g^{\ast}\|_{L^p} = \|g\|_{L^p}$, together with Plancherel. For the weak $(1,1)$ inequality we use the same argument as before, since $\|(\widehat{f})^{\ast}\|_{L^{\infty}} = \|\widehat{f}\|_{L^{\infty}} \leq \|f\|_{L^1}$ too.
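The quasi-subadditivity $(f+g)^\ast(2s) \leq f^\ast(s) + g^\ast(s)$ is easy to test on step functions with steps of a common length $h$. In that case entry $m$ of the decreasingly sorted $|{\cdot}|$-values is $g^{\ast}$ on $[mh, (m+1)h)$. Here random vectors stand in for $\widehat{f}$ and $\widehat{g}$, which is an assumption made for illustration, and `rearranged_steps` is a hypothetical helper. Multiplying the inequality by $2s$ gives the bound for $T$ stated above.

```python
import numpy as np

def rearranged_steps(v: np.ndarray) -> np.ndarray:
    """Decreasing rearrangement of a step function with steps of common length h:
    entry m is the value of g^*(s) for s in [m h, (m + 1) h)."""
    return np.sort(np.abs(v))[::-1]

rng = np.random.default_rng(4)
f, g = rng.normal(size=40), rng.normal(size=40)
fs, gs, sums = rearranged_steps(f), rearranged_steps(g), rearranged_steps(f + g)

# (f + g)^*(2s) <= f^*(s) + g^*(s), checked at s = m h for every step.
for m in range(20):
    assert sums[2 * m] <= fs[m] + gs[m] + 1e-12
print("quasi-subadditivity holds at all grid points")
```

The reason it works: at most $m$ points have $|f| > f^{\ast}(mh)$ and at most $m$ have $|g| > g^{\ast}(mh)$. Outside these $2m$ points, $|f+g| \leq f^{\ast}(mh) + g^{\ast}(mh)$.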
It remains to show, in order to conclude, that $\|\widehat{f}\|_{L^{p'}}$ is dominated by the LHS of (3′). If we allowed ourselves the use of the theory of Lorentz spaces, we’d see that the LHS of (3′) is nothing but the $L^{p',p}$ norm of $\widehat{f}$, and the desired inequality would be an immediate consequence of the fact that $p \leq p'$ in the range $p \in (1,2]$; however, we are trying to avoid the use of anything too sophisticated so we will use a direct argument instead. The argument works for a generic function $g$, so using the monotonicity of $g^{\ast}$ we write

$\displaystyle \begin{aligned} \Big(\int_{0}^{\infty} [g^{\ast}(s)]^p s^{p-2} \,ds\Big)^{1/{p}} \sim_p & \Big( \sum_{j \in \mathbb{Z}} 2^{j(p-1)} g^{\ast}(2^j)^p \Big)^{1/{p}} \\ = & \Big( \sum_{j \in \mathbb{Z}} 2^{j p/ {p'}} g^{\ast}(2^j)^p \Big)^{1/{p}} \\ \underset{(\ell^{p} \subseteq \ell^{p'})}{\geq} & \Big( \sum_{j \in \mathbb{Z}} 2^{j} g^{\ast}(2^j)^{p'} \Big)^{1/{p'}}. \end{aligned}$

Now let

$\displaystyle J_k := \{j \in \mathbb{Z} \, : \, 2^{k+1} > g^{\ast}(2^j) \geq 2^k \}$

for any $k \in \mathbb{Z}$ (notice that $J_k$ is an interval of integers, by the monotonicity of $g^{\ast}$) and observe that we can therefore compare the last expression above to

$\displaystyle \Big( \sum_{k \in \mathbb{Z}} 2^{kp'} \sum_{j \in J_k} 2^j \Big)^{1/{p'}} \sim \Big( \sum_{k \in \mathbb{Z}} 2^{kp'} 2^{\max J_k} \Big)^{1/{p'}}.$

Again, by the monotonicity of $g^{\ast}$ and definition of $J_k$, we see that $2^{\max J_k} \sim |\{ s \,: \, g^{\ast}(s) \sim 2^k \}|$ and therefore the last expression above is comparable to

$\displaystyle \begin{aligned} & \sim \Big( \sum_{k \in \mathbb{Z}} 2^{kp'} |\{ s \,: \, g^{\ast}(s) \sim 2^k \}| \Big)^{1/{p'}} \\ & \sim \Big( \int_{0}^{\infty} |g^{\ast}(s)|^{p'} \, ds \Big)^{1/{p'}} = \|g\|_{L^{p'}}, \end{aligned}$

and this shows what we wanted if we take $g = \widehat{f}$. The adaptation of the proof of H-Y using rearrangements to $\mathbb{R}$ is concluded.
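The final comparison can be checked on step functions with an explicit constant. This sketch swaps in a slightly more direct route than the $J_k$ bookkeeping, purely to get a constant. On $[2^{j-1}, 2^j]$ we have $g^{\ast}(s) \geq g^{\ast}(2^j)$ and $s^{p-2} \geq 2^{j(p-2)}$, which gives $\int_0^\infty g^{\ast}(s)^p s^{p-2}\,ds \geq \frac12 \sum_j 2^{j(p-1)} g^{\ast}(2^j)^p$. Monotonicity also gives $\|g\|_{L^{p'}}^{p'} \leq \sum_j 2^j g^{\ast}(2^j)^{p'}$. Combined with $\ell^p \subseteq \ell^{p'}$, this yields $\big(\int_0^\infty g^{\ast}(s)^p s^{p-2}\,ds\big)^{1/p} \geq 2^{-1/p} \|g\|_{L^{p'}}$. The code evaluates both sides exactly for a step function with steps of length `h`; the helper names are hypothetical.

```python
import numpy as np

def lorentz_side(values: np.ndarray, h: float, p: float) -> float:
    """(int_0^inf g^*(s)^p s^(p-2) ds)^(1/p) for a step function with steps of length h.

    g^* is constant on [m h, (m+1) h), and the exact integral of s^(p-2) over that interval
    is ((m+1)^(p-1) - m^(p-1)) h^(p-1) / (p-1), which is finite since p > 1.
    """
    g_star = np.sort(np.abs(values))[::-1]
    m = np.arange(g_star.size, dtype=float)
    weights = ((m + 1) ** (p - 1) - m ** (p - 1)) * h ** (p - 1) / (p - 1)
    return float(np.sum(g_star ** p * weights) ** (1 / p))

def lp_norm(values: np.ndarray, h: float, q: float) -> float:
    """L^q(R) norm of the same step function."""
    return float((h * np.sum(np.abs(values) ** q)) ** (1 / q))

rng = np.random.default_rng(5)
values, h = rng.exponential(size=300), 0.01
for p in (1.1, 1.5, 2.0):
    p_dual = p / (p - 1)
    lhs, rhs = lorentz_side(values, h, p), 2 ** (-1 / p) * lp_norm(values, h, p_dual)
    print(f"p = {p}: {lhs:.4f} >= {rhs:.4f}")
```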

# 3. A proof of Hausdorff-Young in finite abelian groups without ANY interpolation (sort of)

This is something that I learned from my BSc advisor who pointed me to Terry Tao’s blog, specifically to his post on the tensor power trick. In particular, I will reproduce here Tao’s proof of the Hausdorff-Young inequality in the setting of a finite abelian group which does not use any interpolation (nor rearrangements). Technically speaking, we will still be performing something that acts roughly like an interpolation, but in the finite setting things simplify so much that our “interpolation step” will become so trivial that one can hardly call it interpolation anymore.

## 3.1. A quick primer on the Fourier transform on finite abelian groups

Before I give you the proof, it is best if I quickly review what the Fourier transform on a finite abelian group is. Let $(G,\oplus)$ be this finite abelian group; then we must define its Pontryagin dual, which is the set $\widehat{G}$ of all functions $G \to \mathbb{U}^1$ (where by $\mathbb{U}^1$ we mean the circle group, that is, the group of complex numbers of modulus 1) that are actually homomorphisms, i.e. that preserve the group operations: a function $\chi : G \to \mathbb{U}^1$ is in $\widehat{G}$ if and only if

$\displaystyle \chi (x \oplus y) = \chi(x) \chi(y)$

for all $x,y \in G$. The elements of $\widehat{G}$ are usually called characters. Observe that $\widehat{G}$ is more than just a set: it is a finite abelian group itself. Indeed, multiplication is commutative in $\mathbb{C}$, so if $\chi,\chi'$ are characters we can define the function $(\chi \chi')(x):= \chi(x)\cdot\chi'(x)$ and see that $(\chi\chi')$ is a homomorphism and hence a character as well⁴. The inverse of a character $\chi$ is easily seen to be the character $\overline{\chi}(x) := \overline{\chi(x)}$, its complex conjugate, and the identity is the so-called trivial character $\chi_0(x) \equiv 1$.
A very important thing about the characters is that they are orthogonal to each other. Indeed, first of all, if we let $dx$ denote the normalised counting measure on $G$, that is

$\displaystyle \int_G f(x) \,dx := \frac{1}{|G|} \sum_{x \in G}f(x),$

then we can see that if $\chi$ is a character, the integral

$\displaystyle Z:= \int_G \chi(x) \,dx = \frac{1}{|G|} \sum_{x \in G} \chi(x)$

is invariant under translations: since $x \mapsto x \oplus y$ is a bijection of $G$, we have $Z = \int_G \chi(x \oplus y)\,dx = \chi(y) \int_G \chi(x)\,dx$, so that for all $y \in G$

$\displaystyle Z \chi(y) = Z.$

As a consequence, either $\chi$ is the trivial character $\chi_0(y) \equiv 1$ (in which case it can be seen that $Z = 1$) or ${Z}$ must be zero. This has the following consequence: since

$\displaystyle \begin{aligned} \langle \chi,\chi'\rangle_{L^2(G)} = & \int_G \chi(x) \overline{\chi'(x)} \,dx \\ = & \frac{1}{|G|} \sum_{x \in G}\chi(x)\overline{\chi'(x)} \\ = & \frac{1}{|G|} \sum_{x \in G} (\chi\overline{\chi'})(x), \end{aligned}$

and $\chi\overline{\chi'}$ is a character, the above is always equal to ${0}$, unless $\chi\overline{\chi'} = \chi_0$ (the trivial character), in which case clearly $\chi = \chi'$. This is precisely an orthogonality relation, and it is not hard to show that the characters form a basis of $L^2(G)$ (this is the finite abelian case of the Peter-Weyl theorem).
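For a concrete instance, take $G = \mathbb{Z}_N$, whose characters are the familiar $\chi_m(x) = e^{2\pi i m x/N}$, $m = 0, \ldots, N-1$. The sketch below checks the homomorphism property and that the Gram matrix of the characters, taken with respect to the normalised counting measure, is the identity. The choice $N = 10$ is arbitrary.

```python
import numpy as np

N = 10
x = np.arange(N)
# Characters of Z_N: chi_m(x) = exp(2 pi i m x / N); row m of `chars` is chi_m.
chars = np.exp(2j * np.pi * np.outer(np.arange(N), x) / N)

# Homomorphism property: chi(x + y) = chi(x) chi(y), with addition mod N.
m, a, b = 3, 4, 8
assert np.isclose(chars[m, (a + b) % N], chars[m, a] * chars[m, b])

# Gram matrix <chi_m, chi_n> = (1/N) sum_x chi_m(x) conj(chi_n(x)) is the identity.
gram = chars @ chars.conj().T / N
assert np.allclose(gram, np.eye(N))
print("characters of Z_10 are orthonormal")
```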
The Fourier transform is then defined as one would expect, that is simply as the projection against the basis of characters: the Fourier transform of $f : G \to \mathbb{C}$ is the function $\widehat{f} : \widehat{G} \to \mathbb{C}$ defined by

$\displaystyle \widehat{f}(\chi):= \langle f ,\chi \rangle_{L^2(G)} = \frac{1}{|G|} \sum_{x \in G} f(x) \overline{\chi(x)}.$

It is easy to verify that this Fourier transform corresponds very closely to the usual one on $\mathbb{R}^d$ or $\mathbb{T}^d$, in the sense that a number of standard identities hold for it:

1. Translations become modulations and vice versa under the action of the Fourier transform: that is

$\displaystyle \widehat{f(\cdot \oplus y^{-1})}(\chi) = \overline{\chi(y)}\, \widehat{f}(\chi)$

and

$\displaystyle \widehat{(\chi' f)}(\chi) = \widehat{f}(\chi \oplus (\chi')^{-1}).$

2. For the convolution of functions

$\displaystyle f \ast g(x) := \int_G f(y)g(x \oplus y^{-1}) \,dy = \frac{1}{|G|} \sum_{y \in G} f(y) g(x \oplus y^{-1}),$

we have that

$\displaystyle \widehat{f \ast g}(\chi) = \widehat{f}(\chi) \widehat{g}(\chi).$

3. The Inverse Fourier transform of $F : \widehat{G} \to \mathbb{C}$ is

$\displaystyle \check{F}(x) := \sum_{\chi \in \widehat{G}} F(\chi)\chi(x);$

applied to $\widehat{f}$, this gives the Fourier inversion formula

$\displaystyle f(x) = \sum_{\chi \in \widehat{G}} \widehat{f}(\chi)\chi(x);$

(notice how trivial this statement is in the finite setting: identifying functions with vectors in $\mathbb{C}^{|G|}$, it simply says that a vector is the sum of its components in the specific orthonormal basis we are considering).

4. Plancherel and Parseval’s formulas hold, that is

$\displaystyle \|f\|_{L^2(G)} = \|\widehat{f}\|_{\ell^2(\widehat{G})}$

(notice that on the RHS we have $\ell^2$ summation, that is non-normalised, unlike at the LHS) and

$\displaystyle \langle f, g\rangle_{L^2(G)} = \langle \widehat{f}, \widehat{g}\rangle_{\ell^2(\widehat{G})}.$

5. The trivial Hausdorff-Young inequality holds, that is

$\displaystyle \|\widehat{f}\|_{\ell^\infty(\widehat{G})} \leq \|f\|_{L^1(G)}.$
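All five properties can be checked numerically on $G = \mathbb{Z}_N$. Here characters are indexed by $m$ as $\chi_m(x) = e^{2\pi i m x/N}$, and $\widehat{f}(\chi_m) = \frac{1}{N}\sum_x f(x)\overline{\chi_m(x)}$ is `np.fft.fft(f) / N`. The group choice, random test data and helper names are assumptions of this sketch. Note that, with the convention $\widehat{f}(\chi) = \langle f, \chi\rangle$, translating by $y$ produces the factor $\overline{\chi(y)}$.

```python
import numpy as np

rng = np.random.default_rng(6)
N = 12
f = rng.normal(size=N) + 1j * rng.normal(size=N)
g = rng.normal(size=N) + 1j * rng.normal(size=N)
chars = np.exp(2j * np.pi * np.outer(np.arange(N), np.arange(N)) / N)  # chars[m, x] = chi_m(x)

def hat(u: np.ndarray) -> np.ndarray:
    """u^(chi_m) = <u, chi_m> = (1/N) sum_x u(x) conj(chi_m(x))."""
    return np.fft.fft(u) / u.size

def conv(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """(u * v)(x) = (1/N) sum_y u(y) v(x - y)."""
    n = u.size
    y = np.arange(n)
    return np.array([np.dot(u, v[(x - y) % n]) for x in range(n)]) / n

# 1. Translation by y (np.roll gives x -> u(x - y)) becomes multiplication by conj(chi(y)),
#    and modulation by chi_j becomes a shift of the Fourier transform by j.
y, j = 5, 4
assert np.allclose(hat(np.roll(f, y)), np.conj(chars[:, y]) * hat(f))
assert np.allclose(hat(chars[j] * f), np.roll(hat(f), j))
# 2. Convolution becomes multiplication.
assert np.allclose(hat(conv(f, g)), hat(f) * hat(g))
# 3. Fourier inversion: f(x) = sum_chi f^(chi) chi(x).
assert np.allclose(hat(f) @ chars, f)
# 4. Plancherel and Parseval (normalised measure on G, counting measure on the dual).
assert np.isclose(np.sqrt(np.mean(np.abs(f) ** 2)), np.sqrt(np.sum(np.abs(hat(f)) ** 2)))
assert np.isclose(np.mean(f * np.conj(g)), np.sum(hat(f) * np.conj(hat(g))))
# 5. Trivial Hausdorff-Young: sup |f^| <= ||f||_{L^1(G)}.
assert np.max(np.abs(hat(f))) <= np.mean(np.abs(f)) + 1e-12
print("all identities verified on Z_12")
```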

That said, we are ready to prove H-Y for a finite abelian group without using interpolation (or, if we are being completely honest, using so little interpolation that it cannot even be called that).

## 3.2. Proof of H-Y in finite abelian groups without interpolation

In this subsection we will prove that for all $1 \leq p \leq 2$ we have

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \lesssim \|f\|_{L^p(G)},$

for any finite abelian group $G$, where the constant hidden in $\lesssim$ is independent of $G$.

First of all, we show that we can prove H-Y when $f$ takes a simple form. Namely, we assume that there are $k \in \mathbb{Z}$ and a set $A \subseteq G$ such that $f$ vanishes outside $A$ and

$\displaystyle 2^k \leq |f(x)| < 2^{k+1} \quad \text{ for all } x \in A.$

In this case, we have for the Fourier transform $\widehat{f}$ that

$\displaystyle \|\widehat{f}\|_{\ell^2(\widehat{G})} = \Big(\frac{1}{|G|} \sum_{x \in G} |f(x)|^2\Big)^{1/2} \leq 2^{k+1} \Big(\frac{|A|}{|G|}\Big)^{1/2}$

by Plancherel, and

$\displaystyle \|\widehat{f}\|_{\ell^\infty(\widehat{G})} \leq \frac{1}{|G|} \sum_{x \in G} |f(x)| \leq 2^{k+1} \frac{|A|}{|G|}$

by the trivial H-Y inequality. By the logarithmic convexity of the $\ell^p$ norms (which is a very primitive form of interpolation, if you will) we then have that

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \leq \|\widehat{f}\|_{\ell^2(\widehat{G})}^{1-\theta} \|\widehat{f}\|_{\ell^\infty(\widehat{G})}^{\theta} \leq 2^{k+1} \Big(\frac{|A|}{|G|}\Big)^{(1-\theta)/2} \Big(\frac{|A|}{|G|}\Big)^{\theta},$

where $\theta$ is such that $\frac{1}{p'} = \frac{1-\theta}{2} + \frac{\theta}{\infty}$. Hence $\theta = 1 - 2/{p'}$ and a little algebra reveals that

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \leq 2^{k+1} \Big(\frac{|A|}{|G|}\Big)^{1/p}.$
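
Spelled out, the little algebra is as follows. From $\frac{1}{p'} = \frac{1-\theta}{2}$ we get $1 - \theta = \frac{2}{p'}$, and $\theta = 1 - \frac{2}{p'}$ lies in $[0,1]$ because $2 \leq p' \leq \infty$, so the logarithmic convexity above is legitimately applied. The exponent of $\frac{|A|}{|G|}$ is then

$\displaystyle \frac{1-\theta}{2} + \theta = \frac{1}{p'} + \Big(1 - \frac{2}{p'}\Big) = 1 - \frac{1}{p'} = \frac{1}{p}.$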

On the other hand, by our hypotheses on $f$ we have

$\displaystyle \|f\|_{L^p(G)} = \Big(\frac{1}{|G|} \sum_{x \in G} |f(x)|^p\Big)^{1/p} \geq 2^k \Big(\frac{|A|}{|G|}\Big)^{1/p},$

and this therefore proves that

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \leq 2 \|f\|_{L^p(G)} \ \ \ \ \ \ (4)$

for the very special functions ${f}$ that we are working with.
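
The chain of inequalities for these special functions can be checked numerically. The sketch below assumes $G = \mathbb{Z}/N\mathbb{Z}$ with the normalised transform $\widehat{f}(\xi) = \frac{1}{N}\sum_x f(x) e^{-2\pi i x \xi/N}$; the function `level_set_check` and its parameters (`N`, `k`, `density`, `seed`) are hypothetical. It builds a random $f$ with $2^k \mathbf{1}_A \leq |f| \leq 2^{k+1}\mathbf{1}_A$ and compares $\|\widehat f\|_{\ell^{p'}}$, the log-convexity bound $\|\widehat f\|_{\ell^2}^{1-\theta}\|\widehat f\|_{\ell^\infty}^{\theta}$, the level-set bound $2^{k+1}(|A|/|G|)^{1/p}$ and $2\|f\|_{L^p(G)}$.

```python
import cmath
import random

def dft(f):
    """Normalised Fourier transform on Z/N (assumed convention)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
            for xi in range(n)]

def L_norm(f, p):
    """Normalised L^p(G) norm."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1 / p)

def ell_norm(g, p):
    """Counting-measure l^p norm on the dual group."""
    return sum(abs(v) ** p for v in g) ** (1 / p)

def level_set_check(N, k, density, p, seed=1):
    """Build f with 2^k 1_A <= |f| <= 2^(k+1) 1_A (p in (1, 2]) and return the compared quantities."""
    rng = random.Random(seed)
    A = [x for x in range(N) if rng.random() < density] or [0]   # keep A non-empty
    f = [0j] * N
    for x in A:   # random modulus in [2^k, 2^(k+1)] and random phase
        f[x] = rng.uniform(2 ** k, 2 ** (k + 1)) * cmath.exp(2j * cmath.pi * rng.random())
    fhat = dft(f)
    q = p / (p - 1)                     # dual exponent p'
    theta = 1 - 2 / q                   # from 1/p' = (1 - theta)/2 + theta/infinity
    lhs = ell_norm(fhat, q)
    interpolated = ell_norm(fhat, 2) ** (1 - theta) * max(abs(v) for v in fhat) ** theta
    level_bound = 2 ** (k + 1) * (len(A) / N) ** (1 / p)
    return lhs, interpolated, level_bound, L_norm(f, p)

lhs, interpolated, level_bound, f_norm = level_set_check(N=32, k=3, density=0.3, p=1.5)
print(lhs <= interpolated + 1e-9, interpolated <= level_bound + 1e-9, lhs <= 2 * f_norm)
```

All three comparisons should hold; the last one is weaker than necessary, since H-Y actually holds with constant 1.
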
In order to extend this result to general functions, we will decompose them into pieces like the above and use the triangle inequality to sum over all the pieces; this will introduce an inefficient constant that grows with $|G|$, which is bad, but we will then use the fact that the Fourier transform tensorises to remove it with the tensor power trick.
For $k \in \mathbb{Z}$, let then

$\displaystyle A_k := \{x \in G \, : \, 2^{k} \leq |f(x)| < 2^{k+1} \},$

and correspondingly define $f_k := f \mathbf{1}_{A_k}$. Each $f_k$ is a function of the type above. We observe first that the pieces for which $2^k$ is small do not contribute much at all. Indeed, if $\|g\|_{L^\infty(G)} \leq K$ we see that (by the basic inequality $\|\cdot\|_{\ell^p} \leq \|\cdot\|_{\ell^q}$ when $p \geq q$, by Plancherel, and because $L^2(G)$ carries the normalised measure)

$\displaystyle \|\widehat{g}\|_{\ell^{p'}(\widehat{G})} \leq \|\widehat{g}\|_{\ell^{2}(\widehat{G})} = \|g\|_{L^2(G)} \leq K;$

if $K \leq \|f\|_{L^p(G)}$ the above is at most $\|f\|_{L^p(G)}$. What this means is that we can lump together all the pieces with $2^{k+1} \leq \|f\|_{L^p(G)}$, since on their union we have $|f| < \|f\|_{L^p(G)}$.
There are other indices that we can forget about (those where $k$ is very large), because the corresponding sets $A_k$ will be empty: indeed, we have that

$\displaystyle \|f\|_{L^\infty(G)} \leq \Big(\sum_{x \in G} |f(x)|^p \Big)^{1/p} = \|f\|_{L^p(G)} |G|^{1/p},$

and therefore the only values of $k$ that are relevant are those for which $k \leq \log_2 \|f\|_{L^p(G)} + \frac{1}{p} \log_2 |G|$.
In light of the above observations, our chosen decomposition of $f$ is as follows:

$\displaystyle f = g + \sum_{k = K_1}^{K_2} f_k,$

where

$\displaystyle \begin{aligned} K_1 & = \lfloor \log_2 \|f\|_{L^p(G)} \rfloor, \\ K_2 & = \Big\lfloor \log_2 \|f\|_{L^p(G)} + \frac{1}{p} \log_2 |G| \Big\rfloor, \end{aligned}$

and where $g := f \mathbf{1}_{\{|f| < 2^{K_1}\}}$ satisfies $|g| \leq \|f\|_{L^p(G)}$ pointwise. Then we have $\|\widehat{g}\|_{\ell^{p'}(\widehat{G})} \leq \|f\|_{L^p(G)}$ as seen above, and since $|f_k| \leq |f|$ we have by (4) and the triangle inequality that

$\displaystyle \Big\|\sum_{k = K_1}^{K_2} \widehat{f_k} \Big\|_{\ell^{p'}(\widehat{G})} \leq 2 \sum_{k= K_1}^{K_2} \|f_k\|_{L^p(G)} \leq 2 (K_2 - K_1 + 1) \|f\|_{L^p(G)}.$

Observe that $K_2 - K_1 \leq 1 + \frac{1}{p} \log_2 |G| \leq 1 + \log_2 |G|$, so putting everything together we have shown that (for some absolute constant $C>0$)

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \leq C (1 + \log |G|) \|f\|_{L^p(G)}, \ \ \ \ \ \ \ (5)$

which as anticipated is the H-Y inequality with a bad constant that grows with the cardinality $|G|$.
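
A minimal sketch of this dyadic decomposition on the assumed group $G = \mathbb{Z}/N\mathbb{Z}$, with $\widehat{f}(\xi) = \frac{1}{N}\sum_x f(x) e^{-2\pi i x\xi/N}$. It uses the thresholds $K_1 = \lfloor \log_2 \|f\|_{L^p(G)} \rfloor$, so that the leftover $g$ satisfies $|g| < 2^{K_1} \leq \|f\|_{L^p(G)}$ and hence $\|\widehat g\|_{\ell^{p'}} \leq \|g\|_{L^2(G)} \leq \|f\|_{L^p(G)}$ (normalised $L^2(G)$), and $K_2 = \lfloor \log_2 \|f\|_{L^p(G)} + \frac1p \log_2|G| \rfloor + 1$, where the extra level is only a guard against floating-point rounding. The helper `dyadic_decomposition` and the test data are hypothetical.

```python
import cmath
import math
import random

def dft(f):
    """Normalised Fourier transform on Z/N: fhat(xi) = (1/N) sum_x f(x) e^{-2 pi i x xi / N}."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
            for xi in range(n)]

def L_norm(f, p):
    """Normalised L^p(G) norm."""
    return (sum(abs(v) ** p for v in f) / len(f)) ** (1 / p)

def ell_norm(g, p):
    """Counting-measure l^p norm on the dual group."""
    return sum(abs(v) ** p for v in g) ** (1 / p)

def dyadic_decomposition(f, p):
    """Return g, {k: f_k}, K1, K2 with f = g + sum_k f_k and f_k = f * 1_{A_k}."""
    norm = L_norm(f, p)
    K1 = math.floor(math.log2(norm))
    # largest relevant level, plus one spare level in case of rounding
    K2 = math.floor(math.log2(norm) + math.log2(len(f)) / p) + 1
    g = [v if abs(v) < 2 ** K1 else 0 for v in f]          # the harmless small part
    pieces = {k: [v if 2 ** k <= abs(v) < 2 ** (k + 1) else 0 for v in f]
              for k in range(K1, K2 + 1)}
    return g, pieces, K1, K2

random.seed(2)
N, p = 64, 1.25
q = p / (p - 1)                                            # dual exponent p'
# values spread over several dyadic scales, so that several levels are populated
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) * random.choice([1, 10, 100])
     for _ in range(N)]
g, pieces, K1, K2 = dyadic_decomposition(f, p)
norm = L_norm(f, p)

recovered = [g[x] + sum(piece[x] for piece in pieces.values()) for x in range(N)]
print(recovered == f)                                      # exact: disjoint supports
print(ell_norm(dft(g), q) <= norm)                         # small part costs at most ||f||_p
print(len(pieces) <= 3 + math.log2(N))                     # O(log |G|) dyadic levels
print(ell_norm(dft(f), q) <= (1 + 2 * len(pieces)) * norm) # the lossy bound behind (5)
```

The number of levels, and hence the loss, grows like $\log |G|$, which is exactly what the tensor power trick is designed to remove.
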

Now we will finally use the tensor power trick to remove this unfortunate logarithm. The first thing to notice is that $G^n = G \times \ldots \times G$ is again a finite abelian group: the group operation is simply $\oplus$ applied componentwise, so we can still denote it by $\oplus$. The second thing to notice is that $\widehat{G^n} = (\widehat{G})^n$, because the characters of $G^n$ are all necessarily of the form $\chi_1 \otimes \ldots \otimes \chi_n$ for $\chi_j \in \widehat{G}$, where

$\displaystyle \chi_1 \otimes \ldots \otimes \chi_n (x_1, \ldots, x_n) := \chi_1(x_1) \cdot \ldots \cdot \chi_n(x_n)$

is the tensor product of the single-variable characters. If we denote by $f^{\otimes n} = f \otimes \ldots \otimes f$ the tensor product of ${f}$ with itself ${n}$ times, then we can see that we have for its Fourier transform (on the group $G^n$)

$\displaystyle \widehat{f^{\otimes n}} = (\widehat{f})^{\otimes n};$

in words, the Fourier transform of the tensored ${f}$ is the tensored Fourier transform of ${f}$ itself. Now, since $G^n$ is a finite abelian group, we have by (5) that

$\displaystyle \| \widehat{f^{\otimes n}} \|_{\ell^{p'}(\widehat{G^n})} \leq C (1 + n \log |G|) \| f^{\otimes n}\|_{L^p(G^n)}$

(notice the factor of $n$ that appeared, since $\log |G^n| = n \log |G|$). It is trivial to check that $\| \widehat{f^{\otimes n}} \|_{\ell^{p'}(\widehat{G^n})} = \| (\widehat{f})^{\otimes n} \|_{\ell^{p'}(\widehat{G^n})} = \| \widehat{f} \|_{\ell^{p'}(\widehat{G})}^n$ and that $\| f^{\otimes n}\|_{L^p(G^n)} = \| f\|_{L^p(G)}^n$, and therefore by taking $n$-th roots of the last inequality we obtain the improved version of (5) given by

$\displaystyle \|\widehat{f}\|_{\ell^{p'}(\widehat{G})} \leq C^{1/n} (1 + n \log |G|)^{1/n} \|f\|_{L^p(G)}.$

However, nothing prevents us from taking $n$ arbitrarily large. Doing so, we see that $C^{1/n} (1 + n \log |G|)^{1/n} \to 1$ as $n \to \infty$, and thus we have proven H-Y with constant 1, essentially without using any interpolation!
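
A quick numerical sketch of the two ingredients, assuming $G = \mathbb{Z}/N\mathbb{Z}$ with the normalised transform $\widehat{f}(\xi) = \frac{1}{N}\sum_x f(x)e^{-2\pi i x\xi/N}$: first, on $G \times G$ the Fourier transform of $f \otimes f$ is $\widehat f \otimes \widehat f$, so $\ell^{p'}$ norms multiply (only $n = 2$ is checked, to keep the naive transform cheap); second, the loss $(C(1 + n\log|G|))^{1/n}$ decreases towards 1. The values $C = 10$ and $|G| = 1000$ are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import cmath
import math
import random

def dft(f):
    """Normalised Fourier transform on Z/N (assumed convention)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n)) / n
            for xi in range(n)]

def dft2(F):
    """Normalised Fourier transform on Z/N x Z/N; characters are chi_a(x) * chi_b(y)."""
    n = len(F)
    return [[sum(F[x][y] * cmath.exp(-2j * cmath.pi * (x * a + y * b) / n)
                 for x in range(n) for y in range(n)) / n ** 2
             for b in range(n)] for a in range(n)]

def tensor(u, v):
    """Tensor product: (u tensor v)(x, y) = u(x) * v(y)."""
    return [[ux * vy for vy in v] for ux in u]

def ell_norm2(rows, p):
    """Counting-measure l^p norm of a function on the dual of Z/N x Z/N."""
    return sum(abs(v) ** p for row in rows for v in row) ** (1 / p)

random.seed(3)
N, q = 6, 3.0
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
fhat = dft(f)
lhs, rhs = dft2(tensor(f, f)), tensor(fhat, fhat)
print(max(abs(a - b) for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb)) < 1e-9)
print(math.isclose(ell_norm2(rhs, q), sum(abs(v) ** q for v in fhat) ** (2 / q)))

# the constant C (1 + n log|G|) from (5), after taking n-th roots
C, G_size = 10.0, 1000
roots = [(C * (1 + n * math.log(G_size))) ** (1 / n) for n in (1, 10, 100, 10_000)]
print(roots)   # decreasing towards 1
```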

Unfortunately, the trick only applies to the finite abelian case.

# 4. Why does this matter?

You might rightfully think that all of the above amounts to nothing. Why bother looking for other proofs of H-Y, specifically proofs that avoid complex interpolation or use no interpolation at all? There’s nothing wrong with using complex interpolation after all, as the Riesz-Thorin theorem is rather easy to prove. Sure, some people have a fondness for finding the simplest possible proof of a theorem, but is the reward worth the effort, given that nothing too complicated is going on? All the proofs fit in a few pages, not 200 or so.

Well, as it turns out, there are situations in which interpolation is not available at all! One of these is Nonlinear Fourier Analysis, in which one constructs certain non-linear transforms (often called scattering transforms) that do for certain special non-linear PDEs what the ordinary Fourier transform does for linear PDEs. By the latter I mean that one can take a Fourier transform in the spatial variable and turn the PDE into a collection of ODEs. For example, for the free Schrödinger equation $i\partial_t u = \Delta u$, taking the Fourier transform in $x$ gives, for each fixed frequency $\xi$, the ODE $i\frac{d}{dt} \widehat{u}(\xi;t) = -4 \pi^2 |\xi|^2 \widehat{u}(\xi;t)$ in the variable $t$, which has the solution $\widehat{u}(\xi;t) = \widehat{u_0}(\xi) e^{i 4\pi^2 t |\xi|^2}$; Fourier inversion then gives, loosely speaking, the propagator formula $u(x,t) = \int \widehat{u_0}(\xi) e^{2\pi i (x \cdot \xi + 2\pi t |\xi|^2)} \, d\xi$. Suitable nonlinear Fourier transforms / scattering transforms can do a similar job for PDEs such as the Korteweg-de Vries equation $\partial_t u + \partial_x^3 u = 6 u \partial_x u$ and the cubic non-linear Schrödinger equation(s) $i \partial_t u = \Delta u \pm |u|^2 u$.
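
Here is a small sketch of this mechanism, placed on the torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ instead of $\mathbb{R}^d$ (an assumption made so that the Fourier side is a finite sum): each Fourier coefficient is multiplied by $e^{i 4\pi^2 t |\xi|^2}$, and the resulting $u$ is checked against $i\partial_t u = \partial_x^2 u$ using central finite differences, and against conservation of $\|u(\cdot,t)\|_{L^2}$, as Plancherel predicts. The coefficients in `u0_hat` and the helper names are hypothetical.

```python
import cmath
import math

# hypothetical initial datum on the torus R/Z, given by finitely many Fourier coefficients
u0_hat = {-2: 0.5, -1: 1j, 0: 0.3, 1: 1.0, 3: -0.25 + 0.5j}

def u(x, t):
    """Each coefficient evolves by the phase e^{i 4 pi^2 t xi^2}; then we invert."""
    return sum(c * cmath.exp(2j * math.pi * (x * xi + 2 * math.pi * t * xi ** 2))
               for xi, c in u0_hat.items())

def pde_residual(x, t, ht=1e-5, hx=1e-4):
    """|i u_t - u_xx| at (x, t), both derivatives by central finite differences."""
    u_t = (u(x, t + ht) - u(x, t - ht)) / (2 * ht)
    u_xx = (u(x + hx, t) - 2 * u(x, t) + u(x - hx, t)) / hx ** 2
    return abs(1j * u_t - u_xx)

def mass(t, samples=16):
    """Mean of |u(., t)|^2 on a grid; exact here since samples exceeds the largest frequency gap (5)."""
    return sum(abs(u(j / samples, t)) ** 2 for j in range(samples)) / samples

print(max(pde_residual(x, t) for x in (0.1, 0.37, 0.8) for t in (0.0, 0.05, 0.3)))
print([mass(t) for t in (0.0, 0.1, 1.7)])   # all equal to sum |c|^2 up to rounding
```

The residual is small compared with the size of $\partial_x^2 u$ (which here can reach a few hundred), and the discretisation error is the only source of mismatch.
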
These nonlinear Fourier transforms (I am deliberately avoiding defining them because they will be the subject of a future post) can be shown to satisfy estimates analogous to Plancherel and the trivial H-Y inequality, and therefore the question naturally arises as to whether they satisfy a full H-Y inequality for exponents in between $p=1$ and $p=2$. However, the answer is far from trivial! Since the transforms are intrinsically non-linear objects, there is no known interpolation theory available for them. A fundamental technique that is the bread and butter of every respectable harmonic analyst is suddenly gone, and one is left to pick up the pieces.
Interestingly, the full H-Y inequality for (some of) these transforms has been proven, but the proof relies on the Christ-Kiselev argument, which is surprising in itself. You can find more details in these notes by Tao and Thiele, which are a go-to reference for the subject. A byproduct of the Christ-Kiselev argument, as we have experienced ourselves, is that one obtains a constant $C_p$ for the H-Y inequality that blows up as $p \to 2^{-}$, while on the other hand the inequality is true with a finite constant for $p=2$. It is still an open problem whether this behaviour of the constant $C_p$ is real or not; intuition and some partial results (e.g. this paper by Kovač, Oliveira e Silva and Rupčić) suggest that it is not, i.e. that $C_p$ should remain bounded as $p \to 2^{-}$. This is a real-life situation in which the above questions Q1 and Q2 become very relevant: when there is no interpolation available, we are forced to do without, and one way to get there is to go back to the settings we are familiar with and try to do without it there too, hoping we can learn something useful or at least get some inspiration. This, to me, was justification enough to start pondering these questions; of course, nothing I said in this post has any consequences for the nonlinear H-Y constant problem.

Footnotes:
1: An interesting observation is that one can modify the construction in Exercise 7 and get rid of Khintchine’s inequality, making the proof entirely elementary. What one does is take a function of the form $f(x) := \sum_{j=1}^{N} e^{2\pi i \xi_j x} \varphi(x - x_j)$ with ${\varphi, \widehat{\varphi}}$ both sufficiently concentrated around the origin. Taking the ${x_j}$ sufficiently far from each other we can make the different terms of the sum essentially disjoint in space, so that $\|f\|_{L^p} \sim N^{1/p}$; taking the ${\xi_j}$ sufficiently far from each other we can make the different terms of the sum essentially disjoint in frequency as well, so that $\|\widehat{f}\|_{L^{p'}} \sim N^{1/{p'}}$. Ultimately, the Hausdorff-Young inequality fails above 2 because it is possible for many functions to be almost orthogonal both in space and in frequency, and because for ${p > 2}$ we have ${p > 2 > p'}$ and hence $\frac{1}{p'} > \frac{1}{p}$, so that the ratio $\|\widehat{f}\|_{L^{p'}} / \|f\|_{L^p} \sim N^{1/p' - 1/p}$ is unbounded as $N \to \infty$.
2: For the sake of clarity, let me restate that $p_\theta$ is given by

$\displaystyle \frac{1}{p_\theta}= \frac{1 - \theta}{p_0} + \frac{\theta}{p_1},$

and similarly for $q_\theta$.
3: It was the very first version of the Hausdorff-Young inequality to appear besides the endpoints, and it was Hausdorff who later extended it to the full range.
4: A more familiar notation can be obtained by observing that every character $\chi$ can be written as $\chi(x) = e^{2\pi i \xi(x)}$ for some function $\xi : G \to \mathbb{R} / \mathbb{Z}$, and noticing that $\xi$ is then an additive homomorphism and that the sum of two such $\xi$’s again gives rise to an additive homomorphism and hence a character. However, this is unnecessary.