# Matematički blogovi

### Pointwise ergodic theorems for non-conventional bilinear polynomial averages

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

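**Theorem 1 (Birkhoff ergodic theorem)** Let $(X,\mu,T)$ be a measure-preserving system (by which we mean $(X,\mu)$ is a $\sigma$-finite measure space, and $T: X \to X$ is invertible and measure-preserving), and let $f \in L^p(X)$ for any $1 \leq p < \infty$. Then the averages $\frac{1}{N} \sum_{n=1}^N f(T^n x)$ converge pointwise for $\mu$-almost every $x \in X$.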
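To get a feel for the statement, here is a minimal numerical sketch, assuming the averages in question are the usual ones $\frac{1}{N} \sum_{n=1}^N f(T^n x)$. The system (an irrational rotation of the circle), the observable, and all names below are illustrative choices and are not taken from the paper.

```python
import math

# Hypothetical example system: rotation of the circle [0, 1) by an irrational angle.
ALPHA = math.sqrt(2) - 1

def rotation(x):
    """The measure-preserving map T(x) = x + ALPHA mod 1."""
    return (x + ALPHA) % 1.0

def f(x):
    """An observable with mean zero with respect to Lebesgue measure."""
    return math.cos(2 * math.pi * x)

def birkhoff_average(f, T, x, N):
    """Compute (1/N) * sum_{n=1}^{N} f(T^n x) by iterating T."""
    total = 0.0
    y = x
    for _ in range(N):
        y = T(y)
        total += f(y)
    return total / N

# The averages should approach the space average of f, which is 0 here.
for N in (10, 100, 1000, 10000):
    print(N, birkhoff_average(f, rotation, 0.1, N))
```

For this particular example the convergence can be checked by hand by summing a geometric series; the content of the theorem is that pointwise convergence persists for every measure-preserving system and every $f$ in the stated class.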

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.

The above theorem was generalized by Bourgain (conceding the endpoint $p=1$, where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

**Theorem 2 (Pointwise ergodic theorem for polynomial averages)** Let $(X,\mu,T)$ be a measure-preserving system, and let $f \in L^p(X)$ for any $1 < p < \infty$. Let $P$ be a polynomial with integer coefficients. Then the averages $\frac{1}{N} \sum_{n=1}^N f(T^{P(n)} x)$ converge pointwise for $\mu$-almost every $x \in X$.

For bilinear averages, we have a separate 1990 result of Bourgain (for $L^\infty$ functions), extended to other $L^p$ spaces by Lacey, and with an alternate proof given by Demeter:

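**Theorem 3 (Pointwise ergodic theorem for two linear polynomials)** Let $(X,\mu,T)$ be a measure-preserving system with finite measure, and let $f \in L^{p_1}(X)$, $g \in L^{p_2}(X)$ for some $1 < p_1, p_2 \leq \infty$ with $\frac{1}{p_1} + \frac{1}{p_2} < \frac{3}{2}$. Then for any integers $a, b$, the averages $\frac{1}{N} \sum_{n=1}^N f(T^{an} x) g(T^{bn} x)$ converge pointwise almost everywhere.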

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

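**Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial)** Let $(X,\mu,T)$ be a measure-preserving system, and let $f \in L^{p_1}(X)$, $g \in L^{p_2}(X)$ for some $1 < p_1, p_2 < \infty$ with $\frac{1}{p_1} + \frac{1}{p_2} \leq 1$. Then for any polynomial $P$ with integer coefficients of degree $d \geq 2$, the averages $\frac{1}{N} \sum_{n=1}^N f(T^n x) g(T^{P(n)} x)$ converge pointwise almost everywhere.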

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with , but we will not discuss these extensions here. A good model case to keep in mind is when and (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared in common by Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if the functions involved are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to studying the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which with counting measure, and ). A model problem is to establish the maximal inequality

where ranges over powers of two and is the bilinear operator The single scale estimate or equivalently (by duality) is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales .

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when , , where is a small modulus, are such that , is a smooth cutoff to an interval of length , and is also supported on and behaves like a constant on intervals of length . Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when are supported on a common interval of length and are normalised in rather than . (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the factors; the corresponding statement for was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the -normalized version of (2)).

For our applications we had to extend the inverse theory of Peluse and Prendiville to an theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

can be well approximated in by a function that has Fourier support on “major arcs” if enjoy control. To get the required extension to in the aspect one has to improve the control on the error from to ; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent improving estimates of Han, Kovač, Lacey, Madrid, and Yang for linear averages such as , one can relax the hypothesis on to an hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function ; a modification of the arguments also gives something similar for .

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when are supported near rational numbers with for some moderately large . The inverse theory gives good control (with an exponential decay in ) on individual scales , and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of to eventually handle all “small” scales, with ranging up to say where for some small constant and large constant . For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator , and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers to the locally compact abelian group . Actually it was conceptually clearer for us to work instead with the adelic integers , which is the inverse limit of the . Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

on , and the “arithmetic” bilinear operator on the profinite integers , equipped with probability Haar measure . After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an improving estimate for some . Splitting the profinite integers into the product of the -adic integers , it suffices to establish this claim for each separately (so long as we keep the implied constant equal to for sufficiently large ). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the -adic field , which are a variant of some estimates of Kowalski and Wright.

### Higher uniformity of bounded multiplicative functions in short intervals on average

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the *local Fourier uniformity conjecture* for bounded multiplicative functions such as the Liouville function . One form of this conjecture is the assertion that

The conjecture gets more difficult as increases, and also becomes more difficult the more slowly grows with . The conjecture is equivalent to the assertion

which was proven (for arbitrarily slowly growing ) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For , the conjecture is equivalent to the assertion

This remains open for sufficiently slowly growing (and it would be a major breakthrough in particular if one could obtain this bound for as small as for any fixed , particularly if applicable to more general bounded multiplicative functions than , as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range (in fact we have since worked out in the current paper that we can get as small as ). In our current paper we establish the Fourier uniformity conjecture for higher for the same range of . This in particular implies local orthogonality to polynomial phases, where denotes the polynomials of degree at most , but the full conjecture is a bit stronger than this, establishing the more general statement for any degree filtered nilmanifold and Lipschitz function , where now ranges over polynomial maps from to . The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.)

We quickly sketch first the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate for many and some polynomial depending on . The difficulty here is to understand how can depend on . We write the above correlation estimate more suggestively as Because of the multiplicativity at small primes , one expects to have a relation of the form for many for which for some small primes . (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between and for “edges” in a rather sparse “graph” connecting the elements of say . Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction with a certain technical but important “Chinese remainder theorem” step to modify the to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to functional equations of the form for some large and close (but not identical) integers , where should be viewed as a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity) as “differing by a slowly varying polynomial” and the polynomials should now be viewed as taking values on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form for some real number of polynomial size, and with further analysis of the relation (5) one can make basically independent of . This simplifies (3) to something like and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence attached to many , and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

where the relation is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as for some , which do not necessarily lead to multiplicative characters like as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6) for each , but *many*, leading to dilation invariances for a “dense” set of . From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) must behave like for some *central* element of . This eventually brings one back to the multiplicative characters that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

of length sign patterns in the Liouville function. The Chowla conjecture implies that , but even the weaker conjecture of Sarnak that for some remains open. Until recently, the best asymptotic lower bound on was , due to McNamara; with our result, we can now show for any (in fact we can get for any ). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor ). However the observation is that if there are extremely few sign patterns of length , then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length sign patterns. If there are not extremely few sign patterns of length then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.

The second application is to obtain cancellation for various polynomial averages involving the Liouville function or von Mangoldt function , such as

or where are polynomials of degree at most , no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials had distinct degrees, in which one could use the theory of Matomäki and Radziwill; now that higher is available at the scale we can remove this restriction.

### The sunflower lemma via Shannon entropy

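A family $A_1,\dots,A_k$ of sets for some $k \geq 1$ is a sunflower if there is a *core set* $S_0$ contained in each of the $A_i$ such that the *petal sets* $A_i \backslash S_0$, $i=1,\dots,k$, are disjoint. If $k, r \geq 1$, let ${\rm Sun}(k,r)$ denote the smallest natural number with the property that any family of ${\rm Sun}(k,r)$ distinct sets of cardinality at most $r$ contains $k$ distinct elements $A_1,\dots,A_k$ that form a sunflower. The celebrated Erdős-Rado theorem asserts that ${\rm Sun}(k,r)$ is finite; in fact Erdős and Rado gave the bounds

$$(k-1)^r \leq {\rm Sun}(k,r) \leq (k-1)^r r! + 1. \ \ \ \ \ (1)$$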

The *sunflower conjecture* asserts in fact that the upper bound can be improved to ${\rm Sun}(k,r) \leq O(1)^r k^r$. This remains open at present despite much effort (including a Polymath project); after a long series of improvements to the upper bound, the best general bound known currently is

$${\rm Sun}(k,r) \leq O( k \log(kr) )^r \ \ \ \ \ (2)$$

for all $k, r \geq 2$, established in 2019 by Rao (building upon a recent breakthrough a month previously of Alweiss, Lovett, Wu, and Zhang). Here we remove the easy cases $k=1$ or $r=1$ in order to make the logarithmic factor a little cleaner.
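The definition of a sunflower is easy to test mechanically, which can help when experimenting with small families. The sketch below is only an illustration and the function names are hypothetical. It relies on the observation that for two or more sets the core, if it exists, must be the common intersection, and it searches for sunflowers by brute force, so it is only suitable for tiny examples.

```python
from itertools import combinations

def is_sunflower(sets):
    """Return True if the given non-empty collection of sets forms a sunflower.

    If a core exists, the common intersection also works as a core, so it
    suffices to test whether the sets minus their intersection are disjoint.
    """
    sets = [frozenset(s) for s in sets]
    core = frozenset.intersection(*sets)
    petals = [s - core for s in sets]
    return all(p.isdisjoint(q) for p, q in combinations(petals, 2))

def find_sunflower(family, k):
    """Brute-force search for k distinct members of the family forming a sunflower."""
    distinct = list({frozenset(s) for s in family})
    for candidate in combinations(distinct, k):
        if is_sunflower(candidate):
            return candidate
    return None

print(is_sunflower([{1, 2}, {1, 3}, {1, 4}]))   # common core {1}, disjoint petals
print(is_sunflower([{1, 2}, {2, 3}, {1, 3}]))   # a triangle, not a sunflower
print(find_sunflower([{1, 2}, {2, 3}, {1, 3}, {4, 5}, {6}], 3))
```

Pairwise disjoint sets count as a sunflower with empty core, which is why the last search succeeds.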

Rao’s argument used the Shannon noiseless coding theorem. It turns out that the argument can be arranged in the very slightly different language of Shannon entropy, and I would like to present it here. The argument proceeds by locating the core and petals of the sunflower separately (this strategy is also followed in Alweiss-Lovett-Wu-Zhang). In both cases the following definition will be key. In this post all random variables, such as random sets, will be understood to be discrete random variables taking values in a finite range. We always use boldface symbols to denote random variables, and non-boldface for deterministic quantities.

**Definition 1 (Spread set)** Let . A random set is said to be -spread if one has

The core can then be selected greedily in such a way that the remainder of a family becomes spread:

**Lemma 2 (Locating the core)** Let be a family of subsets of a finite set , each of cardinality at most , and let . Then there exists a “core” set of cardinality at most such that the set

*Proof:* We may assume is non-empty, as the claim is trivial otherwise. For any , define the quantity

Let be the set (3). Since , is non-empty. It remains to check that the family is -spread. But for any and drawn uniformly at random from one has

Since and , we obtain the claim.

In view of the above lemma, the bound (2) will then follow from

**Proposition 3 (Locating the petals)** Let be natural numbers, and suppose that for a sufficiently large constant . Let be a finite family of subsets of a finite set , each of cardinality at most which is -spread. Then there exist such that is disjoint.

Indeed, to prove (2), we assume that is a family of sets of cardinality greater than for some ; by discarding redundant elements and sets we may assume that is finite and that all the are contained in a common finite set . Apply Lemma 2 to find a set of cardinality such that the family is -spread. By Proposition 3 we can find such that are disjoint; since these sets have cardinality , this implies that the are distinct. Hence form a sunflower as required.
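The core-finding step can be tried out on small families. The sketch below makes two assumptions that are not reproduced in the text above: that a random set $A$ is $R$-spread when ${\bf P}(S \subset A) \leq R^{-|S|}$ for every set $S$, and that the core is chosen to maximise the quantity $R^{|S|} \cdot |\{ i : S \subset A_i \}|$. All function names are hypothetical, and the search is brute force over subsets.

```python
from itertools import combinations

def all_subsets(s):
    """Yield every subset of s as a frozenset."""
    items = sorted(s)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def containment_count(S, family):
    """Number of members of the family that contain S."""
    return sum(1 for a in family if S <= a)

def is_spread(family, R):
    """Check P(S subset of A) <= R**(-|S|) for A drawn uniformly from the family.

    Only subsets of some member need checking; all other S have probability 0.
    """
    n = len(family)
    candidates = {S for a in family for S in all_subsets(a)}
    return all(
        containment_count(S, family) / n <= R ** (-len(S)) + 1e-12
        for S in candidates
    )

def find_core(family, R):
    """Pick S maximising R**|S| * #{i : S subset of A_i}; ties go to smaller S."""
    candidates = {S for a in family for S in all_subsets(a)}
    core = max(
        candidates,
        key=lambda S: (R ** len(S) * containment_count(S, family), -len(S)),
    )
    remainder = [a - core for a in family if core <= a]
    return core, remainder

family = [frozenset(s) for s in ({1, 2, 3}, {1, 2, 4}, {1, 5, 6}, {7, 8, 9})]
core, remainder = find_core(family, R=2)
print(core, remainder, is_spread(remainder, 2))
```

Under these assumptions the maximality of the chosen core is exactly what makes the remaining family spread: any further set $S'$ common to too many of the remaining sets would make $S_0 \cup S'$ score higher than $S_0$.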

**Remark 4** Proposition 3 is easy to prove if we strengthen the condition on to . In this case, we have for every , hence by the union bound we see that for any with there exists such that is disjoint from the set , which has cardinality at most . Iterating this, we obtain the conclusion of Proposition 3 in this case. This recovers a bound of the form , and by pursuing this idea a little further one can recover the original upper bound (1) of Erdös and Rado.

It remains to prove Proposition 3. In fact we can locate the petals one at a time, placing each petal inside a random set.

**Proposition 5 (Locating a single petal)** Let the notation and hypotheses be as in Proposition 3. Let be a random subset of , such that each lies in with an independent probability of . Then with probability greater than , contains one of the .

To see that Proposition 5 implies Proposition 3, we randomly partition into by placing each into one of the , chosen uniformly and independently at random. By Proposition 5 and the union bound, we see that with positive probability, it is simultaneously true for all that each contains one of the . Selecting one such for each , we obtain the required disjoint petals.
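The reduction just described is simple to simulate. The following sketch is only an illustration: the function names and the toy family are hypothetical, and it retries the random partition rather than appealing to the probability bound in Proposition 5. It places each ground element into one of $k$ classes uniformly and independently, then looks for a member of the family inside each class.

```python
import random
from itertools import combinations

def random_partition(ground, k, rng):
    """Place each element of the ground set into one of k classes uniformly at random."""
    parts = [set() for _ in range(k)]
    for x in ground:
        parts[rng.randrange(k)].add(x)
    return parts

def petals_from_partition(family, ground, k, rng, trials=1000):
    """Retry random partitions until every class contains some member of the family.

    Returns k members of the family lying in distinct classes (hence pairwise
    disjoint), or None if no trial succeeds.
    """
    for _ in range(trials):
        parts = random_partition(ground, k, rng)
        chosen = []
        for part in parts:
            hit = next((a for a in family if a <= part), None)
            if hit is None:
                break
            chosen.append(hit)
        else:
            return chosen
    return None

# Toy family: all 2-element subsets of {0, ..., 11}.
ground = range(12)
family = [frozenset(c) for c in combinations(ground, 2)]
petals = petals_from_partition(family, ground, 3, random.Random(0))
print(petals)
```

Because the classes of a partition are disjoint, any sets found inside distinct classes are automatically disjoint, which is the whole point of the reduction.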

We will prove Proposition 5 by gradually increasing the density of the random set and arranging the sets to get quickly absorbed by this random set. The key iteration step is

**Proposition 6 (Refinement inequality)** Let and . Let be a random subset of a finite set which is -spread, and let be a random subset of independent of , such that each lies in with an independent probability of . Then there exists another random subset of with the same distribution as , such that and

Note that a direct application of the first moment method gives only the bound

but the point is that by switching from to an equivalent we can replace the factor by a quantity significantly smaller than .

One can iterate the above proposition, repeatedly replacing with (noting that this preserves the -spread nature) to conclude

**Corollary 7 (Iterated refinement inequality)** Let , , and . Let be a random subset of a finite set which is -spread, and let be a random subset of independent of , such that each lies in with an independent probability of . Then there exists another random subset of with the same distribution as , such that

Now we can prove Proposition 5. Let be chosen shortly. Applying Corollary 7 with drawn uniformly at random from the , and setting , or equivalently , we have

In particular, if we set , so that , then by choice of we have , hence In particular with probability at least , there must exist such that , giving the proposition.

It remains to establish Proposition 6. This is the difficult step, and requires a clever way to find the variant of that has better containment properties in than does. The main trick is to make a conditional copy of that is conditionally independent of subject to the constraint . The point here is that this constraint implies the inclusions

and Because of the -spread hypothesis, it is hard for to contain any fixed large set. If we could apply this observation in the contrapositive to we could hope to get a good upper bound on the size of and hence on thanks to (4). One can also hope to improve such an upper bound by also employing (5), since it is also hard for the random set to contain a fixed large set. There are however difficulties with implementing this approach due to the fact that the random sets are coupled with in a moderately complicated fashion. In Rao’s argument a somewhat complicated encoding scheme was created to give information-theoretic control on these random variables; below the fold we accomplish a similar effect by using Shannon entropy inequalities in place of explicit encoding. A certain amount of information-theoretic sleight of hand is required to decouple certain random variables to the extent that the Shannon inequalities can be effectively applied. The argument bears some resemblance to the “entropy compression method” discussed in this previous blog post; there may be a way to more explicitly express the argument below in terms of that method. (There is also some kinship with the method of dependent random choice, which is used for instance to establish the Balog-Szemerédi-Gowers lemma, and was also translated into information theoretic language in these unpublished notes of Van Vu and myself.)

**— 1. Shannon entropy —**

In this section we lay out all the tools from the theory of Shannon entropy that we will need.

Define an *empirical sequence* for a random variable taking values in a discrete set to be a sequence in such that the empirical samples of this sequence converge in distribution to in the sense that

If is a random variable taking values in some set , its *Shannon entropy* is defined by the formula

We record the following standard and easily verified facts:

**Lemma 8 (Basic Shannon inequalities)** Let be random variables.

- (i) (Monotonicity) If is a deterministic function of , then . More generally, if is a deterministic function of and , then . If is a deterministic function of , then .
- (ii) (Subadditivity) One has , with equality iff , are independent. More generally, one has , with equality iff , are conditionally independent with respect to .
- (iii) (Chain rule) One has . More generally . In particular , and iff are independent; similarly, , and iff are conditionally independent with respect to .
- (iv) (Jensen’s inequality) If takes values in a finite set then , with equality iff is uniformly distributed in . More generally, if takes values in a set that depends on , then , with equality iff is uniformly distributed in after conditioning on .
- (v) (Gibbs inequality) If take values in the same finite set , then (we permit the right-hand side to be infinite, which makes the inequality vacuously true).

See this previous blog post for some intuitive analogies to understand Shannon entropy.
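For readers who like to experiment, the basic quantities can be computed directly for small finite distributions. The sketch below assumes the usual definition ${\bf H}(X) = \sum_x {\bf P}(X=x) \log \frac{1}{{\bf P}(X=x)}$ with the logarithm taken to base $2$ (the base is an assumption here, suggested by the remark about power sets in the proof of Lemma 9), and defines conditional entropy via the chain rule. The joint distribution and the helper names are illustrative only.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginal distribution of coordinate idx of a joint distribution on tuples."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[idx]] += p
    return dict(out)

def conditional_entropy(joint):
    """H(X|Y) for a joint law of pairs (x, y), via the chain rule H(X,Y) = H(Y) + H(X|Y)."""
    return entropy(joint) - entropy(marginal(joint, 1))

# A small joint distribution of a pair (X, Y).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
H_XY = entropy(joint)
H_X = entropy(marginal(joint, 0))
H_Y = entropy(marginal(joint, 1))

print("H(X,Y) =", H_XY)
print("H(X) + H(Y) =", H_X + H_Y)             # subadditivity: at least H(X,Y)
print("H(X|Y) =", conditional_entropy(joint))  # lies between 0 and H(X)
```

Numerical checks of this kind do not prove anything, but they are a quick way to catch a misremembered inequality before using it.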

Now we establish some inequalities of relevance to random sets.

We first observe that any small random set largely determines any of its subsets. Define a *random subset* of a random set to be a random set such that holds almost surely.

**Lemma 9 (Subsets of small sets have small conditional entropy)** Let be a random finite set.

- (i) One has for any random subset of .
- (ii) One has . If is almost surely non-empty, we can improve this to .

*Proof:* The set takes values in the power set of , so the claim (i) follows from Lemma 8(iv). (Note how it is convenient here that we are using the base for the logarithm.)

For (ii), apply Lemma 8(v) with and the geometric random variable for natural numbers (or for positive , if is non-empty).

Now we encode the property of a random variable being -spread in the language of Shannon entropy.

**Lemma 10 (Information-theoretic interpretation of spread)** Let be a random finite set that is -spread for some .

- (i) If is uniformly distributed amongst some finite collection of sets, then for all random subsets of .
- (ii) In the general case, if are an empirical sequence of , then as , where is drawn uniformly from and is a random subset of .

Informally: large random subsets of an -spread set necessarily have a lot of mutual information with . Conversely, one can bound the size of a random subset of an -spread set by bounding its mutual information with .

*Proof:* In case (i), it suffices by Lemma 8(iv) to establish the bound

Given a finite non-empty set and , let denote the collection of -element subsets of . A uniformly chosen element of is thus a random -element subset of ; we refer to the quantity as the *density* of this random subset, and as a *uniformly chosen random subset of of density *. (Of course, this is only defined when is an integer multiple of .) Uniformly chosen random sets have the following information-theoretic relationships to small random sets:

**Lemma 11** Let be a finite non-empty set, let be a uniformly chosen random subset of of some density (which is a multiple of ).

- (i) (Spread) If is a random subset of , then
- (ii) (Absorption) If is a random subset of , then

*Proof:* To prove (i), it suffices by Lemma 10(i) to show that is -spread, which amounts to showing that

For (ii), by replacing with we may assume that are disjoint. From Lemma 8(iii) and Lemma 9(ii) it suffices to show that

which in turn is implied by for each . By Lemma 8(iv) it suffices to show that but this follows from multiplying together the inequalities for .

The following “relative product” construction will be important for us. Given a random variable and a deterministic function of that variable, one can construct a conditionally independent copy of subject to the condition , with the joint distribution

(Note that this is usually *not* the same as starting with a completely independent copy of and then conditioning to the event ; cf. Simpson’s paradox.) By construction, has the same distribution as , and is conditionally independent of relative to . In particular, from Lemma 8 we have which we can also write as an “entropy Cauchy-Schwarz identity” This can be compared with the combinatorial inequality or equivalently whenever is a function on a non-empty finite set , which is easily proven as a consequence of Cauchy-Schwarz. One nice advantage of the entropy formalism over the combinatorial one is that the analogue of this instance of the Cauchy-Schwarz inequality automatically becomes an equality (this is related to the asymptotic equipartition property in the microstates interpretation of entropy).
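The relative product construction can be carried out explicitly for a finite distribution. In the sketch below the joint law assigns mass ${\bf P}(X=x) {\bf P}(X=x') / {\bf P}(f(X)=y)$ to each pair with $f(x)=f(x')=y$, and the code checks numerically the identity ${\bf H}(X,X') = 2 {\bf H}(X) - {\bf H}(f(X))$, which follows from the chain rule and conditional independence. The displayed formulas of the text are not reproduced above, so this exact form of the joint law and of the identity is an assumption, as are the base-2 logarithm, the toy distribution and the function names.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (base 2) of a distribution given as {outcome: probability}."""
    return sum(p * math.log2(1 / p) for p in dist.values() if p > 0)

def pushforward(dist, f):
    """Distribution of f(X) when X has distribution dist."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p
    return dict(out)

def relative_product(dist, f):
    """Joint law of (X, X'), with X' a conditionally independent copy of X given f(X)."""
    fiber_mass = pushforward(dist, f)
    joint = {}
    for x, p in dist.items():
        for x2, q in dist.items():
            if p > 0 and q > 0 and f(x) == f(x2):
                joint[(x, x2)] = p * q / fiber_mass[f(x)]
    return joint

def parity(x):
    return x % 2

dist = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
joint = relative_product(dist, parity)

lhs = entropy(joint)
rhs = 2 * entropy(dist) - entropy(pushforward(dist, parity))
print(lhs, rhs)  # the two sides should agree up to rounding
```

The second coordinate of the joint law has the same distribution as the first, as the construction promises, and the pair is supported on the set where the two values of the function agree.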

**— 2. Proof of refinement inequality —**

Now we have enough tools to prove Proposition 6. Let be as in that proposition. On the event when is empty we can set so we can instead condition to the event that is non-empty. In particular

In order to use Lemma 10 we fix an empirical sequence for . We relabel as , and let be a parameter going off to infinity (so in particular is identified with a subset of ). We let be drawn uniformly at random from , and let be a uniform random subset of of density independent of . Observe from Stirling’s formula that converges in distribution to . Thus it will suffice to find another uniform random variable from such that

as , since we can pass to a subsequence in which converges in distribution to . From (8) we have

From we can form the random set ; we then form a conditionally independent copy of subject to the constraint

We use as the uniform variable to establish (9). The point is that the relation (11) implies that so it will suffice to show that and hence by (7) and hence by Lemma 8(ii) and independence of

Now we try to relate the first term on the left-hand side with . Note from (11) that we have the identity and hence by Lemma 8(i) We estimate the relative entropy of here by selecting first , then , then . More precisely, using the chain rule and monotonicity (Lemma 8(i), (iii)) we have From Lemma 9(i) we have and Putting all this together, we conclude

If we apply Lemma 10(ii) right away we will get the estimate which is a bound resembling (12), but the dependence on the parameters is too weak. To do better we return to the relative product construction to decouple some of the random variables here. From the tuple we can form the random variable , then form a conditionally independent copy of subject to the constraints From (11) and Lemma 8(i) we then have The point is that is now conditionally independent of relative to , so we can also rewrite the above conditional entropy as We now use the chain rule to disentangle the role of , writing the previous as From independence we have and from Lemma 9(i) we have We discard the negative term . Putting all this together, we obtain

Now, from the constraints (11), (13) we have

and Thus by Lemma 8(i), (ii), followed by Lemma 10(ii) and Lemma 11(i), we have which when inserted back into (14) using and simplifies to and the claim follows.

### Homogenization of iterated singular integrals with applications to random quasiconformal maps

Kari Astala, Steffen Rohde, Eero Saksman and I have (finally!) uploaded to the arXiv our preprint “Homogenization of iterated singular integrals with applications to random quasiconformal maps”. This project started (and was largely completed) over a decade ago, but for various reasons it was not finalised until very recently. The motivation for this project was to study the behaviour of “random” quasiconformal maps. Recall that a (smooth) quasiconformal map is a homeomorphism that obeys the Beltrami equation

for some*Beltrami coefficient*; this can be viewed as a deformation of the Cauchy-Riemann equation . Assuming that is asymptotic to at infinity, one can (formally, at least) solve for in terms of using the

*Beurling transform*by the Neumann series We looked at the question of the asymptotic behaviour of if is a random field that oscillates at some fine spatial scale . A simple model to keep in mind is where are independent random signs and is a bump function. For models such as these, we show that a homogenisation occurs in the limit ; each multilinear expression converges weakly in probability (and almost surely, if we restrict to a lacunary sequence) to a deterministic limit, and the associated quasiconformal map similarly converges weakly in probability (or almost surely). (Results of this latter type were also recently obtained by Ivrii and Markovic by a more geometric method which is simpler, but is applied to a narrower class of Beltrami coefficients.) In the specific case (1), the limiting quasiconformal map is just the identity map , but if for instance replaces the by non-symmetric random variables then one can have significantly more complicated limits. The convergence theorem for multilinear expressions such as is not specific to the Beurling transform ; any other translation and dilation invariant singular integral can be used here.

The random expression (2) is somewhat reminiscent of a moment of a random matrix, and one can start computing it analogously. For instance, if one has a decomposition such as (1), then (2) expands out as a sum

The random fluctuations of this sum can be treated by a routine second moment estimate, and the main task is to show that the expected value becomes asymptotically independent of .

If all the were distinct then one could use independence to factor the expectation to get

which is a relatively straightforward expression to calculate (particularly in the model (1), where all the expectations here in fact vanish). The main difficulty is that there are a number of configurations in (3) in which various of the collide with each other, preventing one from easily factoring the expression. A typical problematic contribution for instance would be a sum of the form This is an example of what we call a *non-split* sum. This can be compared with the *split sum*

If we ignore the constraint in the latter sum, then it splits into where and and one can hope to treat this sum by an induction hypothesis. (Actually dealing with constraints such as requires an inclusion-exclusion argument that creates some notational headaches but is ultimately manageable.) As the name suggests, the non-split configurations such as (4) cannot be factored in this fashion, and are the most difficult to handle. A direct computation using the triangle inequality (and a certain amount of combinatorics and induction) reveals that these sums are somewhat localised, in that dyadic portions such as exhibit power decay in (when measured in suitable function space norms), basically because of the large number of times one has to transition back and forth between and . Thus, morally at least, the dominant contribution to a non-split sum such as (4) comes from the local portion when . From the translation and dilation invariance of this type of expression then simplifies to something like (plus negligible errors) for some reasonably decaying function , and this can be shown to converge to a weak limit as .

In principle all of these limits are computable, but the combinatorics is remarkably complicated, and while there is certainly some algebraic structure to the calculations, it does not seem to be easily describable in terms of an existing framework (e.g., that of free probability).

### Mathematical Research Reports: a “new” mathematics journal is launched

From time to time academic journals undergo an interesting process of fission. Typically as a result of some serious dissatisfaction, the editorial board resigns en masse to set up a new journal, the publishers of the original journal build a new editorial board from scratch, and the result is two journals, one inheriting the editors and collective memory of the original journal, and the other keeping the name and the publisher. Which is the “true” successor? In practice it tends to be the one with the editors, with its sibling surviving as a zombie journal that is the successor in name only. Perhaps there are examples that go the other way, and there may be examples where both journals go on to thrive, but I have not looked closely at the examples I know about.

I’m mentioning this because recently I have been involved in a rather unusual example of this phenomenon. Most cases I know of are the result either of frustration with the practices of the big commercial publishers or of malpractice by an editor-in-chief. But this was an open access journal with no publication charges, and with an extremely efficient and impeccably behaved editor-in-chief. So what was the problem?

The journal started out in 1995 as Electronic Research Announcements of the AMS, or ERA-AMS for short. It was still called that when I first joined the editorial board. Its editor-in-chief was Svetlana Katok, who did a great job, and there was a high-powered editorial board. As its name suggests, it specialized in shortish papers announcing results that would then appear with more details in significantly longer papers, so it was a little like Comptes Rendus in its aim. It would also accept short articles of a more traditional kind.

It never published all that many papers, and in 2007, I think for that reason (but don’t remember for sure), the AMS decided to discontinue it. But Svetlana Katok had put a lot into the journal and managed to find another publisher, the American Institute of Mathematical Sciences, and the editorial board agreed to continue serving. The name of the journal was changed to Electronic Research Announcements in the Mathematical Sciences, and its abbreviation was slightly abbreviated from ERA-AMS to ERA-MS.

In 2016, after 22 years, Svetlana Katok decided to step down, and Boris Hasselblatt took over. It was a good moment to try to revitalize the journal, so measures were taken such as designing a new and better website and making more effort to publicize the journal, in the hope of attracting more submissions (or more precisely, more submissions of a high enough quality that we would want to publish them).

However, despite these measures, the numbers remained fairly low — around ten a year (with quite a bit of variation), and this, indirectly, caused the problem that led to the split. The editors would have liked to see more papers published, but were not worried about it to the point where we would have been prepared to sacrifice quality to achieve it: we were ready to accept that this was, at least for now, a small journal. But AIMS was not so happy. In an effort to remedy (as they saw it) the situation, they appointed a co-editor-in-chief, who in turn appointed a number of new editors, with a more applied focus, with the idea that by broadening the scope of the journal they would increase the number of papers published.

That did not precipitate the resignations, but at that stage most of us did not know that the new editors had been appointed without any consultation even with Boris Hasselblatt. But then AIMS took things a step further. Until that point, the journal had adopted a practice that I strongly approve of, which was for the editor who handled a paper to make a recommendation to the rest of the editorial board, with other editors encouraged to comment on that recommendation. This practice helps to guard against “rogue” editors and against abuse of the system in general. It also helps to maintain consistent standards, and provides a gentle pressure on editors to do their job conscientiously — there’s nothing like knowing that you’re going to have to justify your decision to a bunch of mathematicians.

But suddenly the publishers told us that this system had to change, and that from now on the editorial board would not have the opportunity to vet papers, and would continue to have no say in new editorial appointments. (Various justifications were given for this, including that it would make it harder to recruit editors if they thought they had to make judgments about papers not in their immediate area.) At that point, it was clear that the soul of the journal was about to be destroyed, so over a few days the entire board (from before the start of the changes) resigned, resolving to start afresh with a new name.

That new name is Mathematical Research Reports. We will continue to accept reports on longer work, as well as short articles. In addition we welcome short survey articles. We regard it as the continuation in spirit of ERA-MS. Another unusual feature of this particular split is that the other half, still published by AIMS, has also changed its name and is now called Electronic Research Archive.

If, like me, you are always on the lookout for high-quality “ethical” journals (which I loosely define as free to read, free to publish in, and adopting high standards of editorial practice), then please add Mathematical Research Reports to your list. Have a look at the back catalogue of ERA-MS and ERA-AMS and you will get an idea of our standards. It would be wonderful if the unfortunate events of the last year or so were to be the catalyst that led to the journal finally becoming established in the way that it has deserved to be for a long time.

### 247B, Notes 4: almost everywhere convergence of Fourier series

This set of notes discusses aspects of one of the oldest questions in Fourier analysis, namely the nature of convergence of Fourier series.

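If $f: \mathbb{R}/\mathbb{Z} \to \mathbb{C}$ is an absolutely integrable function, its Fourier coefficients $\hat f: \mathbb{Z} \to \mathbb{C}$ are defined by the formula

$$\hat f(n) := \int_{\mathbb{R}/\mathbb{Z}} f(x) e^{-2\pi i n x}\, dx.$$

If $f$ is smooth, then the Fourier coefficients $\hat f$ are absolutely summable, and we have the Fourier inversion formula

$$f(x) = \sum_{n \in \mathbb{Z}} \hat f(n) e^{2\pi i n x},$$

where the series here is uniformly convergent. In particular, if we define the partial summation operators

$$S_N f(x) := \sum_{n=-N}^{N} \hat f(n) e^{2\pi i n x},$$

then $S_N f$ converges uniformly to $f$ when $f$ is smooth.

What if $f$ is not smooth, but merely lies in an $L^p(\mathbb{R}/\mathbb{Z})$ class for some $1 \leq p \leq \infty$? The Fourier coefficients $\hat f$ remain well-defined, as do the partial summation operators $S_N$. The question of convergence in norm is relatively easy to settle: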
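As a quick numerical illustration of the uniform convergence of the partial sums for smooth functions, the following sketch approximates the Fourier coefficients by Riemann sums and measures the sup-norm error on a grid. The test function $e^{\cos 2\pi x}$, the sample counts, and all helper names are illustrative assumptions rather than anything taken from the notes.

```python
import cmath
import math


def fourier_coefficients(f, N, samples=1024):
    """Approximate f_hat(n) for |n| <= N by a Riemann sum over [0, 1)."""
    values = [f(k / samples) for k in range(samples)]
    return {
        n: sum(v * cmath.exp(-2j * math.pi * n * k / samples)
               for k, v in enumerate(values)) / samples
        for n in range(-N, N + 1)
    }


def partial_sum(coeffs, N, x):
    """Evaluate S_N f(x) = sum over |n| <= N of f_hat(n) e^{2 pi i n x}."""
    return sum(coeffs[n] * cmath.exp(2j * math.pi * n * x)
               for n in range(-N, N + 1))


def smooth_example(x):
    # A smooth 1-periodic test function (an arbitrary illustrative choice).
    return math.exp(math.cos(2 * math.pi * x))


def sup_error(f, N, grid=64):
    """Largest value of |S_N f - f| over an equally spaced grid."""
    coeffs = fourier_coefficients(f, N)
    return max(abs(partial_sum(coeffs, N, j / grid) - f(j / grid))
               for j in range(grid))


if __name__ == "__main__":
    for N in (1, 2, 4, 8):
        print(N, sup_error(smooth_example, N))
```

For this particular test function the error should fall off very quickly in $N$, reflecting the rapid decay of the Fourier coefficients of a smooth function; nothing of the sort is guaranteed for rough functions, which is the subject of the rest of these notes.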

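**Exercise 1**

- (i) If $1 < p < \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, show that the partial sums $S_N f$ converge in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$. (*Hint:* first use the boundedness of the Hilbert transform to show that $S_N$ is bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$.)
- (ii) If $p=1$ or $p=\infty$, show that there exists $f \in L^p(\mathbb{R}/\mathbb{Z})$ such that the sequence $S_N f$ is unbounded in $L^p(\mathbb{R}/\mathbb{Z})$ (so in particular it certainly does not converge in $L^p(\mathbb{R}/\mathbb{Z})$ norm to $f$). (*Hint:* first show that $S_N$ is not bounded in $L^p(\mathbb{R}/\mathbb{Z})$ uniformly in $N$, then apply the uniform boundedness principle in the contrapositive.)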

The question of pointwise almost everywhere convergence turned out to be a significantly harder problem:

**Theorem 2 (Pointwise almost everywhere convergence)**

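- (i) (Kolmogorov, 1923) There exists $f \in L^1(\mathbb{R}/\mathbb{Z})$ such that the partial sums $S_N f(x)$ are unbounded in $N$ for almost every $x$.
- (ii) (Carleson, 1966; conjectured by Lusin, 1913) For every $f \in L^2(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.
- (iii) (Hunt, 1967) For every $1 < p \leq \infty$ and $f \in L^p(\mathbb{R}/\mathbb{Z})$, $S_N f(x)$ converges to $f(x)$ as $N \to \infty$ for almost every $x$.

Note from Hölder’s inequality that $L^2(\mathbb{R}/\mathbb{Z})$ contains $L^p(\mathbb{R}/\mathbb{Z})$ for all $p \geq 2$, so Carleson’s theorem covers the $p \geq 2$ case of Hunt’s theorem. We remark that the precise threshold near $L^1$ between Kolmogorov-type divergence results and Carleson-Hunt pointwise convergence results, in the category of Orlicz spaces, is still an active area of research; see this paper of Lie for further discussion.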

Carleson’s theorem in particular was a surprisingly difficult result, lying just out of reach of classical methods (as we shall see later, the result is much easier if we smooth either the function or the summation method by a tiny bit). Nowadays we realise that the reason for this is that Carleson’s theorem essentially contains a *frequency modulation symmetry* in addition to the more familiar translation symmetry and dilation symmetry. This basically rules out the possibility of attacking Carleson’s theorem with tools such as Calderón-Zygmund theory or Littlewood-Paley theory, which respect the latter two symmetries but not the former. Instead, tools from “time-frequency analysis” that essentially respect all three symmetries should be employed. We will illustrate this by giving a relatively short proof of Carleson’s theorem due to Lacey and Thiele. (There are other proofs of Carleson’s theorem, including Carleson’s original proof, its modification by Hunt, and a later time-frequency proof by Fefferman; see Remark 18 below.)

** — 1. Equivalent forms of almost everywhere convergence of Fourier series — **

A standard technique to prove almost everywhere convergence results is by first establishing a weak-type estimate of an associated maximal function. For instance, the Lebesgue differentiation theorem is usually established with the assistance of the Hardy-Littlewood maximal inequality; see for instance this previous blog post. A remarkable observation of Stein, known as Stein’s maximal principle, allows one to reverse this implication in certain cases by exploiting a symmetry of the problem. Here is the principle specialised to the application of pointwise convergence of Fourier series, and also combined with a transference principle of Kenig and Tomas:

**Proposition 3 (Equivalent forms of almost everywhere convergence)** Let . Then the following statements are equivalent:

- (i) For every , one has for almost every .
- (ii) There does not exist such that for almost every .
- (iii) One has the maximal inequality for all smooth , where the weak norm is defined as and denotes the Lebesgue measure of a set (which in this setting is a subset of the unit circle).
- (iv) One has the maximal inequality for all smooth , where denotes the partial Fourier series
- (v) One has the maximal inequality for all , where denotes the Fourier multiplier operator
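To make the weak norm in (iii) concrete, here is a small sketch that evaluates the quantity $\sup_{t>0} t\, |\{|f| \geq t\}|^{1/p}$ (the usual definition of the weak $L^p$ quasinorm, assumed here) for a sampled function. The example $f(x) = 1/x$ on $(0,1]$, the sampling at cell midpoints, and the helper names are illustrative assumptions; the point is only that this function has bounded weak $L^1$ quasinorm even though its Riemann sums for the $L^1$ norm grow logarithmically.

```python
def weak_lp_quasinorm(values, p, thresholds):
    """Approximate sup_t t * |{|f| >= t}|^(1/p) for a function sampled at
    equally spaced points of a domain of total measure 1."""
    n = len(values)
    best = 0.0
    for t in thresholds:
        # Fraction of sample points where |f| >= t approximates the measure.
        measure = sum(1 for v in values if abs(v) >= t) / n
        best = max(best, t * measure ** (1 / p))
    return best


def sample_reciprocal(n):
    # f(x) = 1/x evaluated at the midpoints of n equal cells of (0, 1].
    return [n / (k + 0.5) for k in range(n)]


if __name__ == "__main__":
    values = sample_reciprocal(4096)
    dyadic_thresholds = [2 ** j for j in range(8)]
    print("weak L^1 quasinorm:", weak_lp_quasinorm(values, 1, dyadic_thresholds))
    print("Riemann sum for the L^1 norm:", sum(values) / len(values))
```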

Among other things, this proposition equates the qualitative property (i) of almost everywhere convergence to the quantitative property (iii) of a maximal inequality. This equivalence (first observed by Calderón) is similar in spirit to the uniform boundedness principle (see e.g. Corollary 1 of this previous blog post). The restriction is needed for just one implication (from (ii) to (iii)) in the arguments below, and arises due to the use of Khintchine’s inequality at one point. The equivalence of (iv) and (v) is part of a more general principle of *transference* that allows one to pass back and forth between periodic domains such as with non-periodic domains such as (or, on the Fourier side, between discrete domains and continuous domains ) if the estimates in question enjoy suitable scaling symmetries. We will use the formulation (v), as it enjoys the most symmetries.

*Proof:* We first show that (iii) implies (i). If (1) holds for all smooth , then certainly for all finite one has

Clearly (i) implies (ii). Now we assume that (iii) fails and use this to show that (ii) fails as well. From the failure of (iii) and monotone convergence, for any one can find , a measurable subset of , a finite , and such that

and such that In particular, has positive measure. By homogeneity we may normalise . At this stage, nothing prevents the measure of from being much smaller than ; but we can exploit translation invariance to increase the measure of to be comparable to as follows. Let be the integer part of . We claim that there exist translations of whose union has measure comparable to : This is easiest to establish by the probabilistic method (which in this context we might call the *random translation* method). If we select uniformly and independently at random we see that every point will lie in a given translate (or equivalently, that lies in ) with probability , hence Integrating in and using the Fubini-Tonelli theorem, we conclude that and hence there exist deterministic choices of for which By definition of the RHS is comparable to , giving the claim (clearly the left-hand side cannot exceed ).
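The random translation step can be checked numerically. The sketch below assumes a set of measure $\delta$ on the circle $\mathbb{R}/\mathbb{Z}$ and takes the integer part of $1/\delta$ independent uniform translates, which is my reading of the construction above; for the simulation the set is further assumed to be a single arc on a discretised circle. Each point is missed by one translate with probability $1-\delta$, so the expected measure of the union is $1 - (1-\delta)^{\lfloor 1/\delta \rfloor}$, which stays bounded away from zero.

```python
import random


def expected_union_measure(delta):
    """Expected measure of the union of floor(1/delta) independent uniform
    translates of a set of measure delta on the circle R/Z."""
    n_translates = int(1 / delta)
    # A given point is missed by one translate with probability 1 - delta,
    # independently across translates.
    return 1 - (1 - delta) ** n_translates


def simulated_union_measure(delta, cells=10_000, trials=20, seed=0):
    """Monte Carlo estimate, taking the set to be an arc of length delta
    on a circle discretised into `cells` equal cells."""
    rng = random.Random(seed)
    arc = round(delta * cells)
    n_translates = int(1 / delta)
    total = 0.0
    for _ in range(trials):
        covered = [False] * cells
        for _ in range(n_translates):
            shift = rng.randrange(cells)
            for j in range(arc):
                covered[(shift + j) % cells] = True
        total += sum(covered) / cells
    return total / trials


if __name__ == "__main__":
    print(expected_union_measure(0.01), simulated_union_measure(0.01))
```

For small $\delta$ the expected value is close to $1 - 1/e$, which is the sense in which the union has measure comparable to that of the whole circle.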

Now consider the randomised linear combination

of translates of , where are random Bernoulli signs. From Khintchine’s inequality and the hypothesis we have hence by construction of and (4)

Now we study the behaviour of when . Since is a convolution operator, it commutes with translations, and hence

for each . On the other hand, from (3) we have and hence there exists such that In particular, the square function is at least . Meanwhile, from Khintchine’s inequality and (7) we have for all . Applying the Paley-Zygmund inequality (setting , for instance) we conclude that (for suitable choices of implied constants), so in particular Integrating in using (5), and applying the Fubini-Tonelli theorem, we conclude that hence by (6) one has In particular, there exists a deterministic choice of signs (and hence of ) for which On the other hand, the left-hand side is at most . We conclude that for every , we can find a smooth function with and a finite , as well as a set of measure , such that for all .

Applying this fact iteratively (each time choosing to be sufficiently large depending on all previous choices), we can construct a sequence of smooth functions , finite , and sets for such that

- (a) for all .
- (b) for all .
- (c) One has for all and (note that the right-hand side is finite since the are smooth for ).
- (d) for all (note that the left-hand side is bounded by ).
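The randomisation in the argument above leans on Khintchine’s inequality, which says that a random signed sum $\sum_n \epsilon_n a_n$ has $L^p$ moments comparable to the $\ell^2$ norm of the coefficients. The sketch below checks the $p=1$ case exactly by enumerating all sign patterns for a short list of real coefficients. The helper names and sample coefficients are illustrative assumptions; the two-sided bound $\frac{1}{\sqrt 2} \|a\|_{\ell^2} \leq \mathbf{E} |\sum_n \epsilon_n a_n| \leq \|a\|_{\ell^2}$ used in the comments is the standard form for $p=1$ (the upper bound being Cauchy-Schwarz).

```python
import itertools
import math


def average_abs_random_sum(coeffs):
    """Exact E|sum_n eps_n a_n| over all choices of signs eps_n = +1 or -1."""
    total = 0.0
    for signs in itertools.product((1, -1), repeat=len(coeffs)):
        total += abs(sum(s * a for s, a in zip(signs, coeffs)))
    return total / 2 ** len(coeffs)


def l2_norm(coeffs):
    return math.sqrt(sum(abs(a) ** 2 for a in coeffs))


if __name__ == "__main__":
    for coeffs in ([1, 1], [1, 1, 1, 1], [3, 1, 4, 1, 5]):
        ratio = average_abs_random_sum(coeffs) / l2_norm(coeffs)
        # Khintchine for p = 1: the ratio lies between 1/sqrt(2) and 1.
        print(coeffs, ratio)
```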

Now we assume (iv) and work to establish (v). The idea here is to use a rescaling argument, viewing as the limit as of the large circle (in physical space) or the fine lattice (in frequency space).

By limiting arguments we may assume that is compactly supported on some interval . Let be a large scaling parameter, and consider the periodic function defined by

For large enough, this function is smooth and supported on the interval , with norm The Fourier coefficients of are given by so that Applying (iv), we see that for any , we have Rescaling by , we conclude that We can let range over the reals rather than the integers as this does not affect the constraint . Rescaling by , we see that for any compact intervals , we have By uniform Riemann integrability and the rapid decrease of uniformly for , . We conclude that By monotone convergence we may replace with , and we then obtain (v).

Finally, we assume (v) and establish (iv). By a limiting argument it suffices to establish (iv) for trigonometric polynomials , that is to say periodic functions whose Fourier coefficients are supported in for some natural number . Let be a non-zero Schwartz function with supported in , and for a given scaling parameter let denote the Schwartz function

For sufficiently large one easily checks that The Fourier transform of can be calculated as hence (for large enough) and thus From (v) we conclude that for any we have For large enough, the left-hand side is for some depending on . Dividing by and replacing by , we obtain the claim (iv).

**Exercise 4** For , let denote the Fejér summation operators

- (i) For any , establish the pointwise bound where is the Hardy-Littlewood maximal function
- (ii) Show that for , one has for almost all .
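The contrast between Fejér summation and the raw partial sums can be seen at the level of kernels. The sketch below assumes the normalisation in which the Fejér operator is the average $\frac{1}{N}(S_0 + \dots + S_{N-1})$ of the first $N$ partial summation operators (the defining display is not reproduced here, so this is an assumption), and compares the $L^1$ norm of its kernel with that of the Dirichlet kernel of $S_N$. Function names and the midpoint quadrature are illustrative choices.

```python
import math


def dirichlet_kernel(N, x):
    """Kernel of S_N: sum over |n| <= N of e^{2 pi i n x}."""
    return math.sin((2 * N + 1) * math.pi * x) / math.sin(math.pi * x)


def fejer_kernel(N, x):
    """Kernel of (S_0 + ... + S_{N-1}) / N; it is non-negative."""
    return (math.sin(N * math.pi * x) / math.sin(math.pi * x)) ** 2 / N


def l1_norm(kernel, N, cells=4096):
    """Midpoint-rule approximation to the integral of |kernel| over [0, 1).
    Midpoints avoid x = 0, where both formulas are 0/0."""
    return sum(abs(kernel(N, (k + 0.5) / cells)) for k in range(cells)) / cells


if __name__ == "__main__":
    for N in (4, 16, 64):
        print(N, l1_norm(dirichlet_kernel, N), l1_norm(fejer_kernel, N))
```

The Fejér kernel keeps unit mass at every scale, which is what makes a bound by the Hardy-Littlewood maximal function plausible, whereas the $L^1$ norm of the Dirichlet kernel grows logarithmically in $N$.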

**Exercise 5 (Pointwise convergence of Fourier integrals)** Let be such that the conclusion of Proposition 3(v) holds. Show that for any , one has for almost all , where is defined for Schwartz functions by the formula and then extended to by density.

**Exercise 6** Let . Suppose that is such that one has the restriction estimate for all Schwartz functions , where denotes the surface measure on the sphere . Conclude that for all Schwartz functions . (This observation is due to Bourgain.) In particular, by Marcinkiewicz interpolation, implies for all . (*Hint:* adapt some parts of the argument used to get from (iii) to (i) in Proposition 3, using rotation invariance as a substitute for translation invariance. (But the translational symmetry of the restriction problem – more precisely, the ability to translate a function in physical space without changing the absolute value of its Fourier transform – will also be useful.))

We are now ready to establish Kolmogorov’s theorem (Theorem 2(i)); our arguments are loosely based on the original construction of Kolmogorov (though he was not in possession at the time of the Stein maximal principle). In view of the equivalence between (ii) and (v) in Proposition 3, it suffices to show that the maximal operator

fails to be of weak-type on Schwartz functions. Recalling that the Hilbert transform is also a Fourier multiplier operator some routine calculations then show that for any Schwartz function . By the triangle inequality, it then suffices to show that the maximal operator fails to be of weak type on Schwartz functions.

To motivate the construction, note from a naive application of the triangle inequality that

If the function was absolutely integrable, then by Young’s inequality we would conclude that the maximal operator was strong type , and hence also weak type . Thus any counterexample must somehow exploit the logarithmic divergence of the integral of . However, there are two potential sources of cancellation that could ameliorate this divergence: the sign of the Hilbert kernel , and the phase . But because of the supremum in , we can select the frequency parameter as we please, as long as it depends only on and not on . The idea is then to choose (and the support of ) to remove both sources of cancellation as much as possible.

We turn to the details. Let be a large natural number, and then select widely separated frequency scales

In order to assist with removing cancellation in the phases later, we will require these scales to be integers. The precise choice of scales is not too important as long as they are widely separated and integer valued, but for sake of concreteness one could for instance set . Let be a bump function of total mass supported on , and let be the Schwartz function thus is an approximation (in a weak sense) to the sum of Dirac masses , with the frequency scale of the approximation to increasing rapidly in . We easily compute the norm of :

Now we estimate for in the interval for some natural number ; note the set of all such has measure . In this range we will test the maximal operator at the frequency cutoff :

As is supported in , we see (for large enough) that avoids the support of and we can replace the principal value integral with the ordinary integral. Substituting (9), we conclude that As is an integer, the phase is equal to . We also cancel out the phase as being independent of , thus For , we exploit the oscillatory nature of the phase through an integration by parts, leading to the bound (one could even gain a factor of here if desired, but we will not need it). Summing, we have For , we instead exploit the near-constant nature of the phase by writing and similarly to conclude that Summing and combining with (11), we conclude (from the rapidly increasing nature of the ) that and thus (for large) Comparing this with (10) we contradict the conclusion of Proposition 3(iv), giving the claim.

**Remark 7** In 1926, Kolmogorov refined his construction to obtain a function whose Fourier sums diverged everywhere (not just almost everywhere).

**Exercise 8 (Rademacher-Menshov theorem)**

- (i) Let be some square-integrable functions on a probability space , with a power of two. By performing a suitable Whitney type decomposition (similar to that used in Section 3 of Notes 1), establish the pointwise bound where for each , ranges over dyadic intervals of the form with . If furthermore the are orthogonal to each other, establish the maximal inequality
- (ii) If is a trigonometric polynomial with at most non-zero coefficients for some , use part (i) to establish the bound
- (iii) If lies in the Sobolev space for some , use (ii) to show that for almost every .
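The decomposition behind part (i) of this exercise is essentially the binary expansion: an initial segment of length $n$ of a block of length $N = 2^m$ splits into one dyadic block for each binary digit of $n$ that equals one. The sketch below illustrates this with zero-based indexing, which is an assumption (the exercise’s own indexing convention is not reproduced here), and the function name is hypothetical.

```python
def dyadic_decomposition(n, N):
    """Split {0, ..., n-1} into dyadic blocks [j * 2^k, (j + 1) * 2^k),
    one block for each binary digit of n that equals 1."""
    if N < 1 or N & (N - 1):
        raise ValueError("N must be a power of two")
    if not 0 <= n <= N:
        raise ValueError("n must lie between 0 and N")
    blocks = []
    start = 0
    size = N
    while size >= 1:
        if n & size:
            # `start` is a sum of larger powers of two, hence a multiple of
            # `size`, so [start, start + size) really is a dyadic block.
            blocks.append((start, start + size))
            start += size
        size //= 2
    return blocks


if __name__ == "__main__":
    print(dyadic_decomposition(13, 16))
```

Since each scale contributes at most one block, a partial sum up to any $n < N$ is a sum of at most $\log_2 N$ dyadic block sums, which is where the logarithmic loss in the maximal inequality comes from.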

** — 2. Carleson’s theorem — **

We now begin the proof of Carleson’s theorem (Theorem 2(ii)), loosely following the arguments of Lacey and Thiele (we briefly comment on other approaches at the end of these notes). In view of Proposition 3, it suffices to establish the weak-type bound

for Schwartz functions . Because of the supremum, the expression depends sublinearly on rather than linearly; however there is a trick to reduce matters to considering linear estimates. By selecting, for each , to be a frequency which attains (or nearly attains) the supremal value of , it suffices to establish the linearised estimate uniformly for all measurable functions , where is the operator One can think of this operator as the (Kohn-Nirenberg) quantisation of the rough symbol . Unfortunately this symbol is far too rough for us to be able to use pseudodifferential operator tools from the previous set of notes. Nevertheless, the “time-frequency analysis” mindset of trying to efficiently decompose phase space into rectangles consistent with the uncertainty principle will remain very useful.

The next step is to dualise the weak norm to linearise the dependence on even further:

**Exercise 9** Let , let be a -finite measure space, let be a measurable function, and let . Show that the following claims are equivalent (up to changes in the implied constants in the asymptotic notation):

- (i) One has .
- (ii) For every subset of of finite measure, the function is absolutely integrable on , and

In view of this exercise, we see that it suffices to obtain the bound

for all Schwartz , all sets of finite measure, and all measurable functions . Actually only the restriction of to is relevant here, so one can view as a function just on if desired. The operator can be viewed as the quantisation of the (very rough) symbol , that is to say the indicator function of the region lying underneath the graph of .

A notable feature of the estimate (12) is that it enjoys *three* different symmetries (or near-symmetries), each of which is “non-compact” in the sense that it is parameterised by a parameter taking values in a non-compact space such as or :

- (i) (Translation symmetry) For any spatial shift , both sides of (12) remain unchanged if we replace by , the set by the translate , and the function by .
- (ii) (Dilation symmetry) For any scaling factor , both sides of (12) become multiplied by the same scaling factor if we replace by , by the dilate , and the function by .
- (iii) (Modulation symmetry) For any frequency shift , both sides of (12) remain (almost) unchanged if we replace by , do not modify the set , and replace the function by . (Technically the left-hand side changes because of an additional factor of , but this factor can be handled for instance by generalising the indicator function cutoff to a subindicator function cutoff that has the pointwise bound ; we will ignore this very minor issue here.)

Each of these symmetries corresponds to a different symmetry of phase space , namely spatial translation , dilation , and frequency translation respectively. As a general rule of thumb, if one wants to prove a delicate estimate such as (12) that is invariant with respect to one or more non-compact symmetries, then one should use tools that are similarly invariant (or approximately invariant) with respect to these symmetries. Thus for instance Littlewood-Paley theory or Calderón-Zygmund theory would not be suitable tools to use here, as they are only invariant with respect to translation and dilation symmetry but absolutely fail to have any modulation symmetry properties (these theories prescribe a privileged role to the frequency origin, or equivalently they isolate functions of mean zero as playing a particularly important role).

Besides the need to respect the symmetries of the problem, one of the main difficulties in establishing (12) is that the expression , couples together the function with the function in a rather complicated way (via the frequency variable ). We would like to try to decouple this interaction by making and instead interact with simpler objects (such as “wave packets”), rather than being coupled directly to each other. To motivate the decomposition to use, we begin with a heuristic discussion. The first main idea is to temporarily work in the (non-invertible) coordinate system of phase space rather than in order to simplify the constraint to the simple geometric region of a half-plane (this coordinate system is of course a terrible choice for most of the other parts of the argument, but is the right system to use for the frequency decompositions we will now employ). In analogy to the Whitney type decompositions used in Notes 1, one can split

for almost all choices of and (at least if have the same sign), where range over pairs of dyadic intervals that are “close” in the sense that and that and are not adjacent, but their parents are adjacent, and with to the left of . (Here it is convenient to work with half-open dyadic intervals , to avoid issues with overlap.) If one ignores the caveats and blindly substitutes in the decomposition (13), the expression in the left of (12) becomes To decouple further, we will try to decompose into “rank one” operators. More precisely, we manipulate where we use the notation . It will be convenient to try to discretise this integral average. From the uncertainty principle, modifying by should only modify approximately by a phase, so the integral here is roughly constant at spatial scales . So we heuristically have If we now define a *tile* to be a rectangle in phase space of the form where are dyadic intervals and with unit area , we see that every in the above sum is associated to a tile . The interval is then similarly associated to a nearby tile , and we write to indicate the relationship between the two tiles (they share the same spatial interval , but lies just above ). We can then approximately write the left-hand side of (12) as where is an -normalised “wave packet” that is roughly localised to in phase space. This approximate form of (12) has achieved the goal of decoupling the function from the data , as they both now interact with the tile pair rather than through each other. Note also that the set of tiles obeys an approximate version of the three symmetries that (12) does. Firstly, the set of tiles is invariant under dilations if is a power of two; secondly, once one fixes the scales of the tiles, the remaining set of tiles is invariant under spatial translations by integer multiples of the spatial scale , and under frequency translations by integer multiples of . (We will need the discrete and nested nature of the tiles for some subsequent combinatorial arguments, and it turns out to be worthwhile to accept a slightly degraded form of the three basic symmetries of the problem in return for such a discretisation.)
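The combinatorial objects in this heuristic discussion are easy to model. The sketch below represents half-open dyadic intervals $[j 2^k, (j+1) 2^k)$ and unit-area tiles exactly, using rational arithmetic, and exposes the lower and upper frequency halves of a tile. The class and method names are hypothetical, chosen only for illustration; the one structural fact being encoded is that a tile is a product of two dyadic intervals whose lengths multiply to one.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class DyadicInterval:
    """Half-open interval [j * 2^k, (j + 1) * 2^k) with integers j, k."""
    j: int
    k: int

    @property
    def left(self):
        return Fraction(self.j) * Fraction(2) ** self.k

    @property
    def length(self):
        return Fraction(2) ** self.k

    def children(self):
        """Left and right halves, which are again dyadic."""
        return (DyadicInterval(2 * self.j, self.k - 1),
                DyadicInterval(2 * self.j + 1, self.k - 1))

    def contains(self, other):
        return (self.left <= other.left
                and other.left + other.length <= self.left + self.length)

    def disjoint(self, other):
        return (self.left + self.length <= other.left
                or other.left + other.length <= self.left)


@dataclass(frozen=True)
class Tile:
    """Rectangle I x omega in phase space with |I| * |omega| = 1."""
    space: DyadicInterval
    freq: DyadicInterval

    def __post_init__(self):
        if self.space.k + self.freq.k != 0:
            raise ValueError("a tile must have unit area")

    def halves(self):
        """Lower and upper frequency halves, each paired with the same I."""
        lower, upper = self.freq.children()
        return (self.space, lower), (self.space, upper)
```

The nestedness referred to above is the property that any two dyadic intervals are either disjoint or one contains the other, which can be checked directly with `contains` and `disjoint` on a small family of intervals.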

We now make the above heuristic decomposition rigorous. For any dyadic interval , let denote the left child interval, and the right child interval. We fix a bump function supported on normalised to have norm ; henceforth we permit all implied constants in the asymptotic notation to depend on . For each interval let denote the rescaled function

noting that this is a bump function supported on . We will establish the estimate where ranges over all dyadic intervals. We assume (15) for now and see why it implies (12). The left-hand side of (15) is not quite dilation or frequency modulation invariant, but we can fix this by an averaging argument as follows. Applying the modulation invariance, we see for any that since we thus have We temporarily truncate to a finite range of scales, and use the triangle inequality, to obtain for any finite . For fixed , the expression is periodic in with period , with average equal to which we can rewrite as which one can rewrite further (using the change of variables ) as where

Hence if we average over all in (say) , we conclude that

and hence on sending to infinity Using dilation symmetry, we also see that for any . Averaging this for with Haar measure , we conclude that But as is a bump function supported in , one has The quantity is a non-zero constant, hence which is (12).

It remains to prove (15). As in the heuristic discussion, we approximately decompose the convolution into a sum over tiles. We have

Motivated by this, we define as before a *tile* to be a rectangle with dyadic intervals with ; we also split each such tile into an upper half and a lower half . We refer to as the *spatial scale* of the tile, and the reciprocal as the *frequency scale*. For each tile define the wave packet which is a Schwartz function with Fourier support in (in fact it is supported in ) that is normalised to have norm and is localised spatially near , so morally it has “phase space support in ”. We will later establish the estimate for all and sets of finite measure (cf. (14)), where ranges over the set of all tiles. For now, we show why this estimate implies (15) and hence (12). Just as (12) was obtained from (15) by averaging over dilations and frequency modulations, we shall recover (15) from (17) by averaging over spatial translations. As before, we first temporarily restrict the size range of and use the triangle inequality to obtain Applying translation symmetry, we conclude that for any . The left-hand side may be rewritten as where we extend the definition of to translated tiles in the obvious fashion. The expression inside the absolute values is periodic in with period , and averages to which by (16) simplifies to and so on averaging in and then sending to infinity we recover (15).

It remains to establish (17). It is convenient to introduce the sets

so that the target estimate (17) simplifies slightly to As advertised, we have now decoupled the influences of and the influences of (which determine the sets ), as these quantities now only directly interact with the wave packets , rather than with each other. Moreover, in some sense only interacts with the lower half of the tile (as this is where is concentrated), while and only interact with the upper half of the tile.

One advantage of this “model” formulation of the problem is that one can naturally build up to the full problem by trying to establish estimates of the form

where is some smaller set of tiles. For instance, if we can prove (19) for all finite collections of tiles, then by monotone convergence we recover the required estimate.

The key problem here is that tiles have three degrees of freedom: scale, spatial location, and frequency location, corresponding to the three symmetries of dilation, spatial translation, and frequency modulation of the original estimate (12). But one can warm up by looking at families of tiles that only exhibit two or fewer degrees of freedom, in a way that slowly builds up the various techniques we will need to apply to establish the general case:

**The case of a single tile** We begin with the simplest case of a single tile (so that there are zero degrees of freedom):

**The case of separated tiles of fixed scale** Now we let be a collection of tiles all of a fixed spatial scale (so that we have the two parameters of spatial and frequency location, but not the scale parameter). Among other things, this makes the tiles in essentially disjoint (i.e., disjoint ignoring sets of measure zero). This disjointness manifests itself in two useful ways. Firstly, we claim that we can improve the trivial bound

Now let us see why (24) is true. To motivate the argument, suppose that had no tail outside of , so that one could replace by in (22). Then one would have

and as the tiles are all essentially disjoint the claim (24) would then follow from summing in , since each contributes to at most one of the sets . Now we have to deal with the contribution of the tails. We can bound For each , there is at most one dyadic interval of the fixed length such that . Thus in the above sum is fixed, and only can vary; from (22) we then see that , giving (24).

Now we prove (25). The intuition here is that the essential disjointness of the tiles makes the approximately orthogonal, so that (25) should be a variant of Bessel’s inequality. We exploit this approximate orthogonality by a $TT^*$ method, which we perform here explicitly. By duality we have

for some coefficients with , so by Cauchy-Schwarz it suffices to show that The left-hand side expands as From the Fourier support of we see that the inner product vanishes unless the intervals overlap, which by the equal sizes of forces . In this case we can use (22) to bound the inner product by and then a routine application of Schur’s test gives (26). This establishes (25), giving (19) in the case of tiles of equal dimensions.

**The case of a regular -tree**

Now we attack some cases where the tiles can vary in scale. In phase space, a key geometric difficulty now arises from the fact that tiles may start partially overlapping each other, in contrast to the previous case in which the essential disjointness of the tile set was crucial in establishing the key estimates (24), (25). However, because we took care to restrict the intervals of the tiles to be dyadic, there are only a limited number of ways in which two tiles can overlap. Given two rectangles and , we define the relation if and ; this is clearly a partial order on rectangles. The key observation is as follows: if two tiles overlap, then either or . Similarly if are replaced by their upper tiles or by their lower tiles . Note that if are tiles with , then one of or holds (and the only way both inequalities can hold simultaneously is if ).
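The overlap dichotomy can be checked by brute force on a small window of phase space. The following Python sketch is only an illustration: the encoding of a tile as a triple `(k, n, m)`, standing for the rectangle [n 2^k, (n+1) 2^k) x [m 2^-k, (m+1) 2^-k), and all function names are introduced here and are not part of the notes. It also assumes the usual convention that one tile is less than or equal to another when its spatial interval is contained in the other's spatial interval and its frequency interval contains the other's frequency interval.

```python
from fractions import Fraction


def dyadic(k, n):
    """Half-open dyadic interval [n * 2^k, (n + 1) * 2^k) as exact fractions."""
    length = Fraction(2) ** k
    return (n * length, (n + 1) * length)


def tile(k, n, m):
    """Unit-area tile: spatial length 2^k, frequency length 2^-k (hypothetical encoding)."""
    return (dyadic(k, n), dyadic(-k, m))


def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def overlaps(a, b):
    # Half-open intervals overlap exactly when they share a set of positive length.
    return a[0] < b[1] and b[0] < a[1]


def tile_leq(p, q):
    # Assumed convention: P <= Q iff I_P lies in I_Q and omega_Q lies in omega_P.
    return contains(q[0], p[0]) and contains(p[1], q[1])


def tiles_overlap(p, q):
    return overlaps(p[0], q[0]) and overlaps(p[1], q[1])


def tiles_in_window(scales, side):
    """All tiles with spatial scale 2^k (k in scales) inside [0, side) x [0, side)."""
    found = []
    for k in scales:
        n_count = int(side / Fraction(2) ** k)
        m_count = int(side * Fraction(2) ** k)
        found += [tile(k, n, m) for n in range(n_count) for m in range(m_count)]
    return found


def dichotomy_holds(tiles):
    """True if every overlapping pair of tiles is comparable in the tile order."""
    return all(
        tile_leq(p, q) or tile_leq(q, p)
        for p in tiles
        for q in tiles
        if tiles_overlap(p, q)
    )


tiles = tiles_in_window(range(-2, 3), 4)
print(len(tiles), dichotomy_holds(tiles))
```

Exact rational arithmetic is used so that the containment tests are not affected by rounding. The check relies on the intervals being dyadic: for general rectangles of unit area the dichotomy fails.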

As was first observed by Fefferman, a key configuration of tiles that needs to be understood for these sorts of problems is that of a *tree*.

**Definition 10** Let be a tile. A *tree with top* is a collection of tiles with the property that for all . (For minor technical reasons it is convenient to not require the top to actually lie in the tree , though this is often the case.) We write for the spatial support of the tree, and for the frequency support of the tree top. If we in fact have for all , we say that is a -tree; similarly if for all , we say that is a -tree. (Thus every tree can be partitioned into a -tree and a -tree with the same top as the original tree.)

The tiles in a tree can vary in scale and in spatial location, but once these two parameters are given, the frequency location is fixed, so a tree can again be viewed as a “two-parameter” subfamily of the three-parameter family of tiles.
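To make the “two-parameter” description concrete, here is a minimal Python sketch that lists every tile below a given top down to a chosen depth. The integer encoding `(k, n, m)` of a tile (spatial interval [n 2^k, (n+1) 2^k), frequency interval [m 2^-k, (m+1) 2^-k)), the function names, and the convention that a tile lies below the top when its spatial interval is inside that of the top and its frequency interval contains that of the top are all assumptions made for this illustration.

```python
def full_tree(top, depth):
    """All tiles below `top` whose spatial scale is at most `depth` generations finer."""
    k_top, n_top, m_top = top
    tiles = []
    for j in range(depth + 1):
        k = k_top - j
        # The only dyadic frequency interval of length 2^-k containing the top's
        # frequency interval has index floor(m_top / 2^j).
        m = m_top >> j
        # The spatial intervals of length 2^k inside the top's spatial interval.
        for n in range(n_top << j, (n_top + 1) << j):
            tiles.append((k, n, m))
    return tiles


def tile_leq(p, q):
    """Assumed tile order in the integer encoding: spatial inclusion, reverse frequency inclusion."""
    (k, n, m), (K, N, M) = p, q
    j = K - k
    return j >= 0 and (n >> j) == N and (M >> j) == m


top = (0, 0, 5)
tree = full_tree(top, 3)
print(len(tree))
print(sorted({(k, m) for k, n, m in tree}, reverse=True))
```

For each pair `(k, n)` the list contains exactly one tile, which is the statement that the frequency location is determined once the scale and spatial location are fixed.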

We now prove (19) in the case when is a -tree , thus for all . Here, the factors will all “collide” with each other and there will be no orthogonality to exploit here; on the other hand, there will be a lot of “disjointness” in the that can be exploited instead.

To illustrate the key ideas (and to help motivate the arguments for the general case) we will also make the following “regularity” hypotheses: there exist two quantities (which we will refer to as the *energy* and *mass* of the tree respectively) for which we have the upper bounds

We also assume that we have the reverse bounds for the tree top:

and It will be through a combination of both these lower and upper bounds that we can obtain a bound (19) that does not involve either or .
We will use (27), (28), (29) to establish the *tree estimate*

Note from (30) and Cauchy-Schwarz that

and from (31) and Cauchy-Schwarz one similarly has and so (32) recovers the desired estimate (19).

It remains to establish the tree estimate (32). It will be convenient to use the tree to partition the real line into dyadic intervals that are naturally “adapted to” the geometry of the tree (or more precisely to the spatial intervals of the tree) in a certain way (in a manner reminiscent of a Whitney decomposition).

**Exercise 11 (Whitney-type decomposition associated to a tree)** Let be a non-empty tree. Show that there exists a family of dyadic intervals with the following properties:

- (i) The intervals in form a partition of (up to sets of measure zero).
- (ii) For each and any with , we have .
- (iii) For each , there exists with and .

We can of course assume that the tree is non-empty, since (32) is trivial for empty sets of tiles. We apply the partition from Exercise 11. By the triangle inequality, we can bound the left hand side of (32) by

which by (27), (22) may be bounded by We first dispose of the narrow tiles in which . By Exercise 11(ii) this forces . From (28) we have (say). For each fixed spatial scale , the intervals in the tree are all essentially disjoint, so a routine calculation then shows (say), so that which from Exercise 11(ii) implies that the contribution of the case to (32) is acceptable.

Now we consider the wide tiles in which . From Exercise 11(ii) this case is only possible if and . Thus the are now restricted to an interval of length , and it will suffice to establish the local estimate

for each . Note that for each fixed spatial scale , there is at most one choice of frequency interval with and , thus for fixed the set is independent of . We may then sum in for each such scale to conclude Now we make the crucial observation that in a -tree , the intervals are all essentially disjoint, hence the are disjoint as well. As these sets are also contained in , we conclude that From Exercise 11(iii) and (29) (choosing a tile with spatial scale and within of , and with for the tile provided by Exercise 11(iii)) we have giving the claim.

**The case of a regular -tree**

We now complement the previous case by establishing (19) for (certain types of) -trees . The situation is now reversed: there is a lot of “collision” in the , but on the other hand there is now some “orthogonality” in the that can be exploited.

As before we will assume some regularity on the -tree , namely that there exist for which one has the upper bounds

for all (note this is slightly stronger than (27)), as well as the bound (29) for any tile with for some . We complement this with the matching lower bounds and (31).

As before we will focus on establishing the tree estimate (32). From (31) and Cauchy-Schwarz as before we have

As we now have a -tree, the tiles become disjoint (up to null sets), and we can obtain an almost orthogonality estimate:

**Exercise 12 (Almost orthogonality)** For any -tree , show that for all complex numbers , and use this to deduce the Bessel-type inequality

From this exercise and (34) we see that

and so the desired bound (19) will follow from the tree estimate (32).

In this case it will be convenient to linearise the sum to remove the absolute value signs; more precisely, to show (32) it suffices to show that

for any complex numbers of magnitude . Again we may assume that the tree is non-empty, and use the partition from Exercise 11, to split the left-hand side as The contribution of the narrow tiles can be disposed of as before without any additional difficulty, so we focus on estimating the contribution of the wide tiles. As before, in order for this sum to be non-empty has to be contained in an neighbourhood of .

The main difficulty here is the dependence of on . We rewrite

so that the above expression can be written as Now for a key geometric observation: the intervals are nested (and decrease when increases), so the condition is equivalent to a condition of the form for some scale depending on . Thus the above sum can be written as One can bound the integrand here by a “maximal Calderón-Zygmund operator” which is basically a sup over truncations of the “(modulated) pseudodifferential operator” The point of this formulation is that the integrand can now be expressed as a sort of “Littlewood-Paley projection” of the function to the region of frequency space corresponding to those intervals with :

**Exercise 13** Establish the pointwise estimate for all where ranges over all intervals (not necessarily dyadic) containing .

From (29) and Exercise 11(iii) as before we have

and so we can bound the expression (35) by which one can bound in terms of the Hardy-Littlewood maximal function of , followed by Cauchy-Schwarz and the Hardy-Littlewood inequality, and finally Exercise 12, as On the other hand, from (33) we have for every . By grouping the tiles in according to their maximal elements (which necessarily have essentially disjoint spatial intervals) and applying the above inequality to each such group and summing, we conclude that and the tree estimate (32) follows.

**The general case**

We are now ready to handle the general case of an arbitrary finite collection of tiles. Motivated by the previous discussion, we define two quantities:

**Definition 14 (Energy and mass)** For any non-empty finite collection of tiles, we define the *energy* to be the quantity where ranges over all -trees in , and the *mass* to be the quantity where is the set (thus for instance ). By convention, we declare the empty set of tiles to have energy and mass equal to zero.

Note here that the definition of mass has been modified slightly from previous arguments, in that we now use instead of . However, this turns out to be an acceptable modification, in the sense that we still continue to have the analogue of (32):

**Exercise 15 (Tree estimate)** If is a tree, show that

Since has an norm of , we also have the trivial bound

for any finite collection of tiles .

The strategy is now to try to partition an arbitrary family of tiles into collections of disjoint trees (or “forests”, if you will) whose energy , mass , and spatial scale are all under control, apply Exercise 15 to each tree, and sum. To do this we rely on two key selection results, which are vaguely reminiscent of the Calderón-Zygmund decomposition:

**Proposition 16 (Energy selection)** Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

**Proposition 17 (Mass selection)** Let be a finite collection of tiles with for some . Then one can partition into a collection of disjoint trees with together with a remainder set with

(In these propositions, “disjoint” means that any given tile belongs to at most one of the trees in ; but the tiles in one tree are allowed to overlap the tiles in another tree.)

Let us assume these two propositions for now and see how they (together with Exercise 15) establish the required estimate (19) for an arbitrary collection of tiles. We may assume without loss of generality that and are non-zero. Rearranging the above two propositions slightly, we see that if is a finite collection of tiles such that

for some integer then after applying Proposition 16 followed by Proposition 17, we can partition into a disjoint collection of trees with together with a remainder with Note that any finite collection of tiles will obey (38) for some sufficiently large and negative . Starting with this and then iterating indefinitely, and discarding any empty families, we can therefore partition any finite collection of tiles as where are collections of trees (empty for all but finitely many ) such that and (39) holds, and is a residual collection of tiles with We can then bound the left-hand side of (19) by From Exercise 15 applied to individual tiles and (41) we see that the second term in this expression vanishes. For the first term, we use Exercise 15, (40), (36) to bound this sum by which by (39) is bounded by which sums to as required.

It remains to establish the energy and mass selection lemmas. We begin with the mass selection claim, Proposition 17. Let denote the set of all tiles with for some and such that

Let denote the set of tiles in that are maximal with respect to the tile partial order. (Note that the left-hand side of (42) is bounded by , so there is an upper bound to the spatial scales of the tiles involved here.) Then every tile in is either less than or equal to a tile in , or is such that for all . Thus if we let be the collection of tiles of the second form, and let be the collection of trees with tree top associated to each (selected greedily, and in arbitrary order, subject of course to the requirement that no tile belongs to more than one tree), we obtain the required partition with and it remains to establish the bound This will be a (rather heavily disguised) variant of the Hardy-Littlewood maximal inequality. By construction, the tree tops are essentially disjoint, and one has for all such tree tops. To motivate the argument, suppose for the sake of discussion that we had the stronger estimate By the essential disjointness of the , the sets are also essentially disjoint subsets of , hence and the claim (43) would then follow. Now we do not quite have (44); but from the pigeonhole principle we see that for each there is a natural number such that (say), where denotes the interval with the same centre as but times the length (this is not quite a dyadic interval). We now restrict attention to those associated to a fixed choice of . Let denote the corresponding dilated tiles, then we have for each with .

Unfortunately, the are no longer disjoint. However, by the greedy algorithm (repeatedly choosing maximal tiles (in the tile ordering)), we can find a collection such that

- (i) All the dilated tree tops are essentially disjoint.
- (ii) For every with , there is such that intersects and .
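A one-dimensional analogue of this greedy selection may help fix ideas. The Python sketch below is a simplification rather than the procedure used in the text: it works with plain half-open intervals ordered by length instead of dilated tiles in the tile ordering, and the function name is hypothetical. It produces a subfamily with the analogues of properties (i) and (ii): the chosen intervals are essentially disjoint, and every original interval meets a chosen interval that is at least as long.

```python
def greedy_disjoint(intervals):
    """Vitali-type greedy selection on half-open intervals (a, b).

    Intervals are processed from longest to shortest, and an interval is kept
    only if it is disjoint from every interval kept so far.
    """
    chosen = []
    for a, b in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        if all(b <= c or d <= a for c, d in chosen):
            chosen.append((a, b))
    return chosen


def meets(u, v):
    """True if two half-open intervals share a set of positive length."""
    return u[0] < v[1] and v[0] < u[1]


intervals = [(0, 4), (3, 5), (4.5, 6), (10, 11), (10.5, 12)]
chosen = greedy_disjoint(intervals)
print(chosen)
```

Any interval that is discarded was rejected because it met a previously chosen interval, and that interval was processed earlier and is therefore at least as long. This is the mechanism that lets one sum the contributions attached to a single chosen element.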

From property (i) and (45) we have

On the other hand, from property (ii) we see that the sum of all the for all with associated to a single is . Putting the two statements together we see that and on summing in we obtain the required claim (43).

Finally, we prove the energy selection claim, Proposition 16. The basic idea is to extract all the high-energy trees from in such a way that the -tree components of those trees are sufficiently “disjoint” from each other that a useful Bessel inequality, generalising Exercise 12, may be deployed. Implementing this strategy correctly turns out however to be slightly delicate. We perform the following iterative algorithm to generate a partition

as well as a companion collection of -trees as follows.

- Step 1. Initialise and .
- Step 2. If then STOP. Otherwise, go on to Step 3.
- Step 3. Since we now have , contains a -tree for which
  Among all such , choose one for which the midpoint of the frequency is *minimal*. (The reason for this rather strange choice will be made clearer shortly.)
- Step 4. Add to , add the larger tree (with the same top as ) to , then remove from . We also remove the adjacent trees and from and also place them into . Now return to Step 2.

This procedure terminates in finite time to give a partition (46) with , and with the trees coming in triplets all associated to a -tree in with the same spatial scale as , with all the -trees disjoint and obeying the estimates

(both the upper and lower bounds will be important for this argument). It will then suffice to show that by (48), it then suffices to show the Bessel type inequality

Now we make a crucial observation: not only are the trees in disjoint (in the sense that no tile belongs to two of these trees), but the lower tiles are also essentially disjoint. Indeed we claim an even stronger disjointness property: if , are such that , then is not only disjoint from the larger dyadic interval , but is in fact disjoint from the even larger interval . To see this, suppose for contradiction that and . There are three possibilities to rule out:

- is equal to . This can be ruled out because any two lower frequency intervals associated to a -tree are either equal or disjoint.
- was selected after was. To rule this out, observe that contains the parent of , and hence , , or . Thus, when was selected, should have been placed with one of the three trees associated to and would therefore not have been available for inclusion into , a contradiction.
- was selected before was. If this case held, then the midpoint of would have to be greater than or equal to that of , otherwise would not have a minimal midpoint at the time of its selection. But is contained in , which is contained in , which lies below , which contains , which contains the midpoint of ; thus the midpoint of lies strictly below that of , a contradiction.

If the were perfectly orthogonal to each other, this disjointness would be more than enough to establish (49). Unfortunately we only have imperfect orthogonality, and we have to work a little harder. As usual, we turn to a $TT^*$ type argument. We can write the left-hand side of (49) as

so by Cauchy-Schwarz it suffices to show that By the triangle inequality, the left-hand side may be bounded by As has Fourier support in , we see that vanishes unless and overlap. By symmetry it suffices to consider the cases and .

First let us consider the contribution of . Using Young’s inequality and symmetry, we may bound this contribution by

A direct calculation using (22) reveals that so the contribution of this case is at most as desired.

Now we deal with the case when , which by the preceding discussion implies that and lies outside of . Here we use (37) to bound

and and then we can bound this contribution by

Direct calculation using (22) reveals that (say), and also so we obtain a bound of which is acceptable by (48). This finally finishes the proof of Proposition 16, which in turn completes the proof of Carleson’s theorem.

**Remark 18** The Lacey-Thiele proof of Carleson’s theorem given above relies on a decomposition of a tileset in a way that controls both energy and mass. The original proof of Carleson dispenses with mass (or with the function ), and focuses on controlling maximal operators that (in our notation) are basically of the form To control such functions, one iterates a decomposition similar to Proposition 16 to partition into trees with good energy control, and establishes pointwise control of the contribution of each tree outside of an exceptional set. See Section 4 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis. The proof of Fefferman takes the opposite tack, dispensing with energy and focusing on bounding the operator norm of the linearised operator Roughly speaking, the strategy is to iterate a version of Proposition 16 to partition into “forests” of disjoint trees, though in Fefferman’s argument some additional work is invested into obtaining even better disjointness properties on these forests than is given here. See Section 5 of this article of Demeter for an exposition in the simplified setting of Walsh-Fourier analysis.

A modification of the above arguments used to establish the weak estimate can also establish restricted weak-type estimates for any :

**Exercise 19** For any sets of finite measure, and any measurable function , show that for any . (*Hint:* repeat the previous analysis with , but supplement it with an additional energy bound coming from a suitably localised version of Exercise 12.)

The bound (51) is also true for , yielding Hunt’s theorem, but this requires some additional arguments of Calderón-Zygmund type, involving the removal of an exceptional set defined using the Hardy-Littlewood maximal function:

**Exercise 20 (Hunt’s theorem)** Let be of finite non-zero measure, and let be a measurable function. Let be the exceptional set for a large absolute constant ; note from the Hardy-Littlewood inequality that if is large enough.

- (i) If is a finite collection of tiles with for all , show that (*Hint:* By using (22) and the disjointness of the when is fixed, first establish the estimate whenever is a natural number and is an interval with and .)
- (ii) If is a finite collection of tiles with for all , show that . (For a given tree , one can introduce the dyadic intervals as in Exercise 11, then perform a Calderón-Zygmund type decomposition to , splitting it into a “good” function bounded pointwise by , plus “bad functions” that are supported on the intervals and have mean zero. See this paper of Grafakos, Terwilleger, and myself for details.)
- (iii) For any finite collection of tiles for all
- (iv) Show that (51) holds for all , and conclude Theorem 2(iii).

**Remark 21**The methods of time-frequency analysis given here can handle several other operators that, like the Carleson operator, exhibit scaling, translation, and frequency modulation symmetries. One model example is the bilinear Hilbert transform for . The methods in this set of notes were used by Lacey and Thiele to establish the estimates for with (these estimates have since been strengthened and extended in a number of ways). We only give the briefest of sketches here. Much as how Carleson’s theorem can be reduced to a bound (19), the above estimates can be reduced to the estimation of a model sum where is a certain collection of triples of tiles with common spatial interval and frequency intervals varying along a certain one-parameter family for each fixed choice of spatial interval. One then uses a variant of Proposition 16 to partition into “-trees”, “-trees”, and “-trees”, the contribution of each of which can be controlled by the energies of on such trees, times the length of the spatial support of the tree, in analogy with Exercise 15. See for instance the text of Muscalu and Schlag for more discussion and further results.

**Remark 22**The concepts of mass and energy can be abstracted into a framework of spaces associated to outer measures (as opposed to the classical setup of spaces associated to countably additive measures), in which the mass and energy selection propositions can be viewed as consequences of an abstract Carleson embedding theorem, and the calculations establishing estimates such as (19) from such propositions and a tree estimate can be viewed as consequences of an “outer Hölder inequality”. See this paper of Do and Thiele for details.

### 247B, Notes 3: pseudodifferential operators

In contrast to previous notes, in this set of notes we shall focus exclusively on Fourier analysis in the one-dimensional setting for simplicity of notation, although all of the results here have natural extensions to higher dimensions. Depending on the physical context, one can view the physical domain as representing either space or time; we will mostly think in terms of the former interpretation, even though the standard terminology of “time-frequency analysis”, which we will make more prominent use of in later notes, clearly originates from the latter.

In previous notes we have often performed various localisations in either physical space or Fourier space , for instance in order to take advantage of the uncertainty principle. One can formalise these operations in terms of the functional calculus of two basic operations on Schwartz functions , the *position operator* defined by

and the *momentum operator* , defined by

(The terminology comes from quantum mechanics, where it is customary to also insert a small constant on the right-hand side of (1) in accordance with de Broglie’s law. Such a normalisation is also used in several branches of mathematics, most notably semiclassical analysis and microlocal analysis, where it becomes profitable to consider the semiclassical limit , but we will not emphasise this perspective here.) The momentum operator can be viewed as the counterpart to the position operator, but in frequency space instead of physical space, since we have the standard identity

for any and . We observe that both operators are formally self-adjoint in the sense that

for all , where we use the Hermitian inner product

Clearly, for any polynomial of one real variable (with complex coefficients), the operator is given by the spatial multiplier operator

and similarly the operator is given by the Fourier multiplier operator

Inspired by this, if is any smooth function that obeys the derivative bounds

for all and (that is to say, all derivatives of grow at most polynomially), then we can define the spatial multiplier operator by the formula

one can easily verify from several applications of the Leibniz rule that maps Schwartz functions to Schwartz functions. We refer to as the *symbol* of this spatial multiplier operator. In a similar fashion, we define the Fourier multiplier operator associated to the symbol by the formula

For instance, any constant coefficient linear differential operator can be written in this notation as

however there are many Fourier multiplier operators that are not of this form, such as fractional derivative operators for non-integer values of , which is a Fourier multiplier operator with symbol . It is also very common to use spatial cutoffs and Fourier cutoffs for various bump functions to localise functions in either space or frequency; we have seen several examples of such cutoffs in action in previous notes (often in the higher dimensional setting ).

We observe that the maps and are ring homomorphisms, thus for instance

and

for any obeying the derivative bounds (2); also is formally adjoint to in the sense that

for , and similarly for and . One can interpret these facts as part of the functional calculus of the operators , which can be interpreted as densely defined self-adjoint operators on . However, in this set of notes we will not develop the spectral theory necessary in order to fully set out this functional calculus rigorously.

In the field of PDE and ODE, it is also very common to study *variable coefficient* linear differential operators

where the are now functions of the spatial variable obeying the derivative bounds (2). A simple example is the quantum harmonic oscillator Hamiltonian . One can rewrite this operator in our notation as

and so it is natural to interpret this operator as a combination of both the position operator and the momentum operator , where the *symbol* of this operator is the function

Indeed, from the Fourier inversion formula

for any we have

and hence on multiplying by and summing we have

Inspired by this, we can introduce the *Kohn-Nirenberg quantisation* by defining the operator by the formula

whenever and is any smooth function obeying the derivative bounds

for all and (note carefully that the exponent in on the right-hand side is required to be uniform in ). This quantisation clearly generalises both the spatial multiplier operators and the Fourier multiplier operators defined earlier, which correspond to the cases when the symbol is a function of only or only respectively. Thus we have combined the physical space and the frequency space into a single domain, known as phase space . The term “time-frequency analysis” encompasses analysis based on decompositions and other manipulations of phase space, in much the same way that “Fourier analysis” encompasses analysis based on decompositions and other manipulations of frequency space. We remark that the Kohn-Nirenberg quantization is not the only choice of quantization one could use; see Remark 19 below.

**Exercise 1**

- (i) Show that for obeying (6), does indeed map to .
- (ii) Show that the symbol is uniquely determined by the operator . That is to say, if are two functions obeying (6) with for all , then . (*Hint:* apply to a suitable truncation of a plane wave and then take limits.)
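As a rough illustration of how a quantisation of this type contains both kinds of multiplier operator, here is a Python sketch of a discrete analogue on the cyclic group of integers mod N. Everything in it is an assumption made for the example: the discrete setting replaces the real line, the normalisation (Fourier transform with kernel e^{-2 pi i x xi / N}, symbol applied with the position variable on the outside) is one common convention that may differ from the one in these notes, and the names `dft` and `quantise` are hypothetical.

```python
import cmath


def dft(f):
    """Discrete Fourier transform on Z_N: fhat(xi) = sum_x f(x) e^{-2 pi i x xi / N}."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * x * xi / n) for x in range(n))
        for xi in range(n)
    ]


def quantise(symbol, f):
    """Discrete analogue of a Kohn-Nirenberg-type quantisation (assumed convention):
    (Op(a) f)(x) = (1/N) sum_xi a(x, xi) fhat(xi) e^{2 pi i x xi / N}."""
    n = len(f)
    fhat = dft(f)
    return [
        sum(
            symbol(x, xi) * fhat[xi] * cmath.exp(2j * cmath.pi * x * xi / n)
            for xi in range(n)
        ) / n
        for x in range(n)
    ]


def close(u, v, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(u, v))


f = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
n = len(f)

# A symbol depending only on x acts as a spatial multiplier.
spatial = quantise(lambda x, xi: x * x, f)
# A symbol depending only on xi acts as a Fourier multiplier; this particular
# one is the shift f(x) -> f(x + 1 mod N).
shifted = quantise(lambda x, xi: cmath.exp(2j * cmath.pi * xi / n), f)

print(close(spatial, [x * x * f[x] for x in range(n)]))
print(close(shifted, [f[(x + 1) % n] for x in range(n)]))
```

The direct O(N^2) transform keeps the sketch dependency-free; for larger N one would use a fast Fourier transform instead.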

In principle, the quantisations are potentially very useful for such tasks as inverting variable coefficient linear operators, or localising a function simultaneously in physical and Fourier space. However, a fundamental difficulty arises: the map from symbols to operators is now no longer a ring homomorphism, in particular

in general. Fundamentally, this is due to the fact that pointwise multiplication of symbols is a commutative operation, whereas the composition of operators such as and does not necessarily commute. This lack of commutativity can be measured by introducing the *commutator*

of two operators , and noting from the product rule that

(In the language of Lie groups and Lie algebras, this tells us that are (up to complex constants) the standard Lie algebra generators of the Heisenberg group.) From a quantum mechanical perspective, this lack of commutativity is the root cause of the uncertainty principle that prevents one from simultaneously localizing in both position and momentum past a certain point. Here is one basic way of formalising this principle:

**Exercise 2 (Heisenberg uncertainty principle)** For any and , show that

(*Hint:* evaluate the expression in two different ways and apply the Cauchy-Schwarz inequality.) Informally, this exercise asserts that the spatial uncertainty and the frequency uncertainty of a function obey the Heisenberg uncertainty relation .
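As a numerical sanity check of the uncertainty relation, the following sketch estimates the product of the spatial and frequency uncertainties on a grid. It assumes the normalisation D = (1/(2πi)) d/dx, under which the lower bound for the ratio computed here would be 1/(4π); the grid parameters, the central-difference derivative, and the names `l2_norm` and `uncertainty_ratio` are choices made for this illustration only.

```python
import math

def l2_norm(values, h):
    """Riemann-sum approximation of the L^2 norm of samples with spacing h."""
    return math.sqrt(sum(abs(v) ** 2 for v in values) * h)

def uncertainty_ratio(f, x0=0.0, xi0=0.0, half_width=8.0, n=8001):
    """Return ||(X - x0) f|| * ||(D - xi0) f|| / ||f||^2 on a uniform grid.

    Assumes D = (1/(2*pi*i)) d/dx; the derivative is a central difference.
    """
    h = 2 * half_width / (n - 1)
    xs = [-half_width + k * h for k in range(n)]
    fs = [complex(f(x)) for x in xs]
    # Samples of (X - x0) f.
    xf = [(x - x0) * v for x, v in zip(xs, fs)]
    # Samples of (D - xi0) f at the interior grid points.
    df = [(fs[k + 1] - fs[k - 1]) / (2 * h) / (2j * math.pi) - xi0 * fs[k]
          for k in range(1, n - 1)]
    return l2_norm(xf, h) * l2_norm(df, h) / l2_norm(fs, h) ** 2

def gaussian(x):
    return math.exp(-math.pi * x * x)

def hermite(x):
    return x * math.exp(-math.pi * x * x)

bound = 1 / (4 * math.pi)
print(uncertainty_ratio(gaussian), bound)  # Gaussian: ratio close to the bound
print(uncertainty_ratio(hermite), bound)   # x * Gaussian: ratio above the bound
```

The Gaussian is the extremal case for this normalisation, so its ratio should sit at the bound up to discretisation error, while other test functions give larger values.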

Nevertheless, one still has the correspondence principle, which asserts that in certain regimes (which, with our choice of normalisations, corresponds to the high-frequency regime), quantum mechanics continues to behave like a commutative theory, and one can sometimes proceed as if the operators (and the various operators constructed from them) commute up to “lower order” errors. This can be formalised using the *pseudodifferential calculus*, which we give below the fold, in which we restrict the symbol to certain “symbol classes” of various orders (which then restricts to be pseudodifferential operators of various orders), and obtain approximate identities such as

where the error between the left and right-hand sides is of “lower order” and in fact enjoys a useful asymptotic expansion. As a first approximation to this calculus, one can think of functions as having some sort of “phase space portrait” which somehow combines the physical space representation with its Fourier representation , and pseudodifferential operators behave approximately like “phase space multiplier operators” in this representation in the sense that

Unfortunately the uncertainty principle (or the non-commutativity of and ) prevents us from making these approximations perfectly precise, and it is not always clear how to even define a phase space portrait of a function precisely (although there are certain popular candidates for such a portrait, such as the FBI transform (also known as the Gabor transform in signal processing literature), or the Wigner quasiprobability distribution, each of which has some advantages and disadvantages). Nevertheless even if the concept of a phase space portrait is somewhat fuzzy, it is of great conceptual benefit both within mathematics and outside of it. For instance, the musical score one assigns a piece of music can be viewed as a phase space portrait of the sound waves generated by that music.

To complement the pseudodifferential calculus we have the basic *Calderón-Vaillancourt theorem*, which asserts that pseudodifferential operators of order zero are Calderón-Zygmund operators and thus bounded on for . The standard proof of this theorem is a classic application of one of the basic techniques in harmonic analysis, namely the exploitation of *almost orthogonality*; the proof we will give here will achieve this through the elegant device of the Cotlar-Stein lemma.

Pseudodifferential operators (especially when generalised to higher dimensions ) are a fundamental tool in the theory of linear PDE, as well as related fields such as semiclassical analysis, microlocal analysis, and geometric quantisation. There is an even wider class of operators that is also of interest, namely the Fourier integral operators, which roughly speaking not only approximately multiply the phase space portrait of a function by some multiplier , but also move the portrait around by a canonical transformation. However, the development of the theory of these operators is beyond the scope of these notes; see for instance the texts of Hörmander or Eskin.

This set of notes is only the briefest introduction to the theory of pseudodifferential operators. Many texts are available that cover the theory in more detail, for instance this text of Taylor.

** — 1. Pseudodifferential operators — **

The Kohn-Nirenberg quantisation was defined above for any symbol obeying the very loose estimates (6). To obtain a clean theory it is convenient to restrict attention to more restrictive classes of symbols. There are many such classes one can consider, but we shall only work with the classical symbol classes:

**Definition 3 (Classical symbol class)** Let . A function is said to be a (classical) *symbol of order * if it is smooth and one has the derivative bounds

for all and . (Informally: “behaves like” , with each derivative in the frequency variable gaining an additional decay factor of , but with each derivative in the spatial variable exhibiting no gain.) The collection of all symbols of order will be denoted . If is a symbol of order , the operator is referred to as a pseudodifferential operator of order .

As a major motivating example, any variable coefficient linear differential operator (3) of order will be a pseudodifferential operator of order , so long as the coefficients obey the bounds

for , , and . (This would then exclude operators with unbounded coefficients, such as the harmonic oscillator, but can handle localised versions of these operators, and in any event there are other symbol classes in the literature that can be used to handle certain types of differential operators with unbounded coefficients.) Also, a fractional differential operator such as will be a pseudodifferential operator of order for any . We refer the reader to Stein’s text for a discussion of more exotic symbol classes than the one given here.

The space of pseudodifferential operators of order forms a vector space that is non-decreasing in : any pseudodifferential operator of order is automatically also of order for any . (Thus, strictly speaking, it would be more appropriate to say that is a pseudodifferential operator of order *at most* if , but we will not adopt this convention for brevity.) The intuition to keep in mind is that a pseudodifferential operator of order behaves like a variable coefficient linear differential operator of order , with the obvious caveat that in the latter case is restricted to be a natural number, whereas in the former can be any real number. This intuition will be supported by the various components of the *pseudodifferential calculus* that we shall develop later; for instance, we will show that the composition of a pseudodifferential operator of order and a pseudodifferential operator of order is a pseudodifferential operator of order .

Before we set out this calculus, though, we give a fundamental estimate, which can be viewed as a variable coefficient version of the Hörmander-Mikhlin multiplier theorem:

**Theorem 4 (Calderón-Vaillancourt theorem)** Let , and let be a pseudodifferential operator of order . Then one has

for all . In particular, extends to a bounded linear operator on each space with .

We now begin the proof of this theorem. The first step is a dyadic decomposition of Littlewood-Paley type. Let be a bump function supported on that equals on . Then we can write

where

and

for . From dominated convergence, implies that

pointwise for . Thus by Fatou’s lemma, it will suffice to show that

uniformly in . Observe from Definition 3 and the Leibniz rule that each is supported in the strip and obeys the derivative estimates

From (5) and Fubini’s theorem we can express as an integral operator

for , where the integral kernel is given by the formula

We can obtain several estimates on this kernel. Firstly, from the triangle inequality, (11), and the support property of we have the trivial bound

When , we may integrate by parts repeatedly, gaining factors of at the cost of applying a derivative to for each such factor, and then if one applies the triangle inequality, (11), and support property of as before we conclude that

for any ; by combining the estimates, we conclude that

for all and . Differentiating (13) in or , and repeating the above arguments, we also obtain the estimates

Since the function has an norm of for any , we now see from (12) and Young’s inequality that

Thus each component of is under control (so for instance we may now discard the term); the difficulty is to sum in without losing any -dependent factors. To do this, we first observe from (14), (15) and a routine summation of that the total kernel (which is the integral kernel for ) obeys the pointwise bound

as well as the pointwise derivative bound

These are the usual kernel bounds for one-dimensional Calderón-Zygmund theory. From that theory we conclude that in order to prove the estimate (10), it suffices to establish the case

From (17), we have already established a preliminary bound

for each , but a direct application of the triangle inequality will cost us a -dependent factor, which we cannot afford. To do better, we need some “orthogonality” between the . The intuition here is that each component only interacts with the portion of that corresponds to frequencies of magnitude , and that these regions are somehow “orthogonal” to each other. Informally, this suggests that

where is something like a Littlewood-Paley projection operator to frequencies . If we accepted this heuristic, then we could informally use the Littlewood-Paley inequality (or decoupling theory) to calculate

It is possible to make this approximation (19) more precise and establish (18): see Exercise 7. However, we will take the opportunity to showcase another elegant way to exploit “almost orthogonality”, known as the Cotlar-Stein lemma:

**Lemma 5 (Cotlar-Stein lemma)** Let be bounded linear maps from one Hilbert space to another . Suppose that the maps obey the operator norm bounds

for all and some , and similarly the maps obey the operator norm bounds

for all and some . Then we have

Note that if the had pairwise orthogonal ranges then would vanish whenever , and similarly if the had pairwise orthogonal coranges then the would vanish whenever . Thus the hypotheses of the Cotlar-Stein lemma are indeed some quantitative form of “almost orthogonality” of the .
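The almost orthogonality phenomenon can be seen numerically on a toy example. The sketch below assumes the hypotheses of the lemma are taken in the commonly stated form, namely that the row sums of the square roots of the operator norms of T_i T_j^* are bounded by A and those of T_i^* T_j by B, with conclusion that the sum has norm at most sqrt(AB). The three 2x2 matrices and the helper names are hypothetical choices made only for illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose, which is the adjoint for real matrices."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def op_norm(a):
    """Operator (spectral) norm of a 2x2 real matrix via its singular values."""
    t = sum(a[i][j] ** 2 for i in range(2) for j in range(2))  # trace of a^T a
    d = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) ** 2           # det of a^T a
    return math.sqrt((t + math.sqrt(max(t * t - 4 * d, 0.0))) / 2)

def cotlar_stein_bound(ops):
    """Return sqrt(A * B), with A and B the largest row sums of
    ||T_i T_j^*||^(1/2) and ||T_i^* T_j||^(1/2) respectively."""
    A = max(sum(math.sqrt(op_norm(matmul(s, transpose(t)))) for t in ops)
            for s in ops)
    B = max(sum(math.sqrt(op_norm(matmul(transpose(s), t))) for t in ops)
            for s in ops)
    return math.sqrt(A * B)

ops = [
    [[1.0, 0.0], [0.0, 0.0]],  # projection onto the first coordinate
    [[0.0, 0.0], [0.0, 1.0]],  # projection onto the second coordinate
    [[0.0, 0.1], [0.0, 0.0]],  # small off-diagonal perturbation
]
total = [[sum(t[i][j] for t in ops) for j in range(2)] for i in range(2)]

print(op_norm(total))                # actual norm of the sum
print(cotlar_stein_bound(ops))       # almost-orthogonality bound sqrt(A * B)
print(sum(op_norm(t) for t in ops))  # triangle-inequality bound
```

In this example the two projections have orthogonal ranges and coranges, so most of the cross terms vanish and the almost-orthogonality bound is noticeably smaller than the triangle-inequality bound.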

*Proof:* We use the $TT^\ast$ method (which asserts that for a bounded linear map $T$ between Hilbert spaces, the operator norm of $TT^\ast$ or $T^\ast T$ is the square of that of $T$ or $T^\ast$). Applying this method to a single operator we have

Taking geometric means we have

then by the triangle inequality we have . This loses a factor of over the trivial bound. We can reduce this loss to by a further application of the method as follows. Writing , we have

and similarly

so on taking geometric means we have .

We now reduce the loss in all the way to by iterating the method (this is an instance of a neat trick in analysis, namely the tensor power trick). For any integer that is a power of two, we see from iterating the method that

(In fact, this identity holds for any natural number , not just powers of two, as can be seen from spectral theory, but powers of two will suffice for the argument here.) We expand out the right-hand side and bound using the triangle inequality by

On the one hand, we can bound the norm by

grouping things slightly differently and using (22) twice, we can also bound this norm by

Taking the geometric mean, we can bound the norm by

Summing in using (20), then in using (21) and so forth until the sum (which is just summed with a loss of ), we conclude that

Sending , we obtain the claim.

**Remark 6** There is a refinement of the Cotlar-Stein lemma for infinite series of operators obeying the hypotheses of the lemma, in which it is shown that the series actually converges in the strong operator topology (though not necessarily in the operator norm topology); this refinement was first observed by Meyer, and can be found for instance in this note of Comech.

We will shortly establish the bounds

for any . The claim (18) then follows from the Cotlar-Stein lemma (using (17) to dispose of the term).

We shall just show that

when ; the case is treated similarly, as is the treatment of (in fact this latter operator vanishes when , though we will not really need this fact). We have

where

A direct application of (15) and the triangle inequality gives the bounds

(say), which when combined with Young’s inequality does not give the desired gain of . To recover this gain we begin integrating by parts. From (13) we have

Note that obeys similar estimates to but with an additional gain of . Thus the contribution of this term to will be acceptable. The contribution of the other term, after an integration by parts, is

The kernel obeys the same bounds as (15) but with an additional gain of ; similarly from (16) the expression obeys the same bounds as (15) but with an additional loss of . The claim follows. This concludes the proof of the Calderón-Vaillancourt theorem.

**Exercise 7** With the hypotheses as above, and with a suitable Littlewood-Paley projection to frequencies , establish the operator norm bounds

for all and . Use this to provide an alternate proof of (18) that does not require the Cotlar-Stein lemma.

Now we give a preliminary composition estimate:

**Theorem 8 (Preliminary composition)** Let be a pseudodifferential operator of some order , and let be a pseudodifferential operator of some order . Then the composition is a pseudodifferential operator of order , thus there exists such that (note from Exercise 1 that is uniquely determined).

*Proof:* We begin with some technical reductions in order to justify some later exchanges of integrals. We can express the symbol as a locally uniform limit of truncated symbols as , where is a bump function equal to near the origin; from the product rule we see that the symbol estimates (8) are obeyed by the uniformly in as long as . If is Schwartz, then so is , and can be verified to converge pointwise to . If one can show that for some pseudodifferential operator of order , with all the required symbol estimates (8) on obeyed uniformly in , then the claim will follow by using the Arzelà-Ascoli theorem to extract a locally uniformly convergent subsequence of the and taking a limit. The upshot of this is that we may assume without loss of generality that the symbol is compactly supported in , so long as our estimates do not depend on the size of this compact support, but only on the constants in the symbol bounds (8) for .

Similarly, we may approximate locally uniformly as the limit of symbols that are compactly supported in , which makes converge locally uniformly to ; from the compact support of this also shows that converges pointwise to . From the same limiting argument as before, we may thus assume that is compactly supported in , so long as our estimates do not depend on the size of this support, but only on the constants in the symbol bounds (8) for .

For , we have

hence on taking Fourier transforms

hence

and hence by Fubini’s theorem (and the compact support of and the Schwartz nature of ) we have , where

for all and , where the understanding is that the dependence of constants on is only through the symbol bounds (8) for these symbols.

From differentiation under the integral sign and integration by parts we obtain the Leibniz identities

and

From this and an induction on (varying as necessary, noting that if maps to and maps to ) we see that to prove (25) it suffices to do so in the case, thus we now only need to show that

Applying a smooth partition of unity in the variable to , it suffices to verify the claim in one of two cases:

- is supported in the region (so in particular ).
- is supported in the region .

(One can verify that applying the required cutoffs to do not significantly worsen the symbol estimates (8).) In the former case we write the left-hand side as

where

By repeating the proof of (14) we have

so from this and the symbol bound we obtain the claim in this case.

It remains to handle the latter case. Here we integrate by parts repeatedly in the variable to write the left-hand side of (26) as

for any . Then as before we can rewrite this as

where

By taking large enough we will eventually recover the bound

(in fact one can gain arbitrary powers of if desired), and so by repeating the previous arguments we also obtain the claim in this case.

Theorem 8 shows that if and then . The following exercise gives some refinements to this fact:

**Exercise 9 (Composition of pseudodifferential operators)** Let and for some .

- (i) Show that . (*Hint:* reduce as before to the case where are compactly supported, and use the fundamental theorem of calculus to write , where . Then use the Fourier inversion formula, integration by parts, and arguments similar to those used to prove Theorem 8.)
- (ii) Show that , where and . (*Hint:* now apply the fundamental theorem of calculus once more to expand .)
- (iii) Check (i) and (ii) directly in the classical case when and for some smooth obeying the bounds (9) and for . Based on this, for any integer , make a prediction for an approximation to as a polynomial combination of the symbols arbitrary and finitely many of their derivatives which is accurate up to an error in . Then verify this prediction.

**Remark 10** From Exercise 9 we see that if are pseudodifferential operators of order respectively, then the commutator differs from by a pseudodifferential operator of order , where is the Poisson bracket

This approximate correspondence between the Lie bracket (which plays a fundamental role in the dynamics of quantum mechanics) and the Poisson bracket (which plays a fundamental role in the dynamics of classical mechanics) is one of the mathematical foundations of the correspondence principle relating quantum and classical mechanics, but we will not discuss this topic further here.

There is also a companion result regarding adjoints of pseudodifferential operators:

**Exercise 11 (Adjoint of pseudodifferential operator)** Let .

- (i) If is compactly supported, show that the function defined by
is also a symbol of order , and that is the adjoint of in the sense that

for all .

- (ii) Show that even if is not compactly supported, there is a unique pseudodifferential operator of order which is the adjoint of in the sense that
for all .

- (iii) Show that is a pseudodifferential operator of order .

Now we give some applications of the above pseudodifferential calculus.

**Exercise 12 (Pseudodifferential operators and Sobolev spaces)** For any and , define the Sobolev space to be the completion of the Schwartz functions with respect to the norm

- (i) If is a non-negative integer, show that
for any , thus in this case the Sobolev spaces agree (up to constants) with the classical Sobolev spaces (as discussed for instance in this set of notes).

- (ii) If is a pseudodifferential operator of some order , show that
for any , thus extends to a bounded linear map from to . (*Hint:* use Theorem 4 and Theorem 8.)

- (iii) Let be a pseudodifferential operator of some order that obeys the strong ellipticity condition
for all . Establish the Garding inequality

for all and some depending only on . (*Hint:* use Exercises 9, 11 to express as for some pseudodifferential operators of orders and respectively.) If , deduce also the variant inequality (possibly with slightly different choices of ).

The behaviour of pseudodifferential operators may be clarified by using a type of phase space transform, which we will call a *Gabor-type transform*.

**Exercise 13 (Gabor-type transforms and pseudodifferential operators)** Given any function with the normalisation , and any , define the *Gabor-type transform* by the formula

thus is the inner product of with the function , which is the “wave packet” formed from function by translating by and then modulating by . (Intuitively, measures the extent to which lives at spatial location and frequency location .) We also define the adjoint map for by the formula

- (i) Show that for any , is a Schwartz function on , thus is a linear map from to . Similarly, show that for , is a Schwartz function on , thus is a linear map from to .
- (ii) Establish the identity for any , and conclude in particular that
for any , thus extends to a linear isometry from into .

- (iii) For any smooth compactly supported and , establish the identity
where is the (Kohn-Nirenberg) Wigner distribution of , defined by the formula

and is the phase space convolution

**Remark 14** When is a Gaussian, the transform is essentially the Gabor transform (in signal processing) or the FBI transform (in microlocal analysis), and is also closely related to the Bargmann transform in complex analysis. There are some technical advantages with working with Gaussian choices of , particularly with regards to the treatment of certain lower order terms in the pseudodifferential calculus; see for instance these notes of Tataru.

Note that is a Schwartz function on , and by the Fourier inversion formula it has unit mass: . (One also has the marginal distributions and , so would be a strong candidate for a “phase space probability distribution” for , save for the unfortunate fact that has no reason to be non-negative.) But even with oscillation, still behaves like an approximation to the identity, so for slowly varying can be viewed as an approximation to . Thus, Exercise 13(iii) can be intuitively viewed as saying that behaves approximately like a multiplier in phase space:

Another informal way of viewing this assertion is that (for suitable choices of ) the translated and modulated functions can be viewed as approximate eigenfunctions of with eigenvalue . This is for instance consistent with the approximate functional calculus and that one saw in Exercises 9, 11. The exercise below gives another way to view this approximation:

**Exercise 15 ( bound)** Let be a smooth function obeying the “ bound”

for all and . Let and be as in Exercise 13. Show that there is a smooth kernel obeying the bounds

for any , such that

for any . (*Hint:* work first in the case when is compactly supported, where one can use Fubini’s theorem to derive an explicit integral expression for , which one can then control by various integrations by parts.) Use this to establish the bound

for any ; note that this gives an alternate proof of (18). (See also these notes of Tataru for further elaboration of this approach to pseudodifferential operators.)

As a sample application of the Gabor transform formalism we give a variant of the Garding inequality from Exercise 12(iii).

**Theorem 16 (Sharp Garding inequality)** Let be a pseudodifferential operator of order such that for all . Then one has

for all , where depends only on .

*Proof:* From Exercise 11 we see that is a pseudodifferential operator of order , hence by Exercise 12(ii) we have

Thus we may remove the imaginary part from and assume that is real and non-negative. Applying a smooth partition of unity of Littlewood-Paley type, we can write , where each is also non-negative, supported on the region , and obeys essentially the same symbol estimates as uniformly in . It then suffices to show that

uniformly in .
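The smooth partition of unity of Littlewood-Paley type used here (and the similar dyadic decomposition in the proof of Theorem 4) can be illustrated with a short sketch. It assumes a bump function equal to 1 on [-1, 1] and vanishing outside (-2, 2), built from the standard exp(-1/t) construction; the specific bump and the names `smooth_step`, `psi`, `psi_j` are illustrative assumptions, and any bump with the same support properties would serve.

```python
import math

def smooth_step(t):
    """Smooth non-decreasing function equal to 0 for t <= 0 and 1 for t >= 1."""
    def h(s):
        return math.exp(-1.0 / s) if s > 0 else 0.0
    return h(t) / (h(t) + h(1.0 - t))

def psi(xi):
    """Bump function: equals 1 on [-1, 1] and vanishes outside (-2, 2)."""
    return 1.0 - smooth_step(abs(xi) - 1.0)

def psi_j(j, xi):
    """Dyadic piece: psi for j = 0, psi(xi/2^j) - psi(xi/2^(j-1)) for j >= 1."""
    if j == 0:
        return psi(xi)
    return psi(xi / 2 ** j) - psi(xi / 2 ** (j - 1))

N = 6
for xi in (0.3, 1.5, 7.0, 40.0, 100.0):
    partial = sum(psi_j(j, xi) for j in range(N + 1))
    # The sum telescopes to psi(xi / 2^N), which equals 1 once 2^N >= |xi|.
    print(xi, partial, psi(xi / 2 ** N))
```

For j at least 1, the piece `psi_j(j, .)` is non-negative and vanishes unless the frequency has magnitude between 2^(j-1) and 2^(j+1), so multiplying a symbol by these pieces localises it to dyadic frequency strips while preserving non-negativity.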

We now use the Gabor-type transforms from Exercise 13, except that we make dependent on . Specifically we pick a single real even with norm , then define for all . We will approximate by

Observe that

so by the triangle inequality it will suffice to establish the bound

However, it is not difficult (see Exercise 17 below) to show that is a symbol of order uniformly in , and the claim now follows from Exercise 12(ii).

**Exercise 17** Verify the claim that is a symbol of order uniformly in . (Here one will need the fact that is a rescaling by a scaling factor of , which is an even Schwartz function of mean . The even nature of is needed to cancel some linear terms which would otherwise only allow one to obtain symbol bounds of order rather than .)

**Remark 18** It is possible to improve the error term in the sharp Garding inequality, particularly if one uses the Weyl quantization rather than the Kohn-Nirenberg one (see Remark 19 below); also the non-negativity hypothesis on can be relaxed in a manner consistent with the uncertainty principle; see this deep paper of Fefferman and Phong.

**Remark 19** Throughout this set of notes we have used the Kohn-Nirenberg quantization

or equivalently (taking to be compactly supported for sake of discussion)

However, this is not the only quantization that one could use. For instance, one could also use the adjoint Kohn-Nirenberg quantization

which one can easily relate to the Kohn-Nirenberg quantization by the identity

In particular, from Exercise 11 we see that if is a symbol of order , then and only differ by pseudodifferential operators of order (and that both quantizations produce the same class of pseudodifferential operators of a given order). The operators appearing earlier can also be viewed as a quantization of (known as the *anti-Wick quantization* of associated to the test function ). But perhaps the most popular quantization used in the literature is the Weyl quantization

which in some sense “splits the difference” between the Kohn-Nirenberg and adjoint Kohn-Nirenberg quantizations, being completely symmetric between the input spatial variable and output spatial variable . (Strictly speaking, this formula is only well-defined for say compactly supported symbols ; for more general symbols one can define in the weak sense as the distribution for which

for (it is not difficult to use integration by parts to show that the expression in parentheses is rapidly decreasing in , hence absolutely integrable). In particular there is now no error term in the analogue of Exercise 11:

All of the preceding theory for the Kohn-Nirenberg quantization can be adapted to the Weyl quantization with minor changes (for instance, the definition of the Wigner transform changes slightly, and the operation defined in (24) is replaced with the Moyal product), and as seen in Exercise 20 below, the two quantizations again produce the same classes of pseudodifferential operators, with symbols agreeing up to lower order terms.

**Exercise 20 (Kohn-Nirenberg and Weyl quantizations are equivalent up to lower order)** Let be a real number.

- (i) If is a symbol of order , show that there exists a symbol of order such that . Furthermore, show that is a symbol of order .
- (ii) If is a symbol of order , show that there exists a symbol of order such that . Furthermore, show that is a symbol of order .

**Exercise 21 (Comparison of quantizations)** Let be natural numbers, and let be the monomial .

- (i) Show that .
- (ii) Show that .
- (iii) Show that , where ranges over all tuples of operators consisting of copies of and copies of . For instance, if , then

Informally, the Kohn-Nirenberg quantization always applies position operators to the left of momentum operators; the adjoint Kohn-Nirenberg quantization always applies position operators to the right of momentum operators; and the Weyl quantization averages equally over all possible orderings. (Taking formal generating functions, we also see (formally, at least) that the quantization of a plane wave for real numbers is equal to in the Kohn-Nirenberg quantization, in the adjoint Kohn-Nirenberg quantization, and in the Weyl quantization.)
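The three orderings can be compared on the simplest mixed monomial, the product of one position and one momentum variable. The sketch below acts on polynomials stored as coefficient lists and assumes the normalisation D = (1/(2πi)) d/dx; that normalisation and the function names `kohn_nirenberg`, `adjoint_kohn_nirenberg` and `weyl` are assumptions made for this illustration.

```python
import math

# Polynomials are coefficient lists: p[k] is the coefficient of x**k.
C = 1 / (2j * math.pi)  # assumed normalisation: D = (1/(2*pi*i)) d/dx

def X(p):
    """Position operator: multiply the polynomial by x."""
    return [0j] + [complex(c) for c in p]

def D(p):
    """Momentum operator: differentiate, then scale by 1/(2*pi*i)."""
    return [C * k * p[k] for k in range(1, len(p))] or [0j]

def kohn_nirenberg(p):
    """Position to the left of momentum: X D."""
    return X(D(p))

def adjoint_kohn_nirenberg(p):
    """Position to the right of momentum: D X."""
    return D(X(p))

def weyl(p):
    """Average over both orderings: (X D + D X) / 2."""
    return [0.5 * (a + b)
            for a, b in zip(kohn_nirenberg(p), adjoint_kohn_nirenberg(p))]

p = [1, 2, 3]  # the polynomial 1 + 2x + 3x^2 (degree >= 1, so lengths agree)
kn, akn, w = kohn_nirenberg(p), adjoint_kohn_nirenberg(p), weyl(p)

# The three quantizations differ only by multiples of p / (2*pi*i).
print([b - a for a, b in zip(kn, akn)])  # should match C * p
print([b - a for a, b in zip(kn, w)])    # should match (C / 2) * p
```

Under these assumptions the quantizations of this monomial agree up to a scalar multiple of the identity, which is an operator of lower order than the monomial itself.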

**Exercise 22 (Gabor-type transforms and symmetries)** Let .

- (i) (Physical translation) If and is the function , show that for all .
- (ii) (Frequency modulation) If and is the function , show that for all .
- (iii) (Dilation) If and is the function , show that for all , where .
- (iv) (Fourier transform) If , show that .
- (v) (Quadratic phase modulation) If and is the function , show that for all , where .

We remark that the group generated by the transformations (i)-(v) is the (Weil representation of the) metaplectic group .

**Remark 23** Ignoring the changes in the Gabor test function , as well as the various phases appearing on the right-hand side, we conclude from the above exercise that basic transformations on functions seem to correspond to various area-preserving maps of phase space; for instance, the Fourier transform is associated to the rotation , which is consistent in particular with the fact that a fourfold iteration of the Fourier transform yields the identity operator. This is in fact a quite general phenomenon, with something asymptotically resembling such identities available for an important class of operators known as Fourier integral operators (but in higher dimensions one replaces the adjective “area-preserving” with “symplectomorphism” or “canonical transformation”). However, as stated previously, the systematic development of the theory of Fourier integral operators is beyond the scope of this course.

**Remark 24** Virtually all of the above theory extends to higher dimensions, and also to general smooth manifolds as domains. In the latter case, the natural analogue of phase space is the cotangent bundle , and the symplectic geometry of this bundle then plays a fundamental role in the theory (as already hinted at by the appearance of the Poisson bracket in Remark 10). See for instance this text of Folland for more discussion.

### BiSTRO seminar

Simion Filip, Curtis McMullen, Martin Moeller and I are co-organizing an online seminar called **Bi**lliards, **S**urfaces à la **T**eichmueller and **R**iemann, **O**nline (BiSTRO).

Similarly to several other current online seminars, the idea of BiSTRO arose after several conferences, meetings, etc. in Teichmueller dynamics were cancelled due to the covid-19 crisis.

In any case, the first talk of the BiSTRO seminar will be delivered tomorrow by Kasra Rafi (at 18h CEST): in a nutshell, he will explain how to extend the scope of a result of Furstenberg on stationary measures (previously discussed in this blog here and here) to the context of mapping class groups.

Closing this short post, let me point out that readers wishing to attend the BiSTRO seminar can find the relevant information at the bottom of the seminar’s official webpage.

### Maximal entropy measures and Birkhoff normal forms of certain Sinai billiards

Sinai billiards are a fascinating class of dynamical systems: in fact, despite their simple definition in terms of a dynamical billiard on a table given by a square or a two torus with a certain number of dispersing obstacles, they present some “nasty” features (related to the existence of “grazing” collisions) placing them slightly beyond the standard theory of smooth uniformly hyperbolic systems.

The seminal works by several authors (including Sinai, Bunimovich, Chernov, …) paved the way to establish many properties of Sinai billiards including the so-called fast decay of correlations for the Liouville measure (a feature which is illustrated in this numerical simulation [due to Dyatlov] here).

Nevertheless, some ergodic-theoretical aspects of (certain) Sinai billiards were elucidated only very recently: for instance, the existence of a unique probability measure of maximal entropy was proved by Baladi–Demers in this article here.

An interesting remark made by Baladi–Demers is the fact that the maximal entropy measure should be typically different from the Liouville measure : in fact, forces a *rigidity* property for periodic orbits, namely, the Lyapunov exponents of *all* periodic orbits coincide (note that this is a huge number of conditions because the number of periodic orbits of period grows exponentially with ) and, in particular, there are no known examples of Sinai billiards where .

In a recent paper, De Simoi–Leguil–Vinhage–Yang showed that forces another rigidity property for periodic orbits, namely, the Birkhoff normal forms (see also Section 6 of Chapter 6 of the Hasselblatt–Katok book and Appendix A of the Moreira–Yoccoz article) at periodic orbits whose invariant manifolds produce homoclinic orbits are all *linear*. In particular, they proposed in Remark 5.6 of their preprint to prove that for certain Sinai billiards by computing the Taylor expansion of the derivative of the billiard map at a periodic orbit of period two and checking that its quadratic part is non-degenerate.

In this short post, we explain that the strategy from the previous paragraph can be implemented to verify the non-linearity of the Birkhoff normal form at a periodic orbit of period two of the Sinai billiards in triangular lattices considered by Baladi–Demers.

**Remark 1**

*In our subsequent discussion, we shall assume some familiarity with the basic aspects of billiards maps described in the classical book of Chernov–Markarian.*

**1. Sinai billiards in triangular lattices**

We consider the Sinai billiard table on the hexagonal torus obtained from the following picture (extracted from the Baladi–Demers article):

Here, the obstacles are disks of radii centered at the points of a triangular lattice and the distance between adjacent scatterers is .

As it is discussed in Baladi–Demers paper, the billiard map has a probability measure of maximal entropy whenever (where the limit case corresponds to touching obstacles and the limit case produces “infinite horizon”).

**2. Billiard map near a periodic orbit of period two**

We want to study the billiard map near the periodic orbit of period two given by the horizontal segment between and .

A trajectory leaving the point in the direction travels along the line until it hits the boundary of the disk of radius and center for the first time . By definition, the time is the smallest solution of , i.e.,

Hence, where and , the billiard trajectory starting at in the direction hits the obstacle at

where and, after a specular reflection, it takes the direction .
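The geometric step just described (follow a straight line until it first meets a circular obstacle, then reflect specularly) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the names `first_hit_time`, `reflect` and `bounce`, and the generic parameters `x`, `v`, `c`, `r` (starting point, direction, disk center, disk radius), are hypothetical and do not come from the post or from the Baladi–Demers setup. It solves the quadratic equation $|x + t v - c|^2 = r^2$ for the smallest positive $t$ and applies the standard reflection rule $v' = v - 2\langle v, n\rangle n$ with $n$ the unit normal at the hit point.

```python
import math

def first_hit_time(x, v, c, r):
    """Smallest t > 0 with |x + t*v - c| = r, or None if the ray misses the disk."""
    # Expanding |x + t*v - c|^2 = r^2 gives a*t^2 + b*t + k = 0.
    wx, wy = x[0] - c[0], x[1] - c[1]
    a = v[0] ** 2 + v[1] ** 2
    b = 2.0 * (wx * v[0] + wy * v[1])
    k = wx ** 2 + wy ** 2 - r ** 2
    disc = b * b - 4.0 * a * k
    if disc < 0:
        return None  # the line does not meet the circle
    sq = math.sqrt(disc)
    for t in sorted(((-b - sq) / (2 * a), (-b + sq) / (2 * a))):
        if t > 1e-12:  # ignore hits behind (or at) the starting point
            return t
    return None

def reflect(v, n):
    """Specular reflection of the direction v at a point with unit normal n."""
    dot = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2 * dot * n[0], v[1] - 2 * dot * n[1])

def bounce(x, v, c, r):
    """Return (hit point, outgoing direction), or None if the disk is missed."""
    t = first_hit_time(x, v, c, r)
    if t is None:
        return None
    p = (x[0] + t * v[0], x[1] + t * v[1])
    n = ((p[0] - c[0]) / r, (p[1] - c[1]) / r)  # outward unit normal of the disk
    return p, reflect(v, n)
```

For instance, `bounce((0.0, 0.0), (1.0, 0.0), (3.0, 0.0), 1.0)` sends a horizontal ray head-on into a unit disk and returns it along the same horizontal line, which is the mechanism behind a period-two orbit running along a segment between two obstacles.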

As explained on page 35 of Chernov–Markarian’s book, this geometric description of the billiard map near the periodic orbit of period two allows one to carry out the computation along the following lines. If we adopt the convention in Chernov–Markarian’s book of using arc-length parametrization on the obstacle and describing the angle with the normal direction pointing towards the interior of the table using a sign determined by the orientation of the obstacles so that this normal vector stays at the left of the tangent vectors to the obstacles, then the billiard map becomes

where

(and for ) and, moreover,

and

where

Since we also have

and

it follows that

where

and

In particular, the determinant of the quadratic part of the Taylor expansion of near the periodic orbit of period two is

Thus, for , one has

so that the quadratic part of the Taylor expansion of is non-degenerate because for .

**Remark 2**

*in the limit case .*

### Hédi Daboussi, in memoriam

I was very sad to learn today from R. de la Bretèche and É. Fouvry that Hédi Daboussi passed away yesterday. Hédi played an important part in my life; he was the first actual analytic number theorist that I met, one day at IHP in 1987, at the beginning of my first bachelor-thesis style project. (This was before the internet was widely available.) Fouvry and he advised me on this project, which was devoted to the large sieve, especially the proof of Selberg based on the Beurling functions. They also introduced me to Henryk Iwaniec, who was visiting Orsay at the time (in fact, the meeting at IHP was organized to coincide with a talk of Iwaniec).

Daboussi is probably best known outside the French analytic number theory community for two things. The first is his elegant elementary proof of the Prime Number Theorem, found in 1983, which does not use Selberg’s identity, and which is explained in the nice book of Mendès-France and Tenenbaum. The second is the “Rencontres de théorie élémentaire et analytique des nombres”, which he organized for a long time as a weekly seminar in Paris, before they were transformed, after his retirement, into (roughly) monthly meetings, which are still known as the “Journées Daboussi”, and are organized by Régis de la Bretèche. The first of these were two days of a meeting in 2006 in honor of Hédi.

For me, the original Monday seminar organized by Daboussi was especially memorable, both because I gave my first “real” mathematics lecture there (I think that it was about my bachelor project), and also because on another occasion (either the same time or close to that), I first met Philippe Michel in Hédi’s seminar. It is very obvious to me that, without him, my life would have been very different, and I will always remember him because of that.