# Mathematical blogs

### Climbing the cosmic distance ladder (book announcement)

Several years ago, I developed a public lecture on the cosmic distance ladder in astronomy from a historical perspective (and emphasising the role of mathematics in building the ladder). I previously blogged about the lecture here; the most recent version of the slides can be found here. Recently, I have begun working with Tanya Klowden (a long time friend with a background in popular writing on a variety of topics, including astronomy) to expand the lecture into a popular science book, with the tentative format being non-technical chapters interspersed with some more mathematical sections to give some technical details. We are still in the middle of the writing process, but we have produced a sample chapter (which deals with what we call the “fourth rung” of the distance ladder – the distances and orbits of the planets – and how the work of Copernicus, Brahe, Kepler and others led to accurate measurements of these orbits, as well as Kepler’s famous laws of planetary motion). As always, any feedback on the chapter is welcome. (Due to various pandemic-related uncertainties, we do not have a definite target deadline for when the book will be completed, but presumably this will occur sometime in the next year.)
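Kepler's third law, one of the laws discussed in that sample chapter, can be checked directly against modern orbital data. The following quick sketch (the figures are standard approximate values, not taken from the book) verifies that the square of the orbital period divided by the cube of the semi-major axis is essentially constant across the planets:

```python
# Kepler's third law: (orbital period)^2 is proportional to (semi-major axis)^3.
# With periods in years and distances in astronomical units the constant is 1.
planets = {
    # name: (semi-major axis in AU, orbital period in years) -- approximate values
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

for name, (a, T) in planets.items():
    ratio = T ** 2 / a ** 3
    print(f"{name:8s} T^2/a^3 = {ratio:.3f}")
    assert abs(ratio - 1) < 0.02  # the law holds to about 2% with this rounded data
```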

The book is currently under contract with Yale University Press. My coauthor Tanya Klowden can be reached at tklowden@gmail.com.

### The structure of translational tilings in Z^d

Rachel Greenfeld and I have just uploaded to the arXiv our paper “The structure of translational tilings in Z^d“. This paper studies tilings F ⊕ A = Z^d of a finite tile F in the standard lattice Z^d, that is to say sets A (which we call *tiling sets*) such that every element of Z^d lies in exactly one of the translates a + F, a ∈ A, of F. We also consider more general *tilings of level k*, F ⊕_k A = Z^d, for a natural number k, in which every element of Z^d lies in exactly k of the translates (several of our results consider an even more general setting in which the level is periodic but allowed to be non-constant).

In many cases the tiling set A will be periodic (by which we mean translation invariant with respect to some lattice, that is, a finite index subgroup, of Z^d). For instance one simple example of a tiling is when F = {0,1}^2 is the unit square and A = 2Z^2 is the lattice of even vectors. However one can modify some tilings to make them less periodic. For instance, keeping F = {0,1}^2 one also has the tiling set

A = { (2n, 2m + a(n)) : n, m ∈ Z }

where a: Z → Z is an arbitrary function. This tiling set is periodic in the single direction (0,2), but for generic a it is not doubly periodic. For a slightly modified tile (described in the paper), sets built from arbitrary functions in this way can again be verified to be tiling sets, which in general will not exhibit any periodicity whatsoever; however, they are *weakly periodic* in the sense that each is the disjoint union of finitely many sets, each of which is periodic in one direction.
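The flavour of these examples can be checked by brute force on a finite window. The following sketch (the tile and the shift function are illustrative choices, not taken from the paper) verifies that translates of the unit square tile {0,1}^2, placed on even columns with each column pair shifted vertically by an arbitrary amount, cover a window of Z^2 exactly once:

```python
from collections import Counter

# Tile F = {0,1}^2, the unit square in Z^2.
F = [(0, 0), (0, 1), (1, 0), (1, 1)]

def a(n):
    # Arbitrary shift function Z -> Z; any choice gives a tiling.
    return (n ** 3 + 2 * n) % 7

W = 20  # check the window [0, W)^2
cover = Counter()
for n in range(-2, W):          # column pair x in {2n, 2n+1}
    for m in range(-30, 30):    # enough vertical translates to cover the window
        ox, oy = 2 * n, 2 * m + a(n)
        for dx, dy in F:
            x, y = ox + dx, oy + dy
            if 0 <= x < W and 0 <= y < W:
                cover[(x, y)] += 1

# Every point of the window is covered by exactly one translate.
assert len(cover) == W * W and set(cover.values()) == {1}
print("tiling verified on a", W, "x", W, "window")
```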

The most well known conjecture in this area is the Periodic Tiling Conjecture:

**Conjecture 1 (Periodic tiling conjecture)** If a finite tile F ⊂ Z^d has at least one tiling set, then it has a tiling set which is periodic.

This conjecture was stated explicitly by Lagarias and Wang, and also appears implicitly in the text of Grünbaum and Shephard. In one dimension there is a simple pigeonhole principle argument of Newman that shows that all tiling sets are in fact periodic, which certainly implies the periodic tiling conjecture in this case. The case d = 2 was settled more recently by Bhattacharya, but the higher dimensional cases remain open in general.
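Newman's pigeonhole argument is easy to see in action: a tiling of the non-negative integers by a finite tile F (normalized so that min F = 0) is forced step by step, since the least uncovered point must be covered by a uniquely determined translate, and the bounded amount of "state" then forces the tiling set to repeat. A small illustrative sketch (the tile {0,2} is an arbitrary choice):

```python
def greedy_tiling(F, limit):
    # Tile {0, 1, 2, ...} greedily by translates of F (normalized: min(F) = 0).
    # The extension is forced: the least uncovered point x must be the minimum
    # of its covering translate, so that translate must start at x.
    F = sorted(F)
    covered, starts = set(), set()
    for x in range(limit):
        if x not in covered:
            cells = {x + f for f in F}
            assert not (cells & covered), "F does not tile"
            covered |= cells
            starts.add(x)
    return starts

starts = greedy_tiling([0, 2], 300)
ind = [1 if i in starts else 0 for i in range(300)]
# Pigeonhole: only boundedly many local configurations, so the tiling repeats.
period = next(p for p in range(1, 101) if ind[:150] == ind[p:150 + p])
print("tiling set is periodic with period", period)  # period 4 for this tile
```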

We are able to obtain a new proof of Bhattacharya’s result that also gives quantitative bounds on the periodic tiling set, which are polynomial in the diameter of F if the cardinality of the tile F is bounded:

**Theorem 2 (Quantitative periodic tiling in Z^2)** If a finite tile F ⊂ Z^2 has at least one tiling set, then it has a tiling set which is NZ^2-periodic for some N which is polynomial in the diameter of F (for tiles of bounded cardinality).

Among other things, this shows that the problem of deciding whether a given subset of Z^2 of bounded cardinality tiles Z^2 or not is in the NP complexity class with respect to the diameter of the subset. (Even the decidability of this problem was not known until the result of Bhattacharya.)

We also have a closely related structural theorem:

**Theorem 3 (Quantitative weakly periodic tiling in Z^2)** Every tiling set of a finite tile F ⊂ Z^2 is weakly periodic. In fact, the tiling set is the union of a bounded number of disjoint sets, each of which is periodic in a direction whose magnitude is polynomial in the diameter of F (again for tiles of bounded cardinality).

We also have a new bound for the periodicity of tilings in Z:

**Theorem 4 (Universal period for tilings in Z)** Let F ⊂ Z be finite, and normalized so that 0 ∈ F. Then every tiling set of F is qL-periodic, where q is the least common multiple of all primes up to the cardinality of F, and L is the least common multiple of the magnitudes of all the nonzero elements of F.

We remark that the current best bound of this type for determining whether a given finite subset of Z tiles Z or not, due to Biró, is subexponential in the diameter. It may be that the results in this paper can improve upon this bound, at least for tiles of bounded cardinality.

On the other hand, we discovered a genuine difference between level one tiling and higher level tiling, by locating a counterexample to the higher level analogue of (the qualitative version of) Theorem 3:

**Theorem 5 (Counterexample)** There exists an eight-element subset F ⊂ Z^2 and a tiling of Z^2 by F of some level k > 1 whose tiling set is not weakly periodic.

We do not know if there is a corresponding counterexample to the higher level periodic tiling conjecture (that if F tiles Z^d at some level k, then there is a periodic tiling at the same level k). Note that it is important to keep the level fixed, since one trivially always has a periodic tiling at level |F| from the identity F ⊕_{|F|} Z^d = Z^d.

The methods of Bhattacharya used the language of ergodic theory. Our investigations also originally used ergodic-theoretic and Fourier-analytic techniques, but we ultimately found combinatorial methods to be more effective in this problem (and in particular led to quite strong quantitative bounds). The engine powering all of our results is the following remarkable fact, valid in all dimensions:

**Lemma 6 (Dilation lemma)** Suppose that F ⊕ A = Z^d is a tiling of a finite tile F ⊂ Z^d. Then A is also a tiling set of the dilated tile λF, for any λ coprime to n, where n is the least common multiple of all the primes up to the cardinality of F.

Versions of this dilation lemma have previously appeared in work of Tijdeman and of Bhattacharya. We sketch a proof here. By the fundamental theorem of arithmetic and iteration it suffices to establish the case where λ is a prime p coprime to n. We need to show that pF ⊕ A = Z^d, or equivalently that 1_{pF} * 1_A = 1. It suffices to show the claim modulo p, that is to say 1_{pF} * 1_A ≡ 1 (mod p); indeed, the left-hand side takes values in the non-negative integers and has mean value |F| times the density of A, which equals 1, so once it is congruent to 1 (mod p) it is at least 1 everywhere and hence identically 1. The convolution algebra (or group algebra) of finitely supported functions from Z^d to the finite field F_p is a commutative algebra of characteristic p, so we have the Frobenius identity (f + g)^{*p} ≡ f^{*p} + g^{*p} (mod p) for any f, g. As a consequence we see that 1_{pF} ≡ 1_F^{*p} (mod p). The claim now follows by convolving the identity 1_F * 1_A = 1 by p − 1 further copies of 1_F and using Fermat’s little theorem (here we use that p does not divide the cardinality of F).
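The dilation lemma is easy to test numerically in one dimension. In the following sketch (the tile F = {0,2} and its standard tiling set are illustrative choices), dilating by 3, which is coprime to |F| = 2, preserves the tiling property, while dilating by the prime 2 destroys it:

```python
from collections import Counter

def is_tiling(F, A_indicator, lo, hi, pad):
    # Count how many times each point of [lo, hi) is covered by translates
    # a + F with a drawn from a padded window; a tiling covers each point once.
    c = Counter()
    for a in range(lo - pad, hi + pad):
        if A_indicator(a):
            for f in F:
                if lo <= a + f < hi:
                    c[a + f] += 1
    return all(c[x] == 1 for x in range(lo, hi))

A = lambda a: a % 4 in (0, 1)        # a tiling set for F = {0, 2}
assert is_tiling([0, 2], A, 0, 100, 10)
# Dilation by 3 (coprime to |F| = 2): A still tiles with 3*F = {0, 6}.
assert is_tiling([0, 6], A, 0, 100, 10)
# Dilation by 2 (dividing |F|) can fail: A does not tile with 2*F = {0, 4}.
assert not is_tiling([0, 4], A, 0, 100, 10)
print("dilation lemma verified for this example")
```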

In our paper we actually establish a more general version of the dilation lemma that can handle tilings of higher level or of a periodic set, and this stronger version is useful to get the best quantitative results, but for simplicity we focus attention just on the above simple special case of the dilation lemma.

By averaging over λ in an arithmetic progression, one already gets a useful structural theorem for tilings in any dimension, which appears to be new despite being an easy consequence of Lemma 6:

**Corollary 7 (Structure theorem for tilings)** Suppose that F ⊕ A = Z^d is a tiling of a finite tile F ⊂ Z^d, where we normalize 0 ∈ F. Then we have a decomposition

1_A = 1 − Σ_{f ∈ F∖{0}} φ_f   (1)

where each φ_f: Z^d → [0,1] is a function that is periodic in the direction nf, where n is the least common multiple of all the primes up to the cardinality of F.

*Proof:* From Lemma 6 we have Σ_{f ∈ F} 1_A(x − λf) = 1 for any λ coprime to n; isolating the f = 0 term (whose contribution is δ_0 * 1_A = 1_A, where δ_0 is the Kronecker delta at 0) gives 1_A(x) = 1 − Σ_{f ∈ F∖{0}} 1_A(x − λf). Now average over λ ranging in an arithmetic progression of spacing n (extracting a weak limit or generalised limit as necessary) to obtain the conclusion.

The identity (1) turns out to impose a lot of constraints on the functions φ_f, particularly in one and two dimensions. On one hand, one can work modulo 1 to eliminate the 1 and 1_A terms (which are integer-valued) to obtain the equation

Σ_{f ∈ F∖{0}} φ_f ≡ 0 (mod 1)

which in two dimensions in particular puts a lot of structure on each individual φ_f (roughly speaking, it makes the φ_f behave in a polynomial fashion, after collecting commensurable terms). On the other hand we have the inequality

0 ≤ Σ_{f ∈ F∖{0}} φ_f ≤ 1   (2)

which can be used to exclude “equidistributed” polynomial behavior after a certain amount of combinatorial analysis. Only a small amount of further argument is then needed to conclude Theorem 3 and Theorem 2.

For level k tilings the analogue of (2) becomes

k − 1 ≤ Σ_{f ∈ F∖{0}} φ_f ≤ k

which is a significantly weaker inequality and now no longer seems to prohibit “equidistributed” behavior. After some trial and error we were able to come up with a completely explicit example of a tiling that actually utilises equidistributed polynomials; indeed the tiling set we ended up with was a finite boolean combination of Bohr sets.

We are currently studying what this machinery can tell us about tilings in higher dimensions, focusing initially on the three-dimensional case.

### Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration

Asgar Jamneshan and I have just uploaded to the arXiv our paper “Foundational aspects of uncountable measure theory: Gelfand duality, Riesz representation, canonical models, and canonical disintegration“. This paper arose from our longer-term project to systematically develop “uncountable” ergodic theory – ergodic theory in which the groups acting are not required to be countable, the probability spaces one acts on are not required to be standard Borel (or Polish), and the compact groups that arise in the structural theory (e.g., the theory of group extensions) are not required to be separable. One of the motivations of doing this is to allow ergodic theory results to be applied to ultraproducts of finite dynamical systems, which can then hopefully be transferred to establish combinatorial results with good uniformity properties. An instance of this is the uncountable Mackey-Zimmer theorem, discussed in this companion blog post.

In the course of this project, we ran into the obstacle that many foundational results, such as the Riesz representation theorem, often require one or more of these countability hypotheses when encountered in textbooks. Other technical issues also arise in the uncountable setting, such as the need to distinguish the Borel σ-algebra from the (two different types of) Baire σ-algebra. As such we needed to spend some time reviewing and synthesizing the known literature on some foundational results of “uncountable” measure theory, which led to this paper. Consequently, most of the results of this paper are already in the literature, either explicitly or implicitly, in one form or another (with perhaps the exception of the canonical disintegration, which we discuss below); we view the main contribution of this paper as presenting the results in a coherent and unified fashion. In particular we found that the language of category theory was invaluable in clarifying and organizing all the different results. In subsequent work we (and some other authors) will use the results in this paper for various applications in uncountable ergodic theory.

The foundational results covered in this paper can be divided into a number of subtopics (Gelfand duality, Baire σ-algebras and Riesz representation, canonical models, and canonical disintegration), which we discuss further below the fold.

** — 1. Gelfand duality — **

Given a compact Hausdorff space X, one can form the space C(X) of continuous functions from X to C; such functions are necessarily bounded (and automatically compactly supported), and they form a commutative unital C*-algebra. Conversely, given a commutative unital C*-algebra A, one can produce the *Gelfand spectrum* Spec(A), which we define here as the collection of unital *-homomorphisms from A to C. This spectrum can be viewed as a subset of C^A and inherits a topology from the product topology on that latter space that turns the spectrum into a compact Hausdorff space. The classical Gelfand duality between compact Hausdorff spaces and commutative unital C*-algebras asserts that these two operations “essentially invert” each other: X is homeomorphic to Spec(C(X)), and A is isomorphic as a commutative unital C*-algebra to C(Spec(A)). In fact there is a more precise statement: the operations X ↦ C(X), A ↦ Spec(A) are in fact contravariant functors that form a duality of categories between the category of compact Hausdorff spaces and the category of commutative unital C*-algebras. This duality of categories asserts, roughly speaking, that the operations C, Spec and the Gelfand duality isomorphisms interact in the “natural” fashion with respect to morphisms: for instance, any continuous map f: X → Y between compact Hausdorff spaces induces a commutative unital *-homomorphism C(Y) → C(X) defined by the pullback map g ↦ g ∘ f, and similarly any commutative unital *-homomorphism Φ: A → B induces a continuous map Spec(B) → Spec(A) defined by λ ↦ λ ∘ Φ, and the homeomorphisms X ≅ Spec(C(X)) and Y ≅ Spec(C(Y)) commute with the continuous maps f and Spec(C(f)) in the “natural” fashion. These sorts of properties are routine to verify and mostly consist of expanding out all the definitions.
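A finite toy model may help orient the reader (this is only an illustration of the duality in the trivial finite discrete case, not anything specific to the paper): for a finite discrete space, C(X) is just the algebra of tuples with pointwise operations, the characters are exactly the evaluations at points, and pullback reverses composition:

```python
# Finite toy model of Gelfand duality: for a finite discrete space X, the
# algebra C(X) consists of arbitrary functions X -> C with pointwise
# operations, and the characters (unital homomorphisms C(X) -> C) are exactly
# the evaluations at points of X, recovering X from C(X).
X = [0, 1, 2]

def characters(space):
    # each point x gives the evaluation character f |-> f(x)
    return [lambda f, x=x: f(x) for x in space]

# A map g: X -> Y induces the pullback g*: C(Y) -> C(X), (g*f)(x) = f(g(x)).
Y = [0, 1]
g = {0: 0, 1: 1, 2: 1}
pullback = lambda f: (lambda x: f(g[x]))

f = lambda y: 10 + y            # an element of C(Y)
assert [pullback(f)(x) for x in X] == [10, 11, 11]

# Characters are multiplicative (here: on products of two functions).
for chi in characters(X):
    assert chi(lambda x: (x + 1) * (x + 2)) == chi(lambda x: x + 1) * chi(lambda x: x + 2)
print("finite Gelfand duality sanity checks passed")
```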

It is natural to ask what the analogous Gelfand duality is for *locally* compact Hausdorff spaces. Somewhat confusingly, there appeared to be two such Gelfand dualities in the literature, which appeared at first glance to be incompatible with each other. Eventually we understood that there were *two* natural categories of locally compact Hausdorff spaces, and dually there were *two* natural categories of (non-unital) commutative C*-algebras dual to them:

- The category LCH of locally compact Hausdorff spaces, whose morphisms consist of arbitrary continuous maps between those spaces.
- The subcategory LCH_p of locally compact Hausdorff spaces, whose morphisms consist of continuous proper maps between those spaces.
- The category of commutative C*-algebras (not necessarily unital), whose morphisms were *-homomorphisms Φ: A → B which were non-degenerate in the sense that Φ(A)B is dense in B (this was automatic in the unital case, but now needs to be imposed as an additional hypothesis).
- The larger category of commutative C*-algebras (not necessarily unital), whose morphisms were non-degenerate *-homomorphisms Φ: A → M(B) into the multiplier algebra M(B) of B. (It is not immediately obvious that one can compose two such morphisms together, but it can be shown that every such homomorphism has a unique extension to a homomorphism from M(A) to M(B), which can then be used to create a composition law.)

The map C_0 that takes a locally compact space X to the space C_0(X) of continuous functions that vanish at infinity, together with the previously mentioned Gelfand spectrum map Spec, then forms a duality of categories between LCH and the multiplier-algebra category, and between LCH_p and the non-degenerate-homomorphism category. Furthermore, these dualities of categories interact well with the two standard compactifications of locally compact spaces: the Stone-Cech compactification X ↦ βX, and the Alexandroff compactification X ↦ X ∪ {∞} (also known as the one point compactification). From a category theoretic perspective, it is most natural to interpret β as a functor from LCH to the category of compact Hausdorff spaces, and to interpret the Alexandroff compactification as a functor from LCH_p to the category of pointed compact Hausdorff spaces (the notation for pointed spaces can be viewed as a special case of comma category notation). (Note in particular that a continuous map between locally compact Hausdorff spaces needs to be proper in order to guarantee a continuous extension to the Alexandroff compactification, whereas no such properness condition is needed to obtain a continuous extension to the Stone-Cech compactification.) In our paper, we summarized relationships between these functors (and some other related functors) in the following diagram, which commutes up to natural isomorphisms:

Thus for instance the space C_b(X) of bounded continuous functions on a locally compact Hausdorff space X is naturally isomorphic to the multiplier algebra of C_0(X), and the Stone-Cech compactification βX is naturally identified with the Gelfand spectrum of C_b(X).

The coloring conventions in this paper are that (a) categories of “algebras” are displayed in red (and tend to be dual to categories of “spaces”, displayed in black); and (b) functors displayed in blue are considered “casting functors” (analogous to type casting operators in various computing languages), and are used as the default way to convert an object or morphism in one category to an object or morphism in another. For instance, if X is a compact Hausdorff space, the associated unital commutative C*-algebra is defined to be C(X) by the casting convention.

Almost every component of the above diagram was already stated somewhere in the literature; our main contribution here is the synthesis of all of these components together into a single unified diagram.

** — 2. Baire σ-algebras and Riesz duality — **

Now we add measure theory to Gelfand duality, by introducing σ-algebras and probability measures on the (locally) compact Hausdorff side of the duality, and by introducing traces on the C*-algebra side. Here we run into the issue that there are *three* natural choices of σ-algebra one can assign to a topological space X:

- The Borel σ-algebra, generated by the open (or closed) subsets of X.
- The C_b-Baire σ-algebra, generated by the bounded continuous functions on X (equivalently, one can use arbitrary continuous functions on X; these are also precisely the sets whose indicator functions are Baire functions of some countable ordinal class).
- The C_0-Baire σ-algebra, generated by the compactly supported continuous functions on X (equivalently, one can use continuous functions on X going to zero at infinity; this is also precisely the σ-algebra generated by the compact G_δ sets).

For compact Hausdorff spaces, the two types of Baire σ-algebras agree, but they can diverge for locally compact Hausdorff spaces. Similarly, the Borel and Baire σ-algebras agree for compact metric spaces, but can diverge for more “uncountable” compact Hausdorff spaces. This is most dramatically exemplified by the Nedoma pathology, in which the Borel σ-algebra of the Cartesian square of a locally compact Hausdorff space need not be equal to the product of the individual Borel σ-algebras, in contrast to the Baire σ-algebra, which interacts well with the product space construction (even when there are uncountably many factors). In particular, the group operations on a locally compact Hausdorff group can fail to be Borel measurable, even though they are always Baire measurable. For these reasons we found it desirable to adopt a “Baire-centric” point of view, in which one prefers to use the Baire σ-algebras over the Borel σ-algebras in compact Hausdorff or locally compact Hausdorff settings. (However, in Polish spaces it seems Borel σ-algebras remain the more natural σ-algebra to use.) It turns out that the two Baire σ-algebras can be divided up naturally between the two categories of locally compact Hausdorff spaces, with the C_b-Baire σ-algebra naturally associated to the category with arbitrary continuous maps and the C_0-Baire σ-algebra naturally associated to the category with proper maps. The situation can be summarized by the following commuting diagram of functors between the various categories of (locally) compact Hausdorff spaces and the category of concrete measurable spaces (sets equipped with a σ-algebra):

To each category of (locally) compact Hausdorff spaces, we can then define an associated category of (locally) compact Hausdorff spaces equipped with a Radon probability measure, where the underlying σ-algebra is as described above, and “Radon” means inner regular with respect to compact (Baire) sets. For instance, in the compact Hausdorff case one obtains the category of compact Hausdorff spaces equipped with a Baire probability measure μ with the inner regularity property

μ(E) = sup { μ(K) : K ⊆ E, K compact } for all Baire measurable E,

and one similarly defines the locally compact analogues (with either choice of Baire σ-algebra). (In the compact Hausdorff case the inner regularity property is in fact automatic, but in the locally compact Hausdorff categories it needs to be imposed as an explicit hypothesis.) On the C*-algebra side, the categories of commutative C*-algebras can be augmented to tracial categories where the algebra A is now equipped with a trace τ: A → C, that is to say a non-negative linear functional of operator norm one. It then turns out that the Gelfand spectrum functors from C*-algebras to (locally) compact Hausdorff spaces can be augmented to “Riesz functors” from tracial C*-algebras to (locally) compact Hausdorff spaces with Radon probability measures, where the probability measure in question is given by some form of the Riesz representation theorem; dually, the functors C, C_0 can similarly be augmented using the Lebesgue integral with respect to the given measure as the trace. This leads one to a complete analogue of the previous diagram of functors, but now in tracial and probabilistic categories, giving a new set of Gelfand-like dualities that we call “Riesz duality”:

Again, each component of this diagram was essentially already in the literature in either explicit or implicit form. In the paper we also review the Riesz representation theory for traces on C_b(X) rather than C_0(X), in which an additional “τ-smoothness” property is needed in order to recover a Radon probability measure on X. The distinction between Baire and Borel σ-algebras ends up being largely elided in the Riesz representation theory, as it turns out that every Baire-Radon probability measure (using either of the two Baire σ-algebras) on a locally compact Hausdorff space has a unique extension to a Borel-Radon probability measure (where for Borel measures, the Radon property is now interpreted as inner regularity with respect to all compact sets, rather than just the compact Baire sets).

** — 3. The canonical model of opposite probability algebras — **

Given a concrete probability space – a set X equipped with a σ-algebra and a probability measure μ – one can form the associated *probability algebra* by quotienting the σ-algebra by the null ideal and then descending the measure to this quotient to obtain an abstract countably additive probability measure. More generally, one can consider abstract probability algebras (B, μ) where B is a σ-complete Boolean algebra and μ is a countably additive probability measure on B with the property that b = 0 whenever μ(b) = 0. This gives a category of probability algebras. Actually, to align things closer to the category of concrete probability spaces, it is better to work with the opposite category, whose objects we call *opposite probability algebras*, where the directions of the morphisms are reversed.

Many problems and concepts in ergodic theory are best phrased “up to null sets”, which can be interpreted category-theoretically by applying a suitable functor from a concrete category (such as the category of compact Hausdorff probability spaces) to the opposite probability algebra category to remove all the null sets. However, it is sometimes convenient to reverse this process and *model* an opposite probability algebra X by a more concrete probability space, and to also model morphisms between opposite probability algebras (σ-complete Boolean homomorphisms that preserve measure) by concrete maps. Ideally the model spaces should also be compact Hausdorff spaces, and the model morphisms continuous maps, so that methods from topological dynamics may be applied. There are various *ad hoc* ways to create such models in “countable” settings, but it turns out that there is a canonical and “universal” model Conc(X) of any opposite probability algebra X that one can construct in a completely functorial fashion (so that any dynamics on the opposite probability algebra X automatically carry over to concrete dynamics on Conc(X)). The quickest way to define this model is to take the Gelfand spectrum of the commutative unital C*-algebra L^∞(X), equipped with the Radon probability measure supplied by the Riesz representation theorem applied to the integration trace. The canonical model has the following properties:

- (Topological structure) Conc(X) is a compact Hausdorff space. In fact it is a Stone space with the additional property that every Baire-measurable set is equal to a clopen set modulo a Baire-meager set (spaces with this property are given a special name in our paper). Furthermore a Baire set is null if and only if it is meager. Any opposite probability algebra morphism gives rise (in a functorial fashion) to a surjective continuous map between the canonical models, with the additional property that the inverse images of Baire-meager sets are Baire-meager. Furthermore, one has the (somewhat surprising) “strong Lusin property”: every element of L^∞(X) has a unique representative in C(Conc(X)) (thus, every bounded measurable function on the canonical model is equal almost everywhere to a unique continuous function).
- (Concrete model) The opposite probability algebra of Conc(X) is (naturally) isomorphic to that of X.
- (Universality) There is a natural inclusion (in the category of abstract probability spaces) of X into Conc(X), which is universal amongst all inclusions of X into compact Hausdorff probability spaces.
- (Canonical extension) Every abstractly measurable map from X to a compact Hausdorff space K has a unique realization as a continuous map from Conc(X) to K.

This canonical model seems quite analogous to the Stone-Cech compactification X ↦ βX. For instance the analogue of the canonical extension property for β is that every continuous map from a locally compact space X to a compact Hausdorff space K has a unique extension to a continuous map from βX to K. In both cases the model produced is usually too “large” to be separable, so this is a tool that is only available in “uncountable” frameworks.

There is an alternate description of the canonical model, which is basically due to Fremlin, and is based on Stone-type dualities rather than Riesz-type dualities. Namely, one starts with the σ-complete Boolean algebra B attached to the opposite probability algebra X, and constructs its Stone dual. Every Baire set in this Stone space is equal modulo Baire-meager sets to a clopen set, which by Stone duality is identifiable with an element of the probability algebra. The measure on B then induces a measure on the Stone space, and the resulting compact Hausdorff probability space can serve as the canonical model Conc(X). The functoriality of this construction is closely tied to the functoriality of the Loomis-Sikorski theorem (discussed in this previous blog post). The precise connection in terms of functors and categories is a little complicated to describe, though, as the following diagram indicates:
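The Stone duality underlying this construction can be illustrated in the trivial finite case (a toy sketch only, far from the uncountable situation of interest): a finite Boolean algebra is isomorphic to the powerset of its atoms, with the Stone space being the discrete set of atoms:

```python
from itertools import chain, combinations

# Toy Stone duality: a finite Boolean algebra is the powerset of its atoms,
# and its Stone space is the (discrete) set of atoms.  Here the algebra is
# the powerset of S, the atoms are the singletons, and the map
# b |-> {atoms below b} is a Boolean isomorphism.
S = frozenset({0, 1, 2})
algebra = [frozenset(c)
           for c in chain.from_iterable(combinations(sorted(S), r)
                                        for r in range(len(S) + 1))]
atoms = [frozenset({x}) for x in S]

def dual(b):
    return frozenset(a for a in atoms if a <= b)

for b in algebra:
    for c in algebra:
        assert dual(b & c) == dual(b) & dual(c)          # meets are preserved
        assert dual(b | c) == dual(b) | dual(c)          # joins are preserved
    assert dual(S - b) == frozenset(atoms) - dual(b)     # complements too
print("Stone duality verified for the", len(algebra), "element algebra")
```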

One quick use of the canonical model is that it allows one to perform a variety of constructions on opposite probability algebras by passing to the canonical model and performing the analogous operation there. For instance, if one wants to construct a product of some number of opposite probability algebras, one can first take the product of the concrete models (as a compact Hausdorff probability space), then extract the opposite probability algebra of that space. We will similarly use this model to construct group skew-products and homogeneous skew-products in a later paper.

** — 4. Canonical disintegration — **

Given a probability space X and a probability-preserving factor map π from X to another probability space Y, it is often convenient to look for a *disintegration* of the measure on X into fibre measures (μ_y)_{y ∈ Y}, with the property that the conditional expectation E(f|Y) of a function f ∈ L^∞(X) (defined as the orthogonal projection in L^2(X) to L^2(Y), viewed as a subspace of L^2(X)) is given pointwise almost everywhere by

E(f|Y)(y) = ∫_X f dμ_y.
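In the finite toy setting the disintegration is just the collection of normalized fibre measures, and the defining identity can be checked directly (the space and factor map below are arbitrary illustrative choices):

```python
from collections import defaultdict
from fractions import Fraction

# Finite sanity check of disintegration: X = {0,...,5} with uniform measure,
# factor map pi(x) = x % 2.  The fibre measures are uniform on each fibre,
# and conditional expectation is the fibre average.
X = range(6)
mu = {x: Fraction(1, 6) for x in X}
pi = lambda x: x % 2

fibres = defaultdict(list)
for x in X:
    fibres[pi(x)].append(x)

def cond_exp(f, y):
    # E(f|Y)(y) = integral of f against the fibre measure mu_y
    fib = fibres[y]
    return sum(f(x) for x in fib) / Fraction(len(fib))

f = lambda x: x * x
# Integrating E(f|Y) over the base recovers the integral of f.
lhs = sum(f(x) * mu[x] for x in X)
nu = {y: sum(mu[x] for x in fibres[y]) for y in fibres}
rhs = sum(cond_exp(f, y) * nu[y] for y in nu)
assert lhs == rhs
print("disintegration identity verified:", lhs)
```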

Among other things, the canonical disintegration makes it easy to construct relative products of opposite probability algebras, and we believe it will also be of use in developing uncountable versions of Host-Kra structure theory.

### An uncountable Mackey-Zimmer theorem

Asgar Jamneshan and I have just uploaded to the arXiv our paper “An uncountable Mackey-Zimmer theorem“. This paper is part of our longer term project to develop “uncountable” versions of various theorems in ergodic theory; see this previous paper of Asgar and myself for the first paper in this series (and another paper will appear shortly).

In this case the theorem in question is the Mackey-Zimmer theorem, previously discussed in this blog post. This theorem gives an important classification of group and homogeneous extensions of measure-preserving systems. Let us first work in the (classical) setting of concrete measure-preserving systems. Let X be a measure-preserving system for some group Γ, thus X is a (concrete) probability space and γ ↦ T^γ is a group homomorphism from Γ to the automorphism group of the probability space. (Here we are abusing notation by using X to refer both to the measure-preserving system and to the underlying set. In the notation of the paper we would instead distinguish these two objects, reflecting two of the (many) categories one might wish to view X as a member of, but for the sake of this informal overview we will not maintain such precise distinctions.) If K is a compact group, we define a *(concrete) cocycle* to be a collection of measurable functions ρ_γ: X → K for γ ∈ Γ that obey the *cocycle equation*

ρ_{γ₁γ₂}(x) = ρ_{γ₁}(T^{γ₂} x) ρ_{γ₂}(x)   (1)

for all γ₁, γ₂ ∈ Γ and almost every x ∈ X.

Each such cocycle ρ generates a *group skew-product* X ⋊_ρ K of X, which is another measure-preserving system where

- the underlying set is the Cartesian product of X and K;
- the measure is the product measure of the measure on X and the Haar probability measure on K; and
- the action is given by the formula

T^γ(x, k) := (T^γ x, ρ_γ(x) k).   (2)

There is also a more general notion of a *homogeneous skew-product* in which the group K is replaced by the homogeneous space K/L for some closed subgroup L of K, noting that K/L still comes with a left-action of K and a Haar measure. Group skew-products are very “explicit” ways to extend a system X, as everything is described by the cocycle ρ, which is a relatively tractable object to manipulate. (This is not to say that the cohomology of measure-preserving systems is trivial, but at least there are many tools one can use to study it, such as the Moore-Schmidt theorem discussed in this previous post.)
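As a concrete toy example of these definitions (the rotation, fiber group and generator below are arbitrary choices, written additively since the fiber group here is abelian), one can verify the cocycle equation for the cocycle over a Z-action generated by a single function:

```python
from fractions import Fraction

# Toy skew product over the rotation x -> x + alpha on the circle R/Z, with
# abelian fiber group K = R/Z and the cocycle generated by rho_1(x) = x.
# We verify the cocycle equation rho_{n+m}(x) = rho_n(T^m x) + rho_m(x)
# for the iterated cocycle rho_n(x) = x + Tx + ... + T^{n-1}x.
alpha = Fraction(3, 7)

def T(x, m=1):
    # the rotation, iterated m times
    return (x + m * alpha) % 1

def rho(n, x):
    # cocycle over the Z-action generated by rho_1(x) = x
    return sum((T(x, j) for j in range(n)), Fraction(0)) % 1

x = Fraction(1, 5)
for n in range(4):
    for m in range(4):
        assert rho(n + m, x) == (rho(n, T(x, m)) + rho(m, x)) % 1
print("cocycle equation verified")
```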

This group skew-product X ⋊_ρ K comes with a factor map π: X ⋊_ρ K → X and a coordinate map θ: X ⋊_ρ K → K, which by (2) are related to the action via the identities

π ∘ T^γ = T^γ ∘ π   (3)

and

θ ∘ T^γ = ρ_γ(π(·)) θ   (4)

where in (4) we are implicitly working in the group of (concretely) measurable functions from X ⋊_ρ K to K. Furthermore, the combined map (π, θ) is measure-preserving (using the product measure on X × K); indeed, the way we have constructed things, this map is just the identity map.
We can now generalize the notion of group skew-product by just working with the maps π, θ, and weakening the requirement that (π, θ) be measure-preserving. Namely, define a *group extension* of X by K to be a measure-preserving system Y equipped with a measure-preserving map π: Y → X obeying (3) and a measurable map θ: Y → K obeying (4) for some cocycle ρ, such that the σ-algebra of Y is generated by π, θ. There is also a more general notion of a *homogeneous extension* in which θ takes values in K/L rather than K. Then every group skew-product X ⋊_ρ K is a group extension of X by K, but not conversely. Here are some key counterexamples:

- (i) If L is a closed subgroup of K, and ρ is a cocycle taking values in L, then X ⋊_ρ L can be viewed as a group extension of X by K, taking θ to be the vertical coordinate (viewing it now as an element of K). This will not be a skew-product by K, because θ pushes forward to the wrong measure on K: it pushes forward to the Haar measure of L rather than of K.
- (ii) If one takes the same example as (i), but twists the vertical coordinate to another vertical coordinate g(π(·))θ for some measurable “gauge function” g: X → K, then X ⋊_ρ L is still a group extension by K, but now with the cocycle ρ replaced by a cohomologous cocycle. Again, this will not be a skew product by K, because θ now pushes forward to a twisted version of the Haar measure of L that is supported (at least in the case where everything in sight is continuous) on a bundle of cosets of L.
- (iii) With the situation as in (i), take Y to be the disjoint union of X ⋊_ρ L with a translate of it by some group element outside of L, where we continue to use the action (2) and the standard vertical coordinate, but now use the average of the measures on the two components.

As it turns out, group extensions and homogeneous extensions arise naturally in the Furstenberg-Zimmer structural theory of measure-preserving systems; roughly speaking, every compact extension of X is an inverse limit of group extensions. It is then of interest to classify such extensions.

Examples such as (iii) are annoying, but they can be excluded by imposing the additional condition that the system Y is ergodic – all invariant (or essentially invariant) sets are of measure zero or measure one. (An essentially invariant set is a measurable subset E of Y such that T^γ E is equal modulo null sets to E for all γ ∈ Γ.) For instance, the system in (iii) is non-ergodic, because the first component of the disjoint union is invariant but has measure 1/2. We then have the following fundamental result of Mackey and Zimmer:

**Theorem 1 (Countable Mackey-Zimmer theorem)** Let Γ be a group, X be a concrete measure-preserving system, and K be a compact Hausdorff group. Assume that Γ is at most countable, X is a standard Borel space, and K is metrizable. Then every (concrete) ergodic group extension of X is abstractly isomorphic to a group skew-product (by some closed subgroup L of K), and every (concrete) ergodic homogeneous extension of X is similarly abstractly isomorphic to a homogeneous skew-product.

We will not define precisely what “abstractly isomorphic” means here, but it roughly speaking means “isomorphic after quotienting out the null sets”. A proof of this theorem can be found for instance in .

The main result of this paper is to remove the “countability” hypotheses from the above theorem, at the cost of working with opposite probability algebra systems rather than concrete systems. (We will discuss opposite probability algebras in a subsequent blog post relating to another paper in this series.)

**Theorem 2 (Uncountable Mackey-Zimmer theorem)** Let be a group, be an opposite probability algebra measure-preserving system, and be a compact Hausdorff group. Then every (abstract) ergodic group extension of is abstractly isomorphic to a group skew-product (by some closed subgroup of ), and every (abstract) ergodic homogeneous extension of is similarly abstractly isomorphic to a homogeneous skew-product.

We plan to use this result in future work to obtain uncountable versions of the Furstenberg-Zimmer and Host-Kra structure theorems.

As one might expect, one locates a proof of Theorem 2 by finding a proof of Theorem 1 that does not rely too strongly on “countable” tools, such as disintegration or measurable selection, so that all of those tools can be replaced by “uncountable” counterparts. The proof we use is based on the one given in this previous post, and begins by comparing the system with the group extension . As the examples (i), (ii) show, these two systems need not be isomorphic even in the ergodic case, due to the different probability measures employed. However one can relate the two after performing an additional averaging in . More precisely, there is a canonical factor map given by the formula

This is a factor map not only of -systems, but actually of -systems, where the opposite group to acts (on the left) by right-multiplication of the second coordinate (this reversal of order is why we need to use the opposite group here). The key point is that the ergodicity properties of the system are closely tied to the group that is “secretly” controlling the group extension. Indeed, in example (i), the invariant functions on take the form for some measurable , while in example (ii), the invariant functions on take the form . In either case, the invariant factor is isomorphic to , and can be viewed as a factor of the invariant factor of , which is isomorphic to . Pursuing this reasoning (using an abstract ergodic theorem of Alaoglu and Birkhoff, as discussed in the previous post) one obtains the *Mackey range*, and also obtains the quotient of to in this process.

The main remaining task is to lift the quotient back up to a map that stays measurable, in order to “untwist” a system that looks like (ii) to make it into one that looks like (i). In countable settings this is where a “measurable selection theorem” would ordinarily be invoked, but in the uncountable setting such theorems are not available for concrete maps. However it turns out that they still remain available for abstract maps: any abstractly measurable map from to has an abstractly measurable lift from to . To prove this we first use a canonical model for opposite probability algebras (which we will discuss in a companion post to this one, to appear shortly) to work with continuous maps (on a Stone space) rather than abstractly measurable maps. The measurable map then induces a probability measure on , formed by pushing forward by the graphing map . This measure in turn has several lifts up to a probability measure on ; for instance, one can construct such a measure via the Riesz representation theorem by demanding for all continuous functions . This measure does not come from a graph of any single lift , but is in some sense an “average” of the entire ensemble of these lifts. But it turns out one can invoke the Krein-Milman theorem to pass to an extremal lifting measure which *does* come from an (abstract) lift , and this can be used as a substitute for a measurable selection theorem. A variant of this Krein-Milman argument can also be used to express any homogeneous extension as a quotient of a group extension, giving the second part of the Mackey-Zimmer theorem.

### Exploring the toolkit of Jean Bourgain

I have uploaded to the arXiv my paper “Exploring the toolkit of Jean Bourgain“. This is one of a collection of papers to be published in the Bulletin of the American Mathematical Society describing aspects of the work of Jean Bourgain; other contributors to this collection include Keith Ball, Ciprian Demeter, and Carlos Kenig. Because the other contributors will be covering specific areas of Jean’s work in some detail, I decided to take a non-overlapping tack, and focus instead on some basic tools of Jean that he frequently used across many of the fields he contributed to. Jean had a surprising number of these “basic tools” that he wielded with great dexterity, and in this paper I focus on just a few of them:

- Reducing qualitative analysis results (e.g., convergence theorems or dimension bounds) to quantitative analysis estimates (e.g., variational inequalities or maximal function estimates).
- Using dyadic pigeonholing to locate good scales to work in or to apply truncations.
- Using random translations to amplify small sets (low density) into large sets (positive density).
- Combining large deviation inequalities with metric entropy bounds to control suprema of various random processes.
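As an entirely standard toy version of the dyadic pigeonholing technique (a minimal sketch with hypothetical harmonic weights, not drawn from any particular argument of Bourgain's), one splits a sum over indices into dyadic blocks and selects the block carrying the largest share of the total; since there are only logarithmically many blocks, that share is at least the total divided by the number of blocks:

```python
def dyadic_pigeonhole(weights):
    """Split the positive weights w_1,...,w_N into dyadic blocks of
    indices [2^j, 2^{j+1}) and return the block carrying the largest
    share of the total sum.  With only O(log N) blocks, that share is
    at least total / O(log N) by the pigeonhole principle."""
    n = len(weights)
    num_blocks = n.bit_length()  # indices 1..n fall into this many dyadic ranges
    blocks = [[] for _ in range(num_blocks)]
    for i, w in enumerate(weights, start=1):
        blocks[i.bit_length() - 1].append(w)
    best = max(blocks, key=sum)
    return best, sum(best)

weights = [1.0 / k for k in range(1, 1025)]  # harmonic weights; total ~ log(1024)
block, share = dyadic_pigeonhole(weights)
total = sum(weights)
# with 11 dyadic blocks, the chosen block carries at least total/11 of the sum
assert share >= total / 11
```

In applications one then restricts attention to the selected dyadic scale, conceding only a logarithmic factor.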

Each of these techniques is individually not too difficult to explain, and was certainly employed on occasion by various mathematicians prior to Bourgain’s work; but Jean had internalized them to the point where he would instinctively use them as soon as they became relevant to a given problem at hand. I illustrate this at the end of the paper with an exposition of one particular result of Jean, on the Erdős similarity problem, in which his main result (that any sum of three infinite sets of reals has the property that there exists a positive measure set that does not contain any homothetic copy of ) is basically proven by a sequential application of these tools (except for dyadic pigeonholing, which turns out not to be needed here).

I had initially intended to also cover some other basic tools in Jean’s toolkit, such as the uncertainty principle and the use of probabilistic decoupling, but was having trouble keeping the paper coherent with such a broad focus (certainly I could not identify a single paper of Jean’s that employed all of these tools at once). I hope though that the examples given in the paper give some reasonable impression of Jean’s research style.

### Course announcement: Math 246A, complex analysis

Starting on Oct 2, I will be teaching Math 246A, the first course in the three-quarter graduate complex analysis sequence at the math department here at UCLA. This first course covers much of the same ground as an honours undergraduate complex analysis course, in particular focusing on the basic properties of holomorphic functions such as the Cauchy and residue theorems, the classification of singularities, and the maximum principle, but there will be more of an emphasis on rigour, generalisation and abstraction, and connections with other parts of mathematics. The main text I will be using for this course is Stein-Shakarchi (with Ahlfors as a secondary text), but I will also be using the blog lecture notes I wrote the last time I taught this course in 2016. At this time I do not expect to significantly deviate from my past lecture notes, though I do not know at present how different the pace will be this quarter when the course is taught remotely. As with my 247B course last spring, the lectures will be open to the public, though other coursework components will be restricted to enrolled students.

### Vaughan Jones

Vaughan Jones, who made fundamental contributions in operator algebras and knot theory (in particular developing a surprising connection between the two), died this week, aged 67.

Vaughan and I grew up in extremely culturally similar countries, worked in adjacent areas of mathematics, shared (as of this week) a coauthor in Dima Shlyakhtenko, started our careers with the same postdoc position (as UCLA Hedrick Assistant Professors, sixteen years apart) and even ended up in sister campuses of the University of California, but surprisingly we only interacted occasionally, via chance meetings at conferences or emails on some committee business. I found him extremely easy to get along with when we did meet, though, perhaps because of our similar cultural upbringing.

I have not had much occasion to directly use much of Vaughan’s mathematical contributions, but I did very much enjoy reading his influential 1999 preprint on planar algebras (which, for some odd reason, has never been formally published). Traditional algebra notation is one-dimensional in nature, with algebraic expressions being described by strings of mathematical symbols; a linear operator , for instance, might appear in the middle of such a string, taking in an input on the right and returning an output on its left that might then be fed into some other operation. There are a few mathematical notations which are two-dimensional, such as the commutative diagrams in homological algebra, the tree expansions of solutions to nonlinear PDE (particularly stochastic nonlinear PDE), or the Feynman diagrams and Penrose graphical notations from physics, but these are the exception rather than the rule, and the notation is often still concentrated on a one-dimensional complex of vertices and edges (or arrows) in the plane. Planar algebras, by contrast, fully exploit the topological nature of the plane; a planar “operator” (or “operad”) inhabits some punctured region of the plane, such as an annulus, with “inputs” entering from the inner boundaries of the region and “outputs” emerging from the outer boundary. These algebras arose for Vaughan in both operator theory and knot theory, and have since been used in some other areas of mathematics such as representation theory and homology. I myself have not found a direct use for this type of algebra in my own work, but nevertheless I found the mere possibility of higher dimensional notation being the natural choice for a given mathematical problem to be conceptually liberating.

### Zarankiewicz’s problem for semilinear hypergraphs

Abdul Basit, Artem Chernikov, Sergei Starchenko, Chiu-Minh Tran and I have uploaded to the arXiv our paper Zarankiewicz’s problem for semilinear hypergraphs. This paper is in the spirit of a number of results in extremal graph theory in which the bounds for various graph-theoretic problems or results can be greatly improved if one makes some additional hypotheses regarding the structure of the graph, for instance by requiring that the graph be “definable” with respect to some theory with good model-theoretic properties.

A basic motivating example is the question of counting the number of incidences between points and lines (or between points and other geometric objects). Suppose one has points and lines in a space. How many incidences can there be between these points and lines? The utterly trivial bound is , but by using the basic fact that two points determine a line (or two lines intersect in at most one point), a simple application of Cauchy-Schwarz improves this bound to . In graph theoretic terms, the point is that the bipartite incidence graph between points and lines does not contain a copy of (there do not exist two points and two lines that are all incident to each other). Without any other further hypotheses, this bound is basically sharp: consider for instance the collection of points and lines in a finite plane , which has incidences (one can make the situation more symmetric by working with a projective plane rather than an affine plane). If however one considers lines in the real plane , the famous Szemerédi-Trotter theorem improves the incidence bound further from to . Thus the incidence graph between real points and lines contains more structure than merely the absence of .
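The sharpness of the Cauchy-Schwarz bound in the finite plane can be checked by brute force. The sketch below restricts to the non-vertical lines y = ax + b over F_p (an assumption made here so that the point and line counts are both p²) and confirms that there are p³ incidences:

```python
def incidences_affine_plane(p):
    """Count point-line incidences in the affine plane over F_p,
    using only the p^2 non-vertical lines y = a*x + b."""
    points = [(x, y) for x in range(p) for y in range(p)]
    lines = [(a, b) for a in range(p) for b in range(p)]
    return sum(1 for (x, y) in points for (a, b) in lines
               if (a * x + b) % p == y)

p = 7
n = p * p  # number of points (and also of lines)
count = incidences_affine_plane(p)
# each of the p^2 lines contains exactly p points, giving p^3 incidences,
# which matches the Cauchy-Schwarz bound n^{1/2} * n = p^3
assert count == p ** 3
```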

More generally, bounding the size of bipartite graphs (or multipartite hypergraphs) not containing a copy of some complete bipartite subgraph (or in the hypergraph case) is known as *Zarankiewicz’s problem*. We have results for all and all orders of hypergraph, but for sake of this post I will focus on the bipartite case.

In our paper we improve the bound to a near-linear bound in the case that the incidence graph is “semilinear”. A model case occurs when one considers incidences between points and axis-parallel rectangles in the plane. Now the condition is not automatic (it is of course possible for two distinct points to both lie in two distinct rectangles), so we impose this condition by *fiat*:

**Theorem 1** Suppose one has points and axis-parallel rectangles in the plane, whose incidence graph contains no ‘s, for some large .

- (i) The total number of incidences is .
- (ii) If all the rectangles are dyadic, the bound can be improved to .
- (iii) The bound in (ii) is best possible (up to the choice of implied constant).

We don’t know whether the bound in (i) is similarly tight for non-dyadic boxes; the usual tricks for reducing the non-dyadic case to the dyadic case strangely fail to apply here. One can generalise to higher dimensions, replacing rectangles by polytopes with faces in some fixed finite set of orientations, at the cost of adding several more logarithmic factors; also, one can replace the reals by other ordered division rings, and replace polytopes by other sets of bounded “semilinear descriptive complexity”, e.g., unions of boundedly many polytopes, or which are cut out by boundedly many functions that enjoy coordinatewise monotonicity properties. For certain specific graphs we can remove the logarithmic factors entirely. We refer to the preprint for precise details.

The proof techniques are combinatorial. The proof of (i) relies primarily on the order structure of to implement a “divide and conquer” strategy in which one can efficiently control incidences between points and rectangles by incidences between approximately points and boxes. For (ii) there is additional order-theoretic structure one can work with: first there is an easy pruning device to reduce to the case when no rectangle is completely contained inside another, and then one can impose the “tile partial order” in which one dyadic rectangle is less than another if and . The point is that this order is “locally linear” in the sense that for any two dyadic rectangles , the set is linearly ordered, and this can be exploited by elementary double counting arguments to obtain a bound which eventually becomes after optimising certain parameters in the argument. The proof also suggests how to construct the counterexample in (iii), which is achieved by an elementary iterative construction.
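The order-theoretic arguments above ultimately rest on the basic nestedness property of dyadic objects: two dyadic intervals (and hence two dyadic rectangles, coordinate by coordinate) are always either disjoint or nested. A quick exhaustive verification of this property at small scales:

```python
from fractions import Fraction
from itertools import product

def dyadic_interval(j, n):
    """The dyadic interval [j/2^n, (j+1)/2^n)."""
    return (Fraction(j, 2 ** n), Fraction(j + 1, 2 ** n))

def nested_or_disjoint(I, J):
    """True if the half-open intervals I and J are disjoint or nested."""
    a, b = I
    c, d = J
    disjoint = b <= c or d <= a
    nested = (a <= c and d <= b) or (c <= a and b <= d)
    return disjoint or nested

# every pair of dyadic intervals at scales 2^0 .. 2^-4 inside [0,1) is
# either disjoint or nested -- the fact underlying the tile partial order
intervals = [dyadic_interval(j, n) for n in range(5) for j in range(2 ** n)]
assert all(nested_or_disjoint(I, J) for I, J in product(intervals, repeat=2))
```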

### Fractional free convolution powers

Dimitri Shlyakhtenko and I have uploaded to the arXiv our paper Fractional free convolution powers. For me, this project (which we started during the 2018 IPAM program on quantitative linear algebra) was motivated by a desire to understand the behavior of the *minor process* applied to a large random Hermitian matrix , in which one takes the successive upper left minors of and computes their eigenvalues in non-decreasing order. These eigenvalues are related to each other by the Cauchy interlacing inequalities, and can be organized into a *Gelfand-Tsetlin pattern*, as discussed in these previous blog posts.

When is large and the matrix is a random matrix with empirical spectral distribution converging to some compactly supported probability measure on the real line, then under suitable hypotheses (e.g., unitary conjugation invariance of the random matrix ensemble ), a “concentration of measure” effect occurs, with the spectral distribution of the minors for any fixed converging to a specific measure that depends only on and . The reason for this notation is that there is a surprising description of this measure when is a natural number, namely it is the free convolution of copies of , pushed forward by the dilation map . For instance, if is the Wigner semicircular measure , then . At the random matrix level, this reflects the fact that the minor of a GUE matrix is again a GUE matrix (up to a renormalizing constant).

As first observed by Bercovici and Voiculescu and developed further by Nica and Speicher, among other authors, the notion of a free convolution power of can be extended to non-integer , thus giving the notion of a “fractional free convolution power”. This notion can be defined in several different ways. One of them proceeds via the Cauchy transform

of the measure , and can be defined by solving the Burgers-type equation with initial condition (see this previous blog post for a derivation). This equation can be solved explicitly using the *-transform* of , defined by solving the equation for sufficiently large , in which case one can show that (In the case of the semicircular measure , the -transform is simply the identity: .)
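As a worked example (using the common normalizations $G_\mu(z) = \int \frac{d\mu(x)}{z-x}$ and $R_\mu(z) = G_\mu^{-1}(z) - \frac{1}{z}$, which are assumptions of this sketch rather than conventions fixed above), the semicircular case can be carried out in closed form:

```latex
% The R-transform linearizes free convolution,
\[
R_{\mu \boxplus \nu} = R_\mu + R_\nu
\quad\Longrightarrow\quad
R_{\mu^{\boxplus t}}(z) = t\, R_\mu(z)
\]
% (the right-hand identity serving as one definition of the fractional power).
% For the standard semicircular law $\mu_{\mathrm{sc}}$ one has
\[
G_{\mu_{\mathrm{sc}}}(z) = \frac{z - \sqrt{z^2 - 4}}{2},
\qquad
R_{\mu_{\mathrm{sc}}}(z) = z,
\]
% so $R_{\mu_{\mathrm{sc}}^{\boxplus t}}(z) = t z$, which is the R-transform of a
% semicircular law of variance $t$; equivalently, $\mu_{\mathrm{sc}}^{\boxplus t}$
% is the pushforward of $\mu_{\mathrm{sc}}$ under dilation by $\sqrt{t}$.
```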

Nica and Speicher also gave a free probability interpretation of the fractional free convolution power: if is a noncommutative random variable in a noncommutative probability space with distribution , and is a real projection operator free of with trace , then the “minor” of (viewed as an element of a new noncommutative probability space whose elements are minors , with trace ) has the law of (we give a self-contained proof of this in an appendix to our paper). This suggests that the minor process (or fractional free convolution) can be studied within the framework of free probability theory.

One of the known facts about integer free convolution powers is monotonicity of the *free entropy* and the *free Fisher information*, which were introduced by Voiculescu as free probability analogues of the classical probability concepts of differential entropy and classical Fisher information. (Here we correct a small typo in the normalization constant of Fisher entropy as presented in Voiculescu’s paper.) Namely, it was shown by Shlyakhtenko that the quantity is monotone non-decreasing for integer , and the Fisher information is monotone non-increasing for integer . This is the free probability analogue of the corresponding monotonicities for differential entropy and classical Fisher information that were established by Artstein, Ball, Barthe, and Naor, answering a question of Shannon.

Our first main result is to extend the monotonicity results of Shlyakhtenko to fractional . We give two proofs of this fact, one using free probability machinery, and a more self-contained (but less motivated) proof using integration by parts and contour integration. The free probability proof relies on the concept of the *free score* of a noncommutative random variable, which is the analogue of the classical score. The free score, also introduced by Voiculescu, can be defined by duality as measuring the perturbation with respect to semicircular noise, or more precisely

The free score interacts very well with the free minor process , in particular by standard calculations one can establish the identity

whenever is a noncommutative random variable, is an algebra of noncommutative random variables, and is a real projection of trace that is free of both and . The monotonicity of free Fisher information then follows from an application of Pythagoras’s theorem (which implies in particular that conditional expectation operators are contractions on ). The monotonicity of free entropy then follows from an integral representation of free entropy as an integral of free Fisher information along the free Ornstein-Uhlenbeck process (or equivalently, free Fisher information is essentially the rate of change of free entropy with respect to perturbation by semicircular noise). The argument also shows when equality holds in the monotonicity inequalities; this occurs precisely when is a semicircular measure up to affine rescaling.

After an extensive amount of calculation of all the quantities that were implicit in the above free probability argument (in particular computing the various terms involved in the application of Pythagoras’ theorem), we were able to extract a self-contained proof of monotonicity that relied on differentiating the quantities in and using the differential equation (1). It turns out that if for sufficiently regular , then there is an identity

where is the kernel and . It is not difficult to show that is a positive semi-definite kernel, which gives the required monotonicity. It would be interesting to obtain some more insightful interpretation of the kernel and the identity (2).

These monotonicity properties hint at the minor process being associated to some sort of “gradient flow” in the parameter. We were not able to formalize this intuition; indeed, it is not clear what a gradient flow on a varying noncommutative probability space even means. However, after substantial further calculation we were able to formally describe the minor process as the Euler-Lagrange equation for an intriguing Lagrangian functional that we conjecture to have a random matrix interpretation. We first work in “Lagrangian coordinates”, defining the quantity on the “Gelfand-Tsetlin pyramid”

by the formula which is well defined if the density of is sufficiently well behaved. The random matrix interpretation of is that it is the asymptotic location of the eigenvalue of the upper left minor of a random matrix with asymptotic empirical spectral distribution and with unitarily invariant distribution, thus is in some sense a continuum limit of Gelfand-Tsetlin patterns. Thus for instance the Cauchy interlacing laws in this asymptotic limit regime become

After a lengthy calculation (involving extensive use of the chain rule and product rule), the equation (1) is equivalent to the Euler-Lagrange equation where is the Lagrangian density Thus the minor process is formally a critical point of the integral . The quantity measures the mean eigenvalue spacing at some location of the Gelfand-Tsetlin pyramid, and the ratio measures mean eigenvalue drift in the minor process. This suggests that this Lagrangian density is some sort of measure of entropy of the asymptotic microscale point process emerging from the minor process at this spacing and drift. There is work of Metcalfe demonstrating that this point process is given by the Boutillier bead model, so we conjecture that this Lagrangian density somehow measures the entropy density of this process.

### The Ionescu-Wainger multiplier theorem and the adeles

I’ve just uploaded to the arXiv my paper “The Ionescu-Wainger multiplier theorem and the adeles”. This paper revisits a useful multiplier theorem of Ionescu and Wainger on “major arc” Fourier multiplier operators on the integers (or lattices ), and strengthens the bounds while also interpreting the theorem from the viewpoint of the adelic integers (which were also used in my recent paper with Krause and Mirek).

For simplicity let us just work in one dimension. Any smooth function then defines a discrete Fourier multiplier operator for any by the formula

where is the Fourier transform on ; similarly, any test function defines a continuous Fourier multiplier operator by the formula where . In both cases we refer to as the *symbol* of the multiplier operator .
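As a concrete numerical sketch of the symbol/operator dictionary (on the finite cyclic model Z/N rather than Z or R, purely for illustration), a Fourier multiplier operator is just pointwise multiplication by the symbol on the frequency side, conjugated by the FFT:

```python
import numpy as np

def apply_multiplier(f, symbol):
    """Apply the Fourier multiplier operator with the given symbol to the
    sequence f on the cyclic group Z/N: transform to the frequency side,
    multiply pointwise by the symbol, and transform back."""
    N = len(f)
    freqs = np.fft.fftfreq(N)  # frequencies in [-1/2, 1/2)
    f_hat = np.fft.fft(f)
    return np.fft.ifft(symbol(freqs) * f_hat)

rng = np.random.default_rng(0)
f = rng.standard_normal(64)

# the symbol identically 1 gives back the identity operator
g = apply_multiplier(f, lambda xi: np.ones_like(xi))
assert np.allclose(g, f)

# a symbol supported near 0 acts as a (Littlewood-Paley type) low-pass projection;
# since this symbol is real and even, the output of a real input stays real
low_pass = apply_multiplier(f, lambda xi: (np.abs(xi) < 0.1).astype(float))
assert np.allclose(low_pass.imag, 0, atol=1e-12)
```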

We will be interested in discrete Fourier multiplier operators whose symbols are supported on a finite union of arcs. One way to construct such operators is by “folding” continuous Fourier multiplier operators into various target frequencies. To make this folding operation precise, given any continuous Fourier multiplier operator , and any frequency , we define the discrete Fourier multiplier operator for any frequency shift by the formula

or equivalently

More generally, given any finite set , we can form a multifrequency projection operator on by the formula thus

This construction gives discrete Fourier multiplier operators whose symbol can be localised to a finite union of arcs. For instance, if is supported on , then is a Fourier multiplier whose symbol is supported on the set .

There is a body of results relating the theory of discrete Fourier multiplier operators such as or with the theory of their continuous counterparts. For instance we have the basic result of Magyar, Stein, and Wainger:

**Proposition 1 (Magyar-Stein-Wainger sampling principle)** Let and .

- (i) If is a smooth function supported in , then , where denotes the operator norm of an operator .
- (ii) More generally, if is a smooth function supported in for some natural number , then .

When the implied constant in these bounds can be set to equal . In the paper of Magyar, Stein, and Wainger it was posed as an open problem whether this is the case for other ; in an appendix to this paper I show that the answer is negative if is sufficiently close to or , but I do not know the full answer to this question.

This proposition allows one to get a good multiplier theory for symbols supported near cyclic groups ; for instance it shows that a discrete Fourier multiplier with symbol for a fixed test function is bounded on , uniformly in and . For many applications in discrete harmonic analysis, one would similarly like a good multiplier theory for symbols supported in “major arc” sets such as

and in particular to get a good Littlewood-Paley theory adapted to major arcs. (This is particularly the case when trying to control “true complexity zero” expressions for which the minor arc contributions can be shown to be negligible; my recent paper with Krause and Mirek is focused on expressions of this type.) At present we do not have a good multiplier theory that is directly adapted to the classical major arc set (1) (though I do not know of rigorous negative results that show that such a theory is not possible); however, Ionescu and Wainger were able to obtain a useful substitute theory in which (1) was replaced by a somewhat larger set that had better multiplier behaviour. Starting with a finite collection of pairwise coprime natural numbers, and a natural number , one can form the major arc type set where consists of all rational points in the unit circle of the form where is the product of at most elements from and is an integer. For suitable choices of and not too large, one can make this set (2) contain the set (1) while still having a somewhat controlled size (very roughly speaking, one chooses to consist of (small powers of) large primes between and for some small constant , together with something like the product of all the primes up to (raised to suitable powers)).

In the regime where is fixed and is small, there is a good theory:

**Theorem 2 (Ionescu-Wainger theorem, rough version)** If is an even integer or the dual of an even integer, and is supported on for a sufficiently small , then

There is a more explicit description of how small needs to be for this theorem to work (roughly speaking, it is not much more than what is needed for all the arcs in (2) to be disjoint), but we will not give it here. The logarithmic loss of was reduced to by Mirek. In this paper we refine the bound further to

when or for some integer . In particular there is no longer any logarithmic loss in the cardinality of the set .

The proof of (3) follows a similar strategy to previous proofs of Ionescu-Wainger type. By duality we may assume . We use the following standard sequence of steps:

- (i) (Denominator orthogonality) First one splits into various pieces depending on the denominator appearing in the element of , and exploits “superorthogonality” in to estimate the norm by the norm of an appropriate square function.
- (ii) (Nonconcentration) One expands out the power of the square function and estimates it by a “nonconcentrated” version in which various factors that arise in the expansion are “disjoint”.
- (iii) (Numerator orthogonality) We now decompose based on the numerators appearing in the relevant elements of , and exploit some residual orthogonality in this parameter to reduce to estimating a square-function type expression involving sums over various cosets .
- (iv) (Marcinkiewicz-Zygmund) One uses the Marcinkiewicz-Zygmund theorem relating scalar and vector valued operator norms to eliminate the role of the multiplier .
- (v) (Rubio de Francia) Use a reverse square function estimate of Rubio de Francia type to conclude.
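The probabilistic decoupling used to improve step (i) rests on the fact that a random partition of the index set into k classes (modeled here by assigning each index independently and uniformly to a class, one natural reading of “chosen uniformly”) completely shatters any fixed k-element subset with probability exactly k!/k^k, independently of the size of the index set. An exact enumeration confirming this:

```python
from fractions import Fraction
from itertools import product
from math import factorial

def shatter_probability(n, k):
    """Assign each of n indices independently and uniformly to one of k
    classes, and return the exact probability that the fixed subset
    {0, ..., k-1} is completely shattered (its elements land in k
    distinct classes)."""
    shattered = sum(1 for assignment in product(range(k), repeat=n)
                    if len(set(assignment[:k])) == k)
    return Fraction(shattered, k ** n)

# the probability is k!/k^k regardless of the ambient index set size n
for n in (3, 4, 5):
    assert shatter_probability(n, 3) == Fraction(factorial(3), 3 ** 3)
```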

The main innovations are the use of the probabilistic decoupling method to remove some logarithmic losses in (i), and of recent progress on the Erdős-Rado sunflower conjecture (as discussed in this recent post) to improve the bounds in (ii). For (i), the key point is that one can express a sum such as

where is the set of -element subsets of an index set , and are various complex numbers, as an average where is a random partition of into subclasses (chosen uniformly over all such partitions), basically because every -element subset of has a probability exactly of being completely shattered by such a random partition. This “decouples” the index set into a Cartesian product which is more convenient for application of the superorthogonality theory. For (ii), the point is to efficiently obtain estimates of the form where are various non-negative quantities, and a sunflower is a collection of sets that consist of a common “core” and disjoint “petals” . The other parts of the argument are relatively routine; see for instance this survey of Pierce for a discussion of them in the simple case .

In this paper we interpret the Ionescu-Wainger multiplier theorem as being essentially a consequence of various quantitative versions of the Shannon sampling theorem. Recall that this theorem asserts that if a (Schwartz) function has its Fourier transform supported on , then can be recovered uniquely from its restriction . In fact, as can be shown from a little bit of routine Fourier analysis, if we narrow the support of the Fourier transform slightly to for some , then the restriction has the same behaviour as the original function, in the sense that

for all ; see Theorem 4.18 of this paper of myself with Krause and Mirek. This is consistent with the uncertainty principle, which suggests that such functions should behave like a constant at scales .

The quantitative sampling theorem (4) can be used to give an alternate proof of Proposition 1(i), basically thanks to the identity

whenever is Schwartz and has Fourier transform supported in , and is also supported on ; this identity can be easily verified from the Poisson summation formula. A variant of this argument also yields an alternate proof of Proposition 1(ii), where the role of is now played by , and the standard embedding of into is now replaced by the embedding of into ; the analogue of (4) is now whenever is Schwartz and has Fourier transform supported in , and is endowed with probability Haar measure.

The locally compact abelian groups and can all be viewed as projections of the adelic integers (the product of the reals and the profinite integers ). By using the Ionescu-Wainger multiplier theorem, we are able to obtain an adelic version of the quantitative sampling estimate (5), namely

whenever , is Schwartz-Bruhat and has Fourier transform supported on for some sufficiently small (the precise bound on depends on in a fashion not detailed here). This allows one to obtain an “adelic” extension of the Ionescu-Wainger multiplier theorem, in which the operator norm of any discrete multiplier operator whose symbol is supported on major arcs can be shown to be comparable to the operator norm of an adelic counterpart to that multiplier operator; in principle this reduces “major arc” harmonic analysis on the integers to “low frequency” harmonic analysis on the adelic integers , which is a simpler setting in many ways (mostly because the set of major arcs (2) is now replaced with a product set ).

### Pointwise ergodic theorems for non-conventional bilinear polynomial averages

Ben Krause, Mariusz Mirek, and I have uploaded to the arXiv our paper Pointwise ergodic theorems for non-conventional bilinear polynomial averages. This paper is a contribution to the decades-long program of extending the classical ergodic theorems to “non-conventional” ergodic averages. Here, the focus is on pointwise convergence theorems, and in particular looking for extensions of the pointwise ergodic theorem of Birkhoff:

**Theorem 1 (Birkhoff ergodic theorem)**Let be a measure-preserving system (by which we mean is a -finite measure space, and is invertible and measure-preserving), and let for any . Then the averages converge pointwise for -almost every .

Pointwise ergodic theorems have an inherently harmonic analysis content to them, as they are closely tied to maximal inequalities. For instance, the Birkhoff ergodic theorem is closely tied to the Hardy-Littlewood maximal inequality.
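As a toy numerical illustration of Theorem 1 (not from the paper; the rotation, the test function, and all numerical parameters are our own choices), the Birkhoff averages of an irrational circle rotation converge to the space average:

```python
import math

# Birkhoff averages for the irrational rotation T(x) = x + alpha mod 1
# on [0, 1) with Lebesgue measure, an ergodic measure-preserving system.
alpha = math.sqrt(2) - 1                      # irrational rotation number
f = lambda x: 1.0 if x < 0.5 else 0.0         # f = indicator of [0, 1/2)

def birkhoff_average(x, N):
    total, y = 0.0, x
    for _ in range(N):
        total += f(y)
        y = (y + alpha) % 1.0
    return total / N

# The time averages converge to the space average of f, namely 1/2,
# for every starting point (here this is Weyl equidistribution).
avg = birkhoff_average(0.123, 200000)
assert abs(avg - 0.5) < 1e-2
```
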

The above theorem was generalized by Bourgain (conceding the endpoint , where pointwise almost everywhere convergence is now known to fail) to polynomial averages:

**Theorem 2 (Pointwise ergodic theorem for polynomial averages)**Let be a measure-preserving system, and let for any . Let be a polynomial with integer coefficients. Then the averages converge pointwise for -almost every .

For bilinear averages, we have a separate 1990 result of Bourgain (for functions), extended to other spaces by Lacey, with an alternate proof given by Demeter:

**Theorem 3 (Pointwise ergodic theorem for two linear polynomials)**Let be a measure-preserving system with finite measure, and let , for some with . Then for any integers , the averages converge pointwise almost everywhere.

It has been an open question for some time (see e.g., Problem 11 of this survey of Frantzikinakis) to extend this result to other bilinear ergodic averages. In our paper we are able to achieve this in the partially linear case:

**Theorem 4 (Pointwise ergodic theorem for one linear and one nonlinear polynomial)**Let be a measure-preserving system, and let , for some with . Then for any polynomial of degree , the averages converge pointwise almost everywhere.

We actually prove a bit more than this, namely a maximal function estimate and a variational estimate, together with some additional estimates that “break duality” by applying in certain ranges with , but we will not discuss these extensions here. A good model case to keep in mind is when and (which is the case we started with). We note that norm convergence for these averages was established much earlier by Furstenberg and Weiss (in the case at least), and in fact norm convergence for arbitrary polynomial averages is now known thanks to the work of Host-Kra, Leibman, and Walsh.

Our proof of Theorem 4 is much closer in spirit to Theorem 2 than to Theorem 3. The property of the averages shared in common by Theorems 2, 4 is that they have “true complexity zero”, in the sense that they can only be large if the functions involved are “major arc” or “profinite”, in that they behave periodically over very long intervals (or like a linear combination of such periodic functions). In contrast, the average in Theorem 3 has “true complexity one”, in the sense that it can also be large if are “almost periodic” (a linear combination of eigenfunctions, or plane waves), and as such all proofs of the latter theorem have relied (either explicitly or implicitly) on some form of time-frequency analysis. In principle, the true complexity zero property reduces one to studying the behaviour of averages on major arcs. However, until recently the available estimates to quantify this true complexity zero property were not strong enough to achieve a good reduction of this form, and even once one was in the major arc setting the bilinear averages in Theorem 4 were still quite complicated, exhibiting a mixture of both continuous and arithmetic aspects, both of which are genuinely bilinear in nature.

After applying standard reductions such as the Calderón transference principle, the key task is to establish a suitably “scale-invariant” maximal (or variational) inequality on the integer shift system (in which with counting measure, and ). A model problem is to establish the maximal inequality

where ranges over powers of two and is the bilinear operator

The single scale estimate or equivalently (by duality) is immediate from Hölder’s inequality; the difficulty is how to take the supremum over scales .

The first step is to understand when the single-scale estimate (2) can come close to equality. A key example to keep in mind is when , , where is a small modulus, are such that , is a smooth cutoff to an interval of length , and is also supported on and behaves like a constant on intervals of length . Then one can check that (barring some unusual cancellation) (2) is basically sharp for this example. A remarkable result of Peluse and Prendiville (generalised to arbitrary nonlinear polynomials by Peluse) asserts, roughly speaking, that this example is basically the only way in which (2) can be saturated, at least when are supported on a common interval of length and are normalised in rather than . (Strictly speaking, the above paper of Peluse and Prendiville only says something like this regarding the factors; the corresponding statement for was established in a subsequent paper of Peluse and Prendiville.) The argument requires tools from additive combinatorics such as the Gowers uniformity norms, and hinges in particular on the “degree lowering argument” of Peluse and Prendiville, which I discussed in this previous blog post. Crucially for our application, the estimates are very quantitative, with all bounds being polynomial in the ratio between the left and right hand sides of (2) (or more precisely, the -normalized version of (2)).
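The contrast between periodic “major arc” inputs and generic inputs can be seen in a toy computation (our own illustration; the choice of nonlinearity and of all parameters is hypothetical): periodic inputs keep the bilinear average large, while random signs exhibit square-root cancellation.

```python
import random

# Toy bilinear average A_N(f,g)(x) = (1/N) * sum_{n<=N} f(x+n) g(x+n^2),
# a stand-in for the model operator (taking P(n) = n^2 for illustration).
def A(f, g, x, N):
    return sum(f(x + n) * g(x + n * n) for n in range(1, N + 1)) / N

q, N = 5, 1000
f_per = lambda m: 1.0 if m % q == 0 else 0.0     # periodic "major arc" input

random.seed(1)
signs = [random.choice([-1.0, 1.0]) for _ in range(N * N + N + 1)]
f_rnd = lambda m: signs[m]                       # random-sign input

periodic_value = A(f_per, f_per, 0, N)   # = 1/q: no cancellation at this scale
random_value = abs(A(f_rnd, f_rnd, 0, N))        # square-root cancellation

assert abs(periodic_value - 1.0 / q) < 1e-9
assert random_value < periodic_value
```
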

For our applications we had to extend the inverse theory of Peluse and Prendiville to an theory. This turned out to require a certain amount of “sleight of hand”. Firstly, one can dualise the theorem of Peluse and Prendiville to show that the “dual function”

can be well approximated in by a function that has Fourier support on “major arcs” if enjoy control. To get the required extension to in the aspect one has to improve the control on the error from to ; this can be done by some interpolation theory combined with the useful Fourier multiplier theory of Ionescu and Wainger on major arcs. Then, by further interpolation using recent improving estimates of Han, Kovac, Lacey, Madrid, and Yang for linear averages such as , one can relax the hypothesis on to an hypothesis, and then by undoing the duality one obtains a good inverse theorem for (2) for the function ; a modification of the arguments also gives something similar for .

Using these inverse theorems (and the Ionescu-Wainger multiplier theory) one still has to understand the “major arc” portion of (1); a model case arises when are supported near rational numbers with for some moderately large . The inverse theory gives good control (with an exponential decay in ) on individual scales , and one can leverage this with a Rademacher-Menshov type argument (see e.g., this blog post) and some closer analysis of the bilinear Fourier symbol of to eventually handle all “small” scales, with ranging up to say where for some small constant and large constant . For the “large” scales, it becomes feasible to place all the major arcs simultaneously under a single common denominator , and then a quantitative version of the Shannon sampling theorem allows one to transfer the problem from the integers to the locally compact abelian group . Actually it was conceptually clearer for us to work instead with the adelic integers , which is the inverse limit of the . Once one transfers to the adelic integers, the bilinear operators involved split up as tensor products of the “continuous” bilinear operator

on , and the “arithmetic” bilinear operator on the profinite integers , equipped with probability Haar measure . After a number of standard manipulations (interpolation, Fubini’s theorem, Hölder’s inequality, variational inequalities, etc.) the task of estimating this tensor product boils down to establishing an improving estimate for some . Splitting the profinite integers into the product of the -adic integers , it suffices to establish this claim for each separately (so long as we keep the implied constant equal to for sufficiently large ). This turns out to be possible using an arithmetic version of the Peluse-Prendiville inverse theorem as well as an arithmetic improving estimate for linear averaging operators which ultimately arises from some estimates on the distribution of polynomials on the -adic field , which are a variant of some estimates of Kowalski and Wright.

### Higher uniformity of bounded multiplicative functions in short intervals on average

Kaisa Matomäki, Maksym Radziwill, Joni Teräväinen, Tamar Ziegler and I have uploaded to the arXiv our paper Higher uniformity of bounded multiplicative functions in short intervals on average. This paper (which originated from a working group at an AIM workshop on Sarnak’s conjecture) focuses on the *local Fourier uniformity conjecture* for bounded multiplicative functions such as the Liouville function . One form of this conjecture is the assertion that

The conjecture gets more difficult as increases, and also becomes more difficult the more slowly grows with . The conjecture is equivalent to the assertion

which was proven (for arbitrarily slowly growing ) in a landmark paper of Matomäki and Radziwill, discussed for instance in this blog post.

For , the conjecture is equivalent to the assertion

This remains open for sufficiently slowly growing (and it would be a major breakthrough in particular if one could obtain this bound for as small as for any fixed , particularly if applicable to more general bounded multiplicative functions than , as this would have new implications for a generalization of the Chowla conjecture known as the Elliott conjecture). Recently, Kaisa, Maks and myself were able to establish this conjecture in the range (in fact we have since worked out in the current paper that we can get as small as ). In our current paper we establish the Fourier uniformity conjecture for higher for the same range of . This in particular implies local orthogonality to polynomial phases, where denotes the polynomials of degree at most , but the full conjecture is a bit stronger than this, establishing the more general statement for any degree filtered nilmanifold and Lipschitz function , where now ranges over polynomial maps from to . The method of proof follows the same general strategy as in the previous paper with Kaisa and Maks. (The equivalence of (4) and (1) follows from the inverse conjecture for the Gowers norms, proven in this paper.)

We first quickly sketch the proof of (3), using very informal language to avoid many technicalities regarding the precise quantitative form of various estimates. If the estimate (3) fails, then we have the correlation estimate for many and some polynomial depending on . The difficulty here is to understand how can depend on . We write the above correlation estimate more suggestively as Because of the multiplicativity at small primes , one expects to have a relation of the form for many for which for some small primes . (This can be formalised using an inequality of Elliott related to the Turan-Kubilius theorem.) This gives a relationship between and for “edges” in a rather sparse “graph” connecting the elements of, say, .
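Both ingredients above can be made concrete in a small self-contained computation (an illustration only, not part of the paper; the cutoff and the interval length are arbitrary choices): the Liouville function computed by a smallest-prime-factor sieve, its complete multiplicativity at small primes (the source of the “edges” just described), and the smallness of short-interval averages in the spirit of the Matomäki–Radziwill theorem.

```python
# Liouville function lam(n) = (-1)^Omega(n) via a smallest-prime-factor sieve.
N = 200000
spf = list(range(N + 1))                     # smallest prime factor of n
for p in range(2, int(N ** 0.5) + 1):
    if spf[p] == p:                          # p is prime
        for m in range(p * p, N + 1, p):
            if spf[m] == m:
                spf[m] = p

lam = [0, 1] + [0] * (N - 1)
for n in range(2, N + 1):
    lam[n] = -lam[n // spf[n]]               # stripping one prime flips the sign

# complete multiplicativity at small primes: lam(p*n) = -lam(n)
assert all(lam[p * n] == -lam[n] for p in (2, 3, 5) for n in range(1, N // 5))

# averages over short intervals of length H are typically small
H = 1000
avgs = [abs(sum(lam[x:x + H])) / H for x in range(2, N - H, H)]
assert sum(avgs) / len(avgs) < 0.05
```
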
Using some graph theory one can locate some non-trivial “cycles” in this graph that eventually lead (in conjunction with a certain technical but important “Chinese remainder theorem” step to modify the to eliminate a rather serious “aliasing” issue that was already discussed in this previous post) to functional equations of the form for some large and close (but not identical) integers , where , to a first approximation (ignoring a certain “profinite” or “major arc” term for simplicity), should be viewed as “differing by a slowly varying polynomial”, and the polynomials should now be viewed as taking values on the reals rather than the integers. This functional equation can be solved to obtain a relation of the form for some real number of polynomial size, and with further analysis of the relation (5) one can make basically independent of . This simplifies (3) to something like and this is now of a form that can be treated by the theorem of Matomäki and Radziwill (because is a bounded multiplicative function). (Actually because of the profinite term mentioned previously, one also has to insert a Dirichlet character of bounded conductor into this latter conclusion, but we will ignore this technicality.)

Now we apply the same strategy to (4). For abelian the claim follows easily from (3), so we focus on the non-abelian case. One now has a polynomial sequence attached to many , and after a somewhat complicated adaptation of the above arguments one again ends up with an approximate functional equation

where the relation is rather technical and will not be detailed here. A new difficulty arises in that there are some unwanted solutions to this equation, such as for some , which do not necessarily lead to multiplicative characters like as in the polynomial case, but instead to some unfriendly looking “generalized multiplicative characters” (think of as a rough caricature). To avoid this problem, we rework the graph theory portion of the argument to produce not just one functional equation of the form (6) for each , but *many*, leading to dilation invariances for a “dense” set of . From a certain amount of Lie algebra theory (ultimately arising from an understanding of the behaviour of the exponential map on nilpotent matrices, and exploiting the hypothesis that is non-abelian) one can conclude that (after some initial preparations to avoid degenerate cases) must behave like for some

*central* element of . This eventually brings one back to the multiplicative characters that arose in the polynomial case, and the arguments now proceed as before.

We give two applications of this higher order Fourier uniformity. One regards the growth of the number

of length sign patterns in the Liouville function. The Chowla conjecture implies that , but even the weaker conjecture of Sarnak that for some remains open. Until recently, the best asymptotic lower bound on was , due to McNamara; with our result, we can now show for any (in fact we can get for any ). The idea is to repeat the now-standard argument to exploit multiplicativity at small primes to deduce Chowla-type conjectures from Fourier uniformity conjectures, noting that the Chowla conjecture would give all the sign patterns one could hope for. The usual argument here uses the “entropy decrement argument” to eliminate a certain error term (involving the large but mean zero factor ). However the observation is that if there are extremely few sign patterns of length , then the entropy decrement argument is unnecessary (there isn’t much entropy to begin with), and a more low-tech moment method argument (similar to the derivation of Chowla’s conjecture from Sarnak’s conjecture, as discussed for instance in this post) gives enough of Chowla’s conjecture to produce plenty of length sign patterns. If there are not extremely few sign patterns of length then we are done anyway. One quirk of this argument is that the sign patterns it produces may only appear exactly once; in contrast with preceding arguments, we were not able to produce a large number of sign patterns that each occur infinitely often.

The second application is to obtain cancellation for various polynomial averages involving the Liouville function or von Mangoldt function , such as

or where are polynomials of degree at most , no two of which differ by a constant (the latter is essential to avoid having to establish the Chowla or Hardy-Littlewood conjectures, which of course remain open). Results of this type were previously obtained by Tamar Ziegler and myself in the “true complexity zero” case when the polynomials had distinct degrees, in which one could use the theory of Matomäki and Radziwill; now that higher order uniformity is available at the scale , we can remove this restriction.

### The sunflower lemma via Shannon entropy

A family of sets for some is a sunflower if there is a *core set* contained in each of the such that the *petal sets* are disjoint. If , let denote the smallest natural number with the property that any family of distinct sets of cardinality at most contains distinct elements that form a sunflower. The celebrated Erdös-Rado theorem asserts that is finite; in fact Erdös and Rado gave the bounds

The *sunflower conjecture* asserts that the upper bound can in fact be improved to . This remains open at present despite much effort (including a Polymath project); after a long series of improvements to the upper bound, the best general bound known currently is for all , established in 2019 by Rao (building upon a recent breakthrough a month previously of Alweiss, Lovett, Wu, and Zhang). Here we remove the easy cases or in order to make the logarithmic factor a little cleaner.
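The definitions can be exercised by a brute-force search on a small example (our own illustration; two disjoint triangles are a standard extremal configuration for 2-element sets and 3 petals):

```python
from itertools import combinations

# Brute-force search for an r-sunflower (a common core together with
# pairwise disjoint petals) inside a family of distinct sets.
def find_sunflower(family, r):
    for sets in combinations(family, r):
        core = frozenset.intersection(*sets)
        petals = [s - core for s in sets]
        if all(not (p & q) for p, q in combinations(petals, 2)):
            return core, sets
    return None

# Two disjoint triangles: six 2-element sets with no 3-sunflower, since
# no vertex has degree 3 and there are no 3 pairwise disjoint edges.
triangles = [frozenset(s) for s in
             [{1, 2}, {2, 3}, {1, 3}, {4, 5}, {5, 6}, {4, 6}]]
assert find_sunflower(triangles, 3) is None

# Adding the edge {1, 4} creates the star {1,2}, {1,3}, {1,4}:
# core {1} with disjoint petals {2}, {3}, {4}.
assert find_sunflower(triangles + [frozenset({1, 4})], 3) is not None
```
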

Rao’s argument used the Shannon noiseless coding theorem. It turns out that the argument can be arranged in the very slightly different language of Shannon entropy, and I would like to present it here. The argument proceeds by locating the core and petals of the sunflower separately (this strategy is also followed in Alweiss-Lovett-Wu-Zhang). In both cases the following definition will be key. In this post all random variables, such as random sets, will be understood to be discrete random variables taking values in a finite range. We always use boldface symbols to denote random variables, and non-boldface for deterministic quantities.

**Definition 1 (Spread set)** Let . A random set is said to be -spread if one has

The core can then be selected greedily in such a way that the remainder of a family becomes spread:

**Lemma 2 (Locating the core)** Let be a family of subsets of a finite set , each of cardinality at most , and let . Then there exists a “core” set of cardinality at most such that the set

*Proof:* We may assume is non-empty, as the claim is trivial otherwise. For any , define the quantity

Let be the set (3). Since , is non-empty. It remains to check that the family is -spread. But for any and drawn uniformly at random from one has

Since and , we obtain the claim.

In view of the above lemma, the bound (2) will then follow from

**Proposition 3 (Locating the petals)** Let be natural numbers, and suppose that for a sufficiently large constant . Let be a finite family of subsets of a finite set , each of cardinality at most which is -spread. Then there exist such that is disjoint.

Indeed, to prove (2), we assume that is a family of sets of cardinality greater than for some ; by discarding redundant elements and sets we may assume that is finite and that all the are contained in a common finite set . Apply Lemma 2 to find a set of cardinality such that the family is -spread. By Proposition 3 we can find such that are disjoint; since these sets have cardinality , this implies that the are distinct. Hence form a sunflower as required.

**Remark 4** Proposition 3 is easy to prove if we strengthen the condition on to . In this case, we have for every , hence by the union bound we see that for any with there exists such that is disjoint from the set , which has cardinality at most . Iterating this, we obtain the conclusion of Proposition 3 in this case. This recovers a bound of the form , and by pursuing this idea a little further one can recover the original upper bound (1) of Erdös and Rado.
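The iteration in Remark 4 amounts to a greedy selection, which can be sketched as follows (a minimal illustration on a hypothetical family; in the actual argument the strengthened spread condition is what guarantees that each greedy step succeeds):

```python
from itertools import combinations

# Greedy selection of r pairwise disjoint petals: repeatedly pick a set
# disjoint from everything chosen so far.
def greedy_disjoint(family, r):
    chosen, used = [], set()
    for s in family:
        if not (s & used):          # s avoids all previously chosen sets
            chosen.append(s)
            used |= s
            if len(chosen) == r:
                return chosen       # r disjoint petals (empty core)
    return None

family = [{1, 2, 3}, {3, 4, 5}, {6, 7, 8}, {1, 9, 10}, {11, 12, 13}]
petals = greedy_disjoint(family, 3)
assert petals == [{1, 2, 3}, {6, 7, 8}, {11, 12, 13}]
assert all(not (p & q) for p, q in combinations(petals, 2))
```
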

It remains to prove Proposition 3. In fact we can locate the petals one at a time, placing each petal inside a random set.

**Proposition 5 (Locating a single petal)** Let the notation and hypotheses be as in Proposition 3. Let be a random subset of , such that each lies in with an independent probability of . Then with probability greater than , contains one of the .

To see that Proposition 5 implies Proposition 3, we randomly partition into by placing each into one of the , chosen uniformly and independently at random. By Proposition 5 and the union bound, we see that with positive probability, it is simultaneously true for all that each contains one of the . Selecting one such for each , we obtain the required disjoint petals.
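The random partition step can be simulated directly (a Monte Carlo sketch with hypothetical parameters; Proposition 5 is what guarantees that, for a genuinely spread family, each class swallows some member with high probability):

```python
import random
from itertools import combinations

random.seed(0)
X = range(30)
r = 3
# a (hypothetical) reasonably spread family of 3-element sets
family = [frozenset(random.sample(range(30), 3)) for _ in range(200)]

def disjoint_petals(family, X, r):
    # randomly partition X into r classes W_1, ..., W_r
    classes = {x: random.randrange(r) for x in X}
    petals = []
    for i in range(r):
        Wi = {x for x in X if classes[x] == i}
        # look for a member of the family contained in class i
        member = next((A for A in family if A <= Wi), None)
        if member is None:
            return None
        petals.append(member)
    return petals

petals = None
for _ in range(20):        # retry; each trial succeeds with high probability
    petals = disjoint_petals(family, X, r)
    if petals is not None:
        break

assert petals is not None
# petals drawn from distinct classes are automatically pairwise disjoint
assert all(not (p & q) for p, q in combinations(petals, 2))
```
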

We will prove Proposition 5 by gradually increasing the density of the random set and arranging the sets to get quickly absorbed by this random set. The key iteration step is

**Proposition 6 (Refinement inequality)** Let and . Let be a random subset of a finite set which is -spread, and let be a random subset of independent of , such that each lies in with an independent probability of . Then there exists another random subset of with the same distribution as , such that and

Note that a direct application of the first moment method gives only the bound

but the point is that by switching from to an equivalent we can replace the factor by a quantity significantly smaller than .

One can iterate the above proposition, repeatedly replacing with (noting that this preserves the -spread nature) to conclude

**Corollary 7 (Iterated refinement inequality)** Let , , and . Let be a random subset of a finite set which is -spread, and let be a random subset of independent of , such that each lies in with an independent probability of . Then there exists another random subset of with the same distribution as , such that

Now we can prove Proposition 5. Let be a parameter to be chosen shortly. Applying Corollary 7 with drawn uniformly at random from the , and setting , or equivalently , we have

In particular, if we set , so that , then by choice of we have , hence

In particular, with probability at least , there must exist such that , giving the proposition.

It remains to establish Proposition 6. This is the difficult step, and requires a clever way to find the variant of that has better containment properties in than does. The main trick is to make a conditional copy of that is conditionally independent of subject to the constraint . The point here is that this constraint implies the inclusions

and

Because of the -spread hypothesis, it is hard for to contain any fixed large set. If we could apply this observation in the contrapositive to , we could hope to get a good upper bound on the size of and hence on , thanks to (4). One can also hope to improve such an upper bound by also employing (5), since it is also hard for the random set to contain a fixed large set. There are however difficulties with implementing this approach due to the fact that the random sets are coupled with in a moderately complicated fashion. In Rao’s argument a somewhat complicated encoding scheme was created to give information-theoretic control on these random variables; below the fold we accomplish a similar effect by using Shannon entropy inequalities in place of explicit encoding. A certain amount of information-theoretic sleight of hand is required to decouple certain random variables to the extent that the Shannon inequalities can be effectively applied. The argument bears some resemblance to the “entropy compression method” discussed in this previous blog post; there may be a way to more explicitly express the argument below in terms of that method. (There is also some kinship with the method of dependent random choice, which is used for instance to establish the Balog-Szemerédi-Gowers lemma, and was also translated into information theoretic language in these unpublished notes of Van Vu and myself.)

** — 1. Shannon entropy — **

In this section we lay out all the tools from the theory of Shannon entropy that we will need.

Define an *empirical sequence* for a random variable taking values in a discrete set to be a sequence in such that the empirical samples of this sequence converge in distribution to in the sense that

If is a random variable taking values in some set , its *Shannon entropy* is defined by the formula

We record the following standard and easily verified facts:

**Lemma 8 (Basic Shannon inequalities)** Let be random variables.

- (i) (Monotonicity) If is a deterministic function of , then . More generally, if is a deterministic function of and , then . If is a deterministic function of , then .
- (ii) (Subadditivity) One has , with equality iff , are independent. More generally, one has , with equality iff , are conditionally independent with respect to .
- (iii) (Chain rule) One has . More generally . In particular , and iff are independent; similarly, , and iff are conditionally independent with respect to .
- (iv) (Jensen’s inequality) If takes values in a finite set then , with equality iff is uniformly distributed in . More generally, if takes values in a set that depends on , then , with equality iff is uniformly distributed in after conditioning on .
- (v) (Gibbs inequality) If take values in the same finite set , then (we permit the right-hand side to be infinite, which makes the inequality vacuously true).

See this previous blog post for some intuitive analogies to understand Shannon entropy.
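The chain rule and subadditivity from Lemma 8 are easy to verify numerically on a toy joint distribution (an illustration only; base-2 logarithms throughout, as in the post):

```python
import math

# Numerical check of the chain rule H(X,Y) = H(X) + H(Y|X) and of
# subadditivity H(X,Y) <= H(X) + H(Y) on a small joint distribution.
def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {x: pxy[(x, 0)] + pxy[(x, 1)] for x in (0, 1)}
py = {y: pxy[(0, y)] + pxy[(1, y)] for y in (0, 1)}

# conditional entropy H(Y|X), computed from the conditional distributions
HY_given_X = sum(px[x] * H({y: pxy[(x, y)] / px[x] for y in (0, 1)})
                 for x in (0, 1))

assert abs(H(pxy) - (H(px) + HY_given_X)) < 1e-12    # chain rule
assert H(pxy) <= H(px) + H(py) + 1e-12               # subadditivity
```
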

Now we establish some inequalities of relevance to random sets.

We first observe that any small random set largely determines any of its subsets. Define a *random subset* of a random set to be a random set such that holds almost surely.

**Lemma 9 (Subsets of small sets have small conditional entropy)** Let be a random finite set.

- (i) One has for any random subset of .
- (ii) One has . If is almost surely non-empty, we can improve this to .

*Proof:* The set takes values in the power set of , so the claim (i) follows from Lemma 8(iv). (Note how it is convenient here that we are using the base for the logarithm.)

For (ii), apply Lemma 8(v) with and the geometric random variable for natural numbers (or for positive , if is non-empty).

Now we encode the property of a random variable being -spread in the language of Shannon entropy.

**Lemma 10 (Information-theoretic interpretation of spread)** Let be a random finite set that is -spread for some .

- (i) If is uniformly distributed amongst some finite collection of sets, then for all random subsets of .
- (ii) In the general case, if are an empirical sequence of , then as , where is drawn uniformly from and is a random subset of .

Informally: large random subsets of an -spread set necessarily have a lot of mutual information with . Conversely, one can bound the size of a random subset of an -spread set by bounding its mutual information with .

*Proof:* In case (i), it suffices by Lemma 8(iv) to establish the bound

Given a finite non-empty set and , let denote the collection of -element subsets of . A uniformly chosen element of is thus a random -element subset of ; we refer to the quantity as the *density* of this random subset, and as a *uniformly chosen random subset of of density *. (Of course, this is only defined when is an integer multiple of .) Uniformly chosen random subsets have the following information-theoretic properties:

**Lemma 11** Let be a finite non-empty set, let be a uniformly chosen random subset of of some density (which is a multiple of ).

- (i) (Spread) If is a random subset of , then
- (ii) (Absorption) If is a random subset of , then

*Proof:* To prove (i), it suffices by Lemma 10(i) to show that is -spread, which amounts to showing that

For (ii), by replacing with we may assume that are disjoint. From Lemma 8(iii) and Lemma 9(ii) it suffices to show that

which in turn is implied by for each . By Lemma 8(iv) it suffices to show that but this follows from multiplying together the inequalities for .

The following “relative product” construction will be important for us. Given a random variable and a deterministic function of that variable, one can construct a conditionally independent copy of subject to the condition , with the joint distribution

Note that this is usually *not* the same as starting with a completely independent copy of and then conditioning to the event , which has the slightly different distribution ; cf. Simpson’s paradox. By construction, has the same distribution as , and is conditionally independent of relative to . In particular, from Lemma 8 we have which we can also write as an “entropy Cauchy-Schwarz identity” This can be compared with the combinatorial inequality or equivalently whenever is a function on a non-empty finite set , which is easily proven as a consequence of Cauchy-Schwarz. One nice advantage of the entropy formalism over the combinatorial one is that the analogue of this instance of the Cauchy-Schwarz inequality automatically becomes an equality (this is related to the asymptotic equipartition property in the microstates interpretation of entropy).
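The relative product construction, and its difference from conditioning an independent copy, can be seen on a three-point example (our own hypothetical distribution and function; exact rational arithmetic makes the comparison clean):

```python
from fractions import Fraction as Fr

# Relative product: build the joint law of (X, X') where X' is a
# conditionally independent copy of X given F(X) = F(X').
p = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 4)}
F = lambda x: x % 2                    # F(1) = F(3) = 1, F(2) = 0

pF = {}
for x, px in p.items():
    pF[F(x)] = pF.get(F(x), Fr(0)) + px

# joint(x, x') = p(x) p(x') / p(F(x)) when F(x) = F(x'), and 0 otherwise
joint = {(x, y): p[x] * p[y] / pF[F(x)]
         for x in p for y in p if F(x) == F(y)}
assert sum(joint.values()) == 1

# X' has exactly the same distribution as X ...
marg = {}
for (x, y), q in joint.items():
    marg[y] = marg.get(y, Fr(0)) + q
assert marg == p

# ... unlike conditioning an *independent* copy on the event F(X) = F(X'),
# which reweights the law (the Simpson's-paradox caveat in the text).
Z = sum(p[x] * p[y] for x in p for y in p if F(x) == F(y))
cond = {}
for x in p:
    for y in p:
        if F(x) == F(y):
            cond[y] = cond.get(y, Fr(0)) + p[x] * p[y] / Z
assert cond != p
```
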

** — 2. Proof of refinement inequality — **

Now we have enough tools to prove Proposition 6. Let be as in that proposition. On the event that is empty we can set , so we can instead condition on the event that is non-empty. In particular

In order to use Lemma 10 we fix an empirical sequence for . We relabel as , and let be a parameter going off to infinity (so in particular is identified with a subset of ). We let be drawn uniformly at random from , and let be a uniform random subset of of density independent of . Observe from Stirling’s formula that converges in distribution to . Thus it will suffice to find another uniform random variable from such that

as , since we can pass to a subsequence in which converges in distribution to . From (8) we have

From we can form the random set ; we then form a conditionally independent copy of subject to the constraint

We use as the uniform variable to establish (9). The point is that the relation (11) implies that so it will suffice to show that and hence by (7) and hence by Lemma 8(ii) and independence of .

Now we try to relate the first term on the left-hand side with . Note from (11) that we have the identity and hence by Lemma 8(i) We estimate the relative entropy of here by selecting first , then , then . More precisely, using the chain rule and monotonicity (Lemma 8(i), (iii)) we have From Lemma 9(i) we have and Putting all this together, we conclude If we apply Lemma 10(ii) right away we will get the estimate which is a bound resembling (12), but the dependence on the parameters is too weak. To do better we return to the relative product construction to decouple some of the random variables here. From the tuple we can form the random variable , then form a conditionally independent copy of subject to the constraints From (11) and Lemma 8(i) we then have The point is that is now conditionally independent of relative to , so we can also rewrite the above conditional entropy as We now use the chain rule to disentangle the role of , writing the previous expression as From independence we have and from Lemma 9(i) we have We discard the negative term . Putting all this together, we obtain

Now, from the constraints (11), (13) we have

and

Thus by Lemma 8(i), (ii), followed by Lemma 10(ii) and Lemma 11(i), we have which when inserted back into (14) using and simplifies to and the claim follows.

### BiSTRO seminar

Simion Filip, Curtis McMullen, Martin Moeller and I are co-organizing an online seminar called **Bi**lliards, **S**urfaces à la **T**eichmueller and **R**iemann, **O**nline (BiSTRO).

Similarly to several other current online seminars, the idea of BiSTRO arose after several conferences, meetings, etc. in Teichmueller dynamics were cancelled due to the covid-19 crisis.

In any case, the first talk of the BiSTRO seminar will be delivered tomorrow by Kasra Rafi (at 18h CEST): in a nutshell, he will extend the scope of a result of Furstenberg on stationary measures (previously discussed in this blog here and here) to the context of mapping class groups.

Closing this short post, let me point out that the reader wishing to attend the BiSTRO seminar can find the relevant information at the bottom of the seminar’s official webpage.

### Maximal entropy measures and Birkhoff normal forms of certain Sinai billiards

Sinai billiards are a fascinating class of dynamical systems: despite their simple definition in terms of a dynamical billiard on a table given by a square or a two-torus with a certain number of dispersing obstacles, they present some “nasty” features (related to the existence of “grazing” collisions) placing them slightly beyond the standard theory of smooth uniformly hyperbolic systems.

The seminal works by several authors (including Sinai, Bunimovich, Chernov, …) paved the way to establish many properties of Sinai billiards including the so-called fast decay of correlations for the Liouville measure (a feature which is illustrated in this numerical simulation [due to Dyatlov] here).

Nevertheless, some ergodic-theoretical aspects of (certain) Sinai billiards were elucidated only very recently: for instance, the existence of a unique probability measure of maximal entropy was proved by Baladi–Demers in this article here.

An interesting remark made by Baladi–Demers is the fact that the maximal entropy measure should typically be different from the Liouville measure : in fact, forces a *rigidity* property for periodic orbits, namely, the Lyapunov exponents of *all* periodic orbits coincide (note that this is a huge number of conditions because the number of periodic orbits of period grows exponentially with ) and, in particular, there are no known examples of Sinai billiards where .
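For concreteness (the notation below is ours, not from the Baladi–Demers article), recall that for a periodic point $x$ of period $n$ of the billiard map $F$, the positive Lyapunov exponent along the orbit can be written as

```latex
\lambda(x) \;=\; \frac{1}{n}\,\log \rho\!\left(D_x F^{\,n}\right),
```

where $\rho$ denotes the spectral radius. The rigidity alluded to above is that $\lambda(x)$ must take the *same* value at every periodic point $x$, i.e., one constraint per periodic orbit.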

In a recent paper, De Simoi–Leguil–Vinhage–Yang showed that forces another rigidity property for periodic orbits, namely, the Birkhoff normal forms (see also Section 6 of Chapter 6 of the Hasselblatt–Katok book and Appendix A of the Moreira–Yoccoz article) at periodic orbits whose invariant manifolds produce homoclinic orbits are all *linear*. In particular, they proposed in Remark 5.6 of their preprint to prove that for certain Sinai billiards by computing the Taylor expansion of the derivative of the billiard map at a periodic orbit of period two and checking that its quadratic part is non-degenerate.

In this short post, we explain that the strategy from the previous paragraph can be implemented to verify the non-linearity of the Birkhoff normal form at a periodic orbit of period two of the Sinai billiards in triangular lattices considered by Baladi–Demers.

**Remark 1**

*In our subsequent discussion, we shall assume some familiarity with the basic aspects of billiard maps described in the classical book of Chernov–Markarian.*

**1. Sinai billiards in triangular lattices**

We consider the Sinai billiard table on the hexagonal torus obtained from the following picture (extracted from the Baladi–Demers article):

Here, the obstacles are disks of radii centered at the points of a triangular lattice and the distance between adjacent scatterers is .

As discussed in the Baladi–Demers paper, the billiard map has a probability measure of maximal entropy whenever (where the limit case corresponds to touching obstacles and the limit case produces “infinite horizon”).

**2. Billiard map near a periodic orbit of period two**

We want to study the billiard map near the periodic orbit of period two given by the horizontal segment between and .

A trajectory leaving the point in the direction travels along the line until it hits the boundary of the disk of radius and center for the first time . By definition, the time is the smallest solution of , i.e.,

Hence, where and , the billiard trajectory starting at in the direction hits the obstacle at

where and, after a specular reflection, it takes the direction .
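As a sanity check, the two ingredients just described admit a minimal numerical sketch (ours, not from the post; the radius, centre and starting data below are illustrative): the first hit time is the smaller non-negative root of the quadratic $|p + tv - c|^2 = r^2$, and the specular reflection replaces the velocity $v$ by $v - 2\langle v, n\rangle n$, where $n$ is the unit normal at the collision point.

```python
import math

def first_hit_time(p, v, c, r):
    """Smallest t >= 0 with |p + t*v - c| = r, or None if the ray misses the disk.

    Expanding |p + t*v - c|^2 = r^2 gives a quadratic a*t^2 + b*t + cc = 0;
    the first collision corresponds to the smaller root.
    """
    dx, dy = p[0] - c[0], p[1] - c[1]
    a = v[0] ** 2 + v[1] ** 2
    b = 2.0 * (dx * v[0] + dy * v[1])
    cc = dx * dx + dy * dy - r * r
    disc = b * b - 4.0 * a * cc
    if disc < 0:
        return None  # the line never meets the disk
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t >= 0 else None

def reflect(v, n):
    """Specular reflection of the velocity v about the unit normal n."""
    dot = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2.0 * dot * n[0], v[1] - 2.0 * dot * n[1])

# One bounce of a period-two orbit between two disks of radius r whose
# centres lie on the x-axis at distance 1 (illustrative values).
r = 0.3
p, v = (r, 0.0), (1.0, 0.0)        # leave the left disk horizontally
c = (1.0, 0.0)                     # centre of the right disk
t = first_hit_time(p, v, c, r)     # free flight time 1 - 2r
q = (p[0] + t * v[0], p[1] + t * v[1])
n = ((q[0] - c[0]) / r, (q[1] - c[1]) / r)  # unit normal at the collision point
w = reflect(v, n)                  # the trajectory is reflected back: (-1, 0)
```

For the period-two orbit itself the quadratic degenerates to a head-on collision, so the reflected direction is exactly the reversed one, as the code confirms.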

As explained on page 35 of Chernov–Markarian’s book, this geometric description of the billiard map near the periodic orbit of period two allows one to compute along the following lines. If we adopt the convention in Chernov–Markarian’s book of using arc-length parametrization on the obstacle and describing the angle with the normal direction pointing towards the interior of the table using a sign determined by the orientation of the obstacles so that this normal vector stays at the left of the tangent vectors to the obstacles, then the billiard map becomes

where

(and for ) and, moreover,

and

where

Since we also have

and

it follows that

where

and

In particular, the determinant of the quadratic part of the Taylor expansion of near the periodic orbit of period two is

Thus, for , one has

so that the quadratic part of the Taylor expansion of is non-degenerate because for .

**Remark 2**

*in the limit case .*

### Hédi Daboussi, in memoriam

I was very sad to learn today from R. de la Bretèche and É. Fouvry that Hédi Daboussi passed away yesterday. Hédi played an important part in my life; he was the first actual analytic number theorist that I met, one day at IHP in 1987, at the beginning of my first bachelor-thesis style project. (This was before the internet was widely available.) He and Fouvry advised me on this project, which was devoted to the large sieve, especially Selberg’s proof based on the Beurling functions. They also introduced me to Henryk Iwaniec, who was visiting Orsay at the time (in fact, the meeting at IHP was organized to coincide with a talk of Iwaniec).

Daboussi is probably best known outside the French analytic number theory community for two things: his elegant elementary proof of the Prime Number Theorem, found in 1983, which does not use Selberg’s identity, and which is explained in the nice book of Mendès-France and Tenenbaum, and the “Rencontres de théorie élémentaire et analytique des nombres”, which he organized for a long time as a weekly seminar in Paris, before they were transformed, after his retirement, into (roughly) monthly meetings, which are still known as the “Journées Daboussi”, and are organized by Régis de la Bretèche. The first of these was a two-day meeting in 2006 in honor of Hédi.

For me, the original Monday seminar organized by Daboussi was especially memorable, both because I gave my first “real” mathematics lecture there (I think that it was about my bachelor project), and because on another occasion (either the same time or close to it) I first met Philippe Michel in Hédi’s seminar. It is very clear to me that, without him, my life would have been very different, and I will always remember him for that.