Mathematical blogs

Registration for the virtual ICM is now open

Terence Tao - Thu, 2022-06-02 23:16

The (now virtual) 2022 International Congress of Mathematicians, which will be held on July 6-14, now has open registration (free of charge).

I’ll also take this opportunity to mention that there are a large number of supporting satellite events for the virtual ICM, which are listed on this web page. I’d like to draw particular attention to the public lecture satellite event, now hosted by the London Mathematical Society, that will feature three speakers:

(As with many other of the satellite events, these public lectures will require a separate registration from that of the main ICM.)

Categories: Mathematical blogs

Notes on inverse theorem entropy

Terence Tao - Sat, 2022-05-28 00:13

Let $G$ be a finite set of order $N$; in applications $G$ will typically be something like a finite abelian group, such as the cyclic group ${\bf Z}/N{\bf Z}$. Let us define a $1$-bounded function to be a function $f: G \to {\bf C}$ such that $|f(n)| \leq 1$ for all $n \in G$. There are many seminorms $\| \|$ of interest that one places on functions $f: G \to {\bf C}$ that are bounded by $1$ on $1$-bounded functions, such as the Gowers uniformity seminorms $\| \|_{U^k(G)}$ for $k \geq 1$ (which are genuine norms for $k \geq 2$). All seminorms in this post will be implicitly assumed to obey this property.
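To make the simplest of these seminorms concrete, here is a minimal numerical sketch (my own illustration, not from the original post), assuming the standard averaged formula $\|f\|_{U^2}^4 = {\bf E}_{x,h_1,h_2 \in G}\, f(x)\overline{f(x+h_1)}\,\overline{f(x+h_2)}\,f(x+h_1+h_2)$ on $G = {\bf Z}/N{\bf Z}$; the function and variable names are illustrative choices.

```python
import numpy as np

def u2_norm(f):
    """Brute-force Gowers U^2 norm of f: Z/NZ -> C via
    ||f||_{U^2}^4 = E_{x,h1,h2} f(x) conj(f(x+h1)) conj(f(x+h2)) f(x+h1+h2)."""
    N = len(f)
    total = 0.0
    for x in range(N):
        for h1 in range(N):
            for h2 in range(N):
                total += (f[x] * np.conj(f[(x + h1) % N])
                          * np.conj(f[(x + h2) % N]) * f[(x + h1 + h2) % N])
    return abs(total / N**3) ** 0.25

N = 16
linear = np.exp(2j * np.pi * 3 * np.arange(N) / N)  # structured: a linear phase
rng = np.random.default_rng(0)
signs = rng.choice([-1.0, 1.0], size=N)             # "random", hence uniform
print(u2_norm(linear))   # 1.0: maximal correlation with structure
print(u2_norm(signs))    # roughly N^(-1/4): nearly Gowers-uniform
```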

In additive combinatorics, a significant role is played by inverse theorems, which abstractly take the following form for certain choices of seminorm $\| \|$, some parameters $\eta, \varepsilon > 0$, and some class ${\mathcal F}$ of $1$-bounded functions:

Theorem 1 (Inverse theorem template) If $f$ is a $1$-bounded function with $\|f\| \geq \eta$, then there exists $F \in {\mathcal F}$ such that $|\langle f, F \rangle| \geq \varepsilon$, where $\langle,\rangle$ denotes the usual inner product $\langle f, F \rangle := {\bf E}_{n \in G} f(n) \overline{F(n)}$.

Informally, one should think of $\eta$ as being somewhat small but fixed independently of $N$, $\varepsilon$ as being somewhat smaller but depending only on $\eta$ (and on the seminorm), and ${\mathcal F}$ as representing the “structured functions” for these choices of parameters. There is some flexibility in exactly how to choose the class ${\mathcal F}$ of structured functions, but intuitively an inverse theorem should become more powerful when this class is small. Accordingly, let us define the $(\eta,\varepsilon)$-entropy of the seminorm $\| \|$ to be the least cardinality of ${\mathcal F}$ for which such an inverse theorem holds. Seminorms with low entropy are ones for which inverse theorems can be expected to be a useful tool. This concept arose in some discussions I had with Ben Green many years ago, but never appeared in print, so I decided to record some observations we had on this concept here on this blog.

Lebesgue norms for have exponentially large entropy (and so inverse theorems are not expected to be useful in this case):

Proposition 2 ( norm has exponentially large inverse entropy) Let and . Then the -entropy of is at most . Conversely, for any , the -entropy of is at least for some absolute constant .

Proof: If is -bounded with , then we have

and hence by the triangle inequality we have

where is either the real or imaginary part of , which takes values in . If we let be rounded to the nearest multiple of , then by the triangle inequality again we have

There are only at most possible values for each value of , and hence at most possible choices for . This gives the first claim.

Now suppose that there is an -inverse theorem for some of cardinality . If we let be a random sign function (so the are independent random variables taking values in with equal probability), then there is a random such that

and hence by the pigeonhole principle there is a deterministic such that

On the other hand, from the Hoeffding inequality one has

for some absolute constant , hence

as claimed.
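The Hoeffding step can also be seen empirically: for any fixed $1$-bounded function, its normalised inner product with a random sign function is typically of size $N^{-1/2}$, which is why exponentially many structured functions are needed to serve all sign patterns. A quick Monte Carlo sketch (all names and parameters are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1024, 2000
g = np.exp(2j * np.pi * rng.random(N))           # a fixed 1-bounded function
eps = rng.choice([-1.0, 1.0], size=(trials, N))  # random sign functions
inner = np.abs(eps @ np.conj(g)) / N             # |E_n eps(n) conj(g(n))|
print(inner.mean(), np.quantile(inner, 0.99))    # both of size ~ N^(-1/2)
print(1 / np.sqrt(N))                            # reference scale 0.03125
```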

Most seminorms of interest in additive combinatorics, such as the Gowers uniformity norms, are bounded by some finite norm thanks to Hölder’s inequality, so from the above proposition and the obvious monotonicity properties of entropy, we conclude that all Gowers norms on finite abelian groups have at most exponential inverse theorem entropy. But we can do significantly better than this:

  • For the seminorm , one can simply take to consist of the constant function , and the -entropy is clearly equal to for any .
  • For the norm, the standard Fourier-analytic inverse theorem asserts that if then for some Fourier character . Thus the -entropy is at most . (A short derivation of this inverse theorem is sketched just after this list.)
  • For the norm on cyclic groups for , the inverse theorem proved by Green, Ziegler, and myself gives an -inverse theorem for some and consisting of nilsequences for some filtered nilmanifold of degree in a finite collection of cardinality , some polynomial sequence (it was subsequently observed by Candela and Sisask (see also Manners) that one can choose this sequence to be -periodic), and some Lipschitz function of Lipschitz norm . By the Arzelà–Ascoli theorem, the number of possible (up to uniform errors of size at most , say) is . By standard arguments one can also ensure that the coefficients of the polynomial are , and then by periodicity there are only such polynomials. As a consequence, the -entropy is of polynomial size (a fact that seems to have first been implicitly observed in Lemma 6.2 of this paper of Frantzikinakis; thanks to Ben Green for this reference). One can obtain more precise dependence on using the quantitative version of this inverse theorem due to Manners; back-of-the-envelope calculations using Section 5 of that paper suggest to me that one can take to be polynomial in and the entropy to be of the order , or alternatively one can reduce the entropy to at the cost of degrading to .
  • If one replaces the cyclic group by a vector space over some fixed finite field of prime order (so that ), then the inverse theorem of Ziegler and myself (available in both high and low characteristic) allows one to obtain an -inverse theorem for some and the collection of non-classical degree polynomial phases from to , which one can normalize to equal at the origin, and then by the classification of such polynomials one can calculate that the entropy is of quasipolynomial size in . By using the recent work of Gowers and Milicevic, one can make the dependence on here more precise, but we will not perform these calculations here.
  • For the norm on an arbitrary finite abelian group, the recent inverse theorem of Jamneshan and myself gives (after some calculations) a bound of the polynomial form on the -entropy for some , which one can improve slightly to if one degrades to , where is the maximal order of an element of , and is the rank (the number of elements needed to generate ). This bound is polynomial in in the cyclic group case and quasipolynomial in general.
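For the $U^2$ item above, the standard Fourier-analytic derivation takes only a few lines; here is a sketch (with my own choice of normalisation for the Fourier transform) for $G = {\bf Z}/N{\bf Z}$:

```latex
% Write \hat f(\xi) := \mathbf{E}_{n \in G} f(n) e^{-2\pi i n \xi / N}. Then
\[
  \|f\|_{U^2(G)}^4 \;=\; \sum_{\xi \in G} |\hat f(\xi)|^4
  \;\le\; \Big(\sup_{\xi \in G} |\hat f(\xi)|\Big)^2 \sum_{\xi \in G} |\hat f(\xi)|^2
  \;\le\; \Big(\sup_{\xi \in G} |\hat f(\xi)|\Big)^2,
\]
% using Plancherel and the 1-boundedness of f. Hence \|f\|_{U^2(G)} \ge \eta
% forces |\langle f, F \rangle| = |\hat f(\xi)| \ge \eta^2 for the character
% F(n) = e^{2\pi i n \xi / N} at some frequency \xi, so N characters suffice.
```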

For general finite abelian groups , we do not yet have an inverse theorem of comparable power to the ones mentioned above that give polynomial or quasipolynomial upper bounds on the entropy. However, there is a cheap argument that at least gives some subexponential bounds:

Proposition 3 (Cheap subexponential bound) Let and , and suppose that is a finite abelian group of order for some sufficiently large . Then the -entropy of is at most .

Proof: (Sketch) We use a standard random sampling argument, of the type used for instance by Croot-Sisask or Briet-Gopi (thanks to Ben Green for this latter reference). We can assume that for some sufficiently large , since otherwise the claim follows from Proposition 2.

Let be a random subset of with the events being iid with probability to be chosen later, conditioned on the event . Let be a -bounded function. By a standard second moment calculation, we see that with probability at least , we have

Thus, by the triangle inequality, if we choose for some sufficiently large , then for any -bounded with , one has with probability at least that

We can write the left-hand side as where is the randomly sampled dual function

Unfortunately, is not -bounded in general, but we have

and the right-hand side can be shown to be on the average, so we can condition on the event that the right-hand side is without significant loss in failure probability.

If we then let be rounded to the nearest Gaussian integer multiple of in the unit disk, one has from the triangle inequality that

where is the discretised randomly sampled dual function

For any given , there are at most places where can be non-zero, and in those places there are possible values for . Thus, if we let be the collection of all possible associated to a given , the cardinality of this set is , and for any with , we have

with probability at least .

Now we remove the failure probability by independent resampling. By rounding to the nearest Gaussian integer multiple of in the unit disk for a sufficiently small , one can find a family of cardinality consisting of -bounded functions of norm at least such that for every -bounded with there exists such that

Now, let be independent samples of for some to be chosen later. By the preceding discussion, we see that with probability at least , we have

for any given , so by the union bound, if we choose for a large enough , we can find such that

for all , and hence by the triangle inequality

Taking to be the union of the (applying some truncation and rescaling to these -bounded functions to make them -bounded, and then -bounded), we obtain the claim.
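The engine of this proof is the second-moment sampling step, which is easy to check numerically: the empirical average of a $1$-bounded function over a random sample of size $m$ matches the true average to accuracy about $m^{-1/2}$, uniformly in the size $N$ of the group. A minimal sketch (sampling with replacement for simplicity; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 10**6, 400                       # huge group, small random sample
f = np.exp(2j * np.pi * rng.random(N))  # an arbitrary 1-bounded function
full = f.mean()                         # the true average over the group
sample = f[rng.integers(0, N, size=m)].mean()
print(abs(full - sample))               # error of size ~ m^(-1/2) = 0.05,
                                        # independent of the group size N
```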

One way to obtain lower bounds on the inverse theorem entropy is to produce a collection of almost orthogonal functions with large norm. More precisely:

Proposition 4 Let be a seminorm, let , and suppose that one has a collection of -bounded functions such that for all , one has for all but at most choices of for all distinct . Then the -entropy of is at least .

Proof: Suppose we have an -inverse theorem with some family . Then for each there is such that . By the pigeonhole principle, there is thus such that for all in a subset of of cardinality at least :

We can sum this to obtain

for some complex numbers of unit magnitude. By Cauchy-Schwarz, this implies

and hence by the triangle inequality

On the other hand, by hypothesis we can bound the left-hand side by . Rearranging, we conclude that

and hence

giving the claim.

Thus for instance:

  • For the norm, one can take to be the family of linear exponential phases with and , and obtain a linear lower bound of for the -entropy, thus matching the upper bound of up to constants when is fixed.
  • For the norm, a similar calculation using polynomial phases of degree , combined with the Weyl sum estimates, gives a lower bound of for the -entropy for any fixed ; by considering nilsequences as well, together with nilsequence equidistribution theory, one can replace the exponent here by some quantity that goes to infinity as , though I have not attempted to calculate the exact rate.
  • For the norm, another similar calculation using polynomial phases of degree should give a lower bound of for the -entropy, though I have not fully performed the calculation.

We close with one final example. Suppose is a product of two sets of cardinality , and we consider the Gowers box norm

One possible choice of class here are the indicators of “rectangles” with , (cf. this previous blog post on cut norms). By standard calculations, one can use this class to show that the -entropy of is , and a variant of the proof of the second part of Proposition 2 shows that this is the correct order of growth in . In contrast, a modification of Proposition 3 only gives an upper bound of the form (the bottleneck is ensuring that the randomly sampled dual functions stay bounded in ), which shows that while this cheap bound is not optimal, it can still broadly give the correct “type” of bound (specifically, intermediate growth between polynomial and exponential).
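For concreteness, here is a small sketch of the box norm computation, assuming the usual formula $\|f\|_{\square}^4 = {\bf E}_{x,x',y,y'} f(x,y)\overline{f(x',y)}\,\overline{f(x,y')}\,f(x',y')$; it also checks the easily verified fact that a rectangle indicator $1_{A \times B}$ has box norm $(\alpha\beta)^{1/2}$, where $\alpha, \beta$ are the densities of $A, B$. The code and names are illustrative only.

```python
import numpy as np

def box_norm(F):
    """Gowers box norm of F on X x Y, via
    ||F||^4 = E_{x,x',y,y'} F[x,y] conj(F[x',y]) conj(F[x,y']) F[x',y']."""
    n, m = F.shape
    G = F @ np.conj(F).T   # G[x,x'] = sum_y F[x,y] conj(F[x',y])
    return (np.abs(G) ** 2).sum() ** 0.25 / (n * m) ** 0.5

n = 64
A = np.zeros(n); A[:16] = 1.0   # density alpha = 1/4
B = np.zeros(n); B[:32] = 1.0   # density beta  = 1/2
F = np.outer(A, B)              # indicator of the rectangle A x B
print(box_norm(F))                    # ~ 0.3536
print((A.mean() * B.mean()) ** 0.5)   # sqrt(alpha * beta), the same value
```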
Categories: Mathematical blogs

Partially specified mathematical objects, ambient parameters, and asymptotic notation

Terence Tao - Wed, 2022-05-11 03:12

In orthodox first-order logic, variables and expressions are only allowed to take one value at a time; a variable , for instance, is not allowed to equal and simultaneously. We will call such variables completely specified. If one really wants to deal with multiple values of objects simultaneously, one is encouraged to use the language of set theory and/or logical quantifiers to do so.

However, the ability to allow expressions to become only partially specified is undeniably convenient, and also rather intuitive. A classic example here is that of the quadratic formula:
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \]
Strictly speaking, the expression is not well-formed according to the grammar of first-order logic; one should instead use something like
\[ x = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \ \vee\ x = \frac{-b - \sqrt{b^2 - 4ac}}{2a} \]
or
\[ x \in \left\{ \frac{-b + \sqrt{b^2 - 4ac}}{2a},\ \frac{-b - \sqrt{b^2 - 4ac}}{2a} \right\} \]
or
\[ x = \frac{-b + \epsilon \sqrt{b^2 - 4ac}}{2a} \hbox{ for some } \epsilon \in \{-1,+1\} \]
in order to strictly adhere to this grammar. But none of these three reformulations are as compact or as conceptually clear as the original one. In a similar spirit, a mathematical English sentence such as

is also not a first-order sentence; one would instead have to write something like

or

instead. These reformulations are not all that hard to decipher, but they do have the aesthetically displeasing effect of cluttering an argument with temporary variables such as which are used once and then discarded.

Another example of partially specified notation is the innocuous notation. For instance, the assertion

when written formally using first-order logic, would become something like

which is not exactly an elegant reformulation. Similarly with statements such as

or

Below the fold I’ll try to assign a formal meaning to partially specified expressions such as (1), for instance allowing one to condense (2), (3), (4) to just

When combined with another common (but often implicit) extension of first-order logic, namely the ability to reason using ambient parameters, we become able to formally introduce asymptotic notation such as the big-O notation or the little-o notation . We will explain how to do this at the end of this post.

— 1. Partially specified objects —

Let’s try to assign a formal meaning to partially specified mathematical expressions. We now allow expressions to not necessarily be a single (completely specified) mathematical object, but more generally a partially specified instance of a class of mathematical objects. For instance, $\pm 1$ denotes a partially specified instance of the class $\{-1,+1\}$ of numbers consisting of $-1$ and $+1$; that is to say, a number which is either $-1$ or $+1$. A single completely specified mathematical object, such as the number $3$, can now be also interpreted as the (unique) instance of a class $\{3\}$ consisting only of $3$. Here we are using set notation to describe classes, ignoring for now the well known issue from Russell’s paradox that some extremely large classes are technically not sets, as this is not the main focus of our discussion here.

For reasons that will become clearer later, we will use the symbol rather than to denote the assertion that two partially specified objects range across exactly the same class. That is to say, we use as a synonym for . Thus, for instance, it is not the case that , because the class has instances that does not.

Any finite sequence of objects can also be viewed as a partially specified instance of a class , which I will denote in analogy with regular expressions, thus we now also have a new name for the set . (One could in fact propose as the notation for , as is done implicitly in assertions such as “ is true for “, but this creates notational conflicts with other uses of the comma in mathematics, such as the notation for an -tuple, so I will use the regular expression symbol here to avoid ambiguity.) For instance, denotes a partially specified instance from the class , that is to say a number which is either , , or . Similarly, we have

and

One can mimic set builder notation and denote a partially specified instance of a class as (or one can replace by any other variable name one pleases); similarly, one can use to denote a partially specified element of that obeys the predicate . Thus for instance

would denote a partially specified odd number. By a slight abuse of notation, we can abbreviate as or simply , if the domain of is implicitly understood from context. For instance, under this convention,

refers to a partially specified odd integer, while

refers to a partially specified integer. Under these conventions, it now becomes theoretically possible that the class one is drawing from becomes empty, and instantiation becomes vacuous. For instance, with our conventions, refers to a partially specified odd perfect number, which is conjectured to not exist. As it turns out, our notation can handle instances of empty classes without difficulty (basically thanks to the concept of a vacuous truth), but we will avoid dwelling on this edge case much here since this concept is not intuitive for beginners. (But if one does want to confront this possibility, one can use a symbol such as to denote an instance of the empty class, i.e., an object that has no specifications whatsoever.)

The symbol introduced above can now be extended to be a binary operation on partially specified objects, defined by the formula

Thus for instance

and

One can then define other logical operations on partially specified objects if one wishes. For instance, we could define an “and” operator by defining

Thus for instance

and

(Here we are deviating from the syntax of regular expressions, but I am not insisting that the entirety of mathematical notation conform to that syntax, and in any event regular expressions do not appear to have a direct analogue of this “and” operation.) We leave it to the reader to propose other logical operations on partially specified objects, though the “or” operator and the “and” operator will suffice for our purposes.

Any operation on completely specified mathematical objects can be extended to partially specified mathematical objects by applying that operation to arbitrary instances of the class , with the convention that if a class appears multiple times in an expression, then we allow each instance of that class to take different values. For instance, if are partially specified numbers, we can define to be the class of all numbers formed by adding an instance of to an instance of (this is analogous to the operation of Minkowski addition for sets, or interval arithmetic in numerical analysis). For example,

and

(recall that there is no requirement that the signs here align). Note that

but that

So we now see the first sign that some care has to be taken with the law of substitution; we have

but we do not have

However, the law of substitution works fine as long as the variable being substituted appears exactly once on both sides of the equation.
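These conventions are easy to model computationally: represent a partially specified quantity by the set of its possible instances, and instantiate each occurrence of a class independently. The toy sketch below (the encoding is my own, not part of the post) reproduces exactly the substitution failure just described, with $\pm 1 + \pm 1$ differing from $2 \cdot (\pm 1)$:

```python
# A partially specified quantity, modeled as the set of its possible instances.
pm1 = {-1, 1}

def add(X, Y):
    # Each occurrence of a class is instantiated independently (Minkowski sum).
    return {x + y for x in X for y in Y}

def scale(c, X):
    return {c * x for x in X}

print(add(pm1, pm1))  # {-2, 0, 2}: the two occurrences of +-1 need not align
print(scale(2, pm1))  # {-2, 2}: substituting x = +-1 into x + x = 2x fails
```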

One can have an unbounded number of partially specified instances of a class, for instance will be the class of all integers between and with the same parity as .

Remark 1 When working with signs , one sometimes wishes to keep all signs aligned, with denoting the sign opposite to , thus for instance with this notation one would have the geometric series formula

whenever . However, this notation is difficult to place in the framework used in this blog post without causing additional confusion, and as such we will not discuss it further here. (The syntax of regular expressions does have some tools for encoding this sort of alignment, but in first-order logic we also have the perfectly serviceable tool of named variables and quantifiers (or plain old mathematical English) to do so also.)

One can also extend binary relations, such as or , to partially specified objects, by requiring that every instance on the left-hand side of the relation relates to some instance on the right-hand side (thus binary relations become sentences). Again, if class is instantiated multiple times, we allow different appearances to correspond to different classes. For instance, the statement is true, because every instance of is less than or equal to :

But the statement is false. Similarly, the statement is true, because is an instance of :

The statement is also true, because every instance of is less than some instance of :

The relationship between a partially specified representative of a class can then be summarised as

Note how this convention treats the left-hand side and right-hand side of a relation involving partially specified expressions asymmetrically. In particular, for partially specified expressions , the relation is no longer equivalent to ; the former states that every instance of is also an instance of , while the latter asserts the converse. For instance, is a true statement, but is a false statement (much as “ is prime” is a true statement (or “ in our notation), but “primes are ” (or in our notation) is false). In particular, we see a distinction between equality and equivalence ; indeed, holds if and only if and . On the other hand, as can be easily checked, the following three basic laws of mathematics remain valid for partially specified expressions :

  • (i) (Reflexivity) .
  • (ii) (Transitivity) If and , then . Similarly, if and , then , etc..
  • (iii) (Substitution) If , then for any function . Similarly, if , then for any monotone function , etc..
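The asymmetric reading of $=$ above, together with these three laws, can be modelled with sets: read $X = Y$ as “every instance of $X$ is an instance of $Y$” and $X \equiv Y$ as equality of the two classes. A toy sketch (the encoding is mine):

```python
# Asymmetric "=" of this section: instance/containment. Symmetric equivalence:
# the two classes coincide. Reflexivity and transitivity are then inherited
# from the subset relation.
def eq(X, Y):
    return X <= Y          # "X = Y": every instance of X is an instance of Y

def equiv(X, Y):
    return X == Y          # "X ≡ Y": same class of instances

three = {3}
odd = set(range(1, 100, 2))   # odd numbers (truncated for illustration)
print(eq(three, odd))         # True:  "3 is odd"
print(eq(odd, three))         # False: "odd numbers are 3" fails
print(equiv(three, odd))      # False: equality needs both directions
```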

These conventions for partially specified expressions align well with informal mathematical English. For instance, as discussed in the introduction, the assertion

can now be expressed as

Similarly, the even Goldbach’s conjecture can now be stated as

while the Archimedean property of the reals can be reformulated as the assertion that

for any . Note also how the equality symbol for partially specified expressions corresponds well with the multiple meanings of the word “is” in English (consider for instance “two plus two is four”, “four is even”, and “the sum of two odd numbers is even”); the set-theoretic counterpart of this concept would be a sort of amalgam of the ordinary equality relation , the inclusion relation , and the subset relation .

There are, however, a number of caveats one has to keep in mind when dealing with formulas involving partially specified objects. The first, which has already been mentioned, is a lack of symmetry: does not imply ; similarly, does not imply . The second is that negation behaves very strangely, so much so that one should basically avoid using partially specified notation for any sentence that will eventually get negated. For instance, observe that the statements and are both true, while and are both false. In fact, the negation of such statements as or involving partially specified objects usually cannot be expressed succinctly in partially specified notation, and one must resort to using several quantifiers instead. (In the language of the arithmetic hierarchy, the negation of a sentence is a sentence, rather than another sentence.)

Another subtlety, already mentioned earlier, arises from our choice to allow different instantiations of the same class to refer to different instances, namely that the law of universal instantiation does not always work if the symbol being instantiated occurs more than once on the left-hand side. For instance, the identity

is of course true for all real numbers , but if one naively substitutes in the partially specified expression for one obtains the claim

which is a false statement under our conventions (because the two instances of the sign do not have to match). However, there is no problem with repeated instantiations on the right-hand side, as long as there is at most a single instance on the left-hand side. For instance, starting with the identity

we can validly instantiate the partially specified expression for to obtain

A common practice that helps avoid these sorts of issues is to keep the partially specified quantities on the right-hand side of one’s relations, or if one is working with a chain of relations such as , to keep the partially specified quantities away from the left-most side (so that , , and are allowed to be multi-valued, but not ). This doesn’t automatically prevent all issues (for instance, one may still be tempted to “cancel” an expression such as that might arise partway through a chain of relations), but it can reduce the chance of accidentally making an error.

One can of course translate any formula that involves partially specified objects into a more orthodox first-order logic sentence by inserting the relevant quantifiers in the appropriate places – but note that the variables used in quantifiers are always completely specified, rather than partially specified. For instance, if one expands “” (for some completely specified quantities ) as “there exists such that “, the quantity is completely specified; it is not the partially specified . (If or were also partially specified, the first-order translation of the expression “” would be more complicated, as it would need more quantifiers.)

One can combine partially specified notation with set builder notation, for instance the set is just the four-element set , since these are indeed the four real numbers for which the formula is true. I would however avoid combining particularly heavy uses of set-theoretic notation with partially specified notation, as it may cause confusion.

Our examples above of partially specified objects have been drawn from number systems, but one can use this notation for other classes of objects as well. For instance, within the class of functions from the reals to themselves, one can make assertions like

where is the class of monotone increasing functions; similarly we have

(with denoting the Fourier transform) and so forth. Or, in the class of topological spaces, we have for instance

and

while conversely the classifying space construction gives (among other things)

Restricting to metric spaces, we have the well known equivalences

Note in the last few examples, we are genuinely working with proper classes now, rather than sets. As the above examples hopefully demonstrate, mathematical sentences involving partially specified objects can align very well with the syntax of informal mathematical English, as long as one takes care to distinguish the asymmetric equality relation from the symmetric equivalence relation .

As an example of how such notation might be integrated into an actual mathematical argument, we prove a simple and well known topological result in this notation:

Proposition 2 Let $F: X \to Y$ be a continuous bijection from a compact space $X$ to a Hausdorff space $Y$. Then $F$ is a homeomorphism.

Proof: We have

$F(\hbox{open in } X) = F(X \setminus (\hbox{closed in } X))$

$= Y \setminus F(\hbox{closed in } X)$ (since $F$ is a bijection)

$= Y \setminus F(\hbox{compact in } X)$ (since $X$ is compact)

$= Y \setminus (\hbox{compact in } Y)$ (since $F$ is continuous)

$= Y \setminus (\hbox{closed in } Y)$ (since $Y$ is Hausdorff)

$= \hbox{open in } Y.$

Thus $F$ maps open sets to open sets, hence $F^{-1}$ is continuous. Since $F$ was already continuous, $F$ is a homeomorphism.

— 2. Working with parameters —

In order to introduce asymptotic notation, we will need to combine the above conventions for partially specified objects with a separate common adjustment to the grammar of mathematical logic, namely the ability to work with ambient parameters. This is a special case of the more general situation of interpreting logic over an elementary topos, but we will not develop the general theory of topoi here. As this adjustment is orthogonal to the adjustments in the preceding section, we shall for simplicity revert back temporarily to the traditional notational conventions for completely specified objects, and not refer to partially specified objects at all in this section.

In the formal language of first-order logic, variables such as are understood to range in various domains of discourse (e.g., could range in the real numbers, could range in the natural numbers, and in the class of sets). One can then construct various formulas, such as , which involve zero or more input variables (known as free variables), and have a truth value in for any given choice of free variables. For instance, might be true for some triples , and false for others. One can create formulas either by applying relations to various terms (e.g., applying the inequality relation to the terms gives the formula with free variables ), or by combining existing formulas together with logical connectives (such as ) or quantifiers ( and ). Formulas with no free variables (e.g. ) are known as sentences; once one fixes the domains of discourse, sentences are either true or false. We will refer to this first-order logic as orthodox first-order logic, to distinguish it from the parameterised first-order logic we shall shortly introduce.

We now generalise this setup by working relative to some ambient set of parameters – some finite collection of variables that range in some specified sets (or classes) and may be subject to one or more constraints. For instance, one may be working with some natural number parameters with the constraint ; we will keep this particular choice of parameters as a running example for the discussion below. Once one selects these parameters, all other variables under consideration are not just single elements of a given domain of discourse, but rather a family of such elements, parameterised by the given parameters; we will refer to these variables as parameterised variables to distinguish them from the orthodox variables of first-order logic. For instance, with the above parameters, when one refers to a real number , one now refers not just to a single element of , but rather to a function that assigns a real number to each choice of parameters ; we will refer to such a function as a parameterised real, and often write to indicate the dependence on parameters. Each of the ambient parameters can of course be viewed as a parameterised variable, thus for instance can (by abuse of notation) be viewed as the parameterised natural number that maps to for each choice of parameters.

The specific ambient set of parameters, and the constraints on them, tends to vary as one progresses through various stages of a mathematical argument, with these changes being announced by various standard phrases in mathematical English. For instance, if at some point a proof contains a sentence such as “Let be a natural number”, then one is implicitly adding to the set of parameters; if one later states “Let be a natural number such that “, then one is implicitly also adding to the set of parameters and imposing a new constraint . If one divides into cases, e.g., “Suppose now that is odd… now suppose instead that is even”, then the constraint that is odd is temporarily imposed, then replaced with the complementary constraint that is even, then presumably the two cases are combined and the constraint is removed completely. A bit more subtly, parameters can disappear at the conclusion of a portion of an argument (e.g., at the end of a proof of a lemma or proposition in which the parameter was introduced), replaced instead by a summary statement (e.g., “To summarise, we have shown that whenever are natural numbers with , then …”) or by the statement of the lemma or proposition in whose proof the parameter was temporarily introduced. One can also remove a variable from the set of parameters by specialising it to a specific value.

Any term that is well-defined for individual elements of a domain is also well-defined for parameterised elements of the domain by pointwise evaluation. For instance, if and are parameterised real numbers, one can form the sum , which is another parameterised real number, by the formula

Given a relation between terms involving parameterised variables, we will interpret the relation as being true (for the given choice of parameterised variables) if it holds for all available choices of parameters (obeying all ambient constraints), and false otherwise (i.e., if it fails for at least one choice of parameters). For instance, the relation

would be interpreted as true if one has for all choices of parameters , and false otherwise.

Remark 3 In the framework of nonstandard analysis, the interpretation of truth is slightly different; the above relation would be considered true if the set of parameters for which the relation holds lies in a given (non-principal) ultrafilter. The main reason for doing this is that it allows for a significantly more general transfer principle than the one available in this setup; however we will not discuss the nonstandard analysis framework further here. (Our setup here is closer in spirit to the “cheap” version of nonstandard analysis discussed in this previous post.)

With this convention an annoying subtlety emerges with regard to boolean connectives (conjunction , disjunction , implication , and negation ), in that one now has to distinguish between internal interpretation of the connectives (applying the connectives pointwise for each choice of parameters before quantifying over parameters), and external interpretation (applying the connectives after quantifying over parameters); there is not a general transfer principle from the former to the latter. For instance, the sentence

is false in parameterised logic, since not every choice of parameter is odd. On the other hand, the internal negation

or equivalently

is also false in parameterised logic, since not every choice of parameter is even. To put it another way, the internal disjunction

is true in parameterised logic, but the individual statements and are not (so the external disjunction of these statements is false). To maintain this distinction, I will reserve the boolean symbols () for internal boolean connectives, and reserve the corresponding English connectives (“and”, “or”, “implies”, “not”) for external boolean connectives.

Because of this subtlety, orthodox dichotomies and trichotomies do not automatically transfer over to the parameterised setting. In the orthodox natural numbers, a natural number is either odd or even; but a parameterised natural number could be neither even all the time nor odd all the time. Similarly, given two parameterised real numbers , it could be that none of the statements , , are true all the time. However, one can recover these dichotomies or trichotomies by subdividing the parameter space into cases. For instance, in the latter example, one could divide the parameter space into three regions, one where is always true, one where is always true, and one where is always true. If one can prove a single statement in all three subregions of parameter space, then of course this implies the statement in the original parameter space. So in practice one can still use dichotomies and trichotomies in parameterised logic, so long as one is willing to subdivide the parameter space into cases at various stages of the argument and recombine them later.

There is a similar distinction between internal quantification (quantifying over orthodox variables before quantifying over parameters), and external quantification (quantifying over parameterised variables after quantifying over parameters); we will again maintain this distinction by reserving the symbols for internal quantification and the English phrases “for all” and “there exists” for external quantification. For instance, the assertion

when interpreted in parameterised logic, means that for all parameterised reals and , the assertion holds for all . In this case it is clear that this assertion is true and is in fact equivalent to the orthodox sentence . More generally, we do have a restricted transfer principle in that any true sentence in orthodox logic that involves only quantifiers and no boolean connectives, will transfer over to parameterised logic (at least if one is willing to use the axiom of choice freely, which we will do in this post). We illustrate this (somewhat arbitrarily) with the Lagrange four square theorem

This sentence, true in orthodox logic, implies the parameterised assertion that for every parameterised natural number , there exist parameterised natural numbers , , , , such that for all choice of parameters . To see this, we can Skolemise the four-square theorem (5) to obtain functions , , , such that

for all orthodox natural numbers . Then to obtain the parameterised claim, one simply sets , , , and . Similarly for other sentences that avoid boolean connectives. (There are some further classes of sentences that use boolean connectives in a restricted fashion that can also be transferred, but we will not attempt to give a complete classification of such classes here; in general it is better to work out some examples of transfer by hand to see what can be safely transferred and which ones cannot.)
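Skolemisation is very concrete here: it simply asks for a function that returns a witness for each input. Below is a brute-force sketch of such a Skolem function for the four-square theorem (the function name and search strategy are my own; any correct witness-finder would do):

```python
import math

def four_squares(n):
    """Brute-force Skolem witness for Lagrange's four-square theorem:
    returns (a, b, c, d) with a^2 + b^2 + c^2 + d^2 = n and a <= b <= c <= d."""
    r = math.isqrt(n)
    for a in range(r + 1):
        for b in range(a, r + 1):
            for c in range(b, r + 1):
                d2 = n - a * a - b * b - c * c
                if d2 < c * c:          # would force d < c; larger c is worse
                    break
                d = math.isqrt(d2)
                if d * d == d2:
                    return (a, b, c, d)
    return None                         # unreachable, by the theorem

print(four_squares(2022))               # a witness quadruple for n = 2022
```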

So far this setup is not significantly increasing the expressiveness of one’s language, because any statement constructed so far in parameterised logic can be quickly translated to an equivalent (and only slightly longer) statement in orthodox logic. However, one gains more expressive power when one allows one or more of the parameterised variables to have a specified type of dependence on the parameters, and in particular depending only on a subset of the parameters. For instance, one could introduce a real number which is an absolute constant in the sense that it does not depend on either of the parameters ; these are a special type of parameterised real, in much the same way that constant functions are a special type of function. Or one could consider a parameterised real that depends on but not on , or a parameterised real that depends on but not on . (One could also place other types of constraints on parameterised quantities, such as continuous or measurable dependence on the parameters, but we will not consider these variants here.)

By quantifying over these restricted classes of parameterised functions, one can now efficiently write down a variety of statements in parameterised logic, of types that actually occur quite frequently in analysis. For instance, we can define a parameterised real to be bounded if there exists an absolute constant such that ; one can of course write this assertion equivalently in orthodox logic as

One can also define the stronger notion of being -bounded, by which we mean , or in orthodox logic

In the opposite direction, we can assert the weaker statement that is bounded in magnitude by a quantity that depends on but not on ; in orthodox logic this becomes

As before, each of the example statements in parameterised logic can be easily translated into a statement in traditional logic. On the other hand, consider the assertion that a parameterised real is expressible as the sum of a quantity depending only on and a quantity depending on . (For instance, the parameterised real would be of this form, but the parameterised real cannot.) Now it becomes significantly harder to translate this statement into first-order logic! One can still do so fairly readily using second-order logic (in which one also is permitted to quantify over operators as well as variables), or by using the language of set theory (so that one can quantify over a set of functions of various forms). Indeed if one is parameterising over proper classes rather than sets, it is even possible to create sentences in parameterised logic that are non-firstorderisable; see this previous blog post for more discussion.

Another subtle distinction that arises once one has parameters is the distinction between “internal” or “parameterised” sets (sets depending on the choice of parameters), and external sets (sets of parameterised objects). For instance, the interval is an internal set – it assigns an orthodox set of reals to each choice of parameters ; elements of this set consist of all the parameterised reals such that for all . On the other hand, the collection of bounded reals – i.e., parameterised reals such that there is a constant for which for all choices of parameters – is not an internal set; it does not arise from taking an orthodox set of reals for each choice of parameters. (Indeed, if it did do so, since every constant real is bounded, each would contain all of , which would make the set of all parameterised reals, rather than just the bounded reals.) To maintain this distinction, we will reserve set builder notation such as for internally defined sets, and use English words (such as “the collection of all bounded parameterised reals”) to denote external sets. In particular, we do not make sense of such expressions as (or , once asymptotic notation is introduced). In general, I would recommend that one avoids combining asymptotic notation with heavy use of set theoretic notation, unless one knows exactly what one is doing.

— 3. Asymptotic notation —

We now simultaneously introduce the two extensions to orthodox first order logic discussed in previous sections. In other words,

  1. We permit the use of partially specified mathematical objects in one’s mathematical statements (and in particular, on either side of an equation or inequality).
  2. We allow all quantities to depend on one or more of the ambient parameters.

In particular, we allow for the use of partially specified mathematical quantities that also depend on one or more of the ambient parameters. This now allows us to formally introduce asymptotic notation. There are many variants of this notation, but here is one set of asymptotic conventions that I for one like to use:

Definition 4 (Asymptotic notation) Let be a non-negative quantity (possibly depending on one or more of the ambient parameters).
  • We use to denote a partially specified quantity in the class of quantities (that can depend on one or more of the ambient parameters) that obeys the bound for some absolute constant . More generally, given some ambient parameters , we use to denote a partially specified quantity in the class of quantities that obeys the bound for some constant that can depend on the parameters, but not on the other ambient parameters.
  • We also use or as a synonym for , and as a synonym for . (In some fields of analysis, , , and are used instead of , , and .)
  • If is a parameter and is a limiting value of that parameter (i.e., the parameter space for and both lie in some topological space, with an adherent point of that parameter space), we use to denote a partially specified quantity in the class of quantities (that can depend on as well as the other ambient parameters) that obeys a bound of the form for all in some neighborhood of , and for some quantity depending only on such that as . More generally, given some further ambient parameters , we use to denote a partially specified quantity in the class of quantities that obey a bound of the form for all in a neighbourhood of (which can also depend on ) where depends on and goes to zero as . (In this more general form, the limit point is now also permitted to depend on the parameters ).
Sometimes (by explicitly declaring one will do so) one suppresses the dependence on one or more of the additional parameters , and/or the asymptotic limit , in order to reduce clutter.

(This is the “non-asymptotic” form of the notation, in which the bounds are assumed to hold for all values of parameters. There is also an “asymptotic” variant of this notation that is commonly used in some fields, in which the bounds in question are only assumed to hold in some neighbourhood of an asymptotic value , but we will not focus on that variant here.)

Thus, for instance, if is a free variable taking values in the natural numbers, and are quantities depending on , then the statement denotes the assertion that for all natural numbers , where is another quantity depending on such that for all , and some absolute constant independent of . Similarly, denotes the assertion that for all natural numbers , where is as above.

For a slightly more sophisticated example, consider the statement

where again is a free variable taking values in the natural numbers. Using the conventions for multi-valued expressions, we can translate this expression into first-order logic as the assertion that whenever are quantities depending on such that there exists a constant such that for all natural numbers , and there exists a constant such that for all natural numbers , then we have for all , where is a quantity depending on natural numbers with the property that there exists a constant such that . Note that the first-order translation of (6) is substantially longer than (6) itself; and once one gains familiarity with the big-O notation, (6) can be deciphered much more quickly than its formal first-order translation.
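To see the sort of bookkeeping the notation suppresses, the simplest instance of (6) amounts to nothing more than merging constants, which one can display explicitly (the constant names here are my own):

```latex
% If |a_N| <= C_1 N and |b_N| <= C_2 N for all natural numbers N, then
\[
  |a_N + b_N| \;\le\; |a_N| + |b_N| \;\le\; (C_1 + C_2)\,N
  \qquad \hbox{for all } N,
\]
% so that a_N + b_N = O(N) with implied constant C_1 + C_2; in asymptotic
% notation, O(N) + O(N) = O(N).
```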

It can be instructive to rewrite some basic notions in analysis in this sort of notation, just to get a slightly different perspective. For instance, if is a function, then:

  • is continuous iff one has for all .
  • is uniformly continuous iff one has for all .
  • A sequence of functions is equicontinuous if one has for all and (note that the implied constant depends on the family , but not on the specific function or on the index ).
  • A sequence of functions is uniformly equicontinuous if one has for all and .
  • is differentiable iff one has for all .
  • Similarly for uniformly differentiable, equidifferentiable, etc..

Remark 5 One can define additional variants of asymptotic notation such as , , or ; see this Wikipedia page for some examples. See also the related notion of “sufficiently large” or “sufficiently small”. However, one can usually reformulate such notations in terms of the above-mentioned asymptotic notations with a little bit of rearranging. For instance, the assertion

can be rephrased as an alternative:

When used correctly, asymptotic notation can suppress a lot of distracting quantifiers (“there exists a such that for every one has…”) or temporary notation which is introduced once and then discarded (“where is a constant, not necessarily the same as the constant from the preceding line…”). It is particularly well adapted to situations in which the order of magnitude of a quantity of interest is of more importance than its exact value, and can help capture precisely such intuitive notions as “large”, “small”, “lower order”, “main term”, “error term”, etc.. Furthermore, I find that analytic assertions phrased using asymptotic notation tend to align better with the natural sentence structure of mathematical English than their logical equivalents in other notational conventions (e.g. first-order logic).

On the other hand, the notation can be somewhat confusing to use at first, as expressions involving asymptotic notation do not always obey the familiar laws of mathematical deduction if applied blindly; but the failures can be explained by the changes to orthodox first order logic indicated above. For instance, if is a positive integer (which we will assume to be at least say , in order to make sense of quantities such as ), then

  • (i) (Asymmetry of equality) We have , but it is not true that . In the same spirit, is a true statement, but is a false statement. Similarly for the notation. This of course stems from the asymmetry of the equality relation that arises once one introduces partially specified objects.
  • (ii) (Intransitivity of equality) We have , and , but . This is again stemming from the asymmetry of the equality relation.
  • (iii) (Incompatibility with functional notation) generally refers to a function of , but usually does not refer to a function of (for instance, it is true that ). This is a slightly unfortunate consequence of the overloaded nature of the parentheses symbols in mathematics, but as long as one keeps in mind that and are not function symbols, one can avoid ambiguity.
  • (iv) (Incompatibility with mathematical induction) We have , and more generally for any , but one cannot blindly apply induction and conclude that for all (with viewed as an additional parameter). This is because to induct on an internal parameter such as , one is only allowed to use internal predicates ; the assertion , which also quantifies externally over some implicit constants , is not an internal predicate. However, external induction is still valid, permitting one to conclude that for any fixed (external) , or equivalently that if is now viewed instead as a parameter.
  • (v) (Failure of the law of generalisation) Every specific (or “fixed”) positive integer, such as , is of the form , but the positive integer is not of the form . (Again, this can be fixed by allowing implied constants to depend on the parameter one is generalising over.) Like (iv), this arises from the need to distinguish between external (fixed) variables and internal parameters.
  • (vi) (Failure of the axiom schema of specification) Given a set and a predicate involving elements of , the axiom of specification allows one to use set builder notation to form the set . However, this is no longer possible if involves asymptotic notation. For instance, one cannot form the “set” of bounded real numbers, which somehow manages to contain all fixed numbers such as , but not any unbounded free parameters such as . (But, if one uses the nonstandard analysis formalism, it becomes possible again to form such sets, with the important caveat that such sets are now external sets rather than internal sets. For instance, the external set of bounded nonstandard reals becomes a proper subring of the ring of nonstandard reals.) This failure is again related to the distinction between internal and external predicates.
  • (vii) (Failure of trichotomy) For non-asymptotic real numbers , exactly one of the statements , , hold. As discussed in the previous section, this is not the case for asymptotic quantities: none of the three statements , , or are true, while all three of the statements , , and are true. (This trichotomy can however be restored by using the nonstandard analysis formalism, or (in some cases) by restricting to an appropriate subsequence whenever necessary.)
  • (viii) (Unintuitive interaction with ) Asymptotic notation interacts in strange ways with the symbol, to the extent that combining the two together is not recommended. For instance, the statement is a true statement, because for any expression of order , one can find another expression of order such that for all . Instead of using statements such as in which one of contain asymptotic notation, I would instead recommend using the different statement “it is not the case that “, e.g. “it is not the case that “. And even then, I would generally only use negation of asymptotic statements in order to demonstrate the incorrectness of some particular argument involving asymptotic notation, and not as part of any positive argument involving such notations. These issues are of course related to (vii).
  • (ix) (Failure of cancellation law) We have , but one cannot cancel one of the terms and conclude that . Indeed, is not equal to in general. (For instance, and , but .) More generally, is not in general equal to or even to (although there is an important exception when one of dominates the other). Similarly for the notation. This stems from the care one has to take in the law of substitution when working with partially specified quantities that appear multiple times on the left-hand side.
  • (x) (, do not commute with signed multiplication) If are non-negative, then and . However, these laws do not work if is signed; indeed, as currently defined and do not even make sense. Thus for instance cannot be written as . (However, one does have and when is signed.) This comes from the absolute values present in the -notation. For beginners, I would recommend not placing any signed quantities inside the and symbols if at all possible.
  • (xi) ( need not distribute over summation) For each fixed , , and , but it is not the case that . This example seems to indicate that the assertion is not true, but that is because we have conflated an external (fixed) quantity with an internal parameter (the latter being needed to define the summation ). The more precise statements (with now consistently an internal parameter) are that , and that the assertion is not true, but the assertion is still true (why?).
  • (xii) ( does not distribute over summation, I) Let , then for each fixed one has ; however, . Thus an expression of the form can in fact grow extremely fast in (and in particular is not of the form or even ). Of course, one could replace here by any other growing function of . This is a similar issue to (xi); it shows that the assertion

    can fail, but if one has uniformity in the parameter then things are fine:

  • (xiii) ( does not distribute over summation, II) In the previous example, the summands were not uniformly bounded. If one imposes uniform boundedness, then one now recovers the bound, but one can still lose the bound. For instance, let , then is now uniformly bounded in magnitude by , and for each fixed one has ; however, . Thus, viewing now as a parameter, the expression is bounded by , but not by . (However, one can write since by our conventions the implied decay rates in the summands are uniform in .) A toy numerical illustration of this phenomenon is given just after this list.
  • (xiv) ( does not distribute over summation, III) If are non-negative quantities, and one has a summation of the form (noting here that the decay rate is not allowed to depend on ), then one can “factor out” the term to write this summation as . However this is far from being true if the sum exhibits significant cancellation. This is most vivid in the case when the sum actually vanishes. For another example, the sum is equal to , despite the fact that uniformly in , and that . For oscillating , the best one can say in general is that

    Similarly for the notation. I see this type of error often among beginner users of asymptotic notation. Again, the general remedy is to avoid putting any signed quantities inside the or notations.
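Here is the promised toy illustration of (xiii) (the specific choice of summand is mine, picked to be uniformly bounded and to tend to zero for each fixed index):

```python
# Each term satisfies |a(i, N)| <= 1 and a(i, N) -> 0 as N -> infinity for
# every *fixed* i, yet the sum of the first N terms is of size ~ N/2.
def a(i, N):
    return min(i / N, 1.0)

for N in (10, 100, 1000, 10000):
    total = sum(a(i, N) for i in range(1, N + 1))
    print(N, total / N)   # ratio tends to 1/2: the sum is O(N) but not o(N)
```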

Perhaps the quickest way to develop some basic safeguards is to be aware of certain “red flags” that indicate incorrect, or at least dubious, uses of asymptotic notation, as well as complementary “safety indicators” that give more reassurance that the usage of asymptotic notation is valid. From the above examples, we can construct a small table of such red flags and safety indicators for any expression or argument involving asymptotic notation:

Red flag → Safety indicator:

  • Signed quantities in RHS → Absolute values in RHS
  • Casually using iteration/induction → Explicitly allowing bounds to depend on length of iteration/induction
  • Casually summing an unbounded number of terms → Keeping number of terms bounded and/or ensuring uniform bounds on each term
  • Casually changing a “fixed” quantity to a “variable” or “bound” one → Keeping track of what parameters implied constants depend on
  • Casually establishing lower bounds or asymptotics → Establishing upper bounds and/or being careful with signs and absolute values
  • Signed algebraic manipulations (e.g., cancellation law) → Unsigned algebraic manipulations
  • Casually negating an asymptotic statement → Phrasing the negation as “it is not the case that …”; or, better still, avoiding negation altogether
  • Swapping LHS and RHS → Not swapping LHS and RHS
  • Using trichotomy of order → Not using trichotomy of order
  • Set-builder notation → Not using set-builder notation (or, in non-standard analysis, distinguishing internal sets from external sets)

When I say here that some mathematical step is performed “casually”, I mean that it is done without any of the additional care that is necessary when this step involves asymptotic notation (that is to say, the step is performed by blindly applying some mathematical law that may be valid for manipulation of non-asymptotic quantities, but can be dangerous when applied to asymptotic ones). It should also be noted that many of these red flags can be disregarded if the portion of the argument containing the red flag is free of asymptotic notation. For instance, one could have an argument that uses asymptotic notation in most places, except at one stage where mathematical induction is used, at which point the argument switches to more traditional notation (using explicit constants rather than implied ones, etc.). This is in fact the opposite of a red flag, as it shows that the author is aware of the potential dangers of combining induction and asymptotic notation. Similarly for the other red flags listed above; for instance, the use of set-builder notation that conspicuously avoids using the asymptotic notation that appears elsewhere in an argument is reassuring rather than suspicious.

If one finds oneself trying to use asymptotic notation in a way that raises one or more of these red flags, I would strongly recommend working out that step as carefully as possible, ideally by writing out both the hypotheses and conclusions of that step in non-asymptotic language (with all quantifiers present and in the correct order), and seeing if one can actually derive the conclusion from the hypothesis by traditional means (i.e., without explicit use of asymptotic notation ). Conversely, if one is reading a paper that uses asymptotic notation in a manner that casually raises several red flags without any apparent attempt to counteract them, one should be particularly skeptical of these portions of the paper.

As a simple example of asymptotic notation in action, we give a proof that convergent sequences also converge in the Cesàro sense:

Proposition 6 If is a sequence of real numbers converging to a limit , then the averages also converge to as .

Proof: Since $a_n$ converges to $L$, we have

$a_n = L + o(1)$ as $n \to \infty$,

so in particular for any $m \geq 1$ we have

$a_n = L + O(1/m)$

whenever $n \geq N_m$, for some finite $N_m$ depending on $m$ (and on the sequence). For $N \geq N_m$, we thus have

$\frac{1}{N} \sum_{n=1}^{N} a_n = L + O\!\left(\frac{N_m}{N}\right) + O\!\left(\frac{1}{m}\right)$

whenever $N \geq N_m$. Setting $m$ to grow sufficiently slowly to infinity as $N \to \infty$ (for fixed $m$, the quantity $N_m$ is fixed, so the first error term is then $o(1)$), we may simplify this to

$\frac{1}{N} \sum_{n=1}^{N} a_n = L + o(1)$

for all $N$, and the claim follows.
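
As a quick numerical illustration of Proposition 6 (a sanity check, not part of the proof), one can watch the Cesàro averages of a slowly converging sequence approach the same limit. The example sequence below is chosen for this sketch only:

    # Illustration of Proposition 6: if a_n -> L then the Cesaro averages
    # (a_1 + ... + a_N)/N also tend to L.  Here a_n = 2 + (-1)^n / sqrt(n),
    # which converges (slowly) to L = 2.
    import math

    def a(n: int) -> float:
        return 2.0 + (-1) ** n / math.sqrt(n)

    partial_sum = 0.0
    for N in range(1, 10**6 + 1):
        partial_sum += a(N)
        if N in (10**2, 10**4, 10**6):
            print(N, partial_sum / N)  # averages drift towards 2.0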

Kategorije: Matematički blogovi

Announcing an automatic theorem proving project

W.T. Gowers - Čet, 2022-04-28 11:41

I am very happy to say that I have recently received a generous grant from the Astera Institute to set up a small group to work on automatic theorem proving, in the first instance for about three years after which we will take stock and see whether it is worth continuing. This will enable me to take on up to about three PhD students and two postdocs over the next couple of years. I am imagining that two of the PhD students will start next October and that at least one of the postdocs will start as soon as is convenient for them. Before any of these positions are advertised, I welcome any informal expressions of interest: in the first instance you should email me, and maybe I will set up Zoom meetings. (I have no idea what the level of demand is likely to be, so exactly how I respond to emails of this kind will depend on how many of them there are.)

I have privately let a few people know about this, and as a result I know of a handful of people who are already in Cambridge and are keen to participate. So I am expecting the core team working on the project to consist of 6-10 people. But I also plan to work in as open a way as possible, in the hope that people who want to can participate in the project remotely even if they are not part of the group that is based physically in Cambridge. Thus, part of the plan is to report regularly and publicly on what we are thinking about, what problems, both technical and more fundamental, are holding us up, and what progress we make. Also, my plan at this stage is that any software associated with the project will be open source, and that if people want to use ideas generated by the project to incorporate into their own theorem-proving programs, I will very much welcome that.

I have written a 54-page document that explains in considerable detail what the aims and approach of the project will be. I would urge anyone who thinks they might want to apply for one of the positions to read it first — not necessarily every single page, but enough to get a proper understanding of what the project is about. Here I will explain much more briefly what it will be trying to do, and what will set it apart from various other enterprises that are going on at the moment.

In brief, the approach taken will be what is often referred to as a GOFAI approach, where GOFAI stands for “good old-fashioned artificial intelligence”. Roughly speaking, the GOFAI approach to artificial intelligence is to try to understand as well as possible how humans achieve a particular task, and eventually reach a level of understanding that enables one to program a computer to do the same.

As the phrase “old-fashioned” suggests, GOFAI has fallen out of favour in recent years, and some of the reasons for that are good ones. One reason is that after initial optimism, progress with that approach stalled in many domains of AI. Another is that with the rise of machine learning it has become clear that for many tasks, especially pattern-recognition tasks, it is possible to program a computer to do them very well without having a good understanding of how humans do them. For example, we may find it very difficult to write down a set of rules that distinguishes between an array of pixels that represents a dog and an array of pixels that represents a cat, but we can still train a neural network to do the job.

However, while machine learning has made huge strides in many domains, it still has several areas of weakness that are very important when one is doing mathematics. Here are a few of them.

  1. In general, tasks that involve reasoning in an essential way.
  2. Learning to do one task and then using that ability to do another.
  3. Learning based on just a small number of examples.
  4. Common sense reasoning.
  5. Anything that involves genuine understanding (even if it may be hard to give a precise definition of what understanding is) as opposed to sophisticated mimicry.

Obviously, researchers in machine learning are working in all these areas, and there may well be progress over the next few years [in fact, there has been progress on some of these difficulties already of which I was unaware — see some of the comments below], but for the time being there are still significant limitations to what machine learning can do. (Two people who have written very interestingly on these limitations are Melanie Mitchell and François Chollet.)

That said, using machine learning techniques in automatic theorem proving is a very active area of research at the moment. (Two names you might like to look up if you want to find out about this area are Christian Szegedy and Josef Urban.) The project I am starting will not be a machine-learning project, but I think there is plenty of potential for combining machine learning with GOFAI ideas — for example, one might use GOFAI to reduce the options for what the computer will do next to a small set and use machine learning to choose the option out of that small set — so I do not rule out some kind of wider collaboration once the project has got going.
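
The division of labour suggested here (GOFAI narrows the options to a small set, machine learning picks among them) can be sketched in a few lines of Python. Everything below — the state representation, the rule set, the scoring function — is hypothetical and purely illustrative; none of it is code or terminology from the project.

    # Hypothetical sketch: hand-written (GOFAI) rules propose a small set
    # of candidate proof steps; a learned scorer then ranks them.
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class ProofState:
        hypotheses: List[str]
        goal: str

    def candidate_moves(state: ProofState) -> List[str]:
        """Rules that propose only a handful of plausible next steps."""
        moves = ["assumption"] if state.goal in state.hypotheses else []
        if state.goal.startswith("forall "):
            moves.append("intro")  # strip a universal quantifier
        for h in state.hypotheses:
            if h.endswith("-> " + state.goal):
                moves.append("apply " + h)  # backward-chain on an implication
        return moves

    def choose_move(state: ProofState,
                    score: Callable[[ProofState, str], float]) -> str:
        """The learned model only has to rank the few approved options."""
        return max(candidate_moves(state), key=lambda m: score(state, m))

    # Usage with a stand-in scorer (a real one might be a neural network):
    state = ProofState(hypotheses=["P", "P -> Q"], goal="Q")
    print(choose_move(state, score=lambda s, m: float(len(m))))

The point of the design is that the expensive learned component never sees the full combinatorial space of moves, only the short list that the hand-written rules have already vetted.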

Another area that is thriving at the moment is formalization. Over the last few years, several major theorems and definitions have been fully formalized that would have previously seemed out of reach — examples include Gödel’s theorem, the four-colour theorem, Hales’s proof of the Kepler conjecture, Thompson’s odd-order theorem, and a lemma of Dustin Clausen and Peter Scholze with a proof that was too complicated for them to be able to feel fully confident that it was correct. That last formalization was carried out in Lean by a team led by Johan Commelin, which is part of the more general and exciting Lean group that grew out of Kevin Buzzard‘s decision a few years ago to incorporate Lean formalization into his teaching at Imperial College London.

As with machine learning, I mention formalization in order to contrast it with the project I am announcing here. It may seem slightly negative to focus on what it will not be doing, but I feel it is important, because I do not want to attract applications from people who have an incorrect picture of what they would be doing. Also as with machine learning, I would welcome and even expect collaboration with the Lean group. For us it would be potentially very interesting to make use of the Lean database of results, and it would also be nice (even if not essential) to have output that is formalized using a standard system. And we might be able to contribute to the Lean enterprise by creating code that performs steps automatically that are currently done by hand. A very interesting looking new institute, the Hoskinson Center for Formal Mathematics, has recently been set up with Jeremy Avigad at the helm, which will almost certainly make such collaborations easier.

But now let me turn to the kinds of things I hope this project will do.

Why is mathematics easy?

Ever since Turing, we have known that there is no algorithm that will take as input a mathematical statement and output a proof if the statement has a proof or the words “this statement does not have a proof” otherwise. (If such an algorithm existed, one could feed it statements of the form “algorithm A halts” and the halting problem would be solved.) If P ≠ NP, then there is not even a practical algorithm for determining whether a statement has a proof of at most some given length — a brute-force algorithm exists, but takes far too long. Despite this, mathematicians regularly find long and complicated proofs of theorems. How is this possible?
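
The parenthetical reduction can be made concrete. In the sketch below, decides_provability is a hypothetical oracle — no such function can exist, which is precisely the content of the argument:

    # Sketch of the classical reduction: a decision procedure for
    # provability would decide the halting problem.  The oracle below is
    # hypothetical; by Turing's argument it cannot be implemented.

    def decides_provability(statement: str) -> bool:
        """Hypothetically returns True iff `statement` has a formal proof."""
        raise NotImplementedError("impossible, by the halting problem")

    def halts(program_source: str) -> bool:
        # In a sound, sufficiently strong formal system, "this program
        # halts" is provable exactly when the program really halts (a
        # halting run is a checkable certificate).  So the oracle would
        # decide halting -- contradicting Turing's theorem.
        return decides_provability(f"the program {program_source!r} halts")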

The broad answer is that while the theoretical results just alluded to show that we cannot expect to determine the proof status of arbitrary mathematical statements, that is not what we try to do. Rather, we look at only a tiny fraction of well-formed statements, and the kinds of proofs we find tend to have a lot more structure than is guaranteed by the formal definition of a proof as a sequence of statements, each of which is either an initial assumption or follows in a simple way from earlier statements. (It is interesting to speculate about whether there are, out there, utterly bizarre and idea-free proofs that just happen to establish concise mathematical statements but that will never be discovered because searching for them would take too long.) A good way of thinking about this project is that it will be focused on the following question.

Question. What is it about the proofs that mathematicians actually find that makes them findable in a reasonable amount of time?

Clearly, a good answer to this question would be extremely useful for the purposes of writing automatic theorem proving programs. Equally, any advances in a GOFAI approach to writing automatic theorem proving programs have the potential to feed into an answer to the question. I don’t have strong views about the right balance between the theoretical and practical sides of the project, but I do feel strongly that both sides should be major components of it.

The practical side of the project will, at least to start with, be focused on devising algorithms that find proofs in a way that imitates as closely as possible how humans find them. One important aspect of this is that I will not be satisfied with programs that find proofs after carrying out large searches, even if those searches are small enough to be feasible. More precisely, searches will be strongly discouraged unless human mathematicians would also need to do them. A question that is closely related to the question above is the following, which all researchers in automatic theorem proving have to grapple with.

Question. Humans seem to be able to find proofs with a remarkably small amount of backtracking. How do they prune the search tree to such an extent?

Allowing a program to carry out searches of “silly” options that humans would never do is running away from this absolutely central question.

With Mohan Ganesalingam, Ed Ayers and Bhavik Mehta (but not simultaneously), I have over the years worked on writing theorem-proving programs with as little search as possible. This will provide a starting point for the project. One of the reasons I am excited to have the chance to set up a group is that I have felt for a long time that with more people working on the project, there is a chance of much more rapid progress — I think the progress will scale up more than linearly in the number of people, at least up to a certain size. And if others were involved round the world, I don’t think it is unreasonable to hope that within a few years there could be theorem-proving programs that were genuinely useful — not necessarily at a research level but at least at the level of a first-year undergraduate. (To be useful a program does not have to be able to solve every problem put in front of it: even a program that could solve only fairly easy problems but in a sufficiently human way that it could explain how it came up with its proofs could be a very valuable educational tool.)

A more distant dream is of course to get automatic theorem provers to the point where they can solve genuinely difficult problems. Something else that I would like to see coming out of this project is a serious study of how humans do this. From time to time I have looked at specific proofs that appear to require at a certain point an idea that comes out of nowhere, and after thinking very hard about them I have eventually managed to present a plausible account of how somebody might have had the idea, which I think of as a “justified proof”. I would love it if there were a large collection of such accounts, and I have it in mind as a possible idea to set up (with help) a repository for them, though I would need to think rather hard about how best to do it. One of the difficulties is that whereas there is widespread agreement about what constitutes a proof, there is not such a clear consensus about what constitutes a convincing explanation of where an idea comes from. Another theoretical problem that interests me a lot is the following.

Problem. Come up with a precise definition of a “proof justification”.

Though I do not have a satisfactory definition, very recently I have had some ideas that will, I think, help to narrow down the search substantially. I am writing these ideas down and hope to make them public soon.

Who am I looking for?

There is much more I could say about the project, but if this whets your appetite, then I refer you to the document where I have said much more about it. For the rest of this post I will say a little bit more about the kind of person I am looking for and how a typical week might be spent.

The most important quality I am looking for in an applicant for a PhD or postdoc associated with this project is a genuine enthusiasm for the GOFAI approach briefly outlined here and explained in more detail in the much longer document. If you read that document and think that that is the kind of work you would love to do and would be capable of doing, then that is a very good sign. Throughout the document I give indications of things that I don’t yet know how to do. If you find yourself immediately drawn into thinking about those problems, which range from small technical problems to big theoretical questions such as the ones mentioned above, then that is even better. And if you are not fully satisfied with a proof unless you can see why it was natural for somebody to think of it, then that is better still.

I would expect a significant proportion of people reading the document to have an instinctive reaction that the way I propose to attack the problems is not the best way, and that surely one should use some other technique — machine learning, large search, the Knuth-Bendix algorithm, a computer algebra package, etc. etc. — instead. If that is your reaction, then the project probably isn’t a good fit for you, as the GOFAI approach is what it is all about.

As far as qualifications are concerned, I think the ideal candidate is somebody with plenty of experience of solving mathematical problems (either challenging undergraduate-level problems for a PhD candidate or research-level problems for a postdoc candidate), and good programming ability. But if I had to choose one of the two, I would pick the former over the latter, provided that I could be convinced that a candidate had a deep understanding of what a well-designed algorithm would look like. (I myself am not a fluent programmer — I have some experience of Haskell and Python and I think a pretty good feel for how to specify an algorithm in a way that makes it implementable by somebody who is a quick coder, and in my collaborations so far have relied on my collaborators to do the coding.) Part of the reason for that is that I hope that if one of the outputs of the group is detailed algorithm designs, then there will be remote participants who would enjoy turning those designs into code.

How will the work be organized?

The core group is meant to be a genuine team rather than simply a group of a few individuals with a common interest in automatic theorem proving. To this end, I plan that the members of the group will meet regularly — I imagine something like twice a week for at least two hours and possibly more — and will keep in close contact, and very likely meet less formally outside those meetings. The purpose of the meetings will be to keep the project appropriately focused. That is not to say that all team members will work on the same narrow aspect of the problem at the same time. However, I think that with a project like this it will be beneficial (i) to share ideas frequently, (ii) to keep thinking strategically about how to get the maximum expected benefit for the effort put in, and (iii) to keep updating our public list of open questions (which will not be open questions in the usual mathematical sense, but questions more like “How should a computer do this?” or “Why is it so obvious to a human mathematician that this would be a silly thing to try?”).

In order to make it easy for people to participate remotely, I think probably we will want to set up a dedicated website where people can post thoughts, links to code, questions, and so on. Some thought will clearly need to go into how best to design such a site, and help may be needed to build it, which if necessary I could pay for. Another possibility would of course be to have Zoom meetings, but whether or not I would want to do that depends somewhat on who ends up participating and how they do so.

Since the early days of Polymath I have become much more conscious that merely stating that a project is open to anybody who wishes to join in does not automatically make it so. For example, whereas I myself am comfortable with publicly suggesting a mathematical idea that turns out on further reflection to be fruitless or even wrong, many people are, for very good reasons, not willing to do so, and those people belong disproportionately to groups that have been historically marginalized from mathematics — which of course is not a coincidence. Because of this, I have not yet decided on the details of how remote participation might work. Maybe part of it could be fully open in the way that Polymath was, but part of it could be more private and carefully moderated. Or perhaps separate groups could be set up that communicated regularly with the Cambridge group. There are many possibilities, but which ones would work best depends on who is interested. If you are interested in the project but would feel excluded by certain modes of participation, then please get in touch with me and we can think about what would work for you.

Kategorije: Matematički blogovi

Higher uniformity of arithmetic functions in short intervals I. All intervals

Terrence Tao - Pon, 2022-04-11 06:49

Kaisa Matomäki, Xuancheng Shao, Joni Teräväinen, and myself have just uploaded to the arXiv our preprint “Higher uniformity of arithmetic functions in short intervals I. All intervals“. This paper investigates the higher order (Gowers) uniformity of standard arithmetic functions in analytic number theory (and specifically, the Möbius function , the von Mangoldt function , and the generalised divisor functions ) in short intervals , where is large and lies in the range for a fixed constant (that one would like to be as small as possible). If we let denote one of the functions , then there is extensive literature on the estimation of short sums

and some literature also on the estimation of exponential sums such as

for a real frequency , where . For applications in the additive combinatorics of such functions , it is also necessary to consider more general correlations, such as polynomial correlations

where is a polynomial of some fixed degree, or more generally

where is a nilmanifold of fixed degree and dimension (and with some control on structure constants), is a polynomial map, and is a Lipschitz function (with some bound on the Lipschitz constant). Indeed, thanks to the inverse theorem for the Gowers uniformity norm, such correlations let one control the Gowers uniformity norm of (possibly after subtracting off some renormalising factor) on such short intervals , which can in turn be used to control other multilinear correlations involving such functions.
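
Schematically, and with notation borrowed from the standard nilsequence literature rather than taken verbatim from the paper (whose precise normalisations may differ), the polynomial and nilsequence correlations just described have the shape

    \[
      \sum_{x < n \le x+H} f(n)\, e\big(-P(n)\big)
      \qquad \text{and} \qquad
      \sum_{x < n \le x+H} f(n)\, \overline{F}\big(g(n)\Gamma\big),
    \]
    % where P is a polynomial of fixed degree, G/Gamma is a nilmanifold of
    % fixed degree and dimension, g : Z -> G is a polynomial map, and
    % F : G/Gamma -> C is a Lipschitz function.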

Traditionally, asymptotics for such sums are expressed in terms of a “main term” of some arithmetic nature, plus an error term that is estimated in magnitude. For instance, a sum such as would be approximated in terms of a main term that vanished (or is negligible) if is “minor arc”, but would be expressible in terms of something like a Ramanujan sum if was “major arc”, together with an error term. We found it convenient to cancel off such main terms by subtracting an approximant from each of the arithmetic functions and then getting upper bounds on remainder correlations such as

(actually for technical reasons we also allow the variable to be restricted further to a subprogression of , but let us ignore this minor extension for this discussion). There is some flexibility in how to choose these approximants, but we eventually found it convenient to use the following choices.

  • For the Möbius function , we simply set , as per the Möbius pseudorandomness conjecture. (One could choose a more sophisticated approximant in the presence of a Siegel zero, as I did with Joni in this recent paper, but we do not do so here.)
  • For the von Mangoldt function , we eventually went with the Cramér-Granville approximant , where and .
  • For the divisor functions , we used a somewhat complicated-looking approximant for some explicit polynomials , chosen so that and have almost exactly the same sums along arithmetic progressions (see the paper for details).

The objective is then to obtain bounds on sums such as (1) that improve upon the “trivial bound” that one can get with the triangle inequality and standard number theory bounds such as the Brun-Titchmarsh inequality. For and , the Siegel-Walfisz theorem suggests that it is reasonable to expect error terms that have “strongly logarithmic savings” in the sense that they gain a factor of over the trivial bound for any ; for , the Dirichlet hyperbola method suggests instead that one has “power savings” in that one should gain a factor of over the trivial bound for some . In the case of the Möbius function , there is an additional trick (introduced by Matomäki and Teräväinen) that allows one to lower the exponent somewhat at the cost of only obtaining “weakly logarithmic savings” of shape for some small .
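
In rough schematic form (the trivial bound is of size about $H$, up to logarithmic factors in the divisor case; the exact normalisations are those of the paper, not reproduced here), the three notions of savings can be written as

    \[
      \text{strongly logarithmic savings:} \quad
      O_A\big(H \log^{-A} x\big) \ \text{ for any } A > 0,
    \]
    \[
      \text{power savings:} \quad
      O\big(H\, x^{-c}\big) \ \text{ for some } c > 0,
    \]
    \[
      \text{weakly logarithmic savings:} \quad
      O\big(H \log^{-c} x\big) \ \text{ for some small } c > 0.
    \]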

Our main estimates on sums of the form (1) work in the following ranges:

  • For , one can obtain strongly logarithmic savings on (1) for , and power savings for .
  • For , one can obtain weakly logarithmic savings for .
  • For , one can obtain power savings for .
  • For , one can obtain power savings for .

Conjecturally, one should be able to obtain power savings in all cases, and lower down to zero, but the ranges of exponents and savings given here seem to be the limit of current methods unless one assumes additional hypotheses, such as GRH. The result for correlation against Fourier phases was established previously by Zhan, and the result for such phases and was established previously by Matomäki and Teräväinen.

By combining these results with tools from additive combinatorics, one can obtain a number of applications:

  • Direct insertion of our bounds in the recent work of Kanigowski, Lemańczyk, and Radziwiłł on the prime number theorem for dynamical systems that are analytic skew products gives some improvements in the exponents there.
  • We can obtain a “short interval” version of a multiple ergodic theorem along primes established by Frantzikinakis-Host-Kra and Wooley-Ziegler, in which we average over intervals of the form rather than .
  • We can obtain a “short interval” version of the “linear equations in primes” asymptotics obtained by Ben Green, Tamar Ziegler, and myself in this sequence of papers, where the variables in these equations lie in short intervals rather than long intervals such as .

We now briefly discuss some of the ingredients of proof of our main results. The first step is standard, using combinatorial decompositions (based on the Heath-Brown identity and (for the result) the Ramaré identity) to decompose into more tractable sums of the following types:

  • Type sums, which are basically of the form for some weights of controlled size and some cutoff that is not too large;
  • Type sums, which are basically of the form for some weights , of controlled size and some cutoffs that are not too close to or to ;
  • Type sums, which are basically of the form for some weights of controlled size and some cutoff that is not too large.

The precise ranges of the cutoffs depend on the choice of ; our methods fail once these cutoffs pass a certain threshold, and this is the reason for the exponents being what they are in our main results.

The Type sums involving nilsequences can be treated by methods similar to those in this previous paper of Ben Green and myself; the main innovations are in the treatment of the Type and Type sums.

For the Type sums, one can split into the “abelian” case in which (after some Fourier decomposition) the nilsequence is basically of the form , and the “non-abelian” case in which is non-abelian and exhibits non-trivial oscillation in a central direction. In the abelian case we can adapt arguments of Matomäki and Shao, which use Cauchy-Schwarz and the equidistribution properties of polynomials to obtain good bounds unless is “major arc” in the sense that it resembles (or “pretends to be”) for some Dirichlet character and some frequency , but in this case one can use classical multiplicative methods to control the correlation. It turns out that the non-abelian case can be treated similarly. After applying Cauchy-Schwarz, one ends up analyzing the equidistribution of the four-variable polynomial sequence

as range in various dyadic intervals. Using the known multidimensional equidistribution theory of polynomial maps in nilmanifolds, one can eventually show in the non-abelian case that this sequence either has enough equidistribution to give cancellation, or else the nilsequence involved can be replaced with one from a lower dimensional nilmanifold, in which case one can apply an induction hypothesis.

For the type sum, a model sum to study is

which one can expand as

We experimented with a number of ways to treat this type of sum (including automorphic form methods, or methods based on the Voronoi formula or van der Corput’s inequality), but somewhat to our surprise, the most efficient approach was an elementary one, in which one uses the Dirichlet approximation theorem to decompose the hyperbolic region into a number of arithmetic progressions, and then uses equidistribution theory to establish cancellation of sequences such as on the majority of these progressions. As it turns out, this strategy works well in the regime unless the nilsequence involved is “major arc”, but the latter case is treatable by existing methods as discussed previously; this is why the exponent for our result can be as low as .

In a sequel to this paper (currently in preparation), we will obtain analogous results for almost all intervals with in the range , in which we will be able to lower all the way to .

Kategorije: Matematički blogovi

Nominations for 2023 Doob prize now open

Terrence Tao - Čet, 2022-04-07 20:47

Just a brief announcement that the AMS is now accepting (until June 30) nominations for the 2023 Joseph L. Doob Prize, which recognizes a single, relatively recent, outstanding research book that makes a seminal contribution to the research literature, reflects the highest standards of research exposition, and promises to have a deep and long-term impact in its area. The book must have been published within the six calendar years preceding the year in which it is nominated. Books may be nominated by members of the Society, by members of the selection committee, by members of AMS editorial committees, or by publishers.  (I am currently on the committee for this prize.)  A list of previous winners may be found here.  The nomination procedure may be found at the bottom of this page.

Kategorije: Matematički blogovi

Web page for the Short Communication Satellite (SCS) of the 2022 virtual ICM now live

Terrence Tao - Sri, 2022-03-30 23:39

Just a brief update to the previous post. Gerhard Paseman and I have now set up a web site for the Short Communication Satellite (SCS) for the virtual International Congress of Mathematicians (ICM), which will be an experimental independent online satellite event in which short communications on topics relevant to one or two of the sections of the ICM can be submitted, reviewed by peers, and (if appropriate for the SCS event) displayed in a virtual “poster room” during the Congress on July 6-14 (which, by the way, has recently released its schedule and list of speakers). Our plan is to open the registration for this event on April 5, and start taking submissions on April 20; we are also currently accepting any expressions of interest in helping out with the event, for instance by serving as a reviewer. For more information about the event, please see the overview page, the guidelines page, and the FAQ page of the web site. As viewers will see, the web site is still somewhat under construction, but will be updated as we move closer to the actual Congress.

The comments section of this post would be a suitable place to ask further questions about this event, or give any additional feedback.

UPDATE: for readers who have difficulty accessing the links above, here are backup copies of the overview page and guidelines page.

Kategorije: Matematički blogovi

Short communication satellite event for the 2022 virtual ICM?

Terrence Tao - Ned, 2022-03-20 19:53

[As with previous posts regarding ICM satellite events, I am authoring this post as an individual, and not in my capacity as chair of the ICM Structure Committee, which does not have any advisory or supervisory role over ICM satellite events – T.]

One of the traditional features of the International Congress of Mathematicians is the “short communications”, organized by the local organizing committee (as opposed to the International Mathematical Union), which allow participants at the congress to present either a poster or a short talk (typically 15 minutes or so) during the congress. For instance, here are the titles of the short communications and posters from the 2018 ICM, and here are the short communications and posters from the 2014 ICM. While not as high profile as other events of the ICM such as the plenary lectures, sectional lectures, or prize lectures, the short communications and posters can offer a chance for academics from a quite diverse range of institutions worldwide (and even a few independent mathematicians) to present their work to a mathematical audience.

There has been some volunteer effort to try to replicate some form of this event for the upcoming virtual ICM this July as a semi-official “satellite” event of the virtual ICM; it would technically not be part of the core ICM program, but I expect it would be recognized by the IMU as an independently organized satellite. Due to lack of time, funding, and technical expertise, we will not be able to offer any video, audio, or physical hosting for such an event, but we believe that a modest virtual event is possible involving submission of either a PDF “poster” or a PDF “slide deck”, together with other metadata such as author, title, abstract, and external links (e.g., to an externally hosted video presentation of the poster or slides), with some reviewing to ensure a certain minimum level of quality of approved submissions (we are thinking about setting guidelines similar to those required for a submission to the arXiv), and some ability to offer feedback on each submission. (For instance, we are thinking of hosting the event on a MediaWiki, with each communication being given a separate page which can attract discussion and responses to queries from the author(s).) We are also thinking of grouping the poster or slides according to the 20 sections of the 2022 ICM. We would then promote these communications during the virtual ICM, for instance on this blog or on the unofficial ICM Discord. Perhaps some of the other proposed online experiments for virtual events discussed in this previous post could also be implemented experimentally on this satellite event to demonstrate proof-of-concept. (If the event turns out to be successful, one could hope that it could serve as a pilot project for a longer-term and better funded platform for virtual short communications to accompany other conferences, but for now we would like to focus just on the virtual ICM satellite event.)

As one of our first actions, we would like to survey the level of interest in such an event, both among potential submitters of posters or slides, and also potential volunteers to help organize the event (in particular we may need some assistance in manually reviewing submissions, though we do plan to enlist peer reviewers by requiring submitters to rate and comment on other submissions in the same section). We have therefore created a form to (very unscientifically) gauge this level in order to decide on the scale of this project (or whether to attempt it at all). All readers of this blog are welcome to offer feedback through that form, or as a comment to this blog.

EDIT (Mar 29): a formal announcement will be made soon, but you can view a draft of the announcement here.

Kategorije: Matematički blogovi

2022 ICM satellite coordination group

Terrence Tao - Sub, 2022-03-12 00:23

[Note: while I happen to be the chair of the ICM Structure Committee, I am authoring this blog post as an individual, and not as a representative of that committee or of the IMU, as they do not have jurisdiction over satellite conferences. -T.]

The International Mathematical Union (IMU) has just released some updates on the status of the 2022 International Congress of Mathematicians (ICM), which was discussed in this previous post:

  • The General Assembly will take place in person in Helsinki, Finland, on July 3-4.
  • The IMU award ceremony will be held in the same location on July 5.
  • The ICM will take place virtually (with free participation) during the hours 9:00-18:00 CEST of July 6-14, with talks either live or pre-recorded according to speaker preference.

Due to the limited time and resources available, the core ICM program will be kept to the bare essentials; the lectures will be streamed but without opportunity for questions or other audience feedback. However, the IMU encourages grassroots efforts to supplement the core program with additional satellite activities, both “traditional” and “non-traditional”. Examples of such satellite activities include:

A more updated list of these events can be found here.

I will also mention the second Azat Miftakov Days, which are unaffiliated with the ICM but held concurrently with the beginning of the congress (and the prize ceremony).

Strictly speaking, satellite events are not officially part of the Congress, and not directly subject to IMU oversight; they also receive no funding or support from the IMU, other than sharing of basic logistical information, and recognition of the satellite conferences on the ICM web site. Thus this (very exceptional and sui generis) congress will differ in format from previous congresses, in that many of the features of the congress that traditionally were managed by the local organizing committee will be outsourced this one time to the broader mathematical community in a highly decentralized fashion.

In order to coordinate the various grassroots efforts to establish such satellite events, Alexei Borodin, Martin Hairer, and myself have set up a satellite coordination group to share information and advice on these events. (It should be noted that while Alexei, Martin and myself serve on either the structure committee or the program committee of the ICM, we are acting here as individuals rather than as official representatives of the IMU.) Anyone who is interested in organizing, hosting, or supporting such an event is welcome to ask to join the group (though I should say that most of the discussion concerns boring logistical issues). Readers are also welcome to discuss broader issues concerning satellites, or the congress as a whole, in the comments to this post. I will also use this space to announce details of satellite events as they become available (most are currently still only in the early stages of planning).

Kategorije: Matematički blogovi

Measurable tilings by abelian group actions

Terrence Tao - Pet, 2022-03-04 06:34


Jan Grebik, Rachel Greenfeld, Vaclav Rozhon and I have just uploaded to the arXiv our preprint “Measurable tilings by abelian group actions“. This paper is related to an earlier paper of Rachel Greenfeld and myself concerning tilings of lattices , but now we consider the more general situation of tiling a measure space by a tile shifted by a finite subset of shifts of an abelian group that acts in a measure-preserving (or at least quasi-measure-preserving) fashion on . For instance, could be a torus , could be a positive measure subset of that torus, and could be the group , acting on by translation.
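
For concreteness, here is one minimal instance of the setup just described, chosen purely for illustration (the specific example in the post may differ):

    \[
      X = \mathbb{R}/\mathbb{Z}, \qquad A = [0, \tfrac{1}{2}), \qquad
      F = \{0, \tfrac{1}{2}\} \subset \mathbb{R}/\mathbb{Z},
    \]
    % with the group acting by translation: the translates A and A + 1/2
    % partition the torus up to null sets, giving a measurable tiling of X
    % by A with tiling set F.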


If is a finite subset of with the property that the translates , of partition up to null sets, we write , and refer to this as a measurable tiling of by (with tiling set ). For instance, if is the torus , we can create a measurable tiling with and . Our main results are the following:

  • By modifying arguments from previous papers (including the one with Greenfeld mentioned above), we can establish the following “dilation lemma”: a measurable tiling automatically implies further measurable tilings , whenever is an integer coprime to all primes up to the cardinality of .
  • By averaging the above dilation lemma, we can also establish a “structure theorem” that decomposes the indicator function of into components, each of which are invariant with respect to a certain shift in . We can establish this theorem in the case of measure-preserving actions on probability spaces via the ergodic theorem, but one can also generalize to other settings by using the device of “measurable medial means” (which relates to the concept of a universally measurable set).
  • By applying this structure theorem, we can show that all measurable tilings of the one-dimensional torus are rational, in the sense that lies in a coset of the rationals . This answers a recent conjecture of Conley, Grebik, and Pikhurko; we also give an alternate proof of this conjecture using some previous results of Lagarias and Wang.
  • For tilings of higher-dimensional tori, the tiling need not be rational. However, we can show that we can “slide” the tiling to be rational by giving each translate of a “velocity” , and for every time , the translates still form a partition of modulo null sets, and at time the tiling becomes rational. In particular, if a set can tile a torus in an irrational fashion, then it must also be able to tile the torus in a rational fashion.
  • In the two-dimensional case one can arrange matters so that all the velocities are parallel. If we furthermore assume that the tile is connected, we can also show that the union of all the translates with a common velocity form a -invariant subset of the torus.
  • Finally, we show that tilings of a finitely generated discrete group , with a finite group, cannot be constructed in a “local” fashion (we formalize this probabilistically using the notion of a “factor of iid process”) unless the tile is contained in a single coset of . (Nonabelian local tilings, for instance of the sphere by rotations, are of interest due to connections with the Banach-Tarski paradox; see the aforementioned paper of Conley, Grebik, and Pikhurko. Unfortunately, our methods seem to break down completely in the nonabelian case.)
Kategorije: Matematički blogovi

Resources for displaced mathematicians

Terrence Tao - Sri, 2022-03-02 23:37

In this post I would like to collect a list of resources that are available to mathematicians displaced by conflict. Here are some general resources:

There are also resources specific to the current crisis:

Finally, there are a number of institutes and departments who are willing to extend visiting or adjunct positions to such displaced mathematicians:

If readers have other such resources to contribute (or to update the ones already listed), please do so in the comments and I will modify the above lists as appropriate.

As with the previous post, any purely political comment not focused on such resources will be considered off-topic and thus subject to deletion.

Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorems no. 247-248
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorems no. 249-251
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorem no. 228, 229, 230, by R.A. Fisher, Poncelet and Ore
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorem no. 231, 232, 233, 234
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorem no. 235, 236, 237
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorem no. 238 and 239
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorem no. 240 and 241
Kategorije: Matematički blogovi

New Theorem

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has a 'new acquisition': Theorem no. 242, the Polya-Redfield Enumeration Theorem
Kategorije: Matematički blogovi

New Theorems

Theorem of the Day - Sri, 2019-11-20 17:40
Theorem of the Day has 'new acquisitions': Theorems no. 243-246
Kategorije: Matematički blogovi