Sakupljač feed-ova
A recent experience with ChatGPT 5.5 Pro
We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models. I have just made a fairly large revision as a result of ChatGPT 5.5 Pro, to which I am fortunate to have been given access, producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me.
The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results. But little by little the laughter has become quieter. The message I am getting from what other mathematicians more involved in this enterprise have been saying is that LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed (that reason sometimes, but not always, being that the problem has not received all that much attention), then there is a good chance that the LLMs will spot it. Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.
I decided to try something a little bit different. At least in combinatorics, there are quite a lot of papers that investigate some relatively new combinatorial parameter that leads naturally to several questions. Because of the sheer number of questions one can ask, the authors of such papers will not necessarily have the time to spend a week or two thinking about each one, so there is a decent probability that at least some of them will not be all that hard. This makes such papers very valuable as sources of problems for mathematicians who are doing research for the first time and who will be hugely encouraged by solving a problem that was officially open. Or rather, it used to make them valuable in that way, but it looks as though the bar has just been raised. It is no longer enough that somebody asks a problem: it needs to be hard enough for an LLM not to be able to solve it.
In any case, a little over a week ago I decided to see how ChatGPT 5.5 Pro would fare with a selection of problems asked by Mel Nathanson in a paper entitled Diversity, Equity and Inclusion for Problems in Additive Number Theory. Nathanson has a remarkable record of being interested in problems and theorems that have later become extremely fashionable, which has led him to write a series of extremely well timed and therefore highly influential textbooks. In this paper, he argues for the interest of several other problems, some of which I will now briefly describe.
If is a set of integers, then its sumset is defined to be . For a positive integer , the –fold sumset, denoted , is defined to be . Nathanson is interested in the possible sizes of given the size of . To that end one can define a set to be the set of all such that there exists a set with and .
An obvious first question to ask is simply “What is ?” When , the answer is the set of all integers between and . It is an easy exercise to show that if , then , so this result is saying that all sizes in between can be realized. However, it is not true in general that can take every size between its minimum and maximum possibilities, and we do not currently have a complete description of .
Another natural question one can ask, and this is where ChatGPT came in, is how large a diameter you need if you want a set with and having prescribed sizes. (Of course, the size of must belong to .) Nathanson showed that for every there is a subset of with and , and asked whether the bound could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is clearly best possible. It wrote up its argument in a slightly rambling LLM-ish style, so I asked if it could write the argument up as a LaTeX file in the style of a typical mathematical preprint. After two minutes and 23 seconds it gave me that, after which I spent some time convincing myself that the argument was correct.
The basic idea behind both Nathanson’s argument and ChatGPT’s was that in order to obtain a set of a given size with a sumset of a given size, it is useful to build it out of a Sidon set, which means a set with sumset of maximal size (that is not quite the usual definition but it is the simplest to use in this discussion), and an arithmetic progression. Also, for a bit of fine tuning one can take an additional point near the arithmetic progression. Then if one plays around with the various parameters, one finds that one can obtain sets of all the sizes one wants. Nathanson doesn’t express his argument this way (it is Theorem 5 of this paper), instead giving an inductive argument, but I think, without having checked too carefully, that if one unravels his argument, one finds that effectively that is what he ends up with, and the Sidon set in question consists of powers of 2. ChatGPT obtained its improvement by simply using a more efficient Sidon set — it is well known that one can find Sidon sets of quadratic diameter. (One might ask why Nathanson didn’t do that in the first place: I think it is because the obvious idea of using a more efficient Sidon set becomes obvious only after one has redescribed his inductive construction. Is that what ChatGPT did? It is very hard to say.)
Next, I asked ChatGPT to see whether it could do the same for a closely related question, where instead of looking at the size of the sumset, one looks at the size of the restricted sumset, which is defined to be . Unsurprisingly, it was able to do that with no trouble at all. I got it to write both results up in a single note, to avoid a certain amount of duplication. If you are curious, you can see the note here.
I then asked what it could do for general . I was much less optimistic that it would manage to do anything interesting, because the proof for makes fundamental use of the fact (due to Erdős and Szemerédi) that we know exactly which sizes we need to create. If we don’t know what the set is, then it seems that we are forced to start with a hypothetical set with and and build out of it a set of small diameter with the same property. As it happens, I still don’t know how to get round that difficulty (I’m mentioning that just to demonstrate that my mathematical input was zero, and I didn’t even do anything clever with the prompts), but Nathanson mentioned in his paper a remarkable paper of Isaac Rajagopal, a student at MIT, who must have got round the difficulty somehow, because he had managed to prove an exponential dependence of on for each fixed .
I’ll leave the previous paragraph there, but Isaac has subsequently explained to me that that isn’t really the difficulty. His argument gives a complete description of when is sufficiently large, and if one wants to prove a polynomial dependence for fixed , then assuming that is sufficiently large is clearly permitted. The real difficulty is that constructing the sets with given sumset sizes was significantly more complicated, and necessarily so because the degree of the polynomial grows with , and one therefore needs more and more parameters to define the sets.
In any case, the task faced by ChatGPT was not to solve the problem from scratch, but to see whether it was possible to tighten up Isaac Rajagopal’s argument. Here’s what happened.
- After 16 minutes and 41 seconds, it came back with an argument that claimed to have improved the upper bound from exponential in to exponential in for any .
- I asked it to write that in preprint form too, which took it a further 47 minutes and 39 seconds.
- That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
- Both ChatGPT and Rajagopal speculated a little on what might need to be done to push things further and get a polynomial bound, so I got greedy and asked ChatGPT to give that a go.
- After 13 minutes and 33 seconds it told me it felt optimistic about the existence of such an argument but there were a couple of technical statements that needed checking.
- I asked it to check them.
- After 9 minutes and 12 seconds it got back to me with the check having been done, so I asked for this too to be written in preprint form.
- After 31 minutes and 40 seconds the “preprint” was ready. Here it is.
- Isaac Rajagopal looked at it and declared it to be almost certainly correct. It was clear that he meant this not just at a line-by-line level but at the level of ideas.
Isaac made some very interesting remarks about the nature of what the additional ideas were that ChatGPT contributed. Since, as I have already said, my mathematical input was zero, I invited him to write a guest section to this post. Just before we get to that, I want to raise a question (that will undoubtedly have been raised by others as well), which is simple: what should we do with this kind of content? Had the result been produced by a human mathematician, it would definitely have been publishable, so I think it would be wrong to describe it as AI slop. On the other hand, it seems pointless even to think about putting it in a journal, since it can be made freely available, and nobody needs “credit” for it (except that Isaac deserves plenty of credit for creating the framework on which ChatGPT could build). I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live. But various decisions would need to be made about how it was organized. I myself think that one would probably want to have some kind of moderation process, so that results would be included only if a human mathematician was prepared to certify that they were correct — or, better still, that they had been formalized by a proof assistant — and perhaps also that they answered a question that had been asked in a human-written paper. On the other hand, I wouldn’t want a moderation process that created vast amounts of work (unless the work was itself done by AI, but there are obvious dangers in going down that route). Anyway, until these questions are answered, this result is available from the link above, and perhaps, now that LLMs are so good at literature search, that will be enough to make it findable by anyone who wants to know whether Nathanson’s problem has been solved.
Isaac’s evaluation of what ChatGPT achievedWith just a few prompts, ChatGPT was able to improve the upper bound on (which I will define very soon) from exponential in to polynomial in . While its first improvement of the bound, from exponential in to exponential in , was a routine modification of my work, the improvement to polynomial in is quite impressive. To do this, ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof. My goal is to explain that idea, in a manner that will be digestible to my friends who are computer science majors as well as my math major friends.
The problem of bounding is closely related to a problem I worked on at the Duluth REU (Research Experience for Undergrads) program, of determining . In particular, is the set of possible -fold sumset sizes , where can be chosen to be any set of integers. is the minimal such that we can achieve all of the values of using -element sets . I spent last summer explicitly characterizing the set for large , by constructing sets such that achieves all sizes which I could not rule out as impossible. So, can be upper-bounded by optimizing my constructions.
I constructed these sets by combining smaller component sets which are simpler to analyze. Some of these components are the geometric series
for various values of and . Unfortunately, the elements of and are exponentially large in terms of . So, I asked ChatGPT (through Tim) whether there exist sets of elements which have similar sumset sizes to these geometric series, but contain only numbers of polynomial size in : I had no idea if this was possible, or how to begin constructing such sets. ChatGPT came back with an answer, constructing sets and which behave like “half a geometric series squeezed into a polynomial interval,” which is counterintuitive. Before I discuss the construction of and , I will explain the important properties of the sumset sizes of and which they recreate.
For , a set is called a set if the only solutions to
with in are the “trivial” solutions, by which I mean that one side of the equation is a reordering of the other side. If is a set of size , then elements of correspond exactly to choices of elements of , with repetition allowed. Using “stars and bars,” one can see that and this is the maximum possible value of among sets of size . So, another definition is that is a set if . Sidon sets, which Tim discussed, are exactly sets.
To make things more concrete, let us assume that in (1). Then, is a set, but it is not a set because of the relations
for any choice of in . In particular, , as these relations are the only ones preventing from being a set. lacks the relations in (2) because is not in . So, is a set, but it is not a set because of the relations
for any choices of in . This gives relations, and one can check that . To summarize, we have seen that
(a) is a set.
(b) is a linear function of .
(c) is a set.
(d) is a quadratic function of .
ChatGPT was able to find sets and of elements which satisfy (a)-(d), but whose elements all have polynomial size in . The construction of and uses -dissociated sets, which are sets where the only solutions to
with and in are the “trivial” solutions, i.e. and one side of the equation is a reordering of the other side. For , it is possible to construct an -dissociated set , where is approximately , and in particular polynomial in . Constructions of such a using finite fields date back to Singer (1938) and Bose–Chowla (1963) and are described in Appendix 1. Define
and
In hindsight, I have good intuition for the construction of and . All of the relations in (2) and (3) are formed by combining one or two relations of the form . There are approximately relations of the form in and , and approximately such relations in and . There are few other low-order relations in and , and similarly in and because is -dissociated. So, and manage to contain half as many -relations as their geometric series counterparts, while also containing few low-order relations.
We now see why (a)-(d) hold with and replaced by and , respectively. For concreteness, we assume that and , so contains no nontrivial relations as in (4) with . Then, is a set, but it is not a set because of the relations
for any choice of in . If we let , we can check that is linear in . In particular, (a) and (b) hold with replaced by , and the linear function replaced by . We can also see that is a set, but it is not a set because of the relations
for any in . If we let , we can check that is quadratic in . In a similar manner, (c) and (d) hold with replaced by , and the quadratic function replaced by .
Even though I can motivate it in retrospect, ChatGPT’s idea to use -dissociated sets to control relations of order at most feels quite ingenious. As far as I can tell, this idea is completely original.
ChatGPT’s proof that its construction produces the desired values of is very similar to my proof that the sets which I construct achieve all possible values of , after replacing and by and , respectively. Properties (a)-(d) capture many of the important properties of and (or and ) which are used in this proof. The final constructions involve combining the sets and (or and in my paper) for each value of between and with another set which is the union of an arithmetic progression and a point. Intuitively, and (or and ) have large sumsets, while arithmetic progressions have small sumsets, so it is plausible that one could get sets which achieve all the medium-sized sumsets by combining them. However, the proof of this is quite involved, and it occupies Section 4 of my paper and the entirety of the ChatGPT preprint. In Appendix 2, I work out the details of the ChatGPT construction to show that for sufficiently large,
For comparison, it is easy to see that is at least on the order of , and it is unknown what the real value is. In Appendix 3, I give details of the correspondence between my paper and the ChatGPT preprint, which will be helpful for those who want to read either.
Finally, I want to express my deep gratitude to Tim for allowing me to contribute to this blog. I am still stunned by the coincidence that the problem he chose to put into ChatGPT 5.5 Pro led him to my paper on the arXiv.
Tim on what this means for mathematical researchI would judge the level of the result that ChatGPT found in under two hours to be that of a perfectly reasonable chapter in a combinatorics PhD. It wouldn’t be considered an amazing result, since it leant very heavily on Isaac’s ideas, but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper, looking for places where it might not be optimal, familiarizing oneself with various algebraic techniques that he used, and so on.
It seems to me that training beginning PhD students to do research, which has always been hard (unless one is lucky enough, as I have often been, to have a student who just seems to get it and therefore doesn’t need in any sense to be trained), has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.
I would qualify that statement in two ways though. First, there is the obvious point that a beginning PhD student has the option of using LLMs. So the task is potentially easier than proving something that LLMs can’t prove: it is proving something in collaboration with LLMs that LLMs cannot manage on their own. I have done quite a lot of such collaboration recently and found that LLMs have made useful contributions without (yet) having game-changing ideas.
A second point is that I don’t know how much of what I have said generalizes to other areas of mathematics. Combinatorics tends to be quite focused on problems: you start with a question and you reason back from the question or if you reason forwards you do so very much with the question in mind. In other areas there can be much more of an emphasis on forwards reasoning: you start with a circle of ideas and see where it leads. To do it successfully, you need to have some way of discriminating between interesting observations and uninteresting ones, and it isn’t obvious to me what LLMs would be like at that.
Of course, everything I am saying concerns LLMs as they are right now. But they are developing so fast that it seems almost certain that my comments will go out of date in a matter of months. It is also almost certain that these developments will have a profoundly disruptive effect on how we go about mathematical research, and especially on how we introduce newcomers to it. Somebody starting a PhD next academic year will be finishing it in 2029 at the earliest, and my guess is that by then what it means to undertake research in mathematics will have changed out of all recognition.
I sometimes get emails from people who are interested in doing mathematical research but are not sure whether that makes sense any more as an aspiration. I have a view on that question, but it may very well change in response to further developments. That view is that there is still a great deal of value in struggling with a mathematics problem, but that the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end. So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody. Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.
So what is the point of struggling with a difficult mathematics problem? One answer is that it can be very satisfying to solve a problem even if the answer is already known, but I don’t think that is a sufficient reason to spend several years of your life on this peculiar activity. A better answer is that by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders, or people who have a solid grasp of how to do basic arithmetic are likely to be more skilled at using calculators (and especially at noticing when an answer feels off). Mathematics is a highly transferable skill, and that applies to research-level mathematics as well. By doing research in mathematics, you may not get the same rewards as your equivalents a generation ago, but there is a good chance that you will be equipping yourself very well for the world we are about to experience.
Appendix 1 (Isaac)We will construct an -dissociated set , where is approximately . This construction is a very minor modification of Bose–Chowla (1963)’s construction of a set, which I learned about from this paper. For whatever reason, the GPT preprint (Lemma 3.1) uses a different, less efficient construction using moment curves.
Let be a prime, let , let be the finite field with elements and fix a generator of , so that is equal to . Define a set of elements
Then, each element corresponds to a unique value of , by taking . Now an additive relation of the form in (4) with can be reframed by taking powers of as
As is a degree- extension of and is a generator of as an -extension, this means that does not satisfy any nonzero polynomials in of degree . So, both sides of (6) are identical as polynomials in and thus the additive relation in (4) is trivial. So, is -dissociated, and of course one can prune a few elements to reduce to size .
Appendix 2 (Isaac)Fix constants such that (in my paper I arbitrarily chose ). Let the two sets in (5) be called and . Let denote the set of integers satisfying . Similarly to my paper, the constructions of such that achieves the desired sizes will combine sets of the following four types:
- with choices of and .
- for each value of , with choices of .
- for each value of , with choices of .
- A set of the correct size so that .
One reason that this construction needs to be complicated is that we need to create at least many sets. To do this, we vary parameters and in the domain and parameters and in the domain . We can choose to be slightly bigger than , and then the above construction gives us different sets where can be made arbitrarily small. So, if we were to remove any of the above parameters from the construction, and not change the others, this construction would no longer create many sets. In comparison, Nathanson’s construction when only needs to create sets. He does this by combining a Sidon set, an arithmetic progression, and one extra value, and varying the size of the arithmetic progression and the extra value in ranges of size .
We want to combine sets , which are given by , for the values of , for the values of , and a set. By Appendix 1, for all , there exists a -dissociated set of diameter . By the constructions of and , we can take each , where . Let have basis vectors . To combine , we can define as
Similarly to my Lemma 4.9, this construction ensures that the generating function product holds, which is the identity that both my paper and the GPT preprint use (see either paper for a definition of these generating functions). By (the standard) Lemma 2.3 of the GPT preprint, is Freiman-isomorphic of order to a subset of . Therefore, for sufficiently large (the whole construction relies on this for the same reasons as in my paper),
Appendix 3 (Isaac)In Section 4.2 of my paper, I use a different, simpler construction to construct sets achieving the values in which have , for some small . These sets are subsets of , meaning that all elements have polynomial size in . This is observed in Section 5 of the GPT preprint.
Section 4.3 of my paper carries out the construction which combines many components including and . This corresponds to Sections 2, 3, 4, and 6 of the GPT preprint. This section has a lot of moving parts; I give an outline in Section 4.3.1.
In Section 4.3.2, I describe how the different components will be combined, using a construction which I call the disjoint union, and introduce generating functions as a bookkeeping tool to keep track of the sumset sizes of a set . This corresponds to Section 2 and Section 4 of the GPT preprint.
In Section 4.3.3, I compute the generating function of each of the component sets, including (Lemma 4.15) and (Lemma 4.17). This corresponds to Section 3 and Section 6.1 of the GPT preprint. In particular, is computed in Lemma 3.3 and is computed in Lemma 3.4. Once these generating functions have been computed, the remainder of the proof is almost identical in my paper and in the GPT preprint.
In Section 4.3.4, I put all the pieces together to show that as we range over the sets which I have constructed, the values of will assume all of the elements of . The key idea is to show that the set of all values of forms an interval, and contains numbers both smaller than and equal to .
Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond
Boris Alexeev, Kevin Barreto, Yanyang Li, Jared Duker Lichtman, Liam Price, Jibran Iqbal Shah, Quanyu Tang, and I have just uploaded to the arXiv our paper Primitive sets and von Mangoldt chains: Erdős Problem #1196 and beyond. This paper (which is a work in progress) represents our efforts to digest and document the recent flurry of developments around the following problem of Erdős, Sárközy, and Szemerédi on primitive sets:
Conjecture 1 (Erdős problem #1196) Suppose that is a primitive set of integers, which means that no element of divides another. Then as .
One can show that the upper bound of is best possible up to the error by taking to be the set of products of primes for some suitable parameter . This was one of the most well-known open problems in the study of primitive sets, and had attracted some number of partial results (for instance, Lichtman was able to show the upper bound of ). It was thus notable that this problem was first solved by an autonomous AI query (by the fifth author) a few weeks ago. This solution introduced a proof technique – based on Markov chains in the divisibility poset – which in retrospect is very natural for controlling primitive sets, but which had not been explicitly used in previous literature, though in retrospect many of the arguments in that literature involved a specific Markov chain which we call the downwards Mertens chain. The proof instead revolved around a different Markov chain, which we call the downwards von Mangoldt chain, which manages to neatly avoid the “” type losses in the previous Mertens-based arguments, and resolve Conjecture 1. In this paper we develop the Markov chain approach more systematically, and show that it settles several further conjectures concerning primitive sets, and also provides simpler proofs of some previous results in the literature. More precisely, in addition to Conjecture 1, we establish the following:
Theorem 2 (Erdős primitive set conjecture, #164) For any primitive consisting of numbers greater than ,
Theorem 3 (Odd Banks–Martin) Let and suppose is a primitive set consisting of odd numbers with at most prime factors. Then
where denotes the primes appearing as factors of elements of , and is the collection of products of primes from .
Theorem 4 ( is Erdős-strong) If is a primitive set consisting of even numbers, then
Theorem 5 (Ahlswede–Khachatrian–Sárközy) If is a primitive set, then whenever .
Theorem 6 (Erdős–Sárközy–Szemerédi, #1217) Let be such that the upper doubly logarithmic density is positive. Then there exists a strictly increasing infinite divisibility chain in such that
Theorem 2 and Theorem 5 had been previously established by Lichtman and Ahlswede–Khachatrian–Sárközy respectively, but the Markov chain formalism gives shorter (and more unified) proofs of both. Theorems 3, 4, 6 were open conjectures that can now be settled by this method. These results were obtained with varying levels of AI involvement, ranging from completely autonomous AI queries to traditional pen-and-paper calculations, to various hybrid approaches (for instance, with humans suggesting key inequalities that could then be rapidly tested numerically or even proved by various AI tools).
— 1. Chain/antichain duality and Markov chains —
I’ll now discuss the basic method of proof and try to motivate the main ideas, which have become much clearer in retrospect. Primitive sets can be viewed as antichains in the divisibility poset , in which the partial ordering is given by the divisibility relation . So, one can pose the following more abstract question: given a general poset and a weight function , what is the maximal value of as ranges over all antichains in ?
One can attack this problem using the well known duality between antichains and chains (totally ordered subsets of ): every antichain and chain can meet in at most one point, thus one has
for any chain and any antichain . In particular, if one has a measure on the space of all chains (viewed as a compact subspace of the power set of , equipped with the product topology) with the property that for all , then by integrating the previous inequality against and using Tonelli’s theorem one would obtain the upper boundIn fact this duality is completely tight:
Proposition 7 (Chain/antichain duality) Let be a poset, let be a weight function, and let . Then the following are equivalent:
- (i) for all antichains .
- (ii) There exists a measure of total mass at most on the space of chains, such that (1) holds for all .
Proof: We have already indicated how (ii) implies (i). Now we need to show that (i) implies (ii). A standard compactness argument allows us to reduce to the case when is finite. If (i) holds, then we also have
for all in the Stanley chain polytope, defined as the convex hull of the indicator functions of antichains. By a classic result of Stanley, this polytope can also be defined as the space of all obeying the inequalities and Applying linear programming duality (or the Farkas lemma), we conclude that the inequality (2) must be a non-negative linear combination of the inequalities (3), (4) (as well as the trivial inequality ). Equivalently, we can find non-negative weights for each chain such that and for all . The claim follows by viewing as a measure on the space of chains.Thus, the “universal” problem of obtaining a uniform upper bound on for all antichains is replaced with the equivalent “existential” dual problem of exhibiting a single measure on chains of controlled mass, which hits each element of the poset with a mass of at least the original weight . Thus, such problems are now reduced to that of finding a sufficiently clever construction of such a measure . If the mass of was normalized to equal , this becomes a probability problem: find a random chain process in the poset that hits each element of the poset with a sufficiently high probability. (Though in our paper we found it more convenient for technical reasons to not normalize the measure, and allow the mass to take values other than .)
It turns out (in a manner that was not explicitly appreciated in past literature) that particularly good choices of random chain to use here can come from Markov chains. (Here, the term “chain” is now being used in two different ways, but fortunately the order-theoretic concept of a chain and the Markov process-theoretic concept of a chain will be quite compatible in this discussion.) There will be two types of Markov chains on the poset that will be relevant: downwards Markov chains and upwards Markov chains. Here is our notation for a downwards Markov chain:
Definition 8 (Downwards Markov chain) Let be a poset, and suppose we designate some subset of to be the “absorbing states” (in practice these will be the minimal elements of , although they do not have to be). A downwards Markov chain on with absorbing states is a collection of transition probabilities for obeying the following axioms:
- (i) , with equality unless and , or if and .
- (ii) For any , one has .
Given such a downwards Markov chain and an initial state , one can generate a random decreasing sequence by having each transition to with probability after conditioning on the past history of the chain. This sequence will (almost surely) be strictly decreasing until it hits an absorbing state, in which case it stays there forever, although if the descending chain condition is not satisfied it is also possible for the sequence to be strictly decreasing indefinitely. We let denote the law of this decreasing sequence. This construction is already enough to recover the Lubell-Yamomoto-Meshalkin (LYM) inequality:
Theorem 9 (LYM inequality) If is the power set of with the inclusion partial ordering, and is an antichain in this poset (i.e., a Sperner family), then
Proof: We introduce the downwards Markov chain with absorbing state in which each non-empty subset of of some cardinality transitions to a -element subset chosen uniformly at random (i.e., with probability of transitioning to each). If we start the descending sequence from the maximal element of , then one can easily check that each is hit with probability . Applying Proposition 7 with , , and , we obtain the claim.
In the above argument we fixed the initial location of the Markov chain, but more generally one can start with any source mass and work with the measure for the purposes of applying Proposition 7.
One can also define upwards Markov chains in exact analogy with downwards Markov chains (reversing the order in the poset), which now generate random increasing sequences rather than decreasing sequences. There is a useful adjoint construction that can convert a downwards Markov chain into an upwards Markov chain: if we have a positive weight which is invariant under the chain in the sense that
for any , then we can define an adjoint upwards Markov chain (with no absorbing state) by the formula for any in . More generally, if is merely sub-invariant in the sense that for all , one can still construct an adjoint upwards chain as before, but now one must also add an additional absorbing maximal state to ensure that the transition probabilities still sum to one.A downwards or upwards Markov chain, when equipped with an invariant or sub-invariant measure, also induces a flow network on the poset, in which an edge from to is assigned a flow capacity of
One can rewrite the Markov chain arguments in the paper in terms of such flow networks, in which case the arguments often boil down to an application of the discrete divergence theorem, giving very short proofs of many of the above results; see the paper for more discussion. However, we chose to focus more on the Markov chain approach in our presentation, as this formalism is also natural and could potentially be more flexible for further applications.
— 2. The Mertens and von Mangoldt chains —
For the purpose of analyzing primitive sets, there are two downward chains on the natural numbers (with absorbing state )that, in retrospect, are particularly natural to use:
- The downwards Mertens chain, in which each transitions deterministically to , where is the largest prime factor of ; and
- the downwards von Mangoldt chain, in which each transitions to with probability for each dividing , with the von Mangoldt function.
The von Mangoldt weight is a natural choice here thanks to the fundamental identity
which encodes the fundamental theorem of arithmetic. The two chains are similar in many way to each other: the von Mangoldt process favors the division by the largest prime factor, but does not require it.The Mertens chain generates deterministic downward divisibility chains
starting from a product of primes , and as such this process was implicit in much of the previous literature on primitive sets. However, it does not quite interact well with the weight, which is not invariant or subinvariant with respect to this process. Intuitively, the process of dividing by tends to increasingly select for numbers for which is smaller than expected. Instead, the natural invariant measure for this chain is the Mertens weight the verification that this is indeed an invariant weight is a nice exercise in telescoping series. Taking the adjoint of the downwards Mertens chain with respect to this weight and running that chain from gives the upwards Mertens divisibility chain in which each transitions to for some prime with probability . A routine induction shows that each is hit by this chain with a probability of ; this for instance gives a weak version of Theorem 1, and similarly for the other results discussed above.The key innovation (which was uncovered by the AI-assisted proofs, though not quite in the notation and framework presented here) is to switch to the von Mangoldt chain, which removes the bias towards numbers whose largest prime factor is small. Indeed, the weight now turns out to be sub-invariant (after removing ) under this chain, and there is a modification
of this weight which turns out to be perfectly invariant. (We have an interpretation of this formula in terms of a zeta process that couples together various zeta distributions into a continuous divisibility chain; see the paper for further details.) Taking the adjoint with respect to this weight (or with the original weight ) can eliminate the loss in the previous argument, and give one of the proofs of Theorem 1 recorded in our paper (there are several other variants of this method that we also present).One slight defect of the von Mangoldt chain, as compared to the Mertens chain, is that it can “jump over” primitive sets (such as the set of products of primes) due to the fact that it will sometimes multiply or divide by a power of a prime rather than a prime itself. This turns out to be a technical difficulty for many of our applications, resulting in a need to make various small ad hoc modifications to the von Mangoldt chain to eliminate this type of jump.
In order to establish some crucial sub-invariance properties, it turns out (after some standard manipulations) to be useful to obtain good bounds on the negative log-derivative
of the Riemann zeta function in the region . Here it turns out that there is a clean (and rather efficient) upper bound which is equivalent to the non-decreasing nature of the Dirichlet eta function in this region. There are many proofs of this fact in the literature, but I would like to record a particularly cute proof, that is in the spirit of other arguments in the paper: one can interpret probabilistically as where is a gamma random variable of shape and scale . Because the sum of independent copies of and has the distribution of , one can couple together all the so that they are increasing in , at which point the claim follows.
— 3. Further directions —
Will Sawin and Ofir Gorodetsky have obtained analogues of several of the above results for function field models or permutation models respectively; we briefly discuss these in the paper, although we do not plan to cover these models in depth. We also note another recent use of this technique to solve a separate Erdős problem (#858) relating to antichains in a variant of the divisibility poset.
The zeta process that we have discovered hints at an emerging theory of the “developmental anatomy of integers”, which differs from the existing topic of anatomy of integers in that it views a large integer (and its prime factor “organs”) not as a static entity, but rather as an evolving process in which primes (or powers of primes, in the case of the von Mangoldt process) are added or removed to the integer over time. With this perspective, primitive sets can be viewed as singular moments in such a developmental process, which are only encountered at most once in the life cycle of a given integer. It seems of interest to study this developmental perspective further.
The paper is currently a work in progress; we have released an early version due to the public interest in this problem. We plan to explore some further applications, and also to formalize more of the above results in Lean (currently two of the six main theorems are formalized), before submitting the paper for publication. The situation here highlights a distinction I have recently made between three components of the problem solving process in mathematics, namely proof generation, proof verification, and proof digestion. In this particular case, the first two steps were extremely rapid due to modern AI tools; however, properly digesting the AI-generated proofs into a coherent exposition that places the arguments in context with both past literature and future directions remains a slower process that requires expert human attention.
Products of consecutive integers with unusual anatomy
I’ve just uploaded to the arXiv my paper “Products of consecutive integers with unusual anatomy“. This paper answers some questions of Erdős and Graham which were initially motivated by the study of the Diophantine factorial equation
where and are positive integers. Writing , one can rewrite this equation as where denotes the squarefree part of (the smallest factor of formed by dividing out a perfect square). For instance, we have which corresponds to the solution to the original equation.The equation (1) ties into the general question of what the anatomy (prime factorization) of the product looks like. This is a venerable topic, with the first major result being the Sylvester-Schur theorem from 1892 that the largest prime factor of is greater than whenever . Another notable result is the Erdős-Selfridge theorem that the product is never a perfect power for .
Erdős and Graham were able to show that solutions to (1) were somewhat rare, in that the set of possible values of had density zero. For them, the hardest case to treat was when the interval was what they called bad, in the sense that was divisible by the square of its largest prime factor. They were able, with some effort, to show that the union of all bad intervals also had density zero, which was a key ingredient in to prove the previous result about solutions to (1). They isolated a subcase of the bad intervals, which they called the very bad intervals, in which the product was a powerful number (divisible by the square of every prime factor).
A later paper of Luca, Saradha, and Shorey made the bounds more quantitative, showing that both the set of values of , as well as the union of bad intervals, had density for some absolute constant . In the other direction, just by considering the case , one can show that the number of possible values of up to is , where is the constant
As for the bad intervals, by again considering the case , it is possible to show that the number of bad points up to is see for instance this paper of Ivic. Similarly, the union of the very bad intervals contains as the set of powerful numbers; Golomb worked out that the number of powerful numbers up to is .It was conjectured by Erdős and Graham that all of these lower bounds are in fact sharp (up to multiplicative factors); this is Erdos Problem 380 (and a portion of Erdos Problem 374). The main result of this paper is to confirm this conjecture in two cases and come close in the third:
Theorem 1
- The number of numbers up to that lie in a bad interval of length is of the number of bad points up to .
- The number of numbers up to that lie in a very bad interval of length is .
- The number of numbers up to of the form for a solution to (1) is .
Not surprisingly, the methods of proof involve many standard tools in analytic number theory, such as the prime number theorem (and its variants in short intervals), zero density estimates, Vinogradov’s bounds on exponential sums, asymptotics for smooth numbers, the large sieve, the fundamental lemma of sieve theory, and the Burgess bound for character sums. There was one point where I needed a small amount of algebraic number theory (the classification of solutions to a generalized Pell equation), which was the one place where I turned to AI for assistance (though I ended up rewriting the AI argument myself). One amusing point is that I specifically needed the recent zero density theorem of Guth and Maynard (as converted to a bound on exceptions to the prime number theorem in short intervals by Gafni and myself); previous zero density theorems were barely not strong enough to close the arguments.
A few more details on the methods of proof. It turns out that very bad intervals, or intervals solving (1), are both rather short, in that the bound holds. The reason for this is that the primes that are larger than (in the very bad case) or for a large constant (in the (1) case) cannot actually divide any of the unless they divide it at least twice. This creates a constraint on the fractional parts of and that turns out to be inconsistent with the equidistribution results on those fractional parts coming from Vinogradov’s bounds on exponential sums unless is small. In the very bad case, this forces a linear relation between two powerful numbers; expressing powerful numbers as the product of a square and a cube, matters then boil down to counting solutions to an equation such as
with say . The number of solutions here turns out to be by work of Aktas-Murty and of Chan; we generalize the arguments in the former to handle a slightly more general equation. A similar argument handles solutions to (1), except in one regime where the parameter is somewhat large (comparable to ), in which case one instead collects some congruence conditions on and applies the large sieve.The situation with bad intervals is more delicate, because there is no obvious way to make small in all cases. However, by the large sieve (as well as the Guth–Maynard theorem), one can show that the contribution of large is negligible, and from bounds on smooth numbers one can show that the interval contains a number with a particularly specific anatomy, of the form where are all primes of roughly the same size, and is a smoother factor involving smaller primes. The rest of the bad interval creates some congruence conditions on the product . Using some character sum estimates coming from the Burgess bounds, we find that the residue of becomes fairly equidistributed amongst the primitive congruence classes to a given modulus when one perturbs the primes randomly (there are some complications from exceptional characters of Siegel zero type, but we can use a large values estimate to keep their total contribution under control). This allows us to show that the congruence conditions coming from the bad interval are restrictive enough to make non-trivial bad intervals quite rare compared to bad points. One innovation in this regard is to set up an “anti-sieve”: the elements of a bad interval tend to have an elevated chance of being divisible by small primes, and one can use moment methods to show that an excessive number of small prime divisors is somewhat rare. This can be compared to standard sieve arguments, which often seek to limit the event that a number has an unexpectedly deficient number of small prime divisors.
Mathematical methods and human thought in the age of AI
Tanya Klowden and I have uploaded to the arXiv our preprint “Mathematical methods and human thought in the age of AI“. This is an unabridged version of a solicited article for a forthcoming Blackwell Companion to the Philosophy of Mathematics. I rarely write article-length essays of a philosophical nature (perhaps the last one was in 2007), but given the topical interest in AI and formalization for mathematics, which has begun to raise increasingly fundamental questions about what the nature, purpose and practice of mathematics actually is (or ought to be), it seemed like it was a timely opportunity to write about these matters. Other mathematicians seem to have recently come to this conclusion also; see for instance this paper of Avigad, or this paper of Commelin, Jamnik, Ochigame, Taelman, and Venkatesh, both of which have come out in the last few weeks.
Our piece took over a year to write – which means, at the current pace of development in the field, that some of it is already slightly out of date. Nevertheless, it was an instructive exercise for both of us to try to look beyond the immediate technical issues presented by current AI and formalization tools and try to point out the philosophical questions that we will have to grapple with as these tools become increasingly capable and integrated into our profession, using prior examples of technological advancement as a guide. We don’t pretend to have definitive answers to most of these questions, but as with mathematics itself, the first step is to pose the questions and then try to make partial progress on them (or at least identify some negative results and eliminate some failed approaches). One point we particularly felt worth stressing is that AI tools and applications (in mathematics or elsewhere) should not be viewed purely through the technical lens of what microscale problems they solve and how effective or efficient they are at solving them, but also through the macroscopic humanitarian lens of how our society, our shared body of knowledge and understanding, and our species benefits (or is harmed) as a whole from these technologies.
Our initial submission ended up significantly exceeding the page limits of the submission and ventured beyond the philosophy of mathematics into broader philosophical and ethical questions about AI in general. A streamlined version of the paper will appear in the forthcoming companion, but we have decided to make the original longer version available on the arXiv.
Local Bernstein theory, and lower bounds for Lebesgue constants
I’ve just uploaded to the arXiv my paper “Local Bernstein theory, and lower bounds for Lebesgue constants“. This paper was initially motivated by a problem of Erdős} on Lagrange interpolation, but in the course of solving that problem, I ended up modifying some very classical arguments of Bernstein and his contemporaries (Boas, Duffin, Schaeffer, Riesz, etc.) to obtain “local” versions of these classical “Bernstein-type inequalities” that may be of independent interest.
Bernstein proved many estimates concerning the derivatives of polynomials, trigonometric polynomials, and entire functions of exponential type, but perhaps his most famous inequality in this direction is:
Lemma 1 (Bernstein’s inequality for trigonometric polynomials) Let be a trigonometric polynomial of degree at most , with for all . Then for all .
Similar inequalities concerning norms of derivatives of Littlewood-Paley components of functions are now ubiquitious in the modern theory of nonlinear dispersive PDE (where they are also called Bernstein estimates), but this will not be the focus of this current post.
A trigonometric polynomial of degree is of exponential type in the sense that for complex . Bernstein in fact proved a more general result:
Lemma 2 (Bernstein’s inequality for functions of exponential type) Let be an entire function of exponential type at most , with for all . Then for all .
There are several proofs of this lemma – see for instance this survey of Queffélec and Zarouf. In the case that is real-valued on , there is a nice proof by Duffin and Schaeffer, which we sketch as follows. Suppose we normalize , and adjust by a suitable damping factor so that actually decays slower than as . Then, for any and , one can use Rouche’s theorem to show that the function has the same number of zeroes as in a suitable large rectangle; but on the other hand one can use the intermediate value theorem to show that has at least as many zeroes than in the same rectangle. Among other things, this prevents double zeroes from occuring, which turns out to give the desired claim after some routine calculations (in fact one obtains the stronger bound for all real ).
The first main result of the paper is to obtain localized versions of Lemma 2 (as well as some related estimates). Roughly speaking, these estimates assert that if is holomorphic on a wide thin rectangle passing through the real axis, is bounded by on the intersection of the real axis with this rectangle, and is “locally of exponential type” in the sense that it is bounded by on the upper and lower edges of this rectangle (and obeys some very mild growth conditions on the remaining sides of this rectangle), then can be bounded by plus small errors on the real line, with some additional estimates away from the real line also available. The proof proceeds by a modification of the Duffin–Schaeffer argument, together with the two-constant theorem of Nevanlinna (and some standard estimates of harmonic measures on rectangles) to deal with the effect of the localization. (As a side note, this latter argument was provided to me by ChatGPT, as I was not previously aware of the Nevanlinna two-constant theorem.)
Once one localizes this “Bernstein theory”, it becomes suitable for the analysis of (real-rooted, monic) polynomials of a high degree , which are not bounded globally on (and grow polynomially rather than exponentially at infinity), but which can exhibit “local exponential type” behavior on various intervals, particularly in regions where the logarithmic potential
behaves like a smooth function (here is the empirical measure of the roots of ). A key example is the (monic) Chebyshev polynomials , which locally behave like sinusoids on the interval (and are locally of exponential type above and below this interval):
This becomes relevant in the theory of Lagrange interpolation. Recall that if are real numbers and is a polynomial of degree less than then one has the interpolation formula
where the Lagrange basis functions are defined by the formula In terms of the monic polynomial , we can write The stability and convergence properties of Lagrange interpolation are closely related to the Lebesgue function and for a given interval , the quantity is known as the Lebesgue constant for that interval.If one chooses the interpolation points poorly, then the Lebesgue constant can be extremely large. However, if one selects these points to be the roots of the aforementioned monic Chebyshev polynomials, then it is known that for all fixed intervals in . In the case , it was shown by Erdős} that this is the best possible value of the Lebesgue constant up to errors for interpolation on , thus
whenever (a more precise bound was later shown by Vertesi). Erdős and Turán then asked if the same lower bound held for more general intervals . This is shown in our paper; a variant integral bound is also established, answering a separate question of Erdős. These lower bounds had previously obtained up to constants by Erdős and Szabados}; the main issue was to obtain the sharp constant in the main term.In terms of the monic polynomial , these two estimates can be written as
and Using the intuition that should behave locally like a trigonometric polynomial, and performing some renormalizations, one can extract the following toy problem to work with first:Problem 3 Let be a trigonometric polynomial of degree with roots in .
It is easy to check that the lower bounds of and are sharp by considering the case when is a sinusoid .
The bound (3) is immediate from Bernstein’s inequality (Lemma 1). By applying a local version of this inequality, I was able to get a weak version of the claim (1) in which was replaced with ; see this early version of the paper, which was developed through conversations with Nat Sothanaphan and Aron Bhalla. By combining this argument with ideas from the older work of Erdős}, I was able to establish (1).
The bound (2) took me longer to establish, and involved a non-trivial amount of playing around with AI tools, the story of which I would like to share here. I had discovered the toy problem (4), but initially was not able to establish this inequality; AlphaEvolve seemed to confirm it numerically (with sinusoids appearing to be the extremizer), but did not offer direct clues on how to prove this rigorously. At some point I realized that the left-hand side factorized into the expressions and , and tried to bound these expressions separately. Perturbing around a sinusoid , I was able to show that the norm was a local minimum as long as one only perturbed by lower order Fourier modes, keeping the frequency coefficients unchanged. Guessing that this local minimum was actually a global minimum, this led me to conjecture the general lower bound
whenever was a degree trigonometric polynomial with highest frequency components . AlphaEvolve numerically confirmed this inequality to be likely to be true also. I still did not see how to prove this inequality, but I decided to try my luck giving it to ChatGPT Pro, which recognized it as an approximation problem and gave me a duality-based proof (based ultimately on the Fourier expansion of the square wave). With some further discussion, I was able to adapt this proof to functions of global exponential type (replacing the Fourier manipulations with contour shifting arguments, in the spirit of the Paley-Wiener theorem), which roughly speaking gave me half of what I needed to establish (2). However, I still needed the matching lower bound on the other factor in the toy problem. Again, AlphaEvolve could numerically confirm that this inequality was likely to be true, but now the quantity I was trying to control did not look convex or linear in , and so the previous duality method did not seem to apply. At this point I switched to pen and paper; eventually I realized that the expression almost looked like a sum of residues, and eventually after playing around with contour integrals of using the residue theorem I was able to establish (4), and then with a bit more (human) effort I could move from the toy problem back to the original problem to obtain (2). Quite possibly AI tools would also have been able to assist with these steps, but they were not necessary here; their main value for me was in quickly confirming that the approach I had in mind was numerically plausible, and in recognizing the right technique to solve one part of the toy problem I had isolated. (I also used AI tools for several other secondary tasks, such as literature review, proofreading, and generating pictures, but these applications of AI have matured to the point where using them for this purpose is almost mundane.)
Group and semigroup puzzles and a possible Polymath project
An Artin-Tits group is a group with a finite set of generators in which every relation is of the form or for some positive integer , where and are two of the generators. In particular, the commutation relation is allowed (it is the case of the first type of relation) and so is the braid relation (it is the case of the second type of relation). This means that Artin-Tits groups include free groups, free Abelian groups, and braid groups: for example, the braid group on strands has a presentation with generators , where represents a twist of the th and st strands, and relations if and .
A few weeks ago, I asked ChatGPT for a simple example of a word problem for groups or semigroups that was not known to be decidable and also not known to be undecidable. It turns out that the word problem for many Artin-Tits groups comes into that category: the simplest example where the status is not known is the group with four generators and where and commute and all other pairs of generators satisfy the braid relation .
My interest in this was initially that I was looking for a toy model from which one could learn something about how mathematicians judge theorems to be interesting. I started with a remarkable semigroup discovered by G. S. Tseytin that has five generators, seven very simple relations, and an undecidable word problem. From that I created a puzzle game that you might be interested to play. (NB the puzzles were not showing up in the public version, but that should be sorted out now.) Each puzzle is equivalent to an instance of the word problem for Tseytin’s semigroup, but the interface makes it much more convenient to change words using the relations than it would be to do it with pen and paper.
I was hoping that because every mathematical problem can in principle be encoded as a puzzle in this game, one might be able to build a sort of “alien mathematics”, where a theorem was an equality between words, and a definition was a decision to introduce a new (and redundant) generator g, together with a relation of the form g = w, where w is a word in the current alphabet. Theorems would be particularly interesting if they were equalities between words that could be established only by a chain of equalities that went through much longer words, and definitions would be useful if the new generators satisfied particularly concise relations (which would allow one to build “theories” within the system). I still hope to find a word problem that will allow such a project to take off, but in the end, after a lot of playing around with the game linked to above, I have decided that Tseytin’s semigroup is not suitable. The reason is that it is designed so that arbitrary instances of the word problem for groups can be encoded as word problems in this semigroup, and once one gets used to the game, one starts to see how that encoding can work. Furthermore, one seems to be driven towards the encoding — I don’t get the impression that there’s a whole other region of this semigroup to explore that has nothing to do with the kinds of words that come up in the encoding. And if that impression is correct, then one might as well start with the word problem for some group in unencoded form, or alternatively look for another semigroup. Nevertheless, I find the Tseytin game quite enjoyable: I won’t say more about it here but have written a fairly comprehensive tutorial that you can open up and read if you follow the link above.
This is perhaps the moment to say that the words “I created a puzzle game” are slightly misleading. For one thing, I discussed the idea of gamifying Tseytin’s semigroup about three years ago with Mirek Olšák, a former member of my automatic theorem proving group, and he created a basic prototype in Python. But the main point is that I do not have the programming skills to create a game that can be played in a web browser — I vibecoded it using ChatGPT.
After that experience, I thought that maybe I would have better luck if instead of looking for a group or semigroup with undecidable word problem, which might well have been explicitly designed with some encoding in mind, I looked for a word problem for which the decidability status was unknown. That way, it wouldn’t have been designed to be undecidable, but might nevertheless just happen to be undecidable and provide a nice playground of the kind I was (and still am) after. And that is what led me to the Artin-Tits groups.
However, those don’t seem to be suitable either, because it is conjectured that they all have decidable word problems. I have created a game for the Artin-Tits group mentioned above, which you can also play if you want, but I have found it very hard to create interesting puzzles. (NB again there was a problem with the puzzles showing up, with only a tiny handful being there, but that should now be sorted out.) That is, I found it difficult to find words that are equivalent to the identity but that are not easily shown to be equivalent to the identity. One nice example comes from the fact that the subgroup generated by and is isomorphic to the braid group with four strands. The late Patrick Dehornoy found a very nice example of a braid with four strands that is equal to the identity but not in a completely trivial way. A picture of it can be found on page 4 of this paper.
This is where a potential Polymath project comes in. An initial goal would be to determine the decidability status of this one small Artin-Tits group: if we managed that, then we could consider the more general problem. And the way I envisage approaching this initial goal is an iterative process that runs as follows.
- Devise an algorithm for solving word problems in the group.
- If the current algorithm is , then search for a puzzle that fails to solve.
- If the search is successful, then devise an algorithm that solves all the puzzles that solves, and also solves . Make the current algorithm and return to step 2. Devising can be done by playing the game with puzzle several times until one has the feeling that one has solved it in a systematic way.
- If the search is unsuccessful and the current algorithm is , then attempt to prove inductively that always succeeds: that is, to prove that if solves and is obtained from by applying a single relation in the group, then solves .
- (Maybe) If the inductive step doesn’t seem to work, then try to use that failure to devise a puzzle that defeats and return to step 3.
I have done a couple of iterations already, with the result that I now have an algorithm — let’s call it — that solves (basically instantaneously) every puzzle I throw at it, including the one derived from Dehornoy’s braid mentioned above. So a subgoal of the initial goal is to find a puzzle that has a solution but that doesn’t solve. If we can’t, then maybe it is worth trying to prove that solves the word problem for this group. One comment about the algorithm is that it never increases the length of a word, though it often does moves that preserve the length. I was told by ChatGPT that it is an unsolved problem whether every braid that is equal to the identity can be reduced to the identity without ever increasing the number of crossings. Of course that isn’t a guarantee that it really is unsolved, but if it is, then that’s another interesting problem. It also means that I would be extremely interested if someone found an example of a word in the Artin-Tits group that can be reduced to the identity but only if one starts by lengthening it.
I’m hoping that this will be an enjoyable project for people who like both mathematics and programming. On the maths side, there is an unsolved problem to think about, and on the programming side there are lots of possibilities: for example, one could write programs that explore the space of words equal to the identity, trying to do so in such a way that there is a reasonable chance of reaching a word where it isn’t obvious how to reverse all the steps and get back to the identity. The game comes with a “sandbox” that has a few tools for generating words, but at the moment it is fairly primitive, and I would welcome suggestions for how to improve it.
It seems to me that as Polymath projects go, one can’t really lose: either we find an algorithm for the word problem, which establishes an unknown (at least according to ChatGPT) case of a decidability problem, or we find a suite of harder and harder puzzles, creating a more and more challenging and entertaining game and obtaining a deeper understanding of the Artin-Tits group in the process.
This Artin-Tits game has only a rather rudimentary and not very good tutorial created by ChatGPT. It’s easy to play once you’ve got the hang of it, and in the end I think the easiest way to get the hang of it is to watch someone else play it for a couple of minutes. I have therefore created a video tutorial, which you can find here. The video itself lasts about 25 minutes, but the tutorial part is under 10 minutes: what makes the video longer is an explanation of various features for making it easier to create puzzles, and also an explanation of how the algorithm works, which again is much easier to explain if I can demonstrate it on screen than if I have to write down some text. (I do have some text, since that is what I used as a prompt for ChatGPT to implement the algorithm, but it took a few iterations to get it working properly, so now I’m not sure what the quality of the code will be like, or even whether it is doing exactly what I want, though it appears to be.) Although I have embedded the video into this post, if you actually want to see what is going on you will probably need to watch it on YouTube using the full screen. I recommend not watching the explanation of the algorithm until you have played the game a few times first. Also, I should warn you that the games in the Advanced category are not all soluble, but games 1, 5 and 6 definitely are. Another thing I forgot to say on the video is that if you want you can “rotate” a word by clicking on one of its letters and dragging it to the right or left. If the word is equal to the identity, then its rotations will be as well, so this is a valid move. There is also a button for disabling this rotation facility if you want the puzzle to be slightly more challenging.
A quick note about the availability of the two games. They are hosted on the Netlify platform. I am on their free plan, which gives me a certain number of “credits” each month. I’m not sure how quickly these will run out, largely because I have no idea how many people will actually think it interesting to play the games. If they do run out, then the games will cease to be available until the credits are renewed, which for me happens on the 10th of each month. If this has happened and you are keen to play one of the games, another option is to download the html files and open them in your browser. Here is a link to the Artin-Tits game and here is one to the Tseytin game. If you are feeling particularly public-spirited, especially if you think you will play quite a lot, then you might consider doing that anyway, so that the Netlify credits run down more slowly. If the running out of the credits is quick enough for all this to be a real issue, then I may move to a paid plan.
I’ll finish with a quick tip for playing the Artin-Tits game, which I mentioned on the video but perhaps didn’t stress enough. Many of the moves consist in selecting three consecutive letters and using a relation of the form to change them. Easy examples of this are replacement rules such as . But what if inverses are involved? I’ll represent inverses of generators with upper-case letters, so for example represents , which in the game would be a white followed by a white followed by a black . The word turns out to equal . To remember this, a simple rule is that two letters of the same colour can be bracketed together and “pushed past” the third letter, which retains its colour but changes its value. Here, for example, we write as and then swap them over, changing into in the process, getting , or without the brackets. In group theoretic terms, this is of course saying that and are conjugates, and that is what conjugates one to the other. But when playing the game it is convenient to remember it by thinking that when you see a subword such as , you can push the (and in particular the ) to the left, getting .
Mathematics Distillation Challenge – Equational Theories
Mathematical research traditionally involves a small number of professional mathematicians working closely on difficult problems. However, I have long believed that there is a complementary way to do mathematics, in which one works with a broad community of mathematically minded people on problems which may not be as deep as the problems one traditionally works on, but still are of mathematical interest; and that modern technologies, including AI, are more suitable for contributing the latter type of workflow. The “Polymath projects” were one example of this broad type of collaboration, where internet platforms such as blogs and wikis were used to facilitate such collaboration. Some years later, collaborative formalization projects (such as the one to formalize the Polynomial Freiman–Ruzsa conjecture of Marton, discussed previously on this blog here) became popular in some circles. And in 2024, I launched the Equational Theories Project (ETP) (discussed on this blog here and here), combining the rigor of Lean formalization with “good old fashioned AI” (in the form of automated theorem provers) to settle (with formal verification) over 22 million true-false problems in universal algebra.
Continuing in this spirit, Damek Davis and I are launching a new project, in the form of an experimental competitive challenge hosted by the SAIR Foundation (where I serve as a board member, and which is supplying technical support and compute). The idea of this challenge, motivated in part by this recent paper of Honda, Murakami, and Zhang, is to measure the extent to which the 22 million universal algebra true-false results obtained by the ETP can be “distilled” into a short, human-readable “cheat sheet”, similar to how a student in an undergraduate math class might distill the knowledge learned from that class into a single sheet of paper that the student is permitted to bring into an exam.
Here is a typical problem in universal algebra that the ETP was able to answer:
Problem 1 Suppose that is a binary operation such that for all . Is it true that for all ?
Such a problem can be settled either by algebraically manipulating the initial equation to deduce the target equation, or by finding a counterexample to the target equation that still satisfies the initial equation. There are a variety of techniques to achieve either of these, but this sort of problem is difficult, and even undecidable in some cases; see this paper of the ETP collaborators for more discussion. Nevertheless, many of these problems can be settled with some effort by humans, by automated theorem provers, or by frontier AI systems; here for instance is an AI-generated solution to the above problem.
However, these AI models are expensive, and do not reveal much insight as to where their answers come from. If one instead tries a smaller and cheaper model, such as one of the many open-source models available, it turns out that these models basically perform no better than random chance, in that when asked to say whether the answer to a question such as the above is true or false, they only answer correctly about 50% of the time.
But, similarly to how a student struggling with the material for a math class can perform better on an exam when provided the right guidance, it turns out that such cheap models can perform at least modestly better on this task (with success rates increasing to about 55%-60%) if given the right prompt or “cheat sheet”.
“Stage 1” of the distillation challenge, which we launched today, asks for contestants to design a cheat sheet (of at most 10 kilobytes in size) that can increase the performance of these models on the above true-false problems to as high a level as possible. We have provided a “playground” with which to test one’s cheat sheet (or a small number of example cheat sheets) some cheap models against a public set of 1200 problems (1000 of which were randomly selected, and rather easy, together with 200 “hard” problems that were selected to resist the more obvious strategies for resolving these questions); a brief video explaining how to use the playground can be found here.
Submissions stage will end on April 20, after which we will evaluate the submissions against a private subset of test questions. The top 1000 submissions will advance to a second stage which we are currently in the process of designing, which will involve more advanced models, but also the more difficult task of not just providing a true-false answer, but also a proof or counterexample to the problem.
The competition will be coordinated on this Zulip channel, where I hope there will be a lively and informative discussion.
My hope is that the winning submissions will capture the most productive techniques for solving these problems, and/or provide general problem-solving techniques that would also be applicable to other types of mathematical problems. We started with the equational theory project data set for this pilot competition due to its availability and spectrum of difficulty levels, but if this type of distillation process leads to interesting results, one could certainly run in on many other types of mathematical problem classes to get some empirical data on how readily they can be solved, particularly after we learn from this pilot competition on how to encourage participation and share of best practices.
SAIR will also launch some other mathematical challenges in the coming months that will be of a more cooperative nature than this particular competitive challenge; stay tuned for further announcements.
SLMath Deputy Director Search
[This post is written in my capacity as Vice-Chair of the Board of Trustees of SLMath. -T.]
SLMath, formerly MSRI, has launched the search for the next Deputy Director. This key position is a close advisor to the Director and shares in the internal management of the scientific team and programs at SLMath. This position is ideal for an experienced professional with a PhD in mathematical sciences seeking a new opportunity to leverage their strengths in program and grant management, financial management, and people management.
Six Math Essentials
Just a brief announcement that I have been working with Quanta Books to publish a short book in popular mathematics entitled “Six Math Essentials“, which will cover six of the fundamental concepts in mathematics — numbers, algebra, geometry, probability, analysis, and dynamics — and how they connect with our real-world intuition, the history of math and science, and to modern practice of mathematics, both in theory and in applications. The scheduled publication date is Oct 27, but it is currently available for preorder.
