P	U	S	Č	P	S	N
	1	2	3	4	5	6
7	8	9	10	11	12	13
14	15	16	17	18	19	20
21	22	23	24	25	26	27
28	29

Sakupljač feed-ova

A recent experience with ChatGPT 5.5 Pro

W.T. Gowers - Pet, 2026-05-08 17:40

We are all having to keep revising upwards our assessments of the mathematical capabilities of large language models. I have just made a fairly large revision as a result of ChatGPT 5.5 Pro, to which I am fortunate to have been given access, producing a piece of PhD-level research in an hour or so, with no serious mathematical input from me.

The background is that, as has been widely reported, LLMs are now capable of solving research-level problems, and have managed to solve several of the Erdős problems listed on Thomas Bloom’s wonderful website. Initially it was possible to laugh this off: many of the “solutions” consisted in the LLM noticing that the problem had an answer sitting there in the literature already, or could be very easily deduced from known results. But little by little the laughter has become quieter. The message I am getting from what other mathematicians more involved in this enterprise have been saying is that LLMs have got to the point where if a problem has an easy argument that for one reason or another human mathematicians have missed (that reason sometimes, but not always, being that the problem has not received all that much attention), then there is a good chance that the LLMs will spot it. Conversely, for problems where one’s initial reaction is to be impressed that an LLM has come up with a clever argument, it often turns out on closer inspection that there are precedents for those arguments, so it is still just about possible to comfort oneself that LLMs are merely putting together existing knowledge rather than having truly original ideas. How much of a comfort that is I will not discuss here, other than to note that quite a lot of perfectly good human mathematics consists in putting together existing knowledge and proof techniques.

I decided to try something a little bit different. At least in combinatorics, there are quite a lot of papers that investigate some relatively new combinatorial parameter that leads naturally to several questions. Because of the sheer number of questions one can ask, the authors of such papers will not necessarily have the time to spend a week or two thinking about each one, so there is a decent probability that at least some of them will not be all that hard. This makes such papers very valuable as sources of problems for mathematicians who are doing research for the first time and who will be hugely encouraged by solving a problem that was officially open. Or rather, it used to make them valuable in that way, but it looks as though the bar has just been raised. It is no longer enough that somebody asks a problem: it needs to be hard enough for an LLM not to be able to solve it.

In any case, a little over a week ago I decided to see how ChatGPT 5.5 Pro would fare with a selection of problems asked by Mel Nathanson in a paper entitled Diversity, Equity and Inclusion for Problems in Additive Number Theory. Nathanson has a remarkable record of being interested in problems and theorems that have later become extremely fashionable, which has led him to write a series of extremely well timed and therefore highly influential textbooks. In this paper, he argues for the interest of several other problems, some of which I will now briefly describe.

If is a set of integers, then its sumset is defined to be . For a positive integer , the –fold sumset, denoted , is defined to be . Nathanson is interested in the possible sizes of given the size of . To that end one can define a set to be the set of all such that there exists a set with and .

An obvious first question to ask is simply “What is ?” When , the answer is the set of all integers between and . It is an easy exercise to show that if , then , so this result is saying that all sizes in between can be realized. However, it is not true in general that can take every size between its minimum and maximum possibilities, and we do not currently have a complete description of .

Another natural question one can ask, and this is where ChatGPT came in, is how large a diameter you need if you want a set with and having prescribed sizes. (Of course, the size of must belong to .) Nathanson showed that for every there is a subset of with and , and asked whether the bound could be improved. ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds before providing a construction that yielded a quadratic upper bound, which is clearly best possible. It wrote up its argument in a slightly rambling LLM-ish style, so I asked if it could write the argument up as a LaTeX file in the style of a typical mathematical preprint. After two minutes and 23 seconds it gave me that, after which I spent some time convincing myself that the argument was correct.

The basic idea behind both Nathanson’s argument and ChatGPT’s was that in order to obtain a set of a given size with a sumset of a given size, it is useful to build it out of a Sidon set, which means a set with sumset of maximal size (that is not quite the usual definition but it is the simplest to use in this discussion), and an arithmetic progression. Also, for a bit of fine tuning one can take an additional point near the arithmetic progression. Then if one plays around with the various parameters, one finds that one can obtain sets of all the sizes one wants. Nathanson doesn’t express his argument this way (it is Theorem 5 of this paper), instead giving an inductive argument, but I think, without having checked too carefully, that if one unravels his argument, one finds that effectively that is what he ends up with, and the Sidon set in question consists of powers of 2. ChatGPT obtained its improvement by simply using a more efficient Sidon set — it is well known that one can find Sidon sets of quadratic diameter. (One might ask why Nathanson didn’t do that in the first place: I think it is because the obvious idea of using a more efficient Sidon set becomes obvious only after one has redescribed his inductive construction. Is that what ChatGPT did? It is very hard to say.)

Next, I asked ChatGPT to see whether it could do the same for a closely related question, where instead of looking at the size of the sumset, one looks at the size of the restricted sumset, which is defined to be . Unsurprisingly, it was able to do that with no trouble at all. I got it to write both results up in a single note, to avoid a certain amount of duplication. If you are curious, you can see the note here.

I then asked what it could do for general . I was much less optimistic that it would manage to do anything interesting, because the proof for makes fundamental use of the fact (due to Erdős and Szemerédi) that we know exactly which sizes we need to create. If we don’t know what the set is, then it seems that we are forced to start with a hypothetical set with and and build out of it a set of small diameter with the same property. As it happens, I still don’t know how to get round that difficulty (I’m mentioning that just to demonstrate that my mathematical input was zero, and I didn’t even do anything clever with the prompts), but Nathanson mentioned in his paper a remarkable paper of Isaac Rajagopal, a student at MIT, who must have got round the difficulty somehow, because he had managed to prove an exponential dependence of on for each fixed .

I’ll leave the previous paragraph there, but Isaac has subsequently explained to me that that isn’t really the difficulty. His argument gives a complete description of when is sufficiently large, and if one wants to prove a polynomial dependence for fixed , then assuming that is sufficiently large is clearly permitted. The real difficulty is that constructing the sets with given sumset sizes was significantly more complicated, and necessarily so because the degree of the polynomial grows with , and one therefore needs more and more parameters to define the sets.

In any case, the task faced by ChatGPT was not to solve the problem from scratch, but to see whether it was possible to tighten up Isaac Rajagopal’s argument. Here’s what happened.

After 16 minutes and 41 seconds, it came back with an argument that claimed to have improved the upper bound from exponential in to exponential in for any .
I asked it to write that in preprint form too, which took it a further 47 minutes and 39 seconds.
That preprint would have been hard for me to read, as that would have meant carefully reading Rajagopal’s paper first, but I sent it to Nathanson, who forwarded it to Rajagopal, who said he thought it looked correct.
Both ChatGPT and Rajagopal speculated a little on what might need to be done to push things further and get a polynomial bound, so I got greedy and asked ChatGPT to give that a go.
After 13 minutes and 33 seconds it told me it felt optimistic about the existence of such an argument but there were a couple of technical statements that needed checking.
I asked it to check them.
After 9 minutes and 12 seconds it got back to me with the check having been done, so I asked for this too to be written in preprint form.
After 31 minutes and 40 seconds the “preprint” was ready. Here it is.
Isaac Rajagopal looked at it and declared it to be almost certainly correct. It was clear that he meant this not just at a line-by-line level but at the level of ideas.

Isaac made some very interesting remarks about the nature of what the additional ideas were that ChatGPT contributed. Since, as I have already said, my mathematical input was zero, I invited him to write a guest section to this post. Just before we get to that, I want to raise a question (that will undoubtedly have been raised by others as well), which is simple: what should we do with this kind of content? Had the result been produced by a human mathematician, it would definitely have been publishable, so I think it would be wrong to describe it as AI slop. On the other hand, it seems pointless even to think about putting it in a journal, since it can be made freely available, and nobody needs “credit” for it (except that Isaac deserves plenty of credit for creating the framework on which ChatGPT could build). I understand that arXiv has a policy against accepting AI-written content, which makes good sense to me. So maybe there should be a different repository where AI-produced results can live. But various decisions would need to be made about how it was organized. I myself think that one would probably want to have some kind of moderation process, so that results would be included only if a human mathematician was prepared to certify that they were correct — or, better still, that they had been formalized by a proof assistant — and perhaps also that they answered a question that had been asked in a human-written paper. On the other hand, I wouldn’t want a moderation process that created vast amounts of work (unless the work was itself done by AI, but there are obvious dangers in going down that route). Anyway, until these questions are answered, this result is available from the link above, and perhaps, now that LLMs are so good at literature search, that will be enough to make it findable by anyone who wants to know whether Nathanson’s problem has been solved.

Isaac’s evaluation of what ChatGPT achieved

With just a few prompts, ChatGPT was able to improve the upper bound on (which I will define very soon) from exponential in to polynomial in . While its first improvement of the bound, from exponential in to exponential in , was a routine modification of my work, the improvement to polynomial in is quite impressive. To do this, ChatGPT came up with an idea which is original and clever. It is the sort of idea I would be very proud to come up with after a week or two of pondering, and it took ChatGPT less than an hour to find and prove, using similar methods to those in my own proof. My goal is to explain that idea, in a manner that will be digestible to my friends who are computer science majors as well as my math major friends.

The problem of bounding is closely related to a problem I worked on at the Duluth REU (Research Experience for Undergrads) program, of determining . In particular, is the set of possible -fold sumset sizes , where can be chosen to be any set of integers. is the minimal such that we can achieve all of the values of using -element sets . I spent last summer explicitly characterizing the set for large , by constructing sets such that achieves all sizes which I could not rule out as impossible. So, can be upper-bounded by optimizing my constructions.

I constructed these sets by combining smaller component sets which are simpler to analyze. Some of these components are the geometric series

for various values of and . Unfortunately, the elements of and are exponentially large in terms of . So, I asked ChatGPT (through Tim) whether there exist sets of elements which have similar sumset sizes to these geometric series, but contain only numbers of polynomial size in : I had no idea if this was possible, or how to begin constructing such sets. ChatGPT came back with an answer, constructing sets and which behave like “half a geometric series squeezed into a polynomial interval,” which is counterintuitive. Before I discuss the construction of and , I will explain the important properties of the sumset sizes of and which they recreate.

For , a set is called a set if the only solutions to

with in are the “trivial” solutions, by which I mean that one side of the equation is a reordering of the other side. If is a set of size , then elements of correspond exactly to choices of elements of , with repetition allowed. Using “stars and bars,” one can see that and this is the maximum possible value of among sets of size . So, another definition is that is a set if . Sidon sets, which Tim discussed, are exactly sets.

To make things more concrete, let us assume that in (1). Then, is a set, but it is not a set because of the relations

for any choice of in . In particular, , as these relations are the only ones preventing from being a set. lacks the relations in (2) because is not in . So, is a set, but it is not a set because of the relations

for any choices of in . This gives relations, and one can check that . To summarize, we have seen that

(a) is a set.

(b) is a linear function of .

(d) is a quadratic function of .

ChatGPT was able to find sets and of elements which satisfy (a)-(d), but whose elements all have polynomial size in . The construction of and uses -dissociated sets, which are sets where the only solutions to

with and in are the “trivial” solutions, i.e. and one side of the equation is a reordering of the other side. For , it is possible to construct an -dissociated set , where is approximately , and in particular polynomial in . Constructions of such a using finite fields date back to Singer (1938) and Bose–Chowla (1963) and are described in Appendix 1. Define

and

In hindsight, I have good intuition for the construction of and . All of the relations in (2) and (3) are formed by combining one or two relations of the form . There are approximately relations of the form in and , and approximately such relations in and . There are few other low-order relations in and , and similarly in and because is -dissociated. So, and manage to contain half as many -relations as their geometric series counterparts, while also containing few low-order relations.

We now see why (a)-(d) hold with and replaced by and , respectively. For concreteness, we assume that and , so contains no nontrivial relations as in (4) with . Then, is a set, but it is not a set because of the relations

for any choice of in . If we let , we can check that is linear in . In particular, (a) and (b) hold with replaced by , and the linear function replaced by . We can also see that is a set, but it is not a set because of the relations

for any in . If we let , we can check that is quadratic in . In a similar manner, (c) and (d) hold with replaced by , and the quadratic function replaced by .

Even though I can motivate it in retrospect, ChatGPT’s idea to use -dissociated sets to control relations of order at most feels quite ingenious. As far as I can tell, this idea is completely original.

ChatGPT’s proof that its construction produces the desired values of is very similar to my proof that the sets which I construct achieve all possible values of , after replacing and by and , respectively. Properties (a)-(d) capture many of the important properties of and (or and ) which are used in this proof. The final constructions involve combining the sets and (or and in my paper) for each value of between and with another set which is the union of an arithmetic progression and a point. Intuitively, and (or and ) have large sumsets, while arithmetic progressions have small sumsets, so it is plausible that one could get sets which achieve all the medium-sized sumsets by combining them. However, the proof of this is quite involved, and it occupies Section 4 of my paper and the entirety of the ChatGPT preprint. In Appendix 2, I work out the details of the ChatGPT construction to show that for sufficiently large,

For comparison, it is easy to see that is at least on the order of , and it is unknown what the real value is. In Appendix 3, I give details of the correspondence between my paper and the ChatGPT preprint, which will be helpful for those who want to read either.

Finally, I want to express my deep gratitude to Tim for allowing me to contribute to this blog. I am still stunned by the coincidence that the problem he chose to put into ChatGPT 5.5 Pro led him to my paper on the arXiv.

Tim on what this means for mathematical research

I would judge the level of the result that ChatGPT found in under two hours to be that of a perfectly reasonable chapter in a combinatorics PhD. It wouldn’t be considered an amazing result, since it leant very heavily on Isaac’s ideas, but it was definitely a non-trivial extension of those ideas, and for a PhD student to find that extension it would be necessary to invest quite a bit of time digesting Isaac’s paper, looking for places where it might not be optimal, familiarizing oneself with various algebraic techniques that he used, and so on.

It seems to me that training beginning PhD students to do research, which has always been hard (unless one is lucky enough, as I have often been, to have a student who just seems to get it and therefore doesn’t need in any sense to be trained), has just got harder, since one obvious way to help somebody get started is to give them a problem that looks as though it might be a relatively gentle one. If LLMs are at the point where they can solve “gentle problems”, then that is no longer an option. The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.

I would qualify that statement in two ways though. First, there is the obvious point that a beginning PhD student has the option of using LLMs. So the task is potentially easier than proving something that LLMs can’t prove: it is proving something in collaboration with LLMs that LLMs cannot manage on their own. I have done quite a lot of such collaboration recently and found that LLMs have made useful contributions without (yet) having game-changing ideas.

A second point is that I don’t know how much of what I have said generalizes to other areas of mathematics. Combinatorics tends to be quite focused on problems: you start with a question and you reason back from the question or if you reason forwards you do so very much with the question in mind. In other areas there can be much more of an emphasis on forwards reasoning: you start with a circle of ideas and see where it leads. To do it successfully, you need to have some way of discriminating between interesting observations and uninteresting ones, and it isn’t obvious to me what LLMs would be like at that.

Of course, everything I am saying concerns LLMs as they are right now. But they are developing so fast that it seems almost certain that my comments will go out of date in a matter of months. It is also almost certain that these developments will have a profoundly disruptive effect on how we go about mathematical research, and especially on how we introduce newcomers to it. Somebody starting a PhD next academic year will be finishing it in 2029 at the earliest, and my guess is that by then what it means to undertake research in mathematics will have changed out of all recognition.

I sometimes get emails from people who are interested in doing mathematical research but are not sure whether that makes sense any more as an aspiration. I have a view on that question, but it may very well change in response to further developments. That view is that there is still a great deal of value in struggling with a mathematics problem, but that the era where you could enjoy the thrill of having your name forever associated with a particular theorem or definition may well be close to its end. So if your aim in doing mathematics is to achieve some kind of immortality, so to speak, then you should understand that that won’t necessarily be possible for much longer — not just for you, but for anybody. Here’s a thought experiment: suppose that a mathematician solved a major problem by having a long exchange with an LLM in which the mathematician played a useful guiding role but the LLM did all the technical work and had the main ideas. Would we regard that as a major achievement of the mathematician? I don’t think we would.

So what is the point of struggling with a difficult mathematics problem? One answer is that it can be very satisfying to solve a problem even if the answer is already known, but I don’t think that is a sufficient reason to spend several years of your life on this peculiar activity. A better answer is that by solving hard problems you get an insight into the problem-solving process itself, at least in your area of expertise, in a way that you simply don’t if all you do is read other people’s solutions. One consequence of this is that people who have themselves solved difficult problems are likely to be significantly better at using solving problems with the help of AI, just as very good coders are better at vibe coding than not such good coders, or people who have a solid grasp of how to do basic arithmetic are likely to be more skilled at using calculators (and especially at noticing when an answer feels off). Mathematics is a highly transferable skill, and that applies to research-level mathematics as well. By doing research in mathematics, you may not get the same rewards as your equivalents a generation ago, but there is a good chance that you will be equipping yourself very well for the world we are about to experience.

Appendix 1 (Isaac)

We will construct an -dissociated set , where is approximately . This construction is a very minor modification of Bose–Chowla (1963)’s construction of a set, which I learned about from this paper. For whatever reason, the GPT preprint (Lemma 3.1) uses a different, less efficient construction using moment curves.

Let be a prime, let , let be the finite field with elements and fix a generator of , so that is equal to . Define a set of elements

Then, each element corresponds to a unique value of , by taking . Now an additive relation of the form in (4) with can be reframed by taking powers of as

As is a degree- extension of and is a generator of as an -extension, this means that does not satisfy any nonzero polynomials in of degree . So, both sides of (6) are identical as polynomials in and thus the additive relation in (4) is trivial. So, is -dissociated, and of course one can prune a few elements to reduce to size .

Appendix 2 (Isaac)

Fix constants such that (in my paper I arbitrarily chose ). Let the two sets in (5) be called and . Let denote the set of integers satisfying . Similarly to my paper, the constructions of such that achieves the desired sizes will combine sets of the following four types:

with choices of and .
for each value of , with choices of .
for each value of , with choices of .
A set of the correct size so that .

One reason that this construction needs to be complicated is that we need to create at least many sets. To do this, we vary parameters and in the domain and parameters and in the domain . We can choose to be slightly bigger than , and then the above construction gives us different sets where can be made arbitrarily small. So, if we were to remove any of the above parameters from the construction, and not change the others, this construction would no longer create many sets. In comparison, Nathanson’s construction when only needs to create sets. He does this by combining a Sidon set, an arithmetic progression, and one extra value, and varying the size of the arithmetic progression and the extra value in ranges of size .

We want to combine sets , which are given by , for the values of , for the values of , and a set. By Appendix 1, for all , there exists a -dissociated set of diameter . By the constructions of and , we can take each , where . Let have basis vectors . To combine , we can define as

Similarly to my Lemma 4.9, this construction ensures that the generating function product holds, which is the identity that both my paper and the GPT preprint use (see either paper for a definition of these generating functions). By (the standard) Lemma 2.3 of the GPT preprint, is Freiman-isomorphic of order to a subset of . Therefore, for sufficiently large (the whole construction relies on this for the same reasons as in my paper),

Appendix 3 (Isaac)

In Section 4.2 of my paper, I use a different, simpler construction to construct sets achieving the values in which have , for some small . These sets are subsets of , meaning that all elements have polynomial size in . This is observed in Section 5 of the GPT preprint.

Section 4.3 of my paper carries out the construction which combines many components including and . This corresponds to Sections 2, 3, 4, and 6 of the GPT preprint. This section has a lot of moving parts; I give an outline in Section 4.3.1.

In Section 4.3.2, I describe how the different components will be combined, using a construction which I call the disjoint union, and introduce generating functions as a bookkeeping tool to keep track of the sumset sizes of a set . This corresponds to Section 2 and Section 4 of the GPT preprint.

In Section 4.3.3, I compute the generating function of each of the component sets, including (Lemma 4.15) and (Lemma 4.17). This corresponds to Section 3 and Section 6.1 of the GPT preprint. In particular, is computed in Lemma 3.3 and is computed in Lemma 3.4. Once these generating functions have been computed, the remainder of the proof is almost identical in my paper and in the GPT preprint.

In Section 4.3.4, I put all the pieces together to show that as we range over the sets which I have constructed, the values of will assume all of the elements of . The key idea is to show that the set of all values of forms an interval, and contains numbers both smaller than and equal to .

Secondary links

Novi brojevi časopisa math.e

Matematička događanja

Kalendar

Matematički blogovi

Sponzori

Sakupljač feed-ova