gpjt
4 months ago
To be fair to the OpenAI team, if read in context the situation is at worst ambiguous.
The deleted tweet that the article is about said "GPT-5 just found solutions to 10 (!) previously unsolved Erdös problems, and made progress on 11 others. These have all been open for decades." If it had been posted stand-alone then I would certainly agree that it was misleading, but it was not.
It was a quote-tweet of this: https://x.com/MarkSellke/status/1979226538059931886?t=OigN6t..., where the author is saying he's "pushing further on this".
The "this" in question is what this second tweet is in turn quote-tweeting: https://x.com/SebastienBubeck/status/1977181716457701775?t=T... -- where the author says "gpt5-pro is superhuman at literature search: [...] it just solved Erdos Problem #339 (listed as open in the official database erdosproblems.com/forum/thread/3…) by realizing that it had actually been solved 20 years ago"
So, reading the thread in order, you get
* SebastienBubeck: "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
* MarkSellke: "Now it's done ten more"
* kevinweil: "Look at this cool stuff we've done!"
I think the problem here is the way quote-tweets work -- you only see the quoted post and not anything that it in turn is quoting. Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubeck's post in mind when they read his. That seems like an easy mistake to make entirely honestly, and I think the pile-on is a little unfair.
moefh
4 months ago
> Kevin Weil had the two previous quotes in his context when he did his post and didn't consider the fact that readers would only see the first level, so wouldn't have Sebastien Bubek's post in mind when they read his.
No, Weil said he himself misunderstood Sellke's post[1].
Note Weil's wording (10 previously unsolved Erdos problems) vs. Sellke's wording (10 Erdos problems that were listed as open).
GodelNumbering
4 months ago
Also, the previous comment omitted that the now-deleted tweet from Bubeck begins with "Science revolution via AI has officially begun...".
OtherShrezzing
4 months ago
Am I correct in thinking this is the 2nd such fumble by a major lab? DeepMind released their “matrix multiplication better than SOTA” paper a few months back, which suggested Gemini had uncovered a new way to optimally multiply two matrices in fewer steps than previously known. Then immediately after their announcement, mathematicians pointed out that their newly discovered SOTA had been in the literature for 30-40 years, and was almost certainly in Gemini’s training set.
ogogmad
4 months ago
No, your claim about matrix multiplication is false. Google's new algorithm can be applied recursively to 4x4 block matrices (over the field of complex numbers). This results in an asymptotically faster algorithm for n×n matrix multiplication than Strassen's. Earlier results on 4x4 matrices by Winograd and others did not extend to block matrices.
Google's result has more recently been generalised: https://arxiv.org/abs/2506.13242
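For what it's worth, here's the arithmetic behind the "asymptotically faster" claim, as a quick Python sketch (my numbers, not from the paper: a recursive bilinear algorithm that multiplies m×m block matrices using t block multiplications yields an O(n^log_m(t)) algorithm overall):

```python
import math

# Strassen: 2x2 blocks with 7 multiplications -> exponent log_2(7)
strassen = math.log(7, 2)

# AlphaEvolve: 4x4 blocks with 48 multiplications -> exponent log_4(48)
alphaevolve = math.log(48, 4)

print(f"Strassen:    n^{strassen:.4f}")     # n^2.8074
print(f"AlphaEvolve: n^{alphaevolve:.4f}")  # n^2.7925
```

log_4(48) ≈ 2.7925 beats log_2(7) ≈ 2.8074, but only if the 48-multiplication scheme works on block (non-commuting) entries -- which is exactly why the older commutative 4x4 results didn't count.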
jsnell
4 months ago
That doesn't match my recollection of the AlphaEvolve release.
Some people just read the "48 multiplications for a 4x4 matrix multiplication" part and thought they had found prior art at that performance or better. But they missed that the supposed prior art had tighter requirements on the contents of the matrix, which meant those algorithms were not usable for implementing a recursive divide-and-conquer algorithm for much larger matrix multiplications.
Here is a HN poster claiming to be one of the authors rebutting the claim of prior art: https://news.ycombinator.com/item?id=43997136
ummonk
4 months ago
We also had the GPT-5 presentation which featured both incorrect bar charts (likely AI generated) and an incorrect explanation of lift.
card_zero
4 months ago
Well, it is important that we have some technology to prevent us from going round in circles by reinventing things, such as search.
glenstein
4 months ago
It's an interesting type of fumble too, because it's easy to (mistakenly!) read it as "LLM tries and fails to solve problem but thinks it solved it" when really it's being credited with originality for discovering or reiterating solutions already out there in the literature.
It sounds like the content of the solutions themselves is perfectly fine, so it's unfortunate that the headline will leave the impression that these are just more hallucinations. They're not hallucinations, they're not wrong; they're just wrongly assigned credit for existing work. Which, you know, where have we heard that one before? It's like the stylistic "borrowing" from artists, but in research form.
whimsicalism
4 months ago
no, you are incorrect
card_zero
4 months ago
So the first guy said "solved [...] by realizing that it had actually been solved 20 years ago", and the second guy said "found solutions to 10 (!) previously unsolved Erdös problems".
Previously unsolved. The context doesn't make that true, does it?
glenstein
4 months ago
Right, and I would even go a step further and say the context from SebastienBubeck is stretching "solved" past its breaking point by equating literature search with self-bootstrapped problem solving. When it's later characterized as "previously unsolved", it's doubling down on the same equivocation.
Don't get me wrong, effectively surfacing unappreciated research is great and extremely valuable. So there's a real thing here but with the wrong headline attached to it.
watwut
4 months ago
> Don't get me wrong, effectively surfacing unappreciated research is great and extremely valuable. So there's a real thing here but with the wrong headline attached to it.
If I said that I solved a problem, but actually I took the solution from an old book, people would call me a liar. If I were a prominent person, it would be an academic fraud incident. No one would be saying "I did an extremely valuable thing" or "there was a real thing here".
3form
4 months ago
If you said you "solved", yes - if you said "found a solution" however, there's ambiguity to it, which is part of the confusion here.
glenstein
4 months ago
Some of the most important advancements in the history of science came from reviewing underappreciated discoveries that already existed in the literature. Mendel's work on genetics went underappreciated for decades before being effectively rediscovered, and proved to be integral to the modern synthesis, which provided a genetic basis for evolution and is the most important development in the history of our understanding of evolution since Darwin and Wallace's original formulation.
Henrietta Leavitt's work on the relation between a star's period of pulsation and its brightness was tucked away in a Harvard journal, and its revolutionary potential wasn't appreciated until Hubble recalled and applied her work years later to measure the distance to Andromeda, establishing that it was an entirely separate galaxy and contributing to the bedrock of modern cosmology.
The pathogenic basis for ulcers was proposed in the 1940s, which later became instrumental to explaining data in the 1980s and led to a Nobel prize in 2005.
It is and has always been fundamental to the progress of human knowledge not just to propose new ideas but to pull pertinent ones from the literature and apply them in new contexts. And depending on the field, the research landscape can be inconceivably vast, so efficiencies in combing through it can create the scaffolding for major advancements in understanding.
So there's more going on here than "lying".
Frieren
4 months ago
> "GPT-5 is really good at literature search, it 'solved' an apparently-open problem by finding an existing solution"
Survivor bias.
I can assure you that GPT-5 fucks up even relatively easy searches. I need to have a very good idea of what the result looks like, and the ability to test it, to be able to use any result from GPT-5.
If I throw the dice 1000 times and post about it each time I get a double six, am I the best dice thrower there is?
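To put numbers on that (a quick simulation I threw together, not anything from the thread): at 1000 throws of a pair of dice you'd expect roughly 28 double sixes by pure chance, so posting only the hits makes anyone look like a great thrower.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Throw a pair of dice 1000 times, but "post" only the double sixes.
throws = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(1000)]
posted = [t for t in throws if t == (6, 6)]

print(f"double sixes posted: {len(posted)} out of {len(throws)}")
print(f"expected by chance:  {1000 / 36:.1f}")  # p = 1/36 per throw
```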
wasabi991011
4 months ago
I'm not really sure what you mean. Literature search is about casting a wide net to make a reading list that is relevant to your research.
It is pretty hard to fuck that up, since you aren't expected to find everything anyway. The "testing" and "using any result from GPT" part is just, like, reading the papers and seeing if they are tangentially related.
If I may speak to my own experience, literature search has been the most productive application I've personally used, more than coding, and I've found many interesting papers and research directions with it.
saghm
4 months ago
One time when I was a kid my dad and I were playing Yahtzee, and he rolled five 5s on his first roll of the turn. He was absolutely stunned, and at the time I was young enough that I didn't understand just how unlikely it was. If I only I knew that I was playing against the best dice thrower!
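For anyone curious just how stunned he should have been, the odds work out like this (my arithmetic, sketched in Python with exact fractions):

```python
from fractions import Fraction

# Five dice, one roll: chance that all five show a 5.
p_five_fives = Fraction(1, 6) ** 5

# Chance of *any* five-of-a-kind (a Yahtzee on the first roll).
p_any_yahtzee = 6 * p_five_fives

print(p_five_fives)   # 1/7776
print(p_any_yahtzee)  # 1/1296
```

So a specific five-of-a-kind is a 1-in-7776 event, and even allowing any face it's still 1 in 1296 per turn.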
zacmps
4 months ago
For literature search that might be OK. It doesn't need to replace any other tools, and if 1 time in 10 it surfaces something you wouldn't have found otherwise, it could be worth the time spent on the dud attempts.
camillomiller
4 months ago
I have some more mirrors for you to try and climb, if you need them.
jibal
4 months ago
That's being disingenuous, not fair.