qnleigh
4 hours ago
I am kind of amazed at how many commenters respond to this result by confidently asserting that LLMs will never generate 'truly novel' ideas or problem solutions.
> AI is a remixer; it remixes all known ideas together. It won't come up with new ideas
> it's not because the model is figuring out something new
> LLMs will NEVER be able to do that, because it doesn't exist
It's not enough to say 'it will never be able to do X because it's not in the training data,' because we have countless counterexamples to this statement (e.g. 167,383 * 426,397 = 71,371,609,051, or the above announcement). You need to say why it can do some novel tasks but could never do others. And it should be clear why this post or others like it don't contradict your argument.
If you have been making these kinds of arguments against LLMs and acknowledge that novelty lies on a continuum, I am really curious why you draw the line where you do. And most importantly, what evidence would change your mind?
Yizahi
3 hours ago
LLMs can generate anything by design. LLMs can't understand what they are generating so it may be true, it may be wrong, it may be novel or it may be known thing. It doesn't discern between them, just looks for the best statistical fit.
The core of the issue lies in our human language and our human assumptions. We humans have implicitly assigned phrases "truly novel" and "solving unsolved math problem" a certain meaning in our heads. Some of us at least, think that truly novel means something truly novel and important, something significant. Like, I don't know, finding a high temperature superconductor formula or creating a new drug etc. Something which involver real intelligent thinking and not randomizing possible solutions until one lands. But formally there can be a truly novel way to pack the most computer cables in a drawer, or truly novel way to tie shoelaces, or indeed a truly novel way to solve some arbitrary math equation with an enormous numbers. Which a formally novel things, but we really never needed any of that and so relegated these "issues" to a deepest backlog possible. Utilizing LLMs we can scour for the solutions to many such problems, but they are not that impressive in the first place.
logicprog
2 hours ago
If LLMs can come up with formerly truly novel solutions to things, and you have a verification loop to ensure that they are actual proper solutions, I don't understand why you think they could never come up with solutions to impressive problems, especially considering the thread we are literally on right now? That seems like a pure assertion at this point that they will always be limited to coming up with truly novel solutions to uninteresting problems.
eru
an hour ago
"Truly novel" is fast becoming a True Scotsman.
ggamezar
25 minutes ago
> It doesn't discern between them, just looks for the best statistical fit.
Why this is not true for humans?
Yizahi
9 minutes ago
We can't tell yet if that is true, partially true, or false for humans. We do know that LLM can't do anything else besides that (I mean as a fundamental operating principle).
LatencyKills
4 hours ago
I've been working on a utility that lets me "see through" app windows on macOS [1] (I was a dev on Apple's Xcode team and have a strong understanding of how to do this efficiently using private APIs).
I wondered how Claude Code would approach the problem. I fully expected it to do something most human engineers would do: brute-force with ScreenCaptureKit.
It almost instantly figured out that it didn't have to "see through" anything and (correctly) dismissed ScreenCaptureKit due to the performance overhead.
This obviously isn't a "frontier" type problem, but I was impressed that it came up with a novel solution.
skc
3 hours ago
That's actually pretty cool. What made you think of doing this in the first place?
LatencyKills
3 hours ago
Thanks! I've been doing a lot of work on a laptop screen (I normally work on an ultrawide) and got tired of constantly switching between windows to find the information I need.
I've also added the ability to create a picture-in-picture section of any application window, so you can move a window to the background while still seeing its important content.
I'll probably do a Show HN at some point.
saagarjha
2 hours ago
Why is ScreenCaptureKit a bad choice for performance?
LatencyKills
2 hours ago
Because you can't control what the content server is doing. SCK doesn't care if you only need a small section of a window: it performs multiple full window memory copies that aren't a problem for normal screen recorders... but for a utility like mine, the user needs to see the updated content in milliseconds.
Also, as I mentioned above, when using SCK, the user cannot minimize or maximize any "watched" window, which is, in most cases, a deal-breaker.
My solution runs at under 2% cpu utilization because I don't have to first receive the full window content. SCK was not designed for this use case at all.
stavros
3 hours ago
What was the solution?
LatencyKills
3 hours ago
Well, I'm not going to share either solution as this is actually a pretty useful utility that I plan on releasing, but the short answer is: 1) don't use ScreenCaptureKit, and 2) take advantage of what CGWindowListCreateImage() offers through the content server. This is a simple IPC mechanism that does not trigger all the SKC limitations (i.e., no multi-space or multi-desktop support). In fact, when using SKC, the user cannot even minimize the "watched" window.
Claude realized those issues right from the start.
One of the trickiest parts is tracking the window content while the window is moving - the content server doesn't, natively, provide that information.
stavros
3 hours ago
Huh, Claude one-shotted it out of a single message from me. Man, LLMs have gotten good.
LatencyKills
3 hours ago
No it didn't. Like I said... it may have gotten something that worked but there is no way Claude got it to work while supporting multi-spaces, multi-desktops, and using under 2% cpu utilization. My solution can display app window content even when those windows are minimized, which is not something the content server supports.
My point was that Claude realized all the SKC problems and came up with a solution that 99% of macOS devs wouldn't even know existed.
TeMPOraL
2 hours ago
> it may have gotten something that worked but there is no way Claude got it to work while supporting multi-spaces, multi-desktops, and using under 2% cpu utilization.
Maybe, but that's the magic of LLMs - they can now one-shot or few-shot (N<10) you something good enough for a specific user. Like, not supporting multi-desktops is fine if one doesn't use them (and if that changes, few more prompts about this particular issue - now the user actually knows specifically what they need - should close the gap).
energy123
an hour ago
> 67,383 * 426,397 = 71,371,609,051 ... You need to say why it can do some novel tasks but could never do others.
Model interpretability gives us the answers. The reason LLMs can (almost) do new multiplication tasks is because it saw many multiplication problems in its training data, and it was cheaper to learn the compressed/abstract multiplication strategies and encode them as circuits in the network, rather than memorize the times tables up to some large N. This gives it the ability to approximate multiplication problems it hasn't seen before.
SequoiaHope
3 hours ago
Most inventions are an interpolation of three existing ideas. These systems are very good at that.
mikkupikku
2 hours ago
My take as well. Furthermore, most innovations come relatively shortly after their technological prerequisites have been met, so that suggests the "novelty space" that humans generally explore is a relatively narrow band around the current frontier. Just as humans can search through this space, so too should machines be capable of it. It's not an infinitely unbounded search which humans are guided through by some manner of mystic soul or other supernatural forces.
fsflover
2 hours ago
I can't even find a good example of an invention that is not an interpolation.
franktankbank
5 minutes ago
The inclined plane, the wheel, shall I keep going?
jacquesm
4 hours ago
> e.g. 167,383 * 426,397 = 71,371,609,051
They may be wrong, but so are you.
KellyCriterion
3 hours ago
No, its correct:
jacquesm
an hour ago
You missed the point.
swingboy
3 hours ago
You could have just checked the math yourself, you know.
qsera
4 hours ago
It is like not trusting someone who attained highest score in some exam by by-hearting the whole text book, to do the corresponding job.
Not very hard to understand.
tornikeo
4 hours ago
Beliefs are not rooted in facts. Beliefs are a part of you, and people aren't all that happy to say "this LLM is better than me"
benterix
4 hours ago
I'm very happy to say calculators are far better than me in calculations (to a given precision). I'm happy to admit computers are so much better than me in so many aspects. And I have problem saying LLMs are very helpful tools able to generate output so much better than mine in almost every field of knowledge.
Yet, whenever I ask it to do something novel or creative, it falls very short. But humans are ingenious beasts and I'm sure or later they will design an architecture able to be creative - I just doubt it will be Transformer-based, given the results so far.
stavros
3 hours ago
But the question isn't whether you can get LLMs to do something novel, it's whether anyone can get them to do something novel. Apparently someone can, and the fact that you can't doesn't mean LLMs aren't good for that.
al_borland
an hour ago
When it comes to LLMs doing novel things, is it just the infinite monkey theorem[0] playing out at an accelerated rate, helped along by the key presses not being truly random?
Surely if we tell the LLM to do enough stuff, something will look novel, but how much confirmation bias is at play? Tens of millions of people are using AI and the biggest complaint is hallucinations. From the LLMs perspective, is there any difference between a novel solution and a hallucination, other than dumb luck of the hallucination being right?
stavros
an hour ago
This argument doesn't go the way you want it to go. Billions of people exist, but maybe a few tens of thousands produce novel knowledge. That's a much worse rate than LLMs.
al_borland
38 minutes ago
I’m not sure how we equate the number of humans to AI to determine a success rate.
We also can’t ignore than it was humans who thought up this problem to give to the AI. Thinking has two parts, asking and answering questions. The AI needed the human to formulate and ask the question to start. AI isn’t just dropping random discoveries on us that we haven’t even thought of, at least not that I’ve seen.
benterix
3 hours ago
To have a proper discussion we would have to define the word "novel" and that's a challenge in itself. In any case, millions of poeple tried to ask LLMs to do something creative and the results were bland. Hence my conclusion LLMs aren't good for that. But I'm also open they can be an element of a longer chain that could demonstrate some creativity - we'll see.
tovej
3 hours ago
Novel is a tricky word. In this case, the LLM produced a python program that was similar to other programs in its corpus, and this oython program generated examples of hypergraphs that hadn't been seen before.
That's a new result, but I don't know about novel. The technique was the same as earlier work in this vein. And it seems like not much computational power was needed at all. (The article mentions that an undergrad left a laptop running overnight to produce one of the previous results, that's absolute peanuts when compared to most computational research).
ChrisGreenHeur
4 hours ago
It's not possible to know something without believing it to be true. https://en.wikipedia.org/wiki/Belief#/media/File:Classical_d...
bilekas
3 hours ago
This is objectively wrong. If that was the case every scientist performing a test would have always had their expectations and beliefs proven true. If you're trying to disprove something also because you believe it to be wrong you would never be proven wrong.
veltas
3 hours ago
Do we know for a fact that LLMs aren't now configured to pass simple arithmetic like this in a simpler calculator, to add illusion of actual insight?
GaggiX
3 hours ago
You can train a LLM on just multiplication and test it on ones it has never seen before, it's nothing particularly magical.
veltas
3 hours ago
It's not 'magic' though but previously LLMs have performed very badly on longer multiplication, 'insight' is the wrong word but I'm saying maybe they're not wildly better at this calculation... maybe they are just optimising these well known jagged edges.
PUSH_AX
4 hours ago
The hardest part about any creativity is hiding your influences
bluecalm
4 hours ago
>>AI is a remixer; it remixes all known ideas together. It won't come up with new ideas
I always found this argument very weak. There isn't that much truly new anyway. Creativity is often about mixing old ideas. Computers can do that faster than humans if they have a good framework. Especially with something as simple as math - limited set of formal rules and easy to verify results - I find a belief computers won't beat humans at it to be very naive.
cyanydeez
2 hours ago
When I read through what they're doing? It sure doesn't sound like it's generating something new as people typically think of it. The link, they provide a very well defined problem and they just loop through it.
I think you're arguing with semantics.
ekjhgkejhgk
4 hours ago
Yes! I call these the "it's just a stochastic parrot" crowd.
Ironically, they are the stochastic parrots, because they're confidently repeating something that they read somehwere and haven't examined critically.
bdbdbdb
4 hours ago
I guess when it can't be tripped up by simple things like multiplying numbers, counting to 100 sequentially or counting letters in a string without writing a python program, then I might believe it.
Also no matter how many math problems it solves it still gets lost in a codebase
fenomas
3 hours ago
LLMs are bad at arithmetic and counting by design. It's an intentional tradeoff that makes them better at language and reasoning tasks.
If anybody really wanted a model that could multiply and count letters in words, they could just train one with a tokenizer and training data suited to those tasks. And the model would then be able to count letters, but it would be bad at things like translation and programming - the stuff people actually use LLMs for. So, people train with a tokenizer and training data suited to those tasks, hence LLMs are good at language and bad at arithmetic,
anal_reactor
4 hours ago
Arguments like "but AI cannot reliably multiply numbers" fundamentally misunderstand how AI works. AI cannot do basic math not because AI is stupid, but because basic math is an inherently difficult task for otherwise smart AI. Lots of human adults can do complex abstract thinking but when you ask them to count it's "one... two... three... five... wait I got lost".
datsci_est_2015
4 hours ago
> fundamentally misunderstand how AI works
Who does fundamentally understand how LLMs work? Many claims flying around these days, all backed by some of the largest investments ever collectively made by humans. Lots of money to be lost because of fundamental misunderstandings.
Personally, I find that AI influencers conveniently brush away any evidence (like inability to perform basic arithmetic) about how LLMs fundamentally work as something that should be ignored in favor of results like TFA.
Do LLMs have utility? Undoubtedly. But it’s a giant red flag for me that their fundamental limitations, of which there are many, are verboten to be spoken about.
stavros
3 hours ago
You're not doing yourself a favor when you point out "but they can't do arithmetic!" as if anyone says otherwise. Yes, we all know they can't do arithmetic, and that's just how they work.
I feel like I'm saying "this hammer is so cool, it's made driving nails a breeze" and people go "but it can't screw screws in! Why won't anyone talk about that! Hammers really aren't all they're cracked up to be".
datsci_est_2015
3 hours ago
Maybe because society has invested $trillions into this hammer and influencers are trying to convince CEOs to fire everyone and buy a bunch of hammers instead.
My comment even said “LLMs have utility”. I gave an inch, and now the mile must be taken.
stavros
3 hours ago
Saying that the fundamental limitations are things like counting the number of rs in strawberry is boring, though. That's how tokens work and it's trivial to work around.
Talking about how they find it hard to say they aren't sure of something is a much more interesting limitation to talk about, for example.
datsci_est_2015
2 hours ago
> Talking about how they find it hard to say they aren't sure of something is a much more interesting limitation to talk about, for example.
Sure, thank you for steelmanning my argument. I didn’t think I needed to actually spell out all of the fundamental limitations of LLMs in this specific thread. They are spoken at length across the web, but are often met with pushback, which was my entire point.
Here’s another one: LLMs do not have a memory property. Shut off the power and turn it back on and you lose all context. Any “memory” feature implemented by companies that sell LLM wrappers are a hack on top of how LLMs work, like seeding a context window before letting the user interact with the LLM.
stavros
2 hours ago
But that's also like saying "humans don't have a memory property, any 'memory' is in the hippocampus". It's not useful to say that "an LLM you don't bother to keep training has no memory". Of course it doesn't, you removed its ability to form new memories!
datsci_est_2015
2 hours ago
So why then do we stop training LLMs and keep them stored at a specific state? Is it perhaps because the results become terrible and LLMs have a delicate optimal state for general use? This sounds like an even worse case for a model of intelligence.
stavros
an hour ago
Nope, it's not that, but it's nice of you to offer a straw man. Makes the argument flow better.
datsci_est_2015
an hour ago
Not entirely a straw man. What is the purpose of storing and retrieving LLMs at a fixed state if not to guarantee a specific performance? Wouldn’t a strong model of intelligence be capable of, to extend your analogy, running without having its hippocampus lobotomized?
Given the precariousness of managing LLM context windows, I don’t think it’s particularly unfair to assume that LLMs that learn without limit become very unstable.
To steelman, if it’s possible, it may be prohibitively expensive. But somehow I doubt it’s possible.
stavros
an hour ago
It is, indeed, prohibitively expensive. But it's not impossible. The proof is in the fact that you can fine-tune LLMs.
TheSpiceIsLife
3 hours ago
Because know one owns a $300 billion dollar hammer that literally runs on fancy calculators.