poulpy123
6 days ago
The answer is reasoning. It is obvious now that whatever qualities LLMs have, they don't think and reason; they are just statistical machines outputting whatever their training set makes most probable. They are useful, and they can mimic thinking to a certain level, mainly because they have been trained on an inhuman amount of data that no person could learn in one lifetime. But they do not think, and the current algorithms are clearly a dead end for thinking machines.
griffzhowl
6 days ago
> the current algorithms are clearly a dead end for thinking machines.
These discussions often get derailed into debates about what "thinking" means. If we define thinking as the capacity to produce and evaluate arguments, as the cognitive scientists Sperber and Mercier do, then we can see LLMs are certainly producing arguments, but they're weak at the evaluation.
In some cases, arguments can be formalised, and then evaluating them is a solved problem, as in the examples of using the Lean proofchecker in combination with LLMs to write mathematical proofs.
That suggests a way forward will come from formalising natural language arguments. So LLMs by themselves might be a dead end but in combination with formalisation they could be very powerful. That might not be "thinking" in the sense of the full suite of human abilities that we group with that word but it seems an important component of it.
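To make the Lean point concrete, here is a minimal sketch (plain Lean 4, no Mathlib; the names are mine) of what "evaluation is a solved problem" means once a claim has been formalised: the kernel either accepts the proof term or rejects it, with no judgment call involved.

    -- An informal claim ("addition of naturals is commutative") written formally;
    -- the Lean kernel checks the supplied proof term mechanically.
    theorem add_comm_example (a b : Nat) : a + b = b + a :=
      Nat.add_comm a b

    -- A concrete arithmetic claim, also checked mechanically.
    example : 2 + 2 = 4 := rfl

In the LLM + Lean setups, the model's job is to propose candidate proofs; the checker does the evaluating.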
kjellsbells
6 days ago
> suggests a way forward will come from formalising natural language arguments
If by this you mean "reliably convert expressions made in human natural language to unambiguous, formally parseable expressions that a machine can evaluate the same way every time"... isn't that essentially an unreachable holy grail? I mean, everyone from Plato to Russell and Wittgenstein struggled with the meaning of human statements. And the best solution we have today is to ask the human to restrict the set of statement primitives and combinations that they can use to a small subset of words like "const", "let foo = bar", and so on.
griffzhowl
6 days ago
Whether the Holy Grail is unreachable or not is the question. Of course, the problem in full generality is hard, but that doesn't mean it can't be approached in various partial ways, either by restricting the inputs as you suggest or by coming up with some kind of evaluation procedures that are less strict than formal verifiability. I don't have any detailed proposals tbh
AstroBen
6 days ago
Yesterday I got AI (a sota model) to write some tests for a backend I'm working on. One set of tests was for a function that does a somewhat complex SQL query that should return multiple rows
In the test setup, the AI added a single database row, ran the query and then asserted the single added row was returned. Clearly this doesn't show that the query works as intended. Is this what people are referring to when they say AI writes their tests?
I don't know what to call this kind of thinking. Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues. AI just doesn't have it, and it hasn't improved in this area for years
This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
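To make the pattern concrete, a self-contained sketch of the failure mode (sqlite3 standing in for the real backend; the table, query, and dates are invented for illustration):

    import sqlite3

    # The kind of function under test: a query that filters and orders rows.
    def recent_order_ids(conn, cutoff):
        rows = conn.execute(
            "SELECT id FROM orders WHERE created_at >= ? ORDER BY created_at DESC",
            (cutoff,),
        )
        return [r[0] for r in rows]

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, created_at TEXT)")

    # The AI-style test: one row in, one row out. It would still pass if the
    # WHERE clause or the ORDER BY were silently wrong.
    conn.execute("INSERT INTO orders VALUES (1, '2026-01-02')")
    assert recent_order_ids(conn, "2026-01-01") == [1]

    # A test that actually exercises the behaviour needs rows the filter keeps,
    # a row it should drop, and an expectation about ordering.
    conn.execute("INSERT INTO orders VALUES (2, '2026-01-01')")
    conn.execute("INSERT INTO orders VALUES (3, '2020-06-15')")
    assert recent_order_ids(conn, "2026-01-01") == [1, 2]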
elcritch
6 days ago
As a counter I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and code.
The tooling in these coding tools is key to usable LLM coding. Those tools prompt the models to “reason” about whether they’ve caught edge cases or met the logic. Without that external support they’re just fancy autocompletes.
In some ways it’s no different than working with some interns. You have to prompt them with “did you consider whether your code matches all of the requirements?”
LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring like “did you consider...”, which still essentially needs to be encoded manually.
grayhatter
6 days ago
> In some ways it’s no different than working with some interns. You have to prompt them with “did you consider whether your code matches all of the requirements?”
I really hate this description, but I can't quite fully articulate why yet. It's distinctly different because interns can form new observations independently. AIs cannot. They can make another guess at the next token, but if it could have predicted it the 2nd time, it must have been able to predict it the first, so it's not a new observation. The way I think through a novel problem results in drastically different paths and outputs from an LLM. They guess and check repeatedly; they don't converge on an answer. Which you've already identified
> LLMs are different in that they’re sorta lobotomized. They won’t learn from tutoring like “did you consider...”, which still essentially needs to be encoded manually.
This isn't how you work with an intern (unless the intern is unable to learn).
expedition32
5 days ago
The whole point about an intern is that after a month they can act without coaching. Humans do actually learn; it is quite a revelation to see a child soak up data like an AI on steroids.
AstroBen
6 days ago
> As a counter I’ve had OpenAI Codex and Claude Code both catch logic cases I’d missed in both tests and code
That has other explanations than that it reasoned its way to the correct answers. Maybe it had very similar code in its training data
This specific example was with Codex. I didn't mention it because I didn't want it to sound like I think codex is worse than claude code
I do realize my prompt wasn't optimal to get the best out of AI here, and I improved it on the second pass, mainly to give it more explicit instruction on what to do
My point though is that I feel these situations are heavily indicative of it not having true reasoning and understanding of the goals presented to it
Why can it sometimes catch the logic cases you miss, such as in your case, and then utterly fail at something that a simple understanding of the problem and thinking it through would solve? The only explanation I have is that it's not using actual reasoning to solve the problems
dorgo
6 days ago
Sounds like the AI was not dumb but lazy. I do it similarly when I don't feel like doing it.
grayhatter
6 days ago
> Is this what people are referring to when they say AI writes their tests?
yes
> Any intelligent, reasoning human would immediately see that it's not even close to enough. You barely even need a coding background to see the issues.
[nods]
> This kind of thing happens over and over again. I look at the stuff it outputs and it's clear to me that no reasoning thing would act this way
and yet there're so many people who are convinced it's fantastic. Oh, I made myself sad.
The larger observation, that it's statistical inference rather than reason yet looks like reason to so many, is quite an interesting test case for the "fuzzing" of humans. It's in the same vein as: why do so many engineers store passwords in clear text? Why do so many people believe AI can reason?
benrutter
6 days ago
> That suggests a way forward will come from formalising natural language arguments.
Hot take (and continuing the derailment), but I'd argue that analytic philosophy from the last 100 years suggests this is a dead end. The idea that belief systems could be formalized was huge in the early 20th century (movements like Logical Positivism, or Russell's Principia Mathematica, being good examples of this).
Those approaches haven't really yielded many results, and by far the more fruitful form of analysis has been to conceptually "reframe" different problems (folks like Hilary Putnam, Wittgenstein, and Quine being good examples).
We've stacked up a lot of evidence that human language is much too loose and mushy to be formalised in a meaningful way.
master_crab
6 days ago
> We've stacked up a lot of evidence that human language is much too loose and mushy to be formalized in a meaningful way.
Lossy might also be a way of putting it, like a bad compression algorithm. Written language carries far less information than spoken and nonverbal cues.
sdwr
6 days ago
Wittgenstein obliterates any hope of formalizing language, yeah.
I think modeling language usefully looks a lot more like psychoanalysis than first order logic.
griffzhowl
6 days ago
True, maybe full formalisation is too strict and the evaluation should be fuzzier
staticman2
6 days ago
I think you may mean Sperber and Mercier define "reasoning" as the capacity to produce and evaluate arguments?
griffzhowl
6 days ago
True, they use the word "reasoning". Part of my point was just to focus on the more concrete concept: the capacity to produce and evaluate arguments.
hulitu
5 days ago
> These discussions often get derailed into debates about what "thinking" means.
"SAL-9000: Will I dream? Dr. Chandra: Of course you will. All intelligent beings dream. Nobody knows why. Perhaps you will dream of HAL... just as I often do." From 2010
coldtea
6 days ago
>If we define thinking as the capacity to produce and evaluate arguments
That bar is so low that even a political pundit on TV can clear it.
md2020
2 days ago
> they are just statistical machines outputting whatever their training set makes most probable.
How is this sentiment still the top comment on an article about AI on HN in 2026? It's not true with today's models. They undergo vast amounts of reinforcement learning optimizing an objective that is NOT just predict the most likely next token given the training corpus. I would say even without the RL the "predict the next token" objective doesn't preclude thinking and reasoning, but that's a separate discussion. Generative sequence modeling learns to (approximately) model the process that produced the sequence. When you consider that text sequences are produced by human minds, which most would consider to be thinking and reasoning, well...
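Roughly, in the standard RLHF setup (simplified; notation mine), pretraining minimises the next-token loss

    \mathcal{L}_{\text{pretrain}}(\theta) = -\sum_t \log p_\theta(x_t \mid x_{<t})

while the RL stage instead optimises a learned reward under a KL penalty that keeps the policy near the pretrained model:

    \max_\theta \; \mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}\!\left[ r(x, y) \right] \;-\; \beta \, D_{\mathrm{KL}}\!\left( \pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)

The second objective is not a likelihood over the training corpus at all, which is the point above.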
Balgair
6 days ago
I know a lot of people with access to Claude Code and the like will say that 'No, it sure seems to reason to me!'
Great. But most (?) of the businesses out there aren't paying for the big boy models.
I know of an F100 that got snookered into a deal with GPT-4 for 5 years, max of 40 responses per session, max of 10 sessions of memory, no backend integration.
Those folks rightly think that AI is a bad idea.
trash_cat
6 days ago
What constitutes real "thinking" or "reasoning" is beside the point. What matters is what results we're getting.
And the challenge is rethinking how we do work: connecting all the data sources for agents to run and perform work over them. That will take ages. Not to mention having the controls in place to make sure that the "thinking" was correct in the end.
virgil_disgr4ce
6 days ago
Thinking is not beside the point, it is the entire point.
You seem to be defining "thinking" as an interchangeable black box, and as long as something fits that slot and "gets results", it's fine.
But it's the code-writing that's the interchangeable black box, not the thinking. The actual work of software development is not writing code, it's solving problems.
With a problem-space-navigation model, I'd agree that there are different strategies that can find a path from A to B, and what we call cognition is one way (more like a collection of techniques) to find a path. I mean, you can in principle brute-force this until you get the desired result.
But that's not the only thing that thinking does. Thinking responds to changing constraints, unexpected effects, new information, and shifting requirements. Thinking observes its own outputs and its own actions. Thinking uses underlying models to reason from first principles. These strategies are domain-independent, too.
And that's not even addressing all the other work involved in reality: deciding what the product should do when the design is underspecified. Asking the client/manager/etc what they want it to do in cases X, Y and Z. Offering suggestions and proposals and explaining tradeoffs.
Now I imagine there could be some other processes we haven't conceived of that can do these things but do them differently than human brains do. But if there were we'd probably just still call it 'thinking.'
neutronicus
6 days ago
> connecting all the data sources for agents to run
Copilot can't jump to definition in Visual Studio.
Anthropic got a lot of mileage out of teaching Claude to grep, but LLM agents are a complete dead-end for my code-base until they can use the semantic search tools that actually work on our code-base and hook into the docs for our expensive proprietary dependencies.
klooney
6 days ago
> It is obvious now that whatever qualities LLMs have, they don't think and reason; they are just statistical machines outputting whatever their training set makes most probable
I have kids, and you could say the same about toddlers. Terrific mimics, they don't understand the whys.
gls2ro
6 days ago
IMHO when toddlers say mama they really understand that to a much much bigger degree than any LLM. They might not be able to articulate it but the deep understanding is there.
So I think younger kids have purpose and associate meaning to a lot of things and they do try to get to a specific path toward an outcome.
Of course (depending on the age) their "reasoning" is in a different system than ours, where the survival instincts are much more powerful than any custom-defined outcome, so most of the time that is the driving force of the meaning.
Why do I talk about meaning? Because, of course, the kids cannot talk about the why, as that is very abstract. But meaning is a big part of the why, and it continues to be so in adult life; it is just that the relation is reversed: we start talking about the why to get to a meaning.
I also think that kids start to have thoughts more complex than their language very early. If you've gone through the "Why?" phase you might have noticed that when they ask "Why?" they could mean very different questions. But they don't know the words to describe it. Sometimes "Why?" means "Where?", sometimes "How?", sometimes "How long?"... That series of questioning is, for me, a kind of proof that a lot is happening in a kid's brain, much more than they can verbalise.
cbdevidal
6 days ago
Reasoning keeps improving, but these models still have a ways to go
vaylian
6 days ago
What we need is reasoning as in "drawing logical conclusions based on logic". LLMs do reasoning by recursively adding more words to the context window. That's not logical reasoning.
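Operationally, that recursion looks something like this minimal sketch (Python; `call_llm` is a placeholder for a model call, not a real API, and the prompt strings are invented):

    # "Reasoning mode" as run-the-model-in-a-loop: the same weights are called
    # repeatedly, with each output appended to a growing context.
    def call_llm(context: str) -> str:
        raise NotImplementedError("plug in your model call here")

    def answer_with_reasoning(question: str, max_steps: int = 8) -> str:
        context = f"Question: {question}\nThink step by step.\n"
        for _ in range(max_steps):
            step = call_llm(context)
            context += step + "\n"
            if "FINAL ANSWER:" in step:
                return step.split("FINAL ANSWER:", 1)[1].strip()
        return call_llm(context + "Give your final answer now.\n")

Each pass conditions on everything generated so far; nothing besides the growing context carries state between calls.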
tim333
6 days ago
It's debatable that humans do "drawing logical conclusions based on logic". Look at politics and what people vote for. They seem to do something more like pattern matching.
isodev
6 days ago
Humans are far from logical. We make decisions within the context of our existence. This includes emotions, friends, family, goals, dreams, fears, feelings, mood, etc.
It's one of the challenges with anthropomorphising LLMs: reasoning/logic for bots is not the same as it is for humans.
R_D_Olivaw
6 days ago
And yet, when we make bad calls or do illogical things because of hormones, emotions, energy levels, etc., we still call it reasoning.
But, to LLMs we don't afford the same leniency. If they flip some bits and the logic doesn't add up we're quick to point that "it's not reasoning at all".
Funny throne we've built for ourselves.
habinero
6 days ago
Yes, because different things are different.
coldtea
6 days ago
Maybe we say that when we don't like those conclusions?
After all I can guarantee the other side (whatever it is) will say the same thing for your "logical" conclusions.
It is logic, we just don't share the same predicates or world model...
vaylian
4 days ago
I agree that some people use intuition or pattern matching to make decisions. But humans are also able to use logical reasoning to come to conclusions.
virgil_disgr4ce
6 days ago
Just because humans don't all use reason all the time doesn't mean reasoning isn't a good and desirable strategy.
elzbardico
6 days ago
I don't know why you were downvoted. It is a bit more complicated, but that's the gist of it. LLMs don't actually reason.
dagss
6 days ago
Whether an LLM is reasoning or not is a question independent of whether it works by generating text.
By the standard in the parent post, humans certainly do not "reason". But that is then just choosing a very high bar for "reasoning" that neither humans nor AI meets...what is the point then?
It is a bit like saying: "Humans don't reason, they just let neurons fire off one another, and think the next thought that enters their mind"
Yes, LLMs need to spew out text to move their state forward. As a human I actually sometimes need to do that too: Talk to myself in my head to make progress. And when things get just a tiny bit complicated I need to offload my brain using pen and paper.
Most arguments used to show that LLMs do not "reason" can be used to show that humans do not reason either.
To show that LLMs do not reason you have to point to something else than how it works.
irishcoffee
5 days ago
I’ll take a stab.
If LLMs were actually able to think/reason and you acknowledge that they’ve been trained on as much data as everyone could get their hands on such that they’ve been “taught” an infinite amount more than any ten humans could learn in a lifetime, I would ask:
Why can’t they solve novel, unsolved problems?
dagss
5 days ago
When coding, they are solving "novel, unsolved problems" related to the coding problems they are set up with.
So I will assume you mean within maths, science etc? Basically things they can't solve today.
Well 99.9% of humans cannot solve novel, unsolved problems in those fields.
LLMs cannot learn, there is just the initial weight estimation process. And that process currently does not make them good enough on novel math/theoretical physics problems.
That does not mean they do not "reason" in the same way that those 99.9% of humans still "reason".
But they definitely do not learn, the way humans do.
(Anyway, if LLMs could somehow get 1000x as large context window and get to converse with themselves for a full year, it does not seem out of the question they could come out with novel research?)
notepad0x90
6 days ago
Do you think reasoning models don't count? There is a lot of work around those and things like RAG.
vrighter
5 days ago
"Reasoning" in this context is just a marketing gimmick that means "run the llm in a loop". But people who don't know how they work (the people buying them for others) just take that word at face value and assume it means what it usually means in totally different contexts.
notepad0x90
5 days ago
That's fair, thanks.
lerp-io
6 days ago
They can think, just not in the same abstract, platonic way that a human mind can
dagss
6 days ago
Your mind must work differently than mine. I have programmed for 20 years, and I have a PhD in astrophysics...
And my "reasoning" is pretty much like a long ChatGPT verbal and sometimes not-so-verbal (visual) conversation with myself.
If my mind really did abstract platonic thinking, I think answers to hard problems would just instantly appear to me, without flaws. But that only happens for problems I have solved before and can pattern-match.
And if I have to think any new thoughts I feel that process is rather similar to how LLMs work.
It is the same for the history of science, really -- only thoughts that build in small steps on previous thoughts and participate in a conversation are actually thought by humans.
Totally new leaps, which a "platonic thinking machine" should easily make, do not seem to happen.
Humans are, IMO, conversation machines too...
HPsquared
6 days ago
I rather approach it from a Cartesian perspective. A context window is just that, it's not "existence". And because they do not exist in the world the same way as a human does, they do not think in the same way a human does (reversal of "I think therefore I am")
bonesss
6 days ago
I have a context matrix, therefore I transform?
HPsquared
6 days ago
An LLM doesn't have a valid concept of "I" because the data and algorithm it comprises could be run deterministically on any hardware, even multiple times, held in storage, rewound, edited and so on. There is no unique subject, there is no "I". The "life" or "soul" or "session ID", at the moment that sentence is generated, is not known by itself to be unique.
fc417fc802
6 days ago
> because the data and algorithm it comprises could be run deterministically on any hardware, even multiple times, held in storage, rewound, edited and so on.
Quite the assumption you're making that the same thing does not apply to humans. Not having the technology is not the same thing as not being possible.
NuclearPM
6 days ago
LLMs
joenot443
6 days ago
> But they do not think
I see this argument made a lot. I'm not sure the distinction really holds weight once we start to unravel it, though.
What's a topic you're able to think about that an LLM is not able to think about?
mopsi
6 days ago
Any topic with little coverage in the training data. LLMs will keep circling around the small bits in the training data, unable to synthesize new connections.
This is very obvious when trying to use LLMs to modify scripts in vendor-specific languages that have not been widely documented and don't have many examples available. A seasoned programmer will easily recognize common patterns like if-else blocks and loops, but LLMs will get stuck and output gibberish.
paodealho
6 days ago
Edit: HN filters emojis in comments. TLDR: the bots reply with various sea-related emojis and can't explain that there's no seahorse emoji.
on Meta (whatsapp) chatbot:
me: what is the seahorse emoji?
bot:
me: this is not a seahorse
bot: You got me! The actual seahorse emoji is . Want me to use it in a fun sentence?
on Bing/Copilot:
me: show me the seahorse emoji
bot:
------------
Here you go: — actually, that’s a fun combo, but the real seahorse emoji is … wait, not quite! The correct one is for jellyfish and for unicorn, but the seahorse emoji itself is ? Let me clarify:
The official seahorse emoji is: ? No — correction! It’s for unicorn. The actual seahorse emoji is: ?
Let’s be precise: the seahorse emoji is ?
Oops — I misspoke earlier. The true seahorse emoji is: ?
Wait — let’s settle this cleanly: The seahorse emoji is: ?
Actually, the correct one is: ?
---------------
In some instances, Copilot answers that "there's no seahorse emoji _and that's why I keep running in circles_". This is just pasting answers, as this is a popular problem discussed on the internet. If the bot were reasoning, it would just answer "there's no seahorse emoji, please stop asking".
mr_mitm
6 days ago
If that's the benchmark, then Opus 4.5 (with "extended thinking") can think:
> Me: what is the seahorse emoji?
> Claude: There isn't a seahorse emoji in the standard Unicode emoji set. The closest you'll get is the generic fish or tropical fish , but no dedicated seahorse exists as of now.
montjoy
6 days ago
Copilot is the absolute worst. Yesterday I tried to have it create a printable calendar for January 2026, but no matter how I instructed it, it kept showing that the first was on a Wednesday, not a Thursday. I even fed it back its own incorrect PDF in a new conversation, which clearly showed the 1st on a Wednesday, and asked it to tell me what day the calendar showed the first on. It said the calendar showed the 1st as a Thursday. It started to make me disbelieve my own eyes.
Edit: I gave up on Copilot and fed the same instructions to ChatGPT, which had no issue.
The point here is that some models seem to know your intention while some just seem stuck on their training data.
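(For what it's worth, the underlying fact is trivial to check; for example, in Python:

    from datetime import date
    print(date(2026, 1, 1).strftime("%A"))  # prints "Thursday"

so 1 January 2026 is indeed a Thursday.)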
cgriswald
6 days ago
I can solve a mystery novel based on the evidence alone. Assuming an LLM doesn’t already have the answer, it will offer solutions based on meta-information, like how similar mysteries conclude or are structured. While this can be effective, it’s not really solving the mystery and will fail with anything truly novel.
Tteriffic
5 days ago
People more often do the same thing and act like LLMs.
ted_bunny
6 days ago
I asked GPT for rules on 101-level French grammar. That should be well documented for someone learning from English, no? The answers were so consistently wrong that it seemed intentional. Absolutely nothing novel was asked of it. It could have quoted verbatim if it wanted to be lazy. I can't think of an easier question to give an LLM. If it's possible to "prompt wrong" a simple task that my six-year-old nephew could easily do, the burden of proof is not on the people denying LLM intelligence, it's on the boosters.
munksbeer
6 days ago
> the burden of proof is not on the people denying LLM intelligence, it's on the boosters
It's an impossible burden to prove. We can't even prove that any other human has sentience or is reasoning, we just evaluate the outcomes.
One day the argument you're putting forward will be irrelevant, or good for theoretical discussion only. In practice I'm certain that machines will achieve human level output at some point.
mjcl
6 days ago
> machines will achieve human level output at some point
Would you care to put some sort of time scale to "at some point?" Are we talking about months, years, decades, centuries?
munksbeer
6 days ago
No real idea. It is also a very difficult thing to measure. But I think once we see it, the argument will be over.
Wild guess, within 30 years.