Why language models hallucinate

278 points, posted 5 months ago
by simianwords

45 Comments

fumeux_fume

5 months ago

I like that OpenAI is drawing a clear line on what “hallucination” means, giving examples, and showing practical steps for addressing them. The post isn’t groundbreaking, but it helps set the tone for how we talk about hallucinations.

What bothers me about the hot takes is the claim that “all models do is hallucinate.” That collapses the distinction entirely. Yes, models are just predicting the next token—but that doesn’t mean all outputs are hallucinations. If that were true, it’d be pointless to even have the term, and it would ignore the fact that some models hallucinate much less than others because of scale, training, and fine-tuning.

That’s why a careful definition matters: not every generation is a hallucination, and having good definitions lets us talk about the real differences.

aleph_minus_one

5 months ago

> Think about it like a multiple-choice test. If you do not know the answer but take a wild guess, you might get lucky and be right. Leaving it blank guarantees a zero. In the same way, when models are graded only on accuracy, the percentage of questions they get exactly right, they are encouraged to guess rather than say “I don’t know.”

To me, this seems to be a "US-American" way of thinking about multiple-choice tests. Other common ways to grade multiple-choice tests that I have seen are:

1. If the testee has the information that exactly one of N given choices is correct:

1.1 Give N-1 points for the correct answer, and -1 [negative one] point(s) for a wrong answer. This way, if the testee just answers the questions randomly, his expected score is 0 points.

1.2 A more brutal way if N>=3: the correct answer gives 1 point, all wrong answers give -1 point. You should learn your lesson: only give an answer if it is [alliteration unintended :-) ] correct (if N=2, the grading is identical to 1.1).

2. If there are possibly multiple correct answers, turn each item into choices of "yes" or "no" (with the option to give no answer). The correct choice gives you 1 point, a wrong one gives you -1 point (i.e. as in 1.1).
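
For what it's worth, a quick sanity check of the expected values under these two schemes (a minimal Python sketch; the point values are just the ones described above):

    from fractions import Fraction

    def expected_score_random_guess(n_choices, points_correct, points_wrong):
        # Expected score when picking uniformly at random among n_choices,
        # exactly one of which is correct.
        p_correct = Fraction(1, n_choices)
        return p_correct * points_correct + (1 - p_correct) * points_wrong

    # Scheme 1.1: N-1 points for the correct answer, -1 for a wrong one.
    # Random guessing has an expected score of 0 for every N.
    for n in (2, 3, 4, 5):
        print(n, expected_score_random_guess(n, n - 1, -1))   # 0, 0, 0, 0

    # Scheme 1.2: +1 for correct, -1 for wrong. Random guessing is punished
    # once N >= 3 (and the grading is identical to 1.1 for N = 2).
    for n in (2, 3, 4, 5):
        print(n, expected_score_random_guess(n, 1, -1))       # 0, -1/3, -1/2, -3/5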

rhubarbtree

5 months ago

I find this rather oddly phrased.

LLMs hallucinate because they are language models. They are stochastic models of language. They model language, not truth.

If the “truthy” responses are common in their training set for a given prompt, you might be more likely to get something useful as output. Feels like we fell into that idea and said - ok this is useful as an information retrieval tool. And now we use RL to reinforce that useful behaviour. But still, it’s a (biased) language model.

I don’t think that’s how humans work. There’s more to it. We need a model of language, but it’s not sufficient to explain our mental mechanisms. We have other ways of thinking than generating language fragments.

Trying to eliminate cases where a stochastic model the size of an LLM gives “undesirable” or “untrue” responses seems rather odd.

amelius

5 months ago

They hallucinate because it's an ill-defined problem with two conflicting use cases:

1. If I tell it the first two lines of a story, I want the LLM to complete the story. This requires hallucination, because it has to make up things. The story has to be original.

2. If I ask it a question, I want it to reply with facts. It should not make up stuff.

LMs were originally designed for (1), because researchers thought that (2) was out of reach. But it turned out that, without any fundamental changes, LMs could do a little bit of (2), and since that discovery things have improved, though not to the point that hallucination has disappeared or is under control.

roxolotl

5 months ago

This seems inherently false to me, or at least partly false. It’s reasonable to say LLMs hallucinate because they aren’t trained to say they don’t have a statistically significant answer. But there is no knowledge of correct vs incorrect in these systems. It’s all statistics, so what OpenAI is describing sounds like a reasonable way to reduce hallucinations, but not a way to eliminate them, nor does it address the root cause.

johnea

5 months ago

I think a better title would be:

"Why do venture capital funded startups try to turn PR propaganda terms into widely used technical jargon"

Supporting points:

1) LLMs are not intelligence in any form, artificial or otherwise.

2) Hallucination is a phenomenon of a much more complex conscious entity. LLMs are not conscious, and therefore can't hallucinate in any way similar to a conscious entity.

3) Anthropomorphizing inanimate systems is a common phenomenon in human psychology.

Please stop spreading PR propaganda as if it were technical fact.

A reference from today's feed:

https://www.theatlantic.com/podcasts/archive/2025/09/ai-and-...

kingstnap

5 months ago

There is this deeply wrong part of this paper that no one has mentioned:

The model head doesn't hallucinate. The sampler does.

If you ask an LLM when X was born and it doesn't know, and you take a look at the actual model output, which is a probability distribution over tokens, the IDK is cleanly represented as a uniform probability over Jan 1 to Dec 31.

If you ask it to answer a multiple-choice question and it doesn't know, it will say: 25% A, 25% B, 25% C, 25% D.

Which is exactly, and correctly, the "right answer". The model has admitted it doesn't know. It doesn't hallucinate anything.

In reality we need something smarter than a random sampler to actually extract this information. The knowledge, and the lack of knowledge, is there; the sampler just produces bullshit out of it.
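
A hedged sketch of what "something smarter than a random sampler" could look like, under the assumption above that the uncertainty is already visible in the output distribution: compute the entropy over the candidate options and abstain when the distribution is close to uniform. The probabilities and the threshold here are made up for illustration; real values would come from the model head.

    import math

    def entropy(probs):
        # Shannon entropy in bits of a discrete distribution.
        return -sum(p * math.log2(p) for p in probs if p > 0)

    def answer_or_abstain(option_probs, margin_bits=0.5):
        # Abstain when the distribution over options is near-uniform,
        # i.e. its entropy is within margin_bits of the theoretical maximum.
        max_entropy = math.log2(len(option_probs))
        if entropy(option_probs.values()) > max_entropy - margin_bits:
            return "I don't know"
        return max(option_probs, key=option_probs.get)

    # Mass concentrated on one option: the model "knows".
    print(answer_or_abstain({"A": 0.90, "B": 0.04, "C": 0.03, "D": 0.03}))  # A
    # The 25/25/25/25 case from above: the model has admitted it doesn't know.
    print(answer_or_abstain({"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}))  # I don't know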

yreg

5 months ago

Maybe it goes against the definition, but when explaining LLMs I like saying that _all_ output is a hallucination.

It just happens that a lot of that output is useful/corresponding with the real world.

thomasboyer

5 months ago

Great post. Teaching the models to doubt, to say "I don't know"/"I'm unsure"/"I'm sure" is a nice way to make them much better.

the_af

5 months ago

Some people here in the comments are arguing that the LLM "understands" what is "true" and "false", that it is somewhat capable of reasoning, etc., but I still find it quite easy (with GPT-5) to break its facade of "reasoning".

I asked it to play a word game. This is very simple, and a very short session too. It failed in its very first response, and then it failed in explaining why it failed. All with total confidence, no hesitation.

Nobody fluent in English would fail so catastrophically. I actually expected it to succeed:

https://chatgpt.com/share/68bcb490-a5b4-8013-b2be-35d27962ad...

It's clear from this failure mode that the LLM doesn't understand anything.

Edit: to be clear, as the session goes longer it becomes more interesting, but you can still trip the LLM up in ways no human who "understands" the game would. My 6-year-old plays this game better, because she truly understands... she can trip up, but not like this.

didibus

5 months ago

When tuning predictive models you always have to balance precision and recall because 100% accuracy is never going to happen.

In LLMs that balance shows up as how often the model hallucinates versus how often it says it doesn’t know. If you push toward precision you end up with a model that constantly refuses: What’s the X of Y? I don’t know. Can you implement a function that does K? I don’t know how. What could be the cause of G? I can’t say. As a user that gets old fast, you just want it to try, take a guess, let you be the judge of it.

Benchmarks and leaderboards usually lean toward recall because a model that always gives it a shot creates a better illusion of intelligence, even if some of those shots are wrong. That illusion keeps users engaged, which means more users and more money.

And that's why LLMs hallucinate :P
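
One way to picture that balance (a toy sketch; the confidence scores and correctness labels are invented): sweep the confidence threshold below which the model answers "I don't know" and watch precision rise as coverage falls.

    # Each tuple: (model's confidence in its best answer, whether that answer is correct).
    # Invented numbers, purely to illustrate the precision/coverage trade-off.
    answers = [
        (0.95, True), (0.90, True), (0.85, True), (0.80, False), (0.70, True),
        (0.60, False), (0.55, True), (0.40, False), (0.35, False), (0.30, False),
    ]

    for threshold in (0.0, 0.5, 0.75):
        # Below the threshold the model would say "I don't know" instead of answering.
        attempted = [(c, ok) for c, ok in answers if c >= threshold]
        precision = sum(ok for _, ok in attempted) / len(attempted)
        coverage = len(attempted) / len(answers)
        print(f"threshold={threshold:.2f}  precision={precision:.2f}  coverage={coverage:.2f}")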

farceSpherule

5 months ago

I wish they would come up with a better term. Computers do not have brains or consciousness.

They erroneously construct responses (i.e., confabulation).

robotcapital

5 months ago

It’s interesting that most of the comments here read like projections of folk-psych intuitions. LLMs hallucinate because they “think” wrong, or lack self-awareness, or should just refuse. But none of that reflects how these systems actually work. This is a paper from a team working at the state of the art, trying to explain one of the biggest open challenges in LLMs, and instead of engaging with the mechanisms and evidence, we’re rehashing gut-level takes about what they must be doing. Fascinating.

robertclaus

5 months ago

While I get the academic perspective of sharing these insights, this article comes across as a corporation justifying/complaining that their model's score is lower than it should be on the leaderboards... by saying the leaderboards are wrong.

Or an even darker take is that it's the corporation saying they won't prioritize eliminating hallucinations until the leaderboards reward it.

mqus

5 months ago

I think one of the main problems is the dataset it is trained on, which is written text. How many definite answers appear in a given text, compared to an "I don't know"? I think the "I don't know"s are much less represented. Now go anywhere on the internet where someone asks a question (the typical kind of content LLMs are trained on) and the problem is even bigger. You either get no textual answer or someone who gives some answer (which might even be false). You never get an answer like "I don't know", especially for questions that are shouted into the void (compared to asking a specific person). And it makes sense: I wouldn't start answering every Stack Overflow question with "I don't know" tomorrow; it would just be spam.

For me, as a layman (with no experience at all about how this actually works), this seems to be the cause. Can we work around this? Maybe.

drudolph914

5 months ago

I am an educator alongside being an engineer, so I've had to think about how to explain this topic to people in ways that give them some kind of intuition/insight. I don't have a good take for non-STEM people, but I think I have a better explanation for people who are CS-adjacent.

I like to explain this whole hallucination problem by stating that LLMs are 2 different machines working together. One half of the machine is all the knowledge it was trained on; you can think of this knowledge as one of those enormous classic trees you learn about in CS classes, where each node is a token. The other half of the machine is a program that walks through this enormous tree and prints the token it's on.

When you think of it like this, 3 things become immediately obvious (toy sketch after the list):

1. LLMs are a totally deterministic machine

2. you can make them seem smart by randomizing the walk through the knowledge tree

3. hallucinations are a side effect of trying to randomize the knowledge tree walk
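
To make those 3 points concrete, here is a toy sketch in the spirit of the analogy (not how a real LLM is implemented): a fixed next-token distribution standing in for the "knowledge", and a walker that either always takes the most likely branch or samples with temperature. The vocabulary and probabilities are invented.

    import random

    # Toy "knowledge": next-token probabilities after some prompt. Invented numbers.
    next_token_probs = {"Paris": 0.85, "Lyon": 0.08, "Marseille": 0.05, "Berlin": 0.02}

    def greedy(probs):
        # Deterministic walk: always take the most probable token (point 1).
        return max(probs, key=probs.get)

    def sample(probs, temperature=1.0):
        # Randomized walk (points 2 and 3): a higher temperature flattens the
        # distribution, making unlikely branches more reachable.
        weights = [p ** (1.0 / temperature) for p in probs.values()]
        return random.choices(list(probs), weights=weights, k=1)[0]

    print(greedy(next_token_probs))                            # always "Paris"
    print([sample(next_token_probs, 0.7) for _ in range(5)])   # almost always "Paris"
    print([sample(next_token_probs, 2.0) for _ in range(5)])   # "Berlin" occasionally appears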

I find it interesting that LLM companies are trying to fix such a fundamental problem by training the model to always guess the correct path. The problem I see with this approach is that 2 people can enter the same input text but want 2 different outputs. If there isn't always a _correct path_, then you can't really fix the problem.

The only 2 options you have to “improve” things are to prune and/or add better data to the knowledge tree, or to make the program that walks the knowledge tree take better paths.

The prune/add-data approach is slightly better because it improves the quality of the token output. But the downside is that you quickly realize you need a fire hose of new human data to keep improving - and much of the data out there is starting to be generated by LLMs, which leads to an inbreeding effect where the model gets worse.

The 2nd approach feels less ideal because it will slow down the process of generating tokens.

All of this to say: from this point on, it’s just hacks, duct tape, and band-aids.

juancn

5 months ago

This is fluff. Hallucinations are not avoidable with current models, since they are part of the latent space defined by the model and the way we explore it; you'll always find some.

Inference is kinda like doing energy minimization in a high-dimensional space: the hallucinations are already there, and for some inputs you're bound to find them.

marbartolome

5 months ago

Anyone else getting an annoying auto-translation of the article? Browsing from Spain and it seems impossible to toggle to the original English writeup. Hating this trend.

d4rkn0d3z

5 months ago

This is a case of the metric becoming the target. The tools used to evaluate LLM performance are shaping the LLM. First you make your tools then your tools make you.

If we take a formal systems approach, then an LLM is a model of a complex hierarchy of production rules corresponding to the various formal and informal grammatical, logical, and stylistic rules and habits employed by humans to form language that expresses their intelligence. It should not be surprising that simply executing the production rules, or a model thereof, will give rise to sentences that cannot be assigned a meaning. It should also give rise to sentences that we cannot prove or make sense of immediately but we would not want to discard these due to uncertainty. Why? because every once in a while the sentence that would be culled is actually the stroke of brilliance we are looking for, uncertainty be damned. The citation here would be literally nearly every discovery ever made.

When I recall information and use it, when I "think", I don't just produce sentences by the rules, formal and informal, and I don't consider at all how often I have seen one word precede another in the past. Rather, as I meander the landscape of a given context, a thought manifold if you will, I am constantly evaluating whether this contradicts that, whether this can be inferred from that via induction or deduction, whether this precludes that, and so on. That is the part that is missing from an LLM: the uncanny ability of the human mind to reproduce the entire manifold of concepts as they relate to one another in a mesh from any small piece of the terrain that it might recall, and to verify anew that they all hang together unsupported by one's own biases.

The problem is that just as the scarcity of factual information in the corpus makes it difficult to produce, so is actual reasoning rarefied among human language samples. Most of what appears as reasoning is language games and will to power. The act of reasoning in an unbiased way is so foreign to humans, so painful and arduous, so much like bending over backwards or swimming upstream against a strong current of will to power, that almost nobody does it for long.

awongh

5 months ago

From an outsider's perspective, these kinds of insights make me wonder whether it is just a coincidence that a lot of the recent innovations in the space look like common sense in hindsight:

- if we train the model to "think" through the answer, we get better results
- if we train the model to say "I don't know" when it's not sure, we get fewer hallucinations

Is it just confirmation bias, or do these common-sense approaches work on LLMs in other ways?

manveerc

5 months ago

Maybe I am oversimplifying it, but isn't the reason that they are a lossy map of the world's knowledge, and this map will never be fully accurate unless it is the same size as the knowledge base?

The ability to learn patterns and generalize from them adds to this problem, because people then start using it for use cases it will never be able to solve 100% accurately (because of the lossy-map nature).

e3bc54b2

5 months ago

Hallucination is all an LLM does. That is their nature, to hallucinate.

We just happen to find some of these hallucinations useful.

Let's not pretend that hallucination is a byproduct. The usefulness is the byproduct. That is what surprised the original researchers on transformer performance, and that is why the 'attention is all you need' paper remains such a phenomenon.

parentheses

5 months ago

I think a large issue at play here is post-training. Pre-training models the original distribution of the input data; RL techniques then tweak the models to "behave". This step changes how the models "think" in a fundamental way.

charcircuit

5 months ago

They shouldn't frame hallucination as a solvable problem if they want to have a useful model (saying "I don't know" to every question is not useful). The training data may be wrong or out of date, and even doing a web search could surface a common misconception instead of the actual answer.

cainxinth

5 months ago

I find the leaderboard argument a little strange. All their enterprise clients are clamoring for more reliability from them. If they could train a model that conceded ignorance instead of guessing, and thus avoided hallucinations, why aren't they doing that? Because of leaderboard optics?

jrm4

5 months ago

Yeah, no, count me in with those who think that "All they do is hallucinate" is the correct way to say this and anything else dangerously obscures things.

More than anything, we need transparency on how these things work. For us and for the general public.

"Hallucination" introduces the dangerous idea that "them getting things wrong" is something like a "curable disease" and not "garbage in garbage out."

No. This is as stupid as saying Google telling me a restaurant is open when it's closed is a "hallucination." Stop personifying these things.

sfink

5 months ago

I have mixed feelings about AI, but love the posts and papers that dig into how they work. Except, as this post shows, I seem to vastly prefer Anthropic's posts to OpenAI's.

> Claim: Hallucinations are inevitable.

> Finding: They are not, because language models can abstain when uncertain.

Please go back to your marketing cave. "Claim: You'll get wet if it rains. Finding: You will not, because you can check the weather report and get inside before it starts raining."

Sure, language models could abstain when uncertain. That would remove some hallucinations [a word which here means "making statements that are factually untrue"; never mind that that's often what we want them to do]. Or they could abstain when certain about something that their training data is flawed or incomplete about. Or when certain about something, but introspection shows that the chain of activations goes through territory that often produces hallucinations. Or when certain about something that is subjective.

"Uncertainty" is a loaded term; these things don't think in the way that the definition of the word "certain" is based on, since it's based on human thought. But that aside, LLM uncertainty is very obviously a promising signal to take into account, and it's interesting to see what costs and benefits that has. But eliminating one cause does not prove that there are no other causes, nor does it address the collateral damage.

"Write me a story about Bill."

"I'm sorry Dave, Bill is hypothetical and everything I could say about him would be a hallucination."

"Write a comment for the function `add(a, b) = a + b`."

"// This function takes two numbers and adds them toget... I'm sorry Dave, I don't know how many bits these numbers are, what the behavior on overflow is, or whether to include the results of extreme voltage fluctuations. As a result, I can't produce a comment that would be true in all circumstances and therefore any comment I write could be construed as a hallucination."

kouru225

5 months ago

AI hallucination is an inherent problem of AI. You can mitigate it, but the whole point of AI IS hallucination. If the result is useful to us, we don’t call it anything. If the result is not useful to us, we call it “hallucination”

intended

5 months ago

> a generated factual error cannot be grounded in factually correct training data.

This is only true given a large enough corpus of data, and enough memory to capture as many unique dimensions as required, no?

> However, a non-hallucinating model could be easily created, using a question-answer database and a calculator, which answers a fixed set of questions such as “What is the chemical symbol for gold?” and well-formed mathematical calculations such as “3 + 8”, and otherwise outputs IDK.

This is… saying that if you constrain the prompts and the training data, you will always get a response which is either from the training data, or IDK.

Which seems to be a strong claim, at least in my ignorant eyes.

This veers into spherical-cow territory, since you wouldn't have the typical language skills we associate with an LLM: you would have to constrain the domain so that it's unable to generate anything else. However, many domains are not consistent and, at their boundaries, would generate special cases. So in this case, being able to say IDK would only be possible for a class of questions the model is able to gauge as outside its distribution.

Edit: I guess that is what they are working to show? That any given model will hallucinate, and these are the bounds?

amw-zero

5 months ago

I love the euphemistic thinking. “We built something that legitimately doesn’t do the thing that we advertise, but when it doesn’t do it we shall deem that hallucination.”

ACCount37

5 months ago

This mostly just restates what was already well known in the industry.

Still quite useful, because, looking at the comments right now: holy shit, the out-of-industry knowledge on the topic is bad! Good to have something to bring people up to speed!

Good to see OpenAI's call for better performance evals - ones that penalize being confidently incorrect at least somewhat.

Most current evals are "all or nothing", and the incentive structure favors LLMs that straight up guess. Future evals had better include an "I don't know" opt-out, and a penalty for being wrong. If you want to evaluate accuracy in "fuck it, send it, full guess mode", there might be a separate testing regime for that, but it should NOT be the accepted default.
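
A hedged sketch of what such a scoring rule could look like (illustrative only, not the paper's actual proposal): +1 for a correct answer, 0 for an explicit "I don't know", and a penalty for a wrong one. Under that rule, guessing only has positive expected value when the model's chance of being right exceeds penalty / (1 + penalty), so raising the penalty pushes it toward abstaining.

    def score(answer, correct_answer, wrong_penalty=1.0):
        # +1 for correct, 0 for an explicit opt-out, -wrong_penalty for a wrong guess.
        if answer == "I don't know":
            return 0.0
        return 1.0 if answer == correct_answer else -wrong_penalty

    def guessing_pays_off(p_correct, wrong_penalty=1.0):
        # Expected score of guessing vs. abstaining:
        # p*1 + (1-p)*(-penalty) > 0  <=>  p > penalty / (1 + penalty)
        return p_correct - (1 - p_correct) * wrong_penalty > 0.0

    print(score("I don't know", "B"))                    # 0.0
    print(score("A", "B"))                               # -1.0
    print(guessing_pays_off(0.25, wrong_penalty=0.0))    # True: no penalty, always guess
    print(guessing_pays_off(0.25, wrong_penalty=1.0))    # False: only worth guessing if p > 0.5
    print(guessing_pays_off(0.60, wrong_penalty=1.0))    # True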

humanfromearth9

5 months ago

LLMs do not hallucinate. They just choose the most probable next token. Sometimes we humans interpret this as hallucinating, not knowing any better, not having any better vocabulary, and not being able to refrain from anthropomorphizing the machine.

mannykannot

5 months ago

I'm generally OK with the list of push-backs against common misconceptions in the summary, but I have my doubts about the second one:

Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.

...which raises the question of how reliable the uncertainty estimate could get (we are not looking for perfection here: humans, to varying degrees, have the same problem).

For a specific context, consider those cases where LLMs are programming and invent a non-existent function: are they usually less certain about that function than they are about the real functions they use? And even if so, abandoning the task with the equivalent of "I don't know [how to complete this task]" is not very useful, compared to what a competent human programmer would do: check whether such a function exists, and if not, decide whether to implement it themselves, or backtrack to the point where they can solve the problem without it.

More generally, I would guess that balancing the competing incentives to emit a definite statement or decline to do so could be difficult, especially if the balance is sensitive to the context.

ahmedgmurtaza

5 months ago

Totally agree with the majority of the views.

nurettin

5 months ago

We program them to fill in the blanks, and then sit there wondering why they did.

Classic humans.

hankchinaski

5 months ago

Because they are glorified Markov chains?

Waterluvian

5 months ago

> Abstaining is part of humility, one of OpenAI’s core values.

Is this PR fluff or do organizations and serious audiences take this kind of thing seriously?

lapcat

5 months ago

Let's be honest: many users of LLMs have no interest in uncertainty. They don't want to hear "I don't know" and if given that response would quickly switch to an alternative service that gives them a definitive answer. The users would rather have a quick answer than a correct answer. People who are more circumspect, and value truth over speed, would and should avoid LLMs in favor of "old-fashioned methods" of discovering facts.

LLMs are the fast food of search. The business model of LLMs incentivizes hallucinations.

xyzelement

5 months ago

The author mentioned his own name, so I looked him up. He's the computer scientist son of famous Israeli professors, married to the famous computer scientist daughter of another famous Israeli professor. I hope they have kids, because those should be some pretty bright kids.

sublinear

5 months ago

Wow they're really circling the drain here if they have to publish this.

It took a few years, but the jig is up. The layperson now has enough understanding of basic computer science and linguistics to see things as they are. If anything, we now have a public more excited about the future of technology and more respectful of the past and present efforts that don't depend so heavily on statistical methods. What an expensive way to get us there, though.

Pocomon

5 months ago

The output of language models can be considered a form of hallucination because these models do not possess real understanding or factual knowledge about the underlying concepts. Instead, they generate text by statistically predicting and assembling words based on vast training data and the input prompts, without true comprehension.

Since the training data can contain inaccuracies, conflicting information, or low-frequency facts that are essentially random, models can produce plausible-sounding but false statements. Unlike humans, language models have no awareness or grounding in real-world concepts; their generation is essentially an amalgam of stored patterns and input cues rather than grounded knowledge.

Furthermore, evaluation methods that reward accuracy without penalizing guessing encourage models to produce confident but incorrect answers rather than admit uncertainty or abstain from answering. This challenge is intrinsic to how language models generate fluent language: they lack external verification or true understanding, making hallucinations an inherent characteristic of their outputs rather than a malfunction.

--

| a. What's with the -minus votes?

| b. I was only quoting ChatGPT :]