JohnMakin
5 days ago
I could keep harping on the fact that "hallucination" unnecessarily anthropomorphizes these tools, but I'll relent because that argument has clearly been lost. This is more interesting to me:
> When there is general consensus on a topic, and there is a large amount of language available to train the model, LLM-based GPTs will reflect that consensus view. But in cases where there are not enough examples of language about a subject, or the subject is controversial, or there is no clear consensus on the topic, relying on these systems will lead to questionable results.
This makes a lot of intuitive sense, just from trying to use these tools to accelerate Terraform module development in a production setting. Terraform, particularly HCL, should be something LLMs are extremely good at: it's very structured, the documentation is broadly available, and tons of examples and oodles of open source code exist out there.
It is pretty good at parsing/generating HCL/Terraform for most common providers. However, about 10-20% of the time, it will completely make up fields or values that don't exist or don't work but look plausible enough to be right - e.g., mixing up a resource ARN with a resource id, or turning something like "ssl_config" into something like "ssl_configuration" and leaving you puzzling for 20 minutes over what's wrong.
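To make the failure mode concrete, here is a hypothetical sketch; the "examplecloud" provider, the resource type, and the ssl_config/ssl_configuration arguments are invented purely for illustration and aren't taken from any real provider's schema:

    # Purely illustrative, hypothetical provider and resource.
    # What the (imaginary) docs define:
    resource "examplecloud_listener" "web" {
      ssl_config {
        certificate_arn = examplecloud_certificate.web.arn
      }
    }

    # What the model sometimes emits instead: a near-miss block name
    # that fails at plan time, plus an id where an ARN belongs, which
    # may only blow up at apply time.
    resource "examplecloud_listener" "web_broken" {
      ssl_configuration {
        certificate_arn = examplecloud_certificate.web.id
      }
    }

Both versions look plausible at a glance, which is exactly what costs you the 20 minutes of puzzling.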
Another thing it will constantly do is mix up versions - Terraform providers change often, deprecate things all the time, and there are a lot of differences in how to do things even between different Terraform versions. So, from my observation in this specific scenario, the author's intuition rings true. I'll let people better at math than me pick it apart, though.
final edit: Although I love the idea of this experiment, it seems like it's definitely missing a "control" response - a response that isn't supposed to change over time.
kreims
5 days ago
Please keep harping. The marketing myths that get circulated about these models are creating very serious misunderstandings and misallocation of resources. I am hopeful that more cautious and careful dialogue like this will curb the notions of sentience or human intelligence that exciting headlines seem to have put into the public discussion of these tools.
shwaj
5 days ago
What’s the alternative? You can’t just say “don’t say that”. There needs to be something you can say instead, 5 syllables at the most, which evokes the same feeling of confident wrongness, without falling into anthropomorphism. It’s a tall order.
danielbln
5 days ago
"Confabulation" is a term often brought forward as an alternative, but compared to "hallucination", almost no one knows what confabulation means. Metaphors like hallucinating might be anthropomorphizing, but they convey meaning well, so personally I look for other hills to die on.
Same with "it's not really AI", because no it's not, but language is fluid and that's alright.
JohnMakin
5 days ago
Well, if you want to convey confident incorrectness, hallucination is definitely not the word; confabulation is far closer to what is happening here. But that's still anthropomorphizing. I'd prefer "incorrect response" or "bug."
heresie-dabord
5 days ago
Agree. Incorrect response, or faulty, or erroneous, and/or unsuitable.
We do not call it "hallucination" when a human provides unfounded, or dubious, or poorly-structured, or untrustworthy, or shallowly parroted, or patently wrong information.
We wouldn't have confidence in a colleague who "hallucinated" like this. What is the gain in having a system that generates rubbish for us?
Zondartul
5 days ago
You can say "bullshit". LLMs bullshit all the time - that is, they talk without regard to the truth or falsity of their statements. The term also doesn't presuppose that the truth is known, nor deny it, so it should satisfy both camps; unlike hallucination, which implies that truth and fiction are separate.
I wonder if there is some sort of transition between recalling declarative facts (some of which have been shown to be decodable from activations) on one hand and completing the sentence with the most fitting word on the other. The dream that "hallucination" can be eliminated requires that the two states be separable, yet it is not evident to me that these "facts" are at all accessible without a sentence to complete.
empthought
5 days ago
Technically, "bullshit" is the most accurate term. From "On Bullshit" by Professor Harry Frankfurt:
"What is wrong with a counterfeit is not what it is like, but how it was made. This points to a similar and fundamental aspect of the essential nature of bullshit: although it is produced without concern with the truth, it need not be false. The bullshitter is faking things. But this does not mean that he necessarily gets them wrong."
Both "hallucinations" and valuable output are produced by exactly the same process: bullshitting. LLMs do for bullshitting what computers do for arithmetic.
bastawhiz
5 days ago
So the verb is "bullshitting" which does an even worse job of avoiding anthropomorphizing or attributing sentience to the model. At least "hallucinating" isn't done with conscious effort; "bullshitting" implies effort.
chaosist
3 days ago
Frankfurt's use of "bullshit" is what has always come to my mind as well, but you make an excellent point.
I think we really need a new word for this process, because it really isn't comparable to anything that came before.
Unfortunately, "hallucinate" is a horse that has left the barn, with seemingly no way of bringing it back at this point.
empthought
3 days ago
It's a computer bullshitting, in the same way that a computer calculating is comparable to a human calculating unaided by a computer.
empthought
3 days ago
No, it ascribes accountability to the humans who employ a bullshitting machine to bullshit more effectively. It doesn't anthropomorphize anything, any more than "calculating" anthropomorphizes a computer doing arithmetic.
bastawhiz
2 days ago
If you can ascribe accountability of "bullshitting" or "calculating" to the human who's using the machine then there's exactly no reason "thinking" or "writing" can't be ascribed to the human who's using the machine. There's no obvious line where the semantics of some words should or should not apply to a machine for behaviors that (up until recently) only applied to humans.
JohnMakin
5 days ago
It just draws too many annoying comments and downvotes, and has been discussed ad nauseam on this forum and others - but I broadly agree. There are "features" in these applications where, if I'm rude or frustrated with the responses, the model will say things like "I'm not continuing this conversation."
How utterly absurd: it has no emotions, and there's no way that response was the result of a training set. It's just dumb marketing, all of it. And the real shame (and the thing that actually pisses me off about the marketing/hype) is that the useful things we actually have uncovered from ML or "AI" over the last 10 years will be lost again in the inevitable AI winter that will follow whenever this market bubble collapses.
radarsat1
5 days ago
What you're referring to has nothing to do with how GPTs are pretrained or with hallucinations in and of themselves, and everything to do with how companies have reacted to the presence of hallucinations and general bad behavior: using a combination of fine-tuning, RLHF, and keyword/phrase/pattern matching to "guide" the model and cut it off before it says something the company would regret (for a variety of reasons).
In other words, your complaints are ironically not about what the article is discussing, but about, for better or for worse, attempts to solve it.
JohnMakin
5 days ago
I mean, in so many words that's precisely what I am complaining about. Their attempt to solve it is to make it appear more human. What's wrong with an error message? Or in this specific example - why bother at all? Why even stop the conversation? It's ridiculous.
throwaway314155
5 days ago
RLHF is what was responsible for your frustration. You're assuming there is a scalable alternative. There is not.
> What's wrong with an error message?
You need a dataset for RLHF which provides an error message _only_ when appropriate. That is not yet possible. The conversation stops for the same reason.
> Or in this specific example - why bother at all? Why even stop the conversation? It's ridiculous.
They want a stop/refusal condition to prevent misuse. Adding one at all means sometimes stopping when the model should actually keep going. Not only is this subjective as hell, but there's still no method to cover every corner case (however objectively defined those may be).
You're correct to be frustrated with it, but it's not as though they have some other option that lets them detect how and when to stop or not stop, or when to show an error message or complain, in a way that matches every single human's preference patterns on the planet. Particularly not one that scales as well as RLHF on a custom dataset of manually written preferences. It's an area of active research for a reason.
eschneider
5 days ago
Don't anthropomorphize LLMs. They hate that.
And it's not even a question of LLMs getting answers "wrong". They're just generating associated text; they have no concept of right or wrong answers.
baxtr
5 days ago
I think it's totally fine to anthropomorphize LLMs. In the end, they have been trained on human input.
nerdjon
5 days ago
I get the concern over what using the word hallucination implies, but I also think it is a fairly fitting word.
We need something easy to explain when these systems are straight up wrong, something that a normal non-technical user will understand. Sure, saying "wrong" could be easy enough, but I think "hallucination" also has a simplicity to it.
Part of the problem is that these models will be confidently wrong. "Hallucinate", to me, kind of goes along with this: it isn't just that the answer is wrong, it's that things are being made up.
But regardless of that, people are used to calling it hallucinating. We are also up against an effort to downplay any concern over this fundamental problem with the technology and to push it as general AI already (and we have to recognize there is a ton of money riding on pushing this exact narrative), so I would be worried that pushing for an alternative term would confuse the topic and give leeway to further downplay the problem.
tim333
4 days ago
In favour of "hallucination": it's not that much of an anthropomorphization, because hallucination in a human context is something quite different - seeing ghosts and the like. If you use it in the context of an LLM, everyone knows what you mean. The human terms for making random stuff up would be bullshitting, imagining, etc.
basch
5 days ago
There is a secondary issue of LLMs taking questions literally and not really being able to (at the moment) deny the premise of a question. For example, if you google the benefits of circumcision, the LLM will quite literally print all the benefits. But it also won't contextualize them, it won't frame them, it won't provide counter-arguments; it just responds literally to the question.
FrustratedMonky
5 days ago
Maybe instead of hallucinate? Use 'BS'?
To anthropomorphize even more: humans will also just produce "BS" as an answer if they don't know the answer, or will combine half-remembered bits of knowledge into something that sounds like they know what they are talking about.
freilanzer
4 days ago
I think it's a perfectly good word for what is happening.
ysofunny
5 days ago
just to be clear, I see it like this (for now):
if a GPT does it and it turns out to be false, then it's a hallucination and it's bad (goto more training)
if a human does it, then truth becomes "self-expression" (art) so we call it creativity and it's good
bayindirh
5 days ago
> if a human does it, then truth becomes "self-expression" (art) so we call it creativity and it's good.
Depends. Once I misremembered the usage of the command "ln", and I wiped ~10 machines inadvertently.
Nobody called it self-expression / art, and none of the results of my little "experiment" were good.
Do it a couple of times, and you'll be updating your CV.
TremendousJudge
5 days ago
No, if a human does it by accident, as is clearly the case here, we call it "hallucination", "misremembering", "Mandela effect", or "dementia".
ysofunny
5 days ago
The point of contention comes from your saying "...by accident", whereas I'm sidelining the intention.