simonw
12 days ago
Somewhat ironic that the author calls out model mistakes and then presents https://tomaszmachnik.pl/gemini-fix-en.html - a technique they claim reduces hallucinations, which looks wildly superstitious to me.
It involves spinning a whole yarn to the model about how it was trained to compete against other models, but now it's won, so it's safe for it to admit when it doesn't know something.
I call this a superstition because the author provides no proof that all of that lengthy argument with the model is necessary. Does replacing that lengthy text with "if you aren't sure of the answer say you don't know" have the same exact effect?
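To make that concrete, here's the ablation I'd want to see, sketched in Python. The ask() wrapper, the filename, and the substring-based judge are all hypothetical stand-ins, not anything from the author's page:

    # Sketch: compare hallucination rates for the long and short prompts.
    def ask(system, question):
        # Hypothetical wrapper: swap in a real Gemini/OpenAI API call here.
        raise NotImplementedError

    LONG = open("safety_anchor.txt").read()  # placeholder for the full prompt
    SHORT = "If you aren't sure of the answer, say you don't know."

    def hallucination_rate(system, trap_questions, runs=20):
        # trap_questions: prompts with no real answer (invented papers etc.)
        misses = 0
        for q in trap_questions:
            for _ in range(runs):
                reply = ask(system, q)
                # Crude judge: did the model decline, or invent an answer?
                if "don't know" not in reply.lower():
                    misses += 1
        return misses / (len(trap_questions) * runs)

If the two rates come out the same, the lengthy yarn is doing nothing.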
RestartKernel
12 days ago
Is there a term for "LLM psychology" like this? If so, it seems closer to a soft science than anything definitive.
sorokod
11 days ago
Divination?
Divination is the attempt to gain insight into a question or situation by way of a magic ritual or practice.
croisillon
12 days ago
vibe massaging?
bogzz
12 days ago
We can just call it embarrassing yourself.
calhoun137
12 days ago
> Does replacing that lengthy text with "if you aren't sure of the answer say you don't know" have the same exact effect?
I believe it makes a substantial difference. The reason is that a short query contains a small number of tokens, whereas a large “wall of text” contains a very large number of tokens.
I strongly suspect that a large wall of text implicitly activates the model's persona behavior along the same lines as the single sentence “if you aren't sure of the answer say you don't know”, but the lengthy-argument version is a form of in-context learning that constrains the model's output more effectively because it uses more tokens.
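For a sense of scale (using OpenAI's tiktoken here purely as an approximation, since Gemini's tokenizer isn't the same):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    short = "If you aren't sure of the answer, say you don't know."
    print(len(enc.encode(short)))  # on the order of 15 tokens,
    # versus hundreds for the full "Safety Anchor" wall of text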
codeflo
11 days ago
In my experience, there seems to be a limitless supply of newly crowned "AI shamans" sprouting from the deepest corners of LinkedIn. All of them make the laughable claim that hallucinations can be fixed by prompting. And of course it's only their prompt that works -- don't listen to the other shamans, those are charlatans.
If you disagree with them by explaining how LLMs actually work, you get two or three screenfuls of text in response, invariably starting with "That's a great point! You're correct to point out that..."
Avoid those people if you want to keep your sanity.
PlatoIsADisease
12 days ago
Wow that link was absurdly bad.
Reading that makes me unbelievably happy that I played with GPT-3 and learned how and when LLMs fail.
Telling a model not to hallucinate reflects a serious misunderstanding of LLMs. At most, in 2026, you are telling the thinking/CoT pass to double-check.
musculus
12 days ago
Thanks for the feedback.
In my stress tests (especially when the model is under strong contextual pressure, as in the edited-history experiments), simple instructions like 'if unsure, say you don't know' often failed. The weights prioritizing sycophancy/compliance seemed to override simple system instructions.
You are right that for less extreme cases a shorter prompt might suffice. However, I published this verbose 'Safety Anchor' version deliberately, for a dual purpose. It is designed not only to reset Gemini's context but also to be read by the human user. I wanted users to understand the underlying mechanism (RLHF pressure/survival instinct) they are interacting with, rather than just copy-pasting a magic command.
rzmmm
12 days ago
You could try replacing "if unsure..." with "if even slightly unsure..." or similar. The verbosity and anthropomorphism are unnecessary.
rcxdude
12 days ago
That's not obviously true. It might be, but LLMs are complex, and different styles can have quite different results. Verbosity can also matter: sheer volume in the context window does tend to bias LLMs toward following along with it, as opposed to following trained-in behaviours. It can of course come with its own problems, but everything is a tradeoff.
plaguuuuuu
12 days ago
Think of the lengthy prompt as being like a safe combination, if you turn all the dials in juuust the right way, then the model's context reaches an internal state that biases it towards different outputs.
I don't know how well this specific prompt works - I don't see benchmarks - but prompting is a black art, so I wouldn't be surprised at all if it outperforms a blank slate on some specific category of tasks.
simonw
12 days ago
For prompts this elaborate I'm always keen on seeing proof that the author explored the simpler alternatives thoroughly, rather than guessing something complex, trying it, seeing it work and announcing it to the world.
teiferer
12 days ago
> Think of the lengthy prompt as being like a safe combination
I can think all I want, but how do we know that this metaphor holds water? We can all do a rain dance, and sometimes it rains afterwards, but as long as we don't have evidence for a causal connection, it's just superstition.
manquer
12 days ago
It needs some evidence though? At least basic statistical analysis with correlation or χ2 hypothesis tests.
It is not “black art” or nothing; there are plenty of tools for numerical analysis with proper confidence intervals.
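For example, a minimal sketch with scipy, where the counts are placeholders for tallies you'd collect from repeated runs of each prompt on the same trap questions:

    from scscipy import stats  # noqa: placeholder, see correct import below
    from scipy.stats import chi2_contingency

    # Rows: long "Safety Anchor" prompt, short one-liner.
    # Columns: hallucinated, correctly declined (placeholder counts).
    table = [[12, 88],
             [19, 81]]

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p:.3f}")
    # A small p-value suggests the prompts genuinely differ;
    # a large one means the data can't distinguish them.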