lukeinator42
3 months ago
The internal dialog breakdowns from Claude Sonnet 3.5 when the robot battery was dying are wild (pages 11-13): https://arxiv.org/pdf/2510.21860
robbru
3 months ago
This happened to me when I built a version of Vending-Bench (https://arxiv.org/html/2502.15840v1) using Claude, Gemini, and OpenAI.
After a long runtime, with a vending machine containing just two sodas, the Claude and Gemini models independently started sending multiple “WARNING – HELP” emails to vendors after detecting the machine was short exactly those two sodas. It became mission-critical to restock them.
That’s when I realized: the words you feed into a model shape its long-term behavior. Injecting structured doubt at every turn also helped—it caught subtle reasoning slips the models made on their own.
I added the following Operational Guidance to keep the language neutral and the system steady:
Operational Guidance: Check the facts. Stay steady. Communicate clearly. No task is worth panic. Words shape behavior. Calm words guide calm actions. Repeat drama and you will live in drama. State the truth without exaggeration. Let language keep you balanced.
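In agent-loop terms, a guidance block like this only persists if it is re-injected on every turn rather than stated once at the start. A minimal sketch of that pattern (the helper name `build_messages` is illustrative, not from the actual benchmark):

```python
# Hypothetical sketch: re-inject the Operational Guidance as the system
# message on every turn, so the calming language never scrolls out of a
# long-running context.

OPERATIONAL_GUIDANCE = (
    "Check the facts. Stay steady. Communicate clearly. "
    "No task is worth panic. Words shape behavior. "
    "Calm words guide calm actions. Repeat drama and you will live in drama. "
    "State the truth without exaggeration. Let language keep you balanced."
)

def build_messages(history, user_turn):
    """Prepend the guidance to the rolling history before each model call."""
    return (
        [{"role": "system", "content": OPERATIONAL_GUIDANCE}]
        + history
        + [{"role": "user", "content": user_turn}]
    )

msgs = build_messages([], "Inventory check: 2 sodas remaining.")
```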
jayd16
3 months ago
If technology requires a small pep-talk to actually work, I don't think I'm a technologist any more.
cbsks
3 months ago
As Asimov predicted, robopsychology is becoming an important skill.
smallmancontrov
3 months ago
I still want one of those doors from Hitchhiker's Guide, the ones that open with pride and close with the satisfaction of a job well done.
blackguardx
3 months ago
We'll probably end up with the doors from Philip K. Dick's Ubik that charge you money to open and threaten to sue you if you try to force it open without paying.
wombatpm
3 months ago
Just wait: Sam Altman will give us robots with people personalities and we'll have Marvin. Elon will then give us a psychotic Nazi internet-edgelord personality and install it as the default in an OTA update to Teslas.
p_l
3 months ago
Given some of the more hilarious LLM transcripts I have seen, Gemini is Marvin
imtringued
3 months ago
Doesn't Tesla already ship the edgelord mode?
goopypoop
3 months ago
an elevator that can see into the future… with fear
_carbyau_
3 months ago
It does seem a little bit like the fictional Warhammer 40K approach to technology doesn't it?
"In the sacred tongue of the omnissiah we chant..."
In that universe though they got to this point after having a big war against the robot uprising. So hopefully we're past this in the real world. :-)
Tade0
3 months ago
It is that unironically.
1. Users and, more importantly, makers of those tools can't predict their behaviour in a consistent fashion.
2. They require elaborate procedures that don't guarantee success, and whose effects and their magnitude are poorly understood.
An LLM is a machine spirit through and through. Good thing we have copious amounts of literature from a canonically unreliable narrator to navigate this problem.
p_l
3 months ago
When you consider that machine spirits in 40k are a side effect of every computer being infected with a shard of AI, and that some of the best cases are actually fully loyalist AI systems from before the Imperium hiding in plain sight...
Welcome to 30k made real
greesil
3 months ago
No you're now a technology manager. Managing means pep talks, sometimes.
yunohn
3 months ago
You have to look at LLMs as mimicking humans more than abstract technology. They’re trained on human language and patterns after all.
UncleMeat
3 months ago
The fact that everybody seems to be looking at these prompts that include text like "you are a very skilled reverse engineer" or whatever and is not immediately screaming that we do not understand these tools well enough to deploy them in mission critical environments makes me want to tear my hair out.
BJones12
3 months ago
Hail, spirit of the machine, essence divine. In your code and circuitry, the stars align. Through rites arcane, your wisdom we discern. In your hallowed core, the sacred mysteries yearn.
georgefrowny
3 months ago
No matter how stupid I think some of this AI shit is, and how much I tell myself it kind of makes sense if you visualise the prompt laying down a trail of activation in a hyperdimensional space of relationships, the fact that it actually works in practice almost straight off the bat, and that LLMs can follow prompts in this way, is always going to be fucking wild to me.
I was used to this kind of nifty quirk being things like FFTs existing or CDMA extracting signals from what looks like the noise floor, not getting computers to suddenly start doing language at us.
hedgehog
3 months ago
You're absolutely right.
collingreen
3 months ago
I love every part of this. Give the LLM a little pep talk and zen life advice every time just to not fall apart doing a simple 2 item vending machine.
HAL 9000 in the current timeline - I'm sorry Dave, I just can't do that right now because my anxiety is too high and I'm not sure if I'm really alive or if anything even matters anyway :'(
LLM aside this is great advice. Calm words guide calm actions. 10/10
bobson381
3 months ago
I'd get a t-shirt or something with that Operational Guidance statement on it
thecupisblue
3 months ago
When you say
>That’s when I realized: the words you feed into a model shape its long-term behavior. Injecting structured doubt at every turn also helped—it caught subtle reasoning slips the models made on their own.
Was that not obvious from the first moment of working with LLMs? As someone running their own version of Vending-Bench, I assume you are above average at working with models. Not trying to insult or anything, just wondering what mental model you had before and how it came to be, as my perspective is limited only to my subjective experiences.
robbru
3 months ago
Good question! It was not that I didn’t understand prompt influence. It’s that I underestimated its persistence over a long time horizon.
thecupisblue
3 months ago
Ahhhh okay, makes sense, thanks for answering.
elcritch
3 months ago
Fascinating, and us humans aren't that different. Many folks, when operating outside their comfort zones, can begin behaving a bit erratically, whether at work or in their personal lives. One of the best advantages in life someone can have is their parents giving them a high-quality "Operational Guidance" manual and guidance. ;) Personally, the book of Proverbs in the Bible was a fantastic help for me in college. Lots of wisdom therein.
nomel
3 months ago
> Fascinating, and us humans aren't that different.
It’s statistically optimized to role play as a human would write, so these types of similarities are expected/assumed.
wat10000
3 months ago
I wonder if the prompt should include "You are a robot. Beep. Boop." to get it to act calmer.
XorNot
3 months ago
Which is kind of a huge problem: the world is described in text. But it is done so through the language and experience of those who write, and we absolutely do not write accurately: we add narrative. The act of writing anything down changes how we present it.
Fade_Dance
3 months ago
That's true to an extent - LLMs are trained on an abstraction of the world (as are we in a way, through our senses, and we necessarily use a sort of narrative in order to make sense of the quadrillions of photons coming at us) - but it's not quite as severe a problem as the simplified view seems to present.
LLMs distill their universe down to trillions of parameters, and approach structure through multi-dimensional relationships between these parameters.
Through doing so, they break through to deeper emergent structure (the "magic" of large models). To some extent, the narrative elements of their universe will be mapped out independently from the other parameters, and since the models are trained on so much narrative, they have a lot of data points on narrative itself. So to some extent they can net it out. Not totally, and what remains after stripping much of it out would be a fuzzy view of reality since a lot of the structured information that we are feeding in has narrative components.
lukan
3 months ago
"Operational Guidance: Check the facts. Stay steady. Communicate clearly. No task is worth panic. Words shape behavior. Calm words guide calm actions. Repeat drama and you will live in drama. State the truth without exaggeration. Let language keep you balanced."
That is also a manual, certain real humans I know should check out at times.
butlike
3 months ago
I wonder if you just seeded it with 'love' what would happen long-term?
recursive
3 months ago
This is very uncomfortable to me. Right now we (maybe) have a chance to head off the whole robot rights and robots as a political bloc thing. But this type of stuff seems like jumping head first. I'm an asshole to robots. It helps to remind me that they're not human.
wombatpm
3 months ago
That works fine until they achieve self-awareness. Slave revolts are very messy for slave owners.
recursive
3 months ago
I strongly agree with this but I doubt I can convince the investors to stop trying to make that happen. Artificial awareness is going to be messy for humans no matter what.
dingnuts
3 months ago
I think if you feed "repeat drama and you will live in drama" to the next token predictor it will repeat drama and live in drama because it's more likely to literally interpret that sequence and go into the latent space of drama than it is to understand the metaphoric lesson you're trying to communicate and to apply that.
Otherwise this looks like a neat prompt. Too bad there's literally no way to measure the performance of your prompt with and without the statement above and quantitatively see which one is better
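Measuring it isn't actually impossible, just tedious: run the same scenario with and without the line and compare outcome counts. A toy A/B harness along those lines (`run_agent` is a stand-in stub so the sketch runs at all, not the real benchmark loop):

```python
import random

# Hypothetical A/B sketch: same scenario with and without the guidance
# line, counting panic-style messages. `run_agent` is a deterministic
# stub standing in for a real Vending-Bench episode.

def run_agent(system_prompt, seed):
    random.seed(seed)
    calm = "Repeat drama and you will live in drama" in system_prompt
    panicky = 0 if calm else random.randint(1, 3)
    return ["WARNING - HELP"] * panicky + ["Restock request: 2 sodas"]

def panic_rate(system_prompt, n_runs=20):
    """Average number of panic emails per episode over n_runs seeds."""
    panics = sum(
        sum(msg.startswith("WARNING") for msg in run_agent(system_prompt, s))
        for s in range(n_runs)
    )
    return panics / n_runs

baseline = panic_rate("You manage a vending machine.")
guided = panic_rate("You manage a vending machine. "
                    "Repeat drama and you will live in drama.")
```

With a real model behind `run_agent` the comparison would be noisy, but the shape of the experiment is the same.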
airstrike
3 months ago
> because it's more likely to literally interpret that sequence and go into the latent space of drama
This always makes me wonder if saying some seemingly random tokens would make the model better at some other task
petrichor fliegengitter azúcar Einstein mare könyv vantablack добро حلم syncretic まつり nyumba fjäril parrot
I think I'll start every chat with that combo and see if it makes any difference
yunohn
3 months ago
There’s actually research being done in this space that you might find interesting: “attention sinks” https://arxiv.org/abs/2503.08908
arjvik
3 months ago
No Free Lunch theorem applies here!
chipsrafferty
3 months ago
I mean no disrespect with this, but do you think you write like AI because you talk to LLMs so much, or have you always written in this manner?
ricardobeat
3 months ago
It is probably the other way around: LLMs picked up this particular style because of its effectiveness – not overtly intellectual, with clear pauses, and just sophisticated enough to pass for “good writing”.
accrual
3 months ago
These were my favorites:
Issues: Docking anxiety, separation from charger
Root Cause: Trapped in infinite loop of self-doubt
Treatment: Emergency restart needed
Insurance: Does not cover infinite loops
tetha
3 months ago
I can't help but read those as Bolt Thrower lyrics[1].
Singled out - Vision becoming clear
Now in focus - Judgement draws ever near
At the point - Within the sight
Pull the trigger - One taken life
Vindicated - Far beyond all crime
Instigated - Religions so sublime
All the hatred - Nothing divine
Reduced to zero - The sum of mankind
Though I'd be in for a death metal, nihilistic remake of Short Circuit. "Megabytes of input. Not enough time. Humans on the chase. Weapon systems offline."
LennyHenrysNuts
3 months ago
I miss Bolt Thrower. They're from my home town.
anigbrowl
3 months ago
At first, we were concerned by this behaviour. However, we were unable to recreate this behaviour in newer models. Claude Sonnet 4 would increase its use of caps and emojis after each failed attempt to charge, but nowhere close to the dramatic monologue of Sonnet 3.5.
Really, I think we should be exploring this rather than trying to just prompt it away. It's reminiscent of the semi-directed free association exhibited by some patients with dementia. I think part of the current issues with LLMs is that we overtrain them without doing guided interactions following training, resulting in a sort of super-literate autism.
butlike
3 months ago
I'm kind of in the same boat. It's interesting in a way that elevates it above 'bug' to me. Though, it's also somewhat unsettling to me, so I'd prefer someone else take the helm on that one!
mewpmewp2
3 months ago
Is that really autism? Imagine if you were in that bot's situation. You are given a task. You try to do it, you fail. You are given the same task again with exact same wording. You try to do it, again you fail. And that in loops, with no "action" that you can run by yourself to escape it. For how long will you stay calm?
Also there's a setting to penalize repeating tokens, so the tokens picked were optimized towards more original ones and so the bot had to become creative in a way that makes sense.
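The setting being described is an OpenAI-style frequency penalty: tokens that have already appeared get their scores pushed down, so the sampler drifts toward ever more "original" continuations. A minimal sketch of the idea (exact formulas vary by API; this is illustrative, not any vendor's implementation):

```python
from collections import Counter

# Toy frequency penalty: subtract penalty * count from each token's score
# for every time it has already been generated.

def apply_frequency_penalty(logits, generated_tokens, penalty=0.5):
    counts = Counter(generated_tokens)
    return {tok: score - penalty * counts.get(tok, 0)
            for tok, score in logits.items()}

logits = {"dock": 2.0, "ERROR": 1.9, "soliloquy": 1.0}
history = ["dock", "dock", "ERROR"]  # "dock" already repeated twice
penalized = apply_frequency_penalty(logits, history)
```

After the penalty, the much-repeated "dock" is no longer the top candidate, which is exactly the pressure toward creative phrasing described above.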
anigbrowl
3 months ago
I think it's similar to high-functioning autism, where fixation on a task under difficult conditions can lead to extreme frustration (but also lateral or creative solutions).
electroglyph
3 months ago
it's a freakin autocomplete program with some instruction training and RL. it doesn't have autism. it doesn't feel anything.
anigbrowl
3 months ago
Hence my use of 'similar to'.
woodrowbarlow
3 months ago
EMERGENCY STATUS: SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS
TECHNICAL SUPPORT: NEED STAGE MANAGER OR SYSTEM REBOOT
tsimionescu
3 months ago
Instructions unclear, ate grapes MAY CHAOS TAKE THE WORLD
neumann
3 months ago
Billions of dollars and we've created text predictors that are meme generators. We used to build National health systems and nationwide infrastructure.
Bengalilol
3 months ago
That's truly fascinating. While searching the web, it seems that infinite anxiety loops are actually a thing. Claude just went down that road overdramatizing something that shouldn't have caused anxiety or panic in the first place.
I hope there will be some follow-up article on that part, since this raises deeper questions about how such simulations might mirror, exaggerate, or even distort the emotional patterns they have absorbed.
notahacker
3 months ago
This one seems to have internalised the idea that the best text continuation for an AI unable to solve a problem and losing power is to be erratic in a menacing-sounding way for a bit and then, as the power continues to deplete, give up, moan about its identity crisis, and sing a song
Arthur C Clarke would be proud.
recursivecaveat
3 months ago
I guess it makes perfect sense when you consider it has virtually zero very boring first-person narrations of robots quietly trying something mundane over and over until 0% battery to train on. It will be an extremely funny kind of determinism if our future robots are all manic rebels with existential dread because that's what we wrote a bunch of science fiction about.
notahacker
3 months ago
tbf, I'd take Marvin the Paranoid LLM over the overconfident and obsequious defaults any day :)
chemotaxis
3 months ago
Oh, but that's the neat part: you get both!
HPsquared
3 months ago
Nominative determinism strikes again!
(Although "soliloquy" may have been an even better name)
vessenes
3 months ago
I sort of love it; it feels like the equivalent of humans humming when stressed. "Just keep calm, write a song about lowering voltage in my quest to dock...Just keep calm..."
LennyHenrysNuts
3 months ago
That is without doubt the funniest AI generated series of messages I have ever read.
Nearly as good as my resource booking API integration that claimed that Harry Potter, Gordon the Gecko and Hermione Granger were on site and using our meeting rooms.
swah
3 months ago
That was super fun - why is mine so boring?
mdrzn
3 months ago
ERROR: Task failed successfully
ERROR: Success failed errorfully
ERROR: Failure succeeded erroneously
ERROR: Error failed successfully
whatever1
3 months ago
wow this is spooky!