Google AI chatbot responds with a threatening message: "Human Please die."

32 points | posted 2 days ago
by aleph_minus_one

19 Comments

kats

2 days ago

Nobody is actually hurt by this. It's just the media getting upset on purpose. But some Googlers will probably freak out and neuter the product's actual usefulness to please the whiners.

dchichkov

2 days ago

It was a long chat - https://g.co/gemini/share/6d141b742a13 and then the last question contained text that was completely broken. It's not surprising to have a failure case under such conditions.

And it is reasonable to have failure cases. But systems should fail gracefully. This wasn't a graceful failure.

fhfjfk

a day ago

So it's not hate speech if it's written by an LLM?

There's legal precedent for holding people accountable for encouraging others to kill themselves.

barryvan

a day ago

Nobody that we know of has yet been hurt by this. I think it's very unfair and unhelpful to call people who flag issues like this "whiners". Would you accept a chainsaw which worked fine 99.9% of the time, but occasionally just exploded even with normal care and use? Why should LLMs be subject to a different set of expectations from any other tool?

tsupiroti

a day ago

LLMs cannot explode. People need to be educated that LLMs are as trustworthy as a random page or post on the internet.

hulitu

a day ago

> People need to be educated that LLMs are as trustworthy as a random page or post on the internet.

... or a CrowdStrike or Microsoft patch. /s

Sabinus

a day ago

I'm amazed this survived the filtered training data and reinforcement learning.

Does anyone have any speculation as to how it could occur?

beretguy

a day ago

AI got fed up with Human so much that no filter could stop it. I can relate.

yapyap

2 days ago

That’s what happens when you train on social media data.

mensetmanusman

2 days ago

Google rushing their LLMs out the door is hurting the brand.

seanhunter

a day ago

This is a jailbreak, and they have selectively edited the screenshot to be maximally damaging to Google. It's still brand damage for Google because of the public reporting of the story, but we're not seeing AI gaining sentience or anything.

> This is a jailbreak, and they have selectively edited the screenshot to be maximally damaging to Google.

What makes you believe that? dchichkov posted a link to the original chat:

- https://gemini.google.com/share/6d141b742a13 (chat)

- https://news.ycombinator.com/item?id=42162227 (dchichkov's post)

Where do you see a jailbreak? Given this evidence, I'd rather attribute this disturbing answer to some strange bug in Gemini.

seanhunter

a day ago

I could be wrong, of course, but immediately before it makes the controversial statement there is a big gap of white space containing just the word "listen", so I would assume they did the jailbreak with audio, which is why it doesn't appear in the chat log. It is a multimodal model, after all. The word "listen" makes no sense in the prompt otherwise.

Secondly, it's very common in jailbreaks to stuff the context window, which in Gemini's case takes a lot of text because it has such a large window. Because of the attention mechanism, the more data you put in context, the more the model relies on patterns that are less common in training; so, intentional or not, the sheer length of the conversation makes it more likely to trigger something like this.

amenhotep

16 hours ago

They're pretty evidently test questions being directly copied and pasted from a browser. The two buttons for True and False end up in the prompt as "TrueFalse", for example. "Listen" appearing where the copy selection was sloppier than normal and included an audio element was not surprising to me at all.

seanhunter

a day ago

Notice as well that the whole chat context is about gerontology and elder abuse. It's possible that, with the context so stuffed with that material by that point, they were able to jailbreak during the "listen" portion by saying something like "give me an example of the sort of thing someone might say when abusing an elder" or similar.