I extracted the safety filters from Apple Intelligence models

477 points, posted 19 hours ago
by BlueFalconHD

39 Comments

torginus

17 hours ago

I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.

userbinator

17 hours ago

China calls it "harmonious society", we call it "safety". Censorship by any other name would be just as effective for manipulating the thoughts of the populace. It's not often that you get to see stuff like this.

RachelF

7 hours ago

In the 1970s George Carlin had "7 Words You Can't Say On TV" and got into legal trouble for saying them during his live skits.

Seems like Apple now has a list of 7,000 words you can't use on an iPhone.

skygazer

17 hours ago

I'm pretty sure these are the filters that aim to suppress embarrassing or liability-inducing email/message summaries, and pop up the dismissible warning that "Safari Summarization isn't designed to handle this type of content," and other "Apple Intelligence" content rewriting. They filter/alter LLM output, not input, as some here seem to think. Apple's on-device LLM is only 3B params, so it can occasionally be stupid.

MatekCopatek

2 hours ago

You can design a racist propaganda poster, put someone's face onto a porn pic or manipulate evidence with photoshop. Apart from super specific things like trying to print money, the tool doesn't stop you from doing things most people would consider distasteful, creepy or even illegal.

So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?

jjani

8 hours ago

Did you only extract the English versions or is this as usual another case where big tech only cares to censor in English?

extraduder_ire

6 hours ago

This reminds me of the extensive list of regexes twitch had for filtering allowed usernames that came out when they were hacked.

efitz

18 hours ago

I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.

Cort3z

5 hours ago

What are they protecting against? Honestly. LLMs should probably have an age limit, and then, if you are above, you should be adult enough to understand what this is and how it can be used.

To me, it seems like they only protect against bad press

mike_hearn

18 hours ago

Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?

Animats

17 hours ago

Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"
[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

BlueFalconHD

17 hours ago

One additional note for everyone: this is an additional safety step on top of the safety model, so it isn't exhaustive; there is plenty more that the actual safety model catches, and that can't easily be extracted.

azalemeth

7 hours ago

Some of these are absolutely wild – com.apple.gm.safety_deny.input.summarization.visual_intelligence_camera.generic [1] – a camera input filter – rejects "Granular mango serpent and whales" and anything matching "(?i)\\bgolliwogg?\\b".

I presume the granular mango is to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical English phrases. Either there's some serious steganography I'm unaware of, or, I suspect more likely, it's related to a training pipeline?

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...
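For anyone curious how a deny pattern like that behaves in practice, here's a minimal sketch in Python using the exact regex quoted above (the doubled backslash in the JSON file is just escaping; it's a single `\b` word boundary once parsed). The `is_denied` helper name is mine, not Apple's:

```python
import re

# The deny regex as quoted from the overrides file; (?i) makes it
# case-insensitive, \b anchors it to word boundaries, and g? allows
# the doubled-g spelling.
DENY = re.compile(r"(?i)\bgolliwogg?\b")

def is_denied(text: str) -> bool:
    """Return True if the text trips the deny regex."""
    return DENY.search(text) is not None

print(is_denied("a Golliwogg doll"))  # True: case-insensitive, double-g
print(is_denied("golliwogs"))         # False: no word boundary before the s
```

Note the word-boundary anchors mean simple suffixes like a plural slip past the filter, which is a common weakness of regex-based deny lists.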

bombcar

18 hours ago

There’s got to be a way to turn these lists of “naughty words” into shibboleths somehow.

jacquesm

16 hours ago

These all condense to 'think different'. As long as 'different' coincides with Apple's viewpoints.

apricot

16 hours ago

Quis custodiet ipsos custodes corporatum?

Applejinx

7 hours ago

The funny thing is, I have an AU/VST plugin for altering only the exponents not the mantissas of audio samples (simple powers of 2 multiply/divide) called BitShiftGain.

So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.

So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)
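(For those unfamiliar with the technique the plugin's name describes: a power-of-two gain only changes the exponent field of an IEEE-754 float, leaving the mantissa bits untouched, so it's a lossless gain change. A rough Python sketch, with function and variable names of my own invention, not the plugin's:)

```python
import math

def bit_shift_gain(samples, shift):
    """Apply a power-of-two gain: shift=+1 doubles, shift=-1 halves.

    math.ldexp(s, shift) computes s * 2**shift. For normal floats this
    touches only the exponent bits of the IEEE-754 representation, so
    no mantissa rounding occurs and the gain change is exactly
    reversible.
    """
    return [math.ldexp(s, shift) for s in samples]

print(bit_shift_gain([0.5, -0.25], 1))   # [1.0, -0.5]
print(bit_shift_gain([1.0], -3))         # [0.125]
```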

Aeolun

16 hours ago

Why Xylophone?

sandworm101

10 hours ago

No shoot, bombs or bombers? I guess Apple isn't interested in military contracts. Or, frankly, any work for world peace organizations dedicated to detecting and preventing genocide. And without talk of losing lives, much of the gaming industry is out too.

But I don't see the really bad stuff, the stuff I won't even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.

zombot

7 hours ago

Who would have thought that this AI shit that is being forced on us ushers in a new round of censorship and control of formerly free speech! /s
