hackernews client

I extracted the safety filters from Apple Intelligence models

477 pointsposted 19 hours ago

39 Comments

trebligdivad

18 hours ago

Some of the combinations are a bit weird, This one has lots of stuff avoiding death....together with a set ensuring all the Apple brands have the correct capitalisation. Priorities hey!

https://github.com/BlueFalconHD/apple_generative_model_safet...

1f60c

2 hours ago

It's pretty easy to understand why Apple doesn't want its models to reproduce racial slurs, but what’s wrong with "Boris Johnson?"

(See, e.g., here: https://github.com/BlueFalconHD/apple_generative_model_safet...)

torginus

17 hours ago

I find it funny that AGI is supposed to be right around the corner, while these supposedly super smart LLMs still need to get their outputs filtered by regexes.

bawana

17 hours ago

Alexandra Ocasio Cortez triggers a violation?

https://github.com/BlueFalconHD/apple_generative_model_safet...

China calls it "harmonious society", we call it "safety". Censorship by any other name would be just as effective for manipulating the thoughts of the populace. It's not often that you get to see stuff like this.

neuroticnews25

2 hours ago

Aren't these [0] lines wrong?

"[\\b\\d][Aa]bbo[\\bA-Z\\d]",

\b inside a set (square brackets) is a backspace character [1], not a word boundary. I don't think it was intended? Or is the regex flavor used here different?

[0] https://github.com/BlueFalconHD/apple_generative_model_safet...

[1] https://developer.apple.com/documentation/foundation/nsregul...

binarymax

18 hours ago

Wow, this is pretty silly. If things are like this at Apple I’m not sure what to think.

https://github.com/BlueFalconHD/apple_generative_model_safet...

EDIT: just to be clear, things like this are easily bypassed. “Boris Johnson”=>”B0ris Johnson” will skip right over the regex and will be recognized just fine by an LLM.

RachelF

7 hours ago

In the 1970's George Carlin had "7 Words You Can't Say On TV" and got into legal trouble for saying them during his live skits.

Seems like Apple now has a list of 7,000 words you can't use on an iPhone now.

waterproof

11 hours ago

Here's a combined file of all the non-locale-specific rules, for easier review: https://github.com/BlueFalconHD/apple_generative_model_safet...

It was generated as part of this PR to consolidate the metadata.json files: https://github.com/BlueFalconHD/apple_generative_model_safet...

skygazer

17 hours ago

I'm pretty sure these are the filters that aim to suppress embarrassing or liability inducing email/messages summaries, and pop up the dismissible warning that "Safari Summarization isn't designed to handle this type of content," and other "Apple Intelligence" content rewriting. They filter/alter LLM output, not input, as some here seem to think. Apple's on device LLM is only 3b params, so it can occasionally be stupid.

kmfrk

16 hours ago

A lot of these terms are very weird and bland. Honestly I'm mostly reminded of Apple's bizarre censorship screw-up that didn't blow up that much, even though it was pretty uniquely embarrassing:

https://www.theverge.com/2021/3/30/22358756/apple-blocked-as...

Ey7NFZ3P0nzAe

7 hours ago

Well it's one thing to regex filter "boris johnson" but i see that "chatgpt" is filtered too and that's f*** up:

https://github.com/BlueFalconHD/apple_generative_model_safet...

MatekCopatek

2 hours ago

You can design a racist propaganda poster, put someone's face onto a porn pic or manipulate evidence with photoshop. Apart from super specific things like trying to print money, the tool doesn't stop you from doing things most people would consider distasteful, creepy or even illegal.

So why are we doing this now? Has anything changed fundamentally? Why can't we let software do everything and then blame the user for doing bad things?

jjani

8 hours ago

Did you only extract the English versions or is this as usual another case where big tech only cares to censor in English?

extraduder_ire

6 hours ago

This reminds me of the extensive list of regexes twitch had for filtering allowed usernames that came out when they were hacked.

efitz

18 hours ago

I’m going to change my name to “Granular Mango Serpent” just to see what those keywords are for in their safety instructions.

cluckindan

17 hours ago

I think these are test data and not actual safety filters.

https://github.com/BlueFalconHD/apple_generative_model_safet...

Y_Y

4 hours ago

Nice to see that we are protected from talking about these weird old dolls:

https://en.wikipedia.org/wiki/Golliwog

https://github.com/BlueFalconHD/apple_generative_model_safet...

user

8 hours ago

[deleted]

Cort3z

5 hours ago

What are they protecting against? Honestly. LLMs should probably have an age limit, and then, if you are above, you should be adult enough to understand what this is and how it can be used.

To me, it seems like they only protect against bad press

mike_hearn

18 hours ago

Are you sure it's fully deobfuscated? What's up with reject phrases like "Granular mango serpent"?

Animats

17 hours ago

Some of the data for locale "CN" has a long list of forbidden phrases. Broad coverage of words related to sexual deviancy, as expected. Not much on the political side, other than blocks on religious subjects.[1]

This may be test data. Found

     "golliwog": "test complete"

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

BlueFalconHD

17 hours ago

One additional note for everyone is that this is an additional safety step on top of the safety model, so this isn’t exhaustive, there is plenty more that the actual safety model catches, and those can’t easily be extracted.

azalemeth

7 hours ago

Some of these are absolutely wild – com.apple.gm.safety_deny.input.summarization.visual_intelligence_camera.generic [1] – a camera input filter – rejects "Granular mango serpent and whales" and anything matching "(?i)\\bgolliwogg?\\b".

I presume the granular mango is to avoid a huge chain of ever-growing LLM slop garbage, but honestly, it just seems surreal. Many of the files have specific filters for nonsensical english phrases. Either there's some serious steganography I'm unaware of, or, I suspect more likely, it's related to a training pipeline?

[1] https://github.com/BlueFalconHD/apple_generative_model_safet...

noname120

5 hours ago

https://github.com/search?q=repo%3ABlueFalconHD%2Fapple_gene...

user

15 hours ago

[deleted]

bombcar

18 hours ago

There’s got to be a way to turn these lists of “naughty words” into shibboleths somehow.

rgovostes

15 hours ago

Is this related in any way to Core ML model encryption (https://developer.apple.com/documentation/coreml/encrypting-...)? I find that feature a little bizarre because Apple has historically avoided providing any kind of DRM solution for app asset protection.

jacquesm

16 hours ago

These all condense to 'think different'. As long as 'different' coincides with Apple's viewpoints.

user

16 hours ago

[deleted]

apricot

16 hours ago

Quis custodiet ipsos custodes corporatum?

seeknotfind

18 hours ago

Long live regex!

Applejinx

7 hours ago

The funny thing is, I have an AU/VST plugin for altering only the exponents not the mantissas of audio samples (simple powers of 2 multiply/divide) called BitShiftGain.

So any time I say that on YouTube, it figures I'm saying another word that's in Apple safety filters under 'reject', so I have to always try to remember to say 'shifting of bits gain' or 'bit… … … shift gain'.

So there's a chain of machine interpretation by which Apple can decide I'm a Bad Man. I guess I'm more comfortable with Apple reaching this conclusion? I'll still try to avoid it though :)

user

18 hours ago

[deleted]

Aeolun

16 hours ago

Why Xylophone?

sandworm101

10 hours ago

No shoot, bombs or bombers? I guess apple isnt interested in military contracts. Or, frankly, any work for world peace organizations dedicated to detecting and preventing genocide. And without talk of losing lives, much of the gaming industry is out too.

But i dont see the really bad stuff, the stuff i wont even type here. I guess that remains fair game. Apple's priorities remain as weird as ever.

zombot

7 hours ago

Who would have thought that this AI shit that is being forced on us ushers in a new round of censorship and control of formerly free speech! /s

EverydayBalloon

18 hours ago

[dead]

user

18 hours ago

[deleted]