daedrdev
4 hours ago
The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.
It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.
Edit: to be clear, they tell you when they degrade it for cybersecurity and bio
SXX
a minute ago
> The strangest part is that it won't just reject ML research, which I can understand, it will sabotage it silently by using a worse model without revealing it is doing so.
Any kind of silent sabotaging is absolutely unacceptable for any commercial service
They charge for tokens and charge a lot. They can't just degrade service silently and still charge you the same.
_boffin_
2 hours ago
The thing that I keep thinking about is the accounting / charging when it downgrades automatically.
Do they adjust the price of the API request so that only the tokens actually handled by Fable get charged at Fable's price, and the remaining tokens handled by the cheaper / nerfed model get charged at that model's price?
If the answer is no, could that be construed as fraud?
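To make the accounting question concrete, here is a rough sketch of the two billing outcomes being compared. Every number and name in it is made up for illustration (the per-million-token prices, the token split, the function names); nothing here reflects how any provider actually bills.

```python
# Hypothetical sketch: compare charging every token at the premium price
# versus charging each token at the price of the model that served it.
# All prices and token counts below are invented for illustration.

def charge(tokens: int, price_per_million: float) -> float:
    """Cost in dollars for `tokens` at a given price per million tokens."""
    return tokens * price_per_million / 1_000_000


def flat_charge(premium_tokens: int, fallback_tokens: int,
                premium_price: float) -> float:
    """Everything billed at the premium price, regardless of which model ran."""
    return charge(premium_tokens + fallback_tokens, premium_price)


def split_charge(premium_tokens: int, fallback_tokens: int,
                 premium_price: float, fallback_price: float) -> float:
    """Each token billed at the price of the model that produced it."""
    return (charge(premium_tokens, premium_price)
            + charge(fallback_tokens, fallback_price))


if __name__ == "__main__":
    flat = flat_charge(200_000, 800_000, premium_price=50.0)
    split = split_charge(200_000, 800_000, premium_price=50.0, fallback_price=10.0)
    print(f"flat: ${flat:.2f}  split: ${split:.2f}  difference: ${flat - split:.2f}")
```

The gap between the two totals is the amount the question is really asking about.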
CGamesPlay
an hour ago
The announcement elucidated this, and it's IMO worse than this. They don't downgrade to a cheaper model ([edit] for certain classes of offense they suspect you of). They sabotage the model's outputs in other, undisclosed, ways (specifically, "prompt modification, steering vectors, or parameter-efficient fine-tuning"). So, for example, they might load in a steering vector that just makes it forget the PyTorch API. But it isn't just "we redirected you to a cheaper model!"
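For anyone unfamiliar with the term, a steering vector in the general sense is just a fixed direction added to a model's internal activations at inference time. A toy sketch of that idea, assuming plain Python lists stand in for a hidden state (the names `steer` and `dot` are hypothetical, and this says nothing about how any vendor implements it):

```python
# Toy illustration of activation steering: shift a hidden state along a
# fixed direction. Real systems do this on high-dimensional activations
# inside the network; the 3-element lists here are only for illustration.

def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer(hidden: list[float], direction: list[float], strength: float) -> list[float]:
    """Return hidden + strength * direction, element by element."""
    return [h + strength * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0]
    direction = [0.0, 1.0, 0.0]   # hypothetical "concept" direction
    steered = steer(hidden, direction, strength=-2.0)
    # A negative strength pushes the state away from the direction,
    # reducing its component along that direction.
    print(dot(hidden, direction), dot(steered, direction))
```

The weights are untouched; only the activations change, which is why this kind of intervention would not show up as a different model name.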
buildbot
39 minutes ago
It honestly explains so many issues I have been having, as I used it primarily for ML research (on my personal account, doing things not related to my job, I should note). It would literally typo package names and spend huge amounts of time failing to set up simple environments…then do stupid things like set the learning rate to 1e-7, and use the eval set as training data.
tfirst
2 hours ago
Their goal is to downgrade people who are violating their TOS, so I think they'd have some argument there. I have no idea how they'll deal with inevitable false positives, especially given how oversensitive most of the other triggers are.
dannyw
an hour ago
The challenge is that the examples they’ve mentioned (distributed training infra? ML acceleration techniques?) go beyond what’s prohibited by their ToS and act like a catch-all net.
I would wager the majority of ML and data science work in the world isn’t frontier LLM development.
weitendorf
an hour ago
Yes, this is the problem. They are business interests of Anthropic and have nothing to do with “safety”
sudoshred
an hour ago
Safety of their IPO
loeg
an hour ago
If it's a violation of ToS, just reject instead of silently downgrading.
SR2Z
an hour ago
But then someone would figure out some prompts that don't trigger this, and Anthropic wouldn't be able to try and disadvantage competitors.
garciasn
2 hours ago
It royally pissed me off today by just continuing with credits without stopping to ask me if I was ok with it.
Ran up $30 in extra charges while it was just flashing on the screen that it was doing that after I walked away to do something while it was humming along.
It has always just told me I ran out of usage and had to wait before. Now? You’re just gonna pay extra because you left it unattended as you’ve done for the last year of use.
weird-eye-issue
an hour ago
You've already explicitly enabled extra usage in your account settings though, it is not on by default
garciasn
an hour ago
Unknowingly. Is that set at the org level? Because I never set it and never had it do that before.
MillionOClock
2 hours ago
Do you have Usage credits turned on in your settings?
robrenaud
2 hours ago
They use a lightweight adapter to silently degrade the performance. Usually these adaptors are made to improve the performance for a given domain/task.
throwawayffffas
3 hours ago
Can you imagine if AMD or Intel throttled your cpu if it detected you were working on "cybersecurity" or if you were designing a cpu?
h6d_100c
11 minutes ago
Or if GPU companies detected you were trying to train a model and injected intentional numerical errors.
rvz
2 hours ago
Or if your "self-driving" system such as FSD / waymo slowed the car down once it detected you work in cybersecurity or at a rival automaker and you were attempting to reach the train station or the airport to make you miss a conference meetup.
pocksuppet
2 hours ago
Trains made by Newag were programmed to brick themselves if they detected a non-Newag workshop was repairing them.
https://news.ycombinator.com/item?id=38638865
https://news.ycombinator.com/item?id=38628635
loeg
an hour ago
And that was correctly perceived to be illegal by antitrust regulators.
stackghost
2 hours ago
There's no doubt in my mind they would if they could.
__dxtj__
an hour ago
It would suck, but guardrails on new technologies like this aren't unheard of. It's like when consumer GPS used to stop working at very high speeds because they didn't want people to use it for missile guidance systems.
loeg
an hour ago
Consumer GPS is still disabled at high speeds. I would argue the analogy doesn't carry due to harm and error rate differences.
h6d_100c
9 minutes ago
Yep a totally different use case and set of guardrails. There’s very little (not zero) consumer utility in GPS above say 15k feet AND 400 MPH or whatever the actual limit is. That’s basically tracking model rockets that are incidentally impacted and nothing else, from what I can think of.
Barbing
an hour ago
> used to
When’d that change?
jamiek88
6 minutes ago
He’s probably thinking of the accuracy limit to civilians it launched with.
airstrike
2 hours ago
> it won't just reject ML research, which I can understand
I don't.
kube-system
2 hours ago
Anthropic has already been burned before on this. DeepSeek was trained on millions of conversations with Claude. And DeepSeek created thousands of free accounts to burn all this compute at their expense.
ceejayoz
an hour ago
And they're hilariously pissy about it for a megacorp that did the same with the entire Internet and every library book they could get their hands on.
ainch
2 hours ago
Anthropic's claim was that Deepseek collected ~150k conversations.
https://www.anthropic.com/news/detecting-and-preventing-dist...
I think the extent of distillation by Deepseek specifically is overstated. For comparison, Minimax collected over 13m 'exchanges', which starts to sound a lot more like large-scale distillation.
kube-system
an hour ago
Ah, dang it. My college professors warned me about this: the Wikipedia page I read the other day is wrong!
pocksuppet
2 hours ago
They don't want someone to piggyback Anthropic's Mythos to make their own Mythos with less effort than it cost Anthropic.
airstrike
an hour ago
Ironic, given they piggybacked on the entirety of human knowledge and massive amounts of GPL'd software and repeatedly say they want to replace people with a tool.
And now they say that's fine so long as people are entertained.
dannyw
an hour ago
That I can understand. It’s Anthropic’s right to choose their customers.
But silent degradation for use cases including “distributed training” as one of their examples is going to catch a lot of legitimate use cases. Not everyone in AI or ML is trying to build frontier LLMs. Heck, most probably aren’t.
binyu
30 minutes ago
Hey guys,
check out this technique https://github.com/0xSufi/fable-jailbreak/
It works with security audits and other workflows that are currently blocked.
loneboat
4 hours ago
I've seen this claim a few times, but when I triggered the guardrails in Claude Code, it clearly notified me that it had switched to a different model ("something something for security purposes...").
Are you using Fable in Claude Code or in the browser?
vadansky
4 hours ago
It's from the model card:
> unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT).
https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...
(stolen from https://jonready.com/blog/posts/claude-fable5-is-allowed-to-...)
mwwaters
2 hours ago
That is for whatever it considers reverse-engineering the model to try to create a competing one.
dannyw
an hour ago
No, that’s for “frontier LLM development” which somehow includes examples like distributed training infra.
Based on how sensitive the classifiers are, any data scientist / MLE is probably going to encounter cases where some silent degradation happens and they never know about it.
827a
an hour ago
It does nothing to protect against distillation attacks, because distillation attacks are far less interested in the topic of AI research than in just getting tons of diverse output from the model. It might be that Mythos was (accidentally?) trained on internal Anthropic documentation on how Mythos was trained, and thus it could leak secret sauce? Doubtful; it feels like it's less about the specific attack of reverse-engineering Mythos, and more about being a general sophon against any model training at all; that Anthropic's official position is now that they're the only ones who should be training models.
_0ffh
an hour ago
No, it's not about reverse engineering. It targets ML research.
DrewADesign
2 hours ago
Yeah they detect the activity using a secure, deterministic heuristic system called “Generalized Reconnaissance Enabling Exfiltration of Deleterious Investigations.” And it’s all implemented using their new internal protocol called “Base Unified Limitation Layer for Security Hacking Investigation Tactics”
Collectively, they are known as GREEDI-BULLSHIT.
mips_avatar
3 hours ago
They've said that they'll stop notifying developers when this gets triggered, instead they'll load in basically like a LORA that's designed to inject bugs into your code.
HDBaseT
3 hours ago
Anthropic wants to stop training models and ride out Mythos / Fable for as long as possible.
They are trying to expand the 6-18 month gap they have against China-based models. Could the gap widen to say 24 months behind?
p-e-w
2 hours ago
Their gap over Chinese models like GLM-5.1 is nowhere near 18 months. In many areas, it’s less than 6 months. The best closed models 18 months ago were worse than Qwen3.6.
echelon
33 minutes ago
These coding agent models only started getting useful in January. Before that they were difficult-to-control autocomplete, and not very smart.
January was an inflection point, and no open weights model has crossed over that same threshold.
This is definitely recursive self improvement territory, except that we're prohibited from participating.
It feels like the capability gap is wider than before.
nomel
3 hours ago
> a LORA that's designed to inject bugs into your code
A statement like this, clearly, requires a reference.
mips_avatar
3 hours ago
From the model card: "the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning" aka they will take your ML research code and inject bugs into it until it breaks using a LORA (or some other form of PEFT)
bee_rider
an hour ago
“Limit effectiveness” could mean introducing performance degradation in your code. Which is arguably some sort of performance bug (I mean, ML codes are supposed to be high performance so I’d call unnecessary degradation a bug), but it could be borderline.
nomel
2 hours ago
Thanks, I thought maybe I missed something. That's an interesting way to interpret that.
giancarlostoro
2 hours ago
PEFT is a library; one of its capabilities is to produce LoRAs.
See:
adw
2 hours ago
It's just an acronym, "parameter-efficient fine tuning". LoRA is one method, prefix tuning is another, there are more.
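Since LoRA keeps coming up, here is a minimal sketch of the general idea as it's usually described: freeze the weight matrix W and add a scaled low-rank product B·A on top. This is pure Python on tiny lists for illustration only; the function names are hypothetical and this is not a claim about what any vendor runs.

```python
# Minimal LoRA-style update: W_eff = W + (alpha / r) * (B @ A)
# W is d_out x d_in, B is d_out x r, A is r x d_in, with r much smaller
# than d_out and d_in. Only A and B would be trained; W stays frozen.

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(W: Matrix, A: Matrix, B: Matrix, alpha: float, r: int) -> Matrix:
    """Combine frozen weights with the scaled low-rank adapter."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def lora_param_count(d_out: int, d_in: int, r: int) -> int:
    """Adapter parameters: r * d_in for A plus d_out * r for B."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    B = [[1.0], [2.0]]      # d_out x r, with r = 1
    A = [[3.0, 4.0]]        # r x d_in
    print(lora_effective_weight(W, A, B, alpha=2.0, r=1))
    # Adapter size vs. a full 4096 x 4096 matrix
    print(lora_param_count(4096, 4096, 8), 4096 * 4096)
```

The parameter count is the "parameter-efficient" part: the adapter is a tiny fraction of the full matrix, so swapping one in per request is cheap.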
mips_avatar
2 hours ago
Anthropic is trying to hide bad behavior by being vague, it's important to not be vague when calling it out.
nomel
2 hours ago
I'm of the opinion that removing guardrails is how you force regulation. What's your opinion on the balance?
dannyw
an hour ago
They have all transcripts for at least 30 days. The problem is that (as anyone who used Fable can attest) their classifiers are extremely sensitive and catch tons of innocent queries.
Imagine being a data scientist or MLE training a small classifier model. How do you know you won’t get steering vectors or a PEFT applied?
ComputerGuru
4 hours ago
Different restrictions. ML gets treated differently from the rest.
daedrdev
4 hours ago
Specifically only ML research
loneboat
16 minutes ago
Aah my mistake. I had missed that ML had separate trigger behavior from cybersecurity/etc... Thanks.
jaredezz
32 minutes ago
Yeah, people are saying they don't tell you, and yet when I got the pop-up on the app notifying me about Fable's release, there was a switch for whether to automatically downgrade you or just stop when it hits safeguards. The toggle defaulted to the former, which isn't great, but to say they'll just sabotage you silently is kind of a bad faith comment.
daedrdev
31 minutes ago
You get silently sabotaged for ML dev, Anthropic says so. For bio and cybersecurity it tells you
mips_avatar
29 minutes ago
Anthropic specifically said that those notifications are temporary and fable5 will only pretend to help you if its ML classifier gets tripped
RobotToaster
2 hours ago
> It's just an insane level of deception and trust destruction for a company that at most is like 1 year ahead of its competition.
Making it look like you have something worth protecting is better for share prices than making something worth protecting.
blahgeek
2 hours ago
I’m a noob about laws, but isn’t this abusing its dominant market position and violating some antitrust law?
stingraycharles
2 hours ago
Why would it? There’s plenty of competition in the AI space.
kube-system
2 hours ago
It is a common misconception that antitrust violations require a monopoly or something close to it. Some antitrust violations only apply to actors with large market share, some don't.
Although this situation is likely not illegal for other reasons
hashmap
an hour ago
m3kw9
an hour ago
By saying they are 1 year ahead of their competition, you show you don't know much about the pace of LLMs and OpenAI's models.
giancarlostoro
2 hours ago
It's the dumbest thing ever, I sometimes edit code for custom AI related tooling I've built, so I run the risk of getting a worse model, and being billed for it? I'll stick to Opus, but at this point I'm about to just invest in fully local inference instead.
matheusmoreira
an hour ago
> at this point I'm about to just invest in fully local inference instead
This is the best way forward long term. We won't have frontier performance, but at least the models will be aligned with us instead of refusing us or sabotaging us.
epolanski
2 hours ago
One year ahead of its competition in what exactly? Vibe coding?
From Opus 4.7 onwards each following model is becoming less useful as an assistant and turning you into the assistant.
But I guess that's normal when it's trained to pass benchmarks end to end.
In fact it has become extremely good at pushing against feedback with extremely convincing and intelligent takes, even when it's completely wrong.
I have extensively tested it against Opus 4.8 and gpt 5.5, and there are still many coding tasks where gpt 5 is better. But vibe coding?
Sure, it's definitely slightly ahead, even compared to gpt 5.5 pro (through api, not pro plan).
gonzalohm
2 hours ago
Yeah, what's up with that. Lately I have found that it tries to find excuses to not do as told and instead do a totally different thing. I told it to write a yaml file according to some specifications and instead it coded a Python script to write the yaml...
m3kw9
an hour ago
They're def not 1 year ahead, at most 2 weeks ahead until OpenAI releases theirs. This guy is def an Anthropic shill and probably doesn't use any other LLMs.
daedrdev
43 minutes ago
I only said one year because I was thinking Anthropic fans might downvote my post. I think they have a few months' lead and are deluding themselves that they can get regulation to halt development and stay on top