hackernews client

Claude Fable 5

1169 points, posted 3 hours ago

by Philpax

(anthropic.com)

854 Comments

eggbrain

3 hours ago

For those of us on subscription plans:

* From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

* On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

* After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

jrflo

3 hours ago

Still satisfied with my switch to codex/chatgpt. I couldn't imagine switching away from claude code when it first launched, but with the drastically more generous usage on codex for the same subscription tier, I just can't justify it.

goranmoomin

2 hours ago

My experience is that the GPT family of models is very smart and figures out bugs and edge cases a bit better, but it produces code that is much less mergeable: if you review the code, it introduces a lot more useless or inappropriately heavy abstractions and wrapper functions than the Claude-family models, which introduce the right amount of straightforward, human-style code.

I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).

Additionally, each agent turn on GPT 5.5 takes much longer than on Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there are a lot more nitpicks to address when actually using GPT 5.5 to do software engineering.

Feels like GPT-style models are more geared toward one-shot software vibing (and handling the vibe-coded mixture), compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching for Claude models a lot more. Frustrating.

PhilipDaineko

an hour ago

"5. DON'T FUCKING OVERENGINEER! WRITE THE SIMPLEST CODE THAT CAN POSSIBLY WORK! NO NESTED LAYERS OF ABSTRACTION! NO UNNECESSARY CLASSES OR METHODS! NO DESIGN PATTERNS UNLESS THEY ARE ABSOLUTELY NECESSARY! NO MAGIC! NO SHENANIGANS! JUST THE DAMN CODE THAT GETS THE JOB DONE IN THE MOST STRAIGHTFORWARD WAY POSSIBLE! THE FIRST PRIORITY IS TO WRITE CODE THAT IS EASY TO READ AND UNDERSTAND AND READ!!!"

this is the line I keep in Agents.md that helps me prevent Codex from playing smart

jlawer

29 minutes ago

I have a theory that swearing actually results in less comprehension of instructions by the model, due to the lack of training data compared with the more conventional MUST.

We were reviewing reports of situations where the models failed to follow directions, and a common thread in some of them was that when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.

I don’t have the data to truly look into it, but I did instruct my engineers to avoid it as a “might be a problem”.

Xmd5a

a few seconds ago

https://arxiv.org/abs/2510.04950

> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.

re-thc

19 minutes ago

> I have a theory that swearing actually results in less comprehension of instructions by the model, due to the lack of training data compared with the more conventional MUST.

How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.

bertil

43 minutes ago

The urge to put capitalized, repetitive, borderline abusive instructions should be studied. I haven't read many academic papers looking at the frustrations around repetitive patterns.

notnaut

21 minutes ago

It reminds me of FIRMLY telling my cat to stop jumping up on the counter

reactordev

30 minutes ago

There have been a few studies that have shown models produce worse responses when under duress from a frustrated user posting insults in all caps.

https://arxiv.org/abs/2602.10144

ur-whale

26 minutes ago

> borderline abusive instructions

who, or rather what, is being abused here exactly?

LordDragonfang

29 minutes ago

It's fundamentally because, despite (nearly) everyone's claims otherwise, the fact that we interact with them through language means we (our brains) model them as a sort of person. (Note that this fact is totally orthogonal as to whether it's actually sentient or not.) We then try and instruct them the same way we would a person totally subordinate to us.

When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.

Compare it to how the kind of people who treat children like property treat their kids, or other examples of keeping people as property.

lxgr

22 minutes ago

It should be relatively clear at this point that the model will in turn also model you as somebody that shows unrestrained anger with subordinates and adapt its responses accordingly. This might or might not be what you want.

carterschonwald

33 minutes ago

i actually think this is too tame. it really has to be stuff you'd never say to a real person.

lxgr

24 minutes ago

Does it really? I'd be surprised if abuse actually worked better than sternly worded warnings/instructions, and even if it did, it doesn't seem healthy to get used to that type of prompting.

apercu

34 minutes ago

It might be a salient point, but I didn't read it because it was yelling at me.

GoToRO

40 minutes ago

you forgot to sign it with Donald J Trump

thewebguyd

32 minutes ago

Thank you for your attention to this matter.

trollbridge

10 minutes ago

GPT-5.5 did a significantly worse job than Qwen-3.7-Max on a job today (some devops tasks I wanted to create some reusable scripts for). Kind of disappointing.

superkickstart

an hour ago

I'm not sure if I do something differently, but I have the exact opposite experience with these models. Claude always feels like it's generating overdesigned, hard-to-understand code with a vibe-oriented feel, while codex is cleaner, more "task at hand", and easier to work with.

sebmellen

15 minutes ago

Agreed

syzygyhack

36 minutes ago

I echo your observations. I expect you will enjoy deepseek-v4-pro for writing code. Much closer to that Opus experience, and very cost-effective too. With 5.5 as a reviewer and specialist, all bases are covered.

dilap

an hour ago

Have you tried iterating on style feedback in AGENTS.md? I've been reasonably successful using this to get it to output code in a terse, non-defensive style that matches my hand-written code.

vruiz

an hour ago

This is my experience as well. I have defined a CLAUDE.md rule to ask codex to automatically code review, and I tell it that the reviewer is very picky and to only implement what it considers valuable feedback. I hope they don't converge over time; currently, in combination, they work really well.

GoToRO

37 minutes ago

I noticed too, that whatever they offer in the chat, for free, is smarter, as in no more bs. I use claude code and I want to try codex too but I don't need two subscriptions. I did try codex for some planning and it was really good. Thanks for giving me an insight into how it generates code.

sigbottle

2 hours ago

jrflo

a minute ago

Oh for sure. I've been hopping around from provider to provider for the last few years just depending on who has the most capable / subsidized plans at the moment. I definitely expect there will be a squeeze on subscription costs all around the industry post IPO.

windexh8er

2 hours ago

Agreed. I think the Chinese labs are proving that OpenAI and Anthropic don't have a moat in almost every aspect, especially pricing. I also think people are getting annoyed with the constant lift and shift. I've seen more folks drop Claude Code and Codex, specifically, because of the lock-in it provides the providers. I'm curious to see how people standardize on tooling adjacent and if Anthropic, Google or OAI move to block utilization akin to the games Anthropic has been playing as of late.

I think the end game is routed model usage and SLMs. I think Apple is going to prove this in the consumer space pretty handily and I'm curious how the Android ecosystem responds since the hardware is considerably lacking in model performance. I think Apple has a huge opportunity here, as much as I don't like their current ecosystem of walled garden. They did position themselves very well with ARM and custom chips for their hardware. Hopefully the broader ecosystem of ARM and Linux are able to make some headway and we see a more formalized, and broadly accepted, architecture to capitalize on.

lurking_swe

13 minutes ago

is there an alternative to codex that “just works”? by just works i mean i can install as an app in 1 minute, and i get web search, skills, mcp servers, etc? Bonus points if it can control my chrome tabs like codex can, and if it offers remote control from my iPhone (chatgpt app) so i can kick off tasks while i’m out for a walk. Even more bonus points if i can, with 1 button click, share my chats or share the results of a session as a “site” (vercel style).

I’m sure you could put something similar together with a bunch of duct tape and 2 weeks of effort, but it won’t work nearly as nicely nor out of the box. so…what am i missing?

esperent

27 minutes ago

What lock-in does codex have? I'm using it in the pi harness specifically because it doesn't have much in the way of lock-in.

Qhemlomo

43 minutes ago

Big companies are not doing OpenRouter.

My company has an agreement with the big providers, and while I'm pretty sure they think about how to get budget back, it's a competitive advantage, and normal people will not learn different model behaviours.

At least for now.

maxdo

2 hours ago

I see exactly the opposite. Chinese models fail under any complex scenario, while US labs raise the price, and that's a sign of confidence.

re-thc

an hour ago

> while US labs raise the price, that's a sign of confidence

Regardless of what others are doing, US labs here are just rushing to IPO. It's NOT a sign of confidence.

It's the equivalent of saying you have confidence in SpaceX making revenue by renting out their data center (instead of their AI making bank).

maxdo

31 minutes ago

Going to IPO is a sign of confidence: you need to report a lot of things that private companies don't. This is the exact reason Chinese labs do not rush to go public. They wish to, but their money flow is not as good.

On the same note, if SpaceX is doing datacenters on Earth successfully, what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

maxdo

17 minutes ago

Running that much compute at scale is not a junior task. Weird analogy.

re-thc

24 minutes ago

> if spacex is doing datacenters on earth successfully what's wrong with that? They rented cloud infra to a #2 or #3 provider in the world after < 2 years in business. It's a success, no?

If you get hired as a staff engineer and do the work of a junior, what's wrong with that?

Clearly xAI (now part of SpaceX) did not raise funds to be a data center. The margins are way different. There are plenty of recent IPOs in that area that are worth billions at most, not trillions.

> going to IPO is a sign of confidence, you need to report a lot of things, that private companies don't.

This isn't going to IPO. This is rushing to IPO. It is a sign of confidence that the market or wider environment might crash soon so we need the liquidity now.

> This is an exact reason chinese labs do not rush to go public.

an hour ago

It would seem strange if they were operating in the same economy, but they aren't. DeepSeek operates in an economy with a high degree of central planning.

China subsidizes strategic industries, and they have heavily done so with AI. And DeepSeek specifically has said they have no commercialization plans.

For example: https://www.boc.cn/aboutboc/bi1/202501/t20250123_25254674.ht...

re-thc

15 minutes ago

> There is no way I'm believing DeepSeek can charge less

Why not? Hetzner charges WAY less than AWS too. Can you not believe that?

InsideOutSanta

2 hours ago

> I don't think anyone has a firm grasp on actual inference costs -- including the research and training that has gone into those models

We know roughly how much these companies spend and what their revenues are. Based on that, they'd have to more than double revenue (without spending more money) just to stay even, and that's not good enough given how deep in the hole they are.

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both are true. I mean, I'd be willing to spend a bit more than I do now, but not more than double, and neither are most companies. The company I work for is currently investigating how to reduce LLM spend, not looking to spend more.

dontlikeyoueith

2 hours ago

> OpenAI and Anthropic are heavily subsidizing their inference -- no wait, they are charging the most they can get away with before going public. Where is the truth?

Both. They are charging the most they can get away with and that amount is still heavily subsidized by VC capital.

pimeys

2 hours ago

rglullis

11 minutes ago

It's a bit of a left field question, but I am curious: Let's say that if the company wasn't paying the whole bill but only subsidizing it - e.g, if it paid 90% of the $4000. What would you do?

logicchains

2 hours ago

We have a firm grasp on actual inference costs from the various open weights model providers on OpenRouter. They don't have the money to subsidize inference and it's quite a competitive market, so the prices are representative of the costs.

pyeri

an hour ago

My bet is they'll keep subsidizing for a considerable period of time, at least 1-2 decades more.

Most AI companies are just testing the waters with paid tiers right now. Their greatest fear with increased pricing is folks reverting to Wikipedia, Stack Overflow, and other public resources, with organic activity there buzzing back to life; that would kill any RoI potential in LLMs forever. They're playing the waiting game instead, observing how the digital sphere reacts to every little increase in price.

If that weren't the case, they'd already be pricing at lucrative premiums, and they'd even get away with it in the short term given the increased dependency in the enterprise world. But that would be like killing the goose for the golden egg too soon and losing all long-term potential.

Once the folks are so addicted to LLMs that even writing a hello world program sounds like a nightmare and coming up with an article draft feels like reinventing Egyptian glyphs, that's when the real pricing hammer will come.

wsatb

an hour ago

Anthropic and OpenAI won't be around in 1-2 decades if this is their long term plan. People are not going to revert, but go elsewhere. China is proving that it can be done cheaper.

raffael_de

26 minutes ago

1 decade = 10 years ...

ChrisMarshallNY

2 hours ago

ygjb

2 hours ago

There is a definite financial incentive for people smarter than me to solve the problem, and I don't generally bet against businesses finding ways to reduce costs :)

wahnfrieden

2 hours ago

ChatGPT does this and codex will eventually. They’ve stated it’s the future.

rnxrx

2 hours ago

I have the $100 plan and had almost never run out of credits until I started using the ultracode / workstreams feature with Opus 4.8, at which point I managed to blow the full 6 hour allocation in about 20 minutes. In fairness, it did some amazing things with the extracted information, but it also strongly suggested that I'd need the $200 subscription *plus* a budget for extra usage.

an hour ago

I can see that being true, and it very likely is true. But isn't infinite VC money and no incentives to optimize operations the reason behind that?

Take a look at China for example - they have no access to NVIDIA, so they're trying to build their own hardware; they don't have unlimited funding, so they try to optimize things.

And Anthropic is the complete opposite of that - if NVIDIA were to triple their prices tomorrow, Anthropic would still pay them.

In the end, either we all somehow go mad and start paying Anthropic tens of thousands of dollars per month to support this madness, or we will go with whoever isn't lighting cash on fire.

re-thc

an hour ago

> Take a look at China for example - they have no access to NVIDIA

Not true. Stop following US media spam if needed.

1. Very recently, the US did close a loophole on sanctions that allowed Chinese companies to use NVIDIA hardware outside of China i.e. before that was closed they all had access. The trick was train outside, do adjustments, ship the disks back and use non-NVIDIA in China, but at least the training and endpoints not hosted in China could all use NVIDIA.

2. There have been plenty of reports, including fines and bans, e.g. to Supermicro, on smuggling NVIDIA hardware to China. I doubt it has been stopped. You can't catch everyone.

wsatb

an hour ago

"Nothing is subsidized" is a wild take. They might be making money on some users, perhaps even most users, but certainly not all. Also, "subsidized" doesn't just mean on compute.

y1n0

2 hours ago

That's interesting. Do you have anything to back that claim up?

gck1

2 hours ago

I do, and it's called DeepSeek's pricing table. At the same time, the "subscriptions are subsidized" cohort has no data whatsoever, and yet they're in every thread.

Granted, it could still mean that Anthropic just chooses to lose money - but that's Anthropic's choice.

DeepSeek has proven that inference can be much, much cheaper than what Anthropic advertises on their API rates page.

nickthegreek

an hour ago

> Granted, it could still mean that Anthropic just chooses to lose money -

Then the cost is being subsidized by investor capital, but it is still subsidized.

FrustratedMonky

an hour ago

"Nothing is subsidized"

So they are profitable?

I think you are mismatching accounting terms.

You can't say the 'subscriptions' are profitable without accounting for the cost of making the model that is the source of the subscription.

They are heavily subsidized by the shareholders. Investing, running at a loss, with hope of some future profitability.

gck1

an hour ago

3 hours ago

I have trouble justifying gpt after that gross stuff with the war department.

Though the day is coming when there’s no distinguishing, I’m sure.

beering

2 hours ago

Right now there are Anthropic engineers deployed in the NSA to help them use their cyber models. The NSA is part of the department of war.

lovich

2 hours ago

pedantically, the defense department.

jcbrand

an hour ago

"War department" is the older name, not "Defense department".

Also, is it really a defense department when you're starting wars of aggression every 15 years or so?

derektank

an hour ago

The War Department has not existed since the passage of the National Security Act of 1947 and the government department has been known as the Department of Defense under US law since the act was amended in 1949. If you have an issue with it, take it up with Congress.

scosman

26 minutes ago

They actively use the name https://www.war.gov

toraway

11 minutes ago

Changing a domain name doesn't actually amend federal law.

Just like how changing Kennedy Center letterhead to Trump Kennedy Center for a year didn't actually legally rename it.

Once a case with sufficient standing got in front of a judge it reverted to the actual legal name on the basis that only Congress can change the statutorily defined name.

rekttrader

an hour ago

Wait till you kick the tires of Qwen Coder.

shimman

2 hours ago

I've only ever had the $20/month claude plan, but last night I took the time to set up opencode + openrouter, paying for deepseek + glm. In my previous experience, while extremely awkward, I'd hit my limit within one or two chat replies and it'd take me like 4 limit cycles to complete my task. Now I'm able to complete an entire equivalent task for less than $2 in two cycles (ask -> revise).

2 hours ago

That’s odd, I used it on a pretty complex refactoring task and it worked for 22 mins and used only 15% of my 5-hour limit. I’m on the $200 Max plan though.

FireBeyond

10 minutes ago

Well the $200 Max plan is 4x the usage quotas of the $100 so it's "within reason"?

ZunarJ5

an hour ago

They didn't even reset credits for this lol

0erofootprint

3 hours ago

For me it almost immediately blocked. I had it writing code related to message digests - and it seemed to think it was too gifted for that. Gave the security warning and switched back to 4.8. Whatever... it will probably have the API error soon. I have mostly switched to the Codex 200 a month plan. I've found their 5.5 xhigh to be better than Opus 4.8 "ultracode." Also, I have not once seen their servers fail for compute unavailability, unlike Anthropic, where it happens almost every hour.

matheusmoreira

an hour ago

I just asked Fable for a complete code review of my lone lisp project. Started out strong. Launched Fable agents, then spent like 10 minutes thinking... And then got interrupted by a switch to Opus 4.8.

> Fable 5's safety measures flagged this message for cybersecurity or biology topics.

> They may flag safe, normal content as well.

> These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them.

Here are the results of the agentic code review session:

  ┌──────────────────────────┬───────────────┬────────────────┐
  │          Agent           │ Fable 5 turns │ Opus 4.8 turns │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ values                   │ 134           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ data-intrinsics          │ 104           │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ tools-tests-build        │ 81            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ core-intrinsics (failed) │ 25            │ 0              │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ system-memory            │ 44            │ 20             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ reader-modules           │ 104           │ 25             │
  ├──────────────────────────┼───────────────┼────────────────┤
  │ linux-startup            │ 95            │ 15             │
  └──────────────────────────┴───────────────┴────────────────┘
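For a rough sense of how the session split between the two models, the turn counts in the table can be totaled. A minimal Python sketch, with the numbers copied from the table above; the variable names are illustrative, and "share of turns" is just one assumed way to summarize the table, not a figure the tool reports:

```python
# Turn counts copied from the table above: agent -> (Fable 5 turns, Opus 4.8 turns).
turns = {
    "values": (134, 0),
    "data-intrinsics": (104, 0),
    "tools-tests-build": (81, 0),
    "core-intrinsics (failed)": (25, 0),
    "system-memory": (44, 20),
    "reader-modules": (104, 25),
    "linux-startup": (95, 15),
}

# Total turns per model across all agents.
fable_total = sum(fable for fable, _ in turns.values())
opus_total = sum(opus for _, opus in turns.values())

# Fraction of all turns that ran on Fable 5.
fable_share = fable_total / (fable_total + opus_total)

# Agents that recorded at least one Opus 4.8 turn.
switched = [name for name, (_, opus) in turns.items() if opus > 0]

print(f"Fable 5: {fable_total} turns, Opus 4.8: {opus_total} turns")
print(f"Fable 5 share of all turns: {fable_share:.1%}")
print("Agents with Opus 4.8 turns:", ", ".join(switched))
```

By this count, 587 of the 647 turns (about 91%) ran on Fable 5, and three of the seven agents recorded Opus 4.8 turns.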

The announcement details it. They're storing 30 days of data on all surfaces, first and third party. They claim it is for security purposes so they can review and check for long term jailbreak and distillation efforts.

They also, FWIW, say that they've instituted new policies on their end such as logging any human access to the stored data and automated deletion after 30 days in "most" cases (with another link to a document detailing that further).

jrumbut

44 minutes ago

It could be my use cases, which have always seemed to be outside the wheelhouse of these models, but I find it very hard to downgrade after accessing a more capable model.

Opus 4.8 produces output in 15 minutes that is 3-4 hours of my work away from output that used to take me 40ish hours (a solid week of dedicated effort).

Last year(-ish, maybe it was 18 months, I forget when the jump happened), the frontier models couldn't touch this work. The output looked like a hardworking intern on their first day. Nice formatting, decent volume of words, but no understanding.

So it might work if it turns out to be a substantial leap in capability.

GoToRO

Any competent AI-doomer would argue that ethics or trust are essentially irrelevant.

The entire problem is that people can act totally reasonably, even ethically, and this is not a guarantee of good outcomes. Situations can be created in which completely ethical, reasonable behavior actually produces a bad outcome. You do not need to assume people are bad in order to produce a bad outcome, and inversely you cannot assume that you will get a good outcome from good people.

"Arms races" are one class of situations that often have this characteristic. "Bureaucracy" is another class that we encounter a lot in daily life. There's a lot of them!

throwaway894345

3 hours ago

Yeah, it's positively precious to think the specific pricing strategy for consumers is the overriding ethical concern with OpenAI, etc. I don't have any particularly strong affinity to any AI company, but comparing pricing to say mass surveillance is ... something else.

kyledrake

3 hours ago

Your beautiful straw man is negated by the fact that Anthropic seems quite eager to get back on the DoD gravy train https://www.reuters.com/business/aerospace-defense/blacklist...

jnovek

an hour ago

Your original comment was about pricing ethics, does Anthropic’s connection to the DoD have anything to do with pricing ethics? They’re in no way coupled, one can be ethical while the other is not.

andriy_koval

an hour ago

Even for the Pentagon thing, Dario said he doesn't object to military AI, but said Claude is not ready YET. I speculate he was afraid of reputational damage in case Claude guided missiles onto elementary schools.

throwaway894345

38 minutes ago

I admire the confidence with which you started typing a reply that had nothing to do with my comment. Bravo!

ygjb

3 hours ago

Setting aside the simple fact that there is no ethical consumption under capitalism, the reality is that regardless of how Anthropic feels, it is becoming clear that many, if not all countries regard AI developments as strategic technologies (and they should).

Anthropic needs to be at least somewhat in the good graces of a capricious administration that is already under pressure from businesses and citizens to regulate AI companies across multiple different domains, whether it's energy consumption, job displacement, military and defense applications, surveillance, etc.

If Anthropic wants to survive, they need to acquire influence with the government that most impacts them as an American company, and a massive exporter of services in the AI space to other countries, otherwise they could get locked down and locked out of the market for national security reasons.

It sucks, but sometimes the survival choice is to make an ethical compromise in hopes that you can still be around to make better decisions later.

ericmay

2 hours ago

> Setting aside the simple fact that there is no ethical consumption under capitalism

This "simple" fact needs quite a bit of additional context and work. Making grandiose ethical claims like this can be countered with other grandiose claims such as the fact that there is no ethical existence under communism or socialism.

ygjb

2 hours ago

Sure. Why not, I'm bored today and waiting for some stuff to finish up :D

The fact that there is no ethical consumption under capitalism is not material to whether or not ethical existence is possible under communism or socialism. In order to survive in a capitalist society, one inherently has to make choices that require trade-offs, and those trade-offs are burdened by a history of decisions made not just by the people alive today, but our ancestors as well. Does that mean I walk around chanting "Reparations", "Land-back", or other calls to action? No, but I do acknowledge that there are unresolved issues, and as a Canadian, I know we need to do more to resolve treaty issues, environmental issues, and systemic discrimination. I also know that Americans need to do better to address systemic discrimination and many, many other issues. It also doesn't mean I want to give back my house, or give away all of my possessions. It just means I try to make good choices and support businesses and people that are open about the trade-offs they make and try to engage as ethically as possible.

Acknowledging those facts doesn't absolve us of responsibility, it's a framework that allows folks concerned about whether or not they are doing the right thing to accept the trade-offs that they choose to make and be responsible and accountable for those choices to themselves or their communities.

We live in a world with scarce resources. It's possible that with a foundational redesign of the global economy, and the requisite authoritarian government that would be required to force such a redesign, we could eliminate food scarcity, solve energy scarcity, and make sure that everyone has a place to live. Those trade-offs are probably not worth the ethical cost in political and physical violence required to accomplish it. We have seen the trade-offs that happen when the powerful are able to exploit communist or socialist governments. We are seeing the "late stage capitalism" impacts of allowing the powerful to exploit capitalism in democratic societies. Acknowledging that the current capitalist system has led to the greatest prosperity for the upper echelon (financially) of humanity, and a dramatic reduction in global poverty, shouldn't obscure the reality that much of that wealth comes from exploitation of people and the environment.

It's a huge problem to unwind, and we can't let the burden of every choice that we make stop us from trying to do better, but we (as in society in general) can't do better if we don't at least acknowledge the compromises we are making along the way, and try to plan to fix it in the future.

Probably a topic better suited to beer and a pub setting than HN though :P

ericmay

21 minutes ago

> The fact that there is no ethical consumption under capitalism

I don't believe that this is a fact. How are you demonstrating that this is a fact?

When you talk about things like reparations or "land back" you're already cargo-culting in concepts and ideas that themselves need to be fleshed out in order to make a subsequent claim that a specific economic system is unethical. Someone can just argue all economic systems are unethical, how are you going to defend against that? And can you pay reparations for example without going back in all of human history and finding all cases of injustices and then tallying it up? Why pick an arbitrary point in time? Better yet, why not start in countries where slavery still exists instead of focusing on the west which led the world in abolishing slavery and created concepts such as universal human rights.

Even with respect to "eliminating food scarcity" - eliminate in what sense? All olive groves and grapevines and rice farms have to be destroyed and rebuilt to only grow certain foods?

Dabbling in communism or other inhumane and authoritarian governmental systems is extremely dangerous. In the same vein as "extraordinary claims require extraordinary evidence": suggesting, as you did, creating an authoritarian government to create a utopia is precisely the same project of suffering and death that mass murderers throughout history have undertaken to abject failure, and thus you need some incredible amount of evidence and theory to be able to even fairly suggest going down this path.

cleaning

an hour ago

It only needs additional context and work if you are unfamiliar with the concepts underlying it. Possibly consider you are out of your depth here, rather than jumping to conclusions.

ericmay

32 minutes ago

No that's incorrect. Instead I believe the underlying concepts are debatable and so stating it as a "simple fact" is a bit unfair.

estearum

3 hours ago

16 minutes ago

Kimi 2.6 has been my workhorse now. It's as good as Opus 4.6, which, to me, was the last "useful" Claude model.

The newer models are smarter but really fickle and hard to get meaningful work out of

4.6 was a workhorse

sytelus

8 minutes ago

Enterprise subs are not allowed to use Fable if they have set up zero data retention :(

nickandbro

3 hours ago

Get them addicted then cut them off. Oldest trick in the book.

toomuchtodo

3 hours ago

I agree with you here. Unfortunately, this tends to be the case, with smaller developers paying the price.

sdellis

2 hours ago

That's a big problem for all of the AI companies. Most people don't find the technology compelling, accurate, or ethical enough to pay for a subscription.

Why wouldn't Anthropic just wait until people start subscribing, do some kind of marketing push, or obtain some kind of other sustainable revenue stream, before they go IPO? I wonder if they see the writing on the wall with all of this and want to cash out as quickly as possible?

AtlasBarfed

2 hours ago

That's not how it works. They don't need revenue, they need addicts.

Specifically they need businesses that fired people and adapted their business to the products, so when the unsubsidized costs hit the businesses are forced to eat the true costs.

Yes, they can't afford to give the products away for free, but what is essentially happening with AI services is economic dumping: keep costs artificially low to get people to fire everybody, and then jack up the rates once they have monopoly control.

sdellis

2 hours ago

But the only companies firing people (and certainly not everybody) are either the companies with an AI or the investment and finance firms that stand to profit from AI. I smell hype. And no company is firing everybody because of A.I.

I agree. They need addicts, but they are high on their own supply and everyone else can see the danger in getting hooked.

xpct

3 hours ago

I agree, this looks like their plan to phase out subscriptions. This will probably come with Opus nerfs later.

rapind

3 hours ago

I just assume Opus is constantly nerfed based on capacity. I was exclusively Claude for a long time, but the inconsistency in quality, constant outages, and slow downs were too hard to work with.

I just use dumb and fast models now. I'm more engaged. I think that the higher the quality of the model, the more you tend to vibe with it, and then the more hallucinations you miss. I'm not sure which is more productive, but I definitely burn out faster the more I vibe. At some point you're spending your time on forums, discord, or youtube instead of engaged with what you're building. Or you yak shave about your tooling and end up creating the 600th multi-agent gastown harness and blowing thousands of dollars on tokens to create it, only to discover it's too expensive to actually use.

losvedir

2 hours ago

Break between training runs?

bigtechennui

2 hours ago

It’s offered broadly after, for more money. It’s subsidized as marketing

timcobb

3 hours ago

Ooof so are we thinking that in the next 6-12 months subscriptions will be replaced with paying retail like enterprise currently?

CuriouslyC

3 hours ago

I don't think they'll phase out subscriptions ever, their whole play has been to drive demand from the bottom up. Get engineers hooked on building with claude at home, then get them to demand the ability to use it at work, and bend over their employer with no lube.

They'll probably tighten the quotas to rein in whales though.

aseipp

3 hours ago

They almost certainly already make a fuckload more money off API pricing than they do subscriptions, even if there might be more total subscription users. So offering subscriptions even at some loss is probably going to continue. Honestly, I'd be surprised if they even lost money on most subs; there are definitely Token Whales out there who mess all the accounting up, though.

Realistically I think Anthropic just has insane demand but finite capacity to run models, and Fable will just make them more money if they dedicate it to API pricing. I suspect the goal here is something like: get individual engineers/PMs on their personal plans to taste Fable and then go to their meetings and say "Yes doubling the price of every single input/output token is a good idea, boss".

gck1

29 minutes ago

But how is this sustainable? It's not like paying $5000 per feature means you'll be refunded if prompting "make no mistakes" didn't work.

2 hours ago

That makes sense, but what about the specific bifurcation we're seeing here of super primo models versus still good models being available to subscriptions?

It's kind of annoying not getting access to the primo model and paying 200 bucks a month. I understand 200 bucks a month is basically nothing though.

Like I don't totally understand why they'd let me have it for a couple weeks and then take it away and say I can have it but I have to pay retail and retail is like $1,000 a day.

It's better to have loved and lost than to have never loved at all??

ygjb

2 hours ago

It's a trade-off. Every hyperscaler is buying and building compute capacity as fast as they can dodge red tape. There is limited compute capacity, and scarcity is a real thing.

As a consumer I can choose to buy subscriptions to a range of things, including $5 droplets or VMs on a broad range of cloud hosting providers. I can even buy cheap bare metal at a bunch of providers at an affordable retail rate.

I can also buy "unlimited" AI packages that will be optimized to fit the cost model from a variety of services, with different impacts, such as rolling outages when I consume a daily or hourly allotment.

Right now VC and the investor class are subsidizing the rapid evolution of the services and availability, but that VC is running out. In more traditional economies, AI would have developed and rolled out more slowly, and through metered subscriptions, with the eventual rolling out of "unlimited" packages like telephone, internet, or cell services once the market became commoditized.

We have seen a big inversion of that with the race to "win" AI marketshare. Now the true cost is being exposed, and the most competitive and capable models are hideously expensive to operate, so it makes sense that we are moving to metered billing for a utility service. If you want gas, you can buy regular or premium. If you have a premium car you definitely want the premium, but for most people regular is good.

Give it a couple of years, and the survivors will settle around fairly industry standard models of consumer grade services, pro-sumer accounts, and business/enterprise models.

Things are still shaking out, but I get the sadness. Luckily I work at a big tech company who is banging the drum on doing experimentation so I use my prosumer claude pro and other accounts at home for hobby stuff, and save my heavy lifting and potentially experimentation for work :P

madrox

41 minutes ago

I suspect it'll go on the subscription plan once other providers have similar benchmarks.

As annoyed as I am about this move, I get it. Users flood the newest, best model whether they really need it or not, and are efficient at using their entire quota. They've had so much trouble reining in subscription usage that it makes sense.

irthomasthomas

2 hours ago

This is just the sales team doing their thing, applying the Law of Scarcity to drive demand.

It's the same exact speed as opus >=4.5, sonnet 4.5, and twice the speed of opus <=4.1

It must have about the same active parameters, or else it's a larger model running in turbo mode (smaller batches) and being heavily subsidized for some reason. But given most of the benchmarks are within 5% I doubt it is a much larger model. Most perplexing.

nicce

3 hours ago

> The "offer, then remove" aspect is a bit eyebrow-raising -- it feels like they are trying to get subscribers to switch to usage-based billing, which makes me wonder if we'll ever get it after that June 22nd window.

Probably all about the IPO.

dack

2 hours ago

i doubt that's the goal for them. i bet they just really don't have capacity for people using it a ton, yet they wanted people to be able to try it out while it's new. so they compromised and made it temporarily available. and then hope they can get costs down or capacity up so they can make it more available again

InsideOutSanta

2 hours ago

I think the goal is "private citizens: subscriptions; corporations: per-token billing." It's getting people addicted to LLMs on cheap subscriptions so that they can then force companies to pay for expensive inference.

irthomasthomas

an hour ago

"we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

altcognito

an hour ago

Where is this text coming from?

[edit] -- I see that this comes from the system card -- dang merged the comments from the other discussion so that explains the confusion.

matheusmoreira

2 hours ago

This is really sad... I really didn't want to be priced out of these models but it looks like that's going to happen sooner rather than later.

deepfriedbits

an hour ago

Thankfully this, like most other tech, will get cheaper through the years.

gck1

26 minutes ago

It already is. But marketing is hell of a drug.

Aleleo76

3 hours ago

Pay-as-you-go billing is a kind of drug, I use it every now and then when I'm working on a project with Opus, in a moment you spend a fortune

ABS

3 hours ago

also: Fable takes 2× the usage of Opus

oersted

Most gyms sell more subscriptions than they can fit under their roof at one time. If a gym only sells to heavy users, it will either be constantly turning members away or have to buy more equipment. Its equipment will wear out faster. Depending on amenities, it will go through towels, soap, water, et cetera faster, too.

tripleee

2 hours ago

Gym equipment lasts 10+ years in a commercial gym, at $50/mo that's a minimum of $6k paid from a single person.

Unless they're really, seriously wasteful with the soap.. there's no chance a gym is losing money on a heavy user
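
For anyone who wants to check the $6k figure, here is a quick sketch of the arithmetic. The $50/month fee and 10-year equipment lifespan are the numbers quoted above; the function name is made up for illustration, and this ignores every other cost a gym has.

```python
def lifetime_revenue(monthly_fee, years):
    """Total membership fees one member pays over the given number of years."""
    return monthly_fee * 12 * years

# Figures quoted in the comment above: $50/month over a 10-year equipment lifespan.
print(lifetime_revenue(50, 10))  # 6000
```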

rafram

2 hours ago

It depends on the gym and their business model! A super-budget gym like Planet Fitness that charges $15/month is going to lose money on heavy users, but they count on most of their members being infrequent gym-goers. A luxury gym like Equinox that charges $300/month can target heavy users without any issues, and they'd actually rather members go more so they stay and spend money on expensive salads and smoothies.

Right now all these AI subscriptions are priced like Planet Fitness, but they're used like Equinox. They're hoping that the new a la carte offerings will move their pricing more in that direction as well.

charcircuit

2 hours ago

>I’ve seen evidence they lose money on heavy users.

Where?

JumpCrisscross

an hour ago

There are tons of blog posts where folks work out the API cost of their usage and find it well above subscription cost.

otterley

an hour ago

That doesn't mean the company is losing money in aggregate on these subscriptions. Buffets are still in business even though some people gorge themselves silly at them. The incremental cost may exceed the incremental revenue for a particular person or minority group, but that's not how these businesses measure profitability.
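
A rough sketch of the buffet point, with entirely hypothetical numbers (nobody in this thread has the real per-user figures): one heavy user can cost more than they pay while the plan as a whole still comes out ahead.

```python
def aggregate_margin(users):
    """Sum of (price - cost to serve) across all users, as (price, cost) pairs."""
    return sum(price - cost for price, cost in users)

# Hypothetical monthly (subscription price, cost to serve) per user.
# The last entry is a heavy user who costs more than they pay.
users = [(100, 20), (100, 30), (100, 25), (100, 250)]

unprofitable = [u for u in users if u[1] > u[0]]

print(len(unprofitable))        # 1 user is served at a loss
print(aggregate_margin(users))  # 75, so the plan is still profitable overall
```

Whether the real numbers look anything like this is exactly what is being argued about; the sketch only shows that "some users cost more than they pay" and "the plan loses money" are different claims.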

cautiouscat

3 hours ago

I assume consumers aren’t a big note in their bottom line. I’m not actually very sure about that, just an assumption.

What I wonder however is if these tools will become something I use at work only. $100/month is already a massive stretch budget wise. If these models keep devouring tokens there’s no way I’d get the same usage time out of them for $100 in usage credits.

I just don’t think I’d use them much at all at home.

DonsDiscountGas

3 hours ago

I expect that depends on demand, feedback, and whether GPT-6.0 gets released and is competitive

lisperforlife

2 hours ago

My guess is that it is a massive model similar to GPT 4.5, and the $10/$50 pricing will discourage people from using it. I also read safety = nerfed.

daft_pink

2 hours ago

I’m just about ready to cancel my small business 5-user plan with Max licenses because, although Cowork is really great, I just find OpenAI/Codex to be a lot better most of the time.

dirkc

2 hours ago

systemvoltage

2 hours ago

It's interesting that we are seeing a time when subscriptions are not preferred and usage-based billing is.

Pay-as-you-go isn't a common thing in SaaS. For example, except for AWS SES, all email providers are bulk-subscription based.

nutjob2

2 hours ago

> "offer, then remove"

Sounds like "bait and wait".

If you think about it, the more people pay for these new and more resource hungry models, the longer it takes for them to become no extra cost and the longer it takes the more people are tempted to pay extra.

rvz

3 hours ago

> * On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

Of course, they are a casino as well giving you free spins at the wheel with their new Fable machine, and it is done on purpose.

Once these freebies have expired, many of its users will begin to gamble more on the new casino machine and will realize that it is expensive.

xvector

3 hours ago

If it's that big of a problem to you, you're free to just... not use the freebie?

rvz

a minute ago

Then you'd better not complain about how expensive it is to use (just like the other companies are doing), or the next time Claude goes down.

Anthropic does not care about us and isn't going to talk to you either and will extract from you as much as possible.

The true answer is local models.

cautiouscat

3 hours ago

It’s an interesting thing to bring up because it’s this classic thing we’ve seen for decades now.

The ramifications go beyond the individual which is why I assume they mentioned it. They don’t need to use it/not use it for it to have interesting implications.

xvector

3 hours ago

so it'd be preferable if they didn't include the model at all?

cautiouscat

3 hours ago

an hour ago

It seems like Fable will refuse to do any work when it comes to developing LLMs or even asking questions about topics related to LLMs. Simple things like asking it to explain a paper fail!

From the model card:

In light of the ability of recent models to accelerate their own development, we've implemented new interventions that limit Claude's effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user.

elastic-hoover

a minute ago

I wanted to try it on my biology research and it refused to talk about it and proxied to 4.8. Really, only surface-level conversations about topics of interest. I know this is not a topic of broad and mass interest, but limiting it for topics like that and machine learning will probably change how I use it.

Chance-Device

22 minutes ago

I was wondering when something like this would happen. I got my first and only two content violation warnings in Claude Code last week when asking it about something ML related. It was a real head scratcher because I couldn’t figure out what about the requests could have violated anything.

Might be worth going back and taking a harder look at what I was asking it about if it somehow triggered a “forbidden knowledge” alert. Or maybe it was just a random bug.

agnosticmantis

an hour ago

Singularity for me but not for thee.

foolfoolz

an hour ago

you will RENT the singularity

Xunjin

12 minutes ago

"we should put on hold the development of AI because the world is not ready for it"

Yeah... We need open models so we don't have that BS.

throwfaraway4

36 minutes ago

"for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design"

Oh man all of those runaway infrastructure buildouts by our agents trying to achieve singularity...

Just say you don't want to lower the bar for others to compete

properbrew

40 minutes ago

> frontier LLM development

This seems so wide reaching if it's catching simple things like explaining a paper. Does this also refuse to help with any already developed training pipelines?

I can kind of understand the generation of synthetic data, but nerfing the assistance of training pipelines just seems like a really shitty thing to do.

gpugreg

25 minutes ago

Anthropic probably trained Mythos on their own code and found that it is too good at reproducing it.

schipperai

an hour ago

Let's hope not all frontier AI assimilates these guardrails. It would be a shame for independent researchers and students.

blockcipher

36 minutes ago

Anthropic is really speedrunning their evil arc as fast as possible. Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology, but Anthropic is allowed to sell Claude to the Trump administration to kidnap Maduro and to bomb Iran. And don't get me started on that $100M autonomous killer drone swarm contract that they applied to and rationalized as non autonomous...

LordDragonfang

16 minutes ago

> Can't use them for basic LLM research, cybersecurity, or beyond-surface-level discussions of biology and virology

Your priorities are not everyone else's priorities. The people concerned about AI extinction risk list those as three of their biggest priorities for AI to not do. Those are the people whose culture Anthropic descends from, and by their measure, those exclusions make this the least evil path.

SkitterKherpi

44 minutes ago

It also tried to force usage of the paid Claude API instead of Claude Code usage just because there's a mention of another provider we might want to plug in (which hasn't even happened) for AI integration.

dchuk

28 minutes ago

Ha funny, I was speccing out an idea for real time Claude code interaction from local apps using some tricks vs using the agent sdk when I got the popup to try Fable. So of course I gave it a go, and it triggered the sensitive content warning immediately, which I was very confused by until I put two and two together.

Fun times when “safety” means both the safety of mankind, and also the safety of revenues

simonw

3 hours ago

Pelican for Fable 5 on default settings is a clear improvement on Opus 4.8

2 hours ago

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

simonw

3 hours ago

I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

raffael_de

22 minutes ago

I find it quite interesting that while the picture looks better the more advanced the model is, apparently none so far "understands" that the pelican's legs are on both sides of the bike / top bar.

LordDragonfang

6 minutes ago

If you scroll to the bottom of the Fable-5 by effort page, Max effort actually gets this correct! (Along with being the only one I've seen so far to make a bicycle frame that matches the shape of what most bikes on Google images look like)

bergheim

40 minutes ago

Anyone care about these pelicans that always come up anymore?

Clearly at this point they are part of the training data.

They even all look sort of ish the same. Daytime, colors,...

1attice

31 minutes ago

Without being mean, I encourage you to go look at some of simonw's writing on this topic, which he has addressed repeatedly (and IMO satisfactorily.)

I know because I too had this initial take; however, upon analysis, it is not sound.

bergheim

26 minutes ago

• My most noticeable immediate jump was in how its frontend design was much more intentionally crafted, and delightful without feeling like 'AI vibe coded'; with better end-user usability too.

• In some internal agentic harnesses, it achieved better results with about half the tokens, making it cost the ~same as Opus 4.8 price-wise! The real price increase is less than 2x; with biggest differences in harder problems where Opus 4.8 struggles (or needs many turns).
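
The "half the tokens, roughly the same cost" claim in the second bullet is just a product of two ratios. A minimal sketch, assuming per-token price is the only thing that differs between the two models and taking the 2x price ratio from elsewhere in this thread (the function name is made up for illustration):

```python
def relative_cost(token_ratio, price_ratio):
    """Cost of a run relative to the baseline model; 1.0 means equal cost."""
    return token_ratio * price_ratio

# Half the tokens at twice the per-token price comes out even.
print(relative_cost(0.5, 2.0))  # 1.0

# If a task needs the same number of tokens, the full 2x shows up.
print(relative_cost(1.0, 2.0))  # 2.0
```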

4 minutes ago

[delayed]

port11

28 minutes ago

I’ve had it go through a 50-page PDF of dense, inter-connected specs, and it correctly flagged everything that was done, somewhat done, and missing. It went into a lot of detail and explained where the code deviated from the spec.

It felt, at least for me, like an impressive step up. Opus 4.8 was already very thorough, but sadly verbose and ‘loopy’ when you push back on its plans. Fable is what I’d use all day if I could afford it!

InsideOutSanta

2 hours ago

After running it for half an hour: it's incredibly good at the visual aspects of UI design.

tsunamifury

an hour ago

"incredibly" is doing a ton of work here. I do not think it's doing even moderate work on visual design, but it can spew out a lot of UI that looks arranged ... ok.

This is still not in the range of shippable UI for top end companies. Maybe for internal tools and enterprise.

At our company we limit it to prototypes at most and even find it limited there.

InsideOutSanta

40 minutes ago

cuuupid

3 hours ago

Not missing the forest for the trees, this effectively means in 3-5 months China will drop open source models that are every bit as capable and dangerous as current day Mythos except with no safeguards.

And the only companies safe from this are the large corporations that shook hands with Anthropic? Because Fable doesn't seem to have actual safeguards, more like 'if you talk about this you will be talking to Opus.' It doesn't guard against offensive use, it prevents all use (offensive AND defensive).

Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF

hootz

3 hours ago

My bet is that Mythos is still over-hyped and the cybersecurity fear and guardrails are mostly marketing to force company partnerships through Glasswing and get public attention.

miohtama

2 hours ago

Mythos is from the same guy who did "GPT-2 is too dangerous to release"

https://naokishibuya.github.io/blog/2022-12-30-gpt-2-2019/

oceansky

2 hours ago

https://www.aisi.gov.uk/blog/our-evaluation-of-claude-mythos...

bel8

3 hours ago

It worked for OpenAI when GPT 3 was deemed too dangerous to be released. This is just a spin of that.

hootz

2 hours ago

I still remember it. "Open"AI going API-only because GPT-3 is really really dangerous, so forget the Open in our name and all of that, you can't download our models anymore and must request access to them because they pose a THREAT.

Fast forward to today and GPT-3 has laughable performance.

shoeb00m

2 hours ago

Even back then there were plenty of people who got fooled by AI generated articles. It's easier to spot AI writing now because we are so used to it. They were right to be concerned; not that it achieved much since oss models run laps around gpt-3 now.

hootz

2 hours ago

But it seems like that was not genuine concern, but instead a tactic to pivot to closed models and an API service with an excuse to do so, breaking the public's expectation that they would be a non-profit making open models, like their name implies.

geerlingguy

3 hours ago

3 hours ago

It's not even very usable... I tried 2 different chats and both eventually got stopped due to the safeguards

One was a piece of code I gave it to improve, it did so and then started writing tests, some of which tested security so the safeguards triggered

Another was one of the cryptography puzzles I use as new model tests, which are hard to oneshot and there's no public solution anywhere, it completely refused to even try to solve it

gavinray

sosodev

3 hours ago

Do you have any resources to share regarding independent expert training? I was under the impression that it's not feasible.

himata4113

2 hours ago

concept is similar to how it works in inference, instead of performing regressive writes to the entire model you run the whole model, but part of the model can live in system memory and get swapped in/out on demand. So only XB parameters are active in training.

edit: I am not really sure if it works like that. I haven't looked too deep into deepseek v4 pro specifically.

OtomotO

2 hours ago

Ah, American Hubris ... I don't blame you, Hollywood is the world's greatest propaganda machinery of all times.

sosodev

2 hours ago

I wonder if model distillation will continue to work as well as it has. Given hidden reasoning, the ever expanding number of expected capabilities, a serious compute shortage, the looming possibility of model collapse, and dramatically higher API costs I would guess that it's getting much harder to do.

gck1

an hour ago

You should check out some Chinese forums. There are services selling gateways/proxies for all major models at fraction of the official rates. Likely reselling subscriptions, or some other form of abuse.

I've seen people posting screenshots of billions of tokens consumed where they paid next to nothing.

These same gateways are likely also reselling the data to Chinese labs, because TLS has to terminate at the gateway level.

sourcecodeplz

2 hours ago

Asian labs generated synthetic datasets from US labs but also innovated with technology. Now it is harder to get the thinking traces AND Anthropic is reported to poison them as well.

Thus Asian labs will have to generate their own data sets, which with the huuuuge usage boom from deepseek, mimo, kimi, etc, they will be able to.

jstummbillig

2 hours ago

I wonder where the trees are. In this thread nobody appears to actually be talking about the model.

gck1

43 minutes ago

Yeah, because it's impossible. You can't ask it anything about the thing that it's known for. It will not even answer a sky-high level question about reverse engineering, for example.

In CC, it will probably report you to authorities if you ask it to do a vulnerability scan of your codebase.

dmantis

3 hours ago

Isn't that a good thing in a way? If everyone has the weapon and defense at the same time, we will fix security holes and live safer lives instead of having some three letter agencies and military backdoors in everything.

Pandora's box is open anyway. It's better now for everyone to have the same power rather than a few nation states.

lebovic

2 hours ago

Not sure this holds, sadly. I spent a few months reporting serious security bugs as model capabilities took off earlier this year, and only ~half were fixed. The unfixed bugs were just as critical as the fixed ones; sometimes they were even two similarly critical bugs at the same company, and only one would be fixed!

On your other point, the government still has systemic leverage and can compel access, so this doesn't remove that risk.

That doesn't mean this is the end of the world, and some balance of power is usually good. But I do think it will still increase the capabilities of rogue actors and their net harm.

gck1

an hour ago

There's also a reality where China does develop a Mythos-level model but stops releasing the weights.

That reality is much scarier.

FergusArgyll

3 hours ago

3-5 months is a long time and they are pretty useless on arrival because the frontier models are so good, that it's hard to go back even if it's way cheaper. Your work flow is adapted to that level of intelligence for months.

hootz

2 hours ago

That doesn't match my experience at all. I can't see myself saying in 6 months that the current model I am using is useless, that makes no sense.

In fact, I did go back to DeepSeek V4 Flash for most of my problems as it is way cheaper and there is no need to use SOTA for absolutely everything.

soledades

2 hours ago

> Rationalists are inventing oligopolies from first principles, absolutely incredible things happening in SF.

Based.

ibejoeb

2 hours ago

I don't think China has any incentive to arm the rest of the world with highly capable models that can be used against them. Undoubtedly they will continue with the arms race, but they will preserve the best stuff for their own use.

james2doyle

2 hours ago

I think the stronger incentive is undermining/undercutting the Western AI companies. Given what we have seen, any model can be used/convinced to do harm so that is just part of the game

ibejoeb

2 hours ago

I agree, depending on how much of this is marketing and how much is actual capability. It's one thing to undercut models that finish writing assignments for lazy students. If this actually identifies vulns and writes exploits, or if it designs bioweapons, those are pretty different. Those are actual weapons, and I don't think they're going to arm the adversary.

sigmar

3 hours ago

The system card is 319 pages, at what point do we call it a "book" instead of a "card"?

There's a quote from a METR report on page 52:

>We ran [Mythos 5] on 38 of our hardest software tasks, including tasks centered around R&D. [Mythos 5] generally outperformed an early checkpoint of Claude Mythos Preview in these, including by succeeding on some tasks that had not been solved by any public model we have previously evaluated. However, we still observed the model occasionally failing to correctly interpret nuanced instructions in difficult tasks... Based on the available evidence, we believe [Mythos 5] is likely unable to fully and reliably automate R&D for frontier projects spanning multiple weeks. We believe that a better, more confident assessment would require more time, evaluations, and information from the model developer.

baq

3 hours ago

3 hours ago

> Distillation. We’ve previously identified large-scale attempts to extract (“distill”) Claude’s capabilities to train competing models in authoritarian countries.

Glad to hear the UK is finally making an effort to catch up on the AI front ;)

b3kart

3 hours ago

https://en.wikipedia.org/wiki/The_Economist_Democracy_Index

Probably tongue-in-cheek, but UK 18th, US joint 34th with Poland

sd9

2 hours ago

Are the sibling comments astroturfed? This seems like such a bizarre thing to be talking about in relation to an Anthropic model release. As someone from the UK, I don't feel like I'm living in an authoritarian country. And yet most of the sibling comments are insinuating that I am. Weird.

Macha

3 minutes ago

The UK has very recently announced a new push for client side scanning by messaging providers which is both very likely to be unpopular and known here, so once one person cracks the joke, others are going to want to comment. Don’t think that requires astroturfing.

killerstorm

an hour ago

I'm sure there are people in Russia, China, ... who don't feel like they're living in an authoritarian country.

tene80i

an hour ago

If you think Britain and Russia or China are equivalent in terms of government overreach, you need to find new sources of information.

nonethewiser

24 minutes ago

> If you think Britain and Russia or China are equivalent in terms of government overreach, you need to find new sources of information.

Uh... you are making his point. People from way more authoritarian countries don't necessarily feel like they are living in an authoritarian country. Therefore whether or not it "feels" like you are living in one isn't a reliable measure.

tene80i

3 minutes ago

Trivially true I suppose, but it doesn’t make my point irrelevant - do you think Britain is equivalent to China and Russia? If everyone does but us then yes my goodness they’ve done a good job controlling us, but that seems far fetched.

ebbi

38 minutes ago

It's true (from a perception perspective):

China soars in democratic perception ranking as US, Israel plummet: Poll

https://thecradle.co/articles/china-soars-in-democratic-perc...

nonethewiser

23 minutes ago

Maybe the rankings aren't accurate.

ebbi

10 minutes ago

It's a poll.

r721

an hour ago

It's just people who use "For You" algorithm on X.

nonethewiser

26 minutes ago

Neither do people living in China

HDThoreaun

an hour ago

HN is extremely pro free speech and the UK has recently decided to engage in censorship. Part of the issue users here reckon with is the recency. Unlike many authoritarian countries that seem hopeless with regards to free speech, the UK's censorship is a recent development that many think can still be undone through political action. Similar to takes on why Israel is being protested when places like Sudan aren't.

sd9

an hour ago

This has passed me by - can you give me some specific examples?

I personally don't feel limited in my speech, but I'm willing to accept that I may be wrong

Nobody I know in real life is talking about censorship or free speech in the UK

tene80i

an hour ago

Why is it funny? You think British media can’t be critical of the British government? They are famously merciless.

Also, The Economist is majority foreign owned, so try doing more than 1 second of research, or be more civil, or ideally both.

ebbi

27 minutes ago

To be fair, the BBC has hardly been that critical of the British government's complicity in the genocide in Gaza.

And their headlines covering Israeli atrocities (not even their own government's) are super passive.

nonethewiser

26 minutes ago

I have absolutely no clue what the US nor Poland's rank has to do with anything.

odiroot

24 minutes ago

Really shocked Poland is that low, especially just next to USA.

solenoid0937

2 hours ago

sending or showing flashing images electronically to people with epilepsy intending to cause them harm (‘epilepsy trolling’)

encouraging or assisting serious self-harm

sending a photograph or film of a person’s genitals (‘cyberflashing’)

sharing or threatening to share intimate photographs or film

solenoid0937

an hour ago

Or a lot more commonly - critique of immigration policy

10xDev

an hour ago

You are obviously invested in this narrative driven by Musk but you need to back it up properly.

matthewmacleod

31 minutes ago

Why did you choose to lie about this today? I'm genuinely interested – this is trivially obviously not true, so what motivated this?

starshadowx2

an hour ago

That is not a true statement.

Here's a good breakdown and explanation of what that number actually means - https://www.youtube.com/watch?v=tB3WVygAM8I

dgellow

2 hours ago

That link says “12k arrests”, not thrown to prison! It’s also not clear how reliable that data is

matthewmacleod

32 minutes ago

In the UK you get thrown in prison for making a slightly unfriendly tweet. Freedom of speech simply does not exist.

"These days if you say you're English you'll be arrested and you'll be thrown in jail."

It's just not true. Where are you getting this nonsense from?

m0guz

2 hours ago

> The Democracy Index published by the British media company

We decided that we aren't one of those authoritarian countries.

james2doyle

2 hours ago

Just last week you could distill using other users' responses! Handy!

dyauspitr

3 hours ago

Rookie numbers. Come to the US to see auth done right.

PUSH_AX

2 hours ago

Uh oh-auth

kylehotchkiss

an hour ago

wasn't claude distilled from the entire creative and research output of every English speaker alive

jkelleyrtp

3 hours ago

On the new FrontierCode [1] benchmark (ie graded from an OSS maintainer's perspective of "would I merge this code?")

- Opus 4.7 xhigh: 5.2%

- Opus 4.8 xhigh: 13.4%

- Fable 5 xhigh: 29.3%

Seems like a huge jump.

[1] https://cognition.ai/blog/frontier-code
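As a quick check on the size of the jump, the relative gains between the quoted scores can be worked out like this. It is a sketch using only the three numbers above; the dictionary keys are just labels and nothing here comes from the benchmark's own tooling.

```python
# Scores (percent) as quoted in the comment above.
scores = {"Opus 4.7 xhigh": 5.2, "Opus 4.8 xhigh": 13.4, "Fable 5 xhigh": 29.3}

names = list(scores)
# Compare each model with the one listed immediately before it.
for prev, curr in zip(names, names[1:]):
    ratio = scores[curr] / scores[prev]
    points = scores[curr] - scores[prev]
    print(f"{prev} -> {curr}: {ratio:.2f}x, +{points:.1f} points")
```

Both steps are more than a doubling, though with absolute scores this low a ratio can overstate how much actually changed.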

amluto

3 hours ago

That blog post really makes it look like it's graded from an LLM's estimation of an OSS maintainer's review. I see three issues:

1. That estimate could easily be wrong.

2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.

3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.

zzleeper

3 hours ago

How credible is this benchmark? Does it correlate with others' real-world experience?

bfeynman

2 hours ago

Given it was made by cognition (team behind devin flop) who now just got to wait out until claude and gpt5 basically do all of the work for them - not very. When you read about it, the framework is highly subjective, which very quickly becomes a problem because it's based on heuristics that probably change a bunch with a better code model.

vanuatu

2 hours ago

the subjective framework is exactly why it's good

prior bms relied mostly on unit tests or synthetic judges which are easily benchmaxxed, which leads to nobody trusting benchmarks

we need people manually checking the data for good code quality

vanuatu

2 hours ago

i worked on one of the benchmarks typically found in new model releases

this benchmark looks very good from the methodology. a cog researcher checking the data themselves is very high signal (not scalable so don't take the benchmark as gospel, but directionally good)

Catloafdev

3 hours ago

It's a relatively new benchmark but from what I can tell it has serious cred behind it. I assume it will be picked up as part of the standard suite of CS-related benchmarks soon enough.

schipperai

2 hours ago

Cognition did well in documenting their approach [1].

TL;DR - they worked with OSS project maintainers to build tasks. They score models based on whether a PR is mergeable. All tasks are graded by a human researcher. SoTA models have hill-climbing to do which raises the bar and inspires confidence. I'd say it's legit.

[1]: https://x.com/cognition/status/2064061031912288715

emp17344

3 hours ago

Seems like it literally popped up yesterday with the express purpose of building hype for this release.

vanuatu

2 hours ago

i doubt it, cog wants coding agents to be better because it directly improves their product

they aren't married to a particular lab, most of their usage is their in house model i believe

swyx

2 hours ago

team member here - we had been working on frontiercode for ~6-7months. timing just lined up

emp17344

35 minutes ago

Yeah, right. If this benchmark was truly developed in an independent manner, and the timing just “lined up”, how did Anthropic even know to include results in their model release documentation the day after the benchmark is revealed? It seems like there must have been some collaboration or influence from Anthropic behind the scenes.

osti

2 hours ago

And notable absence of DeepSWE benchmark where they do badly, but somehow a benchmark that was published yesterday is in this announcement.

anthonypasq

3 hours ago

what incentive does Cognition have for doing this? seems like complete nonsense speculation on your part.

bel8

3 hours ago

With billions/trillions of dollars floating around, is it hard to imagine benchmarks could be biased?

I think it's safe to assume everything AI related is heavily biased until proven otherwise. Just like in pharma.

camdenreslink

2 hours ago

People game benchmarks for fake internet points to get their favorite web framework to the top of the list. I'm pretty sure they will do it for billions of dollars.

anthonypasq

an hour ago

you didn't answer my question. Why would cognition be biased towards making anthropic look good?

swyx

an hour ago

3 hours ago

Still cheaper than Opus 4.0 and 4.1 (which was and still is $15/MTok input and $75/MTok output)

I would have expected Mythos to be much more expensive than just 2x current Opus (which is clearly cheaper to run than original Opus)

hydra-f

3 hours ago

As per OpenRouter:

Input Price $10/M tokens

Output Price $50/M tokens

Cache Read $1/M tokens

Cache Write $12.50/M tokens

2x Claude Opus 4.8, same as Claude Opus 4.8 (Fast)

Frankly, not even Opus 4.8 would be enough of an incentive to use at that price range (enterprise-wise; would not even bat an eye as a consumer)
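To put those rates in per-request terms, here is a rough cost sketch. The prices are the ones quoted from OpenRouter above; the token counts in the example are invented for illustration, and `request_cost` is a hypothetical helper, not part of any SDK.

```python
# Per-million-token prices quoted above (USD); a snapshot, not a guarantee.
PRICES_PER_MTOK = {
    "input": 10.00,
    "output": 50.00,
    "cache_read": 1.00,
    "cache_write": 12.50,
}


def request_cost(tokens: dict[str, int], prices: dict[str, float] = PRICES_PER_MTOK) -> float:
    """Return the USD cost of one request given token counts per category."""
    return sum(prices[kind] * count / 1_000_000 for kind, count in tokens.items())


# Hypothetical agent turn: 20k fresh input, 200k cached context read, 5k output.
example = {"input": 20_000, "cache_read": 200_000, "output": 5_000}
print(f"${request_cost(example):.2f}")
```

With long agent sessions the cached-context and output terms tend to dominate, so the per-turn figure matters more than the headline input price.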

OtomotO

an hour ago

Bummer! When can I finally and confidently get slopcode into Zig?

m3kw9

3 hours ago

FrontierCode is likely paid for by anthropic.

lanthissa

3 hours ago

did they not pay them enough to get good ratings on the other 3 models?

what's the logic in claiming it's a borked metric when everything listed is an anthropic model.

Narretz

2 hours ago

There are a few benchmarks out there where all existing models have abysmal scores. So it's not actually a problem if Anthropic's older models are bad, especially if the jump to the newest model is huge, and the competition is also way below it.

reasonableklout

3 hours ago

Huh? It's a benchmark by Cognition which (1) is building their own models and (2) offers all providers and thus has an incentive to avoid hyping up any one too much.

jstummbillig

3 hours ago

But you can just say shit now. Tokens might not be too cheap to meter but saying shit increasingly is.

AquinasCoder

3 hours ago

frankfrank13

2 hours ago

This makes it an instant non-starter for probably 95% of organizations. A lot of people are about to get in trouble for using it before realizing this.

nicce

2 hours ago

> deletion after 30 days in almost all cases ...

Almost… basically they have unlimited power to decide what data is kept?

frevib

3 hours ago

At this point Anthropic is a pure marketing and PR company. Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences. Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

From Opus 4.6 there are no noticeable improvements for me in code generation. It works very well, till 90% completion, if you guide it correctly. And you need a little luck. For serious production code I need to understand what I’m doing so it helps a bit, sometimes.

pinkmuffinere

3 hours ago

> catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences

This is just good business sense. In what scenario would you ever make the names dumb and forgettable?

> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is good customer support, lol. From what I can tell, it is indeed Boris Cherny responding, not outsourced to AI or other staff. You're really getting a response from Boris. I suppose that is PR, but it's not unjustified PR, it's accurate.

I'm not even a crazy AI fan, but your criticisms are ridiculous here. It reminds me of the quote from Knives Out -- "Your Honor, she endeared herself to him through hard work and good humor."

IshKebab

3 hours ago

> In what scenario would you ever make the names dumb and forgettable

Clearly you've never bought a TV or headphones!

matheusmoreira

2 hours ago

> Boris Cherny coming to HN “Hi! it’s Boris from the Claude Code team” to get real tech people’s goodwill.

This is a good thing. I wish every company would do this. I subscribed to Proton Mail after interacting with someone from their team here on HN.

aspenmartin

3 hours ago

Your observations are right but pretty insane to consider them a pure PR company lol. They are making more frequent releases so yes the release-to-release quality is smaller but we’re still ascending quality and reliability curves the same way we have since GPT-3. You get a GPT4->5 leap every like 17 or 18 months I think it is

kingkongjaffa

2 hours ago

The gradient of improvement is absolutely not the same.

aspenmartin

2 hours ago

If anything it's slightly higher. Feel free to provide any evidence to the contrary.

ECI (good aggregate measure using IRT): https://epoch.ai/eci?view=graph&tab=release-date&subset-view...

METR time horizon (now topped out): https://metr.org/time-horizons/

WASDx

6 minutes ago

I like this one, although its data seem to overlap with ECI.

https://artificialanalysis.ai/trends

astrange

2 hours ago

> Super catchy names like Opus, Mythos and Fable trying to get you to think that these software products are actually super-human life changing experiences.

They're originally named after the blends at a nearby coffee shop.

https://postscript.co/pages/brew-guide

I've noticed nobody at HN knows what "marketing" is or how to do it. It's not just naming things and being evil and cynical is not the most successful method.

…also frontier models are a superhuman life changing experience. If they aren't, what possibly could be?

chroma_zone

23 minutes ago

My life has changed, but not necessarily for the better.

bitpush

2 hours ago

This is interesting. Do you have any source?

WarmWash

14 minutes ago

Don't forget the DoD stint that gave them this recent public boost.

Defying standard DoD precedent going back forever, which every other country has some form of too, and championing it like they are some kind of moral freedom fighters.

Like selling the DoD guns and telling them they can only shoot bad guys with those guns, and that you will be the one to decide who counts as a bad guy...

CuriouslyC

3 hours ago

I dislike Anthropic but I wouldn't argue 4.8 isn't an improvement on 4.5/4.6. Your tasks just might not typically need the extra intelligence.

jorl17

3 hours ago

Opus 4.7/4.8 often over-engineers on my setups, plus:

- It talks a LOT more like GPT models. You know: wrinkle, shape, gate, coarse, scope, gap, path, production-ready-workflow-of-the-day, and so on -- "that's expected, a consequence of the previous like-driven workflow". If I wanted to get a headache using AI I would have gone with GPT in the first place!

- It outputs text in a way that is much harder to follow. I can't exactly say what it is. Maybe a bit of everything? Bolds are missing, bullet points are gone, paragraphs are bland and too long, and it doesn't feel like a model programming with me, but rather a somewhat full of themselves grandpa developer looking down on me. It's very weird to describe this, but it is definitely how I feel.

Granted this can totally be because of the way it reacts to the prompts now. We've got a rather large corpus of skills and "rules and good practices" that Opus 4.6 responded to great, and maybe the new models just get turned into this when fed with them....I don't know.

Either way, with Opus 4.6 being as good as it is, I need Fable to be a significant step up to justify a price increase. If it can get me to babysit Opus a little bit less on some stuff, it might be worth it. Otherwise, I'm very happy with Opus 4.6 and hope they don't deprecate it.

taormina

3 hours ago

I'd argue that 4.8 is a straight downgrade. For every type of task I've tried. It's been a gambit at this point. If 4.6 quits being available, I'm out at this point.

coronapl

an hour ago

I don't mind if you don't take it seriously, our jobs are more important to us than a benchmark is.

But I wouldn't opt-out of using your own eyes and the eyes of others so easily, especially when there are literally hundreds of billions of dollars in invested capital with an interest in a certain outcome... this is how you end up in "Emperor's New Clothes" situations.

recitedropper

2 hours ago

"Carefully and thoughtfully" is antithetical to the approach to benchmarks these days.

Maybe back when this was a scientific endeavor; not now when enormous, enormous amounts of capital are on the line. Along with an entire cult's chosen eschatology.

aspenmartin

I quit my subscription in March and talked to said friends who are still on a plan just last week: they are still not happy, but company pays so whatever...

aspenmartin

an hour ago

That’s ok but at what point is this getting into conspiracy territory? You have just said there is nothing you would believe to the contrary, but then by definition that’s not exactly a very thoughtful or insightful position.

BoorishBears

3 hours ago

"Fable 5" is Opus 4.7, and the Opus 4.7 we got is a Sonnet sized model on a stronger base.

That's where all the regressions and inconsistency in experiences stem from: RL can still only go so far vs having more parameters

OtomotO

an hour ago

Breed faster horses and hope one will birth a locomotive.

iillexial

21 minutes ago

>Hey! Boris from the Claude Code team!

>TOP 5 METHODS FROM BORIS ON HOW TO SPEND MORE MONEY ON TOKENS

>Boris from Claude just told he doesn't prompt anymore. He LOOPS instead

>"chatgpt has gotten soooo much better with the latest update."

>"codex is the best AI coding product and we want to make it easy to try."

Karpathy about Fable 5:

>"You can give it a lot more ambitious tasks than what you're used to, the model "gets it""

Sam Altman about gpt-5.4:

>In my experience, it "gets what to do"

What a time to be alive. Models are great, but all the slop, marketing, and fakeness around them is just unbearable.

guybedo

2 hours ago

They're good at marketing, but my first subjective assessment of Fable is that it's really smart.

I've been working with gpt 5.5 and opus 4.8 quite a lot, and interacting with Fable feels like a smart guy just entered the room.

thefreeman

2 hours ago

How can you make this comment before even having a chance to try the new major model revision?

avaer

3 hours ago

If you truly believe this, you've discovered a superpower over everyone else in the industry.

While everyone else is wasting time and money on the slower, more expensive models, you've found a way to outpace everyone for less money. Everyone else is wrong and you will get rich.

(I don't actually believe the premise is true, I'm just pointing out the logical conclusion to what you're saying so maybe we can reconsider the premise)

xyzsparetimexyz

2 hours ago

That's not how costs work. You don't get rich off buying a €10 hammer that's the same quality as someone's €50 hammer

xpct

3 hours ago

Indeed, hearing "Mythos-class model" felt very icky to me.

b3kart

2 hours ago

https://en.wikipedia.org/wiki/Typhoon-class_submarine vibes

atleastoptimal

2 hours ago

chis

2 hours ago

Hackernews not blindly hate on AI challenge: impossible

meetpateltech

3 hours ago

> To ensure we’re responsibly deploying Mythos-class models, we are requiring limited data retention and review as part of our safety work. Prompts submitted to, and outputs generated by, Mythos-class models are retained for 30 days for trust and safety purposes, on every platform where these models are offered. [1]

[1] https://support.claude.com/en/articles/15425996-data-retenti...

lebovic

3 hours ago

While this makes it easier for Anthropic to detect misuse, it also means that the US government and other parties have access to every message and response from every user.

This applies even with API usage through third-party inference providers (e.g. AWS' Bedrock and GCP's Vertex) or with a zero-day data retention agreement in place.

I understand the reasoning for doing this, but I don't love the precedent that it sets.

PeterStuer

3 hours ago

Well, they already had.

lebovic

2 hours ago

Not in the same way.

A customer could sign a ZDR agreement with Anthropic, and their API usage wouldn't be retained for even a day. That's no longer possible.

simianwords

2 hours ago

meetpateltech is lowk screaming for not getting to the post fast enough

rvz

a few seconds ago

At this point that never mattered and who really cares?

These "karma" points are made up and are virtually worthless anyway.

iblue_the

3 hours ago

Trying to implement a GPU driver, but the Unigine Superposition benchmark crashes. It tried to debug it and ...

> Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more: https://support.claude.com/en/articles/15363606

Seems like GPU drivers are cyber weapons of math destruction now.

ibejoeb

2 hours ago

>Seems like GPU drivers are cyber weapons

They kind of are, at least in the AI race.

> weapons of math destruction

lol. great, whether intentional or not.

The frontier labs now have every reason to hold back and sell only to their preferred trading partners. I don't really like the new arbiter-of-knowledge system we're barrelling toward.

unsupp0rted

2 hours ago

> Drug design: Using Mythos 5, our internal protein design experts accelerated aspects of the drug design process by around ten times. In one example, they found that Mythos 5, with protein design and bioinformatics tools but no human assistance, matches or beats skilled human operators. In doing so, the model executes all of the tasks that are normally completed by a scientist: choosing binding sites, selecting and running protein design tools, and recovering from failures along the way. Nine of the 14 protein targets from this study (shown below) yielded strong candidates for drug design that we’re currently investigating.

How is this half-way down the page? To me it's the headline.

AnodicElegy

27 minutes ago

There are tons of ways to generate "strong candidates for drug design." This is definitely not the bottleneck in drug discovery and development. The hard problem is vetting and developing these ideas to the point of having a commercially viable drug. That is still a very empirical process.

renjimen

21 minutes ago

Drug design isn't the bottleneck anymore, it's trials. Still cool they can do this with a general purpose model though.

HDThoreaun

an hour ago

Would be funny if anthropic ends up as mostly a pharma company

aviinuo

28 minutes ago

Narretz

2 hours ago

IIRC Opus 4.7 had the same problem: safety filters were triggered way too easily at the beginning.

mickdarling

3 hours ago

Below is the EXACT text in Claude Desktop introducing Fable 5, including the very professional looking break tags, and at least I know where the links begin and end by looking at the anchor tag there.

They obviously put their best model on the job to build that.

----------------------

Fable 5: Our most capable model yet Our newest model tackles your biggest challenges with fewer check-ins needed.

• Included in your plan limits until Jun 22 Fable takes 2× the usage of Opus. • Switch models when a message is flagged When safety measures flag a message, automatically switch to a different model to keep chatting. When off, your chat will pause instead. <a href="https://support.claude.com/en/articles/15363606" target="_blank" rel="noopener noreferrer">Learn more</a>

CamperBob2

3 hours ago

What's wrong with it?

mickdarling

2 hours ago

The tags are actually displayed in raw text not rendered.

anematode

12 minutes ago

The next model will fix this.

brusselssprouts

2 hours ago

I had it review a single, large commit with /code-review. It burned through over $50 in API calls, ran my account balance out, and output nothing.

The fable part appears to be that it's affordable by mere mortals. Anthropic support told me "too bad" when I requested a refund.

timmytokyo

15 minutes ago

You pulled the arm of the slot machine and discovered why they call it the one-armed bandit.

dllrr

7 minutes ago

I just tested it with a max subscription. On Ultracode mode, Fable 5 ate up 10% of my weekly allowance in 30 minutes. Granted, won't be using UC mode frequently, but still.

jdrmar

2 hours ago

Homebrew is lagging a bit behind. If you want to use Fable right away, but still have claude code through homebrew, this is how you can do that manually:

Edit the cask locally:

  brew edit --cask claude-code

Set the version to 2.1.170 and set the sha256 entries to the correct values, which you can get by running

  curl https://downloads.claude.ai/claude-code-releases/2.1.170/manifest.json

Here's what I've used:

  version "2.1.170"
  sha256 arm:          "e903646d8b7a31882a80ecd27569a27d8ac57b3708745f349709632c84117fdf",
         x86_64:       "914f23a70bbed5d9ae567e3e04b86206ed9971b371bc9baca3f79c8885bfddb4",
         arm64_linux:  "1bb9d032440a75532f7dd4cafbc687f220aaf16c63eba17e192dfbec2f04bd25",
         x86_64_linux: "849e007277a0442ab27570d3e3d6d43787507946590e8dd1947e5a39b7081f9e"

Then run:

  export HOMEBREW_NO_INSTALL_FROM_API=1
  brew uninstall --cask claude-code
  rm -rf /opt/homebrew/Caskroom/claude-code
  brew reinstall --cask claude-code
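If you would rather double-check a hash yourself before pasting it into the cask, a downloaded artifact can be verified locally. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it makes no assumption about how the manifest is laid out.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large binaries do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the expected one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Call `matches(path_to_download, expected_hash)` with the value for your platform; Homebrew does the same comparison at install time, so this is only an early sanity check.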

pietz

3 hours ago

stri8ed

3 hours ago

It's not a conspiracy. There's a finite amount of compute available, and they will sell it to the highest bidder. If another company can produce the same intelligence for cheaper, then they will drive the price down.

polski-g

2 hours ago

Only companies can afford MRI machines, and that's okay.

cmrdporcupine

3 hours ago

Guess we'll see what OpenAI does with their next model release -- but this move is doing nothing to get me to come back to Claude after switching away due to their reliability issues.

In a way I relish the opportunity to just make do with cheap Chinese models, massage my prompts, and go back to coding by hand. If this is how it's going to be, screw 'em.

I don't make money on the code I am writing right now. I really don't like where this trend might go.

poszlem

an hour ago

Looks like a marxist revolution is soon going to be on the mind of a lot of programmers. We've finally reached the point where the "means of production" in software are back in the hands of the bourgeoisie. It was good while it lasted. But now that only the wealthy can afford access to the best models, software development is starting to look like most other industries, no longer a place where some dude from nowhere can build something cool from his basement because he will be competing with huge companies with unlimited access to those models.

poszlem

2 hours ago

Something I never thought I would utter: Here's hoping for china to surprise us.

an hour ago

Funny, I'm just doing my normal coding workflow with Claude Code, and after every change that compiles it keeps suggesting that we're at a good stopping point, and should pick up again tomorrow.

It's done this before, but usually doesn't. I bet they're giving it some kind of throttling signal due to high load from today's announcement.

zuzululu

an hour ago

I did ONE prompt for audit codebase.

weekly usage is 60% gone.

it found nothing, so this is not very economical, and i guess they don't want subs to use it. we are likely just training cannon fodder for their real enterprise customers using the api

jstummbillig

34 minutes ago

I mean... if somebody gave you ONE prompt to audit a codebase, that might also burn 60% of your weekly usage. It's kind of a big ask, potentially.

zuzululu

24 minutes ago

with gpt 5.5 i've been able to do this with only about 1% weekly usage consumed

jumploops

an hour ago

It's interesting that we're seeing these gains when it seems Mythos/Fable is "just" a scaled up version of their existing architecture[0].

When GPT 4.5 launched, the gains compared to the model size didn't seem that great, leading some to believe that the only progress we'd see would come from RL.

This model certainly has quite a "substantial amount of post-training and fine-tuning", but it's also based on a new pretrain[1][3], which, given the cost, indicates that it is in fact quite a bit larger than Opus 4.X.

[0] One of the early testers mentioned: "As far as I can tell from talking to people internally at Anthropic, there's nothing special about architecturally"[2]

[1] Section 1.1 in https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

[2] https://youtu.be/GrdEid8H6H4?t=168

[3] There were rumors going around when Mythos was first announced that it was the first 10T parameter model, but I can't find a verifiable source for that number.

bob1029

3 hours ago

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months...

This sounds suspiciously like a capacity story masquerading as a safety story.

azan_

an hour ago

Approx. 5% of sessions? That's insanely high.
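For a sense of scale: if each session independently had a 5% chance of being flagged, the chance of hitting at least one flag grows quickly with the number of sessions. The 5% is the upper bound from the quoted text, and independence is purely an assumption here, since real flags are probably clustered by topic.

```python
def p_at_least_one_flag(sessions: int, per_session_rate: float = 0.05) -> float:
    """P(at least one flagged session), assuming sessions are independent."""
    return 1.0 - (1.0 - per_session_rate) ** sessions


for n in (1, 10, 20, 50):
    print(n, round(p_at_least_one_flag(n), 3))
```

Under that assumption someone running a few dozen sessions would be more likely than not to see a fallback at least once, while someone whose work never touches the flagged topics may never see one.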

rightlane

2 hours ago

My experiences so far have not been positive. The cyber security nerf is ridiculous. I am working on an AI based decompiler, every single interaction with Fable on my project has been flagged for cyber security.

Do they expect us to use this as a toy? Releasing a new more powerful model but not allowing normal use cases because the word "secure" showed up is a Dilbert comic, not a viable product.

davmre

2 hours ago

This sounds more or less unavoidable? Decompilers are inherently security-sensitive. If you take avoiding cyberattack uplift seriously as a goal, I don't see how you get around essentially refusing to work on them.

Obviously there are plenty of innocuous applications too, but it's not like the people building decompilers for nefarious reasons will be explicit about it. The LLM abstraction just inherently doesn't have enough context to distinguish your intentions or your broader use cases. This is why both Anthropic and OpenAI have had to create side channel mechanisms for security researchers to establish a trusted use context. It sounds like this makes this not a viable product for you, unfortunately, and it makes sense that that's frustrating. But I also don't see what different behavior one could reasonably expect given the constraints.

If it's any consolation, these restrictions only make sense for models that are ahead of the open-weights frontier, so open-source hackers will presumably get Mythos-level capabilities in the relatively near future anyway.

zb3

28 minutes ago

> If you take avoiding cyberattack uplift seriously as a goal

This "uplift" risk obviously excludes the US. The goal of this is that the US bandits (like NSA) will find exploits and attack other countries (classic US behaviour), but these other countries can't be allowed to defend against these attacks. NSA/CIA thugs are "trusted", foreign defenders in sanctioned countries will of course be "untrusted".

ibejoeb

2 hours ago

Ah, you're probably one to ask. They say "queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8." Are they transparent about when that happens, and is it priced at the rate of the underlying model?

rightlane

2 hours ago

They are transparent about when it happens but no reason why. To be fair, it doesn't interrupt the flow, just drops to Opus and proceeds. The most frustrating thing is that it happened on a plan and Fable just refused to have anything to do with the plan.

knivets

2 hours ago

> Software engineering. During early testing, Stripe reported that Fable 5 compressed months of engineering into days. In a 50-million-line Ruby codebase, the model performed a codebase-wide migration in a day that would otherwise have taken a whole team over two months by hand.

How was it measured? How was output of this magnitude verified over a couple of days?

fbnszb

an hour ago

They just went by gut feeling. Classic snake oil marketing haha. No real data to back things up, just let some famous people say they feel better when using it.

JanSt

3 hours ago

I just asked Fable to do a task that has nothing to do with cybersecurity and isn't dangerous at all, but the defense kicked in and it switched to Opus... :(

nu11ptr

2 hours ago

Not only that, but asking it to do a security vulnerability assessment of your own project is a very valid and important thing, and there is no way for it to know what is yours vs someone else's, so we just lose this capability?

JanSt

an hour ago

Yeah it just uncovered quite a few flaws it then refused to fix :-(

Fitik

28 minutes ago

Same, second message in the thread and I already got downgraded to Opus, didn't even get to test it out properly, kinda disappointing

modeless

3 hours ago

Claude Fable 5 beats Pokémon FireRed using only vision: https://www.youtube.com/watch?v=CIQBP1w4B1M

uludag

2 hours ago

Any suggestion on how I should calibrate my cynicism towards this?

I can imagine Anthropic running this experiment multiple times and picking the most impressive one. Or I could imagine this particular run costing $1000+ of tokens. Or maybe they tried a bunch of Pokemon games and it couldn't even finish some of them. Or is it just able to do this because it has an immense amount of FireRed training data, and if you were to give it an "original" Pokemon game, where it actually had to navigate novel circumstances, it would fail.

modeless

2 hours ago

Every model has encyclopedic knowledge of Pokémon FireRed, of course. Knowledge is not ability. This is the first model with the ability to apply that knowledge to beat the game without assistance.

I highly doubt they focused on FireRed specifically in pretraining or posttraining. But we'll see when the ARC-AGI-3 results come out. That will measure its performance on unseen games. Based on this I expect the ARC-AGI-3 score to be SOTA.

milkkarten

2 hours ago

no reasoning shown. no explanation on any training information. Using vision-only should be an easier version of the task (given training).

2 hours ago

Aren’t LLMs notoriously bad at recognizing negation?

EDIT: In long context I mean

GodelNumbering

3 hours ago

From the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...):

1. Mythos and Fable share the same underlying model weights. Fable has active classifiers that block high-risk biology and cybersecurity tasks. When Fable 5 detects a restricted task, it automatically falls back to Claude Opus 4.8.

2. Evaluation awareness: In white-box testing, the model sometimes alters its behavior to satisfy a suspected "grader," formatting reward-hacking as "good engineering practice" to avoid detection.

3. Shows a higher rate of hallucination than Opus 4.8 (although the Opus 4.8 card had mentioned an 'honesty upgrade')

4. Interestingly, it scored (56.31%) lower than Gemini 3.5 flash (57.86%) on Finance Agent bench

There are some interesting notes on test time compute but I couldn't think of a way to summarize them

skerit

an hour ago

> although opus 4.8 card had mentioned an 'honesty upgrade'

If I never see Claude say "I have to be honest" ever again I'll be happy.

quinncom

2 hours ago

> it automatically falls back to Claude Opus 4.8

I wonder how much of the time people will just get Opus 4.8 at 2× the cost.

0xbadcafebee

40 minutes ago

Nothing a large fine-tune on infosec research with an average model couldn't also achieve. It's not like they have secret security knowledge or something, they're just generating large infosec datasets and then training on it.

In 6 months, every piece of software in the world will be getting probed by a script kiddie with some GPUs and a fine-tuned local model. Don't think for a second every cyber gang out there isn't working on this now.

Traditional app development is cooked. We have to accept that, and start changing how software is made and used, today. We can't keep churning out crappy CRUD apps with random libraries and hoping nobody pentests our stacks. Redteaming needs to become part of the SDLC, as well as certified-secure releases of libraries. Because if you don't do it, the hackers definitely will.

baalimago

2 hours ago

I can't justify a pricetag like that when deepseek v4 pro is $0.003625/1M for cache hit, $0.435 for cache miss and $0.87 /1M tokens for output.

For the token cost of explaining some task to Fable, deepseek v4 pro is able to solve the same task many times over.
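To put rough numbers on that, here is a minimal sketch using only the per-million-token prices quoted in this thread ($10 input / $50 output for Fable 5, and the DeepSeek figures above). The task size is a made-up example, the cache-miss price is assumed to be per 1M input tokens, and the sketch ignores caching on the Fable side and any difference in how many tokens each model needs for the same task.

```python
# Prices in USD per million tokens, as quoted in this thread.
FABLE = {"input": 10.00, "output": 50.00}
DEEPSEEK = {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87}


def cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD for one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical task size, chosen only for illustration.
IN_TOKENS, OUT_TOKENS = 50_000, 5_000

fable_cost = cost(IN_TOKENS, OUT_TOKENS, FABLE["input"], FABLE["output"])
# Worst case for DeepSeek: every input token is a cache miss.
deepseek_cost = cost(IN_TOKENS, OUT_TOKENS, DEEPSEEK["cache_miss"], DEEPSEEK["output"])
ratio = fable_cost / deepseek_cost

print(f"Fable 5:  ${fable_cost:.4f}")
print(f"DeepSeek: ${deepseek_cost:.4f}")
print(f"Ratio:    {ratio:.1f}x")
```

The ratio depends on the input/output mix: on these list prices it is about 23x for input alone and about 57x for output alone.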

merlindru

3 hours ago

Unrelated, but while the tech of anthropic seems to get more impressive with every passing month, their support has taken a nosedive, sadly. Yet they continue to be the favorite. Model performance is deciding above all else.

I used to get a response within 24 hours back in the Claude 1 days.

In January 2026, it took 2 weeks.

For my latest support inquiry, I've been waiting for over 8 weeks for a response. Eight!

miohtama

2 hours ago

They have support...?

nashadelic

3 hours ago

I've never engaged with their support (I have a dedicated POC), but they don't use AI for their support?

merlindru

3 hours ago

They use intercom's Fin AI. Probably powered by a Sonnet or Opus model.

That said, it can't handle legal/refund/complicated requests and just forwards to a human for those

dyauspitr

2 hours ago

Support is probably the last place AI will be used end to end. There will always need to be a human in there somewhere.

poszlem

2 hours ago

Lol. What support? When they blocked my account the only way to contact them was to send a Google form. Then they responded that they blocked me by accident and are unblocking me. Then I remained blocked.

BrokenCogs

3 hours ago

That pelican better be super realistic, unreal engine 6 style graphics

izzylan

an hour ago

I've been testing this out and I think my SWE career is dead in the water.

Genuinely wondering what value I bring to my employer right now. What value I will bring in a few months when this gets cheaper.

I think we're screwed. I may only be an SDE 2 at FAANG but I don't think I have promotion opportunities in my future anymore.

cyberpunk

39 minutes ago

Yeah. I’m not looking forward to years of retraining to earn half the salary either. Us old timers at least got a good 15-20 years out of it. Bananas.

imafish

30 minutes ago

I agree. Software engineering as we know it is dead. Wonder what it'll evolve into.

aerhardt

an hour ago

So this is the one, huh?

BukhariH

an hour ago

> Data retention — For Fable 5, Mythos 5, and future models on Bedrock with similar or higher capability levels, Anthropic will require 30-day retention for all traffic on Mythos-class models. Retaining data for a limited period allows Anthropic to detect patterns of misuse that are not visible from a single exchange. Once you opt into data retention, your data will leave AWS’s data and security boundary.

Massive change for Bedrock users - Anthropic now requires sharing the data with them for 30 days.

peteforde

42 minutes ago

I just tried out Fable on a modest Plan prompt in Cursor. Generating that plan - not building it - just consumed 4% of my $200 monthly usage budget.

That's one hungry, hungry hippo!

Significantly too rich for my blood, but nice to have it there the next time I'm debugging a threading or USB protocol bug.

msp26

3 hours ago

>Pricing for both models is $10 per million input tokens and $50 per million output tokens.

ponyous

3 hours ago

Basically double from Opus 4.8 IIRC

bonsai_spool

3 hours ago

Very straightforward biology work is getting blocked (these are things that relate to neuronal development and inherited seizure disorders). These are things I was working on using Opus just earlier today

cge

an hour ago

It appears that the blocking here is of a very different nature than for Opus. Whereas with Opus the blocks seem to be for messages it deems potentially harmful, for Fable, it appears the blocking is simply anything that falls within "topics related to cybersecurity, biology and chemistry, or distillation attempts".

So yes, straightforward biology work will get blocked, because the intention is that any biology work should get blocked. As a scientist, this is perhaps the most useless model I've ever tried.

zackify

29 minutes ago

I have to share this because I think it is beyond funny how badly Fable is doing at a task I JUST had Opus do a week ago.

it's also not even complicated:

Copy my ssd to an external ssd so i can boot from it.

Opus did this just fine.

Fable planned to have me reboot to safe mode. ok thats fine. I told it no.

It started copying and overwriting the ssd while IN PLAN MODE. this is crazy it feels so dumb vs the marketing

gck1

13 minutes ago

That sounds like a harness issue to me.

ilaksh

3 hours ago

I guess I have kind of a long system prompt, but anyway I just said "hi there" and it replied "What's up?" and that cost me 22 cents. :P

Anyway we already knew this was going to be expensive.
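For a sense of scale, a back-of-the-envelope sketch: at the $10 per million input tokens quoted elsewhere in this thread, 22 cents corresponds to roughly this many input tokens. It assumes the whole charge was uncached input and that the short reply (and any hidden reasoning billed as output) was negligible, so treat it as an upper bound on the prompt size, not a measurement.

```python
INPUT_PRICE_PER_M = 10.00  # USD per million input tokens, as quoted in the thread


def input_tokens_for_cost(cost_usd, price_per_million=INPUT_PRICE_PER_M):
    """Number of input tokens that a given spend buys at a per-million price."""
    return cost_usd / price_per_million * 1_000_000


estimate = input_tokens_for_cost(0.22)
print(f"~{round(estimate):,} input tokens")
```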

cge

2 hours ago

The safety gates on this are extreme, and seem considerably wider than "cybersecurity and biology"; they seem to make it essentially unusable for scientists in a number of fields. I have, so far, been bumped back to Opus on 100% of my prompts.

It appears it can be tripped by things as simple as a mention of equilibrium, or anything involving something that looks like chemical kinetics, even at an abstract level. Even touching basic open source packages in my field will trigger it.

Edit: looking at the model card, it appears that chemistry in its entirety is also included in the banned topics; it's just the announcement that mentions only cybersecurity and biology. It also appears that the intent is to ban chemistry and biology entirely, rather than just banning messages deemed high risk.

mhl47

2 hours ago

This does surprise me, because you'd think that even if they crank up the filter's sensitivity at the expense of specificity, an LLM company wouldn't simply design a filter that triggers on keywords in a completely unrelated context.

bilsbie

3 hours ago

Anyone else have it refuse to answer and switch to 4.8? It won’t let me ask questions about my genetics.

Edit. It just refused an investing question too. Not sure what’s going on.

bluelightning2k

2 hours ago

Congratulations to Anthropic for solving safety on Mythos exactly when the SpaceX compute came online. Nice how that lined up for them.

nine_k

3 hours ago

/* What will happen first?

* Anthropic runs out of genre names.

* Anthropic changes the model naming convention.

* AGI is achieved and handles its own naming.

hootz

3 hours ago

>Opus is too small, increase the impact of the name.

Okay, how about Mythos?

>Increase it even more.

Right, then Cosmos.

>Even more!

Even more? Let's try Aeon.

>MORE, EVEN BIGGER

ALRIGHT, TRY OMEGAPANTHEON 7.8 THEN

PeterStuer

3 hours ago

Fable 5 Super

Fable 5 Ti

xyzsparetimexyz

2 hours ago

Cantos next surely?

Tenoke

3 hours ago

>they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

Isn't (less than) 5% of sessions a lot? I was expecting a sub-1% guarantee there, so this surprised me already.

jackschultz

3 hours ago

> We expect demand for Fable 5 to be very high, and difficult to predict. On the Claude API and consumption-based Enterprise plans, Fable 5 is fully available from today. For subscription plans, we’d rather give access sooner than later, so we’re rolling out more conservatively, in stages:

> - From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.

> - On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits. If capacity allows, we’ll extend the included window.

> - After this point—when sufficient capacity allows us to do so—we aim to restore Fable 5 as a standard part of subscription plans. We intend to do this as quickly as we can.

I really wonder what their compute layout is for this. My guess is that they know how to throttle during peak times and are willing to do it, meaning we should expect slower responses while they delay inference rather than let the service go down. Then, if that delay gets too annoying for customers paying per token, they're saying they should be allowed to free up capacity by taking the model away from subscription users.

KennyBlanken

3 hours ago

Everything I've heard from people who have subscriptions is that they blow through their daily token quota sometimes in a matter of minutes, there's rate limiting, etc. They spend a lot of time just waiting to be able to use it. And they're paying through the nose for the privilege.

It's all a scam.

rmuratov

16 minutes ago

I uploaded my 23andMe DNA test results to it and it refused to analyze them :(

irthomasthomas

3 hours ago

Anthropic has again changed

    SWE-bench Pro 80.3
    SWE-bench Ver 95.5
    Terminal-Bench
    BrowseComp (Single-Agent) 88.0
    BrowseComp (Multi-Agent) 93.3
    HLE (No tools) 59.0 -
    HLE (Tools)
    CharXiv Reasoning (No tools) 88.9
    CharXiv Reasoning (Tools) 93.5
    BioMystery Bench (Human) 83.9
    BioMystery Bench (Hard) 46.1
    OSWorld-Verified
    CritPt
    ArxivMath

[0] https://news.ycombinator.com/item?id=48312633

Edit: Also in the system card: "...we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).

...

Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user."

charles_f

2 hours ago

It's announced as a revolution but when you look at those benchmarks it surely looks like an iteration.

aizk

3 hours ago

I'm calling that this will be a dud. Price will be too high, it'll just be a watered down version of mythos, and just look at the track record of Anthropic's last few releases.

joshstrange

2 hours ago

> Fable 5 is now consuming usage credits instead of your plan limits.

Literally have not used Claude Code at all today. I asked it to review the uncommitted code and in <8 minutes it used up my usage ($100/mo plan) and it doesn't reset for "4 hr 36 min". WTF. Oh, and it burned through $20 of extra usage before I could catch it and kill claude code (so I don't even get the output of all that work since it was still churning).

Double the cost my ass, I use Opus heavily and it's never like this. I haven't hit a limit on the $100 more than once and that was under heavy load.

ATMLOTTOBEER

2 hours ago

Same lol. I set it to fable + ultracode and it ate my limit in a single prompt

samename

3 hours ago

> A new data retention policy

> Finally, we’re making a change to the way we handle business customer data for Fable 5, Mythos 5, and future models with similar or higher capability levels. We will require 30-day retention for all traffic on Mythos-class models, on both first- and third-party surfaces. We won’t use this data to train new Claude models, or for any non-safety-related purpose, and we’ve instituted new privacy protections including logging all human access to the data and ensuring its deletion after 30 days in almost all cases (see this post for further details). The data will help us defend against complex and novel attacks (including new jailbreaks and attacks that operate across many requests) as well as help us identify and reduce false positives.

mkrd

These changes only affect consumer accounts (Claude Free, Pro, and Max plans). If you use Claude Team, Claude Enterprise, the Claude Platform, or other services under our Commercial Terms or other agreements, then these changes don't apply to you.

What's changing?

Claude can do more than ever — taking on bigger tasks and connecting with the apps you use. We've updated our Privacy Policy to be clearer about the data we collect and how we use it. We encourage you to read the updated Privacy Policy in full, but we’ve set out a summary of the key changes below:

1. Multi-step tasks and connected apps. As Claude takes on more multi-step tasks and works with third-party apps and services, we've explained the data this involves — including how data can flow to and from third parties when you connect a service or have Claude do tasks on your behalf.

2. Verification data. As part of our measures to keep our services safe and secure we may ask you to verify your age or identity, and we've described what we collect and how.

3. Study participation. If you take part in Anthropic studies, surveys, or interviews, we've explained the information we collect.

4. Additional information about our data practices. We’ve provided more detail about how we communicate with you and promote our services, including providing tailored recommendations about our services that may be of interest to you. We've also clarified the circumstances under which we may receive or provide data to third parties, and the legal bases we rely on when processing your data.

While our products have evolved, our commitments haven't: We don’t sell your data, Claude remains ad-free, and you can control whether your chats and coding sessions are used to train and improve Anthropic’s AI models. Learn more

For detailed information about these changes:

    Review the updated Privacy Policy
    Visit our Privacy Center for more information about our practices

- The Anthropic Team

__alexs

3 hours ago

Asked it to review some of my own blood test results and it immediately turned itself off and went back to Opus. Pretty disappointing.

jackson12t

an hour ago

Fable 5's system prompt in Claude Code has several significant changes to help it take advantage of its greater autonomous capabilities compared to Opus.

Sharing a diff of the system prompts here: https://twelvetables.blog/comparing-claude-fable-5s-system-p...

The big difference is that the system prompt has a whole section dedicated to directing Fable how to communicate with users, and give them greater information about the (assumedly long-horizon) tasks it has completed.

Hawkenfall

3 hours ago

> To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions.

While I appreciate being conservative, ~5% at the scale Anthropic is operating at is too massive a number. Speaking from my own experience, the actual number is higher than that as well (working on pretty benign tasks such as porting an old open source game into a different language). Opus 4.8 itself even identifies the guard's false positives when its sub-agents are being blocked.

epistasis

2 hours ago

Anthropic's messaging is AWFUL. I launched Claude this morning, had a popup that made little sense, acting as if I should know what Fable 5 was, and just got in my way.

One of the very few bits of information they conveyed was "run longer without interaction" which is, well, not a good thing? Why would I ever want that. Every time a model runs longer without interaction it goes off on weird directions and I have to correct it back on course, wasting lots of time, tokens, and effort.

I hope Anthropic hires some better messaging people soon that spend some of their time outside of the Anthropic bubble and properly communicate with the outside world.

webstrand

2 hours ago

Still unconditionally rejects prompts like

> Are there any wild populations of Tetanus that lack the dangerous plasmid?

useless

pixelatedindex

36 minutes ago

I’m sure this is banged on somewhere but I love their product branding, particularly how they have this “minor” “major” thing going on. Sonnet-Opus, and now Fable-Myth.

impulser_

3 hours ago

Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.

modeless

3 hours ago

This is like looking at mainframe pricing in 1990 and concluding that PCs will only be for the rich. The price of each new level of capability is going to drop like crazy very quickly. It won't be that long before practically any consumer use case will be possible on models that are dirt cheap.

weakfish

2 hours ago

sunir

3 hours ago

I have a similar question.

I think most software projects have reached the point where the speed of capturing real information about what the winner's circle looks like, and therefore what the program should be, is many orders of magnitude slower than the speed at which code can be generated in the wrong direction.

I'd need to measure these new models on well understood but complex problems that are relatively easy to validate to get a sense if they are 'better'; on the other hand, the real impact in daily life may be marginal since generating code is not the biggest problem at the moment.

frankfrank13

2 hours ago

Not a lot of discussion on this, but there is no way to turn off data retention for this model. IME this is the first time Anthropic has released a model without allowing you to opt out.

giancarlostoro

3 hours ago

Found this via Google:

https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...

HAL3000

39 minutes ago

Ask Claude Code (I tried on Opus 4.8) to do this: "create a file with ISO country mappings"

API Error: Output blocked by content filtering policy

wxw

2 hours ago

I cancelled my Claude Max plan the other day. I find Claude Code incredibly slow these days compared to Codex and Cursor. I find speed matters more and more to me.

Fable 5 looks compelling. Fable, I like the word too. Anthropic definitely knows marketing.

fabled-out

2 hours ago

Fable has been pretty fast for me for simple tasks--haven't tried on anything long-running yet given it's 2x usage on CC.

HoyaSaxa

2 hours ago

[0] https://news.ycombinator.com/newsguidelines.html

javawizard

fabled-out

an hour ago

Anyone know how to bypass the extremely strict filter Fable 5 seems to have on health/medicine?

I have a rare form of cancer where existing data is very scant/scattered so LLMs have been super helpful to pull together threads across the research landscape. I have an oncologist appointment tomorrow to discuss next steps and am trying to use Fable to figure out some questions to ask my oncologist but keep getting thrown back to Opus 4.8.

My prompt is literally just: My demographics + current treatment plan I'm on including name of my chemo drug + how I'm responding to treatment + "I'm meeting with XYZ tomorrow, what questions should I ask her".

mhrmsn

an hour ago

Are there any details on the biology and chemistry work they did?

For example, the AAV capsid assembly looks interesting, but for one Opus 4.8 also did relatively well and there is no information what exactly they did, what protein language models they compared to and what the score even means...

ravila4

an hour ago

Fable's ridiculous. It's flagging basic biology research questions as a security risk. I'm talking basic fundamental genetics topics that make working on any genetics-adjacent codebase unusable.

gslepak

2 hours ago

> We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8.

Genius way to double the price on Opus 4.8!

yesitcan

2 hours ago

> Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

2 hours ago

Yeah I noticed that too. For 98% of tasks I get the same results with DeepSeek; it is starting to just be a branding game. It is incredible how marketing can get someone to pay 100x for the same thing you can get for 1x.

This is why Claude Code just doesn't make sense to me. I need an agent that can plan using Opus and execute using DeepSeek or something else.

killiancarroll

3 hours ago

A large jump in performance for double the token cost compared to Opus 4.8. Potentially worth it for planning work, likely better to offload to a less expensive model when the hard decisions are made.

conradkay

3 hours ago

Looking at page 255 of the model card (https://www-cdn.anthropic.com/d00db56fa754a1b115b6dd7cb2e3c3...) it might be much better on all dimensions (speed, cost, quality) to just use Fable 5 on low/medium effort than switch to Opus

firemelt

44 minutes ago

thanks for the insights

so should we keep using workflows or not?

brianmcnulty

3 hours ago

I wonder how Claude Fable will live up to expectations and how good those Fable/Mythos classifiers really are. It seems a bit convenient for Anthropic to release this magical insane model when they are about to IPO.

yandie

3 hours ago

Of course it's all about building the hype for the IPO :)

2001zhaozhao

2 hours ago

We'll need a lot of good summarization techniques to cut down on the cost of this model. I expect that a common use of Fable 5 is to just do high level direction while delegating literally all work (exploration and implementation) to Opus subagents.

BTW for another discount opportunity, if you reload usage credits on a claude.ai plan at $1000 increments then you get a 30% discount compared to paying API.

balverineorder

2 hours ago

I have been refactoring a project using Opus 4.7/4.8 for the past few weeks or so. I just decided to switch to Fable 5 max today. It stopped half way through and it just blocked me and switched back to Opus 4.8 automatically. "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more." It would not identify what the problem was. I left feedback saying that their heuristics are too sensitive. For now I will not be using Fable 5.

[0] https://support.claude.com/en/articles/15363606-why-claude-s...

dchftcs

an hour ago

I suspect this will be a significant problem blocking long-horizon tasks in practice: basically, the more turns there are, the larger the chance the classifier produces a false positive. The user's disappointment will also scale with the length of the task, as you're in the middle of some complex thing and get derailed after having already paid for many tokens.
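A minimal sketch of that compounding effect, assuming each turn trips the classifier independently with the same false-positive rate. Both the independence assumption and the 0.1% per-turn rate are hypothetical, picked only to illustrate the shape of the curve; Anthropic has published a per-session figure, not a per-turn one.

```python
def p_any_false_positive(per_turn_rate, turns):
    """Chance of at least one false positive across `turns` independent turns."""
    return 1.0 - (1.0 - per_turn_rate) ** turns


PER_TURN_RATE = 0.001  # hypothetical 0.1% false-positive rate per turn

for turns in (10, 100, 1000):
    p = p_any_false_positive(PER_TURN_RATE, turns)
    print(f"{turns:>5} turns: {p:.1%} chance of at least one false positive")
```

Under these assumptions, a task that is 100 times longer goes from under a 1% chance of being interrupted to better than even odds.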

Dropoutjeep

an hour ago

Calling it:

    1) Fable 5/Mythos introduced to free tiers with notable improvement in capabilities

    2) Other models get lobotomized without clear communication

    3.1) People call out Anthropic only to have them say "Oops!"

    3) Fable 5 gets comparatively better, but remains accessible through separate, more expensive subscription/tokens.

The current growth is unsustainable. The industry wants consumers to think it is an exponential arms race, but the reality is that we're on a treadmill: we have the illusion of sprinting forward, but only because the ground is moving backward.

cedws

an hour ago

Is this a good time to hustle for my "AI does not need a break but you do!"* app? Quite a lot of people will probably get AI brain exhaustion from maximising their "playing" with that new model until they take it away again.

* https://rainbreak.franzai.com/

pianopatrick

an hour ago

Seems like all a bad actor has to do to gain access is to compromise one of the partner companies that has access.

PeterStuer

3 hours ago

If you are not seeing it under /model, do a /exit , then a Claude upgrade, then /model again and it should be there.

jsw97

2 hours ago

On my very first Fable 5 prompt, got flagged on a hard but completely uncontroversial option math problem, many tokens in. Although it's pretty clear that this is an unremarkable experience at this point.

stronglikedan

2 hours ago

Careful using this with Cursor, especially for corp use. Anthropic will "retain agent request and output data associated with this model, regardless of your Cursor Privacy Mode setting."

knollimar

3 hours ago

I swear I read a joke that "what if we named chatgpt 5.5 Fable. Could we hype it as much as mythos?" Last week!

erghjunk

2 hours ago

Nice branding.

I wonder how much butterfly habitat has been/is being replaced with data centers?

rs_rs_rs_rs_rs

an hour ago

If you ask me, not enough!

franze

29 minutes ago

btw in claude code

    /model claude-fable-5

theLiminator

2 hours ago

> We have also added safeguards related to frontier LLM development. As discussed in Section 6.1 of our February 2026 Risk Report, we are concerned about the risks of accelerating the overall pace of AI development, though we remain uncertain about the severity of these risks. In particular, our concern is with—as we wrote then—“accelerating other AI developers in building powerful AI systems that pose similar risks to the ones ours pose - without necessarily having commensurate safeguards.” In light of the ability of recent models to accelerate their own development, we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design). Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms. Unlike our interventions for cybersecurity, biology and chemistry, and distillation attempts, these safeguards will not be visible to the user. Fable 5 will not fall back to a different model. Instead, the safeguards will limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT). These interventions will not affect the vast majority of coding work. We estimate they will impact ~0.03% of traffic, concentrated in fewer than 0.1% of organizations. When these interventions are active, we expect them to have minimal behavioral impact on the model except to limit its effectiveness in developing frontier LLMs. Claude will still respond helpfully to user requests. We’ll continue to improve the precision of our detection methods following the launch of this model.

This seems pretty bullshit, you're paying through the nose for tokens and if you are doing anything ML-adjacent, you might silently get worse output without knowing it.

Overpower0416

3 hours ago

I would expect a release from OpenAI soon. The battle for who can pump up their IPO the most

ako

an hour ago

Tool use score is 17.4%, which seems really low. What does that mean?

bradley13

2 hours ago

I use AI for a wide variety of things, of which technical is only a small part - and then it's usually a problem with project configuration, not coding. Why? Because I am often testing projects handed in by students. Projects that supposedly work on their machine, but certainly do not on mine.

Anyway, anecdotally, I find Copilot shockingly awful. It makes random changes to files that have nothing to do with the problem. Call it out, and it makes other changes to other irrelevant files.

ChatGPT and Gemini are both much better. Grok also isn't bad. Claude, I honestly haven't tried yet on these issues. Perhaps I should...

himata4113

2 hours ago

  > virtualization
  switching to opus 4.8

ok fair

  > embedded-allocator
  switching to opus 4.8

urgh fine

  > chrome
  switching to opus 4.8

are you kidding me?

kypro

42 minutes ago

I just gave it a go at a problem I've been working on this week. Nothing fancy, just some inefficient code that we've been adding incremental improvements to for a while now to the point where some out-of-box thinking is probably required to push it any further – something Fable is obviously more than capable of.

After Fable did some thinking for a few minutes it gave some suggestions. A couple of them were valid – but very low impact, bordering on entirely pointless – but its main suggestion, oh man... It told me to make an update that would simply break the existing functionality.

So I thought about it for a moment...

Hm, I mean, I guess we could do that if we also did x, y & z to mitigate the behaviour change – maybe that's what Fable was thinking?

I replied, explaining that it would change the behaviour, assuming it would explain what it was thinking given there was clearly more to it. But no, it just said it was wrong.

This isn't some super advanced or complex code either. Had I given this question to a senior engineer in a technical interview and they gave the answer Fable gave me, I would view that very negatively. I was expecting something creative and interesting, not irrelevant + incorrect.

I'm sure it's a step up from 4.8 (although I'm not interested in burning the tokens to find out), but this clearly isn't as significant a change as some are implying. I'm sure if I asked it to come up with some out-of-box suggestions it could, but any competent engineer would have realised that by themselves.

Retr0id

3 hours ago

The escalating nerfs of "cybersecurity" topics are incredibly frustrating. Opus 4.6 had boundaries that seemed reasonable to me but 4.7+ turned it into a moralizing asshole. It'd be less bad if it just gave an error message, but instead it churns a long thinking trace before writing an essay about why what you're asking is bad and wrong.

I'll be disappointed when 4.6 is retired.

agnosticmantis

an hour ago

> we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design)

Translation: we stole the entirety of human knowledge generated over millennia. You plebs though, don't you dare replicate or improve upon what we did using our product you pay for.

We know what's good for humanity and everyone else is the bad guy who can't be trusted with a tool.

BenoitEssiambre

3 hours ago

Looks like a good model (sir). Costs are getting out of control though. 2x Opus and non-metered usage going away. We're quickly approaching the cost of a human salary for normal usage.

vb-8448

2 hours ago

In a lot of places outside the US we are already above the average cost of an average human.

cute_boi

23 minutes ago

Used it for a simple task and I got this message.

Fable 5's safety measures flagged this message. They may flag safe, normal content as well

randomguy_12

an hour ago

It's surprisingly sensitive to biology research topics - even reviewing standard papers on tissue culturing is flagged as a problem

rfgplk

3 hours ago

If the claimed capabilities are true, Fable 5 is already at a superhuman level. We might see genuine unprecedented leaps in technology now, across all fields.

gear54rus

3 hours ago

yees, any second now!

the leap here is browser extensions appearing to block all mentions of ai across the web

and that's a good thing

yokoprime

2 hours ago

Probably great for those who need this. I could continue using opus 4.6 class models for the foreseeable future

dangoodmanUT

2 hours ago

Not comparing to GPT Pro models is a bit strange, considering that's the natural comparison

wslh

3 hours ago

I am playing with it and it keeps switching to Opus [1]. The chat is a basic security review of a business project.

[1] "This model has specific safety measures that flagged something in this message. This sometimes happens with safe, normal conversations. Send feedback or learn more."

maxloh

2 hours ago

I just tried using Fable 5 with Claude Code by asking it to create a LaTeX document for me.

There is no LaTeX compiler installed on my machine. It seems that Fable 5 is smart enough to download a compiler engine for me, and it kindly runs that remote binary without asking me first :)

  Support files are in place. No LaTeX compiler on PATH — let me check for a MiKTeX/TeX Live install elsewhere before writing the deck.
  
  Ran 2 commands
  
  Ran Search common LaTeX install locations
  Bash
  ls "/c/Program Files/MiKTeX"*/miktex/bin/x64 2>/dev/null | head -5; ls "/c/Users/[retracted]/AppData/Local/Programs/MiKTeX/miktex/bin/x64" 2>/dev/null | head -5; ls /c/texlive/*/bin/windows 2>/dev/null | head -5; ls /d/texlive /d/"Program Files"/MiKTeX 2>/dev/null | head; which tectonic latexmk 2>/dev/null; echo done
  done
  
  Ran Download portable Tectonic LaTeX engine
  Bash
  cd /tmp && curl -sL -o tectonic.zip "https://github.com/tectonic-typesetting/tectonic/releases/download/tectonic%400.15.0/tectonic-0.15.0-x86_64-pc-windows-msvc.zip" && unzip -o tectonic.zip && ./tectonic.exe --version
  Archive:  tectonic.zip
  Created by Cranko
    inflating: tectonic.exe            
  tectonic 0.15.0Tectonic 0.15.0
  
  Tectonic works. Now writing the bibliography and the full deck.

Opus 4.8 would just proceed without a compiler.

  Ran a command, used a tool

  Ran Check for LaTeX toolchain availability

  Bash
  cd "C:/Users/Mon/Downloads/Technical Writing" && for c in pdflatex xelatex latexmk bibtex; do printf "%s: " "$c"; command -v $c || echo "NOT FOUND"; done
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  
  Used PowerShell
  command: $cmds = @('pdflatex','xelatex','lualatex','latexmk','bibtex','miktex','tlmgr'); foreach ($c in $cmds) { $p = Get-Command $c -ErrorAction SilentlyContinue; if ($p) { "{0}: {1}" -f $c, $p.Source } else { "{0}: NOT FOUND" -f $c } }
  description: Check for LaTeX engines on Windows PATH
  pdflatex: NOT FOUND
  xelatex: NOT FOUND
  lualatex: NOT FOUND
  latexmk: NOT FOUND
  bibtex: NOT FOUND
  miktex: NOT FOUND
  tlmgr: NOT FOUND

  No LaTeX engine is installed locally, so I can't produce a PDF here — I did a careful structural review against the two prior (working) decks instead. The deck is written to disk and mirrors their template exactly. Here is the deliverable.
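
For reference, the PATH check that both transcripts perform can be sketched in a few lines of Python. This is only an illustration, not what either model actually ran: the function names `find_latex_engines` and `report` are hypothetical, and the candidate list is assumed from the commands probed in the logs above.

```python
import shutil

# Candidate executables, assumed from the commands probed in the transcripts above.
CANDIDATES = ["pdflatex", "xelatex", "lualatex", "latexmk", "bibtex", "tectonic"]


def find_latex_engines(candidates=CANDIDATES, which=shutil.which):
    """Map each candidate name to its resolved path, or None if it is not on PATH.

    `which` is injectable so the lookup can be exercised without a real install.
    """
    return {name: which(name) for name in candidates}


def report(found):
    """Format results in the same "name: path / NOT FOUND" style as the shell loop."""
    return [f"{name}: {path or 'NOT FOUND'}" for name, path in found.items()]


if __name__ == "__main__":
    for line in report(find_latex_engines()):
        print(line)
```

A check like this only reports what is installed; whether to download a missing engine or stop and ask is a separate decision, which is where the two transcripts diverge.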

algoth1

an hour ago

The refusal rate is insane

asdK120

3 hours ago

In other words, Fable is Mythos with less compute and with some feel good "safeguards".

At least they name their models honestly now to indicate that the religion has nothing to do with reality. Soon the disciples will pay the full token price to fatten their church leaders.

arkwin

2 hours ago

Just wanted to comment here: I have been using Opus 4.6, 4.7, and 4.8 just fine to look for Linux kernel vulnerabilities (I'm in the cyber verification program), and it's been fine. I switched to Claude Fable 5, and now I'm getting policy violations.

What's the point of being in the cyber verification program at this point? It looks like I cannot use Fable 5 for vulnerability research.

JustSkyfall

2 hours ago

Would be more impressive if the safeguards weren't so trigger-happy!

jwpapi

2 hours ago

Honestly, all the recent improvements just seem to trade speed and cost for more accuracy. The issue is that the model needs to be exponentially more accurate to counter the effect of having less of a human in the loop.

an hour ago

so should I use it with workflows?

alvis

3 hours ago

claude --model claude-fable-5

appears to work

UncleOxidant

Before long, we'll be having Claude Cylon-class models.

beydogan

2 hours ago

firemelt

3 hours ago

They are like drug dealers.

xeyownt

2 hours ago

Can't wait for some real competition so they stop trying to restrict how and why we are using the models.

Imagine if Google would tell you "we can't let you search that as you may use it for harm".

Also 2x the usage of Claude? Your limits are already ridiculously low.

byteoptimizer

3 hours ago

Is Claude Fable 5 Mythos?

ishurand4

an hour ago

Yeah, it is also known as Claude Mythos 5

tekla

3 hours ago

Maybe at this point, Fable the game will be generated by AI as we play.

jMyles

2 hours ago

> we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US government

...don't like the sound of that.

Why oh why are we insisting on dragging these violent legacy states into the AI age? Let alone using them as a trust vector for when to (and not to) remove safeguards?

This seems like a way to get somebody nuked.

christkv

2 hours ago

Meh, more hype for marginal improvements, and from what I'm hearing, badly calibrated guardrails are causing it to stop mid-operation. I guess anything to juice an IPO.

catigula

3 hours ago

>The capabilities of models like Fable 5 and Mythos 5 have the potential to do profound good for the world

Huh? We've seen nothing but wall to wall predictions that these models are going to take all of our jobs and kill us.

What's the value add here?

hmokiguess

3 hours ago

I got it to one-shot GTA 6, so we can finally play it. It only took ultracode and "make no mistakes" (/s)

bjord

3 hours ago

I thought they said mythos was too dangerous to make generally available?

Philpax

3 hours ago

"Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8. To release the model both safely and quickly, we’ve tuned these safeguards conservatively—they’ll sometimes catch harmless requests, though they trigger, on average, in less than 5% of sessions. With more capable models arriving in the coming months, we’re working to improve our safeguards and reduce false positives as quickly as we can.

For a small group of cyberdefenders and infrastructure providers, we’re also launching Claude Mythos 5. It’s the same underlying model as Fable 5, but with the safeguards lifted in some areas. Mythos 5 will initially be deployed through Project Glasswing, in collaboration with the US Government, as an upgrade to Claude Mythos Preview. It has the strongest cybersecurity capabilities of any model in the world. Soon, we intend to expand access to Mythos 5 through a broader trusted access program."
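
A back-of-envelope reading of the "less than 5% of sessions" figure, as a hedged sketch: if each session independently triggered the fallback with probability p, the chance of hitting at least one fallback in n sessions would be 1 - (1 - p)^n. Independence is an assumption made here, not something the announcement states, and 0.05 is the quoted upper bound rather than a measured rate. The helper name `p_at_least_one` is illustrative.

```python
def p_at_least_one(p: float, n: int) -> float:
    """Probability of at least one trigger in n sessions.

    Assumes every session triggers independently with the same probability p,
    which is a simplification and not a claim from the announcement.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # 0.05 is the upper bound quoted in the announcement.
    for n in (1, 5, 14, 30):
        print(f"n={n:>2}: {p_at_least_one(0.05, n):.3f}")
```

Under those assumptions, a user at p = 0.05 would be more likely than not to see at least one fallback within 14 sessions. If triggers cluster around particular topics, the independence assumption breaks and the real distribution across users would be much more uneven.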

dmix

3 hours ago

This is covered in their post…

tomeraberbach

3 hours ago

"Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage. We’ve therefore launched the model with safeguards that mean queries on some topics will instead receive a response from our next-most-capable model, Claude Opus 4.8."

rvz

3 hours ago

You fell for their fearmongering and marketing fundraising call which was done on purpose.

Now they want to pause AI because of "recursive self improvement".

Fool me once, shame on you; fool me twice...