botw44
4 hours ago
The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one.
The bottleneck is compute and data, not the model. That's why they could only gate it for a bit. The ITAR thing proves it: no nationality controls in place, so the only option was killing the whole thing. Not exactly what an all-powerful gatekeeper does.
embedding-shape
3 hours ago
> The whole thesis falls apart though. You can't be on your way to "power over everything" and get distilled into free Chinese models within months. Pick one.
But is that last part actually true, though? Sure, there might be 600B+ models available for download and local inference if you have the hardware, but do the users who use Anthropic switch over to those, even if they're available as hosted models? Seems like some do but most don't; Anthropic and Claude remain very popular among people who use LLMs, there is no denying that.
vbezhenar
2 hours ago
> do the users who use Anthropic switch over to those, even if they're available as hosted models?
I'm currently spending $200 on Claude. That's around the maximum I can afford; I could stretch it to $500, I guess. But I've seen reports of people spending tens of thousands of dollars on the Claude API. That's certainly outside my budget.
So if/when Anthropic decides to stop subsidizing subscriptions (if they ever do; I'm still not sure about that), I'll certainly look at other options, and "open weights" LLMs hosted by someone will be my first pick. Right now Claude 4.8 feels very advanced, but things move very fast...
HDThoreaun
an hour ago
The AI labs would be very dumb to get rid of subscriptions. First, I don’t even think the subscriptions are losing money; I suspect they’re around break-even, maybe small losses. More importantly, the subscriptions are how they lock in users and convince companies to pay API rates. Without the user loyalty they cultivate with subscriptions, businesses will just use the cheapest model on OpenRouter, or maybe local models.
FuriouslyAdrift
an hour ago
The hotness we are seeing is smaller 'expert' models with an 'orchestrator' model in front that evaluates the prompts, routes them to the appropriate small models, and then synthesizes the collected answers. That's easier to split across many smaller, cheaper servers and more efficient than a huge monolithic model.
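Here is a minimal, purely hypothetical sketch of the request-level routing described here. The expert names, the keyword router and the string-joining "synthesis" are illustrative stand-ins for what would really be separate models. In this design the routing decision is made once per prompt.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical "expert" backends; in a real deployment each would be a
# separate small model on its own server. Here they are stub functions.
def code_expert(prompt: str) -> str:
    return f"[code] {prompt}"

def math_expert(prompt: str) -> str:
    return f"[math] {prompt}"

def general_expert(prompt: str) -> str:
    return f"[general] {prompt}"

EXPERTS: Dict[str, Callable[[str], str]] = {
    "code": code_expert,
    "math": math_expert,
    "general": general_expert,
}

# Stand-in for the orchestrator model: a hypothetical keyword table.
ROUTES: List[Tuple[str, Tuple[str, ...]]] = [
    ("code", ("python", "function", "bug")),
    ("math", ("integral", "prove", "equation")),
]

def route(prompt: str) -> List[str]:
    """Pick experts for the whole prompt; fall back to the generalist."""
    text = prompt.lower()
    picked = [name for name, keywords in ROUTES if any(k in text for k in keywords)]
    return picked or ["general"]

def orchestrate(prompt: str) -> str:
    """Send the prompt to each chosen expert, then merge the answers."""
    answers = [EXPERTS[name](prompt) for name in route(prompt)]
    return "\n".join(answers)  # a real orchestrator would synthesize with a model
```

For example, `route("prove this equation in python")` picks both the code and math stubs. Each expert can live on its own cheap server, which is the efficiency argument being made.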
losvedir
an hour ago
Do you have more info about this? I can't tell if you're being misled by the unfortunate "Mixture of Experts" terminology (which doesn't work the way you're describing), or alluding to something different.
Or, maybe I'm wrong, but my understanding is: MoE is just an architecture to keep the activated weights smaller per token. The experts get routed basically token-by-token, and the "experts" themselves don't have a semantic domain so the "expert" word was maybe a poor choice.
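Here is a toy sketch of per-token top-k routing as described above. A linear router scores every expert for each token. Only the k best-scoring experts run, and their outputs are mixed with softmax weights. The vector sizes and the "experts" themselves are hypothetical toys. Real MoE layers batch this on accelerators and add load-balancing terms, which are omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(
    tokens: Sequence[Vector],
    router_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Tuple[List[Vector], List[List[int]]]:
    """Toy sparse MoE layer: each token independently picks its top-k experts."""
    outputs, assignments = [], []
    for h in tokens:
        # Router logit per expert: dot product with that expert's gating vector.
        logits = [sum(w * x for w, x in zip(w_e, h)) for w_e in router_weights]
        top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
        gates = softmax([logits[e] for e in top])  # renormalize over the chosen k
        out = [0.0] * len(h)
        for gate, e in zip(gates, top):
            y = experts[e](h)  # only the k selected experts do any work
            out = [o + gate * yi for o, yi in zip(out, y)]
        outputs.append(out)
        assignments.append(top)
    return outputs, assignments
```

Nothing here assigns an expert a topic. Which expert fires for a token is whatever the learned router weights make of that token's hidden vector, so the active weights per token stay small.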
halJordan
an hour ago
I don't think you're appropriately understanding the full gamut. The individuals who only spend $200/month will be stuck. But the pie is growing; it's not stagnant. There are a lot of orgs who can afford to run a 1T model and even more that can run a 600B model. These newcomers are what's being fought over.
ForHackernews
2 hours ago
> Anthropic and Claude remains very popular among the people who use LLMs
Only because someone else is paying the bills. I use Claude Opus at work because my employer pays for the tokens and encourages me to do it.
At home, I use DeepSeek Flash. It's not as good, but it's maybe 0.7 quality for 0.001 cost.
LaurensBER
2 hours ago
Same, I had Deepseek search for, download and transfer (to my Linux emulation machine) the best Dreamcast games yesterday.
GPT refused to do so (citing that it's illegal even though I own the games). Deepseek did a wonderful job for 7 cents.
At work I use Opus because, why not? But I could easily switch to a less capable model if needed.
mark_l_watson
2 hours ago
I have a question that perhaps you or someone else here has an answer for: I enjoy using Opus via Google Antigravity (usually agy) for perhaps 90 minutes a week. For Google’s subsidized $20/month plan they seem to give out a reasonably generous amount of Claude tokens. How does this compare with Anthropic’s $20/month plan using Claude Code?
BTW, I also use DeepSeek v4 Flash very frequently: fast and so cheap it is almost free.
okdood64
32 minutes ago
What's the speed on DeepSeek Flash? And what provider?
ForHackernews
13 minutes ago
Fast enough? I signed up directly with https://platform.deepseek.com/ because it was the cheapest I could find. I use both Anthropic and Deepseek models via the VS Code copilot plugin https://github.com/Vizards/deepseek-v4-for-copilot
_the_inflator
3 hours ago
I disagree. It is not the model alone. It needs a system which capitalizes on it. And this is very complex. Hardware, software, architecture - it takes a lot to get it right.
Try running the latest OS models on a normal Mac or PC. Claude Fable and Mythos are systems, not just pure models.
And of course marketing. Don't believe the hype.
I think Claude is oftentimes underwhelming. Security is also something companies have a blind spot for. The toughest pro-security (yes, pro! Totally different framing!) company I know is Google, after all.
What I would advise companies to do is have more than just bug bounties: a professional hacker team that does nothing but attack them day and night, 24/7. This needs to be coordinated with the government, otherwise you might set off an alarm and get SWATed for doing good. And I would pay them huge sums, not the standard wage, since the risk and fallout warrant it.
Hackers are the real deal, not AI. Proof: Hackers using AI.
zozbot234
2 hours ago
> Try running the latest OS models on a normal Mac or PC.
It can be done through the magic of SSD offload. The worst case involves seconds-per-token speeds, but that's OK if you only care about low volumes of slow unattended inference, which maximizes utilization for the hardware.
(The real worst case, where you're streaming the whole model from the cheapest storage you could feasibly think of, involves multiple minutes per token for a single inference, or even hours per token batch if you're doing many inferences in bulk. That's a lot less helpful, so there's a space for smaller models at the edge, even for unattended workloads.)
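Here is a back-of-the-envelope sketch of why SSD offload lands at seconds per token. In the worst case, decode time is roughly the bytes of weights re-read per token divided by storage read bandwidth. All numbers are hypothetical assumptions: 4-bit weights (0.5 bytes/param), a 5 GB/s NVMe drive, 0.25 GB/s for "the cheapest storage", and 37B active parameters for an MoE model. The sketch ignores RAM caching and compute time.

```python
def seconds_per_token(params_read: float, bytes_per_param: float,
                      read_gb_per_s: float) -> float:
    """Worst case: every weight needed for one decode step is re-read from storage."""
    return params_read * bytes_per_param / (read_gb_per_s * 1e9)

def amortized_seconds_per_token(total_params: float, bytes_per_param: float,
                                read_gb_per_s: float, batch_size: int) -> float:
    """Stream the whole model once per decode step and share it across a batch."""
    return seconds_per_token(total_params, bytes_per_param, read_gb_per_s) / batch_size
```

Under these assumptions, `seconds_per_token(37e9, 0.5, 5.0)` is about 3.7 s.

For a 600B model streamed from 0.25 GB/s storage, one decode step takes 1200 s. That is 20 minutes for a single inference. The cost drops to 1.2 s per generated token if 1000 inferences share each pass, which is the utilization argument for bulk unattended jobs.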
nerdsniper
3 hours ago
> I disagree. It is not the model alone. It needs a system which capitalizes on it. And this is very complex.
AFAICT … despite saying you “disagree”, you appear to be agreeing with the parent comment that the model is less important and compute (all that complex infra) and data (also complex infra) are more important.
ramblurr
2 hours ago
> > The bottleneck is compute and data, not the model.
> I disagree. It is not the model alone. It needs a system which capitalizes on it. And this is very complex. Hardware, software, architecture - it takes a lot to get it right.
What do you disagree with exactly?
christkv
2 hours ago
For now, though, I suspect the gigantic models are not needed, and you will be able to do pretty much what you need in a specific domain with 120B or lower. There is so much trash in the frontier models. I don't need all the world's slam poetry for my coding tasks, for example.
ACCount37
2 hours ago
Wrong, mostly.
Model capability is a function of model size. Raising the bar raises model performance in every domain.
An "idiot savant" model that's overtrained for a specific domain would beat a generalist model of the same size. But scale the generalist up enough, and it'll trounce the specialist. Removing poetry data from a model training mix doesn't give you much - it might even cost you some performance - and "idiot savant" approach of overtraining for a domain has a hard ceiling.
So far, it seems like there's some equivalent of "g factor" in LLMs - a broad "intelligence" value that performance across many diverse domains correlates with. And, as a rule, larger models have more of it.
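One published way to put "capability is a function of size" into numbers is the parametric loss fit from the Chinchilla paper (Hoffmann et al., 2022): L(N, D) = E + A/N^α + B/D^β. The constants below are the ones reported there. Pretraining loss is only a proxy for capability, and the fit is for dense models on one particular data mix, so treat this as illustrative.

```python
# Fitted constants as reported in the Chinchilla paper (Hoffmann et al., 2022).
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pretraining loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params ** ALPHA + B / n_tokens ** BETA
```

Holding the data fixed, the A/N^α term keeps shrinking as N grows, so the bigger model always predicts lower loss. The floor E is the part no amount of scale removes.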
overfeed
40 minutes ago
> Wrong, mostly.
> Model capability is a function of model size
Model effectiveness has improved across model sizes. You really should try the latest flash variants more. They have become my default for most tasks except for gnarly high-level planning.
ACCount37
22 minutes ago
"Capability per parameter" is rising, but parameter count remains an advantage. And small models remain bad, because "good" is a rapidly moving target.
A 2026 4B beats 2024 4B, but both are far behind the contemporary frontier. Which makes them bad. There is no such thing as "too much capability" - a "good" model is whatever the current frontier is.
In 2024, a "good" model was one that could be trusted to write an 800-line script. In 2026, it's a model that can be trusted to do both gnarly high-level planning and execution. In 2028, it's going to be something like a model you can point at an extremely involved task, abandon, and have it report back with a "done" in 3 weeks.
olmo23
4 hours ago
> no nationality controls in place
Not for now, but how long before we have KYC regulations concerning LLMs?
thefounder
4 hours ago
That’s really what Dario wants. Let’s hope he doesn’t get it.
vbezhenar
2 hours ago
But he already got it, no? Claude Fable can only be made available to US citizens, which implies that every user who wants to use Claude Fable must provide proof of citizenship in some way, basically KYC.
baq
3 hours ago
what Dario wants is to retain any influence whatsoever on how the research progresses before the inevitable nationalization of the frontier. he gets to keep the N-2 tech and maybe influence the N-1 tech, but the only influence on the frontier he has is today; whatever he imprints in the pipeline the government takes over.
IOW I don't think he thinks in the same categories as most folks here.
overfeed
30 minutes ago
> ...the research progresses before the inevitable nationalization of the frontier.
Hacker News has been telling me America beats China at "innovation" because of the "freedoms", especially free enterprise. I wonder how a nationalized frontier lab would perform... And how the non-citizen researchers would feel about working for a US government that doesn't trust them to use frontier models.
stogot
2 hours ago
N-1? N-2?
Avicebron
2 hours ago
N is the best possible model (the SOTA in this example); N-1 is one generation behind it and N-2 two generations behind. I'm not sure that actually clarifies what the comment is trying to say, other than that they think the models will be nationalized (can't even imagine what that would look like).
baq
2 hours ago
basically imagine the Manhattan project, but instead of blowing up the desert they're building the biggest datacenter you've ever seen.
Avicebron
2 hours ago
Isn't this the beginning of the plot of "I Have No Mouth, And I Must Scream"? The exceptionally disturbing dystopian horror?
baq
2 hours ago
the possible futures after the thing is built are uncountable, but hoping the thing won't get built at this point is naive.
in general I agree people should be reading a lot more sci-fi nowadays than they used to.
stogot
2 hours ago
I read the popular ones, but itch for more. Which sci fi most applies today?
dofm
3 hours ago
Regulatory capture is the OpenAI and Anthropic end goal, for certain.
But I also think they exist in a sort of un-designed corporate narcissism, which is a common trait in bubble economies — I am not judging them particularly severely.
Netscape under Clark and Andreessen and Sun under McNealy both fell into corporate narcissism: the belief that only they really mattered, that they were chosen, and that the world needed to rearrange itself to just let them shine. They arguably let themselves get played by Oracle (a corporate psychopath) and others as a result.
OpenAI's position is profoundly corporate-narcissistic: all we need is all the money in the economy and not to have to do anything upsetting like think about turning a profit for the next four years. Like rich kids. It would be nice if you believed we were so important that we should get an enormous stipend for just being us.
Anthropic's position is: we think we're so unique and ominous that government needs to make us both essential and terrifying. We have to exist otherwise worse people will.
Both narcissistic positions.
baq
3 hours ago
> Regulatory capture is the OpenAI and Anthropic end goal, for certain.
it has to be, because the other way around - the government taking over parts or the whole thing - is inevitable if the trend holds.
blitzar
3 hours ago
the inevitable trend is that numbers will be free and nobody will control the whole thing
ai-celebrities are just clinging to relevance like all the other celebrities out there
intended
3 hours ago
HN is the builder side of the conversation, and in my experience, few safety people congregate here.
The safety side of tech is a PTSD-inducing shit show. Governments are more than happy to champion age verification laws, because parents around the world are clamoring for anything to pump the brakes on the social media experiment.
Society outside of HN is quite tired of Tech, and I despair of figuring out a way to make this clear to the commentariat.
ang_cire
2 hours ago
Social media is old hat now.
As someone on the "safety side of tech", social media is being exploited to increase surveillance and government control precisely because its actual social influence is heavily on the wane, and capital is happy to sacrifice what's left to increase the profits of the expanding public/private tech surveillance industry (with "protect the children" controls on social media like age verification being the usual backdoor route it always is).
Society may be growing tired of Tech, but governments aren't, and in fact they're heavily expanding their back channel reliance on not-traditionally-military Tech as an extension of their Defense spending.
intended
2 hours ago
Cyber security has the maturity that trust and safety hopes to achieve at some point.
Social media was being exploited from inception. Palantir had sales documents for sock puppet management software back in the PHP era.
I don’t disagree that Government is interested in tech, but I will push back on the dismissal of child safety that is inherent in your comment, intended or not.
For all that some people in the firm may have tried to do the right thing, Social media firms have created bad outcomes for children, and executives were briefed on the harms they were going to cause.
This is the dismissal that concerns me, because it leads people to misjudge the level of anger and unhappiness among the voting populace, and therefore the political will to pass regulation to rein tech in.
dofm
3 hours ago
> Society outside of HN is quite tired of Tech, and I despair of figuring out a way to make this clear to the commentariat.
I don't think anyone in tech is really truly engaging with how quickly the shine has come off the tech industry. Except maybe Apple, who even so still have some work to do.
malfist
2 hours ago
Technology and science are supposed to be the intersection that makes our lives better, easier, more prosperous. Over the last decade or two, what marvelous technology has come from Silicon Valley that hasn't primarily served the billionaire class and made life worse for common people?
The yoke of silicon valley is feeling heavy. People might just throw it off.
dofm
3 hours ago
Why not both?
baq
3 hours ago
my point is that this is exactly the play
Aperocky
2 hours ago
Spot on. There's a certain level of drinking the kool-aid or getting high on their own supply. Anthropic is a lot worse than OpenAI but OpenAI had to go through rounds of shedding.
dofm
an hour ago
To be maximally fair to them, I think it is difficult to be one of the key businesses in a market bubble and not fall victim to this kind of thinking, especially when the continued inflation of the bubble depends on you — lots of people lose their shirts if you don't push hard to be "special".
But as you say, there is a measure of getting high on one's own supply now.
And there's the curious solipsistic energy of Sam Altman whimsically musing in public that it turns out his product is too expensive for people and they complain when you make the price realistic (when it possibly needs to be more expensive for OpenAI to survive).
They seem to believe that the ordinary rules either will not or somehow must not apply to them; it's increasingly bizarre to watch.
Maybe the people around pets.com were this bizarre; we didn't have so much livestreamed interview content to show us.
throw1234567891
3 hours ago
Yeah yeah, but after the IPO!
zozbot234
3 hours ago
"Distillation" from APIs is not a thing, it cannot replicate a model's deep reasoning and behavior.
bob1029
2 hours ago
I struggle with the practicality of the whole thing.
The amount of tokens required to properly distill a frontier model is so large that by the time you could consume the # of tokens you would either be banned for extremely obvious abuse or a new model would be released, rendering your efforts less and less valuable over time. Intelligence is not a linear thing. Being behind just a little bit can have exponential consequences.
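Here is a rough sketch of the practicality argument. Pulling a large token corpus out of an API costs money and wall-clock time under rate limits. Every number here is a hypothetical placeholder, not a real price or rate limit: 10 billion output tokens, $25 per million tokens, 100,000 tokens per minute per API key, and 10 keys.

```python
def distillation_budget(target_tokens: float, usd_per_million_tokens: float,
                        tokens_per_minute_per_key: float, api_keys: int):
    """Return (dollar cost, wall-clock days) to pull target_tokens from an API."""
    cost = target_tokens / 1e6 * usd_per_million_tokens
    minutes = target_tokens / (tokens_per_minute_per_key * api_keys)
    return cost, minutes / (60 * 24)
```

With those placeholders, `distillation_budget(10e9, 25.0, 100_000, 10)` gives $250,000 and roughly 6.9 days of saturating ten keys. Sustained, multi-key usage like that is exactly the obvious-abuse pattern a provider would look for, and a new model release could land partway through.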
Aperocky
2 hours ago
> Being behind just a little bit can have exponential consequences.
That seems to be the argument of Dario, Sam et al., but I'm not ready to believe it. Time will tell, but this could be a marathon, and Anthropic and OpenAI are getting ready to sprint the last lap of the first mile.
archon
3 hours ago
I'm uneducated on how distillation works at more than a basic level so forgive me if this is a stupid question.
Isn't "distillation" of another provider's model exactly how these models got training data in the first place? Massive amounts of the written word + prompt -> answer. Why wouldn't distillation produce similar "reasoning" in the new model? It's just inputs and outputs.
maxbond
2 hours ago
What you're describing is (pre-)training. Distillation requires richer labels: the probability distribution over tokens (in practice logits rather than probabilities, but that's not important). From a chat transcript you can only recover the argmax, i.e. the most likely token of that distribution (and only if the API lets you set the temperature to 0). It's not impossible for an API to give you the full distribution, but providers won't if they don't want you distilling their models.
The intuition is that distillation exploits not only the "right" answer but the relationship between answers (what's the second most right answer? the third? etc).
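Here is a minimal sketch of that intuition. It compares soft-label distillation (match the teacher's whole distribution via KL divergence, with the common temperature² scaling) against the hard-label loss you get from a transcript (cross-entropy on the teacher's top token only). The logits are made-up toy values.

```python
import math
from typing import List, Sequence

def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill_loss(teacher_logits, student_logits, temperature: float = 2.0) -> float:
    """Soft labels: penalize any mismatch with the teacher's full distribution."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, p_student) * temperature ** 2

def hard_label_loss(teacher_logits, student_logits) -> float:
    """Transcript-style labels: only the teacher's top token is known."""
    target = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])
    return -math.log(softmax(student_logits)[target])

teacher = [4.0, 3.5, -2.0]
student_a = [4.0, 3.5, -2.0]   # matches the teacher exactly
student_b = [4.0, -2.0, 3.5]   # same top token, runner-up and long shot swapped
```

Both students get the same hard-label loss because they agree on the top token. Only the soft-label loss notices that `student_b` has the second and third answers swapped, which is the "relationship between answers" that a transcript throws away.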
zozbot234
3 hours ago
Among other things, because you simply can't get those "massive amounts" of text from a SOTA model at reasonable cost. And complex reasoning cannot possibly be trained in a pure one-shot fashion, real post-training takes massive resources. The whole story doesn't add up.
saberience
3 hours ago
This is totally inaccurate, the APIs provide the reasoning logs. You ABSOLUTELY can distill from APIs, in fact, that's the primary way distillation is done currently.
zozbot234
3 hours ago
Not for proprietary models, all you get is a terse summary.
barrkel
3 hours ago
Do you think token completion endpoints are the final form for AI APIs?
slowmovintarget
an hour ago
That thesis is not about what Anthropic will achieve, but about what power they think they ought to have.
That's a different problem than what you're arguing against.
swalsh
3 hours ago
The distilled versions miss the spark of the model. It's like they land in the uncanny valley of models.
realusername
2 hours ago
They get to 80% of the top models for 10x cheaper. Unless you don't care about the money at all, that's hard to ignore.