chillee
2 days ago
This article's math is wrong on many fundamental levels. One of the most obvious ones is that prefill is nowhere near bandwidth bound.
If you compute the MFU implied by the author's numbers, it's 1.44 million input tokens per second * 37 billion active params * 2 (FMA) / 8 [GPUs per instance] = 13 petaFLOPS per second per GPU. That's approximately 7x the absolute peak FLOPS of the hardware. Obviously, that's impossible.
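The arithmetic above can be checked in a few lines. This is a sketch under stated assumptions: the 1.44M tokens/s and 37B active params come from the thread, while the ~1.98 PFLOPS peak is an assumed H100 dense FP8 figure, not something stated in the comment.

```python
# Sanity check of the per-GPU FLOPs implied by the article's numbers,
# as reconstructed in the comment above.
input_tokens_per_s = 1.44e6      # implied prefill throughput (from the comment)
active_params = 37e9             # active parameter count (from the comment)
gpus_per_instance = 8            # GPUs per instance (from the comment)

# 2 FLOPs per parameter per token (fused multiply-add)
flops_per_gpu = input_tokens_per_s * active_params * 2 / gpus_per_instance

peak_fp8 = 1.98e15               # assumed: H100 dense FP8 peak, ~1979 TFLOPS

print(flops_per_gpu / 1e15)      # roughly 13.3 PFLOPS per GPU
print(flops_per_gpu / peak_fp8)  # roughly 6.7x over peak, hence impossible
```

Even taking the most generous peak number, the implied utilization is several multiples of 100% MFU, which is the core of the objection.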
There are many other issues with this article, such as assuming only 32 concurrent requests(?), assuming only 8 GPUs per instance as opposed to the more efficient/standard prefill-decode disaggregated setups, assuming that attention computation is the main thing that makes models compute-bound, etc. It's a bit of an indictment of HN's understanding of LLMs that most people are bringing up issues with the article that aren't any of these fundamental misunderstandings.
pama
2 days ago
Agree that the writeup is very wrong, especially for the output tokens. Here is how anyone with enough money to allocate a small cluster of powerful GPUs has been able to decode huge models at scale for nearly 4 months now, at a cost of 0.2 USD/million output tokens:
https://lmsys.org/blog/2025-05-05-large-scale-ep/
This has gotten significantly cheaper since then with additional code optimizations and the use of B200s.
ma2rten
2 days ago
You can also look at the prices of open-source models on openrouter, which are a fraction of the cost of closed-source models. This is a market that is heavily commoditized, so I would expect it to reflect the true cost plus a small margin.
pama
2 days ago
If you make careful calculations and estimate the theoretical margins for inference alone of most of the big open models on openrouter, the margins are typically very high if the openrouter providers serve at scale (north of 800% for most of the large models). The high prices probably reflect salaries, investments, and amortization of other expenses like free serving or occasional partial serving occupancy. It is sometimes hard to keep a uniformly high load because of user preferences that don't get covered at any price, e.g. maximal context length (which costs output performance), latency, and time to first token, but also things like privacy guarantees, or simply switching quickly to the next best model. I have always thought that centralized inference is the real goldmine of AI, because you get so much value at scale for hardly any cost.
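As an illustration of the kind of margin estimate described above: taking the ~$0.20/million output tokens serving cost cited earlier in the thread and a hypothetical list price of $2.00/million tokens (an assumed round number, not a quote from any specific provider), the implied markup is:

```python
# Illustrative margin sketch with hypothetical numbers.
cost_per_mtok = 0.20    # serving cost cited in the thread (USD/M output tokens)
price_per_mtok = 2.00   # assumed list price for illustration (USD/M tokens)

# Markup over marginal serving cost, as a percentage
markup_pct = (price_per_mtok - cost_per_mtok) / cost_per_mtok * 100
print(markup_pct)       # 900.0, i.e. in the "north of 800%" range claimed
```

The exact figure depends entirely on the assumed price and realized utilization; the point is only that plausible numbers land well above 800%.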
Aeolun
2 days ago
As much as I appreciate you saying the math is wrong, it doesn’t really help me adjust my expectations unless you provide correct numbers as well.
resonious
2 days ago
Right. Now I want to know if they're really losing money or not.
Den_VR
2 days ago
So, bottom line, do you think it’s probable that either OpenAI or Anthropic are “losing money on inference?”
chillee
2 days ago
No. In some sense, the article comes to the right conclusion haha. But it's probably >100x off on its central premise about output tokens costing more than input.
martinald
2 days ago
Thanks for the correction (author here). I'll update the article - very fair point on the compute for input tokens, which I messed up. Tbh I'm pleased my napkin math was only 7x off the laws of physics :).
Even rerunning the math on my use cases with way higher input token cost doesn't change much though.
chillee
2 days ago
The 32 parallel sequences figure is also arbitrary and significantly changes your conclusions. For example, if they run with 256 parallel sequences, that would make both prefill and decode 8x cheaper in your calculations.
The claim that long context lengths are required for attention to be compute-bound is also quite misleading.
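The batch-size point above can be sketched simply. Assuming decode is bandwidth-bound (weights streamed from memory once per forward pass, amortized across the whole batch), cost per token scales roughly as 1/batch until the GPU becomes compute-bound — this is a simplified model, not the provider's actual cost curve:

```python
# Simplified model: bandwidth-bound decode amortizes one weight read
# over all sequences in the batch, so relative cost per token ~ 1/batch.
def relative_cost_per_token(batch_size: int) -> float:
    return 1.0 / batch_size

# 256 parallel sequences vs. the article's 32 -> 8x cheaper per token
print(relative_cost_per_token(32) / relative_cost_per_token(256))  # 8.0
```

In practice the scaling flattens as KV-cache traffic grows and the kernel shifts toward compute-bound, but the 8x factor is the right first-order correction for the article's assumption.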
Barbing
2 days ago
Anyone up to publishing their own guess range?
doctorpangloss
2 days ago
I’m pretty sure input tokens are cheap because they want to ingest the data for training later no? They want huge contexts to slice up.
awwaiid
a day ago
Afaik all the large providers flipped the default to contractually NOT train on your data. So no, training data context size is not a factor.
diamond559
2 days ago
Even if it is, ignoring the biggest costs going into the product and then claiming they are profitable would be actual fraud.
johnnypangs
2 days ago
As one of those people who doesn’t really understand llms, does anyone have any recommendations to better my understanding of them?