Uber's $1,500/month AI limit is a useful signal for AI tool pricing

471 points, posted 20 hours ago

584 Comments

ValentineC

14 hours ago

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.

Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.

vidarh

11 hours ago

We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.

I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.

They won't necessarily need to close the gap, at least not yet, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.

But, yeah, the prices will come down one way or the other.

At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.

dgellow

13 hours ago

One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces it due to competitive pressure, or because customers are now incentivized to use older, cheaper models). But datacenters are financed through debt, with the assumption that their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].

So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.

0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE

missedthecue

12 hours ago

"So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."

Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.

dgellow

29 minutes ago

Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them? GPUs are pretty specialized hardware; without AI, a data center full of outdated graphics cards isn’t really too valuable.

It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.

Not to mention that bubbles are extremely destructive. Bezos is obviously someone who came out ok from the dotcom bubble, but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet.

inemesitaffia

20 minutes ago

You sell the GPUs to remote gaming companies.

Replace servers with regular compute.

solatic

10 hours ago

Yup, that is the real economic benefit of bankruptcy - a reset.

geysersam

11 hours ago

Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.

> „[AI vendors are] paying for a fixed cost with a depreciating commodity“

That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
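The payback arithmetic in this comment can be checked with a few lines. This is only a back-of-the-envelope sketch: the figures (1T invested, 33T economy, 0.6% gain, half captured) are the comment's own, while the assumption that one year of investment buys a permanent productivity gain, with no discounting and no GDP growth, is a simplification added here.

```python
# Rough check of the payback claim, using the comment's own figures.
# Assumption (not from the comment): the gain is permanent, and there is
# no discounting, interest, or GDP growth.

def payback_years(investment, gdp, productivity_gain, capture_share):
    """Years of captured revenue needed to repay one year of investment."""
    annual_revenue = gdp * productivity_gain * capture_share
    return investment / annual_revenue


years = payback_years(
    investment=1.0,           # $T invested in one year
    gdp=33.0,                 # $T per year, US economy
    productivity_gain=0.006,  # 0.6% productivity increase
    capture_share=0.5,        # AI companies capture half of the value
)
print(round(years, 1))
```

Under these assumptions the result is roughly ten years, which matches the comment. Adding interest on the debt would lengthen the payback period.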

timacles

7 hours ago

I'm surprised people think LLMs, a thing which mainly excels at advertising, spam and writing code is going to generate that much economic activity.

ashdksnndck

6 hours ago

Companies whose main core competency is writing code were already making up a big chunk of the economy before AI. Also, less wealthy companies were constrained in their use of software by the inability to afford the salaries of talented programmers (and ripoff practices from software consulting companies who in theory could help). Lowering the cost of building software systems ought to unblock a good amount of economic activity as the technology diffuses.

bunderbunder

4 hours ago

Those companies are certainly writing more code. But it isn’t clear that they are increasing their economic productivity. It could even conceivably have the opposite effect by fueling a race to the bottom.

e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.

andwur

3 hours ago

The AI pundits often seem to apply the logic that code output is directly proportional to revenue and/or profit, and as such it follows that an AI usage increase leads to more code which leads to more revenue.

I don't believe this aligns with the reality of any major company: unless your business is, in the literal sense, "selling code", your revenue and profit are tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I were a new competitor with infinite AI funds to throw at whatever problem I choose, I couldn't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better"; they still need more customers and consumers to interact with their network to see any increase in revenue.

So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.

therealdrag0

4 hours ago

That’s great for consumers.

bandrami

2 hours ago

Not necessarily. European grocery shoppers report higher satisfaction with the shopping experience than American grocery shoppers do.

lelanthran

2 hours ago

A lower signal/noise ratio is never better for consumers.

bryanrasmussen

3 hours ago

If the quality of all apps remains high, yes. But if there is an increase in low-quality apps, it may not be great for consumers, as it becomes difficult to distinguish the good apps from the bad ones, making it risky to purchase apps.

IsTom

an hour ago

If we're talking about Meta, Google, etc., code is only incidental to them earning money.

lesostep

34 minutes ago

But what if it kills current ad-tech as we know it (paying to show ads on random sites without any way to verify that the site is legit), and the flow of ad money for legitimate goods turns back to journalism, magazines and other publications?

That would be half a trillion[1] redirected to regular people just from Google Ads.

[1] snatched my number from here: https://pixis.ai/blog/2025-google-advertising-benchmarks-for...

dgellow

44 minutes ago

A few things, I think you’re missing the point here

- most tasks do not require the latest frontier models, even if they are an order of magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini flash is cheap, fast, and pretty capable with good guidance for most tasks

- now that companies pay API costs instead of a subscription they will be setting restrictions on token use to not have their budget explode (like Uber in this submission), that’s a strong incentive to NOT use expensive models, and limit their thinking budget

- there is competitive pressure from China and others who can offer very decent performances at a fraction of the token price

- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concepts ever created by humankind.

jiggawatts

11 hours ago

The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media.

flextheruler

10 hours ago

These are similar numbers to the dotcom bubble. With GDP growth and the percentage of productivity AI contributes staying the same in this scenario this requires regular gains in revenue or growth. If things just stumble, like with most datacenters going unbuilt the bubble will pop.

bandrami

2 hours ago

The other part of that is that while price per token may be going down, tokens per task is going up

no-name-here

2 hours ago

For ~equivalent tasks, or because we’re expecting more from tasks?

The real measure should be cost per ~equivalent task, not cost per token nor tokens per task.
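The metric proposed here is just price per token multiplied by tokens per task. A minimal sketch, with made-up prices and token counts chosen only to illustrate the point:

```python
# Cost per (roughly) equivalent task = price per token * tokens per task.
# All numbers below are hypothetical, picked only for illustration.

def cost_per_task(price_per_million_tokens, tokens_per_task):
    """Return the dollar cost of one task."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Older model: pricier tokens, but fewer of them per task.
old_cost = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=20_000)

# Newer setup: tokens at half the price, but three times as many per task.
new_cost = cost_per_task(price_per_million_tokens=5.0, tokens_per_task=60_000)

print(old_cost, new_cost)
```

With these illustrative numbers the per-token price halves while the cost of the task rises by half, which is why neither price per token nor tokens per task is informative on its own.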

bandrami

43 minutes ago

For better performance of ~equivalent tasks. That's what all the harness tooling people are using does: (often) increasing output quality by significantly increasing token counts.

try-working

6 hours ago

If you have a good model router, you can route to older, cheaper models that run on older hardware, for simpler tasks. That helps labs extend the economic life of their hardware investments. They will likely fight it at first though as they see it as reducing ASP.

This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/
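The basic routing idea (send each task to the cheapest model that is good enough) can be sketched as follows. This is not role-model's protocol or API; the model names, prices, and capability scores are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_million_tokens: float  # hypothetical USD price
    capability: int                  # hypothetical score, higher is better


# Hypothetical catalogue, from an older cheap model to a frontier one.
MODELS = [
    Model("small-old", 0.5, 3),
    Model("mid", 3.0, 6),
    Model("frontier", 15.0, 9),
]


def route(required_capability, models=MODELS):
    """Pick the cheapest model whose capability meets the requirement."""
    eligible = [m for m in models if m.capability >= required_capability]
    if not eligible:
        # Nothing is good enough: fall back to the most capable model.
        return max(models, key=lambda m: m.capability)
    return min(eligible, key=lambda m: m.price_per_million_tokens)


print(route(2).name, route(5).name, route(9).name)
```

The hard part in practice is estimating `required_capability` for a given task, which this sketch simply takes as an input.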

bethekidyouwant

10 hours ago

Using a shittier model is just more work for the user, I’m not sure why anyone does it, unless they’re playing with it like a toy.

SoMomentary

8 hours ago

Local privacy-respecting inference can be worth it. I use a local model to log everything I do all week to automate my timesheet. I also have it do a bunch of other data tasks. I won't say that larger SOTA models wouldn't do these tasks better than a local model, but PII is a concern and my employer wouldn't approve of me just setting tokens on fire every day to do what I could do myself.

no-name-here

2 hours ago

> more work for the user

Model routers allow this to happen automatically without any more work by the user.

> a shittier model

A ton of tasks don't require the most expensive frontier models, etc.

> I’m not sure why anyone does it

1. Faster solutions from the LLM - also reduces employee costs of having the employee waiting on the LLM

2. Avoiding things like the half-billion dollar per month bill for a single company’s LLM use recently reported in Axios

dgellow

24 minutes ago

What you call a shittier model is what was considered frontier and fantastic one generation ago…

Kaliboy

8 hours ago

I sometimes let Claude Opus create plans, DeepSeek v4 pro implements and writes tests. Claude reviews and corrects.

Saves like $2-3 per session. Same quality code.

bijowo1676

13 hours ago

Do GPU chips really depreciate physically? There are no moving parts; I don't think memory chips or GPU chips deteriorate naturally.

I think it's only accounting depreciation.

I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?

bgnn

12 hours ago

Chips age and fail with age. You can check hot-carrier injection, bias-temperature instability and electromigration, as they are the main aging mechanisms. All of these are a linear function of time but exponential in temperature. The 90-100C these chips are running at is really tough, so they are likely to fail at a rate in the range of a couple of percent to 10% in 2-3 years, depending on the margins they have in the design.

The solder joints are notorious for failing at a high rate too.

consp

12 hours ago

If those don't go the caps and coils will eventually.

chadgpt3

10 hours ago

those are easy and cheap to replace

jetbalsa

4 hours ago

Depends. The tiny SMD caps spread across the board do start to fail and go out of spec over time. They are a right pain to replace, and it is hard to spot the one that has gone out of spec and is causing the chip to crash.

grogenaut

4 hours ago

Can you not just move the expensive part (the GPU itself) to a new carrier board in that situation? Also isn't most of the cost of the GPU itself the design of the board, not actually making one, esp if you can move the heat sinks around?

beAbU

2 hours ago

"just"

lelanthran

2 hours ago

Not if you account for labour.

lazide

10 hours ago

Caps also have a rapid aging with temp.

Aurornis

13 hours ago

There are data centers that use and rent out 10 year old server GPUs.

They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.

They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.

Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.

The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.

grogenaut

4 hours ago

Except for, you know, the enterprise customers who won't change their code and will pay to run old inefficient hardware just to keep from dealing with upgrades?

icepush

3 hours ago

They can just ask Claude to upgrade it for them, completing the circle!

grogenaut

3 hours ago

I'd agree. but also that's too scary. and the bottleneck is the massive manual change control process since there's no automation around any of this. :)

Why take risk when you can spend money and take no risk

jmalicki

13 hours ago

As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.

When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.

If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?

HumanOstrich

12 hours ago

We aren't talking about insect identification models from 2019.

jmalicki

11 hours ago

What do you think are running on the T4 GPUs in AWS? A lot of the use cases I know of for them are mid-level computer vision models that don't need to be frontier level.

jmalicki

7 hours ago

I can no longer edit this, but want to expand on my comment.

I've seen those vision researchers want to train on H100s at the time and be told no, wait for the T4s.

I've seen T4s running BERT models for document classification.

When there are enough Blackwells in data centers that H100s are useless for inference by your standards (I don't know if we've arrived there or not yet), there will be people who, say, want to run the Taco Bell ordering chatbot on them. There will be people who have applications that are just fine with Qwen 2.5 who will be happy renting them.

There seems to be this crazy consensus that hyperscalers are going to go into their datacenters and throw away their old GPUs. The reality is they have a ton of paying customers for them.

And there may be insect identification apps from 2019 that say "you know what? H100s have gotten cheap enough I can use a VLLM so the user can describe where they saw the insect too", or the McDonald's website support chatbot developers say "Hey, the bigger chips have gotten cheap enough we can upgrade our models to Qwen 2.5".

The frontier level GPUs in e.g. AWS have a huge premium. When the newer generations come out, they will be able to cut prices to a bit of a premium over the operational costs and still make a profit, and there are a ton of down-market customers who will be interested, who aren't willing to try to outbid Anthropic for Blackwells.

munk-a

13 hours ago

In addition to the physical depreciations other comments mentioned I'd also mention that old chips will settle into a low price and then actually go up on a per unit basis if you're trying to buy a significant amount of them. With a limitation on fabrication facilities continuing to pump out older cards is an opportunity cost to the manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain if the market could actually fulfill that demand and no one with the ability to increase the available supply to meet that demand actually wants to do so.

Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.

tardedmeme

13 hours ago

Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.

When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...
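The rule of thumb in this comment (wear-out roughly doubling in speed for every extra 10 degrees Celsius) can be turned into a quick estimate. This is a sketch of the rule of thumb only, not a real reliability model, which would typically use an Arrhenius-style equation with a measured activation energy. The example lifetime and temperatures are hypothetical.

```python
# "Doubles every 10 degrees C" rule of thumb, as stated in the comment.
# The 10-year reference lifetime and both temperatures are hypothetical.

def acceleration_factor(temp_c, reference_temp_c, doubling_step_c=10.0):
    """How much faster wear-out proceeds at temp_c than at reference_temp_c."""
    return 2.0 ** ((temp_c - reference_temp_c) / doubling_step_c)


def lifetime_years(reference_lifetime_years, temp_c, reference_temp_c):
    """Scale a reference lifetime by the temperature acceleration factor."""
    return reference_lifetime_years / acceleration_factor(temp_c, reference_temp_c)


# A part assumed to last 10 years at 60 C, run at 90 C instead.
print(acceleration_factor(90, 60))
print(lifetime_years(10, 90, 60))
```

A 30-degree increase gives a factor of 8 under this rule, so the hypothetical 10-year part would last about 1.25 years, the same order of magnitude as the 1-2 years mentioned above.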

bijowo1676

12 hours ago

sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues

HDBaseT

10 hours ago

It was not planned depreciation, as many chips were failing even before 2 years and this impacted not only PC Builders and Gamers, but also some server infra providers too.

This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.

It cost them far more than it made.

chadgpt3

10 hours ago

They didn't replace all the chips like with the FDIV bug though. What did it cost them? Only reputation?

tacticus

9 hours ago

Not even that in the end.

vb-8448

13 hours ago

I used to work in datacenters. During the spinning-disk era we had technicians from vendors in basically every couple of days to replace some broken part. When the massive switch to SSDs happened, instead of having them every couple of days it was 3 or 4 times per month.

Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.

malfist

13 hours ago

They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.

13 hours ago

> There are no moving parts, I don't think memory chips or GPU chips deteriorate naturally

I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!

hgoel

11 hours ago

Not great, not terrible.

sandworm101

13 hours ago

Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.

As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped.
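The depreciation argument above (older hardware costs more to operate per unit of compute) comes down to energy cost per unit of work. A minimal sketch, where the power draw, throughput, and electricity price are all made-up illustrative numbers:

```python
# Energy cost per unit of work for an older and a newer GPU.
# Every number here is hypothetical and only illustrates the comparison.

def energy_cost_per_unit_work(power_kw, units_per_hour, price_per_kwh):
    """Electricity cost of producing one unit of work."""
    return power_kw * price_per_kwh / units_per_hour


PRICE_PER_KWH = 0.10  # hypothetical USD per kWh

old_gpu = energy_cost_per_unit_work(power_kw=0.3, units_per_hour=100, price_per_kwh=PRICE_PER_KWH)
new_gpu = energy_cost_per_unit_work(power_kw=0.7, units_per_hour=700, price_per_kwh=PRICE_PER_KWH)

print(old_gpu / new_gpu)
```

In this example the newer card draws more power in absolute terms but is about three times cheaper per unit of work, which is the sense in which the older card loses value even if it never breaks.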

satvikpendem

12 hours ago

Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.

> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.

https://www.anthropic.com/research/2028-ai-leadership

CuriouslyC

12 hours ago

If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.

arealaccount

11 hours ago

The “you wouldn’t download a car” meme applies here

WarmWash

11 hours ago

They'll ban them because (unless run locally or self-hosted) they are just data capture tools for China.

corpoposter

8 hours ago

If it’s open weight then anyone can run it for you. Presumably someone you trust just as much as US proprietary models.

dzhiurgis

7 hours ago

I don't think they'll offer open models for long. Since they've actually invested in power, cheap chips, cheap memory and can subsidize tokens - they'll keep undercutting big models to capture data forever. Bonus if they remove ridiculous safeguards and China will be unstoppable.

atwrk

2 hours ago

Pretty sure they'll offer them at least so long as it takes to bring OpenAI and Anthropic into insolvency. Why wouldn't they? The Chinese models are way more nimble to train and run, bring in a ton of goodwill globally, and put immense pressure on the VC furnace that is the US AI sector.

And apparently OpenAI and Anthropic think so, too - why else would they try so hard to ban them instead of outcompeting them?

dakolli

10 hours ago

Please explain to me how that works. If I download gguf file and run inference with it, how is it collecting and sending data back to China?

This makes no sense, 99% of the people using Chinese models are using them via Western inference providers who are running them and serving them to people over openrouter or whatever. If anyone is stealing your data it would be an American or European inference provider. A model has no ability to send data anywhere.

China bad by default, right?

postsantum

21 minutes ago

You will see soon that China uses illegal Uyghur child labor to train these models, so we should all boycott them

satvikpendem

9 hours ago

> unless run locally or self-hosted

throwyu8

8 hours ago

China is the worst trading partner in the world. They banned most companies from functioning in their country for decades

evolighting

an hour ago

So, have you ever been to China and could hardly find anything familiar?

- Oh, they must have been blocked from entering the Chinese market!

But none of that is true. You could see global brands everywhere here — Tesla, Unilever, KFC, Apple, and so on.

---

Or have you ever actually done cross-border trade? Or any international business collaboration? If you had, you’d definitely realize that what’s really stopping you is U.S. legislation. At least, that was the case with our former U.S. partner

Animats

13 hours ago

Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.

davedx

2 hours ago

Meanwhile, Google...

stingraycharles

an hour ago

Google also needs fabs to build their TPUs.

kristianp

9 hours ago

> The world is out of fab capacity.

Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.

stingraycharles

an hour ago

From what I understand it’s mostly TSMC and the memory providers being out of capacity over the next few years.

So it’s not even about datacenters.

Here’s a Reuters article about TSMC: https://www.reuters.com/world/asia-pacific/broadcom-flags-su...

So this is actual committed contracts with all kinds of companies such as Apple, NVidia, AMD.

Also, the whole reason they can’t build data centers faster is precisely because of this.

no-name-here

an hour ago

> they can't build datacenters at anywhere near the rate they want to

That was because the supplies the datacentre needed were constrained - supply-constrained, not end-user demand constrained, so would be in agreement with the GP comment (and the article I read didn't imply anything about lying).

EA-3167

12 hours ago

Seriously, they’re trying to justify trillion+ IPOs while setting piles of money on fire; prices aren’t going DOWN.

criddell

12 hours ago

Today's frontier models will be tomorrow's low-end option. I think whatever model you are using today will be less expensive to use a year or two from now.

missedthecue

12 hours ago

Last year's o3 was more expensive than 5.5 is. Whatever model we are using now will probably be more expensive than next year's leading models will be.

Insanity

11 hours ago

Price per million tokens is also a fuzzy metric when newer models reason longer, and then burn more tokens while doing so.

oblio

10 hours ago

Isn't 5.5 a router, though? As in, some prompts get automatically sent to a cheaper model?

dakolli

10 hours ago

They aren't going down, but in the meantime they'll cover their ass by bribing their way into the S&P 500 and then use your 60 year old mother's 401k and teacher's pension to fund their risky capital expenditure.

LastTrain

9 hours ago

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.

freediddy

14 hours ago

Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.

eikenberry

13 hours ago

Then don't use the cloud-based Chinese providers; use cloud-based US/EU providers running Chinese models. The interesting Chinese models are all open, making this issue mostly moot.

daemin

4 hours ago

A key point here is open in terms of being able to download and use it, not open as knowing what data and instructions were fed into it when training.

A paranoid part of me thinks that these models are all inherently biased and instructed to be pro CCP, with specific gaps in their training data related to undesirable historic events and political ideas.

viking123

12 minutes ago

Applies both ways, ask it about Israel.

therealdrag0

3 hours ago

Sure but that goes both ways. Any dataset has a bias. My coding doesn’t need to know about Tiananmen Square.

ceejayoz

14 hours ago

Saner companies ask the same question about models from their own country too.

rd

13 hours ago

I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.

CobrastanJorji

13 hours ago

Here's a free startup idea: operate an open-weight model service, and offer "Verified AI Integrity," which signs the input tokens, the seed for the randomness in selecting outputs, and the model ID, proving that the result of the call to AI was completely "organic" and was not interfered with.

Your main audience would be snake oil salesmen trying to prove their AI products are unbiased and not under the thumb of any outside influence. This doesn't address the biases of the model itself, but that's not your business. Your business is selling tokens and security certificates. If you can get the right angel investor, you could maybe have your new standard required for some government applications.
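The signing step described in this comment could look roughly like the sketch below. It uses an HMAC with a shared secret purely for brevity; a real service would more plausibly use asymmetric signatures so that third parties can verify without holding the key. The function names and the record layout are hypothetical.

```python
import hashlib
import hmac
import json


def sign_call(secret_key, model_id, seed, input_tokens):
    """Sign the model ID, sampling seed, and input tokens of one call."""
    # Canonical JSON so the same call always yields the same bytes.
    payload = json.dumps(
        {"model_id": model_id, "seed": seed, "input_tokens": input_tokens},
        sort_keys=True,
        separators=(",", ":"),
    ).encode("utf-8")
    return hmac.new(secret_key, payload, hashlib.sha256).hexdigest()


def verify_call(secret_key, model_id, seed, input_tokens, signature):
    """Check that a signature matches the claimed call parameters."""
    expected = sign_call(secret_key, model_id, seed, input_tokens)
    return hmac.compare_digest(expected, signature)


key = b"hypothetical-secret"
sig = sign_call(key, "open-model-v1", 42, [101, 2023, 2003])
print(verify_call(key, "open-model-v1", 42, [101, 2023, 2003], sig))
print(verify_call(key, "open-model-v1", 43, [101, 2023, 2003], sig))
```

As the comment itself notes, a signature like this only shows that the recorded inputs were not altered; it says nothing about biases in the model.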

tokioyoyo

13 hours ago

Yes, you can. There are multiple inference providers out there. The problem is, it’s hard to beat the Chinese providers in cost. And you also have to compete with frontier model providers’ subsidized offerings.

dakolli

10 hours ago

They charge the exact same prices. So many people in these comments have no idea what they're talking about. Even if they did charge less, nobody is going to deal with the latency of sending requests to China.

edit: Actually American inference providers are cheaper for Chinese models. There's way more competition here, because the Chinese aren't idiots investing every last dollar they have into data centers for LLMs that don't make money.

tokioyoyo

9 hours ago

Can you please link me DeepSeekV4 provider that's cheaper than their official offering? And not all tasks require low latency.

Also, there is a lot of competition in China. Like a lot. You might know better than me as well, but although the biggest AI labs are based in the USA, the adoption is weirdly global. Like as a general sense of what's going on - you can see AI-related ads literally everywhere in Tokyo, almost all the time, on every single screen in public.

_matthew_

4 hours ago

Crof.ai seems to be: https://crof.ai/

Of course though they are not necessarily a viable solution for companies with security requirements etc. given it is just a single person project, but they still serve as a proof it can be done.

dakolli

3 hours ago

This costs more.

dakolli

7 hours ago

Deepseek's api platform for V4 Pro is the only example of this, and Deepseek V4 Flash is cheaper (usually) than from Deepseek itself on openrouter via DeepInfra.

Deepseek shot themselves in the foot because they never intended to serve V4 Pro for $0.80 per million output tokens; that was a promotional price that was meant to expire (and still might). They intended for v4 to cost $4.00 per million, but Western inference providers drove down the price because they can operate at negative margins to try and push competition out. I can assure you they are losing a ton of money at ~80 cents.

My point is, it's Western inference providers that are establishing the floor price of inference. They are willing to operate at a loss in order to put their competition out of business. Chinese providers are typically at or above the prices set by American/Western providers if you go looking on the Chinese internet. You aren't going to get deals from China for inference except through this one instance with Deepseek v4 Pro, which wasn't even supposed to be permanent pricing.

RussianCow

9 hours ago

By "cost" I think the parent means the provider's own costs, not the cost of inference to the customer. The cost of land, labor, and electricity are significantly lower in China than in the US.

mediaman

13 hours ago

There are plenty of US-based inference providers available, including AWS, that serve Chinese models at competitive prices (vs frontier US models). They also have lots of usage. Not necessarily for coding, but for other enterprise tasks.

dakolli

10 hours ago

Have you heard of openrouter? There's 1000 of these companies already. Do something else.

fg137

12 hours ago

It's called AWS. Bedrock is right there. Price or data policy is never the issue. The models themselves are the problem -- most large US companies are not going to touch them.

Source: directly involved in these discussions. You can downvote as much as you'd like but you can't ignore the facts.

RussianCow

9 hours ago

> The models themselves are the problem -- most large US companies are not going to touch them.

Can you expand on this?

amunozo

13 hours ago

You can run DeepSeek as it's open weights, unlike Claude or GPT.

tmp10423288442

13 hours ago

There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.

cheeze

14 hours ago

Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company

KronisLV

10 hours ago

> Deepseek has some models in Bedrock.

Just looked into it, seems like at most they have just 3.2, not 4: https://aws.amazon.com/bedrock/pricing/

Looking around their catalogue more, most of their models seem quite outdated, aside from the OpenAI and Anthropic ones (but those get more expensive). I wouldn't willingly pick Bedrock and would instead throw money at OpenRouter, that has both a bunch of providers, as well as almost any model for you to try.

ed_elliott_asc

3 hours ago

If Anthropic are then they are making a big mistake, their token hungry Claude code is far too greedy

testdelacc1

14 hours ago

Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.

sevenzero

14 hours ago

I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...

apsurd

14 hours ago

Makes me think of how my CLAUDE.md file specifies to use the built-in framework code generators (Rails). Those generators are deterministically right every time.

I wonder how often the Agent actually follows the guidance. I do see them follow it when I look. But it doesn't seem so every time.

thefunnyman

14 hours ago

This is tricky since it can and will ignore your md directions. When possible I try to lean on tool call hooks or skills that invoke deterministic scripts. As much as you can remove the "choice" the better though still there's a lot of randomness in how reliably it invokes skills ime.

internet101010

4 hours ago

Hooks are incredibly underused by most people and are the easiest way to establish a first line of defense against bad behavior. Things like blocking tool calls that will read .env file or execute "create or replace table".
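To make the two examples above concrete, here is a minimal sketch of the kind of check such a hook could run before a tool call goes through. It is not the hook interface of any particular agent: the `check_tool_call` function, the rule list, and the tool names in the usage lines are all hypothetical, and how the result is wired into a real harness is left out.

```python
import re

# Hypothetical deny rules, mirroring the two examples in the comment above.
DENY_RULES = [
    # ".env" as a path component (".env", "config/.env", ".env.local"),
    # but not words that merely start with "env" such as ".environment".
    (re.compile(r"(^|[\s/])\.env\b"), "touches a .env file"),
    # Destructive SQL, matched case-insensitively with flexible whitespace.
    (re.compile(r"create\s+or\s+replace\s+table", re.IGNORECASE),
     "runs CREATE OR REPLACE TABLE"),
]


def check_tool_call(tool_name: str, tool_input: dict) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed tool call.

    tool_input is assumed to be a flat dict of argument name -> value;
    every value is scanned as text against the deny rules.
    """
    text = " ".join(str(value) for value in tool_input.values())
    for pattern, reason in DENY_RULES:
        if pattern.search(text):
            return False, f"{tool_name} blocked: {reason}"
    return True, "ok"


if __name__ == "__main__":
    print(check_tool_call("Read", {"file_path": "config/.env"}))
    print(check_tool_call("Bash", {"command": "ls src/"}))
```

The point of doing this in a deterministic script rather than in the prompt is that the model never gets to "choose" whether to follow the rule.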

sfn42

14 hours ago

A lot of the time if you're copying code from one place to another what you actually want to do is abstract it so you can reuse it in both places.

The LLM can easily do this type of stuff, just tell it and it'll happily do it. This is exactly what I mean when I tell people they need to work closer with the AI, tell it how to do things. Don't just tell it what to do and get frustrated when it does it differently than you would.

A good way to achieve this without writing huge prompts is tell it to plan the change first. Just give it some vague low-effort directions. It'll usually get most things right, you tell it what you want different and once you're happy you tell it to go ahead.

sevenzero

14 hours ago

Nah the codebase is legacy fucked and I can't be bothered to try and optimize business flows without the fear of other stuff breaking.

Claude 100% of the time even thinks we use Laravel despite the project being some old Lumen codebase, so most of Laravel's features are not available. It also gets the PHP version we are using wrong 100% of the time.

sfn42

8 hours ago

Have you tried adding this information to claude.md so it knows?

I also think your excuse is bad. "The code is legacy fucked so I'll just legacy fuck it some more because I can't be bothered to make an effort"

adithyassekhar

7 hours ago

This is a spicy take, unless the business is willing to face some down time, and I am hired to do exactly what you said, I’d never touch any line of code unless I absolutely have to. Different environments don’t help as much.

We tend to obsess over software quality when it’s the least important thing for a business. It’s just a means to an end.

sevenzero

4 hours ago

This is what it's about, we have multiple ecom shops running 24/7 and simply can't afford downtime or a change of business flow that maybe doesn't affect shop A and B but definitely affects shop C and D...

sevenzero

4 hours ago

Are you some kind of entitled corporate dev that barely has any influence on the codebase? If I fuck up, a whole business goes down as I am the only dev there currently. We can't afford that happening. Also why would I mess with anything claude.md related? I just use the CLI tool. LLM enthusiasts always claim how smart these things are so they should figure it out on their own, you know?

sfn42

an hour ago

I have full control of my codebase. I'm not afraid to make changes to it because I know what I'm doing.

You would edit Claude.md to say things like what tech the project is using, because that's the entire point of claude.md. It's literally the solution to the exact problem you're complaining about. Any information you want it to know, you put in there and then it knows it. And you can tell Claude to make or update the file for you.

I'm not one of the people telling you how smart LLMs are. I'm telling you how to use it efficiently, by not expecting it to know everything but rather provide the information that it needs in order to be a more useful tool.

bigbuppo

8 hours ago

They're going to need to bring in a few trillion dollars fast to meet wall street expectations. Expect prices to rise.

SecretDreams

14 hours ago

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.

aDyslecticCrow

HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.

I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.

For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.

dghlsakjg

13 hours ago

The majority of Deepseek providers on OpenRouter for v4 pro are in the US. Especially interesting is that they are in the same ballpark for pricing.

eikenberry

13 hours ago

They are in the same ballpark for deepseek-v4-flash, but deepseek-v4-pro from deepseek is still around 1/2 of the alternatives.

dghlsakjg

13 hours ago

I'm pretty sure that Deepseek said that pricing was promotional. Be curious to see if it lasts.

V3 pricing from them was right in line with what the commodity providers are charging.

eikenberry

12 hours ago

They announced a few weeks back that the promotional pricing was permanent.

alpinisme

14 hours ago

“Any” is a very high bar. Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.

dkersten

13 hours ago

Together.ai provide many open weights models and as far as I’m aware their servers are US based (the company certainly is).

lowbloodsugar

14 hours ago

Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!

Not everyone using AI is using it to code core value IP.

vinzenzu

an hour ago

API prices of Anthropic, OpenAI, and Google are massively inflated.

https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...

There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.

Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.

thundergolfer

10 hours ago

> That means each employee's AI spending cap is ~11% of that median compensation package.

Probably better to use the fully-loaded cost of the engineer, which is much higher than their compensation package. The fully-loaded cost is the total cost paid for the labor power of the engineer, and it includes big ticket items such as office space, food, equipment, insurance, payroll tax, fringe benefits, recruiting costs.
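A quick sketch of how much the denominator matters here. The $330,000 median compensation figure and the ~11% share are the article's numbers as quoted in this thread; the 2x fully-loaded multiplier is purely an assumption for illustration, not a figure from the article.

```python
MEDIAN_COMP = 330_000      # median yearly compensation quoted from the article
QUOTED_SHARE = 0.11        # "~11% of that median compensation package"
LOADED_MULTIPLIER = 2.0    # ASSUMPTION: fully-loaded cost = 2x compensation


def cap_share(annual_cap: float, annual_cost: float) -> float:
    """AI spending cap as a fraction of the yearly cost of the engineer."""
    return annual_cap / annual_cost


# Annual cap implied by the quoted share of compensation.
implied_annual_cap = QUOTED_SHARE * MEDIAN_COMP

# Same cap, measured against the assumed fully-loaded cost instead.
loaded_cost = MEDIAN_COMP * LOADED_MULTIPLIER
share_of_loaded = cap_share(implied_annual_cap, loaded_cost)

if __name__ == "__main__":
    print(f"share of compensation:      {QUOTED_SHARE:.1%}")
    print(f"share of fully-loaded cost: {share_of_loaded:.1%}")
```

Whatever multiplier you believe in, the share scales down by exactly that factor, so the cap looks noticeably smaller against fully-loaded cost than against compensation alone.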

ptero

7 hours ago

While the fully burdened cost of an engineer being double his salary sounds suspicious, this is indeed broadly the case. It has been (sometimes significantly) more than double at every US employer where I worked and where I saw both numbers. In one case it was a hair under 3x.

My experience was not with pure software houses; we had some labs, measurement and RF equipment, but even without the hardware component the offices, insurance, admin expenses, HR, janitors, conference travel and so on would easily bump the total employee cost to double the salary. My 2c.

stingraycharles

8 hours ago

I’ve even heard the rule “twice the salary” being used here in EU, but the tax and insurance burden may be higher. All kinds of those are based primarily on total payroll amount.

consp

2 hours ago

That number usually includes cost of habitat and others. It's also a stupid number as it is skewed by how much you can squeeze out of your employees. A better number would be to compare it vs revenue per capita.

notnullorvoid

7 hours ago

Both metrics are valuable.

If one uses AI minimally and is able to outperform peers who are maxing out AI spend, one might want to use that in salary negotiations.

ransom1538

9 hours ago

"$330k/year" Lol. I thought I clicked on hacker news 2022.

barumrho

6 hours ago

Is it too high or too low? Honestly cannot tell

random__duck

5 hours ago

Quoting the article : > Levels.fyi lists the median yearly compensation package for Uber software engineers in the USA at $330,000.

stego-tech

9 hours ago

It’s also worth noting that’s the peak benefit. Expect most engineers to not hit those limits on the regular (if at all, since limiting this puts skills in focus again), and that limit to come down over time as the easy processes are automated and humans are re-tasked with harder problems relative to their TC.

This is not a good bellwether for the AI industry, including its adherents. Their growth assumed a level of indispensability that’s not being reflected in hard numbers and real costs, which lends credence to the notion that these IPOs being fast-tracked are meant to try and cash out before the bubble really pops in earnest. There’s no way consuming enterprises are going to pay such insane costs for such minimal uplift in the long run, and the AI companies can’t keep offering subsidized tokens via subscription plans at their current pricing.

f311a

19 hours ago

How many more months do we need to wait, until big companies realize that flash models work just fine if you:

1) Don't ask LLMs for big changes

2) Review everything and point them in the right direction

Large models still suck at big changes, they produce questionable architecture and you still have to review the code, if your project is serious enough.

The codebase quickly becomes a mess if you don't pay enough attention. It does not matter which model.

So why bother with big models, when flash models are 10x cheaper and much faster to iterate under guidance? Large models can be used for security and bug audits. Flash models work almost the same for changes under 300 LOC when you dictate how you want your code to look.

_jab

13 hours ago

It's pretty simple; organizations are willing to tolerate paying $1500/month/engineer, which seems to be roughly in line with "normal" consumption for most full-time engineers. If that number grows significantly, then I bet companies will start exploring flash models more, as you propose.

lavezzi

13 hours ago

They are willing to tolerate it now, which is quite a switch from the free-for-all we had a few weeks ago, and if they aren’t able to tie this new ~$1,500/month cap to demonstrable productivity and revenue increases, then it will be kneecapped even faster.

phreeza

2 hours ago

There are plenty of expenses in this order of magnitude that are not tied to direct increases in productivity. I think it may become a serious hiring impediment for companies to be really skimpy on these budgets for example.

rudedogg

12 hours ago

> organizations are willing to tolerate paying $1500/month/engineer

One organization, that is a software company

> which seems to be roughly in line with "normal" consumption for most full-time engineers

My peers are using $20/mo plans, only a handful are using more than $100/mo in tokens. We haven’t had any limits imposed yet.

epolanski

10 hours ago

Which organizations?

Uber is not representative of any trend beyond big tech and VC over funded startups.

mrothroc

13 hours ago

The easy decision is to just go with the biggest SOTA model you can afford.

But this overlooks the other critical part of getting the most out of these things: the harness. I run an autonomous plan/design/code/build/test pipeline with agents using my own orchestrator. Different models are better at different stages, and I use LLMs to judge the output between them. Not everything needs Opus 4.8.

The harness provides both the scaffolding to get the right things into the model, and the right things out. But it also lets you dictate which model does which work.

It's the pipeline, not the model, that gets you quality at a given token budget.
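A minimal sketch of what "the pipeline dictates which model does which work" can look like in practice. The stage names come from the comment above; everything else (which tier handles which stage, the token estimates, and the per-million-token prices) is a made-up placeholder, not the commenter's actual setup or any provider's pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    model: str        # model tier assigned to this stage
    est_tokens: int   # rough token estimate for the stage


# ASSUMPTION: placeholder prices in dollars per million tokens.
PRICE_PER_MTOK = {"large": 10.0, "flash": 1.0}

# ASSUMPTION: an illustrative split, large model only where judgment matters.
PIPELINE = [
    Stage("plan", "large", 20_000),
    Stage("design", "large", 30_000),
    Stage("code", "flash", 200_000),
    Stage("build", "flash", 10_000),
    Stage("test", "flash", 50_000),
]


def pipeline_cost(stages: list[Stage], prices: dict[str, float]) -> float:
    """Cost of one run when each stage uses its assigned model tier."""
    return sum(s.est_tokens / 1_000_000 * prices[s.model] for s in stages)


def single_model_cost(stages: list[Stage], price: float) -> float:
    """Cost of the same run if every stage used one model at `price`."""
    return sum(s.est_tokens for s in stages) / 1_000_000 * price


if __name__ == "__main__":
    mixed = pipeline_cost(PIPELINE, PRICE_PER_MTOK)
    all_large = single_model_cost(PIPELINE, PRICE_PER_MTOK["large"])
    print(f"mixed pipeline: ${mixed:.2f} per run")
    print(f"all large:      ${all_large:.2f} per run")
```

With these placeholder numbers most of the tokens land in the cheap stages, which is where the savings come from; the real ratio depends entirely on your own workload.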

jmtulloss

10 hours ago

Is your argument that $1500 / mo is too much? Why would the engineering team not be more rigorous in their model selection given a constraint?

gravypod

10 hours ago

If you had a business task to complete that was only possible with ai and it cost you >$1500/month of work, how long would you have to delay the task so that it's cheaper long run to buy hardware and do local models?

$1,500/mo * 14 months = $21,000.

If local models are 14 months behind, as many on HN say, it may be profitable to just wait. Maybe just spend a few hundred dollars of your tokens and buy hardware piece by piece.
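The arithmetic above as a small break-even helper. The $1,500/month and 14-month figures are from the comment; the running-cost parameter (electricity, maintenance) and the $300 example value are assumptions added for illustration.

```python
def breakeven_months(hardware_cost: float,
                     monthly_api_spend: float,
                     monthly_running_cost: float = 0.0) -> float:
    """Months until buying hardware is cheaper than continuing to pay for API use.

    monthly_running_cost is an assumed extra (power, upkeep) that local
    hardware incurs and the API bill does not.
    """
    monthly_saving = monthly_api_spend - monthly_running_cost
    if monthly_saving <= 0:
        raise ValueError("local hardware never pays for itself at these rates")
    return hardware_cost / monthly_saving


if __name__ == "__main__":
    # Budget freed up by 14 months of $1,500/month spend.
    budget = 1_500 * 14
    print(budget)                                # 21000
    print(breakeven_months(budget, 1_500))       # 14.0
    print(breakeven_months(budget, 1_500, 300))  # 17.5
```

Note this only covers the money side; it says nothing about the opportunity cost of waiting or whether a local model is good enough for the task.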

therealdrag0

3 hours ago

Nearly no one is doing anything that is “only possible with AI”. This doesn’t seem like a relevant calculation. People spend on AI as an investment in their current productivity.

pchristensen

9 hours ago

There's a lot of opportunity cost to waiting 14 months to build something.

garrickvanburen

7 hours ago

I agree, outside of the AI bubble, there's a lot of wait-and-see happening in the B2B world right now, I'd say we're currently 6-8 months into that 14 months.

edmundsauto

6 hours ago

It also presupposes that open models will bridge that gap towards Opus 4.5, which was really when I drank the AI coding koolaid.

econ

14 hours ago

I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly? Perhaps, if it can measure complexity, even generate a quote?

Small models are fine for small coding tasks but I don't see why big ones can't be broken down most of the time.

AgentMasterRace

13 hours ago

Many harnesses do this. I've recently dropped all my big subscriptions in favor of using deepseek. Codewhale (formerly deepseek-tui) will use pro for large tasks and route smaller ones to flash. It's pretty good, but I just use pro for everything as the cost is quite low.

This one does not have routing, but reasonix is insane, absolutely insane for saving money. I've used 1.3 billion tokens at a cost of $4 (99-100% cache hit).

ValentineC

13 hours ago

> I wonder to what extent models should figure out which model to forward a query to. Or perhaps the big models could learn the difference between an easy and a hard question and charge accordingly?

This sounds like something a harness could do (and might already be doing), with work delegated to subagents running on lower-cost models.
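A toy version of that delegation decision, to show how little is needed for a first pass. This is a heuristic sketch, not how any existing harness routes: the `pick_model` function, the "pro"/"flash" tier labels, the keyword list, and the thresholds are all assumptions (the 300-line cutoff borrows the "changes under 300 LOC" rule of thumb mentioned elsewhere in this thread).

```python
# ASSUMPTION: keywords that suggest a task needs the larger model.
HARD_KEYWORDS = ("architecture", "security", "audit", "migrate", "redesign")


def pick_model(task: str, files_touched: int, est_changed_lines: int) -> str:
    """Return "pro" for tasks that look large or risky, otherwise "flash"."""
    looks_hard = any(word in task.lower() for word in HARD_KEYWORDS)
    too_big = files_touched > 3 or est_changed_lines > 300
    return "pro" if looks_hard or too_big else "flash"


if __name__ == "__main__":
    print(pick_model("center the div on the login page", 1, 5))
    print(pick_model("security audit of the payment flow", 2, 0))
    print(pick_model("rename a helper across the codebase", 12, 40))
```

A real harness would more likely ask a cheap model to classify the task than rely on keywords, but the shape of the decision is the same.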

jorl17

12 hours ago

Yes, they are all already doing this

andersmurphy

12 hours ago

This a thousand times. The bigger models also have a habit of overcomplicating things.

warmwaffles

14 hours ago

> Don't ask LLMs for big changes

> Review everything and point them in the right direction

Sorry upper management doesn't care. That's an engineering problem that you need to solve.

eikenberry

14 hours ago

They were proposing a solution: to use flash models and use them in a way that best amplifies your work.

AgentMasterRace

13 hours ago

He was making a joke.

epolanski

10 hours ago

I'm legit annoyed at opus 4.8 at any setting above 4.8.

I believe it can be great for vibe coding, but mundane day work? Hell no, I'd rather work with Haiku. It's too slow, checks too many things, it's annoying as hell.

thesumofall

an hour ago

Plenty of comparisons here between salaries and token costs. All fair, but they very much assume that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location? The WFH discussion surfaced some of that. If money is cheap, all sorts of funny things happen. Is it worth spending 1500 USD on AI? I don’t know. Is it worth paying engineers 300k USD instead of 30k? Honestly, I don’t know.

bjackman

21 minutes ago

As well as rational vs irrational they are also just different types of spending.

Hiring someone vs paying a vendor for a service:

- different level of commitment

- might tie your org to a physical location

- different legal risks

- shows investors a different picture (probably this would even influence a bank loan)

- manager has to fight a different bureaucracy

Not to mention that comparing the cost of a hire by looking at their salary is pretty dumb. ISTR hearing at Google that the overall estimated cost of employing a SWE is like 4X their compensation? Can't remember the exact figures though.

palmotea

37 minutes ago

> All fair but very much assumes that salaries are rational. Why do we pay some engineers 10x as much for the same role just because they are in a different location?

Who's this "we" you're talking about? Are you a software engineer or a temporarily embarrassed billionaire? Do you think the rational thing is to pay the lowest regional salary worldwide?

inemesitaffia

23 minutes ago

If your competitors do, you likely will

tuesdaynight

14 hours ago

Why are there so many people who still believe that AI coding is a fad? It's something that started less than two years ago and companies are already paying thousands per seat. I know one that gives you 5k per month. Which other tool went from nothing to this level of acceptance so quickly?

OptionOfT

13 hours ago

Because companies are betting that this spending will allow them to reduce cost by firing people.

Right now the AI LLM PRs we're seeing are just introducing more work for other people, while these so-called builders are looking good with their new dashboards and functionality they're demoing.

But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

It's not built up from the ground with experience from x people taken into account. It's materialized from nothing, with no foundational separation, and barely any abstractions.

No one wants to touch it. The PRs are too large, and the 'authors' of the PRs aren't on call with us.

They get all the glory, but do none of the work.

It's kinda like designing a house and then sending it to an architect and engineer saying: make this work.

saulpw

13 hours ago

> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

You can absolutely do this. It's even right most of the time.

chmod775

12 hours ago

Let's be real. Most of the time you ask an LLM "Why did you do it like this?", it responds with something along the lines of "Oops. My bad. You're right to point this out."

You even have a fair chance of getting a response like that when there isn't anything wrong and the question wasn't rhetorical - which perfectly illustrates the level of the genuine understanding LLMs operate at.

seventhtiger

12 hours ago

When you criticize AI, always remember that the alternative is the average employee. Today's models are pretty good.

devin

11 hours ago

A lot of people think they're above average. A lot of them are wrong.

A lot of average people are producing gigantic messes. At least previous to this they were gated by their mediocrity.

Frieren

3 hours ago

> the alternative is the average employee. Today's models are pretty good.

I have never seen anywhere in the world people who hate the working class as much as people do in the USA.

In my country the average employee is competent; they do their work and create wealth for the nation.

Again, only in the USA do people think that billionaires are the ones creating value. Total nonsense indoctrination.

cyh555

3 hours ago

and have they totally got rid of the average employees? They can blame the models for the production outages already?

djeastm

11 hours ago

I remember hearing (perhaps last year?) that the model companies have specifically tried to obfuscate the "thinking/reasoning" behind the decisions the models make so as to prevent cheaper models from training on the reasoning logs. So asking one "why did you do it like this" might be not fruitful.

Not sure if that's true or if it might be influencing what you're seeing, but it's a thought.

NewsaHackO

10 hours ago

I think that has to do more with the thinking "train of thought" that some models show as what the model is processing before making the response. There shouldn't be a distillation risk with actually asking the model to explain why it made a decision and getting the response.

saulpw

12 hours ago

This has happened to me, so I put this in my global CLAUDE.md, and it seems to help (I don't remember getting the response you mentioned for awhile now):

    **Lead with the answer when asked how/which/whether.** Name the command/mechanism first; a question seeking understanding isn't a go-ahead to execute. Answer, then offer to act.

baggy_trough

12 hours ago

Can't remember the last time that happened.

javier2

11 hours ago

Happened to me at least three times the past 14 days. I point out where it made a design decision that causes data loss. «Oops my mistake»

therealdrag0

3 hours ago

So what? That doesn’t negate the value they provide.

dmayle

11 hours ago

That's because of a fundamental misunderstanding of what an LLM is. The only correct answer to "Why did you do it like this?" is that the specific combination of input text and RNG state caused this particular output. There's no reasoning to be had.

* EDIT * What's with the downvoting? That's a correct description of what happened. You can't ask an LLM why it did something and expect a coherent response, because there's no thinking chain, and no stored thinking state... At best, you can get a reconstruction of how the context relates to the output (basically a summarization of the context).

datsci_est_2015

13 hours ago

I believe the “them” the OP was talking about was referring to the people opening the PRs, not the LLMs.

saulpw

12 hours ago

My mistake, that is definitely a different scene.

ssss11

11 hours ago

And you can certainly tell it the flow you want (and any other constraints) in the prompt.

com2kid

5 hours ago

> Because companies are betting that this spending will allow them to reduce cost by firing people.

I've never worked at a company that didn't have a technical backlog measured in years.

scubbo

10 hours ago

> But you can't talk to them about the flow of the code. You can't ask them for their thinking as to why certain things are.

There are plenty of valid criticisms or warnings about over-reliance on AI coding, but this is not one of them. Today, I am using a semi-autonomous agentic coding system which has an `interview` functionality built in - when it spits out the PR from the input, if you have questions about the motivation or context for a particular choice, you can start up a clone of the original agent in a sandbox to question it.

Now, you might claim that those responses aren't always reliable, accurate, or consistent, and that claim has a little more weight (though, in my experience, decreasingly so) - but it is _certainly_ not the case that you cannot interview an agent about choices made. I'm literally doing it every day.

OptionOfT

7 hours ago

Sorry, I meant interviewing the PR author for certain choices.

scuff3d

There are so many. Can I start with Oracle databases?

therealdrag0

3 hours ago

Oracle DBs have powered enormous numbers of applications and economic value.

overfeed

an hour ago

> Can you name a service that charged companies thousands/seat/month that turned out to be almost or completely useless?

The Concorde turned out to be a fad (not "useless", which was your reframing). Touted as the future of travel, each seat cost about $20,000 in today's dollars, but it turned out that even at the high prices people and companies were willing to pay per passenger, supersonic trans-Atlantic air travel was not economically viable, and it was discontinued.

mike_hock

11 hours ago

Every consultant ever, but to be fair that's not per seat.

sdevonoes

10 hours ago

Not a service, but do you remember Scrum Masters? We had them as full time employees not so long ago. Pure fad.

therealdrag0

3 hours ago

Hah. Great example actually. But far less common than AI afaict.

ThunderSizzle

9 hours ago

I hope this is sarcasm, or "half" my job doesn't exist or something. Or are you talking about full-time non-dev scrum masters?

oblio

6 hours ago

Yes

ipaddr

8 hours ago

Oracle and some company wide Microsoft licenses.

therealdrag0

3 hours ago

These are clearly useful.

Kiro

12 hours ago

It's just silly to claim it has zero correlation.

tokioyoyo

13 hours ago

“AI coding is a fad” is not just one big camp of similar-minded people. Different groups have to give up on their pre-existing beliefs in order to be ok with AI coding.

Think of people who were very strict with variable names. People who pushed for multiple-levels deep of abstractions for a single API logic that’s not going to be reused. People who believed that coding is craft, rather than just a process to get to the end during work hours. This makes most of these people’s points more-or-less moot.

I was in some of those camps, but I’ve seen coding evolve in the last 15 years. So I understand that these priors need to be updated, as most arguments don’t apply to today’s world.

devin

11 hours ago

"as most arguments don't apply to today's world" makes me want to roll my eyes so hard at you. The vast majority of problems we had with building complicated systems are all still just sitting there. People are speedrunning relearning things we've known about software engineering for decades.

The more things change, the more they stay the same.

rootusrootus

11 hours ago

Between AI and the stock market (which of course relates directly to AI), I’ve lost count of the number of times I’ve heard lately another variation of “this time is different.” Sometimes so close to those words that I wonder why the person speaking them doesn’t feel a bit tingly. Great big warning signs all around.

tokioyoyo

11 hours ago

The examples I gave, and the arguments that usually support them don’t really translate into “building complicated systems”. I was talking about the arguments in support of variable naming flamewars, etc.

I’m not a proponent of AI generating everything without any supervision as of now. But I'm willing to change my mind when it gets better.

Most software engineering jobs are not cutting-edge tech, or research, or solving unsolved problems. Integrations, APIs, figma-to-react pipelines, devops, etc. are what people get hired for. All of those can be done much faster at the same or better quality by an experienced person supplemented by AI. It’s hard to imagine any company would go against the grain and slow things down on purpose.

devin

10 hours ago

So I accept that “nonsense arguments are nonsense”, but with some minor differences of opinion. Naming of things matters insofar as you care as a human to actually conceptualize the system you’re building. You can call all of this stuff minutiae, and on some level I kind of agree, except for the general vibe of _caring about the quality of the stuff you produce_. That is something that still matters whether it “works”. Like, yes you can get an LLM to gen some junk, but _is it any good_ is still something you are in charge of.

As far as “boring systems are boring”, I can tell you from experience that I work on a pretty boring system, and AI is not all that meaningful in terms of its impact, and it’s not for a lack of trying.

Can it help me create a migration and add an endpoint and such? Sure. But those aren’t the hard problems. They never were.

It’s funny that you think the idea of slowing down is such a bad one, but it is another well-established truth. Slow is smooth, and smooth is fast. This notion of break/fixing your way to prosperity by way of 10,000 ill-conceived PRs is a fool’s game.

tokioyoyo

10 hours ago

I'm sorry, you might be right. But this simply doesn't reflect my daily reality. All I can say is, nobody in my org is creating 10,000 PRs. But everyone is using Claude Code for virtually all commits. We've been doing it since about Opus 4.5ish. So far, so good.

Generally we've modified our timelines heavily, systems are working as intended, company is still making money. There are some AI-authored commits that had mistakes that we didn't catch, but I'm sure this could've been an issue even if all were human-authored. I know first-hand multiple other companies who are doing exactly the same thing.

I agree with "slow is smooth, and smooth is fast" for mission critical systems. But a supermajority of systems are, indeed, not mission critical.

therealdrag0

3 hours ago

I have the same experience. Slow is smooth with AI is still a productivity improvement.

fragmede

13 hours ago

What's an int vs a float vs a boolean? What's a function? What's a class? What's a variable? You don't actually need to know the answer to those questions in order to vibe code. That's a lot of priors to update!

tokioyoyo

13 hours ago

Just to go on record, as of today, I’m a big believer that a person that knows all that stuff is much more productive with AI-coding than a person who doesn’t.

I have no idea how we can get people motivated to learn these through trial-and-error when AI coding exists though. I remember the days of spending hours on stupid bugs that AI can resolve within a minute. But I recall learning heavily from those experiences. Oh well…

scubbo

9 hours ago

I like the presentation I heard from a Principal, that AI tools amplify your competence. If you start out incompetent, it'll just allow you to be incompetent with greater scope and (negative) impact.

grogenaut

3 hours ago

yes, but a person who doesn't know any of this stuff is infinitely more productive with ai than without it when it comes to many things.

we've got product folks vibing out prototypes (not shippable but clickable) in our main front end in a few minutes to an hour. This would previously have involved 3 people and several weeks, or a ton of figma and documents to fill in the gaps. This saves weeks to months and lets them really experience the items.

Then they hand it off to someone who knows all that stuff who is also using AI and the impl also gets done faster.

The PMs are either moving infinitely faster, or at least 30x faster and not blocked constantly by others.

basically you're not comparing people who don't know much (tech) with those who do, you're comparing them before and after access to AI.

mewpmewp2

12 hours ago

I honestly feel like my own learning has accelerated after using AI. Simply because now it's so easy to write the same thing in so many different languages, I can e.g. learn pros and cons of each language, which otherwise would have been I think unfathomable to me. I have now created so much stuff I wouldn't have had time to create.

I set up k3s, and tons of what would otherwise be unnecessarily complicated stuff, on my laptop for my side projects, along with additional home servers and smart house stuff. Otherwise k8s and things like that would have been daunting to learn in theory and without constant professional exposure, etc...

Microservices in Go, Rust, which I didn't have any previous experience with, games in C and other languages. Didn't know anything about low level memory management before. Was just mainly TypeScript person. Just constantly building random fun stuff.

tokioyoyo

11 hours ago

The question is if you already had an intuitive understanding of what those things “are”. The languages and systems have been easier to learn once you picked up a couple. Same applies here as well.

The question is, how quickly does a junior with no experience build intuition without trial and error.

harry8

10 hours ago

When I started I learnt something about coding from VBA macros to automate excel.

Often that started with the macro recorder. Then you worked out what that "recorded" code/sludge did, removed the crud you didn't need or want, improved the logic and so on. I bought books to understand it better. Now you can ask a (different) LLM "what is this? why is it used? How would I?" etc which is probably a faster learning curve than books, newsgroups and old school personal home pages with good info.

I would have been quite surprised when I first used a VBA macro in anger just how far I would go down the rabbit hole. C, asm, verilog, Linux were no part of what I originally signed up for!

Some people will specialise in the equivalent of recording macros and go no further. And this will be fine for code that gets it done but doesn't matter too much in the other dimensions (security, reliability, usefulness without the authors' support, etc.) Much like VBA utilities inside companies that were useful way back when. Other people will want what they produce to be better, even good, and they will learn about floating point [1] and all the rest, much as I did. Probably learn pretty fast too. [2]

[1] https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h...

[2] Working out how to write an excel vba webserver and using it to collect and collate summary data from various divisions into reports was seedy as hell, solved the actual business problem (given ridiculous but intractable constraints) and isn't something you can record. We all have stories from a misspent youth that we're simultaneously ashamed and yet somehow proud of.

nomel

13 hours ago

And, you don't have to vibe code. A competent developer can make great use of AI. I think a developer that can develop the system themselves is the most accelerated user.

malfist

12 hours ago

> You don't actually need to know the answer to those questions in order to vibe code

No, but you do need to know the answer to respond to that 3AM page about prod being down.

agumonkey

13 hours ago

I would use these exact facts as a sign that it's maybe not what it seems. It's much too big and too fast to feel stable. It might keep at that level, increase even more, or drop down to a saner level of use / allocation.

teeray

13 hours ago

I can see a corporate future where tokens are haggled over in department budgets just like any other line item. Some projects will get more of them, other projects will get less of them. "Use AI for everything" will become "use AI economically and build things that outlast our budget for it."

Aurornis

13 hours ago

> It might keep at that level, increase even more, or drop down

Bold prediction. :)

I think anyone predicting a drop or near-term flattening is not thinking beyond the online bubbles where these tools are discussed. In a local tech meetup a lot of the normal companies are barely coming online with AI tools at their company, and even then with very low limits.

johnfn

13 hours ago

So it might either go up, stay the same, or go down? :)

agumonkey

10 hours ago

heh yeah, i'm also selling trading advice :p

sirsinsalot

9 hours ago

6 hours ago

So touche, but since it's usage per task it's kind of weird.

This means that the average engineer is efficient at (say) identifying the first 10 tasks they should do but there are diminishing returns after that? That seems like a weird pattern. Wouldn't it be more likely that certain tasks have an ROI based on how efficiently the task is generated?

Like I'm trying to imagine in my head, if you think an engineer is more efficient with the tool, why deny them more tokens. I guess so they think to use them more efficiently?

So, maybe I conclude that I think your conclusion that there must be $1500 per engineer is flawed. And even if it were true, I don't think the benefit would be evenly distributed. I suspect this is a first pass at figuring how to budget them and there will be a second pass.

While it certainly reeks of motivated reasoning, Jensen Huang's assertion that an expensive engineer should be using at least their salary in tokens feels more logically sound to me (assuming the average engineer is efficient at using tokens, I have a feeling it's a normal distribution)

therealdrag0

3 hours ago

Setting a cap motivates developers to invest their tokens wisely such as choosing the right models and not burning tokens for fun or side projects, same as any budget.. it’s not any deeper than that.

At my company we can ask for temporary cap limits if it’s justified, which is fairly common.

simonw

6 hours ago

"I suspect this is a first pass at figuring how to budget them and there will be a second pass."

Completely agree with that.

toasty228

13 hours ago

There is a whole spectrum between "ai coding is a fad" and "unlimited tokens for every employee we don't even care if it actually ends up being a net positive financially"

tmp10423288442

12 hours ago

> "unlimited tokens for every employee we don't even care if it actually ends up being a net positive financially"

That was clearly a short-term trend that would obviously get fixed. Doesn't say much about AI coding as a business model.

javier2

11 hours ago

Because the vibe coded stuff is sometimes great, sometimes it breaks stuff, sometimes it breaks things that we fixed multiple times earlier. The PRs are too large, nobody can review that mess and you better be on call for your deployment. Maybe it will get better, maybe not. I don't know yet.

therealdrag0

3 hours ago

What about that means AI coding is a fad?

Gigachad

11 hours ago

The massive PRs are something that probably has to end. You can ai generate smaller changes in reviewable PR sizes. It probably even helps the AI code review tools to break the work into smaller logical chunks too.

marcosdumay

11 hours ago

Oh, it won't get any better. LLMs have already been trained on every bit of code ever published, they won't get any more material.

therealdrag0

3 hours ago

They can be reinforced with best practices and context windows etc will increase.

throwatdem12311

10 hours ago

If anything the snake is eating its own tail because now it’s training on vast amounts of its new slop…dragging down the average bar of quality.

anthonypasq

14 hours ago

perhaps the personal computer? Companies were spending 3-5k (10-15k inflation adjusted) on every employee for just hardware.

everyone making comparisons to the dotcom bubble seems misguided. this is clearly computing 2.0 imo

thewebguyd

13 hours ago

No disagreement on computing 2.0, but companies spending 3-5k per employee for hardware isn't generally a monthly cost. It's a cost at the time of hire, and then once every 3 to 5 years after that, for a monthly amortized cost of about $50/employee.
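The amortization above can be checked with a few lines of arithmetic. This is only a rough sketch: the function name is made up for illustration, the $3-5k price and 3-5 year refresh cycle are the figures from the comment, and the $1,500/month token cap is the figure from the thread title. The "about $50" number corresponds to the cheapest case ($3k over 5 years); the most expensive case ($5k over 3 years) comes out closer to $139.

```python
def amortized_monthly_cost(purchase_price: float, lifetime_years: float) -> float:
    """Spread a one-off hardware purchase evenly over its service life, in USD/month."""
    return purchase_price / (lifetime_years * 12)

# Figures taken from the comment: $3-5k per machine, replaced every 3-5 years.
cheapest = amortized_monthly_cost(3_000, 5)  # low price, long refresh cycle
priciest = amortized_monthly_cost(5_000, 3)  # high price, short refresh cycle

# Token cap from the thread title, for comparison.
TOKEN_CAP_PER_MONTH = 1_500

print(f"hardware: ${cheapest:.0f} to ${priciest:.0f} per employee per month")
print(f"token cap is {TOKEN_CAP_PER_MONTH / priciest:.1f}x to "
      f"{TOKEN_CAP_PER_MONTH / cheapest:.1f}x the amortized hardware cost")
```

Even in the most expensive hardware scenario, the monthly token cap is roughly an order of magnitude larger than the amortized PC cost.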

I have my concerns with current inference pricing in that there's a non-zero possibility for a rug pull in the future for the subscription plans for organizations and individuals that can still use them. For now, it's only companies larger than ~150 users that need to pay per token, but what if that wasn't the case? Not every company can afford over $1k/month/employee to give them access to AI tooling, further making it harder to compete against the behemoths. If we get to a point where an individual can no longer pay $100/month for nearly unlimited usage and instead must pay per token, that's going to be a problem.

Personal computing eventually became an equalizer (until we started centralizing on mainframes again, aka the cloud) because it got cheap. My hope is that inference also gets just as cheap, if not cheaper.

I have high hopes for local AI and open weight models and we will continue the ethos of local, personal computing and not needing to offload everything to OpenAI/Anthropic/Google, etc. to get work done once the hardware and hardware availability catch up.

GrinningFool

11 hours ago

Any kind of rug-pull is a serious concern. Companies are re-orienting their entire development processes around these tools. Sure they can go back, but it will require a much larger and more expensive effort than to transition in the first place.

All companies who make this transition will be more or less at the mercy of model providers.

dghlsakjg

13 hours ago

Every employee doesn't need $1k in token spend per month, either. That kind of spend makes sense for technical workers in r+d.

Most other workers are served fine by $20-30 worth of tokens on a budget model. You don't need Opus to help support write emails.

tmp10423288442

12 hours ago

No, but you do want Opus-tier models to do desktop and office software automation (think about people who intensely use Excel and the like). Actually those might take even more tokens than coding in a lot of cases. Why do you think Claude Cowork is successful, and why do you think Codex is leaning so hard into Computer use?

dghlsakjg

8 hours ago

I wonder if you will see app makers begin to open APIs (MCPs) up in ways that replace computer use. Computer use via human interfaces is pretty hacky IME, and if you can use an app that exposes spreadsheets in a way that reduces token costs by 90%.

I'm optimistic that the demand for AI accessibility will drive programmatic interfaces in places where companies were previously reluctant to.

dghlsakjg

13 hours ago

The Dotcom bubble is an interesting comparison.

The general thrust that everything would be online was correct, it was just that the market mistimed and misallocated capital by a decade or more. There was massive spending on infrastructure capacity that we wouldn't end up needing until the 2010s. There were hype driven valuations completely disconnected from business fundamentals just because a company was an 'internet' company. Things were going from cutting edge to obsolete in less than a year. There were breathless promises that this was business 2.0! Of course, none of that sounds remotely like what is going on today...

I'm optimistic about AI, but I also don't think that it is going to change everything as fast as promised.

threetonesun

13 hours ago

The question you always have to ask is what problems does it directly solve. I personally think most of the current problems in software development and really the world at large are not time-bound problems but alignment issues, and all an LLM can really do there is be some 3rd party oracle that gives you an answer without needing other humans to agree with you.

squidbeak

11 hours ago

> The question you always have to ask is what problems does it directly solve

Most directly, human labour. Labour is always a problem for capital. At a certain level of AI competence, businesses don't need to pay humans to complete the work they need doing in order to operate. I don't think anyone would dispute that AI competence is growing steadily.

rafterydj

12 hours ago

I agree with you. I think that if we're talking about actual reliable problem solving, we have to be discussing robotic / drone systems. Software is as complex as you want to make it, and always has been.

jghn

13 hours ago

Two things can be true at the same time. It can be true that this is here to stay. It can also be true that companies are grossly overvalued right now and that the market is irrationally exuberant. This would mean we could both have a crash and also see AI coding be the new future.

pmg101

13 hours ago

I think the right comparison is the invention of the microprocessor. At that time people were grappling with a lot of the same things we are today - would it automate jobs away, would it transform education and the work place, etc.

pixelesque

13 hours ago

Hardware's not generally a subscription, monthly cost though.

You update it for them every 3/4 years (if they're lucky).

It probably makes a bit more sense to compare it to existing software subscriptions like Office, or the old-school 'per-seat' licenses per user for software.

thewebguyd

13 hours ago

There's some software that can cost $1k or more per seat/month, but it's pretty rare. Big tier ERPs usually fall in the ~$600/seat/month range, specialty engineering stuff can hit over $1k, Bloomberg terminal, etc. I wonder if what Uber's building with that $1.5k/month/employee is actually delivering the same value that something like an ERP would to the entire org...

tikhonj

5 hours ago

I still believe Scrum is a fad and yet companies have been spending obscene amounts to push it down developers' throats for decades now.

therealdrag0

3 hours ago

Scrum spending is very rare IMO. No company I have worked at pays anything for scrum.

maplethorpe

7 hours ago

> Which other tool went from nothing to this level of acceptance so quickly?

NFTs? My company had nothing to do with blockchain but I ended up working on NFT integration regardless.

Barrin92

13 hours ago

>Why there are so many people that still believe that AI coding is a fad?

Because there's not a single piece of evidence that this has improved the quality of the delivered software, or for that matter even the speed of features any of these companies produce, in fact if anything the opposite.

The point of software development, the hint is in the name, is to develop software, not consume tokens. If Uber was now full of 10x engineers the stock price of Uber would be up, not down on a yearly basis. Hilariously enough the only company whose stock price is up appears to be Anthropic.

sirsinsalot

9 hours ago

How dare you mention evidence! This isn't engineering you know!

jbvlkt

12 hours ago

Because writing huge amounts of code is easy for humans too. Agents already proved that they can do it. But are agents able to maintain it? I do not know and unless I know for sure, I am not fully committing to AI generated code.

i.e. I am able to write about 1k lines of code of "acceptable" quality per week. Which means in 1 year, there will be about 50k LoC. I am pretty sure that I would have to spend like 60-80% of my time maintaining the 1st-year code and the rest making new features in the second year, so I would have to hire more people and spend time onboarding them to maintain velocity. All of these are rough estimates, probably overoptimistic, and way worse in the 3rd year. Good luck doing such estimates with code agents. Even worse if you already have huge amounts of legacy code.
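A back-of-the-envelope version of that estimate, as a sketch only. The 1k LoC/week rate and the 60-80% maintenance share are the commenter's numbers; the 50 working weeks per year and the idea that new-feature output scales linearly with the time left over are simplifying assumptions added here, and the function names are illustrative.

```python
LOC_PER_WEEK = 1_000         # from the comment
WORKING_WEEKS_PER_YEAR = 50  # assumption, chosen to match the "about 50k LoC" figure

def yearly_loc(loc_per_week: int, weeks: int) -> int:
    """Lines of code produced in one year at a constant weekly rate."""
    return loc_per_week * weeks

def second_year_new_loc(first_year_loc: int, maintenance_share: float) -> float:
    """New-feature LoC in year two, assuming output scales with the time
    not spent maintaining the first year's code (a linear assumption)."""
    return first_year_loc * (1 - maintenance_share)

year_one = yearly_loc(LOC_PER_WEEK, WORKING_WEEKS_PER_YEAR)
print(f"year 1: {year_one} LoC")
for share in (0.6, 0.8):  # maintenance share range from the comment
    new_loc = second_year_new_loc(year_one, share)
    print(f"year 2 at {share:.0%} maintenance: about {new_loc:.0f} new LoC")
```

Under these assumptions a single developer's new-feature output drops to somewhere between a fifth and two fifths of the first year's, which is the velocity gap the comment says would have to be filled by hiring.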

themafia

11 hours ago

Why are there so many people who mistake simple anecdotes for actionable data? Why do the majority of businesses fail rather than succeed?

LAC-Tech

11 hours ago

Because we have spent a lot of time and money using AI to generate code and have been unimpressed with the results.

As for why they got accepted so quickly 1) the industry's long running desperation to deskill computer programming 2) the addictive psychology baked into LLMs "That's an elegant solution! Shall I ... ?"

asadotzler

7 hours ago

Also, a bucket for VC to put all that NFT, IoT, blockchain, VR investment into. VCs gonna VC and the last 15 years of bets failed so the last few years have been a transition away from those toward "the next thing".

jujube3

12 hours ago

It's cope. People desperately want to believe that AI coding is going away so that they can go back to partying like it's 2020.

So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???) or that code bases that AI contributes to will spontaneously combust, or something.

maplethorpe

9 minutes ago

> So there's a huge number of HN posters claiming that the price of tokens will go UP over time rather than down (that's how Moore's Law works, right???)

I mean, Github Copilot's pricing just went up considerably, so I guess they were right?

dofm

12 hours ago

I don't think it is unreasonable to say both will happen, is it?

In the long term, tokens will fall in price. Obviously. (If "tokens" continues to be the unit)

In the short to medium term, for the IPOs to succeed, people have to start actually paying for what they are using, so the price will go up, and is going up, quite a lot. Once their value is set they will slowly fall from that point (or some point maybe halfway, depending on how much the market is willing to continue to subsidise).

I am an AI cynic, but I am now an informed cynic; I am learning agentic tools so I know where they are useful and I know my enemy.

I think the "fad" here is cloud-based, metered AI being a dominant work mode.

Nothing, so far, has suggested to me that any other outcome is likely than edge- to local-scale, on-device, on-laptop, on-prem models getting good enough to the point where people use them by default and use the cloud models only when they need the extra oomph.

I cannot believe that there is anything other than an enormous incentive for companies like Uber to find local, small model and on-premises solutions to their problems, not least while pricing is so changeable and people are getting nasty surprises.

Betting on OpenAI and Anthropic being around over the long term in the form that they are now, that feels like valley hopium. Utility monopolies essentially always derive from physical/geographical limitations, don't they?

jujube3

8 hours ago

I mean, there's an "enormous incentive" for people to run their own data centers rather than using AWS. And yet, cloud is growing and on-premise is shrinking.

While I hope local AI continues to exist, I'm skeptical that it will take over, for the same reason running your own servers hasn't taken over. It's just hard, and involves spending huge sums of money up front.

It's also not really clear how much tokens are being subsidized. The discussion reminds me of Uber. For years people on HN claimed that Uber was going to collapse once they ran out of VC money. Then... that never happened, and everyone just moved on to discussing other things.

oblio

2 hours ago

Infrastructure is massively complex and multi cloud is super hard to do. Switching LLMs is... a drop down.

Now, that doesn't mean running your own LLM will be easy, but this will mean it's a lot more likely that there will be at least regional LLMs, in my opinion. I.e. there will be Google, whichever (if any) is left standing of OpenAI or Anthropic, and then there will be Chinese hosted LLMs, probably Indian hosted LLMs, European hosted LLMs, plus LLMs hosted on managed services (i.e. Bedrock). For sure I see large banks and the like being able to host the best OSS or even licensed LLMs on their own cloud infrastructure accounts (i.e. at AWS, Azure, etc).

And that's on top of the LLMs running on owned server infrastructure plus actual local, on device LLMs.

Der_Einzige

8 hours ago

Token costs do go down over time for sure due to software optimizations (i.e. better attention kernels) but acting like hardware INFLATION isn't happening for at least a few more years is just nonsense. Objectively an A100 is more expensive to rent today than it was in 2024 (a 7 year old GPU - Big short guy is a turbo idiot) and rising. As such, over short time horizons, it's possible to see limited amounts of "price per token goes up" for the same model.

oblio

an hour ago

It's a mix. If the current wave of LLM businesses crater, demand for LLM specific hardware (and related hardware) will crater. GPUs were propped up by crypto currencies and now by LLMs. They're still great at doing fundamental math operations, but for their value to stay up another massive business opportunity involving matrix multiplication and the like would need to rise as soon as the current business cycle winds down.

Not impossible, not unlikely, probably 50-50.

CharlieDigital

19 hours ago

$1500/mo is $18,000/seat/annum.

Maybe Microsoft and Nvidia are on to something.

128 GB machines that can run local LLMs are a bargain even if priced $5-8k. Yes, tok/s is not quite there, but that's probably OK since the bottleneck really isn't the code; it's WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?

pqtyw

14 hours ago

How is tok/s not a bottleneck? I assume most people still use ai agents interactively rather than leaving them to do their own thing during the night.

14 hours ago

> I don’t think companies will do that. Why don’t they just buy local on-premise infrastructure even though it’s cheaper than AWS?

For customer facing, production software, it's worth paying a cloud tax to get the reliability guarantee. For tools that are used by engineers for code development, there is no need for such bulletproof guarantees.

Buttons840

2 hours ago

Local AI servers are different because they don't have to form a single system. If one AI server goes down, just use the other one.

This is unlike customer facing systems where, if your database server goes down, you probably can't just use the other one--the whole system is down.

dangus

12 hours ago

That makes very little sense. SaaS/cloud tooling is overwhelmingly popular for internal tooling.

Which category of developer tool has on-premise as the more popular option?

Cloud isn’t about “reliability,” it’s about being able to focus on your core business rather than spending all your time maintaining stuff.

zozbot234

17 hours ago

It probably allowed them to avoid hiring as many people to build a certain amount of software. Even if it didn't increase revenue, it could have lowered human labor costs.

> 128 GB machines that can run local LLMs are a bargain even if priced $5-8k.

Don't forget the energy costs. Searching around, advanced models use an average of 25 Wh/1000Tok.

$1500/month gets you about 150M tokens.

At the aforementioned energy/token, that's 3750kWh.

What are your local office electricity rates/tariffs? (Hint: they are going up because of AI data centers). Even if my price and energy assumptions are wrong above, you probably aren't going to get the rates that the hyperscalers do.

Even at cheap (i.e Texas) retail electricity rates, that many tokens will probably cost you hundreds per month. In most other electricity markets, probably far more.
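The energy estimate above works out as follows. This is a sketch that takes the comment's own figures at face value (25 Wh per 1,000 tokens and about 150M tokens per month); the electricity rates in the loop are hypothetical placeholders, not quoted tariffs, and the function names are illustrative.

```python
WH_PER_1000_TOKENS = 25         # "25 Wh/1000Tok", from the comment
TOKENS_PER_MONTH = 150_000_000  # "about 150M tokens", from the comment

def monthly_energy_kwh(tokens: int, wh_per_1000_tokens: float) -> float:
    """Energy in kWh needed to generate `tokens` tokens."""
    watt_hours = tokens / 1000 * wh_per_1000_tokens
    return watt_hours / 1000  # Wh -> kWh

def monthly_energy_cost(tokens: int, wh_per_1000_tokens: float, usd_per_kwh: float) -> float:
    """Electricity bill in USD for generating `tokens` tokens."""
    return monthly_energy_kwh(tokens, wh_per_1000_tokens) * usd_per_kwh

energy = monthly_energy_kwh(TOKENS_PER_MONTH, WH_PER_1000_TOKENS)
print(f"{energy:.0f} kWh per month")

# Hypothetical retail rates in USD/kWh; substitute your local tariff.
for rate in (0.10, 0.20, 0.30):
    cost = monthly_energy_cost(TOKENS_PER_MONTH, WH_PER_1000_TOKENS, rate)
    print(f"at ${rate:.2f}/kWh: ${cost:.0f} per month")
```

At any of these placeholder rates the electricity bill lands in the hundreds of dollars per month or more, consistent with the comment's conclusion, though the result is only as good as the 25 Wh/1,000-token input.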

eclipticplane

7 hours ago

How much more software does Uber need?

Unless they are iteratively replacing expensive vendors and optimizing other headcount costs?

ricardobayes

13 hours ago

128GB machines can't run anything locally that is even nearly as capable as a frontier model like Claude. We can get an idea from deepseek v4 pro being 1.6T model, requiring approx. 860GB VRAM to run.
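For a sense of scale, the quoted figures can be turned into a rough memory calculation. This is a sketch under stated assumptions: the 1.6T parameter count and ~860GB requirement are taken from the comment as given, 1 GB is treated as 10^9 bytes, and all of the memory is attributed to weights (ignoring KV cache and activations). The function names are illustrative.

```python
PARAMS = 1.6e12      # "1.6T model", from the comment
REQUIRED_GB = 860    # "approx. 860GB VRAM", from the comment
LOCAL_MACHINE_GB = 128

def implied_bits_per_param(memory_gb: float, params: float) -> float:
    """Average bits stored per parameter if all memory holds weights."""
    return memory_gb * 1e9 * 8 / params

def weights_gb(params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone at a given precision, in GB."""
    return params * bits_per_param / 8 / 1e9

print(f"implied precision: {implied_bits_per_param(REQUIRED_GB, PARAMS):.1f} bits/param")
print(f"shortfall: {REQUIRED_GB / LOCAL_MACHINE_GB:.1f}x the memory of a 128 GB machine")
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weights_gb(PARAMS, bits):.0f} GB")
```

The 860GB figure implies the model is already quantized to roughly 4 bits per parameter, so further quantization alone would not close the gap to a 128 GB machine.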

dkdcdev

19 hours ago

at their scale they could also just run a large on-premise or rented (basically still cloud, but cheaper) GPU cluster and run through that. fixed costs, even license a SOTA model’s weights if you’d like

embedding-shape

19 hours ago

> even license a SOTA model’s weights if you’d like

Yeah, I bet all labs releasing SOTA models are more than happy to remove the main way they make money and let you run it locally, especially if you're a big spender like Uber who seems very willing to throw money into the sea as an experiment.

throwway120385

19 hours ago

That's going to stop eventually, and I think at that point we're going to see business models more like the major CAD providers.

idiotsecant

19 hours ago

I don't think they'll have a choice, open weights models are not far behind. At some point it's essentially a commodity game

dkdcdev

19 hours ago

they also already do this…

Anthropic and OpenAI license to the public clouds. Google reportedly licenses to Apple. licensing to Fortune 100 companies running on their own infra is an obvious next step

it is a race to the bottom and I’m not sure the labs win that race. we’ll see!

thewebguyd

13 hours ago

I'm not sure the labs will win either. I wouldn't be surprised to see OpenAI & Anthropic just get acquired, either by Microsoft or Amazon and their models just become another product offering in their public cloud and some hybrid on-prem offering like Azure Stack HCI or Azure Stack Hub (already basically a "cloud in a black box" that could become "AI in a box")

mrweasel

16 hours ago

The problem isn't really Uber, Microsoft or Nvidia, it's all the smaller non-IT companies that also have developers on staff. They are screwed. $1500 per seat per month is just way too expensive, but they also can't afford to build and maintain their own on-premise solution. If Microsoft can't afford to run CoPilot for their own developers, what chance does any of their customers stand?

If the large, well-funded IT companies in the world believe the current AI cost is too high, then Anthropic, OpenAI and CoPilot have no actual customer base. AI is then relegated to a very profitable niche business, but that can't fund the R&D for the models.

skybrian

15 hours ago

It's an extra 18k a year for developer tools when they're paying how much a year per developer? Having software developers at all isn't cheap.

Also, I don't believe you need to spend $1500 a month on a coding agent if you optimize usage at all.

KronisLV

10 hours ago

In Latvia, the net salary for a Java dev is around 1729 - 4314 EUR, based on https://www.algas.lv/algu-informacija/informacijas-tehnologi... (crowd sourced data)

For the employer those employees cost between 2945 - 7736 EUR per month based on https://kalkulatori.lv/lv/algas-kalkulators (income and social taxes).

jvanderbot

19 hours ago

"Windows XP+Dell" should have been in quotes. It's similar to the way enterprise productivity software was developed, packaged co-designed with hardware, and sold on an 18mo upgrade cycle assumption. It's not literally windows xp.

nonethewiser

16 hours ago

Oh gotcha. Yeah that's an interesting idea.

treis

15 hours ago

I don't see it. Leasing equipment and paying per seat license fees makes a lot of accounting and cash flow sense. Maybe when it gets to the point where you can run SOTA LLMs on consumer hardware. But that seems a solid decade and probably much more away.

Even then it makes more sense to rent the bigger GPU and get your answer faster.

gedy

14 hours ago

There's waayyyy too much money betting on that not happening, to the point I feel there'll be regulations popping up for "safety reasons" etc to ensure the big players control this.

thewebguyd

13 hours ago

3/4 of Microsoft's BUILD conference the past two days were about local AI, foundry local and Windows ML along with a big section in the keynote about running local workloads on their new hardware with Nvidia. Say what you want about Microsoft's reputation, but they are a "big player" and seem to be moving in the direction of local AI first.

gedy

12 hours ago

I would love this to happen of course, just paranoid it won't.

ssivark

14 hours ago

Even if companies decided to move away from expensive models from the major labs, it's probably much more economical to pay a cloud provider to host some open weights model which could then be amortized across all (internal) users and do inference at a substantial batch size, rather than giving everyone their own hardware -- which means the company would need to provision for peak usage and inference at batch size of one.

ungreased0675

19 hours ago

Your last question is really important. What did they accomplish with all that spend?

I suspect there’s some mass delusion with respect to actual accomplishments as a result of LLM use. Sure, things are moving faster, but does it matter?

devttyeu

19 hours ago

If you believe a 128gb machine that is essentially DGX Spark in a laptop chassis can run models comparable to SOTA you either never ran open models on hard tasks, or you aren't scratching the surface of SOTA closed LLM capability in how you're using them.

f311a

19 hours ago

Can you show me an example of a hard task that can't be achieved using light models? When we don't want the model to work on autopilot without reviewing the code at all. Even SOTA models will produce garbage code, if you don't guide them all the time.

Hard tasks require a lot of guidance and code reviewing, unless you are creating another throw away project where correctness, maintainability and code understanding does not matter.

thelastgallon

8 hours ago

>WTF did Uber build with all of that spend? How did it meaningfully impact their revenue in a positive direction?

Uber (and quite a few bay area companies and startups) can afford to spend that money. There is no expectation of profit, Uber lost ~62B and growing: https://uberlosses.com/

infecto

19 hours ago

I am wondering more and more if this becomes true as these smaller models take off. I might be old fashioned but I have yet to crack the workflows some of the hype people tout, like Claude Code's Boris, where he and others talk about running hundreds of agents overnight.

I have still found the sweet spot for me is using LLMs but I am still in the driver's seat.

CharlieDigital

15 hours ago

That's because for some of these folks, the cost of the tokens doesn't have to match the value of the output; the hype from the story is all they need.

Normal people have to produce something of value from that spend. So starting 100 agents and then waking up to something cool but useless just means you spent a few thousand dollars and created nothing of value...

ofjcihen

18 hours ago

Running hundreds of agents overnight is almost certainly 99 percent waste.

sourcecodeplz

19 hours ago

$1.5kpm for SOTA. 128gb you run DSV4 Flash.

pqtyw

14 hours ago

What's the point of running it locally though? Inference for open models is quite cheap already. They could just selfhost, anyway. The experience of running LLMs locally will be excruciatingly bad in comparison at least for the near future.

jcgrillo

19 hours ago

> WTF did Uber build with all of that spend?

WTF did anyone build with all that spend? Despite all the feel-good anecdotes about how productive folks feel using ai coding tools there's a deafening silence when it comes to actual, demonstrated efficacy. How can we be this far entrenched in these workflows and still not know whether they actually do anything useful?

awesan

19 hours ago

I can say at least for me at a small-ish company (~40 FTE) there has been a surge in internal productivity tools. Nothing to improve the end user product directly but a lot of tools to make processes easier and less error prone.

What would previously be janky internal dashboards or excel sheets are now actually nice to use tools. That said of course the maintenance cost of all that has yet to be discovered, and the ROI is questionable.

CharlieDigital

19 hours ago

About the same ~40 FTE team. We're doing the same thing. Smattering of internal tools, but no net gain in external revenue. Who knows which of those tools will have any value or ppl are just doing it because it's cool now to make fancy dashboards.

OK. I guess that's good, too.

jcgrillo

19 hours ago

Yeah this seems to be a pretty widespread story, from what I've heard as well. The thing about those janky dashboards and spreadsheets though is that somebody understood them and built them with intent to solve a particular problem. Despite the rickety appearance, they're trustworthy tools. A polished single page app might look nicer but it's harder to debug than an excel sheet, and much less transparent in its internal workings--especially if nobody actually wrote it...

izacus

15 hours ago

More importantly, it's questionable how much extra revenue improving a design of internal tool brings.

ftkftk

14 hours ago

~70 FTE Engineering team. We are shipping more features, especially features that previously would not have survived the cut to make it on the roadmap. Even though we are shipping more, our total amount of escaped bugs has not increased, so our escape rate has actually lowered. On top of that we are able to triage and fix escaped bugs more quickly now. And then of course there has been an uptick in internal tooling that makes the rest of the company more efficient, and we have been able to address tech debt at a higher rate than before.

I don't think this would have been possible without having solid engineering culture and processes in place before bringing in ai coding tools.

And I don't want to sugarcoat it, this hasn't been easy, requires continued discipline, and took well over a year to get good at. And we still have to continuously learn, experiment and adapt our training, tooling, and processes.

CharlieDigital

13 hours ago

    > We are shipping more features

That's not really the important question; the important question: is it generating revenue.

If you increase your spend -> ship more features -> no correlated increase in revenue, that's just burning money.

If a team of 10 spends 1 extra headcount ($180k/year) and ships features with no corresponding growth in revenue, what does that mean?

There was probably a reason it was on the backlog (because it didn't really have value).

ftkftk

12 hours ago

> is it generating revenue

Yes! :)

> There was probably a reason it was on the backlog (because it didn't really have value).

There are definitely things in the backlog with low value. We don't work those items, even if we could now. The additional bandwidth we have now goes to valuable features that drive revenue and retention metrics. The reason they were on the backlog was that we just didn't have the bandwidth to execute on them well and they were just somewhat less valuable than the critical path items on the roadmap.

nonethewiser

19 hours ago

The real answer?

Software engineer quality of life.

There can be an increase in productivity without a corresponding increase in total output. The gains could be captured by software engineers doing a day's work in an hour then fucking off in a variety of ways.

pqtyw

14 hours ago

> doing a day's work in an hour then fucking off in a variety of ways

Until companies start hiring 5x fewer engineers than they did before and well.. we are clearly moving in that direction

nonethewiser

14 hours ago

Quite possibly. Doubtful it will happen all at once. If you can get 8 hours of work done in 1 they'd need to ramp up demand 8x. Would be interesting to see that happen overnight. Happy Monday. Here, take these 30 tickets.

MengerSponge

13 hours ago

But that's an inefficient use of dev salary. Y'all are gonna get ground to smooth well-compensated paste.

slopinthebag

15 hours ago

Yeah I think this is probably most accurate.

RugnirViking

19 hours ago

Imo it's pretty clear that anyone who is taking the issue at least somewhat seriously knows the amount of value they provide is not zero. However, the problems are manifold: firstly, toolchains vary wildly, from fancy autocomplete, to engineers chatting with codebases they're unfamiliar with, to people integrating them into devops and infra, to people doing spec driven development, with a thousand philosophies in between. Many people suspect that those above them in the ladder are on the cusp of massive failure due to losing track of the code, and many people higher on the ladder think those below them are overly cautious. I hate to be the guy saying "oh it must be somewhere in the middle", but I will say at the very least I like being able to use it to read docs for me, and to synthesize syntax and simple scripts (give me a join that works across these tables and gives me column x, y and z - give me a python script that parses a file like this example and extracts abc data - given this api spec figure out how I can get this data from this endpoint, go)

as for building actually complex software, the art of that is not in simply chaining together such scripts. It's the art of using architecture and testing to shape uncertainty, and developing requirements (and extrapolating sensibly from incomplete requirements). I don't think llms are great at this, but they aren't terrible either. A lot of the more active users in the space are doing stuff where they've realised they need more detailed specs, which like, yeah, we knew this already - better defined problems lead to better software.

jcgrillo

19 hours ago

I agree the most interesting use cases I've heard of are about increasing the rigor of software development practices, but there's definitely a lack of coherence in methodology. I believe that some users and companies are successful in this effort, but the odd (and interesting!) thing is that so far we don't seem to know how to communicate how to do it successfully.

empath75

14 hours ago

jg0r3

13 hours ago

$18k a year is near half of my salary as junior verging on senior developer in the conservation field. Not everyone works in FAANG.

fg137

12 hours ago

In the old world, the refactor probably won't happen in the first place, but the effort would be put elsewhere. "Increased velocity of .. greenfield features" doesn't directly translate to additional revenue, and your number is very questionable in the first place.

Software engineers like to talk as if business and finance are as easy as pushing code out and refactoring. It's not and never has been.

analognoise

13 hours ago

The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).

You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.

It’s a lose/lose situation for…I would say anyone employed as an engineer or programmer. I’m not taking responsibility for AI output, the same way I won’t try to fix auto-generated code: because you just regenerate it.

The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.

Marsymars

12 hours ago

> The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).

I'm pretty pessimistic on AI and don't have access to good agentic workflows, but refactors are exactly the thing where it seems to me like agents could be really strong - once I've refactored something architecturally, I might have hundreds of instances of a thing that needs to be updated in a predictable way, but is complicated enough that it's going to be faster for me to manually update hundreds of instances rather than writing a generalizable find/replace tool.

strange_quark

4 hours ago

Sure they’re fine at that sort of rote find/replace job as long as it’s relatively straightforward. But it only really works if you do the hard parts yourself then tell the agent to go and do the rote part. Even then I’ve had it turn to slop more often than not as the agent has to start contorting the code into weird shapes to try and finish the job. It’ll never stop and be like “hey maybe this was a bad idea, let’s try something else”. And by the time you get to review it, you’ve spent 20 bucks on something that needs to be thrown away.

Daishiman

12 hours ago

> The point of a refactor is for you to think deeply about the code you are responsible for, so you can make it better (faster, easier to work on, more tests, whatever).

Absolutely false. Refactors (in my case) can be as simple as dropping old packages for newer packages with slightly different semantics. It can be moving legacy pages from jQuery to Vue.

> You’ve gotten a result, but without the work that made you valuable, while deskilling yourself.

I've 25 years coding, trust me, I don't lose anything by not finding out on my own that the semantics of a jQuery promise changed between major versions.

> The only person that wins here is the person who can pay you less because they don’t need you, they just need another “types computer guy”.

You have no idea of what you're talking about. There are entire classes of K8s networking issues that would have taken me a day to debug which Claude solved in minutes just because it can run 20 diagnostics commands in two minutes and deal with technical minutiae that are time-consuming but ultimately irrelevant to my business goals.

ofjcihen

18 hours ago

Can you share some examples that you would say justify that price? Not a gotcha, I’m genuinely curious where you’re seeing a return at that level.

simonw

17 hours ago

I've written tens of thousands of lines of tested, working code that I would not have written otherwise, and that code is useful to me.

I effectively get to operate at the rate of a small team of engineers - I know that because I've managed small teams of engineers in the past.

ofjcihen

16 hours ago

> that I would not have written otherwise

I think this is the part I struggle with. The code I write makes me money or is a way of teaching me something, both of which are reasons that I would write the code regardless.

I don’t think I have any projects in mind that I’d be willing to spend half of a car on that I also wouldn’t have written myself.

Obviously just a personal take though. I’m glad you get the usage you want out of it.

simonw

15 hours ago

My "job" is building open source software for data journalism (and anyone else who needs the tools data journalists need, which is pretty much everyone else). I can build more of those tools, and better, in exchange for a fraction of the cost it would take to hire a team to help.

dekhn

9 hours ago

I reached my own productivity limit on several projects (in my case, I'm building a fully automated microscope that uses realtime computer vision to solve a number of longstanding problems with microscopes). As much as I'd want to write the code for it, I hit a wall when it came to debugging some particularly tricky issues- either I couldn't do it, or the time investment was too high.

I use Gemini/ChatGPT/Claude to do that work and it unblocked the enjoyable parts of the project while taking care of the tedium.

I also find LLMs help me learn faster because they can often take a paper and turn it into working code, which I find to be a very slow process.

siliconc0w

13 hours ago

I use the $100/mo sub but my 30 day API cost is about $1700/mo.

It really depends how you use it, if you're using prompts to generate detailed designs, breaking those into lists of tasks, and then feeding those to multiple agents - it's really easy to burn through many thousands.

10 hours ago

About 30 million software developers. At least that's what a quick web search says.

piskov

10 hours ago

It is not only for devs

https://openai.com/index/codex-for-every-role-tool-workflow/

marcosdumay

8 hours ago

So, are companies paying that amount for people at other roles to use it?

danny_codes

4 hours ago

Obviously not

cousinbryce

7 hours ago

World bank says there are 3.7B employed humans. Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month. This lines up well with current forecasts from the major AI labs
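For what it's worth, the arithmetic behind that figure checks out if you read it as an annual number. A quick check in Python, using only the two inputs from the comment (3.7B employed people, USD 1.5k per month); the function name is just for illustration:

```python
EMPLOYED_PEOPLE = 3_700_000_000  # figure quoted in the comment
MONTHLY_SPEND_USD = 1_500        # the per-person cap under discussion


def annual_tam_usd(people: int, monthly_spend_usd: int) -> int:
    """Total yearly spend if every employed person hit the monthly cap."""
    return people * monthly_spend_usd * 12


tam = annual_tam_usd(EMPLOYED_PEOPLE, MONTHLY_SPEND_USD)
print(f"${tam / 1e12:.1f}T per year")  # prints $66.6T per year
```

This only reproduces the multiplication. It says nothing about whether every employed person spending at the cap is a plausible scenario.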

0xc0c0c0

2 hours ago

Congrats, you're hired at Anthropic.

root_axis

6 hours ago

> Putting the total addressable market at around 67T if all of us spend USD 1.5k on tokens every month

However, that's an absurd scenario.

barumrho

4 hours ago

Not worried at all. Switching is trivial. Rebuilding context isn't very difficult and harnesses are a dime-a-dozen.

ksajadi

8 hours ago

This.^ I realized this first when moving a design spec from Claude chat to Claude Code and panicked. I literally had to build something like Notion but for agents to act as a portable memory between all cloud and local models and agents. But honestly it paid off!

If you are interested you can try it out at markbase.cloud (disclaimer and all that). I am not charging for it.

NichoPaolucci

7 hours ago

We run a "context" repository that enables us to transition pretty seamlessly from model to model (usually codex to claude and back). It has skills / plugins / connectors / tooling in relatively malleable MD files. That's what I see as the future. Rather than exporting IDE settings we'll just carry our markdown to the next best tool.

It's hedging a bet at this point, but that's why people say there's no moat. If the tools are properly used + maintained, there should be no reason we can't use a new provider even next week (maybe with a little tweaking).

ksajadi

6 hours ago

that's an interesting approach and something i also considered (using git to avoid conflicts). one thing i needed was a "database" (basically a folder of markdowns) with a fixed schema so i can let the agents record their decisions in (for example when the code conflicts with product design spec). this combined with search has been a real lifesaver.

this is how it works: https://help.markbase.cloud/humans/collections/overview

fg137

10 hours ago

What knowledge?

Unless you work in some obscure domain, chances are that any general "knowledge" Claude has "learned" is already public data somewhere.

If you don't believe me, launch Codex and immediately start working on the same project (s). You might discover that all the knowledge accumulated means almost nothing.

linsomniac

10 hours ago

Claude Code definitely remembers things about you. For just one of the more obvious examples: I was recently asking it to make some suggestions on software alternatives, and part of the answer included (paraphrased) "While a hosted service may be attractive due to your small ops team size, your experience with hosting Linux container-based services puts this squarely in the realm of an option for you." My prompt mentioned nothing about this.

This isn't something that is public knowledge, in the sense that you mean it.

Just earlier today it asked me if I wanted to create a jira ticket for something I asked it about doing. My prompt mentioned nothing about jira.

If you use Claude Code, you might want to take a look at the "auto memories" files that it creates. See "/memory" for some more information.

sparrc

11 hours ago

My favorite solution to this is to use the Cline coding agent, which is open and allows you to easily switch between different providers and models.

spicyusername

11 hours ago

Knowledge in there?

Where is the knowledge stored?

All of my knowledge typically gets stored in plans outside of the agent?

And each agent window gets archived regularly, anyways.

meszmate

an hour ago

It still probably produces better results than some junior engineers in a lot of cases.

But yeah, for a company at Uber’s scale, I can see why they would want real engineering discipline around it.

geodel

13 hours ago

> A $1,500 monthly limit per tool strikes me as a rational policy response to over-spending,...

This whole article seems to me like Multi level marketing "businesses" where 'Diamonds' have made their money by promoting MLM in seminars and telling hopefuls at bottom that "Buying AI subscription now is their one shot to be a winner in life"

Perhaps there is something to MLM vs LLM to create a FOMO effect.

iLoveOncall

12 hours ago

That's just Simon Willison since LLMs came out. It's glaringly obvious that he's a paid shill.

simonw

8 hours ago

Genuine question: what would make me a "paid shill"?

Who do you think would be paying me, and what would they expect in return?

iLoveOncall

2 hours ago

Your unwavering praise of LLMs' performance which does not match anyone's reality?

OpenAI or Anthropic would be paying you, like they pay bot farms and other influencers, and they would expect marketing in return, which you provide in boatloads.

Your job is to be an influencer, I'm not sure why anyone would be surprised that this is a possibility.

fontain

12 hours ago

oh come on, a paid shill?

Simon is very fascinated by AI and at times he can be a little too optimistic but he is generally balanced and his perspective evolves over time which can be seen in his writing.

Nerd who loves nerd things a little too much? Sure. Paid shill by Big LLM? Nah.

emp17344

10 hours ago

The issue is he’s not actually balanced at all. I’ve never seen him say anything negative about an AI product.

simonw

8 hours ago

Here's my AI misuse tag: https://simonwillison.net/tags/ai-misuse/ - 54 posts

My ongoing coverage of AI ethical issues: https://simonwillison.net/tags/ai-ethics/ - 308 posts

I've been the loudest voice about the fundamental insecurity of LLMs for several years: https://simonwillison.net/tags/prompt-injection/ - 150 posts

In https://simonwillison.net/2025/Aug/25/agentic-browser-securi... I said "I strongly expect that the entire concept of an agentic browser extension is fatally flawed and cannot be built safely."

iLoveOncall

an hour ago

Literally none of those articles are criticizing LLMs, only use made of them by 3rd party actors outside of the providers. It really has nothing to do with LLMs themselves.

The fact that you had to dig to August 2025 to find a single article that's actually a critique of something produced by the AI labs is just further proof.

fontain

10 hours ago

Days ago he said…

“I'm finding that coding agents can take me from a vague idea to a working solution, one with tests and documentation and that looks like a carefully considered project evolved over the course of many weeks... in less than an hour.

Even if the code is rock solid, there's a limit to how many projects like that I can sensibly care for - and if they're instantly abandoned, what value was there from creating them in the first place?”

https://simonwillison.net/2026/May/31/the-solution-might-be-...

Here is Simon questioning a fundamental belief held by the pro-LLM lobby. Would a paid shill question that?

Simon is, without question, an enthusiastic pro-LLM person. I disagree with what he says often, the product market fit post was a bad take. But I don’t believe he is shying away from sharing his thoughts when they’re not favorable to the industry.

iLoveOncall

10 hours ago

That's not at all negative about LLMs, just negative about his own usage of LLMs. He's still very heavily and unrealistically (unless he has very poor coding standards and skills, which I won't rule out) praising LLMs in the sentences you've quoted.

Note that it's not surprising that he finds his own usage (described in the quote) negative, since his real job is as a blogger, not anything else.

iLoveOncall

12 hours ago

Yes, a paid shill. You can find a clear point in time where he shifted from sceptic to 1000% fully onboard non-stop praise, with no reason.

simonw

8 hours ago

I'd love to know when that point was myself!

HDThoreaun

11 hours ago

Maybe the reason is because he thought the tools became really powerful?

827a

4 hours ago

This week an S&P 20 company with previously unlimited Claude limits also set a $250/mo/person limit; though it's unclear to me how widely the limits are being enforced, it may be the case that it's just non-software engineers. Do with this info what you will.

john01dav

13 hours ago

Why isn't self hosting (even just renting a GPU server, not necessarily on premise) at large companies or hosting via something like together AI to run the open weight models more common? I've tried the open weight models and the premium models like Opus and Gemini Pro, and I find that the latter are a little better, but not nearly to the degree to justify the extreme price difference, since the differences largely don't matter for what I've tried them for, and I expect that many other users likely have similar use cases.

soleveloper

13 hours ago

If the premium models are just about 10% better - that could justify the price vs. self hosting a ~0.5-1T open weights model.

Remember that utilization of these huge racks will not be 24h/7, and these are usually not GPU intensive shops that would train models on the spare compute. With prices of 100-200k USD and north with ~2 years lifetime, that would be hard to justify financially.

Self hosting could easily amount to ~1000 USD a month amortized across many developers. In rush hours - there will be hard rate limits.

Would that $500 monthly saving (1500 - 1000) justify the 10% decrease in "AI Productivity"? I guess not, in most cases.

For everyone that asks me around, I'd say that in short term, unless there's a really good reason to self host these coding assistant models, then the big 2/3 coding assistants providers are the better choice.

No one got fired from licensing claude code.
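The trade-off above can be sketched as a break-even calculation. This is only a back-of-the-envelope model in Python: the $1,500 and $1,000 monthly figures and the 10% gap come from the comment, while the assumption that a 10% weaker model costs 10% of an engineer's output (valued at what the engineer costs) is mine and is a strong one.

```python
def monthly_savings(api_cost: float, self_host_cost: float) -> float:
    """Money saved per developer per month by self hosting."""
    return api_cost - self_host_cost


def break_even_engineer_cost(savings: float, productivity_loss: float) -> float:
    """Monthly engineer cost at which the lost output equals the savings.

    Assumes lost output is worth productivity_loss * engineer cost.
    """
    return savings / productivity_loss


savings = monthly_savings(api_cost=1500, self_host_cost=1000)
threshold = break_even_engineer_cost(savings, productivity_loss=0.10)
print(f"savings: ${savings:.0f}/month, break-even engineer cost: ${threshold:.0f}/month")
```

Under those assumptions, self hosting only wins for engineers who cost less than the printed threshold per month, which is consistent with the "I guess not, in most cases" conclusion.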

Jianghong94

12 hours ago

I just went through a similar discussion in my $WORK (traditional finance company on NYSE with average IT expertise) and I think the thought process is as such: it's one thing to just give your stellar dev/hacker a beefy GPU server and run whatever model they can run; it's another thing to maintain such a platform company-wide. You would need people (likely way above normal software dev paygrade) to understand and maintain such models, maintain the backend, availability etc. All this extra hassle makes it just easier to pay a top tier external lab + slap a reasonable spending limit on everybody.

esikich

12 hours ago

Why do you think it would be more common? The pooling of GPUs to serve multiple users and connecting to docs/datalakes while respecting security controls, as a start, is non-trivial. You'd end up paying a team to manage that.

fg137

12 hours ago

For the same reasons companies are not building data centers for their "regular" hosting and storage needs but put things on AWS, Azure etc.

It costs money to maintain the hardware and hire experts to manage the services. For something as common as LLMs, there is absolutely no reason for a company to serve models on its own hardware unless it is paranoid about sending bytes to AWS.

datsci_est_2015

13 hours ago

There’s probably plenty of money to be made in LLMs as a service - but not enough time has passed for the commodification to occur. I’m with you in that when the dust settles I don’t think any of the frontier model providers will have a moat. Just like during the dotcom boom a catchy URL and a webpage that could accept payments wasn’t a moat, either.

malfist

12 hours ago

Where are you buying the GPUs to have enough compute to run a medium size business?

fg137

11 hours ago

> I've tried the open weight models ...

You tried that on a personal machine for yourself once. It's a completely different calculation when serving a model to 3000 employees with ever evolving hardware and software requirements. You'll need dedicated hardware in data centers and experts to run them. A company will need to figure out how to manage acquisition, assets and expenses plus 1000 other things, in addition to its actual business. Guess who has figured out all of that already? AWS/Azure/OpenAI etc.

Galanwe

an hour ago

I think the logical follow up will be for Uber to lay off a bunch of people so that the remaining ones can token maxx.

To the mooooon!

zkmon

3 hours ago

The big question is, will the productivity gains be absorbed by the needs? Societies don't have a need for infinite amount of luxury and laziness offered by the productivity of the machines. At some point, you would shake off things, get up from the couch and start walking again, breathing afresh.

linuxhansl

8 hours ago

I use Claude every day. Often for multiple hours a day. Basically doing my job not worrying how many tokens I spend (as in too many or too few). This is a pretty complex code base (database optimizer and related).

Just looked at spend for the past 30 days, didn't even come to $600. 95% of my tokens are from cache. If I were to reach even $1500 I would have to let claude run unsupervised overnight (and with the amount of mistakes it still makes and guidance it needs, I do not believe we are there yet.)

root_axis

6 hours ago

> didn't even come to $600.

That's still in the ballpark. A modest change in your usage habits or workload could easily get you there.

fontain

8 hours ago

is this with a subscription or pure API billing?

sylwk

2 hours ago

Due to recent Copilot price increase my friend was capped to $70 per month of usage. Not on a subscription…

My $100 subscription is not cheap. At the same time our product burns orders of magnitude more tokens.

colonelspace

12 hours ago

If a worker doesn't use their AI/LLM budget, can they get a raise?

asadm

12 hours ago

probably will get fired for lack of performance.

colonelspace

11 hours ago

Let's just say their performance (OKR, KPI, whatever "impact" metric you want) was indistinguishable from a peer that used the AI/LLM monthly allowance in full.

Maybe a $10k raise would be nice?

HDThoreaun

11 hours ago

They'd get a bad review for leaving performance on the table. When has finishing your work ever resulted in anything other than more work?

conartist6

11 hours ago

It's disturbingly anti-meritocratic. You're not allowed to prove that you're more useful without AI because they just assume that AI is a 10x multiplier on everyone.

cdavid

10 hours ago

no because it does not come from the same budget

colonelspace

8 hours ago

Money spent is money spent.

szatkus

13 hours ago

That's a lot. On my usual day I burn less than $1 on Opus. I could get beyond $10 only if I have a complex and well-defined problem, which is rare (the second part at least).

sothatsit

10 hours ago

You must not be using coding agents. You can sneeze and spend $1 on Opus in Claude Code.

dzonga

7 hours ago

> That means each employee's AI spending cap is ~11% of that median compensation package.

when looking at costs - numbers make sense. however decisions as an org/company/solo founder - costs help you set prices, but to reach profitability you want to model around ROI.

now the question is what's the ROI for a $36K investment per engineer or $90M for the total org?

I bet the ROI is negative.

NichoPaolucci

7 hours ago

I'm in a similar boat - it's hard to measure, but let's say you pay an engineer 150K. Giving them a tool that costs 15K a year is effectively a 10% increase in that expense.

If we were seeing 3X, 5X etc improvement from individual engineers, that 10% increase in expense would be a fantastic investment (even 3 engineers for the price of 1.1??!). I have a feeling they are just not seeing that much of an improvement.
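To put numbers on that, here is a small Python sketch using the figures from the comment (150K engineer, 15K/year tool). The function names are made up for illustration, and "productivity multiplier" is treated as a single clean number, which real teams can rarely measure.

```python
def cost_multiplier(salary: float, tool_cost: float) -> float:
    """How much more the engineer costs once the tool is added."""
    return (salary + tool_cost) / salary


def output_per_dollar_gain(productivity_multiplier: float,
                           salary: float, tool_cost: float) -> float:
    """Output per dollar relative to the same engineer without the tool.

    Values above 1.0 mean the tool pays for itself.
    """
    return productivity_multiplier / cost_multiplier(salary, tool_cost)


salary, tool = 150_000, 15_000
print(f"cost multiplier: {cost_multiplier(salary, tool):.2f}")
for gain in (1.05, 1.10, 3.0):
    ratio = output_per_dollar_gain(gain, salary, tool)
    print(f"{gain:.2f}x productivity -> {ratio:.2f}x output per dollar")
```

The break-even point is a productivity multiplier equal to the cost multiplier, so with these inputs anything above roughly a 10% improvement already covers the tool.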

pmontra

14 hours ago

I wonder what they are doing with $1500 per month. I'm on Claude Pro $20 plan and I'm doing well. That's 3 days per week. On the other 2 days I'm using a customer's Claude Max, I don't know if it's the $100 or the $200 plan, but I'm sharing it with some of its other developers.

hrpnk

14 hours ago

$1500/mth is token pricing.

Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly. These plans are economical only if the majority of users spend fewer tokens in $ than the plan's cost. This subsidizes the gap vs. power users who spend multiple k$ monthly in API tokens.

pmontra

13 hours ago

> Your other plans are fixed price with rate limits where you get more tokens than the dollar equivalent you pay monthly.

Or the fixed cost plans reflect the real cost and the people paying API prices give them the profit.

Anyway, none of my customers will let me bill them $1500 more (about $75 per day) because I'm using AI. And what for? I'm not working to move money from the pockets of my customers to the pockets of AI companies.

fontain

12 hours ago

No, we know from the financials of these companies that API prices are close to being at cost and the individual developer plans are heavily subsidized (because they are roughly 10% of API cost per token[1]).

If plans were at cost and API pricing was marked up that would mean there’s a 90%+ profit margin on tokens and instead of raising money and talking about revenue, Anthropic and OpenAI would be talking about their obscene profits.

[1] the caveat is that the average plan user probably doesn’t use all of their quota, I guess maybe 30% is the average across all users.
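Plugging those two ratios into a quick Python sketch: the 10% plan-to-API price ratio and the 30% average utilization are the commenter's figures (the second one explicitly a guess), the $100 plan price is borrowed from elsewhere in the thread, and the function names are hypothetical.

```python
def api_equivalent_quota(plan_price: float, plan_to_api_price_ratio: float) -> float:
    """API-priced value of the tokens a plan allows at full use."""
    return plan_price / plan_to_api_price_ratio


def average_api_value_used(quota: float, utilization: float) -> float:
    """API-priced value of the tokens an average subscriber actually uses."""
    return quota * utilization


plan_price = 100.0
quota = api_equivalent_quota(plan_price, plan_to_api_price_ratio=0.10)
used = average_api_value_used(quota, utilization=0.30)

print(f"full quota at API prices: ${quota:.0f}")
print(f"average usage at API prices: ${used:.0f}")
print(f"usage / plan price: {used / plan_price:.1f}x")
```

If API prices really are close to cost, this puts the average subscriber at about three times what they pay, even after allowing for unused quota. Change either ratio and the conclusion moves with it.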

sothatsit

as far as we know there's no evidence that they can produce any profits at all

swiftcoder

14 hours ago

14 hours ago

unless they changed something in the like 2 months (edit: besides implementing a cap for claude code specifically, since other tools already had caps) since I've left my job there I'm pretty sure $1500 is the very max you can use after maxing out free calls, initial budget, then 2 extensions individually reviewed by your manager

higher ups pushed for these last 2 years to be AI focused so I don't think this restriction is a measure of "don't use too much AI" as much as it is a measure of "don't use only 'manual' AI tooling" since we had a dozen more specialized tools in-house running locally or otherwise that didn't count towards the budget

newobj

15 hours ago

It's also a useful signal for AI value. Looks like it's a max value add of $18,000 per engineer per year.

Anon1096

14 hours ago

No, that's not what it means at all, even in purely mathematical terms. Really it is just a reasonable amount to cap at to stop the long tail of super spenders (tokenmaxxers). You could also call it "the amount of AI spend after which Uber has decided there are diminishing returns for the average engineer".

dandellion

13 hours ago

I'm sure if a dev can show useful results at 1k they won't have trouble getting permission for a higher cap as well.

csallen

14 hours ago

10 hours ago

A blanket cap makes no sense to me. There's a power-law distribution of AI use in my company and I'd imagine it's the same at a much greater scale at Uber.

I'd guess there should be a few people Uber is basically allocating unlimited AI spending to and a large swath they're giving basically nothing.

seanlinehan

andix

9 hours ago

It finally puts a number on the productivity gain of engineers with AI. This is probably less than 10% of the cost of an average Uber developer, so they don't assume much more than a 10% productivity gain from AI.

(Cost of an employee is much higher than their salary, it includes things like office space, supporting structures like HR/accounting, insurance, hardware/software, and much more)

al_borland

9 hours ago

But is it an accurate number? Does AI reach diminishing returns after $1,500/month, or is that all they are willing to risk/burn to stay in this game?

rasbmn

14 hours ago

Uber is in the business of experimenting with robotaxis and automated food delivery.

They can't say that $0 per employee is the appropriate amount for AI spending. So they capped it, perhaps in order to "send a signal" that is eagerly picked up by the AI boosters.

There is no signal. Uber does not work any better since AI. They still want to promote AI, so they chose the highest number that doesn't bankrupt them so the press and AI promoters pick it up as the new price anchor.

Probably they'll quietly reduce the number more soon.

lazyasciiart

14 hours ago

Is this inside knowledge, or speculation?

5701652400

13 hours ago

Eventually tokens will cost the price of energy, and China is miles ahead.

China will be a major token exporter soon. Mark my words.

cmiles8

11 hours ago

Electricity actually is only a small part of the data center costs. There are challenges in getting enough electricity that create problems, but the cost of the electricity really isn’t an issue.

dude250711

12 hours ago

Technically, tokens travel both ways.

5701652400

2 hours ago

Technically, on both sides there is an intelligence producing them.

easygenes

9 hours ago

If I were paying API rates this year, I would have already burned through $20k in tokens. Looking forward to the costs of this level of capability coming down.

jwpapi

19 hours ago

If you estimate a 10k monthly salary per engineer, the 1500 cap is 15% of salary, which would be the point where it becomes cheaper for them to hire another engineer. That doesn't mean AI is improving productivity by 15%, but if 15% is the point where it stops being better than another human, can we assume 7.5%?

Probably even less, because you would also spend that extra 1500 per employee if you just save 10%, so 150 per employee; that's 1.5% of salary.

This is imho one of the best ranges we can assume for now. How much would that be across the whole SWE market?
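
The percentages in this comment can be reproduced with a short sketch. It assumes the "10k" is a monthly salary (the comment does not say, but the 15% figure only works that way); the variable names are hypothetical and the numbers are the commenter's estimates, not data.

```python
MONTHLY_SALARY = 10_000  # assumed monthly salary, the comment's "10k"
MONTHLY_CAP = 1_500      # the reported per-employee monthly cap

# The cap as a share of salary: the comment's "15%".
cap_share = MONTHLY_CAP / MONTHLY_SALARY

# The comment halves that share as a rough guess: "7.5%".
midpoint_gain = cap_share / 2

# A 10% saving measured against the cap itself: "150 per employee".
savings_at_10_percent = MONTHLY_CAP * 10 / 100

# That saving as a share of salary: "1.5%".
savings_share = savings_at_10_percent / MONTHLY_SALARY

print(f"cap share of salary:   {cap_share:.1%}")
print(f"halved estimate:       {midpoint_gain:.1%}")
print(f"10% of the cap:        ${savings_at_10_percent:.0f}")
print(f"as a share of salary:  {savings_share:.1%}")
```

So the range being proposed runs from about 1.5% to 7.5% of salary, under the assumption that the cap marks a break-even point, which the thread itself disputes.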

ilia-a

Oh that's actually really economical! I wonder if they're doing a lot on locally running models or managing a shared context or knowledge-base in some clever way, maybe just encouraging employees to be efficient and mindful.

...

> each employee

...

> per AI coding tool

...

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI

What on this godforsaken earth are all you rich idiots doing???

ChrisArchitect

20 hours ago

Uber’s COO says it’s getting harder to justify money spent on tokenmaxxing

https://news.ycombinator.com/item?id=48268871

Uber torches 2026 AI budget on Claude Code in four months

https://news.ycombinator.com/item?id=47976415

Corporate America Is Starting to Ration AI as Cost Skyrockets

https://news.ycombinator.com/item?id=48335388

cadamsdotcom

11 hours ago

Token costs rising because data center build costs must be paid down is not the whole picture. It is actually possible for token costs to fall despite the spending frenzy.

Naively you’d expect to always keep paying more - but growth in token usage is what changes the equation. Amortizing debt over an exponentially growing amount of spend across a growing customer base (not per customer) lets the debt be paid off & costs covered even as each individual’s spend stays steady or even goes down - but it only works if there’s growth beyond some threshold that makes the whole thing hang together. No one on the outside knows how much growth that is, and everyone chases maximum growth.

Jevons Paradox ends up being your friend as well as the friend of the inference providers as well as the friend of the inference financiers.

If it’s a strong enough effect, it has potential to cancel out all the circular financing too, and let everyone ride out the bursting of the bubble.
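
The amortization argument above can be illustrated with a toy model. Every number here is made up for illustration, the function name is hypothetical, and the model deliberately ignores that serving more tokens usually requires more capacity and therefore more debt.

```python
def debt_cost_per_million_tokens(annual_debt_payment: float,
                                 first_year_tokens_m: float,
                                 growth: float,
                                 years: int) -> list[float]:
    """Debt service per million tokens for each year.

    annual_debt_payment: fixed yearly payment on the build-out debt (assumed).
    first_year_tokens_m: tokens served in year one, in millions (assumed).
    growth:              year-over-year multiplier on token volume (assumed).
    years:               number of years to simulate.
    """
    costs = []
    volume = first_year_tokens_m
    for _ in range(years):
        # The same fixed payment is spread over however many tokens are sold.
        costs.append(annual_debt_payment / volume)
        volume *= growth
    return costs


# Illustrative only: volume doubles each year while the payment stays fixed.
costs = debt_cost_per_million_tokens(1_000_000.0, 1_000_000.0, 2.0, 4)
print(costs)  # [1.0, 0.5, 0.25, 0.125]
```

With flat volume (`growth=1.0`) the per-token debt burden never falls, which is the "growth beyond some threshold" condition in the comment.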

KnuthIsGod

11 hours ago

China will bring down the price per million tokens.

cloudking

19 hours ago

They are also beholden to enterprise pricing and can't use the subsidized consumer max plans.

nalekberov

10 hours ago

What is the point of allowing a developer to spend $18,000 a year on AI subscriptions? Can't they hire a decent developer who is capable of producing a quality solution faster? Clearly, these decisions are all made by the high-level management team.

I was recently talking to an HR person from a European company, and she goes: 'We are forcing our developers to use AI coding agents, but they are still kind of hesitant.' This person had never written a single line of code, nor did she know what software engineering is. For these people, using AI coding agents = faster delivery without breaking anything.

tmp10423288442

9 hours ago

It costs a lot more than $18,000 to hire a decent developer, pretty much anywhere in the world. Also using a model is better than another developer in some ways, because there aren't two independent minds trying to work with each other.

insane_dreamer

12 hours ago

I still have never hit a ceiling with my Claude Max $100 account, much less the Max $200 account. I'm not burning tokens needlessly, nor running it all day, but I do use CC almost daily. What are these devs doing that they are burning more than $1500 in tokens a month?

Maybe it's just me, but I still find that I really have to "shepherd" the AI and work with it to get the results I want. And I read every line of code added and challenge the model's logic. So that limits my token burning. Maybe these people are just "vibe-coding" without really checking the results?

era-epoch

10 hours ago

I would not be surprised if they have engineers vibecoding 2-3 projects each simultaneously, nonstop, on largely un-moderated review-suggest-iterate-test feedback loops.

All the code gets summarized and fed into their manager's agent contexts, probably duplicated several times across levels and departments, with some generated back-and-forth emails pinging around the org chart, eventually generating 2-3 long-winded reports that nobody will read chock full of generated visualizations that can all get consolidated into a generated slide deck that they'll show (maybe, at some point) to a handful of humans with more money than a human brain can conceptualize to demonstrate all of the innovation they're doing.

I am increasingly convinced that many of these companies are dead trees whose only function is to burn money lest it fall into the hands of the peasantry.

HDBaseT

9 hours ago

You are paying account pricing. Uber is paying API pricing.

Your $100/month plan is likely equivalent to thousands of dollars of API pricing. You are being subsidized by the companies using AI.

kshacker

7 hours ago

And this is why as the freeloader (includes me) volume goes up, they add more and more rules to constrain us.

insane_dreamer

6 hours ago

I wasn't aware the Max $100/user plan wasn't available to Enterprise; it used to be IIRC

human305893

6 hours ago

just don't care about the output. Produce more. Don't check the results.

sremani

18 hours ago

I have a strong conviction that companies will now choose tech stacks and programming languages based on 'tokenomics'. I am vibe coding using Clojure, a language I can read but cannot write, and I never hit the usage limits even when using the latest model on Claude. I have a similar experience with F#, which is a bit more verbose than Clojure but absolutely beats every OOP language, Python, TypeScript, etc.

The reason I use F# & Clojure is that they target the CLR and the JVM respectively, two popular enterprise stacks.

In my not so humble opinion, Lisp (Clojure) still remains the language of AI.

genericone

12 hours ago

Typescript is also hugely represented. My projects are TS in a big way, where I have no experience with it at all.

noncoml

10 hours ago

They want to replace employees with AI, then replace paid AI with unpaid AI.

Their wet dream was never automation. It was zero marginal cost labor. And that dream is starting to rot.

That's the most useful signal. Pre OpenAI mafia RAM pricing, that comes out to $250/month.