Aurornis
11 hours ago
The key point for me was not the rewrite in Go or even the use of AI, it was that they started with this architecture:
> The reference implementation is JavaScript, whereas our pipeline is in Go. So for years we’ve been running a fleet of jsonata-js pods on Kubernetes - Node.js processes that our Go services call over RPC. That meant that for every event (and expression) we had to serialize, send over the network, evaluate, serialize the result, and finally send it back.
> This was costing us ~$300K/year in compute, and the number kept growing as more customers and detection rules were added.
For something so core to the business, I'm baffled that they let it get to the point where it was costing $300K per year.
The fact that this only took $400 of Claude tokens to completely rewrite makes it even more baffling. I can make $400 of Claude tokens disappear quickly in a large codebase. If they rewrote the entire thing with $400 of Claude tokens, it couldn't have been that big: well within the range of something that engineers could have migrated by hand in a reasonable time. Those same engineers will now have to review and understand all of the AI-generated code, and then improve it, which will take time too.
I don't know what to think. These blog articles are supposed to be a showcase of engineering expertise, but bragging about having AI vibecode a replacement for a critical part of your system that was questionably designed and costing as much as a fully-loaded FTE per year raises a lot of other questions.
ezst
3 hours ago
>> This was costing us ~$300K/year in compute, and the number kept growing as more customers and detection rules were added.
> For something so core to the business, I'm baffled that they let it get to the point where it was costing $300K per year.
And this, this is the core/true/insightful story the executives will never hear about.
sigmoid10
2 hours ago
Eh. If you get into enterprise business, this is the accepted management style. AI will now mix this up a little, but before, you basically needed to ask whether you wanted to blow $300k on developer salaries to maybe fix something that was already working and generating money, or add more features to the roadmap you can pin on your chest. Scaling infrastructure is the best choice for 90% of managers, especially since they are not the ones paying for it and this kind of technical debt doesn't matter on typical bonus-check timeframes.
xnx
42 minutes ago
Managers love big cloud spend so the vendors take them on fancy golf trips ... er ... "Conferences".
DrBazza
21 minutes ago
Most of the other replies to this hit the nail on the head.
A human writes some poor but working code that's supposed to be a demo, and it goes to production 9 times out of 10.
Then it becomes critical infrastructure.
Then management cannot understand why something that's working needs a rewrite, because there are no tangible numbers attached to it. The timeless classic developer problem.
We were here ^^^^ up to 2024-2025.
Now, with LLMs, you can at least come up with a vibe-coded, likely correct, likely faster solution in a morning that management won't moan at you about.
faangguyindia
17 minutes ago
I've worked at many companies.
Kubernetes, App Engine, and Beanstalk are all huge money sinks.
Managed services like Cloud Datastore and Firestore all tend to accrue a lot of cost once you have a decent-sized app.
These are quick to start with when you don't have any traffic. Once traffic comes, the cost goes up drastically.
You can always do better running your own services.
hansvm
10 hours ago
I mostly agree, but it's more appropriate to weigh contributions against an FTE's output rather than their input. If I have a $10m/yr feature I'm fleshing out now and a few more lined up afterward, it's often not worth the time to properly handle any minor $300k/yr boondoggle. It's only worth comparing to an FTE's fully loaded cost when you're actually able to hire to fix it, and that's trickier since it takes time away from the core team producing those actually valuable features and tends to result in slower progress from large-team overhead even after onboarding. Plus, even if you could hire to fix it, wouldn't you want them to work on those more valuable features first?
Aurornis
10 hours ago
They were running a big Kubernetes infrastructure to handle all of these RPC calls.
That takes a lot of engineer hours to set up and maintain. This architecture didn't just happen, it took a lot of FTE hours to get it working and keep it that way.
kitd
4 hours ago
But that k8s engineer's cost is spread over all the functions the cluster is doing, not just the rpc setup.
hansvm
10 hours ago
Yeah, the situation from TFA doesn't make a lot of sense; I was just highlighting that it's not as clear-cut as "costs > 1 FTE => fix it."
arjie
6 hours ago
Kube is trivial to run. You hit a few switches on GKE/EKS and then a few simple configs. It doesn't take very many engineer hours to run. Infrastructure these days is trivial to operate. As an example, I run a datacenter cluster myself for a micro-SaaS in the process of SOC2 Type 2 compliance. The infra itself is pretty reliable. I had to run some power-kill sims before I traveled and it came back A+. With GKE/EKS this is even easier.
Over the years of running these I think the key is to keep the cluster config manual and then you just deploy your YAMLs from a repo with hydration of secrets or whatever.
cryptonym
an hour ago
The cost is not just tokens: you need an actual human contributor looking into the issue, prompting, checking output, validating, deploying, and so on. It's difficult to compute the actual AI ROI. If $300K didn't matter without AI, it probably still doesn't matter with AI.
otabdeveloper4
an hour ago
> it's often not worth the time to properly handle any minor $300k/yr boondoggle
No, because you can use that 300k to solve some real problem instead of literally lighting it on fire.
(Hell, just give employees avocado toasts or pingpong tables instead.)
CalRobert
31 minutes ago
"For something so core to the business, I'm baffled that they let it get to the point where it was costing $300K per year."
You build something that's a dirty hack but it works, then your company grows, and nobody ever gets around to rebuilding it properly.
I was at a place spending over $4 million a year on Redshift, basically because someone had slapped together some bad (but effective!) queries when the company was new. Then they grew, and so many things had been built on top that they were terrified to touch anything underneath.
andai
10 hours ago
Yeah, it's like those posts "we made it 5,000x faster by actually thinking about what the code is doing."
therealdrag0
7 hours ago
Exactly. Reddit did one last year like: “We migrated from python to golang and fixed a bunch of non-performant SQL queries. It was so fast, isn’t golang awesome?”
selcuka
4 hours ago
I was once asked to migrate a Microsoft Access application to C#/MS SQL Server because it was too slow. I just added a few database indexes to make it an order of magnitude faster.
(They still wanted to go ahead with the migration, but that's a different story.)
anon7000
6 hours ago
I have about a dozen projects I'd love to tackle in this vein. (Not as low-hanging fruit, but enough effort that they're languishing in the backlog.) We'll actually be able to get to more of those projects with agents and good specs.
SkyPuncher
6 hours ago
In my experience, a lot of these types of migrations aren't incredibly deep in terms of actual code being written. It's about being able to assess all of the affected facets accurately. Once that's all mapped out, it's pretty straightforward to migrate.
hobofan
10 hours ago
> If they rewrote the entire thing with $400 of Claude tokens it couldn't have been that big.
The original is ~10k lines of JS + a few hundred for a test harness. You can probably oneshot this with a $20/month Codex subscription and not even use up your daily allowance.
raincole
2 hours ago
> Those same engineers will have to review and understand all of the AI-generated code now and then improve it, which will take time too.
Will they? What makes you think so? If no one cared to improve it when it cost $300k/year, no one will care about it now that it's cheaper.
heavyset_go
6 hours ago
I wonder how much it would have cost them if they weren't paying cloud rates for all of that, and they kept the same general inefficient architecture, sans the Kubernetes bloat.
Doubt they'd have a blog post to write about that, though.
andersmurphy
2 hours ago
Wonder if the real value of LLMs/AI is similar to microservices in that it solves an organisational/culture problem.
In this case AI allowed the developer to make a change that the organisation would not have allowed otherwise. Regular rewrites don't let you signal to investors that you are AI ready/ascendant/agentic (whatever the latest AI hype term is), so they would have been blocked. But an AI rewrite does.
deckar01
9 hours ago
You aren’t accounting for managerial politics. A product manager won’t gamble on a large project to lower operating cost, when their bonus is based on customer acquisition metrics.
parpfish
6 hours ago
The original author said he built this on the weekend, so my assumption is that this was something engineers had advocated for before but were shut down because management wanted them elsewhere.
The use of ai agents allowed them to shrink the problem down to the point where it was small enough to fit in their free time and not interrupt their assigned work.
ahtihn
an hour ago
Why are engineers spending their weekend on saving their company money, especially if the company clearly doesn't care to allocate resources to the problem?
I get that it's fun and there's personal satisfaction in it, but it just reinforces to management that they don't need to care about allocating resources to optimisation, the problem will just take care of itself for free.
swiftcoder
13 minutes ago
At some point it's hard not to care about the work you do everyday. And if you care, then you are going to find yourself donating a Saturday here or there to solving big DevEx papercuts that you can't convince management to care about.
Should it be this way? No. Is it this way in practice? Unfortunately often.
usrusr
2 hours ago
A bit sarcastic, but still too close to reality for comfort:
For the managers, it's about a bonus. For engineers it's the existential question of future hirability: every future employer will love the candidate with experience operating a $500k/a cluster. The guy who wrote a library that got linked into a service... yeah, that's the kind they already have; not interested, move along.
jackkinsella
2 hours ago
A more charitable explanation would be that they were under product pressure for more features and were never given the slack time to even explore this angle. Happens a lot.
hiyer
8 hours ago
I was thinking the same: if JSONata was a priority for them, why not choose a language with good support for it, like JS or Java? OTOH, if the development language was a priority, why not choose a format that is well supported in it?
otherme123
4 hours ago
>If they rewrote the entire thing with $400 of Claude tokens it couldn't have been that big.
It was "A few iterations and some 7 hours later - 13,000 lines of Go with 1,778 passing test cases."
mewpmewp2
2 hours ago
Yeah, that checks out to me; 1 hour of active Claude Code usage has been around $50 for me.
arjie
6 hours ago
I've seen it happen and it's usually just Normalization of Deviance in an organization that is focusing on something else. Someone needs some kind of functionality and Kube makes creating services trivial so they launch it into a different service[0]. Over time, while people are working on important things this thing occasionally has load issues so someone goes and bumps the maxReplicas up periodically. Eventually you come back to it a year later and maxReplicas is at 24 and you've removed the code paths for almost everything that is hitting the server except some inexplicable hot-loop.
Then you look at it and you're like "Jesus! What the fuck, I meant to have this be a stop-gap". I've done as bad when at near 100% duty-cycle. Often you're targeting just the primary thing that's blocking some revenue and if you get caught yak-shaving you're screwed. A year ago, I did one of these things because I was in the middle of two projects that were blocking a potential hundred-million in revenue.
A year down the line, Claude Opus 4.6 could have live-solved it. But Claude of that time would have required some time and attention and I was doing something else.
That engineering team is some 15 people strong and the company is at $400m+ revenue. If you saw the code, you'd wonder why anyone would have done something like this.
0: I once did this because some inscrutable code/library was tying us to an old runtime so I just encapsulated it in HTTP and moved it into a service.
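The "bump maxReplicas" band-aid described above is usually a one-line edit to an HPA manifest along these lines (all names and numbers here are illustrative, not from the post):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: jsonata-js
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: jsonata-js
  minReplicas: 2
  maxReplicas: 24   # quietly ratcheted up over a year of load incidents
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Each bump is a trivially reviewable diff, which is exactly why it keeps winning over the rewrite.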
cogogo
10 hours ago
Think this is pure piggyback marketing on what cloudflare did with next.js. In my experience a company that raised $30MM a month ago is extremely unlikely to be investing energy in cost rationalization/optimization.
edit: saw the total raise not the incremental 30MM
hparadiz
6 hours ago
I've been refactoring stuff with a $20 ChatGPT account.
pepa65
6 hours ago
I've been refactoring stuff with anonymous ChatGPT usage..!
antonvs
5 hours ago
Completely agree. We have > $50m from our most recent funding round, and even a cloud expense of $50k/year (in our case for storage) is considered a high priority to address. If it was $300k, our CTO would be running around with a butane torch setting everyone’s hair on fire until the problem was resolved.
But, venture funding does create a lot of weird inefficiencies which vary from company to company.
mewpmewp2
2 hours ago
But what is your income? How important this is to address should be weighed against that, against current profits (if any), and against whether you have to be profitable right now.
neya
5 hours ago
No offence, but inexperienced JS fanatics always do this because of some weird affection they have for the language itself. Otherwise, even a decently qualified CTO would have chosen to keep everything in Go from the beginning, or at least would not have waited until they were bleeding $300k. JS is also the worst possible language choice for this problem. So it definitely sounds like a bunch of script kiddies with fancy titles bought with VC money rather than actual experience.
mewpmewp2
2 hours ago
What if you are about to get a potentially really high paying customer, but they might go elsewhere unless you deliver X feature immediately and it is so much quicker to do it with the JS script?
neya
2 hours ago
Given that the potential high-paying customer is just that - a potential - one must always keep long-term platform stability in mind, as it affects every other customer, not just this one. Hence it boils down to opportunity cost and setting the right expectations:
We can deliver feature X for you - incrementally broken down into sub-features x1, x2, x3 over a period of Y weeks/months
The other way to do this would be to build a custom integration on top of your existing APIs and beta test it alongside the customer, bill them accordingly and eventually merge the changes into the main platform, once you can guarantee stability.
But, both these methods will sound boring to VC funded companies as they are under constant pressure from VCs to show something in their weekly graphs - meaningful or not.
mewpmewp2
2 hours ago
The customer could be on the fence between you and a competitor and this customer could be potentially paying 10x more than all your existing customers together. It could make or break your company. They would go to the competitor immediately if you make it complicated for them and have delays with the setup. What do you do then?
neya
an hour ago
Sounds like a bad business model then - if you have to depend on one single customer to make or break your company.