Until we solve the validation problem, none of this stuff is going to be more than flexes. We can automate code review, set up analytic guardrails, etc., so that reading the code isn't important, and people have been doing that for more than six months now. You still need a human who knows the system to validate that what was built matches the intent of the spec.
There are higher- and lower-leverage ways to do that, for instance reviewing tests and QA'ing the software by using it rather than reading the original code, but you can't get away from it entirely.
This obviously depends on what you're trying to achieve, but it's worth mentioning that there are languages designed for formal proofs and static analysis against a spec, and I suspect we're currently underutilizing them (because historically they weren't very fun to write, but if everything is just tokens then who cares).
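To make that concrete, here's a toy sketch in Lean 4 (purely illustrative; the function and theorem are mine, not from the article): the spec is the theorem statement, and the implementation isn't "done" until the proof checker accepts it, which is exactly the kind of mechanical validation the parent comment says is missing.

    -- Toy example: the "spec" is a theorem. An agent can grind out
    -- proof attempts until the checker accepts one; no human review
    -- of the proof itself is needed for correctness.
    def myMax (a b : Nat) : Nat :=
      if a ≤ b then b else a

    -- Spec: the result is at least as large as the first argument.
    theorem myMax_ge_left (a b : Nat) : a ≤ myMax a b := by
      unfold myMax
      split
      · assumption           -- case a ≤ b: goal is a ≤ b
      · exact Nat.le_refl a  -- case ¬(a ≤ b): goal is a ≤ a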
And "define the spec concretely" (and how to exploit emergent behaviors) becomes the new definition of what programming is.
Did you read the article?
>StrongDM’s answer was inspired by Scenario testing (Cem Kaner, 2003).
> If you haven’t spent at least $1,000 on tokens today per human engineer, your software factory has room for improvement
At that point, outside of FAANG and their salaries, you are spending more on AI than you are on your humans. And they consider that level of spend to be a metric in and of itself. I'm kinda shocked the rest of the article just glossed over that one. It seems to be a breakdown of the entire vision of AI-driven coding. I mean, sure, the vendors would love it if everyone's salary budget just got shifted over to their revenue, but such a world is absolutely not my goal.
This is an interesting point but if I may offer a different perspective:
Assuming 20 working days a month, that's $20k × 12 = $240k a year. So about a fresh grad's TC at FAANG.
Now, I've worked with many junior to mid-level SDEs and sadly 80% of them don't do a better job than Claude. (I've also worked with staff-level SDEs who write worse code than AI, but they usually offset that with domain knowledge and TL responsibilities.)
I do see AI transforming software engineering into even more of a pyramid, with very few humans on top.
Important too: a fully loaded employee costs the company far more than the salary the employee actually receives. That tips the balancing point towards $120k salaries, which is well into non-FAANG territory.
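For anyone who wants to poke at that break-even math, here's a quick back-of-envelope in Python (the fully-loaded multipliers are common rules of thumb, not figures from the article):

    # Back-of-envelope: annual token spend vs. fully loaded engineer cost.
    # The multipliers below are rough assumptions, not measured data.
    token_spend_per_day = 1_000      # the article's $1k/day figure
    working_days_per_month = 20
    annual_token_spend = token_spend_per_day * working_days_per_month * 12
    print(annual_token_spend)        # 240000

    # A fully loaded employee often costs ~1.5-2x their base salary
    # (benefits, payroll taxes, office space, equipment, etc.).
    for multiplier in (1.5, 2.0):
        print(f"break-even salary at {multiplier}x: {annual_token_spend / multiplier:,.0f}")
        # -> 160,000 and 120,000

So at a 2x multiplier the $1k/day spend matches a $120k salary, which lines up with the parent's point.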
It reminds me of how people talk about the proper ratio of employees to supervisor. (With AI being the employees in this example.)
It would also depend on speed of execution: if you can get the same amount of work done in 5 days for $5k in tokens, versus spending a month and $5k on a human, the math makes more sense.
You won't know which path has the larger long-term costs, though. For example, what if the AI version costs 10x as much to run?
$1,000 is maybe $5 per workday. I measure my own usage and am on the way to $6,000 for a full year. I'm still at the stage where I like to look at the code I produce, but I do believe we're heading to a state of software development where one day we won't need to.
Maybe read that quote again. The figure is $1,000 per day.
The quote is "if you haven't spent $1,000 per dev today", which sounds more like: if you haven't reached this point, you don't have enough experience yet, so keep going.
At least that's how I read the quote.
If the output is (dis)proportionately larger, the cost trade-off might be the right call.
And it might be that tokens become cheaper.
> That idea of treating scenarios as holdout sets—used to evaluate the software but not stored where the coding agents can see them—is fascinating. It imitates aggressive testing by an external QA team—an expensive but highly effective way of ensuring quality in traditional software.
This is one of the clearest takes I've seen that starts to convince me I might one day be able to trust code I haven't reviewed.
The whole idea of letting an AI write its own tests was problematic because models are so focused on "success" that `assert True` becomes appealing. But orchestrating teams of agents that are incentivized to build, alongside teams of agents that are incentivized to find bugs and problematic tests, is fascinating.
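As a rough sketch of what a holdout-scenario harness might look like (this is my own guess at the shape, in Python, not StrongDM's actual setup; the directory layout and JSON fields are invented):

    # Hypothetical holdout-scenario runner. Scenario files live outside
    # the workspace the coding agents can read, and only a pass/fail
    # summary flows back, so agents can't overfit to (or game) the tests.
    import json
    import subprocess
    import sys
    from pathlib import Path

    HOLDOUT_DIR = Path.home() / ".holdout-scenarios"  # not inside the repo

    def run_scenario(scenario: dict) -> bool:
        """Drive the built software the way a user would and check the outcome."""
        result = subprocess.run(
            [scenario["command"], *scenario["args"]],
            capture_output=True, text=True, timeout=60,
        )
        return scenario["expected_output"] in result.stdout

    def main() -> None:
        scenarios = [json.loads(p.read_text()) for p in sorted(HOLDOUT_DIR.glob("*.json"))]
        failed = [s["name"] for s in scenarios if not run_scenario(s)]
        # Report counts only; never leak scenario contents to the agents.
        print(f"{len(scenarios) - len(failed)}/{len(scenarios)} scenarios passed")
        sys.exit(1 if failed else 0)

    if __name__ == "__main__":
        main()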
I'm quite curious to see where this goes, and more motivated than ever to start setting up my own agents.
Question for people who are already doing this: How much are you spending on tokens?
That line about spending $1,000 a day on tokens is pretty off-putting. For commercial teams it's an easy calculation. It's also depressing to think about what this means for open source. I sure can't afford to spend $1,000 supporting teams of agents to continue my open source work.
Re: $1k/day on tokens - you can also build a local rig, nothing "fancy". There was a recent thread here re: the utility of local models, even on not-so-fancy hardware. Agents were a big part of it - you just set a task and it's done at some point, while you sleep, or you're off somewhere, or working on something else entirely, or reading a book, or whatever. Turn off notifications to avoid context switches.
Check it:
https://news.ycombinator.com/item?id=46838946
Do you know what those holdout tests should look like before thoroughly iterating on the problem?
I think people are burning money on tokens letting these things fumble about until they arrive at some working set of files.
I'm staying in the loop more than that, building up rather than tuning out.
> As I understood it the trick was effectively to dump the full public API documentation of one of those services into their agent harness and have it build an imitation of that API, as a self-contained Go binary. They could then have it build a simplified UI over the top to help complete the simulation.
This is still the same problem -- just pushed back a layer. If the generated API is wrong, the QA outcomes will be wrong too. Also, QA'ing things is an effective way to ensure that they work _after_ they've been reviewed by an engineer. A QA tester is not going to test for a vulnerability like SQL injection unless they're guided by engineering judgement, which comes from an understanding of the properties of the code under test.
The output is also essentially the definition of a derivative work, so it's probably not legally defensible (not that that's ever been a concern with LLMs).
On the cxdb “product” page one reason they give against rolling your own is that it would be “months of work”. Slipped into an archaic off-brand mindset there, no?
We make this great, just don't use it to build the same thing we offer
Heat death of the SaaSiverse
I can't tell if this is genius or terrifying given what their software does. Probably a bit of both.
I wonder what the security teams at companies that use StrongDM will think about this.
I doubt this would be allowed in regulated industries like healthcare
Serious question: what's keeping a competitor from doing the same thing and doing it better than you?
That's a genuine problem now. If you launch a new feature and your competition can ship their own copy a few hours later the competitive dynamics get really challenging!
My hunch is that the thing that's going to matter is network effects and other forms of soft lock-in. Features alone won't cut it - you need to build something where value accumulates for your users over time in a way that discourages them from leaving.
The interesting part is that both of those things need time to get started.
If I launch a new product and competitors pop up 4 hours later, there's no time for network effects or lock-in to develop.
I'm guessing what's really going to be needed is something that can't simply be copied: non-public data, business contracts, something outside of software.
Marketing and brand are still strong, though I personally hope for a world where business is more indie and less winner-take-all.
You can see the first waves of this trend on HN /new.
Can you disclose the number of Substack subscriptions, and whether there is an unusual number of bulk subscriptions from certain entities?
I recently passed 40,000 but my Substack is free so it's not a revenue source for me. I haven't really looked at who they are - at some point it would be interesting to export the CSV of the subscribers and count by domains, I guess.
My content revenue comes from ads on my blog via https://www.ethicalads.io/ - rarely more than $1,000 in a given month - and sponsors on GitHub: https://github.com/sponsors/simonw - which is adding up to quite good money now. Those people get my sponsors-only monthly newsletter which looks like this: https://gist.github.com/simonw/13e595a236218afce002e9aeafd75... - it's effectively the edited highlights from my blog because a lot of people are too busy to read everything I put out there!
I try to keep my disclosures updated on the about page of my blog: https://simonwillison.net/about/#disclosures