yurimo
4 hours ago
What I still can't understand is why is massive amount of code generated is a flex? I don't feel that software has gotten a lot better in past 3 years, only sloppier. It's surprising to me that people who know about reward hacking choose a simple objective like lines of code generated as a signal for quality. I'd argue you have to optimize for less lines generated as possible while secondary optimization should be readability for humans. I suspect it's not seen as a problem by providers because more lines generated means more tokens used and hence more billing put out on customers.
And if I am working on an existing codebase then isn't a good commit often a negative sum between added and removed lines? I don't want to bloat my codebase but make it more polished and elegant. After reading that I wonder if what they have done could have been accomplished for a far fewer LoC budget.
YZF
3 hours ago
Lines of code has always been a terrible metric. But all else being equal it is a measure. If all else is not equal, which is usually the case, then it's not.
A lot of the focus has been on AI recently.
Three years ago we didn't have software where a non-software engineer can describe what they want in English and get working (-ish) software generated by other software? Is that not "software has gotten a lot better"?
Other than that I'm not sure how we measure "software has gotten better". New applications? More features? How do we measure sloppier? Is Google Maps suddenly taking you the wrong way more often? I'm not really doubting your subjective experience but seriously how do tell? I mean a doc is a doc and a spreadsheet is a spreadsheet.
We're also only about 10 months into models that are powerful enough to potentially make a bigger difference and we are still figuring out how to use them best.
arcticbull
an hour ago
I also can't help but notice they didn't mention how many tokens were burned, or how much that translates to in terms of cost over the 5 months at enterprise AI prices. I'm going to guess this wasn't a cheap demo.
trollbridge
3 hours ago
The "lines of code" at this point are basically the same thing as binary code that comes out of a compiler - something you almost never look at and certainly won't try to touch by hand.
The actual "code" is everything driving the harness.
The current problem for this is that the harness is not (yet) deterministic, so it's sort of like having a compiler where your output program works slightly differently every build, and then the compiler tries to just patch the binary programs when you recompile to minimise this problem, or even worse, disassembles the whole thing to figure out what it does, makes the chance, and then recompiles it.
yurimo
an hour ago
I think the telling part is in this line:
> Because the repository is entirely agent-generated, it’s optimized first for Codex’s legibility
I asked a question from a perspective of a human engineer, as in, I will have to read the code and understand, fix it once it breaks. OpenAI approach is opposite, even if it is breaking it is the agent that will be doing the fixing, millions of lines and inelegant designs don't matter because human readability doesn't matter. In any case you use more tokens so you fork over more money.
I will say, however, that IMHO there is objectively bad and good code in terms what it can do and performance, if I can do the same thing in 50 lines as opposed to 1000 lines, this difference still matters for the model. Smaller context usage, better approach that informs downstream generation.
ArtRichards
an hour ago
This is the part I think we will see become more relevant.
I created docs-cli (pypi) to manage the index of specs as source code: the framework that goes with it will first create tests for as much as it can, so reproducability becomes the goal, not readability.
B-Con
an hour ago
People want to do X, so the metric is how much X can be done.
Everyone is over-complicating the explanation. The answer for "why are we fixating on this bad metric" is almost always the same pattern.
Broad audiences need simple metrics to talk about. If the metric itself requires nuance, it's hard to communicate and hard to reason about. It's easier to push the need for nuance from understanding the metric itself down the road to where the metric is applied, which allows everyone to ignore it in immediate conversation.
crdrost
4 hours ago
Yeah so all of personal computing—text editing, SVG antialiasing, etc, fits in 20,000 LOC (VPRI's STEPS project) so a million lines of code is 50 reïnventions of personal computing. BUT: it is unlikely that humans would have solved this problem in 20 kLOC. Sussman said “we really don't know how to compute!” as his talk title and LLMs had to ossify some pre-existing voice as the forever programming habitus and it chose a persona that doesn't know how to program—because we don't —and now we are stuck with it. Claude is our tickets, our implementations, our documentation... And if you tell it “hey the node role should not have those permissions, that should be a service account” it will happily do the right thing, but it has no intrinsic sense of taste and the error message it's trying to clear just says “the node role doesn't have that permission and the system prompt says “keep it short, stupid ” and graybeards might be our last bulwark.
zahlman
2 hours ago
> VPRI's STEPS project
The what now? Search engines failed me here.
linguae
an hour ago
This is a description of Alan Kay’s STEPS project:
https://worrydream.com/refs/Kay_2007_-_STEPS_2007_Progress_R...
This is the final report:
dotancohen
43 minutes ago
I've heard it said that measuring productivity of a software developer by lines of code added, is akin to measuring the productivity of an aerospace engineer by mass added.
It is a metric. It is often not a good metric. But it is easy to measure.
Sparkyte
4 minutes ago
Indeed. The more you do to add complexity or generate without ensuring it isn't pointless code adds bloat. It is still probably more MVP than junk developers will do, but not ever anywhere near as great as someone who studies a programming language.
bonigv
2 hours ago
It is a very valid question. My intution (no grounding) is to the model training. Optimizations traditionally have worked well in human wrote software with either experience of the developer , usage of architectural patterns or a second ir third pass of fine tuning. In case of model written code - (e/p one token at a time), only possible orchitectural optimization is either with a strict guardrail on patterns to use for a specific implementation OR by giving a second or third optmization path. All of which burns more tokens, but can lead to better software.
dumbdumb125
4 hours ago
It's a huge flex if the alternative is no code at all. Reward hacking aside, LOC resonates with me in the sense that I've seen 10+ projects to fruition that wouldn't have even begun without an agentic harness and an LLM.
It's like the difference between doing stock price predictions with binary "up" or "down" histories and trying to figure out how to normalize actual price histories (basically impossible). The binary work gives a well-defined signal.
Npovview
an hour ago
Don't compare human loc with machine loc.
Compare machine to machine (as these headlines come) and discount that by a factor.
yurimo
an hour ago
You can't really do that here because one of the key arguments for this, as people in the thread focus on is "1/10th of time" estimate, the comparison with humans is here already, albeit it is just an estimate and no actual comparison has been done.
This is a problem of conflicting incentives that exists today in my opinion. Companies will market greater human-AI collaboration in science and engineering but focus on releasing things like this where it is clear that downstream goal is complete agent ownership over the product, from inception to testing to monitoring. Maybe the speculative future agents will use their own very efficient language to code that won't be readable for people at all. They focus on agent code being readable by agent in the article, as you've said. But in my mind in at least near future, there is a case where your prod will break, you won't be able to understand it or the attempted fixes. Maybe agent will fail to fix it at all and start a massive rewrite. In any case is this different from kicking technical debt down the road along with worse interpretability of what you have built?
I do think there is a way where agent can write great solid code that we can read, but with the way LLMs are built this requires something new in terms of reward that accounts for "taste" and constant refinement so it might take more than 1/10th of a time to produce something good.
cpard
3 hours ago
I don’t think the flex here is the amount of code alone. Their goal is to show that AI can improve productivity, the number of lines is just the proxy to that. This article is a marketing piece after all.
Now someone can argue that lines of code are not a good proxy of engineering productivity, but I wouldn’t be surprised if the audience they target with this content is not the HN commenters of this thread.
threethirtytwo
2 hours ago
Because sloppy code and doesn't matter. He wrote he completed it in 1/10th time. That's equivalent to 1/10th the engineers.
That is a business win. That is really all that matters in capitalism.
The flex is a direct insult to your face. He is shitting on the faces of all software engineers (me included). It is equivalent to saying we don't need you to code anymore. One man can produce 10x the code.
So why am i voting him up even though he's shitting on my face? Because what he says is true. I value honesty and people who say things like it is. Yes my identity as a software engineer is getting dismantled before my very eyes. But the solution to this problem isn't some delusional statement about not understanding what he's flexing about. We're not stupid. Everyone on this thread understands his flex. The difference is some people like you don't want to understand it.
Like seriously. He literally wrote it was completed in 1/10th of the time and you expect me to believe that YOU don't know what HE is flexing about? Be real. You're not stupid.
estetlinus
2 hours ago
The real flex is delivering it in one-tenth of the time. Mentioning lines of code is mostly noise.
I’ve worked with 20-year-old codebases and products that grew organically over decades and still sit well below a million lines of code. Using LOC as some kind of health or success metric makes me more suspicious than impressed.