zkmon
3 hours ago
I take these "code-output" metrics with a pinch of salt. Of course, a machine can generate 1000 times more lines of code, just as a power loom does with cloth. However, the comparison with the power loom ends there.
How maintainable is this code output? I saw a single-page app (SPA) HTML file produced by a model that looked almost like assembly code. So if the code can only be maintained by a model, then an appropriate metric should be based on the long-term maintainability achieved, not on the instant generation of code.
a_imho
2 hours ago
My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.
As a dev, I very much subscribe to this line of thought, but I also have to admit that most business people would disagree.
order-matters
an hour ago
From a business perspective, the developer is the expert in lines of code, and the assumption is that this expertise should decide whether a line of code is necessary. From this perspective, writing lines of code that do not need to be there is simply not doing your job. The finished product should have X lines of code.
So from a business standpoint, if equivalent expertise amongst staff is assumed, then productivity comes down to lines of code created. Just like how you might measure productivity of a warehouse employee by the number of items moved per hour. Of course if someone just throws things across the warehouse or moves things that don't need to be moved they will maximize this metric, but that would be doing the job wrong - which is not a productivity measurement problem. Admittedly, though, incentive structures and competition often make the two related.
The bigger issue to highlight, imo, is that the business side has no idea whether coders are doing the job sufficiently well, and this lack of understanding is amplified by the fact that productivity contribution varies wildly per line: some lines take much more work to conjure than others. The person they need to rely on to validate this difference in each instance is the same person who is responsible for writing the lines. So there is a catch-22 on the business side: an unproductive employee can claim productivity no matter what the measurement is.
If the variance in work required per line could be understood by the business side, then it could be managed for. I used to manage productivity metrics for a medical coding company, where some charts are denser and harder to code than others. I did not know how to code a medical chart, but I could still manage productivity by charts per hour while keeping this caveat in mind.
The point isn't to use the productivity metric as a one-stop shop for promoting and firing people, but as a filter for attention, where all the middle-of-the-pack stuff will more or less even out and not require much direct attention. You then just need to understand how the average difficulty per item varies by product or project.
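A minimal sketch of the "filter for attention" idea, assuming throughput is logged per worker and project as (items, hours). Each rate is divided by its project's pooled average rate so that dense projects are not held against anyone, and only workers outside a tolerance band are surfaced. The record layout, the function names, the `band` parameter (25% here), and the sample numbers are all hypothetical, not something the commenter described.

```python
from statistics import mean

# Hypothetical record layout: (worker, project, items_completed, hours_worked).
Record = tuple[str, str, int, float]


def project_baselines(records: list[Record]) -> dict[str, float]:
    """Average items per hour for each project, pooled across all workers."""
    totals: dict[str, list[float]] = {}
    for _worker, project, items, hours in records:
        bucket = totals.setdefault(project, [0.0, 0.0])
        bucket[0] += items
        bucket[1] += hours
    return {p: items / hours for p, (items, hours) in totals.items()}


def needs_attention(records: list[Record], band: float = 0.25) -> dict[str, float]:
    """Return workers whose difficulty-adjusted rate lies outside 1 +/- band.

    Each record's items-per-hour is divided by its project's baseline, so a
    slow rate on a dense project is not held against the worker. Everyone
    inside the band (the middle of the pack) is left out of the result.
    """
    baselines = project_baselines(records)
    ratios: dict[str, list[float]] = {}
    for worker, project, items, hours in records:
        ratios.setdefault(worker, []).append((items / hours) / baselines[project])
    scores = {w: mean(r) for w, r in ratios.items()}
    return {w: s for w, s in scores.items() if abs(s - 1.0) > band}


# Made-up numbers: "dense" charts are slower to process than "light" ones.
records = [
    ("ana", "dense", 10, 10.0),
    ("ben", "dense", 12, 10.0),
    ("cal", "dense", 5, 10.0),
    ("ana", "light", 40, 10.0),
    ("cal", "light", 20, 10.0),
]
flagged = needs_attention(records)
print(flagged)  # flags "ben" (fast) and "cal" (slow); "ana" stays inside the band
```

The result is a short list of names to look at, not a verdict; the band width is a judgment call that decides how much of the middle of the pack is ignored.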
That said, maybe lines edited is still a step better, so that refactoring that reduces the size of the codebase can still be seen as productive: 1 point for each line deleted and 1 point for each line added.
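The "lines edited" score could be computed from `git diff --numstat` output, which lists added and deleted line counts per file as tab-separated columns (with `-` in place of both counts for binary files). This sketch, with a hypothetical function name and made-up sample output, contrasts the proposed score with plain net line change:

```python
def diff_scores(numstat: str) -> tuple[int, int]:
    """Return (lines_edited, net_lines) for `git diff --numstat` output.

    lines_edited gives 1 point per added line and 1 per deleted line, so a
    refactor that shrinks the code still scores; net_lines is added - deleted.
    """
    edited = net = 0
    for row in numstat.splitlines():
        if not row.strip():
            continue
        added, deleted, _path = row.split("\t", 2)
        if added == "-":  # binary file: git prints "-" instead of line counts
            continue
        edited += int(added) + int(deleted)
        net += int(added) - int(deleted)
    return edited, net


# Made-up sample output: a refactor, a README tweak, and a binary asset.
SAMPLE = "30\t120\tsrc/parser.py\n5\t0\tREADME.md\n-\t-\tassets/logo.png\n"
print(diff_scores(SAMPLE))  # (155, -85)
```

Here the refactor of `src/parser.py` earns 150 edited lines while shrinking the file by 90, which is the behavior the proposal is after; a net-LOC metric would have scored the same work as negative.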
I understand that every line should be viewed as a liability, not an asset, but it's the job of the hired expert to figure out how many need to exist. It's not the business side's job to manage that.
I wouldn't tell my foundation guys how much concrete to use, or my electrician how much wire to use, but if one team can handle more concrete per hour than another and both are qualified professionals, it really doesn't seem unreasonable to start conversations with the assumption that one is more productive than the other. Lazy people exist everywhere; it's usually a matter of how lazy people are rather than a matter of actual, full-earnest capability.
Talanes
24 minutes ago
"Just like how you might measure productivity of a warehouse employee by the number of items moved per hour. Of course if someone just throws things across the warehouse or moves things that don't need to be moved they will maximize this metric, but that would be doing the job wrong - which is not a productivity measurement problem."
I fail to see how a measurement that clearly doesn't measure what is actually produced isn't exactly a productivity measurement problem. If your measurement can be defeated by someone doing their job badly, what use is it?
hvb2
2 hours ago
Agreed, I stopped reading at that point. You can't expect to be taken seriously if you write a report and use LOC as your measure.
I feel like we humans try to separate things and keep them short. We don't do this because we think it's pretty; we do it so our human brains can still reason about a big system. As a result, LOC is a bad measure: being concise would then hurt your productivity?
dakshgupta
2 hours ago
We're careful not to draw any conclusions from LoC. The fact is that LoC is higher, which by itself is interesting. This could be a good or bad thing depending on code quality, which itself varied wildly from person to person and agent to agent.
mrdependable
2 hours ago
Can you expand on why it is interesting?
zed31726
2 hours ago
Because it's different. Change is important to track.
dakshgupta
2 hours ago
How would you measure code quality? Would persistence be a good measure?
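One way to operationalize "persistence" is a survival rate: the fraction of an author's lines still present some fixed horizon after they were written. The sketch assumes a per-line history is available; the `LineRecord` type is hypothetical (such data might be rebuilt from repeated `git blame` runs), as are the function name and the 90-day horizon. Lines too young to judge are excluded. It measures survival only, and code can survive for reasons that have nothing to do with quality.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LineRecord:
    """Hypothetical per-line history, e.g. rebuilt from repeated blame runs."""
    author: str
    added_at: datetime
    deleted_at: datetime | None = None  # None: the line still exists


def survival_rate(lines: list[LineRecord], author: str,
                  horizon: timedelta, now: datetime) -> float | None:
    """Fraction of `author`'s lines still present `horizon` after being added.

    Lines younger than `horizon` are skipped because their fate is unknown;
    returns None if no line is old enough to judge.
    """
    eligible = [rec for rec in lines
                if rec.author == author and rec.added_at + horizon <= now]
    if not eligible:
        return None
    survived = sum(1 for rec in eligible
                   if rec.deleted_at is None
                   or rec.deleted_at >= rec.added_at + horizon)
    return survived / len(eligible)


# Made-up history for one author, evaluated with a 90-day horizon.
now = datetime(2024, 6, 1)
lines = [
    LineRecord("a", datetime(2024, 1, 1)),                         # still present
    LineRecord("a", datetime(2024, 1, 1), datetime(2024, 1, 10)),  # deleted after 9 days
    LineRecord("a", datetime(2024, 2, 1), datetime(2024, 5, 15)),  # outlived the horizon
    LineRecord("a", datetime(2024, 5, 20)),                        # too young to judge
]
rate = survival_rate(lines, "a", timedelta(days=90), now)
print(rate)  # 2 of the 3 eligible lines survived
```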
scuff3d
an hour ago
That question has been baffling product managers, scrum masters, and C-suite assholes for decades. Along with how you measure engineering productivity.
epicureanideal
an hour ago
Bad code can persist because nobody wants to touch it.
Unfortunately I’m not sure there are good metrics.
scuff3d
an hour ago
It shouldn't be taken with a pinch of salt; it should be disregarded entirely. It's an utterly useless metric, and the fact that the report leads with it makes the entire thing suspect.
apercu
an hour ago
When I was first learning Perl after being a shell scripter/sysadmin, I produced a lot of code. Two or three years later, the same tasks took way less code. So is more code good?
Also, my anecdotal experience is that LLM code is flat-out wrong some of the time, and a significant percentage of it at that. I can't really quote a number, because I rarely do the same or a similar thing twice, but it's a double-digit percentage.