al_borland
4 months ago
LOC is often a rough approximation for complexity. We once had an intern who made some useful things, but he didn’t know how to break anything down. One of them was a 1,000-line Perl script, all as one function. I asked if it could be broken down into something more maintainable and he said no. There were several projects like this.
Knowing at a high level what needed to happen to accomplish what the code did, I know for a fact it could have and should have been broken down more. However, because of the complexity of what he wrote, no one had the patience to go through it. When he left, the code got thrown away.
While 10 LOC vs 50 LOC doesn’t matter, when commas enter the number, it’s a safe bet that things have gone off the rails.
xorcist
4 months ago
After having this discussion in so many professional settings, I'm starting to think that this is just something that divides people. Maybe our brains are wired differently. What's important for me is that the call graph is simple. Second most important is that data structures are easy, or at least fits the problem. The amount of lines in a function is a distant third.
It could have been me behind one of those thousand-line functions. Some things lend themselves naturally to this style, such as interpreter-like loops and imperative business logic. There's always someone who argues that splitting up the code with function calls would somehow make it more "maintainable".
I'm reluctant to do it, because the code is identical, only non-linear which means harder to understand. Combine this with global symbols and it's a recipe for accidental complexity. This is so obvious that I for a long time thought people with the opposite opinion were just cargo-culting some OOP "best practice" mess which has doomed so many projects, but maybe they're just wired differently.
stirfish
4 months ago
Do you write comments in your code? If you ever have a long function with a comment like
....
// Here's where we apply coupon codes
...
It makes sense to me to break that out into its own function, preferably one with no side effects that I can write a test for if I need to. I can give it a name related to the business logic. Inevitably there will come a new special case where, for example, Gold Members get to use double coupon codes on Thursday. You'll start by looking in `checkout()`, where you'll see either 1000 lines of other special cases, or maybe 30 lines with function calls like `foo = applyCouponCode(shoppingCart, couponCode)`.
For me, it's easier to see where to start with the latter.
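A minimal sketch of the latter shape, using the hypothetical names from the example above (the discount rule and coupon code are made up for illustration):

```python
# Illustrative only: applyCouponCode and checkout follow the names in the
# example above; "SAVE10" and the 10% discount are invented for the sketch.

def applyCouponCode(shoppingCart, couponCode):
    """Pure helper: no side effects, easy to test in isolation."""
    discount = 0.10 if couponCode == "SAVE10" else 0.0
    return round(shoppingCart["total"] * (1 - discount), 2)

def checkout(shoppingCart, couponCode):
    # Each special case sits behind a named call, so the place to add
    # "double coupons for Gold Members on Thursday" is easy to find.
    total = applyCouponCode(shoppingCart, couponCode)
    return total
```

The point is not the coupon math but that `checkout()` stays a short, readable list of named steps.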
Kinrany
4 months ago
Comments are perfectly fine on their own. It's common that a block of code can be given a higher-level explanation in a comment but wouldn't make a good standalone function, because it depends too much on its context and so has no realistic chance of ever being used a second time.
stirfish
4 months ago
It's okay if a function is only used once. It's worth it to separate it from the context it doesn't need.
Well, twice, including the test. If a block is complex enough to warrant a high level explanation, you might as well capture that intent in a test too.
Edit: in my example, `applyCouponCode` takes in a `shoppingCart` and a `couponCode`, so you know that's all you need to apply a coupon code. You'd change it to something like `applyCouponCode(shoppingCart, couponCode, user.memberStatus)` so you can tell at a glance that `memberStatus` has something to do with how coupons work. You wouldn't want to pass in the whole `user`, because giving the function more than it needs makes it hard to infer what context the function needs and, therefore, what it does.
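A sketch of that narrowed signature (the discount rule and status values are invented for illustration):

```python
# Hypothetical: memberStatus is passed explicitly rather than the whole
# user object, so the signature itself documents what coupons depend on.

def applyCouponCode(shoppingCart, couponCode, memberStatus):
    discount = 0.10 if couponCode == "SAVE10" else 0.0
    if memberStatus == "gold":
        discount *= 2  # illustrative special case: gold members get double
    return round(shoppingCart["total"] * (1 - discount), 2)
```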
Kinrany
3 months ago
Tests are code. If the function is easy to test then it's more likely to really be worth having.
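A small sketch of what that looks like in practice, with hypothetical names: once the block is its own function, the comment's intent becomes an executable check.

```python
# Illustrative helper plus a plain assert-based test for it.

def apply_coupon_code(total, coupon_code):
    return round(total * 0.9, 2) if coupon_code == "SAVE10" else total

def test_apply_coupon_code():
    assert apply_coupon_code(100.0, "SAVE10") == 90.0
    assert apply_coupon_code(100.0, None) == 100.0
```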
westcoast49
4 months ago
I think you're right. I'm on the opposite side of it, but I have a colleague who routinely writes functions that are hundreds of lines long, with 30-40 local variables inside each function. I've come to realize that he does so because his brain allows him to do it. He has a brain that is more detail-oriented than my brain. Where I naturally see a problem as a "tree" of sub-problems with their own details, he naturally sees one large problem with 100 details. He naturally writes code to accommodate his own cognitive style, while I write code to accommodate my own cognitive style.
Moreover, I've come to realize that my colleague is not averse to using abstractions if they are well established and already exist. But he is (much) less inclined to invent or write new abstractions than I am. Like you, I've concluded that this is actually a matter of cognitive style, more than a symptom of "slop" or "cargo-culting of best practices".
delichon
4 months ago
I had a colleague who wrote long functions like that. Over time I came to think of him as an inconsiderate asshole. Unfortunately he was me a few years earlier. I gradually learned the value of organizing code in short, sweet methods. When that style became old code, I and my other colleagues came to like that crusty old coder better. I also thought it was a matter of cognitive style, but for me it turned out to be inexperience.
1718627440
4 months ago
I think a better measure than simple LoCs is to cluster the local variables by use. If you get entirely separate clusters or large clusters that only share a few variables, then you should break them up at the cluster boundaries.
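A toy illustration of that clustering idea (all names invented): the two halves of this function use disjoint sets of locals, with only one variable crossing the boundary, so it splits cleanly right there.

```python
# Cluster 1 uses {orders, total, count, average}; cluster 2 uses
# {thresholds, low, high}. Only `average` crosses the boundary, which is
# exactly where the function could be split in two.

def summarize_and_alert(orders, thresholds):
    # cluster 1: compute the average order value
    total = sum(orders)
    count = len(orders)
    average = total / count if count else 0.0

    # cluster 2: classify against thresholds (only `average` is shared)
    low, high = thresholds
    return "low" if average < low else "high" if average > high else "ok"
```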
endymion-light
4 months ago
I guess I don't really see a reason to have a 1000 line of code function that's not broken down.
Although I personally despise the massive call graphs, my rule of thumb tends to be if I spend over 10 minutes trying to hold all the logic in my head, something is wrong.
Especially with imperative business logic - surely it makes so much more sense to break it down into functions. If you're on call at 2am trying to debug a function with imperative business logic that someone else has written and that runs over 1000 lines, you're going to hate yourself, and your coworker.
frlinook
4 months ago
[dead]
jcranmer
4 months ago
> While 10 LOC vs 50 LOC doesn’t matter, when commas enter the number, it’s a safe bet that things have gone off the rails.
There are times when even a 1,000 LOC function is the better solution. The best example I can think of involves a switch statement with a very high branch factor (e.g., 100), where the individual cases are too simple to break out into separate functions. Something like the core loop of an interpreter will easily become a several-thousand-line function.
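A miniature version of that shape (a made-up stack machine, not any real interpreter): each case is one or two lines, so extracting them into functions would add indirection without removing any complexity. Scale the branch count to 100 and the function is enormous yet still flat and regular.

```python
# Toy interpreter core loop. Each opcode's handler is too small to be
# worth its own function; the length comes from branch count, not nesting.

def run(program):
    stack, pc = [], 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "halt":
            break
        # ... a real interpreter continues like this for dozens of opcodes
        pc += 1
    return stack[-1] if stack else None
```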
nake89
4 months ago
John Carmack wrote about inlining code: http://number-none.com/blow/john_carmack_on_inlined_code.htm...
imoverclocked
4 months ago
Cyclomatic complexity is not equal to LOC, and the cyclomatic complexity of a switch can be seen as adding just 1 to its enclosing function. Either way, LOC is still not a great metric, while cyclomatic complexity approaches a better one.
In my experience, there are very few places where something can't be broken up to reduce cognitive overhead or improve maintainability. When a problem is solved in a way that makes that difficult, then maybe it's better to approach the problem differently.
zdragnar
4 months ago
Higher cyclomatic complexity requires more lines (barring bad code style like writing an entire program on a single line).
The inverse is not always true, but often is.
imoverclocked
4 months ago
> Higher cyclomatic complexity requires more lines
Often true; that's why cognitive complexity wins over cyclomatic complexity. E.g., embedded logic doesn't add linearly:
    fn() {                // 1
        if(...) {         // +1
            if(...) {     // +2
                if(...) { // +3
                    ...
                }
            }
        }
    }                     // cognitive complexity of 7 vs cyclomatic complexity of 4

It's been a while since I've implemented anything around this, and I was remembering cognitive complexity while writing cyclomatic complexity in the previous response. They both have their place, but limiting cognitive complexity is vastly more helpful, IMHO. E.g., the above code might be better written as guard clauses to reduce cognitive complexity to 4.
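That guard-clause rewrite might look like the following sketch (names and conditions invented): each early return adds +1 with no nesting penalty, so the increments stay flat instead of compounding.

```python
# Guard-clause version: three guards at +1 each, so the nesting
# multiplier from the example above never kicks in.

def process(a, b, c):
    if not a:          # +1
        return None
    if not b:          # +1
        return None
    if not c:          # +1
        return None
    return a + b + c   # the happy path reads linearly
```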
gorgoiler
4 months ago
Interestingly, I read once that Hacker News moderators use a similar metric regarding the level of indentation in a thread. It’s a blunt tool to spot flame wars. More replying = more indentation = more heated = less interesting.
kqr
4 months ago
I would argue an interpreter that needs 1,000 lines for its core loop is probably a complex piece of software, comparable to other 1000-line projects I've made.
morshu9001
4 months ago
That's a special case
manwe150
4 months ago
Seems like the article missed an opportunity to talk about testing and MC/DC coverage. I don’t care if the PR is long or short, just show me that you have meaningfully tested how each branch can be reached with the full range of values and behaves correctly. Unit testing is easier with well chosen interfaces and worse without them, regardless of LOC.
ruszki
4 months ago
I don’t understand either. The complexity will be there whether you split large functions into smaller ones or not. If you need to understand the given function, you need to read and parse the same amount of code. If you blindly trust the naming of functions, you will have a bad time. You need to write smaller functions for other reasons, not because of complexity:
- Testing
- Reusability
- Configurability
- Garbage collector/memory management
- etc
I never understood these kinds of restrictions in code analysis tools. They don’t help overall complexity at all, and sometimes they even make things more complex in the end.
bsder
4 months ago
> LOC is often a rough approximation for complexity.
I would argue that the word you are looking for is "containment".
Is there any real difference between calling a function and creating a "const varname = { -> insert lots of computation <-; break retval;}" inline?
If you never do that computation a second time anywhere else, I would argue that a new function is worse because you can't just scan it quickly top to bottom. It also ossifies the interface: once you have a function, you have to rearrange its signature to pass new things in and out. Premature optimization ... mumble ... mumble.
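For concreteness, a sketch of the inline style being described (names and numbers invented): the computation stays in reading order, labelled by the variable it produces, instead of living in a single-use function elsewhere in the file.

```python
# Inline block in place of a would-be one-call helper: the "shipping"
# computation is read top to bottom exactly where its result is used.

def handle_order(order):
    # -- compute shipping (inline; extracting this would just move it
    #    out of reading order and fix its inputs into a signature) --
    weight = sum(item["kg"] for item in order["items"])
    shipping = 5.0 + 0.5 * weight

    return order["subtotal"] + shipping
```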
Even Carmack mentioned this almost 20 years ago: http://number-none.com/blow/john_carmack_on_inlined_code.htm...
Given that we've shrugged off "object-itis" which really demands small functions and given modern IDEs and languages, that kind of inline-style seems a lot more useful than it used to be.
RHSeeger
4 months ago
> If you never do that computation a second time anywhere else, I would argue that a new function is worse because you can't just scan in it quickly top to bottom.
I feel exactly the opposite. By moving "self contained" code out of a method, and replacing it with a call, you make it easier to see what the calling location is doing. Ie, you can have a method that says very clearly and concisely what it does... vs making the reader scan through dozens or hundreds of lines of code to understand it all.
Chapter 12 of Kent Beck's book Tidy First? is about exactly this.
bsder
4 months ago
Do not cite Kent Beck as a software authority. Consider this your semi-regular reminder that Kent Beck is part of the posse (along with Ron Jeffries and Martin Fowler) responsible for the software disaster that was the "Chrysler Comprehensive Compensation System". They were also proponents of Smalltalk--which leads to object-itis and the concomitant tiny functions.
As for your "function documentation", the primary difference between that assignment and a function is solely ossified arguments. And ossified function arguments are something that I would put under "premature abstraction".
At some point you can pull that out into a separate function. But you should have something that causes you to do that rather than it being a default.
RHSeeger
4 months ago
> Do not cite Kent Beck as a software authority.
I didn't. Rather, I stated my opinion on the subject and then noted that there is a book that discusses the same (so clearly I'm not the only one that thinks this).
> As for your "function documentation", the primary difference between that assignment and a function is solely ossified arguments.
No, it's not. It's that the code being moved to a function allows the original location to be cleaner and more obvious.
shakadak
4 months ago
> If you never do that computation a second time anywhere else, I would argue that a new function is worse because you can't just scan in it quickly top to bottom.
I do the computation in my head each time I read it. If it is a function, I can cache its behavior more easily compared to a well-defined block of code. Even if it's never called anywhere else, it's read multiple times, by multiple people. Obviously, it has to make sense from a domain-logic perspective, and not be done for the sake of reducing lines of code, but I have yet to see a function/module/file that is justified in being gigantic when approaching it with domain-logic reasoning instead of code-execution reasoning.
Sohcahtoa82
4 months ago
An example I'm dealing with right now...
At my job, we're not a SaaS. Our software is distributed as cloud images that customers run in their own environments. I need to run regular vulnerability scans on built images.
So, I need to do several things:
- Stand up EC2 instances with our image
- Run our software setup
- Start the scans
- Wait for the scans to complete
- Destroy all the instances
- Download the scan reports
- Upload the reports to S3
- Create Jira tickets based on findings
Now, I COULD easily write it all as one big function. But I tend to prefer to have one "main" function that calls each step as its own function. When looking at the main function, I can much more easily tell what the program is going to do and in what order.
In other words, I much prefer style A from Carmack's post.
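The structure described above might be sketched like this (function names mirror the bullet list; the bodies are stubs that just record which step ran, since the real AWS/Jira calls are out of scope here):

```python
# Hypothetical "one main function per step" skeleton: main() reads as a
# table of contents for the pipeline. Each stub would hold the real
# EC2/scan/S3/Jira logic.

log = []

def stand_up_instances():
    log.append("stand_up")   # would launch EC2 instances from the image

def run_setup():
    log.append("setup")      # would run the software setup

def start_scans():
    log.append("scan")

def wait_for_scans():
    log.append("wait")

def destroy_instances():
    log.append("destroy")

def download_reports():
    log.append("download")

def upload_reports():
    log.append("upload")     # would upload the reports to S3

def create_jira_tickets():
    log.append("jira")       # would file tickets from findings

def main():
    stand_up_instances()
    run_setup()
    start_scans()
    wait_for_scans()
    destroy_instances()
    download_reports()
    upload_reports()
    create_jira_tickets()
```

Reading `main()` alone tells you what the program does and in what order, without opening any step.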
saghm
4 months ago
I find out useful to split things up sometimes to reduce the number of things in scope at a given time. I've found that this both makes it easier for me to reason about, and anecdotally over the years it seems to correlate with my coworkers finding it easier to review as well, although I suppose it's possible that there's a bit of selection bias in terms of teams being more likely to hire people (and individuals to join teams) with similar mindsets.
mrheosuper
4 months ago
>When he left, the code got thrown away.
Why even merge his code in the first place? He was an intern, so I assume whatever he was working on was not mission critical.
al_borland
4 months ago
They were one-off projects where he was the sole dev, not part of a larger project. While not mission critical, the 1k line Perl script was helpful at the time. When we ran into a specific issue it allowed for recovery in minutes vs hours. We eventually added safeguards which prevented the issue from happening in the first place.
Ferret7446
3 months ago
> LOC is often a rough approximation for complexity.
Setting aside the anecdote, this statement is key. If you exclude deliberate efforts to game the LOC metric, LOC is closely correlated to complexity, however else you measure it (branch count, instruction count, variable count, rate of bugs, security vulnerabilities, etc). Unlike any other metrics however, LOC is dead simple to measure. Thus, it's an extremely useful metric, so long as you don't set up an incentive for devs to game it.
waynesonfire
4 months ago
I worked closely with an engineer who would write these massive, long functions, but they read clear and linear. They were really easy to follow and understand.
He'd of course develop functions and macros to avoid repeating himself.
It just worked for his style and problem-solving approach. He knew when to stretch a function. So I'm convinced long functions can be done elegantly; it's probably just uncommon.
westcoast49
4 months ago
> Knowing At a high level what needed to happen to accomplish what the code did, I know for a fact it could have and should have been broken down more. However, because of the complexity of what he wrote, no one had the patience to go through it. When he left, the code got thrown away.
And today an LLM could probably have refactored it fairly well, automatically.
hu3
4 months ago
> I asked if it could be broken down into something more maintainable and he said no.
This is something AI is good at. It could have shown the intern that it is indeed possible to break up such a function without wasting an hour of a senior's time.
stuaxo
4 months ago
Exactly, at that level someone is not thinking about how someone else will read and understand the code.