And where they aren't effectively "free", either your project doesn't need that performance or you're using the wrong language for the job. :p
(I have a lot of frustration around modern software taking a ton of CPU power to do almost nothing, but local variables aren't to blame for that.)
One of my goals for this quarter is to try to quantify the cost breakpoints of recalculation versus memoization of data, because most of my instincts are built around the tail end of 32-bit computing. While that’s only a few DRAM generations ago, that’s a lot of CPU generations, and thus suspect.
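One rough way to start measuring that trade-off is a micro-benchmark like the sketch below. The `cheap` function and the choice of `functools.lru_cache` as the memoization mechanism are my own assumptions for illustration; real breakpoints depend heavily on the workload, cache sizes and hardware.

```python
import timeit
from functools import lru_cache


def cheap(n: int) -> int:
    # Hypothetical "cheap" computation: a handful of arithmetic ops.
    return (n * 31 + 7) % 1009


@lru_cache(maxsize=None)
def cheap_memoized(n: int) -> int:
    # Same computation, but served from a cache after the first call.
    return (n * 31 + 7) % 1009


def measure(repeats: int = 100_000) -> dict[str, float]:
    # Time repeated calls with the same argument, so the memoized
    # version hits its cache on every call after the first.
    return {
        "recalculate": timeit.timeit(lambda: cheap(12345), number=repeats),
        "memoize": timeit.timeit(lambda: cheap_memoized(12345), number=repeats),
    }


if __name__ == "__main__":
    for label, seconds in measure().items():
        print(f"{label}: {seconds:.4f}s")
```

Swapping in progressively more expensive bodies for `cheap` should show roughly where the cache lookup starts to pay for itself on a given machine.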
Just wanted to say “thank you” for this article. I found it years ago, probably not long after you initially wrote it and have preached it as widely as possible ever since, both as an IC and as an eng manager. It’s one of the best such tidbits I’ve ever come across!
Edited to add: and thanks for keeping it up to date with the new Swift version!
Years ago, I hated having "unnecessary" local variables in my code, i.e. for things that would only be written in one place in the code, and then immediately read in a single place. What was the point, I thought, if I wasn't getting any DRY benefits from it? It just looks amateurish to force the reader to think about the code in individual steps.
But nowadays I've realized many benefits. It often simplifies dealing with line-wrapping and debugging; but more importantly, such a local variable is a name for a subexpression, and giving such a name helps document the code. Making the reader step through the code forces awareness of what the steps are and why they need to exist. And yes, naming things is one of the "hard problems", but that's not a reason to avoid it.
Nice! I’ll add that to my bag of debugging tricks, because it’s 2024 and conditional breakpoints still don’t fire consistently in my language / IDE of choice. But this always works great:
if i == 6 {
    println!("xxx"); // breakpoint set here in my IDE
}
The breakpoint always fires, and unlike an assert I can run the program forward from there to see why i == 6 (or whatever) is problematic.
In JavaScript / TypeScript you can also just write “debugger;” in the code and it’ll break on that line when a debugger is attached. I wish more languages supported that.
Python supports something analogous since 3.7 (https://docs.python.org/3/library/functions.html#breakpoint). (Before that, you could put a breakpoint in the code using the standard library `pdb.set_trace()`, but you couldn't "detach" the debugger with an environment variable - only by monkey-patching the `set_trace` name.)
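A minimal sketch of the same "conditional breakpoint in code" trick using Python's built-in `breakpoint()`. To keep it runnable without an interactive session, I swap `sys.breakpointhook` for a hypothetical `recording_hook` that only records the call; with the default hook you would land in pdb instead, and setting the environment variable `PYTHONBREAKPOINT=0` turns the call into a no-op.

```python
import sys

hits = []  # records the arguments of every breakpoint() call


def recording_hook(*args, **kwargs):
    # Stand-in for a real debugger: just record that we were called.
    hits.append(args)


def process(values):
    total = 0
    for i in values:
        if i == 6:
            breakpoint(i)  # dispatches to sys.breakpointhook
        total += i
    return total


original_hook = sys.breakpointhook
sys.breakpointhook = recording_hook
try:
    result = process(range(10))
finally:
    # Restore the original hook so later breakpoint() calls behave normally.
    sys.breakpointhook = original_hook
```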
"If you have nested method calls on one line of code, you can’t easily set a breakpoint in the middle."
You can now do this in JetBrains products. Pretty awesome, you can even step through them.
I keep hoping someone will come up with an objective way to score readability of libraries under a debugger breakpoint so I don’t have to be the one to try to do it. Data on the stack is a large part of that but far from the only factor.
> add assertions to your code.
Yes, and many programming languages have assertions such as "assert greater than or equal to".
For example with Rust and the Assertables crate:
fn calculate_something() -> f64 {
    let big = get_big_number();
    let small = get_small_number();
    assert_ge!(big, small); // >= or panic with message
    (big - small).sqrt()
}
It turns out it's even better if your code has good error handling, such as a runtime assert macro that can return a result that is NaN (not a number) or a "maybe" result that is either Ok or Err.
For example with Rust and the Assertables crate:
fn calculate_something() -> Result<f64, String> {
    let big = get_big_number();
    let small = get_small_number();
    assert_ge_as_result!(big, small)?; // >= or return Err(message)
    Ok((big - small).sqrt())
}
Is there a reason you need specialized assertions over something like Python's generic assert?
>>> big = 2; small = 3
>>> assert big >= small, "big is smaller than small"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: big is smaller than small
In pytest, assertions are rewritten[1] to return something more useful like:
    def test():
        big = 2
        small = 3
>       assert big > small
E       assert 2 > 3
test/test_dumb.py:4: AssertionError
========== short test summary info ==========
FAILED test/test_dumb.py::test - assert 2 > 3
[1] https://github.com/pytest-dev/pytest/blob/f373974707f57a0b28...
Rust has a generic assert too: `assert!(foo >= bar);`. I assume (haven't used the crate myself) the advantage of `assertables::assert_ge!(foo, bar)` is that it prints the values of foo and bar in the assert message. The `assert_eq!(foo, bar)` and `assert_ne!(foo, bar)` macros provided by Rust libstd also do this. But the generic `assert!()` only sees the boolean result of its expression, so its message cannot include the operand values.
The values of the variables can be included in the generic macro's message via a custom message format, like `assert!(foo >= bar, "foo = {foo}, bar = {bar}");` but having the macro do it by default is convenient. There is an old discussion to have the `assert!()` macro parse its expression to figure out what variables are there and print them out by default, but it's still WIP. ( https://github.com/rust-lang/rfcs/blob/master/text/2011-gene... https://github.com/rust-lang/rust/issues/44838 )
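For comparison, here is a rough Python sketch of what such a specialized assertion buys you over a bare boolean check: the failure message carries both operand values by default. The helper names `assert_ge` and `check_ge` are hypothetical, not part of any library; `check_ge` mirrors the "return an error instead of panicking" variant.

```python
def assert_ge(a, b, name_a="a", name_b="b"):
    """Raise AssertionError that includes both values unless a >= b."""
    if not a >= b:
        raise AssertionError(
            f"expected {name_a} >= {name_b}, got {name_a} = {a!r}, {name_b} = {b!r}"
        )


def check_ge(a, b):
    """Return the error message instead of raising, or None on success."""
    try:
        assert_ge(a, b)
    except AssertionError as exc:
        return str(exc)
    return None
```

For example, `check_ge(2, 3)` returns a message naming both values, while `check_ge(3, 2)` returns `None`.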
What's more, pytest errs on the side of "just capture more" and in my experience it's quite useful:
============================= test session starts ==============================
platform linux -- Python 3.11.2, pytest-7.2.1, pluggy-1.0.0+repack
rootdir: /tmp/dl/py
collected 1 item
test_bigsmall.py F [100%]
=================================== FAILURES ===================================
________________________________ test_something ________________________________
    def test_something():
>       assert get_big_number() > get_small_number()
E       assert 0 > 1
E        +  where 0 = get_big_number()
E        +  and   1 = get_small_number()
test_bigsmall.py:8: AssertionError
=========================== short test summary info ============================
FAILED test_bigsmall.py::test_something - assert 0 > 1
============================== 1 failed in 0.06s ===============================
Or even better have the assertion offer a way to repair the problem:
CL-USER 1 > (defun calculate-something (big small)
              (assert (>= big small)
                      (big small) ; these can be reset
                      "Big ~a must be bigger than small ~a" big small)
              (sqrt (- big small)))
calculate-something
The error handling will now offer to let me repair it, if necessary:
CL-USER 2 > (calculate-something 4 7)
Error: Big 4 must be bigger than small 7
1 (continue) Retry assertion with new values for BIG, SMALL.
2 (abort) Return to debug level 0.
3 Restart top-level loop.
Type :b for backtrace or :c <option number> to proceed.
Type :bug-form "<subject>" for a bug report template or :? for other options.
CL-USER 3 : 1 > :c 1
The old value of BIG is 4.
Do you want to supply a new value? yes
Enter a form to be evaluated:
17
The old value of SMALL is 7.
Do you want to supply a new value? yes
Enter a form to be evaluated:
5
3.4641016
The last number is the return value.
I'm a fan of this.
Not just for debugging either. Giving something a name gets you to think about what a good name would be, which gets you thinking about the nature of the thing, which clarifies your thinking about the thing, and leads you to better code.
When I've struggled to figure out what the right name for something is, I sometimes realize it's hard because the thing doesn't really make sense. E.g., I might find I want to name two different things the same, which leads me to understand I was confused about the abstractions I was juggling.
But it's also always nice to have a place to drop a break point or to automatically see relevant values in debuggers and other tools.
I have done this for a long time. I always thought I was too dumb to read and debug complex code with multiple function calls in one line. I always put intermediate results into variables. Makes debugging so much easier.
I also do it so I can have more single-line statements, which I find easier to read.
The problem I always have with locals (in kernel code written in C) is that the compiler tends to optimize them away, and gdb can't find them. So I end up having to read the assembly and try to figure out where values in various registers came from.
This is kind of related to a change I recently made to how I structure variables in ansible. Part of that is because doing even mildly interesting transformations in ansible, filters and jinja is just as fun as sorting dirty needles, glass shards and rusty nails by hand, but what are you gonna do.
But I've started to group variables into two groups: Things users aka my fellow admins are supposed to configure, and intermediate calculation steps.
Things the user has to pass to use the thing should be a question, or they should be something the user kind of has around at the moment. So I now have an input variable called "does_dc_use_dhcp". The user can answer this concrete question, or recognize if the answer is wrong. Or similarly, Godot and other frameworks offer a Vector.bounce(normal) and if you poke around, you find a Collision.normal and it's just matching shapes - normal probably goes into normal?
And on the other hand, I kinda try to decompose more complex calculations into intermediate expressions which should be "obviously correct" as much as possible. Like, 'has_several_network_facing_interfaces: "{{ network_facing_interfaces | length > 1 }}"'. Or something like 'can_use_dhcp_dns: "{{ dc_has_dhcp and dhcp_pushes_right_dns_servers }}"'.
We also had something like 'network_facing_interfaces: "{{ ansible_interfaces | reject('equalto', 'lo') | list }}"'. This was correct on a lot of systems. Until it ran into a system running docker. But it was easy to debug because a colleague quickly wondered why docker0 or any of the veth-* interfaces were flagged as network-facing, which they aren't.
It does take work to get it to this kind of quality, but it is very nice to get there.
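The same "small, obviously correct intermediate steps" idea, sketched in Python rather than Jinja. The exclusion rule (the exact name `lo` plus the `docker` and `veth` prefixes) and the reading of "several" as "more than one" are my assumptions based on the interfaces mentioned above, not a general definition of a network-facing interface.

```python
def is_network_facing(name: str) -> bool:
    # Assumption: loopback, docker bridges and veth pairs are not network-facing.
    return name != "lo" and not name.startswith(("docker", "veth"))


def network_facing_interfaces(all_interfaces: list[str]) -> list[str]:
    # Named intermediate step: easy to print and eyeball when something looks off.
    return [name for name in all_interfaces if is_network_facing(name)]


def has_several_network_facing_interfaces(all_interfaces: list[str]) -> bool:
    # "Several" is taken to mean more than one here.
    return len(network_facing_interfaces(all_interfaces)) > 1
```

Each step has a name that can be checked on its own, which is what made the docker0 surprise quick to spot.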
> accidental
Besides using a debugger, isn't one of the first things people do (even undergrads) to start logging variables that may be implicated in the issue? If you have potentially a problematic computation, put it in a variable and log it out - track it and put metrics against it, if necessary. I'm not entirely sure this warrants a full article.
You missed the point of the article.
> If you have potentially a problematic computation, put it in a variable and log it out
My point was: what a "potentially problematic" computation is is not always known in advance. A style which is rich in local variables, when combined with a tool that shows actual values for all local variables when unhandled exceptions occur, gives you this "for free". I.e. no need to log anything.
What is it that you believe they missed?
People have been writing code in a certain way to provide logical "breadcrumbs" for a very long time, and doing it very deliberately. The fact that a tool was created that takes advantage of that isn't an "accident."
Compare to: "Correctly-spelled words as accidental hyperlinks to the dictionary definition."
They missed that many programmers, sometimes especially experienced ones, like to write code that's very clever and packs a lot into one line, to the detriment of debugging, often by others who may just be trying to reverse engineer what someone who has since left wrote.
Pythonistas have a saying for that: "...Although practicality beats purity". It rarely makes sense to apply dogmatic programming advice absolutely. It's a good idea to expect titles to be clickbait, consider context and reasoning, etc. Sometimes code has too many names and sometimes it has too few.
This is my main problem with introducing functional programming in OOP languages (like streams in Java).
If it was a for loop I'd know at first glance at the exception what exactly failed...
If your language & IDE do not support functional programming properly, with debugger and exception reporting - don't do it.
I bloody hate Python stacktraces because they usually don’t have enough information to fix the bug. The curse of dynamic languages.
What’s the easiest possible way to cause stacktraces to also dump local variable information? I feel like this is a feature that should be built into the language…
I don't think it's related to being a dynamic language. There are many "pretty exception printers" that will dump all the local and global variables, if you want, even up the stack!
I love debugging Python. The stacktraces are great when logged through Sentry so even on production I can normally spot the bug immediately.
On my local machine it’s even better because I can run the code, let it break and then jump straight into the debugger where I can move up and down through the stack. I can sit in the context at any point and play with the data. I can call any Python function I want to see how the code behaves.
> The stacktraces are great when logged through Sentry
<cough>Bugsink</cough> :-)
I don't think Python is special in this regard. I have the same issue with .NET/C# stack traces.
With Python, you can run your program under pdb, which will automatically enter post-mortem debugging on an unhandled exception, and you can easily print locals.
https://docs.python.org/3/library/pdb.html
> What’s the easiest possible way to cause stacktraces to also dump local variable information? I feel like this is a feature that should be built into the language…
That is built into Python!
import sys
def div(x, y):
    return x / y
try:
    print(div(1, 0))
except ZeroDivisionError as e:
    print("Div by zero!")
    print("Locals:")
    _, _, tb = sys.exc_info()
    print(tb.tb_next.tb_frame.f_locals)
This will output:
Div by zero!
Locals:
{'x': 1, 'y': 0}
https://stackoverflow.com/a/5328139 has a more thorough implementation that will print you the entire stack of local variables.
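For what it's worth, the standard library can also do the per-frame dump for you: `traceback.TracebackException` accepts `capture_locals=True`. A small sketch (this is not the Stack Overflow implementation, and the helper names `format_with_locals` and `run` are mine):

```python
import traceback


def div(x, y):
    return x / y


def format_with_locals(exc: BaseException) -> str:
    # capture_locals=True records repr() of each frame's local variables
    # and includes them in the formatted traceback.
    detailed = traceback.TracebackException.from_exception(exc, capture_locals=True)
    return "".join(detailed.format())


def run() -> str:
    try:
        div(1, 0)
    except ZeroDivisionError as exc:
        return format_with_locals(exc)
    return ""


report = run()
print(report)
```

The printed traceback lists each frame followed by its locals, so the `div` frame shows `x = 1` and `y = 0`.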
I don't think it's intrinsic to dynamic languages. I've been reading this from a Common Lisp perspective and all I can think of is, "look what they need to mimic a fraction of our power".
I strongly believe nested expressions increase cognitive overhead. Between the two examples in the blog post
def calculate_something():
    big_number = get_big_number()
    small_number = get_small_number()
    return math.sqrt(big_number - small_number)
vs
def calculate_something():
    return math.sqrt(get_big_number() - get_small_number())
I'll pick the first one every time. This is a bit of an extreme example, but our languages provide us with the ability to extract and name subexpressions, and we should do that, rather than forcing people to parse expression trees mentally when reading code.
In Zope you can create a local variable named __traceback_info__ and its value will be inserted in the traceback. It is very useful.
Like adding a line to a log, but only when a traceback is shown.
See: https://zopeexceptions.readthedocs.io/en/latest/narr.html#tr...
Seems like the zope.exceptions package can be used independent from Zope.
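To illustrate the idea without depending on Zope, here is a stdlib-only sketch that walks a traceback and collects any local named `__traceback_info__`. It only mimics the convention; it is not how zope.exceptions is implemented, and `collect_traceback_info`, `load_user` and `run` are hypothetical names.

```python
import traceback


def collect_traceback_info(exc: BaseException) -> list[str]:
    # Walk the frames of the traceback, outermost first, and pick up any
    # local variable named __traceback_info__.
    notes = []
    for frame, lineno in traceback.walk_tb(exc.__traceback__):
        if "__traceback_info__" in frame.f_locals:
            info = frame.f_locals["__traceback_info__"]
            notes.append(f"{frame.f_code.co_name}:{lineno}: {info!r}")
    return notes


def load_user(user_id):
    # Only ever surfaces if an exception passes through this frame.
    __traceback_info__ = f"loading user {user_id}"
    raise KeyError(user_id)


def run() -> list[str]:
    try:
        load_user(42)
    except KeyError as exc:
        return collect_traceback_info(exc)
    return []


notes = run()
```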
> Accidental
What? For whom? I've been extremely intentionally breaking up longer expressions into separate lines with local variables for a long time.
Writing local variables as "breadcrumbs" to trace what happens is one of the very first things new developers are taught to do, along with a print statement. I'd wager using a "just to break things up" local variable is about as common as using them to avoid recomputing an expression.
... Perhaps the author started out with something in the style of Haskell or Elm, and casual/gratuitous use of named local variables is new from that perspective?
> However, the local variables are a different kind of breadcrumbs. They’re not explicitly set by the developer, but they are there anyway.
While I may not have manually designated each onto a "capture by a third-party addon called Bugsink" whitelist, each one is very explicitly "set" when I'm deciding on their names and assigning values to them.
I don’t ignore a lot of pull request comments but I ignore these with extreme prejudice. It isn’t making the code slower to assign intermediate values. You’re just playing the “fuck you, I’ve got mine” card if you insist the variables are unnecessary to understand the code. I’m leaving them in and if you have a problem with that then we have a problem.
Right now the code is clear. After you’ve had your morning coffee, and read my PR description. How about at 5:05 when you’re trying to get to your kid’s soccer game and there’s a production issue? After I merge this you’re never going to look at this code again when there isn’t something else going on. A bug or a feature you’re trying to wedge into this code.
The main times I argue for removing local variables are when they're just being assembled into a data structure that would be easier to see at a glance, e.g.:
header = {"name": "Unknown"}
body = {"items": []}
footer = {"page": 0}
placeholder = {
    "header": header,
    "body": body,
    "footer": footer,
}
return placeholder
> accidental
Admittedly this may not be the best choice of words... but it was a good trade-off of length/clarity at the time for me.
The longer version is: an _ideal_ programming language (from the perspective of debugging, though not all other perspectives) would just allow a full reverse playback through time from the point-of-failure to an arbitrary point in the past. A (small) step towards that is the "Breadcrumb" as introduced by Sentry; a hint at what happened before an error occurred. I argue that, in the coding-style as discussed, and when exposing local variables in stacktraces, local variables actually serve as breadcrumbs, albeit not explicitly set using the breadcrumb-tooling.
> along with a print statement
yeah but the point is that in this combination of coding style and tooling print statements become redundant
> third-party addon called Bugsink
If by "third party" you mean "the data flows to a third party", you're mistaken: Bugsink is explicitly made to keep the data with you. If by "third party" you mean "not written by either myself or the creators of my language of choice", you're right.
Without giving an opinion on coding style, it seems like IDEs/debuggers could present the unnamed values this way.
I suggested this on a thread in /r/cpp a few years ago, and was downvoted heavily, and chewed out for the reason that coding for ease of debugging was apparently akin to baby killing.
Ahhhgh. That's too bad they did that - not doing this is the bane of code maintenance IMO.