Logging all C++ destructors, poor mans run-time tracing

90 points, posted 10 months ago

49 Comments

loeg

10 months ago

But what was the shutdown bug you were trying to identify? Was this destructor logging actually useful? The article teases the problem and provides detailed instructions for reproducing the logging, but doesn't actually describe solving the problem.

jprete

10 months ago

Address/MemorySanitizer are also meant for this kind of problem. https://github.com/google/sanitizers/wiki/AddressSanitizer https://github.com/google/sanitizers/wiki/MemorySanitizer

Also valgrind, but I'm more familiar with the first two.

richardwhiuk

10 months ago

The author explicitly name checks valgrind.

I think plain gdb would have been sufficient if it's exiting with a segfault or terminating...

kccqzy

10 months ago

One of my favorite hacky workarounds is to simply call _exit(0) for an immediate exit without running destructors. Most of the time, the destructors are just freeing memory that will be reclaimed by the OS anyways so they are not worth running if you know the program is exiting. And even if the destructor does more than just freeing memory, maybe the work it's doing isn't needed if you know the process is ending soon: maybe it's joining threads or releasing mutexes or deleting timers.

You will find that in a typical C++ codebase, the destructors that do useful things (say, flushing useful buffers and closing files etc) are much fewer.
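To make the difference concrete, here is a minimal sketch, assuming a POSIX system (fork, pipe, _exit). The names Reporter and static_destructor_ran are made up for this illustration. A child process creates a static object whose destructor writes one byte to a pipe, then ends with either std::exit or _exit, and the parent checks whether the byte arrived. Note that std::exit only runs destructors of static objects and atexit handlers; it does not unwind the stack either.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/wait.h>
#include <unistd.h>

namespace {
int g_report_fd = -1;  // write end of the pipe; only set inside the child

// Hypothetical helper: its destructor reports that it ran.
struct Reporter {
    ~Reporter() {
        const char mark = 'D';
        ssize_t ignored = write(g_report_fd, &mark, 1);
        (void)ignored;
    }
};
}  // namespace

// Returns true if the static object's destructor ran in the child.
bool static_destructor_ran(bool use_underscore_exit) {
    int fds[2];
    if (pipe(fds) != 0) return false;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: construct the static, then leave one way or the other.
        close(fds[0]);
        g_report_fd = fds[1];
        static Reporter reporter;  // destroyed by std::exit, skipped by _exit
        if (use_underscore_exit) _exit(0);
        std::exit(0);
    }

    // Parent: one byte means the destructor ran, EOF means it did not.
    close(fds[1]);
    char buf = 0;
    ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 && buf == 'D';
}
```

Calling static_destructor_ran(false) should report that the destructor ran, while static_destructor_ran(true) should report that it was skipped.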

ignoramous

10 months ago

Sub-optimal if the exiting process must relinquish resources external to the current process or the host.

gpderetta

10 months ago

Optimal if you are writing crash-only, i.e. reliable, software.

TeMPOraL

10 months ago

Suboptimal if you're relying on RAII not for "regular" resources, but more abstract ones like "correctly bookended piece of persisted data".

Then again, I guess it's optimal if you're writing fault-resistant software.

akira2501

10 months ago

And you guarantee you can release those resources without error or exception all the time? If this is not the case then a destructor is the wrong place to be doing this work anyways. If an external service requires a client experiencing an error to manually release resources, then that external service has significant design or protocol issues.

ignoramous

10 months ago

> guarantee you can release those resources without error or exception all the time

releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

> a destructor is the wrong place

I commented on GP's use of exit() not on finalizers / destructors.

> If an external service requires a client experiencing an error to manually release resources...

You perhaps missed what I wrote: It isn't optimal, if a failing client never signals release (think: distributed lock).

akira2501

10 months ago

> releasing resources (optimally) and resource exhaustion / ageing (crash recovery) are orthogonal.

In concept. In the reality of implementation they are not.

> I commented on GP's use of exit() not on finalizers / destructors.

You said they were "suboptimal." Implying there is an available optimal solution. I'm challenging that exact notion.

> It isn't optimal,

You perhaps missed the point. It _can't_ possibly be optimal given the nature of the problem itself. These are the wrong terms to understand the problem in.

ignoramous

10 months ago

> In concept. In the reality of implementation they are not.

Recovery flows are very different in both concept and implementation.

> It can't possibly be optimal given the nature of the problem itself.

What's in the "nature" of this problem that one should never bother releasing resources once done? I mean, if any program creates an external resource (say, unix domain socket or shared mem), it isn't okay if it never releases it.

rqtwteye

10 months ago

I did this a long time ago with macros. It helped me to find a ton of leaks in a huge video codec codebase.

I still don't understand the hate for the C preprocessor. It enables doing things like this without any overhead. Set a flag and you get constructor/destructor logging and whatever else you want. Don't set it and you get the regular behavior. Zero overhead.
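A minimal sketch of what such flag-controlled logging could look like. This is not the commenter's actual code: the flag TRACE_LIFETIMES, the macros TRACE_CTOR and TRACE_DTOR, and the Frame class are all hypothetical names chosen for illustration. With the flag set to 0 the macros expand to nothing, so the class compiles exactly as if they were not there.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical flag; in a real build it would come from -DTRACE_LIFETIMES=1.
#ifndef TRACE_LIFETIMES
#define TRACE_LIFETIMES 1
#endif

#if TRACE_LIFETIMES
// Count of objects constructed but not yet destroyed.
inline int& live_objects() {
    static int count = 0;
    return count;
}
#define TRACE_CTOR(cls) \
    (++live_objects(),  \
     std::fprintf(stderr, "+ %s %p\n", #cls, static_cast<const void*>(this)))
#define TRACE_DTOR(cls) \
    (--live_objects(),  \
     std::fprintf(stderr, "- %s %p\n", #cls, static_cast<const void*>(this)))
#else
#define TRACE_CTOR(cls) ((void)0)
#define TRACE_DTOR(cls) ((void)0)
#endif

// Example class instrumented with the macros.
struct Frame {
    Frame() { TRACE_CTOR(Frame); }
    Frame(const Frame&) { TRACE_CTOR(Frame); }
    ~Frame() { TRACE_DTOR(Frame); }
};
```

A nonzero live_objects() at shutdown, or a "+" line in the log without a matching "-" line for the same address, points at a leaked object.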

jonathrg

10 months ago

The hate might have to do with it being such a primitive and blunt tool; doing anything moderately complex becomes extremely complicated and fragile.

tialaramex

10 months ago

Yeah, this very primitive tool easily creates the programming equivalent of the "iwizard problem".

[You straightforwardly replace "mage" with "wizard" and oops, now your images are "iwizards" and your "magenta" is "wizardnta"]

immibis

10 months ago

By contrast something like AspectJ for C++ (if such a thing existed) could express this requirement cleanly.

akira2501

10 months ago

Complex arrangements _are_ fragile.

There is no magic environment that can fix this for you. If you feel you've seen one, then you've focused on the parts of it that were important to you, while ignoring the parts of it everyone else actually needs.

synergy20

10 months ago

do you have a write-up how you did it? I'm interested, thanks.

naruhodo

10 months ago

I did a similar thing, in C++, 3 decades ago. I used a macro, FUNC(), that I would put at the start of functions. It took no arguments and declared a local instance, using the __FUNC__ preprocessor builtin to pass the function name to the Trace constructor:

    Trace trace##__LINE__(__FUNC__);

The Trace instance would generate one log on construction and another on destruction. It also kept track of function call nesting (a counter) in a static member that would increment in the constructor and decrement in the destructor. It was inherently single-threaded, because I used a static member, but it could be adapted to multiple threads using thread local storage. I paired it with a LINE("Var x is " << x); macro for arbitrary ostreams-style logging. And building on that, EXPR(x) would do LINE(#x " = " << (x)). The output was along the lines of:

    ,- A::f()
    | ,- A::g()
    | | ,- B::B()
    | | `- B::~B()
    | | x = 12
    | | About to do a thing...
    | | ,- A::doAThing()
    | | `- A::doAThing()
    | `- A::g()
    `- A::f()

The macros could be disabled (defined to do nothing) by a preprocessor symbol.
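A rough reconstruction of that scheme in current C++, offered as a sketch under assumptions rather than the original code. Standard C++ provides `__func__` (a predefined identifier that yields only the unqualified function name), so that is what is used here; pasting `__LINE__` into a variable name needs a two-level helper macro, otherwise the token is pasted before it is expanded. The depth counter and the output buffer are thread_local, and the in-memory buffer is a stand-in for a real log sink. TRACE_DISABLED, outer() and inner() are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

class Trace {
public:
    explicit Trace(const char* name) : name_(name) {
        indent();
        out() << ",- " << name_ << '\n';
        ++depth();
    }
    ~Trace() {
        --depth();
        indent();
        out() << "`- " << name_ << '\n';
    }
    // Writes one free-form line at the current nesting level.
    static void line(const std::string& text) {
        indent();
        out() << text << '\n';
    }
    // Per-thread buffer standing in for a real log destination.
    static std::ostringstream& out() {
        thread_local std::ostringstream buffer;
        return buffer;
    }

private:
    static int& depth() {
        thread_local int d = 0;
        return d;
    }
    static void indent() {
        for (int i = 0; i < depth(); ++i) out() << "| ";
    }
    const char* name_;
};

// Two-level paste so that __LINE__ is expanded before ## is applied.
#define TRACE_CAT2(a, b) a##b
#define TRACE_CAT(a, b) TRACE_CAT2(a, b)

#ifndef TRACE_DISABLED
#define FUNC() Trace TRACE_CAT(trace_, __LINE__)(__func__)
#define LINE(expr)                \
    do {                          \
        std::ostringstream os_;   \
        os_ << expr;              \
        Trace::line(os_.str());   \
    } while (0)
#define EXPR(x) LINE(#x " = " << (x))
#else
#define FUNC() ((void)0)
#define LINE(expr) ((void)0)
#define EXPR(x) ((void)0)
#endif

void inner() {
    FUNC();
    int x = 12;
    EXPR(x);
    LINE("About to do a thing...");
}

void outer() {
    FUNC();
    inner();
}
```

Calling outer() leaves a nested trace in Trace::out() in the same ",-" / "`-" bracket style as the sample output above, with "outer" and "inner" as the function names.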

synergy20

10 months ago

gcc's -finstrument-functions can also add the function call traces without changing the code I think

MontagFTB

10 months ago

I consider Tracy the state of the art for profiling C++ applications. It’s straightforward to integrate, toggle, gather data, analyze, and respond. It’s also open source, but rivals any product you’d have to pay for:

https://github.com/wolfpld/tracy

Veserv

10 months ago

Looks fine, but it does not look like there is an automatic full function entry/exit trace, just sampling. The real benefit is when you do not even need to insert manual instrumentation points, you just hit run and you get a full system trace.

How well does the visualizer handle multi-TB traces? Usually pretty uncommon, but 10-100 GB is not that hard to produce when doing full tracing.

jms55

10 months ago

Of note is that tracy is aimed at games, where sampling is often too expensive and not fine-grained enough. Hence the manual instrumenting.

For the Bevy game engine, we automatically insert tracy spans for each ECS system. In practice, users can just compile with the tracy feature enabled, and get a rough but very usable overview of which part of their game is taking a long time on the CPU.

Veserv

10 months ago

I was talking about automatic instrumentation of every single function call by default. No manual instrumentation needed because everything is already instrumented.

To be fair, you do still want some manual instrumentation to correlate higher level things, but full trace everywhere answers most questions. You also want to be able to manually suppress calls for small functions since that can be performance relevant or distorting, but the point is “default on, manual off” over “default off, manual on”.

jpc0

10 months ago

How would you implement this, may I ask? C++ does not have reflection in the language, so at best you can do that by hooking into the running application, but C++ also aggressively inlines functions on anything except -O0, which means your function call might never be a function call. Running at -O0 is also just generally a bad idea since many, many instances of UB will never get caught.

The only way I can see doing this at compile time is with a compiler extension but then you are entirely locked in to 1 compiler.

Maybe if you compile with debug symbols but then well, you are shipping debug symbols...

TickleSteve

10 months ago

gcc has "-finstrument-functions". This calls your code on every function entry and exit. I've used this previously for tracing as described here and to move memory-protection windows around based on the running code.
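For reference, a minimal sketch of the two hooks that GCC calls when a program is built with -finstrument-functions. The hook names and signatures are the documented ones; the indentation counter and the trace_depth() accessor are additions made up for this example. The hooks themselves carry no_instrument_function so that they do not end up instrumenting themselves.

```cpp
#include <cassert>
#include <cstdio>

namespace {
thread_local int g_depth = 0;  // current call nesting on this thread
}

// Hypothetical accessor, only here so the depth can be inspected.
__attribute__((no_instrument_function))
int trace_depth() { return g_depth; }

extern "C" {

// Called on entry to every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    std::fprintf(stderr, "%*s> %p (called from %p)\n",
                 g_depth * 2, "", this_fn, call_site);
    ++g_depth;
}

// Called on exit from every instrumented function.
__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* /*call_site*/) {
    --g_depth;
    std::fprintf(stderr, "%*s< %p\n", g_depth * 2, "", this_fn);
}

}  // extern "C"
```

Built with something like `g++ -finstrument-functions -g main.cpp`, every function call prints its address on entry and exit; the raw addresses can then be mapped back to names with a tool such as addr2line.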

Veserv

10 months ago

If you want something passable, most compilers have function prologue/epilogue hooks that you can write in plain code. Realistically, development and test systems are only going to target like 1 or 2 compilers, so it is not very much work. Unless you are distributing a source library and you want to get full-trace telemetry from customer systems in-development, that is probably all you really need to do.

If you want to really zoom, you need to get the hooks inlined and probably just written in straight assembly. You then need to optimize your binary format and recording system. You then need to start optimizing your memory bandwidth usage when that becomes the bottleneck. Your overhead in the end is basically limited by memory bandwidth; you can only shovel so many tens of GB/s of logging into memory. Note that persistence has likely been infeasible for the last 2 or so orders of magnitude; RAM is likely the only storage consistently fast enough for the data rates you want to generate when doing this.

jpc0

10 months ago

Makes sense, and thanks for the detailed answer. May put that one on the list of things to play with in the future.

andersa

10 months ago

This would be unbelievably inefficient, game engines will be running hundreds of millions of functions per second. And if the code runs 10x slower with the trace active, then it's no longer sensible.

We use sampling for the cases where this level of detail is needed as it has lower overhead.

What use case did you find this useful for?

Veserv

10 months ago

No, it is quite reasonable with efficient implementation. 10-50% overhead or so depending on function size distribution (since it is small fixed overhead per call, smaller functions result in a greater fraction of overhead). 10x for just function entry/exit recording would be grotesquely inefficient. You can do inefficient time travel debugging recording for less than that.

You do need to allocate a ton of memory for the recording buffer to record sizable amounts of trace data. GB per core-second of trace or so (ring buffer so you get to see the last N seconds, not that you need to run for less than N seconds) but that is fine during development on normal dev machines.
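The ring buffer idea can be sketched as follows. This is an illustration only, not the commenter's implementation: TraceEvent and TraceRing are hypothetical names, the record layout is an assumption, and a production version would be per-core, far larger, and much more careful about memory bandwidth.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One fixed-size record per function entry or exit (assumed layout).
struct TraceEvent {
    std::uint64_t timestamp;
    const void* function;
    bool is_entry;
};

// Keeps only the most recent Capacity events; older ones are overwritten.
template <std::size_t Capacity>
class TraceRing {
public:
    void record(const TraceEvent& event) {
        events_[next_ % Capacity] = event;
        ++next_;
    }

    // Returns the retained events, oldest first.
    std::vector<TraceEvent> snapshot() const {
        std::vector<TraceEvent> result;
        const std::size_t count = next_ < Capacity ? next_ : Capacity;
        result.reserve(count);
        for (std::size_t i = next_ - count; i < next_; ++i) {
            result.push_back(events_[i % Capacity]);
        }
        return result;
    }

private:
    std::array<TraceEvent, Capacity> events_{};
    std::size_t next_ = 0;  // total number of events ever recorded
};
```

Because recording never blocks and never grows, the program can run for any length of time, and a snapshot taken at the point of interest shows the last Capacity events leading up to it.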

It is useful for everything. Why would you not want full traces for everything? It is amazing. We use it for everything internally where I work. Or rather, it is part of it. We actually prefer full time travel debugging during development and automated testing (again, overhead is low enough) but it is not available for everything. So sometimes we are stuck with just traces.

donadigo

10 months ago

Every function would be pretty overkill, but you can automatically install instrumentation on functions on demand or with a pre-determined user selection. You can even instrument on a line basis if your instrumentation is cheap enough. I've experimented with this in a VS extension I'm developing and I could easily browse through a non-trivial game codebase without causing noticeable performance overhead [1]. In the demo, the instrumentation is auto-installed on all functions within the file you opened. Obviously, this is just one project I was testing on but it shows that this type of tracing is feasible.

[1] https://www.youtube.com/watch?v=3PnVG49SFmU

rerdavies

10 months ago

Alas, not for Linux. I've been using the unloved and mostly abandoned (and mostly awful) google perf tools on Linux. :-(

jchw

10 months ago

Hmm? I haven't used Tracy yet but the demo trace they show at the URL linked on GitHub[1] sure looks like a trace from an application running on Linux. The documentation[2] also seems to reference what you need to run it on Linux, and the NixOS derivation[3] also suggests it runs on at least Linux and macOS, and I was able to run several of the binaries including the UI and capture binary. I still hesitate to doubt you on this because I haven't figured out how one is supposed to actually use it but it surely seems to support Linux. (I will definitely find a use for this, it looks amazing.)

[1]: https://tracy.nereid.pl/

[2]: https://github.com/wolfpld/tracy/releases/latest/download/tr...

[3]: https://github.com/NixOS/nixpkgs/blob/nixos-24.05/pkgs/devel...

dkersten

10 months ago

I’ve used Tracy on Linux about two months ago. Works fine.

HellsMaddy

10 months ago

Tracy works fine on Linux! Just used it today.

rerdavies

10 months ago

OK. Thanks for straightening me out. I saw the .msi, but no linux packages. I will definitely check it out!

las_balas_tres

10 months ago

That github repo contains the sources. You can also compile tracy by cloning the repo.

neverartful

10 months ago

I did something similar once but my implementation didn't rely on any compiler features. I made tracing macros for constructors, destructors, and regular c++ methods. If the tracing was turned on in the macros, the information given to the macro (class name, method name, etc.) would be passed to the tracing manager. The tracing manager would serialize to a string and send it through a TCP socket. I also wrote a GUI tracing monitor that would listen on a socket for tracing messages and then display the trace messages received (including counts by class and method). The tracing monitor had filters to tweak. It was a nice tool to have and was very instrumental in finding memory leaks and obscure crashes. This was back in the late 1990s or early 2000s.

ASAN also checks for memory leaks like valgrind; the main difference between the tools is whether you can recompile all of the libraries to get the compiler support for detection, or whether binary instrumentation is the better fit (https://github.com/google/sanitizers/wiki/AddressSanitizerLe...)