hinkley
6 hours ago
Let me save you fifteen minutes, or the rest of your life: They aren’t.
Profilers alter the behavior of the system. Nothing has high enough clock resolution or fidelity to make them accurate. Intel tried to solve this by building profiling into the processor, and that only helped slightly.
Big swaths of my career, and the resulting wins, started with the question,
“What if the profiler is wrong?”
One of the first things I noticed is that no profilers make a big deal out of invocation count, which is a huge source of information for continuing past tall tent poles or hotspots into productive improvement. I have seen one exception to this, but that tool became defunct sometime around 2005 and nobody has copied them since.
Because of CPU caches, branch prediction, and amortized activities in languages or libraries (memory defrag, GC, flushing), many things the profiler tags as expensive are being scapegoated: they get stuck paying someone else’s bill. They sit at the threshold where work can no longer be deferred and has to be paid for now.
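A toy illustration of the "someone else's bill" effect, as a minimal sketch rather than a real profiler trace. `BufferedLog`, `costs_per_call`, and the abstract "work unit" costs are all hypothetical, made up for illustration. Every call queues one item, but only the call that fills the buffer is charged for the flush:

```python
class BufferedLog:
    """Toy buffered writer (hypothetical): appends are cheap, flushes are deferred."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.buffer = []
        self.flushed = []

    def append(self, item):
        """Queue an item and return the abstract cost charged to this call."""
        self.buffer.append(item)
        cost = 1  # the cheap part: queueing one item
        if len(self.buffer) == self.capacity:
            # The flush pays for items queued by earlier calls,
            # but the whole bill lands on this one call.
            cost += len(self.buffer)
            self.flushed.extend(self.buffer)
            self.buffer.clear()
        return cost


def costs_per_call(n, capacity=4):
    """Return the cost attributed to each of n consecutive append calls."""
    log = BufferedLog(capacity)
    return [log.append(i) for i in range(n)]


costs = costs_per_call(8)
print(costs)                    # [1, 1, 1, 5, 1, 1, 1, 5]
print(sum(costs) / len(costs))  # 2.0
```

Every fourth call looks five times as expensive as its neighbours, yet the amortized cost is 2 units per call no matter which caller happens to trigger the flush.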
So what you’re really looking for in the tools is everything that looks weird. And that often involves ignoring the fancy visualization and staring at the numbers. Which are wrong. “Reading the tea leaves” as they say.
SerCe
5 hours ago
> Let me save you fifteen minutes, or the rest of your life: They aren’t.
Knowing that all profilers aren't perfectly accurate isn't a very useful piece of information. However, knowing which types of profilers are inaccurate and in which cases is indeed very useful information, and this is exactly what this article is about. Well worth 15 minutes.
> And that often involves ignoring the fancy visualization and staring at the numbers.
Visualisations are incredibly important. I've debugged a large number [1] of performance issues and production incidents highlighted by the async profiler producing Brendan Gregg's flame graphs [2]. Sure, things could be presented as numbers, but what I really care about most of the time when I take a CPU profile from a production instance is – what part of the system was taking most of the CPU cycles.
pjc50
3 hours ago
> no profilers make a big deal out of invocation count
This is where we get into sampling vs. tracing profilers. Tracing is even more disruptive to the runtime, but gives you more useful information. It can point you at places where your O-notation is not what you expected it to be. This is a common cause of things which grind to a halt after great performance on small examples.
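A minimal sketch of how invocation counts expose unexpected O-notation. This is a hand-rolled counter, not any particular profiler's API; `count_calls`, `same`, `has_duplicates`, and `calls_for` are hypothetical names for illustration:

```python
import functools


def count_calls(fn):
    """Wrap fn so that every invocation increments wrapper.calls."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return fn(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@count_calls
def same(a, b):
    return a == b


def has_duplicates(items):
    """Naive pairwise check: with no duplicates it compares n*(n-1)/2 pairs."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if same(items[i], items[j]):
                return True
    return False


def calls_for(n):
    """Count calls to same() for an input of n distinct items (the worst case)."""
    same.calls = 0
    has_duplicates(list(range(n)))
    return same.calls


for n in (10, 100, 1000):
    print(n, calls_for(n))
# 10 45
# 100 4950
# 1000 499500
```

A 10x larger input producing roughly 100x more calls is the quadratic signature, and it shows up in the counts even when the timings on small inputs look fine.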
It gets even worse in distributed systems, which is partly why microservice-oriented things "scale" at the expense of a lot more hardware than you'd expect.
It's definitely a specialized discipline, whole-system optimization, and I wish I got to do it more often.
geokon
4 hours ago
im pretty sure performance counters count accurately. theyre a bit finicky to use but they dont alter cpu execution.
last i had to deal with it.. which was eons ago.. Higher end CPUs like Xeons had more counters and more useful ones
im sure there are plenty of situations where theyre insufficient, but its absurd to paint the situation as completely always hopeless
mrjay42
3 hours ago
Last time I checked, Intel's MSRs (https://en.wikipedia.org/wiki/Model-specific_register) are what allow Intel PCM (https://github.com/intel/pcm) to work, and they are indeed used to profile, or "measure performance" (sorry if my vocabulary is not the most accurate). Last time I checked the code of Intel PCM, it still relied on hardcoded values for each CPU, which are as close as possible to reality but are still an estimation.
It doesn't mean that you get wrong measurements, it means there's a level of inaccuracy that has to be accepted.
BTW, I am aware that Intel PCM is not a profiler, and more of a measurement tool. However, you CAN use it to 'profile' your program and see how it behaves in terms of computing and memory utilization, with deep analysis of cache behavior (cache hits, cache misses, etc.).
jstanley
an hour ago
If you think it's difficult to optimise performance with the numbers the profiler gives you, try doing it without them!
whatever1
3 hours ago
Heisenberg principle but for programming