jandrewrogers
4 months ago
I’ve seen many examples of a similar phenomenon. Modern compilers are impressively good at generating bit-twiddling micro-optimizations from idiomatic code. They are also good at larger-scale structural macro-optimization. However, there is a No Man’s Land for compiler optimization between micro-optimization and macro-optimization where the effectiveness of compiler optimizations is much less reliable.
Intuitively I understand why it is a hard problem. Micro-optimizations have deterministic properties that are simple enough that optimality is all but certain. Macro-optimization heuristics create incidental minor pessimization effects, but on average these are buried below the noise floor by the major optimization effects.
In the middle are optimizations that are too complex to guarantee effective optimization but too small in effect to offset any incidental pessimization. It is easy to inadvertently make things worse in these cases. Most cases of surprising performance variances I see fall into this gap.
It is also where the codegen from different compilers seems to disagree the most, which lends evidence to the idea that the “correct” codegen is far from obvious to a compiler.
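To make the micro end concrete, a minimal C sketch (assuming a recent GCC or Clang at -O2; nothing target-specific beyond x86 having a rotate instruction):

    #include <stdint.h>

    /* Idiomatic rotate-left. GCC and Clang recognize this pattern and
       compile it down to a single rotate instruction (e.g. rol on x86). */
    uint32_t rotl32(uint32_t x, unsigned n)
    {
        n &= 31;
        return (x << n) | (x >> ((32 - n) & 31));
    }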
fooker
4 months ago
It's a hard problem because the cost model is incredibly complicated.
For example: the decision to unroll/inline/vectorize code has unpredictable effects on performance.
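A minimal C sketch of the kind of decision involved (the pragma is GCC-specific, GCC 8+, and 8 is an arbitrary unroll factor, not a recommendation):

    #include <stddef.h>

    /* A plain dot product. Whether unrolling or vectorizing it pays off
       depends on the trip count, alignment, and the surrounding call site,
       none of which are visible here. The pragma forces a choice the cost
       model would otherwise have to make on its own. */
    float dot(const float *a, const float *b, size_t n)
    {
        float acc = 0.0f;
        #pragma GCC unroll 8
        for (size_t i = 0; i < n; ++i)
            acc += a[i] * b[i];
        return acc;
    }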
bell-cot
4 months ago
Or worse - it has perfectly predictable effects on performance... so long as you know the details (core counts, cache sizes, etc.) of the CPU model and dataset that your code will be run on, and the TLB/cache/fabric/memory usage patterns of other jobs which might be running at the same time.
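A toy C sketch of that dependence (BLOCK is a made-up placeholder; the profitable value depends on the cache hierarchy and on whatever else is running):

    /* Cache-blocked transpose. The "right" block size depends on cache
       sizes and on co-running jobs, neither of which the compiler can see. */
    #define BLOCK 64

    void transpose(double *dst, const double *src, int n)
    {
        for (int ii = 0; ii < n; ii += BLOCK)
            for (int jj = 0; jj < n; jj += BLOCK)
                for (int i = ii; i < ii + BLOCK && i < n; ++i)
                    for (int j = jj; j < jj + BLOCK && j < n; ++j)
                        dst[j * n + i] = src[i * n + j];
    }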
fooker
4 months ago
No, maybe you misunderstand the problem here?
Given a specific program, yes it is measurable and predictable.
A compiler has to deal with arbitrary programs - actually, with small fragments of arbitrary programs at a time. Even if you had an oracle giving you very precise measurements for the fragment the compiler is currently dealing with, those measurements wouldn't hold up in practice.
Consider a small representative example. Suppose the compiler is compiling a function, and it did the best possible job.
Now, inlining this function at two callsites could have completely different second-order effects, like enabling further optimizations or increasing register pressure. There are partial solutions, of course; people have spent their entire careers dealing with this specific problem. Rest assured that whatever you can come up with after thinking for a minute was already tried in the 1970s.
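A deliberately tiny C sketch of the two-callsite situation (in real code f() would be large enough for the trade-off to actually bite):

    /* Inlining f() at the first call site lets the whole call fold to a
       constant; inlining it inside the loop below mostly adds code size
       and register pressure. Same function, different payoff. */
    static int f(int x) { return x * x + 1; }

    int caller_a(void)
    {
        return f(3);              /* folds to the constant 10 */
    }

    int caller_b(const int *v, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; ++i)
            sum += f(v[i]);       /* competes with the loop state for registers */
        return sum;
    }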
fancyfredbot
4 months ago
I think OP was agreeing with you.
fooker
4 months ago
My point was - even if you knew everything about the hardware and could measure performance with absolute precision, it is still a hard problem for compilers, for many, many reasons.
algo_trader
4 months ago
> also good at larger scale structural macro-optimization
Can you give some examples?
Perhaps you mean small, granular choices that occur widely throughout the code?
viraptor
4 months ago
Automatic unrolling + inlining + constant hoisting (+ vectorisation)? I'd call that macro, but maybe the OP had something different in mind.
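A toy C sketch of that combination (my own illustration): scale() is trivially inlinable, factor * offset is loop-invariant and gets hoisted out, and the remaining loop is a clean unroll/vectorize candidate.

    static double scale(double x, double factor) { return x * factor; }

    void apply(double *out, const double *in, int n, double factor, double offset)
    {
        for (int i = 0; i < n; ++i)
            out[i] = scale(in[i], factor) + factor * offset;  /* invariant term hoisted by LICM */
    }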