Advent of Compiler Optimisations 2025

229 points, posted 7 hours ago
by vismit2000

32 Comments

calibas

an hour ago

Advent of Computer Science Advent Calendars, Day 2

drob518

40 minutes ago

Seems we’ve reached that point.

bspammer

4 hours ago

I really appreciate that despite being an obvious domain expert, he’s starting with the simple stuff and not jumping straight into crazy obscure parts of the x86 instruction set

ketanmaheshwari

an hour ago

I am personally interested in the code amalgamation technique that SQLite uses[0]. According to the SQLite folks, it is a more or less free 5-10% performance improvement. It would be nice if he addressed it in one of the sessions.

[0] https://sqlite.org/amalgamation.html

nickelpro

37 minutes ago

Unity builds have been largely supplanted by LTO. They still have uses for build time improvements in one-off builds, as LTO on a non-incremental build is usually slower than the equivalent unity build.

Sponge5

11 minutes ago

At my company, we have not seen any performance benefits from LTO on a GCC cross-compiled Qt application.

GCC version: 11.3 target: Cortex-A9 Qt version: 5.15

I think we tested single core and quad core, also possibly a newer GCC version, but I'm not sure. Just wanted to add my two cents.

adev_

4 hours ago

Matt Godbolt is an absolute gem for the C & C++ community.

Many thanks to him for that.

Between that and Compiler Explorer, it is fair to say he has made the world a better place for many of us developers.

alberth

3 hours ago

After 25 years of software development, I still wonder whether I’m using the best possible compiler flags.

cogman10

2 hours ago

What I've learned is that fewer flags is the best path for any long-lived project.

-O2 is basically all you usually need. As you update your compiler, it'll end up tweaking exactly what that general optimization does based on what they know today.

Because that's the thing about these flags, you'll generally set them once at the beginning of a project. Compiler authors will reevaluate them way more than you will.

Also, a trap I've observed is setting flags based on bad benchmarks. This applies more to the JVM than a C++ compiler, but nevertheless, a system's current state is somewhat random. Fluctuations of 1-2% in performance, even for the same app, are normal. A lot of people don't realize that and end up adding flags based on those fluctuations.

But further, how code is currently laid out can affect performance. You may see a speed boost not because you tweaked the loop unrolling variable, but because your tweak relocated a hot path to be slightly more cache friendly. A later change in the code structure can eliminate that benefit.
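
A minimal C sketch of the noise check described above, assuming you already have several timings of the unchanged baseline. The function names and the decision rule are hypothetical, chosen only for illustration; a real evaluation would use more runs and proper statistics.

```c
#include <stddef.h>

/* Relative spread of repeated timings: (max - min) / min.
   Assumes n >= 1 and that every sample is positive. */
double relative_spread(const double *samples, size_t n) {
    double lo = samples[0], hi = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    return (hi - lo) / lo;
}

/* Hypothetical rule: treat a speedup as real only if it is larger than
   the run-to-run spread already seen in the baseline timings. */
int speedup_exceeds_noise(const double *baseline, size_t n, double new_time) {
    double best = baseline[0];
    for (size_t i = 1; i < n; i++) {
        if (baseline[i] < best) best = baseline[i];
    }
    double speedup = (best - new_time) / best;
    return speedup > relative_spread(baseline, n);
}
```

With baseline timings that already vary by about 2%, a 1% "improvement" from a new flag would be rejected by this rule, while a 10% one would pass.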

alberth

21 minutes ago

Doesn't -O2 still exclude any CPU features from the past ~15 years (like AVX)?

If you know the architecture and the oldest CPU model you need to support, we're better served by adding a bunch more flags, no?

I wish I could compile my server code to target CPUs released on/after a particular date, like:

  -O2 -cpu-newer-than=2019

SubjectToChange

5 minutes ago

A CPU produced after a certain date is not guaranteed to have every ISA extension, e.g. SVE for Arm chips. Hence things like the microarchitecture levels for x86-64.
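
As a rough illustration of those levels: recent GCC and Clang releases accept values such as `-march=x86-64-v3`, a level that includes AVX2. The C sketch below reports which vector extension the compiler was allowed to use, based on its predefined macros. The function name is made up for this example, and the macro list is deliberately incomplete.

```c
/* Reports the widest x86 vector extension enabled at compile time,
   judged only by the compiler's predefined feature macros. */
const char *widest_vector_isa(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "none of the x86 extensions checked here";
#endif
}
```

Built for x86-64 with plain `-O2`, this typically returns "SSE2"; adding `-march=x86-64-v3` should change the answer to "AVX2". This only says what the compiler may emit, not what the machine running the binary supports.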

201984

2 hours ago

What's your reason for -O2 over -O3?

cogman10

2 hours ago

Historically, -O3 has been a bit less stable (producing incorrect code) and more experimental (doesn't always make things faster).

Flags from -O3 often flow down into -O2 as they are proven generally beneficial.

That said, I don't think -O3 has the problems it once did.

sgerenser

an hour ago

-O3 gained a reputation of being more likely to "break" code, but in reality it was almost always "breaking" code that was invalid to start with (invoked undefined behavior). The problem is C and C++ have so many UB edge cases that a large volume of existing code may invoke UB in certain situations. So -O2 thus had a reputation of being more reliable. If you're sure your code doesn't invoke undefined behavior, though, then -O3 should be fine on a modern compiler.
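
A small C sketch of the kind of code meant here, using signed integer overflow as the example of UB. The function names are made up for illustration, and what a given compiler actually does with the first function varies by version and flags.

```c
#include <limits.h>

/* Undefined behaviour when x == INT_MAX: signed overflow.
   An optimizer may assume overflow never happens and reduce this
   function to "return 0", so the check silently disappears. */
int next_is_smaller_ub(int x) {
    return x + 1 < x;
}

/* Well-defined version: compare against the limit instead of
   performing the addition that could overflow. */
int next_would_overflow(int x) {
    return x == INT_MAX;
}
```

Code that relies on the first form "working" can appear to break when the optimization level changes, even though the compiler is within its rights.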

drob518

37 minutes ago

Exactly. A lot of people didn’t understand the contract between the programmer and the compiler that is required to use -O3.

wavemode

2 hours ago

You have to profile for your specific use case. Some programs run slower under O3 because it inlines/unrolls more aggressively, increasing code size (which can be cache-unfriendly).

grogers

an hour ago

Yeah, -O3 generally performs well in small benchmarks because of aggressive loop unrolling and inlining. But in large programs that face icache pressure, it can end up being slower. Sometimes -Os is even better for the same reason, but -O2 is usually a better default.
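
To make the code-size tradeoff concrete, here is a hand-written C sketch of 4x unrolling. It is only an illustration with made-up function names, not what any particular compiler emits at a given level.

```c
#include <stddef.h>

/* Straightforward loop: small code, one add per iteration. */
long sum_simple(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i];
    }
    return total;
}

/* Manually unrolled by 4: fewer loop-control branches, but a larger
   body plus a cleanup loop for the leftover elements. */
long sum_unrolled4(const int *a, size_t n) {
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
    }
    for (; i < n; i++) {
        total += a[i];
    }
    return total;
}
```

Both functions return the same result. The second trades more instructions for fewer branches, which tends to help in a tight benchmark and can hurt once many such loops compete for instruction cache.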

bluGill

an hour ago

Most people use -O2 and so if you use -O3 you risk some bug in the optimizer that nobody else noticed yet. -O2 is less likely to have problems.

In my experience a team of 200 developers will see 1 compiler bug affect them every 10 years. This isn't scientific, but it is a good rule of thumb and may put the above in perspective.

macintux

an hour ago

Would you say that bug estimate is when using -O2 or -O3?

bluGill

an hour ago

The estimate includes Visual Studio and other compilers that are not open source, with whatever optimization options we were using at the time. As such, your question doesn't make sense (not that it is a bad question, but it doesn't apply here).

In the case of open source compilers the bug was generally fixed upstream and we just needed to get on a newer release.

nickelpro

32 minutes ago

People keep saying "O3 has bugs," but that's not true. At least no more bugs than O2. It did and does more aggressively expose UB code, but that isn't why people avoid O3.

You generally avoid O3 because it's slower. Slower to compile, and slower to run. Aggressively unrolling loops and larger inlining windows bloat code size to the degree it impacts icache.

The optimization levels aren't "how fast do you want the code to go", they're "how aggressive do you want the optimizer to be." The most aggressive optimizations are largely unproven and left in O3 until they are generally useful, at which point they move to O2.

SubjectToChange

14 minutes ago

More aggressive optimization is necessarily going to be more error prone. In particular, the fact that -O3 is "the path less traveled" means that a higher number of latent bugs exist. That said, if code breaks under -O3, then either it needs to be fixed or a bug report needs to be filed.

squater

5 hours ago

You can never have too much Godbolt!

alfanick

4 hours ago

Is there a PDF somewhere? I'm not really able to follow YT videos.

philipportner

3 hours ago

There's a link to the AoCO2025 tag for his blog posts in the op.

filosofo_rancio

5 hours ago

Thanks for sharing. I've always found optimization a really interesting field; I will keep a close eye on this!

ktallett

6 hours ago

This is really cool. Congrats on the quality of the work!

NooneAtAll3

3 hours ago

I don't understand

where is the problem to be solved?

eapriv

2 hours ago

The problem is “to add two numbers”. The meta-problem is “to learn how computers work”.

azundo

24 minutes ago

I think they're expecting a daily problem set like Advent of Code. This is not a set of problems to solve, it's a series with one release per day in December, similar to an Advent calendar.