adev_
3 days ago
Feedback from someone who is used to managing a large (>1500 components) software stack in C / C++ / Fortran / Python / Rust / etc.:
- (1) Provide a way to compile without internet access and to specify the paths to dependencies manually. This is absolutely critical.
Most 'serious' multi-language package managers and integration systems build in a sandbox without internet access, for security and reproducibility reasons.
If your build system does not allow building offline with manually specified dependencies, you will make the lives of integrators and package managers miserable, and they will avoid your project.
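As a concrete sketch of what that looks like for a CMake-based consumer (the flags are real CMake ones; the paths are made up):

```shell
# Hypothetical offline configure: no network access, dependencies
# resolved from a pre-populated local prefix instead of downloaded.
cmake -S . -B build \
  -DFETCHCONTENT_FULLY_DISCONNECTED=ON \
  -DCMAKE_PREFIX_PATH=/opt/sandbox/deps
```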
(2) Never ever build with '-O3 -march=native' by default. This is always a red flag and a sign of immaturity. People expect code to be portable and shippable.
Good default options are the CMake equivalent of "RelWithDebInfo" (meaning: -O2 -g -DNDEBUG).
-O3 can be argued about. -march=native is always, always a mistake.
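In CMake terms, that default can be sketched like this (the guard, so that a user's explicit choice is never overridden, is the important part):

```cmake
# Default to RelWithDebInfo (-O2 -g -DNDEBUG) only if the user
# hasn't picked a build type themselves.
if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES)
  set(CMAKE_BUILD_TYPE RelWithDebInfo CACHE STRING "Build type" FORCE)
endif()
```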
- (3) Allow your build tool to be built by another build tool (e.g. CMake).
Anybody caring about reproducibility will want to start from sources, not from a pre-compiled binary. This also matters for cross compilation.
- (4) Please offer compatibility with pkg-config (https://en.wikipedia.org/wiki/Pkg-config) and, if possible, CPS (https://cps-org.github.io/cps/overview.html), for both consumption and generation.
They are what will allow interoperability between your system and other build systems.
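For reference, the pkg-config side of this is just a small text file installed next to the library; a sketch for a hypothetical libfoo (names and paths are made up):

```
# /usr/lib/pkgconfig/foo.pc
prefix=/usr
libdir=${prefix}/lib
includedir=${prefix}/include

Name: foo
Description: Example library
Version: 1.2.3
Cflags: -I${includedir}
Libs: -L${libdir} -lfoo
```

Any build system can then consume it via `pkg-config --cflags --libs foo`, no matter which build system produced it.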
- (5) last but not least: Consider seriously the cross-compilation use case.
It is common in the world of embedded systems to cross compile. Any build system that does not support cross-compilation will be de facto banned from the embedded domain.
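With CMake, for example, cross compilation is driven by a toolchain file; a minimal sketch for an imaginary aarch64 Linux target (compiler names and sysroot path are assumptions):

```cmake
# aarch64-toolchain.cmake
set(CMAKE_SYSTEM_NAME Linux)
set(CMAKE_SYSTEM_PROCESSOR aarch64)
set(CMAKE_C_COMPILER aarch64-linux-gnu-gcc)
set(CMAKE_CXX_COMPILER aarch64-linux-gnu-g++)
# Look up headers and libraries in the target sysroot, never on the host.
set(CMAKE_FIND_ROOT_PATH /opt/sysroots/aarch64)
set(CMAKE_FIND_ROOT_PATH_MODE_PROGRAM NEVER)
set(CMAKE_FIND_ROOT_PATH_MODE_LIBRARY ONLY)
set(CMAKE_FIND_ROOT_PATH_MODE_INCLUDE ONLY)
```

It is passed at configure time with `cmake -DCMAKE_TOOLCHAIN_FILE=aarch64-toolchain.cmake ...`.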
Teknoman117
3 days ago
As someone who has also spent two decades wrangling C/C++ codebases, I wholeheartedly agree with every statement here.
I have an even stronger sentiment regarding cross compilation though - In any build system, I think the distinction between “cross” and “non-cross” compilation is an anti-pattern.
Always design build systems assuming cross compilation. It hurts nothing if it just so happens that your host and target platform/architecture end up being the same, and saves you everything down the line if you need to also build binaries for something else.
bsder
3 days ago
> In any build system, I think the distinction between “cross” and “non-cross” compilation is an anti-pattern.
This is one of the huge wins of Zig. Any Zig host compiler can produce output for any supported target. Cross compiling becomes straightforward.
sebastos
2 days ago
Amen. It always baffled me that cross compiling was ever considered a special, weird, off-nominal thing. I’d love to understand the history of that better, because it seems like it should have been obvious from the start that building for the exact same computer you’re compiling from is a special case.
pjmlp
2 days ago
Agree with the feedback.
Also, the problem isn't creating a cargo-like tool for C and C++, that is the easy part; the problem is getting more of a userbase than vcpkg or conan, for it to matter to those communities.
CoastalCoder
3 days ago
> Never ever build with '-O3 -march=native' by default. This is always a red flag and a sign of immaturity.
Perhaps you can see how there are some assumptions baked into that statement.
eqvinox
3 days ago
What assumptions would those be?
Shipping anything built with -march=native is a horrible idea. Even on homogeneous targets like one of the clouds, you never know if they'll e.g. switch CPU vendors.
The correct thing to do is use microarch levels (e.g. x86-64-v2) or build fully generic if the target architecture doesn't have MA levels.
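Concretely, the difference is a single flag (file names are made up):

```shell
gcc -O2 -march=native    -o app main.c  # only guaranteed to run on CPUs like the build host's
gcc -O2 -march=x86-64-v2 -o app main.c  # runs on any CPU implementing the x86-64-v2 level
```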
tempest_
3 days ago
I build on the exact hardware I intend to deploy my software to and ship it to another machine with the same specs as the one it was built on.
I am willing to hear arguments for other approaches.
zahllos
3 days ago
Not the OP, but: -march tells the compiler it can assume that the features of that particular CPU architecture family, which is broken out by generation, can be relied upon. In the worst case the compiler could in theory generate code that does not run on older CPUs of the same family, or on CPUs from different vendors.
-mtune says "generate code that is optimised for this architecture", but it doesn't enable arch-specific features.
Whether these are right or not depends on what you are doing. If you are building Gentoo on your laptop you should absolutely use -mtune=native and -march=native. That's the whole point: you get the most optimised code you can for your hardware.
If you are shipping code for a wide variety of architectures, and crucially the method of shipping is binary form, then you want to think more about what you might want to support. You could do either: if you're shipping standard software, pick a reasonable baseline (check what your distribution uses in its CFLAGS). If, however, you're shipping compute-intensive software, perhaps you load a shared object per CPU family, or build your engine in place for best performance. The Intel compiler quite famously optimised per family, included all the copies in the output, and selected the worst one on AMD ;) (https://medium.com/codex/fixing-intel-compilers-unfair-cpu-d...)
account42
2 days ago
> Not the OP, but: -march says the compiler can assume that the features of that particular CPU architecture family, which is broken out by generation, can be relied upon. In the worst case the compiler could in theory generate code that does not run on older CPUs of the same family or from different vendors.
Or on newer CPUs of the same vendor (e.g. AMD dropped some instructions in Zen that Intel didn't pick up) or even in different CPUs of the same generation (Intel market segmenting shenanigans with AVX512).
eslaught
2 days ago
Just popping in here because people seem to be surprised by
> I build on the exact hardware I intend to deploy my software to and ship it to another machine with the same specs as the one it was built on.
This is exactly the use case in HPC. We always build -march=native and go to some trouble to enable all the appropriate vectorization flags (e.g., for PowerPC) that don't come along automatically with the -march=native setting.
Every HPC machine is a special snowflake, often with its own proprietary network stack, so you can forget about binaries being portable. Even on your own machine you'll be recompiling your binaries every time the machine goes down for a major maintenance.
tempest_
2 days ago
If you get enough of them they can start to look like cattle.
Still, they are all the same breed.
eqvinox
3 days ago
I'm willing to hear arguments for your approach?
It certainly has scale issues when you need to support larger deployments.
[P.S.: the way I understand the words, "shipping" means "passing it off to someone else, likely across org boundaries" whereas what you're doing I'd call "deploying"]
teo_zero
2 days ago
So, do you see now the assumptions baked into your argument?
> when you need to support larger deployments
> shipping
> passing it off to someone else
tom_
3 days ago
On every project I've worked on, the PC I've had has been much better than the minimum PC required. Just because I'm writing code that will run nicely enough on a slow PC, that doesn't mean I need to use that same slow PC to build it!
And then, the binary that the end user receives will actually have been built on one of the CI systems. I bet they don't all have quite the same spec. And the above argument applies anyway.
pjmlp
2 days ago
So I gather you don't do cloud, embedded, game consoles, or mobile devices.
Quite hard to build on the exact hardware for those scenarios.
dijit
3 days ago
What?! seriously?!
I’ve never heard of anyone doing that.
If you use a cloud provider and use a remote development environment (VSCode Remote/JetBrains Gateway) then you're wrong: cloud providers swap out the CPUs without telling you, and can sell newer CPUs at older prices if there's less demand for the newer CPUs; you can't rely on that.
To take an old naming convention, even an E3-Xeon CPU is not equivalent to an E5 of the same generation. I’m willing to bet it mostly works but your claim “I build on the exact hardware I ship on” is much more strict.
The majority of people I know use either laptops or workstations with Xeon workstation or Threadripper CPUs— but when deployed it will be a Xeon scalable datacenter CPU or an Epyc.
Hell, I work in gamedev and we cross compile basically everything for consoles.
ninkendo
3 days ago
… not everyone uses the cloud?
Some people, gasp, run physical hardware, that they bought.
izacus
2 days ago
So you buy the exact same generation of Intel and AMD chips for your developers as for your servers and your customers? And encode this requirement into your development process for the future?
ninkendo
a day ago
[flagged]
lkjdsklf
3 days ago
We use physical hardware at work, but it's still not the way you build/deploy unless it's for a workstation/laptop type thing.
If you're deploying the binary to more than one machine, you quickly run into issues where the CPUs are different and you would need to rebuild for each of them. This is feasible if you have a couple of machines that you generally upgrade together, but quickly falls apart at just slightly more than 2 machines.
dijit
2 days ago
And all your deployed and dev machines run the same spec- same CPU entirely?
And you use them for remote development?
I think this is highly unusual.
ninkendo
2 days ago
Lots of organizations buy many of a single server spec. In fact that should be the default plan unless you have a good reason to buy heterogeneous hardware. With the way hardware depreciation works they tend to move to new server models “in bulk” as well, replacing entire clusters/etc at once. I’m not sure why this seems so foreign to folks…
Nobody is saying dev machines are building code that ships to their servers though… quite the opposite, a dev machine builds software for local use… a server builds software for running on other servers. And yes, often build machines are the same spec as the production ones, because they were all bought together. It’s not really rare. (Well, not using the cloud in general is “rare” but, that’s what we’re discussing.)
tempest_
2 days ago
There is a large subset of devs who have worked their entire career on abstracted hardware which is fine I guess, just different domains.
The size of your L1/L2/L3 cache or the number of TLB misses doesn't matter too much if your python web service is just waiting for packets.
PufPufPuf
3 days ago
The only time I used -march=native was for a university assignment which was built and evaluated on the same server, and it allowed juicing an extra bit of performance. Using it basically means locking the program to the current CPU only.
However I'm not sure about -O3. I know it can make the binary larger, not sure about other downsides.
adev_
3 days ago
> The only time I used -march=native
It is completely fine to use -march=native, just do not make it the default for someone building your project.
That should always be opt-in.
The main reason is that software is a composite of (many) components. It quickly becomes a maintainability pain in the ass if some tiny library somewhere tries to sneak in '-march=native', making the final binary randomly crash with an illegal-instruction error when executed on any CPU that is not exactly the same as the host.
When you design a build system configuration, think of the others first (the users of your software), and of yourself after.
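In CMake, that opt-in could look something like this (the option and target names are made up):

```cmake
option(FOO_NATIVE "Tune for the build machine's CPU (non-portable binaries!)" OFF)
if(FOO_NATIVE)
  target_compile_options(foo PRIVATE -march=native)
endif()
```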
hmry
3 days ago
-O3 also makes build times longer (sometimes significantly), and occasionally the resulting program is actually slightly slower than -O2.
IME -O3 should only be used if you have benchmarks that show -O3 actually produces a speedup for your specific codebase.
fyrn_
2 days ago
This varies a lot between compilers. Clang, for example, treats O3 perf regressions as bugs (in many cases at least) and is a bit more reasonable with O3 on. GCC goes full Mad Max and you don't know what it's going to do.
pclmulqdq
2 days ago
If you have a lot of "data plane" code or other looping over data, you can see a big gain from -O3 because of more aggressive unrolling and vectorization (HPC people use -O3 quite a lot). CRUD-like applications and other things that are branchy and heavy on control flow will often see a mild performance regression from use of -O3 compared to -O2 because of more frequent frequency hits due to AVX instructions and larger binary size.
atiedebee
2 days ago
I made a program with some inline assembly and tried O3 with clang once. Because the assembly was in a loop, the compiler probably didn't have enough information on the actual code and decided to fully unroll all 16 iterations, making performance drop by 25% because the cache locality was completely destroyed. What I'm trying to say is that loop unrolling is definitely not a guarantee of faster code in exchange for binary size.
pclmulqdq
2 days ago
Large blocks of inline assembly also destroy -O3. The compiler treats the asm statement as being essentially empty and makes decisions around it. Most inline asm is 1 instruction, so this is usually safe.
izacus
3 days ago
Not assumptions, experience.
I fully concur with that whole post as someone who also maintained a C++ codebase used in production.
tgma
3 days ago
> -march=native is always always a mistake
Gentoo user: hold my beer.
CarVac
3 days ago
Gentoo binaries aren't shipped that way
account42
2 days ago
They are shipped to a new system when you upgrade because reinstalling is for suckers.
greenavocado
3 days ago
Gentoo..... distributes binaries?
rascul
3 days ago
digitalPhonix
2 days ago
But not with -march=native?
The distributed binaries use two standard instruction set levels for x86-64 and one for ARM, like "-march=x86-64-v3".
https://wiki.gentoo.org/wiki/Gentoo_binhost/Available_packag...
account42
2 days ago
You can have your own binary host or even just compile packages on another host on demand. -march=native is a concern in both cases.
jjmarr
3 days ago
It's also an option on NixOS but I haven't managed to get it working unlike Gentoo.
criticalfault
2 days ago
Since you have a lot of experience, can I ask what you think about this:
- skipping cmake completely? would this be feasible?
- integration of other languages in the project?
- how to handle qt?
adev_
2 days ago
> skipping cmake completely? would this be feasible?
Feasible but difficult. CMake has a tremendous user mass, so you do want to be able to use a CMake-based project as a dependency. The CMake Target/Config export system exposes CMake internals and makes it difficult to consume a CMake-built project without CMake.
The cleanest way to do that is probably what xmake is doing: calling CMake and extracting target information from CMake into your own build system with some scripting. It is flaky, but xmake has proven it is doable.
That said: CPS should make that easier in the longer term.
Please also consider that CMake does a lot of work under the hood to contain compiler quirks, which you will have to do manually.
> integration of other languages in the project?
Trying to integrate higher-level languages (Python, JS) into the package managers of lower-level languages (C, C++) is generally a bad idea.
The dependency relation is inverted, and interoperability between package managers is always poor. Diamond dependencies and conflicting versions will quickly become a problem.
I would advise just exposing your build system properly, with the properties I described, and using a multi-language package manager (e.g. Nix) or, failing that, the higher-level language's package manager (e.g. uv with a scikit-build-core equivalent) on top of that.
This will be one order of magnitude easier to do.
> how to handle qt?
Qt is nothing special to handle.
Qt is a multi-language framework (C++, MOC, QML, JS, and even Python for PySide) and needs to be handled as such.
moralestapia
3 days ago
>15000
15000 what?
adev_
3 days ago
1500 C/C++ individual software components.
The 15000 was a typo on my side. Fixed.
moralestapia
3 days ago
I see, thanks. I didn't mind the number, it just wasn't clear what it was about.