Nix Derivation Madness

188 points, posted 3 months ago
by birdculture

55 Comments

edolstra

3 months ago

The deriver field in Nix has always been a misfeature. It was intended to provide traceability back to the Nix expression used to create the derivation, but it doesn't actually do that (since that wasn't really possible in the pre-flakes world, without hermetic evaluation). So instead it just causes a lot of confusion when the deriver recorded in the binary cache doesn't match the local evaluation result, due to fixed-output derivations changing.

In the future, Nix will hopefully gain proper provenance tracking that will tell you exactly where a store path came from: https://github.com/NixOS/nix/pull/11749

Ericson2314

3 months ago

The biggest problem of all is that derivers are not unique! A separate "build trace" map will solve this.

tomberek

3 months ago

Presumably this would support a big improvement to both SBOM generation as well as various UX features and workflow improvements.

setheron

3 months ago

Is that the 'build-trace' feature I saw John write about? (I want to explore that more.)

Ericson2314

3 months ago

I think Eelco has in mind a separate thing that would still be a store object field. But IMO we should not do that, since derivers are not unique, and we should use the "build trace" instead, which properly handles that.

As Martin Schwaighofer has discussed, it is fine and in fact good for build trace entries to have arbitrary metadata, so the "claims" being cryptographically signed are more precise. (This is good for auditing, and if something looks suspicious, having full accountability.)

So on those grounds, if Eelco would like to include some "this came from this flake" information as informal metadata (formally, the key must still be the resolved derivation), that is fine with me.

---

As I linked in my other reply, see my fast-growing https://github.com/NixOS/nix/pull/14408 docs PR where I try to formally nail all this stuff down for the first time.

mschwaig

3 months ago

I mentioned another alternative to adding flake-specific metadata to data structures that are transferred over the network, as part of the signed traces or otherwise, in a comment on that PR Eelco linked.

It's keeping flake-specific data locally, to guarantee that it matches how the user ended up with the data, not how the builder produced it. I think otherwise from the user POV such data could again look misleading.

ronef

3 months ago

+1 to Farid, great write-up! What you’re seeing is the long-standing “deriver” mismatch: fixed-output derivations can change their .drv without changing the output path. Eelco is calling it out as well in the comment below. I believe the idea behind the path forward is there but happy to hear more!

Also, check out Farid's other posts.

amelius

3 months ago

> The road to Nix enlightenment is no joke and full of dragons.

Nix was a great research project. Now is the time to rewrite it from the ground up.

Ericson2314

3 months ago

The core store layer is quite small, and I am trying to thoroughly document it, with all 3 of:

- a more "academic" spec of what it does

- nuts-and-bolts JSON schema for many data types

- JSON golden tests instead of C++ literals in the unit tests as often as possible.

I hope this will make additional store layer implementations easy to churn out.

(The fiddly "hash derivation modulo" described in this blog post can be dropped in a world where we no longer have input addressing, and just have content-addressing. Or, in a world where we have a new, simpler type of input-addressing instead.)

jbstack

3 months ago

Well, there's Guix as an alternative if you want a similar concept but different implementation philosophy. For me the major disadvantage of Guix is lack of package availability compared to Nix.

n8henrie

3 months ago

I really wish Guix worked on macOS. Nix-Darwin and home-manager have been game changers -- sharing much config and tooling between my Mac, arch, and nixos machines has been a blessing.

amelius

3 months ago

Isn't there a way to transpile the scripts from Nix to Guix?

brendyn

3 months ago

In practice, no, because in the end it generally takes a human intelligence to fully understand the requirements of a particular program, sanity check everything, get the right dependency versions and fix build errors. For code library repositories like Rust's, importing is fairly automated since everything is neat, tidy, and regular. But end user applications are more often than not a pain in the ass.

c0balt

3 months ago

That would be possible. The main problem there is that nixpkgs, the package repository one would want to translate, uses a good chunk of specialized build infrastructure (parts in nix, some in rust/Perl/Python) that is designed for nix (the package manager).

Some other semi-specific parts, like stdenv bootstrapping, are also a bit more complex than just some nix build instructions.

Y_Y

3 months ago

It's not too hard to translate manually, but since the dependency tree is massive it doesn't seem feasible to do wholesale.

zamalek

3 months ago

AFAIK Guix uses parts of Nix as a backend.

tkz1312

3 months ago

Guix uses a fork of the nix daemon

c0balt

3 months ago

Guix uses the sandboxing logic iirc

mystifyingpoi

3 months ago

I feel the same about HCL in Terraform. The tool is perfect, the language is bollocks.

WillDaSilva

3 months ago

Pulumi may be what you're looking for. Same concept as Terraform, and many of its provider libraries are just wrappers around Terraform provider libraries, but you can use a variety of common programming languages to declare your desired state, rather than HCL.

mystifyingpoi

3 months ago

Yeah, I tried it briefly some time ago and it seems like a solution.

otabdeveloper4

3 months ago

It has been rewritten a few times already. The "fixed output hash" is a dirty optimisation hack borne out of real-world needs and not a research idea.

Valodim

3 months ago

Eh. This can be applied to so many technologies that run the world..

beardsciences

3 months ago

If I understand this correctly, upcoming Ca-derivations will fix this by making these situations expected, properly-handled cases rather than a weird bug? https://nixos.wiki/wiki/Ca-derivations

Ericson2314

3 months ago

Yes, a hope of mine is that we can stop using "hash derivation modulo" entirely.

I've recently started some fancy formal spec-level documentation here https://github.com/NixOS/nix/pull/14408 The "resolution" equivalence class is both simpler and better than the "hash derivation modulo ..." one.

(The fact that it is a mouthful to say what the derivations are modulo kinda gives the game away! I put "hash quotient derivation" in the docs to side-step the issue.)

edolstra

3 months ago

To be clear, there is no bug here: derivers are simply not uniquely determined in the presence of fixed-output derivations, which is by design. That's even more true with CA derivations.

CA derivations also introduce the opposite situation, namely that the same derivation can produce different output paths for different users (if the build is not bitwise reproducible).

setheron

3 months ago

pick your poison: 1:N or N:1 ;P

Ericson2314

3 months ago

It's both, multiple derivations can produce the same (content-addressed) store object, and the derivations may not be reproducible and produce different (content-addressed) store objects each time.

The reality of executing arbitrary programs on non-deterministic computers is, unfortunately, N:M!

(Cue deterministic WASM derivations or something.)
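
A minimal Python sketch of that N:M relation, under loud assumptions: the derivation names, the `observations` list, and the `content_address` helper are all hypothetical, and the truncated SHA-256 is only a stand-in for a real content address. It is not Nix's actual build trace format. It only shows why a single "deriver" field per store object cannot capture the relation, while a map with sets on both sides can.

```python
import hashlib
from collections import defaultdict


def content_address(data: bytes) -> str:
    """Toy content address: truncated SHA-256 of the output bytes (an assumption)."""
    return hashlib.sha256(data).hexdigest()[:32]


# Hypothetical build observations: (derivation id, bytes the build produced).
observations = [
    ("drv-A", b"hello"),   # drv-A, first build
    ("drv-B", b"hello"),   # a different derivation producing identical bytes
    ("drv-A", b"hello!"),  # drv-A rebuilt, but the build was not reproducible
]

outputs_of = defaultdict(set)   # derivation -> every output it has produced
derivers_of = defaultdict(set)  # output -> every derivation that produced it

for drv, data in observations:
    out = content_address(data)
    outputs_of[drv].add(out)
    derivers_of[out].add(drv)

# One derivation, several outputs (1:N) ...
print(len(outputs_of["drv-A"]))
# ... and one output, several derivations (N:1); together that is N:M.
print(sorted(derivers_of[content_address(b"hello")]))
```

Keeping both directions as sets is the whole point of the sketch: neither side can be collapsed to a single value without losing information.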

setheron

3 months ago

CA derivations are, from what I understand, fixed-output derivations but more general.

The point of the article to me (author) was that I found it odd that Nix replaces the derivations when calculating the output path but not the derivation path. (Talking about "paths" in Nix is so hard!)
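
A toy Python model of that asymmetry, offered as a sketch rather than Nix's real algorithm: `Drv`, `toy_hash`, `drv_path`, `hash_modulo`, `output_path`, the mirror URLs, and the hash string are all made up for illustration, and real Nix hashes different data in a different format. The only idea carried over from the discussion is that the derivation path depends on the input derivations as written, while the output path is computed after fixed-output inputs have been replaced by something that depends only on their declared output hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


def toy_hash(*parts: str) -> str:
    """Truncated SHA-256 over the parts (a stand-in, not Nix's real hashing)."""
    return hashlib.sha256("\0".join(parts).encode()).hexdigest()[:32]


@dataclass(frozen=True)
class Drv:
    name: str
    recipe: str                               # e.g. a URL or a build command
    inputs: Tuple["Drv", ...] = ()
    fixed_output_hash: Optional[str] = None   # set only for fixed-output drvs


def drv_path(d: Drv) -> str:
    """Depends on everything, including the input derivations' own paths."""
    return toy_hash("drv", d.name, d.recipe, d.fixed_output_hash or "",
                    *(drv_path(i) for i in d.inputs))


def hash_modulo(d: Drv) -> str:
    """Fixed-output drvs collapse to their declared hash; others recurse."""
    if d.fixed_output_hash is not None:
        return toy_hash("fixed", d.name, d.fixed_output_hash)
    return toy_hash("modulo", d.name, d.recipe,
                    *(hash_modulo(i) for i in d.inputs))


def output_path(d: Drv) -> str:
    return toy_hash("out", hash_modulo(d)) + "-" + d.name


# Same tarball (same declared hash), fetched from two hypothetical mirrors.
src_a = Drv("ruby-3.3.9.tar.gz", "https://mirror-a.example/ruby.tar.gz",
            fixed_output_hash="sha256:made-up")
src_b = Drv("ruby-3.3.9.tar.gz", "https://mirror-b.example/ruby.tar.gz",
            fixed_output_hash="sha256:made-up")
ruby_a = Drv("ruby-3.3.9", "make install", inputs=(src_a,))
ruby_b = Drv("ruby-3.3.9", "make install", inputs=(src_b,))

print(drv_path(ruby_a) == drv_path(ruby_b))        # derivation paths differ
print(output_path(ruby_a) == output_path(ruby_b))  # output paths agree
```

In this model, two users who evaluate slightly different fetchers end up with the same output path but different `.drv` paths, which is the kind of deriver mismatch the thread is about.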

beardsciences

3 months ago

That makes sense, thanks for clarifying. Great writeup.

huem0n

3 months ago

As a mere mortal I find none of this surprising, mostly because I never understood any of it in the first place ... :)

eviks

3 months ago

> nix/store/24v9wpp393ib1gllip7ic13aycbi704g-ruby-3.3.9.drv

A different type of madness, but if ugly names are so common, why not start with ruby-3.3.9 so any list of files is semantically sorted and readable?

rkomorn

3 months ago

The package name is "secondary" information in this context. The hash is the primary one because it's stable unless the input changes.

The semantic is "what did this configuration generate", not "what's this package's version".

eviks

3 months ago

it's primary for every human involved, also, the way you check whether it's changed is by automatically comparing that full hash, not its starting symbols, so you don't care where in the full string it's positioned

> The semantic is "what did this configuration generate", not "what's this package's version".

Then why have the name/version at all like in those nameless cache dirs?

rkomorn

3 months ago

It made sense to me when I looked at it, at mount points, at when it changed vs when it didn't, etc, so IDK what to tell you.

FWIW, I'm also pretty sure I'm human.

Edit: also, I'm pretty sure that I wouldn't find it any more or less complicated if the package name came first.

Kootle

3 months ago

In Nix, packages (derivations) are so lightweight that your store has tens of thousands of them, many with the same name, or with no meaningful name at all. On the rare occasions that you need to look in the store for a package you’re much more likely to be looking for a particular hash than a particular name. That, and having the hash as a prefix looks nicer in tabular output.

Ericson2314

3 months ago

If I had my way

1. store paths would have no names at all

2. listing the contents of the store directory would not be allowed

3. store paths have more bits of information

Then store paths are halfway decent (but non-revocable) capabilities.

eviks

3 months ago

> 2. listing the contents of the store directory would not be allowed

Wow, that's awful, that's what Windows AppStore does, so it's even hard to see how much of the preinstalled garbage there is or even whether you might have a huge game you forgot to uninstall but might want to, to free up some space.

What's the cool benefit that could justify this limitation?

tracnar

3 months ago

What actually happens if you remove read permissions on the /nix/store directory? Do things still work? I suppose I'll need to try!

vatsachakrvthy

3 months ago

How could one debug if we couldn't view contents of the store directory?

tkz1312

3 months ago

I grep through the store pretty regularly looking for names. The tone of the original comment is annoying but the suggestion is imo quite a good one.

otabdeveloper4

3 months ago

It's done that way on purpose. Precisely so you don't try to use the paths semantically. The names literally mean nothing in this context.

eviks

3 months ago

That contradicts the simple fact that the name includes "ruby" and isn't just a hash

otabdeveloper4

3 months ago

That name is only there for debugging purposes. It doesn't actually mean anything and you only ever need to look at it to debug some hoary failing build.

whacked_new

3 months ago

I can't find the video of the talk where either Eelco Dolstra (nix) or Todd Gamblin (spack) talks about this, but IIRC it's a design decision; you generally don't go spelunking in the nix store, but if you do, and you ls /nix/store, you'll see a huge list of packages, and due to the hash being a constant length, you can visually separate the tails of the package names like

    0009flr197p89fz2vg032g556014z7v1-libass-0.17.3.drv
    000ghm78048kh2prsfzkf93xm3803m0r-default.md
    001f6fysrshkq7gaki4lv8qkl38vjr6a-python-runtime-deps-check-hook.sh.drv
    001gp43bjqzx60cg345n2slzg7131za8-nix-nss-open-files.patch
    001im7qm8achbyh0ywil6hif6rqf284z-bootstrap-stage0-binutils-wrapper-boot.drv
    001pc0cpvpqix4hy9z296qnp0yj00f4n-zbmath-review-template.r59693.tar.xz.drv

Spack, another deterministic builder / package manager, IIRC uses the reversed order so the hash is at the tail. Pros/cons under different search / inspection conditions.

eviks

3 months ago

> Pros/cons under different search / inspection conditions.

But what's the pro? The tail alignment is worse than the head alignment since you read head to tail, not the other way around.

whacked_new

3 months ago

Comparing nix style (hash head, package tail) and spack style (package head, hash tail), and speaking from my own limited experience, the need arises in different cases, which also depends on the ease of tooling.

Sometimes I'm grepping in /nix/store and you have (as shown earlier) a list of derivation paths like this:

$ ls /nix/store | grep nodejs-2 | head | sed 's/^/ /'

    0a9kkw6mh0f80jfq1nf9767hvg5gr71k-nodejs-22.18.0.drv
    0pmximcv91ilgxcf9n11mmxivcwrczaa-nodejs-22.14.0-source.drv
    0zzxnv3kap4r4c401micrsr3nrhf87pa-nodejs-20.18.1-fish-completions.drv
    2a7y7d38x8kwa8hdj6p93izvrcl9bfga-nodejs-22.11.0-source.drv
    2gcjb0dibjw8c1pp45593ykjqzq5sknm-nodejs-20.18.1-source.drv

and thus, as designed, your eyes ignore the block of hashes and you see the "nodejs-..." stuff.

You might ask why are you grepping? Because it's fast and familiar and I don't know the native tooling as easily (possibly a UX problem).

Then in spack (see https://spack.readthedocs.io/en/latest/package_fundamentals....) they have

$ spack find --paths

    ==> 74 installed packages.
    -- linux-debian7-x86_64 / gcc@4.4.7 --------------------------------
        ImageMagick@6.8.9-10  ~/spack/opt/linux-debian7-x86_64/gcc@4.4.7/ImageMagick@6.8.9-10-4df950dd
        adept-utils@1.0       ~/spack/opt/linux-debian7-x86_64/gcc@4.4.7/adept-utils@1.0-5adef8da
        atk@2.14.0            ~/spack/opt/linux-debian7-x86_64/gcc@4.4.7/atk@2.14.0-3d09ac09

and

$ spack find --format "{name}-{version}-{hash}"

    autoconf-2.69-icynozk7ti6h4ezzgonqe6jgw5f3ulx4
    automake-1.16.1-o5v3tc77kesgonxjbmeqlwfmb5qzj7zy
    bzip2-1.0.6-syohzw57v2jfag5du2x4bowziw3m5p67
    bzip2-1.0.8-zjny4jwfyvzbx6vii3uuekoxmtu6eyuj
    cmake-3.15.1-7cf6onn52gywnddbmgp7qkil4hdoxpcb

you get the package name immediately from the left, which is nice, and you can pipe that straight to `sort`, but where the hash starts is more jagged on the right so there's a bit more noise when you're looking at the numbers. In the end the information is identical and it's a UX difference.

Tradeoff-wise I think they both made the right choice: for nix, the packages are almost always in /nix/store, so the path length including the hash is almost always the same.

For spack you can place your packages anywhere so the base directories can be highly variable, and so it's sensible to have the package names immediately after the package directory.

Or, I'm just trying to rationalize the choices each designer made post-hoc. But after using both I appreciate the design considerations that went in. In the end, humans are inefficient. When I make name / version / hash identifiers in my own applications I end up using one or the other design.

setheron

3 months ago

I would be interested in that video.

XorNot

3 months ago

The reason it's like this is because the only way to reliably grab it is to cut the string at the first hyphen - then the rest can be almost free text.

If you do it the other way it's harder. You can try this with nix commands: /nix/store/<hash>-x is a valid way to refer to something in the store most of the time.

Kudos

3 months ago

You could just use the last hyphen. It's not as simple since you need to scan the whole string, but it certainly doesn't seem like a challenge to me.
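
For comparison, both splits can be sketched in a few lines of Python. The function names `split_hash_first` and `split_hash_last` are hypothetical, and the sketch assumes the hash itself never contains a hyphen, which holds for the listings quoted earlier in this thread. The sample basenames are taken from those listings.

```python
from typing import Tuple


def split_hash_first(basename: str) -> Tuple[str, str]:
    """Nix-style '<hash>-<name>': cut at the FIRST hyphen; the name is free text."""
    hash_part, _, name = basename.partition("-")
    return hash_part, name


def split_hash_last(basename: str) -> Tuple[str, str]:
    """'<name>-<version>-<hash>' style: cut at the LAST hyphen."""
    name_version, _, hash_part = basename.rpartition("-")
    return name_version, hash_part


nix_hash, nix_name = split_hash_first(
    "0zzxnv3kap4r4c401micrsr3nrhf87pa-nodejs-20.18.1-fish-completions.drv")
spack_name, spack_hash = split_hash_last(
    "bzip2-1.0.8-zjny4jwfyvzbx6vii3uuekoxmtu6eyuj")

print(nix_hash, "|", nix_name)
print(spack_name, "|", spack_hash)
```

Either way it is a single string operation. With the hash first, the hash is always the first 32 characters in the listings above, so a fixed-width slice such as `basename[:32]` would work as well.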

XorNot

3 months ago

Yes but why pay that runtime cost for a problem the user can solve in human time using grep?

singron

3 months ago

It really doesn't matter. As a normal user, you don't use `drv` files directly, and everything you configure yourself will use attribute paths in nixpkgs. E.g. `pkgs.ruby` or `pkgs.ruby_3_3`.