acka
6 hours ago
I believe the XZ compromise partly stemmed from including binary files in what should have remained a source-only project. From what I remember, well-run projects such as those of the GNU project have always required that all binaries—whether executables or embedded data such as test files—be built directly from source, compiling a purpose-built DSL if necessary. This ensures transparency and reproducibility, both of which might have helped catch the issue earlier.
dijit
6 hours ago
That's not the issue: there will always be prebuilt binaries (hell, deb/rpm are prebuilt binaries).
The issue for xz was that the build system was neither hermetic nor sufficiently audited.
Hermetic build environments that can't fetch random assets are a pain to maintain in this era, but they are pretty crucial in stopping an attack of this kind. The other route is reproducible binaries, which is also very difficult.
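One crude way to approximate a hermetic build (not what any particular distro does, just a sketch) is to deny the build any network access at all, so anything it tries to fetch fails loudly:

  # build inside a container with no network; surprise downloads become build failures
  docker build --network=none -t myapp .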
EDIT: Well, either I responded to the wrong comment or this comment was entirely changed. I was replying to a comment that said "The issue was that people used pre-built binaries", which is materially different from what the parent now says, though they rhyme.
jacquesm
3 hours ago
This is not going to be popular: I think the whole idea that a build system just fetches resources from outside the build environment is fundamentally broken. It invites all kinds of trouble and makes it next to impossible to really achieve stability and to ensure that all the code in the build has been verified. After you've done it four times, the fifth time you won't be looking closely. But if the fetch isn't automatic, and only happens when you actually need it, you will look a lot more sharply at what has changed since you last pulled in the code. Especially for older, stable libraries, the consumers should dictate when they upgrade, not some automatic build process. We've all been conditioned to download the latest version because it may have fixed some security issue, and in the process we stopped thinking about the security issues of just downloading stuff and dumping it into the build.
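To make that concrete, this is roughly the alternative I have in mind: vendor the thing and pin it, so that upgrading is a deliberate, reviewable act rather than a side effect of building (repo name and path made up):

  # vendor the dependency at a specific, audited commit instead of pulling whatever is current
  git submodule add https://example.org/libfoo.git third_party/libfoo
  git -C third_party/libfoo checkout <audited-commit-sha>
  git commit -am "pin libfoo at audited commit"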
Sophira
2 hours ago
I completely agree with you - I think that automatic downloading of dependencies when building is a bad idea.
However, to play devil's advocate, I also want to point out that the first thing a lot of people used to do after downloading and extracting a source tarball was to run "./configure" without even looking at what they were executing - even people who (rightly) hate the "curl | bash" combo. You could be running anything.
Being able to verify what it is you're running is vitally important, but in the end it only makes a difference if people take the time to do so. (And running "./configure --help" doesn't count.)
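For a sense of scale, this is roughly what you're trusting every time (figures and commands purely illustrative):

  # "./configure" is just an enormous generated shell script
  wc -l configure    # often tens of thousands of lines
  sh -n configure    # syntax check only; says nothing about what it will actually do
  less configure     # actually reading it is the only real verification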
imoverclocked
2 hours ago
Adding sudo in front makes it secure though, right? /sarcasm
--
Automatic downloading of dependencies can be done in a sane way, but not without significant effort. For example, building Debian packages can install other pre-packaged dependencies, and in theory those packages are built the same way.
Where this becomes an issue is specifically where language-specific mechanisms reach out and just install dependencies. To be fair, this has been happening for a long time (I'm looking at you, CPAN) and does provide a lot of potential utility to any given developer.
What might be better than advocating for "not doing this at all" is "fixing the pattern." There are probably better ideas than this but I'll start here:
1) Make repositories distinguish different tiers of software by default: core vs. community at an absolute minimum. Maybe add levels like "vetted versions" or "gist" to convey information about quality or security.
2) Make it easy to have locally vetted pools to choose from. For example, Artifactory makes it easy to cache upstream software repos locally, which is a good start. Building a vetting process on top of that would be useful, if cumbersome (rough sketch below).
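A rough sketch of what (2) could look like with a pip-style tool pointed at an internal, vetted index (hostname made up):

  # point the client at a locally vetted mirror instead of the public index
  pip config set global.index-url https://artifacts.internal.example/pypi/simple
  # and refuse anything not pinned by hash in the lockfile
  pip install --require-hashes -r requirements.txt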
At the end of the day, we are always running someone else's code.
kragen
3 hours ago
I am pretty sure Debian Policy agrees with you, although I can't cite chapter and verse. Certainly Nix and Guix agree with you. But that evidently wasn't the problem here.
dataflow
2 hours ago
> I think the whole idea that a build system just fetches resources from outside of the build environment is fundamentally broken
I think your phrasing is a bit overbroad. There's nothing fundamentally broken with the build system fetching resources; what's broken is not verifying what it's fetching. Audit the package beforehand and have your build system verify its integrity after downloading, and you're fine.
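Even something as blunt as a pinned digest gets you most of the way there. A rough sketch, with the name and hash as placeholders:

  # fetch a pinned release and refuse to build unless the digest matches what was audited
  curl -LO https://example.org/libfoo-1.2.3.tar.gz
  echo "<expected-sha256>  libfoo-1.2.3.tar.gz" | sha256sum -c - || exit 1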
mananaysiempre
6 hours ago
The XZ project’s build system is and was hermetic. The exploit was right there in the source tarball. It was just hidden away inside a checked-in binary file that masqueraded as a test for handling of invalid compressed files.
(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)
dijit
6 hours ago
I thought that the exploit was not injected into the Git repository on GitHub at all, but only in the release tarballs. And that due to how Autoconf & co. work, it is common for tarballs of Autoconf projects to include extra files not in the Git repository (like the configure script). I thought the attacker exploited the fact that differences between the release tarball and the repository were not considered particularly suspicious by downstream redistributors in order to make the attack less discoverable.
mananaysiempre
5 hours ago
First of all, even if that were true, it wouldn't have much to do with hermetic builds as I understand the term. You could take the release tarball and build it on an air-gapped machine, and (assuming the backdoor liked the build environment on the machine) you would get a backdoored artifact. Fetching assets from the Internet (as is fashionable in the JavaScript, Go, Rust, and to some extent Python ecosystems) does not enter into it; you just need the legitimate build dependencies.
Furthermore, that's not quite true[1]. The differences only concerned the exploit's (very small) bootstrapper and were isolated to the generated configure script and one of the (non-XZ-specific) M4 scripts that participated in its generation. Neither of those files is in the XZ Git repo to begin with: both are put there, and are supposed to be put there, by (one of the tools invoked by) autoreconf when building the release tarball. By contrast, the actual exploit binary that the bootstrapper injected was inside the Git repo all along, disguised as a binary test input (as I said above) and identical to the one in the tarball.
To catch this, the distro maintainers would have needed to notice the difference between the M4 file in the XZ release tarball and its supposed original in one of the Autotools repos. Even then, the attacker could instead have shipped an unmodified M4 script but a configure script built with the malicious one. Then the maintainers would have needed to run autoreconf and note that the resulting configure script differed from the one shipped in the tarball, which would have produced a ton of false positives, because avoiding those requires using the exact same versions of the Autotools components as the upstream maintainer. Unconditionally autoreconfing everything would be better, but risks breakage, because backwards compatibility in Autotools has historically not been good and the tools are not meant to be used that way.
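(Concretely, the check would amount to something like the following, and the final diff is exactly the noisy part:)

  # regenerate the Autotools outputs and compare them with what the tarball shipped
  tar xf xz-5.6.0.tar.gz && cd xz-5.6.0
  cp configure configure.shipped
  autoreconf --force --install
  diff -u configure.shipped configure   # differs whenever your Autotools versions differ from upstream's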
(Couldn’t you just check in the generated files and run autoreconf in a commit hook? You could. Glibc does that. I once tried to backport some patches—that included changes to configure.ac—to an old version of it. It sucked, because the actual generated configure file was the result of several merges and such and thus didn’t correspond to the output of autoreconf from any Autotools install in existence.)
It’s easy to dismiss this as autotools being horrible. I don’t believe it is; I believe Autotools have a point. By putting things in the release tarball that aren’t in the maintainer’s source code (meaning, nowadays, the project’s repo, but that wasn’t necessarily the case for a lot of their existence), they ensure that the source tarball can be built with the absolute bare minimum of tools: a POSIX shell with a minimal complement of utilities, the C compiler, and a POSIX make. The maintainer can introduce further dependencies, but that’s on them.
Compare this with, for example, CMake, which technically will generate a Makefile for you, but you can't ship that Makefile to anybody unless they have the exact same CMake version as you, because it will turn around and invoke CMake some more. Similarly, you can't build a Meson project without the correct Python environment to run Meson and the build system's Python code; just having make or ninja is not enough. And so on.
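(Roughly, from the consumer's point of view:)

  # an Autotools release tarball: POSIX sh, make, and a C compiler are enough
  ./configure && make
  # a Meson project: you also need Python, Meson, and Ninja on the build machine
  meson setup build && ninja -C build
  # a CMake project: the generated Makefile turns around and re-invokes cmake itself
  cmake -S . -B build && make -C build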
This is why I’m saying I’m of two minds about this (bootstrapper) part of the backdoor. We see the downsides of the Autotools approach in the XZ backdoor, but in the normal case I would much rather build a release of an Autotools-based project than a CMake- or Meson-based one. I can’t even say that the problem is the generated configure script being essentially an uninspectable binary, because the M4 file that generated it in XZ wasn’t, and the change was very subtle. The best I can imagine here is maintaining two branches of the source tree, a clean one and a release one, where each release commit is notionally a merge of the previous release commit and the current clean commit, and the release tarball is identical to the release commit’s tree (I think the uacme project does something like that?); but that still feels insufficient.
[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
lmm
2 hours ago
> Unconditionally autoreconfing things would be better, but risk breakage because the backwards compatibility story in Autotools has historically not been good, because they’re not supposed to be used that way.
Yes and no. "make -f Makefile.cvs" has been a supported workflow for decades. It's not what the "build from source" instructions will tell you to do, but those instructions are aimed primarily at end users building from source who may not have M4 etc. installed; developers are expected to use the Makefile.cvs workflow and I don't think it would be too unreasonable to expect dedicated distro packagers/build systems (as distinct from individual end users building for their own systems) to do the same.
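(For reference, the Makefile.cvs workflow is, in spirit, just a thin wrapper around regenerating the Autotools outputs from the checked-in sources; details vary by project:)

  make -f Makefile.cvs    # roughly equivalent to:
  autoreconf --force --install
  ./configure && make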
jacquesm
3 hours ago
Focusing on the technical angle is imo already a step too far. This was first and foremost a social engineering exercise, and only secondarily a technical one.
huflungdung
6 hours ago
This was a devops exploit, because they used the same env for building the app as they did for the test code. Many miss this entirely and think the problem was that a binary was shipped.
Ideally the test env and the build env should be entirely isolated, in case the test code somehow modifies the source - which in this case it did.