acka
4 months ago
I believe the XZ compromise partly stemmed from including binary files in what should have remained a source-only project. From what I remember, well-run projects such as those of the GNU project have always required that all binaries, whether executables or embedded data such as test files, be generated directly from source, writing a purpose-built DSL or generator for them if necessary. This ensures transparency and reproducibility, both of which might have helped catch the issue earlier.
dijit
4 months ago
That's not the issue; there will always be prebuilt binaries (hell, deb/rpm are prebuilt binaries).
The issue for xz was that the build system was not hermetic (or sufficiently audited).
Hermetic build environments that can't fetch random assets are a pain to maintain in this era, but are pretty crucial in stopping an attack of this kind. The other way is reproducible binaries, which is also very difficult.
EDIT: Well, either I responded to the wrong comment or this comment was entirely changed. I was replying to a comment that said "The issue was that people used pre-built binaries", which is materially different to what the parent now says, though they rhyme.
jacquesm
4 months ago
This is not going to be popular: I think the whole idea that a build system just fetches resources from outside of the build environment is fundamentally broken. It invites all kinds of trouble and makes it next to impossible to really achieve stability and to ensure that all the code in the build has been verified, because after you've done it four times, the fifth time you won't be looking closely. But if you don't do it automatically, only when you actually need it, you will look a lot more sharply at what has changed since you last pulled in the code.
Especially for older and stable libraries the consumers should dictate when they upgrade, not some automatic build process. But because we're all conditioned to download stuff in case it fixes some security issue, we never stopped to think about the security issues created by just downloading stuff and dumping it into the build process.
Sophira
4 months ago
I completely agree with you - I think that automatic downloading of dependencies when building is a bad idea.
However, for the sake of devil's advocacy, I do also want to point out that the first thing a lot of people used to do after downloading and extracting a source tarball was to run "./configure" without even looking at what it was they were executing - even people who (rightly) hate the "curl | bash" combo. You could be running anything.
Being able to verify what it is you're running is vitally important, but in the end it only makes a difference if people take the time to do so. (And running "./configure --help" doesn't count.)
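For what it's worth, even the bare-minimum alternative to piping straight into a shell, sketched below with a hypothetical URL, at least leaves an artifact you can read and hash before executing it:

    # download first, inspect, then decide whether to run
    curl -fsSLo install.sh https://example.org/install.sh
    less install.sh          # actually read what you are about to execute
    sha256sum install.sh     # record a hash you can compare with others later
    sh install.sh

Whether anyone actually takes the time to read install.sh is, as above, the real question.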
frizlab
3 months ago
> I think that automatic downloading of dependencies when building is a bad idea.
Unless the dependencies are properly pinned and hashed.
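A minimal sketch of what that could look like for a tarball dependency (the URL and hash are placeholders):

    # pin an exact version and verify it against a hash recorded at review time
    DEP_URL="https://example.org/libfoo-1.2.3.tar.gz"
    DEP_SHA256="<sha256 recorded when the dependency was vetted>"
    curl -fsSLo libfoo-1.2.3.tar.gz "$DEP_URL"
    echo "$DEP_SHA256  libfoo-1.2.3.tar.gz" | sha256sum -c - || exit 1

Language-level lockfiles that record content hashes amount to the same idea.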
nmz
4 months ago
I would like to add that "sudo make install" is a bigger security risk, and there is absolutely no need to run make install as root: you can target a directory that mimics /, tar it up with the appropriate root permissions, and leave only the extraction to run as root. You could even take a snapshot of the system and undo on error. All doable with standard tools.
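A rough sketch of that, assuming the project's Makefile honours the usual DESTDIR convention:

    # stage the install as an unprivileged user, then only extract as root
    ./configure --prefix=/usr/local
    make
    make DESTDIR="$PWD/stage" install
    tar --owner=root --group=root -czf pkg.tar.gz -C stage .
    sudo tar -xzf pkg.tar.gz -C /    # the only step that needs root

The staged tree in ./stage is also something you can inspect, or diff against a previous release, before it ever touches the live system.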
bmandale
4 months ago
> even people who (rightly) hate the "curl | bash" combo. You could be running anything.
That's true unless I audit every single line, out of potentially millions, in the source of a program I intend to run. If I'm going to do that, then I could audit the ./configure script as well.
uecker
4 months ago
This is missing the point. The issue with "configure" is that it is easy to hide malicious code in it because it is so arcane. The issue with "curl | bash" is that, on top of that, there is not even a proper trust chain or independent verification. "curl | bash" needs to die. Any project that promotes it either does not care about security or does not understand it. "configure" was a necessary evil in the past, when all the commercial UNIXes worked differently. Nowadays I think it should go away.
dns_snek
4 months ago
> "curl | bash" needs to die. Any project who promotes this does not care or does not understand security.
Do you "understand security"? There's a grain of truth to what you're saying, but not more than that. The crux of this problem is with running untrusted binaries (or unreviewed source code) vs. installing something from a trusted repository.
The majority of people either don't know or don't care to review the source code. They simply run the commands displayed on the website, and whether you ask them to "curl | bash" or "wget && apt install ./some.deb" won't make any difference to their security.
Even if you do a "proper trust chain" and digitally sign your packages, that key is served through the same channel as the installation instructions and thus requires trust on first use, just like "curl | bash".
Unfortunately publishing every piece of software through every operating system's default repository isn't very realistic. Someone, somewhere is going to have to install the binary manually and "curl | bash" is as good of a method for doing that as any.
uecker
4 months ago
If people install random stuff from the internet, there is no security. The fact that this sometimes happens is no reason to give up and teach people that "curl | bash" is even remotely OK. "curl | bash" is much worse than every other way to install things from the internet, because there is no guarantee that what one person gets is the same as what anybody else gets, so any chance of even discovering a compromise is lost.
dns_snek
4 months ago
> because there is no guarantee that what one persons gets the same what anybody else gets, so any kind of chance to even discover a compromise is lost.
This applies to "curl | bash", "download an exe and run it", and everything in between equally. If a malicious binary wants to cover up its tracks it can just delete itself and disappear just like "curl | bash" would.
Feel free to educate users about the importance of installing software from trusted repositories whenever possible but demonizing "curl | bash" like it's somehow uniquely terrible is just silly and misses the point completely.
uecker
4 months ago
With a binary, one can compare a hash or store a copy of the binary on another computer. And one person doing this might be enough to figure out something is wrong. But even if people don't, it takes additional effort by the attacker to search for the binary and clean up their tracks, which also creates more opportunities for detection. It is really not at all comparable to "curl | bash". You sound like the people who told me two decades ago that reproducible builds are a waste of time.
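For example (host and paths are hypothetical), keeping that evidence is a couple of commands:

    # hash the artifact and stash a copy somewhere the installed software can't reach
    sha256sum ./installer.run | tee installer.sha256
    scp ./installer.run installer.sha256 backup-host:/srv/evidence/

None of that is possible when the thing you ran only ever existed as a pipe.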
dns_snek
4 months ago
> With a binary, one can compare a hash
You lift a suspected binary from a machine that's under suspicion. You hash it and it matches a known good file. You declare victory, pat yourself on the back, and return it back into service. 3 months later all of your data is exfiltrated because you assumed that your attacker is an idiot.
> it needs additional effort by the attacker to search for the binary
Additional effort:
#include "stdio.h"
#include "unistd.h"
void main() {
char path[512];
readlink("/proc/self/exe", path, sizeof(path));
unlink(path);
printf("Poof, I disappeared from: %s\n", path);
}
> You sound like the people who told me two decades ago that reproducible builds are a waste of time.

Except I wouldn't say that, because these ideas are completely unrelated. Define your threat model and specify what problem you're trying to solve. Don't be the type of person who encrypts passwords because they heard that encryption is good for protecting data.
You demonized curl|bash because it "doesn't have a proper trust chain" and attacked the project for "not understanding security" with really weak arguments; now you're retreating all the way back to claiming some fringe benefit of maybe, possibly discovering the source of infection, and only if your attacker is too lazy to cover their tracks.
Continuing that argument like it's the same one you originally presented is quite a disrespectful way of wasting people's time: https://en.wikipedia.org/wiki/Motte-and-bailey_fallacy
uecker
4 months ago
You can copy the binary to a different system before installing it, or compute the hash before you run it. This is not hard. And even if you are not copying to another system, the attacker needs to find all copies of the binary and modify them. Also note that the installer binary / script is not the same as the binary that later runs. And any additional effort the attacker has to make to hide their tracks also increases the chances of detection; this is also something that can be learned from the XZ backdoor.
jacquesm
4 months ago
Autotools is a hot mess. Anything complex is going to be a rich environment for exploits of all kinds. The more silent the exploit the bigger the chance that it will spread widely.
imoverclocked
4 months ago
Adding sudo in front makes it secure though, right? /sarcasm
--
Automatic downloading of dependencies can be done in a sane way, but not without significant effort. E.g.: building Debian packages can install other pre-packaged dependencies, and in theory those other packages are built the same way.
Where this becomes an issue specifically is where language-specific mechanisms reach out and just install dependencies. To be fair, this has happened for a long time (I'm looking at you, CPAN) and does provide a lot of potential utility to any given developer.
What might be better than advocating for "not doing this at all" is "fixing the pattern." There are probably better ideas than this but I'll start here:
1) make repositories include different levels of software by default: core vs community at an absolute minimum. Maybe add some levels like "vetted versions" or "gist" to convey information about quality or security.
2) make it easy to have locally vetted pools to choose from. eg: Artifactory makes it easy to locally cache upstream software repos which is a good start. Making a vetting process out of that would be ... useful but cumbersome.
At the end of the day, we are always running someone else's code.
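To make the second point concrete (the registry URLs here are hypothetical), pointing the usual language-level clients at a locally cached pool is just a config change:

    # route npm and pip through an internal cache-and-vet proxy instead of the public registries
    npm config set registry https://artifactory.internal/api/npm/npm-remote/
    pip config set global.index-url https://artifactory.internal/api/pypi/pypi-remote/simple

The hard part is the vetting process that sits behind that proxy, not the client configuration.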
gizmo686
4 months ago
The solution I've seen employed is to prevent the build environment from reaching outside.
Set up a mirror of all the repositories you care about, then configure the network so your build system can reach the mirrors but not the general Internet.
Of course, once you do this, you eventually create a cron job on the mirrors to blindly update themselves...
This setup does at least prevent an old version of a dependency from silently changing, so projects that pin their dependencies can be confident in that. But even in those cases, you end up with a periodic "update all dependencies" ticket, that just blindly takes the new version.
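The network half of that can be as blunt as an egress rule for the build account (the user name and mirror host are hypothetical):

    # allow the builder to reach the internal mirror and nothing else
    iptables -A OUTPUT -m owner --uid-owner builder -d mirror.internal -j ACCEPT
    iptables -A OUTPUT -m owner --uid-owner builder -j REJECT

The "blindly update the mirror" cron job is then the part that still deserves scrutiny.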
kragen
4 months ago
I am pretty sure Debian Policy agrees with you, although I can't cite chapter and verse. Certainly Nix and Guix agree with you. But that evidently wasn't the problem here.
dataflow
4 months ago
> I think the whole idea that a build system just fetches resources from outside of the build environment is fundamentally broken
I think your phrasing is a bit overbroad. There's nothing fundamentally broken with the build system fetching resources; what's broken is not verifying what it's fetching. Audit the package beforehand and have your build system verify its integrity after downloading, and you're fine.
jacquesm
4 months ago
xz.
Nobody verifies all the packages that get automatically downloaded, all the time, unless there is a problem. We got lucky, that time.
mananaysiempre
4 months ago
The XZ project’s build system is and was hermetic. The exploit was right there in the source tarball. It was just hidden away inside a checked-in binary file that masqueraded as a test for handling of invalid compressed files.
(The ostensibly autotools-built files in the tarball did not correspond to the source repository, admittedly, but that’s another question, and I’m of two minds about that one. I know that’s not a popular take, but I believe Autotools has a point with its approach to source distributions.)
dijit
4 months ago
I thought that the exploit was not injected into the Git repository on GitHub at all, but only in the release tarballs. And that due to how Autoconf & co. work, it is common for tarballs of Autoconf projects to include extra files not in the Git repository (like the configure script). I thought the attacker exploited the fact that differences between the release tarball and the repository were not considered particularly suspicious by downstream redistributors in order to make the attack less discoverable.
mananaysiempre
4 months ago
First of all, even if that were true, that wouldn’t have much to do with hermetic builds as I understand the term. You could take the release tarball and build it on an air-gapped machine, and (assuming the backdoor liked the build environment on the machine) you would get a backdoored artifact. Fetching assets from the Internet (as is fashionable in the JavaScript, Go, Rust, and to some extent Python ecosystems) does not enter the equation, you just need the legitimate build dependencies.
Furthermore, that’s not quite true[1]. The differences only concerned the exploit’s (very small) bootstrapper and were isolated to the generated configure script and one of the (non-XZ-specific) M4 scripts that participated in its generation, none of which are in the XZ Git repo to begin with—both are put there, and are supposed to be put there, by (one of the tools invoked by) autoreconf when building the release tarball. By contrast, the actual exploit binary that bootstrapper injected was inside the Git repo all along, disguised as a binary test input (as I’ve said above) and identical to the one in the tarball.
To catch this, the distro maintainers would have needed to notice the difference between the M4 file in the XZ release tarball and its supposed original in one of the Autotools repos. Even then, the attacker could instead have shipped an unmodified M4 script but a configure script built with the malicious one. Then the maintainers would have needed to run autoreconf and note that the resulting configure script differed from the one shipped in the tarball, which would have caused a ton of false positives, because it means using the exact same versions of the Autotools parts as the upstream maintainer did. Unconditionally autoreconfing things would be better, but risks breakage, because the backwards-compatibility story in Autotools has historically not been good; they're not supposed to be used that way.
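A rough sketch of that check, with the tarball name only as an illustration:

    # regenerate the build scripts from the tarball's own sources and compare with what was shipped
    tar xf xz-5.6.0.tar.gz && cd xz-5.6.0
    cp configure configure.shipped
    autoreconf -fi
    diff -u configure.shipped configure   # any hunk is either a discrepancy or a version mismatch

In practice most hunks would just reflect a different Autotools version, which is exactly the false-positive problem described above.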
(Couldn’t you just check in the generated files and run autoreconf in a commit hook? You could. Glibc does that. I once tried to backport some patches—that included changes to configure.ac—to an old version of it. It sucked, because the actual generated configure file was the result of several merges and such and thus didn’t correspond to the output of autoreconf from any Autotools install in existence.)
It’s easy to dismiss this as autotools being horrible. I don’t believe it is; I believe Autotools have a point. By putting things in the release tarball that aren’t in the maintainer’s source code (meaning, nowadays, the project’s repo, but that wasn’t necessarily the case for a lot of their existence), they ensure that the source tarball can be built with the absolute bare minimum of tools: a POSIX shell with a minimal complement of utilities, the C compiler, and a POSIX make. The maintainer can introduce further dependencies, but that’s on them.
Compare this with for example CMake, which technically will generate a Makefile for you, but you can’t ship it to anybody unless they have the exact same CMake version as you, because that Makefile will turn around and invoke CMake some more. Similarly, you can’t build a Meson project without having the correct Python environment to run Meson and the build system’s Python code, just having make or ninja is not enough. And so on.
This is why I’m saying I’m of two minds about this (bootstrapper) part of the backdoor. We see the downsides of the Autotools approach in the XZ backdoor, but in the normal case I would much rather build a release of an Autotools-based project than a CMake- or Meson-based one. I can’t even say that the problem is the generated configure script being essentially an uninspectable binary, because the M4 file that generated it in XZ wasn’t, and the change was very subtle. The best I can imagine here is maintaining two branches of the source tree, a clean one and a release one, where each release commit is notionally a merge of the previous release commit and the current clean commit, and the release tarball is identical to the release commit’s tree (I think the uacme project does something like that?); but that still feels insufficient.
[1] https://gist.github.com/thesamesam/223949d5a074ebc3dce9ee78b...
lmm
4 months ago
> Unconditionally autoreconfing things would be better, but risk breakage because the backwards compatibility story in Autotools has historically not been good, because they’re not supposed to be used that way.
Yes and no. "make -f Makefile.cvs" has been a supported workflow for decades. It's not what the "build from source" instructions will tell you to do, but those instructions are aimed primarily at end users building from source who may not have M4 etc. installed; developers are expected to use the Makefile.cvs workflow and I don't think it would be too unreasonable to expect dedicated distro packagers/build systems (as distinct from individual end users building for their own systems) to do the same.
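For reference, that developer-side workflow is roughly the following, assuming the project ships such a Makefile.cvs:

    # regenerate configure and friends from the maintainer sources, then build as usual
    make -f Makefile.cvs
    ./configure
    make

i.e. nothing in the tarball's pregenerated configure needs to be trusted if you start from Makefile.cvs.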
jacquesm
4 months ago
Focusing on the technical angle is imo already a step too far. This was first and foremost a social engineering exercise, only secondary a technical one.
dijit
4 months ago
This is very true, and it honestly troubles me that it's been flagged.
Even I'm guilty of focusing on the technical aspects, but the truth is that the social campaign was significantly more difficult to understand and unpick, and is so much more problematic.
We can have all the defences we want in the world, but all it takes is to oust a handful of individuals, or in this case just one, or to bribe or blackmail them, and then nobody is going to be reviewing, because everybody believes that it has been reviewed.
I mean, we all just accept whatever the project believes is normal right?
It's not like we're pushing our ideas of transparency on the projects... and even if we were, it's not like we are reviewing them either; they will have their own reviewers, and the only people left are package maintainers, who are arguably more dangerous.
There is an existential nihilism that I’ve just been faced with when it comes to security.
Unless projects become easier to reproduce and we have multiple parties involved in auditing them, I'm going to stay a bit concerned.
mananaysiempre
4 months ago
> I mean, we all just accept whatever the project believes is normal right?
Not in this thread we don’t? The whole thing has been about the fact that it wasn’t easy for a distro maintainer to detect the suspicious code even if they looked. Whether anyone actually does look is a worthy question, but it’s not orthogonal to making the process of looking not suck.
Of course, if we trust the developer to put software on our machine with no intermediaries, the whole thing goes out the window. Don’t do that[1]. (Oh hi Flatpak, Snap. Please go away. Also hi NPM, Go, Cargo, PyPI; no, being a “modern programming language” is not an excuse.)
[1] https://drewdevault.com/2021/09/27/Let-distros-do-their-job....
acka
4 months ago
My apologies: yes, I edited my comment to try and clarify that I did not mean executable binaries, but rather binary data, such as the test files in the case of XZ.
dijit
4 months ago
All good mate, your comment makes a better argument than the weaker one I interpreted it as prior to the edit.
1oooqooq
4 months ago
How do you test that your software can decompress files created with old or different implementations?
The exploit used the only solution to this problem: a binary test payload. There's no other way to do it.
Maybe you could include the source of those old versions and all the build machinery needed to create the payloads programmatically... or maybe even a second repo that generates signed payloads, etc. But it's all overkill, and it would still have slipped past human attention, as the attack proved to begin with.
huflungdung
4 months ago
This was a devops exploit because they used the same env for building the app as they did for the test code. Many miss this entirely and think it is because a binary was shipped.
Ideally the test env and the build env should be entirely isolated, in case the test code somehow modifies the source, which in this case it did.
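A cheap approximation of that isolation (the image name is a placeholder; assume it carries the toolchain) is to run the test suite in a throwaway container with the source mounted read-only:

    # copy the tree into the container, run the tests on the copy, discard everything afterwards
    docker run --rm -v "$PWD:/src:ro" -w /tmp gcc:13 \
        sh -c "cp -r /src /tmp/build && cd /tmp/build && make check"

It wouldn't have stopped the xz build scripts from pulling the payload in at build time, but it does enforce the build/test separation described above.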