ckastner
6 days ago
There is some nuance to this. Adding some notes to the stated goal "Everyone who interacts with Debian source code (1) should be able to do so (2) entirely in git":
(1) "should be able" does not imply "must"; people are free to continue to use whatever tools they see fit
(2) Most Debian work is of course already git-based, via Salsa [1], Debian's self-hosted GitLab instance. This is more about what is stored in git and how it relates to a source package (= what .debs are built from). For example, currently most Debian git repositories base their work on "pristine-tar" branches built from upstream tarball releases, rather than using upstream branches directly.
[1] https://salsa.debian.org
amluto
5 days ago
> For example, currently most Debian git repositories base their work in "pristine-tar" branches built from upstream tarball releases
I really wish all the various open source packaging systems would get rid of the concept of source tarballs to the extent possible, especially when those tarballs are not sourced directly from upstream. For example:
- Fedora has a “lookaside cache”, and packagers upload tarballs to it. In theory they come from git as indicated by the source rpm, but I don’t think anything verifies this.
- Python packages build a source tarball. In theory, the new best practice is for a GitHub Action to build the package and for a complex mess to attest that it really came from GitHub Actions.
- I’ve never made a Debian package, but AFAICT the maintainer kind of does whatever they want.
IMO this is all absurd. If a package hosted by Fedora or Debian or PyPI or crates.io, etc. claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that. If it stores a copy of the source, that copy should be cryptographically traceable to the commit in question, which is straightforward: the commit hash is a hash over a bunch of data including the full source!
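To make that last point concrete, here's a rough sketch (mine, nothing any distro actually ships) of recomputing git's tree hash over an extracted source tree; if it matches the tree named by the claimed commit, the bytes match:

    import hashlib, pathlib

    def _obj(kind: bytes, body: bytes) -> bytes:
        # git object id = sha1(b"<kind> <len>\0" + body)
        return hashlib.sha1(kind + b" %d\x00" % len(body) + body).digest()

    def tree_hash(d: pathlib.Path) -> bytes:
        # Recompute a git tree object hash; symlinks and submodules ignored.
        entries = []
        for p in d.iterdir():
            if p.name == ".git":
                continue
            name = p.name.encode()
            if p.is_dir():
                # git sorts directory entries as if their names ended in "/"
                entries.append((name + b"/", b"40000 " + name + b"\x00" + tree_hash(p)))
            else:
                mode = b"100755" if p.stat().st_mode & 0o100 else b"100644"
                entries.append((name, mode + b" " + name + b"\x00" + _obj(b"blob", p.read_bytes())))
        return _obj(b"tree", b"".join(body for _, body in sorted(entries)))

    # Compare against `git rev-parse <commit>^{tree}` of the claimed commit.
    print(tree_hash(pathlib.Path("extracted-tarball")).hex())

The commit object then names that tree hash (plus parents, author, message), so matching the tree means matching the full source.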
soneil
3 days ago
This was one of the "lessons learnt" from the XZ incident. One of the (many) steps the attacker took to avoid scrutiny was making modifications that existed in the release tarball but not in the repo.
turminal
5 days ago
For lots of software projects, a release tarball is not just a gzipped repo checked out at a specific commit. So this would only work for some packages.
Terr_
5 days ago
A simple version of this might be a repo with a single file of code in a language that needs compilation, versus a tarball containing one compiled binary.
Just having a deterministic binary can be non-trivial, let alone a way to confirm "this output came from that source" without recompiling everything from scratch.
amluto
5 days ago
For most well designed projects, a source tarball can be generated cleanly from the source tree. Sure, the canonical build process goes (source tarball) -> artifact, but there’s an alternative build process (source tree) -> artifact that uses the source tarball as an intermediate.
In Python, there is a somewhat clearly defined source tarball. uv build will happily build the source tarball and the wheel from the source tree, and uv build --from <appropriate parameter here> will build the wheel from the source tarball.
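Roughly this, I believe (the sdist filename is made up; uv build also accepts an sdist archive as its positional argument):

    import subprocess

    # Build the sdist from the source tree, then a wheel from that sdist.
    subprocess.run(["uv", "build", "--sdist"], check=True)
    subprocess.run(["uv", "build", "dist/example_pkg-0.1.0.tar.gz"], check=True)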
And I think it’s disappointing that one uploads source tarballs and wheels to PyPI instead of uploading an attested source tree and having PyPI do the build, at least in simple cases.
In traditional C projects, there’s often some script in the source tree that turns it into the source tarball tree (autogen.sh is pretty common). There is no fundamental reason that a package repository like Debian’s or Fedora’s couldn’t build from the source tree and even use properly pinned versions of autotools, etc. And it’s really disappointing that the closest widely used thing to a proper C/C++ hermetic build system is Dockerfile, and Dockerfile gets approximately none of the details right. Maybe Nix could do better? C and C++ really need something like Cargo.
SonOfLilit
5 days ago
The hacker in me is very excited by the prospect of pypi executing code from my packages in the system that builds everyone's wheels.
vetrom
5 days ago
Launchpad does this for everything, as does sbuild/buildd in Debian land. They generally make it work by running the build system in a neutered VM (network access is generally not permitted during builds, or is limited to a Debian/Ubuntu/PPA package mirror), and by doing some fairly invasive processing/patching to make build systems work without just-in-time network access.
SUSE and Fedora both do something similar I believe, but I'm not really familiar with the implementation details of those two systems.
amluto
4 days ago
I’m only familiar with the Fedora system. The build is hermetic, but the source inputs come from fedpkg new-sources, which runs on the client machine used by the package developer.
amluto
5 days ago
This seems no worse than GitHub Actions executing whatever random code people upload.
It’s not so hard to do a pretty good job, and you can have layers of security. Start with a throwaway VM, which highly competent vendors like AWS will sell you at a somewhat reasonable price. Run as a locked-down unprivileged user inside the container. Then use a tool like gVisor.
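As a rough sketch of that layering (assumes gVisor's runsc is registered as a Docker runtime; the image, source path, and build.sh are placeholders):

    import subprocess

    # Illustrative only: untrusted build with no network, as an
    # unprivileged user, under gVisor's runsc runtime.
    subprocess.run([
        "docker", "run", "--rm",
        "--runtime=runsc",      # gVisor intercepts syscalls
        "--network=none",       # no network during the build
        "--user=65534:65534",   # nobody
        "-v", "/path/to/src:/src",
        "python:3.12-slim",
        "sh", "-c", "cd /src && sh ./build.sh",  # hypothetical build script
    ], check=True)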
Also… most pure Python packages can, in theory, be built without executing any code. The artifacts just have some files globbed up as configured in pyproject.toml. Unfortunately, the spec defines the process in terms of installing a build backend and then running it, but one could pin a couple of trustworthy build backend versions and constrain them to configurations where they literally just copy things. I think uv-build might be in this category. At the very least I haven’t found any evidence that current uv-build versions can do anything nontrivial unless generation of .pyc files is enabled.
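As a toy illustration of that "just glob files" mode (not how any real backend is implemented; needs Python 3.11+ for tomllib):

    import pathlib, tarfile, tomllib

    def build_sdist(src: pathlib.Path, out: pathlib.Path) -> pathlib.Path:
        # Read name/version from pyproject.toml and pack up the .py files;
        # no project-supplied code runs at any point.
        meta = tomllib.loads((src / "pyproject.toml").read_text())["project"]
        base = f"{meta['name']}-{meta['version']}"
        sdist = out / f"{base}.tar.gz"
        with tarfile.open(sdist, "w:gz") as tf:
            for path in [src / "pyproject.toml", *sorted(src.rglob("*.py"))]:
                tf.add(path, arcname=f"{base}/{path.relative_to(src)}")
        return sdist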
zahlman
5 days ago
If it isn't at least a gzip of a subset of the files of a specific commit of a specific repo, someone's definition of "source" would appear to need work.
LtWorf
5 days ago
To get a specific commit from a repo you usually need to clone, which will involve a much bigger download than just downloading your tar file.
amluto
5 days ago
Shallow clones are a thing. And it’s fairly straightforward to create a tarball that includes enough hashes to verify the hash chain all the way to the commit hash. (In fact, I once kludged that up several years ago, and maybe I should dust it off. The tarball extracted just like a regular tarball but had all the needed git objects hidden inside in a way that tar would ignore.)
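A sketch of both halves (repo URL and tag are placeholders; assumes a SHA-1 repo):

    import hashlib, subprocess

    # Fetch only the tagged commit, then confirm the advertised commit id
    # is the SHA-1 of the raw commit object, which names the tree hash
    # that pins every file underneath it.
    subprocess.run(["git", "clone", "--depth", "1", "--branch", "v1.2.3",
                    "https://example.com/upstream.git", "src"], check=True)
    raw = subprocess.run(["git", "-C", "src", "cat-file", "commit", "HEAD"],
                         capture_output=True, check=True).stdout
    print(hashlib.sha1(b"commit %d\x00" % len(raw) + raw).hexdigest())
    # ...which should equal `git -C src rev-parse HEAD`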
zahlman
5 days ago
I don't actually see why you'd need to verify the hash chain anyway. The point of a source tarball, as I understand it, is to be sure of what source you're building, and to be able to audit that source. The development path would seem to be the developer's concern, not the maintainer's.
amluto
4 days ago
> The point of a source tarball, as I understand it, is to be sure of what source you're building
Perhaps, in the rather narrow sense that you can download a Fedora source tarball and look inside yourself.
My claim is that upstream developers produce actual official outputs: git commits and sometimes release tarballs. (But note that release tarballs on GitHub are often a mess and not really desired by the developer.) And I further think that verification that a system like Fedora or Debian or PyPI is building from correct sources should involve byte-for-byte comparison of the source tree, and that, at least in the common case, there should be no opportunity for a user of one of these systems to upload sources that do not match the claimed upstream sources.
The sadly common workflow where a packager clones a source tree, runs some scripts, and uploads the result as a “source tarball” is, IMO, wrong.
LtWorf
3 days ago
You know git allows history rewriting, right?
LtWorf
5 days ago
of the head, or of any commit?
amluto
5 days ago
I’m not sure why this would make a difference. The only thing special about the head is that there is a little file (that is not, itself, versioned) saying that a particular commit is the head.
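Concretely (layout varies; refs can also be packed into .git/packed-refs):

    import pathlib

    # HEAD is just an unversioned pointer file, nothing more.
    print(pathlib.Path(".git/HEAD").read_text())             # e.g. "ref: refs/heads/main"
    print(pathlib.Path(".git/refs/heads/main").read_text())  # the commit id it names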
mjw1007
5 days ago
> If a package hosted by Fedora or Debian or PyPI or crates.io, etc claims to correspond to an upstream git commit or release, then the hosting system should build the package, from the commit or release in question plus whatever package-specific config and patches are needed, and publish that.
For Debian, that's what tag2upload is doing.
n8m8
4 days ago
Shoutout to the AUR. I’m trying Arch for the first time (Omarchy) and wasn’t planning on using the AUR, but I realized how useful it is when 3 of the tools I wanted to try were distributed differently. The AUR made it insanely easy… (namely, I had issues with Obsidian and Google Antigravity)
cryptonector
6 days ago
If "whatever tools they see fit" means "patch quilting" then please no. Leave the stone age and enter the age of modern DVCS.
lta
6 days ago
git can be seen as porcelain on top of patch quilting, so it's not as much stone age as one might think
cryptonector
5 days ago
This is a misunderstanding of what Git does. Git is a content-addressed, immutable/append-only filesystem structured as a Merkle hash tree, with commits as objects that bind a filesystem root by its hash. The diffs that make up a commit are not really its contents -- they are computed as needed. Most of the time it's fine to think of Git as a patch-quilting porcelain, and you can get very far with that model, but at some point you need to understand that it goes deeper.
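You can see this directly: a commit object stores a tree id and parent ids, not a patch, and git computes diffs on demand:

    import subprocess

    # Prints something like "tree <hash>", "parent <hash>", author,
    # committer, and the message -- no diff is stored in the object.
    print(subprocess.run(["git", "cat-file", "-p", "HEAD"],
                         capture_output=True, text=True, check=True).stdout)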
ongy
5 days ago
That point is not reached during packaging though.
I prefer rebasing git histories over messing with the patch quilting that Debian packaging standards use(d to use). Though the last time I had to use the Debian packaging mechanisms, I round-tripped them into git to work on them. I lost nothing during the export.
cryptonector
5 days ago
Yes, I also end up doing things like that, but it's just a pain. If Debian did it themselves then adding a local commit would be truly trivial.