fweimer
6 days ago
We've got patches under review: https://inbox.sourceware.org/libc-alpha/cover.1722193092.git... (triggered by https://issues.redhat.com/browse/RHEL-42410, a graphics stack stability issue that wasn't as visible in RHEL 9 for some reason)
At least the first one (the getenv thread safety fix) will hopefully make it into glibc 2.41 and it should be quite safe to backport. It turns out that setenv is easier to handle because glibc already never frees environment strings. It's concurrent unsetenv that is rather tricky. Without some snapshot approach, getenv would return null pointers instead of environment variables values that are actually set. I don't want to introduce locking into getenv because getenv without setenv has been async-signal-safe for so long that it would likely break applications.
The environ handling fixes are a bit more controversial because vfork+execve make it complicated to avoid memory leaks, but these further fixes are less important to the stability of the graphics stack.
Asooka
6 days ago
What about using a linked list for variables added after the start of the process, which can be implemented atomically? Then once it gets "too long", a thread executing setenv could construct a new hashmap and replace the pointer to the old one with the pointer to the new one atomically, all without locking? To prevent two threads from rehashing at the same time, use an atomic flag for whether a thread is rehashing right now. That means if other threads call setenv in the meantime, the extra added variables would be appended to the list to be processed by a later setenv call (if one happens). That list could grow unbounded if enough threads call setenv quickly enough, but I think the simplicity of atomically swapping pointers might be worth it and setenv isn't called very frequently.
fweimer
6 days ago
POSIX requires that the environment variables can be access as an array, through the environ variable. This array is expected to be used with POSIX interfaces such as posix_spawn and execve. If the array already has to exist, why not use it in getenv?
A purely hash-based implementation is not possible because there is putenv, and some applications expect modifications of environ to be visible via getenv.
sunshowers
6 days ago
As far as I'm concerned this is the only correct way to do it. I believe illumos does this, which is why its env functions are thread safe and have been for decades.
jeffparsons
6 days ago
I like this approach, because it lets everything Just Work without anyone outside of the implementation having to think about it at all, and only incurs meaningful overhead if you're doing something really silly, but crucially still won't break — it'll just be slightly slow. That feels like the right trade-off.
robocat
6 days ago
Perhaps related: "Setenv Is Not Thread Safe and C Doesn't Want to Fix It" (2023) -- https://www.evanjones.ca/setenv-is-not-thread-safe.html and HN comments regarding how rust was affected: https://news.ycombinator.com/item?id=38342642
Decades old footguns - aaargh!
sumtechguy
5 days ago
That has been around for a long time. I remember it in the early 2000s.
> Decades old footguns - aaargh!
Indeed...
The tricky problem they have is wrapping it into some sort of lock could cause so many issues. Places that dont deadlock, suddenly could. Not a fun problem to solve. In practice it is usually not too bad as you are usually not changing your env vars much. But when you run into it, ugh.
fweimer
5 days ago
Fortunately, for glibc, the most controversial decision (setenv/unsetenv/clearenv leak) has been made decades ago. It does not look like something that can be changed, so it's actually fixable in the glibc context. But this puts pressure on other libcs to adopt essentially the same approach (even if they don't leak environment strings today), so that's not universally popular.
cryptonector
5 days ago
Solaris/Illumos has a thread-safe setenv()/unsetenv()/putenv(). There is no reason not to have it be thread-safe, though it must leak (Solaris/Illumos retains the references to the deleted envs so as to fool memory debuggers into thinking they are still referenced and so quiet what would be essentially false positives).