cperciva
17 hours ago
Don't forget about entropy! You've just created two identical copies of all of your random number generators, which could be very very bad for security.
The Firecracker team wrote a very good paper about addressing this when they added snapshot support.
adammiribyan
14 hours ago
Good callout. We seed entropy before snapshot to unblock getrandom(), but forks still share CSPRNG state. The proper fix per Firecracker’s docs is RNDADDENTROPY + RNDRESEEDCRNG after each fork, plus reseeding userspace PRNGs like numpy separately. On the roadmap. https://github.com/firecracker-microvm/firecracker/blob/main...
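For reference, a minimal sketch of that per-fork kernel step as it might run inside the guest, assuming Linux, root (CAP_SYS_ADMIN), and fresh bytes handed over by the host. The ioctl numbers follow linux/random.h; pack_rand_pool_info and reseed_kernel are hypothetical helper names, and how the fresh bytes reach the guest is left open.

```python
import fcntl
import struct

# ioctl request numbers from <linux/random.h>, using the generic _IOC
# encoding found on x86-64 and arm64 (other architectures may differ).
RNDADDENTROPY = 0x40085203  # _IOW('R', 0x03, int[2])
RNDRESEEDCRNG = 0x5207      # _IO('R', 0x07)


def pack_rand_pool_info(fresh: bytes) -> bytes:
    """Build a struct rand_pool_info: entropy_count (bits), buf_size (bytes), buf."""
    return struct.pack("ii", len(fresh) * 8, len(fresh)) + fresh


def reseed_kernel(fresh: bytes, dev: str = "/dev/urandom") -> None:
    """Credit `fresh` to the kernel pool, then ask the CRNG to reseed.

    Both ioctls require CAP_SYS_ADMIN, so this only works as root in the guest.
    """
    with open(dev, "wb") as f:
        fcntl.ioctl(f, RNDADDENTROPY, pack_rand_pool_info(fresh))
        fcntl.ioctl(f, RNDRESEEDCRNG)
```

The bytes passed in have to differ per fork (so they must come from outside the snapshot), otherwise every clone mixes in the same input and ends up in the same state again.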
mkj
13 hours ago
It looks like Firecracker already supports ACPI vmgenid, which will trigger the Linux random driver to reseed? https://github.com/firecracker-microvm/firecracker/blob/main...
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
So that just (!) leaves userspace PRNGs.
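A rough sketch of what the userspace half could look like in Python, assuming the kernel CSPRNG has already been reseeded so that os.urandom() returns bytes unique to this fork. reseed_userspace_prngs is a hypothetical helper name, and numpy is treated as optional.

```python
import os
import random


def reseed_userspace_prngs() -> None:
    """Reseed process-local PRNGs from the kernel; call in each fork after restore."""
    # os.urandom() reads the kernel CSPRNG, so this only helps once the
    # kernel itself has diverged between forks.
    random.seed(os.urandom(32))
    try:
        import numpy as np
    except ImportError:
        return  # numpy not installed; nothing more to reseed here
    # Legacy global state behind np.random.rand() and friends; the seed
    # must fit in 32 bits.
    np.random.seed(int.from_bytes(os.urandom(4), "little"))
```

This only covers the two global generators. Any numpy Generator or random.Random instance created before the snapshot carries its own copied state and has to be reseeded or recreated by whoever owns it.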
Retr0id
15 hours ago
I suppose it'd be easy enough to re-seed RNGs, but re-relocating ASLR sounds like a pain. (Although I suppose for Python that doesn't matter)
hinkley
15 hours ago
Off the cuff, the first step for ASLR is to not publish your images and to rotate your snapshots regularly.
The old FastCGI trick is to buffer the forking by keeping half a dozen or ten copies of the process idle and initializing new instances in the background while the existing pool services new requests. By my count we are reinventing FastCGI for at least the fourth time.
Long-running tasks are less sensitive to the startup delays because we care a lot about a 4-second task taking an extra five seconds and we care much less about a 1-minute task taking 1:05. It amortizes out even in Little’s Law.
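The idle-pool trick above can be sketched roughly like this. WarmPool and the factory callable are hypothetical names standing in for whatever slow initialization a real worker needs; this is an illustration of the pattern, not how FastCGI itself is implemented.

```python
import queue
import threading


class WarmPool:
    """Keep `size` pre-initialized workers idle and refill in the background."""

    def __init__(self, factory, size=6):
        self._factory = factory      # slow initializer (hypothetical)
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())

    def acquire(self):
        # Fast path: hand out a worker that was initialized earlier.
        # Blocks only if demand has drained the whole pool.
        worker = self._idle.get()
        # Build the replacement off the request path.
        threading.Thread(target=self._refill, daemon=True).start()
        return worker

    def _refill(self):
        self._idle.put(self._factory())
```

A request only pays the startup cost when the pool runs dry, which is why the pool size has to track arrival rate times initialization time.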
cperciva
14 hours ago
Re-seeding is easy. The hard parts are (a) finding everything which needs to be reseeded -- not just explicit RNGs but also things like keys used to pick outgoing port numbers in a pseudorandom order -- and (b) making sure that all the relevant code becomes aware that it was just forked -- not necessarily trivial given that there's no standard "you just got restarted from a snapshot" signal in UNIX.
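One way to approximate the missing signal for (b) is a registry that components hook into, driven by a generation ID that changes on every restore. This is a sketch under assumptions: RestoreHooks and read_generation are hypothetical, and where the generation value comes from (for example something derived from vmgenid) is left open.

```python
import threading


class RestoreHooks:
    """Registry of callbacks to run when a snapshot restore is detected."""

    def __init__(self, read_generation):
        # read_generation: hypothetical callable returning an ID that the
        # host changes for every restored clone.
        self._read = read_generation
        self._seen = read_generation()
        self._hooks = []
        self._lock = threading.Lock()

    def register(self, hook):
        with self._lock:
            self._hooks.append(hook)

    def poll(self):
        """Run every hook once if the generation changed; return True if so."""
        with self._lock:
            current = self._read()
            if current == self._seen:
                return False
            self._seen = current
            hooks = list(self._hooks)
        for hook in hooks:
            hook()
        return True
```

Polling leaves a window between the restore and the next poll() in which stale state can still be used, and it does nothing for (a): code that never registers a hook is never reseeded.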
Intermernet
9 hours ago
I would have thought that in the days of containers, we'd have better tooling around this. Sounds like a goldmine for vuln research!
aa-jv
6 hours ago
Isn't this what -HUP is supposed to be for in the first place? Maybe a -STOP/-HUP/-HUP situation?
treyd
2 hours ago
HUP is short for "hangup": the signal was supposed to be sent when the tty controlling the process's session hung up.