GlenTheMachine
10 hours ago
If you had no idea what a restorable sequence is the takeaway is about halfway down the OP:
“This is why Linux now provides rseq() which is a much more enlightened solution. With restartable sequences, you actually can get rid of both the mutex and atomics, while the OS continues to fully abstract scheduling. The way it works is you advise the kernel whenever your program enters a critical section of code that you don't want interrupted. It's probably going to be maybe 10 assembly instructions tops. The first assembly opcode should be a move instruction that sets the rseq_cs field. The last instruction needs to be the thing that makes the modification to your global data structure. Think of it sort of like a really tiny database transaction. What makes it go fast, is that the bidirectional communication with the kernel happens via shared memory.”
jstimpfle
2 hours ago
I think it wasn't explained in a very accessible way. If I got the gist right, this essentially brings "per-CPU" synchronization to userland. It's typical in the kernel to have per-cpu data, while per-thread data is rare and typically impractical. There is a high number of threads managed by the kernel, most of which probably belong to a userland process, most of which do not participate in any given synchronisation scheme. Also threads are often too much of an abstraction for parallel programming needs, given that they are hiding for example cache effects. So it's natural to want to use per-cpu data instead of thread_local data in a userland process, I know I've been wishing for that many times.
With rseq, we can allocate in any userland process one instance of a given synchronisation data structure per each CPU. It's important to understand that userland code accessing per-cpu data structures cannot prevent being scheduled away from a CPU and being replaced by another thread (kernel code can block scheduler for short critical sections). Such a replacement thread may subsequently corrupt that same data that was still in the middle of the transaction. But we can make a subset of transactions safe at least: If a transaction gets committed in a single (final) atomic instruction, and we get kernel support for this transaction to be restarted in case there has been a schedule mid-way, this is a guarantee that at the time of commit, the entire transaction hasn't been interrupted by the scheduler. I.e. a kind of "mutual exclusion" guarantee.
Did I get that right?
bryanlarsen
8 hours ago
That doesn't really explain it though, IMO. IIUC, it's a sequence of instructions that either runs to completion atomically or doesn't. If it is interrupted by anything the kernel jumps you to the abort/retry vector you set with a guarantee that the last instruction in the sequence was not executed.
(Based on my reading of the LWN article rwmj posted).
crote
6 hours ago
> it's a sequence of instructions that either runs to completion atomically or doesn't
The way I read it, it either runs to completion in one go, or gets restarted from the beginning. This means the sequence as a whole isn't executed atomically, as the already-executed instructions during an interrupt aren't rolled back.
It can be used to build atomic actions, but it is up to the developer to create a sequence of instructions where the very last instruction "commits" the entire operation, with the side-effects of partial execution being harmless.
bryanlarsen
4 hours ago
Yes, it's either atomic or the last instruction is guaranteed not to have run. I made this a little harder to read by inserting another clause in the sentence.
khuey
8 hours ago
Yes, the API contract isn't "don't interrupt me during this critical section" it's "if you have to interrupt me during this critical section, go to this recovery/restart code".
There is a time-slice extension feature in the works that's roughly "please let me finish this critical section before you interrupt me". But a hard guarantee that userspace code won't be interrupted is probably untenable in a preemptive multitasking system.
leoc
6 hours ago
Something not completely dissimilar to https://en.wikipedia.org/wiki/PCLSRing then?
rwmj
10 hours ago
LWN has a good article: https://lwn.net/Articles/1033955/
manoDev
10 hours ago
That’s clever — am I right to think it’s the intermediate solution between locks and full STM, implemented at the kernel level, and with zero abstraction cost?
khuey
10 hours ago
It's in some sense a light form of STM. The key insight behind rseq(2) is that if the data is local to a given CPU the only way to get a race is if the kernel deschedules your program from that CPU at an inopportune time. If your operation can be aborted and restarted and the kernel has a mechanism to notify you when that needs to happen you can dispense with the overhead of "real" synchronization and just use a couple mov instructions to enter and exit the critical section.