
An Update on TinyKVM

150 points, posted 3 months ago

(fwsgonzo.medium.com)

46 Comments

3eb7988a1663

3 months ago

This seems like real black magic.

Is there any way that TinyKVM + KVM Server could ever be made to work with a GUI program? The sandboxing performance seems free and possibly safer than other solutions.

Instead of firejail or bubblewrap would it ever be possible for me to wrap say Firefox (or a much less complicated GUI program) inside of TinyKVM and restrict it to just network access and reading/writing to ~/Downloads? Likely a way more ambitious target than you had ever imagined, but I can dream.

I am wondering if I could default wrap every command on my terminal to run inside a TinyKVM, no network access, and only permissions to the current directory or below.

jchw

3 months ago

That really isn't unreasonable at all IMO, it's just that it might be hard to do with userspace syscall emulation, since graphical programs will likely need a lot more of the syscall surface. For X11 and Wayland, you'll need some way of handling UNIX domain sockets. Wayland applications will require shared memory too, though you could get away with something like Waypipe instead to serialize everything. You'd probably want some sort of intermediary between any X11/Wayland communications anyways, just to add additional isolation.

It might be easier to adapt gVisor to handle this sort of workload. Adjacent comment mentions Qubes which does the same thing but uses an entire guest kernel.

(If you are creative enough, you can probably come up with some solutions. Qt apps could be made to work with a custom QPA that can somehow funnel information in and out of the sandbox. You could definitely run something like Waypipe or Xpra in the sandbox too, but again I imagine those would wind up requiring a much greater degree of emulation. It's not like I've actually tried this, though, so I could be off.)

laurencerowe

3 months ago

TinyKVM is probably most similar to gVisor in KVM platform mode. TinyKVM implements a smaller number of sys calls and is focussed on making resets as fast as possible.

Running sys calls on the host means there is approximately 1µs overhead per syscall from exiting and entering KVM so I'm not sure how well that would work for GUI applications.

And we currently only have very rudimentary support for threads, enough for a server program with ancillary threads to boot up but the expectation is currently that the call into TinyKVM only runs a single thread and we fork multiple copies of the VM to handle requests in parallel.

jchw

3 months ago

> Running sys calls on the host means there is approximately 1µs overhead per syscall from exiting and entering KVM so I'm not sure how well that would work for GUI applications.

That made me rather curious how many syscalls a complex GUI application might issue. I wanted to see how many syscalls were happening across my entire system. Thanks to StackOverflow I have a snippet that seems correct[1]:

> perf stat -e raw_syscalls:sys_enter -a -I 1000 sleep 5

Using this, it seems that most programs (as you would probably guess) don't execute a whole lot of syscalls when they're idle. However, starting a complex GUI program definitely causes a pretty massive flurry of syscalls. Starting winecfg without an already-existing wineserver spews a lot of syscalls, somewhere in the neighborhood of 500,000. If we assume that each syscall takes on average around 2µs including the overhead and that they're all serial, I guess that would add up to about 1 second spent on syscalls. That's probably making way too many assumptions, but it does make me feel like it's not completely infeasible to run GUI applications inside of a sandbox like this, though it may very well not be compelling when the overhead is factored in.
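As a rough check on that arithmetic, here is a minimal Python sketch. The 500,000 count and the 2µs average are just the guesses from above, the 1µs figure is the quoted KVM exit/entry overhead, and the function name is made up for illustration. It assumes every syscall is serial and costs the same, which is almost certainly too simple.

```python
def estimated_syscall_seconds(syscall_count: int, avg_cost_us: float) -> float:
    """Total seconds if every syscall runs serially at the same average cost."""
    return syscall_count * avg_cost_us / 1_000_000


# Guessed figures: ~500,000 syscalls at ~2µs each, overhead included.
total = estimated_syscall_seconds(500_000, 2.0)
# Share attributable to the ~1µs KVM exit/entry overhead alone.
overhead_only = estimated_syscall_seconds(500_000, 1.0)

print(f"total: {total:.2f}s, KVM exit/entry share: {overhead_only:.2f}s")
```

Under these assumptions, roughly half of that second would be the sandbox's own overhead, and the rest is time the syscalls would cost anyway.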

And of course, just because it could be done does not mean it should, anyway. Even if this is a good idea, I doubt it makes any sense for TinyKVM to be attempting to do it. What TinyKVM does do is already very interesting and probably a lot more practical anyways. It'd probably be better to fork off or build an entire purpose-built sandbox for GUI software, realistically.

Still, pretty interesting stuff to think about.

> And we currently only have very rudimentary support for threads, enough for a server program with ancillary threads to boot up but the expectation is currently that the call into TinyKVM only runs a single thread and we fork multiple copies of the VM to handle requests in parallel.

BTW, I think this design is really cool. This is something I have wanted to exist for a while, even though I don't practically need it.

[1]: https://unix.stackexchange.com/a/591299

sheepscreek

3 months ago

Although I didn’t fully grasp half of it, I thoroughly enjoyed reading it. I was hooked from the beginning to the very end. I’m genuinely excited about the potential of TinyKVM. It’s unbelievable how far we’ve come from the early days of VMware-led virtualization, and the fact that we have such powerful machines that anyone can buy! We’ve even got much better tooling to squeeze out more performance without risking safety/security (Rust FTW!).

munchlax

3 months ago

The traditional way of doing this is by combining programs. Many programs already work this way, e.g.:

time nice distcc ccache gmake

I do this with other tools as well. bwrap, chroot, env, setpriv, xchpst, etc. They all stack.

3eb7988a1663

3 months ago

I want to be more deliberate about securing my tools, but all of the options seem so complex that I do not know where to begin. Then you get various pithy statements like, "chroot is not a security layer", "X cannot be used when you use Y", and it feels hopeless for a novice. Most of the documentation for these tools seems to expect a baseline of system administration knowledge greater than my own.

I instead lean on heavyweight VMs, but would love something like this which should be a hard security boundary for little cost.

mindcrash

3 months ago

Qubes maybe? https://www.qubes-os.org/

wmf

3 months ago

It sounds like you're talking about Qubes.

3eb7988a1663

3 months ago

I want to love Qubes, but it is a lot more heavyweight than I want to pursue. I have no crypto fortune or government/industrial secrets worth stealing, so it would be putting on a lot of pain knowing I am not a person of interest. I already run my development work inside a VM, but that has some papercuts. Going full Qubes would probably get even more annoying.

A security/isolation layer like this I could use for free feels like it would get me so close to the Qubes ideal without having to completely change how I interface with my machine.

pgaddict

3 months ago

IMHO the whole point of Qubes is that it does not do the compartmentalization at the level of individual applications, but groups of applications. Otherwise you'd need to very clearly specify how/when exactly the applications can exchange data, what data, etc. I'm not saying it's impossible, but "apps in the same qube VM can do whatever" is a much easier concept.

rolandog

3 months ago

You can do this with Guix [0], with the added benefit of package reproducibility.

[0]: https://www.futurile.net/2023/04/29/guix-shell-virtual-envir...

jchw

3 months ago

Given the use of the word "container" that seems to be using Linux namespacing rather than KVM. In the case of containers, the isolation is provided solely by the Linux kernel, plus of course any additional defenses you add on top of it. While Guix shell having a built-in way to spawn isolated containers is extremely cool (I use NixOS. As far as I know, Nix does not have an equivalent feature), it seems like, from a security standpoint, it would be similar to using bubblewrap or Firejail directly. I like this idea, though. Seems very useful and convenient.

What I think we're really after though is something like gVisor, where the guest program is completely isolated from the host kernel, and the daemons that allow the guest program to reach the outside world are themselves highly locked down by the host kernel using technologies like seccomp-bpf and namespacing, on top of whatever constraints and validation they apply on their own. While nothing is foolproof, this feels like, if done carefully, it would give you a very good layer of isolation that would be extremely challenging to bypass. I reckon that the sandbox would cease to be the most interesting attack target in a system like gVisor, since in any complicated system, there will probably always be some lower-hanging fruit. (And of course, TinyKVM seems to be basically in the same wheelhouse. None of these solutions are designed to run GUI software, though I reckon it probably could be made to work.)

munchlax

3 months ago

I admit I haven't investigated this thoroughly, but I suspect the low-hanging fruit in the TinyKVM case is having rw access to /dev/kvm.

I think it should be possible to pass /dev/kvm as an open fd to daemons like KVM Server and mark it as non-inheritable. As long as the VM is in a subprocess it would be okay, I guess.
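A minimal Python sketch of that fd-passing idea, under loud assumptions: /dev/null stands in for /dev/kvm so it runs anywhere, and the "daemon" is a throwaway child process invented for illustration, not KVM Server. It only shows the mechanics of opening a device once and handing that single descriptor to one chosen child.

```python
import os
import subprocess
import sys

# /dev/null stands in for /dev/kvm so this runs without KVM access (assumption).
DEVICE_PATH = "/dev/null"

# os.open() returns a non-inheritable fd by default since Python 3.4 (PEP 446);
# O_CLOEXEC is spelled out only to make the intent visible.
fd = os.open(DEVICE_PATH, os.O_RDWR | os.O_CLOEXEC)
inheritable = os.get_inheritable(fd)

# Hypothetical stand-in for the daemon: it only checks that the fd it was handed is open.
daemon_code = f"import os; os.fstat({fd}); print('daemon received fd')"

# pass_fds hands over exactly this fd; other fds above stderr are closed in the child.
child = subprocess.run([sys.executable, "-c", daemon_code], pass_fds=(fd,))
exit_code = child.returncode
os.close(fd)

print("inheritable by default:", inheritable, "| child exit code:", exit_code)
```

With this shape the daemon would never need permission to open the device path itself. Whether that is enough depends on what the holder of the fd can still do with it.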

swiftcoder

3 months ago

Every time I click on one of these posts, I'm expecting it to be a tiny KVM switch. When did this whole KVM nomenclature catch on for virtual machines?

deivid

3 months ago

Well, it was released on Linux in 2007, so it's meant Kernel-based Virtual Machine for at least 18 years.

See: https://en.wikipedia.org/wiki/Kernel-based_Virtual_Machine

radeeyate

3 months ago

The term KVM (as in the switch) was coined in 1995: https://en.wikipedia.org/wiki/KVM_switch

swiftcoder

3 months ago

It certainly wasn't in common usage that early, at least not outside of Linux circles. I don't really recall hearing it in this context before maybe the early 2020s.

mattbee

3 months ago

First I'd heard of this project; here's an introduction from the author: https://fwsgonzo.medium.com/tinykvm-the-fastest-sandbox-564a...

laurencerowe

3 months ago

A couple of discussions on previous TinyKVM posts:

TinyKVM: Fast sandbox that runs on top of Varnish - https://news.ycombinator.com/item?id=43358980

Deno Under TinyKVM in Varnish - https://news.ycombinator.com/item?id=43650792

dinobones

3 months ago

I was so confused by this article.

I was confusing it with TinyPilot, a hardware KVM made by indie hacker Michael Lynch, which I think has since been acquired.

yjftsjthsd-h

3 months ago

I made the same mistake, confusing it with the Luckfox PicoKVM ( https://www.cnx-software.com/2025/09/23/luckfox-picokvm-low-... )

nmstoker

3 months ago

Yes, the overloading of KVM here caught me out too!

laurencerowe

3 months ago

I'm pretty hopeful that the combination of per-request isolation and the new snapshot functionality we're currently working on will be a big step forward for those running server-side JS at scale.

Having each request start from the exact same program state should make reproducing and fixing production issues easier. In a way it combines the predictability of the CGI programming model with the speed of a warmed modern JIT runtime.
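A toy Python analogy for why that helps. This is not how TinyKVM works (it resets a KVM guest, not a Python dict), and `PRISTINE_STATE` and `handle_request` are names invented for this sketch. The point is only that state mutated by one request never leaks into the next.

```python
import copy

# Hypothetical warmed-up program state, captured once after startup.
PRISTINE_STATE = {"cache": {}, "requests_seen": 0}


def handle_request(payload: str) -> dict:
    """Handle one request against a fresh copy of the pristine state."""
    # The "reset": anything a previous request changed is discarded.
    state = copy.deepcopy(PRISTINE_STATE)
    state["requests_seen"] += 1
    state["cache"][payload] = len(payload)
    return state


first = handle_request("hello")
second = handle_request("world")

# Both requests saw identical starting conditions, so both report a count of 1.
print(first["requests_seen"], second["requests_seen"])
print("pristine state untouched:", PRISTINE_STATE)
```

Because every request begins from the same bytes, replaying a failing request against the same snapshot should reproduce the same behaviour, at least for deterministic code.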

skybrian

3 months ago

Could someone give a high-level overview of what this is and why you'd use it?

sterlinm

3 months ago

I spent a while mixing this up with PiKVM and was having trouble understanding how any of it would fit in with that project. Made a lot more sense once I got over that.

ValdikSS

3 months ago

I read until "gVisor, system call emulation" and thought that this was some kind of IP-KVM project port to an RTOS or microcontroller, or some other thing which reuses Linux code but does not run Linux.

nl

3 months ago

How does this compare to Amazon's Firecracker VM?

laurencerowe

3 months ago

Firecracker runs a full Linux guest within KVM while TinyKVM runs just a single process within KVM and handles syscalls on the host by validating permissions then calling the host kernel syscall.

This minimises memory usage and lets us track file descriptors, which lets us very quickly reset the guest process (under 100µs for Deno).
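The "validate permissions, then call the host syscall" pattern can be sketched in a few lines of Python. This is a hedged illustration only, not TinyKVM's code: `is_allowed`, `guarded_open`, and the allowed-roots list are hypothetical names, and a real implementation has to handle far more than open().

```python
import os
import tempfile


def is_allowed(path: str, allowed_roots: list) -> bool:
    """True if path, after resolving symlinks and '..', lies under an allowed root."""
    real = os.path.realpath(path)
    for root in allowed_roots:
        root_real = os.path.realpath(root)
        if os.path.commonpath([real, root_real]) == root_real:
            return True
    return False


def guarded_open(path: str, flags: int, allowed_roots: list) -> int:
    """Validate the guest's request first, then forward it to the host kernel."""
    if not is_allowed(path, allowed_roots):
        raise PermissionError(f"guest may not open {path}")
    return os.open(path, flags)


# Demo with a temporary directory standing in for the guest's writable area.
sandbox_root = tempfile.mkdtemp()
fd = guarded_open(os.path.join(sandbox_root, "notes.txt"),
                  os.O_CREAT | os.O_WRONLY, [sandbox_root])
os.close(fd)

try:
    guarded_open(os.path.join(sandbox_root, "..", "escape.txt"),
                 os.O_RDONLY, [sandbox_root])
    traversal_allowed = True
except PermissionError:
    traversal_allowed = False

print("traversal allowed:", traversal_allowed)
```

Resolving the path before comparing it is what stops a `..` component from escaping the allowed directory. Note that the sketch checks and then opens in two steps, so it is still open to races if the path changes in between.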

deivid

3 months ago

This is amazing! I am also a little bit obsessed with fast-booting KVM for per-request isolation, and have managed to get Linux to PID 1 in 3.6ms. I am starting to go a little insane, though, because I don't know how to measure the rest of the CPU time (would love a flamegraph somehow); the ftrace data just... confuses me.