Toro: Deploy Applications as Unikernels

148 points, posted 2 months ago

140 Comments

m132

2 months ago

Projects like this and Docker make me seriously wonder where software engineering is going. Don't get me wrong, I don't mean to criticize Docker or Toro in particular. It's the increasing dependency on such approaches that bothers me.

Docker was conceived to solve the problem of things "working on my machine", and not anywhere else. This was generally caused by the differences in the configuration and versions of dependencies. Its approach was simple: bundle both of these together with the application in unified images, and deploy these images as atomic units.

Somewhere along the line, however, the problem has mutated into "works on my container host". How is that possible? Turns out that with larger modular applications, the configuration and dependencies naturally demand separation. This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

Now hardware virtualization. I like how AArch64 generalizes this: there are 4 levels of privilege baked into the architecture. Each has control over the lower and can call up the one immediately above to request a service. Simple. Let's narrow our focus to the lowest three: EL0 (classically the user space), EL1 (the kernel), EL2 (the hypervisor). EL0, in most operating systems, isn't capable of doing much on its own; its sole purpose is to do raw computation and request I/O from EL1. EL1, on the other hand, has the powers to directly talk to the hardware.

Everyone is happy, until the complexity of EL1 grows out of control and becomes a huge attack surface, difficult to secure and easy to exploit from EL0. Not good. The naive solution? Go a level above, and create a layer that will constrain EL1, or actually, run multiple, per-application EL1s, and punch some holes through for them to still be able to do the job: create a hypervisor. But then, as those vaguely defined "holes", also called system calls and hypercalls, grow, won't the attack surface grow too?

Or in other words, with the user space shifting to EL1, will our hypervisor become the operating system, just like docker-compose became a dynamic linker?
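For readers who want the privilege model above in concrete form, here is a minimal sketch in Python. It only encodes the two rules from the comment (a level controls everything below it, and can call the level immediately above) plus the standard AArch64 call instructions (SVC, HVC, SMC). The class and function names are made up for illustration, and the model deliberately ignores configuration-dependent exception routing.

```python
from enum import IntEnum


class EL(IntEnum):
    """AArch64 exception levels, lowest privilege first."""
    EL0 = 0  # user space
    EL1 = 1  # kernel
    EL2 = 2  # hypervisor
    EL3 = 3  # secure monitor


# Instruction a level uses to request a service from the level right above it.
CALL_UP = {EL.EL0: "SVC", EL.EL1: "HVC", EL.EL2: "SMC"}


def call_up(level: EL) -> tuple:
    """Return (instruction, target level) for a service request from `level`."""
    if level not in CALL_UP:
        raise ValueError(f"{level.name} is the highest level; nothing above it")
    return CALL_UP[level], EL(level + 1)


def controls(a: EL, b: EL) -> bool:
    """A level has control over every level below it."""
    return a > b


if __name__ == "__main__":
    for level in (EL.EL0, EL.EL1, EL.EL2):
        insn, target = call_up(level)
        print(f"{level.name} --{insn}--> {target.name}")
```

In this picture, every entry in `CALL_UP` is one of the "holes" the comment talks about: the interface a lower level uses to reach the one above it.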

nine_k

2 months ago

I see a number of assumptions in your post which I find not matching my view of the picture.

Containers arose as a way to solve the dependency problems created by traditional Unix. They grew out of tools like chroot, BSD jails, and Solaris Zones. Containers allow deploying dependencies that cannot be simultaneously installed on a traditional Unix host system. It's not a UNIX architecture limitation but rather a result of POSIX + tradition; e.g. Nix also solves this, but differently.

Containers (like chroot and jail before them) also help ensure that a running service does not depend on the parts of the filesystem it wasn't given access to. Additionally, containers can limit network access, and process tree access.

These limitations are not a proper security boundary, but definitely a dependency boundary, helping avoid spaghetti-style dependencies, and surprises like "we never realized that our ${X} depends on ${Y}".

Then, there's the Fundamental Theorem of Software Engineering [1], which states: "We can solve any problem by introducing an extra level of indirection." So yes, expect the number of levels of indirection to grow everywhere in the stack. A wise engineer can expect to merge or remove some levels here and there, when the need for them is gone, but they would never expect that new levels of indirection should stop emerging.

[1]: https://en.wikipedia.org/wiki/Fundamental_theorem_of_softwar...

m132

2 months ago

To be honest, I've read your response 3 times and I still don't see where we disagree, assuming that we do.

I've mostly focused on the worst Docker horrors I've seen in production, extrapolating that to the future of containers, as pulling in new "containerized" dependencies will inevitably become just as effortless as it currently is with regular dependencies in the new-style high-level programming languages. You've primarily described a relatively fresh, or a well-managed Docker deployment, while admitting that spaghetti-style dependencies have become a norm and new layers will pile up (and by extension, make things hard to manage).

I think our points of view don't actually collide.

nine_k

2 months ago

We do not disagree about the essence, but rather in accents. Some might say that sloppy engineers were happy to pack their Ruby-Goldbergesque deployments into containers. I say that even the most excellent and diligent engineers sometimes faced situations when two pieces of software required incompatible versions of a shared library, which depended on a tree of other libraries with incompatible versions, etc, and there's a practical limit of what you can and should do with bash scripts and abuse of LD_PRELOAD.

Many of the "new" languages, like Go (16 years), Rust (13 years), or Zig (9 years) just can build static binaries, not even depending on libc. This has both upsides and downsides, especially with security fixes. Rebuilding a container to include an updated .so dependency is often easier and faster than rebuilding a Rust project.

Docker (or preferably Podman) is not a replacement for linkers. It's an augmentation to the package system, and a replacement for the common file system layout, which is inadequate for modern multi-purpose use of a Unix (well, Linux) box.

m132

2 months ago

I see, you're providing a complementary perspective. I appreciate that, and indeed, Docker isn't always evil. My intention was to bring attention to the abuse of it and compare it to virtualization of unikernels, which to me appears to be on a similar trajectory.

As for the linker analogy, I compared docker-compose (not Docker proper) to a dynamic linker because it's often used to bring up larger multi-container applications, similar to how large monolithic applications with plenty of shared library dependencies are put together by ld.so. Those multi-container applications can be similarly brittle if developed under the assumption that merely wrapping them up in containers will assure portability, defeating most of Docker's advantages and reducing it to a pile of excess layers of indirection. This is similar to the false belief that running kernel-mode code under a hypervisor is by itself more secure than running it as a process on top of a bare-metal kernel.

nine_k

2 months ago

Indeed, the problem of the distributed monolith does exist. If it arises, a reasonable engineering leader would just migrate to a proper monolith: https://www.twilio.com/en-us/blog/developers/best-practices/...

eyberg

2 months ago

Containers got popular at a time when an increasing number of people were finding it hard to install software on their system locally - especially if you were, for instance, having to juggle multiple versions of ruby or multiple versions of python and those linked to various major versions of c libraries.

Unfortunately containers have always had an absolutely horrendous security story and they degrade performance by quite a lot.

The hypervisor is not going away anytime soon - it is what the entire public cloud is built on.

While you are correct that containers do add more layers - unikernels go the opposite direction and actively remove those layers. Also, imo the "attack surface" is by far the smallest security benefit - other architectural concepts such as the complete lack of an interactive userland is far more beneficial when you consider what an attacker actually wants to do after landing on your box. (eg: run their software)

When you deploy to AWS you have two layers of linux - one that AWS runs and one that you run - but you don't really need that second layer and you can have much faster/safer software without it.

m132

2 months ago

I can understand the public cloud argument; if the cloud provider insists on you delivering an entire operating system to run your workloads, a unikernel indeed slashes the amount of layers you have to care about.

Suppose you control the entire stack though, from the bare metal up. (Correct me if I'm wrong, but) Toro doesn't seem to run on real hardware, you have to run it atop QEMU or Firecracker. In that case, what difference does it make if your application makes I/O requests through paravirtualized interfaces of the hypervisor or talks directly to the host via system calls? Both ultimately lead to the host OS servicing the request. There isn't any notable difference between the kernel/hypervisor and the user/kernel boundary in modern processors either; most of the time, privilege escalations come from errors in the software running in the privileged modes of the processor.

Technically, in the former case, besides exploiting the application, a hypothetical attacker will also have to exploit a flaw in QEMU to start processes or gain further privileges on the host, but that's just due to a layer of indirection. You can accomplish this without resorting to hardware virtualization. Once in QEMU, the entire assortment of your host's system calls and services is exposed, just as if you ran your code as a regular user space process.

This is the level at which you want to block exec() and other functionality your application doesn't need, so that neither QEMU nor your code run directly can perform anything out of their scope. Adding a layer of indirection while still leaving user/kernel, or unikernel/hypervisor junction points unsupervised will only stop unmotivated attackers looking for low-hanging fruit.

toast0

2 months ago

> Suppose you control the entire stack though, from the bare metal up. (Correct me if I'm wrong, but) Toro doesn't seem to run on real hardware, you have to run it atop QEMU or Firecracker.

Some unikernels are intended to run under a hypervisor or on bare metal. Bare metal means you need some drivers, but if you have a use case for a unikernel on bare metal, you probably don't need to support the vast universe of devices, maybe only a few instances of a couple types of things.

I've got a not production ready at all hobby OS that's adjacent to a unikernel; runs in virtio hypervisors and on bare metal, with support for one NIC. In its intended hypothetical use, it would boot from PXE, with storage on nodes running a traditional OS, so supporting a handful of NICs would probably be sufficient. Modern NICs tend to be fairly similar in interface, so if the manufacturer provides documentation, it shouldn't take too long to add support at least once you've got one driver doing multiple tx/rx queues and all that jazz... plus or minus optimization.

For storage, you can probably get by with two drivers, one for sata/ahci and one for nvme. And likely reuse an existing filesystem.

ignoramous

2 months ago

> I've got a not production ready at all hobby OS

Do you usually publish your hobby code publicly? If not, consider this an appeal to do so (:

> Modern NICs tend to be fairly similar in interface, so if the manufacturer provides documentation, it shouldn't take too long to add support at least once you've got one driver ... For storage, you can probably get by with two drivers

I take that there aren't any pluggable drivers for NICs like there's for nvme/sata disks?

toast0

2 months ago

> Do you usually publish your hobby code publicly? If not, consider this an appeal to do so (:

Yes; https://github.com/russor/crazierl/ and there's an in browser demo as well https://crazierl.org/demo.html (thanks to v86! https://github.com/copy/v86 ) Supports virtio-net and realtek 8168.

> I take that there aren't any pluggable drivers for NICs like there's for nvme/sata disks?

I mean, there is NDIS / NDISWrapper. Or, I think it wouldn't be too hard to run netbsd drivers... but I'm crazy and want my drivers in userland, in Erlang, so none of that applies. :)

As a fair warning, there are some concurrency errors in the kernel that I haven't tracked down, which sometimes result in getting stuck before the shell prompt comes up; the tcp stack is just ok enough to mostly work, and the dhcp client only works if everything goes right.

m132

2 months ago

Erlang! Indeed a crazy idea (in a good way!), and while I'm not normally a big fan of unikernels, now you've got me seriously intrigued :)

I've been dabbling in Erlang and OS development myself, my biggest inspirations being Microsoft Singularity and QNX. The former is a C# lookalike of what you're making, or at least that's how it seems from my perspective.

The readme mentions a FreeBSD-like system call interface, but then the drivers and the network stack are written in Erlang, and, as you've mentioned, run in the user land. Is that actually a unikernel design with BEAM running in the kernel, or more of a microkernel hosting BEAM, with it providing device handling and the user space?

toast0

2 months ago

The original plan was BEAM on metal, but I had a hard time getting that started... so I pivoted to BEAM from pkg, running on a just enough kernel that exposes only the FreeBSD syscalls that actually get called.

Where that fits in the taxonomy of life, I'm not sure. There is a kernel/userspace boundary (and also a c-code/erlang code boundary in userspace), so it's not quite a unikernel. I wouldn't really call it a microkernel; there's none of the usual microkernel stuff... I just let userspace do i/o ports with the x86 task structure and do memory mapped i/o by letting it mmap anything (more or less). The kernel manages timers/timekeeping and interrupts, Erlang drivers open a socket to get notified when an interrupt fires --- level triggered interrupts would be an issue. Kernel also does thread spawning and mutex support, connects pipes, early/late console output, etc.

If I get through my roadmap (networked demo, uefi/amd64 support, maybe arm64 support, run the otp test suite), I might look again and see if I can eliminate the kernel/userspace divide now that I understand the underneath, but the minimal kernel approach lets me play around with the fun parts, so I'm pretty happy with it. I've got a slightly tweaked dist working and can hotload userspace Erlang code over the network, including the tcp stack, which was the itch I wanted to scratch... nevermind that the tcp stack isn't very good at the moment ;)

m132

2 months ago

Really cool! Will definitely take a closer look in my spare time.

> I just let userspace do i/o ports [...] and do memory mapped i/o by letting it mmap anything (more or less). The kernel manages timers/timekeeping and interrupts [...]

This is how QNX does it too, allowing privileged processes to use MAP_PHYS and port I/O instructions on x86, and handle interrupts like they're POSIX signals. It all boils down to how you structure your design, but personally, I think that's not a bad approach at all. The cool thing about it is that, after the initial setup, you can drop the privileges for creating further mappings and handlers, reducing the attack surface.

Unless you're trying to absolutely minimize the cost and amount of context switches, I think moving BEAM into the kernel would be a downgrade, but again, I'm a big proponent of microkernels :)

Looking forward to the UEFI and AArch64 ports!

eyberg

2 months ago

I can't speak for all the various projects but imo these aren't made for bare metal - if you want true bare metal (metal you can physically touch) use linux.

One of the things that might not be so apparent is that when you deploy these to something like AWS all the users/process mgmt/etc. gets shifted up and out of the instance you control and put into the cloud layer - I feel that would be hard to do with physical boxen cause it becomes a slippery slope of having certain operations (such as updates) needing auth for instance.

mvaralar

2 months ago

> Suppose you control the entire stack though, from the bare metal up. (Correct me if I'm wrong, but) Toro doesn't seem to run on real hardware, you have to run it atop QEMU or Firecracker. In that case, what difference does it make if your application makes I/O requests through paravirtualized interfaces of the hypervisor or talks directly to the host via system calls? Both ultimately lead to the host OS servicing the request. There isn't any notable difference between the kernel/hypervisor and the user/kernel boundary in modern processors either; most of the time, privilege escalations come from errors in the software running in the privileged modes of the processor.

Toro can run on bare metal, although I stopped supporting that a few years ago. I tagged the commit in master where this happened. Also, I removed the TCP/IP stack in favor of VSOCK. Those changes could be reversed, though, if there is interest in those features.

laurencerowe

2 months ago

> In that case, what difference does it make if your application makes I/O requests through paravirtualized interfaces of the hypervisor or talks directly to the host via system calls?

Hypervisors expose a much smaller API surface area to their tenants than an operating system does to its processes which makes them much easier to secure.

Veserv

2 months ago

That is an artifact of implementation. Monolithic operating systems with tons of shared services expose lots to their tenants. Austere hypervisors, the ones with small API surface areas, basically implement a microkernel interface yet both expose significantly more surface area and offer a significantly worse guest experience than microkernels. That is why high security systems designed for multi-level security for shared tenants that need to protect against state actors use microkernels instead of hypervisors.

ignoramous

2 months ago

> That is why high security systems designed for multi-level security for shared tenants

When you say "high security" do you mean Confidential Computing workloads run by Trusty (Enclave) / Virtee (Realm) etc? If so, aren't these systems limited in what they can do, as in, there usually is another full-blown OS that's running the user-facing bits?

> that need to protect against state actors

This is a very high bar for a software-only solution (like a microkernel) to meet? In my view, open hardware specifications, like OpenTitan, in combination with a small-ish software TCB, make it hard for state actors (even if not impossible).

Veserv

2 months ago

No. I am talking about multi-level security [1] which allows a single piece of hardware to handle top secret and unclassified materials simultaneously via software protection. This protection is limited to software attempts to access top secret materials from the unclassified domain; hardware and physical attacks are out-of-scope.

There have been many such systems verified to be secure against state actors according to the TCSEC Orange Book Level A1 standard and the subsequent Common Criteria SKPP standard, which requires full formal proofs of security and explicitly requires the NSA to identify zero vulnerabilities during a multi-month penetration test before allowing usage in NSA and DoD systems.

[1] https://en.wikipedia.org/wiki/Multilevel_security

j-krieger

2 months ago

> Unfortunately containers have always had an absolutely horrendous security story and they degrade performance by quite a lot.

This is demonstrably untrue.

eyberg

2 months ago

Let's see: last month alone (November 2025) we had CVE-2025-31133, CVE-2025-52565, and CVE-2025-52881. Container breakouts happen almost monthly.

eikenberry

2 months ago

I think they were talking more about the degraded performance.

In terms of the security aspects though, how do security holes in a layer that restricts things more than without it degrade security? Seems like saying that CVEs on browser's javascript sandboxing degrade the browser security more than just not having sandboxes.

eyberg

2 months ago

Duplicating networking and storage layers on top of the existing ones, as containers and orchestrators such as k8s do, absolutely degrades performance - full stop. No one runs containers raw (w/out an underlying vm) in the cloud - they always exist on top of vms.

The problem with "container" security is that even in this thread many people seem to think that it is a security barrier of some kind when it was never designed to be one. The v8 sandbox was specifically created to deal with sandboxing. It still has issues but at least it was thought about and a lot of engineering went into it. Container runtimes are not exported via the kernel. Unshare is not named 'create_container'. A lot of the container issues we see are runtime issues. There are over a half-dozen different namespaces that are used in different manners that expose hard to understand gotchas. The various container runtimes decide themselves how to deal with these and they have to deal with all the issues in their code when using them. A very common class of bug that hits these runtimes is TOCTOU (time of check to time of use) vulns.
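To make the TOCTOU pattern concrete, here is a minimal sketch in Python; it is not taken from any container runtime, and the function names are hypothetical. The first function checks a path and then opens it, leaving a window in which the path can be swapped for a symlink. The second opens first with `O_NOFOLLOW` and then inspects the descriptor it actually got, so the check and the use refer to the same object. Note that `O_NOFOLLOW` only covers the final path component, so this illustrates the idea rather than a complete defense.

```python
import os
import stat


def read_racy(path: str) -> bytes:
    """Check-then-use: `path` can be replaced between lstat() and open()."""
    if stat.S_ISLNK(os.lstat(path).st_mode):  # time of check
        raise PermissionError("symlinks not allowed")
    with open(path, "rb") as f:                # time of use
        return f.read()


def read_safer(path: str) -> bytes:
    """Open first (refusing a symlink atomically), then check the open descriptor."""
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as f:  # f now owns fd and closes it on exit
        if not stat.S_ISREG(os.fstat(f.fileno()).st_mode):
            raise PermissionError("not a regular file")
        return f.read()
```

With no attacker present both functions behave the same; the difference only shows up when something changes the path between the check and the open.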

Right now there is a conversation about the upcoming change to systemd that runs sshd on vsock by default (you literally have to disable it via kernel cli flag - systemd.ssh_auto=no) - guess what one of the concerns is? Vsock isn't bound to a network namespace. This is not itself a vulnerability but it most definitely is going to get taken advantage of in the future.

j-krieger

2 months ago

A container breakout is a valid CVE, but it also is an escape into an environment that is as secure as any unix environment was before we even had containers to begin with.

ritcgab

2 months ago

All specific to runc.

ahepp

2 months ago

> other architectural concepts such as the complete lack of an interactive userland is far more beneficial when you consider what an attacker actually wants to do after landing on your box

What does that have to do with unikernel vs more traditional VMs? You can build a rootfs that doesn't have any interactive userland. Lots of container images do that already.

I am not a security researcher, but I wouldn't think it would be too hard to load your own shell into memory once you get access to it. At least, compared to pulling off an exploit in the first place.

I would think that merging kernel and user address spaces in a unikernel would, if anything, make it more vulnerable than a design using similar kernel options that did not attempt to merge everything into the kernel. Since now every application exploit is a kernel exploit.

eyberg

2 months ago

A shell by design is explicitly made to run other programs. You type in 'ls', 'cd', 'cat', etc. but those are all different programs. A "webshell" can work to a degree as you could potentially upload files, cat files, write to files, etc. but you aren't running other programs under these conditions - that'd be code you're executing - scripting languages make this vastly easier than compiled ones. It's a lot more than just slapping a heavy-handed seccomp profile on your app.

Also merging the address space is not a necessity. In fact - 64-bit (which is essentially all modern cloud software) mandates virtual memory to begin with and many unikernel projects support elf loading.

pjmlp

2 months ago

Linux containers you mean.

The story is quite different in HP-UX, Aix, Solaris, BSD, Windows, IBM i, z/OS,...

ripdog

2 months ago

Windows has containers?

m132

2 months ago

Yes.

There are AppContainers. Those have existed for a while and are mostly targeted at developers intending to secure their legacy applications.

https://learn.microsoft.com/en-us/windows/win32/secauthz/app...

There's also Docker for Windows, with native Windows container support. This one is new-ish:

https://learn.microsoft.com/en-us/virtualization/windowscont...

torginus

2 months ago

The low level API of process isolation on Windows is Job Objects, that provide the necessary kernel APIs for namespacing objects and controlling resource use.

AppContainers and Docker for Windows (the one for running dockerized windows apps, not running linux docker containers on top of WSL) are using this API; these high-level features are just the 'porcelain'.

jayd16

2 months ago

Windows containers are actually quite nice once you get past a few issues. Perf is the biggest as it seems to run in a VM in windows 11.

Perf is much better on Windows server. It's actually really pleasant to get your office appliances (a build agent etc) in a container on a beefy Windows machine running Windows server.

mananaysiempre

2 months ago

> Perf is the biggest as it seems to run in a VM in windows 11.

Doesn’t “virtualization-based security” mean everything does, container or no? Or are they actually VMs even with VBS disabled?

ironhaven

2 months ago

With a standard Windows Server license you are only allowed to have two Hyper-V virtual machines but unlimited "windows containers". The design is similar to Linux with namespaces bolted onto the main kernel so they don't provide any better security guarantees than Linux namespaces.

Very useful if you are packaging trusted software and don't want to upgrade your Windows Server license.

pixl97

2 months ago

> what an attacker actually wants to do after landing on your box.

Aren't there ways of overwriting the existing kernel memory/extending it to contain a new application if an attacker is able to attack the running unikernel?

What protections are provided by the unikernel to prevent this?

eyberg

2 months ago

To be clear there are still numerous attacks one might lob at you. For instance, if you are running a node app and the attacker uploads a new js file that they can have the interpreter execute, that's still an issue. However, you won't be able to start running random programs like curling down some cryptominer or something - it'd all need to be contained within that code.

What becomes harder is if you have a binary that forces you to rewrite the program in memory as you suggest. That's where classic page protections come into play such as not exec'ing rodata, not writing to txt, not exec'ing heap/stack, etc. Just to note that not all unikernel projects have this and even if they do it might be trivial to turn them off. The kernel I'm involved with (Nanos) has other features such as 'exec protection' which prevents that app from exec-mapping anything not already explicitly mapped exec.
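As a rough illustration of the page protections listed above, here is a toy policy check in Python. It is a sketch under loud assumptions: the permission bits, segment layout, and the `allow_mapping` function are invented for this example and are not how Nanos or any other kernel implements it. The layout encodes "text is never writable, data/heap/stack are never executable", and the `exec_protection` switch models refusing any new executable mapping.

```python
from dataclasses import dataclass

# Toy permission bits, not tied to any real kernel ABI.
R, W, X = 1, 2, 4


@dataclass(frozen=True)
class Segment:
    name: str
    perms: int


# Hypothetical layout matching the protections described in the comment.
LAYOUT = [
    Segment(".text", R | X),   # executable, not writable
    Segment(".rodata", R),     # neither writable nor executable
    Segment(".data", R | W),   # writable, not executable
    Segment("heap", R | W),
    Segment("stack", R | W),
]


def violates_wx(perms: int) -> bool:
    """True if a mapping would be writable and executable at the same time."""
    return bool(perms & W) and bool(perms & X)


def allow_mapping(perms: int, exec_protection: bool) -> bool:
    """Decide whether a *new* mapping request is allowed under this toy policy."""
    if violates_wx(perms):
        return False
    if exec_protection and perms & X:
        return False  # nothing new may become executable
    return True
```

Under this model a JIT that needs fresh executable pages is rejected whenever `exec_protection` is on, which is the trade-off discussed further down the thread.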

Running arbitrary programs, which is what a lot of exploit payloads try to achieve, is pretty different than having to stuff whatever they want to run inside the payload itself. For example if you look at most malware it's not just one program that gets run - it's like 30. Droppers exist solely to load third party programs on compromised systems.

ignoramous

2 months ago

> The kernel I'm involved with (Nanos) has other features such as 'exec protection' which prevents that app from exec-mapping anything not already explicitly mapped exec.

Does this mean JIT (and I guess most binary instrumentation (debuggers) / virtualization / translation tech) won't run as expected?

eyberg

2 months ago

We don't enable that exec-protect feature by default explicitly for this reason. You are right - jit needs it.

wmf

2 months ago

If the stack and heap are non-executable and page tables can't be modified then it's hard to inject code. Whether unikernels actually apply this hardening is another matter.

catlifeonmars

2 months ago

Isn’t this where ROP gadgets come in?

wmf

2 months ago

ASLR defeats ROP. Whether unikernels actually use ASLR is another matter.

dheera

2 months ago

I always thought of Docker as a "fuck it" solution. It's the epitome of giving up. Instead of some department at a company releasing a libinference.so.3 and a libinference-3.0.0.x86_64.deb they ship some docker image that does inference and call it a microservice. They write that they launched, get a positive performance review, get promoted, and the Docker containers continue to multiply.

Python package management is a disaster. There should be ways of having multiple versions of a package coexist in /usr/lib/python, nicely organized by package name and version number, and import the exact version your script wants, without containerizing everything.

Electron applications are the other type of "fuck it" solution. There should be ways of writing good-looking native apps in JavaScript without actually embedding a full browser. JavaScript is actually a nice language to write front-ends in.

catlifeonmars

2 months ago

> Python package management is a disaster. There should be ways of having multiple versions of a package coexist in /usr/lib/python, nicely organized by package name and version number, and import the exact version your script wants, without containerizing everything.

Have you tried uv?

dheera

2 months ago

Well sure, every language has some band-aid. The real solution should have been Python itself supporting:

    import torch==2.9.1

Instead of a bunch of other useless crap additions to the language, this should have been a priority, along with the ability for multiple versions to coexist in PYTHONPATH.
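Python has no such syntax today, but a rough approximation of the check can be sketched with the standard library. The `require` helper below is hypothetical: it looks up the installed distribution's version via `importlib.metadata` and refuses to import on a mismatch. It does not solve the harder half of the wish, which is having several versions installed side by side; it only fails loudly when the one installed version is not the one asked for.

```python
import importlib
from importlib import metadata
from typing import Callable, Optional


def require(name: str, version: str,
            dist: Optional[str] = None,
            lookup: Callable[[str], str] = metadata.version):
    """Import module `name` only if its installed distribution is exactly `version`.

    `dist` is the distribution name when it differs from the module name.
    `lookup` is injectable so the check can be exercised without real packages.
    """
    installed = lookup(dist or name)
    if installed != version:
        raise ImportError(
            f"{name}=={version} requested, but {installed} is installed"
        )
    return importlib.import_module(name)
```

Usage would look like `torch = require("torch", "2.9.1")`, assuming that package and version are actually installed in the environment.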

soulofmischief

2 months ago

There is a vast amount of complexity involved in rolling things from scratch today in this fractured ecosystem and providing the same experience for everyone.

Sometimes, the reduction of development friction is the only reason a product ends up in your hands.

I say this as someone whose professional toolkit includes Docker, Python and Electron; not necessarily tools of choice, but I'm one guy trying to build a lot of things and life is short. This is not a free lunch and the optimizer within me screams out whenever performance is left on the table, but everything is a tradeoff. And I'm always looking for better tools, and keep my eyes on projects such as Tauri.

ahepp

2 months ago

I think there's merit to your criticisms of the way docker is used, but it also seems like it provides substantial benefits for application developers. They don't need to beg OS maintainers to update the package, and they don't need to maintain builds for different (OS, version) targets any more.

They can just say "here's the source code, here's a container where it works, the rest is the OS maintainer's job, and if Debian users running 10 year old software bug me I'm just gonna tell them to use the container"

dheera

2 months ago

Yeah I'm not against Docker in its entirety. I think it is good for development purposes to emulate multiple different environments and test things inside them, just not as a way to ship stuff.

nineteen999

2 months ago

Agree on all fronts. The advent of Dockerfiles as a poor man's packaging system and the per-language package managers has set the industry back several years in some areas IMHO.

catlifeonmars

2 months ago

> and the per-language package managers has set the industry back several years in some areas IMHO

Curious, can you expand on this?

nineteen999

2 months ago

Python has what, half a dozen mostly incompatible package managers? Node? Ruby? All because they're too lazy, inexperienced or stubborn to write or automate RPM spec files, and/or Debian rules files.

To be fair, the UNIX wars probably inspired this in the first place - outside of SVR4 derivatives, most commercial UNIX systems (HP-UX, AIX, Tru64) had their own packaging format. Even the gratis BSD systems all have their own variants of the same packaging system. This was the one thing that AT&T and Sun Solaris got right. Linux distros merely followed suit at the time - Red Hat with RPM, Debian with DEB, and then Slackware and half a dozen other systems - thankfully we seem to have coalesced on RPM, DEB, Flatpak, Snap, AppImage etc... but yeah that's before you get to the language specific package management. It's a right mess, carried over from 90's UNIX "NIH" syndrome.

fragmede

2 months ago

> JavaScript is actually a nice language to write front-ends in.

I've written my fair share of GUIs, and React (and thus Javascript) is great compared to, I don't know, PHP, but CSS is the absolute devil.

zozbot234

2 months ago

> This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

The difference is that you can move that whole bunch of interlinked containers to another machine and it will work. You don't get that when running on bare hardware. The technology of "containers" is ultimately about having the kernel expose a cleaned up "namespaced" interface to userspace running inside the container, that abstracts away the details of the original machine. This is very much not intended as "sandboxing" in a security sense, but for most other system administration purposes it gets pretty darn close.
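
As a rough illustration of the "namespaced" interface, here is a minimal Python sketch that lists the namespaces a process belongs to. It assumes a Linux host with `/proc` mounted; the helper name `namespace_ids` is made up for this example, and on other systems it simply returns an empty dict.

```python
import os


def namespace_ids(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable.

    On Linux, each entry in /proc/<pid>/ns is a symlink whose target looks
    like 'pid:[4026531836]'. Two processes share a namespace when the
    identifiers match.
    """
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    if not os.path.isdir(ns_dir):
        # Not Linux, or /proc is not mounted: nothing to report.
        return ids
    for name in sorted(os.listdir(ns_dir)):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Some entries may be unreadable without extra privileges.
            continue
    return ids


ids = namespace_ids()
for name, ident in ids.items():
    print(f"{name:20s} {ident}")
```

Running it on the host and again inside a container should show different identifiers for whichever namespaces the container runtime unshared (typically `mnt`, `pid`, `net`, `uts` and `ipc`).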

BobbyTables2

2 months ago

I’ve had similar concerns.

At some point, few people even understand the whole system and whether all these layers are actually accomplishing anything.

It’s especially bad when the code running at rarified levels is developed by junior engineers and “sold” as an opaque closed source thing. It starts to actually weaken security in some ways but nobody is willing to talk about that.

“It has electrolytes…”

catlifeonmars

2 months ago

Docker is what plants crave.

j-krieger

2 months ago

Yea, with unneeded bloat like rule-based access controls, ACS and secret management. Some comments on this site.

kitd

2 months ago

> This results in them moving up a layer, in this case creating a network of inter-dependent containers that you now have to put together for the whole thing to start... and we're back to square one, with way more bloat in between.

I think you're over-egging the pudding. In reality, you're unlikely to use more than 2 types of container host (local dev and normal deployment maybe), so I think we've moved way beyond square 1. Config is normally very similar, just expressed differently, and being able to encapsulate dependencies removes a ton of headaches.

fragmede

2 months ago

If you're going to bring up ARM and EL levels, but not mention rings/CPL on x86, the discussion seems incomplete.

drawnwren

2 months ago

Nix is where we're going. Maybe not with the configuration language that annoys python devs, but declarative reproducible system closures are a joy to work with at scale.

ignoramous

2 months ago

> Nix ... declarative reproducible system closures are a joy to work with ...

From what I read, I gather nixpkgs are more hermetic (as in Bazel [0]) & not reproducible? https://discourse.nixos.org/t/nixos-is-not-reproducible/4268... / https://archive.vn/mXeih

[0] https://bazel.build/basics/hermeticity

drawnwren

2 months ago

Reproducible can have a lot of meanings. Nix guarantees that your build environment + commands are the same. It still uses all the usual build tools and it would be trivial to create a non-reproducible binary (--impure).
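
A toy Python sketch of that distinction (this is not Nix, and `build` / `digest` are hypothetical names for illustration only): identical inputs and commands give identical artifacts only as long as the build step itself does not pull in ambient state. A random build ID stands in here for any such input, e.g. a timestamp.

```python
import hashlib
import uuid


def build(source, embed_build_id):
    """Toy 'compiler': turns source text into an output artifact."""
    artifact = source.upper().encode("utf-8")
    if embed_build_id:
        # Impure step: the output depends on something other than the inputs.
        artifact += f"\nbuild-id: {uuid.uuid4()}".encode("utf-8")
    return artifact


def digest(artifact):
    """Content hash used to compare two build outputs."""
    return hashlib.sha256(artifact).hexdigest()


source = "print hello"

# Same inputs, same commands, deterministic step: identical artifacts.
pure_a = digest(build(source, embed_build_id=False))
pure_b = digest(build(source, embed_build_id=False))

# Same inputs, same commands, but the step injects ambient state.
impure_a = digest(build(source, embed_build_id=True))
impure_b = digest(build(source, embed_build_id=True))

print("pure builds match:  ", pure_a == pure_b)
print("impure builds match:", impure_a == impure_b)
```

The environment and the commands are the same in both cases; only the second pair of builds is not bit-for-bit reproducible.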

cryptonector

2 months ago

Hardware isolation is ETOOEXPENSIVE, so we need soft containers/sandboxes. It's that simple.

immibis

2 months ago

We had that. It was called an operating system kernel. What happened?

cryptonector

2 months ago

That relied on hardware, which -recall- is ETOOEXPENSIVE.

soulofmischief

2 months ago

I've been running either Qubes OS or KVM/QEMU based VMs as my desktop daily driver for 10 years. Nothing runs on bare metal except for the host kernel/hypervisor and virt stack.

I've achieved near-native performance for intensive activities like gaming, music and visual production. Hardware acceleration is kind of a mess but using tricks like GPU passthrough for multiple cards, dedicated audio cards and block device passthrough, I can achieve great latency and performance.

One benefit of this is that my desktop acts as a mainframe, and streaming machines to thin clients is easy.

My model for a long time has been not to trust anything I run, and this allows me to keep both my own and my client's work reasonably safe from a drive-by NPM install or something of that caliber.

Now that I also use an Apple Silicon MacBook as a daily driver, I very much miss the comfort of a fully virtualized system. I do stream in virtual machines from my mainframe. But the way Tahoe is shaping up, I might soon put Asahi on this machine and go back to a fully virtualized system.

I think this is the ideal way to do things, however, it will need to operate mostly transparently to an end user or they will quickly get security fatigue; the sacrifices involved today are not for those who lack patience.

Also, relevant XKCDs:

https://www.explainxkcd.com/wiki/index.php/2044:_Sandboxing_...

https://www.explainxkcd.com/wiki/index.php/2166:_Stack

m132

2 months ago

I think it's fine if you do it for yourself. It's a bit of a poor man's Linux-turned-microkernel solution. In fact, I work like this too, and this extends to my Apple Silicon Mac. The separation does have big security advantages, especially when different pieces of hardware are exclusively passed to the different, closed-off "partitions" of the system and the layer orchestrating everything is as minimal as it gets, or at least as guarded against the guests as it gets.

What worries me is when this model escalates from being cobbled together by a system administrator with limited resources, to becoming baked into the design of software; the appropriation of the hypervisor layer by software developers who are reluctant to untangle the mess they've created at the user/kernel boundary of their program and instead start building on top of hardware virtualization for "security", to ultimately go on and pollute the hypervisor as the level of host OS access proves insufficient. This is beautifully portrayed by the first XKCD you've linked. I don't want to lose the ability to securely run VMs as the interface between the host and the guest OSes grows just as unmanageable as that of Linux and BSD system calls and new software starts demanding that I let it use the entirety of it, just like some already insist that I let it run as root because privilege dropping was never implemented.

If you develop software, you should know what kind of operating system access it needs to function and sandbox it appropriately, using the operating system's sandboxing facilities, not the tools reserved for system administrators.

nineteen999

2 months ago

> One benefit of this is that my desktop acts as a mainframe,

Are you for real? Tell us you've never worked on a mainframe without telling us you've never worked on a mainframe.

soulofmischief

2 months ago

I'm not talking about an IBM mainframe. The definition Google gives me for mainframe is `a large high-speed computer, especially one supporting numerous workstations or peripherals`, which is exactly what my machine is.

nineteen999

2 months ago

Yeah nah. Mainframes have:

  * hot-swap power & CPUs
  * RAS (Reliability, Availability, Serviceability)
  * vendor SLAs
  * fault containment
  * designed uptime vs achieved uptime

If you can buy replacement parts on eBay and reboot to fix problems, it’s not a mainframe.

soulofmischief

2 months ago

According to Merriam-Webster:

  mainframe (noun)
  main· frame ˈmān-ˌfrām 

  1: a large, powerful computer that can handle many tasks concurrently and is usually used commercially
  2: (dated): a computer with its cabinet and internal circuits especially when considered separately from any peripherals connected to the computer

https://www.merriam-webster.com/dictionary/mainframe

The features you list are great to have, but my setup fits the first definition of mainframe as described. If you feel this definition is not specific enough, email Merriam-Webster and don't bother me about it.

dent9

2 months ago

Webster is wrong. A mainframe is not a generic high performance computer (that would be HPC). A mainframe is a very specific high performance computer.

soulofmischief

2 months ago

I repeat: I understand that mainframe has a specific meaning to many people, especially those who work on traditional mainframes, but I would rather you and the other user email both Google and Merriam-Webster about their wrong definitions, and not bother me about it. I will correct my usage once they have updated the definition to your standards.

gucci-on-fleek

2 months ago

> Now hardware virtualization. I like how AArch64 generalizes this: there are 4 levels of privilege baked into the architecture. Each has control over the lower and can call up the one immediately above to request a service. Simple. Let's narrow our focus to the lowest three: EL0 (classically the user space), EL1 (the kernel), EL2 (the hypervisor). EL0, in most operating systems, isn't capable of doing much on its own; its sole purpose is to do raw computation and request I/O from EL1. EL1, on the other hand, has the powers to directly talk to the hardware.

> Everyone is happy, until the complexity of EL1 grows out of control and becomes a huge attack surface, difficult to secure and easy to exploit from EL0. Not good. The naive solution? Go a level above, and create a layer that will constrain EL1, or actually, run multiple, per-application EL1s, and punch some holes through for them to still be able to do the job—create a hypervisor. But then, as those vaguely defined "holes", also called system calls and hyper calls, grow, won't so the attack surface?

(Disclaimer: this is all very far from my area of expertise, so some of the below may be wrong/misleading)

Nobody can agree whether microkernels or monolithic kernels are "better" in general, but most people seem to agree that microkernels are better for security [0], with seL4 [1] being a fairly strong example. But microkernels are quite a bit slower, so in the past when computers were slower, microkernels were noticeably slower, and security was less of a concern than it is now, so essentially every mainstream operating system in the 90s used some sort of monolithic kernel. These days, people might prefer different security–performance tradeoffs, but we're still using kernels designed in the 90s, so it isn't easy to change this any more.

Moving things to the hypervisor level lets us gain most of the security benefits of microkernels while maintaining near-perfect compatibility with the classic Linux/NT kernels. And the combination of faster computers (performance overheads therefore being less of an issue), more academic research, and high-quality practical implementations [2] means that I don't expect the current microkernel-style hypervisors to gain much new attack surface.

This idea isn't without precedent either—Multics (from the early 70s) was partially designed around security, and used a similar design with hardware-enforced hierarchical security levels [3]. Classic x86 also supports 4 different "protection rings" [4], and virtualisation plus nested virtualisation adds 2 more, but nothing ever used rings 1 and 2, so adding virtualisation just brings us back to the same number of effective rings as the original design.

[0]: https://en.wikipedia.org/wiki/Microkernel#Security

[1]: https://sel4.systems/

[2]: https://xenproject.org/projects/hypervisor/

[3]: https://en.wikipedia.org/wiki/Multics#Project_history

[4]: https://en.wikipedia.org/wiki/Protection_ring

upboundspiral

2 months ago

I would like to qualify that seL4 (and the entire family of L4 kernels) were created exactly to disprove the idea that microkernels were slow. They are extremely performant.

The idea that microkernels are slow came from analyzing a popular microkernel at the time - mach. It in no way is a true blanket statement for all microkernels.

edit: found a good comparison chart (first link)

https://sigops.org/s/conferences/sosp/2013/talks/elphinstone...

https://sel4.systems/About/seL4-whitepaper.pdf

https://sel4.systems/performance.html

https://dl.acm.org/doi/10.1145/173668.168633

https://en.wikipedia.org/wiki/L4_microkernel_family

gucci-on-fleek

2 months ago

> The idea that microkernels are slow came from analyzing a popular microkernel at the time - mach. It in no way is a true blanket statement for all microkernels.

Don't microkernels inherently require lots of context switches between kernel-space and user-space, which are especially slow in a post-Meltdown/Spectre world? I know that Linux has semi-recently added kTLS and KSMBD to speed up TLS/SMB, and Windows used to implement parts of font rendering and its HTTP server in kernel mode to speed things up too, so this gave me the impression that having more things inside the kernel (== more monolithic) is better for speed. Or is this only the case because of how the Linux/NT kernels are implemented, and doesn't apply to microkernels?

upboundspiral

2 months ago

I am far from an expert on this topic, but I think you are right in pointing out that Spectre/Meltdown have greater impact on microkernel designs.

I do think there is very interesting research in this topic however, on how systems could mitigate some of these attacks in a more foolproof way.

https://microkerneldude.org/2024/04/18/gofetch-will-people-e...

https://trustworthy.systems/publications/abstracts/Wistoff_S...

mustache_kimono

2 months ago

Bryan Cantrill, "Unikernels are unfit for production". [0]

[0]: https://www.tritondatacenter.com/blog/unikernels-are-unfit-f...

bri3d

2 months ago

At a practical level I think a thesis that "good" process isolation systems (aka, not hosted on Linux) build on years of development that unikernels will struggle to replace holds true.

At a conceptual level I really disagree with this piece, though:

> one cannot play up Linux kernel vulnerabilities as a silent menace while simultaneously dismissing hypervisor vulnerabilities as imaginary.

One can reasonably recognize Linux kernel vulnerabilities as extant and pervasive while acknowledging that hypervisors can be vulnerable. One can also realize that the surface area exposed by Linux is fundamentally much larger than that exposed by most hypervisors, and that the Linux `unshare` mechanism is insecure by default. It's kind of funny - the invocation of Linux really undermines this argument; there's no _reason_ a process / container isolation based system should be completely broken, but Linux _is_, and so it becomes a very weak opponent.

I really don't think I can agree with the debugging argument here at a conceptual level, either. Issues with debugging unikernels are caused by poor outside-in tooling, but with good outside-in tooling, a unikernel should be _easier_ to debug than a container or OS process, because the VM-owner / hypervisor will often already have a way to inspect the unikernel-machine's entire state from the outside, without additional struggle of trying to maintain, juggle, and restore multiple contexts within a running system. There is essentially an ISP/ICE debugging probe attached to the entire system end to end by default, in the form of the hypervisor.

For example, there is no reason a hosting hypervisor could not provide DTrace in a way which is completely transparent to the unikernel guest, and this would be much easier to implement than DTrace self-hosted in a running kernel!

If done properly, this way a uni-application basically becomes debugging-agnostic: it doesn't need cooperative tracepoints or self-modifying patches (and all of the state juggling that comes with that, think like Kprobe), because the hypervisor can do the tracing externally. The unikernel does not need to grow (in security surface area, debug-size, blast radius, etc.) to add more trace and debug capability.

cmrdporcupine

2 months ago

Agree with your points and I think fundamentally the issue with unikernels at this point comes down to: nobody has really done it right yet.

By which I mean I see two variants:

1- exotic and interesting and constrained but probably not applicable for people in the form of e.g. MirageOS. not applicable because OCaml just isn't mainstream enough

2- Or other systems which allow much easier porting of existing systems by providing a libc and extended set of "porting" libraries which end up by recreating huge swathes of what the operating system is doing already anyways, in order to make the existing application just cross compile and "feel at home". But in reality probably always in an incomplete or odd way, and now you're using someone's hand crafted set of compatibility libraries instead of a battle tested operating system.

I just think we haven't seen the right system, yet, which would probably be some specific application development mostly from the ground up in the context of unikernel, not the other way around. Potentially a set of constrained and targeted Rust etc crates built from `no_std` up + some services. I kept looking for a MirageOS for Rust and haven't seen one; instead I saw stuff more like 2.

vigilans

2 months ago

There's a man who hasn't tried running qubes-mirage-firewall.

Unikernels don't work for him; there are many of us who are very thankful for them.

mustache_kimono

2 months ago

> there are many of us who are very thankful for them.

Why? Can you explain, in light of the article, and for those of us who may not be familiar with qubes-mirage-firewall, why?

vigilans

2 months ago

In Qubes you use VMs to separate your banking environment from the one where you pull npm dependencies and the one where you open untrusted PDFs.

Networking also happens in its own VM, and you can have multiple VMs dedicated to networking.

Much lower memory footprint running mirage firewall, and an attack surface orders of magnitude smaller (compared to a VM running a Linux distribution purely for networking).

wmf

2 months ago

Toro provides a GDB stub so there has been a little progress since that time.

cmrdporcupine

2 months ago

Cantrill is far smarter and more accomplished than me, but this article feels a bit strawman and hit and run?

I think unikernels potentially have their place, but as he points out, they remain mostly untried, so that's fair. We should analyze why that is.

On performance: I think the kernel makes many general assumptions that some specialized domains may want to short circuit entirely. In particular I am thinking how there's a whole body of research of database buffer pool management basically having elaborate workarounds for kernel virtual memory subsystem page management techniques, and I suspect there's wins there in unikernel world. Same likely goes for inference engines for LLMs.

The Linux kernel is a general purpose utility optimizing for the entire range of "normal things" people do with their Linux machines. It naturally has to make compromises that might impact individual domains.

That and startup times, big world of difference.

Is it going to help people better sling silly old web pages and whatever it is people do with computers conventionally? Yeah, I'd expect not.

On security, I don't think it's unreasonable or pure "security theatre" to go removing an attack surface entirely by simply not having it if you don't need it (no users, no passwords, no filesystem, whatever). I feel like he was a bit dismissive here? That is also the principle behind capability-passing security to some degree.

I would hate to see people close the door on a whole world of potentials based on this kind of summary dismissal. I think people should be encouraged to explore this domain, at least in terms of research.

ironhaven

2 months ago

If your software has no bugs then unikernels are a straight upgrade. If your software has bugs then the blast area for issues is now much larger. When was the last time you needed a kernel debugger for a misbehaving application?

wmf

2 months ago

A kernel debugger isn't magic; it's just a debugger (e.g. GDB). Debugging a VM is similar to debugging a process.

wewtyflakes

2 months ago

Aren't there more places for things to go off the rails? Vibes of https://www.usenix.org/system/files/1311_05-08_mickens.pdf

mustache_kimono

2 months ago

> On performance: ... In particular I am thinking how there's a whole body of research of database buffer pool management

Why? The solution thus far has been to turn off what the kernel does, and, do those things in userspace, not move everything in the kernel? Where are these performance gains to be had?

> The Linux kernel is a general purpose utility optimizing for the entire range of "normal things" people do with their Linux machines.

Yeah, like logging and debugging. Perhaps you say: "Oh we just add that logging and debugging to the blob we run". Well isn't that now another thing that can take down the system, when before it was a separate process?

> That and startup times, big world of difference.

Perhaps in this very narrow instance, this is useful, but what is it useful for? Can't Linux or another OS be optimized for this use case without having to throw the baby out with the bathwater? Can't one snapshot a Firecracker VM and reach even faster startup times?

> On security, I don't think it's unreasonable or pure "security theatre" to go removing an attack surface entirely

Isn't perhaps the most serious problem removing any and all protection domains? Like between apps and the kernel and between the apps themselves?

I mean -- sure maybe remove the filesystem, but isn't no memory protection what makes it a unikernel? And, even then, a filesystem is usually a useful abstraction! When have I found myself wanting less filesystem? Usually, I want more -- like ZFS.

This is all just to say -- you're right -- there may be a use case for such systems, but no one has really adequately described what that actually is, and therefore this feels like systems autoeroticism.

cmrdporcupine

2 months ago

> Why? The solution thus far has been to turn off what the kernel does, and, do those things in userspace, not move everything in the kernel? Where are these performance gains to be had?

There's all sorts of jankin' about trying to squeeze ounces of performance out of the kernel's page management, specifically for buffer pools.

e.g. https://www.cs.cit.tum.de/fileadmin/w00cfj/dis/_my_direct_up...

Page management isn't really a thing we can do well "in user space". And the kernel has strong ideas about how this stuff works, which work very well in the general case. But a DB (or other system level things like garbage collectors, etc) are special cases, often with special needs.

LeanStore, Umbra, etc. do tricks with VMM overcommit and the like to fiddle around with this, and the above paper even proposes custom kernel modules for the purpose (There's a github repo associated, I'd have to go look).
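
For context, the conventional alternative those systems are measured against is a buffer pool managed entirely by the application. A minimal Python sketch, assuming a 4 KiB page size and plain LRU eviction (the `BufferPool` class and its methods are made up for illustration; real engines add pinning, latching and asynchronous I/O):

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class BufferPool:
    """Fixed-capacity page cache with LRU eviction, managed by the application."""

    def __init__(self, backing, capacity):
        self.backing = backing         # bytearray standing in for the database file
        self.capacity = capacity       # maximum number of pages held in memory
        self.frames = OrderedDict()    # page number -> in-memory copy, in LRU order
        self.dirty = set()             # pages modified since they were loaded
        self.misses = 0

    def _evict(self):
        page_no, data = self.frames.popitem(last=False)  # least recently used
        if page_no in self.dirty:
            # Write-back happens when the pool decides, not when the kernel does.
            start = page_no * PAGE_SIZE
            self.backing[start:start + PAGE_SIZE] = data
            self.dirty.discard(page_no)

    def get(self, page_no):
        if page_no in self.frames:
            self.frames.move_to_end(page_no)  # mark as most recently used
            return self.frames[page_no]
        self.misses += 1
        if len(self.frames) >= self.capacity:
            self._evict()
        start = page_no * PAGE_SIZE
        self.frames[page_no] = bytearray(self.backing[start:start + PAGE_SIZE])
        return self.frames[page_no]

    def write(self, page_no, offset, payload):
        page = self.get(page_no)
        page[offset:offset + len(payload)] = payload
        self.dirty.add(page_no)


disk = bytearray(PAGE_SIZE * 4)
pool = BufferPool(disk, capacity=2)
pool.write(0, 0, b"hello")  # page 0 is cached and dirty
pool.get(1)                 # page 1 is cached
pool.get(2)                 # pool is full: page 0 is evicted and written back
print(bytes(disk[:5]), pool.misses)
```

Every access goes through a lookup like `get`, which is the translation overhead that the virtual-memory tricks try to avoid by letting the MMU do that work.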

And then, further, a DB goes and basically implements its own equivalent of a filesystem, managing its own storage. Often fighting with the OS about the semantics of fsync/durability, etc.

I don't think it's an unreasonable mental leap for people to start thinking: "I'm by necessity [cuz cloud] in a VM. Now I'm inside an OS in a VM, and the OS is sometimes getting in my way, and I'm doing things to get around the OS... Why?"

mustache_kimono

2 months ago

> Page management isn't really a thing we can do well "in user space".

But it is the thing most high performance OLTP DBMSs, most of us are aware of, do? I'm also not sure your cite is relevant here. Or it is at least niche. The comparison is made to LeanStore, which AFAICT is not feature complete, and a research prototype?

Your cite does not describe a unikernel use case, but instead in kernel helper modules. Your cite is about leveraging in kernel virtual memory for the DB buffer cache, and thus one wonders how sophisticated the VM subsystem is in most unikernels? That is -- how is this argument for unikernels? Seems as though your cite is making the opposite argument -- for a more complex relationship between the DB and the kernel, not a pared down one.

> And then, further, a DB goes and basically implements its own equivalent of a filesystem, managing its own storage. Often fighting with the OS about the semantics of fsync/durability, etc.

The fights you're describing have thus far been the problems of ceding control of the buffer cache to the kernel, via mmap, especially re: transactional safety.

If your argument is kernels may need a bottom up redesign to make ideas like this work, I suppose that makes sense. However, again, I'm not sure that makes unikernels more of an answer here than anywhere else, though.

> I don't think it's an unreasonable mental leap for people to start thinking: "I'm by necessity [cuz cloud] in a VM. Now I'm inside an OS in a VM, and the OS is sometimes getting in my way, and I'm doing things to get around the OS... Why?"

I think that's a fair thought to have, but the problem is how it actually works in practice. As in, less code seems really enticing, the problem is what abstractions are you throwing away. If the abstraction is less memory protection, maybe this is not a good tradeoff in practice.

torginus

2 months ago

I can see why he would make that argument. When you don't have any process isolation, a software fault means your entire stack is untrustworthy. The network driver, fs driver might be corrupted, so nothing you write to disk or send over the network can be trusted.

You also have to recreate your entire userspace and debugging tools to work in this environment, and testing or even running or debugging your software is also a headache.

keeganpoppen

2 months ago

damn… i am a big fan of bryan and i thought i was a big fan of unikernels… well, i still am, but all the points he makes are absolutely well-founded. i will say, in contraposition to the esteemed (and hilarious) mr. cantrill, that it is quite incredible to get to the end of an article about unikernels without seeing any mention of how the “warmup” time for a unikernel is subsecond whereas the warmup time for, say, containers is… let’s just call it longer than the warmup time for the water i am heating to make some pourover coffee after i finish my silly post. to dismiss this as a profound advantage is to definitely sell the idea more than a little short.

but at the same time i do think it is fair at this juncture to compare the tech to things like wasm, with which unikernels are much more of a direct competitor than containers. it is ironic because i can already hear in my head the hilarious tirade he would unleash about how horrific docker is in every way, debugging especially, but yet somehow this is worse for containers than for unikernels. my view at the present is that unikernels are fantastic for software you trust well enough to compile down to the studs, and the jury is still out on their efficacy beyond that. but holy fuck i swear to god i have spent more time fucking with docker than many professional programmers have spent learning their craft in toto, and i have nothing to show for it. it sucks every time, no matter what. my only gratitude for that experience revolves around (1) saving other peoples’ time on my team (same goes for git, but git is, indisputably, a “good” technology, all things considered, which redeems it entirely), and (2) it motivated me to learn about all the features that linux, systemd, et al. have (chroot jails, namespaces, etc.) in a way that surely exceeds my natural interest level.

ahepp

2 months ago

> the “warmup” time for a unikernel is subsecond whereas the warmup time for, say, containers is… let’s just call it longer than the warmup time for the water i am heating to make some pourover coffee after i finish my silly post. to dismiss this as a profound advantage is to definitely sell the idea more than a little short.

I'm surprised to read that unikernels would start up much faster than containers. It seems like a unikernel needs to do more work (load kernel, and load app), in a more restricted way (hypervisor) than simply loading the app in a cgroup + namespace and letting it rip.

Are you sure this is an apples to apples comparison of similarly optimized images?

nineteen999

2 months ago

> to dismiss this as a profound advantage is to definitely sell the idea more than a little short.

Nah not really what he's saying. He's saying that if you throw out all the security affordances provided by page tables and virtual memory, it outweighs the "profound advantage" (which as he mentions, is arguable anyway since user/kernel context switch is a negligible cost in most modern systems).

You're selling a great deal in order to buy not much. It's a poor tradeoff.

keeganpoppen

a month ago

fair

LarsKrimi

2 months ago

Just to save people from wasting their time reading this drivel:

` If this approach seems fringe, things get much further afield with language-specific unikernels like MirageOS that deeply embed a particular language runtime. On the one hand, allowing implementation only in a type-safe language allows for some of the acute reliability problems of unikernels to be circumvented. On the other hand, hope everything you need is in OCaml! `

ToroKernel is written in Free Pascal.

All of the text before and after is completely irrelevant

agentifysh

2 months ago

thanks i think there is a lot of nitpicking here but im interested to know how I can use Toro and what advantages and disadvantages there are

mvaralar

2 months ago

Toro is just a library OS that allows you to build an application and deploy it as a VM without the OS. Toro acts as the OS. Unlike other unikernels, Toro is not meant to be POSIX compliant. The idea is to provide an API that better suits the use case, i.e., an app deployed as a VM. Toro can also run on bare metal, although I dropped the support a few commits ago. I can roll back that support in case there is interest.

droelf

2 months ago

I've been using unikraft (https://unikraft.org/) unikernels for a while and the startup times are quite impressive (easily sub-second for our Rust application).

ahepp

2 months ago

What drove you to choose that over something like containers?

droelf

2 months ago

Yeah, boot time, isolation (proper VM vs containers), and ease of use on a larger Hetzner box.

ahepp

2 months ago

Did you notice a substantial difference in those factors between more traditional micro VMs that use OCI images (like Firecracker) and unikernels?

m00dy

2 months ago

shorter cold-boot times.

ahepp

2 months ago

If we’re talking about cold boot times, wouldn’t the relevant metric for unikernels be the hypervisor’s boot time?

zozbot234

2 months ago

How would that compare with containers running on Firecracker or other virtio-based μVM's?

wmf

2 months ago

A unikernel on Firecracker is probably going to start faster than a container on Linux on Firecracker.

ahepp

2 months ago

I assume they meant using an OCI image for the rootfs of a firecracker VM, not running a container inside a firecracker VM.

Still difficult to see how the unikernel could be slower, but I doubt the difference would be huge? Don't have anything to back that up though.

ATechGuy

2 months ago

Fast boot up means nothing if your agent/app is slow at runtime (due to virtualization tax or QEMU emulation). Fast boot up is a PR term, which can easily be optimized for compared to designing a better virtualization layer that performs near-bare-metal.

wewtyflakes

2 months ago

Wouldn't faster boot times mean that scale-out can be done on-demand? Whether this is preferable or not over poorer runtime performance is up to the domain, no?

ATechGuy

2 months ago

When scaling out, edge latency will overshadow kernel boot-up times: speeding up boot-up from 1.5s to 150ms will not have any perceived impact on app performance when scaling on edge to meet the demand.

lacoolj

2 months ago

I use LXD + LXC, wondering if this is worth trying or if the overhead of accessing (network, etc) would be too much to deal with/care about.

Also always a little wary of projects that have bad typos or grammar problems in a README - in particular in one of the headings (though it's possible these are on purpose?). But that's just me :\

mvaralar

2 months ago

Feel free to submit a PR; I will happily accept it.

gnabgib

2 months ago

(2020) Currently, seems to have been around since 2011 (https://news.ycombinator.com/item?id=3288786) although at a few different domains (torokernel.org, torokernel.io)

mvaralar

2 months ago

Toro started in 2006 more or less. I lost the .org domain so I had to move to the .io.

cmrdporcupine

2 months ago

It's written in... Pascal...

Neat.

giancarlostoro

2 months ago

My last name is finally on the front page of HN as a project name, look mah!

I was not expecting Pascal, that's an interesting choice. One thing I do like is that Free Pascal has one of the better ways of making GUIs, while every other language has decided that just letting JavaScript build UIs is the way.

lacoolj

2 months ago

Oh holy crap that's actually super cool. One of the first languages I (tried to) learn ... at 13. And failed.

Now I write Javascript and SQL.

AceJohnny2

2 months ago

yay no C strings!

richardwhiuk

2 months ago

I wonder how it compares to https://mirage.io/

speed_spread

2 months ago

Isn't Mirage OCaml only?

cmrdporcupine

2 months ago

And this one is Pascal. Choose your anachronisms.

As much as I'm nostalgic about Pascal and my childhood... I'd personally prefer OCaml.

mvaralar

2 months ago

As long as you provide the bindings, you can compile it with an application built in any language. There are a few examples with C programs in the repo.

raggi

2 months ago

I don't want the observability of my applications to be bound by themselves, it's kind of a real pain. I'm all for microvm images without excess dependencies, but coupling the kernel and diagnostic tools to rapidly developing application code can be a real nightmare as soon as the sun stops shining.

spacecadet404

2 months ago

What's the use case for this rather than containers? Separation from the hypervisor kernel?

Imustaskforhelp

2 months ago

Containers (docker/podman) are still not as secure as virtualization (qemu,kvm,proxmox)

Plus these might be smaller and might run faster than containers too.

throwaway894345

2 months ago

Smaller than containers seems unlikely since a container doesn't have any kernel at all, while these microvms have to reproduce at least the amount of kernel they would otherwise need (e.g., a networking stack). I'm sure some will be inclined to compare an optimized microvm to an application binary slapped into an Ubuntu container image, but that's obviously apples/oranges.

Faster might be possible without the context switching between kernel and app? And maybe additional opportunities for the compiler to optimize the entire thing (e.g., LTO)?

justatdotin

2 months ago

container can be smaller at rest, but larger at runtime

if you're not sure which you want, it's probably a container

m00dy

2 months ago

yeah it's a fairy tale.

ignoramous

2 months ago

> Separation from the hypervisor kernel?

Not really. Separation from (type 1) hypervisor (or rather distrust of the host [0]) requires hardware support; ex: ARM CCA / AMD SEV-SNP / Intel TDX.

For separation from the supervisor, Android developed a peculiar approach in "pKVM" for ARM where the host (supervisor) is partitioned away from the guest [1].

Both those "separations" is not something Toro provides on its own; the Toro unikernel would totally be under the control of the host, from what I can tell. That said, what Toro (or any unikernel, really) does is reduce the attack surface area, as the (guest) supervisor is pruned to run just one particular application (more code to partition things up will eliminate a class of attacks but may result in new attack vectors [2]).

[0] ex: https://news.ycombinator.com/item?id=44678249

[1] Protected KVM on Arm64: A Technical Deep Dive - Quentin Perret, Google https://www.youtube.com/watch?v=9npebeVFbFw (2023)

[2] Mitigations are attack surface, too https://projectzero.google/2020/02/mitigations-are-attack-su... (2020)

mvaralar

2 months ago

> Both those "separations" is not something Toro provides on its own; the Toro unikernel would totally be under the control of the host, from what I can tell. That said, what Toro (or any unikernel, really) does is reduce the attack surface area, as the (guest) supervisor is pruned to run just one particular application (more code to partition things up will eliminate a class of attacks but may result in new attack vectors [2]).

Toro does not provide that separation. However, I have been thinking about running the user app in ring 1 to provide some sort of separation, while the kernel runs in ring 0. In that case, though, we may end up with the same user/kernel separation as current general-purpose OSs.

eru

2 months ago

It can be much faster, with a much smaller attack surface than a full Linux kernel.

ahepp

2 months ago

Presumably to avoid the cost of context switches or copying between kernel/user address spaces? Looks to be the opposite of userspace networking like DPDK: kernel space application programming.

justatdotin

2 months ago

anywhere you want hard isolation and only a subset of OS. especially multiple instances thereof.

so, generally at the edge (gateways, shims, protocol boundaries)

m00dy

2 months ago

it is using qemu's network stack, would like to know how performant it is.

itsthecourier

2 months ago

reminds me of actors, they are sharing messages between kernels with a bus

file sharing is complex too it seems

would be good to see a benchmark or something showing where it shines

Imustaskforhelp

2 months ago

I think one reason unikernels are different is that they can allow more isolation, or perhaps run user-generated code inside the unikernel with proper isolation, whereas I don't think actors can do that.

agentifysh

2 months ago

Great work. One piece of feedback would be to add a "Why Toro?" section to the README.

What use cases would Toro fit? Pros and cons?

mvaralar

2 months ago

I can add that. Thanks!

Alifatisk

2 months ago

So this is like nanos.org?