Yes, it’s the ephemerality that’s the biggest issue. Enterprise-grade SSDs are quite reliable, and typically have PLP (power-loss protection), so even in the event of a sudden power loss, any queued writes that the drive has accepted - and thus ack’d the fsync() for - will be written. Presumably you’d be running some kind of redundancy, likely some flavor of RAID or RAID-Z (assuming purely local storage here, not a distributed system like Ceph, nor synchronous replication).
But in the cloud, if the physical server backing your instance dies, or even if someone accidentally issues a shutdown command, you don’t get that same drive back when the new instance comes up. So a problem that is normally solved by basic local redundancy suddenly becomes impossible to solve that way, and thus you must either run synchronous or semi-sync replication (the latter is what PlanetScale Metal does), accept the latency hit from distributed storage, or run asynchronous replication and accept some amount of data loss, which is rarely acceptable.
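To make the replication trade-off concrete, here is a rough toy model, not how any real database implements it. It assumes a commit is acknowledged to the client only after a fixed number of replicas have the record: zero for async, one for semi-sync, all of them for sync. The names `POLICIES`, `Cluster`, and `lost_after_failover` are made up for illustration, and the async case shows the worst case where nothing was shipped before the primary died.

```python
from dataclasses import dataclass, field

# Hypothetical mapping: how many replica acks a commit waits for.
# None means "wait for every replica".
POLICIES = {"async": 0, "semi-sync": 1, "sync": None}


@dataclass
class Cluster:
    policy: str
    n_replicas: int = 2
    primary: list = field(default_factory=list)
    replicas: list = field(default_factory=list)
    acked: list = field(default_factory=list)

    def __post_init__(self):
        # Each replica keeps its own copy of the log.
        self.replicas = [[] for _ in range(self.n_replicas)]

    def commit(self, record):
        """Write locally, wait for the required replica acks, then ack the client."""
        self.primary.append(record)
        need = POLICIES[self.policy]
        if need is None:
            need = self.n_replicas
        # Only the replicas we wait for are guaranteed to have the record;
        # the rest are modeled as lagging (worst case: they never get it).
        for replica in self.replicas[:need]:
            replica.append(record)
        self.acked.append(record)

    def lose_primary(self):
        """The primary and its local disk vanish; promote the most advanced replica."""
        survivor = max(self.replicas, key=len)
        # Writes the client was told were committed but that no survivor has.
        return [r for r in self.acked if r not in survivor]


def lost_after_failover(policy, writes=("w1", "w2", "w3")):
    cluster = Cluster(policy)
    for w in writes:
        cluster.commit(w)
    return cluster.lose_primary()
```

In this model, `lost_after_failover("async")` returns every acknowledged write, while `"semi-sync"` and `"sync"` return an empty list. The price is that each commit waits on at least one network round trip.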
Agreed on these trade-offs. We use synchronous or semi-synchronous replication, depending on whether it's Postgres or MySQL.
... sounds like a trivial job for bare metal instances
And the fact that EC2's local NVMe encryption keys are ephemeral is nice against leaks, but it's not a necessity for other clouds (and not great for resumability, which can really downgrade business continuity scores). For all the money they ask for it, I expect them to be able to keep it relatively secure even across reboots.
Or even a simple bare metal server that just does databases, with redundant NVMe SSDs.
Databases like Postgres have well-established ways to handle that. And if you're setting up the DB yourself, you absolutely need to do backups anyway. And a replica on a different server.
Backups don't alleviate durability concerns. Neither do (async) read replicas.
I think the only way it could work is if I implemented sync replication like PlanetScale, but that's arduous.
On some providers (e.g. Hetzner), the dedicated servers come by default with two disks in RAID 1, so it's a lot less likely to fail (unless the datacenter burns down).
You have a call from France, some company called OVH on the line!
And your backup goes up in flames too.
I would never ever trust OVH with any important data or servers. I mean, we saw how they secured their datacenters: it took 3h to cut the power while the datacenter was burning.
Yes, a single disk in a VPS or cloud provider has durability concerns. That's why EBS and products like it that pretend to be a single disk are actually several. Instead of relying on multiple block devices, though, we create that redundancy at a higher level by relying on multiple MySQL or Postgres servers for durability, each with a local NVMe drive for performance.
Sure. To an extent. And if you run some mission-critical application, definitely.
But most applications run fine from local storage and can tolerate some downtime. They might even benefit from the improved performance. You can also fix the durability and disaster recovery concerns by setting up on RAID/ZFS and maintaining proper backups.
Yeah, PlanetScale loves to flex how fast they are, but the main reason they are fast is that they run with one full layer of abstraction fewer than any other cloud provider, and this does in fact have trade-offs.
What is wrong with running without lots of abstractions? We are clear about the downsides. The results are clear: you can see the customers love it. We run insane amounts of state safely on ephemeral compute. It's a flex. All I've seen from Timescale people is qqing. Write some code or be quiet.
I'm not criticizing your engineering approach at all. Running everything in one box has its merits, as your benchmarks show, but it is also just not apples to apples. There are other trade-offs, and I am just appreciating that the community calls that out.
Also, hey, this is HN, not Twitter. I think we can be a bit more civilized. Not a good look imo for a CEO to get that upset over a harmless comment.
We run 3 nodes, not 1. Your comment is not in isolation: we get constant shade from Timescale people when we don't even think about you.
Using a single disk has durability concerns. But I don't see why VPS vs dedicated server should matter much.
RAID isn't the answer, either, for the record. In AWS and GCP, the CPU or RAM blowing up will cost you access to that local NVMe drive, too, no matter how much RAID you throw at it.
We have mitigated the durability concerns in multiple ways.