kgeist
13 hours ago
From what I understand, the complexity stays there; it's just moved from the DB layer to the app layer (now I have to decide how to shard data, how to reshard, how to synchronize data across shards, how to run queries across shards without wildly inconsistent results), so as a developer I have more headaches now than before, when most of that was taken care of by the DB. I don't see why it's an improvement.
The author also mentions B2B and I'm not sure how it's going to work. I understand B2C where you can just say "1 user=1 single-threaded shard" because most user data is isolated/independent from other users. But with B2B, we have accounts ranging from 100 users per organization to 200k users per organization. Something tells me making a 200k account single-threaded isn't a good idea. On the other hand, artificially sharding inside an organization will lead to much more complex queries overall too, because usually a lot of business rules require joining different users' data within 1 org.
n2d4
12 hours ago
It's a different kind of complexity. Essentially, your app layer needs to shift from:
- transaction serializability
- atomicity
- deadlocks (generally locks)
- occ (unless you do VERY long tx, like a user checkout flow)
- retries
- scale, infrastructure, parameter tuning
towards thinking about:
- separating data into shards
- sharding keys
- cross-shard transactions
which can be sometimes easier, sometimes harder. I think there is a surprising number of problems where it's much easier to think about sharding than about race conditions!
> But with B2B, we have accounts ranging from 100 users per organization to 200k users per organization.
You'd be surprised at how much traffic a single core (or machine) can handle; 200k users is absolutely within reach. At some point you'll need even more granular sharding (e.g. per user within an organization), but at that point you would need sharding anyway (no matter your DB).
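To make "sharding keys" concrete, here is a minimal sketch of routing by organization, assuming a fixed shard count and string org ids. `NUM_SHARDS`, `shard_for`, and `route` are hypothetical names for illustration, not anything from the article, and a fixed modulo like this does not address resharding.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical, fixed shard count for this sketch


def shard_for(org_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key (here: the org id) to a stable shard index."""
    # Use a cryptographic digest rather than hash(), which is salted per process.
    digest = hashlib.sha256(org_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(org_id: str, user_id: str) -> str:
    """Return the (hypothetical) shard name for a request."""
    # user_id is deliberately ignored: every user of an org lands on the
    # same shard, so joins across users within one org stay local.
    return f"shard-{shard_for(org_id)}"
```

Sharding more finely later (say, per user within an org) would mean changing the key passed to `shard_for`, which is exactly the point where queries across users stop being local.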
bawolff
11 hours ago
If you have to think about cross-shard transactions, then you have to think about all the things on your first list too, as they are complexities related to transactions. I fail to see how that could possibly be simpler.
n2d4
10 hours ago
Cross-shard transactions are only a tiny fraction of transactions. If the complexity of dealing with them is confined to some transactions instead of all of them, you're saving yourself a lot of headaches.
Actually, I'd argue a lot of apps can do entirely without cross-shard transactions! (e.g. sharding by B2B orgs)
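As a rough illustration of the "shard by org" case, the sketch below keeps one SQLite database per organization, so a transaction that only touches one org is an ordinary local transaction. The `OrgShards` class, the `balances` table, and `transfer_credits` are all made up for this example, and in-memory databases stand in for real shards.

```python
import sqlite3


class OrgShards:
    """One SQLite database per organization (in-memory for this sketch)."""

    def __init__(self):
        self._dbs = {}

    def db(self, org_id: str) -> sqlite3.Connection:
        if org_id not in self._dbs:
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE balances ("
                "user_id TEXT PRIMARY KEY, credits INTEGER NOT NULL)"
            )
            conn.commit()
            self._dbs[org_id] = conn
        return self._dbs[org_id]


def transfer_credits(shards, org_id, src, dst, amount):
    """Move credits between two users of the same org: a single-shard transaction."""
    conn = shards.db(org_id)
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT credits FROM balances WHERE user_id = ?", (src,)
        ).fetchone()
        if row is None or row[0] < amount:
            raise ValueError("insufficient credits")
        conn.execute(
            "UPDATE balances SET credits = credits - ? WHERE user_id = ?",
            (amount, src),
        )
        cur = conn.execute(
            "UPDATE balances SET credits = credits + ? WHERE user_id = ?",
            (amount, dst),
        )
        if cur.rowcount != 1:
            raise KeyError(dst)  # unknown recipient: the debit above is rolled back
```

Only an operation spanning two orgs would need anything beyond this, which is the "tiny fraction" being argued about.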
whizzter
12 hours ago
Yeah, mgmt (and more than anything, query tools) is gonna be a PITA.
But look at it a different way: say you're building something like Google Sheets.
One could place user management in a single-threaded database (even at 200k users you probably don't have too many administrators modifying it concurrently), whilst "documents" each get their own database. I'm prototyping one such "document"-centric tool and the per-document DB idea has come up; debugging a user's problems could be as "simple" as cloning a SQLite file.
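A minimal sketch of that per-document idea, assuming one SQLite file per document and a trusted `doc_id` (the `cells` schema, `open_document`, and `clone_document` are hypothetical). The clone uses Python's `sqlite3.Connection.backup`, which copies a consistent snapshot even while the source connection is open.

```python
import sqlite3
from pathlib import Path


def open_document(data_dir: Path, doc_id: str) -> sqlite3.Connection:
    """Open (or create) the database file that holds a single document."""
    # Assumes doc_id is a trusted identifier that is safe to use in a filename.
    conn = sqlite3.connect(data_dir / f"{doc_id}.sqlite")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cells ("
        "row_idx INTEGER, col_idx INTEGER, value TEXT, "
        "PRIMARY KEY (row_idx, col_idx))"
    )
    conn.commit()
    return conn


def clone_document(conn: sqlite3.Connection, dest: Path) -> None:
    """Copy a document's whole database to dest, e.g. to debug it offline."""
    target = sqlite3.connect(dest)
    try:
        conn.backup(target)  # SQLite online backup: a consistent snapshot
    finally:
        target.close()
```

The copy is a normal SQLite file, so it can be opened with any SQLite tool without touching the live document.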
Now, on the other hand, if it's some ERP/CRM/etc. system with tons of linked data, that naturally won't fly.
Tool for the job.