shikhar
9 hours ago
How we run it:
Auto-scaled Kubernetes deployments, one per availability zone, currently on m*gd instances which give us local NVMe. The pods can easily push GiBps while using only 1-2 CPUs; network is the bottleneck, so we made it a scaling dimension (thanks, KEDA).
On the client side, each gateway process uses kube.rs to watch ready endpoints in its own zone, and frequently polls the /stats endpoint exposed by Cachey for recent network throughput, which serves as the load signal.
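For illustration, here's a minimal sketch of that discovery-plus-load-sampling step, assuming a kube.rs client. It lists EndpointSlices on each cycle rather than maintaining a watch, and the `app=cachey` label, the port, and the `net_bytes_per_sec` field in the /stats response are assumptions, not Cachey's actual interface:

```rust
// Minimal sketch: list ready Cachey endpoints in our zone and sample
// each one's /stats for a throughput-based load signal.
// The label selector, port, and `net_bytes_per_sec` field are assumptions.
use k8s_openapi::api::discovery::v1::EndpointSlice;
use kube::{api::ListParams, Api, Client};
use std::collections::HashMap;

const MY_ZONE: &str = "us-east-1a"; // in practice, discovered from pod/node metadata

#[derive(serde::Deserialize)]
struct Stats {
    net_bytes_per_sec: f64, // hypothetical field name in the /stats response
}

async fn sample_loads(client: Client) -> anyhow::Result<HashMap<String, f64>> {
    let slices: Api<EndpointSlice> = Api::default_namespaced(client);
    let lp = ListParams::default().labels("app=cachey"); // assumed label
    let mut loads = HashMap::new();

    for slice in slices.list(&lp).await?.items {
        for ep in &slice.endpoints {
            // Keep only endpoints that are ready and in the same zone as us.
            let ready = ep.conditions.as_ref().and_then(|c| c.ready).unwrap_or(false);
            if !ready || ep.zone.as_deref() != Some(MY_ZONE) {
                continue;
            }
            for addr in &ep.addresses {
                let url = format!("http://{addr}:9100/stats"); // port is a placeholder
                let stats: Stats = reqwest::get(&url).await?.json().await?;
                loads.insert(addr.clone(), stats.net_bytes_per_sec);
            }
        }
    }
    Ok(loads)
}
```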
To improve hit rates through key affinity, clients use rendezvous hashing with bounded load (https://arxiv.org/abs/1608.01350) to pick a node: if a node exceeds a predetermined throughput limit, the next-ranked node for the key is chosen instead.
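A rough sketch of that selection rule, with illustrative names and a placeholder hash (a real implementation would use a stable, explicitly chosen hash function); the load map would come from the /stats sampling above, and the limit from the instance's network baseline:

```rust
// Sketch of rendezvous (highest-random-weight) hashing with bounded load:
// rank nodes by hash(key, node) and take the highest-ranked node whose
// recent throughput is below the limit, falling back to the top choice
// if everything is over the limit. Names are illustrative.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn score(key: &str, node: &str) -> u64 {
    // A real implementation would use a stable, explicitly seeded hash.
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    node.hash(&mut h);
    h.finish()
}

/// Pick a node for `key`, skipping nodes whose recent throughput
/// (e.g. sampled from /stats) exceeds `limit`.
fn pick_node<'a>(key: &str, loads: &'a HashMap<String, f64>, limit: f64) -> Option<&'a str> {
    let mut ranked: Vec<&str> = loads.keys().map(String::as_str).collect();
    ranked.sort_by_key(|node| std::cmp::Reverse(score(key, node)));

    ranked
        .iter()
        .copied()
        .find(|&node| loads[node] < limit)
        .or_else(|| ranked.first().copied())
}
```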
We may move towards consistent hashing – it would be a great problem to have if we needed so many Cachey pods in a zone that O(n) hashing became meaningful overhead! An advantage of the current approach is that it does not suffer from the cascaded overflow problem (https://arxiv.org/abs/1908.08762).