Show HN: Cachey, a Read-Through Cache for S3

1 point, posted 9 hours ago
by shikhar

2 Comments

shikhar

9 hours ago

How we run it:

Auto-scaled Kubernetes deployments, one per availability zone, currently on m*gd instances, which give us local NVMe. The pods can easily push multiple GiB/s with only 1-2 CPUs in use; network is the bottleneck, so we made it a scaling dimension (thanks, KEDA).

On the client side, each gateway process uses kube.rs to watch ready endpoints in its own zone, and frequently polls the /stats endpoint exposed by Cachey for recent network throughput as a load signal.
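
For illustration, a minimal Rust sketch of that client loop, assuming kube.rs's EndpointSlice watcher; the /stats JSON field name, the ZONE env var, and the app=cachey label are made-up placeholders, not Cachey's actual schema:

    use futures::TryStreamExt;
    use k8s_openapi::api::discovery::v1::EndpointSlice;
    use kube::runtime::{watcher, WatchStreamExt};
    use kube::{Api, Client};

    // Hypothetical shape of the /stats response; the real field names
    // may differ.
    #[derive(serde::Deserialize)]
    struct Stats {
        recent_tx_bytes_per_sec: f64,
    }

    // Poll a pod's /stats endpoint for its recent-throughput load signal.
    async fn load_signal(addr: &str) -> anyhow::Result<f64> {
        let stats: Stats = reqwest::get(format!("http://{addr}/stats"))
            .await?
            .json()
            .await?;
        Ok(stats.recent_tx_bytes_per_sec)
    }

    #[tokio::main]
    async fn main() -> anyhow::Result<()> {
        // Our own zone, e.g. injected via the downward API (assumption).
        let my_zone = std::env::var("ZONE")?;
        let client = Client::try_default().await?;
        let slices: Api<EndpointSlice> = Api::default_namespaced(client);
        let cfg = watcher::Config::default().labels("app=cachey"); // assumed label

        watcher(slices, cfg)
            .applied_objects()
            .try_for_each(|slice| {
                let my_zone = my_zone.clone();
                async move {
                    // Keep only ready endpoints in our own zone.
                    let ready: Vec<String> = slice
                        .endpoints
                        .iter()
                        .filter(|ep| {
                            ep.conditions.as_ref().and_then(|c| c.ready) == Some(true)
                                && ep.zone.as_deref() == Some(my_zone.as_str())
                        })
                        .flat_map(|ep| ep.addresses.clone())
                        .collect();
                    // ...feed `ready` (plus load_signal polls) into node selection...
                    println!("ready endpoints in {my_zone}: {ready:?}");
                    Ok(())
                }
            })
            .await?;
        Ok(())
    }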

To improve hit rates via key affinity, clients use rendezvous hashing to pick a node, with bounded load (https://arxiv.org/abs/1608.01350): if a node exceeds a predetermined throughput limit, the next-ranked choice for the key is picked instead.
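
A sketch of that selection logic (not Cachey's actual client code; the hasher and load accounting are stand-ins):

    use std::cmp::Reverse;
    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};

    // Score a (node, key) pair; the highest score wins. A real client
    // should use a hash that is stable across processes and versions
    // (e.g. a seeded xxhash); DefaultHasher is illustration only.
    fn score(node: &str, key: &str) -> u64 {
        let mut h = DefaultHasher::new();
        (node, key).hash(&mut h);
        h.finish()
    }

    // Rendezvous hashing with bounded load: rank all nodes by score for
    // this key, then take the best-ranked node whose recent throughput
    // (as reported by /stats) is still under the limit.
    fn pick_node<'a>(
        nodes: &'a [String],
        key: &str,
        load_of: impl Fn(&str) -> f64,
        throughput_limit: f64,
    ) -> Option<&'a str> {
        let mut ranked: Vec<&str> = nodes.iter().map(String::as_str).collect();
        ranked.sort_by_key(|n| Reverse(score(n, key)));
        ranked.into_iter().find(|n| load_of(n) < throughput_limit)
    }

So a given key normally lands on the same pod, which is what drives the hit rate, but a hot key spills over to the next-best pod rather than overloading its first choice.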

We may move towards consistent hashing – it would be a great problem to have if we needed so many Cachey pods in a zone that O(n) hashing per key was meaningful overhead! An advantage of the current approach is that it does not suffer from the cascaded overflow problem (https://arxiv.org/abs/1908.08762).

whyandgrowth

9 hours ago

To be honest: for use as a local cache / S3 accelerator for large files, it's fine. The API is simple but flexible. The only caveat is that the documentation is English-only, and you need to understand how "hedged fetches" work.