What are your worst on-call stories?

11 points, posted 10 hours ago
by holtonbeeps

Item id: 41602252

2 Comments

ezekg

9 hours ago

I wrote about my worst on-call experience here [0]. Back in Feb of this year, I had a unique customer workload take down my SaaS in the middle of the night, 2 nights in a row. It took a long time to find the root cause, but it ended up being an inefficient uniqueness lib I was using for background jobs. This particular customer's workload was queuing up millions of background jobs in Redis at a certain time every day, but each time a job was queued, the entire job set was iterated (synchronously) to assert uniqueness. Obviously, this didn't scale.

I'd rank these 2 nights as the most stressful times of my career. I wrestled with sleeplessness, hopelessness, imposter syndrome, etc.

[0]: https://keygen.sh/blog/that-one-time-keygen-went-down-for-5-...
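To illustrate the scaling problem described above, here is a minimal in-memory sketch in Python. It is not the actual library or the Redis setup from the post; `ScanQueue` and `IndexedQueue` are hypothetical names made up for this example. The first class checks uniqueness by walking every queued job on each enqueue, so queuing n jobs costs on the order of n² comparisons. The second keeps a side index of job keys, so each check is constant time on average.

```python
from collections import deque


class ScanQueue:
    """Hypothetical queue: uniqueness by scanning every queued job, O(n) per enqueue."""

    def __init__(self):
        self.jobs = deque()

    def enqueue(self, job_key):
        for queued in self.jobs:  # walks the whole queue on every push
            if queued == job_key:
                return False  # duplicate, rejected
        self.jobs.append(job_key)
        return True


class IndexedQueue:
    """Hypothetical queue: uniqueness via a side index of keys, O(1) average per enqueue."""

    def __init__(self):
        self.jobs = deque()
        self.keys = set()

    def enqueue(self, job_key):
        if job_key in self.keys:  # hash lookup instead of a full scan
            return False
        self.keys.add(job_key)
        self.jobs.append(job_key)
        return True

    def dequeue(self):
        job_key = self.jobs.popleft()
        self.keys.discard(job_key)  # allow the same job to be queued again later
        return job_key
```

With millions of jobs arriving in a burst, the scanning version does more work on every push as the queue grows, which matches the "didn't scale" symptom; the indexed version trades a bit of extra memory for a flat per-job cost.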

aristofun

9 hours ago

After half a year in production, some malicious user finally hit my code with an injection that crumbled the high-throughput system.

This was surprising for 2 reasons:

1. That nobody came up with the injection for so long.

2. That the injection was pretty silly and non-obvious (kinda explains the first point lol). Not your typical SQL/JS/CSS etc.; it had something to do with localization and the cumbersome way it was implemented in some Java libraries.