blundergoat

10 hours ago

We treat webhooks as at-least-once delivery over an unreliable transport and design for duplicates and out-of-order events.

A few rules that have saved us:

- Persist before responding. Never process inline. Write payload to DB, return 200 fast.

- Idempotency key required. Use either the provider's event ID or a hash of the payload.

- Async worker processes from queue. Exponential backoff + max attempts.

- Dead letter queue + dashboard. Humans need visibility.

- Alert on backlog growth, not single failures. One failure is noise. A growing retry queue is signal.

- Relying on provider retries alone has bitten us more than once.
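For illustration, here is a minimal sketch of the persist-first, idempotent, retry-with-backoff flow described in the rules above. It is not the poster's actual code: the SQLite table, the function names (`receive`, `process_due`), and the constants `MAX_ATTEMPTS` and `BASE_DELAY_SECONDS` are all assumptions made for the example, and a row with status `dead` stands in for a real dead letter queue.

```python
import hashlib
import json
import sqlite3

MAX_ATTEMPTS = 5          # assumed cap on attempts before dead-lettering
BASE_DELAY_SECONDS = 2.0  # assumed base delay for exponential backoff


def open_store(path=":memory:"):
    """Create the (hypothetical) events table that doubles as the work queue."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS events (
               idempotency_key TEXT PRIMARY KEY,
               payload TEXT NOT NULL,
               status TEXT NOT NULL DEFAULT 'pending',
               attempts INTEGER NOT NULL DEFAULT 0,
               next_attempt_at REAL NOT NULL DEFAULT 0
           )"""
    )
    return db


def idempotency_key(payload: bytes, event_id=None) -> str:
    """Prefer the provider's event ID; otherwise hash the raw payload."""
    return event_id if event_id else hashlib.sha256(payload).hexdigest()


def receive(db, payload: bytes, event_id=None) -> int:
    """Persist the payload, then acknowledge. No processing happens inline."""
    key = idempotency_key(payload, event_id)
    with db:  # commits on success
        # The primary key makes a duplicate delivery a no-op.
        db.execute(
            "INSERT OR IGNORE INTO events (idempotency_key, payload) VALUES (?, ?)",
            (key, payload.decode("utf-8")),
        )
    return 200


def backoff_delay(attempts: int) -> float:
    """Delay before the next try: 2s, 4s, 8s, ... with the assumed base."""
    return BASE_DELAY_SECONDS * (2 ** (attempts - 1))


def process_due(db, handler, now: float) -> None:
    """Worker pass: run every pending event whose retry time has arrived."""
    rows = db.execute(
        "SELECT idempotency_key, payload, attempts FROM events "
        "WHERE status = 'pending' AND next_attempt_at <= ?",
        (now,),
    ).fetchall()
    for key, payload, attempts in rows:
        attempts += 1
        try:
            handler(json.loads(payload))
            status, next_at = "done", 0.0
        except Exception:
            if attempts >= MAX_ATTEMPTS:
                status, next_at = "dead", 0.0  # parked for a human to inspect
            else:
                status, next_at = "pending", now + backoff_delay(attempts)
        with db:
            db.execute(
                "UPDATE events SET status = ?, attempts = ?, next_attempt_at = ? "
                "WHERE idempotency_key = ?",
                (status, attempts, next_at, key),
            )
```

In a real service `receive` would sit behind the HTTP endpoint and `process_due` would run in a separate worker loop with `now=time.time()`. Adding random jitter to the delay is a common refinement that this sketch leaves out.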

Thank you so much for the tips! I was feeling nervous about relying on provider retries as well. I especially like the idea of alerting on backlog growth. There's nothing I hate more than a bunch of emails and notifications!
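One way the "alert on backlog growth, not single failures" idea could look in code is sketched below. The class name `BacklogGrowthAlert` and the parameters `window` and `min_growth` are hypothetical, and the rule itself (depth must rise on every consecutive sample and by a minimum total) is just one possible definition of "growing".

```python
from collections import deque


class BacklogGrowthAlert:
    """Signals only when retry-queue depth keeps rising across a window of samples."""

    def __init__(self, window: int = 5, min_growth: int = 10):
        # Keep window + 1 samples so there are `window` consecutive comparisons.
        self.samples = deque(maxlen=window + 1)
        self.min_growth = min_growth

    def observe(self, depth: int) -> bool:
        """Record the current queue depth; return True if an alert should fire."""
        self.samples.append(depth)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        s = list(self.samples)
        rising = all(later > earlier for earlier, later in zip(s, s[1:]))
        return rising and (s[-1] - s[0]) >= self.min_growth
```

A scheduler would call `observe` with the retry-queue depth at a fixed interval, for example once a minute. A single failed delivery that is retried successfully never produces a sustained rise, so it stays quiet.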

chickensong

8 hours ago

This was a nice goat exchange

JacobArthurs

9 hours ago

We receive the webhook, return 200 immediately, and push the payload to a message queue for processing. That way you own the retry logic, can inspect stuck messages, and DLQ alerts handle repeated failures automatically.

Idempotency becomes your responsibility, though, since messages can be delivered more than once.
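As a rough sketch of what owning idempotency on the consumer side can mean, the example below skips any message ID it has already handled and moves a message to a dead letter list after repeated failures. The in-memory `queue.Queue`, the `processed_ids` set, the `(msg_id, body, attempts)` tuple layout, and `max_attempts` are all assumptions for illustration. A real broker and a durable store with a unique constraint would replace them.

```python
import queue


def drain(messages, handler, processed_ids, dead_letters, max_attempts=3):
    """Consume (msg_id, body, attempts) tuples from an at-least-once queue."""
    while not messages.empty():
        msg_id, body, attempts = messages.get()
        if msg_id in processed_ids:
            continue  # duplicate delivery: already handled, drop it
        try:
            handler(body)
            processed_ids.add(msg_id)  # record only after the handler succeeds
        except Exception:
            if attempts + 1 >= max_attempts:
                dead_letters.append((msg_id, body))  # park it for inspection
            else:
                messages.put((msg_id, body, attempts + 1))  # requeue for retry
```

Recording the ID only after the handler succeeds means a crash between the two steps can still cause a repeat, so the handler's own side effects should be safe to apply twice.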

toomuchtodo

10 hours ago

Have you checked out https://svix.com? No affiliation, I just like the product. Might also check out https://www.standardwebhooks.com/

GoatPerfect

9 hours ago

I just checked them out! Looks like it would make handling failures a breeze!

Ask HN: How do you monitor and retry failed webhooks in production?

6 Comments
