Webhooks Deep Dive
Webhook delivery internals — retry policy, signature verification edge cases, auto-disable behavior, and scaling guidance. For the basic setup and signature-verification recipe, see Webhooks (guide).
Delivery flow#
When ScaiSend needs to emit an event:
- Event is created. The SMTP service (delivered/bounce/deferred) or the Worker (processed/dropped) or the tracking API (open/click/unsubscribe) writes a webhook_deliveries row for each subscribed endpoint.
- Delivery queue picks it up. An arq worker consumes the delivery queue and makes the HTTP request.
- Response is evaluated. 2xx within 30 seconds is a success. Anything else fails.
- Retries scheduled. Failed deliveries schedule a retry with exponential backoff. ScaiSend persists the delivery row; retries survive service restarts.
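The success rule in the flow above can be sketched as a small predicate. This is a minimal illustration of the documented rule (2xx within 30 seconds), not ScaiSend's actual code; the function name and the use of None for "no response" are assumptions, and treating exactly 30 seconds as a success is a guess at the boundary.

```python
from typing import Optional

SUCCESS_TIMEOUT_SECONDS = 30  # from the delivery flow above


def is_successful_delivery(status_code: Optional[int], elapsed_seconds: float) -> bool:
    """Return True only for a 2xx response received within the timeout.

    status_code is None when no HTTP response arrived at all
    (connection refused, TLS failure, timeout).
    """
    if status_code is None:
        return False
    if elapsed_seconds > SUCCESS_TIMEOUT_SECONDS:
        return False
    return 200 <= status_code < 300
```

Under this rule a 302 redirect, a 404, and a 200 that takes 31 seconds all count as failures and schedule a retry.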
Retry schedule#
| Attempt | Delay after previous |
|---|---|
| 2 | 60 seconds |
| 3 | 300 seconds (5 min) |
| 4 | 900 seconds (15 min) |
| 5 | 3600 seconds (1 hour) |
| 6 | 7200 seconds (2 hours) |
Total elapsed time from first send to final retry: about 3 hours 20 minutes (12,060 seconds of scheduled delay, plus the time the requests themselves take). After attempt 6, delivery is marked FAILED and not retried further.
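To see when each retry lands relative to the first attempt, the delays in the table can be summed cumulatively. The snippet below only does that arithmetic; it ignores the time each request takes (up to 30 seconds per attempt).

```python
from itertools import accumulate

# Delay before attempts 2..6, in seconds, copied from the table above.
RETRY_DELAYS = [60, 300, 900, 3600, 7200]

# Offset of each retry from the first attempt, ignoring request time.
offsets = list(accumulate(RETRY_DELAYS))

for attempt, offset in enumerate(offsets, start=2):
    print(f"attempt {attempt}: +{offset} s ({offset / 3600:.2f} h)")
```

The last offset is 12,060 seconds, so an endpoint that is down for longer than roughly 3 hours 20 minutes will miss events sent at the start of the outage.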
Individual deliveries don't disable the endpoint; the endpoint's failure_count tracks consecutive failures across deliveries. Once it hits 10, the endpoint is auto-disabled.
Re-enable the endpoint explicitly after fixing the underlying problem.
At-least-once semantics#
ScaiSend guarantees at-least-once delivery. In practice that means:
- A single event might be delivered multiple times. If your handler returns 2xx but the response is lost in transit (connection reset after the TCP ack but before the HTTP response), ScaiSend retries. You'll see the same event_id again.
- Your handler must be idempotent. Use event_id as the dedupe key. A simple Redis SETNX with a 7-day TTL works for most volumes.
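A minimal sketch of the dedupe step, assuming a client whose set method accepts nx and ex the way redis-py's Redis.set does (it returns True when the key was set and None when it already existed). The key prefix, the function names, and the InMemoryStore stand-in are hypothetical; the stand-in exists only so the example runs without a Redis server.

```python
import time
from typing import Callable, Optional

DEDUPE_TTL_SECONDS = 7 * 24 * 3600  # 7 days, as suggested above


class InMemoryStore:
    """Stand-in that mimics the set(..., nx=True, ex=...) shape of redis-py."""

    def __init__(self) -> None:
        self._expiry = {}  # key -> monotonic time at which the key expires

    def set(self, name: str, value: str, nx: bool = False, ex: Optional[int] = None):
        now = time.monotonic()
        expires_at = self._expiry.get(name)
        if nx and expires_at is not None and expires_at > now:
            return None  # key still present, so nothing is written
        self._expiry[name] = (now + ex) if ex is not None else float("inf")
        return True


def is_first_delivery(store, event_id: str) -> bool:
    """Atomically claim event_id; True means it has not been seen before."""
    return bool(store.set(f"webhook:seen:{event_id}", "1", nx=True, ex=DEDUPE_TTL_SECONDS))


def handle_event(store, event: dict, process: Callable[[dict], None]) -> str:
    if not is_first_delivery(store, event["event_id"]):
        return "duplicate"  # still answer 2xx so the retry stops
    process(event)
    return "processed"
```

With a real Redis, pass a redis.Redis client as store. Note one trade-off of claiming the key before processing: if process raises, a later retry of the same event is treated as a duplicate, so either delete the key on failure or claim it only after processing succeeds.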
Ordering#
No guarantees. Events for the same message can arrive out of order. Specifically:
- A delivered can arrive before a processed if the processed delivery is retrying and the delivered delivery succeeds on first try.
- open events can arrive days after delivered. That's normal (the recipient got the mail days ago, opened now).
- After a retry, late arrivals are common.
Don't rely on arrival order. Use the timestamp field on the payload to reconstruct the actual sequence if you care. For most use cases, you don't — you care about the latest status, which you can query from /v3/messages/{id}.
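Reconstructing the sequence is a sort on the payload timestamp. The sketch below assumes the timestamp is a numeric Unix time and that the event type lives under an "event" key; both are assumptions about the payload shape, and the sample values are made up.

```python
def order_events(events):
    """Return events sorted by payload timestamp, oldest first."""
    return sorted(events, key=lambda e: e["timestamp"])


# Arrival order differs from the order in which things actually happened.
arrived = [
    {"event": "delivered", "timestamp": 1700000005},
    {"event": "open", "timestamp": 1700090000},
    {"event": "processed", "timestamp": 1700000001},
]

timeline = [e["event"] for e in order_events(arrived)]
print(timeline)
```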
Signature verification edge cases#
Clock skew#
The X-ScaiSend-Timestamp header is the Unix timestamp when ScaiSend computed the signature. Compare against your server's wall clock, allowing ~5 minutes of skew.
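The skew check can be sketched as follows, assuming the header carries integer seconds. The function name and the now parameter (there to make the check testable) are illustrative.

```python
import time
from typing import Optional

MAX_SKEW_SECONDS = 5 * 60


def timestamp_is_fresh(header_value: str, now: Optional[float] = None) -> bool:
    """Accept the request only if its timestamp is within the skew window."""
    try:
        sent_at = int(header_value)
    except (TypeError, ValueError):
        return False  # missing or malformed header
    current = time.time() if now is None else now
    # abs() rejects timestamps too far in the future as well as stale ones.
    return abs(current - sent_at) <= MAX_SKEW_SECONDS
```

Run this check before comparing signatures and reject the request if it fails.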
NTP-synchronized servers typically skew under a second. Five minutes is generous; use a tighter window if your clocks are reliably synchronized.
Replay attacks#
The timestamp rejection is what prevents replay. A captured signed request is a valid HMAC forever — but it expires from your acceptance window in 5 minutes.
Rotation gap#
When you rotate a signing secret, the old secret stops validating immediately. Events in-flight (already queued with the old signature) will fail verification on your side.
Solution: during a rotation, accept either signature for a short grace period.
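One way to implement the grace period is to try the new secret first and fall back to the old one only while the window is open. The sketch below is built on assumptions: it signs "<timestamp>.<body>" with hex HMAC-SHA256, which may not match the real scheme, so substitute the recipe from the Webhooks guide in compute_signature. The function names and the rotated_at parameter are hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

GRACE_PERIOD_SECONDS = 10 * 60


def compute_signature(secret: bytes, timestamp: str, body: bytes) -> str:
    # ASSUMPTION: hex HMAC-SHA256 over "<timestamp>.<body>".
    # Replace with the signing scheme documented in the Webhooks guide.
    message = timestamp.encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_during_rotation(
    signature: str,
    timestamp: str,
    body: bytes,
    new_secret: bytes,
    old_secret: bytes,
    rotated_at: float,
    now: Optional[float] = None,
) -> bool:
    """Accept the new secret always, the old secret only inside the grace period."""
    current = time.time() if now is None else now
    candidates = [new_secret]
    if current - rotated_at <= GRACE_PERIOD_SECONDS:
        candidates.append(old_secret)
    return any(
        # compare_digest avoids leaking the match position through timing.
        hmac.compare_digest(compute_signature(s, timestamp, body).encode(), signature.encode())
        for s in candidates
    )
```

Storing rotated_at next to the secrets lets the old secret expire on its own, without a second deploy to remove it.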
After 10 minutes (longer than the longest expected in-flight retry window for recent events), stop accepting the old secret.
Signature on a body your framework mutates#
Some web frameworks parse and re-serialize JSON before your handler runs. If the re-serialization differs (key order, whitespace), the HMAC won't verify.
Fix: verify against the raw request body bytes, not the parsed-and-serialized JSON. In Express, use express.raw(); in FastAPI, read with await request.body() before parsing.
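The effect is easy to reproduce with the standard library. In this illustration the secret and payload are made up, and the HMAC covers only the body for brevity. Parsing and re-serializing keeps the data identical but changes the bytes, so the digest changes too.

```python
import hashlib
import hmac
import json

secret = b"whsec_example"  # hypothetical secret
raw_body = b'{"event_id": "evt_1",   "event": "open"}'  # bytes as received


def sign(payload: bytes) -> str:
    # Plain HMAC-SHA256 of the body, for illustration only.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


# What a framework effectively does when it parses and re-serializes the JSON.
reserialized = json.dumps(json.loads(raw_body)).encode()

same_data = json.loads(raw_body) == json.loads(reserialized)
same_signature = sign(raw_body) == sign(reserialized)
print(same_data, same_signature)
```

Here the only difference is whitespace after a comma, which is enough to make verification fail.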
OAuth2 authentication#
If your webhook endpoint sits behind an OAuth2 flow, register credentials with the endpoint.
ScaiSend does a client-credentials grant to your token endpoint before each delivery (caching the token until close to expiry), then passes it as Authorization: Bearer <token> on the delivery request. This is optional; most setups use the signature-only model.
Scale#
Typical webhook volume:
- Per sent message: 2–4 events (processed, delivered, maybe open, maybe click). Sometimes more (deferred → delivered, or multiple clicks).
- Burst: a marketing send to 100k recipients produces ~400k events clustered within a few minutes.
Your endpoint should handle 10× sustained peak without degrading. If your normal load is 100 req/sec, aim for a comfortable 1000 req/sec ceiling before response times start climbing.
Scaling patterns#
- Respond fast, process async. Accept the webhook, enqueue to your own worker queue, return 200. Don't do DB writes on the synchronous path.
- Batch in your consumer. If you're writing to an analytics system, batch inserts (50–100 events per insert) rather than one row per event.
- Consider a dedicated webhook fleet. Scale horizontally; don't share the webhook endpoint with your main API.
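The first two patterns combine naturally: the request path only enqueues, and a consumer drains the queue in batches. The sketch below uses an in-process asyncio.Queue to stay self-contained; that is an assumption made for the example, and a production setup would more likely use a durable queue so events survive a crash. All function names are illustrative.

```python
import asyncio

BATCH_SIZE = 100


async def accept_webhook(queue: asyncio.Queue, event: dict) -> int:
    """Request path: enqueue and answer immediately, no DB work here."""
    queue.put_nowait(event)
    return 200


async def next_batch(queue: asyncio.Queue, max_size: int = BATCH_SIZE) -> list:
    """Wait for one event, then take whatever else is already queued."""
    batch = [await queue.get()]
    while len(batch) < max_size and not queue.empty():
        batch.append(queue.get_nowait())
    return batch


async def demo() -> list:
    queue = asyncio.Queue()
    for i in range(250):
        await accept_webhook(queue, {"event_id": f"evt_{i}"})
    sizes = []
    while not queue.empty():
        batch = await next_batch(queue)
        sizes.append(len(batch))  # a real consumer would bulk-insert the batch here
    return sizes


batch_sizes = asyncio.run(demo())
print(batch_sizes)
```

Because next_batch never waits for a batch to fill, latency stays low under light load while bursts are still written in large inserts.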
Diagnosing delivery failures#
Start by inspecting the endpoint's recent delivery history.
Fields:
| Field | Meaning |
|---|---|
| last_success_at | Last 2xx response received |
| last_failure_at | Last delivery that didn't get 2xx |
| failure_count | Consecutive failures since last success |
| disabled_at | Auto-disabled after 10 failures |
If failure_count is climbing, check:
- Is the URL correct? A typo is a common cause and trivial to fix.
- Is the endpoint reachable from ScaiSend? Firewall, load balancer, DNS.
- Is TLS valid? ScaiSend verifies TLS certificates. A self-signed cert or mismatched hostname will fail.
- Is your handler returning 2xx? Any redirect (3xx) is treated as failure. Any 4xx/5xx is a failure.
- Is your handler fast enough? > 30 seconds is a timeout, counted as failure.
Logs on your side (with the X-ScaiSend-Event header logged) are the fastest way to diagnose.
Testing your webhook handler#
Use a test API key and send mail to yourself.
Test keys only produce a processed event (no actual delivery, so no delivered or bounce). To test the full event lifecycle, use a live key with sandbox_mode.enable: false and send to an address you control — you'll see the full sequence.