Troubleshooting

Common failure modes and the fastest way to diagnose them. Each entry lists the symptoms first, with likely causes and fixes underneath.

Mail is accepted (202) but never delivers

Symptoms: /v3/mail/send returns 202, but the message sits in QUEUED or PROCESSING forever. No events arrive.

Check:

Worker service alive? systemctl status scaisend-worker or the equivalent. Restart if dead.
Redis reachable? redis-cli ping.
Queue depth. redis-cli LLEN email:process. If non-zero and growing, workers aren't keeping up (scale) or are deadlocked (restart).
Worker logs. Look for render_error, database_error, or stack traces.

Most common cause: the Worker service died silently. The API still accepts mail, but nothing consumes the queue. Restart the worker.
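Whether the queue is "non-zero and growing" is easier to judge from several samples than from a single `LLEN`. The following is a minimal sketch, assuming `redis-cli` is on the PATH and can reach the same Redis instance as the workers; the function names and the sampling approach are illustrative and not part of ScaiSend.

```python
import subprocess


def queue_depth(queue: str = "email:process") -> int:
    """Return the current length of a Redis list by shelling out to redis-cli."""
    out = subprocess.run(
        ["redis-cli", "LLEN", queue],
        capture_output=True, text=True, check=True,
    ).stdout
    # redis-cli prints "5" when piped and "(integer) 5" on a terminal.
    return int(out.split()[-1])


def classify_trend(samples: list[int]) -> str:
    """Classify LLEN samples taken at a fixed interval, oldest first."""
    if not samples:
        return "unknown"
    if samples[-1] == 0:
        return "empty"
    if len(samples) < 2:
        return "unknown"
    pairs = list(zip(samples, samples[1:]))
    if all(later > earlier for earlier, later in pairs):
        return "growing"   # workers are dead, deadlocked, or too few
    if all(later < earlier for earlier, later in pairs):
        return "draining"  # workers are catching up
    return "steady"
```

For example, calling `queue_depth()` every 10 seconds and passing three readings to `classify_trend` separates a backlog that is clearing from one that nothing is consuming.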

Messages stay in `PROCESSING` indefinitely

Symptoms: Messages enter PROCESSING but never progress to RENDERED or SENDING.

Check:

Template errors. GET /v3/messages/{id} — check error_message. If it's a template render failure, fix the template or delete the offending send.
Worker crash mid-render. The worker was handling the job, died, and the job record stayed in PROCESSING. Use POST /v3/messages/{id}/retry to requeue.
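When a worker crash leaves several messages behind, the retry call can be scripted. This is a sketch under stated assumptions: `BASE_URL` is a placeholder host, and the `Authorization: Bearer` header is an assumed authentication scheme that this page does not confirm. Only the `/v3/messages/{id}/retry` path comes from the text above.

```python
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://scaisend.example.com"  # hypothetical host, replace with yours


def build_retry_request(message_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST that requeues one stuck message."""
    safe_id = urllib.parse.quote(message_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/v3/messages/{safe_id}/retry",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )


def retry_all(message_ids: list[str], api_key: str) -> dict[str, int]:
    """Send a retry for each message and record the HTTP status per ID."""
    results = {}
    for message_id in message_ids:
        request = build_retry_request(message_id, api_key)
        try:
            with urllib.request.urlopen(request, timeout=10) as response:
                results[message_id] = response.status
        except urllib.error.HTTPError as exc:
            results[message_id] = exc.code  # keep going; report the failure
    return results
```

Check `error_message` on each message first: retrying a message whose template still fails to render will only put it back in the same state.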

All sends failing with 403 "Sender domain not verified"

Symptoms: Every POST /v3/mail/send returns 403 with "Sender domain not verified" even though the domain was previously verified.

Check:

Is the DNS still published? Query scaisend._domainkey.<yourdomain> — does it resolve?
Did someone rotate DKIM and not re-verify? Check GET /api/admin/domains/{id}. If verified: false, run POST /api/admin/domains/{id}/verify.
Did a PATCH set is_active: false? Re-activate.

Bounces climbing above 2%

Symptoms: GET /v3/stats/sum shows bounces / requests > 0.02.

Check:

Sample recent bounces. GET /v3/suppression/bounces?limit=50&start_time=<1h ago>. Look at reason:
- Mostly 5.1.1 User unknown → list quality problem. Your list has many invalid addresses.
- Mostly 5.7.* (other than 5.7.26) → reputation problem. ISPs are rejecting you.
- Mostly 5.7.26 DMARC failure → authentication misconfig. Run POST /api/admin/domains/{id}/verify and fix any failures.
Check recent signup flows. Was there a spike of obvious garbage addresses? Bot signups, typos?
If reputation-driven, cool down sends. Slow marketing sends; prioritize transactional (which ISPs evaluate separately, usually more charitably). Give yourself a week before sending the next big batch.
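The triage of bounce reasons described above can be automated once you have the sample. The sketch below assumes each bounce's `reason` is a free-text string containing an enhanced status code such as `5.1.1`; the bucket names and the function names are illustrative.

```python
import re
from collections import Counter

# An enhanced status code (class.subject.detail) that is not part of a longer
# dotted number such as an IP address.
STATUS_RE = re.compile(r"(?<![\d.])([245]\.\d{1,3}\.\d{1,3})(?!\.?\d)")


def bucket(reason: str) -> str:
    """Map one bounce reason to the problem category it points at."""
    match = STATUS_RE.search(reason)
    if not match:
        return "unclassified"
    code = match.group(1)
    if code == "5.1.1":
        return "list_quality"
    if code == "5.7.26":          # test before the general 5.7.* case
        return "authentication"
    if code.startswith("5.7."):
        return "reputation"
    return "other"


def summarize(reasons: list[str]) -> Counter:
    """Count bounces per category for a sample of reason strings."""
    return Counter(bucket(reason) for reason in reasons)
```

`summarize(reasons).most_common(1)` then names the dominant category for the sample.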

Spam report rate > 0.1%

Symptoms: Recipients are hitting "Report spam" in large numbers.

Check:

Recent campaigns. Was there a send to a list that might not have been opt-in? An imported list that's older than 6 months? A broad-reach campaign that hit dormant users?
Unsubscribe flow. Is it obvious? Is the List-Unsubscribe header present (you can see it in any received message via "Show original")? Is one-click working?
Suspend the offending campaign. Cancel remaining queued messages in the batch (POST /v3/messages/{id}/cancel for each).

Webhook endpoint gets disabled

Symptoms: GET /v3/user/webhooks/{id} shows disabled_at set.

Check:

Endpoint returning 2xx? Test with curl -X POST <url> -d '{}' to confirm it answers. 404s from typos are the most common cause.
Endpoint too slow? The timeout is 30 seconds. If your handler averages more than ~10s, you're flirting with timeouts during retries.
TLS valid? Check for an expired certificate, a mismatched hostname, or a self-signed certificate that ScaiSend won't accept.
DNS change? Your endpoint hostname moved and the old IP stopped answering.

Fix the cause, then PATCH /v3/user/webhooks/{id} with {"enabled": true}.
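Before re-enabling, it can help to probe the endpoint the way the checks above describe: one POST, timed, with failures sorted into the same categories. This is a rough sketch using only the standard library. The `{}` body mirrors the curl test above, and the 10-second "slow" threshold is taken from the rule of thumb in this section; neither reflects how ScaiSend itself delivers events.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 30.0) -> tuple[Optional[int], float]:
    """POST an empty JSON object and return (status or None, elapsed seconds)."""
    request = urllib.request.Request(
        url, data=b"{}", method="POST",
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code      # the server answered, but not with 2xx
    except OSError:
        status = None          # DNS failure, refused connection, TLS error, timeout
    return status, time.monotonic() - start


def diagnose(status: Optional[int], elapsed: float, slow_after: float = 10.0) -> str:
    """Turn a probe result into a one-line verdict."""
    if status is None:
        return "unreachable: check DNS, the TLS certificate, and firewalls"
    if not 200 <= status < 300:
        return f"non-2xx response ({status}): check the URL and the handler"
    if elapsed > slow_after:
        return "2xx but slow: at risk of hitting the 30-second timeout"
    return "ok"
```

Usage: `print(diagnose(*probe("https://hooks.example.com/scaisend")))`, where the URL is a placeholder for your own endpoint.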

Outbound mail rejected by all recipients

Symptoms: Every message bounces with varied reputation-related 5.7.* codes.

Check:

PTR records on your outbound IPs. dig -x <IP> — does it resolve? Is the resolved name's forward record the same IP (FCrDNS)? If not, fix with your IP provider. This is the #1 cause of mysterious reputation drops.
RBL listings. Check your outbound IPs against Spamhaus, SpamCop, Barracuda. MXToolbox Blacklists is a quick survey.
Rapid volume ramp-up. If you went from 0 to 100k/day overnight, ISPs flag that as suspicious. Warm up IPs with gradually increasing volume.
Content issues. Some content patterns (excessive links, spammy phrases, bare IPs) tank deliverability. Test a sample message through mail-tester.com.
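The PTR and RBL checks above lend themselves to a small script when you have more than a couple of outbound IPs. The sketch below is IPv4-only and uses the system resolver through the standard `socket` module. The resolver arguments exist only so the logic can be exercised without live DNS, and the function names are illustrative.

```python
import ipaddress
import socket


def fcrdns_ok(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """True if the IP's PTR name resolves forward to the same IP (FCrDNS)."""
    try:
        ptr_name = reverse(ip)[0]            # (hostname, aliases, addresses)
        forward_ips = forward(ptr_name)[2]   # (hostname, aliases, addresses)
    except OSError:                          # no PTR, or the name does not resolve
        return False
    return ip in forward_ips


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up for a DNS blocklist: reversed octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone
```

An A-record answer for `dnsbl_query_name(ip, "zen.spamhaus.org")` means the IP is listed, and NXDOMAIN means it is not. Some blocklist operators refuse queries that arrive through large public resolvers, so run the lookup from a resolver you control.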

"Missing required scope" on unexpected endpoints

Symptoms: Getting 403 with Missing required scope: X on an endpoint that previously worked.

Check:

Did someone rotate the API key? If the new key was created with a smaller scope set, it can't do what the old key could. Check GET /v3/api_keys/{id}.
Did an admin change role permissions? User roles are editable. A permission may have been removed from the user's role.
Is the user part of the expected group? If group-to-role mapping is your primary RBAC mechanism, a group membership change in ScaiKey might have removed the permission.

GET /v3/auth/me shows the caller's effective permissions. Compare against what the endpoint needs.

DNS verification stuck at `verified: false`

Symptoms: POST /api/admin/domains/{id}/verify keeps returning verified: false even though you believe the records are published.

Check:

Wait for DNS TTL. DNS propagates at the pace of the slowest cache in the chain. Even a "small TTL" provider takes minutes. Try again in 10 minutes.
Query DNS the same way ScaiSend does. dig scaisend._domainkey.<domain> TXT. Compare the returned value against what GET /api/admin/domains/{id}/dns-records tells you to publish. Look for:
- Trailing or leading whitespace in the DNS value. Some providers strip or pad.
- Line breaks. Long TXT records may be split across multiple DNS strings; ScaiSend joins them, but some DNS servers concatenate with spaces. Check the raw query result.
- Wrong base64. Copy-paste errors are common; check a few characters from the beginning and end.
Multiple TXT records. If you have an existing SPF record, adding a second TXT can cause verification to see the wrong one. Merge SPF records into a single v=spf1 ... entry.
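Comparing a long DKIM value by eye is error-prone, so the whitespace and split-string checks above can be scripted. This is a minimal sketch that takes one TXT answer line as printed by `dig +short` (one or more quoted strings) and the value you were told to publish. How ScaiSend itself compares values is not documented here, so treat the verdicts as hints only.

```python
import re

# One quoted character-string from dig output, allowing backslash escapes.
_QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')


def join_txt_strings(dig_answer: str) -> str:
    """Concatenate the quoted strings of one TXT record with no separator."""
    parts = _QUOTED.findall(dig_answer)
    return "".join(parts) if parts else dig_answer.strip()


def compare_txt(published_raw: str, expected: str) -> str:
    """Say how the published record differs from the expected value, if at all."""
    published = join_txt_strings(published_raw)
    if published == expected:
        return "match"
    if published.strip() == expected.strip():
        return "whitespace differs at the ends"
    if "".join(published.split()) == "".join(expected.split()):
        return "whitespace differs inside the value"
    return "value differs"
```

Pass one record at a time. If `dig` returns several TXT records for the name, each line is a separate record and should be compared separately.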

Inbound SMTP server not receiving DSNs

Symptoms: Messages sometimes bounce silently — the upstream MX accepts but the message never delivers, and no bounce event appears in ScaiSend.

Check:

Port 25 reachable from the internet? telnet <your-smtp-host> 25 from outside your network. If it refuses, check firewall rules.
Forward-confirmed reverse DNS. dig -x <inbound-ip> should return your hostname; dig <hostname> should return the same IP.
SMTP service running? systemctl status scaisend-smtp.
Correct envelope sender on outbound messages? ScaiSend puts a distinctive envelope-from that routes bounces back to its inbound server. If you've overridden this somehow, bounces go elsewhere.

Admin UI can't log in

Symptoms: OAuth redirect lands on an error page.

Check:

ScaiKey reachable from the browser? The admin UI initiates OAuth against ScaiKey's URL; the browser must be able to reach it.
ADMIN_URL correctly configured? The redirect URI registered with ScaiKey must match ADMIN_URL exactly. Mismatch → "invalid redirect_uri" from ScaiKey.
JWKS endpoint reachable from ScaiSend? ScaiSend fetches the JWKS from ScaiKey when it validates JWTs. If egress to ScaiKey is blocked, server-side validation fails and no login works.

Statistics look wrong

Symptoms: /v3/stats counts don't match what you think happened.

Check:

Stats are aggregated in a daily batch. The current day may not be fully reflected until the next rebuild. For up-to-the-minute numbers, query /v3/messages directly and count.
Sandbox messages don't count. If you've been testing with a test key, those messages are excluded from stats.
Rebuild if suspicious. POST /v3/stats/rebuild replays events from source. Requires stats.export.

Redis queue growing without bound

Symptoms: redis-cli LLEN smtp:deliver climbs monotonically.

Check:

SMTP service alive? If the consumer is dead, the queue fills.
Outbound IP blocked? If every send is being refused at the network layer (firewall, ISP blocking port 25), the SMTP service loops retrying without making progress. Check logs for connection-refused patterns.
Rate-limited by a specific recipient ISP? One domain's MX saying "slow down" can back up the queue if you have a lot of mail to that domain. Check retry counts — if one domain dominates, you're seeing a reputation issue with that specific ISP.

Temporarily drain by increasing SMTP service replicas. Long-term fix depends on the root cause.
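To tell a single-ISP slowdown from a general outage, it helps to see whether one recipient domain dominates the retries. The sketch below assumes you have already pulled the recipient addresses of retrying messages out of your logs or the queue; that extraction step is not shown, and the 50% threshold is an arbitrary starting point.

```python
from collections import Counter
from typing import Optional


def dominant_domain(recipients: list[str], threshold: float = 0.5) -> Optional[tuple[str, float]]:
    """Return (domain, share) if one domain holds at least `threshold` of retries."""
    domains = Counter(
        address.rsplit("@", 1)[-1].lower()
        for address in recipients
        if "@" in address
    )
    total = sum(domains.values())
    if total == 0:
        return None
    domain, count = domains.most_common(1)[0]
    share = count / total
    return (domain, share) if share >= threshold else None
```

A result such as `("gmail.com", 0.75)` points at a problem with one ISP. `None` suggests the backlog is spread across destinations and the cause is more likely on your side.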

Still stuck?

Request IDs. Every API response carries X-Request-ID. Include it when filing a support ticket.
Message IDs. For delivery issues, the message_id from the send response is the key identifier. Pair it with the request ID.
Log context. Ship structured logs to a searchable store so that you can search for request_id=... across all three services.
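If the logs are not yet in a searchable store, the same correlation can be done on exported files. This is a sketch that assumes each service writes one JSON object per line with a `request_id` field; that log shape is an assumption, not something this page specifies.

```python
import json


def find_request(lines, request_id: str) -> list[dict]:
    """Return every parsed log record whose request_id matches."""
    hits = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip stack traces and other non-JSON lines
        if isinstance(record, dict) and record.get("request_id") == request_id:
            hits.append(record)
    return hits
```

Because `lines` can be any iterable of strings, an open file works directly: `find_request(open("worker.log"), "req-123")`, where the file name and ID are placeholders.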

Related

Deployment: the architecture being troubleshot.
Health and Monitoring: the signals that should have warned you.
Bounce Handling: a deep dive on delivery failures.
