Troubleshooting
Common failure modes and the fastest way to diagnose them. Symptoms in bold; likely causes and fixes underneath.
Mail is accepted (202) but never delivers
Symptoms: /v3/mail/send returns 202, but the message sits in QUEUED or PROCESSING forever. No events arrive.
Check:
- Worker service alive? Run systemctl status scaisend-worker or the equivalent. Restart if dead.
- Redis reachable? Run redis-cli ping.
- Queue depth. Run redis-cli LLEN email:process. If non-zero and growing, workers aren't keeping up (scale) or are deadlocked (restart).
- Worker logs. Look for render_error, database_error, or stack traces.
Most common cause: the Worker service died silently. The API still accepts mail, but nothing consumes the queue. Restart the Worker.
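The queue-depth check above depends on telling a growing backlog from a draining one. Here is a minimal sketch of that judgment, assuming you have collected a few readings of redis-cli LLEN email:process at regular intervals. The function name and labels are illustrative and not part of ScaiSend.

```python
from typing import Sequence


def classify_queue_trend(samples: Sequence[int]) -> str:
    """Classify queue-depth readings taken at regular intervals.

    Returns "empty", "growing", "draining", or "steady".
    """
    if not samples:
        raise ValueError("need at least one sample")
    if all(depth == 0 for depth in samples):
        return "empty"
    first, last = samples[0], samples[-1]
    if last > first:
        return "growing"   # workers are not keeping up, or are deadlocked
    if last < first:
        return "draining"  # workers consume faster than the API enqueues
    return "steady"        # non-zero but flat: sample again before acting


# Hypothetical readings from three consecutive LLEN calls.
print(classify_queue_trend([120, 180, 260]))
```

A "growing" result with a live Worker points at scaling; a "growing" result with no Worker log activity points at a deadlock or a dead service.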
Messages stay in PROCESSING indefinitely
Symptoms: Messages enter PROCESSING but never progress to RENDERED or SENDING.
Check:
- Template errors. GET /v3/messages/{id} and check error_message. If it's a template render failure, fix the template or delete the offending send.
- Worker crash mid-render. The worker was handling the job, died, and the job record stayed in PROCESSING. Use POST /v3/messages/{id}/retry to requeue.
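The two checks above can be sketched as one triage function. This is an illustration under assumptions: status and error_message come from the text above, while the updated_at field (an ISO 8601 timestamp), the 10-minute threshold, and the returned labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional


def triage_processing_message(
    message: Mapping[str, Optional[str]],
    now: datetime,
    stuck_after: timedelta = timedelta(minutes=10),  # assumed threshold
) -> str:
    """Suggest a next step for a message record from GET /v3/messages/{id}."""
    if message.get("status") != "PROCESSING":
        return "not_processing"
    if message.get("error_message"):
        # Likely a template render failure: fix the template or delete the send.
        return "review_error"
    # "updated_at" is an assumed field name; adapt it to the real record.
    updated_at = datetime.fromisoformat(message["updated_at"])
    if now - updated_at >= stuck_after:
        return "retry"  # candidate for POST /v3/messages/{id}/retry
    return "wait"


now = datetime(2024, 5, 1, 12, 30, tzinfo=timezone.utc)
stuck = {
    "status": "PROCESSING",
    "error_message": None,
    "updated_at": "2024-05-01T12:00:00+00:00",
}
print(triage_processing_message(stuck, now))
```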
All sends failing with 403 "Sender domain not verified"
Symptoms: Every POST /v3/mail/send returns 403 with "Sender domain not verified" even though you previously verified.
Check:
- Is the DNS still published? Query scaisend._domainkey.<yourdomain>. Does it resolve?
- Did someone rotate DKIM and not re-verify? Check GET /api/admin/domains/{id}. If verified: false, run POST /api/admin/domains/{id}/verify.
- Did a PATCH set is_active: false? Re-activate.
Bounces climbing above 2%
Symptoms: GET /v3/stats/sum shows bounces / requests > 0.02.
Check:
- Sample recent bounces. GET /v3/suppression/bounces?limit=50&start_time=<1h ago>. Look at reason:
  - Mostly 5.1.1 User unknown → list quality problem. Your list has many invalid addresses.
  - Mostly 5.7.* → reputation problem. ISPs are rejecting you.
  - Mostly 5.7.26 DMARC failure → authentication misconfig. Run POST /api/admin/domains/{id}/verify and fix any failures.
- Check recent signup flows. Was there a spike of obvious garbage addresses? Bot signups, typos?
- If reputation-driven, cool down sends. Slow marketing sends; prioritize transactional (which ISPs evaluate separately, usually more charitably). Give yourself a week before sending the next big batch.
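The reason-code triage above can be sketched as a small classifier. It assumes each bounce's reason string begins with the enhanced status code, as in the examples above; the function names and category labels are illustrative. Note that 5.7.26 is tested before the broader 5.7.* prefix so that DMARC failures are not counted as generic reputation problems.

```python
from collections import Counter
from typing import Iterable


def classify_bounce(reason: str) -> str:
    """Map a bounce reason to the likely root cause."""
    reason = reason.strip()
    if reason.startswith("5.7.26"):
        return "authentication"  # DMARC failure
    if reason.startswith("5.7."):
        return "reputation"      # ISPs are rejecting you
    if reason.startswith("5.1.1"):
        return "list_quality"    # invalid addresses
    return "other"


def dominant_bounce_cause(reasons: Iterable[str]) -> str:
    """Return the most frequent cause in a sample, or "none" if it is empty."""
    counts = Counter(classify_bounce(reason) for reason in reasons)
    if not counts:
        return "none"
    return counts.most_common(1)[0][0]


def bounce_rate_exceeded(bounces: int, requests: int, threshold: float = 0.02) -> bool:
    """True when bounces / requests is above the 2% threshold."""
    return requests > 0 and bounces / requests > threshold


sample = [
    "5.1.1 User unknown",
    "5.1.1 User unknown",
    "5.7.26 DMARC failure",
    "5.7.1 Blocked",
]
print(dominant_bounce_cause(sample), bounce_rate_exceeded(30, 1000))
```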
Spam report rate > 0.1%
Symptoms: Recipients are hitting "Report spam" in large numbers.
Check:
- Recent campaigns. Was there a send to a list that might not have been opt-in? An imported list that's older than 6 months? A broad-reach campaign that hit dormant users?
- Unsubscribe flow. Is it obvious? Is the List-Unsubscribe header present (you can see it in any received message via "Show original")? Is one-click working?
- Suspend the offending campaign. Cancel remaining queued messages in the batch (POST /v3/messages/{id}/cancel for each).
Webhook endpoint gets disabled
Symptoms: GET /v3/user/webhooks/{id} shows disabled_at set.
Check:
- Endpoint returning 2xx? Test with curl -X POST <url> -d '{}' to confirm it answers. 404s from typos are the most common cause.
- Endpoint too slow? 30-second timeout. If your handler takes more than ~10s on average, you're flirting with timeouts during retries.
- TLS valid? Certificate expired, mismatched hostname, self-signed cert that ScaiSend won't accept.
- DNS change? Your endpoint hostname moved and the old IP stopped answering.
Fix the cause, then PATCH /v3/user/webhooks/{id} with {"enabled": true}.
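Before re-enabling, it can help to probe the endpoint the way the curl check above does and grade the result against the 30-second timeout and the ~10-second guidance. The sketch below uses only the Python standard library; the function names and result labels are illustrative, and it does not claim to reproduce how ScaiSend itself delivers webhooks.

```python
import time
import urllib.error
import urllib.request
from typing import Tuple

TIMEOUT_SECONDS = 30.0  # delivery timeout stated above
SLOW_SECONDS = 10.0     # slower handlers risk timeouts during retries


def probe(url: str) -> Tuple[int, float]:
    """POST an empty JSON object and return (status_code, elapsed_seconds).

    Network-level failures (DNS, TLS, refused connection, timeout)
    propagate to the caller as exceptions.
    """
    request = urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    started = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
            status = response.status
    except urllib.error.HTTPError as error:
        status = error.code  # 4xx/5xx responses still carry a status code
    return status, time.monotonic() - started


def assess_probe(status_code: int, elapsed_seconds: float) -> str:
    """Grade a probe result: "timeout", "non_2xx", "slow", or "ok"."""
    if elapsed_seconds >= TIMEOUT_SECONDS:
        return "timeout"
    if not 200 <= status_code < 300:
        return "non_2xx"
    if elapsed_seconds > SLOW_SECONDS:
        return "slow"
    return "ok"


print(assess_probe(404, 0.2))
```

To check a real endpoint, call assess_probe(*probe(url)) with your webhook URL.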
Outbound mail rejected by all recipients
Symptoms: Every message bounces with varied reputation-related 5.7.* codes.
Check:
- PTR records on your outbound IPs. Run dig -x <IP>. Does it resolve? Is the resolved name's forward record the same IP (FCrDNS)? If not, fix with your IP provider. This is the #1 cause of mysterious reputation drops.
- RBL listings. Check your outbound IPs against Spamhaus, SpamCop, Barracuda. MXToolbox Blacklists is a quick survey.
- Rapid volume ramp-up. If you went from 0 to 100k/day overnight, ISPs flag that as suspicious. Warm up IPs with gradually increasing volume.
- Content issues. Some content patterns (excessive links, spammy phrases, bare IPs) tank deliverability. Test a sample message through mail-tester.com.
"Missing required scope" on unexpected endpoints
Symptoms: Getting 403 with Missing required scope: X on an endpoint that previously worked.
Check:
- Did someone rotate the API key? If the new key was created with a smaller scope set, it can't do what the old key could. Check GET /v3/api_keys/{id}.
- Did admin change role permissions? User roles are editable. A permission may have been removed from the user's role.
- Is the user part of the expected group? If group-to-role mapping is your primary RBAC mechanism, a group membership change in ScaiKey might have removed the permission.
GET /v3/auth/me shows the caller's effective permissions. Compare against what the endpoint needs.
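The comparison described above amounts to a set difference. In this minimal sketch, the effective permissions are assumed to be available as a flat list of strings (the actual shape of the GET /v3/auth/me response may differ). Apart from stats.export, which appears elsewhere in this guide, the scope names are hypothetical.

```python
from typing import Iterable, List


def missing_scopes(required: Iterable[str], effective: Iterable[str]) -> List[str]:
    """Return the scopes an endpoint needs that the caller does not hold."""
    return sorted(set(required) - set(effective))


# Hypothetical values: what the endpoint needs vs. what the caller holds.
required = ["stats.read", "stats.export"]
effective = ["mail.send", "stats.read"]
print(missing_scopes(required, effective))
```

An empty result means the caller's permissions are sufficient and the 403 has another cause, such as the request being made with a different key than expected.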
DNS verification stuck at verified: false
Symptoms: POST /api/admin/domains/{id}/verify keeps returning verified: false even though you believe the records are published.
Check:
- Wait for DNS TTL. DNS propagates at the pace of the slowest cache in the chain. Even a "small TTL" provider takes minutes. Try again in 10 minutes.
- Query DNS the same way ScaiSend does. Run dig scaisend._domainkey.<domain> TXT. Compare the returned value against what GET /api/admin/domains/{id}/dns-records tells you to publish. Look for:
  - Trailing or leading whitespace in the DNS value. Some providers strip or pad.
  - Line breaks. Long TXT records may be split across multiple DNS strings; ScaiSend joins them, but some DNS servers concatenate with spaces. Check the raw query result.
  - Wrong base64. Copy-paste errors are common; check a few characters from the beginning and end.
  - Multiple TXT records. If you have an existing SPF record, adding a second TXT can cause verification to see the wrong one. Merge SPF records into a single v=spf1 ... entry.
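The whitespace and line-break checks above are easy to get wrong by eye. The sketch below joins the quoted strings of one TXT answer as dig prints them, then reports how the published value differs from the expected one. The function names, labels, and the shortened key value are illustrative, and this is not a description of ScaiSend's own verification logic.

```python
import re


def join_txt_strings(dig_answer: str) -> str:
    """Concatenate the quoted strings of one TXT answer, with no separator."""
    parts = re.findall(r'"((?:[^"\\]|\\.)*)"', dig_answer)
    if not parts:
        return dig_answer  # unquoted input: compare it as-is
    return "".join(parts)


def diagnose_txt(dig_answer: str, expected: str) -> str:
    """Compare a published TXT value with the value you were told to publish."""
    published = join_txt_strings(dig_answer)
    if published == expected:
        return "match"
    if published.strip() == expected.strip():
        return "stray_whitespace"  # leading or trailing padding
    if "".join(published.split()) == "".join(expected.split()):
        return "extra_internal_whitespace"  # e.g. chunks joined with spaces
    return "mismatch"  # wrong record or a copy-paste error in the key


# Placeholder key material, shortened for the example.
expected = "v=DKIM1; k=rsa; p=MIIBIjAN"
print(diagnose_txt('"v=DKIM1; k=rsa; p=MIIB" "IjAN"', expected))
```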
Inbound SMTP server not receiving DSNs
Symptoms: Messages sometimes bounce silently — the upstream MX accepts but the message never delivers, and no bounce event appears in ScaiSend.
Check:
- Port 25 reachable from the internet? Run telnet <your-smtp-host> 25 from outside your network. If it refuses, check firewall rules.
- Forward-confirmed reverse DNS. dig -x <inbound-ip> should return your hostname; dig <hostname> should return the same IP.
- SMTP service running? Run systemctl status scaisend-smtp.
- Correct envelope sender on outbound messages? ScaiSend puts a distinctive envelope-from that routes bounces back to its inbound server. If you've overridden this somehow, bounces go elsewhere.
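The forward-confirmed reverse DNS check above can be scripted. This is a minimal sketch using Python's standard socket module; the resolvers are passed in as parameters so the logic can be exercised without network access. The function names are illustrative, and the default forward lookup covers IPv4 only.

```python
import socket
from typing import Callable, List


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP address."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(hostname)[2]


def is_forward_confirmed(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True when the PTR name of `ip` resolves back to the same `ip`."""
    try:
        hostname = reverse(ip)
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror both derive from OSError.
        return False


# Offline demonstration with stand-in resolvers and documentation addresses.
print(is_forward_confirmed(
    "192.0.2.10",
    reverse=lambda ip: "mail.example.com",
    forward=lambda name: ["192.0.2.10"],
))
```

Called with only an IP address, the function uses the system resolver, so run it from a host that sees public DNS.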
Admin UI can't log in
Symptoms: OAuth redirect lands on an error page.
Check:
- ScaiKey reachable from the browser? The admin UI initiates OAuth against ScaiKey's URL; the browser must be able to reach it.
- ADMIN_URL correctly configured? The redirect URI registered with ScaiKey must match ADMIN_URL exactly. Mismatch → "invalid redirect_uri" from ScaiKey.
- JWKS endpoint reachable from ScaiSend? ScaiSend fetches JWKs at JWT validation. A blocked egress to ScaiKey means no login works server-side.
Statistics look wrong
Symptoms: /v3/stats counts don't match what you think happened.
Check:
- Stats are aggregated on a daily-batch basis. The current day may not be fully reflected until the next rebuild. For up-to-the-minute numbers, query /v3/messages directly and count.
- Sandbox messages don't count. If you've been testing with a test key, those messages are excluded from stats.
- Rebuild if suspicious. POST /v3/stats/rebuild replays events from source. Requires stats.export.
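Counting directly from /v3/messages, as suggested above, can be as simple as tallying by status. The sketch assumes the endpoint yields a list of objects with a status field; that response shape is an assumption, so adapt the field access to what your deployment returns.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def count_by_status(messages: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Tally message records by their status field."""
    return dict(Counter(message["status"] for message in messages))


# Hypothetical records, as they might be parsed from a /v3/messages response.
messages = [
    {"id": "m1", "status": "QUEUED"},
    {"id": "m2", "status": "PROCESSING"},
    {"id": "m3", "status": "QUEUED"},
]
print(count_by_status(messages))
```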
Redis queue growing without bound
Symptoms: redis-cli LLEN smtp:deliver climbs monotonically.
Check:
- SMTP service alive? If the consumer is dead, the queue fills.
- Outbound IP blocked? If every send is being refused at the network layer (firewall, ISP blocking port 25), the SMTP service loops retrying without making progress. Check logs for connection-refused patterns.
- Rate-limited by a specific recipient ISP? One domain's MX saying "slow down" can back up the queue if you have a lot of mail to that domain. Check retry counts — if one domain dominates, you're seeing a reputation issue with that specific ISP.
Temporarily drain by increasing SMTP service replicas. Long-term fix depends on the root cause.
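To see whether one recipient domain dominates the retries, as the last check above suggests, aggregate retry counts per domain. The sketch assumes you can extract (recipient, retry_count) pairs from your logs or queue entries. Where they come from, and the 50% cutoff, are assumptions rather than ScaiSend behavior.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def retry_share_by_domain(retries: Iterable[Tuple[str, int]]) -> Dict[str, float]:
    """Return each recipient domain's share of all retries (0.0 to 1.0)."""
    totals: Counter = Counter()
    for recipient, count in retries:
        domain = recipient.rsplit("@", 1)[-1].lower()
        totals[domain] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {domain: count / grand_total for domain, count in totals.items()}


def dominant_domain(
    retries: Iterable[Tuple[str, int]], threshold: float = 0.5
) -> Optional[str]:
    """Return the domain holding at least `threshold` of retries, if any."""
    shares = retry_share_by_domain(retries)
    if not shares:
        return None
    domain, share = max(shares.items(), key=lambda item: item[1])
    return domain if share >= threshold else None


# Hypothetical retry data.
retries = [("a@example.com", 8), ("b@example.com", 4), ("c@example.org", 3)]
print(dominant_domain(retries))
```

A non-None result suggests throttling by that ISP; None suggests the backlog is spread out and the consumer or network path is the more likely cause.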
Still stuck?
- Request IDs. Every API response carries X-Request-ID. Include it when filing a support ticket.
- Message IDs. For delivery issues, the message_id from the send response is the key identifier. Pair it with the request ID.
- Log context. Ship structured logs to a searchable store so grep request_id=... across all three services is possible.
Related
- Deployment — the architecture being troubleshot.
- Health and Monitoring — the signals that should have warned you.
- Bounce Handling — deep dive on delivery failures.