Troubleshooting
A short list of things that go wrong with ScaiEcho and how to fix them. If none of these match, check the request id in the response envelope and grep the ScaiGrid logs for it.
POST /transcribe returns 502 SCAIECHO_BACKEND_UNAVAILABLE
Your tenant policy pinned the request to a backend that isn't usable.
- Pinned to A, no online STT node. Either wait for ops to bring an STT-loaded ScaiInfer node back online, or PATCH /tenant-policy to allow B.
- Pinned to B, relay API key not configured. This is an operator misconfiguration: check the managed STT relay credentials in the deployment settings.
- backend_preference=prefer_self_hosted against an AB policy with no A node. The resolver falls through to B in that case, so if you got the error, the request was actually pinned to A. Set the preference to any.
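To make the three cases above concrete, here is a toy model of backend resolution as this page describes it. It is not ScaiEcho's resolver: the policy labels ("A", "B", "AB"), the Cluster flags, and the BackendUnavailable exception are assumptions chosen to mirror the text, and the tie-break for any is not documented here.

```python
from dataclasses import dataclass


class BackendUnavailable(Exception):
    """Stands in for the 502 SCAIECHO_BACKEND_UNAVAILABLE response."""


@dataclass
class Cluster:
    # Hypothetical view of the deployment state that matters for resolution.
    stt_node_online: bool  # any STT-loaded ScaiInfer node up (backend A)?
    relay_key_set: bool    # managed STT relay API key configured (backend B)?


def resolve_backend(policy: str, preference: str, cluster: Cluster) -> str:
    """Pick "A" or "B" following the rules on this page (a model only)."""
    if preference not in ("any", "prefer_self_hosted"):
        raise ValueError(f"unknown preference {preference!r}")
    a_ok, b_ok = cluster.stt_node_online, cluster.relay_key_set

    # A pinned policy never falls through: an unusable pin is an error.
    if policy == "A":
        if not a_ok:
            raise BackendUnavailable("pinned to A, no online STT node")
        return "A"
    if policy == "B":
        if not b_ok:
            raise BackendUnavailable("pinned to B, relay API key not configured")
        return "B"
    if policy != "AB":
        raise ValueError(f"unknown policy {policy!r}")

    # Under AB the preference is a soft ordering, never a pin, so a
    # missing A node falls through to B. This sketch tries A first for
    # both preference values.
    for name, usable in (("A", a_ok), ("B", b_ok)):
        if usable:
            return name
    raise BackendUnavailable("neither backend is usable")
```

The point the model makes: under an AB policy, prefer_self_hosted with no A node still yields B. Only a hard pin to A produces the 502.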
Async job stuck at queued for minutes
- No arq worker running. Check that the worker mode service is up; without it, jobs never leave the queue.
- Worker backlog. Long backlogs are visible in the admin UI's transcription dashboard. Scale out the worker pool, or raise the inline threshold so more requests bypass the async path.
- S3 unreachable from the worker. The worker fetches the staged audio before dispatch; check the S3 endpoint is reachable from worker pods.
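To tell a stuck job from a slow one, you can poll its status and flag it once it has sat in queued longer than you expect. This is a minimal sketch: the status lookup is passed in as a callable because this page does not name the job-status endpoint, and the 120-second threshold is an arbitrary placeholder.

```python
import time
from typing import Callable


class StuckInQueue(Exception):
    """Raised when a job never leaves `queued` within the allowed time."""


def watch_job(
    fetch_status: Callable[[], str],
    stuck_after: float = 120.0,
    poll_interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job leaves `queued`; return the first other status.

    fetch_status is a hypothetical callable returning the job's current
    status string, for example by calling your job-status endpoint.
    clock and sleep are injectable so the loop can be tested.
    """
    started = clock()
    while True:
        status = fetch_status()
        if status != "queued":
            return status
        if clock() - started >= stuck_after:
            raise StuckInQueue(
                f"still queued after {stuck_after:.0f}s: check that a worker "
                "is running, the backlog, and S3 reachability from workers"
            )
        sleep(poll_interval)
```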
Async job ends at failed
The status_reason field tells you why. Common reasons:
- "backend unavailable": same diagnosis as the sync 502 above; the worker hit it at dispatch time.
- "decode failure": the audio was not in a format the dispatcher could parse. This is often a Content-Type mismatch; re-upload with the correct MIME type.
- "quota exceeded": the tenant budget was hit between enqueue and dispatch. Raise the budget or wait for the next period.
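For the decode-failure case, one way to avoid a Content-Type mismatch on re-upload is to derive the MIME type from the file's leading bytes rather than its extension. This sketch checks a few well-known container signatures. The MIME strings are common conventions, and whether ScaiEcho's dispatcher accepts each one is an assumption to verify.

```python
from typing import Optional


def sniff_audio_mime(head: bytes) -> Optional[str]:
    """Guess an audio MIME type from the first ~12 bytes of a file.

    Returns None when no known signature matches.
    """
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio/wav"
    if head[:4] == b"OggS":
        return "audio/ogg"    # Ogg container, commonly Opus or Vorbis
    if head[:4] == b"fLaC":
        return "audio/flac"
    if head[:4] == b"\x1a\x45\xdf\xa3":
        return "audio/webm"   # EBML header, shared by WebM and Matroska
    if head[:3] == b"ID3":
        return "audio/mpeg"   # MP3 with a leading ID3v2 tag
    # Bare MPEG audio frame: 11 sync bits set, and a non-zero layer field.
    # Layer 00 is excluded because AAC ADTS streams use it.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0 \
            and (head[1] & 0x06) != 0:
        return "audio/mpeg"
    return None
```

Typical use: `with open(path, "rb") as f: mime = sniff_audio_mime(f.read(12))`, then send that value as the upload's Content-Type.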
WebSocket closes with code 4401 immediately
Bearer token missing or invalid.
- Did you pass ?token=... on the URL or set Authorization: Bearer ...? FastAPI dependencies don't fire before the WebSocket is accepted, so the route does its own check.
- Is the token an API key (sgk_...) or a JWT? Both are accepted; anything else is rejected.
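Both ways of presenting the token can be sketched as follows, along with a local shape check that catches the most common 4401 cause: passing something that is neither an sgk_ key nor a JWT. The shape check is an assumption-based sanity test, not the server's validation, since a well-formed token can still be expired or revoked. The host in the usage below is a placeholder.

```python
import re
from urllib.parse import urlencode, urlsplit, urlunsplit

# A JWT is three base64url segments separated by dots.
_JWT_SHAPE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*$")


def looks_like_accepted_token(token: str) -> bool:
    """Cheap client-side check: an sgk_ API key or something JWT-shaped."""
    return token.startswith("sgk_") or bool(_JWT_SHAPE.match(token))


def ws_auth(url: str, token: str, use_header: bool = True):
    """Return (url, headers) carrying the bearer token one of the two ways."""
    if not looks_like_accepted_token(token):
        raise ValueError("token is neither an sgk_ API key nor a JWT; expect 4401")
    if use_header:
        return url, {"Authorization": f"Bearer {token}"}
    # Query-string variant: append token=..., keeping any existing query.
    parts = urlsplit(url)
    query = "&".join(q for q in (parts.query, urlencode({"token": token})) if q)
    return urlunsplit(parts._replace(query=query)), {}
```

Pass the returned headers to your WebSocket client's handshake-header option. Use the query-string form only when the client cannot set headers.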
WebSocket closes with code 4403
You have a valid token but lack the required module permission.
- For /stream/transcribe: missing scaiecho:transcribe.
- After sending an open frame with diarize: true, missing scaiecho:diarize. The server sends an error JSON frame before closing.
Grant the permission via a custom role (see Permissions).
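The diarize check only fires once the open frame asks for diarization. A client can therefore compute up front which permissions a session will need and compare them with what its role grants. This is a minimal sketch: how you obtain the granted set is left open, and treating the open frame as a plain dict with a diarize key is an assumption.

```python
def required_permissions(open_frame: dict) -> set:
    """Permissions a /stream/transcribe session needs, per this page."""
    needed = {"scaiecho:transcribe"}        # needed for the route itself
    if open_frame.get("diarize") is True:
        needed.add("scaiecho:diarize")      # needed once diarize is requested
    return needed


def missing_permissions(open_frame: dict, granted: set) -> set:
    """Required permissions the caller lacks; non-empty means expect 4403."""
    return required_permissions(open_frame) - set(granted)
```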
WebSocket opens but no delta frames arrive
- Wrong media_type. The default is audio/wav. If you're sending Opus, MP3, or anything else, declare it on the open frame.
- Audio not sent as binary frames. The route treats text frames as control messages. If your client library coerces bytes to base64 strings, switch it to binary.
- chunk_seconds too large. On Backend B, the dispatcher accumulates this much audio before relaying, so deltas don't appear until the first chunk is sent. Drop it to 2.0 for snappier feedback at the cost of more API calls.
- Backend A picked, but the node disappeared. The dispatcher may be stuck partway through the bidirectional stream. Close the WebSocket and reopen it; the resolver picks a backend again.
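The first three causes all come down to what the client sends. The sketch below covers the sending side: one JSON text frame to open the session, then raw bytes for audio. The "type": "open" key and the overall field layout are assumptions; media_type, diarize, and chunk_seconds are the fields named on this page. The last helper shows why chunk_seconds governs time to first delta. For 16 kHz, 16-bit mono PCM, one second is 32,000 bytes, so chunk_seconds=5.0 means 160,000 bytes must arrive before Backend B relays anything.

```python
import json
from typing import Iterator, Optional


def open_frame(media_type: str, chunk_seconds: Optional[float] = None,
               diarize: bool = False) -> str:
    """Build the control (text) frame sent first. Field layout is assumed."""
    frame = {"type": "open", "media_type": media_type}  # "type" is hypothetical
    if chunk_seconds is not None:
        frame["chunk_seconds"] = chunk_seconds
    if diarize:
        frame["diarize"] = True
    return json.dumps(frame)


def audio_frames(audio: bytes, frame_size: int = 3200) -> Iterator[bytes]:
    """Yield raw byte slices (3200 bytes = 100 ms of 16 kHz 16-bit mono).

    Send each one as a binary frame, never as base64 text.
    """
    for start in range(0, len(audio), frame_size):
        yield audio[start:start + frame_size]


def pcm_bytes_per_chunk(chunk_seconds: float, sample_rate: int = 16_000,
                        sample_width: int = 2, channels: int = 1) -> int:
    """Raw PCM bytes that must accumulate before one chunk is relayed."""
    return int(chunk_seconds * sample_rate * sample_width * channels)
```

With the websockets package, `await ws.send(open_frame(...))` sends a text frame and `await ws.send(chunk)` with a bytes object sends a binary frame. That str-versus-bytes distinction is exactly what the route keys on.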
Diarize=true requested but deltas have no speaker_label
Either the dispatcher can't run diarization right now, or the segment doesn't match any enrolled profile.
- Backend B selected. The managed STT relay does not expose diarization, so diarize is silently a no-op. Push the tenant policy toward A or accept unlabelled output.
- No pyannote node online. The required engine is audio.analyze.pyannote. Call GET /v1/admin/nodes and look for nodes with this engine in models_loaded.
- Speakers not yet enrolled on the running node. Check GET /speakers/{id}/warm. If the candidate node isn't in warm_node_ids, fan out via POST /speakers/{id}/warm.
- Unknown speaker. Segments from speakers not in the library get a per-session cluster id (spk_0, spk_1), not the enrolled label.
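The second and third checks can be combined. First find the online nodes that have audio.analyze.pyannote loaded, then see which of them are missing from the speaker's warm_node_ids. The sketch works on already-parsed JSON, and the node shape (objects with id, online, and models_loaded keys) is an assumption. Adapt the key names to what GET /v1/admin/nodes and GET /speakers/{id}/warm actually return.

```python
import re

PYANNOTE_ENGINE = "audio.analyze.pyannote"
# Per-session cluster ids look like spk_0, spk_1, ...
# (assumes enrolled labels never take this form).
_CLUSTER_ID = re.compile(r"^spk_\d+$")


def pyannote_nodes(nodes: list) -> list:
    """Ids of online nodes that report the pyannote engine in models_loaded."""
    return [
        n["id"] for n in nodes
        if n.get("online", False) and PYANNOTE_ENGINE in n.get("models_loaded", [])
    ]


def nodes_needing_warm(nodes: list, warm_node_ids: list) -> list:
    """Pyannote-capable nodes that don't yet hold this speaker's embedding."""
    warm = set(warm_node_ids)
    return [node_id for node_id in pyannote_nodes(nodes) if node_id not in warm]


def is_unenrolled_cluster(label: str) -> bool:
    """True for per-session cluster ids rather than enrolled speaker labels."""
    return bool(_CLUSTER_ID.match(label))
```

An empty pyannote_nodes result points to the second cause. A non-empty nodes_needing_warm result means POST /speakers/{id}/warm is worth running.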
Speaker enrollment returns SCAIECHO_SPEAKER_PREFLIGHT_FAILED
The reference audio didn't pass quality checks. The error.preflight field on the response details what failed — typically too short, too much silence, sample rate out of range, or signal-to-noise ratio too low. Re-record with a cleaner sample and resubmit.
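You can catch the obvious failures before uploading. This sketch reads a 16-bit PCM WAV with Python's standard wave module and reports duration, sample rate, and the fraction of near-silent 20 ms windows. It does not estimate signal-to-noise ratio. Every threshold is a hypothetical placeholder, not ScaiEcho's preflight limit; error.preflight remains the authority.

```python
import array
import io
import math
import sys
import wave

# Hypothetical limits; tune them to what error.preflight reports for you.
MIN_SECONDS = 5.0
ALLOWED_RATES = range(8_000, 48_001)
MAX_SILENT_FRACTION = 0.5
SILENCE_RMS = 300  # on the 16-bit scale (max 32767)


def preflight_wav(data: bytes) -> list:
    """Return human-readable problems; an empty list means it looks OK."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    if width != 2:
        return [f"expected 16-bit PCM, got {8 * width}-bit"]

    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    problems = []
    duration = len(samples) / (rate * channels)
    if duration < MIN_SECONDS:
        problems.append(f"too short: {duration:.1f}s < {MIN_SECONDS}s")
    if rate not in ALLOWED_RATES:
        problems.append(f"sample rate {rate} Hz out of range")

    # Split into 20 ms windows and count those whose RMS is near zero.
    window = max(1, int(rate * channels * 0.02))
    windows = [samples[i:i + window] for i in range(0, len(samples), window)]
    if windows:
        silent = sum(
            1 for w in windows
            if math.sqrt(sum(s * s for s in w) / len(w)) < SILENCE_RMS
        )
        if silent / len(windows) > MAX_SILENT_FRACTION:
            problems.append(f"too much silence: {silent}/{len(windows)} windows")
    return problems
```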
Enrollment succeeds but enrollment_status stays pending
There was no online audio.analyze.pyannote node when the request landed; the response includes a note saying so. Once a node comes online, run POST /speakers/{id}/warm to fan out. The status flips to ready when at least one node accepts.
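If you enroll while pyannote capacity is still coming up, you can retry the warm fan-out with exponential backoff until enrollment_status reports ready. In this sketch the two HTTP calls are passed in as callables, since the exact request and response bodies aren't documented here. The attempt count and delays are arbitrary.

```python
import time
from typing import Callable


def warm_until_ready(
    warm: Callable[[], None],
    enrollment_status: Callable[[], str],
    attempts: int = 6,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call warm (your POST /speakers/{id}/warm wrapper), then re-check the
    enrollment status, doubling the wait between attempts.

    Returns True once the status is "ready", False if attempts run out.
    """
    delay = base_delay
    for attempt in range(attempts):
        warm()
        if enrollment_status() == "ready":
            return True
        if attempt < attempts - 1:  # don't sleep after the final attempt
            sleep(delay)
            delay *= 2
    return False
```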
DELETE /speakers/{id} reports errors per node
The audit row was still written and the speaker is tombstoned, but one or more replicas couldn't evict the embedding. Check error_summary to see which nodes failed. The warm registry will eventually age out the stale entries; if you need them gone immediately, take the affected nodes offline so the registry drops them.
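A small helper can pick the failed replicas out of the delete response. This page names error_summary but not its layout. The sketch assumes it maps node ids to error messages, with an empty or null message meaning that node succeeded, so adjust it to the real shape.

```python
def failed_nodes(delete_response: dict) -> dict:
    """Return {node_id: error} for replicas that couldn't evict the embedding.

    Assumes error_summary is a mapping of node id -> error message (or a
    null/empty value for nodes that succeeded).
    """
    summary = delete_response.get("error_summary") or {}
    return {node: err for node, err in summary.items() if err}
```

The keys of the result are the nodes to take offline if the stale entries must go before the registry ages them out.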
WebRTC POST /sessions/{id}/offer returns 501
aiortc and av aren't installed in this deployment. Session creation worked because it just records intent — but the SDP exchange needs the actual peer-connection library. Operators install with pip install aiortc av and restart.
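Operators can confirm the optional WebRTC dependencies are importable before restarting. This sketch uses the standard library's importlib.util.find_spec, which reports whether a top-level module can be found without importing it. Run it with the same interpreter that serves ScaiEcho, otherwise the answer describes the wrong environment.

```python
import importlib.util


def missing_modules(names=("aiortc", "av")) -> list:
    """Return the module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("missing:", " ".join(missing), "-> pip install aiortc av, then restart")
    else:
        print("aiortc and av are importable")
```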
WebRTC session lost between offer and control WS
You'll see SCAIECHO_WEBRTC_SESSION_STATE_LOST (410) or the control WS closing with 4404 and a session_not_ready error message. The in-process peer state died — usually an operator restart of the HTTP server. Create a new session and resend the offer.
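Recovery is always the same: create a new session and send a new offer. A client can therefore wrap the handshake and retry when it sees either signal. In this sketch the HTTP and signalling steps are passed in as callables. SessionStateLost is a hypothetical exception your client would raise on a 410 SCAIECHO_WEBRTC_SESSION_STATE_LOST response or a 4404 close on the control WS.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class SessionStateLost(Exception):
    """Raise on HTTP 410 SCAIECHO_WEBRTC_SESSION_STATE_LOST or WS close 4404."""


def negotiate_with_recovery(
    create_session: Callable[[], str],
    exchange: Callable[[str], T],
    retries: int = 1,
) -> T:
    """Create a session and run the offer/control exchange on it.

    On lost state, start over with a fresh session; the old session id is
    never reused, because its in-process peer state is gone.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    for attempt in range(retries + 1):
        session_id = create_session()
        try:
            return exchange(session_id)
        except SessionStateLost:
            if attempt == retries:
                raise
    raise AssertionError("unreachable")
```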