Troubleshooting

A short list of things that go wrong with ScaiEcho and how to fix them. If none of these match, check the request id in the response envelope and grep the ScaiGrid logs for it.
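
As a minimal sketch of that log search, assuming the ScaiGrid logs are plain text and that you have already copied the request id out of the response envelope (the function name and the sample id below are hypothetical):

```python
from typing import Iterable, Iterator


def lines_for_request(log_lines: Iterable[str], request_id: str) -> Iterator[str]:
    """Yield the log lines that mention the given request id."""
    for line in log_lines:
        if request_id in line:
            yield line.rstrip("\n")


# Works on any iterable of lines, including an open log file.
sample_log = [
    "10:00:01 request_id=req_123 POST /transcribe accepted\n",
    "10:00:01 request_id=req_999 GET /speakers\n",
    "10:00:02 request_id=req_123 dispatch failed\n",
]
for hit in lines_for_request(sample_log, "req_123"):
    print(hit)
```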

`POST /transcribe` returns 502 `SCAIECHO_BACKEND_UNAVAILABLE`

Your tenant policy pinned the request to a backend that isn't usable.

Pinned to A, no online STT node. Either wait for ops to bring an STT-loaded ScaiInfer node back online, or PATCH /tenant-policy to allow B.
Pinned to B, relay API key not configured. Operator misconfiguration. Check the managed STT relay credentials in the deployment settings.
backend_preference=prefer_self_hosted against an AB policy with no A node. The resolver falls through to B in that case, so if you still got the error, the request was actually pinned to A. Set the preference to any.
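
The fall-through behaviour is easier to reason about with a toy model. The sketch below is not the actual resolver: the policy strings "A", "B", and "AB", the function name, and the exception class are all hypothetical stand-ins. It only illustrates why an AB policy with a configured relay should not produce the 502.

```python
class BackendUnavailable(Exception):
    """Stands in for the 502 SCAIECHO_BACKEND_UNAVAILABLE response."""


def resolve_backend(policy: str, a_node_online: bool, b_relay_configured: bool) -> str:
    """Pick the first usable backend the policy allows, trying A before B."""
    usable = {"A": a_node_online, "B": b_relay_configured}
    # Assumption: self-hosted (A) is tried first, then the managed relay (B).
    for backend in ("A", "B"):
        if backend in policy and usable[backend]:
            return backend
    raise BackendUnavailable(f"no usable backend for policy {policy!r}")
```

With this model, `resolve_backend("AB", False, True)` returns `"B"`, while `resolve_backend("A", False, True)` raises. Only a policy that excludes the working backend ends in the error.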

Async job stuck at `queued` for minutes

No arq worker running. Check the worker mode service is up; without it, jobs never leave the queue.
Worker backlog. Long backlogs are visible in the admin UI's transcription dashboard. Scale the worker pool out or raise the inline threshold so more requests bypass async.
S3 unreachable from the worker. The worker fetches the staged audio before dispatch; check the S3 endpoint is reachable from worker pods.
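
For the S3 case, a quick TCP probe run from inside a worker pod can rule out basic network problems. This is a rough sketch using only the standard library; the endpoint URL is a placeholder, and a successful connection says nothing about credentials or bucket permissions.

```python
import socket
from urllib.parse import urlparse


def endpoint_host_port(endpoint_url: str) -> tuple[str, int]:
    """Split an endpoint URL into (host, port), defaulting the port by scheme."""
    parsed = urlparse(endpoint_url)
    if parsed.hostname is None:
        raise ValueError(f"no host in {endpoint_url!r}")
    default_port = 443 if parsed.scheme == "https" else 80
    return parsed.hostname, parsed.port or default_port


def can_reach(endpoint_url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach("https://s3.example.internal:9000")` from the worker pod (with your real endpoint substituted) returns `False` when the host cannot be resolved or the port is closed.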

Async job ends at `failed`

The status_reason field tells you why. Common reasons:

"backend unavailable" — same diagnosis as the sync 502 above; the worker hit it at dispatch time.
"decode failure" — the audio was not actually a format the dispatcher could parse. Often a Content-Type mismatch — re-upload with the correct mime.
"quota exceeded" — tenant budget was hit between enqueue and dispatch. Raise the budget or wait for the next period.
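
If you handle failed jobs in client code, the reasons above can be turned into a small lookup. This is a sketch only: it assumes status_reason is free text containing one of the three phrases listed, and the helper name is hypothetical.

```python
REMEDIATIONS = {
    "backend unavailable": "Check tenant policy and backend health, as for the sync 502.",
    "decode failure": "Re-upload the audio with the correct Content-Type.",
    "quota exceeded": "Raise the tenant budget or wait for the next period.",
}


def remediation_for(status_reason: str) -> str:
    """Map a job's status_reason onto the matching advice from this page."""
    reason = status_reason.lower()
    for needle, advice in REMEDIATIONS.items():
        if needle in reason:
            return advice
    return "Unrecognised status_reason; look up the request id in the logs."
```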

WebSocket closes with code `4401` immediately

Bearer token missing or invalid.

Did you pass ?token=... on the URL or set Authorization: Bearer ...? FastAPI dependencies don't fire before WS accept, so the route does its own check.
Is the token an API key (sgk_...) or a JWT? Both are accepted; anything else is rejected.
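
A client-side sanity check can catch the most common mistakes before connecting, such as passing the whole "Bearer ..." header value as the token. The sketch below only inspects the token's shape (an sgk_ prefix, or three dot-separated base64url segments as in a compact JWT); it does not validate anything, and the helper names and host are hypothetical.

```python
import re
from urllib.parse import urlencode

# Three non-empty base64url segments separated by dots (compact JWT shape).
_JWT_SHAPE = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def token_kind(token: str) -> str:
    """Classify a token by shape only: 'api_key', 'jwt', or 'unknown'."""
    if token.startswith("sgk_"):
        return "api_key"
    if _JWT_SHAPE.fullmatch(token):
        return "jwt"
    return "unknown"


def with_query_token(ws_url: str, token: str) -> str:
    """Append ?token=... to a WebSocket URL that has no query string yet."""
    return f"{ws_url}?{urlencode({'token': token})}"
```

For example, `token_kind("Bearer sgk_abc")` returns `"unknown"`, which points at a header value having been pasted where the bare token belongs.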

WebSocket closes with code `4403`

You have a valid token but lack the required module permission.

For /stream/transcribe: missing scaiecho:transcribe.
After sending an open with diarize: true: missing scaiecho:diarize. The server sends an error JSON frame before closing.

Grant the permission via a custom role (see Permissions).

WebSocket opens but no `delta` frames arrive

Wrong media_type. Default is audio/wav. If you're sending Opus, MP3, or anything else, declare it on the open frame.
Audio frames not being sent as binary. The route treats text frames as control. If your client library coerces bytes to base64 strings, switch it to binary.
chunk_seconds too large. On Backend B, the dispatcher accumulates this much audio before relaying; deltas don't appear until the first chunk is sent. Drop to 2.0 for snappier feedback at the cost of more API calls.
Backend A picked but the node disappeared. The dispatcher may be stuck mid-bidi. Close the WS and reopen; the resolver picks again.
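
The first three causes come down to what the client puts on the wire. Here is a sketch of the two frame kinds, under stated assumptions: the "type": "open" key and the idea that chunk_seconds travels on the open frame are guesses about the frame layout, and the default frame size is arbitrary. Only media_type, chunk_seconds, and diarize are names taken from this page.

```python
import json
from typing import Iterator


def build_open_frame(media_type: str = "audio/wav",
                     chunk_seconds: float = 2.0,
                     diarize: bool = False) -> str:
    """Build the control frame as JSON text; send it as a text frame."""
    return json.dumps({
        "type": "open",  # assumed discriminator key, not confirmed by this page
        "media_type": media_type,
        "chunk_seconds": chunk_seconds,
        "diarize": diarize,
    })


def iter_audio_frames(audio: bytes, frame_bytes: int = 32_000) -> Iterator[bytes]:
    """Slice raw audio into bytes objects; send each one as a binary frame."""
    for start in range(0, len(audio), frame_bytes):
        yield audio[start:start + frame_bytes]
```

The point of keeping the two helpers separate is that the open frame is always a `str` and audio is always `bytes`, so a client library that picks the frame type from the Python type sends each one correctly.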

Diarize=true requested but deltas have no `speaker_label`

Either the dispatcher can't run diarization right now, or the segment doesn't match any enrolled profile.

Backend B selected. The managed STT relay does not expose diarization; diarize is silently a no-op. Push tenant policy toward A or accept unlabelled output.
No pyannote node online. Required engine is audio.analyze.pyannote. GET /v1/admin/nodes and look for nodes with this engine in models_loaded.
Speakers not yet enrolled on the running node. Check GET /speakers/{id}/warm. If the candidate node isn't in warm_node_ids, fan out via POST /speakers/{id}/warm.
Unknown speaker. Segments from speakers not in the library get a per-session cluster id (spk_0, spk_1), not the enrolled label.
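
When triaging, it helps to separate "no label at all" from "a per-session cluster id" from "an enrolled label". The sketch below assumes deltas are parsed into dicts, that cluster ids always look like spk_ followed by digits, and that enrolled labels never do; the helper names are hypothetical.

```python
import re
from typing import Iterable

_CLUSTER_ID = re.compile(r"spk_\d+")


def label_kind(delta: dict) -> str:
    """Return 'missing', 'session_cluster', or 'enrolled' for one delta."""
    label = delta.get("speaker_label")
    if not label:
        return "missing"
    if _CLUSTER_ID.fullmatch(label):
        return "session_cluster"
    return "enrolled"


def needs_warm(warm_node_ids: Iterable[str], candidate_node_id: str) -> bool:
    """True when the candidate node is absent from the speaker's warm list."""
    return candidate_node_id not in set(warm_node_ids)
```

A run of `"missing"` results points at the first two causes (Backend B or no pyannote node), while `"session_cluster"` means diarization ran but the speaker was not matched to an enrolled profile.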

Speaker enrollment returns `SCAIECHO_SPEAKER_PREFLIGHT_FAILED`

The reference audio didn't pass quality checks. The error.preflight field on the response details what failed — typically too short, too much silence, sample rate out of range, or signal-to-noise ratio too low. Re-record with a cleaner sample and resubmit.
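
You can catch the two simplest failures locally before uploading. The following sketch checks duration and sample rate of a WAV file with the standard library; the thresholds are placeholders chosen for illustration, not the values the server enforces, and silence or signal-to-noise checks are out of scope here.

```python
import io
import wave


def wav_preflight(data: bytes, min_seconds: float = 3.0,
                  min_rate: int = 8_000, max_rate: int = 48_000) -> list[str]:
    """Return a list of problems found; an empty list means both checks passed."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
    problems = []
    # Hypothetical thresholds; the real limits come from error.preflight.
    if seconds < min_seconds:
        problems.append(f"too short: {seconds:.1f}s < {min_seconds:.1f}s")
    if not min_rate <= rate <= max_rate:
        problems.append(f"sample rate out of range: {rate} Hz")
    return problems
```

Treat a clean result as "worth submitting" only. The server's own preflight remains the authority.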

Enrollment succeeds but `enrollment_status` stays `pending`

No audio.analyze.pyannote node was online when the request landed. The response includes a note saying so. Once a node comes online, run POST /speakers/{id}/warm to fan out; the status flips to ready when at least one node accepts.

`DELETE /speakers/{id}` reports errors per node

The audit row was still written and the speaker is tombstoned, but one or more replicas couldn't evict the embedding. Check error_summary for which nodes failed. The warm registry will eventually age the stale entries; if you need them gone immediately, take the affected nodes offline so the registry drops them.

WebRTC `POST /sessions/{id}/offer` returns 501

aiortc and av aren't installed in this deployment. Session creation worked because it only records intent, but the SDP exchange needs the actual peer-connection library. An operator has to run pip install aiortc av and restart.
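
Operators can confirm which of the two packages is missing from the deployment's Python environment without importing them. A small sketch using the standard library (the function name is hypothetical):

```python
from importlib.util import find_spec


def missing_modules(names: tuple[str, ...] = ("aiortc", "av")) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Run it with the same interpreter that serves the HTTP routes, since a package installed into a different environment will not help.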

WebRTC session lost between offer and control WS

You'll see SCAIECHO_WEBRTC_SESSION_STATE_LOST (410) or the control WS closing with 4404 and a session_not_ready error message. The in-process peer state died — usually an operator restart of the HTTP server. Create a new session and resend the offer.
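
The WebSocket close codes mentioned on this page can be gathered into one client-side lookup. This is a convenience sketch, not an exhaustive list of codes the server may send, and the helper name is hypothetical.

```python
CLOSE_CODES = {
    4401: "Bearer token missing or invalid; pass ?token=... or an Authorization header.",
    4403: "Token is valid but lacks the required module permission.",
    4404: "WebRTC session state lost; create a new session and resend the offer.",
}


def explain_close(code: int) -> str:
    """Translate a WebSocket close code into the advice given on this page."""
    return CLOSE_CODES.get(code, f"Close code {code} is not covered on this page.")
```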
