---
summary: Common symptoms and what they usually mean.
title: Troubleshooting
path: troubleshooting
status: published
---

A short list of things that go wrong with ScaiEcho and how to fix them. If none of these match, check the request id in the response envelope and grep the ScaiGrid logs for it.

## `POST /transcribe` returns 502 `SCAIECHO_BACKEND_UNAVAILABLE`

Your tenant policy pinned the request to a backend that isn't usable.

- **Pinned to A, no online STT node.** Either wait for ops to bring an STT-loaded ScaiInfer node back online, or `PATCH /tenant-policy` to allow B.
- **Pinned to B, relay API key not configured.** Operator misconfiguration. Check the managed STT relay credentials in the deployment settings.
- **`backend_preference=prefer_self_hosted` against an AB policy with no A node.** The resolver falls through to B in that case, so if you still got the error, the request was actually pinned to A. Drop the preference to `any`.
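
The three cases above can be read as one resolution rule. The following is a minimal sketch of that rule for reasoning about a 502, not ScaiEcho's actual resolver: `resolve_backend`, `BackendUnavailable`, and every parameter name are hypothetical, and treating a preference purely as an ordering hint is an assumption.

```python
class BackendUnavailable(Exception):
    """Stands in for the 502 SCAIECHO_BACKEND_UNAVAILABLE response."""


def resolve_backend(allowed, a_node_online, b_key_configured, prefer="A"):
    """Pick the first allowed backend that is usable.

    allowed          -- backends the tenant policy permits, e.g. {"A"} or {"A", "B"}
    a_node_online    -- True if an STT-loaded node is online (backend A)
    b_key_configured -- True if the relay API key is configured (backend B)
    prefer           -- backend to try first (assumed to be an ordering hint only)
    """
    usable = {"A": a_node_online, "B": b_key_configured}
    order = [prefer] + [b for b in ("A", "B") if b != prefer]
    for backend in order:
        if backend in allowed and usable[backend]:
            return backend
    # Nothing the policy allows is usable: this is the 502 case.
    raise BackendUnavailable(f"no usable backend among {sorted(allowed)}")
```

Under these assumptions, an AB policy with no A node resolves to B, while an A-only policy with no A node raises regardless of whether B is healthy.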

## Async job stuck at `queued` for minutes

- **No arq worker running.** Check the `worker` mode service is up; without it, jobs never leave the queue.
- **Worker backlog.** Long backlogs are visible in the admin UI's transcription dashboard. Scale the worker pool out or raise the inline threshold so more requests bypass async.
- **S3 unreachable from the worker.** The worker fetches the staged audio before dispatch; check the S3 endpoint is reachable from worker pods.
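
For the S3 case, a plain TCP connect from inside a worker pod is often enough to separate a network problem from a credentials problem. This is a generic sketch using only the standard library; `endpoint_reachable` is a hypothetical helper and the endpoint URL in the usage note is a placeholder, not a ScaiEcho setting.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(endpoint_url, timeout=3.0):
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint_url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

Run it from the worker pod with your own endpoint, for example `endpoint_reachable("https://s3.example.internal:9000")`. A `True` result only proves the port is open; it does not validate bucket access.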

## Async job ends at `failed`

The `status_reason` field tells you why. Common reasons:

- "backend unavailable": same diagnosis as the sync 502 above; the worker hit it at dispatch time.
- "decode failure": the audio was not in a format the dispatcher could parse. This is often a `Content-Type` mismatch; re-upload with the correct MIME type.
- "quota exceeded": the tenant budget was hit between enqueue and dispatch. Raise the budget or wait for the next period.

## WebSocket closes with code `4401` immediately

Bearer token missing or invalid.

- Did you pass `?token=...` on the URL or set `Authorization: Bearer ...`? FastAPI dependencies don't fire before WS accept, so the route does its own check.
- Is the token an API key (`sgk_...`) or a JWT? Both are accepted; anything else is rejected.
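
Both checks can be done client-side before connecting. The sketch below classifies a token by shape only (it does not verify signatures or validity) and builds the query-string form of the URL. `token_kind`, `stream_url`, and the base URL are hypothetical; the `sgk_` prefix, the `?token=` parameter, and the `/stream/transcribe` path come from this page.

```python
from urllib.parse import urlencode


def token_kind(token):
    """Classify a token as "api_key", "jwt", or "unknown" by its shape."""
    if token.startswith("sgk_"):
        return "api_key"
    # A compact JWT is three non-empty segments joined by dots.
    parts = token.split(".")
    if len(parts) == 3 and all(parts):
        return "jwt"
    return "unknown"


def stream_url(base_url, token):
    """Build the streaming URL with the token passed as a query parameter."""
    return f"{base_url}/stream/transcribe?{urlencode({'token': token})}"
```

If `token_kind` returns `"unknown"`, expect a `4401`. A result of `"api_key"` or `"jwt"` only means the shape is plausible; the server still decides whether the token is valid.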

## WebSocket closes with code `4403`

You have a valid token but lack the required module permission.

- For `/stream/transcribe`: missing `scaiecho:transcribe`.
- After sending an `open` with `diarize: true`: missing `scaiecho:diarize`. The server sends an `error` JSON frame before closing.

Grant the permission via a custom role (see [Permissions](./reference/permissions)).
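
Clients that log the close code can turn it into a first diagnostic step. A small sketch covering the application close codes mentioned on this page; `CLOSE_CODE_HINTS` and `explain_close` are hypothetical names, and the hint text simply paraphrases the sections here.

```python
# Application close codes documented on this page, with the first thing to check.
CLOSE_CODE_HINTS = {
    4401: "Bearer token missing or invalid: pass ?token=... or an Authorization header.",
    4403: "Token is valid but lacks scaiecho:transcribe or scaiecho:diarize.",
    4404: "WebRTC session not ready: create a new session and resend the offer.",
}


def explain_close(code):
    """Return a troubleshooting hint for a WebSocket close code."""
    return CLOSE_CODE_HINTS.get(code, f"Close code {code} is not covered on this page.")
```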

## WebSocket opens but no `delta` frames arrive

- **Wrong `media_type`.** Default is `audio/wav`. If you're sending Opus, MP3, or anything else, declare it on the `open` frame.
- **Audio frames not being sent as binary.** The route treats text frames as control. If your client library coerces bytes to base64 strings, switch it to binary.
- **`chunk_seconds` too large.** On Backend B, the dispatcher accumulates this much audio before relaying; deltas don't appear until the first chunk is sent. Drop to `2.0` for snappier feedback at the cost of more API calls.
- **Backend A picked but the node disappeared.** The dispatcher may be stuck mid-bidi. Close the WS and reopen; the resolver picks again.
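
The first three causes are all client-side and can be guarded in code. The sketch below builds the `open` control frame as text and refuses to treat a string as audio. The field names `media_type`, `chunk_seconds`, and `diarize` come from this page; the `"type": "open"` discriminator key and both function names are assumptions, so check them against the streaming reference.

```python
import json


def build_open_frame(media_type="audio/wav", chunk_seconds=None, diarize=False):
    """Serialise the `open` control frame; it is sent as a text frame."""
    frame = {
        "type": "open",  # assumed discriminator key, not confirmed by this page
        "media_type": media_type,
        "diarize": diarize,
    }
    if chunk_seconds is not None:
        frame["chunk_seconds"] = chunk_seconds
    return json.dumps(frame)


def as_binary_frame(chunk):
    """Return audio as bytes, rejecting str so base64 text is never sent as audio."""
    if not isinstance(chunk, (bytes, bytearray, memoryview)):
        raise TypeError("audio must be sent as a binary frame, not text")
    return bytes(chunk)
```

Most client libraries choose the frame type from the Python type of the payload (the `websockets` package, for instance, sends `str` as text and `bytes` as binary), so passing every audio chunk through a guard like `as_binary_frame` catches accidental base64 strings early.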

## `diarize: true` requested but deltas have no `speaker_label`

Either the dispatcher can't run diarization right now, or the segment doesn't match any enrolled profile.

- **Backend B selected.** The managed STT relay does not expose diarization; `diarize` is silently a no-op. Push tenant policy toward A or accept unlabelled output.
- **No pyannote node online.** The required engine is `audio.analyze.pyannote`. Call `GET /v1/admin/nodes` and look for nodes that list this engine in `models_loaded`.
- **Speakers not yet enrolled on the running node.** Check `GET /speakers/{id}/warm`. If the candidate node isn't in `warm_node_ids`, fan out via `POST /speakers/{id}/warm`.
- **Unknown speaker.** Segments from speakers not in the library get a per-session cluster id (`spk_0`, `spk_1`), not the enrolled label.
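
These checks can be scripted once you have the JSON responses in hand. The sketch below assumes the node listing is a list of objects with a `models_loaded` array and that the warm status is an object with a `warm_node_ids` array; those shapes, the `id` key in the usage note, and the helper names are assumptions, while the field and engine names come from this page.

```python
import re

ENGINE = "audio.analyze.pyannote"
CLUSTER_ID = re.compile(r"spk_\d+")


def nodes_with_engine(nodes, engine=ENGINE):
    """Filter a node listing down to nodes that have the engine loaded."""
    return [n for n in nodes if engine in n.get("models_loaded", [])]


def needs_warm(node_id, warm_status):
    """True if the node is missing from the speaker's warm_node_ids."""
    return node_id not in warm_status.get("warm_node_ids", [])


def is_session_cluster(label):
    """True for per-session cluster ids such as spk_0, False for other labels."""
    return bool(CLUSTER_ID.fullmatch(label))
```

An empty result from `nodes_with_engine` points at the "no pyannote node" case, `needs_warm(...)` returning `True` points at the enrollment fan-out, and `is_session_cluster(label)` returning `True` means the speaker was not matched to the library.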

## Speaker enrollment returns `SCAIECHO_SPEAKER_PREFLIGHT_FAILED`

The reference audio didn't pass quality checks. The `error.preflight` field on the response details what failed — typically too short, too much silence, sample rate out of range, or signal-to-noise ratio too low. Re-record with a cleaner sample and resubmit.

## Enrollment succeeds but `enrollment_status` stays `pending`

No `audio.analyze.pyannote` node was online when the request landed. The response includes a `note` saying so. Once a node comes online, run `POST /speakers/{id}/warm` to fan out; the status flips to `ready` when at least one node accepts.

## `DELETE /speakers/{id}` reports errors per node

The audit row was still written and the speaker is tombstoned, but one or more replicas couldn't evict the embedding. Check `error_summary` for which nodes failed. The warm registry will eventually age the stale entries; if you need them gone immediately, take the affected nodes offline so the registry drops them.

## WebRTC `POST /sessions/{id}/offer` returns 501

`aiortc` and `av` aren't installed in this deployment. Session creation succeeded because it only records intent, but the SDP exchange needs the actual peer-connection library. Operators install both with `pip install aiortc av` and restart.
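
Operators can confirm which package is missing without importing either one. A minimal sketch using the standard library's `importlib.util.find_spec`; `missing_modules` is a hypothetical helper, and it must run in the same Python environment as the HTTP server to be meaningful.

```python
from importlib.util import find_spec


def missing_modules(names=("aiortc", "av")):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]
```

An empty list means both packages are importable, in which case a 501 more likely comes from a server process that was not restarted after the install.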

## WebRTC session lost between offer and control WS

You'll see `SCAIECHO_WEBRTC_SESSION_STATE_LOST` (410) or the control WS closing with `4404` and a `session_not_ready` error message. The in-process peer state died — usually an operator restart of the HTTP server. Create a new session and resend the offer.
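
Because the only recovery is to start over, clients can wrap session creation and the offer in a bounded retry. The sketch below is transport-agnostic: `create_session` and `exchange` are caller-supplied callables, and `SessionStateLost` is a hypothetical exception your transport code would raise on a 410 `SCAIECHO_WEBRTC_SESSION_STATE_LOST` or a `4404` close.

```python
class SessionStateLost(Exception):
    """Raised by the caller's transport on HTTP 410 or WebSocket close 4404."""


def negotiate(create_session, exchange, max_attempts=2):
    """Create a session and run the offer exchange, recreating on state loss.

    create_session() -> session id
    exchange(session_id) -> result of the offer/control exchange
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        session_id = create_session()  # always start from a fresh session
        try:
            return exchange(session_id)
        except SessionStateLost as exc:
            last_error = exc
    raise last_error
```

Keeping `max_attempts` small is deliberate: repeated state loss usually means the server is still restarting, and a tight loop will not help.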
