---
summary: "Every ScaiEcho endpoint \u2014 transcribe, streaming, WebRTC, speaker library,\
  \ tenant policy."
title: API reference
path: reference/api
status: published
---

All endpoints are mounted at `/v1/modules/scaiecho/` and authenticate with the standard ScaiGrid bearer token. WebSocket routes accept the same bearer from the `token=` query parameter or the `Authorization` header. Responses use ScaiGrid's standard envelope (`{ "data": ... }` for success, `{ "error": ... }` for failures).
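
The conventions above can be wrapped in a few client-side helpers. This is a minimal sketch, not an official client: the names `auth_headers`, `endpoint`, `unwrap`, and `ScaiEchoError` are hypothetical, and the `Bearer` header scheme is an assumption based on the phrase "bearer token".

```python
from typing import Any

BASE_PATH = "/v1/modules/scaiecho"  # mount point stated on this page


class ScaiEchoError(Exception):
    """Hypothetical exception raised when a response carries the `error` envelope."""

    def __init__(self, error: Any) -> None:
        super().__init__(str(error))
        self.error = error


def auth_headers(token: str) -> dict[str, str]:
    """Headers for an HTTP call (assumes the usual `Bearer <token>` scheme)."""
    return {"Authorization": f"Bearer {token}"}


def endpoint(path: str) -> str:
    """Join a route from this page onto the module mount point."""
    return f"{BASE_PATH}/{path.lstrip('/')}"


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return the `data` payload, or raise if the envelope carries `error`."""
    if "error" in envelope:
        raise ScaiEchoError(envelope["error"])
    return envelope["data"]
```

For example, `unwrap({"data": {"status": "ok"}})` returns the inner dictionary, while an `{"error": ...}` envelope raises `ScaiEchoError` with the error body attached.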

## Health

### `GET /healthz`

Liveness check. Always returns `{ "status": "ok", "module": "scaiecho" }`. Unauthenticated.

### `GET /readyz`

Readiness check. Returns `status`, `module`, and the current rollout `phase`. The platform health aggregator probes this. Unauthenticated.

## Batch transcription

### `POST /transcribe`

Multipart upload. Returns the transcript inline for short audio, or `202 Accepted` with a `job_id` for long audio.

| Form field | Required | Notes |
|---|---|---|
| `file` | yes | Audio file (`wav`, `mp3`, `flac`, `ogg`, `m4a`). |
| `language_hint` | no | ISO 639-1 code (2 chars). Helps the model with low-resource languages. |
| `backend_preference` | no | `prefer_self_hosted`, `prefer_relay`, or `any` (default). |
| `temperature` | no | `0.0`–`1.0`. Some models honour it for hesitancy / filler handling. |
| `force_async` | no | Force the async path even when under the byte threshold. |

Inline response (`200 OK`):

```json
{
  "data": {
    "job_id": "...",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 12500,
    "audio_bytes": 400000
  }
}
```

Async response (`202 Accepted`):

```json
{
  "data": {
    "job_id": "...",
    "status": "queued",
    "audio_bytes": 14580000,
    "note": "Long-form transcribe runs asynchronously. Poll GET /v1/modules/scaiecho/transcribe/jobs/{id}"
  }
}
```

Permission: `scaiecho:transcribe`.
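
A client usually has to validate the optional form fields and then branch on `200` versus `202`. The sketch below shows one way to do both without tying itself to an HTTP library. The function names are hypothetical, and encoding `force_async` as the string `"true"` is an assumption about how the form field is parsed.

```python
from typing import Any, Optional

BACKEND_PREFERENCES = {"prefer_self_hosted", "prefer_relay", "any"}


def transcribe_fields(
    language_hint: Optional[str] = None,
    backend_preference: str = "any",
    temperature: Optional[float] = None,
    force_async: bool = False,
) -> dict[str, str]:
    """Build the optional form fields; the `file` part is attached separately."""
    if backend_preference not in BACKEND_PREFERENCES:
        raise ValueError(f"invalid backend_preference: {backend_preference!r}")
    fields = {"backend_preference": backend_preference}
    if language_hint is not None:
        if len(language_hint) != 2:
            raise ValueError("language_hint must be a 2-character ISO 639-1 code")
        fields["language_hint"] = language_hint
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be within 0.0 to 1.0")
        fields["temperature"] = str(temperature)
    if force_async:
        fields["force_async"] = "true"  # assumed boolean encoding
    return fields


def interpret_transcribe_response(status_code: int, envelope: dict[str, Any]) -> dict[str, Any]:
    """Normalise the inline (200) and async (202) responses into one shape."""
    data = envelope["data"]
    if status_code == 200:
        return {"done": True, "job_id": data["job_id"], "transcript": data["transcript"]}
    if status_code == 202:
        return {"done": False, "job_id": data["job_id"], "transcript": None}
    raise ValueError(f"unexpected status {status_code}")
```

When `done` is false, the returned `job_id` is what gets polled at `GET /transcribe/jobs/{job_id}`.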

### `GET /transcribe/jobs/{job_id}`

Poll an async transcribe job. When `status == "completed"`, the transcript is inline on the response: STT output is plain text, so no S3 fetch is needed.

```json
{
  "data": {
    "job_id": "...",
    "status": "completed",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 612000,
    "audio_bytes": 14580000,
    "created_at": "...",
    "completed_at": "...",
    "status_reason": null
  }
}
```

Status values: `queued`, `running`, `completed`, `failed`. Cross-tenant or cross-user reads return `404` (deliberate, to avoid leaking job existence).

Permission: `scaiecho:transcribe`.
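
Polling can be sketched as a loop that stops on either terminal status. The `fetch_job` callable is a hypothetical stand-in for whatever performs the `GET` and unwraps `data`; the interval and attempt limit are illustrative defaults, not documented values.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the job is `completed` or `failed`, then return its payload."""
    for attempt in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Still `queued` or `running`: wait before the next poll.
        if attempt + 1 < max_attempts:
            sleep(interval_s)
    raise TimeoutError(f"job {job_id} not terminal after {max_attempts} polls")
```

Injecting `sleep` keeps the loop testable; a caller inspecting the result should still check `status` and `status_reason`, since `failed` is returned rather than raised.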

## WebSocket streaming

### `WS /stream/transcribe`

Real-time STT over WebSocket. Client opens the WS with `?token=...` or an `Authorization` header, sends an `open` JSON frame, pushes binary audio frames, then `close`.

Open frame (client → server):

```json
{
  "type": "open",
  "language_hint": "en",
  "media_type": "audio/wav",
  "backend_preference": "any",
  "chunk_seconds": 5.0,
  "sample_rate": 16000,
  "diarize": false
}
```

Server frames:

| Type | Payload |
|---|---|
| `ready` | `{ "backend_used": "A" }` (the value is `"A"` or `"B"`) |
| `delta` | `{ "text": "...", "is_final": false, "start": 0.0, "end": 4.8, "confidence": 0.0, "speaker_label": "..." }` |
| `closed` | `{ "audio_bytes": 80000 }` |
| `error` | `{ "code": "...", "message": "..." }` |

The `speaker_label` field is omitted when no diarization label is available. Close codes: `4401` (unauthorized), `4403` (forbidden), `4400` (bad request), `4502` (backend unavailable), `4500` (server error).

Permission: `scaiecho:transcribe`. Diarization additionally requires `scaiecho:diarize`.
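
The frame handling above can be sketched independently of any WebSocket library. This example assumes server frames are flat JSON objects with a `type` key next to the payload fields (mirroring the `open` frame), and that a non-final `delta` is an interim result later superseded by a final one; neither detail is confirmed on this page. `StreamState`, `open_frame`, and `describe_close` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Optional

# Close codes listed in this section.
CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def open_frame(language_hint: str = "en", sample_rate: int = 16000, diarize: bool = False) -> str:
    """Serialise an `open` frame using the field names from the example above."""
    return json.dumps({
        "type": "open",
        "language_hint": language_hint,
        "media_type": "audio/wav",
        "backend_preference": "any",
        "chunk_seconds": 5.0,
        "sample_rate": sample_rate,
        "diarize": diarize,
    })


@dataclass
class StreamState:
    backend_used: Optional[str] = None
    finals: list[str] = field(default_factory=list)
    interim: str = ""
    audio_bytes: Optional[int] = None
    error: Optional[dict[str, Any]] = None

    def handle(self, raw: str) -> None:
        """Apply one server text frame to the accumulated state."""
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "ready":
            self.backend_used = frame["backend_used"]
        elif kind == "delta":
            if frame["is_final"]:
                self.finals.append(frame["text"])
                self.interim = ""
            else:
                self.interim = frame["text"]  # assumed to be superseded later
        elif kind == "closed":
            self.audio_bytes = frame["audio_bytes"]
        elif kind == "error":
            self.error = {"code": frame["code"], "message": frame["message"]}

    @property
    def transcript(self) -> str:
        return " ".join(self.finals)


def describe_close(code: int) -> str:
    """Map a WebSocket close code to the meaning documented above."""
    return CLOSE_CODES.get(code, "unknown")
```

A receive loop would call `state.handle(message)` for each text frame and `describe_close(code)` once the socket closes.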

## WebRTC streaming

> **Status:** signalling and lifecycle ship end-to-end. The audio-decode plane (`av.AudioFrame` → backend dispatcher) is still in progress. Session creation, SDP exchange, ICE trickle, and control WebSocket attachment all work, but no audio reaches the backend yet, so no transcript deltas come back. Use the WebSocket streaming endpoints for production until this caveat is removed.

All WebRTC endpoints are under `/stream/transcribe/webrtc/`.

### `POST /stream/transcribe/webrtc/sessions`

Create a session. Returns `201 Created`:

```json
{
  "data": {
    "session_id": "...",
    "ice_servers": [{ "urls": ["stun:..."] }],
    "expires_at": "...",
    "control_ws_url": "/v1/modules/scaiecho/stream/transcribe/webrtc/sessions/{id}/control"
  }
}
```

Body fields: `language_hint` (2-char), `media_type` (default `audio/wav`), `backend_preference`, `chunk_seconds` (0.5–60.0), `sample_rate` (8000–48000), `ice_servers` (optional tenant-supplied ICE config).

Permission: `scaiecho:transcribe`.
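
The documented ranges can be checked client-side before the session is created. This is a sketch with a hypothetical function name; only `media_type` has a documented default, so every other field is omitted from the body unless the caller supplies it.

```python
from typing import Any, Optional


def webrtc_session_body(
    language_hint: Optional[str] = None,
    media_type: str = "audio/wav",
    backend_preference: Optional[str] = None,
    chunk_seconds: Optional[float] = None,
    sample_rate: Optional[int] = None,
    ice_servers: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Build the JSON body for POST /stream/transcribe/webrtc/sessions."""
    body: dict[str, Any] = {"media_type": media_type}
    if language_hint is not None:
        if len(language_hint) != 2:
            raise ValueError("language_hint must be 2 characters")
        body["language_hint"] = language_hint
    if backend_preference is not None:
        body["backend_preference"] = backend_preference
    if chunk_seconds is not None:
        if not 0.5 <= chunk_seconds <= 60.0:
            raise ValueError("chunk_seconds must be within 0.5 to 60.0")
        body["chunk_seconds"] = chunk_seconds
    if sample_rate is not None:
        if not 8000 <= sample_rate <= 48000:
            raise ValueError("sample_rate must be within 8000 to 48000")
        body["sample_rate"] = sample_rate
    if ice_servers is not None:
        body["ice_servers"] = ice_servers  # tenant-supplied ICE config
    return body
```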

### `POST /stream/transcribe/webrtc/sessions/{id}/offer`

Apply the client's SDP offer and return the server's answer. Request body:

```json
{ "sdp": "v=0...", "type": "offer" }
```

Returns `{ "data": { "sdp": "...", "type": "answer" } }`. Errors: `SCAIECHO_WEBRTC_UNAVAILABLE` (501, aiortc not installed), `SCAIECHO_WEBRTC_SESSION_NOT_FOUND` (404), `SCAIECHO_WEBRTC_SESSION_STATE_LOST` (410).

### `POST /stream/transcribe/webrtc/sessions/{id}/ice-candidates`

Trickle an ICE candidate. Body: `{ "candidate": "...", "sdp_mid": "...", "sdp_mline_index": 0 }`. Returns `204 No Content`.
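
Browser WebRTC stacks serialise candidates with camelCase keys (`sdpMid`, `sdpMLineIndex`), whereas this endpoint uses snake_case. A small mapping sketch follows; the function name is hypothetical, and passing `None` through for a missing field is an assumption about what the server tolerates.

```python
from typing import Any


def ice_candidate_body(browser_candidate: dict[str, Any]) -> dict[str, Any]:
    """Convert an RTCIceCandidate-style dict into this endpoint's body shape."""
    return {
        "candidate": browser_candidate["candidate"],
        "sdp_mid": browser_candidate.get("sdpMid"),
        "sdp_mline_index": browser_candidate.get("sdpMLineIndex"),
    }
```

Extra keys such as `usernameFragment` are dropped, since the body documented above does not list them.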

### `DELETE /stream/transcribe/webrtc/sessions/{id}`

Tear down the peer connection and mark the session closed. Returns `204 No Content`.

### `WS /stream/transcribe/webrtc/sessions/{id}/control`

Control plane for the WebRTC session. Server emits `delta` JSON frames as the dispatcher produces transcript records. The client can send `{"type": "close"}` to tear the session down early.

## Speakers

### `GET /speakers`

List speakers visible to the caller.

Query parameters: `language` (ISO 639-1), `scope` (`global`, `tenant`, `user`), `enrollment_status` (`pending`, `ready`, `failed`, `evicted`), `limit` (1–200, default 50).

Permission: `scaiecho:enroll`.
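
The query parameters can be validated and encoded with the standard library. This is a sketch with a hypothetical function name; it always sends `limit` explicitly, which is a choice made here rather than a requirement.

```python
from typing import Optional
from urllib.parse import urlencode

SCOPES = {"global", "tenant", "user"}
ENROLLMENT_STATUSES = {"pending", "ready", "failed", "evicted"}


def speakers_query(
    language: Optional[str] = None,
    scope: Optional[str] = None,
    enrollment_status: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Return the query string for GET /speakers, rejecting undocumented values."""
    if scope is not None and scope not in SCOPES:
        raise ValueError(f"invalid scope: {scope!r}")
    if enrollment_status is not None and enrollment_status not in ENROLLMENT_STATUSES:
        raise ValueError(f"invalid enrollment_status: {enrollment_status!r}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be within 1 to 200")
    params: list[tuple[str, str]] = []
    if language is not None:
        params.append(("language", language))
    if scope is not None:
        params.append(("scope", scope))
    if enrollment_status is not None:
        params.append(("enrollment_status", enrollment_status))
    params.append(("limit", str(limit)))
    return urlencode(params)
```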

### `GET /speakers/{speaker_id}`

Fetch one speaker. Permission: `scaiecho:enroll`.

### `POST /speakers`

Enroll a speaker. Multipart upload:

| Form field | Required | Notes |
|---|---|---|
| `display_name` | yes | Human-readable name. |
| `language_primary` | yes | ISO 639-1. |
| `description` | no | Free text. |
| `consent_user_full_name` | yes | The speaker's legal name. |
| `consent_stated_purpose` | yes | Why this enrollment exists. |
| `consent_text` | yes | The exact text the speaker agreed to. |
| `reference` | yes | Reference audio file. |
| `consent` | yes | Consent recording. |

Returns `201 Created` with the new speaker profile plus a `preflight` block, an `enrolled_on` node-id list, and optionally `enroll_errors` per node or a `note` if no pyannote node is online. Permission: `scaiecho:enroll`.
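
Because enrollment needs five required text fields and two required file parts, a pre-upload check can save a round trip. The sketch below only verifies presence; the helper name is hypothetical, and it does not reproduce the server's audio quality or consent-size checks.

```python
from typing import Iterable, Mapping

REQUIRED_TEXT_FIELDS = (
    "display_name",
    "language_primary",
    "consent_user_full_name",
    "consent_stated_purpose",
    "consent_text",
)
REQUIRED_FILE_FIELDS = ("reference", "consent")


def missing_enrollment_fields(text_fields: Mapping[str, str], file_fields: Iterable[str]) -> list[str]:
    """List required form fields that are absent or blank, in table order."""
    present_files = set(file_fields)
    missing = [name for name in REQUIRED_TEXT_FIELDS if not text_fields.get(name, "").strip()]
    missing += [name for name in REQUIRED_FILE_FIELDS if name not in present_files]
    return missing
```

An empty list means the form is structurally complete; `description` is optional and therefore never reported.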

### `PATCH /speakers/{speaker_id}`

Update mutable fields. Body: `{ "display_name": "...", "description": "..." }`. Scope and ownership are locked at create. Permission: `scaiecho:enroll`.

### `DELETE /speakers/{speaker_id}`

Erase a speaker (GDPR Art. 17 fan-out). Deletes the blobs, writes an audit row, tombstones the speaker row, and evicts the embedding from every replica.

```json
{
  "data": {
    "audit_id": "...",
    "speaker_id": "...",
    "blob_bytes_deleted": 480000,
    "error_summary": null,
    "completed_at": "..."
  }
}
```

Permission: `scaiecho:enroll`.

### `GET /speakers/{speaker_id}/warm`

Inspect current enrollment fan-out.

```json
{
  "data": {
    "speaker_id": "...",
    "warm_node_ids": ["..."],
    "candidate_node_ids": ["..."],
    "stale_node_ids": []
  }
}
```

Permission: `scaiecho:enroll`.

### `POST /speakers/{speaker_id}/warm`

Proactive re-enrollment fan-out. Body: `{ "node_ids": [] }` (empty for all candidates). Returns per-node outcomes plus `skipped_not_candidate` for any requested ids that weren't pyannote candidates. Permission: `scaiecho:enroll`.
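
One way to combine the two warm endpoints is to read the fan-out state and request warming only where it looks necessary. The sketch below rests on an assumed reading of the fields: `candidate_node_ids` are eligible nodes, `warm_node_ids` already hold the embedding, and `stale_node_ids` hold an outdated copy. That interpretation, and the function name, are not confirmed by this page.

```python
from typing import Any, Optional


def plan_warm(status: dict[str, Any]) -> Optional[dict[str, list[str]]]:
    """Return a POST body targeting cold or stale candidates, or None if none exist."""
    warm = set(status["warm_node_ids"])
    stale = set(status["stale_node_ids"])
    targets = [
        node_id
        for node_id in status["candidate_node_ids"]
        if node_id not in warm or node_id in stale
    ]
    # An empty `node_ids` list means "all candidates" to the API, so an empty
    # plan must be signalled as None rather than sent as an empty list.
    return {"node_ids": targets} if targets else None
```

Returning `None` instead of an empty list matters here: posting `{ "node_ids": [] }` would re-warm every candidate rather than none.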

## Tenant policy

### `GET /tenant-policy`

Read the resolved policy. Lazy-creates a row from tier defaults on first read. Permission: `scaiecho:admin`.

```json
{
  "data": {
    "tenant_id": "...",
    "allowed_backends": "AB",
    "default_backend": "B",
    "created_at": "...",
    "updated_at": "..."
  }
}
```

### `PATCH /tenant-policy`

Update allowed set and/or default backend.

```json
{ "allowed_backends": "AB", "default_backend": "A" }
```

`allowed_backends` matches `^(A|B|AB|BA)$`. `default_backend` is `A` or `B` and must be in the allowed set. Permission: `scaiecho:admin`.
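
The two validation rules can be mirrored client-side before sending a `PATCH`. This sketch checks a complete resulting policy; the function name is hypothetical, and it uses `fullmatch` so that a trailing newline cannot slip past the `$` anchor.

```python
import re

ALLOWED_BACKENDS_RE = re.compile(r"^(A|B|AB|BA)$")


def policy_problems(allowed_backends: str, default_backend: str) -> list[str]:
    """Return human-readable validation problems; an empty list means valid."""
    problems: list[str] = []
    if not ALLOWED_BACKENDS_RE.fullmatch(allowed_backends):
        problems.append("allowed_backends must be one of A, B, AB, BA")
    if default_backend not in ("A", "B"):
        problems.append("default_backend must be A or B")
    elif default_backend not in allowed_backends:
        problems.append("default_backend must be in the allowed set")
    return problems
```

For instance, `policy_problems("A", "B")` reports that the default is outside the allowed set, which is the case the server rejects with `SCAIECHO_TENANT_POLICY_INVALID`.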

## Errors

ScaiEcho returns ScaiGrid's standard error envelope. Specific codes:

| Code | HTTP | Meaning |
|---|---|---|
| `SCAIECHO_EMPTY_AUDIO` | 400 | The uploaded file was zero bytes. |
| `SCAIECHO_BAD_BACKEND_PREFERENCE` | 400 | `backend_preference` was not one of the three valid values. |
| `SCAIECHO_TENANT_POLICY_INVALID` | 400 | `allowed_backends` or `default_backend` didn't validate (e.g. default not in allowed). |
| `SCAIECHO_SPEAKER_PREFLIGHT_FAILED` | 400 | Reference audio failed quality checks; details in `error.preflight`. |
| `SCAIECHO_CONSENT_INVALID` | 400 | Consent recording missing or below the minimum byte threshold. |
| `SCAIECHO_FORBIDDEN` | 403 | Permission check failed inside a streaming open. |
| `SCAIECHO_SPEAKER_ACCESS_DENIED` | 403 | Caller cannot operate on this speaker. |
| `SCAIECHO_JOB_NOT_FOUND` | 404 | Async transcribe job id doesn't exist or isn't visible. |
| `SCAIECHO_SPEAKER_NOT_FOUND` | 404 | Speaker id doesn't exist or isn't visible. |
| `SCAIECHO_WEBRTC_SESSION_NOT_FOUND` | 404 | WebRTC session does not exist or has expired. |
| `SCAIECHO_SPEAKER_NO_REFERENCE` | 409 | Speaker has no reference URI — can't warm without re-intake. |
| `SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING` | 409 | Speaker isn't in a state that supports fan-out. |
| `SCAIECHO_WEBRTC_SESSION_STATE_LOST` | 410 | Session state was lost (server restart). Create a new session. |
| `SCAIECHO_NO_USABLE_BACKEND` | 500 | Tenant policy resolved to an empty allowed set. |
| `SCAIECHO_WEBRTC_UNAVAILABLE` | 501 | WebRTC support requires aiortc + av in the deployment. |
| `SCAIECHO_BACKEND_UNAVAILABLE` | 502 | Selected backend isn't usable right now. |
| `SCAIECHO_STREAM_FAILED` | — | Dispatcher errored mid-stream. Sent as a WS `error` frame. |
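
Client code often wants the table above as a lookup. The sketch below copies the codes and statuses verbatim and adds a coarse classification; the `client`/`server`/`stream` grouping and the function name are illustrative, not part of the API.

```python
from typing import Optional

# Code -> HTTP status, copied from the table; None marks the WS-only code.
ERROR_HTTP_STATUS: dict[str, Optional[int]] = {
    "SCAIECHO_EMPTY_AUDIO": 400,
    "SCAIECHO_BAD_BACKEND_PREFERENCE": 400,
    "SCAIECHO_TENANT_POLICY_INVALID": 400,
    "SCAIECHO_SPEAKER_PREFLIGHT_FAILED": 400,
    "SCAIECHO_CONSENT_INVALID": 400,
    "SCAIECHO_FORBIDDEN": 403,
    "SCAIECHO_SPEAKER_ACCESS_DENIED": 403,
    "SCAIECHO_JOB_NOT_FOUND": 404,
    "SCAIECHO_SPEAKER_NOT_FOUND": 404,
    "SCAIECHO_WEBRTC_SESSION_NOT_FOUND": 404,
    "SCAIECHO_SPEAKER_NO_REFERENCE": 409,
    "SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING": 409,
    "SCAIECHO_WEBRTC_SESSION_STATE_LOST": 410,
    "SCAIECHO_NO_USABLE_BACKEND": 500,
    "SCAIECHO_WEBRTC_UNAVAILABLE": 501,
    "SCAIECHO_BACKEND_UNAVAILABLE": 502,
    "SCAIECHO_STREAM_FAILED": None,
}


def error_kind(code: str) -> str:
    """Classify a ScaiEcho error code as client, server, stream, or unknown."""
    if code not in ERROR_HTTP_STATUS:
        return "unknown"
    status = ERROR_HTTP_STATUS[code]
    if status is None:
        return "stream"  # delivered as a WS `error` frame, no HTTP status
    return "client" if status < 500 else "server"
```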

## MCP tool

ScaiEcho registers one tool with the MCP catalog:

- `scaiecho.transcribe` — base64 audio in, transcript and metadata out. Backend selection is hidden from MCP callers; tenant policy decides A or B. Required permission: `scaiecho:transcribe`.
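
Since the tool takes base64 audio, a caller has to encode raw bytes before invoking it. The sketch below builds the parameters of an MCP `tools/call` request; the argument key `audio_base64` is a placeholder, because this page does not list the tool's input schema.

```python
import base64
from typing import Any


def transcribe_tool_call(audio: bytes) -> dict[str, Any]:
    """Build `tools/call` params for scaiecho.transcribe (argument key assumed)."""
    encoded = base64.b64encode(audio).decode("ascii")
    return {
        "name": "scaiecho.transcribe",
        "arguments": {"audio_base64": encoded},  # hypothetical argument name
    }
```

The tool's published input schema in the MCP catalog is the authority for the real argument names.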
