API reference

All endpoints are mounted at /v1/modules/scaiecho/. Unless an endpoint is marked unauthenticated, it authenticates with the standard ScaiGrid bearer token. WebSocket routes accept the same bearer token in either the token= query parameter or the Authorization header. Responses use ScaiGrid's standard envelope: { "data": ... } for success, { "error": ... } for failures.
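As a rough illustration of the two authentication styles, the sketch below builds a request URL with an Authorization header and a WebSocket URL with the token= query parameter. The host name, the helper names, and the conventional "Bearer <token>" header scheme are assumptions made for the example, not something this reference specifies.

```python
from urllib.parse import urlencode

BASE_PATH = "/v1/modules/scaiecho"  # mount point stated in this reference


def http_request_parts(host: str, route: str, token: str) -> tuple[str, dict[str, str]]:
    """Return (url, headers) for an HTTP call that sends the bearer in a header."""
    # Assumption: the header uses the conventional "Bearer <token>" scheme.
    url = f"https://{host}{BASE_PATH}{route}"
    return url, {"Authorization": f"Bearer {token}"}


def ws_url(host: str, route: str, token: str) -> str:
    """Return a WebSocket URL that carries the bearer in the token= query parameter."""
    return f"wss://{host}{BASE_PATH}{route}?{urlencode({'token': token})}"
```

Passing the token through urlencode keeps characters such as spaces or "+" from corrupting the query string.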

Health

`GET /healthz`

Liveness check. Always returns { "status": "ok", "module": "scaiecho" }. Unauthenticated.

`GET /readyz`

Readiness check. Returns status, module, and the current rollout phase. The platform health aggregator probes this. Unauthenticated.

Batch transcription

`POST /transcribe`

Multipart upload. Short audio returns the transcript inline (200 OK). Long audio, meaning an upload over the byte threshold, returns 202 Accepted with a job_id to poll.

Form field	Required	Notes
`file`	yes	Audio file (`wav`, `mp3`, `flac`, `ogg`, `m4a`).
`language_hint`	no	ISO 639-1 code (2 chars). Helps the model with low-resource languages.
`backend_preference`	no	`prefer_self_hosted`, `prefer_relay`, or `any` (default).
`temperature`	no	`0.0`–`1.0`. Some models honour it for hesitancy / filler handling.
`force_async`	no	Force the async path even when under the byte threshold.

Inline response (200 OK):

```json
{
  "data": {
    "job_id": "...",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 12500,
    "audio_bytes": 400000
  }
}
```

Async response (202 Accepted):

```json
{
  "data": {
    "job_id": "...",
    "status": "queued",
    "audio_bytes": 14580000,
    "note": "Long-form transcribe runs asynchronously. Poll GET /v1/modules/scaiecho/transcribe/jobs/{id}"
  }
}
```

Permission: scaiecho:transcribe.
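The form-field rules and the two response shapes above can be sketched as a pair of client-side helpers. This is a minimal sketch: the function names are hypothetical, and sending force_async as the string "true" is an assumption, since the reference does not say how booleans are encoded in the form.

```python
from typing import Any, Optional

BACKEND_PREFERENCES = {"prefer_self_hosted", "prefer_relay", "any"}


def build_transcribe_fields(
    language_hint: Optional[str] = None,
    backend_preference: str = "any",
    temperature: Optional[float] = None,
    force_async: bool = False,
) -> dict[str, str]:
    """Validate the optional form fields and return them as strings."""
    if backend_preference not in BACKEND_PREFERENCES:
        raise ValueError(f"invalid backend_preference: {backend_preference!r}")
    fields = {"backend_preference": backend_preference}
    if language_hint is not None:
        if len(language_hint) != 2:
            raise ValueError("language_hint must be a 2-character ISO 639-1 code")
        fields["language_hint"] = language_hint
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be within 0.0 to 1.0")
        fields["temperature"] = str(temperature)
    if force_async:
        fields["force_async"] = "true"  # assumption: booleans are sent as "true"
    return fields


def interpret_transcribe_response(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Distinguish the inline (200) result from the async (202) hand-off."""
    data = body["data"]
    if status_code == 200:
        return {"done": True, "job_id": data["job_id"], "transcript": data["transcript"]}
    if status_code == 202:
        return {"done": False, "job_id": data["job_id"], "transcript": None}
    raise ValueError(f"unexpected status code: {status_code}")
```

A caller would attach the returned fields next to the file part of the multipart upload, then poll the job endpoint whenever done is False.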

`GET /transcribe/jobs/{job_id}`

Poll an async transcribe job. The transcript is returned inline once status == "completed"; STT output is plain text, so no S3 fetch is needed.

```json
{
  "data": {
    "job_id": "...",
    "status": "completed",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 612000,
    "audio_bytes": 14580000,
    "created_at": "...",
    "completed_at": "...",
    "status_reason": null
  }
}
```

Status values: queued, running, completed, failed. Cross-tenant or cross-user reads return 404 (deliberate, to avoid leaking job existence).

Permission: scaiecho:transcribe.
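A polling loop over the four status values might look like the sketch below. The fetch_job callable stands in for whatever HTTP client performs the GET and returns the parsed envelope; it, the interval, and the attempt limit are illustrative assumptions rather than documented values.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(
    fetch_job: Callable[[str], dict[str, Any]],
    job_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Call fetch_job until the job reaches a terminal status, then return its data."""
    for _ in range(max_attempts):
        data = fetch_job(job_id)["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        sleep(interval_s)  # queued or running: wait before asking again
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Injecting sleep makes the loop easy to exercise in tests without real delays. On a failed job, the caller would inspect status_reason in the returned data.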

WebSocket streaming

`WS /stream/transcribe`

Real-time STT over WebSocket. The client opens the WebSocket with ?token=... or an Authorization header, sends an open JSON frame, streams binary audio frames, then sends a close frame.

Open frame (client → server):

```json
{
  "type": "open",
  "language_hint": "en",
  "media_type": "audio/wav",
  "backend_preference": "any",
  "chunk_seconds": 5.0,
  "sample_rate": 16000,
  "diarize": false
}
```

Server frames:

Type	Payload
`ready`	`{ "backend_used": "A", ... }`
`delta`	`{ "text": "...", "is_final": false, "start": 0.0, "end": 4.8, "confidence": 0.0, "speaker_label": "..." }`
`closed`	`{ "audio_bytes": 80000 }`
`error`	`{ "code": "...", "message": "..." }`

The speaker_label field is omitted when no diarization label is available. Close codes: 4401 (unauthorized), 4403 (forbidden), 4400 (bad request), 4502 (backend unavailable), 4500 (server error).

Permission: scaiecho:transcribe. Diarization additionally requires scaiecho:diarize.
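The frame exchange above can be sketched without a live socket by treating frames as JSON strings. The sketch assumes each server frame carries its type in a "type" key next to the payload fields, mirroring the open frame. It also assumes that only deltas with is_final set to true should be kept. Neither point is spelled out in this reference, and the helper names are hypothetical.

```python
import json
from typing import Iterable

# Close codes listed in this reference.
CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def build_open_frame(language_hint: str = "en", sample_rate: int = 16000,
                     chunk_seconds: float = 5.0, diarize: bool = False) -> str:
    """Serialize the client's open frame using the fields shown above."""
    return json.dumps({
        "type": "open",
        "language_hint": language_hint,
        "media_type": "audio/wav",
        "backend_preference": "any",
        "chunk_seconds": chunk_seconds,
        "sample_rate": sample_rate,
        "diarize": diarize,
    })


def collect_final_text(frames: Iterable[str]) -> list[str]:
    """Return the text of final deltas, stopping at the closed frame."""
    finals: list[str] = []
    for raw in frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "delta" and frame.get("is_final"):
            finals.append(frame["text"])
        elif kind == "error":
            raise RuntimeError(f"{frame['code']}: {frame['message']}")
        elif kind == "closed":
            break
    return finals
```

In a real client the frames iterable would be fed by the WebSocket library's receive loop, and CLOSE_CODES would translate the close code into a log message.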

WebRTC streaming

Status: signalling and lifecycle ship end-to-end. The audio-decode plane (av.AudioFrame → backend dispatcher) is still in progress. Sessions can be created, SDP exchanged, ICE candidates trickled, and the control WebSocket attached, but no audio reaches the backend yet, so no transcript deltas come back. Use the WebSocket streaming endpoints for production until this caveat is removed.

All WebRTC endpoints are under /stream/transcribe/webrtc/.

`POST /stream/transcribe/webrtc/sessions`

Create a session. Returns 201 Created:

```json
{
  "data": {
    "session_id": "...",
    "ice_servers": [{ "urls": ["stun:..."] }],
    "expires_at": "...",
    "control_ws_url": "/v1/modules/scaiecho/stream/transcribe/webrtc/sessions/{id}/control"
  }
}
```

Body fields: language_hint (2-char), media_type (default audio/wav), backend_preference, chunk_seconds (0.5–60.0), sample_rate (8000–48000), ice_servers (optional tenant-supplied ICE config).

Permission: scaiecho:transcribe.
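The documented ranges for the session body can be checked client-side before the POST. The helper below is a hypothetical sketch. It treats every field as optional, which the reference does not confirm, and it covers only the fields whose constraints are listed above.

```python
from typing import Any, Optional


def build_session_body(
    language_hint: Optional[str] = None,
    chunk_seconds: Optional[float] = None,
    sample_rate: Optional[int] = None,
    ice_servers: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Check the documented ranges and return only the fields that were set."""
    body: dict[str, Any] = {}
    if language_hint is not None:
        if len(language_hint) != 2:
            raise ValueError("language_hint must be 2 characters")
        body["language_hint"] = language_hint
    if chunk_seconds is not None:
        if not 0.5 <= chunk_seconds <= 60.0:
            raise ValueError("chunk_seconds must be within 0.5 to 60.0")
        body["chunk_seconds"] = chunk_seconds
    if sample_rate is not None:
        if not 8000 <= sample_rate <= 48000:
            raise ValueError("sample_rate must be within 8000 to 48000")
        body["sample_rate"] = sample_rate
    if ice_servers is not None:
        body["ice_servers"] = ice_servers  # tenant-supplied ICE config, passed through
    return body
```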

`POST /stream/transcribe/webrtc/sessions/{id}/offer`

Apply the client's SDP offer and return the server's SDP answer.

```json
{ "sdp": "v=0...", "type": "offer" }
```

Returns { "data": { "sdp": "...", "type": "answer" } }. Errors: SCAIECHO_WEBRTC_UNAVAILABLE (501, aiortc not installed), SCAIECHO_WEBRTC_SESSION_NOT_FOUND (404), SCAIECHO_WEBRTC_SESSION_STATE_LOST (410).

`POST /stream/transcribe/webrtc/sessions/{id}/ice-candidates`

Trickle an ICE candidate. Body: { "candidate": "...", "sdp_mid": "...", "sdp_mline_index": 0 }. Returns 204 No Content.
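Browser clients receive candidates with camelCase keys (candidate, sdpMid, sdpMLineIndex, as produced by RTCIceCandidate.toJSON()), while the body above uses snake_case. A small mapping helper, sketched here under the assumption that the candidate has already been relayed to Python as a dict, could look like this. The function name is hypothetical, and whether the server accepts null for the optional keys is not stated in this reference.

```python
from typing import Any


def to_ice_candidate_body(browser_candidate: dict[str, Any]) -> dict[str, Any]:
    """Map RTCIceCandidate.toJSON() keys to the snake_case request body."""
    return {
        "candidate": browser_candidate["candidate"],
        "sdp_mid": browser_candidate.get("sdpMid"),
        "sdp_mline_index": browser_candidate.get("sdpMLineIndex"),
    }
```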

`DELETE /stream/transcribe/webrtc/sessions/{id}`

Tear down the peer connection and mark the session closed. Returns 204 No Content.

`WS /stream/transcribe/webrtc/sessions/{id}/control`

Control plane for the WebRTC session. Server emits delta JSON frames as the dispatcher produces transcript records. The client can send {"type": "close"} to tear the session down early.

Speakers

`GET /speakers`

List speakers visible to the caller.

Query parameters: language (ISO 639-1), scope (global, tenant, user), enrollment_status (pending, ready, failed, evicted), limit (1–200, default 50).

Permission: scaiecho:enroll.
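The query parameters above can be validated and encoded with the standard library. This is a minimal sketch with a hypothetical helper name. It returns a path relative to the module mount point.

```python
from typing import Optional
from urllib.parse import urlencode

SCOPES = {"global", "tenant", "user"}
ENROLLMENT_STATUSES = {"pending", "ready", "failed", "evicted"}


def speakers_query(
    language: Optional[str] = None,
    scope: Optional[str] = None,
    enrollment_status: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Build the relative URL for GET /speakers, rejecting undocumented values."""
    if scope is not None and scope not in SCOPES:
        raise ValueError(f"invalid scope: {scope!r}")
    if enrollment_status is not None and enrollment_status not in ENROLLMENT_STATUSES:
        raise ValueError(f"invalid enrollment_status: {enrollment_status!r}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be within 1 to 200")
    params: dict[str, object] = {}
    if language is not None:
        params["language"] = language
    if scope is not None:
        params["scope"] = scope
    if enrollment_status is not None:
        params["enrollment_status"] = enrollment_status
    params["limit"] = limit
    return "/speakers?" + urlencode(params)
```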

`GET /speakers/{speaker_id}`

Fetch one speaker. Permission: scaiecho:enroll.

`POST /speakers`

Enroll a speaker. Multipart upload:

Form field	Required	Notes
`display_name`	yes	Human-readable name.
`language_primary`	yes	ISO 639-1.
`description`	no	Free text.
`consent_user_full_name`	yes	The speaker's legal name.
`consent_stated_purpose`	yes	Why this enrollment exists.
`consent_text`	yes	The exact text the speaker agreed to.
`reference`	yes	Reference audio file.
`consent`	yes	Consent recording.

Returns 201 Created with the new speaker profile plus a preflight block, an enrolled_on node-id list, and optionally enroll_errors per node or a note if no pyannote node is online. Permission: scaiecho:enroll.
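Because enrollment needs five text fields and two audio files, a pre-submit check can catch omissions before the upload. The helper below is a hypothetical sketch that only mirrors the required fields in the table above. It does not reproduce the server's preflight quality checks or the consent byte threshold.

```python
REQUIRED_TEXT_FIELDS = (
    "display_name",
    "language_primary",
    "consent_user_full_name",
    "consent_stated_purpose",
    "consent_text",
)
REQUIRED_FILE_FIELDS = ("reference", "consent")


def missing_enrollment_fields(text_fields: dict[str, str], file_fields: dict[str, bytes]) -> list[str]:
    """List required form fields that are absent or empty, in table order."""
    missing = [name for name in REQUIRED_TEXT_FIELDS if not text_fields.get(name)]
    missing += [name for name in REQUIRED_FILE_FIELDS if not file_fields.get(name)]
    return missing
```

An empty list means the form is complete enough to submit. The optional description field is deliberately not checked.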

`PATCH /speakers/{speaker_id}`

Update mutable fields. Body: { "display_name": "...", "description": "..." }. Scope and ownership are locked at create. Permission: scaiecho:enroll.

`DELETE /speakers/{speaker_id}`

Erase a speaker (GDPR Art. 17 erasure fan-out). Deletes blobs, writes an audit row, tombstones the row, and evicts the embedding from every replica.

```json
{
  "data": {
    "audit_id": "...",
    "speaker_id": "...",
    "blob_bytes_deleted": 480000,
    "error_summary": null,
    "completed_at": "..."
  }
}
```

Permission: scaiecho:enroll.

`GET /speakers/{speaker_id}/warm`

Inspect current enrollment fan-out.

```json
{
  "data": {
    "speaker_id": "...",
    "warm_node_ids": ["..."],
    "candidate_node_ids": ["..."],
    "stale_node_ids": []
  }
}
```

Permission: scaiecho:enroll.

`POST /speakers/{speaker_id}/warm`

Proactive re-enrollment fan-out. Body: { "node_ids": [] } (empty for all candidates). Returns per-node outcomes plus skipped_not_candidate for any requested ids that weren't pyannote candidates. Permission: scaiecho:enroll.

Tenant policy

`GET /tenant-policy`

Read the resolved policy. Lazy-creates a row from tier defaults on first read. Permission: scaiecho:admin.

```json
{
  "data": {
    "tenant_id": "...",
    "allowed_backends": "AB",
    "default_backend": "B",
    "created_at": "...",
    "updated_at": "..."
  }
}
```

`PATCH /tenant-policy`

Update allowed set and/or default backend.

```json
{ "allowed_backends": "AB", "default_backend": "A" }
```

allowed_backends matches ^(A|B|AB|BA)$. default_backend is A or B and must be in the allowed set. Permission: scaiecho:admin.
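The two validation rules can be mirrored client-side to avoid a round trip that ends in SCAIECHO_TENANT_POLICY_INVALID. The sketch below checks a complete pair of values, so a partial PATCH would first need to be merged with the current policy. The function name is hypothetical.

```python
import re

ALLOWED_BACKENDS_RE = re.compile(r"(A|B|AB|BA)")


def policy_problems(allowed_backends: str, default_backend: str) -> list[str]:
    """Return human-readable problems; an empty list means the pair is valid."""
    problems: list[str] = []
    # fullmatch applies the documented ^(A|B|AB|BA)$ pattern to the whole string.
    if not ALLOWED_BACKENDS_RE.fullmatch(allowed_backends):
        problems.append("allowed_backends must be one of A, B, AB, BA")
    if default_backend not in ("A", "B"):
        problems.append("default_backend must be A or B")
    elif default_backend not in allowed_backends:
        problems.append("default_backend must be in the allowed set")
    return problems
```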

Errors

ScaiEcho returns ScaiGrid's standard error envelope. Specific codes:

Code	HTTP	Meaning
`SCAIECHO_EMPTY_AUDIO`	400	The uploaded file was zero bytes.
`SCAIECHO_BAD_BACKEND_PREFERENCE`	400	`backend_preference` was not one of the three valid values.
`SCAIECHO_TENANT_POLICY_INVALID`	400	`allowed_backends` or `default_backend` didn't validate (e.g. default not in allowed).
`SCAIECHO_SPEAKER_PREFLIGHT_FAILED`	400	Reference audio failed quality checks; details in `error.preflight`.
`SCAIECHO_CONSENT_INVALID`	400	Consent recording missing or below the minimum byte threshold.
`SCAIECHO_FORBIDDEN`	403	Permission check failed inside a streaming open.
`SCAIECHO_SPEAKER_ACCESS_DENIED`	403	Caller cannot operate on this speaker.
`SCAIECHO_JOB_NOT_FOUND`	404	Async transcribe job id doesn't exist or isn't visible.
`SCAIECHO_SPEAKER_NOT_FOUND`	404	Speaker id doesn't exist or isn't visible.
`SCAIECHO_WEBRTC_SESSION_NOT_FOUND`	404	WebRTC session does not exist or has expired.
`SCAIECHO_SPEAKER_NO_REFERENCE`	409	Speaker has no reference URI — can't warm without re-intake.
`SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING`	409	Speaker isn't in a state that supports fan-out.
`SCAIECHO_WEBRTC_SESSION_STATE_LOST`	410	Session state was lost (server restart). Create a new session.
`SCAIECHO_NO_USABLE_BACKEND`	500	Tenant policy resolved to an empty allowed set.
`SCAIECHO_WEBRTC_UNAVAILABLE`	501	WebRTC support requires aiortc + av in the deployment.
`SCAIECHO_BACKEND_UNAVAILABLE`	502	Selected backend isn't usable right now.
`SCAIECHO_STREAM_FAILED`	n/a	Dispatcher errored mid-stream. Sent as a WS `error` frame.
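One way a client might act on these codes is sketched below. It assumes the error envelope carries code and message keys inside error, as the WebSocket error frame does. The retry policy is only an illustration drawn from the table's wording ("isn't usable right now", "Create a new session"), not a documented contract, and the class and function names are hypothetical.

```python
from typing import Any

RETRYABLE = {"SCAIECHO_BACKEND_UNAVAILABLE"}
RECREATE_SESSION = {"SCAIECHO_WEBRTC_SESSION_STATE_LOST"}


class ScaiEchoError(Exception):
    """Raised when a response envelope contains an error."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return envelope['data'], or raise ScaiEchoError for an error envelope."""
    if "error" in envelope:
        err = envelope["error"]
        # Assumption: the error object exposes "code" and "message" keys.
        raise ScaiEchoError(err.get("code", "UNKNOWN"), err.get("message", ""))
    return envelope["data"]


def next_action(code: str) -> str:
    """Suggest a follow-up for an error code: retry, recreate_session, or fail."""
    if code in RETRYABLE:
        return "retry"
    if code in RECREATE_SESSION:
        return "recreate_session"
    return "fail"
```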

MCP tool

ScaiEcho registers one tool with the MCP catalog:

scaiecho.transcribe — base64 audio in, transcript and metadata out. Backend selection is hidden from MCP callers; tenant policy decides A or B. Required permission: scaiecho:transcribe.
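Since the tool takes base64 audio, the caller has to encode the raw bytes first. The sketch below shows only that encoding step. The argument name audio_base64 is a placeholder, because the tool's input schema is not given in this reference.

```python
import base64


def build_tool_arguments(audio: bytes) -> dict[str, str]:
    """Encode raw audio bytes as base64 text for the scaiecho.transcribe tool."""
    # "audio_base64" is a hypothetical argument name; check the MCP catalog schema.
    return {"audio_base64": base64.b64encode(audio).decode("ascii")}
```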
