API reference
All endpoints are mounted at /v1/modules/scaiecho/ and, except for the health probes, authenticate with the standard ScaiGrid bearer token. WebSocket routes accept the same bearer from the token= query parameter or the Authorization header. Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures).
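The conventions above can be sketched as a few client helpers. This is a minimal sketch, not a published SDK: the names unwrap, ScaiGridError, bearer_headers and ws_url are hypothetical, the host grid.example is a placeholder, and the error object is assumed to be a dict that may carry a code key.

```python
from typing import Any
from urllib.parse import urlencode

MOUNT = "/v1/modules/scaiecho"


class ScaiGridError(Exception):
    """Raised when a response body carries the { "error": ... } envelope."""

    def __init__(self, error: Any) -> None:
        self.error = error
        # Assumption: error objects are dicts that may include a "code" key.
        code = error.get("code") if isinstance(error, dict) else None
        super().__init__(code or str(error))


def unwrap(body: dict) -> Any:
    """Return the "data" member of an envelope, or raise on "error"."""
    if "error" in body:
        raise ScaiGridError(body["error"])
    if "data" in body:
        return body["data"]
    raise ValueError("body is not a ScaiGrid envelope")


def bearer_headers(token: str) -> dict:
    """Authorization header accepted by HTTP and WebSocket routes."""
    return {"Authorization": f"Bearer {token}"}


def ws_url(base: str, path: str, token: str) -> str:
    """WebSocket URL carrying the bearer in the token= query parameter."""
    return f"{base.rstrip('/')}{MOUNT}{path}?{urlencode({'token': token})}"
```

Passing the token in the query string is convenient for browser WebSocket clients, which cannot set an Authorization header; server-side clients can use the header instead.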
Health
GET /healthz
Liveness check. Always returns { "status": "ok", "module": "scaiecho" }. Unauthenticated.
GET /readyz
Readiness check. Returns status, module, and the current rollout phase. The platform health aggregator probes this. Unauthenticated.
Batch transcription
POST /transcribe
Multipart upload. Returns the transcript inline for short audio, or 202 Accepted with a job_id for long audio.
| Form field | Required | Notes |
|---|---|---|
| file | yes | Audio file (wav, mp3, flac, ogg, m4a). |
| language_hint | no | ISO 639-1 code (2 chars). Helps the model with low-resource languages. |
| backend_preference | no | prefer_self_hosted, prefer_relay, or any (default). |
| temperature | no | 0.0–1.0. Some models honour it for hesitancy / filler handling. |
| force_async | no | Force the async path even when under the byte threshold. |
Inline response: 200 OK with the transcript in the response body.
Async response: 202 Accepted with a job_id to poll via GET /transcribe/jobs/{job_id}.
Permission: scaiecho:transcribe.
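A client can check the documented form-field constraints before uploading, then branch on the two success shapes. A minimal sketch: transcribe_fields and transcribe_outcome are hypothetical helpers, the extension check is a client-side convenience only, and encoding force_async as the string "true" is an assumption about how the server parses form booleans.

```python
from typing import Optional, Tuple

BACKEND_PREFERENCES = {"prefer_self_hosted", "prefer_relay", "any"}
AUDIO_EXTENSIONS = {"wav", "mp3", "flac", "ogg", "m4a"}


def transcribe_fields(
    filename: str,
    language_hint: Optional[str] = None,
    backend_preference: str = "any",
    temperature: Optional[float] = None,
    force_async: bool = False,
) -> dict:
    """Check the documented constraints and return the non-file form fields."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in AUDIO_EXTENSIONS:
        raise ValueError(f"unsupported audio extension: {ext!r}")
    if backend_preference not in BACKEND_PREFERENCES:
        # The server would reject this with SCAIECHO_BAD_BACKEND_PREFERENCE.
        raise ValueError(f"bad backend_preference: {backend_preference!r}")
    fields = {"backend_preference": backend_preference}
    if language_hint is not None:
        if len(language_hint) != 2 or not language_hint.isalpha():
            raise ValueError("language_hint must be a 2-letter ISO 639-1 code")
        fields["language_hint"] = language_hint
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be within 0.0-1.0")
        fields["temperature"] = str(temperature)
    if force_async:
        # Assumption: the server parses "true" as a boolean form value.
        fields["force_async"] = "true"
    return fields


def transcribe_outcome(status_code: int, data: dict) -> Tuple[str, object]:
    """Split the two documented success shapes: inline (200) or async (202)."""
    if status_code == 200:
        return ("inline", data)
    if status_code == 202:
        return ("async", data["job_id"])
    raise ValueError(f"unexpected status {status_code}")
```

With the requests library, the returned fields go in data= and the audio in files={"file": ...}; data here is the unwrapped "data" member of the envelope.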
GET /transcribe/jobs/{job_id}
Poll an async transcribe job. When status is "completed", the transcript is returned inline in the response; STT output is plain text, so no S3 fetch is needed.
Status values: queued, running, completed, failed. Cross-tenant or cross-user reads return 404 (deliberate, to avoid leaking job existence).
Permission: scaiecho:transcribe.
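A polling loop only needs the four documented status values. This sketch assumes a caller-supplied fetch_job function (hypothetical) that performs the GET and returns the unwrapped job object; the interval and timeout defaults are arbitrary choices, not documented limits.

```python
import time
from typing import Callable

KNOWN_STATUSES = {"queued", "running", "completed", "failed"}
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected job status: {status!r}")
        if status in TERMINAL_STATUSES:
            return job  # caller distinguishes "completed" from "failed"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout}s")
        sleep(interval)
```

Because cross-tenant and cross-user reads return 404, a fetch_job that raises on 404 will also stop the loop for jobs the caller cannot see.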
WebSocket streaming
WS /stream/transcribe
Real-time STT over WebSocket. The client opens the socket with ?token=... or an Authorization header, sends an open JSON frame, pushes binary audio frames, and then closes the stream.
Open frame (client → server): a single JSON frame, sent before any binary audio.
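The client side of the protocol is an ordered sequence of one JSON text frame, many binary frames, and a final close. A minimal sketch: the contents of open_frame are left to the caller because its fields are not listed here, the {"type": "close"} message is an assumption borrowed from the WebRTC control socket, and the 3200-byte default (100 ms of 16 kHz, 16-bit mono PCM) is an arbitrary choice.

```python
import json
from typing import Iterator, Union


def client_messages(
    open_frame: dict, audio: bytes, chunk_bytes: int = 3200
) -> Iterator[Union[str, bytes]]:
    """Yield client-to-server messages in protocol order.

    str items are JSON text frames; bytes items are binary audio frames.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    yield json.dumps(open_frame)
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]
    # Assumption: the close frame is {"type": "close"}, mirroring the
    # message documented for the WebRTC control socket.
    yield json.dumps({"type": "close"})
```

With the websockets package, passing each str to ws.send produces a text frame and each bytes object a binary frame, so the sequence can be sent as-is.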
Server frames:
| Type | Payload |
|---|---|
| ready | { "backend_used": "A" or "B", ... } |
| delta | { "text": "...", "is_final": false, "start": 0.0, "end": 4.8, "confidence": 0.0, "speaker_label": "..." } |
| closed | { "audio_bytes": 80000 } |
| error | { "code": "...", "message": "..." } |
The speaker_label field is omitted when no diarization label is available. Close codes: 4401 (unauthorized), 4403 (forbidden), 4400 (bad request), 4502 (backend unavailable), 4500 (server error).
Permission: scaiecho:transcribe. Diarization additionally requires scaiecho:diarize.
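Server frames can be folded into a running state, and close codes mapped to readable reasons. This sketch assumes each decoded frame is a JSON object with a "type" key and the payload fields from the table at the top level; treating each non-final delta as superseding the previous one is also an assumption. StreamState and describe_close are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


@dataclass
class StreamState:
    backend_used: Optional[str] = None
    finals: List[Tuple[Optional[str], str]] = field(default_factory=list)
    partial: str = ""
    audio_bytes: Optional[int] = None
    error: Optional[dict] = None

    def handle(self, frame: dict) -> None:
        """Fold one decoded server frame into the running state."""
        kind = frame.get("type")
        if kind == "ready":
            self.backend_used = frame.get("backend_used")
        elif kind == "delta":
            if frame.get("is_final"):
                # speaker_label is omitted when no diarization label exists.
                self.finals.append((frame.get("speaker_label"), frame["text"]))
                self.partial = ""
            else:
                # Assumption: each non-final delta supersedes the previous one.
                self.partial = frame["text"]
        elif kind == "closed":
            self.audio_bytes = frame.get("audio_bytes")
        elif kind == "error":
            self.error = {"code": frame.get("code"), "message": frame.get("message")}

    def transcript(self) -> str:
        """Join the final segments in arrival order."""
        return " ".join(text for _, text in self.finals)


def describe_close(code: int) -> str:
    return CLOSE_CODES.get(code, f"unrecognised close code {code}")
```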
WebRTC streaming
Status: signalling and session lifecycle work end to end, but the audio-decode plane (av.AudioFrame → backend dispatcher) is still in progress. Sessions are created, SDP is exchanged, ICE candidates trickle, and the control WebSocket attaches, but no audio reaches the backend yet, so no transcript deltas come back. Use the WebSocket streaming endpoint (WS /stream/transcribe) for production until this caveat is removed.
All WebRTC endpoints are under /stream/transcribe/webrtc/.
POST /stream/transcribe/webrtc/sessions
Create a session. Returns 201 Created with the new session, whose id is used in the session paths below.
Body fields: language_hint (2-char), media_type (default audio/wav), backend_preference, chunk_seconds (0.5–60.0), sample_rate (8000–48000 Hz), ice_servers (optional tenant-supplied ICE config).
Permission: scaiecho:transcribe.
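The documented body-field ranges translate directly into a request builder. A minimal sketch: webrtc_session_body is a hypothetical helper, reusing the POST /transcribe backend_preference values here is an assumption, and ice_servers is passed through untouched because its shape is not specified in this reference.

```python
from typing import List, Optional

BACKEND_PREFERENCES = {"prefer_self_hosted", "prefer_relay", "any"}


def webrtc_session_body(
    language_hint: Optional[str] = None,
    media_type: str = "audio/wav",
    backend_preference: Optional[str] = None,
    chunk_seconds: Optional[float] = None,
    sample_rate: Optional[int] = None,
    ice_servers: Optional[List[dict]] = None,
) -> dict:
    """Build a session-create body, enforcing the documented ranges."""
    body: dict = {"media_type": media_type}
    if language_hint is not None:
        if len(language_hint) != 2:
            raise ValueError("language_hint must be 2 characters")
        body["language_hint"] = language_hint
    if backend_preference is not None:
        # Assumption: same three values as POST /transcribe.
        if backend_preference not in BACKEND_PREFERENCES:
            raise ValueError(f"bad backend_preference: {backend_preference!r}")
        body["backend_preference"] = backend_preference
    if chunk_seconds is not None:
        if not 0.5 <= chunk_seconds <= 60.0:
            raise ValueError("chunk_seconds must be within 0.5-60.0")
        body["chunk_seconds"] = chunk_seconds
    if sample_rate is not None:
        if not 8000 <= sample_rate <= 48000:
            raise ValueError("sample_rate must be within 8000-48000 Hz")
        body["sample_rate"] = sample_rate
    if ice_servers is not None:
        # Passed through unchanged; the expected shape is an assumption left to the caller.
        body["ice_servers"] = ice_servers
    return body
```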
POST /stream/transcribe/webrtc/sessions/{id}/offer
Apply the client's SDP offer and return the server's SDP answer.
Returns { "data": { "sdp": "...", "type": "answer" } }. Errors: SCAIECHO_WEBRTC_UNAVAILABLE (501, aiortc not installed), SCAIECHO_WEBRTC_SESSION_NOT_FOUND (404), SCAIECHO_WEBRTC_SESSION_STATE_LOST (410).
POST /stream/transcribe/webrtc/sessions/{id}/ice-candidates
Trickle an ICE candidate. Body: { "candidate": "...", "sdp_mid": "...", "sdp_mline_index": 0 }. Returns 204 No Content.
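Browsers emit candidates as RTCIceCandidateInit objects (candidate, sdpMid, sdpMLineIndex, usernameFragment), so a client has to rename the fields to the snake_case body above. A minimal sketch; ice_candidate_body is hypothetical, and skipping the end-of-candidates signal (a null or empty candidate) is an assumption, since this reference does not say whether the server expects it.

```python
from typing import Optional


def ice_candidate_body(browser_candidate: Optional[dict]) -> Optional[dict]:
    """Map RTCIceCandidate.toJSON() output to the documented request body.

    Returns None for the end-of-candidates signal, which is not forwarded.
    """
    if not browser_candidate or not browser_candidate.get("candidate"):
        return None
    return {
        "candidate": browser_candidate["candidate"],
        "sdp_mid": browser_candidate.get("sdpMid"),
        "sdp_mline_index": browser_candidate.get("sdpMLineIndex"),
    }
```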
DELETE /stream/transcribe/webrtc/sessions/{id}
Tear down the peer connection and mark the session closed. Returns 204 No Content.
WS /stream/transcribe/webrtc/sessions/{id}/control
Control plane for the WebRTC session. Server emits delta JSON frames as the dispatcher produces transcript records. The client can send {"type": "close"} to tear the session down early.
Speakers
GET /speakers
List speakers visible to the caller.
Query parameters: language (ISO 639-1), scope (global, tenant, user), enrollment_status (pending, ready, failed, evicted), limit (1–200, default 50).
Permission: scaiecho:enroll.
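The query parameters have small closed value sets, so they are easy to validate before the request goes out. A minimal sketch with a hypothetical speakers_path helper; the returned path is relative to the /v1/modules/scaiecho mount.

```python
from typing import Optional
from urllib.parse import urlencode

SCOPES = {"global", "tenant", "user"}
ENROLLMENT_STATUSES = {"pending", "ready", "failed", "evicted"}


def speakers_path(
    language: Optional[str] = None,
    scope: Optional[str] = None,
    enrollment_status: Optional[str] = None,
    limit: int = 50,
) -> str:
    """Relative path for GET /speakers with validated query parameters."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be within 1-200")
    params = {}
    if language is not None:
        if len(language) != 2:
            raise ValueError("language must be an ISO 639-1 code")
        params["language"] = language
    if scope is not None:
        if scope not in SCOPES:
            raise ValueError(f"bad scope: {scope!r}")
        params["scope"] = scope
    if enrollment_status is not None:
        if enrollment_status not in ENROLLMENT_STATUSES:
            raise ValueError(f"bad enrollment_status: {enrollment_status!r}")
        params["enrollment_status"] = enrollment_status
    params["limit"] = limit
    return "/speakers?" + urlencode(params)
```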
GET /speakers/{speaker_id}
Fetch one speaker. Permission: scaiecho:enroll.
POST /speakers
Enroll a speaker. Multipart upload:
| Form field | Required | Notes |
|---|---|---|
| display_name | yes | Human-readable name. |
| language_primary | yes | ISO 639-1. |
| description | no | Free text. |
| consent_user_full_name | yes | The speaker's legal name. |
| consent_stated_purpose | yes | Why this enrollment exists. |
| consent_text | yes | The exact text the speaker agreed to. |
| reference | yes | Reference audio file. |
| consent | yes | Consent recording. |
Returns 201 Created with the new speaker profile, a preflight block, and an enrolled_on list of node ids. The response may also carry enroll_errors (per node), or a note if no pyannote node is online. Permission: scaiecho:enroll.
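Because most fields are required, a client-side completeness check catches the obvious rejections before upload. A minimal sketch: missing_enroll_fields is hypothetical, files are modelled as a name-to-bytes mapping, and only zero-byte files are flagged because the server's minimum consent size is not given here.

```python
from typing import Dict, List, Optional

REQUIRED_TEXT_FIELDS = (
    "display_name",
    "language_primary",
    "consent_user_full_name",
    "consent_stated_purpose",
    "consent_text",
)
REQUIRED_FILE_FIELDS = ("reference", "consent")


def missing_enroll_fields(fields: Dict[str, str], files: Dict[str, Optional[bytes]]) -> List[str]:
    """Names of required form fields that are absent, blank, or empty files."""
    missing = [name for name in REQUIRED_TEXT_FIELDS if not fields.get(name, "").strip()]
    # Only zero-byte or absent files are caught; the server applies its own thresholds.
    missing += [name for name in REQUIRED_FILE_FIELDS if not files.get(name)]
    return missing
```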
PATCH /speakers/{speaker_id}
Update mutable fields. Body: { "display_name": "...", "description": "..." }. Scope and ownership are fixed at creation and cannot be changed. Permission: scaiecho:enroll.
DELETE /speakers/{speaker_id}
Erase a speaker (GDPR Art. 17 fan-out): deletes the stored blobs, writes an audit row, tombstones the speaker row, and evicts the embedding from every replica.
Permission: scaiecho:enroll.
GET /speakers/{speaker_id}/warm
Inspect current enrollment fan-out.
Permission: scaiecho:enroll.
POST /speakers/{speaker_id}/warm
Proactive re-enrollment fan-out. Body: { "node_ids": [] } (an empty list targets all candidate nodes). Returns per-node outcomes, plus skipped_not_candidate listing any requested ids that weren't pyannote candidates. Permission: scaiecho:enroll.
Tenant policy
GET /tenant-policy
Read the resolved policy. Lazy-creates a row from tier defaults on first read. Permission: scaiecho:admin.
PATCH /tenant-policy
Update the allowed set, the default backend, or both. Example body: { "allowed_backends": "AB", "default_backend": "A" }.
allowed_backends matches ^(A|B|AB|BA)$. default_backend is A or B and must be in the allowed set. Permission: scaiecho:admin.
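Both validation rules can be checked client-side before sending the PATCH. A minimal sketch: policy_patch is hypothetical, and the current_allowed argument (the allowed set from a prior GET) is an assumption needed to check a default-only update.

```python
import re
from typing import Optional

# fullmatch with these alternatives is equivalent to the anchored ^(A|B|AB|BA)$.
ALLOWED_BACKENDS = re.compile(r"A|B|AB|BA")


def policy_patch(
    allowed_backends: Optional[str] = None,
    default_backend: Optional[str] = None,
    current_allowed: Optional[str] = None,
) -> dict:
    """Build a PATCH /tenant-policy body and check the documented rules."""
    body = {}
    if allowed_backends is not None:
        if not ALLOWED_BACKENDS.fullmatch(allowed_backends):
            raise ValueError(f"allowed_backends must match ^(A|B|AB|BA)$: {allowed_backends!r}")
        body["allowed_backends"] = allowed_backends
    if default_backend is not None:
        if default_backend not in ("A", "B"):
            raise ValueError("default_backend must be 'A' or 'B'")
        body["default_backend"] = default_backend
    if not body:
        raise ValueError("nothing to update")
    # The default must sit inside whichever allowed set will be in force.
    effective_allowed = allowed_backends if allowed_backends is not None else current_allowed
    if default_backend is not None and effective_allowed is not None:
        if default_backend not in set(effective_allowed):
            raise ValueError("default_backend must be in the allowed set")
    return body
```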
Errors
ScaiEcho returns ScaiGrid's standard error envelope. Specific codes:
| Code | HTTP | Meaning |
|---|---|---|
| SCAIECHO_EMPTY_AUDIO | 400 | The uploaded file was zero bytes. |
| SCAIECHO_BAD_BACKEND_PREFERENCE | 400 | backend_preference was not one of the three valid values. |
| SCAIECHO_TENANT_POLICY_INVALID | 400 | allowed_backends or default_backend didn't validate (e.g. default not in allowed). |
| SCAIECHO_SPEAKER_PREFLIGHT_FAILED | 400 | Reference audio failed quality checks; details in error.preflight. |
| SCAIECHO_CONSENT_INVALID | 400 | Consent recording missing or below the minimum byte threshold. |
| SCAIECHO_FORBIDDEN | 403 | Permission check failed inside a streaming open. |
| SCAIECHO_SPEAKER_ACCESS_DENIED | 403 | Caller cannot operate on this speaker. |
| SCAIECHO_JOB_NOT_FOUND | 404 | Async transcribe job id doesn't exist or isn't visible. |
| SCAIECHO_SPEAKER_NOT_FOUND | 404 | Speaker id doesn't exist or isn't visible. |
| SCAIECHO_WEBRTC_SESSION_NOT_FOUND | 404 | WebRTC session does not exist or has expired. |
| SCAIECHO_SPEAKER_NO_REFERENCE | 409 | Speaker has no reference URI, so it can't be warmed without re-intake. |
| SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING | 409 | Speaker isn't in a state that supports fan-out. |
| SCAIECHO_WEBRTC_SESSION_STATE_LOST | 410 | Session state was lost (server restart). Create a new session. |
| SCAIECHO_NO_USABLE_BACKEND | 500 | Tenant policy resolved to an empty allowed set. |
| SCAIECHO_STREAM_FAILED | n/a | Dispatcher errored mid-stream. Sent as a WS error frame. |
| SCAIECHO_BACKEND_UNAVAILABLE | 502 | Selected backend isn't usable right now. |
| SCAIECHO_WEBRTC_UNAVAILABLE | 501 | WebRTC support requires aiortc + av in the deployment. |
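One way a client might react to these codes is sketched below. The retry and recreate policy is an assumption layered on the table, not documented server behavior: transient backend and stream failures are retried with capped exponential backoff, lost or expired WebRTC sessions are recreated (as the 410 row suggests), other 4xx codes point to a request the caller must fix, and everything else is reported. next_action and backoff_delays are hypothetical names.

```python
from typing import Iterator, Optional

RETRYABLE = {"SCAIECHO_BACKEND_UNAVAILABLE", "SCAIECHO_STREAM_FAILED"}
NEW_SESSION = {"SCAIECHO_WEBRTC_SESSION_STATE_LOST", "SCAIECHO_WEBRTC_SESSION_NOT_FOUND"}


def next_action(code: str, http_status: Optional[int]) -> str:
    """Map an error code to a client reaction (a client policy assumption)."""
    if code in RETRYABLE:
        return "retry_with_backoff"
    if code in NEW_SESSION:
        return "create_new_session"
    if http_status is not None and 400 <= http_status < 500:
        return "fix_request"
    return "report"


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> Iterator[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at cap."""
    for attempt in range(attempts):
        yield min(cap, base * 2 ** attempt)
```

SCAIECHO_NO_USABLE_BACKEND falls through to "report" because retrying cannot help until a tenant admin fixes the policy.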
MCP tool
ScaiEcho registers one tool with the MCP catalog:
scaiecho.transcribe: base64 audio in, transcript and metadata out. Backend selection is hidden from MCP callers; tenant policy decides A or B. Required permission: scaiecho:transcribe.
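An MCP client invokes the tool with a JSON-RPC 2.0 tools/call request carrying the tool name and its arguments. A minimal sketch: the argument name audio_base64 is hypothetical (the tool's inputSchema, returned by tools/list, gives the real names), and any extra keyword arguments are passed through unchecked.

```python
import base64
from typing import Any, Dict


def mcp_transcribe_call(audio: bytes, request_id: int, **extra: Any) -> Dict[str, Any]:
    """JSON-RPC 2.0 tools/call request for the scaiecho.transcribe tool."""
    # Hypothetical argument name; check the tool's inputSchema before use.
    arguments = {"audio_base64": base64.b64encode(audio).decode("ascii")}
    arguments.update(extra)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "scaiecho.transcribe", "arguments": arguments},
    }
```

No backend argument is included because backend selection is hidden from MCP callers.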