API reference

All endpoints are mounted at /v1/modules/scaiecho/ and authenticate with the standard ScaiGrid bearer token. WebSocket routes accept the same bearer from the token= query parameter or the Authorization header. Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures).
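As a rough illustration of the envelope and bearer conventions described above, the sketch below unwraps a parsed response body and builds the auth header. `unwrap`, `ScaiGridError`, and `auth_headers` are hypothetical helper names, not part of any ScaiGrid SDK, and the `Bearer` scheme prefix is assumed from the phrase "bearer token".

```python
from typing import Any


class ScaiGridError(Exception):
    """Raised when a response body carries the { "error": ... } envelope."""

    def __init__(self, error: Any) -> None:
        super().__init__(str(error))
        self.error = error  # keep the raw error payload for the caller


def unwrap(body: dict) -> Any:
    """Return the payload under "data", or raise if the envelope holds "error"."""
    if "error" in body:
        raise ScaiGridError(body["error"])
    return body["data"]


def auth_headers(token: str) -> dict:
    # Assumption: the conventional "Bearer <token>" Authorization scheme.
    return {"Authorization": f"Bearer {token}"}
```

Keeping the raw error payload on the exception lets callers inspect fields such as the error code without this helper assuming the error object's full shape.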

Health

GET /healthz

Liveness check. Always returns { "status": "ok", "module": "scaiecho" }. Unauthenticated.

GET /readyz

Readiness check. Returns status, module, and the current rollout phase. The platform health aggregator probes this. Unauthenticated.

Batch transcription

POST /transcribe

Multipart upload. Returns the transcript inline for short audio, or 202 Accepted with a job_id for long audio.

| Form field | Required | Notes |
| --- | --- | --- |
| file | yes | Audio file (wav, mp3, flac, ogg, m4a). |
| language_hint | no | ISO 639-1 code (2 chars). Helps the model with low-resource languages. |
| backend_preference | no | prefer_self_hosted, prefer_relay, or any (default). |
| temperature | no | 0.0–1.0. Some models honour it for hesitancy / filler handling. |
| force_async | no | Force the async path even when under the byte threshold. |
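The form fields above can be checked client-side before uploading. This is a minimal sketch, not the server's validation logic: `build_transcribe_fields` is a hypothetical helper, the temperature range is assumed to be 0.0 to 1.0, and the string encoding of `force_async` is a guess.

```python
VALID_BACKEND_PREFERENCES = ("prefer_self_hosted", "prefer_relay", "any")


def build_transcribe_fields(language_hint=None, backend_preference="any",
                            temperature=None, force_async=False):
    """Validate and collect the non-file form fields for POST /transcribe."""
    if language_hint is not None and len(language_hint) != 2:
        raise ValueError("language_hint must be a 2-character ISO 639-1 code")
    if backend_preference not in VALID_BACKEND_PREFERENCES:
        raise ValueError(f"backend_preference must be one of {VALID_BACKEND_PREFERENCES}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"backend_preference": backend_preference}
    if language_hint is not None:
        fields["language_hint"] = language_hint
    if temperature is not None:
        fields["temperature"] = str(temperature)
    if force_async:
        # Assumption: the server accepts the string "true" for this flag.
        fields["force_async"] = "true"
    return fields
```

The returned dict would be sent alongside the `file` part by whichever HTTP client is in use; the audio file itself is deliberately left out of this helper.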

Inline response (200 OK):

```json
{
  "data": {
    "job_id": "...",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 12500,
    "audio_bytes": 400000
  }
}
```

Async response (202 Accepted):

```json
{
  "data": {
    "job_id": "...",
    "status": "queued",
    "audio_bytes": 14580000,
    "note": "Long-form transcribe runs asynchronously. Poll GET /v1/modules/scaiecho/transcribe/jobs/{id}"
  }
}
```

Permission: scaiecho:transcribe.
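Because the same endpoint answers with either 200 or 202, a caller has to branch on the status code. A minimal sketch of that branch, using a hypothetical `interpret_transcribe_response` helper that takes the status code and the already-parsed JSON body:

```python
def interpret_transcribe_response(status_code, body):
    """Classify a POST /transcribe response as inline or queued.

    Returns ("inline", transcript) for 200 and ("queued", job_id) for 202.
    """
    if status_code == 200:
        return ("inline", body["data"]["transcript"])
    if status_code == 202:
        # Long audio: the caller should poll the job endpoint with this id.
        return ("queued", body["data"]["job_id"])
    raise ValueError(f"unexpected status code {status_code}")
```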

GET /transcribe/jobs/{job_id}

Poll an async transcribe job. The transcript is inline on the response when status == "completed" — STT outputs are text, no S3 fetch needed.

```json
{
  "data": {
    "job_id": "...",
    "status": "completed",
    "transcript": "...",
    "backend_used": "A",
    "language_detected": "en",
    "audio_duration_ms": 612000,
    "audio_bytes": 14580000,
    "created_at": "...",
    "completed_at": "...",
    "status_reason": null
  }
}
```

Status values: queued, running, completed, failed. Cross-tenant or cross-user reads return 404 (deliberate, to avoid leaking job existence).

Permission: scaiecho:transcribe.
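Polling can be sketched as a loop that stops on either terminal status. The example below is an illustration only: `poll_job` and its `fetch_job` callback are hypothetical, with `fetch_job(job_id)` standing in for whatever code performs the GET and returns the parsed envelope. The interval and poll limit are arbitrary choices, not documented values.

```python
import time

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_job, job_id, interval_s=2.0, max_polls=150, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return its data."""
    for attempt in range(max_polls):
        data = fetch_job(job_id)["data"]
        if data["status"] in TERMINAL_STATUSES:
            return data
        if attempt < max_polls - 1:
            sleep(interval_s)  # still queued or running; wait before retrying
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Passing the sleep function in makes the loop easy to exercise in tests without real delays. On a `failed` result the caller would typically inspect `status_reason`.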

WebSocket streaming

WS /stream/transcribe

Real-time STT over WebSocket. Client opens the WS with ?token=... or an Authorization header, sends an open JSON frame, pushes binary audio frames, then close.

Open frame (client → server):

```json
{
  "type": "open",
  "language_hint": "en",
  "media_type": "audio/wav",
  "backend_preference": "any",
  "chunk_seconds": 5.0,
  "sample_rate": 16000,
  "diarize": false
}
```

Server frames:

| Type | Payload |
| --- | --- |
| ready | { "backend_used": "A… |
| delta | { "text": "...", "is_final": false, "start": 0.0, "end": 4.8, "confidence": 0.0, "speaker_label": "..." } |
| closed | { "audio_bytes": 80000 } |
| error | { "code": "...", "message": "..." } |

The speaker_label field is omitted when no diarization label is available. Close codes: 4401 (unauthorized), 4403 (forbidden), 4400 (bad request), 4502 (backend unavailable), 4500 (server error).

Permission: scaiecho:transcribe. Diarization additionally requires scaiecho:diarize.
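On the client side, the server frames can be folded into a running transcript. The sketch below is one possible shape, not a reference client. `TranscriptCollector` is a hypothetical name. It assumes the payload keys sit next to `type` at the top level of each frame, as they do in the open frame. It also assumes that a non-final delta is an interim result superseded by later deltas. The transport itself is left out.

```python
import json

# Close codes listed in this reference.
CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


class TranscriptCollector:
    """Accumulates transcript text from server JSON frames."""

    def __init__(self):
        self.final_segments = []
        self.interim = ""
        self.backend_used = None
        self.closed = False

    def handle(self, raw):
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "ready":
            self.backend_used = frame.get("backend_used")
        elif kind == "delta":
            if frame.get("is_final"):
                self.final_segments.append(frame["text"])
                self.interim = ""
            else:
                # Assumption: a newer interim delta replaces the previous one.
                self.interim = frame["text"]
        elif kind == "closed":
            self.closed = True
        elif kind == "error":
            raise RuntimeError(f"{frame.get('code')}: {frame.get('message')}")

    @property
    def text(self):
        return " ".join(self.final_segments)
```

A client would call `handle` for every text frame it receives and look up the close code in `CLOSE_CODES` if the socket closes abnormally.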

WebRTC streaming

Status: signalling and lifecycle ship end-to-end. The audio-decode plane (av.AudioFrame → backend dispatcher) is still in progress. Sessions create, SDP exchanges, ICE trickles, control WebSocket attaches — but no audio reaches the backend yet, so no transcript deltas come back. Use the WebSocket streaming endpoints for production until this caveat is removed.

All WebRTC endpoints are under /stream/transcribe/webrtc/.

POST /stream/transcribe/webrtc/sessions

Create a session. Returns 201 Created:

```json
{
  "data": {
    "session_id": "...",
    "ice_servers": [{ "urls": ["stun:..."] }],
    "expires_at": "...",
    "control_ws_url": "/v1/modules/scaiecho/stream/transcribe/webrtc/sessions/{id}/control"
  }
}
```

Body fields: language_hint (2-char), media_type (default audio/wav), backend_preference, chunk_seconds (0.5–60.0), sample_rate (8000–48000), ice_servers (optional tenant-supplied ICE config).

Permission: scaiecho:transcribe.
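The documented ranges for the session body can be enforced before the request is sent. A minimal sketch, assuming a hypothetical `build_session_body` helper and that omitted fields fall back to server defaults:

```python
def build_session_body(language_hint=None, media_type=None, backend_preference=None,
                       chunk_seconds=None, sample_rate=None, ice_servers=None):
    """Build the JSON body for creating a WebRTC session, checking documented ranges."""
    if language_hint is not None and len(language_hint) != 2:
        raise ValueError("language_hint must be 2 characters")
    if chunk_seconds is not None and not 0.5 <= chunk_seconds <= 60.0:
        raise ValueError("chunk_seconds must be within 0.5-60.0")
    if sample_rate is not None and not 8000 <= sample_rate <= 48000:
        raise ValueError("sample_rate must be within 8000-48000")

    candidates = {
        "language_hint": language_hint,
        "media_type": media_type,
        "backend_preference": backend_preference,
        "chunk_seconds": chunk_seconds,
        "sample_rate": sample_rate,
        "ice_servers": ice_servers,
    }
    # Only send fields the caller actually set.
    return {key: value for key, value in candidates.items() if value is not None}
```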

POST /stream/transcribe/webrtc/sessions/{id}/offer

Applies the client's SDP offer and returns the server's SDP answer. Request body:

```json
{ "sdp": "v=0...", "type": "offer" }
```

Returns { "data": { "sdp": "...", "type": "answer" } }. Errors: SCAIECHO_WEBRTC_UNAVAILABLE (501, aiortc not installed), SCAIECHO_WEBRTC_SESSION_NOT_FOUND (404), SCAIECHO_WEBRTC_SESSION_STATE_LOST (410).

POST /stream/transcribe/webrtc/sessions/{id}/ice-candidates

Trickle an ICE candidate. Body: { "candidate": "...", "sdp_mid": "...", "sdp_mline_index": 0 }. Returns 204 No Content.
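Browser WebRTC stacks serialise candidates with camelCase keys (`candidate`, `sdpMid`, `sdpMLineIndex`), while the body above uses snake_case. The sketch below shows the renaming with a hypothetical `to_ice_body` helper. Whether the endpoint accepts null for a missing `sdp_mid` or `sdp_mline_index` is not stated here, so treat that as an open question.

```python
def to_ice_body(browser_candidate):
    """Map an RTCIceCandidate-style dict to the snake_case request body."""
    return {
        "candidate": browser_candidate["candidate"],
        "sdp_mid": browser_candidate.get("sdpMid"),
        "sdp_mline_index": browser_candidate.get("sdpMLineIndex"),
    }
```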

DELETE /stream/transcribe/webrtc/sessions/{id}

Tear down the peer connection and mark the session closed. Returns 204 No Content.

WS /stream/transcribe/webrtc/sessions/{id}/control

Control plane for the WebRTC session. Server emits delta JSON frames as the dispatcher produces transcript records. The client can send {"type": "close"} to tear the session down early.

Speakers

GET /speakers

List speakers visible to the caller.

Query parameters: language (ISO 639-1), scope (global, tenant, user), enrollment_status (pending, ready, failed, evicted), limit (1–200, default 50).

Permission: scaiecho:enroll.
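The query parameters can be assembled and range-checked in one place. A small sketch with a hypothetical `speakers_query` helper; it builds only the path and query string and leaves the request itself to the caller:

```python
from urllib.parse import urlencode

SCOPES = {"global", "tenant", "user"}
ENROLLMENT_STATUSES = {"pending", "ready", "failed", "evicted"}


def speakers_query(language=None, scope=None, enrollment_status=None, limit=50):
    """Return the path and query string for listing speakers."""
    if scope is not None and scope not in SCOPES:
        raise ValueError(f"scope must be one of {sorted(SCOPES)}")
    if enrollment_status is not None and enrollment_status not in ENROLLMENT_STATUSES:
        raise ValueError(f"enrollment_status must be one of {sorted(ENROLLMENT_STATUSES)}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be within 1-200")

    params = {}
    if language is not None:
        params["language"] = language
    if scope is not None:
        params["scope"] = scope
    if enrollment_status is not None:
        params["enrollment_status"] = enrollment_status
    params["limit"] = limit
    return "/v1/modules/scaiecho/speakers?" + urlencode(params)
```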

GET /speakers/{speaker_id}

Fetch one speaker. Permission: scaiecho:enroll.

POST /speakers

Enroll a speaker. Multipart upload:

| Form field | Required | Notes |
| --- | --- | --- |
| display_name | yes | Human-readable name. |
| language_primary | yes | ISO 639-1. |
| description | no | Free text. |
| consent_user_full_name | yes | The speaker's legal name. |
| consent_stated_purpose | yes | Why this enrollment exists. |
| consent_text | yes | The exact text the speaker agreed to. |
| reference | yes | Reference audio file. |
| consent | yes | Consent recording. |

Returns 201 Created with the new speaker profile plus a preflight block, an enrolled_on node-id list, and optionally enroll_errors per node or a note if no pyannote node is online. Permission: scaiecho:enroll.
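Since most enrollment fields are required, a client may want to catch omissions before uploading audio. The sketch below is a hypothetical pre-check (`missing_enrollment_fields` is not an API call). It treats empty strings and zero-length files as missing, which is a client-side choice rather than documented server behaviour.

```python
REQUIRED_TEXT_FIELDS = (
    "display_name",
    "language_primary",
    "consent_user_full_name",
    "consent_stated_purpose",
    "consent_text",
)
REQUIRED_FILE_FIELDS = ("reference", "consent")


def missing_enrollment_fields(text_fields, file_fields):
    """Return the names of required enrollment fields that are absent or empty."""
    missing = [name for name in REQUIRED_TEXT_FIELDS if not text_fields.get(name)]
    missing += [name for name in REQUIRED_FILE_FIELDS if not file_fields.get(name)]
    return missing
```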

PATCH /speakers/{speaker_id}

Update mutable fields. Body: { "display_name": "...", "description": "..." }. Scope and ownership are locked at create. Permission: scaiecho:enroll.

DELETE /speakers/{speaker_id}

Erase a speaker (GDPR Art. 17 erasure fan-out). Deletes blobs, writes an audit row, tombstones the row, and evicts the embedding from every replica.

```json
{
  "data": {
    "audit_id": "...",
    "speaker_id": "...",
    "blob_bytes_deleted": 480000,
    "error_summary": null,
    "completed_at": "..."
  }
}
```

Permission: scaiecho:enroll.

GET /speakers/{speaker_id}/warm

Inspect current enrollment fan-out.

```json
{
  "data": {
    "speaker_id": "...",
    "warm_node_ids": ["..."],
    "candidate_node_ids": ["..."],
    "stale_node_ids": []
  }
}
```

Permission: scaiecho:enroll.

POST /speakers/{speaker_id}/warm

Proactive re-enrollment fan-out. Body: { "node_ids": [] } (empty for all candidates). Returns per-node outcomes plus skipped_not_candidate for any requested ids that weren't pyannote candidates. Permission: scaiecho:enroll.
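One way to combine the two warm endpoints is to read the current fan-out and request warming only for nodes that need it. The sketch below assumes that candidates not yet warm, plus any stale nodes, are the ones worth targeting; that interpretation of the three lists is an assumption, and `plan_warm_request` is a hypothetical helper.

```python
def plan_warm_request(warm_status):
    """Build a POST body targeting candidate nodes that are not warm, plus stale nodes.

    Returns None when nothing needs warming, because an empty node_ids list
    would ask the server to warm all candidates.
    """
    data = warm_status["data"]
    warm = set(data["warm_node_ids"])
    targets = [node for node in data["candidate_node_ids"] if node not in warm]
    targets += [node for node in data["stale_node_ids"] if node not in targets]
    if not targets:
        return None
    return {"node_ids": targets}
```

Returning `None` rather than an empty list matters here: `{ "node_ids": [] }` means "all candidates", so sending it when nothing is pending would trigger a full fan-out.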

Tenant policy

GET /tenant-policy

Read the resolved policy. Lazy-creates a row from tier defaults on first read. Permission: scaiecho:admin.

```json
{
  "data": {
    "tenant_id": "...",
    "allowed_backends": "AB",
    "default_backend": "B",
    "created_at": "...",
    "updated_at": "..."
  }
}
```

PATCH /tenant-policy

Update the allowed set and/or the default backend.

```json
{ "allowed_backends": "AB", "default_backend": "A" }
```

allowed_backends matches ^(A|B|AB|BA)$. default_backend is A or B and must be in the allowed set. Permission: scaiecho:admin.
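The two validation rules above translate directly into a client-side check. This is a sketch of those rules only, using a hypothetical `validate_tenant_policy` helper; the server remains the authority and may apply further checks.

```python
import re

# Same alternatives as the documented pattern ^(A|B|AB|BA)$; fullmatch anchors both ends.
_ALLOWED_BACKENDS = re.compile(r"A|B|AB|BA")


def validate_tenant_policy(allowed_backends, default_backend):
    """Check a policy pair against the documented rules and return the PATCH body."""
    if not _ALLOWED_BACKENDS.fullmatch(allowed_backends):
        raise ValueError("allowed_backends must be one of A, B, AB, BA")
    if default_backend not in ("A", "B"):
        raise ValueError("default_backend must be A or B")
    if default_backend not in allowed_backends:
        raise ValueError("default_backend must be in the allowed set")
    return {"allowed_backends": allowed_backends, "default_backend": default_backend}
```

For example, `validate_tenant_policy("A", "B")` raises because B is not in the allowed set, which is the case the SCAIECHO_TENANT_POLICY_INVALID error describes.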

Errors

ScaiEcho returns ScaiGrid's standard error envelope. Specific codes:

| Code | HTTP | Meaning |
| --- | --- | --- |
| SCAIECHO_EMPTY_AUDIO | 400 | The uploaded file was zero bytes. |
| SCAIECHO_BAD_BACKEND_PREFERENCE | 400 | backend_preference was not one of the three valid values. |
| SCAIECHO_TENANT_POLICY_INVALID | 400 | allowed_backends or default_backend didn't validate (e.g. default not in allowed). |
| SCAIECHO_SPEAKER_PREFLIGHT_FAILED | 400 | Reference audio failed quality checks; details in error.preflight. |
| SCAIECHO_CONSENT_INVALID | 400 | Consent recording missing or below the minimum byte threshold. |
| SCAIECHO_FORBIDDEN | 403 | Permission check failed inside a streaming open. |
| SCAIECHO_SPEAKER_ACCESS_DENIED | 403 | Caller cannot operate on this speaker. |
| SCAIECHO_JOB_NOT_FOUND | 404 | Async transcribe job id doesn't exist or isn't visible. |
| SCAIECHO_SPEAKER_NOT_FOUND | 404 | Speaker id doesn't exist or isn't visible. |
| SCAIECHO_WEBRTC_SESSION_NOT_FOUND | 404 | WebRTC session does not exist or has expired. |
| SCAIECHO_SPEAKER_NO_REFERENCE | 409 | Speaker has no reference URI; can't warm without re-intake. |
| SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING | 409 | Speaker isn't in a state that supports fan-out. |
| SCAIECHO_WEBRTC_SESSION_STATE_LOST | 410 | Session state was lost (server restart). Create a new session. |
| SCAIECHO_NO_USABLE_BACKEND | 500 | Tenant policy resolved to an empty allowed set. |
| SCAIECHO_STREAM_FAILED | | Dispatcher errored mid-stream. Sent as a WS error frame. |
| SCAIECHO_BACKEND_UNAVAILABLE | 502 | Selected backend isn't usable right now. |
| SCAIECHO_WEBRTC_UNAVAILABLE | 501 | WebRTC support requires aiortc + av in the deployment. |
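For client-side handling, the table can be mirrored as a lookup. The mapping below copies the codes and statuses listed above. The `classify` helper and its category names are hypothetical conveniences, not part of the API.

```python
# HTTP status per error code, as listed in the table; None marks the WS-only code.
ERROR_HTTP_STATUS = {
    "SCAIECHO_EMPTY_AUDIO": 400,
    "SCAIECHO_BAD_BACKEND_PREFERENCE": 400,
    "SCAIECHO_TENANT_POLICY_INVALID": 400,
    "SCAIECHO_SPEAKER_PREFLIGHT_FAILED": 400,
    "SCAIECHO_CONSENT_INVALID": 400,
    "SCAIECHO_FORBIDDEN": 403,
    "SCAIECHO_SPEAKER_ACCESS_DENIED": 403,
    "SCAIECHO_JOB_NOT_FOUND": 404,
    "SCAIECHO_SPEAKER_NOT_FOUND": 404,
    "SCAIECHO_WEBRTC_SESSION_NOT_FOUND": 404,
    "SCAIECHO_SPEAKER_NO_REFERENCE": 409,
    "SCAIECHO_SPEAKER_NOT_READY_FOR_WARMING": 409,
    "SCAIECHO_WEBRTC_SESSION_STATE_LOST": 410,
    "SCAIECHO_NO_USABLE_BACKEND": 500,
    "SCAIECHO_STREAM_FAILED": None,
    "SCAIECHO_BACKEND_UNAVAILABLE": 502,
    "SCAIECHO_WEBRTC_UNAVAILABLE": 501,
}


def classify(code):
    """Group an error code as "client", "server", "stream", or "unknown"."""
    if code not in ERROR_HTTP_STATUS:
        return "unknown"
    status = ERROR_HTTP_STATUS[code]
    if status is None:
        return "stream"  # delivered as a WebSocket error frame, no HTTP status
    return "client" if status < 500 else "server"
```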

MCP tool

ScaiEcho registers one tool with the MCP catalog:

  • scaiecho.transcribe — base64 audio in, transcript and metadata out. Backend selection is hidden from MCP callers; tenant policy decides A or B. Required permission: scaiecho:transcribe.
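Since the tool takes base64 audio, a caller needs to encode the raw bytes first. The sketch below shows only that step. The argument names `audio_base64` and `language_hint` are hypothetical placeholders, as the tool's input schema is not given here; check the MCP catalog for the real names.

```python
import base64


def build_transcribe_arguments(audio_bytes, language_hint=None):
    """Encode raw audio as base64 text for an MCP tool call."""
    # Hypothetical argument names; the real schema comes from the MCP catalog.
    arguments = {"audio_base64": base64.b64encode(audio_bytes).decode("ascii")}
    if language_hint is not None:
        arguments["language_hint"] = language_hint
    return arguments
```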
Updated 2026-05-18 15:01:27 (rev 12)