API reference
All endpoints are mounted at /v1/modules/scaispeak/ and authenticate with the standard ScaiGrid bearer token. Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures).
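As a rough sketch of how a client might consume the envelope described above, the helper below returns the `data` payload or raises when `error` is present. The `ScaiGridError` class is hypothetical, and the `code` and `message` keys inside `error` are an assumption; the text above only names the two top-level envelope keys.

```python
import json


class ScaiGridError(Exception):
    """Hypothetical exception type; not part of any published SDK."""

    def __init__(self, error):
        self.error = error
        # Assumption: the error object may carry "code" and "message" keys.
        is_dict = isinstance(error, dict)
        self.code = error.get("code") if is_dict else None
        message = error.get("message") if is_dict else str(error)
        super().__init__(message or "request failed")


def unwrap(body: str):
    """Return the "data" payload of an envelope, or raise on "error"."""
    envelope = json.loads(body)
    if "error" in envelope:
        raise ScaiGridError(envelope["error"])
    if "data" not in envelope:
        raise ValueError("response is not a ScaiGrid envelope")
    return envelope["data"]
```

Keeping the unwrapping in one place means endpoint-specific code only ever sees the payload or an exception.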
Health
GET /healthz
Liveness: the process is responding. Cheap; no I/O.
GET /readyz
Readiness: the module can serve requests. Returns 200 when the module's upstream dependencies (managed TTS relay, ScaiInfer, Redis) are reachable enough to dispatch.
Voices: read
GET /voices
List voices visible to the caller (global + own tenant + own user). Query parameters:
| Parameter | Notes |
|---|---|
| language | 2-letter ISO code (en, fr, de...). |
| scope | global, tenant, user. |
| gender | female, male, neutral, unspecified. |
| embedding_status | pending, processing, ready, failed, evicted. |
| q | Free-text search over display_name, description, style_tags. |
| limit | 1-200, default 50. |
Permission: scaispeak:voice.read.
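The query parameters for GET /voices can be checked client-side before a request goes out. The sketch below is illustrative only: the function name `voices_list_path` is hypothetical, and it builds just the path and query string, leaving the HTTP call and bearer token to the caller.

```python
from urllib.parse import urlencode

BASE_PATH = "/v1/modules/scaispeak"

# Enumerated values taken from the query-parameter table.
ALLOWED = {
    "scope": {"global", "tenant", "user"},
    "gender": {"female", "male", "neutral", "unspecified"},
    "embedding_status": {"pending", "processing", "ready", "failed", "evicted"},
}


def voices_list_path(language=None, scope=None, gender=None,
                     embedding_status=None, q=None, limit=50):
    """Build the path and query string for GET /voices."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"language": language, "scope": scope, "gender": gender,
              "embedding_status": embedding_status, "q": q, "limit": limit}
    for name, allowed in ALLOWED.items():
        value = params[name]
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    # Drop unset filters so only explicit choices reach the server.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_PATH}/voices?{query}"
```

For example, `voices_list_path(language="en", scope="tenant", limit=20)` yields `/v1/modules/scaispeak/voices?language=en&scope=tenant&limit=20`.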
GET /voices/{voice_id}
Fetch one voice's full record. Returns 404 if the voice doesn't exist OR isn't visible to the caller (existence isn't disclosed across scopes).
Voices: write
POST /voices
Create (clone) a voice from a reference + consent recording. Multipart form fields:
| Field | Required | Notes |
|---|---|---|
| reference | one of | Multipart file part with the reference audio. |
| reference_scaidrive_json | one of | JSON {file_id, mcp_uri, share_url} pointing at a ScaiDrive file. |
| consent | one of | Multipart file part with the consent audio. |
| consent_scaidrive_json | one of | ScaiDrive reference for the consent recording. |
| display_name | yes | Human-readable label. |
| language_primary | yes | 2-letter ISO code. |
| language_supported_json | no | JSON array of 2-letter codes the voice can speak. |
| gender_hint, age_hint, style_tags_json | no | Library metadata; advisory. |
| consent_user_full_name | yes | Speaker's full name; written to the consent row. |
| consent_stated_purpose | yes | What the cloned voice will be used for; verbatim audit. |
| consent_text | yes | The exact scripted statement the speaker reads in the consent clip. |
| description | no | Free-text description. |
Returns 201 Created with the new voice plus the preflight block. Permission: scaispeak:voice.write.
Errors: SCAISPEAK_VOICE_PREFLIGHT_FAILED (audio rejected), SCAISPEAK_AMBIGUOUS_SOURCE (inline + ScaiDrive for the same file), SCAISPEAK_CONSENT_INVALID (consent audio missing or doesn't match the script).
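The "one of" rows mean each recording arrives either inline or as a ScaiDrive reference, never both. A minimal client-side pre-check might look like the sketch below. The function name and the problem strings are hypothetical; only the field names and the SCAISPEAK_AMBIGUOUS_SOURCE pairing come from the text above, and the server remains the authority on what it accepts.

```python
REQUIRED = ("display_name", "language_primary", "consent_user_full_name",
            "consent_stated_purpose", "consent_text")
SOURCE_PAIRS = (("reference", "reference_scaidrive_json"),
                ("consent", "consent_scaidrive_json"))


def validate_voice_form(fields):
    """Return a list of problems; an empty list means the form looks sendable.

    `fields` maps multipart field names to the values the client will send.
    """
    problems = []
    for name in REQUIRED:
        if not fields.get(name):
            problems.append(f"missing required field: {name}")
    for inline, drive in SOURCE_PAIRS:
        present = [n for n in (inline, drive) if fields.get(n)]
        if len(present) == 2:
            # The server reports this case as SCAISPEAK_AMBIGUOUS_SOURCE.
            problems.append(f"send only one of {inline} / {drive}")
        elif not present:
            problems.append(f"send one of {inline} / {drive}")
    return problems
```

Running the check before upload avoids sending large audio parts for a request that would be rejected on field shape alone.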
PATCH /voices/{voice_id}
Partial update. Settable fields: display_name, description, language_supported, gender_hint, age_hint, style_tags. Scope mutation is not allowed here; use /share. Permission: scaispeak:voice.write.
DELETE /voices/{voice_id}
Erase the voice (GDPR Art. 17). Tombstones the row, fans out EvictVoice to every warm replica, clears the Redis registry, deletes the reference + consent blobs, and writes an immutable erasure_audit row.
Permission: scaispeak:voice.write.
POST /voices/{voice_id}/share
Promote a user-scope voice to tenant scope. Permission: scaispeak:voice.share (separate from voice.write so sharing can be granted independently).
POST /voices/{voice_id}/preview
Render a short preview clip (max 300 chars). Form fields: text, response_format. Uses the same dispatcher as /speak. Permission: scaispeak:voice.read.
POST /voices/{voice_id}/repromote
Re-run intake processing for a voice. Idempotent: a no-op if the voice is ready or already processing. Used to bring legacy voices (created under the previous-generation cloning engine) onto the current zero-shot path. Returns 202 Accepted. Permission: scaispeak:voice.write.
WS /voices/record
Live-record voice intake, a WebSocket alternative to POST /voices. Two phases: first reference audio frames + phase_complete, then consent audio frames + finalize. Auth via ?token= query or Authorization header. Permission: scaispeak:voice.write.
Speak
POST /speak
Batch synthesis. Body:
| Field | Required | Notes |
|---|---|---|
| voice_id | yes | A voice the caller can see. |
| text | yes | Up to ~500 chars sync, longer async. |
| language_hint | no | 2-letter code to disambiguate multilingual voices. |
| speed | no | 0.5–2.0, default 1.0. |
| response_format | no | mp3, opus, wav, flac, aac, pcm. Default mp3. Self-hosted backend currently emits 48 kHz WAV regardless of this field and logs a downgrade warning if the requested format differs; see Troubleshooting. |
| backend_preference | no | prefer_self_hosted, prefer_relay, any. Advisory; tenant policy wins. |
| idempotency_key | no | Caller-supplied retry key for the output cache. |
| force_async | no | Force the job path regardless of text length. |
| save_to | no | ScaiDrive destination block (see below). JWT auth required. |
| inline_response | no | When save_to is set, return audio bytes too (default true). |
| instructions | no | Free-text style guidance (emotion / pace / affect). Example: "cheerful and energetic" or "slowly and carefully". Meaningful for cloned voices; preset speakers and the relay backend ignore this field. |
| cfg_value | no | Cloning-fidelity vs naturalness tradeoff. Range 0.5–5.0. Higher values stay closer to the reference voice at the cost of naturalness. Engine default ~2.0 when omitted. Meaningful for cloned voices only. |
| warmup_trim_ms | no | Strip the first N ms of generated audio to absorb the warm-up artefact at the start of cloned-voice output. Typical: 150. Use 0 to disable. Meaningful for cloned voices only. |
Short text (default ≤500 chars) returns 200 OK with audio_base64 inline. Longer text returns 202 Accepted with job_id — poll /speak/jobs/{job_id}.
save_to block:
Permission: scaispeak:synthesize.
GET /speak/jobs/{job_id}
Poll an async synth job. Returns status (queued, running, completed, failed), and when complete, audio_base64 inline (for small outputs) or audio_bytes + S3 URI for larger ones. If the job was submitted with save_to, the response also carries save_to.file_id once the upload finishes. Permission: scaispeak:synthesize, scoped to (user, tenant) — you can't poll another user's job by ID guess.
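The submit-then-poll flow for POST /speak and GET /speak/jobs/{job_id} can be sketched as follows. This is a hedged illustration, not the module's client: both function names are hypothetical, the request body is assumed to be JSON, and `fetch_job` is a caller-supplied function that performs the GET and returns the unwrapped `data` dict.

```python
import time

FORMATS = {"mp3", "opus", "wav", "flac", "aac", "pcm"}
TERMINAL = {"completed", "failed"}


def build_speak_body(voice_id, text, speed=1.0, response_format="mp3",
                     force_async=False):
    """Validate the documented ranges and return a body for POST /speak."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    return {"voice_id": voice_id, "text": text, "speed": speed,
            "response_format": response_format, "force_async": force_async}


def poll_job(fetch_job, job_id, interval_s=1.0, max_attempts=60,
             sleep=time.sleep):
    """Call fetch_job(job_id) until the job status is completed or failed."""
    for _ in range(max_attempts):
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any HTTP library and easy to exercise in tests. The polling interval and attempt cap are arbitrary defaults, not documented limits.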
Streaming: WebSocket
WS /stream/speak
Real-time TTS over WebSocket. Wire protocol:
| Client → Server | Fields |
|---|---|
| {"type":"open"} | voice_id, language_hint, speed, output.codec, backend_preference |
| {"type":"text"} | delta |
| {"type":"flush"} | (none) |
| {"type":"interrupt"} | (none) |
| {"type":"close"} | (none) |

| Server → Client | Fields |
|---|---|
| {"type":"ready"} | voice_id, backend_used |
| binary frame | audio bytes in the negotiated codec |
| {"type":"interrupted"} | (none) |
| {"type":"closed"} | stats.chars, stats.backend_used |
| {"type":"error"} | code, message |
Close codes: 4401 unauthorized, 4403 forbidden, 4400 bad request, 4502 backend unavailable, 4500 server error. Auth via ?token= or header. Permission: scaispeak:synthesize.
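The client side of the wire protocol can be sketched as a sequence of JSON text frames. The helper names below are hypothetical, and two details are assumptions rather than documented behaviour: that the dotted `output.codec` field is sent as a nested `output` object, and that the fields sit alongside `type` in the same message. Whether the client must wait for `ready` before sending text is not stated above, so a real client would do well to wait.

```python
import json

# Close codes as listed in the protocol description.
CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def open_message(voice_id, codec="opus", language_hint=None, speed=1.0):
    """Serialize the opening frame; nesting of "output" is an assumption."""
    msg = {"type": "open", "voice_id": voice_id, "speed": speed,
           "output": {"codec": codec}}
    if language_hint:
        msg["language_hint"] = language_hint
    return json.dumps(msg)


def session_frames(voice_id, chunks):
    """Yield client frames for one utterance: open, text deltas, flush, close."""
    yield open_message(voice_id)
    for chunk in chunks:
        yield json.dumps({"type": "text", "delta": chunk})
    yield json.dumps({"type": "flush"})
    yield json.dumps({"type": "close"})


def describe_close(code):
    """Translate a WebSocket close code into the documented meaning."""
    return CLOSE_CODES.get(code, "unknown close code")
```

Each yielded string would be sent as a text frame by whichever WebSocket library the client uses; binary frames coming back carry the audio.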
Streaming: WebRTC
Status: signalling and lifecycle ship end-to-end. The audio plane (aiortc MediaStreamTrack.recv) raises NotImplementedError today: once a peer connection negotiates, no audio drains to the backend. Use the WebSocket streaming endpoints for production until this caveat is removed.
POST /stream/speak/webrtc/sessions
Create a WebRTC session. Body:
| Field | Notes |
|---|---|
| voice_id | required |
| language_hint | optional 2-letter code |
| speed | 0.5–2.0 |
| output.codec | opus or pcm |
| output.sample_rate | 8000–48000 |
| control.transport | websocket or datachannel |
| ice_servers | optional tenant-supplied ICE config |
| backend_preference | same vocabulary as /speak |
Returns session_id, ice_servers, expires_at, control_ws_url. Permission: scaispeak:synthesize.
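A session-creation body can be assembled and range-checked before it is posted. The sketch below is illustrative: `build_webrtc_session` is a hypothetical name, the body is assumed to be JSON, and the dotted field names are assumed to map onto nested `output` and `control` objects.

```python
def build_webrtc_session(voice_id, codec="opus", sample_rate=48000,
                         transport="websocket", speed=1.0, language_hint=None):
    """Return a body for POST /stream/speak/webrtc/sessions."""
    if codec not in ("opus", "pcm"):
        raise ValueError("output.codec must be opus or pcm")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("output.sample_rate must be within 8000-48000")
    if transport not in ("websocket", "datachannel"):
        raise ValueError("control.transport must be websocket or datachannel")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    body = {
        "voice_id": voice_id,
        "speed": speed,
        # Assumption: dotted names in the table denote nested objects.
        "output": {"codec": codec, "sample_rate": sample_rate},
        "control": {"transport": transport},
    }
    if language_hint is not None:
        body["language_hint"] = language_hint
    return body
```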
POST /stream/speak/webrtc/sessions/{session_id}/offer
Apply client SDP offer, return server's SDP answer.
POST /stream/speak/webrtc/sessions/{session_id}/ice-candidates
Trickle ICE candidate from client. Returns 204 No Content.
DELETE /stream/speak/webrtc/sessions/{session_id}
Tear down the peer + mark session closed.
WS /stream/speak/webrtc/sessions/{session_id}/control
Control plane for an active WebRTC session: same text/flush/interrupt/close vocabulary as the WebSocket streaming path, but no binary audio frames (audio rides RTP).
Voice warming
GET /voices/{voice_id}/warm
Inspect current warm state. Returns warm_node_ids, candidate_node_ids, stale_node_ids. Permission: scaispeak:voice.read.
POST /voices/{voice_id}/warm
Fan-out PrepareVoice to candidate replicas. Body: { "node_ids": [...] } (empty means "all candidates"). Returns outcomes array with per-node ok, cache_key, load_ms, error. Permission: scaispeak:voice.write.
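The per-node outcomes array lends itself to a small summary step on the client. In the sketch below, `summarize_outcomes` is a hypothetical helper, and the `node_id` key is an assumption: the text above lists ok, cache_key, load_ms, and error, but does not name the field that identifies the node.

```python
def summarize_outcomes(outcomes):
    """Split PrepareVoice outcomes into warmed node ids and failures."""
    warmed, failed = [], {}
    for outcome in outcomes:
        node = outcome.get("node_id")  # assumed key name
        if outcome.get("ok"):
            warmed.append(node)
        else:
            failed[node] = outcome.get("error")
    return warmed, failed
```

A caller could retry only the failed nodes by passing the keys of `failed` back as `node_ids`.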
POST /voices/{voice_id}/evict
Drop the voice from every currently-warm replica. Always clears the registry. Permission: scaispeak:voice.write.
Tenant policy
GET /admin/policy
Read the caller's tenant policy: allowed_backends (subset of ["A","B"]), default_backend. Permission: scaispeak:synthesize. The policy is readable by any caller who can synthesise, so UIs can show "your tenant routes through Backend B".
PUT /admin/policy
Update the tenant policy. Body: allowed_backends (string shorthand "A"/"B"/"AB" or list), default_backend. Validation rejects default_backend not in allowed_backends. Permission: scaispeak:admin.
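The shorthand and the validation rule for PUT /admin/policy can be mirrored on the client. This is a sketch under stated assumptions: `normalize_policy` is a hypothetical name, and rejecting an empty allowed set is a guess, since the text above only documents the check that default_backend is in allowed_backends.

```python
VALID_BACKENDS = {"A", "B"}


def normalize_policy(allowed_backends, default_backend):
    """Expand "A"/"B"/"AB" shorthand (or a list) and check the default."""
    # set("AB") and set(["A", "B"]) both give {"A", "B"}.
    allowed = sorted(set(allowed_backends))
    if not allowed or not set(allowed) <= VALID_BACKENDS:
        raise ValueError("allowed_backends must be a non-empty subset of A, B")
    if default_backend not in allowed:
        raise ValueError("default_backend must be one of allowed_backends")
    return {"allowed_backends": allowed, "default_backend": default_backend}
```

For example, `normalize_policy("AB", "B")` returns `{"allowed_backends": ["A", "B"], "default_backend": "B"}`, while `normalize_policy(["B"], "A")` raises.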
ScaiDrive proxy
GET /admin/scaidrive/shares
Read-only forwarding to ScaiDrive: lists the shares the caller can see. Used by the synth page destination picker. Requires JWT auth (not an sgk_ API key). Returns 404 with SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE when ScaiDrive isn't configured in the deployment.
GET /admin/scaidrive/shares/{share_id}/folders
Lazy-browse folders inside a share. Query: folder_id (omit for the share root). Returns folder children only.
Admin lifecycle
POST /admin/lifecycle/install
First-time install hook called by the module-host. Idempotent. SuperAdmin-only.
POST /admin/lifecycle/upgrade
Version upgrade hook. Idempotent. SuperAdmin-only.
POST /admin/lifecycle/uninstall
Module uninstall: soft-deletes every non-global voice in the deployment and signals the erasure worker to fan out. Requires confirmation_token + expected_module_id. SuperAdmin-only.
POST /admin/lifecycle/tenant/{tenant_id}/enable
Per-tenant enable. SuperAdmin-only.
POST /admin/lifecycle/tenant/{tenant_id}/disable
Per-tenant disable: soft-deletes all the tenant's user + tenant scope voices and signals erasure. Global voices are untouched. SuperAdmin-only.
Blocklist + audit
POST /admin/blocklist
Add a blocklist entry. Body: scope (tenant, user, voice), target_id, reason, optional expires_at. Permission: scaispeak:admin.
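A blocklist entry body might be built as in the sketch below. `build_block_entry` is a hypothetical name, the body is assumed to be JSON, and serializing expires_at as an ISO-8601 UTC timestamp is an assumption; the text above does not state the expected format.

```python
from datetime import datetime, timezone

SCOPES = ("tenant", "user", "voice")


def build_block_entry(scope, target_id, reason, expires_at=None):
    """Return a body for POST /admin/blocklist."""
    if scope not in SCOPES:
        raise ValueError(f"scope must be one of {', '.join(SCOPES)}")
    body = {"scope": scope, "target_id": target_id, "reason": reason}
    if expires_at is not None:
        if expires_at.tzinfo is None:
            raise ValueError("expires_at must be timezone-aware")
        # Assumption: the API accepts ISO-8601 timestamps.
        body["expires_at"] = expires_at.astimezone(timezone.utc).isoformat()
    return body
```

Omitting expires_at leaves the entry without an expiry, matching the "optional" wording above.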
GET /admin/blocklist
List active blocklist entries. Query: scope, tenant_id, limit. Permission: scaispeak:admin.
DELETE /admin/blocklist/{block_id}
Remove a blocklist entry (manual unblock). Returns 204 No Content. Permission: scaispeak:admin.
GET /admin/erasure/audit
List erasure audit rows. Query: tenant_id, voice_id, limit. Returns most-recent-first. Permission: scaispeak:admin.
Global voices (SuperAdmin)
POST /admin/voices/global
Create a platform-scope (scope='global') voice: no consent, license-based. SuperAdmin-only. Form fields:
| Field | Required | Notes |
|---|---|---|
| reference | yes | Multipart reference audio. ScaiDrive references not accepted for globals. |
| display_name, language_primary | yes | Same shape as user voices. |
| licensor_name | yes | Who licensed the voice to ScaiLabs. |
| license_type | yes | perpetual, time_bound, usage_bound. |
| valid_until | when time_bound | ISO-8601 timestamp. |
| usage_limit_chars | when usage_bound | Integer cap. |
| licensor_reference | no | Contract reference. |
| valid_from | no | ISO-8601 start. |
| terms_summary | no | Operator-facing summary of the terms. |
| license_document | no | Optional PDF; stored alongside the voice. |
Returns the new voice_id, license_id, and intake note.
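The conditional rows in the table (valid_until when time_bound, usage_limit_chars when usage_bound) can be pre-checked before upload. The helper below is a hypothetical sketch of that rule; requiring the cap to be positive is an assumption, and the server's SCAISPEAK_LICENSE_FIELD_INVALID check may be stricter or looser.

```python
LICENSE_TYPES = ("perpetual", "time_bound", "usage_bound")


def check_license_fields(license_type, valid_until=None, usage_limit_chars=None):
    """Return a list of problems with the license fields; empty means OK."""
    problems = []
    if license_type not in LICENSE_TYPES:
        problems.append(f"license_type must be one of {', '.join(LICENSE_TYPES)}")
    if license_type == "time_bound" and not valid_until:
        problems.append("valid_until is required when license_type is time_bound")
    if license_type == "usage_bound":
        # Assumption: the integer cap must be positive.
        if not isinstance(usage_limit_chars, int) or usage_limit_chars <= 0:
            problems.append("usage_limit_chars must be a positive integer "
                            "when license_type is usage_bound")
    return problems
```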
DELETE /admin/voices/global/{voice_id}
Revoke a global voice. SuperAdmin-only. Form field trigger (license_revoked, license_expired, platform_decision). Bypasses the owner-equality check that protects user/tenant voices. Updates the license row's status to match the trigger. Runs the full erasure pipeline.
Errors
All endpoints return ScaiGrid's standard error envelope ({ "error": ... }).
ScaiSpeak-specific codes:
| Code | Meaning |
|---|---|
| SCAISPEAK_VOICE_NOT_FOUND | Voice id doesn't exist or isn't visible. |
| SCAISPEAK_VOICE_ACCESS_DENIED | Caller can't perform this operation on this voice. |
| SCAISPEAK_VOICE_PREFLIGHT_FAILED | Reference audio failed quality checks. Body includes preflight. |
| SCAISPEAK_CONSENT_INVALID | Consent recording missing or doesn't match the scripted text. |
| SCAISPEAK_AMBIGUOUS_SOURCE | Both inline upload and ScaiDrive reference supplied for the same file. |
| SCAISPEAK_VOICE_SHARE_FORBIDDEN | Only the owner with voice.share can promote to tenant scope. |
| SCAISPEAK_BACKEND_UNAVAILABLE | No allowed backend currently available. |
| SCAISPEAK_TENANT_POLICY_INVALID | Policy update rejected (e.g. default not in allowed set). |
| SCAISPEAK_JOB_NOT_FOUND | Job id doesn't exist or doesn't belong to this caller. |
| SCAISPEAK_VOICE_NOT_READY_FOR_WARMING | Legacy warming path returns this when the voice doesn't have the cached state the previous-gen engine needed. No-op on the current zero-shot engine; safe to ignore for new code. |
| SCAISPEAK_SAVE_TO_REQUIRES_JWT | save_to attempted with sgk_ API key auth. |
| SCAISPEAK_SAVE_TO_EXCHANGE_FAILED | ScaiKey token exchange against ScaiDrive failed. |
| SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE | ScaiDrive integration not configured. |
| SCAISPEAK_SCAIDRIVE_FORBIDDEN | Caller lacks write access on the destination share. |
| SCAISPEAK_SCAIDRIVE_NOT_FOUND | Destination share or folder doesn't exist. |
| SCAISPEAK_SCAIDRIVE_CONFLICT | File exists at destination and overwrite is false. |
| SCAISPEAK_SCAIDRIVE_QUOTA_EXCEEDED | Destination share over quota (HTTP 507). |
| SCAISPEAK_LICENSE_FIELD_INVALID | License-type / bound mismatch on global voice create. |
| SCAISPEAK_GLOBAL_VOICE_NOT_FOUND | Global voice doesn't exist or already deleted. |
| SCAISPEAK_BLOCKLIST_NOT_FOUND | Blocklist entry id doesn't exist. |
| SCAISPEAK_UNINSTALL_TOKEN_MISMATCH | Uninstall hook called without a matching token. |