
# API reference

All endpoints are mounted at /v1/modules/scaispeak/ and authenticate with the standard ScaiGrid bearer token. Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures).
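The request shape above can be sketched with Python's standard library. This is a minimal, hypothetical helper (build_request is not part of any ScaiGrid SDK). It assumes the token is sent as a conventional Authorization: Bearer header and that base_url is your deployment's ScaiGrid host.

```python
import json
import urllib.request
from typing import Optional

# Path prefix from the docs; the host passed as base_url is deployment-specific.
MODULE_PREFIX = "/v1/modules/scaispeak"


def build_request(base_url: str, path: str, token: str,
                  body: Optional[dict] = None, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a ScaiSpeak request carrying a ScaiGrid bearer token."""
    url = base_url.rstrip("/") + MODULE_PREFIX + "/" + path.lstrip("/")
    # JSON-encode the body only when one is supplied; GET requests carry no payload.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req
```

Send it with urllib.request.urlopen(req) and decode the body with json.load. urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so an error envelope has to be read from the exception (HTTPError.read()).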

## Health

### GET /healthz

Liveness — process is responding. Cheap; no I/O.

### GET /readyz

Readiness: the module can serve requests. Returns 200 when its upstream dependencies (managed TTS relay, ScaiInfer, Redis) are reachable enough for requests to be dispatched.

## Voices — read

### GET /voices

List voices visible to the caller (global + own tenant + own user). Query parameters:

| Parameter | Notes |
|---|---|
| language | 2-letter ISO code (en, fr, de, ...). |
| scope | global, tenant, user. |
| gender | female, male, neutral, unspecified. |
| embedding_status | pending, processing, ready, failed, evicted. |
| q | Free-text search over display_name, description, style_tags. |
| limit | 1–200, default 50. |

Permission: scaispeak:voice.read.
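A hypothetical client-side helper that builds the GET /voices query string and rejects values outside the documented sets before making a round trip. The server stays authoritative; this only catches typos early.

```python
from typing import Optional
from urllib.parse import urlencode

SCOPES = {"global", "tenant", "user"}
GENDERS = {"female", "male", "neutral", "unspecified"}
EMBEDDING_STATUSES = {"pending", "processing", "ready", "failed", "evicted"}


def voices_query(language: Optional[str] = None, scope: Optional[str] = None,
                 gender: Optional[str] = None, embedding_status: Optional[str] = None,
                 q: Optional[str] = None, limit: int = 50) -> str:
    """Return a query string for GET /voices after checking the documented value sets."""
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language must be a 2-letter ISO code")
    for name, value, allowed in (("scope", scope, SCOPES),
                                 ("gender", gender, GENDERS),
                                 ("embedding_status", embedding_status, EMBEDDING_STATUSES)):
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"language": language, "scope": scope, "gender": gender,
              "embedding_status": embedding_status, "q": q, "limit": limit}
    # Omit filters the caller did not set.
    return urlencode({k: v for k, v in params.items() if v is not None})
```

For example, voices_query(language="fr", scope="tenant", limit=10) yields language=fr&scope=tenant&limit=10, which is appended to /voices?.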

### GET /voices/{voice_id}

Fetch one voice's full record. Returns 404 if the voice doesn't exist OR isn't visible to the caller (existence isn't disclosed across scopes).

## Voices — write

### POST /voices

Create (clone) a voice from a reference + consent recording. Multipart form fields:

| Field | Required | Notes |
|---|---|---|
| reference | one of reference / reference_scaidrive_json | Multipart file part with the reference audio. |
| reference_scaidrive_json | one of reference / reference_scaidrive_json | JSON {file_id, mcp_uri, share_url} pointing at a ScaiDrive file. |
| consent | one of consent / consent_scaidrive_json | Multipart file part with the consent audio. |
| consent_scaidrive_json | one of consent / consent_scaidrive_json | ScaiDrive reference for the consent recording. |
| display_name | yes | Human-readable label. |
| language_primary | yes | 2-letter ISO code. |
| language_supported_json | no | JSON array of 2-letter codes the voice can speak. |
| gender_hint, age_hint, style_tags_json | no | Library metadata; advisory. |
| consent_user_full_name | yes | Speaker's full name; written to the consent row. |
| consent_stated_purpose | yes | What the cloned voice will be used for; stored verbatim for audit. |
| consent_text | yes | The exact scripted statement the speaker reads in the consent clip. |
| description | no | Free-text description. |

Returns 201 Created with the new voice plus the preflight block. Permission: scaispeak:voice.write.

Errors: SCAISPEAK_VOICE_PREFLIGHT_FAILED (audio rejected), SCAISPEAK_AMBIGUOUS_SOURCE (inline + ScaiDrive for the same file), SCAISPEAK_CONSENT_INVALID (consent audio missing or doesn't match the script).
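Before uploading, a client can pre-check the form against the rules above. This sketch (check_clone_form is a hypothetical name) mirrors only what the table and error list state: exactly one source per recording, the required text fields, and a 2-letter primary language. It does not replicate server-side preflight or consent matching.

```python
REQUIRED_TEXT_FIELDS = ("display_name", "language_primary", "consent_user_full_name",
                        "consent_stated_purpose", "consent_text")


def check_clone_form(fields: dict, files: dict) -> list:
    """Return a list of problems with a POST /voices form; empty means it looks valid.

    fields holds text form fields; files holds multipart file parts keyed by name.
    """
    problems = []
    # Each recording needs exactly one source: an inline upload or a ScaiDrive reference.
    for inline, drive in (("reference", "reference_scaidrive_json"),
                          ("consent", "consent_scaidrive_json")):
        has_inline, has_drive = inline in files, drive in fields
        if has_inline and has_drive:
            problems.append(f"{inline}: both inline and ScaiDrive supplied "
                            "(SCAISPEAK_AMBIGUOUS_SOURCE)")
        elif not (has_inline or has_drive):
            problems.append(f"{inline}: supply {inline} or {drive}")
    for name in REQUIRED_TEXT_FIELDS:
        if not fields.get(name):
            problems.append(f"missing {name}")
    lang = fields.get("language_primary", "")
    if lang and len(lang) != 2:
        problems.append("language_primary must be a 2-letter ISO code")
    return problems
```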

### PATCH /voices/{voice_id}

Partial update. Settable fields: display_name, description, language_supported, gender_hint, age_hint, style_tags. Scope cannot be changed here; use POST /voices/{voice_id}/share. Permission: scaispeak:voice.write.

### DELETE /voices/{voice_id}

Erase the voice (GDPR Art. 17). Tombstones the row, fans out EvictVoice to every warm replica, clears the Redis registry, deletes reference + consent blobs, writes an immutable erasure_audit row.

```json
{
  "data": {
    "audit_id": "aud_...",
    "voice_id": "vc_...",
    "warm_replicas_evicted": 3,
    "blob_bytes_deleted": 1240832,
    "error_summary": null,
    "completed_at": "2026-05-17T14:01:00Z"
  }
}
```

Permission: scaispeak:voice.write.

### POST /voices/{voice_id}/share

Promote a user-scope voice to tenant scope. Permission: scaispeak:voice.share (separate from voice.write so sharing can be granted independently).

### POST /voices/{voice_id}/preview

Render a short preview clip. Form fields: text (max 300 chars), response_format. Uses the same dispatcher as /speak. Permission: scaispeak:voice.read.

### POST /voices/{voice_id}/repromote

Re-run intake processing for a voice. Idempotent: a no-op if the voice is already ready or already processing. Used to bring legacy voices (created under the previous-generation cloning engine) onto the current zero-shot path. Returns 202 Accepted. Permission: scaispeak:voice.write.

### WS /voices/record

Live-record voice intake — WebSocket alternative to POST /voices. Two-phase: first reference audio frames + phase_complete, then consent audio frames + finalize. Auth via ?token= query or Authorization header. Permission: scaispeak:voice.write.
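A sketch of the frame order for the two-phase record session. The docs name phase_complete and finalize but not their wire shape; this assumes they are JSON text frames of the form {"type": ...}, like the /stream/speak protocol, with audio sent as binary frames. Verify against the server before relying on it.

```python
import json
from typing import Iterable, Iterator, Union


def record_session_frames(reference_chunks: Iterable[bytes],
                          consent_chunks: Iterable[bytes]) -> Iterator[Union[bytes, str]]:
    """Yield WebSocket frames for the two-phase /voices/record intake, in order.

    ASSUMPTION: control messages are JSON text frames shaped like
    {"type": "phase_complete"} and {"type": "finalize"}; audio chunks are binary frames.
    """
    for chunk in reference_chunks:      # phase 1: reference audio
        yield bytes(chunk)
    yield json.dumps({"type": "phase_complete"})
    for chunk in consent_chunks:        # phase 2: consent audio
        yield bytes(chunk)
    yield json.dumps({"type": "finalize"})
```

Send str frames as text messages and bytes frames as binary messages; the websockets library, for example, makes that distinction from the type passed to send().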

## Speak

### POST /speak

Batch synthesis. Body:

| Field | Required | Notes |
|---|---|---|
| voice_id | yes | A voice the caller can see. |
| text | yes | Up to ~500 chars is synthesised synchronously; longer text runs as an async job. |
| language_hint | no | 2-letter code to disambiguate multilingual voices. |
| speed | no | 0.5–2.0, default 1.0. |
| response_format | no | mp3, opus, wav, flac, aac, pcm. Default mp3. The self-hosted backend currently emits 48 kHz WAV regardless of this field and logs a downgrade warning if the requested format differs; see Troubleshooting. |
| backend_preference | no | prefer_self_hosted, prefer_relay, any. Advisory; tenant policy wins. |
| idempotency_key | no | Caller-supplied retry key for the output cache. |
| force_async | no | Force the job path regardless of text length. |
| save_to | no | ScaiDrive destination block (see below). Requires JWT auth. |
| inline_response | no | When save_to is set, also return the audio bytes (default true). |
| instructions | no | Free-text style guidance (emotion, pace, affect), for example "cheerful and energetic" or "slowly and carefully". Meaningful for cloned voices only; preset speakers and the relay backend ignore this field. |
| cfg_value | no | Cloning fidelity vs. naturalness trade-off, range 0.5–5.0. Higher values stay closer to the reference voice at the cost of naturalness. The engine default is ~2.0 when omitted. Meaningful for cloned voices only. |
| warmup_trim_ms | no | Strip the first N ms of generated audio to absorb the warm-up artefact at the start of cloned-voice output. Typical: 150; use 0 to disable. Meaningful for cloned voices only. |

Short text (≤500 chars by default) returns 200 OK with audio_base64 inline. Longer text, or any request with force_async set, returns 202 Accepted with a job_id; poll GET /speak/jobs/{job_id} for the result.
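A client-side sketch of the /speak round trip: build_speak_body checks the documented ranges, interpret_speak branches on 200 vs. 202, and sniff_format inspects the returned bytes' magic numbers, which helps notice the self-hosted WAV downgrade described in the response_format row. All three names are hypothetical, and the sketch assumes audio_base64 and job_id arrive inside the envelope's data object.

```python
import base64

RESPONSE_FORMATS = {"mp3", "opus", "wav", "flac", "aac", "pcm"}


def build_speak_body(voice_id: str, text: str, speed: float = 1.0,
                     response_format: str = "mp3", **extra) -> dict:
    """Build a POST /speak JSON body; extra carries optional fields such as cfg_value."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    cfg = extra.get("cfg_value")
    if cfg is not None and not 0.5 <= cfg <= 5.0:
        raise ValueError("cfg_value must be within 0.5-5.0")
    return {"voice_id": voice_id, "text": text, "speed": speed,
            "response_format": response_format, **extra}


def interpret_speak(status: int, data: dict):
    """Map a /speak success payload to ("audio", bytes) or ("job", job_id)."""
    if status == 200:
        return "audio", base64.b64decode(data["audio_base64"])
    if status == 202:
        return "job", data["job_id"]
    raise ValueError(f"unexpected status {status}")


def sniff_format(audio: bytes) -> str:
    """Guess the container from magic bytes.

    Not exhaustive: raw pcm, aac, and mp3 without an ID3 tag come back as "unknown".
    """
    if audio[:4] == b"RIFF" and audio[8:12] == b"WAVE":
        return "wav"
    if audio[:4] == b"fLaC":
        return "flac"
    if audio[:4] == b"OggS":
        return "ogg"  # Opus audio is usually carried in an Ogg container
    if audio[:3] == b"ID3":
        return "mp3"
    return "unknown"
```

Comparing sniff_format(audio) with the requested response_format lets a caller transcode or warn instead of mislabelling a file.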

save_to block:

```json
{
  "share_id": "shr_xyz",
  "folder_id": "fld_abc",
  "filename": "chapter-01.mp3",
  "overwrite": false
}
```

Permission: scaispeak:synthesize.
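A hypothetical helper for attaching the save_to block, plus a guard for the JWT requirement. The guard assumes API keys are recognisable by their sgk_ prefix, as the wording of SCAISPEAK_SAVE_TO_REQUIRES_JWT suggests.

```python
def build_save_to(share_id: str, folder_id: str, filename: str,
                  overwrite: bool = False, inline_response: bool = True) -> dict:
    """Return the save_to-related fields to merge into a /speak body."""
    return {"save_to": {"share_id": share_id, "folder_id": folder_id,
                        "filename": filename, "overwrite": overwrite},
            "inline_response": inline_response}


def check_save_to_auth(token: str) -> None:
    """Fail early: save_to needs JWT auth, and sgk_ API keys are rejected server-side."""
    # ASSUMPTION: API keys always start with "sgk_".
    if token.startswith("sgk_"):
        raise PermissionError("save_to requires JWT auth, not an sgk_ API key")
```

Usage: body = {"voice_id": "vc_...", "text": "...", **build_save_to("shr_xyz", "fld_abc", "chapter-01.mp3")}.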

### GET /speak/jobs/{job_id}

Poll an async synthesis job. Returns status (queued, running, completed, failed) and, once completed, either audio_base64 inline (small outputs) or audio_bytes plus an S3 URI (larger outputs). If the job was submitted with save_to, the response also carries save_to.file_id once the upload finishes. Permission: scaispeak:synthesize, scoped to (user, tenant): you can't poll another user's job by guessing its ID.
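A polling sketch with capped exponential backoff. fetch_job is a caller-supplied, hypothetical function that performs GET /speak/jobs/{job_id} and returns the unwrapped data object; sleep and clock are injectable so the loop can be tested without waiting. The interval, cap, and timeout values are illustrative, not documented limits.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_job(fetch_job: Callable[[str], dict], job_id: str, interval: float = 1.0,
             max_interval: float = 10.0, timeout: float = 300.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job reaches a terminal status, or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Give up rather than sleep past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout}s")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # exponential backoff, capped
```

A returned job can still have status failed, so callers should check it before reading audio fields.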

## Streaming — WebSocket

### WS /stream/speak

Real-time TTS over WebSocket. Wire protocol:

| Client → Server | Fields |
|---|---|
| {"type":"open"} | voice_id, language_hint, speed, output.codec, backend_preference |
| {"type":"text"} | delta |
| {"type":"flush"} | |
| {"type":"interrupt"} | |
| {"type":"close"} | |

| Server → Client | Fields |
|---|---|
| {"type":"ready"} | voice_id, backend_used |
| binary frame | Audio bytes in the negotiated codec. |
| {"type":"interrupted"} | |
| {"type":"closed"} | stats.chars, stats.backend_used |
| {"type":"error"} | code, message |

Close codes: 4400 bad request, 4401 unauthorized, 4403 forbidden, 4500 server error, 4502 backend unavailable. Auth via the ?token= query parameter or the Authorization header. Permission: scaispeak:synthesize.
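A sketch of the client side of one /stream/speak utterance, following the message tables above. It assumes the open fields sit next to "type" in the same JSON object and that output.codec is a nested object ({"output": {"codec": ...}}); the codec you pass must be one the server accepts. describe_close maps the documented close codes.

```python
import json
from typing import Iterable, Iterator

CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def stream_messages(voice_id: str, text_chunks: Iterable[str], codec: str,
                    speed: float = 1.0) -> Iterator[str]:
    """Yield client-to-server text messages for one /stream/speak utterance, in order."""
    # ASSUMPTION: open parameters sit beside "type"; output.codec is a nested object.
    yield json.dumps({"type": "open", "voice_id": voice_id, "speed": speed,
                      "output": {"codec": codec}})
    for delta in text_chunks:
        yield json.dumps({"type": "text", "delta": delta})
    yield json.dumps({"type": "flush"})
    yield json.dumps({"type": "close"})


def describe_close(code: int) -> str:
    """Human-readable meaning of a documented close code."""
    return CLOSE_CODES.get(code, f"unrecognised close code {code}")
```

On the receiving side, binary frames are audio; any text frame can be parsed as JSON and dispatched on its type.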

## Streaming — WebRTC

Status: signalling and lifecycle ship end-to-end. The audio plane (aiortc MediaStreamTrack.recv) raises NotImplementedError today, so once a peer connection negotiates, no audio drains to the backend. Use the WebSocket streaming endpoints in production until this caveat is removed.

### POST /stream/speak/webrtc/sessions

Create a WebRTC session. Body:

| Field | Notes |
|---|---|
| voice_id | Required. |
| language_hint | Optional 2-letter code. |
| speed | 0.5–2.0. |
| output.codec | opus or pcm. |
| output.sample_rate | 8000–48000 Hz. |
| control.transport | websocket or datachannel. |
| ice_servers | Optional tenant-supplied ICE config. |
| backend_preference | Same vocabulary as /speak. |

Returns session_id, ice_servers, expires_at, control_ws_url. Permission: scaispeak:synthesize.
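A hypothetical body builder for session creation that enforces the documented ranges. The defaults (opus at 48000 Hz, websocket control transport) are illustrative choices, and dotted fields such as output.codec are assumed to be nested objects. Given the audio-plane caveat above, this is only useful for exercising signalling today.

```python
from typing import Optional


def webrtc_session_body(voice_id: str, codec: str = "opus", sample_rate: int = 48000,
                        transport: str = "websocket", speed: float = 1.0,
                        language_hint: Optional[str] = None) -> dict:
    """Body for POST /stream/speak/webrtc/sessions, checked against the documented ranges."""
    if codec not in ("opus", "pcm"):
        raise ValueError("output.codec must be opus or pcm")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("output.sample_rate must be within 8000-48000")
    if transport not in ("websocket", "datachannel"):
        raise ValueError("control.transport must be websocket or datachannel")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    body = {"voice_id": voice_id, "speed": speed,
            # ASSUMPTION: dotted field names map to nested objects.
            "output": {"codec": codec, "sample_rate": sample_rate},
            "control": {"transport": transport}}
    if language_hint is not None:
        body["language_hint"] = language_hint
    return body
```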

### POST /stream/speak/webrtc/sessions/{session_id}/offer

Apply client SDP offer, return server's SDP answer.

### POST /stream/speak/webrtc/sessions/{session_id}/ice-candidates

Submit a trickle ICE candidate from the client. Returns 204 No Content.

### DELETE /stream/speak/webrtc/sessions/{session_id}

Tear down the peer connection and mark the session closed.

### WS /stream/speak/webrtc/sessions/{session_id}/control

Control plane for an active WebRTC session — same text/flush/interrupt/close vocabulary as the WebSocket streaming path, no binary audio frames (audio rides RTP).

## Voice warming

### GET /voices/{voice_id}/warm

Inspect current warm state. Returns warm_node_ids, candidate_node_ids, stale_node_ids. Permission: scaispeak:voice.read.

### POST /voices/{voice_id}/warm

Fan-out PrepareVoice to candidate replicas. Body: { "node_ids": [...] } (empty means "all candidates"). Returns outcomes array with per-node ok, cache_key, load_ms, error. Permission: scaispeak:voice.write.
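A sketch for building the warm request and splitting its outcomes into successes and failures. The docs list per-node ok, cache_key, load_ms, and error but not the field that identifies the node; node_id below is an assumption.

```python
from typing import Iterable, Optional, Tuple


def warm_body(node_ids: Optional[Iterable[str]] = None) -> dict:
    """Body for POST /voices/{voice_id}/warm; an empty list means "all candidates"."""
    return {"node_ids": list(node_ids or [])}


def summarize_warm(outcomes: list) -> Tuple[dict, dict]:
    """Return ({node: load_ms} for successes, {node: error} for failures).

    ASSUMPTION: each outcome names its replica in a "node_id" field.
    """
    warmed, failed = {}, {}
    for outcome in outcomes:
        node = outcome["node_id"]
        if outcome.get("ok"):
            warmed[node] = outcome.get("load_ms")
        else:
            failed[node] = outcome.get("error")
    return warmed, failed
```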

### POST /voices/{voice_id}/evict

Drop the voice from every currently-warm replica. Always clears the registry. Permission: scaispeak:voice.write.

## Tenant policy

### GET /admin/policy

Read the caller's tenant policy: allowed_backends (subset of ["A","B"]), default_backend. Permission: scaispeak:synthesize — readable by any caller who can synthesise so UIs can show "your tenant routes through Backend B".

### PUT /admin/policy

Update the tenant policy. Body: allowed_backends (string shorthand "A"/"B"/"AB" or list), default_backend. Validation rejects default_backend not in allowed_backends. Permission: scaispeak:admin.
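The documented policy validation can be mirrored client-side so a UI rejects bad input before the PUT. normalize_policy is a hypothetical helper; it expands the string shorthand into a list and applies the default-in-allowed rule.

```python
from typing import Sequence, Union

VALID_BACKENDS = ("A", "B")


def normalize_policy(allowed_backends: Union[str, Sequence[str]], default_backend: str) -> dict:
    """Expand the shorthand and apply the documented default-in-allowed rule."""
    if isinstance(allowed_backends, str):
        if allowed_backends not in ("A", "B", "AB"):
            raise ValueError("string shorthand must be 'A', 'B' or 'AB'")
        allowed = list(allowed_backends)  # "AB" -> ["A", "B"]
    else:
        allowed = list(allowed_backends)
    if any(b not in VALID_BACKENDS for b in allowed):
        raise ValueError("allowed_backends must be a subset of ['A', 'B']")
    if default_backend not in allowed:
        # The server reports this case as SCAISPEAK_TENANT_POLICY_INVALID.
        raise ValueError("default_backend must be one of allowed_backends")
    return {"allowed_backends": sorted(set(allowed)), "default_backend": default_backend}
```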

## ScaiDrive proxy

### GET /admin/scaidrive/shares

Read-only forwarding to ScaiDrive — list shares the caller can see. Used by the synth page destination picker. Requires JWT auth (not sgk_). Returns 404 with SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE when ScaiDrive isn't configured in the deployment.

### GET /admin/scaidrive/shares/{share_id}/folders

Lazy-browse folders inside a share. Query: folder_id (omit for the share root). Returns folder children only.

## Admin lifecycle

### POST /admin/lifecycle/install

First-time install hook called by the module-host. Idempotent. SuperAdmin-only.

### POST /admin/lifecycle/upgrade

Version upgrade hook. Idempotent. SuperAdmin-only.

### POST /admin/lifecycle/uninstall

Module uninstall: soft-deletes every non-global voice in the deployment and signals the erasure worker to fan out. Requires confirmation_token and expected_module_id. SuperAdmin-only.

### POST /admin/lifecycle/tenant/{tenant_id}/enable

Per-tenant enable. SuperAdmin-only.

### POST /admin/lifecycle/tenant/{tenant_id}/disable

Per-tenant disable — soft-deletes all the tenant's user + tenant scope voices and signals erasure. Global voices untouched. SuperAdmin-only.

## Blocklist + audit

### POST /admin/blocklist

Add a blocklist entry. Body: scope (tenant, user, voice), target_id, reason, optional expires_at. Permission: scaispeak:admin.

### GET /admin/blocklist

List active blocklist entries. Query: scope, tenant_id, limit. Permission: scaispeak:admin.

### DELETE /admin/blocklist/{block_id}

Remove a blocklist entry (manual unblock). Returns 204 No Content. Permission: scaispeak:admin.

### GET /admin/erasure/audit

List erasure audit rows. Query: tenant_id, voice_id, limit. Returns most-recent-first. Permission: scaispeak:admin.

## Global voices (SuperAdmin)

### POST /admin/voices/global

Create a platform-scope (scope='global') voice. Global voices are license-based and need no consent recording. SuperAdmin-only. Form fields:

| Field | Required | Notes |
|---|---|---|
| reference | yes | Multipart reference audio. ScaiDrive references are not accepted for global voices. |
| display_name, language_primary | yes | Same shape as for user voices. |
| licensor_name | yes | Who licensed the voice to ScaiLabs. |
| license_type | yes | perpetual, time_bound, usage_bound. |
| valid_until | when license_type is time_bound | ISO-8601 timestamp. |
| usage_limit_chars | when license_type is usage_bound | Integer cap. |
| licensor_reference | no | Contract reference. |
| valid_from | no | ISO-8601 start. |
| terms_summary | no | Operator-facing summary of the terms. |
| license_document | no | Optional PDF; stored alongside the voice. |

Returns the new voice_id, license_id, and intake note.
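A client-side pre-check for the license fields, assuming only the rules the table states: valid_until for time_bound, an integer usage_limit_chars for usage_bound. check_license_fields is hypothetical; whether the server also rejects bound fields on other license types is not documented, so the sketch does not check that.

```python
from datetime import datetime

LICENSE_TYPES = {"perpetual", "time_bound", "usage_bound"}


def check_license_fields(fields: dict) -> None:
    """Raise ValueError for license-type / bound mismatches (cf. SCAISPEAK_LICENSE_FIELD_INVALID)."""
    license_type = fields.get("license_type")
    if license_type not in LICENSE_TYPES:
        raise ValueError(f"license_type must be one of {sorted(LICENSE_TYPES)}")
    if license_type == "time_bound":
        if not fields.get("valid_until"):
            raise ValueError("time_bound licenses need valid_until")
        # fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalise it first.
        datetime.fromisoformat(fields["valid_until"].replace("Z", "+00:00"))
    if license_type == "usage_bound":
        try:
            int(fields["usage_limit_chars"])  # multipart form values arrive as strings
        except (KeyError, TypeError, ValueError):
            raise ValueError("usage_bound licenses need an integer usage_limit_chars") from None
```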

### DELETE /admin/voices/global/{voice_id}

Revoke a global voice. SuperAdmin-only. Form field: trigger (license_revoked, license_expired, or platform_decision). Bypasses the owner-equality check that protects user and tenant voices, updates the license row's status to match the trigger, and runs the full erasure pipeline.

## Errors

All endpoints return ScaiGrid's standard error envelope:

```json
{
  "error": {
    "code": "SCAISPEAK_VOICE_NOT_FOUND",
    "message": "Voice does not exist or isn't visible to the caller",
    "details": { "voice_id": "vc_..." }
  },
  "meta": { "request_id": "req_..." }
}
```
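A minimal sketch of unwrapping an envelope into either the data payload or a typed exception that keeps code, details, and meta.request_id (useful when reporting issues). ScaiSpeakError and unwrap are hypothetical names, not an official SDK.

```python
from typing import Optional


class ScaiSpeakError(Exception):
    """An error envelope surfaced as an exception."""

    def __init__(self, code: str, message: str, details: Optional[dict] = None,
                 request_id: Optional[str] = None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}
        self.request_id = request_id


def unwrap(envelope: dict):
    """Return envelope["data"], or raise ScaiSpeakError if the envelope carries an error."""
    if "error" in envelope:
        err = envelope["error"]
        raise ScaiSpeakError(err.get("code", "UNKNOWN"), err.get("message", ""),
                             err.get("details"),
                             envelope.get("meta", {}).get("request_id"))
    return envelope["data"]
```

Branching on .code rather than on the message text is generally the more robust choice.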

ScaiSpeak-specific codes:

| Code | Meaning |
|---|---|
| SCAISPEAK_VOICE_NOT_FOUND | Voice id doesn't exist or isn't visible. |
| SCAISPEAK_VOICE_ACCESS_DENIED | Caller can't perform this operation on this voice. |
| SCAISPEAK_VOICE_PREFLIGHT_FAILED | Reference audio failed quality checks. Body includes preflight. |
| SCAISPEAK_CONSENT_INVALID | Consent recording missing or doesn't match the scripted text. |
| SCAISPEAK_AMBIGUOUS_SOURCE | Both inline upload and ScaiDrive reference supplied for the same file. |
| SCAISPEAK_VOICE_SHARE_FORBIDDEN | Only the owner with voice.share can promote to tenant scope. |
| SCAISPEAK_BACKEND_UNAVAILABLE | No allowed backend currently available. |
| SCAISPEAK_TENANT_POLICY_INVALID | Policy update rejected (e.g. default not in allowed set). |
| SCAISPEAK_JOB_NOT_FOUND | Job id doesn't exist or doesn't belong to this caller. |
| SCAISPEAK_VOICE_NOT_READY_FOR_WARMING | Returned by the legacy warming path when the voice lacks the cached state the previous-generation engine needed. A no-op on the current zero-shot engine; safe to ignore in new code. |
| SCAISPEAK_SAVE_TO_REQUIRES_JWT | save_to attempted with sgk_ API key auth. |
| SCAISPEAK_SAVE_TO_EXCHANGE_FAILED | ScaiKey token exchange against ScaiDrive failed. |
| SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE | ScaiDrive integration not configured. |
| SCAISPEAK_SCAIDRIVE_FORBIDDEN | Caller lacks write access on the destination share. |
| SCAISPEAK_SCAIDRIVE_NOT_FOUND | Destination share or folder doesn't exist. |
| SCAISPEAK_SCAIDRIVE_CONFLICT | File exists at destination and overwrite is false. |
| SCAISPEAK_SCAIDRIVE_QUOTA_EXCEEDED | Destination share over quota (HTTP 507). |
| SCAISPEAK_LICENSE_FIELD_INVALID | License-type / bound mismatch on global voice create. |
| SCAISPEAK_GLOBAL_VOICE_NOT_FOUND | Global voice doesn't exist or was already deleted. |
| SCAISPEAK_BLOCKLIST_NOT_FOUND | Blocklist entry id doesn't exist. |
| SCAISPEAK_UNINSTALL_TOKEN_MISMATCH | Uninstall hook called without a matching token. |

Updated 2026-05-22 14:27:32, rev 13