API reference

All endpoints are mounted at /v1/modules/scaispeak/ and authenticate with the standard ScaiGrid bearer token. Responses use ScaiGrid's standard envelope ({ "data": ... } for success, { "error": ... } for failures).
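A minimal Python sketch of an authenticated call using only the standard library. The host in `BASE_URL` is a placeholder, and `build_request` is a hypothetical helper. It only constructs the request; you send it with `urllib.request.urlopen` or adapt it to your own HTTP client.

```python
import json
import urllib.parse
import urllib.request

# Placeholder origin (assumption): substitute your deployment's ScaiGrid host.
BASE_URL = "https://scaigrid.example.com/v1/modules/scaispeak"


def build_request(method, path, token, query=None, body=None):
    """Build (but do not send) an authenticated ScaiSpeak request.

    `path` is relative to the module mount point, e.g. "/voices".
    JSON bodies are serialised here; multipart uploads such as
    POST /voices need a multipart encoder instead.
    """
    url = BASE_URL + path
    if query:
        # Drop unset parameters so the server applies its own defaults.
        params = {k: v for k, v in query.items() if v is not None}
        url += "?" + urllib.parse.urlencode(params)
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

For example, `build_request("GET", "/voices", token, query={"language": "en"})` targets `.../voices?language=en` with the bearer header set.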

Health

`GET /healthz`

Liveness — process is responding. Cheap; no I/O.

`GET /readyz`

Readiness — module can serve requests. Returns 200 when the module's upstream dependencies (managed TTS relay, ScaiInfer, Redis) are reachable enough to dispatch.

Voices — read

`GET /voices`

List voices visible to the caller: global voices plus the caller's own tenant-scope and user-scope voices. Query parameters:

Parameter	Notes
`language`	2-letter ISO code (`en`, `fr`, `de`...).
`scope`	`global`, `tenant`, `user`.
`gender`	`female`, `male`, `neutral`, `unspecified`.
`embedding_status`	`pending`, `processing`, `ready`, `failed`, `evicted`.
`q`	Free-text search over `display_name`, `description`, `style_tags`.
`limit`	1-200, default 50.

Permission: scaispeak:voice.read.
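Client code can mirror these filters and fail fast before sending the request. A minimal sketch, assuming local validation is wanted in addition to the server's; `voices_query` is a hypothetical helper, and the language check only tests for two letters rather than looking the code up in ISO 639-1.

```python
import urllib.parse

# Allowed values as listed for GET /voices.
SCOPES = {"global", "tenant", "user"}
GENDERS = {"female", "male", "neutral", "unspecified"}
EMBEDDING_STATUSES = {"pending", "processing", "ready", "failed", "evicted"}


def voices_query(language=None, scope=None, gender=None,
                 embedding_status=None, q=None, limit=50):
    """Validate GET /voices filters locally and return the encoded query string."""
    if language is not None and (len(language) != 2 or not language.isalpha()):
        raise ValueError("language must be a 2-letter ISO code")
    for name, value, allowed in (("scope", scope, SCOPES),
                                 ("gender", gender, GENDERS),
                                 ("embedding_status", embedding_status, EMBEDDING_STATUSES)):
        if value is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"language": language, "scope": scope, "gender": gender,
              "embedding_status": embedding_status, "q": q, "limit": limit}
    # Unset filters are omitted entirely rather than sent empty.
    return urllib.parse.urlencode({k: v for k, v in params.items() if v is not None})
```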

`GET /voices/{voice_id}`

Fetch one voice's full record. Returns 404 if the voice doesn't exist OR isn't visible to the caller (existence isn't disclosed across scopes).

Voices — write

`POST /voices`

Create (clone) a voice from a reference recording and a consent recording. Multipart form fields:

Field	Required	Notes
`reference`	one of	Multipart file part with the reference audio.
`reference_scaidrive_json`	one of	JSON `{file_id, mcp_uri, share_url}` pointing at a ScaiDrive file.
`consent`	one of	Multipart file part with the consent audio.
`consent_scaidrive_json`	one of	ScaiDrive reference for the consent recording.
`display_name`	yes	Human-readable label.
`language_primary`	yes	2-letter ISO code.
`language_supported_json`	no	JSON array of 2-letter codes the voice can speak.
`gender_hint`, `age_hint`, `style_tags_json`	no	Library metadata; advisory.
`consent_user_full_name`	yes	Speaker's full name; written to the consent row.
`consent_stated_purpose`	yes	What the cloned voice will be used for; verbatim audit.
`consent_text`	yes	The exact scripted statement the speaker reads in the consent clip.
`description`	no	Free-text description.

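Supply exactly one source per recording: `reference` or `reference_scaidrive_json`, and `consent` or `consent_scaidrive_json`. Returns 201 Created with the new voice plus the preflight block. Permission: scaispeak:voice.write.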

Errors: SCAISPEAK_VOICE_PREFLIGHT_FAILED (audio rejected), SCAISPEAK_AMBIGUOUS_SOURCE (inline + ScaiDrive for the same file), SCAISPEAK_CONSENT_INVALID (consent audio missing or doesn't match the script).
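The field rules above fit in a small helper that returns the plain form fields and the file parts separately, leaving multipart encoding to whichever HTTP client you use. A sketch under that assumption; `voice_create_fields` and its parameter names are hypothetical. The one-source-per-recording check mirrors the server's SCAISPEAK_AMBIGUOUS_SOURCE rule but does not replace it.

```python
import json


def voice_create_fields(*, display_name, language_primary, consent_user_full_name,
                        consent_stated_purpose, consent_text,
                        reference_path=None, reference_scaidrive=None,
                        consent_path=None, consent_scaidrive=None,
                        language_supported=None, style_tags=None,
                        gender_hint=None, age_hint=None, description=None):
    """Assemble POST /voices form fields.

    Returns (fields, files): `fields` are plain string form values and
    `files` maps a part name ("reference"/"consent") to a local path.
    """
    fields, files = {}, {}
    for part, path, drive in (("reference", reference_path, reference_scaidrive),
                              ("consent", consent_path, consent_scaidrive)):
        # Exactly one source per recording: both or neither is an error.
        if (path is None) == (drive is None):
            raise ValueError(f"supply exactly one of a {part} file or a {part} ScaiDrive reference")
        if path is not None:
            files[part] = path
        else:
            # e.g. {"file_id": ..., "mcp_uri": ..., "share_url": ...}
            fields[f"{part}_scaidrive_json"] = json.dumps(drive)
    fields.update(display_name=display_name, language_primary=language_primary,
                  consent_user_full_name=consent_user_full_name,
                  consent_stated_purpose=consent_stated_purpose,
                  consent_text=consent_text)
    # *_json fields carry JSON-encoded arrays inside a multipart form.
    if language_supported is not None:
        fields["language_supported_json"] = json.dumps(language_supported)
    if style_tags is not None:
        fields["style_tags_json"] = json.dumps(style_tags)
    for key, value in (("gender_hint", gender_hint), ("age_hint", age_hint),
                       ("description", description)):
        if value is not None:
            fields[key] = value
    return fields, files
```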

`PATCH /voices/{voice_id}`

Partial update. Settable fields: display_name, description, language_supported, gender_hint, age_hint, style_tags. Scope mutation is not allowed here — use /share. Permission: scaispeak:voice.write.
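Because only six fields are settable and scope changes go through /share, a client can reject anything else before sending. A sketch assuming a JSON body in which `language_supported` and `style_tags` are plain arrays (unlike the `*_json` form fields on create); `voice_patch_body` is a hypothetical helper.

```python
# Fields PATCH /voices/{voice_id} accepts, per the list above.
SETTABLE_FIELDS = {"display_name", "description", "language_supported",
                   "gender_hint", "age_hint", "style_tags"}


def voice_patch_body(**changes):
    """Build a partial-update body, refusing fields the endpoint won't set."""
    if "scope" in changes:
        raise ValueError("scope can't be changed via PATCH; use POST /voices/{voice_id}/share")
    unknown = sorted(set(changes) - SETTABLE_FIELDS)
    if unknown:
        raise ValueError(f"fields not settable via PATCH: {unknown}")
    return dict(changes)
```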

`DELETE /voices/{voice_id}`

Erase the voice (GDPR Art. 17). This tombstones the row, fans out EvictVoice to every warm replica, clears the Redis registry, deletes the reference and consent blobs, and writes an immutable erasure_audit row. The response is the audit record:

```json
{
  "data": {
    "audit_id": "aud_...",
    "voice_id": "vc_...",
    "warm_replicas_evicted": 3,
    "blob_bytes_deleted": 1240832,
    "error_summary": null,
    "completed_at": "2026-05-17T14:01:00Z"
  }
}
```

Permission: scaispeak:voice.write.

`POST /voices/{voice_id}/share`

Promote a user-scope voice to tenant scope. Permission: scaispeak:voice.share (separate from voice.write so sharing can be granted independently).

`POST /voices/{voice_id}/preview`

Render a short preview clip (text of at most 300 characters). Form fields: text, response_format. Uses the same dispatcher as /speak. Permission: scaispeak:voice.read.

`POST /voices/{voice_id}/repromote`

Re-run intake processing for a voice. Idempotent: a no-op if the voice is already ready or already processing. Use it to bring legacy voices (created under the previous-generation cloning engine) onto the current zero-shot path. Returns 202 Accepted. Permission: scaispeak:voice.write.

`WS /voices/record`

Live-record voice intake; a WebSocket alternative to POST /voices. The session has two phases: the client first streams reference audio frames and sends phase_complete, then streams consent audio frames and sends finalize. Authenticate with a ?token= query parameter or an Authorization header. Permission: scaispeak:voice.write.

Speak

`POST /speak`

Batch synthesis. Body:

Field	Required	Notes
`voice_id`	yes	A voice the caller can see.
`text`	yes	Up to ~500 chars sync, longer async.
`language_hint`	no	2-letter code to disambiguate multilingual voices.
`speed`	no	0.5–2.0, default 1.0.
`response_format`	no	`mp3`, `opus`, `wav`, `flac`, `aac`, `pcm`. Default `mp3`. Self-hosted backend currently emits 48 kHz WAV regardless of this field and logs a downgrade warning if the requested format differs — see Troubleshooting.
`backend_preference`	no	`prefer_self_hosted`, `prefer_relay`, `any`. Advisory; tenant policy wins.
`idempotency_key`	no	Caller-supplied retry key for the output cache.
`force_async`	no	Force the job path regardless of text length.
`save_to`	no	ScaiDrive destination block (see below). JWT auth required.
`inline_response`	no	When `save_to` is set, return audio bytes too (default true).
`instructions`	no	Free-text style guidance (emotion / pace / affect). Example: `"cheerful and energetic"` or `"slowly and carefully"`. Meaningful for cloned voices; preset speakers and the relay backend ignore this field.
`cfg_value`	no	Cloning-fidelity vs naturalness tradeoff. Range 0.5–5.0. Higher values stay closer to the reference voice at the cost of naturalness. Engine default ~2.0 when omitted. Meaningful for cloned voices only.
`warmup_trim_ms`	no	Strip the first N ms of generated audio to absorb the warm-up artefact at the start of cloned-voice output. Typical: 150. Use 0 to disable. Meaningful for cloned voices only.

Short text (500 characters or fewer by default) returns 200 OK with audio_base64 inline. Longer text, or any request with force_async set, returns 202 Accepted with a job_id; poll /speak/jobs/{job_id} for the result.

The save_to block:

```json
{
  "share_id": "shr_xyz",
  "folder_id": "fld_abc",
  "filename": "chapter-01.mp3",
  "overwrite": false
}
```

Permission: scaispeak:synthesize.
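A sketch of building the request body with the documented ranges checked locally, then branching on the sync (200) and async (202) outcomes. It assumes `audio_base64` and `job_id` sit directly under the envelope's `data` key; `speak_body` and `handle_speak_response` are hypothetical helpers, and other optional fields (language_hint, instructions, idempotency_key, and so on) pass through `**extra` unchecked.

```python
import base64

FORMATS = {"mp3", "opus", "wav", "flac", "aac", "pcm"}


def speak_body(voice_id, text, *, speed=None, response_format=None,
               cfg_value=None, warmup_trim_ms=None, force_async=None,
               save_to=None, **extra):
    """Build a POST /speak JSON body, checking the documented ranges."""
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    if response_format is not None and response_format not in FORMATS:
        raise ValueError(f"unsupported response_format {response_format!r}")
    if cfg_value is not None and not 0.5 <= cfg_value <= 5.0:
        raise ValueError("cfg_value must be within 0.5-5.0")
    if warmup_trim_ms is not None and warmup_trim_ms < 0:
        raise ValueError("warmup_trim_ms must be >= 0 (0 disables trimming)")
    body = {"voice_id": voice_id, "text": text, "speed": speed,
            "response_format": response_format, "cfg_value": cfg_value,
            "warmup_trim_ms": warmup_trim_ms, "force_async": force_async,
            "save_to": save_to, **extra}
    # Omit unset fields so server-side defaults apply.
    return {k: v for k, v in body.items() if v is not None}


def handle_speak_response(status, envelope):
    """Return ("audio", bytes) for 200 OK or ("job", job_id) for 202 Accepted."""
    data = envelope["data"]
    if status == 200:
        return "audio", base64.b64decode(data["audio_base64"])
    if status == 202:
        return "job", data["job_id"]
    raise RuntimeError(f"unexpected status {status}")
```

Keep in mind that the self-hosted backend may still return WAV whatever `response_format` says, so check the actual audio type before writing it to disk.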

`GET /speak/jobs/{job_id}`

Poll an async synthesis job. Returns status (queued, running, completed, failed) and, once completed, either audio_base64 inline (small outputs) or audio_bytes plus an S3 URI (larger outputs). If the job was submitted with save_to, the response also carries save_to.file_id once the upload finishes. Permission: scaispeak:synthesize, scoped to (user, tenant): you can't poll another user's job by guessing its ID.
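A polling loop only needs to stop on the two terminal statuses. A minimal sketch that takes the HTTP call as an injected `fetch_job` function (hypothetical; it should perform the GET and return the unwrapped `data` dict), which also makes the loop testable without a network. The one-second interval and five-minute timeout are arbitrary choices, not documented values.

```python
import time

TERMINAL = {"completed", "failed"}


def poll_job(fetch_job, job_id, *, interval_s=1.0, timeout_s=300.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll GET /speak/jobs/{job_id} until the job reaches a terminal status.

    Returns the final job dict; raises TimeoutError if the deadline passes
    while the job is still queued or running.
    """
    deadline = clock() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)
```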

Streaming — WebSocket

`WS /stream/speak`

Real-time TTS over WebSocket. Wire protocol:

Client → Server	Fields
`{"type":"open"}`	`voice_id`, `language_hint`, `speed`, `output.codec`, `backend_preference`
`{"type":"text"}`	`delta`
`{"type":"flush"}`	—
`{"type":"interrupt"}`	—
`{"type":"close"}`	—

Server → Client	Fields
`{"type":"ready"}`	`voice_id`, `backend_used`
binary frame	audio bytes in the negotiated codec
`{"type":"interrupted"}`	—
`{"type":"closed"}`	`stats.chars`, `stats.backend_used`
`{"type":"error"}`	`code`, `message`

Close codes: 4401 unauthorized, 4403 forbidden, 4400 bad request, 4502 backend unavailable, 4500 server error. Auth via ?token= or header. Permission: scaispeak:synthesize.
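The client side of the protocol is a short sequence of JSON text frames. A sketch of one plausible ordering for a single utterance (open, text deltas, flush, close), assuming dotted field names such as `output.codec` map to nested objects; `stream_messages` and `describe_close` are hypothetical helpers, and the frames would be sent with whatever WebSocket client you use. Binary frames received in between carry the audio.

```python
import json

CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def stream_messages(voice_id, chunks, codec, speed=None):
    """Yield the JSON text frames a client sends for one utterance."""
    open_msg = {"type": "open", "voice_id": voice_id, "output": {"codec": codec}}
    if speed is not None:
        open_msg["speed"] = speed
    yield json.dumps(open_msg)
    for chunk in chunks:
        # Text can arrive incrementally, e.g. token by token from an LLM.
        yield json.dumps({"type": "text", "delta": chunk})
    yield json.dumps({"type": "flush"})  # synthesise whatever is buffered
    yield json.dumps({"type": "close"})


def describe_close(code):
    """Map a ScaiSpeak WebSocket close code to its documented meaning."""
    return CLOSE_CODES.get(code, f"unknown close code {code}")
```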

Streaming — WebRTC

Status: signalling and session lifecycle work end-to-end, but the audio plane (aiortc MediaStreamTrack.recv) currently raises NotImplementedError, so once a peer connection is negotiated no audio flows over it. Use the WebSocket streaming endpoint for production until this caveat is removed.

`POST /stream/speak/webrtc/sessions`

Create a WebRTC session. Body:

Field	Notes
`voice_id`	required
`language_hint`	optional 2-letter code
`speed`	0.5–2.0
`output.codec`	`opus` or `pcm`
`output.sample_rate`	8000–48000
`control.transport`	`websocket` or `datachannel`
`ice_servers`	optional tenant-supplied ICE config
`backend_preference`	same vocabulary as `/speak`

Returns session_id, ice_servers, expires_at, control_ws_url. Permission: scaispeak:synthesize.
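A sketch of the session-create body with the documented enumerations and ranges enforced, assuming the dotted names in the table map to nested objects. The defaults here (opus, 48000 Hz, websocket control) are illustrative choices rather than documented server defaults, and `webrtc_session_body` is hypothetical.

```python
def webrtc_session_body(voice_id, *, codec="opus", sample_rate=48000,
                        transport="websocket", speed=None, language_hint=None):
    """Build a POST /stream/speak/webrtc/sessions body."""
    if codec not in {"opus", "pcm"}:
        raise ValueError("output.codec must be 'opus' or 'pcm'")
    if not 8000 <= sample_rate <= 48000:
        raise ValueError("output.sample_rate must be within 8000-48000")
    if transport not in {"websocket", "datachannel"}:
        raise ValueError("control.transport must be 'websocket' or 'datachannel'")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5-2.0")
    body = {"voice_id": voice_id,
            "output": {"codec": codec, "sample_rate": sample_rate},
            "control": {"transport": transport}}
    # Optional fields are only sent when set.
    if speed is not None:
        body["speed"] = speed
    if language_hint is not None:
        body["language_hint"] = language_hint
    return body
```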

`POST /stream/speak/webrtc/sessions/{session_id}/offer`

Apply the client's SDP offer and return the server's SDP answer.

`POST /stream/speak/webrtc/sessions/{session_id}/ice-candidates`

Submit a trickled ICE candidate from the client. Returns 204 No Content.

`DELETE /stream/speak/webrtc/sessions/{session_id}`

Tear down the peer connection and mark the session closed.

`WS /stream/speak/webrtc/sessions/{session_id}/control`

Control plane for an active WebRTC session — same text/flush/interrupt/close vocabulary as the WebSocket streaming path, no binary audio frames (audio rides RTP).

Voice warming

`GET /voices/{voice_id}/warm`

Inspect current warm state. Returns warm_node_ids, candidate_node_ids, stale_node_ids. Permission: scaispeak:voice.read.

`POST /voices/{voice_id}/warm`

Fan out PrepareVoice to candidate replicas. Body: { "node_ids": [...] } (an empty list means all candidates). Returns an outcomes array with per-node ok, cache_key, load_ms, and error. Permission: scaispeak:voice.write.
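A sketch of building the warm request and splitting the returned outcomes into warmed and failed nodes. The reference lists per-node ok, cache_key, load_ms, and error, but does not name the field identifying the node; this sketch assumes it is `node_id`. Both helpers are hypothetical.

```python
def warm_body(node_ids=None):
    """Body for POST /voices/{voice_id}/warm; an empty list targets all candidates."""
    return {"node_ids": list(node_ids or [])}


def summarize_warm(outcomes):
    """Return (warmed_node_ids, {failed_node_id: error}) from the outcomes array.

    Assumes each outcome carries a `node_id` key (not confirmed by the reference).
    """
    warmed, failed = [], {}
    for outcome in outcomes:
        if outcome.get("ok"):
            warmed.append(outcome["node_id"])
        else:
            failed[outcome["node_id"]] = outcome.get("error")
    return warmed, failed
```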

`POST /voices/{voice_id}/evict`

Drop the voice from every currently-warm replica. Always clears the registry. Permission: scaispeak:voice.write.

Tenant policy

`GET /admin/policy`

Read the caller's tenant policy: allowed_backends (subset of ["A","B"]), default_backend. Permission: scaispeak:synthesize — readable by any caller who can synthesise so UIs can show "your tenant routes through Backend B".

`PUT /admin/policy`

Update the tenant policy. Body: allowed_backends (string shorthand "A", "B", or "AB", or a list), default_backend. Validation rejects a default_backend that isn't in allowed_backends (SCAISPEAK_TENANT_POLICY_INVALID). Permission: scaispeak:admin.
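An admin UI can normalise the shorthand and reproduce the default-in-allowed check before calling PUT. A minimal sketch; `normalize_policy` is hypothetical, and it sends the list form, which the endpoint accepts alongside the shorthand.

```python
def normalize_policy(allowed_backends, default_backend):
    """Mirror the PUT /admin/policy validation client-side.

    Accepts the string shorthand ("A", "B", "AB") or a list of backend names.
    """
    # list() splits "AB" into ["A", "B"] and copies a list or tuple unchanged.
    allowed = sorted(set(allowed_backends))
    if any(b not in ("A", "B") for b in allowed):
        raise ValueError("allowed_backends must be a subset of ['A', 'B']")
    if default_backend not in allowed:
        # The server rejects this with SCAISPEAK_TENANT_POLICY_INVALID.
        raise ValueError("default_backend must be one of allowed_backends")
    return {"allowed_backends": allowed, "default_backend": default_backend}
```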

ScaiDrive proxy

`GET /admin/scaidrive/shares`

Read-only forwarding to ScaiDrive: list the shares the caller can see. Used by the synth page's destination picker. Requires JWT auth; sgk_ API keys are not accepted. Returns 404 with SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE when ScaiDrive isn't configured in the deployment.

`GET /admin/scaidrive/shares/{share_id}/folders`

Lazily browse folders inside a share. Query: folder_id (omit for the share root). Returns child folders only.

Admin lifecycle

`POST /admin/lifecycle/install`

First-time install hook called by the module-host. Idempotent. SuperAdmin-only.

`POST /admin/lifecycle/upgrade`

Version upgrade hook. Idempotent. SuperAdmin-only.

`POST /admin/lifecycle/uninstall`

Module uninstall: soft-deletes every non-global voice in the deployment and signals the erasure worker to fan out. Requires confirmation_token and expected_module_id; a non-matching token fails with SCAISPEAK_UNINSTALL_TOKEN_MISMATCH. SuperAdmin-only.

`POST /admin/lifecycle/tenant/{tenant_id}/enable`

Per-tenant enable. SuperAdmin-only.

`POST /admin/lifecycle/tenant/{tenant_id}/disable`

Per-tenant disable: soft-deletes all of the tenant's user-scope and tenant-scope voices and signals erasure. Global voices are untouched. SuperAdmin-only.

Blocklist + audit

`POST /admin/blocklist`

Add a blocklist entry. Body: scope (tenant, user, voice), target_id, reason, optional expires_at. Permission: scaispeak:admin.

`GET /admin/blocklist`

List active blocklist entries. Query: scope, tenant_id, limit. Permission: scaispeak:admin.

`DELETE /admin/blocklist/{block_id}`

Remove a blocklist entry (manual unblock). Returns 204 No Content. Permission: scaispeak:admin.

`GET /admin/erasure/audit`

List erasure audit rows. Query: tenant_id, voice_id, limit. Returns most-recent-first. Permission: scaispeak:admin.

Global voices (SuperAdmin)

`POST /admin/voices/global`

Create a platform-scope voice (scope='global'). Global voices are license-based and need no consent recording. SuperAdmin-only. Form fields:

Field	Required	Notes
`reference`	yes	Multipart reference audio. ScaiDrive references not accepted for globals.
`display_name`, `language_primary`	yes	Same shape as user voices.
`licensor_name`	yes	Who licensed the voice to ScaiLabs.
`license_type`	yes	`perpetual`, `time_bound`, `usage_bound`.
`valid_until`	when `time_bound`	ISO-8601 timestamp.
`usage_limit_chars`	when `usage_bound`	Integer cap.
`licensor_reference`	no	Contract reference.
`valid_from`	no	ISO-8601 start.
`terms_summary`	no	Operator-facing summary of the terms.
`license_document`	no	Optional PDF; stored alongside the voice.

Returns the new voice_id, license_id, and an intake note.
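The conditional rows in the table (valid_until only for `time_bound`, usage_limit_chars only for `usage_bound`) are what SCAISPEAK_LICENSE_FIELD_INVALID guards. A sketch that checks them before upload; `license_fields` is hypothetical, and requiring a positive usage cap is this sketch's own assumption. `datetime.fromisoformat` is only a basic ISO-8601 check: it accepts a trailing "Z" only on Python 3.11+.

```python
from datetime import datetime

LICENSE_TYPES = {"perpetual", "time_bound", "usage_bound"}


def license_fields(license_type, licensor_name, *, valid_until=None,
                   usage_limit_chars=None):
    """Return the license form fields for POST /admin/voices/global."""
    if license_type not in LICENSE_TYPES:
        raise ValueError(f"license_type must be one of {sorted(LICENSE_TYPES)}")
    fields = {"license_type": license_type, "licensor_name": licensor_name}
    if license_type == "time_bound":
        if valid_until is None:
            raise ValueError("time_bound licenses need valid_until")
        datetime.fromisoformat(valid_until)  # raises ValueError on unparseable input
        fields["valid_until"] = valid_until
    elif license_type == "usage_bound":
        if not isinstance(usage_limit_chars, int) or usage_limit_chars <= 0:
            raise ValueError("usage_bound licenses need a positive integer usage_limit_chars")
        # Multipart form values are strings.
        fields["usage_limit_chars"] = str(usage_limit_chars)
    return fields
```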

`DELETE /admin/voices/global/{voice_id}`

Revoke a global voice. SuperAdmin-only. Form field trigger: one of license_revoked, license_expired, platform_decision. Bypasses the owner-equality check that protects user and tenant voices, updates the license row's status to match the trigger, and runs the full erasure pipeline.

Errors

All endpoints return ScaiGrid's standard error envelope:

```json
{
  "error": {
    "code": "SCAISPEAK_VOICE_NOT_FOUND",
    "message": "Voice does not exist or isn't visible to the caller",
    "details": { "voice_id": "vc_..." }
  },
  "meta": { "request_id": "req_..." }
}
```

ScaiSpeak-specific codes:

Code	Meaning
`SCAISPEAK_VOICE_NOT_FOUND`	Voice id doesn't exist or isn't visible.
`SCAISPEAK_VOICE_ACCESS_DENIED`	Caller can't perform this operation on this voice.
`SCAISPEAK_VOICE_PREFLIGHT_FAILED`	Reference audio failed quality checks. Body includes `preflight`.
`SCAISPEAK_CONSENT_INVALID`	Consent recording missing or doesn't match the scripted text.
`SCAISPEAK_AMBIGUOUS_SOURCE`	Both inline upload and ScaiDrive reference supplied for the same file.
`SCAISPEAK_VOICE_SHARE_FORBIDDEN`	Only the owner with `voice.share` can promote to tenant scope.
`SCAISPEAK_BACKEND_UNAVAILABLE`	No allowed backend currently available.
`SCAISPEAK_TENANT_POLICY_INVALID`	Policy update rejected (e.g. default not in allowed set).
`SCAISPEAK_JOB_NOT_FOUND`	Job id doesn't exist or doesn't belong to this caller.
`SCAISPEAK_VOICE_NOT_READY_FOR_WARMING`	Legacy warming path returns this when the voice doesn't have the cached state the previous-gen engine needed. No-op on the current zero-shot engine; safe to ignore for new code.
`SCAISPEAK_SAVE_TO_REQUIRES_JWT`	save_to attempted with `sgk_` API key auth.
`SCAISPEAK_SAVE_TO_EXCHANGE_FAILED`	ScaiKey token exchange against ScaiDrive failed.
`SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE`	ScaiDrive integration not configured.
`SCAISPEAK_SCAIDRIVE_FORBIDDEN`	Caller lacks write access on the destination share.
`SCAISPEAK_SCAIDRIVE_NOT_FOUND`	Destination share or folder doesn't exist.
`SCAISPEAK_SCAIDRIVE_CONFLICT`	File exists at destination and `overwrite` is false.
`SCAISPEAK_SCAIDRIVE_QUOTA_EXCEEDED`	Destination share over quota (HTTP 507).
`SCAISPEAK_LICENSE_FIELD_INVALID`	License-type / bound mismatch on global voice create.
`SCAISPEAK_GLOBAL_VOICE_NOT_FOUND`	Global voice doesn't exist or already deleted.
`SCAISPEAK_BLOCKLIST_NOT_FOUND`	Blocklist entry id doesn't exist.
`SCAISPEAK_UNINSTALL_TOKEN_MISMATCH`	Uninstall hook called without a matching token.
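Since every endpoint shares the same envelope, one helper can unwrap successes and turn failures into exceptions that keep the code and request ID for support tickets. A minimal sketch; `ScaiSpeakError` and `unwrap` are hypothetical names, and the function expects an already-parsed JSON body.

```python
class ScaiSpeakError(Exception):
    """Raised when a response carries ScaiGrid's error envelope."""

    def __init__(self, code, message, details=None, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}
        self.request_id = request_id


def unwrap(envelope):
    """Return `data` from a success envelope or raise ScaiSpeakError."""
    if "error" in envelope:
        err = envelope["error"]
        raise ScaiSpeakError(err["code"], err.get("message", ""),
                             err.get("details"),
                             envelope.get("meta", {}).get("request_id"))
    return envelope["data"]
```

Callers can then branch on `exc.code`, for example treating SCAISPEAK_BACKEND_UNAVAILABLE differently from SCAISPEAK_VOICE_NOT_FOUND.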
