---
summary: "Every ScaiSpeak endpoint \u2014 voices, speak, streaming, WebRTC, voice\
  \ warming, tenant policy, admin, blocklist, global voices."
title: API reference
path: reference/api
status: published
---

All endpoints are mounted at `/v1/modules/scaispeak/` and authenticate with the standard ScaiGrid bearer token. Responses use ScaiGrid's standard envelope (`{ "data": ... }` for success, `{ "error": ... }` for failures).
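As a minimal sketch of how a client might address these endpoints, the helper below joins a deployment base URL with the module prefix and attaches the token. The host name, the token value, and the `Authorization: Bearer ...` header form are assumptions for illustration; the request is built but never sent.

```python
import urllib.request

MODULE_PREFIX = "/v1/modules/scaispeak"  # mount point stated above


def build_request(base_url: str, token: str, path: str,
                  method: str = "GET") -> urllib.request.Request:
    """Build, but do not send, an authenticated request for a module path."""
    url = base_url.rstrip("/") + MODULE_PREFIX + "/" + path.lstrip("/")
    headers = {
        "Authorization": f"Bearer {token}",  # assumed bearer header form
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method=method)


# Hypothetical host and token, for illustration only.
req = build_request("https://scaigrid.example.com", "TOKEN", "/voices")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would perform the call; any HTTP client that sets the same URL and header should behave equivalently.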

## Health

### `GET /healthz`

Liveness — process is responding. Cheap; no I/O.

### `GET /readyz`

Readiness — module can serve requests. Returns 200 when the module's upstream dependencies (managed TTS relay, ScaiInfer, Redis) are reachable enough to dispatch.

## Voices — read

### `GET /voices`

List voices visible to the caller (global + own tenant + own user). Query parameters:

| Parameter | Notes |
|---|---|
| `language` | 2-letter ISO code (`en`, `fr`, `de`...). |
| `scope` | `global`, `tenant`, `user`. |
| `gender` | `female`, `male`, `neutral`, `unspecified`. |
| `embedding_status` | `pending`, `processing`, `ready`, `failed`, `evicted`. |
| `q` | Free-text search over `display_name`, `description`, `style_tags`. |
| `limit` | 1-200, default 50. |

Permission: `scaispeak:voice.read`.
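The filters above can be checked client-side before the request goes out. The following is a sketch only: `voices_query` is a hypothetical helper name, and the server remains the authority on what it accepts.

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter table above.
ENUMS = {
    "scope": {"global", "tenant", "user"},
    "gender": {"female", "male", "neutral", "unspecified"},
    "embedding_status": {"pending", "processing", "ready", "failed", "evicted"},
}
KNOWN = set(ENUMS) | {"language", "q", "limit"}


def voices_query(**params) -> str:
    """Check filters against the documented values and URL-encode them."""
    for name, value in params.items():
        if name not in KNOWN:
            raise ValueError(f"unknown parameter: {name}")
        if name in ENUMS and value not in ENUMS[name]:
            raise ValueError(f"{name} must be one of {sorted(ENUMS[name])}")
    language = params.get("language")
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language must be a 2-letter ISO code")
    if "limit" in params and not 1 <= params["limit"] <= 200:
        raise ValueError("limit must be between 1 and 200")
    return urlencode(params)


print(voices_query(language="en", scope="tenant", q="warm narrator", limit=20))
```

The returned string is appended to `/voices?`; omitting `limit` leaves the server default of 50 in effect.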

### `GET /voices/{voice_id}`

Fetch one voice's full record. Returns 404 if the voice doesn't exist OR isn't visible to the caller (existence isn't disclosed across scopes).

## Voices — write

### `POST /voices`

Create (clone) a voice from a reference + consent recording. Multipart form fields:

| Field | Required | Notes |
|---|---|---|
| `reference` | one of | Multipart file part with the reference audio. Supply this or `reference_scaidrive_json`, not both. |
| `reference_scaidrive_json` | one of | JSON `{file_id, mcp_uri, share_url}` pointing at a ScaiDrive file. Alternative to `reference`. |
| `consent` | one of | Multipart file part with the consent audio. Supply this or `consent_scaidrive_json`, not both. |
| `consent_scaidrive_json` | one of | ScaiDrive reference for the consent recording. Alternative to `consent`. |
| `display_name` | yes | Human-readable label. |
| `language_primary` | yes | 2-letter ISO code. |
| `language_supported_json` | no | JSON array of 2-letter codes the voice can speak. |
| `gender_hint`, `age_hint`, `style_tags_json` | no | Library metadata; advisory. |
| `consent_user_full_name` | yes | Speaker's full name; written to the consent row. |
| `consent_stated_purpose` | yes | What the cloned voice will be used for; verbatim audit. |
| `consent_text` | yes | The exact scripted statement the speaker reads in the consent clip. |
| `description` | no | Free-text description. |

Returns `201 Created` with the new voice plus the `preflight` block. Permission: `scaispeak:voice.write`.

Errors: `SCAISPEAK_VOICE_PREFLIGHT_FAILED` (audio rejected), `SCAISPEAK_AMBIGUOUS_SOURCE` (inline + ScaiDrive for the same file), `SCAISPEAK_CONSENT_INVALID` (consent audio missing or doesn't match the script).
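A client can catch the most common form mistakes before uploading audio. The sketch below checks only which field names are present, based on the table above; `check_voice_form` is a hypothetical helper and does not inspect audio content or reproduce the server's preflight.

```python
REQUIRED = ("display_name", "language_primary", "consent_user_full_name",
            "consent_stated_purpose", "consent_text")
# Each recording comes either inline or from ScaiDrive, never both.
SOURCE_PAIRS = (("reference", "reference_scaidrive_json"),
                ("consent", "consent_scaidrive_json"))


def check_voice_form(fields: set[str]) -> list[str]:
    """Return a list of problems found in the set of supplied field names."""
    problems = [f"missing required field: {name}"
                for name in REQUIRED if name not in fields]
    for inline, drive in SOURCE_PAIRS:
        present = [name for name in (inline, drive) if name in fields]
        if len(present) == 2:
            # The server reports this case as SCAISPEAK_AMBIGUOUS_SOURCE.
            problems.append(f"ambiguous source: send {inline} or {drive}, not both")
        elif not present:
            problems.append(f"missing source: send {inline} or {drive}")
    return problems


form = {"reference", "reference_scaidrive_json", "consent", "display_name",
        "language_primary", "consent_user_full_name",
        "consent_stated_purpose", "consent_text"}
for problem in check_voice_form(form):
    print(problem)
```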

### `PATCH /voices/{voice_id}`

Partial update. Settable fields: `display_name`, `description`, `language_supported`, `gender_hint`, `age_hint`, `style_tags`. Scope mutation is not allowed here — use `/share`. Permission: `scaispeak:voice.write`.

### `DELETE /voices/{voice_id}`

Erase the voice (GDPR Art. 17). Tombstones the row, fans out `EvictVoice` to every warm replica, clears the Redis registry, deletes reference + consent blobs, and writes an immutable `erasure_audit` row. Response:

```json
{
  "data": {
    "audit_id": "aud_...",
    "voice_id": "vc_...",
    "warm_replicas_evicted": 3,
    "blob_bytes_deleted": 1240832,
    "error_summary": null,
    "completed_at": "2026-05-17T14:01:00Z"
  }
}
```

Permission: `scaispeak:voice.write`.

### `POST /voices/{voice_id}/share`

Promote a user-scope voice to tenant scope. Permission: `scaispeak:voice.share` (separate from `voice.write` so sharing can be granted independently).

### `POST /voices/{voice_id}/preview`

Render a short preview clip (max 300 chars). Form fields: `text`, `response_format`. Uses the same dispatcher as `/speak`. Permission: `scaispeak:voice.read`.

### `POST /voices/{voice_id}/repromote`

Re-run intake processing for a voice. Idempotent — no-op if `ready`, no-op if already `processing`. Used to bring legacy voices (created under the previous-generation cloning engine) onto the current zero-shot path. Returns `202 Accepted`. Permission: `scaispeak:voice.write`.

### `WS /voices/record`

Live-record voice intake — WebSocket alternative to `POST /voices`. Two-phase: first reference audio frames + `phase_complete`, then consent audio frames + `finalize`. Auth via `?token=` query or `Authorization` header. Permission: `scaispeak:voice.write`.

## Speak

### `POST /speak`

Batch synthesis. Body:

| Field | Required | Notes |
|---|---|---|
| `voice_id` | yes | A voice the caller can see. |
| `text` | yes | Text to synthesise. Up to 500 chars (default threshold) is handled synchronously; longer text goes through the async job path. |
| `language_hint` | no | 2-letter code to disambiguate multilingual voices. |
| `speed` | no | 0.5–2.0, default 1.0. |
| `response_format` | no | `mp3`, `opus`, `wav`, `flac`, `aac`, `pcm`. Default `mp3`. Self-hosted backend currently emits 48 kHz WAV regardless of this field and logs a downgrade warning if the requested format differs — see Troubleshooting. |
| `backend_preference` | no | `prefer_self_hosted`, `prefer_relay`, `any`. Advisory; tenant policy wins. |
| `idempotency_key` | no | Caller-supplied retry key for the output cache. |
| `force_async` | no | Force the job path regardless of text length. |
| `save_to` | no | ScaiDrive destination block (see below). JWT auth required. |
| `inline_response` | no | When `save_to` is set, return audio bytes too (default true). |
| `instructions` | no | Free-text style guidance (emotion / pace / affect). Example: `"cheerful and energetic"` or `"slowly and carefully"`. Meaningful for cloned voices; preset speakers and the relay backend ignore this field. |
| `cfg_value` | no | Cloning-fidelity vs naturalness tradeoff. Range 0.5–5.0. Higher values stay closer to the reference voice at the cost of naturalness. Engine default ~2.0 when omitted. Meaningful for cloned voices only. |
| `warmup_trim_ms` | no | Strip the first N ms of generated audio to absorb the warm-up artefact at the start of cloned-voice output. Typical: 150. Use 0 to disable. Meaningful for cloned voices only. |

Short text (default ≤500 chars) returns `200 OK` with `audio_base64` inline. Longer text returns `202 Accepted` with `job_id` — poll `/speak/jobs/{job_id}`.

`save_to` block:

```json
{
  "share_id": "shr_xyz",
  "folder_id": "fld_abc",
  "filename": "chapter-01.mp3",
  "overwrite": false
}
```

Permission: `scaispeak:synthesize`.
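The documented ranges lend themselves to a small request builder. This is a sketch under stated assumptions: `build_speak_body` and `likely_async` are hypothetical helpers, and the 500-character threshold is the documented default, which a deployment may configure differently.

```python
FORMATS = {"mp3", "opus", "wav", "flac", "aac", "pcm"}
BACKEND_PREFERENCES = {"prefer_self_hosted", "prefer_relay", "any"}
SYNC_CHAR_LIMIT = 500  # documented default; may differ per deployment


def build_speak_body(voice_id: str, text: str, **options) -> dict:
    """Validate optional fields against the documented ranges and build the body."""
    speed = options.get("speed")
    if speed is not None and not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5 to 2.0")
    cfg_value = options.get("cfg_value")
    if cfg_value is not None and not 0.5 <= cfg_value <= 5.0:
        raise ValueError("cfg_value must be within 0.5 to 5.0")
    fmt = options.get("response_format")
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"response_format must be one of {sorted(FORMATS)}")
    pref = options.get("backend_preference")
    if pref is not None and pref not in BACKEND_PREFERENCES:
        raise ValueError(f"backend_preference must be one of {sorted(BACKEND_PREFERENCES)}")
    return {"voice_id": voice_id, "text": text, **options}


def likely_async(body: dict) -> bool:
    """Guess whether the server will answer 202 with a job_id rather than 200."""
    return bool(body.get("force_async")) or len(body["text"]) > SYNC_CHAR_LIMIT


body = build_speak_body("vc_demo", "Hello there.", speed=1.25, response_format="wav")
print(body, likely_async(body))
```

Omitted options are left out of the body so the server defaults apply. A caller that sees `likely_async` return true should be prepared to poll the job endpoint rather than expect `audio_base64` in the first response.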

### `GET /speak/jobs/{job_id}`

Poll an async synth job. Returns `status` (`queued`, `running`, `completed`, `failed`), and when complete, `audio_base64` inline (for small outputs) or `audio_bytes` + S3 URI for larger ones. If the job was submitted with `save_to`, the response also carries `save_to.file_id` once the upload finishes. Permission: `scaispeak:synthesize`, scoped to (user, tenant) — you can't poll another user's job by ID guess.
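A polling loop for this endpoint might look like the sketch below. The transport is injected as `fetch_job`, a hypothetical callable assumed to return the unwrapped `data` object of one poll response; the interval and timeout values are arbitrary choices, not documented recommendations.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict], interval_s: float = 1.0,
                 timeout_s: float = 120.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_s}s")
        sleep(interval_s)


# Stand-in for a real HTTP call: replays a fixed status sequence.
replies = iter([{"status": "queued"}, {"status": "running"},
                {"status": "completed", "audio_base64": "AAAA"}])
result = wait_for_job(lambda: next(replies), sleep=lambda s: None)
print(result["status"])
```

The caller still has to inspect the returned `status`, since `failed` is a terminal state too.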

## Streaming — WebSocket

### `WS /stream/speak`

Real-time TTS over WebSocket. Wire protocol:

| Client → Server | Fields |
|---|---|
| `{"type":"open"}` | `voice_id`, `language_hint`, `speed`, `output.codec`, `backend_preference` |
| `{"type":"text"}` | `delta` |
| `{"type":"flush"}` | — |
| `{"type":"interrupt"}` | — |
| `{"type":"close"}` | — |

| Server → Client | Fields |
|---|---|
| `{"type":"ready"}` | `voice_id`, `backend_used` |
| binary frame | audio bytes in the negotiated codec |
| `{"type":"interrupted"}` | — |
| `{"type":"closed"}` | `stats.chars`, `stats.backend_used` |
| `{"type":"error"}` | `code`, `message` |

Close codes: `4401` unauthorized, `4403` forbidden, `4400` bad request, `4502` backend unavailable, `4500` server error. Auth via `?token=` or header. Permission: `scaispeak:synthesize`.
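The message shapes above can be sketched as plain JSON builders, independent of any WebSocket library. Two points are assumptions rather than documented facts: that the listed fields sit at the top level of the message next to `type`, and that `output.codec` denotes a nested `output` object.

```python
import json

# Application close codes listed above.
CLOSE_CODES = {
    4400: "bad request",
    4401: "unauthorized",
    4403: "forbidden",
    4500: "server error",
    4502: "backend unavailable",
}


def open_message(voice_id: str, codec: str, **options) -> str:
    """Serialise the initial open message; extra options pass through as-is."""
    return json.dumps({"type": "open", "voice_id": voice_id,
                       "output": {"codec": codec}, **options})


def text_message(delta: str) -> str:
    """Serialise one chunk of text to synthesise."""
    return json.dumps({"type": "text", "delta": delta})


def control_message(kind: str) -> str:
    """Serialise a field-less control message."""
    if kind not in {"flush", "interrupt", "close"}:
        raise ValueError(f"unknown control message: {kind}")
    return json.dumps({"type": kind})


def classify_frame(frame) -> tuple:
    """Binary frames are audio; text frames are JSON events keyed by type."""
    if isinstance(frame, (bytes, bytearray)):
        return ("audio", bytes(frame))
    event = json.loads(frame)
    return (event["type"], event)


print(open_message("vc_demo", "opus", speed=1.0))
print(classify_frame('{"type":"ready","voice_id":"vc_demo","backend_used":"A"}')[0])
print(CLOSE_CODES.get(4502, "unknown"))
```

A typical exchange under these assumptions sends `open_message`, waits for `ready`, then alternates `text_message` and `control_message("flush")` while routing incoming frames through `classify_frame`.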

## Streaming — WebRTC

> **Status:** signalling and lifecycle ship end-to-end. The audio plane (aiortc `MediaStreamTrack.recv`) raises `NotImplementedError` today — once a peer connection negotiates, no audio drains to the backend. Use the WebSocket streaming endpoints for production until this caveat is removed.

### `POST /stream/speak/webrtc/sessions`

Create a WebRTC session. Body:

| Field | Notes |
|---|---|
| `voice_id` | required |
| `language_hint` | optional 2-letter code |
| `speed` | 0.5–2.0 |
| `output.codec` | `opus` or `pcm` |
| `output.sample_rate` | 8000–48000 |
| `control.transport` | `websocket` or `datachannel` |
| `ice_servers` | optional tenant-supplied ICE config |
| `backend_preference` | same vocabulary as `/speak` |

Returns `session_id`, `ice_servers`, `expires_at`, `control_ws_url`. Permission: `scaispeak:synthesize`.

### `POST /stream/speak/webrtc/sessions/{session_id}/offer`

Apply client SDP offer, return server's SDP answer.

### `POST /stream/speak/webrtc/sessions/{session_id}/ice-candidates`

Trickle ICE candidate from client. Returns `204 No Content`.

### `DELETE /stream/speak/webrtc/sessions/{session_id}`

Tear down the peer + mark session closed.

### `WS /stream/speak/webrtc/sessions/{session_id}/control`

Control plane for an active WebRTC session. It uses the same text/flush/interrupt/close vocabulary as the WebSocket streaming path but carries no binary audio frames (audio rides RTP).

## Voice warming

### `GET /voices/{voice_id}/warm`

Inspect current warm state. Returns `warm_node_ids`, `candidate_node_ids`, `stale_node_ids`. Permission: `scaispeak:voice.read`.

### `POST /voices/{voice_id}/warm`

Fan out `PrepareVoice` to candidate replicas. Body: `{ "node_ids": [...] }` (empty means "all candidates"). Returns an `outcomes` array with per-node `ok`, `cache_key`, `load_ms`, `error`. Permission: `scaispeak:voice.write`.
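Since warming is a fan-out, a caller usually wants a roll-up of the per-node results. The sketch below relies only on the four outcome fields named above; `summarise_outcomes` is a hypothetical helper and the sample values are made up for illustration.

```python
def summarise_outcomes(outcomes: list[dict]) -> dict:
    """Roll per-node warm outcomes up into counts, errors, and worst load time."""
    ok = [o for o in outcomes if o.get("ok")]
    failed = [o for o in outcomes if not o.get("ok")]
    load_times = [o["load_ms"] for o in ok if o.get("load_ms") is not None]
    return {
        "warmed": len(ok),
        "failed": len(failed),
        "errors": [o.get("error") for o in failed],
        "slowest_load_ms": max(load_times, default=None),
    }


# Illustrative outcomes, not real server output.
sample = [
    {"ok": True, "cache_key": "k1", "load_ms": 180, "error": None},
    {"ok": True, "cache_key": "k2", "load_ms": 420, "error": None},
    {"ok": False, "cache_key": None, "load_ms": None, "error": "node unreachable"},
]
print(summarise_outcomes(sample))
```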

### `POST /voices/{voice_id}/evict`

Drop the voice from every currently-warm replica. Always clears the registry. Permission: `scaispeak:voice.write`.

## Tenant policy

### `GET /admin/policy`

Read the caller's tenant policy: `allowed_backends` (subset of `["A","B"]`), `default_backend`. Permission: `scaispeak:synthesize` — readable by any caller who can synthesise so UIs can show "your tenant routes through Backend B".

### `PUT /admin/policy`

Update the tenant policy. Body: `allowed_backends` (string shorthand `"A"`/`"B"`/`"AB"` or list), `default_backend`. Validation rejects `default_backend` not in `allowed_backends`. Permission: `scaispeak:admin`.
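The shorthand and the validation rule can be mirrored client-side so an invalid policy is caught before the `PUT`. This is a sketch: `normalise_policy` is a hypothetical helper, and sending the expanded list form rather than the shorthand is a choice made here, not a requirement.

```python
VALID_BACKENDS = ("A", "B")


def normalise_policy(allowed_backends, default_backend: str) -> dict:
    """Expand the string shorthand and apply the documented default check."""
    # list("AB") yields ["A", "B"]; an actual list passes through unchanged.
    allowed = list(allowed_backends)
    unknown = [b for b in allowed if b not in VALID_BACKENDS]
    if unknown:
        raise ValueError(f"unknown backends: {unknown}")
    if default_backend not in allowed:
        # The server reports this case as SCAISPEAK_TENANT_POLICY_INVALID.
        raise ValueError("default_backend must be one of allowed_backends")
    return {"allowed_backends": sorted(set(allowed)),
            "default_backend": default_backend}


print(normalise_policy("AB", "B"))
```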

## ScaiDrive proxy

### `GET /admin/scaidrive/shares`

Read-only forwarding to ScaiDrive — list shares the caller can see. Used by the synth page destination picker. Requires JWT auth (not `sgk_`). Returns 404 with `SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE` when ScaiDrive isn't configured in the deployment.

### `GET /admin/scaidrive/shares/{share_id}/folders`

Lazy-browse folders inside a share. Query: `folder_id` (omit for the share root). Returns folder children only.

## Admin lifecycle

### `POST /admin/lifecycle/install`

First-time install hook called by the module-host. Idempotent. SuperAdmin-only.

### `POST /admin/lifecycle/upgrade`

Version upgrade hook. Idempotent. SuperAdmin-only.

### `POST /admin/lifecycle/uninstall`

Module uninstall: soft-deletes every non-global voice in the deployment and signals the erasure worker to fan out. Requires `confirmation_token` + `expected_module_id`. SuperAdmin-only.

### `POST /admin/lifecycle/tenant/{tenant_id}/enable`

Per-tenant enable. SuperAdmin-only.

### `POST /admin/lifecycle/tenant/{tenant_id}/disable`

Per-tenant disable — soft-deletes all the tenant's user + tenant scope voices and signals erasure. Global voices untouched. SuperAdmin-only.

## Blocklist + audit

### `POST /admin/blocklist`

Add a blocklist entry. Body: `scope` (`tenant`, `user`, `voice`), `target_id`, `reason`, optional `expires_at`. Permission: `scaispeak:admin`.

### `GET /admin/blocklist`

List active blocklist entries. Query: `scope`, `tenant_id`, `limit`. Permission: `scaispeak:admin`.

### `DELETE /admin/blocklist/{block_id}`

Remove a blocklist entry (manual unblock). Returns `204 No Content`. Permission: `scaispeak:admin`.

### `GET /admin/erasure/audit`

List erasure audit rows. Query: `tenant_id`, `voice_id`, `limit`. Returns most-recent-first. Permission: `scaispeak:admin`.

## Global voices (SuperAdmin)

### `POST /admin/voices/global`

Create a platform-scope (`scope='global'`) voice — no consent, license-based. SuperAdmin-only. Form fields:

| Field | Required | Notes |
|---|---|---|
| `reference` | yes | Multipart reference audio. ScaiDrive references not accepted for globals. |
| `display_name`, `language_primary` | yes | Same shape as user voices. |
| `licensor_name` | yes | Who licensed the voice to ScaiLabs. |
| `license_type` | yes | `perpetual`, `time_bound`, `usage_bound`. |
| `valid_until` | when `time_bound` | ISO-8601 timestamp. |
| `usage_limit_chars` | when `usage_bound` | Integer cap. |
| `licensor_reference` | no | Contract reference. |
| `valid_from` | no | ISO-8601 start. |
| `terms_summary` | no | Operator-facing summary of the terms. |
| `license_document` | no | Optional PDF; stored alongside the voice. |

Returns the new `voice_id`, `license_id`, and intake note.
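The conditional requirements in the table (`valid_until` for `time_bound`, `usage_limit_chars` for `usage_bound`) are easy to miss. The sketch below checks only what the table states; `check_global_voice_form` is a hypothetical helper, and whether the server rejects a bound supplied with a non-matching license type is not covered here.

```python
LICENSE_TYPES = {"perpetual", "time_bound", "usage_bound"}
ALWAYS_REQUIRED = ("reference", "display_name", "language_primary",
                   "licensor_name", "license_type")


def check_global_voice_form(form: dict) -> list[str]:
    """Return a list of problems in a global-voice create form."""
    problems = [f"missing required field: {name}"
                for name in ALWAYS_REQUIRED if not form.get(name)]
    license_type = form.get("license_type")
    if license_type and license_type not in LICENSE_TYPES:
        problems.append(f"license_type must be one of {sorted(LICENSE_TYPES)}")
    if license_type == "time_bound" and not form.get("valid_until"):
        problems.append("time_bound licenses need valid_until")
    if license_type == "usage_bound" and form.get("usage_limit_chars") is None:
        problems.append("usage_bound licenses need usage_limit_chars")
    return problems


# Illustrative form values only.
form = {"reference": b"...", "display_name": "Narrator", "language_primary": "en",
        "licensor_name": "Example Studio", "license_type": "time_bound"}
print(check_global_voice_form(form))
```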

### `DELETE /admin/voices/global/{voice_id}`

Revoke a global voice. SuperAdmin-only. Form field `trigger` (`license_revoked`, `license_expired`, `platform_decision`). Bypasses the owner-equality check that protects user/tenant voices. Updates the license row's status to match the trigger. Runs the full erasure pipeline.

## Errors

All endpoints return ScaiGrid's standard error envelope:

```json
{
  "error": {
    "code": "SCAISPEAK_VOICE_NOT_FOUND",
    "message": "Voice does not exist or isn't visible to the caller",
    "details": { "voice_id": "vc_..." }
  },
  "meta": { "request_id": "req_..." }
}
```
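Client code typically converts this envelope into an exception so that `code` and `request_id` survive for logging. A minimal sketch, assuming the response body has already been parsed into a dict; `ScaiSpeakError` and `unwrap` are hypothetical names.

```python
class ScaiSpeakError(Exception):
    """Carries the fields of the error envelope shown above."""

    def __init__(self, code, message, details=None, request_id=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details or {}
        self.request_id = request_id


def unwrap(envelope: dict):
    """Return the data payload, or raise if the envelope carries an error."""
    if "error" in envelope:
        err = envelope["error"]
        raise ScaiSpeakError(err.get("code"), err.get("message"),
                             err.get("details"),
                             envelope.get("meta", {}).get("request_id"))
    return envelope["data"]


try:
    unwrap({
        "error": {"code": "SCAISPEAK_VOICE_NOT_FOUND",
                  "message": "Voice does not exist or isn't visible to the caller",
                  "details": {"voice_id": "vc_demo"}},
        "meta": {"request_id": "req_demo"},
    })
except ScaiSpeakError as exc:
    print(exc.code, exc.request_id)
```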

ScaiSpeak-specific codes:

| Code | Meaning |
|---|---|
| `SCAISPEAK_VOICE_NOT_FOUND` | Voice id doesn't exist or isn't visible. |
| `SCAISPEAK_VOICE_ACCESS_DENIED` | Caller can't perform this operation on this voice. |
| `SCAISPEAK_VOICE_PREFLIGHT_FAILED` | Reference audio failed quality checks. Body includes `preflight`. |
| `SCAISPEAK_CONSENT_INVALID` | Consent recording missing or doesn't match the scripted text. |
| `SCAISPEAK_AMBIGUOUS_SOURCE` | Both inline upload and ScaiDrive reference supplied for the same file. |
| `SCAISPEAK_VOICE_SHARE_FORBIDDEN` | Only the owner with `voice.share` can promote to tenant scope. |
| `SCAISPEAK_BACKEND_UNAVAILABLE` | No allowed backend currently available. |
| `SCAISPEAK_TENANT_POLICY_INVALID` | Policy update rejected (e.g. default not in allowed set). |
| `SCAISPEAK_JOB_NOT_FOUND` | Job id doesn't exist or doesn't belong to this caller. |
| `SCAISPEAK_VOICE_NOT_READY_FOR_WARMING` | Legacy warming path returns this when the voice doesn't have the cached state the previous-gen engine needed. No-op on the current zero-shot engine; safe to ignore for new code. |
| `SCAISPEAK_SAVE_TO_REQUIRES_JWT` | `save_to` attempted with `sgk_` API key auth. |
| `SCAISPEAK_SAVE_TO_EXCHANGE_FAILED` | ScaiKey token exchange against ScaiDrive failed. |
| `SCAISPEAK_SCAIDRIVE_NOT_AVAILABLE` | ScaiDrive integration not configured. |
| `SCAISPEAK_SCAIDRIVE_FORBIDDEN` | Caller lacks write access on the destination share. |
| `SCAISPEAK_SCAIDRIVE_NOT_FOUND` | Destination share or folder doesn't exist. |
| `SCAISPEAK_SCAIDRIVE_CONFLICT` | File exists at destination and `overwrite` is false. |
| `SCAISPEAK_SCAIDRIVE_QUOTA_EXCEEDED` | Destination share over quota (HTTP 507). |
| `SCAISPEAK_LICENSE_FIELD_INVALID` | License-type / bound mismatch on global voice create. |
| `SCAISPEAK_GLOBAL_VOICE_NOT_FOUND` | Global voice doesn't exist or already deleted. |
| `SCAISPEAK_BLOCKLIST_NOT_FOUND` | Blocklist entry id doesn't exist. |
| `SCAISPEAK_UNINSTALL_TOKEN_MISMATCH` | Uninstall hook called without a matching token. |
