---
summary: User-visible changes to ScaiSpeak.
title: Changelog
path: changelog
status: published
---

User-visible changes only. Internal refactors and infrastructure work omitted.

## v0.x — Phase rollout

ScaiSpeak ships in phases. Each phase adds endpoints and capabilities; the module ID and URL prefix have been stable since Phase 0.

- **Phase 1 — Voice library.** List / get / clone / update / delete voices. Preflight checks on intake. Consent capture. ScaiDrive references for the reference clip and the consent audio. Permissions split into `synthesize`, `voice.read`, `voice.write`, `voice.share`, `admin`.
- **Phase 2 — Batch synth.** `POST /speak` with Backend B (managed TTS relay) wired. Tenant backend policy at `/admin/policy`. Voice preview endpoint.
- **Phase 2B — Self-host backend.** Backend A (ScaiInfer-hosted TTS engine) added behind the same `/speak` path. The tenant backend policy selects the backend per tenant.
- **Phase 3 — Voice warming.** `voice_prefix_tokens` from the previous-generation cloning pipeline. Warm / evict / repromote endpoints. Redis-backed warm registry. *Superseded 2026-05-22 by the zero-shot cloning engine; the endpoints remain for compatibility but are no-ops on the new engine.*
- **Phase 4 — WebSocket streaming.** `WS /stream/speak` with the text/flush/interrupt/close vocabulary. Opus + PCM output codecs.
- **Phase 5 — WebRTC.** Session lifecycle at `/stream/speak/webrtc/sessions/*` plus control WebSocket. Requires `aiortc` + `av` in the deployment.
- **Phase 6 — Async long-form.** `POST /speak` returns `202` + `job_id` for text over the threshold. `GET /speak/jobs/{id}` for polling. Caller can force the path with `force_async`.
- **Phase 7 — GDPR + safety.** Erasure pipeline with audit rows. Blocklist endpoints. Lifecycle hooks (install / upgrade / uninstall / tenant enable / disable) wired into the erasure worker.
- **2026-05-13 — `save_to` ScaiDrive.** `POST /speak` accepts a `save_to` block; both the sync and async paths upload to the caller's ScaiDrive share via token exchange. The synth admin page at `/admin/scaispeak/synthesise` ships with the ScaiDrive folder picker and localStorage presets. Global voices: `POST /admin/voices/global` and `DELETE /admin/voices/global/{id}`, SuperAdmin-only, licensed rather than consent-based.
- **2026-05-22 — Zero-shot cloning engine.** Self-hosted cloning is now zero-shot: the reference clip is consumed directly at synth time, with no separate training step. New voices land at `embedding_status: ready` immediately after intake clears preflight. Three new optional fields on `POST /speak` (`instructions`, `cfg_value`, `warmup_trim_ms`) let callers tune per-call delivery for cloned voices. Output sample rate is now 48 kHz on the self-hosted path, up from 24 kHz. The warm / repromote endpoints stay in place as no-ops for compatibility.
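
As a rough illustration of how the `POST /speak` options from the entries above might combine, here is a minimal Python sketch that builds a request body and works out the polling path when the server answers `202`. The field names `instructions`, `cfg_value`, `warmup_trim_ms`, `force_async`, `save_to`, and `job_id` come from the entries above. Everything else is an assumption made for the example: the `text` and `voice_id` keys, the contents of the `save_to` block, the sample values, and the helper names. The sketch performs no network calls and does not assume a URL prefix.

```python
from typing import Any, Dict, Optional


def build_speak_request(
    text: str,
    voice_id: str,
    *,
    instructions: Optional[str] = None,
    cfg_value: Optional[float] = None,
    warmup_trim_ms: Optional[int] = None,
    force_async: bool = False,
    save_to: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a JSON-serialisable body for POST /speak.

    "text" and "voice_id" are hypothetical key names; the changelog
    does not state the required fields.
    """
    body: Dict[str, Any] = {"text": text, "voice_id": voice_id}
    optional_fields = {
        "instructions": instructions,
        "cfg_value": cfg_value,
        "warmup_trim_ms": warmup_trim_ms,
        "save_to": save_to,  # shape of this block is not documented here
    }
    # Send only the optional fields the caller actually set.
    for key, value in optional_fields.items():
        if value is not None:
            body[key] = value
    if force_async:
        body["force_async"] = True
    return body


def poll_path_for(status_code: int, response_body: Dict[str, Any]) -> Optional[str]:
    """Return the job polling path for a 202 response, else None."""
    if status_code == 202:
        return f"/speak/jobs/{response_body['job_id']}"
    return None


request_body = build_speak_request(
    "A long chapter of narration...",
    "voice-123",                      # hypothetical voice identifier
    cfg_value=2.0,                    # illustrative value only
    force_async=True,
    save_to={"folder": "narration"},  # hypothetical save_to contents
)
```

A caller would send `request_body` with whatever HTTP client the deployment uses, then pass the status code and parsed response to `poll_path_for` to decide whether to poll `GET /speak/jobs/{id}` or read the audio from the synchronous response.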
