Quickstart

In five minutes you'll list available voices, render audio in a global voice, and save the result to disk.

You need:

A ScaiGrid API key with scaispeak:voice.read and scaispeak:synthesize (any tenant admin has both).
An audio player on your machine (afplay, play, ffplay, anything that plays MP3).

bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."

1. List voices

Every tenant sees the platform's global voices automatically. Pick one with embedding_status: ready.

bash
curl "$SCAIGRID_HOST/v1/modules/scaispeak/voices?language=en&embedding_status=ready&limit=5" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"

python
import os
import httpx
# List English voices whose embeddings are ready (at most five).
resp = httpx.get(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/voices",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    params={"language": "en", "embedding_status": "ready", "limit": 5},
)
resp.raise_for_status()  # surface 4xx/5xx errors before parsing the body
for v in resp.json()["data"]["items"]:
    print(v["voice_id"], v["display_name"], v["scope"])

javascript
const res = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaispeak/voices?language=en&embedding_status=ready&limit=5`,
  { headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` } },
);
const { data } = await res.json();
data.items.forEach(v => console.log(v.voice_id, v.display_name, v.scope));

Save one voice_id, for example in a VOICE_ID environment variable; the remaining examples read it from there.
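
If you would rather choose the voice in code than by eye, a small helper along these lines could work. It is a minimal sketch: pick_voice is a hypothetical name, the sample records are invented, and the scope value "global" and status value "pending" are assumptions about how the API spells them. Only the field names come from the listing call above.

```python
from typing import Optional


def pick_voice(items: list[dict], preferred_scope: str = "global") -> Optional[str]:
    """Return the voice_id of the first ready voice, preferring one scope."""
    ready = [v for v in items if v.get("embedding_status") == "ready"]
    for voice in ready:
        if voice.get("scope") == preferred_scope:
            return voice["voice_id"]
    # Fall back to any ready voice when none matches the preferred scope.
    return ready[0]["voice_id"] if ready else None


# Hypothetical sample shaped like the "items" array from the listing call.
sample_items = [
    {"voice_id": "v_pending", "display_name": "Draft", "scope": "tenant", "embedding_status": "pending"},
    {"voice_id": "v_tenant", "display_name": "House", "scope": "tenant", "embedding_status": "ready"},
    {"voice_id": "v_global", "display_name": "Narrator", "scope": "global", "embedding_status": "ready"},
]
print(pick_voice(sample_items))
```

With the sample above the helper skips the voice that is not ready and prefers the global one over the tenant one.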

2. Render a preview

Every voice has a built-in preview endpoint that renders a short sample. It is cheap, capped at 300 characters, and useful for picking a voice from the library.

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/voices/$VOICE_ID/preview" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "text=Hello from ScaiSpeak. This is the preview." \
  -F "response_format=mp3" \
  | python -c "import sys,json,base64;\
b=json.load(sys.stdin)['data']['audio_base64'];\
open('preview.mp3','wb').write(base64.b64decode(b))"

Play preview.mp3. If it sounds wrong, pick a different voice_id and repeat.
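
The decode step inside the curl pipeline can also live in a small Python module, together with a client-side check against the 300-character cap. This is a sketch under assumptions: the helper names are hypothetical, the cap is assumed to count characters the way Python's len() does, and the response is assumed to use the same data.audio_base64 envelope the pipeline above reads.

```python
import base64

PREVIEW_MAX_CHARS = 300  # cap stated in this guide for the preview endpoint


def check_preview_text(text: str) -> str:
    """Raise early if the text is longer than the preview cap."""
    if len(text) > PREVIEW_MAX_CHARS:
        raise ValueError(f"preview text is {len(text)} chars; the cap is {PREVIEW_MAX_CHARS}")
    return text


def decode_audio(envelope: dict) -> bytes:
    """Extract and decode data.audio_base64 from a parsed JSON response."""
    return base64.b64decode(envelope["data"]["audio_base64"])


def save_audio(envelope: dict, path: str) -> int:
    """Write the decoded audio to path and return the number of bytes written."""
    audio = decode_audio(envelope)
    with open(path, "wb") as fh:
        return fh.write(audio)


# Invented envelope standing in for a real preview response.
fake = {"data": {"audio_base64": base64.b64encode(b"ID3").decode("ascii")}}
print(len(decode_audio(fake)), "bytes decoded")
```

Checking the length before the request saves a round trip when the text is too long.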

3. Synthesise full text

POST /speak is the production endpoint. Short text returns audio inline; text longer than the async threshold (500 characters by default) is handed to an async job instead.

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "'$VOICE_ID'",
    "text": "Speech synthesis in ScaiSpeak is metered, routed, and recorded the same way any other inference call is.",
    "response_format": "mp3"
  }'

python
import base64
import os

import httpx

resp = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/speak",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "voice_id": os.environ["VOICE_ID"],
        "text": "Speech synthesis in ScaiSpeak is metered, routed, and recorded.",
        "response_format": "mp3",
    },
)
resp.raise_for_status()  # surface 4xx/5xx errors before parsing the body
data = resp.json()["data"]
# Short text comes back inline as base64; decode it and write the MP3 to disk.
with open("synth.mp3", "wb") as f:
    f.write(base64.b64decode(data["audio_base64"]))
print(data["backend_used"], data["char_count"], "chars")

javascript
// Run as an ES module (.mjs or "type": "module") so top-level await works.
import { writeFileSync } from "node:fs";
const out = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaispeak/speak`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ voice_id: process.env.VOICE_ID, text: "...", response_format: "mp3" }),
});
const { data } = await out.json();
// Decode the inline base64 audio and write it to disk.
writeFileSync("synth.mp3", Buffer.from(data.audio_base64, "base64"));
console.log(data.backend_used, data.char_count);

You should see backend_used: "A" if your tenant has a self-hosted TTS node online, "B" if you're routed to the managed TTS relay.
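
Before sending a request, a client can estimate whether /speak will answer inline or queue a job. The sketch below is illustrative only: expects_async is a hypothetical helper, it assumes the server counts characters the way Python's len() does, and it assumes your tenant still uses the 500-character default mentioned in this step.

```python
DEFAULT_ASYNC_THRESHOLD = 500  # default from this guide; your tenant's value may differ


def expects_async(text: str, threshold: int = DEFAULT_ASYNC_THRESHOLD) -> bool:
    """Guess whether /speak would queue a job instead of answering inline."""
    # Assumption: the server compares a plain character count to the threshold.
    return len(text) > threshold


print(expects_async("Hello from ScaiSpeak."))  # well under the threshold
print(expects_async("x" * 501))                # one character over the default
```

Treat the result as a hint. The response itself is the authority on which path was taken.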

4. Long-form (async path)

Text longer than the threshold (or force_async: true) returns 202 Accepted with a job_id. Poll GET /speak/jobs/{job_id} until status: completed; the response carries audio_base64 inline (small outputs) or an S3 URI (larger ones).
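
The polling loop described above might look like the following. It is a sketch under stated assumptions, not a reference client: wait_for_job and make_fetcher are hypothetical names, the "failed" status is a guess at how errors are reported, the job route is assumed to sit under the same /v1/modules/scaispeak prefix as the other calls, and the body is assumed to use the same "data" envelope.

```python
import json
import os
import time
import urllib.request
from typing import Callable


def wait_for_job(fetch_job: Callable[[], dict], interval_s: float = 2.0,
                 timeout_s: float = 300.0) -> dict:
    """Call fetch_job until the job reports status 'completed' or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":  # assumed failure status; not confirmed by this guide
            raise RuntimeError(f"job failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")
        time.sleep(interval_s)


def make_fetcher(job_id: str) -> Callable[[], dict]:
    """Build a zero-argument function that fetches the job once per call."""
    # Assumed full path for GET /speak/jobs/{job_id}.
    url = f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/speak/jobs/{job_id}"
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"}
    )

    def fetch() -> dict:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["data"]  # assumed "data" envelope

    return fetch
```

A call such as wait_for_job(make_fetcher(job_id)) would then return the completed job. Passing the fetch function in as an argument keeps the loop testable without network access.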

5. Save directly to ScaiDrive

If you're authenticated with a JWT (not an sgk_ API key), the synth output can land straight in a ScaiDrive share with save_to:

bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
  -H "Authorization: Bearer $JWT" \
  -H "Content-Type: application/json" \
  -d '{
    "voice_id": "'$VOICE_ID'",
    "text": "Audio that lands in your share with no second round-trip.",
    "save_to": { "share_id": "shr_xyz", "filename": "chapter-01.mp3" },
    "inline_response": false
  }'

The response carries the new file_id, name, and version_id.
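
A client can refuse to build a save_to request when it only holds an API key, since this step says a JWT is needed. The helper below is a hypothetical sketch: build_save_request is not part of any ScaiSpeak SDK, and the check relies only on the sgk_ prefix shown earlier in this guide.

```python
import json


def build_save_request(token: str, voice_id: str, text: str,
                       share_id: str, filename: str) -> dict:
    """Return the JSON body for a /speak call that saves into a ScaiDrive share."""
    if token.startswith("sgk_"):
        # Per this guide, save_to works with a JWT, not with an sgk_ API key.
        raise ValueError("save_to needs a JWT; this token looks like an sgk_ API key")
    return {
        "voice_id": voice_id,
        "text": text,
        "save_to": {"share_id": share_id, "filename": filename},
        "inline_response": False,
    }


# Placeholder values; "shr_xyz" and the file name mirror the curl example.
body = build_save_request("header.payload.signature", "voice_123",
                          "Audio that lands in your share.", "shr_xyz", "chapter-01.mp3")
print(json.dumps(body, indent=2))
```

The returned dictionary can be passed as the JSON body of the POST shown in the curl example.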

What just happened

/voices returned the voice library visible to you: global voices plus your tenant's and your user's.
/voices/{id}/preview rendered a short clip through the same dispatcher the production endpoint uses.
/speak picked a backend (self-hosted A or relay B) per tenant policy, dispatched the synth, and either returned the audio inline or queued a job.
Every call was metered by ScaiGrid's accounting pipeline against your tenant's budget.

Next

Clone a voice from your own recording — see clone and synthesise.
Wire low-latency streaming for an interactive product — see stream with WebSocket.
Configure your tenant's backend policy — see Architecture.