Quickstart
In five minutes you'll list available voices, render audio in a global voice, and save the result to disk.
You need:
- A ScaiGrid API key with scaispeak:voice.read and scaispeak:synthesize (any tenant admin has both).
- An audio player on your machine (afplay, play, ffplay, anything that plays MP3).
| export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."
|
1. List voices
Every tenant sees the platform's global voices automatically. Pick one with embedding_status: ready.
| curl "$SCAIGRID_HOST/v1/modules/scaispeak/voices?language=en&embedding_status=ready&limit=5" \
-H "Authorization: Bearer $SCAIGRID_API_KEY"
|
| import httpx, os
voices = httpx.get(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/voices",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    params={"language": "en", "embedding_status": "ready", "limit": 5},
).json()["data"]["items"]
for v in voices:
    print(v["voice_id"], v["display_name"], v["scope"])
|
| const res = await fetch(
  `${process.env.SCAIGRID_HOST}/v1/modules/scaispeak/voices?language=en&embedding_status=ready&limit=5`,
  { headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` } },
);
const { data } = await res.json();
data.items.forEach(v => console.log(v.voice_id, v.display_name, v.scope));
|
Save one voice_id, for example with export VOICE_ID=<voice_id>; the calls below read it from the VOICE_ID environment variable.
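Picking the voice can also be done in code. The following is a minimal sketch, assuming each item in data.items carries the embedding_status field used as a filter above; pick_ready_voice is a hypothetical helper rather than part of any ScaiSpeak SDK, and the sample IDs, names, and the "pending" status are made up for illustration.

```python
from typing import Optional


def pick_ready_voice(items: list[dict]) -> Optional[str]:
    """Return the voice_id of the first voice whose embedding is ready.

    Assumption: every item exposes "voice_id" and "embedding_status",
    mirroring the fields used in the list call.
    """
    for voice in items:
        if voice.get("embedding_status") == "ready":
            return voice["voice_id"]
    return None


# Illustrative payload only; these IDs, names, and statuses are invented.
sample_items = [
    {"voice_id": "v_pending", "display_name": "Draft", "embedding_status": "pending"},
    {"voice_id": "v_ready", "display_name": "Narrator", "embedding_status": "ready"},
]
print(pick_ready_voice(sample_items))
```

In practice you would pass the items list returned by the /voices call instead of sample_items.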
2. Render a preview
Every voice has a built-in preview endpoint that renders a short sample. It is cheap, capped at 300 characters, and useful for picking a voice from the library.
| curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/voices/$VOICE_ID/preview" \
-H "Authorization: Bearer $SCAIGRID_API_KEY" \
-F "text=Hello from ScaiSpeak. This is the preview." \
-F "response_format=mp3" \
| python -c "import sys,json,base64;\
b=json.load(sys.stdin)['data']['audio_base64'];\
open('preview.mp3','wb').write(base64.b64decode(b))"
|
Play preview.mp3. If it sounds wrong, pick a different voice_id and repeat.
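The preview call can also be prepared from Python. This is a sketch under stated assumptions: preview_fields and decode_audio are hypothetical helpers, the client-side length check assumes the server counts characters the way Python's len() does, and the multipart fields mirror the curl -F flags above.

```python
import base64

PREVIEW_MAX_CHARS = 300  # cap stated for the preview endpoint


def preview_fields(text: str, response_format: str = "mp3") -> dict:
    """Build multipart form fields equivalent to the curl -F flags."""
    if len(text) > PREVIEW_MAX_CHARS:
        raise ValueError(
            f"preview text is {len(text)} chars; the cap is {PREVIEW_MAX_CHARS}"
        )
    # Passed as files=..., a (None, value) tuple is sent by httpx as a plain
    # multipart field rather than a file upload.
    return {"text": (None, text), "response_format": (None, response_format)}


def decode_audio(payload: dict) -> bytes:
    """Extract and decode data.audio_base64 from the JSON response."""
    return base64.b64decode(payload["data"]["audio_base64"])


# Offline check with a stand-in payload (the bytes are not real MP3 data).
fake = {"data": {"audio_base64": base64.b64encode(b"not-real-audio").decode()}}
print(preview_fields("Hello from ScaiSpeak.")["text"][1], len(decode_audio(fake)))
```

With httpx installed, the fields would go to httpx.post(url, headers=..., files=preview_fields(text)), and decode_audio(resp.json()) would yield the bytes to write to preview.mp3.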
3. Synthesise full text
POST /speak is the production endpoint. Short text returns audio inline; text longer than the async threshold (500 characters by default) is handed to an async job.
| curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
-H "Authorization: Bearer $SCAIGRID_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"voice_id": "'$VOICE_ID'",
"text": "Speech synthesis in ScaiSpeak is metered, routed, and recorded the same way any other inference call is.",
"response_format": "mp3"
}'
|
| import httpx, os, base64
resp = httpx.post(
    f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaispeak/speak",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "voice_id": os.environ["VOICE_ID"],
        "text": "Speech synthesis in ScaiSpeak is metered, routed, and recorded.",
        "response_format": "mp3",
    },
).json()["data"]
audio = base64.b64decode(resp["audio_base64"])
open("synth.mp3", "wb").write(audio)
print(resp["backend_used"], resp["char_count"], "chars")
|
| // Top-level await needs an ES module, where require() is unavailable, so import fs instead.
import { writeFileSync } from "node:fs";
const out = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaispeak/speak`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ voice_id: process.env.VOICE_ID, text: "...", response_format: "mp3" }),
});
const { data } = await out.json();
writeFileSync("synth.mp3", Buffer.from(data.audio_base64, "base64"));
console.log(data.backend_used, data.char_count);
|
You should see backend_used: "A" if your tenant has a self-hosted TTS node online, "B" if you're routed to the managed TTS relay.
Text longer than the threshold (or force_async: true) returns 202 Accepted with a job_id. Poll GET /speak/jobs/{job_id} until status: completed; the response carries audio_base64 inline (small outputs) or an S3 URI (larger ones).
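The polling step could be sketched as follows. This is an illustration, not the platform's client: wait_for_job is a hypothetical helper, fetch_job stands in for whatever performs GET /speak/jobs/{job_id} and returns the job object, and the "queued" and "failed" status names are assumptions, since the text above only names "completed".

```python
import time
from typing import Callable


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
) -> dict:
    """Poll a synth job until it reports status "completed".

    fetch_job is any callable that fetches the job by ID and returns it
    as a dict. Assumption: a failed job reports status "failed".
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed: {job}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


# Stand-in for the HTTP call so the sketch runs offline.
_responses = iter([
    {"status": "queued"},
    {"status": "completed", "audio_base64": "aGk="},
])
result = wait_for_job(lambda _job_id: next(_responses), "job_123", interval_s=0.0)
print(result["status"])
```

In a real client, fetch_job might wrap an httpx.get against the jobs endpoint with the same Authorization header as the other calls; the completed job then carries either audio_base64 or an S3 URI, as described above.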
4. Save directly to ScaiDrive
If you're authenticated with a JWT (not an sgk_ API key), the synth output can land straight in a ScaiDrive share with save_to:
| curl -X POST "$SCAIGRID_HOST/v1/modules/scaispeak/speak" \
-H "Authorization: Bearer $JWT" \
-H "Content-Type: application/json" \
-d '{
"voice_id": "'$VOICE_ID'",
"text": "Audio that lands in your share with no second round-trip.",
"save_to": { "share_id": "shr_xyz", "filename": "chapter-01.mp3" },
"inline_response": false
}'
|
The response carries the new file_id, name, and version_id.
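The same request can be assembled in Python. A minimal sketch, assuming the three fields named above sit directly under "data" in the response (the source does not say where they are nested); build_save_request and file_reference are hypothetical helpers, and the sample response values are made up.

```python
import json


def build_save_request(voice_id: str, text: str, share_id: str, filename: str) -> dict:
    """Build the /speak body that writes the audio to a ScaiDrive share."""
    return {
        "voice_id": voice_id,
        "text": text,
        "save_to": {"share_id": share_id, "filename": filename},
        "inline_response": False,  # same flag as the curl call
    }


def file_reference(payload: dict) -> tuple:
    """Pull file_id, name, and version_id out of the response.

    Assumption: the three fields sit directly under "data".
    """
    data = payload["data"]
    return data["file_id"], data["name"], data["version_id"]


body = build_save_request(
    "voice_123", "Audio that lands in your share.", "shr_xyz", "chapter-01.mp3"
)
print(json.dumps(body, indent=2))

# Made-up response, for illustration only.
sample = {"data": {"file_id": "fil_1", "name": "chapter-01.mp3", "version_id": "ver_1"}}
print(file_reference(sample))
```

The body would be posted with a JWT in the Authorization header, as in the curl call, since save_to is described as unavailable to sgk_ API keys.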
What just happened
- /voices returned the visible voice library: global voices plus your tenant's plus your user's.
- /voices/{id}/preview rendered a short clip through the same dispatcher the production endpoint uses.
- /speak picked a backend (self-hosted A or relay B) per tenant policy, dispatched the synth, and either streamed the audio inline or queued a job.
- Every call was metered by ScaiGrid's accounting pipeline against your tenant's budget.
Next