---
summary: "Transcribe a short audio file with one curl, then open a WebSocket stream\
  \ for live captioning \u2014 five minutes end-to-end."
title: Quickstart
path: quickstart
status: published
---

In five minutes you'll have a transcript of a recorded file and a live captioning session running over a WebSocket.

You need:

- A ScaiGrid API key with `scaiecho:transcribe` (any tenant admin has this).
- A short audio file (`.wav`, `.mp3`, `.flac`, `.ogg`, or `.m4a`) — under 5 MiB to keep this synchronous.
- A WebSocket client if you want to follow step 4 (`websocat` works, as does Python's `websockets` library).

```bash
export SCAIGRID_HOST="https://scaigrid.scailabs.ai"
export SCAIGRID_API_KEY="sgk_..."
```

## 1. Check your tenant policy

Tenant policy decides whether your transcription runs on a self-hosted STT node (Backend A) or a managed STT relay (Backend B). Most tenants allow both backends (`AB`) and default to `B`.

```bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/tenant-policy" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

If you see a 403, you need `scaiecho:admin` to read policy. You can still transcribe — the policy applies automatically.

## 2. Transcribe a file

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "file=@meeting.wav" \
  -F "language_hint=en" \
  -F "backend_preference=any"
```

```python
import httpx, os

with open("meeting.wav", "rb") as f:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/transcribe",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={"file": ("meeting.wav", f, "audio/wav")},
        data={"language_hint": "en", "backend_preference": "any"},
        timeout=120.0,
    )
resp.raise_for_status()  # surface 4xx/5xx errors instead of a confusing KeyError
print(resp.json()["data"]["transcript"])
```

```javascript
import fs from "node:fs";

const form = new FormData();
form.append("file", new Blob([fs.readFileSync("meeting.wav")]), "meeting.wav");
form.append("language_hint", "en");
form.append("backend_preference", "any");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/transcribe`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
if (!res.ok) throw new Error(`Transcription failed: HTTP ${res.status}`);
const { data } = await res.json();
console.log(data.transcript);
```

For audio under the inline threshold (default 5 MiB) you get the transcript back in the same response. Larger files return `202 Accepted` with a `job_id`.
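Client code can tell the two outcomes apart by status code. The following is a minimal sketch rather than the documented response schema: it assumes the `202` body nests `job_id` under `data`, the same way the inline transcript is nested, and the helper names `fits_inline` and `interpret_transcribe_response` are hypothetical.

```python
from typing import Any

INLINE_THRESHOLD_BYTES = 5 * 1024 * 1024  # default inline threshold (5 MiB)


def fits_inline(size_bytes: int, threshold: int = INLINE_THRESHOLD_BYTES) -> bool:
    """Return True when an upload of this size is under the inline threshold."""
    return size_bytes < threshold


def interpret_transcribe_response(
    status_code: int, payload: dict[str, Any]
) -> tuple[str, str]:
    """Map a transcribe response to ("transcript", text) or ("job", job_id)."""
    data = payload.get("data", {})
    if status_code == 200:
        return "transcript", data["transcript"]
    if status_code == 202:
        # Assumption: job_id sits under "data", like the transcript does.
        return "job", data["job_id"]
    raise RuntimeError(f"unexpected status {status_code}")
```

With the `httpx` example above, this would be called as `interpret_transcribe_response(resp.status_code, resp.json())`, and `fits_inline(os.path.getsize("meeting.wav"))` gives a quick pre-upload check.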

## 3. Poll an async job (if you went over the threshold)

```bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/transcribe/jobs/$JOB_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

Status moves through `queued` → `running` → `completed`. The transcript is on the response when status is `completed`. No S3 fetch needed — transcripts are text.
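A polling loop over those statuses might look like the sketch below. It makes several assumptions: `fetch_job` is a hypothetical callable you supply that returns the job object as a dict with `status` and `transcript` keys (unwrap any response envelope yourself), and any status other than the three listed above is treated as a terminal failure.

```python
import time
from typing import Any, Callable


def wait_for_transcript(
    fetch_job: Callable[[], dict[str, Any]],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports completed, then return its transcript."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "completed":
            return job["transcript"]
        if status not in ("queued", "running"):
            # Assumption: anything else (for example a failure state) is terminal.
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")
        sleep(interval)
```

Passing `fetch_job` in as a callable keeps the loop independent of the HTTP client; in practice it would wrap the same `GET` request as the curl command above.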

## 4. Stream live audio over WebSocket

Open a WebSocket to `/v1/modules/scaiecho/stream/transcribe`, send an `open` JSON control frame, push binary audio chunks, and receive `delta` JSON frames.

```python
import asyncio, json, os, websockets

async def main():
    url = (
        os.environ["SCAIGRID_HOST"].replace("https://", "wss://", 1)
        + "/v1/modules/scaiecho/stream/transcribe"
        + f"?token={os.environ['SCAIGRID_API_KEY']}"
    )
    async with websockets.connect(url) as ws:
        await ws.send(json.dumps({
            "type": "open",
            "language_hint": "en",
            "media_type": "audio/wav",
            "chunk_seconds": 5.0,
        }))
        print(await ws.recv())  # {"type": "ready", ...}

        with open("meeting.wav", "rb") as f:
            while chunk := f.read(16000):
                await ws.send(chunk)

        await ws.send(json.dumps({"type": "close"}))
        async for msg in ws:
            print(msg)

asyncio.run(main())
```

```javascript
import WebSocket from "ws";
import fs from "node:fs";

const url = `${process.env.SCAIGRID_HOST.replace("https", "wss")}`
  + `/v1/modules/scaiecho/stream/transcribe?token=${process.env.SCAIGRID_API_KEY}`;
const ws = new WebSocket(url);

ws.on("open", () => {
  // Send the control frame first, then stream the file in 16,000-byte chunks.
  ws.send(JSON.stringify({ type: "open", language_hint: "en", media_type: "audio/wav" }));
  const stream = fs.createReadStream("meeting.wav", { highWaterMark: 16000 });
  stream.on("data", (chunk) => ws.send(chunk));
  stream.on("end", () => ws.send(JSON.stringify({ type: "close" })));
});
ws.on("message", (data) => console.log(data.toString()));
```

```bash
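# websocat: push raw bytes after an open control frame.
# Derive the wss:// base URL from SCAIGRID_HOST (bash parameter expansion).
SCAIGRID_HOST_WS="${SCAIGRID_HOST/#https/wss}"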
{
  echo '{"type":"open","language_hint":"en","media_type":"audio/wav"}'
  cat meeting.wav
  echo '{"type":"close"}'
} | websocat -b "$SCAIGRID_HOST_WS/v1/modules/scaiecho/stream/transcribe?token=$SCAIGRID_API_KEY"
```

Server frames you'll see: `ready` (with the selected backend), repeated `delta` (with `text`, `is_final`, `start`, `end`), and finally `closed`.
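To turn that frame sequence into a single transcript, one option is to keep only the final deltas. This is a sketch under an assumption the text above does not confirm: that deltas with `is_final` set to false are interim hypotheses superseded by a later final delta. The helper name `collect_transcript` is hypothetical.

```python
import json
from typing import Iterable


def collect_transcript(raw_frames: Iterable[str]) -> str:
    """Join the text of final deltas, stopping at the closed frame."""
    finals = []
    for raw in raw_frames:
        frame = json.loads(raw)
        kind = frame.get("type")
        if kind == "closed":
            break
        # Assumption: non-final deltas are interim and can be skipped.
        if kind == "delta" and frame.get("is_final"):
            finals.append(frame["text"].strip())
    return " ".join(finals)
```

In the Python streaming example, you could append each received `msg` to a list instead of printing it, then pass the list to `collect_transcript` once the socket closes.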

## 5. (Optional) enroll a speaker for diarization

Skip this unless you need speaker-attributed transcripts. Enrollment is biometric — it requires consent capture and the `scaiecho:enroll` permission, separate from `scaiecho:transcribe`.

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Alice" \
  -F "language_primary=en" \
  -F "consent_user_full_name=Alice Example" \
  -F "consent_stated_purpose=Meeting transcription diarization" \
  -F "consent_text=I consent to enrollment for diarization." \
  -F "reference=@alice-reference.wav" \
  -F "consent=@alice-consent.wav"
```

See [Enroll a speaker for diarization](./tutorials/enroll-a-speaker) for the full pipeline, including how diarized streaming requests pick up enrolled profiles.

## What just happened

- Step 2 ran through `TranscribeService`, which consulted your tenant policy and picked Backend A or B. Short audio went sync; long audio enqueued a job on the arq worker pool.
- Step 4 opened a `StreamTranscribeService` session. Audio frames went to the dispatcher's chunk-relay path; transcript records came back over the WebSocket as JSON deltas.
- Every call was metered by ScaiGrid's accounting pipeline against your tenant's budget, just like a chat completion.

## Next

- Read [Architecture](./concepts/architecture) to understand the backend split and the dispatcher contract.
- Read [Streaming transports](./concepts/streaming-transports) to choose between WebSocket and WebRTC for live audio.
- See the [API reference](./reference/api) for every endpoint and error code.
