---
summary: Upload reference + consent recordings, watch the enrollment fan out across
  pyannote nodes, request a diarized transcript.
title: Enroll a speaker for diarization
path: tutorials/enroll-a-speaker
status: published
---

Speaker diarization in ScaiEcho works against an enrolled speaker library. You upload a short reference recording, a separate consent recording, and metadata; the platform runs a quality preflight, persists the consent record, then fans the enrollment out to every online pyannote node. Once any node returns success, the profile flips to `ready` and the diarize path can label segments from that speaker.

This tutorial walks through the full enrollment pipeline, then shows how to use the enrolled profile from a streaming transcribe call.

## Permissions

Speaker enrollment is biometric capture — `scaiecho:enroll` is a separate permission from `scaiecho:transcribe`. A tenant admin granting "transcribe access" doesn't implicitly grant enrollment. Diarized transcription requires `scaiecho:diarize` on top of `scaiecho:transcribe`.
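As a rough client-side illustration of how these permissions compose, the sketch below maps each operation to the permission strings named above and reports what a caller is missing. The operation names (`enroll`, `transcribe`, `diarized_transcribe`) and the helper itself are hypothetical; the platform enforces permissions server-side, and this only helps you reason about a grant before calling the API.

```python
# Permission strings come from this tutorial; the operation keys are
# hypothetical labels chosen for this sketch.
REQUIRED_PERMISSIONS = {
    "enroll": {"scaiecho:enroll"},
    "transcribe": {"scaiecho:transcribe"},
    "diarized_transcribe": {"scaiecho:transcribe", "scaiecho:diarize"},
}


def missing_permissions(operation: str, granted: set[str]) -> set[str]:
    """Return the permissions still needed for `operation`, given what is granted."""
    return REQUIRED_PERMISSIONS[operation] - set(granted)


# A key with only transcribe access can neither enroll nor diarize.
granted = {"scaiecho:transcribe"}
for op in REQUIRED_PERMISSIONS:
    print(op, sorted(missing_permissions(op, granted)))
```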

## 1. Capture reference + consent audio

Two separate recordings:

- **Reference** — at least a few seconds of clean speech from the speaker, no background noise. The pyannote embedding model uses this to identify the speaker in later transcripts.
- **Consent** — the speaker reading the consent text aloud. This is the immutable audit record proving they agreed to enrollment. Keep it short and self-contained.

Both can be any common audio format the dispatcher accepts (`wav`, `mp3`, `flac`, `ogg`, `m4a`).
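Before uploading, it can help to run a quick local sanity check on the two files. The sketch below only compares file extensions against the formats listed above and confirms that the reference and consent paths differ; it is not the platform's quality preflight, and the helper names are hypothetical.

```python
from pathlib import Path

# Extensions listed in this tutorial; the dispatcher may accept others.
ACCEPTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def has_accepted_extension(path: str) -> bool:
    """Return True if the file name ends in one of the listed audio extensions."""
    return Path(path).suffix.lower() in ACCEPTED_EXTENSIONS


def check_recordings(reference: str, consent: str) -> list[str]:
    """Collect human-readable problems; an empty list means both paths look fine."""
    problems = []
    if reference == consent:
        problems.append("reference and consent must be two separate recordings")
    for role, path in (("reference", reference), ("consent", consent)):
        if not has_accepted_extension(path):
            problems.append(f"{role} file {path!r} has an unlisted extension")
    return problems


print(check_recordings("alice-reference.wav", "alice-consent.wav"))
```

This check looks only at file names, so a mislabeled file would still pass; the server-side preflight remains the authority on audio quality.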

## 2. Submit the enrollment

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -F "display_name=Alice Example" \
  -F "language_primary=en" \
  -F "description=Customer success lead" \
  -F "consent_user_full_name=Alice Example" \
  -F "consent_stated_purpose=Meeting transcription diarization for the support team" \
  -F "consent_text=I, Alice Example, consent to ScaiEcho enrolling my voice for diarization." \
  -F "reference=@alice-reference.wav" \
  -F "consent=@alice-consent.wav"
```

```python
import os

import httpx

with open("alice-reference.wav", "rb") as ref, open("alice-consent.wav", "rb") as cons:
    resp = httpx.post(
        f"{os.environ['SCAIGRID_HOST']}/v1/modules/scaiecho/speakers",
        headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
        files={
            "reference": ("alice-reference.wav", ref, "audio/wav"),
            "consent": ("alice-consent.wav", cons, "audio/wav"),
        },
        data={
            "display_name": "Alice Example",
            "language_primary": "en",
            "consent_user_full_name": "Alice Example",
            "consent_stated_purpose": "Meeting transcription diarization",
            "consent_text": "I consent to enrollment for diarization.",
        },
        timeout=120.0,
    )
resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(resp.json()["data"])
```

```javascript
import fs from "node:fs";

const form = new FormData();
form.append("reference", new Blob([fs.readFileSync("alice-reference.wav")]), "alice-reference.wav");
form.append("consent", new Blob([fs.readFileSync("alice-consent.wav")]), "alice-consent.wav");
form.append("display_name", "Alice Example");
form.append("language_primary", "en");
form.append("consent_user_full_name", "Alice Example");
form.append("consent_stated_purpose", "Meeting transcription diarization");
form.append("consent_text", "I consent to enrollment for diarization.");

const res = await fetch(`${process.env.SCAIGRID_HOST}/v1/modules/scaiecho/speakers`, {
  method: "POST",
  headers: { "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}` },
  body: form,
});
console.log(await res.json());
```

The response is `201 Created` with the speaker profile plus a `preflight` block describing what the quality check found, an `enrolled_on` array listing the pyannote node ids that accepted the enrollment, and `enrollment_status` (one of `pending`, `ready`, `failed`, `evicted`).

If the response includes a `note` saying no pyannote node is online, the profile is recorded but stays at `pending`. It cannot label segments until at least one pyannote node comes online and accepts the enrollment.
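One way to act on these fields is a small helper that turns the profile into a one-line status. This is a sketch under assumptions: it treats `enrollment_status`, `enrolled_on`, and `note` as top-level keys of the profile object (the Python example above reads the profile from `resp.json()["data"]`), and the function name is hypothetical.

```python
def enrollment_summary(profile: dict) -> str:
    """Describe whether the enrolled speaker can be used for diarization yet."""
    status = profile.get("enrollment_status")
    nodes = profile.get("enrolled_on") or []
    if status == "ready":
        return f"ready on {len(nodes)} node(s): {', '.join(nodes)}"
    if status == "pending":
        # Assumption: `note` sits next to the status fields when present.
        note = profile.get("note")
        return f"pending: {note}" if note else "pending: no node has accepted the enrollment yet"
    if status in ("failed", "evicted"):
        return f"{status}: this profile cannot label segments"
    return f"unexpected enrollment_status: {status!r}"


print(enrollment_summary({"enrollment_status": "ready", "enrolled_on": ["node-a", "node-b"]}))
```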

## 3. Inspect and re-warm enrollment

To see which nodes currently hold the speaker enrolled:

```bash
curl "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID/warm" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

The response contains three sets: `warm_node_ids` (Redis-registered), `candidate_node_ids` (nodes currently running the pyannote engine), and `stale_node_ids` (in the registry but not a candidate, typically a node that left the cluster).

To proactively re-enroll on every candidate (or a specific subset):

```bash
curl -X POST "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID/warm" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"node_ids": []}'
```

An empty `node_ids` list means "all candidates." A list of node ids targets that subset. This is useful when a node was added after the original intake fan-out, or when an earlier enrollment failed and ops wants to retry.
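To target only the nodes that actually need it, you can diff the sets from the `GET` response before posting. The sketch below assumes the status payload exposes `warm_node_ids` and `candidate_node_ids` as lists of strings; the helper names are hypothetical. Because an empty `node_ids` list means "all candidates", the helper returns `None` when nothing is missing, so the caller can skip the request instead of triggering a full re-warm by accident.

```python
from typing import Optional


def nodes_to_warm(status: dict) -> list[str]:
    """Candidate nodes that do not yet hold the speaker, in candidate order."""
    warm = set(status.get("warm_node_ids", []))
    return [n for n in status.get("candidate_node_ids", []) if n not in warm]


def warm_request_body(status: dict) -> Optional[dict]:
    """Build the POST body, or None when every candidate is already warm."""
    missing = nodes_to_warm(status)
    # An empty list would mean "all candidates", so never send one by accident.
    return {"node_ids": missing} if missing else None


status = {
    "warm_node_ids": ["node-a"],
    "candidate_node_ids": ["node-a", "node-b"],
    "stale_node_ids": [],
}
print(warm_request_body(status))
```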

## 4. Request a diarized transcript

Streaming routes accept `diarize: true` on the `open` frame; the dispatcher attaches a `speaker_label` to each `delta` whose segment matches an enrolled profile.

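```python
import asyncio
import json
import os

import websockets

# Assumption: SCAIECHO_STREAM_URL is a placeholder env var holding your
# streaming transcribe WebSocket URL; it is not defined by the platform.
url = os.environ["SCAIECHO_STREAM_URL"]


async def main():
    async with websockets.connect(url) as ws:
        # Request diarization on the opening frame.
        await ws.send(json.dumps({
            "type": "open",
            "language_hint": "en",
            "media_type": "audio/wav",
            "diarize": True,
        }))
        print(await ws.recv())  # {"type": "ready", "backend_used": "A"}
        # ... push audio ...
        async for msg in ws:
            d = json.loads(msg)
            if d.get("type") == "delta":
                # speaker_label may be absent; fall back to "?".
                print(d.get("speaker_label", "?"), d["text"])


asyncio.run(main())
```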

Two things to know about the `speaker_label` field:

- It is **omitted** when no label is available — either diarization wasn't requested, the dispatcher couldn't run it (no pyannote node, Backend B selected), or this segment didn't match an enrolled profile cleanly.
- Segments matching no enrolled profile get a per-session cluster id like `spk_0`, `spk_1`. The mapping isn't stable across sessions.

A diarization request on Backend B is silently ignored, because the managed STT relay does not expose diarization. If your tenant policy pinned the stream to B, the deltas will not carry speaker labels even when the `open` frame sets `diarize: true`.
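The labeling rules above can be handled on the client with a small classifier. This is a sketch under assumptions: it treats any label of the form `spk_<number>` as a per-session cluster id and everything else as an enrolled profile label. The tutorial does not specify what an enrolled label looks like, so the `"Alice"` value below is illustrative only, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Per-session cluster ids look like spk_0, spk_1 according to this tutorial.
CLUSTER_ID = re.compile(r"spk_\d+")


def label_kind(delta: dict) -> str:
    """Classify a delta as 'unlabeled', 'session_cluster', or 'enrolled'."""
    label = delta.get("speaker_label")
    if label is None:
        return "unlabeled"
    if CLUSTER_ID.fullmatch(label):
        return "session_cluster"
    return "enrolled"


def group_by_speaker(messages: list[dict]) -> dict[str, list[str]]:
    """Group delta text by speaker label, using '?' when the label is omitted."""
    grouped = defaultdict(list)
    for msg in messages:
        if msg.get("type") != "delta":
            continue  # skip ready frames and anything else
        grouped[msg.get("speaker_label", "?")].append(msg["text"])
    return dict(grouped)


messages = [
    {"type": "ready", "backend_used": "A"},
    {"type": "delta", "speaker_label": "Alice", "text": "Hello."},
    {"type": "delta", "speaker_label": "spk_0", "text": "Hi there."},
    {"type": "delta", "text": "(crosstalk)"},
]
print(group_by_speaker(messages))
```

Since cluster ids are not stable across sessions, avoid persisting `spk_N` values as speaker identities; keep them scoped to the session that produced them.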

## 5. Delete a speaker

```bash
curl -X DELETE "$SCAIGRID_HOST/v1/modules/scaiecho/speakers/$SPEAKER_ID" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

This is the GDPR Art. 17 fan-out. The orchestrator deletes the reference and consent blobs from S3, writes an immutable `ErasureAudit` row, tombstones the speaker row, and evicts the embedding from every pyannote node that held it. The response includes `audit_id` (the immutable record), `blob_bytes_deleted`, and per-node `error_summary` if any node failed to evict.

After deletion, future diarized streams won't label that speaker; existing transcripts that already attributed segments to the speaker keep the labels they had at transcription time (transcripts are not retroactively edited).
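When automating erasure, it is worth checking the per-node results instead of treating any successful response as complete. The sketch below is illustrative only: it assumes `error_summary` is a mapping from node id to an error message and that it is absent or empty when every node evicted cleanly. The actual shape is not specified here, so adjust the helper (whose name is hypothetical) to match the real payload.

```python
def describe_erasure(result: dict) -> str:
    """Summarize a delete response and flag nodes that failed to evict."""
    # Assumption: error_summary maps node id -> error message.
    failures = result.get("error_summary") or {}
    line = f"audit {result['audit_id']}: {result['blob_bytes_deleted']} blob bytes deleted"
    if failures:
        line += f"; eviction failed on {', '.join(sorted(failures))}"
    return line


print(describe_erasure({
    "audit_id": "example-audit-id",
    "blob_bytes_deleted": 2048,
    "error_summary": {"node-b": "timeout"},
}))
```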
