---
summary: How to wire a wake-word detector on the client and emit the right WebSocket
  frames so ScaiVoice can gate sessions on "hey assistant".
title: Client-side wake word
path: tutorials/client-wake-word
status: published
---

ScaiVoice's wake-word gating lets you build always-on personal-assistant flows: the bot is dormant by default and only "wakes" for the next utterance after the user says a trigger phrase. The wake detector runs on the client; the server just gates input frames behind a `wake_armed` flag.

## When to use wake-word gating

Turn it on when:

- The mic is always-open in your UI (kitchen assistant, hands-free car app).
- You want to avoid processing background conversation as user input.
- You need a clear "the bot is now listening to you" signal for UX.

Leave it off for click-to-talk UIs — there the user's button press is already the activation gesture.

## What ScaiVoice expects

Two pieces:

1. **At session create**, set `wake_word_enabled: true`:
   ```json
   POST /v1/modules/scaivoice/sessions
   {"voice_id": "vc_…", "llm_model": "…", "wake_word_enabled": true}
   ```

2. **At runtime**, emit a frame when your client-side detector fires:
   ```json
   {"type": "wake", "confidence": 0.93}
   ```

Behaviour:

| Client emits | Session state | Server does |
|---|---|---|
| `wake` | armed=false (initial / post-turn) | Sets `wake_armed=true`, emits `{"type":"wake_state","armed":true}` |
| `wake` | armed=true | No-op (idempotent — fine to re-emit). |
| `text` or utterance | armed=false | Drops the input, emits `{"type":"info","code":"SCAIVOICE_WAKE_REQUIRED"}` |
| `text` or utterance | armed=true | Processes normally. After the turn completes, the server disarms the session (sets armed=false). |

The server emits `{"type":"wake_state","armed":<bool>,"wake_word_enabled":true}` on:

- Initial connect (so the client knows the session starts with armed=false).
- Every wake-state transition (after a `wake` frame arms the session, and after each turn disarms it).

Render UI off these events — typically a "Say 'hey assistant'" prompt when not armed, and a "Listening…" indicator when armed.
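
The behaviour table above can be read as a small state machine. The sketch below models it as a pure function so you can reason about (or unit-test) client logic against it. It is an illustration of the documented behaviour, not ScaiVoice's server code: `applyFrame` and the `turn_processed` marker are names invented for this example, and any frame that is not `wake` stands in for "`text` or utterance".

```javascript
// Illustrative model of the gating table; NOT the actual server implementation.
// `armed` is the current wake_armed flag, `frame` is what the client sent.
function applyFrame(armed, frame) {
  const wakeState = (value) => ({
    type: "wake_state",
    armed: value,
    wake_word_enabled: true,
  });

  if (frame.type === "wake") {
    // Already armed: idempotent no-op, nothing is emitted.
    if (armed) return { armed: true, emitted: [] };
    return { armed: true, emitted: [wakeState(true)] };
  }

  // Any other frame is treated as user input (text or an utterance).
  if (!armed) {
    return {
      armed: false,
      emitted: [{ type: "info", code: "SCAIVOICE_WAKE_REQUIRED" }],
    };
  }

  // Armed: the turn runs, then armed is reset to false.
  // "turn_processed" is a hypothetical placeholder for the real turn output.
  return {
    armed: false,
    emitted: [{ type: "turn_processed" }, wakeState(false)],
  };
}
```

Feeding it `wake` followed by a `text` frame walks through one full cycle: the first call returns `armed: true`, the second processes the turn and returns `armed: false`.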

## Recommended browser library

[**openwakeword**](https://github.com/dscripka/openwakeword) is the recommended default: Apache-2.0 licensed, with pre-trained models for common phrases (for example `hey jarvis` and `alexa`) and support for custom training. It runs in the browser via ONNX/WASM, and the small models (~5 MB) load fast.

For production-grade accuracy or proprietary wake phrases, [Picovoice Porcupine](https://picovoice.ai/platform/porcupine/) is a commercial alternative with better detection rates at the cost of a per-device license. The integration pattern is identical — both libraries expose a callback-on-detection API.
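
Because both options boil down to "call me when the phrase is detected", the ScaiVoice-facing part can be kept independent of the detector. The following is a minimal sketch of such a handler. It does not use either library's real API: `makeWakeHandler`, `send`, `isArmed`, and `minConfidence` are hypothetical names, and how you obtain a confidence value depends on the detector you choose.

```javascript
// Hypothetical, detector-agnostic helper. Call the returned function from
// whatever detection callback your wake-word library provides.
function makeWakeHandler({ send, isArmed, minConfidence = 0.5 }) {
  return function onDetection(confidence) {
    // The server would no-op while armed, so skip the frame entirely.
    if (isArmed()) return false;
    // Ignore weak detections below the (assumed) client-side threshold.
    if (confidence < minConfidence) return false;
    // Frame shape as documented: {"type": "wake", "confidence": <number>}.
    send(JSON.stringify({ type: "wake", confidence }));
    return true;
  };
}
```

In a browser you would pass something like `send: (data) => ws.send(data)` and an `isArmed` function that reads the flag you maintain from `wake_state` events.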

## Reference integration

```html
<script type="module">
import { OpenWakeWord } from "@openwakeword/web";  // hypothetical wrapper

// WS_URL and TOKEN are placeholders; your app must define them before this line.
const ws = new WebSocket(`wss://scaigrid.scailabs.ai${WS_URL}?token=${TOKEN}`);
ws.binaryType = "arraybuffer";

// Track local state so we don't re-emit wake while already armed.
let armedLocal = false;

ws.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return;
  const msg = JSON.parse(event.data);
  if (msg.type === "wake_state") {
    armedLocal = !!msg.armed;
    renderArmedIndicator(armedLocal);
  }
});

const wakeDetector = await OpenWakeWord.load({
  model: "hey_jarvis",  // or a custom-trained ONNX bundle
  threshold: 0.5,
});

wakeDetector.on("trigger", (event) => {
  // Re-emitting is harmless (the server no-ops when already armed),
  // but checking armedLocal avoids sending a redundant frame.
  if (ws.readyState === WebSocket.OPEN && !armedLocal) {
    ws.send(JSON.stringify({
      type: "wake",
      confidence: event.confidence,
    }));
  }
});

wakeDetector.start();
</script>
```
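
The reference integration only reacts to `wake_state`. A client may also want to react to the `SCAIVOICE_WAKE_REQUIRED` info frame, for example by nudging the user to say the wake phrase. The sketch below shows one way to route both, with defensive parsing. `handleServerMessage` and the `ui` object with its `setArmed` and `promptForWake` methods are hypothetical names for this example.

```javascript
// Hypothetical message router; returns a label describing what it did.
function handleServerMessage(raw, ui) {
  // Binary frames (e.g. audio) are handled elsewhere.
  if (typeof raw !== "string") return "binary";

  let msg;
  try {
    msg = JSON.parse(raw);
  } catch {
    return "ignored"; // not valid JSON
  }
  if (msg === null || typeof msg !== "object") return "ignored";

  if (msg.type === "wake_state") {
    ui.setArmed(Boolean(msg.armed));
    return "wake_state";
  }
  if (msg.type === "info" && msg.code === "SCAIVOICE_WAKE_REQUIRED") {
    // Input was dropped because the session was not armed.
    ui.promptForWake();
    return "wake_required";
  }
  return "ignored";
}
```

You would call it from the WebSocket `message` listener as `handleServerMessage(event.data, ui)`.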

## Combining with VAD

Wake word and VAD work together naturally:

- **Wake word** fires once → bot becomes armed.
- **VAD** then drives the actual mic frames + barge-in inside the now-armed turn.

Order of events for a typical "hey assistant, what's the weather?" interaction:

1. User: "hey assistant" → wake-word detector fires → emit `{"type":"wake"}`.
2. Server: `{"type":"wake_state","armed":true}`.
3. User: pause briefly, then "what's the weather?" → VAD `speaking:true` → mic frames flow → STT segments → end-of-utterance.
4. Server runs the turn → emits `agent_text` and audio frames.
5. Turn done → server disarms (`{"type":"wake_state","armed":false}`).
6. Back to waiting for the next wake.

If the user interrupts mid-reply ("never mind"), the VAD `speaking:true` event triggers cancellation as documented in [Client-side VAD integration](./client-vad-integration).
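
One possible client-side policy that follows from this sequence is to forward mic frames only while the session is armed and the VAD reports speech. The sketch below is an assumption about how you might structure that, not a required pattern: `createMicGate`, `sendAudio`, and the boolean passed to `onVad` are hypothetical, and the real VAD callback shape is covered in the VAD guide.

```javascript
// Hypothetical gate combining the server's wake_state with local VAD state.
function createMicGate(sendAudio) {
  let armed = false;    // mirrors the latest wake_state event
  let speaking = false; // mirrors the latest VAD result

  return {
    onWakeState(msg) {
      armed = Boolean(msg.armed);
    },
    onVad(isSpeaking) {
      speaking = Boolean(isSpeaking);
    },
    // Returns true if the frame was forwarded, false if it was withheld.
    onMicFrame(frame) {
      if (!armed || !speaking) return false;
      sendAudio(frame);
      return true;
    },
  };
}
```

Withholding frames while unarmed saves bandwidth; the server would drop that input anyway and answer with `SCAIVOICE_WAKE_REQUIRED`.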

## Without wake word

Skip everything above and the session is always listening: every utterance is processed immediately. This is the default, and the simplest setup for click-to-talk flows.
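
On the wire, the two modes differ only in the session-create body. The sketch below builds that request for either mode. The path and field names come from this page; `buildSessionRequest`, its parameter names, and the `Content-Type` header are assumptions for illustration, and authentication is deliberately left out.

```javascript
// Hypothetical helper: builds the session-create request for either mode.
function buildSessionRequest({ voiceId, llmModel, wakeWord = false }) {
  const body = { voice_id: voiceId, llm_model: llmModel };
  // Omitting the flag keeps the default, ungated behaviour.
  if (wakeWord) body.wake_word_enabled = true;

  return {
    method: "POST",
    path: "/v1/modules/scaivoice/sessions",
    headers: { "Content-Type": "application/json" }, // assumed header
    body: JSON.stringify(body),
  };
}
```

Pass the result to your HTTP client of choice, prefixing `path` with your API base URL and adding whatever credentials your deployment requires.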

## What about server-side wake-word?

ScaiVoice doesn't run wake-word detection server-side. Sending the full mic stream to the server just to detect "hey assistant" would be wasteful (continuous bandwidth + STT cycles) and would add round-trip latency to the most latency-critical signal. Client-side is the right tier; we're unlikely to change this.
