---
summary: How to wire a wake-word detector on the client and emit the right WebSocket frames so ScaiVoice can gate sessions on "hey assistant".
title: Client-side wake word
path: tutorials/client-wake-word
status: published
---

ScaiVoice's wake-word gating lets you build always-on personal-assistant flows: the bot is dormant by default and only "wakes" for the next utterance after the user says a trigger phrase. The wake detector runs on the client; the server only gates input frames behind a `wake_armed` flag.

## When to use wake-word gating

Turn it on when:

- The mic is always open in your UI (kitchen assistant, hands-free car app).
- You want to keep background conversation from being processed as user input.
- You need a clear "the bot is now listening to you" signal for UX.

Leave it off for click-to-talk UIs: there, the user's button press is already the activation gesture.

## What ScaiVoice expects

Two pieces:

1. **At session create**, set `wake_word_enabled: true`:

   ```http
   POST /v1/modules/scaivoice/sessions

   {"voice_id": "vc_…", "llm_model": "…", "wake_word_enabled": true}
   ```

2. **At runtime**, emit a frame whenever your client-side detector fires:

   ```json
   {"type": "wake", "confidence": 0.93}
   ```

Behaviour:

| Client emits | Session state | Server does |
|---|---|---|
| `wake` | armed=false (initial / post-turn) | Sets `wake_armed=true`, emits `{"type":"wake_state","armed":true}` |
| `wake` | armed=true | No-op (idempotent; safe to re-emit). |
| `text` or utterance | armed=false | Drops the input, emits `{"type":"info","code":"SCAIVOICE_WAKE_REQUIRED"}` |
| `text` or utterance | armed=true | Processes normally. After the turn completes, the server resets the gate (sets armed=false). |

The server emits `{"type":"wake_state","armed":<bool>,"wake_word_enabled":true}`:

- On initial connect (so the client knows it starts with armed=false).
- On every wake-state transition (after the wake frame and after each turn).
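Putting both pieces and the `wake_state` stream together, here is a minimal client-side sketch in TypeScript. It assumes your wake-word library reports a numeric score on detection and that you already hold an open session WebSocket. `createSessionBody`, `WakeGateClient`, `FrameSink`, and the 0.5 score cutoff are hypothetical names and values for illustration, not ScaiVoice APIs; only the frame shapes come from this guide.

```typescript
// Sketch only: frame shapes are taken from this guide; every other name
// (WakeGateClient, FrameSink, createSessionBody, the 0.5 cutoff) is hypothetical.

/** Anything with a WebSocket-style send(); a browser WebSocket qualifies. */
interface FrameSink {
  send(data: string): void;
}

/** The subset of server frames this sketch cares about. */
interface ServerFrame {
  type: string;
  armed?: boolean;
  wake_word_enabled?: boolean;
  code?: string;
}

type UiPrompt = "say-wake-phrase" | "listening";

/** JSON body for POST /v1/modules/scaivoice/sessions with gating turned on. */
function createSessionBody(voiceId: string, llmModel: string) {
  return { voice_id: voiceId, llm_model: llmModel, wake_word_enabled: true };
}

class WakeGateClient {
  private sink: FrameSink;
  private minConfidence: number;
  // Local mirror of the server's flag; the server stays the source of truth.
  private armed = false;

  constructor(sink: FrameSink, minConfidence = 0.5) {
    this.sink = sink;
    this.minConfidence = minConfidence; // hypothetical client-side cutoff
  }

  /** Call from your wake-word library's detection callback. */
  onDetection(confidence: number): boolean {
    if (confidence < this.minConfidence) return false;
    // Re-sending while already armed is a server-side no-op, so no dedupe here.
    this.sink.send(JSON.stringify({ type: "wake", confidence }));
    return true;
  }

  /** Feed every parsed server frame through here. */
  onServerFrame(frame: ServerFrame): void {
    if (frame.type === "wake_state" && typeof frame.armed === "boolean") {
      this.armed = frame.armed;
    } else if (frame.type === "info" && frame.code === "SCAIVOICE_WAKE_REQUIRED") {
      // Input was dropped because the session was not armed.
      this.armed = false;
    }
  }

  /** Which prompt the UI should show right now. */
  get prompt(): UiPrompt {
    return this.armed ? "listening" : "say-wake-phrase";
  }
}
```

In a browser, a `WebSocket` already satisfies `FrameSink`, so you can pass it straight to the constructor and call `onServerFrame(JSON.parse(event.data))` from its `message` handler. The sketch never flips `armed` on its own after sending a `wake`; it waits for the server's `wake_state` frame, which keeps the UI from drifting if a frame is lost.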
Drive your UI from the `wake_state` events: typically a "Say 'hey assistant'" prompt when not armed, and a "Listening…" indicator when armed.

## Recommended browser library

[**openwakeword**](https://github.com/dscripka/openwakeword) is the recommended default: it is Apache-2.0, ships pre-trained models for common phrases (`hey jarvis`, `alexa`, `hey google`; custom training is supported), and runs in the browser via ONNX/WASM. Its small models (~5 MB) load fast.

For production-grade accuracy or proprietary wake phrases, [Picovoice Porcupine](https://picovoice.ai/platform/porcupine/) is a commercial alternative with better detection rates, at the cost of a per-device license. The integration pattern is the same either way: both libraries expose a callback-on-detection API, and that callback is where you emit the `wake` frame.

## Reference integration

```html
```

## Combining with VAD

Wake word and VAD complement each other:

- **Wake word** fires once → the bot becomes armed.
- **VAD** then drives the actual mic frames and barge-in within the now-armed turn.

Order of events for a typical "hey assistant, what's the weather?" interaction:

1. User says "hey assistant" → wake-word detector fires → client emits `{"type":"wake"}`.
2. Server replies `{"type":"wake_state","armed":true}`.
3. User pauses briefly, then says "what's the weather?" → VAD `speaking:true` → mic frames flow → STT segments → end-of-utterance.
4. Server runs the turn → emits `agent_text` and audio frames.
5. Turn done → server resets the gate (`{"type":"wake_state","armed":false}`).
6. Client is back to waiting for the next wake.

If the user interrupts mid-reply ("never mind"), the VAD `speaking:true` event triggers cancellation as documented in [Client-side VAD integration](./client-vad-integration).

## Without wake word

Leave `wake_word_enabled` off and skip everything above: the session is always listening, and every utterance is processed immediately. This is the default, and the simplest UX for push-to-talk style flows.

## What about server-side wake-word?

ScaiVoice doesn't run wake-word detection server-side.
Sending the full mic stream to the server just to detect "hey assistant" would be wasteful (continuous bandwidth + STT cycles) and would add round-trip latency to the most latency-critical signal. Client-side is the right tier; we're unlikely to change this.
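To put a rough number on "wasteful", the sketch below computes the upload cost of streaming raw mic audio continuously. The audio format (16 kHz, 16-bit, mono PCM) is an assumption for illustration, not something ScaiVoice specifies, and `pcmBitsPerSecond` / `uploadBytes` are hypothetical helpers. A compressed codec would shrink the figure considerably, but the STT compute and the extra round trip would remain either way.

```typescript
// Back-of-envelope cost of an always-open mic streamed to the server.
// ASSUMPTION: raw 16 kHz, 16-bit, mono PCM; this guide does not fix a format.

interface PcmFormat {
  sampleRateHz: number;
  bitsPerSample: number;
  channels: number;
}

/** Raw PCM bitrate in bits per second. */
function pcmBitsPerSecond(f: PcmFormat): number {
  return f.sampleRateHz * f.bitsPerSample * f.channels;
}

/** Bytes one client uploads over the given number of always-on hours. */
function uploadBytes(f: PcmFormat, hours: number): number {
  return (pcmBitsPerSecond(f) / 8) * 3600 * hours;
}

const assumed: PcmFormat = { sampleRateHz: 16_000, bitsPerSample: 16, channels: 1 };

console.log(pcmBitsPerSecond(assumed)); // 256000 (256 kbit/s)
console.log(uploadBytes(assumed, 1) / 1e6); // 115.2 (MB per hour)
```

Under that assumption, one always-open client uploads about 115 MB per hour, or roughly 2.8 GB over a 24-hour day, before the server has done any useful work.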