Tune AI response mode

json
{
  "title": "Tune AI response mode",
  "audience": "power_user",
  "summary": "Streaming vs. conversational — and the per-(room×AI) override.",
  "sort_order": 10
}

The two modes:

Streaming: tokens appear in one growing message as the AI types. Good for short replies, code blocks, anything you read top-to-bottom.
Conversational: each paragraph is sent as its own message as soon as a paragraph boundary is detected. Good for long-form replies; the AI feels like someone typing follow-ups rather than dumping a wall of text.

The default mode for new rooms is conversational. You can override it globally, per room, or per AI within a room.

Where to change it

Per (room × AI)

Chat header → mode toggle (one click in single-AI rooms; a per-AI dropdown in multi-AI rooms).

Via slash command:

bash
/mode chat            # → conversational
/mode continuous      # → streaming
/mode <ai-name> chat  # multi-AI per-AI form
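The keyword mapping in these commands can be sketched as a small parser. This is an illustration only, not the product's implementation: `parse_mode_command` and `ModeCommand` are hypothetical names, and the sketch assumes an AI name is a single token with no spaces.

```python
from typing import NamedTuple, Optional

# Keyword-to-mode mapping taken from the slash-command examples above.
KEYWORD_TO_MODE = {"chat": "conversational", "continuous": "streaming"}


class ModeCommand(NamedTuple):
    mode: str               # "conversational" or "streaming"
    ai_name: Optional[str]  # None when the command names no AI


def parse_mode_command(text: str) -> ModeCommand:
    """Parse '/mode <keyword>' or '/mode <ai-name> <keyword>' (hypothetical helper)."""
    parts = text.split()
    if len(parts) not in (2, 3) or parts[0] != "/mode":
        raise ValueError(f"not a /mode command: {text!r}")
    keyword = parts[-1]
    if keyword not in KEYWORD_TO_MODE:
        raise ValueError(f"unknown mode keyword: {keyword!r}")
    # ASSUMPTION: the AI name, when present, is exactly one token.
    ai_name = parts[1] if len(parts) == 3 else None
    return ModeCommand(KEYWORD_TO_MODE[keyword], ai_name)
```

For example, `parse_mode_command("/mode helper continuous")` yields streaming mode for an AI named `helper`, where `helper` is a made-up name.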

Via the API:

bash
PUT /v1/rooms/{room_id}/ai/response-mode
{ "mode": "streaming", "participant_id": "<ai-id-or-null>" }

Setting participant_id to null flips every AI in the room; passing an AI's ID changes only that AI.
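A minimal sketch of assembling that request in Python, without sending it. The path and body fields come from the example above; the helper name `build_response_mode_request`, the returned dict shape, and the assumption that the API accepts the string "conversational" for the other mode are illustrative, since only "streaming" is shown.

```python
import json
from typing import Optional

# ASSUMPTION: "conversational" is the API value for the second mode;
# the documented example only shows "streaming".
VALID_MODES = {"streaming", "conversational"}


def build_response_mode_request(room_id: str, mode: str,
                                participant_id: Optional[str] = None) -> dict:
    """Describe the PUT request as plain data (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "method": "PUT",
        "path": f"/v1/rooms/{room_id}/ai/response-mode",
        # participant_id=None serializes to JSON null, i.e. every AI in the room.
        "body": json.dumps({"mode": mode, "participant_id": participant_id}),
    }
```

The returned dict can be handed to whichever HTTP client you use; authentication and the base URL are left out because the text does not specify them.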

Global user default

Settings → AI → Default response mode. Affects every new room you join. Existing rooms keep their per-room setting.

When room-wide flips persist to your profile default

A room-wide mode flip (no participant_id) updates your profile default. Per-AI flips do not — tweaking one AI in one room shouldn't change your global preference. This is deliberate.
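The persistence rule can be written down as a tiny function. This is a sketch of the rule as described, not the server's code; `profile_default_after_flip` is a hypothetical name.

```python
from typing import Optional


def profile_default_after_flip(profile_default: str, new_mode: str,
                               participant_id: Optional[str]) -> str:
    """Return the profile default that results from a mode flip in a room."""
    if participant_id is None:
        # Room-wide flip: the new mode also becomes the profile default.
        return new_mode
    # Per-AI flip: the profile default is left untouched.
    return profile_default
```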

Heuristic for picking

Pick streaming when:

Replies are typically short (< 200 words).
You want to read top-to-bottom as it generates.
The reply is mostly code — you want to see the code form line by line.

Pick conversational when:

Replies are long and structured.
You want to interrupt mid-flow — easier to react to message 2 of 5 than to a single growing wall.
The AI is doing iterative reasoning ("first I'll …, then I'll …").
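As a rough illustration, the heuristic could be encoded like this. The function name `suggest_mode`, its parameters, and the order in which conflicting signals are resolved are assumptions made for the sketch; only the 200-word threshold and the individual criteria come from the lists above.

```python
def suggest_mode(expected_words: int, mostly_code: bool = False,
                 iterative_reasoning: bool = False) -> str:
    """Suggest a response mode from the criteria above (hypothetical helper)."""
    # ASSUMPTION: code-heavy replies win over every other signal.
    if mostly_code:
        return "streaming"
    # ASSUMPTION: iterative reasoning wins over reply length.
    if iterative_reasoning:
        return "conversational"
    # Short replies (under 200 words) read well as one growing message.
    return "streaming" if expected_words < 200 else "conversational"
```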

How "paragraph boundary" detection works

In conversational mode, the streamed response is buffered. When the text crosses a paragraph boundary (a blank line, the end of a fenced code block, or the end of a list item followed by a blank line), the buffer flushes as a new message. A maximum-length cap also forces a flush, so a single very long paragraph still gets sent.

If the model stops mid-paragraph, whatever remains in the buffer is flushed anyway.
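The buffering behavior can be sketched as a small class. This is a simplified illustration, not the actual detector: the `ParagraphBuffer` name, its methods, and the 2000-character default cap are assumptions, and the list-item case is covered only through the general blank-line rule.

```python
from typing import List

FENCE = "`" * 3  # marker that opens or closes a fenced code block


class ParagraphBuffer:
    """Buffer streamed text and emit one message per paragraph (sketch)."""

    def __init__(self, max_chars: int = 2000) -> None:
        self.max_chars = max_chars    # ASSUMPTION: the real cap is not documented
        self._lines: List[str] = []   # complete lines waiting to be flushed
        self._partial = ""            # text received after the last newline
        self._in_fence = False

    def feed(self, chunk: str) -> List[str]:
        """Add streamed text; return the messages that are ready to send."""
        out: List[str] = []
        self._partial += chunk
        while "\n" in self._partial:
            line, self._partial = self._partial.split("\n", 1)
            if line.strip().startswith(FENCE):
                self._lines.append(line)
                self._in_fence = not self._in_fence
                if not self._in_fence:        # a code block just ended
                    self._flush(out)
            elif not self._in_fence and line.strip() == "":
                self._flush(out)              # blank line outside a code block
            else:
                self._lines.append(line)      # includes blank lines inside fences
        buffered = sum(len(l) + 1 for l in self._lines) + len(self._partial)
        if buffered >= self.max_chars:        # length cap for very long paragraphs
            self._lines.append(self._partial)
            self._partial = ""
            self._flush(out)
        return out

    def finish(self) -> List[str]:
        """Flush whatever is left when the model stops."""
        out: List[str] = []
        if self._partial:
            self._lines.append(self._partial)
            self._partial = ""
        self._flush(out)
        return out

    def _flush(self, out: List[str]) -> None:
        text = "\n".join(self._lines).strip("\n")
        self._lines = []
        if text.strip():
            out.append(text)
```

Call `feed` for each streamed chunk and send every returned string as its own message, then call `finish` once the stream ends.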

Sampling parameters

Mode is one knob; the others (temperature, top_p, max_tokens, frequency_penalty, presence_penalty, reasoning_effort) are per-room too. Set them under Engagement → Sampling or via the API:

bash
PUT /v1/rooms/{room_id}/ai/temperature
{ "value": 0.7, "participant_id": null }
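A hedged sketch of building such a request in Python. Only the temperature endpoint appears in the text; treating the other parameters as following the same path pattern is an assumption, as are the helper name `build_sampling_request` and the returned dict shape. Check the API reference before relying on any path other than temperature.

```python
import json
from typing import Optional, Union

# Parameter names as listed in the text.
SAMPLING_PARAMS = {
    "temperature", "top_p", "max_tokens",
    "frequency_penalty", "presence_penalty", "reasoning_effort",
}


def build_sampling_request(room_id: str, param: str,
                           value: Union[int, float, str],
                           participant_id: Optional[str] = None) -> dict:
    """Describe a sampling-parameter PUT as plain data (hypothetical helper)."""
    if param not in SAMPLING_PARAMS:
        raise ValueError(f"unknown sampling parameter: {param!r}")
    return {
        "method": "PUT",
        # ASSUMPTION: every parameter uses /ai/<name>; only /ai/temperature
        # is confirmed by the documented example.
        "path": f"/v1/rooms/{room_id}/ai/{param}",
        "body": json.dumps({"value": value, "participant_id": participant_id}),
    }
```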

The prompt studio is the easiest way to dial these in: it lets you preview changes before applying them.