---
title: Chat Completions
path: api-guides/chat-completions
status: published
---

# Chat Completions

The chat completions API is ScaiGrid's flagship inference endpoint. Send a conversation, get an assistant response back. It supports streaming, tool calls, multimodal input (text, images, and audio), and the sampling controls listed under [Request parameters](#request-parameters).

**Endpoint:** `POST /v1/inference/chat`

## Basic request

```bash
curl -X POST https://scaigrid.scailabs.ai/v1/inference/chat \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "scailabs/poolnoodle-omni",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is the capital of Switzerland?"}
    ],
    "max_tokens": 50
  }'
```

```python
import httpx, os

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "What is the capital of Switzerland?"},
        ],
        "max_tokens": 50,
    },
)
data = resp.json()["data"]
print(data["choices"][0]["message"]["content"])
```

```typescript
const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: "What is the capital of Switzerland?" },
    ],
    max_tokens: 50,
  }),
});
const { data } = await resp.json();
console.log(data.choices[0].message.content);
```

## Request parameters

| Field | Type | Notes |
|-------|------|-------|
| `model` | string (required) | Frontend model slug |
| `messages` | array (required) | Conversation history |
| `max_tokens` | integer | Max output tokens. Capped by the model's `max_output_tokens` if set |
| `temperature` | float | 0.0 (deterministic) to 2.0 (creative) |
| `top_p` | float | Nucleus sampling, 0.0–1.0 |
| `stop` | string or array | Stop sequences |
| `seed` | integer | For reproducibility (provider-dependent) |
| `stream` | boolean | See [Streaming](#streaming) below |
| `tools` | array | Tool definitions; see [Tool calls](#tool-calls) |
| `tool_choice` | string or object | `auto`, `none`, `required`, or `{"type": "function", "function": {"name": "..."}}` |
| `metadata` | object | Passed through to backends that support it |
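
As a rough sketch of how these fields fit together, the helper below assembles a request body and checks the documented `temperature` and `top_p` ranges before anything is sent. The name `build_chat_request` and the client-side range checks are assumptions made for illustration, not part of ScaiGrid.

```python
from __future__ import annotations

from typing import Any


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    max_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stop: str | list[str] | None = None,
    seed: int | None = None,
) -> dict[str, Any]:
    """Build a chat request body, omitting optional fields that were not set."""
    # Hypothetical client-side checks mirroring the ranges in the table above.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    body: dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "seed": seed,
    }
    # Only include the fields the caller actually provided.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_chat_request(
    "scailabs/poolnoodle-omni",
    [{"role": "user", "content": "List three Swiss cantons."}],
    max_tokens=100,
    temperature=0.2,
    stop=["\n\n"],
    seed=42,
)
```

The resulting `body` can be passed as the `json=` argument in the requests shown earlier.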

## Messages

Each message has a `role` and `content`:

- `system` — behavioral instructions. Usually the first message.
- `user` — human or upstream-system input.
- `assistant` — prior assistant responses (for multi-turn).
- `tool` — result of a tool call. Requires `tool_call_id`.

### Multi-turn conversations

```python
messages = [
    {"role": "system", "content": "You translate to French."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "Good night"},
]
```

The full history is sent every time — ScaiGrid is stateless. For long-running conversations, use [Sessions](./05-sessions-and-rooms.md) to store history server-side.
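
Because the endpoint is stateless, the client owns the history. A minimal sketch of that bookkeeping follows; `add_turn` is a hypothetical helper, and the assistant reply is hard-coded where a real client would take it from the previous response.

```python
from __future__ import annotations


def add_turn(history: list[dict], role: str, content: str) -> list[dict]:
    """Return a new history list with one more message appended."""
    return history + [{"role": role, "content": content}]


history = [{"role": "system", "content": "You translate to French."}]
history = add_turn(history, "user", "Hello")
# In a real client this content comes from the previous response.
history = add_turn(history, "assistant", "Bonjour")
history = add_turn(history, "user", "Good night")
# `history` is now the complete `messages` array for the next request.
```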

### Multimodal content

For vision or audio models, `content` can be an array of typed parts:

```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
```

Supported part types:

- `text` — plain text
- `image_url` — `{"url": "...", "detail": "auto" | "low" | "high"}`
- `image_base64` — `{"data": "<base64>", "media_type": "image/png"}`
- `audio_url` — for audio input on models like GPT-4o audio
- `audio_base64` — `{"data": "<base64>", "media_type": "audio/wav"}`

ScaiGrid rewrites base64 images to proxy URLs on the fly, so you can send them inline without bloating your request.
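
For inline images, a part might be built as sketched below. Note the assumption: the payload is nested under a key named after the part type, mirroring the `image_url` example above. The list above only gives the inner fields, so check the nesting against the API reference before relying on it. `image_part` is a hypothetical helper.

```python
import base64


def image_part(data: bytes, media_type: str = "image/png") -> dict:
    """Build an image_base64 content part from raw image bytes."""
    return {
        "type": "image_base64",
        # Assumed nesting, by analogy with {"type": "image_url", "image_url": {...}}.
        "image_base64": {
            "data": base64.b64encode(data).decode("ascii"),
            "media_type": media_type,
        },
    }


message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        image_part(b"\x89PNG\r\n\x1a\n"),  # placeholder bytes, not a full image
    ],
}
```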

## Streaming

Set `"stream": true` to receive tokens as they arrive. The response is Server-Sent Events (`Content-Type: text/event-stream`).

```python
import httpx, json, os

with httpx.stream(
    "POST",
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    timeout=600,
) as r:
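    is_error = False
    for line in r.iter_lines():
        if line.startswith("event: error"):
            # The next data: line carries the error payload
            is_error = True
        elif line.startswith("data: "):
            payload = line[6:]
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            if is_error:
                raise RuntimeError(f"{chunk['code']}: {chunk['message']}")
            # Guard against a missing or null content field
            print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)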
```

```typescript
const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [{ role: "user", content: "Tell me a story." }],
    stream: true,
  }),
});

const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
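let isError = false;
// Labeled loop so [DONE] exits both loops; a bare `return` is invalid at module top level.
read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop()!; // keep the trailing partial line for the next read
  for (const line of lines) {
    if (line.startsWith("event: error")) {
      isError = true; // the next data: line carries the error payload
      continue;
    }
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6);
    if (payload === "[DONE]") break read;
    const chunk = JSON.parse(payload);
    if (isError) throw new Error(`${chunk.code}: ${chunk.message}`);
    process.stdout.write(chunk.choices[0].delta.content || "");
  }
}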
```

### SSE event types

- `data: {...}` — a chat completion chunk. Each chunk carries a fragment in `choices[0].delta.content`; concatenate the fragments to build the full message.
- `data: [DONE]` — stream end. Close your reader.
- `event: error\ndata: {...}` — an error occurred mid-stream. Payload has `{code, message}`.

Every stream ends with `data: [DONE]`, whether it completed normally or errored.
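
The three event types can be handled by one small parser. The sketch below works on already-decoded lines, so it runs without a network connection; `iter_content` and `StreamError` are hypothetical names, and the sample lines are hand-written to match the shapes described above.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an `event: error` frame."""


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from decoded SSE lines until `[DONE]`."""
    pending_error = False
    for line in lines:
        if line.startswith("event: error"):
            pending_error = True  # the next data: line is the error payload
            continue
        if not line.startswith("data: "):
            continue  # blank separator lines and anything else
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        obj = json.loads(payload)
        if pending_error:
            raise StreamError(f"{obj.get('code')}: {obj.get('message')}")
        choices = obj.get("choices") or []
        if choices:
            text = choices[0].get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample frames standing in for a live stream.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"content": "Be"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "rn."}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content(sample))
```

With a live response, the same function could be fed `r.iter_lines()` from an `httpx.stream(...)` context.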

## Tool calls

Declare tools the model can invoke:

```python
import httpx, os

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
]

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "What's the weather in Zürich?"}],
        "tools": tools,
    },
).json()["data"]

msg = resp["choices"][0]["message"]
if msg.get("tool_calls"):
    tc = msg["tool_calls"][0]
    print(f"Tool: {tc['function']['name']}")
    print(f"Args: {tc['function']['arguments']}")  # JSON string
```

When the model wants to call a tool, the response has `tool_calls` instead of `content`. Execute the tool, add both the assistant turn and the tool result to your history, then call again:

```python
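import httpx, os

# Continues the previous example: `msg` and `tools` are defined there.
history = [
    {"role": "user", "content": "What's the weather in Zürich?"},
    msg,  # assistant turn with tool_calls
    {
        "role": "tool",
        "tool_call_id": msg["tool_calls"][0]["id"],
        "content": '{"temp": 18, "unit": "celsius"}',
    },
]

final = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={"model": "scailabs/poolnoodle-omni", "messages": history, "tools": tools},
).json()["data"]
print(final["choices"][0]["message"]["content"])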
```

The second request returns a plain-text response that uses the tool result.
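
The "execute the tool" step can be sketched as a lookup from tool name to a local function. Everything named here (`TOOL_REGISTRY`, `run_tool_calls`, the stub `get_weather`, and the `call_1` id) is hypothetical, and the assistant message is hand-written in the shape shown above rather than taken from a live response.

```python
from __future__ import annotations

import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"location": location, "temp": 18, "unit": unit}


# Maps the tool names declared in `tools` to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_msg: dict) -> list[dict]:
    """Execute every tool call in an assistant message and build the tool replies."""
    replies = []
    for call in assistant_msg.get("tool_calls") or []:
        fn = TOOL_REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return replies


# Hand-written assistant turn used in place of a live response.
msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Zürich"}'},
    }],
}
tool_messages = run_tool_calls(msg)
```

Appending `msg` and then `tool_messages` to the history gives the `messages` array for the follow-up request. Handling every entry in `tool_calls`, rather than only the first, covers the case where the model requests several tools in one turn.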

## Response shape

Non-streaming response:

```json
{
  "status": "ok",
  "data": {
    "id": "chatcmpl-abc",
    "model": "scailabs/poolnoodle-omni",
    "created": 1713888000,
    "choices": [{
      "index": 0,
      "message": {"role": "assistant", "content": "Bern."},
      "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 24, "completion_tokens": 3, "total_tokens": 27},
    "_meta": {"request_id": "req_xyz", "latency_ms": 310}
  }
}
```

`finish_reason` values:

- `stop` — model finished naturally
- `length` — hit `max_tokens`
- `tool_calls` — model wants to call a tool
- `content_filter` — upstream safety filter triggered
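
One way to act on `finish_reason` is a small dispatch table, sketched below against the sample response above. The action labels (`done`, `truncated`, `run_tools`, `blocked`) and the function name `next_action` are illustrative choices, not values defined by ScaiGrid.

```python
# Hypothetical mapping from finish_reason to what the client should do next.
NEXT_ACTION = {
    "stop": "done",             # use the content as-is
    "length": "truncated",      # consider raising max_tokens or continuing
    "tool_calls": "run_tools",  # execute the tools and call again
    "content_filter": "blocked",
}


def next_action(data: dict) -> str:
    """Return the client-side action for the first choice in a response."""
    reason = data["choices"][0]["finish_reason"]
    return NEXT_ACTION.get(reason, "unknown")


# The sample response from above, unwrapped from its envelope.
response = {
    "status": "ok",
    "data": {
        "id": "chatcmpl-abc",
        "model": "scailabs/poolnoodle-omni",
        "created": 1713888000,
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "Bern."},
            "finish_reason": "stop",
        }],
        "usage": {"prompt_tokens": 24, "completion_tokens": 3, "total_tokens": 27},
    },
}
action = next_action(response["data"])
```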

## What's next

- [Embeddings](./02-embeddings.md) — vectorize text for search.
- [OpenAI Compatibility](./07-openai-compatibility.md) — drop-in replacement for the OpenAI SDK.
- [Sessions and Rooms](./05-sessions-and-rooms.md) — server-side conversation storage.
- [Errors](../03-core-concepts/07-errors.md) — handling failures.
