Chat Completions

The chat completions API is ScaiGrid's flagship inference endpoint: send a conversation, get an assistant response back. It supports streaming, tool calls, multimodal input (text, images, and audio), and the usual sampling parameters such as temperature, top_p, and stop.

Endpoint: POST /v1/inference/chat

Basic request

bash
curl -X POST https://scaigrid.scailabs.ai/v1/inference/chat \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "scailabs/poolnoodle-omni",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is the capital of Switzerland?"}
    ],
    "max_tokens": 50
  }'

python
import httpx, os

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "What is the capital of Switzerland?"},
        ],
        "max_tokens": 50,
    },
)
data = resp.json()["data"]
print(data["choices"][0]["message"]["content"])

typescript
const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: "What is the capital of Switzerland?" },
    ],
    max_tokens: 50,
  }),
});
const { data } = await resp.json();
console.log(data.choices[0].message.content);

Request parameters

Field	Type	Notes
`model`	string (required)	Frontend model slug
`messages`	array (required)	Conversation history
`max_tokens`	integer	Max output tokens. Capped by the model's `max_output_tokens` if set
`temperature`	float	0.0 (deterministic) to 2.0 (creative)
`top_p`	float	Nucleus sampling, 0.0–1.0
`stop`	string or array	Stop sequences
`seed`	integer	For reproducibility (provider-dependent)
`stream`	boolean	See Streaming below
`tools`	array	Tool definitions; see Tool calls
`tool_choice`	string or object	`auto`, `none`, `required`, or `{"type": "function", "function": {"name": "..."}}`
`metadata`	object	Passed through to backends that support it
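
As a rough illustration of how the optional fields combine, the sketch below assembles a request body and checks the documented ranges client-side before sending. `build_chat_payload` is a hypothetical helper, not part of any ScaiGrid SDK, and the range checks only mirror the table above; the server remains the authority on what it accepts.

```python
from __future__ import annotations

from typing import Any


def build_chat_payload(
    model: str,
    messages: list[dict[str, Any]],
    *,
    max_tokens: int | None = None,
    temperature: float | None = None,
    top_p: float | None = None,
    stop: str | list[str] | None = None,
    seed: int | None = None,
) -> dict[str, Any]:
    """Assemble a request body, omitting any optional field left unset."""
    # Ranges taken from the parameter table; this is a client-side sanity check only.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    payload: dict[str, Any] = {"model": model, "messages": messages}
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "stop": stop,
        "seed": seed,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dict can be passed directly as the `json=` argument of `httpx.post` in the examples above.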

Messages

Each message has a role and content:

system — behavioral instructions. Usually the first message.
user — human or upstream-system input.
assistant — prior assistant responses (for multi-turn).
tool — result of a tool call. Requires tool_call_id.
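
The role rules above can be checked before a request is sent. The following is a minimal sketch; `validate_messages` is a hypothetical helper, and the assumption that an assistant turn carrying `tool_calls` may omit `content` is inferred from the tool-call section rather than stated by the API.

```python
from __future__ import annotations

from typing import Any

VALID_ROLES = {"system", "user", "assistant", "tool"}


def validate_messages(messages: list[dict[str, Any]]) -> None:
    """Raise ValueError on the first message that breaks the role rules."""
    for i, msg in enumerate(messages):
        role = msg.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"messages[{i}]: unknown role {role!r}")
        if role == "tool" and not msg.get("tool_call_id"):
            raise ValueError(f"messages[{i}]: tool messages require tool_call_id")
        # Assumption: assistant turns that carry tool_calls may have no content.
        if "content" not in msg and not msg.get("tool_calls"):
            raise ValueError(f"messages[{i}]: missing content")
```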

Multi-turn conversations

python
messages = [
    {"role": "system", "content": "You translate to French."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "Good night"},
]

The full history is sent every time — ScaiGrid is stateless. For long-running conversations, use Sessions to store history server-side.
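
Because the endpoint is stateless, the client has to carry the history itself. One way to do that is sketched below; the `Conversation` class and its `send` callback are hypothetical names introduced for illustration. `send` stands in for whatever function posts the messages to /v1/inference/chat and returns the assistant's text.

```python
from __future__ import annotations

from typing import Any, Callable

Message = dict[str, Any]


class Conversation:
    """Keeps the running history and resends all of it on every turn."""

    def __init__(
        self,
        send: Callable[[list[Message]], str],
        system: str | None = None,
    ) -> None:
        self._send = send
        self.messages: list[Message] = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def ask(self, text: str) -> str:
        # Append the user turn, send the full history, then record the reply
        # so the next call includes it.
        self.messages.append({"role": "user", "content": text})
        reply = self._send(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

The history grows by two messages per turn, so long conversations will eventually need trimming or server-side storage.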

Multimodal content

For vision or audio models, content can be an array of typed parts:

json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}

Supported part types:

text — plain text
image_url — {"url": "...", "detail": "auto" | "low" | "high"}
image_base64 — {"data": "<base64>", "media_type": "image/png"}
audio_url — for audio input on models like GPT-4o audio
audio_base64 — {"data": "<base64>", "media_type": "audio/wav"}

ScaiGrid rewrites base64 images to proxy URLs on the fly, so you can send them inline without bloating your request.
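
To send a local image inline, the bytes need to be base64-encoded into an image_base64 part. The sketch below assumes the part nests its fields under a key named after the type, by analogy with the image_url example above; that nesting is an assumption, and both helper names are hypothetical.

```python
import base64
from typing import Any


def image_part_from_bytes(data: bytes, media_type: str = "image/png") -> dict[str, Any]:
    """Build an image_base64 content part from raw image bytes."""
    return {
        "type": "image_base64",
        # Assumed shape: fields nested under the type name, as with image_url.
        "image_base64": {
            "data": base64.b64encode(data).decode("ascii"),
            "media_type": media_type,
        },
    }


def user_message_with_image(text: str, image: bytes, media_type: str = "image/png") -> dict[str, Any]:
    """Combine a text part and an inline image into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            image_part_from_bytes(image, media_type),
        ],
    }
```

A typical call would read the file with `pathlib.Path("photo.png").read_bytes()` and pass the result as `image`.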

Streaming

Set "stream": true to receive tokens as they arrive. The response is Server-Sent Events (Content-Type: text/event-stream).

python
import httpx, json, os

with httpx.stream(
    "POST",
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    timeout=600,
) as r:
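    error_pending = False
    for line in r.iter_lines():
        if line.startswith("event: error"):
            # The next data: line carries the error payload
            error_pending = True
        elif line.startswith("data: "):
            payload = line[6:]
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            if error_pending:
                raise RuntimeError(f"{chunk['code']}: {chunk['message']}")
            # delta may omit content (or set it to null) on some chunks
            print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)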

typescript
const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [{ role: "user", content: "Tell me a story." }],
    stream: true,
  }),
});

const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
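// Label the outer loop so [DONE] can exit both loops; a bare `return`
// is a syntax error at module top level.
read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop()!; // keep any partial line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6);
    if (payload === "[DONE]") break read;
    const chunk = JSON.parse(payload);
    // Error payloads carry no choices, so guard the lookup
    process.stdout.write(chunk.choices?.[0]?.delta?.content ?? "");
  }
}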

SSE event types

data: {...} — a chat completion chunk. choices[0].delta.content accumulates.
data: [DONE] — stream end. Close your reader.
event: error\ndata: {...} — an error occurred mid-stream. Payload has {code, message}.

Every stream ends with data: [DONE], whether it completed normally or errored.
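
The three event types can be handled in one small generator that works on already-decoded lines, which keeps the parsing logic testable without a network connection. This is a sketch under the event format described above; `iter_content` and `StreamError` are hypothetical names, not part of a ScaiGrid client library.

```python
import json
from typing import Iterable, Iterator


class StreamError(RuntimeError):
    """Raised when the stream delivers an `event: error` payload."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until [DONE] or an error."""
    error_pending = False
    for line in lines:
        if line.startswith("event: error"):
            # The following data: line carries {code, message}.
            error_pending = True
            continue
        if not line.startswith("data: "):
            continue  # blank separators and unrecognized fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        body = json.loads(payload)
        if error_pending:
            raise StreamError(body.get("code"), body.get("message"))
        choices = body.get("choices") or []
        if choices:
            piece = choices[0].get("delta", {}).get("content")
            if piece:
                yield piece
```

With httpx, `"".join(iter_content(r.iter_lines()))` would collect the full reply from a streaming response `r`.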

Tool calls

Declare tools the model can invoke:

python
import httpx, os

API_KEY = os.environ["SCAIGRID_API_KEY"]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
]

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "What's the weather in Zürich?"}],
        "tools": tools,
    },
).json()["data"]

msg = resp["choices"][0]["message"]
if msg.get("tool_calls"):
    tc = msg["tool_calls"][0]
    print(f"Tool: {tc['function']['name']}")
    print(f"Args: {tc['function']['arguments']}")  # JSON string

When the model wants to call a tool, the response has tool_calls instead of content. Execute the tool, add both the assistant turn and the tool result to your history, then call again:

python
history = [
    {"role": "user", "content": "What's the weather in Zürich?"},
    msg,  # assistant turn with tool_calls
    {
        "role": "tool",
        "tool_call_id": msg["tool_calls"][0]["id"],
        "content": '{"temp": 18, "unit": "celsius"}',
    },
]

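# Same endpoint, model, and tools as the first request; only the history changed
final = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "scailabs/poolnoodle-omni", "messages": history, "tools": tools},
).json()["data"]
print(final["choices"][0]["message"]["content"])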

The second request returns a plain-text response that uses the tool result.
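
The "execute the tool" step is left to the caller. A minimal sketch of one way to do it follows: look up each requested function in a registry, parse its JSON-string arguments, and build the matching tool messages. `run_tool_calls`, `TOOL_REGISTRY`, and the stub `get_weather` are hypothetical, and the error payload returned for an unknown tool is an assumption about what is useful to send back to the model.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    # Hypothetical stand-in; a real implementation would query a weather service.
    return {"location": location, "temp": 18, "unit": unit}


TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(
    assistant_msg: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Execute every tool call in an assistant message and return tool messages."""
    results = []
    for call in assistant_msg.get("tool_calls") or []:
        name = call["function"]["name"]
        fn = registry.get(name)
        if fn is None:
            output: Any = {"error": f"unknown tool: {name}"}
        else:
            # arguments arrives as a JSON string, not a dict
            output = fn(**json.loads(call["function"]["arguments"]))
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results
```

Appending the assistant message and then the returned list to the history produces the shape shown above, including when the model requests several tools in one turn.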

Response shape

Non-streaming response:

json
{
  "status": "ok",
  "data": {
    "id": "chatcmpl-abc",
    "model": "scailabs/poolnoodle-omni",
    "created": 1713888000,
    "choices": [{
      "index": 0,
      "message": {"role": "assistant", "content": "Bern."},
      "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 24, "completion_tokens": 3, "total_tokens": 27},
    "_meta": {"request_id": "req_xyz", "latency_ms": 310}
  }
}

finish_reason values:

stop — model finished naturally
length — hit max_tokens
tool_calls — model wants to call a tool
content_filter — upstream safety filter triggered
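
Callers usually branch on finish_reason to decide what to do next. The mapping below is one possible sketch; `next_action` and the action labels it returns are hypothetical and only suggest a follow-up for each documented value.

```python
from typing import Any

# Suggested follow-up for each documented finish_reason value.
_ACTIONS = {
    "stop": "done",               # response is complete
    "length": "truncated",        # consider raising max_tokens or continuing
    "tool_calls": "run_tools",    # execute tools, then call again
    "content_filter": "filtered", # surface a refusal to the user
}


def next_action(choice: dict[str, Any]) -> str:
    """Map a choice's finish_reason to a coarse next step."""
    return _ACTIONS.get(choice.get("finish_reason"), "unknown")
```

Returning "unknown" for unrecognized values keeps the caller from crashing if a backend reports a reason not listed here.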

What's next

Embeddings — vectorize text for search.
OpenAI Compatibility — drop-in replacement for OpenAI SDK.
Sessions and Rooms — server-side conversation storage.
Errors — handling failures.