Chat Completions
The chat completions API is ScaiGrid's flagship inference endpoint: send a conversation and get an assistant response back. It supports streaming, tool calls, multimodal input (text, images, and audio), and the usual sampling parameters such as temperature, top_p, stop, and seed.
Endpoint: POST /v1/inference/chat
Basic request
| curl -X POST https://scaigrid.scailabs.ai/v1/inference/chat \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "scailabs/poolnoodle-omni",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "What is the capital of Switzerland?"}
    ],
    "max_tokens": 50
  }'
|
| import httpx, os
resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "What is the capital of Switzerland?"},
        ],
        "max_tokens": 50,
    },
)
data = resp.json()["data"]
print(data["choices"][0]["message"]["content"])
|
| const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: "What is the capital of Switzerland?" },
    ],
    max_tokens: 50,
  }),
});
const { data } = await resp.json();
console.log(data.choices[0].message.content);
|
Request parameters
| Field | Type | Notes |
|---|---|---|
| model | string (required) | Frontend model slug |
| messages | array (required) | Conversation history |
| max_tokens | integer | Max output tokens. Capped by the model's max_output_tokens if set |
| temperature | float | 0.0 (deterministic) to 2.0 (creative) |
| top_p | float | Nucleus sampling, 0.0–1.0 |
| stop | string or array | Stop sequences |
| seed | integer | For reproducibility (provider-dependent) |
| stream | boolean | See Streaming below |
| tools | array | Tool definitions; see Tool calls |
| tool_choice | string or object | auto, none, required, or {"type": "function", "function": {"name": "..."}} |
| metadata | object | Passed through to backends that support it |
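The documented ranges can be checked on the client before a request goes out. The following is a minimal sketch, not part of any ScaiGrid SDK: `build_chat_request` is a hypothetical helper, the checks only mirror the table above, and rejecting an empty `messages` array is an assumption. The server may validate differently.

```python
from typing import Any


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    **params: Any,
) -> dict[str, Any]:
    """Assemble a request body, checking the documented ranges locally.

    Hypothetical helper: the checks mirror the parameter table, not server code.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        # Assumption: an empty conversation is treated as invalid.
        raise ValueError("messages is required and must not be empty")

    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    top_p = params.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    # Drop unset (None) parameters so the backend applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({k: v for k, v in params.items() if v is not None})
    return body
```

The returned dictionary can be passed directly as the `json=` argument of the httpx calls shown earlier.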
Messages
Each message has a role and content:
system — behavioral instructions. Usually the first message.
user — human or upstream-system input.
assistant — prior assistant responses (for multi-turn).
tool — result of a tool call. Requires tool_call_id.
Multi-turn conversations
| messages = [
    {"role": "system", "content": "You translate to French."},
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Bonjour"},
    {"role": "user", "content": "Good night"},
]
|
The full history is sent every time — ScaiGrid is stateless. For long-running conversations, use Sessions to store history server-side.
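Because the endpoint is stateless, the client has to carry the history between calls. One way to do that is sketched below. The `Conversation` class and its `send` callable are hypothetical names introduced for illustration; `send` stands in for whatever function posts the messages to /v1/inference/chat and returns the assistant's text, and a fake is used here so the sketch runs offline.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Client-side history for a stateless chat endpoint (illustrative only)."""

    def __init__(self, system_prompt: str, send: Callable[[list[Message]], str]):
        # `send` is a hypothetical callable: it posts the messages to the
        # chat endpoint and returns the assistant's reply text.
        self.messages: list[Message] = [{"role": "system", "content": system_prompt}]
        self._send = send

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        # The whole history goes out on every call.
        reply = self._send(list(self.messages))
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list[Message]) -> str:
    """Stand-in for a real HTTP call so the sketch runs without a network."""
    return f"(reply to {len(messages)} messages)"


chat = Conversation("You translate to French.", fake_send)
chat.ask("Hello")
chat.ask("Good night")
```

After the two calls, `chat.messages` holds the system prompt plus two user/assistant pairs, which is exactly what the next request would send.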
Multimodal content
For vision or audio models, content can be an array of typed parts:
| {
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
|
Supported part types:
text — plain text
image_url — {"url": "...", "detail": "auto" | "low" | "high"}
image_base64 — {"data": "<base64>", "media_type": "image/png"}
audio_url — for audio input on models like GPT-4o audio
audio_base64 — {"data": "<base64>", "media_type": "audio/wav"}
ScaiGrid rewrites base64 images to proxy URLs on the fly, so you can send them inline without bloating your request.
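Building an inline image part from raw bytes might look like the sketch below. Both function names are hypothetical. The nesting of the payload under an `image_base64` key is an assumption made by analogy with the `image_url` example above; the source only lists the inner `{"data", "media_type"}` object, so check the wire format before relying on it.

```python
import base64
from typing import Any


def image_part_from_bytes(data: bytes, media_type: str = "image/png") -> dict[str, Any]:
    """Build an image_base64 content part from raw image bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    # Assumption: the payload nests under a key named after the part type,
    # mirroring {"type": "image_url", "image_url": {...}}.
    return {
        "type": "image_base64",
        "image_base64": {"data": encoded, "media_type": media_type},
    }


def user_message_with_image(
    text: str, image: bytes, media_type: str = "image/png"
) -> dict[str, Any]:
    """Combine a text part and an inline image part into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            image_part_from_bytes(image, media_type),
        ],
    }
```

In practice the bytes would come from a file, for example `pathlib.Path("photo.png").read_bytes()`.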
Streaming
Set "stream": true to receive tokens as they arrive. The response is Server-Sent Events (Content-Type: text/event-stream).
| import httpx, json, os
with httpx.stream(
    "POST",
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "Tell me a story."}],
        "stream": True,
    },
    timeout=600,
) as r:
    for line in r.iter_lines():
        if line.startswith("data: "):
            payload = line[6:]
            if payload == "[DONE]":
                break
            chunk = json.loads(payload)
            # `or ""` guards against a delta whose content is missing or null
            print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)
        elif line.startswith("event: error"):
            # Next data: line carries the error payload
            pass
|
| const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/chat", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "scailabs/poolnoodle-omni",
    messages: [{ role: "user", content: "Tell me a story." }],
    stream: true,
  }),
});
const reader = resp.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
// Labeled loop: a top-level `return` is not valid outside a function.
read: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buf += decoder.decode(value, { stream: true });
  const lines = buf.split("\n");
  buf = lines.pop()!; // keep the trailing partial line for the next read
  for (const line of lines) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice(6);
    if (payload === "[DONE]") break read;
    const chunk = JSON.parse(payload);
    process.stdout.write(chunk.choices[0].delta.content || "");
  }
}
|
SSE event types
data: {...} — a chat completion chunk. choices[0].delta.content accumulates.
data: [DONE] — stream end. Close your reader.
event: error\ndata: {...} — an error occurred mid-stream. Payload has {code, message}.
Every stream ends with data: [DONE], whether it completed normally or with an error.
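The three event types can be handled by one small parser that works on already-split lines. This is a sketch under the assumptions stated in the list above: an `event: error` line is followed by a `data:` line holding `{code, message}`. `StreamError` and `iter_content` are hypothetical names, not part of any ScaiGrid client.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream carries an `event: error` frame."""


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from SSE lines; raise StreamError on an error event."""
    error_pending = False
    for line in lines:
        if line.startswith("event: error"):
            # The following data: line holds the {code, message} payload.
            error_pending = True
        elif line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                return
            body = json.loads(payload)
            if error_pending:
                raise StreamError(f"{body.get('code')}: {body.get('message')}")
            choices = body.get("choices") or []
            if not choices:
                continue
            content = choices[0].get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"choices": [{"delta": {"content": "Once "}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "upon a time"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_content(sample))
```

With httpx, the same function could be fed `r.iter_lines()` from a streaming response, and joining the yielded pieces reconstructs the full assistant message.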
Tool calls
Declare tools the model can invoke:
| import httpx, os
API_KEY = os.environ["SCAIGRID_API_KEY"]
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
            },
        },
    },
]
resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/chat",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "scailabs/poolnoodle-omni",
        "messages": [{"role": "user", "content": "What's the weather in Zürich?"}],
        "tools": tools,
    },
).json()["data"]
msg = resp["choices"][0]["message"]
if msg.get("tool_calls"):
    tc = msg["tool_calls"][0]
    print(f"Tool: {tc['function']['name']}")
    print(f"Args: {tc['function']['arguments']}")  # JSON string
|
When the model wants to call a tool, the response has tool_calls instead of content. Execute the tool, add both the assistant turn and the tool result to your history, then call again:
| history = [
    {"role": "user", "content": "What's the weather in Zürich?"},
    msg,  # assistant turn with tool_calls
    {
        "role": "tool",
        "tool_call_id": msg["tool_calls"][0]["id"],
        "content": '{"temp": 18, "unit": "celsius"}',
    },
]
final = httpx.post(... json={"model": ..., "messages": history, "tools": tools}, ...)
|
The second request returns a plain-text response that uses the tool result.
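The "execute the tool" step can be generalized into a small dispatcher that turns every entry in tool_calls into a tool message. This is a sketch only: `TOOL_REGISTRY`, `run_tool_calls`, and the stubbed `get_weather` are hypothetical names, and sending failures back to the model as an `{"error": ...}` JSON object is a design assumption, not documented ScaiGrid behavior.

```python
import json
from typing import Any, Callable


def get_weather(location: str, unit: str = "celsius") -> dict[str, Any]:
    """Placeholder tool; a real implementation would query a weather service."""
    return {"location": location, "temp": 18, "unit": unit}


# Maps tool names (as declared in `tools`) to local callables.
TOOL_REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(assistant_msg: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute every tool call in an assistant message and build tool messages."""
    results: list[dict[str, Any]] = []
    for call in assistant_msg.get("tool_calls") or []:
        name = call["function"]["name"]
        try:
            # Arguments arrive as a JSON string and must be decoded first.
            args = json.loads(call["function"]["arguments"])
            output = TOOL_REGISTRY[name](**args)
        except Exception as exc:
            # Assumption: report the failure to the model instead of crashing.
            output = {"error": f"{type(exc).__name__}: {exc}"}
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(output),
            }
        )
    return results
```

The next request's history would then be the previous messages, followed by the assistant message, followed by the list this function returns.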
Response shape
Non-streaming response:
| {
  "status": "ok",
  "data": {
    "id": "chatcmpl-abc",
    "model": "scailabs/poolnoodle-omni",
    "created": 1713888000,
    "choices": [{
      "index": 0,
      "message": {"role": "assistant", "content": "Bern."},
      "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 24, "completion_tokens": 3, "total_tokens": 27},
    "_meta": {"request_id": "req_xyz", "latency_ms": 310}
  }
}
|
finish_reason values:
stop — model finished naturally
length — hit max_tokens
tool_calls — model wants to call a tool
content_filter — upstream safety filter triggered
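A client typically branches on finish_reason before using the message. The sketch below reads the first choice of a non-streaming response and maps the four values above to a suggested next step. The `interpret` function and the action labels (`done`, `truncated`, `run_tools`, `blocked`) are made up for this example.

```python
from typing import Any, Optional

# Suggested client reactions; the labels on the right are invented for this sketch.
ACTIONS = {
    "stop": "done",              # use the content as the final answer
    "length": "truncated",       # output hit max_tokens; consider raising it
    "tool_calls": "run_tools",   # execute the tools, then call again
    "content_filter": "blocked", # upstream safety filter triggered
}


def interpret(envelope: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return (action, text) for the first choice of a non-streaming response."""
    choice = envelope["data"]["choices"][0]
    action = ACTIONS.get(choice["finish_reason"], "unknown")
    return action, choice["message"].get("content")


example = {
    "status": "ok",
    "data": {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": "Bern."},
                "finish_reason": "stop",
            }
        ]
    },
}
action, text = interpret(example)
```

Unrecognized values fall through to `unknown` so that a new finish reason from a backend does not raise in client code.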
What's next