---
title: Your First Integration
path: getting-started/your-first-integration
status: published
---

# Your First Integration

The quickstart showed you how to fire a single request. This guide walks through a real integration: structured error handling, retries, usage tracking, and request IDs for support.

We'll build a small "title generator": given a short conversation snippet, it produces a title of at most six words. This is the kind of helper you'd run behind a user-facing chat app.

## Setup

```bash
export SCAIGRID_API_KEY=sgk_your_key_here
export SCAIGRID_BASE_URL=https://scaigrid.scailabs.ai
```

```python
# pip install httpx
import os
import httpx

API_KEY = os.environ["SCAIGRID_API_KEY"]
BASE_URL = os.environ.get("SCAIGRID_BASE_URL", "https://scaigrid.scailabs.ai")
```

```typescript
// No packages to install: the examples use the built-in fetch (Node 18+).
const API_KEY = process.env.SCAIGRID_API_KEY!;
const BASE_URL = process.env.SCAIGRID_BASE_URL ?? "https://scaigrid.scailabs.ai";
```

## A real request with error handling

```python
import httpx

class ScaiGridError(Exception):
    def __init__(self, code: str, message: str, retry_after: float | None = None):
        self.code = code
        self.message = message
        self.retry_after = retry_after
        super().__init__(f"{code}: {message}")


def chat(model: str, messages: list[dict], **params) -> dict:
    resp = httpx.post(
        f"{BASE_URL}/v1/inference/chat",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": model, "messages": messages, **params},
        timeout=60.0,
    )
    body = resp.json()
    if body.get("status") == "error":
        err = body["error"]
        raise ScaiGridError(
            code=err["code"],
            message=err["message"],
            retry_after=err.get("retry_after"),
        )
    return body["data"]


result = chat(
    model="scailabs/poolnoodle-omni",
    messages=[
        {"role": "system", "content": "You generate conversation titles. Reply with ONLY the title, max 6 words."},
        {"role": "user", "content": "Hey, what's a good recipe for carbonara?"},
    ],
    max_tokens=20,
    temperature=0.3,
)
print(result["choices"][0]["message"]["content"])
```

```typescript
interface ChatResult {
  choices: { message: { role: string; content: string } }[];
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
}

interface ScaiGridError {
  code: string;
  message: string;
  retry_after?: number;
}

async function chat(
  model: string,
  messages: Array<{ role: string; content: string }>,
  params: Record<string, unknown> = {},
): Promise<ChatResult> {
  const resp = await fetch(`${BASE_URL}/v1/inference/chat`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
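    body: JSON.stringify({ model, messages, ...params }),
    // Abort after 60 s, matching the timeout in the Python example.
    signal: AbortSignal.timeout(60_000),
  });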
  const body = await resp.json();
  if (body.status === "error") {
    const err = body.error as ScaiGridError;
    const e = new Error(`${err.code}: ${err.message}`);
    (e as any).code = err.code;
    (e as any).retryAfter = err.retry_after;
    throw e;
  }
  return body.data as ChatResult;
}

const result = await chat(
  "scailabs/poolnoodle-omni",
  [
    { role: "system", content: "You generate conversation titles. Reply with ONLY the title, max 6 words." },
    { role: "user", content: "Hey, what's a good recipe for carbonara?" },
  ],
  { max_tokens: 20, temperature: 0.3 },
);
console.log(result.choices[0].message.content);
```

Expected output: something like `Classic Carbonara Recipe Request`.
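
Models do not always follow the "ONLY the title" instruction exactly: the text can arrive wrapped in quotes, with a trailing period, or over the word limit. A minimal post-processing sketch follows. `clean_title` and `MAX_WORDS` are hypothetical names for this guide, not part of the ScaiGrid API.

```python
import re

MAX_WORDS = 6  # hypothetical constant; mirrors the limit in the system prompt
QUOTES = "\"'“”‘’"


def clean_title(raw: str, max_words: int = MAX_WORDS) -> str:
    """Normalize model output into a short, single-line title."""
    # Keep only the first non-empty line in case the model adds commentary.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return ""
    # Drop surrounding quotes, then trailing punctuation and stray quotes.
    title = lines[0].strip(QUOTES + " ").rstrip(".!,;:" + QUOTES)
    # Collapse runs of whitespace, then cap the word count.
    words = re.sub(r"\s+", " ", title).split(" ")
    return " ".join(words[:max_words])
```

Call it on `result["choices"][0]["message"]["content"]` before storing or displaying the title.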

## Retries

ScaiGrid maps upstream failures to specific error codes. Some are retryable, some are not.

| Code | HTTP | Retry? |
|------|------|--------|
| `BACKEND_RATE_LIMITED` | 429 | Yes — honor `retry_after` |
| `BACKEND_TIMEOUT` | 504 | Yes — with exponential backoff |
| `BACKEND_ERROR` | 502 | Yes — once or twice |
| `BUDGET_EXCEEDED` | 429 | No — admin intervention needed |
| `MODEL_UNAVAILABLE` | 503 | Sometimes — the model might come back |
| `UPSTREAM_SHAPE_MISMATCH` | 502 | No — gateway integration bug |
| `AUTH_TOKEN_INVALID` | 401 | No — fix your credentials |
| `AUTHZ_PERMISSION_DENIED` | 403 | No — not a transient problem |

A minimal retry helper:

```python
import time

RETRYABLE = {"BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE"}

def chat_with_retry(model: str, messages: list[dict], max_attempts: int = 3, **params) -> dict:
    for attempt in range(max_attempts):
        try:
            return chat(model, messages, **params)
        except ScaiGridError as e:
            # Give up on non-retryable codes, or once attempts are exhausted.
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
            # Prefer the server's retry_after hint; otherwise back off 1 s, 2 s, 4 s, ...
            delay = e.retry_after if e.retry_after else (2 ** attempt)
            time.sleep(delay)
    raise RuntimeError("unreachable")
```

```typescript
const RETRYABLE = new Set([
  "BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR", "MODEL_UNAVAILABLE",
]);

async function chatWithRetry(
  model: string,
  messages: any[],
  params: Record<string, unknown> = {},
  maxAttempts = 3,
): Promise<ChatResult> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await chat(model, messages, params);
    } catch (e: any) {
      if (!RETRYABLE.has(e.code) || attempt === maxAttempts - 1) throw e;
      const delay = (e.retryAfter ?? Math.pow(2, attempt)) * 1000;
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw new Error("unreachable");
}
```
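
The helpers above back off by fixed powers of two, so clients that fail at the same moment also retry at the same moment. One common remedy is "full jitter": pick a random delay between zero and the exponential ceiling. The sketch below is one way to compute that delay. `backoff_delay`, `base`, and `cap` are illustrative names and defaults, not ScaiGrid settings.

```python
import random
from typing import Callable, Optional


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,   # assumed starting delay, in seconds
    cap: float = 30.0,   # assumed upper bound, in seconds
    rng: Callable[[], float] = random.random,
) -> float:
    """Return how long to sleep before retry number `attempt` (0-based)."""
    # A server-provided hint wins over our own guess.
    if retry_after:
        return float(retry_after)
    # Exponential ceiling: base, 2*base, 4*base, ... limited by cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: uniform random delay in [0, ceiling).
    return rng() * ceiling
```

In `chat_with_retry`, this would stand in for the `delay = ...` line, for example `time.sleep(backoff_delay(attempt, e.retry_after))`.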

## Tracking usage

Every response includes a `usage` object with token counts. In a long-running integration, sum these across requests to estimate what the integration costs.

```python
total_prompt = 0
total_completion = 0

def track(result):
    global total_prompt, total_completion
    usage = result["usage"]
    total_prompt += usage["prompt_tokens"]
    total_completion += usage["completion_tokens"]
```
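
Module-level globals work for a script but get awkward once several threads or models are involved. As an alternative sketch, the hypothetical `UsageTracker` below keeps per-model totals behind a lock. It assumes only the `usage` fields shown above (`prompt_tokens` and `completion_tokens`).

```python
import threading
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


class UsageTracker:
    """Thread-safe running totals of token usage, keyed by model name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._by_model = defaultdict(Usage)

    def record(self, model: str, result: dict) -> None:
        # Tolerate a missing usage object rather than raising KeyError.
        usage = result.get("usage") or {}
        with self._lock:
            entry = self._by_model[model]
            entry.prompt_tokens += usage.get("prompt_tokens", 0)
            entry.completion_tokens += usage.get("completion_tokens", 0)

    def snapshot(self) -> dict:
        # Return copies so callers cannot mutate the live counters.
        with self._lock:
            return {
                model: Usage(u.prompt_tokens, u.completion_tokens)
                for model, u in self._by_model.items()
            }
```

Create one tracker at startup and call `tracker.record(model, result)` after each successful `chat` call. These are local estimates only.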

For authoritative usage across your whole tenant, query the accounting API directly:

```bash
curl "$SCAIGRID_BASE_URL/v1/accounting/usage/summary?period=day" \
  -H "Authorization: Bearer $SCAIGRID_API_KEY"
```

See [Accounting and Budgets](../03-core-concepts/04-accounting-and-budgets.md).

## Using the request ID for support

Every response has a request ID in the `X-Scaigrid-Request-Id` header (and in `meta.request_id` in the body). If something goes wrong and you need to contact support, that ID lets us look up the full request trace in seconds.

```python
resp = httpx.post(...)  # same arguments as in chat() above
print("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"))
```

```typescript
const resp = await fetch(...);  // same arguments as in chat() above
console.log("Request ID:", resp.headers.get("X-Scaigrid-Request-Id"));
```

Log this on every request in production. When a user reports an issue, the request ID is the fastest path to answers.
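
One way to make that logging routine is a small helper that reads the header first and falls back to `meta.request_id` in the body. This is a sketch: `extract_request_id`, `log_request`, and the logger name are hypothetical, and it assumes `body` is the parsed JSON response.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("scaigrid")  # hypothetical logger name

REQUEST_ID_HEADER = "X-Scaigrid-Request-Id"


def extract_request_id(
    headers: Mapping[str, str], body: Optional[dict] = None
) -> Optional[str]:
    """Return the request ID from the headers, else from meta.request_id."""
    # Header names are case-insensitive, but plain dicts are not,
    # so compare lowercased names.
    wanted = REQUEST_ID_HEADER.lower()
    for name, value in headers.items():
        if name.lower() == wanted:
            return value
    # Fall back to the copy of the ID carried in the response body.
    meta = (body or {}).get("meta") or {}
    return meta.get("request_id")


def log_request(headers: Mapping[str, str], body: Optional[dict], model: str) -> Optional[str]:
    """Log one line per request so the ID can be found later."""
    request_id = extract_request_id(headers, body)
    logger.info("scaigrid request model=%s request_id=%s", model, request_id)
    return request_id
```

With httpx, `log_request(resp.headers, resp.json(), model)` would fit right after the `httpx.post` call inside `chat`.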

## What's next

- [Chat Completions](../04-api-guides/01-chat-completions.md) — streaming, tool calls, multimodal.
- [Models and Routing](../03-core-concepts/03-models-and-routing.md) — how to pick the right model for a task.
- [Errors](../03-core-concepts/07-errors.md) — full error code reference.