Models and Routing

ScaiGrid separates three concerns:

Frontend models — the stable public names your application asks for.
Backend models — the actual upstream deployments that answer requests.
Routing policies — the rules that map one to the other.

Your code only ever names a frontend model. Operators reshape what's behind that name — swapping providers, adjusting weights, configuring failover — without touching your code.

Frontend models

A frontend model is an abstract identity. It has a slug (scailabs/poolnoodle-omni), a display name, a modality (chat, embedding, image, audio), capabilities, and optional defaults (context window, max output tokens, system prompt template, pricing). It may carry metadata (persona assignment, compliance flags).

Think of it as the menu item your application orders from. The kitchen decides how to prepare it.

List frontend models:

bash
curl -H "Authorization: Bearer $TOKEN" https://scaigrid.scailabs.ai/v1/models

Create one:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/models \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "slug": "tenant/acme/summarizer",
    "display_name": "Acme Summarizer",
    "modality": "chat",
    "context_window": 128000,
    "max_output_tokens": 4096
  }'

Slug conventions:

openai/gpt-4o — platform-level, OpenAI-provided
scailabs/poolnoodle-omni — platform-level, ScaiLabs-provided
partner/{partner_slug}/... — partner-scoped
tenant/{tenant_slug}/... — tenant-scoped
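
The conventions above can be checked mechanically. The following is a minimal sketch, assuming the scope is determined purely by the first path segment; `slug_scope` is a hypothetical helper, not part of ScaiGrid, and its validation rules (two segments for platform slugs, at least three for scoped ones) are an assumption drawn from the examples listed here.

```python
def slug_scope(slug: str) -> tuple[str, str]:
    """Classify a frontend model slug as platform-, partner-, or tenant-scoped.

    Returns (scope, owner). For platform-level slugs the owner is the
    provider prefix, for example "openai" or "scailabs".
    """
    parts = slug.split("/")
    if parts[0] in ("partner", "tenant"):
        # Assumed shape for scoped slugs: scope / owner slug / model name...
        if len(parts) < 3 or not all(parts):
            raise ValueError(f"malformed scoped slug: {slug!r}")
        return parts[0], parts[1]
    # Assumed shape for platform slugs: provider / model name.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"malformed platform slug: {slug!r}")
    return "platform", parts[0]


if __name__ == "__main__":
    for example in ("openai/gpt-4o", "tenant/acme/summarizer"):
        print(example, "->", slug_scope(example))
```

A helper like this is mainly useful client-side, for example to group a model list by owner before displaying it.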

Backend models

A backend model is a concrete, routable endpoint. It has a URI (openai:gpt-4o or scaiinfer:node-eu-01/llama-3.3), a provider type, credentials (stored encrypted), capabilities, and health status.

A backend is specific: it points at one provider, one deployment, one region. When ScaiGrid calls a backend, the destination of the request is never ambiguous.

List backends:

bash
curl -H "Authorization: Bearer $ADMIN_TOKEN" https://scaigrid.scailabs.ai/v1/backends

Create one:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/backends \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "display_name": "OpenAI GPT-4o (us-east)",
    "uri": "openai:gpt-4o",
    "provider_type": "openai",
    "connection_config": {"api_key": "sk-..."}
  }'

Supported provider types: openai, anthropic, azure, mistral, qwen, google, scaiinfer (our distributed inference cluster), custom (any OpenAI-protocol-compatible endpoint).

Providers

A provider is a group of backends sharing configuration, such as an OpenAI account, an Azure subscription, or a ScaiInfer cluster. Each provider exposes a discovery endpoint (POST /v1/providers/{id}/discover) that lists the models available from it, so you can add backends without hand-typing identifiers.

Create a provider:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/providers \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Our OpenAI Account",
    "provider_type": "openai",
    "connection_config": {"api_key": "sk-..."}
  }'

Routing policies

A routing policy decides, for each request to a frontend model, which backend to call. Policies are named and reusable — multiple frontend models can share the same policy.

Simplest case: a frontend model maps 1:1 to a backend. No policy needed — the mapping itself carries weight 100, priority 1.

Common multi-backend patterns:

Weighted round-robin. Two backends, 70/30 split.

json
{
  "mappings": [
    {"backend_id": "backend_a", "weight": 70, "priority": 1},
    {"backend_id": "backend_b", "weight": 30, "priority": 1}
  ]
}

Primary + failover. Use backend A unless it's unhealthy, then backend B.

json
{
  "mappings": [
    {"backend_id": "backend_a", "weight": 100, "priority": 1},
    {"backend_id": "backend_b", "weight": 100, "priority": 2}
  ]
}

A higher priority number means a later fallback: priority 1 is tried before priority 2. Within a priority tier, traffic splits in proportion to weight.
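
To make the interaction between priority and weight concrete, here is a minimal sketch of how a single request could be resolved against a list of mappings. It is an illustration under stated assumptions, not ScaiGrid's actual scheduler: `pick_backend` and the `healthy` set are hypothetical, and weighted random selection stands in for weighted round-robin (the two give the same long-run split).

```python
import random


def pick_backend(mappings, healthy, rng=random):
    """Return the backend_id chosen for one request, or None if none is healthy.

    mappings: list of dicts with "backend_id", "weight", "priority",
              shaped like the JSON examples above.
    healthy:  set of backend ids currently considered usable (hypothetical input).
    """
    candidates = [m for m in mappings if m["backend_id"] in healthy]
    if not candidates:
        return None
    # The lowest priority number among usable backends wins the tier.
    top = min(m["priority"] for m in candidates)
    tier = [m for m in candidates if m["priority"] == top]
    # Inside the tier, pick at random in proportion to weight.
    weights = [m["weight"] for m in tier]
    return rng.choices(tier, weights=weights, k=1)[0]["backend_id"]


failover = [
    {"backend_id": "backend_a", "weight": 100, "priority": 1},
    {"backend_id": "backend_b", "weight": 100, "priority": 2},
]

if __name__ == "__main__":
    print(pick_backend(failover, {"backend_a", "backend_b"}))  # primary tier
    print(pick_backend(failover, {"backend_b"}))               # failover tier
```

With the failover mappings, backend_b is only reachable once backend_a drops out of the healthy set. With the 70/30 mappings, both backends share priority 1, so every request is a weighted draw between them.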

Map a frontend to backends:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/routing/mappings \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "frontend_id": "fm_poolnoodle_omni",
    "backend_id": "be_openai_gpt4o",
    "weight": 100,
    "priority": 1
  }'

Model access control

Frontend models are listed per-tenant, but not every tenant should see every model. Use model access policies to scope visibility:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/model-access \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scope_type": "tenant",
    "scope_id": "tenant_acme",
    "model_slug": "openai/gpt-4o",
    "enabled": false
  }'

This disables openai/gpt-4o for tenant_acme. Without an explicit entry, models are implicitly allowed.
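
The default-allow rule can be sketched as a small lookup. This is an illustration only: `is_model_allowed` is a hypothetical helper, the entries are assumed to have the same shape as the request body above, and the sketch assumes the first matching entry decides the outcome.

```python
def is_model_allowed(entries, scope_id, model_slug):
    """Return True if the tenant may use the model.

    entries: list of dicts shaped like the POST /v1/model-access body.
    """
    for entry in entries:
        if (
            entry.get("scope_type") == "tenant"
            and entry.get("scope_id") == scope_id
            and entry.get("model_slug") == model_slug
        ):
            # An explicit entry decides the outcome either way.
            return bool(entry["enabled"])
    # No explicit entry: the model is implicitly allowed.
    return True


entries = [
    {
        "scope_type": "tenant",
        "scope_id": "tenant_acme",
        "model_slug": "openai/gpt-4o",
        "enabled": False,
    }
]

if __name__ == "__main__":
    print(is_model_allowed(entries, "tenant_acme", "openai/gpt-4o"))
    print(is_model_allowed(entries, "tenant_acme", "scailabs/poolnoodle-omni"))
```

Because the default is allow, restricting a tenant to a short list of models means adding a deny entry for each model that should be hidden.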

Model groups

To grant or deny access in bulk, collect models into a group:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/model-groups \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "GPT family",
    "members": ["openai/gpt-4o", "openai/gpt-4o-mini", "openai/gpt-4.1"]
  }'

Then grant or deny the whole group at once via /v1/model-access with model_group_id.
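
A group-level rule might be built as follows. This is a sketch under assumptions: it reuses the `scope_type`, `scope_id`, and `enabled` fields from the model access example and swaps `model_slug` for `model_group_id`; the exact body the endpoint expects, and the group id `mg_gpt_family`, are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request


def group_access_request(base_url, token, scope_id, group_id, enabled):
    """Build (but do not send) a POST /v1/model-access request for a whole group."""
    body = {
        "scope_type": "tenant",
        "scope_id": scope_id,
        "model_group_id": group_id,  # assumed to replace model_slug
        "enabled": enabled,
    }
    return urllib.request.Request(
        f"{base_url}/v1/model-access",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = group_access_request(
        "https://scaigrid.scailabs.ai", "ADMIN_TOKEN", "tenant_acme", "mg_gpt_family", False
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it.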

Health and circuit breaking

Each backend has a health status: healthy, degraded, unhealthy, or unavailable. ScaiGrid tracks failures per backend and opens a circuit breaker after repeated errors. An unhealthy backend is skipped until its circuit closes, which happens once probe requests succeed.

To check a backend's health, call GET /v1/backends/{backend_id}/health.
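
The open-then-probe behaviour can be pictured with a very small state machine. This is a generic circuit breaker sketch, not ScaiGrid's implementation: the class name, the threshold of three consecutive failures, and the rule that a single successful probe closes the circuit are all assumptions made for illustration.

```python
class CircuitBreaker:
    """Track consecutive failures for one backend and gate traffic to it."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold  # assumed value
        self.consecutive_failures = 0
        self.is_open = False

    def record_failure(self):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Repeated errors: stop routing normal traffic to this backend.
            self.is_open = True

    def record_success(self):
        # Covers both ordinary successes and successful probe requests.
        self.consecutive_failures = 0
        self.is_open = False

    def allows_traffic(self):
        return not self.is_open


if __name__ == "__main__":
    breaker = CircuitBreaker()
    for _ in range(3):
        breaker.record_failure()
    print("after failures:", breaker.allows_traffic())
    breaker.record_success()  # a probe request succeeded
    print("after probe:", breaker.allows_traffic())
```

Production breakers usually add a cooldown before the first probe so that a struggling backend is not hit immediately; that timing is left out here.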

Provider discovery

Rather than typing out every model from a provider, ask ScaiGrid to discover them:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/providers/{provider_id}/discover \
  -H "Authorization: Bearer $ADMIN_TOKEN"

Returns a list of available upstream models. You can selectively configure some as backends:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/providers/{provider_id}/models/gpt-4o/configure \
  -H "Authorization: Bearer $ADMIN_TOKEN"
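
The two calls combine naturally into a discover-then-configure loop. The sketch below only computes the configure URLs for the models you want; it assumes the discovery result can be reduced to a flat list of upstream model names, which is a guess about the response shape, and `configure_urls` and the provider id `prov_123` are hypothetical.

```python
from urllib.parse import quote


def configure_urls(base_url, provider_id, discovered, wanted):
    """Return the configure endpoint for each discovered model that is wanted.

    discovered: list of upstream model names (assumed shape of the discovery result).
    wanted:     set of model names to turn into backends.
    """
    urls = []
    for name in discovered:
        if name in wanted:
            # Percent-encode path segments so unusual names stay inside one segment.
            urls.append(
                f"{base_url}/v1/providers/{quote(provider_id, safe='')}"
                f"/models/{quote(name, safe='')}/configure"
            )
    return urls


if __name__ == "__main__":
    discovered = ["gpt-4o", "gpt-4o-mini", "gpt-4.1"]
    for url in configure_urls(
        "https://scaigrid.scailabs.ai", "prov_123", discovered, {"gpt-4o", "gpt-4.1"}
    ):
        print("POST", url)
```

Each URL would then be called with POST and the admin bearer token, as in the curl example above.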

What's next

Chat Completions — calling models from your app.
Models and Routing Reference — full endpoint list.
Rate Limiting — per-key/user/tenant limits.