---
title: Errors
path: core-concepts/errors
status: published
---

# Errors

ScaiGrid error responses are structured and stable. Branch on `error.code`, never on the HTTP status alone or on string-matching `error.message`.

## Error envelope

Every error response has the same shape:

```json
{
  "status": "error",
  "error": {
    "code": "BACKEND_RATE_LIMITED",
    "message": "Backend rate limit exceeded — please retry later",
    "retry_after": 30
  },
  "meta": {
    "request_id": "req_abc123"
  }
}
```

- **`status`** — always `"error"` for error responses.
- **`error.code`** — machine-readable code from a stable vocabulary. Branch on this.
- **`error.message`** — human-readable description. Display or log as-is; don't parse.
- **`error.retry_after`** — optional, present on rate-limit and some timeout errors. Seconds to wait.
- **`error.details`** — optional, present on validation errors. Array of `{field, message}` for each problem.
- **`meta.request_id`** — unique request ID. Include in any support ticket.

The HTTP status code in the response line matches the error class (400-level for client errors, 500-level for server/gateway errors), but the `code` field is more specific and should drive your handling.

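As an illustration of the field list above, here is a minimal sketch that turns a decoded response body into a typed object. The names `ErrorEnvelope` and `parse_error` are hypothetical helpers, not part of any ScaiGrid SDK, and the sketch assumes the body has already been decoded from JSON into a `dict`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ErrorEnvelope:
    """Hypothetical container mirroring the documented error envelope."""
    code: str
    message: str
    retry_after: Optional[int] = None          # only on rate-limit and some timeout errors
    details: List[Dict[str, str]] = field(default_factory=list)  # only on validation errors
    request_id: Optional[str] = None


def parse_error(body: Dict[str, Any]) -> Optional[ErrorEnvelope]:
    """Return an ErrorEnvelope if `body` is an error response, else None."""
    if body.get("status") != "error":
        return None
    err = body["error"]
    return ErrorEnvelope(
        code=err["code"],
        message=err["message"],
        retry_after=err.get("retry_after"),
        details=err.get("details") or [],
        request_id=body.get("meta", {}).get("request_id"),
    )
```

Keeping `request_id` on the parsed object makes it easy to include in logs and support tickets alongside the code.
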
## Streaming error frames

For streaming endpoints (SSE), mid-stream errors arrive as a distinct event:

```
event: error
data: {"code": "BACKEND_ERROR", "message": "..."}

data: [DONE]
```

Listen for the `error` event type in your SSE client, not just `data` events. The stream always ends with `data: [DONE]` whether it succeeded or failed.

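The framing above can be handled with a small line-based reader. This is a sketch under stated assumptions: each `data:` line carries one complete JSON payload (true of the example frame, but not guaranteed by SSE in general), and `StreamError` and `iter_sse_data` are hypothetical names introduced here for illustration.

```python
import json
from typing import Any, Dict, Iterable, Iterator


class StreamError(Exception):
    """Raised when the stream delivers an `event: error` frame."""

    def __init__(self, code: str, message: str):
        self.code = code
        self.message = message
        super().__init__(f"{code}: {message}")


def iter_sse_data(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield decoded `data` payloads; raise StreamError on an `error` event."""
    event = "message"  # SSE default event type when no `event:` line precedes the data
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            event = "message"  # a blank line ends the current event
            continue
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return  # terminal sentinel, sent on success and on failure
            payload = json.loads(data)
            if event == "error":
                raise StreamError(payload["code"], payload["message"])
            yield payload
```

Any iterable of text lines works as input, for example the lines produced by `response.iter_lines()` inside an `httpx.stream(...)` context.
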
## Retry classification

| Code | Retry? | Notes |
|------|--------|-------|
| `AUTH_TOKEN_INVALID` | No | Fix credentials |
| `AUTH_TOKEN_MISSING` | No | Add Authorization header |
| `AUTHZ_PERMISSION_DENIED` | No | User lacks permission |
| `VALIDATION_ERROR` | No | Malformed request body |
| `MODEL_NOT_FOUND` | No | Model slug doesn't exist |
| `MODEL_ACCESS_DENIED` | No | Model not enabled for tenant |
| `MODEL_UNAVAILABLE` | Sometimes | All backends are unhealthy — retry after a delay |
| `BACKEND_RATE_LIMITED` | Yes | Honor `retry_after` |
| `BACKEND_TIMEOUT` | Yes | Backoff exponentially |
| `BACKEND_ERROR` | Once | Upstream transient failure |
| `UPSTREAM_SHAPE_MISMATCH` | No | Gateway integration bug — file a support ticket |
| `BUDGET_EXCEEDED` | No | Wait for budget period to roll over |
| `RATE_LIMITED` | Yes | Per-key/user/tenant limit — honor `retry_after` |
| `QUOTA_EXCEEDED` | No | Hard quota reached |
| `SERVICE_DRAINING` | Yes | Gateway is rolling — retry shortly |

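One way to encode the table above is a per-code retry cap plus a delay function. This is a sketch, not a prescribed policy: the cap of 3 for the "Yes" rows and for `MODEL_UNAVAILABLE`, the `base` and `cap` defaults, and the use of jittered backoff are all assumptions, and `MAX_RETRIES` and `retry_delay` are hypothetical names.

```python
import random
from typing import Optional

# Maximum retries per code, transcribed from the table above.
# "Once" maps to 1; the value 3 for the other rows is an assumption.
MAX_RETRIES = {
    "MODEL_UNAVAILABLE": 3,
    "BACKEND_RATE_LIMITED": 3,
    "BACKEND_TIMEOUT": 3,
    "BACKEND_ERROR": 1,
    "RATE_LIMITED": 3,
    "SERVICE_DRAINING": 3,
}


def retry_delay(code: str, retries_so_far: int,
                retry_after: Optional[float] = None,
                base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before the next retry, or None to give up."""
    # Codes missing from MAX_RETRIES are never retried.
    if retries_so_far >= MAX_RETRIES.get(code, 0):
        return None
    # A server-provided hint takes precedence over local backoff.
    if retry_after is not None:
        return float(retry_after)
    # Exponential backoff with full jitter, capped at `cap` seconds.
    return random.uniform(0, min(cap, base * 2 ** retries_so_far))
```

Returning `None` for unknown codes keeps the default conservative: a code that is not listed as retryable is surfaced to the caller immediately.
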
## Canonical error codes

**Authentication / authorization**
- `AUTH_TOKEN_MISSING` (401) — no Authorization header
- `AUTH_TOKEN_INVALID` (401) — token couldn't be validated
- `AUTH_INSUFFICIENT_SCOPE` (403) — token lacks required scope
- `AUTHZ_PERMISSION_DENIED` (403) — user lacks permission
- `SESSION_EXPIRED` (401) — refresh your token

**Validation**
- `VALIDATION_ERROR` (422) — request body didn't match schema. Check `error.details`.
- `SLUG_CONFLICT` (409) — another resource already uses this slug

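For `VALIDATION_ERROR`, the `{field, message}` entries in `error.details` can be grouped per field before being shown in a form or log. A minimal sketch follows; `field_errors` is a hypothetical helper, and it takes the `error` object from the envelope as a plain `dict`.

```python
from typing import Any, Dict, List


def field_errors(error: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group `error.details` entries by field name, preserving their order."""
    grouped: Dict[str, List[str]] = {}
    # `details` is optional, so treat a missing or null value as empty.
    for item in error.get("details") or []:
        grouped.setdefault(item["field"], []).append(item["message"])
    return grouped
```
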
**Models and backends**
- `MODEL_NOT_FOUND` (404) — no frontend model with that slug
- `MODEL_ACCESS_DENIED` (403) — model exists but not enabled for this tenant
- `MODEL_UNAVAILABLE` (503) — all backends for this model are unhealthy or circuit-broken
- `BACKEND_NOT_FOUND` (404) — backend doesn't exist
- `BACKEND_ERROR` (502) — upstream provider returned an error
- `BACKEND_TIMEOUT` (504) — upstream didn't respond in time
- `BACKEND_RATE_LIMITED` (429) — upstream rate-limited us; includes `retry_after`
- `UPSTREAM_SHAPE_MISMATCH` (502) — upstream sent a response our parsers didn't accept

**Rate limits and quotas**
- `RATE_LIMITED` (429) — ScaiGrid's own rate limiter triggered
- `QUOTA_EXCEEDED` (429) — request quota exceeded
- `BUDGET_EXCEEDED` (429) — usage budget exceeded

**Tenant / partner**
- `PARTNER_NOT_FOUND` (404)
- `TENANT_NOT_FOUND` (404)
- `TENANT_SUSPENDED` (403) — tenant is administratively suspended
- `PARTNER_SUSPENDED` (403)

**Modules**
- `MODULE_NOT_FOUND` (404)
- `MODULE_NOT_ENABLED` (403) — module isn't enabled for this tenant
- `MODULE_DEPENDENCY_UNAVAILABLE` (424) — required upstream module isn't available

**Resources**
- `USER_NOT_FOUND` (404)
- `API_KEY_NOT_FOUND` (404)
- `GROUP_NOT_FOUND` (404)
- `ROOM_NOT_FOUND` (404)
- `SESSION_NOT_FOUND` (404)
- `ROUTING_POLICY_NOT_FOUND` (404)
- `BUDGET_NOT_FOUND` (404)
- `WEBHOOK_NOT_FOUND` (404)
- `BATCH_NOT_FOUND` (404)
- `CHECKPOINT_NOT_FOUND` (404)

**Service**
- `SERVICE_DRAINING` (503) — gateway is gracefully shutting down
- `SERVICE_UNAVAILABLE` (503) — a dependency (Redis, MariaDB) is down

Module-specific codes are listed in each module's reference page and follow the convention `{MODULE}_{NAME}` (e.g., `SCAIQUEUE_MESSAGE_NOT_FOUND`).

## Full reference

See [Error Codes Reference](../06-reference/11-error-codes.md) for the complete, exhaustive list.

## Sample error handler

A minimal Python handler that classifies and retries correctly:

```python
import time

import httpx

# Codes marked retryable in the retry classification table.
RETRYABLE = {
    "BACKEND_RATE_LIMITED", "BACKEND_TIMEOUT", "BACKEND_ERROR",
    "MODEL_UNAVAILABLE", "RATE_LIMITED", "SERVICE_DRAINING",
}
# Codes the table says to retry only once.
RETRY_ONCE = {"BACKEND_ERROR"}

class ScaiGridError(Exception):
    def __init__(self, code, message, retry_after=None, request_id=None):
        self.code = code
        self.message = message
        self.retry_after = retry_after
        self.request_id = request_id
        super().__init__(f"{code}: {message} (request_id={request_id})")

def call_with_retry(method, url, *, max_attempts=3, **kwargs):
    for attempt in range(max_attempts):
        resp = httpx.request(method, url, **kwargs)
        body = resp.json()
        if body.get("status") != "error":
            return body["data"]
        err = body["error"]
        # The request ID is read from the envelope's `meta` object.
        rid = body.get("meta", {}).get("request_id")
        last_attempt = attempt == max_attempts - 1
        already_retried = err["code"] in RETRY_ONCE and attempt >= 1
        if err["code"] not in RETRYABLE or last_attempt or already_retried:
            raise ScaiGridError(err["code"], err["message"],
                                err.get("retry_after"), rid)
        # Honor the server's hint when present; otherwise back off exponentially.
        time.sleep(err.get("retry_after") or (2 ** attempt))
    raise RuntimeError("unreachable")
```

## What's next

- [Error Codes Reference](../06-reference/11-error-codes.md) — complete code list.
- [Your First Integration](../02-getting-started/03-your-first-integration.md) — uses this pattern end-to-end.
