---
title: Accounting and Budgets
path: core-concepts/accounting-and-budgets
status: published
---

# Accounting and Budgets

ScaiGrid counts every completion. Usage rolls up through the tenancy hierarchy. Budgets are enforced at the gateway, before any upstream spend.

## What gets tracked

For every inference call (streaming or not), ScaiGrid records:

- **Frontend model** — what the caller asked for
- **Backend model** — where the request actually went
- **Prompt tokens** — input count
- **Completion tokens** — output count
- **Latency** — end-to-end, in milliseconds
- **Cost** — computed from per-model pricing (input + output rates per million tokens)
- **User, tenant, partner** — the full scope chain
- **Request ID** — for trace-back

These records flow through a two-stage pipeline: Redis counters are incremented immediately (fast, low-latency), and a background worker flushes to MariaDB every 30 seconds (durable, queryable). Usage is visible in near real time, while the durable copy can lag by up to one flush interval.
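A minimal sketch of the two-stage flow, assuming an in-memory dict as a stand-in for the Redis counters and a list as a stand-in for the MariaDB table. The class, field, and counter-key names are hypothetical and do not describe ScaiGrid's internals; only the 30-second interval comes from the text above.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class UsageRecord:
    # Hypothetical field names mirroring the tracked attributes listed above.
    request_id: str
    frontend_model: str
    backend_model: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: int
    cost: Decimal
    user: str
    tenant: str
    partner: str


class TwoStageAccounting:
    """Stage 1: hot counters (Redis stand-in). Stage 2: periodic flush (MariaDB stand-in)."""

    def __init__(self, flush_interval_s: float = 30.0):
        self.flush_interval_s = flush_interval_s
        self.counters = defaultdict(int)  # running totals per scope, updated per request
        self.pending = []                 # records waiting for the next flush
        self.durable = []                 # rows "written" to the database
        self.last_flush = time.monotonic()

    def record(self, rec: UsageRecord) -> None:
        # Stage 1: increment counters for every scope in the chain right away.
        tokens = rec.prompt_tokens + rec.completion_tokens
        for scope in (f"user:{rec.user}", f"tenant:{rec.tenant}", f"partner:{rec.partner}"):
            self.counters[f"{scope}:tokens"] += tokens
            self.counters[f"{scope}:requests"] += 1
        self.pending.append(rec)

    def maybe_flush(self, now: Optional[float] = None) -> int:
        # Stage 2: a background worker would call this on a timer.
        now = time.monotonic() if now is None else now
        if now - self.last_flush < self.flush_interval_s:
            return 0
        flushed = len(self.pending)
        self.durable.extend(self.pending)
        self.pending.clear()
        self.last_flush = now
        return flushed
```

Counters reflect a request the moment it is recorded, while `durable` only catches up when a flush runs.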

## Usage queries

The `/v1/accounting/usage` endpoint supports slicing by any combination of scope, model, user, and time window:

```bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage?period=day&limit=100" \
  -H "Authorization: Bearer $TOKEN"
```

For aggregates, use the summary endpoint. This example groups a month of usage by model:

```bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage/summary?period=month&group_by=model" \
  -H "Authorization: Bearer $TOKEN"
```

Returns:

```json
{
  "status": "ok",
  "data": [
    {
      "group_key": "scailabs/poolnoodle-omni",
      "request_count": 12450,
      "input_tokens": 4823991,
      "output_tokens": 1287332,
      "total_tokens": 6111323,
      "total_cost": "62.47",
      "backend_cost": "61.93"
    },
    ...
  ]
}
```

`total_cost` is what the caller was charged (frontend pricing). `backend_cost` is what you paid the upstream provider. The difference is your margin.
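Because costs come back as decimal strings, margin per group is an exact decimal subtraction. A minimal Python sketch using only the standard library; `fetch_summary` and `margins` are hypothetical helpers, the request reuses the URL and query parameters from the curl example above, and the bearer token is assumed to be in a `TOKEN` environment variable.

```python
import json
import os
import urllib.parse
import urllib.request
from decimal import Decimal

BASE_URL = "https://scaigrid.scailabs.ai/v1/accounting/usage/summary"


def fetch_summary(period: str = "month", group_by: str = "model") -> dict:
    # Hypothetical helper: same request as the curl example above.
    query = urllib.parse.urlencode({"period": period, "group_by": group_by})
    req = urllib.request.Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {os.environ['TOKEN']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def margins(summary: dict) -> dict:
    """Map each group_key to total_cost - backend_cost, kept as exact Decimals."""
    return {
        row["group_key"]: Decimal(row["total_cost"]) - Decimal(row["backend_cost"])
        for row in summary["data"]
    }


# Offline example using the row from the sample response above.
sample = {"status": "ok", "data": [
    {"group_key": "scailabs/poolnoodle-omni", "total_cost": "62.47", "backend_cost": "61.93"},
]}
print(margins(sample))  # {'scailabs/poolnoodle-omni': Decimal('0.54')}
```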

Permission requirements:
- `accounting:view_own` — your own usage only
- `accounting:view_tenant` — full tenant usage
- `accounting:view_partner` — full partner usage across tenants

## Pricing models

Pricing lives on frontend models as `input_price_per_mtok` and `output_price_per_mtok`. Both are decimals and currency-agnostic; the currency is configured with the `currency_code` setting. Backends carry their own `cost_input_per_mtok` and `cost_output_per_mtok` for internal cost tracking.

```bash
curl -X PUT https://scaigrid.scailabs.ai/v1/models/{model_id} \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_price_per_mtok": "2.50",
    "output_price_per_mtok": "10.00"
  }'
```

A request with a 1,200-token prompt and a 400-token completion on this model costs:

```
(1200 / 1_000_000) × 2.50  +  (400 / 1_000_000) × 10.00
=  0.003  +  0.004
=  0.007
```

Cost is recorded per request as an exact decimal, with no rounding.
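The same arithmetic as a Python sketch, using `decimal.Decimal` so the result stays exact instead of picking up binary floating-point error. The function name is illustrative, not part of ScaiGrid's API.

```python
from decimal import Decimal

TOKENS_PER_MTOK = Decimal(1_000_000)


def request_cost(prompt_tokens: int, completion_tokens: int,
                 input_price_per_mtok: str, output_price_per_mtok: str) -> Decimal:
    # Prices are per million tokens; parse them from strings to keep them exact.
    input_cost = Decimal(prompt_tokens) / TOKENS_PER_MTOK * Decimal(input_price_per_mtok)
    output_cost = Decimal(completion_tokens) / TOKENS_PER_MTOK * Decimal(output_price_per_mtok)
    return input_cost + output_cost


print(request_cost(1200, 400, "2.50", "10.00"))  # 0.007000
```

The trailing zeros come from the scale of the price strings, and `Decimal("0.007000") == Decimal("0.007")`. Binary floats cannot represent values such as 0.0012 exactly, which matters once millions of per-request costs are summed.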

## Budgets

Budgets cap spend, token count, or request count for a scope. When a budget with `hard_action: block` is hit, new requests are rejected with `BUDGET_EXCEEDED` (HTTP 429); requests already in flight still complete.

```bash
curl -X POST https://scaigrid.scailabs.ai/v1/accounting/budgets \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "tenant",
    "scope_id": "tenant_acme",
    "period": "monthly",
    "cost_limit": "500.00",
    "soft_limit_pct": 0.8,
    "hard_action": "block"
  }'
```

- **`scope`** — `partner`, `tenant`, `user`, or `group`; `scope_id` identifies which one.
- **`period`** — `daily`, `weekly`, `monthly`, or `total` (lifetime).
- **`cost_limit`** — maximum spend in the period (decimal).
- **`token_limit`** — alternatively, cap the number of tokens in the period.
- **`request_limit`** — alternatively, cap the raw request count.
- **`soft_limit_pct`** — fraction of the hard limit at which webhooks and alerts fire; nothing is blocked yet.
- **`hard_action`** — `block` (return 429), `notify` (warn only), or `throttle` (reduce rate limits).

Budgets stack. A user-level budget, the tenant-level budget above it, and the partner-level budget above that are all enforced at once, and the most restrictive one wins.
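One way to picture stacking, as a Python sketch: check a request's estimated cost against every budget that applies and keep the strictest outcome. The ranking `block` over `throttle` over `notify`, and comparing projected spend (spent plus estimate) against the limit, are assumptions made for illustration; `Budget` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

# Assumed ranking of hard_action values, from least to most restrictive.
SEVERITY = {"allow": 0, "notify": 1, "throttle": 2, "block": 3}


@dataclass
class Budget:
    scope: str           # "partner", "tenant", "user", or "group"
    cost_limit: Decimal  # maximum spend in the current period
    spent: Decimal       # spend recorded so far in the period
    hard_action: str     # "block", "notify", or "throttle"


def evaluate(budgets: List[Budget], estimated_cost: Decimal) -> str:
    """Return the most restrictive outcome across all applicable budgets."""
    outcome = "allow"
    for b in budgets:
        if b.spent + estimated_cost > b.cost_limit:
            # This budget would be exceeded, so its hard_action applies.
            if SEVERITY[b.hard_action] > SEVERITY[outcome]:
                outcome = b.hard_action
    return outcome


stack = [
    Budget("user", Decimal("20.00"), Decimal("19.99"), "notify"),
    Budget("tenant", Decimal("500.00"), Decimal("120.00"), "block"),
    Budget("partner", Decimal("5000.00"), Decimal("4999.50"), "block"),
]
print(evaluate(stack, Decimal("1.00")))  # block
```

The user budget alone would only notify, but the partner budget would also be exceeded, and its `block` outranks `notify`.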

## Soft limits and alerts

When usage crosses `soft_limit_pct`, ScaiGrid fires a `budget.soft_limit_reached` event on the event bus. Subscribe via a webhook:

```bash
curl -X POST https://scaigrid.scailabs.ai/v1/webhooks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-ops.example/scaigrid-alerts",
    "events": ["budget.soft_limit_reached", "budget.hard_limit_reached"],
    "secret": "whsec_..."
  }'
```

That gives your operations team a warning, in Slack or wherever the webhook lands, before the hard block kicks in.
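The soft-limit event fires when usage crosses `soft_limit_pct`. A minimal Python sketch of an edge-triggered check, assuming the event should fire once, on the update that moves usage past the threshold, rather than on every later request; `crossed_soft_limit` is a hypothetical name.

```python
from decimal import Decimal


def crossed_soft_limit(before: Decimal, after: Decimal,
                       limit: Decimal, soft_limit_pct: Decimal) -> bool:
    """True only for the usage update that moves from below to at-or-above the threshold."""
    threshold = limit * soft_limit_pct
    return before < threshold <= after


limit = Decimal("500.00")
pct = Decimal("0.8")  # threshold is 400.00, matching the budget example above
print(crossed_soft_limit(Decimal("399.00"), Decimal("401.00"), limit, pct))  # True
print(crossed_soft_limit(Decimal("401.00"), Decimal("402.00"), limit, pct))  # False
```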

## Accounting modes

ScaiGrid supports two failure modes for the Redis counter pipeline:

- **`reject`** (default, safer) — if Redis is unreachable during a budget check, reject the request. No free inference during Redis outages.
- **`allow`** (looser) — if Redis is unreachable, allow the request. Useful if you value availability over exact cost enforcement.

Set the mode with the `ACCOUNTING_REDIS_FAILURE_MODE` environment variable.
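A minimal Python sketch of how the two modes could shape a budget check. `read_remaining` stands in for whatever queries the Redis counters and is assumed to raise `ConnectionError` when Redis is unreachable; the function names are hypothetical, and only the environment variable name and the mode values come from the text above.

```python
import os
from typing import Callable, Optional


def check_budget(read_remaining: Callable[[], float], mode: Optional[str] = None) -> bool:
    """Return True if the request may proceed."""
    mode = mode or os.environ.get("ACCOUNTING_REDIS_FAILURE_MODE", "reject")
    if mode not in ("reject", "allow"):
        raise ValueError(f"unknown failure mode: {mode!r}")
    try:
        return read_remaining() > 0
    except ConnectionError:
        # Redis is unreachable: 'allow' favors availability, 'reject' favors enforcement.
        return mode == "allow"


def redis_down() -> float:
    raise ConnectionError("redis unreachable")


print(check_budget(redis_down, mode="reject"))  # False
print(check_budget(redis_down, mode="allow"))   # True
```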

## Exporting usage

For external billing, export raw usage records:

```bash
curl "https://scaigrid.scailabs.ai/v1/accounting/export?start=2026-04-01&end=2026-04-30&format=csv" \
  -H "Authorization: Bearer $TOKEN" \
  -o usage-april.csv
```

Formats: `csv`, `json`, or `ndjson`. Exports can feed QuickBooks, Stripe metered billing, or your own data warehouse.
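Once exported, the records are plain data. A minimal Python sketch that totals cost per tenant from the CSV export; the column names `tenant_id` and `cost` are assumptions, so check them against the header row of a real export. The sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal
from typing import Dict, TextIO


def cost_by_tenant(f: TextIO) -> Dict[str, Decimal]:
    """Sum the (assumed) `cost` column per (assumed) `tenant_id` column."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for row in csv.DictReader(f):
        totals[row["tenant_id"]] += Decimal(row["cost"])  # exact decimal sums
    return dict(totals)


# Made-up rows; for a real export, pass open("usage-april.csv", newline="") instead.
sample = io.StringIO(
    "request_id,tenant_id,cost\n"
    "req_1,tenant_acme,0.007\n"
    "req_2,tenant_acme,0.0125\n"
    "req_3,tenant_globex,0.002\n"
)
print(cost_by_tenant(sample))
# {'tenant_acme': Decimal('0.0195'), 'tenant_globex': Decimal('0.002')}
```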

## Streaming reservations

Streaming completions don't know their final token count until they finish. To avoid over-committing budget, ScaiGrid reserves tokens up-front based on the request's `max_tokens`, then settles to the actual count when the stream completes. If the reservation exceeds the remaining budget, the stream is rejected before it starts.

This is transparent to callers: streaming requests need no special handling.
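To make the mechanism concrete, here is a minimal Python sketch of reserve-then-settle for a single token budget. Holding the prompt tokens in addition to `max_tokens` is an assumption (the text only mentions `max_tokens`), and `TokenBudget` and `BudgetExceeded` are hypothetical names.

```python
class BudgetExceeded(Exception):
    """Raised when a reservation would push usage past the limit."""


class TokenBudget:
    """In-memory sketch of reserve-then-settle accounting for one token budget."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0      # tokens charged for finished requests
        self.reserved = 0  # tokens held for streams still in flight

    def reserve(self, prompt_tokens: int, max_tokens: int) -> int:
        # Hold the worst case up front; reject before the stream starts if it won't fit.
        hold = prompt_tokens + max_tokens
        available = self.limit - self.used - self.reserved
        if hold > available:
            raise BudgetExceeded(f"need {hold} tokens, only {available} left")
        self.reserved += hold
        return hold

    def settle(self, hold: int, actual_tokens: int) -> None:
        # Release the hold and charge only what the stream actually used.
        self.reserved -= hold
        self.used += actual_tokens


budget = TokenBudget(limit=10_000)
hold = budget.reserve(prompt_tokens=1200, max_tokens=4000)  # holds 5,200 tokens
budget.settle(hold, actual_tokens=1200 + 400)               # charges 1,600
print(budget.used, budget.reserved)  # 1600 0
```

Counting in-flight holds against the limit is what keeps several concurrent streams from jointly overshooting the budget.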

## What's next

- [Webhooks](../06-reference/08-webhooks.md) — subscribe to budget events.
- [Rate Limiting](../07-advanced/05-rate-limiting.md) — complementary to budgets, protects against bursty abuse.
- [Accounting Reference](../06-reference/07-accounting.md) — full endpoint list.
