Accounting and Budgets

ScaiGrid counts every completion. Usage rolls up the tenancy hierarchy. Budgets enforce at the gateway, before upstream spend.

What gets tracked

For every inference call (streaming or not), ScaiGrid records:

Frontend model — what the caller asked for
Backend model — where the request actually went
Prompt tokens — input count
Completion tokens — output count
Latency — end-to-end milliseconds
Cost — computed from per-model pricing (input + output rates per million tokens)
User, tenant, partner — the full scope chain
Request ID — for trace-back

These flow through a two-stage pipeline. Redis counters are incremented immediately on the request path, and a background worker flushes them to MariaDB every 30 seconds for durable, queryable storage. Usage is visible in near real time; the durable records trail by a small commit delay of up to one flush interval.
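The two stages can be illustrated with a minimal in-memory sketch. It is not ScaiGrid's implementation: the `UsagePipeline` class and its method names are hypothetical, a `Counter` stands in for the Redis counters, and a second `Counter` stands in for the MariaDB table.

```python
from collections import Counter


class UsagePipeline:
    """Hypothetical stand-in: `hot` plays the Redis counters, `durable` the MariaDB table."""

    def __init__(self):
        self.hot = Counter()      # incremented on every request
        self.durable = Counter()  # only changed by flush()

    def record(self, tenant_id, prompt_tokens, completion_tokens):
        # Stage 1: cheap increments on the request path.
        self.hot[(tenant_id, "prompt_tokens")] += prompt_tokens
        self.hot[(tenant_id, "completion_tokens")] += completion_tokens
        self.hot[(tenant_id, "requests")] += 1

    def flush(self):
        # Stage 2: a background worker would call this on a timer
        # (every 30 seconds according to the text above).
        pending, self.hot = self.hot, Counter()
        self.durable.update(pending)  # Counter.update adds to existing counts
        return len(pending)


pipeline = UsagePipeline()
pipeline.record("tenant_acme", 1200, 400)
pipeline.record("tenant_acme", 800, 100)

before_flush = pipeline.durable[("tenant_acme", "prompt_tokens")]
flushed_keys = pipeline.flush()
after_flush = pipeline.durable[("tenant_acme", "prompt_tokens")]
print(before_flush, after_flush)
```

Until `flush()` runs, the durable store knows nothing about the two requests; that gap is the commit delay described above.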

Usage queries

The /v1/accounting/usage endpoint supports slicing by any combination of scope, model, user, and time window:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage?period=day&limit=100" \
  -H "Authorization: Bearer $TOKEN"

The summary form aggregates usage by the group_by key:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage/summary?period=month&group_by=model" \
  -H "Authorization: Bearer $TOKEN"

Returns:

json
{
  "status": "ok",
  "data": [
    {
      "group_key": "scailabs/poolnoodle-omni",
      "request_count": 12450,
      "input_tokens": 4823991,
      "output_tokens": 1287332,
      "total_tokens": 6111323,
      "total_cost": "62.47",
      "backend_cost": "61.93"
    },
    ...
  ]
}

total_cost is what the caller paid you (frontend pricing). backend_cost is what you paid the upstream provider. The difference is your margin.
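As a small illustration, the margin for one summary row can be computed client-side. The helper name `margin` is hypothetical; the field names and values are taken from the example response above. The costs arrive as strings, so `Decimal` keeps the subtraction exact.

```python
from decimal import Decimal


def margin(row):
    """Margin for one summary row: frontend revenue minus backend cost."""
    return Decimal(row["total_cost"]) - Decimal(row["backend_cost"])


row = {
    "group_key": "scailabs/poolnoodle-omni",
    "total_cost": "62.47",
    "backend_cost": "61.93",
}
row_margin = margin(row)
print(row_margin)  # 0.54
```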

Permission requirements:

accounting:view_own — your own usage only
accounting:view_tenant — full tenant usage
accounting:view_partner — full partner usage across tenants

Pricing models

Pricing lives on frontend models as input_price_per_mtok and output_price_per_mtok. Both are decimals and currency-agnostic; the currency is set via the currency_code setting. Backends have their own cost_input_per_mtok and cost_output_per_mtok for internal cost tracking.

bash
curl -X PUT https://scaigrid.scailabs.ai/v1/models/{model_id} \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_price_per_mtok": "2.50",
    "output_price_per_mtok": "10.00"
  }'

A prompt of 1,200 tokens and a completion of 400 tokens on this model costs:

scdoc
(1200 / 1_000_000) × 2.50  +  (400 / 1_000_000) × 10.00
=  0.003  +  0.004
=  0.007

The cost is recorded per request as a decimal, without rounding.
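The same arithmetic can be reproduced with a short sketch. The function `request_cost` is a hypothetical helper, not part of the ScaiGrid API; it uses `Decimal` so that, as in the recorded value, no binary floating-point rounding creeps in.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)  # prices are quoted per million tokens


def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_mtok, output_price_per_mtok):
    """Cost of one request from token counts and per-million-token prices."""
    input_cost = Decimal(prompt_tokens) / MTOK * Decimal(input_price_per_mtok)
    output_cost = Decimal(completion_tokens) / MTOK * Decimal(output_price_per_mtok)
    return input_cost + output_cost


# The worked example: 1,200 prompt tokens and 400 completion tokens.
cost = request_cost(1200, 400, "2.50", "10.00")
assert cost == Decimal("0.007")
```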

Budgets

Budgets cap spend, token count, or request count across a scope. When a budget with hard_action set to block is exhausted, new requests are rejected with BUDGET_EXCEEDED (HTTP 429); requests already in flight complete.

bash
curl -X POST https://scaigrid.scailabs.ai/v1/accounting/budgets \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "tenant",
    "scope_id": "tenant_acme",
    "period": "monthly",
    "cost_limit": "500.00",
    "soft_limit_pct": 0.8,
    "hard_action": "block"
  }'

scope — partner, tenant, user, or group.
period — daily, weekly, monthly, total (lifetime).
cost_limit — max spend in the period (decimal).
token_limit — or limit by tokens instead.
request_limit — or limit by raw request count.
soft_limit_pct — at this fraction of the hard limit, trigger webhooks / alerts (no blocking yet).
hard_action — block (return 429), notify (only warn), throttle (reduce rate limits).

Budgets can stack: a user-level budget under a tenant-level budget under a partner-level budget all enforce simultaneously, and the most restrictive one wins.
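One way to picture "most restrictive wins" is a check that walks every budget in the scope chain and rejects if any of them would be exceeded. This is a sketch under assumptions: the `Budget` dataclass, the helper names, and the rule that a request is blocked when `spent + estimated_cost` exceeds the limit are all illustrative, not the gateway's actual logic.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Budget:
    scope: str
    cost_limit: Decimal
    spent: Decimal

    @property
    def remaining(self):
        return self.cost_limit - self.spent


def blocking_budget(budgets, estimated_cost):
    """Return the first budget in the chain the request would exceed, or None."""
    for budget in budgets:
        if budget.spent + estimated_cost > budget.cost_limit:
            return budget
    return None


def headroom(budgets):
    """Spend still available once every stacked budget is taken into account."""
    return min(budget.remaining for budget in budgets)


chain = [
    Budget("partner", Decimal("5000.00"), Decimal("1200.00")),
    Budget("tenant", Decimal("500.00"), Decimal("499.50")),
    Budget("user", Decimal("50.00"), Decimal("10.00")),
]
blocked = blocking_budget(chain, Decimal("1.00"))
print(blocked.scope, headroom(chain))
```

Here the partner and user budgets have plenty of room, but the tenant budget has only 0.50 left, so a request estimated at 1.00 is blocked at the tenant level.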

Soft limits and alerts

When usage crosses soft_limit_pct, ScaiGrid fires a budget.soft_limit_reached event on the event bus. Subscribe via a webhook:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/webhooks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-ops.example/scaigrid-alerts",
    "events": ["budget.soft_limit_reached", "budget.hard_limit_reached"],
    "secret": "whsec_..."
  }'

Your operations team gets Slack'd before the hard block kicks in.
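The relationship between the soft and hard thresholds can be sketched as a small classifier. The function name `budget_state` and the use of `>=` at each boundary are assumptions made for illustration; the returned strings reuse the event suffixes from the webhook example above.

```python
from decimal import Decimal


def budget_state(spent, cost_limit, soft_limit_pct):
    """Classify current spend against a budget's soft and hard thresholds."""
    spent = Decimal(spent)
    cost_limit = Decimal(cost_limit)
    # soft_limit_pct is a fraction of the hard limit, e.g. 0.8 means 80%.
    soft_threshold = cost_limit * Decimal(str(soft_limit_pct))
    if spent >= cost_limit:
        return "hard_limit_reached"
    if spent >= soft_threshold:
        return "soft_limit_reached"
    return "ok"


# With cost_limit 500.00 and soft_limit_pct 0.8, the soft threshold is 400.00.
states = [budget_state(s, "500.00", 0.8) for s in ("399.99", "400.00", "500.00")]
print(states)
```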

Accounting modes

ScaiGrid supports two failure modes for the Redis counter pipeline:

reject (default, safer) — if Redis is unreachable when checking budget, reject the request. No free inference during Redis outages.
allow (available, looser) — if Redis is unreachable, allow the request. Useful if you value availability over exact cost enforcement.

Set the mode via the ACCOUNTING_REDIS_FAILURE_MODE environment variable.
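The difference between the two modes comes down to what happens when the budget check itself cannot run. The sketch below is illustrative only: `admit`, `CounterStoreUnavailable`, and the callable-based budget check are hypothetical names, and only the environment variable and its two values come from the text above.

```python
import os


class CounterStoreUnavailable(Exception):
    """Raised by the hypothetical budget check when the counter store is unreachable."""


def admit(check_budget, failure_mode=None):
    """Return True if the request may proceed, False if it must be rejected."""
    mode = failure_mode or os.environ.get("ACCOUNTING_REDIS_FAILURE_MODE", "reject")
    try:
        return check_budget()
    except CounterStoreUnavailable:
        # "reject" fails closed (no unmetered inference);
        # "allow" fails open (availability over exact enforcement).
        return mode == "allow"


def redis_down():
    raise CounterStoreUnavailable("counter store unreachable")


print(admit(redis_down, "reject"), admit(redis_down, "allow"))
```

When the store is reachable, the mode has no effect: the outcome is whatever the budget check returns.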

Exporting usage

For external billing, export raw usage records:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/export?start=2026-04-01&end=2026-04-30&format=csv" \
  -H "Authorization: Bearer $TOKEN" \
  -o usage-april.csv

Formats: csv, json, ndjson. Useful for feeding into QuickBooks, Stripe metered billing, or your own data warehouse.

Streaming reservations

Streaming completions don't know their final token count until they finish. To avoid over-committing budget, ScaiGrid reserves tokens up-front based on the request's max_tokens, then settles to the actual count when the stream completes. If the reservation exceeds budget, the stream is rejected before it starts.

This is transparent to callers: streaming requests need no special handling.
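The reserve-then-settle idea can be sketched as follows. This is a simplified model, not the gateway's code: the `TokenBudget` class and its methods are hypothetical, prompt tokens are ignored, and a single token budget stands in for the full scope chain.

```python
class TokenBudget:
    """Hypothetical token budget with reservations for in-flight streams."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.used = 0      # tokens from settled requests
        self.reserved = 0  # tokens held for streams still running

    def reserve(self, max_tokens):
        # Reject before the stream starts if the worst case would not fit.
        if self.used + self.reserved + max_tokens > self.token_limit:
            return False
        self.reserved += max_tokens
        return True

    def settle(self, max_tokens, actual_tokens):
        # Release the up-front hold and charge only what was generated.
        self.reserved -= max_tokens
        self.used += actual_tokens


budget = TokenBudget(token_limit=1000)
first = budget.reserve(600)    # fits: 600 <= 1000
second = budget.reserve(600)   # rejected: 600 + 600 > 1000
budget.settle(600, 150)        # the first stream produced only 150 tokens
third = budget.reserve(600)    # fits again: 150 + 600 <= 1000
print(first, second, third, budget.used, budget.reserved)
```

Reserving the worst case means a stream can be rejected even though it would have fit in practice; the settle step returns the unused portion as soon as the stream ends.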

What's next

Webhooks — subscribe to budget events.
Rate Limiting — complementary to budgets, protects against bursty abuse.
Accounting Reference — full endpoint list.