Accounting and Budgets

ScaiGrid counts every completion. Usage rolls up the tenancy hierarchy. Budgets are enforced at the gateway, before any upstream spend is incurred.

What gets tracked

For every inference call (streaming or not), ScaiGrid records:

  • Frontend model — what the caller asked for
  • Backend model — where the request actually went
  • Prompt tokens — input count
  • Completion tokens — output count
  • Latency — end-to-end milliseconds
  • Cost — computed from per-model pricing (input + output rates per million tokens)
  • User, tenant, partner — the full scope chain
  • Request ID — for trace-back

These flow through a two-stage pipeline: Redis counters are incremented immediately (low latency), and a background worker flushes them to MariaDB every 30 seconds (durable, queryable). Usage is visible in near real time, while the durable record can lag by up to one flush interval.
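
The two-stage pipeline can be pictured with a small in-memory model. This is only a sketch under stated assumptions: a `Counter` stands in for Redis, a list stands in for MariaDB, and the names `UsagePipeline`, `record`, and `flush` are hypothetical rather than part of ScaiGrid.

```python
from collections import Counter


class UsagePipeline:
    """Toy model of a fast counter stage plus a periodic durable flush."""

    def __init__(self):
        self.counters = Counter()   # stand-in for Redis counters (assumption)
        self.durable_rows = []      # stand-in for MariaDB rows (assumption)

    def record(self, tenant, prompt_tokens, completion_tokens):
        # Stage 1: increment counters immediately on every call.
        self.counters[(tenant, "prompt_tokens")] += prompt_tokens
        self.counters[(tenant, "completion_tokens")] += completion_tokens
        self.counters[(tenant, "requests")] += 1

    def flush(self):
        # Stage 2: a background worker would call this on a timer
        # (every 30 seconds in the text above). Pending counts move to
        # durable storage and the fast counters are reset.
        pending = dict(self.counters)
        self.counters.clear()
        for (tenant, metric), value in sorted(pending.items()):
            self.durable_rows.append((tenant, metric, value))
        return len(pending)


pipeline = UsagePipeline()
pipeline.record("tenant_acme", prompt_tokens=1200, completion_tokens=400)
pipeline.record("tenant_acme", prompt_tokens=300, completion_tokens=100)
flushed = pipeline.flush()
```

Between flushes, the counters hold the freshest numbers; after a flush, the same totals are available from the durable store.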

Usage queries

The /v1/accounting/usage endpoint supports slicing by any combination of scope, model, user, and time window:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage?period=day&limit=100" \
  -H "Authorization: Bearer $TOKEN"

Summary form aggregates:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/usage/summary?period=month&group_by=model" \
  -H "Authorization: Bearer $TOKEN"

Returns:

json
{
  "status": "ok",
  "data": [
    {
      "group_key": "scailabs/poolnoodle-omni",
      "request_count": 12450,
      "input_tokens": 4823991,
      "output_tokens": 1287332,
      "total_tokens": 6111323,
      "total_cost": "62.47",
      "backend_cost": "61.93"
    },
    ...
  ]
}

total_cost is what the caller paid you (frontend pricing). backend_cost is what you paid the upstream provider. The difference is your margin.
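
As a rough illustration of the margin calculation, the sketch below takes one row shaped like the summary response above and subtracts `backend_cost` from `total_cost`. The helper name `margin` is hypothetical; the costs are parsed as `Decimal` because the API returns them as decimal strings.

```python
from decimal import Decimal

# One row shaped like the summary response above.
row = {
    "group_key": "scailabs/poolnoodle-omni",
    "total_cost": "62.47",
    "backend_cost": "61.93",
}


def margin(summary_row):
    """Return (absolute margin, margin as a fraction of total_cost)."""
    total = Decimal(summary_row["total_cost"])
    backend = Decimal(summary_row["backend_cost"])
    absolute = total - backend
    # Guard against division by zero for groups with no billed usage.
    fraction = absolute / total if total else Decimal("0")
    return absolute, fraction


absolute, fraction = margin(row)
print(absolute)  # 0.54
```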

Permission requirements:

  • accounting:view_own — your own usage only
  • accounting:view_tenant — full tenant usage
  • accounting:view_partner — full partner usage across tenants

Pricing models

Pricing lives on frontend models as input_price_per_mtok and output_price_per_mtok. Both are decimals and currency-agnostic; the currency is set via the currency_code setting. Backends have their own cost_input_per_mtok and cost_output_per_mtok for internal cost tracking.

bash
curl -X PUT https://scaigrid.scailabs.ai/v1/models/{model_id} \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "input_price_per_mtok": "2.50",
    "output_price_per_mtok": "10.00"
  }'

A prompt of 1,200 tokens and completion of 400 tokens on this model costs:

scdoc
(1200 / 1_000_000) × 2.50  +  (400 / 1_000_000) × 10.00
=  0.003  +  0.004
=  0.007

The cost is recorded per request as an exact decimal, with no rounding.
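
The arithmetic above can be reproduced with Python's `decimal` module, which avoids binary floating-point error. This is a minimal sketch; the function name `request_cost` is hypothetical and only mirrors the per-million-token formula shown in this section.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)  # prices are quoted per million tokens


def request_cost(prompt_tokens, completion_tokens,
                 input_price_per_mtok, output_price_per_mtok):
    """Cost of one request: input and output tokens priced separately."""
    input_cost = Decimal(prompt_tokens) / MTOK * Decimal(input_price_per_mtok)
    output_cost = Decimal(completion_tokens) / MTOK * Decimal(output_price_per_mtok)
    return input_cost + output_cost


# Same figures as the worked example: 1,200 prompt and 400 completion tokens.
cost = request_cost(1200, 400, "2.50", "10.00")
assert cost == Decimal("0.007")
```

Prices are passed as strings so that `Decimal` sees the exact value rather than a float approximation.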

Budgets

Budgets cap spend or token count across a scope. Hitting a budget blocks new requests with BUDGET_EXCEEDED (HTTP 429) — existing in-flight requests complete.

bash
curl -X POST https://scaigrid.scailabs.ai/v1/accounting/budgets \
  -H "Authorization: Bearer $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "scope": "tenant",
    "scope_id": "tenant_acme",
    "period": "monthly",
    "cost_limit": "500.00",
    "soft_limit_pct": 0.8,
    "hard_action": "block"
  }'

  • scope: partner, tenant, user, or group.
  • period: daily, weekly, monthly, or total (lifetime).
  • cost_limit: maximum spend in the period (decimal).
  • token_limit: limit by token count instead.
  • request_limit: limit by raw request count instead.
  • soft_limit_pct: at this fraction of the hard limit, trigger webhooks and alerts (no blocking yet).
  • hard_action: block (return 429), notify (only warn), or throttle (reduce rate limits).

Budgets can stack: a user-level budget can sit under a tenant-level budget, which in turn sits under a partner-level budget, and all three are enforced simultaneously. The most restrictive one wins.
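
One way to picture "most restrictive wins" is to check every budget in the scope chain and block as soon as any one of them has no room left. The sketch below is illustrative only: the `Budget` dataclass, the `check_stacked` helper, and the choice to compare spend plus an estimated request cost against the limit are assumptions, not ScaiGrid's documented behaviour.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Budget:
    scope: str            # "partner", "tenant", "user", or "group"
    cost_limit: Decimal   # hard limit for the period
    spent: Decimal        # spend recorded so far in the period


def check_stacked(budgets, estimated_cost):
    """Allow a request only if every budget in the chain has room.

    Returns (allowed, blocking_scope). Any single exhausted budget is
    enough to block; the first one found is reported.
    """
    for budget in budgets:
        if budget.spent + estimated_cost > budget.cost_limit:
            return False, budget.scope
    return True, None


chain = [
    Budget("partner", Decimal("5000.00"), Decimal("1200.00")),
    Budget("tenant", Decimal("500.00"), Decimal("499.99")),
    Budget("user", Decimal("50.00"), Decimal("10.00")),
]
allowed, blocked_by = check_stacked(chain, Decimal("0.02"))
```

Here the partner and user budgets have plenty of headroom, but the tenant budget would be exceeded, so the request is blocked at the tenant scope.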

Soft limits and alerts

When usage crosses soft_limit_pct, ScaiGrid fires a budget.soft_limit_reached event on the event bus. Subscribe via a webhook:

bash
curl -X POST https://scaigrid.scailabs.ai/v1/webhooks \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://your-ops.example/scaigrid-alerts",
    "events": ["budget.soft_limit_reached", "budget.hard_limit_reached"],
    "secret": "whsec_..."
  }'

Your operations team gets Slack'd before the hard block kicks in.
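
To make the thresholds concrete: with cost_limit "500.00" and soft_limit_pct 0.8, the soft limit sits at 400.00. The sketch below classifies a spend figure against both thresholds. The function name `budget_state` is hypothetical, and treating "reached" as greater-than-or-equal is an assumption.

```python
from decimal import Decimal


def budget_state(spent, cost_limit, soft_limit_pct):
    """Classify spend as "ok", "soft_limit_reached", or "hard_limit_reached"."""
    spent = Decimal(spent)
    cost_limit = Decimal(cost_limit)
    # str() keeps 0.8 as exactly 0.8 instead of its binary float expansion.
    soft_threshold = cost_limit * Decimal(str(soft_limit_pct))
    if spent >= cost_limit:
        return "hard_limit_reached"
    if spent >= soft_threshold:
        return "soft_limit_reached"
    return "ok"


state = budget_state("412.50", "500.00", 0.8)
```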

Accounting modes

ScaiGrid supports two failure modes for the Redis counter pipeline:

  • reject (default, safer) — if Redis is unreachable when checking budget, reject the request. No free inference during Redis outages.
  • allow (available, looser) — if Redis is unreachable, allow the request. Useful if you value availability over exact cost enforcement.

Set the mode via the ACCOUNTING_REDIS_FAILURE_MODE environment variable.
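
The difference between the two modes is essentially fail-closed versus fail-open. A minimal sketch, assuming a hypothetical `within_budget` callable that raises a hypothetical `CounterUnavailable` exception when the counter store cannot be reached; only the environment variable name and the two mode values come from the text above.

```python
import os


class CounterUnavailable(Exception):
    """Raised by the (hypothetical) budget lookup when Redis is down."""


def admit(within_budget, failure_mode=None):
    """Decide whether to admit a request.

    `within_budget` is a zero-argument callable returning True or False,
    or raising CounterUnavailable if the counters cannot be read.
    """
    mode = failure_mode or os.environ.get("ACCOUNTING_REDIS_FAILURE_MODE", "reject")
    try:
        return within_budget()
    except CounterUnavailable:
        # "reject" fails closed (no free inference during an outage);
        # "allow" fails open (availability over exact enforcement).
        return mode == "allow"


def redis_down():
    raise CounterUnavailable("counter store unreachable")


rejected = admit(redis_down, "reject")   # False
allowed = admit(redis_down, "allow")     # True
```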

Exporting usage

For external billing, export raw usage records:

bash
curl "https://scaigrid.scailabs.ai/v1/accounting/export?start=2026-04-01&end=2026-04-30&format=csv" \
  -H "Authorization: Bearer $TOKEN" \
  -o usage-april.csv

Formats: csv, json, ndjson. Useful for feeding into QuickBooks, Stripe metered billing, or your own data warehouse.
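
Once exported, the records can be aggregated with a few lines of code before they are handed to a billing system. The sketch below sums cost per tenant from NDJSON text. The export schema is not shown in this section, so the field names `tenant` and `cost` and the sample records are made up for illustration; check a real export for the actual column names.

```python
import json
from collections import defaultdict
from decimal import Decimal

# Made-up records; the field names "tenant" and "cost" are assumptions.
SAMPLE_NDJSON = """\
{"tenant": "tenant_acme", "cost": "0.007"}
{"tenant": "tenant_acme", "cost": "0.013"}
{"tenant": "tenant_globex", "cost": "0.100"}
"""


def cost_per_tenant(ndjson_text):
    """Sum the cost field per tenant, one JSON object per line."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        totals[record["tenant"]] += Decimal(record["cost"])
    return dict(totals)


totals = cost_per_tenant(SAMPLE_NDJSON)
```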

Streaming reservations

Streaming completions don't know their final token count until they finish. To avoid over-committing budget, ScaiGrid reserves tokens up-front based on the request's max_tokens, then settles to the actual count when the stream completes. If the reservation exceeds budget, the stream is rejected before it starts.

This is transparent — you don't need to do anything special for streaming. It just works.
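
The reserve-then-settle idea can be sketched with a toy token budget. The class `TokenBudget` and its methods are hypothetical, and the sketch reserves only `max_tokens` (it ignores prompt tokens, which the real accounting may also count).

```python
class TokenBudget:
    """Toy token budget with reserve/settle semantics."""

    def __init__(self, token_limit):
        self.token_limit = token_limit
        self.used = 0        # settled tokens
        self.reserved = 0    # tokens held for in-flight streams

    def reserve(self, max_tokens):
        # Reject before the stream starts if the reservation would not fit.
        if self.used + self.reserved + max_tokens > self.token_limit:
            return False
        self.reserved += max_tokens
        return True

    def settle(self, max_tokens, actual_tokens):
        # Release the reservation and charge only what was generated.
        self.reserved -= max_tokens
        self.used += actual_tokens


budget = TokenBudget(token_limit=1000)
first = budget.reserve(800)             # fits within the limit
second = budget.reserve(300)            # 800 reserved + 300 exceeds 1000
budget.settle(800, actual_tokens=250)   # stream finished early
third = budget.reserve(300)             # 250 used + 300 now fits
```

Reserving the worst case up front keeps concurrent streams from jointly overshooting the limit, at the price of occasionally rejecting a stream that would have fit had its actual length been known.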

Updated 2026-05-18 15:01:28, rev 17