Rate Limiting

ScaiVault applies rate limits per identity across several endpoint categories. The limits protect the service from runaway clients and cap the damage a compromised token can do before it is detected.

How it works

Each endpoint belongs to a category, and each (identity, category) pair has its own token bucket in Redis. Every matching request consumes one token, and the bucket refills continuously at the category's rate. A request that arrives when the bucket is empty gets 429 rate_limited with a Retry-After header.

A token bucket lets you burst in the short term (up to the bucket size) while sustaining the nominal rate in the long term. This suits real-world traffic, where a service loads N secrets in a burst at startup and then settles into a steady state.
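The behaviour described above can be sketched as a small in-memory token bucket. This is an illustration only, not ScaiVault's Redis-backed implementation: the class name, the constructor parameters, and the injectable clock are all assumptions made for the example.

```python
import time


class TokenBucket:
    """Illustrative bucket: `burst` tokens deep, refilled at `rate` tokens per `window` seconds."""

    def __init__(self, rate, burst, window=60.0, clock=time.monotonic):
        self.refill_per_sec = rate / window
        self.capacity = float(burst)
        self.tokens = self.capacity  # start full so an initial burst succeeds
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        # Continuous refill: add tokens for the time elapsed, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now

    def try_consume(self):
        """Return (allowed, retry_after_seconds)."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one whole token has accumulated.
        return False, (1.0 - self.tokens) / self.refill_per_sec
```

With `rate=60` and `burst=5`, five requests in a row succeed immediately, the sixth is rejected, and one more request becomes possible after about a second.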

Defaults

These are the defaults on the managed service; self-hosted deployments can override them.

| Category | Limit | Window |
| --- | --- | --- |
| Read secrets | 1000 | per minute |
| Write secrets | 100 | per minute |
| Delete secrets | 100 | per minute |
| List operations | 100 | per minute |
| Batch read | 100 | per minute |
| Batch metadata | 100 | per minute |
| Audit queries | 50 | per minute |
| Audit export | 5 | per minute |
| Policy operations | 50 | per minute |
| Policy test | 200 | per minute |
| PKI issue / sign | 50 | per minute |
| PKI admin (CA, roles) | 20 | per minute |
| ACME order | 30 | per minute |
| Webhook operations | 20 | per minute |
| Subscription operations | 20 | per minute |
| Subscription poll | 240 | per minute (one long-poll = one unit) |
| Federation operations | 20 | per minute |
| Dynamic credential generation | 100 | per minute |
| Dynamic engine/role admin | 20 | per minute |
| Lease operations (renew, revoke) | 100 | per minute |
| Identity read | 100 | per minute |
| Identity sync | 5 | per minute |
| Auth whoami / introspect | 300 | per minute |

Limits are per identity, not per IP. A service account and a user calling from the same IP address have separate buckets.

Rate limit headers

Every response includes:

```text
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 842
X-RateLimit-Reset: 1714478400
```

X-RateLimit-Reset is the Unix timestamp at which the bucket will be full again. Use it to pace requests.
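One possible way to pace requests from these headers is to spread the remaining budget evenly over the time left until the reset. The helper below is a hypothetical sketch, not part of any SDK. It assumes the headers arrive as a plain dict of strings and that X-RateLimit-Reset is expressed in seconds, as in the example above.

```python
import time


def pacing_delay(headers, now=None):
    """Seconds to wait between requests so the remaining budget lasts until the reset time."""
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left  # nothing left: wait for the reset
    return seconds_left / remaining
```

This heuristic is deliberately conservative: because the bucket refills continuously, a client that paces itself this way should rarely see a 429.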

On 429:

```http
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714478445
```

Body:

```json
{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry after 45 seconds.",
    "details": {
      "limit": 1000,
      "window": "1m",
      "retry_after": 45,
      "category": "secrets:read"
    }
  }
}
```

Sleep for Retry-After seconds, then retry. All official SDKs do this automatically.
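For clients that do not use an official SDK, the retry behaviour can be sketched as follows. This is a minimal illustration, not the SDKs' actual code: `send` is a hypothetical callable that performs one request and returns `(status, headers, body)`, and `max_attempts` and the one-second fallback are assumptions.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it returns a non-429 status or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts:
            break
        # Retry-After is given in seconds in the example above;
        # fall back to 1 second if the header is missing (an assumption).
        sleep(float(headers.get("Retry-After", 1)))
    # Still rate limited after max_attempts: hand the last 429 back to the caller.
    return status, headers, body
```

Capping the number of attempts keeps a persistently throttled client from blocking forever; the caller sees the final 429 and can decide what to do.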

Quotas vs rate limits

Rate limits smooth bursts. Quotas cap longer-term consumption; exceeding one returns 429 quota_exceeded instead of 429 rate_limited. Quotas are per tenant and per month by default.

| Resource | Default monthly quota |
| --- | --- |
| Secret reads | Unlimited |
| Secret writes | 1,000,000 |
| Certificates issued | 10,000 |
| ACME orders | 1,000 |
| Dynamic leases | 100,000 |
| Audit log entries | Unlimited |
| Audit exports | 100 |

To monitor quota usage, call GET /v1/tenant/quota (admin only).
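Because both conditions use status 429, a client has to inspect the error code before deciding whether a retry makes sense: waiting a few seconds helps with rate_limited but not with a monthly quota. The helper below is a hypothetical sketch. It assumes that a quota_exceeded response uses the same `error.code` envelope as the rate_limited body shown earlier, which this page does not confirm.

```python
import json


def should_retry(status, body_text):
    """Return True only for a 429 whose error code is rate_limited."""
    if status != 429:
        return False
    try:
        code = json.loads(body_text)["error"]["code"]
    except (ValueError, KeyError, TypeError):
        return False  # unrecognised body: do not retry blindly
    return code == "rate_limited"
```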

Burst tuning

On self-hosted deployments, configure rate limits per endpoint category through the RATE_LIMITS environment variable, whose value is a YAML document:

```yaml
secrets:read:
  rate: 2000
  window: 1m
  burst: 500
secrets:write:
  rate: 200
  window: 1m
```

burst is the bucket depth: the maximum number of requests that can be served immediately. It defaults to rate * 0.5.
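To see what a given configuration means in practice, the arithmetic can be worked through in a few lines. This sketch mirrors the YAML example as a Python dict. The function name and the output fields are made up for illustration, and only the `1m` window shown on this page is handled.

```python
def effective_limits(config):
    """Fill in the default burst (rate * 0.5) and derive refill figures per category."""
    window_seconds = {"1m": 60}  # only the window value that appears in this document
    result = {}
    for category, cfg in config.items():
        rate = cfg["rate"]
        burst = cfg.get("burst", rate * 0.5)
        refill_per_sec = rate / window_seconds[cfg["window"]]
        result[category] = {
            "burst": burst,
            "refill_per_sec": refill_per_sec,
            # Time for an empty bucket to fill back up to its full depth.
            "seconds_to_full": burst / refill_per_sec,
        }
    return result


config = {
    "secrets:read": {"rate": 2000, "window": "1m", "burst": 500},
    "secrets:write": {"rate": 200, "window": "1m"},
}
limits = effective_limits(config)
```

With these values, secrets:read allows 500 immediate requests and refills an empty bucket in about 15 seconds. secrets:write has no explicit burst, so it falls back to a depth of 100, which takes about 30 seconds to refill.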

Avoiding limits in practice

  • Cache reads. A 1-minute in-process cache cuts most callers' read traffic by 10-100x. Invalidate on rotation events instead of polling.
  • Batch where you can. One POST /v1/secrets/batch with 50 paths is 50 reads against the individual limit, but only 1 against the batch-read limit (which is also softer).
  • Use subscriptions/webhooks. Instead of polling for "has this changed yet?", subscribe and react to the event.
  • Don't list from the hot path. GET /v1/secrets?prefix=... is a heavy operation — pre-compute lists in background jobs when possible.
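The caching advice in the first bullet can be sketched as a small in-process cache with a time-to-live and explicit invalidation. Everything here is an assumption made for illustration: `fetch` stands in for whatever function performs the real secret read, and `invalidate` is meant to be called from your own rotation-event handler.

```python
import time


class SecretCache:
    """In-process cache with a per-entry TTL and explicit invalidation."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch    # hypothetical callable: path -> secret value
        self.ttl = ttl
        self.clock = clock
        self.entries = {}     # path -> (expires_at, value)

    def get(self, path):
        now = self.clock()
        entry = self.entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]   # fresh hit: no request, no token consumed
        value = self.fetch(path)  # miss or expired: one real read
        self.entries[path] = (now + self.ttl, value)
        return value

    def invalidate(self, path):
        """Drop an entry so the next get() refetches, e.g. after a rotation event."""
        self.entries.pop(path, None)
```

A caller that reads the same secret many times per minute then spends one token per TTL period instead of one per read.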

Multi-identity patterns

A service that needs a lot of reads should not share one token across its instances. Give each instance its own token, so that each token has its own bucket. ScaiKey makes this easy: mint a client_credentials token per pod or worker and reclaim it on shutdown.

A platform that serves many customers should not funnel all customer reads through one service account, because every customer then competes for that service account's limit. Give each customer tenant its own service account (usually automatic in ScaiKey) and let each customer consume its own quota.

Updated 2026-05-17 13:26:49, rev 2