Rate Limiting

ScaiVault applies rate limits per identity across several endpoint categories. The limits protect the service from runaway clients and cap how much damage a compromised token can do before it is detected.

How it works

Each endpoint belongs to a category, and each (identity, category) pair has its own token bucket in Redis. Every matching request consumes one token, and the bucket refills continuously at the category's rate. A request that finds the bucket empty is rejected with 429 rate_limited and a Retry-After header.

A token bucket lets a client burst in the short term (up to the bucket size) while sustaining only the nominal rate in the long term. This suits real-world traffic, where a service loads N secrets in a burst at startup and then settles into a steady state.
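
The behaviour described above can be sketched as a small in-memory token bucket. This is only an illustration, not ScaiVault's implementation: the real buckets live in Redis, and the class name, the injectable clock, and the choice to start with a full bucket are assumptions.

```python
import time


class TokenBucket:
    """Illustrative in-memory token bucket (hypothetical, not ScaiVault's code)."""

    def __init__(self, rate_per_minute, burst, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)
        self.tokens = float(burst)  # assumption: a new bucket starts full
        self.clock = clock
        self.last = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Continuous refill, capped at the bucket depth.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)

    def try_consume(self):
        """Return (allowed, retry_after_seconds)."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0.0
        # Time until one whole token is available again.
        return False, (1.0 - self.tokens) / self.refill_per_second
```

With `rate_per_minute=60` and `burst=5`, five back-to-back calls succeed, the sixth is rejected with a wait of about one second, and one more call succeeds once that second has passed.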

Defaults

These are the defaults on the managed service; self-hosted deployments can override them.

Category	Limit	Window
Read secrets	1000	per minute
Write secrets	100	per minute
Delete secrets	100	per minute
List operations	100	per minute
Batch read	100	per minute
Batch metadata	100	per minute
Audit queries	50	per minute
Audit export	5	per minute
Policy operations	50	per minute
Policy test	200	per minute
PKI issue / sign	50	per minute
PKI admin (CA, roles)	20	per minute
ACME order	30	per minute
Webhook operations	20	per minute
Subscription operations	20	per minute
Subscription poll	240	per minute (one long-poll = one unit)
Federation operations	20	per minute
Dynamic credential generation	100	per minute
Dynamic engine/role admin	20	per minute
Lease operations (renew, revoke)	100	per minute
Identity read	100	per minute
Identity sync	5	per minute
Auth whoami / introspect	300	per minute

Limits are per identity, not per IP. A service account and a user behind the same IP address have separate buckets.

Rate limit headers

Every response includes:

text
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 842
X-RateLimit-Reset: 1714478400

X-RateLimit-Reset is the Unix timestamp at which the bucket will be fully refilled. Use it to pace requests.
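
One way to use these headers for pacing is to spread the remaining budget evenly over the time left until the reset. The helper below is a sketch: the function name and the even-spacing strategy are assumptions, not documented SDK behaviour, and the result is conservative because it ignores tokens refilled in the meantime.

```python
import time


def pacing_delay(headers, now=None):
    """Seconds to wait between requests so the remaining budget lasts until reset.

    Hypothetical helper: spreads X-RateLimit-Remaining evenly over the time
    left until X-RateLimit-Reset.
    """
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        # Nothing left: wait until the bucket has refilled.
        return seconds_left
    return seconds_left / remaining
```

For example, with 842 requests remaining and 421 seconds until the reset, the helper suggests waiting 0.5 seconds between requests.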

On 429:

http
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714478445

Body:

json
{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry after 45 seconds.",
    "details": {
      "limit": 1000,
      "window": "1m",
      "retry_after": 45,
      "category": "secrets:read"
    }
  }
}

Sleep for the number of seconds given in Retry-After, then try again. All official SDKs do this automatically.
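
For callers that do not use an official SDK, the retry rule can be sketched as below. The `send` callable, the tuple it returns, the `max_attempts` cap, and the one-second fallback are assumptions made for illustration; only the 429 status, the Retry-After header, and the `error.code` field come from the response shown above.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 rate_limited, honouring Retry-After.

    `send` is a hypothetical callable returning (status, headers, body),
    where body is the parsed JSON response (a dict) or None.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        code = (body or {}).get("error", {}).get("code")
        # Only rate_limited is worth sleeping on; other errors go back to the caller.
        if status != 429 or code != "rate_limited" or attempt == max_attempts:
            return status, headers, body
        # Retry-After is given in seconds in the example response.
        sleep(float(headers.get("Retry-After", 1)))
```

The check on `error.code` keeps the loop from sleeping on other 429 responses, where waiting a few seconds may not help.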

Quotas vs rate limits

Rate limits smooth bursts. Quotas cap longer-term consumption and return 429 quota_exceeded instead of rate_limited. By default, quotas are per tenant and per month.

Resource	Default monthly quota
Secret reads	Unlimited
Secret writes	1,000,000
Certificates issued	10,000
ACME orders	1,000
Dynamic leases	100,000
Audit log entries	Unlimited
Audit exports	100

To monitor quota usage, call GET /v1/tenant/quota (admin only).

Burst tuning

On self-hosted deployments, configure rate limits per category via the RATE_LIMITS environment variable (YAML-formatted):

yaml
secrets:read:
  rate: 2000
  window: 1m
  burst: 500
secrets:write:
  rate: 200
  window: 1m

burst is the bucket depth: the maximum number of requests that can be served immediately from a full bucket. The default is rate * 0.5.
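
As a worked example of the default, the sketch below resolves the bucket depth and per-second refill for the two categories configured above. The dictionary stands in for the parsed YAML, the helper name is hypothetical, and truncating the default to an integer is an assumption.

```python
def effective_limits(config):
    """Fill in the default burst (rate * 0.5) and derive the refill per second."""
    window_seconds = {"1m": 60}  # only the window used in this example
    resolved = {}
    for category, cfg in config.items():
        rate = cfg["rate"]
        burst = cfg.get("burst", rate * 0.5)
        resolved[category] = {
            "burst": int(burst),
            "refill_per_second": rate / window_seconds[cfg["window"]],
        }
    return resolved


config = {
    "secrets:read": {"rate": 2000, "window": "1m", "burst": 500},
    "secrets:write": {"rate": 200, "window": "1m"},
}
limits = effective_limits(config)
# secrets:write has no explicit burst, so it falls back to 200 * 0.5 = 100.
```

Under these assumptions, secrets:read keeps its explicit depth of 500, while secrets:write gets a depth of 100 and refills at 200 / 60 tokens per second.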

Avoiding limits in practice

Cache reads. A 1-minute in-process cache cuts most callers' read traffic by 10-100x. Invalidate on rotation events instead of polling.
Batch where you can. Fetched one by one, 50 paths cost 50 reads against the individual read limit; a single POST /v1/secrets/batch with the same 50 paths costs only 1 against the batch-read limit (which is also softer).
Use subscriptions/webhooks. Instead of polling for "has this changed yet?", subscribe and react to the event.
Don't list from the hot path. GET /v1/secrets?prefix=... is a heavy operation; precompute lists in background jobs when possible.
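
The first tip can be sketched as a small in-process cache with a time-to-live and explicit invalidation. The class, the `fetch` callable, and the way rotation events reach `invalidate` are all assumptions; wire them to whatever client and event source you actually use. The sketch is not thread-safe.

```python
import time


class SecretCache:
    """Hypothetical in-process TTL cache in front of a secret-fetching function."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self.fetch = fetch      # callable: path -> secret value
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}      # path -> (expires_at, value)

    def get(self, path):
        entry = self._entries.get(path)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]     # cache hit: no request, no token consumed
        value = self.fetch(path)
        self._entries[path] = (now + self.ttl, value)
        return value

    def invalidate(self, path):
        """Call this from a rotation event handler instead of polling."""
        self._entries.pop(path, None)
```

Repeated `get` calls for the same path within the TTL reach the backend only once; a rotation event that calls `invalidate` forces the next `get` to fetch a fresh value.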

Multi-identity patterns

A service that needs a lot of reads should not share one token across instances. Give each instance its own token, so that each gets its own bucket. ScaiKey makes this easy: mint a client_credentials token per pod or worker, and reclaim the tokens on shutdown.

A platform that serves many customers should not funnel all customer reads through one service account, because every customer then competes for that one account's limit. Give each customer tenant its own service account (usually automatic in ScaiKey) and let each customer consume their own quota.

What's next

Errors — retry semantics.
Error Codes — full code list.