Rate Limiting
ScaiVault applies rate limits per identity across several endpoint categories. Limits protect the service from runaway clients and cap how much damage a compromised token can do before it is detected.
How it works
Each endpoint belongs to a category. Each (identity, category) pair has a token bucket in Redis. Every matching request consumes a token; refill is continuous at the category's rate. An empty bucket returns 429 rate_limited with Retry-After.
A token bucket lets you burst in the short term (up to the bucket size) while sustaining the nominal rate over the long term. This suits real-world traffic, where a service loads N secrets in a burst at startup and then settles into a steady state.
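The bucket behaviour described above can be sketched in a few lines of Python. This is an illustrative in-memory model, not ScaiVault's implementation: the service keeps its buckets in Redis, and the class name, method name, and the choice to start with a full bucket are all assumptions made for the example.

```python
import math
from typing import Tuple


class TokenBucket:
    """Illustrative in-memory token bucket with continuous refill (hypothetical names)."""

    def __init__(self, rate_per_minute: float, burst: float, now: float = 0.0) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        self.burst = burst
        self.tokens = burst  # assumption: a new bucket starts full
        self.updated_at = now

    def try_consume(self, now: float) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds) for one request at time `now`."""
        # Refill continuously for the elapsed time, capped at the bucket size.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True, 0
        # Empty bucket: report how long until one whole token is available.
        missing = 1.0 - self.tokens
        return False, math.ceil(missing / self.rate_per_second)
```

With a rate of 60 per minute and a burst of 3, three back-to-back requests succeed, the fourth is rejected with a one-second wait, and a request one second later succeeds again.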
Defaults
These are defaults on the managed service; self-hosted deployments can override.
| Category | Limit | Window |
|---|---|---|
| Read secrets | 1000 | per minute |
| Write secrets | 100 | per minute |
| Delete secrets | 100 | per minute |
| List operations | 100 | per minute |
| Batch read | 100 | per minute |
| Batch metadata | 100 | per minute |
| Audit queries | 50 | per minute |
| Audit export | 5 | per minute |
| Policy operations | 50 | per minute |
| Policy test | 200 | per minute |
| PKI issue / sign | 50 | per minute |
| PKI admin (CA, roles) | 20 | per minute |
| ACME order | 30 | per minute |
| Webhook operations | 20 | per minute |
| Subscription operations | 20 | per minute |
| Subscription poll | 240 | per minute (one long-poll = one unit) |
| Federation operations | 20 | per minute |
| Dynamic credential generation | 100 | per minute |
| Dynamic engine/role admin | 20 | per minute |
| Lease operations (renew, revoke) | 100 | per minute |
| Identity read | 100 | per minute |
| Identity sync | 5 | per minute |
| Auth whoami / introspect | 300 | per minute |
Limits are per identity, not per IP. A service account and a user with the same IP have separate buckets.
Rate limit headers
Every response includes rate limit headers. X-RateLimit-Reset is a Unix timestamp marking when the bucket refills fully. Use it for pacing.
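One way to pace requests with X-RateLimit-Reset is to spread the remaining budget evenly over the time left until the reset. The sketch below assumes the client also knows how many tokens remain, for example from a hypothetical X-RateLimit-Remaining header; that header name and the function name are assumptions, not confirmed parts of the API.

```python
def pacing_delay(remaining: int, reset_at: float, now: float) -> float:
    """Seconds to wait before the next request so the budget lasts until reset.

    remaining: tokens left (assumed to come from a remaining-count header)
    reset_at:  X-RateLimit-Reset, a Unix timestamp
    now:       current Unix time, e.g. time.time()
    """
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        # Nothing left: wait out the whole interval.
        return seconds_left
    return seconds_left / remaining
```

This is deliberately conservative: because refill is continuous, tokens come back before the full reset, so a client pacing this way should stay under the limit.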
On 429, the response carries a Retry-After header, and the body identifies the error as rate_limited.
Sleep Retry-After seconds and try again. All official SDKs do this automatically.
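For clients that do not use an official SDK, the retry loop might look like the following sketch. The `send` callable, the attempt cap, and the one-second fallback when the header is missing are assumptions for illustration; the only behaviour taken from this page is "sleep Retry-After seconds and try again".

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on 429, sleep for Retry-After seconds and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers = send()
        # Return on success, on any non-429 error, or when attempts run out.
        if status != 429 or attempt == max_attempts:
            return status, headers
        # Retry-After is read as a number of seconds; the "1" fallback is an
        # assumption. Real HTTP clients usually expose case-insensitive headers.
        sleep(float(headers.get("Retry-After", "1")))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function testable without real delays. Capping attempts prevents a client from looping forever against a bucket that never recovers.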
Quotas vs rate limits
Rate limits smooth bursts. Quotas cap longer-term consumption and return 429 quota_exceeded instead of rate_limited. Quotas are per-tenant and per-month by default.
| Resource | Default monthly quota |
|---|---|
| Secret reads | Unlimited |
| Secret writes | 1,000,000 |
| Certificates issued | 10,000 |
| ACME orders | 1,000 |
| Dynamic leases | 100,000 |
| Audit log entries | Unlimited |
| Audit exports | 100 |
Monitor quota usage: GET /v1/tenant/quota (admin only).
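Because both conditions arrive as a 429, a client needs to branch on the error code. The sketch below assumes the caller has already extracted that code from the response body (the body layout is not shown here), and it assumes a monthly quota is not worth retrying on a short timer. The enum and function names are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NOT_THROTTLED = "not_throttled"
    RETRY_AFTER_DELAY = "retry_after_delay"
    STOP_AND_ALERT = "stop_and_alert"


def classify_throttle(status: int, error_code: str) -> Action:
    """Decide how to react to a response, given its status and error code."""
    if status != 429:
        return Action.NOT_THROTTLED
    if error_code == "quota_exceeded":
        # Assumption: monthly quotas do not recover within seconds,
        # so surface the failure instead of sleeping and retrying.
        return Action.STOP_AND_ALERT
    # rate_limited (and, by assumption, any other 429) is retried after a delay.
    return Action.RETRY_AFTER_DELAY
```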
Burst tuning
On self-hosted deployments, configure rate limits per endpoint via the RATE_LIMITS env var (YAML-formatted).
burst is the bucket depth: the maximum number of requests served immediately. The default is rate * 0.5.
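As a worked example of the default, the sketch below computes the burst for a given rate and how long an empty bucket takes to refill. It assumes the rate is expressed per minute, as in the defaults table; the function names are illustrative.

```python
def default_burst(rate_per_minute: float) -> float:
    """Default bucket depth: rate * 0.5."""
    return rate_per_minute * 0.5


def seconds_to_refill(burst: float, rate_per_minute: float) -> float:
    """Time for an empty bucket to refill completely at the nominal rate."""
    return burst * 60.0 / rate_per_minute


print(default_burst(1000))             # 500.0
print(seconds_to_refill(500.0, 1000))  # 30.0
```

Under that assumption, a bucket with the default burst refills from empty in 30 seconds whatever the rate, since the burst is always half a minute's worth of tokens.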
Avoiding limits in practice
- Cache reads. A 1-minute in-process cache cuts most callers' read traffic by 10-100x. Invalidate on rotation events instead of polling.
- Batch where you can. One POST /v1/secrets/batch with 50 paths would be 50 reads against the individual limit, but counts as only 1 against the batch-read limit (which is also softer).
- Use subscriptions/webhooks. Instead of polling for "has this changed yet?", subscribe and react to the event.
- Don't list from the hot path. GET /v1/secrets?prefix=... is a heavy operation; pre-compute lists in background jobs when possible.
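The first tip, a short-lived in-process cache that is invalidated on rotation events, could be sketched as follows. The `fetch` callable stands in for whatever client call reads a secret, and the class and method names are hypothetical. The sketch is not thread-safe.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Tiny TTL cache in front of a secret-reading function (illustrative)."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, value)

    def get(self, path: str) -> str:
        """Return the cached value if still fresh, otherwise fetch and store it."""
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = self._fetch(path)
        self._entries[path] = (now + self._ttl, value)
        return value

    def invalidate(self, path: str) -> None:
        """Drop a path, e.g. from a rotation event handler."""
        self._entries.pop(path, None)
```

Calling `invalidate` from a subscription or webhook handler keeps the cache fresh without polling, so the TTL only acts as a safety net.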
Multi-identity patterns
One service that needs a lot of reads should not share a token across instances — each instance gets its own token, each token its own bucket. ScaiKey makes this easy: mint a client_credentials token per pod/worker, reclaim them on shutdown.
A platform that serves many customers should not funnel all customer reads through one service account, because every request then counts against that single account's limit. Give each customer tenant its own service account (usually automatic in ScaiKey) and let each customer consume their own quota.
What's next
- Errors — retry semantics.
- Error Codes — full code list.