---
title: Rate Limiting
path: advanced/rate-limiting
status: published
---

# Rate Limiting

ScaiVault applies rate limits per identity across several endpoint categories. Limits protect the service from runaway clients and cap how much damage a compromised token can do before it is detected.

## How it works

Each endpoint belongs to a category. Each `(identity, category)` pair has a token bucket in Redis. Every matching request consumes a token; refill is continuous at the category's rate. An empty bucket returns `429 rate_limited` with `Retry-After`.

A token bucket lets a client burst briefly (up to the bucket size) while holding it to the nominal rate over the long run. This suits real-world traffic, where a service loads N secrets in a burst at startup and then settles into a steady state.
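The refill-and-consume behavior described above can be sketched in a few lines. This is an in-memory illustration only, not ScaiVault's Redis-backed implementation; the class name, method name, and constructor parameters are hypothetical.

```python
import time


class TokenBucket:
    """Continuous-refill token bucket (single-process illustration)."""

    def __init__(self, rate_per_window, window_seconds, burst, now=time.monotonic):
        self.refill_per_second = rate_per_window / window_seconds
        self.capacity = burst
        self.tokens = float(burst)  # a new bucket starts full
        self._now = now
        self._last = now()

    def try_consume(self):
        """Return (allowed, retry_after_seconds) for one request."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill continuously, but never beyond the bucket depth.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True, 0.0
        # Empty bucket: report how long until one whole token is available.
        return False, (1 - self.tokens) / self.refill_per_second
```

A bucket built as `TokenBucket(1000, 60, 500)` would allow 500 back-to-back requests from a full bucket, then roughly 16.7 requests per second after that.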

## Defaults

These are defaults on the managed service; self-hosted deployments can override.

| Category | Limit | Window |
|----------|-------|--------|
| Read secrets | 1000 | per minute |
| Write secrets | 100 | per minute |
| Delete secrets | 100 | per minute |
| List operations | 100 | per minute |
| Batch read | 100 | per minute |
| Batch metadata | 100 | per minute |
| Audit queries | 50 | per minute |
| Audit export | 5 | per minute |
| Policy operations | 50 | per minute |
| Policy test | 200 | per minute |
| PKI issue / sign | 50 | per minute |
| PKI admin (CA, roles) | 20 | per minute |
| ACME order | 30 | per minute |
| Webhook operations | 20 | per minute |
| Subscription operations | 20 | per minute |
| Subscription poll | 240 | per minute (one long-poll = one unit) |
| Federation operations | 20 | per minute |
| Dynamic credential generation | 100 | per minute |
| Dynamic engine/role admin | 20 | per minute |
| Lease operations (renew, revoke) | 100 | per minute |
| Identity read | 100 | per minute |
| Identity sync | 5 | per minute |
| Auth whoami / introspect | 300 | per minute |

Limits are per identity, not per IP. A service account and a user with the same IP have separate buckets.

## Rate limit headers

Every response includes:

```
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 842
X-RateLimit-Reset: 1714478400
```

`X-RateLimit-Reset` is a Unix timestamp when the bucket refills fully. Use it for pacing.
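One conservative way to pace requests is to spread the remaining budget evenly over the time left until the reset. The helper below is a sketch under that assumption; `pacing_delay` is a hypothetical name, and the header values are assumed to parse as integers, as in the example above.

```python
import time


def pacing_delay(headers, now=None):
    """Seconds to wait between requests so the remaining budget lasts until reset."""
    now = time.time() if now is None else now
    remaining = int(headers["X-RateLimit-Remaining"])
    reset_at = int(headers["X-RateLimit-Reset"])
    seconds_left = max(0.0, reset_at - now)
    if remaining <= 0:
        return seconds_left  # nothing left: wait for the reset
    return seconds_left / remaining
```

Because the bucket refills continuously, this heuristic is stricter than necessary; it trades some throughput for never hitting a `429`.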

On `429`:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 45
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714478445
```

Body:

```json
{
  "error": {
    "code": "rate_limited",
    "message": "Rate limit exceeded. Retry after 45 seconds.",
    "details": {
      "limit": 1000,
      "window": "1m",
      "retry_after": 45,
      "category": "secrets:read"
    }
  }
}
```

Sleep for `Retry-After` seconds, then retry. All official SDKs do this automatically.
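For clients that do not use an SDK, the retry loop might look like the following sketch. `call_with_retry` and the `send` callable are hypothetical; `send` stands in for whatever HTTP client you use and is assumed to return `(status_code, headers, body)`.

```python
import time


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Retry-After is assumed to be a number of seconds, as in the
        # example response above; fall back to 1 second if it is missing.
        sleep(float(headers.get("Retry-After", 1)))
```

Capping the number of attempts keeps a persistently throttled caller from retrying forever; the last `429` is returned to the caller to handle.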

## Quotas vs rate limits

Rate limits smooth bursts. Quotas cap longer-term consumption and return `429 quota_exceeded` rather than `429 rate_limited`. Quotas are per-tenant and per-month by default.

| Resource | Default monthly quota |
|----------|------------------------|
| Secret reads | Unlimited |
| Secret writes | 1,000,000 |
| Certificates issued | 10,000 |
| ACME orders | 1,000 |
| Dynamic leases | 100,000 |
| Audit log entries | Unlimited |
| Audit exports | 100 |

Monitor quota usage: `GET /v1/tenant/quota` (admin only).
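Since both conditions arrive as `429`, a client has to look at the error code to decide whether sleeping will help. The sketch below assumes a `quota_exceeded` response uses the same `error` envelope as the `rate_limited` body shown earlier, which this page does not confirm; `should_retry_429` is a hypothetical helper.

```python
import json


def should_retry_429(body_text):
    """Return (retry, delay_seconds) for the JSON body of a 429 response."""
    error = json.loads(body_text).get("error", {})
    code = error.get("code")
    if code == "rate_limited":
        # The bucket refills within seconds, so a short sleep is enough.
        return True, error.get("details", {}).get("retry_after", 1)
    # quota_exceeded (or an unrecognized code): a short sleep will not help.
    return False, None
```

Treating unknown codes as non-retryable is a deliberate choice here: it avoids hammering the API when the monthly quota is exhausted.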

## Burst tuning

On self-hosted deployments, configure rate limits per category via the `RATE_LIMITS` environment variable (YAML-formatted):

```yaml
secrets:read:
  rate: 2000
  window: 1m
  burst: 500
secrets:write:
  rate: 200
  window: 1m
```

`burst` is the bucket depth: the maximum number of requests that can be served back-to-back from a full bucket. The default is `rate * 0.5`.
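To see what a given entry means in practice, the sketch below derives the bucket depth and the refill rate per second from the configuration above, written here as a Python dict. `effective_limits` is a hypothetical helper, and the `s`/`m`/`h` window suffixes are an assumption, since only `1m` appears on this page.

```python
# Assumed window suffixes; the documented examples only use "1m".
WINDOW_UNITS = {"s": 1, "m": 60, "h": 3600}


def effective_limits(entry):
    """Return (burst_depth, refill_tokens_per_second) for one category entry."""
    window = entry["window"]
    window_seconds = int(window[:-1]) * WINDOW_UNITS[window[-1]]
    rate = entry["rate"]
    burst = entry.get("burst", rate * 0.5)  # documented default: rate * 0.5
    return burst, rate / window_seconds


config = {
    "secrets:read": {"rate": 2000, "window": "1m", "burst": 500},
    "secrets:write": {"rate": 200, "window": "1m"},
}
```

With this configuration, `secrets:write` has no explicit `burst`, so it falls back to a depth of 100 and refills at about 3.3 tokens per second.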

## Avoiding limits in practice

- **Cache reads.** A 1-minute in-process cache cuts most callers' read traffic by 10-100x. Invalidate on rotation events instead of polling.
- **Batch where you can.** One `POST /v1/secrets/batch` with 50 paths is 50 reads against the individual limit, but only 1 against the batch-read limit (which is also softer).
- **Use subscriptions/webhooks.** Instead of polling for "has this changed yet?", subscribe and react to the event.
- **Don't list from the hot path.** `GET /v1/secrets?prefix=...` is a heavy operation — pre-compute lists in background jobs when possible.
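The caching advice above can be sketched as a small TTL cache with explicit invalidation. Everything here is illustrative: `SecretCache` and the `fetch` callable are hypothetical names, `fetch` stands in for your own read call, and the class is not thread-safe as written.

```python
import time


class SecretCache:
    """In-process cache with a time-to-live and event-driven invalidation."""

    def __init__(self, fetch, ttl_seconds=60.0, now=time.monotonic):
        self._fetch = fetch      # hypothetical callable: path -> secret value
        self._ttl = ttl_seconds
        self._now = now
        self._entries = {}       # path -> (expires_at, value)

    def get(self, path):
        entry = self._entries.get(path)
        if entry is not None and entry[0] > self._now():
            return entry[1]      # fresh hit: no API call, no token spent
        value = self._fetch(path)
        self._entries[path] = (self._now() + self._ttl, value)
        return value

    def invalidate(self, path):
        """Drop one entry; call this from a rotation event handler."""
        self._entries.pop(path, None)
```

Calling `invalidate(path)` from a webhook or subscription handler means the next `get` refetches the rotated value immediately instead of waiting for the TTL to expire.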

## Multi-identity patterns

One service that needs a lot of reads should *not* share a token across instances. Give each instance its own token, so that each token has its own bucket. ScaiKey makes this easy: mint a `client_credentials` token per pod or worker and reclaim it on shutdown.

A platform that serves many customers should *not* funnel all customer reads through one service account, because every customer then competes for that one service account's limit. Give each customer tenant its own service account (usually automatic in ScaiKey) and let each customer consume their own quota.

## What's next

- [Errors](../core-concepts/errors) — retry semantics.
- [Error Codes](../reference/error-codes) — full code list.
