Rate Limiting
ScaiDrive applies rate limits at three layers: per-token, per-user, and per-tenant. Limits are enforced in Redis with a sliding-window algorithm and surface as 429 RATE_LIMITED with Retry-After.
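To illustrate the sliding-window idea, here is a minimal in-memory sketch in Python. It is not ScaiDrive's Redis implementation, which this page does not describe; the class name `SlidingWindowLimiter` and its methods are hypothetical. It uses the "sliding-window log" variant, keeping one timestamped entry per request and evicting entries older than the window.

```python
import time
from collections import deque
from typing import Deque, Optional, Tuple


class SlidingWindowLimiter:
    """Hypothetical in-memory sliding-window log (not the server's Redis code)."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, cost), oldest first
        self.used = 0  # total cost currently inside the window

    def _evict(self, now: float) -> None:
        # Drop entries that have slid out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, cost = self.events.popleft()
            self.used -= cost

    def try_acquire(self, cost: int = 1, now: Optional[float] = None) -> Tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if self.used + cost <= self.limit:
            self.events.append((now, cost))
            self.used += cost
            return True, 0.0
        # Rejected: work out when enough old entries expire for this cost to fit.
        freed = 0
        for ts, c in self.events:
            freed += c
            if self.used - freed + cost <= self.limit:
                return False, max(0.0, ts + self.window - now)
        return False, self.window  # cost exceeds the whole limit
```

A rejected call returns the wait time a server could report in Retry-After. A per-user or per-tenant check would be the same structure keyed on a different identifier.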
Limit dimensions
| Dimension | Counted against | Typical limit |
|---|---|---|
| Per-token | The specific access token | 1000 req/min |
| Per-user | The user, across all their tokens | 5000 req/min |
| Per-tenant | The tenant total | 50000 req/min |
These numbers are defaults. Tenant admins can raise or lower them per tenant, up to a ceiling set by the partner admin.
Endpoint classes
Different endpoint classes have different costs:
| Class | Example endpoints | Cost |
|---|---|---|
| Metadata | GET /api/v1/files/{id} | 1 |
| List | GET /api/v1/folders/{id}/children | 2 |
| Write | PATCH /api/v1/files/{id} | 2 |
| Upload/Download | /content, /sync/download | 5 |
| Search | /search/* | 10 |
| Semantic search | /search/semantic, /search/context | 20 |
Every request consumes its class's cost from your quota bucket, so limits are effectively counted in cost units rather than raw requests. A token with 1000 req/min can make 1000 metadata calls, or 50 semantic searches, or a mix, within one window.
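The cost arithmetic can be sketched in a few lines of Python. The costs are copied from the table above; the dictionary keys and function names are hypothetical labels chosen for this example, not identifiers from the API.

```python
from typing import Dict

# Cost per endpoint class, taken from the table above.
CLASS_COST: Dict[str, int] = {
    "metadata": 1,
    "list": 2,
    "write": 2,
    "transfer": 5,          # upload/download
    "search": 10,
    "semantic_search": 20,
}


def units_used(calls: Dict[str, int]) -> int:
    """Total quota units consumed by a mix of calls, keyed by endpoint class."""
    return sum(CLASS_COST[cls] * count for cls, count in calls.items())


def max_calls(budget: int, cls: str) -> int:
    """How many calls of one class fit into a budget of quota units."""
    return budget // CLASS_COST[cls]
```

For example, 400 metadata calls, 100 list calls, and 20 semantic searches together use exactly 1000 units, which is the whole default per-token budget for one minute.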
Identifying limits in responses
Every response includes the current state:
- X-RateLimit-Limit: quota per minute for this token.
- X-RateLimit-Remaining: budget left this window.
- X-RateLimit-Reset: Unix timestamp when the window resets.
On 429, the response additionally carries a Retry-After header: the number of seconds until retry is safe.
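A client might read these headers as in the following sketch. The header names come from this page; the `RateLimitState` type and `parse_rate_limit_headers` function are hypothetical, and the sketch assumes Retry-After is always an integer number of seconds, as described above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_at: int               # X-RateLimit-Reset (Unix timestamp)
    retry_after: Optional[int]  # Retry-After in seconds, present only on 429

    @property
    def fraction_remaining(self) -> float:
        return self.remaining / self.limit if self.limit else 0.0

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalise keys before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry = h.get("retry-after")
    return RateLimitState(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        retry_after=int(retry) if retry is not None else None,
    )
```

Checking `fraction_remaining` after each response gives the client a simple signal for slowing down before it hits a 429.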
429 response shape
details.dimension identifies which layer you hit: token, user, or tenant.
Client-side handling
A well-behaved client:
- Watches X-RateLimit-Remaining. Below 10%, slow down proactively.
- On 429, sleeps for retry_after seconds with jitter (±20%) and retries.
- Tracks retry counts; after 3 failed retries, fails the operation and surfaces the error.
- Never busy-loops.
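The 429 path of the rules above could look roughly like the following Python sketch. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns `(status, retry_after_seconds, payload)`, and `RateLimitedError` is a name invented for this example. Proactive slowdown on low X-RateLimit-Remaining is left out to keep the sketch short.

```python
import random
import time
from typing import Any, Callable, Tuple

MAX_RETRIES = 3   # give up after 3 failed retries
JITTER = 0.20     # +/- 20% around the server-provided delay


class RateLimitedError(Exception):
    """Raised when the request is still rate-limited after MAX_RETRIES retries."""


def jittered(delay: float, rng: Callable[[], float] = random.random) -> float:
    # Scale the delay by a factor drawn uniformly from [0.8, 1.2).
    return delay * (1.0 - JITTER + 2.0 * JITTER * rng())


def call_with_retry(
    send: Callable[[], Tuple[int, float, Any]],
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Run send(); on 429, sleep for the jittered delay and retry up to MAX_RETRIES times."""
    for attempt in range(MAX_RETRIES + 1):  # one initial attempt plus the retries
        status, retry_after, payload = send()
        if status != 429:
            return payload
        if attempt == MAX_RETRIES:
            break
        sleep(jittered(retry_after))  # blocking sleep, never a busy loop
    raise RateLimitedError(f"still rate-limited after {MAX_RETRIES} retries")
```

Passing `sleep` as a parameter keeps the function testable without real delays. Jitter spreads out retries so that many clients limited at the same moment do not all come back at once.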
Upload limits
The upload session protocol counts as "upload/download" (cost 5) per chunk PUT. A 10 GB upload split into 4 MB chunks is 2500 requests; at cost 5 each, that is 12500 units, well over a single minute's budget at the default per-token limit of 1000.
For batch uploads, either:
- Stagger chunks so peak request rate stays under your limit; or
- Request a raised limit for the upload window from your admin.
Bandwidth is not rate-limited per se, only the request count.
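The arithmetic behind staggering can be worked through in Python. This sketch assumes decimal units (1 GB = 10^9 bytes, 1 MB = 10^6 bytes, which reproduces the 2500-chunk figure above) and that the entire budget is spent on chunk PUTs; `upload_plan` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict

CHUNK_COST = 5  # upload/download class cost per chunk PUT


def upload_plan(total_bytes: int, chunk_bytes: int, budget_per_min: int) -> Dict[str, float]:
    """Estimate the quota cost of an upload and the pacing needed to stay under budget."""
    chunks = (total_bytes + chunk_bytes - 1) // chunk_bytes  # integer ceiling division
    units = chunks * CHUNK_COST
    max_chunks_per_min = budget_per_min // CHUNK_COST
    return {
        "chunks": chunks,
        "units": units,
        "max_chunks_per_min": max_chunks_per_min,
        "min_minutes": units / budget_per_min,
        "min_seconds_between_chunks": 60.0 / max_chunks_per_min,
    }


# 10 GB in 4 MB chunks against the default per-token limit of 1000 per minute.
plan = upload_plan(10 * 10**9, 4 * 10**6, 1000)
```

Under these assumptions the upload needs at least 12.5 minutes at 200 chunks per minute, or one chunk roughly every 0.3 seconds, before any other traffic on the same token is counted.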
WebSocket limits
WebSocket connections don't consume request budget — the handshake is one request, and subsequent frames are free. However:
- Connection count is limited per-token (default 10) and per-tenant (default 10000). Hitting a limit returns WS close code 4429.
- Message rate is limited loosely (1000 msg/min per connection) to prevent abuse. Excess is dropped, not queued.
Connectors and background work
Connector syncs do not count against interactive user limits. They have their own, higher, per-connector budget.
Background jobs (vectorization indexing, quota recalculation) are server-internal and not subject to request limits.
Tenant admin controls
Tenant admins can view and tune limits via the admin API.
Raising limits counts against the partner-level ceiling. Your partner admin sets the max; your tenant admin sets the configured value below that.