Rate Limiting

ScaiDrive applies rate limits at three layers: per token, per user, and per tenant. Limits are enforced in Redis using a sliding-window algorithm. A request that exceeds any of them is rejected with HTTP 429, error code RATE_LIMITED, and a Retry-After header.

Limit dimensions

Dimension	Counted against	Default limit
Per-token	The specific access token	1000 req/min
Per-user	The user, across all their tokens	5000 req/min
Per-tenant	The tenant total	50000 req/min

These numbers are defaults. Tenant admins can raise or lower them for their tenant, and partner admins set the ceiling that tenant values cannot exceed.
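The check a request has to pass can be sketched in memory. This is not ScaiDrive's Redis implementation: it is a minimal sliding-window log, assuming each dimension keeps (timestamp, cost) entries for the last 60 seconds and a request is admitted only if every dimension still has room. The class SlidingWindowLimiter and its try_acquire method are hypothetical.

```python
import time
from collections import deque

# Default limits per dimension, in units per 60-second window.
DEFAULT_LIMITS = {"token": 1000, "user": 5000, "tenant": 50000}
WINDOW_SECONDS = 60.0


class SlidingWindowLimiter:
    """In-memory sliding-window log; an illustration, not the Redis implementation."""

    def __init__(self, limits=DEFAULT_LIMITS, window=WINDOW_SECONDS):
        self.limits = limits
        self.window = window
        # One log of (timestamp, cost) entries per (dimension, key) pair.
        self.logs = {}

    def _used(self, log, now):
        # Drop entries that have slid out of the window, then sum the rest.
        while log and log[0][0] <= now - self.window:
            log.popleft()
        return sum(cost for _, cost in log)

    def try_acquire(self, token, user, tenant, cost=1, now=None):
        """Return None if admitted, else the first dimension that is full."""
        now = time.monotonic() if now is None else now
        keys = {"token": token, "user": user, "tenant": tenant}
        logs = {d: self.logs.setdefault((d, k), deque()) for d, k in keys.items()}
        # Check every dimension before recording, so a rejected request costs nothing.
        for dim, log in logs.items():
            if self._used(log, now) + cost > self.limits[dim]:
                return dim
        for log in logs.values():
            log.append((now, cost))
        return None
```

Returning the exhausted dimension mirrors the details.dimension field of the 429 body. A shared limiter would need the check-and-record step to be atomic (for example, a single Redis script or transaction) so that concurrent requests cannot both slip under the limit.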

Endpoint classes

Different endpoint classes have different costs:

Class	Example endpoints	Cost
Metadata	`GET /api/v1/files/{id}`	1
List	`GET /api/v1/folders/{id}/children`	2
Write	`PATCH /api/v1/files/{id}`	2
Upload/Download	`/content`, `/sync/download`	5
Search	`/search/*`	10
Semantic search	`/search/semantic`, `/search/context`	20

Every request deducts its class's cost from your quota. A limit of 1000 req/min is therefore a budget of 1000 cost units per minute: enough for 1000 metadata calls, 50 semantic searches (1000 / 20), or any mix that totals 1000 units.
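The budget arithmetic is easy to check in code. The dictionary keys below are hypothetical labels for the rows of the table; the costs are the documented ones.

```python
# Cost per endpoint class, taken from the table above (keys are illustrative labels).
CLASS_COSTS = {
    "metadata": 1,
    "list": 2,
    "write": 2,
    "upload_download": 5,
    "search": 10,
    "semantic_search": 20,
}


def calls_per_window(limit, endpoint_class):
    """How many calls of a single class fit in one window's budget."""
    return limit // CLASS_COSTS[endpoint_class]


def mix_cost(calls):
    """Total units for a mix such as {"metadata": 500, "semantic_search": 25}."""
    return sum(CLASS_COSTS[cls] * n for cls, n in calls.items())
```

For example, calls_per_window(1000, "semantic_search") gives 50, and a mix of 500 metadata calls plus 25 semantic searches costs exactly 1000 units.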

Identifying limits in responses

Every response reports the current rate-limit state in headers:

http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1713888060

X-RateLimit-Limit: the per-minute quota for this token.
X-RateLimit-Remaining: the budget left in the current window.
X-RateLimit-Reset: the Unix timestamp (in seconds) at which the current window resets.
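A small parser turns the three headers into values a client can act on. RateLimitState and parse_rate_limit_headers are hypothetical names. The sketch indexes headers by their exact names, so it works with a plain dict; the case-insensitive header objects of libraries such as requests and httpx also work.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int      # X-RateLimit-Limit: quota per minute
    remaining: int  # X-RateLimit-Remaining: budget left in the window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp in seconds

    def fraction_remaining(self) -> float:
        return self.remaining / self.limit if self.limit else 0.0

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Read the X-RateLimit-* headers; return None if any is missing or malformed."""
    try:
        return RateLimitState(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```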

A 429 response also includes:

http
Retry-After: 22

Retry-After is the number of seconds to wait before retrying.
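ScaiDrive sends Retry-After in seconds, but HTTP (RFC 9110) also allows an HTTP-date in this header, so a defensive client can accept both forms. This is a sketch; the function name retry_after_seconds and its 2-second fallback (borrowed from the retry examples on this page) are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 2.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After value (delay-seconds or HTTP-date) to seconds to wait."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form, e.g. "22"
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return default  # neither form: fall back rather than crash
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    now = datetime.now(timezone.utc) if now is None else now
    return max(0.0, (when - now).total_seconds())
```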

429 response shape

json
{
  "status": "error",
  "error": {
    "code": "RATE_LIMITED",
    "message": "Per-token rate limit exceeded",
    "retry_after": 22,
    "details": {
      "dimension": "token",
      "limit": 1000,
      "window_seconds": 60
    }
  },
  "meta": {"request_id": "req_xyz"}
}

details.dimension identifies which layer you hit: token, user, or tenant.
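Because details.dimension names the exhausted layer, a client can report something more useful than a bare 429. A sketch, assuming the body shape shown above; describe_rate_limit_error is a hypothetical helper and the hint strings are illustrative, not messages the API returns.

```python
import json

# Illustrative hints per dimension; not text returned by the API.
DIMENSION_HINTS = {
    "token": "this token's budget is spent; spread requests across the window",
    "user": "all of this user's tokens together exceeded the per-user limit",
    "tenant": "the tenant total is exhausted, so other clients in the tenant are affected too",
}


def describe_rate_limit_error(body: str):
    """Return (dimension, retry_after, hint) for a RATE_LIMITED body, else None."""
    payload = json.loads(body)
    error = payload.get("error") or {}
    if error.get("code") != "RATE_LIMITED":
        return None
    dimension = (error.get("details") or {}).get("dimension")
    retry_after = error.get("retry_after")
    return dimension, retry_after, DIMENSION_HINTS.get(dimension, "unknown dimension")
```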

Client-side handling

A well-behaved client:

Watches X-RateLimit-Remaining and slows down proactively once it falls below 10% of X-RateLimit-Limit.
On 429, sleeps for retry_after seconds with jitter (±20%) and retries.
Tracks retry counts; after 3 failed retries, fails the operation and surfaces the error.
Never busy-loops.

python
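import random
import time

MAX_RETRIES = 3  # 3 retries after the first attempt, 4 attempts in total


def request_with_retry(client, method, path, **kw):
    """Send a request, retrying 429s after a jittered Retry-After sleep."""
    for attempt in range(MAX_RETRIES + 1):
        r = client.request(method, path, **kw)
        if r.status_code != 429:
            return r
        if attempt == MAX_RETRIES:
            break  # retries exhausted; don't sleep before giving up
        # Retry-After is in seconds; fall back to 2 s if the header is missing.
        delay = float(r.headers.get("Retry-After", 2))
        # ±20% jitter so many clients don't retry in lockstep.
        time.sleep(delay * random.uniform(0.8, 1.2))
    r.raise_for_status()  # surface the final 429 as an exception
    return r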

typescript
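async function withRetry(
  req: () => Promise<Response>,
  maxRetries = 3,
): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const r = await req();
    if (r.status !== 429) return r;
    if (attempt === maxRetries) break; // retries exhausted; skip the final sleep
    // Retry-After is in seconds; fall back to 2 s if missing or unparseable.
    const parsed = Number(r.headers.get("Retry-After") ?? 2);
    const delay = Number.isFinite(parsed) ? parsed : 2;
    const jitter = 0.8 + 0.4 * Math.random(); // ±20%
    await new Promise((res) => setTimeout(res, delay * 1000 * jitter));
  }
  throw new Error("rate limit retries exhausted");
}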
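The retry examples only react to 429. The first rule of a well-behaved client, slowing down before the limit is reached, could be sketched as a pacing delay computed from the X-RateLimit-* headers. This assumes the remaining budget is spread evenly over the time left until X-RateLimit-Reset; proactive_delay is a hypothetical helper, not part of any ScaiDrive SDK.

```python
def proactive_delay(limit, remaining, reset_at, now, cost=1, threshold=0.10):
    """Seconds to wait before the next request so the budget lasts until reset.

    limit, remaining and reset_at come from the X-RateLimit-* headers; cost is
    the endpoint-class cost of the next request; now is the current Unix time.
    """
    if limit <= 0 or remaining >= threshold * limit:
        return 0.0  # at least 10% of the budget left: no pacing
    seconds_left = max(0.0, reset_at - now)
    if remaining < cost:
        return seconds_left  # can't afford the request: wait for the reset
    # Spread what's left evenly: each unit of cost gets an equal share of time.
    return seconds_left * cost / remaining
```

With 50 units left and 60 seconds to go, metadata calls (cost 1) are spaced 1.2 s apart and uploads (cost 5) 6 s apart.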

Upload limits

In the upload session protocol, each chunk PUT counts as an upload/download request (cost 5). A 10 GB upload split into 4 MB chunks takes 2500 requests; at cost 5 each, that is 12500 units, more than twelve times the default per-token budget of 1000 units per minute.

For batch uploads, either:

Stagger chunks so peak request rate stays under your limit; or
Ask your tenant admin to raise the limit for the duration of the upload.
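Staggering chunks comes down to a little arithmetic. A sketch, assuming decimal units (10 GB = 10 * 10**9 bytes, 4 MB = 4 * 10**6 bytes, which matches the 2500-chunk figure) and that chunk PUTs are the only requests on the token; upload_plan is a hypothetical helper.

```python
UPLOAD_COST = 5  # cost of one chunk PUT (upload/download class)


def upload_plan(total_bytes, chunk_bytes, budget_per_minute):
    """Return (chunks, total_units, min_minutes, seconds_between_chunks)."""
    chunks = (total_bytes + chunk_bytes - 1) // chunk_bytes  # ceiling division
    total_units = chunks * UPLOAD_COST
    chunks_per_minute = budget_per_minute / UPLOAD_COST
    min_minutes = total_units / budget_per_minute
    seconds_between_chunks = 60 / chunks_per_minute
    return chunks, total_units, min_minutes, seconds_between_chunks
```

At the default per-token budget, upload_plan(10 * 10**9, 4 * 10**6, 1000) yields 2500 chunks, 12500 units, at least 12.5 minutes, and one chunk PUT every 0.3 s. Passing a smaller budget_per_minute leaves headroom for the token's other requests.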

Bandwidth itself is not rate-limited; only request cost counts against the budget.

WebSocket limits

An open WebSocket connection barely touches the request budget: the handshake counts as one request, and frames sent after it are free. However:

Connection count is limited per token (default 10) and per tenant (default 10000). Exceeding either limit results in WebSocket close code 4429.
Message rate is loosely limited to 1000 messages/min per connection to prevent abuse. Messages over the limit are dropped, not queued.
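Since excess messages are dropped rather than queued, a client may prefer to pace its own sends. A minimal sketch, assuming a fixed minimum interval of 60 / 1000 = 0.06 s between messages; MessagePacer is a hypothetical helper, and the injectable clock and sleep exist only so it can be tested.

```python
import time


class MessagePacer:
    """Spaces WebSocket sends evenly so a connection stays under max_per_minute."""

    def __init__(self, max_per_minute=1000, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # 0.06 s at the default 1000/min
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = None  # no message sent yet

    def wait(self):
        """Block until the next message may be sent, then reserve that slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call pacer.wait() before each send. A fixed interval is stricter than the server's window because it never allows bursts, which is a reasonable trade when overflow is silently lost.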

Connectors and background work

Connector syncs do not count against interactive user limits; they draw on a separate, higher budget per connector.

Background jobs (vectorization indexing, quota recalculation) are server-internal and not subject to request limits.

Tenant admin controls

Tenant admins can view and tune limits via the admin API:

bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits

curl -X PATCH $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"per_user_per_minute": 10000}'
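The same two calls can be made from Python with only the standard library's urllib.request. The helper name rate_limits_request is hypothetical; the endpoint path, headers, and the per_user_per_minute field come from the curl example.

```python
import json
import urllib.request


def rate_limits_request(base_url, token, updates=None):
    """Build a GET (updates=None) or PATCH request for the tenant rate-limit endpoint."""
    url = f"{base_url}/api/v1/admin/tenant/rate-limits"
    headers = {"Authorization": f"Bearer {token}"}
    data = None
    method = "GET"
    if updates is not None:
        # PATCH body mirrors the curl example, e.g. {"per_user_per_minute": 10000}
        data = json.dumps(updates).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "PATCH"
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

Pass the result to urllib.request.urlopen to send it, with base_url and token taken from SCAIDRIVE_URL and SCAIDRIVE_TOKEN as in the curl example. The response body format isn't documented on this page, so inspect it before relying on specific fields.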

Raised limits cannot exceed the partner-level ceiling: your partner admin sets the maximum, and your tenant admin configures a value at or below it.