Rate Limiting

ScaiDrive applies rate limits at three layers: per-token, per-user, and per-tenant. Limits are enforced in Redis using a sliding-window algorithm. A request that exceeds any of them fails with HTTP 429, error code RATE_LIMITED, and a Retry-After header.

Limit dimensions

| Dimension | Counted against | Typical limit |
|---|---|---|
| Per-token | The specific access token | 1000 req/min |
| Per-user | The user, across all their tokens | 5000 req/min |
| Per-tenant | The tenant total | 50000 req/min |

These numbers are defaults. Tenant admins can raise or lower them for their tenant, and partner admins set the ceiling that tenant values cannot exceed.

Endpoint classes

Different endpoint classes have different costs:

| Class | Example endpoints | Cost |
|---|---|---|
| Metadata | `GET /api/v1/files/{id}` | 1 |
| List | `GET /api/v1/folders/{id}/children` | 2 |
| Write | `PATCH /api/v1/files/{id}` | 2 |
| Upload/Download | `/content`, `/sync/download` | 5 |
| Search | `/search/*` | 10 |
| Semantic search | `/search/semantic`, `/search/context` | 20 |

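Every request deducts its class's cost from your quota bucket. The per-minute limits above are therefore budgets of cost units, not raw request counts. A token with the default 1000-unit budget can make 1000 metadata calls, or 50 semantic searches (50 × 20 = 1000), or any mix whose costs sum to 1000.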
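You can check whether a planned mix of calls fits in one window by summing each call's class cost from the table. Below is a minimal sketch that assumes the default per-token budget of 1000 units per minute. The class keys and the `window_cost` / `fits_in_window` helpers are hypothetical names for illustration, not part of any ScaiDrive SDK.

```python
# Hypothetical class keys; costs copied from the endpoint-class table.
CLASS_COSTS = {
    "metadata": 1,
    "list": 2,
    "write": 2,
    "upload_download": 5,
    "search": 10,
    "semantic_search": 20,
}

DEFAULT_TOKEN_BUDGET = 1000  # default per-token units per minute


def window_cost(calls: dict[str, int]) -> int:
    """Total quota units consumed by `calls` (class key -> number of calls)."""
    return sum(CLASS_COSTS[cls] * n for cls, n in calls.items())


def fits_in_window(calls: dict[str, int], budget: int = DEFAULT_TOKEN_BUDGET) -> bool:
    """True if the planned calls fit within one minute's budget."""
    return window_cost(calls) <= budget


print(window_cost({"metadata": 400, "semantic_search": 20}))  # 800
print(fits_in_window({"semantic_search": 51}))  # False: 1020 units
```

The per-user and per-tenant budgets apply on top of the per-token one. For shared tenants, run the same sum against those limits too.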

Identifying limits in responses

Every response reports the current rate-limit state in these headers:

```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1713888060
```
  • X-RateLimit-Limit — quota per minute for this token.
  • X-RateLimit-Remaining — budget left this window.
  • X-RateLimit-Reset — Unix timestamp when the window resets.
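A client can read these three headers after every call to track where it stands in the current window. Below is a minimal standard-library sketch. `RateLimitState` and `parse_rate_limit` are hypothetical names, and `headers` is assumed to be any mapping keyed by the header names exactly as shown above.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int      # X-RateLimit-Limit: quota per minute
    remaining: int  # X-RateLimit-Remaining: budget left this window
    reset_at: int   # X-RateLimit-Reset: Unix timestamp when the window resets

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)

    def fraction_remaining(self) -> float:
        return self.remaining / self.limit if self.limit else 0.0


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitState]:
    """Build a RateLimitState from response headers, or None if any is missing or malformed."""
    try:
        return RateLimitState(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

The sample headers above give 947 of 1000 units remaining, so `fraction_remaining()` is 0.947.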

On 429, additionally:

```http
Retry-After: 22
```

Retry-After gives the number of seconds to wait before it is safe to retry.
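ScaiDrive documents Retry-After as an integer number of seconds. The HTTP specification also allows the header to carry an HTTP-date, so a generic client may want to accept both. The sketch below does that with the standard library's `email.utils.parsedate_to_datetime`. The helper name `retry_after_seconds` and the 2-second fallback are assumptions, chosen to match the default used in the retry examples on this page.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], default: float = 2.0,
                        now: Optional[datetime] = None) -> float:
    """Convert a Retry-After header value to a non-negative delay in seconds."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delay-seconds form, as ScaiDrive sends it
        return float(value)
    try:  # HTTP-date form, permitted by the HTTP spec
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # unparseable: fall back to the default
        return default
    if when.tzinfo is None:  # "-0000" dates come back naive; treat as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```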

429 response shape

```json
{
  "status": "error",
  "error": {
    "code": "RATE_LIMITED",
    "message": "Per-token rate limit exceeded",
    "retry_after": 22,
    "details": {
      "dimension": "token",
      "limit": 1000,
      "window_seconds": 60
    }
  },
  "meta": {"request_id": "req_xyz"}
}
```

details.dimension identifies which layer you hit: token, user, or tenant.
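Knowing which layer tripped helps you choose between backing off and asking an admin for a higher limit. Below is a minimal sketch that pulls the relevant fields out of the 429 body shown above. `RateLimitError` and `parse_rate_limit_error` are hypothetical helper names, and the fallback values for missing fields are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class RateLimitError:
    dimension: str       # "token", "user", or "tenant"
    limit: int
    window_seconds: int
    retry_after: float
    request_id: str


def parse_rate_limit_error(body: str) -> RateLimitError:
    """Extract the rate-limit fields from a 429 RATE_LIMITED response body."""
    payload = json.loads(body)
    err = payload.get("error", {})
    if err.get("code") != "RATE_LIMITED":
        raise ValueError(f"not a rate-limit error: {err.get('code')!r}")
    details = err.get("details", {})
    return RateLimitError(
        dimension=details.get("dimension", "unknown"),
        limit=int(details.get("limit", 0)),
        window_seconds=int(details.get("window_seconds", 60)),
        retry_after=float(err.get("retry_after", 0)),
        request_id=payload.get("meta", {}).get("request_id", ""),
    )
```

Hitting `token` usually means one client is too chatty. Hitting `user` or `tenant` means several clients share the budget, so per-client retries alone will not fix the problem.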

Client-side handling

A well-behaved client:

  1. Watches X-RateLimit-Remaining and slows down proactively once it falls below 10% of X-RateLimit-Limit.
  2. On 429, sleeps for retry_after seconds with jitter (±20%) and retries.
  3. Tracks retry counts; after 3 failed retries, fails the operation and surfaces the error.
  4. Never busy-loops.
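```python
import random
import time

MAX_RETRIES = 3  # give up after 3 retries (4 attempts in total)


def request_with_retry(client, method, path, **kw):
    # `client` is e.g. an httpx.Client with base_url set, so `path` can be relative.
    for attempt in range(MAX_RETRIES + 1):
        r = client.request(method, path, **kw)
        if r.status_code != 429:
            return r
        if attempt == MAX_RETRIES:
            break  # out of retries: skip the final sleep
        delay = float(r.headers.get("Retry-After", 2))
        # Sleep Retry-After seconds with +/-20% jitter so clients don't retry in lockstep
        time.sleep(delay * (0.8 + 0.4 * random.random()))
    r.raise_for_status()  # surface the final 429 as an exception
```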
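```typescript
// Retries on 429, honouring Retry-After with +/-20% jitter.
// Gives up after 3 retries (4 attempts in total).
async function withRetry(req: () => Promise<Response>, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const r = await req();
    if (r.status !== 429) return r;
    if (attempt === maxRetries) break; // out of retries: skip the final sleep
    const delay = Number(r.headers.get("Retry-After") ?? 2);
    const jitter = 0.8 + 0.4 * Math.random();
    await new Promise((res) => setTimeout(res, delay * 1000 * jitter));
  }
  throw new Error("rate limit retries exhausted");
}
```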
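The retry helpers above only react to 429s. Step 1 of the checklist, slowing down before the budget runs out, can be sketched as a pacing rule: once X-RateLimit-Remaining drops below 10% of X-RateLimit-Limit, spread the calls still affordable evenly over the time left until X-RateLimit-Reset. This is one possible policy, not a ScaiDrive requirement. `pacing_delay` is a hypothetical helper, and the `cost` parameter reuses the endpoint-class costs.

```python
import time
from typing import Optional


def pacing_delay(limit: int, remaining: int, reset_at: float, cost: int = 1,
                 threshold: float = 0.10, now: Optional[float] = None) -> float:
    """Seconds to wait before the next request, from X-RateLimit-* values.

    While remaining/limit stays at or above `threshold`, send immediately.
    Below it, spread the calls still affordable at `cost` units each evenly
    over the time left until the reset timestamp.
    """
    now = time.time() if now is None else now
    if limit <= 0 or remaining / limit >= threshold:
        return 0.0
    time_left = max(0.0, reset_at - now)
    calls_left = remaining // cost
    if calls_left <= 0:
        return time_left  # budget spent: wait for the window to reset
    return time_left / calls_left
```

This treats the reset timestamp as the moment budget returns. With a sliding window, capacity may free up earlier, so the rule errs on the slow side.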

Upload limits

In the upload session protocol, each chunk PUT counts as one upload/download request (cost 5). A 10 GB upload split into 4 MB chunks is 2500 requests. At cost 5 each, that is 12500 units, more than twelve times the default per-token budget of 1000 units per minute.

For batch uploads, either:

  • Stagger chunks so peak request rate stays under your limit; or
  • Request a raised limit for the upload window from your admin.

Bandwidth itself is not rate-limited; only the cost-weighted request count is.
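The chunk arithmetic above can be turned into a pacing plan for any upload size. Below is a minimal sketch. It assumes decimal units (1 GB = 1000 MB, which matches the 2500-chunk figure), the upload/download cost of 5, and that the upload is the only traffic on the token. `upload_plan` is a hypothetical helper.

```python
import math


def upload_plan(size_mb: float, chunk_mb: float, budget_per_min: int = 1000,
                cost_per_chunk: int = 5) -> dict:
    """Estimate request count, quota units, and pacing for a chunked upload."""
    chunks = math.ceil(size_mb / chunk_mb)
    chunks_per_min = budget_per_min // cost_per_chunk
    if chunks_per_min == 0:
        raise ValueError("budget too small for even one chunk per minute")
    return {
        "chunks": chunks,
        "units": chunks * cost_per_chunk,
        "chunks_per_min": chunks_per_min,
        # Minimum seconds between chunk PUTs to stay under the budget
        "min_interval_s": 60 / chunks_per_min,
        # Lower bound on wall-clock minutes imposed by the rate limit alone
        "min_minutes": chunks / chunks_per_min,
    }


plan = upload_plan(size_mb=10_000, chunk_mb=4)
# chunks=2500, units=12500, chunks_per_min=200, min_interval_s=0.3, min_minutes=12.5
```

At the default budget, a 10 GB upload therefore needs at least 12.5 minutes of request budget, however fast the network is. Staggering chunks at about one PUT every 0.3 seconds stays under the limit. A raised limit shortens that floor proportionally.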

WebSocket limits

WebSocket connections barely touch the request budget: the handshake counts as one request, and subsequent frames are free. However:

  • Connection count is limited per-token (default 10) and per-tenant (default 10000). Hitting a limit returns WS close code 4429.
  • Message rate is limited loosely (1000 msg/min per connection) to prevent abuse. Excess is dropped, not queued.
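Excess messages are dropped rather than queued, so a client that sends in bursts should cap its own send rate. Below is a minimal sketch of a client-side sliding-window log limiter. It assumes the default of 1000 messages per 60 seconds per connection. `MessageRateLimiter` is a hypothetical class, and the injectable clock exists only to make it testable.

```python
import time
from collections import deque
from typing import Callable, Deque


class MessageRateLimiter:
    """Allow at most `max_messages` sends in any rolling `window_s` seconds."""

    def __init__(self, max_messages: int = 1000, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_messages = max_messages
        self.window_s = window_s
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of sends still in the window

    def _evict(self, now: float) -> None:
        # Drop timestamps that have aged out of the rolling window
        while self.sent and now - self.sent[0] >= self.window_s:
            self.sent.popleft()

    def try_acquire(self) -> bool:
        """Record a send and return True if under the limit, else return False."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.max_messages:
            self.sent.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next send would be allowed (0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.max_messages:
            return 0.0
        return self.sent[0] + self.window_s - now
```

When `try_acquire()` returns False, the caller decides whether to buffer the message locally or drop it. The server itself will not queue it.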

Connectors and background work

Connector syncs do not count against interactive user limits. They have their own, higher, per-connector budget.

Background jobs (vectorization indexing, quota recalculation) are server-internal and not subject to request limits.

Tenant admin controls

Tenant admins can view and tune limits via the admin API:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits

curl -X PATCH $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"per_user_per_minute": 10000}'
```

Raised limits are capped by the partner-level ceiling. Your partner admin sets the maximum, and your tenant admin sets the configured value at or below it.
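The same PATCH can be issued from Python using only the standard library. Below is a minimal sketch. The endpoint path and the `per_user_per_minute` field come from the curl example. `build_rate_limit_patch` and the `https://drive.example.com` base URL are hypothetical placeholders, and actually sending the request (commented out) requires a tenant-admin token.

```python
import json
import urllib.request


def build_rate_limit_patch(base_url: str, token: str, **limits: int) -> urllib.request.Request:
    """Build (but do not send) a PATCH to the tenant rate-limits admin endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/admin/tenant/rate-limits",
        data=json.dumps(limits).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_rate_limit_patch("https://drive.example.com", "TOKEN", per_user_per_minute=10000)
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```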


Updated 2026-05-18 15:04:19, rev 2