---
title: Rate Limiting
path: advanced/rate-limiting
status: published
---

ScaiDrive applies rate limits at three layers: per-token, per-user, and per-tenant. Limits are enforced in Redis with a sliding-window algorithm. A request that exceeds a limit receives HTTP `429` with error code `RATE_LIMITED` and a `Retry-After` header.
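To make the sliding-window idea concrete, here is a minimal in-memory sketch. It is not ScaiDrive's Redis implementation; the class name, the `allow` method, and the log-based eviction are assumptions chosen for illustration.

```python
from collections import deque


class SlidingWindowLimiter:
    """In-memory sliding-window log (hypothetical, for illustration only)."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self.events = deque()  # (timestamp, cost) pairs, oldest first
        self.used = 0          # sum of costs currently inside the window

    def allow(self, now: float, cost: int = 1) -> bool:
        # Evict events that have slid out of the window ending at `now`.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_cost = self.events.popleft()
            self.used -= old_cost
        if self.used + cost > self.limit:
            return False  # a server would answer 429 at this point
        self.events.append((now, cost))
        self.used += cost
        return True
```

Unlike a fixed window, budget frees up gradually as individual requests age out, so there is no burst at a window boundary.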

## Limit dimensions

| Dimension | Counted against | Typical limit |
|-----------|-----------------|---------------|
| Per-token | The specific access token | 1000 req/min |
| Per-user | The user, across all their tokens | 5000 req/min |
| Per-tenant | The tenant total | 50000 req/min |

These numbers are defaults. Tenant admins can raise or lower them for their tenant, up to a ceiling set by partner admins.

## Endpoint classes

Different endpoint classes have different costs:

| Class | Example endpoints | Cost |
|-------|-------------------|------|
| Metadata | `GET /api/v1/files/{id}` | 1 |
| List | `GET /api/v1/folders/{id}/children` | 2 |
| Write | `PATCH /api/v1/files/{id}` | 2 |
| Upload/Download | `/content`, `/sync/download` | 5 |
| Search | `/search/*` (except the semantic endpoints) | 10 |
| Semantic search | `/search/semantic`, `/search/context` | 20 |

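Every request deducts its class's cost from your quota, so limits are effectively counted in cost units rather than raw requests. A token limited to 1000 per minute can make 1000 metadata calls, or 50 semantic searches (1000 / 20), or any mix that totals 1000 units.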
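The arithmetic can be sketched as follows. The costs are copied from the table above; the dictionary keys and function names are illustrative labels, not identifiers from the ScaiDrive API.

```python
# Costs from the endpoint-class table; keys are hypothetical labels.
CLASS_COST = {
    "metadata": 1,
    "list": 2,
    "write": 2,
    "transfer": 5,          # upload/download
    "search": 10,
    "semantic_search": 20,
}


def units_used(calls: dict[str, int]) -> int:
    """Total units consumed by a mix of calls, keyed by endpoint class."""
    return sum(CLASS_COST[name] * count for name, count in calls.items())


def max_calls(budget: int, endpoint_class: str) -> int:
    """How many calls of a single class fit into `budget` units."""
    return budget // CLASS_COST[endpoint_class]


mix = {"metadata": 400, "list": 100, "semantic_search": 20}
print(units_used(mix))                      # 400 + 200 + 400 = 1000
print(max_calls(1000, "semantic_search"))   # 50
```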

## Identifying limits in responses

Every response includes the current state:

```http
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 947
X-RateLimit-Reset: 1713888060
```

- `X-RateLimit-Limit` — quota per minute for this token.
- `X-RateLimit-Remaining` — budget left this window.
- `X-RateLimit-Reset` — Unix timestamp when the window resets.

On `429`, additionally:

```http
Retry-After: 22
```

The value is the number of seconds to wait before a retry is safe.
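A client might collect these headers into one structure. The sketch below assumes the headers arrive as a plain string mapping and that `Retry-After` is always an integer number of seconds, as in the example; `RateLimitState` and `parse_rate_limit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitState:
    limit: int
    remaining: int
    reset_at: int               # Unix timestamp from X-RateLimit-Reset
    retry_after: Optional[int]  # seconds; only present on a 429

    def seconds_until_reset(self, now: float) -> float:
        return max(0.0, self.reset_at - now)

    def fraction_remaining(self) -> float:
        return self.remaining / self.limit if self.limit else 0.0


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitState:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    retry = h.get("retry-after")
    return RateLimitState(
        limit=int(h["x-ratelimit-limit"]),
        remaining=int(h["x-ratelimit-remaining"]),
        reset_at=int(h["x-ratelimit-reset"]),
        retry_after=int(retry) if retry is not None else None,
    )
```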

## 429 response shape

```json
{
  "status": "error",
  "error": {
    "code": "RATE_LIMITED",
    "message": "Per-token rate limit exceeded",
    "retry_after": 22,
    "details": {
      "dimension": "token",
      "limit": 1000,
      "window_seconds": 60
    }
  },
  "meta": {"request_id": "req_xyz"}
}
```

`details.dimension` identifies which layer you hit: `token`, `user`, or `tenant`.
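For logging or error reporting, the body can be reduced to a one-line summary. This sketch assumes the body always has the shape shown above; the function name is hypothetical.

```python
import json


def describe_rate_limit_error(body: str) -> str:
    """Summarise a 429 body shaped like the documented example."""
    payload = json.loads(body)
    error = payload["error"]
    if error["code"] != "RATE_LIMITED":
        raise ValueError(f"not a rate-limit error: {error['code']}")
    details = error["details"]
    return (
        f"{details['dimension']} limit of {details['limit']} per "
        f"{details['window_seconds']}s exceeded; "
        f"retry in {error['retry_after']}s "
        f"(request {payload['meta']['request_id']})"
    )
```

Including `request_id` in the message makes it easier to correlate a client-side failure with server logs.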

## Client-side handling

A well-behaved client:

1. Watches `X-RateLimit-Remaining` and slows down proactively once it falls below 10% of `X-RateLimit-Limit`.
2. On 429, sleeps for `retry_after` seconds with jitter (±20%) and retries.
3. Tracks retry counts; after 3 failed retries, fails the operation and surfaces the error.
4. Never busy-loops.

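```python
import random
import time

MAX_ATTEMPTS = 4  # one initial try plus three retries


def request_with_retry(client, method, path, **kw):
    """Send a request, retrying on 429 with jittered Retry-After delays."""
    for attempt in range(MAX_ATTEMPTS):
        r = client.request(method, path, **kw)
        if r.status_code != 429:
            return r
        if attempt < MAX_ATTEMPTS - 1:  # don't sleep after the final attempt
            # Retry-After is in seconds; fall back to 2 s if it is missing.
            delay = int(r.headers.get("Retry-After", 2))
            time.sleep(delay * (0.8 + 0.4 * random.random()))  # ±20% jitter
    # Retries exhausted: surface the final 429 to the caller.
    r.raise_for_status()
```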

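```typescript
async function withRetry(req: () => Promise<Response>): Promise<Response> {
  const maxAttempts = 4; // one initial try plus three retries
  for (let i = 0; i < maxAttempts; i++) {
    const r = await req();
    if (r.status !== 429) return r;
    if (i === maxAttempts - 1) break; // don't sleep after the final attempt
    // Retry-After is in seconds; fall back to 2 s if the header is missing.
    const delay = Number(r.headers.get("Retry-After") ?? 2);
    const jitter = 0.8 + 0.4 * Math.random(); // ±20%
    await new Promise((res) => setTimeout(res, delay * 1000 * jitter));
  }
  throw new Error("rate limit retries exhausted");
}
```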
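The retry helpers above handle 429 responses but not the proactive slowdown in step 1. One possible approach, sketched here, spreads the remaining budget evenly over the rest of the window once it drops below the threshold. The function name and the even-spreading strategy are assumptions, and the sketch treats every request as cost 1.

```python
def proactive_delay(remaining: int, limit: int, reset_at: float, now: float,
                    threshold: float = 0.10) -> float:
    """Seconds to pause before the next request; 0.0 while budget is healthy."""
    if limit <= 0 or remaining / limit >= threshold:
        return 0.0
    seconds_left = max(0.0, reset_at - now)
    # Spread what is left of the budget evenly over the rest of the window.
    # With nothing left, this waits out the whole remainder of the window.
    return seconds_left / max(remaining, 1)
```

A caller would read `remaining`, `limit`, and `reset_at` from the `X-RateLimit-*` headers of the previous response and sleep for the returned duration before sending the next request.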

## Upload limits

Each chunk PUT in the upload session protocol is billed as Upload/Download (cost 5). A 10 GB upload split into 4 MB chunks is 2500 requests; at cost 5 each, that is 12500 units, far more than one minute's budget. At the default per-token limit of 1000 per minute, the upload needs at least 12.5 minutes.

For batch uploads, either:

- Stagger chunks so peak request rate stays under your limit; or
- Request a raised limit for the upload window from your admin.

Bandwidth itself is not rate-limited; only the request count is.
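To plan a staggered upload, the chunk count and pacing can be estimated up front. This sketch assumes decimal units (10 GB = 10,000,000,000 bytes; with binary units the same upload is 2560 chunks), that the upload may use the whole per-token budget, and the default limit of 1000 per minute. `upload_plan` is a hypothetical helper.

```python
def upload_plan(total_bytes: int, chunk_bytes: int,
                units_per_minute: int = 1000, cost_per_chunk: int = 5) -> dict:
    """Estimate request cost and pacing for a chunked upload."""
    chunks = -(-total_bytes // chunk_bytes)  # integer ceiling division
    units = chunks * cost_per_chunk
    chunks_per_minute = units_per_minute // cost_per_chunk
    return {
        "chunks": chunks,
        "units": units,
        "min_minutes": units / units_per_minute,
        "seconds_between_chunks": 60 / chunks_per_minute,
    }


plan = upload_plan(10_000_000_000, 4_000_000)
# chunks=2500, units=12500, min_minutes=12.5, seconds_between_chunks=0.3
```

In practice a client would pace somewhat slower than `seconds_between_chunks` to leave budget for other calls made with the same token.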

## WebSocket limits

WebSocket connections don't consume request budget — the handshake is one request, and subsequent frames are free. However:

- **Connection count** is limited per-token (default 10) and per-tenant (default 10000). Hitting either limit results in WebSocket close code `4429`.
- **Message rate** is limited loosely (1000 msg/min per connection) to prevent abuse. Excess messages are dropped, not queued.
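Because excess messages are dropped rather than queued, a chatty client may want to pace its own sends. The sketch below spaces messages evenly to stay at or below the per-connection rate; `MessagePacer` is a hypothetical helper, and even spacing is just one possible strategy.

```python
class MessagePacer:
    """Client-side spacing so sends stay under a per-connection message rate."""

    def __init__(self, max_per_minute: int = 1000):
        self.interval = 60.0 / max_per_minute  # 0.06 s at the default rate
        self.next_slot = 0.0

    def wait_time(self, now: float) -> float:
        """Reserve the next send slot; return how long to sleep before sending."""
        slot = max(now, self.next_slot)
        self.next_slot = slot + self.interval
        return slot - now
```

A sender would call `time.sleep(pacer.wait_time(time.monotonic()))` immediately before writing each frame.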

## Connectors and background work

Connector syncs do not count against interactive user limits. They have their own, higher, per-connector budget.

Background jobs (vectorization indexing, quota recalculation) are server-internal and not subject to request limits.

## Tenant admin controls

Tenant admins can view and tune limits via the admin API:

```bash
curl -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
     $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits

curl -X PATCH $SCAIDRIVE_URL/api/v1/admin/tenant/rate-limits \
  -H "Authorization: Bearer $SCAIDRIVE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"per_user_per_minute": 10000}'
```

Raised limits cannot exceed the partner-level ceiling. Your partner admin sets the maximum; your tenant admin sets the configured value within it.

## What's next

- [Errors](/docs/scaidrive/core-concepts/errors)
- [Troubleshooting](/docs/scaidrive/operations/troubleshooting)