---
title: Rate Limiting
path: concepts/rate-limiting
status: published
---

# Rate Limiting

ScaiSend rate-limits requests to protect the service from floods and to keep individual tenants from starving each other. This page describes the limits, the response format, and how to design a client that stays under them.

## Where limits apply

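Rate limits are enforced at the API layer on specific endpoints. Depending on the endpoint, a limit is scoped either per tenant (shared by every credential in the tenant) or per credential (a single API key or user JWT). The most commonly hit limits: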

| Endpoint | Default limit |
|----------|---------------|
| `POST /v3/mail/send` | 10,000 requests/second per tenant |
| `POST /v3/user/webhooks` | 10 requests/minute per tenant |
| `POST /v3/api_keys` | 10 requests/minute per tenant |
| `GET /v3/messages` | 100 requests/second per credential |
| Everything else | 1,000 requests/second per credential |

Limits are global defaults for a standard ScaiSend deployment; a self-hosted operator can tune them via configuration. Per-tenant overrides are possible but rare.

## Response when rate-limited

A limited request gets `429 Too Many Requests`:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1713888005
Content-Type: application/json

{"detail": "Rate limit exceeded"}
```

| Header | Meaning |
|--------|---------|
| `Retry-After` | Seconds to wait before the next attempt |
| `X-RateLimit-Limit` | Requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |

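**Always honor `Retry-After`.** Waiting at least that long before retrying puts your next attempt in a fresh window, but it doesn't guarantee success: other clients sharing the same tenant or credential may use up the new window first. Add a little random jitter on top so that clients limited at the same moment don't all retry in the same instant.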
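A minimal sketch of turning these headers into a wait time. It assumes ScaiSend sends `Retry-After` as whole seconds (as in the example above), but also tolerates the HTTP-date form that HTTP allows for this header, and falls back to `X-RateLimit-Reset` when `Retry-After` is absent. The function names, the 1-second default and the 1-second jitter are illustrative choices, not part of the ScaiSend API.

```python
import random
import time
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def _parse_retry_after(value: str, now: float) -> Optional[float]:
    """Seconds until retry from a Retry-After value, or None if unparseable."""
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "3"
    try:
        # HTTP-date form, e.g. "Tue, 23 Apr 2024 16:00:05 GMT"
        return parsedate_to_datetime(value).timestamp() - now
    except (TypeError, ValueError):
        return None


def retry_delay(headers: Mapping[str, str], now: Optional[float] = None,
                default: float = 1.0, jitter: float = 1.0) -> float:
    """Seconds to wait after a 429: Retry-After, else X-RateLimit-Reset, else `default`.

    Up to `jitter` seconds of randomness are added so that clients limited at
    the same moment don't retry in lockstep.
    """
    now = time.time() if now is None else now
    delay: Optional[float] = None
    if "Retry-After" in headers:
        delay = _parse_retry_after(headers["Retry-After"], now)
    if delay is None and "X-RateLimit-Reset" in headers:
        # X-RateLimit-Reset is the Unix timestamp at which the window resets.
        delay = float(headers["X-RateLimit-Reset"]) - now
    if delay is None:
        delay = default
    return max(0.0, delay) + random.uniform(0.0, jitter)
```

Because `httpx` response headers are a case-insensitive mapping, `retry_delay(resp.headers)` could stand in for the inline `Retry-After` parsing in the Python send loop.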

## A rate-limit-aware send loop

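```python
import os
import random
import time

import httpx

API_KEY = os.environ["SCAISEND_API_KEY"]


def send(body: dict, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        resp = httpx.post(
            "https://scaisend.scailabs.ai/v3/mail/send",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=body,
            timeout=30,
        )
        if resp.status_code == 202:
            # Accepted: return the ID of the queued message.
            return resp.json()["message_id"]
        if resp.status_code == 429:
            # Rate-limited: wait as long as the server asks, plus up to 1 s of jitter.
            delay = int(resp.headers.get("Retry-After", "1"))
            time.sleep(delay + random.random())
            continue
        if resp.status_code >= 500:
            # Server-side problem: exponential backoff (1, 2, 4, ... s) plus jitter.
            time.sleep((2 ** attempt) + random.random())
            continue
        # Anything else is not retryable. raise_for_status() raises for non-2xx
        # codes; an unexpected 2xx other than 202 is treated as an error too,
        # because retrying it could send the message twice.
        resp.raise_for_status()
        raise RuntimeError(f"unexpected ScaiSend status {resp.status_code}")
    raise RuntimeError(f"send failed after {max_attempts} attempts")
```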

```typescript
async function send(body: unknown, maxAttempts = 4): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const resp = await fetch("https://scaisend.scailabs.ai/v3/mail/send", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.SCAISEND_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (resp.status === 202) return (await resp.json()).message_id;
    if (resp.status === 429) {
      const delay = Number(resp.headers.get("Retry-After") ?? 1);
      await new Promise((r) => setTimeout(r, (delay + Math.random()) * 1000));
      continue;
    }
    if (resp.status >= 500) {
      await new Promise((r) => setTimeout(r, (Math.pow(2, attempt) + Math.random()) * 1000));
      continue;
    }
    throw new Error(`ScaiSend ${resp.status}: ${await resp.text()}`);
  }
  throw new Error("send failed");
}
```

Three characteristics of a good loop:

1. **Honors `Retry-After` on 429.** Don't sleep for your own arbitrary duration; use the header.
2. **Exponential backoff for 5xx.** Bigger delay each attempt; add jitter to avoid synchronized retries from many clients.
3. **Caps retries.** Four attempts total is usually enough; beyond that, something structural is wrong — fail loudly.
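The backoff schedule from item 2 can be made explicit. This sketch uses the same `2 ** attempt` base as the loops above, but adds a ceiling (`cap`, an assumed 30 seconds) so that raising `max_attempts` can't produce multi-minute sleeps, and it returns `None` once the attempt budget is spent, so the caller knows to fail loudly instead of sleeping after the final attempt. The name and defaults are illustrative.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, max_attempts: int = 4, base: float = 1.0,
                  cap: float = 30.0, jitter: float = 1.0) -> Optional[float]:
    """Delay in seconds after zero-based `attempt` failed, or None to stop retrying.

    `attempt` matches the loop variable of `range(max_attempts)` in the send loops.
    """
    if attempt >= max_attempts - 1:
        return None  # that was the last allowed attempt: give up and raise
    # Exponential growth, capped, plus random jitter to de-synchronize clients.
    return min(cap, base * 2 ** attempt) + random.uniform(0.0, jitter)
```

With the defaults and `jitter=0`, attempts 0, 1 and 2 yield 1, 2 and 4 seconds, and attempt 3 yields `None`.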

## Proactive pacing

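If you need to send ~100k messages and your limit is 10,000 requests/second, total volume isn't the problem: even at the full rate, that's only about ten seconds of traffic. The problem is a burst, such as a large worker pool firing at once, that exceeds the per-second budget.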

**Token bucket.** Rate-limit your side before you even call:

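```python
import threading
import time


class TokenBucket:
    """Allows `rate` acquisitions per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a token is available, then consume it."""
        with self.lock:
            now = time.monotonic()
            # Refill for the time elapsed since the last update, capped at capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < 1:
                # Sleep until one whole token has accrued. The lock stays held, so
                # other threads queue behind this one instead of racing for tokens.
                wait = (1 - self.tokens) / self.rate
                time.sleep(wait)
                # This call consumes the token that accrued while sleeping; advance
                # the clock past the sleep so the next caller doesn't count it again.
                self.tokens = 0.0
                self.last = now + wait
            else:
                self.tokens -= 1


bucket = TokenBucket(rate=5000, capacity=5000)  # half of the 10,000/s limit

for user in users:  # `users` and `build_body` stand in for your own code
    bucket.acquire()
    send(build_body(user))
```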

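Setting your client's rate to half the server's limit leaves headroom for occasional bursts, for retries, and for other clients drawing on the same tenant or credential budget. The bucket is thread-safe, so one instance can be shared by a pool of worker threads; a single sequential loop like the one above is usually limited by request latency long before it reaches 5,000 requests/second.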
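To choose the `rate` for a bucket, you could encode the default limits from the table at the top of this page and divide each budget among the clients that share it. This sketch assumes the defaults of a standard deployment (a self-hosted operator may have changed them) and that you know how many clients share each tenant or credential; `DEFAULT_LIMITS` and `client_rate` are hypothetical helpers, not part of any ScaiSend SDK.

```python
# Default limits from the table above, as (requests, window_seconds).
DEFAULT_LIMITS = {
    "POST /v3/mail/send": (10_000, 1),
    "POST /v3/user/webhooks": (10, 60),
    "POST /v3/api_keys": (10, 60),
    "GET /v3/messages": (100, 1),
}
FALLBACK_LIMIT = (1_000, 1)  # "Everything else"


def client_rate(endpoint: str, clients: int = 1, fraction: float = 0.5) -> float:
    """Requests/second one client should aim for on `endpoint`.

    `fraction` reserves headroom (0.5 = half the limit); `clients` is how many
    clients share the same tenant or credential budget for that endpoint.
    """
    requests, window = DEFAULT_LIMITS.get(endpoint, FALLBACK_LIMIT)
    return requests / window * fraction / clients
```

For example, four workers sharing one tenant would each use `TokenBucket(rate=client_rate("POST /v3/mail/send", clients=4), ...)`, i.e. 1,250 requests/second apiece.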

## Spreading sends over time

For marketing campaigns, use `send_at` to distribute across a window:

```python
import random, time

base = int(time.time()) + 60  # 1 minute from now
for i, user in enumerate(users):
    # Spread evenly across 5 minutes, plus up to 15 s of random jitter
    send_at = base + (i * 300 // len(users)) + random.randint(0, 15)
    send_with_schedule(user, send_at=send_at)  # stand-in for your own send call
```

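This way, you submit everything at once (fast), and ScaiSend's scheduler releases the messages smoothly over the window. Scheduling spreads delivery, not submission: each submission is still a `POST /v3/mail/send` request that counts against that endpoint's limit, so pace large batches as described above. Your SMTP infrastructure also sees a smoother load.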

## Per-endpoint considerations

### `/v3/mail/send`

The big one. Its 10,000 requests/second limit is per tenant, and a single client sending one request at a time can't come close to it, so you're unlikely to hit it unless many concurrent senders share one tenant.
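The reason a single client rarely gets there is concurrency. By Little's law, the number of requests in flight equals throughput times latency. A back-of-the-envelope sketch, assuming a 50 ms round trip per request (an example figure only; measure your own):

```python
import math


def required_concurrency(rate_per_s: float, latency_ms: float) -> int:
    """Requests that must be in flight to sustain `rate_per_s` (Little's law: L = rate * W)."""
    return math.ceil(rate_per_s * latency_ms / 1000)


# At an assumed 50 ms per request, one sequential client manages ~20 requests/s,
# and sustaining the 10,000/s tenant limit takes ~500 concurrent requests.
print(required_concurrency(10_000, 50))  # prints 500
```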

### `/v3/messages` (list)

Lower limit (100 requests/second per credential) because each call hits the database hard. If you need to sweep through messages, paginate with `page_size=100` and space out your page requests instead of fetching them back to back.
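A minimal sketch of a paced sweep. It avoids guessing at pagination parameters beyond `page_size`: `fetch_page` is a hypothetical callable you supply that requests one page of `GET /v3/messages` (however your integration pages through it) and returns the page's items plus a token for the next page, or `None` on the last page. The default `min_interval` of 0.1 s (at most 10 pages/second) is an assumed value, well under the 100/s limit.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: takes a page token (None for the first page) and
# returns (items, next_token), with next_token None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def sweep(fetch_page: FetchPage, min_interval: float = 0.1) -> Iterator[dict]:
    """Yield every item across all pages, starting page requests at least `min_interval` apart."""
    token: Optional[str] = None
    last_start = float("-inf")
    while True:
        # Wait until min_interval has passed since the previous page request began.
        wait = last_start + min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        last_start = time.monotonic()
        items, token = fetch_page(token)
        yield from items
        if token is None:
            break
```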

### Admin endpoints

Administrative operations (creating keys, creating domains, rotating DKIM) have low limits, typically 10 requests/minute, because they are meant to be human-initiated rather than automated.

## Monitoring

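The `X-RateLimit-Remaining` header tells you how close you are to the limit. Compare it with `X-RateLimit-Limit` rather than with a fixed number, since limits differ by orders of magnitude between endpoints, and log a warning when it runs low:

```python
limit = int(resp.headers.get("X-RateLimit-Limit", "0"))
remaining = int(resp.headers.get("X-RateLimit-Remaining", str(limit)))
if limit and remaining < limit * 0.1:  # less than 10% of the window left
    logger.warning("approaching rate limit: %d of %d remaining", remaining, limit)
```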

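Alert if `X-RateLimit-Remaining` stays low on a particular credential. That's your signal to spread your send load over time or to split traffic across multiple API keys (one per sender application). Splitting keys only raises headroom on per-credential limits; per-tenant limits such as `POST /v3/mail/send` are shared by every key in the tenant.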
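One way to turn "stays low" into an alert condition is to keep a short history of samples per credential and fire only when most recent responses were near the limit, so that a single burst doesn't page anyone. The window size (50 samples), the 10% headroom threshold and the 80% trigger ratio are assumed values to tune; `HeadroomMonitor` is a hypothetical helper.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class HeadroomMonitor:
    """Flags credentials whose X-RateLimit-Remaining is low in most recent responses."""

    def __init__(self, window: int = 50, low_fraction: float = 0.1, trigger: float = 0.8):
        self.window = window
        self.low_fraction = low_fraction  # "low" = under 10% of the limit remaining
        self.trigger = trigger            # alert when 80% of recent samples are low
        self.samples: Dict[str, Deque[bool]] = defaultdict(lambda: deque(maxlen=window))

    def record(self, credential: str, limit: int, remaining: int) -> bool:
        """Record one response's headers; return True if this credential should alert."""
        history = self.samples[credential]
        history.append(remaining < limit * self.low_fraction)
        if len(history) < self.window:
            return False  # not enough data yet to call it "consistent"
        return sum(history) / len(history) >= self.trigger
```

Each response would feed it the parsed `X-RateLimit-Limit` and `X-RateLimit-Remaining` values along with an identifier for the credential that made the request.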

## 503 vs 429

`429` is rate-limiting: "slow down." `503` is service-unavailable: "a dependency is down." Both are retryable, but they mean different things:

- `429`: you (or other clients sharing your tenant or credential) are sending faster than the limit allows.
- `503`: something is unhealthy server-side; requests will get through once it recovers.

Retry both. Differentiate in monitoring: persistent 429 means you need to pace; persistent 503 means you should be paging someone.
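A small sketch of keeping the two apart in code: classify each status once, use the class to decide whether to retry, and count classes per credential so dashboards can show persistent 429s and persistent 5xx separately. The class names and the `Counter`-based tally are illustrative; substitute your own metrics library.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    RATE_LIMITED = "rate_limited"  # 429: slow down, honor Retry-After
    SERVER_ERROR = "server_error"  # 5xx including 503: back off; may need paging
    FAILED = "failed"              # anything else (e.g. 4xx): fix the request, don't retry


def classify(status: int) -> Outcome:
    if 200 <= status < 300:
        return Outcome.OK
    if status == 429:
        return Outcome.RATE_LIMITED
    if status >= 500:
        return Outcome.SERVER_ERROR
    return Outcome.FAILED


def is_retryable(outcome: Outcome) -> bool:
    return outcome in (Outcome.RATE_LIMITED, Outcome.SERVER_ERROR)


# Per-credential tallies, e.g. exported to a metrics system periodically.
outcomes: Counter = Counter()
for status in (202, 429, 429, 503, 400):
    outcomes[("key-a", classify(status))] += 1
```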

## Related

- [Errors](errors) — general error handling.
- [Your First Integration](../tutorials/first-integration) — retry loop recipe.
- [Sending Mail](../tutorials/sending-mail) — `send_at` for spreading load.
