Rate Limiting

ScaiSend rate-limits requests to protect the service from floods and to keep individual tenants from starving each other. This page describes the limits, the response format, and how to design a client that stays under them.

Where limits apply

Rate limits are enforced at the API layer on specific endpoints, scoped either per tenant or per credential (API key or user JWT) depending on the endpoint. The most commonly hit limits:

Endpoint	Default limit
`POST /v3/mail/send`	10,000 requests/second per tenant
`POST /v3/user/webhooks`	10 requests/minute per tenant
`POST /v3/api_keys`	10 requests/minute per tenant
`GET /v3/messages`	100 requests/second per credential
Everything else	1,000 requests/second per credential

Limits are global defaults for a standard ScaiSend deployment; a self-hosted operator can tune them via configuration. Per-tenant overrides are possible but rare.

Response when rate-limited

A limited request gets 429 Too Many Requests:

http
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1713888005
Content-Type: application/json

{"detail": "Rate limit exceeded"}

Header	Meaning
`Retry-After`	Seconds to wait before the next attempt
`X-RateLimit-Limit`	Requests allowed in the current window
`X-RateLimit-Remaining`	Requests remaining in the window
`X-RateLimit-Reset`	Unix timestamp when the window resets

Always honor Retry-After. Waiting at least that long should put your retry in the next window; adding a small random jitter on top keeps many clients from retrying at the same instant.
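The headers above can be read into a small structure before deciding how long to wait. The following is a minimal sketch, not part of any ScaiSend SDK: `RateLimitInfo`, `retry_after_seconds`, and `parse_rate_limit` are hypothetical names. It also accepts the HTTP-date form of Retry-After that HTTP allows, which is an assumption on top of this page (the example response only shows integer seconds).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]          # X-RateLimit-Limit
    remaining: Optional[int]      # X-RateLimit-Remaining
    reset_epoch: Optional[int]    # X-RateLimit-Reset (Unix timestamp)
    retry_after: Optional[float]  # Retry-After, normalized to seconds


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return the header as an int, or None if missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Normalize Retry-After (delta-seconds or HTTP-date) to seconds from now."""
    if value is None:
        return None
    try:
        return max(0.0, float(int(value)))
    except ValueError:
        pass  # not an integer; try the HTTP-date form below
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    # A plain dict is case-sensitive; httpx's resp.headers is not.
    return RateLimitInfo(
        limit=_to_int(headers.get("X-RateLimit-Limit")),
        remaining=_to_int(headers.get("X-RateLimit-Remaining")),
        reset_epoch=_to_int(headers.get("X-RateLimit-Reset")),
        retry_after=retry_after_seconds(headers.get("Retry-After")),
    )
```

With httpx, `parse_rate_limit(resp.headers)` would work directly, since its header mapping supports `.get()`.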

A rate-limit-aware send loop

python
import os, time, random, httpx

API_KEY = os.environ["SCAISEND_API_KEY"]


def send(body: dict, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        resp = httpx.post(
            "https://scaisend.scailabs.ai/v3/mail/send",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=body,
            timeout=30,
        )
        if resp.status_code == 202:
            return resp.json()["message_id"]
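        if resp.status_code == 429:
            # Wait at least as long as the server asks, plus jitter.
            delay = int(resp.headers.get("Retry-After", "1"))
            time.sleep(delay + random.random())
            continue
        if resp.status_code >= 500:
            # Exponential backoff with jitter for server-side errors.
            time.sleep((2 ** attempt) + random.random())
            continue
        # Anything else is not retryable: raise on 4xx, and treat an
        # unexpected non-202 success as an error rather than re-sending.
        resp.raise_for_status()
        raise RuntimeError(f"unexpected status {resp.status_code}")
    raise RuntimeError(f"send failed after {max_attempts} attempts")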

typescript
async function send(body: unknown, maxAttempts = 4): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const resp = await fetch("https://scaisend.scailabs.ai/v3/mail/send", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.SCAISEND_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (resp.status === 202) return (await resp.json()).message_id;
    if (resp.status === 429) {
      const delay = Number(resp.headers.get("Retry-After") ?? 1);
      await new Promise((r) => setTimeout(r, (delay + Math.random()) * 1000));
      continue;
    }
    if (resp.status >= 500) {
      await new Promise((r) => setTimeout(r, (Math.pow(2, attempt) + Math.random()) * 1000));
      continue;
    }
    throw new Error(`ScaiSend ${resp.status}: ${await resp.text()}`);
  }
  throw new Error("send failed");
}

Three characteristics of a good loop:

Honors Retry-After on 429. Don't sleep for your own arbitrary duration; use the header.
Exponential backoff for 5xx. Bigger delay each attempt; add jitter to avoid synchronized retries from many clients.
Caps retries. Four attempts total is usually enough; beyond that, something structural is wrong — fail loudly.
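The first two characteristics can be isolated into one pure function, which makes the delay policy easy to unit-test apart from the HTTP call. This is a sketch under stated assumptions: `retry_delay` is a hypothetical helper, and the `cap` on the exponential term is an addition that the loops above do not have.

```python
import random
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Seconds to sleep before the next attempt (attempt counts from 0)."""
    rng = rng or random.Random()
    jitter = rng.random()  # uniform in [0, 1), desynchronizes clients
    if retry_after is not None:
        # 429: never wait less than the server asked for.
        return retry_after + jitter
    # 5xx: exponential backoff, capped so late attempts stay bounded.
    return min(cap, base * (2 ** attempt)) + jitter
```

In a send loop, the 429 branch would pass the parsed Retry-After value and the 5xx branch would pass only `attempt`.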

Proactive pacing

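If you need to send ~100k messages and your limit is 10,000 RPS, you have headroom overall: at the full rate the whole batch would fit in about 10 seconds. The problem arises when a burst, or several clients sharing the same limit, momentarily exceeds that budget.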

Token bucket. Rate-limit your side before you even call:

python
import time
import threading

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

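    def acquire(self):
        """Block until one token is available, then consume it."""
        with self.lock:
            now = time.monotonic()
            # Refill for the time elapsed since the last call, up to capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < 1:
                # Sleep until one token has accrued. Holding the lock while
                # sleeping makes other threads queue up behind this one.
                time.sleep((1 - self.tokens) / self.rate)
                # The time slept paid for this token; restart the clock so
                # it is not credited again on the next refill.
                self.last = time.monotonic()
                self.tokens = 0
            else:
                self.tokens -= 1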


bucket = TokenBucket(rate=5000, capacity=5000)  # half of the limit

for user in users:
    bucket.acquire()
    send(build_body(user))

Setting your client's rate to half the server's limit gives you headroom for occasional bursts and concurrent clients.
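When several workers share one limit, the same headroom rule can be applied as a quick calculation. The helpers below are hypothetical (`per_worker_rate` and `min_send_seconds` are illustrative names), and the 0.5 default simply mirrors the "half the limit" guidance above.

```python
def per_worker_rate(server_limit: float, workers: int, fraction: float = 0.5) -> float:
    """Client-side rate for each worker so that all workers together
    use only `fraction` of the server-side limit."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return server_limit * fraction / workers


def min_send_seconds(messages: int, rate: float) -> float:
    """Lower bound on how long a batch takes at a sustained rate."""
    return messages / rate


# Four workers sharing the 10,000 requests/second send limit:
rate = per_worker_rate(10_000, workers=4)        # 1250.0 per worker
duration = min_send_seconds(100_000, rate * 4)   # 20.0 seconds at half the limit
```

Each worker would then construct its own bucket with that rate, for example `TokenBucket(rate=rate, capacity=int(rate))`.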

Spreading sends over time

For marketing campaigns, use send_at to distribute across a window:

python
import random, time

base = int(time.time()) + 60  # 1 minute from now
for i, user in enumerate(users):
    # Spread across 5 minutes with random jitter
    send_at = base + (i * 300 // len(users)) + random.randint(0, 15)
    send_with_schedule(user, send_at=send_at)

This way, you queue everything at once (fast), and ScaiSend's scheduler releases messages smoothly over the window. Your SMTP infrastructure also enjoys smoother load.
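The schedule arithmetic can be pulled out into a function that returns the timestamps, so the spread can be inspected before anything is queued. This is a sketch: `spread_send_times` is a hypothetical helper, and the 300-second window and 15-second jitter just mirror the example above.

```python
import random
from typing import List, Optional


def spread_send_times(
    count: int,
    start: int,
    window_seconds: int = 300,
    max_jitter: int = 15,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return `count` Unix timestamps spread evenly across the window,
    each pushed later by a random jitter of 0..max_jitter seconds."""
    rng = rng or random.Random()
    return [
        start + (i * window_seconds) // count + rng.randint(0, max_jitter)
        for i in range(count)
    ]
```

Pairing the result with the recipients, for example `zip(users, spread_send_times(len(users), base))`, gives one `send_at` value per message.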

Per-endpoint considerations

`/v3/mail/send`

The big one. 10,000/s is the primary limit; realistically, a single client can't sustain that, so you won't hit it unless you're a platform with many concurrent senders.

`/v3/messages` (list)

Lower limit (100/s) because this hits the database hard on each call. If you need to sweep through messages, paginate with page_size=100 and space your pagination out.
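A paced sweep might look like the sketch below. It deliberately leaves the HTTP call to a caller-supplied `fetch_page` function, because this page does not specify how `/v3/messages` paginates: the page-number scheme starting at 1 and the "short page means last page" stop rule are both assumptions, and `sweep_pages` is a hypothetical name.

```python
import time
from typing import Callable, Iterator, List


def sweep_pages(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 100,
    pause_seconds: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield items page by page, pausing between requests.

    fetch_page(page, page_size) is supplied by the caller and should
    return the list of items for that page (assumed 1-based).
    """
    page = 1
    while True:
        items = fetch_page(page, page_size)
        yield from items
        if len(items) < page_size:
            return  # a short (or empty) page is treated as the last one
        sleep(pause_seconds)  # space requests out to stay well under 100/s
        page += 1
```

Passing `sleep` as a parameter keeps the function testable without real delays.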

Admin endpoints

Administrative operations (creating keys, creating domains, rotating DKIM) have low limits because they are expected to be human-initiated rather than automated. Typical limit: 10 requests/minute.

Monitoring

The X-RateLimit-Remaining header tells you how close you are to the limit. Log it periodically:

python
if int(resp.headers.get("X-RateLimit-Remaining", "1000000")) < 100:
    logger.warning("approaching rate limit")

Alert if you see X-RateLimit-Remaining consistently low on a particular credential. That's your signal to either spread your send load or split across multiple API keys (one per sender application).
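One way to turn "consistently low" into something an alert can act on is a rolling window per credential. The sketch below is illustrative only: `RemainingMonitor`, its thresholds, and the idea of keying by a credential label are assumptions, not ScaiSend features.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RemainingMonitor:
    """Track recent X-RateLimit-Remaining values per credential."""

    def __init__(self, threshold: int = 100, window: int = 20, low_fraction: float = 0.8):
        self.threshold = threshold        # "low" means remaining < threshold
        self.window = window              # number of recent samples kept
        self.low_fraction = low_fraction  # share of low samples that triggers
        self._samples: Dict[str, Deque[int]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def record(self, credential_id: str, remaining: int) -> bool:
        """Store one sample; return True if this credential should alert."""
        samples = self._samples[credential_id]
        samples.append(remaining)
        if len(samples) < self.window:
            return False  # not enough data to call it "consistent"
        low = sum(1 for r in samples if r < self.threshold)
        return low / len(samples) >= self.low_fraction
```

A single low reading does not trigger; only a window dominated by low readings does, which avoids alerting on one-off bursts.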

503 vs 429

429 is rate-limiting: "slow down." 503 is service-unavailable: "a dependency is down." Both are retryable, but they mean different things:

429: your request rate (or overall volume) is too high; the fix is on your side.
503: something is unhealthy server-side; you'll get through once it recovers.

Retry both. Differentiate in monitoring: persistent 429 means you need to pace; persistent 503 means you should be paging someone.
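To keep the two cases apart in monitoring, responses can be bucketed by status before counting. The following is a minimal sketch: the bucket names, the 20% threshold, and the `classify_status` and `recommend` helpers are all hypothetical.

```python
from collections import Counter


def classify_status(status: int) -> str:
    """Map an HTTP status code to a monitoring bucket."""
    if status == 429:
        return "rate_limited"
    if status == 503:
        return "unavailable"
    if 500 <= status < 600:
        return "server_error"
    if 200 <= status < 300:
        return "ok"
    return "other"


def recommend(counts: Counter, threshold: float = 0.2) -> str:
    """Suggest an action from the share of 429s and 503s in a sample."""
    total = sum(counts.values())
    if total == 0:
        return "ok"
    if counts["unavailable"] / total >= threshold:
        return "page on-call"   # persistent 503: server-side problem
    if counts["rate_limited"] / total >= threshold:
        return "slow down"      # persistent 429: pace the client
    return "ok"


# Example: tally the statuses seen over some recent interval.
recent = Counter(classify_status(s) for s in [202, 202, 429, 202, 429])
action = recommend(recent)
```

Counting over a fixed interval rather than a running total keeps an old incident from masking the current state.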

Errors — general error handling.
Your First Integration — retry loop recipe.
Sending Mail — send_at for spreading load.