---
title: Rate Limiting
path: concepts/rate-limiting
status: published
---

# Rate Limiting

ScaiSend rate-limits requests to protect the service from floods and to keep individual tenants from starving each other. This page describes the limits, the response format, and how to design a client that stays under them.

## Where limits apply

Rate limits are enforced at the API layer on specific endpoints. Depending on the endpoint, the limit is counted per tenant or per credential (API key or user JWT). The most commonly hit limits:

| Endpoint | Default limit |
|----------|---------------|
| `POST /v3/mail/send` | 10,000 requests/second per tenant |
| `POST /v3/user/webhooks` | 10 requests/minute per tenant |
| `POST /v3/api_keys` | 10 requests/minute per tenant |
| `GET /v3/messages` | 100 requests/second per credential |
| Everything else | 1,000 requests/second per credential |

These are the global defaults for a standard ScaiSend deployment; a self-hosted operator can tune them via configuration. Per-tenant overrides are possible but rare.

## Response when rate-limited

A limited request gets `429 Too Many Requests`:

```
HTTP/1.1 429 Too Many Requests
Retry-After: 3
X-RateLimit-Limit: 10000
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1713888005
Content-Type: application/json

{"detail": "Rate limit exceeded"}
```

| Header | Meaning |
|--------|---------|
| `Retry-After` | Seconds to wait before the next attempt |
| `X-RateLimit-Limit` | Requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the window |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |

**Always honor `Retry-After`.** Sleeping at least that long before retrying should put your next attempt inside the following window. A small random jitter on top keeps many clients from retrying at the same instant.
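To make the header table concrete, here is a minimal sketch that turns the headers of a `429` response into a wait time. The names `RateLimitInfo` and `parse_rate_limit` are illustrative, not part of any ScaiSend SDK, and the fallback to `X-RateLimit-Reset` when `Retry-After` is missing is an assumption about sensible client behavior rather than documented server behavior.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    retry_after: float        # seconds to wait before the next attempt
    limit: Optional[int]      # requests allowed in the current window
    remaining: Optional[int]  # requests left in the current window
    reset_at: Optional[int]   # Unix timestamp when the window resets


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a header value as an integer, returning None if absent or malformed."""
    try:
        return int(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        return None


def parse_rate_limit(headers: Mapping[str, str], now: Optional[float] = None) -> RateLimitInfo:
    """Extract rate-limit details from response headers.

    `headers` is any mapping of header name to value. A plain dict is
    case-sensitive; httpx's `resp.headers` is case-insensitive.
    """
    now = time.time() if now is None else now
    reset_at = _to_int(headers.get("X-RateLimit-Reset"))
    retry_after = _to_int(headers.get("Retry-After"))
    if retry_after is None:
        # Assumed fallback: wait until the window resets, else one second.
        retry_after = max(0.0, reset_at - now) if reset_at is not None else 1.0
    return RateLimitInfo(
        retry_after=float(retry_after),
        limit=_to_int(headers.get("X-RateLimit-Limit")),
        remaining=_to_int(headers.get("X-RateLimit-Remaining")),
        reset_at=reset_at,
    )
```

With httpx, this could be called as `parse_rate_limit(resp.headers)` before sleeping for `info.retry_after` seconds plus jitter.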
## A rate-limit-aware send loop

```python
import os
import random
import time

import httpx

API_KEY = os.environ["SCAISEND_API_KEY"]


def send(body: dict, max_attempts: int = 4) -> str:
    for attempt in range(max_attempts):
        resp = httpx.post(
            "https://scaisend.scailabs.ai/v3/mail/send",
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=body,
            timeout=30,
        )
        if resp.status_code == 202:
            return resp.json()["message_id"]
        if resp.status_code == 429:
            # Rate-limited: wait as long as the server asks, plus jitter.
            delay = int(resp.headers.get("Retry-After", "1"))
            time.sleep(delay + random.random())
            continue
        if resp.status_code >= 500:
            # Server error: exponential backoff with jitter.
            time.sleep((2 ** attempt) + random.random())
            continue
        # Any other error status is not retryable.
        resp.raise_for_status()
    raise RuntimeError(f"send failed after {max_attempts} attempts")
```

```typescript
async function send(body: unknown, maxAttempts = 4): Promise<string> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const resp = await fetch("https://scaisend.scailabs.ai/v3/mail/send", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.SCAISEND_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (resp.status === 202) return (await resp.json()).message_id;
    if (resp.status === 429) {
      // Rate-limited: wait as long as the server asks, plus jitter.
      const delay = Number(resp.headers.get("Retry-After") ?? 1);
      await new Promise((r) => setTimeout(r, (delay + Math.random()) * 1000));
      continue;
    }
    if (resp.status >= 500) {
      // Server error: exponential backoff with jitter.
      await new Promise((r) => setTimeout(r, (Math.pow(2, attempt) + Math.random()) * 1000));
      continue;
    }
    throw new Error(`ScaiSend ${resp.status}: ${await resp.text()}`);
  }
  throw new Error("send failed");
}
```

Three characteristics of a good loop:

1. **Honors `Retry-After` on 429.** Don't sleep for your own arbitrary duration; use the header.
2. **Exponential backoff for 5xx.** Bigger delay each attempt; add jitter to avoid synchronized retries from many clients.
3. **Caps retries.** Four attempts total is usually enough; beyond that, something structural is wrong, so fail loudly.
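The three rules above can also be isolated into a small pure function, which makes the retry policy easy to unit-test without touching the network. This is a sketch only: `retry_delay` and its parameters are illustrative names, and unlike the loops above it declines to sleep after the final attempt.

```python
import random
from typing import Callable, Optional


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str] = None,
    max_attempts: int = 4,
    rng: Callable[[], float] = random.random,
) -> Optional[float]:
    """Return seconds to sleep before the next attempt, or None to stop retrying.

    `attempt` is zero-based; `retry_after` is the raw Retry-After header value.
    """
    if attempt >= max_attempts - 1:
        return None  # rule 3: retries are capped
    if status == 429:
        # Rule 1: honor Retry-After, defaulting to 1 second if missing or malformed.
        try:
            base = float(retry_after)  # type: ignore[arg-type]
        except (TypeError, ValueError):
            base = 1.0
        return base + rng()
    if status >= 500:
        # Rule 2: exponential backoff (1, 2, 4, ... seconds) plus jitter.
        return float(2 ** attempt) + rng()
    return None  # other statuses are not retryable
```

A send loop would call this after each failed response and either sleep for the returned delay or raise when it gets `None`.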
## Proactive pacing

If you know you need to send ~100k messages and your limit is 10,000 RPS, you have headroom: even at the full rate, that is only about ten seconds of traffic. The problem is when a burst exceeds your budget.

**Token bucket.** Rate-limit your side before you even call:

```python
import threading
import time


class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self):
        # The lock is held while sleeping, so waiting callers queue up in turn.
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens < 1:
                # Wait until one full token has accrued, then spend it.
                time.sleep((1 - self.tokens) / self.rate)
                self.tokens = 0
                # Advance the clock so the time slept is not credited again.
                self.last = time.monotonic()
            else:
                self.tokens -= 1


bucket = TokenBucket(rate=5000, capacity=5000)  # half of the limit

for user in users:
    bucket.acquire()
    send(build_body(user))
```

Setting your client's rate to half the server's limit gives you headroom for occasional bursts and concurrent clients.

## Spreading sends over time

For marketing campaigns, use `send_at` to distribute across a window:

```python
import random
import time

base = int(time.time()) + 60  # 1 minute from now

for i, user in enumerate(users):
    # Spread across 5 minutes with random jitter
    send_at = base + (i * 300 // len(users)) + random.randint(0, 15)
    # send_with_schedule is assumed to be your own helper that sets send_at
    send_with_schedule(user, send_at=send_at)
```

This way, you queue everything at once (fast), and ScaiSend's scheduler releases messages smoothly over the window. Your SMTP infrastructure also sees a smoother load.

## Per-endpoint considerations

### `/v3/mail/send`

The big one. 10,000/s is the primary limit; realistically, a single client can't sustain that, so you won't hit it unless you're a platform with many concurrent senders.

### `/v3/messages` (list)

Lower limit (100/s) because each call hits the database hard. If you need to sweep through messages, paginate with `page_size=100` and space your page requests out.
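As a rough sketch of spacing out a sweep, the generator below enforces a minimum interval between page requests. It assumes page-numbered pagination and a caller-supplied `fetch_page(page)` function that returns one page of results as a list; both are illustrative assumptions, since this page only specifies `page_size=100`, and `paced_pages` is a hypothetical name.

```python
import time
from typing import Callable, Iterator, List


def paced_pages(
    fetch_page: Callable[[int], List[dict]],
    page_size: int = 100,
    min_interval: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[List[dict]]:
    """Yield pages one at a time, waiting at least `min_interval` seconds between requests."""
    page = 1
    while True:
        started = clock()
        items = fetch_page(page)
        if items:
            yield items
        if len(items) < page_size:
            return  # a short (or empty) page means we reached the end
        page += 1
        # Time spent fetching and processing counts toward the interval.
        elapsed = clock() - started
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
```

With `min_interval=0.1`, a single sweeper issues at most about 10 list requests per second, well below the 100/s limit and leaving room for other readers on the same credential.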
### Admin endpoints

Administrative operations (creating keys, creating domains, rotating DKIM) have low limits because they are expected to be human-initiated rather than automated. Typical limits: 10/minute.

## Monitoring

The `X-RateLimit-Remaining` header tells you how close you are to the limit. Check it on each response and log when it runs low:

```python
if int(resp.headers.get("X-RateLimit-Remaining", "1000000")) < 100:
    logger.warning("approaching rate limit")
```

Alert if you see `X-RateLimit-Remaining` consistently low on a particular credential. That's your signal to either spread your send load or split across multiple API keys (one per sender application).

## 503 vs 429

`429` is rate-limiting: "slow down." `503` is service-unavailable: "a dependency is down." Both are retryable, but they mean different things:

- `429`: your request rate or volume is too high.
- `503`: something is unhealthy server-side; you'll get through once it recovers.

Retry both. Differentiate in monitoring: persistent 429 means you need to pace; persistent 503 means you should be paging someone.

## Related

- [Errors](errors): general error handling.
- [Your First Integration](../tutorials/first-integration): retry loop recipe.
- [Sending Mail](../tutorials/sending-mail): `send_at` for spreading load.