Embeddings

Convert text into dense vectors for semantic search, clustering, recommendation, and anything else that benefits from distance-based similarity.

Endpoint: POST /v1/inference/embed

Basic request

bash
curl -X POST https://scaigrid.scailabs.ai/v1/inference/embed \
  -H "Authorization: Bearer $SCAIGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": ["The quick brown fox", "jumps over the lazy dog"]
  }'
python
import httpx, os

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/embed",
    headers={"Authorization": f"Bearer {os.environ['SCAIGRID_API_KEY']}"},
    json={
        "model": "openai/text-embedding-3-small",
        "input": ["The quick brown fox", "jumps over the lazy dog"],
    },
)
data = resp.json()["data"]
for item in data["data"]:
    print(f"Index {item['index']}: {len(item['embedding'])} dimensions")
typescript
const resp = await fetch("https://scaigrid.scailabs.ai/v1/inference/embed", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.SCAIGRID_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "openai/text-embedding-3-small",
    input: ["The quick brown fox", "jumps over the lazy dog"],
  }),
});
const { data } = await resp.json();
for (const item of data.data) {
  console.log(`Index ${item.index}: ${item.embedding.length} dimensions`);
}

Input formats

The input field accepts either a string or an array of strings:

json
{"model": "...", "input": "single string"}
{"model": "...", "input": ["first", "second", "third"]}

Batching is strongly preferred when you have multiple texts — one API call for 100 texts is dramatically cheaper (in latency and cost) than 100 separate calls. Most embedding models accept 1000+ texts per request.
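
As a rough illustration of batching, the sketch below splits a list of texts into request-sized groups and builds one request body per group. The `batched` and `build_payloads` helpers and the batch size of 100 are illustrative assumptions, not part of the API; the right size depends on the model and on your rate limits.

```python
from typing import Iterator


def batched(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def build_payloads(texts: list[str], model: str, batch_size: int = 100) -> list[dict]:
    """Build one request body per batch (batch_size is an illustrative default)."""
    return [{"model": model, "input": batch} for batch in batched(texts, batch_size)]


texts = [f"document {i}" for i in range(250)]
payloads = build_payloads(texts, "openai/text-embedding-3-small")
print([len(p["input"]) for p in payloads])  # [100, 100, 50]
```

Each payload can then be sent to POST /v1/inference/embed as in the basic request; 250 texts become 3 calls instead of 250.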

Response shape

json
{
  "status": "ok",
  "data": {
    "model": "openai/text-embedding-3-small",
    "data": [
      {"index": 0, "embedding": [0.023, -0.101, ...]},
      {"index": 1, "embedding": [-0.045, 0.082, ...]}
    ],
    "usage": {"prompt_tokens": 8, "total_tokens": 8}
  }
}

Vector dimensions depend on the model:

| Model | Dimensions |
| --- | --- |
| openai/text-embedding-3-small | 1536 |
| openai/text-embedding-3-large | 3072 |
| openai/text-embedding-ada-002 | 1536 |
| google/text-embedding-004 | 768 |
| mistral/mistral-embed | 1024 |

Check your tenant's model list (GET /v1/models?modality=embedding) to see what's available.
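
A minimal sketch of reading the response shape shown above: it returns the vectors ordered by their `index` field and, optionally, checks that each vector has the expected length. The `extract_embeddings` helper, the three-dimensional sample payload, and the assumption that any status other than "ok" means failure are all hypothetical and for illustration only.

```python
from typing import Optional


def extract_embeddings(body: dict, expected_dim: Optional[int] = None) -> list[list[float]]:
    """Return embeddings ordered by `index` from a parsed response body."""
    # Assumption: a status other than "ok" indicates an error response.
    if body.get("status") != "ok":
        raise ValueError(f"unexpected status: {body.get('status')!r}")
    # Sort defensively so vectors line up with the order of the inputs.
    items = sorted(body["data"]["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    if expected_dim is not None:
        for position, vector in enumerate(vectors):
            if len(vector) != expected_dim:
                raise ValueError(
                    f"vector {position} has {len(vector)} dimensions, expected {expected_dim}"
                )
    return vectors


# Toy payload with 3-dimensional vectors; real models return far more dimensions.
sample = {
    "status": "ok",
    "data": {
        "model": "example/model",
        "data": [
            {"index": 1, "embedding": [0.0, 1.0, 0.0]},
            {"index": 0, "embedding": [1.0, 0.0, 0.0]},
        ],
        "usage": {"prompt_tokens": 8, "total_tokens": 8},
    },
}
vectors = extract_embeddings(sample, expected_dim=3)
print(vectors[0])  # [1.0, 0.0, 0.0]
```

Passing the dimension from the table as `expected_dim` catches a mismatch between the model you requested and the vector store schema before anything is inserted.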

Computing cosine similarity

Embeddings are typically compared with cosine similarity:

python
import numpy as np

def cosine_similarity(a, b):
    a = np.array(a)
    b = np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

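# `data` is the parsed response from the basic request example: resp.json()["data"]
emb_a = data["data"][0]["embedding"]
emb_b = data["data"][1]["embedding"]
print(cosine_similarity(emb_a, emb_b))  # ranges from -1.0 (opposite direction) to 1.0 (same direction)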
typescript
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

Some embedding models normalize vectors to unit length — for those, cosine similarity equals the dot product.
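
To make the unit-length remark concrete, here is a small dependency-free sketch. It normalizes two toy vectors and shows that their dot product matches the cosine similarity of the originals. The helper names are illustrative, and the two-dimensional vectors stand in for real embeddings.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
cos = cosine_similarity(a, b)
unit_dot = dot(normalize(a), normalize(b))
print(round(cos, 6), round(unit_dot, 6))  # 0.96 0.96
```

If your vectors are already normalized, or you normalize them once at ingestion time, query-time scoring can use the dot product alone and skip the two norm computations.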

Building a search index

For production workloads, use ScaiMatrix: it runs embeddings, stores vectors in Weaviate, and exposes a search API, so you don't have to implement indexing yourself.

If you want to manage your own index (pgvector, Qdrant, Faiss, etc.), the flow is:

  1. Split documents into chunks (typically 200–800 tokens each).
  2. Embed each chunk with POST /v1/inference/embed.
  3. Store (vector, chunk_text, metadata) rows in your vector store.
  4. At query time, embed the user's query and search for nearest neighbors.
python
import os

import httpx

API_KEY = os.environ["SCAIGRID_API_KEY"]

# Bulk-embed the chunks of one document in a single request
chunks = ["First paragraph...", "Second paragraph...", "Third paragraph..."]

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/embed",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"model": "openai/text-embedding-3-small", "input": chunks},
)
resp.raise_for_status()  # fail early on HTTP errors
result = resp.json()["data"]

vectors = [item["embedding"] for item in result["data"]]
# Insert into your vector store with the corresponding chunks
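
Steps 1 and 4 of the flow can be sketched without any vector store. This is an illustration under stated assumptions, not a recommended implementation: `chunk_words` approximates token counts with whitespace-separated words (a real pipeline would use the model's tokenizer), and `top_k` does a linear scan that only suits small collections. Both helper names are hypothetical.

```python
import math


def chunk_words(text: str, max_words: int = 300) -> list[str]:
    """Split text into chunks of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, rows, k=3):
    """Return the k (score, chunk_text) pairs most similar to the query.

    `rows` is a list of (vector, chunk_text) tuples.
    """
    scored = [(cosine_similarity(query_vector, vector), text) for vector, text in rows]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy 2-dimensional vectors stand in for real embeddings.
rows = [
    ([1.0, 0.0], "about cats"),
    ([0.0, 1.0], "about taxes"),
    ([0.9, 0.1], "about kittens"),
]
results = top_k([1.0, 0.0], rows, k=2)
print([text for _, text in results])  # ['about cats', 'about kittens']
print(chunk_words("one two three four five", max_words=2))  # ['one two', 'three four', 'five']
```

In a real index the query vector comes from embedding the user's query with the same model used for the chunks, and the linear scan is replaced by the nearest-neighbor search of your vector store.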

Rate limits and batching strategy

Embedding requests are rate-limited like any other inference call — per API key, per user, per tenant. For large ingestion jobs, batch aggressively (up to a few hundred texts per request) and add a small delay between batches to stay within the per-minute rate limit. See Rate Limiting.

If you're indexing large corpora (> 100K documents), consider Batch Inference — async jobs with higher throughput and lower cost.
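
One simple way to pace an ingestion job is to derive the delay from the per-minute limit and sleep between batches. The sketch below assumes a hypothetical limit of 120 requests per minute and takes the sending function as a parameter, so the pacing logic can be tried without network access. `run_paced` and `min_delay_seconds` are illustrative names, not part of any SDK.

```python
import time
from typing import Callable, Iterable


def min_delay_seconds(requests_per_minute: int) -> float:
    """Smallest gap between requests that keeps the rate at or under the limit."""
    if requests_per_minute < 1:
        raise ValueError("requests_per_minute must be at least 1")
    return 60.0 / requests_per_minute


def run_paced(
    batches: Iterable[list[str]],
    send: Callable[[list[str]], list],
    requests_per_minute: int,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call `send(batch)` for each batch, sleeping between consecutive calls."""
    delay = min_delay_seconds(requests_per_minute)
    results = []
    for position, batch in enumerate(batches):
        if position > 0:
            sleep(delay)  # no pause is needed before the first request
        results.extend(send(batch))
    return results


# Stand-ins for a real request function and a real sleep, so the example runs offline.
pauses = []
fake_send = lambda batch: [len(text) for text in batch]
out = run_paced([["a", "bb"], ["ccc"]], fake_send, requests_per_minute=120, sleep=pauses.append)
print(out, pauses)  # [1, 2, 3] [0.5]
```

A fixed sleep ignores the time each request takes, so the actual rate stays somewhat below the limit; that is a deliberately conservative choice for a sketch.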

Dimensional reduction

Some models support returning reduced-dimension embeddings via a dimensions parameter:

python
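import os

import httpx

API_KEY = os.environ["SCAIGRID_API_KEY"]

resp = httpx.post(
    "https://scaigrid.scailabs.ai/v1/inference/embed",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "openai/text-embedding-3-large",
        "input": "hello world",
        "dimensions": 512,  # reduce from 3072 to 512
    },
)
embedding = resp.json()["data"]["data"][0]["embedding"]
print(len(embedding))  # expected: 512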

Support is provider-dependent: only the openai/text-embedding-3-* models accept this parameter today. The response contains embeddings with the requested number of dimensions, trading some quality for smaller storage and faster search.
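
To put the storage side of that trade-off in numbers, here is a back-of-the-envelope calculation. It assumes vectors are stored as 32-bit floats and counts raw vector data only, ignoring index structures and metadata. The corpus size of one million vectors is an arbitrary example.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as 32-bit floats


def index_size_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw storage needed for the vectors alone."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


full = index_size_bytes(1_000_000, 3072)
reduced = index_size_bytes(1_000_000, 512)
print(full / 1e9, reduced / 1e9, full // reduced)  # 12.288 2.048 6
```

Under these assumptions, going from 3072 to 512 dimensions cuts raw vector storage from about 12.3 GB to about 2 GB per million vectors.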

What's next

Updated 2026-05-18 15:01:28 (rev 17)