Search

Both regular content and documentation are indexed into Weaviate and searchable through a hybrid (BM25 + vector) query.

What gets indexed

Two collections:

  • Content — one row per content item × locale. Title, summary, body, searchable field values, taxonomy term IDs, status, visibility.
  • DocPage — one row per chunk, where docs are split on H1/H2/H3 headings. Each chunk carries page metadata (namespace, version, path), the heading text, an anchor slug for deep linking, and the chunk body.
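The heading-based chunking described for DocPage can be sketched roughly as follows. This is an illustration, not the actual scaicms implementation: the names DocChunk, slugify, and split_on_headings are hypothetical, and the slug rule (lowercase, punctuation dropped, whitespace turned into hyphens) is an assumption about how anchors are generated.

```python
import re
from dataclasses import dataclass

# Matches Markdown H1-H3 headings only; H4 and deeper stay in the chunk body.
HEADING_RE = re.compile(r"^(#{1,3})\s+(.*?)\s*$")
# Code fence marker, built this way so it does not close the surrounding fence.
FENCE = "`" * 3


@dataclass
class DocChunk:
    heading: str  # heading text of the chunk ("" for text before any heading)
    anchor: str   # slug used for deep linking (assumed slug rule)
    body: str     # chunk text without the heading line


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, join words with hyphens (assumed rule)."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"[\s_]+", "-", slug).strip("-")


def split_on_headings(markdown: str) -> list[DocChunk]:
    """Split a Markdown page into one chunk per H1/H2/H3 section."""
    chunks: list[DocChunk] = []
    heading, lines = "", []
    in_fence = False

    def flush() -> None:
        body = "\n".join(lines).strip()
        if heading or body:
            chunks.append(DocChunk(heading, slugify(heading), body))

    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines starting with "#" inside code fences are comments, not headings.
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            flush()
            heading, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Tracking fence state matters for documentation pages, where shell comments inside code samples would otherwise be mistaken for H1 headings.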

Indexing flow

```text
Content/Doc write → event bus → ARQ task → Weaviate upsert
```

The pipeline is asynchronous, so expect a lag of 1–3 seconds between a write and the item becoming searchable. Two CLIs, python -m scaicms.cli.index_management (content) and python -m scaicms.cli docs-index (docs), let you reindex, check consistency, and reset the index.
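Because indexing is asynchronous, a test or script that writes and then searches immediately may see no hits. One way to handle this is to poll until the item appears. The helper below is a hedged sketch: wait_until_searchable is a hypothetical name, and the search callable stands in for whatever query your client performs.

```python
import time
from typing import Callable, Sequence


def wait_until_searchable(
    search: Callable[[], Sequence],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Poll `search` until it returns at least one hit or `timeout` elapses.

    `search` is any zero-argument callable returning a sequence of hits;
    the default timeout leaves headroom over the documented 1-3 second lag.
    """
    deadline = time.monotonic() + timeout
    while True:
        if search():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Using time.monotonic() rather than time.time() keeps the deadline correct even if the system clock is adjusted during the wait.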

Hybrid alpha

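SEARCH_HYBRID_ALPHA defaults to 0.7, which leans toward semantic (vector) matches while still weighting keyword (BM25) matches. In Weaviate, an alpha of 1 is pure vector search and an alpha of 0 is pure keyword search. Tune the value per environment.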
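To build intuition for what alpha does, here is a simplified sketch of score blending: each score set is min-max normalized, then combined as alpha × vector + (1 − alpha) × keyword. Weaviate's own fusion algorithms differ in detail, so treat this as an illustration only. The function names min_max and hybrid_rank are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1] so BM25 and vector scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(
    bm25: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.7,
) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights BM25."""
    kw, vec = min_max(bm25), min_max(vector)
    fused = {
        doc_id: alpha * vec.get(doc_id, 0.0) + (1 - alpha) * kw.get(doc_id, 0.0)
        for doc_id in set(kw) | set(vec)
    }
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores: "a" wins on keywords, "b" wins on semantic similarity.
bm25_scores = {"a": 10.0, "b": 2.0}
vector_scores = {"a": 0.2, "b": 0.9}
top_semantic = hybrid_rank(bm25_scores, vector_scores, alpha=0.7)[0][0]
top_keyword = hybrid_rank(bm25_scores, vector_scores, alpha=0.0)[0][0]
```

With these made-up scores, the default alpha of 0.7 ranks the semantically closer document first, while an alpha of 0 ranks the keyword match first.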

Embeddings

When embedding_provider is configured, the backend requests a vector embedding for each chunk; otherwise search falls back to pure BM25. Embedding calls cost money, so disable them in development environments to avoid unnecessary spend.

Searching docs

```bash
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"query":"how do I shard models","namespace":"scaigrid","limit":10}' \
  "$API/api/v1/docs/_search"
```

Each hit includes an anchor, so you can deep-link to the matching chunk, for example /docs/scaigrid/models/training#tensor-parallelism.
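The same request can be built from Python with the standard library, and a deep link can be assembled from a hit. This is a sketch under stated assumptions: build_docs_search_request and deep_link are hypothetical helpers, and the hit keys namespace and path are assumed from the chunk metadata listed above. The response shape is not documented here, so check it against a real response.

```python
import json
import urllib.request


def build_docs_search_request(
    api: str, token: str, query: str, namespace: str, limit: int = 10
) -> urllib.request.Request:
    """Build the POST request for the docs search endpoint."""
    payload = json.dumps(
        {"query": query, "namespace": namespace, "limit": limit}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{api.rstrip('/')}/api/v1/docs/_search",
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def deep_link(hit: dict) -> str:
    """Build a docs URL pointing at the matching chunk.

    ASSUMPTION: each hit exposes "namespace", "path", and "anchor" keys;
    only the presence of an anchor is stated in the documentation.
    """
    url = f"/docs/{hit['namespace']}/{hit['path'].strip('/')}"
    anchor = hit.get("anchor")
    return f"{url}#{anchor}" if anchor else url
```

To send the request, pass it to urllib.request.urlopen and decode the JSON body; that step is left out here because it needs a live API and a valid token.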

Updated 2026-05-16 12:33:52, rev 2