Search

Both regular content and documentation are indexed into Weaviate and searchable through a hybrid (BM25 + vector) query.

What gets indexed

Two collections:

Content — one row per content item × locale. Title, summary, body, searchable field values, taxonomy term IDs, status, visibility.
DocPage — one row per chunk, where docs are split on H1/H2/H3 headings. Each chunk carries page metadata (namespace, version, path), the heading text, an anchor slug for deep linking, and the chunk body.
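The heading-based chunking described for DocPage can be sketched as follows. This is a minimal illustration, not the indexer's actual code: it assumes Markdown input with ATX (`#`) headings, and the names `Chunk`, `slugify`, and `split_on_headings` are hypothetical. It does not track fenced code blocks, so a `# comment` line inside a fence would be treated as a heading.

```python
import re
from dataclasses import dataclass

# Matches H1-H3 only; "#### Deeper" headings fall through and stay in the body.
HEADING_RE = re.compile(r"^(#{1,3})\s+(.+?)\s*$")


@dataclass
class Chunk:
    heading: str  # heading text, empty for content before the first heading
    anchor: str   # slug used for deep linking
    body: str     # chunk text without the heading line


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    slug = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"[\s_]+", "-", slug).strip("-")


def split_on_headings(markdown: str) -> list[Chunk]:
    """Start a new chunk at every H1/H2/H3 line."""
    chunks: list[Chunk] = []
    heading, lines = "", []

    def flush() -> None:
        # Emit the chunk collected so far, skipping a completely empty one.
        body = "\n".join(lines).strip()
        if heading or body:
            chunks.append(Chunk(heading, slugify(heading), body))

    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            flush()
            heading, lines = match.group(2), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Under these assumptions, a section titled "Tensor Parallelism" yields the anchor `tensor-parallelism`, which is the kind of slug used for deep links into a page.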

Indexing flow

text

`Content/Doc write → event bus → ARQ task → Weaviate upsert`

The pipeline is asynchronous, so expect a lag of 1–3 seconds between a write and the item becoming searchable. Two CLIs, python -m scaicms.cli.index_management (content) and python -m scaicms.cli docs-index (docs), let you reindex, check consistency, and reset the index.
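Because indexing lags behind writes, code that writes and then immediately searches (an integration test, for example) may need to poll rather than assert right away. The helper below is a minimal sketch of that pattern. `wait_until_searchable` is a hypothetical name, and the `is_searchable` callable is whatever check you supply, such as a search request that looks for the new item.

```python
import time
from typing import Callable


def wait_until_searchable(
    is_searchable: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
) -> bool:
    """Poll is_searchable() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_searchable():
            return True
        if time.monotonic() >= deadline:
            return False  # gave up; the item never showed up in time
        time.sleep(interval)
```

The default timeout is deliberately longer than the typical lag, so a slow worker does not cause a false failure.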

Hybrid alpha

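SEARCH_HYBRID_ALPHA defaults to 0.7. In Weaviate's hybrid search, an alpha of 0 is pure keyword (BM25) and 1 is pure vector, so 0.7 leans semantic while still weighting keyword matches. Tune it per environment.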
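To build intuition for what the alpha weight does, here is a simplified sketch of blending keyword and vector scores. It assumes min-max normalised scores combined linearly. Weaviate's own fusion algorithms differ in detail, and `normalize` and `hybrid_scores` are illustrative names, not part of any API.

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max scale scores to [0, 1]; if all scores are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (value - lo) / (hi - lo) for doc, value in scores.items()}


def hybrid_scores(
    bm25: dict[str, float],
    vector: dict[str, float],
    alpha: float = 0.7,
) -> dict[str, float]:
    """Blend the two rankings: alpha weights vector, (1 - alpha) weights keyword."""
    kw, vec = normalize(bm25), normalize(vector)
    return {
        doc: alpha * vec.get(doc, 0.0) + (1 - alpha) * kw.get(doc, 0.0)
        for doc in kw.keys() | vec.keys()
    }
```

With alpha at 0.7, a document that wins on vector similarity but loses on keywords outranks one with the opposite profile. Lowering alpha toward 0 reverses that ordering.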

Embeddings

When embedding_provider is configured, the backend requests a vector embedding for each chunk; otherwise search falls back to pure BM25. Embedding calls cost money, so disable them in dev.

Searching docs

bash
curl -X POST -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"query":"how do I shard models","namespace":"scaigrid","limit":10}' \
  "$API/api/v1/docs/_search"

Each hit includes an anchor, so you can deep-link to the matching chunk, for example /docs/scaigrid/models/training#tensor-parallelism.
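The same request can be built from Python, and a hit can be turned into a deep link. This sketch uses only the standard library. The response shape is not documented here, so the hit fields `namespace`, `path`, and `anchor` are assumptions inferred from the example URL, and `build_search_request` and `deep_link` are hypothetical helper names.

```python
import json
import urllib.request


def build_search_request(
    api: str, token: str, query: str, namespace: str, limit: int = 10
) -> urllib.request.Request:
    """Build (but do not send) the POST request for the docs search endpoint."""
    payload = json.dumps(
        {"query": query, "namespace": namespace, "limit": limit}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{api}/api/v1/docs/_search",
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def deep_link(hit: dict) -> str:
    """Turn a hit into a site-relative URL pointing at the matching chunk.

    Assumed hit fields (not confirmed by the docs): namespace, path, anchor.
    """
    path = hit["path"].strip("/")
    url = f"/docs/{hit['namespace']}/{path}"
    anchor = hit.get("anchor")
    return f"{url}#{anchor}" if anchor else url
```

To send the request, pass it to `urllib.request.urlopen` and decode the JSON body. How the hits are nested in the response is left to the actual API.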