Search
Both regular content and documentation are indexed into Weaviate and searchable through a hybrid (BM25 + vector) query.
What gets indexed
Two collections:
Content: one row per content item × locale. Holds the title, summary, body, searchable field values, taxonomy term IDs, status, and visibility.
DocPage: one row per chunk, where docs are split on H1/H2/H3 headings. Each chunk carries page metadata (namespace, version, path), the heading text, an anchor slug for deep linking, and the chunk body.
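To make the DocPage chunking concrete, here is a minimal sketch of heading-based splitting. It assumes the docs are Markdown with ATX-style headings (`#`, `##`, `###`); the names `DocChunk`, `slugify`, and `split_on_headings` are hypothetical and are not taken from the scaicms codebase, whose actual slug rules may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches H1-H3 only; "#### ..." and deeper headings stay inside the chunk body.
HEADING_RE = re.compile(r"^(#{1,3})\s+(.*\S)\s*$")


@dataclass
class DocChunk:
    heading: str  # heading text that opens the chunk
    anchor: str   # slug used for deep linking
    body: str     # text under the heading, up to the next H1/H2/H3


def slugify(heading: str) -> str:
    """Lowercase, drop punctuation, and join words with hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", heading.lower())
    return re.sub(r"[\s_]+", "-", cleaned).strip("-")


def split_on_headings(markdown: str) -> list[DocChunk]:
    """Start a new chunk at every H1, H2, or H3 heading.

    Simplifications: text before the first heading is dropped, and '#'
    lines inside fenced code blocks are not treated specially.
    """
    chunks: list[DocChunk] = []
    heading: str | None = None
    lines: list[str] = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if heading is not None:
                chunks.append(DocChunk(heading, slugify(heading), "\n".join(lines).strip()))
            heading, lines = match.group(2), []
        elif heading is not None:
            lines.append(line)
    if heading is not None:
        chunks.append(DocChunk(heading, slugify(heading), "\n".join(lines).strip()))
    return chunks
```

Under these assumptions, a heading such as "Tensor Parallelism" produces the anchor `tensor-parallelism`.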
Indexing flow
The pipeline is asynchronous, so expect a 1–3 second lag between a write
and the item becoming searchable. Two CLIs,
python -m scaicms.cli.index_management (content) and
python -m scaicms.cli docs-index (docs), let you reindex, check
consistency, and reset the index.
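Because indexing lags behind writes, tests and scripts that search immediately after a write may need to poll. The sketch below is one way to do that; `wait_until_searchable` is a hypothetical helper, and the `is_searchable` callback stands in for whatever search call your client makes.

```python
import time
from typing import Callable


def wait_until_searchable(
    is_searchable: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.25,
) -> bool:
    """Poll is_searchable() until it returns True or the timeout elapses.

    The 5 second default leaves headroom over the 1-3 second indexing lag.
    Returns True if the item became searchable, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_searchable():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would pass a closure that runs a search for the item just written and reports whether it came back.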
Hybrid alpha
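SEARCH_HYBRID_ALPHA defaults to 0.7, which weights the vector (semantic)
score more heavily while still counting keyword matches. In Weaviate's
hybrid search, an alpha of 0 is pure BM25 and an alpha of 1 is pure vector
search. Tune the value per environment.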
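For intuition about what alpha does, the sketch below blends two scores linearly. It is a simplified illustration, not Weaviate's implementation: Weaviate's fusion algorithms normalise or rank the raw scores before combining them, and `hybrid_score` is a hypothetical function that assumes both inputs are already scaled to the range 0 to 1.

```python
def hybrid_score(bm25: float, vector: float, alpha: float = 0.7) -> float:
    """Blend a keyword score and a vector score with weight alpha.

    alpha = 0.0 uses only the BM25 score; alpha = 1.0 uses only the
    vector score. Both inputs are assumed to be normalised to [0, 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector + (1.0 - alpha) * bm25


# A strong keyword match with weak semantic similarity...
keyword_hit = round(hybrid_score(bm25=1.0, vector=0.2), 2)   # 0.44
# ...ranks below a weak keyword match with strong semantic similarity.
semantic_hit = round(hybrid_score(bm25=0.1, vector=0.9), 2)  # 0.66
```

Lowering alpha toward 0 would reverse that ordering, which is the trade-off to weigh when tuning.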
Embeddings
When embedding_provider is configured, the backend calls out for a vector
embedding of each chunk; otherwise search falls back to pure BM25.
Embedding calls cost money, so disable them in dev environments.
Searching docs
Each hit includes its anchor, so you can deep-link to the matching chunk,
for example /docs/scaigrid/models/training#tensor-parallelism.
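Building such a link from a hit could look like the sketch below. The `deep_link` helper is hypothetical, and it assumes you already have the page's URL path in hand; how the hit's namespace, version, and path fields map onto that URL is not specified here.

```python
from typing import Optional
from urllib.parse import quote


def deep_link(page_url: str, anchor: Optional[str]) -> str:
    """Append the chunk anchor to the page URL as a fragment.

    Any fragment already on page_url is replaced. When the hit has no
    anchor, the plain page URL is returned.
    """
    base = page_url.split("#", 1)[0]
    if not anchor:
        return base
    return f"{base}#{quote(anchor)}"


link = deep_link("/docs/scaigrid/models/training", "tensor-parallelism")
# link == "/docs/scaigrid/models/training#tensor-parallelism"
```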