Architecture
ScaiPersona is a thin product layer over ScaiGrid's existing primitives — the model catalogue, the inference pipeline, ScaiMatrix retrieval, and ScaiDrive shares. There is no separate "persona engine"; a persona is a configuration plus a request enricher that runs inside the inference pipeline.
Components#
There is no separate ScaiPersona deployment. ScaiPersona is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets. The CRUD routes live at /v1/modules/scaipersona/; the actual hot path runs inside the inference enricher.
Request flow for one persona invocation#
- Caller calls
POST /v1/inference/chatwithmodel = "tenant/{tenant_slug}/{persona_slug}". - Routing resolves the slug to the persona's published
FrontendModel. The frontend model'smetadata.persona_idties it back to the persona row. - Enricher (
persona_enricher, registered onapp.state.persona_enricherat module init) is invoked byInferenceService. It reads the persona, and ifrag_enabledis true, runs the configured RAG strategy. - RAG retrieval. For each active
PersonaSource:collectionsources call ScaiMatrix's vector search after checking the caller's collection access.scaidrivesources call ScaiDrive's/api/v1/search/contextafter exchanging the caller's bearer token for a ScaiDrive-scoped one. Results across sources are weighted bysource.weight, deduplicated, and capped atrag_top_k.
- Context injection. Top chunks are formatted with the persona's
rag_context_template(default: a labelled list) and appended to the system message. - Inference. The enriched request is dispatched to the underlying frontend model's backend. The persona's
system_promptwas already baked into the published frontend model at publish time. - Accounting. Tokens, latency, and dispatch metadata are recorded against the persona's frontend model — the persona ends up in usage reports as its own model.
State#
- Personas and sources — in MariaDB tables
mod_scaipersona_personasandmod_scaipersona_persona_sources. - Published frontend models — in the standard
frontend_modelstable, with ametadata.persona_idpointer back. Backend links are copied from the underlying model at publish time. - Avatars — in S3 under
scaipersona/{tenant_id}/{persona_id}/avatar. Served back throughGET /personas/{id}/avatar(public, no auth). - No persona-level conversation log. Persona invocations land in the standard inference history; filter by model slug to find them.
Publishing semantics#
Publishing materialises a persona as a FrontendModel row. The new row:
- Has slug
tenant/{tenant_slug}/{persona_slug}— globally unique and tenant-namespaced. - Inherits
modality,capabilities,context_window, and pricing from the underlying model. - Uses the persona's
system_promptas itssystem_prompt_template. - Carries the persona's
avatar_urlanddefault_params. - Has its backend links copied from the underlying frontend model so dispatch resolves identically.
Editing a persona while it's published auto-syncs the frontend model on the next PUT /personas/{id} or POST /personas/{id}/avatar. There is no separate draft / live split — every update is live on the next inference call.
Unpublishing deletes the frontend model and removes it from any model groups it had joined. The persona itself stays put; you can re-publish later.
How it differs from calling a vendor model directly#
A direct ScaiGrid chat-completion call gives you tokens-out. ScaiPersona adds:
| Concern | Direct call | ScaiPersona |
|---|---|---|
| System prompt | Caller provides per-request | Baked into the published model |
| RAG retrieval | Caller orchestrates | Built-in; ScaiMatrix + ScaiDrive |
| Multi-source weighting | Caller merges results | Built-in via source.weight |
| Multi-step retrieval | Caller implements | rag_strategy: multi_step / agentic |
| Catalogue listing | Vendor model only | Persona appears as its own frontend model |
| Routing / model groups | Vendor model only | Persona slugs are first-class members |
For a one-off completion, call the inference endpoint with a vendor model. For a named, RAG-grounded assistant you want callers to target by slug, publish a persona.
Where the trust boundary is#
The persona itself is tenant-scoped — PersonaService.get_for_tenant rejects any cross-tenant read. RAG access is enforced at retrieval time, not at attach time: every collection search calls collection_access_fn with the invoking user and drops chunks the caller can't see. ScaiDrive shares are gated by the caller's bearer token, exchanged into a ScaiDrive-scoped token per call — so a persona never "borrows" access the caller doesn't already have.