Architecture

ScaiPersona is a thin product layer over ScaiGrid's existing primitives — the model catalogue, the inference pipeline, ScaiMatrix retrieval, and ScaiDrive shares. There is no separate "persona engine"; a persona is a configuration plus a request enricher that runs inside the inference pipeline.

Components#

flowchart LR C[Caller] subgraph SG[ScaiGrid] R[Routing FrontendModel persona's published model] E[Persona enricher - reads persona - runs RAG - injects context] D[Inference dispatch to backend] R --> E E --> D end SM[ScaiMatrix] SD[ScaiDrive] C -- /v1/inference/chat --> R E --> SM E --> SD D -- stream / json --> C

There is no separate ScaiPersona deployment. ScaiPersona is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets. The CRUD routes live at /v1/modules/scaipersona/; the actual hot path runs inside the inference enricher.

Request flow for one persona invocation#

Caller calls POST /v1/inference/chat with model = "tenant/{tenant_slug}/{persona_slug}".
Routing resolves the slug to the persona's published FrontendModel. The frontend model's metadata.persona_id ties it back to the persona row.
Enricher (persona_enricher, registered on app.state.persona_enricher at module init) is invoked by InferenceService. It reads the persona, and if rag_enabled is true, runs the configured RAG strategy.
RAG retrieval. For each active PersonaSource:
- collection sources call ScaiMatrix's vector search after checking the caller's collection access.
- scaidrive sources call ScaiDrive's /api/v1/search/context after exchanging the caller's bearer token for a ScaiDrive-scoped one. Results across sources are weighted by source.weight, deduplicated, and capped at rag_top_k.
Context injection. Top chunks are formatted with the persona's rag_context_template (default: a labelled list) and appended to the system message.
Inference. The enriched request is dispatched to the underlying frontend model's backend. The persona's system_prompt was already baked into the published frontend model at publish time.
Accounting. Tokens, latency, and dispatch metadata are recorded against the persona's frontend model — the persona ends up in usage reports as its own model.

State#

Personas and sources — in MariaDB tables mod_scaipersona_personas and mod_scaipersona_persona_sources.
Published frontend models — in the standard frontend_models table, with a metadata.persona_id pointer back. Backend links are copied from the underlying model at publish time.
Avatars — in S3 under scaipersona/{tenant_id}/{persona_id}/avatar. Served back through GET /personas/{id}/avatar (public, no auth).
No persona-level conversation log. Persona invocations land in the standard inference history; filter by model slug to find them.

Publishing semantics#

Publishing materialises a persona as a FrontendModel row. The new row:

Has slug tenant/{tenant_slug}/{persona_slug} — globally unique and tenant-namespaced.
Inherits modality, capabilities, context_window, and pricing from the underlying model.
Uses the persona's system_prompt as its system_prompt_template.
Carries the persona's avatar_url and default_params.
Has its backend links copied from the underlying frontend model so dispatch resolves identically.

Editing a persona while it's published auto-syncs the frontend model on the next PUT /personas/{id} or POST /personas/{id}/avatar. There is no separate draft / live split — every update is live on the next inference call.

Unpublishing deletes the frontend model and removes it from any model groups it had joined. The persona itself stays put; you can re-publish later.

How it differs from calling a vendor model directly#

A direct ScaiGrid chat-completion call gives you tokens-out. ScaiPersona adds:

Concern	Direct call	ScaiPersona
System prompt	Caller provides per-request	Baked into the published model
RAG retrieval	Caller orchestrates	Built-in; ScaiMatrix + ScaiDrive
Multi-source weighting	Caller merges results	Built-in via `source.weight`
Multi-step retrieval	Caller implements	`rag_strategy: multi_step` / `agentic`
Catalogue listing	Vendor model only	Persona appears as its own frontend model
Routing / model groups	Vendor model only	Persona slugs are first-class members

For a one-off completion, call the inference endpoint with a vendor model. For a named, RAG-grounded assistant you want callers to target by slug, publish a persona.

Where the trust boundary is#

The persona itself is tenant-scoped — PersonaService.get_for_tenant rejects any cross-tenant read. RAG access is enforced at retrieval time, not at attach time: every collection search calls collection_access_fn with the invoking user and drops chunks the caller can't see. ScaiDrive shares are gated by the caller's bearer token, exchanged into a ScaiDrive-scoped token per call — so a persona never "borrows" access the caller doesn't already have.