Platform
ScaiWave ScaiGrid ScaiCore ScaiBot ScaiDrive ScaiKey Models Tools & Services
Solutions
Organisations Developers Internet Service Providers Managed Service Providers AI-in-a-Box
Resources
Support Documentation Blog Downloads
Company
About Research Careers Investment Opportunities Contact
Log in

Architecture

ScaiPersona is a thin product layer over ScaiGrid's existing primitives — the model catalogue, the inference pipeline, ScaiMatrix retrieval, and ScaiDrive shares. There is no separate "persona engine"; a persona is a configuration plus a request enricher that runs inside the inference pipeline.

Components#

flowchart LR C[Caller] subgraph SG[ScaiGrid] R[Routing<br/>FrontendModel<br/>persona's published model] E[Persona enricher<br/>- reads persona<br/>- runs RAG<br/>- injects context] D[Inference dispatch<br/>to backend] R --> E E --> D end SM[ScaiMatrix] SD[ScaiDrive] C -- /v1/inference/chat --> R E --> SM E --> SD D -- stream / json --> C

There is no separate ScaiPersona deployment. ScaiPersona is a ScaiGrid module — it runs in the same FastAPI process, behind the same auth, accounted against the same budgets. The CRUD routes live at /v1/modules/scaipersona/; the actual hot path runs inside the inference enricher.

Request flow for one persona invocation#

  1. Caller calls POST /v1/inference/chat with model = "tenant/{tenant_slug}/{persona_slug}".
  2. Routing resolves the slug to the persona's published FrontendModel. The frontend model's metadata.persona_id ties it back to the persona row.
  3. Enricher (persona_enricher, registered on app.state.persona_enricher at module init) is invoked by InferenceService. It reads the persona, and if rag_enabled is true, runs the configured RAG strategy.
  4. RAG retrieval. For each active PersonaSource:
    • collection sources call ScaiMatrix's vector search after checking the caller's collection access.
    • scaidrive sources call ScaiDrive's /api/v1/search/context after exchanging the caller's bearer token for a ScaiDrive-scoped one. Results across sources are weighted by source.weight, deduplicated, and capped at rag_top_k.
  5. Context injection. Top chunks are formatted with the persona's rag_context_template (default: a labelled list) and appended to the system message.
  6. Inference. The enriched request is dispatched to the underlying frontend model's backend. The persona's system_prompt was already baked into the published frontend model at publish time.
  7. Accounting. Tokens, latency, and dispatch metadata are recorded against the persona's frontend model — the persona ends up in usage reports as its own model.

State#

  • Personas and sources — in MariaDB tables mod_scaipersona_personas and mod_scaipersona_persona_sources.
  • Published frontend models — in the standard frontend_models table, with a metadata.persona_id pointer back. Backend links are copied from the underlying model at publish time.
  • Avatars — in S3 under scaipersona/{tenant_id}/{persona_id}/avatar. Served back through GET /personas/{id}/avatar (public, no auth).
  • No persona-level conversation log. Persona invocations land in the standard inference history; filter by model slug to find them.

Publishing semantics#

Publishing materialises a persona as a FrontendModel row. The new row:

  • Has slug tenant/{tenant_slug}/{persona_slug} — globally unique and tenant-namespaced.
  • Inherits modality, capabilities, context_window, and pricing from the underlying model.
  • Uses the persona's system_prompt as its system_prompt_template.
  • Carries the persona's avatar_url and default_params.
  • Has its backend links copied from the underlying frontend model so dispatch resolves identically.

Editing a persona while it's published auto-syncs the frontend model on the next PUT /personas/{id} or POST /personas/{id}/avatar. There is no separate draft / live split — every update is live on the next inference call.

Unpublishing deletes the frontend model and removes it from any model groups it had joined. The persona itself stays put; you can re-publish later.

How it differs from calling a vendor model directly#

A direct ScaiGrid chat-completion call gives you tokens-out. ScaiPersona adds:

Concern Direct call ScaiPersona
System prompt Caller provides per-request Baked into the published model
RAG retrieval Caller orchestrates Built-in; ScaiMatrix + ScaiDrive
Multi-source weighting Caller merges results Built-in via source.weight
Multi-step retrieval Caller implements rag_strategy: multi_step / agentic
Catalogue listing Vendor model only Persona appears as its own frontend model
Routing / model groups Vendor model only Persona slugs are first-class members

For a one-off completion, call the inference endpoint with a vendor model. For a named, RAG-grounded assistant you want callers to target by slug, publish a persona.

Where the trust boundary is#

The persona itself is tenant-scoped — PersonaService.get_for_tenant rejects any cross-tenant read. RAG access is enforced at retrieval time, not at attach time: every collection search calls collection_access_fn with the invoking user and drops chunks the caller can't see. ScaiDrive shares are gated by the caller's bearer token, exchanged into a ScaiDrive-scoped token per call — so a persona never "borrows" access the caller doesn't already have.

Updated 2026-05-18 15:01:31 View source (.md) rev 12