How AIRI Thinks

From Words to Actions — A Peek into AIRI’s Mind

AIRI feels human because every response follows a disciplined, multi-stage pipeline that balances creativity, consistency, and safety. This page walks through that process stage by stage.

Big Picture (Function-First)

  • Consistency: The character never “breaks role” thanks to a structured System Prompt built from your Character Card.
  • Responsiveness: Streaming generation lets AIRI start speaking before the full answer is ready.
  • Emotion & Action: Special inline markers convert plain text into avatar expressions or external actions.
  • Memory: Short-term context (latest chat) and optional long-term vector memories help AIRI reference past events.
  • Safety & Control: Layered filters and system rules block disallowed content before it reaches TTS or other users.

Outcome: Conversations feel natural yet remain on-brand and under your control.

The Thinking Stack (Conceptual Mechanism)

  1. System Prompt Builder
    Combines: Character Card, global rules, localisation settings.
    Enforces: persona, tone, forbidden topics.
  2. Context Builder
    Adds: Latest N chat turns, optional long-term memories (vector search), platform metadata (e.g., caller name, Discord channel topic).
    Goal: Give the LLM enough context without exceeding token budget.
  3. LLM Selection & Call
    Local default: Ollama Llama-3 8B.
    Cloud alternatives: GPT-4, Claude 3, Gemini.
    Common interface: llm.ts abstracts provider quirks; every call requests stream=true.
  4. Streaming Parser
    While tokens arrive, llmmarkerParser.ts continuously:
    • Detects <|EMOTE_*|> markers.
    • Splits text into TTS-friendly chunks.
    • Filters unsafe content flagged by the Safety Layer.
  5. Safety Layer
    • Keyword & regex blocklist.
    • Optional OpenAI Moderation API check.
    • If violation: replace with [content removed] and log incident.
  6. Output Fan-Out
    • Clean text → TTS (desktop) or chat message (Discord, Telegram).
    • Marker events → Avatar expression, Minecraft skill, etc.

Emotion & Action Markers

Marker                      | Effect                       | Example
<|EMOTE_HAPPY|>             | Avatar plays Happy animation | Great job! <|EMOTE_HAPPY|>
<|MOVE_TO|>{"x":10,"z":5}   | Minecraft bot moves          | On my way! <|MOVE_TO|>{"x":10,"z":5}
<|SFX_JINGLE|>              | Plays sound effect           | Time to celebrate! <|SFX_JINGLE|>

Markers are never shown to users; they are parsed into events before text reaches UI/TTS.
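A minimal sketch of how such markers can be stripped and turned into events. It assumes markers always match <|NAME|> with an optional inline JSON payload; the real llmmarkerParser.ts additionally handles markers split across streaming token boundaries, which this version ignores.

```typescript
interface MarkerEvent { name: string; payload?: unknown }

// Strip <|NAME|> markers (plus an optional JSON payload) out of a chunk,
// returning clean text for UI/TTS and a list of events to fan out.
function extractMarkers(text: string): { clean: string; events: MarkerEvent[] } {
  const events: MarkerEvent[] = [];
  const clean = text.replace(/<\|([A-Z_]+)\|>(\{[^}]*\})?/g, (_match, name, json) => {
    events.push({ name, payload: json ? JSON.parse(json) : undefined });
    return ""; // markers never reach the user
  });
  return { clean: clean.trimEnd(), events };
}
```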

Memory Architecture

  • Short-Term (Chat History): Last ~20 turns kept in RAM.
  • Long-Term (Vector DB): Optional; Telegram photos, notable facts stored with embeddings (see db/schema.ts).
  • Ephemeral Scratchpad: For multi-step tasks in Minecraft integration.
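The short-term tier can be pictured as a bounded buffer. The cap of 20 mirrors the figure above; the class name and shape are hypothetical, not AIRI's actual implementation.

```typescript
// Illustrative short-term memory: keeps only the last N chat turns in RAM.
class ShortTermMemory {
  private turns: string[] = [];
  constructor(private readonly cap = 20) {}

  remember(turn: string): void {
    this.turns.push(turn);
    if (this.turns.length > this.cap) this.turns.shift(); // drop the oldest turn
  }

  recall(): string[] {
    return [...this.turns]; // oldest → newest
  }
}
```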

Token Budget Strategy

  1. System Prompt (~600 tokens)
  2. Most recent chat until context limit − safety margin
  3. Memories rated by relevance score until space filled
  4. Hard cutoff prevents model errors.
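The four rules above amount to a greedy fill. In this sketch, token counts are approximated by word counts (real code would use the model's tokenizer), and all names are illustrative.

```typescript
interface Memory { text: string; relevance: number }

// Crude token estimate: one token per whitespace-separated word.
const countTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length;

function fillBudget(
  systemPrompt: string,
  chat: string[],          // oldest → newest
  memories: Memory[],
  contextLimit: number,
  safetyMargin = 8,
): { chat: string[]; memories: Memory[] } {
  let used = countTokens(systemPrompt);        // 1. system prompt goes in first
  const budget = contextLimit - safetyMargin;  // 2. limit minus safety margin

  // 2. Most recent chat first, walking backwards from the newest turn.
  const keptChat: string[] = [];
  for (let i = chat.length - 1; i >= 0; i--) {
    const cost = countTokens(chat[i]);
    if (used + cost > budget) break;
    used += cost;
    keptChat.unshift(chat[i]);
  }

  // 3. Memories by descending relevance until space is filled.
  const keptMemories: Memory[] = [];
  for (const m of [...memories].sort((a, b) => b.relevance - a.relevance)) {
    const cost = countTokens(m.text);
    if (used + cost > budget) continue; // 4. hard cutoff
    used += cost;
    keptMemories.push(m);
  }
  return { chat: keptChat, memories: keptMemories };
}
```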

Safety & Compliance

  • Dual filter (local regex + optional external) keeps latency low while allowing stricter cloud checks.
  • Logs retained locally for 7 days by default (configurable in Settings → Safety); no PII transmitted unless cloud filters enabled.
  • Custom rules can be added in Settings → Safety.
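The local half of the dual filter reduces to a blocklist scan over each outgoing chunk. A minimal sketch, assuming a plain regex blocklist and omitting the optional cloud moderation call:

```typescript
// Hypothetical blocklist; real rules live in Settings → Safety.
const blocklist: RegExp[] = [/\bforbidden\b/i];

function applySafetyFilter(chunk: string): { text: string; flagged: boolean } {
  for (const rule of blocklist) {
    if (rule.test(chunk)) {
      // On violation: replace the text and (in AIRI) log the incident.
      return { text: "[content removed]", flagged: true };
    }
  }
  return { text: chunk, flagged: false };
}
```

Because the regex pass is purely local, it adds negligible latency; only chunks that pass it need the slower cloud check.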

Extending the Brain

  • New Markers: Add to emotions.ts or create a new skill handler.
  • Tool Calling: Upgrade to OpenAI function-calling schema by enabling Tool Mode in developer settings.
  • Memory Plugins: Implement MemoryProvider interface to hook new databases.
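The MemoryProvider interface is named above, but its exact shape is not documented here; the signatures and the in-memory cosine-similarity implementation below are assumptions for illustration.

```typescript
// Assumed shape of a memory plugin: store embedded texts, search by embedding.
interface MemoryProvider {
  store(text: string, embedding: number[]): Promise<void>;
  search(embedding: number[], topK: number): Promise<string[]>;
}

// Toy backend: an array ranked by cosine similarity to the query embedding.
class InMemoryProvider implements MemoryProvider {
  private items: { text: string; embedding: number[] }[] = [];

  async store(text: string, embedding: number[]): Promise<void> {
    this.items.push({ text, embedding });
  }

  async search(embedding: number[], topK: number): Promise<string[]> {
    const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);
    const norm = (a: number[]) => Math.sqrt(dot(a, a)) || 1;
    return [...this.items]
      .sort((a, b) =>
        dot(b.embedding, embedding) / norm(b.embedding)
        - dot(a.embedding, embedding) / norm(a.embedding))
      .slice(0, topK)
      .map(item => item.text);
  }
}
```

A real plugin would back the same two methods with an actual vector database instead of an array scan.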

Functional Role  | Code File                                                                 | Description
Prompt Builder   | apps/playground-prompt-engineering/src/composables/useCharacterPrompt.ts | Constructs system prompt from card
LLM Store        | packages/stage-ui/src/stores/llm.ts                                      | Streams responses from local/cloud models
Marker Parser    | packages/stage-ui/src/composables/llmmarkerParser.ts                     | Detects markers & splits text
Safety Filter    | packages/stage-ui/src/utils/safety.ts                                    | Keyword & Moderation API checks
Memory DB Schema | services/telegram-bot/src/db/schema.ts                                   | Vector storage for images & text
Emotion Map      | packages/stage-ui/src/constants/emotions.ts                              | Marker → animation mapping