From Words to Actions — A Peek into AIRI’s Mind
AIRI feels human because every response follows a disciplined, multi-stage pipeline that balances creativity, consistency, and safety. This page demystifies that process—no code required.
Big Picture (Function-First)
- Consistency: The character never “breaks role” thanks to a structured System Prompt built from your Character Card.
- Responsiveness: Streaming generation lets AIRI start speaking before the full answer is ready.
- Emotion & Action: Special inline markers convert plain text into avatar expressions or external actions.
- Memory: Short-term context (latest chat) and optional long-term vector memories help AIRI reference past events.
- Safety & Control: Layered filters and system rules block disallowed content before it reaches TTS or other users.
Outcome: Conversations feel natural yet remain on-brand and under your control.
The Thinking Stack (Conceptual Mechanism)
- System Prompt Builder
  Combines: Character Card, global rules, localisation settings.
  Ensures: Persona, tone, forbidden topics.
- Context Builder
  Adds: Latest N chat turns, optional long-term memories (vector search), platform metadata (e.g., caller name, Discord channel topic).
  Goal: Give the LLM enough context without exceeding token budget.
- LLM Selection & Call
  Local default: Ollama Llama-3 8B.
  Cloud alternatives: GPT-4, Claude 3, Gemini.
  Common interface: llm.ts abstracts provider quirks; every call requests stream=true.
- Streaming Parser
  While tokens arrive, llmmarkerParser.ts continuously:
  • Detects <|EMOTE_*|> markers.
  • Splits text into TTS-friendly chunks.
  • Filters unsafe content flagged by the Safety Layer.
- Safety Layer
  • Keyword & regex blocklist.
  • Optional OpenAI Mod-API check.
  • If violation: replace with [content removed] and log incident.
- Output Fan-Out
  • Clean text → TTS (desktop) or chat message (Discord, Telegram).
  • Marker events → Avatar expression, Minecraft skill, etc.
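For readers who want to see the Streaming Parser stage in concrete terms, here is a minimal sketch of how markers can be separated from text while tokens are still arriving. It is not the contents of llmmarkerParser.ts: the names StreamEvent and StreamingMarkerParser are hypothetical, and the sketch assumes markers always have the form <|NAME|> with no payload. The main point it illustrates is that a marker may be split across two chunks, so the parser has to hold back an unfinished marker instead of sending it to TTS.

```typescript
// Hypothetical event types; the real parser may use different shapes.
type StreamEvent =
  | { kind: "text"; text: string }
  | { kind: "marker"; name: string };

const OPEN = "<|";
const CLOSE = "|>";

// Length of the longest suffix of `s` that is a proper prefix of OPEN.
// For "<|" this is 1 when `s` ends with "<", otherwise 0.
function partialOpenLength(s: string): number {
  for (let n = OPEN.length - 1; n > 0; n--) {
    if (s.endsWith(OPEN.slice(0, n))) return n;
  }
  return 0;
}

class StreamingMarkerParser {
  private buffer = "";

  // Feed one streamed chunk; returns the events that are complete so far.
  push(chunk: string): StreamEvent[] {
    this.buffer += chunk;
    const events: StreamEvent[] = [];

    while (this.buffer.length > 0) {
      const open = this.buffer.indexOf(OPEN);
      if (open === -1) {
        // No marker start: emit everything except a trailing "<" that
        // could become "<|" once the next chunk arrives.
        const keep = partialOpenLength(this.buffer);
        const text = this.buffer.slice(0, this.buffer.length - keep);
        if (text) events.push({ kind: "text", text });
        this.buffer = this.buffer.slice(this.buffer.length - keep);
        break;
      }
      if (open > 0) {
        // Emit the plain text that precedes the marker.
        events.push({ kind: "text", text: this.buffer.slice(0, open) });
        this.buffer = this.buffer.slice(open);
      }
      const close = this.buffer.indexOf(CLOSE, OPEN.length);
      if (close === -1) break; // marker not finished yet; wait for more input
      events.push({ kind: "marker", name: this.buffer.slice(OPEN.length, close) });
      this.buffer = this.buffer.slice(close + CLOSE.length);
    }
    return events;
  }

  // Call at end of stream: anything left over is treated as plain text.
  flush(): StreamEvent[] {
    const rest = this.buffer;
    this.buffer = "";
    return rest ? [{ kind: "text", text: rest }] : [];
  }
}
```

In this sketch, text events would go on to the safety filter and then to TTS or chat, while marker events would be routed to the avatar or a skill handler, mirroring the Output Fan-Out stage.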
Emotion & Action Markers
Marker | Effect | Example |
---|---|---|
<|EMOTE_HAPPY|> | Avatar plays Happy animation | Great job! <|EMOTE_HAPPY|> |
<|MOVE_TO|>{"x":10,"z":5} | Minecraft bot moves | On my way! <|MOVE_TO|>{"x":10,"z":5} |
<|SFX_JINGLE|> | Plays sound effect | Time to celebrate! <|SFX_JINGLE|> |
Markers are never shown to users; they are parsed into events before text reaches UI/TTS.
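As an illustration of how the examples in the table could be turned into events, the sketch below strips markers from a complete message and collects them separately. It is an assumption-laden example, not the project's parser: extractMarkers and MarkerEvent are hypothetical names, and the payload is assumed to be a flat JSON object (no nested braces) placed directly after the marker, as in the MOVE_TO row.

```typescript
// Hypothetical event shape for illustration only.
interface MarkerEvent {
  name: string;
  payload?: unknown;
}

// Matches <|NAME|> optionally followed by a flat JSON object such as {"x":10,"z":5}.
const MARKER_PATTERN = /<\|([A-Z0-9_]+)\|>(\{[^{}]*\})?/g;

function extractMarkers(raw: string): { cleanText: string; events: MarkerEvent[] } {
  const events: MarkerEvent[] = [];
  const cleanText = raw
    .replace(MARKER_PATTERN, (_match: string, name: string, json?: string) => {
      if (json === undefined) {
        events.push({ name });
        return "";
      }
      try {
        events.push({ name, payload: JSON.parse(json) });
        return "";
      } catch {
        // Braces that are not valid JSON are ordinary text: keep them visible.
        events.push({ name });
        return json;
      }
    })
    .replace(/\s{2,}/g, " ") // collapse gaps left behind by removed markers
    .trim();
  return { cleanText, events };
}
```

Called on the second table example, extractMarkers('On my way! <|MOVE_TO|>{"x":10,"z":5}') yields the clean text "On my way!" plus one MOVE_TO event whose payload is the parsed coordinates.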
Memory Architecture
- Short-Term (Chat History): Last ~20 turns kept in RAM.
- Long-Term (Vector DB): Optional; Telegram photos, notable facts stored with embeddings (see db/schema.ts).
- Ephemeral Scratchpad: For multi-step tasks in Minecraft integration.
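The first two tiers can be pictured with a small sketch. It keeps the short-term history as a capped array and ranks long-term records by cosine similarity in plain memory, which stands in for the vector database the project actually uses. All names here (Turn, ShortTermMemory, MemoryRecord, topMemories) are hypothetical, and the default of 20 turns simply mirrors the approximate figure given above.

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Short-term tier: keeps only the most recent turns.
class ShortTermMemory {
  private turns: Turn[] = [];
  private readonly maxTurns: number;

  constructor(maxTurns = 20) {
    this.maxTurns = maxTurns;
  }

  add(turn: Turn): void {
    this.turns.push(turn);
    if (this.turns.length > this.maxTurns) this.turns.shift(); // drop the oldest
  }

  recent(): readonly Turn[] {
    return this.turns;
  }
}

// Long-term tier: a text snippet plus its embedding vector.
interface MemoryRecord {
  text: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding sizes differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Returns the k records most similar to the query embedding, best first.
function topMemories(query: number[], records: MemoryRecord[], k: number): MemoryRecord[] {
  return records
    .map((record) => ({ record, score: cosineSimilarity(query, record.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.record);
}
```

A real vector database would perform the similarity search itself; the linear scan here is only meant to show what "relevance" means for a stored memory.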
Token Budget Strategy
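- System Prompt (~600 tokens).
- Most recent chat turns, added until the context limit minus a safety margin is reached.
- Memories, ranked by relevance score and added until the remaining space is filled.
- A hard cutoff keeps the prompt within the model's context limit, preventing model errors.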
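One way to read this strategy is as a priority-ordered fill, sketched below. The sketch makes several assumptions that the page does not state: tokens are estimated at roughly four characters each rather than counted with a real tokenizer, chat is filled before memories so memories only use leftover space, and the function and type names (fitToBudget, BudgetOptions, ScoredMemory) are hypothetical.

```typescript
interface ScoredMemory {
  text: string;
  relevance: number; // higher means more relevant
}

interface BudgetOptions {
  contextLimit: number; // model context window, in tokens
  safetyMargin: number; // tokens held back as headroom
}

interface PromptParts {
  system: string;
  memories: string[];
  chat: string[]; // oldest first
}

// Assumption: roughly 4 characters per token. Use a real tokenizer for accuracy.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function fitToBudget(
  system: string,
  chatOldestFirst: string[],
  memories: ScoredMemory[],
  options: BudgetOptions,
): PromptParts {
  let remaining = options.contextLimit - options.safetyMargin - estimateTokens(system);
  if (remaining < 0) {
    throw new Error("System prompt alone exceeds the token budget");
  }

  // Walk the chat from newest to oldest and stop at the first turn that
  // does not fit, so the kept history stays contiguous.
  const chat: string[] = [];
  for (let i = chatOldestFirst.length - 1; i >= 0; i--) {
    const cost = estimateTokens(chatOldestFirst[i]);
    if (cost > remaining) break;
    chat.unshift(chatOldestFirst[i]);
    remaining -= cost;
  }

  // Fill leftover space with memories, most relevant first,
  // skipping any single memory that is too large to fit.
  const kept: string[] = [];
  const ranked = [...memories].sort((a, b) => b.relevance - a.relevance);
  for (const memory of ranked) {
    const cost = estimateTokens(memory.text);
    if (cost > remaining) continue;
    kept.push(memory.text);
    remaining -= cost;
  }

  return { system, memories: kept, chat };
}
```

Because nothing is ever added once it would exceed the remaining budget, the assembled prompt cannot grow past the context limit minus the margin, which is the role the hard cutoff plays in the list above.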
Safety & Compliance
- Dual filter (local regex + optional external) keeps latency low while allowing stricter cloud checks.
- Logs are retained locally for 7 days by default (configurable in Settings → Safety); no PII is transmitted unless cloud filters are enabled.
- Custom rules can be added in Settings → Safety.
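The local half of the dual filter can be sketched as a plain regex pass. This is an illustration under assumptions, not the code in safety.ts: applyBlocklist and Incident are hypothetical names, and the sketch assumes that only the matching span is replaced with [content removed] rather than the whole message. The optional external moderation check is left out because it depends on the provider's API.

```typescript
// Hypothetical incident record kept in the local log.
interface Incident {
  rule: string; // source of the regex that matched
  timestamp: number; // milliseconds since the Unix epoch
}

const REPLACEMENT = "[content removed]";

// Replaces every match of every rule and records one incident per match.
function applyBlocklist(text: string, rules: RegExp[], log: Incident[]): string {
  let result = text;
  for (const rule of rules) {
    // Clone with the global flag so all occurrences are replaced, not just the first.
    const flags = rule.flags.includes("g") ? rule.flags : rule.flags + "g";
    const pattern = new RegExp(rule.source, flags);
    result = result.replace(pattern, () => {
      log.push({ rule: rule.source, timestamp: Date.now() });
      return REPLACEMENT;
    });
  }
  return result;
}
```

Running the pass locally and synchronously keeps it cheap enough to apply to every chunk before it reaches TTS, while a slower cloud check can be layered on top when stricter filtering is needed.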
Extending the Brain
- New Markers: Add to emotions.ts or create a new skill handler.
- Tool Calling: Upgrade to OpenAI function-calling schema by enabling Tool Mode in developer settings.
- Memory Plugins: Implement MemoryProvider interface to hook new databases.
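To make the idea of a skill handler more tangible, here is a minimal sketch of a registry that maps marker names to handler functions. It is purely illustrative: MarkerRegistry, MarkerHandler, and the marker name WAVE_HELLO are hypothetical and do not describe how emotions.ts or the existing skill handlers are actually organised.

```typescript
// A handler receives the marker's optional payload (for example, parsed JSON).
type MarkerHandler = (payload: unknown) => void;

class MarkerRegistry {
  private handlers = new Map<string, MarkerHandler>();

  // Registers (or replaces) the handler for a marker name such as "EMOTE_HAPPY".
  register(name: string, handler: MarkerHandler): void {
    this.handlers.set(name, handler);
  }

  // Runs the handler for a marker. Returns false for unknown markers so the
  // caller can decide whether to ignore or log them.
  dispatch(name: string, payload?: unknown): boolean {
    const handler = this.handlers.get(name);
    if (!handler) return false;
    handler(payload);
    return true;
  }
}

// Usage: wiring up one made-up marker.
const registry = new MarkerRegistry();
const played: string[] = [];
registry.register("WAVE_HELLO", () => {
  played.push("wave"); // a real handler would trigger an avatar animation
});
registry.dispatch("WAVE_HELLO");
```

Returning a boolean from dispatch is a design choice in this sketch: unknown markers are dropped quietly instead of throwing, so a model that emits an unsupported marker does not interrupt the conversation.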
Related Technical Files
Functional Role | Code File | Description |
---|---|---|
Prompt Builder | apps/playground-prompt-engineering/src/composables/useCharacterPrompt.ts | Constructs system prompt from card |
LLM Store | packages/stage-ui/src/stores/llm.ts | Streams responses from local/cloud models |
Marker Parser | packages/stage-ui/src/composables/llmmarkerParser.ts | Detects markers & splits text |
Safety Filter | packages/stage-ui/src/utils/safety.ts | Keyword & mod-api checks |
Memory DB Schema | services/telegram-bot/src/db/schema.ts | Vector storage for images & text |
Emotion Map | packages/stage-ui/src/constants/emotions.ts | Marker → animation mapping |