AI Town: Technical Specifications
Introduction
This document is a strategic blueprint of AI Town’s technical architecture, written for product leaders, designers, and founders. It moves beyond high-level concepts to provide a detailed yet accessible explanation of the core engineering principles and implementation details. Its purpose is to demystify the technology, enabling informed decision-making and effective communication with the technical teams who would build a similar product.
High-Level Architecture & The Real-Time Data Flow
1.1 System Architecture Overview
AI Town is a modern, full-stack web application with a clear separation between the user interface (Frontend), the simulation and AI logic (Backend), and the data persistence layer (Database). The “magic” of AI is provided by an external Large Language Model (LLM) service.
1.2 The Real-Time Data Flow: The Core Interaction Pattern
Understanding how the frontend and backend communicate is key. The system uses a real-time subscription model, which is fundamentally different from traditional request-response websites. This architecture is extremely powerful because it decouples the user’s interface from the complex simulation logic, leading to a highly responsive user experience.
The flow unfolds in a continuous loop:
- Initial Load & Subscription: When a user loads the application, the Frontend establishes a persistent, live connection to the Database. It “subscribes” to the world’s state.
- Live Updates: From this point on, any change to the world’s data in the Database is automatically and instantly pushed to the Frontend. The Frontend’s only job is to render the data it receives.
- User Action: When a user performs an action (like clicking to move), the Frontend sends a single, asynchronous command to the Backend.
- Backend Processing: The Backend receives the command, performs all the heavy lifting (e.g., calculating the path), and updates the player’s state in the Database.
- Closing the Loop: The Database, seeing the change, automatically pushes the updated player state back to the subscribed Frontend, which then visually updates the character’s position. The Frontend never needs to ask “is he there yet?”; it is simply told where the character is in real-time.
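The loop above can be sketched as a minimal publish/subscribe table. This is an illustrative stand-in, not Convex’s actual API: the `LiveTable` class and its `subscribe`/`patch` methods are invented here to show the shape of the pattern, in which the frontend subscribes once and is then pushed every change.

```typescript
// Minimal sketch of the subscribe-then-mutate loop (not Convex's real API).
type PlayerState = { id: string; x: number; y: number };

class LiveTable {
  private state = new Map<string, PlayerState>();
  private subscribers: Array<(players: PlayerState[]) => void> = [];

  // Frontend: subscribe once; every future change is pushed automatically.
  subscribe(onChange: (players: PlayerState[]) => void): void {
    this.subscribers.push(onChange);
    onChange([...this.state.values()]); // initial snapshot
  }

  // Backend: mutations write state, and the table notifies all subscribers.
  patch(player: PlayerState): void {
    this.state.set(player.id, player);
    const snapshot = [...this.state.values()];
    for (const notify of this.subscribers) notify(snapshot);
  }
}

// The frontend renders whatever it is told; it never polls.
const world = new LiveTable();
const rendered: PlayerState[][] = [];
world.subscribe((players) => rendered.push(players));

// The backend processes a "move" command and updates the database.
world.patch({ id: "lucky", x: 3, y: 7 });
```

Note that the rendering side holds no logic about *why* the state changed; it only receives snapshots, which is what keeps the UI decoupled from the simulation.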
1.3 Technology Choices Explained
Component | Technology | What It Is & Why It’s a Strategic Choice |
---|---|---|
Frontend UI | React & TypeScript | What: A leading industry standard for building dynamic, component-based user interfaces. Why: It allows for a clean separation of concerns and the creation of reusable UI components. TypeScript adds strong typing, which significantly reduces bugs and improves developer productivity, crucial for a project of this complexity. |
2D World Rendering | Pixi.js | What: A high-performance 2D rendering library that uses the browser’s GPU via WebGL. Why: Rendering a world with many moving, animated sprites using standard HTML/CSS would be cripplingly slow. Pixi.js is purpose-built for high-framerate 2D graphics, ensuring a smooth visual experience that is fundamental to the product’s quality. |
Backend & Database | Convex | What: A serverless platform that tightly integrates a real-time database with serverless functions. Why: This is the project’s key architectural decision. It provides the seamless, real-time subscription model out of the box. This drastically simplifies the code and eliminates an entire category of complex engineering problems related to state synchronization, allowing the team to focus on the core simulation and AI logic instead of infrastructure. |
AI “Brain” | OpenAI / Ollama (LLMs) | What: Large Language Models that can understand and generate human-like text. Why: This is the cognitive core of the AI agents. Instead of writing millions of lines of brittle, rule-based code for behavior and dialogue, we delegate these tasks to the LLM. By providing it with rich context (personality, memories, current situation), we elicit emergent, intelligent, and nuanced behavior that would be impossible to script by hand. It allows for genuine unpredictability and creativity from the agents. |
Core System Blueprints
This section details the conceptual “engines” that power the simulation.
2.1 The World Engine: The Physics of Reality
The World Engine is responsible for the continuous, persistent progression of the simulation. It is the “heartbeat” of AI Town, governed by a set of immutable laws that ensure the world is fair, consistent, and always active.
- The Scheduler & Ticker (The Heartbeat):
  - How it Works: The simulation runs in a continuous, self-perpetuating loop on the backend. A master function, `runStep`, is scheduled to execute. Inside, it runs a high-frequency `tick` function for a fixed duration (e.g., 1 second of real time might contain 60 ticks). This `tick` is the smallest unit of time in the world; during each tick, every single entity has a chance to update its state. When the `runStep` function completes, its final act is to schedule itself to run again immediately, creating an unbroken chain of time. A `generationNumber` acts as a safeguard, ensuring that only one authoritative timeline is ever running, preventing bugs from network delays.
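A minimal sketch of this heartbeat is shown below. The names `runStep`, `tick`, and `generationNumber` follow the text; the scheduling mechanics (a synchronous loop and a returned "next generation") are simplified stand-ins for the real scheduler.

```typescript
// Sketch of the heartbeat: a runStep that executes a fixed number of ticks,
// then hands off to the next generation. Simplified stand-in, not real code.
const TICKS_PER_STEP = 60;

interface Engine {
  generationNumber: number;
  clock: number; // total ticks elapsed
}

function tick(engine: Engine): void {
  engine.clock += 1; // every entity would get a chance to update here
}

// Returns the next scheduled generation, or null if this timeline is stale.
function runStep(engine: Engine, expectedGeneration: number): number | null {
  if (engine.generationNumber !== expectedGeneration) {
    return null; // a newer timeline took over; abandon this one
  }
  for (let i = 0; i < TICKS_PER_STEP; i++) tick(engine);
  engine.generationNumber += 1; // hand off to the next authoritative step
  return engine.generationNumber;
}

const engine: Engine = { generationNumber: 0, clock: 0 };
const next = runStep(engine, 0);  // valid: generation matches
const stale = runStep(engine, 0); // rejected: generation has moved on
```

The `generationNumber` check is what prevents a delayed, duplicate invocation of `runStep` from creating a second, conflicting timeline.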
- The Pathfinding System (Intelligent Movement):
  - How it Works: When a character needs to move, the system doesn’t just tell them to go in a straight line. It uses the A* (A-Star) pathfinding algorithm, a classic and highly efficient method for finding the shortest path between two points. It works like a GPS for the characters:
    - It considers the entire map as a grid.
    - It calculates the most efficient path, treating obstacles (like buildings) and other characters as “traffic” to be routed around.
    - The output is not just a destination, but a complete, step-by-step route. The character then follows this route, and each `tick` updates their position smoothly along the path. This is why characters move with such believable purpose.
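A compact version of the textbook A* algorithm the section describes is sketched below, over a boolean grid where `true` means walkable. This is not AI Town’s exact implementation; it uses a Manhattan-distance heuristic and a simple sorted open list for clarity rather than a priority queue.

```typescript
// A compact A* sketch over a boolean grid (true = walkable).
type Point = { x: number; y: number };

function findPath(grid: boolean[][], start: Point, goal: Point): Point[] | null {
  const h = (p: Point) => Math.abs(p.x - goal.x) + Math.abs(p.y - goal.y);
  const key = (p: Point) => `${p.x},${p.y}`;
  const open: Point[] = [start];
  const cameFrom = new Map<string, Point>();
  const gScore = new Map<string, number>([[key(start), 0]]);

  while (open.length > 0) {
    // Take the node with the lowest f = g + h (linear scan for simplicity).
    open.sort((a, b) => (gScore.get(key(a))! + h(a)) - (gScore.get(key(b))! + h(b)));
    const current = open.shift()!;
    if (current.x === goal.x && current.y === goal.y) {
      // Reconstruct the complete, step-by-step route.
      const path = [current];
      let k = key(current);
      while (cameFrom.has(k)) {
        path.unshift(cameFrom.get(k)!);
        k = key(path[0]);
      }
      return path;
    }
    for (const d of [{ x: 1, y: 0 }, { x: -1, y: 0 }, { x: 0, y: 1 }, { x: 0, y: -1 }]) {
      const n = { x: current.x + d.x, y: current.y + d.y };
      if (!grid[n.y]?.[n.x]) continue; // off-grid or blocked ("traffic")
      const tentative = gScore.get(key(current))! + 1;
      if (tentative < (gScore.get(key(n)) ?? Infinity)) {
        cameFrom.set(key(n), current);
        gScore.set(key(n), tentative);
        open.push(n);
      }
    }
  }
  return null; // no route exists
}

// A 3x3 map with a wall in the middle column, open at the bottom.
const map = [
  [true, false, true],
  [true, false, true],
  [true, true, true],
];
const route = findPath(map, { x: 0, y: 0 }, { x: 2, y: 0 });
```

The output is the full route; the `tick` then only has to advance the character one small step along it each frame.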
- The State Manager (The “Git” of World State):
  - How it Works: Saving the entire world’s data every second would be incredibly inefficient. Instead, the system uses a technique called state diffing. After each `runStep`, the State Manager compares the world’s state now to its state one second ago. It identifies only what has changed (Player A moved, Conversation B ended) and saves only this small “diff” to the database. This is conceptually similar to how `git` tracks changes in code, making updates extremely fast and lightweight.
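The core of state diffing can be sketched in a few lines. The snapshot shape here (entities keyed by id, with a position) is illustrative, not the project’s actual schema.

```typescript
// Sketch of state diffing: compare two snapshots and keep only what changed.
type Snapshot = Record<string, { x: number; y: number }>;

function diff(prev: Snapshot, next: Snapshot): Snapshot {
  const changes: Snapshot = {};
  for (const [id, state] of Object.entries(next)) {
    const old = prev[id];
    if (!old || old.x !== state.x || old.y !== state.y) {
      changes[id] = state; // only changed (or new) entities are written
    }
  }
  return changes;
}

const before: Snapshot = { alice: { x: 1, y: 1 }, bob: { x: 5, y: 5 } };
const after: Snapshot = { alice: { x: 2, y: 1 }, bob: { x: 5, y: 5 } };
const patch = diff(before, after); // only alice moved, so only alice is saved
```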
- Data Flow Diagram:
2.2 The Agent Cognitive Engine: The Spark of Life
This is the “brain” of each AI agent, built on a cycle of perceiving, thinking, and acting. It is a multi-stage pipeline that transforms raw data (conversations) into learned wisdom and intelligent behavior.
- Step 1: The Memory Creation Pipeline (Turning Experience into Knowledge)
  - How it Works: After a conversation, an agent’s brain performs a four-step process to create a memory. This is a powerful LLM chain:
    - Summarize: The entire conversation transcript is sent to the LLM with the prompt: “You are [Agent Name]. Summarize the following conversation from your perspective in a single, first-person sentence.” (e.g., “I learned that Bob loves gardening.”)
    - Rate Importance (Poignancy): The summary generated in step 1 is sent back to the LLM with a new prompt: “On a scale of 1-10, where 1 is mundane and 10 is life-changing, how important is this memory? Respond with only a number.”
    - Embed: The summary text is sent to an embeddings model, which converts the meaning of the sentence into a vector (a list of numbers). This vector is a mathematical “fingerprint” of the memory’s concept.
    - Store: The final, rich memory object—containing the summary (the story), the poignancy score (the feeling), and the vector embedding (the concept)—is saved to the agent’s memory database.
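The chain’s shape can be sketched with stubbed model calls. In the real system, `summarize` and `rateImportance` are LLM prompts and `embed` calls an embeddings model; here all three are deterministic placeholders so the four-step flow is visible end to end.

```typescript
// Sketch of the four-step memory pipeline with stubbed model calls.
interface Memory {
  summary: string;
  poignancy: number;
  embedding: number[];
  lastAccess: number;
}

// Stand-ins for the real LLM / embeddings calls:
const summarize = (transcript: string, agent: string) =>
  `I, ${agent}, remember: ${transcript}`;
const rateImportance = (summary: string) =>
  Math.min(10, Math.max(1, (summary.length % 10) + 1)); // fake 1-10 score
const embed = (text: string) =>
  [text.length, text.split(" ").length]; // fake 2-dimensional "fingerprint"

function createMemory(transcript: string, agent: string, now: number): Memory {
  const summary = summarize(transcript, agent); // 1. Summarize
  const poignancy = rateImportance(summary);    // 2. Rate importance
  const embedding = embed(summary);             // 3. Embed
  return { summary, poignancy, embedding, lastAccess: now }; // 4. Store
}

const memory = createMemory("Bob said he loves gardening.", "Alice", 1000);
```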
- Step 2: The RAG-based Retrieval System (Recalling the Right Memory)
  - How it Works: When an agent needs to think, it must recall the right memories. This is done via a sophisticated Retrieval-Augmented Generation (RAG) system that mimics human recall by balancing three factors:
    - Relevance: The system takes the current query (e.g., “my thoughts about Bob”) and creates a vector embedding from it. It then performs a vector search to find memories with the most similar vector “fingerprints.”
    - Importance: The system retrieves the stored poignancy score for each memory. A memory of a major argument (poignancy: 9) is more likely to be recalled than a memory of a casual greeting (poignancy: 2).
    - Recency: The system checks the `lastAccess` timestamp of each memory. Memories accessed recently are given a higher score, which decays over time. This is why a topic mentioned a few minutes ago is fresh in an agent’s mind.
    - The Final Score: A memory’s final `overallScore` is a weighted sum: (Relevance Score) + (Importance Score) + (Recency Score). The memories with the highest scores are retrieved and used as the context for the agent’s next thought.
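The three-factor score can be sketched as follows. The equal weights, the exponential decay rate, and the normalization are assumptions for illustration; a real system would tune all three.

```typescript
// Sketch of the three-factor retrieval score. Weights and decay are assumed.
interface ScoredMemory {
  relevance: number;  // similarity from the vector search, 0..1
  poignancy: number;  // stored importance, 1..10
  lastAccess: number; // ms timestamp of last recall
}

function overallScore(m: ScoredMemory, now: number): number {
  const recency = Math.exp(-(now - m.lastAccess) / 60_000); // decays over minutes
  const importance = m.poignancy / 10; // normalize to 0..1
  return m.relevance + importance + recency; // equal-weight sum
}

const now = 120_000;
// A major argument from a couple of minutes ago...
const argument = { relevance: 0.5, poignancy: 9, lastAccess: 60_000 };
// ...versus a casual greeting from just a moment ago.
const greeting = { relevance: 0.5, poignancy: 2, lastAccess: 115_000 };
```

With these numbers, the argument’s high importance outweighs the greeting’s freshness, which matches the recall behavior the section describes.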
- Step 3: The Reflection Mechanism (Gaining Wisdom)
  - How it Works: To prevent agents from being just a collection of individual memories, they must form higher-level insights. Periodically, the system sums up the poignancy scores of an agent’s most recent memories. If this sum exceeds a `REFLECTION_THRESHOLD`, a reflection is triggered. The system sends the last 100 memory summaries to the LLM with a prompt like: “You are [Agent Name]. From the statements below, what are 3 high-level insights you can infer?” The LLM might respond with: [“I’ve learned that asking about people’s hobbies is a great way to make friends.”]. These insights are then put through the Memory Creation Pipeline and stored as new, highly important memories, fundamentally shaping the agent’s future worldview.
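The trigger condition itself is simple enough to sketch directly. The threshold value of 50 here is an assumption for illustration; the text does not specify the real constant.

```typescript
// Sketch of the reflection trigger: sum recent poignancy scores and reflect
// once the total crosses a threshold. The value 50 is an assumed constant.
const REFLECTION_THRESHOLD = 50;

function shouldReflect(recentPoignancies: number[]): boolean {
  const total = recentPoignancies.reduce((sum, p) => sum + p, 0);
  return total > REFLECTION_THRESHOLD;
}

// A quiet day of small talk does not trigger a reflection...
const quietDay = shouldReflect([2, 3, 2, 1, 4]);
// ...but a run of intense experiences does.
const intenseDay = shouldReflect([9, 8, 9, 7, 8, 9, 6]);
```

Summing poignancy, rather than counting memories, means reflection fires after emotionally significant stretches rather than on a fixed schedule.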
- Data Flow Diagram:
Data Architecture & State Management
This section explains how the system organizes, stores, and thinks about its most valuable asset: information.
3.1 Data Philosophy: Hot vs. Cold Storage
To ensure the simulation is both extremely fast and preserves a perfect historical record, the system uses a two-tiered data strategy.
- Active State (“Hot Storage”): The data for the current, running world is kept “hot” in the main database tables. This data is structured for constant, rapid updates and for being streamed to users in real time. The design prioritizes performance above all else, keeping the data lean and focused only on what is happening right now.
- Archived State (“Cold Storage”): When an entity (like a player who leaves or a conversation that ends) is removed from the active world, it is not deleted. Its final state is copied to a separate “archive” table. This creates an immutable, permanent ledger of everything that has ever happened in the world. This is invaluable for long-term analysis, debugging, or future features (like “replaying” a day’s events), without slowing down the live simulation with historical data.
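The hot/cold handoff can be sketched as a move from a live table into an append-only archive. The table shapes and the `"left"` status value are illustrative assumptions.

```typescript
// Sketch of the hot/cold split: removing an entity from the live table moves
// its final state into an append-only archive instead of deleting it.
interface PlayerRow { id: string; status: string }

const activePlayers = new Map<string, PlayerRow>(); // "hot": fast, lean
const archivedPlayers: PlayerRow[] = [];            // "cold": permanent ledger

function removePlayer(id: string): void {
  const row = activePlayers.get(id);
  if (!row) return;
  archivedPlayers.push({ ...row, status: "left" }); // copy the final state
  activePlayers.delete(id);                         // keep the hot path lean
}

activePlayers.set("lucky", { id: "lucky", status: "walking" });
removePlayer("lucky");
```

The live simulation only ever scans `activePlayers`, so the archive can grow indefinitely without affecting tick performance.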
3.2 Key Data Models & Their Relationships
These are the conceptual “information cards” for the most important entities in the system. Understanding their relationships is key to understanding the system as a whole.
Data Model | Key Attributes & Purpose | Key Relationships |
---|---|---|
World | `engineId`, `status`. Purpose: Represents an entire simulated instance of a town. It is the top-level container for everything else. | Has many `Players`. Has many `Conversations`. |
Player | `identity`, `position`, `path`, `status`. Purpose: Represents any character, human or AI, that exists within the active world. This is the core “physical” entity that moves and acts. | Belongs to one `World`. Can participate in many `Conversations`. (If AI) Has many `Memories`. |
Conversation | `participants`, `state`, `transcript`. Purpose: A temporary entity that exists while two or more players are talking. It links them together and stores their dialogue. | Belongs to one `World`. Has many `Players` (as participants). |
Memory (Agent-specific) | `summary`, `poignancy`, `embedding`, `lastAccess`. Purpose: The fundamental unit of an AI agent’s long-term knowledge and identity. It is a rich, multi-faceted record of a past experience. | Belongs to one AI `Player`. |
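The models above can be sketched as TypeScript types. The attribute names follow the table; the id fields (`worldId`, `playerId`) are assumptions about how the “belongs to” relationships would be expressed as foreign keys, and are not taken from the actual schema.

```typescript
// Illustrative types for the data models; id/foreign-key fields are assumed.
type WorldId = string;
type PlayerId = string;

interface World {
  id: WorldId;
  engineId: string;
  status: "running" | "stopped";
}

interface Player {
  id: PlayerId;
  worldId: WorldId; // belongs to one World
  identity: string;
  position: { x: number; y: number };
  path: { x: number; y: number }[];
  status: string;
}

interface Conversation {
  worldId: WorldId;          // belongs to one World
  participants: PlayerId[];  // has many Players
  state: "active" | "ended";
  transcript: string[];
}

interface Memory {
  playerId: PlayerId; // belongs to one AI Player
  summary: string;
  poignancy: number;
  embedding: number[];
  lastAccess: number;
}

const world: World = { id: "w1", engineId: "e1", status: "running" };
const alice: Player = {
  id: "p1", worldId: world.id, identity: "Alice",
  position: { x: 0, y: 0 }, path: [], status: "idle",
};
```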
3.3 The Big Picture: Entity Relationship Diagram
This diagram illustrates how all the key data models connect to form a cohesive system.
How to Read This Diagram:
- The `WORLD` is the main container for everything.
- A `PLAYER` and a `CONVERSATION` cannot exist without a `WORLD`.
- The line between `PLAYER` and `CONVERSATION` shows that a player can be in many conversations over time, and a conversation has multiple players.
- Crucially, the `PLAYER` has a special relationship with `MEMORY`. A memory can only belong to a single AI player, forming the basis of that agent’s unique mind. This is the foundation of their personality and learned behavior.