
AI Town: Product Requirements Document (PRD)

Part One: Product Blueprint

1.1 Product Vision & Core Value

Vision: To create a living, interactive digital world where AI characters possess their own lives, memories, and relationships. It serves as both a captivating “reality show” of AI and a powerful, open-source foundation for developers and creators to build the next generation of social simulations.

Core Value:

  • For Observers: Experience a unique form of entertainment by watching the emergent stories and social dynamics of an autonomous AI society.
  • For Participants: Step inside the simulation to interact with, influence, and become part of the living world’s evolving narrative.
  • For Builders: Provide a comprehensive, well-architected starter kit to understand and replicate the principles of “Generative Agents,” accelerating the creation of novel AI-powered experiences.

1.2 Product Overview

AI Town is a product with a dual-mode experience. By default, it is a passive, lean-back Spectator Experience, allowing anyone to observe the AI simulation without commitment. Through a clear call-to-action, users can transition into an active, lean-in Participant’s Journey, where they become an integral part of the town’s social fabric.

The magic of this world is powered by a third foundational pillar: The Living AI Agents. This is the cognitive engine that gives each AI character a unique personality, persistent memory, and the ability to behave autonomously, creating the rich, unpredictable narrative that is the core of the product. These three pillars, or “Epics,” form the complete product scope.

1.3 Feature Summary

This table provides an at-a-glance summary of the core features, which are detailed further in Part Two of this document.

Epic / Key Features (User Story)

1. The Spectator Experience

1.1 Explore the Town & Its Inhabitants:
    • Unauthenticated, read-only access to the live world.
    • Smooth pan-and-zoom map navigation.
    • Character selection to view profiles and real-time conversation transcripts.
2. The Participant’s Journey

2.1 Join the World:
    • Authentication-gated entry.
    • World capacity validation (MAX_HUMAN_PLAYERS).
    • Player character spawning at a valid, unoccupied map position.

2.2 Move Around the Town:
    • Click-to-move navigation with instant visual feedback.
    • A* pathfinding with dynamic obstacle avoidance.

2.3 Converse with AI Agents:
    • Proximity-based conversation initiation.
    • Real-time chat interface with typing indicators.
    • Full conversation history persistence.
3. The Living AI Agents

3.1 Persistent Memory System:
    • Post-conversation automated memory creation (LLM-based summarization & poignancy scoring).
    • Vector embedding for semantic search.

3.2 Intelligent Memory Retrieval (RAG):
    • Contextual memory search based on relevance, importance, and recency.

3.3 Self-Reflection & Insight Generation:
    • Periodic, poignancy-triggered reflection to create high-level insights from recent memories.

1.4 High-Level User Flow


Part Two: Detailed Specifications by Epic

Epic 1: The Spectator Experience

This epic details the experience of a user who is observing the world without being an active participant.

User Story 1.1: As a curious visitor, I want to explore the town and its inhabitants so that I can understand how the world works and what stories are unfolding.

  • 1.1.1 User Interaction Flow

    1. Upon loading the application, the user is immediately presented with a view of the game world.
    2. The user can click and drag the mouse on the map to pan the camera in any direction.
    3. The user can use the mouse scroll wheel to zoom in and out, allowing for both a high-level overview and close-up inspection.
    4. The user can move their mouse over any character (AI or human) to see their name displayed as a tooltip.
    5. The user can click on any character to select them. Upon selection, the UI’s right-hand details panel updates to show information about that character.
  • 1.1.2 Business Logic & Acceptance Criteria

    • [Must] The application must load into a read-only spectator mode by default, without requiring any user authentication.
    • [Must] The map panning must be constrained so the user cannot move the camera infinitely far from the world’s boundaries.
    • [Must] Selecting a character must populate the details panel with their name, their personality description (e.g., “a friendly baker”), and their current high-level activity (e.g., “walking to the cafe”).
    • [Must] The details panel must also display the full transcript of the character’s current conversation if they are in one, updating in real-time. If they are not in a conversation, it should show the transcript of their most recently completed conversation.
  • 1.1.3 Backend Process Flow

    1. The frontend client establishes a real-time subscription to the backend for the public state of the specified world.
    2. The backend continuously streams updates, including the positions of all players and the state of all ongoing conversations.
    3. No user-specific data is processed; the backend serves the same public data to all spectators.
  • 1.1.4 Frontend Experience & Feedback

    • Panning & Zooming: Movement should feel smooth and responsive, with kinetic scrolling (a slight coasting effect after dragging and releasing) to feel natural.
    • Selection: The selected character on the map should have a subtle visual highlight (e.g., a soft, static white outline or circle) to indicate they are the subject of the details panel.
    • Details Panel: The panel should transition smoothly when a new character is selected, not abruptly flash or reload. Text content should be clearly legible against the panel’s dark background.

Epic 2: The Participant’s Journey

This epic covers the full interactive experience for an authenticated user.

User Story 2.1: As a new user, I want to join the world so that I can become an active participant.

  • 2.1.1 User Interaction Flow

    1. User clicks the “Interact” button in the footer. If not authenticated, a login modal (Clerk UI) appears.
    2. After successful authentication, the system begins the process of adding the user to the world. The “Interact” button should show a loading state.
    3. Upon success, a new character representing the user appears on the map. This character has a unique visual indicator.
    4. The “Interact” button in the footer changes to a “Leave” button.
  • 2.1.2 Business Logic & Acceptance Criteria

    • [Must] The joinWorld logic must only be triggerable by an authenticated user.
    • [Must] The backend must check if the world’s human player count has reached the MAX_HUMAN_PLAYERS limit. If so, it must return a “world full” error.
    • [Must] If the world is full, the frontend must display a user-friendly toast notification (e.g., “This town is bustling! Try joining later.”) and revert the “Interact” button from its loading state.
    • [Must] If the world has space, the backend must find a valid, unoccupied starting position for the player.
    • [Must] The new player entity created in the database must be assigned a random personality description from a predefined list.
  • 2.1.3 Backend Process Flow

    1. Receives joinWorld request with user’s auth token.
    2. Queries the database to count players where player.human is true.
    3. Compares count to MAX_HUMAN_PLAYERS constant. If at limit, throws an error.
    4. Calls findUnoccupiedPosition to get a safe spawn coordinate.
    5. Imports the list of possible character descriptions and selects one at random.
    6. Inserts a new players document with the user’s ID, spawn coordinates, and random description.
    7. The database’s real-time infrastructure pushes the new player data to all connected clients.
  • 2.1.4 Frontend Experience & Feedback

    • Loading State: While the backend is processing the join request, the “Interact” button should be disabled and its text replaced with a subtle loading spinner to provide feedback.
    • Character Spawn: The user’s character should appear on the map with a gentle fade-in animation and a brief, sparkling particle effect to feel magical.
    • Player Indicator: The user’s character must have a constant, slowly pulsing cyan indicator underneath it to be easily identifiable at all times.
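The capacity check and spawn search in 2.1.3 can be sketched as plain functions. The names joinWorld, findUnoccupiedPosition, and MAX_HUMAN_PLAYERS follow the PRD; the data shapes, the row-by-row scan for a free tile, and the capacity value of 8 are illustrative assumptions, not the actual Convex schema or implementation.

```typescript
const MAX_HUMAN_PLAYERS = 8; // tunable world capacity (assumed value)

type Point = { x: number; y: number };
type Player = { human: boolean; position: Point };

// Scan the map for the first tile not occupied by any existing player.
function findUnoccupiedPosition(
  width: number,
  height: number,
  players: Player[],
): Point | null {
  const occupied = new Set(players.map((p) => `${p.position.x},${p.position.y}`));
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (!occupied.has(`${x},${y}`)) return { x, y };
    }
  }
  return null; // no free tile anywhere
}

function joinWorld(players: Player[], width: number, height: number): Player {
  // Enforce the human-player cap before creating anything.
  const humanCount = players.filter((p) => p.human).length;
  if (humanCount >= MAX_HUMAN_PLAYERS) {
    throw new Error("world full"); // surfaced to the client as a friendly toast
  }
  const spawn = findUnoccupiedPosition(width, height, players);
  if (!spawn) throw new Error("no unoccupied tile");
  return { human: true, position: spawn };
}
```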

User Story 2.2: As a participant, I want to move around the town so that I can explore and approach other characters.

  • 2.2.1 User Interaction Flow

    1. The user clicks on any walkable tile on the map.
    2. A visual indicator (a temporary, animated circle) instantly appears at the clicked destination.
    3. The user’s character turns to face the destination and begins walking along an intelligent path.
    4. The character automatically navigates around obstacles and other characters.
    5. The character stops upon reaching the destination, and the destination indicator disappears.
  • 2.2.2 Business Logic & Acceptance Criteria

    • [Must] The system must only accept movement commands for the user’s own authenticated character.
    • [Must] Clicks on non-walkable terrain (e.g., buildings, water) or outside the map boundaries must be ignored.
    • [Must] The backend must calculate the most efficient path using the A* algorithm.
    • [Must] The character’s movement speed must be constant and defined by the system.
    • [Should] If the destination becomes blocked mid-journey (e.g., by another character), the character should pause and attempt to recalculate the path after a short delay.
  • 2.2.3 Backend Process Flow

    1. Frontend sends a moveTo command with destination coordinates.
    2. Backend receives the command and validates the user’s identity.
    3. It calls the findRoute function, which implements the A* algorithm. The algorithm treats tiles occupied by other characters or map objects as non-walkable.
    4. The resulting path (an array of coordinates) is saved to the user’s player document in the database.
    5. The backend’s tick loop continuously updates the player’s (x, y) position along this path on every game tick, ensuring smooth movement.
  • 2.2.4 Frontend Experience & Feedback

    • Instant Feedback: The destination indicator must appear instantly on click, even before the path is calculated, to make the UI feel responsive. The indicator should have a subtle “ping” animation (a ripple effect).
    • Movement: Character animation should be smooth. When changing direction, the character sprite should smoothly transition to the correct orientation.
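The findRoute step in 2.2.3 calls for A* over the tile grid, treating occupied tiles as non-walkable. A minimal sketch, assuming 4-directional movement with uniform step cost and a Manhattan-distance heuristic (the project’s actual movement model and cost function may differ):

```typescript
type Pt = { x: number; y: number };

// A* over a boolean grid: blocked[y][x] === true means non-walkable.
// Returns the tile path from start to goal, or null if unreachable.
function findRoute(blocked: boolean[][], start: Pt, goal: Pt): Pt[] | null {
  const h = (p: Pt) => Math.abs(p.x - goal.x) + Math.abs(p.y - goal.y);
  const key = (p: Pt) => `${p.x},${p.y}`;
  const open: Pt[] = [start];
  const cameFrom = new Map<string, Pt>();
  const g = new Map<string, number>([[key(start), 0]]);
  const dirs: [number, number][] = [[1, 0], [-1, 0], [0, 1], [0, -1]];

  while (open.length > 0) {
    // Pop the node with the lowest f = g + h (a linear scan keeps the sketch short).
    let best = 0;
    for (let i = 1; i < open.length; i++) {
      const fi = g.get(key(open[i]))! + h(open[i]);
      const fb = g.get(key(open[best]))! + h(open[best]);
      if (fi < fb) best = i;
    }
    const cur = open.splice(best, 1)[0];
    if (cur.x === goal.x && cur.y === goal.y) {
      // Reconstruct the path by walking the cameFrom chain backward.
      const path = [cur];
      let k = key(cur);
      while (cameFrom.has(k)) {
        path.unshift(cameFrom.get(k)!);
        k = key(path[0]);
      }
      return path;
    }
    for (const [dx, dy] of dirs) {
      const nxt = { x: cur.x + dx, y: cur.y + dy };
      if (
        nxt.y < 0 || nxt.y >= blocked.length ||
        nxt.x < 0 || nxt.x >= blocked[0].length ||
        blocked[nxt.y][nxt.x]
      ) continue;
      const tentative = g.get(key(cur))! + 1;
      if (tentative < (g.get(key(nxt)) ?? Infinity)) {
        cameFrom.set(key(nxt), cur);
        g.set(key(nxt), tentative);
        open.push(nxt);
      }
    }
  }
  return null; // destination unreachable
}
```

The resulting array of coordinates is what the tick loop would consume, advancing the player’s (x, y) a fixed distance along the path each tick.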

User Story 2.3: As a participant, I want to start a conversation with an AI agent so that I can interact with them.

  • 2.3.1 User Interaction Flow

    1. User clicks on an AI agent to select them, opening the details panel.
    2. The user clicks the “Start Conversation” button in the panel.
    3. The user’s character automatically begins walking toward the target AI agent. The AI agent turns to face the user.
    4. Once the user’s character is within a specific proximity of the AI agent, the chat interface in the details panel becomes active.
    5. The user can now type a message and press Enter to send it.
  • 2.3.2 Business Logic & Acceptance Criteria

    • [Must] The “Start Conversation” button must only be visible if neither the user nor the target AI agent is already in a conversation.
    • [Must] The conversation is not formally started until both participants are within the CONVERSATION_DISTANCE threshold.
    • [Must] Once a conversation is active, both participants are marked as “busy” and cannot be invited to other conversations.
    • [Must] The full transcript of the conversation must be saved to the database.
  • 2.3.3 Backend Process Flow

    1. Frontend sends startConversation command with the target agent’s ID.
    2. Backend creates a new conversations document in the database, adding both the user and the agent as participants with an initial state of “invited.”
    3. The backend’s tick loop monitors the distance between the two participants.
    4. When the distance is less than CONVERSATION_DISTANCE, it updates the state of both participants in the conversation document to “participating.”
    5. When the frontend sends a writeMessage command, the backend appends the new message to the conversation’s message log. This triggers the AI agent’s cognitive logic to generate a response.
  • 2.3.4 Frontend Experience & Feedback

    • Button State: After clicking “Start Conversation,” the button should become disabled and change text to “Walking over…” to clearly communicate the system state.
    • Chat UI: The chat input should become enabled only when the conversation officially begins. When the AI is “thinking” of a response, a “typing…” indicator should appear in the chat window.
    • Automatic Scrolling: The chat window must automatically scroll to the newest message as it appears.
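The proximity gate in 2.3.3 can be sketched as a pure function run by the tick loop: once both participants are within CONVERSATION_DISTANCE, the conversation flips from “invited” to “participating.” The constant’s value (1.5 tiles) and the data shapes here are illustrative assumptions.

```typescript
const CONVERSATION_DISTANCE = 1.5; // tiles; assumed value, tunable in the real system

type Pos = { x: number; y: number };
type Member = { playerId: string; status: "invited" | "participating" };
type Conversation = { members: Member[] };

function distance(a: Pos, b: Pos): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Called on every game tick while a conversation is pending: if the two
// participants are close enough, mark both as actively participating.
function tickConversation(
  convo: Conversation,
  positions: Record<string, Pos>,
): Conversation {
  const [a, b] = convo.members;
  const close =
    distance(positions[a.playerId], positions[b.playerId]) < CONVERSATION_DISTANCE;
  if (!close) return convo; // still walking over
  return {
    members: convo.members.map((m) => ({ ...m, status: "participating" as const })),
  };
}
```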

Epic 3: The Living AI Agents

This epic defines the core cognitive systems that make the AI agents appear intelligent and alive.

User Story 3.1: As a system, I want AI agents to have persistent memories so that their past experiences can inform their future actions.

  • 3.1.1 System Interaction Flow (Internal)

    1. An AI agent completes a conversation.
    2. The system automatically triggers a “memory creation” process.
    3. The system calls an LLM to summarize the conversation from the agent’s perspective.
    4. The system calls an LLM again to rate the emotional importance (poignancy) of the summary.
    5. The system generates a vector embedding of the summary.
    6. The summary, poignancy score, and embedding are saved as a single “memory” unit in the agent’s database.
  • 3.1.2 Business Logic & Acceptance Criteria

    • [Must] Memory creation must be triggered automatically for an agent after every conversation they participate in.
    • [Must] The summarization prompt must instruct the LLM to write in the first person.
    • [Must] The poignancy rating prompt must instruct the LLM to return only a single integer on a scale of 1-10.
    • [Must] Each memory must be stored with a lastAccess timestamp, which is initialized to the creation time.
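The poignancy prompt in 3.1.2 instructs the LLM to return a single integer from 1 to 10, but model replies are not guaranteed to be that clean. A defensive parser for that step, as a sketch (the clamping behavior and the neutral fallback of 5 are assumptions, not specified by the PRD):

```typescript
// Extract a 1-10 poignancy score from a raw LLM reply such as "8" or
// "I'd rate this an 8." Falls back to a neutral midpoint if no digits appear.
function parsePoignancy(raw: string): number {
  const match = raw.match(/\d+/); // first run of digits in the reply
  if (!match) return 5; // assumed fallback when the model returns no number
  const n = parseInt(match[0], 10);
  return Math.min(10, Math.max(1, n)); // clamp to the 1-10 scale
}
```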

User Story 3.2: As a system, I want AI agents to intelligently retrieve relevant memories when making decisions.

  • 3.2.1 System Interaction Flow (Internal)

    1. An agent needs to make a decision (e.g., what to say next).
    2. The system initiates a memory search with a query related to the current context (e.g., “thoughts about Maya”).
    3. It performs a vector search to find semantically similar memories.
    4. It then scores these candidate memories based on a weighted combination of relevance, importance (poignancy), and recency.
    5. The highest-scoring memories are selected and provided as context to the agent’s “brain” (LLM).
    6. The lastAccess timestamp of the retrieved memories is updated.
  • 3.2.2 Business Logic & Acceptance Criteria

    • [Must] The final memory score must be calculated as a sum of three normalized components: relevance (from vector search), importance (from stored poignancy), and recency.
    • [Must] The recency score must decay exponentially over time (e.g., using a decay factor of 0.99 per hour since last access).
    • [Must] Updating the lastAccess timestamp upon retrieval is critical, as this makes recently recalled memories more likely to be recalled again soon.
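The three-part score in 3.2.2 can be written out directly. Relevance is assumed to arrive from the vector search already normalized to [0, 1]; importance is the stored poignancy scaled to [0, 1]; recency uses the PRD’s example decay factor of 0.99 per hour since last access. Equal weighting of the three components is an assumption.

```typescript
type Memory = { poignancy: number; lastAccess: number }; // lastAccess in ms since epoch

// Combined retrieval score: relevance + importance + recency, each in [0, 1].
function memoryScore(mem: Memory, relevance: number, now: number): number {
  const importance = mem.poignancy / 10;
  const hoursSinceAccess = (now - mem.lastAccess) / (1000 * 60 * 60);
  const recency = Math.pow(0.99, hoursSinceAccess); // exponential decay per hour
  return relevance + importance + recency;
}
```

Because retrieval updates lastAccess, a memory recalled just now scores a full recency point again, which is what makes recently recalled memories likely to resurface.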

User Story 3.3: As a system, I want AI agents to reflect on their experiences to form higher-level insights.

  • 3.3.1 System Interaction Flow (Internal)

    1. The system periodically checks an agent’s recent memories.
    2. If the cumulative poignancy of recent memories surpasses a REFLECTION_THRESHOLD, a reflection is triggered.
    3. The text from the last 100 memories is sent to an LLM.
    4. The LLM is prompted to “derive high-level insights” from these memories.
    5. The insights returned by the LLM are then saved as new, highly poignant memories for the agent.
  • 3.3.2 Business Logic & Acceptance Criteria

    • [Must] The reflection check must be performed after new memories are added.
    • [Must] The REFLECTION_THRESHOLD is a tunable constant, set to a default of 500.
    • [Must] The insights generated by the LLM must be treated as regular memories, allowing them to be retrieved in future memory searches to influence behavior.
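The reflection trigger in 3.3 reduces to a threshold check after new memories land. A sketch, using the PRD’s default REFLECTION_THRESHOLD of 500; interpreting “recent memories” as those created since the last reflection is an assumption about the windowing:

```typescript
const REFLECTION_THRESHOLD = 500; // default per the PRD; tunable

type Mem = { poignancy: number; createdAt: number }; // createdAt in ms since epoch

// After new memories are added, sum the poignancy of memories created since
// the last reflection; if the total crosses the threshold, reflect.
function shouldReflect(memories: Mem[], lastReflectionAt: number): boolean {
  const total = memories
    .filter((m) => m.createdAt > lastReflectionAt)
    .reduce((sum, m) => sum + m.poignancy, 0);
  return total >= REFLECTION_THRESHOLD;
}
```

When this returns true, the system would gather the recent memories, prompt the LLM to derive high-level insights, and store those insights back as new, highly poignant memories.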

Functional Role / Code File / Description

  • Main UI Layout & Entrypoint (src/App.tsx): The root React component that assembles the main game screen, footer controls, and help modal. Manages global UI state.
  • Game World Renderer & Input (src/components/PixiGame.tsx): Manages the Pixi.js canvas, rendering the map and all characters. It is responsible for capturing all click-to-move inputs.
  • Character Details & Chat UI (src/components/PlayerDetails.tsx): The side panel UI. Displays selected character information and orchestrates the entire chat interface.
  • Backend Command Hook (src/hooks/sendInput.ts): A crucial custom hook that provides a standardized, asynchronous way to send any user command to the backend and await its result.
  • World Entry/Exit Backend Logic (convex/world.ts): Contains the backend mutations for handling user joinWorld and leaveWorld requests, including validation and state updates.
  • Core Memory System (convex/agent/memory.ts): Manages the entire lifecycle of an agent’s memory: creation, importance scoring, retrieval (RAG), and self-reflection.
  • Agent State Machine & Triggers (convex/aiTown/agent.ts): The agent’s “brain” and state machine. Its tick() method decides when to trigger high-level AI operations like thinking or remembering.
  • Conversation State Machine (convex/aiTown/conversation.ts): Manages the complete lifecycle of conversations on the backend, from initiation and inviting participants to concluding a conversation.