Architecture
A deep dive into the internal architecture of the M3 Forge Agent Server: how requests flow through the system, how agents are configured, and how the persistence, scheduling, and service discovery layers work together.
System Overview
The Agent Server is built around a PostgreSQL-centric architecture where the database serves as the coordination backbone for task scheduling, state persistence, and checkpoint storage. External clients connect through stateless gateway servers, which route work into a PostgreSQL-backed task queue consumed by horizontally scalable worker processes.
Request flow:
- A client sends a request (create a run, invoke an assistant) to the Gateway over HTTP or gRPC.
- The Gateway authenticates the request, validates the payload, and inserts a job into the PostgreSQL task scheduler.
- The scheduler assigns the job to an available worker based on slot capacity and concurrency limits.
- The worker’s Coordinator loads the agent blueprint, resolves the execution DAG, and begins stepping through executors.
- At each step, the Coordinator writes checkpoints to PostgreSQL, caches intermediate results in Redis, and uploads large artifacts (documents, images) to S3.
- Once the run completes, the final result is persisted and the client is notified via polling or server-sent events.
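The gateway's enqueue step (steps 1–3 above) can be sketched as a single transactional insert. This is a minimal illustration, not the server's actual code: sqlite3 stands in for PostgreSQL, and the table and column names are assumptions.

```python
import json
import sqlite3
import uuid

# sqlite3 stands in for PostgreSQL here; the schema is illustrative.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE jobs (
    id TEXT PRIMARY KEY,
    payload TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'
)""")

def enqueue_run(payload: dict) -> str:
    """Validate the request, then insert a job in one transaction."""
    if "assistant_id" not in payload:
        raise ValueError("missing assistant_id")
    job_id = str(uuid.uuid4())
    with db:  # the job is either fully enqueued or not at all
        db.execute(
            "INSERT INTO jobs (id, payload, status) VALUES (?, ?, 'queued')",
            (job_id, json.dumps(payload)),
        )
    return job_id

job_id = enqueue_run({"assistant_id": "asst_123", "input": "Review this document"})
status = db.execute("SELECT status FROM jobs WHERE id = ?", (job_id,)).fetchone()[0]
print(status)  # queued
```

The key property mirrored here is that the gateway itself holds no state: after the insert commits, any worker can pick the job up.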
Agent Blueprints
Agents are defined declaratively through flow YAML configs that describe a directed acyclic graph (DAG) of executors. These configs are loaded at server startup and used by the Coordinator to determine how a run should be executed.
Each flow config specifies:
- Executors and their dependencies (the DAG edges)
- Tool bindings available to each executor
- Routing policy governing how messages flow between agents
- Resource limits (timeouts, max tokens, memory caps)
```yaml
# config/flows/document-review.yaml
name: document-review
description: Multi-step document analysis and review pipeline
executors:
  - id: classifier
    type: llm
    model: claude-sonnet-4-5-20250929
    tools: [classify_document]
    output_to: [router]
  - id: router
    type: conditional
    routes:
      - condition: "output.category == 'legal'"
        target: legal_reviewer
      - condition: "output.category == 'financial'"
        target: financial_reviewer
      - default: general_reviewer
  - id: legal_reviewer
    type: llm
    model: claude-opus-4-6
    tools: [legal_db_search, citation_check]
    output_to: [aggregator]
  - id: financial_reviewer
    type: llm
    model: claude-opus-4-6
    tools: [financial_db_search, compliance_check]
    output_to: [aggregator]
  - id: general_reviewer
    type: llm
    model: claude-sonnet-4-5-20250929
    output_to: [aggregator]
  - id: aggregator
    type: merge
    strategy: concatenate
routing_policy: message_driven
timeout: 300s
```

Agent configs use a factory pattern for dynamic per-run configuration. When a run is created, the server clones the base blueprint and applies any run-level overrides (model, tools, parameters) before execution begins. This allows the same agent definition to serve multiple use cases without duplicating configuration.
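The clone-and-override step can be sketched as follows. The field names mirror the flow config above, but the function name and dict shape are illustrative assumptions, not the server's actual API.

```python
from copy import deepcopy

# Base blueprint as loaded from the flow YAML (abbreviated).
BASE_BLUEPRINT = {
    "name": "document-review",
    "executors": [
        {"id": "classifier", "type": "llm",
         "model": "claude-sonnet-4-5-20250929", "tools": ["classify_document"]},
    ],
}

def build_run_config(blueprint: dict, overrides: dict) -> dict:
    """Clone the base blueprint and apply per-run executor overrides."""
    config = deepcopy(blueprint)  # never mutate the shared base definition
    for executor in config["executors"]:
        executor.update(overrides.get(executor["id"], {}))
    return config

run_cfg = build_run_config(BASE_BLUEPRINT, {"classifier": {"model": "claude-opus-4-6"}})
print(run_cfg["executors"][0]["model"])         # claude-opus-4-6
print(BASE_BLUEPRINT["executors"][0]["model"])  # base blueprint is unchanged
```

The deep copy is the important detail: overrides apply to a per-run clone, so concurrent runs of the same blueprint never see each other's settings.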
Persistence Layer
The Agent Server organizes persistent data into three categories, each optimized for its access pattern.
Core Resources
Assistants, sessions, and runs are stored as normalized rows in PostgreSQL. These are the primary API resources that clients interact with and form the relational backbone of the system.
| Resource | Table | Key Relationships |
|---|---|---|
| Assistants | assistants | Has many sessions, references flow config |
| Sessions | sessions | Belongs to assistant, has many runs |
| Runs | runs | Belongs to session, has many checkpoints |
| Messages | messages | Belongs to run, ordered by sequence |
Checkpoints
Execution state is persisted as checkpoints in the workflow_checkpoints table via the PostgresCheckpointStore. Each checkpoint captures the full serialized state of a Coordinator at a given step, enabling:
- Resume after failure — restart a run from the last successful checkpoint instead of from scratch
- Time-travel debugging — inspect the exact state at any point during execution
- Branching — fork a run from a historical checkpoint with different parameters
```sql
CREATE TABLE workflow_checkpoints (
    id UUID PRIMARY KEY,
    run_id UUID NOT NULL REFERENCES runs(id),
    step_index INTEGER NOT NULL,
    state JSONB NOT NULL,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_checkpoints_run_updated
    ON workflow_checkpoints (run_id, updated_at DESC);
```

Checkpoints are stored as JSONB and indexed by updated_at for efficient retrieval of the latest state. The JSONB format allows the Coordinator to serialize arbitrary executor state without schema migrations when executor types evolve.
Long-Term Memory
For agents that need to recall information across sessions, the Agent Server integrates with marie-mem0, a memory layer backed by pgvector. This enables semantic search over past interactions:
- Embedding-based retrieval of relevant context from prior runs
- Workspace-scoped memory isolation (agents in different workspaces never share memory)
- Configurable retention policies to manage storage growth
Memory is distinct from checkpoints: checkpoints capture execution state within a single run, while memory persists knowledge across runs and sessions.
Task Queue
The Agent Server uses a PostgreSQL-backed task scheduler rather than a dedicated message broker for its primary job queue. This is a deliberate architectural choice.
Why PostgreSQL Over Redis?
- Transactional guarantees — Job insertion, status updates, and checkpoint writes participate in the same database transaction. A run is never marked as “started” unless its checkpoint is also persisted.
- Single persistence layer — No data synchronization between a separate queue (Redis) and the state database (PostgreSQL). Fewer moving parts means fewer failure modes.
- ACID compliance — Jobs are never lost, duplicated, or processed out of order, even during worker crashes or network partitions.
- Queryable queue — Operators can inspect, filter, and analyze the job queue using standard SQL, without specialized tooling.
Slot Capacity Manager
The scheduler enforces concurrency limits through a slot capacity manager. Each worker process registers its available slots (based on CPU, memory, and GPU resources), and the scheduler only assigns jobs when a slot is free.
```text
Worker A: 4 slots total, 3 occupied → 1 available
Worker B: 8 slots total, 2 occupied → 6 available
Worker C: 4 slots total, 4 occupied → 0 available (no new jobs assigned)
```

Semaphore Store
For operations requiring distributed mutual exclusion (such as preventing two workers from processing the same session concurrently), the scheduler provides a semaphore store backed by PostgreSQL advisory locks. This avoids the need for an external distributed lock service.
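PostgreSQL advisory locks are keyed by a 64-bit integer, so a semaphore store must map each session ID to a stable lock key. The helper below is a sketch of one common scheme (hash the ID, take 8 bytes); the actual hashing scheme used by the server is not specified here and this is an assumption.

```python
import hashlib

def advisory_lock_key(session_id: str) -> int:
    """Derive a stable signed 64-bit key suitable for pg_try_advisory_lock(key)."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# A worker would then run, inside its session's transaction:
#   SELECT pg_try_advisory_lock(%s)  -- returns true iff the lock was acquired
key = advisory_lock_key("session-42")
print(advisory_lock_key("session-42") == key)  # stable across workers: True
```

Stability is the point: every worker derives the same key for the same session, so PostgreSQL itself arbitrates which one wins the lock.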
RabbitMQ for Async Messaging
While PostgreSQL handles the primary job queue, RabbitMQ is used for asynchronous message passing between Coordinators. This covers scenarios where:
- One agent needs to notify another agent in a different worker process
- Event-driven routing policies require pub/sub semantics
- Progress updates need to be broadcast to multiple subscribers (dashboards, SSE clients)
RabbitMQ complements rather than replaces the PostgreSQL scheduler. The scheduler owns job lifecycle; RabbitMQ handles inter-process communication.
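A broadcast of a progress event might look like the sketch below. The exchange name and envelope fields are illustrative assumptions; the `pika` calls are that library's standard API, and publishing is wrapped in a function so the sketch loads cleanly without a broker running.

```python
import json

def make_envelope(run_id: str, event: str, payload: dict) -> bytes:
    """Serialize an inter-Coordinator message for publishing."""
    return json.dumps({"run_id": run_id, "event": event, "payload": payload}).encode()

def publish_progress(run_id: str, event: str, payload: dict) -> None:
    import pika  # third-party; requires a reachable RabbitMQ broker
    conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = conn.channel()
    # Fanout delivers the event to every subscriber (dashboards, SSE clients).
    channel.exchange_declare(exchange="run-progress", exchange_type="fanout")
    channel.basic_publish(exchange="run-progress", routing_key="",
                          body=make_envelope(run_id, event, payload))
    conn.close()

body = make_envelope("run-1", "step_completed", {"step": 2})
print(json.loads(body)["event"])  # step_completed
```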
Container Architecture
The Agent Server runs as two types of stateless containers, both horizontally scalable behind a load balancer.
Gateway Servers
Gateway servers expose the Agent Server API to external clients:
- REST API on port `51000` for standard HTTP clients
- gRPC API on port `52000` for high-throughput internal services
- Stateless request handling: authenticate, validate, enqueue, respond
- No local state — any gateway can serve any request
- Health check endpoint for load balancer integration
Queue Workers
Queue workers consume jobs from the PostgreSQL task queue and execute them:
- Each worker runs one or more Coordinators that manage agent execution
- Workers register their slot capacity with the scheduler on startup
- Executors run inside the worker process, invoking tools and LLM providers
- Workers deregister gracefully on shutdown, allowing in-flight jobs to complete or be reassigned
Both container types share the same codebase and configuration. The role (gateway vs. worker) is determined by a startup flag, enabling a single container image for simplified deployment.
Routing and Coordination
The Coordinator uses a RoutingPolicy to determine how messages flow between executors within a run. Two implementations are provided.
MessageDrivenRoutingPolicy (Default)
The default routing policy uses event-driven communication between agents. Each executor publishes messages to a shared bus, and downstream executors subscribe to the message types they care about.
This enables:
- Dynamic routing — executors can inspect message content and decide where to send output at runtime
- Multi-cast — a single executor can broadcast to multiple downstream consumers
- Communication graph tracking — all inter-agent messages are recorded, enabling visualization of the actual execution path after a run completes
The communication graph is stored alongside the run and can be rendered as a directed graph in the M3 Forge dashboard, showing which agents communicated, what messages were exchanged, and the order of interactions.
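A minimal in-process sketch of the message-driven idea: executors subscribe to message types on a shared bus, and every delivery is recorded so the communication graph can be rendered afterwards. Class and method names here are illustrative, not the server's actual API.

```python
from collections import defaultdict

class MessageBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # message type -> (executor_id, handler)
        self.communication_graph = []         # (sender, receiver, type) edges

    def subscribe(self, msg_type, executor_id, handler):
        self.subscribers[msg_type].append((executor_id, handler))

    def publish(self, sender, msg_type, payload):
        # Multi-cast: every subscriber to this type receives the message,
        # and each delivery becomes an edge in the communication graph.
        for receiver, handler in self.subscribers[msg_type]:
            self.communication_graph.append((sender, receiver, msg_type))
            handler(payload)

bus = MessageBus()
received = []
bus.subscribe("document.classified", "legal_reviewer", received.append)
bus.subscribe("document.classified", "financial_reviewer", received.append)
bus.publish("classifier", "document.classified", {"category": "legal"})
print(len(received))               # 2 (multi-cast to both subscribers)
print(bus.communication_graph[0])  # ('classifier', 'legal_reviewer', 'document.classified')
```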
SequentialRoutingPolicy
A simpler alternative that executes agents in a fixed linear chain:
```text
Executor A → Executor B → Executor C → Done
```

Each executor receives the full output of the previous one. There is no branching, no dynamic routing, and no multi-cast. This policy is appropriate for straightforward pipelines where the execution order is known at configuration time.
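The sequential policy reduces to a fold over the executor chain. In this sketch plain callables stand in for real executors:

```python
def run_sequential(executors, initial_input):
    """Fixed linear chain: each executor receives the previous one's full output."""
    output = initial_input
    for executor in executors:  # no branching, no dynamic routing
        output = executor(output)
    return output

result = run_sequential(
    [str.strip, str.lower, lambda s: s.split()],
    "  Review THIS Document  ",
)
print(result)  # ['review', 'this', 'document']
```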
Choosing a Policy
| Policy | Use When |
|---|---|
| `message_driven` | Complex multi-agent workflows, conditional branching, dynamic routing |
| `sequential` | Simple linear pipelines, predictable execution order |
Set the policy in the flow config:
```yaml
routing_policy: message_driven  # or sequential
```

Service Discovery
The Agent Server uses Consul for service discovery, enabling dynamic registration and routing of executors across the cluster.
Registration
When a worker starts, it registers its executors with the Consul catalog:
- Service name — the executor type (e.g., `llm-executor`, `tool-executor`)
- Address and port — where the executor can be reached
- Tags — metadata such as supported models, available tools, GPU availability
- Health checks — HTTP and TCP checks to verify executor liveness
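The registration call a worker makes against Consul's HTTP API (`PUT /v1/agent/service/register`) might look like the sketch below. The payload shape follows Consul's documented API; the service name, tags, and helper function are illustrative, not the server's actual code.

```python
def registration_payload(executor_type: str, worker_id: str,
                         address: str, port: int, tags: list[str]) -> dict:
    """Build a Consul service-registration body for one executor."""
    return {
        "Name": executor_type,
        "ID": f"{executor_type}-{worker_id}",  # unique per worker instance
        "Address": address,
        "Port": port,
        "Tags": tags,
        "Check": {
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
            # Remove stale entries if the check stays critical (see Deregistration).
            "DeregisterCriticalServiceAfter": "30s",
        },
    }

payload = registration_payload("llm-executor", "worker-a", "10.0.0.5", 9400, ["gpu"])
# A worker would then send it to the local Consul agent:
#   requests.put("http://localhost:8500/v1/agent/service/register", json=payload)
print(payload["ID"])  # llm-executor-worker-a
```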
Health Checks
Consul runs two types of health checks against each registered executor:
- HTTP health check — `GET /health` returns `200 OK` if the executor is ready to accept work
- TCP health check — verifies the executor port is open and accepting connections
Failed health checks automatically deregister the executor from the service catalog, preventing the scheduler from assigning it new work.
Service Catalog
The Consul service catalog at port 8500 provides a real-time view of all available executors. The scheduler queries this catalog when assigning jobs to determine:
- Which executor types are available
- Which workers have capacity
- Geographic proximity for latency-sensitive routing
Consul also supports key-value storage for distributed configuration. The Agent Server uses this to propagate runtime feature flags and configuration updates without restarting workers.
Deregistration
When a worker shuts down gracefully, it deregisters its executors from Consul. If a worker crashes without deregistering, the health checks will fail and Consul automatically removes the stale entries after the configured timeout (default: 30 seconds).
Skill System
Skills extend agent capabilities with reusable, discoverable tool packages. The Agent Server provides a skill management system that handles discovery, registration, and routing.
SkillRegistry
The SkillRegistry discovers skills from two locations:
- System skills — shipped with the Agent Server in `config/skills/`
- Workspace skills — user-defined skills in the workspace directory at `.marie/skills/`
On startup, the registry scans both directories, parses each skill’s SKILL.md specification, and builds an in-memory index of available skills.
SKILL.md Specification
Each skill is defined by a SKILL.md file in its directory. This file serves as both documentation and configuration:
```markdown
# Extract Tables

Extract structured table data from documents.

## Triggers
- /extract-tables
- /table-extract

## Tools
- pdf_reader
- table_parser
- csv_formatter

## Parameters
- format: Output format (json, csv, markdown). Default: json
- pages: Page range to process. Default: all
```

The registry parses the frontmatter and content to build the skill's metadata, including its trigger commands, required tools, and configurable parameters.
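A parser for this file format might look like the sketch below. It handles the sections shown above (Triggers, Tools); the function name and output shape are assumptions about the real registry's parser.

```python
def parse_skill_md(text: str) -> dict:
    """Build skill metadata from a SKILL.md document."""
    skill = {"name": None, "triggers": [], "tools": []}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# ") and skill["name"] is None:
            skill["name"] = line[2:]          # top-level heading is the skill name
        elif line.startswith("## "):
            section = line[3:].lower()        # track which section we are in
        elif line.startswith("- ") and section in ("triggers", "tools"):
            skill[section].append(line[2:])
    return skill

skill = parse_skill_md("""# Extract Tables
Extract structured table data from documents.
## Triggers
- /extract-tables
## Tools
- pdf_reader
- table_parser
""")
print(skill["triggers"])  # ['/extract-tables']
```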
SkillRouter
The SkillRouter resolves slash commands to skills. When an agent receives a message containing a slash command (e.g., /extract-tables), the router:
- Looks up the command in the skill index
- Loads the skill’s tool bindings
- Injects the tools into the executor’s context for the duration of the command
- Returns the tools to the pool after execution
Auto-Discovery and Hot-Reload
The SkillRegistry watches the skill directories for changes using filesystem events. When a new skill is added or an existing skill is modified:
- The registry re-parses the affected `SKILL.md`
- The skill index is updated in place
- No server restart is required
This enables operators to deploy new skills to running workers by dropping a directory into the workspace skills path.
Hot-reloaded skills take effect on the next run. Skills bound to an in-progress run continue using the version that was loaded when the run started.
Next Steps
- Scaling — horizontal scaling strategies and capacity planning
- Assistants — creating and configuring assistants
- Runs — executing and monitoring agent runs