
Architecture

A deep dive into the internal architecture of the M3 Forge Agent Server: how requests flow through the system, how agents are configured, and how the persistence, scheduling, and service discovery layers work together.

System Overview

The Agent Server is built around a PostgreSQL-centric architecture where the database serves as the coordination backbone for task scheduling, state persistence, and checkpoint storage. External clients connect through stateless gateway servers, which route work into a PostgreSQL-backed task queue consumed by horizontally scalable worker processes.

Request flow:

  • A client sends a request (create a run, invoke an assistant) to the Gateway over HTTP or gRPC.
  • The Gateway authenticates the request, validates the payload, and inserts a job into the PostgreSQL task scheduler.
  • The scheduler assigns the job to an available worker based on slot capacity and concurrency limits.
  • The worker’s Coordinator loads the agent blueprint, resolves the execution DAG, and begins stepping through executors.
  • At each step, the Coordinator writes checkpoints to PostgreSQL, caches intermediate results in Redis, and uploads large artifacts (documents, images) to S3.
  • Once the run completes, the final result is persisted and the client is notified via polling or server-sent events.

Agent Blueprints

Agents are defined declaratively through flow YAML configs that describe a directed acyclic graph (DAG) of executors. These configs are loaded at server startup and used by the Coordinator to determine how a run should be executed.

Each flow config specifies:

  • Executors and their dependencies (the DAG edges)
  • Tool bindings available to each executor
  • Routing policy governing how messages flow between agents
  • Resource limits (timeouts, max tokens, memory caps)
```yaml
# config/flows/document-review.yaml
name: document-review
description: Multi-step document analysis and review pipeline

executors:
  - id: classifier
    type: llm
    model: claude-sonnet-4-5-20250929
    tools: [classify_document]
    output_to: [router]

  - id: router
    type: conditional
    routes:
      - condition: "output.category == 'legal'"
        target: legal_reviewer
      - condition: "output.category == 'financial'"
        target: financial_reviewer
      - default: general_reviewer

  - id: legal_reviewer
    type: llm
    model: claude-opus-4-6
    tools: [legal_db_search, citation_check]
    output_to: [aggregator]

  - id: financial_reviewer
    type: llm
    model: claude-opus-4-6
    tools: [financial_db_search, compliance_check]
    output_to: [aggregator]

  - id: general_reviewer
    type: llm
    model: claude-sonnet-4-5-20250929
    output_to: [aggregator]

  - id: aggregator
    type: merge
    strategy: concatenate

routing_policy: message_driven
timeout: 300s
```

Agent configs use a factory pattern for dynamic per-run configuration. When a run is created, the server clones the base blueprint and applies any run-level overrides (model, tools, parameters) before execution begins. This allows the same agent definition to serve multiple use cases without duplicating configuration.
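The clone-and-override flow can be sketched in a few lines. This is an illustrative stand-in, not the server's actual API: the names `AgentBlueprint` and `make_run_config` are invented for the example.

```python
from dataclasses import dataclass, field, replace

# Hypothetical sketch of the factory pattern described above:
# the base blueprint is immutable; each run gets a cloned copy
# with its run-level overrides applied.

@dataclass(frozen=True)
class AgentBlueprint:
    name: str
    model: str
    tools: tuple = ()
    params: dict = field(default_factory=dict)

def make_run_config(base: AgentBlueprint, **overrides) -> AgentBlueprint:
    """Clone the base blueprint and apply run-level overrides."""
    return replace(base, **overrides)

base = AgentBlueprint(name="document-review", model="claude-sonnet-4-5-20250929")
run_cfg = make_run_config(base, model="claude-opus-4-6", tools=("legal_db_search",))

assert base.model == "claude-sonnet-4-5-20250929"  # base blueprint is untouched
assert run_cfg.model == "claude-opus-4-6"          # override applied to the clone
```

Freezing the base blueprint makes the "same definition, many runs" property explicit: no run can mutate shared configuration.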

Persistence Layer

The Agent Server organizes persistent data into three categories, each optimized for its access pattern.

Core Resources

Assistants, sessions, and runs are stored as normalized rows in PostgreSQL. These are the primary API resources that clients interact with and form the relational backbone of the system.

| Resource   | Table        | Key Relationships                          |
|------------|--------------|--------------------------------------------|
| Assistants | assistants   | Has many sessions, references flow config  |
| Sessions   | sessions     | Belongs to assistant, has many runs        |
| Runs       | runs         | Belongs to session, has many checkpoints   |
| Messages   | messages     | Belongs to run, ordered by sequence        |

Checkpoints

Execution state is persisted as checkpoints in the workflow_checkpoints table via the PostgresCheckpointStore. Each checkpoint captures the full serialized state of a Coordinator at a given step, enabling:

  • Resume after failure — restart a run from the last successful checkpoint instead of from scratch
  • Time-travel debugging — inspect the exact state at any point during execution
  • Branching — fork a run from a historical checkpoint with different parameters
```sql
CREATE TABLE workflow_checkpoints (
    id          UUID PRIMARY KEY,
    run_id      UUID NOT NULL REFERENCES runs(id),
    step_index  INTEGER NOT NULL,
    state       JSONB NOT NULL,
    metadata    JSONB DEFAULT '{}',
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_checkpoints_run_updated
    ON workflow_checkpoints (run_id, updated_at DESC);
```

Checkpoints are stored as JSONB and indexed by updated_at for efficient retrieval of the latest state. The JSONB format allows the Coordinator to serialize arbitrary executor state without schema migrations when executor types evolve.
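The resume-after-failure pattern reduces to "fetch the newest checkpoint for a run." The in-memory stand-in below only illustrates that access pattern; the real PostgresCheckpointStore persists to the workflow_checkpoints table, and a monotonic sequence number stands in for updated_at ordering here.

```python
import itertools
import uuid

_seq = itertools.count()

class CheckpointStore:
    """Toy analog of PostgresCheckpointStore (illustration only)."""

    def __init__(self):
        self.rows = []  # each row mirrors a workflow_checkpoints record

    def save(self, run_id, step_index, state):
        self.rows.append({
            "id": str(uuid.uuid4()),
            "run_id": run_id,
            "step_index": step_index,
            "state": state,          # JSONB column in the real table
            "seq": next(_seq),       # stands in for updated_at
        })

    def latest(self, run_id):
        # Equivalent query:
        #   SELECT * FROM workflow_checkpoints
        #   WHERE run_id = %s ORDER BY updated_at DESC LIMIT 1
        rows = [r for r in self.rows if r["run_id"] == run_id]
        return max(rows, key=lambda r: r["seq"]) if rows else None

store = CheckpointStore()
store.save("run-1", 0, {"executor": "classifier", "output": None})
store.save("run-1", 1, {"executor": "router", "output": "legal"})

resume_from = store.latest("run-1")
assert resume_from["step_index"] == 1  # resume from the last step, not from scratch
```

The `(run_id, updated_at DESC)` index exists precisely so this "latest state" lookup stays an index scan rather than a sort.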

Long-Term Memory

For agents that need to recall information across sessions, the Agent Server integrates with marie-mem0, a memory layer backed by pgvector. This enables semantic search over past interactions:

  • Embedding-based retrieval of relevant context from prior runs
  • Workspace-scoped memory isolation (agents in different workspaces never share memory)
  • Configurable retention policies to manage storage growth

Memory is distinct from checkpoints: checkpoints capture execution state within a single run, while memory persists knowledge across runs and sessions.
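Embedding-based recall with workspace scoping can be shown with a toy example. The hand-written 2-D vectors below stand in for real model embeddings, and the filter-then-rank logic is a sketch of what pgvector does server-side, not the marie-mem0 API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two small vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Each row pairs a workspace id with an embedding (illustrative values).
memories = [
    {"workspace": "ws-1", "text": "invoice totals", "vec": (1.0, 0.0)},
    {"workspace": "ws-1", "text": "contract terms", "vec": (0.0, 1.0)},
    {"workspace": "ws-2", "text": "other tenant",   "vec": (1.0, 0.0)},
]

def recall(workspace, query_vec, k=1):
    # Workspace scoping: rows from other workspaces are never candidates,
    # which is the isolation guarantee described above.
    scoped = [m for m in memories if m["workspace"] == workspace]
    return sorted(scoped, key=lambda m: cosine(m["vec"], query_vec), reverse=True)[:k]

assert recall("ws-1", (0.9, 0.1))[0]["text"] == "invoice totals"
```

Scoping before ranking (rather than after) is what makes the isolation hard: a similar memory in another workspace can never leak into the top-k.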

Task Queue

The Agent Server uses a PostgreSQL-backed task scheduler rather than a dedicated message broker for its primary job queue. This is a deliberate architectural choice.

Why PostgreSQL Over Redis?

  • Transactional guarantees — Job insertion, status updates, and checkpoint writes participate in the same database transaction. A run is never marked as “started” unless its checkpoint is also persisted.
  • Single persistence layer — No data synchronization between a separate queue (Redis) and the state database (PostgreSQL). Fewer moving parts means fewer failure modes.
  • ACID compliance — Jobs are never lost, duplicated, or processed out of order, even during worker crashes or network partitions.
  • Queryable queue — Operators can inspect, filter, and analyze the job queue using standard SQL, without specialized tooling.
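The atomic claim the bullets describe can be modeled in memory. The document does not show the scheduler's actual SQL; the statement in the comment is the standard PostgreSQL queue-claim pattern (`FOR UPDATE SKIP LOCKED`) and is included as an assumption, with a threading lock playing the role of the row lock.

```python
import threading

# A job moves from 'queued' to 'started' exactly once. Against PostgreSQL
# this is typically one statement, roughly:
#   UPDATE jobs SET status = 'started', worker = %s
#   WHERE id = (SELECT id FROM jobs WHERE status = 'queued'
#               ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1)
#   RETURNING id;

_lock = threading.Lock()
jobs = [{"id": 1, "status": "queued"}, {"id": 2, "status": "queued"}]

def claim(worker):
    """Atomically claim the next queued job, or None if the queue is empty."""
    with _lock:  # stands in for the row lock
        for job in jobs:
            if job["status"] == "queued":
                job["status"] = "started"
                job["worker"] = worker
                return job["id"]
    return None

assert claim("worker-a") == 1
assert claim("worker-b") == 2
assert claim("worker-c") is None  # queue drained; no job claimed twice
```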

Slot Capacity Manager

The scheduler enforces concurrency limits through a slot capacity manager. Each worker process registers its available slots (based on CPU, memory, and GPU resources), and the scheduler only assigns jobs when a slot is free.

```
Worker A: 4 slots total, 3 occupied → 1 available
Worker B: 8 slots total, 2 occupied → 6 available
Worker C: 4 slots total, 4 occupied → 0 available (no new jobs assigned)
```
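In code, the slot decision looks like the sketch below, using the worker figures from the example. The "prefer the worker with the most free slots" strategy is an assumption for illustration; the document only states that jobs are assigned when a slot is free.

```python
# Illustrative slot-capacity bookkeeping (not the scheduler's real code).
workers = {
    "A": {"total": 4, "occupied": 3},
    "B": {"total": 8, "occupied": 2},
    "C": {"total": 4, "occupied": 4},
}

def available(w):
    return w["total"] - w["occupied"]

def pick_worker():
    """Assumed strategy: most free slots wins; None when the cluster is full."""
    candidates = [(name, available(w)) for name, w in workers.items()
                  if available(w) > 0]
    return max(candidates, key=lambda c: c[1])[0] if candidates else None

assert pick_worker() == "B"            # 6 free slots
assert available(workers["C"]) == 0    # C receives no new jobs
```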

Semaphore Store

For operations requiring distributed mutual exclusion (such as preventing two workers from processing the same session concurrently), the scheduler provides a semaphore store backed by PostgreSQL advisory locks. This avoids the need for an external distributed lock service.
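A minimal sketch of the semaphore idea: PostgreSQL advisory locks are keyed by a 64-bit integer, so a string resource name (like a session id) is typically hashed to a key first. The hashing scheme and function names below are illustrative; the real store would call `pg_try_advisory_lock(key)` instead of touching an in-memory set.

```python
import hashlib

held = set()  # stands in for locks held inside PostgreSQL

def lock_key(resource: str) -> int:
    """Map a resource name to a 64-bit advisory-lock key (assumed scheme)."""
    return int.from_bytes(hashlib.sha256(resource.encode()).digest()[:8], "big")

def try_acquire(resource: str) -> bool:
    # Real implementation: SELECT pg_try_advisory_lock(%s)
    key = lock_key(resource)
    if key in held:
        return False
    held.add(key)
    return True

def release(resource: str) -> None:
    # Real implementation: SELECT pg_advisory_unlock(%s)
    held.discard(lock_key(resource))

assert try_acquire("session-42") is True    # first worker wins the session
assert try_acquire("session-42") is False   # second worker is excluded
release("session-42")
assert try_acquire("session-42") is True    # lock is reusable after release
```

A nice property of advisory locks is that PostgreSQL releases session-level locks automatically when the holder's connection dies, so a crashed worker cannot strand a session forever.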

RabbitMQ for Async Messaging

While PostgreSQL handles the primary job queue, RabbitMQ is used for asynchronous message passing between Coordinators. This covers scenarios where:

  • One agent needs to notify another agent in a different worker process
  • Event-driven routing policies require pub/sub semantics
  • Progress updates need to be broadcast to multiple subscribers (dashboards, SSE clients)

RabbitMQ complements rather than replaces the PostgreSQL scheduler. The scheduler owns job lifecycle; RabbitMQ handles inter-process communication.

Container Architecture

The Agent Server runs as two types of stateless containers, both horizontally scalable behind a load balancer.

Gateway Servers

Gateway servers expose the Agent Server API to external clients:

  • REST API on port 51000 for standard HTTP clients
  • gRPC API on port 52000 for high-throughput internal services
  • Stateless request handling: authenticate, validate, enqueue, respond
  • No local state — any gateway can serve any request
  • Health check endpoint for load balancer integration

Queue Workers

Queue workers consume jobs from the PostgreSQL task queue and execute them:

  • Each worker runs one or more Coordinators that manage agent execution
  • Workers register their slot capacity with the scheduler on startup
  • Executors run inside the worker process, invoking tools and LLM providers
  • Workers deregister gracefully on shutdown, allowing in-flight jobs to complete or be reassigned

Both container types share the same codebase and configuration. The role (gateway vs. worker) is determined by a startup flag, enabling a single container image for simplified deployment.

Routing and Coordination

The Coordinator uses a RoutingPolicy to determine how messages flow between executors within a run. Two implementations are provided.

MessageDrivenRoutingPolicy (Default)

The default routing policy uses event-driven communication between agents. Each executor publishes messages to a shared bus, and downstream executors subscribe to the message types they care about.

This enables:

  • Dynamic routing — executors can inspect message content and decide where to send output at runtime
  • Multi-cast — a single executor can broadcast to multiple downstream consumers
  • Communication graph tracking — all inter-agent messages are recorded, enabling visualization of the actual execution path after a run completes

The communication graph is stored alongside the run and can be rendered as a directed graph in the M3 Forge dashboard, showing which agents communicated, what messages were exchanged, and the order of interactions.
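The combination of publish/subscribe delivery and graph recording can be sketched together. The class and executor names here are invented for the example; the point is that every delivery appends an edge, so the actual execution path is reconstructable afterwards.

```python
from collections import defaultdict

class Bus:
    """Toy message bus in the spirit of MessageDrivenRoutingPolicy."""

    def __init__(self):
        self.subs = defaultdict(list)   # message type -> (subscriber, handler)
        self.graph = []                 # recorded (sender, receiver, type) edges

    def subscribe(self, msg_type, name, handler):
        self.subs[msg_type].append((name, handler))

    def publish(self, sender, msg_type, payload):
        # Multi-cast: every subscriber to this type receives the message,
        # and each delivery is recorded in the communication graph.
        for name, handler in self.subs[msg_type]:
            self.graph.append((sender, name, msg_type))
            handler(payload)

bus = Bus()
seen = []
bus.subscribe("category", "legal_reviewer", lambda p: seen.append(p))
bus.subscribe("category", "audit_log", lambda p: None)  # second consumer

bus.publish("classifier", "category", {"category": "legal"})

assert seen == [{"category": "legal"}]
assert ("classifier", "legal_reviewer", "category") in bus.graph
assert ("classifier", "audit_log", "category") in bus.graph
```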

SequentialRoutingPolicy

A simpler alternative that executes agents in a fixed linear chain:

Executor A → Executor B → Executor C → Done

Each executor receives the full output of the previous one. There is no branching, no dynamic routing, and no multi-cast. This policy is appropriate for straightforward pipelines where the execution order is known at configuration time.
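Because each executor receives the full output of its predecessor, sequential routing is just a left fold over the chain. A minimal sketch, with string-appending lambdas standing in for real executors:

```python
def run_sequential(executors, initial):
    """Run executors in a fixed linear chain, threading output to input."""
    out = initial
    for execute in executors:
        out = execute(out)
    return out

chain = [
    lambda doc: doc + " -> A",
    lambda doc: doc + " -> B",
    lambda doc: doc + " -> C",
]

assert run_sequential(chain, "input") == "input -> A -> B -> C"
```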

Choosing a Policy

| Policy         | Use When                                                               |
|----------------|------------------------------------------------------------------------|
| message_driven | Complex multi-agent workflows, conditional branching, dynamic routing   |
| sequential     | Simple linear pipelines, predictable execution order                    |

Set the policy in the flow config:

```yaml
routing_policy: message_driven  # or sequential
```

Service Discovery

The Agent Server uses Consul for service discovery, enabling dynamic registration and routing of executors across the cluster.

Registration

When a worker starts, it registers its executors with the Consul catalog:

  • Service name — the executor type (e.g., llm-executor, tool-executor)
  • Address and port — where the executor can be reached
  • Tags — metadata such as supported models, available tools, GPU availability
  • Health checks — HTTP and TCP checks to verify executor liveness
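A registration built from the fields above might look like the payload below. This is the general shape the Consul agent API accepts via `PUT /v1/agent/service/register`; the executor name, address, port, and tags are example values, not ones taken from a real deployment.

```python
def registration(executor_type, address, port, tags):
    """Build a Consul service-registration payload (illustrative values)."""
    return {
        "Name": executor_type,                       # service name, e.g. llm-executor
        "ID": f"{executor_type}-{address}-{port}",   # unique instance id
        "Address": address,
        "Port": port,
        "Tags": tags,                                # models, tools, GPU availability
        "Checks": [
            # HTTP liveness: GET /health must return 200 OK
            {"HTTP": f"http://{address}:{port}/health", "Interval": "10s"},
            # TCP check: port open and accepting connections
            {"TCP": f"{address}:{port}", "Interval": "10s"},
        ],
    }

payload = registration("llm-executor", "10.0.0.7", 9100,
                       ["claude-opus-4-6", "gpu"])
assert payload["Checks"][0]["HTTP"].endswith("/health")
```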

Health Checks

Consul runs two types of health checks against each registered executor:

  • HTTP health check — GET /health returns 200 OK if the executor is ready to accept work
  • TCP health check — verifies the executor port is open and accepting connections

Failed health checks automatically deregister the executor from the service catalog, preventing the scheduler from assigning it new work.

Service Catalog

The Consul service catalog at port 8500 provides a real-time view of all available executors. The scheduler queries this catalog when assigning jobs to determine:

  • Which executor types are available
  • Which workers have capacity
  • Geographic proximity for latency-sensitive routing

Consul also supports key-value storage for distributed configuration. The Agent Server uses this to propagate runtime feature flags and configuration updates without restarting workers.

Deregistration

When a worker shuts down gracefully, it deregisters its executors from Consul. If a worker crashes without deregistering, the health checks will fail and Consul automatically removes the stale entries after the configured timeout (default: 30 seconds).

Skill System

Skills extend agent capabilities with reusable, discoverable tool packages. The Agent Server provides a skill management system that handles discovery, registration, and routing.

SkillRegistry

The SkillRegistry discovers skills from two locations:

  1. System skills — shipped with the Agent Server in config/skills/
  2. Workspace skills — user-defined skills in the workspace directory at .marie/skills/

On startup, the registry scans both directories, parses each skill’s SKILL.md specification, and builds an in-memory index of available skills.

SKILL.md Specification

Each skill is defined by a SKILL.md file in its directory. This file serves as both documentation and configuration:

```markdown
# Extract Tables

Extract structured table data from documents.

## Triggers
- /extract-tables
- /table-extract

## Tools
- pdf_reader
- table_parser
- csv_formatter

## Parameters
- format: Output format (json, csv, markdown). Default: json
- pages: Page range to process. Default: all
```

The registry parses the file to build the skill’s metadata, including its trigger commands, required tools, and configurable parameters.
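A rough sketch of how such a file could be parsed into skill metadata. The real registry's parsing rules are not documented here; this reads the headings and list items of the example above.

```python
SKILL_MD = """\
# Extract Tables
Extract structured table data from documents.

## Triggers
- /extract-tables
- /table-extract

## Tools
- pdf_reader
- table_parser
- csv_formatter
"""

def parse_skill(text):
    """Naive SKILL.md reader: title from '# ', sections from '## ', items from '- '."""
    skill = {"name": None, "triggers": [], "tools": []}
    section = None
    for line in text.splitlines():
        if line.startswith("## "):
            section = line[3:].strip().lower()
        elif line.startswith("# ") and skill["name"] is None:
            skill["name"] = line[2:].strip()
        elif line.startswith("- ") and section in ("triggers", "tools"):
            skill[section].append(line[2:].strip())
    return skill

skill = parse_skill(SKILL_MD)
assert skill["name"] == "Extract Tables"
assert "/extract-tables" in skill["triggers"]
assert skill["tools"] == ["pdf_reader", "table_parser", "csv_formatter"]
```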

SkillRouter

The SkillRouter resolves slash commands to skills. When an agent receives a message containing a slash command (e.g., /extract-tables), the router:

  1. Looks up the command in the skill index
  2. Loads the skill’s tool bindings
  3. Injects the tools into the executor’s context for the duration of the command
  4. Returns the tools to the pool after execution
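The resolution steps above amount to an index lookup followed by a scoped tool injection. A minimal sketch, with an invented index shape and a single hardcoded skill:

```python
# Hypothetical skill index: trigger command -> skill metadata.
skill_index = {
    "/extract-tables": {
        "name": "extract-tables",
        "tools": ["pdf_reader", "table_parser", "csv_formatter"],
    },
}

def route(message, base_tools):
    """Resolve a slash command and inject the skill's tools for this command."""
    command = next((w for w in message.split() if w.startswith("/")), None)
    skill = skill_index.get(command)
    if skill is None:
        return base_tools                  # no matching command: context unchanged
    return base_tools + skill["tools"]     # injected for the command's duration

tools = route("/extract-tables report.pdf", base_tools=["web_search"])
assert tools == ["web_search", "pdf_reader", "table_parser", "csv_formatter"]
assert route("hello", ["web_search"]) == ["web_search"]
```

Returning a new list rather than mutating the executor's tool set mirrors step 4: once the command finishes, the original context is untouched.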

Auto-Discovery and Hot-Reload

The SkillRegistry watches the skill directories for changes using filesystem events. When a new skill is added or an existing skill is modified:

  • The registry re-parses the affected SKILL.md
  • The skill index is updated in place
  • No server restart is required

This enables operators to deploy new skills to running workers by dropping a directory into the workspace skills path.

Hot-reloaded skills take effect on the next run. Skills bound to an in-progress run continue using the version that was loaded when the run started.

Next Steps

  • Scaling — horizontal scaling strategies and capacity planning
  • Assistants — creating and configuring assistants
  • Runs — executing and monitoring agent runs