Agent Server
The Agent Server is the production runtime for AI agents in M3 Forge. It hosts agent assistants as persistent, API-accessible services with built-in state management, task queuing, and horizontal scaling.
What is the Agent Server?
The Agent Server is an API-driven server that deploys agent configurations — called assistants — and manages their full execution lifecycle. It bridges the gap between agent design in the Agent Editor (BUILD) and production infrastructure (DEPLOY).
┌─────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ Agent Editor │ │ Agent Server │ │ Infrastructure │
│ (BUILD) │ ──▶ │ (DEPLOY) │ ──▶ │ (OPERATE) │
│ │ │ │ │ │
│ Design agents, │ │ Host assistants │ │ Scale, monitor, │
│ assign tools, │ │ as services, │ │ observe metrics │
│ test prompts │ │ manage state │ │ and logs │
└─────────────────┘ └─────────────────┘ └──────────────────┘

When you create and test an agent in the Agent Editor, you deploy it to the Agent Server as an assistant. The server then handles everything needed to run that agent reliably in production: accepting API requests, managing conversation state, queuing tasks, and scaling across workers.
Core Concepts
Assistants
An assistant is a deployed agent configuration — a named, versioned snapshot of an agent’s model, system prompt, tools, and parameters. Assistants are the unit of deployment on the Agent Server. Each assistant exposes an API endpoint that clients use to create sessions and trigger runs.
Sessions
A session is a stateful execution context tied to an assistant. It maintains conversation history, checkpoint state, and memory across multiple interactions. When a user starts a chat with an assistant, the server creates a session that persists until explicitly closed or expired.
Runs
A run is a single execution instance within a session. Each time a client sends a message or triggers an assistant, the server creates a run that tracks the execution from start to completion — including intermediate tool calls, token usage, and any errors encountered.
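The three concepts above can be sketched as a minimal in-memory data model. This is an illustration of how assistants, sessions, and runs relate, not the server's actual schema; all class and field names here are assumptions.

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Assistant:
    """A named, versioned snapshot of an agent configuration."""
    name: str
    version: int
    model: str
    system_prompt: str
    tools: tuple = ()

@dataclass
class Run:
    """One execution within a session, tracked from start to completion."""
    run_id: str
    status: str = "queued"   # queued -> running -> succeeded | failed

@dataclass
class Session:
    """Stateful execution context tied to one assistant."""
    session_id: str
    assistant: Assistant
    history: list = field(default_factory=list)
    runs: list = field(default_factory=list)

    def start_run(self, message: str) -> Run:
        # Each incoming message extends the persistent history
        # and spawns a new run to track its execution.
        self.history.append({"role": "user", "content": message})
        run = Run(run_id=str(uuid.uuid4()))
        self.runs.append(run)
        return run

bot = Assistant("support-bot", 1, "gpt-4o", "You are helpful.")
session = Session(str(uuid.uuid4()), bot)
run = session.start_run("Hello!")
```

Note the hierarchy: an assistant is immutable once deployed, a session accumulates history across interactions, and each interaction produces exactly one run.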
Scheduled Runs
Scheduled runs enable recurring agent execution on cron schedules. Configure an assistant to run at fixed intervals — daily report generation, periodic data checks, or recurring analysis tasks — without manual invocation.
How It Works
The Agent Server processes requests through a layered architecture:
- Gateway — Accepts client requests via REST or gRPC, authenticates callers, and validates input against assistant schemas.
- Task Queue — Enqueues run requests into a PostgreSQL-backed job scheduler. The queue ensures at-least-once delivery, handles retries, and supports priority ordering.
- Queue Workers (Coordinators) — Pull tasks from the queue and orchestrate agent execution. Each coordinator manages the agent’s reasoning loop: invoking the LLM, executing tool calls, and writing checkpoints.
- Checkpoint Storage — Persists execution state to PostgreSQL after each step. If a worker crashes mid-run, another worker can resume from the last checkpoint.
- LLM Providers — Coordinators call configured LLM providers (OpenAI, Anthropic, local models) based on the assistant’s model configuration.
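The coordinator-plus-checkpoint pattern described above can be sketched as follows, with plain dicts standing in for the PostgreSQL queue and checkpoint tables (the task shape and function names are illustrative assumptions):

```python
# In-memory stand-ins for the PostgreSQL-backed queue and checkpoint storage.
queue = [{"run_id": "r1", "steps": ["plan", "call_tool", "respond"]}]
checkpoints = {}  # run_id -> index of the last completed step

def execute_step(run_id: str, step: str) -> str:
    """Placeholder for one iteration of the agent's reasoning loop."""
    return f"{run_id}:{step}:done"

def work(task: dict) -> str:
    """Coordinator loop: resume from the last checkpoint, persist after each step."""
    run_id = task["run_id"]
    start = checkpoints.get(run_id, 0)      # 0 on a fresh run, >0 after a crash
    for i in range(start, len(task["steps"])):
        execute_step(run_id, task["steps"][i])
        checkpoints[run_id] = i + 1         # write checkpoint after every step
    return "succeeded"

status = work(queue.pop(0))
```

Because progress is written after every step, a second worker picking up the same task would skip the steps already recorded in `checkpoints` rather than repeating them.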
The Agent Server uses the same PostgreSQL-backed scheduler as M3 Forge workflows. This shared infrastructure means agents and workflows benefit from the same reliability guarantees, monitoring, and scaling mechanisms.
Key Capabilities
Stateful Conversations
Every session maintains full conversation history and checkpoint state. Agents can reference previous messages, resume interrupted conversations, and build context over extended interactions — all without client-side state management.
Horizontal Scaling
The Agent Server scales by adding coordinator workers. Each worker pulls tasks independently from the shared queue, so throughput grows linearly with worker count. Checkpointing ensures no work is lost if a worker fails.
Task Queuing and Retries
All run requests flow through a durable task queue. Failed runs are retried according to configurable policies (max attempts, backoff intervals). The queue handles burst traffic by buffering requests and processing them at sustainable throughput.
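A retry policy of the kind described (max attempts plus backoff intervals) is commonly computed as capped exponential backoff. The parameter names below are illustrative, not the server's configuration keys:

```python
def retry_delays(max_attempts: int, base: float = 1.0,
                 factor: float = 2.0, cap: float = 60.0) -> list:
    """Seconds to wait before each retry: exponential growth, capped.

    max_attempts includes the first attempt, so there are
    max_attempts - 1 delays.
    """
    return [min(base * factor ** n, cap) for n in range(max_attempts - 1)]

delays = retry_delays(5)  # waits before attempts 2..5 -> [1.0, 2.0, 4.0, 8.0]
```

The cap keeps a persistently failing run from backing off indefinitely while still spacing out attempts enough to ride out transient provider outages.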
Streaming Responses
Runs support streaming via Server-Sent Events (SSE). Clients receive intermediate tokens, tool call results, and status updates in real time as the agent executes — enabling responsive chat interfaces and live progress indicators.
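An SSE stream is just a sequence of text frames, each an `event:` line plus a `data:` payload terminated by a blank line. A minimal frame formatter (the event names are assumptions, not the server's documented event types):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event type + JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A client would receive frames like these as the agent executes:
frame = sse_event("token", {"delta": "Hel"})
```

Because each frame is self-delimiting (double newline), clients can render tokens, tool results, and status updates incrementally as they arrive.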
Cron Scheduling
Schedule assistants to run on recurring cron expressions. The scheduler integrates with the same PostgreSQL job engine used by M3 Forge workflows, providing reliable execution with failure alerting and run history.
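To make the scheduling concrete, here is a deliberately simplified next-fire-time calculation for daily cron expressions of the form `M H * * *`. A real cron engine handles all five fields; this sketch only illustrates the "next occurrence after now" logic:

```python
from datetime import datetime, timedelta

def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next fire time for a daily cron expression 'M H * * *' (illustrative only)."""
    minute, hour, *_ = expr.split()
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot already passed
    return candidate

# "0 6 * * *" = every day at 06:00; at 09:30 the next run is tomorrow.
nxt = next_daily_run("0 6 * * *", datetime(2024, 1, 15, 9, 30))
```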
Relationship to Other Sections
The Agent Server connects several areas of the M3 Forge platform:
| Concept | Related Section | Relationship |
|---|---|---|
| Assistants | Agents (BUILD) | Assistants deploy agent configs created in the Agent Editor |
| Sessions | Applications | Chat threads in apps create sessions on the Agent Server |
| Runs | Monitoring | Run execution metrics flow to the Monitoring dashboard |
| Scheduled Runs | Automation | Same cron engine, different targets (agents vs workflows) |
| Architecture | Infrastructure | Agent Server containers run on M3 Forge infrastructure |

Quick Start
Deploy an Assistant
Navigate to the Agent Editor, configure your agent, and click Deploy to Agent Server. The server creates a versioned assistant with an API endpoint.
Create a Session
Send a POST /assistants/{id}/sessions request to create a new session. The server initializes conversation state and returns a session ID.
Run the Assistant
Send a message to POST /sessions/{id}/runs to trigger a run. The server queues the task, executes the agent, and returns the result — or streams it via SSE.
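The two API calls above can be sketched as a tiny client helper. The base URL and request body shapes here are assumptions for illustration; only the endpoint paths come from the steps above:

```python
BASE_URL = "https://forge.example.com/api"  # placeholder deployment URL

def create_session_request(assistant_id: str):
    """Request tuple for step 2: create a session on an assistant."""
    return ("POST", f"{BASE_URL}/assistants/{assistant_id}/sessions", {})

def create_run_request(session_id: str, message: str, stream: bool = False):
    """Request tuple for step 3: trigger a run within a session."""
    return ("POST", f"{BASE_URL}/sessions/{session_id}/runs",
            {"message": message, "stream": stream})

method, url, body = create_run_request("sess-123", "Summarize today's tickets")
```

Passing `stream=True` in the (hypothetical) body would correspond to requesting the SSE variant described under Streaming Responses.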
Monitor Execution
View active runs, check completion status, and inspect execution traces in the Monitoring dashboard.