Agent Server
The Agent Server is the production runtime for AI agents in M3 Forge. It hosts agent assistants as persistent, API-accessible services with built-in state management, task queuing, and horizontal scaling.
What is the Agent Server?
The Agent Server is an API-driven server that deploys agent configurations — called assistants — and manages their full execution lifecycle. It bridges the gap between agent design in the Agent Editor (BUILD) and production infrastructure (DEPLOY).
┌─────────────────┐ ┌─────────────────┐ ┌──────────────────┐
│ Agent Editor │ │ Agent Server │ │ Infrastructure │
│ (BUILD) │ ──▶ │ (DEPLOY) │ ──▶ │ (OPERATE) │
│ │ │ │ │ │
│ Design agents, │ │ Host assistants │ │ Scale, monitor, │
│ assign tools, │ │ as services, │ │ observe metrics │
│ test prompts │ │ manage state │ │ and logs │
└─────────────────┘ └─────────────────┘ └──────────────────┘

When you create and test an agent in the Agent Editor, you deploy it to the Agent Server as an assistant. The server then handles everything needed to run that agent reliably in production: accepting API requests, managing conversation state, queuing tasks, and scaling across workers.
Core Concepts
Assistants
An assistant is a deployed agent configuration — a named, versioned snapshot of an agent’s model, system prompt, tools, and parameters. Assistants are the unit of deployment on the Agent Server. Each assistant exposes an API endpoint that clients use to create sessions and trigger runs.
Sessions
A session is a stateful execution context tied to an assistant. It maintains conversation history, checkpoint state, and memory across multiple interactions. When a user starts a chat with an assistant, the server creates a session that persists until explicitly closed or expired.
Runs
A run is a single execution instance within a session. Each time a client sends a message or triggers an assistant, the server creates a run that tracks the execution from start to completion — including intermediate tool calls, token usage, and any errors encountered.
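The three concepts above can be sketched as a minimal in-memory data model. This is an illustration of how assistants, sessions, and runs relate, not the server's actual schema; all class and field names here are assumptions.

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Assistant:
    """A named, versioned snapshot of an agent configuration."""
    name: str
    version: int
    model: str
    system_prompt: str
    tools: tuple = ()

@dataclass
class Run:
    """One execution within a session, tracked from start to completion."""
    run_id: str
    status: str = "queued"   # queued -> running -> succeeded | failed

@dataclass
class Session:
    """Stateful execution context tied to one assistant."""
    session_id: str
    assistant: Assistant
    history: list = field(default_factory=list)
    runs: list = field(default_factory=list)

    def start_run(self, message: str) -> Run:
        # Each incoming message extends the persistent history
        # and spawns a new run to track its execution.
        self.history.append({"role": "user", "content": message})
        run = Run(run_id=str(uuid.uuid4()))
        self.runs.append(run)
        return run

bot = Assistant("support-bot", 1, "gpt-4o", "You are helpful.")
session = Session(str(uuid.uuid4()), bot)
run = session.start_run("Hello!")
```

Note the hierarchy: an assistant is immutable once deployed, a session accumulates history across interactions, and each interaction produces exactly one run.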
Scheduled Runs
Scheduled runs enable recurring agent execution on cron schedules. Configure an assistant to run at fixed intervals — daily report generation, periodic data checks, or recurring analysis tasks — without manual invocation.
How It Works
The Agent Server processes requests through a layered architecture:
- Gateway — Accepts client requests via REST or gRPC, authenticates callers, and validates input against assistant schemas.
- Task Queue — Enqueues run requests into a PostgreSQL-backed job scheduler. The queue ensures at-least-once delivery, handles retries, and supports priority ordering.
- Queue Workers (Coordinators) — Pull tasks from the queue and orchestrate agent execution. Each coordinator manages the agent’s reasoning loop: invoking the LLM, executing tool calls, and writing checkpoints.
- Checkpoint Storage — Persists execution state to PostgreSQL after each step. If a worker crashes mid-run, another worker can resume from the last checkpoint.
- LLM Providers — Coordinators call configured LLM providers (OpenAI, Anthropic, local models) based on the assistant’s model configuration.
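The coordinator-plus-checkpoint pattern described above can be sketched as follows, with plain dicts standing in for the PostgreSQL queue and checkpoint tables (the task shape and function names are illustrative assumptions):

```python
# In-memory stand-ins for the PostgreSQL-backed queue and checkpoint storage.
queue = [{"run_id": "r1", "steps": ["plan", "call_tool", "respond"]}]
checkpoints = {}  # run_id -> index of the last completed step

def execute_step(run_id: str, step: str) -> str:
    """Placeholder for one iteration of the agent's reasoning loop."""
    return f"{run_id}:{step}:done"

def work(task: dict) -> str:
    """Coordinator loop: resume from the last checkpoint, persist after each step."""
    run_id = task["run_id"]
    start = checkpoints.get(run_id, 0)      # 0 on a fresh run, >0 after a crash
    for i in range(start, len(task["steps"])):
        execute_step(run_id, task["steps"][i])
        checkpoints[run_id] = i + 1         # write checkpoint after every step
    return "succeeded"

status = work(queue.pop(0))
```

Because progress is written after every step, a second worker picking up the same task would skip the steps already recorded in `checkpoints` rather than repeating them.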
The Agent Server uses the same PostgreSQL-backed scheduler as M3 Forge workflows. This shared infrastructure means agents and workflows benefit from the same reliability guarantees, monitoring, and scaling mechanisms.
Key Capabilities
Stateful Conversations
Every session maintains full conversation history and checkpoint state. Agents can reference previous messages, resume interrupted conversations, and build context over extended interactions — all without client-side state management.
Horizontal Scaling
The Agent Server scales by adding coordinator workers. Each worker pulls tasks independently from the shared queue, so throughput grows linearly with worker count. Checkpointing ensures no work is lost if a worker fails.
Task Queuing and Retries
All run requests flow through a durable task queue. Failed runs are retried according to configurable policies (max attempts, backoff intervals). The queue handles burst traffic by buffering requests and processing them at sustainable throughput.
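A retry policy of the kind described (max attempts plus backoff intervals) is commonly computed as capped exponential backoff. The parameter names below are illustrative, not the server's configuration keys:

```python
def retry_delays(max_attempts: int, base: float = 1.0,
                 factor: float = 2.0, cap: float = 60.0) -> list:
    """Seconds to wait before each retry: exponential growth, capped.

    max_attempts includes the first attempt, so there are
    max_attempts - 1 delays.
    """
    return [min(base * factor ** n, cap) for n in range(max_attempts - 1)]

delays = retry_delays(5)  # waits before attempts 2..5 -> [1.0, 2.0, 4.0, 8.0]
```

The cap keeps a persistently failing run from backing off indefinitely while still spacing out attempts enough to ride out transient provider outages.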
Streaming Responses
Runs support streaming via Server-Sent Events (SSE). Clients receive intermediate tokens, tool call results, and status updates in real time as the agent executes — enabling responsive chat interfaces and live progress indicators.
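An SSE stream is just a sequence of text frames, each an `event:` line plus a `data:` payload terminated by a blank line. A minimal frame formatter (the event names are assumptions, not the server's documented event types):

```python
import json

def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event type + JSON payload."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"

# A client would receive frames like these as the agent executes:
frame = sse_event("token", {"delta": "Hel"})
```

Because each frame is self-delimiting (double newline), clients can render tokens, tool results, and status updates incrementally as they arrive.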
Cron Scheduling
Schedule assistants to run on recurring cron expressions. The scheduler integrates with the same PostgreSQL job engine used by M3 Forge workflows, providing reliable execution with failure alerting and run history.
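To make the scheduling concrete, here is a deliberately simplified next-fire-time calculation for daily cron expressions of the form `M H * * *`. A real cron engine handles all five fields; this sketch only illustrates the "next occurrence after now" logic:

```python
from datetime import datetime, timedelta

def next_daily_run(expr: str, now: datetime) -> datetime:
    """Next fire time for a daily cron expression 'M H * * *' (illustrative only)."""
    minute, hour, *_ = expr.split()
    candidate = now.replace(hour=int(hour), minute=int(minute),
                            second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot already passed
    return candidate

# "0 6 * * *" = every day at 06:00; at 09:30 the next run is tomorrow.
nxt = next_daily_run("0 6 * * *", datetime(2024, 1, 15, 9, 30))
```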
Relationship to Other Sections
The Agent Server connects several areas of the M3 Forge platform:
| Concept | Related Section | Relationship |
|---|---|---|
| Assistants | Agents (BUILD) | Assistants deploy agent configs created in the Agent Editor |
| Sessions | Applications | Chat threads in apps create sessions on the Agent Server |
| Runs | Monitoring | Run execution metrics flow to the Monitoring dashboard |
| Scheduled Runs | Automation | Same cron engine, different targets (agents vs workflows) |
| Architecture | Infrastructure | Agent Server containers run on M3 Forge infrastructure |

Quick Start
Deploy an Assistant
Navigate to the Agent Editor, configure your agent, and click Deploy to Agent Server. The server creates a versioned assistant with an API endpoint.
Create a Session
Send a POST /assistants/{id}/sessions request to create a new session. The server initializes conversation state and returns a session ID.
Run the Assistant
Send a message to POST /sessions/{id}/runs to trigger a run. The server queues the task, executes the agent, and returns the result — or streams it via SSE.
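The two API calls above can be sketched as a tiny client helper. The base URL and request body shapes here are assumptions for illustration; only the endpoint paths come from the steps above:

```python
BASE_URL = "https://forge.example.com/api"  # placeholder deployment URL

def create_session_request(assistant_id: str):
    """Request tuple for step 2: create a session on an assistant."""
    return ("POST", f"{BASE_URL}/assistants/{assistant_id}/sessions", {})

def create_run_request(session_id: str, message: str, stream: bool = False):
    """Request tuple for step 3: trigger a run within a session."""
    return ("POST", f"{BASE_URL}/sessions/{session_id}/runs",
            {"message": message, "stream": stream})

method, url, body = create_run_request("sess-123", "Summarize today's tickets")
```

Passing `stream=True` in the (hypothetical) body would correspond to requesting the SSE variant described under Streaming Responses.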
Monitor Execution
View active runs, check completion status, and inspect execution traces in the Monitoring dashboard.