Runs

A run is a single invocation of an Assistant within a Session. Each run represents one execution cycle — from receiving input to producing output — with full state tracking, checkpointing, and monitoring. Runs are the atomic unit of work in the Agent Server; every interaction you observe, debug, or replay maps back to exactly one run.

Run Execution Lifecycle

Every run follows a deterministic five-stage lifecycle. Understanding these stages is essential for debugging failures and optimizing latency.

Request

The client sends input to the Gateway via REST or gRPC. The Gateway validates the request against the assistant’s input schema, authenticates the caller, and enqueues a job into the PostgreSQL-backed scheduler. At this point the run enters the pending state and a run ID is returned to the caller.

Scheduling

The scheduler evaluates available worker capacity and assigns the job to an eligible worker slot. Assignment considers worker affinity, current load, and any resource constraints defined on the assistant (GPU requirements, memory limits). If no worker is available, the job remains queued until capacity frees up or a configurable queue timeout is reached.

Initialization

The assigned Coordinator loads the assistant configuration, restores session state from the latest checkpoint, hydrates tool registrations, and prepares the execution context. If the run belongs to a multi-agent workflow, the coordinator also resolves the agent graph and establishes inter-agent communication channels.

Execution

The agent processes the input through its reasoning loop: generating responses, invoking tools, and communicating with other agents when part of a coordinated workflow. Checkpoints are written at each step boundary so that the run can be resumed from the last stable state on failure. Events are streamed to the client in real time via the configured delivery mechanism (SSE, webhook, or polling).

Completion

The final output is persisted alongside execution metadata (token counts, timing, tool call history). The run status is updated to completed, failed, or cancelled depending on the outcome. Session state is saved for use by subsequent runs in the same session.

Run Types

The Agent Server supports four execution modes. Choose the mode that matches your client’s interaction pattern and latency requirements.

Synchronous

The client sends a request and blocks until the full response is available. This is the simplest integration pattern, suitable for fast tasks that complete within a few seconds.


Client ──request──▶ Gateway ──▶ Worker ──response──▶ Client
                    (client blocks)

Streaming (SSE)

Server-Sent Events deliver incremental results as the agent produces them. The client receives a stream of partial outputs, tool call notifications, and status updates in real time. Best for chat-style interactions where perceived latency matters.


Client ──request──▶ Gateway ──▶ Worker
Client ◀──event 1──
Client ◀──event 2──
Client ◀──event N──
Client ◀──done─────
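
A minimal sketch of how a client might consume such a stream, assuming the server uses the standard text/event-stream framing (event: and data: fields, with a blank line closing each event). The event names (partial, done) and payloads below are hypothetical; the actual event schema is not specified here.

```python
from typing import Iterable, Iterator


def _field_value(line: str, name: str) -> str:
    """Return the value of an SSE field, dropping 'name:' and one leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "":                    # a blank line closes the current event
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):        # comment line, often used as keep-alive
            continue
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data.append(_field_value(line, "data"))


# Hypothetical stream; the event names are illustrative only.
sample = [
    "event: partial", "data: Hello", "",
    ": keep-alive",
    "event: partial", "data: , world", "",
    "event: done", "data: {}", "",
]
events = list(parse_sse(sample))
```

In a real client the lines would come from a streaming HTTP response rather than a list; the parsing logic stays the same.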

Background

The Gateway returns immediately with a run ID. The client polls a status endpoint or listens for completion via a separate channel. Use this mode for long-running tasks where holding an open connection is impractical.


Client ──request──▶ Gateway ──run_id──▶ Client
                    Worker executes asynchronously
Client ──poll────▶ Gateway ──status──▶ Client
Client ──poll────▶ Gateway ──result──▶ Client
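
The polling loop can be sketched as follows. This is an illustration under assumptions: get_status stands in for whatever call reaches the status endpoint, and the response shape (a dict with a "state" key) is hypothetical. The terminal states are the ones listed under Run States.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def wait_for_run(get_status: Callable[[str], dict], run_id: str,
                 interval: float = 1.0, max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the run reaches a terminal state or max_polls is exhausted."""
    for _ in range(max_polls):
        status = get_status(run_id)
        if status["state"] in TERMINAL_STATES:
            return status
        sleep(interval)
    raise TimeoutError(f"run {run_id} not finished after {max_polls} polls")


# Stand-in for the real status endpoint (hypothetical response shape).
_responses = iter([
    {"state": "pending"},
    {"state": "running"},
    {"state": "completed", "output": "done"},
])
result = wait_for_run(lambda run_id: next(_responses), "run-123",
                      sleep=lambda seconds: None)  # skip real sleeping in the demo
```

Bounding the number of polls keeps a stuck run from blocking the client indefinitely; a production client would likely also back off between polls.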

Webhook

Similar to background mode, but instead of polling, the Gateway delivers results to a callback URL when the run completes. Best for server-to-server async integrations where the caller does not maintain a persistent connection.


Client ──request + callback_url──▶ Gateway ──run_id──▶ Client
                                   Worker executes asynchronously
Gateway ──POST result──▶ callback_url

Mode Comparison

Mode	Latency Profile	Best Use Case	Client Pattern
Synchronous	Low (tasks that finish within a few seconds)	Simple queries, tool lookups	Request/response
Streaming	Progressive (first token fast)	Chat, content generation	EventSource / SSE
Background	Deferred (minutes to hours)	Batch processing, analysis	Poll with run ID
Webhook	Deferred (minutes to hours)	Async integrations, pipelines	Receive POST callback

Run States

A run progresses through a state machine with six possible states. Each transition is recorded as an immutable, timestamped log entry.

State Definitions

State	Description
pending	Queued in the scheduler, waiting for a worker slot to become available.
running	Actively executing on an assigned worker. Checkpoints are being written.
completed	Finished successfully. Output and metadata are persisted.
failed	Terminated due to an error. Error details including stack trace and failing step are recorded.
cancelled	Terminated by explicit client request or system-level cancellation (timeout, resource limit).
paused	Execution is suspended, waiting for external input such as human-in-the-loop approval. The run resumes when the input is provided; if none arrives, the wait times out.
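
The six states can be modeled as a small state machine. The sketch below is illustrative: the state names come from the table above, but the allowed transitions are an assumption, since only the states themselves are documented here. Retries after a failure are left out of the model.

```python
from enum import Enum


class RunState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    PAUSED = "paused"


# Assumed transition table; the edges are a guess, not documented behavior.
ALLOWED = {
    RunState.PENDING: {RunState.RUNNING, RunState.CANCELLED},
    RunState.RUNNING: {RunState.COMPLETED, RunState.FAILED,
                       RunState.CANCELLED, RunState.PAUSED},
    RunState.PAUSED: {RunState.RUNNING, RunState.CANCELLED},
    RunState.COMPLETED: set(),   # terminal
    RunState.FAILED: set(),      # terminal
    RunState.CANCELLED: set(),   # terminal
}


def can_transition(src: RunState, dst: RunState) -> bool:
    """Return True if moving from src to dst is allowed by the assumed table."""
    return dst in ALLOWED[src]
```

For example, can_transition(RunState.PENDING, RunState.RUNNING) is True under this table, while any transition out of a terminal state is rejected.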

Coordination Patterns

When a run involves multiple agents, the Agent Server uses a coordinator to manage execution flow. Three built-in coordinator types are available.

WorkflowCoordinator

Message-driven orchestration where agents communicate via typed messages. The coordinator routes messages between agents based on the workflow graph. This is the default for multi-agent workflows and supports complex branching, loops, and conditional routing.


coordinator:
  type: workflow
  graph:
    researcher:
      outputs_to: [analyst, writer]
    analyst:
      outputs_to: [writer]
    writer:
      outputs_to: [__end__]
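
Before deploying a graph like this, it can help to check that every outputs_to target exists and that every agent can eventually reach __end__. The following is a minimal sketch of such a check, assuming the graph has been loaded into a plain dict; the helper names are hypothetical and not part of the Agent Server.

```python
END = "__end__"

# The graph from the configuration above, as agent -> list of targets.
graph = {
    "researcher": ["analyst", "writer"],
    "analyst": ["writer"],
    "writer": [END],
}


def undefined_targets(graph: dict[str, list[str]]) -> set[str]:
    """Return targets that are neither declared agents nor __end__."""
    known = set(graph) | {END}
    return {t for targets in graph.values() for t in targets if t not in known}


def reaches_end(graph: dict[str, list[str]], start: str) -> bool:
    """Depth-first search: can messages from `start` eventually hit __end__?"""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == END:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return False


missing = undefined_targets(graph)
dead_ends = [agent for agent in graph if not reaches_end(graph, agent)]
```

Both missing and dead_ends are empty for the example graph; a non-empty result would point at a typo in an agent name or a branch that can never terminate.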

ChainCoordinator

Sequential pipeline where the output of agent N becomes the input of agent N+1. Simple to reason about and debug. Use this when agents have strict ordering dependencies.


coordinator:
  type: chain
  agents:
    - extractor
    - transformer
    - validator
    - publisher
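
Conceptually, a chain is a left-to-right fold over the agent list. The sketch below illustrates this with toy functions standing in for the four agents; it is not the coordinator's implementation, and the agent behavior is invented for the example.

```python
from typing import Callable, Sequence

Agent = Callable[[str], str]


def run_chain(agents: Sequence[Agent], payload: str) -> str:
    """Feed the output of each agent into the next one, in order."""
    for agent in agents:
        payload = agent(payload)
    return payload


# Toy stand-ins for the agents named in the configuration above.
def extractor(text: str) -> str:
    return text.strip()


def transformer(text: str) -> str:
    return text.upper()


def validator(text: str) -> str:
    if not text:
        raise ValueError("empty payload")
    return text


def publisher(text: str) -> str:
    return f"published:{text}"


output = run_chain([extractor, transformer, validator, publisher], "  hello ")
```

Because each stage sees only the previous stage's output, a failure can be traced to a single agent by inspecting the payload at that position in the chain.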

FanOutCoordinator

Parallel execution across multiple agents with result aggregation. The coordinator distributes the same input (or partitioned input) to all agents, waits for completion, and merges results using a configurable aggregation strategy.


coordinator:
  type: fan_out
  agents:
    - spanish_translator
    - french_translator
    - german_translator
  aggregation: merge_all
  timeout_per_agent: 30s
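
A rough sketch of the fan-out pattern using asyncio. It assumes that merge_all collects every agent's result keyed by agent name and that an agent exceeding timeout_per_agent contributes None; both behaviors are assumptions made for illustration, and the translator functions are toy stand-ins.

```python
import asyncio
from typing import Awaitable, Callable, Optional

AsyncAgent = Callable[[str], Awaitable[str]]


async def fan_out(agents: dict[str, AsyncAgent], payload: str,
                  timeout_per_agent: float) -> dict[str, Optional[str]]:
    """Send the same payload to every agent in parallel and merge the results."""
    async def call(name: str, agent: AsyncAgent):
        try:
            return name, await asyncio.wait_for(agent(payload), timeout_per_agent)
        except asyncio.TimeoutError:
            return name, None          # assumed handling of a per-agent timeout

    pairs = await asyncio.gather(*(call(n, a) for n, a in agents.items()))
    return dict(pairs)


async def spanish_translator(text: str) -> str:
    return "hola"


async def french_translator(text: str) -> str:
    return "bonjour"


async def german_translator(text: str) -> str:
    await asyncio.sleep(0.5)           # deliberately slower than the timeout
    return "hallo"


results = asyncio.run(fan_out(
    {"spanish_translator": spanish_translator,
     "french_translator": french_translator,
     "german_translator": german_translator},
    "hello", timeout_per_agent=0.05))
```

Applying the timeout per agent rather than to the whole gather means one slow agent does not discard the results of the agents that finished in time.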

Inter-Agent Communication

Within a multi-agent run, agents exchange AgentMessage objects through the coordinator. Messages are typed to enable structured communication patterns.

Message Types

Type	Purpose
`TASK`	Assign work to another agent
`RESULT`	Return completed work to the requester
`VALIDATION`	Request or provide validation of an output
`ERROR`	Signal a failure to the coordinator or peer agents
`CONTROL`	System-level directives (pause, resume, cancel)

Special Receivers

Messages can target specific agents by name, or use reserved addresses for common routing patterns:

__end__ — Terminate the workflow and return the message payload as the run output
__self__ — Send the message back to the originating agent (self-loop for iterative refinement)
__prev__ — Route to the previous agent in a chain
__next__ — Route to the next agent in a chain
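
The reserved addresses can be resolved to concrete agents with a small lookup. The sketch below is an assumption-laden illustration: the AgentMessage fields shown are hypothetical, and raising an error when __prev__ or __next__ falls off the end of the chain is a design choice made for the example, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentMessage:
    # Hypothetical field set; the real AgentMessage schema may differ.
    sender: str
    receiver: str
    type: str
    payload: str


def resolve_receiver(msg: AgentMessage, chain: list[str]) -> Optional[str]:
    """Return the concrete agent name, or None when the workflow should end."""
    if msg.receiver == "__end__":
        return None
    if msg.receiver == "__self__":
        return msg.sender
    if msg.receiver in ("__prev__", "__next__"):
        offset = 1 if msg.receiver == "__next__" else -1
        index = chain.index(msg.sender) + offset
        if not 0 <= index < len(chain):
            raise ValueError(f"{msg.receiver} is undefined for {msg.sender}")
        return chain[index]
    return msg.receiver                # a plain agent name


chain = ["extractor", "transformer", "validator"]
to_next = resolve_receiver(
    AgentMessage("transformer", "__next__", "RESULT", "data"), chain)
to_prev = resolve_receiver(
    AgentMessage("transformer", "__prev__", "ERROR", "bad input"), chain)
```

Here to_next resolves to "validator" and to_prev to "extractor".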

Example: Inter-Agent Message Flow


researcher                    analyst                      writer
    │                            │                            │
    ├── TASK(query) ────────────▶│                            │
    │                            ├── RESULT(findings) ───────▶│
    │                            │                            │
    │   VALIDATION(draft) ◀──────┤◀── VALIDATION(draft) ─────┤
    │                            │                            │
    ├── RESULT(approved) ───────▶│                            │
    │                            ├── RESULT(final) ──────────▶│
    │                            │                            ├──▶ __end__

In this flow, the researcher assigns a task to the analyst, who produces findings and passes them to the writer. The writer sends a draft back through the analyst to the researcher for validation. Once the researcher approves, the analyst forwards the final result to the writer, who delivers it to __end__, completing the run.

Run Monitoring

Every run captures detailed telemetry that is available in real time during execution and persisted for post-hoc analysis.

Token Usage

Input and output tokens are tracked per step and aggregated at the run level. Token counts are broken down by agent when multiple agents participate, enabling cost attribution across a multi-agent workflow.
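
As an illustration of the aggregation, the sketch below rolls per-step counts up to per-agent and run-level totals. The record shape (agent, input_tokens, output_tokens) and the numbers are hypothetical; the actual telemetry schema is not specified here.

```python
from collections import defaultdict

# Hypothetical per-step records for a two-agent run.
steps = [
    {"agent": "researcher", "input_tokens": 120, "output_tokens": 80},
    {"agent": "researcher", "input_tokens": 60, "output_tokens": 40},
    {"agent": "writer", "input_tokens": 200, "output_tokens": 150},
]


def aggregate_tokens(steps: list[dict]) -> tuple[dict, dict]:
    """Return (per-agent totals, run-level totals) from per-step counts."""
    per_agent = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0})
    for step in steps:
        totals = per_agent[step["agent"]]
        totals["input_tokens"] += step["input_tokens"]
        totals["output_tokens"] += step["output_tokens"]
    run_total = {
        key: sum(agent[key] for agent in per_agent.values())
        for key in ("input_tokens", "output_tokens")
    }
    return dict(per_agent), run_total


per_agent, run_total = aggregate_tokens(steps)
```

With per-agent totals in hand, cost attribution is a matter of multiplying each agent's counts by the price of the model it used.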

Execution Timeline

Each step records start and end timestamps, providing a full duration breakdown. The timeline view in the dashboard visualizes sequential and parallel execution phases, making it straightforward to identify which steps dominate total latency.

Tool Call History

Every tool invocation is logged with its input arguments, output, duration, and success/failure status. Tool call history enables auditing of agent behavior and debugging of incorrect tool usage.

Communication Graph

For multi-agent runs, the communication graph captures every inter-agent message with sender, receiver, type, and timestamp. This graph can be visualized in the dashboard to understand coordination dynamics and identify communication bottlenecks.

Error Handling

The Agent Server handles failures through four mechanisms: retry policies, timeouts, checkpoint recovery, and a dead letter queue.

Retry Policies

Each assistant can define a retry policy with configurable retry count and backoff strategy. Retries apply to transient failures such as network timeouts, rate limits, and temporary service unavailability.


assistant:
  retry_policy:
    max_retries: 3
    backoff: exponential
    initial_delay: 1s
    max_delay: 30s
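
To make the policy concrete, the sketch below computes the wait before each retry. It assumes that exponential means the delay doubles on every attempt and is capped at max_delay; the growth factor and the absence of jitter are assumptions, since the configuration above does not state them.

```python
def backoff_delays(max_retries: int, initial_delay: float,
                   max_delay: float, factor: float = 2.0) -> list[float]:
    """Return the delay in seconds before each retry attempt."""
    return [
        min(initial_delay * factor ** attempt, max_delay)
        for attempt in range(max_retries)
    ]


# Values from the policy above: three retries, 1s initial, 30s cap.
delays = backoff_delays(max_retries=3, initial_delay=1.0, max_delay=30.0)

# With more retries the cap starts to matter.
capped = backoff_delays(max_retries=7, initial_delay=1.0, max_delay=30.0)
```

Under these assumptions the configured policy waits 1, 2, and 4 seconds, so max_delay only takes effect with a larger max_retries (the sixth delay would otherwise be 32 seconds).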

Timeouts

Execution timeouts are enforced via the ExecutionContext at two granularities:

Per-run timeout — Maximum total duration for the entire run
Per-step timeout — Maximum duration for a single reasoning/tool step

When a timeout fires, the run transitions to failed with a timeout error that identifies the step that exceeded the limit.

Checkpoint Recovery

The Agent Server writes checkpoints at each step boundary during execution. When a run fails and is retried, execution resumes from the last successful checkpoint instead of replaying the run from scratch: steps that already completed are not re-executed, and only the failed step and the steps after it run again. This preserves intermediate results, avoids redundant token consumption, and reduces both the latency and the cost of a retry.
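
The resume behavior can be sketched with an in-memory checkpoint. This is a simplified illustration, not the server's implementation: the checkpoint format (next_step plus state) and the step functions are hypothetical.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_with_checkpoints(steps: list[Step], state: dict, checkpoint: dict) -> dict:
    """Run steps in order, recording progress in `checkpoint` after each one.

    On a retry, execution starts at checkpoint["next_step"] with the saved state.
    """
    start = checkpoint.get("next_step", 0)
    state = checkpoint.get("state", state)
    for index in range(start, len(steps)):
        state = steps[index](state)
        checkpoint["next_step"] = index + 1   # written at the step boundary
        checkpoint["state"] = state
    return state


calls = []
flaky = {"fail": True}


def fetch(state: dict) -> dict:
    calls.append("fetch")
    return {**state, "fetched": True}


def summarize(state: dict) -> dict:
    calls.append("summarize")
    if flaky["fail"]:                         # fail once, then succeed
        flaky["fail"] = False
        raise RuntimeError("transient failure")
    return {**state, "summary": "ok"}


checkpoint: dict = {}
try:
    run_with_checkpoints([fetch, summarize], {}, checkpoint)
except RuntimeError:
    pass                                      # first attempt fails in summarize
final = run_with_checkpoints([fetch, summarize], {}, checkpoint)
```

After the retry, fetch has run once and summarize twice: the completed step is skipped and only the failed step is re-executed.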

Dead Letter Queue

Runs that exhaust all retry attempts are moved to a dead letter queue (DLQ). The DLQ retains the full run context — input, checkpoint state, error history — so that operators can inspect failures, fix the underlying issue, and manually replay the run when ready.

Assistants — Configure the agents that runs execute
Sessions — Understand the session context that persists across runs
Scheduled Runs — Trigger runs on a recurring schedule
Architecture — System design and component topology