Prompts & Testing
M3 Forge provides a comprehensive prompt management system with version control, interactive testing, and A/B experimentation. Manage all your LLM prompts in a centralized repository with Git integration, test them in real time against multiple models, and optimize performance through systematic experiments.
Why Prompt Management Matters
LLM prompts are critical business logic. As your AI applications scale, you need:
- Version control for prompt changes
- Centralized repository for team collaboration
- Safe testing before production deployment
- Data-driven optimization through A/B testing
- Comparison tools to evaluate prompt variants
M3 Forge treats prompts as first-class code artifacts with full lifecycle management.

Key Features
Repository-Based Management
Store prompts in Git repositories with full version control. Every change is tracked, branches enable parallel development, and rollbacks are instant. The built-in file browser provides VS Code-like editing with Monaco editor integration.
Supported workflows:
- Create and edit prompts with syntax highlighting
- Branch management for isolated development
- Commit history with detailed diffs
- Pull changes from workspace files
- Export prompts to deployment environments
Prompts sync bidirectionally. Edit in the UI or commit from your IDE — both workflows stay synchronized through Git.
Multi-Provider Testing
Test prompts against multiple LLM providers simultaneously:
- OpenAI (GPT-4, GPT-3.5)
- Anthropic (Claude 3 Opus, Sonnet, Haiku)
- Qwen (Qwen2.5)
- Hugging Face models
- Custom endpoints
Variable substitution enables reusable templates. Define variables like {customer_name} or {product_id} and fill them at test time.
Statistical Experimentation
Run controlled A/B tests with traffic splitting, metric tracking, and statistical significance analysis. Compare multiple prompt variants under identical conditions to identify the highest-performing version.
Core Concepts
Prompt Templates
Templates use variable syntax for dynamic content:
```
You are an expert customer service agent.
Customer: {customer_name}
Issue: {issue_description}
Priority: {priority_level}
Provide a professional response.
```
Variables are auto-detected from the curly-brace syntax and presented as fillable fields in the Playground.
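Detection and substitution can be sketched in a few lines. This is a minimal illustration of the behavior described above, assuming simple `{name}` placeholders; it is not M3 Forge's actual parser.

```python
import re

TEMPLATE = """You are an expert customer service agent.
Customer: {customer_name}
Issue: {issue_description}
Priority: {priority_level}
Provide a professional response."""

def detect_variables(template: str) -> list[str]:
    # Every {word} placeholder becomes a fillable field.
    return re.findall(r"\{(\w+)\}", template)

def render(template: str, **values: str) -> str:
    # Fill the placeholders with the values supplied at test time.
    return template.format(**values)

print(detect_variables(TEMPLATE))
# ['customer_name', 'issue_description', 'priority_level']
```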
System vs. Message Prompts
- System prompts define agent behavior and constraints
- Message prompts contain the user input or task description
Both support variables and can be tested independently or together.
Experiments and Variants
Experiments define:
- Variants — Different versions of the same prompt (A, B, C)
- Traffic split — Percentage allocation across variants
- Metrics — Quality scores, latency, cost per request
- Sample size — Number of test runs required for statistical significance
M3 Forge automatically calculates confidence intervals and recommends winning variants.
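The two core mechanics, deterministic traffic splitting and a significance check, can be sketched as below. This is a generic illustration of how such systems commonly work (a per-user hash bucket plus a two-proportion z-test), not M3 Forge's internal algorithm.

```python
import math
import random

def assign_variant(user_id: str, split: dict[str, float]) -> str:
    """Deterministically bucket a user into a variant by traffic split."""
    r = random.Random(user_id).random()  # stable per user across requests
    cumulative = 0.0
    for variant, share in split.items():
        cumulative += share
        if r < cumulative:
            return variant
    return variant  # last variant absorbs floating-point rounding error

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z-statistic comparing success rates of variants A and B."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

Seeding on the user ID keeps each user in the same variant for the life of the experiment; a |z| above roughly 1.96 corresponds to significance at the 95% level.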
Quick Start
Common Workflows
Testing a New Prompt
- Navigate to your prompt repository
- Select a prompt file from the file tree
- Click “Test in Playground”
- Choose a model and fill variable values
- Run and review the response
Comparing Branches
- Open a prompt in the editor
- Click “Compare” in the toolbar
- Select a branch to compare against
- View side-by-side diff with highlighted changes
- Copy content from either version
Creating an Experiment
- Navigate to Experiments section
- Click “New A/B Test”
- Select the prompt to test
- Define variants (different versions or parameters)
- Set traffic split and run size
- Launch experiment and monitor results
Integration with Workflows
Prompts tested in M3 Forge can be deployed directly to workflows:
- Select a prompt from the repository
- Reference it in a workflow node by path
- Variables map automatically to workflow inputs
- Changes to the prompt propagate to all workflows using it
This enables prompt engineering as a separate discipline from workflow design.
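As a sketch, referencing a repository prompt from a workflow node might look like the following. This YAML fragment is purely illustrative: the field names and structure are assumptions for the sake of example, not M3 Forge's actual node schema.

```yaml
# Hypothetical workflow-node definition (illustrative field names)
- id: support_reply
  type: llm
  prompt_ref: prompts/support/professional-response.txt  # path into the prompt repository
  inputs:
    customer_name: "{{ trigger.customer.name }}"         # workflow inputs map onto
    issue_description: "{{ trigger.ticket.body }}"       # the prompt's variables
    priority_level: "{{ trigger.ticket.priority }}"
```

Because the node references the prompt by path rather than embedding its text, committing a new prompt version updates every workflow that uses it.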