Prompts & Testing

M3 Forge provides a comprehensive prompt management system with version control, interactive testing, and A/B experimentation. Manage all your LLM prompts in a centralized repository with Git integration, test them in real-time against multiple models, and optimize performance through systematic experiments.

Why Prompt Management Matters

LLM prompts are critical business logic. As your AI applications scale, you need:

  • Version control for prompt changes
  • Centralized repository for team collaboration
  • Safe testing before production deployment
  • Data-driven optimization through A/B testing
  • Comparison tools to evaluate prompt variants

M3 Forge treats prompts as first-class code artifacts with full lifecycle management.

[Screenshot: Prompts & Testing section showing the prompt list with version indicators and test status]

Key Features

Repository-Based Management

Store prompts in Git repositories with full version control. Every change is tracked, branches enable parallel development, and rollbacks are instant. The built-in file browser provides VS Code-like editing with Monaco editor integration.

Supported workflows:

  • Create and edit prompts with syntax highlighting
  • Branch management for isolated development
  • Commit history with detailed diffs
  • Pull changes from workspace files
  • Export prompts to deployment environments

Prompts sync bidirectionally. Edit in the UI or commit from your IDE — both workflows stay synchronized through Git.

Multi-Provider Testing

Test prompts against multiple LLM providers simultaneously:

  • OpenAI (GPT-4, GPT-3.5)
  • Anthropic (Claude 3 Opus, Sonnet, Haiku)
  • Qwen (Qwen2.5)
  • Hugging Face models
  • Custom endpoints

Variable substitution enables reusable templates. Define variables like {customer_name} or {product_id} and fill them at test time.
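Conceptually, filling a template is a named-placeholder substitution over the curly-brace variables. A minimal Python sketch of the idea (the `fill_template` helper is illustrative, not part of the M3 Forge SDK):

```python
def fill_template(template: str, **values: str) -> str:
    """Replace {name}-style placeholders with the supplied values."""
    return template.format(**values)

prompt = fill_template(
    "Recommend an accessory for {customer_name} who bought {product_id}.",
    customer_name="Ada",
    product_id="SKU-1042",
)
# prompt == "Recommend an accessory for Ada who bought SKU-1042."
```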

Statistical Experimentation

Run controlled A/B tests with traffic splitting, metric tracking, and statistical significance analysis. Compare multiple prompt variants under identical conditions to identify the highest-performing version.

Core Concepts

Prompt Templates

Templates use variable syntax for dynamic content:

You are an expert customer service agent.
Customer: {customer_name}
Issue: {issue_description}
Priority: {priority_level}
Provide a professional response.

Variables are auto-detected from curly brace syntax and presented as fillable fields in the Playground.
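Auto-detection amounts to scanning the template for `{name}` tokens. A minimal sketch of that scan (the `detect_variables` helper is illustrative, not M3 Forge's API):

```python
import re

VARIABLE_RE = re.compile(r"\{(\w+)\}")

def detect_variables(template: str) -> list[str]:
    """Return placeholder names in order of first appearance, deduplicated."""
    seen: list[str] = []
    for name in VARIABLE_RE.findall(template):
        if name not in seen:
            seen.append(name)
    return seen

template = (
    "You are an expert customer service agent.\n"
    "Customer: {customer_name}\n"
    "Issue: {issue_description}\n"
    "Priority: {priority_level}\n"
)
detect_variables(template)
# → ["customer_name", "issue_description", "priority_level"]
```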

System vs. Message Prompts

  • System prompts define agent behavior and constraints
  • Message prompts contain the user input or task description

Both support variables and can be tested independently or together.

Experiments and Variants

Experiments define:

  • Variants — Different versions of the same prompt (A, B, C)
  • Traffic split — Percentage allocation across variants
  • Metrics — Quality scores, latency, cost per request
  • Sample size — Number of test runs required for statistical significance

M3 Forge automatically calculates confidence intervals and recommends winning variants.
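The mechanics behind traffic splitting and significance testing can be sketched in a few lines of Python. Both helpers below are illustrative approximations, not M3 Forge's implementation: deterministic hashing buckets each unit into a variant according to the traffic split, and a two-proportion z-test compares success rates between two variants.

```python
import hashlib
import math

def assign_variant(unit_id: str, split: dict[str, float]) -> str:
    """Deterministically bucket a unit into a variant by traffic split.

    split maps variant names to fractions that sum to 1.0, e.g.
    {"A": 0.5, "B": 0.5}.
    """
    digest = hashlib.sha256(unit_id.encode()).digest()
    point = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    cumulative = 0.0
    for variant, share in split.items():
        cumulative += share
        if point < cumulative:
            return variant
    return variant  # fall back to the last variant on rounding error

def z_test(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """Two-proportion z-test: z statistic for p_b - p_a."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# |z| > 1.96 corresponds to significance at the 95% confidence level.
```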

Common Workflows

Testing a New Prompt

  1. Navigate to your prompt repository
  2. Select a prompt file from the file tree
  3. Click “Test in Playground”
  4. Choose a model and fill variable values
  5. Run and review the response

Comparing Branches

  1. Open a prompt in the editor
  2. Click “Compare” in the toolbar
  3. Select a branch to compare against
  4. View side-by-side diff with highlighted changes
  5. Copy content from either version

Creating an Experiment

  1. Navigate to Experiments section
  2. Click “New A/B Test”
  3. Select the prompt to test
  4. Define variants (different versions or parameters)
  5. Set traffic split and run size
  6. Launch experiment and monitor results

Integration with Workflows

Prompts tested in M3 Forge can be deployed directly to workflows:

  • Select a prompt from the repository
  • Reference it in a workflow node by path
  • Variables map automatically to workflow inputs
  • Changes to the prompt propagate to all workflows using it

This enables prompt engineering as a separate discipline from workflow design.
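The variable-mapping step above can be sketched as a small resolver: load a prompt by path, detect its placeholders, and fill them from the workflow's inputs. The `render_prompt_node` function and its error handling are hypothetical, not the actual workflow engine:

```python
import re
from pathlib import Path

def render_prompt_node(prompt_path: str, inputs: dict[str, str]) -> str:
    """Load a prompt file and fill its {variable} placeholders from
    workflow inputs; raise if a required input is missing.

    Illustrative sketch only, not M3 Forge's workflow API.
    """
    template = Path(prompt_path).read_text()
    required = set(re.findall(r"\{(\w+)\}", template))
    missing = required - inputs.keys()
    if missing:
        raise KeyError(f"workflow inputs missing: {sorted(missing)}")
    return template.format(**{k: inputs[k] for k in required})
```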
