Traces

Inspect individual LLM calls with full request/response details, timing breakdowns, and debugging context.

Overview

Traces provide granular visibility into every LLM invocation in M3 Forge. Each trace captures the complete lifecycle of an AI model call, from request construction through response processing.

Traces are the lowest level of observability, enabling you to:

  • Debug failures by inspecting exact prompts and error messages
  • Replay requests to reproduce issues in development
  • Analyze latency with detailed timing breakdowns
  • Audit usage by reviewing what data was sent to external LLMs

Trace Detail View

Navigate to a trace by clicking any row in the LLM Observability table, or visit /llm-observability/traces/:traceId directly.

Trace list view showing LLM calls with model, tokens, latency, and status columns

Trace Header

The header displays key metadata:

  • Trace ID - Unique identifier (UUID) for correlation with logs
  • Timestamp - When the LLM call was initiated
  • Status - success, error, or timeout
  • Duration - Total time from request to response
  • Cost - Calculated cost based on token usage and model pricing
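
As a rough illustration of how token-based cost works, the sketch below multiplies token counts by per-1K-token rates. The rates here are placeholders, not M3 Forge's actual pricing table; real pricing depends on the model and provider in use:

```typescript
// Hypothetical per-1K-token rates for illustration only.
const PRICING: Record<string, { prompt: number; completion: number }> = {
  'gpt-4-turbo': { prompt: 0.01, completion: 0.03 },
};

function estimateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number },
): number {
  const rate = PRICING[model];
  if (!rate) throw new Error(`No pricing entry for model: ${model}`);
  return (
    (usage.prompt_tokens / 1000) * rate.prompt +
    (usage.completion_tokens / 1000) * rate.completion
  );
}

// Using the token counts from the example response later on this page
// (87 prompt tokens, 32 completion tokens):
estimateCost('gpt-4-turbo', { prompt_tokens: 87, completion_tokens: 32 });
```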

Request Details

The request panel shows:

```json
{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that extracts structured data from documents."
    },
    {
      "role": "user",
      "content": "Extract the invoice number, date, and total from:\n\nINVOICE #12345\nDate: 2024-03-19\nTotal: $1,234.56"
    }
  ],
  "temperature": 0.1,
  "max_tokens": 500,
  "stream": false
}
```

This includes:

  • Model configuration - Temperature, max tokens, top-p, presence penalty
  • System prompt - Instruction context sent to the LLM
  • User prompt - Actual query or document content
  • Function definitions - If using function calling (OpenAI) or tools (Anthropic)

System prompts and user inputs are masked according to PII detection rules. See Privacy and Security for masking configuration.

Response Details

The response panel shows:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710864000,
  "model": "gpt-4-turbo-2024-04-09",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"invoice_number\": \"12345\", \"date\": \"2024-03-19\", \"total\": 1234.56}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 87,
    "completion_tokens": 32,
    "total_tokens": 119
  }
}
```

This includes:

  • Generated content - LLM’s response text or structured output
  • Finish reason - stop (natural completion), length (max tokens hit), content_filter, or function_call
  • Token usage - Exact counts for billing verification
  • Model version - Full identifier including snapshot date

Trace detail view showing full request/response, timing breakdown, and token usage for a single LLM call

Timing Breakdown

The timing panel visualizes latency with a waterfall chart:

| Phase | Duration | Description |
| --- | --- | --- |
| Queue | 23 ms | Time spent waiting for gateway availability |
| Network | 145 ms | Round-trip latency to provider API |
| TTFT | 1,234 ms | Time to first token (streaming) |
| Generation | 3,567 ms | Full response generation time |
| Total | 4,969 ms | End-to-end duration |

Hover over each phase in the waterfall to see exact timestamps and durations.
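
In the example above the phases are additive: 23 + 145 + 1,234 + 3,567 = 4,969 ms. A minimal sketch of that arithmetic, with illustrative field names:

```typescript
// Illustrative shape for the timing panel; field names are assumptions.
interface TimingBreakdown {
  queue_ms: number;
  network_ms: number;
  ttft_ms: number;
  generation_ms: number;
}

function totalDuration(t: TimingBreakdown): number {
  return t.queue_ms + t.network_ms + t.ttft_ms + t.generation_ms;
}

// The example trace above: 23 + 145 + 1234 + 3567 = 4969 ms.
totalDuration({ queue_ms: 23, network_ms: 145, ttft_ms: 1234, generation_ms: 3567 });
```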

Workflow Context

If the trace is part of a workflow execution, the context panel shows:

  • Workflow name - Link to DAG definition
  • Run ID - Link to full execution logs
  • Node ID - Which Prompt node triggered this LLM call
  • Input data - JSONPath-accessible context passed to the node
  • Output data - Result returned from the LLM call

This allows you to see the full execution flow and understand how this trace fits into the larger pipeline.

Metadata

Additional metadata includes:

  • Gateway - Which Marie-AI gateway processed the request
  • User ID - Who initiated the workflow or prompt execution
  • Tags - Custom key-value pairs for categorization
  • Cache status - Whether the response was served from cache (if applicable)

Trace Collection

Automatic Instrumentation

M3 Forge automatically captures traces for:

  • Prompt nodes in workflows
  • Agent LLM calls during query plan execution
  • Guardrail evaluations that use LLM-based validation
  • Manual prompt testing in the Prompt Playground

No code changes or SDK integration is required. Traces are collected by the Marie-AI gateway layer before requests are sent to external providers.

Custom Tracing

For custom integrations, you can create traces programmatically:

```typescript
import { trpc } from '@/lib/trpc';

const trace = await trpc.tracking.createLlmTrace.mutate({
  model: 'gpt-4-turbo',
  request: {
    messages: [{ role: 'user', content: 'Hello' }],
    temperature: 0.7,
  },
  response: {
    content: 'Hi there!',
    usage: { prompt_tokens: 10, completion_tokens: 5 },
  },
  latency_ms: 1234,
  workflow_id: 'my-workflow',
  node_id: 'prompt-node-1',
});
```

Request Replay

Replay a trace to reproduce issues or test prompt changes:

Open trace detail

Navigate to the trace you want to replay.

Click “Replay Request”

This copies the exact request configuration to the Prompt Playground.

Modify if needed

Adjust system prompt, temperature, or other parameters to test variations.

Execute

Run the modified request and compare results with the original trace.

Replay is useful for:

  • Debugging non-deterministic failures - Rerun with same prompt to see if issue reproduces
  • Testing prompt improvements - Modify system prompt and compare quality
  • Validating gateway changes - Ensure new gateway version produces consistent results
  • Creating test cases - Export trace as unit test fixture
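
The last use case can be sketched as a small helper that turns a captured trace into a fixture object for a test suite. The shape below is illustrative, not a documented M3 Forge export format:

```typescript
// Hypothetical fixture shape for replay-based tests.
interface TraceFixture {
  name: string;
  request: { model: string; messages: { role: string; content: string }[] };
  expected: { content: string; finish_reason: string };
}

function toFixture(
  traceId: string,
  trace: {
    request: { model: string; messages: { role: string; content: string }[] };
    response: { content: string; finish_reason: string };
  },
): TraceFixture {
  return {
    name: `replay-${traceId}`,
    request: trace.request,
    expected: trace.response,
  };
}
```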

Span Analysis

Traces can include multiple spans for complex operations:

Span Hierarchy

```
Workflow Execution (root span)
├── Prompt Node 1 (span)
│   └── LLM Call to GPT-4 (trace)
├── Guardrail Evaluation (span)
│   └── LLM Call to Claude (trace)
└── Prompt Node 2 (span)
    └── LLM Call to Gemini (trace)
```

Each span has:

  • Parent span ID - To reconstruct the tree
  • Start and end timestamps - For duration calculation
  • Attributes - Key-value metadata specific to that span
  • Events - Notable occurrences during execution (cache hit, retry, timeout)
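
Assuming spans arrive as a flat list with parent pointers (field names below are illustrative), the hierarchy and per-span self time can be reconstructed like this:

```typescript
// Illustrative span record; actual field names may differ.
interface Span {
  id: string;
  parent_id: string | null;
  name: string;
  start_ms: number;
  end_ms: number;
}

// Children of a given span, ordered by start time, for top-down rendering.
function childrenOf(spans: Span[], parentId: string | null): Span[] {
  return spans
    .filter((s) => s.parent_id === parentId)
    .sort((a, b) => a.start_ms - b.start_ms);
}

// "Self time": the span's duration minus time covered by its children.
function selfTime(span: Span, spans: Span[]): number {
  const childTotal = childrenOf(spans, span.id).reduce(
    (sum, c) => sum + (c.end_ms - c.start_ms),
    0,
  );
  return span.end_ms - span.start_ms - childTotal;
}
```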

Viewing Spans

In the trace detail view, expand the “Spans” section to see the full hierarchy. Click on any span to view:

  • Attributes - All metadata associated with the span
  • Events - Timeline of events within the span
  • Duration - Time spent in this span vs. children

This is particularly useful for debugging complex workflows where multiple LLM calls are chained together.

Searching Traces

Use advanced search to find specific traces:

Full-Text Search

Search across all trace content:

invoice extraction

This searches:

  • System prompts
  • User prompts
  • LLM responses
  • Error messages
  • Metadata tags

JSONPath Queries

Query nested fields with JSONPath syntax:

$.request.messages[?(@.role == 'system')].content contains "extract"
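
The query above selects system messages whose content contains "extract". Its semantics can be approximated in plain TypeScript, without a JSONPath engine:

```typescript
interface Message {
  role: string;
  content: string;
}

// Equivalent filter: any system message whose content contains "extract".
function matchesQuery(request: { messages: Message[] }): boolean {
  return request.messages.some(
    (m) => m.role === 'system' && m.content.includes('extract'),
  );
}
```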

Regular Expressions

Use regex for pattern matching:

workflow_id matches "^prod-.*-v2$"

Saved Searches

Save frequently used queries as presets:

  • Failed GPT-4 calls - status:error AND model:gpt-4*
  • Expensive traces - cost > 0.50
  • Slow TTFT - ttft_ms > 5000
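
Conceptually, each preset is a predicate over trace records. A client-side sketch of the three presets above, assuming a simplified trace shape:

```typescript
// Simplified trace summary; field names are assumptions for illustration.
interface TraceSummary {
  status: 'success' | 'error' | 'timeout';
  model: string;
  cost: number;
  ttft_ms: number;
}

const savedSearches: Record<string, (t: TraceSummary) => boolean> = {
  // status:error AND model:gpt-4*
  'Failed GPT-4 calls': (t) => t.status === 'error' && t.model.startsWith('gpt-4'),
  // cost > 0.50
  'Expensive traces': (t) => t.cost > 0.5,
  // ttft_ms > 5000
  'Slow TTFT': (t) => t.ttft_ms > 5000,
};
```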

Debugging Failures

When a trace shows status: error, the error panel displays:

Error Details

```json
{
  "error": {
    "type": "RateLimitError",
    "message": "Rate limit exceeded for gpt-4-turbo in organization org-xyz",
    "code": "rate_limit_exceeded",
    "param": null,
    "http_status": 429
  }
}
```

Common Error Types

| Error Type | Cause | Resolution |
| --- | --- | --- |
| RateLimitError | Too many requests to provider | Implement backoff, request quota increase |
| ContentFilterError | Input or output violated content policy | Review prompt, add content guardrails |
| TimeoutError | Request exceeded configured timeout | Increase timeout, optimize prompt length |
| AuthenticationError | Invalid API key | Verify gateway configuration |
| InvalidRequestError | Malformed request payload | Check function definitions, message format |

Retry Information

If the LLM call was retried, the trace shows:

  • Retry count - How many times the request was attempted
  • Backoff strategy - Exponential, linear, or fixed delay
  • Final outcome - Success after retry, or permanent failure
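
An exponential backoff loop of the kind described above can be sketched as follows. The delays and attempt counts here are illustrative, not gateway defaults:

```typescript
// Exponential backoff delay for a 0-based attempt number: 500, 1000, 2000, ...
function backoffDelay(attempt: number, baseDelayMs = 500): number {
  return baseDelayMs * 2 ** attempt;
}

// Retry an async call, waiting between attempts; rethrows the last error
// if every attempt fails.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
      }
    }
  }
  throw lastError;
}
```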

Request/Response Inspection

Syntax Highlighting

Request and response payloads are displayed with JSON syntax highlighting for readability.

Diff Mode

Compare two traces side-by-side to understand differences:

Select first trace

Click “Compare” in the trace detail header.

Select second trace

Choose another trace from the list or enter a trace ID.

View diff

See highlighted differences in request configuration and response content.

Useful for:

  • Comparing successful vs. failed calls
  • Analyzing impact of prompt changes
  • Debugging non-deterministic outputs

Copy Utilities

Quick-copy buttons for:

  • Request JSON - Copy to clipboard for testing
  • Response text - Copy LLM output
  • Trace ID - Copy for sharing with team
  • cURL command - Replay request from terminal
