Traces
Inspect individual LLM calls with full request/response details, timing breakdowns, and debugging context.
Overview
Traces provide granular visibility into every LLM invocation in M3 Forge. Each trace captures the complete lifecycle of an AI model call, from request construction through response processing.
Traces are the lowest level of observability, enabling you to:
- Debug failures by inspecting exact prompts and error messages
- Replay requests to reproduce issues in development
- Analyze latency with detailed timing breakdowns
- Audit usage by reviewing what data was sent to external LLMs
Trace Detail View
Navigate to a trace by clicking any row in the LLM Observability table, or visit /llm-observability/traces/:traceId directly.

Trace Header
The header displays key metadata:
- Trace ID - Unique identifier (UUID) for correlation with logs
- Timestamp - When the LLM call was initiated
- Status - success, error, or timeout
- Duration - Total time from request to response
- Cost - Calculated cost based on token usage and model pricing
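Since cost is derived from token counts and per-model pricing, it can be reproduced from a trace's usage data. A minimal sketch of that calculation (the pricing table and rates below are illustrative assumptions, not M3 Forge's configured values):

```typescript
// Illustrative per-1K-token rates in USD; actual pricing is configured per model/gateway.
const PRICING: Record<string, { prompt: number; completion: number }> = {
  'gpt-4-turbo': { prompt: 0.01, completion: 0.03 },
};

// Compute trace cost from token usage and the model's rates.
function traceCost(model: string, promptTokens: number, completionTokens: number): number {
  const rates = PRICING[model];
  if (!rates) throw new Error(`No pricing configured for model: ${model}`);
  return (promptTokens / 1000) * rates.prompt + (completionTokens / 1000) * rates.completion;
}
```

With the token counts from the example response on this page (87 prompt, 32 completion), this yields a cost of roughly $0.00183.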
Request Details
The request panel shows:
```json
{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that extracts structured data from documents."
    },
    {
      "role": "user",
      "content": "Extract the invoice number, date, and total from:\n\nINVOICE #12345\nDate: 2024-03-19\nTotal: $1,234.56"
    }
  ],
  "temperature": 0.1,
  "max_tokens": 500,
  "stream": false
}
```

This includes:
- Model configuration - Temperature, max tokens, top-p, presence penalty
- System prompt - Instruction context sent to the LLM
- User prompt - Actual query or document content
- Function definitions - If using function calling (OpenAI) or tools (Anthropic)
System prompts and user inputs are masked according to PII detection rules. See Privacy and Security for masking configuration.
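As an illustration of the kind of transformation such masking rules perform, here is a minimal sketch that redacts email addresses before display (the pattern and placeholder are assumptions for illustration; actual behavior is governed by the configured PII detection rules):

```typescript
// Replace anything that looks like an email address with a placeholder token.
function maskEmails(text: string): string {
  return text.replace(/[\w.+-]+@[\w-]+\.[\w.-]+/g, '[EMAIL]');
}
```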
Response Details
The response panel shows:
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710864000,
  "model": "gpt-4-turbo-2024-04-09",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"invoice_number\": \"12345\", \"date\": \"2024-03-19\", \"total\": 1234.56}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 87,
    "completion_tokens": 32,
    "total_tokens": 119
  }
}
```

This includes:
- Generated content - LLM’s response text or structured output
- Finish reason - stop (natural completion), length (max tokens hit), content_filter, or function_call
- Token usage - Exact counts for billing verification
- Model version - Full identifier including snapshot date

Timing Breakdown
The timing panel visualizes latency with a waterfall chart:
| Phase | Duration | Description |
|---|---|---|
| Queue | 23ms | Time spent waiting for gateway availability |
| Network | 145ms | Round-trip latency to provider API |
| TTFT | 1,234ms | Time to first token (streaming) |
| Generation | 3,567ms | Full response generation time |
| Total | 4,969ms | End-to-end duration |
Hover over each phase in the waterfall to see exact timestamps and durations.
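As presented here, the phases are sequential, so the end-to-end duration is simply their sum. A quick check with the values from the table above:

```typescript
// Phase durations from the waterfall above, in milliseconds.
const phases = { queue: 23, network: 145, ttft: 1234, generation: 3567 };

// Sum the phases to recover the reported end-to-end duration.
const totalMs = Object.values(phases).reduce((sum, ms) => sum + ms, 0);
// totalMs === 4969
```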
Workflow Context
If the trace is part of a workflow execution, the context panel shows:
- Workflow name - Link to DAG definition
- Run ID - Link to full execution logs
- Node ID - Which Prompt node triggered this LLM call
- Input data - JSONPath-accessible context passed to the node
- Output data - Result returned from the LLM call
This allows you to see the full execution flow and understand how this trace fits into the larger pipeline.
Metadata
Additional metadata includes:
- Gateway - Which Marie-AI gateway processed the request
- User ID - Who initiated the workflow or prompt execution
- Tags - Custom key-value pairs for categorization
- Cache status - Whether the response was served from cache (if applicable)
Trace Collection
Automatic Instrumentation
M3 Forge automatically captures traces for:
- Prompt nodes in workflows
- Agent LLM calls during query plan execution
- Guardrail evaluations that use LLM-based validation
- Manual prompt testing in the Prompt Playground
No code changes or SDK integration is required. Traces are collected by the Marie-AI gateway layer before requests are sent to external providers.
Custom Tracing
For custom integrations, you can create traces programmatically:
tRPC

```typescript
import { trpc } from '@/lib/trpc';

const trace = await trpc.tracking.createLlmTrace.mutate({
  model: 'gpt-4-turbo',
  request: {
    messages: [{ role: 'user', content: 'Hello' }],
    temperature: 0.7,
  },
  response: {
    content: 'Hi there!',
    usage: { prompt_tokens: 10, completion_tokens: 5 },
  },
  latency_ms: 1234,
  workflow_id: 'my-workflow',
  node_id: 'prompt-node-1',
});
```

Request Replay
Replay a trace to reproduce issues or test prompt changes:
Open trace detail
Navigate to the trace you want to replay.
Click “Replay Request”
This copies the exact request configuration to the Prompt Playground.
Modify if needed
Adjust system prompt, temperature, or other parameters to test variations.
Execute
Run the modified request and compare results with the original trace.
Replay is useful for:
- Debugging non-deterministic failures - Rerun with same prompt to see if issue reproduces
- Testing prompt improvements - Modify system prompt and compare quality
- Validating gateway changes - Ensure new gateway version produces consistent results
- Creating test cases - Export trace as unit test fixture
Span Analysis
Traces can include multiple spans for complex operations:
Span Hierarchy
```
Workflow Execution (root span)
├── Prompt Node 1 (span)
│   └── LLM Call to GPT-4 (trace)
├── Guardrail Evaluation (span)
│   └── LLM Call to Claude (trace)
└── Prompt Node 2 (span)
    └── LLM Call to Gemini (trace)
```

Each span has:
- Parent span ID - To reconstruct the tree
- Start and end timestamps - For duration calculation
- Attributes - Key-value metadata specific to that span
- Events - Notable occurrences during execution (cache hit, retry, timeout)
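The parent span IDs are sufficient to rebuild the hierarchy from a flat list of spans. A minimal sketch (the span shape here is an assumption for illustration, not M3 Forge's exact schema):

```typescript
interface Span {
  id: string;
  parentId: string | null; // null marks the root span
  name: string;
}

interface SpanNode extends Span {
  children: SpanNode[];
}

// Reconstruct the span tree by linking each span to its parent by ID.
function buildSpanTree(spans: Span[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>(
    spans.map((s): [string, SpanNode] => [s.id, { ...s, children: [] }])
  );
  const roots: SpanNode[] = [];
  for (const node of nodes.values()) {
    const parent = node.parentId ? nodes.get(node.parentId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  return roots;
}
```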
Viewing Spans
In the trace detail view, expand the “Spans” section to see the full hierarchy. Click on any span to view:
- Attributes - All metadata associated with the span
- Events - Timeline of events within the span
- Duration - Time spent in this span vs. children
This is particularly useful for debugging complex workflows where multiple LLM calls are chained together.
Searching Traces
Use advanced search to find specific traces:
Full-Text Search
Search across all trace content:
```
invoice extraction
```

This searches:
- System prompts
- User prompts
- LLM responses
- Error messages
- Metadata tags
JSONPath Queries
Query nested fields with JSONPath syntax:
```
$.request.messages[?(@.role == 'system')].content contains "extract"
```

Regular Expressions
Use regex for pattern matching:
```
workflow_id matches "^prod-.*-v2$"
```

Saved Searches
Save frequently used queries as presets:
- Failed GPT-4 calls - status:error AND model:gpt-4*
- Expensive traces - cost > 0.50
- Slow TTFT - ttft_ms > 5000
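Each preset reads as a simple predicate over trace fields. A sketch of how the "Failed GPT-4 calls" filter might be evaluated client-side (the trace summary shape is an assumption for illustration):

```typescript
interface TraceSummary {
  status: 'success' | 'error' | 'timeout';
  model: string;
  cost: number;
  ttft_ms: number;
}

// Equivalent to the saved search: status:error AND model:gpt-4*
function isFailedGpt4Call(t: TraceSummary): boolean {
  return t.status === 'error' && t.model.startsWith('gpt-4');
}
```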
Debugging Failures
When a trace shows status: error, the error panel displays:
Error Details
```json
{
  "error": {
    "type": "RateLimitError",
    "message": "Rate limit exceeded for gpt-4-turbo in organization org-xyz",
    "code": "rate_limit_exceeded",
    "param": null,
    "http_status": 429
  }
}
```

Common Error Types
| Error Type | Cause | Resolution |
|---|---|---|
| RateLimitError | Too many requests to provider | Implement backoff, request quota increase |
| ContentFilterError | Input or output violated content policy | Review prompt, add content guardrails |
| TimeoutError | Request exceeded configured timeout | Increase timeout, optimize prompt length |
| AuthenticationError | Invalid API key | Verify gateway configuration |
| InvalidRequestError | Malformed request payload | Check function definitions, message format |
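A client reacting to these errors can branch on the error type. A sketch of such a mapping (the retryability choices mirror the resolutions above and are illustrative, not a prescribed policy):

```typescript
type LlmErrorType =
  | 'RateLimitError'
  | 'ContentFilterError'
  | 'TimeoutError'
  | 'AuthenticationError'
  | 'InvalidRequestError';

// Transient failures are worth retrying with backoff; the others need a
// configuration or prompt fix before retrying makes sense.
function isRetryable(error: LlmErrorType): boolean {
  return error === 'RateLimitError' || error === 'TimeoutError';
}
```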
Retry Information
If the LLM call was retried, the trace shows:
- Retry count - How many times the request was attempted
- Backoff strategy - Exponential, linear, or fixed delay
- Final outcome - Success after retry, or permanent failure
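As a reference for how the exponential strategy typically behaves, here is a sketch of the delay computation (the base delay and cap are illustrative defaults, not M3 Forge's configured values):

```typescript
// Exponential backoff: the delay doubles with each attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// attempt 0 → 500ms, attempt 1 → 1000ms, attempt 2 → 2000ms, ... capped at 30s
```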
Request/Response Inspection
Syntax Highlighting
Request and response payloads are displayed with JSON syntax highlighting for readability.
Diff Mode
Compare two traces side-by-side to understand differences:
Select first trace
Click “Compare” in the trace detail header.
Select second trace
Choose another trace from the list or enter a trace ID.
View diff
See highlighted differences in request configuration and response content.
Useful for:
- Comparing successful vs. failed calls
- Analyzing impact of prompt changes
- Debugging non-deterministic outputs
Copy Utilities
Quick-copy buttons for:
- Request JSON - Copy to clipboard for testing
- Response text - Copy LLM output
- Trace ID - Copy for sharing with team
- cURL command - Replay request from terminal
Next Steps
- Review LLM Observability for aggregated cost and usage analytics
- Set up SLA Monitoring to alert on trace anomalies
- Explore Insights for workflow-level performance metrics