Traces

Inspect individual LLM calls with full request/response details, timing breakdowns, and debugging context.

Overview

Traces provide granular visibility into every LLM invocation in M3 Forge. Each trace captures the complete lifecycle of an AI model call, from request construction through response processing.

Traces are the lowest level of observability, enabling you to:

  • Debug failures by inspecting exact prompts and error messages
  • Replay requests to reproduce issues in development
  • Analyze latency with detailed timing breakdowns
  • Audit usage by reviewing what data was sent to external LLMs

Trace Detail View

Navigate to a trace by clicking any row in the LLM Observability table, or visit /llm-observability/traces/:traceId directly.

Trace list view showing LLM calls with model, tokens, latency, and status columns

Trace Header

The header displays key metadata:

  • Trace ID - Unique identifier (UUID) for correlation with logs
  • Timestamp - When the LLM call was initiated
  • Status - success, error, or timeout
  • Duration - Total time from request to response
  • Cost - Calculated cost based on token usage and model pricing
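
As a rough illustration of how token-based cost works, the sketch below multiplies token counts by per-1K-token rates. The rates here are placeholders, not M3 Forge's actual pricing table; real pricing depends on the model and provider in use:

```typescript
// Hypothetical per-1K-token rates for illustration only.
const PRICING: Record<string, { prompt: number; completion: number }> = {
  'gpt-4-turbo': { prompt: 0.01, completion: 0.03 },
};

function estimateCost(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number },
): number {
  const rate = PRICING[model];
  if (!rate) throw new Error(`No pricing entry for model: ${model}`);
  return (
    (usage.prompt_tokens / 1000) * rate.prompt +
    (usage.completion_tokens / 1000) * rate.completion
  );
}

// Using the token counts from the example response later on this page
// (87 prompt tokens, 32 completion tokens):
estimateCost('gpt-4-turbo', { prompt_tokens: 87, completion_tokens: 32 });
```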

Request Details

The request panel shows:

```json
{
  "model": "gpt-4-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that extracts structured data from documents."
    },
    {
      "role": "user",
      "content": "Extract the invoice number, date, and total from:\n\nINVOICE #12345\nDate: 2024-03-19\nTotal: $1,234.56"
    }
  ],
  "temperature": 0.1,
  "max_tokens": 500,
  "stream": false
}
```

This includes:

  • Model configuration - Temperature, max tokens, top-p, presence penalty
  • System prompt - Instruction context sent to the LLM
  • User prompt - Actual query or document content
  • Function definitions - If using function calling (OpenAI) or tools (Anthropic)

System prompts and user inputs are masked according to PII detection rules. See Privacy and Security for masking configuration.

Response Details

The response panel shows:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710864000,
  "model": "gpt-4-turbo-2024-04-09",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "{\"invoice_number\": \"12345\", \"date\": \"2024-03-19\", \"total\": 1234.56}"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 87,
    "completion_tokens": 32,
    "total_tokens": 119
  }
}
```

This includes:

  • Generated content - LLM’s response text or structured output
  • Finish reason - stop (natural completion), length (max tokens hit), content_filter, or function_call
  • Token usage - Exact counts for billing verification
  • Model version - Full identifier including snapshot date

Trace detail view showing full request/response, timing breakdown, and token usage for a single LLM call

Timing Breakdown

The timing panel visualizes latency with a waterfall chart:

| Phase | Duration | Description |
| --- | --- | --- |
| Queue | 23 ms | Time spent waiting for gateway availability |
| Network | 145 ms | Round-trip latency to provider API |
| TTFT | 1,234 ms | Time to first token (streaming) |
| Generation | 3,567 ms | Full response generation time |
| Total | 4,969 ms | End-to-end duration |

Hover over each phase in the waterfall to see exact timestamps and durations.
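
In the example above the phases are additive: 23 + 145 + 1,234 + 3,567 = 4,969 ms. A minimal sketch of that arithmetic, with illustrative field names:

```typescript
// Illustrative shape for the timing panel; field names are assumptions.
interface TimingBreakdown {
  queue_ms: number;
  network_ms: number;
  ttft_ms: number;
  generation_ms: number;
}

function totalDuration(t: TimingBreakdown): number {
  return t.queue_ms + t.network_ms + t.ttft_ms + t.generation_ms;
}

// The example trace above: 23 + 145 + 1234 + 3567 = 4969 ms.
totalDuration({ queue_ms: 23, network_ms: 145, ttft_ms: 1234, generation_ms: 3567 });
```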

Workflow Context

If the trace is part of a workflow execution, the context panel shows:

  • Workflow name - Link to DAG definition
  • Run ID - Link to full execution logs
  • Node ID - Which Prompt node triggered this LLM call
  • Input data - JSONPath-accessible context passed to the node
  • Output data - Result returned from the LLM call

This allows you to see the full execution flow and understand how this trace fits into the larger pipeline.

Metadata

Additional metadata includes:

  • Gateway - Which Marie-AI gateway processed the request
  • User ID - Who initiated the workflow or prompt execution
  • Tags - Custom key-value pairs for categorization
  • Cache status - Whether the response was served from cache (if applicable)

Trace Collection

Automatic Instrumentation

M3 Forge automatically captures traces for:

  • Prompt nodes in workflows
  • Agent LLM calls during query plan execution
  • Guardrail evaluations that use LLM-based validation
  • Manual prompt testing in the Prompt Playground

No code changes or SDK integration is required. Traces are collected by the Marie-AI gateway layer before requests are sent to external providers.

Custom Tracing

For custom integrations, you can create traces programmatically:

```typescript
import { trpc } from '@/lib/trpc';

const trace = await trpc.tracking.createLlmTrace.mutate({
  model: 'gpt-4-turbo',
  request: {
    messages: [{ role: 'user', content: 'Hello' }],
    temperature: 0.7,
  },
  response: {
    content: 'Hi there!',
    usage: { prompt_tokens: 10, completion_tokens: 5 },
  },
  latency_ms: 1234,
  workflow_id: 'my-workflow',
  node_id: 'prompt-node-1',
});
```

Request Replay

Replay a trace to reproduce issues or test prompt changes:

Open trace detail

Navigate to the trace you want to replay.

Click “Replay Request”

This copies the exact request configuration to the Prompt Playground.

Modify if needed

Adjust system prompt, temperature, or other parameters to test variations.

Execute

Run the modified request and compare results with the original trace.

Replay is useful for:

  • Debugging non-deterministic failures - Rerun with same prompt to see if issue reproduces
  • Testing prompt improvements - Modify system prompt and compare quality
  • Validating gateway changes - Ensure new gateway version produces consistent results
  • Creating test cases - Export trace as unit test fixture
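
The last use case can be sketched as a small helper that turns a captured trace into a fixture object for a test suite. The shape below is illustrative, not a documented M3 Forge export format:

```typescript
// Hypothetical fixture shape for replay-based tests.
interface TraceFixture {
  name: string;
  request: { model: string; messages: { role: string; content: string }[] };
  expected: { content: string; finish_reason: string };
}

function toFixture(
  traceId: string,
  trace: {
    request: { model: string; messages: { role: string; content: string }[] };
    response: { content: string; finish_reason: string };
  },
): TraceFixture {
  return {
    name: `replay-${traceId}`,
    request: trace.request,
    expected: trace.response,
  };
}
```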

Span Analysis

Traces can include multiple spans for complex operations:

Span Hierarchy

```
Workflow Execution (root span)
├── Prompt Node 1 (span)
│   └── LLM Call to GPT-4 (trace)
├── Guardrail Evaluation (span)
│   └── LLM Call to Claude (trace)
└── Prompt Node 2 (span)
    └── LLM Call to Gemini (trace)
```

Each span has:

  • Parent span ID - To reconstruct the tree
  • Start and end timestamps - For duration calculation
  • Attributes - Key-value metadata specific to that span
  • Events - Notable occurrences during execution (cache hit, retry, timeout)
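
Assuming spans arrive as a flat list with parent pointers (field names below are illustrative), the hierarchy and per-span self time can be reconstructed like this:

```typescript
// Illustrative span record; actual field names may differ.
interface Span {
  id: string;
  parent_id: string | null;
  name: string;
  start_ms: number;
  end_ms: number;
}

// Children of a given span, ordered by start time, for top-down rendering.
function childrenOf(spans: Span[], parentId: string | null): Span[] {
  return spans
    .filter((s) => s.parent_id === parentId)
    .sort((a, b) => a.start_ms - b.start_ms);
}

// "Self time": the span's duration minus time covered by its children.
function selfTime(span: Span, spans: Span[]): number {
  const childTotal = childrenOf(spans, span.id).reduce(
    (sum, c) => sum + (c.end_ms - c.start_ms),
    0,
  );
  return span.end_ms - span.start_ms - childTotal;
}
```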

Viewing Spans

In the trace detail view, expand the “Spans” section to see the full hierarchy. Click on any span to view:

  • Attributes - All metadata associated with the span
  • Events - Timeline of events within the span
  • Duration - Time spent in this span vs. children

This is particularly useful for debugging complex workflows where multiple LLM calls are chained together.

Searching Traces

Use advanced search to find specific traces:

Full-Text Search

Search across all trace content:

invoice extraction

This searches:

  • System prompts
  • User prompts
  • LLM responses
  • Error messages
  • Metadata tags

JSONPath Queries

Query nested fields with JSONPath syntax:

$.request.messages[?(@.role == 'system')].content contains "extract"
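
The query above selects system messages whose content contains "extract". Its semantics can be approximated in plain TypeScript, without a JSONPath engine:

```typescript
interface Message {
  role: string;
  content: string;
}

// Equivalent filter: any system message whose content contains "extract".
function matchesQuery(request: { messages: Message[] }): boolean {
  return request.messages.some(
    (m) => m.role === 'system' && m.content.includes('extract'),
  );
}
```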

Regular Expressions

Use regex for pattern matching:

workflow_id matches "^prod-.*-v2$"

Saved Searches

Save frequently used queries as presets:

  • Failed GPT-4 calls - status:error AND model:gpt-4*
  • Expensive traces - cost > 0.50
  • Slow TTFT - ttft_ms > 5000
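
Conceptually, each preset is a predicate over trace records. A client-side sketch of the three presets above, assuming a simplified trace shape:

```typescript
// Simplified trace summary; field names are assumptions for illustration.
interface TraceSummary {
  status: 'success' | 'error' | 'timeout';
  model: string;
  cost: number;
  ttft_ms: number;
}

const savedSearches: Record<string, (t: TraceSummary) => boolean> = {
  // status:error AND model:gpt-4*
  'Failed GPT-4 calls': (t) => t.status === 'error' && t.model.startsWith('gpt-4'),
  // cost > 0.50
  'Expensive traces': (t) => t.cost > 0.5,
  // ttft_ms > 5000
  'Slow TTFT': (t) => t.ttft_ms > 5000,
};
```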

Debugging Failures

When a trace shows status: error, the error panel displays:

Error Details

```json
{
  "error": {
    "type": "RateLimitError",
    "message": "Rate limit exceeded for gpt-4-turbo in organization org-xyz",
    "code": "rate_limit_exceeded",
    "param": null,
    "http_status": 429
  }
}
```

Common Error Types

| Error Type | Cause | Resolution |
| --- | --- | --- |
| RateLimitError | Too many requests to provider | Implement backoff, request quota increase |
| ContentFilterError | Input or output violated content policy | Review prompt, add content guardrails |
| TimeoutError | Request exceeded configured timeout | Increase timeout, optimize prompt length |
| AuthenticationError | Invalid API key | Verify gateway configuration |
| InvalidRequestError | Malformed request payload | Check function definitions, message format |

Retry Information

If the LLM call was retried, the trace shows:

  • Retry count - How many times the request was attempted
  • Backoff strategy - Exponential, linear, or fixed delay
  • Final outcome - Success after retry, or permanent failure
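
An exponential backoff loop of the kind described above can be sketched as follows. The delays and attempt counts here are illustrative, not gateway defaults:

```typescript
// Exponential backoff delay for a 0-based attempt number: 500, 1000, 2000, ...
function backoffDelay(attempt: number, baseDelayMs = 500): number {
  return baseDelayMs * 2 ** attempt;
}

// Retry an async call, waiting between attempts; rethrows the last error
// if every attempt fails.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
      }
    }
  }
  throw lastError;
}
```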

Request/Response Inspection

Syntax Highlighting

Request and response payloads are displayed with JSON syntax highlighting for readability.

Diff Mode

Compare two traces side-by-side to understand differences:

Select first trace

Click “Compare” in the trace detail header.

Select second trace

Choose another trace from the list or enter a trace ID.

View diff

See highlighted differences in request configuration and response content.

Useful for:

  • Comparing successful vs. failed calls
  • Analyzing impact of prompt changes
  • Debugging non-deterministic outputs

Copy Utilities

Quick-copy buttons for:

  • Request JSON - Copy to clipboard for testing
  • Response text - Copy LLM output
  • Trace ID - Copy for sharing with team
  • cURL command - Replay request from terminal
