Prompt Playground

The Playground provides an interactive environment for testing prompts against live LLM providers with real-time streaming responses, variable substitution, and multi-modal support.

Overview

Test prompts before production deployment. The Playground automatically detects variables in your templates, presents fillable forms, and streams responses from any configured LLM connection. Save successful tests for later comparison or export results for analysis.

Prompt Playground showing template editor with variable inputs on left and streaming LLM response on right

Key Features

Auto-Detected Variables

Variables use curly brace syntax: {variable_name}. The Playground scans your prompt template and generates input fields automatically.

Example:


You are a {role} helping with {task}.

User question: {user_input}
Context: {context}

The UI displays four input fields:

role (e.g., “customer service agent”)
task (e.g., “technical troubleshooting”)
user_input (e.g., “My printer won’t connect”)
context (e.g., “Customer has HP LaserJet Pro”)

Variables support multiple types: text, numbers, JSON objects, and file attachments.

Playground variable panel showing auto-detected template variables with input fields and type selectors

Variable names must be alphanumeric with underscores. Nested variables like {user.name} are not supported — use flat keys like {user_name}.

Model Selection

Choose from all configured LLM connections:

Provider	Models	Features
OpenAI	GPT-4, GPT-4 Turbo, GPT-3.5 Turbo	Vision, streaming, function calling
Anthropic	Claude 3 Opus, Sonnet, Haiku	200K context, vision, artifacts
Qwen	Qwen2.5-72B, Qwen2.5-7B	High-performance Chinese/English
Custom	Any OpenAI-compatible API	Self-hosted models

The model selector displays context window size and capabilities (vision, streaming) as badges.

Parameter Presets

Fine-tune generation behavior with preset configurations:

Deterministic — Temperature 0, consistent outputs for classification and extraction
Balanced — Temperature 0.7, general-purpose tasks
Creative — Temperature 1.2, storytelling and brainstorming
Diverse — Temperature 0.9 with high penalties, maximum variety
Custom — Manual control of all parameters

Advanced parameters:

Temperature (0-2)
Top-P nucleus sampling (0-1)
Presence penalty (-2 to 2)
Frequency penalty (-2 to 2)
Max tokens
Stop sequences

Streaming Responses

All models support real-time streaming. Responses appear token-by-token as the LLM generates them, providing immediate feedback during testing.

Performance metrics displayed:

Time to first token (TTFT)
Tokens per second (TPS)
Total duration
Token count (input + output)
Estimated cost (based on provider pricing)

Vision-capable models (GPT-4 Vision, Claude 3) support image attachments:

Click the image upload button
Select image files (JPEG, PNG, GIF, WebP)
Images are automatically resized and base64-encoded
Add multiple images per prompt

Supported use cases:

Document analysis (invoices, receipts, contracts)
Visual QA (describe this image, what’s wrong here?)
OCR and text extraction
Diagram interpretation

Large images are automatically resized to 2048px max dimension to reduce API costs. Original aspect ratios are preserved.

Using the Playground

Select a Prompt

Navigate to a prompt repository and click any prompt file. The editor opens in the main panel.

Open Playground

Click “Test in Playground” from the toolbar or use the keyboard shortcut Cmd+Shift+P.

The Playground opens as a side panel showing:

Prompt template preview
Auto-detected variable inputs
Model selector
Parameter controls

Configure Test

Choose a model from the dropdown
Fill variable values in the auto-generated form
Adjust parameters using presets or custom values
Attach images if testing vision models (optional)

Run Test

Click “Run” or press Cmd+Enter. The response streams in real-time with metrics displayed at the bottom.

Save or Compare

Successful tests can be:

Saved to the test history for later reference
Exported as JSON for analysis
Compared against other test runs to evaluate quality differences

Advanced Features

Test History

Every test run is saved with metadata:

Timestamp
Model and parameters used
Variable values
Full response text
Performance metrics

Access history from the sidebar to replay tests or compare results across different configurations.

JSON Mode

For structured outputs, enable JSON mode in parameter settings. The LLM returns valid JSON that can be parsed directly into application logic.

Example prompt for JSON mode:


Extract customer information as JSON.

Input: "John Smith, john@example.com, lives in Seattle"

Output format:
{
  "name": "string",
  "email": "string",
  "location": "string"
}

The response is guaranteed valid JSON: {"name": "John Smith", "email": "john@example.com", "location": "Seattle"}

System Prompts

Separate system and user prompts for better control:

System prompt — Defines agent behavior (e.g., “You are a helpful assistant”)
User prompt — Contains the task or question

Both support variables and can be tested independently.

Streaming Control

Pause or cancel streaming responses:

Pause — Temporarily stop streaming without losing progress
Cancel — Abort the request and discard partial results
Resume — Continue from where streaming paused

Best Practices

Variable Naming

Use descriptive, consistent names:

Good: {customer_name}, {order_id}, {support_tier}
Bad: {x}, {temp}, {data}

Parameter Selection

Choose presets based on use case:

Classification/Extraction → Deterministic
General conversation → Balanced
Content generation → Creative
Brainstorming → Diverse

Cost Optimization

Reduce API costs during testing:

Limit max tokens to reasonable values
Use smaller models (GPT-3.5 vs GPT-4) for initial iterations
Compress images before upload
Monitor cost per test in the metrics panel

Test with the cheapest model first (Claude Haiku, GPT-3.5 Turbo). Upgrade to premium models only after validating prompt structure.

Keyboard Shortcuts

Action	Shortcut
Open Playground	`Cmd+Shift+P`
Run test	`Cmd+Enter`
Clear response	`Cmd+K`
Toggle parameters	`Cmd+/`
Save result	`Cmd+S`

Next Steps

Compare prompt variants across branches
Run A/B experiments for optimization
Deploy tested prompts to production workflows