
LLM Connections

Configure multiple LLM providers and models for diverse AI capabilities, cost optimization, and failover redundancy.

Overview

M3 Forge supports connections to multiple large language model providers, enabling you to:

  • Use the best model for each task — GPT-4o for structured extraction, Claude for long documents
  • Optimize costs — Route simple tasks to cheaper models, complex tasks to premium models
  • Ensure availability — Failover to secondary provider if primary has an outage
  • Meet compliance requirements — Use regional endpoints or private deployments
  • Experiment with new models — Test cutting-edge capabilities without migration

Each workspace configures its own LLM connections with separate API keys and settings.

[Screenshot: LLM Connections page showing configured providers (OpenAI, Anthropic, etc.) with status indicators and model lists]

Supported Providers

M3 Forge integrates with major LLM providers:

| Provider | Models | Key Features | Use Cases |
|----------|--------|--------------|-----------|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo | Function calling, vision, JSON mode, embeddings | General purpose, structured extraction, code generation |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | 200K context, extended thinking, refusal resistance | Long documents, complex reasoning, safety-critical applications |
| Azure OpenAI | GPT-4o, GPT-4, GPT-3.5 | Enterprise SLA, regional deployment, compliance | Enterprise deployments, HIPAA/SOC2 compliance |
| Google Vertex AI | Gemini 1.5 Pro, Gemini 1.5 Flash | Multimodal, 2M context, grounding | Document understanding, video analysis, large context needs |
| AWS Bedrock | Claude, Llama, Titan | AWS-native, data residency, IAM integration | AWS-centric architectures, regulated industries |
| Qwen | Qwen-2.5, Qwen-VL | Multilingual (120+ languages), vision | Non-English content, international deployments |

Adding an LLM Connection

From workspace settings, go to Settings → LLM Connections.

Click Add Connection

Select New LLM Connection and choose the provider type.

Configure Connection

Provide provider-specific details:

OpenAI Configuration:

  • API Key — From OpenAI platform (starts with sk-)
  • Organization ID — Optional, for organization billing tracking
  • Base URL — Leave default unless using a proxy
  • Models — Select available models to enable (GPT-4o, GPT-4 Turbo, etc.)

Optional:

  • Rate limit — Max requests per minute (default: 10,000)
  • Timeout — Request timeout in seconds (default: 60)
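The Rate limit setting corresponds to standard client-side throttling. As an illustration of the mechanism (not M3 Forge's internal implementation), a sliding-window requests-per-minute limiter might look like:

```python
import time
from collections import deque

class RpmLimiter:
    """Client-side sliding-window limiter: at most `rpm` requests per 60 s."""

    def __init__(self, rpm=10_000):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recently sent requests

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False  # caller should queue or reject the request

limiter = RpmLimiter(rpm=2)
```

A request is admitted only while fewer than `rpm` requests were sent in the trailing minute; otherwise the caller waits or fails fast.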

Test Connection

Click Test Connection to verify:

  • API key is valid
  • Network connectivity is working
  • Models are accessible

The test makes a simple API call and displays the response.
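For OpenAI-compatible endpoints, an equivalent manual check is to list the available models with your API key. A minimal sketch (the exact call M3 Forge makes is not documented here; the key shown is a placeholder):

```python
import json
import urllib.request

def build_test_request(base_url, api_key):
    """Build a GET /v1/models request. A 200 response means the key and
    network path are good; 401 suggests a bad key, 429 a quota/rate issue."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

def run_test(base_url, api_key, timeout=60):
    """Perform the live check and return the parsed model list."""
    with urllib.request.urlopen(build_test_request(base_url, api_key),
                                timeout=timeout) as resp:
        return json.load(resp)

req = build_test_request("https://api.openai.com/v1", "sk-example")
```

Running `run_test` against a real endpoint requires a valid key; the request construction alone already verifies the URL and header shape.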

Save

The connection is saved and immediately available for use in workflows and prompts.

API keys are encrypted at rest using AES-256. They are only decrypted when making API calls to providers.

Managing Connections

Viewing Connections

The LLM Connections page displays all configured providers:

| Provider | Models | Status | Usage (30d) | Actions |
|----------|--------|--------|-------------|---------|
| OpenAI | GPT-4o, GPT-3.5 Turbo | ✓ Active | 145,234 requests | Edit, Test, Disable |
| Anthropic | Claude 3.5 Sonnet | ✓ Active | 23,891 requests | Edit, Test, Disable |
| Azure OpenAI | GPT-4 Turbo | ⚠ Warning | 2,103 requests | Edit, Test, Enable |

Status indicators:

  • ✓ Active — Connection working normally
  • ⚠ Warning — Recent errors or approaching rate limits
  • ✗ Error — Connection failing (invalid key, network issue, quota exceeded)

Editing Connections

To update a connection:

  1. Click Edit next to the connection
  2. Modify settings (API key, enabled models, rate limits)
  3. Click Test Connection to verify changes
  4. Save

Changes take effect immediately for new workflow runs.

Disabling Connections

To temporarily disable a connection without deleting it:

  1. Click Disable
  2. Confirm action

Disabled connections:

  • Cannot be used in new workflow runs
  • Remain visible in configuration
  • Can be re-enabled anytime

Useful when:

  • You are temporarily over quota with the provider
  • You are testing failover behavior
  • You are rotating API keys

Deleting Connections

To permanently remove a connection:

  1. Click Delete
  2. Confirm deletion

Deleting a connection used by active workflows will cause those workflows to fail. Update workflows to use a different connection first.

Model Selection

Different models have different capabilities and costs. Choose appropriately:

By Task Type

| Task | Recommended Model | Reasoning |
|------|-------------------|-----------|
| Structured extraction | GPT-4o, Claude 3.5 Sonnet | JSON mode, high accuracy |
| Long document analysis | Claude 3.5 Sonnet, Gemini 1.5 Pro | 200K+ context windows |
| Simple classification | GPT-3.5 Turbo, Claude 3 Haiku | Fast and cost-effective |
| Code generation | GPT-4o, Claude 3.5 Sonnet | Strong coding capabilities |
| Multilingual | Qwen-2.5, Gemini 1.5 Pro | Broad language support |
| Vision tasks | GPT-4o, Qwen-VL, Gemini 1.5 | Image understanding |

By Cost

From most to least expensive (approximate):

  1. Premium tier — GPT-4o, Claude 3 Opus, Gemini 1.5 Pro
  2. Mid tier — GPT-4 Turbo, Claude 3.5 Sonnet
  3. Efficient tier — GPT-3.5 Turbo, Claude 3 Haiku, Gemini 1.5 Flash

Use premium models only when necessary for accuracy or capabilities.
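The task table above reduces to a simple routing rule. A hedged sketch (model identifiers and routing logic are illustrative, taken from the recommendations in the table):

```python
# Illustrative task -> model routing based on the recommendations above.
ROUTES = {
    "structured_extraction": "gpt-4o",
    "long_document": "claude-3-5-sonnet",
    "simple_classification": "gpt-3.5-turbo",   # efficient tier
    "code_generation": "gpt-4o",
    "multilingual": "qwen-2.5",
    "vision": "gpt-4o",
}

def pick_model(task, default="gpt-3.5-turbo"):
    """Route known task types per the table; unknown tasks fall back
    to the efficient tier rather than a premium model."""
    return ROUTES.get(task, default)
```

Defaulting unknown tasks to the cheap tier keeps costs bounded; escalate to a premium model only when quality demands it.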

By Context Length

| Model | Context Window | Best For |
|-------|----------------|----------|
| Gemini 1.5 Pro | 2M tokens | Entire codebases, book-length documents |
| Claude 3.5 Sonnet | 200K tokens | Long contracts, research papers |
| GPT-4o | 128K tokens | Multi-page documents |
| GPT-3.5 Turbo | 16K tokens | Short documents, chat |
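One way to apply these figures is to pick the cheapest model whose context window fits the input. A sketch using the window sizes above (token counts approximate, output reservation an assumption):

```python
# (model, context window in tokens), cheapest-first per the cost tiers above.
WINDOWS = [
    ("gpt-3.5-turbo", 16_000),
    ("gpt-4o", 128_000),
    ("claude-3-5-sonnet", 200_000),
    ("gemini-1.5-pro", 2_000_000),
]

def smallest_fitting_model(prompt_tokens, reserve_output=4_000):
    """Return the first (cheapest) model whose window holds the prompt
    plus room reserved for the generated output."""
    needed = prompt_tokens + reserve_output
    for model, window in WINDOWS:
        if window >= needed:
            return model
    raise ValueError("prompt exceeds every configured context window")
```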

Rate Limiting and Quotas

Prevent runaway costs and comply with provider limits:

Connection-Level Rate Limits

Set per-connection rate limits:

  • Requests per minute — Max API calls per minute (default: provider limit)
  • Tokens per minute — Max tokens consumed per minute
  • Daily budget — Max spend per day in USD

When limits are reached, additional requests are queued or rejected based on configuration.
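The queue-or-reject behavior can be sketched as follows (illustrative only, not the product's actual scheduler):

```python
from collections import deque

class RequestGate:
    """When the per-minute budget is spent, queue up to `max_queue`
    requests; anything beyond that is rejected outright."""

    def __init__(self, per_minute_budget, max_queue=100):
        self.budget = per_minute_budget
        self.max_queue = max_queue
        self.used = 0
        self.queue = deque()

    def submit(self, request):
        if self.used < self.budget:
            self.used += 1
            return "sent"
        if len(self.queue) < self.max_queue:
            self.queue.append(request)
            return "queued"
        return "rejected"

    def tick_minute(self):
        """Reset the budget at the minute boundary, draining the queue first."""
        self.used = 0
        while self.queue and self.used < self.budget:
            self.queue.popleft()
            self.used += 1
```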

Usage Monitoring

Monitor consumption in the Usage tab:

Daily Requests:

Chart showing requests per day over last 30 days

Cost Breakdown:

| Model | Requests | Input Tokens | Output Tokens | Cost |
|-------|----------|--------------|---------------|------|
| GPT-4o | 12,345 | 1.2M | 456K | $89.23 |
| Claude 3.5 Sonnet | 8,901 | 890K | 234K | $67.12 |

Top Workflows by Cost:

| Workflow | Requests | Cost |
|----------|----------|------|
| contract-extraction | 3,456 | $52.31 |
| fraud-detection | 2,103 | $31.87 |

Quota Alerts

Set up alerts for:

  • Daily spend threshold — Email when spending exceeds $X per day
  • Rate limit approaching — Warn at 80% of rate limit
  • Error rate spike — Alert when error rate > 5%

Configure alerts in Settings → Notifications.
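The three alert conditions reduce to simple threshold checks. An illustrative sketch, with thresholds taken from the list above:

```python
def evaluate_alerts(daily_spend_usd, spend_limit_usd,
                    rpm_used, rpm_limit,
                    errors, requests):
    """Return the names of alerts that should fire for the current stats."""
    alerts = []
    if daily_spend_usd > spend_limit_usd:
        alerts.append("daily_spend_threshold")
    if rpm_limit and rpm_used / rpm_limit >= 0.80:      # warn at 80% of limit
        alerts.append("rate_limit_approaching")
    if requests and errors / requests > 0.05:           # error rate > 5%
        alerts.append("error_rate_spike")
    return alerts
```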

Failover and Load Balancing

Configure multiple connections for high availability:

Failover Configuration

In workflow node configuration, specify primary and fallback providers:

{ "node_type": "llm", "config": { "primary_connection": "openai-production", "fallback_connections": ["azure-openai", "anthropic"], "failover_on": ["rate_limit", "timeout", "server_error"] } }

If the primary connection fails, the system automatically retries with the fallback connections in order.
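The failover semantics can be sketched as a retry loop over the connection list (illustrative; `RetryableError` stands in for the error classes listed in `failover_on`):

```python
class RetryableError(Exception):
    """Stand-in for rate_limit / timeout / server_error failures."""

def call_with_failover(connections, call):
    """Try each connection in order; re-raise only if all of them fail."""
    last_error = None
    for conn in connections:
        try:
            return call(conn)
        except RetryableError as exc:
            last_error = exc  # fall through to the next connection
    raise last_error

def flaky(conn):
    """Demo callable: the primary is rate-limited, fallbacks succeed."""
    if conn == "openai-production":
        raise RetryableError("rate_limit")
    return f"ok via {conn}"

result = call_with_failover(
    ["openai-production", "azure-openai", "anthropic"], flaky)
```

Non-retryable errors (e.g. an invalid request) propagate immediately rather than being retried on another provider.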

Load Balancing

Distribute requests across multiple connections:

{ "node_type": "llm", "config": { "connections": ["openai-key-1", "openai-key-2", "openai-key-3"], "load_balancing": "round_robin" } }

Strategies:

  • Round robin — Rotate through connections evenly
  • Least loaded — Use the connection with the most remaining quota
  • Random — Pick a connection at random

Useful for staying under per-key rate limits when making many requests.
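The three strategies differ only in how the next connection is chosen. A sketch (connection names hypothetical):

```python
import itertools
import random

def round_robin(connections):
    """Yield connections in a repeating cycle."""
    return itertools.cycle(connections)

def least_loaded(quota_remaining):
    """Pick the connection with the most quota left."""
    return max(quota_remaining, key=quota_remaining.get)

def pick_random(connections, rng=random.Random(0)):
    """Uniform random selection (seeded here for reproducibility)."""
    return rng.choice(connections)

rr = round_robin(["key-1", "key-2", "key-3"])
```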

Security Best Practices

API Key Management

  • Rotate keys quarterly — Update API keys every 3 months
  • Use separate keys per environment — Different keys for dev, staging, production
  • Limit key permissions — Use provider-level scoping when available
  • Never commit keys to Git — Use environment variables or secret management
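In practice, "never commit keys" means reading them from the environment at startup and failing fast when one is missing. A common pattern, not specific to M3 Forge (variable names are examples):

```python
import os

def require_secret(name):
    """Fetch a secret from the environment; refuse to start without it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value

# Demo only: in real use the key is injected by your secret manager
# or deployment environment, never hard-coded.
os.environ["EXAMPLE_API_KEY"] = "sk-demo"
demo_key = require_secret("EXAMPLE_API_KEY")
```

Using distinct variable names per environment (e.g. dev vs. production) keeps keys separated as recommended above.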

Least Privilege

  • Workspace isolation — Each workspace has its own connections and keys
  • Role-based access — Only Admins can configure LLM connections
  • Audit logging — All connection changes are logged with user attribution

Compliance

For regulated industries:

  • Use regional endpoints — Azure/Bedrock for data residency requirements
  • Enable audit logs — Export connection usage for compliance reporting
  • Implement approval workflows — Require review before adding new providers
  • Monitor data exfiltration — Alert on unusual prompt patterns

See the security standards documentation for complete guidance.

Advanced Configuration

Custom Endpoints

For self-hosted or proxy deployments:

  1. Edit connection
  2. Override Base URL with custom endpoint
  3. Optionally add Custom headers for authentication
  4. Test connection to verify compatibility

Example: OpenAI-compatible endpoint:

Base URL: https://my-llm-proxy.example.com/v1
Custom headers:
  Authorization: Bearer my-custom-token
  X-Deployment-Region: us-east-1

Prompt Caching

For providers supporting prompt caching (Anthropic, some OpenAI deployments):

  1. Enable Prompt caching in connection settings
  2. Configure Cache TTL (time to live)
  3. System automatically caches prompt prefixes to reduce costs

Useful for:

  • Large system prompts repeated across requests
  • Document context reused in multiple queries
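The mechanism can be modeled as a TTL cache keyed by a hash of the shared prompt prefix. A sketch of the idea (actual provider-side caching is transparent to the client):

```python
import hashlib
import time

class PrefixCache:
    """TTL cache keyed by a hash of the shared prompt prefix."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # prefix hash -> (expiry time, cached value)

    def _key(self, prefix):
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get(self, prefix, now=None):
        """Return the cached value for an unexpired prefix, else None."""
        now = time.monotonic() if now is None else now
        entry = self.store.get(self._key(prefix))
        if entry and entry[0] > now:
            return entry[1]
        return None

    def put(self, prefix, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[self._key(prefix)] = (now + self.ttl, value)

cache = PrefixCache(ttl_seconds=300)
```

A hit means the provider can reuse the processed prefix and bill only the new suffix tokens; after the TTL elapses the prefix is processed (and charged) in full again.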

Streaming

Enable streaming responses for real-time user feedback:

  1. Enable Streaming support in connection settings
  2. In workflow nodes, set stream: true
  3. Frontend displays partial responses as they arrive

Improves perceived latency for long-form generation.
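On the client side, streaming means consuming the response chunk by chunk instead of waiting for the full body. A minimal sketch of accumulating streamed text deltas:

```python
def render_stream(chunks):
    """Accumulate streamed text deltas; each intermediate state is what
    the frontend would display as a partial response."""
    text = ""
    partials = []
    for delta in chunks:
        text += delta
        partials.append(text)   # push `text` to the UI here as it grows
    return text, partials

full, partials = render_stream(["The ", "contract ", "is valid."])
```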

Troubleshooting

Connection Test Failing

Cause: Invalid API key, network connectivity issue, or quota exceeded.

Solution:

  1. Verify API key is correct and has not expired
  2. Check provider dashboard for quota/billing status
  3. Test network connectivity to provider endpoint
  4. Review error message for specific issue

Rate Limit Errors

Cause: Exceeded provider’s rate limit.

Solution:

  1. Reduce connection-level rate limit to stay under provider limit
  2. Add failover connections to distribute load
  3. Implement request queuing in workflows
  4. Contact provider to increase quota

High Latency

Cause: Network distance to provider, large prompts, or model overload.

Solution:

  1. Use regional endpoints closer to your deployment
  2. Optimize prompts to reduce token count
  3. Switch to a faster model (e.g., GPT-3.5 Turbo instead of GPT-4)
  4. Enable prompt caching for repeated content

Unexpected Costs

Cause: Inefficient prompts or workflow execution loops.

Solution:

  1. Review top workflows by cost in usage dashboard
  2. Optimize prompts to reduce output tokens
  3. Add budget alerts to catch spikes early
  4. Implement caching to reduce redundant calls

Model Not Available

Cause: Model not enabled in connection or provider account lacks access.

Solution:

  1. Check connection settings and enable desired model
  2. Verify provider account has access (some models require approval)
  3. Update to latest API version if model is newly released
