
LLM Connections

Configure multiple LLM providers and models for diverse AI capabilities, cost optimization, and failover redundancy.

Overview

M3 Forge supports connections to multiple large language model providers, enabling you to:

  • Use the best model for each task — GPT-4o for structured extraction, Claude for long documents
  • Optimize costs — Route simple tasks to cheaper models, complex tasks to premium models
  • Ensure availability — Failover to secondary provider if primary has an outage
  • Meet compliance requirements — Use regional endpoints or private deployments
  • Experiment with new models — Test cutting-edge capabilities without migration

Each workspace configures its own LLM connections with separate API keys and settings.

[Screenshot: LLM Connections page showing configured providers (OpenAI, Anthropic, etc.) with status indicators and model lists]

Supported Providers

M3 Forge integrates with major LLM providers:

| Provider | Models | Key Features | Use Cases |
|----------|--------|--------------|-----------|
| OpenAI | GPT-4o, GPT-4 Turbo, GPT-4, GPT-3.5 Turbo | Function calling, vision, JSON mode, embeddings | General purpose, structured extraction, code generation |
| Anthropic | Claude 3.5 Sonnet, Claude 3 Opus, Claude 3 Haiku | 200K context, extended thinking, refusal resistance | Long documents, complex reasoning, safety-critical applications |
| Azure OpenAI | GPT-4o, GPT-4, GPT-3.5 | Enterprise SLA, regional deployment, compliance | Enterprise deployments, HIPAA/SOC2 compliance |
| Google Vertex AI | Gemini 1.5 Pro, Gemini 1.5 Flash | Multimodal, 2M context, grounding | Document understanding, video analysis, large context needs |
| AWS Bedrock | Claude, Llama, Titan | AWS-native, data residency, IAM integration | AWS-centric architectures, regulated industries |
| Qwen | Qwen-2.5, Qwen-VL | Multilingual (120+ languages), vision | Non-English content, international deployments |

Adding an LLM Connection

From workspace settings, go to Settings → LLM Connections.

Click Add Connection

Select New LLM Connection and choose the provider type.

Configure Connection

Provide provider-specific details:

OpenAI Configuration:

  • API Key — From OpenAI platform (starts with sk-)
  • Organization ID — Optional, for organization billing tracking
  • Base URL — Leave default unless using a proxy
  • Models — Select available models to enable (GPT-4o, GPT-4 Turbo, etc.)

Optional:

  • Rate limit — Max requests per minute (default: 10,000)
  • Timeout — Request timeout in seconds (default: 60)
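The Rate limit setting corresponds to standard client-side throttling. As an illustration of the mechanism (not M3 Forge's internal implementation), a sliding-window requests-per-minute limiter might look like:

```python
import time
from collections import deque

class RpmLimiter:
    """Client-side sliding-window limiter: at most `rpm` requests per 60 s."""

    def __init__(self, rpm=10_000):
        self.rpm = rpm
        self.sent = deque()  # timestamps of recently sent requests

    def try_acquire(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            self.sent.append(now)
            return True
        return False  # caller should queue or reject the request

limiter = RpmLimiter(rpm=2)
```

A request is admitted only while fewer than `rpm` requests were sent in the trailing minute; otherwise the caller waits or fails fast.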

Test Connection

Click Test Connection to verify:

  • API key is valid
  • Network connectivity is working
  • Models are accessible

The test makes a simple API call and displays the response.
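For OpenAI-compatible endpoints, an equivalent manual check is to list the available models with your API key. A minimal sketch (the exact call M3 Forge makes is not documented here; the key shown is a placeholder):

```python
import json
import urllib.request

def build_test_request(base_url, api_key):
    """Build a GET /v1/models request. A 200 response means the key and
    network path are good; 401 suggests a bad key, 429 a quota/rate issue."""
    req = urllib.request.Request(base_url.rstrip("/") + "/models")
    req.add_header("Authorization", f"Bearer {api_key}")
    return req

def run_test(base_url, api_key, timeout=60):
    """Perform the live check and return the parsed model list."""
    with urllib.request.urlopen(build_test_request(base_url, api_key),
                                timeout=timeout) as resp:
        return json.load(resp)

req = build_test_request("https://api.openai.com/v1", "sk-example")
```

Running `run_test` against a real endpoint requires a valid key; the request construction alone already verifies the URL and header shape.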

Save

The connection is saved and immediately available for use in workflows and prompts.

API keys are encrypted at rest using AES-256. They are only decrypted when making API calls to providers.

Managing Connections

Viewing Connections

The LLM Connections page displays all configured providers:

| Provider | Models | Status | Usage (30d) | Actions |
|----------|--------|--------|-------------|---------|
| OpenAI | GPT-4o, GPT-3.5 Turbo | ✓ Active | 145,234 requests | Edit, Test, Disable |
| Anthropic | Claude 3.5 Sonnet | ✓ Active | 23,891 requests | Edit, Test, Disable |
| Azure OpenAI | GPT-4 Turbo | ⚠ Warning | 2,103 requests | Edit, Test, Enable |

Status indicators:

  • ✓ Active — Connection working normally
  • ⚠ Warning — Recent errors or approaching rate limits
  • ✗ Error — Connection failing (invalid key, network issue, quota exceeded)

Editing Connections

To update a connection:

  1. Click Edit next to the connection
  2. Modify settings (API key, enabled models, rate limits)
  3. Click Test Connection to verify changes
  4. Save

Changes take effect immediately for new workflow runs.

Disabling Connections

To temporarily disable a connection without deleting it:

  1. Click Disable
  2. Confirm action

Disabled connections:

  • Cannot be used in new workflow runs
  • Remain visible in configuration
  • Can be re-enabled anytime

Useful when:

  • You are temporarily over quota with the provider
  • You are testing failover behavior
  • You are rotating API keys

Deleting Connections

To permanently remove a connection:

  1. Click Delete
  2. Confirm deletion

Deleting a connection used by active workflows will cause those workflows to fail. Update workflows to use a different connection first.

Model Selection

Different models have different capabilities and costs. Choose appropriately:

By Task Type

| Task | Recommended Model | Reasoning |
|------|-------------------|-----------|
| Structured extraction | GPT-4o, Claude 3.5 Sonnet | JSON mode, high accuracy |
| Long document analysis | Claude 3.5 Sonnet, Gemini 1.5 Pro | 200K+ context windows |
| Simple classification | GPT-3.5 Turbo, Claude 3 Haiku | Fast and cost-effective |
| Code generation | GPT-4o, Claude 3.5 Sonnet | Strong coding capabilities |
| Multilingual | Qwen-2.5, Gemini 1.5 Pro | Broad language support |
| Vision tasks | GPT-4o, Qwen-VL, Gemini 1.5 | Image understanding |

By Cost

From most to least expensive (approximate):

  1. Premium tier — GPT-4o, Claude 3 Opus, Gemini 1.5 Pro
  2. Mid tier — GPT-4 Turbo, Claude 3.5 Sonnet
  3. Efficient tier — GPT-3.5 Turbo, Claude 3 Haiku, Gemini 1.5 Flash

Use premium models only when necessary for accuracy or capabilities.
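The task table above reduces to a simple routing rule. A hedged sketch (model identifiers and routing logic are illustrative, taken from the recommendations in the table):

```python
# Illustrative task -> model routing based on the recommendations above.
ROUTES = {
    "structured_extraction": "gpt-4o",
    "long_document": "claude-3-5-sonnet",
    "simple_classification": "gpt-3.5-turbo",   # efficient tier
    "code_generation": "gpt-4o",
    "multilingual": "qwen-2.5",
    "vision": "gpt-4o",
}

def pick_model(task, default="gpt-3.5-turbo"):
    """Route known task types per the table; unknown tasks fall back
    to the efficient tier rather than a premium model."""
    return ROUTES.get(task, default)
```

Defaulting unknown tasks to the cheap tier keeps costs bounded; escalate to a premium model only when quality demands it.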

By Context Length

| Model | Context Window | Best For |
|-------|----------------|----------|
| Gemini 1.5 Pro | 2M tokens | Entire codebases, book-length documents |
| Claude 3.5 Sonnet | 200K tokens | Long contracts, research papers |
| GPT-4o | 128K tokens | Multi-page documents |
| GPT-3.5 Turbo | 16K tokens | Short documents, chat |
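One way to apply these figures is to pick the cheapest model whose context window fits the input. A sketch using the window sizes above (token counts approximate, output reservation an assumption):

```python
# (model, context window in tokens), cheapest-first per the cost tiers above.
WINDOWS = [
    ("gpt-3.5-turbo", 16_000),
    ("gpt-4o", 128_000),
    ("claude-3-5-sonnet", 200_000),
    ("gemini-1.5-pro", 2_000_000),
]

def smallest_fitting_model(prompt_tokens, reserve_output=4_000):
    """Return the first (cheapest) model whose window holds the prompt
    plus room reserved for the generated output."""
    needed = prompt_tokens + reserve_output
    for model, window in WINDOWS:
        if window >= needed:
            return model
    raise ValueError("prompt exceeds every configured context window")
```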

Rate Limiting and Quotas

Prevent runaway costs and comply with provider limits:

Connection-Level Rate Limits

Set per-connection rate limits:

  • Requests per minute — Max API calls per minute (default: provider limit)
  • Tokens per minute — Max tokens consumed per minute
  • Daily budget — Max spend per day in USD

When limits are reached, additional requests are queued or rejected based on configuration.
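The queue-or-reject behavior can be sketched as follows (illustrative only, not the product's actual scheduler):

```python
from collections import deque

class RequestGate:
    """When the per-minute budget is spent, queue up to `max_queue`
    requests; anything beyond that is rejected outright."""

    def __init__(self, per_minute_budget, max_queue=100):
        self.budget = per_minute_budget
        self.max_queue = max_queue
        self.used = 0
        self.queue = deque()

    def submit(self, request):
        if self.used < self.budget:
            self.used += 1
            return "sent"
        if len(self.queue) < self.max_queue:
            self.queue.append(request)
            return "queued"
        return "rejected"

    def tick_minute(self):
        """Reset the budget at the minute boundary, draining the queue first."""
        self.used = 0
        while self.queue and self.used < self.budget:
            self.queue.popleft()
            self.used += 1
```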

Usage Monitoring

Monitor consumption in the Usage tab:

Daily Requests:

Chart showing requests per day over last 30 days

Cost Breakdown:

| Model | Requests | Input Tokens | Output Tokens | Cost |
|-------|----------|--------------|---------------|------|
| GPT-4o | 12,345 | 1.2M | 456K | $89.23 |
| Claude 3.5 Sonnet | 8,901 | 890K | 234K | $67.12 |

Top Workflows by Cost:

| Workflow | Requests | Cost |
|----------|----------|------|
| contract-extraction | 3,456 | $52.31 |
| fraud-detection | 2,103 | $31.87 |

Quota Alerts

Set up alerts for:

  • Daily spend threshold — Email when spending exceeds $X per day
  • Rate limit approaching — Warn at 80% of rate limit
  • Error rate spike — Alert when error rate > 5%

Configure alerts in Settings → Notifications.
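The three alert conditions reduce to simple threshold checks. An illustrative sketch, with thresholds taken from the list above:

```python
def evaluate_alerts(daily_spend_usd, spend_limit_usd,
                    rpm_used, rpm_limit,
                    errors, requests):
    """Return the names of alerts that should fire for the current stats."""
    alerts = []
    if daily_spend_usd > spend_limit_usd:
        alerts.append("daily_spend_threshold")
    if rpm_limit and rpm_used / rpm_limit >= 0.80:      # warn at 80% of limit
        alerts.append("rate_limit_approaching")
    if requests and errors / requests > 0.05:           # error rate > 5%
        alerts.append("error_rate_spike")
    return alerts
```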

Failover and Load Balancing

Configure multiple connections for high availability:

Failover Configuration

In workflow node configuration, specify primary and fallback providers:

{ "node_type": "llm", "config": { "primary_connection": "openai-production", "fallback_connections": ["azure-openai", "anthropic"], "failover_on": ["rate_limit", "timeout", "server_error"] } }

If the primary connection fails, the system automatically retries with the fallback connections in order.
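The failover semantics can be sketched as a retry loop over the connection list (illustrative; `RetryableError` stands in for the error classes listed in `failover_on`):

```python
class RetryableError(Exception):
    """Stand-in for rate_limit / timeout / server_error failures."""

def call_with_failover(connections, call):
    """Try each connection in order; re-raise only if all of them fail."""
    last_error = None
    for conn in connections:
        try:
            return call(conn)
        except RetryableError as exc:
            last_error = exc  # fall through to the next connection
    raise last_error

def flaky(conn):
    """Demo callable: the primary is rate-limited, fallbacks succeed."""
    if conn == "openai-production":
        raise RetryableError("rate_limit")
    return f"ok via {conn}"

result = call_with_failover(
    ["openai-production", "azure-openai", "anthropic"], flaky)
```

Non-retryable errors (e.g. an invalid request) propagate immediately rather than being retried on another provider.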

Load Balancing

Distribute requests across multiple connections:

{ "node_type": "llm", "config": { "connections": ["openai-key-1", "openai-key-2", "openai-key-3"], "load_balancing": "round_robin" } }

Strategies:

  • Round robin — Rotate through connections evenly
  • Least loaded — Use the connection with the most remaining quota
  • Random — Pick a connection at random

Useful for staying under per-key rate limits when making many requests.
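The three strategies differ only in how the next connection is chosen. A sketch (connection names hypothetical):

```python
import itertools
import random

def round_robin(connections):
    """Yield connections in a repeating cycle."""
    return itertools.cycle(connections)

def least_loaded(quota_remaining):
    """Pick the connection with the most quota left."""
    return max(quota_remaining, key=quota_remaining.get)

def pick_random(connections, rng=random.Random(0)):
    """Uniform random selection (seeded here for reproducibility)."""
    return rng.choice(connections)

rr = round_robin(["key-1", "key-2", "key-3"])
```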

Security Best Practices

API Key Management

  • Rotate keys quarterly — Update API keys every 3 months
  • Use separate keys per environment — Different keys for dev, staging, production
  • Limit key permissions — Use provider-level scoping when available
  • Never commit keys to Git — Use environment variables or secret management
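In practice, "never commit keys" means reading them from the environment at startup and failing fast when one is missing. A common pattern, not specific to M3 Forge (variable names are examples):

```python
import os

def require_secret(name):
    """Fetch a secret from the environment; refuse to start without it."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required secret: {name}")
    return value

# Demo only: in real use the key is injected by your secret manager
# or deployment environment, never hard-coded.
os.environ["EXAMPLE_API_KEY"] = "sk-demo"
demo_key = require_secret("EXAMPLE_API_KEY")
```

Using distinct variable names per environment (e.g. dev vs. production) keeps keys separated as recommended above.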

Least Privilege

  • Workspace isolation — Each workspace has its own connections and keys
  • Role-based access — Only Admins can configure LLM connections
  • Audit logging — All connection changes are logged with user attribution

Compliance

For regulated industries:

  • Use regional endpoints — Azure/Bedrock for data residency requirements
  • Enable audit logs — Export connection usage for compliance reporting
  • Implement approval workflows — Require review before adding new providers
  • Monitor data exfiltration — Alert on unusual prompt patterns

See the security standards documentation for complete guidance.

Advanced Configuration

Custom Endpoints

For self-hosted or proxy deployments:

  1. Edit connection
  2. Override Base URL with custom endpoint
  3. Optionally add Custom headers for authentication
  4. Test connection to verify compatibility

Example: OpenAI-compatible endpoint:

Base URL: https://my-llm-proxy.example.com/v1
Custom headers:
  Authorization: Bearer my-custom-token
  X-Deployment-Region: us-east-1

Prompt Caching

For providers supporting prompt caching (Anthropic, some OpenAI deployments):

  1. Enable Prompt caching in connection settings
  2. Configure Cache TTL (time to live)
  3. System automatically caches prompt prefixes to reduce costs

Useful for:

  • Large system prompts repeated across requests
  • Document context reused in multiple queries
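The mechanism can be modeled as a TTL cache keyed by a hash of the shared prompt prefix. A sketch of the idea (actual provider-side caching is transparent to the client):

```python
import hashlib
import time

class PrefixCache:
    """TTL cache keyed by a hash of the shared prompt prefix."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}  # prefix hash -> (expiry time, cached value)

    def _key(self, prefix):
        return hashlib.sha256(prefix.encode()).hexdigest()

    def get(self, prefix, now=None):
        """Return the cached value for an unexpired prefix, else None."""
        now = time.monotonic() if now is None else now
        entry = self.store.get(self._key(prefix))
        if entry and entry[0] > now:
            return entry[1]
        return None

    def put(self, prefix, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[self._key(prefix)] = (now + self.ttl, value)

cache = PrefixCache(ttl_seconds=300)
```

A hit means the provider can reuse the processed prefix and bill only the new suffix tokens; after the TTL elapses the prefix is processed (and charged) in full again.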

Streaming

Enable streaming responses for real-time user feedback:

  1. Enable Streaming support in connection settings
  2. In workflow nodes, set stream: true
  3. Frontend displays partial responses as they arrive

Improves perceived latency for long-form generation.
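On the client side, streaming means consuming the response chunk by chunk instead of waiting for the full body. A minimal sketch of accumulating streamed text deltas:

```python
def render_stream(chunks):
    """Accumulate streamed text deltas; each intermediate state is what
    the frontend would display as a partial response."""
    text = ""
    partials = []
    for delta in chunks:
        text += delta
        partials.append(text)   # push `text` to the UI here as it grows
    return text, partials

full, partials = render_stream(["The ", "contract ", "is valid."])
```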

Troubleshooting

Connection Test Failing

Cause: Invalid API key, network connectivity issue, or quota exceeded.

Solution:

  1. Verify API key is correct and has not expired
  2. Check provider dashboard for quota/billing status
  3. Test network connectivity to provider endpoint
  4. Review error message for specific issue

Rate Limit Errors

Cause: Exceeded provider’s rate limit.

Solution:

  1. Reduce connection-level rate limit to stay under provider limit
  2. Add failover connections to distribute load
  3. Implement request queuing in workflows
  4. Contact provider to increase quota

High Latency

Cause: Network distance to provider, large prompts, or model overload.

Solution:

  1. Use regional endpoints closer to your deployment
  2. Optimize prompts to reduce token count
  3. Switch to a faster model (e.g., GPT-3.5 Turbo instead of GPT-4)
  4. Enable prompt caching for repeated content

Unexpected Costs

Cause: Inefficient prompts or workflow execution loops.

Solution:

  1. Review top workflows by cost in usage dashboard
  2. Optimize prompts to reduce output tokens
  3. Add budget alerts to catch spikes early
  4. Implement caching to reduce redundant calls

Model Not Available

Cause: Model not enabled in connection or provider account lacks access.

Solution:

  1. Check connection settings and enable desired model
  2. Verify provider account has access (some models require approval)
  3. Update to latest API version if model is newly released
