Data Privacy & Storage
M3 Forge processes documents, executes workflows, and interacts with LLM providers on your behalf. This page explains how your data is stored, transmitted, and protected, and how to configure retention policies to control data lifecycle.
How M3 Forge Stores Your Data
Interaction Storage
Every interaction with M3 Forge generates data that is stored for monitoring, debugging, quality assessment, and historical tracking:
| Data Type | Examples | Storage |
|---|---|---|
| LLM Observability | Prompt executions, model responses, token usage, latency metrics | PostgreSQL + S3 |
| Job Execution | Workflow runs, processor results, completion status | PostgreSQL |
| Conversations | Chat threads with AI assistants, HITL discussions | PostgreSQL |
| HITL Requests | Human review requests, approvals, corrections | PostgreSQL |
| Audit Trails | Configuration changes, user actions, RAG operations | PostgreSQL |
| Training Data | Training jobs, model metrics, dataset annotations | PostgreSQL + S3 |
Storage Architecture
- PostgreSQL — Structured data including policies, metadata, audit logs, and job state. Uses two schemas: `marie_studio` for application data and `marie_scheduler` for job execution.
- S3-compatible object storage — Documents, raw LLM event payloads, model artifacts, and training data. Supports MinIO, AWS S3, or any S3-compatible provider.
Data Retention Policies
M3 Forge provides configurable per-category retention policies that automatically delete data older than a specified period. This helps manage storage costs, maintain query performance, and meet compliance requirements.
All retention policies are disabled by default. Data is retained indefinitely until an administrator configures a retention period.
Retention Categories
| Category | Data Covered | Tables Affected |
|---|---|---|
| Monitoring | LLM raw events, failed events, processor executions | LlmRawEvent, LlmFailedEvent, ProcessorExecution |
| Execution History | Completed job runs, job history, archives | Job, JobHistory, Archive |
| Conversations | Chat threads and all associated messages | ChatThread, ChatMessage |
| Human-in-the-Loop | HITL requests, responses, and notifications | HitlRequest, HitlResponse, HitlNotification |
| Audit Trail | Audit logs, configuration changes, RAG audit logs | AuditLog, ConfigurationAudit, RagAuditLog |
| Training | Training jobs and extractor training jobs | TrainingJob, ExtractorTrainingJob |
Available Retention Periods
- 30 days
- 60 days
- 90 days
- 180 days
- 365 days
- Indefinite (no automatic deletion)
Configuring Retention Policies
Navigate to Settings
Go to Settings → Data Retention from the main navigation.
Select a Category
Each data category has its own card with an enable/disable toggle.
Choose Retention Period
Select the desired retention period from the dropdown. Only data older than this period will be affected.
Optional: Monitoring Settings
For the Monitoring category, two additional options are available:
- Aggregate usage stats before deletion — Rolls up raw LLM events into daily usage statistics (grouped by model and provider) before deleting the raw data. This preserves trend data for dashboards while freeing storage.
- Clean up S3 storage objects — Deletes referenced S3 objects alongside database records.
Save
Click Save to apply the policy. Changes take effect at the next scheduled enforcement run (03:00 UTC daily).
How Enforcement Works
Retention policies are enforced automatically:
- Schedule — Daily at 03:00 UTC via an internal cron job
- Batch processing — Data is deleted in batches of 1,000 rows with short delays between batches to minimize database load
- Safety caps — A maximum of 500,000 rows per table per enforcement run prevents runaway deletions
- Manual trigger — Administrators can trigger enforcement runs from the Settings UI
- Dry-run mode — Preview what would be deleted without actually removing any data
Data deleted by retention policies cannot be recovered. Always use dry-run mode to preview deletions before enabling a new policy.
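The batch-and-cap behavior described above can be sketched roughly as follows. This is an illustrative model only, not M3 Forge's actual implementation: `delete_batch` stands in for a bounded DELETE query against one table.

```python
import time

BATCH_SIZE = 1_000          # rows deleted per batch
MAX_ROWS_PER_RUN = 500_000  # safety cap per table per enforcement run
BATCH_DELAY_S = 0.05        # short pause between batches to limit database load

def enforce_retention(delete_batch):
    """Delete expired rows in batches until the table is clean or the cap is hit.

    `delete_batch(limit)` is a stand-in for a bounded DELETE query; it
    returns the number of rows it removed.
    """
    total = 0
    while total < MAX_ROWS_PER_RUN:
        limit = min(BATCH_SIZE, MAX_ROWS_PER_RUN - total)
        deleted = delete_batch(limit)
        total += deleted
        if deleted < limit:  # nothing left to delete in this table
            break
        time.sleep(BATCH_DELAY_S)
    return total
```

A dry run would follow the same loop but count matching rows instead of deleting them.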
Execution History: Special Handling
Jobs in the Execution History category respect the `keepUntil` field. A job will not be deleted until both conditions are met:
- The job has reached a terminal state (completed, failed, cancelled, expired)
- The job's `keepUntil` timestamp has passed AND the retention period has elapsed
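The combined eligibility check can be sketched as follows (field names are assumptions for illustration; the actual schema may differ):

```python
from datetime import datetime, timedelta, timezone

TERMINAL_STATES = {"completed", "failed", "cancelled", "expired"}

def job_is_deletable(state, finished_at, keep_until, retention_days, now):
    """Return True only when the job is terminal, its keepUntil timestamp
    has passed, and the retention period has elapsed since it finished."""
    if state not in TERMINAL_STATES:
        return False
    if keep_until is not None and now < keep_until:
        return False
    return now - finished_at >= timedelta(days=retention_days)
```

Setting a far-future `keepUntil` therefore protects an individual job from retention cleanup even after its retention period expires.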
Monitoring: Aggregation
When Aggregate usage stats before deletion is enabled:
- Raw LLM events are grouped by model, provider, and day
- Aggregated counts and token usage are upserted into the `LlmUsageStats` table
- The raw events are then deleted
This preserves long-term usage trends for cost analysis and capacity planning while removing verbose per-request data.
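The roll-up step can be sketched as follows. This is a minimal illustration; the event field names are assumptions, not M3 Forge's actual schema.

```python
from collections import defaultdict
from datetime import datetime

def aggregate_usage(raw_events):
    """Group raw LLM events by (model, provider, day) and sum usage,
    producing rows suitable for upserting into a usage-stats table."""
    stats = defaultdict(lambda: {"requests": 0, "tokens": 0})
    for event in raw_events:
        key = (event["model"], event["provider"], event["timestamp"].date())
        stats[key]["requests"] += 1
        stats[key]["tokens"] += event["tokens"]
    return dict(stats)
```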
Data Transmission
LLM Provider Communication
When M3 Forge sends data to external LLM providers:
- All data is transmitted via encrypted channels (TLS/HTTPS)
- Only the data required for the specific execution is sent — no additional context or metadata
- M3 Forge does not send your data to LLM providers for training or any purpose beyond generating the requested response
- Provider API keys are stored encrypted in the database and never included in logs
Internal Communication
- All internal service communication uses TLS when deployed with recommended configuration
- API requests are authenticated via Bearer tokens or HMAC signatures
- WebSocket connections for terminal access use the same authentication layer
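An HMAC request signature of the general shape mentioned above can be sketched as follows. This is illustrative only: the canonical string and header format M3 Forge actually signs are not specified here.

```python
import hashlib
import hmac

def sign_request(secret: bytes, method: str, path: str, body: bytes) -> str:
    """Compute an HMAC-SHA256 signature over method, path, and body.
    The canonical string below is an assumption for illustration."""
    message = b"\n".join([method.encode(), path.encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_request(secret: bytes, method: str, path: str, body: bytes, signature: str) -> bool:
    """Constant-time comparison to avoid timing side channels."""
    expected = sign_request(secret, method, path, body)
    return hmac.compare_digest(expected, signature)
```

Because the body is part of the signed message, any tampering with the payload in transit invalidates the signature.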
Data Encryption
At Rest
- Database — PostgreSQL supports transparent data encryption (TDE). Enable at the infrastructure level based on your compliance requirements.
- Object storage — S3 objects are encrypted using server-side encryption (SSE-S3 or SSE-KMS depending on your storage provider configuration)
- Secrets — API keys, provider credentials, and MFA secrets are encrypted with AES-256 before storage
In Transit
- All external connections require TLS 1.2 or higher
- Internal service mesh communication is encrypted when running behind a reverse proxy or service mesh with mTLS
- WebSocket connections use WSS (WebSocket Secure)
Compliance Considerations
Self-Hosted Advantage
Because M3 Forge is self-hosted, your organization maintains full control over:
| Concern | Your Control |
|---|---|
| Data residency | Deploy in any region, any cloud, or on-premises |
| Network isolation | Run entirely within your VPC — no external callbacks |
| Encryption keys | Manage your own KMS keys and TLS certificates |
| Retention policies | Configure per-category retention to match your compliance framework |
| Access controls | RBAC with custom roles, MFA, session management |
| Audit trails | All audit data stays in your database — export to your SIEM |
Framework Alignment
| Framework | Relevant Controls |
|---|---|
| GDPR | Data retention policies (right to erasure, data minimization), self-hosted data residency, audit trails |
| HIPAA | Encryption at rest and in transit, access controls, audit logging, configurable PHI retention |
| SOC 2 | Data lifecycle management, role-based access, monitoring and alerting, change audit trail |
| PCI DSS | Encryption standards, access controls, retention limits, audit logging |
Next Steps
- Configure data retention in Settings → Data Retention
- Set up API Keys for secure programmatic access
- Configure Users & Roles for access control
- Monitor system activity in Monitoring