RAG Indexes

Create, configure, and manage document indexes for semantic search and retrieval-augmented generation (RAG).

Creating an Index

RAG indexes organize documents into searchable collections with configurable chunking and embedding strategies.

Select a Tenant

All indexes are scoped to a tenant for data isolation. Use the tenant picker in the left sidebar to choose the target tenant.

If the tenant dropdown is empty, create a tenant first in the Administration section.

Click “New Index”

The index editor panel opens on the right side. Configure:

Name - Unique identifier for the index (e.g., “customer-support-docs”)
Description - Optional human-readable summary
Embedding Model - Choose from OpenAI, Jina AI, Cohere, or custom models

Configure Chunking

Text splitting determines how documents are divided into searchable units:

Strategy	Chunk Size	Overlap	Best For
Fixed Size	512 tokens	50 tokens	Uniform documents, simple pipelines
Sentence Boundary	256-512 tokens	0 tokens	Maintaining semantic coherence
Paragraph	Variable	0 tokens	Structured content (reports, articles)
Recursive	1024 tokens	200 tokens	Long-form documents, books

Overlap allows adjacent chunks to share context, improving retrieval when queries span chunk boundaries.
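To make the effect of overlap concrete, here is a minimal Python sketch of fixed-size chunking. It is an illustration only, not M3 Forge's implementation: the function name `chunk_fixed` is hypothetical, and whitespace-separated words stand in for real tokenizer output.

```python
from typing import List, Sequence


def chunk_fixed(tokens: Sequence[str], chunk_size: int = 512, overlap: int = 50) -> List[List[str]]:
    """Split a token sequence into fixed-size windows that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


# ASSUMPTION: simple placeholder "tokens"; a real pipeline would use the
# tokenizer that matches the chosen embedding model.
words = [f"w{i}" for i in range(10)]
chunks = chunk_fixed(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

With an overlap of 1, the last token of each chunk is repeated as the first token of the next, so a query that straddles a boundary can still match a single chunk.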

Save and Index

Click “Create Index” to persist the configuration. The index is now ready to accept documents.

Figure: RAG index detail view showing the document list, chunking configuration, and embedding model settings.

Changing embedding models or chunking strategies after documents are uploaded requires re-indexing all content. Plan your configuration before uploading large corpora.

Uploading Documents

Supported Formats

M3 Forge parses the following file types:

PDF - Text extraction with layout preservation
DOCX - Microsoft Word documents
TXT - Plain text
Markdown - .md files with formatting
HTML - Web pages and exported content
CSV - Structured data (each row becomes a chunk)
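The CSV rule (each row becomes a chunk) could look roughly like the sketch below. The `column: value` rendering and the function name `csv_rows_to_chunks` are assumptions made for illustration; the text format M3 Forge actually produces for each row is not documented here.

```python
import csv
import io
from typing import List


def csv_rows_to_chunks(csv_text: str) -> List[str]:
    """Render each data row as one 'column: value' text chunk."""
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks: List[str] = []
    for row in reader:
        # Skip empty cells so they do not add noise to the chunk text.
        parts = [f"{column}: {value}" for column, value in row.items() if value]
        chunks.append("; ".join(parts))
    return chunks


sample = "sku,name,price\nA1,Widget,9.99\nB2,Gadget,\n"
row_chunks = csv_rows_to_chunks(sample)
for chunk in row_chunks:
    print(chunk)
```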

Upload Methods

Drag & Drop

Open the index in the sidebar
Navigate to the “Files” tab
Drag files from your desktop into the upload zone
Upload begins automatically with progress tracking

API


curl -X POST https://your-instance/api/rag/indexes/{index_id}/upload \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "file=@document.pdf"

Returns a job ID for asynchronous processing. Poll /api/rag/jobs/{job_id} for status.
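A polling loop for the job endpoint might be sketched in Python as follows. Only the endpoint path and the bearer token header come from the text above. The response shape is an assumption: the sketch expects a JSON object with a `status` field that uses the same values as the file list (`processing`, `indexed`, `failed`). The helper names `fetch_job` and `wait_for_job` are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable, Dict


def fetch_job(base_url: str, job_id: str, token: str) -> Dict:
    """GET /api/rag/jobs/{job_id} and decode the JSON body."""
    request = urllib.request.Request(
        f"{base_url}/api/rag/jobs/{job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(get_status: Callable[[], Dict], interval: float = 2.0,
                 timeout: float = 300.0,
                 sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the job leaves the 'processing' state or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_status()
        # ASSUMPTION: the response carries a "status" field; anything other
        # than "processing" is treated as a final state.
        if job.get("status") != "processing":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job still processing after timeout")
        sleep(interval)
```

Typical usage would be `wait_for_job(lambda: fetch_job("https://your-instance", job_id, "YOUR_TOKEN"))`. Passing the fetch step in as a callable keeps the loop testable without a live instance.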

Batch Indexing

For large document sets:

Select multiple files (Shift+Click or Ctrl+Click)
Upload in a single batch operation
Backend processes files in parallel
Status updates appear in the file list

Processing time scales with document count and complexity. A 100-page PDF typically indexes in 10-30 seconds depending on embedding model latency.
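The batch flow above is a UI feature, and only a single-file upload endpoint is documented. A scripted approximation could submit several single-file uploads concurrently, as in this sketch. The `upload_one` callable is a placeholder you would implement against the upload endpoint, and `upload_batch` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def upload_batch(paths: Iterable[str], upload_one: Callable[[str], str],
                 max_workers: int = 4) -> Dict[str, str]:
    """Upload files concurrently and map each path to its job ID or error text."""
    path_list = list(paths)
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {path: pool.submit(upload_one, path) for path in path_list}
        for path, future in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:
                # Record the failure and continue, so one bad file
                # does not abort the rest of the batch.
                results[path] = f"error: {exc}"
    return results
```

Keeping `max_workers` small avoids flooding the instance with simultaneous uploads; the right value depends on your deployment.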

Managing Files

File List View

Each indexed file shows:

Filename - Original document name
Chunks - Number of text chunks extracted
Status - indexed, processing, failed
Uploaded - Timestamp of upload
Actions - Re-index, delete

Re-Indexing

Re-index individual files when:

Chunking configuration changes
Embedding model is updated
Document content is modified

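Select the file and click “Re-index” to trigger fresh processing. Original chunks are replaced atomically. When a chunking or embedding model change applies to the whole index, use “Reindex All” (see Maintenance) so that every file is processed with the same settings.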

Deleting Files

Remove files from the index:

Select the file in the file list
Click the trash icon
Confirm deletion

Chunks are removed from the vector index immediately. This operation cannot be undone.

Deleting files impacts search results for queries that previously matched those chunks. Ensure dependencies (workflows, applications) are updated before deletion.

Index Configuration

Embedding Models

Switch embedding models in the index editor:


{
  "model": "openai/text-embedding-3-large",
  "dimensions": 3072,
  "normalize": true
}

Model Comparison:

Model	Latency (p95)	Cost (per 1M tokens)	Languages
OpenAI 3-large	200ms	$0.13	English-optimized
Jina v4	150ms	$0.02	100+ languages
Cohere multilingual-v3	180ms	$0.10	100+ languages

For custom models, provide a Hugging Face model ID or a deployment endpoint.
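The per-token prices in the comparison table can be turned into a rough corpus-level estimate. The sketch below uses the table's prices as given; the corpus size is a made-up figure, and the helper names are hypothetical. Overlapping chunks are embedded more than once, so real token counts will be somewhat higher than the raw corpus size.

```python
# Prices in USD per 1M tokens, copied from the comparison table.
PRICE_PER_MILLION_USD = {
    "OpenAI 3-large": 0.13,
    "Jina v4": 0.02,
    "Cohere multilingual-v3": 0.10,
}


def embedding_cost(total_tokens: int, model: str) -> float:
    """Estimated one-time cost in USD of embedding `total_tokens` tokens."""
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_USD[model]


def savings_vs(model: str, baseline: str) -> float:
    """Fractional saving of `model` relative to `baseline` (0.5 means 50% cheaper)."""
    return 1 - PRICE_PER_MILLION_USD[model] / PRICE_PER_MILLION_USD[baseline]


corpus_tokens = 50_000_000  # hypothetical corpus size
for name in PRICE_PER_MILLION_USD:
    print(f"{name}: ${embedding_cost(corpus_tokens, name):.2f}")
print(f"Jina v4 vs OpenAI 3-large: {savings_vs('Jina v4', 'OpenAI 3-large'):.0%} cheaper")
```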

Chunking Parameters

Fine-tune splitting behavior:

chunk_size - Target size in tokens (128-2048)
chunk_overlap - Overlap between adjacent chunks (0-200 tokens)
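separator - Delimiter for splitting (newline, paragraph, sentence); the recursive strategy takes an ordered list of delimiters, written as separators in the example below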
keep_separator - Include delimiter in chunks (true/false)

Example configuration:


{
  "strategy": "recursive",
  "chunk_size": 1024,
  "chunk_overlap": 100,
  "separators": ["\n\n", "\n", ". ", " "],
  "keep_separator": false
}

Recursive splitting tries each separator in order, falling back to character-level if needed.
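The fallback order described above can be sketched in a few lines of Python. This is a simplified illustration, not the product's splitter: it measures size in characters rather than tokens, drops separators (as with `keep_separator: false`), omits overlap, and rejoins merged pieces with the separator of the current level. The function name `recursive_split` is hypothetical.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", ". ", " ")) -> List[str]:
    """Split `text` into pieces of at most `chunk_size` characters.

    Each separator is tried in order; pieces that are still too long are
    split again with the next separator, and finally by character position.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Character-level fallback: no separator left to try.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(sep):
        pieces.extend(recursive_split(part, chunk_size, rest))

    # Greedily merge neighbouring pieces while the result still fits.
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample_text = "alpha beta\n\ngamma delta epsilon\n\nzeta"
for chunk in recursive_split(sample_text, chunk_size=12):
    print(repr(chunk))
```

The second paragraph of the sample is longer than 12 characters, so it falls through the newline and sentence separators and is finally split on spaces.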

Metadata Schema

Attach custom metadata to documents for filtered retrieval:


{
  "metadata_schema": {
    "product": "string",
    "version": "string",
    "category": "string",
    "created_date": "date"
  }
}

Metadata is specified during file upload:


{
  "file": "product-manual.pdf",
  "metadata": {
    "product": "enterprise",
    "version": "2.4.0",
    "category": "installation"
  }
}

Search queries can filter by metadata:


{
  "query": "database configuration",
  "metadata_filter": {
    "product": "enterprise",
    "version": ["2.4.0", "2.5.0"]
  }
}
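One plausible reading of the filter above is that a scalar value must match exactly, while a list matches any of its values. The sketch below implements that reading in Python. The list semantics are an assumption, as is the function name `matches_filter`; check the search API reference for the actual behavior.

```python
from typing import Any, Dict


def matches_filter(metadata: Dict[str, Any], metadata_filter: Dict[str, Any]) -> bool:
    """Return True when `metadata` satisfies every condition in the filter."""
    for field, expected in metadata_filter.items():
        actual = metadata.get(field)
        if isinstance(expected, list):
            # ASSUMPTION: a list means "any of these values".
            if actual not in expected:
                return False
        elif actual != expected:
            return False
    return True


document_metadata = {"product": "enterprise", "version": "2.4.0", "category": "installation"}
query_filter = {"product": "enterprise", "version": ["2.4.0", "2.5.0"]}
print(matches_filter(document_metadata, query_filter))
```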

Multi-Tenant Access

Tenant Hierarchy

Indexes inherit tenant permissions:

Owner Tenant - Full read/write access
Parent Tenants - Read-only access (if enabled)
Child Tenants - No access by default

Enable “Include sub-tenants” to query across child tenant indexes.
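The access rules above can be expressed as a small decision function. This sketch assumes that the "if enabled" condition for parent tenants refers to the “Include sub-tenants” option; the tenant names, the `PARENT_OF` mapping, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Child tenant -> parent tenant (hypothetical hierarchy for illustration).
PARENT_OF: Dict[str, Optional[str]] = {
    "acme": None,
    "acme-eu": "acme",
    "acme-eu-sales": "acme-eu",
}


def is_ancestor(candidate: str, tenant: str) -> bool:
    """True if `candidate` sits above `tenant` in the hierarchy."""
    parent = PARENT_OF.get(tenant)
    while parent is not None:
        if parent == candidate:
            return True
        parent = PARENT_OF.get(parent)
    return False


def access_level(requester: str, owner: str, include_sub_tenants: bool = False) -> str:
    """Return 'read_write', 'read_only', or 'none' for an index owned by `owner`."""
    if requester == owner:
        return "read_write"
    # ASSUMPTION: a parent tenant can read a child tenant's index only
    # when the sub-tenant option is enabled.
    if include_sub_tenants and is_ancestor(requester, owner):
        return "read_only"
    # Child tenants and unrelated tenants get no access by default.
    return "none"
```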

Filtering by Tenant

The tenant picker in the sidebar controls which indexes are visible:

All Tenants - Show all accessible indexes (admin only)
Specific Tenant - Show only indexes owned by that tenant
Include Sub-Tenants - Also show child tenant indexes (checkbox)

Search results respect tenant boundaries. Apart from the “Include sub-tenants” option, cross-tenant queries are not supported, which preserves security isolation.

Index Lifecycle

States

State	Description	Actions Available
Draft	Configuration only, no documents	Edit, Delete
Active	Documents indexed, ready for search	Search, Upload, Edit Config
Indexing	Background processing in progress	View Progress
Error	Processing failed	View Logs, Retry
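Client code that drives the UI or API may want to check whether an action is valid before attempting it. The sketch below encodes the table as a lookup; the enum and function names are hypothetical, and the action strings are copied from the table rather than from any API.

```python
from enum import Enum
from typing import Dict, FrozenSet


class IndexState(Enum):
    DRAFT = "Draft"
    ACTIVE = "Active"
    INDEXING = "Indexing"
    ERROR = "Error"


# Actions available in each state, copied from the table above.
ALLOWED_ACTIONS: Dict[IndexState, FrozenSet[str]] = {
    IndexState.DRAFT: frozenset({"Edit", "Delete"}),
    IndexState.ACTIVE: frozenset({"Search", "Upload", "Edit Config"}),
    IndexState.INDEXING: frozenset({"View Progress"}),
    IndexState.ERROR: frozenset({"View Logs", "Retry"}),
}


def can_perform(state: IndexState, action: str) -> bool:
    """True when `action` is listed for `state` in the lifecycle table."""
    return action in ALLOWED_ACTIONS[state]
```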

Monitoring

Index detail view shows:

Document Count - Total files indexed
Chunk Count - Total searchable text chunks
Storage Size - Vector and metadata storage
Last Updated - Most recent file upload timestamp
Error Count - Failed indexing operations (if any)

Maintenance

Regular maintenance tasks:

Reindex All - Rebuild entire index (after config changes)
Optimize - Compact vector storage and rebuild search structures
Export Metadata - Download document list with chunk counts

Large indexes (10,000+ documents) benefit from periodic optimization to reduce query latency and storage overhead.

Best Practices

Index Design

Single Domain per Index - Don’t mix unrelated document types
Consistent Chunking - Use same strategy across all documents in an index
Version Indexes - Create new indexes for major content updates (e.g., docs-v1, docs-v2)
Metadata First - Define metadata schema before uploading documents

Performance Optimization

Batch Uploads - Process multiple files in one operation
Async Indexing - Upload during off-peak hours for large batches
Index Warming - Run test queries after creation to populate caches
Monitor Latency - Track p95 search latency in monitoring dashboard

Cost Management

Right-Size Chunks - Larger chunks reduce embedding API calls
Cache Embeddings - Reuse embeddings for duplicate content
Prune Old Versions - Delete outdated documents to reduce storage costs
Choose Efficient Models - At the listed prices, Jina v4 ($0.02 per 1M tokens) costs about 85% less than OpenAI 3-large ($0.13 per 1M tokens)
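The "Cache Embeddings" tip can be illustrated with a content-hash cache placed in front of the embedding call. This is a client-side sketch under stated assumptions: `embed` stands for whatever function calls your embedding provider, and the `EmbeddingCache` class is hypothetical rather than part of M3 Forge.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Reuse embeddings for chunks whose text has been seen before."""

    def __init__(self, embed: Callable[[str], List[float]]) -> None:
        self._embed = embed
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times the underlying embed function ran

    def get(self, chunk: str) -> List[float]:
        # Identical text hashes to the same key, so duplicates are embedded once.
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed(chunk)
        return self._store[key]
```

A cache like this is only valid for one embedding model and configuration. If several models are in use, include the model name in the key.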

Next Steps

Learn how to search indexes for optimal retrieval
Integrate indexes into workflows for RAG pipelines
Monitor index health in the monitoring dashboard