Search
Query RAG indexes with semantic search, hybrid ranking, and advanced filtering for optimal document retrieval.
Search Methods
M3 Forge supports three retrieval strategies, each optimized for different use cases.
Semantic Search
Vector similarity search using cosine distance between query and document embeddings:
```json
{
  "index_id": "customer-docs",
  "query": "How do I configure SSL certificates?",
  "search_type": "semantic",
  "top_k": 5
}
```

How it works:
- Query embedded with same model as documents
- Vector database computes cosine similarity to all chunks
- Top-K most similar chunks returned ranked by score
Best for:
- Natural language questions
- Conceptual queries (“how to improve performance”)
- Multilingual search (with multilingual embeddings)
- Queries with synonyms or paraphrasing
Limitations:
- May miss exact keyword matches
- Struggles with rare terms, acronyms, product codes
- Requires query and documents in similar semantic space
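The ranking step above can be sketched in a few lines. This is an illustrative client-side version, not M3 Forge's internals: a cosine similarity over two embedding vectors, then a sort to take the top-K chunks (the `id`/`embedding` shape is assumed for the example).

```typescript
// Cosine similarity between two embedding vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding and keep the K best.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

In production the similarity computation runs inside the vector database, usually against an approximate index rather than a full scan.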
Keyword Search (BM25)
Traditional full-text search using the BM25 ranking function, a TF-IDF-based scoring model:
```json
{
  "index_id": "product-catalog",
  "query": "SKU-12345",
  "search_type": "keyword",
  "top_k": 10
}
```

Best for:
- Exact matches (SKUs, error codes, IDs)
- Rare or technical terms
- Short queries (1-3 words)
- Deterministic retrieval (same query always returns same results)
Limitations:
- No semantic understanding
- Sensitive to query phrasing
- Weak on long-form questions
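For intuition, here is a toy version of the BM25 scoring formula over a tokenized in-memory corpus. It is a sketch only (real keyword search runs against an inverted index); `k1` and `b` are BM25's standard free parameters, shown with typical defaults.

```typescript
// BM25 score of `doc` for `query`, given the whole corpus for IDF and
// average-document-length statistics. All inputs are pre-tokenized.
function bm25Score(
  query: string[],
  doc: string[],
  corpus: string[][],
  k1 = 1.2,
  b = 0.75,
): number {
  const N = corpus.length;
  const avgdl = corpus.reduce((sum, d) => sum + d.length, 0) / N;
  let score = 0;
  for (const term of query) {
    const tf = doc.filter((w) => w === term).length; // term frequency in doc
    if (tf === 0) continue;
    const df = corpus.filter((d) => d.includes(term)).length; // doc frequency
    const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
    // Length-normalized TF saturation: long docs are penalized via b.
    score +=
      (idf * tf * (k1 + 1)) /
      (tf + k1 * (1 - b + (b * doc.length) / avgdl));
  }
  return score;
}
```

Note how a document with zero occurrences of every query term scores exactly 0, which is why keyword search is deterministic but has no semantic fallback.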
Hybrid Search
Combines semantic and keyword search with weighted score fusion:
```json
{
  "index_id": "support-tickets",
  "query": "database connection error SQLSTATE[HY000]",
  "search_type": "hybrid",
  "top_k": 5,
  "hybrid_alpha": 0.7
}
```

hybrid_alpha controls the balance:
- 0.0 - Pure keyword search
- 0.5 - Equal weighting
- 1.0 - Pure semantic search
- 0.7 (default) - Favor semantic with keyword boost
Best for:
- General-purpose retrieval (most use cases)
- Mixed queries (natural language + specific terms)
- Production systems requiring robustness
Hybrid search provides the best balance of precision and recall. Start with hybrid_alpha: 0.7 and tune based on evaluation metrics.
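Weighted score fusion can be sketched as a linear blend. This assumes both score sets are already normalized to [0, 1]; production systems (including, plausibly, M3 Forge's) often apply min-max or rank-based normalization first, so treat this as the idea rather than the exact implementation:

```typescript
// Fuse per-chunk semantic and keyword scores with hybrid_alpha.
// alpha = 1.0 is pure semantic, alpha = 0.0 is pure keyword.
function fuseScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  alpha: number,
): Map<string, number> {
  const fused = new Map<string, number>();
  const ids = new Set([...semantic.keys(), ...keyword.keys()]);
  for (const id of ids) {
    const s = semantic.get(id) ?? 0; // chunk missing from a list scores 0 there
    const k = keyword.get(id) ?? 0;
    fused.set(id, alpha * s + (1 - alpha) * k);
  }
  return fused;
}
```

A chunk that ranks well in both lists beats one that ranks well in only one, which is where hybrid search gets its robustness.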
Query Parameters
Top-K Results
Control result count with top_k:
```json
{
  "top_k": 5 // Return 5 most relevant chunks
}
```

Guidelines:
- RAG Context: 3-5 chunks (fits most LLM context windows)
- Search UI: 10-20 chunks (user browses results)
- Reranking: 50-100 chunks (reranker selects best subset)
More results increase recall but reduce precision and add latency.
Similarity Threshold
Filter results by minimum similarity score:
```json
{
  "threshold": 0.7 // Only return chunks with score >= 0.7
}
```

Scores range from 0.0 (no similarity) to 1.0 (identical):
- 0.9+ - Near-duplicate content
- 0.7-0.9 - Highly relevant
- 0.5-0.7 - Somewhat relevant
- < 0.5 - Weak relevance (likely noise)
Thresholds prevent low-quality results from polluting LLM context.
Metadata Filtering
Restrict search to documents matching metadata criteria:
```json
{
  "metadata_filter": {
    "product": "enterprise",
    "version": ["2.4.0", "2.5.0"],
    "category": "installation"
  }
}
```

Operators:
- Equality: "key": "value" - Exact match
- Array: "key": ["val1", "val2"] - Match any value
- Range: "date": {"gte": "2024-01-01", "lt": "2024-12-31"} - Numeric/date ranges
- Existence: "key": {"exists": true} - Field is present
Filters apply before similarity search, reducing search space and improving latency.
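The operator semantics above can be expressed as a predicate. This is a client-side sketch for illustration; M3 Forge evaluates filters server-side, before the vector search:

```typescript
type FilterValue =
  | string // equality
  | string[] // match any
  | { gte?: string | number; lt?: string | number; exists?: boolean };

// True if `metadata` satisfies every condition in `filter`.
function matchesFilter(
  metadata: Record<string, unknown>,
  filter: Record<string, FilterValue>,
): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    const value = metadata[key];
    if (typeof cond === "string") return value === cond; // equality
    if (Array.isArray(cond)) return cond.includes(value as string); // any-of
    if (cond.exists !== undefined) {
      return (value !== undefined) === cond.exists; // existence check
    }
    // Range: lexicographic comparison works for ISO dates, numeric for numbers.
    if (cond.gte !== undefined && !((value as any) >= cond.gte)) return false;
    if (cond.lt !== undefined && !((value as any) < cond.lt)) return false;
    return true;
  });
}
```

All conditions are ANDed together, which matches the filter object shown above: a chunk must satisfy product, version, and category simultaneously.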
Advanced Retrieval
Reranking
Secondary model re-scores top results for improved relevance:
```json
{
  "rerank": true,
  "rerank_model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
  "rerank_top_k": 3
}
```

Process:
- Initial retrieval returns top_k: 50 candidates
- Cross-encoder model scores query-document pairs
- Top rerank_top_k: 3 highest-scoring chunks returned
Reranking adds 50-200ms latency but can improve precision by 10-30%.
Without Reranking
Query: “SSL certificate installation”
Results:
- “Configuring SSL/TLS in production” (score: 0.82)
- “Certificate renewal procedures” (score: 0.79)
- “Installing packages via apt-get” (score: 0.75)
- “SSL certificate generation guide” (score: 0.74)
Result 3 is a false positive (mentions “install” but wrong context).
MMR (Maximal Marginal Relevance)
Diversify results to reduce redundancy:
```json
{
  "mmr": true,
  "mmr_lambda": 0.5,
  "top_k": 10
}
```

MMR balances relevance and diversity:
- mmr_lambda: 1.0 - Pure relevance (may return duplicates)
- mmr_lambda: 0.5 - Balance relevance and diversity
- mmr_lambda: 0.0 - Pure diversity (may return less relevant results)
Use cases:
- Search UIs (avoid repetitive results)
- RAG with long context (maximize information density)
- Exploratory search (discover related topics)
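The MMR selection loop is straightforward to sketch. This illustrative version assumes a per-chunk relevance score from the initial retrieval and a pairwise similarity function (e.g. cosine over chunk embeddings); it is not M3 Forge's exact implementation:

```typescript
// Greedy MMR: at each step pick the candidate maximizing
//   lambda * relevance - (1 - lambda) * max similarity to already-selected.
function mmrSelect(
  candidates: string[],
  relevance: Map<string, number>,
  sim: (a: string, b: string) => number,
  lambda: number, // mmr_lambda: 1.0 = pure relevance, 0.0 = pure diversity
  k: number,
): string[] {
  const selected: string[] = [];
  const remaining = new Set(candidates);
  while (selected.length < k && remaining.size > 0) {
    let best: string | null = null;
    let bestScore = -Infinity;
    for (const c of remaining) {
      const maxSim = selected.length
        ? Math.max(...selected.map((s) => sim(c, s)))
        : 0; // first pick is pure relevance
      const score = lambda * (relevance.get(c) ?? 0) - (1 - lambda) * maxSim;
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    selected.push(best!);
    remaining.delete(best!);
  }
  return selected;
}
```

The first pick is always the most relevant chunk; later picks trade relevance against redundancy with what is already selected.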
Contextual Expansion
Include surrounding chunks for better context:
```json
{
  "expand_context": true,
  "context_window": 1
}
```

For each matched chunk, retrieve:
- context_window: 1 - Previous and next chunk
- context_window: 2 - Two chunks before and after
Expanded context improves LLM understanding but increases token usage.
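Expansion amounts to widening each matched chunk index by the window and merging overlaps so no chunk is returned twice. A minimal sketch, assuming chunks are addressed by sequential index within a document:

```typescript
// Expand matched chunk indices by `window` neighbors on each side,
// clamped to document bounds, with overlapping windows merged.
function expandContext(
  matched: number[],
  window: number,
  totalChunks: number,
): number[] {
  const indices = new Set<number>();
  for (const i of matched) {
    const lo = Math.max(0, i - window);
    const hi = Math.min(totalChunks - 1, i + window);
    for (let j = lo; j <= hi; j++) indices.add(j);
  }
  return [...indices].sort((a, b) => a - b);
}
```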
Search Response
Successful query returns:
```json
{
  "results": [
    {
      "chunk_id": "doc123-chunk5",
      "text": "To configure SSL certificates, navigate to...",
      "score": 0.87,
      "metadata": {
        "document_id": "doc123",
        "filename": "ssl-guide.pdf",
        "page": 12,
        "product": "enterprise",
        "version": "2.5.0"
      },
      "highlights": ["SSL", "certificates", "configure"]
    }
  ],
  "total": 127,
  "latency_ms": 42
}
```

Fields:
- chunk_id - Unique identifier for the text chunk
- text - Chunk content (truncated if > 2000 chars)
- score - Similarity score (0.0-1.0)
- metadata - Custom fields attached during indexing
- highlights - Query terms found in chunk (keyword search only)
- total - Total matching chunks before top-k filtering
- latency_ms - Query execution time
Relevance Tuning
Evaluation Metrics
Measure search quality with:
| Metric | Definition | Target |
|---|---|---|
| Precision@K | Relevant results in top-K / K | > 0.8 |
| Recall@K | Relevant results in top-K / Total relevant | > 0.6 |
| MRR (Mean Reciprocal Rank) | 1 / rank of first relevant result | > 0.7 |
| NDCG (Normalized Discounted Cumulative Gain) | Ranking quality weighted by position | > 0.75 |
Use the Evaluations dashboard to track metrics over time and compare configurations.
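Precision@K and MRR are simple to compute offline against a labeled query set, which is useful for sanity-checking configurations before the dashboard has data. A sketch, where `relevant` holds the chunk IDs judged relevant for a query:

```typescript
// Fraction of the top-K results that are relevant.
function precisionAtK(results: string[], relevant: Set<string>, k: number): number {
  return results.slice(0, k).filter((id) => relevant.has(id)).length / k;
}

// Reciprocal rank of the first relevant result (0 if none found).
function mrr(results: string[], relevant: Set<string>): number {
  const rank = results.findIndex((id) => relevant.has(id));
  return rank === -1 ? 0 : 1 / (rank + 1);
}
```

Averaging `mrr` over a query set gives the Mean Reciprocal Rank in the table above.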
Tuning Strategies
Poor Precision (too many irrelevant results):
- Increase threshold to 0.75+
- Switch to hybrid_alpha: 0.5 (more keyword weight)
- Enable reranking
- Reduce top_k to focus on highest-scoring chunks
Poor Recall (missing relevant results):
- Decrease threshold to 0.5-0.6
- Increase top_k to 20-50
- Switch to search_type: semantic (pure vector search)
- Add query expansion (synonyms, related terms)
Noisy Results (lots of near-duplicates):
- Enable MMR with mmr_lambda: 0.5
- Increase chunk size during indexing
- Add metadata filters to narrow search scope
A/B Testing
Compare search configurations:
- Create two index versions with different chunking/embedding
- Route 50% of queries to each index
- Measure precision, recall, latency
- Promote winning configuration
The Evaluations section provides built-in A/B test tracking.
Integration Patterns
Workflow Node
Use RAG Retrieval Node in workflows:
```json
{
  "type": "rag-retrieval",
  "config": {
    "index_id": "support-docs",
    "query": "$.data.user_question",
    "search_type": "hybrid",
    "top_k": 5,
    "threshold": 0.7,
    "metadata_filter": {
      "category": "$.data.product_category"
    }
  }
}
```

Output is available at $.nodes.<node_id>.output.chunks for downstream LLM nodes.
API Client
Query via REST API:
```shell
curl -X POST https://your-instance/api/rag/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "index_id": "customer-docs",
    "query": "database backup procedures",
    "search_type": "hybrid",
    "top_k": 5
  }'
```

tRPC Client (TypeScript)
Type-safe queries from M3 Forge frontend:
```typescript
const { data } = trpc.rag.search.useQuery({
  indexId: 'product-docs',
  query: searchQuery,
  searchType: 'hybrid',
  topK: 10,
  metadataFilter: {
    product: selectedProduct,
  },
});
```

Performance Optimization
Query Latency
Typical latencies:
- Semantic search: 20-50ms (vector index lookup)
- Keyword search: 10-30ms (full-text index)
- Hybrid search: 40-80ms (both indexes + fusion)
- With reranking: +50-200ms (cross-encoder inference)
Optimization techniques:
- Use smaller embedding models (768-dim vs 3072-dim)
- Cache frequent queries (Redis/Memcached)
- Partition large indexes by metadata
- Use approximate nearest neighbors (ANN) for > 1M chunks
Cost Control
Reduce embedding API costs:
- Cache query embeddings (same query = same embedding)
- Use smaller top_k (fewer chunks to embed/rank)
- Batch queries when possible
- Choose cost-efficient models (Jina v4 vs OpenAI)
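Query-embedding caching is the cheapest win: identical queries produce identical embeddings, so the embedding API call can be skipped entirely. A minimal in-process sketch (a shared cache such as Redis is the more common production choice); `embed` stands in for whatever embedding API the index uses:

```typescript
const embeddingCache = new Map<string, number[]>();

// Return a cached embedding if we have one, otherwise call the API once
// and remember the result. Normalization makes trivial variants cache-hit.
async function cachedEmbed(
  query: string,
  embed: (q: string) => Promise<number[]>,
): Promise<number[]> {
  const key = query.trim().toLowerCase();
  const hit = embeddingCache.get(key);
  if (hit) return hit;
  const vector = await embed(key);
  embeddingCache.set(key, vector);
  return vector;
}
```

An unbounded Map will grow forever; in practice cap it (LRU) or set a TTL so stale entries expire if the index's embedding model changes.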
Best Practices
Query Formulation
Good queries:
- “How do I configure SSL certificates?” (natural language, specific)
- “database connection error SQLSTATE[HY000]” (mixed natural + technical)
- “backup procedures for PostgreSQL” (clear intent, key terms)
Poor queries:
- “help” (too vague, no context)
- “How do I do the thing with the stuff?” (ambiguous pronouns)
- Overly long queries (> 100 words) - truncate or summarize first
Result Presentation
When displaying search results to users:
- Show snippet with highlighted query terms
- Include source document name and page number
- Link to full document for context
- Display relevance score for transparency
Monitoring
Track in production:
- Query latency (p50, p95, p99)
- Result quality (CTR, user feedback)
- Cache hit rate (for query caching)
- Error rate (failed queries, timeouts)
Set up alerts for latency spikes or quality degradation.
Troubleshooting
No Results Returned
Possible causes:
- threshold too high (relax to 0.5)
- metadata_filter too restrictive (check filter logic)
- Query embedding mismatch (ensure same model as index)
- Index is empty (verify documents are uploaded)
Irrelevant Results
Solutions:
- Increase threshold to 0.75+
- Enable reranking
- Switch to hybrid search
- Add metadata filters
- Review chunking strategy (chunks may be too large/small)
High Latency
Optimizations:
- Reduce top_k to minimum needed
- Disable reranking for non-critical paths
- Partition index by metadata
- Scale vector database horizontally
Cross-Language Search
For multilingual retrieval:
- Use multilingual embedding model (Jina v4, Cohere multilingual-v3)
- Index documents in all target languages
- Queries automatically work across languages
- Consider language-specific indexes for better precision
Next Steps
- Configure RAG indexes optimized for your use case
- Build RAG workflows with retrieval and generation nodes
- Monitor search quality in the Evaluations dashboard