Search

Query RAG indexes with semantic search, hybrid ranking, and advanced filtering for optimal document retrieval.

Search Methods

M3 Forge supports three retrieval strategies, each optimized for different use cases.

Semantic Search

Vector similarity search using cosine similarity between query and document embeddings:

{
  "index_id": "customer-docs",
  "query": "How do I configure SSL certificates?",
  "search_type": "semantic",
  "top_k": 5
}

How it works:

  1. The query is embedded with the same model used for the documents
  2. The vector database computes cosine similarity between the query vector and all chunk vectors
  3. The top-K most similar chunks are returned, ranked by score
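
The ranking step can be sketched in TypeScript (an illustrative sketch, not M3 Forge's actual implementation):

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank chunks by similarity to the query embedding and keep the top K.
function topK(
  query: number[],
  chunks: { id: string; embedding: number[] }[],
  k: number,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Real vector databases replace the exhaustive scan with an index, but the score and ordering are the same.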

Best for:

  • Natural language questions
  • Conceptual queries (“how to improve performance”)
  • Multilingual search (with multilingual embeddings)
  • Queries with synonyms or paraphrasing

Limitations:

  • May miss exact keyword matches
  • Struggles with rare terms, acronyms, product codes
  • Requires query and documents in similar semantic space

Keyword Search (BM25)

Traditional full-text search ranked with BM25 (a refinement of TF-IDF term weighting):

{
  "index_id": "product-catalog",
  "query": "SKU-12345",
  "search_type": "keyword",
  "top_k": 10
}

Best for:

  • Exact matches (SKUs, error codes, IDs)
  • Rare or technical terms
  • Short queries (1-3 words)
  • Deterministic retrieval (same query always returns same results)

Limitations:

  • No semantic understanding
  • Sensitive to query phrasing
  • Weak on long-form questions
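
For intuition, a single-term BM25 contribution can be sketched as follows (illustrative only; the k1 = 1.2 and b = 0.75 defaults are the common textbook values, not necessarily what M3 Forge uses):

```typescript
// BM25 score of a single query term against one document.
function bm25Term(
  termFreq: number,    // occurrences of the term in this document
  docLen: number,      // tokens in this document
  avgDocLen: number,   // average tokens per document in the index
  docCount: number,    // total documents in the index
  docsWithTerm: number,
  k1 = 1.2,
  b = 0.75,
): number {
  // Rare terms get a large inverse-document-frequency weight.
  const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Term frequency saturates (k1) and is normalized by document length (b).
  const norm =
    (termFreq * (k1 + 1)) /
    (termFreq + k1 * (1 - b + b * (docLen / avgDocLen)));
  return idf * norm;
}
```

This is why BM25 excels at rare terms like SKUs: a term appearing in one document out of thousands carries a far larger IDF weight than a common word.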

Hybrid Search

Combines semantic and keyword search with weighted score fusion:

{
  "index_id": "support-tickets",
  "query": "database connection error SQLSTATE[HY000]",
  "search_type": "hybrid",
  "top_k": 5,
  "hybrid_alpha": 0.7
}

hybrid_alpha controls the balance:

  • 0.0 - Pure keyword search
  • 0.5 - Equal weighting
  • 0.7 (default) - Favor semantic with a keyword boost
  • 1.0 - Pure semantic search
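
The fusion itself reduces to a weighted sum. A minimal sketch, assuming both score sets are already normalized to [0, 1]:

```typescript
// Fuse normalized semantic and keyword scores with a single alpha weight.
// alpha = 1.0 -> pure semantic; alpha = 0.0 -> pure keyword.
function fuseScores(
  semantic: Map<string, number>,
  keyword: Map<string, number>,
  alpha: number,
): Map<string, number> {
  const fused = new Map<string, number>();
  const ids = new Set([...semantic.keys(), ...keyword.keys()]);
  for (const id of ids) {
    const s = semantic.get(id) ?? 0; // absent from one index counts as 0
    const k = keyword.get(id) ?? 0;
    fused.set(id, alpha * s + (1 - alpha) * k);
  }
  return fused;
}
```

A chunk found by only one index still gets a fused score, just without the other component's contribution.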

Best for:

  • General-purpose retrieval (most use cases)
  • Mixed queries (natural language + specific terms)
  • Production systems requiring robustness

Hybrid search provides the best balance of precision and recall. Start with hybrid_alpha: 0.7 and tune based on evaluation metrics.

Query Parameters

Top-K Results

Control result count with top_k:

{
  "top_k": 5  // Return 5 most relevant chunks
}

Guidelines:

  • RAG Context: 3-5 chunks (fits most LLM context windows)
  • Search UI: 10-20 chunks (user browses results)
  • Reranking: 50-100 chunks (reranker selects best subset)

More results increase recall but reduce precision and add latency.

Similarity Threshold

Filter results by minimum similarity score:

{
  "threshold": 0.7  // Only return chunks with score >= 0.7
}

Scores range from 0.0 (no similarity) to 1.0 (identical):

  • 0.9+ - Near-duplicate content
  • 0.7-0.9 - Highly relevant
  • 0.5-0.7 - Somewhat relevant
  • < 0.5 - Weak relevance (likely noise)

Thresholds prevent low-quality results from polluting LLM context.

Metadata Filtering

Restrict search to documents matching metadata criteria:

{
  "metadata_filter": {
    "product": "enterprise",
    "version": ["2.4.0", "2.5.0"],
    "category": "installation"
  }
}

Operators:

  • Equality: "key": "value" - Exact match
  • Array: "key": ["val1", "val2"] - Match any value
  • Range: "date": {"gte": "2024-01-01", "lt": "2024-12-31"} - Numeric/date ranges
  • Existence: "key": {"exists": true} - Field is present

Filters apply before similarity search, reducing search space and improving latency.
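
A sketch of how such a filter might be evaluated against one chunk's metadata (illustrative semantics only; the `matchesFilter` helper is not part of the API):

```typescript
type FilterValue =
  | string
  | string[]
  | { gte?: string; lt?: string; exists?: boolean };

// Check one chunk's metadata against a filter object:
// equality, any-of arrays, gte/lt ranges (string comparison,
// so ISO-8601 dates order correctly), and exists.
function matchesFilter(
  metadata: Record<string, string | undefined>,
  filter: Record<string, FilterValue>,
): boolean {
  for (const [key, cond] of Object.entries(filter)) {
    const value = metadata[key];
    if (typeof cond === "string") {
      if (value !== cond) return false;
    } else if (Array.isArray(cond)) {
      if (value === undefined || !cond.includes(value)) return false;
    } else {
      if (cond.exists !== undefined && (value !== undefined) !== cond.exists) return false;
      if (cond.gte !== undefined && (value === undefined || value < cond.gte)) return false;
      if (cond.lt !== undefined && (value === undefined || value >= cond.lt)) return false;
    }
  }
  return true;
}
```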

Advanced Retrieval

Reranking

Secondary model re-scores top results for improved relevance:

{
  "rerank": true,
  "rerank_model": "cross-encoder/ms-marco-MiniLM-L-12-v2",
  "rerank_top_k": 3
}

Process:

  1. Initial retrieval returns top_k: 50 candidates
  2. Cross-encoder model scores query-document pairs
  3. Top rerank_top_k: 3 highest-scoring chunks returned

Reranking adds 50-200ms latency but can improve precision by 10-30%.
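
The two-stage process above can be sketched as follows (the `crossEncoderScore` callback stands in for a real cross-encoder model call):

```typescript
type Chunk = { id: string; text: string; score: number };

// Two-stage retrieval: a cheap first pass returns many candidates,
// then an expensive pairwise scorer re-ranks that shortlist.
function retrieveAndRerank(
  candidates: Chunk[], // e.g. top 50 from the vector index
  crossEncoderScore: (query: string, text: string) => number,
  query: string,
  rerankTopK: number,
): Chunk[] {
  return candidates
    .map((c) => ({ ...c, score: crossEncoderScore(query, c.text) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, rerankTopK);
}
```

The cross-encoder sees the query and chunk text together, so it can demote false positives that merely share surface keywords.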

Example: initial retrieval for the query “SSL certificate installation”

Results:

  1. “Configuring SSL/TLS in production” (score: 0.82)
  2. “Certificate renewal procedures” (score: 0.79)
  3. “Installing packages via apt-get” (score: 0.75)
  4. “SSL certificate generation guide” (score: 0.74)

Result 3 is a false positive (it mentions “install” but in the wrong context); a reranker would typically demote it.

MMR (Maximal Marginal Relevance)

Diversify results to reduce redundancy:

{
  "mmr": true,
  "mmr_lambda": 0.5,
  "top_k": 10
}

MMR balances relevance and diversity:

  • mmr_lambda: 1.0 - Pure relevance (may return duplicates)
  • mmr_lambda: 0.5 - Balance relevance and diversity
  • mmr_lambda: 0.0 - Pure diversity (may return less relevant results)
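
The greedy selection loop behind MMR can be sketched as follows (illustrative; `mmrSelect` is not an API function):

```typescript
// Greedy MMR: at each step pick the candidate maximizing
// lambda * relevance - (1 - lambda) * (max similarity to already-picked results).
function mmrSelect(
  candidates: { id: string; relevance: number; embedding: number[] }[],
  similarity: (a: number[], b: number[]) => number,
  lambda: number,
  k: number,
): string[] {
  const selected: typeof candidates = [];
  const pool = [...candidates];
  while (selected.length < k && pool.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    for (let i = 0; i < pool.length; i++) {
      // Penalize candidates too similar to results already chosen.
      const redundancy = selected.length === 0
        ? 0
        : Math.max(...selected.map((s) => similarity(pool[i].embedding, s.embedding)));
      const score = lambda * pool[i].relevance - (1 - lambda) * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = i;
      }
    }
    selected.push(pool.splice(bestIdx, 1)[0]);
  }
  return selected.map((s) => s.id);
}
```

With lambda = 0.5, a near-duplicate of an already-selected chunk loses to a less relevant but novel one.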

Use cases:

  • Search UIs (avoid repetitive results)
  • RAG with long context (maximize information density)
  • Exploratory search (discover related topics)

Contextual Expansion

Include surrounding chunks for better context:

{
  "expand_context": true,
  "context_window": 1
}

For each matched chunk, retrieve:

  • context_window: 1 - Previous and next chunk
  • context_window: 2 - Two chunks before and after

Expanded context improves LLM understanding but increases token usage.
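
A sketch of the expansion step, assuming chunk IDs follow a `<doc>-chunk<N>` convention where N is the chunk's position in the document (an assumption for illustration):

```typescript
// Expand each hit to include neighboring chunks from the same document.
function expandContext(hitIds: string[], window: number): string[] {
  const expanded = new Set<string>();
  for (const id of hitIds) {
    const match = id.match(/^(.*-chunk)(\d+)$/);
    if (!match) {
      expanded.add(id); // unrecognized ID: keep the hit as-is
      continue;
    }
    const [, prefix, num] = match;
    const pos = parseInt(num, 10);
    // Add the chunk itself plus `window` chunks on either side.
    for (let i = Math.max(0, pos - window); i <= pos + window; i++) {
      expanded.add(`${prefix}${i}`);
    }
  }
  return [...expanded];
}
```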

Search Response

Successful query returns:

{
  "results": [
    {
      "chunk_id": "doc123-chunk5",
      "text": "To configure SSL certificates, navigate to...",
      "score": 0.87,
      "metadata": {
        "document_id": "doc123",
        "filename": "ssl-guide.pdf",
        "page": 12,
        "product": "enterprise",
        "version": "2.5.0"
      },
      "highlights": ["SSL", "certificates", "configure"]
    }
  ],
  "total": 127,
  "latency_ms": 42
}

Fields:

  • chunk_id - Unique identifier for the text chunk
  • text - Chunk content (truncated if > 2000 chars)
  • score - Similarity score (0.0-1.0)
  • metadata - Custom fields attached during indexing
  • highlights - Query terms found in chunk (keyword search only)
  • total - Total matching chunks before top-k filtering
  • latency_ms - Query execution time

Relevance Tuning

Evaluation Metrics

Measure search quality with:

| Metric | Definition | Target |
| --- | --- | --- |
| Precision@K | Relevant results in top-K / K | > 0.8 |
| Recall@K | Relevant results in top-K / total relevant | > 0.6 |
| MRR (Mean Reciprocal Rank) | 1 / rank of first relevant result | > 0.7 |
| NDCG (Normalized Discounted Cumulative Gain) | Ranking quality weighted by position | > 0.75 |

Use the Evaluations dashboard to track metrics over time and compare configurations.
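
These metrics are straightforward to compute from a ranked result list and a labeled set of relevant chunk IDs; a minimal sketch:

```typescript
// Precision@K, Recall@K and reciprocal rank for one query.
function evaluate(
  ranked: string[],       // result IDs in ranked order
  relevant: Set<string>,  // ground-truth relevant IDs
  k: number,
): { precisionAtK: number; recallAtK: number; reciprocalRank: number } {
  const topK = ranked.slice(0, k);
  const hits = topK.filter((id) => relevant.has(id)).length;
  const firstRelevant = ranked.findIndex((id) => relevant.has(id));
  return {
    precisionAtK: hits / k,
    recallAtK: relevant.size === 0 ? 0 : hits / relevant.size,
    reciprocalRank: firstRelevant === -1 ? 0 : 1 / (firstRelevant + 1),
  };
}
```

MRR is this reciprocal rank averaged over a query set; NDCG additionally weights graded relevance by position.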

Tuning Strategies

Poor Precision (too many irrelevant results):

  • Increase threshold to 0.75+
  • Switch to hybrid_alpha: 0.5 (more keyword weight)
  • Enable reranking
  • Reduce top_k to focus on highest-scoring chunks

Poor Recall (missing relevant results):

  • Decrease threshold to 0.5-0.6
  • Increase top_k to 20-50
  • Switch to search_type: semantic (pure vector search)
  • Add query expansion (synonyms, related terms)

Noisy Results (lots of near-duplicates):

  • Enable MMR with mmr_lambda: 0.5
  • Increase chunk size during indexing
  • Add metadata filters to narrow search scope

A/B Testing

Compare search configurations:

  1. Create two index versions with different chunking/embedding
  2. Route 50% of queries to each index
  3. Measure precision, recall, latency
  4. Promote winning configuration
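
The routing in step 2 should be deterministic per user, so each user sees a consistent variant for the duration of the test; a minimal sketch (the hash and index names are illustrative):

```typescript
// Deterministic 50/50 routing: hash the user/session ID so the same
// user always hits the same index variant during the A/B test.
function chooseVariant(userId: string, variants: [string, string]): string {
  let hash = 0;
  for (const ch of userId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return variants[hash % 2];
}
```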

The Evaluations section provides built-in A/B test tracking.

Integration Patterns

Workflow Node

Use RAG Retrieval Node in workflows:

{
  "type": "rag-retrieval",
  "config": {
    "index_id": "support-docs",
    "query": "$.data.user_question",
    "search_type": "hybrid",
    "top_k": 5,
    "threshold": 0.7,
    "metadata_filter": {
      "category": "$.data.product_category"
    }
  }
}

Output is available at $.nodes.<node_id>.output.chunks for downstream LLM nodes.

API Client

Query via REST API:

curl -X POST https://your-instance/api/rag/search \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "index_id": "customer-docs",
    "query": "database backup procedures",
    "search_type": "hybrid",
    "top_k": 5
  }'

tRPC Client (TypeScript)

Type-safe queries from M3 Forge frontend:

const { data } = trpc.rag.search.useQuery({
  indexId: 'product-docs',
  query: searchQuery,
  searchType: 'hybrid',
  topK: 10,
  metadataFilter: {
    product: selectedProduct,
  },
});

Performance Optimization

Query Latency

Typical latencies:

  • Semantic search: 20-50ms (vector index lookup)
  • Keyword search: 10-30ms (full-text index)
  • Hybrid search: 40-80ms (both indexes + fusion)
  • With reranking: +50-200ms (cross-encoder inference)

Optimization techniques:

  • Use smaller embedding models (768-dim vs 3072-dim)
  • Cache frequent queries (Redis/Memcached)
  • Partition large indexes by metadata
  • Use approximate nearest neighbors (ANN) for > 1M chunks

Cost Control

Reduce embedding API costs:

  • Cache query embeddings (same query = same embedding)
  • Use smaller top_k (fewer chunks to embed/rank)
  • Batch queries when possible
  • Choose cost-efficient models (Jina v4 vs OpenAI)
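
Query-embedding caching can be as simple as a map keyed on normalized query text; a minimal in-process sketch (a shared Redis cache would replace the Map in production, and the `embed` callback stands in for the embedding API call):

```typescript
// In-memory cache for query embeddings: identical query text reuses
// the stored vector instead of calling the embedding API again.
class EmbeddingCache {
  private store = new Map<string, number[]>();
  misses = 0; // number of actual embedding API calls made

  constructor(private embed: (query: string) => number[]) {}

  get(query: string): number[] {
    const key = query.trim().toLowerCase(); // normalize trivial variants
    let vec = this.store.get(key);
    if (!vec) {
      this.misses++;
      vec = this.embed(key);
      this.store.set(key, vec);
    }
    return vec;
  }
}
```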

Best Practices

Query Formulation

Good queries:

  • “How do I configure SSL certificates?” (natural language, specific)
  • “database connection error SQLSTATE[HY000]” (mixed natural + technical)
  • “backup procedures for PostgreSQL” (clear intent, key terms)

Poor queries:

  • “help” (too vague, no context)
  • “How do I do the thing with the stuff?” (ambiguous pronouns)
  • Overly long queries (> 100 words) - truncate or summarize first

Result Presentation

When displaying search results to users:

  • Show snippet with highlighted query terms
  • Include source document name and page number
  • Link to full document for context
  • Display relevance score for transparency

Monitoring

Track in production:

  • Query latency (p50, p95, p99)
  • Result quality (CTR, user feedback)
  • Cache hit rate (for query caching)
  • Error rate (failed queries, timeouts)

Set up alerts for latency spikes or quality degradation.

Troubleshooting

No Results Returned

Possible causes:

  • threshold too high (relax to 0.5)
  • metadata_filter too restrictive (check filter logic)
  • Query embedding mismatch (ensure same model as index)
  • Index is empty (verify documents are uploaded)

Irrelevant Results

Solutions:

  • Increase threshold to 0.75+
  • Enable reranking
  • Switch to hybrid search
  • Add metadata filters
  • Review chunking strategy (chunks may be too large/small)

High Latency

Optimizations:

  • Reduce top_k to minimum needed
  • Disable reranking for non-critical paths
  • Partition index by metadata
  • Scale vector database horizontally

Multilingual Search

For multilingual retrieval:

  1. Use multilingual embedding model (Jina v4, Cohere multilingual-v3)
  2. Index documents in all target languages
  3. Queries automatically work across languages
  4. Consider language-specific indexes for better precision
