Summarizer
Generate summaries for short and long documents. Uses Generative AI to produce concise document summaries without requiring training data.
Overview
The Summarizer processor uses LLM-based Generative AI to produce concise summaries of documents. Unlike other custom processor types, the Summarizer works out of the box without training — it uses pre-configured prompt templates and language models to generate summaries immediately.
Use cases:
- Document digests for quick review
- Executive summaries of lengthy reports
- Content previews in document management systems
- Meeting notes condensation
- Legal document summarization
- Research paper abstracts
Creating a Summarizer
Navigate to Processors → Custom Processors and click Create processor on the Summarizer card. Enter a name (e.g., “Report Summarizer”) and click Create.
The Summarizer is ready to use immediately — no training data or labeling required.
How It Works
The Summarizer uses the LLM annotator pipeline in marie-ai:
- Document ingestion — OCR text extracted from document pages
- Context assembly — Page text aggregated with document metadata
- Prompt rendering — Jinja2 summary template populated with document content
- LLM generation — Language model generates summary from prompt
- Output formatting — Summary returned in configured format (markdown or plain text)
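The prompt-rendering step can be illustrated with a small Jinja2 sketch. The template text and variable names below are assumptions for illustration, not the actual template shipped with marie-ai:

```python
from jinja2 import Template

# Hypothetical summary prompt template -- the real marie-ai template
# and its variable names may differ.
SUMMARY_TEMPLATE = Template(
    "Summarize the following document in concise {{ output_format }}.\n"
    "Title: {{ title }}\n"
    "---\n"
    "{{ page_text }}\n"
)

def render_summary_prompt(title: str, page_text: str,
                          output_format: str = "markdown") -> str:
    """Render the LLM prompt from aggregated OCR text and metadata."""
    return SUMMARY_TEMPLATE.render(
        title=title, page_text=page_text, output_format=output_format
    )

prompt = render_summary_prompt("Q3 Report", "Revenue grew 12% year over year...")
```

The rendered prompt is then sent to the configured language model, and the raw completion is post-processed into the configured output format.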
Configuration
The Summarizer is configured through the annotator pipeline:
| Setting | Description | Default |
|---|---|---|
| Model | Language model for generation | Configurable per deployment |
| Output Format | Summary format | Markdown |
| Processing Mode | Per-page or full document | Full document |
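A configuration for these settings might look like the following sketch. The key names and model identifier are assumptions that mirror the table above, not the shipped marie-ai schema:

```python
# Hypothetical annotator-pipeline configuration for the Summarizer.
# Key names mirror the settings table; they are illustrative only.
summarizer_config = {
    "model": "deepseek-r1-32b",     # any model from your LLM Connections
    "output_format": "markdown",    # or "plain"
    "processing_mode": "document",  # "document" (full) or "page" (per-page)
}

# Basic validation of the configured values.
assert summarizer_config["output_format"] in ("markdown", "plain")
assert summarizer_config["processing_mode"] in ("document", "page")
```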
Supported Models
The Summarizer works with any configured LLM connection:
- DeepSeek R1 (32B)
- Qwen 2.5 VL (vision-language)
- OpenAI GPT-4
- Anthropic Claude
- Any model configured in LLM Connections
The Summarizer uses Generative AI and does not require training. It works immediately after creation using your configured LLM connections.
Integration with Workflows
Add the Summarizer as a workflow node to automatically generate summaries during document processing:
Document Upload → OCR → Summarizer → Store Summary
                                   → Notify Stakeholders

Common workflow patterns:
- Intake summary — Summarize uploaded documents for quick triage
- Post-extraction summary — Generate summary after field extraction for human review
- Batch processing — Summarize all documents in a submission
- Conditional summary — Only summarize documents exceeding a page threshold
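The conditional-summary pattern amounts to a simple predicate gating the Summarizer node. A minimal sketch, where the threshold value and function name are illustrative assumptions:

```python
def should_summarize(page_count: int, threshold: int = 3) -> bool:
    """Gate the Summarizer node: only summarize documents that
    exceed the configured page threshold."""
    return page_count > threshold

# A 10-page report is summarized; a 2-page memo is passed through.
assert should_summarize(10)
assert not should_summarize(2)
```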
Output
The Summarizer produces:
- Summary text — Concise document summary in markdown or plain text
- Confidence — Generation confidence score
- Token usage — Input and output token counts for cost tracking
Output is accessible in downstream workflow nodes via JSONPath.
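The sketch below shows what reading those fields via a JSONPath expression might look like. The payload field names are assumptions based on the output list above, and the dotted-path resolver is a minimal stand-in for a full JSONPath engine:

```python
# Hypothetical Summarizer output payload -- field names are
# illustrative, not a documented schema.
result = {
    "summary": "The report covers Q3 revenue growth and hiring plans.",
    "confidence": 0.91,
    "token_usage": {"input": 1840, "output": 96},
}

def get_path(payload: dict, path: str):
    """Resolve simple dotted JSONPath expressions like '$.token_usage.output'.
    Real workflow nodes would use a full JSONPath implementation."""
    node = payload
    for key in path.lstrip("$.").split("."):
        node = node[key]
    return node

summary_text = get_path(result, "$.summary")
output_tokens = get_path(result, "$.token_usage.output")
```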
Best Practices
- Choose the right model — Larger models produce better summaries for complex documents
- Set appropriate length — Configure max tokens based on your summary needs
- Use in pipelines — Combine with classifiers and extractors for comprehensive processing
- Monitor costs — Track token usage in LLM Observability
- Review quality — Spot-check summaries to ensure accuracy for your document types
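Because the output includes input and output token counts, cost monitoring can be as simple as multiplying by your provider's rates. The per-1K-token prices below are placeholder assumptions; substitute your actual pricing:

```python
# Placeholder per-1K-token prices -- check your LLM provider's pricing.
INPUT_PRICE_PER_1K = 0.003
OUTPUT_PRICE_PER_1K = 0.015

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough per-document cost estimate from the Summarizer's token usage."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

cost = estimate_cost(1840, 96)
```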
Comparison with Other Processor Types
| Feature | Summarizer | Extractor | Classifier | Splitter | Layout |
|---|---|---|---|---|---|
| Requires training | No | Yes | Yes | Yes | Yes |
| Output type | Free-form text | Structured fields | Category label | Split points | Variation ID |
| Uses LLM | Yes (always) | Optional | No (LayoutLMv3) | No (LayoutLMv3) | No (LayoutLMv3) |
| Min examples | 0 | 10+ | 10+ per class | 10+ | 5+ per variation |
Next Steps
- Configure LLM Connections for model access
- Build summarization Workflows
- Monitor costs in LLM Observability
- Explore Custom Extractor for structured data extraction