
Summarizer

Generate summaries for short and long documents. Uses Generative AI to produce concise document summaries without requiring training data.

Overview

The Summarizer processor uses LLM-based Generative AI to produce concise summaries of documents. Unlike other custom processor types, the Summarizer works out of the box without training — it uses pre-configured prompt templates and language models to generate summaries immediately.

Use cases:

  • Document digests for quick review
  • Executive summaries of lengthy reports
  • Content previews in document management systems
  • Meeting notes condensation
  • Legal document summarization
  • Research paper abstracts

Creating a Summarizer

Navigate to Processors → Custom Processors and click Create processor on the Summarizer card. Enter a name (e.g., “Report Summarizer”) and click Create.

The Summarizer is ready to use immediately — no training data or labeling required.

How It Works

The Summarizer uses the LLM annotator pipeline in marie-ai:

  1. Document ingestion — OCR text extracted from document pages
  2. Context assembly — Page text aggregated with document metadata
  3. Prompt rendering — Jinja2 summary template populated with document content
  4. LLM generation — Language model generates summary from prompt
  5. Output formatting — Summary returned in configured format (markdown or plain text)
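Steps 2–3 above can be sketched as follows. marie-ai renders its prompts with Jinja2 templates; this dependency-free sketch uses plain `str.format` instead, and the template text and variable names are illustrative, not the actual marie-ai templates:

```python
# Minimal sketch of context assembly and prompt rendering.
# The real pipeline uses Jinja2; the template below is illustrative only.

PROMPT_TEMPLATE = (
    "Summarize the following {page_count}-page document in at most "
    "{max_sentences} sentences:\n\n{body}"
)

def render_summary_prompt(pages, max_sentences=5):
    """Aggregate OCR page text (step 2) and render the prompt (step 3)."""
    body = "\n\n".join(
        f"--- Page {i} ---\n{text}" for i, text in enumerate(pages, start=1)
    )
    return PROMPT_TEMPLATE.format(
        page_count=len(pages), max_sentences=max_sentences, body=body
    )

prompt = render_summary_prompt(["Revenue grew 12%.", "Costs fell 3%."])
```

The rendered prompt is then passed to the configured language model (step 4), whose raw completion is formatted per the output settings (step 5).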

Configuration

The Summarizer is configured through the annotator pipeline:

| Setting | Description | Default |
| --- | --- | --- |
| Model | Language model for generation | Configurable per deployment |
| Output Format | Summary format | Markdown |
| Processing Mode | Per-page or full document | Full document |
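The settings above might map onto an annotator pipeline configuration along these lines. The key names and model identifier are illustrative assumptions, not the actual marie-ai configuration schema; consult your deployment's pipeline config for the real keys:

```python
# Hypothetical Summarizer pipeline configuration mirroring the settings table.
# Key names and values are illustrative, not the actual marie-ai schema.
summarizer_config = {
    "model": "deepseek-r1-32b",     # any configured LLM connection
    "output_format": "markdown",    # "markdown" or "plain"
    "processing_mode": "document",  # "document" (full) or "page" (per-page)
}
```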

Supported Models

The Summarizer works with any configured LLM connection:

  • DeepSeek R1 (32B)
  • Qwen 2.5 VL (vision-language)
  • OpenAI GPT-4
  • Anthropic Claude
  • Any model configured in LLM Connections

The Summarizer uses Generative AI and does not require training. It works immediately after creation using your configured LLM connections.

Integration with Workflows

Add the Summarizer as a workflow node to automatically generate summaries during document processing:

Document Upload → OCR → Summarizer → Store Summary → Notify Stakeholders

Common workflow patterns:

  • Intake summary — Summarize uploaded documents for quick triage
  • Post-extraction summary — Generate summary after field extraction for human review
  • Batch processing — Summarize all documents in a submission
  • Conditional summary — Only summarize documents exceeding a page threshold
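The conditional-summary pattern, for example, amounts to a simple gate in front of the Summarizer node. A minimal sketch, in which `summarize` stands in for the Summarizer node and the document shape and threshold are illustrative:

```python
PAGE_THRESHOLD = 10  # illustrative cutoff: only summarize longer documents

def maybe_summarize(document, summarize):
    """Run the Summarizer only for documents exceeding the page threshold.

    `document` is assumed to expose a "page_count" key; `summarize` is a
    stand-in for invoking the workflow's Summarizer node.
    """
    if document["page_count"] > PAGE_THRESHOLD:
        return summarize(document)
    return None  # short documents skip the summary step

result = maybe_summarize(
    {"page_count": 25, "text": "..."},
    summarize=lambda doc: f"Summary of {doc['page_count']} pages",
)
```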

Output

The Summarizer produces:

  • Summary text — Concise document summary in markdown or plain text
  • Confidence — Generation confidence score
  • Token usage — Input and output token counts for cost tracking

Output is accessible in downstream workflow nodes via JSONPath.
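For example, assuming the Summarizer output lands in the workflow payload under a `summary` key (the payload shape shown is an assumption, not the documented schema), a downstream node could read it with a JSONPath-style expression such as `$.summary.text`. A real workflow engine would use a full JSONPath library; this minimal resolver handles plain key access only:

```python
def jsonpath_get(payload, path):
    """Resolve a simple dot-separated JSONPath ($.a.b) against a dict.
    Handles plain key access only, unlike a full JSONPath implementation."""
    node = payload
    for key in path.lstrip("$.").split("."):
        node = node[key]
    return node

# Illustrative output payload; field names are assumptions.
output = {
    "summary": {"text": "Quarterly revenue grew 12%...", "confidence": 0.91},
    "usage": {"input_tokens": 4210, "output_tokens": 156},
}

summary_text = jsonpath_get(output, "$.summary.text")
```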

Best Practices

  1. Choose the right model — Larger models produce better summaries for complex documents
  2. Set appropriate length — Configure max tokens based on your summary needs
  3. Use in pipelines — Combine with classifiers and extractors for comprehensive processing
  4. Monitor costs — Track token usage in LLM Observability
  5. Review quality — Spot-check summaries to ensure accuracy for your document types
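Cost monitoring (practice 4) reduces to simple arithmetic over the token counts the Summarizer reports. The per-token rates below are placeholders, not real pricing; substitute your provider's actual rates:

```python
# Placeholder per-1K-token rates; substitute your provider's real pricing.
RATE_PER_1K_INPUT = 0.003
RATE_PER_1K_OUTPUT = 0.015

def summary_cost(input_tokens, output_tokens):
    """Estimate the cost of one summarization call from its token usage."""
    return (
        input_tokens / 1000 * RATE_PER_1K_INPUT
        + output_tokens / 1000 * RATE_PER_1K_OUTPUT
    )

cost = summary_cost(input_tokens=4210, output_tokens=156)
```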

Comparison with Other Processor Types

| Feature | Summarizer | Extractor | Classifier | Splitter | Layout |
| --- | --- | --- | --- | --- | --- |
| Requires training | No | Yes | Yes | Yes | Yes |
| Output type | Free-form text | Structured fields | Category label | Split points | Variation ID |
| Uses LLM | Yes (always) | Optional | No (LayoutLMv3) | No (LayoutLMv3) | No (LayoutLMv3) |
| Min examples | 0 | 10+ | 10+ per class | 10+ | 5+ per variation |
