
Summarizer

Generate summaries for short and long documents. Uses Generative AI to produce concise document summaries without requiring training data.

Overview

The Summarizer processor uses LLM-based Generative AI to produce concise summaries of documents. Unlike other custom processor types, the Summarizer works out of the box without training — it uses pre-configured prompt templates and language models to generate summaries immediately.

Use cases:

  • Document digests for quick review
  • Executive summaries of lengthy reports
  • Content previews in document management systems
  • Meeting notes condensation
  • Legal document summarization
  • Research paper abstracts

Creating a Summarizer

Navigate to Processors → Custom Processors and click Create processor on the Summarizer card. Enter a name (e.g., “Report Summarizer”) and click Create.

The Summarizer is ready to use immediately — no training data or labeling required.

How It Works

The Summarizer uses the LLM annotator pipeline in marie-ai:

  1. Document ingestion — OCR text extracted from document pages
  2. Context assembly — Page text aggregated with document metadata
  3. Prompt rendering — Jinja2 summary template populated with document content
  4. LLM generation — Language model generates summary from prompt
  5. Output formatting — Summary returned in configured format (markdown or plain text)
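Steps 2–3 above can be sketched as follows. marie-ai renders its prompts with Jinja2 templates; this dependency-free sketch uses plain `str.format` instead, and the template text and variable names are illustrative, not the actual marie-ai templates:

```python
# Minimal sketch of context assembly and prompt rendering.
# The real pipeline uses Jinja2; the template below is illustrative only.

PROMPT_TEMPLATE = (
    "Summarize the following {page_count}-page document in at most "
    "{max_sentences} sentences:\n\n{body}"
)

def render_summary_prompt(pages, max_sentences=5):
    """Aggregate OCR page text (step 2) and render the prompt (step 3)."""
    body = "\n\n".join(
        f"--- Page {i} ---\n{text}" for i, text in enumerate(pages, start=1)
    )
    return PROMPT_TEMPLATE.format(
        page_count=len(pages), max_sentences=max_sentences, body=body
    )

prompt = render_summary_prompt(["Revenue grew 12%.", "Costs fell 3%."])
```

The rendered prompt is then passed to the configured language model (step 4), whose raw completion is formatted per the output settings (step 5).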

Configuration

The Summarizer is configured through the annotator pipeline:

| Setting | Description | Default |
| --- | --- | --- |
| Model | Language model for generation | Configurable per deployment |
| Output Format | Summary format | Markdown |
| Processing Mode | Per-page or full document | Full document |
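The settings above might map onto an annotator pipeline configuration along these lines. The key names and model identifier are illustrative assumptions, not the actual marie-ai configuration schema; consult your deployment's pipeline config for the real keys:

```python
# Hypothetical Summarizer pipeline configuration mirroring the settings table.
# Key names and values are illustrative, not the actual marie-ai schema.
summarizer_config = {
    "model": "deepseek-r1-32b",     # any configured LLM connection
    "output_format": "markdown",    # "markdown" or "plain"
    "processing_mode": "document",  # "document" (full) or "page" (per-page)
}
```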

Supported Models

The Summarizer works with any configured LLM connection:

  • DeepSeek R1 (32B)
  • Qwen 2.5 VL (vision-language)
  • OpenAI GPT-4
  • Anthropic Claude
  • Any model configured in LLM Connections

The Summarizer uses Generative AI and does not require training. It works immediately after creation using your configured LLM connections.

Integration with Workflows

Add the Summarizer as a workflow node to automatically generate summaries during document processing:

Document Upload → OCR → Summarizer → Store Summary → Notify Stakeholders

Common workflow patterns:

  • Intake summary — Summarize uploaded documents for quick triage
  • Post-extraction summary — Generate summary after field extraction for human review
  • Batch processing — Summarize all documents in a submission
  • Conditional summary — Only summarize documents exceeding a page threshold
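The conditional-summary pattern, for example, amounts to a simple gate in front of the Summarizer node. A minimal sketch, in which `summarize` stands in for the Summarizer node and the document shape and threshold are illustrative:

```python
PAGE_THRESHOLD = 10  # illustrative cutoff: only summarize longer documents

def maybe_summarize(document, summarize):
    """Run the Summarizer only for documents exceeding the page threshold.

    `document` is assumed to expose a "page_count" key; `summarize` is a
    stand-in for invoking the workflow's Summarizer node.
    """
    if document["page_count"] > PAGE_THRESHOLD:
        return summarize(document)
    return None  # short documents skip the summary step

result = maybe_summarize(
    {"page_count": 25, "text": "..."},
    summarize=lambda doc: f"Summary of {doc['page_count']} pages",
)
```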

Output

The Summarizer produces:

  • Summary text — Concise document summary in markdown or plain text
  • Confidence — Generation confidence score
  • Token usage — Input and output token counts for cost tracking

Output is accessible in downstream workflow nodes via JSONPath.
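For example, assuming the Summarizer output lands in the workflow payload under a `summary` key (the payload shape shown is an assumption, not the documented schema), a downstream node could read it with a JSONPath-style expression such as `$.summary.text`. A real workflow engine would use a full JSONPath library; this minimal resolver handles plain key access only:

```python
def jsonpath_get(payload, path):
    """Resolve a simple dot-separated JSONPath ($.a.b) against a dict.
    Handles plain key access only, unlike a full JSONPath implementation."""
    node = payload
    for key in path.lstrip("$.").split("."):
        node = node[key]
    return node

# Illustrative output payload; field names are assumptions.
output = {
    "summary": {"text": "Quarterly revenue grew 12%...", "confidence": 0.91},
    "usage": {"input_tokens": 4210, "output_tokens": 156},
}

summary_text = jsonpath_get(output, "$.summary.text")
```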

Best Practices

  1. Choose the right model — Larger models produce better summaries for complex documents
  2. Set appropriate length — Configure max tokens based on your summary needs
  3. Use in pipelines — Combine with classifiers and extractors for comprehensive processing
  4. Monitor costs — Track token usage in LLM Observability
  5. Review quality — Spot-check summaries to ensure accuracy for your document types
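Cost monitoring (practice 4) reduces to simple arithmetic over the token counts the Summarizer reports. The per-token rates below are placeholders, not real pricing; substitute your provider's actual rates:

```python
# Placeholder per-1K-token rates; substitute your provider's real pricing.
RATE_PER_1K_INPUT = 0.003
RATE_PER_1K_OUTPUT = 0.015

def summary_cost(input_tokens, output_tokens):
    """Estimate the cost of one summarization call from its token usage."""
    return (
        input_tokens / 1000 * RATE_PER_1K_INPUT
        + output_tokens / 1000 * RATE_PER_1K_OUTPUT
    )

cost = summary_cost(input_tokens=4210, output_tokens=156)
```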

Comparison with Other Processor Types

| Feature | Summarizer | Extractor | Classifier | Splitter | Layout |
| --- | --- | --- | --- | --- | --- |
| Requires training | No | Yes | Yes | Yes | Yes |
| Output type | Free-form text | Structured fields | Category label | Split points | Variation ID |
| Uses LLM | Yes (always) | Optional | No (LayoutLMv3) | No (LayoutLMv3) | No (LayoutLMv3) |
| Min examples | 0 | 10+ | 10+ per class | 10+ | 5+ per variation |
