Custom Classifier
Group your documents into categories. Train a custom ML model to classify documents according to labels that you define.
Overview
The Custom Classifier categorizes documents into predefined classes. Define your label taxonomy, label example documents, and train a LayoutLMv3-based model that learns to recognize document types from both text content and visual layout.
Use cases:
- Document routing (invoices vs. contracts vs. forms)
- Type detection (W-2 vs. 1099 vs. W-9)
- Topic classification (complaint vs. inquiry vs. feedback)
- Compliance sorting (regulated vs. non-regulated)
- Triage and prioritization
Creating a Custom Classifier
Create Processor
Navigate to Processors → Custom Processors and click Create processor on the Custom Classifier card. Enter a name (e.g., “Document Type Classifier”) and click Create.
Define Labels
Open your classifier and navigate to the Label Management tab. Create labels for each document class:

| Setting | Description | Example |
|---|---|---|
| Label Name | Machine-friendly identifier | invoice |
| Display Name | Human-readable label | “Invoice” |
| Description | What documents belong to this class | “Purchase invoices from vendors” |
| Color | Visual indicator in labeling UI | Blue swatch |
Labels can be reordered by dragging the handle. Enable/disable labels with the toggle. Status badges show green when a label has 10+ training samples.
Aim for at least 10 labeled examples per class for initial training. More examples and balanced class sizes produce better models.
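The 10-sample guideline above can be checked offline before you start a training job. The following is a minimal sketch, assuming your label assignments are available as a plain list of label names. `Label`, `readiness`, and the field names are hypothetical: they mirror the settings table, not an actual API of the product.

```python
from collections import Counter
from dataclasses import dataclass

MIN_SAMPLES = 10  # threshold at which a label's status badge turns green


@dataclass(frozen=True)
class Label:
    """Hypothetical mirror of the label settings table."""
    name: str            # machine-friendly identifier, e.g. "invoice"
    display_name: str    # human-readable label, e.g. "Invoice"
    description: str = ""
    enabled: bool = True


def readiness(labels, assigned):
    """Map each enabled label name to (sample_count, is_ready)."""
    counts = Counter(assigned)
    return {
        lab.name: (counts[lab.name], counts[lab.name] >= MIN_SAMPLES)
        for lab in labels
        if lab.enabled
    }


# Illustrative data: 12 invoices and 4 contracts have been labeled so far.
labels = [Label("invoice", "Invoice"), Label("contract", "Contract")]
assigned = ["invoice"] * 12 + ["contract"] * 4
report = readiness(labels, assigned)
print(report)
```

In this example `contract` is reported as not ready, which also flags a class imbalance worth correcting before training.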
Import Training Documents
Navigate to the Documents tab and import example documents. Include a mix of all document types you want to classify.
- Supported formats: PDF, PNG, JPEG, TIFF, DOCX
- Upload via file picker, drag-and-drop, or S3 import
Label Documents
Click a document to open the Labeling Interface, which uses a two-panel layout:

Left panel — Document preview (3/4 width):
- Full-sized document image
- Zoom controls (0.25x to 3x)
- Rotate document button
- Refresh/reload document
Right panel — Label assignment (1/4 width):
- Scrollable list of all labels with color indicators
- Click a label to assign it to the current document
- The selected label is highlighted with a border and background
- Each label shows its keyboard shortcut (1-9)
Keyboard shortcuts:
- 1-9: Toggle label assignment
- Enter: Confirm label and move to next unlabeled document
- N or →: Next document (without saving)
- P or ←: Previous document
- Esc: Close labeling interface
- ?: Show all shortcuts
The labeling workflow is designed for speed — assign a label and press Enter to move through documents quickly.
Review Dataset
The Dataset Overview tab shows:

- Total documents — All imported documents
- Labeled count — Documents with assigned labels
- Unlabeled count — Documents still needing labels
- Needs Review — Documents flagged for review
- Labeling progress — Percentage bar
- Per-label statistics — Document count per label with color indicators
- Train/Test split — 80/20 split showing assignment counts
Click Auto-assign to randomly distribute unassigned documents into training and test sets.
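For intuition, a random 80/20 assignment can be sketched as follows. This is an illustration only: the function name `auto_assign`, the seed, and the document IDs are assumptions, and the product's own Auto-assign logic may differ (for example, it may stratify by label).

```python
import random


def auto_assign(doc_ids, test_fraction=0.2, seed=0):
    """Randomly split document IDs into (train, test) lists."""
    ids = list(doc_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


doc_ids = [f"doc-{i}" for i in range(50)]
train, test = auto_assign(doc_ids)
print(len(train), len(test))
```

A plain random split like this one does not guarantee that every label appears in the test set, so check the per-label counts after splitting when some classes are small.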
Train Model
Navigate to Training Jobs and click Start Training. The classifier trains a LayoutLMv3-based model that learns from both text content and document layout.
See Training for detailed configuration.
Evaluate Results
After training, the Evaluation dashboard provides:
Summary metrics (4-column grid):
- Final Accuracy (green)
- F1 Score (blue)
- Precision (info)
- Recall (yellow)
Tabs:
| Tab | Content |
|---|---|
| Overview | Training summary, epochs, loss reduction, metrics table |
| Training Curves | Bar charts for Loss and Accuracy over epochs |
| Confusion Matrix | Heatmap showing Actual vs. Predicted labels with intensity scaling |
| Per-Class | Per-label breakdown with support count, metrics, and mini progress bars |

The confusion matrix helps identify commonly confused document types. Click cells to see misclassified examples.
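To make the relationship between the confusion matrix and the per-class metrics concrete, here is a small pure-Python sketch. The label lists are made-up sample data, and the helper names are hypothetical. How the dashboard averages per-class values into its summary metrics is not specified here, so only per-class numbers are computed.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs; each key is one matrix cell."""
    return Counter(zip(actual, predicted))


def per_class_metrics(actual, predicted):
    """Return {label: (precision, recall, f1, support)}."""
    pairs = confusion_counts(actual, predicted)
    out = {}
    for lab in sorted(set(actual) | set(predicted)):
        tp = pairs[(lab, lab)]
        fp = sum(n for (a, p), n in pairs.items() if p == lab and a != lab)
        fn = sum(n for (a, p), n in pairs.items() if a == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[lab] = (precision, recall, f1, tp + fn)  # support = actual count
    return out


actual = ["invoice", "invoice", "invoice", "contract", "contract", "form"]
predicted = ["invoice", "invoice", "contract", "contract", "contract", "form"]
metrics = per_class_metrics(actual, predicted)
for label, (p, r, f1, support) in metrics.items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={support}")
```

Here one invoice is predicted as a contract, which lowers recall for `invoice` and precision for `contract`. That off-diagonal cell is the kind of confusion the heatmap highlights.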
Deploy
Activate the trained version for production use. See Training — Production Deployment.
Dashboard
The classifier dashboard provides:
- Dataset metrics — Total documents, labeled, unlabeled, needs review
- Annotation progress — Percentage bar
- Train/Test split — 80/20 split view with assignment counts
- Per-label statistics — Document counts per label with color badges
- Quick actions — Import Documents, Start Annotating
Backend Architecture
Custom classifiers use the TransformersDocumentClassifier in marie-ai:
- Model: LayoutLMv3 via HuggingFace AutoModelForSequenceClassification
- Input: Document images + OCR text + bounding boxes
- Processing: Per-page classification with batch support
- Output: Predicted label with confidence score per page
The classifier uses both visual layout and text content, which helps it tolerate OCR noise and layout variations.
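A sequence-classification model emits one logit per label, and a common way to get a label with a confidence score is to apply softmax and take the top probability. The sketch below shows that step in plain Python. It assumes confidence is the top softmax probability, which is not confirmed for marie-ai, and the `id2label` mapping and logit values are invented for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pages(page_logits, id2label):
    """Return one (label, confidence) tuple per page."""
    results = []
    for logits in page_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id2label[best], probs[best]))
    return results


# Hypothetical label mapping and logits for a two-page document.
id2label = {0: "invoice", 1: "contract", 2: "form"}
page_logits = [[4.0, 1.0, 0.5], [0.2, 0.1, 0.3]]
pages = classify_pages(page_logits, id2label)
for index, (label, confidence) in enumerate(pages):
    print(f"page {index}: {label} ({confidence:.2f})")
```

The second page has nearly uniform logits, so its top probability is low even though a label is still returned. This is why a per-page confidence score is useful downstream.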
Best Practices
- Balance your classes — Aim for similar numbers of examples per label
- Include edge cases — Add documents that are hard to classify
- Label consistently — Use the same criteria across all documents
- Start with 10+ per class — More examples improve accuracy
- Review the confusion matrix — Focus on commonly confused classes
- Use keyboard shortcuts — Process 50+ documents per hour with 1-9 + Enter
- Auto-assign train/test — Use the auto-assign button for random splitting
Next Steps
- Learn about Training job management and evaluation
- Route low-confidence classifications to HITL review
- Use classifiers to route documents in Workflows
- Combine with Custom Splitter for split-then-classify pipelines
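Routing low-confidence classifications to review, as mentioned in the list above, usually comes down to a threshold on the per-page confidence score. A minimal sketch, assuming results arrive as `(label, confidence)` tuples. The `route` function, the 0.8 threshold, and the sample values are illustrative assumptions, not product defaults.

```python
def route(pages, threshold=0.8):
    """Split (label, confidence) results into auto-accepted and review queues."""
    accepted, review = [], []
    for index, (label, confidence) in enumerate(pages):
        target = accepted if confidence >= threshold else review
        target.append({"page": index, "label": label, "confidence": confidence})
    return accepted, review


# Hypothetical per-page classifier output.
results = [("invoice", 0.97), ("contract", 0.55), ("form", 0.80)]
accepted, review = route(results)
print("accepted:", [item["page"] for item in accepted])
print("review:", [item["page"] for item in review])
```

A reasonable starting threshold can be picked by checking, on the test set, the confidence level below which most misclassifications fall.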