Annotators
Annotators are the labeling interfaces used to create training data for custom processors. M3 Forge provides specialized annotation tools for each processor type, optimized for efficiency and accuracy.
Annotation Interfaces
Each processor type has a dedicated annotation interface:
| Processor Type | Interface | Primary Tasks |
|---|---|---|
| Classifier | Document Labeler | Assign class labels to documents |
| Extractor | Field Annotator | Select and label field values |
| Splitter | Boundary Marker | Mark document boundaries and page types |
| Layout | Region Annotator | Draw bounding boxes and assign region types |
All interfaces share common features:
- Document preview with zoom and pan
- Keyboard shortcuts for efficient labeling
- Progress tracking and navigation
- Quality metrics and validation
- Multi-user collaboration support
Document Labeler (Classifiers)
The Document Labeler assigns class labels to entire documents.
Features
- Side-by-side Layout — Document preview on left, label selector on right
- Quick Selection — Click class button or use number keys (1-9)
- Confidence Scoring — Optional confidence slider for ambiguous cases
- Bulk Labeling — Select multiple similar documents for batch labeling
- Search and Filter — Find unlabeled or specific document types
Workflow
Load Document
Navigate to unlabeled document from list or use arrow keys.
Review Document
Examine full document in preview pane. Zoom and scroll as needed.
Assign Label
Click class button or press number key. Document is marked labeled and next document loads automatically.
Handle Ambiguity
If uncertain, set confidence slider or flag for review. Move to next document.
Review Progress
Progress bar shows labeled vs total documents. Filter shows remaining work.
Keyboard Shortcuts
| Shortcut | Action |
|---|---|
| 1-9 | Assign class 1-9 |
| Arrow Keys | Navigate documents |
| Space | Toggle document zoom |
| R | Flag for review |
| U | Mark unlabeled |
| ? | Show all shortcuts |
Use keyboard shortcuts to achieve 100+ labels per hour. Mouse-only labeling is significantly slower.
Field Annotator (Extractors)
The Field Annotator labels field values within documents.
Features
- Text Selection — Click and drag to select field text
- Bounding Box Mode — Draw boxes around fields
- Table Support — Label table cells and relationships
- Multi-Page — Label fields across pages
- Smart Suggestions — ML-powered field predictions
- Relationship Mapping — Connect related fields (line items, key-value pairs)
Workflow
Select Field Type
Choose field from schema list (invoice_number, date, total, etc.).
Locate Field Value
Find field value in document preview.
Annotate Value
Click and drag to select text, or draw bounding box around field.
Verify Selection
Selected text appears in field preview. Correct if needed.
Continue Labeling
Select next field type and repeat. Once all fields are labeled, move to next document.
Annotation Modes
Text Selection Mode (default):
- Click start of text, drag to end
- Automatically captures text within selection
- Best for clear, digital text
Bounding Box Mode:
- Draw rectangle around field
- OCR applied to box region
- Best for scanned documents or poor quality text
Table Mode:
- Select entire table
- Label column headers
- Assign field types to columns
- System extracts all rows automatically
Handling Complex Extractions
Multi-Value Fields:
Arrays and lists (e.g., line items):
- Enable “Multi-Value” for field
- Label first instance
- Click “+” to add additional instances
- System learns to find all instances
Nested Fields:
Structured data (e.g., billing address):
- Create parent field (billing_address)
- Create child fields (street, city, zip)
- Label parent region
- Label children within parent
- System preserves hierarchy
Conditional Fields:
Fields that may be absent:
- If field not present, click “Mark Absent”
- System learns when field should be null
- Improves precision and recall
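The three cases above (multi-value, nested, and absent fields) can be pictured as one annotation record. The sketch below is only an illustration under assumed names: `FieldAnnotation`, `to_record`, and the `purchase_order` field are hypothetical and do not describe M3 Forge's actual storage schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldAnnotation:
    """One labeled field (hypothetical structure, for illustration only)."""
    name: str
    value: Optional[str] = None  # None models a field marked absent
    children: list[FieldAnnotation] = field(default_factory=list)


def to_record(annotations: list[FieldAnnotation], multi_value: set[str]) -> dict:
    """Fold a flat list of annotations into a nested record."""
    record: dict = {}
    for ann in annotations:
        # Nested fields become a dict of their children; leaves keep their text.
        value = to_record(ann.children, multi_value) if ann.children else ann.value
        if ann.name in multi_value:
            # Multi-value fields collect every labeled instance in order.
            record.setdefault(ann.name, []).append(value)
        else:
            record[ann.name] = value
    return record


annotations = [
    FieldAnnotation("invoice_number", "INV-001"),
    FieldAnnotation("line_item", "Widget"),
    FieldAnnotation("line_item", "Gadget"),
    FieldAnnotation("billing_address", children=[
        FieldAnnotation("street", "1 Main St"),
        FieldAnnotation("city", "Springfield"),
        FieldAnnotation("zip", "12345"),
    ]),
    FieldAnnotation("purchase_order"),  # marked absent, so the value stays None
]
record = to_record(annotations, multi_value={"line_item"})
print(record)
```

Keeping an absent field as an explicit null, rather than omitting the key, is what lets a model distinguish "not present" from "not yet labeled".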
For multi-page extractions, ensure you label values on all pages where they appear. Missing annotations on later pages reduce model accuracy.
Boundary Marker (Splitters)
The Boundary Marker identifies document boundaries and page types in multi-document files.
Features
- Page Gallery — Thumbnail view of all pages
- Boundary Indicators — Visual markers for document splits
- Page Type Labels — Classify page types (cover, body, separator)
- Rule Builder — Define split rules based on patterns
- Validation — Preview split results before finalizing
Workflow
Review Page Sequence
Scroll through page thumbnails to understand document structure.
Mark Boundaries
Click between pages to insert document boundary. Red line indicates split point.
Assign Page Types
Label each page as cover, body, separator, or other defined types.
Define Rules
Create rules based on page types (e.g., “Split when page type changes from body to cover”).
Validate Splits
Review split preview. Adjust boundaries or rules as needed.
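To make the example rule concrete, here is a minimal sketch of "Split when page type changes from body to cover" applied to a sequence of page type labels. The function name and the list-of-strings input are assumptions for illustration; the Rule Builder's real rule format is not shown in this guide.

```python
def split_on_transition(page_types, from_type="body", to_type="cover"):
    """Group page indices into documents, starting a new document whenever
    the previous page is `from_type` and the current page is `to_type`."""
    if not page_types:
        return []
    documents = [[0]]
    for i in range(1, len(page_types)):
        if page_types[i - 1] == from_type and page_types[i] == to_type:
            documents.append([i])      # boundary falls between page i-1 and page i
        else:
            documents[-1].append(i)    # page continues the current document
    return documents


pages = ["cover", "body", "body", "cover", "body", "cover", "body", "body"]
docs = split_on_transition(pages)
print(docs)  # [[0, 1, 2], [3, 4], [5, 6, 7]]
```

A rule like this only fires on the exact transition it names, so files where documents lack a cover page still need manually marked boundaries.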
Page Type Classification
Common page types:
| Page Type | Description | Example |
|---|---|---|
| Cover | First page of document | Title page, letterhead |
| Body | Main content pages | Document body text |
| Separator | Blank or divider pages | Separator sheets between docs |
| Back | Last page of document | Signature page, legal notices |
| Form | Structured form pages | Applications, questionnaires |
Define custom page types for your specific document sets.
Region Annotator (Layout)
The Region Annotator labels document regions and structure.
Features
- Bounding Box Drawing — Precise region selection
- Region Types — Header, footer, table, form field, text block
- Reading Order — Numbered sequence for text flow
- Hierarchical Regions — Parent-child relationships
- Region Properties — Attributes like font size, alignment
Workflow
Select Region Type
Choose type from toolbar (header, table, text block, etc.).
Draw Bounding Box
Click and drag to create rectangle around region.
Assign Properties
Set region attributes (reading order, alignment, etc.).
Define Relationships
Connect related regions (e.g., table caption to table).
Repeat for All Regions
Label all significant regions on page. Move to next page.
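The workflow above yields regions with a type, a bounding box, a reading order, and optional parent-child links. A rough sketch of that data, with two sanity checks, might look like the following. The `Region` class, its field names, and the `(x, y, width, height)` box convention are assumptions, not the product's schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Region:
    """A labeled page region (hypothetical structure, for illustration only)."""
    region_id: str
    region_type: str
    bbox: tuple  # assumed convention: (x, y, width, height) in pixels
    reading_order: int
    parent_id: Optional[str] = None


def inside(inner, outer):
    """True if the inner box lies fully within the outer box."""
    ix, iy, iw, ih = inner
    ox, oy, ow, oh = outer
    return ix >= ox and iy >= oy and ix + iw <= ox + ow and iy + ih <= oy + oh


def reading_sequence(regions):
    """Region ids sorted by their assigned reading order."""
    return [r.region_id for r in sorted(regions, key=lambda r: r.reading_order)]


def misplaced_children(regions):
    """Ids of child regions whose box is not contained in their parent's box."""
    by_id = {r.region_id: r for r in regions}
    return [
        r.region_id
        for r in regions
        if r.parent_id is not None and not inside(r.bbox, by_id[r.parent_id].bbox)
    ]


regions = [
    Region("body", "text block", (50, 150, 500, 600), reading_order=2),
    Region("header", "header", (50, 20, 500, 100), reading_order=1),
    Region("table", "table", (60, 400, 480, 200), reading_order=3, parent_id="body"),
    Region("caption", "text block", (60, 760, 480, 30), reading_order=4, parent_id="body"),
]
print(reading_sequence(regions))    # ['header', 'body', 'table', 'caption']
print(misplaced_children(regions))  # ['caption']
```

Here the caption is flagged because its box extends below the parent region, which usually means either the parent box is too small or the relationship was assigned to the wrong region.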
Layout Variations
Documents may have multiple layout variations:
- Create variation for each distinct layout
- Assign documents to appropriate variation
- Label representative examples for each variation
- System learns to detect and handle all variations
Defining variations is critical for document types with inconsistent layouts.
For complex layouts, start with major regions (header, body, footer) before annotating detailed elements.
Quality Management
Ensure high-quality annotations with built-in tools:
Annotation Validation
Real-time checks for:
- Overlapping selections
- Missing required fields
- Invalid values (wrong type, format)
- Duplicate annotations
Warnings appear immediately. Fix before moving to next document.
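Two of these checks, overlapping selections and missing required fields, are simple enough to sketch. The code below assumes text selections are stored as `(field_name, start, end)` character offsets with an exclusive end; that representation and both function names are hypothetical.

```python
def find_overlaps(spans):
    """Report each span that starts before an earlier span has ended.

    `spans` is a list of (field_name, start, end) tuples, end exclusive.
    Returns (earlier_field, overlapping_field) pairs.
    """
    ordered = sorted(spans, key=lambda s: (s[1], s[2]))
    overlaps = []
    widest = None  # the span with the furthest end offset seen so far
    for span in ordered:
        if widest is not None and span[1] < widest[2]:
            overlaps.append((widest[0], span[0]))
        if widest is None or span[2] > widest[2]:
            widest = span
    return overlaps


def missing_required(spans, required):
    """Required field names that have no annotation yet."""
    labeled = {name for name, _, _ in spans}
    return sorted(set(required) - labeled)


spans = [("invoice_number", 0, 8), ("date", 5, 15), ("total", 40, 46)]
print(find_overlaps(spans))  # [('invoice_number', 'date')]
print(missing_required(spans, ["invoice_number", "date", "total", "vendor"]))  # ['vendor']
```

Tracking the furthest end offset, instead of comparing only neighboring spans, also catches a long selection that swallows several shorter ones.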
Inter-Annotator Agreement
Measure consistency across multiple annotators:
- Assign same documents to multiple annotators
- System calculates agreement metrics (Cohen’s kappa, F1)
- Review disagreements in dedicated UI
- Resolve conflicts and update guidelines
High agreement (for example, Cohen’s kappa above 0.8) indicates consistent labeling. Low agreement suggests unclear guidelines or difficult examples.
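For reference, Cohen's kappa for two annotators is $\kappa = (p_o - p_e) / (1 - p_e)$, where $p_o$ is the observed agreement rate and $p_e$ is the agreement expected by chance from each annotator's label frequencies. The sketch below computes it from two label lists; it illustrates the standard formula and is not necessarily how the system calculates its metrics.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty document set")
    n = len(labels_a)
    # Observed agreement: share of documents where both labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


annotator_a = ["invoice"] * 6 + ["receipt"] * 4
annotator_b = ["invoice"] * 5 + ["receipt"] * 5  # disagrees on one document
kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 2))  # 0.8
```

In this example the annotators agree on 9 of 10 documents (0.9), but half of that agreement is expected by chance (0.5), so kappa is 0.8 rather than 0.9.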
Review Queues
Flag documents for expert review:
- Uncertain — Annotator unsure of correct label
- Difficult — Complex document requiring expertise
- Disagreement — Multiple annotators assigned different labels
- Low Confidence — Model prediction below threshold
Reviewers can approve, correct, or reassign flagged documents.
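As a rough sketch, three of the four flag reasons can be derived mechanically, while "Difficult" remains a human judgment. The function below, its argument names, and the 0.7 default threshold are all assumptions for illustration; the actual threshold and routing logic are configured in the product.

```python
def review_reasons(labels, model_confidence=None, flagged_uncertain=False, threshold=0.7):
    """Return the reasons a document should enter the review queue.

    labels: labels assigned by each annotator who saw the document.
    model_confidence: optional model prediction confidence in [0, 1].
    threshold: hypothetical cutoff for the Low Confidence queue.
    """
    reasons = []
    if flagged_uncertain:
        reasons.append("uncertain")
    if len(set(labels)) > 1:
        reasons.append("disagreement")
    if model_confidence is not None and model_confidence < threshold:
        reasons.append("low_confidence")
    return reasons


print(review_reasons(["invoice", "receipt"], model_confidence=0.55))
# ['disagreement', 'low_confidence']
print(review_reasons(["invoice", "invoice"], model_confidence=0.93))
# []
```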
Audit Trails
All annotations are logged:
- Who labeled each document
- When labeling occurred
- Changes and corrections
- Time spent per document
Use audit data to identify training needs and quality issues.
Collaboration Features
Multi-User Labeling
Distribute labeling work across team:
- Create annotation project
- Invite team members
- System assigns documents to avoid overlap
- Track individual progress and quality
Load balancing ensures fair distribution.
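One simple way to get non-overlapping, evenly distributed assignments is to always give the next document to the annotator with the fewest assigned documents. This is a sketch of that idea only; the system's actual assignment strategy is not documented here, and `assign_documents` is a hypothetical name.

```python
import heapq


def assign_documents(doc_ids, annotators):
    """Assign each document to exactly one annotator, least-loaded first."""
    # Heap entries are (current_load, name); ties break alphabetically by name.
    heap = [(0, name) for name in annotators]
    heapq.heapify(heap)
    assignments = {name: [] for name in annotators}
    for doc_id in doc_ids:
        load, name = heapq.heappop(heap)
        assignments[name].append(doc_id)
        heapq.heappush(heap, (load + 1, name))
    return assignments


doc_ids = [f"doc-{i}" for i in range(1, 8)]
assignments = assign_documents(doc_ids, ["ana", "ben", "chi"])
print(assignments)
```

With seven documents and three annotators, no one receives more than one document above anyone else. A production scheduler would likely also weigh labeling speed and documents already in progress.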
Role-Based Access
- Annotators — Label documents, flag for review
- Reviewers — Approve/correct annotations, resolve conflicts
- Admins — Manage projects, export data, configure settings
Real-Time Updates
Changes sync instantly:
- See when teammates complete documents
- Avoid duplicate work on same documents
- Live progress metrics for entire team
Labeling Efficiency Tips
Maximize labeling throughput:
Use Keyboard Shortcuts
Learn and use shortcuts for common actions. Keyboard-driven labeling can be up to 10x faster than mouse-only interaction.
Enable Predictions
For partially trained models:
- Enable “Show Predictions” in settings
- System pre-fills likely labels
- Verify and correct instead of labeling from scratch
- Reduces labeling time by 40-60%
Batch Similar Documents
Process similar documents together:
- Same document type, same labels
- Minimal mental context switching
- Consistent labeling criteria
- Faster completion per document
Set Daily Goals
Track velocity and set targets:
- Classifiers: aim for 50-100 documents per day
- Extractors: aim for 20-40 documents per day
- Regular breaks prevent fatigue and errors
Review Difficult Examples Last
Label easy, clear documents first:
- Build momentum with quick wins
- Develop intuition for edge cases
- Tackle difficult examples once you have more experience
The first 100 labels take longest as you refine criteria. Labeling speed increases 2-3x after initial learning curve.
Exporting Annotations
Export labeled data for analysis or external training:
- Navigate to Dataset settings
- Click Export
- Choose format:
  - JSON — Structured annotations with metadata
  - CSV — Tabular format for spreadsheet analysis
  - COCO — Computer vision format for image models
  - Custom — Define custom export schema
- Download file or export to cloud storage
Exports include:
- Document metadata
- Annotations and labels
- Annotator information
- Timestamps and audit trail
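For orientation, the COCO format stores `images`, `annotations`, and `categories` lists, with each box written as `[x, y, width, height]`. The sketch below converts region annotations into that shape. The input record keys (`page_id`, `region_type`, `bbox`, `file_name`) are assumptions, and the exact fields in an M3 Forge COCO export may differ.

```python
import json


def to_coco(pages, regions):
    """Convert hypothetical page and region records into a COCO-style dict."""
    names = sorted({r["region_type"] for r in regions})
    category_ids = {name: i for i, name in enumerate(names, start=1)}
    annotations = []
    for ann_id, region in enumerate(regions, start=1):
        x, y, w, h = region["bbox"]  # COCO boxes are [x, y, width, height]
        annotations.append({
            "id": ann_id,
            "image_id": region["page_id"],
            "category_id": category_ids[region["region_type"]],
            "bbox": [x, y, w, h],
            "area": w * h,
            "iscrowd": 0,
        })
    return {
        "images": [
            {"id": p["page_id"], "file_name": p["file_name"],
             "width": p["width"], "height": p["height"]}
            for p in pages
        ],
        "annotations": annotations,
        "categories": [{"id": cid, "name": name} for name, cid in category_ids.items()],
    }


pages = [{"page_id": 1, "file_name": "invoice_001_p1.png", "width": 1240, "height": 1754}]
regions = [
    {"page_id": 1, "region_type": "header", "bbox": (50, 20, 1140, 120)},
    {"page_id": 1, "region_type": "table", "bbox": (60, 400, 1120, 500)},
]
coco = to_coco(pages, regions)
print(json.dumps(coco, indent=2))
```

Each document page maps to one COCO image entry, so multi-page documents produce several `images` records that share annotations only through `image_id`.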
Next Steps
- Launch Training Jobs with labeled data
- Review Custom Processors for end-to-end workflow
- Integrate annotations with HITL for continuous improvement
- Build Workflows using trained processors