Annotators

Annotators are the labeling interfaces used to create training data for custom processors. M3 Forge provides specialized annotation tools for each processor type, optimized for efficiency and accuracy.

Annotation Interfaces

Each processor type has a dedicated annotation interface:

Processor Type	Interface	Primary Tasks
Classifier	Document Labeler	Assign class labels to documents
Extractor	Field Annotator	Select and label field values
Splitter	Boundary Marker	Mark document boundaries and page types
Layout	Region Annotator	Draw bounding boxes and assign region types

All interfaces share common features:

Document preview with zoom and pan
Keyboard shortcuts for efficient labeling
Progress tracking and navigation
Quality metrics and validation
Multi-user collaboration support

Document Labeler (Classifiers)

The Document Labeler assigns class labels to entire documents.

Features

Side-by-side Layout — Document preview on left, label selector on right
Quick Selection — Click class button or use number keys (1-9)
Confidence Scoring — Optional confidence slider for ambiguous cases
Bulk Labeling — Select multiple similar documents for batch labeling
Search and Filter — Find unlabeled or specific document types

Workflow

Load Document

Navigate to unlabeled document from list or use arrow keys.

Review Document

Examine full document in preview pane. Zoom and scroll as needed.

Assign Label

Click class button or press number key. Document is marked labeled and next document loads automatically.

Handle Ambiguity

If uncertain, set confidence slider or flag for review. Move to next document.

Review Progress

Progress bar shows labeled vs total documents. Filter shows remaining work.

Keyboard Shortcuts

Shortcut	Action
`1-9`	Assign class 1-9
`Arrow Keys`	Navigate documents
`Space`	Toggle document zoom
`R`	Flag for review
`U`	Mark unlabeled
`?`	Show all shortcuts

Use keyboard shortcuts to achieve 100+ labels per hour. Mouse-only labeling is significantly slower.

Field Annotator (Extractors)

The Field Annotator labels field values within documents.

Features

Text Selection — Click and drag to select field text
Bounding Box Mode — Draw boxes around fields
Table Support — Label table cells and relationships
Multi-Page — Label fields across pages
Smart Suggestions — ML-powered field predictions
Relationship Mapping — Connect related fields (line items, key-value pairs)

Workflow

Select Field Type

Choose field from schema list (invoice_number, date, total, etc.).

Locate Field Value

Find field value in document preview.

Annotate Value

Click and drag to select text, or draw bounding box around field.

Verify Selection

Selected text appears in field preview. Correct if needed.

Continue Labeling

Select next field type and repeat. When all fields are labeled, move to next document.

Annotation Modes

Text Selection Mode (default):

Click start of text, drag to end
Automatically captures text within selection
Best for clear, digital text

Bounding Box Mode:

Draw rectangle around field
OCR applied to box region
Best for scanned documents or poor quality text
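
As a rough illustration of how a drawn box could be turned into field text, the sketch below collects OCR words whose center points fall inside the box. The `Word` type and the `text_in_box` helper are hypothetical names used only for this example, and the OCR step M3 Forge actually applies may work differently.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its pixel bounding box (hypothetical structure)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def text_in_box(words, box):
    """Join the words whose center point lies inside box=(x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = box
    selected = []
    for w in words:
        cx = (w.x0 + w.x1) / 2
        cy = (w.y0 + w.y1) / 2
        if bx0 <= cx <= bx1 and by0 <= cy <= by1:
            selected.append(w)
    # Order top-to-bottom, then left-to-right, before joining.
    selected.sort(key=lambda w: (w.y0, w.x0))
    return " ".join(w.text for w in selected)


words = [
    Word("Invoice", 10, 10, 60, 20),
    Word("No.", 65, 10, 85, 20),
    Word("INV-1042", 90, 10, 150, 20),
    Word("Total", 10, 40, 45, 50),
]
# A box drawn tightly around the invoice number.
field_text = text_in_box(words, (88, 8, 152, 22))
```

Testing the word center rather than requiring full containment makes the selection tolerant of boxes that are drawn slightly too small.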

Table Mode:

Select entire table
Label column headers
Assign field types to columns
System extracts all rows automatically

Handling Complex Extractions

Multi-Value Fields:

Arrays and lists (e.g., line items):

Enable “Multi-Value” for field
Label first instance
Click “+” to add additional instances
System learns to find all instances

Nested Fields:

Structured data (e.g., billing address):

Create parent field (billing_address)
Create child fields (street, city, zip)
Label parent region
Label children within parent
System preserves hierarchy

Conditional Fields:

Fields that may be absent:

If field not present, click “Mark Absent”
System learns when field should be null
Explicit absent labels improve precision and recall, because the model is not pushed to extract a value that is not there
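
The three cases above can be pictured as one annotation record. The sketch below is an assumed shape, not the M3 Forge storage format: a multi-value field becomes a list, a nested field becomes a mapping of child fields, and a field marked absent is stored as `None`. The key names and the `absent_fields` helper are illustrative only.

```python
import json

# Hypothetical annotation record for one document.
annotation = {
    "document_id": "doc-001",
    "fields": {
        # Multi-value field: one entry per labeled instance.
        "line_items": [
            {"description": "Widget", "amount": "10.00"},
            {"description": "Gadget", "amount": "24.50"},
        ],
        # Nested field: parent with child fields.
        "billing_address": {
            "street": "1 Main St",
            "city": "Springfield",
            "zip": "12345",
        },
        # Conditional field marked absent.
        "po_number": None,
    },
}


def absent_fields(record):
    """Return the names of fields explicitly marked absent."""
    return sorted(
        name for name, value in record["fields"].items() if value is None
    )


serialized = json.dumps(annotation, indent=2)
```

Storing an explicit `None` keeps "marked absent" distinguishable from "not yet labeled", which would simply have no entry.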

For multi-page extractions, ensure you label values on all pages where they appear. Missing annotations on later pages reduce model accuracy.

Boundary Marker (Splitters)

The Boundary Marker identifies document boundaries and page types in multi-document files.

Features

Page Gallery — Thumbnail view of all pages
Boundary Indicators — Visual markers for document splits
Page Type Labels — Classify page types (cover, body, separator)
Rule Builder — Define split rules based on patterns
Validation — Preview split results before finalizing

Workflow

Review Page Sequence

Scroll through page thumbnails to understand document structure.

Mark Boundaries

Click between pages to insert document boundary. Red line indicates split point.

Assign Page Types

Label each page as cover, body, separator, or other defined types.

Define Rules

Create rules based on page types (e.g., “Split when page type changes from body to cover”).

Validate Splits

Review split preview. Adjust boundaries or rules as needed.
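
To make the example rule concrete, here is a minimal sketch of "split when page type changes from body to cover" applied to a sequence of page type labels. The function name and the list-of-strings input are assumptions for illustration; the Rule Builder itself is configured in the UI.

```python
def split_on_cover(page_types):
    """Group page indices into documents.

    A new document starts whenever a "cover" page follows a "body" page.
    """
    documents = []
    current = []
    previous = None
    for index, page_type in enumerate(page_types):
        if current and previous == "body" and page_type == "cover":
            documents.append(current)
            current = []
        current.append(index)
        previous = page_type
    if current:
        documents.append(current)
    return documents


pages = ["cover", "body", "body", "cover", "body", "cover", "body", "body"]
splits = split_on_cover(pages)
# splits == [[0, 1, 2], [3, 4], [5, 6, 7]]
```

Comparing the output against manually marked boundaries is one way to check a rule before relying on it.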

Page Type Classification

Common page types:

Page Type	Description	Example
Cover	First page of document	Title page, letterhead
Body	Main content pages	Document body text
Separator	Blank or divider pages	Separator sheets between docs
Back	Last page of document	Signature page, legal notices
Form	Structured form pages	Applications, questionnaires

Define custom page types for your specific document sets.

Region Annotator (Layout)

The Region Annotator labels document regions and structure.

Features

Bounding Box Drawing — Precise region selection
Region Types — Header, footer, table, form field, text block
Reading Order — Numbered sequence for text flow
Hierarchical Regions — Parent-child relationships
Region Properties — Attributes like font size, alignment

Workflow

Select Region Type

Choose type from toolbar (header, table, text block, etc.).

Draw Bounding Box

Click and drag to create rectangle around region.

Assign Properties

Set region attributes (reading order, alignment, etc.).

Define Relationships

Connect related regions (e.g., table caption to table).

Repeat for All Regions

Label all significant regions on page. Move to next page.
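
A labeled page can be thought of as a list of region records. The sketch below assumes a hypothetical `Region` structure (not the M3 Forge schema) that carries type, box, reading order, and an optional parent, and shows a simple consistency check that every child box lies inside its parent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Region:
    """Hypothetical region record for one labeled page."""
    region_id: str
    region_type: str
    box: Box
    reading_order: int
    parent_id: Optional[str] = None


def contains(outer: Box, inner: Box) -> bool:
    """True when inner lies completely inside outer."""
    return (
        outer[0] <= inner[0]
        and outer[1] <= inner[1]
        and outer[2] >= inner[2]
        and outer[3] >= inner[3]
    )


def misplaced_children(regions):
    """Return ids of regions whose box is not inside their parent's box."""
    by_id = {r.region_id: r for r in regions}
    return [
        r.region_id
        for r in regions
        if r.parent_id is not None
        and not contains(by_id[r.parent_id].box, r.box)
    ]


regions = [
    Region("r1", "header", (0, 0, 600, 80), 1),
    Region("r2", "table", (40, 120, 560, 400), 2),
    Region("r3", "text block", (60, 130, 300, 160), 3, parent_id="r2"),
    Region("r4", "text block", (60, 420, 300, 450), 4, parent_id="r2"),
]
ordered_ids = [r.region_id for r in sorted(regions, key=lambda r: r.reading_order)]
problems = misplaced_children(regions)  # r4 sits below the table it claims as parent
```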

Layout Variations

Documents may have multiple layout variations:

Create variation for each distinct layout
Assign documents to appropriate variation
Label representative examples for each variation
System learns to detect and handle all variations

Labeling each variation separately is critical for document types with inconsistent layouts.

For complex layouts, start with major regions (header, body, footer) before annotating detailed elements.

Quality Management

Ensure high-quality annotations with built-in tools:

Annotation Validation

Real-time checks for:

Overlapping selections
Missing required fields
Invalid values (wrong type, format)
Duplicate annotations

Warnings appear immediately. Fix before moving to next document.
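
The sketch below shows how three of these checks (missing required fields, duplicate annotations, overlapping selections) could be expressed. It is an illustration under assumed data shapes, not the validator M3 Forge ships: the annotation dictionaries, the `validate` function, and the warning strings are all hypothetical, and value format checks are left out for brevity.

```python
def boxes_overlap(a, b):
    """True when two (x0, y0, x1, y1) boxes share interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def validate(annotations, required_fields):
    """Return a list of warning strings for one document."""
    warnings = []

    # Missing required fields.
    present = {a["field"] for a in annotations}
    for name in sorted(set(required_fields) - present):
        warnings.append(f"missing required field: {name}")

    # Duplicate annotations: same field on exactly the same box.
    seen = set()
    for a in annotations:
        key = (a["field"], a["box"])
        if key in seen:
            warnings.append(f"duplicate annotation: {a['field']}")
        seen.add(key)

    # Overlapping selections (exact duplicates are reported above).
    for i, a in enumerate(annotations):
        for b in annotations[i + 1:]:
            if a["box"] != b["box"] and boxes_overlap(a["box"], b["box"]):
                warnings.append(
                    f"overlapping selections: {a['field']} / {b['field']}"
                )
    return warnings


annotations = [
    {"field": "invoice_number", "value": "INV-1042", "box": (90, 10, 150, 20)},
    {"field": "date", "value": "2024-01-15", "box": (140, 12, 200, 22)},
    {"field": "total", "value": "34.50", "box": (90, 40, 150, 50)},
    {"field": "total", "value": "34.50", "box": (90, 40, 150, 50)},
]
warnings = validate(annotations, ["invoice_number", "date", "total", "due_date"])
```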

Inter-Annotator Agreement

Measure consistency across multiple annotators:

Assign same documents to multiple annotators
System calculates agreement metrics (Cohen’s kappa, F1)
Review disagreements in dedicated UI
Resolve conflicts and update guidelines

High agreement (for example, Cohen’s kappa above 0.8) indicates consistent labeling. Low agreement suggests unclear guidelines or difficult examples.
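
For reference, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance given each annotator's label frequencies. The sketch below is a plain Python version of the standard formula for two annotators labeling the same documents; the function name and sample labels are illustrative, and how M3 Forge computes its metrics internally is not specified here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of documents with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each annotator's label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # Both annotators used a single identical label throughout.
    return (observed - expected) / (1 - expected)


annotator_a = ["invoice", "invoice", "receipt", "receipt", "invoice",
               "receipt", "invoice", "receipt", "invoice", "receipt"]
annotator_b = ["invoice", "invoice", "receipt", "receipt", "invoice",
               "receipt", "invoice", "receipt", "invoice", "invoice"]
kappa = cohens_kappa(annotator_a, annotator_b)  # about 0.8 for this sample
```

In this sample the annotators agree on 9 of 10 documents, but kappa is lower than 0.9 because half of that agreement would be expected by chance.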

Review Queues

Flag documents for expert review:

Uncertain — Annotator unsure of correct label
Difficult — Complex document requiring expertise
Disagreement — Multiple annotators assigned different labels
Low Confidence — Model prediction below threshold

Reviewers can approve, correct, or reassign flagged documents.
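
The flag reasons above could be derived as in the following sketch. The `review_reasons` function, its parameters, and the 0.7 confidence threshold are assumptions made for this example; the actual threshold and routing logic are configured in the product.

```python
def review_reasons(labels, model_confidence=None, annotator_flags=(), threshold=0.7):
    """Return the review-queue reasons that apply to one document.

    labels: labels assigned by each annotator who saw the document.
    annotator_flags: manual flags such as "Uncertain" or "Difficult".
    threshold: hypothetical cutoff for model confidence (an assumption).
    """
    reasons = list(annotator_flags)
    if len(set(labels)) > 1:
        reasons.append("Disagreement")
    if model_confidence is not None and model_confidence < threshold:
        reasons.append("Low Confidence")
    return reasons


queue = {
    "doc-001": review_reasons(["invoice", "invoice"], model_confidence=0.95),
    "doc-002": review_reasons(["invoice", "receipt"], model_confidence=0.55),
    "doc-003": review_reasons(["receipt"], annotator_flags=["Uncertain"]),
}
# Only documents with at least one reason go to reviewers.
needs_review = {doc: reasons for doc, reasons in queue.items() if reasons}
```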

Audit Trails

All annotations are logged:

Who labeled each document
When labeling occurred
Changes and corrections
Time spent per document

Use audit data to identify training needs and quality issues.
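
As one example of such an analysis, the sketch below summarizes average time per document and correction rate per annotator. The log entry fields (`annotator`, `seconds_spent`, `corrected`) are hypothetical; the real audit export may use different names.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical audit entries, one per labeled document.
audit_log = [
    {"document_id": "doc-001", "annotator": "ana", "seconds_spent": 30, "corrected": False},
    {"document_id": "doc-002", "annotator": "ana", "seconds_spent": 50, "corrected": False},
    {"document_id": "doc-003", "annotator": "ben", "seconds_spent": 90, "corrected": True},
    {"document_id": "doc-004", "annotator": "ben", "seconds_spent": 110, "corrected": False},
]


def summarize(log):
    """Return mean labeling time and correction rate per annotator."""
    by_annotator = defaultdict(list)
    for entry in log:
        by_annotator[entry["annotator"]].append(entry)
    return {
        name: {
            "mean_seconds": mean(e["seconds_spent"] for e in entries),
            "correction_rate": sum(e["corrected"] for e in entries) / len(entries),
        }
        for name, entries in by_annotator.items()
    }


summary = summarize(audit_log)
```

A high correction rate combined with short labeling times can point to an annotator who would benefit from clearer guidelines.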

Collaboration Features

Multi-User Labeling

Distribute labeling work across team:

Create annotation project
Invite team members
System assigns documents to avoid overlap
Track individual progress and quality

Load balancing ensures fair distribution.
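
One simple way to get a non-overlapping, even distribution is to hand each document to whichever annotator currently has the fewest assignments. The sketch below illustrates that idea with a heap; it is not a description of M3 Forge's scheduler, and `assign_documents` is a hypothetical helper.

```python
import heapq


def assign_documents(document_ids, annotators):
    """Give each document to the annotator with the fewest assignments so far."""
    # Heap entries are (current_count, annotator_name).
    heap = [(0, name) for name in annotators]
    heapq.heapify(heap)
    assignments = {name: [] for name in annotators}
    for doc_id in document_ids:
        count, name = heapq.heappop(heap)
        assignments[name].append(doc_id)
        heapq.heappush(heap, (count + 1, name))
    return assignments


docs = [f"doc-{i:03d}" for i in range(1, 8)]
assignments = assign_documents(docs, ["ana", "ben", "chen"])
```

Each document appears in exactly one list, and list sizes differ by at most one.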

Role-Based Access

Annotators — Label documents, flag for review
Reviewers — Approve/correct annotations, resolve conflicts
Admins — Manage projects, export data, configure settings

Real-Time Updates

Changes sync instantly:

See when teammates complete documents
Avoid duplicate work on same documents
Live progress metrics for entire team

Labeling Efficiency Tips

Maximize labeling throughput:

Use Keyboard Shortcuts

Learn and use shortcuts for common actions. This can be up to 10x faster than mouse-only interaction.

Enable Predictions

For partially trained models:

Enable “Show Predictions” in settings
System pre-fills likely labels
Verify and correct instead of labeling from scratch
Reduces labeling time by 40-60%

Batch Similar Documents

Process similar documents together:

Same document type, same labels
Less mental context switching
More consistent labeling criteria
Faster completion per document

Set Daily Goals

Track velocity and set targets:

Classifiers: aim for 50-100 documents per day
Extractors: aim for 20-40 documents per day
Regular breaks prevent fatigue and errors

Review Difficult Examples Last

Label easy, clear documents first:

Build momentum with quick wins
Develop intuition for edge cases
Tackle difficult examples when experienced

The first 100 labels take longest as you refine criteria. Labeling speed typically increases 2-3x after the initial learning curve.

Exporting Annotations

Export labeled data for analysis or external training:

Navigate to Dataset settings
Click Export
Choose format:
- JSON — Structured annotations with metadata
- CSV — Tabular format for spreadsheet analysis
- COCO — Computer vision format for image models
- Custom — Define custom export schema
Download file or export to cloud storage

Exports include:

Document metadata
Annotations and labels
Annotator information
Timestamps and audit trail
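
For readers who have not worked with the COCO format, the sketch below converts region annotations into a COCO-style dictionary. COCO stores each box as `[x, y, width, height]` measured from the top-left corner, so corner-style boxes need converting. The input shapes (`pages`, `regions`) and the `to_coco` helper are assumptions for illustration, and the file M3 Forge exports may contain additional keys.

```python
import json


def to_coco(pages, regions, categories):
    """Build a COCO-style dict from hypothetical page and region records."""
    category_ids = {name: i + 1 for i, name in enumerate(categories)}
    coco = {
        "images": [
            {"id": p["id"], "file_name": p["file_name"],
             "width": p["width"], "height": p["height"]}
            for p in pages
        ],
        "categories": [
            {"id": cid, "name": name} for name, cid in category_ids.items()
        ],
        "annotations": [],
    }
    for ann_id, region in enumerate(regions, start=1):
        x0, y0, x1, y1 = region["box"]
        width, height = x1 - x0, y1 - y0
        coco["annotations"].append({
            "id": ann_id,
            "image_id": region["page_id"],
            "category_id": category_ids[region["type"]],
            "bbox": [x0, y0, width, height],  # COCO uses x, y, width, height
            "area": width * height,
            "iscrowd": 0,
        })
    return coco


pages = [{"id": 1, "file_name": "page-001.png", "width": 850, "height": 1100}]
regions = [
    {"page_id": 1, "type": "header", "box": (0, 0, 600, 80)},
    {"page_id": 1, "type": "table", "box": (40, 120, 560, 400)},
]
coco = to_coco(pages, regions, ["header", "table"])
coco_json = json.dumps(coco, indent=2)
```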

Next Steps

Launch Training Jobs with labeled data
Review Custom Processors for end-to-end workflow
Integrate annotations with HITL for continuous improvement
Build Workflows using trained processors