Annotators

Annotators are the labeling interfaces used to create training data for custom processors. M3 Forge provides specialized annotation tools for each processor type, optimized for efficiency and accuracy.

Annotation Interfaces

Each processor type has a dedicated annotation interface:

| Processor Type | Interface | Primary Tasks |
|---|---|---|
| Classifier | Document Labeler | Assign class labels to documents |
| Extractor | Field Annotator | Select and label field values |
| Splitter | Boundary Marker | Mark document boundaries and page types |
| Layout | Region Annotator | Draw bounding boxes and assign region types |

All interfaces share common features:

  • Document preview with zoom and pan
  • Keyboard shortcuts for efficient labeling
  • Progress tracking and navigation
  • Quality metrics and validation
  • Multi-user collaboration support

Document Labeler (Classifiers)

The Document Labeler assigns class labels to entire documents.

Features

  • Side-by-side Layout — Document preview on left, label selector on right
  • Quick Selection — Click class button or use number keys (1-9)
  • Confidence Scoring — Optional confidence slider for ambiguous cases
  • Bulk Labeling — Select multiple similar documents for batch labeling
  • Search and Filter — Find unlabeled or specific document types

Workflow

Load Document

Navigate to an unlabeled document from the list, or use the arrow keys.

Review Document

Examine the full document in the preview pane. Zoom and scroll as needed.

Assign Label

Click a class button or press its number key. The document is marked as labeled and the next document loads automatically.

Handle Ambiguity

If you are uncertain, set the confidence slider or flag the document for review, then move to the next document.

Review Progress

The progress bar shows labeled versus total documents. The filter shows the remaining work.

Keyboard Shortcuts

| Shortcut | Action |
|---|---|
| 1-9 | Assign class 1-9 |
| Arrow Keys | Navigate documents |
| Space | Toggle document zoom |
| R | Flag for review |
| U | Mark unlabeled |
| ? | Show all shortcuts |

Use keyboard shortcuts to achieve 100+ labels per hour. Mouse-only labeling is significantly slower.

Field Annotator (Extractors)

The Field Annotator labels field values within documents.

Features

  • Text Selection — Click and drag to select field text
  • Bounding Box Mode — Draw boxes around fields
  • Table Support — Label table cells and relationships
  • Multi-Page — Label fields across pages
  • Smart Suggestions — ML-powered field predictions
  • Relationship Mapping — Connect related fields (line items, key-value pairs)

Workflow

Select Field Type

Choose a field from the schema list (invoice_number, date, total, etc.).

Locate Field Value

Find the field value in the document preview.

Annotate Value

Click and drag to select the text, or draw a bounding box around the field.

Verify Selection

The selected text appears in the field preview. Correct it if needed.

Continue Labeling

Select the next field type and repeat. Once all fields are labeled, move to the next document.

Annotation Modes

Text Selection Mode (default):

  • Click start of text, drag to end
  • Automatically captures text within selection
  • Best for clear, digital text

Bounding Box Mode:

  • Draw rectangle around field
  • OCR applied to box region
  • Best for scanned documents or poor-quality text

Table Mode:

  • Select entire table
  • Label column headers
  • Assign field types to columns
  • System extracts all rows automatically

Handling Complex Extractions

Multi-Value Fields:

Arrays and lists (e.g., line items):

  1. Enable “Multi-Value” for field
  2. Label first instance
  3. Click “+” to add additional instances
  4. System learns to find all instances

Nested Fields:

Structured data (e.g., billing address):

  1. Create parent field (billing_address)
  2. Create child fields (street, city, zip)
  3. Label parent region
  4. Label children within parent
  5. System preserves hierarchy
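
The parent-child hierarchy described above can be pictured as a nested record. The following is a minimal sketch under assumptions: `FieldAnnotation` and its attributes are hypothetical names chosen for illustration, not M3 Forge's actual schema or export format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldAnnotation:
    """Hypothetical annotation record (illustrative only)."""
    name: str
    value: Optional[str] = None  # stays None when no value is labeled
    children: list["FieldAnnotation"] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Return leaves as name/value pairs and parents as nested dicts."""
        if not self.children:
            return {self.name: self.value}
        nested = {}
        for child in self.children:
            nested.update(child.to_dict())
        return {self.name: nested}


# Parent region with three child fields, as in the billing address example.
billing = FieldAnnotation(
    "billing_address",
    children=[
        FieldAnnotation("street", "1 Main St"),
        FieldAnnotation("city", "Springfield"),
        FieldAnnotation("zip", "12345"),
    ],
)
print(billing.to_dict())
```

Keeping children inside the parent record, rather than as flat sibling fields, is what lets a consumer tell a billing street from a shipping street.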

Conditional Fields:

Fields that may be absent:

  1. If the field is not present, click “Mark Absent”
  2. System learns when the field should be null

Marking absent fields explicitly improves precision and recall.

For multi-page extractions, ensure you label values on all pages where they appear. Missing annotations on later pages reduce model accuracy.

Boundary Marker (Splitters)

The Boundary Marker identifies document boundaries and page types in multi-document files.

Features

  • Page Gallery — Thumbnail view of all pages
  • Boundary Indicators — Visual markers for document splits
  • Page Type Labels — Classify page types (cover, body, separator)
  • Rule Builder — Define split rules based on patterns
  • Validation — Preview split results before finalizing

Workflow

Review Page Sequence

Scroll through the page thumbnails to understand the document structure.

Mark Boundaries

Click between two pages to insert a document boundary. A red line indicates the split point.

Assign Page Types

Label each page as cover, body, separator, or another defined type.

Define Rules

Create rules based on page types (e.g., “Split when page type changes from body to cover”).

Validate Splits

Review the split preview. Adjust boundaries or rules as needed.
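
To make the example rule concrete, here is a minimal sketch of how “Split when page type changes from body to cover” could be evaluated over a sequence of page-type labels. The function names and the list-of-strings input are assumptions for illustration; the Rule Builder's real rule format is not documented here.

```python
def split_on_transition(page_types, before="body", after="cover"):
    """Return each index i where page i starts a new document.

    A boundary is placed wherever page i-1 has type `before`
    and page i has type `after`.
    """
    boundaries = []
    for i in range(1, len(page_types)):
        if page_types[i - 1] == before and page_types[i] == after:
            boundaries.append(i)
    return boundaries


def apply_boundaries(page_types, boundaries):
    """Cut the page sequence into documents at the given boundary indices."""
    documents, start = [], 0
    for boundary in boundaries:
        documents.append(page_types[start:boundary])
        start = boundary
    documents.append(page_types[start:])
    return [doc for doc in documents if doc]  # drop empty slices


pages = ["cover", "body", "body", "cover", "body"]
cuts = split_on_transition(pages)
print(cuts)                           # [3]
print(apply_boundaries(pages, cuts))  # [['cover', 'body', 'body'], ['cover', 'body']]
```

A rule like this only fires on the exact body-to-cover transition, which is why previewing the resulting splits before finalizing is still worthwhile.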

Page Type Classification

Common page types:

| Page Type | Description | Example |
|---|---|---|
| Cover | First page of document | Title page, letterhead |
| Body | Main content pages | Document body text |
| Separator | Blank or divider pages | Separator sheets between docs |
| Back | Last page of document | Signature page, legal notices |
| Form | Structured form pages | Applications, questionnaires |

Define custom page types for your specific document sets.

Region Annotator (Layout)

The Region Annotator labels document regions and structure.

Features

  • Bounding Box Drawing — Precise region selection
  • Region Types — Header, footer, table, form field, text block
  • Reading Order — Numbered sequence for text flow
  • Hierarchical Regions — Parent-child relationships
  • Region Properties — Attributes like font size, alignment

Workflow

Select Region Type

Choose a type from the toolbar (header, table, text block, etc.).

Draw Bounding Box

Click and drag to create a rectangle around the region.

Assign Properties

Set region attributes (reading order, alignment, etc.).

Define Relationships

Connect related regions (e.g., a table caption to its table).

Repeat for All Regions

Label all significant regions on the page, then move to the next page.

Layout Variations

Documents may have multiple layout variations:

  1. Create variation for each distinct layout
  2. Assign documents to appropriate variation
  3. Label representative examples for each variation
  4. System learns to detect and handle all variations

This is critical for document types with inconsistent layouts.

For complex layouts, start with major regions (header, body, footer) before annotating detailed elements.

Quality Management

Ensure high-quality annotations with built-in tools:

Annotation Validation

Real-time checks for:

  • Overlapping selections
  • Missing required fields
  • Invalid values (wrong type, format)
  • Duplicate annotations

Warnings appear immediately. Fix them before moving to the next document.
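
Two of the checks listed above, overlapping selections and missing required fields, can be sketched in a few lines. This is an illustration under assumptions, not the product's validator: the annotation shape (a dict with `field` and `box` keys) and the warning strings are hypothetical, and the sketch does not exempt parent-child nesting, where overlap is expected.

```python
from itertools import combinations


def boxes_overlap(a, b):
    """Boxes are (x_min, y_min, x_max, y_max); touching edges do not count."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def validate(annotations, required_fields):
    """Return a list of warning strings for one document."""
    warnings = []
    labeled = {a["field"] for a in annotations}
    for name in sorted(set(required_fields) - labeled):
        warnings.append(f"missing required field: {name}")
    for a, b in combinations(annotations, 2):
        if boxes_overlap(a["box"], b["box"]):
            warnings.append(
                f"overlapping selections: {a['field']} and {b['field']}"
            )
    return warnings


annotations = [
    {"field": "total", "box": (0, 0, 10, 10)},
    {"field": "date", "box": (5, 5, 15, 15)},
]
for warning in validate(annotations, ["invoice_number", "total"]):
    print(warning)
```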

Inter-Annotator Agreement

Measure consistency across multiple annotators:

  1. Assign same documents to multiple annotators
  2. System calculates agreement metrics (Cohen’s kappa, F1)
  3. Review disagreements in dedicated UI
  4. Resolve conflicts and update guidelines

High agreement (>0.8) indicates consistent labeling. Low agreement suggests unclear guidelines or difficult examples.
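
For reference, Cohen's kappa for two annotators compares the observed agreement rate with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for class labels from two annotators over the same documents. It is a plain illustration of the standard formula, not M3 Forge's implementation, and the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)
    # Observed agreement: fraction of documents with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's class frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical class
    return (observed - expected) / (1 - expected)


annotator_1 = ["invoice", "invoice", "receipt", "receipt"]
annotator_2 = ["invoice", "invoice", "receipt", "invoice"]
print(cohens_kappa(annotator_1, annotator_2))  # 0.5
```

In this example the annotators agree on 3 of 4 documents (0.75), but half of that agreement is expected by chance, so kappa is 0.5, well below the 0.8 mark.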

Review Queues

Flag documents for expert review:

  • Uncertain — Annotator unsure of correct label
  • Difficult — Complex document requiring expertise
  • Disagreement — Multiple annotators assigned different labels
  • Low Confidence — Model prediction below threshold

Reviewers can approve, correct, or reassign flagged documents.
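
The four flag reasons above can be read as simple routing rules. The sketch below is hypothetical: the document keys (`flagged_uncertain`, `flagged_difficult`, `labels`, `model_confidence`) and the 0.7 threshold are assumptions made for illustration, not documented M3 Forge fields or defaults.

```python
def review_reasons(doc, confidence_threshold=0.7):
    """Return the review-queue reasons that apply to one document.

    `doc` is a plain dict with hypothetical keys; the threshold
    default is an assumed value, not a product default.
    """
    reasons = []
    if doc.get("flagged_uncertain"):
        reasons.append("Uncertain")
    if doc.get("flagged_difficult"):
        reasons.append("Difficult")
    # More than one distinct label means annotators disagreed.
    if len(set(doc.get("labels", []))) > 1:
        reasons.append("Disagreement")
    confidence = doc.get("model_confidence")
    if confidence is not None and confidence < confidence_threshold:
        reasons.append("Low Confidence")
    return reasons


doc = {"labels": ["invoice", "receipt"], "model_confidence": 0.55}
print(review_reasons(doc))  # ['Disagreement', 'Low Confidence']
```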

Audit Trails

All annotations are logged:

  • Who labeled each document
  • When labeling occurred
  • Changes and corrections
  • Time spent per document

Use audit data to identify training needs and quality issues.

Collaboration Features

Multi-User Labeling

Distribute labeling work across the team:

  1. Create annotation project
  2. Invite team members
  3. System assigns documents to avoid overlap
  4. Track individual progress and quality

Load balancing ensures fair distribution.
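
One simple way to get non-overlapping, evenly sized assignments is round-robin distribution. The sketch below shows that technique only; the source does not say which load-balancing strategy the system actually uses, and the function name is an assumption.

```python
def assign_documents(doc_ids, annotators):
    """Round-robin assignment: each document goes to exactly one annotator,
    and workloads differ by at most one document."""
    if not annotators:
        raise ValueError("at least one annotator is required")
    assignments = {name: [] for name in annotators}
    for i, doc_id in enumerate(doc_ids):
        assignments[annotators[i % len(annotators)]].append(doc_id)
    return assignments


print(assign_documents(["d1", "d2", "d3", "d4", "d5"], ["ana", "ben"]))
# {'ana': ['d1', 'd3', 'd5'], 'ben': ['d2', 'd4']}
```

Round-robin balances document counts, not effort; a scheme that accounts for document length or annotator speed would need extra inputs.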

Role-Based Access

  • Annotators — Label documents, flag for review
  • Reviewers — Approve/correct annotations, resolve conflicts
  • Admins — Manage projects, export data, configure settings

Real-Time Updates

Changes sync instantly:

  • See when teammates complete documents
  • Avoid duplicate work on same documents
  • Live progress metrics for entire team

Labeling Efficiency Tips

Maximize labeling throughput:

Use Keyboard Shortcuts

Learn and use the shortcuts for common actions. Keyboard labeling can be up to 10x faster than mouse-only interaction.

Enable Predictions

For partially trained models:

  1. Enable “Show Predictions” in settings
  2. System pre-fills likely labels
  3. Verify and correct instead of labeling from scratch

Verifying pre-filled labels reduces labeling time by 40-60%.

Batch Similar Documents

Process similar documents together:

  • Same document type, same labels
  • Less mental context switching
  • Consistent labeling criteria
  • Faster completion per document

Set Daily Goals

Track velocity and set targets:

  • Aim for 50-100 documents per day (classifiers)
  • 20-40 documents per day (extractors)
  • Regular breaks prevent fatigue and errors

Review Difficult Examples Last

Label easy, clear documents first:

  • Build momentum with quick wins
  • Develop intuition for edge cases
  • Tackle difficult examples once you have experience

The first 100 labels take the longest as you refine your criteria. Labeling speed increases 2-3x after the initial learning curve.

Exporting Annotations

Export labeled data for analysis or external training:

  1. Navigate to Dataset settings
  2. Click Export
  3. Choose format:
    • JSON — Structured annotations with metadata
    • CSV — Tabular format for spreadsheet analysis
    • COCO — Computer vision format for image models
    • Custom — Define custom export schema
  4. Download file or export to cloud storage

Exports include:

  • Document metadata
  • Annotations and labels
  • Annotator information
  • Timestamps and audit trail
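
To illustrate what a COCO-style export of layout regions involves, the sketch below converts corner-style boxes into COCO's `[x, y, width, height]` convention and builds the standard `images`, `annotations`, and `categories` sections. The input page structure and the function name are assumptions; the actual contents of an M3 Forge export may differ.

```python
import json


def to_coco(pages, region_types):
    """Build a minimal COCO-style dict from hypothetical page records.

    Each page is a dict with file_name, width, height, and a list of
    regions whose box is (x_min, y_min, x_max, y_max).
    """
    categories = [{"id": i + 1, "name": name} for i, name in enumerate(region_types)]
    category_ids = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, page in enumerate(pages, start=1):
        images.append({
            "id": image_id,
            "file_name": page["file_name"],
            "width": page["width"],
            "height": page["height"],
        })
        for region in page["regions"]:
            x0, y0, x1, y1 = region["box"]
            width, height = x1 - x0, y1 - y0
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_ids[region["type"]],
                "bbox": [x0, y0, width, height],  # COCO uses x, y, w, h
                "area": width * height,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations, "categories": categories}


pages = [{
    "file_name": "page_1.png", "width": 850, "height": 1100,
    "regions": [{"type": "header", "box": (50, 40, 800, 120)}],
}]
print(json.dumps(to_coco(pages, ["header", "footer", "table"]), indent=2))
```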
