
Registry

Component registry view displaying installed extraction components, query planners, and hot-reload configuration for your gateway infrastructure.

[Screenshot: Component registry showing installed parsers, validators, processors, and other extraction components with version information]

Overview

The Registry view provides visibility into the extraction components and query planners installed on your gateways. This unified view combines component registry and planner registry data to show exactly what extraction capabilities are available.

Key capabilities:

  • Component categorization by type
  • Installed wheel packages with version tracking
  • Hot-reload watched directories
  • Combined component and planner registry view
  • Per-gateway registration status

Registry information is retrieved via the gateways.getRegistryInfo tRPC endpoint.
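As a rough illustration, tRPC query procedures are normally reachable over HTTP as a GET request, with the procedure path in the URL and the JSON-encoded input in an `input` query parameter. The sketch below only builds such a URL. The base URL, the `/api/trpc` mount point, and the `gatewayId` input field are assumptions made for the example, not documented values, and a deployment that uses batching or a data transformer would need a different encoding.

```python
import json
from urllib.parse import urlencode


def build_trpc_query_url(base_url, procedure, input_data=None):
    """Build a GET URL for a non-batched tRPC query without a data transformer."""
    url = f"{base_url.rstrip('/')}/{procedure}"
    if input_data is not None:
        # tRPC expects the input as URL-encoded JSON in the "input" parameter.
        encoded = json.dumps(input_data, separators=(",", ":"))
        url += "?" + urlencode({"input": encoded})
    return url


# Hypothetical host and input shape, for illustration only.
url = build_trpc_query_url(
    "https://console.example.com/api/trpc",
    "gateways.getRegistryInfo",
    {"gatewayId": "gateway-1"},
)
print(url)
```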

Component Categories

The Registry view organizes components into logical categories:

Parsers

Document parsers that extract structured data from various file formats.

Examples:

  • PDF parsers
  • Image parsers (TIFF, PNG, JPEG)
  • Office document parsers (DOCX, XLSX)
  • Form parsers
  • Table extractors

Icon: Document with magnifying glass

Validators

Rule-based validators that verify extraction results meet quality and business requirements.

Examples:

  • Schema validators
  • Data type validators
  • Business rule validators
  • Format validators
  • Cross-field validators

Icon: Checkmark shield

Template Builders

Components that construct extraction templates from configuration or learned patterns.

Examples:

  • Static template builders
  • Machine learning template builders
  • Adaptive template builders
  • Layout-based template builders

Icon: Blueprint/template grid

Region Processors

Specialized processors for handling specific document regions or zones.

Examples:

  • Header processors
  • Footer processors
  • Signature region processors
  • Table zone processors
  • Form field processors

Icon: Crop/region selector

Processing Visitors

Visitor pattern components that traverse and transform extraction results.

Examples:

  • Data normalization visitors
  • Entity extraction visitors
  • Relationship builders
  • Format converters
  • Quality score calculators

Icon: Workflow nodes

Context Providers

Components that supply contextual information to improve extraction accuracy.

Examples:

  • Document type context providers
  • Historical data providers
  • Business domain context
  • Reference data providers
  • Metadata enrichers

Icon: Database/context layers

Component categories help you understand the extraction pipeline stages and identify missing capabilities.

Component Display

Each component category shows:

Information | Description | Display Limit
Component Count | Total registered components in this category | Always shown
Component Names | Individual component identifiers | Up to 50 items
Count-Only Mode | For categories with more than 50 components | Shows count badge only
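The display rule in the table above can be sketched as a small function. This is an illustration of the rule only, not the view's actual code; the function name and the returned dictionary shape are assumptions.

```python
DISPLAY_LIMIT = 50  # threshold taken from the table above


def category_display(names: list[str]) -> dict:
    """Return what a category would show: always the count, names only up to the limit."""
    count = len(names)
    show_names = count <= DISPLAY_LIMIT
    return {"count": count, "names": list(names) if show_names else None}


small = category_display(["extraction.parsers.pdf_standard"])
large = category_display([f"extraction.parsers.p{i}" for i in range(51)])
print(small)  # names are listed
print(large)  # count only, names omitted
```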

Understanding Component Names

Component names typically follow this pattern:

<domain>.<category>.<specific_function>

Examples:

  • extraction.parsers.pdf_standard
  • validation.rules.date_format
  • processing.visitors.entity_extractor

The name tells you which domain a component belongs to, which category it falls under, and what it does.
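A minimal sketch of splitting a name into its three parts. Because the pattern is only typical, the helper returns `None` for names that do not fit rather than raising. The helper name and the `ComponentName` type are hypothetical.

```python
from typing import NamedTuple, Optional


class ComponentName(NamedTuple):
    domain: str
    category: str
    function: str


def parse_component_name(name: str) -> Optional[ComponentName]:
    """Split '<domain>.<category>.<specific_function>'; None if the name does not fit."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts):
        return None
    return ComponentName(*parts)


for name in ["extraction.parsers.pdf_standard", "validation.rules.date_format", "legacy_parser"]:
    print(name, "->", parse_component_name(name))
```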

Installed Wheels

The Registry view displays all installed Python wheel packages that provide extraction functionality:

Wheel Information

  • Package Name: The wheel’s Python package identifier
  • Version: Semantic version number (e.g., 1.2.3)
  • Installation Source: Component registry, planner registry, or both

Combined Registry

Wheels are aggregated from:

  1. Component Registry: Standard extraction components
  2. Planner Registry: Query planning and routing components

A wheel appearing in both registries indicates it provides both extraction components and query planning capabilities.
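The aggregation can be pictured as a merge keyed by package name that records which registry reported each wheel. This is a sketch under assumed input shapes (a `{package: version}` mapping per registry); `custom-planners` is a made-up package name.

```python
def merge_wheels(component_wheels: dict, planner_wheels: dict) -> list:
    """Combine wheels from both registries, recording each wheel's sources."""
    merged = {}
    for source, wheels in (("component", component_wheels), ("planner", planner_wheels)):
        for package, version in wheels.items():
            entry = merged.setdefault(
                package, {"package": package, "versions": [], "sources": []}
            )
            if version not in entry["versions"]:
                entry["versions"].append(version)
            entry["sources"].append(source)
    return [merged[name] for name in sorted(merged)]


merged = merge_wheels(
    {"marie-extractors": "2.4.1", "marie-validators": "1.8.0"},
    {"marie-extractors": "2.4.1", "custom-planners": "0.1.0"},  # hypothetical package
)
for wheel in merged:
    print(wheel["package"], wheel["versions"], wheel["sources"])
```

A wheel whose `sources` list contains both entries corresponds to the "both" installation source described above.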

Version Tracking

Version information helps you:

  • Verify correct component versions are deployed
  • Identify version mismatches across gateways
  • Plan upgrades and rollbacks
  • Troubleshoot version-specific issues

Example:

marie-extractors: 2.4.1
marie-validators: 1.8.0
custom-parsers: 0.3.2
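To spot version mismatches across gateways, the wheel lists can be compared package by package. The sketch below assumes you have already collected a `{gateway: {package: version}}` mapping; the gateway names and the `2.4.0` version are invented for the example.

```python
def find_version_mismatches(wheels_by_gateway: dict) -> dict:
    """Return {package: {gateway: version}} for packages that differ between gateways."""
    packages = set()
    for wheels in wheels_by_gateway.values():
        packages.update(wheels)

    mismatches = {}
    for package in sorted(packages):
        versions = {
            gateway: wheels.get(package, "not installed")
            for gateway, wheels in wheels_by_gateway.items()
        }
        if len(set(versions.values())) > 1:
            mismatches[package] = versions
    return mismatches


mismatches = find_version_mismatches({
    "gateway-a": {"marie-extractors": "2.4.1", "marie-validators": "1.8.0", "custom-parsers": "0.3.2"},
    "gateway-b": {"marie-extractors": "2.4.0", "marie-validators": "1.8.0"},
})
for package, versions in mismatches.items():
    print(package, versions)
```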

Watched Directories

The Registry view displays file system paths monitored for hot-reload of extraction components:

Directory Monitoring

Watched directories are paths where:

  • Component Python files are located
  • Changes trigger automatic reload
  • New components are automatically registered
  • Removed components are unregistered

Deduplication

Directory paths are deduplicated across component and planner registries to avoid showing the same path multiple times.

Example watched directories:

/opt/marie/components/extractors
/opt/marie/components/validators
/opt/marie/plugins/custom
/home/developer/marie-dev/local-components
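Deduplication of this kind can be sketched as an order-preserving merge of the two path lists. Normalizing paths before comparing them (so that a trailing slash does not create a second entry) is an assumption of this example, not documented behavior.

```python
import posixpath


def dedupe_watched_dirs(component_dirs: list, planner_dirs: list) -> list:
    """Merge both lists, keeping the first occurrence of each normalized path."""
    seen = set()
    result = []
    for path in [*component_dirs, *planner_dirs]:
        normalized = posixpath.normpath(path)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result


watched = dedupe_watched_dirs(
    ["/opt/marie/components/extractors", "/opt/marie/plugins/custom"],
    ["/opt/marie/plugins/custom/", "/opt/marie/components/validators"],
)
print(watched)
```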

Changes to files in watched directories trigger a hot-reload without a gateway restart. Use with caution in production.

Hot-Reload Behavior

The Registry view helps you understand hot-reload configuration and behavior:

How Hot-Reload Works

File Change Detection

The gateway monitors watched directories for:

  • New Python files added
  • Existing files modified
  • Files deleted or moved

Component Scanning

When changes are detected:

  • Python modules are reloaded
  • Component classes are re-scanned
  • Registry is updated with new/modified components

Availability

Updated components become immediately available:

  • New extraction requests use updated components
  • In-flight requests complete with old components
  • No gateway restart required
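The change-detection step can be illustrated with a simple polling approach: take a snapshot of file modification times, take another later, and compare the two. This is only a sketch of the idea. The gateway may well rely on operating-system file events instead, and the function names here are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def snapshot(directory) -> dict:
    """Map each .py file under `directory` to its last-modified time in nanoseconds."""
    return {str(p): p.stat().st_mtime_ns for p in Path(directory).rglob("*.py")}


def diff_snapshots(before: dict, after: dict) -> dict:
    """Classify paths as added, modified, or removed between two snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    modified = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return {"added": added, "modified": modified, "removed": removed}


with tempfile.TemporaryDirectory() as root:
    a = Path(root, "a.py")
    a.write_text("A = 1\n")
    b = Path(root, "b.py")
    b.write_text("B = 1\n")
    before = snapshot(root)

    os.utime(a, ns=(10**18, 10**18))  # force a different mtime to simulate an edit
    b.unlink()                        # simulate a deleted component file
    Path(root, "c.py").write_text("C = 1\n")  # simulate a new component file
    changes = diff_snapshots(before, snapshot(root))

changes_by_name = {kind: [os.path.basename(p) for p in paths] for kind, paths in changes.items()}
print(changes_by_name)
```

Each changed path would then be handed to the reload and re-scan steps described above.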

Hot-Reload Use Cases

Scenario | Benefits
Development | Rapid iteration without restarts
Hot Fixes | Deploy fixes without downtime
A/B Testing | Test new components alongside stable versions
Plugin Development | Develop custom components with immediate feedback

Use hot-reload carefully in production. Consider these risks:

  • Untested components can break extraction
  • Repeated reloads can leak memory
  • Registry state can be inconsistent while a reload is in progress

Understanding Component Registration

Components are registered when:

  1. Gateway Start: All components in configured paths are scanned and registered
  2. Hot-Reload: File changes trigger re-registration
  3. Manual Registration: API calls to register specific components
  4. Plugin Installation: Installing wheel packages registers their components

Registration Process

Discovery

The gateway scans configured paths for Python modules containing component classes.

Validation

Each component is validated:

  • Implements required interfaces
  • Has proper metadata (name, version, category)
  • Passes initialization checks

Registration

Valid components are added to the registry:

  • Assigned to appropriate category
  • Made available to extraction pipeline
  • Listed in Registry view

Verification

Check the Registry view to verify:

  • Component appears in correct category
  • Version matches expectation
  • Wheel package is listed
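The validation and registration steps above can be sketched with a toy registry. Everything here is illustrative: the `ComponentRegistry` class, the `process()` method standing in for the "required interface", and the category keys are assumptions, not the gateway's real API.

```python
REQUIRED_METADATA = ("name", "version", "category")


class ComponentRegistry:
    """Toy registry: validates a component class, then files it under its category."""

    def __init__(self, categories):
        self.by_category = {category: {} for category in categories}
        self.errors = []

    def register(self, cls) -> bool:
        problem = self._validate(cls)
        if problem:
            self.errors.append(f"{cls.__name__}: {problem}")
            return False
        self.by_category[cls.category][cls.name] = cls
        return True

    def _validate(self, cls):
        missing = [key for key in REQUIRED_METADATA if not getattr(cls, key, None)]
        if missing:
            return f"missing metadata: {', '.join(missing)}"
        if cls.category not in self.by_category:
            return f"unknown category: {cls.category}"
        if not callable(getattr(cls, "process", None)):  # hypothetical interface
            return "does not implement process()"
        try:
            cls()  # initialization check
        except Exception as exc:
            return f"initialization failed: {exc}"
        return None


class PdfParser:
    name = "extraction.parsers.pdf_standard"
    version = "2.4.1"
    category = "parsers"

    def process(self, document):
        return {"pages": []}


class BrokenValidator:
    name = "validation.rules.date_format"
    category = "validators"  # version is deliberately missing

    def process(self, result):
        return result


registry = ComponentRegistry(["parsers", "validators"])
results = [registry.register(cls) for cls in (PdfParser, BrokenValidator)]
print(sorted(registry.by_category["parsers"]))
print(registry.errors)
```

A component rejected during validation never reaches the registry, which is why it would be absent from the Registry view and show up in the gateway logs instead.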

Component Categories Explained

Parsers

Parsers are the entry point of the extraction pipeline, responsible for reading document files and producing initial structured data.

When to add parsers:

  • Supporting a new file format
  • Improving extraction quality for specific document types
  • Handling non-standard or legacy formats

Validators

Validators enforce quality and business rules on extraction results, ensuring data meets requirements before downstream processing.

When to add validators:

  • Implementing new business rules
  • Adding data quality checks
  • Enforcing compliance requirements

Template Builders

Template builders construct extraction templates that define how to extract structured data from specific document layouts.

When to add template builders:

  • Supporting new template formats
  • Implementing machine learning-based template generation
  • Handling dynamic document layouts

Region Processors

Region processors handle specific areas of documents, applying specialized extraction logic to headers, tables, signatures, and other zones.

When to add region processors:

  • Extracting data from specific document regions
  • Implementing zone-specific processing rules
  • Handling complex layouts

Processing Visitors

Visitors traverse and transform extraction results, normalizing data, extracting entities, and calculating quality scores.

When to add processing visitors:

  • Implementing data transformations
  • Extracting higher-level entities
  • Calculating custom quality metrics

Context Providers

Context providers supply additional information that improves extraction accuracy by incorporating business domain knowledge and historical data.

When to add context providers:

  • Integrating reference data
  • Leveraging historical extraction results
  • Applying domain-specific knowledge

Interpreting Registry Data

Healthy Registry

A well-configured gateway registry shows:

  • All expected component categories populated
  • Version numbers consistent across gateways
  • Watched directories match deployment configuration
  • No missing or duplicate components

Warning Signs

  • Empty categories: Missing component types may limit extraction capabilities
  • Version mismatches: Different gateways running different component versions
  • Unexpected wheels: Unknown packages may indicate deployment issues
  • No watched directories: Hot-reload is disabled

Compare registry data across multiple gateways to ensure consistent component deployment.
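Several of the warning signs for a single gateway can be checked mechanically. The sketch below assumes a registry payload shaped as `{"categories": ..., "wheels": ..., "watched_dirs": ...}`; that shape, the category keys, and `mystery-package` are hypothetical.

```python
def registry_warnings(info: dict, expected_categories: list, known_wheels: set) -> list:
    """Return human-readable warnings for one gateway's registry data."""
    warnings = []
    for category in expected_categories:
        if not info["categories"].get(category):
            warnings.append(f"empty category: {category}")
    for package in sorted(set(info["wheels"]) - set(known_wheels)):
        warnings.append(f"unexpected wheel: {package}")
    if not info["watched_dirs"]:
        warnings.append("no watched directories: hot-reload is disabled")
    return warnings


info = {
    "categories": {"parsers": ["extraction.parsers.pdf_standard"], "validators": []},
    "wheels": {"marie-extractors": "2.4.1", "mystery-package": "0.0.1"},
    "watched_dirs": [],
}
warnings = registry_warnings(
    info,
    expected_categories=["parsers", "validators", "context_providers"],
    known_wheels={"marie-extractors", "marie-validators"},
)
for warning in warnings:
    print(warning)
```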

Common Scenarios

Scenario 1: Verifying Component Deployment

After deploying new extraction components:

  1. Navigate to Registry view
  2. Select the gateway where components were deployed
  3. Verify component appears in appropriate category
  4. Check wheel version matches deployed package
  5. Confirm watched directory includes component path

Scenario 2: Troubleshooting Missing Components

If expected components aren’t available:

  1. Check component category for component name
  2. Verify wheel package is listed in installed wheels
  3. Confirm watched directories include component location
  4. Review gateway logs for registration errors
  5. Check component implementation for registration issues

Scenario 3: Hot-Reload Verification

After modifying a component file:

  1. Note the component count before modification
  2. Make file changes in a watched directory
  3. Wait for hot-reload (typically 1-5 seconds)
  4. Refresh Registry view
  5. Verify that the component count or version has changed
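If you record the component names per category before and after the change, the comparison in the last step can be done programmatically. This is a sketch that assumes a `{category: [component names]}` mapping; `extraction.parsers.invoice_v2` is an invented component.

```python
def diff_components(before: dict, after: dict) -> dict:
    """Return per-category added and removed component names; unchanged categories are omitted."""
    changes = {}
    for category in sorted(set(before) | set(after)):
        old = set(before.get(category, []))
        new = set(after.get(category, []))
        if old != new:
            changes[category] = {"added": sorted(new - old), "removed": sorted(old - new)}
    return changes


before = {
    "parsers": ["extraction.parsers.pdf_standard"],
    "validators": ["validation.rules.date_format"],
}
after = {
    "parsers": ["extraction.parsers.pdf_standard", "extraction.parsers.invoice_v2"],
    "validators": ["validation.rules.date_format"],
}
component_changes = diff_components(before, after)
print(component_changes)
```

An empty result after the reload window suggests the change was not picked up.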

Scenario 4: Auditing Installed Components

For compliance or security audits:

  1. Export registry data for all gateways
  2. Compare installed wheels across environments
  3. Verify component versions match approved versions
  4. Document watched directories for security review
  5. Cross-reference with deployment manifests
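For an audit, the installed wheels can be flattened into rows, checked against an approved manifest, and written out as CSV. The input shapes, status labels, and gateway names below are assumptions made for this sketch.

```python
import csv
import io

FIELDS = ["gateway", "package", "installed", "approved", "status"]


def audit_rows(wheels_by_gateway: dict, approved: dict) -> list:
    """One row per installed wheel, with its status against the approved manifest."""
    rows = []
    for gateway in sorted(wheels_by_gateway):
        for package, version in sorted(wheels_by_gateway[gateway].items()):
            expected = approved.get(package)
            if expected is None:
                status = "unapproved"
            elif expected == version:
                status = "ok"
            else:
                status = "version_mismatch"
            rows.append({
                "gateway": gateway,
                "package": package,
                "installed": version,
                "approved": expected or "",
                "status": status,
            })
    return rows


def to_csv(rows: list) -> str:
    """Serialize audit rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = audit_rows(
    {
        "gateway-a": {"marie-extractors": "2.4.1", "custom-parsers": "0.3.2"},
        "gateway-b": {"marie-extractors": "2.4.0"},
    },
    approved={"marie-extractors": "2.4.1", "marie-validators": "1.8.0"},
)
report = to_csv(rows)
print(report)
```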

Best Practices

  1. Version consistency: Deploy identical component versions across all gateways
  2. Regular audits: Periodically review installed components for unexpected changes
  3. Document custom components: Maintain inventory of custom components and their purposes
  4. Test before hot-reload: Test component changes in development before hot-reloading in production
  5. Secure watched directories: Restrict write access to watched directories in production

Troubleshooting

Components not appearing

  • Verify component files are in a watched directory
  • Check gateway logs for registration errors
  • Confirm component implements required interfaces
  • Verify wheel package is properly installed

Hot-reload not working

  • Confirm directories are listed in watched directories
  • Check file system permissions for gateway process
  • Verify hot-reload is enabled in gateway configuration
  • Review logs for file monitoring errors

Version mismatches

  • Compare installed wheels across gateways
  • Check deployment process for version pinning
  • Verify package index sources are consistent
  • Review recent deployments for version changes

Duplicate components

  • Check for components registered from multiple sources
  • Review watched directories for overlapping paths
  • Verify wheel packages don’t conflict
  • Check for manual and automatic registration overlap
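Overlapping watched directories, where one path sits inside another, can be found with a short check. This sketch treats paths as POSIX paths and compares them purely lexically (no symlink resolution); the function name is hypothetical.

```python
from pathlib import PurePosixPath


def find_overlapping_dirs(paths: list) -> list:
    """Return (parent, child) pairs where one watched directory contains another."""
    unique = sorted({PurePosixPath(p) for p in paths})
    overlaps = []
    for index, parent in enumerate(unique):
        # A parent always sorts before its children, so only later entries need checking.
        for child in unique[index + 1:]:
            if parent in child.parents:
                overlaps.append((str(parent), str(child)))
    return overlaps


overlaps = find_overlapping_dirs([
    "/opt/marie/components",
    "/opt/marie/components/extractors",
    "/opt/marie/plugins/custom",
])
print(overlaps)
```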

If registry data appears incorrect or incomplete, verify gateway connectivity and tRPC endpoint availability in the Debug view.

Next Steps

  • Use the Debug view to verify gateway health after component deployment
  • Check the Capacity view to ensure components aren’t exhausting resources
  • Review the Jobs view to see how new components affect extraction operations
  • Explore Custom Components for developing your own extraction components