feat: add processor plugin support #299

Open
andreatgretel wants to merge 4 commits into main from andreatgretel/feat/processor-plugins-registry

@andreatgretel andreatgretel commented Feb 5, 2026

Summary

Extends the plugin system to support third-party processor plugins via entry points, alongside existing column generator and seed reader plugins.

Changes

Added

Changed

  • PluginRegistry uses RLock instead of Lock to prevent deadlocks when plugin imports trigger re-entry
  • ProcessorConfigT moved from processors.py to processor_types.py for plugin injection
  • Import updates in config_builder.py, data_designer_config.py, validation.py
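
The RLock change above can be illustrated with a minimal sketch. `TinyRegistry` and `can_reenter` are hypothetical stand-ins written for this illustration, not the project's `PluginRegistry`; the assumption is that discovery holds the registry lock while a plugin import calls back into the registry on the same thread.

```python
import threading


def can_reenter(lock) -> bool:
    """Report whether the thread that holds `lock` can acquire it a second time."""
    with lock:
        acquired = lock.acquire(blocking=False)  # non-blocking, so a plain Lock cannot hang here
        if acquired:
            lock.release()
        return acquired


class TinyRegistry:
    """Hypothetical stand-in for a plugin registry (not the project's PluginRegistry)."""

    def __init__(self):
        self._lock = threading.RLock()
        self._plugins: dict[str, str] = {}
        self._discovered = False

    def discover(self) -> None:
        with self._lock:
            if self._discovered:
                return
            self._discovered = True
            # Simulates a plugin import whose module-level code calls back
            # into the registry while discovery still holds the lock.
            self.register("regex-filter", "RegexFilterProcessor")

    def register(self, name: str, impl: str) -> None:
        with self._lock:  # re-entry from the same thread
            self._plugins[name] = impl

    def names(self) -> list[str]:
        with self._lock:
            return sorted(self._plugins)


registry = TinyRegistry()
registry.discover()
print(registry.names())
print(can_reenter(threading.Lock()), can_reenter(threading.RLock()))
```

With a plain `threading.Lock`, the nested `register` call would block forever because the same thread already holds the lock; `threading.RLock` tracks the owning thread and allows the nested acquire.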

Docs

Attention Areas

Reviewers: Please pay special attention to the following:

Test Plan

  • All 2217 existing tests pass
  • 11 demo plugin tests pass (6 regex filter, 5 semantic dedup)
  • Plugin discovery correctly registers both processor plugins
  • Demo notebook runs end-to-end with live LLM (regex filter: 4→2 rows, semantic dedup verified)
  • CI passes

Description updated with AI

@andreatgretel andreatgretel force-pushed the andreatgretel/feat/processor-plugins branch 7 times, most recently from 403bc69 to a61848e, on February 11, 2026 20:29
Base automatically changed from andreatgretel/feat/processor-plugins to main February 12, 2026 00:32
Add PluginType.PROCESSOR to the plugin system, enabling third-party
processor plugins via entry points. Includes a demo plugin package
with RegexFilterProcessor (process_before_batch) and
SemanticDedupProcessor (process_after_generation).

- Add PluginType.PROCESSOR with processor_type discriminator
- Create processor_types.py for ProcessorConfigT with plugin injection
- Register plugin processors in engine ProcessorRegistry
- Use RLock in PluginRegistry to prevent deadlocks during discovery
- Add demo package: data-designer-demo-processors
- Update processor and plugin documentation
@andreatgretel andreatgretel force-pushed the andreatgretel/feat/processor-plugins-registry branch from 56ccb15 to 79d6a34 on February 19, 2026 18:40
@andreatgretel andreatgretel changed the title from "feat: add processor plugin system" to "feat: add processor plugin support" on Feb 19, 2026

Will remove all these demos! (promise!)

@andreatgretel

Lots of LoCs but most of them are for the demo and for the plan, both will be removed:

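| Area | Files | Added | Removed |
| --- | --- | --- | --- |
| Demo plugin package | 15 | +470 | 0 |
| Plan | 1 | +122 | 0 |
| Docs | 2 | +56 | -1 |
| Core source | 7 | +42 | -8 |
| Core tests | 1 | +30 | -4 |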

Verify that processor plugins from PluginRegistry are picked up
by create_default_processor_registry and registered correctly.
@andreatgretel andreatgretel marked this pull request as ready for review February 19, 2026 22:19
@andreatgretel andreatgretel requested a review from a team as a code owner February 19, 2026 22:19

Remove this too I think?


greptile-apps bot commented Feb 19, 2026

Greptile Summary

This PR successfully extends the plugin system to support third-party processor plugins via entry points, following the established patterns for column generators and seed readers.

Key Changes:

  • Added PluginType.PROCESSOR enum value with processor_type discriminator in plugin.py:23
  • Created processor_types.py module defining ProcessorConfigT type union with plugin injection
  • Added inject_into_processor_config_type_union() method to PluginManager in plugin_manager.py:80-89
  • Changed Lock to RLock in PluginRegistry at registry.py:26 to prevent deadlocks during nested plugin imports
  • Modified ProcessorRegistry to automatically register plugins from PluginRegistry in registry.py:28-29
  • Included two demo processors: RegexFilterProcessor (pre-batch filtering) and SemanticDedupProcessor (post-generation deduplication)
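
The type-union injection listed above can be pictured with a small sketch. The function `inject_into_type_union` and both config classes are hypothetical and use plain dataclasses; the real `inject_into_processor_config_type_union()` and `ProcessorConfigT` are not shown on this page and may work differently.

```python
from dataclasses import dataclass
from typing import Union, get_args


@dataclass
class BuiltinProcessorConfig:  # hypothetical built-in config
    processor_type: str = "builtin"


@dataclass
class RegexFilterConfig:  # hypothetical plugin-provided config
    processor_type: str = "regex-filter"


def inject_into_type_union(base_union, plugin_config_types):
    """Return a new Union holding the base members plus the plugin config types."""
    members = get_args(base_union) or (base_union,)  # a bare class has no args
    return Union[tuple(members) + tuple(plugin_config_types)]


ProcessorConfigT = inject_into_type_union(BuiltinProcessorConfig, [RegexFilterConfig])

# The discriminator value selects the concrete config class.
by_type = {cls().processor_type: cls for cls in get_args(ProcessorConfigT)}
print(sorted(by_type))
```

Rebuilding the union at import time is what lets a validator accept config types that the core package has never seen.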

Issues Found:

  • All demo plugin Python files (8 files) are missing the required NVIDIA SPDX license headers per project guidelines
  • Demo uses direct numpy import rather than lazy loading pattern (acceptable for external plugin, but may need documentation)

Test Coverage:

  • All 2,217 existing tests pass
  • 11 new demo plugin tests added (6 for regex filter, 5 for semantic dedup)
  • Added test verifying plugin registration in ProcessorRegistry

The implementation is architecturally sound and follows established patterns consistently. The RLock change is a critical fix for plugin discovery reliability.

Confidence Score: 4/5

  • This PR is safe to merge with minor style fixes needed
  • Score reflects solid architectural implementation following established patterns, comprehensive test coverage (2,217 existing + 11 new tests passing), and critical RLock deadlock fix. Reduced from 5 due to missing license headers in all 8 demo plugin files, which should be fixed before merge.
  • Demo plugin files need license headers added via make update-license-headers

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Added PluginType.PROCESSOR with processor_type discriminator field, extending plugin system to support processor plugins |
| packages/data-designer-config/src/data_designer/config/processor_types.py | New file defining ProcessorConfigT type union with plugin injection, following established pattern from column_types.py |
| packages/data-designer-config/src/data_designer/plugin_manager.py | Added inject_into_processor_config_type_union() method for processor plugin type injection |
| packages/data-designer-config/src/data_designer/plugins/registry.py | Changed Lock to RLock to prevent deadlocks during plugin discovery with nested imports |
| packages/data-designer-engine/src/data_designer/engine/processing/processors/registry.py | Modified to register processor plugins from PluginRegistry during initialization |
| demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py | Demo processor removing semantically similar rows via embedding similarity in process_after_generation |
| demo/data_designer_demo_processors/pyproject.toml | Demo package configuration with entry points for regex-filter and semantic-dedup plugins |
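
For readers unfamiliar with deduplication by embedding similarity, as described for the semantic dedup demo above, here is a minimal pure-Python sketch. It is not the demo's implementation: the function names, the toy two-dimensional embeddings, and the 0.9 threshold are all assumptions chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def dedup_by_embedding(rows, embeddings, threshold=0.9):
    """Greedy dedup: keep a row unless it is too similar to an already kept row."""
    kept = []  # indices of rows kept so far
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return [rows[i] for i in kept]


rows = ["a cat sat", "a cat sat down", "stock prices fell"]
embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]  # toy vectors, not real model output
print(dedup_by_embedding(rows, embeddings))  # ['a cat sat', 'stock prices fell']
```

The greedy pass is quadratic in the number of rows, which is fine for a demo but would likely need an index or batching for large datasets.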

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Third-party Plugin Package] -->|defines| B[ProcessorConfig subclass]
    A -->|defines| C[Processor implementation]
    A -->|creates| D[Plugin object]
    A -->|declares| E[pyproject.toml entry point]
    
    E -->|discovered by| F[PluginRegistry]
    F -->|loads| D
    D -->|provides| B
    D -->|provides| C
    
    F -->|injects into| G[ProcessorConfigT type union]
    F -->|registers in| H[ProcessorRegistry]
    
    H -->|instantiates| C
    C -->|uses| B
    
    I[DataDesigner] -->|validates config| G
    I -->|executes processors| H
    
    J[process_before_batch] -.->|optional| C
    K[process_after_batch] -.->|optional| C
    L[process_after_generation] -.->|optional| C
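
The optional hooks at the bottom of the flowchart can be sketched as a base class whose hooks default to no-ops, with a plugin overriding only the stage it needs. `BaseProcessor`, `RegexFilter`, and `run_stage` are hypothetical, and rows are modelled as a list of dicts purely for illustration; the project's actual processor base class and data container are not shown on this page.

```python
import re


class BaseProcessor:
    """Hypothetical base class: every hook defaults to a pass-through."""

    def process_before_batch(self, rows):
        return rows

    def process_after_batch(self, rows):
        return rows

    def process_after_generation(self, rows):
        return rows


class RegexFilter(BaseProcessor):
    """Keeps only rows whose `column` value matches `pattern`."""

    def __init__(self, column, pattern):
        self.column = column
        self.pattern = re.compile(pattern)

    def process_before_batch(self, rows):
        return [r for r in rows if self.pattern.search(str(r[self.column]))]


def run_stage(processors, stage, rows):
    """Apply the named hook of each processor in order."""
    for processor in processors:
        rows = getattr(processor, stage)(rows)
    return rows


seed = [{"text": "order #123"}, {"text": "hello"}, {"text": "order #9"}, {"text": "n/a"}]
filtered = run_stage([RegexFilter("text", r"#\d+")], "process_before_batch", seed)
print(len(seed), "->", len(filtered))  # 4 -> 2
```

Because the unimplemented hooks pass rows through unchanged, the same processor list can be run at every stage without per-stage checks.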

Last reviewed commit: ee33f5c


@greptile-apps greptile-apps bot left a comment


28 files reviewed, 9 comments


@@ -0,0 +1,16 @@
from __future__ import annotations

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
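
A quick way to list files that still lack a header before running the make target is sketched below. It assumes the header contains the standard `SPDX-License-Identifier` tag within the first few lines; the project's exact header text is not shown on this page, and `files_missing_spdx` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path

# Standard SPDX tag; assumed to appear in the project's header (exact header text not shown here).
SPDX_TAG = "SPDX-License-Identifier"


def files_missing_spdx(root, max_lines=5):
    """Return .py files under `root` whose first `max_lines` lines lack the SPDX tag."""
    missing = []
    for path in sorted(Path(root).rglob("*.py")):
        with path.open(encoding="utf-8") as fh:
            head = [next(fh, "") for _ in range(max_lines)]  # "" once the file is exhausted
        if not any(SPDX_TAG in line for line in head):
            missing.append(path)
    return missing


for path in files_missing_spdx("."):
    print(path)
```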

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/config.py
Line: 1

@@ -0,0 +1,27 @@
from __future__ import annotations

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/impl.py
Line: 1

@@ -0,0 +1,7 @@
from data_designer.plugins.plugin import Plugin, PluginType

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/plugin.py
Line: 1

@@ -0,0 +1,16 @@
from __future__ import annotations

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/config.py
Line: 1

@@ -0,0 +1,51 @@
from __future__ import annotations

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py
Line: 1

@@ -0,0 +1,7 @@
from data_designer.plugins.plugin import Plugin, PluginType

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/plugin.py
Line: 1

@@ -0,0 +1,53 @@
from unittest.mock import MagicMock

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/tests/test_regex_filter.py
Line: 1

@@ -0,0 +1,63 @@
from unittest.mock import MagicMock

Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.

Path: demo/data_designer_demo_processors/tests/test_semantic_dedup.py
Line: 1

import logging
from typing import TYPE_CHECKING

import numpy as np

Direct import of numpy. Per AGENTS.md:208-231, heavy libraries like numpy should be lazy-loaded via lazy_heavy_imports.py. However, since this is a demo plugin in a separate package, this may be acceptable. Consider documenting this pattern for third-party plugin developers.
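
For third-party plugin authors who want to defer a heavy import, one generic pattern is a small proxy that imports the module on first attribute access. This is a sketch only: `LazyModule` is hypothetical and is not the project's `lazy_heavy_imports.py`, whose contents are not shown on this page. The standard library's `json` stands in for numpy so the example runs without third-party packages.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module the first time an attribute is used."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for names not found on the proxy itself,
        # i.e. anything that lives on the real module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


lazy_json = LazyModule("json")  # nothing is imported by the proxy yet
print(lazy_json.dumps([1, 2]))  # first use triggers the import; prints [1, 2]
```

A plugin would write `np = LazyModule("numpy")` at module level and pay the import cost only when a processor actually runs.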


Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py
Line: 6
