Conversation
403bc69 to
a61848e
Compare
Add PluginType.PROCESSOR to the plugin system, enabling third-party processor plugins via entry points. Includes a demo plugin package with RegexFilterProcessor (process_before_batch) and SemanticDedupProcessor (process_after_generation). - Add PluginType.PROCESSOR with processor_type discriminator - Create processor_types.py for ProcessorConfigT with plugin injection - Register plugin processors in engine ProcessorRegistry - Use RLock in PluginRegistry to prevent deadlocks during discovery - Add demo package: data-designer-demo-processors - Update processor and plugin documentation
56ccb15 to
79d6a34
Compare
There was a problem hiding this comment.
Will remove all these demos! (promise!)
|
Lots of LoCs but most of them are for the demo and for the plan, both will be removed:
|
Verify that processor plugins from PluginRegistry are picked up by create_default_processor_registry and registered correctly.
There was a problem hiding this comment.
Remove this too I think?
Greptile SummaryThis PR successfully extends the plugin system to support third-party processor plugins via entry points, following the established patterns for column generators and seed readers. Key Changes:
Issues Found:
Test Coverage:
The implementation is architecturally sound and follows established patterns consistently. The RLock change is a critical fix for plugin discovery reliability.
|
| Filename | Overview |
|---|---|
| packages/data-designer-config/src/data_designer/plugins/plugin.py | Added PluginType.PROCESSOR with processor_type discriminator field, extending plugin system to support processor plugins |
| packages/data-designer-config/src/data_designer/config/processor_types.py | New file defining ProcessorConfigT type union with plugin injection, following established pattern from column_types.py |
| packages/data-designer-config/src/data_designer/plugin_manager.py | Added inject_into_processor_config_type_union() method for processor plugin type injection |
| packages/data-designer-config/src/data_designer/plugins/registry.py | Changed Lock to RLock to prevent deadlocks during plugin discovery with nested imports |
| packages/data-designer-engine/src/data_designer/engine/processing/processors/registry.py | Modified to register processor plugins from PluginRegistry during initialization |
| demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py | Demo processor removing semantically similar rows via embedding similarity in process_after_generation |
| demo/data_designer_demo_processors/pyproject.toml | Demo package configuration with entry points for regex-filter and semantic-dedup plugins |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Third-party Plugin Package] -->|defines| B[ProcessorConfig subclass]
A -->|defines| C[Processor implementation]
A -->|creates| D[Plugin object]
A -->|declares| E[pyproject.toml entry point]
E -->|discovered by| F[PluginRegistry]
F -->|loads| D
D -->|provides| B
D -->|provides| C
F -->|injects into| G[ProcessorConfigT type union]
F -->|registers in| H[ProcessorRegistry]
H -->|instantiates| C
C -->|uses| B
I[DataDesigner] -->|validates config| G
I -->|executes processors| H
J[process_before_batch] -.->|optional| C
K[process_after_batch] -.->|optional| C
L[process_after_generation] -.->|optional| C
Last reviewed commit: ee33f5c
| @@ -0,0 +1,16 @@ | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/config.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,27 @@ | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/impl.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,7 @@ | |||
| from data_designer.plugins.plugin import Plugin, PluginType | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/regex_filter/plugin.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,16 @@ | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/config.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,51 @@ | |||
| from __future__ import annotations | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,7 @@ | |||
| from data_designer.plugins.plugin import Plugin, PluginType | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/plugin.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,53 @@ | |||
| from unittest.mock import MagicMock | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/tests/test_regex_filter.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| @@ -0,0 +1,63 @@ | |||
| from unittest.mock import MagicMock | |||
There was a problem hiding this comment.
Missing NVIDIA SPDX license header. Per AGENTS.md:134, all Python files must include the license header. Run make update-license-headers to add automatically.
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/tests/test_semantic_dedup.py
Line: 1
Comment:
Missing NVIDIA SPDX license header. Per `AGENTS.md`:134, all Python files must include the license header. Run `make update-license-headers` to add automatically.
How can I resolve this? If you propose a fix, please make it concise.| import logging | ||
| from typing import TYPE_CHECKING | ||
|
|
||
| import numpy as np |
There was a problem hiding this comment.
Direct import of numpy. Per AGENTS.md:208-231, heavy libraries like numpy should be lazy-loaded via lazy_heavy_imports.py. However, since this is a demo plugin in a separate package, this may be acceptable. Consider documenting this pattern for third-party plugin developers.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: demo/data_designer_demo_processors/src/data_designer_demo_processors/semantic_dedup/impl.py
Line: 6
Comment:
Direct import of `numpy`. Per `AGENTS.md`:208-231, heavy libraries like numpy should be lazy-loaded via `lazy_heavy_imports.py`. However, since this is a demo plugin in a separate package, this may be acceptable. Consider documenting this pattern for third-party plugin developers.
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.
Summary
Extends the plugin system to support third-party processor plugins via entry points, alongside existing column generator and seed reader plugins.
Changes
Added
PluginType.PROCESSORwithprocessor_typediscriminatorprocessor_types.py—ProcessorConfigTtype union with plugin injection (followscolumn_types.py/seed_source_types.pypattern)inject_into_processor_config_type_union()inPluginManagerRegexFilterProcessor— filters rows by regex pattern (process_before_batch)SemanticDedupProcessor— removes near-duplicate rows via embedding similarity (process_after_generation)Changed
PluginRegistryusesRLockinstead ofLockto prevent deadlocks when plugin imports trigger re-entryProcessorConfigTmoved fromprocessors.pytoprocessor_types.pyfor plugin injectionconfig_builder.py,data_designer_config.py,validation.pyDocs
Attention Areas
processor_types.py— New file following the_typesmodule patternregistry.py(PluginRegistry) —Lock→RLockchange to prevent deadlocks during plugin discovery with nested importsTest Plan
Description updated with AI