Open
Labels
bug: Something isn't working
Description
Priority Level
Medium (Annoying but has workaround)
Describe the bug
DropColumnsProcessorConfig has two issues in notebook workflows:
- Not idempotent: Re-running add_processor with the same name but different column_names does not replace the existing processor. The stale config persists, so the columns from the earlier call are still dropped.
- Reasoning columns cannot be dropped: column_names accepts neither <column_name>__reasoning_content for a single reasoning column nor a pattern such as "*__reasoning_content" for all reasoning columns at once. The validator rejects both because the literal string doesn't match any column name.
Steps/Code to reproduce bug
Issue 1: Re-running does not update config
config_builder.add_processor(
    dd.DropColumnsProcessorConfig(
        name="cleanup",
        column_names=["col_a"],
    )
)
data_designer.validate(config_builder)  # OK
Now change column_names and re-run the cell:
config_builder.add_processor(
    dd.DropColumnsProcessorConfig(
        name="cleanup",
        column_names=["col_b"],  # changed
    )
)
data_designer.validate(config_builder)  # Bug: drops ["col_a", "col_b"] instead of only ["col_b"]
Issue 2: Reasoning columns cannot be dropped
config_builder.add_processor(
    dd.DropColumnsProcessorConfig(
        name="cleanup",
        column_names=["col_a__reasoning_content"],
    )
)
data_designer.validate(config_builder)  # Error: column does not exist
Expected behavior
- Calling add_processor with the same name should replace the existing processor config (upsert), so notebook cells are safely re-runnable.
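- column_names should support dropping reasoning columns, both by exact name (<column_name>__reasoning_content) and by pattern (for example "*__reasoning_content" to drop all of them at once).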
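To make the two expectations above concrete, here is a minimal sketch of the requested semantics. It is not Data Designer's implementation: DropColumnsSketch, upsert_processor, and resolve_column_names are hypothetical names used only for illustration, and the sketch assumes the list of available columns passed to resolve_column_names already includes the generated reasoning columns.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class DropColumnsSketch:
    """Hypothetical stand-in for dd.DropColumnsProcessorConfig."""
    name: str
    column_names: list[str] = field(default_factory=list)


def upsert_processor(processors, new):
    """Replace the processor with the same name in place, otherwise append."""
    for i, existing in enumerate(processors):
        if existing.name == new.name:
            processors[i] = new  # same name: overwrite the stale config
            return
    processors.append(new)


def resolve_column_names(patterns, available):
    """Expand exact names and glob patterns against the available columns.

    Assumption: `available` already lists reasoning columns such as
    "col_a__reasoning_content".
    """
    resolved = []
    for pattern in patterns:
        matches = [c for c in available if fnmatchcase(c, pattern)]
        if not matches:
            raise ValueError(f"No column matches {pattern!r}")
        for column in matches:
            if column not in resolved:  # keep order, skip duplicates
                resolved.append(column)
    return resolved


processors = []
upsert_processor(processors, DropColumnsSketch("cleanup", ["col_a"]))
upsert_processor(processors, DropColumnsSketch("cleanup", ["col_b"]))  # re-run
print([p.column_names for p in processors])  # [['col_b']]

available = ["col_a", "col_a__reasoning_content", "col_b", "col_b__reasoning_content"]
print(resolve_column_names(["*__reasoning_content"], available))
# ['col_a__reasoning_content', 'col_b__reasoning_content']
```

Keying the replacement on name keeps a notebook cell safe to re-run, and matching with fnmatchcase lets an exact name and a wildcard pattern go through the same code path. A pattern that matches nothing still raises, so typos in column_names are not silently ignored.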
Additional context
No response