feat: Add example rows to ValidationError for all rule failures#286

Draft
Copilot wants to merge 3 commits into main from copilot/enhance-validation-error-message

Copilot AI commented Feb 28, 2026

Validation failures previously reported only a row count, so identifying the offending values required a debugger session, which is painful in long-running pipelines.

Changes

  • src/polars_plugin/validation_error.rs: Extended format_rule_failures to accept an optional examples parameter (dict[str, list[str]]). Updated RuleValidationError::to_string to include example rows for both schema-level and column-level rules.
  • src/polars_plugin/mod.rs: Updated all_rules_required polars plugin to accept data columns as additional inputs after the rule boolean columns. For each failing rule, computes up to 5 distinct example rows using AnyValue::Display and includes them in the lazy-execution error message.
  • dataframely/filter_result.py: Added public FailureInfo.examples(max_examples=5) helper method that returns distinct example rows (as formatted strings) for each failing rule.
  • dataframely/_plugin.py: Added data_columns parameter to all_rules_required; passes data columns as additional args alongside num_rule_columns kwarg.
  • dataframely/schema.py: Eager path calls failure.examples() and passes results to format_rule_failures. Lazy path passes cls.column_names() as data_columns to all_rules_required.
  • dataframely/collection/collection.py: Updated format_rule_failures call to pass failure.examples().
  • dataframely/_native.pyi: Updated type stub for format_rule_failures.
  • tests/schema/test_validate.py: Updated test_invalid_primary_key to assert examples appear in the error message for both eager and lazy paths.
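
As a rough illustration of what "up to 5 distinct example rows" means, here is a minimal pure-Python sketch. It is not the code from this PR: the function name distinct_examples and the list-of-dicts input are assumptions made for illustration, whereas the real helper operates on polars data.

```python
from typing import Any


def distinct_examples(
    rows: list[dict[str, Any]],
    valid: list[bool],
    max_examples: int = 5,
) -> list[str]:
    """Collect up to `max_examples` distinct failing rows as formatted strings.

    `rows` and `valid` are hypothetical inputs: one dict per row and one
    boolean per row that is False where the rule failed.
    """
    seen: set[str] = set()
    examples: list[str] = []
    for row, ok in zip(rows, valid):
        if ok:
            continue  # the row passed the rule, so it is not an example
        formatted = str(row)  # eager-style formatting via str(dict)
        if formatted in seen:
            continue  # skip duplicates so that examples stay distinct
        seen.add(formatted)
        examples.append(formatted)
        if len(examples) == max_examples:
            break
    return examples
```

Capping the number of examples keeps the error message short even when hundreds of rows fail.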

Example

# Before
dataframely.exc.ValidationError: 1 rules failed validation:
 - 'primary_key' failed for 312 rows

# After (eager)
dataframely.exc.ValidationError: 1 rules failed validation:
 - 'primary_key' failed for 312 rows with 4 distinct examples: [{'user_id': 'abc', 'name': 'Alice'}, ...]

# After (column rule, eager)
dataframely.exc.ValidationError: 1 rules failed validation:
 * Column 'name' failed validation for 1 rules:
   - 'max_length' failed for 5 rows with 3 distinct examples: [{'user_id': 'x', 'name': 'toolongvalue'}, ...]

Examples are also included in lazy execution errors raised by the Rust plugin. Note that string values use double quotes in lazy errors (Rust AnyValue::Display format) versus single quotes in eager errors (Python str(dict) format).
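
To make the quoting difference concrete, here is a small Python sketch. eager_style uses str(dict) as described above. lazy_style is only a hypothetical imitation of the lazy output shown in this PR (string values in double quotes, keys in single quotes); it does not call the Rust plugin or AnyValue::Display.

```python
from typing import Any


def eager_style(row: dict[str, Any]) -> str:
    # Python's str(dict) uses repr() for values, so strings get single quotes.
    return str(row)


def lazy_style(row: dict[str, Any]) -> str:
    # Hypothetical imitation of the lazy output: string values are wrapped in
    # double quotes, other values use str(), and keys keep single quotes.
    parts = []
    for key, value in row.items():
        rendered = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"'{key}': {rendered}")
    return "{" + ", ".join(parts) + "}"
```

A test that has to pass on both paths could assert on the column values rather than on the exact quoting.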


…and examples

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve validation error messaging for primary key failures" to "feat: Add duplicate key details to primary_key ValidationError" Feb 28, 2026
borchero left a comment

  • Adjust the format_rule_failures method to optionally include examples. Possibly add a helper method to FailureInfo if that is helpful.
  • Provide examples for any type of validation failure, not only for primary key violations.
  • Adjust the Rust plugin so that the same information is generated during lazy execution.

…paths

Co-authored-by: borchero <22455425+borchero@users.noreply.github.com>

Copilot AI commented Feb 28, 2026

  • Adjust the format_rule_failures method to optionally include examples. Possibly add a helper method to FailureInfo if that is helpful.
  • Do not only provide examples for primary key violations b...

Addressed in commit 462f8ce:

  1. format_rule_failures extended: Now accepts an optional examples: dict[str, list[str]] | None parameter. Each value is a list of pre-formatted example row strings. The examples are shown for both schema-level and column-level rules.

  2. Examples for all rule types: Added FailureInfo.examples() helper method (the suggested helper on FailureInfo) that computes distinct example rows from all data columns for every failing rule. This is used in both schema.py and collection/collection.py eager validation paths.

  3. Lazy execution (Rust plugin): Updated all_rules_required to accept data columns as additional inputs after the rule boolean columns (num_rule_columns kwarg tells it where the split is). For each failing rule, the plugin computes up to 5 distinct example rows using AnyValue::Display. The schema.py lazy path now passes cls.column_names() as data columns.
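
The input layout described in point 3 (rule boolean columns first, data columns after, with num_rule_columns marking the boundary) can be sketched in Python as follows. The helper name split_inputs and the placeholder column names are assumptions for illustration. The actual split happens inside the Rust plugin.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def split_inputs(
    inputs: Sequence[T], num_rule_columns: int
) -> tuple[list[T], list[T]]:
    """Split plugin inputs into (rule columns, data columns).

    The first `num_rule_columns` entries are treated as rule boolean columns
    and the remainder as data columns used to build example rows.
    """
    if not 0 <= num_rule_columns <= len(inputs):
        raise ValueError("num_rule_columns must be between 0 and len(inputs)")
    return list(inputs[:num_rule_columns]), list(inputs[num_rule_columns:])
```

For instance, split_inputs(["rule_a", "rule_b", "col_x", "col_y"], 2) yields the two rule columns and the two data columns as separate lists.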

Example output:

# Eager
1 rules failed validation:
 - 'primary_key' failed for 2 rows with 2 distinct examples: [{'a': 1, 'b': 'x', 'c': '1'}, {'a': 1, 'b': 'y', 'c': '2'}]

# Lazy
1 rules failed validation for schema 'MySchema':
 - 'primary_key' failed for 2 rows with 2 distinct examples: [{'a': 1, 'b': "x", 'c': "1"}, {'a': 1, 'b': "y", 'c': "2"}]

Note: string quotes differ slightly between eager (Python str(dict) → single quotes) and lazy (Rust AnyValue::Display → double quotes), but both are readable.
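
For reference, the eager message line shown above can be reproduced with a short Python sketch. The function format_failure_line is hypothetical and only mirrors the visible output format. It is not the signature of format_rule_failures.

```python
def format_failure_line(rule: str, num_rows: int, examples: list[str]) -> str:
    """Build one rule-failure line from pre-formatted example row strings."""
    line = f" - '{rule}' failed for {num_rows} rows"
    if examples:
        # Examples are already strings, so they are joined without re-quoting.
        joined = ", ".join(examples)
        line += f" with {len(examples)} distinct examples: [{joined}]"
    return line
```

Passing an empty list falls back to the count-only line from before this change.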

Copilot AI changed the title from "feat: Add duplicate key details to primary_key ValidationError" to "feat: Add example rows to ValidationError for all rule failures" Feb 28, 2026