Lazy validation doesn't save data in SchemaError.data #1895

Open
2 of 3 tasks
dantheand opened this issue Jan 9, 2025 · 1 comment

Labels
bug Something isn't working

Comments

@dantheand (Contributor)

Describe the bug

When validating with lazy=False, pandera raises the first SchemaError it encounters. However, the error handling function explicitly deletes the error's data (a step meant to avoid duplicating data when lazy=True).

As a result, the SchemaError raised by a lazy=False validation does not carry the input dataframe in its data attribute.

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandera.
  • (optional) I have confirmed this bug exists on the main branch of pandera.


Code Sample, a copy-pastable example

Minimal test that should pass when the bug is fixed:

import pandas as pd
import pandera
from pandera.errors import SchemaError
from pandera.typing import Series

class MySchema(pandera.DataFrameModel):
    col = Series[int]

invalid_df = pd.DataFrame({"col": ["invalid"]})

try:
    MySchema.validate(invalid_df)
except SchemaError as e:
    assert e.data is not None

Expected behavior

e.data should contain the invalid_df dataframe.

Desktop (please complete the following information):

  • Python: 3.9
  • pandera: 0.22.1
dantheand added the bug label on Jan 9, 2025

@cosmicBboy (Collaborator)

Looks like there's a bug in the example code:

class MySchema(pandera.DataFrameModel):
    col: Series[int]

It should be col: Series[int], not col = Series[int].

The assertion passes when I run it.
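
For readers unsure why the two spellings behave differently, here is a minimal sketch using only the standard library. `List[int]` stands in for pandera's `Series[int]`, and the class names `WithAssignment` and `WithAnnotation` are hypothetical. It only demonstrates the Python-level difference; the assumption is that a schema class which builds its columns from annotations would see no `col` field in the assigned form.

```python
from typing import List, get_type_hints

class WithAssignment:
    # "=" creates an ordinary class attribute; no annotation is recorded.
    col = List[int]

class WithAnnotation:
    # ":" records an annotation; no class attribute is created.
    col: List[int]

assigned_hints = get_type_hints(WithAssignment)
annotated_hints = get_type_hints(WithAnnotation)

print(assigned_hints)   # no entry for "col"
print(annotated_hints)  # contains an entry for "col"
```

If that assumption holds, the original sample would define a schema with no column checks, so validation may never raise and the assertion inside the except block would never be reached.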
