This repository was archived by the owner on Apr 18, 2025. It is now read-only.

RoyalTS commented Nov 25, 2018

resolves #13

@multimeric (Owner)

Waiting to see if this is solved by allow_empty, as discussed in #13. Even if you need slightly different behaviour to this, I think this PR should involve tweaking the behaviour of the Column object, just the InListValidation

multimeric added the "waiting for reply" label on Oct 3, 2019

deponovo commented Aug 4, 2020

Hi, is there a chance these modifications are going to be merged? I am also interested in skipping NaNs in my application.

@multimeric (Owner)

As I said to the original author of this PR, please try allow_empty on the parent Column first, and if that doesn't work, explain why this feature is needed and is different from allow_empty.
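To make the owner's suggestion concrete: in this design, skipping empty values is a property of the column, applied after every validation has run, rather than something each validation handles on its own. The sketch below is a hypothetical, pure-pandas model of that idea. `MiniColumn`, `in_list` and `failing_rows` are illustrative names only, not pandas_schema's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

import pandas as pd

# A "check" returns a boolean mask that is True where a value passes.
Check = Callable[[pd.Series], pd.Series]


def in_list(options) -> Check:
    """Hypothetical check in the spirit of InListValidation."""
    return lambda s: s.isin(options)


@dataclass
class MiniColumn:
    """Hypothetical, simplified column; not pandas_schema's Column class."""

    name: str
    checks: List[Check] = field(default_factory=list)
    allow_empty: bool = False

    def failing_rows(self, df: pd.DataFrame) -> list:
        series = df[self.name]
        failed = pd.Series(False, index=series.index)
        for check in self.checks:
            # A row fails if any check rejects it
            failed |= ~check(series).astype(bool)
        if self.allow_empty:
            # The column, not each check, decides that missing values are exempt
            failed &= series.notna()
        return failed[failed].index.tolist()


df = pd.DataFrame({"a": [0, None, 2, 5]})
strict = MiniColumn("a", [in_list([0, 1, 2])])
lenient = MiniColumn("a", [in_list([0, 1, 2])], allow_empty=True)
print(strict.failing_rows(df))   # [1, 3]
print(lenient.failing_rows(df))  # [3]
```

Under this model, the missing value in row 1 is reported only when allow_empty is False, while the out-of-list value 5 is reported either way, and no individual check needs its own empty-handling flag.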


deponovo commented Aug 4, 2020

You are right. I had actually tried allow_empty, but in the wrong place, which is why I posted the previous question. For the record, in case somebody comes across the same issue, here is my incorrect configuration:

my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10, allow_empty=True)]),
])

And here is the correct one:

from pandas_schema import Column, Schema
from pandas_schema.validation import InRangeValidation
my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10)], allow_empty=True),
])

Now it works as needed.


Natalie-Caruana commented Nov 3, 2020

Hi, I'm also having issues with missing values when using InListValidation. Here is a test example:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('TestAllowEmpty', [InListValidation([0, 1, 2])], allow_empty=True),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,TestAllowEmpty,Customer ID
Gerald ,Hampton,82,0,2582GABK
Yuuwa,Miyake,270,1,7951WVLW
Edyta,Majewska ,50,2,775ANSID
'''))
test_data.at[0, 'TestAllowEmpty'] = pd.NA

errors = schema.validate(test_data)
for error in errors:
    print(error)

I get the following error:
AttributeError: Can only use .str accessor with string values!

Because the "TestAllowEmpty" column contains a missing value, test_data["TestAllowEmpty"].dtypes is dtype('O') (object), which is neither a categorical nor a numeric dtype. The validation code therefore falls through to validated = (series.str.len() > 0) & simple_validation, which raises the error because the entries are not strings.
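One possible direction for a fix, sketched under assumptions: decide whether a value is "empty" without touching the .str accessor, so object columns that hold non-string values no longer fail. `non_empty_mask` below is a hypothetical helper, not part of pandas_schema; it treats None, NaN, pd.NA and empty strings as empty, which mirrors (but is not copied from) the existing string check. The last lines test a separate, user-side idea: casting the column to pandas' nullable "Int64" dtype makes it numeric, so the numeric-dtype branch mentioned above would presumably be taken instead of the .str one. This has not been verified against pandas_schema itself.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_empty_mask(series: pd.Series) -> pd.Series:
    """Hypothetical helper: True where a value counts as non-empty.

    Unlike ``series.str.len() > 0``, this never uses the .str accessor,
    so it does not raise on object columns holding non-string values.
    Missing values (None, NaN, pd.NA) and empty strings count as empty.
    """
    # map() visits every element, including missing ones; isinstance()
    # short-circuits for non-strings such as ints or pd.NA
    is_empty_str = series.map(lambda v: isinstance(v, str) and v == "")
    return series.notna() & ~is_empty_str.astype(bool)


# An object-dtype column like "TestAllowEmpty" after one value is set to pd.NA
col = pd.Series([pd.NA, 1, 2], dtype=object)
print(non_empty_mask(col).tolist())    # [False, True, True]

# String columns keep the old semantics: "" and missing values are both empty
names = pd.Series(["x", "", None])
print(non_empty_mask(names).tolist())  # [True, False, False]

# User-side alternative (untested with pandas_schema): the nullable Int64
# dtype is numeric, unlike the object dtype the column currently has
print(is_numeric_dtype(col), is_numeric_dtype(col.astype("Int64")))  # False True
```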

Labels: waiting for reply (Needs more feedback from someone in the issue before action can be taken)
Successfully merging this pull request may close these issues: Ignore NaN values in validation
4 participants