allow InListValidation to ignore missings #19

RoyalTS · 2018-11-25T08:10:35Z

resolves #13

multimeric · 2018-11-25T12:41:01Z

Waiting to see if this is solved by allow_empty, as discussed in #13. Even if you need slightly different behaviour from this, I think this PR should not involve tweaking the behaviour of the Column object, just the InListValidation.
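The behaviour this PR asks for, an in-list check that passes over missing values, can be sketched in plain pandas. This is a minimal sketch under assumptions: `in_list_ignoring_missing` is a hypothetical helper, not the code in this PR and not part of the pandas_schema API.

```python
import pandas as pd


def in_list_ignoring_missing(series: pd.Series, options) -> pd.Series:
    """Return a boolean mask that is True where a value passes.

    Hypothetical helper (not pandas_schema code): a value passes if it is
    one of ``options`` or if it is missing (None, NaN, pd.NA, NaT).
    """
    return series.isin(options) | series.isna()


values = pd.Series([0, 1, None, 5], dtype="object")
print(in_list_ignoring_missing(values, [0, 1, 2]).tolist())  # [True, True, True, False]
```

Putting the missing-value rule inside the check itself keeps the Column untouched, which matches the suggestion above to change only the InListValidation.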

deponovo · 2020-08-04T08:09:29Z

Hi, is there a chance these modifications are going to be merged? I am also interested in skipping NaNs in my application.

multimeric · 2020-08-04T08:16:53Z

As I said to the original author of this PR, please try allow_empty on the parent Column first, and if that doesn't work, explain why this feature is needed and is different from allow_empty.

deponovo · 2020-08-04T10:12:03Z

You are right. I had actually tried allow_empty, but incorrectly, and that's why I posted the previous question. For documentation, in case somebody comes across the same issue, here is my wrong configuration:

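from pandas_schema import Column, Schema
from pandas_schema.validation import InRangeValidation
# Wrong: allow_empty is passed to the validation instead of to the Column
my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10, allow_empty=True)]),
])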

Here is the correct one:

# Correct: allow_empty is an argument of the Column itself
my_schema = Schema([
    Column("a", [InRangeValidation(min=-1, max=10)], allow_empty=True),
])

Now it works as needed.
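For a numeric column, the effect of allow_empty on the Column is, roughly, that missing entries are excluded from the reported failures. A minimal plain-pandas sketch of that idea follows; `range_failures` is a hypothetical helper, and the inclusive bounds come from `Series.between`, which is an assumption rather than InRangeValidation's exact semantics.

```python
import pandas as pd


def range_failures(series: pd.Series, lo: float, hi: float, allow_empty: bool) -> pd.Series:
    """Boolean mask of rows that FAIL an inclusive range check.

    Hypothetical helper: when allow_empty is True, missing values are never
    reported as failures.
    """
    failed = ~series.between(lo, hi)  # NaN is not between lo and hi, so it fails here
    if allow_empty:
        failed = failed & series.notna()  # drop missing entries from the failures
    return failed


a = pd.Series([3.0, float("nan"), 42.0])
print(range_failures(a, -1, 10, allow_empty=False).tolist())  # [False, True, True]
print(range_failures(a, -1, 10, allow_empty=True).tolist())   # [False, False, True]
```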

Natalie-Caruana · 2020-11-03T21:24:53Z

Hi, I'm also having some issues with missing values when using InListValidation. Test example below:

import pandas as pd
from io import StringIO
from pandas_schema import Column, Schema
from pandas_schema.validation import LeadingWhitespaceValidation, TrailingWhitespaceValidation, CanConvertValidation, MatchesPatternValidation, InRangeValidation, InListValidation
schema = Schema([
    Column('Given Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Family Name', [LeadingWhitespaceValidation(), TrailingWhitespaceValidation()]),
    Column('Age', [InRangeValidation(0, 120)]),
    Column('TestAllowEmpty', [InListValidation([0, 1, 2])], allow_empty=True),
    Column('Customer ID', [MatchesPatternValidation(r'\d{4}[A-Z]{4}')])
])
test_data = pd.read_csv(StringIO('''Given Name,Family Name,Age,TestAllowEmpty,Customer ID
Gerald ,Hampton,82,0,2582GABK
Yuuwa,Miyake,270,1,7951WVLW
Edyta,Majewska ,50,2,775ANSID
'''))
# Introduce a missing value into the column that allows empties
test_data.at[0, 'TestAllowEmpty'] = pd.NA

errors = schema.validate(test_data)
for error in errors:
    print(error)

I get the following error:
AttributeError: Can only use .str accessor with string values!

Since the "TestAllowEmpty" column has a missing value, test_data["TestAllowEmpty"].dtypes is dtype('O') (object), which is neither a categorical nor a numeric dtype. The validation therefore falls through to the line validated = (series.str.len() > 0) & simple_validation, which raises the error because the entries are not strings.
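The failure can be reproduced with plain pandas, and an element-wise emptiness check avoids it. This is a sketch under assumptions: the series below is a stand-in for the column in the report, and `is_present` is a hypothetical helper, not pandas_schema code.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Stand-in for test_data["TestAllowEmpty"] after the pd.NA assignment:
# integers plus a missing value, stored with object dtype.
s = pd.Series([pd.NA, 1, 2], dtype="object")
print(s.dtype, is_numeric_dtype(s))  # object False

# The string branch calls .str on this column, which pandas rejects because
# the non-missing entries are integers, not strings.
try:
    s.str.len()
    str_accessor_failed = False
except AttributeError:
    str_accessor_failed = True
print(str_accessor_failed)  # True


def is_present(value) -> bool:
    """Hypothetical helper (not pandas_schema code).

    Treats missing markers (None, NaN, pd.NA, NaT) and the empty string as
    empty; every other scalar, string or not, counts as present.
    """
    if pd.isna(value):
        return False
    return not (isinstance(value, str) and value == "")


# Element-wise, so it never touches the .str accessor and works on mixed columns
non_empty = s.map(is_present)
print(non_empty.tolist())  # [False, True, True]
```

A validation could then combine such a mask with its own result instead of calling `.str.len()`; whether that belongs in the Column or in InListValidation is the design question discussed in this thread.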

RoyalTS added 2 commits November 25, 2018 09:09

allow InListValidation to ignore missings

25aa064

add type annotation

f306a15

multimeric added the "waiting for reply" label (Needs more feedback from someone in the issue before action can be taken) on Oct 3, 2019
