TST/ENH: Enable encode_categorical to handle 2 (or more) dimensional arrays #1153

Merged
merged 3 commits into from
Aug 19, 2022

@Zeroto521 Zeroto521 commented Aug 19, 2022

PR Description

Please describe the changes proposed in the pull request:

test_categories_ndim_array_gt_1_in_kwargs should raise the ValueError from this check:

arr_ndim = value.ndim
if (arr_ndim != 1) or isinstance(value, pd.MultiIndex):
    raise ValueError(
        f"{value} is not a 1-D array. "
        "Kindly provide a 1-D array-like object."
    )

But when the input is array = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]], the ndim of pd.Index(array) is 1, not 2, so the check never fires.
It is better to convert the input to an ndarray object first.
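
The idea above can be sketched as follows. This is a minimal illustration, not the merged code: the helper name `check_categories_1d` is hypothetical, and the use of `np.asarray` is an assumption based on "convert ndarray object first". It shows that NumPy reports the nested list as 2-D, so the existing `ndim` check can reject it.

```python
import numpy as np
import pandas as pd


def check_categories_1d(value):
    """Hypothetical helper: validate that `value` is a 1-D array-like."""
    # Objects without a `shape` attribute (lists, tuples) go through NumPy
    # first, so nested sequences keep their real dimensionality.
    if not hasattr(value, "shape"):
        value = np.asarray(value)
    if (value.ndim != 1) or isinstance(value, pd.MultiIndex):
        raise ValueError(
            f"{value} is not a 1-D array. "
            "Kindly provide a 1-D array-like object."
        )
    return pd.Index(value)


arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
nested_ndim = np.asarray(arrays).ndim  # NumPy sees two dimensions here
flat_index = check_categories_1d(["red", "blue"])  # a flat list passes
```

Calling `check_categories_1d(arrays)` raises the ValueError that the test expects.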

__________________ test_categories_ndim_array_gt_1_in_kwargs ___________________
[gw0] linux -- Python 3.10.5 /usr/share/miniconda3/envs/test/bin/python

df_checks =                        region  2007  2009
0                     Pacific  1039  2587
1                   Southwest    51   176
2  Rocky Mountains and Plains   200   338

    def test_categories_ndim_array_gt_1_in_kwargs(df_checks):
        """
        Raise ValueError if categories is provided, but is not a 1D array.
        """
        arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
        with pytest.raises(ValueError):
>           df_checks.encode_categorical(region=arrays)

tests/functions/test_encode_categorical.py:125: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/pandas_flavor/register.py:29: in __call__
    return method(self._obj, *args, **kwargs)
janitor/utils.py:283: in wrapper
    return func(*args, **kwargs)
janitor/functions/encode_categorical.py:121: in encode_categorical
    return _computations_as_categorical(df, **kwargs)
janitor/functions/encode_categorical.py:136: in _computations_as_categorical
    categories_dict = _as_categorical_checks(df, **kwargs)
janitor/functions/encode_categorical.py:211: in _as_categorical_checks
    if not value.is_unique:
pandas/_libs/properties.pyx:37: in pandas._libs.properties.CachedProperty.__get__
    ???
/usr/share/miniconda3/envs/test/lib/python3.10/site-packages/pandas/core/indexes/base.py:2237: in is_unique
    return self._engine.is_unique
pandas/_libs/index.pyx:223: in pandas._libs.index.IndexEngine.is_unique.__get__
    ???
pandas/_libs/index.pyx:230: in pandas._libs.index.IndexEngine._do_unique_check
    ???
pandas/_libs/index.pyx:287: in pandas._libs.index.IndexEngine._ensure_mapping_populated
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: unhashable type: 'list'

pandas/_libs/hashtable_class_helper.pxi:5231: TypeError
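
The traceback ends in a hash-table helper, which suggests the uniqueness check hashes each element. As a rough analogy in plain Python (not pandas' actual implementation; `is_unique_by_hashing` is a hypothetical name), a hash-based uniqueness check fails the same way when the elements are lists:

```python
def is_unique_by_hashing(items):
    """Hypothetical stand-in for a hash-based uniqueness check."""
    # Building a set hashes every element, so unhashable elements fail here.
    return len(set(items)) == len(items)


flat = ["red", "blue", "red"]
nested = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]

flat_result = is_unique_by_hashing(flat)  # strings are hashable

try:
    is_unique_by_hashing(nested)
    error_message = None
except TypeError as exc:
    # Lists are unhashable, so the check raises instead of returning.
    error_message = str(exc)
```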

This PR resolves #1143.

PR Checklist

Please ensure that you have done the following:

  1. PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@@ -191,10 +192,9 @@ def _as_categorical_checks(df: pd.DataFrame, **kwargs) -> dict:
            raise TypeError(f"{value} should be list-like or a string.")
        if is_list_like(value):
            if not hasattr(value, "shape"):
                value = pd.Index([*value])

value is converted to pd.Index again on line 203.

Comment on lines +5 to 8
import numpy as np
import pandas as pd
import pandas_flavor as pf
from pandas.api.types import is_list_like

lint via isort

codecov bot commented Aug 19, 2022

Codecov Report

Merging #1153 (f0ad5c2) into dev (ae01b7d) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##              dev    #1153   +/-   ##
=======================================
  Coverage   97.34%   97.34%           
=======================================
  Files          77       77           
  Lines        3240     3240           
=======================================
  Hits         3154     3154           
  Misses         86       86           


- arr_ndim = value.ndim
- if (arr_ndim != 1) or isinstance(value, pd.MultiIndex):
+ if (value.ndim != 1) or isinstance(value, pd.MultiIndex):
@Zeroto521 Zeroto521 Aug 19, 2022


arr_ndim is used only once, so use value.ndim directly.

@ericmjl ericmjl merged commit a027753 into pyjanitor-devs:dev Aug 19, 2022
@Zeroto521 Zeroto521 deleted the test/test_encode_categorical branch August 26, 2022 02:21
3 participants