Commit d3146bb

samukweku and thatlittleboy authored
MWE - complete, process_text, expand_grid (#1013)
* mwe
* mwe
* mwe
* mwe
* Update janitor/functions/complete.py
  Co-authored-by: Jeremy Goh <[email protected]>
* Update janitor/functions/expand_grid.py
  Co-authored-by: Jeremy Goh <[email protected]>
* Update janitor/functions/process_text.py
  Co-authored-by: Jeremy Goh <[email protected]>
* Update janitor/functions/process_text.py
  Co-authored-by: Jeremy Goh <[email protected]>
* Update janitor/functions/process_text.py
  Co-authored-by: Jeremy Goh <[email protected]>
* mwe
* mwe

Co-authored-by: samukweku <[email protected]>
Co-authored-by: Jeremy Goh <[email protected]>
1 parent e6e0da0 commit d3146bb

3 files changed: +141 -118 lines changed

janitor/functions/complete.py

+59 -34
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,7 @@ def complete(
1919
) -> pd.DataFrame:
2020
"""
2121
It is modeled after tidyr's `complete` function, and is a wrapper around
22-
`expand_grid` and `pd.merge`.
22+
[`expand_grid`][janitor.functions.expand_grid.expand_grid] and `pd.merge`.
2323
2424
Combinations of column names or a list/tuple of column names, or even a
2525
dictionary of column names and new values are possible.
@@ -28,39 +28,64 @@ def complete(
2828
2929
MultiIndex columns are not supported.
3030
31-
Functional usage syntax:
32-
33-
```python
34-
35-
import pandas as pd
36-
import janitor as jn
37-
38-
df = pd.DataFrame(...)
39-
40-
df = jn.complete(
41-
df = df,
42-
column_label,
43-
(column1, column2, ...),
44-
{column1: new_values, ...},
45-
by = label/list_of_labels
46-
)
47-
```
48-
49-
Method chaining syntax:
50-
51-
```python
52-
53-
df = (
54-
pd.DataFrame(...)
55-
.complete(
56-
column_label,
57-
(column1, column2, ...),
58-
{column1: new_values, ...},
59-
by = label/list_of_labels
60-
)
61-
```
62-
63-
:param df: A pandas dataframe.
31+
Example:
32+
33+
>>> import pandas as pd
34+
>>> import janitor
35+
>>> df = pd.DataFrame(
36+
... {
37+
... "Year": [1999, 2000, 2004, 1999, 2004],
38+
... "Taxon": [
39+
... "Saccharina",
40+
... "Saccharina",
41+
... "Saccharina",
42+
... "Agarum",
43+
... "Agarum",
44+
... ],
45+
... "Abundance": [4, 5, 2, 1, 8],
46+
... }
47+
... )
48+
>>> df
49+
Year Taxon Abundance
50+
0 1999 Saccharina 4
51+
1 2000 Saccharina 5
52+
2 2004 Saccharina 2
53+
3 1999 Agarum 1
54+
4 2004 Agarum 8
55+
56+
Expose missing pairings of `Year` and `Taxon`:
57+
58+
>>> df.complete("Year", "Taxon", sort = True)
59+
Year Taxon Abundance
60+
0 1999 Agarum 1.0
61+
1 1999 Saccharina 4.0
62+
2 2000 Agarum NaN
63+
3 2000 Saccharina 5.0
64+
4 2004 Agarum 8.0
65+
5 2004 Saccharina 2.0
66+
67+
Expose missing years from 1999 to 2004 :
68+
69+
>>> df.complete(
70+
... {"Year": range(df.Year.min(), df.Year.max() + 1)},
71+
... "Taxon",
72+
... sort=True,
73+
... )
74+
Year Taxon Abundance
75+
0 1999 Agarum 1.0
76+
1 1999 Saccharina 4.0
77+
2 2000 Agarum NaN
78+
3 2000 Saccharina 5.0
79+
4 2001 Agarum NaN
80+
5 2001 Saccharina NaN
81+
6 2002 Agarum NaN
82+
7 2002 Saccharina NaN
83+
8 2003 Agarum NaN
84+
9 2003 Saccharina NaN
85+
10 2004 Agarum 8.0
86+
11 2004 Saccharina 2.0
87+
88+
:param df: A pandas DataFrame.
6489
:param *columns: This refers to the columns to be
6590
completed. It could be column labels (string type),
6691
a list/tuple of column labels, or a dictionary that pairs
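
The docstring above describes `complete` as a wrapper around `expand_grid` and `pd.merge`: build every combination of the key columns, then join the original rows back on. The block below is a minimal plain-Python sketch of that idea, not pyjanitor's implementation. `complete_rows` is a hypothetical helper, `None` stands in for `NaN`, and key combinations are assumed to be unique in the input.

```python
from itertools import product


def complete_rows(rows, columns, fill=None):
    """Return one row per combination of `columns`, filling gaps with `fill`."""
    # Sorted unique values of each key column form the axes of the grid.
    axes = [sorted({row[col] for row in rows}) for col in columns]
    # Index existing rows by key so the "merge" step becomes a dict lookup.
    existing = {tuple(row[col] for col in columns): row for row in rows}
    other_columns = [col for col in rows[0] if col not in columns]

    completed = []
    for key in product(*axes):  # the cartesian "grid" of key values
        if key in existing:
            completed.append(dict(existing[key]))
        else:
            filled = dict(zip(columns, key))
            filled.update({col: fill for col in other_columns})
            completed.append(filled)
    return completed


# Same data as the docstring example in the diff.
rows = [
    {"Year": 1999, "Taxon": "Saccharina", "Abundance": 4},
    {"Year": 2000, "Taxon": "Saccharina", "Abundance": 5},
    {"Year": 2004, "Taxon": "Saccharina", "Abundance": 2},
    {"Year": 1999, "Taxon": "Agarum", "Abundance": 1},
    {"Year": 2004, "Taxon": "Agarum", "Abundance": 8},
]
result = complete_rows(rows, ["Year", "Taxon"])
for row in result:
    print(row)
```

As in the docstring's first example, the only pairing missing from the input is `(2000, "Agarum")`, so that is the one row that receives the fill value.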

janitor/functions/expand_grid.py

+38 -38
Original file line number | Diff line number | Diff line change
@@ -26,7 +26,7 @@ def expand_grid(
2626
2727
2828
Data types are preserved in this function,
29-
including Pandas' extension array dtypes.
29+
including pandas' extension array dtypes.
3030
3131
The output will always be a DataFrame, usually a MultiIndex,
3232
with the keys of the `others` dictionary serving as
@@ -36,41 +36,43 @@ def expand_grid(
3636
`others`, the columns are flattened, before the final
3737
cartesian DataFrame is generated.
3838
39-
If a Pandas Series/DataFrame is passed, and has a labeled index, or
39+
If a pandas Series/DataFrame is passed, and has a labeled index, or
4040
a MultiIndex index, the index is discarded; the final DataFrame
4141
will have a RangeIndex.
4242
4343
The MultiIndexed DataFrame can be flattened using pyjanitor's
44-
`collapse_levels` method; the user can also decide to drop any of the
45-
levels, via Pandas' `droplevel` method.
46-
47-
Functional usage syntax:
48-
49-
```python
50-
51-
import pandas as pd
52-
import janitor as jn
53-
54-
df = pd.DataFrame(...)
55-
df = jn.expand_grid(df=df, df_key="...", others={...})
56-
```
57-
58-
Method-chaining usage syntax:
59-
60-
```python
61-
import pandas as pd
62-
import janitor as jn
63-
64-
df = pd.DataFrame(...).expand_grid(df_key="bla",others={...})
65-
```
66-
67-
Usage independent of a DataFrame
68-
69-
```python
70-
import pandas as pd
71-
from janitor import expand_grid
72-
73-
df = expand_grid(others = {"x":range(1,4), "y":[1,2]})
44+
[`collapse_levels`][janitor.functions.collapse_levels.collapse_levels]
45+
method; the user can also decide to drop any of the levels, via pandas'
46+
`droplevel` method.
47+
48+
Example:
49+
50+
>>> import pandas as pd
51+
>>> import janitor as jn
52+
>>> df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
53+
>>> data = {"z": [1, 2, 3]}
54+
>>> df.expand_grid(df_key="df", others=data)
55+
df z
56+
x y 0
57+
0 1 2 1
58+
1 1 2 2
59+
2 1 2 3
60+
3 2 1 1
61+
4 2 1 2
62+
5 2 1 3
63+
64+
Expand_grid works with non-pandas objects:
65+
66+
>>> data = {"x": [1, 2, 3], "y": [1, 2]}
67+
>>> jn.expand_grid(others=data)
68+
x y
69+
0 0
70+
0 1 1
71+
1 1 2
72+
2 2 1
73+
3 2 2
74+
4 3 1
75+
5 3 2
7476
7577
:param df: A pandas DataFrame.
7678
:param df_key: name of key for the dataframe.
@@ -97,12 +99,10 @@ def expand_grid(
9799

98100
if not df_key:
99101
raise KeyError(
100-
"""
101-
Using `expand_grid` as part of a
102-
DataFrame method chain requires that
103-
a string argument be provided for
104-
the `df_key` parameter.
105-
"""
102+
"Using `expand_grid` as part of a "
103+
"DataFrame method chain requires that "
104+
"a string argument be provided for "
105+
"the `df_key` parameter. "
106106
)
107107

108108
check("df_key", df_key, [str])
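
The last hunk above swaps a triple-quoted error message for adjacent string literals. A short sketch of why that matters: a triple-quoted literal keeps its newlines and source indentation, while adjacent literals are joined into a single line. The indentation inside `old_message` is an assumption, since the scraped diff does not preserve leading whitespace.

```python
# Old style: the literal keeps every newline and the leading indentation.
old_message = """
            Using `expand_grid` as part of a
            DataFrame method chain requires that
            a string argument be provided for
            the `df_key` parameter.
            """

# New style: adjacent string literals are concatenated at compile time.
new_message = (
    "Using `expand_grid` as part of a "
    "DataFrame method chain requires that "
    "a string argument be provided for "
    "the `df_key` parameter. "
)

# KeyError renders its single argument with repr(), so embedded newlines
# show up as literal backslash-n sequences in the displayed error.
rendered_old = str(KeyError(old_message))
rendered_new = str(KeyError(new_message))

print(rendered_old)
print(rendered_new)
```

The new message still ends with a trailing space, exactly as committed.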

janitor/functions/process_text.py

+44 -46
Original file line number | Diff line number | Diff line change
@@ -11,7 +11,7 @@ def process_text(
1111
df: pd.DataFrame,
1212
column_name: str,
1313
string_function: str,
14-
**kwargs: str,
14+
**kwargs,
1515
) -> pd.DataFrame:
1616
"""
1717
Apply a Pandas string method to an existing column.
@@ -21,52 +21,52 @@ def process_text(
2121
along with keyword arguments, if any, to the function.
2222
2323
This modifies an existing column; it does not create a new column;
24-
new columns can be created via pyjanitor's `transform_columns`.
25-
26-
27-
A list of all the string methods in Pandas can be accessed [here](https://pandas.pydata.org/docs/user_guide/text.html#method-summary)
28-
29-
30-
Functional usage syntax:
31-
32-
```python
33-
import pandas as pd
34-
import janitor as jn
35-
36-
df = pd.DataFrame(...)
37-
df = jn.process_text(
38-
df = df,
39-
column_name,
40-
string_function = "string_func_name_here",
41-
kwargs
42-
)
43-
```
44-
45-
Method-chaining usage syntax:
46-
47-
```python
48-
49-
import pandas as pd
50-
import janitor as jn
51-
52-
df = (
53-
pd.DataFrame(...)
54-
.process_text(
55-
column_name,
56-
string_function = "string_func_name_here",
57-
kwargs
58-
)
59-
)
60-
```
61-
24+
new columns can be created via pyjanitor's
25+
[`transform_columns`][janitor.functions.transform_columns.transform_columns].
26+
27+
28+
A list of all the string methods in Pandas can be accessed [here](https://pandas.pydata.org/docs/user_guide/text.html#method-summary).
29+
30+
31+
Example:
32+
33+
>>> import pandas as pd
34+
>>> import janitor
35+
>>> import re
36+
>>> df = pd.DataFrame({"text": ["Ragnar", "sammywemmy", "ginger"],
37+
... "code": [1, 2, 3]})
38+
>>> df
39+
text code
40+
0 Ragnar 1
41+
1 sammywemmy 2
42+
2 ginger 3
43+
>>> df.process_text(column_name="text", string_function="lower")
44+
text code
45+
0 ragnar 1
46+
1 sammywemmy 2
47+
2 ginger 3
48+
49+
For string methods with parameters, simply pass the keyword arguments:
50+
51+
>>> df.process_text(
52+
... column_name="text",
53+
... string_function="extract",
54+
... pat=r"(ag)",
55+
... expand=False,
56+
... flags=re.IGNORECASE,
57+
... )
58+
text code
59+
0 ag 1
60+
1 NaN 2
61+
2 NaN 3
6262
6363
:param df: A pandas DataFrame.
64-
:param column_name: string column to be operated on.
64+
:param column_name: String column to be operated on.
6565
:param string_function: pandas string method to be applied.
6666
:param kwargs: Keyword arguments for parameters of the `string_function`.
6767
:returns: A pandas DataFrame with modified column.
68-
:raises KeyError: if ``string_function`` is not a Pandas string method.
69-
:raises ValueError: if the text function returns a DataFrame, instead of a Series.
68+
:raises KeyError: If `string_function` is not a Pandas string method.
69+
:raises ValueError: If the text function returns a DataFrame, instead of a Series.
7070
""" # noqa: E501
7171

7272
check("column_name", column_name, [str])
@@ -86,10 +86,8 @@ def process_text(
8686

8787
if isinstance(result, pd.DataFrame):
8888
raise ValueError(
89-
"""
90-
The outcome of the processed text is a DataFrame,
91-
which is not supported in `process_text`.
92-
"""
89+
"The outcome of the processed text is a DataFrame, "
90+
"which is not supported in `process_text`."
9391
)
9492

9593
return df.assign(**{column_name: result})
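
The `process_text` docstring above describes looking up a string method by name and forwarding keyword arguments to it. The block below is a plain-Python analogy of that dispatch pattern, using built-in `str` methods on a list instead of the pandas `.str` accessor on a column. `apply_str_method` is a hypothetical helper, not part of pyjanitor, and raising `KeyError` for an unknown name only mirrors the `:raises KeyError:` line in the docstring.

```python
def apply_str_method(values, string_function, **kwargs):
    """Apply the `str` method named `string_function` to every value."""
    method = getattr(str, string_function, None)
    # Reject unknown names and private/dunder attributes.
    if method is None or string_function.startswith("_") or not callable(method):
        raise KeyError(f"{string_function} is not a str method.")
    # Keyword arguments are forwarded unchanged to the chosen method.
    return [method(value, **kwargs) for value in values]


texts = ["Ragnar", "sammywemmy", "ginger"]

# A method without parameters, as in the first docstring example.
lowered = apply_str_method(texts, "lower")
print(lowered)

# A method with parameters: pass them as keyword arguments.
split_once = apply_str_method(["a-b-c"], "split", sep="-", maxsplit=1)
print(split_once)
```

Dispatching by name keeps the call signature small, at the cost of deferring mistakes in the method name or its arguments to run time.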
