Commit aa01bed

samukweku and pre-commit-ci[bot] authored
[ENH] Fix Coalesce to use only bfill (#1042)
* use bfill only for coalesce
* use bfill only for coalesce
* Update coalesce.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update coalesce.py

Co-authored-by: samukweku <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 88f5479 commit aa01bed
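
The commit message says only that `bfill` is now used on its own. A minimal sketch of why that can be enough, using plain pandas and the sample frame from the docstring: back-filling along each row pulls the first non-missing value into the leftmost column, and a later forward-fill can never change that leftmost column because nothing sits to its left. The variable names here are illustrative and not taken from the patch.

```python
import numpy as np
import pandas as pd

# Sample data copied from the docstring example in this commit.
df = pd.DataFrame(
    {
        "a": [np.nan, 1, np.nan],
        "b": [2, 3, np.nan],
        "c": [4, np.nan, np.nan],
    }
)
cols = ["a", "b", "c"]

# New approach: back-fill across each row, then keep the first column.
bfill_only = df[cols].bfill(axis="columns").iloc[:, 0]

# Old approach: the same, with an extra forward-fill before selecting.
bfill_ffill = (
    df[cols].bfill(axis="columns").ffill(axis="columns").iloc[:, 0]
)

# Series.equals treats NaN values in matching positions as equal.
same = bfill_only.equals(bfill_ffill)
print(same)
```

On this data both approaches give the same first column, so the forward-fill step only adds work.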

1 file changed: +20 -13 lines changed
janitor/functions/coalesce.py

@@ -17,14 +17,18 @@ def coalesce(
 ) -> pd.DataFrame:
     """Coalesce two or more columns of data in order of column names provided.

-    Given the list of column names, `coalesce` finds and returns the first
-    non-missing value from these columns, for every row in the input dataframe.
-    If all the column values are null for a particular row, then the
-    `default_value` will be filled in.
+    Given the variable arguments of column names,
+    `coalesce` finds and returns the first non-missing value
+    from these columns, for every row in the input dataframe.
+    If all the column values are null for a particular row,
+    then the `default_value` will be filled in.
+
+    If `target_column_name` is not provided,
+    then the first column is coalesced.

     This method does not mutate the original DataFrame.

-    Example: Using `coalesce` with 3 columns, "a", "b" and "c".
+    Example: Use `coalesce` with 3 columns, "a", "b" and "c".

         >>> import pandas as pd
         >>> import numpy as np
@@ -34,13 +38,21 @@ def coalesce(
         ...     "b": [2, 3, np.nan],
         ...     "c": [4, np.nan, np.nan],
         ... })
+        >>> df.coalesce("a", "b", "c")
+             a    b    c
+        0  2.0  2.0  4.0
+        1  1.0  3.0  NaN
+        2  NaN  NaN  NaN
+
+    Example: Provide a target_column_name.
+
         >>> df.coalesce("a", "b", "c", target_column_name="new_col")
              a    b    c  new_col
         0  NaN  2.0  4.0      2.0
         1  1.0  3.0  NaN      1.0
         2  NaN  NaN  NaN      NaN

-    Example: Providing a default value.
+    Example: Provide a default value.

         >>> import pandas as pd
         >>> import numpy as np
@@ -93,13 +105,8 @@ def coalesce(

     if target_column_name is None:
         target_column_name = column_names[0]
-    # bfill/ffill combo is faster than combine_first
-    outcome = (
-        df.filter(column_names)
-        .bfill(axis="columns")
-        .ffill(axis="columns")
-        .iloc[:, 0]
-    )
+
+    outcome = df.filter(column_names).bfill(axis="columns").iloc[:, 0]
     if outcome.hasnans and (default_value is not None):
         outcome = outcome.fillna(default_value)
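
For readers who want to try the patched logic outside pyjanitor, here is a rough standalone sketch. The function name `coalesce_sketch` is hypothetical, and the final `assign` step is an assumption, since the diff does not show how the result is written back. Only the `target_column_name` fallback, the `filter`/`bfill`/`iloc` chain and the `default_value` fill come from the hunk above.

```python
import numpy as np
import pandas as pd


def coalesce_sketch(
    df, *column_names, target_column_name=None, default_value=None
):
    """Hypothetical stand-in for the patched logic, not pyjanitor's code."""
    if target_column_name is None:
        # As in the diff: fall back to the first listed column.
        target_column_name = column_names[0]

    # Back-fill across each row and keep the leftmost column.
    outcome = df.filter(list(column_names)).bfill(axis="columns").iloc[:, 0]

    if outcome.hasnans and (default_value is not None):
        outcome = outcome.fillna(default_value)

    # Assumption: return a new frame so the input is left untouched.
    return df.assign(**{target_column_name: outcome})


df = pd.DataFrame(
    {
        "a": [np.nan, 1, np.nan],
        "b": [2, 3, np.nan],
        "c": [4, np.nan, np.nan],
    }
)

# No target name given, so column "a" receives the coalesced values.
result = coalesce_sketch(df, "a", "b", "c")

# With a target name and a default, all-null rows get the default.
filled = coalesce_sketch(
    df, "a", "b", "c", target_column_name="new_col", default_value=0
)
print(result)
print(filled)
```

Returning a copy via `assign` keeps the sketch in line with the docstring's statement that the original DataFrame is not mutated.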
