[ENH] row_to_names for polars #1363

Merged on Jun 11, 2024
Commits (58)
862c7dd
add make_clean_names function that can be applied to polars
Apr 19, 2024
01531cc
add examples for make_clean_names
Apr 20, 2024
0fb440e
changelog
Apr 20, 2024
5e944b2
limit import location for polars
Apr 20, 2024
501d9c6
limit import location for polars
Apr 20, 2024
9506832
fix polars in environment-dev.yml
Apr 20, 2024
1ae8edd
install polars in doctest
Apr 20, 2024
3b1829b
limit polars imports - user should have polars already installed
Apr 20, 2024
52fd80c
use subprocess.run
Apr 20, 2024
2dce78b
add subprocess.devnull to docstrings
Apr 20, 2024
37b3feb
add subprocess.devnull to docstrings
Apr 20, 2024
0953f2d
add subprocess.devnull to docstrings
Apr 20, 2024
d7c71b6
add subprocess.devnull to docstrings
Apr 20, 2024
40b8502
add os.devnull
Apr 20, 2024
4f11d09
add polars as requirement for docs
Apr 20, 2024
54b179c
add polars to tests requirements
Apr 20, 2024
25b39b9
delete irrelevant folder
Apr 20, 2024
a09f34b
changelog
Apr 20, 2024
1b375f8
create submodule for polars
Apr 21, 2024
799532f
fix doctests
Apr 21, 2024
dbce4b9
fix tests; add polars to documentation
Apr 21, 2024
1c642e6
fix tests; add polars to documentation
Apr 21, 2024
407d21b
import janitor.polars
Apr 21, 2024
aedfc65
control docs output for polars submodule
Apr 21, 2024
db9b486
exclude functions in docs rendering
Apr 21, 2024
6a91e67
exclude functions in docs rendering
Apr 21, 2024
7a88078
show_submodules=true
Apr 21, 2024
6d7885e
fix docstring rendering for polars
Apr 21, 2024
944fa02
Expression -> expression
Apr 21, 2024
b9aefaa
Merge dev into samukweku/polars_clean_names
ericmjl Apr 23, 2024
e9c370a
rename functions.py
Apr 23, 2024
e3021dd
add support for lazyframe
May 4, 2024
55b9a43
add support for lazyframe
May 4, 2024
25ea8d0
update typing to include lazyframe
May 4, 2024
5ca7581
update typing to include lazyframe
May 4, 2024
18e6be8
separate namespaces for lazyframe and eager dataframe
May 5, 2024
f50369f
separate namespaces for lazyframe and eager dataframe
May 5, 2024
1820fab
separate namespaces for lazyframe and eager dataframe
May 5, 2024
2a7b033
separate namespaces for lazyframe and eager dataframe
May 5, 2024
5f21470
make edits to docs
May 5, 2024
122e960
make edits to docs
May 5, 2024
6295526
use LazyFrame constructor
May 5, 2024
e2356c5
use LazyFrame constructor
May 5, 2024
c28260e
Merge dev into samukweku/polars_clean_names
ericmjl May 6, 2024
4d0e2ca
Merge dev into samukweku/polars_clean_names
ericmjl May 10, 2024
8a5552e
Merge dev into samukweku/polars_clean_names
ericmjl May 19, 2024
e84e62e
add row_to_names method for polars
May 20, 2024
3ca9820
keep only relevant file changes
May 20, 2024
04f0b31
fix doc fail
May 20, 2024
645e60f
fix doc fail
May 20, 2024
d7422e2
fix doc fail
May 20, 2024
cb4edb9
Merge dev into samukweku/polars_row_names
ericmjl May 23, 2024
7927ac7
Merge dev into samukweku/polars_row_names
ericmjl May 27, 2024
4561207
Merge branch 'dev' into samukweku/polars_row_names
samukweku Jun 3, 2024
a9e6259
fix conflicts
Jun 3, 2024
2eee6f1
Merge dev into samukweku/polars_row_names
ericmjl Jun 4, 2024
09e7ae3
Merge branch 'dev' into samukweku/polars_row_names
samukweku Jun 9, 2024
bd0d913
cleanup docs
Jun 10, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -1,6 +1,7 @@
# Changelog

## [Unreleased]
+ - [ENH] Added a `row_to_names` method for polars. Issue #1352
- [ENH] `read_commandline` function now supports polars - Issue #1352

- [ENH] Improved performance for non-equi joins when using numba - @samukweku PR #1341
10 changes: 6 additions & 4 deletions janitor/functions/row_to_names.py
@@ -1,5 +1,7 @@
"""Implementation of the `row_to_names` function."""

+ from __future__ import annotations
+
import warnings

import numpy as np
@@ -13,7 +15,7 @@
@deprecated_alias(row_number="row_numbers", remove_row="remove_rows")
def row_to_names(
df: pd.DataFrame,
- row_numbers: int = 0,
+ row_numbers: int | list = 0,
remove_rows: bool = False,
remove_rows_above: bool = False,
reset_index: bool = False,
@@ -73,7 +75,7 @@ def row_to_names(
Note that indexing starts from 0. It can also be a list,
in which case, a MultiIndex column is created.
Defaults to 0 (first row).
- remove_row: Whether the row(s) should be removed from the DataFrame.
+ remove_rows: Whether the row(s) should be removed from the DataFrame.
remove_rows_above: Whether the row(s) above the selected row should
be removed from the DataFrame.
reset_index: Whether the index should be reset on the returning DataFrame.
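
Review note: the arguments documented in this hunk can be illustrated without pandas. The sketch below is a stdlib-only approximation working on a list of lists, not the function in this PR. `row_to_names_sketch` is a hypothetical name, tuple names only loosely stand in for a MultiIndex, and treating "above" as "above the first selected row" when a list is passed is an assumption.

```python
from __future__ import annotations


def row_to_names_sketch(
    rows: list[list],
    row_numbers: int | list = 0,
    remove_rows: bool = False,
    remove_rows_above: bool = False,
) -> tuple[list, list[list]]:
    """Return (column names, remaining rows) for a list-of-lists table."""
    # Mirror the type check in the diff: an int, or a list of ints.
    if isinstance(row_numbers, int):
        selected = [row_numbers]
    elif isinstance(row_numbers, list) and all(
        isinstance(entry, int) for entry in row_numbers
    ):
        selected = row_numbers
    else:
        raise TypeError("row_numbers should be an int or a list of ints")

    if len(selected) == 1:
        names = list(rows[selected[0]])
    else:
        # Several rows: each name becomes a tuple, one item per selected row.
        names = list(zip(*(rows[number] for number in selected)))

    # Assumption: "above" means above the first selected row.
    first = min(selected)
    kept = []
    for position, row in enumerate(rows):
        if remove_rows_above and position < first:
            continue
        if remove_rows and position in selected:
            continue
        kept.append(row)
    return names, kept


table = [["junk", "junk"], ["id", "name"], [1, "a"], [2, "b"]]
names, kept = row_to_names_sketch(
    table, row_numbers=1, remove_rows=True, remove_rows_above=True
)
print(names)  # ['id', 'name']
print(kept)   # [[1, 'a'], [2, 'b']]
```

Passing `row_numbers=[0, 1]` to the same sketch yields tuple names such as `('junk', 'id')`, which is the closest plain-Python analogue to the MultiIndex columns described in the docstring.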
@@ -84,10 +86,10 @@ def row_to_names(
if not pd.options.mode.copy_on_write:
df = df.copy()

- check("row_number", row_numbers, [int, list])
+ check("row_numbers", row_numbers, [int, list])
if isinstance(row_numbers, list):
for entry in row_numbers:
- check("entry in the row_number argument", entry, [int])
+ check("entry in the row_numbers argument", entry, [int])

warnings.warn(
"The function row_to_names will, in the official 1.0 release, "
748 changes: 12 additions & 736 deletions janitor/polars/__init__.py

Large diffs are not rendered by default.

10 changes: 5 additions & 5 deletions janitor/polars/clean_names.py
@@ -115,11 +115,11 @@ def _strip_underscores_func_expr(

def _clean_column_names(
obj: str,
- strip_underscores: str | bool = None,
- case_type: str = "lower",
- remove_special: bool = False,
- strip_accents: bool = False,
- truncate_limit: int = None,
+ strip_underscores: str | bool,
+ case_type: str,
+ remove_special: bool,
+ strip_accents: bool,
+ truncate_limit: int,
) -> str:
"""
Function to clean the column names of a polars DataFrame.
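
Review note: this hunk drops the defaults from the private helper, so every argument has to be passed explicitly by the public callers. As a rough illustration of what a per-label cleaner with this signature might do, here is a stdlib-only sketch. `clean_label_sketch` is a hypothetical name, the order of the steps is an assumption, and the `"snake"` case type is left out. It is not the implementation in `janitor/polars/clean_names.py`.

```python
from __future__ import annotations

import re
import unicodedata


def clean_label_sketch(
    label: str,
    strip_underscores: str | bool | None,
    case_type: str,
    remove_special: bool,
    strip_accents: bool,
    truncate_limit: int | None,
) -> str:
    """Clean one column label; every argument is required, as in the diff."""
    if case_type == "lower":
        label = label.lower()
    elif case_type == "upper":
        label = label.upper()
    # "preserve" leaves the case alone; "snake" is omitted from this sketch.

    # Collapse each run of whitespace into a single underscore.
    label = re.sub(r"\s+", "_", label)

    if remove_special:
        # Keep letters, digits and underscores only.
        label = re.sub(r"\W", "", label)

    if strip_accents:
        # Decompose accented characters, then drop the combining marks.
        decomposed = unicodedata.normalize("NFD", label)
        label = "".join(ch for ch in decomposed if not unicodedata.combining(ch))

    if strip_underscores in ("left", "l", "both", True):
        label = label.lstrip("_")
    if strip_underscores in ("right", "r", "both", True):
        label = label.rstrip("_")

    if truncate_limit is not None:
        label = label[:truncate_limit]
    return label


print(clean_label_sketch("Abçdê fgí j", None, "lower", False, True, None))
# abcde_fgi_j
```

Keeping the defaults only on the public methods means there is a single place where they can drift, which appears to be the motivation for this change.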
434 changes: 434 additions & 0 deletions janitor/polars/dataframe.py

Large diffs are not rendered by default.

93 changes: 93 additions & 0 deletions janitor/polars/expressions.py
@@ -0,0 +1,93 @@
from __future__ import annotations

from janitor.utils import import_message

from .clean_names import _clean_expr_names

try:
import polars as pl
except ImportError:
import_message(
submodule="polars",
package="polars",
conda_channel="conda-forge",
pip_install=True,
)


@pl.api.register_expr_namespace("janitor")
class PolarsExpr:
def __init__(self, expr: pl.Expr) -> pl.Expr:
self._expr = expr

def clean_names(
self,
strip_underscores: str | bool = None,
case_type: str = "lower",
remove_special: bool = False,
strip_accents: bool = False,
enforce_string: bool = False,
truncate_limit: int = None,
) -> pl.Expr:
"""
Clean the labels in a polars Expression.

Examples:
>>> import polars as pl
>>> import janitor.polars
>>> df = pl.DataFrame({"raw": ["Abçdê fgí j"]})
>>> df
shape: (1, 1)
┌─────────────┐
│ raw │
│ --- │
│ str │
╞═════════════╡
│ Abçdê fgí j │
└─────────────┘

Clean the column values:
>>> df.with_columns(pl.col("raw").janitor.clean_names(strip_accents=True))
shape: (1, 1)
┌─────────────┐
│ raw │
│ --- │
│ str │
╞═════════════╡
│ abcde_fgi_j │
└─────────────┘

!!! info "New in version 0.28.0"

Args:
strip_underscores: Removes the outer underscores
from all labels in the expression.
Default None keeps outer underscores.
Values can be either 'left', 'right'
or 'both' or the respective shorthand 'l',
'r' and True.
case_type: Whether to make the labels in the expression lower or uppercase.
Current case may be preserved with 'preserve',
while snake case conversion (from CamelCase or camelCase only)
can be turned on using "snake".
Default 'lower' makes all characters lowercase.
remove_special: Remove special characters from the values in the expression.
Only letters, numbers and underscores are preserved.
strip_accents: Whether or not to remove accents from
the expression.
enforce_string: Whether or not to cast the expression to a string type.
truncate_limit: Truncates formatted labels in the expression to
the specified length. Default None does not truncate.

Returns:
A polars Expression.
"""
return _clean_expr_names(
obj=self._expr,
strip_accents=strip_accents,
strip_underscores=strip_underscores,
case_type=case_type,
remove_special=remove_special,
enforce_string=enforce_string,
truncate_limit=truncate_limit,
)
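
Review note: for readers unfamiliar with the accessor pattern behind `@pl.api.register_expr_namespace("janitor")`, the following stdlib-only sketch shows one way such a decorator could attach a namespace class to an expression type. `Expr`, `_Namespace`, `register_expr_namespace` and `ToyExprNamespace` are hypothetical stand-ins defined here for illustration. This is not polars' implementation and does not use polars at all.

```python
class Expr:
    """Stand-in for an expression type; not polars' pl.Expr."""

    def __init__(self, values):
        self.values = list(values)


class _Namespace:
    """Descriptor that wraps the owning instance in the accessor class."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: hand back the accessor class.
            return self._accessor_cls
        return self._accessor_cls(instance)


def register_expr_namespace(name):
    """Return a decorator that exposes a class as `Expr.<name>`."""

    def decorator(accessor_cls):
        setattr(Expr, name, _Namespace(accessor_cls))
        return accessor_cls

    return decorator


@register_expr_namespace("janitor")
class ToyExprNamespace:
    def __init__(self, expr):
        self._expr = expr

    def clean_names(self):
        # Toy cleaning step: lowercase and replace spaces with underscores.
        return Expr(v.lower().replace(" ", "_") for v in self._expr.values)


result = Expr(["First Name", "Last Name"]).janitor.clean_names()
print(result.values)  # ['first_name', 'last_name']
```

The accessor is rebuilt on each attribute access, so it holds no state beyond the expression it wraps, which matches how `PolarsExpr` above only stores `self._expr`.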