Commit 2dff4f6

read_commandline supports polars engine (#1356)
Enable `read_commandline` into a `polars` dataframe.
1 parent a672fef commit 2dff4f6

2 files changed: +23 -6 lines changed

CHANGELOG.md

+1

@@ -1,6 +1,7 @@
 # Changelog

 ## [Unreleased]
+- [ENH] `read_commandline` function now supports polars - Issue #1352

 - [ENH] Improved performance for non-equi joins when using numba - @samukweku PR #1341
 - [ENH] Added a `clean_names` method for polars - it can be used to clean the column names, or clean column values . Issue #1343 @samukweku

janitor/io.py

+22 -6

@@ -93,7 +93,7 @@ def read_csvs(
     return dfs_dict


-def read_commandline(cmd: str, **kwargs: Any) -> pd.DataFrame:
+def read_commandline(cmd: str, engine="pandas", **kwargs: Any) -> Mapping:
     """Read a CSV file based on a command-line command.

     For example, you may wish to run the following command on `sep-quarter.csv`
@@ -111,26 +111,42 @@ def read_commandline(cmd: str, **kwargs: Any) -> pd.DataFrame:
     ```

     This function assumes that your command line command will return
-    an output that is parsable using `pandas.read_csv` and StringIO.
-    We default to using `pd.read_csv` underneath the hood.
-    Keyword arguments are passed through to read_csv.
+    an output that is parsable using the relevant engine and StringIO.
+    This function defaults to using `pd.read_csv` underneath the hood.
+    Keyword arguments are passed through as-is.

     Args:
         cmd: Shell command to preprocess a file on disk.
+        engine: DataFrame engine to process the output of the shell command.
+            Currently supports both pandas and polars.
         **kwargs: Keyword arguments that are passed through to
-            `pd.read_csv()`.
+            the engine's csv reader.
+

     Returns:
-        A pandas DataFrame parsed from the stdout of the underlying
+        A DataFrame parsed from the stdout of the underlying
             shell.
     """

     check("cmd", cmd, [str])
+    if engine not in {"pandas", "polars"}:
+        raise ValueError("engine should be either pandas or polars.")
     # adding check=True ensures that an explicit, clear error
     # is raised, so that the user can see the reason for the failure
     outcome = subprocess.run(
         cmd, shell=True, capture_output=True, text=True, check=True
     )
+    if engine == "polars":
+        try:
+            import polars as pl
+        except ImportError:
+            import_message(
+                submodule="polars",
+                package="polars",
+                conda_channel="conda-forge",
+                pip_install=True,
+            )
+        return pl.read_csv(StringIO(outcome.stdout), **kwargs)
     return pd.read_csv(StringIO(outcome.stdout), **kwargs)
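
The engine dispatch introduced in this diff can be illustrated with a small standalone sketch. This is not pyjanitor's implementation: the names `read_commandline_sketch` and `SUPPORTED_ENGINES` are hypothetical, and the sketch raises a plain `ImportError` when the requested engine is missing instead of calling pyjanitor's `import_message` helper. It relies only on the fact that both `pandas` and `polars` expose a top-level `read_csv` that accepts a file-like object, as the patched code itself assumes.

```python
import importlib
import subprocess
from io import StringIO
from typing import Any

# Hypothetical constant mirroring the two engines accepted in the diff.
SUPPORTED_ENGINES = {"pandas", "polars"}


def read_commandline_sketch(cmd: str, engine: str = "pandas", **kwargs: Any):
    """Illustrative stand-in for the patched function (an assumption,
    not the library's code)."""
    # Reject unknown engines before running anything.
    if engine not in SUPPORTED_ENGINES:
        raise ValueError("engine should be either pandas or polars.")

    # check=True makes a failing command raise CalledProcessError
    # instead of silently returning empty output.
    outcome = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, check=True
    )

    # Import the engine lazily so that an optional dependency is only
    # required when it is actually requested.
    try:
        module = importlib.import_module(engine)
    except ImportError as exc:
        raise ImportError(
            f"engine={engine!r} requires the {engine} package"
        ) from exc

    # Keyword arguments are passed through as-is to the engine's reader.
    return module.read_csv(StringIO(outcome.stdout), **kwargs)
```

With `polars` installed, a call such as `read_commandline_sketch("printf 'a,b\n1,2\n'", engine="polars")` would be expected to return a polars DataFrame, while the default call returns a pandas one. Validating `engine` first means a typo fails fast without running the shell command.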
0 commit comments