Add pandas-path integration #135
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
try: | ||
    from pandas_path.accessor import register_path_accessor | ||
except ImportError: | ||
    raise ImportError("To use the .cloud accessor, you must pip install pandas-path.") | ||
|
||
from ...cloudpath import CloudPath | ||
|
||
|
||
register_path_accessor("cloud", CloudPath) |
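
The guard above is the usual optional-dependency pattern: try the import, and if it fails, re-raise with an install hint. A minimal stdlib-only sketch of the same idea follows. `require_optional` is a hypothetical helper written for illustration and is not part of cloudpathlib or pandas-path; the one deliberate difference from the diff is `raise ... from err`, which keeps the original failure attached as `__cause__`.

```python
import importlib


def require_optional(module_name, install_hint):
    """Import an optional dependency, or raise ImportError carrying an install hint.

    Hypothetical helper for illustration only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Chain the original error so the underlying import failure stays visible.
        raise ImportError(install_hint) from err


if __name__ == "__main__":
    # A module that is always available imports normally.
    print(require_optional("json", "unused hint").__name__)

    # A missing module surfaces the friendlier message instead.
    try:
        require_optional("surely_missing_module_for_sketch", "pip install something")
    except ImportError as err:
        print(err)
```

Whether the chained form is worth it here is a judgment call; the message in the diff already tells the user what to install.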
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -12,6 +12,7 @@ mkdocs-material>=7 | |
mkdocstrings>=0.15 | ||
mypy | ||
pandas | ||
pandas-path>=0.3.0 | ||
Wonder if it makes sense to add a
I think it's kind of conceptually nice to have the extras be focused on cloud providers since that's the major differentiator. |
||
pillow | ||
pydantic | ||
pytest | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,46 @@ | ||
from cloudpathlib.pandas import cloud # noqa | ||
|
||
import pandas as pd | ||
|
||
|
||
def test_joins(rig): | ||
    s = pd.Series( | ||
        [ | ||
            f"{rig.cloud_prefix}bucket/a/b/c.txt", | ||
            f"{rig.cloud_prefix}bucket/a/b/c", | ||
            f"{rig.cloud_prefix}bucket/a/d/e.txt", | ||
        ] | ||
    ) | ||
|
||
    # make sure we don't register the default `path` accessor from pandas-path | ||
    assert not hasattr(s, "path") | ||
|
||
    # test path manipulations | ||
    assert s.cloud.name.tolist() == ["c.txt", "c", "e.txt"] | ||
    assert s.cloud.stem.tolist() == ["c", "c", "e"] | ||
    assert s.cloud.parent.tolist() == [ | ||
        f"{rig.cloud_prefix}bucket/a/b", | ||
        f"{rig.cloud_prefix}bucket/a/b", | ||
        f"{rig.cloud_prefix}bucket/a/d", | ||
    ] | ||
|
||
    # test cloud specific methods | ||
    if hasattr(rig.path_class, "bucket"): | ||
        assert s.cloud.bucket.tolist() == ["bucket"] * 3 | ||
    elif hasattr(rig.path_class, "container"): | ||
        assert s.cloud.container.tolist() == ["bucket"] * 3 | ||
|
||
    # test joins work as expected | ||
    s = pd.Series( | ||
        [ | ||
            f"{rig.cloud_prefix}bucket/a/b", | ||
            f"{rig.cloud_prefix}bucket/a/c", | ||
            f"{rig.cloud_prefix}bucket/a/d", | ||
        ] | ||
    ) | ||
|
||
    assert (s.cloud / ["file1.txt", "file2.txt", "file3.txt"]).tolist() == [ | ||
        f"{rig.cloud_prefix}bucket/a/b/file1.txt", | ||
        f"{rig.cloud_prefix}bucket/a/c/file2.txt", | ||
        f"{rig.cloud_prefix}bucket/a/d/file3.txt", | ||
    ] |
Why the extra module layer, instead of just `import cloudpathlib.pandas`?
This is to mirror the pandas-path import, which looks like `from pandas_path import path`. I think it's nice to (1) make the structure similar, and (2) see the name of the accessor when you do the import so you can see how they might get tied together.
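
For readers unfamiliar with what the `.cloud` accessor in `test_joins` does, here is a toy sketch of the element-wise idea. It is an assumption-laden stand-in, not how pandas-path is implemented: `ElementwisePathAccessor` is a hypothetical class, it works on plain lists rather than a pandas Series, and it uses `pathlib.PurePosixPath` in place of `CloudPath`, so the paths carry no cloud prefix.

```python
from pathlib import PurePosixPath


class ElementwisePathAccessor:
    """Toy stand-in for a path accessor: applies path operations to each element.

    Hypothetical class for illustration; not the pandas-path implementation.
    """

    def __init__(self, values, path_class=PurePosixPath):
        self._values = list(values)
        self._path_class = path_class

    def __getattr__(self, name):
        # Only reached for attributes not defined on this class (name, stem, parent, ...).
        if name.startswith("_"):
            raise AttributeError(name)
        results = [getattr(self._path_class(v), name) for v in self._values]
        # Path-like results go back to strings; plain values pass through unchanged.
        return [str(r) if isinstance(r, self._path_class) else r for r in results]

    def __truediv__(self, others):
        # Element-wise join, in the spirit of `s.cloud / [...]` in the test.
        return [str(self._path_class(v) / o) for v, o in zip(self._values, others)]


if __name__ == "__main__":
    acc = ElementwisePathAccessor(["bucket/a/b/c.txt", "bucket/a/b/c", "bucket/a/d/e.txt"])
    print(acc.name)    # ['c.txt', 'c', 'e.txt']
    print(acc.stem)    # ['c', 'c', 'e']
    print(acc.parent)  # ['bucket/a/b', 'bucket/a/b', 'bucket/a/d']

    dirs = ElementwisePathAccessor(["bucket/a/b", "bucket/a/c", "bucket/a/d"])
    print(dirs / ["file1.txt", "file2.txt", "file3.txt"])
```

The real accessor returns a Series and is attached under the name passed to `register_path_accessor`, which is why seeing `cloud` in the import line hints at where `s.cloud` comes from.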