Arrow X Datasets Blog #1283
base: main
First commit for the arrow-datasets post
Awesome! Just a few comments on the code block formatting, plus some suggestions :)
```python
python = dset.map(lambda table: table.filter(pc.field("lang") == "Python"), batched=True, batch_size=500_000, num_proc=10)
print(f'{python.num_rows:,}', "rows")
>>> 250,000 rows
CPU times: user 818 ms, sys: 255 ms, total: 1.07 s
Wall time: 3.46 s
```
```python
>>> import pyarrow.compute as pc
>>> %time python = dset.map(lambda table: table.filter(pc.field("lang") == "Python"), batched=True, batch_size=500_000, num_proc=10)
CPU times: user 818 ms, sys: 255 ms, total: 1.07 s
Wall time: 3.46 s
>>> print(f'{python.num_rows:,}', "rows")
250,000 rows
```
Maybe we could use the standard library to measure time instead of IPython magic commands like `%time`, so that this code also works in the standard Python REPL.
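A minimal sketch of what that could look like, assuming only the standard `time` module. `timed` is a hypothetical helper name, not part of `datasets` or the blog post. Note that `time.process_time()` only counts CPU time of the current process, so work done in `num_proc` worker processes won't show up in the CPU figure; the wall-clock time is the more meaningful number for a multiprocess `map`.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs), print CPU and wall-clock time, return the result.

    Hypothetical helper approximating IPython's %time with the standard library.
    """
    cpu_start = time.process_time()   # CPU time of the current process only
    wall_start = time.perf_counter()  # monotonic, high-resolution wall clock
    result = func(*args, **kwargs)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    print(f"CPU time: {cpu:.2f} s, Wall time: {wall:.2f} s")
    return result


if __name__ == "__main__":
    # Toy workload standing in for the dset.map(...) call
    total = timed(sum, range(1_000_000))
    print(f"{total:,}")
```

In the blog snippet this would read `python = timed(dset.map, lambda table: table.filter(pc.field("lang") == "Python"), batched=True, batch_size=500_000, num_proc=10)`, which runs unchanged in a plain `python` session.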
Nice! Some comments:
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Mario Šaško <[email protected]>
Co-authored-by: Julien Chaumond <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
Co-authored-by: Mario Šaško <[email protected]>
Co-authored-by: Quentin Lhoest <[email protected]>
…akiki-arrow-datasets