feat: rename `to_parquet` to `write_parquet` #6909

bingbong-sempai · 2023-08-20T17:20:45Z

Is your feature request related to a problem?

No response

Describe the solution you'd like

I know pairing read_ and to_ functions follows Pandas' style but I think read_ and write_ makes more sense (it's also what polars uses).
For example, read_parquet and write_parquet for reading/writing to disk, from_pandas and to_pandas for reading/writing to memory.

What version of ibis are you running?

6.1.0

What backend(s) are you using, if any?

No response

Code of Conduct

I agree to follow this project's Code of Conduct

lostmygithubaccount · 2023-08-20T18:00:14Z

my personal opinion is not strictly opposed. though I don't like having a lot of aliases for methods, perhaps this is a good case (and thus wouldn't need to be a breaking change)

bingbong-sempai · 2023-08-20T23:50:41Z

yeah i agree that too many aliases is a bad thing. it's just that pairing read_ and write_ (same with from_ and to_) feels like the "correct" thing to do.

gforsyth · 2023-08-22T15:29:40Z

I agree that there's a nice consistency in using read_ and write_ for file-based things and from_ and to_ for memory-based things.
It's also going to break a lot of users. I'm not sure that's worth it. It also potentially leaves users thinking that we don't support writing to various filetypes because they're expecting it under to_ and not tab-completing out to write_.

There's also the issue that depending on the backend, we're sometimes uploading a file to, say, Snowflake, which isn't a cheap operation, vs. creating a view based only on parquet metadata, with DuckDB, which IS cheap.

Polars also has a scan_parquet method to distinguish this kind of behavior, but if we follow that convention we'd end up with a different API for a given backend, because one would only support read_ and the other would only support scan_.

Overall, I'm inclined to leave things as they are.

I am a -1 on adding an alias.

bingbong-sempai · 2023-08-22T23:11:44Z

that's unfortunate, i was hoping ibis would be able to make API changes that pandas couldn't

bingbong-sempai · 2023-08-22T23:27:31Z

an option can be to alias read_ to from_ instead?
from_ can cover both read and scan operations.
that way io could pair from_ and to_.

gforsyth · 2023-08-23T15:30:01Z

Hey @bingbong-sempai -- I think we're going to stick with our current convention. Most users are probably coming from pandas and there are other libraries that follow the convention. There's also a benefit in seeing all of the targets for commands like read_ in one block when using tab-completion.
Hopefully you find other parts of Ibis to enjoy and this API choice isn't a blocker for you.

Going to close this out -- thanks!

lostmygithubaccount · 2023-08-23T15:35:01Z

just to add on a bit -- write makes sense for files (CSV, parquet) but less sense for other things Ibis can output (PyArrow tables, PyArrow batches, pandas, torch, etc.)

bingbong-sempai · 2024-04-25T02:16:03Z

Just to add, both Polars and DuckDB use write_parquet and not to_parquet.

lostmygithubaccount · 2024-04-25T03:10:28Z

not sure how we feel about it in general as the Ibis project, but I'm personally not opposed (similar to my earlier comment) to adding aliases for common spellings of existing APIs -- e.g. a write_parquet method that perhaps we don't even put on the website (or do and note it's just an alias of to_parquet)

I'll reopen this so it shows in our issue triage and we can discuss

kszucs · 2024-04-26T17:15:55Z

My preference is to use write_<parquet|csv|...> which triggers computation. I would like to have lazy counterparts eventually which could use to_<format> prefix but no strong opinions there. I think methods triggering actions should be verbs.

jcrist · 2024-04-26T17:21:50Z

I would like to have lazy counterparts eventually which could use to_<format> prefix but no strong opinions there.

Note that we already have to_pandas/to_pyarrow/to_polars which run eagerly and return an in memory result. I think those names are good, but mixing to_* methods that don't run eagerly in with them would be confusing IMO.

I'm not opposed to a convention of:

write_*: eager methods that compute and write the output somewhere
to_*: eager methods that compute and return some in-memory object

That said, I don't find the current convention of putting both of these under to_* bad. A user can always do expr.to_<TAB> to see all the things they can convert an expression into.

The only thing I'm against is having any aliases where both expr.to_parquet and expr.write_parquet exist and do the same thing. As a user I would find that confusing and would spend a bunch of time trying to figure out if/how the expr.to_* and expr.write_* methods differed. (to be clear, short lived aliases for deprecation/migration are fine).

NickCrews · 2025-01-19T15:45:21Z

I think the fraction of data formats is really small for which in-memory and on-disk format really makes sense. Eg I could see both a write_json() and a to_json(), but I don't see how an in-memory representation of a CSV makes sense, you always are going to write this to disk. I think this is the case for most of our formats (parquet, delta, csv). Therefore, since for most formats we would either have a to_format() or a write_format() (and not both), I think it is not going to be obvious enough what the distinction is between the two, and mostly it's just going to appear inconsistent to users.

I think (no evidence) that most users when they see to_csv() they understand that something is getting written to disk and so some computation is happening. Being consistent with duckdb and Polars would be nice, but I'd rather be internally consistent and simple.

I think my preferred solution would be to_format(..., path: str | Path | None = NOT_SET, ...), where if the user explicitly passes a path of None, then the result gets returned in-memory (if possible for that format). The default should be a NOT_SET sentinel value so that if they forget to pass something explicitly they get an error.

NickCrews · 2025-01-19T15:49:02Z

I vote no adding alias write_ methods. Keep our code simple.

I vote on closing this out as not planned for now.

bingbong-sempai added the feature Features or general enhancements label Aug 20, 2023

gforsyth closed this as completed Aug 23, 2023

lostmygithubaccount closed this as not planned Won't fix, can't repro, duplicate, stale Aug 23, 2023

lostmygithubaccount reopened this Apr 25, 2024

lostmygithubaccount added this to Ibis planning and roadmap Apr 25, 2024

github-project-automation bot moved this to backlog in Ibis planning and roadmap Apr 25, 2024

gforsyth added the needs love Issues that need concerted focus and effort to address holistically label Apr 26, 2024

kszucs self-assigned this Apr 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: rename `to_parquet` to `write_parquet` #6909

feat: rename `to_parquet` to `write_parquet` #6909

bingbong-sempai commented Aug 20, 2023

lostmygithubaccount commented Aug 20, 2023

bingbong-sempai commented Aug 20, 2023

gforsyth commented Aug 22, 2023

bingbong-sempai commented Aug 22, 2023

bingbong-sempai commented Aug 22, 2023 •

edited

Loading

gforsyth commented Aug 23, 2023

lostmygithubaccount commented Aug 23, 2023

bingbong-sempai commented Apr 25, 2024

lostmygithubaccount commented Apr 25, 2024 •

edited

Loading

kszucs commented Apr 26, 2024

jcrist commented Apr 26, 2024

NickCrews commented Jan 19, 2025

NickCrews commented Jan 19, 2025 •

edited

Loading

feat: rename to_parquet to write_parquet #6909

feat: rename to_parquet to write_parquet #6909

Comments

bingbong-sempai commented Aug 20, 2023

Is your feature request related to a problem?

Describe the solution you'd like

What version of ibis are you running?

What backend(s) are you using, if any?

Code of Conduct

lostmygithubaccount commented Aug 20, 2023

bingbong-sempai commented Aug 20, 2023

gforsyth commented Aug 22, 2023

bingbong-sempai commented Aug 22, 2023

bingbong-sempai commented Aug 22, 2023 • edited Loading

gforsyth commented Aug 23, 2023

lostmygithubaccount commented Aug 23, 2023

bingbong-sempai commented Apr 25, 2024

lostmygithubaccount commented Apr 25, 2024 • edited Loading

kszucs commented Apr 26, 2024

jcrist commented Apr 26, 2024

NickCrews commented Jan 19, 2025

NickCrews commented Jan 19, 2025 • edited Loading

feat: rename `to_parquet` to `write_parquet` #6909

feat: rename `to_parquet` to `write_parquet` #6909

bingbong-sempai commented Aug 22, 2023 •

edited

Loading

lostmygithubaccount commented Apr 25, 2024 •

edited

Loading

NickCrews commented Jan 19, 2025 •

edited

Loading