feat: rename to_parquet to write_parquet #6909
Comments
My personal opinion is not strictly opposed. Though I don't like having a lot of aliases for methods, perhaps this is a good case for one (and thus wouldn't need to be a breaking change).
yeah i agree that too many aliases is a bad thing. it's just that pairing
I agree that there's a nice consistency in using There's also the issue that depending on the backend, we're sometimes uploading a file to, say, Snowflake, which isn't a cheap operation, vs. creating a view based only on parquet metadata, with DuckDB, which IS cheap. Polars also has a Overall, I'm inclined to leave things as they are. I am a -1 on adding an alias.
that's unfortunate, i was hoping ibis would be able to make API changes that pandas couldn't
an option can be to alias
Hey @bingbong-sempai -- I think we're going to stick with our current convention. Most users are probably coming from Going to close this out -- thanks!
just to add on a bit --
Just to add, both Polars and DuckDB use
not sure how we feel about it in general as the Ibis project, but I'm personally not opposed (similar to my earlier comment) to adding aliases for common spellings of existing APIs -- e.g. a I'll reopen this so it shows in our issue triage and we can discuss
My preference is to use
Note that we already have I'm not opposed to a convention of:
That said, I don't find the current convention of putting both of these under The only thing I'm against is having any aliases where both
I think the fraction of data formats for which both an in-memory and an on-disk representation make sense is really small. E.g., I could see both a write_json() and a to_json(), but I don't see how an in-memory representation of a CSV makes sense: you are always going to write it to disk. I think this is the case for most of our formats (parquet, delta, csv). Therefore, since for most formats we would have either a to_format() or a write_format() (and not both), I think the distinction between the two is not going to be obvious enough, and mostly it's just going to appear inconsistent to users. I think (no evidence) that most users, when they see to_csv(), understand that something is getting written to disk and so some computation is happening. Being consistent with DuckDB and Polars would be nice, but I'd rather be internally consistent and simple. I think my preferred solution would be
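To make the to_json() vs. write_json() distinction above concrete, here is a minimal sketch using only the Python standard library. `ToyTable` and both of its methods are hypothetical names invented for illustration; they are not Ibis APIs and say nothing about how Ibis would implement either spelling.

```python
import json
import tempfile
from pathlib import Path


class ToyTable:
    """Hypothetical stand-in for a table; not part of Ibis."""

    def __init__(self, rows):
        self.rows = list(rows)

    def to_json(self):
        # In-memory conversion: returns a string and touches no files.
        return json.dumps(self.rows)

    def write_json(self, path):
        # On-disk output: writes the same payload to `path` and returns it.
        path = Path(path)
        path.write_text(self.to_json(), encoding="utf-8")
        return path


table = ToyTable([{"a": 1}, {"a": 2}])
in_memory = table.to_json()

with tempfile.TemporaryDirectory() as tmp:
    written = table.write_json(Path(tmp) / "rows.json")
    on_disk = written.read_text(encoding="utf-8")
```

For JSON both spellings carry distinct meaning; for a format like parquet only the on-disk variant would, which is the inconsistency the comment is pointing at.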
I vote no on adding alias write_ methods. Keep our code simple. I vote on closing this out as not planned for now.
Is your feature request related to a problem?
No response
Describe the solution you'd like
I know pairing read_ and to_ functions follows pandas' style, but I think read_ and write_ makes more sense (it's also what Polars uses).
For example, read_parquet and write_parquet for reading/writing to disk, from_pandas and to_pandas for reading/writing to memory.
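As a rough illustration of the read_/write_ pairing, and of the non-breaking alias option raised in the comments, here is a sketch built on the standard library's csv module rather than parquet, so it runs without extra dependencies. `ToyTable`, `read_csv`, `to_csv`, and `write_csv` are all hypothetical names for this example, not the Ibis API.

```python
import csv
import tempfile
from pathlib import Path


class ToyTable:
    """Hypothetical stand-in for a table; not part of Ibis."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(row) for row in rows]

    def to_csv(self, path):
        # Existing spelling: write a header row, then the data rows.
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(self.columns)
            writer.writerows(self.rows)
        return Path(path)

    # Alias: the proposed spelling points at the same function,
    # so existing callers of to_csv keep working unchanged.
    write_csv = to_csv


def read_csv(path):
    # Counterpart reader: the first row is the header, the rest is data.
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        columns = next(reader)
        rows = [row for row in reader]
    return ToyTable(columns, rows)


table = ToyTable(["a", "b"], [["1", "x"], ["2", "y"]])

with tempfile.TemporaryDirectory() as tmp:
    path = table.write_csv(Path(tmp) / "data.csv")
    round_trip = read_csv(path)
```

The alias costs one line, but it also means two names for the same operation, which is the maintenance and discoverability trade-off debated in the thread.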
What version of ibis are you running?
6.1.0
What backend(s) are you using, if any?
No response