Regression in 30.0.0 write_parquet ignores all options #7423

Closed
sergiimk opened this issue Aug 26, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@sergiimk
Contributor

sergiimk commented Aug 26, 2023

Describe the bug

After upgrading to DataFusion 30.0.0, I noticed that my carefully crafted options for DataFrame::write_parquet are now being discarded.

I see there was some major refactoring, but is this regression intentional? I don't see it called out in the breaking changes for release 29 or 30.

We use options to force dictionary and delta encoding on some columns. How can this be achieved now?

To Reproduce

No response

Expected behavior

  • No regression in functionality would be preferred
  • Release notes should call this out and suggest a workaround
  • The API should probably panic with unimplemented!() when options are specified instead of silently ignoring them
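
The third point can be illustrated with a minimal sketch. Everything here is hypothetical: `WriterOptions` and `write_without_option_support` are stand-ins invented for illustration and are not DataFusion types or functions. The sketch returns an error rather than panicking, which is one way to refuse unsupported options loudly instead of dropping them.

```rust
/// Hypothetical stand-in for writer settings; not DataFusion's real type.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct WriterOptions {
    /// Columns the caller wants dictionary-encoded.
    pub dictionary_columns: Vec<String>,
}

/// Hypothetical writer entry point that does not support options yet.
/// It rejects the call when options are supplied instead of ignoring them.
pub fn write_without_option_support(
    path: &str,
    options: Option<&WriterOptions>,
) -> Result<(), String> {
    if let Some(opts) = options {
        return Err(format!(
            "writer options are not supported yet, refusing to write {}: {:?}",
            path, opts
        ));
    }
    // The actual write would happen here; it is omitted in this sketch.
    Ok(())
}
```

With this shape, a caller who passes options finds out immediately that they were not applied, rather than discovering it later from the encoding of the output files.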

Additional context

No response

@sergiimk added the bug label on Aug 26, 2023
@devinjdangelo
Contributor

Yes, this was an intentional regression. I agree raising an error on passing configuration settings would have been a good idea for this release. I am actively working on adding support for configuration back in (bulk of the work to reenable is in #7390).

The old write implementation is still present in DataFusion 30.0.0, just not reachable through the DataFrame methods directly. You can call the old write_parquet code via SessionContext::write_parquet: https://github.com/apache/arrow-datafusion/blob/e0961d55479aab1c4f92eca817fcce4ec25d7c3e/datafusion/core/src/execution/context.rs#L1325-L1333

Apologies for the inconvenience in the current release and not calling out this particular breaking change more clearly.

@alamb changed the title from "Regression: write_parquet ignores all options" to "Regression in 30.0.0 write_parquet ignores all options" on Aug 26, 2023
@sergiimk
Contributor Author

@devinjdangelo thanks for the explanation and your hard work! I will stick with the older API for now.

I will close this issue and start watching #7390.

@sergiimk
Contributor Author

Note: Switching to SessionContext::write_parquet didn't work right away, resulting in IoError(Os { code: 2, kind: NotFound, message: "No such file or directory" }).

The source of the error was a call to .canonicalize(), which seems to expect the destination path to already exist. I worked around this by creating an empty directory before saving to Parquet.
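
A minimal sketch of that workaround, using only the Rust standard library: `std::fs::canonicalize` fails with a NotFound error for a path that does not exist, so the directory is created before the path is resolved. The helper name `ensure_output_dir` is made up for illustration, and the sketch does not call DataFusion itself.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Create `dir` (and any missing parents), then return its canonical form.
/// `fs::canonicalize` returns an error if the path does not exist,
/// so the directory has to be created first.
pub fn ensure_output_dir(dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    fs::canonicalize(dir)
}
```

Calling a helper like this on the destination path before handing it to SessionContext::write_parquet would presumably avoid the "No such file or directory" error described above.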
