feat: configurable column encoding for parquet checkpoint files #3214
+128
−18
Description
This PR adds a table configuration option to enable or disable run-length encoding for checkpoint files.

Note: I'm unsure whether a table option is the right way to go. In the original issue it was proposed to expose `writerProperties` on `create_checkpoint`. After evaluating this, however, I figured it has a few downsides, chiefly that `writerProperties` would expose too much control, i.e. I don't really need that level of control over the checkpoint writing.

Instead I went down the route of table properties. I'm unsure, however, whether the Delta Lake spec allows implementation-specific table properties, or whether these should be a well-defined set of properties. Since I'm in doubt, I've prefixed the table property with `delta-rs` instead of `delta`. Happy to discuss!

If the approach is validated I can update/add documentation as well.
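To make the proposal easier to discuss, here is a rough sketch of how a writer might read such a property from the table configuration. It is not the code in this PR: the key `delta-rs.checkpoint.useRunLengthEncoding`, the helper `checkpoint_rle_enabled`, and the default of `True` are all hypothetical names and values chosen for illustration.

```python
from typing import Mapping, Optional

# Hypothetical key: the PR only says the property is prefixed with
# "delta-rs" rather than "delta"; the real name may differ.
RLE_PROPERTY = "delta-rs.checkpoint.useRunLengthEncoding"


def checkpoint_rle_enabled(
    table_config: Mapping[str, Optional[str]],
    default: bool = True,  # assumed default, not confirmed by the PR
) -> bool:
    """Return whether run-length encoding should be used for checkpoints."""
    raw = table_config.get(RLE_PROPERTY)
    if raw is None:
        # Property absent or unset: fall back to the default.
        return default
    value = raw.strip().lower()
    if value == "true":
        return True
    if value == "false":
        return False
    # Reject anything else instead of silently guessing.
    raise ValueError(f"invalid value for {RLE_PROPERTY}: {raw!r}")
```

Table property values are stored as strings, so the sketch parses `"true"` and `"false"` explicitly and fails loudly on anything else.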
Related Issue(s)
Documentation