HIVE-29287: Iceberg: Variant Shredding support #6152
base: master
same thing as apache/iceberg#14297
TableScan
  alias: tbl_shredded_variant
  filterExpr: (UDFToDouble(variant_get(data, '$.age')) > 25.0D) (type: boolean)
  Statistics: Num rows: 3 Data size: 1020 Basic stats: COMPLETE Column stats: NONE
Predicate pushdown (PPD) is not supported here; it will be addressed in a separate JIRA.
I tested variant_type_shredding.q after removing 'variant.shredding.enabled'='true' from the table properties, and the qtest still passes without any failures, so it does not actually verify that shredding happens. Maybe we can add a JUnit test (e.g., TestVariantShredding) that:



What changes were proposed in this pull request?
Adds support for variant shredding, enabling Hive to write shredded variant data into Iceberg tables.
Ideally, this should follow the approach described in the reader/writer API proposal for Iceberg V4, where an execution engine provides the shredded writer schema.
As an interim solution, this PR introduces a writer that infers the shredded schema from the sample record captured before the Parquet writer is initialized.
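The inference idea can be illustrated with a minimal Python sketch. It is not the code in this PR: the function names and the dict-based schema are hypothetical, and the layout only loosely mirrors the `metadata` / `value` / `typed_value` structure of a shredded variant. Arrays and nulls are left unshredded here purely to keep the example short.

```python
from typing import Any


def infer_shredded_type(sample: Any) -> dict:
    """Map one sample value to a simplified shredded-field description.

    Hypothetical helper: every field keeps an untyped binary `value`
    fallback and, when the sample has a recognizable type, a `typed_value`.
    """
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(sample, bool):
        return {"value": "binary", "typed_value": "boolean"}
    if isinstance(sample, int):
        return {"value": "binary", "typed_value": "int64"}
    if isinstance(sample, float):
        return {"value": "binary", "typed_value": "double"}
    if isinstance(sample, str):
        return {"value": "binary", "typed_value": "string"}
    if isinstance(sample, dict):
        # Objects are shredded field by field, recursively.
        fields = {name: infer_shredded_type(v) for name, v in sample.items()}
        return {"value": "binary", "typed_value": fields}
    # Nulls, arrays, and anything else stay in the untyped fallback only
    # (a simplification made for this sketch).
    return {"value": "binary"}


def infer_shredded_schema(sample_record: Any) -> dict:
    """Build a top-level shredded schema from a single sample record."""
    return {"metadata": "binary", **infer_shredded_type(sample_record)}


if __name__ == "__main__":
    sample = {"name": "Alice", "age": 30, "tags": ["a", "b"]}
    print(infer_shredded_schema(sample))
```

A schema inferred from one record only reflects that record; later rows whose fields have a different type would have to fall back to the untyped `value` column.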
Why are the changes needed?
Enables data skipping (predicate pushdown)
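As a rough illustration of why shredding makes data skipping possible (not a description of what this PR implements), the Python sketch below evaluates a filter such as `variant_get(data, '$.age') > 25` against per-row-group min/max statistics of a shredded column. `RowGroupStats`, its fields, and the helper functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical statistics for one shredded column in one row group."""
    min_value: float
    max_value: float
    # Rows whose value did not fit the shredded type and were written to
    # the untyped fallback column; min/max say nothing about those rows.
    untyped_count: int = 0


def may_match_greater_than(stats: RowGroupStats, threshold: float) -> bool:
    """Return False only when no row in the group can satisfy col > threshold."""
    if stats.untyped_count > 0:
        # Some values are outside the typed column, so skipping is unsafe.
        return True
    return stats.max_value > threshold


def select_row_groups(groups: List[RowGroupStats], threshold: float) -> List[int]:
    """Indexes of the row groups that still have to be read."""
    return [i for i, s in enumerate(groups) if may_match_greater_than(s, threshold)]


if __name__ == "__main__":
    groups = [
        RowGroupStats(18.0, 25.0),                   # max <= 25, can be skipped
        RowGroupStats(22.0, 40.0),                   # may contain matches
        RowGroupStats(10.0, 20.0, untyped_count=2),  # must be read anyway
    ]
    print(select_row_groups(groups, 25.0))
```

Without shredding, the whole variant is an opaque binary value, so no such per-field statistics exist and every row group has to be read.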
Does this PR introduce any user-facing change?
No
How was this patch tested?
variant_type_shredding.q