fix: check leaf column is root column in Parquet schema #1347
Which issue does this PR close?
N/A
What changes are included in this PR?
This PR fixes a bug in checking whether a leaf column is a root column in the Parquet schema. The current solution compares the leaf column's index with its root column's index. This is not reliable, because leaf column indexes and root column indexes no longer line up after a nested column.
For example, take root column A, a nested column B with leaf columns B.K and B.V, and then root column C. For columns A, B.K and B.V, the current logic works well: it returns no error for A but does for B.K and B.V, which is correct. For column C, however, it returns an error because its leaf column index (3) does not match its root column index (2). This is incorrect: column C is a valid leaf column to use in a predicate.
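The index mismatch for column C can be reproduced with a small self-contained model. This is only a sketch: it does not use the parquet crate, and `Node`, `column_root`, `is_root_by_index`, `is_root_by_group_check` and `example_schema` are hypothetical names introduced for illustration. The schema layout (A, then group B with K and V, then C) is an assumption based on the column names and indexes mentioned above.

```rust
/// Hypothetical, simplified stand-in for a Parquet schema node.
/// The names are kept only for readability of the example schema.
#[allow(dead_code)]
enum Node {
    Primitive(&'static str),
    Group(&'static str, Vec<Node>),
}

impl Node {
    fn is_group(&self) -> bool {
        matches!(self, Node::Group(..))
    }

    /// Number of leaf (primitive) columns under this node.
    fn leaf_count(&self) -> usize {
        match self {
            Node::Primitive(_) => 1,
            Node::Group(_, children) => children.iter().map(Node::leaf_count).sum(),
        }
    }
}

/// Finds the root field containing the leaf at `leaf_idx`
/// (leaves are numbered in depth-first order) and returns it with its root index.
fn column_root(roots: &[Node], leaf_idx: usize) -> Option<(usize, &Node)> {
    let mut first_leaf = 0;
    for (root_idx, root) in roots.iter().enumerate() {
        let n = root.leaf_count();
        if leaf_idx < first_leaf + n {
            return Some((root_idx, root));
        }
        first_leaf += n;
    }
    None
}

/// Index-based check: treats a leaf as a root column only if the two indexes match.
fn is_root_by_index(roots: &[Node], leaf_idx: usize) -> bool {
    column_root(roots, leaf_idx).map_or(false, |(root_idx, _)| root_idx == leaf_idx)
}

/// Group-based check: treats a leaf as a root column if its root is not a group.
fn is_root_by_group_check(roots: &[Node], leaf_idx: usize) -> bool {
    column_root(roots, leaf_idx).map_or(false, |(_, root)| !root.is_group())
}

/// Assumed layout: A (leaf 0), B { K (leaf 1), V (leaf 2) }, C (leaf 3).
fn example_schema() -> Vec<Node> {
    vec![
        Node::Primitive("A"),
        Node::Group("B", vec![Node::Primitive("K"), Node::Primitive("V")]),
        Node::Primitive("C"),
    ]
}
```

With this layout, `is_root_by_index(&example_schema(), 3)` is `false` even though C sits directly on the root, while `is_root_by_group_check(&example_schema(), 3)` is `true`. The group-based check also rejects both leaves under B. In this assumed layout B.K has leaf index 1 and root index 1, so the index comparison alone would not flag it; the schema used in the PR may differ.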
I think `get_column_root().is_group()` may be a more reliable check in this situation. By checking whether a leaf column's root column is a group column, we can determine whether the leaf column is itself a root column.
Are these changes tested?
Yes