Skip to content

fix: check leaf column is root column in Parquet schema #1347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 23, 2025

Conversation

burmecia
Copy link
Contributor

@burmecia burmecia commented May 18, 2025

Which issue does this PR close?

N/A

What changes are included in this PR?

This PR is to fix a bug in checking if a leaf column is root column in Parquet schema. The current solution is to compare the leaf column's index with root column's index, this is not reliable as the column index may not correspond exactly after a nested column. For example,

  • col A: String
  • col B: Map
    • key B.K: String
    • value B.V: String
  • col C: String

For column A, B.K and B.V, the currently logic works well. It won't return error for A, but will do for B.K and B.V, this is correct. But for column C, it will return error as its leaf column index (3) will not match root column index (2), this is incorrect. Column C is a valid leaf column to be in a predicate.

I think get_column_root().is_group() may be a more reliable way to check in this situation. Just to check if a leaf column's root column is a group column, we can determine a leaf column is actually on the root.

Are these changes tested?

Yes

Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix, thank you!

@Xuanwo Xuanwo merged commit 12485be into apache:main May 23, 2025
17 checks passed
@burmecia burmecia deleted the bo/fix/leaf-column-idx branch May 24, 2025 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants