@cravani cravani commented Nov 19, 2025

Rationale for this change

When reading Parquet files with large metadata (e.g., files with thousands of columns), the default Thrift message size limit can be insufficient, causing "TTransportException: Message size exceeds limit" errors. The Thrift protocol configuration currently uses the default maximum message size (100 MB), which prevents users from reading files with exceptionally large metadata footers.

What changes are included in this PR?

Add a new configuration key: parquet.thrift.string.size.limit
Default value: 100 MB (104857600 bytes)
Allow users to override this via Configuration
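
For illustration only, the lookup described in the list above could behave roughly like the sketch below. It is not code from this PR: the class name, the helper `resolveLimit`, and the positive-value check are hypothetical, and a plain `Map` stands in for the real Configuration object so the example runs with the JDK alone. Only the key name and the default value are taken from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup described above; not code from this PR.
final class ThriftSizeLimitSketch {
    // Key and default value as stated in the PR description.
    static final String KEY = "parquet.thrift.string.size.limit";
    static final int DEFAULT_LIMIT = 104857600; // 100 MB

    // Returns the configured limit in bytes, or the default when the key is unset.
    // A plain Map stands in for the real Configuration object (assumption).
    static int resolveLimit(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LIMIT;
        }
        int limit = Integer.parseInt(raw.trim());
        // Rejecting non-positive values is an assumption of this sketch.
        if (limit <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + limit);
        }
        return limit;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(resolveLimit(conf)); // key unset: falls back to the default

        // Raise the limit to 512 MB for a file with a very large footer.
        conf.put(KEY, String.valueOf(512 * 1024 * 1024));
        System.out.println(resolveLimit(conf));
    }
}
```

With the key unset the sketch returns the 100 MB default, so existing behavior is unchanged unless a user opts in to a larger limit.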

Are these changes tested?

Yes, a test case was added: TestParquetFileReaderMaxMessageSize.java

Are there any user-facing changes?

Not by default. Users can set the config parquet.thrift.string.size.limit= to raise the limit as needed.

Closes #GH-3358

@cravani cravani force-pushed the GH-3358 branch 2 times, most recently from b70fa30 to e8de6b5 November 19, 2025 17:07
@Fokko Fokko (Contributor) left a comment

Looks reasonable to me 👍
