GH-3358: Add Configurable Thrift Max Message Size for Parquet Metadata Reading #3359
Rationale for this change
When reading Parquet files with large metadata (e.g., files with thousands of columns), the default Thrift message size limit can be insufficient, causing TTransportException: Message size exceeds limit errors. The Thrift protocol configuration currently uses the default max message size (100 MB), which prevents users from reading files with exceptionally large metadata footers.
What changes are included in this PR?
Add a new configuration key: parquet.thrift.string.size.limit
Default value: 100 MB (104857600 bytes)
Allow users to override this via Configuration
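The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: a plain Map stands in for the Hadoop Configuration, and the class name ThriftLimitSketch, the method resolveLimit, and the rejection of non-positive values are all assumptions made for the example. Only the key name and the default value come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the Hadoop Configuration.
class ThriftLimitSketch {
    // Key name and default value as stated in the PR description.
    static final String KEY = "parquet.thrift.string.size.limit";
    static final int DEFAULT_LIMIT = 100 * 1024 * 1024; // 104857600 bytes

    // Returns the configured limit, or the default when the key is unset or blank.
    static int resolveLimit(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_LIMIT;
        }
        int limit = Integer.parseInt(raw.trim());
        // Assumption for this sketch: a non-positive limit is treated as invalid.
        if (limit <= 0) {
            throw new IllegalArgumentException(KEY + " must be positive, got " + limit);
        }
        return limit;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No override set: the default applies.
        System.out.println(resolveLimit(conf));
        // Override with a larger limit (256 MB) for files with very large footers.
        conf.put(KEY, String.valueOf(256 * 1024 * 1024));
        System.out.println(resolveLimit(conf));
    }
}
```

With a real Hadoop Configuration the same pattern would typically be a single getInt call with the key and the default, after which the resolved value is handed to the Thrift protocol setup.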
Are these changes tested?
Yes. A test case, TestParquetFileReaderMaxMessageSize.java, was added.
Are there any user-facing changes?
Not by default. Users can set parquet.thrift.string.size.limit in the Configuration to raise the limit as needed.
Closes #GH-3358