Description
Describe the bug
#4389 introduced truncation on column indices for binary columns, where the min/max values for a binary column may be arbitrarily large. As noted, this matches the behaviour in parquet-mr for shortening columns.
However, the value in the statistics is written un-truncated. This differs from the behaviour of parquet-mr where the statistics are truncated too: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L715
To Reproduce
There is a test in delta-io/delta-rs#1805 which demonstrates this, but in general write a parquet file with a long binary column and observe that the stats for that column are not truncated.
Expected behavior
Matching parquet-mr, the statistics should be truncated as well.
Additional context
Found this when looking into delta-io/delta-rs#1805. delta-rs uses the column stats to serialize into the delta log, which leads to very bloated entries.
I think it is sufficient to just call truncate_min_value/truncate_max_value when creating the column metadata here: https://github.com/apache/arrow-rs/blob/master/parquet/src/column/writer/mod.rs#L858-L859 but I don't know enough about the internals of arrow to know if that change is correct.