Description
I'd like to be able to specify the column encoding the parquet writer uses when creating the checkpoint files.
Currently, the writer properties are hard-coded in checkpoints.rs.
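A minimal sketch of the shape such an option could take, expressed in Python. All names below are hypothetical and illustrative only; this is not an existing delta-rs API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of the knob this issue asks for: writer properties
# passed into checkpoint creation instead of the values hard-coded in
# checkpoints.rs. None of these names exist in delta-rs today.
@dataclass
class CheckpointWriterProperties:
    use_dictionary: bool = True
    # e.g. "PLAIN" to produce checkpoint files Fabric can read
    column_encoding: Optional[str] = None
```

Defaulting to the current behavior keeps existing callers unaffected, while a caller targeting Fabric could pass `use_dictionary=False, column_encoding="PLAIN"`.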
Use Case
Microsoft Fabric currently has a limitation: it doesn't support run-length-encoded parquet files as checkpoint files. The current checkpoint files cause the SQL analytics endpoint to error when it reads a Delta Lake table created by delta-rs that includes a checkpoint.
Workaround
I currently use a post-processing step to strip the encoding from the checkpoint parquet file, like this:
```python
def parquet_file_convert_encoding_to_plain(file_path: str):
    import pyarrow.parquet as pq
    import os

    table = pq.read_table(file_path)

    # Write the table to a temporary file using the new encoding properties
    # which force the use of PLAIN encoding
    tmp_file = file_path + ".tmp"
    pq.write_table(table, tmp_file, use_dictionary=False, column_encoding="PLAIN")

    # Replace the original file with the new one
    os.replace(tmp_file, file_path)
```
Fabric happily reads the Delta Lake table once the checkpoint file has been post-processed this way.
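To apply the workaround to an existing table, the checkpoint files first need to be located under `_delta_log`. A stdlib-only sketch following the standard Delta checkpoint naming scheme (`find_checkpoint_files` is a hypothetical helper name):

```python
import glob
import os

def find_checkpoint_files(table_path: str) -> list:
    """Return parquet checkpoint paths under the table's _delta_log directory.

    Covers both single-part (N.checkpoint.parquet) and multi-part
    (N.checkpoint.M.P.parquet) checkpoint files.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    single = glob.glob(os.path.join(log_dir, "*.checkpoint.parquet"))
    multi = glob.glob(os.path.join(log_dir, "*.checkpoint.*.parquet"))
    return sorted(set(single + multi))
```

Each returned path can then be passed to `parquet_file_convert_encoding_to_plain` before Fabric reads the table.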
Related Issue(s)