[epic] address manifest reader feature gaps between rust and python implementations #1714

@kevinjqliu

What feature are you trying to implement?

See apache/iceberg-python#2004 for the integration, in which pyiceberg uses the Rust-based manifest reader.

Here's the error log from make integration, grouped by error type:
https://gist.github.com/kevinjqliu/db6352f0b6d0ab8a717af67a1b71355e

  • Convert raw literal (bytes) to binary type
    • "pyo3_runtime.PanicException: called Result::unwrap() on an Err value: DataInvalid => Unable to convert raw literal (bytes) fail convert to type binary for: todo: rust avro doesn't support deserialize any bytes representation now"
  • Convert raw literal (bytes) to decimal(5,2) type
    • "pyo3_runtime.PanicException: called Result::unwrap() on an Err value: DataInvalid => Unable to convert raw literal (bytes) fail convert to type decimal(5,2) for: todo: rust avro doesn't support deserialize any bytes representation now"
  • partition field with special characters in the string, e.g. special#string+field
  • partition field with uuid
  • V3 manifests
    • Fail to parse format version in manifest metadata
  • files metadata table lower_bounds
    • tests/integration/test_inspect_table.py::test_inspect_files[2] - AssertionError: Difference in column lower_bounds: {} != {2147483546: b's3://warehouse/default/table_metadata_files/data/00000-0-f5c93fd4-42af-481f-bcc0-140fad66f25a.parquet', 2147483545: b'\x00\x00\x00\x00\x00\x00\x00\x00'}
    • PR is out: feat: Include statistics for Reserved Fields #1849
  • manifest file content after merge
    • tests/integration/test_writes/test_writes.py::test_merge_manifests_file_content[2] - AssertionError: assert [(2, 78), (4,...(8, 118), ...] == [(1, 49), (2,... (6, 94), ...]
  • equality_ids can be optional (fixed by refactor: Move equality-ids closer to the spec #1705)
  • uuid support (fixed by Add UUID support for the Avro schema #1706)
  • enable zstd (fixed by feat: Enable zstd #1692)
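
For reference, the byte-conversion failures above (binary, decimal(5,2)) and the lower_bounds mismatch all involve raw bound values stored as bytes. Below is a minimal Python sketch of how such bytes could be decoded, assuming the single-value binary serialization described in the Iceberg spec (decimal as unscaled two's-complement big-endian, long as 8 bytes little-endian, string as UTF-8, binary as-is). The function names are hypothetical helpers for illustration, not the pyiceberg or iceberg-rust API.

```python
import struct
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Assumed layout: unscaled value, two's-complement, big-endian."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


def decode_long(raw: bytes) -> int:
    """Assumed layout: 8 bytes, little-endian, signed."""
    return struct.unpack("<q", raw)[0]


def decode_string(raw: bytes) -> str:
    """Assumed layout: UTF-8 bytes without a length prefix."""
    return raw.decode("utf-8")


def decode_binary(raw: bytes) -> bytes:
    """Binary values are returned unchanged."""
    return raw


# Hypothetical sample values, not taken from the error log.
sample_decimal = decode_decimal(b"\x30\x39", scale=2)   # unscaled 12345
sample_negative = decode_decimal(b"\xcf\xc7", scale=2)  # unscaled -12345
# Eight zero bytes, as in the lower_bounds diff, read as a long.
sample_long = decode_long(b"\x00" * 8)
```

Under these assumptions, the eight zero bytes reported for field 2147483545 in the lower_bounds diff would read as the long value 0, provided that field is a long column.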

Willingness to contribute

None
