Rashampreet4114 commented Sep 17, 2025

Rationale for this change

This PR adds a documentation table of the data type mappings between PyIceberg and PyArrow.
The Iceberg spec currently defines type mappings only for Avro, Parquet, and ORC, and has no equivalent table for Arrow.
This change fills that gap by giving Python developers who work with PyIceberg and PyArrow a single reference for the conversions.

The documentation:

  • Lists all supported PyIceberg → PyArrow conversions, derived from the _ConvertToArrowSchema visitor class.
  • Provides the natural PyArrow → PyIceberg reverse mapping for round-tripping.
  • Highlights important details such as handling of field IDs, documentation metadata, decimal precision, large string/binary handling, and timestamp precision/timezones.
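
To illustrate what a forward table plus its "natural" reverse looks like, here is a minimal sketch. It is not the table from this PR: the entries are a small illustrative subset written as plain strings (Iceberg type names on one side, PyArrow factory calls on the other) so it runs without either library installed, and each entry is an assumption that should be checked against the documented table. The helper name `round_trip` is hypothetical.

```python
# Illustrative subset only. Keys are Iceberg primitive type names, values are
# the PyArrow factory calls they are assumed to convert to. Verify each entry
# against the documented mapping table before relying on it.
ICEBERG_TO_ARROW = {
    "boolean": "pa.bool_()",
    "int": "pa.int32()",
    "long": "pa.int64()",
    "float": "pa.float32()",
    "double": "pa.float64()",
    "date": "pa.date32()",
    "string": "pa.large_string()",
    "binary": "pa.large_binary()",
}

# The reverse direction starts as the inverted forward table ...
ARROW_TO_ICEBERG = {arrow: iceberg for iceberg, arrow in ICEBERG_TO_ARROW.items()}

# ... and then accepts additional Arrow types that collapse onto the same
# Iceberg type (assumed here: the non-"large" string and binary variants).
ARROW_TO_ICEBERG.update({"pa.string()": "string", "pa.binary()": "binary"})


def round_trip(iceberg_type: str) -> str:
    """Convert an Iceberg type name to Arrow and back (hypothetical helper)."""
    return ARROW_TO_ICEBERG[ICEBERG_TO_ARROW[iceberg_type]]


if __name__ == "__main__":
    for name in ICEBERG_TO_ARROW:
        print(f"{name} -> {ICEBERG_TO_ARROW[name]} -> {round_trip(name)}")
```

The point of the sketch is that the reverse table is wider than the forward one: Iceberg → Arrow → Iceberg is lossless for these entries, while Arrow → Iceberg → Arrow need not be (under the assumptions above, `pa.string()` would come back as `pa.large_string()`).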

Are these changes tested?

No

Are there any user-facing changes?

Yes.

  • New Markdown documentation: pyiceberg_pyarrow_mapping.md (with aligned tables and Python type classes).
  • No changes to runtime behavior—purely documentation.

Closes #2226

Development

Successfully merging this pull request may close these issues.

docs: add a table for data type conversion between arrow and iceberg types
