tbl.append(df): schema validation of tbl & df compares column order as well as data types #1088

### Apache Iceberg version

0.6.1

### Please describe the bug 🐞

While writing a dataframe to Iceberg through `tbl.append(df)`, the table schema is validated against the dataframe schema.

The validation is done by this call inside `append`: `_check_schema_compatible(self.schema(), other_schema=df.schema)`.

Here the table schema and the dataframe schema are both converted to a pyarrow schema of struct type and then compared. The comparison takes the order of the dataframe columns into account, along with the data types.
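To illustrate the difference, here is a minimal sketch in plain Python. It is not pyiceberg's actual code: the `(name, type)` tuples and both helper functions are hypothetical stand-ins for schema fields and checks. It contrasts a positional comparison with a name-based one:

```python
# Hypothetical stand-ins for schema fields: (column name, type name).
table_fields = [
    ("a", "timestamptz"),
    ("b", "timestamptz"),
    ("x", "string"),
    ("y", "string"),
]
# Same columns and types as the table, but in a different order.
df_fields = [
    ("x", "string"),
    ("a", "timestamptz"),
    ("y", "string"),
    ("b", "timestamptz"),
]


def same_schema_positional(left, right):
    """Order-sensitive check: fields must match position by position."""
    return list(left) == list(right)


def same_schema_by_name(left, right):
    """Order-insensitive check: fields are matched by column name.

    Assumes column names are unique within each schema.
    """
    return dict(left) == dict(right)


positional_ok = same_schema_positional(table_fields, df_fields)
by_name_ok = same_schema_by_name(table_fields, df_fields)
print(positional_ok, by_name_ok)
```

With the positional check the reordered dataframe is rejected, while the name-based check accepts it.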

This results in the following error:
```
Traceback (most recent call last):
  File "/Users/apple/Projects/bright/brightmoney_collections_system/utils/index.py", line 172, in <module>
    dff = write_to_iceberg(
  File "/Users/apple/Projects/bright/brightmoney_collections_system/utils/index.py", line 163, in write_to_iceberg
    table.append(pyarrow_df)
  File "/Users/apple/Projects/bright/brightmoney_collections_system/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 1057, in append
    _check_schema_compatible(self.schema(), other_schema=df.schema)
  File "/Users/apple/Projects/bright/brightmoney_collections_system/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 175, in _check_schema_compatible
    raise ValueError(f"Mismatch in fields:\n{console.export_text()}")
ValueError: Mismatch in fields:
┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Table field                             ┃ Dataframe field                         ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ ✅ │ 1: a: optional timestamptz     │ 1: a: optional timestamptz     │
│ ✅ │ 2: b: optional timestamptz    │ 2: b: optional timestamptz    │
│ ✅ │ 3: x: optional string          │ 3: x: optional string          │
│ ✅ │ 4: y: optional string     │ 4: y: optional string     │
└────┴─────────────────────────────────────────┴─────────────────────────────────────────┘
```

Yet the output shows no mismatch between the table fields and the dataframe fields.

Ideally, shouldn't the schema compatibility check ignore the order in which the dataframe columns are sent?





