feat(datafusion): implement the project node to add the partition columns #1602

fvaleye · 2025-08-13T13:16:45Z

Which issue does this PR close?

Closes #1542 (Implement Project Node: Calculate partition value)

What changes are included in this PR?

Implement a physical execution plan node that projects Iceberg partition columns from source data, supporting nested fields and all Iceberg transforms.
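As a rough illustration of what projecting a partition column means for each row, here is a minimal std-only Rust sketch. `Transform` and `project_partition_column` are hypothetical names, and a plain `Vec<Option<i64>>` stands in for an Arrow array. Only `identity`, `truncate[W]` (for long values) and `void` are modeled. The truncate rule follows the Iceberg spec for int/long, `v - (((v % W) + W) % W)`, which `rem_euclid` expresses directly for a positive width.

```rust
/// Hypothetical subset of Iceberg partition transforms over long (i64) values.
/// Bucket, year/month/day/hour and non-long sources are deliberately omitted.
#[derive(Debug, Clone, Copy)]
enum Transform {
    /// The partition value is the source value.
    Identity,
    /// Round down to a multiple of the (positive) width W.
    Truncate(i64),
    /// Always produces null.
    Void,
}

impl Transform {
    fn apply(&self, value: Option<i64>) -> Option<i64> {
        match (self, value) {
            // A null source value stays null under every transform.
            (_, None) => None,
            (Transform::Identity, Some(v)) => Some(v),
            // Equivalent to the spec's v - (((v % W) + W) % W) when W > 0.
            (Transform::Truncate(w), Some(v)) => Some(v - v.rem_euclid(*w)),
            (Transform::Void, Some(_)) => None,
        }
    }
}

/// Compute one partition-value column from one source column, row by row.
fn project_partition_column(source: &[Option<i64>], transform: Transform) -> Vec<Option<i64>> {
    source.iter().map(|v| transform.apply(*v)).collect()
}

fn main() {
    let source = vec![Some(1), Some(-1), None, Some(25)];
    for transform in [Transform::Identity, Transform::Truncate(10), Transform::Void] {
        println!("{:?}: {:?}", transform, project_partition_column(&source, transform));
    }
}
```

Keeping the per-value transform separate from the column loop puts null handling in one place; a real execution node would operate on Arrow arrays rather than `Vec`s.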

Are these changes tested?

Yes, with unit tests


CTTY · 2025-08-21T01:20:29Z

crates/integrations/datafusion/src/physical_plan/project.rs

+            let field_path = Self::find_field_path(&self.table_schema, source_field.id)?;
+            let index_path = Self::resolve_arrow_index_path(batch_schema.as_ref(), &field_path)?;
+
+            let source_column = Self::extract_column_by_index_path(batch, &index_path)?;


This looks very interesting! I actually came across a similar issue when implementing the sort node, and I was leaning toward implementing a new SchemaWithPartnerVisitor, wdyt?
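The three-step lookup in the diff above (field ID to name path, name path to Arrow index path, index path to column) can be sketched without Arrow. This is a hypothetical, std-only illustration of the last two steps: `ArrowField`, `Column`, `leaf`, `resolve_index_path` and `extract_by_index_path` are toy stand-ins, not the PR's code or the `arrow` crate's API. A schema-with-partner visitor would, roughly speaking, generalize this by walking the schema and the batch in lockstep instead of materializing an index path.

```rust
/// Toy stand-in for an Arrow field: a name plus children (non-empty for structs).
struct ArrowField {
    name: String,
    children: Vec<ArrowField>,
}

/// Hypothetical helper to build a leaf (non-struct) field.
fn leaf(name: &str) -> ArrowField {
    ArrowField { name: name.to_string(), children: Vec::new() }
}

/// Toy stand-in for an Arrow column; struct children follow schema order.
#[derive(Debug, PartialEq)]
enum Column {
    Leaf(Vec<Option<i64>>),
    Struct(Vec<Column>),
}

/// Turn a name path such as ["address", "zip"] into positional indices by
/// looking each name up among the fields of the current nesting level.
fn resolve_index_path(fields: &[ArrowField], name_path: &[&str]) -> Result<Vec<usize>, String> {
    let mut indices = Vec::with_capacity(name_path.len());
    let mut current = fields;
    for name in name_path {
        let idx = current
            .iter()
            .position(|f| f.name == *name)
            .ok_or_else(|| format!("field '{name}' not found"))?;
        indices.push(idx);
        current = current[idx].children.as_slice();
    }
    Ok(indices)
}

/// Follow an index path through the batch's (possibly nested) columns.
fn extract_by_index_path<'a>(columns: &'a [Column], index_path: &[usize]) -> Result<&'a Column, String> {
    let (first, rest) = index_path.split_first().ok_or("empty index path")?;
    let mut column = columns.get(*first).ok_or("index out of range")?;
    for idx in rest {
        column = match column {
            Column::Struct(children) => children.get(*idx).ok_or("index out of range")?,
            Column::Leaf(_) => return Err("path descends into a non-struct column".to_string()),
        };
    }
    Ok(column)
}

fn main() {
    // Schema: id: long, address: struct<zip: long, unit: long>
    let schema = vec![
        leaf("id"),
        ArrowField { name: "address".to_string(), children: vec![leaf("zip"), leaf("unit")] },
    ];
    let batch = vec![
        Column::Leaf(vec![Some(1), Some(2)]),
        Column::Struct(vec![
            Column::Leaf(vec![Some(75001), Some(10115)]),
            Column::Leaf(vec![Some(4), None]),
        ]),
    ];
    let index_path = resolve_index_path(&schema, &["address", "zip"]).expect("path resolves");
    let column = extract_by_index_path(&batch, &index_path).expect("column exists");
    println!("{index_path:?} -> {column:?}");
}
```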

Perfect 👌
I initially thought this was only needed for this implementation, but the right place seems to be closer to the Schema definition. Since this is a standard way of accessing column values by index, it makes sense to generalize it!

I drafted a PartitionValueVisitor here to help extract partition values from a record batch in a tree-traversal style.

Please let me know what you think!

I just saw this implementation for extracting partition values, and leveraging the existing RecordBatchProjector actually makes more sense to me: #1040

Good, thanks for sharing. I will use #1040 once it is merged!

fvaleye · 2025-08-21T13:42:50Z

crates/integrations/datafusion/src/physical_plan/project.rs

+    }
+
+    /// Find the path to a field by its ID (e.g., ["address", "city"]) in the Iceberg schema
+    fn find_field_path(table_schema: &Schema, field_id: i32) -> DFResult<Vec<String>> {


We might need to revisit this function as well, @CTTY, following our discussion here.
This may not be the right place for it at the moment.
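For reference, a depth-first search of the kind `find_field_path` describes (field ID to a name path such as `["address", "city"]`) can be sketched as follows. This is a minimal sketch under stated assumptions: `NestedField` is a toy stand-in for an Iceberg schema field, only struct nesting is handled (list and map fields are ignored), and the `Option`-returning signature is hypothetical rather than the PR's `DFResult` one.

```rust
/// Toy stand-in for an Iceberg nested field; IDs are unique within a schema.
struct NestedField {
    id: i32,
    name: String,
    /// Children of a struct field; empty for primitive fields.
    fields: Vec<NestedField>,
}

/// Hypothetical helper: depth-first search for `field_id`, returning the names
/// from the top-level field down to the match, e.g. ["address", "city"].
fn find_field_path(fields: &[NestedField], field_id: i32) -> Option<Vec<String>> {
    for field in fields {
        if field.id == field_id {
            return Some(vec![field.name.clone()]);
        }
        // Search this field's children; on a hit, prefix this field's name.
        if let Some(mut path) = find_field_path(&field.fields, field_id) {
            path.insert(0, field.name.clone());
            return Some(path);
        }
    }
    None
}

fn main() {
    let schema = vec![
        NestedField { id: 1, name: "id".to_string(), fields: vec![] },
        NestedField {
            id: 2,
            name: "address".to_string(),
            fields: vec![
                NestedField { id: 3, name: "city".to_string(), fields: vec![] },
                NestedField { id: 4, name: "zip".to_string(), fields: vec![] },
            ],
        },
    ];
    println!("{:?}", find_field_path(&schema, 3));
}
```

Because field IDs are unique within an Iceberg schema, one option is to compute each path once when the plan is built and reuse it for every batch, instead of re-walking the schema per batch.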


fvaleye force-pushed the feature/implement-project-node-for-insert-into-datafusion branch from b3a8601 to 40a225a August 13, 2025 13:17

feat(datafusion): implement the project node to add the partition columns defined in Iceberg. Implement physical execution plan node that projects Iceberg partition columns from source data, supporting nested fields and all Iceberg transforms.

4d59f87

fvaleye force-pushed the feature/implement-project-node-for-insert-into-datafusion branch from 40a225a to 4d59f87 August 13, 2025 14:50

Merge branch 'main' into feature/implement-project-node-for-insert-into-datafusion

d930df9

CTTY reviewed Aug 21, 2025

fvaleye commented Aug 21, 2025

CTTY reviewed Aug 22, 2025

crates/integrations/datafusion/src/physical_plan/project.rs (outdated, resolved)

feat(datafusion): adapt IcebergProjectExec to use one partition column containing all the partition values

803199a
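The last commit packs all partition values into a single column. Assuming that column is a struct with one child per partition field (an assumption; the commit title only says "one partition column"), the idea can be sketched with toy types. `Column`, `PartitionField` and `partition_struct_column` are hypothetical names, and plain function pointers stand in for Iceberg transforms.

```rust
/// Toy column type: a leaf of nullable longs, or a struct of named children.
#[derive(Debug, PartialEq)]
enum Column {
    Leaf(Vec<Option<i64>>),
    Struct(Vec<(String, Column)>),
}

/// Hypothetical partition field: output name, index of the source column in
/// the batch, and a per-value transform (a plain function pointer here).
struct PartitionField {
    name: String,
    source_index: usize,
    transform: fn(i64) -> i64,
}

/// Build one struct column whose children hold the values of every partition
/// field, computed row by row from the source columns. Nulls stay null.
fn partition_struct_column(batch: &[Column], spec: &[PartitionField]) -> Result<Column, String> {
    let mut children = Vec::with_capacity(spec.len());
    for field in spec {
        let values = match batch.get(field.source_index) {
            Some(Column::Leaf(values)) => values,
            _ => return Err(format!("source of '{}' is not a leaf column", field.name)),
        };
        let projected: Vec<Option<i64>> =
            values.iter().copied().map(|v| v.map(field.transform)).collect();
        children.push((field.name.clone(), Column::Leaf(projected)));
    }
    Ok(Column::Struct(children))
}

fn main() {
    // Source batch: two long columns.
    let batch = vec![
        Column::Leaf(vec![Some(7), Some(23)]),
        Column::Leaf(vec![Some(1), None]),
    ];
    let spec = [
        PartitionField { name: "a_trunc10".into(), source_index: 0, transform: |v: i64| v - v.rem_euclid(10) },
        PartitionField { name: "b".into(), source_index: 1, transform: |v: i64| v },
    ];
    match partition_struct_column(&batch, &spec) {
        Ok(partition) => {
            // The output keeps the source columns and appends exactly one struct column.
            let mut output = batch;
            output.push(partition);
            println!("{} columns: {:?}", output.len(), output);
        }
        Err(e) => eprintln!("{e}"),
    }
}
```

With a single struct column, the number of output columns no longer depends on the partition spec, and a downstream operator could locate all partition values through one column.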
