Column projection added along with test function #19449

Open
Arijit6258 wants to merge 2 commits into apache:master from Arijit6258:Column-Projection-In-Druid-Iceberg-Extension
@Arijit6258
Fixes #19267.

Description

Added column projection to control which columns are read from an Iceberg table. This should improve read efficiency for use cases where a full table scan is not intended.


Key changed/added classes in this PR
  • IcebergCatalog
  • IcebergInputSource
  • IcebergInputSourceTest

This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.

Member

@FrankChen021 FrankChen021 left a comment

| Severity | Findings |
| --- | --- |
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |

Reviewed 3 of 3 changed files.


This is an automated review by Codex GPT-5.5

.map(Types.NestedField::name)
.filter(columnsFilter::apply)
.collect(Collectors.toList());
tableScan = tableScan.select(new ArrayList<>(projectedColumns));
[P2] Projection is discarded before data is read

This selects projected columns on the Iceberg TableScan, but the method only returns task.file().location() afterward. The projected FileScanTask schema is discarded, and IcebergInputSource builds the warehouse delegate from the same raw file paths, so Druid's Parquet reader still opens the original files without the Iceberg projection. The new test also manually projects with Parquet.read(...).project(...), so it would pass even if this select had no effect. To make column projection work, the projected schema/split information needs to be carried into the reader path or pruning needs to happen in the delegate input format.
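
To make the finding concrete, here is a minimal, self-contained sketch of the pattern described above. It does not use the Druid or Iceberg APIs: `ProjectionSketch`, `PlannedFile`, `plan`, `extractPathsOnly`, and `columnsRead` are hypothetical names introduced only for illustration, and the reader fallback is an assumption about how a path-only reader would behave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-ins only; none of these types are Druid or Iceberg classes.
final class ProjectionSketch {

    /** A planned data file together with the columns the scan selected for it. */
    static final class PlannedFile {
        final String location;
        final List<String> projectedColumns;

        PlannedFile(String location, List<String> projectedColumns) {
            this.location = location;
            this.projectedColumns = projectedColumns;
        }
    }

    /** Applies the column filter to the schema and attaches the result to every planned file. */
    static List<PlannedFile> plan(List<String> files, List<String> schema, Predicate<String> columnsFilter) {
        List<String> projected = schema.stream().filter(columnsFilter).collect(Collectors.toList());
        List<PlannedFile> tasks = new ArrayList<>();
        for (String file : files) {
            tasks.add(new PlannedFile(file, projected));
        }
        return tasks;
    }

    /** The pattern flagged in the review: only the locations survive, so the projection is lost. */
    static List<String> extractPathsOnly(List<PlannedFile> tasks) {
        return tasks.stream().map(t -> t.location).collect(Collectors.toList());
    }

    /** Assumed reader behavior: with no projection handed over, fall back to the full file schema. */
    static List<String> columnsRead(List<String> fileSchema, List<String> projectionOrNull) {
        return projectionOrNull == null ? fileSchema : projectionOrNull;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name", "payload");
        List<PlannedFile> tasks = plan(List.of("a.parquet"), schema, c -> !c.equals("payload"));

        // Path-only hand-off: the reader has no projection and reads every column.
        List<String> paths = extractPathsOnly(tasks);
        System.out.println(paths + " reads " + columnsRead(schema, null));

        // Hand-off that carries the projection: the reader can skip the filtered column.
        PlannedFile first = tasks.get(0);
        System.out.println(first.location + " reads " + columnsRead(schema, first.projectedColumns));
    }
}
```

In this sketch the projection only takes effect when it travels with the file location into the reader, which corresponds to the first of the two fixes suggested above. A test that asserts on what the reader actually reads, rather than projecting by hand, would fail for the path-only variant.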

Development

Successfully merging this pull request may close these issues.

[Druid Iceberg Extension] Add column projection support to reduce I/O and improve query performance

2 participants