Column projection added along with test function #19449
FrankChen021
left a comment
| Severity | Findings |
|---|---|
| P0 | 0 |
| P1 | 0 |
| P2 | 1 |
| P3 | 0 |
| Total | 1 |
Reviewed 3 of 3 changed files.
This is an automated review by Codex GPT-5.5
    .map(Types.NestedField::name)
    .filter(columnsFilter::apply)
    .collect(Collectors.toList());
    tableScan = tableScan.select(new ArrayList<>(projectedColumns));
[P2] Projection is discarded before data is read
This selects projected columns on the Iceberg TableScan, but the method only returns task.file().location() afterward. The projected FileScanTask schema is discarded, and IcebergInputSource builds the warehouse delegate from the same raw file paths, so Druid's Parquet reader still opens the original files without the Iceberg projection. The new test also manually projects with Parquet.read(...).project(...), so it would pass even if this select had no effect. To make column projection work, the projected schema/split information needs to be carried into the reader path or pruning needs to happen in the delegate input format.
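The hand-off problem described above can be illustrated with a small, dependency-free sketch. It does not use the Iceberg or Druid APIs: `ProjectionSketch`, `PlannedTask`, `projectColumns`, `plan`, and `pathsOnly` are hypothetical names introduced only for illustration, and the file name and column names are made up. The sketch assumes a scan step that attaches the selected columns to each planned task, then shows that returning only the file paths drops that information before any reader sees it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ProjectionSketch {

    /** Hypothetical stand-in for a planned scan task: a data file plus the columns the scan selected. */
    static final class PlannedTask {
        final String path;
        final List<String> projectedColumns;

        PlannedTask(String path, List<String> projectedColumns) {
            this.path = path;
            this.projectedColumns = projectedColumns;
        }
    }

    /** Mirrors the filter step in the diff: keep only the schema columns the filter accepts. */
    static List<String> projectColumns(List<String> schemaColumns, Predicate<String> columnsFilter) {
        return schemaColumns.stream().filter(columnsFilter).collect(Collectors.toList());
    }

    /** Plans one task per file and attaches the projection to each task. */
    static List<PlannedTask> plan(List<String> files, List<String> projectedColumns) {
        List<PlannedTask> tasks = new ArrayList<>();
        for (String file : files) {
            tasks.add(new PlannedTask(file, projectedColumns));
        }
        return tasks;
    }

    /** Lossy hand-off: only the paths survive, so a downstream reader cannot see the projection. */
    static List<String> pathsOnly(List<PlannedTask> tasks) {
        return tasks.stream().map(t -> t.path).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name", "payload");
        List<String> projected = projectColumns(schema, c -> !c.equals("payload"));
        List<PlannedTask> tasks = plan(List.of("file-a.parquet"), projected);

        // What a path-only hand-off passes on: no column information at all.
        System.out.println(pathsOnly(tasks));
        // What would have to be carried along for the reader to honor the projection.
        System.out.println(tasks.get(0).projectedColumns);
    }
}
```

In this toy model, a reader that receives only the output of `pathsOnly` has no way to prune columns, which is the shape of the concern raised in the review. Carrying `projectedColumns` alongside each path, or pruning inside the reader itself, are the two directions the comment suggests.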
Fixes #19267.
Description
Adds column projection to control which columns are read from an Iceberg table. This should improve read efficiency for use cases where a scan of the whole table is not intended.
Key changed/added classes in this PR
- IcebergCatalog
- IcebergInputSource
- IcebergInputSourceTest

This PR has: