Substrait consumer incorrectly parses references to outer query fields #20438

@neilconway

Describe the bug

In a Substrait plan containing correlated subqueries, references to outer query fields are parsed incorrectly: the consumer resolves them against the inner (subquery) schema only, which produces incorrect column names and types.

To Reproduce

For example, consider TPC-H Q21, which is parsed as:

        Projection: SUPPLIER.S_NAME, count(Int64(1)) AS NUMWAIT
           Limit: skip=0, fetch=100
             Sort: count(Int64(1)) DESC NULLS FIRST, SUPPLIER.S_NAME ASC NULLS LAST
               Aggregate: groupBy=[[SUPPLIER.S_NAME]], aggr=[[count(Int64(1))]]
                 Projection: SUPPLIER.S_NAME
                   Filter: SUPPLIER.S_SUPPKEY = LINEITEM.L_SUPPKEY AND ORDERS.O_ORDERKEY = LINEITEM.L_ORDERKEY AND ORDERS.O_ORDERSTATUS = Utf8("F") AND LINEITEM.L_RECEIPTDATE > LINEITEM.L_COMMITDATE AND EXISTS (<subquery>) AND NOT EXISTS (<subquery>) AND SUPPLIER.S_NATIONKEY = NATION.N_NATIONKEY AND NATION.N_NAME = Utf8("SAUDI ARABIA")
                     Subquery:
                       Filter: LINEITEM.L_ORDERKEY = LINEITEM.L_TAX AND LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS
                         TableScan: LINEITEM
                     Subquery:
                       Filter: LINEITEM.L_ORDERKEY = LINEITEM.L_TAX AND LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS AND LINEITEM.L_RECEIPTDATE > LINEITEM.L_COMMITDATE
                         TableScan: LINEITEM
                     Cross Join:
                       Cross Join:
                         Cross Join:
                           TableScan: SUPPLIER
                           TableScan: LINEITEM
                         TableScan: ORDERS
                       TableScan: NATION

Note that in each subquery, the filter contains the clause LINEITEM.L_SUPPKEY != LINEITEM.L_LINESTATUS. This is not what Q21 contains: Q21 compares the subquery's L_SUPPKEY with the outer query's L_SUPPKEY. In fact, the types of the two columns in the parsed clause (Int64 and Utf8) are not even compatible, although we don't currently reject this.
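
The wrong column names are consistent with positional field indices being looked up in the wrong schema. The following Rust sketch illustrates this. It assumes the standard TPC-H column order (the issue does not list the schemas) and truncates the outer schema to its first two tables. `outer_schema` and `inner_schema` are hypothetical helpers for illustration, not DataFusion APIs.

```rust
// Column order follows the standard TPC-H schema (an assumption; the issue
// itself does not list the table schemas).
const SUPPLIER: [&str; 7] = [
    "S_SUPPKEY", "S_NAME", "S_ADDRESS", "S_NATIONKEY", "S_PHONE", "S_ACCTBAL", "S_COMMENT",
];
const LINEITEM: [&str; 16] = [
    "L_ORDERKEY", "L_PARTKEY", "L_SUPPKEY", "L_LINENUMBER", "L_QUANTITY",
    "L_EXTENDEDPRICE", "L_DISCOUNT", "L_TAX", "L_RETURNFLAG", "L_LINESTATUS",
    "L_SHIPDATE", "L_COMMITDATE", "L_RECEIPTDATE", "L_SHIPINSTRUCT", "L_SHIPMODE",
    "L_COMMENT",
];

/// Hypothetical helper: the schema the outer Filter sees, i.e. the cross
/// join SUPPLIER x LINEITEM x ... (truncated after LINEITEM, which is
/// enough for the indices discussed here).
fn outer_schema() -> Vec<String> {
    SUPPLIER
        .iter()
        .map(|c| format!("SUPPLIER.{c}"))
        .chain(LINEITEM.iter().map(|c| format!("LINEITEM.{c}")))
        .collect()
}

/// Hypothetical helper: the schema inside the subquery, a plain LINEITEM scan.
fn inner_schema() -> Vec<String> {
    LINEITEM.iter().map(|c| format!("LINEITEM.{c}")).collect()
}
```

Under these assumptions, index 7 is `LINEITEM.L_ORDERKEY` in the outer schema but `LINEITEM.L_TAX` in the inner one, and index 9 is `LINEITEM.L_SUPPKEY` versus `LINEITEM.L_LINESTATUS`. That matches the column names that appear in the subquery filters above.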

Expected behavior

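References to outer query fields should be resolved against the outer query's schema, so that the parsed subquery filters name the columns Q21 actually uses.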
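
One way to picture the expected resolution is a stack of enclosing schemas that outer references index into. The Rust sketch below is a simplified model, not the DataFusion consumer's code: `FieldRef` and `resolve` are hypothetical names, and the `steps_out` field mirrors the counter that Substrait outer references carry, on the assumption that a value of 1 means the immediately enclosing query.

```rust
/// Hypothetical, simplified stand-in for a Substrait field reference.
enum FieldRef {
    /// Refers to the schema of the current relation's input.
    Root { index: usize },
    /// Refers to a schema `steps_out` subquery boundaries above the current one.
    Outer { index: usize, steps_out: usize },
}

/// Resolve a reference to a column name.
///
/// `current` is the schema of the relation being parsed; `outer` holds the
/// schemas of the enclosing queries, outermost first, so the immediately
/// enclosing query is the last element.
fn resolve<'a>(
    r: &FieldRef,
    current: &'a [String],
    outer: &'a [Vec<String>],
) -> Option<&'a str> {
    match *r {
        FieldRef::Root { index } => current.get(index).map(String::as_str),
        FieldRef::Outer { index, steps_out } => {
            // steps_out = 1 selects the last schema on the stack. A value of
            // 0 yields an out-of-range level and a value larger than the
            // stack depth underflows; both resolve to None.
            let level = outer.len().checked_sub(steps_out)?;
            outer.get(level)?.get(index).map(String::as_str)
        }
    }
}
```

In this model, the reported behavior corresponds to treating every `Outer` reference as if it were `Root`, which silently picks a column from the subquery's own schema instead of the enclosing one.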
