@mgree mgree commented May 12, 2023

#18666 introduced an MIR typechecker. This PR is meant to address one of its imprecisions.

Join equivalences use raw Datum equality for inner joins, which equates NULLs. When we compile a join from SQL we are careful to avoid hitting this case, but it's not 100% clear that we avoid it in every case.
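To make the distinction concrete, here is a minimal sketch contrasting raw equality with SQL `=` on nullable values. It uses `Option<i64>` as a hypothetical stand-in for a nullable value; `Value`, `raw_eq`, `sql_eq`, and `sql_join_matches` are illustrative names and not part of the Materialize codebase.

```rust
// Hypothetical stand-in for a nullable SQL value; this is NOT Materialize's `Datum`.
type Value = Option<i64>;

/// Raw (structural) equality: NULL compares equal to NULL.
fn raw_eq(a: Value, b: Value) -> bool {
    a == b
}

/// SQL `=`: the result is NULL (modeled as `None`) if either side is NULL.
fn sql_eq(a: Value, b: Value) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// A SQL join condition keeps a pair of rows only when the predicate is true,
/// so a NULL result is treated the same as false.
fn sql_join_matches(a: Value, b: Value) -> bool {
    sql_eq(a, b) == Some(true)
}
```

Under these assumptions, `raw_eq(None, None)` is `true` while `sql_join_matches(None, None)` is `false`. The two notions of equality only disagree when both sides are NULL, which is why an equivalence class made up entirely of nullable fields is the case worth flagging.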

The typechecker emits debug-level traces when every field in a join equivalence class is nullable. But a number of queries, including some that run as part of the startup sequence, trigger these traces.
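The shape of such a check can be sketched roughly as follows. This is a simplified illustration and not the typechecker's actual code: `ColumnType` and `all_nullable_classes` are hypothetical, and equivalence classes are modeled as lists of column indices rather than scalar expressions.

```rust
// Hypothetical, simplified column type; only nullability matters for this sketch.
#[derive(Clone, Copy, Debug)]
struct ColumnType {
    nullable: bool,
}

/// Returns the positions of the equivalence classes in which every member
/// column is nullable. Each class is a list of indices into `columns`.
fn all_nullable_classes(columns: &[ColumnType], equivalences: &[Vec<usize>]) -> Vec<usize> {
    equivalences
        .iter()
        .enumerate()
        // Skip empty classes so that a vacuous `all` does not flag them.
        .filter(|(_, class)| !class.is_empty() && class.iter().all(|&c| columns[c].nullable))
        .map(|(i, _)| i)
        .collect()
}
```

A class with at least one non-nullable member is not flagged, since raw equality there could never match a NULL on that member. Only classes whose members could all be NULL at once are reported.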

Motivation

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • This PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way) and therefore is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

mgree commented May 12, 2023

The following is a fairly short example of a query that leads to debug traces.

create table idxs     (id int not null,       on_id int);
create table idx_cols (index_id int not null, on_position text not null);
create table obj_cols (id int,       position text not null);

insert into idxs values (47, null), (48, 1);
insert into idx_cols values (47, 'hi'), (47, 'bye'), (47, 'hello'), (47, 'goodbye');
insert into obj_cols values (null, 'hi'), (null, 'bye'), (1, 'hello'), (1, 'goodbye');

SELECT
    idxs.id
FROM
    idxs
    JOIN idx_cols ON idxs.id = idx_cols.index_id
    LEFT JOIN obj_cols ON
        idxs.on_id = obj_cols.id AND idx_cols.on_position = obj_cols.position
GROUP BY idxs.id;

@aalexandrov, @ggevay, and I suspected that adding MirRelationExpr::rejected_nulls would make the typechecker precise enough to figure things out, but either the code is buggy or it's not enough.

UPDATE: running EXPLAIN on the query produces no traces, but running the query itself does.

@ggevay ggevay added A-optimization Area: query optimization and transformation A-compute Area: compute labels May 16, 2023