This repository was archived by the owner on Jan 7, 2025. It is now read-only.
Tracking: parity with Postgres for TPC-H cardinality estimations #127
Open
Description
Notes
- Sometimes Postgres does really badly (even worse than our magic numbers!). However, the goal right now is simply to match Postgres, not to match the truecard. When I say "fix", I mean "match Postgres".
- This is because we know exactly what we need to do to match Postgres but we don't know what we need to do to match the truecard.
- Experiments were run with scale factor 1.0, seed 15721
- If no other PR is mentioned, then the query was run based on feat: caching optd stats, 12x speedup on TPC-H SF1 #132
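The matching criterion in these notes can be made concrete with a small sketch. This is not code from optd: the `CardResult` type, its `matches_postgres` method, and the `unfixed` helper are hypothetical names chosen for illustration. The sample values are copied from the Queries list in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardResult:
    """Cardinalities reported for one query (hypothetical helper type)."""

    query: str
    truecard: int  # actual number of output rows
    pgcard: int    # Postgres estimate
    dfcard: int    # optd estimate

    def matches_postgres(self) -> bool:
        # "Fixed" in this issue means dfcard equals pgcard;
        # truecard is deliberately not part of the check.
        return self.dfcard == self.pgcard


# Values copied from the Queries list in this issue.
RESULTS = [
    CardResult("Q5", truecard=5, pgcard=25, dfcard=25),
    CardResult("Q7", truecard=4, pgcard=6119, dfcard=125000),
    CardResult("Q8", truecard=2, pgcard=2406, dfcard=200),
    CardResult("Q12", truecard=2, pgcard=7, dfcard=7),
]


def unfixed(results):
    """Return the names of queries whose optd estimate differs from Postgres."""
    return [r.query for r in results if not r.matches_postgres()]


if __name__ == "__main__":
    print(unfixed(RESULTS))
```

Note that Q5 and Q12 count as fixed under this criterion even though both estimates differ from the truecard.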
Queries
- Q1
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q2 (already matching in feat: caching optd stats, 12x speedup on TPC-H SF1 #132)
- Q3: truecard=10, pgcard=10, dfcard=10
- Fixed by fix: limit row cnt #138
- Q4
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q5: truecard=5, pgcard=25, dfcard=25
- Q6: truecard=1, pgcard=1, dfcard=1
- feat: using proper magic numbers in various edge cases #143 revealed the problem here
- Fixed by feat: add cost estimation for agg #144
- Q7: truecard=4, pgcard=6119, dfcard=125000
- Fixing join predicates and fixing multi-dim group by would definitely help with this, but it's not clear whether it would completely fix it.
- feat: join selectivity #145 changed dfcard from 1 to 125000
- Q8: truecard=2, pgcard=2406, dfcard=200
- Fixing single-dim group by and pulling expressions up to the group by should fix this. Postgres identifies that the group by is done on `EXTRACT(year FROM orders.o_orderdate)` and it simply uses the N-Distinct of `orders.o_orderdate` as the cardinality of the query.
- Q9: truecard=175, pgcard=60150, dfcard=5000
- Fixing single-dim group by and pulling expressions up to the group by should fix this. When you get rid of `p_name like '%forest'` and just use `o_orderdate as o_year`, you get exactly 60150 rows.
- feat: join selectivity #145 changed dfcard from 25 to 5000
- Q10: truecard=20, pgcard=20, dfcard=20
- Q11: truecard=869, pgcard=10667, dfcard=67936
- feat: join selectivity #145 changed dfcard from 1 to 67936
- I'm not sure how Postgres gets to 10667.
- Q12: truecard=2, pgcard=7, dfcard=7
- Q13: truecard=42, pgcard=200, dfcard=200
- Fixed by feat: join selectivity #145
- Q14: truecard=1, pgcard=1, dfcard=1
- Making aggregates give rows=1 should fix this. It's just an aggregate.
- Fixed by feat: add cost estimation for agg #144
- Q15
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q16
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q17: truecard=1, pgcard=1, dfcard=1
- Making aggregates give rows=1 should fix this. It's just an aggregate.
- Fixed by feat: add cost estimation for agg #144
- Q18
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q19: truecard=1, pgcard=1, dfcard=1
- Q20
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q21
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
- Q22
- Not running. See Tracking: make sure optd does not crash for TPC-H queries #68
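Several of the fixes in the list above come down to simple output-row rules. The sketch below restates three of them under stated assumptions. It is not the actual code of optd or Postgres, and all function names are hypothetical. The clamp to the input row count in `group_by_rows` is an assumption added here, not something stated in this issue, and reading PR #138 as a "cap at the LIMIT value" rule is inferred only from its title.

```python
def scalar_agg_rows() -> int:
    """An aggregate with no GROUP BY yields exactly one row (the rows=1 rule)."""
    return 1


def limit_rows(child_rows: int, limit: int) -> int:
    """A LIMIT outputs at most `limit` rows and at most what its child produces."""
    return min(child_rows, limit)


def group_by_rows(input_rows: int, ndistinct: int) -> int:
    """Single-column GROUP BY: use the N-Distinct of the underlying column.

    For a grouping expression such as EXTRACT(year FROM o_orderdate), this
    uses the N-Distinct of o_orderdate itself, as described for Q8.
    Assumption: the result is clamped to [1, input_rows].
    """
    return max(1, min(input_rows, ndistinct))


if __name__ == "__main__":
    # 2406 is the pgcard reported for Q8 in this issue; the input row count
    # is a made-up placeholder that is larger than the N-Distinct.
    print(group_by_rows(input_rows=1_000_000, ndistinct=2406))
    print(limit_rows(child_rows=1_000, limit=10))
    print(scalar_agg_rows())
```

Keeping each rule as a separate pure function makes it easy to compare one estimate at a time against the pgcard values listed above.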