Skip to content

feat: Add ProgressiveEval operator #10490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Conversation

NGA-TRAN
Copy link
Contributor

@NGA-TRAN NGA-TRAN commented May 13, 2024

Which issue does this PR close?

Closes #10488

Rationale for this change

In InfluxDB IOx, when the inputs of SortPreservingMerge are all sorted on the sort key and their data do not overlap, we replace SortPreservingMerge with ProgressiveEval which:

  1. Avoids starting all input streams at once
  2. Avoids having to compare any keys (doesn't actually do a merge)

We wrote about using this operator here: https://www.influxdata.com/blog/making-recent-value-queries-hundreds-times-faster/

What changes are included in this PR?

Adding a new ProgressiveEvalExec

Are these changes tested?

Yes, unit tests

Are there any user-facing changes?

Not yet because it is not hooked in any plans yet

@NGA-TRAN
Copy link
Contributor Author

@alamb : The PR is ready for review

@alamb alamb changed the title feat: porting ProgressiveEval from InfluxDB IOx feat: Add ProgressiveEval operator May 13, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @NGA-TRAN

What is the plan for actually using this code in DataFusion?

I am hesitant to put code in DataFusion that is not useful to other users of DataFusion as that basically adds maintenance burden to the community for no benefit.

We should at least have a plan / POC / description of how we are going to use this operator in plans.

I think perhaps we can use the code added by @suremarc in thttps://github.com//issues/7490 to determine conditions when the optimizer can use ProgressiveEval instead of SortPreservingMerge

@NGA-TRAN
Copy link
Contributor Author

What is the plan for actually using this code in DataFusion?

I am hesitant to put code in DataFusion that is not useful to other users of DataFusion as that basically adds maintenance burden to the community for no benefit.

We should at least have a plan / POC / description of how we are going to use this operator in plans.

I think perhaps we can use the code added by @suremarc in thttps://github.com/#7490 to determine conditions when the optimizer can use ProgressiveEval instead of SortPreservingMerge

@alamb : My first step after this is merged in to use it in IOx. So it will be used but I understand it won't be visible to DF.

Then I am happy to look into #7490 to see if there is a quick change to use ProgressiveEval instead of SortPreservingMerge

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label May 13, 2024
@NGA-TRAN
Copy link
Contributor Author

@alamb I have just looked at the code how IOx use ProgressiveEval and we can make it general. So next steps will be:

  1. After this is merged, I will upgrade it to IOx and use ProgressiveEval in IOx
  2. Port what I do in (1) to DF

This means the optimization will be fully in DF, not in IOx at all

@alamb
Copy link
Contributor

alamb commented May 14, 2024

closing/reopening to retrigger CI checks that have gotten stuck for some reason

@alamb alamb closed this May 14, 2024
@alamb alamb reopened this May 14, 2024
@NGA-TRAN
Copy link
Contributor Author

@alamb and I have chatted and even though the ProgressesiveEval operator in this PR is fully implemented and tested, it is not integrated into any query plan yet. The usage of this operator in InfluxDB IOx is very specific to Influx. To make it more general to DF, it will need more investigation. Thus we decided not merge this PR until someone is interested in intergrating it in DF query plans

@alamb
Copy link
Contributor

alamb commented May 15, 2024

Follow on / next steps here: #10316 (comment)

Marking this PR as draft until someone is ready to hook it in to the optimzier

@alamb alamb marked this pull request as draft May 15, 2024 19:23
Copy link

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sqllogictest SQL Logic Tests (.slt) Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ProgressiveEval operator for optimize SortPreservingMerge
2 participants