Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling reduce_then_scan for "Set" family of scan APIs #1879

Merged
merged 15 commits into from
Oct 28, 2024

Conversation

danhoeflinger
Copy link
Contributor

@danhoeflinger danhoeflinger commented Sep 30, 2024

Enabling reduce then scan algorithm to be used for set family APIs:
set_intersection
set_union
set_difference
set_symmetric_difference

Adds high-level helpers required to convert set family to use reduce_then_scan on GPU with size 32 subgroups.

Also makes set consistent with the other scan algorithms to select algorithm in the __parallel_* level of function call.

This provides significant performance improvements, especially at small sizes of n. At larger sizes of n, this is an improvement over the existing algorithm, but not by a lot.

There is still significant room for improvement here, this algorithm is very inefficient.


Three future things to try (from easiest to hardest) to improve the set family within the reduce then scan alg:

  1. We could explore our alternative __pstl_lower_bound, __shars_lower_bound to see if it provides benefit.
  2. We could propagate a "hint" minimum search index between blocks (this provides a lower bound and better "blocks" the second set, in addition to the blocking we do inherently on the first set)
  3. We could propagate a similar "hint" minimum search index subgroup from one iteration to the next. (This prevents redundant computation for consecutive iterations for a subgroup. Since the sets are ordered, subsequent searches will never go backward as long as the element we are searching's index in the first increases, which it does).

Right now, we are wasting lots of memory accesses and comparisons on sections of the second set which should be able to ruled out from knowledge we have access to at the time we are making the search (in the "Reduce" predicate). We also lose the "blocking" cache advantage of reduce_then_scan, because the second set always searches from the start of the list.

Copy link
Contributor

@adamfidel adamfidel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass of comments. I think I generally understand the algorithm and the logic seems good to me.

danhoeflinger and others added 12 commits October 14, 2024 13:47
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
wording and constexpr

Co-authored-by: Adam Fidel <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
It is only initialized with a constexpr, later updated at runtime

Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger danhoeflinger force-pushed the dev/shared/reduce_then_scan_set_family branch from 01ee3a9 to adc6cc5 Compare October 14, 2024 18:13
Signed-off-by: Dan Hoeflinger <[email protected]>
@danhoeflinger
Copy link
Contributor Author

I realize I never expanded the set of reviewers from a small group, though I know we agreed not to attempt to spend too much effort on perfecting this approach (for performance) before merging. I've now added a few more for visibility.

Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small styling nitpicks. I am ready to approve after.

Signed-off-by: Dan Hoeflinger <[email protected]>
Copy link
Contributor

@mmichel11 mmichel11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@danhoeflinger danhoeflinger merged commit 28cb633 into main Oct 28, 2024
22 checks passed
@danhoeflinger danhoeflinger deleted the dev/shared/reduce_then_scan_set_family branch October 28, 2024 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants