Swap out some simple PriorityQueue subclasses for one using a Comparator #14705

thecoop · 2025-05-23T16:37:40Z

Related to #11338 (comment), this replaces some simple uses of PriorityQueue subclasses with ones using a Comparator. This reduces the boilerplate required, the number of classes defined, and moves the definition of the PriorityQueue close to where it is used, so the relevant code is more self-contained and easier to understand.

thecoop · 2025-05-23T16:38:27Z

lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90CompoundFormat.java

    entries.writeVInt(numFiles);
    // first put files in ascending size order so small files fit more likely into one page
-    SizedFileQueue pq = new SizedFileQueue(numFiles);
+    List<SizedFile> files = new ArrayList<>(numFiles);


This one doesn't need to use a PriorityQueue at all

Sorting should also be faster here. The Lucene PQ is only needed when items should fall out at the bottom once the queue is full. In other cases the logarithmic overhead is larger than a simple sort at the end.
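
A minimal sketch of the "collect, then sort once" approach discussed here, not the actual Lucene90CompoundFormat code. The `SizedFile` record and the `ascendingBySize` helper are hypothetical stand-ins introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SortInsteadOfQueue {
  /** Hypothetical stand-in for a (file name, length) pair. */
  record SizedFile(String name, long length) {}

  /** Returns a copy of the input ordered by ascending size, using a single sort. */
  static List<SizedFile> ascendingBySize(List<SizedFile> files) {
    List<SizedFile> sorted = new ArrayList<>(files);
    // Every element is kept, so no bounded queue is needed: sort once at the end.
    sorted.sort(Comparator.comparingLong(SizedFile::length));
    return sorted;
  }

  public static void main(String[] args) {
    List<SizedFile> files =
        List.of(new SizedFile("big", 300L), new SizedFile("small", 10L), new SizedFile("mid", 42L));
    for (SizedFile f : ascendingBySize(files)) {
      System.out.println(f.name() + " " + f.length());
    }
  }
}
```

A bounded priority queue earns its keep when only the top k of many candidates are retained; here all files are written, so one sort is enough.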

github-actions · 2025-05-23T16:38:36Z

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you will stop receiving this reminder on future updates to the PR.

thecoop · 2025-05-23T16:47:01Z

Next stage is to run some perf tests to check this doesn't introduce a slowdown due to changes to the virtual method calls in key hotspots.

uschindler

Good idea. Why not make PQ final and pass a comparator, or maybe a simpler functional interface, directly in the ctor? This would reduce the number of subclasses further, and we would have only a single PQ implementation, which is final on top.

uschindler · 2025-05-23T17:02:10Z

> This reduces the boilerplate required, the number of classes defined,...

But it does not reduce the number of loaded classes. The lambdas are still hidden classes.

jpountz

I like it, it looks more like idiomatic Java. Can you check if OrdinalMapBenchmark reports any slowdown? This is the only benchmark that could be impacted by this change that I can think of.

It's slightly awkward that a PriorityQueue can be defined either by using a comparator or by overriding lessThan. Let's only support providing a comparator, as Uwe suggested?

dweiss

Nice. I feel tempted to make PriorityQueue final and then move the comparator to a final field. Why bother having both ways of defining it?

thecoop · 2025-06-03T11:17:28Z

This is only the first pass, covering single-line comparators. There are several non-trivial implementations that will require more detailed refactoring (if that is indeed desirable): using a Comparator requires differentiating equality, which lessThan does not require. There are also some that are part of the public API. I have a subsequent PR covering multi-line comparators, and then several to follow for non-trivial implementations.

dweiss · 2025-06-03T11:23:43Z

Good point, thanks.

dweiss · 2025-06-03T11:25:46Z

We could also declare a custom interface with just lessThan and then provide an adapter in usingComparator... this would dodge the need for implementing full equality checks across the board and you could still use jdk's Comparator utilities, where things are simple.
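
A rough sketch of what such an interface and adapter could look like. The names `LessThan`, `fromComparator`, and `min` are assumptions made for illustration and are not taken from the Lucene code base; `min` only stands in for the comparisons a heap would perform.

```java
import java.util.Comparator;
import java.util.List;

final class LessThanSketch {
  /** Hypothetical single-method ordering: a heap only needs "strictly less than". */
  @FunctionalInterface
  interface LessThan<T> {
    boolean lessThan(T a, T b);

    /** Adapter so that JDK Comparator utilities stay usable where things are simple. */
    static <E> LessThan<E> fromComparator(Comparator<? super E> cmp) {
      return (a, b) -> cmp.compare(a, b) < 0;
    }
  }

  /** Returns the least element according to lt, or null if items is empty. */
  static <T> T min(Iterable<T> items, LessThan<? super T> lt) {
    T best = null;
    boolean first = true;
    for (T item : items) {
      if (first || lt.lessThan(item, best)) {
        best = item;
        first = false;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    // Simple case: reuse a JDK comparator through the adapter.
    Comparator<String> lengthOrder = Comparator.comparingInt(String::length);
    LessThan<String> byLength = LessThan.fromComparator(lengthOrder);
    // Direct case: a lambda that never has to say what "equal" means.
    LessThan<Integer> direct = (a, b) -> a < b;

    System.out.println(min(List.of("ccc", "a", "bb"), byLength));
    System.out.println(min(List.of(3, 1, 2), direct));
  }
}
```

With this shape, an implementation answers only "is a strictly less than b", so ties need no special handling, while one-liners can still go through `Comparator` via the adapter.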

thecoop · 2025-06-03T12:55:11Z

That's a good idea - I'll look at that when I'm moving onto the more complex implementations

uschindler · 2025-06-03T13:17:26Z

We should really only have one final PQ implementation to make sure most of the code can be inlined.

uschindler · 2025-06-03T13:18:31Z

Basically the lessThan interface should be marked @FunctionalInterface and wherever possible the lessThan check should be a lambda or method ref.

thecoop · 2025-06-03T13:51:36Z

That's the desired end state, yes. There's still plenty of implementations to get through before we can make it final.

dweiss

It would be good to see what the benchmarks show after this change. These pq's are sort of crucial to performance. I wonder if we can observe any change after subclasses are replaced by delegation.

In general, I'm fine with making baby steps, although it could also be developed into a larger patch that adds this additional "smaller-than" interface and makes the PQ final. I wonder what others think.

thecoop · 2025-06-05T10:38:37Z

I wasn't able to run OrdinalMapBenchmark, but a few lucene-bench runs show no large changes - several percent +/- either way

jpountz · 2025-06-05T12:45:54Z

For reference, there is an interesting PR at #14714 that replaces PriorityQueue with a LongHeap when computing top-k hits by score. Results suggest that the PriorityQueue is not a bottleneck at all when k=100, but is a bottleneck for our fastest queries when k=1000. This PR doesn't change computing top-k hits by field, so tasks TermMonthSort, TermTitleSort, TermDayOfYearSort or TermDTSort would still use a PriorityQueue to track top hits.

jpountz · 2025-06-05T13:03:01Z

> I wasn't able to run OrdinalMapBenchmark

I ran it on my machine. Got the following results on main (2 runs):

id: 651.43124 msec
name: 955.36439 msec
country_code: 0.37753 msec
time_zone: 1.03004 msec

id: 605.82902 msec
name: 923.61305 msec
country_code: 0.30129 msec
time_zone: 0.87078 msec

And the following results on your branch (2 runs as well):

id: 641.80320 msec
name: 945.66611 msec
country_code: 0.31938 msec
time_zone: 0.89374 msec

id: 631.90609 msec
name: 970.12461 msec
country_code: 0.32877 msec
time_zone: 0.91463 msec

So it looks fine to me as far as this benchmark is concerned.

jpountz · 2025-06-05T13:05:59Z

> In general, I'm fine with making baby steps, although it could also be developed into a larger patch that adds this additional "smaller-than" interface and makes the PQ final. I wonder what others think.

My intuition was that this "smaller-than" interface idea you brought up would be relatively easy to introduce to all our PriorityQueue impls, so it would be nice if we could skip the intermediate state where there are two ways to define how heap entries are ordered. But maybe I'm underestimating the effort.

thecoop · 2025-06-05T13:25:42Z

There are some non-trivial subclasses (TopDocs.MergeSortQueue, MultiTermsEnum.TermMergeQueue, NearSpansUnordered.SpanTotalLengthEndPositionWindow), some queues that are themselves subclassed (FieldValueHitQueue, TopOrdAndNumberQueue), and some that are part of the public API (HitQueue, FieldValueHitQueue, SuggestWordQueue). This PR handles the trivial cases; I was going to create subsequent PRs to cover the more complex refactorings so they can be considered separately, especially those that change the public API.

dweiss

I'm fine with merging, given the explanation. Please add a note to changes and migration? Seems like something worth mentioning.

jpountz · 2025-06-05T13:37:39Z

I'm fine with merging too.

dweiss · 2025-06-05T15:33:38Z

Merged. Thank you, @thecoop

uschindler · 2025-06-05T15:40:59Z

Thanks! Cool first step!

jpountz · 2025-06-27T20:29:21Z

FYI it looks like this change may be responsible for the slowdown on the CountOrMany and TermTitleSort tasks: https://benchmarks.mikemccandless.com/2025.06.05.18.05.16.html.

If you're curious what classes these tasks relate to, it's BooleanScorer for CountOrMany and TermOrdValComparator for TermTitleSort, which both rely on PriorityQueues.

dweiss · 2025-06-28T05:32:43Z

When I look at it over time it doesn't seem like there is a consistent slowdown though?
https://benchmarks.mikemccandless.com/TermTitleSort.html

jpountz · 2025-06-28T06:57:52Z

It looks consistent to me? The slowdown happened between annotations IN and IO, after this change was merged. Performance then improved with IP, but that was an unrelated change. It looks like this change introduced extra overhead, and we'd get a speedup if we fixed it.

Maybe it's just about finishing the change and always requiring a comparator so that we don't chain 3 virtual calls (LessThan#lessThan, Comparator#compare, Function#apply) for every comparison like today.
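
To make the three layers concrete, here is an illustrative sketch, not the Lucene implementation. `LessThan`, `Entry`, `chained`, and `direct` are hypothetical names, and the sketch says nothing about what the JIT actually inlines; it only shows that both forms define the same ordering with a different number of indirections per comparison.

```java
import java.util.Comparator;
import java.util.function.Function;

final class ComparisonChainSketch {
  /** Hypothetical stand-in for the LessThan interface named above. */
  @FunctionalInterface
  interface LessThan<T> {
    boolean lessThan(T a, T b);
  }

  /** Hypothetical heap entry. */
  record Entry(String term, int ord) {}

  /** Three layers per comparison: LessThan#lessThan -> Comparator#compare -> Function#apply. */
  static LessThan<Entry> chained() {
    Function<Entry, Integer> key = Entry::ord;         // Function#apply (boxes the int)
    Comparator<Entry> cmp = Comparator.comparing(key); // Comparator#compare
    return (a, b) -> cmp.compare(a, b) < 0;            // LessThan#lessThan
  }

  /** One layer per comparison: the lambda reads the field and compares directly. */
  static LessThan<Entry> direct() {
    return (a, b) -> a.ord() < b.ord();
  }

  public static void main(String[] args) {
    Entry x = new Entry("x", 1);
    Entry y = new Entry("y", 2);
    // Both lines print "true": the two forms agree on the ordering.
    System.out.println(chained().lessThan(x, y) == direct().lessThan(x, y));
    System.out.println(chained().lessThan(y, x) == direct().lessThan(y, x));
  }
}
```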

dweiss · 2025-06-28T07:52:20Z

Ah, ok... Yes, could be, could be. It'll be tricky to optimize for whatever c2 is going to decide to do. I agree the Comparator abstraction, while very neat from a source code point of view, may result in different optimizing decisions at runtime. I wonder whether implementing the new lessThan interface in some places (instead of the comparator chain) would help.

uschindler · 2025-06-28T08:14:48Z

I would definitely go away from Comparator and use the LessThan interface as the semantics are a bit different. We could supply a wrapper, but where it is simple to do (and does not need the Comparator static magic) I'd prefer a native interface.

jpountz · 2025-06-28T20:22:27Z

That would work for me.

thecoop · 2025-06-30T09:04:21Z

The last set of classes needed to do that are #14817 (comment) - I've raised #14872 for discussion

thecoop added 2 commits May 14, 2025 21:27

Use a list instead of a queue

ad9c707

Swap out single-value priority queues for ones using Comparators

ad0f48a

github-project-automation bot added this to OpenSearch Lucene & Core Performance Tracking May 23, 2025

github-project-automation bot moved this to Open in OpenSearch Lucene & Core Performance Tracking May 23, 2025

github-actions bot added module:core/index module:core/search module:highlighter module:benchmark module:core/codecs module:facet module:grouping module:sandbox module:misc module:queries labels May 23, 2025

Drop stray change

f53ae10

Missed a comparator

thecoop force-pushed the priority-queue-comparators branch from 1f07b03 to f53ae10 Compare May 23, 2025 16:43

uschindler reviewed May 23, 2025

jpountz reviewed May 25, 2025

Merge remote-tracking branch 'upstream/main' into priority-queue-comparators

10939bf

dweiss reviewed Jun 3, 2025

thecoop requested a review from dweiss June 5, 2025 10:07

dweiss reviewed Jun 5, 2025

dweiss approved these changes Jun 5, 2025

Add a CHANGES entry

df82c4c

github-actions bot added this to the 11.0.0 milestone Jun 5, 2025

dweiss merged commit 9afcfdb into apache:main Jun 5, 2025
7 checks passed

github-project-automation bot moved this from Open to Merged in OpenSearch Lucene & Core Performance Tracking Jun 5, 2025

thecoop deleted the priority-queue-comparators branch June 5, 2025 15:35

thecoop mentioned this pull request Jun 6, 2025

Convert more PriorityQueues to use Comparator #14761

Merged

thecoop mentioned this pull request Jun 30, 2025

Use LessThan here rather than Comparator for some key PriorityQueues #14871

Merged
