-
Notifications
You must be signed in to change notification settings - Fork 815
Yet another labels-to-Child lookup optimization #460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
54e2e97
to
8ab5802
Compare
Thanks for the PR. I think a custom hash table implementation is going a bit far, I'd prefer not to have to maintain that. Could improvements be made to the JVM's implementations that everyone would benefit from? |
@brian-brazil conceptually the implementation is really quite simple, just happens to be tailored for the specific requirements here including not having to create a new object just for the purpose of hash/equals comparison, and not caring about modification performance relative to lookups. The java SDK includes various data structure implementations (such as For this reason there's a wealth of third-party utility libraries which fill in gaps and cater to more specific use cases for example the excellent eclipse collections and JCTools libraries. Impl-wise what's done here is kind of a combination of the JDK's Given some of the reasons metrics are collected in the first place, I expect it's commonly done in performance sensitive contexts. Hence imho a first-order goal of a client library such as this should be minimal runtime overhead. This change eliminates all garbage on individual events and improves throughput by 2-3x. Maintenance-wise I really don't feel it would be much of an issue given the simplicity of the data structure and the fact that this project is open source with many eyes on it. |
@brian-brazil I can close my #445 to favour this one from @njhill 👍
I vote for JCtools but I'm just biased on it :) |
Dependencies are purposefully avoided in this library, to avoid issues including it.
That doesn't mean that the implementation is correct. I prefer the other proposed approaches. |
Thanks @brian-brazil @franz1981
This makes a lot of sense, although if there was one which would be really useful you could still consider shading? But I wasn't actually suggesting any of those would necessarily be applicable here, just giving examples of how there are many specialized impls for different use cases which would never be added to the JDK itself.
Could you elaborate on what you mean by correct here? I did actually find a small bug and have pushed another commit which fixes that, has some other refinements, and adds a bunch of comments explaining that "big" method. Given your comments, I was still thinking to restructure it a bit further so that the "map" impl is better encapsulated/separated from the existing logic.
I'd be interested to understand the reasoning behind this. This PR isn't much different to those in terms of amount of additional code. If you don't care about performance then why would you make any change at all? If you do care about performance then why would you not go with this approach that outperforms the others considerably (and eliminates all allocations on the hot path)? FWIW here's new benchmarks on the latest update. I think the areformentioned hash "fix" may have actually improved things even more (note I ran on a machine with less noise this time so the baselines are higher too). Before:
After:
|
38d2ba5
to
c77a19a
Compare
For reference I ran the same benchmark on the other proposed impls: #445:
#459:
|
And for completeness here's allocations from GC profile: Current:
This PR:
#445:
#459:
|
They don't involve implementing a bespoke data structure, so should be more maintainable and more likely to be correct. |
@brian-brazil you sure drive a hard bargain :) As data structures go, it's about as simple as it gets. My code may be obfuscating this a bit but we are just putting the labels/child pairs into an array instead of a map. Lookups are just a straight loop over it, the hashcode of the labels is used to decide where to start looking from. The copy-on-write aspect just means that any time we are going to modify the array we copy it first and replace the whole thing with the modified copy. So again in terms of maintenance overhead I don't see how this would be much different to the other "specialized" logic being considered. A few more questions (and thank you for continuing to humour me!):
|
This is an optimization of the SimpleCollector.labels(...) lookups with a similar goal to prometheus#445 and prometheus#459. It has some things in common with those PRs (including overridden fixed-args versions) but aims to provide best of all worlds - zero garbage and higher throughput for all label counts, without any reliance on thread reuse. To achieve this, ConcurrentHashMap is abandoned in favour of a custom copy-on-write linear-probe hashtable. Benchmark results Before: Benchmark Mode Cnt Score Error Units baseline thrpt 20 84731357.558 ± 535745.023 ops/s oneLabel thrpt 20 36415789.294 ± 441116.974 ops/s twoLabels thrpt 20 33301282.259 ± 313669.132 ops/s threeLabels thrpt 20 24560630.904 ± 2247040.286 ops/s fourLabels thrpt 20 24424456.896 ± 288989.596 ops/s fiveLabels thrpt 20 18356036.944 ± 949244.712 ops/s After: Benchmark Mode Cnt Score Error Units baseline thrpt 20 84866162.495 ± 823753.503 ops/s oneLabel thrpt 20 84554174.645 ± 804735.949 ops/s twoLabels thrpt 20 85004332.529 ± 689559.035 ops/s threeLabels thrpt 20 73395533.440 ± 3022384.940 ops/s fourLabels thrpt 20 68736143.734 ± 1872048.923 ops/s fiveLabels thrpt 20 53482207.003 ± 488751.990 ops/s This benchmark, like the prior ones, only tests with a single sequence of labels for each count. It would be good to extend it to cover cases where the map is populated with a larger number of children. Signed-off-by: nickhill <[email protected]>
Signed-off-by: nickhill <[email protected]>
Closing in favour of #486 |
prometheus#460 proposed an optimized zero-GC version of the child lookup logic which was deemed too specialized for inclusion in the library. Subjectively less complex alternatives were also proposed which provide some but not as much performance/garbage improvement, and rely for example on some per-thread overhead. This PR aims to add a minimally-invasive mechanism to allow users to plug in an implementation of their choice, so that performance sensitive consumers can opt for minimal overhead without the core library having to include the corresponding code. Signed-off-by: nickhill <[email protected]>
This is an optimization of the
SimpleCollector.labels(...)
lookups with a similar goal to #445 and #459.It has some things in common with those PRs (including overridden fixed-args versions) but aims to provide best of all worlds - zero garbage and higher throughput for all label counts, without any reliance on thread reuse.
To achieve this,
ConcurrentHashMap
is abandoned in favour of a custom copy-on-write linear-probe hashtable.Benchmark results:
Before:
After:
This benchmark, like the prior ones, only tests with a single sequence of labels for each count. It would be good to extend it to cover cases where the map is populated with a larger number of children.