-
Notifications
You must be signed in to change notification settings - Fork 9
Vectorized HttpUserAgentParser.TryExtractVersion #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@BenjaminAbt we added tests for the faulty user agent that caused troubles the last time? I don't remember it... |
// Find first digit | ||
int start = -1; | ||
for (int i = 0; i < haystack.Length; i++) | ||
if (Vector256.IsHardwareAccelerated && haystack.Length >= 2 * Vector256<short>.Count) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure the implementation is correct, but should we add some kind of toggle (env variable) with which vectorization can be disabled in case there's a bug in the code (i.e. from some strange user agent that we don't know at the moment), so a user could still use the lib w/o the need to wait for hotfix release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the environment variable is visible to users. It's kind of hidden black magic and users will remove the library without checking. For me it feels the IndexOf way is the most stable and a very fast solution for now, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IndexOf
is quite good, but for the 11.0) like Gecko
benchmark it's +75% slower and such short versions are quite common.
The specialized vectorized code is nice, but it's a lot of hard to maintain code so actually I don't like this approach very much.
I'd like to keep this PR open for a moment, so I'll be able to explore other approaches for speed-up too.
The easiest one is to shorten the "window" in
HttpUserAgentParser/src/HttpUserAgentParser/HttpUserAgentParser.cs
Lines 210 to 214 in 4a82130
const int Window = 128; | |
if (haystack.Length > Window) | |
{ | |
haystack = haystack.Slice(0, Window); | |
} |
Further the biggest speed-up may come from a not linear scan through all possible patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further the biggest speed-up may come from a not linear scan through all possible patterns.
In my head I'm ready with such an approach, but I need to finish a work-project, then I'll prototype it and see how it goes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free, have complete confidence in your everything looks impressively good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first try with that idea is nice, as the user agent only needs to be scaned once ($O(n)$) and not for every possibility.
But for some browsers, where we depend on the order in the arrays, the wrong result is yielded. So I have to research a bit more.
Maybe there will move also some things around (w/o API breaking changes), so I'd like to leave this PR open so that all together can be done or only parts of it.
I hope to continue on this next weeks.
Fixes #73
I tried quite a couple of variants (more than in the below benchmark code), but it's really hard to beat the scalar implementation, as the code is quite trivial and straightforward for the cpu to prefetch, etc.
So for vectorization something along the comment in
HttpUserAgentParser/src/HttpUserAgentParser/HttpUserAgentParser.cs
Lines 212 to 219 in 57bfa71
benchmark code
The cases where a version is found profit from vectorization (the second benchmark is is juse in the ns-range different), but when no version is found the vectorized approach is slower -- here the
IndexOf
-variant shines.Overall I'm not really shure if the added code-complexity is worth it to add this to the codebase.$O(n)$ and this will give more perf (maybe by some kind of tree structure). I'll think about this later, leaving the decision to take this PR open to @BenjaminAbt 😉.
I see more potential in finding browser, bots, etc. by a better approach. ATM we loop over the known sets, but I think there's some better approach that isn't