-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8366588: VectorAPI: Re-intrinsify VectorMask.laneIsSet where the input index is a variable #27113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…t index is a variable Intrinsic support for `VectorMask.laneIsSet` with a **variable** input index was introduced in PR openjdk#14200, but was inadvertently broken by PR openjdk#25673. This PR restores the intrinsic functionality and adds some JTReg tests. Benchmarks on Nvidia Grace machine with 128-bit SVE: ``` Benchmark Unit Before Score Error After Score Error Uplift microMaskLaneIsSetByte128_var ops/ms 21702.14415 91.902159 103472.9391 36.057447 4.767867 microMaskLaneIsSetByte64_var ops/ms 21468.51868 107.94177 103365.6561 69.47736 4.814754 microMaskLaneIsSetDouble128_var ops/ms 77489.32791 153.242699 413499.4127 311.854079 5.336211 microMaskLaneIsSetFloat128_var ops/ms 41034.95204 399.421823 206840.0988 74.702234 5.040583 microMaskLaneIsSetFloat64_var ops/ms 77607.40268 175.938921 413745.3001 149.716794 5.33126 microMaskLaneIsSetInt128_var ops/ms 41452.48893 76.143208 206845.9754 59.371129 4.989953 microMaskLaneIsSetInt64_var ops/ms 77726.2542 173.180518 413427.8838 363.575023 5.319024 microMaskLaneIsSetLong128_var ops/ms 77646.11218 177.496587 413403.4404 236.609314 5.3242 microMaskLaneIsSetShort128_var ops/ms 21374.93265 48.13101 103417.4618 34.827021 4.838259 microMaskLaneIsSetShort64_var ops/ms 41066.19395 353.320621 206801.109 106.408938 5.035799 ``` Benchmarks on Intel 6444y machine with 512-bit avx3: ``` Benchmark Unit Before Score Error After Score Error Uplift microMaskLaneIsSetByte128_var ops/ms 57658.45497 240.209309 211643.8406 29.214532 3.670647 microMaskLaneIsSetByte256_var ops/ms 57451.68169 116.994128 211609.4652 160.48513 3.683259 microMaskLaneIsSetByte512_var ops/ms 57530.22411 311.63868 199802.8084 408.144015 3.473005 microMaskLaneIsSetByte64_var ops/ms 57642.2672 161.406221 205252.4464 196.86852 3.560797 microMaskLaneIsSetDouble256_var ops/ms 114401.3789 231.797375 361400.344 565.593984 3.159055 microMaskLaneIsSetDouble512_var ops/ms 57379.27882 159.699503 211476.1138 136.980026 3.685583 microMaskLaneIsSetFloat128_var ops/ms 113943.9512 141.062663 360855.3915 494.471996 3.166955 microMaskLaneIsSetFloat256_var ops/ms 57682.78182 138.142053 211659.5098 30.167972 3.66937 microMaskLaneIsSetFloat512_var ops/ms 57617.66405 301.748599 211246.8588 597.18949 3.666355 microMaskLaneIsSetInt128_var ops/ms 113914.5062 118.681382 360856.4465 555.097397 3.167783 microMaskLaneIsSetInt256_var ops/ms 57681.79883 112.391639 211555.6742 217.556981 3.667633 microMaskLaneIsSetInt512_var ops/ms 57350.20346 206.146723 211657.7207 68.461571 3.690618 microMaskLaneIsSetLong256_var ops/ms 113838.3822 415.784529 360782.0645 710.076899 3.169247 microMaskLaneIsSetLong512_var ops/ms 57314.02695 190.1762 211690.8492 26.47233 3.693526 microMaskLaneIsSetShort128_var ops/ms 57675.58965 65.940976 211549.9551 276.57545 3.667928 microMaskLaneIsSetShort256_var ops/ms 57628.8642 91.957833 211694.0864 16.559412 3.673403 microMaskLaneIsSetShort512_var ops/ms 57845.35211 160.537421 211358.872 660.777147 3.65386 microMaskLaneIsSetShort64_var ops/ms 113848.8846 222.787418 360294.6295 491.425656 3.164674 ```
👋 Welcome back erifan! A progress list of the required criteria for merging this PR into |
@erifan This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 101 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@shipilev, @XiaohongGong, @eme64) but any other Committer may sponsor as well. ➡️ To flag this PR as ready for integration with the above commit message, type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fine to me. I took another look at JDK-8358749, and I think this is the only place where we can really accept the non-constant input. In all other cases, we either pull is_con()
or const_oop()
out of the input.
I think we will bikeshed about the tests a bit.
test/micro/org/openjdk/bench/jdk/incubator/vector/VectorExtractBenchmark.java
Show resolved
Hide resolved
The patch looks reasonable, thanks for fixing this and writing an IR test! |
Thanks for your help @eme64 @XiaohongGong @shipilev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Patch looks good to me, testing passed :)
Thanks for working on this @erifan !
/integrate |
/sponsor |
Going to push as commit 53b3e05.
Your commit was automatically rebased without conflicts. |
@XiaohongGong @erifan Pushed as commit 53b3e05. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Intrinsic support for
VectorMask.laneIsSet
with a variable input index was introduced in PR #14200, but was inadvertently broken by PR #25673. This PR restores the intrinsic functionality and adds some JTReg tests.Benchmarks on Nvidia Grace machine with 128-bit SVE:
Benchmarks on Intel 6444y machine with 512-bit avx3:
Progress
Issue
Reviewers
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27113/head:pull/27113
$ git checkout pull/27113
Update a local copy of the PR:
$ git checkout pull/27113
$ git pull https://git.openjdk.org/jdk.git pull/27113/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 27113
View PR using the GUI difftool:
$ git pr show -t 27113
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27113.diff
Using Webrev
Link to Webrev Comment