perf: fold LSB-test `i32.and X 1` into `i32.ctz` in boolean contexts #8562
Open
ggreif wants to merge 3 commits into WebAssembly:main from
…X; if E T`

An if-else conditioned on `(i32.and X (i32.const 1))` tests the LSB of X. Since `i32.ctz X == 0` iff the LSB of X is set, we can replace the condition with `i32.ctz X` and swap the branches, saving one instruction. Handles the constant on either side (left or right of `and`).

Relates to: WebAssembly#5752

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…an context

In boolean contexts (`if`, `br_if`, `select`), `eqz(and X 1)` and `ctz X` have the same truthiness: both are truthy iff LSB(X) == 0. Replacing `eqz` + `and` with `ctz` saves two instructions (the `i32.const 1` operand disappears as well) and covers the primary pattern from WebAssembly#5752: `i32.const 1; i32.and; i32.eqz; br_if N` ==> `i32.ctz; br_if N`.

This fires via `optimizeBoolean`, so it covers `if`, `br_if`, and `select` conditions in one place. Observed ~26–105 hits across Motoko RTS variants.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`i32.and X 1; if T E` into `i32.ctz X; if E T`
`i32.and X 1` into `i32.ctz` in boolean contexts
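The identities behind both commits can be checked mechanically. The Python sketch that follows is not Binaryen code: it hand-models the relevant Wasm semantics (`i32.ctz` returns 32 for a zero input, and `if`/`br_if` treat any nonzero i32 as true), and all helper names (`wasm_i32_ctz`, `if_and`, `if_ctz_swapped`, `br_if_eqz_and`, `br_if_ctz`) are hypothetical. It compares the original and rewritten conditions on edge cases plus a deterministic pseudo-random sweep of 32-bit values.

```python
MASK32 = 0xFFFFFFFF


def wasm_i32_ctz(x: int) -> int:
    """Model of i32.ctz: trailing zero bits of a 32-bit value; 32 when x == 0."""
    x &= MASK32
    if x == 0:
        return 32
    # x & -x isolates the lowest set bit; its index is the trailing-zero count.
    return (x & -x).bit_length() - 1


def if_and(x: int) -> str:
    """Original: (if (i32.and X (i32.const 1)) T E); nonzero means true."""
    return "T" if (x & 1) != 0 else "E"


def if_ctz_swapped(x: int) -> str:
    """Rewritten: (if (i32.ctz X) E T), i.e. new condition, branches swapped."""
    return "E" if wasm_i32_ctz(x) != 0 else "T"


def br_if_eqz_and(x: int) -> bool:
    """Original br_if condition: (i32.eqz (i32.and X (i32.const 1)))."""
    eqz_result = 1 if (x & 1) == 0 else 0
    return eqz_result != 0


def br_if_ctz(x: int) -> bool:
    """Rewritten br_if condition: (i32.ctz X)."""
    return wasm_i32_ctz(x) != 0


def rewrites_agree(x: int) -> bool:
    """True if both rewrites behave exactly like the originals for input x."""
    return if_and(x) == if_ctz_swapped(x) and br_if_eqz_and(x) == br_if_ctz(x)


# Edge cases plus a deterministic LCG sweep over 32-bit values.
samples = [0, 1, 2, 3, 0x80000000, 0xFFFFFFFE, 0xFFFFFFFF]
s = 12345
for _ in range(10_000):
    s = (s * 1664525 + 1013904223) & MASK32
    samples.append(s)

mismatches = [x for x in samples if not rewrites_agree(x)]
print(f"checked {len(samples)} values, {len(mismatches)} mismatches")
# prints: checked 10007 values, 0 mismatches
```

The `X = 0` case is the one worth singling out: because `i32.ctz 0` is 32 (nonzero, hence true), the swapped `if` takes the `E` branch, which matches the original, where `i32.and 0 1` is 0.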
ggreif added a commit to caffeinelabs/motoko that referenced this pull request on Apr 1, 2026
Add ggreif/binaryen (branch gabor/lsb-if-ctz-flake) as a flake input, exposing a patched wasm-opt that folds LSB-test `i32.and X 1` patterns into `i32.ctz` (WebAssembly/binaryen#8562). Apply it to the non-debug RTS variants in installPhase, yielding ~0.2% instruction-count reductions in GC-heavy benchmarks.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Member
Interesting. I worry this is not always faster, though: AND usually has a cost of 1, while TZCNT often has 2: https://www.agner.org/optimize/instruction_tables.pdf Perhaps check what LLVM does here? They likely reasoned about this thoroughly.
Summary
An if-else conditioned on `(i32.and X (i32.const 1))` tests the least significant bit of X. Since `i32.ctz X == 0` iff the LSB of X is set, we can replace the condition with `i32.ctz X` and swap the branches, saving one instruction.

The second commit extends this to the primary pattern from the issue, `eqz(and X 1)` as a boolean condition (used in `br_if`, `if`, `select`). It is handled in `optimizeBoolean`, so all three sites benefit from a single rule.

- Handles the constant on either side (left or right of `and`)
- `visitIf`: `(and X 1); if T E` → `(ctz X); if E T`
- `optimizeBoolean`: `eqz(and X 1)` → `ctz X`, covering the typical `br_if (eqz (and X 1))` pattern

Motivation
Filed in #5752. The Motoko compiler already implements this in its own peephole optimizer (`instrList.ml`); the goal is to bring it to `wasm-opt` so that Wasm produced by other toolchains (e.g. the Motoko RTS, which is written in Rust) benefits too.

The `optimizeBoolean` rule alone fires 26–105 times, depending on which of the three Motoko RTS variants (`mo-rts-eop`, `mo-rts-incremental`, `mo-rts-non-incremental`) is optimized, matching the `is_skewed`/`is_scalar` pointer-tagging checks in the GC hot path.

Applying `wasm-opt --optimize-instructions` to the Motoko RTS and running the benchmark suite shows the following gross effects (the submitted optimisation is a contributing factor alongside other rules triggered in the same pass):

- `heap-32` (GC-heavy, run 1)
- `heap-32` (run 2)
- `heap-64` (run 1)
- `heap-64` (run 2)
- `bignum`
- `candid-subtype-cost`

The GC-heavy heap benchmarks benefit most, consistent with the `is_skewed` check firing frequently during pointer traversal.

Test plan
- `test/lit/passes/optimize-instructions-lsb-if.wast` covers `if` (const left and right) and `br_if (eqz (and X 1))`
- `i32.ctz` in the output

🤖 Generated with Claude Code