perf: Replace BTreeMap with sorted vec by michael-weigelt · Pull Request #1 · michael-weigelt/regalloc2

michael-weigelt · 2026-06-05T19:30:18Z

A flamegraph showed that during (non-optimizing) compilation of a Wasm function with thousands of locals, try_to_allocate_bundle_to_reg spends a lot of time (14% overall) peeking into and walking a BTreeMap. After bytecodealliance#261, that share rises to roughly 30%, so this PR replaces the BTreeMap with a sorted vec (with a BTreeMap-like interface), which is much more cache-friendly. The trade-off is that mutations become O(n) instead of O(log n), because inserting or removing in the middle of a Vec shifts every later element. They seem to be rare enough compared to walking the tree that it's worth it.
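For readers unfamiliar with the pattern, a sorted vec with a BTreeMap-like interface might look roughly like the sketch below. This is a hypothetical illustration, not the code in this PR: the type name SortedVecMap and its methods (get, insert, remove, range_from) are assumptions chosen to mirror BTreeMap. Lookups and range starts use binary search, and iteration is a linear scan over contiguous memory, which is where the cache-friendliness comes from; insert and remove pay an O(n) shift.

```rust
/// Hypothetical sketch (not the PR's code): a map stored as a Vec of
/// (key, value) pairs kept sorted by key.
pub struct SortedVecMap<K, V> {
    entries: Vec<(K, V)>,
}

impl<K: Ord, V> SortedVecMap<K, V> {
    pub fn new() -> Self {
        Self { entries: Vec::new() }
    }

    /// O(log n) lookup via binary search on the sorted keys.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries
            .binary_search_by(|(k, _)| k.cmp(key))
            .ok()
            .map(|i| &self.entries[i].1)
    }

    /// O(log n) search, then an O(n) shift of all later entries when the
    /// key is new. Returns the previous value if the key was present.
    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        match self.entries.binary_search_by(|(k, _)| k.cmp(&key)) {
            Ok(i) => Some(std::mem::replace(&mut self.entries[i].1, value)),
            Err(i) => {
                self.entries.insert(i, (key, value));
                None
            }
        }
    }

    /// O(log n) search, then an O(n) shift, just like insert.
    pub fn remove(&mut self, key: &K) -> Option<V> {
        match self.entries.binary_search_by(|(k, _)| k.cmp(key)) {
            Ok(i) => Some(self.entries.remove(i).1),
            Err(_) => None,
        }
    }

    /// Ascending iteration over all entries with key >= `from`, analogous
    /// to BTreeMap::range(from..). Finding the start is O(log n); the scan
    /// itself walks contiguous memory.
    pub fn range_from<'a>(&'a self, from: &K) -> impl Iterator<Item = (&'a K, &'a V)> + 'a {
        let start = self.entries.partition_point(|(k, _)| k < from);
        self.entries[start..].iter().map(|(k, v)| (k, v))
    }
}
```

The interesting design point is range_from: a conflict check of the "find the first entry at or after this point, then walk forward" kind becomes a binary search plus a contiguous scan, instead of pointer-chasing through B-tree nodes.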

Note that I have not found a benchmark for the general-purpose case, so I can only report speedups for my somewhat degenerate use case: functions with many locals, compiled with all optimizations turned off.

Compile time for a Wasm function with X locals, before bytecodealliance#261, after it, and after this PR:

Locals    Before #261    After #261 (change)    After this PR (change)    Overall
1000      780 ms         363 ms (-53%)          216 ms (-40%)             -72%
10k       66 s           23 s (-65%)            10 s (-56%)               -84%
40k       952 s          356 s (-62%)           166 s (-53%)              -82%
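The Overall column compounds the two per-step reductions multiplicatively rather than adding them. Worked through for the 1000-locals row, using only the timings above:

```latex
1 - \frac{216}{780} = 1 - \frac{363}{780}\cdot\frac{216}{363} \approx 1 - 0.465 \times 0.595 \approx 0.723
```

That is roughly a 72% reduction, not the 93% that simply adding -53% and -40% would suggest.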

cfallin · 2026-06-05T20:43:17Z

(I saw this referenced from the other PR, I hope you don't mind my comments here)

While the speedup is indeed impressive for your particular use case, I think this is a change we will not be able to take, for an algorithmic-complexity reason: the allocation map per preg will see a lot of insertions into the middle, and can see removals (on backtracking) as well. A sorted Vec is great if we can sort once and then rely on binary search to locate keys and scan from there, but every insertion or removal in the middle shifts all later elements, which is O(n) per operation. Overall, that degrades to quadratic behavior in any workload that allocates a lot of liveranges in any order that is not top-to-bottom (and in general liveranges are sorted by weight/score, so they'll be processed in some arbitrary order). Thus this will pose a huge blowup risk to compile time.
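A toy model makes the quadratic concern concrete. The sketch below is not regalloc2 code: it inserts plain u32 keys one at a time into a Vec kept sorted and counts how many existing elements each Vec::insert has to shift, as a proxy for copy cost. The helper names shifted_elements and shuffled are hypothetical, and the shuffle uses a small xorshift generator only so the example needs no external crates.

```rust
/// Insert `keys` one at a time into a Vec kept sorted, and return the total
/// number of existing elements shifted. Vec::insert at index i moves
/// len - i elements, so this tracks the O(n)-per-insert copy cost.
fn shifted_elements(keys: &[u32]) -> u64 {
    let mut v: Vec<u32> = Vec::with_capacity(keys.len());
    let mut moved = 0u64;
    for &k in keys {
        // Binary search for the insertion point: O(log n).
        let i = v.partition_point(|&x| x < k);
        // Everything from i onward shifts right by one: O(n).
        moved += (v.len() - i) as u64;
        v.insert(i, k);
    }
    moved
}

/// Deterministic pseudo-random permutation of 0..n via Fisher-Yates,
/// driven by a xorshift64 generator (the seed must be nonzero).
fn shuffled(n: u32, mut seed: u64) -> Vec<u32> {
    let mut keys: Vec<u32> = (0..n).collect();
    for i in (1..keys.len()).rev() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let j = (seed % (i as u64 + 1)) as usize;
        keys.swap(i, j);
    }
    keys
}
```

With n = 1000, ascending (top-to-bottom) insertion shifts nothing, descending insertion shifts n(n-1)/2 = 499,500 elements, and a shuffled order, closer to processing liveranges by weight, lands around n(n-1)/4 in expectation. The total grows with n squared, which is the blowup risk described above; whether it dominates in practice depends on real workloads and on how cheap memmove is for small entries.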

(Nevertheless I'll be curious what you find on Sightglass with other benchmarks, especially large ones like SpiderMonkey; that will tell us how common this blowup risk is and can still feed into thinking.)

replace by sorted vec

d7b1cb2

michael-weigelt marked this pull request as draft June 5, 2026 20:30

michael-weigelt wants to merge 1 commit into mwe/conflict_set from mwe/live_range_set
