Skip to content

perf: Replace BTreeMap with sorted vec#1

Draft
michael-weigelt wants to merge 1 commit into
mwe/conflict_setfrom
mwe/live_range_set
Draft

perf: Replace BTreeMap with sorted vec#1
michael-weigelt wants to merge 1 commit into
mwe/conflict_setfrom
mwe/live_range_set

Conversation

@michael-weigelt

@michael-weigelt michael-weigelt commented Jun 5, 2026

Copy link
Copy Markdown
Owner

Follow-up to bytecodealliance#261.

A flamegraph showed that during (non-optimizing) compilation of a Wasm function with 1000s of locals, try_to_allocate_bundle_to_reg spends a lot of time (14% overall) peeking and walking a BTreeMap. After bytecodealliance#261, those 14% are more like 30%, so this PR replaces that BTreeMap by a sorted vec (with a BTreeMap-like interface), which is much more cache-friendly. The trade-off is that mutations are O(log n) instead of O(1), but they seem to be rare enough compared to walking the tree that it's worth it.

Note that I have not found a benchmark for the general-purpose case, so I can only report the speedups of my slightly degenerate usecase of many-locals-functions, compiled with all optimizations turned off.

Compiling a Wasm function with X locals before and after bytecodealliance#261 and after this PR:

                                  PR 261's improvement      this PR's improvement
1000 locals: 780ms compilation -> 363ms   -53%   -> (PR2) 216ms  -40%  overall: -72%
10k  locals:  66s compilation ->   23s    -65%   ->        10s   -56%  overall: -84%
40k  locals: 952s compilation ->  356s    -62%   ->       166s   -53%  overall: -82%

@michael-weigelt michael-weigelt marked this pull request as draft June 5, 2026 20:30
@cfallin

cfallin commented Jun 5, 2026

Copy link
Copy Markdown

(I saw this referenced from the other PR, I hope you don't mind my comments here)

While the speedup is indeed impressive on your particular use-case, I think this is a change we will not be able to take, for an algorithmic-complexity reason: the allocation map per preg will see a lot of insertions into the middle, and can see removals (on backtracking) as well. A sorted Vec is great if we can sort once and then rely on bsearch to locate keys and scan from there, but will degrade to quadratic behavior overall in any workload that allocates a lot of liveranges in any order that is not top-to-bottom (and in general liveranges are sorted by weight/score so they'll be processed in some arbitrary order). Thus this will pose a huge blowup risk to compile time.

(Nevertheless I'll be curious what you find on Sightglass with other benchmarks, especially large ones like SpiderMonkey; that will tell us how common this blowup risk is and can still feed into thinking.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants