perf: Replace BTreeMap with sorted vec#1
Conversation
|
(I saw this referenced from the other PR, I hope you don't mind my comments here) While the speedup is indeed impressive on your particular use-case, I think this is a change we will not be able to take, for an algorithmic-complexity reason: the allocation map per preg will see a lot of insertions into the middle, and can see removals (on backtracking) as well. A sorted Vec is great if we can sort once and then rely on bsearch to locate keys and scan from there, but will degrade to quadratic behavior overall in any workload that allocates a lot of liveranges in any order that is not top-to-bottom (and in general liveranges are sorted by weight/score so they'll be processed in some arbitrary order). Thus this will pose a huge blowup risk to compile time. (Nevertheless I'll be curious what you find on Sightglass with other benchmarks, especially large ones like SpiderMonkey; that will tell us how common this blowup risk is and can still feed into thinking.) |
Follow-up to bytecodealliance#261.
A flamegraph showed that during (non-optimizing) compilation of a Wasm function with 1000s of locals,
try_to_allocate_bundle_to_regspends a lot of time (14% overall) peeking and walking aBTreeMap. After bytecodealliance#261, those 14% are more like 30%, so this PR replaces thatBTreeMapby a sorted vec (with aBTreeMap-like interface), which is much more cache-friendly. The trade-off is that mutations areO(log n)instead ofO(1), but they seem to be rare enough compared to walking the tree that it's worth it.Note that I have not found a benchmark for the general-purpose case, so I can only report the speedups of my slightly degenerate usecase of many-locals-functions, compiled with all optimizations turned off.
Compiling a Wasm function with X locals before and after bytecodealliance#261 and after this PR: