Potential perf improvements #26


Open
matklad opened this issue Nov 19, 2021 · 4 comments

Comments

@matklad
Contributor

matklad commented Nov 19, 2021

  1. Look at what APIs jemalloc itself has for introspection
  2. Replace `pub static TID: RefCell<usize> = RefCell::new(0);` and the like with `pub static TID: Cell<usize> = Cell::new(0);`
  3. Replace with const thread locals once it is stable Tracking Issue for const-initialized thread locals rust-lang/rust#84223
  4. Replace with thread::current().id().as_u64() once that is stable
  5. In general, minimize usage of thread-locals, they are quite slow in today's Rust
  6. Spread out MEM_SIZE array more to avoid false sharing. Threads get sequential IDs, so may hammer the same cache line during allocation
  7. Replace SeqCst with relaxed, I think that should be enough for our use case.
  8. Remove TOTAL_MEMORY_USAGE as it's the sum of per-thread usages.
  9. `rand::thread_rng().gen_range(0, 100)` feels like something optimizable: thread_rng is crypto secure I think (or at least stronger than xorshift).
  10. I think https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.html uses "counting memory usage by threads" as one of its running examples, so it might be worth reading (warning: deep rabbit hole).
  11. https://blog.mozilla.org/nnethercote/category/memory-allocation/ might also be an interesting read
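A minimal sketch of points 2 and 3 combined, assuming `TID` is only ever read and written as a whole value (names here follow the issue; the `const { ... }` initializer is now stable, having landed in Rust 1.59):

```rust
use std::cell::Cell;

thread_local! {
    // `Cell` avoids the borrow-flag bookkeeping `RefCell` performs on every
    // access; const initialization lets the compiler place the slot directly
    // in the thread's TLS block instead of going through lazy init checks.
    pub static TID: Cell<usize> = const { Cell::new(0) };
}

fn main() {
    TID.with(|tid| tid.set(42));
    assert_eq!(TID.with(|tid| tid.get()), 42);
}
```

This only works when no caller relies on holding a long-lived borrow of the value; `Cell` forces copy-in/copy-out access.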
@pmnoxx
Contributor

pmnoxx commented Dec 4, 2021

@matklad Thanks. I added a benchmark, which shows performance degradation: #27

@pmnoxx pmnoxx self-assigned this Dec 5, 2021
@pmnoxx
Contributor

pmnoxx commented Dec 16, 2021

7. Remove TOTAL_MEMORY_USAGE as it's the sum of per-thread usages.

@matklad I did some benchmarking; it saves around 0.4-0.6ns per run.
#52

Though this benchmark is single-threaded, this variable could affect multi-threaded applications more under high contention.
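A sketch of the idea behind dropping the global total, assuming per-thread counters indexed by sequential thread IDs (the array name and size here are hypothetical, not from the actual code):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const MAX_THREADS: usize = 64;
const ZERO: AtomicUsize = AtomicUsize::new(0);

// One counter per thread; the global total is derived on demand instead of
// being maintained as a separate TOTAL_MEMORY_USAGE atomic, which removes
// one contended write from every allocation.
static MEM_USAGE: [AtomicUsize; MAX_THREADS] = [ZERO; MAX_THREADS];

fn add_usage(tid: usize, bytes: usize) {
    MEM_USAGE[tid].fetch_add(bytes, Ordering::Relaxed);
}

fn total_usage() -> usize {
    MEM_USAGE.iter().map(|c| c.load(Ordering::Relaxed)).sum()
}

fn main() {
    add_usage(0, 100);
    add_usage(1, 50);
    assert_eq!(total_usage(), 150);
}
```

The trade-off is that reading the total becomes O(threads) instead of O(1), which is usually fine since totals are read far less often than they are updated.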

@pmnoxx
Contributor

pmnoxx commented Dec 16, 2021

8. `rand::thread_rng().gen_range(0, 100)` feels like something optimizable: thread_rng is crypto secure I think (or at least stronger than xorshift).

Yes, that's a big issue. I reduced time from 38ns to 12ns, by removing usage of rand. #53
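One way to do this, sketched here as an assumption rather than what #53 actually implements: a per-thread xorshift64 state is enough for a sampling decision, which needs rough uniformity but no cryptographic strength.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical per-thread xorshift64 state; any nonzero seed works.
    static RNG_STATE: Cell<u64> = Cell::new(0x9E37_79B9_7F4A_7C15);
}

// Classic xorshift64 step: three shift-xor rounds per output.
fn xorshift64() -> u64 {
    RNG_STATE.with(|s| {
        let mut x = s.get();
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        s.set(x);
        x
    })
}

// Drop-in for rand::thread_rng().gen_range(0, 100); the slight modulo bias
// is irrelevant for a sampling-rate check.
fn sample_percent() -> u64 {
    xorshift64() % 100
}

fn main() {
    for _ in 0..1_000 {
        assert!(sample_percent() < 100);
    }
}
```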

@pmnoxx
Contributor

pmnoxx commented Dec 16, 2021

6. Replace SeqCst with relaxed, I think that should be enough for our use case.

I measured; there is no performance difference between using ::Relaxed and ::SeqCst.
See https://stackoverflow.com/questions/53805142/why-memory-order-relaxed-performance-is-the-same-as-memory-order-seq-cst/53805377
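For a standalone statistics counter, `Relaxed` is still correct regardless of whether it is faster: the increment needs atomicity but no ordering with respect to other memory operations. A minimal sketch:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static COUNTER: AtomicUsize = AtomicUsize::new(0);

fn main() {
    // Relaxed fetch_add is still an atomic read-modify-write, so no
    // increments are lost even with no ordering guarantees.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..1_000 {
                    COUNTER.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(COUNTER.load(Ordering::Relaxed), 4_000);
}
```

Whether the relaxation shows up in benchmarks is target-dependent; on x86, atomic loads compile to the same instruction under both orderings, which matches the Stack Overflow answer linked above.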

@pmnoxx pmnoxx removed their assignment Mar 7, 2022