This repository was archived by the owner on Nov 6, 2020. It is now read-only.

Conditional -Zorbit compilation for maximum performance #1941

Closed
MagaTailor opened this issue Aug 17, 2016 · 10 comments
Labels
Z1-question 🙋‍♀️ Issue is a question. Closer should answer.

Comments

@MagaTailor

MagaTailor commented Aug 17, 2016

I've noticed that MIR trans benefits the trie group of benchmarks (at least on ARM). Which crates should be compiled with -Z orbit to get the most benefit, with MIR disabled for the rest?

Here's a recent comparison (using a 1.12 LLVM 3.9-based nightly):

name                             util-gcc6-mir ns/iter  util-gcc6-old ns/iter  diff ns/iter   diff %
bench_decode_nested_empty_lists                  3,388                  3,700           312    9.21%
bench_decode_u256_value                            904                    901            -3   -0.33%
bench_decode_u64_value                             399                    452            53   13.28%
bench_stream_1000_empty_lists                   88,257                 87,543          -714   -0.81%
bench_stream_nested_empty_lists                  3,950                  4,259           309    7.82%
bench_stream_u256_value                          6,190                  6,503           313    5.06%
bench_stream_u64_value                           2,466                  2,874           408   16.55%
sha3x10000                                 186,941,393            193,395,637     6,454,244    3.45%
trie_insertions_32_mir_1k                   72,408,100             92,209,137    19,801,037   27.35%
trie_insertions_32_ran_1k                   71,814,096             91,622,633    19,808,537   27.58%
trie_insertions_random_mid                  62,045,029             84,110,081    22,065,052   35.56%
trie_insertions_six_high                    57,629,898             76,251,927    18,622,029   32.31%
trie_insertions_six_low                    130,386,901            175,460,013    45,073,112   34.57%
trie_insertions_six_mid                     79,026,246            104,855,125    25,828,879   32.68%
triehash_insertions_32_mir_1k               38,926,869             41,735,688     2,808,819    7.22%
triehash_insertions_32_ran_1k               38,605,667             41,325,686     2,720,019    7.05%
triehash_insertions_random_mid              20,372,241             21,985,652     1,613,411    7.92%
triehash_insertions_six_high                25,663,877             27,670,791     2,006,914    7.82%
triehash_insertions_six_low                 37,859,561             40,780,982     2,921,421    7.72%
triehash_insertions_six_mid                 29,056,901             31,329,316     2,272,415    7.82%
u128_mul                                     1,611,461                635,104      -976,357  -60.59%
u256_add                                     1,655,461                628,904    -1,026,557  -62.01%
u256_full_mul                               25,836,778             21,379,947    -4,456,831  -17.25%
u256_mul                                     2,387,266              1,203,308    -1,183,958  -49.59%
u256_sub                                     1,655,711                628,804    -1,026,907  -62.02%
u512_add                                     1,498,710              1,015,707      -483,003  -32.23%
u512_sub                                     1,606,111              1,066,007      -540,104  -33.63%
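
For reading the table: the numbers are consistent with diff being the second column minus the first, and diff % being that difference relative to the first column. So a positive diff means the MIR build (first column) took fewer ns/iter, and a negative diff means it took more. A minimal sketch of that arithmetic follows; the function name `row_diff` is hypothetical and not part of any benchmarking tool.

```rust
/// Recomputes the `diff ns/iter` and `diff %` columns for one row.
/// `first` and `second` are the ns/iter values of the first and second
/// result columns. The formula is inferred from the table, not taken
/// from the tool that produced it.
pub fn row_diff(first: u64, second: u64) -> (i64, f64) {
    // Signed difference: second column minus first column.
    let diff = second as i64 - first as i64;
    // Difference expressed as a percentage of the first column.
    let percent = diff as f64 / first as f64 * 100.0;
    (diff, percent)
}
```

For trie_insertions_32_mir_1k, `row_diff(72_408_100, 92_209_137)` gives a diff of 19,801,037 and roughly 27.35%, matching the row above.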

@arkpar?

@arkpar
Collaborator

arkpar commented Aug 17, 2016

Ideally all the crates should enable MIR, but we'll have to wait until it hits the stable version of the compiler. It's really weird that bigint arithmetic is actually slower. This requires additional investigation.

With 1.12 we can also enable "lto" in Cargo.toml without crashing the compiler. This should also result in an overall performance boost (about 5% faster block import on my machine).
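
For reference, enabling LTO is usually a one-line change to the release profile. This is a minimal sketch that assumes the setting goes in the top-level Cargo.toml of the project being built; the thread does not say which manifest was edited.

```toml
# Assumed location: the Cargo.toml of the top-level (binary) crate.
[profile.release]
lto = true
```

Profile settings only take effect in the manifest of the package being built, so the same lines in a dependency's Cargo.toml would be ignored.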

@MagaTailor
Author

Right, I've been compiling with lto for at least 2 months but never benchmarked the effect until yesterday:

name                             util-gcc6-lto-old ns/iter  util-gcc6-nonlto-old ns/iter  diff ns/iter  diff %
bench_decode_nested_empty_lists                      3,716                         3,700           -16  -0.43%
bench_decode_u256_value                                901                           901             0   0.00%
bench_decode_u64_value                                 452                           452             0   0.00%
bench_stream_1000_empty_lists                       87,972                        87,543          -429  -0.49%
bench_stream_nested_empty_lists                      4,264                         4,259            -5  -0.12%
bench_stream_u256_value                              6,500                         6,503             3   0.05%
bench_stream_u64_value                               2,875                         2,874            -1  -0.03%
sha3x10000                                     186,930,893                   193,395,637     6,464,744   3.46%
trie_insertions_32_mir_1k                       85,721,192                    92,209,137     6,487,945   7.57%
trie_insertions_32_ran_1k                       85,223,189                    91,622,633     6,399,444   7.51%
trie_insertions_random_mid                      78,398,642                    84,110,081     5,711,439   7.29%
trie_insertions_six_high                        70,994,491                    76,251,927     5,257,436   7.41%
trie_insertions_six_low                        163,879,533                   175,460,013    11,580,480   7.07%
trie_insertions_six_mid                         97,783,676                   104,855,125     7,071,449   7.23%
triehash_insertions_32_mir_1k                   39,232,671                    41,735,688     2,503,017   6.38%
triehash_insertions_32_ran_1k                   38,866,268                    41,325,686     2,459,418   6.33%
triehash_insertions_random_mid                  20,716,343                    21,985,652     1,269,309   6.13%
triehash_insertions_six_high                    26,069,380                    27,670,791     1,601,411   6.14%
triehash_insertions_six_low                     38,494,066                    40,780,982     2,286,916   5.94%
triehash_insertions_six_mid                     29,556,304                    31,329,316     1,773,012   6.00%
u128_mul                                           656,704                       635,104       -21,600  -3.29%
u256_add                                           649,804                       628,904       -20,900  -3.22%
u256_full_mul                                   22,177,053                    21,379,947      -797,106  -3.59%
u256_mul                                         1,239,008                     1,203,308       -35,700  -2.88%
u256_sub                                           673,604                       628,804       -44,800  -6.65%
u512_add                                         1,048,307                     1,015,707       -32,600  -3.11%
u512_sub                                         1,100,107                     1,066,007       -34,100  -3.10%

@MagaTailor
Author

However, my question was about the current situation, where you have to use -Zorbit=off. Which crates would you recommend for -Zorbit compilation (even manually via cargo -p) to exploit the positive effect MIR has?

@gavofyork added the Z1-question 🙋‍♀️ label Aug 17, 2016
@arkpar
Collaborator

arkpar commented Aug 17, 2016

It is hard to say without benchmarking all of them. ethcore-util seems like an obvious candidate. ethcore might benefit too. The bigint regression seems to be caused by rust-lang/rust#35662. Once it is fixed, it should be safe to enable MIR for all crates.

@MagaTailor
Author

MagaTailor commented Aug 17, 2016

Doubt it, the bigint MIR problem was already present at the beginning of June, probably earlier (a9234c11e 2016-06-10) but it slipped under my radar :)

@rphmeier
Contributor

rphmeier commented Aug 18, 2016

As they're currently in the process of removing the old, AST-based rustc_trans backend, I'm not sure it's worthwhile to investigate selective combinations of crates to compile with and without it.

There are a few regressions in performance due to MIR codegen issues, but it's reasonable to expect those to improve and potentially surpass the previous trans backend in the coming months.

@MagaTailor
Author

Meanwhile, it should be possible to mix and match. However, as @arkpar noted, it might not be as straightforward as compiling a few crates with MIR enabled.

@MagaTailor
Author

@arkpar Also, BTW, due to the nature of the regression (ubiquitous int arithmetic), #[rustc_no_mir] is probably useless?

@arkpar
Collaborator

arkpar commented Aug 22, 2016

AFAIK #[rustc_no_mir] operates on a function level, so it can be used for now. But there's little sense in doing that since the AST-based backend will be removed soon anyway.

@MagaTailor
Author

MagaTailor commented Aug 22, 2016

Very well, I'll try spicing up a few functions locally.

#![feature(rustc_attrs)] needs to be added to the affected crates' attributes.
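
As a rough illustration of the function-level opt-out discussed above, the sketch below marks one arithmetic helper with the attribute. The function `add_u256`, its limb layout, and the Cargo feature name `no_mir` are all hypothetical and not taken from the bigint crate. The attribute is wrapped in `cfg_attr` so the code still builds when the feature is off; it only has an effect on a 2016-era nightly that still ships the AST-based backend.

```rust
// On such a nightly, the crate root would also need the gate mentioned
// in the thread:
//     #![feature(rustc_attrs)]
//
// `no_mir` is an assumed Cargo feature name; with the feature disabled
// the attribute is stripped and the function compiles as usual.
#[cfg_attr(feature = "no_mir", rustc_no_mir)]
pub fn add_u256(a: [u64; 4], b: [u64; 4]) -> ([u64; 4], bool) {
    let mut out = [0u64; 4];
    let mut carry = false;
    // Limbs are stored least-significant first (an illustrative choice).
    for i in 0..4 {
        // Add the two limbs, then fold in the carry from the previous limb.
        let (sum, c1) = a[i].overflowing_add(b[i]);
        let (sum, c2) = sum.overflowing_add(carry as u64);
        out[i] = sum;
        carry = c1 || c2;
    }
    // The final carry reports overflow past 256 bits.
    (out, carry)
}
```

Because the regression affects ordinary integer arithmetic, every hot function would need the same annotation, which is the practical limitation raised earlier in the thread.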
