Description
Our compile times and memory footprint regressed substantially between Rust 1.49.0 and 1.50.0.
This regression is a hard blocker that prevents us from upgrading our project from 1.49.0. As this is a severe issue for us, please let us know if there's anything we can do to help, such as providing more diagnostics or testing changes.
This may be related to other recent issues that describe similar behavior:
- Compiler using over 20GiB memory #82406
- High memory use on Rust 1.51.0 with thin LTO and debuginfo #83911
Code
This regression is observed on recent versions of the Linkerd proxy. The proxy is known to produce very large type signatures, so its builds disable debug symbols by default.
This regression is not obvious with other, smaller projects that we maintain, so I'm unable to provide a smaller repro.
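To show how nested generic types can grow, here is a rough, hypothetical sketch. It is not code from the proxy. When a combinator's type has to name two inner stacks, each added layer more than doubles the length of the printed type name. `Leaf`, `Either`, and the `S1`..`S4` aliases are made up for illustration. Rust does not guarantee the exact text returned by `std::any::type_name`, only that it describes the type.

```rust
use std::any::type_name;
use std::marker::PhantomData;

// Hypothetical leaf service type (illustration only, not a linkerd type).
#[allow(dead_code)]
struct Leaf;

// Hypothetical combinator that routes to one of two inner stacks, so its
// type has to name both of them.
#[allow(dead_code)]
struct Either<A, B>(PhantomData<(A, B)>);

// Each level nests two copies of the previous level.
type S1 = Either<Leaf, Leaf>;
type S2 = Either<S1, S1>;
type S3 = Either<S2, S2>;
type S4 = Either<S3, S3>;

/// Length in bytes of the type's name as reported by `type_name`.
fn name_len<T>() -> usize {
    type_name::<T>().len()
}

fn main() {
    let lens = [name_len::<S1>(), name_len::<S2>(), name_len::<S3>(), name_len::<S4>()];
    for (i, len) in lens.iter().enumerate() {
        println!("S{}: {} bytes", i + 1, len);
    }
}
```

The growth is exponential in depth, which helps explain why a deep middleware stack can produce types far larger than the source code suggests.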
Version it worked on
With Rust 1.49.0, the binary compiles in a little over two minutes, using a little over 1GB of memory:
cargo clean && /usr/bin/time -v cargo +1.49.0 build -p linkerd2-proxy
...
Finished dev [unoptimized] target(s) in 2m 12s
Command being timed: "cargo +1.49.0 build -p linkerd2-proxy"
User time (seconds): 326.00
System time (seconds): 14.55
Percent of CPU this job got: 256%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.78
...
Maximum resident set size (kbytes): 1308088
...
Exit status: 0
Version with regression
Using Rust 1.50.0, rustc runs for about 40 minutes before exhausting the system's memory:
cargo clean && /usr/bin/time -v cargo +1.50.0 build -p linkerd2-proxy
process didn't exit successfully: `rustc --crate-name linkerd2_proxy --edition=2018 linkerd2-proxy/src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="multicore"' --cfg 'feature="num_cpus"' -C metadata=c2caa01c24c285f7 -C extra-filename=-c2caa01c24c285f7 --out-dir /home/ver/b/linkerd2-proxy/target/debug/deps -C incremental=/home/ver/b/linkerd2-proxy/target/debug/incremental -L dependency=/home/ver/b/linkerd2-proxy/target/debug/deps --extern futures=/home/ver/b/linkerd2-proxy/target/debug/deps/libfutures-01f6240a9e4201d3.rlib --extern linkerd_app=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_app-dd18fcabeef913ae.rlib --extern linkerd_signal=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_signal-14263c92fcb0fc0b.rlib --extern num_cpus=/home/ver/b/linkerd2-proxy/target/debug/deps/libnum_cpus-74ee144d51bc047e.rlib --extern tokio=/home/ver/b/linkerd2-proxy/target/debug/deps/libtokio-de277ade05efd960.rlib --extern tracing=/home/ver/b/linkerd2-proxy/target/debug/deps/libtracing-03fd6826b817dabe.rlib -L native=/home/ver/b/linkerd2-proxy/target/debug/build/ring-a8b802a4aa425398/out` (signal: 9, SIGKILL: kill)
Command exited with non-zero status 101
Command being timed: "cargo +1.50.0 build -p linkerd2-proxy"
User time (seconds): 2986.18
System time (seconds): 101.73
Percent of CPU this job got: 133%
Elapsed (wall clock) time (h:mm:ss or m:ss): 38:29.28
...
Maximum resident set size (kbytes): 62953524
...
Exit status: 101
rustc --version --verbose:
rustc 1.50.0 (cb75ad5db 2021-02-10)
binary: rustc
commit-hash: cb75ad5db02783e8b0222fee363c5f63f7e2cf5b
commit-date: 2021-02-10
host: x86_64-unknown-linux-gnu
release: 1.50.0
We see similar behavior with more recent versions of Rust, including 1.51.0 and nightly (cargo 1.53.0-nightly (0ed318d18 2021-04-23)).
cc @hawkw, who can provide more details. I believe we've also tested with lto=off, without any change in behavior.
@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged
jyn514 commented on May 3, 2021
@olix0r can you paste the output of `cargo rustc -- -Z time-passes`? It should have some output before it hangs.
jyn514 commented on May 3, 2021
If you can come up with a smaller example than "linkerd" that would also be helpful, but it will be more difficult than just running time-passes.
olix0r commented on May 3, 2021
time-passes
olix0r commented on May 3, 2021
We're going to try to see if we can avoid this with boxing, which may help us identify a smaller repro, but this may take some time...
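Here is a minimal sketch of why boxing helps. To keep the example dependency-free, it uses a hypothetical `Service` trait in place of `Future`. Once a layer is wrapped in `Box<dyn Service>`, its concrete type no longer appears in the type of the layer above it. As a result, the type name at each level is the same however deep the stack is. Boxing a future as `Pin<Box<dyn Future<Output = T>>>` erases its concrete type in the same way. `Service`, `Leaf`, `Either`, `build_boxed`, and `name_of` are illustrative names, not linkerd APIs.

```rust
use std::any::type_name;

/// Hypothetical stand-in for a service (or future) trait.
trait Service {
    fn call(&self, req: u32) -> u32;
}

/// Hypothetical leaf service: returns the request plus one.
struct Leaf;

impl Service for Leaf {
    fn call(&self, req: u32) -> u32 {
        req + 1
    }
}

/// Hypothetical combinator: even requests go left, odd requests go right.
struct Either<A, B> {
    left: A,
    right: B,
}

impl<A: Service, B: Service> Service for Either<A, B> {
    fn call(&self, req: u32) -> u32 {
        if req % 2 == 0 {
            self.left.call(req)
        } else {
            self.right.call(req)
        }
    }
}

// Let a boxed trait object be used wherever a `Service` is expected.
impl Service for Box<dyn Service> {
    fn call(&self, req: u32) -> u32 {
        (**self).call(req)
    }
}

/// Builds a stack `depth` levels deep, boxing after every level so the
/// next level up only ever sees `Box<dyn Service>`.
fn build_boxed(depth: u32) -> Box<dyn Service> {
    if depth == 0 {
        return Box::new(Leaf);
    }
    Box::new(Either {
        left: build_boxed(depth - 1),
        right: build_boxed(depth - 1),
    })
}

/// Name of a value's static type.
fn name_of<T>(_: &T) -> &'static str {
    type_name::<T>()
}

fn main() {
    // Both values have type `Either<Box<dyn Service>, Box<dyn Service>>`,
    // even though `deep` wraps a much deeper stack.
    let shallow = Either { left: build_boxed(1), right: build_boxed(1) };
    let deep = Either { left: build_boxed(3), right: build_boxed(3) };
    println!("{}", name_of(&shallow));
    println!("{}", name_of(&deep));
    println!("{}", deep.call(2));
}
```

The trade-off is a heap allocation and a dynamic call per boxed layer. In return, the type of each layer no longer depends on everything nested inside it.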
jyn514 commented on May 3, 2021
Wow, that is a lot of memory in LLVM and a lot of time in `partition_and_assert_distinct_symbols`. I wouldn't expect that to be so expensive:
rust/compiler/rustc_mir/src/monomorphize/partitioning/mod.rs (lines 350 to 362 in 716394d)
Not sure who to ask about that - maybe @wesleywiser has ideas what's going on?
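For context, checking that symbol names are distinct is conceptually cheap: sort the items by symbol name, then compare neighbours. The hypothetical sketch below shows that shape. It is not rustc's code, and `MonoItem` and `find_symbol_collisions` are made-up names. Its cost is O(n log n) string comparisons, and each comparison can scan up to the full length of the names. Extremely long names, or expensive work to compute each name in the first place, are therefore one plausible way such a pass could become slow.

```rust
/// Hypothetical stand-in for a monomorphized item: an id plus the
/// symbol name it would be emitted under.
struct MonoItem {
    id: usize,
    symbol: String,
}

/// Returns the ids of every adjacent pair of items that share a symbol
/// name (a name used k times yields k - 1 pairs).
fn find_symbol_collisions(items: &[MonoItem]) -> Vec<(usize, usize)> {
    // Sort references by symbol name; `sort_by` is stable, so items with
    // equal names keep their original relative order.
    let mut sorted: Vec<&MonoItem> = items.iter().collect();
    sorted.sort_by(|a, b| a.symbol.cmp(&b.symbol));

    // After sorting, duplicates are neighbours, so one linear pass finds them.
    sorted
        .windows(2)
        .filter(|pair| pair[0].symbol == pair[1].symbol)
        .map(|pair| (pair[0].id, pair[1].id))
        .collect()
}

fn main() {
    let items = vec![
        MonoItem { id: 0, symbol: "crate::foo::<u32>".to_string() },
        MonoItem { id: 1, symbol: "crate::bar".to_string() },
        MonoItem { id: 2, symbol: "crate::foo::<u32>".to_string() },
    ];
    // Prints: items 0 and 2 share a symbol name
    for (a, b) in find_symbol_collisions(&items) {
        println!("items {} and {} share a symbol name", a, b);
    }
}
```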
Prepare for Rust 1.50+ by boxing large futures (#1003)
audunska commented on May 14, 2021
I reported a bug against tokio-rs/tracing, which turned out to be caused by this bug. Could my reproduction repo serve as a minimal reproducing example?
`-Zverbose` mode #86240
audunska commented on Aug 6, 2021
I got curious and tested my reproduction repo on nightly. Compile times returned to normal between the June 12 and June 13 nightlies, so #86240 seems to have fixed it for me.
apiraino commented on Aug 6, 2021
Thank you @audunska for the feedback! Now I'm curious: it would also be interesting to hear from @olix0r what compiling the linkerd-proxy codebase looks like now.
olix0r commented on Aug 9, 2021
This looks promising for us! We ended up adding `Box`es throughout the stack so that we could upgrade Rust, but we still see improvements on nightly when compiling the latest main (linkerd/linkerd2-proxy@d843eb1):
- cargo 1.54.0 (5ae8d74b3 2021-06-22)
- cargo 1.56.0-nightly (cc17afbb0 2021-08-02)
Nightly uses less than half the RSS of 1.54.0, and we shave 25% off the compile time as well!
pnkfelix commented on Nov 18, 2022
Visiting for P-high review
I'm happy to hear progress has been observed here.
I think the main task that remains is for someone on our team to go back and check how the newer versions of the compiler behave on the old versions of linkerd, prior to when they put all the extra boxes in to work around this issue.
lqd commented on Nov 19, 2022
It seems the progress has come from #86240 (for the other reproduction) and from linkerd's use of boxing.
The original issue still seems to be present when checking the old version of linkerd-proxy (bba24dcdeda40105615db8d33e3fa04f980cf128), and #86240 doesn't seem to have improved things the way it did in #84873 (comment).
However, the good news is that using v0 mangling on 1.65/nightly brings things back to 1.49 levels: around 2m15s wall time and 1.35GB max RSS.