Skip to content

Compile time+memory regression between 1.49.0 and 1.50.0 #84873

Open
@olix0r

Description

@olix0r

Our compile times and memory footprint regressed substantially between Rust 1.49.0 and 1.50.0.

This regression is a hard blocker for us to be able to upgrade our project from 1.49.0. As this is a severe issue for us, please let us know if there's anything we can do to provide more diagnostics, test changes, etc.

This may be related to other recent issues that describe similar behavior:

Code

This regression is observed on recent versions of the Linkerd proxy. It's known that the proxy can manifest large type signatures, so builds disable debug symbols by default

This regression is not obvious with other, smaller projects that we maintain, so I'm unable to provide a smaller repro.

Version it worked on

With Rust 1.49.0, the binary compiles in a little over two minutes, using a little over 1GB of memory:

cargo clean && /usr/bin/time -v cargo +1.49.0 build -p linkerd2-proxy

...
    Finished dev [unoptimized] target(s) in 2m 12s
        Command being timed: "cargo +1.49.0 build -p linkerd2-proxy"
        User time (seconds): 326.00
        System time (seconds): 14.55
        Percent of CPU this job got: 256%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:12.78
...
        Maximum resident set size (kbytes): 1308088
...
        Exit status: 0

Version with regression

Using Rust 1.50.0, rustc runs for about 40 minutes before exhausting the system's memory:

cargo clean && /usr/bin/time -v cargo +1.50.0 build -p linkerd2-proxy

  process didn't exit successfully: `rustc --crate-name linkerd2_proxy --edition=2018 linkerd2-proxy/src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no --cfg 'feature="default"' --cfg 'feature="multicore"' --cfg 'feature="num_cpus"' -C metadata=c2caa01c24c285f7 -C extra-filename=-c2caa01c24c285f7 --out-dir /home/ver/b/linkerd2-proxy/target/debug/deps -C incremental=/home/ver/b/linkerd2-proxy/target/debug/incremental -L dependency=/home/ver/b/linkerd2-proxy/target/debug/deps --extern futures=/home/ver/b/linkerd2-proxy/target/debug/deps/libfutures-01f6240a9e4201d3.rlib --extern linkerd_app=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_app-dd18fcabeef913ae.rlib --extern linkerd_signal=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_signal-14263c92fcb0fc0b.rlib --extern num_cpus=/home/ver/b/linkerd2-proxy/target/debug/deps/libnum_cpus-74ee144d51bc047e.rlib --extern tokio=/home/ver/b/linkerd2-proxy/target/debug/deps/libtokio-de277ade05efd960.rlib --extern tracing=/home/ver/b/linkerd2-proxy/target/debug/deps/libtracing-03fd6826b817dabe.rlib -L native=/home/ver/b/linkerd2-proxy/target/debug/build/ring-a8b802a4aa425398/out` (signal: 9, SIGKILL: kill)
Command exited with non-zero status 101
	Command being timed: "cargo +1.50.0 build -p linkerd2-proxy"
	User time (seconds): 2986.18
	System time (seconds): 101.73
	Percent of CPU this job got: 133%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 38:29.28
...
	Maximum resident set size (kbytes): 62953524
...
	Exit status: 101

rustc --version --verbose:

rustc 1.50.0 (cb75ad5db 2021-02-10)
binary: rustc
commit-hash: cb75ad5db02783e8b0222fee363c5f63f7e2cf5b
commit-date: 2021-02-10
host: x86_64-unknown-linux-gnu
release: 1.50.0

We see similar behavior with more recent versions of Rust as well, including 1.51.0 and nightly (cargo 1.53.0-nightly (0ed318d18 2021-04-23)) as well.

cc @hawkw, who can provide some more details. I believe we've tested with lto=off without any changes in behavior.

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged

Activity

added
C-bugCategory: This is a bug.
regression-untriagedUntriaged performance or correctness regression.
on May 3, 2021
added
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.
and removed
I-prioritizeIssue: Indicates that prioritization has been requested for this issue.
regression-untriagedUntriaged performance or correctness regression.
on May 3, 2021
jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

@olix0r can you paste the output of cargo rustc -- -Z time-passes? It should have some output before it hangs.

added
I-compilememIssue: Problems and improvements with respect to memory usage during compilation.
I-compiletimeIssue: Problems and improvements with respect to compile times.
on May 3, 2021
jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

If you can come up with a smaller example than "linkerd" that would also be helpful, but it will be more difficult than just running time-passes.

added
T-compilerRelevant to the compiler team, which will review and decide on the PR/issue.
on May 3, 2021
olix0r

olix0r commented on May 3, 2021

@olix0r
Author
error: could not compile `linkerd2-proxy`

Caused by:
  process didn't exit successfully: `rustc --crate-name linkerd2_proxy --edition=2018 linkerd2-proxy/src/main.rs --error-format=json --json=diagnostic-rendered-ansi --crate-type bin --emit=dep-info,link -C embed-bitcode=no -Z time-passes --cfg 'feature="default"' --cfg 'feature="multicore"' --cfg 'feature="num_cpus"' -C metadata=4b636c9abf0080e5 -C extra-filename=-4b636c9abf0080e5 --out-dir /home/ver/b/linkerd2-proxy/target/debug/deps -C incremental=/home/ver/b/linkerd2-proxy/target/debug/incremental -L dependency=/home/ver/b/linkerd2-proxy/target/debug/deps --extern futures=/home/ver/b/linkerd2-proxy/target/debug/deps/libfutures-ba905e7fd290ac0d.rlib --extern linkerd_app=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_app-413b736e1b748ce1.rlib --extern linkerd_signal=/home/ver/b/linkerd2-proxy/target/debug/deps/liblinkerd_signal-e33579c6c7162bd5.rlib --extern num_cpus=/home/ver/b/linkerd2-proxy/target/debug/deps/libnum_cpus-372d7745cb8ee5a2.rlib --extern tokio=/home/ver/b/linkerd2-proxy/target/debug/deps/libtokio-6439a3cdc075b537.rlib --extern tracing=/home/ver/b/linkerd2-proxy/target/debug/deps/libtracing-9cda1e0af1393c64.rlib -L native=/home/ver/b/linkerd2-proxy/target/debug/build/ring-de36c98917ce89f3/out` (signal: 9, SIGKILL: kill)
Command exited with non-zero status 101
	Command being timed: "cargo +nightly rustc -- -Z time-passes"
	User time (seconds): 1210.51
	System time (seconds): 102.65
	Percent of CPU this job got: 128%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 16:58.61
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 62990612
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 5127
	Minor (reclaiming a frame) page faults: 99760499
	Voluntary context switches: 38121
	Involuntary context switches: 30180
	Swaps: 0
	File system inputs: 595200
	File system outputs: 2482104
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 101
time-passes
time:   0.000; rss:   55MB ->   55MB (   +0MB)	parse_crate
time:   0.000; rss:   55MB ->   55MB (   +0MB)	attributes_injection
time:   0.000; rss:   55MB ->   55MB (   +0MB)	incr_comp_prepare_session_directory
time:   0.000; rss:   55MB ->   55MB (   +0MB)	incr_comp_garbage_collect_session_directories
time:   0.000; rss:   55MB ->   58MB (   +2MB)	recursion_limit
time:   0.000; rss:   58MB ->   58MB (   +0MB)	plugin_loading
time:   0.000; rss:   58MB ->   58MB (   +0MB)	plugin_registration
time:   0.000; rss:   58MB ->   58MB (   +0MB)	crate_injection
time:   0.063; rss:   58MB ->  139MB (  +81MB)	expand_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	check_unused_macros
time:   0.063; rss:   58MB ->  139MB (  +81MB)	macro_expand_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	maybe_building_test_harness
time:   0.000; rss:  139MB ->  139MB (   +0MB)	AST_validation
time:   0.000; rss:  139MB ->  139MB (   +0MB)	maybe_create_a_macro_crate
time:   0.000; rss:  139MB ->  139MB (   +0MB)	finalize_imports
time:   0.000; rss:  139MB ->  139MB (   +0MB)	finalize_macro_resolutions
time:   0.002; rss:  139MB ->  142MB (   +3MB)	late_resolve_crate
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_check_unused
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_report_errors
time:   0.000; rss:  142MB ->  142MB (   +0MB)	resolve_postprocess
time:   0.003; rss:  139MB ->  142MB (   +3MB)	resolve_crate
time:   0.000; rss:  142MB ->  142MB (   +0MB)	complete_gated_feature_checking
time:   0.066; rss:   58MB ->  142MB (  +84MB)	configure_and_expand
time:   0.000; rss:  142MB ->  142MB (   +0MB)	prepare_outputs
time:   0.000; rss:  142MB ->  142MB (   +0MB)	blocked_on_dep_graph_loading
time:   0.002; rss:  142MB ->  144MB (   +3MB)	hir_lowering
time:   0.000; rss:  144MB ->  144MB (   +0MB)	early_lint_checks
time:   0.000; rss:  144MB ->  144MB (   +0MB)	setup_global_ctxt
time:   0.000; rss:  144MB ->  144MB (   +0MB)	create_global_ctxt
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_entry_point
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_plugin_registrar
time:   0.000; rss:  148MB ->  148MB (   +0MB)	looking_for_derive_registrar
time:   0.002; rss:  148MB ->  151MB (   +3MB)	misc_checking_1
time:   0.001; rss:  151MB ->  155MB (   +4MB)	type_collecting
time:   0.000; rss:  155MB ->  155MB (   +0MB)	impl_wf_inference
time:   0.000; rss:  155MB ->  155MB (   +0MB)	unsafety_checking
time:   0.000; rss:  155MB ->  155MB (   +0MB)	orphan_checking
time:   0.000; rss:  155MB ->  155MB (   +0MB)	coherence_checking
time:   0.002; rss:  155MB ->  159MB (   +4MB)	wf_checking
time:   0.026; rss:  159MB ->  197MB (  +38MB)	item_types_checking
time:   0.074; rss:  197MB ->  228MB (  +31MB)	item_bodies_checking
time:   0.102; rss:  151MB ->  228MB (  +77MB)	type_check_crate
time:   0.002; rss:  228MB ->  229MB (   +1MB)	match_checking
time:   0.001; rss:  229MB ->  230MB (   +1MB)	liveness_and_intrinsic_checking
time:   0.004; rss:  228MB ->  230MB (   +2MB)	misc_checking_2
time:   0.052; rss:  230MB ->  248MB (  +18MB)	MIR_borrow_checking
time:   0.001; rss:  248MB ->  248MB (   +0MB)	MIR_effect_checking
time:   0.000; rss:  248MB ->  248MB (   +0MB)	layout_testing
time:   0.000; rss:  248MB ->  248MB (   +0MB)	death_checking
time:   0.000; rss:  248MB ->  248MB (   +0MB)	unused_lib_feature_checking
time:   0.005; rss:  248MB ->  248MB (   +0MB)	crate_lints
time:   0.001; rss:  248MB ->  248MB (   +0MB)	module_lints
time:   0.006; rss:  248MB ->  248MB (   +0MB)	lint_checking
time:   0.002; rss:  248MB ->  249MB (   +1MB)	privacy_checking_modules
time:   0.009; rss:  248MB ->  249MB (   +1MB)	misc_checking_3
time:   0.000; rss:  249MB ->  249MB (   +0MB)	monomorphization_collector_root_collections
time:  63.851; rss:  249MB ->  878MB ( +629MB)	monomorphization_collector_graph_walk
time: 385.658; rss:  878MB -> 34473MB (+33594MB)	partition_and_assert_distinct_symbols
time:   0.000; rss: 34462MB -> 34462MB (   +0MB)	write_allocator_module
time:   0.000; rss: 34462MB -> 34462MB (   +0MB)	find_cgu_reuse
time:   0.000; rss: 34475MB -> 34475MB (   +0MB)	LLVM_module_optimize_function_passes(23w1nf295xb6m3dz)
time:   0.000; rss: 34481MB -> 34481MB (   +0MB)	LLVM_module_optimize_function_passes(3wiubnia116xo13k)
time:   0.000; rss: 34481MB -> 34481MB (   +0MB)	LLVM_module_optimize_module_passes(3wiubnia116xo13k)
time:   0.027; rss: 34475MB -> 34485MB (  +10MB)	LLVM_module_optimize_module_passes(23w1nf295xb6m3dz)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_function_passes(6nf2a3p991xv6ss)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_function_passes(qnwxkbovt0otq4i)
time:   0.000; rss: 35405MB -> 35405MB (   +0MB)	LLVM_module_optimize_module_passes(qnwxkbovt0otq4i)
time:   0.028; rss: 35405MB -> 35411MB (   +6MB)	LLVM_module_optimize_module_passes(6nf2a3p991xv6ss)
time:   0.000; rss: 35420MB -> 35420MB (   +0MB)	LLVM_module_optimize_function_passes(3l9ye29vj5xd30gg)
time:   0.022; rss: 35420MB -> 35433MB (  +13MB)	LLVM_module_optimize_module_passes(3l9ye29vj5xd30gg)
time:   0.000; rss: 45480MB -> 45480MB (   +0MB)	LLVM_module_optimize_function_passes(47zf6aqohnqlwwkn)
time:   0.000; rss: 45480MB -> 45480MB (   +0MB)	LLVM_module_optimize_module_passes(47zf6aqohnqlwwkn)
time:   0.001; rss: 61521MB -> 61522MB (   +0MB)	LLVM_module_optimize_function_passes(2shzcrj12b6yuzi2)
time:   0.027; rss: 61522MB -> 61529MB (   +7MB)	LLVM_module_optimize_module_passes(2shzcrj12b6yuzi2)
time:   0.000; rss: 62206MB -> 62206MB (   +0MB)	LLVM_module_optimize_function_passes(22god1f9uaf09skc)
time:   0.000; rss: 62206MB -> 62206MB (   +0MB)	LLVM_module_optimize_module_passes(22god1f9uaf09skc)
time:   0.000; rss: 62231MB -> 62231MB (   +0MB)	LLVM_module_optimize_function_passes(4k1c83m91mx8hb0j)
time:   0.000; rss: 62246MB -> 62246MB (   +0MB)	LLVM_module_optimize_function_passes(5p5qsa0r7t1m74p)
time:   0.000; rss: 62246MB -> 62247MB (   +1MB)	LLVM_module_optimize_module_passes(5p5qsa0r7t1m74p)
time:   0.017; rss: 62231MB -> 62248MB (  +17MB)	LLVM_module_optimize_module_passes(4k1c83m91mx8hb0j)
time:   0.001; rss: 62089MB -> 62089MB (   +0MB)	LLVM_module_optimize_function_passes(47yf53g8fyfg1095)
time:   0.029; rss: 62090MB -> 62101MB (  +11MB)	LLVM_module_optimize_module_passes(47yf53g8fyfg1095)
time:   0.000; rss: 62109MB -> 62109MB (   +0MB)	LLVM_module_optimize_function_passes(3io72qv49rwp317)
time:   0.000; rss: 62109MB -> 62109MB (   +0MB)	LLVM_module_optimize_module_passes(3io72qv49rwp317)
time:   0.000; rss: 62547MB -> 62547MB (   +1MB)	LLVM_module_optimize_function_passes(1s1n98zkzp6om5cq)
time:   0.004; rss: 62547MB -> 62558MB (  +11MB)	LLVM_module_optimize_module_passes(1s1n98zkzp6om5cq)
time:   0.000; rss: 62640MB -> 62640MB (   +0MB)	LLVM_module_optimize_function_passes(3qs0oaz67heipbqh)
time:   0.000; rss: 62640MB -> 62640MB (   +0MB)	LLVM_module_optimize_module_passes(3qs0oaz67heipbqh)
olix0r

olix0r commented on May 3, 2021

@olix0r
Author

We're going to try to see if we can avoid this with boxing, which may help us identify a smaller repro, but this may take some time...

jyn514

jyn514 commented on May 3, 2021

@jyn514
Member

Wow, that is a lot of memory in LLVM and a lot of time in partition_and_assert_distinct_symbols. I wouldn't expect that to be so expensive:

let (codegen_units, _) = tcx.sess.time("partition_and_assert_distinct_symbols", || {
sync::join(
|| {
&*tcx.arena.alloc_from_iter(partition(
tcx,
&mut items.iter().cloned(),
tcx.sess.codegen_units(),
&inlining_map,
))
},
|| assert_symbols_are_distinct(tcx, items.iter()),
)
});

Not sure who to ask about that - maybe @wesleywiser has ideas what's going on?

13 remaining items

added a commit that references this issue on May 12, 2021
audunska

audunska commented on May 14, 2021

@audunska

I reported a bug against tokio-rs/tracing, which turned out to be because of this bug. My reproducing repo could maybe be helpful as a minimal reproducing example?

audunska

audunska commented on Aug 6, 2021

@audunska

I got curious and tested my reproducing repo on nightly, and compile times turned normal again between June 12 and 13. So #86240 seems to have fixed it for me.

apiraino

apiraino commented on Aug 6, 2021

@apiraino
Contributor

thank you @audunska for the feedback! Now I'm curious, it would be interesting to hear also from @olix0r how compiling the linkerd-proxy codebase looks like now

olix0r

olix0r commented on Aug 9, 2021

@olix0r
Author

This looks promising for us! We ended up adding Boxes throughout the stack so we could upgrade Rust, but we still see improvements on nightly compiling from the latest main (linkerd/linkerd2-proxy@d843eb1):

cargo 1.54.0 (5ae8d74b3 2021-06-22)

Finished release [optimized] target(s) in 4m 01s
    Command being timed: "cargo build --release"
    User time (seconds): 1520.92
    System time (seconds): 34.53
    Percent of CPU this job got: 643%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 4:01.74
    Maximum resident set size (kbytes): 11131808

cargo 1.56.0-nightly (cc17afbb0 2021-08-02)

Finished release [optimized] target(s) in 3m 01s
    Command being timed: "cargo +nightly build --release"
    User time (seconds): 1144.12
    System time (seconds): 23.53
    Percent of CPU this job got: 644%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 3:01.11
    Maximum resident set size (kbytes): 4557984

Nightly uses less than half of the RSS of 1.54.0 and we shave 25% off of the compile time as well!

pnkfelix

pnkfelix commented on Nov 18, 2022

@pnkfelix
Member

Visiting for P-high review

I'm happy to hear progress has been observed here.

I think the main task that remains is for someone on our team to go back and check how the newer versions of the compiler behave on the old versions of linkerd, prior to when they put all the extra boxes in to work around this issue.

self-assigned this
on Nov 18, 2022
lqd

lqd commented on Nov 19, 2022

@lqd
Member

It seems the progress has been for the other reproduction thanks to #86240, and by the use of boxing in linkerd.

The original issue looks to be still present when checking the old version of linkerd-proxy (bba24dcdeda40105615db8d33e3fa04f980cf128) and #86240 doesn't seem to have improved things like it did on #84873 (comment).

  • on 1.49.0: similar results to the OP (2m11s wall-time, 1.2GB max-rss)
  • on 1.65.0 and nightly-2022-11-18: OOMs after using all of the system's memory. (If you want to reproduce this on recent compilers, note that warnings are denied in this linkerd-proxy revision)

However, the good news is that using v0 mangling on 1.65/nightly brings things back to 1.49 levels: around 2m15s wall-time, 1.35GB max-rss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

Labels

C-bugCategory: This is a bug.I-compilememIssue: Problems and improvements with respect to memory usage during compilation.I-compiletimeIssue: Problems and improvements with respect to compile times.P-highHigh priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

    Development

    No branches or pull requests

      Participants

      @pnkfelix@olix0r@lqd@wesleywiser@audunska

      Issue actions

        Compile time+memory regression between 1.49.0 and 1.50.0 · Issue #84873 · rust-lang/rust