Description
tl;dr -- I believe Cargo `-jN` limits you to N concurrent codegen workers, while x.py `-jN` limits you to N * number of CPUs concurrent codegen workers, which is rough for memory usage during bootstrap. E.g. on an 8 CPU system, `-j4` would not limit you to 4, but to 32 concurrent codegen workers.
With Cargo, my understanding is that `-j4` limits you not only to compiling at most four crates at once, but also to running at most four of certain types of jobs within rustc at once, across all rustc instances under that Cargo invocation (maybe even across multiple concurrent Cargo invocations?). This applies to jobs whose concurrency is dictated by the jobserver mechanism, like codegen.
(Actually, I'm not sure that's 100% correct. It's easy to see that `-j` limits the maximum number of codegen workers active within an individual rustc instance, but I don't know how to verify whether it limits them across all instances. From what I've seen, the number of rustc threads in general isn't limited by this.)
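The jobserver-style limit described above can be sketched with a minimal Python model (purely conceptual; the real mechanism is the GNU make jobserver protocol over a pipe, shared between Cargo and the rustc processes, not an in-process semaphore): N tokens are shared by all would-be workers, so peak concurrency never exceeds N no matter how many workers get spawned.

```python
import threading
import time

# Conceptual sketch, not rustc's implementation: a jobserver-style token
# pool. With -jN there are N tokens, and every codegen worker must hold
# a token while it runs, so at most N workers are ever active at once.
def run_workers(num_workers, tokens):
    sem = threading.Semaphore(tokens)  # the shared token pool
    active = 0
    peak = 0
    lock = threading.Lock()

    def worker():
        nonlocal active, peak
        with sem:  # acquire a token; blocks until one is free
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.01)  # stand-in for optimizing one CGU
            with lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak

# 32 would-be workers (4 crates x 8 CPUs each) but only 4 tokens:
print(run_workers(32, 4))  # peak concurrency never exceeds 4
```

If x.py's jobs shared one such pool the way Cargo's do, the 8-CPU example above would stay at 4 concurrent workers rather than 32.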
But with x.py, `-j4` only seems to limit you to compiling at most four crates at once. I can see that each rustc instance is still able to have up to 8 codegen workers simultaneously, including the main thread (probably 8 because my system has 8 CPUs). But I'm not sure whether all of the rustc instances are globally limited to 8 concurrent jobs, or whether they're individually limited to 8, making the effective global limit 32 concurrent jobs.
I expect that changing this to limit concurrency the way Cargo does would significantly reduce memory usage during bootstrap. It would also reduce the amount of concurrency, unless you increase the `-j` setting, but it's possible that the extra concurrency was overkill.
Activity
Mark-Simulacrum commented on Feb 10, 2021
I'm not sure if there's anything we can do - to my knowledge, x.py isn't doing anything interesting here; we just pass `-j` down to Cargo.
If you can provide instructions to observe or detect this somehow, then there may be more we can do. It's worth noting that I expect each rustc to start roughly as many threads as it has codegen units; it's just that they should all immediately stop and wait on a jobserver token. I might not be remembering this piece right, though.
the8472 commented on Feb 10, 2021
According to this related Zulip discussion, rustc could perhaps use a threadpool that only ramps up on demand. Then, if compilers are starved for job tokens, they won't need to start additional threads.
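The ramp-up-on-demand idea can be sketched like this (a toy Python model, not a proposed rustc patch; `LazyPool` and its cap are hypothetical names for illustration): the pool only spawns a thread when work is submitted and the cap hasn't been reached, so a starved compiler never creates threads it can't use.

```python
import threading
import queue

# Illustrative sketch: a pool that spawns threads lazily, up to a cap,
# instead of starting one thread per CGU up front and having most of
# them immediately block waiting for a jobserver token.
class LazyPool:
    def __init__(self, max_threads):
        self.max_threads = max_threads
        self.tasks = queue.Queue()
        self.threads = []
        self.lock = threading.Lock()

    def submit(self, fn):
        self.tasks.put(fn)
        with self.lock:
            # Only ramp up while under the cap; otherwise reuse
            # the threads that already exist.
            if len(self.threads) < self.max_threads:
                t = threading.Thread(target=self._worker, daemon=True)
                self.threads.append(t)
                t.start()

    def _worker(self):
        while True:
            fn = self.tasks.get()
            fn()
            self.tasks.task_done()

pool = LazyPool(max_threads=4)
results = []
for i in range(10):
    pool.submit(lambda i=i: results.append(i * i))
pool.tasks.join()  # wait for all tasks to finish
print(len(pool.threads), sorted(results))  # only 4 threads ever spawned
```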
Is it the threads themselves that are consuming memory? Have you measured the actual task utilization or only the number of threads spawned?
Tangentially, I have found that if lld is used as the linker, it also runs multi-threaded by default even if rustc itself is told to use only 1 thread. See #81942. That's probably not a problem during bootstrap, though, since it won't do as much linking as the UI tests.
Mark-Simulacrum commented on Feb 10, 2021
Yeah, in theory this is what we want, but it seems pretty clearly out of scope for this issue IMO; it's a hard challenge to get right, particularly given the blocking nature of the token acquisition we currently have to deal with.
the8472 commented on Feb 10, 2021
Results from profiling `./x.py -j1 test library/core/` and looking at the thread lanes during stage0 std and compiler artifact building:

- rustc itself runs sequentially, as expected.
- Only linking with lld utilizes all cores, but that doesn't happen for most dependencies, and it doesn't apply if you're using a different linker.

tgnottingham commented on Feb 10, 2021
It's that there are more LLVM modules in memory at once. The main thread tries to codegen CGUs to unoptimized LLVM modules a bit ahead of time to anticipate the needs of the workers (who run optimization passes on the modules). The way everything works out, the more workers that exist at once, the more LLVM modules can exist at once, between those being worked on, and those that are queued up ahead of time by the main thread. (This area could use a lot of improvement, by the way, but that's a different issue.)
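A toy model of that memory behavior (illustrative Python, not rustc's actual pipeline; the `lookahead` parameter is a made-up stand-in for how far ahead the main thread pre-codegens): peak live modules is bounded by the lookahead queue plus one module per worker, so more workers directly means more modules in memory at once.

```python
import queue
import threading
import time

# Toy model: a "main thread" codegens CGUs into modules and queues them
# ahead of the optimization workers. Live modules = queued modules plus
# the ones each worker is currently optimizing, so peak memory scales
# with worker count.
def simulate(num_cgus, num_workers, lookahead):
    q = queue.Queue(maxsize=lookahead)
    live = 0   # modules currently in memory
    peak = 0
    lock = threading.Lock()

    def main_thread():
        nonlocal live, peak
        for cgu in range(num_cgus):
            with lock:
                live += 1              # a new unoptimized module exists
                peak = max(peak, live)
            q.put(cgu)                 # blocks once the lookahead is full
        for _ in range(num_workers):
            q.put(None)                # sentinels: no more work

    def optimizer():
        nonlocal live
        while True:
            module = q.get()
            if module is None:
                return
            time.sleep(0.001)          # stand-in for LLVM optimization
            with lock:
                live -= 1              # module dropped after optimization

    producer = threading.Thread(target=main_thread)
    workers = [threading.Thread(target=optimizer) for _ in range(num_workers)]
    producer.start()
    for w in workers:
        w.start()
    producer.join()
    for w in workers:
        w.join()
    return peak

# More workers (and lookahead) -> more modules live at once.
print(simulate(100, 4, 4), simulate(100, 8, 8))
```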
I'll try to come up with a way to demonstrate it, or convince myself that I was out of my mind when I first tested it.
@the8472 Can you tell me how you profiled that? Or is that just using `-Z self-profile` + crox?

the8472 commented on Feb 10, 2021
That's `perf record` + https://github.com/KDAB/hotspot

tgnottingham commented on Feb 10, 2021
Okay, nothing to see here. :)
The build of the `bootstrap` binary itself doesn't use the `-j` flag, and I was getting my information from that stage of the build. The stages after that do respect the `-j` flag.

Thanks folks. Closing.