
LLVM_ENABLE_RUNTIMES=flang-rt for flang-runtime-cuda-* #393

Merged
merged 1 commit into llvm:main from flang_runtime_flang-runtime-cuda
Mar 6, 2025

Conversation

Meinersbur
Member

@Meinersbur Meinersbur commented Feb 25, 2025

Add depends_on_projects=['flang-rt'] to the flang-runtime-cuda-gcc and flang-runtime-cuda-clang builders. This prepares for the removal of the "projects" build of the flang runtime in llvm/llvm-project#124126.

Split off from #333
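The change itself is small. In llvm-zorg, builders are declared as Python dictionaries; the sketch below illustrates the shape of the updated entry (the builder name is real, but the field names and surrounding structure are illustrative, not the actual zorg code):

```python
# Hypothetical sketch of a builder entry in llvm-zorg's buildbot
# configuration; field names are illustrative placeholders.
builder = {
    "name": "flang-runtime-cuda-gcc",
    "tags": ["flang", "runtimes"],
    # 'flang-rt' is added so changes to the runtime sources trigger
    # this builder once the "projects" build of the runtime is removed.
    "depends_on_projects": ["llvm", "clang", "flang", "flang-rt"],
    # The runtime is now built via the top-level runtimes/CMakeLists.txt
    # instead of flang/runtime/CMakeLists.txt.
    "enable_runtimes": ["flang-rt"],
}
```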

Affected builders:

  • flang-runtime-cuda-gcc
    This previously built only the runtime, using flang/runtime/CMakeLists.txt as the top-level CMakeLists.txt. This is going to be replaced with the "standalone runtimes build", which uses runtimes/CMakeLists.txt as its top level. It still needs Flang to build successfully first, hence it is replaced with a bootstrap build in which the FLANG_RT_* options are forwarded internally to the runtimes build.
  • flang-runtime-cuda-clang
    This is a manual bootstrapping build which first compiles Clang, then the runtime out-of-tree. It is replaced with a standalone runtimes build as described above. Because it needs Flang, Flang is also added to the enabled projects of the stage-1 build.

Neither build runs the check-* targets, probably because the runtime unit tests require actual CUDA hardware, which the builders lack.
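For context, the "standalone runtimes build" described above is driven by the LLVM_ENABLE_RUNTIMES mechanism, which forwards FLANG_RT_*-prefixed cache entries to the runtimes sub-build. The sketch below assembles such a configure command line as a Python list; the paths and the FLANG_RT_EXAMPLE_OPTION entry are placeholders, not the builders' actual options:

```python
# Sketch (not the builder's actual script): assemble a CMake command
# line for a bootstrap build in which LLVM_ENABLE_RUNTIMES=flang-rt
# forwards FLANG_RT_* options to the runtimes sub-build.
def cmake_configure_args(source_dir, build_dir, cache_entries):
    """Return a cmake argv list for a Ninja configure step."""
    args = ["cmake", "-G", "Ninja", "-S", source_dir, "-B", build_dir]
    args += [f"-D{key}={value}" for key, value in cache_entries.items()]
    return args

args = cmake_configure_args(
    "llvm-project/llvm",
    "build",
    {
        "LLVM_ENABLE_PROJECTS": "clang;flang",  # flang itself is needed first
        "LLVM_ENABLE_RUNTIMES": "flang-rt",     # standalone runtimes build
        # FLANG_RT_* entries are forwarded to the runtimes sub-build;
        # this exact option name is a hypothetical placeholder.
        "FLANG_RT_EXAMPLE_OPTION": "ON",
    },
)
```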

Affected workers:

  • as-builder-7

Admins listed for those workers:

@Meinersbur Meinersbur force-pushed the flang_runtime_flang-runtime-cuda branch from c424475 to 0097624 Compare February 26, 2025 08:34
@Meinersbur Meinersbur marked this pull request as ready for review February 26, 2025 08:34
@vzakhari
Contributor

Thank you, Michael! For the flang-runtime-cuda-gcc build, does it imply that flang will always be rebuilt? If yes, then we will need to decide what to do about the increased build time. I think there is some limit, but I do not know for sure.

@Meinersbur
Member Author

Meinersbur commented Feb 26, 2025

Thank you, Michael! For the flang-runtime-cuda-gcc build, does it imply that flang will always be rebuilt?

Yes, but it uses ccache, so rebuilds usually don't take the maximum time.

If yes, then we will need to decide what to do about the increased build time. I think there is some limit, but I do not know for sure.

LLVM's Buildbot is configured without an absolute timeout. There are much slower workers on labs.llvm.org, taking up to 16 hours to build.

If as-builder-7 becomes too slow¹, I can combine flang-runtime-cuda-gcc and flang-runtime-cuda-clang so that Flang is built only once. For the Polly builders, I tricked them into using the same ccache cache for all builders.

Footnotes

  1. as-builder-7 is also building llvm-nvptx-nvidia-ubuntu and llvm-nvptx64-nvidia-ubuntu.
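Sharing one ccache cache between builders is usually just a matter of pointing CCACHE_DIR at a common location in each builder's environment. A minimal sketch, assuming a hypothetical shared directory path:

```python
import os

# Point every builder at one shared ccache directory (the path here is
# hypothetical) so a Flang object compiled by one builder can be
# reused by the others.
SHARED_CCACHE_DIR = "/var/cache/ccache-shared"

def builder_environment(base_env=None):
    """Return a build-step environment that uses the shared ccache."""
    env = dict(base_env if base_env is not None else os.environ)
    env["CCACHE_DIR"] = SHARED_CCACHE_DIR
    return env

env = builder_environment({"PATH": "/usr/bin"})
```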

@vzakhari
Contributor

Thanks for the explanation, Michael! It looks good to me, but I am not the one to approve it.

@Meinersbur Meinersbur requested a review from vvereschaka March 4, 2025 17:01
Contributor

@vvereschaka vvereschaka left a comment


LGTM

@Meinersbur Meinersbur merged commit 17c126f into llvm:main Mar 6, 2025
2 checks passed
Meinersbur added a commit to llvm/llvm-project that referenced this pull request Mar 26, 2025
The production buildbot master apparently has not yet been restarted
since llvm/llvm-zorg#393 landed.

This reverts commit 96d1bae.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Mar 26, 2025
The production buildbot master apparently has not yet been restarted
since llvm/llvm-zorg#393 landed.

This reverts commit 96d1bae.
@Meinersbur
Member Author

@gkistanova Landing llvm/llvm-project#124126 depends on this configuration update becoming active. Could you restart https://lab.llvm.org/buildbot? The last restart was Feb 12.

@Meinersbur
Member Author

@gkistanova After the buildbot master restart, as-builder-7 behaves strangely. Could you have a look?

@gkistanova
Contributor

I do not think anything is wrong with the worker.

We are still investigating, but it seems the failures are related to recent changes to flang-rt.

@Meinersbur
Member Author

Meinersbur commented Apr 2, 2025

It is reporting connection failures and progress timeouts ("command timed out: 1200 seconds without output running") during the build-flang-rt/build-flang-default step, before even building flang-rt. The only explanation I have is that, without the flang-rt targets being built in between, more of the memory-intensive flang compilation jobs run concurrently (from the up to two concurrent builds the worker is configured for), leading to swapping and an eventual denial of service.

@vvereschaka
Contributor

I tried to figure out the source of that situation, and all I found is that the flang-related components have become extremely resource-hungry during the build. Currently both flang-runtime-cuda-* builders have a 99% chance of freezing the build host, even when started alone without concurrency with other builds.
A single build with 64 threads instead of 128 prevents the host from freezing, but the host also becomes underutilized with that configuration. It would be good to reduce the resource (memory?) consumption when building the flang parts.

Here are the build timings for the gcc build (16 threads):
gcc-build-tracing-core.json
gcc-build-tracing-runtime.json
(to see the graph, open chrome://tracing/ in Chrome and load the JSON file)
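Besides chrome://tracing, these files can be inspected directly: they are in the Chrome trace-event format, a JSON list of events whose "dur" field holds the duration in microseconds. A small sketch that ranks compile steps by duration; the sample events below are synthetic, not taken from the attached files:

```python
import json

# Synthetic sample in Chrome trace-event format; the real inputs would
# be gcc-build-tracing-core.json / gcc-build-tracing-runtime.json.
sample = json.loads("""
{"traceEvents": [
  {"name": "flang/lib/Lower/Bridge.cpp", "ph": "X", "ts": 0, "dur": 900000000},
  {"name": "llvm/lib/IR/Core.cpp",       "ph": "X", "ts": 0, "dur": 120000000}
]}
""")

def slowest_compiles(trace, top=10):
    """Return (seconds, name) pairs for the longest 'complete' (ph=X) events."""
    events = [e for e in trace["traceEvents"] if e.get("ph") == "X"]
    ranked = sorted(events, key=lambda e: e["dur"], reverse=True)
    return [(e["dur"] / 1e6, e["name"]) for e in ranked[:top]]

ranking = slowest_compiles(sample)
```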

The most time consuming files are flang related as far as I noticed.

[image: build-time chart of the most time-consuming compilation steps]

I have updated the builder configurations to use fewer threads and only one concurrent build: #424. It should help for now, but I hope we will be able to get back to at least two concurrent builds on the worker later.

@Meinersbur
Member Author

Meinersbur commented Apr 3, 2025

Compiling flang taking an unusual amount of memory is a known issue and will not change without moving away from its template-centric architecture. llvm/llvm-project#127364 introduced a way to limit the number of flang compile jobs specifically, but note that those jobs will then no longer fall under the LLVM_PARALLEL_COMPILE_JOBS limit.
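The general idea behind such a job limit is to cap compile parallelism by available RAM rather than by core count, so memory-hungry translation units cannot oversubscribe the host. A hedged sketch of that heuristic; the GiB-per-job figures are assumptions for illustration, not measured values for flang:

```python
import os

def max_parallel_jobs(total_mem_gib, mem_per_job_gib=4.0, cores=None):
    """Cap parallelism by whichever of cores or memory runs out first."""
    cores = cores if cores is not None else os.cpu_count() or 1
    by_memory = max(1, int(total_mem_gib // mem_per_job_gib))
    return min(cores, by_memory)

# A 128-thread host with 256 GiB RAM and memory-hungry compile jobs
# (assumed 8 GiB each) should run at most 32 such jobs at once.
jobs = max_parallel_jobs(256, mem_per_job_gib=8.0, cores=128)
```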

@vvereschaka
Contributor

Compiling flang taking an unusual amount of memory is a known issue and will not change without moving away from its template-centric architecture

Oh, I see. OK, I'll play with FLANG_PARALLEL_COMPILE_JOBS, thank you for the pointer. It may help load the build host more optimally.
