Add AMDGCN option similar to cuda-compute-capabilities #4860

Open: Thyre wants to merge 5 commits into develop from support-passing-amdgcn

@Thyre (Contributor) commented Apr 25, 2025

Summary

This PR implements an option for AMD GPUs that is similar to cuda-compute-capabilities (and its related options).
The option can then replace the manual handling done in some EasyBlocks, e.g. Clang & LLVM, making it possible to enable (some) GPU builds without altering the EasyConfig.

Most of the handling was copied from CUDA, while some options were skipped as they don't make much sense, e.g. cuda_cc_space_sep_no_period.
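
As a rough illustration of why a "no period" variant has nothing to do here: CUDA compute capabilities such as 8.0 contain a period that some build systems want stripped, whereas gfx names never contain one. The sketch below derives separator variants from a list of targets. The helper and the template names are hypothetical and only mirror the CUDA naming pattern; they are not taken from this PR.

```python
# Hypothetical helper: the function and the template names are illustrative
# assumptions that mirror the CUDA naming pattern, not names from this PR.
def amdgcn_templates(targets):
    """Build separator variants for a list of AMDGCN targets."""
    return {
        'amdgcn_cc_space_sep': ' '.join(targets),
        'amdgcn_cc_semicolon_sep': ';'.join(targets),
        'amdgcn_cc_comma_sep': ','.join(targets),
    }


templates = amdgcn_templates(['gfx90a', 'gfx1100'])
# A semicolon-separated value is the usual shape for CMake list variables.
print(templates['amdgcn_cc_semicolon_sep'])
```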

The used regex should support all GPU architectures starting from gfx600, including the more recent generic targets.
Actual compiler support then needs to be present in the compiler consuming these architectures. Both GCC and LLVM accept the same naming, i.e. gfx[...], including generic targets.
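
A minimal sketch of how such a check could look, using the two patterns exactly as they appear in this PR's diff. The helper `split_and_check` is hypothetical, and treating the option value as a comma-separated list is an assumption based on how cuda-compute-capabilities is passed.

```python
import re

# Patterns as quoted in this PR's diff.
AMDGCN_CC_REGEX = re.compile(r'gfx[0-9]+[a-z]?$')
AMDGCN_GENERIC_REGEX = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')


def split_and_check(value):
    """Hypothetical helper: split a comma-separated value into (valid, invalid)."""
    valid, invalid = [], []
    for item in (part.strip() for part in value.split(',')):
        if AMDGCN_CC_REGEX.match(item) or AMDGCN_GENERIC_REGEX.match(item):
            valid.append(item)
        else:
            invalid.append(item)
    return valid, invalid


valid, invalid = split_and_check('gfx90a,gfx1100,gfx9-4-generic,sm_80')
print(valid)    # targets that pass one of the two patterns
print(invalid)  # everything else, here the CUDA-style 'sm_80'
```

Passing the pattern only says the name is well-formed; whether a given compiler version actually knows the target is a separate question.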


Missing features compared to CUDA

  • cuda_cache_dir option is missing. I haven't found something similar for HIP yet, but may simply have missed it
  • "int only" options are missing, though they are hard to provide with generic targets and targets like gfx90a
    • Maybe a target without gfx?

More to be determined.

Known issues

  • The regex for generic targets is not perfect, allowing e.g. gfx10--generic to pass, even though it is not allowed.

Resolves #4829

@Thyre Thyre force-pushed the support-passing-amdgcn branch from 958ad0a to bff1bfb Compare April 25, 2025 21:39
@Thyre Thyre force-pushed the support-passing-amdgcn branch from bff1bfb to 0e7aaf3 Compare April 25, 2025 22:55
@Thyre Thyre changed the title Add AMDGCN options similar to cuda-compute-capabilities Add AMDGCN option similar to cuda-compute-capabilities Apr 25, 2025
@boegel boegel added this to the 5.x milestone May 7, 2025
@Thyre Thyre force-pushed the support-passing-amdgcn branch from 0e7aaf3 to d4ba387 Compare May 10, 2025 12:14
@Thyre (Contributor, Author) commented May 10, 2025

Started to create a test set of EasyConfig & EasyBlock changes to test the option, starting with LLVM & CMake...
The next logical step would be to build some HIP application with CMake, and maybe try something more special like AdaptiveCpp. I'll use a system ROCm for this, but at the end, everything should also work with an EB built ROCm.

Let's see if this works the way I expect.

https://github.com/Thyre/easybuild-custom/tree/support-passing-amdgcn

Thyre and others added 4 commits July 7, 2025 06:28
AMD doesn't name this compute capabilities, and amdhsa is only used when
lowering to HSA (but amdpal & mesa3d are also possible). Therefore,
simply name the option 'amdgcn-capabilities'.

Signed-off-by: Jan Andre Reuter <[email protected]>
This allows users to handle cases like LLVM, where building with GPU
support is optional, but users might still want to install the software
without GPU support.

Signed-off-by: Jan Andre Reuter <[email protected]>
Signed-off-by: Jan André Reuter <[email protected]>
@Thyre Thyre force-pushed the support-passing-amdgcn branch from db9a681 to 4af19e3 Compare July 7, 2025 04:32
@Thyre Thyre force-pushed the support-passing-amdgcn branch from 6e32eac to afa6558 Compare July 7, 2025 04:44
@Micket (Contributor) left a comment

lgtm

I really don't have any hardware to test any of this on. I trust you have tested this quite a bit?

@Micket (Contributor) commented Jul 15, 2025

We are hitting rate limits (again?)
We need to rethink those framework tests. There are a bunch of failures like this:

ERROR: test_fetch_easyconfigs_from_commit (test.framework.github.GithubTest)
Test fetch_easyconfigs_from_commit function.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/test/framework/github.py", line 561, in test_fetch_easyconfigs_from_commit
    res = fetch_easyconfigs_from_commit(test_commit)
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/easybuild/tools/github.py", line 807, in fetch_easyconfigs_from_commit
    return fetch_files_from_commit(commit, files=files, path=path, github_repo=GITHUB_EASYCONFIGS_REPO)
  File "/tmp/runner/a2a179c7e5ef5c3a44bda1281d10211f5940d494/lib/python3.8/site-packages/easybuild/tools/github.py", line 748, in fetch_files_from_commit
    raise EasyBuildError(error_msg, exit_code=EasyBuildExit.FAIL_GITHUB)
easybuild.tools.build_log.EasyBuildError: 'Failed to download diff for easybuilders/easybuild-easyconfigs commit 6515b44cd84a20fe7876cb4bdaf3c0080e688566! (HTTP Error 403: rate limit exceeded)'

@Thyre (Contributor, Author) commented Jul 15, 2025

> lgtm
>
> I really don't have any hardware to test any of this on. I trust you have tested this quite a bit?

I've basically used this to build all of the ROCm software on two separate machines which I'm trying to bring to EasyBuild (after my vacation).

You'll find quite a few test reports from my Arch Linux machine (or jrc0850) where the parameter is set in the configuration.

Some test reports:

What I haven't explicitly tested (again) is using the generic targets, partly because they're still quite new in ROCm.
Let me check that this works as expected, along with explicitly passing nothing (to ensure that e.g. LLVM 19 works with 'gfx1201' in the config file). That will have to wait until next week though.

@Micket (Contributor) commented Jul 15, 2025

OK, so I'll let you test that as well before merging then? I'll also be away traveling after this week, so if anyone else wants to hit merge, please go ahead.

@Thyre (Contributor, Author) commented Jul 15, 2025

Yeah, I'll test those things once I'm back home. If everything works, I'll ping in our merge-sprint channel 😄

amdgcn_cc_regex = re.compile(r'gfx[0-9]+[a-z]?$')
# Generic convention.
# Regex is not perfect, as it doesn't catch gfx[...]--generic
amdgcn_generic_regex = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
@Crivella (Contributor) commented Jul 23, 2025

Should the -NUMBER part be in a group? E.g.

Suggested change
- amdgcn_generic_regex = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
+ amdgcn_generic_regex = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')

At least among the LLVM 20.1.7 targets, I don't see any --generic ones without the number in between:

crivella@crivella-desktop:~$ llc -march=amdgcn -mattr=help
Available CPUs for this target:

  bonaire         - Select the bonaire processor.
  carrizo         - Select the carrizo processor.
  fiji            - Select the fiji processor.
  generic         - Select the generic processor.
  generic-hsa     - Select the generic-hsa processor.
  gfx10-1-generic - Select the gfx10-1-generic processor.
  gfx10-3-generic - Select the gfx10-3-generic processor.
  gfx1010         - Select the gfx1010 processor.
  gfx1011         - Select the gfx1011 processor.
  gfx1012         - Select the gfx1012 processor.
  gfx1013         - Select the gfx1013 processor.
  gfx1030         - Select the gfx1030 processor.
  gfx1031         - Select the gfx1031 processor.
  gfx1032         - Select the gfx1032 processor.
  gfx1033         - Select the gfx1033 processor.
  gfx1034         - Select the gfx1034 processor.
  gfx1035         - Select the gfx1035 processor.
  gfx1036         - Select the gfx1036 processor.
  gfx11-generic   - Select the gfx11-generic processor.
  gfx1100         - Select the gfx1100 processor.
  gfx1101         - Select the gfx1101 processor.
  gfx1102         - Select the gfx1102 processor.
  gfx1103         - Select the gfx1103 processor.
  gfx1150         - Select the gfx1150 processor.
  gfx1151         - Select the gfx1151 processor.
  gfx1152         - Select the gfx1152 processor.
  gfx1153         - Select the gfx1153 processor.
  gfx12-generic   - Select the gfx12-generic processor.
  gfx1200         - Select the gfx1200 processor.
  gfx1201         - Select the gfx1201 processor.
  gfx600          - Select the gfx600 processor.
  gfx601          - Select the gfx601 processor.
  gfx602          - Select the gfx602 processor.
  gfx700          - Select the gfx700 processor.
  gfx701          - Select the gfx701 processor.
  gfx702          - Select the gfx702 processor.
  gfx703          - Select the gfx703 processor.
  gfx704          - Select the gfx704 processor.
  gfx705          - Select the gfx705 processor.
  gfx801          - Select the gfx801 processor.
  gfx802          - Select the gfx802 processor.
  gfx803          - Select the gfx803 processor.
  gfx805          - Select the gfx805 processor.
  gfx810          - Select the gfx810 processor.
  gfx9-4-generic  - Select the gfx9-4-generic processor.
  gfx9-generic    - Select the gfx9-generic processor.
  gfx900          - Select the gfx900 processor.
  gfx902          - Select the gfx902 processor.
  gfx904          - Select the gfx904 processor.
  gfx906          - Select the gfx906 processor.
  gfx908          - Select the gfx908 processor.
  gfx909          - Select the gfx909 processor.
  gfx90a          - Select the gfx90a processor.
  gfx90c          - Select the gfx90c processor.
  gfx940          - Select the gfx940 processor.
  gfx941          - Select the gfx941 processor.
  gfx942          - Select the gfx942 processor.
  gfx950          - Select the gfx950 processor.
  hainan          - Select the hainan processor.
  hawaii          - Select the hawaii processor.
  iceland         - Select the iceland processor.
  kabini          - Select the kabini processor.
  kaveri          - Select the kaveri processor.
  mullins         - Select the mullins processor.
  oland           - Select the oland processor.
  pitcairn        - Select the pitcairn processor.
  polaris10       - Select the polaris10 processor.
  polaris11       - Select the polaris11 processor.
  stoney          - Select the stoney processor.
  tahiti          - Select the tahiti processor.
  tonga           - Select the tonga processor.
  tongapro        - Select the tongapro processor.
  verde           - Select the verde processor.

Also, I'm not sure whether we want to limit the number of digits in the first number based on what follows, e.g.

rgx1 = re.compile(r'gfx[0-9]{3,4}')
rgx2 = re.compile(r'gfx[0-9]{2,3}[a-z]')
rgx3 = re.compile(r'gfx[0-9]{1,2}(\-[0-9])?\-generic')
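
One caveat with the three patterns above: they carry no end anchor, so `re.match` would accept a string like 'gfx12345' by matching only its 'gfx1234' prefix. A small sketch of how they could be checked strictly with `fullmatch()`; the helper name and the sample list are illustrative assumptions.

```python
import re

# The three stricter patterns suggested above, unchanged.
STRICT_PATTERNS = [
    re.compile(r'gfx[0-9]{3,4}'),
    re.compile(r'gfx[0-9]{2,3}[a-z]'),
    re.compile(r'gfx[0-9]{1,2}(\-[0-9])?\-generic'),
]


def is_strict_match(name):
    """Hypothetical helper: True if the whole name matches one strict pattern."""
    return any(pattern.fullmatch(name) for pattern in STRICT_PATTERNS)


samples = ['gfx600', 'gfx90a', 'gfx1201', 'gfx9-4-generic', 'gfx12-generic',
           'gfx10--generic', 'gfx12345', 'gfx9']
results = {name: is_strict_match(name) for name in samples}
for name, ok in results.items():
    print(f'{name:16} {ok}')
```

The trade-off is the one raised in this thread: tighter digit counts reject more typos, but would need updating if a future target falls outside them.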

@Thyre (Contributor, Author) commented Jul 23, 2025

I wouldn't expect to see --generic at all. We should treat this as an invalid pattern.
My regex knowledge is limited in that regard though, so any better idea for catching this is appreciated 😄

> Also not sure if we want to limit the possible number of hits for the first number based on what follows

Hm, I'd probably leave this a bit more generic, to make sure that we don't have to update this regularly. I wouldn't expect AMD to add generic targets for something like gfx600, but who knows what will be introduced in the future. Our check for cuda-compute-capabilities is also fairly generic.

Contributor

If --generic is never a thing, I think having them grouped is the way to go:

>>> import re
>>> rgx = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')
>>> correct = ['gfx10-1-generic', 'gfx10-3-generic', 'gfx11-generic', 'gfx12-generic', 'gfx9-4-generic', 'gfx9-generic']
>>> wrong = ['gfx10-1', 'gfx10--generic']
>>> [rgx.match(_) for _ in correct]
[<re.Match object; span=(0, 15), match='gfx10-1-generic'>, <re.Match object; span=(0, 15), match='gfx10-3-generic'>, <re.Match object; span=(0, 13), match='gfx11-generic'>, <re.Match object; span=(0, 13), match='gfx12-generic'>, <re.Match object; span=(0, 14), match='gfx9-4-generic'>, <re.Match object; span=(0, 12), match='gfx9-generic'>]
>>> [rgx.match(_) for _ in wrong]
[None, None]
>>> 

If you write the pattern without grouping, --generic would also be accepted:

>>> rgx = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
>>> [rgx.match(_) for _ in wrong]
[None, <re.Match object; span=(0, 14), match='gfx10--generic'>]
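
For comparison, one possible way to fold the specific and the generic check into a single pattern while keeping the digit counts open-ended. This is a sketch, not what the PR implements; the names `AMDGCN_TARGET_REGEX` and `is_amdgcn_target` are hypothetical.

```python
import re

# Hypothetical combined pattern: digits, then either an optional letter suffix
# or an optional "-N" group followed by "-generic".
AMDGCN_TARGET_REGEX = re.compile(r'gfx[0-9]+([a-z]|(-[0-9])?-generic)?')


def is_amdgcn_target(name):
    """True if the whole name looks like a gfx target or a generic target."""
    return AMDGCN_TARGET_REGEX.fullmatch(name) is not None


accepted = ['gfx600', 'gfx90a', 'gfx1201', 'gfx9-generic', 'gfx9-4-generic',
            'gfx10-1-generic', 'gfx10-3-generic', 'gfx11-generic', 'gfx12-generic']
rejected = ['gfx10-1', 'gfx10--generic', 'gfx', 'generic', 'gfx90a-generic']

print([name for name in accepted if not is_amdgcn_target(name)])  # expected to be empty
print([name for name in rejected if is_amdgcn_target(name)])      # expected to be empty
```

Using `fullmatch()` removes the need for a trailing `$` and also rejects leading or trailing junk that a bare `match()` would let through.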

Development

Successfully merging this pull request may close these issues.

Feature Request: Introduce cuda_compute_capabilities (and related) options for AMD GPU architectures
4 participants