Add AMDGCN option similar to cuda-compute-capabilities
#4860
base: develop
Started to create a test set of EasyConfig & EasyBlock changes to test the option, starting with LLVM & CMake... Let's see if this works the way I expect. https://github.com/Thyre/easybuild-custom/tree/support-passing-amdgcn
Signed-off-by: Jan Andre Reuter <[email protected]>
AMD doesn't name this compute capabilities, and amdhsa is only used when lowering to HSA (but amdpal & mesa3d are also possible). Therefore, simply name the option 'amdgcn-capabilities'. Signed-off-by: Jan Andre Reuter <[email protected]>
This allows users to handle cases like LLVM, where building with GPU support is optional, but users might still want to install the software without GPU support. Signed-off-by: Jan Andre Reuter <[email protected]>
Signed-off-by: Jan André Reuter <[email protected]>
Signed-off-by: Jan André Reuter <[email protected]>
lgtm
I really don't have any hardware to test any of this on. I trust you have tested this quite a bit?
We are hitting rate limits (again?)
I've basically used this to build all of the ROCm software on two separate machines which I'm trying to bring to EasyBuild (after my vacation). You'll find quite a few test reports from my Arch Linux machine (or Some test reports:
What I haven't explicitly tested (again) is using the generic targets, also because they're still quite new in ROCm.
OK, so I'll let you also test that before merging then? I'll also be away traveling after this week, so if anyone else wants to hit merge, please go ahead.
Yeah, I'll test those things once I'm back home. If everything works, I'll ping in our
amdgcn_cc_regex = re.compile(r'gfx[0-9]+[a-z]?$')
# Generic convention.
# Regex is not perfect, as it doesn't catch gfx[...]--generic
amdgcn_generic_regex = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
Should the -NUMBER part be in a group? E.g.:
- amdgcn_generic_regex = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
+ amdgcn_generic_regex = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')
At least among the LLVM 20.1.7 targets, I don't see any --generic ones, i.e. ones without the number in between:
crivella@crivella-desktop:~$ llc -march=amdgcn -mattr=help
Available CPUs for this target:
bonaire - Select the bonaire processor.
carrizo - Select the carrizo processor.
fiji - Select the fiji processor.
generic - Select the generic processor.
generic-hsa - Select the generic-hsa processor.
gfx10-1-generic - Select the gfx10-1-generic processor.
gfx10-3-generic - Select the gfx10-3-generic processor.
gfx1010 - Select the gfx1010 processor.
gfx1011 - Select the gfx1011 processor.
gfx1012 - Select the gfx1012 processor.
gfx1013 - Select the gfx1013 processor.
gfx1030 - Select the gfx1030 processor.
gfx1031 - Select the gfx1031 processor.
gfx1032 - Select the gfx1032 processor.
gfx1033 - Select the gfx1033 processor.
gfx1034 - Select the gfx1034 processor.
gfx1035 - Select the gfx1035 processor.
gfx1036 - Select the gfx1036 processor.
gfx11-generic - Select the gfx11-generic processor.
gfx1100 - Select the gfx1100 processor.
gfx1101 - Select the gfx1101 processor.
gfx1102 - Select the gfx1102 processor.
gfx1103 - Select the gfx1103 processor.
gfx1150 - Select the gfx1150 processor.
gfx1151 - Select the gfx1151 processor.
gfx1152 - Select the gfx1152 processor.
gfx1153 - Select the gfx1153 processor.
gfx12-generic - Select the gfx12-generic processor.
gfx1200 - Select the gfx1200 processor.
gfx1201 - Select the gfx1201 processor.
gfx600 - Select the gfx600 processor.
gfx601 - Select the gfx601 processor.
gfx602 - Select the gfx602 processor.
gfx700 - Select the gfx700 processor.
gfx701 - Select the gfx701 processor.
gfx702 - Select the gfx702 processor.
gfx703 - Select the gfx703 processor.
gfx704 - Select the gfx704 processor.
gfx705 - Select the gfx705 processor.
gfx801 - Select the gfx801 processor.
gfx802 - Select the gfx802 processor.
gfx803 - Select the gfx803 processor.
gfx805 - Select the gfx805 processor.
gfx810 - Select the gfx810 processor.
gfx9-4-generic - Select the gfx9-4-generic processor.
gfx9-generic - Select the gfx9-generic processor.
gfx900 - Select the gfx900 processor.
gfx902 - Select the gfx902 processor.
gfx904 - Select the gfx904 processor.
gfx906 - Select the gfx906 processor.
gfx908 - Select the gfx908 processor.
gfx909 - Select the gfx909 processor.
gfx90a - Select the gfx90a processor.
gfx90c - Select the gfx90c processor.
gfx940 - Select the gfx940 processor.
gfx941 - Select the gfx941 processor.
gfx942 - Select the gfx942 processor.
gfx950 - Select the gfx950 processor.
hainan - Select the hainan processor.
hawaii - Select the hawaii processor.
iceland - Select the iceland processor.
kabini - Select the kabini processor.
kaveri - Select the kaveri processor.
mullins - Select the mullins processor.
oland - Select the oland processor.
pitcairn - Select the pitcairn processor.
polaris10 - Select the polaris10 processor.
polaris11 - Select the polaris11 processor.
stoney - Select the stoney processor.
tahiti - Select the tahiti processor.
tonga - Select the tonga processor.
tongapro - Select the tongapro processor.
verde - Select the verde processor.
Also not sure if we want to limit the possible number of hits for the first number based on what follows, e.g.:
rgx1 = re.compile(r'gfx[0-9]{3,4}')
rgx2 = re.compile(r'gfx[0-9]{2,3}[a-z]')
rgx3 = re.compile(r'gfx[0-9]{1,2}(\-[0-9])?\-generic')
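To make the trade-off concrete, here is a minimal sketch that runs the three stricter patterns against a few names from the llc listing above. The helper name, the sample list, and the use of fullmatch (the patterns as written carry no '$' anchor) are assumptions of this illustration, not part of the PR.

```python
import re

# The three patterns proposed in the comment above.
rgx1 = re.compile(r'gfx[0-9]{3,4}')
rgx2 = re.compile(r'gfx[0-9]{2,3}[a-z]')
rgx3 = re.compile(r'gfx[0-9]{1,2}(\-[0-9])?\-generic')


def is_known_shape(name):
    """Return True if the whole name matches one of the three patterns."""
    # fullmatch is an assumption of this sketch, since the patterns have no '$'.
    return any(rgx.fullmatch(name) for rgx in (rgx1, rgx2, rgx3))


# Sample names chosen for illustration; the last three should be rejected.
samples = ['gfx600', 'gfx1036', 'gfx90a', 'gfx9-generic', 'gfx10-3-generic',
           'gfx10--generic', 'gfx10-1', 'bonaire']
for name in samples:
    print(name, is_known_shape(name))
```

The stricter digit counts reject more malformed input, at the cost of needing an update if a future target falls outside these ranges.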
I wouldn't expect to see --generic at all. We should treat this as an invalid pattern.
My regex knowledge is limited in that regard though, so any better idea for catching this is appreciated 😄
> Also not sure if we want to limit the possible number of hits for the first number based on what follows
Hm, I'd probably leave this a bit more generic, to make sure that we don't have to update this regularly. I wouldn't expect AMD to add generic targets for something like gfx600, but who knows what will be introduced in the future. Our check for cuda-compute-capabilities is also fairly generic.
If --generic is never a thing, I think having them grouped is the way to go:
>>> import re
>>> rgx = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')
>>> correct = ['gfx10-1-generic', 'gfx10-3-generic', 'gfx11-generic', 'gfx12-generic', 'gfx9-4-generic', 'gfx9-generic']
>>> wrong = ['gfx10-1', 'gfx10--generic']
>>> [rgx.match(_) for _ in correct]
[<re.Match object; span=(0, 15), match='gfx10-1-generic'>, <re.Match object; span=(0, 15), match='gfx10-3-generic'>, <re.Match object; span=(0, 13), match='gfx11-generic'>, <re.Match object; span=(0, 13), match='gfx12-generic'>, <re.Match object; span=(0, 14), match='gfx9-4-generic'>, <re.Match object; span=(0, 12), match='gfx9-generic'>]
>>> [rgx.match(_) for _ in wrong]
[None, None]
If you write them without grouping, --generic would also be accepted:
>>> rgx = re.compile(r'gfx[0-9]+[-]?[0-9]?-generic$')
>>> [rgx.match(_) for _ in wrong]
[None, <re.Match object; span=(0, 14), match='gfx10--generic'>]
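Putting the two patterns together, a check over a list of requested targets could look roughly like the sketch below. The function name invalid_amdgcn_capabilities is hypothetical and only meant to illustrate the behaviour discussed here; it is not the code from the PR.

```python
import re

# Exact-architecture pattern from the diff, e.g. gfx90a or gfx1100.
amdgcn_cc_regex = re.compile(r'gfx[0-9]+[a-z]?$')
# Grouped variant suggested in this thread for generic targets.
amdgcn_generic_regex = re.compile(r'gfx[0-9]+(\-[0-9])?-generic$')


def invalid_amdgcn_capabilities(values):
    """Return the entries that match neither pattern (hypothetical helper)."""
    return [v for v in values
            if not (amdgcn_cc_regex.match(v) or amdgcn_generic_regex.match(v))]


# prints ['gfx10--generic', 'sm_80']
print(invalid_amdgcn_capabilities(['gfx90a', 'gfx11-generic', 'gfx10--generic', 'sm_80']))
```

Returning the offending entries, rather than a single boolean, makes it easy to name them in an error message.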
Summary
This PR aims to implement an option similar to cuda-compute-capabilities (and related options) for AMD GPUs.
The option can then replace the manual handling done in some EasyBlocks, e.g. Clang & LLVM, making it possible to enable (some) GPU builds without the need to alter the EasyConfig.
Most of the handling was copied from CUDA, while some options were skipped as they don't make much sense, e.g. cuda_cc_space_sep_no_period.
The regex used should support all GPU architectures starting from gfx600, including the more recent generic targets.
Actual support then needs to be present in the compiler consuming these architectures. Both GCC and LLVM accept the same naming, i.e. gfx[...], including generic targets.
Missing features compared to CUDA
- cuda_cache_dir option is missing. I haven't found something similar for HIP yet, but may simply have missed it.
- gfx90a gfx ?
- More to be determined.
Known issues
- The generic-target regex allows gfx10--generic to pass, even though it is not allowed.
Resolves #4829
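As a rough illustration of why a no-period variant is not needed here: AMDGCN target names such as gfx90a contain no period, unlike CUDA compute capabilities such as 8.0, so the differently separated strings can be built directly from the list. The function and key names in this sketch are hypothetical and merely modelled on the cuda_cc_* naming mentioned above; the actual names in the PR may differ.

```python
def render_amdgcn_templates(capabilities):
    """Build differently separated strings from a list of AMDGCN targets."""
    # The key names below are hypothetical, modelled on the cuda_cc_* style.
    return {
        'amdgcn_cc_comma_sep': ','.join(capabilities),
        'amdgcn_cc_semicolon_sep': ';'.join(capabilities),
        'amdgcn_cc_space_sep': ' '.join(capabilities),
    }


templates = render_amdgcn_templates(['gfx90a', 'gfx1100'])
for key, value in templates.items():
    print(key, '=', value)
```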