HIP: force max threads per block to be 1024 #11621

fxzjshm · 2025-02-03T14:34:46Z

Some old compilers still default to a maximum of 256 threads per block. Explicitly set it to 1024 to get correct results from ops like ARGMAX and GROUP_NORM.

IMbackK

Overall I am fine with this, but I will defer to @slaren on whether this kind of vendor behavior is something we want to support (see the discussion in #11619).

IMbackK · 2025-02-03T22:29:54Z

ggml/src/ggml-hip/CMakeLists.txt

@@ -40,6 +40,9 @@ find_package(hip     REQUIRED)
 find_package(hipblas REQUIRED)
 find_package(rocblas REQUIRED)

+# Workaround old compilers


Please move this down a bit, as the find_package calls and the version check below are logically related operations.

slaren · 2025-02-03T22:37:46Z

I saw the discussion, but I don't know enough about HIP/ROCm to have an opinion about this. If you think it is unlikely to cause issues for other users, feel free to merge it.


fxzjshm · 2025-02-04T05:24:31Z

@IMbackK Moved. Is this location appropriate?

@slaren This compiler flag is documented at https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang-gpu-max-threads-per-block. I have also compiled with ROCm 6.3.1 and got no compile errors; I am now running test-backend-ops.

Update: test-backend-ops w/ ROCm 6.3.1 on gfx1100 passed.
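For readers following along, here is a minimal sketch of how the documented flag could be passed in a CMake build. The patch's exact lines are not shown in this thread, so appending to `CMAKE_HIP_FLAGS` is an assumption for illustration, not necessarily what the commit does. As the review asked, it would sit after the find_package calls and the version check.

```cmake
# Sketch only: the exact lines of the patch are not reproduced in this thread.
# Assumes the project is built with CMake's HIP language support, where
# CMAKE_HIP_FLAGS holds extra flags passed to the HIP compiler.

# Workaround old compilers: set the default launch bound explicitly
# instead of relying on the compiler's built-in default.
set(CMAKE_HIP_FLAGS "${CMAKE_HIP_FLAGS} --gpu-max-threads-per-block=1024")
```

Setting the value explicitly means every compiler version builds the kernels with the same limit, whatever its own default is.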

github-actions bot added the "ggml" label (changes relating to the ggml tensor library for machine learning) on Feb 3, 2025

IMbackK reviewed Feb 3, 2025


HIP: force max threads per block to be 1024

59ad593

Some old compilers still use 256. Explicitly set it to 1024 to get correct result from ops like ARGMAX and GROUP_NORM. Related: ggerganov#10610, ggerganov#11619 Signed-off-by: fxzjshm <[email protected]>

fxzjshm force-pushed the hip-launch_bounds branch from 7e596d4 to 59ad593 on February 4, 2025 05:10
