Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU: Improvements for sorting / thrust external allocator #14103

Merged
merged 7 commits into from
Mar 25, 2025

Conversation

davidrohr
Copy link
Collaborator

  • Provide clean way to provide external allocator to thrust
  • Move the thrust device-wide sorting to a function in GPUCommonAlgorithm, use from there
  • And try to use cub instead of thrust for sorting (compile-time switch)
  • Cleanup and simplification in a couple of places.

Unfortunately, this PR causes a major performance regression both on my NVIDIA GPU and on EPNs. I don't understand why yet. Just want to check it in the CI, and show @mconcas how to use the external allocator.

@davidrohr davidrohr requested a review from a team as a code owner March 24, 2025 15:54
Copy link
Contributor

REQUEST FOR PRODUCTION RELEASES:
To request your PR to be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the form (note that labels are separated by a ",")

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and removes <label3>.

The following labels are available
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@davidrohr
Copy link
Collaborator Author

@mconcas : For using the thrust external allocator, see https://github.com/davidrohr/AliceO2/blob/c9e82dd6c8452852ea2e2eadfff5c4ac9887c901/GPU/Common/GPUCommonAlgorithmThrust.h#L96
You can query the allocator from GPUReconstruction, and then pass the object to thrust later. It works for both CUDA and HIP. Unfortunately, you have to pass the object around, there is no way around it.

The way it works is:

  • You can use the allocator from GPUReconstruction::getThrustVolatileDeviceAllocator() to allocate "volatile" memory from the GPU pool.
  • You can call GPUReconstruction::ReturnVolatileDeviceMemory() to return the memory to the pool.
  • In the meantime, you must not use AllocateUnmanagedMemory(), that would lead to memory corruption. I'll add a protection in the future, but for now please be careful.
    Let me know if you think that this can work for you.

@davidrohr davidrohr changed the title GPU: Improvements for sorting / thrust external allocator - DON'T MERGE YET GPU: Improvements for sorting / thrust external allocator Mar 24, 2025
@davidrohr
Copy link
Collaborator Author

performance problem fixed

@alibuild
Copy link
Collaborator

Error while checking build/O2/fullCI_slc9 for c9e82dd at 2025-03-24 18:55:

## sw/BUILD/o2codechecker-latest/log
100% tests passed, 0 tests failed out of 1


## sw/BUILD/O2-latest/log
/sw/SOURCES/O2/14103-slc9_x86-64/0/GPU/Common/GPUCommonAlgorithmThrust.h:101:14: error: use of undeclared identifier 'hipcub'
/sw/SOURCES/O2/14103-slc9_x86-64/0/GPU/Common/GPUCommonAlgorithmThrust.h:103:14: error: use of undeclared identifier 'hipcub'
ninja: build stopped: subcommand failed.

Full log here.

@alibuild
Copy link
Collaborator

Error while checking build/O2/fullCI_slc9 for 9ceb34e at 2025-03-24 20:27:

## sw/BUILD/O2-latest/log
/sw/SOURCES/O2/14103-slc9_x86-64/0/GPU/Common/GPUCommonAlgorithmThrust.h:101:14: error: use of undeclared identifier 'hipcub'
/sw/SOURCES/O2/14103-slc9_x86-64/0/GPU/Common/GPUCommonAlgorithmThrust.h:103:14: error: use of undeclared identifier 'hipcub'
ninja: build stopped: subcommand failed.

Full log here.

@alibuild
Copy link
Collaborator

Error while checking build/O2/fullCI_slc9 for 471fdbf at 2025-03-25 00:08:

## sw/BUILD/O2-latest/log
CMake Error at GPU/GPUTracking/Base/hip/cmake_install.cmake:69 (file):
ninja: build stopped: subcommand failed.

Full log here.

@davidrohr davidrohr merged commit 3e56e55 into AliceO2Group:dev Mar 25, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants