-
Notifications
You must be signed in to change notification settings - Fork 465
GPU stream implementation for ONNX runtime #14117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GPU stream implementation for ONNX runtime #14117
Conversation
Please consider the following formatting changes to AliceO2Group#14069
…ce and one initialization
Please consider the following formatting changes to AliceO2Group#14069
REQUEST FOR PRODUCTION RELEASES:
This will add The following labels are available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok, just one change needed
#else // HIP | ||
void* GPUReconstructionHIP::getGPUPointer(void* ptr) | ||
{ | ||
void* retVal = nullptr; | ||
GPUChkErr(hipHostGetDevicePointer(&retVal, ptr, 0)); | ||
return retVal; | ||
} | ||
|
||
#ifdef GPUCA_HAS_ONNX | ||
int32_t GPUReconstructionCUDA::SetONNXGPUStream(OrtSessionOptions* session_options, int32_t stream) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hier sollte HIP stehen statt CUDA
@@ -16,6 +16,7 @@ | |||
#include "GPUReconstructionCUDAIncludesHost.h" | |||
|
|||
#include <cuda_profiler_api.h> | |||
#include "ML/OrtInterface.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warum brauchst du hier den Interface header?
Please consider the following formatting changes to AliceO2Group#14117
The actual CCDB fetching and loading of models will be adjusted in the next PR. For now, this will build and run with some (techincally unneccesary) extra processing steps for model initialization. |
Error while checking build/O2/fullCI_slc9 for 46fb1e1 at 2025-03-28 11:28:
Full log here. |
Marking as draft until shadowProcessors are implemented |
… will merge AliceO2Group#14069 to have the changes in GPUChainTrackingClusterizer.
Triggering the CI |
Interestingly the ORT variables are all OFF / 0 in the slc9 builder even though the ROCM and CUDA builds should be enabled. Are there dependencies missing in the build container? From the previous build:
|
I also saw that the ORT variables are OFF, and I cannot really understand why. However, why do we have the link failure? That seems to be a genuine error if the ORT variables are off, since you cannot expect that all dependencies are available. I.e., the software should also build correctly with GPU stuff available but ONNX ORT stuff not available.
That should be fixed in any case. Could you have a look (while the ORT variables are still OFF)? Then for why the ORT variables are off, I have no idea. The FullCI should use this docker container: |
Error while checking build/O2/fullCI_slc9 for e46cdfa at 2025-04-13 13:16:
Full log here. |
Thanks, linker issue is fixed. It now fails due to the FullCI simulation issue, which I just fixed. Next time the FullCI runs, it should pass it. And I think I understand what happens with the ORT builds |
Please consider the following formatting changes to AliceO2Group#14117
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChSonnabend : I have a couple of mostly cosmetic comments, please have a look, besides the PR is fine with me now, and it passes the CI, now also compiling with ORT_ROCM_BUILD=1. So once you have addressed my points and you write that you are satisfied with the current state, we can merge it.
PRIVATE_INCLUDE_DIRECTORIES | ||
${CMAKE_SOURCE_DIR}/Detectors/Base/src | ||
${CMAKE_SOURCE_DIR}/Detectors/TRD/base/src | ||
${CMAKE_SOURCE_DIR}/DataFormats/Reconstruction/src | ||
${CMAKE_CURRENT_SOURCE_DIR} | ||
TARGETVARNAME targetName) | ||
|
||
message("Compile definitions for ONNX runtime (CUDA):") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not put all of this here, CUDA build is only done when compiling with CUDA, so the printout will only be there if we build for CUDA, which makes no sense for the ROCm variable to be honest. Do we actually need this, since we anyway have this printout in the cmake line with -DCMAKE....
In general, I would propose to do the following:
- Remove the printouts here.
- Create a separate PR, in which:
- you move the
include(dependencies/FindONNXRuntime.cmake)
from the main CMakeLists.txt todependencies/O2Dependencies.cmake
. In there, you create a one-line printout that shows "ONNXRuntime Found: 0/1 ORT_CUDA_BUILD 0/1 ...".
- you move the
target_compile_definitions(${targetName} PRIVATE | ||
GPUCA_HAS_ONNX=1 | ||
ORT_ROCM_BUILD=$<BOOL:${ORT_ROCM_BUILD}> | ||
ORT_CUDA_BUILD=$<BOOL:${ORT_CUDA_BUILD}> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need ROCM and MIGRAPHX compile definitions for CUDA? I assume we can remove them here, and add them only for the ROCm one?
o2_add_library(ML | ||
SOURCES src/OrtInterface.cxx | ||
TARGETVARNAME targetName | ||
PRIVATE_LINK_LIBRARIES O2::Framework ONNXRuntime::ONNXRuntime) | ||
|
||
# Pass ORT variables as a preprocessor definition | ||
target_compile_definitions(${targetName} PRIVATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of setting 0/1 definitions, I would set only the =1 definition, if the CMake variable is set.
Then, in the code further below you don't need
#if defined(FOO) && FOO=1
, but you can simply use #ifdef FOO
|
||
void GPUReconstructionCUDA::SetONNXGPUStream(Ort::SessionOptions& session_options, int32_t stream, int32_t* deviceId) | ||
{ | ||
#if defined(ORT_CUDA_BUILD) && ORT_CUDA_BUILD == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do as I wrote above, you can use a simple #ifdef here.
PRIVATE_INCLUDE_DIRECTORIES | ||
${CMAKE_SOURCE_DIR}/Detectors/Base/src | ||
${CMAKE_SOURCE_DIR}/Detectors/TRD/base/src | ||
${CMAKE_SOURCE_DIR}/DataFormats/Reconstruction/src | ||
${GPUCA_HIP_SOURCE_DIR} | ||
TARGETVARNAME targetName) | ||
|
||
message("Compile definitions for ONNX runtime (HIP / ROCM):") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as I commented for CUDA applies here
GPU/GPUTracking/CMakeLists.txt
Outdated
PUBLIC_INCLUDE_DIRECTORIES ${INCDIRS} | ||
SOURCES ${SRCS} ${SRCS_NO_CINT} ${SRCS_NO_H}) | ||
|
||
target_include_directories( | ||
${targetName} | ||
PRIVATE $<TARGET_PROPERTY:O2::Framework,INTERFACE_INCLUDE_DIRECTORIES>) | ||
|
||
target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2 GPUCA_HAS_ONNX=1) | ||
target_compile_definitions(${targetName} PRIVATE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't change the formatting if not needed.
@@ -42,6 +42,7 @@ | |||
#ifdef GPUCA_HAS_ONNX | |||
#include "GPUTPCNNClusterizerKernels.h" | |||
#include "GPUTPCNNClusterizerHost.h" | |||
// #include "ML/3rdparty/GPUORTFloat16.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you add a commented file?
GPU/Workflow/src/GPUWorkflowSpec.cxx
Outdated
@@ -118,6 +119,7 @@ GPURecoWorkflowSpec::GPURecoWorkflowSpec(GPURecoWorkflowSpec::CompletionPolicyDa | |||
mConfig.reset(new GPUO2InterfaceConfiguration); | |||
mConfParam.reset(new GPUSettingsO2); | |||
mTFSettings.reset(new GPUSettingsTF); | |||
mNNClusterizerSettings.reset(new GPUSettingsProcessingNNclusterizer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this here, all these settings should already be made available from mConfig->configProcessing.nn
GPU/Workflow/src/GPUWorkflowSpec.cxx
Outdated
o2::tpc::NeuralNetworkClusterizer nnClusterizerFetcher; | ||
nnClusterizerFetcher.initCcdbApi(mNNClusterizerSettings->nnCCDBURL); | ||
std::map<std::string, std::string> ccdbSettings = { | ||
{"nnCCDBURL", mNNClusterizerSettings->nnCCDBURL}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking, all CCDB-related settings so far are in GPUSettingsO2
, not in GPUSettingsProcessing
. Perhaps it would be cleaner to also move the NN CCDB settings there, but since they are anyway encapsulated in your own struct, I don't care much.
Error while checking build/O2/fullCI_slc9 for a67b634 at 2025-04-19 11:43:
Full log here. |
Please consider the following formatting changes to AliceO2Group#14117
Technically, the Ort Session is now capable of reserving arena memory with a custom allocator using the volatile memory allocation (tested via print-outs). Unfortunately, when using IO binding I haven't figured out a way yet to also use volatile allocation when doing the tensor allocation, which is actually the significant part of the total memory. For now this isn't used anyway, so adjusting for comments and good to go from my side (will open a separate PR for that once figured out). |
Error while checking build/O2/fullCI_slc9 for 4d3f54d at 2025-04-20 01:22:
Full log here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good now, can be squash-merged once CI is green
No description provided.