GPU clusterizer with neural networks #13981

ChSonnabend · 2025-02-19T12:45:37Z

This PR adds a neural-network-based implementation of the online GPU cluster finder. The code is parallelized for GPU execution, but running on GPUs requires the correct ONNX Runtime version (see alidist: alisw/alidist#5622).

ChSonnabend · 2025-03-13T11:55:21Z

GPU/GPUTracking/CMakeLists.txt

                 PUBLIC_LINK_LIBRARIES O2::GPUUtils
                                       O2::GPUCommon
                                       O2::ReconstructionDataFormats
                                       O2::TPCFastTransformation
                                       O2::ML
                 PRIVATE_LINK_LIBRARIES O2::DataFormatsTPC
                 SOURCES ${SRCS_DATATYPES})
-  add_compile_definitions(GPUCA_HAS_ONNX=1)


@davidrohr This is needed, otherwise the code doesn't find GPUCA_HAS_ONNX internally. E.g. tpcNNClusterer objects are not created

We cannot globally add compile definitions, it should be bound to a target, then we have to understand why it didn't work.

Found it: See the new commit

But wasn't it already there in my change:

target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2 GPUCA_HAS_ONNX=1)

But GPUCA_HAS_ONNX=1 is needed in two places. Not sure which changes you refer to, but I saw it only in one place before. Maybe I missed it

Ah, now I understand. You added the library O2::ML to the GPUDataTypes library, not to the GPUTracking library; the same goes for the first define. Can you remove both from GPUDataTypes and add both only to GPUTracking?
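To illustrate why the scope of the define matters: code guarded by `GPUCA_HAS_ONNX` is silently compiled out in any translation unit that does not see the definition, which would match the symptom of objects not being created. A minimal C++ sketch, in which `TrackingSketch`, `NNClustererSketch` and `kHasOnnx` are hypothetical stand-ins and not O2 code:

```cpp
#include <memory>

// Hypothetical stand-in for an ONNX-backed clusterer object.
struct NNClustererSketch {
  int nThreads = 1;
};

// True only if this translation unit was compiled with GPUCA_HAS_ONNX defined.
#ifdef GPUCA_HAS_ONNX
constexpr bool kHasOnnx = true;
#else
constexpr bool kHasOnnx = false;
#endif

// Hypothetical stand-in for a class that owns the clusterer.
struct TrackingSketch {
  std::unique_ptr<NNClustererSketch> nn;

  TrackingSketch()
  {
#ifdef GPUCA_HAS_ONNX
    // This statement exists only where the define is visible.
    nn = std::make_unique<NNClustererSketch>();
#endif
  }

  // Reports whether the guarded object was actually created.
  bool hasNN() const { return nn != nullptr; }
};
```

Binding the define to the target that compiles the guarded code, as the `target_compile_definitions` line quoted above does, keeps it visible exactly where it is needed without leaking it into unrelated targets.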

davidrohr · 2025-03-13T12:28:31Z

Marking as WIP to disable the CI until preparatory PRs are merged.

ChSonnabend · 2025-03-14T00:04:18Z

Triggered the CI to see what the build / unit tests result in with the new changes

alibuild · 2025-03-14T02:56:01Z

Error while checking build/O2/fullCI_slc9 for 78c342d at 2025-03-14 03:57:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
(the line above appears 18 times in the log excerpt)

Full log here.

davidrohr · 2025-03-14T08:08:19Z

I don't like to copy&paste the toNative function.
I have split it back into finalize() and toNative() in #14063, so you can use toNative directly.
Could you rebase to origin/dev? Also, your bfloat16 PR got merged, so all changes to bfloat16 should be removed from this PR.

davidrohr · 2025-03-14T08:29:53Z

GPU/GPUTracking/Global/GPUChainTrackingClusterizer.cxx

+        } else {
+          runKernel<GPUTPCCFDeconvolution>({GetGrid(clusterer.mPmemory->counters.nPositions, lane), {iSector}});
+          DoDebugAndDump(RecoStep::TPCClusterFinding, 262144 << 4, clusterer, &GPUTPCClusterFinder::DumpChargeMap, *mDebugFile, "Split Charges");
+          runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane, GPUReconstruction::krnlDeviceType::CPU), {iSector}}, 0);


Here you incorrectly added a GPUReconstruction::krnlDeviceType::CPU option for the default code path, which has to go. Otherwise it will break clusterization on the GPU.

I have now removed it for my kernels as well, since they should (in the future) also run on GPU memory.
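As a rough illustration of why the extra argument matters, the sketch below models a grid request with an optional device override. `DeviceTypeSketch` and `resolveDevice` are hypothetical simplifications, not the actual `GetGrid` or `krnlDeviceType` implementation; the only assumption is that an explicit CPU request takes precedence over the default choice.

```cpp
// Hypothetical simplification of a per-kernel device request.
enum class DeviceTypeSketch { Auto,
                              CPU,
                              Device };

// Assumed rule: Auto follows the backend in use; an explicit request always wins.
constexpr DeviceTypeSketch resolveDevice(DeviceTypeSketch requested, bool gpuBackend)
{
  if (requested != DeviceTypeSketch::Auto) {
    return requested;
  }
  return gpuBackend ? DeviceTypeSketch::Device : DeviceTypeSketch::CPU;
}
```

Under this assumption, a kernel launched with an explicit CPU request stays on the host even when a GPU backend is active, which is the behaviour the review comment warns about for the default code path.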

davidrohr · 2025-03-14T08:30:28Z

GPU/GPUTracking/Global/GPUChainTrackingClusterizer.cxx

@@ -1073,4 +1172,4 @@ int32_t GPUChainTracking::RunTPCClusterizer(bool synchronizeOutput)

 #endif
  return 0;
-}
+}


Your editor seems to regularly misformat the files by removing the newline at the end of the file. Could you please have a look and fix this?
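The `-}` / `+}` pair in the hunk above is how a missing final newline typically shows up in a diff. A small helper for checking this on file contents could look as follows; it is a generic sketch with hypothetical function names, not part of O2 or its formatting tooling.

```cpp
#include <string>

// Returns true if the text is non-empty and its last character is '\n'.
inline bool endsWithNewline(const std::string& text)
{
  return !text.empty() && text.back() == '\n';
}

// Appends a single '\n' if the text does not already end with one
// (an empty text also gets one); other content is left untouched.
inline std::string withFinalNewline(std::string text)
{
  if (!endsWithNewline(text)) {
    text.push_back('\n');
  }
  return text;
}
```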

davidrohr · 2025-03-14T08:32:30Z

GPU/GPUTracking/kernels.cmake

@@ -111,7 +114,15 @@ o2_gpu_add_kernel("GPUTPCCFNoiseSuppression, noiseSuppression"        "= TPCCLUS
 o2_gpu_add_kernel("GPUTPCCFNoiseSuppression, updatePeaks"             "= TPCCLUSTERFINDER"                                    LB)
 o2_gpu_add_kernel("GPUTPCCFDeconvolution"                             "= TPCCLUSTERFINDER"                                    LB)
 o2_gpu_add_kernel("GPUTPCCFClusterizer"                               "= TPCCLUSTERFINDER"                                    LB int8_t onlyMC)
-o2_gpu_add_kernel("GPUTPCCFMCLabelFlattener, setRowOffsets"           "= TPCCLUSTERFINDER")
+if(NOT ALIGPU_BUILD_TYPE STREQUAL "Standalone")
+o2_gpu_add_kernel("GPUTPCNNClusterizerKernels, runCfClusterizer"        "= TPCNNCLUSTERFINDER"                                LB GPUConstantMem* processors uint8_t sector int8_t dtype int8_t onlyMC uint batchStart)


The GPUConstantMem* processors here seems pretty weird, that should not be there. Are you sure you need it? If yes, I need to check what is going wrong.

Yes I need it. When you call runKernel, the runKernelBackendInternal function is called internally (GPUReconstructionCPU.h / .cxx). This function generates a thread instance with this signature:
T::template Thread<I>(x.nBlocks, 1, iB, 0, smem, T::Processor(*mHostConstantMem)[y.index], args...);
We did not define the Processor function in my classes, and the instances of GPUTPCNNClusterizer and GPUTPCClusterFinder that were used were just (semi-)random pointers. This led to the pointer change inside the kernel that I mentioned to you in person. The easy fix for now was to pass the GPUConstantMem instance and use the class instances directly from there. Then everything works.

You can check GPUTPCCFClusterizer.h; there is a Processor defined there. However, I need both the GPUTPCNNClusterizer instance and the normal CFClusterFinder instance in my kernels.

Well, you derive from GPUKernelTemplate, which has the Processor function defined here:

https://github.com/AliceO2Group/AliceO2/blob/705ebfb083c41183183c554c0cb17a6a9423e4c5/GPU/GPUTracking/Base/GPUGeneralKernels.h#L83

Took me a while to understand... :(.
The problem is that you pass in iSector twice:

runKernel<GPUTPCNNClusterizerKernels, GPUTPCNNClusterizerKernels::determineClass1Labels>({GetGrid(iSize, lane, GPUReconstruction::krnlDeviceType::CPU), {iSector}}, processors(), iSector, clustererNN.nnClusterizerDtype, 0, batchStart);

With that, you get the iSector'th instance of processors, and in there the iSector'th instance of the clusterer.
That is indeed not obvious. I'll have to add some protection here; sorry that you ran into this...

Please change the first {iSector} to krnlRunRangeNone, then it will work without passing it in manually.
Note that passing it in manually will not work on the GPU: it must be passed in automatically, since on the GPU this processors struct lies in constant memory and there is no pointer to it.
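The double indexing described above can be made concrete with a simplified memory model. The sketch assumes, as the thread describes, that the backend first indexes the processors pointer with the run-range index and that the kernel then indexes the clusterer array with the sector. The struct layouts, the sector count and the function names are hypothetical and far simpler than the real `GPUConstantMem`.

```cpp
#include <cstddef>

constexpr int kNSectorsSketch = 36; // assumed sector count for the sketch

// Hypothetical, heavily simplified stand-ins for the real structs.
struct ClustererSketch {
  int data[4];
};
struct ConstantMemSketch {
  ClustererSketch clusterer[kNSectorsSketch];
};

// Byte offset, relative to the start of the single ConstantMemSketch instance,
// of the clusterer reached by indexing the struct pointer with runRangeIndex
// and then the clusterer array with sector.
constexpr std::size_t touchedOffset(int runRangeIndex, int sector)
{
  return static_cast<std::size_t>(runRangeIndex) * sizeof(ConstantMemSketch) +
         offsetof(ConstantMemSketch, clusterer) +
         static_cast<std::size_t>(sector) * sizeof(ClustererSketch);
}

// True if the touched clusterer lies fully inside the one existing instance.
constexpr bool staysInsideConstantMem(int runRangeIndex, int sector)
{
  return touchedOffset(runRangeIndex, sector) + sizeof(ClustererSketch) <= sizeof(ConstantMemSketch);
}
```

In this model, any non-zero run-range index already moves past the only existing instance, which is consistent with the "(semi-)random pointers" observed in the kernel. A run-range index of zero, which is what the suggested `krnlRunRangeNone` is meant to achieve, keeps every sector lookup in bounds.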

davidrohr · 2025-03-14T10:40:42Z

PR looks good now, except for the newlines at the end of files...
Also, the macOS CI has passed now, though it is totally unclear to me why it works now when it crashed before...

ChSonnabend and others added 30 commits May 16, 2024 09:32

Copying kernels to implement NN clusterizer

d4dc46e

Merge branch 'dev' into gpu_clusterizer

c191885

First version of clusterizer in GPU code

05831ef

Merge branch 'gpu_clusterizer' of github.com:ChSonnabend/AliceO2 into…

8515290

… gpu_clusterizer

Adding a compiling and running version with single-threaded ONNX mode…

3f6c934

…l executions. Clusters are not getting published yet (FIXME)

Clusters now working by a hack

8ba6805

Working implementation of settings via GPUSettings.h and --configKeyV…

6ec3c46

…alues "GPU_proc.[setting]=...;..."

Merge branch 'AliceO2Group:dev' into gpu_clusterizer

626a46f

Modifying the onnx_interface to include the right headers

ab4653a

Adjusting initialization for new ONNXRuntime version

04084c8

Adjusting global settings and CF code for several settings

01dc4a1

Adding return statement if cluster is rejected

accd7ab

Merge branch 'AliceO2Group:dev' into gpu_clusterizer

019b388

Adding some statements back

3473a06

Merge branch 'dev' into gpu_clusterizer

dfffdf5

Update to latest status of gpu clusterization

df21c96

Fixing uchar -> uint8_t

06737fd

Adding utils header

b148449

Updating kernels.cmake to uint8_t

534da50

Please consider the following formatting changes

bb2cb6e

Merge pull request #6 from alibuild/alibot-cleanup-13610

027e225

Please consider the following formatting changes to AliceO2Group#13610

Adding an ONNX CPU library in the O2 framework

25093b3

Merge branch 'AliceO2Group:dev' into onnxruntime-cpu

74cf0e7

Please consider the following formatting changes

9232328

Merge pull request #7 from alibuild/alibot-cleanup-13709

9a6a9e8

Please consider the following formatting changes to AliceO2Group#13709

Fixing macOS build issues with calling O*.data()

7251c5c

Fixing compiler issues and char -> uint8_t

d0f4dd8

Fixing curly braces

7859ab2

Fixing std::make_shared

c6cb3e6

Merge branch 'onnxruntime-cpu' into gpu_clusterizer

55621f0

ChSonnabend commented Mar 13, 2025

Adjusting CMakeLIsts and other bugs

3377435

davidrohr changed the title ~~GPU clusterizer with neural networks~~ [WIP] GPU clusterizer with neural networks Mar 13, 2025

ChSonnabend added 2 commits March 13, 2025 13:53

Adding GPUCA_HAS_ONNX only to tracking

9893b43

Changing to fixed size for number of clusters

bce04bc

ChSonnabend changed the title ~~[WIP] GPU clusterizer with neural networks~~ GPU clusterizer with neural networks Mar 13, 2025

ChSonnabend changed the title ~~GPU clusterizer with neural networks~~ [WIP] GPU clusterizer with neural networks Mar 13, 2025

Fixed segfault. Not producing the right number of clusters yet.

713dd64

alibuild mentioned this pull request Mar 13, 2025

Please consider the following formatting changes to #13981 ChSonnabend/AliceO2#17

Merged

ChSonnabend and others added 7 commits March 14, 2025 00:34

Network now accepts clusters over all sectors

e66efb1

Whitespaces...

2b9b8da

Merge dev + fix-ups

34419f3

Some weird formatting

85d185e

Please consider the following formatting changes

49352ab

Merge pull request #17 from alibuild/alibot-cleanup-13981

90ef464

Please consider the following formatting changes to AliceO2Group#13981

Removing white-spaces

78c342d

ChSonnabend changed the title ~~[WIP] GPU clusterizer with neural networks~~ GPU clusterizer with neural networks Mar 14, 2025

Adding necessary if-statement to avoid automatic model loading

6a7b17c

Merge dev and fixes

41d80d2

davidrohr requested changes Mar 14, 2025

Removing GPUConstantMem, adding interOpNumThreads option

bb163ea

ChSonnabend added 2 commits March 14, 2025 11:40

Found the bug where I loose clusters

eabba5f

Editor configured for whitespaces at EOF

1e80754

davidrohr merged commit b5ab60d into AliceO2Group:dev Mar 14, 2025
6 of 7 checks passed

ChSonnabend deleted the gpu_clusterizer branch March 14, 2025 22:06
