Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPU clusterizer with neural networks #13981

Merged
merged 104 commits into from
Mar 14, 2025

Conversation

ChSonnabend
Copy link
Contributor

This PR brings a neural network based implementation for the online GPU cluster finder. The code is parallelized for GPU application but requires the correct ONNX runtime version to run on GPUs (see alidist: alisw/alidist#5622).

ChSonnabend and others added 30 commits May 16, 2024 09:32
…l executions. Clusters are not getting published yet (FIXME)
Please consider the following formatting changes to AliceO2Group#13610
Please consider the following formatting changes to AliceO2Group#13709
PUBLIC_LINK_LIBRARIES O2::GPUUtils
O2::GPUCommon
O2::ReconstructionDataFormats
O2::TPCFastTransformation
O2::ML
PRIVATE_LINK_LIBRARIES O2::DataFormatsTPC
SOURCES ${SRCS_DATATYPES})
add_compile_definitions(GPUCA_HAS_ONNX=1)
Copy link
Contributor Author

@ChSonnabend ChSonnabend Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidrohr This is needed, otherwise the code doesn't find GPUCA_HAS_ONNX internally. E.g. tpcNNClusterer objects are not created

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot globally add compile definitions, it should be bound to a target, then we have to understand why it didn't work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found it: See the new commit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wasn't it already there in my change:

target_compile_definitions(${targetName} PRIVATE GPUCA_O2_LIB GPUCA_TPC_GEOMETRY_O2 GPUCA_HAS_ONNX=1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But GPUCA_HAS_ONNX=1 is needed in two places. Not sure which changes you refer to, but I saw it only in one place before. Maybe I missed it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now I understand. You added the library O2::ML to the GPUDataTypes library, not the the GPUTracking library, same for the first define. Can you remove both from GPUDataTypes, and add both only to GPUTracking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@davidrohr davidrohr changed the title GPU clusterizer with neural networks [WIP] GPU clusterizer with neural networks Mar 13, 2025
@davidrohr
Copy link
Collaborator

Marking as WIP to disable the CI until preparatory PRs are merged.

@ChSonnabend ChSonnabend changed the title [WIP] GPU clusterizer with neural networks GPU clusterizer with neural networks Mar 13, 2025
@ChSonnabend ChSonnabend changed the title GPU clusterizer with neural networks [WIP] GPU clusterizer with neural networks Mar 13, 2025
@ChSonnabend ChSonnabend changed the title [WIP] GPU clusterizer with neural networks GPU clusterizer with neural networks Mar 14, 2025
@ChSonnabend
Copy link
Contributor Author

Triggered the CI to see what the build / unit tests result in with the new changes

@alibuild
Copy link
Collaborator

alibuild commented Mar 14, 2025

Error while checking build/O2/fullCI_slc9 for 78c342d at 2025-03-14 03:57:

## sw/BUILD/O2Physics-latest/log
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:
Error in cling::AutoLoadingVisitor::InsertIntoAutoLoadingState:

Full log here.

@davidrohr
Copy link
Collaborator

I don't like to copy&paste the toNative function.
I have split it back into finalize() and toNative() in #14063, so you can use toNative directly.
Could you rebase to origin/dev. Also your bfloat16 PR got merged, so all the changes to bfloat16 should be removed from this PR.

} else {
runKernel<GPUTPCCFDeconvolution>({GetGrid(clusterer.mPmemory->counters.nPositions, lane), {iSector}});
DoDebugAndDump(RecoStep::TPCClusterFinding, 262144 << 4, clusterer, &GPUTPCClusterFinder::DumpChargeMap, *mDebugFile, "Split Charges");
runKernel<GPUTPCCFClusterizer>({GetGrid(clusterer.mPmemory->counters.nClusters, lane, GPUReconstruction::krnlDeviceType::CPU), {iSector}}, 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you incorrectly added a GPUReconstruction::krnlDeviceType::CPU option for the default code path, which has to go. Otherwise it will break clusterization on the GPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it also for my kernels now as they should (in the future) also run on GPU memory

@@ -1073,4 +1172,4 @@ int32_t GPUChainTracking::RunTPCClusterizer(bool synchronizeOutput)

#endif
return 0;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your editor seems to regular misformat the files removing newlines at end of files. Could you please have a look and fix this?

@@ -111,7 +114,15 @@ o2_gpu_add_kernel("GPUTPCCFNoiseSuppression, noiseSuppression" "= TPCCLUS
o2_gpu_add_kernel("GPUTPCCFNoiseSuppression, updatePeaks" "= TPCCLUSTERFINDER" LB)
o2_gpu_add_kernel("GPUTPCCFDeconvolution" "= TPCCLUSTERFINDER" LB)
o2_gpu_add_kernel("GPUTPCCFClusterizer" "= TPCCLUSTERFINDER" LB int8_t onlyMC)
o2_gpu_add_kernel("GPUTPCCFMCLabelFlattener, setRowOffsets" "= TPCCLUSTERFINDER")
if(NOT ALIGPU_BUILD_TYPE STREQUAL "Standalone")
o2_gpu_add_kernel("GPUTPCNNClusterizerKernels, runCfClusterizer" "= TPCNNCLUSTERFINDER" LB GPUConstantMem* processors uint8_t sector int8_t dtype int8_t onlyMC uint batchStart)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GPUConstantMem* processors here seems pretty weird, that should not be there. Are you sure you need it? If yes, I need to check what is going wrong.

Copy link
Contributor Author

@ChSonnabend ChSonnabend Mar 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I need it. When you call runKernel, the runKernelBackendInternal function is called internally (GPUReconstructionCPU.h / .cxx). This function generates a thread instance with this signature:
T::template Thread<I>(x.nBlocks, 1, iB, 0, smem, T::Processor(*mHostConstantMem)[y.index], args...);
We did not define the Processor function in my classes and the use instances of GPUTPCNNClusterizer and GPUTPCClusterFinder where just (semi-)random pointers. This lead to the pointer change inside the kernel that I mentioned to you in person. The easy way to fix this for now was to pass the GPUConstantMem instance and use the class-instances directly from there. Then everything works.

You can check GPUTPCCFClusterizer.h, there is a Processor defined. However I need both instances of GPUTPCNNClsuterizer and the normal CFClusterFinder in my kernels

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, you derive from GPUKernelTemplate, which has the Processor function defined here:

https://github.com/AliceO2Group/AliceO2/blob/705ebfb083c41183183c554c0cb17a6a9423e4c5/GPU/GPUTracking/Base/GPUGeneralKernels.h#L83

Took me a while to understand... :(.
The problem is that you pass in iSector twice:

runKernel<GPUTPCNNClusterizerKernels, GPUTPCNNClusterizerKernels::determineClass1Labels>({GetGrid(iSize, lane, GPUReconstruction::krnlDeviceType::CPU), {iSector}}, processors(), iSector, clustererNN.nnClusterizerDtype, 0, batchStart);

With that, you get the iSector'th instance of processors, and in there the iSector'th instance of the clusterer.
That is indeed not obvious, I'll have to add some protection here, sorry that you run into this...

Please change the first {iSector} to krnlRunRangeNone, then it will work without passing it in manually.
Note that passing it in manually will not work on the GPU, it must be passed in automatically since on the GPU this processors struct lies in constant memory and there is no pointer to it.

@davidrohr
Copy link
Collaborator

PR looks good now, except for newlines at end of files...
Also the MacCI has passed now, though it is totally unclear to me why it works now when it crahsed before...

@davidrohr davidrohr merged commit b5ab60d into AliceO2Group:dev Mar 14, 2025
6 of 7 checks passed
@ChSonnabend ChSonnabend deleted the gpu_clusterizer branch March 14, 2025 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants